Building Spore: An Agentic Harness from First Principles

A 15-part series on building spore-core — a production-grade agentic harness built from scratch. Covers harness engineering, IoC architecture, sandbox isolation, context budgeting, termination policy, memory, observability, and multi-language implementation.

Most agent projects don't fail because the model wasn't smart enough. They fail because nobody treated the harness as a real engineering problem.

The harness is everything around the model: the loop that drives execution, the context manager that decides what the model sees on each turn, the termination policy that knows when the run is actually done, the sandbox that keeps tools from doing things they shouldn't, the memory system that separates what happened this session from what we've learned across all sessions, and the observability layer that makes improvement possible. In most frameworks, these are afterthoughts: scaffolding bolted together in the hope that it holds.
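To make the division of labor concrete, here is a minimal sketch of a harness that owns the loop, the context window, and the termination check, while the model only produces one turn at a time. Every name in it (`Harness`, `visible_context`, `is_done`, `fake_model`, the `DONE` marker) is hypothetical and chosen for illustration; none of it is taken from the spore-core codebase, and a real harness would add the sandbox, memory, and observability pieces this sketch leaves out.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# All names below are illustrative assumptions, not spore-core APIs.
# A "model" here is any callable that maps visible context to one turn of output.
Model = Callable[[List[str]], str]


@dataclass
class Harness:
    model: Model
    max_turns: int = 5       # termination policy: hard turn budget
    context_window: int = 3  # context manager: how many recent entries the model sees
    history: List[str] = field(default_factory=list)

    def visible_context(self) -> List[str]:
        # The harness, not the model, decides what is shown on each turn.
        return self.history[-self.context_window:]

    def is_done(self, output: str) -> bool:
        # The harness decides when the run ends; here it checks an explicit marker.
        return output.startswith("DONE")

    def run(self, task: str) -> List[str]:
        self.history.append(task)
        for _ in range(self.max_turns):
            output = self.model(self.visible_context())
            self.history.append(output)
            if self.is_done(output):
                break
        return self.history


def fake_model(context: List[str]) -> str:
    # Stand-in for a real model call: finishes after seeing two of its own steps.
    steps = sum(1 for line in context if line.startswith("step"))
    return "DONE" if steps >= 2 else f"step {steps + 1}"


if __name__ == "__main__":
    print(Harness(model=fake_model).run("task"))
```

The model never sees the turn budget or the full history. Both are harness decisions, which is the sense in which the harness, rather than the model, is the variable.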

spore-core is an attempt to build the harness as a first-class engineering artifact — typed, testable, composable, and self-improving. This series documents that build from scratch, one component at a time, with the reasoning behind every design decision made explicit.

It's fifteen posts. The first two are conceptual and accessible to anyone building agent systems regardless of stack. Posts 3 through 8 are component deep-dives, progressively more technical. Posts 9 and 10 cover human-in-the-loop and multi-agent patterns. Posts 11 and 12 cover deployment and building the harness in four languages (Rust, TypeScript, Python, Go). Posts 13 through 15 are retrospective — observability, evaluation, and what an external code review of the spec caught that the design phase missed.

If you've ever shipped an agent system and spent more time fighting the environment than the actual problem, this series is for you.

| # | Post | What It Covers |
|---|------|----------------|
| 1 | The Model Is Not the Problem | The case for harness engineering. Agent = Model + Harness, and the harness is the variable. |
| 2 | The Wrong Mental Model (And the Right One) | Why layers are the wrong abstraction and inversion of control is the right one. The full component map. |
| 3 | The Agent Is One Turn | What the agent loop actually is. Five real loop strategies. Why termination is a policy decision, not the model's call. |
| 4 | The Sandbox Is a Capability, Not a Container | Isolation without Docker overhead. Four isolation modes. Why bash is the hard problem. |
| 5 | Context Is Not a Dump, It's a Budget | Cache-aware context assembly. Three-block architecture. Compaction, truncation, and post-compaction drift detection. |
| 6 | The Harness Knows When You're Done (The Model Doesn't) | Termination policy, error routing, and the middleware chain. Why the model will declare victory early, every time. |
| 7 | Memory Is Two Different Things | Episodic vs semantic memory. The four-level identity hierarchy. Why conflating them causes real problems. |
| 8 | The Harness Gets Smarter (The Model Doesn't) | The guide lifecycle and the improvement flywheel. Why automated proposals always start in pending review. |
| 9 | Human in the Loop Without Blocking the Thread | Async suspend/resume. The permission model. Why HITL has to be designed in from the start, not bolted on. |
| 10 | The Agent Is One Turn (But Sometimes There Are Two Agents) | Multi-agent patterns without orchestration complexity. Sequential agents, SubagentTool, and what's deferred to post-v1. |
| 11 | The Harness Is Stateless (So It Can Go Anywhere) | Deployment surfaces: CLI, REST, library, queue worker, subprocess. The ecosystem strategy. |
| 12 | Building Spore in Four Languages | Rust, TypeScript, Python, Go. Real implementation lessons including the RPITIT dyn-compatibility problem and the RecordingModelInterface pattern. |
| 13 | The Observability Stack Is Not Optional | Why traces are the foundation, not the feature. Grafana + Tempo + Loki + Prometheus. The local outbox pattern. |
| 14 | The Eval Harness Is Not a Benchmark | How to measure harness improvement without Goodhart's Law. Three task tiers. Statistical comparison vs eyeballing. |
| 15 | What the Code Review Taught Us | External review surfaced four real gaps in the spec. What they were, why they weren't obvious, and what changed because of them. |

spore-core is open source. Follow along on GitHub.