
Building Spore: An Agentic Harness from First Principles
A 15-part series on building spore-core — a production-grade agentic harness built from scratch. Covers harness engineering, IoC architecture, sandbox isolation, context budgeting, termination policy, memory, observability, and multi-language implementation.
Most agent projects don't fail because the model wasn't smart enough. They fail because nobody treated the harness as a real engineering problem.
The harness is everything around the model: the loop that drives execution, the context manager that decides what the model sees on each turn, the termination policy that knows when the run is actually done, the sandbox that keeps tools from doing things they shouldn't, the memory system that separates what happened this session from what we've learned across all sessions, and the observability layer that makes improvement possible. In most frameworks, these are afterthoughts: scaffolding bolted together in the hope that it holds.
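To make that split concrete, here is a minimal Python sketch of three of those pieces: the loop, a context manager, and a termination policy. It is not spore-core's API. Every name in it (`Turn`, `Model`, `TerminationPolicy`, `MaxTurnsOrVerified`, `assemble_context`, `run`, `EagerModel`) is a hypothetical stand-in chosen for illustration, and the "context budget" is simplified to a count of recent turns. The point it tries to show is that the model can claim it's done on every turn while the harness keeps the final say.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass
class Turn:
    """One model turn: what the model said and whether it claims to be done."""
    output: str
    claims_done: bool


class Model(Protocol):
    def step(self, context: list[str]) -> Turn: ...


class TerminationPolicy(Protocol):
    def should_stop(self, turns: list[Turn]) -> bool: ...


@dataclass
class MaxTurnsOrVerified:
    """Hypothetical policy: stop when an external check passes, or at a turn cap."""
    max_turns: int
    verify: Callable[[Turn], bool]

    def should_stop(self, turns: list[Turn]) -> bool:
        if len(turns) >= self.max_turns:
            return True
        # Note: the model's own claims_done flag is deliberately ignored here.
        return bool(turns) and self.verify(turns[-1])


def assemble_context(task: str, turns: list[Turn], budget: int) -> list[str]:
    """Hypothetical context manager: the task plus only the most recent turns."""
    recent = turns[-budget:] if budget > 0 else []
    return [task] + [t.output for t in recent]


def run(model: Model, policy: TerminationPolicy, task: str, budget: int = 3) -> list[Turn]:
    """The loop: the harness, not the model, decides when the run ends."""
    turns: list[Turn] = []
    while not policy.should_stop(turns):
        turns.append(model.step(assemble_context(task, turns, budget)))
    return turns


class EagerModel:
    """Stub model that declares victory on every turn."""

    def __init__(self) -> None:
        self.calls = 0

    def step(self, context: list[str]) -> Turn:
        self.calls += 1
        return Turn(output=f"attempt {self.calls}", claims_done=True)


# The stub claims success immediately; the policy only accepts the third attempt.
policy = MaxTurnsOrVerified(max_turns=5, verify=lambda t: t.output == "attempt 3")
history = run(EagerModel(), policy, task="fix the failing test")
print(len(history))
```

Because the model and the policy are both passed in rather than hard-coded, each can be swapped for a stub in tests, which is the kind of testability the rest of the series leans on.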
spore-core is an attempt to build the harness as a first-class engineering artifact — typed, testable, composable, and self-improving. This series documents that build from scratch, one component at a time, with the reasoning behind every design decision made explicit.
It's fifteen posts. The first two are conceptual and accessible to anyone building agent systems regardless of stack. Posts 3 through 8 are component deep-dives, progressively more technical. Posts 9 and 10 cover human-in-the-loop and multi-agent patterns. Posts 11 and 12 cover deployment and building the harness in four languages (Rust, TypeScript, Python, Go). Posts 13 through 15 are retrospective — observability, evaluation, and what an external code review of the spec caught that the design phase missed.
If you've ever shipped an agent system and spent more time fighting the environment than the actual problem, this series is for you.
| # | Post | What It Covers |
|---|---|---|
| 1 | The Model Is Not the Problem | The case for harness engineering. Agent = Model + Harness, and the harness is the variable. |
| 2 | The Wrong Mental Model (And the Right One) | Why layers are the wrong abstraction and inversion of control is the right one. The full component map. |
| 3 | The Agent Is One Turn | What the agent loop actually is. Five real loop strategies. Why termination is a policy decision, not the model's call. |
| 4 | The Sandbox Is a Capability, Not a Container | Isolation without Docker overhead. Four isolation modes. Why bash is the hard problem. |
| 5 | Context Is Not a Dump, It's a Budget | Cache-aware context assembly. Three-block architecture. Compaction, truncation, and post-compaction drift detection. |
| 6 | The Harness Knows When You're Done (The Model Doesn't) | Termination policy, error routing, and the middleware chain. Why the model will declare victory early, every time. |
| 7 | Memory Is Two Different Things | Episodic vs semantic memory. The four-level identity hierarchy. Why conflating them causes real problems. |
| 8 | The Harness Gets Smarter (The Model Doesn't) | The guide lifecycle and the improvement flywheel. Why automated proposals always start in pending review. |
| 9 | Human in the Loop Without Blocking the Thread | Async suspend/resume. The permission model. Why HITL has to be designed in from the start, not bolted on. |
| 10 | The Agent Is One Turn (But Sometimes There Are Two Agents) | Multi-agent patterns without orchestration complexity. Sequential agents, SubagentTool, and what's deferred to post-v1. |
| 11 | The Harness Is Stateless (So It Can Go Anywhere) | Deployment surfaces: CLI, REST, library, queue worker, subprocess. The ecosystem strategy. |
| 12 | Building Spore in Four Languages | Rust, TypeScript, Python, Go. Real implementation lessons including the RPITIT dyn-compatibility problem and the RecordingModelInterface pattern. |
| 13 | The Observability Stack Is Not Optional | Why traces are the foundation, not the feature. Grafana + Tempo + Loki + Prometheus. The local outbox pattern. |
| 14 | The Eval Harness Is Not a Benchmark | How to measure harness improvement without Goodhart's Law. Three task tiers. Statistical comparison vs eyeballing. |
| 15 | What the Code Review Taught Us | External review surfaced four real gaps in the spec. What they were, why they weren't obvious, and what changed because of them. |
spore-core is open source. Follow along on GitHub.