Frameworks Don't Work — Except in Demos

Most agent frameworks ship with the same promise: chain a few calls together, add memory, deploy. The demo works. The blog post writes itself. The tweet goes viral.

Then you try to run it in production.

The agent forgets context mid-conversation. It retries the same failing tool call in a loop. There is no way to inspect what happened, no way to pause it, no way to let a human step in when the stakes are high. The framework gave you a starting point — but production demands a finishing point.

Where frameworks stop

Frameworks are designed for composition, not operation. They help you wire together an LLM, a vector store, and a tool. They do not help you run that wiring reliably, observe it under load, or govern it across a fleet of agents.

Here is where they consistently break down:

State management. Real agents are stateful. They run for minutes, hours, sometimes days. They need durable memory that survives restarts, not in-memory dictionaries that vanish when the process dies.
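To make "durable" concrete, here is a minimal sketch of state that survives a restart, assuming a local SQLite file is an acceptable store. The `DurableState` class, the `agent_state` table, and the JSON encoding are all hypothetical choices for illustration, not any framework's API.

```python
import json
import sqlite3


class DurableState:
    """Hypothetical key-value store for agent state, backed by a SQLite file."""

    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS agent_state ("
            "agent_id TEXT PRIMARY KEY, state TEXT NOT NULL)"
        )
        self.conn.commit()

    def save(self, agent_id, state):
        # The connection context manager commits on success and rolls back on error.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO agent_state (agent_id, state) VALUES (?, ?)",
                (agent_id, json.dumps(state)),
            )

    def load(self, agent_id):
        row = self.conn.execute(
            "SELECT state FROM agent_state WHERE agent_id = ?", (agent_id,)
        ).fetchone()
        # An unknown agent starts from empty state.
        return json.loads(row[0]) if row else {}

    def close(self):
        self.conn.close()
```

A process that crashes after `save` can reopen the same file and call `load` to pick up where it left off. An in-memory dictionary cannot do that.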

Failure handling. Production agents encounter rate limits, timeouts, malformed responses, and tool failures. Frameworks retry naively or not at all. There is no circuit breaking, no fallback routing, no graceful degradation.
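As a rough illustration of what is missing, the sketch below combines exponential backoff, a simple circuit breaker, and a fallback path. `CircuitBreaker`, `CircuitOpenError`, and `call_with_fallback` are hypothetical names, and the thresholds and delays are arbitrary defaults rather than recommendations.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker refuses a call during its cool-down."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; skipping call")
            # Cool-down elapsed: allow one trial call; a single failure reopens.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        return result


def call_with_fallback(breaker, primary, fallback,
                       retries=2, base_delay=0.5, sleep=time.sleep):
    """Try primary with exponential backoff; degrade to fallback."""
    for attempt in range(retries + 1):
        try:
            return breaker.call(primary)
        except CircuitOpenError:
            break  # do not hammer a dependency that is known to be down
        except Exception:
            if attempt < retries:
                sleep(base_delay * 2 ** attempt)
    return fallback()
```

The breaker is what stops the agent from retrying the same failing tool call in a loop: once it opens, further calls are skipped until the cool-down passes, and the fallback answers in the meantime.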

Observability. When an agent produces a wrong answer, you need to trace the full decision path — every LLM call, every tool invocation, every memory retrieval. Frameworks log inconsistently or not at all.
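One way to get a traceable decision path is to wrap every step in a decorator that records what was called, with what, and how it ended. The sketch below is a hypothetical, in-memory stand-in for a real tracing backend; the `Trace` class and its event fields are assumptions made for illustration.

```python
import functools
import time
import uuid


class Trace:
    """Collects one event per wrapped step, all sharing a trace_id."""

    def __init__(self):
        self.trace_id = uuid.uuid4().hex
        self.events = []

    def step(self, kind):
        # kind is a free-form label such as "llm", "tool", or "memory".
        def decorator(fn):
            @functools.wraps(fn)
            def wrapper(*args, **kwargs):
                event = {
                    "trace_id": self.trace_id,
                    "kind": kind,
                    "name": fn.__name__,
                    "args": repr(args),
                    "kwargs": repr(kwargs),
                }
                start = time.perf_counter()
                try:
                    result = fn(*args, **kwargs)
                    event["status"] = "ok"
                    event["output"] = repr(result)
                    return result
                except Exception as exc:
                    event["status"] = "error"
                    event["error"] = repr(exc)
                    raise
                finally:
                    # Recorded on both success and failure.
                    event["duration_s"] = time.perf_counter() - start
                    self.events.append(event)
            return wrapper
        return decorator
```

Decorating each LLM call, tool invocation, and memory lookup with `trace.step(...)` leaves an ordered list of events in `trace.events` that can be replayed when an answer turns out wrong.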

Human oversight. High-stakes decisions require human-in-the-loop checkpoints. Frameworks assume full autonomy because that is what makes the demo impressive.
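A checkpoint does not have to be elaborate. The sketch below gates any action above a risk threshold behind an approval callback. The `Action` type, the 0.0 to 1.0 risk score, and the `approve` callback are hypothetical; in practice the callback might post to a review queue and wait for a decision.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    name: str
    risk: float  # hypothetical score from 0.0 (harmless) to 1.0 (high stakes)
    run: Callable[[], str]


def execute_with_checkpoint(action, approve, risk_threshold=0.7):
    """Run low-risk actions directly; ask a human before risky ones."""
    if action.risk >= risk_threshold and not approve(action):
        return f"{action.name}: rejected by reviewer"
    return action.run()
```

Low-risk actions never touch the reviewer, so autonomy is preserved where it is cheap and a human is pulled in only where the stakes justify the delay.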

Governance. In production, you need audit trails, cost controls, access policies, and compliance boundaries. Frameworks treat these as someone else’s problem.
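Here is a small sketch of what two of those controls could look like in code: a spending cap and a tool allowlist, with every decision written to an audit log. `GovernedSession`, `BudgetExceeded`, and the event names are hypothetical, and the in-memory list stands in for whatever append-only store a real deployment would use.

```python
import time


class BudgetExceeded(RuntimeError):
    """Raised when a charge would push spending past the budget."""


class GovernedSession:
    def __init__(self, agent_id, budget_usd, allowed_tools):
        self.agent_id = agent_id
        self.budget_usd = budget_usd
        self.allowed_tools = set(allowed_tools)
        self.spent_usd = 0.0
        self.audit_log = []

    def _record(self, event, **details):
        self.audit_log.append(
            {"ts": time.time(), "agent_id": self.agent_id, "event": event, **details}
        )

    def authorize_tool(self, tool):
        allowed = tool in self.allowed_tools
        # Denials are logged too, so the trail shows what was attempted.
        self._record("tool_check", tool=tool, allowed=allowed)
        if not allowed:
            raise PermissionError(f"{tool} is not allowed for {self.agent_id}")

    def charge(self, cost_usd, reason):
        if self.spent_usd + cost_usd > self.budget_usd:
            self._record("budget_denied", cost_usd=cost_usd, reason=reason)
            raise BudgetExceeded(f"budget of {self.budget_usd} USD exceeded")
        self.spent_usd += cost_usd
        self._record("charge", cost_usd=cost_usd, reason=reason)
```

The point of the sketch is where the checks live: outside the agent's own logic, so a misbehaving agent cannot skip them.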

Libraries versus operating systems

The distinction matters. A library gives you building blocks. An operating system gives you the runtime — scheduling, resource management, inter-process communication, observability, security.

Agent frameworks are libraries. They help you build the agent. They do not help you run it.

What production agents need is an operating system: something that manages the full lifecycle from deployment to monitoring to shutdown. Something that handles the unglamorous work — retries, state persistence, cost tracking, human escalation — so the agent can focus on its actual task.

The gap is not closing

Framework maintainers know about these limitations. But closing the gap means fundamentally changing what the framework is. Adding state management, observability, governance, and human oversight to a composition library turns it into a platform. And platforms require a different architecture, a different team, and a different level of commitment.

That is why the gap persists. Frameworks keep optimizing for the first ten minutes of the developer experience. Production keeps demanding the other ten thousand hours.

The question is not whether your agent works in a demo. The question is whether it works on a Tuesday afternoon when the LLM provider is rate-limiting, the database is slow, and a customer is waiting for an answer that actually matters.