Stateless compute is one of the best-aged ideas in infrastructure. The horizontally-scaled web, container orchestration, serverless, edge workers — all of it rests on the same assumption: a worker has no memory of what came before. Receive a request, produce a response, terminate. Anything that survives between requests goes in a database, a cache, or a queue. That assumption was right for twenty years. It is wrong for agents.
## What statelessness actually assumes
The stateless worker is a useful fiction. It lets you run a thousand of them in parallel, replace any of them without coordination, and scale horizontally without distributed-systems pain. But it only works because of an implicit bargain: the worker’s job is small enough and self-contained enough that its context fits in the request. A web handler processing a POST has its context right there in the body. A Kubernetes pod running a Go service has its context in the config it booted with. A Lambda handler gets its context as the event payload. In every case, the worker needs to know nothing about what happened yesterday, what it did an hour ago, or what’s in `/tmp`. Whatever’s needed arrives with the request.
The operating assumption is that context is data. Data can be moved. Data can be serialized. Data goes in a database.
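The bargain is easy to see in miniature in any stateless handler: all context arrives serialized in the request, and nothing survives the return. A minimal sketch — the handler and payload shape are illustrative, not from any particular framework:

```python
import json

def handle(request_body: str) -> str:
    """A stateless worker: every bit of context arrives in the request."""
    event = json.loads(request_body)     # context = data, serialized in
    total = sum(event["items"])          # the small, self-contained job
    return json.dumps({"total": total})  # context = data, serialized out
    # Nothing survives this return: no files, no caches, no open sockets.

# Any of a thousand identical workers can take the next request.
print(handle('{"items": [1, 2, 3]}'))  # → {"total": 6}
```

Because the function closes over nothing and writes nothing, replacing or parallelizing it is free — which is exactly the property the rest of the post argues agents give up.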
## Then the workers got bigger jobs
Agents broke the bargain. An agent running a coding task or an autonomous research job isn’t processing a request — it’s doing work. For a long time. With accumulating state. The job takes twenty minutes, an hour, sometimes longer. Over that span, the worker:

- Clones three repos
- Runs `npm install` and waits eight minutes
- Seeds a local Postgres
- Opens four long-lived shells
- Edits forty files in `/workspace`
- Spawns a dev server bound to `:3000`
- Logs in to three OAuth-authenticated services
- Loads a 7GB model into memory
- Caches tokenizer output
- Pre-computes embeddings for a 400-page PDF
- JIT-compiles hot paths of a Python library
- Has a `node_modules/` directory it’s been warming for half an hour
## Context is not a message array
When an agent framework tries to “checkpoint” an agent, it usually serializes the message history — the transcript of prompts and tool calls — and calls it done. That’s the smallest possible definition of context, and it’s the only one that fits in the stateless-compute model. The rest of the context, the context that actually mattered, is gone:

| Kind of state | What’s lost |
| --- | --- |
| Filesystem state | All the files the agent created, edited, downloaded, cached. Forty modified source files. A seeded database. A built binary. |
| Process state | The running dev server. The background `tsc --watch`. The shell with the half-typed command. The language server holding its symbol index. |
| Network state | Open sockets with keep-alives still alive. OAuth refresh tokens the agent negotiated. Rate-limit windows it’s tracking. A WebSocket to a collaboration tool. |
| Memory state | The model weights it spent three minutes loading. The LRU cache warmed by fifty tool calls. The JIT-compiled hot path. The in-process vector store. |
| Environmental state | The packages the agent installed to fix a dependency error. The env vars it exported. The DNS override it needed for a flaky service. |
| Collaboration state | The branch it pushed to, the pull request it opened, the review comment it was half-way through writing. |
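The gap is easy to demonstrate. Treat the checkpoint as the serialized message array, the way most frameworks do, and the filesystem state the agent built up simply isn’t in it. A toy illustration — the directory layout and message shape here are invented for the example:

```python
import json
import os
import tempfile

# An "agent" does real work: it records a tool call AND writes a file.
workdir = tempfile.mkdtemp()
messages = [{"role": "assistant", "content": "wrote config"}]
with open(os.path.join(workdir, "config.json"), "w") as f:
    f.write('{"db": "seeded"}')

# "Checkpoint" the agent the usual way: serialize the transcript only.
checkpoint = json.dumps(messages)

# "Restore" in a fresh environment: the transcript survives...
restored = json.loads(checkpoint)
fresh_dir = tempfile.mkdtemp()
print(restored[0]["content"])  # → wrote config

# ...but the filesystem state does not. The file exists only in the old
# workdir; the checkpoint never captured it.
print(os.path.exists(os.path.join(fresh_dir, "config.json")))  # → False
```

The same argument applies, a fortiori, to the rows of the table that can’t be serialized at all: a running process or an open socket has no JSON representation to begin with.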
## The two bad workarounds
Everyone building serious agent systems has hit this wall. There are two bad workarounds everyone tries.

### 1. Make the agent itself stateless

Pretend the worker has no context between tool calls. Serialize everything the agent needs into the next prompt. Recompute the world before every operation. This works in demos. It does not work in production, because:

- Every tool call pays the provisioning cost again.
- Compounding work (install, then build, then test) serializes the whole pipeline into each step’s context window.
- The agent loses the cheap check that’s fundamental to engineering: is the thing I built still there?
- Long-running side effects (background processes, open connections, filesystem caches) can’t be expressed in a prompt at all.
### 2. Keep the agent running forever

Put the worker on a long-lived VM or container and never kill it. Now the state is real, because the process holding it never dies. But:

- You’ve given up horizontal scale. The worker can only do one thing at a time, on one machine.
- Forking into parallel explorations is impossible. The state is a single timeline on a single host.
- Recovery from crashes is manual. When the process dies, its state dies.
- Rescheduling is impossible. The worker is pinned to its node.
- Snapshots are possible in theory but cost seconds-to-minutes in practice, so they happen rarely if ever.
## What agents actually need
The primitive agents need isn’t statelessness and it isn’t a long-lived single VM. It’s something that doesn’t have a common name yet:

- Branchable state. At any point, an agent should be able to fork its current state into N parallel continuations, try them all, keep the winners, discard the rest.
- Committable state. At any milestone, the worker should be able to snapshot its full state — filesystem, memory, processes, sockets — into an immutable, content-addressable reference it can return to.
- Restorable state. That reference should boot a fresh worker to byte-identical state anywhere, anytime, in microseconds.
- Cheap. All of this at latencies low enough to sit inside the hot path, not as a batch operation. If checkpointing takes thirty seconds, you’ll checkpoint once an hour. If it takes 258µs, you’ll checkpoint on every tool call.
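The shape of that primitive can be sketched in miniature. The class and method names below are hypothetical, and the “state” here is an in-memory dict standing in for what would really be an entire machine image — but the three operations (content-addressed commit, cheap branch, byte-identical restore) are the ones the list above names:

```python
import copy
import hashlib
import pickle

class Worker:
    """Toy stand-in for a branchable, committable, restorable worker.
    Real state is a whole machine (filesystem, memory, processes);
    here it is just a dict."""

    _store = {}  # content-addressable snapshot store: ref -> bytes

    def __init__(self, state=None):
        self.state = state if state is not None else {}

    def commit(self) -> str:
        """Snapshot full state into an immutable, content-addressed ref."""
        blob = pickle.dumps(self.state)
        ref = hashlib.sha256(blob).hexdigest()
        Worker._store[ref] = blob
        return ref

    @classmethod
    def restore(cls, ref: str) -> "Worker":
        """Boot a fresh worker to byte-identical state from a ref."""
        return cls(pickle.loads(cls._store[ref]))

    def branch(self, n: int) -> list:
        """Fork the current state into n independent continuations."""
        return [Worker(copy.deepcopy(self.state)) for _ in range(n)]

# Usage: commit a milestone, explore in parallel, return to the milestone.
w = Worker({"files": ["a.py"], "db": "seeded"})
ref = w.commit()
for i, b in enumerate(w.branch(3)):
    b.state["files"].append(f"attempt_{i}.py")  # each branch diverges
w.state["db"] = "corrupted"                     # original drifts too
restored = Worker.restore(ref)
print(restored.state)  # → {'files': ['a.py'], 'db': 'seeded'}
```

The content-addressed ref is what makes commits cheap to deduplicate and safe to share: identical states hash to the same reference, and a reference can never be mutated after the fact.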
## What this implies
If you’re building agent infrastructure today, you’re going to hit this wall. Every serious team has. The question isn’t whether context exceeds the message array — it does, always — it’s what you do when you notice. The reflex is to work around: serialize more into prompts, sit on longer-lived VMs, build bespoke snapshotting on top of whatever hypervisor you’re using. These are reasonable tactics and they take you further than you’d expect. They do not take you all the way. At some point the cost of rebuilding the context from scratch, or the cost of pinning your agent to a single long-lived node, becomes the dominant cost in your system. At that point you need a different primitive. That’s what Vers is. Branching, committing, restoring a running VM in microseconds — because the thing holding your agent’s context isn’t a message history, it’s an entire machine, and the only way to preserve it honestly is to preserve the machine itself. Statelessness was correct. It was correct for a kind of worker that almost always fit the bargain. Agents don’t fit. Once you accept that, the shape of the right infrastructure clarifies quickly.

## Further reading
- **The cost of rebuilding state.** Most engineering time is not spent doing work — it’s spent re-creating the conditions under which work can be done.
- **Agent swarms tutorial.** The pattern that emerges when branching state is cheap: fork one golden worker into N parallel continuations.
- **Architecture.** How the branching, committing, and restoring primitives actually work.
- **Why Vers?** The shorter, less philosophical version — how the primitive differs from sandboxes, hypervisors, and schedulers.