Stateless compute is one of the best-aged ideas in infrastructure. The horizontally-scaled web, container orchestration, serverless, edge workers — all of it rests on the same assumption: a worker has no memory of what came before. Receive a request, produce a response, terminate. Anything that survives between requests goes in a database, a cache, or a queue. That assumption was right for twenty years. It is wrong for agents.

What statelessness actually assumes

The stateless worker is a useful fiction. It lets you run a thousand of them in parallel, replace any of them without coordination, and scale horizontally without distributed-systems pain. But it only works because of an implicit bargain: the worker’s job is small enough and self-contained enough that its context fits in the request. A web handler processing a POST has its context right there in the body. A Kubernetes pod running a Go service has its context in the config it booted with. A Lambda handler gets its context as the event payload. In every case, the worker needs to know nothing about what happened yesterday, what it did an hour ago, or what’s in /tmp. Whatever’s needed arrives with the request. The operating assumption is that context is data. Data can be moved. Data can be serialized. Data goes in a database.
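
To make the bargain concrete, here is a minimal sketch of a stateless handler in the Lambda style; the event shape and field names are illustrative, not any particular platform's API.

```python
# A hypothetical stateless handler: the entire context for this unit of
# work arrives in the event, and nothing survives the return.
import json

def handler(event: dict, context: object) -> dict:
    order = json.loads(event["body"])  # context is data, shipped with the request
    total = sum(item["price"] * item["qty"] for item in order["items"])
    # Anything that must outlive this call goes to a database, cache, or
    # queue - never to local disk or process memory.
    return {"statusCode": 200, "body": json.dumps({"total": total})}
```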

Then the workers got bigger jobs

Agents broke the bargain. An agent running a coding task or an autonomous research job isn’t processing a request — it’s doing work. For a long time. With accumulating state. The job takes twenty minutes, an hour, sometimes longer. Over that span, the worker:
  • Clones three repos
  • Runs npm install and waits eight minutes
  • Seeds a local Postgres
  • Opens four long-lived shells
  • Edits forty files in /workspace
  • Spawns a dev server bound to :3000
  • Logs in to three OAuth-authenticated services
  • Loads a 7GB model into memory
  • Caches tokenizer output
  • Pre-computes embeddings for a 400-page PDF
  • JIT-compiles hot paths of a Python library
  • Has a node_modules/ directory it’s been warming for half an hour
At minute 45, it crashes. Or gets preempted. Or its container gets rescheduled. Or the cluster autoscaler decides to recycle its pod. Or the user cancels the task and asks to “pick up from here” tomorrow. What’s the context to restore?

Context is not a message array

When an agent framework tries to “checkpoint” an agent, it usually serializes the message history — the transcript of prompts and tool calls — and calls it done. That’s the smallest possible definition of context, and it’s the only one that fits in the stateless-compute model. The rest of the context, the context that actually mattered, is gone:
  • Filesystem state: All the files the agent created, edited, downloaded, cached. Forty modified source files. A seeded database. A built binary.
  • Process state: The running dev server. The background tsc --watch. The shell with the half-typed command. The language server holding its symbol index.
  • Network state: Open sockets with keep-alives still alive. OAuth refresh tokens the agent negotiated. Rate-limit windows it’s tracking. A WebSocket to a collaboration tool.
  • Memory state: The model weights it spent three minutes loading. The LRU cache warmed by fifty tool calls. The JIT-compiled hot path. The in-process vector store.
  • Environmental state: The packages the agent installed to fix a dependency error. The env vars it exported. The DNS override it needed for a flaky service.
  • Collaboration state: The branch it pushed to, the pull request it opened, the review comment it was halfway through writing.
Restore the message array and you’ve restored the description of the work. The work itself — every byte the agent actually produced, every pid still running, every cache that wasn’t cold — is lost. The next run starts over. For a chat assistant handling one turn at a time, this is fine. For an agent ten minutes into a compound task, it’s catastrophic. It means the checkpoint is theater: a thing you can load and feel like you’re resuming from, while actually paying the full provisioning cost again, just with the transcript preloaded.
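
For contrast, here is what the naive checkpoint actually captures, as a sketch; the function names are illustrative, and the comments mark what it silently drops.

```python
# A sketch of the "checkpoint = message array" approach, with comments
# marking everything it does not capture.
import json

def checkpoint(messages: list[dict], path: str) -> None:
    with open(path, "w") as f:
        json.dump(messages, f)       # captured: the transcript, nothing else
    # Dropped: filesystem state (/workspace, the seeded database),
    # process state (the dev server, tsc --watch, open shells),
    # network state (sockets, OAuth tokens, rate-limit windows),
    # memory state (model weights, warmed caches, JIT output),
    # environment (installed packages, env vars, DNS overrides).

def restore(path: str) -> list[dict]:
    with open(path) as f:
        return json.load(f)          # the description of the work, not the work
```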

The two bad workarounds

Everyone building serious agent systems has hit this wall, and nearly everyone reaches for one of two bad workarounds.

1. Make the agent itself stateless

Pretend the worker has no context between tool calls. Serialize everything the agent needs into the next prompt. Recompute the world before every operation. This works in demos. It does not work in production, because:
  • Every tool call pays the provisioning cost again.
  • Compounding work (install, then build, then test) serializes the whole pipeline into each step’s context window.
  • The agent loses the cheap check that’s fundamental to engineering: is the thing I built still there?
  • Long-running side effects (background processes, open connections, filesystem caches) can’t be expressed in a prompt at all.
This is the “recursive cron job” school of agent design. It reduces a 45-minute job to 45 sequential one-minute jobs, each paying the full setup tax. Total wall time explodes, and you’re still fragile — the agent can’t inherit anything useful from its prior self.
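
Sketched out, the pattern looks like this; run_in_fresh_sandbox is a hypothetical stand-in for whatever provisions a throwaway environment, and the commands are illustrative.

```python
# The "recursive cron job" anti-pattern: every step boots a fresh
# worker, so every step pays the full setup tax again.
import subprocess
import tempfile

SETUP = ["git clone https://example.com/repo .", "npm install"]  # minutes each

def run_in_fresh_sandbox(commands: list[str]) -> str:
    """Hypothetical stand-in: provision a throwaway workspace, run the
    commands in it, return the output, discard the workspace."""
    with tempfile.TemporaryDirectory() as workdir:
        outputs = [
            subprocess.run(cmd, shell=True, cwd=workdir,
                           capture_output=True, text=True).stdout
            for cmd in commands
        ]
    return "".join(outputs)

transcript: list[str] = []  # the only "state" a step can pass forward
for step in ["npm run build", "npm test", "npm run e2e"]:
    # Three one-minute commands become three setup-plus-one-minute jobs,
    # and nothing built by one step survives into the next.
    transcript.append(run_in_fresh_sandbox(SETUP + [step]))
```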

2. Keep the agent running forever

Put the worker on a long-lived VM or container and never kill it. Now the state is real, because the process holding it never dies. But:
  • You’ve given up horizontal scale. The worker can only do one thing at a time, on one machine.
  • Forking into parallel explorations is impossible. The state is a single timeline on a single host.
  • Recovery from crashes is manual. When the process dies, its state dies.
  • Rescheduling is impossible. The worker is pinned to its node.
  • Snapshots are possible in theory but cost seconds-to-minutes in practice, so they happen rarely if ever.
This is the “big expensive VM” school. It accepts that statelessness is the wrong answer and doubles down on statefulness without making statefulness cheap. You end up with one agent per VM, zero parallelism, and panic when the VM crashes.
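
The shape of this workaround, sketched with a hypothetical in-process worker: the state is real, but it is a single timeline that dies with the process.

```python
# The "big expensive VM" pattern: one long-lived process owns the state,
# so work serializes behind it and the state dies when it does.
import queue
import threading

state: dict = {"workspace": {}, "caches": {}}  # lives exactly as long as this process
tasks: queue.Queue = queue.Queue()

def worker() -> None:
    while True:
        task = tasks.get()
        state["workspace"][task] = f"result of {task}"  # one task at a time,
        tasks.task_done()                               # on one machine; no fork,
                                                        # no snapshot, no reschedule

threading.Thread(target=worker, daemon=True).start()
tasks.put("build"); tasks.put("test")  # queued behind each other, never parallel
tasks.join()
```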

What agents actually need

The primitive agents need isn’t statelessness and it isn’t a long-lived single VM. It’s something that doesn’t have a common name yet:
  • Branchable state. At any point, an agent should be able to fork its current state into N parallel continuations, try them all, keep the winners, discard the rest.
  • Committable state. At any milestone, the worker should be able to snapshot its full state — filesystem, memory, processes, sockets — into an immutable, content-addressable reference it can return to.
  • Restorable state. That reference should boot a fresh worker to byte-identical state anywhere, anytime, in microseconds.
  • Cheap. All of this at latencies low enough to live inside the hot path, not as a batch operation. If checkpointing takes thirty seconds, you’ll checkpoint once an hour. If it takes 258µs, you’ll checkpoint on every tool call.
With those primitives, the failure modes of the two bad workarounds disappear. You don’t pay provisioning cost per tool call — you inherit state from the previous one. You don’t pin one agent to one host — you branch across hosts. You don’t panic when a worker crashes — you restore from the last commit. The reason those primitives aren’t standard yet is that they were too expensive to build until recently. A VM fork that takes two seconds is novel but useless inside a tool-call loop. A VM fork that takes 258µs is a primitive — and the software you can build on top of it looks nothing like what stateless compute enables.
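
A sketch of what those primitives could look like from the agent’s side, using a toy in-memory “machine” so the shape of the API is concrete; none of these names (Machine, commit, fork, restore) are the actual Vers API.

```python
# A toy model of branchable, committable, restorable state. The "VM"
# here is just a dict; the point is the shape of the API, not the
# mechanism that makes it fast.
import copy
import uuid

class Machine:
    """Stand-in for a running VM whose full state is one dict."""
    def __init__(self, state: dict | None = None):
        self.id = uuid.uuid4().hex[:8]
        self.state = state if state is not None else {"files": {}, "procs": []}

commits: dict[str, dict] = {}  # ref -> immutable snapshot

def commit(vm: Machine) -> str:
    """Snapshot full state into an immutable, addressable reference."""
    ref = f"ref-{len(commits)}"
    commits[ref] = copy.deepcopy(vm.state)
    return ref

def restore(ref: str) -> Machine:
    """Boot a fresh machine at byte-identical state."""
    return Machine(copy.deepcopy(commits[ref]))

def fork(ref: str, n: int) -> list[Machine]:
    """Branch one committed state into n parallel continuations."""
    return [restore(ref) for _ in range(n)]

# When commits are cheap, you checkpoint on every tool call:
vm = Machine()
last_good = commit(vm)
for step in ["npm install", "npm run build", "npm test"]:
    vm.state["procs"].append(step)  # pretend the tool call mutated the machine
    last_good = commit(vm)          # microseconds, so it goes in the hot path
# ...and branch N continuations from any milestone, keeping the winner:
branches = fork(last_good, 3)
```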

What this implies

If you’re building agent infrastructure today, you’re going to hit this wall. Every serious team has. The question isn’t whether context exceeds the message array — it does, always — it’s what you do when you notice. The reflex is to work around: serialize more into prompts, sit on longer-lived VMs, build bespoke snapshotting on top of whatever hypervisor you’re using. These are reasonable tactics and they take you further than you’d expect. They do not take you all the way. At some point the cost of rebuilding the context from scratch, or the cost of pinning your agent to a single long-lived node, becomes the dominant cost in your system. At that point you need a different primitive.

That’s what Vers is. Branching, committing, restoring a running VM in microseconds — because the thing holding your agent’s context isn’t a message history, it’s an entire machine, and the only way to preserve it honestly is to preserve the machine itself.

Statelessness was correct. It was correct for a kind of worker that almost always fit the bargain. Agents don’t fit. Once you accept that, the shape of the right infrastructure clarifies quickly.

Further reading

The cost of rebuilding state

Most engineering time is not spent doing work — it’s spent re-creating the conditions under which work can be done.

Agent swarms tutorial

The pattern that emerges when branching state is cheap: fork one golden worker into N parallel continuations.

Architecture

How the branching, committing, and restoring primitives actually work.

Why Vers?

The shorter, less philosophical version — how the primitive differs from sandboxes, hypervisors, and schedulers.