This tutorial walks through building a Recursive Language Model (RLM) — an inference pattern where an LLM writes code to programmatically explore data instead of reading it all at once. We’ll use ts-rlm for the framework and Vers VMs for isolated code execution.
What You’ll Learn
- What RLMs are and why they outperform raw prompting on large contexts
- Building an RLM that analyzes documents through iterative code execution
- Adding custom tools (API calls, databases, file access)
- Using Vers VMs as sandboxed interpreters for safe code execution
Prerequisites
- Vers CLI installed and authenticated
- Bun runtime installed
- An Anthropic or OpenAI API key
The Problem RLMs Solve
LLMs have context windows — 200K tokens for Claude, 128K for GPT-4o. But context windows have two problems:
- Cost: Sending 200K tokens per request is expensive
- Accuracy: Models get worse at finding specific information as context grows (Lost in the Middle)
RLMs solve both. Instead of feeding the entire document to the model, you put the data in a variable and give the model a REPL. The model writes code to search, filter, and extract what it needs — touching only the relevant parts.
Traditional: LLM ← entire 500-page document → answer
RLM: LLM → "print(document.slice(0, 500))" → sees preview
LLM → "print(document.match(/revenue/gi).length)" → 47 matches
LLM → "llmQuery('Summarize: ' + document.slice(12000, 14000))" → sub-summary
LLM → FINAL(answer)
The model explores the data programmatically, calling sub-LLMs only on the slices it needs.
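Conceptually, the inference loop looks like this. A minimal sketch, not ts-rlm's actual internals; generateCode and runInSandbox are hypothetical stand-ins for the LLM call and the REPL:
type StepResult = { final: boolean; answer?: string; text: string };

// Hypothetical: ask the model for the next code snippet, given prior steps.
declare function generateCode(question: string, history: { code: string; output: string }[]): Promise<string>;
// Hypothetical: run the snippet in a sandbox where the data lives in a variable.
declare function runInSandbox(code: string, vars: Record<string, unknown>): Promise<StepResult>;

async function rlmLoop(document: string, question: string, maxIterations = 10): Promise<string> {
  const history: { code: string; output: string }[] = [];
  for (let i = 0; i < maxIterations; i++) {
    const code = await generateCode(question, history);
    const result = await runInSandbox(code, { document });
    if (result.final) return result.answer ?? ""; // the model called FINAL(...)
    history.push({ code, output: result.text }); // only code and output enter context
  }
  throw new Error("No final answer within the iteration budget");
}
The key property: the document itself never enters the model's context, only the code the model writes and the (usually small) output of each step.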
Step 1: Set Up the Project
Create a Vers VM
mkdir rlm-tutorial && cd rlm-tutorial
vers init
Edit vers.toml:
[machine]
mem_size_mib = 4096
vcpu_count = 2
fs_size_vm_mib = 4096
[rootfs]
name = "default"
[kernel]
name = "default.bin"
vers run --vm-alias rlm
vers connect
Install Dependencies
Inside the VM:
curl -fsSL https://bun.sh/install | bash
source ~/.bashrc
mkdir -p /root/rlm-project && cd /root/rlm-project
bun init -y
bun add ts-rlm
Step 2: Build a Basic RLM
Create basic.ts:
import { RLM, configure, AnthropicAdapter } from "ts-rlm";
configure(
new AnthropicAdapter({
apiKey: process.env.ANTHROPIC_API_KEY,
model: "claude-sonnet-4-20250514",
})
);
const rlm = new RLM({
signature: "document, question -> answer",
maxIterations: 10,
maxLLMCalls: 20,
verbose: true,
});
const document = `
# Q4 2024 Financial Report
## Revenue: $2.5 billion (+15% YoY)
- Cloud Services: $1.2B (48%)
- AI Products: $800M (32%)
- Legacy Software: $500M (20%)
## Regional Breakdown
- North America: $1.5B (60%)
- Europe: $625M (25%)
- Asia Pacific: $375M (15%)
## Key Metrics
- Enterprise customers: 15,000
- Net revenue retention: 125%
- Gross margin: 72%
- Operating margin: 28%
- Employee count: 8,500
- R&D spend: $450M (18% of revenue)
## 2025 Outlook
Projected full-year revenue: $12B
Primary growth drivers: AI market expansion, enterprise cloud adoption
Planned headcount increase: 2,000 new hires
`;
const result = await rlm.forward({
document,
question: "What percentage of revenue comes from the fastest-growing segment?",
});
console.log("Answer:", result.answer);
console.log("Steps:", result.trajectory.length);
Run it:
ANTHROPIC_API_KEY=<your_key> bun basic.ts
With verbose: true, you’ll see the model’s reasoning and code at each step. A typical trajectory:
- Explore: print(document.slice(0, 300)) — sees the structure
- Search: print(document.match(/growth|fastest|YoY/gi)) — finds growth indicators
- Extract: reads the AI Products line, sees 32%
- Verify: calls llmQuery() to confirm AI is the fastest-growing segment
- Answer: FINAL("32% — AI Products is the fastest-growing segment at $800M")
Five steps instead of sending the whole document.
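To see exactly what ran at each step, note that result.trajectory is a plain array; dump an entry rather than assuming its fields, since the entry shape can vary across ts-rlm versions:
// Print the first step of the trajectory to inspect its structure.
console.log(JSON.stringify(result.trajectory[0], null, 2));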
Step 3: Add Custom Tools
Tools let the LLM interact with external systems from inside the REPL. Create with-tools.ts:
import { RLM, configure, AnthropicAdapter, Signature } from "ts-rlm";
configure(
new AnthropicAdapter({
apiKey: process.env.ANTHROPIC_API_KEY,
model: "claude-sonnet-4-20250514",
})
);
const signature = new Signature(
{
query: { desc: "User's question about system data" },
},
{
answer: { desc: "Answer based on retrieved data" },
sources: { desc: "List of data sources consulted", type: "array" },
},
"Answer questions by fetching and analyzing system data."
);
const rlm = new RLM({
signature,
maxIterations: 15,
maxLLMCalls: 10,
verbose: true,
tools: {
// Read files from the VM filesystem
readFile: async (path: string): Promise<string> => {
const proc = Bun.spawnSync(["cat", path]);
return proc.stdout.toString() || `Error: ${proc.stderr.toString()}`;
},
// List directory contents
listDir: async (path: string): Promise<string> => {
const proc = Bun.spawnSync(["ls", "-la", path]);
return proc.stdout.toString() || `Error: ${proc.stderr.toString()}`;
},
// Run a shell command
runCommand: async (cmd: string): Promise<string> => {
const proc = Bun.spawnSync(["bash", "-c", cmd]);
return proc.stdout.toString() || proc.stderr.toString();
},
// Fetch a URL
fetchUrl: async (url: string): Promise<string> => {
const res = await fetch(url);
return await res.text();
},
},
});
const result = await rlm.forward({
query: "What Linux distribution is this system running and how much disk space is available?",
});
console.log("Answer:", result.answer);
console.log("Sources:", result.sources);
ANTHROPIC_API_KEY=<your_key> bun with-tools.ts
The model calls readFile("/etc/os-release") and runCommand("df -h"), then synthesizes the answer. It never needs you to paste system info into the prompt.
The runCommand tool above executes arbitrary shell commands. In production, restrict what commands are allowed or use a dedicated Vers VM per execution (see Step 5).
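One way to do that restriction, as a sketch (the allowlist is illustrative, and the tokenization is deliberately naive, so pipes and quoting are unsupported):
const ALLOWED = new Set(["df", "uname", "cat", "ls", "free"]); // illustrative allowlist

// Drop-in replacement for the runCommand tool above.
const runCommand = async (cmd: string): Promise<string> => {
  const argv = cmd.trim().split(/\s+/); // naive split: no pipes, quoting, or chaining
  if (!ALLOWED.has(argv[0])) {
    return `Error: command "${argv[0]}" is not allowed`;
  }
  // Spawn without a shell, so ";" and "|" are passed as literal arguments.
  const proc = Bun.spawnSync(argv);
  return proc.stdout.toString() || proc.stderr.toString();
};
Spawning without a shell matters: with bash -c, an allowlist on the first token alone would still let "df; rm -rf /" through.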
Step 4: Process Unbounded Data
The real power of RLMs shows with data that doesn’t fit in a context window. Create large-dataset.ts:
import { RLM, configure, AnthropicAdapter } from "ts-rlm";
configure(
new AnthropicAdapter({
apiKey: process.env.ANTHROPIC_API_KEY,
model: "claude-sonnet-4-20250514",
})
);
// Generate a large dataset (simulating real-world data)
const tickets = Array.from({ length: 10000 }, (_, i) => ({
id: i + 1,
status: ["open", "closed", "pending"][Math.floor(Math.random() * 3)],
priority: ["low", "medium", "high", "critical"][Math.floor(Math.random() * 4)],
category: ["billing", "technical", "account", "feature"][Math.floor(Math.random() * 4)],
created: new Date(2024, Math.floor(Math.random() * 12), Math.floor(Math.random() * 28) + 1).toISOString(),
resolution_hours: Math.random() > 0.3 ? Math.floor(Math.random() * 72) + 1 : null,
description: `Ticket ${i + 1}: ${["Cannot login", "Billing error", "Feature request", "Performance issue", "Data export broken"][Math.floor(Math.random() * 5)]}`,
}));
const dataset = JSON.stringify(tickets);
console.log(`Dataset size: ${(dataset.length / 1024).toFixed(0)}KB, ${tickets.length} tickets`);
const rlm = new RLM({
signature: "tickets, question -> answer, methodology",
maxIterations: 15,
maxLLMCalls: 10,
verbose: true,
});
const result = await rlm.forward({
tickets: dataset,
question: "What's the average resolution time for critical tickets by category? Which category has the worst response time?",
});
console.log("\nAnswer:", result.answer);
console.log("Methodology:", result.methodology);
console.log("Steps:", result.trajectory.length);
ANTHROPIC_API_KEY=<your_key> bun large-dataset.ts
The model can’t read 10,000 tickets at once. Instead it works through the steps below (sketched as one snippet after the list):
- Parses the JSON: const data = JSON.parse(tickets)
- Filters: const critical = data.filter(t => t.priority === "critical")
- Groups: const byCategory = Object.groupBy(critical, t => t.category)
- Computes: averages per category using reduce
- Answers with the numbers and methodology
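Put together, the REPL code the model converges on looks roughly like this (note the null filter: unresolved tickets have no resolution time and would poison the averages):
const data = JSON.parse(tickets);
const critical = data.filter((t) => t.priority === "critical" && t.resolution_hours !== null);
const byCategory = Object.groupBy(critical, (t) => t.category);
const averages = Object.fromEntries(
  Object.entries(byCategory).map(([category, items]) => [
    category,
    items.reduce((sum, t) => sum + t.resolution_hours, 0) / items.length,
  ])
);
console.log(averages); // average resolution hours per category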
The LLM touched maybe 2,000 tokens of code and output. The full dataset was 500KB+ but never entered the context window directly — it lived in a JavaScript variable.
Step 5: Isolated Execution with Vers VMs
The examples above run code on whatever machine hosts the interpreter. For production use — especially with untrusted data or tools that make network calls — you want isolation. Vers VMs give you that.
The Pattern
Instead of running the BunInterpreter locally, you can implement a custom CodeInterpreter that executes code inside a Vers VM:
import type { CodeInterpreter, FinalAnswerResult, ToolFunction } from "ts-rlm";
class VersInterpreter implements CodeInterpreter {
private vmId: string;
readonly tools: Record<string, ToolFunction> = {};
constructor(private commitId: string) {
this.vmId = "";
}
async start(): Promise<void> {
// Restore a fresh VM from golden image
const res = await fetch("https://api.vers.sh/api/v1/vm/from_commit", {
method: "POST",
headers: {
"Content-Type": "application/json",
Authorization: `Bearer ${process.env.VERS_API_KEY}`,
},
body: JSON.stringify({ commit_id: this.commitId }),
});
const data = await res.json();
this.vmId = data.vm_id;
}
async execute(code: string, variables?: Record<string, unknown>): Promise<FinalAnswerResult | string | null> {
// Write variables + code to VM, execute via SSH, parse output
// Each iteration runs in the same VM — state persists
const script = `
${variables ? Object.entries(variables).map(([k, v]) => `const ${k} = ${JSON.stringify(v)};`).join("\n") : ""}
${code}
`;
// sshExec and escapeForShell are helpers you must supply (sketched below):
// run the script on the VM over SSH, with the code safely single-quoted.
const output = await sshExec(this.vmId, `cd /root && bun -e '${escapeForShell(script)}'`);
// Parse for FINAL() calls, return appropriately
return output || null;
}
async shutdown(): Promise<void> {
// Delete the VM
await fetch(`https://api.vers.sh/api/v1/vm/${this.vmId}`, {
method: "DELETE",
headers: { Authorization: `Bearer ${process.env.VERS_API_KEY}` },
});
}
}
Pass it to the RLM:
const rlm = new RLM({
signature: "data, question -> answer",
interpreter: new VersInterpreter("<golden-image-commit-id>"),
maxIterations: 10,
});
Now every code execution step runs inside an isolated Vers VM. The model can rm -rf / and it only destroys its own sandbox.
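The sketch above leaves sshExec and escapeForShell abstract. Minimal versions might look like this, assuming the VM is reachable over plain SSH; versSshTarget is hypothetical, since how a vm_id maps to an SSH host depends on your setup:
// Hypothetical: resolve a Vers vm_id to an SSH destination like "root@<vm-ip>".
declare function versSshTarget(vmId: string): string;

async function sshExec(vmId: string, command: string): Promise<string> {
  const proc = Bun.spawn(["ssh", "-o", "StrictHostKeyChecking=no", versSshTarget(vmId), command]);
  return await new Response(proc.stdout).text();
}

function escapeForShell(script: string): string {
  // Make the script safe inside single quotes: close the quote, emit an
  // escaped quote, then reopen ('it's' becomes 'it'\''s').
  return script.replaceAll("'", `'\\''`);
}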
Branching for Parallel Exploration
Vers branching enables a pattern that local interpreters can’t: exploring multiple solution paths simultaneously.
Step 1-5: explore data, narrow down candidates
|
vers branch
/ \
Path A Path B
(approach 1) (approach 2)
\ /
compare results
|
best answer
Branch the VM at a decision point. Both branches have identical state (variables, intermediate results, installed packages). Run different analysis strategies in parallel. Take the better result.
This is especially useful for:
- Ambiguous queries where the right analytical approach isn’t obvious upfront
- Validation — run the same analysis two different ways and compare
- Exploration — one branch does statistical analysis, another does semantic analysis via llmQuery()
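With the VersInterpreter from the previous section, one way to approximate this fan-out is the sketch below: restoring two VMs from the same commit gives both paths identical starting state (true live-VM branching would additionally preserve mid-run state). The questions here are illustrative, and `data` is whatever dataset you loaded earlier:
// Run two analysis strategies in parallel from the same golden image.
const commitId = "<golden-image-commit-id>";
const strategies = [
  "Answer with descriptive statistics over the full dataset.",
  "Answer by running llmQuery() over representative samples.",
];

const [pathA, pathB] = await Promise.all(
  strategies.map((question) => {
    const rlm = new RLM({
      signature: "data, question -> answer",
      interpreter: new VersInterpreter(commitId), // fresh isolated VM per path
      maxIterations: 10,
    });
    return rlm.forward({ data, question });
  })
);

// Compare the results (or hand both to a judge LLM) and keep the better one.
console.log("Path A:", pathA.answer);
console.log("Path B:", pathB.answer);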
How RLMs Compare
| Approach | Context used | Cost | Accuracy on large data |
|---|---|---|---|
| Raw prompting | Entire document | High | Degrades with size |
| RAG | Retrieved chunks | Medium | Misses cross-references |
| RLM | Only code + output | Low | Stays high as data grows |
RLMs trade latency for accuracy and cost. Each iteration is an LLM call (~1-3 seconds), and a typical task takes 5-10 iterations. But the total tokens consumed are much lower than sending the full context, and accuracy stays high because the model is systematically searching rather than scanning.
When to Use RLMs
Good fit:
- Data larger than ~50K tokens where specific information needs to be extracted
- Structured data (JSON, CSV, logs) where code can filter efficiently
- Tasks requiring aggregation, counting, or computation over data
- Multi-step analysis where intermediate results inform next steps
Poor fit:
- Short texts that fit comfortably in context
- Creative tasks (writing, brainstorming) where exploration isn’t needed
- Real-time applications where 10-30 seconds of iteration latency is too slow
- Unstructured prose where code-based search has no advantage over reading
Next Steps
- Browse the ts-rlm examples for more patterns (Zendesk chat, OpenRouter multi-model)
- Read the RLM paper (Zhang, Kraska, Khattab, 2025) for the theoretical foundation
- Try the agent swarms tutorial to run multiple RLMs in parallel across Vers VMs
- Explore the API Reference for programmatic VM management