This tutorial walks through building a Recursive Language Model (RLM) — an inference pattern where an LLM writes code to programmatically explore data instead of reading it all at once. We’ll use ts-rlm for the framework and Vers VMs for isolated code execution.
What You’ll Learn
- What RLMs are and why they outperform raw prompting on large contexts
- Building an RLM that analyzes documents through iterative code execution
- Adding custom tools (API calls, databases, file access)
- Using Vers VMs as sandboxed interpreters for safe code execution
Prerequisites
- Vers CLI installed and authenticated
- Bun runtime installed
- An Anthropic or OpenAI API key
The Problem RLMs Solve
LLMs have context windows — 200K tokens for Claude, 128K for GPT-4o. But context windows have two problems:
- Cost: Sending 200K tokens per request is expensive
- Accuracy: Models get worse at finding specific information as context grows (Lost in the Middle)
RLMs solve both. Instead of feeding the entire document to the model, you put the data in a variable and give the model a REPL. The model writes code to search, filter, and extract what it needs — touching only the relevant parts.
Traditional: LLM ← entire 500-page document → answer
RLM: LLM → "print(document.slice(0, 500))" → sees preview
LLM → "print(document.match(/revenue/gi).length)" → 47 matches
LLM → "llmQuery('Summarize: ' + document.slice(12000, 14000))" → sub-summary
LLM → FINAL(answer)
The model explores the data programmatically, calling sub-LLMs only on the slices it needs.
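Conceptually, the inference loop looks like this. A minimal sketch, not ts-rlm's actual internals; generateCode and runInSandbox are hypothetical stand-ins for the LLM call and the REPL:
type StepResult = { final: boolean; answer?: string; text: string };

// Hypothetical: ask the model for the next code snippet, given prior steps.
declare function generateCode(question: string, history: { code: string; output: string }[]): Promise<string>;
// Hypothetical: run the snippet in a sandbox where the data lives in a variable.
declare function runInSandbox(code: string, vars: Record<string, unknown>): Promise<StepResult>;

async function rlmLoop(document: string, question: string, maxIterations = 10): Promise<string> {
  const history: { code: string; output: string }[] = [];
  for (let i = 0; i < maxIterations; i++) {
    const code = await generateCode(question, history);
    const result = await runInSandbox(code, { document });
    if (result.final) return result.answer ?? ""; // the model called FINAL(...)
    history.push({ code, output: result.text }); // only code and output enter context
  }
  throw new Error("No final answer within the iteration budget");
}
The key property: the document itself never enters the model's context, only the code the model writes and the (usually small) output of each step.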
Step 1: Set Up the Project
Create a Vers VM
mkdir rlm-tutorial && cd rlm-tutorial
vers init
Edit vers.toml:
[machine]
mem_size_mib = 4096
vcpu_count = 2
fs_size_vm_mib = 4096
[rootfs]
name = "default"
[kernel]
name = "default.bin"
vers run --vm-alias rlm
vers connect
Install Dependencies
Inside the VM:
curl -fsSL https://bun.sh/install | bash
source ~/.bashrc
mkdir -p /root/rlm-project && cd /root/rlm-project
bun init -y
bun add ts-rlm
Step 2: Build a Basic RLM
Create basic.ts:
import { RLM, configure, AnthropicAdapter } from "ts-rlm";
configure(
new AnthropicAdapter({
apiKey: process.env.ANTHROPIC_API_KEY,
model: "claude-sonnet-4-20250514",
})
);
const rlm = new RLM({
signature: "document, question -> answer",
maxIterations: 10,
maxLLMCalls: 20,
verbose: true,
});
const document = `
# Q4 2024 Financial Report
## Revenue: $2.5 billion (+15% YoY)
- Cloud Services: $1.2B (48%)
- AI Products: $800M (32%)
- Legacy Software: $500M (20%)
## Regional Breakdown
- North America: $1.5B (60%)
- Europe: $625M (25%)
- Asia Pacific: $375M (15%)
## Key Metrics
- Enterprise customers: 15,000
- Net revenue retention: 125%
- Gross margin: 72%
- Operating margin: 28%
- Employee count: 8,500
- R&D spend: $450M (18% of revenue)
## 2025 Outlook
Projected full-year revenue: $12B
Primary growth drivers: AI market expansion, enterprise cloud adoption
Planned headcount increase: 2,000 new hires
`;
const result = await rlm.forward({
document,
question: "What percentage of revenue comes from the fastest-growing segment?",
});
console.log("Answer:", result.answer);
console.log("Steps:", result.trajectory.length);
Run it:
ANTHROPIC_API_KEY=<your_key> bun basic.ts
With verbose: true, you’ll see the model’s reasoning and code at each step. A typical trajectory:
- Explore: print(document.slice(0, 300)) — sees the structure
- Search: print(document.match(/growth|fastest|YoY/gi)) — finds growth indicators
- Extract: reads the AI Products line, sees 32%
- Verify: calls llmQuery() to confirm AI is the fastest-growing segment
- Answer: FINAL("32% — AI Products is the fastest-growing segment at $800M")
Five steps instead of sending the whole document.
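To see exactly what ran at each step, note that result.trajectory is a plain array; dump an entry rather than assuming its fields, since the entry shape can vary across ts-rlm versions:
// Print the first step of the trajectory to inspect its structure.
console.log(JSON.stringify(result.trajectory[0], null, 2));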
Step 3: Add Custom Tools
Tools let the LLM interact with external systems from inside the REPL. Create with-tools.ts:
import { RLM, configure, AnthropicAdapter, Signature } from "ts-rlm";
configure(
new AnthropicAdapter({
apiKey: process.env.ANTHROPIC_API_KEY,
model: "claude-sonnet-4-20250514",
})
);
const signature = new Signature(
{
query: { desc: "User's question about system data" },
},
{
answer: { desc: "Answer based on retrieved data" },
sources: { desc: "List of data sources consulted", type: "array" },
},
"Answer questions by fetching and analyzing system data."
);
const rlm = new RLM({
signature,
maxIterations: 15,
maxLLMCalls: 10,
verbose: true,
tools: {
// Read files from the VM filesystem
readFile: async (path: string): Promise<string> => {
const proc = Bun.spawnSync(["cat", path]);
return proc.stdout.toString() || `Error: ${proc.stderr.toString()}`;
},
// List directory contents
listDir: async (path: string): Promise<string> => {
const proc = Bun.spawnSync(["ls", "-la", path]);
return proc.stdout.toString() || `Error: ${proc.stderr.toString()}`;
},
// Run a shell command
runCommand: async (cmd: string): Promise<string> => {
const proc = Bun.spawnSync(["bash", "-c", cmd]);
return proc.stdout.toString() || proc.stderr.toString();
},
// Fetch a URL
fetchUrl: async (url: string): Promise<string> => {
const res = await fetch(url);
return await res.text();
},
},
});
const result = await rlm.forward({
query: "What Linux distribution is this system running and how much disk space is available?",
});
console.log("Answer:", result.answer);
console.log("Sources:", result.sources);
ANTHROPIC_API_KEY=<your_key> bun with-tools.ts
The model calls readFile("/etc/os-release") and runCommand("df -h"), then synthesizes the answer. It never needs you to paste system info into the prompt.
The runCommand tool above executes arbitrary shell commands. In production, restrict what commands are allowed or use a dedicated Vers VM per execution (see Step 5).
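One way to do that restriction, as a sketch (the allowlist is illustrative, and the tokenization is deliberately naive, so pipes and quoting are unsupported):
const ALLOWED = new Set(["df", "uname", "cat", "ls", "free"]); // illustrative allowlist

// Drop-in replacement for the runCommand tool above.
const runCommand = async (cmd: string): Promise<string> => {
  const argv = cmd.trim().split(/\s+/); // naive split: no pipes, quoting, or chaining
  if (!ALLOWED.has(argv[0])) {
    return `Error: command "${argv[0]}" is not allowed`;
  }
  // Spawn without a shell, so ";" and "|" are passed as literal arguments.
  const proc = Bun.spawnSync(argv);
  return proc.stdout.toString() || proc.stderr.toString();
};
Spawning without a shell matters: with bash -c, an allowlist on the first token alone would still let "df; rm -rf /" through.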
Step 4: Process Unbounded Data
The real power of RLMs shows with data that doesn’t fit in a context window. Create large-dataset.ts:
import { RLM, configure, AnthropicAdapter } from "ts-rlm";
configure(
new AnthropicAdapter({
apiKey: process.env.ANTHROPIC_API_KEY,
model: "claude-sonnet-4-20250514",
})
);
// Generate a large dataset (simulating real-world data)
const tickets = Array.from({ length: 10000 }, (_, i) => ({
id: i + 1,
status: ["open", "closed", "pending"][Math.floor(Math.random() * 3)],
priority: ["low", "medium", "high", "critical"][Math.floor(Math.random() * 4)],
category: ["billing", "technical", "account", "feature"][Math.floor(Math.random() * 4)],
created: new Date(2024, Math.floor(Math.random() * 12), Math.floor(Math.random() * 28) + 1).toISOString(),
resolution_hours: Math.random() > 0.3 ? Math.floor(Math.random() * 72) + 1 : null,
description: `Ticket ${i + 1}: ${["Cannot login", "Billing error", "Feature request", "Performance issue", "Data export broken"][Math.floor(Math.random() * 5)]}`,
}));
const dataset = JSON.stringify(tickets);
console.log(`Dataset size: ${(dataset.length / 1024).toFixed(0)}KB, ${tickets.length} tickets`);
const rlm = new RLM({
signature: "tickets, question -> answer, methodology",
maxIterations: 15,
maxLLMCalls: 10,
verbose: true,
});
const result = await rlm.forward({
tickets: dataset,
question: "What's the average resolution time for critical tickets by category? Which category has the worst response time?",
});
console.log("\nAnswer:", result.answer);
console.log("Methodology:", result.methodology);
console.log("Steps:", result.trajectory.length);
ANTHROPIC_API_KEY=<your_key> bun large-dataset.ts
The model can’t read 10,000 tickets at once. Instead it works through the steps below (sketched as one snippet after the list):
- Parses the JSON: const data = JSON.parse(tickets)
- Filters: const critical = data.filter(t => t.priority === "critical")
- Groups: const byCategory = Object.groupBy(critical, t => t.category)
- Computes: averages per category using reduce
- Answers with the numbers and methodology
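Put together, the REPL code the model converges on looks roughly like this (note the null filter: unresolved tickets have no resolution time and would poison the averages):
const data = JSON.parse(tickets);
const critical = data.filter((t) => t.priority === "critical" && t.resolution_hours !== null);
const byCategory = Object.groupBy(critical, (t) => t.category);
const averages = Object.fromEntries(
  Object.entries(byCategory).map(([category, items]) => [
    category,
    items.reduce((sum, t) => sum + t.resolution_hours, 0) / items.length,
  ])
);
console.log(averages); // average resolution hours per category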
The LLM touched maybe 2,000 tokens of code and output. The full dataset was 500KB+ but never entered the context window directly — it lived in a JavaScript variable.
Step 5: Isolated Execution with Vers VMs
The examples above run code on whatever machine hosts the interpreter. For production use — especially with untrusted data or tools that make network calls — you want isolation. Vers VMs give you that.
The Pattern
Instead of running the BunInterpreter locally, you can implement a custom CodeInterpreter that executes code inside a Vers VM:
import type { CodeInterpreter, FinalAnswerResult, ToolFunction } from "ts-rlm";
class VersInterpreter implements CodeInterpreter {
private vmId: string;
readonly tools: Record<string, ToolFunction> = {};
constructor(private commitId: string) {
this.vmId = "";
}
async start(): Promise<void> {
// Restore a fresh VM from golden image
const res = await fetch("https://api.vers.sh/api/v1/vm/from_commit", {
method: "POST",
headers: {
"Content-Type": "application/json",
Authorization: `Bearer ${process.env.VERS_API_KEY}`,
},
body: JSON.stringify({ commit_id: this.commitId }),
});
const data = await res.json();
this.vmId = data.vm_id;
}
async execute(code: string, variables?: Record<string, unknown>): Promise<FinalAnswerResult | string | null> {
// Write variables + code to VM, execute via SSH, parse output
// Each iteration runs in the same VM — state persists
const script = `
${variables ? Object.entries(variables).map(([k, v]) => `const ${k} = ${JSON.stringify(v)};`).join("\n") : ""}
${code}
`;
// sshExec and escapeForShell are helpers you must supply (sketched below):
// run the script on the VM over SSH, with the code safely single-quoted.
const output = await sshExec(this.vmId, `cd /root && bun -e '${escapeForShell(script)}'`);
// Parse for FINAL() calls, return appropriately
return output || null;
}
async shutdown(): Promise<void> {
// Delete the VM
await fetch(`https://api.vers.sh/api/v1/vm/${this.vmId}`, {
method: "DELETE",
headers: { Authorization: `Bearer ${process.env.VERS_API_KEY}` },
});
}
}
Pass it to the RLM:
const rlm = new RLM({
signature: "data, question -> answer",
interpreter: new VersInterpreter("<golden-image-commit-id>"),
maxIterations: 10,
});
Now every code execution step runs inside an isolated Vers VM. The model can rm -rf / and it only destroys its own sandbox.
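The sketch above leaves sshExec and escapeForShell abstract. Minimal versions might look like this, assuming the VM is reachable over plain SSH; versSshTarget is hypothetical, since how a vm_id maps to an SSH host depends on your setup:
// Hypothetical: resolve a Vers vm_id to an SSH destination like "root@<vm-ip>".
declare function versSshTarget(vmId: string): string;

async function sshExec(vmId: string, command: string): Promise<string> {
  const proc = Bun.spawn(["ssh", "-o", "StrictHostKeyChecking=no", versSshTarget(vmId), command]);
  return await new Response(proc.stdout).text();
}

function escapeForShell(script: string): string {
  // Make the script safe inside single quotes: close the quote, emit an
  // escaped quote, then reopen ('it's' becomes 'it'\''s').
  return script.replaceAll("'", `'\\''`);
}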
Branching for Parallel Exploration
Vers branching enables a pattern that local interpreters can’t: exploring multiple solution paths simultaneously.
Step 1-5: explore data, narrow down candidates
|
vers branch
/ \
Path A Path B
(approach 1) (approach 2)
\ /
compare results
|
best answer
Branch the VM at a decision point. Both branches have identical state (variables, intermediate results, installed packages). Run different analysis strategies in parallel. Take the better result.
This is especially useful for:
- Ambiguous queries where the right analytical approach isn’t obvious upfront
- Validation — run the same analysis two different ways and compare
- Exploration — one branch does statistical analysis, another does semantic analysis via llmQuery()
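With the VersInterpreter from the previous section, one way to approximate this fan-out is the sketch below: restoring two VMs from the same commit gives both paths identical starting state (true live-VM branching would additionally preserve mid-run state). The questions here are illustrative, and `data` is whatever dataset you loaded earlier:
// Run two analysis strategies in parallel from the same golden image.
const commitId = "<golden-image-commit-id>";
const strategies = [
  "Answer with descriptive statistics over the full dataset.",
  "Answer by running llmQuery() over representative samples.",
];

const [pathA, pathB] = await Promise.all(
  strategies.map((question) => {
    const rlm = new RLM({
      signature: "data, question -> answer",
      interpreter: new VersInterpreter(commitId), // fresh isolated VM per path
      maxIterations: 10,
    });
    return rlm.forward({ data, question });
  })
);

// Compare the results (or hand both to a judge LLM) and keep the better one.
console.log("Path A:", pathA.answer);
console.log("Path B:", pathB.answer);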
How RLMs Compare
| Approach | Context used | Cost | Accuracy on large data |
|---|---|---|---|
| Raw prompting | Entire document | High | Degrades with size |
| RAG | Retrieved chunks | Medium | Misses cross-references |
| RLM | Only code + output | Low | Stays high as data grows |
RLMs trade latency for accuracy and cost. Each iteration is an LLM call (~1-3 seconds), and a typical task takes 5-10 iterations. But the total tokens consumed are much lower than sending the full context, and accuracy stays high because the model is systematically searching rather than scanning.
When to Use RLMs
Good fit:
- Data larger than ~50K tokens where specific information needs to be extracted
- Structured data (JSON, CSV, logs) where code can filter efficiently
- Tasks requiring aggregation, counting, or computation over data
- Multi-step analysis where intermediate results inform next steps
Poor fit:
- Short texts that fit comfortably in context
- Creative tasks (writing, brainstorming) where exploration isn’t needed
- Real-time applications where 10-30 seconds of iteration latency is too slow
- Unstructured prose where code-based search has no advantage over reading
Next Steps
- Browse the ts-rlm examples for more patterns (Zendesk chat, OpenRouter multi-model)
- Read the RLM paper (Zhang, Kraska, Khattab, 2025) for the theoretical foundation
- Try the agent swarms tutorial to run multiple RLMs in parallel across Vers VMs
- Explore the API Reference for programmatic VM management