OpenWalrus

Context Compaction

How OpenWalrus compresses conversation history to keep agents running within context limits, without losing identity or key facts.

Long-running agent sessions eventually fill the LLM's context window. Context compaction solves this by summarizing the conversation and replacing history with a dense summary, while preserving agent identity and key decisions.

How compaction triggers

Compaction is agent-initiated. When the agent decides the context is getting long, it calls the compact tool:

{
  "name": "compact",
  "parameters": {}
}

This returns a COMPACT_SENTINEL string that signals the execution loop to begin compaction.
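The sentinel handshake can be sketched as below. `COMPACT_SENTINEL`'s value, `dispatch`, and `is_compaction_requested` are illustrative assumptions, not the actual OpenWalrus definitions:

```rust
// Hypothetical sketch: the compact tool returns a sentinel instead of a
// normal tool result, and the execution loop checks for it.
const COMPACT_SENTINEL: &str = "<<COMPACT_SENTINEL>>"; // assumed value

/// Dispatch a tool call and return its result string.
fn dispatch(tool_name: &str) -> String {
    match tool_name {
        // The compact tool produces the sentinel rather than real output.
        "compact" => COMPACT_SENTINEL.to_string(),
        other => format!("unknown tool: {other}"),
    }
}

/// The loop's check for whether compaction should begin.
fn is_compaction_requested(result: &str) -> bool {
    result.starts_with(COMPACT_SENTINEL)
}

fn main() {
    assert!(is_compaction_requested(&dispatch("compact")));
    assert!(!is_compaction_requested(&dispatch("echo")));
    println!("sentinel detected for compact tool");
}
```

Returning a sentinel keeps the tool interface uniform: the dispatcher stays a plain string-returning function, and only the outer loop needs to know that this particular result means "stop and summarize."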

The compaction flow

  1. Agent calls compact tool
  2. Dispatch returns COMPACT_SENTINEL plus recent journal entries (for continuity)
  3. run_stream() detects the sentinel
  4. Agent::compact() sends the full conversation history to the LLM with a compaction prompt
  5. The LLM produces a dense summary
  6. The summary is stored as a journal entry in memory with a 384-dim vector embedding (all-MiniLM-L6-v2)
  7. History is replaced with a single user message containing the summary
  8. The agent continues with a clean, compact context
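Steps 7 and 8 can be sketched as follows. `Message` and `replace_history` are illustrative stand-ins for OpenWalrus internals; the LLM call and embedding steps (4 through 6) are elided:

```rust
// Minimal sketch of history replacement after compaction.
#[derive(Clone, PartialEq, Debug)]
struct Message {
    role: String,
    content: String,
}

/// Replace the full conversation history with a single user message
/// carrying the compaction summary (step 7 above).
fn replace_history(history: &mut Vec<Message>, summary: &str) {
    history.clear();
    history.push(Message {
        role: "user".to_string(),
        content: format!("[Conversation summary]\n{summary}"),
    });
}

fn main() {
    let mut history = vec![
        Message { role: "user".into(), content: "hi".into() },
        Message { role: "assistant".into(), content: "hello".into() },
    ];
    replace_history(&mut history, "User greeted the agent.");
    assert_eq!(history.len(), 1);
    assert_eq!(history[0].role, "user");
    println!("history compacted to {} message", history.len());
}
```

After this replacement the agent's next turn sees only the summary plus its (unchanged) system prompt, which is why identity preservation in the summary matters.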

Identity preservation

The compaction prompt includes the agent's full system prompt — including any <self>, <identity>, <profile>, and <journal> blocks injected by the memory hook. This ensures the LLM preserves the agent's name, personality traits, and learned user preferences in the summary.
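A sketch of how such a prompt might be assembled; `build_compaction_prompt` and its wording are assumptions, not the actual OpenWalrus implementation:

```rust
// Hypothetical prompt assembly: prepending the full system prompt means
// the <self>, <identity>, <profile>, and <journal> blocks are visible to
// the summarizing LLM, so it can carry them into the summary.
fn build_compaction_prompt(system_prompt: &str, history_text: &str) -> String {
    format!(
        "{system_prompt}\n\n\
         Summarize the conversation below into dense prose. Preserve the \
         agent identity and user profile described above.\n\n\
         {history_text}"
    )
}

fn main() {
    let prompt = build_compaction_prompt("<self>Walrus</self>", "user: hi");
    assert!(prompt.contains("<self>Walrus</self>"));
    println!("identity block survives into the compaction prompt");
}
```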

The compaction prompt

The LLM receives instructions to:

  • Preserve — agent identity, user profile, key decisions and rationale, active tasks and status, important facts and constraints, relevant tool results
  • Omit — greetings and filler, superseded plans, tool calls whose results are already incorporated

The output is dense prose (not bullet points), designed to be self-contained context for the next part of the conversation.
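The preserve/omit rules above might be expressed as a prompt constant along these lines; this paraphrases the rules and is not the verbatim OpenWalrus prompt:

```rust
// Illustrative compaction instructions, paraphrasing the preserve/omit
// rules; the real prompt text may differ.
const COMPACTION_INSTRUCTIONS: &str = "\
Preserve: agent identity, user profile, key decisions and rationale, \
active tasks and status, important facts and constraints, relevant tool results.
Omit: greetings and filler, superseded plans, tool calls whose results \
are already incorporated.
Write dense, self-contained prose. Do not use bullet points.";

fn main() {
    assert!(COMPACTION_INSTRUCTIONS.contains("Preserve:"));
    assert!(COMPACTION_INSTRUCTIONS.contains("Omit:"));
    println!("{COMPACTION_INSTRUCTIONS}");
}
```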

Retrieving past context

Compaction summaries are stored as journal entries with vector embeddings. Use the distill tool to search past sessions by semantic similarity:

{
  "name": "distill",
  "parameters": {
    "query": "authentication implementation decisions",
    "limit": 5
  }
}

This returns past journal summaries ranked by similarity to your query. Combined with remember/relate for extracting durable facts, this gives agents cross-session continuity.
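The ranking behind distill can be sketched as cosine similarity over the stored embeddings. The function names are illustrative (not the OpenWalrus API), and the example uses 2-dimensional vectors for brevity where real entries are 384-dimensional:

```rust
/// Cosine similarity between two embedding vectors.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Return the `limit` journal summaries most similar to the query embedding.
fn distill(entries: &[(String, Vec<f32>)], query: &[f32], limit: usize) -> Vec<String> {
    let mut scored: Vec<(f32, &String)> = entries
        .iter()
        .map(|(text, emb)| (cosine_similarity(emb, query), text))
        .collect();
    // Sort by descending similarity; embeddings are assumed NaN-free.
    scored.sort_by(|a, b| b.0.partial_cmp(&a.0).unwrap());
    scored.into_iter().take(limit).map(|(_, t)| t.clone()).collect()
}

fn main() {
    let entries = vec![
        ("auth implementation decisions".to_string(), vec![1.0, 0.0]),
        ("lunch planning chat".to_string(), vec![0.0, 1.0]),
    ];
    let results = distill(&entries, &[0.9, 0.1], 1);
    assert_eq!(results[0], "auth implementation decisions");
    println!("top match: {}", results[0]);
}
```

In practice the query string would first be embedded with the same model as the stored entries (all-MiniLM-L6-v2), since similarity scores are only meaningful within one embedding space.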

What's next

  • Memory — the graph storage backing journals and entities
  • Hooks — the on_compact hook method
  • Agents — how run_stream handles compaction
