markdown.engineering
Lesson 40

Auto-Memory and the Dream System

How Claude Code silently extracts memories from every session — and consolidates them while you sleep.

01 Overview

Claude Code has two background processes running alongside every conversation that most users never see. The first runs at the end of each query: a forked sub-agent scans the transcript and writes durable facts to disk. The second runs across sessions: a consolidation agent wakes up after enough time and sessions have accumulated, reads the memory files, and rewrites them into something tighter. Together they form a two-layer memory lifecycle — extraction and consolidation — implemented in services/extractMemories/ and services/autoDream/.

Source files covered
services/extractMemories/extractMemories.ts · services/extractMemories/prompts.ts · services/autoDream/autoDream.ts · services/autoDream/config.ts · services/autoDream/consolidationLock.ts · services/autoDream/consolidationPrompt.ts · memdir/memdir.ts · memdir/memoryTypes.ts · memdir/paths.ts · memdir/memoryScan.ts · memdir/findRelevantMemories.ts · memdir/memoryAge.ts
Layer 1 — Per-turn

Extract Memories

After each final response, a forked agent reviews new messages and writes topic files to disk.

Layer 2 — Across sessions

Auto Dream

After 24 h + 5 sessions, a consolidation agent merges, prunes, and re-indexes the memory directory.

At query time

Relevant Recall

A Sonnet side-query selects up to 5 topic files that match the current query and injects them.

02 The Memory Directory

All persistent memory lives under a path computed by getAutoMemPath() in memdir/paths.ts. The resolution order is:

// memdir/paths.ts — resolution order (first defined wins)
1. CLAUDE_COWORK_MEMORY_PATH_OVERRIDE  // env var — Cowork space-scoped mount
2. getSettingsForSource('localSettings').autoMemoryDirectory  // settings.json
3. `~/.claude/projects/<sanitized-git-root>/memory/`  // default

The path is keyed on the canonical git root — all worktrees of the same repo share one memory directory. The function is memoized on getProjectRoot() because render-path callers fire on every tool-use re-render and the underlying logic (four getSettingsForSource calls each involving realpathSync) is not free.

Security note
projectSettings (committed .claude/settings.json) is intentionally excluded from the autoMemoryDirectory override. A malicious repo could otherwise silently redirect Claude's writes to ~/.ssh.
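The precedence above, including the exclusion of committed project settings, can be sketched as a pure function. This is a minimal illustration, not the body of getAutoMemPath(): the `MemPathInputs` shape, `resolveAutoMemPath`, and the `sanitizeGitRoot` rule (replace every non-alphanumeric character with `-`) are assumptions made for the example.

```typescript
// Hypothetical inputs; the real function reads these from the environment and settings.
interface MemPathInputs {
  env: Record<string, string | undefined>;
  localAutoMemoryDirectory?: string;   // from localSettings
  projectAutoMemoryDirectory?: string; // from committed .claude/settings.json (ignored)
  homeDir: string;
  gitRoot: string;                     // canonical git root, shared by all worktrees
}

// Assumed sanitization rule: every non-alphanumeric character becomes '-'.
function sanitizeGitRoot(gitRoot: string): string {
  return gitRoot.replace(/[^A-Za-z0-9]/g, "-");
}

function resolveAutoMemPath(inputs: MemPathInputs): string {
  // 1. Environment override wins.
  const override = inputs.env["CLAUDE_COWORK_MEMORY_PATH_OVERRIDE"];
  if (override) return override;
  // 2. Local (uncommitted) settings come next.
  if (inputs.localAutoMemoryDirectory) return inputs.localAutoMemoryDirectory;
  // projectAutoMemoryDirectory is deliberately never consulted here.
  // 3. Default path keyed on the git root.
  return `${inputs.homeDir}/.claude/projects/${sanitizeGitRoot(inputs.gitRoot)}/memory/`;
}
```

Because the committed-settings value is never read, a repository cannot redirect writes by shipping its own autoMemoryDirectory.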

Directory structure

~/.claude/projects/<slug>/memory/
  MEMORY.md            # index — always injected into system prompt
  user_role.md         # topic files — one memory per file
  feedback_testing.md
  project_auth.md
  .consolidate-lock    # mtime == lastConsolidatedAt
  logs/                 # KAIROS/assistant-mode only: append-only daily logs
    2026/03/2026-03-31.md

MEMORY.md is the index. It is always loaded into the system prompt (truncated at 200 lines / 25 KB). Each line is a short pointer: - [Title](file.md) — one-line hook. The actual content lives in topic files.
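One way to picture the 200-line / 25 KB cap is a truncation pass that stops at whichever limit is reached first. The sketch below is an assumption about how such a cap could work: `truncateIndex` and `utf8Length` are hypothetical names, and "25 KB" is taken to mean 25 * 1024 bytes.

```typescript
const MAX_INDEX_LINES = 200;
const MAX_INDEX_BYTES = 25 * 1024; // assumption: "25 KB" means 25 * 1024 bytes

// UTF-8 byte length without relying on platform encoders.
function utf8Length(s: string): number {
  let bytes = 0;
  for (let i = 0; i < s.length; i++) {
    const cp = s.codePointAt(i) ?? 0;
    if (cp > 0xffff) i++; // skip the second half of a surrogate pair
    bytes += cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
  }
  return bytes;
}

// Keep whole lines only, stopping at the line cap or the byte cap.
function truncateIndex(content: string): string {
  const kept: string[] = [];
  let bytes = 0;
  for (const line of content.split("\n").slice(0, MAX_INDEX_LINES)) {
    const lineBytes = utf8Length(line) + 1; // +1 for the newline
    if (bytes + lineBytes > MAX_INDEX_BYTES) break;
    kept.push(line);
    bytes += lineBytes;
  }
  return kept.join("\n");
}
```

Cutting on line boundaries keeps every surviving pointer intact, which matters because each index line is a self-contained link to a topic file.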

03 The Four Memory Types

Memories are constrained to a closed taxonomy defined in memdir/memoryTypes.ts. The taxonomy exists to prevent content that is derivable from the current project state from polluting the store — code patterns, architecture, and git history are explicitly excluded.

type: user

User Profile

Role, goals, knowledge level, communication preferences. Helps Claude tailor future responses to this specific person.

type: feedback

Behavioral Guidance

Corrections AND confirmations. Stores the rule + Why: + How to apply: so edge cases can be judged, not blindly followed.

type: project

Project Context

Ongoing work, deadlines, incidents. Not derivable from code or git. Decays fast — the "why" line tells future-you if the memory is still load-bearing.

type: reference

External Pointers

Where to find information in external systems: Linear projects, Grafana dashboards, Slack channels.

Each topic file uses YAML frontmatter:

---
name: feedback_no_db_mocks
description: Integration tests must use a real DB — mocks masked a migration failure
type: feedback
---

# Don't mock the database in integration tests

**Why:** Prior incident where mock/prod divergence let a broken migration slip to production.
**How to apply:** Any test that exercises a DB query path must use a real (test) database.
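A reader for this header could look roughly like the following. It is a deliberately small sketch that handles only flat `key: value` lines rather than full YAML, and `parseMemoryHeader` and `MemoryHeader` are hypothetical names, not identifiers from memdir/.

```typescript
const MEMORY_TYPES = ["user", "feedback", "project", "reference"] as const;
type MemoryType = (typeof MEMORY_TYPES)[number];

interface MemoryHeader {
  name: string;
  description: string;
  type: MemoryType;
}

// Parses the leading '---' block; returns null when it is missing or invalid.
function parseMemoryHeader(file: string): MemoryHeader | null {
  const match = /^---\n([\s\S]*?)\n---(?:\n|$)/.exec(file);
  if (!match) return null;
  const fields = new Map<string, string>();
  for (const line of (match[1] ?? "").split("\n")) {
    const idx = line.indexOf(":");
    if (idx > 0) fields.set(line.slice(0, idx).trim(), line.slice(idx + 1).trim());
  }
  const name = fields.get("name");
  const description = fields.get("description");
  const rawType = fields.get("type");
  if (!name || !description || !rawType) return null;
  // The taxonomy is closed: anything outside the four types is rejected.
  const type = MEMORY_TYPES.find(t => t === rawType);
  return type ? { name, description, type } : null;
}
```

Rejecting unknown types at parse time is one way to keep the taxonomy closed in practice.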
What NOT to save
Code patterns, architecture, file paths, git history, debugging fix recipes, or anything already in CLAUDE.md. These exclusions apply even when the user explicitly asks. The rationale: code is always live-readable; memories of it go stale and generate false authority.
04 Layer 1 — Extract Memories

At the end of every query loop (when the model produces a final response with no tool calls), handleStopHooks fires executeExtractMemories() from services/extractMemories/extractMemories.ts. It uses the forked agent pattern — a perfect copy of the main conversation that shares the parent's prompt cache.

Closure-scoped state

All mutable state lives inside initExtractMemories() — not at module level. This is the same pattern used by confidenceRating.ts. Tests call initExtractMemories() in beforeEach to get a fresh closure. The key state variables are:

let lastMemoryMessageUuid: string | undefined  // cursor — only new messages since last run
let inProgress: boolean                          // overlap guard
let pendingContext: ...                           // stash for trailing run
let turnsSinceLastExtraction: number             // throttle counter

Overlap coalescing

If an extraction is already running when a new turn completes, the new context is stashed in pendingContext. When the running extraction finishes, it fires one trailing run using the stashed context — so no extraction is lost, but runs never overlap. Only the latest stashed context matters (it has the most messages), so repeated coalescing overwrites the stash.
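The coalescing behaviour described above can be sketched with two closure variables. This is an illustration under assumptions: `createCoalescingRunner` is a hypothetical name, and the loop keeps draining the stash if yet another context arrives during the trailing run, which is a slight generalization of the single trailing run described in the text.

```typescript
function createCoalescingRunner<C>(extract: (context: C) => Promise<void>) {
  let inProgress = false;          // overlap guard
  let pendingContext: C | undefined; // stash for the trailing run

  return async function run(context: C): Promise<void> {
    if (inProgress) {
      pendingContext = context; // latest stash wins; earlier ones are overwritten
      return;
    }
    inProgress = true;
    try {
      let next: C | undefined = context;
      while (next !== undefined) {
        await extract(next);
        next = pendingContext;    // trailing run, if something was stashed
        pendingContext = undefined;
      }
    } finally {
      inProgress = false;
    }
  };
}
```

Overwriting the stash is safe here because, as noted above, the newest context contains the most messages.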

Mutual exclusion with the main agent

The main agent's system prompt always includes full memory-save instructions. When the main agent writes to a memory path itself (detected via hasMemoryWritesSince()), the forked extraction is skipped entirely and the cursor advances past that range, so the two paths never double-write the same turn.

// extractMemories.ts — mutual exclusion check
if (hasMemoryWritesSince(messages, lastMemoryMessageUuid)) {
  // advance cursor, log event, return early
  logEvent('tengu_extract_memories_skipped_direct_write', { message_count })
  return
}
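A detection pass of this kind might look like the sketch below. The message and block shapes, the tool names "Edit" and "Write", and the injected `isAutoMemPath` predicate are assumptions for illustration; the function is named `detectMemoryWrites` to make clear it is a stand-in for hasMemoryWritesSince(), not its source.

```typescript
interface ToolUseBlock { type: "tool_use"; name: string; input: { file_path?: string }; }
interface TextBlock { type: "text"; text: string; }
interface Message {
  uuid: string;
  role: "user" | "assistant";
  content: Array<ToolUseBlock | TextBlock>;
}

const WRITE_TOOLS = new Set(["Edit", "Write"]); // assumed tool names

function detectMemoryWrites(
  messages: Message[],
  sinceUuid: string | undefined,
  isAutoMemPath: (path: string) => boolean,
): boolean {
  // Start just after the cursor. With no cursor (or an unknown one), scan everything.
  const start = sinceUuid === undefined
    ? 0
    : messages.findIndex(m => m.uuid === sinceUuid) + 1;
  return messages.slice(start).some(m =>
    m.role === "assistant" &&
    m.content.some(b =>
      b.type === "tool_use" &&
      WRITE_TOOLS.has(b.name) &&
      b.input.file_path !== undefined &&
      isAutoMemPath(b.input.file_path)));
}
```

Only writes that target the memory directory count, so ordinary code edits by the main agent do not suppress extraction.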

The extraction flow

flowchart TD
  A["handleStopHooks fires\nexecuteExtractMemories()"] --> B{"inProgress?"}
  B -->|"yes"| C["Stash context in pendingContext\nlogEvent coalesced"]
  B -->|"no"| D{"hasMemoryWritesSince\n(main agent wrote)?"}
  D -->|"yes"| E["Advance cursor\nSkip fork: main agent handled it"]
  D -->|"no"| F["scanMemoryFiles()\nBuild manifest of existing files"]
  F --> G["buildExtractAutoOnlyPrompt()\nor buildExtractCombinedPrompt()"]
  G --> H["runForkedAgent()\nmaxTurns=5, skipTranscript=true"]
  H --> I["Turn 1: parallel Read calls\nfor files to update"]
  I --> J["Turn 2: parallel Write/Edit calls"]
  J --> K["Advance lastMemoryMessageUuid cursor"]
  K --> L["extractWrittenPaths() from messages"]
  L --> M{"pendingContext\nstashed?"}
  M -->|"yes"| N["Run trailing extraction\n(isTrailingRun=true, skip throttle)"]
  M -->|"no"| O["appendSystemMessage:\n'Saved N memories'"]

Tool permissions for the forked agent

createAutoMemCanUseTool() returns a permission function shared by both extractMemories and autoDream. It enforces a strict allow-list:

// extractMemories.ts — createAutoMemCanUseTool()
// Condensed excerpt: GREP, GLOB, EDIT, and WRITE abbreviate the matching tool.name comparisons
if (tool.name === FILE_READ_TOOL_NAME || GREP || GLOB) return allow()
if (tool.name === BASH_TOOL_NAME && tool.isReadOnly(parsed.data)) return allow()
if ((EDIT || WRITE) && isAutoMemPath(input.file_path)) return allow()
return denyAutoMemTool(tool, reason) // logs + fires analytics event
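Spelled out in full, an allow-list of this shape could be written as follows. Everything here is a hedged sketch: the tool names, the `ToolCall` and `Decision` shapes, the `isReadOnly` flag standing in for tool.isReadOnly(parsed.data), and the path check are assumptions, not the contents of createAutoMemCanUseTool().

```typescript
type Decision = { behavior: "allow" } | { behavior: "deny"; reason: string };

interface ToolCall {
  name: string;
  input: { file_path?: string; command?: string };
  isReadOnly?: boolean; // stand-in for tool.isReadOnly(parsed.data)
}

const READ_TOOLS = new Set(["Read", "Grep", "Glob"]); // assumed tool names

function makeCanUseTool(memoryDir: string) {
  const root = memoryDir.endsWith("/") ? memoryDir : memoryDir + "/";
  // Reject any '..' segment, then require the memory directory as a prefix.
  const isAutoMemPath = (p: string) =>
    !p.split("/").some(segment => segment === "..") && p.startsWith(root);

  return (call: ToolCall): Decision => {
    if (READ_TOOLS.has(call.name)) return { behavior: "allow" };
    if (call.name === "Bash" && call.isReadOnly === true) return { behavior: "allow" };
    if ((call.name === "Edit" || call.name === "Write") &&
        call.input.file_path !== undefined &&
        isAutoMemPath(call.input.file_path)) {
      return { behavior: "allow" };
    }
    return { behavior: "deny", reason: `${call.name} is not permitted for memory agents` };
  };
}
```

The trailing slash on `root` prevents a sibling directory such as `memory-evil/` from passing the prefix check.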

The extraction prompt

The prompt sent to the forked agent (from prompts.ts) has a deliberate efficiency instruction that maps directly to the turn budget:

Prompt design — two-turn strategy
"Turn 1 — issue all Read calls in parallel for every file you might update. Turn 2 — issue all Write/Edit calls in parallel. Do not interleave reads and writes across multiple turns." This hard limit on interleaving is why maxTurns: 5 is sufficient for well-behaved runs.

The prompt also pre-injects the memory directory manifest (from scanMemoryFiles()) so the agent does not spend a turn on ls. The manifest format:

// memoryScan.ts — formatMemoryManifest()
// Output: one line per file
"- [feedback] feedback_no_db_mocks.md (2026-03-30T12:00:00Z): Integration tests must use a real DB"
05 Layer 2 — Auto Dream (Consolidation)

The dream system runs the same forked agent pattern but on a much longer cadence. It fires at the end of a query loop (via executeAutoDream()) only when three gates pass in order, cheapest first:

Gate 1 — Cheapest

Time Gate

Hours since lastConsolidatedAt ≥ minHours (default: 24). Cost: one stat() call per turn.

Gate 2

Session Gate

Count of transcript files with mtime > lastConsolidatedAt ≥ minSessions (default: 5). Cost: one directory scan, throttled to once per 10 minutes.

Gate 3

Lock Gate

No other process mid-consolidation. Implemented via a PID file whose mtime is the timestamp.
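The cheapest-first ordering can be sketched by passing the expensive checks in as callbacks, so they are only invoked once the cheaper gates have passed. The names `evaluateDreamGates`, `DreamGateInputs`, and the result strings are hypothetical; only the defaults of 24 hours and 5 sessions come from the text above.

```typescript
interface DreamGateInputs {
  nowMs: number;
  lastConsolidatedAtMs: number;                    // mtime of the lock file
  countSessionsSince: (sinceMs: number) => number; // directory scan (more expensive)
  tryAcquireLock: () => boolean;                   // PID lock (last)
  minHours?: number;
  minSessions?: number;
}

type GateResult = "blocked:time" | "blocked:sessions" | "blocked:lock" | "run";

function evaluateDreamGates(g: DreamGateInputs): GateResult {
  const minHours = g.minHours ?? 24;
  const minSessions = g.minSessions ?? 5;
  const hoursSince = (g.nowMs - g.lastConsolidatedAtMs) / (60 * 60 * 1000);
  if (hoursSince < minHours) return "blocked:time";
  if (g.countSessionsSince(g.lastConsolidatedAtMs) < minSessions) return "blocked:sessions";
  if (!g.tryAcquireLock()) return "blocked:lock";
  return "run";
}
```

On most turns the time gate fails, so neither the directory scan nor the lock attempt runs at all.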

The lock file design

The lock file is .consolidate-lock inside the memory directory. Its design is elegant: the body is the holder's PID; the mtime of the file IS lastConsolidatedAt. This means reading "when did we last consolidate?" costs exactly one stat().

// consolidationLock.ts — acquire sequence
const [s, raw] = await Promise.all([stat(path), readFile(path, 'utf8')])
const holderPid = parseInt(raw, 10)
const isStale = Date.now() - s.mtimeMs > 60 * 60 * 1000 // older than 1h
// If holder PID is still running AND lock isn't stale (>1h) → bail
if (!isStale && isProcessRunning(holderPid)) return null
// Otherwise reclaim: write our PID, verify we won the race
await writeFile(path, String(process.pid))
const verify = await readFile(path, 'utf8')
if (parseInt(verify, 10) !== process.pid) return null // lost the race

On failure, rollbackConsolidationLock(priorMtime) rewinds the mtime via utimes(), restoring the pre-acquire timestamp. On crash, the dead PID is detectable and the next process reclaims the lock.
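The mtime rewind can be demonstrated with Node's synchronous fs API. This is a sketch under assumptions: `acquireForDemo` and `rollbackForDemo` are hypothetical helpers, the lock file is assumed to exist already, and the real code uses the async API and additional race checks.

```typescript
import { mkdtempSync, statSync, utimesSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Takes the lock and returns the mtime (ms) that was in place beforehand.
function acquireForDemo(lockPath: string): number {
  const priorMtimeMs = statSync(lockPath).mtimeMs;
  writeFileSync(lockPath, String(process.pid)); // body = holder PID, mtime = now
  return priorMtimeMs;
}

// Rewinds the mtime so a failed attempt does not count as a consolidation.
function rollbackForDemo(lockPath: string, priorMtimeMs: number): void {
  const prior = new Date(priorMtimeMs);
  utimesSync(lockPath, prior, prior); // arguments: path, atime, mtime
}

// Example setup: a temporary lock file to experiment with.
function makeDemoLock(): string {
  const dir = mkdtempSync(join(tmpdir(), "dream-lock-"));
  const lockPath = join(dir, ".consolidate-lock");
  writeFileSync(lockPath, "0");
  return lockPath;
}
```

Since acquiring the lock bumps the mtime to "now", skipping the rollback after a failure would silently postpone the next consolidation by a full time-gate interval.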

The consolidation prompt — four phases

The dream prompt (from consolidationPrompt.ts) divides work into four phases. The agent runs as a forked agent with the same createAutoMemCanUseTool() permissions:

Phase 1

Orient

ls the memory dir, read MEMORY.md, skim topic files to avoid creating duplicates.

Phase 2

Gather recent signal

Check daily logs (KAIROS mode), grep transcripts narrowly for specific context. Transcripts are large JSONL — never read whole files.

Phase 3

Consolidate

Write or update topic files. Merge near-duplicates. Convert relative dates ("yesterday") to absolute dates so memories remain interpretable.

Phase 4

Prune and index

Update MEMORY.md — keep it under 200 lines / 25 KB. Remove stale pointers. Demote verbose index lines into topic files.

Dream progress tracking

Unlike the per-turn extractor, the dream agent registers a DreamTask in the app state, allowing the UI to show live progress. makeDreamProgressWatcher() streams each assistant message from the fork, extracts text blocks for display, and collects file paths from Edit/Write tool calls for the completion summary.

// autoDream.ts — progress watcher
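function makeDreamProgressWatcher(taskId, setAppState) {
  return msg => {
    // Accumulators for this message (declarations are elided in the excerpt)
    let text = ''
    let toolUseCount = 0
    const touchedPaths = []
    for (const block of msg.message.content) {
      if (block.type === 'text') text += block.text
      if (block.type === 'tool_use') {
        toolUseCount++
        // EDIT / WRITE abbreviate the check for an Edit or Write tool call
        if (EDIT || WRITE) touchedPaths.push(block.input.file_path)
      }
    }
    addDreamTurn(taskId, { text, toolUseCount }, touchedPaths, setAppState)
  }
}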
06 Recall — findRelevantMemories

At query time, Claude Code does not inject all memory files into context — that would balloon the prompt with stale or irrelevant facts. Instead, findRelevantMemories() in memdir/findRelevantMemories.ts uses a Sonnet side-query to select up to five topic files relevant to the current query.

The two-step recall pipeline

// findRelevantMemories.ts
const memories = (await scanMemoryFiles(memoryDir, signal))
  .filter(m => !alreadySurfaced.has(m.filePath))

const selectedFilenames = await selectRelevantMemories(
  query, memories, signal, recentTools
)
// sideQuery → Sonnet, max_tokens: 256, JSON schema output
// Returns: { selected_memories: string[] }

The selector system prompt has a notable exception: if the model is actively using a tool (e.g. mcp__X__spawn is in recentTools), that tool's reference documentation memory is suppressed — the conversation already contains working usage, and surfacing docs is noise. Warnings and gotchas about those tools are still surfaced.
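Because the selector is a model call, its output is worth treating as untrusted before files are injected. The sketch below shows one plausible post-processing step; whether findRelevantMemories() validates results this way is an assumption, and `sanitizeSelection` is a hypothetical helper.

```typescript
const MAX_SELECTED = 5; // "up to five topic files"

// Keeps only known filenames, removes duplicates, and enforces the cap.
function sanitizeSelection(selected: string[], candidates: string[]): string[] {
  const known = new Set(candidates);
  const out: string[] = [];
  for (const name of selected) {
    if (known.has(name) && out.indexOf(name) === -1) out.push(name);
    if (out.length === MAX_SELECTED) break;
  }
  return out;
}
```

For example, a response naming a file that does not exist in the scanned manifest would simply be dropped rather than causing a failed read.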

Staleness signals

memdir/memoryAge.ts computes human-readable age strings and injects a staleness caveat when a memory file is more than one day old:

// memoryAge.ts — freshnessText for memories > 1 day old
"This memory is 47 days old. Memories are point-in-time observations, not live state —
claims about code behavior or file:line citations may be outdated.
Verify against current code before asserting as fact."
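An age computation along these lines could produce the caveat. This is a sketch, not the contents of memoryAge.ts: `stalenessCaveat` is a hypothetical name, whole-day rounding is an assumption, and the returned text is shortened from the message quoted above.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Returns a caveat for memories more than one day old, otherwise null.
function stalenessCaveat(mtimeMs: number, nowMs: number): string | null {
  const ageMs = nowMs - mtimeMs;
  if (ageMs <= DAY_MS) return null;
  const days = Math.floor(ageMs / DAY_MS); // assumption: whole days, rounded down
  const unit = days === 1 ? "day" : "days";
  return `This memory is ${days} ${unit} old. ` +
    "Verify against current code before asserting as fact.";
}
```

Passing `nowMs` explicitly keeps the function deterministic, which makes the threshold easy to test.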
Drift caveat (from memoryTypes.ts)
The system prompt includes: "A memory that names a specific function, file, or flag is a claim that it existed when the memory was written. It may have been renamed, removed, or never merged. Before recommending it: check the file exists; grep for the function." The section header is "Before recommending from memory". It scored better in evals than "Trusting what you recall" because it triggers at the decision point rather than abstractly.
07 Full Memory Lifecycle
flowchart LR
  subgraph "During conversation"
    A["User query"] --> B["Main agent responds"]
    B --> C["handleStopHooks"]
    C --> D["executeExtractMemories\n(fire-and-forget)"]
    C --> E["executeAutoDream\n(fire-and-forget)"]
  end
  subgraph "Extract Memories fork (maxTurns=5)"
    D --> F["countModelVisibleMessagesSince\ncursor check"]
    F --> G["scanMemoryFiles → manifest"]
    G --> H["runForkedAgent\nbuildExtractAutoOnlyPrompt"]
    H --> I["Read existing files\n(turn 1, parallel)"]
    I --> J["Write/Edit topic files\n(turn 2, parallel)"]
    J --> K["Advance cursor\nappendSystemMessage"]
  end
  subgraph "Auto Dream fork (conditional)"
    E --> L{"Time gate\n≥24h?"}
    L -->|"no"| M["Return"]
    L -->|"yes"| N{"Session gate\n≥5 sessions?"}
    N -->|"no"| M
    N -->|"yes"| O{"Lock\navailable?"}
    O -->|"no"| M
    O -->|"yes"| P["runForkedAgent\nbuildConsolidationPrompt\n4-phase dream"]
    P --> Q["completeDreamTask\nappendSystemMessage 'Improved N memories'"]
  end
  subgraph "Memory directory"
    K --> R[("~/.claude/projects/slug/memory/\nMEMORY.md + topic files")]
    Q --> R
  end
  subgraph "At query time"
    S["New user query"] --> T["findRelevantMemories\nscanMemoryFiles → Sonnet selector"]
    R --> T
    T --> U["Inject up to 5 topic files\n+ staleness caveat if >1 day"]
    U --> S
  end
08 Team Memory (TEAMMEM flag)

When the TEAMMEM feature flag is enabled, the memory system gains a second directory alongside the private one. The combined mode uses buildExtractCombinedPrompt() which adds <scope> tags to each memory type block. The routing guidance is baked into each type definition rather than a separate routing section.

Scope rules per type
  • user — always private. Personal profile should never be shared.
  • feedback — default private; team only when the guidance is a project-wide convention (testing policy, build invariant), not a personal style preference.
  • project — private or team, strongly bias toward team. Context behind shared work belongs in the shared store.
  • reference — usually team. External system pointers are useful to all collaborators.

Sensitive data (API keys, credentials) must never be saved to team memories — the combined extraction prompt includes an explicit prohibition.

09 KAIROS / Assistant Mode

In long-lived assistant sessions (feature flag KAIROS), the memory model shifts from a live index to an append-only daily log. The agent writes to logs/YYYY/MM/YYYY-MM-DD.md as it works, rather than maintaining MEMORY.md directly. A separate nightly /dream skill distills the logs into topic files and updates MEMORY.md.

The log path pattern is stored in the prompt without today's literal date: the prompt is cached by systemPromptSection('memory', ...) and must not be invalidated at midnight rollover. The model derives the current date from a date_change attachment that is appended when the date rolls over, not from the prompt itself.
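The split between a date-free pattern in the prompt and a concrete path at write time can be sketched like this. `LOG_PATH_PATTERN` and `dailyLogPath` are hypothetical names, and using UTC calendar fields is an assumption; the actual timezone handling is not stated in the lesson.

```typescript
// What the cached prompt can safely contain: a pattern with no literal date.
const LOG_PATH_PATTERN = "logs/YYYY/MM/YYYY-MM-DD.md";

const pad2 = (n: number): string => (n < 10 ? "0" + n : String(n));

// Resolved only when a log entry is written, using the current date.
function dailyLogPath(date: Date): string {
  const yyyy = String(date.getUTCFullYear());
  const mm = pad2(date.getUTCMonth() + 1); // months are zero-based
  const dd = pad2(date.getUTCDate());
  return LOG_PATH_PATTERN
    .replace(/YYYY/g, yyyy)
    .replace(/MM/g, mm)
    .replace(/DD/g, dd);
}
```

Because the pattern string never changes, the prompt section that embeds it stays byte-identical from one day to the next.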

10 Cache Efficiency

The forked agent pattern is chosen precisely because it shares the parent's prompt cache. The forked agent gets the same system prompt and message history prefix as the main conversation, so those tokens are already cached — the fork only pays for new tokens past the cache boundary.

The extraction code logs cache metrics on every run:

// extractMemories.ts — cache hit logging
const hitPct = ((cache_read_tokens / totalInput) * 100).toFixed(1)
logForDebugging(`[extractMemories] cache: read=${cache_read} create=${cache_create} (${hitPct}% hit)`)

The tool list for the forked agent must match the main agent's tool list for cache sharing to work — the tools are part of the cache key. This is why createAutoMemCanUseTool() uses runtime permission denial rather than providing a different tool list to the fork.
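The hit percentage in that log line depends on how the total is defined. The sketch below assumes the total is the sum of uncached input tokens, cache reads, and cache writes, using the usage field names from the Anthropic Messages API; `cacheHitPct` is a hypothetical helper, and the zero guard is an addition for the example.

```typescript
interface Usage {
  input_tokens: number;                // uncached input tokens
  cache_read_input_tokens: number;     // tokens served from the prompt cache
  cache_creation_input_tokens: number; // tokens written to the cache
}

// Percentage of input tokens served from cache, formatted to one decimal place.
function cacheHitPct(usage: Usage): string {
  const totalInput =
    usage.input_tokens +
    usage.cache_read_input_tokens +
    usage.cache_creation_input_tokens;
  if (totalInput === 0) return "0.0"; // avoid NaN on an empty request
  return ((usage.cache_read_input_tokens / totalInput) * 100).toFixed(1);
}
```

A fork that shares the parent's prefix should report a high percentage here; a sudden drop would suggest that something in the cache key, such as the tool list, has diverged.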

Key Takeaways

  • Extraction runs at the end of every query turn as a forked sub-agent — the fork shares the prompt cache, so it's cheap. The cursor (lastMemoryMessageUuid) ensures only new messages are considered each run.
  • The lock file's mtime is the lastConsolidatedAt timestamp — no separate metadata store. Reading when we last consolidated costs exactly one stat().
  • The four-type taxonomy (user, feedback, project, reference) exists to keep memory stores free of content that is derivable from live code — memories should capture context that cannot be re-derived.
  • Recall is selective: a Sonnet side-query with a 256-token budget picks up to 5 relevant topic files per query, suppressing reference docs for tools currently in active use.
  • The "Before recommending from memory" section header in the system prompt outperformed abstract variants in evals: framing the rule at the decision point matters for compliance.
  • Team memory scopes are enforced via prompt guidance, not filesystem permissions — the prompt's <scope> tags are the routing mechanism.

Knowledge Check

1. What technique does the memory extraction system use to ensure the forked agent can share the parent's prompt cache?
Answer: The tool list is part of the prompt cache key, so giving the fork a different tool list would break cache sharing. Instead, the fork uses the same tools as the parent, and createAutoMemCanUseTool() returns a function that denies disallowed operations at runtime.
2. The .consolidate-lock file stores the last consolidation timestamp. Where exactly?
Answer: In the file's mtime, not its body or a separate file. The mtime of .consolidate-lock is lastConsolidatedAt; the body just holds the PID. Reading the last consolidation time costs exactly one stat() call with no parsing required.
3. Why are code patterns, architecture, and git history explicitly excluded from the memory taxonomy?
Answer: The core principle is to save context that cannot be re-derived. Code is always live-readable via grep/git. Memories of it become stale and generate assertions with false authority, such as citations to functions that were renamed or removed.
4. When does the per-turn extraction agent skip running and advance the cursor instead?
Answer: When the main agent has already written to memory in the new messages. hasMemoryWritesSince() scans assistant messages for Edit/Write tool calls targeting an auto-memory path. If one is found, the fork is skipped and the cursor advances, so the two paths never double-write the same turn.
5. The recall system asks Sonnet to select relevant memories. What does it suppress from selection, and why?
Answer: Reference/API documentation for tools currently in active use. If recentTools contains a tool the model is actively exercising, its documentation is not surfaced because the live conversation already shows working usage. Warnings and gotchas about those tools are still included.