5.4 · 13 min

Manage context effectively in large codebase exploration

Exploring a large or unfamiliar codebase floods the context window with verbose Read and Grep output, and over a long session the model starts giving inconsistent answers and reasoning from generic 'typical patterns' instead of the specific classes it actually discovered. The reliable fix is to stop treating the context window as memory: externalize findings to scratchpad files, delegate verbose investigation to subagents that return only distilled summaries, summarize between phases, and compact history when it fills. For long-running multi-agent explorations you also design crash recovery by exporting structured agent state to disk and reloading it from a manifest on resume.

Externalized-memory architecture: the main agent stays lean by delegating verbose work to subagents and treating the scratchpad file and state manifest on disk as authoritative memory.

How context degrades in long exploration sessions

A single Claude session has a finite context window, and code exploration is unusually good at filling it: every Grep hit, every full-file Read, and every directory listing lands in the transcript and stays there. As that history grows, two failure modes appear. First is the 'lost in the middle' effect: the model attends most reliably to the start and end of a very long input and is more likely to miss details buried in the middle. Second, and more diagnostic for this task, is context degradation: in an extended session the model starts giving inconsistent answers to the same question and begins referencing 'typical patterns' or how code 'usually' looks, rather than citing the specific classes, files, and functions it discovered earlier in the same session.

That shift from concrete to generic is the exam's tell-tale signature. If an agent that earlier named RefundService.process() starts saying things like 'a typical refund handler would validate the order first,' it has effectively lost the earlier discovery even though the tokens are technically still in the window. Recognizing this symptom is what tells you the problem is context management, not a bad prompt or a weak model.
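A toy check can make the symptom concrete. The sketch below is a rough heuristic under stated assumptions, not a reliable detector. It assumes you already keep a set of concrete identifiers the agent discovered, for example from a scratchpad. The marker phrases in `GENERIC_MARKERS` and the function name `looks_degraded` are illustrative choices, not part of any Claude or Claude Code API.

```python
# Hypothetical marker phrases that often accompany generic, pattern-based
# answers. Illustrative only; tune or replace for a real check.
GENERIC_MARKERS = ("typical", "usually", "in most codebases", "would normally")


def looks_degraded(answer: str, known_identifiers: set[str]) -> bool:
    """Toy heuristic: flag an answer that leans on generic language
    while citing none of the concrete identifiers recorded earlier."""
    text = answer.lower()
    generic = any(marker in text for marker in GENERIC_MARKERS)
    cites_specific = any(ident in answer for ident in known_identifiers)
    return generic and not cites_specific
```

For example, with `known = {"RefundService.process()"}`, the answer "A typical refund handler would validate the order first." is flagged, while "RefundService.process() validates the order." is not.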

The critical corollary is that a bigger context window does not fix this. Moving to a higher-tier model or a larger window buys headroom but does not restore attention quality over a long, noise-heavy transcript. The durable solutions all share one idea: keep the working context lean and move authoritative findings out of the transcript and onto disk or into fresh, well-seeded contexts.

Scratchpad files: treat disk as the agent's memory

A scratchpad file is a plain file (for example NOTES.md or findings/refund-flow.md) where the agent records the key findings it must not lose: which class owns the refund flow, where the auth check lives, the exact file and line of an entry point. Because the file lives outside the conversation, it survives context boundaries: even after history is compacted, summarized, or a new session starts, the durable conclusions are one Read away.

The pattern has two halves that both matter. The agent writes findings as it discovers them, and then it references the scratchpad when answering later questions instead of trusting recollection. This is exactly what counteracts context degradation: rather than asking 'what did we decide about the refund flow?' and getting a vague 'typical pattern' answer, the agent re-reads its own recorded fact. In Claude Code you enable this by instructing the agent (often in CLAUDE.md or the task prompt) to maintain and consult a scratchpad throughout the investigation.

# NOTES.md  (exploration scratchpad)
## Refund flow
- Entry: api/refunds.py -> RefundController.create()
- Core logic: services/refund_service.py RefundService.process()
- Auth gate: middleware/verify_customer.py (runs BEFORE controller)
- Gotcha: async client added 2024-Q3; old sync path still in tests

The scratchpad is authoritative state, not scratch notes to be discarded. Treating it as the source of truth, and re-reading it, is the whole point.
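Here is a minimal sketch of the write-then-re-read discipline. It assumes the scratchpad is a Markdown file with one `## ` heading per topic and one `- ` bullet per finding, as in the NOTES.md example above. The helpers `record_finding` and `read_section` are hypothetical. In Claude Code, the agent itself would typically make these edits with its own file tools, following instructions in CLAUDE.md.

```python
from pathlib import Path

HEADER = "# NOTES.md  (exploration scratchpad)"


def record_finding(path: Path, section: str, finding: str) -> None:
    """Append a finding as a bullet at the end of its '## section',
    creating the file and the heading if they do not exist yet."""
    text = path.read_text(encoding="utf-8") if path.exists() else HEADER + "\n"
    lines = text.splitlines()
    heading = f"## {section}"
    if heading not in lines:
        lines.append(heading)
    # Find the end of this section: the next '## ' heading or end of file.
    end = lines.index(heading) + 1
    while end < len(lines) and not lines[end].startswith("## "):
        end += 1
    lines.insert(end, f"- {finding}")
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")


def read_section(path: Path, section: str) -> list[str]:
    """Re-read a section's findings from disk (never from memory)."""
    lines = path.read_text(encoding="utf-8").splitlines()
    heading = f"## {section}"
    if heading not in lines:
        return []
    findings = []
    for line in lines[lines.index(heading) + 1:]:
        if line.startswith("## "):
            break
        if line.startswith("- "):
            findings.append(line[2:])
    return findings
```

The important property is that `read_section` goes back to disk on every call. A later answer is therefore grounded in the recorded fact, not in whatever survives in the transcript.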

Subagent delegation to isolate verbose output

The single most effective structural technique is to delegate a specific, self-contained investigation to a subagent so the verbose output never touches the main agent's context. When the main agent needs to answer 'find all test files' or 'trace every dependency of the refund flow,' it spawns a subagent (via the Task tool, or Claude Code's Explore subagent) that does the noisy work: dozens of Grep and Read calls that would otherwise bloat the main transcript. The subagent burns its own isolated context on that work and returns only a distilled summary. The main agent stays lean and preserves its role as the high-level coordinator that assembles overall understanding.

This is the same hub-and-spoke idea as multi-agent orchestration (tasks 1.2 and 1.3), applied specifically to protect context during exploration. The Explore subagent is purpose-built for this: it isolates the verbose discovery phase and hands back a summary rather than a raw transcript. Remember that subagents do not inherit the parent's context automatically, so you must pass the specific question and any needed prior findings explicitly in the subagent's prompt.

Main agent (lean, coordinating)
  -> Task: "Find all test files and report their directories and
            the modules they cover. Return a summary table only."
  <- summary (10 lines) instead of 40 file reads

The payoff is that the main agent's window fills with distilled conclusions, not raw file contents, so it can coordinate a large exploration far longer before degrading.
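The isolation boundary can be sketched in a few lines. The sketch assumes a hypothetical `run_subagent` callable that takes one self-contained prompt and returns the subagent's raw transcript plus its summary. It stands in for a Task or Explore subagent call and is not the actual Claude Code or Agent SDK interface. The `Coordinator` class is likewise illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical interface: prompt in, (raw_transcript, distilled_summary) out.
RunSubagent = Callable[[str], tuple[list[str], str]]


@dataclass
class Coordinator:
    run_subagent: RunSubagent
    # The coordinator's working context holds distilled summaries only.
    summaries: list[str] = field(default_factory=list)

    def delegate(self, question: str) -> str:
        # Subagents do not inherit this context, so the prompt must carry the
        # specific question plus any prior findings the subagent needs.
        prompt = "\n".join([
            f"Question: {question}",
            "Prior findings:",
            *(f"- {s}" for s in self.summaries),
            "Return a short summary only, not raw file contents.",
        ])
        _raw_transcript, summary = self.run_subagent(prompt)
        # The raw transcript is discarded at this boundary; only the
        # summary enters the coordinator's context.
        self.summaries.append(summary)
        return summary
```

Two details carry the technique. First, the prompt restates prior findings explicitly, because nothing is inherited. Second, the raw transcript is dropped at the boundary, so only the summary grows the coordinator's context.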

Phase-based exploration with summary injection

Large explorations are naturally multi-phase: map the structure, then trace a specific flow, then plan changes. If you let one long session run through all phases, the early phases' verbose output crowds out the later phases. The disciplined pattern is to summarize the key findings from each phase before starting the next, and inject that summary into the initial context of the next phase (or the next batch of subagents).

Concretely: after a 'map the repository' phase, distill it to a short structural summary ('monorepo, three services, refund logic in service B, tests colocated'), then start the 'trace the refund flow' phase seeded with just that summary rather than the full discovery transcript. Each phase begins with a clean, high-signal context instead of dragging forward the noise of everything that came before. This pairs naturally with scratchpad files: the phase summary is often written to and read from the scratchpad.

This is closely related to the session-management guidance in task 1.7. When prior tool results are stale or simply voluminous, starting a fresh, summary-seeded context is more reliable than resuming and replaying raw output. Here the same logic applies within one investigation, phase to phase, not just across days.
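Here is a sketch of phase-to-phase summary injection. It assumes each phase is a hypothetical callable that takes a seed context string and returns `(raw_transcript, summary)`. A real phase would drive an agent session or a batch of subagents. What matters is what gets carried forward: only the accumulated summaries, never the previous phase's transcript.

```python
from typing import Callable

# Hypothetical phase runner: seed context in, (raw_transcript, summary) out.
PhaseRunner = Callable[[str], tuple[str, str]]


def run_phases(phases: list[tuple[str, PhaseRunner]]) -> list[str]:
    """Run phases in order. Each phase is seeded with the earlier phases'
    summaries only; raw transcripts are dropped as soon as a phase ends."""
    summaries: list[str] = []
    for name, run in phases:
        seed = "\n".join(summaries) or "(no prior findings)"
        _raw_transcript, summary = run(seed)
        summaries.append(f"[{name}] {summary}")
    return summaries
```

The returned list is also a natural thing to write to the scratchpad, so the phase summaries survive a restart as well.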

Using /compact to reclaim context mid-session

Claude Code exposes the /compact command for exactly the situation where an exploration session has filled with verbose discovery output. /compact replaces the accumulated conversation history with a model-generated summary of it, freeing context while preserving the thread of what has been learned and decided. You reach for it when you notice the context filling up (or the early signs of degradation) but you still want to keep going in the same session.

Distinguish /compact from /clear. /clear wipes the conversation entirely and starts clean with no memory of the session, which is right when you are switching to an unrelated task. /compact keeps a summarized carryover, which is right mid-investigation when the details are verbose but the conclusions still matter. Compaction is lossy, so it is a complement to scratchpad files, not a replacement: anything that absolutely must survive belongs in a file, because a summary may drop a specific line number or class name.

A healthy long-session loop looks like: explore, record durable findings to the scratchpad, and when the transcript is heavy with raw output, /compact to shrink it, relying on the scratchpad for anything the summary might have blurred.
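A simulation of this loop shows why the scratchpad and /compact complement each other. It models the policy, not Claude Code internals. `summarize` stands in for the model-generated summary that /compact produces. `budget_chars` is a hypothetical character threshold, not a real Claude Code setting; in a real session you decide when to type /compact.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ExplorationSession:
    """Simulated explore -> record -> compact loop (illustrative only)."""
    summarize: Callable[[list[str]], str]  # stand-in for /compact's summary
    budget_chars: int = 20_000             # hypothetical threshold
    history: list[str] = field(default_factory=list)
    scratchpad: dict[str, str] = field(default_factory=dict)

    def observe(self, tool_output: str, key: Optional[str] = None,
                fact: Optional[str] = None) -> None:
        self.history.append(tool_output)
        if key and fact:
            # Durable finding: stored outside the history, survives compaction.
            self.scratchpad[key] = fact
        if sum(len(h) for h in self.history) > self.budget_chars:
            # The /compact step: lossy replacement of history by a summary.
            self.history = [self.summarize(self.history)]

    def recall(self, key: str) -> Optional[str]:
        # Answer from the scratchpad, not from the (possibly compacted) history.
        return self.scratchpad.get(key)
```

After compaction, `history` no longer contains the raw output that mentioned the auth gate, but `recall("auth")` still returns the exact path, because it was recorded separately.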

Structured state persistence and crash recovery with manifests

Long-running multi-agent explorations need to survive interruptions (a crash, a timeout, an intentional pause) without redoing hours of investigation. The pattern is structured state persistence: each agent exports its state (findings, progress, open questions) to a known location on disk, and the coordinator writes or reads a manifest that indexes those exports. On resume, the coordinator loads the manifest and injects the relevant state back into each agent's prompt, so work continues from the last durable checkpoint rather than from scratch.

The manifest is the recovery contract. It records where each agent's state lives and what phase the work reached, so a restarted coordinator can reconstruct the system deterministically instead of hoping the SDK preserved anything. It does not, by default: agent state is not automatically durable across a crash, and resuming a session does not rebuild multi-agent state for you. You have to design the export-and-reload flow explicitly.

manifest.json  (loaded by the coordinator on resume)
{
  "phase": "trace-dependencies",
  "agents": {
    "structure-map": { "state": "state/structure.json", "status": "complete" },
    "refund-trace":  { "state": "state/refund.json",    "status": "in-progress" }
  }
}

This extends the scratchpad idea from a single agent's notes to a whole coordinated system's recoverable state: everything important lives at a known path, and the coordinator rehydrates from it on resume.
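Here is a minimal sketch of the export-and-reload flow in Python. It assumes a hypothetical on-disk layout: one JSON file per agent under `state/`, and a `manifest.json` that records each agent's state path and status as separate fields. `export_agent_state` and `resume_prompts` are illustrative names, not Agent SDK functions. Writes go through a temp file and `os.replace`, so a crash mid-write leaves the previous checkpoint intact instead of a truncated file.

```python
import json
import os
from pathlib import Path


def _atomic_write_json(path: Path, data: dict) -> None:
    """Write to a temp file, then rename over the target."""
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(data, indent=2), encoding="utf-8")
    os.replace(tmp, path)  # atomic on POSIX when both paths share a filesystem


def export_agent_state(root: Path, agent: str, state: dict,
                       status: str, phase: str) -> None:
    """Export one agent's state to a known location and index it in the manifest."""
    state_path = root / "state" / f"{agent}.json"
    _atomic_write_json(state_path, state)
    manifest_path = root / "manifest.json"
    manifest = (json.loads(manifest_path.read_text(encoding="utf-8"))
                if manifest_path.exists() else {"agents": {}})
    manifest["phase"] = phase
    manifest["agents"][agent] = {
        "state": state_path.relative_to(root).as_posix(),
        "status": status,
    }
    _atomic_write_json(manifest_path, manifest)


def resume_prompts(root: Path) -> dict[str, str]:
    """On resume, load the manifest and build a prompt for every unfinished
    agent, injecting its saved state instead of starting from scratch."""
    manifest = json.loads((root / "manifest.json").read_text(encoding="utf-8"))
    prompts = {}
    for agent, entry in manifest["agents"].items():
        if entry["status"] == "complete":
            continue  # finished work is not redone
        state = json.loads((root / entry["state"]).read_text(encoding="utf-8"))
        prompts[agent] = (
            f"Resume the '{agent}' task in phase '{manifest['phase']}'.\n"
            f"Saved state from before the interruption:\n"
            f"{json.dumps(state, indent=2)}"
        )
    return prompts
```

On restart, the coordinator would call `resume_prompts(root)` and hand each returned prompt to a fresh subagent. Agents marked `complete` are skipped, which is what lets the run continue from the last durable checkpoint.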

Anti-patterns to avoid

avoid

Pushing through one very long exploration session and trusting the model to keep recalling the specific classes and files it found earlier.

Why it fails: Extended sessions degrade: the model starts giving inconsistent answers and reasoning from 'typical patterns' rather than the concrete entities it discovered, so its later answers quietly stop reflecting the actual codebase.

instead Externalize findings to a scratchpad file and re-read it for later questions, delegate verbose investigation to subagents, summarize between phases, and /compact when the transcript fills.

avoid

Reading the entire codebase into the main agent's context up front so it 'has everything.'

Why it fails: This exhausts the context window immediately and triggers lost-in-the-middle effects, so attention on any given detail is worse, not better, and there is no room left for reasoning.

instead Explore incrementally (Grep to find entry points, Read to follow only relevant threads), and push bulk discovery into subagents that return distilled summaries to the coordinator.

avoid

Upgrading to a higher-tier model or larger context window to fix inconsistent, generic answers in a long session.

Why it fails: Degradation is an attention-quality problem over a long noisy transcript, not a capacity problem; a bigger window adds headroom but does not restore reliable recall of earlier specifics.

instead Keep the working context lean by design: scratchpad files as authoritative memory, subagent isolation of verbose output, phase summaries, and /compact.

avoid

Assuming a multi-agent exploration will survive a crash because the SDK or session resumption preserves agent state automatically.

Why it fails: Agent state is not durable across crashes by default and resuming does not reconstruct multi-agent state, so an interruption silently loses hours of investigation.

instead Design explicit crash recovery: each agent exports structured state to a known location and the coordinator loads a manifest on resume, injecting the saved state back into agent prompts.

Worked example: Understanding a legacy monolith to plan test coverage (Developer Productivity scenario)

You are building developer-productivity tooling on the Agent SDK, and an engineer points it at a large, unfamiliar legacy service to understand the refund flow well enough to add tests. This is a long, multi-phase exploration, exactly where context management decides success.

Phase 1, map the structure, isolated in a subagent. Rather than reading dozens of files into the main context, the coordinator spawns an Explore subagent: 'Map this repository. Report the services, where refund logic lives, and where tests are located. Return a summary, not file contents.' The subagent runs the noisy Grep and Read work in its own context and hands back ten lines. The coordinator writes those to a scratchpad:

# NOTES.md
## Structure
- Monorepo, 3 services; refunds in services/billing
- Entry: api/refunds.py -> RefundController.create()
- Core: services/refund_service.py RefundService.process()
- Tests colocated: services/billing/**/*_test.py

Phase 2, trace dependencies, seeded with the summary. For the next phase the coordinator does not replay Phase 1's transcript. It injects only the structural summary and spawns two parallel subagents with specific questions: 'find all test files covering billing' and 'trace every dependency of RefundService.process().' Each returns a distilled summary, which the coordinator appends to NOTES.md.
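The Phase 2 fan-out might look like the sketch below. It assumes a hypothetical `run_subagent` callable that returns only a summary string. The thread pool simply illustrates running the two questions concurrently; it is not how Claude Code schedules parallel subagents internally. `fan_out` and `append_to_notes` are illustrative helpers.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable


def fan_out(seed_summary: str, questions: list[str],
            run_subagent: Callable[[str], str], max_workers: int = 4) -> list[str]:
    """Run one subagent per question in parallel and return their summaries
    in the same order as `questions`."""
    # Subagents share no context, so every prompt repeats the Phase 1 summary.
    prompts = [
        f"Context from the previous phase:\n{seed_summary}\n\n"
        f"Question: {q}\nReturn a distilled summary only."
        for q in questions
    ]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order even though calls run concurrently.
        return list(pool.map(run_subagent, prompts))


def append_to_notes(notes: Path, heading: str, summaries: list[str]) -> None:
    """Append the returned summaries to the scratchpad under a new section."""
    with notes.open("a", encoding="utf-8") as f:
        f.write(f"\n## {heading}\n")
        f.writelines(f"- {s}\n" for s in summaries)
```

Because `pool.map` preserves input order, each summary lines up with its question when it is appended to NOTES.md.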

Catching degradation. Deep into the session, the engineer asks 'where is the auth check again?' and the agent replies 'a refund handler typically verifies the customer first.' That generic 'typically' is the degradation signature. The coordinator re-reads NOTES.md (which records middleware/verify_customer.py) and answers concretely, then runs /compact to shrink the now-bloated transcript while keeping the scratchpad as the source of truth.

Crash recovery. Because this exploration spans hours, each subagent exports its state to state/ and the coordinator maintains manifest.json recording each agent's file and phase. When the machine is rebooted mid-run, the resumed coordinator loads the manifest, sees refund-trace was in progress, injects that saved state into a fresh subagent prompt, and continues, instead of restarting the whole investigation.

The throughline: the main agent's context stays small and high-level, while the authoritative findings live on disk in the scratchpad and the state exports.

Exam tips

✓The signature of context degradation is an agent answering with generic 'typical patterns' instead of the specific classes and files it discovered earlier in the same session; the fix is context management, not a better prompt.
✓A larger context window or higher-tier model does NOT fix degradation; it is an attention-quality problem, so you must keep the working context lean and externalize findings.
✓Scratchpad files (e.g. NOTES.md) are the agent's durable memory: it writes key findings as it goes and re-reads them for later questions, surviving context boundaries and /compact.
✓Delegate verbose investigations ('find all test files', 'trace refund flow dependencies') to subagents (Task tool / Explore subagent) so raw output stays in their isolated context and only a distilled summary returns to the coordinator.
✓Summarize each exploration phase and inject the summary into the next phase's initial context rather than dragging forward the full discovery transcript.
✓/compact compresses the current session's history into a summary to reclaim context mid-investigation (unlike /clear, which wipes it); design crash recovery with per-agent state exports plus a manifest the coordinator loads on resume.

Official exam objectives for 5.4

Knowledge of

Context degradation in extended sessions: models start giving inconsistent answers and referencing "typical patterns" rather than specific classes discovered earlier
The role of scratchpad files for persisting key findings across context boundaries
Subagent delegation for isolating verbose exploration output while the main agent coordinates high-level understanding
Structured state persistence for crash recovery: each agent exports state to a known location, and the coordinator loads a manifest on resume

Skills in

Spawning subagents to investigate specific questions (e.g., "find all test files," "trace refund flow dependencies") while the main agent preserves high-level coordination
Having agents maintain scratchpad files recording key findings, referencing them for subsequent questions to counteract context degradation
Summarizing key findings from one exploration phase before spawning sub-agents for the next phase, injecting summaries into initial context
Designing crash recovery using structured agent state exports (manifests) that the coordinator loads on resume and injects into agent prompts
Using /compact to reduce context usage during extended exploration sessions when context fills with verbose discovery output

Flashcards from this lesson

What is the tell-tale symptom of context degradation in a long exploration session?

The model gives inconsistent answers and reasons from generic 'typical patterns' instead of citing the specific classes and files it discovered earlier in the same session.

Does moving to a larger context window or higher-tier model fix context degradation?

No. It is an attention-quality problem over a long noisy transcript, so you must keep context lean and externalize findings (scratchpad, subagents, summaries, /compact).

How do scratchpad files counteract degradation?

The agent records key findings to a durable file as it goes and re-reads that file for later questions, so authoritative facts survive context boundaries and /compact instead of relying on recollection.

What does /compact do, and how does it differ from /clear?

/compact compresses the current session history into a summary to reclaim context while keeping the thread; /clear wipes the conversation entirely. /compact is lossy, so durable facts still belong in a scratchpad.

How do you design crash recovery for a long multi-agent exploration?

Each agent exports structured state to a known location and the coordinator maintains a manifest; on resume the coordinator loads the manifest and injects the saved state back into agent prompts. State is not durable automatically.

What should you do between exploration phases?

Summarize the key findings from the phase and inject that summary into the next phase's initial context, rather than dragging the full discovery transcript forward.

Why delegate 'trace refund flow dependencies' to a subagent?

The subagent absorbs the verbose Read/Grep output in its own isolated context and returns only a distilled summary, keeping the main agent's context lean for high-level coordination.
