FAQ

Questions, answered

Logistics and the concepts that most often trip candidates up.

Logistics

How is the exam scored, and what score do I need to pass?

Your result is reported as a scaled score between 100 and 1,000. The minimum passing score is 720. The scale is not a raw percentage, so 720 does not mean you answered 72 percent of questions correctly. Scaled scoring lets Anthropic account for slight difficulty differences between exam forms, so treat 720 as the bar to clear rather than a fixed number of questions.

How many scenarios appear on my exam, and how many exist in total?

The exam draws from six defined scenarios, but any single sitting presents only four of them. You will not know in advance which four you get, so you should prepare across all six: Customer Support Resolution Agent, Code Generation with Claude Code, Multi-Agent Research System, Developer Productivity with Claude, Claude Code for CI, and Structured Data Extraction.

What is the question format?

Every question is multiple choice with exactly one correct response and three distractors. There are no multiple-select, fill-in-the-blank, or drag-and-drop items. The distractors are usually plausible-sounding but wrong for a specific reason, such as relying on probabilistic compliance where deterministic enforcement is required, or over-engineering a solution when a simpler fix addresses the root cause.

Is there a penalty for guessing?

No. There is no penalty for a wrong answer, and unanswered questions are scored as incorrect anyway. That means you should answer every question, even ones you are unsure about. Leaving a question blank can only hurt you, while a guess gives you a chance at credit at no additional cost.

What version of the exam is this, and does that matter?

This is version 0.1 of the Claude Certified Architect, Foundations exam, last updated Feb 10 2025. Being an early version means the blueprint, scenarios, and sample questions reflect current products (Claude API, Agent SDK, Claude Code, MCP) as of that date. Study the concepts as they are described in the blueprint rather than relying on older blog posts or features that may have changed.

Who is this exam for, and what background is assumed?

It targets a solution architect who designs and implements production applications with Claude and has roughly six or more months of hands-on experience across the Claude API, the Agent SDK, Claude Code, and MCP. The questions assume you have actually built things, so they test judgment about which pattern fits a situation rather than rote recall of definitions.

How is the exam weighted across domains?

Five domains carry different weights: Agentic Architecture and Orchestration (27 percent), Claude Code Configuration and Workflows (20 percent), Prompt Engineering and Structured Output (20 percent), Tool Design and MCP Integration (18 percent), and Context Management and Reliability (15 percent). Agentic architecture is the heaviest single area, so budget your study time accordingly.

Concepts

When should I enforce a rule with a hook versus writing it as a prompt instruction?

Use a hook (or a programmatic prerequisite gate) whenever compliance must be guaranteed, such as verifying a customer identity before a refund or blocking refunds above a threshold. Prompt instructions like 'always verify first' rely on the model choosing to comply, which has a non-zero failure rate. A hook intercepts the tool call deterministically and cannot be talked around. Prompts are fine for guidance and preferences where an occasional miss is acceptable, but they are the wrong tool when errors have financial, security, or legal consequences.

How do I decide between plan mode and direct execution?

Choose plan mode for complex work: large-scale changes, multiple valid approaches, architectural decisions, or edits spanning many files (for example a monolith-to-microservices split or a library migration touching 45-plus files). Plan mode lets you explore and design safely before committing, avoiding costly rework. Choose direct execution for simple, well-scoped changes like a single-file bug fix with a clear stack trace or adding one validation check. A common exam trap is starting in direct execution and switching to plan mode only if complexity emerges; if the requirements already state the change is large and architectural, plan mode is the right starting point.

What is the difference between tool_choice values auto, any, and forced?

With 'auto' the model decides whether to call a tool or reply with plain text. With 'any' the model must call some tool but picks which one, which is useful when you need structured output and several extraction schemas exist for an unknown document type. Forced selection, {"type": "tool", "name": "..."}, requires a specific named tool, which is how you guarantee something like extract_metadata runs first before enrichment steps. Remember that 'auto' can return conversational text, so it does not guarantee structured output on its own.

When will a retry fix an extraction problem, and when is it pointless?

Retries work for format and structural errors: malformed JSON, values placed in the wrong field, or figures that do not sum. Appending the specific validation error to a follow-up request, along with the original document and the failed output, guides the model to self-correct. Retries do not help when the required information is simply absent from the source, for example a field that only exists in an external document you never provided. In that case the model either keeps returning null or starts fabricating values, so the right move is a nullable schema field or supplying the missing source, not another retry.

When should I use the Message Batches API instead of the synchronous API?

The Message Batches API gives roughly 50 percent cost savings but has a processing window of up to 24 hours with no guaranteed latency SLA. Use it for non-blocking, latency-tolerant work like overnight reports, weekly audits, or nightly test generation. Do not use it for blocking workflows such as pre-merge checks where a developer is waiting on the result. Also note the batch API does not support multi-turn tool calling within a single request, and you correlate requests to responses using custom_id fields. A frequent trap answer proposes batching a blocking pre-merge check because it is 'often faster', which is not acceptable when someone is blocked.

Why is an independent review instance better than asking the model to review its own work?

A model that just generated code retains its reasoning context from generation, which makes it less likely to question its own decisions in the same session. Self-review instructions and even extended thinking do not fully overcome this bias. A second, independent Claude instance that lacks the generator's reasoning context approaches the code fresh and is more effective at catching subtle issues. This is why review architectures separate generation from review rather than folding both into one session.

When should I use .claude/rules/ with glob patterns instead of a subdirectory CLAUDE.md?

Use .claude/rules/ files with YAML frontmatter path globs when a convention applies to files identified by type or pattern regardless of location, for example test files spread throughout the codebase (paths: ["**/*.test.tsx"]). The rule loads only when a matching file is edited, which also saves context and tokens. A subdirectory CLAUDE.md is directory-bound, so it cannot cleanly cover files scattered across many folders. Reach for directory-level CLAUDE.md only when a convention genuinely maps to one directory tree.

What is the difference between project scope and user scope for commands and MCP servers?

Project scope means the configuration lives in the repository and is shared with the whole team through version control: .claude/commands/ for slash commands and .mcp.json for MCP servers, often with environment-variable expansion like ${GITHUB_TOKEN} so no secrets are committed. User scope means the configuration is personal and not shared: ~/.claude/commands/ for your own commands and ~/.claude.json for personal or experimental MCP servers, plus ~/.claude/CLAUDE.md for personal instructions. A classic diagnostic question describes a new teammate not receiving instructions; the cause is that they were placed in user-level config instead of project-level config.

How do I choose between a skill and CLAUDE.md?

CLAUDE.md holds always-loaded, universal standards that should apply to every interaction, such as coding conventions and testing rules. A skill in .claude/skills/ is invoked on demand for a specific task-focused workflow, and its SKILL.md frontmatter supports options like context: fork (run in an isolated sub-agent so verbose output does not pollute the main conversation), allowed-tools (restrict what the skill can do), and argument-hint. If you need something applied automatically and deterministically based on file paths, that is a job for CLAUDE.md or .claude/rules/, not a skill, because skills depend on being invoked or chosen.

How does context passing to subagents actually work?

Subagents do not automatically inherit the coordinator's conversation history or share memory between invocations; each runs with isolated context. You must explicitly include everything a subagent needs directly in its prompt, for example passing prior web search results and document analysis outputs into the synthesis subagent's prompt. Use structured data formats to keep content separate from metadata (source URLs, document names, page numbers) so attribution survives the handoff. Assuming automatic inheritance is a common wrong mental model that the exam tests directly.

How does the agentic loop know when to stop?

Loop control is driven by stop_reason, not by natural language. You continue the loop and execute tools while stop_reason is 'tool_use', and you terminate when it is 'end_turn'. Tool results are appended to the conversation history so the model can reason about the next step. Anti-patterns the exam flags include parsing the assistant's text for phrases like 'done' to decide termination, using an arbitrary iteration cap as the primary stopping mechanism, or treating any assistant text as a completion signal.

How do I choose between session resumption and forking?

Use --resume with a named session to continue a specific prior conversation when the earlier context is mostly still valid, and tell the agent about any files that changed so it can re-analyze them in a targeted way. Use fork_session to create an independent branch from a shared analysis baseline when you want to explore divergent approaches in parallel, such as comparing two refactoring strategies. When prior tool results are stale, starting fresh with an injected structured summary is more reliable than resuming.

Why do too many tools hurt an agent, and how should I scope them?

Giving an agent access to many tools (for example 18 instead of 4 or 5) increases decision complexity and degrades tool-selection reliability, and agents tend to misuse tools outside their specialization, such as a synthesis agent attempting web searches. Scope each agent to just the tools its role needs. For a high-frequency cross-role need, add a narrow scoped tool (like verify_fact for the synthesis agent) and route complex cases back through the coordinator, rather than handing the agent the full toolset.

Study Strategy

How should I allocate my study time across the blueprint?

Weight your prep to the domain percentages: spend the most on agentic architecture and orchestration (27 percent), then Claude Code configuration and prompt engineering (20 percent each), then MCP and tool design (18 percent), then context management and reliability (15 percent). Within each domain, focus on the decision-making patterns, when to use hooks versus prompts, plan mode versus direct execution, batch versus synchronous, because the questions reward choosing the proportionate, root-cause fix over over-engineered or probabilistic alternatives.

What hands-on practice best prepares me for this exam?

Build the four prep exercises in the blueprint: a multi-tool agent with escalation logic, a Claude Code team workflow (CLAUDE.md hierarchy, .claude/rules/ globs, forked skills, MCP servers), a structured data extraction pipeline (tool_use schemas, validation-retry loops, batch processing), and a multi-agent research pipeline with explicit context passing and provenance tracking. Doing these makes the scenario questions feel familiar because they mirror the same situations.

How should I read and eliminate answer choices during the exam?

Identify the root cause the question describes, then match the option that addresses it most directly and proportionately. Eliminate answers that rely on probabilistic LLM compliance when the situation demands deterministic enforcement, answers that over-engineer with classifiers or ML infrastructure before simpler prompt or configuration fixes have been tried, answers that solve a different problem than the one stated (sentiment analysis for a complexity issue, for example), and answers that reference features that do not exist. The 'first step' or 'most effective' phrasing usually points to the highest-leverage, lowest-effort correct fix.

Should I take the practice exam, and what is out of scope?

Yes, complete the practice exam before sitting the real one; it uses the same scenarios and format and shows explanations after each answer to reinforce reasoning. Also study the out-of-scope list so you do not waste time: fine-tuning and model training, API authentication and billing, MCP server hosting and infrastructure, Claude's internal architecture, RLHF and Constitutional AI, embeddings and vector databases, computer use, vision, streaming implementation, rate limits and pricing math, and tokenization internals are all excluded.