3.6 Integrate Claude Code into CI/CD pipelines

Running Claude Code inside a CI/CD pipeline turns it into an automated reviewer and test generator that runs on every pull request, with no human at the keyboard. The architect's job is to make it non-interactive so the job never hangs, force machine-parseable output so findings can post as inline PR comments, feed project context through CLAUDE.md, and design re-runs so they add only new information instead of spamming duplicate comments. Getting the flags and the context right is the difference between a review bot developers trust and one they mute.

Figure: How a Claude Code pull-request review job flows through a CI/CD pipeline: the flags that keep it non-interactive and structured, idempotent re-runs, and the exam distractors that do not exist.

Non-interactive execution with the -p / --print flag

Claude Code is interactive by default: run bare claude "..." and it opens a session that can pause to ask for confirmation or more input. In a CI runner there is no human to answer, so that pause becomes a hang: the job sits waiting on stdin until it times out. This is one of the most common CI integration failures.

The fix is the -p flag (long form --print). It runs Claude Code in non-interactive (print) mode: it takes the prompt, does the work, writes the result to stdout, and exits with a status code. No interactive session, no prompt for input, no hang. That print-and-exit behavior is exactly what a pipeline step needs.

claude -p "Analyze this pull request for security issues"

The exam leans hard on distractors here. CLAUDE_HEADLESS=true is not a real environment variable, and --batch is not a real Claude Code flag (batching is a Message Batches API concept from Domain 4, unrelated to the CLI). Redirecting stdin from /dev/null is a shell workaround, not Claude Code's documented non-interactive mode. The one correct answer is -p / --print. You can still feed content in by piping it on stdin, for example git diff origin/main | claude -p "Review this diff".
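To make the distinction concrete, here is a hypothetical pre-merge lint, sketched in TypeScript, that a team could run over workflow files to catch the hang and the two distractors before a job ever runs. The function name and messages are invented for illustration, and splitting on whitespace ignores shell quoting, so treat it as a sketch rather than a real shell parser.

```typescript
// Hypothetical CI lint for Claude Code invocations (not an official tool).
// Naive whitespace tokenizing: good enough to illustrate, not a shell parser.
function lintClaudeCommand(command: string): string[] {
  const tokens = command.trim().split(/\s+/);
  const exe = tokens.indexOf("claude");
  if (exe === -1) return ["not a claude invocation"];

  const problems: string[] = [];
  const before = tokens.slice(0, exe); // e.g. VAR=value prefixes or a piped producer
  const args = tokens.slice(exe + 1);

  // Without -p/--print the CLI opens an interactive session and the job hangs.
  if (!args.includes("-p") && !args.includes("--print")) {
    problems.push("missing -p/--print: the job will wait for interactive input");
  }
  // The two distractors: neither exists, so relying on them changes nothing.
  if (before.some((t) => t.startsWith("CLAUDE_HEADLESS="))) {
    problems.push("CLAUDE_HEADLESS is not a real environment variable");
  }
  if (args.includes("--batch")) {
    problems.push("--batch is not a Claude Code CLI flag");
  }
  return problems;
}
```

A command like git diff origin/main | claude -p "Review this diff" passes cleanly, while claude "Review this diff" is flagged for the missing -p.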

Machine-parseable output with --output-format json and --json-schema

A CI step needs structured output, not prose, so downstream tooling can turn findings into inline PR comments. Two flags give you that:

--output-format json switches stdout from conversational text to a JSON document.
--json-schema '<schema>' constrains the output to a JSON Schema you supply, so every run produces the same predictable shape (location, issue, severity, suggested fix) that your posting script can rely on. The flag takes the schema itself as a JSON string; if you keep the schema in a file, read it in with command substitution.

git diff origin/main | claude -p "Review this diff. Report bugs and security issues only." \
  --output-format json --json-schema "$(cat review-findings.schema.json)"
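The lesson does not show review-findings.schema.json itself. A minimal sketch, written as a TypeScript constant so it can be generated and versioned alongside the posting script: the field names follow the { file, line, severity, issue, suggestedFix } shape the lesson uses, while the severity levels and the top-level object wrapping a findings array are assumptions made here.

```typescript
// Hypothetical findings schema. Field names mirror the lesson's shape;
// the severity enum and the { findings: [...] } wrapper are design choices.
const reviewFindingsSchema = {
  type: "object",
  properties: {
    findings: {
      type: "array",
      items: {
        type: "object",
        properties: {
          file: { type: "string" },
          line: { type: "integer" },
          severity: { type: "string", enum: ["critical", "high", "medium", "low"] },
          issue: { type: "string" },
          suggestedFix: { type: "string" },
        },
        required: ["file", "line", "severity", "issue", "suggestedFix"],
        additionalProperties: false,
      },
    },
  },
  required: ["findings"],
  additionalProperties: false,
} as const;

// Serialize once and commit the result as review-findings.schema.json.
const schemaText = JSON.stringify(reviewFindingsSchema, null, 2);
console.log(schemaText);
```

Wrapping the list in an object leaves room to add fields (a summary, say) later without breaking consumers that only read findings.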

With a guaranteed shape, the next pipeline step parses the JSON and posts each finding as an inline comment on the exact file and line. Never scrape prose output with regex; it breaks the moment the wording changes. This is the same reliability principle as tool_use with a JSON schema from Domain 4, applied to the CLI: constrain the output shape so automation can trust it.
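A minimal parsing sketch for that next step, in TypeScript. It validates every finding before anything is posted, so a malformed run fails loudly instead of producing half-posted comments. The Finding fields follow the { file, line, severity, issue, suggestedFix } shape used in this lesson; the severity values are assumed, and so is where the payload sits: this sketch accepts either a bare { findings: [...] } document or one nested under a structured_output key of the CLI's JSON envelope, so check what your CLI version actually emits.

```typescript
type Severity = "critical" | "high" | "medium" | "low";

interface Finding {
  file: string;
  line: number;
  severity: Severity;
  issue: string;
  suggestedFix: string;
}

const SEVERITIES: readonly string[] = ["critical", "high", "medium", "low"];

// Runtime check that one element matches the Finding shape.
function isFinding(x: unknown): x is Finding {
  if (typeof x !== "object" || x === null) return false;
  const f = x as Record<string, unknown>;
  return (
    typeof f.file === "string" &&
    typeof f.line === "number" && Number.isInteger(f.line) && f.line >= 1 &&
    typeof f.severity === "string" && SEVERITIES.includes(f.severity) &&
    typeof f.issue === "string" &&
    typeof f.suggestedFix === "string"
  );
}

// Assumption: payload is the whole document or sits under `structured_output`.
function parseFindings(raw: string): Finding[] {
  const doc = JSON.parse(raw) as Record<string, unknown>;
  const payload = (doc.structured_output ?? doc) as { findings?: unknown };
  if (!Array.isArray(payload.findings)) {
    throw new Error("no findings array in CLI output");
  }
  const bad = payload.findings.filter((f) => !isFinding(f));
  if (bad.length > 0) {
    throw new Error(`${bad.length} finding(s) do not match the schema`);
  }
  return payload.findings as Finding[];
}
```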

CLAUDE.md is how a CI run gets project context

A CI-invoked Claude Code run is headless and short-lived, so how does it know your team's testing standards, which fixtures exist, or what counts as a valuable finding? The answer is CLAUDE.md. When Claude Code runs inside the checked-out repo, it loads the project's CLAUDE.md automatically, the same file that guides interactive sessions. That makes CLAUDE.md the mechanism for injecting project context into automated runs.

For a review job, document the review criteria: which categories to flag (bugs, security) and which to skip (minor style), plus the severity definitions. For a test-generation job, document testing standards, what makes a test valuable, and the available fixtures and helpers.
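One way to keep CI runs from silently losing that context is a pre-flight check that the expected sections exist in CLAUDE.md. The sketch below is hypothetical: the heading names are illustrative, and Claude Code itself does not require any particular structure in CLAUDE.md. The embedded sample also shows the kind of content described above.

```typescript
// Hypothetical pre-flight check; the required heading names are illustrative.
const REQUIRED_SECTIONS = ["Review criteria", "Severity definitions", "Testing standards", "Fixtures"];

function missingSections(claudeMd: string, required: string[] = REQUIRED_SECTIONS): string[] {
  // Collect markdown headings of any level, e.g. "## Review criteria".
  const headings = new Set(
    claudeMd
      .split("\n")
      .map((l) => l.match(/^#{1,6}\s+(.*)$/))
      .filter((m): m is RegExpMatchArray => m !== null)
      .map((m) => m[1].trim().toLowerCase()),
  );
  return required.filter((s) => !headings.has(s.toLowerCase()));
}

const sample = [
  "# Project notes",
  "## Review criteria",
  "Flag bugs and security issues. Skip minor style nits.",
  "## Severity definitions",
  "high: exploitable or data-losing; low: incorrect but harmless.",
  "## Testing standards",
  "Test observable behavior, not implementation details.",
].join("\n");

console.log(missingSections(sample)); // → [ 'Fixtures' ]
```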

This context improves output quality and, crucially, reduces low-value output. Without it, generated tests trend toward trivial or redundant cases and reviews produce noise. Because CLAUDE.md is committed to the repo, every teammate and every CI run shares it, which is exactly the consistency you want from automated behavior. Tightening CLAUDE.md is often how you cut false positives without touching the pipeline itself.

Session context isolation: an independent instance reviews better than the author

When Claude generates code, it accumulates reasoning about why its solution is correct. Ask that same session to review its own output and it carries that reasoning forward: it is primed to defend its choices rather than question them, so it misses subtle bugs. This is the self-review limitation (Domain 4.6).

The CI pattern that sidesteps it is session context isolation: run the review as a fresh, independent Claude Code invocation that has no memory of how the code was produced. A clean instance sees only the diff and the review criteria, so it evaluates the change on its merits and catches issues the author-instance would rationalize away.

In practice this comes naturally in CI: the review job is a separate claude -p process from whatever generated the code (which may not have been Claude at all). The takeaway for the exam: prefer an independent review instance over instructing a model to 'double-check its own work,' and do not assume a larger context window or extended thinking substitutes for a genuinely independent reviewer.
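Isolation can be broken by accident if the review step resumes an earlier conversation. Claude Code's -c/--continue and -r/--resume flags reload a prior session, so a hypothetical argument builder for the review step might refuse them outright. The helper name and error text are invented for this sketch.

```typescript
// Flags that reload an earlier conversation and would defeat isolation.
const SESSION_REUSE_FLAGS = new Set(["-c", "--continue", "-r", "--resume"]);

// Hypothetical builder for the review step's argv. The diff is supplied
// separately on stdin, so the reviewer sees only the diff and the criteria.
function buildReviewArgs(criteriaPrompt: string, extraFlags: string[] = []): string[] {
  const reused = extraFlags.filter((f) => SESSION_REUSE_FLAGS.has(f));
  if (reused.length > 0) {
    throw new Error(`review must start a fresh session; remove ${reused.join(", ")}`);
  }
  return ["-p", criteriaPrompt, ...extraFlags];
}
```

In a Node script, the array could be run with spawnSync("claude", buildReviewArgs(prompt), { input: diff, encoding: "utf8" }) from node:child_process; each call starts a new process with no memory of how the code was authored.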

Idempotent re-runs: report only new or unaddressed issues

CI reviews re-run on every new commit pushed to a PR. A naive setup reviews the full diff each time and re-posts every finding, so the same comment piles up on each push and buries the new information. Developers stop reading.

The fix is to make the review idempotent across runs: include the prior review findings in the context of the re-run and instruct Claude to report only issues that are new or still unaddressed, suppressing anything already raised and anything the latest commit resolved. You feed the previous findings, which you saved as JSON from the last run, back into the prompt as known state.

The result is that each push adds only the delta (new problems and unresolved ones) instead of a growing wall of duplicate comments. This is a context-management move: the prior findings are state you carry forward so the model can reason about what has changed rather than starting from a blank slate each time.
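A minimal TypeScript sketch of the re-run prompt, assuming the previous run's findings were saved as JSON and loaded back into an array. The PriorFinding fields and the exact instruction wording are illustrative; the essential move is embedding the prior findings as known state next to the instruction to report only new or still-unaddressed issues.

```typescript
interface PriorFinding {
  file: string;
  line: number;
  issue: string;
}

// Hypothetical prompt builder: prior findings go in verbatim as known state.
function buildRerunPrompt(basePrompt: string, prior: PriorFinding[]): string {
  if (prior.length === 0) return basePrompt; // first run: nothing to suppress
  return [
    basePrompt,
    "",
    "Previously reported findings (JSON):",
    JSON.stringify(prior, null, 2),
    "",
    "Report only issues that are new or still unaddressed in the current diff.",
    "Do not report issues that the latest commits resolved.",
  ].join("\n");
}
```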

Duplicate-free test generation: show what already exists

For a test-generation job the analogous trap is regenerating tests that already exist. If Claude only sees the source file, it will happily propose tests for scenarios your suite already covers, adding noise and maintenance burden.

Provide the existing test files in the context of the generation request. With visibility into current coverage, Claude proposes tests for the gaps (untested branches and missing edge cases) rather than duplicating scenarios already handled. Combine this with the testing standards and fixtures documented in CLAUDE.md, and the generated tests both match house style and reuse existing fixtures instead of inventing parallel ones.
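A sketch of assembling that context in TypeScript. It assumes a <name>.test.ts / <name>.test.tsx naming convention and that file contents have already been read from disk; both helper names are hypothetical.

```typescript
function basename(path: string): string {
  return path.slice(path.lastIndexOf("/") + 1);
}

// Find tests for a source file by naming convention (an assumption).
function findExistingTests(sourcePath: string, repoFiles: string[]): string[] {
  const stem = basename(sourcePath).replace(/\.tsx?$/, "");
  return repoFiles.filter(
    (f) => basename(f) === `${stem}.test.ts` || basename(f) === `${stem}.test.tsx`,
  );
}

// Put current coverage in front of the model so it targets the gaps.
function buildTestGenPrompt(sourcePath: string, existing: Map<string, string>): string {
  const parts = [`Write tests for ${sourcePath}.`];
  for (const [path, contents] of existing) {
    parts.push(`Existing tests in ${path} (do not duplicate these scenarios):`, contents);
  }
  parts.push("Propose tests only for untested branches and missing edge cases.");
  return parts.join("\n\n");
}
```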

The pattern generalizes across CI jobs: whether reviewing or generating, giving the model what already exists (prior findings, current tests) is what lets it produce only the incremental, valuable output rather than redundant bulk.

Anti-patterns to avoid

Avoid: Running `claude "..."` in the pipeline without -p, or reaching for CLAUDE_HEADLESS / --batch to make it non-interactive.

Why it fails: Bare invocation opens an interactive session that waits for input, so the job hangs until it times out. CLAUDE_HEADLESS is not a real env var and --batch is not a real CLI flag, so those 'fixes' do nothing.

Instead: Use the documented -p (or --print) flag, which runs the prompt, prints to stdout, and exits without waiting for input.

Avoid: Letting Claude return prose and scraping it with regex or string matching to extract findings.

Why it fails: Free-form output has no stable shape; the parser breaks the moment Claude phrases something differently, and you cannot reliably map findings to files and lines.

Instead: Use --output-format json with --json-schema to guarantee a schema-valid shape (file, line, severity, issue, suggestedFix) that a posting script can consume deterministically.

Avoid: Re-reviewing the full diff on every commit and re-posting all findings.

Why it fails: The same comments accumulate on each push, spamming the PR and burying genuinely new issues so developers stop reading the bot.

Instead: Persist prior findings and feed them into the re-run, instructing Claude to report only new or still-unaddressed issues so each push adds just the delta.

Avoid: Having the same session that generated the code review its own changes (or telling one model to 'double-check its work').

Why it fails: The generating session retains its reasoning and is primed to justify its choices, so it is less likely to catch its own subtle bugs.

Instead: Run the review as an independent Claude Code instance (session context isolation) that sees only the diff and criteria, with no memory of how the code was produced.

Worked example: Wiring an automated PR review job into GitHub Actions (Scenario 5)

Your CI/CD pipeline should run an automated Claude Code review on every pull request and post findings as inline comments. Here is how the pieces from this lesson fit together.

1. The pipeline hangs. Your first attempt runs claude "Analyze this PR for security issues" and the job hangs until it times out; the logs show it waiting for interactive input. Root cause: interactive mode with no human present. Fix: add -p.

2. Non-interactive, structured, schema-constrained. You rewrite the step to be headless and machine-parseable:

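# .github/workflows/review.yml (step excerpt)
# Assumes an earlier actions/checkout step with fetch-depth: 0, so origin/<base branch> exists.
- name: Install Claude Code
  run: npm install -g @anthropic-ai/claude-code
- name: Claude review
  env:
    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}  # secret name is an assumption
  run: |
    git diff origin/${{ github.base_ref }}...HEAD > diff.txt
    # --json-schema takes the schema as a JSON string, so read the file in
    claude -p "Review this diff. Report only bugs and security issues per our criteria." \
      --output-format json --json-schema "$(cat .github/review-findings.schema.json)" \
      < diff.txt > findings.json
- name: Post inline comments
  run: node scripts/post-comments.js findings.json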

The schema pins each finding to { file, line, severity, issue, suggestedFix }, so post-comments.js posts each one on the right line without parsing prose.
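post-comments.js is not shown in the lesson. A minimal sketch of its core mapping step, written in TypeScript and assuming the { file, line, severity, issue, suggestedFix } shape above, turns each finding into one entry of the comments array accepted by GitHub's "create a review for a pull request" REST endpoint (path, line, side, body). The HTTP call, authentication, and the body formatting are left out or invented for illustration.

```typescript
interface Finding {
  file: string;
  line: number;
  severity: string;
  issue: string;
  suggestedFix: string;
}

// One entry in the `comments` array of GitHub's create-review request.
interface ReviewComment {
  path: string;
  line: number;
  side: "RIGHT";
  body: string;
}

function toReviewComments(findings: Finding[]): ReviewComment[] {
  return findings.map((f): ReviewComment => ({
    path: f.file,
    line: f.line,
    side: "RIGHT", // comment on the new version of the file
    body: `**${f.severity.toUpperCase()}**: ${f.issue}\n\nSuggested fix: ${f.suggestedFix}`,
  }));
}
```

GitHub rejects inline comments on lines that are not part of the diff, so a real script would need to filter those out or fall back to a summary comment.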

3. Context via CLAUDE.md. The review 'criteria' referenced in the prompt live in the repo's CLAUDE.md, which the CI run loads automatically: which categories to flag, which to skip, and the severity definitions. Tightening that file is how you cut false positives without editing the workflow.

4. Independent reviewer. This claude -p process is a fresh instance with no memory of how the code was written, so it reviews the diff on its merits, which is more effective than asking the authoring session to check its own work.

5. Re-runs without spam. On the next push the workflow re-runs. To avoid re-posting, you persist findings.json (as a workflow artifact) and feed the prior findings into the re-run prompt: 'Here are previously reported findings; report only new or still-unaddressed issues.' Now each commit adds only the delta.
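The prior-findings prompt is the lesson's fix. As an optional extra safeguard (not part of the lesson's pattern), the posting script could also drop findings that exactly repeat one already posted. This hypothetical TypeScript filter fingerprints each finding by file plus normalized issue text, deliberately ignoring the line number because lines shift between pushes.

```typescript
interface Finding {
  file: string;
  line: number;
  severity: string;
  issue: string;
}

// File + whitespace/case-normalized issue text; line numbers are ignored.
function fingerprint(f: Finding): string {
  return `${f.file}::${f.issue.trim().toLowerCase().replace(/\s+/g, " ")}`;
}

// Keep only findings whose fingerprint was not posted on an earlier run.
function onlyNew(current: Finding[], previouslyPosted: Finding[]): Finding[] {
  const seen = new Set(previouslyPosted.map(fingerprint));
  return current.filter((f) => !seen.has(fingerprint(f)));
}
```

Exact-text fingerprints only catch verbatim repeats; if the model rephrases an issue the filter lets it through, which is why the prior-findings prompt remains the primary mechanism.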

6. The test-generation sibling job. A separate nightly job generates tests. It passes the existing *.test.ts files plus the fixtures documented in CLAUDE.md into context, so Claude targets uncovered branches instead of duplicating existing scenarios or inventing new fixtures.

Note what you did not do: no CLAUDE_HEADLESS env var and no --batch flag, because neither exists. -p plus --output-format json / --json-schema is the whole non-interactive story.

Exam tips

✓The -p (or --print) flag is the documented way to run Claude Code non-interactively in CI; it prints to stdout and exits. CLAUDE_HEADLESS is not a real env var, --batch is not a real CLI flag, and `< /dev/null` is a workaround, not the correct mechanism.
✓Use --output-format json together with --json-schema to produce machine-parseable, schema-valid findings that a script can post as inline PR comments; never regex-scrape prose output.
✓CLAUDE.md is the mechanism for giving a CI-invoked run its project context (review criteria, testing standards, valuable-test definitions, available fixtures); it also reduces low-value output.
✓Use an independent review instance, not the session that generated the code, because self-review retains the generator's reasoning and misses its own bugs (session context isolation).
✓On re-runs after new commits, pass the prior findings into context and instruct Claude to report only new or still-unaddressed issues to avoid duplicate comments.
✓For test generation, provide the existing test files so Claude targets coverage gaps instead of proposing scenarios the suite already covers.

Official exam objectives for 3.6

Knowledge of

The -p (or --print) flag for running Claude Code in non-interactive mode in automated pipelines
--output-format json and --json-schema CLI flags for enforcing structured output in CI contexts
CLAUDE.md as the mechanism for providing project context (testing standards, fixture conventions, review criteria) to CI-invoked Claude Code
Session context isolation: why the same Claude session that generated code is less effective at reviewing its own changes compared to an independent review instance

Skills in

Running Claude Code in CI with the -p flag to prevent interactive input hangs
Using --output-format json with --json-schema to produce machine-parseable structured findings for automated posting as inline PR comments
Including prior review findings in context when re-running reviews after new commits, instructing Claude to report only new or still-unaddressed issues to avoid duplicate comments
Providing existing test files in context so test generation avoids suggesting duplicate scenarios already covered by the test suite
Documenting testing standards, valuable test criteria, and available fixtures in CLAUDE.md to improve test generation quality and reduce low-value test output

Flashcards from this lesson

Which flag runs Claude Code non-interactively in a CI pipeline, and what does it do?

-p (or --print). It processes the prompt, writes the result to stdout, and exits without waiting for interactive input, so the job does not hang.

Name two 'features' the exam uses as distractors for running Claude Code in CI that do not actually exist.

A CLAUDE_HEADLESS=true environment variable and a --batch CLI flag. Neither is real; the correct mechanism is the -p / --print flag.

How do you get machine-parseable, reliably-shaped findings out of a CI review run?

Use --output-format json to emit JSON instead of prose, and --json-schema to constrain that JSON to a fixed shape a posting script can parse deterministically.

How does a headless CI-invoked Claude Code run learn your team's review criteria, test standards, and fixtures?

From the repo's CLAUDE.md, which it loads automatically when run inside the checked-out project. That context also reduces low-value output.

Why prefer an independent review instance over having the code's author-session review it?

Session context isolation: the generating session retains its reasoning and defends its choices, missing subtle bugs. A fresh instance evaluates the diff on its merits.

How do you keep re-runs of a PR review from posting duplicate comments on every commit?

Include the prior findings in the re-run's context and instruct Claude to report only new or still-unaddressed issues, so each push adds only the delta.

How do you stop CI test generation from proposing tests you already have?

Provide the existing test files in context so Claude targets uncovered branches and missing edge cases instead of duplicating covered scenarios.
