Context Foundry / CCA Alignment Matrix

Mapped against the Claude Certified Architect -- Foundations Exam Guide (Anthropic, 2025)

Status legend: Implemented | Partial | Opportunity | N/A | Not Used

Domain 1: Agentic Architecture & Orchestration (27%)

Task	Principle	Status	Evidence
1.1	Agentic loop lifecycle (stop_reason, tool_use vs end_turn)	Implemented	`agent.rs:38` -- parses stream-json events, loops on tool_use
1.1	Model-driven decision-making vs pre-configured decision trees	Implemented	Agents choose their own tools; foundry provides role prompts, not tool sequences
1.1	Avoid anti-patterns: NL parsing for termination, arbitrary iteration caps	Implemented	Uses stream-json structured events. Timeout is a safety net, not a stop mechanism
1.2	Hub-and-spoke coordinator with isolated subagent context	Implemented	`app/build.rs` orchestrates agents as isolated CLI invocations with no shared history
1.2	Coordinator handles decomposition, delegation, result aggregation	Implemented	TASKS.md for decomposition, stage agents for delegation, .buildloop/ for aggregation
1.3	Subagent context must be explicitly provided, not inherited	Implemented	Each agent is a fresh CLI process. Artifacts passed as file references
1.3	AgentDefinition with descriptions, prompts, tool restrictions	Implemented	`prompts.rs` per-role prompts; `agent.rs:562` allowed_tools per invocation
1.3	Fork-based session management	N/A	Foundry spawns fresh processes, not Claude Code sessions
1.4	Programmatic enforcement (hooks, prerequisite gates)	Implemented	`gate_builder` and `gate_reviewer` in `build.rs:73,96`. Extension gate at `build.rs:1513`
1.4	Deterministic compliance for critical operations	Implemented	Gates are code, not prompts. Plan must have File Operations + Verification sections
1.4	Structured handoff protocols between stages	Implemented	Each stage writes structured .buildloop/ artifacts consumed by downstream stages
1.5	Agent SDK hooks (PostToolUse) for tool call interception	N/A	Uses Claude Code CLI, not Agent SDK
1.5	Hooks for deterministic guarantees vs prompt-based compliance	Partial	Gates enforce stage ordering; no tool-call-level interception within a session
1.6	Fixed sequential pipelines vs dynamic adaptive decomposition	Implemented	Fixed pipeline with adaptive elements (complexity-based skip, retry-with-feedback)
1.6	Prompt chaining for multi-step workflows	Implemented	Stages chain via file artifacts. Reviewer findings chain into fixer
1.6	Adaptive investigation plans based on discoveries	Partial	Discovery agent adapts task generation, but agents don't spawn sub-investigations
1.7	Named session resumption (--resume)	N/A	Agents are stateless one-shot invocations. State lives in .buildloop/ files
1.7	fork_session for parallel exploration	N/A	Not applicable -- foundry doesn't use Claude Code sessions
1.7	Crash recovery via structured state persistence	Implemented	.buildloop/ artifacts + TASKS.md SPID progress survive crashes
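The 1.4 rows describe gates as code rather than prompts. A minimal sketch of such a gate follows, assuming the plan is Markdown and the two required sections appear as headings. The function name `check_plan_sections` and the heading-matching rule are illustrative assumptions, not the actual `gate_builder` in `build.rs`.

```rust
/// Section titles the plan must contain (taken from the 1.4 row above).
const REQUIRED_SECTIONS: [&str; 2] = ["File Operations", "Verification"];

/// Hypothetical gate: returns the missing section titles as an error so the
/// caller can append them to a planner retry.
pub fn check_plan_sections(plan: &str) -> Result<(), Vec<String>> {
    // Collect heading titles: lines starting with '#', with the markers stripped.
    let headings: Vec<&str> = plan
        .lines()
        .map(str::trim)
        .filter(|line| line.starts_with('#'))
        .map(|line| line.trim_start_matches('#').trim())
        .collect();

    let mut missing = Vec::new();
    for required in REQUIRED_SECTIONS {
        let present = headings.iter().any(|h| h.eq_ignore_ascii_case(required));
        if !present {
            missing.push(required.to_string());
        }
    }

    if missing.is_empty() {
        Ok(())
    } else {
        Err(missing)
    }
}
```

Because the check is plain string inspection, the outcome does not depend on how the agent was prompted, which is the point of a deterministic gate.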

Domain 2: Tool Design & MCP Integration (18%)

Task	Principle	Status	Evidence
2.1	Clear tool descriptions with input formats and boundaries	N/A	Agents use Claude Code's built-in tools, not custom MCP tools
2.2	Structured error responses (isError, errorCategory, isRetryable)	N/A	MCP tools exist for external use, not internal agent orchestration
2.3	Scoped tool access per agent role	Implemented	`agent.rs:562` allowed_tools. Skills restrict to 3-4 tools each
2.3	Too many tools degrades selection reliability	Implemented	Reviewer is read-only. Skills restrict to role-appropriate tools
2.4	MCP server scoping (project vs user level)	N/A	Foundry doesn't configure MCP servers for its agents
2.4	MCP resources as content catalogs	Implemented	`mcp.rs:125` pattern catalog + `mcp.rs:131` extension index as browsable MCP resources via `foundry://` URIs
2.5	Effective use of built-in tools (Read, Write, Edit, Bash, Grep, Glob)	Implemented	Agent prompts guide tool selection per role
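The 2.3 rows describe per-role tool scoping. One way to picture it is a static allow-list per role, sketched below. The `Role` variants and every assignment other than "reviewer is read-only" are assumptions for illustration; the real lists live behind `allowed_tools` in `agent.rs`.

```rust
/// Pipeline roles; this variant set is illustrative, not exhaustive.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    Planner,
    Builder,
    Reviewer,
    Fixer,
}

/// Hypothetical role-to-tool mapping using Claude Code's built-in tool names.
/// Only the read-only reviewer comes from the matrix; the rest is assumed.
pub fn allowed_tools(role: Role) -> &'static [&'static str] {
    match role {
        Role::Planner | Role::Reviewer => &["Read", "Grep", "Glob"],
        Role::Builder => &["Read", "Write", "Edit", "Bash"],
        Role::Fixer => &["Read", "Edit", "Bash"],
    }
}

/// True when the role's allow-list has no tool that can modify files or run commands.
pub fn is_read_only(role: Role) -> bool {
    const MUTATING: [&str; 3] = ["Write", "Edit", "Bash"];
    allowed_tools(role).iter().all(|tool| !MUTATING.contains(tool))
}
```

Keeping each list to three or four entries matches the second 2.3 row: fewer tools per role means fewer ways for tool selection to go wrong.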

Domain 3: Claude Code Configuration & Workflows (20%)

Task	Principle	Status	Evidence
3.1	CLAUDE.md hierarchy (user > project > directory)	Implemented	Agents inherit CLAUDE.md via normal loading. Foundry appends orchestration override
3.1	.claude/rules/ for path-scoped conventions	Implemented	6 rule files with paths: frontmatter scoping
3.1	@import for modular CLAUDE.md	Not Used	Rules already split into .claude/rules/. Could be useful for extensions
3.2	Custom slash commands in .claude/commands/	Not Used	Skills in .claude/skills/ serve this purpose instead
3.2	Skills with SKILL.md, context: fork, allowed-tools	Implemented	3 skills (audit, scout, extract-patterns) with fork context and scoped tools
3.3	Path-specific rules with YAML frontmatter	Implemented	All 6 rule files use paths: frontmatter for conditional loading
3.4	Plan mode vs direct execution	Implemented	Planner stage IS plan mode. Complexity classifier can skip for simple tasks
3.5	Iterative refinement with concrete I/O examples	Implemented	Few-shot severity examples in reviewer. JSON template in pattern extractor
3.5	Test-driven iteration	Implemented	Builder runs tests. Reviewer re-runs independently. Fixer iterates on failures
3.6	CI/CD integration (--output-format json)	Implemented	--output-format json on foundry run --no-tui. SessionReport with tasks/session/config. Schema: docs/ci-output-schema.json
3.6	Session context isolation -- fresh reviewer	Implemented	Core design principle. Verify agent is a completely separate CLI invocation

Domain 4: Prompt Engineering & Structured Output (20%)

Task	Principle	Status	Evidence
4.1	Explicit criteria over vague instructions	Implemented	Reviewer defines severity criteria with examples. "What to report" and "what to skip" lists
4.1	Explicit criteria reduce false positives	Implemented	Categorical criteria, not confidence-based filtering
4.2	Few-shot examples for output consistency	Implemented	Reviewer severity examples. Pattern extractor JSON template. Build-claims format
4.2	Few-shot for ambiguous-case handling	Implemented	`prompts.rs:572-602` three borderline severity examples: unchecked file read = HIGH, test-only return value = LOW, unwrap on constant = SKIP
4.3	Structured output via tool_use with JSON schemas	N/A	Uses Claude Code CLI, not the API
4.4	Retry-with-error-feedback	Implemented	Gate failure triggers planner retry with validation error appended. Agent timeout retry
4.4	Feedback loops -- tracking which patterns trigger findings	Implemented	"Applied" counter tracks patterns in agent output. Frequency 3+ auto-promotes
4.5	Batch processing (Message Batches API)	N/A	Sequential processing via CLI, not API batch endpoint
4.6	Multi-instance review -- independent reviewer	Implemented	Core architecture. Fresh CLI invocation with zero shared context from builder
4.6	Multi-pass review (per-file + cross-file integration)	Implemented	`review.rs:269` run_multipass_review splits into per-file analysis + cross-file integration pass when files exceed `review_multipass_threshold` (default 8)
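The first 4.4 row (gate failure triggers a retry with the validation error appended) can be sketched as a small generic loop. Everything here is hypothetical scaffolding: `attempt` stands in for an agent invocation, `validate` stands in for a gate, and the feedback wording is invented for the example.

```rust
/// Runs `attempt` up to `max_attempts` times. After a failed validation the
/// error text is appended to the base prompt so the next attempt can correct it.
pub fn retry_with_feedback<A, V>(
    base_prompt: &str,
    max_attempts: usize,
    mut attempt: A,
    validate: V,
) -> Result<String, String>
where
    A: FnMut(&str) -> String,
    V: Fn(&str) -> Result<(), String>,
{
    let mut prompt = base_prompt.to_string();
    let mut last_error = String::from("no attempts were made");
    for _ in 0..max_attempts {
        let output = attempt(&prompt);
        match validate(&output) {
            Ok(()) => return Ok(output),
            Err(err) => {
                // Feed the concrete failure back instead of retrying blindly.
                prompt = format!("{base_prompt}\n\nPrevious attempt failed validation: {err}");
                last_error = err;
            }
        }
    }
    Err(last_error)
}
```

The bounded attempt count is a safety limit on retries, not a termination signal for the agent itself; each attempt still ends on its own structured stop event.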

Domain 5: Context Management & Reliability (15%)

Task	Principle	Status	Evidence
5.1	Lost-in-the-middle effect mitigation	Implemented	Scout report: Key Facts first (beginning bias), Risks last (recency bias)
5.1	Trimming verbose tool output before context accumulation	Implemented	`agent.rs:1505` truncate_for_preview trims tool output to 200 chars. Build/test output trimmed between builder and reviewer stages
5.1	Persistent structured state outside conversation history	Implemented	.buildloop/ files persist scout report, plan, claims, review across context boundaries
5.2	Escalation patterns (human-in-the-loop)	Implemented	Review mode pauses for approval. WIP commits + GitHub issues escalate failures
5.2	Escalation on inability to progress, not just complexity	Implemented	Verify failure after fixer retry = WIP + issue. Discovery backs off when nothing found
5.3	Structured error propagation across multi-agent systems	Implemented	`context.rs:35` StageResult struct with failure_type, attempted_action, partial_results, suggestions. Fixer receives structured context
5.3	Distinguish access failures from valid empty results	Implemented	`context.rs:13` FailureType enum: Timeout, Crash, GateFail, ReviewFail, RateLimited, StopRequested
5.4	Context degradation in extended sessions	Implemented	Each agent is a fresh session. Long sessions are architecturally impossible
5.4	Scratchpad files for persisting findings	Implemented	.buildloop/ artifacts are exactly this pattern
5.4	Crash recovery via structured state exports	Implemented	TASKS.md SPID progress + .buildloop/ artifacts survive crashes
5.5	Human review workflows and confidence calibration	Implemented	Review mode creates PRs with polling. `review.rs:714` confidence scores (0.0-1.0) with `config.rs:199` configurable threshold
5.5	Confidence scores for calibrated review routing	Implemented	`review.rs:714-758` per-finding confidence (0.0-1.0). Below `confidence_threshold` (default 0.5) flagged for manual review, not auto-fixed
5.6	Information provenance in multi-source synthesis	Implemented	`review.rs:675` source_evidence field on every finding: snippet, line_range, reasoning chain. Fixer receives full provenance

Resolved Opportunities (Phase 13 -- CCA Alignment)

Structured error propagation (5.3) -- T13.1
StageResult struct with FailureType enum passes structured failure context (what failed, what was attempted, partial results, suggestions) between pipeline stages.
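A minimal sketch of what such a structure could look like follows. The enum variants and field names are the ones listed in this matrix; the field types, the `Option` wrapper, and the `to_fixer_context` helper are assumptions, not the definitions in `context.rs`.

```rust
/// Failure categories listed in the matrix for `context.rs`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FailureType {
    Timeout,
    Crash,
    GateFail,
    ReviewFail,
    RateLimited,
    StopRequested,
}

/// Field names follow the matrix; the types are assumed for illustration.
/// `failure_type: None` with empty `partial_results` models a valid empty
/// result, which stays distinguishable from any failure.
#[derive(Debug, Clone, PartialEq)]
pub struct StageResult {
    pub failure_type: Option<FailureType>,
    pub attempted_action: String,
    pub partial_results: Vec<String>,
    pub suggestions: Vec<String>,
}

impl StageResult {
    /// Hypothetical helper: render the result as text for the next stage's prompt.
    pub fn to_fixer_context(&self) -> String {
        let status = match self.failure_type {
            Some(kind) => format!("failed ({kind:?})"),
            None => "succeeded".to_string(),
        };
        let mut out = format!("Stage {status} while attempting: {}\n", self.attempted_action);
        for item in &self.partial_results {
            out.push_str(&format!("partial: {item}\n"));
        }
        for hint in &self.suggestions {
            out.push_str(&format!("suggestion: {hint}\n"));
        }
        out
    }
}
```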
Multi-pass review for large changesets (4.6) -- T13.2/T13.3
run_multipass_review splits into per-file analysis passes + cross-file integration pass when changeset exceeds review_multipass_threshold (default 8 files).
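The threshold rule can be sketched as a small planning function. `ReviewPass` and `plan_review_passes` are hypothetical names; the sketch assumes "exceeds" means strictly more files than the threshold and leaves out how each pass is actually run.

```rust
/// One unit of review work; a hypothetical type for this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ReviewPass {
    /// A single pass over the whole changeset.
    Single(Vec<String>),
    /// A pass focused on one file.
    PerFile(String),
    /// A final pass over all files, looking at cross-file interactions.
    CrossFile(Vec<String>),
}

/// Splits a changeset into passes once it has more than `threshold` files.
pub fn plan_review_passes(files: &[String], threshold: usize) -> Vec<ReviewPass> {
    if files.len() <= threshold {
        return vec![ReviewPass::Single(files.to_vec())];
    }
    let mut passes: Vec<ReviewPass> = files.iter().cloned().map(ReviewPass::PerFile).collect();
    // The integration pass runs last so it can build on the per-file findings.
    passes.push(ReviewPass::CrossFile(files.to_vec()));
    passes
}
```

With the default threshold of 8, a changeset of 8 files gets one pass, while 9 files produce 9 per-file passes plus one cross-file pass.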
Verbose tool output trimming (5.1) -- T13.3
truncate_for_preview trims tool output to 200 chars. Build/test results trimmed between builder and reviewer to prevent context bloat.
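A sketch of character-safe truncation in the same spirit follows. The name `truncate_preview` and the trailing `...` marker are assumptions; the actual `truncate_for_preview` in `agent.rs` may cut or mark output differently.

```rust
/// Keeps at most `max_chars` characters, cutting on a character boundary so
/// multi-byte UTF-8 text is never split. Appends "..." when text was dropped.
pub fn truncate_preview(text: &str, max_chars: usize) -> String {
    // `nth(max_chars)` finds the first character past the limit, if any;
    // its byte offset is where the kept prefix ends.
    match text.char_indices().nth(max_chars) {
        Some((byte_idx, _)) => format!("{}...", &text[..byte_idx]),
        None => text.to_string(),
    }
}
```

Slicing by byte length alone (`&text[..200]`) would panic when the cut lands inside a multi-byte character, which is why the sketch walks `char_indices` instead.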
MCP resources for content catalogs (2.4) -- T13.4
Pattern catalog (foundry://patterns/catalog) and extension index (foundry://plugins/index) exposed as browsable MCP resources.
Few-shot borderline severity examples (4.2) -- T13.5
Three borderline classification examples in reviewer prompt: unchecked file read = HIGH, test-only return value = LOW, unwrap on constant = SKIP.
Reviewer finding provenance (5.6) -- T13.6
source_evidence field on every finding: code snippet, line range, and reasoning chain. Fixer receives full provenance for targeted fixes.
Confidence scores on findings (5.5) -- T13.7
Per-finding confidence (0.0-1.0) with a configurable threshold (default 0.5). Findings below the threshold are flagged for manual review instead of being auto-fixed.
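The routing rule amounts to a single comparison, sketched here. `Finding`, `Route`, and `route_finding` are hypothetical names, and the sketch assumes a score exactly at the threshold is still auto-fixed.

```rust
/// Minimal finding shape for this sketch; real findings carry more fields.
#[derive(Debug, Clone, PartialEq)]
pub struct Finding {
    pub summary: String,
    /// Reviewer confidence in the range 0.0 to 1.0.
    pub confidence: f64,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Route {
    AutoFix,
    ManualReview,
}

/// Routes a finding by comparing its confidence against the threshold.
pub fn route_finding(finding: &Finding, confidence_threshold: f64) -> Route {
    // NaN compares false against everything, so a malformed score falls
    // through to manual review rather than being auto-fixed.
    if finding.confidence >= confidence_threshold {
        Route::AutoFix
    } else {
        Route::ManualReview
    }
}
```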