Context Foundry / CCA Alignment Matrix

Mapped against the Claude Certified Architect -- Foundations Exam Guide (Anthropic, 2025)

Implemented: 43 | Partial: 3 | Opportunity: 0 | N/A: 7 | Not Used: 2

Domain 1: Agentic Architecture & Orchestration 27%

| Task | Principle | Status | Evidence |
|---|---|---|---|
| 1.1 | Agentic loop lifecycle (stop_reason, tool_use vs end_turn) | Implemented | agent.rs:38 -- parses stream-json events, loops on tool_use |
| 1.1 | Model-driven decision-making vs pre-configured decision trees | Implemented | Agents choose their own tools; foundry provides role prompts, not tool sequences |
| 1.1 | Avoid anti-patterns: NL parsing for termination, arbitrary iteration caps | Implemented | Uses stream-json structured events. Timeout is a safety net, not a stop mechanism |
| 1.2 | Hub-and-spoke coordinator with isolated subagent context | Implemented | app/build.rs orchestrates agents as isolated CLI invocations with no shared history |
| 1.2 | Coordinator handles decomposition, delegation, result aggregation | Implemented | TASKS.md for decomposition, stage agents for delegation, .buildloop/ for aggregation |
| 1.3 | Subagent context must be explicitly provided, not inherited | Implemented | Each agent is a fresh CLI process. Artifacts passed as file references |
| 1.3 | AgentDefinition with descriptions, prompts, tool restrictions | Implemented | prompts.rs per-role prompts; agent.rs:562 allowed_tools per invocation |
| 1.3 | Fork-based session management | N/A | Foundry spawns fresh processes, not Claude Code sessions |
| 1.4 | Programmatic enforcement (hooks, prerequisite gates) | Implemented | gate_builder and gate_reviewer in build.rs:73,96. Extension gate at build.rs:1513 |
| 1.4 | Deterministic compliance for critical operations | Implemented | Gates are code, not prompts. Plan must have File Operations + Verification sections |
| 1.4 | Structured handoff protocols between stages | Implemented | Each stage writes structured .buildloop/ artifacts consumed by downstream stages |
| 1.5 | Agent SDK hooks (PostToolUse) for tool call interception | N/A | Uses Claude Code CLI, not Agent SDK |
| 1.5 | Hooks for deterministic guarantees vs prompt-based compliance | Partial | Gates enforce stage ordering; no tool-call-level interception within a session |
| 1.6 | Fixed sequential pipelines vs dynamic adaptive decomposition | Implemented | Fixed pipeline with adaptive elements (complexity-based skip, retry-with-feedback) |
| 1.6 | Prompt chaining for multi-step workflows | Implemented | Stages chain via file artifacts. Reviewer findings chain into fixer |
| 1.6 | Adaptive investigation plans based on discoveries | Partial | Discovery agent adapts task generation, but agents don't spawn sub-investigations |
| 1.7 | Named session resumption (--resume) | N/A | Agents are stateless one-shot invocations. State lives in .buildloop/ files |
| 1.7 | fork_session for parallel exploration | N/A | Not applicable -- foundry doesn't use Claude Code sessions |
| 1.7 | Crash recovery via structured state persistence | Implemented | .buildloop/ artifacts + TASKS.md SPID progress survive crashes |
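
The 1.4 rows describe gates as code rather than prompts: a plan must contain File Operations and Verification sections before the pipeline proceeds. The sketch below shows what such a check could look like. It is not the actual gate_builder from build.rs; the function name check_plan_sections and the assumption that sections appear as `#`-prefixed headings are hypothetical.

```rust
/// Section headings a plan must contain before the next stage may run.
/// (Names taken from the matrix; the heading syntax is an assumption.)
const REQUIRED_SECTIONS: [&str; 2] = ["File Operations", "Verification"];

/// Hypothetical gate: returns Ok(()) when every required heading is present,
/// otherwise an error message listing the missing sections.
fn check_plan_sections(plan: &str) -> Result<(), String> {
    // Collect the text of every '#'-prefixed heading line, lowercased.
    let headings: Vec<String> = plan
        .lines()
        .filter(|line| line.trim_start().starts_with('#'))
        .map(|line| {
            line.trim_start()
                .trim_start_matches('#')
                .trim()
                .to_lowercase()
        })
        .collect();

    // A required section is missing when no heading matches it exactly.
    let missing: Vec<&str> = REQUIRED_SECTIONS
        .iter()
        .copied()
        .filter(|name| !headings.iter().any(|h| h == &name.to_lowercase()))
        .collect();

    if missing.is_empty() {
        Ok(())
    } else {
        Err(format!(
            "plan is missing required sections: {}",
            missing.join(", ")
        ))
    }
}
```

Because the check returns a plain error string, the same message could be appended to a planner retry prompt, which is how the matrix describes retry-with-error-feedback (4.4).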

Domain 2: Tool Design & MCP Integration 18%

| Task | Principle | Status | Evidence |
|---|---|---|---|
| 2.1 | Clear tool descriptions with input formats and boundaries | N/A | Agents use Claude Code's built-in tools, not custom MCP tools |
| 2.2 | Structured error responses (isError, errorCategory, isRetryable) | N/A | MCP tools exist for external use, not internal agent orchestration |
| 2.3 | Scoped tool access per agent role | Implemented | agent.rs:562 allowed_tools. Skills restrict to 3-4 tools each |
| 2.3 | Too many tools degrades selection reliability | Implemented | Reviewer is read-only. Skills restrict to role-appropriate tools |
| 2.4 | MCP server scoping (project vs user level) | N/A | Foundry doesn't configure MCP servers for its agents |
| 2.4 | MCP resources as content catalogs | Implemented | mcp.rs:125 pattern catalog + mcp.rs:131 extension index as browsable MCP resources via foundry:// URIs |
| 2.5 | Effective use of built-in tools (Read, Write, Edit, Bash, Grep, Glob) | Implemented | Agent prompts guide tool selection per role |
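
The 2.3 rows rely on a per-role tool allowlist, with the reviewer kept read-only. A minimal sketch of that idea follows. The Role enum, the function names, and the exact tool lists are illustrative assumptions, not the contents of agent.rs; only the built-in tool names and the read-only reviewer come from the matrix.

```rust
/// Hypothetical subset of pipeline roles, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Role {
    Builder,
    Reviewer,
}

/// Tools each role may use. The lists are assumptions; the matrix only
/// states that the reviewer is read-only.
fn allowed_tools(role: Role) -> &'static [&'static str] {
    match role {
        Role::Builder => &["Read", "Write", "Edit", "Bash", "Grep", "Glob"],
        Role::Reviewer => &["Read", "Grep", "Glob"],
    }
}

/// True when `tool` is on the allowlist for `role`.
fn is_tool_permitted(role: Role, tool: &str) -> bool {
    allowed_tools(role).iter().any(|allowed| *allowed == tool)
}

/// Comma-separated form of the allowlist, e.g. for passing as one CLI argument.
fn allowed_tools_arg(role: Role) -> String {
    allowed_tools(role).join(",")
}
```

Keeping the allowlist in a single match expression means adding a role forces a compile-time decision about its tools, rather than leaving it to a prompt.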

Domain 3: Claude Code Configuration & Workflows 20%

| Task | Principle | Status | Evidence |
|---|---|---|---|
| 3.1 | CLAUDE.md hierarchy (user > project > directory) | Implemented | Agents inherit CLAUDE.md via normal loading. Foundry appends orchestration override |
| 3.1 | .claude/rules/ for path-scoped conventions | Implemented | 6 rule files with paths: frontmatter scoping |
| 3.1 | @import for modular CLAUDE.md | Not Used | Rules already split into .claude/rules/. Could be useful for extensions |
| 3.2 | Custom slash commands in .claude/commands/ | Not Used | Skills in .claude/skills/ serve this purpose instead |
| 3.2 | Skills with SKILL.md, context: fork, allowed-tools | Implemented | 3 skills (audit, scout, extract-patterns) with fork context and scoped tools |
| 3.3 | Path-specific rules with YAML frontmatter | Implemented | All 6 rule files use paths: frontmatter for conditional loading |
| 3.4 | Plan mode vs direct execution | Implemented | Planner stage IS plan mode. Complexity classifier can skip for simple tasks |
| 3.5 | Iterative refinement with concrete I/O examples | Implemented | Few-shot severity examples in reviewer. JSON template in pattern extractor |
| 3.5 | Test-driven iteration | Implemented | Builder runs tests. Reviewer re-runs independently. Fixer iterates on failures |
| 3.6 | CI/CD integration (--output-format json) | Implemented | --output-format json on foundry run --no-tui. SessionReport with tasks/session/config. Schema: docs/ci-output-schema.json |
| 3.6 | Session context isolation -- fresh reviewer | Implemented | Core design principle. Verify agent is a completely separate CLI invocation |

Domain 4: Prompt Engineering & Structured Output 20%

| Task | Principle | Status | Evidence |
|---|---|---|---|
| 4.1 | Explicit criteria over vague instructions | Implemented | Reviewer defines severity criteria with examples. "What to report" and "what to skip" lists |
| 4.1 | Explicit criteria reduce false positives | Implemented | Categorical criteria, not confidence-based filtering |
| 4.2 | Few-shot examples for output consistency | Implemented | Reviewer severity examples. Pattern extractor JSON template. Build-claims format |
| 4.2 | Few-shot for ambiguous-case handling | Implemented | prompts.rs:572-602 three borderline severity examples: unchecked file read = HIGH, test-only return value = LOW, unwrap on constant = SKIP |
| 4.3 | Structured output via tool_use with JSON schemas | N/A | Uses Claude Code CLI, not the API |
| 4.4 | Retry-with-error-feedback | Implemented | Gate failure triggers planner retry with validation error appended. Agent timeout retry |
| 4.4 | Feedback loops -- tracking which patterns trigger findings | Implemented | "Applied" counter tracks patterns in agent output. Frequency 3+ auto-promotes |
| 4.5 | Batch processing (Message Batches API) | N/A | Sequential processing via CLI, not API batch endpoint |
| 4.6 | Multi-instance review -- independent reviewer | Implemented | Core architecture. Fresh CLI invocation with zero shared context from builder |
| 4.6 | Multi-pass review (per-file + cross-file integration) | Implemented | review.rs:269 run_multipass_review splits into per-file analysis + cross-file integration pass when files exceed review_multipass_threshold (default 8) |
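
The multi-pass rule in 4.6 can be illustrated with a small planning function: at or below the threshold a changeset gets one review pass, and above it each file gets its own pass followed by a cross-file integration pass. This is a sketch under assumptions, not the run_multipass_review implementation from review.rs; the ReviewPass type and plan_review_passes name are hypothetical, and "exceed" is read as strictly greater than the threshold.

```rust
/// Hypothetical description of one review pass.
#[derive(Debug, PartialEq, Eq)]
enum ReviewPass {
    /// Small changeset: review all files together.
    SinglePass(Vec<String>),
    /// Large changeset: analyse one file in isolation.
    PerFile(String),
    /// Large changeset: final pass over interactions between files.
    CrossFile(Vec<String>),
}

/// Decide which passes to run for `files` given a multipass threshold
/// (the matrix gives 8 as the default for review_multipass_threshold).
fn plan_review_passes(files: &[String], threshold: usize) -> Vec<ReviewPass> {
    // At or below the threshold, a single pass is enough.
    if files.len() <= threshold {
        return vec![ReviewPass::SinglePass(files.to_vec())];
    }

    // Otherwise: one pass per file, then one integration pass over all files.
    let mut passes: Vec<ReviewPass> = files
        .iter()
        .cloned()
        .map(ReviewPass::PerFile)
        .collect();
    passes.push(ReviewPass::CrossFile(files.to_vec()));
    passes
}
```

With the default threshold of 8, a nine-file changeset would yield ten passes under this reading: nine per-file passes plus one cross-file pass.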

Domain 5: Context Management & Reliability 15%

| Task | Principle | Status | Evidence |
|---|---|---|---|
| 5.1 | Lost-in-the-middle effect mitigation | Implemented | Scout report: Key Facts first (beginning bias), Risks last (recency bias) |
| 5.1 | Trimming verbose tool output before context accumulation | Implemented | agent.rs:1505 truncate_for_preview trims tool output to 200 chars. Build/test output trimmed between builder and reviewer stages |
| 5.1 | Persistent structured state outside conversation history | Implemented | .buildloop/ files persist scout report, plan, claims, review across context boundaries |
| 5.2 | Escalation patterns (human-in-the-loop) | Implemented | Review mode pauses for approval. WIP commits + GitHub issues escalate failures |
| 5.2 | Escalation on inability to progress, not just complexity | Implemented | Verify failure after fixer retry = WIP + issue. Discovery backs off when nothing found |
| 5.3 | Structured error propagation across multi-agent systems | Implemented | context.rs:35 StageResult struct with failure_type, attempted_action, partial_results, suggestions. Fixer receives structured context |
| 5.3 | Distinguish access failures from valid empty results | Implemented | context.rs:13 FailureType enum: Timeout, Crash, GateFail, ReviewFail, RateLimited, StopRequested |
| 5.4 | Context degradation in extended sessions | Implemented | Each agent is a fresh session. Long sessions are architecturally impossible |
| 5.4 | Scratchpad files for persisting findings | Implemented | .buildloop/ artifacts are exactly this pattern |
| 5.4 | Crash recovery via structured state exports | Implemented | TASKS.md SPID progress + .buildloop/ artifacts survive crashes |
| 5.5 | Human review workflows and confidence calibration | Implemented | Review mode creates PRs with polling. review.rs:714 confidence scores (0.0-1.0) with config.rs:199 configurable threshold |
| 5.5 | Confidence scores for calibrated review routing | Implemented | review.rs:714-758 per-finding confidence (0.0-1.0). Below confidence_threshold (default 0.5) flagged for manual review, not auto-fixed |
| 5.6 | Information provenance in multi-source synthesis | Implemented | review.rs:675 source_evidence field on every finding: snippet, line_range, reasoning chain. Fixer receives full provenance |
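
The 5.3 rows name a StageResult struct and a FailureType enum. The sketch below shows one way those could fit together and be rendered as context for a downstream stage. The variant and field names come from the matrix; the field types, the Option wrapper, and the to_fixer_context method are assumptions, not the definitions in context.rs.

```rust
/// Failure categories as listed in the matrix.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FailureType {
    Timeout,
    Crash,
    GateFail,
    ReviewFail,
    RateLimited,
    StopRequested,
}

/// Structured result passed between stages. Field types are assumed.
#[derive(Debug, Clone)]
struct StageResult {
    /// None means the stage succeeded (an assumption of this sketch).
    failure_type: Option<FailureType>,
    attempted_action: String,
    partial_results: Vec<String>,
    suggestions: Vec<String>,
}

impl StageResult {
    /// Hypothetical helper: render the result as line-oriented text that a
    /// downstream agent prompt could include verbatim.
    fn to_fixer_context(&self) -> String {
        let mut out = String::new();
        match self.failure_type {
            Some(kind) => out.push_str(&format!("failure_type: {:?}\n", kind)),
            None => out.push_str("failure_type: none\n"),
        }
        out.push_str(&format!("attempted_action: {}\n", self.attempted_action));
        for item in &self.partial_results {
            out.push_str(&format!("partial_result: {}\n", item));
        }
        for item in &self.suggestions {
            out.push_str(&format!("suggestion: {}\n", item));
        }
        out
    }
}
```

Carrying the failure category as an enum rather than free text lets the receiving stage tell a timeout or rate limit apart from a genuine review failure without parsing prose.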

Resolved Opportunities (Phase 13 -- CCA Alignment)

  1. Structured error propagation (5.3) -- T13.1
    StageResult struct with FailureType enum passes structured failure context (what failed, what was attempted, partial results, suggestions) between pipeline stages.
  2. Multi-pass review for large changesets (4.6) -- T13.2/T13.3
    run_multipass_review splits into per-file analysis passes + cross-file integration pass when changeset exceeds review_multipass_threshold (default 8 files).
  3. Verbose tool output trimming (5.1) -- T13.3
    truncate_for_preview trims tool output to 200 chars. Build/test results trimmed between builder and reviewer to prevent context bloat.
  4. MCP resources for content catalogs (2.4) -- T13.4
    Pattern catalog (foundry://patterns/catalog) and extension index (foundry://extensions/index) exposed as browsable MCP resources.
  5. Few-shot borderline severity examples (4.2) -- T13.5
    Three borderline classification examples in reviewer prompt: unchecked file read = HIGH, test-only return value = LOW, unwrap on constant = SKIP.
  6. Reviewer finding provenance (5.6) -- T13.6
    source_evidence field on every finding: code snippet, line range, and reasoning chain. Fixer receives full provenance for targeted fixes.
  7. Confidence scores on findings (5.5) -- T13.7
    Per-finding confidence (0.0-1.0) with a configurable confidence_threshold (default 0.5). Findings below the threshold are flagged for manual review instead of being auto-fixed.
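
Item 7 above amounts to a routing decision per finding. A minimal sketch, assuming a hypothetical Finding type and route_finding function (the real logic lives around review.rs:714-758 and may differ):

```rust
/// Hypothetical reviewer finding; only the confidence field matters here.
#[derive(Debug, Clone, PartialEq)]
struct Finding {
    summary: String,
    /// Reviewer confidence in the range 0.0-1.0.
    confidence: f32,
}

/// Where a finding goes after review.
#[derive(Debug, PartialEq, Eq)]
enum Route {
    AutoFix,
    ManualReview,
}

/// Route a finding by comparing its confidence with the threshold
/// (the matrix gives 0.5 as the default).
fn route_finding(finding: &Finding, confidence_threshold: f32) -> Route {
    // Written as a negated `>=` so that a NaN confidence, which compares
    // false against everything, falls through to manual review.
    if !(finding.confidence >= confidence_threshold) {
        Route::ManualReview
    } else {
        Route::AutoFix
    }
}
```

A finding exactly at the threshold is auto-fixed in this sketch, matching the "below the threshold" wording; whether the real comparison is inclusive is not stated in the matrix.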