contextfoundry.dev · May 2026
Same Anthropic Skills format, different layer of the stack.
An honest, in-depth side-by-side of Jesse Vincent's Superpowers Claude Code plugin and Context Foundry. Architecture, skill systems, methodology coverage, token economics, and honest gap analysis in both directions.
Superpowers is a Claude Code plugin that bolts a software-engineering methodology onto a turn-based conversational session. Context Foundry is a standalone Rust binary that runs an autonomous build loop against your codebase across many tasks without supervision. They are not competitors. They live at different layers of the same stack and could be run together.
Superpowers: a methodology layer that lives inside an interactive Claude Code session. It ships 14 composable skills plus 1 master dispatcher that fires automatically and routes the agent through TDD, brainstorming, plan-writing, subagent dispatch, code review, and worktree-based parallelism.
Context Foundry: a 9-stage autonomous build loop (Q→R→P→P+→B→A→SHIP→DISCOVER→SKILLS) that reads a TASKS.md queue and ships feat() commits unattended. A hybrid skill retriever (BM25 + local Ollama nomic-embed + telemetry) ranks 271 skills per stage. It also ships a TUI dashboard and multi-provider routing.
The diagram below is the single most important point of this comparison. Superpowers steers what one agent does on one turn. Context Foundry orchestrates many agents across many tasks, each running in its own fresh context. They overlap only at the skill-format layer.
This is why a fair comparison cannot pick a winner: they run at different layers. The interesting question is not "which is better," it is "where does each contribute most?"
The verdict column says what each tool actually does, not what it could in theory do. ✓ means the capability is shipped and load-bearing. ° means partial or different mechanism. − means not present.
| Capability | Superpowers | Context Foundry |
|---|---|---|
| Form factor | Claude Code plugin (markdown skills) | Standalone Rust binary + TUI |
| Execution model | Turn-based, human in the loop | Autonomous, queue-driven, unattended |
| Time horizon | One session, minutes to hours | Multiple tasks, hours to days |
| Skill format | Anthropic Skills SKILL.md ✓ | Anthropic Skills SKILL.md ✓ |
| Skill catalog size | 14 curated skills (methodology-focused) | 271 shipped + user-extensible (pitfall-focused) |
| Skill activation | Master "using-superpowers" dispatcher reads request, picks skills via when_to_use metadata | Hybrid retriever per stage: BM25 + nomic-embed cosine + telemetry success-rate boost |
| Master / dispatcher skill | using-superpowers: runs every conversation, under 2K tokens, routes to skills ✓ | No equivalent; routing is per-stage by the retriever − |
| TDD enforcement | test-driven-development skill enforces red/green/refactor; tests must fail before impl ✓ | No dedicated TDD skill in shipped catalog − |
| Systematic debugging | systematic-debugging: 4-phase root-cause methodology; architectural review trigger after 3 failed fixes ✓ | No dedicated debugging methodology skill − |
| Brainstorming / requirements | brainstorming: Socratic refinement, design sections for validation ✓ | QUERY stage: clarifying questions before research ° (different mechanism) |
| Plan writing | writing-plans skill: structured implementation plan ✓ | PLAN stage produces current-plan.md with file:line operations ✓ |
| Plan review | Code review post-implementation only ° | P+ (plan-review) iterates the plan before BUILD; depth scales by complexity tier (1/2/3) ✓ |
| Plan execution | executing-plans with batched checkpoints ✓ | BUILD stage; verification commands per plan ✓ |
| Code review | requesting-code-review + receiving-code-review + code-reviewer agent ✓ | AUDIT stage: fresh-context agent reads build-claims, greps diff, verifies CHECK lines ✓ |
| Fresh-context per stage | Subagent dispatch isolates implementation tokens ✓ | Every stage spawns a clean Claude/Codex process ✓ |
| Subagent dispatch | subagent-driven-development + dispatching-parallel-agents ✓ | Dual-model arena + per-stage agents; not user-invoked parallel ° |
| Git worktrees | using-git-worktrees: parallel tasks without clobbering ✓ | Worktrees for dual-model arena and parallel agents ✓ |
| Auto-commit per task | finishing-a-development-branch skill assists, user-driven ° | SHIP stage commits feat() or WIP() based on audit verdict ✓ |
| Task queue / multi-task | No queue — one task per conversation − | TASKS.md queue with QRPBA progress indicators ✓ |
| Discovery / next-task | Not present − | DISCOVER stage scans codebase, appends new tasks ✓ |
| Skill authoring | writing-skills: TDD applied to skill docs ✓ | Pattern extractor writes SKILL.md after each task ✓ |
| Cross-tool reach | Ships .claude-plugin/, .codex-plugin/, .cursor-plugin/, .opencode/; 7+ tools ✓ | Cross-provider runtime (Claude/Codex/Copilot/local); reads external skill formats (AGENTS.md, .cursorrules, .github/copilot-instructions.md) ✓ |
| Vector embedding | Not used; activation is via when_to_use semantic matching by the host model − | Local Ollama nomic-embed-text (137M params, 768-dim, ~50ms/call, on-device) ✓ |
| Telemetry / learn loop | Behavioral pressure-tests to keep skills firing under load ° | Citation scanner + SQLite sidecar: skills cited in passing builds gain rank; failures decay ✓ |
| Token economics | "VERY token light" — master skill <2K tokens; subagents absorb implementation cost ✓ | Heavier per task (~$28 per [Complex] task on the May 2026 overnight run) but unattended ° |
| Live dashboard / TUI | None; lives in the terminal Claude Code is running in − | Ratatui TUI with clickable AI summaries on every pane ✓ |
| Per-stage model routing | One model per session − | stage_overrides in .foundry.json — e.g. Claude on PLAN, Codex on BUILD ✓ |
| License | MIT | MIT |
| Install | /plugin install superpowers@claude-plugins-official | cargo install foundry / npm i -g context-foundry / brew / winget |
| Adoption (May 2026) | 645,146 installs | v3.3.0 just shipped; significantly smaller install base |
Superpowers' value prop is methodology: it teaches Claude how to do software development (brainstorming, TDD, debugging, code review) on every turn. Context Foundry's value prop is orchestration: it spends its tokens on running stages in the right order. The two cover the same methodology surface unevenly, as the table below shows.
| Methodology | Superpowers | Context Foundry | Honest verdict |
|---|---|---|---|
| Brainstorming / clarify requirements | Socratic, multi-turn, design-doc style | QUERY stage writes clarifying questions to disk | Both cover this. Superpowers is more conversational; CF is more artifact-driven. |
| Plan writing | writing-plans skill | PLAN + P+ stages with verification matrix | CF is more rigorous here: P+ rejects plans that don't grep against actual files. |
| TDD (red/green/refactor) | test-driven-development: tests must fail before code | Not enforced as a discrete skill in the catalog | Real gap in CF. A test-first skill would slot naturally into the BUILD stage. |
| Systematic debugging | 4-phase: investigate root cause, pattern analysis, hypothesis test, implement; architectural review after 3 failed attempts | The AUDIT stage flags WIP() commits; no proactive debugging methodology | Real gap in CF. Superpowers' debugging discipline is one of its sharpest contributions. |
| Code review | requesting-code-review + receiving-code-review + code-reviewer agent | AUDIT stage with fresh-context agent verifying CHECK lines against the diff | Different rhythms. Superpowers' review is conversational; CF's is binary pass/fail driving feat() vs WIP(). |
| Worktree-based parallelism | using-git-worktrees: user-invoked branching for parallel tasks | Dual-model arena uses worktrees for A/B model comparison | Both lean on worktrees. Different use cases: parallel tasks (SP) vs parallel models on one task (CF). |
| Skill authoring | writing-skills: behavioral pressure-tests for skill docs | Pattern extractor agent writes SKILL.md from task artifacts | Different angles. Superpowers teaches authoring discipline; CF auto-extracts from completed work. |
| Multi-task queue + autonomy | Not in scope — one task per session | Full TASKS.md pipeline running unattended overnight | Real gap in Superpowers — but it is by design. Superpowers is a methodology, not an orchestrator. |
| Per-task verdict + auto-commit | User commits at end of session | SHIP stage emits feat() or WIP() per task based on AUDIT | Real gap in Superpowers by the same logic: no autonomous git surface. |
| Cross-run learning | No mechanism — skills are static | Citation telemetry; skills cited in passing builds gain rank for next task | Real gap in Superpowers. A static skill library doesn't get better the more you use it. |
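The pass/fail rhythm of the AUDIT and SHIP rows can be sketched in a few lines. This is a minimal illustration, not Context Foundry's implementation: the `AuditResult` type, the `audit` function, and the idea that a CHECK line is a regular expression searched in the added lines of a diff are all assumptions made for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class AuditResult:
    passed: list[str]
    failed: list[str]

    @property
    def commit_prefix(self) -> str:
        # Binary verdict: every check must pass (and at least one must exist)
        # for the task to ship as feat; anything else is recorded as WIP.
        return "feat" if self.passed and not self.failed else "WIP"


def audit(check_patterns: list[str], diff: str) -> AuditResult:
    """Verify each (hypothetical) CHECK pattern against the lines a diff adds."""
    added = [
        line[1:]
        for line in diff.splitlines()
        if line.startswith("+") and not line.startswith("+++")
    ]
    added_text = "\n".join(added)
    result = AuditResult(passed=[], failed=[])
    for pattern in check_patterns:
        bucket = result.passed if re.search(pattern, added_text) else result.failed
        bucket.append(pattern)
    return result


DIFF = """\
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -1,1 +1,2 @@
 pub fn add(a: i32, b: i32) -> i32 { a + b }
+pub fn sub(a: i32, b: i32) -> i32 { a - b }
"""

ok = audit([r"pub fn sub\("], DIFF)
partial = audit([r"pub fn sub\(", r"pub fn mul\("], DIFF)
print(ok.commit_prefix, partial.commit_prefix)  # feat WIP
```

The point of the sketch is the shape of the decision: the auditor only sees the claims and the diff, and a single unverified claim is enough to downgrade the commit.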
Both projects build on Anthropic's Agent Skills specification (SKILL.md with YAML frontmatter, progressive disclosure, name + description as the activation key). The interesting differences are in how the skills are ranked and surfaced.
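To make "name + description as the activation key" concrete, here is a small sketch of reading only the frontmatter of a SKILL.md and leaving the body unread, which is the essence of progressive disclosure. The skill content and the `parse_skill_header` helper are hypothetical, and the parser handles only flat `key: value` pairs rather than full YAML.

```python
from dataclasses import dataclass


@dataclass
class SkillHeader:
    name: str
    description: str


def parse_skill_header(text: str) -> SkillHeader:
    """Read the frontmatter of a SKILL.md; the body below it is never touched."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter fence")
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":  # closing fence: stop before the body
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    else:
        raise ValueError("frontmatter fence is never closed")
    return SkillHeader(name=fields["name"], description=fields["description"])


# Hypothetical skill file, invented for this example.
EXAMPLE = """---
name: example-skill
description: Use when a test fails intermittently on CI
---
# Example skill

Long instructions that are only loaded once the skill is activated.
"""

header = parse_skill_header(EXAMPLE)
print(header.name, "->", header.description)
# example-skill -> Use when a test fails intermittently on CI
```

Whatever does the ranking, whether a dispatcher prompt or a retriever, only needs these two fields to decide if the full body is worth loading.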
One master skill, using-superpowers, loads on every conversation. It is under 2K tokens. It reads the user's request and decides which of the 14 skills to invoke. Activation is via the host model's own pattern-matching against each skill's when_to_use metadata. There is no retriever and no vector store — the orchestration logic lives in the dispatcher's prose. From Simon Willison's writeup:
"VERY token light… one doc of fewer than 2k tokens. It uses subagents to manage token-heavy stuff, including all the actual implementation. As it needs bits of the process, it runs a shell script to search for them."
— Simon Willison on Superpowers
The catalog is small and curated: 14 skills, each manually authored, covering methodology rather than domain knowledge.
Context Foundry ships 271 skills by default. For each stage of each task, a three-signal retriever ranks every candidate:
- BM25 lexical score.
- Cosine similarity on local Ollama nomic-embed-text embeddings.
- Telemetry: skills cited in feat() commits rank higher than those cited in WIP() commits.

The top N (default 10, tunable) are injected into that stage's prompt. The retriever runs for every stage (QUERY, RESEARCH, PLAN, P+, BUILD, AUDIT, SHIP, DISCOVER, SKILLS), filtered softly by the cf-stage hint in each skill's metadata.
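A three-signal ranking of this kind can be sketched as a weighted sum. The text does not say how Context Foundry combines its signals, so the weights, the smoothing of the success rate, the toy 3-dimensional embeddings, and the skill names below are all assumptions chosen for illustration.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str
    embedding: list[float]   # toy vector; real nomic-embed-text vectors are 768-dim
    feat_citations: int = 0  # citations in builds that shipped as feat()
    wip_citations: int = 0   # citations in builds that shipped as WIP()


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avg_len = (sum(len(t) for t in tokenized) / n) or 1.0
    terms = set(query.lower().split())
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in terms:
            df = sum(1 for t in tokenized if term in t)
            if df == 0 or tf[term] == 0:
                continue
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query, query_embedding, skills, top_n=10, weights=(0.4, 0.4, 0.2)):
    lexical = bm25_scores(query, [s.description for s in skills])
    max_lex = max(lexical) or 1.0  # normalise BM25 into [0, 1]
    scored = []
    for skill, lex in zip(skills, lexical):
        dense = cosine(query_embedding, skill.embedding)
        # Smoothed success rate: skills with no history start at 0.5.
        success = (skill.feat_citations + 1) / (skill.feat_citations + skill.wip_citations + 2)
        total = weights[0] * lex / max_lex + weights[1] * dense + weights[2] * success
        scored.append((total, skill.name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_n]]


skills = [
    Skill("css-grid-layout", "css grid layout pitfalls", [0.0, 0.2, 0.9]),
    Skill("flaky-tests-b", "fix flaky cargo test failures", [0.9, 0.1, 0.0], 1, 8),
    Skill("flaky-tests-a", "fix flaky cargo test failures", [0.9, 0.1, 0.0], 8, 1),
]
ranking = rank("flaky cargo test", [1.0, 0.0, 0.0], skills, top_n=2)
print(ranking)  # ['flaky-tests-a', 'flaky-tests-b']
```

The two flaky-test skills are identical on the lexical and dense signals, so the telemetry term alone decides their order, which is the behaviour the third signal exists to provide.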
Both treat the skill body as the unit of progressive disclosure. Both keep the catalog separate from the activation logic. Both use Anthropic's spec so the on-disk content is portable between tools. Critically: a Superpowers skill could be dropped into ~/.foundry/skills/ and the Context Foundry retriever would pick it up for any stage where its description matched. The portability is real, not theoretical.
The two projects make different bets about how to spend tokens.
Superpowers' bet: minimize the methodology overhead per turn so the user-facing budget is mostly implementation.
Mechanism: master skill is <2K tokens; subagents take the implementation cost; skills load on demand via runtime search.
Implication: works inside any Claude Code session without changing your billing pattern materially.
Context Foundry's bet: spend whatever a queue takes overnight, because the alternative is paying a human to babysit the run.
Mechanism: 9 stages per task, each a fresh-context agent invocation; ~$28 average per [Complex] task on the May 2026 overnight run.
Implication: per-task cost is visibly higher, but the time the user spends is approximately zero between invocation and the morning's commits.
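As a rough worked example of what that bet costs, the arithmetic below prices a hypothetical 10-task overnight queue. It assumes every task is tagged [Complex] and lands exactly on the quoted ~$28 average, which a real queue would not.

```python
COST_PER_COMPLEX_TASK_USD = 28.0  # average quoted for the May 2026 overnight run
STAGES_PER_TASK = 9               # one fresh-context agent invocation per stage


def queue_cost(n_tasks: int, cost_per_task: float = COST_PER_COMPLEX_TASK_USD) -> float:
    """Estimated spend for a queue, assuming every task costs the average."""
    return n_tasks * cost_per_task


def queue_invocations(n_tasks: int, stages: int = STAGES_PER_TASK) -> int:
    """Number of agent invocations if every task runs every stage once."""
    return n_tasks * stages


print(queue_cost(10))         # 280.0
print(queue_invocations(10))  # 90
```

Under those assumptions, ten tasks come to about $280 and 90 agent invocations, set against roughly zero hours of supervised time.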
This is not a fair fight on raw tokens, and it shouldn't be. Superpowers is optimizing for the unit "how much extra do I pay to get methodology added to a session I was going to have anyway?" Context Foundry is optimizing for "how much do I pay to skip the session entirely?"
Both ship beyond a single vendor, but in different ways.
| Aspect | Superpowers | Context Foundry |
|---|---|---|
| Where the skills live | Ships per-tool wrapper dirs: .claude-plugin/, .codex-plugin/, .cursor-plugin/, .opencode/, .codex-app/, plus Factory Droid, Gemini CLI, GitHub Copilot CLI integrations | Skills live in ~/.foundry/skills/ and plugins/<name>/skills/. The tools are invoked by Foundry, not the other way around |
| Reads external skill formats | Authors its own format and pushes it cross-tool | Discovers AGENTS.md (Linux Foundation standard), .cursorrules, .claude/skills/<topic>/SKILL.md, .github/copilot-instructions.md — with per-source opt-in on the startup screen |
| Direction of integration | "My skills should work in your tool" | "My pipeline can call your tool, and read your skills" |
| Per-stage model routing | One model per session | Different model per pipeline stage via stage_overrides — e.g. Claude Opus on PLAN, Codex on BUILD, local model on DISCOVER |
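Per-stage routing is easiest to picture as a lookup with a fallback. In the sketch below only the `stage_overrides` key and the stage names come from this page; the `default_model` key, the model identifiers, and the overall file layout are hypothetical and may not match the real .foundry.json schema.

```python
import json

STAGES = ["QUERY", "RESEARCH", "PLAN", "P+", "BUILD", "AUDIT", "SHIP", "DISCOVER", "SKILLS"]

# Hypothetical config text; the real .foundry.json schema may differ.
CONFIG_TEXT = """
{
  "default_model": "claude",
  "stage_overrides": {
    "PLAN": "claude-opus",
    "BUILD": "codex",
    "DISCOVER": "local"
  }
}
"""


def resolve_models(config_text: str) -> dict[str, str]:
    """Map every pipeline stage to a model, falling back to the default."""
    config = json.loads(config_text)
    overrides = config.get("stage_overrides", {})
    unknown = set(overrides) - set(STAGES)
    if unknown:
        # Fail loudly on typos rather than silently ignoring an override.
        raise ValueError(f"unknown stages in stage_overrides: {sorted(unknown)}")
    default = config["default_model"]
    return {stage: overrides.get(stage, default) for stage in STAGES}


models = resolve_models(CONFIG_TEXT)
print(models["PLAN"], models["BUILD"], models["AUDIT"])  # claude-opus codex claude
```

Stages without an override, such as AUDIT here, simply inherit the default, so a project can start with one model everywhere and override only where a different model is worth it.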
This is the most interesting scenario and the one that argues hardest against treating them as competitors.
Context Foundry's BUILD and AUDIT stages spawn Claude Code processes. If the user has installed Superpowers, those spawned processes inherit Superpowers' master skill and all 14 of its methodology skills. Foundry's pipeline gets TDD discipline, four-phase debugging, and the code-reviewer agent for free.
Two specific gaps in Context Foundry's catalog — explicit TDD enforcement and a systematic debugging methodology — are exactly where Superpowers is strongest. A user running Foundry on a TDD-shaped codebase, with Superpowers installed in their global Claude Code config, would get something closer to the union of both projects' strengths than either alone:
- Superpowers skills can also be dropped into ~/.foundry/skills/, since both tools use the same SKILL.md format.

This is the practical answer to "which should I install?" Both, if your workflow has room for them. Treat the layer diagram at the top as the seating chart: they sit in different chairs.
Where Context Foundry falls short:
1. No dedicated TDD skill in the shipped 271. The catalog has many "test failure" pitfalls but no "tests must fail before implementation" enforcement skill. Adding one would slot naturally into BUILD via the cf-stage: build hint.
2. No four-phase debugging methodology. AUDIT catches failures post-hoc; Superpowers' systematic-debugging drives the agent to investigate root cause before patching. CF would benefit from a sibling AUDIT-time skill that scans for symptom-treating fixes.
3. No subagent dispatch from within a stage. Foundry's stages are themselves the subagents; there's no idiom for a stage to fan out further. Superpowers' dispatching-parallel-agents covers a use case (e.g. "review four files in parallel") that Foundry doesn't have a primitive for.
4. No conversational brainstorming. QUERY writes questions to disk; it does not maintain a Socratic dialogue with the user. For ill-specified tasks, Superpowers' multi-turn brainstorming may produce a better starting plan.
5. No skill-author discipline. The pattern extractor writes SKILL.md mechanically; Superpowers' writing-skills applies TDD principles to the skill itself, including behavioral pressure-tests.
Where Superpowers falls short:
1. No autonomy. Superpowers is bound to the rhythm of a human turn. It cannot run overnight against a queue. This is by design (it is a methodology layer, not an orchestrator), but it means "ship 10 tasks while I sleep" is not in scope.
2. No queue, no DISCOVER stage, no auto-commit verdict. Each session ends when the user closes it. There is no equivalent to TASKS.md, no QRPBA progress indicators, and no programmatic feat() vs WIP() decision based on a fresh-context audit.
3. No retriever. Skill activation relies on the host model's own pattern-matching against when_to_use. This is elegant at 14 skills and probably saturates at a few dozen; it would not scale to a 271-skill catalog. Foundry's BM25 + dense + telemetry combination is the answer to "what happens when the catalog grows to hundreds of skills?"
4. No telemetry feedback loop. Skills don't learn from outcomes. A skill that fires every conversation but never contributes to a passing build will keep firing forever. Foundry's citation scanner is the closed-loop version of the same idea.
5. No per-stage model routing. Superpowers runs against whatever model the host CLI uses. Foundry's stage_overrides mean a project can put Claude Opus on PLAN, Codex on BUILD, and a local model on DISCOVER — routing by what each model is best at.
6. No live dashboard. Output is whatever the host terminal shows. Foundry's TUI surfaces pipeline state, AI summaries on every pane, hover tooltips, click-through to artifacts — the user can watch an autonomous run without reading the raw log.
TASKS.md level.
The dominant framing on AI coding tools right now is competitive: which plugin is best, which agent wins, which framework will beat which. That framing is wrong for these two projects. Superpowers and Context Foundry sit at different layers of the same stack, both build on Anthropic's Skills format, both treat fresh-context discipline as load-bearing, and both ship under MIT.
The interesting comparison is not which one but which combination. The honest answer is that Context Foundry is stronger at orchestration and weaker at methodology; Superpowers is the inverse. If your workflow has room for both, run both. If it doesn't, the layer diagram at the top of this page tells you where each one fits.
Two projects, same skill format, complementary scope. The Anthropic Skills format being shared between them is not a coincidence — it's what makes the seating chart legible. That portability is itself the most important thing in this comparison.