Superpowers vs Context Foundry

TL;DR

Superpowers is a Claude Code plugin that bolts a software-engineering methodology onto a turn-based conversational session. Context Foundry is a standalone Rust binary that runs an autonomous build loop against your codebase across many tasks without supervision. They are not competitors. They live at different layers of the same stack and could be run together.

Plugin · v5.1.0 · 645K installs

Superpowers

A methodology layer that lives inside an interactive Claude Code session. 14 composable skills + 1 master dispatcher that fires automatically and routes the agent through TDD, brainstorming, plan-writing, subagent dispatch, code review, and worktree-based parallelism.

MIT · Anthropic Skills format · turn-based collaboration

Binary · v3.3.0 · self-hosted

Context Foundry

A 9-stage autonomous build loop (Q→R→P→P+→B→A→SHIP→DISCOVER→SKILLS) that reads a TASKS.md queue and ships feat() commits unattended. Hybrid skill retriever (BM25 + local Ollama nomic-embed + telemetry) ranks 271 skills per stage. TUI dashboard, multi-provider routing.

MIT · Anthropic Skills format · autonomous pipeline

Layer diagram — where each sits

The diagram below is the single most important point of this comparison. Superpowers steers what one agent does on one turn. Context Foundry orchestrates many agents across many tasks, each running in its own fresh context. They overlap only at the skill-format layer.

┌──────────────────────────────────────────────────────────────────────────┐
│ Layer 5 — Multi-task autonomy                                            │
│ Reads TASKS.md, runs stages, commits per task, learns across runs.       │
│                                                                          │
│ [ Context Foundry ] ← pipeline + retriever + telemetry + TUI             │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────────────┐
│ Layer 4 — Per-stage agent invocation                                     │
│ Spawns Claude / Codex / Copilot in PTY, fresh-context per stage.         │
│                                                                          │
│ [ Context Foundry — Builder, Auditor, etc. ]                             │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────────────┐
│ Layer 3 — In-session methodology                                         │
│ Inside one Claude session: brainstorm, plan, TDD, review, debug.         │
│                                                                          │
│ [ Superpowers ] ← dispatcher + composable skills                         │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────────────┐
│ Layer 2 — Skill / knowledge format                                       │
│                                                                          │
│ [ Anthropic Agent Skills (SKILL.md) ] ← both build on this               │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────────────┐
│ Layer 1 — Foundation model + CLI                                         │
│ [ Claude Code / Codex / Copilot / Gemini / Cursor / OpenCode ]           │
└──────────────────────────────────────────────────────────────────────────┘

This is why a fair comparison cannot pick a winner: they run at different layers. The interesting question is not "which is better," it is "where does each contribute most?"

Side-by-side feature matrix

The marks in each cell say what each tool actually does, not what it could do in theory. ✓ means the capability is shipped and load-bearing. ° means partial, or achieved through a different mechanism. − means not present.

Capability	Superpowers	Context Foundry
Form factor	Claude Code plugin (markdown skills)	Standalone Rust binary + TUI
Execution model	Turn-based, human in the loop	Autonomous, queue-driven, unattended
Time horizon	One session, minutes to hours	Multiple tasks, hours to days
Skill format	Anthropic Skills SKILL.md ✓	Anthropic Skills SKILL.md ✓
Skill catalog size	14 curated skills (methodology-focused)	271 shipped + user-extensible (pitfall-focused)
Skill activation	Master "using-superpowers" dispatcher reads request, picks skills via `when_to_use` metadata	Hybrid retriever per stage: BM25 + nomic-embed cosine + telemetry success-rate boost
Master / dispatcher skill	`using-superpowers` — runs every conversation, under 2K tokens, routes to skills ✓	No equivalent — routing is per-stage by the retriever −
TDD enforcement	`test-driven-development` skill enforces red/green/refactor; tests must fail before impl ✓	No dedicated TDD skill in shipped catalog −
Systematic debugging	`systematic-debugging`: 4-phase root-cause methodology; architectural review trigger after 3 failed fixes ✓	No dedicated debugging methodology skill −
Brainstorming / requirements	`brainstorming`: Socratic refinement, design sections for validation ✓	QUERY stage: clarifying questions before research ° (different mechanism)
Plan writing	`writing-plans` skill — structured implementation plan ✓	PLAN stage produces `current-plan.md` with file:line operations ✓
Plan review	Code review post-implementation only °	P+ (plan-review) iterates the plan before BUILD; depth scales by complexity tier (1/2/3) ✓
Plan execution	`executing-plans` with batched checkpoints ✓	BUILD stage; verification commands per plan ✓
Code review	`requesting-code-review` + `receiving-code-review` + code-reviewer agent ✓	AUDIT stage: fresh-context agent reads build-claims, greps diff, verifies CHECK lines ✓
Fresh-context per stage	Subagent dispatch isolates implementation tokens ✓	Every stage spawns a clean Claude/Codex process ✓
Subagent dispatch	`subagent-driven-development` + `dispatching-parallel-agents` ✓	Dual-model arena + per-stage agents; not user-invoked parallel °
Git worktrees	`using-git-worktrees` — parallel tasks without clobbering ✓	Worktrees for dual-model arena and parallel agents ✓
Auto-commit per task	`finishing-a-development-branch` skill assists, user-driven °	SHIP stage commits `feat()` or `WIP()` based on audit verdict ✓
Task queue / multi-task	No queue — one task per conversation −	TASKS.md queue with QRPBA progress indicators ✓
Discovery / next-task	Not present −	DISCOVER stage scans codebase, appends new tasks ✓
Skill authoring	`writing-skills`: TDD applied to skill docs ✓	Pattern extractor writes SKILL.md after each task ✓
Cross-tool reach	Ships `.claude-plugin/`, `.codex-plugin/`, `.cursor-plugin/`, `.opencode/` — 7+ tools ✓	Cross-provider runtime (Claude/Codex/Copilot/local); reads external skill formats (AGENTS.md, .cursorrules, .github/copilot-instructions.md) ✓
Vector embedding	Not used — activation is via `when_to_use` semantic matching by the host model −	Local Ollama nomic-embed-text (137M params, 768-dim, ~50ms/call, on-device) ✓
Telemetry / learn loop	Behavioral pressure-tests to keep skills firing under load °	Citation scanner + SQLite sidecar: skills cited in passing builds gain rank; failures decay ✓
Token economics	"VERY token light" — master skill <2K tokens; subagents absorb implementation cost ✓	Heavier per task (~$28 per [Complex] task on the May 2026 overnight run) but unattended °
Live dashboard / TUI	None; lives in the terminal Claude Code is running in −	Ratatui TUI with clickable AI summaries on every pane ✓
Per-stage model routing	One model per session −	`stage_overrides` in `.foundry.json` — e.g. Claude on PLAN, Codex on BUILD ✓
License	MIT	MIT
Install	`/plugin install superpowers@claude-plugins-official`	`cargo install foundry` / `npm i -g context-foundry` / brew / winget
Adoption (May 2026)	645,146 installs	v3.3.0 just shipped; significantly smaller install base

Methodology coverage

Superpowers' value prop is methodology: it teaches Claude how to do software development (brainstorming, TDD, debugging, code review) on every turn. Context Foundry's value prop is orchestration: it spends its tokens on running stages in the right order. The two tools' strengths are unevenly distributed across the same surface, as the table below shows.

Methodology	Superpowers	Context Foundry	Honest verdict
Brainstorming / clarify requirements	Socratic, multi-turn, design-doc style	QUERY stage writes clarifying questions to disk	Both cover this. Superpowers is more conversational; CF is more artifact-driven.
Plan writing	`writing-plans` skill	PLAN + P+ stages with verification matrix	CF is more rigorous here — P+ rejects plans that don't grep against actual files.
TDD (red/green/refactor)	`test-driven-development` — tests must fail before code	Not enforced as a discrete skill in the catalog	Real gap in CF. A test-first skill would slot naturally into the BUILD stage.
Systematic debugging	4-phase: investigate root cause, pattern analysis, hypothesis test, implement; architectural review after 3 failed attempts	The AUDIT stage flags WIP() commits; no proactive debugging methodology	Real gap in CF. Superpowers' debugging discipline is one of its sharpest contributions.
Code review	`requesting-code-review` + `receiving-code-review` + code-reviewer agent	AUDIT stage with fresh-context agent verifying CHECK lines against the diff	Different rhythms. Superpowers' review is conversational; CF's is binary pass/fail driving `feat()` vs `WIP()`.
Worktree-based parallelism	`using-git-worktrees` — user-invoked branching for parallel tasks	Dual-model arena uses worktrees for A/B model comparison	Both lean on worktrees. Different use cases — parallel tasks (SP) vs parallel models on one task (CF).
Skill authoring	`writing-skills` — behavioral pressure-tests for skill docs	Pattern extractor agent writes SKILL.md from task artifacts	Different angles. Superpowers teaches authoring discipline; CF auto-extracts from completed work.
Multi-task queue + autonomy	Not in scope — one task per session	Full TASKS.md pipeline running unattended overnight	Real gap in Superpowers — but it is by design. Superpowers is a methodology, not an orchestrator.
Per-task verdict + auto-commit	User commits at end of session	SHIP stage emits `feat()` or `WIP()` per task based on AUDIT	Real gap in Superpowers by the same logic — no autonomous git surface.
Cross-run learning	No mechanism — skills are static	Citation telemetry; skills cited in passing builds gain rank for next task	Real gap in Superpowers. A static skill library doesn't get better the more you use it.
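
The binary pass/fail rhythm in the code-review and auto-commit rows can be made concrete with a small sketch. Everything about the file formats here is an assumption for illustration: the `CHECK: <regex>` line shape, the `audit` and `commit_subject` helper names, and the scope placed inside `feat()` are all hypothetical, and this is not Context Foundry's actual implementation.

```python
import re


def audit(claims: str, diff: str) -> tuple[bool, list[str]]:
    """Return (passed, unmet patterns).

    Assumes each claim is a line of the form `CHECK: <regex>` that must
    match somewhere in the diff. This line format is hypothetical.
    """
    patterns = [
        line.split(":", 1)[1].strip()
        for line in claims.splitlines()
        if line.startswith("CHECK:")
    ]
    unmet = [p for p in patterns if not re.search(p, diff, flags=re.MULTILINE)]
    # A build that makes no claims at all does not count as a pass.
    return (bool(patterns) and not unmet, unmet)


def commit_subject(scope: str, summary: str, passed: bool) -> str:
    """Pick the commit prefix from the audit verdict."""
    prefix = "feat" if passed else "WIP"
    return f"{prefix}({scope}): {summary}"


# Illustrative inputs: the diff adds the function but not the promised test.
claims = "CHECK: def parse_config\nCHECK: test_parse_config"
diff = "+def parse_config(path):\n+    return {}\n"

passed, unmet = audit(claims, diff)
print(commit_subject("config", "add parser", passed), "| unmet:", unmet)
```

The point of the sketch is the shape of the decision: the verdict comes from checking claims against the diff, not from the builder's own report of success.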

Skill system — the deep dive

Both projects build on Anthropic's Agent Skills specification (SKILL.md with YAML frontmatter, progressive disclosure, name + description as the activation key). The interesting differences are in how the skills are ranked and surfaced.
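
As a rough illustration of that activation key, the sketch below pulls `name` and `description` out of a SKILL.md file's frontmatter and keeps the body separate, which is the essence of progressive disclosure. It is a minimal sketch under stated assumptions: it handles only flat `key: value` lines rather than full YAML, `parse_frontmatter` is a hypothetical helper, and the sample skill is invented rather than copied from either project.

```python
def parse_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a SKILL.md document into (frontmatter fields, body).

    Handles only flat `key: value` lines between the two `---` fences;
    a real loader would use a YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    fields: dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            body = "\n".join(lines[i + 1:]).lstrip("\n")
            return fields, body
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}, text  # no closing fence: treat the whole file as body


# Hypothetical skill file, for illustration only.
SAMPLE = """---
name: example-skill
description: Use when a test fails intermittently in CI.
---
# Example skill

Full instructions live here and load only once the skill is selected.
"""

fields, body = parse_frontmatter(SAMPLE)
print(fields["name"], "->", fields["description"])
```

Only the two frontmatter fields need to be in context up front; the body is read once something decides the skill applies. The two projects differ in what that "something" is.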

Superpowers: dispatcher + when_to_use semantics

One master skill, using-superpowers, loads on every conversation. It is under 2K tokens. It reads the user's request and decides which of the 14 skills to invoke. Activation is via the host model's own pattern-matching against each skill's when_to_use metadata. There is no retriever and no vector store — the orchestration logic lives in the dispatcher's prose. From Simon Willison's writeup:

"VERY token light… one doc of fewer than 2k tokens. It uses subagents to manage token-heavy stuff, including all the actual implementation. As it needs bits of the process, it runs a shell script to search for them."
— Simon Willison on Superpowers

The catalog is small and curated: 14 skills, each manually authored, covering methodology rather than domain knowledge.

Context Foundry: hybrid retriever + per-stage filter

Context Foundry ships 271 skills by default. For each stage of each task, a three-signal retriever ranks every candidate:

BM25 — sparse keyword match against the task description.
Cosine similarity — dense match via nomic-embed-text running on a local Ollama process. 137M parameters, 768-dim vectors, ~50ms per call, cached to disk. Embedding stays on the machine.
Telemetry boost — success-rate weighting from the citation scanner: skills cited in feat() commits rank higher than those cited in WIP() commits.

The top N (default 10, tunable) get injected into that stage's prompt. The retriever runs for every stage (QUERY, RESEARCH, PLAN, P+, BUILD, AUDIT, SHIP, DISCOVER, SKILLS), filtered softly by the cf-stage hint in each skill's metadata.
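
The three signals above can be sketched in a few dozen lines. This is a toy under loud assumptions, not Foundry's code: the text does not say how the signals are combined, so the weighted sum and the `(0.4, 0.4, 0.2)` weights are invented; the 3-dimensional vectors stand in for 768-dim nomic-embed output (no Ollama call is made); and the `Skill` fields and skill names are hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str
    embedding: list[float]  # stand-in for a 768-dim embedding vector
    passes: int = 0         # citations in feat() commits
    fails: int = 0          # citations in WIP() commits


def tokenize(text: str) -> list[str]:
    return text.lower().split()


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of each document against the query."""
    tokenized = [tokenize(d) for d in docs]
    avg_len = sum(len(t) for t in tokenized) / len(tokenized)
    doc_freq = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            tf = counts[term]
            if tf == 0:
                continue
            idf = math.log(1 + (len(docs) - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(tokens) / avg_len))
        scores.append(score)
    return scores


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def success_rate(skill: Skill) -> float:
    """Laplace-smoothed share of citations that landed in passing builds."""
    return (skill.passes + 1) / (skill.passes + skill.fails + 2)


def rank(task: str, task_vec: list[float], skills: list[Skill], top_n: int = 10,
         weights: tuple[float, float, float] = (0.4, 0.4, 0.2)) -> list[Skill]:
    sparse = bm25_scores(task, [s.description for s in skills])
    max_sparse = max(sparse) or 1.0  # scale BM25 into [0, 1]
    w_sparse, w_dense, w_tel = weights  # assumed weights, not from the source
    scored = [
        (w_sparse * sp / max_sparse
         + w_dense * cosine(task_vec, s.embedding)
         + w_tel * success_rate(s), s)
        for sp, s in zip(sparse, skills)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:top_n]]


# Hypothetical catalog of three skills with toy embeddings.
skills = [
    Skill("rust-borrow-pitfalls", "avoid borrow checker errors in rust async code",
          [0.9, 0.1, 0.0], passes=8, fails=1),
    Skill("sql-migrations", "write safe sql schema migrations",
          [0.0, 0.2, 0.9], passes=3, fails=3),
    Skill("rust-error-handling", "rust error handling with result types",
          [0.8, 0.3, 0.1], passes=0, fails=4),
]
top = rank("fix rust async borrow error", [1.0, 0.0, 0.0], skills, top_n=2)
print([s.name for s in top])
```

Note what the telemetry term does in this toy: it is a tiebreaker, not a veto. A skill with a poor record can still rank if the keyword and embedding signals both favor it.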

Where they meet

Both treat the skill body as the unit of progressive disclosure. Both keep the catalog separate from the activation logic. Both use Anthropic's spec so the on-disk content is portable between tools. Critically: a Superpowers skill could be dropped into ~/.foundry/skills/ and the Context Foundry retriever would pick it up for any stage where its description matched. The portability is real, not theoretical.

Token economics

The two projects make different bets about how to spend tokens.

Superpowers

Bet: minimize the methodology overhead per turn so the user-facing budget is mostly implementation.

Mechanism: master skill is <2K tokens; subagents take the implementation cost; skills load on demand via runtime search.

Implication: works inside any Claude Code session without changing your billing pattern materially.

Context Foundry

Bet: spend whatever a queue takes overnight, because the alternative is paying a human to babysit the run.

Mechanism: 9 stages per task, each a fresh-context agent invocation; ~$28 average per [Complex] task on the May 2026 overnight run.

Implication: per-task cost is visibly higher, but the time the user spends is approximately zero between invocation and the morning's commits.
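
For a sense of scale, here is the back-of-envelope arithmetic on that figure. It assumes every task in the queue is [Complex] and, for the per-stage number, that cost is spread evenly across the 9 stages. Neither assumption comes from the run itself, so treat the output as an order-of-magnitude guide only.

```python
COST_PER_COMPLEX_TASK_USD = 28.0  # average quoted above for the May 2026 run
STAGES_PER_TASK = 9


def queue_cost(n_tasks: int, cost_per_task: float = COST_PER_COMPLEX_TASK_USD) -> float:
    """Estimated spend for a queue where every task is [Complex]."""
    return n_tasks * cost_per_task


overnight = queue_cost(10)
per_stage = round(COST_PER_COMPLEX_TASK_USD / STAGES_PER_TASK, 2)
print(f"10-task queue: ${overnight:.2f}, roughly ${per_stage:.2f} per stage invocation")
```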

This is not a fair fight on raw tokens, and it shouldn't be. Superpowers is optimizing for the unit "how much extra do I pay to get methodology added to a session I was going to have anyway?" Context Foundry is optimizing for "how much do I pay to skip the session entirely?"

Cross-tool reach

Both ship beyond a single vendor, but in different ways.

Aspect	Superpowers	Context Foundry
Where the skills live	Ships per-tool wrapper dirs: `.claude-plugin/`, `.codex-plugin/`, `.cursor-plugin/`, `.opencode/`, `.codex-app/`, plus Factory Droid, Gemini CLI, GitHub Copilot CLI integrations	Skills live in `~/.foundry/skills/` and `plugins/<name>/skills/`. The tools are invoked by Foundry, not the other way around
Reads external skill formats	Authors its own format and pushes it cross-tool	Discovers `AGENTS.md` (Linux Foundation standard), `.cursorrules`, `.claude/skills/<topic>/SKILL.md`, `.github/copilot-instructions.md` — with per-source opt-in on the startup screen
Direction of integration	"My skills should work in your tool"	"My pipeline can call your tool, and read your skills"
Per-stage model routing	One model per session	Different model per pipeline stage via `stage_overrides` — e.g. Claude Opus on PLAN, Codex on BUILD, local model on DISCOVER
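
Per-stage routing boils down to a lookup with a fallback. The sketch below shows that idea only. The `stage_overrides` key name comes from the table above, but the rest of the config shape (`default_provider`, the provider strings) and the `provider_for` helper are hypothetical, not the real `.foundry.json` schema.

```python
import json

# Hypothetical config shape; only the `stage_overrides` key name is from the text.
CONFIG_TEXT = """
{
  "default_provider": "claude",
  "stage_overrides": {
    "PLAN": "claude-opus",
    "BUILD": "codex",
    "DISCOVER": "local"
  }
}
"""


def provider_for(stage: str, config: dict) -> str:
    """Use the stage's override if one exists, else the default."""
    overrides = config.get("stage_overrides", {})
    return overrides.get(stage, config["default_provider"])


config = json.loads(CONFIG_TEXT)
for stage in ("PLAN", "BUILD", "AUDIT", "DISCOVER"):
    print(stage, "->", provider_for(stage, config))
```

Stages without an override, AUDIT in this example, fall through to the default, which keeps the config short when only one or two stages need a different model.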

What happens if you run both?

This is the most interesting scenario and the one that argues hardest against treating them as competitors.

Hypothesis

Context Foundry's BUILD and AUDIT stages spawn Claude Code processes. If the user has installed Superpowers, those spawned processes inherit Superpowers' master skill and all 14 of its methodology skills. Foundry's pipeline gets TDD discipline, four-phase debugging, and the code-reviewer agent for free.

Two specific gaps in Context Foundry's catalog — explicit TDD enforcement and a systematic debugging methodology — are exactly where Superpowers is strongest. A user running Foundry on a TDD-shaped codebase, with Superpowers installed in their global Claude Code config, would get something closer to the union of both projects' strengths than either alone:

Foundry handles the queue, the artifact discipline, the multi-stage fresh-context rhythm, and the citation feedback loop.
Superpowers handles, inside each BUILD invocation, the red/green/refactor TDD cycle and the root-cause-first debugging pass.
Foundry's hybrid retriever can also rank Superpowers' skills if they're symlinked into ~/.foundry/skills/ — same SKILL.md format.
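
The symlinking step in the last point could look something like the sketch below. It assumes a layout of one directory per skill with a SKILL.md inside, and `link_skills` is a hypothetical helper. Where a Superpowers install actually keeps its skills is not stated here, so the source directory is left as a parameter.

```python
from pathlib import Path


def link_skills(source_root: Path, target_root: Path) -> list[Path]:
    """Symlink every directory under source_root that contains a SKILL.md.

    Existing entries in target_root are left alone, so a rerun is harmless.
    """
    target_root.mkdir(parents=True, exist_ok=True)
    created = []
    candidates = sorted(p for p in source_root.iterdir() if (p / "SKILL.md").is_file())
    for skill_dir in candidates:
        link = target_root / skill_dir.name
        if link.exists() or link.is_symlink():
            continue  # never overwrite a skill that is already installed
        link.symlink_to(skill_dir.resolve(), target_is_directory=True)
        created.append(link)
    return created
```

A call would pass the plugin's skills directory as `source_root` and `Path.home() / ".foundry" / "skills"` as `target_root`. Symlinks rather than copies mean a plugin update shows up in both tools at once.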

This is the practical answer to "which should I install?" Both, if your workflow has room for them. Treat the layer diagram at the top as the seating chart: they sit in different chairs.

Honest gaps in Context Foundry (that Superpowers fills)

Where Superpowers is sharper

1. No dedicated TDD skill in the shipped 271. The catalog has many "test failure" pitfalls but no "tests must fail before implementation" enforcement skill. Adding one would slot naturally into BUILD via the cf-stage: build hint.

2. No four-phase debugging methodology. AUDIT catches failures post-hoc; Superpowers' systematic-debugging drives the agent to investigate root cause before patching. CF would benefit from a sibling AUDIT-time skill that scans for symptom-treating fixes.

3. No subagent dispatch from within a stage. Foundry's stages are themselves the subagents; there's no idiom for a stage to fan out further. Superpowers' dispatching-parallel-agents covers a use case (e.g. "review four files in parallel") that Foundry doesn't have a primitive for.

4. No conversational brainstorming. QUERY writes questions to disk; it does not maintain a Socratic dialogue with the user. For ill-specified tasks, Superpowers' multi-turn brainstorming may produce a better starting plan.

5. No skill-author discipline. The pattern extractor writes SKILL.md mechanically; Superpowers' writing-skills applies TDD principles to the skill itself, including behavioral pressure-tests.
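
To make gap 1 concrete: the rule "tests must fail before implementation" is mechanically simple, which is part of why it would slot into BUILD. The sketch below is one hypothetical way to express the gate in code. Superpowers enforces it through skill prose, not through a function like this, and `red_green` and `RedPhaseSkipped` are invented names.

```python
from typing import Callable


class RedPhaseSkipped(Exception):
    """Raised when the new test already passes before any implementation."""


def red_green(run_tests: Callable[[], bool], implement: Callable[[], None]) -> bool:
    """Run one red/green cycle; return True if the tests pass afterwards."""
    if run_tests():
        # A test that is green before the change proves nothing about the change.
        raise RedPhaseSkipped("test passed before implementation")
    implement()
    return run_tests()


# Toy stand-ins: the "test" passes only once the "implementation" has run.
state = {"implemented": False}
went_green = red_green(
    run_tests=lambda: state["implemented"],
    implement=lambda: state.update(implemented=True),
)
print("green after implementation:", went_green)
```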

Honest gaps in Superpowers (that Context Foundry fills)

Where Context Foundry is sharper

1. No autonomy. Superpowers is bound to the rhythm of a human turn. It cannot run overnight against a queue. This is by design — it is a methodology layer, not an orchestrator — but it means "ship 10 tasks while I sleep" is not in scope.

2. No queue, no DISCOVER stage, no auto-commit verdict. Each session ends when the user closes it. There is no equivalent to TASKS.md, no QRPBA progress indicators, and no programmatic feat() vs WIP() decision based on a fresh-context audit.

3. No retriever. Skill activation relies on the host model's own pattern-matching against when_to_use. This is elegant at 14 skills and probably saturates at a few dozen; it would not scale to a 271-skill catalog. Foundry's BM25 + dense + telemetry combination is the answer to "what happens when the catalog grows to hundreds of skills?"

4. No telemetry feedback loop. Skills don't learn from outcomes. A skill that fires in every conversation but never contributes to a passing build will keep firing forever. Foundry's citation scanner is the closed-loop version of the same idea.

5. No per-stage model routing. Superpowers runs against whatever model the host CLI uses. Foundry's stage_overrides mean a project can put Claude Opus on PLAN, Codex on BUILD, and a local model on DISCOVER — routing by what each model is best at.

6. No live dashboard. Output is whatever the host terminal shows. Foundry's TUI surfaces pipeline state, AI summaries on every pane, hover tooltips, click-through to artifacts — the user can watch an autonomous run without reading the raw log.
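
The feedback loop in gap 4 is worth sketching, since it is the piece a static skill library cannot have. The version below is a minimal sketch, assuming a single SQLite table and a plain pass ratio. The table name, columns, skill names, and the `open_store`, `record`, and `success_rates` helpers are all hypothetical, and Foundry's actual scanner, sidecar schema, and decay rule are not described here.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the citation store. The schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS citations ("
        "skill TEXT PRIMARY KEY, "
        "passes INTEGER NOT NULL DEFAULT 0, "
        "fails INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def record(conn: sqlite3.Connection, commit_subject: str, cited: list[str]) -> None:
    """Credit or debit every cited skill based on the commit prefix."""
    # The column name comes from this fixed pair, never from user input.
    column = "passes" if commit_subject.startswith("feat(") else "fails"
    for skill in cited:
        conn.execute("INSERT OR IGNORE INTO citations (skill) VALUES (?)", (skill,))
        conn.execute(f"UPDATE citations SET {column} = {column} + 1 WHERE skill = ?", (skill,))
    conn.commit()


def success_rates(conn: sqlite3.Connection) -> dict[str, float]:
    """Share of each skill's citations that landed in passing builds."""
    rows = conn.execute("SELECT skill, passes, fails FROM citations").fetchall()
    return {skill: passes / (passes + fails) for skill, passes, fails in rows}


conn = open_store()
record(conn, "feat(api): add pagination", ["rest-pagination", "sql-indexes"])
record(conn, "WIP(api): add rate limiting", ["rest-pagination"])
rates = success_rates(conn)
print(rates)
```

A retriever can then fold these rates into its ranking, so a skill that keeps showing up in WIP() commits gradually loses its slot in the prompt.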
