Cutting token waste in autonomous build loops.
Context Foundry v0.7.4
Run an AI coding agent in an autonomous loop and watch the output. You will see something like this:
Now I will read the file src/main.rs to understand the current structure.
Let me check the tests to make sure nothing is broken.
I'll now modify the function to add the new parameter.
Let me verify the build passes after my changes.
Every action is narrated. Every tool call is preceded by an explanation of what it does and why. This is useful when a human is watching -- it builds trust and makes the agent's reasoning transparent. But in a build loop, there is no human watching. The only consumer of this output is an orchestrator that parses tool calls and routes results between pipeline stages.
Those narration tokens cost money. Worse, they consume context window space that could hold actual code. An agent that spends 300 tokens explaining what it is about to do before each tool call will burn through its context budget substantially faster than one that just executes.
The default system prompt for most AI coding tools instructs the agent to communicate with the user. This is the right behavior for interactive sessions -- you want to know what the agent is doing and why. The agent is trained to be helpful, and narration is how it demonstrates helpfulness.
The problem is that build loops are not interactive sessions. The orchestrator does not need to be told "Now I will read the file." It issued the instruction. It knows what the agent should be doing. The narration is dead weight.
Some agents narrate even more aggressively when they encounter uncertainty. Instead of just making a tool call, they produce a paragraph of reasoning about which approach to take, then another paragraph justifying the choice, then finally the tool call. In interactive mode, this is sometimes helpful. In a build loop, it is pure waste.
Context Foundry v0.7.4 injects two directives into every agent prompt:
EXECUTION STYLE: Execute tool calls directly. Do not narrate
your actions. Do not explain what you are about to do before
doing it. Output only results and error messages.
SILENT EXECUTION: You are running inside an automated pipeline.
There is no human watching. Minimize text output between tool
calls. The orchestrator only reads tool call results.
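A minimal sketch of how an orchestrator might append these directives by agent role. The role split (builder/reviewer/fixer silenced, scout/planner verbose) follows the description below; the function name and prompt-assembly shape are illustrative assumptions, not Context Foundry's actual code:

```python
# Hypothetical sketch: append silencing directives for in-pipeline agents.
# Directive text is copied from the article; everything else is assumed.

EXECUTION_STYLE = (
    "EXECUTION STYLE: Execute tool calls directly. Do not narrate "
    "your actions. Do not explain what you are about to do before "
    "doing it. Output only results and error messages."
)

SILENT_EXECUTION = (
    "SILENT EXECUTION: You are running inside an automated pipeline. "
    "There is no human watching. Minimize text output between tool "
    "calls. The orchestrator only reads tool call results."
)

# Scout and planner stay verbose: their text output is the artifact.
QUIET_ROLES = {"builder", "reviewer", "fixer"}

def build_system_prompt(role: str, base_prompt: str) -> str:
    """Append the two directives only for agents that run in the loop."""
    if role in QUIET_ROLES:
        return "\n\n".join([base_prompt, EXECUTION_STYLE, SILENT_EXECUTION])
    return base_prompt
```

Keeping the directives as constants rather than per-role prompt files keeps the injection cost fixed and makes it easy to audit exactly what every silenced agent receives.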
These override the default chattiness. The agent still reasons internally (extended thinking is unaffected), but it stops producing paragraphs of narration between tool calls. The output becomes a clean sequence of tool calls and their results.
The directives are injected into builder, reviewer, and fixer prompts -- every agent that runs inside the pipeline. Scout and planner prompts are left verbose because their text output is the artifact (the scout report and the plan).
Before v0.7.4, every agent invocation read the project's CLAUDE.md file. This file typically contains coding conventions, project structure notes, and workflow instructions. It is useful for interactive sessions where the agent needs to understand the project's norms.
In a build loop, CLAUDE.md is the wrong place for conventions. The pipeline already has a SPEC.md that describes what to build and how. The scout report describes the codebase. The plan describes the implementation steps. Adding CLAUDE.md on top of these means the agent is reading overlapping information from multiple sources, paying tokens for each one.
The fix: replace CLAUDE.md reading with a lightweight AUTONOMY_OVERRIDE injected via --append-system-prompt:
AUTONOMY_OVERRIDE: You are an autonomous agent in a build pipeline.
Do not read CLAUDE.md. Your instructions come from the pipeline
artifacts: SPEC.md, scout-report.md, and current-plan.md. Follow
the plan exactly. Do not deviate to follow conventions from other
sources.
This is a single system prompt injection -- not a file read. It costs a fixed number of tokens regardless of how large the project's CLAUDE.md is.
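As a sketch, the invocation might be assembled like this. The `--append-system-prompt` and `-p` flags are real Claude Code CLI options; the wrapper function and command shape are illustrative assumptions:

```python
# Hypothetical sketch: inject the override as a CLI flag instead of
# letting the agent read CLAUDE.md. Override text is from the article;
# the agent_command helper is an assumption.

AUTONOMY_OVERRIDE = (
    "AUTONOMY_OVERRIDE: You are an autonomous agent in a build pipeline. "
    "Do not read CLAUDE.md. Your instructions come from the pipeline "
    "artifacts: SPEC.md, scout-report.md, and current-plan.md. Follow "
    "the plan exactly. Do not deviate to follow conventions from other "
    "sources."
)

def agent_command(task_prompt: str) -> list[str]:
    """Build argv for one agent invocation; the override is a fixed
    token cost, independent of the project's CLAUDE.md size."""
    return [
        "claude", "-p", task_prompt,
        "--append-system-prompt", AUTONOMY_OVERRIDE,
    ]

# e.g. subprocess.run(agent_command("Implement step 3 of current-plan.md"))
```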
The savings come from two sources:
Reduced narration: ~500--1000 tokens saved per agent invocation. A typical pipeline runs 4--8 agent invocations per task, so this adds up to 2000--8000 tokens per task.
No CLAUDE.md read: ~2000--4000 tokens saved per invocation, depending on file size. Over a full pipeline, this is 8000--32000 tokens.
For a project with a 3000-token CLAUDE.md running 6 agent invocations per task, the combined savings are roughly 21000--24000 tokens per task: 6 × 3000 for the skipped CLAUDE.md reads, plus 6 × 500--1000 for suppressed narration. At typical API pricing, this is not negligible -- especially for projects that run dozens of tasks per session.
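The arithmetic can be checked with a small sketch that applies the per-invocation estimates above (the function is illustrative; the prose rounds these figures):

```python
def estimated_savings(claude_md_tokens: int, invocations: int,
                      narration_low: int = 500,
                      narration_high: int = 1000) -> tuple[int, int]:
    """Rough per-task token savings: skipped CLAUDE.md reads plus
    suppressed narration, per the estimates in the text."""
    low = invocations * (claude_md_tokens + narration_low)
    high = invocations * (claude_md_tokens + narration_high)
    return low, high

# A 3000-token CLAUDE.md over 6 invocations:
print(estimated_savings(3000, 6))  # → (21000, 24000)
```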
Coding conventions that live only in CLAUDE.md will not be picked up by pipeline agents. If your CLAUDE.md says "use snake_case for function names" and your SPEC.md does not, the builder will not know about that convention.
This is intentional. The build loop should be self-contained. Everything the builder needs should be in SPEC.md, the scout report, or the plan. If a convention matters for the build, it belongs in one of those artifacts -- not in a side file that gets read on every invocation.
Interactive Claude Code sessions still read CLAUDE.md normally. This change only affects agents running inside the Context Foundry pipeline.