contextfoundry.dev

The Discovery Trap

Capability discovery is the new context switch.

Context Foundry · March 2025

The interrupt you already know

If you've written code for any amount of time, you know the feeling. You're in the zone. The problem is clear, the solution is forming, your hands are just keeping up with your brain. Then something breaks you out. A Slack message. A meeting. A build failure on a branch you forgot about.

It takes twenty minutes to get back. Sometimes longer. Sometimes you never fully get back, and the rest of the day is a wash. We all know this. There's research on it. We've built entire tools and workflows around protecting focus time.

That problem never went away. It just put on a new outfit.

The interrupt you don't notice

I'm building a feature. Maybe it's new, maybe it's an expansion of something that exists. I ask the agent to implement it. And then the agent asks me a question I didn't expect. Maybe it proposes a vector database when I was thinking SQLite. Maybe it suggests a different auth pattern.

It's a reasonable question. So we start evaluating databases. We compare options. We weigh trade-offs. That's productive, right?

But then something happens. The agent takes that same evaluation lens and starts applying it to everything else. The API methods. The security model. The encryption. We start running every feature through this filter, and now I'm not building anymore. I'm re-evaluating the entire implementation. I'm doubting the plan I started with.

And the thing is, I don't notice the switch. It still feels like building. Code is being discussed. Architecture is being shaped. Decisions are being made. But I sat down to ship a feature, and forty-five minutes later I'm redesigning something that was already working.

How it spreads

This is the part that gets you. It starts with one suggestion. One "have you considered..." from the agent. That's fine. That's useful, even. But the evaluation pattern spreads. Once you start questioning one decision, the agent applies the same scrutiny to the next thing, and the next thing.

I don't think agents will ever look at your plan and say "this is perfect, let's ship it." They're designed to have opinions. They're built to surface alternatives. That's a feature, not a bug, but it means even a tight implementation plan won't stop the agent from chiming in. There's always a suggestion. There's always another way to do it.

And you're the one who has to decide not to chase it. That's harder than it sounds when the suggestion is genuinely interesting.

Three eras

If I look back at how this has evolved, I see three phases.

Era 1: "Can you do this?" The agent kind of can, but there are bugs. You spend your time fixing the agent's output. The limitation is capability.

Era 2: "Can you do this?" The agent definitely can, and then some. It does what you asked and shows you what else is possible. You start exploring. The limitation shifts from capability to focus.

Era 3: You're building and discovering at the same time. You don't realize you've left the zone because the zone now includes the agent's suggestions. You look up and realize you haven't built what you set out to build.

We solved the capability problem. We haven't solved the focus problem. And the focus problem is harder because it disguises itself as progress.

A structural answer

The approach I landed on was simple: separate discovery from execution. Don't let them happen at the same time.

There's a stage where the agent investigates the codebase, reads the spec, and surfaces everything interesting. That's where discovery belongs. It's bounded. It's read-only. It happens before anyone writes a line of code.

Then there's a plan that locks in scope. Then implementation follows the plan. Not a conversation. A plan.
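The three stages can be sketched in a few lines of Python. This is a minimal illustration of the separation itself, not the actual pipeline: every name here (discover, plan_from, execute, the toy codebase) is hypothetical, and the agent calls are stand-ins.

```python
from dataclasses import dataclass, field

@dataclass
class Plan:
    """A locked scope: the tasks implementation may touch, nothing else."""
    tasks: list[str]
    out_of_scope: list[str] = field(default_factory=list)

def discover(topic: str, codebase: dict[str, str]) -> list[str]:
    """Discovery phase: read-only. The agent surfaces everything
    interesting here, before any code is written. (Stand-in logic:
    just collect files whose contents mention the topic.)"""
    return [path for path, text in codebase.items() if topic in text]

def plan_from(findings: list[str]) -> Plan:
    """Planning phase: convert findings into a fixed task list.
    Nothing discovered later gets appended to it."""
    return Plan(tasks=[f"update {f}" for f in findings])

def execute(plan: Plan) -> list[str]:
    """Execution phase: follow the plan. Suggestions that arrive
    mid-run are parked in out_of_scope, not acted on."""
    done = []
    for task in plan.tasks:
        done.append(task)  # stand-in for the agent doing the work
    return done

codebase = {"auth.py": "handles login", "store.py": "login tokens"}
plan = plan_from(discover("login", codebase))
print(execute(plan))  # every task traces back to discovery; none improvised
```

The point of the shape is that the arrows only go one way: discovery feeds the plan, the plan feeds execution, and nothing flows backward mid-run.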

When it works, the experience is completely different. I'm not even in the loop. It's almost like the project doesn't exist for a while. I go to sleep, I wake up, the thing has been built, and I start to see my requirements alive in front of me. There's this sense of anticipation. I wonder what the UI will look like. I wonder how it interpreted what I asked for. And then I see it.

That feeling is so different from sitting in front of a model for hours going on tangents. It's the difference between spending your time and spending your attention. When the pipeline is running, I spend neither.

The part that keeps me honest

Here's where I could tell you I figured it out. The pipeline works. Token costs went down. Output went up. I made some improvements and started producing two to three times the output for the same cost. I ran four loops in parallel this past weekend and they found forty or fifty issues on their own. Identified them, fixed them, moved on. No human intervention. Just intent that I communicated at the start.

But that's exactly what makes me wonder.

If the model can take a few sentences of direction and produce all of that, how much of my architecture is actually doing the work? Maybe the model is powerful enough to route around my design mistakes and I'd never notice. Maybe my pipeline adds overhead that the model compensates for silently. Maybe I'm constraining it in ways that cost more than they save, and the results are good anyway because the engine is just that strong.

I can't tell the difference between "my architecture is good" and "the model is good enough to make my architecture look good." When the engine can overcompensate for your mistakes without making them visible, success is not proof that your approach is right. It's just proof that something in the stack is working. You don't know which part.

What I know and what I don't

I know tasks complete. Code ships. Tests pass. I know my token costs went down while my output went up. I know the loop finds issues I never would have thought to look for, and it resolves them without me asking.

I don't know if the pipeline is the reason, or if it's just not in the way. I don't know if a different structure would work better. I don't know if the model would perform just as well with a loose prompt and no constraints at all.

I think separating discovery from execution is sound. I think giving agents a task queue instead of a conversation reduces drift. I think structured pipelines protect human focus time in a way that ad-hoc prompting doesn't. But I hold all of that loosely, because I've been wrong before about why things work.
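The "task queue instead of a conversation" idea comes down to one mechanism: mid-run suggestions get deferred, not injected. A minimal sketch, with the trigger condition and all names being illustrative assumptions:

```python
from collections import deque

def run(tasks: list[str]) -> tuple[list[str], list[str]]:
    """Work through a fixed queue. Suggestions raised mid-task go to a
    separate deferred list for the next discovery pass; they are never
    pushed onto the current queue, so scope cannot grow while executing."""
    queue = deque(tasks)
    completed: list[str] = []
    deferred: list[str] = []
    while queue:
        task = queue.popleft()
        completed.append(task)  # stand-in for the agent doing the work
        # Stand-in for the agent's "have you considered..." moment:
        if "auth" in task:
            deferred.append("evaluate alternative auth pattern")
    return completed, deferred

done, later = run(["wire up auth", "add endpoint"])
```

In a conversation, that auth suggestion would become the next turn; in a queue, it becomes input to a future discovery phase.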

What I'm not going to do is claim I've found the answer. I'm building with the best understanding I have. I'm paying attention to what's working and staying open to the possibility that it's working for reasons I haven't identified yet.

That's the honest version. The discovery trap is real. Structured pipelines help. And the model might be doing more of the heavy lifting than any of us want to admit.