What context engineering really means
A large language model only knows what sits inside its context window at the moment it answers. Context engineering is the craft of filling that window with the smallest useful superset of what the model needs — no more, no less. As the LangChain team puts it, it is the delicate art and science of filling the context window with just the right information for the next step
.
Most agent failures are not model failures. They are context failures: the right information was never retrieved, too much irrelevant information was pulled in, or critical facts were compressed away before the model needed them.
The four ways context goes wrong
- Too many tokens. Raw tool results (web search, PDFs, database dumps) can flood the history with tens of thousands of tokens the agent will never look at again — inflating cost and degrading quality.
- Needs more than fits. Some tasks genuinely require more information than a single context window can hold.
- Niche information. The critical detail is buried in one of a thousand files, and semantic search doesn't surface it.
- No learning loop. The agent keeps repeating the same mistake because useful corrections from users never make it back into its working memory.
Patterns that are working in production
Two of the clearest write-ups on context engineering come from the LangChain team and the Manus team. Together, they point at a consistent set of patterns that every serious AI builder is converging on.
1. Design around the KV-cache
For a production agent, the KV-cache hit rate is arguably the single most important metric — it drives both latency and cost. Cached input tokens on Claude Sonnet are roughly 10× cheaper than uncached ones. That means: keep your prompt prefix stable, make the context append-only, and avoid non-deterministic serialization that silently invalidates the cache.
2. Mask tools, don't remove them
As your action space grows, the temptation is to load tools on demand. Don't. Removing or adding tools mid-run invalidates the cache and confuses the model about past actions. Instead, keep the tool list stable and mask token logits so the model can only choose from the tools that are valid in the current state.
3. Use the file system as context
The most powerful idea: treat a filesystem as unlimited, persistent, agent-operable memory. Large tool results get written to files; the agent keeps only a pointer (a path, a URL, a summary) in its working context, and reads the full content back in only when needed. Claude Code's heavy reliance on ls, glob, and grepis not a quirk — it's a deliberate context-engineering choice.
4. Manipulate attention through recitation
Long-horizon agents drift. Manus solves this with a simple trick: the agent maintains and rewrites a todo.mdat every step. That pushes the plan back into recent attention, fighting the classic “lost in the middle” problem without any architectural change.
5. Keep the wrong stuff in
It's tempting to hide failed actions and retry cleanly. Resist it. When the model sees the failed call and the resulting error, it updates its own priors and stops repeating the mistake. Recovery from failure is one of the clearest signals of a truly agentic system.
6. Don't get few-shotted
LLMs are mimics. If your context is full of near-identical action/observation pairs, the model will keep doing the same thing even when it's no longer the right call. Inject small amounts of structured variation to break the pattern.
Why this matters for businesses
For a business evaluating AI, the practical takeaway is this: the quality of an AI product is rarely bottlenecked by the underlying model. It is bottlenecked by how carefully the system curates, compresses, and recalls the right information at each step. Two teams using the exact same model can ship wildly different products based on how well they engineer context.
That's the work. Not prompt tricks. Not model worship. A quiet discipline of shaping context, one step at a time, until the agent becomes reliable.