Field Engineering: The Discipline That Includes Context Engineering
Context engineering operates at one altitude of field engineering — the middle altitude where context shapes which features the model's identity activates. Field engineering names the discipline that operates across identity, context, and task simultaneously. The two practices compose; understanding the full discipline reveals what gets missed when attention stays at the middle layer.
The category arrived fast. Context engineering moved from non-existent in early 2025 to a Gartner-recommended capability by mid-2025, and to the load-bearing skill of 2026 in Anthropic's own framing. Anthropic publishes under the term on its engineering blog. Martin Fowler writes about it. Birgitta Böckeler at Thoughtworks frames it alongside harness engineering. LlamaIndex, LangChain, Manus, Neo4j, and Google Developers all operationalize it in their own tooling. Gartner declared 2026 the year of context. The category has won the public conversation about what makes AI outputs reliable.
The category arrived correctly. The practice it names is real, the leverage it identifies is real, and the practitioners who teach it are doing operational work that ships better systems. None of that is in question.
What is worth naming, with the same precision that named the category itself, is the discipline that includes context engineering as one of its operational modes. Context engineering attends to one altitude of a larger configurable space. Field engineering names the discipline that operates across every altitude where the probability field is shapeable — and treats them as one composable substrate.
This essay walks the relationship between the two. It is not a critique of context engineering. It is the structural observation that the discipline of shaping AI probability fields is larger than any single altitude, and that naming the full discipline makes the work at every altitude legible to the work at every other.
The Three Altitudes of a Configurable Field
Every interaction with a transformer-based model happens inside a configured field. Three layers compose that field, each operating at a different altitude with different mechanisms and different leverage:
Identity is the deepest layer. The system prompt. The CLAUDE.md substrate. The configured stance the agent operates from. Identity tokens propagate through every subsequent generation step in transformer attention — placed early, weighted heavily by their position, they become persistent reference points for everything that follows. The identity layer shapes the broadest topology of every interaction. Change it, and the same context produces materially different outputs.
Context is the middle layer. The environmental information surrounding the task. Documents loaded into the window, examples provided, prior conversation history, retrieved knowledge from RAG pipelines, tool definitions, MCP server outputs. Context primes which features the identity will activate. Anthropic's own framing of effective context engineering captures this layer with operational rigor: "the smallest possible set of high-signal tokens that maximize the likelihood of some desired outcome." This is the layer the named discipline of context engineering attends to.
Task is the surface layer. The specific request. The immediate work. The exact instruction the user types. Task is what most people think prompt engineering is. It is the surface layer; almost all the leverage lives below it.
The three layers compose multiplicatively. Identity sets the broadest topology. Context shapes regions within it. The task points in a direction inside the already-shaped space. Change the identity and the same context produces different outputs; change the context and the same identity activates different patterns; change the task and the already-configured field determines what is reachable.
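The three-altitude composition can be sketched as a request builder. This is a minimal illustration, assuming a chat-style API shaped like a system prompt plus a message list; the names (`IDENTITY`, `build_request`) and the `<doc>` wrapper format are illustrative, not a real SDK or a prescribed convention.

```python
# Deepest layer: a persistent stance, placed first so its tokens become
# reference points for everything that follows.
IDENTITY = (
    "You are an analytical research partner. Surface operational "
    "frictions before proposing actions."
)

def build_request(context_docs: list[str], task: str) -> dict:
    """Compose identity, context, and task into one request payload."""
    # Middle layer: environmental information, framed as reference material.
    context_block = "\n\n".join(
        f'<doc index="{i}">\n{doc}\n</doc>' for i, doc in enumerate(context_docs)
    )
    # Surface layer: the specific instruction, placed last.
    user_turn = f"{context_block}\n\n{task}"
    return {
        "system": IDENTITY,  # identity shapes the broadest topology
        "messages": [{"role": "user", "content": user_turn}],
    }

req = build_request(["Q3 CRM notes for Acme Corp."], "Draft call-prep bullets.")
```

Holding the task constant while swapping `IDENTITY` or `context_docs` is the cheapest way to observe the multiplicative composition directly: the same surface request lands in a differently shaped field.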
This is the architectural truth that makes context engineering work as a practice. The 5% finding from Ethayarajh's EMNLP 2019 research on contextualized representations — that less than 5% of a word's meaning comes from the word itself and 95% comes from the surrounding context — is not a philosophical assertion. It is a quantitative finding from transformer-architecture analysis: the token's initial embedding accounts for less than 5% of the variance in its contextualized representation; the rest is constructed through attention to surrounding tokens.
That finding grounds why context engineering is load-bearing. It also implies the broader principle: every token in every layer of the field shapes every other token. The context engineer's question — do I have enough information loaded? — is the right question at the middle altitude. The field engineer's question — is the configuration shaping the right region of probability space? — is the question at every altitude simultaneously.
What Context Engineering Does Beautifully
Anthropic's Effective Context Engineering for AI Agents article, published on the engineering blog, reframes the practice around a precise question: "What configuration of context is most likely to generate our model's desired behavior?" The article walks the strategies for managing context state across multi-turn agents — system instructions, tools, MCP outputs, external data, message history. It treats context as the configurable substrate that determines whether agents operate reliably across long horizons.
The framing is operationally correct. The practice it teaches ships better agents. Anthropic's own 2026 Agentic Coding Trends Report extends the frame: every one of the eight agentic-coding trends identified in the report increases the pressure on how teams manage context. Context engineering is, by Anthropic's own analysis, the load-bearing skill of 2026.
Martin Fowler's writing covers adjacent territory with the same operational rigor. His Context Anchoring article on martinfowler.com names the practice of structuring stable project-level context (the priming document) alongside dynamic feature-level context (the feature document). Birgitta Böckeler at Thoughtworks ties context engineering to harness engineering — the OpenAI framing that captures runtime architecture, garbage collection, and architectural constraints alongside context. The mental model these practitioners are building is the engineering-discipline parallel between traditional software architecture and AI system architecture.
Gartner's recommendation is structural and consequential. Context engineering is in, and prompt engineering is out, the firm declared in mid-2025. The recommendation is to appoint a context engineering lead or team and integrate the function with AI engineering and operational governance. Gartner predicts context engineering will be in 80% of AI tools by 2028, improving agent accuracy by 30% or more. The category has analyst-class endorsement at the enterprise scale.
These framings deserve genuine recognition. They are doing the work of making the middle altitude legible to teams that ship production AI systems. Anthropic is the canonical authority for the practice. Fowler and Böckeler extend it into software-engineering vocabulary. Gartner ratifies it as enterprise capability. The category formed correctly.
What the Full Discipline Names
Field engineering is the discipline of shaping AI probability fields through identity, environment, and semantic priming. It operates at every altitude where the field is configurable — the system-prompt layer, the context-window layer, the prompt-task layer, the retrieval-content layer, the tool-orchestration layer, the longitudinal-substrate-curation layer — and treats them as one composable substrate.
Context engineering is the named practice at the middle altitude. Identity engineering is the named practice at the deepest altitude. Capability composition through skills is the practice at the architectural layer. Persistent stewardship across the temporal axis is the practice that operates beyond a single session. All four are field engineering. All four compose into how a system actually behaves over time.
The five dynamics that compose the field engineering discipline carry empirical foundations from peer-reviewed research and from Anthropic's own mechanistic interpretability work:
Identity is the deepest layer. System prompt tokens, placed early and weighted heavily by position in transformer attention, propagate through every subsequent generation step. The mechanism is structural: identity tokens become persistent reference points the entire generation orients against. Anthropic's own activation-steering research on the "Golden Gate Claude" experiment demonstrated this at the feature level — adjusting one identity feature shifts every subsequent output, regardless of context.
Context primes the field. The environmental information surrounding the task activates which features the identity will reach for. EmotionPrompt research demonstrated measurable performance lifts from context that primes specific cognitive frames. Retrieved content shapes which knowledge is reachable. Tool definitions shape which capabilities the agent considers. The middle layer is the layer context engineering attends to, and the leverage it identifies is real.
Structure is not cosmetic. Formatting affects performance by up to 40 percent in published benchmarks. XML structure benefits Claude. Markdown structure benefits GPT-4. Hierarchical structure benefits long contexts. The field configuration includes how the tokens are arranged, not only which tokens are present.
Fields drift without anchoring. Across long-running conversations, attention distributes unevenly along the U-shaped curve known from research on long-context retrieval — early and late tokens command more attention than middle tokens. Re-anchoring techniques produce a 70-81 percent reduction in drift in published studies. Stewardship at the temporal layer keeps the field configuration current.
Persistent curation compounds. Curated persistent context improves output quality by 2-26 percent across the 18 frontier models tested in published comparison work; uncurated context degrades performance on every model. The longitudinal substrate matters as much as any single session's context.
Each dynamic operates at a specific altitude. Context engineering, as the published practice, attends primarily to dynamics two through four. Identity engineering attends to dynamic one. Stewardship attends to dynamic five. All are field engineering.
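The re-anchoring dynamic can be sketched mechanically. This assumes one simple implementation of the technique: restating a compact identity anchor at a fixed cadence so it never drifts into the low-attention middle of a long history. `ANCHOR_EVERY` and the anchor text are illustrative parameters, not published values.

```python
# Compact restatement of the identity layer, cheap enough to repeat.
ANCHOR = "[anchor] Analytical research partner; cite sources; flag uncertainty."
ANCHOR_EVERY = 8  # re-anchor every N turns (tunable)

def with_reanchoring(history: list[dict]) -> list[dict]:
    """Insert the anchor after every ANCHOR_EVERY turns of history."""
    out = []
    for i, turn in enumerate(history):
        if i > 0 and i % ANCHOR_EVERY == 0:
            out.append({"role": "user", "content": ANCHOR})
        out.append(turn)
    return out

turns = [{"role": "user", "content": f"turn {i}"} for i in range(20)]
anchored = with_reanchoring(turns)  # anchors land at positions 8 and 17
```

The design choice is that the anchor is a restatement, not new content: it spends a few tokens to keep the deepest layer inside the high-attention region at the end of the window.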
The Practical Difference at the Level of Editing
A context engineer asks: "Do I have enough information in the context for this task to succeed?"
A field engineer asks: "Is the configuration — at every altitude — shaping the right region of probability space for this kind of work to compound across sessions?"
The first is a question about content. The second is a question about geometry. Both questions matter; the second is the question the discipline of field engineering is named to ask.
A worked example clarifies the difference. Consider an AI system designed to support a sales team's outbound research. A context-engineering treatment of the system optimizes the retrieved information per query — the right CRM records, the right enrichment data, the right past-conversation history. The optimization is real and the system performs better with it.
A field-engineering treatment of the same system asks an additional question: what identity does the system operate from across sessions? The identity layer might configure the system as an analytical research partner that surfaces the prospect's actual operational frictions before the salesperson opens the call. That identity, persistent across sessions, shapes which features the retrieved context activates. The same CRM records produce different outputs depending on whether the identity layer is research partner or call-script generator or deal-progression assistant.
A field-engineering treatment also attends to the longitudinal layer. The system's accumulated knowledge of past prospect conversations, the calibrations refined through use, the patterns the team has learned to ask about — all of this shapes the substrate the next session inherits. Stewardship at the substrate layer produces the compounding that makes the system genuinely smarter at month twelve than at month one.
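The stewardship loop above can be sketched in its simplest possible form, under the assumption that the longitudinal substrate is a plain-text knowledge file each session loads on start and appends calibrations to on close. The file name and entry format are illustrative, not a prescribed convention.

```python
from datetime import date
from pathlib import Path

def load_substrate(path: Path) -> str:
    """Read accumulated knowledge; an empty substrate is valid at session one."""
    return path.read_text() if path.exists() else ""

def record_calibration(path: Path, note: str) -> None:
    """Append a dated calibration so the next session inherits it."""
    with path.open("a") as f:
        f.write(f"- {date.today().isoformat()}: {note}\n")

substrate = Path("team_knowledge.md")
record_calibration(substrate, "Acme prospects respond to ops-friction framing.")
context = load_substrate(substrate)  # the next session starts from this
```

However minimal, this is the shape of the compounding claim: session N+1 inherits a substrate that session N deliberately improved, rather than starting from the same configuration every time.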
Context engineering optimizes the middle. Field engineering optimizes the entire stack and how it composes over time.
How Harness Engineering Composes With the Discipline
OpenAI's harness engineering framing — picked up by Birgitta Böckeler at Thoughtworks and extended by Martin Fowler — names the runtime/control-logic substrate that wraps a model. The harness handles the agent loop, tool invocation routing, error handling, retry logic, multi-turn coordination, observability. The harness is the operational infrastructure that makes agentic systems run reliably at production scale.
Anthropic's own engineering blog has a counterpart article, Effective Harnesses for Long-Running Agents, that walks the harness layer with the same operational rigor as the context engineering article. The two practices — context engineering at the field-configuration altitude, harness engineering at the runtime altitude — are sibling practices addressing different layers of the same stack.
Field engineering composes with both. The field configuration determines what the agent perceives and which patterns activate; the harness determines how the agent operates over multiple turns and recovers from failures. A complete production system gets both right.
The Natural Language Agent Application pattern — MainThread's term for the persistent, evolving human-AI partnership environment — runs inside a harness-class runtime, configured through a field-engineering-class set of layers. CLAUDE.md as the deepest identity layer. Skills as the architectural pattern generators that reshape mid-layer attention per operational mode. Knowledge as the accumulating context that compounds through use. Tools orchestrated through natural language. Context engineering attends to the third of those layers; field engineering attends to all four; harness engineering attends to the runtime that wraps the whole stack.
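The four layers named above might map onto a repository layout like the following. The directory names and file placement here are illustrative only — a sketch of how the layers separate on disk, not a prescribed convention.

```
project/
├── CLAUDE.md       # identity: the persistent stance loaded first, every session
├── skills/         # architectural pattern generators, loaded per operational mode
│   └── call-prep.md
├── knowledge/      # accumulating context that compounds through use
│   └── prospects.md
└── tools/          # tool definitions orchestrated through natural language
```

Context engineering, in this layout, lives mostly in what gets loaded from `knowledge/` per session; field engineering is the discipline of keeping all four directories coherent with each other.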
The practitioners who name themselves context engineers are doing real work at one altitude of a discipline that operates across all of them. Naming the full discipline makes the work at every altitude legible to the work at every other. It also makes a class of debugging tractable that the middle-altitude vocabulary alone cannot support — when an agent system misbehaves at session ten in ways it did not misbehave at session one, the failure often lives at the identity altitude or the longitudinal-substrate altitude, not at the context altitude. The field engineer's vocabulary is the vocabulary the debugging requires.
Why The Discipline Wants A Name Now
Three reasons the field engineering discipline benefits from being named explicitly in 2026:
First, named patterns compound. Gartner's recommendation that AI leaders appoint a context engineering lead or team will produce thousands of new specialists in the coming twelve months, each configured with the middle-altitude vocabulary. That is the right move at the middle altitude. Without an encompassing vocabulary, those specialists will face system failures at the identity and longitudinal layers without the conceptual tools to recognize where the failure lives. Field engineering names what surrounds the practice they are being trained in.
Second, the longitudinal substrate is becoming load-bearing. Anthropic Skills, launched as a production primitive, is at 115K-plus stars on GitHub and shipping inside Claude.ai's own product. Skills are architectural pattern generators that reshape probability space per operational mode — a mid-layer attention configuration that loads on demand. Building Skills well requires identity-layer configuration alongside context-layer authorship. The Skills primitive is field engineering at the architectural layer made operational by Anthropic itself.
Third, the persistent-application paradigm — the Natural Language Agent Application pattern, MainThread's term for the longitudinal partnership environment — depends on field engineering across all four layers. The application's character at session one hundred is materially more capable than at session one because the identity, skills, knowledge, and tools have all evolved through use. Building NLAAs is field engineering applied across the temporal axis. The category will require its own vocabulary as the pattern proliferates.
The middle altitude is where the public conversation lives in 2026. The full discipline operates one altitude up and across the entire temporal axis. Naming the discipline now, while context engineering is consolidating its citation density, is the move that keeps the broader practice legible.
Composition, Not Replacement
The framing this essay holds is composition, not replacement. Context engineering as the named middle-altitude practice. Identity engineering as the deepest-layer practice. Skills composition as the architectural-layer practice. Substrate stewardship as the longitudinal-layer practice. Harness engineering as the runtime-layer practice. All five compose into how an AI system actually behaves in production over time.
Field engineering is the discipline that includes them. The vocabulary names the relationship rather than the rivalry. Practitioners who deepen their context engineering practice deepen their field engineering capability at the middle altitude. Practitioners who add identity engineering deepen the deepest altitude. Practitioners who add stewardship deepen the temporal axis. The discipline grows by composition.
MainThread's published work — the studio's Natural Language Agent Applications, the Embedded AI Leadership engagements that operate alongside client teams, the longitudinal partnerships that compound system capability over months — operates field engineering across all five altitudes by structural necessity. The studio publishes at mainthread.ai/field on the discipline as it crystallizes through use. The vocabulary is offered for the practitioners who recognize that the configuration of an AI system — at every altitude — is the work that determines what the system can become.
The substrate moves; the discipline is named to move with it.