From Context Engineering to Context Infrastructure
Context engineering has become the defining discipline of AI agent development. But the conversation is missing a layer. Techniques for structuring context are well-understood. The infrastructure that makes context complete, consistent, and current at decision time is not.
Context Engineering Is Solved. Context Infrastructure Is Not.
2025 and 2026 have been the years of context engineering. Anthropic published Effective Context Engineering for AI Agents. Gartner declared it the successor to prompt engineering. LangChain, LlamaIndex, and every other agent framework now ship context management primitives. The techniques — prompt structuring, tool design, memory types, retrieval strategies — are well-documented.
This is genuine progress. Two years ago, teams building AI agents were guessing at how to manage context. Now there's a discipline with shared vocabulary, known failure modes, and established patterns:
- Context window management and context rot — how model performance degrades as token count rises, and strategies for keeping context lean (Anthropic's guide covers this well)
- System prompts, tool design, and few-shot examples — the building blocks of what the agent sees and how it reasons
- Retrieval and RAG — finding and injecting relevant information from knowledge bases and vector stores
- Memory architecture — short-term, long-term, and state memory for persistence across sessions (we wrote a detailed breakdown in What Is Context Engineering? and AI Agent Memory Architecture Explained)
- Structured output — ensuring agent outputs are consumable by downstream systems and other agents
- Observability and evaluation — tools like LangSmith for tracing agent calls and measuring whether context is actually helping
If you're building an AI agent and need to learn these techniques, start with Anthropic's post and the guides from LangChain and LlamaIndex. They're excellent.
But something is missing from the conversation.
Every context engineering guide assumes the context is available. That it's fresh. That when you retrieve a customer's balance, a risk score, and a velocity counter, they all reflect the same moment in time. That when two agents act on the same account simultaneously, they see the same state.
These are infrastructure assumptions, not technique assumptions. And in production, they break.
The Gap Between Technique and Infrastructure
Here's a simplified view of what context engineering guides tell you to build:
The technique stack:
- System prompt with agent identity and constraints
- Tool calls to external systems for dynamic data
- RAG pipeline for knowledge retrieval
- Memory management (short-term, long-term, state)
- Structured output for multi-agent coordination
This is necessary. It's also insufficient.
The technique stack tells you how to ask for context. It doesn't address where context lives or what guarantees it provides. That's the infrastructure layer.
The infrastructure questions no one is answering:
| Question | Why It Matters |
|---|---|
| Where does derived context live? | Aggregates, velocity counters, risk scores — context that must be computed from raw events — need a home. Not the source database. Not a cache. A system that computes and serves derived state. |
| How fresh is "fresh enough"? | A balance read from a cache that's 200ms stale might approve a transaction that should be blocked. Context infrastructure must provide freshness guarantees, not best-effort. |
| What happens under concurrent load? | When 50 agents read the same account state while 10 transactions are modifying it, do they all see a consistent snapshot? Or do some see partially updated state? |
| How do you serve multiple retrieval patterns from one consistent snapshot? | A single decision might need a point lookup (account balance), an aggregation (30-day transaction velocity), and a similarity search (behavioral pattern match). If these come from three different systems, they reflect three different moments. |
These are not questions about prompt design or memory management. They're questions about data infrastructure. And the current answer from most teams is: glue together Redis, a vector store, a feature store, and an OLAP database, and hope the propagation delays don't matter.
They matter.
Why Composed Stacks Can't Provide Context Infrastructure
The default architecture for serving agent context today looks something like this:
- Redis or Memcached for low-latency key-value lookups (account balances, session state)
- Pinecone or Weaviate for vector similarity search (behavioral patterns, semantic retrieval)
- Feast or Tecton for feature serving (ML features, risk scores)
- ClickHouse or Snowflake for analytical queries (aggregates, velocity counts)
- Kafka connecting all of them via event streams with varying propagation delays
Each system is excellent at its job. The problem is compositional.
When an agent needs to make a decision, it queries multiple systems. Each system reflects a different moment in time — the lag between when an event occurred and when each system processed it. Redis might be 50ms behind. The feature store might be 2 seconds behind. The analytical database might be minutes behind. The vector store was last synced an hour ago.
The agent receives context that looks complete — it has a balance, a risk score, an aggregate, and a similarity match. But these values never coexisted in reality. The agent is making a decision based on a fiction: a composite snapshot assembled from multiple independent timelines.
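The failure mode can be made concrete with a toy simulation. Everything here is hypothetical — the store names, lags, and values are illustrative — but it shows why a composite read assembled from independently lagging systems is a fiction: each store serves the last event it has processed, not the current state.

```python
# Toy model of a composed context stack. Each store lags the event stream
# by a different amount and serves whatever it has processed so far.
class LaggedStore:
    def __init__(self, lag_s):
        self.lag_s = lag_s   # propagation delay for this system, in seconds
        self.log = []        # (event_time, key, value) in arrival order

    def apply(self, event_time, key, value):
        self.log.append((event_time, key, value))

    def read(self, key, now):
        # Serve only events this store has "processed": those older than its lag.
        latest = None
        for event_time, k, value in self.log:
            if k == key and event_time <= now - self.lag_s:
                latest = (value, event_time)
        return latest

cache = LaggedStore(lag_s=0.05)      # e.g. a key-value cache, ~50ms behind
features = LaggedStore(lag_s=2.0)    # a feature store, ~2s behind
analytics = LaggedStore(lag_s=60.0)  # an OLAP database, ~1min behind

# Two events on the same account, one second apart.
for store in (cache, features, analytics):
    store.apply(0.0, "balance", 500)
    store.apply(1.0, "balance", -100)  # the account goes negative

# An agent assembles "one" context 1.5 seconds after the second event.
now = 2.5
ctx = {
    "cache": cache.read("balance", now),       # (-100, 1.0): current
    "features": features.read("balance", now), # (500, 0.0): one event behind
    "analytics": analytics.read("balance", now),  # None: nothing processed yet
}
print(ctx)
```

The agent's context contains a balance, but which balance depends entirely on which store answered — three reads, three different moments in time.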
This is not a theoretical problem. It's the context gap — the structural limitation of composed data stacks when applied to real-time decision systems. The gap exists because independently operated systems cannot provide a transactionally consistent snapshot across retrieval patterns without explicit coordination, and that coordination is precisely what composed stacks are designed to avoid.
The implication: you cannot build reliable context infrastructure by composing systems that weren't designed to provide it. Better prompts won't fix this. More sophisticated retrieval patterns won't fix this. The consistency guarantee must come from the infrastructure layer itself.
What Context Infrastructure Actually Requires
Context infrastructure is the data layer that sits between your systems of record and the automated systems that make decisions — AI agents, fraud engines, credit decisioning pipelines, real-time pricing services, eligibility checks. It ingests operational state from source systems, computes derived context (aggregates, features, embeddings), and serves it under guarantees that composed stacks can't provide.
The four requirements:
1. Derived state computation — aggregates, velocity counters, risk scores, and embeddings computed inside the context layer as source events arrive, not bolted on via external pipelines.
2. Freshness guarantees — a bounded, measurable delay between a source event and the moment served context reflects it.
3. Consistency under concurrency — every reader sees a transactionally consistent snapshot, even while writes are in flight.
4. Multi-pattern serving from one snapshot — point lookups, aggregations, full-text search, and vector similarity all answered from the same moment in time.
Context Engineering + Context Infrastructure = Reliable AI Agents
The relationship between context engineering and context infrastructure is not either/or. They're complementary layers of the same problem:
| Layer | What It Solves | Who Builds It |
|---|---|---|
| Context engineering | What information the agent needs, when, and in what format. Prompt design, tool selection, memory management, retrieval strategy. | AI engineers, agent developers |
| Context infrastructure | Where context lives, how it's kept fresh, and what guarantees it provides. Ingestion, derived state computation, multi-pattern serving, consistency. | Data/platform engineers |
Context engineering without context infrastructure produces agents that work in demos but fail under production load — because demos don't have concurrent state modifications, stale caches, or propagation delays.
Context infrastructure without context engineering produces a data layer that no one uses effectively — because the agent-side design (what to retrieve, how to format it, when to refresh) determines whether good infrastructure translates to good decisions.
The teams building the most reliable AI agent systems today are investing in both layers. They're applying Anthropic's context engineering patterns to design their agent logic, and they're building (or adopting) purpose-built context infrastructure to ensure the data underneath is complete, current, and consistent.
The Context Infrastructure Pattern
The architecture that satisfies these requirements is what we call a Context Lake — a system that provides an operational context layer between systems of record and the automated decisions that depend on them.
The pattern:
1. Ingest from systems of record via CDC — continuous, not batch. Operational state flows in as it changes.
2. Compute derived context inside the transactional boundary — incremental materialized views for aggregates, feature computations, and embeddings that update as source data arrives.
3. Serve all retrieval patterns — point lookups, aggregations, full-text search, vector similarity — from a single consistent snapshot. Every query reflects the same moment in time.
4. Guarantee freshness, consistency, and performance as system properties, not application-level aspirations.
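The steps above can be sketched in miniature. This is not a real Context Lake implementation — all names are illustrative — but it shows the core property: derived state is computed inside the write path, each write publishes a new immutable snapshot atomically, and every retrieval pattern for a decision is answered from one pinned snapshot version.

```python
from dataclasses import dataclass

# One immutable snapshot holds state for every retrieval pattern at a
# single version. Readers pin a snapshot; writers publish a new one.
@dataclass(frozen=True)
class Snapshot:
    version: int
    balances: dict    # point-lookup state
    tx_counts: dict   # derived aggregate (a velocity counter)
    embeddings: dict  # vectors for similarity search

class ContextLake:
    def __init__(self):
        self._snap = Snapshot(0, {}, {}, {})

    def ingest(self, account, amount, embedding):
        s = self._snap
        balances = {**s.balances, account: s.balances.get(account, 0) + amount}
        # Derived context is updated inside the same write, not by a
        # downstream pipeline with its own propagation delay.
        counts = {**s.tx_counts, account: s.tx_counts.get(account, 0) + 1}
        embeds = {**s.embeddings, account: embedding}
        # Publish atomically: readers see the old snapshot or the new
        # one in full, never a mix of the two.
        self._snap = Snapshot(s.version + 1, balances, counts, embeds)

    def snapshot(self):
        return self._snap  # pin one consistent view for a whole decision

lake = ContextLake()
lake.ingest("acct-1", 500, (0.1, 0.9))
lake.ingest("acct-1", -100, (0.2, 0.8))

snap = lake.snapshot()
balance = snap.balances["acct-1"]    # point lookup
velocity = snap.tx_counts["acct-1"]  # aggregation
pattern = snap.embeddings["acct-1"]  # similarity-search input
print(snap.version, balance, velocity, pattern)
```

All three reads come from `snap.version == 2`, so the balance, the velocity counter, and the behavioral vector are guaranteed to describe the same moment — the property a composed stack cannot offer.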
This is not a new database. It's a new architectural layer — purpose-built for serving decision context to any automated system that must act on a complete, consistent, and current view of operational state. AI agents are one consumer. Fraud detection engines, credit decisioning services, real-time pricing systems, and eligibility checks are others. The common requirement is the same: the decision must reflect reality at the moment it's made.
When You Need Context Infrastructure (and When You Don't)
Not every AI application needs dedicated context infrastructure. The decision depends on the consequences of context failure.
You probably don't need it if:
- Your agent operates on static knowledge bases (documentation, FAQs)
- Decisions are advisory (suggestions, summaries) rather than authoritative (approvals, blocks, commits)
- Context staleness of minutes or hours is acceptable
- A single retrieval pattern (e.g., vector search only) is sufficient
- Agents operate independently — no shared mutable state
You probably need it if:
- Decisions have financial, operational, or safety consequences
- Multiple agents or processes modify shared state concurrently
- Decisions must act on derived context (aggregates, velocity counters, risk scores) that lags behind source events
- The validity window is tight — the decision must be made before the context it depends on changes
- You're currently gluing together 3+ systems (cache + vector store + analytics DB + feature store) and discovering that propagation delays cause incorrect decisions
The second list describes most production fintech systems, fraud detection pipelines, real-time pricing engines, and enterprise AI agent deployments. It also describes any system where the context gap — the delay between when an event occurs and when derived context reflects it — has business consequences.
How Do You Know If Your Context Infrastructure Is Working?
Context engineering has good observability tooling — LangSmith, Braintrust, and others let you trace agent calls and evaluate whether retrieved context led to good decisions. But these tools observe the technique layer: what went into the prompt, what came out, whether the agent succeeded.
Context infrastructure needs its own observability:
- Freshness monitoring. What's the actual delay between a source event and when derived context reflects it? Not the design target — the measured reality. If your freshness SLA is 100ms but the measured p99 is 2 seconds, at least 1% of your agents' decisions run on context that is 20× staler than you designed for.
- Consistency auditing. When an agent reads context from multiple retrieval patterns, did all values reflect the same snapshot? In composed stacks, this is nearly impossible to measure because each system has independent state. In a unified context layer, it's a system property you can verify.
- Context gap alerting. When the gap between event time and derived-context-ready time exceeds the validity window for a decision, that's a context infrastructure failure — even if the agent itself ran perfectly. The decision was correct given what the agent saw; what the agent saw was wrong.
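A minimal sketch of the freshness and gap checks, under illustrative assumptions (the function names, SLA, and validity window are hypothetical): record the delay between each source event and the moment derived context reflecting it became readable, then compare the measured tail against both the freshness SLA and the decision's validity window.

```python
import math

def percentile(samples, p):
    # Nearest-rank percentile: smallest value with at least p% of samples <= it.
    xs = sorted(samples)
    idx = math.ceil(p / 100 * len(xs)) - 1
    return xs[max(idx, 0)]

def freshness_report(gaps_ms, sla_ms, validity_window_ms):
    p99 = percentile(gaps_ms, 99)
    return {
        "p99_ms": p99,
        # Design target vs measured reality.
        "sla_breached": p99 > sla_ms,
        # Context-gap alert: decisions acted on context that had already
        # expired by the time it was served.
        "context_gap_alert": p99 > validity_window_ms,
    }

# 100 measured event->served delays: 98 fast reads and a 2-second tail.
gaps = [40] * 98 + [2000] * 2
report = freshness_report(gaps, sla_ms=100, validity_window_ms=500)
print(report)
```

The average of these delays looks healthy (~80ms); the tail is what breaches the SLA and trips the gap alert, which is why freshness must be monitored as a percentile, not a mean.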
This is the bridge between context engineering observability ("did the agent use context well?") and context infrastructure observability ("was the context itself trustworthy?"). Both matter. Most teams only measure the first.