The Modern Data Stack Has a Coherence Problem

TL;DR: The modern data stack is good at making individual tables fresh. It’s bad at ensuring coherence across tables and systems when a decision reads all of them simultaneously. Three failure modes: preparation delay (derived state lags raw events), cross-system retrieval inconsistency (five systems = five different points in time), and snapshot incoherence under concurrency (velocity counters miss concurrent writes). Teams diagnose these as model or feature problems — they’re architectural problems. Coherence requires infrastructure designed for it from the start.

The modern data stack is an engineering achievement. Teams can ingest petabytes from dozens of sources, transform them with dbt, warehouse them in Snowflake or BigQuery, and serve them through semantic layers to dashboards that refresh in seconds. The tooling has never been better.

And yet the decisions that matter most, the ones automated systems make in the moment a customer buys, a transaction clears, or an agent acts, keep coming out wrong. Not because the data is missing. Not because the pipeline is slow. Because the stack was never designed for coherence.

---

What Coherence Means — and Why It’s Different from Freshness

When engineers talk about data quality, they usually mean freshness: how recently was this data updated? That’s a real problem, and the modern stack has made genuine progress on it. Streaming pipelines, near-real-time warehouses, and aggressive materialization schedules have compressed lag from hours to minutes, or even seconds.

But freshness is a per-table property. Coherence is a cross-system property.

A decision is coherent when every piece of context it consumes reflects the same version of reality at the same moment in time. A fraud check is coherent when the velocity counter, the account balance, the session signal, and the device fingerprint all describe the same instant — not a patchwork of states from four different systems, each updated at different times, each read at a different point in the query.
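
The difference between the two properties can be made concrete with a small sketch. Everything here is illustrative and assumed rather than taken from a real system: the `Reading` type, the source names, the timestamps, and the idea of approximating "the same instant" with a maximum allowed skew between sources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    source: str     # hypothetical system name
    as_of_ms: int   # the time this value reflects, in milliseconds


def is_fresh(reading: Reading, now_ms: int, max_age_ms: int) -> bool:
    """Per-source check: is this one value recent enough?"""
    return now_ms - reading.as_of_ms <= max_age_ms


def skew_ms(readings: list[Reading]) -> int:
    """Cross-source check: gap between the oldest and newest as-of times."""
    times = [r.as_of_ms for r in readings]
    return max(times) - min(times)


def is_coherent(readings: list[Reading], max_skew_ms: int) -> bool:
    """Assumed tolerance: the context counts as one instant if skew is small."""
    return skew_ms(readings) <= max_skew_ms


now_ms = 10_000
context = [
    Reading("velocity_counter", 9_950),
    Reading("account_balance", 9_400),
    Reading("session_signal", 9_990),
    Reading("device_fingerprint", 9_700),
]

# Every source passes a one-second freshness budget...
all_fresh = all(is_fresh(r, now_ms, max_age_ms=1_000) for r in context)  # True
# ...but the four values describe moments 590 ms apart.
coherent = is_coherent(context, max_skew_ms=100)  # False
```

Each reading passes its own freshness check, yet the set as a whole fails the cross-source check, which is the gap the rest of this post is about.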

Most modern data stacks are very good at keeping individual tables fresh. They are almost universally bad at ensuring coherence across tables and systems when a decision has to read all of them simultaneously.

---

The Stack Was Built for Analysis, Not for Decisions

The modern data stack’s architecture reflects its original purpose: analytical reporting. You have a source of truth (the warehouse), a transformation layer (dbt or similar), and a consumption layer (BI tools, notebooks, dashboards). The flow is append-only, batch-friendly, and eventually consistent.

That model works fine when a human is the decision-maker. A dashboard showing last night’s revenue is coherent enough for a Monday morning review. Stale by a few seconds? Nobody cares.

The problem is that this architecture has been retrofitted — often ad hoc — into serving automated decisions that need correct context now. Product teams building recommendation engines, risk teams building fraud models, and AI teams building autonomous agents all end up pulling from the same warehouse or the same derived tables that were designed for dashboards. The tooling was built for analysis. The decisions need something different.

---

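Three Ways the Modern Stack Loses Coherence

Preparation delay. Derived state lags the raw events it is computed from, so a decision can read an aggregate or feature that does not yet reflect what has already happened.

Cross-system retrieval inconsistency. Context assembled from five systems reflects five different points in time.

Snapshot incoherence under concurrency. Velocity counters miss concurrent writes, so a read taken during a burst does not reflect every write in flight.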

Why Teams Don’t See This as a Stack Problem

Here’s the frustrating part: teams usually diagnose coherence failures as model problems, feature problems, or freshness problems — not as architectural problems.

When a fraud model approves a transaction it should have blocked, the first instinct is to retrain the model, adjust the threshold, or improve the features. When an AI agent acts on wrong context, the first instinct is to improve the prompt, add memory, or switch models.

The interventions that should work — more data, better models, lower latency — don’t fix coherence failures, because the problem isn’t in any single layer. It’s in the gap between layers: the seam where independently-consistent systems have to be read together and their outputs treated as a unified picture of reality.

Coherence failures are invisible in the tooling. Your data observability platform will show green. Your feature store’s freshness metrics will look fine. Your latency dashboards will show sub-100ms reads. Everything looks healthy because every individual component is healthy. The incoherence only exists at the moment a decision assembles context from all of them simultaneously.

---

What Would a Coherence-Aware Stack Look Like?

A stack designed for decision coherence has three properties that the modern analytical stack lacks.

Single snapshot semantics across systems. A decision should be able to read all of its required context — transactional state, derived aggregates, streaming signals, vector representations — as of the same logical point in time. This is different from reading each system at “the latest.” It means the stack maintains a consistent snapshot that spans systems, so a decision sees a coherent view of reality rather than a patchwork of independently-current values.
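
As a rough sketch of the difference between reading each system at "the latest" and reading all of them as of one logical point in time, consider two toy multi-version stores. `VersionedStore`, its methods, and the timestamps are hypothetical, and the sketch assumes both stores stamp commits with a shared, comparable timestamp, which is exactly the hard part across real heterogeneous systems.

```python
import bisect


class VersionedStore:
    """Toy multi-version store: keeps every (commit_ts, value) per key."""

    def __init__(self):
        self._versions = {}  # key -> (sorted commit timestamps, values)

    def write(self, key, value, commit_ts):
        ts_list, values = self._versions.setdefault(key, ([], []))
        i = bisect.bisect_right(ts_list, commit_ts)
        ts_list.insert(i, commit_ts)
        values.insert(i, value)

    def read_latest(self, key):
        _, values = self._versions[key]
        return values[-1]

    def read_as_of(self, key, snapshot_ts):
        """Newest value committed at or before snapshot_ts, else None."""
        ts_list, values = self._versions[key]
        i = bisect.bisect_right(ts_list, snapshot_ts)
        return values[i - 1] if i > 0 else None


balances, counters = VersionedStore(), VersionedStore()
balances.write("acct", 500, commit_ts=100)
counters.write("acct", 3, commit_ts=100)

# "Latest" reads taken at different moments, with a commit in between.
seen_balance = balances.read_latest("acct")
balances.write("acct", 200, commit_ts=205)
counters.write("acct", 4, commit_ts=205)
seen_count = counters.read_latest("acct")
patchwork = (seen_balance, seen_count)  # (500, 4): a state that never existed

# Snapshot reads: both stores are read as of the same timestamp.
snapshot_before = (balances.read_as_of("acct", 100), counters.read_as_of("acct", 100))
snapshot_after = (balances.read_as_of("acct", 205), counters.read_as_of("acct", 205))
```

The patchwork pair combines the old balance with the new counter. Either snapshot pair, `(500, 3)` or `(200, 4)`, describes a state the account was actually in.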

Incremental materialization with bounded lag. Derived state — aggregates, features, rollups — should be maintained incrementally as events arrive, not recomputed on a batch schedule. The goal is not zero-lag (which is impossible for non-trivial transformations) but bounded lag: a guarantee that the context available at decision time is at most N milliseconds behind raw event arrival, where N is small enough to be within the validity window of the decision.
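
One way to picture bounded lag is a derived aggregate that is updated per event and refuses to serve reads once it trails event arrival by more than N milliseconds. This is a minimal sketch under assumptions: `IncrementalSum`, `StaleContextError`, and the choice to fail the read rather than wait or fall back are all hypothetical design choices, not a description of any particular product.

```python
class StaleContextError(Exception):
    """Raised when derived state trails arrivals by more than the bound."""


class IncrementalSum:
    """Running per-key total, updated one event at a time (no batch recompute)."""

    def __init__(self):
        self.totals = {}
        self.watermark_ms = 0  # event time of the newest event folded in

    def apply(self, key, amount, event_ts_ms):
        self.totals[key] = self.totals.get(key, 0) + amount
        self.watermark_ms = max(self.watermark_ms, event_ts_ms)

    def lag_ms(self, newest_arrival_ms):
        """How far the derived state trails the newest event that has arrived."""
        return max(0, newest_arrival_ms - self.watermark_ms)

    def read(self, key, newest_arrival_ms, max_lag_ms):
        lag = self.lag_ms(newest_arrival_ms)
        if lag > max_lag_ms:
            raise StaleContextError(f"lag {lag} ms exceeds bound {max_lag_ms} ms")
        return self.totals.get(key, 0)


view = IncrementalSum()
view.apply("acct", 20, event_ts_ms=1_000)
view.apply("acct", 30, event_ts_ms=1_040)

# Newest arrival is 10 ms ahead of the watermark: within a 50 ms bound.
total = view.read("acct", newest_arrival_ms=1_050, max_lag_ms=50)  # 50
```

If the newest arrival were at 1,200 ms instead, the lag would be 160 ms and the same read would raise `StaleContextError` rather than return a value outside the decision's validity window.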

Concurrent write isolation that doesn’t sacrifice read performance. Under high concurrency, reads and writes must be isolated such that a decision sees either a fully committed write or no write — not a partial state. This is a standard database guarantee that most analytical systems relax for throughput. A decision-coherent stack restores it for the specific reads that feed automated decisions.
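
The guarantee can be illustrated with a single-process stand-in for a velocity check. The sketch below assumes a hypothetical `VelocityLimiter` whose check and increment happen as one atomic step under a lock. A real system would rely on its database's isolation level rather than an in-memory lock, so treat this only as a picture of the behavior.

```python
import threading


class VelocityLimiter:
    """Approves at most `limit` requests; check and increment are atomic."""

    def __init__(self, limit):
        self.limit = limit
        self.count = 0
        self._lock = threading.Lock()

    def try_approve(self):
        # Without the lock, two threads could both read count == limit - 1
        # and both approve, which is the missed concurrent write.
        with self._lock:
            if self.count >= self.limit:
                return False
            self.count += 1
            return True


def run_burst(limiter, n_requests):
    """Fire n_requests concurrent approvals and collect the outcomes."""
    results = []
    results_lock = threading.Lock()

    def worker():
        ok = limiter.try_approve()
        with results_lock:
            results.append(ok)

    threads = [threading.Thread(target=worker) for _ in range(n_requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


limiter = VelocityLimiter(limit=3)
outcomes = run_burst(limiter, n_requests=50)
approved = sum(outcomes)  # 3, regardless of thread scheduling
```

Because every decision reads the counter in the same critical section that writes it, a burst of 50 concurrent requests still yields exactly 3 approvals.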

These properties are not exotic. They exist in database systems, though usually only within a single system boundary. The architectural challenge — and the reason the modern data stack hasn’t solved this — is providing them across the heterogeneous sources that real automated decisions consume.

---

The Coherence Problem Is Getting Harder

Three trends are making this worse.

AI agents read more context, from more systems, under tighter time constraints. A traditional fraud model might read five features from one system. A multi-agent orchestration system might read dozens of signals from a dozen systems, synthesize them, and act — all within a second. Each additional source multiplies the opportunity for incoherence.
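
A back-of-the-envelope way to see the growth: any two sources read by one decision can disagree about what time it is, so the number of pairs that must agree grows quadratically with the number of sources. The helper below is illustrative only, and counting pairs is just one assumed way to quantify "opportunity for incoherence".

```python
from math import comb


def source_pairs(n_sources: int) -> int:
    """Number of distinct source pairs that could disagree with each other."""
    return comb(n_sources, 2)


pairs = {n: source_pairs(n) for n in (1, 5, 12)}
# {1: 0, 5: 10, 12: 66}
```

A model reading from one system has no cross-source pairs to reconcile. A decision reading from a dozen systems has 66.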

Automated decisions are taking on higher-stakes actions. AI agents are increasingly being given the ability to take real-world actions: approving transactions, extending credit, executing trades, modifying customer state. The cost of acting on incoherent context is no longer a misfired recommendation — it’s a financial loss, a compliance violation, or a cascading error that’s hard to reverse.

Concurrency is increasing. As more decisions are automated and as systems scale, more concurrent state changes fall into the gap between when state changes and when derived context reflects it. Fraud rings exploit exactly this gap with deliberate high-concurrency bursts.

The modern data stack was not designed for this world. It was designed for a world where decisions are made by humans, who can tolerate staleness, who can recognize and correct inconsistencies, and who operate at a cadence that makes analytical eventual consistency acceptable.

---

Coherence Is Not a Feature. It’s Infrastructure.

The instinct, when faced with a coherence problem, is to solve it in the application: add more aggressive cache invalidation, tighten replication lag, build a custom state synchronization layer. Teams do this, and it works — until it doesn’t, which is usually at the worst possible moment, under the highest possible load.

Coherence cannot be reliably provided by application logic on top of an architecture that was never designed to support it. It requires infrastructure that was designed for it from the start: a system that maintains a consistent, multi-modal view of state across sources, keeps derived context within bounded lag of events, and guarantees snapshot isolation for the reads that feed automated decisions.

This is what the modern data stack is missing. Not more speed. Not more data. Not better models. Coherence — the guarantee that when a decision is made, the context it reads describes the same world at the same moment in time.

Until the stack provides that guarantee, automated systems will keep making decisions on a world that doesn’t quite exist anymore.

---

Related reading: [Why Real-Time Decisions Fail](/post/why-real-time-decisions-fail) · [Context Silos](/post/context-silos) · [What Context Engineering Actually Means](/post/what-context-engineering-actually-means)

Modern Data Stack · Decision Coherence · Context Lake · Architecture
Written by Xiaowei Jiang

Former Meta and Microsoft. Built distributed query engines at petabyte scale. Author of the Composition Impossibility Theorem (arXiv:2601.17019).
