Features, embeddings, and AI signals — one system
A Context Lake is semantic — features, aggregates, vector search, and LLM-derived signals are all first-class context, computed and served within the same system as raw state.
Features, embeddings, and LLM signals each require different infrastructure to compute and query. When they live in separate systems, combining them at decision time means stitching results together across different APIs — with no guarantee that they agree.
No unified retrieval
3 separate APIs. 3 separate consistency models. Cross-system queries impossible at decision time.
Feature Store
Structured / aggregates
Vector Database
Embeddings / similarity
LLM Pipeline
Semantic signals
Three Kinds of Semantic Context
Derived context is not monolithic. Structured features, vector embeddings, and LLM-inferred signals have different computational models and different query semantics — but they all describe the same underlying state and should be consistent with it.
Structured
Features & Aggregates
Deterministic, SQL-derived context: rolling windows, ratios, velocity counts. Precise and reproducible. Filterable with exact predicates.
avg_order_value_7d
fraud_velocity_1h
demand_score
user_ltv
Embedding
Vector Representations
Dense vector encodings of objects or queries. Enable similarity search and nearest-neighbor retrieval — matching by semantic proximity rather than exact predicates.
product_embedding
user_preference_vec
doc_embedding
query_vec
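As a sketch, retrieval over a stored embedding column might look like the following — assuming a pgvector-style `<->` distance operator; the table and column names are illustrative:

```sql
-- Nearest-neighbor retrieval over stored embeddings.
-- '<->' is a pgvector-style distance operator; schema is illustrative.
SELECT product_id, name
FROM products
ORDER BY product_embedding <-> :query_vec
LIMIT 20;
```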
LLM-Derived
Model-Inferred Signals
Probabilistic interpretations that can't be expressed as SQL: classifications, sentiment, intent labels, entity extraction. Computed by a model, stored as context.
intent_label
sentiment_score
topic_cluster
entity_tags
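Once computed by a model, these signals are just columns — filterable with ordinary SQL predicates. A minimal sketch, with illustrative table and column names:

```sql
-- Model-inferred signals stored as ordinary columns, filtered with SQL.
-- Table and column names are illustrative.
SELECT conversation_id
FROM conversations
WHERE intent_label = 'high_purchase'
  AND sentiment_score < -0.5
  AND topic_cluster IN ('billing', 'refund');
```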
Unified Derivation: All Signals, One Boundary
When derivation happens inside the same transactional boundary as raw state, derived signals are always consistent with the state they were computed from. No separate systems. No sync pipelines. No cross-system coordination.
SELECT avg(order_value) FROM orders WHERE user_id = ? AND ts > now() - interval '7d'
Aggregates and features computed directly over transactional state — no separate feature store.
One system. One consistency model. All signal types queryable and filterable together.
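A minimal sketch of what "inside the same transactional boundary" means in practice — the raw write and the derived feature commit or roll back together (schema and feature names are illustrative):

```sql
-- Raw state and derived feature updated in one transaction:
-- no sync pipeline, no window where they disagree.
BEGIN;

INSERT INTO orders (user_id, order_value, ts)
VALUES (:user_id, :amount, now());

UPDATE user_features
SET avg_order_value_7d = (
  SELECT avg(order_value)
  FROM orders
  WHERE user_id = :user_id
    AND ts > now() - interval '7d'
)
WHERE user_id = :user_id;

COMMIT;
```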
Selecting the Right Context at Query Time
Semantic context isn't only about how signals are constructed — it's also about how agents select and filter what they need. The three signal types have different selection models:
Structured — demand_score > 0.7, category = 'electronics', ts > now() - interval '1h'
Embedding — vec <-> query_vec < 0.3, ORDER BY vec <-> query_vec LIMIT 20
LLM-derived — intent_label = 'high_purchase', sentiment = 'negative', topic IN ('billing', 'refund')
When all three live in one system, agents can compose these selection models in a single query — structured filters narrow the candidate set, vector similarity ranks it, LLM-derived labels apply semantic conditions. The result is exact enough to be actionable, and flexible enough to express meaning that no individual signal type can capture alone.
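A minimal sketch of such a composed query — structured filters narrow, vector similarity ranks, an LLM-derived label applies a semantic condition. It assumes a pgvector-style `<->` operator; the tables and columns are illustrative:

```sql
-- One query composing all three selection models.
-- Schema is illustrative; '<->' assumes pgvector-style vector distance.
SELECT p.product_id
FROM products p
JOIN product_signals s USING (product_id)
WHERE s.demand_score > 0.7               -- structured feature filter
  AND p.category = 'electronics'         -- exact predicate
  AND s.intent_label = 'high_purchase'   -- LLM-derived label
ORDER BY p.product_embedding <-> :query_vec  -- vector similarity ranking
LIMIT 20;
```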
What Semantic Context Actually Requires
Unifying derived context isn't a query routing problem. It requires derivation and serving to share a transactional boundary.
Consistent Derivation
On-Demand or Pre-Computed
Unified Query Surface
Semantic Atomicity
See how Tacnode unifies all derived context
Structured queries, vector search, and LLM-derived signals — all inside one transactional boundary, computed from the same state.