Incremental Materialized View: How to Keep Derived State Fresh in Real Time
Standard materialized views recompute everything on a schedule. Incremental materialized views apply only the delta — continuously, as data changes. Here's how they work across Postgres, ClickHouse, Databricks, and streaming databases, what each approach can and can't do, and where they all hit the same structural limit.
TL;DR: Standard materialized views recompute everything on a schedule — fine for dashboards, dangerous for decisions. Views that support incremental refresh apply only the change, continuously. Postgres doesn't support incremental refresh natively (pg_ivm is experimental). ClickHouse has two flavors with different consistency tradeoffs. Databricks performs incremental refresh on batch cadences over delta tables. Streaming databases (Materialize, RisingWave) do true continuous maintenance. All of them solve the preparation gap — keeping derived state fresh. None of them solve the retrieval gap — serving that fresh derived state alongside raw state in one consistent snapshot under concurrent load.
You define a materialized view — a velocity counter, a rolling balance, an exposure aggregate — and everything works. Then load increases. Transactions arrive concurrently. Your `REFRESH MATERIALIZED VIEW` runs every five minutes, and in those five minutes, a fraud check reads a velocity counter that's three transactions behind. The fraud passes. The funds move. By the time the view refreshes, the window has closed.
This is the preparation gap: derived state that must be computed from raw events lags behind the events themselves. The materialized view was correct when it was last refreshed. It's wrong now.
The fix is incremental refresh — applying only the delta as source data changes, rather than recomputing the entire result on a schedule. The concept is simple. The implementations vary wildly.
What Is an Incremental Materialized View?
A materialized view is a precomputed query result stored as a table. You define it as a SQL query — an aggregation, a join, a window function — and the database stores the output. Queries hit the stored result instead of recomputing from the underlying data every time.
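In Postgres-flavored SQL, the pattern looks like this (the `transactions` table and column names are illustrative, not from any specific system):

```sql
-- Illustrative schema: one row per transaction
CREATE TABLE transactions (
    account_id  bigint      NOT NULL,
    amount      numeric     NOT NULL,
    created_at  timestamptz NOT NULL DEFAAULT now()
);

-- Precompute a per-account aggregate and store the result as a table
CREATE MATERIALIZED VIEW account_totals AS
SELECT account_id,
       count(*)    AS tx_count,
       sum(amount) AS total_amount
FROM transactions
GROUP BY account_id;

-- Reads hit the stored result, not the base table
SELECT tx_count FROM account_totals WHERE account_id = 42;
```

Until the view is refreshed, that last query returns whatever was true at creation time — which is exactly the staleness problem the rest of this article is about.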
The problem is how that stored result stays current as source data changes.
In a standard materialized view, a refresh operation reruns the entire query against the base tables and replaces the stored result. This default refresh behavior is fine when source data changes slowly and staleness is measured in hours. It breaks when the derived state is consumed by automated systems making real-time decisions — fraud velocity counters, credit exposure aggregates, session-bound eligibility checks — where "five minutes stale" means wrong.
An incremental materialized view maintains the stored result by applying only the delta. When new rows are inserted into the source table, the view updates only what changed: increment the counter, adjust the sum, recompute the affected partition. No full rescan. The cost is proportional to the volume of changed data, not the entire dataset.
This distinction matters at scale. If your source table has 100 million rows and 50 new transactions arrive, a full refresh rereads 100 million rows. An incremental refresh processes only the changed data — the view is incrementally refreshed rather than rebuilt from scratch. Under high write throughput, the difference between these two approaches is the difference between a view that's current and a view that's perpetually catching up.
Why Incremental Refresh Matters for Real-Time Systems
Materialized views were originally designed for data warehouse workloads and analytical dashboards — precompute expensive aggregations so the dashboard loads fast. Staleness measured in hours was acceptable because a human was reading the result on a Monday morning.
That contract has changed. Today, materialized views are consumed by automated decision systems that act on derived state in real time. The default refresh that runs overnight in a data warehouse is too slow when decisions happen in milliseconds:
- Fraud velocity counters — transaction count per account over the last 60 seconds, evaluated during every card authorization
- Credit exposure aggregates — total outstanding balance per borrower, checked before every new approval
- Portfolio positions — current holdings computed from trade history, queried immediately after every execution
- Session-bound eligibility — real-time qualification status during checkout, based on aggregated behavior within the session
- Risk signals — derived features combining multiple event streams into a single score, consumed by ML models at inference time
In every case, the decision reads derived state and acts on it — often irreversibly. A stale velocity counter means fraud passes. A stale exposure aggregate means a credit limit gets exceeded. A stale portfolio position means the user sees a balance that doesn't reflect what just happened.
The failure mode isn't a slow dashboard. It's a wrong decision. The underlying data changed, but the materialized view's refresh hadn't completed before the decision evaluated it.
This is what makes incremental refresh an infrastructure requirement rather than a performance optimization. When derived state feeds automated decisions, the freshness of that state is a correctness property, not a latency metric. You need to optimize materialized views for decision-time correctness, not just query performance.
Incremental Materialized Views in PostgreSQL
PostgreSQL has native materialized views — but does not support incremental refresh. `CREATE MATERIALIZED VIEW` stores the result of a query. `REFRESH MATERIALIZED VIEW` reruns the query against the base tables and replaces the stored result. Every refresh recomputes from the full source table, regardless of how little data actually changed.
`REFRESH MATERIALIZED VIEW CONCURRENTLY` is the closest native approximation. It computes the new result alongside the old one and diffs them, so reads aren't blocked during refresh. But it still runs the full query — the cost is proportional to source table size, not changed data volume. And it requires a unique index on the view, which limits the query structure you can use.
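Assuming a materialized view `account_totals` keyed by account (illustrative names), the concurrent refresh pattern is:

```sql
-- CONCURRENTLY requires a unique index covering every row of the view
CREATE UNIQUE INDEX account_totals_pk ON account_totals (account_id);

-- Computes the full query result, diffs it against the stored one,
-- and applies the difference — readers aren't blocked, but the
-- recompute cost still scales with the source table, not the delta
REFRESH MATERIALIZED VIEW CONCURRENTLY account_totals;
```

Note that the first `REFRESH` after creating the view cannot use `CONCURRENTLY` if the view was created `WITH NO DATA`.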
pg_ivm: Experimental incremental refresh support
pg_ivm is an extension that adds incremental view maintenance to PostgreSQL. It hooks into the write path: when source table data changes, pg_ivm updates only the affected view rows. No full rescan. The cost tracks the volume of changed data.
pg_ivm supports a subset of SQL — aggregate functions (`SUM`, `COUNT`, `AVG`), `GROUP BY`, joins, and subqueries. It doesn't support all window functions or common table expressions. It's a real implementation of incremental refresh, but it's an extension, not core PostgreSQL, and it's early-stage for production workloads at scale.
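A sketch of the pg_ivm API, per the extension's documentation (table and view names illustrative; recent versions schema-qualify the function as `pgivm.create_immv`):

```sql
CREATE EXTENSION IF NOT EXISTS pg_ivm;

-- Create an incrementally maintained materialized view (IMMV).
-- Unlike plain CREATE MATERIALIZED VIEW, writes to the source table
-- update the view's rows immediately, within the same transaction.
SELECT create_immv(
    'account_totals_ivm',
    'SELECT account_id,
            count(*)    AS tx_count,
            sum(amount) AS total_amount
     FROM transactions
     GROUP BY account_id'
);

-- Reads now see the effect of committed inserts with no refresh step
SELECT tx_count FROM account_totals_ivm WHERE account_id = 42;
```

Because maintenance happens synchronously on commit, each write to `transactions` pays the cost of updating the view — the freshness guarantee is bought with write-path latency.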
The structural limit: Even with pg_ivm, PostgreSQL's materialized views are maintained within a single Postgres instance. At high concurrent write throughput — hundreds or thousands of transactions per second modifying shared state — the incremental refresh competes with transactional writes for the same resources. Postgres was designed for transactional workloads and operational databases, not for maintaining continuously fresh derived state at scale alongside those transactions.
Bottom line: PostgreSQL materialized views work well for analytical precomputation on a scheduled refresh. For derived state consumed by real-time decisions under concurrent load, the native tooling doesn't support incremental refresh at all (pg_ivm is promising but unproven at scale).
Incremental Materialized Views in ClickHouse
ClickHouse has two distinct materialized view mechanisms, and the terminology can be confusing because both are called "materialized views" but the refresh semantics differ significantly.
Incremental materialized views (the default `CREATE MATERIALIZED VIEW`) are triggered on insert. When new rows land in a source table, ClickHouse runs the view's query against only the new rows and appends results to a destination table. The incremental refresh processes changes from each insert batch, not the full source table.
The catch: these views only see the new data, not the full source. An aggregate query like `SUM` doesn't recompute the global sum — it computes the sum of the new batch. To get correct global aggregates, you pair this with the `AggregatingMergeTree` engine, which merges partial aggregates at query time. This means the view's stored result is not always immediately correct — it becomes correct after background merges complete. Under high write throughput, batch query results can temporarily reflect partial state.
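The standard pairing — a destination table with partial aggregate states, plus an insert-triggered view — looks like this (names illustrative; a `transactions` source with `account_id` and `ts` columns is assumed):

```sql
-- Destination table stores partial aggregate states,
-- merged asynchronously in the background
CREATE TABLE account_tx_counts
(
    account_id   UInt64,
    window_start DateTime,
    tx_count     AggregateFunction(count)
)
ENGINE = AggregatingMergeTree()
ORDER BY (account_id, window_start);

-- Runs against each insert batch only, appending partial states
CREATE MATERIALIZED VIEW account_tx_counts_mv
TO account_tx_counts
AS SELECT
    account_id,
    toStartOfMinute(ts) AS window_start,
    countState() AS tx_count
FROM transactions
GROUP BY account_id, window_start;

-- Query time: the -Merge combinator combines partial states
-- into the global count, regardless of merge progress
SELECT account_id, countMerge(tx_count) AS tx_last_minute
FROM account_tx_counts
WHERE window_start >= now() - INTERVAL 1 MINUTE
GROUP BY account_id;
```

Using `countMerge` at read time compensates for unmerged parts — but only within that one query; the stored table itself remains eventually consistent.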
Refreshable materialized views (`CREATE MATERIALIZED VIEW ... REFRESH EVERY`) are the full-recompute model — similar to PostgreSQL's `REFRESH MATERIALIZED VIEW`, run on a configurable schedule. Simpler mental model, but the same scalability problem: cost is proportional to source table size, not the volume of changed data.
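A minimal refreshable view, per ClickHouse's documented syntax (names illustrative; verify the clause against your server version):

```sql
-- Full recompute every 10 minutes into a regular MergeTree table
CREATE MATERIALIZED VIEW account_totals_refreshable
REFRESH EVERY 10 MINUTE
ENGINE = MergeTree
ORDER BY account_id
AS SELECT account_id,
          count()     AS tx_count,
          sum(amount) AS total_amount
FROM transactions
GROUP BY account_id;
```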
The structural limit: ClickHouse incremental materialized views are eventually consistent. The `AggregatingMergeTree` merge process is asynchronous — reads may return partially merged results. There's no snapshot isolation guarantee: a batch query might see some recent inserts reflected in the aggregate but not others. For dashboards and a data warehouse, this is fine. For automated decisions where correctness at the moment of evaluation matters — a fraud check, a credit limit enforcement — "eventually correct after merges" is a gap.
ClickHouse also fans out non-primary-key queries to all shards. At high QPS on filtered queries, adding nodes adds work per query, not capacity. This limits throughput for the concurrent, filtered query patterns that real-time decision systems require.
Bottom line: ClickHouse incremental materialized views are excellent for near-real-time analytics and dashboards where eventual consistency is acceptable. The system doesn't strictly require incremental refresh to be strongly consistent — it trades consistency for write throughput. For decision-time derived state that must be correct when read under concurrent writes, the merge-based model leaves a gap.
Incremental Materialized Views in Databricks and BigQuery
Databricks and BigQuery both support incremental refresh for materialized views — but on batch cadences, not continuous streams. The refresh semantics are fundamentally batch-oriented.
Databricks performs incremental refresh over Delta tables using Unity Catalog managed tables. The system tracks which source data has changed since the last refresh and processes only the delta; depending on the query structure, it chooses incremental or full refresh per run. This is more efficient than a full recompute, but the cadence is scheduled — not continuous as data arrives. The minimum practical interval is minutes, not milliseconds.
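A sketch in Databricks SQL, per the documented `SCHEDULE` clause (names illustrative; check your runtime's documentation for supported schedule syntax):

```sql
-- Scheduled refresh; Databricks picks incremental vs. full per run
CREATE MATERIALIZED VIEW account_totals
SCHEDULE EVERY 1 HOUR
AS SELECT account_id,
          count(*)    AS tx_count,
          sum(amount) AS total_amount
FROM transactions
GROUP BY account_id;

-- Manual refresh between scheduled runs
REFRESH MATERIALIZED VIEW account_totals;
```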
BigQuery materialized views support automatic incremental maintenance for a subset of query patterns (single-table aggregations, some joins). When base tables receive new data, BigQuery incrementally updates the view. The refresh is automatic but not instantaneous — there's a propagation delay, and during high-volume writes the view may temporarily serve stale results. Automatic caching also behaves differently from explicit refresh — BigQuery may serve cached results rather than the freshest underlying data.
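In BigQuery, the refresh cadence is configured via view options (dataset and table names illustrative):

```sql
CREATE MATERIALIZED VIEW mydataset.account_totals
OPTIONS (
  enable_refresh = true,           -- automatic incremental maintenance
  refresh_interval_minutes = 30    -- target maximum staleness
)
AS SELECT account_id,
          COUNT(*)    AS tx_count,
          SUM(amount) AS total_amount
FROM mydataset.transactions
GROUP BY account_id;
```

Even at the minimum interval, the guarantee is "refreshed within N minutes" — a staleness bound, not an event-driven update.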
The structural limit: Both systems are designed for data warehouse and analytical workloads where "minutes fresh" is a major improvement over "hours fresh." Neither targets the sub-second freshness that real-time decision systems require. Incremental maintenance reduces compute cost — you're not rescanning all rows on every refresh — but the cadence is batch, not streaming.
For fraud velocity counters, credit exposure checks, or any derived state consumed at transaction time, a batch-cadence materialized view — even one with incremental maintenance — is still a scheduled snapshot, not a live signal.
Incremental Materialized Views in Streaming Databases
Streaming databases — Materialize, RisingWave, and to some extent ksqlDB — are built specifically around incremental materialized views as the core primitive. These systems offer the best support for continuous maintenance available today.
You define a view as a SQL query. The streaming database maintains it continuously as events arrive from Kafka or CDC sources. Every new event triggers an update — the view's stored result reflects the change immediately. Queries return the pre-maintained result in milliseconds. No scheduled refresh. No full recompute.
Materialize, built on Timely Dataflow and Differential Dataflow, provides particularly strong refresh semantics: materialized views reflect a consistent point in time, and the system can compose views on top of views — enabling multi-stage derivation pipelines expressed entirely in SQL. RisingWave takes a similar approach with cloud object storage as the persistence layer.
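Assuming a `transactions` relation already ingested from Kafka or CDC (column names illustrative), a continuously maintained sliding-window counter in Materialize-style SQL looks like:

```sql
-- Maintained incrementally as each event arrives; mz_now() makes
-- this a temporal filter, so old events fall out of the window
CREATE MATERIALIZED VIEW tx_velocity AS
SELECT account_id, count(*) AS tx_count_60s
FROM transactions
WHERE mz_now() <= created_at + INTERVAL '60 seconds'
GROUP BY account_id;

-- Views compose: a second stage derives from the first,
-- and both stay fresh without any scheduled refresh
CREATE MATERIALIZED VIEW high_velocity_accounts AS
SELECT account_id
FROM tx_velocity
WHERE tx_count_60s > 10;

-- Reads return the pre-maintained answer in milliseconds
SELECT * FROM high_velocity_accounts;
```

The threshold of 10 transactions per minute is an arbitrary example, not a recommendation.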
For the preparation gap — keeping derived state current as source table data changes — streaming databases are the strongest available solution. They replace the Kafka → Flink → Redis pipeline with a single system that ingests events and maintains derived state continuously, queryable via standard SQL.
The structural limit: Streaming databases sit downstream of your source systems. They ingest via Kafka or CDC, which means there's a pipeline — source database → WAL → Kafka → streaming database → materialized view. Each hop adds latency. Under high throughput, consumer lag compounds with view maintenance lag.
More fundamentally, streaming databases solve the preparation gap but not the retrieval gap. If a decision needs derived state (the velocity counter, incrementally refreshed in the streaming database) and raw state (the account record from Postgres) and a similarity match (from a vector store), those three reads come from three systems at three different points in time. The derived state is fresh. The decision context is still inconsistent — assembled from multiple independent snapshots that never coexisted.
For workloads where the only problem is stale derived state, streaming databases are the right category. For workloads where derived state must be served alongside raw state and other retrieval patterns in one consistent snapshot, the streaming database closes half the gap.
Comparison: Incremental Refresh Support Across Systems
How each system handles incremental refresh, the refresh behavior and semantics, and where the tradeoffs land:
| System | Supports Incremental Refresh? | Freshness | Consistency | Best For |
| --- | --- | --- | --- | --- |
| PostgreSQL (native) | No — full refresh replaces the entire result | Minutes to hours (scheduled) | Strong (within Postgres) | Analytical precomputation |
| PostgreSQL (pg_ivm) | Yes — incremental maintenance on write to source table | Sub-second (on commit) | Strong (within Postgres) | Low-to-moderate write throughput |
| ClickHouse (incremental MV) | Yes — incremental refresh per insert batch | Seconds (after merge) | Eventual (AggregatingMergeTree) | Near-real-time analytics |
| ClickHouse (refreshable MV) | No — full recompute on schedule | Minutes (scheduled) | Point-in-time per refresh | Scheduled analytical rollups |
| Databricks | Yes — incremental refresh over Delta tables | Minutes (scheduled) | Snapshot per refresh | Data warehouse batch pipelines |
| BigQuery | Yes — automatic incremental maintenance | Minutes (propagation delay) | Eventual | Cloud data warehouse workloads |
| Materialize | Yes — continuous incremental maintenance | Sub-second (event-driven) | Strong (within Materialize) | Streaming-first derived state |
| RisingWave | Yes — continuous incremental maintenance | Sub-second (event-driven) | Strong (within RisingWave) | Cloud-native streaming workloads |
The Deeper Problem: Fresh Views, Fragmented Context
Every system in the table above addresses the same question: how do you keep a precomputed query result current as source tables change? And every system — from Postgres to Materialize — answers it within its own boundary.
That's the part nobody talks about.
An incremental materialized view keeps derived state fresh inside the system that maintains it. The velocity counter is incrementally refreshed in the streaming database. The aggregated balance is current in ClickHouse. The feature vector is current in the feature store. Each system's refresh machinery keeps its own materialized views in sync with their source tables.
But the decision doesn't read from one system. A fraud check evaluates a velocity counter (derived), an account balance (raw data from Postgres), a device fingerprint (raw), and a behavioral similarity match (vector). If the velocity counter lives in Materialize, the account balance lives in Postgres, and the similarity match lives in Pinecone, the decision is reading from three systems at three different points in time.
Each individual piece of context may be fresh. The composite — everything the decision evaluates together — never existed as a consistent whole. The velocity counter reflects events through timestamp T₁. The account balance reflects state at T₂. The similarity match was computed at T₃. The decision treats them as simultaneous. They aren't.
This is the retrieval gap: the context a decision needs requires multiple retrieval patterns — point lookups, aggregations, similarity search — split across systems with non-overlapping query capabilities. Each system is internally consistent. The cross-system composite is not.
Incremental materialized views solve the preparation gap — they keep derived state current as source table data changes. They don't solve the retrieval gap. And for automated decisions that act on derived state alongside raw data, the retrieval gap is where the wrong decisions happen.
What a Unified Approach Looks Like
The preparation gap and the retrieval gap collapse when derived state is maintained inside the same transactional boundary as the raw data it's derived from — and served alongside every other retrieval pattern from one consistent snapshot.
This is the architecture behind a context lake: a single system that ingests operational state from systems of record via CDC, computes derived context (aggregations, velocity features, embeddings) continuously as source table data changes, and serves all retrieval patterns — point lookups, range scans, aggregations, full-text search, vector similarity — from one consistent snapshot. No data warehouse batch cadence. No pipeline hops.
The key difference from the composed stack isn't speed. It's boundary. When derived state is maintained inside the transactional boundary, there's no propagation hop between source event and view update. There's no cross-system read that assembles a composite from mismatched timestamps. The decision evaluates context — raw and derived — as it exists at one moment.
Tacnode is built as a PostgreSQL-compatible context lake. Incremental materialized views are maintained inside the transactional boundary — the refresh operation happens as part of the same system that stores the underlying data. A fraud check that needs a velocity counter, an account balance, and a behavioral similarity match gets all three from the same consistent snapshot — not three systems at three propagation stages.
Per-feature freshness control means each derived feature can be configured with an optimal refresh policy independently: always current when the business depends on it (velocity counters for fraud), allowed to lag when it doesn't (weekly rollups for reporting). In the composed stack, all features share the same pipeline latency. In a context lake, freshness is per feature, not per pipeline.
If your materialized views feed dashboards or a data warehouse, the systems above work well — pick the one that fits your infrastructure. If your materialized views feed automated decisions that act on derived state alongside raw state under concurrent load, the question isn't which system refreshes fastest. It's whether the view and the rest of the context the decision needs are served from the same moment in time.
How to Optimize Materialized Views for Incremental Refresh
Regardless of which system you use, there are source-data characteristics and query-structure choices that determine whether a view can be incrementally refreshed — or whether the system falls back to a full recompute.
Source write patterns matter. Views maintain incrementally best when base tables are append-heavy (new events, new transactions) rather than heavily updated in place. If your source table receives mostly inserts, the incremental refresh processes changes efficiently. If rows are frequently updated or deleted, some systems must fall back to full refresh because the delta is harder to compute.
Query structure determines refresh behavior. Not every query supports incremental refresh. Systems like Databricks and BigQuery may fall back to full refresh if the query includes certain patterns — non-deterministic functions, certain outer joins, or expressions the optimizer can't decompose into incremental updates. When the system can't produce an incremental plan, it silently defaults to full recomputation. Check your system's documentation for which query shapes refresh incrementally and which trigger the fallback.
Refresh policy controls freshness guarantees. In batch systems, the policy controls how frequently refreshes occur — on demand or on a schedule. BigQuery manages refresh attempts automatically, but the cadence isn't sub-second. In streaming databases, refresh is continuous by default, but you can still configure how the system handles late-arriving data and how long to retain history for older windows.
Monitor for operations that silently become full refreshes. Some systems fall back from incremental to full refresh under certain conditions — schema changes, retention threshold breaches, or changes to the view definition. When incremental refresh no longer applies to a given view, the refresh replaces the stored result with a full recompute, and latency spikes. Track refresh metadata and monitor refresh duration to catch these regressions early.