Materialized Views
Materialized views are a powerful feature in Tacnode that can dramatically improve query performance by pre-computing and storing the results of complex queries. Unlike standard views that execute their underlying queries each time they're accessed, materialized views physically store query results, providing immediate access to pre-calculated data.
Understanding Materialized Views
A materialized view is a database object that contains the results of a query. Think of it as a "snapshot" of data that's computed once and stored physically on disk, similar to a table. However, unlike regular tables, materialized views are defined by a query and can be refreshed to reflect changes in the underlying data.
Key Characteristics:
Feature | Standard View | Materialized View |
---|---|---|
Storage | Virtual (query definition only) | Physical (stores actual data) |
Execution | Runs query each time accessed | Returns pre-computed results |
Performance | Varies based on underlying data | Consistent, fast access |
Data Freshness | Always current | Current as of last refresh |
Storage Requirements | Minimal | Requires disk space for results |
Performance Benefits:
Strategic Use Cases for Materialized Views
Query Performance Acceleration:
Materialized views excel at improving response times for computationally expensive queries, particularly those involving complex aggregations, multiple table joins, or large dataset scans.
Ideal Scenarios:
- Dashboard and Reporting: Pre-compute metrics for business intelligence dashboards
- Data Warehouse Analytics: Store pre-aggregated fact tables for faster OLAP queries
- Complex Aggregations: Cache results of statistical calculations, rolling averages, or trend analyses
- Frequently Accessed Reports: Store results of recurring reports that don't need real-time data
Performance Impact:
Cross-Data Source Query Optimization:
When queries span multiple data sources or involve expensive remote data access, materialized views can significantly reduce data transfer costs and improve response times.
Common Applications:
- Data Federation: Combine data from multiple databases or external APIs
- Cross-Region Queries: Cache data from geographically distributed sources
- External API Integration: Store API responses to reduce API call frequency and costs
- Hybrid Cloud Architectures: Optimize queries across on-premises and cloud data sources
Example Implementation:
System Load Reduction:
In high-concurrency environments, materialized views can dramatically reduce system load by eliminating the need to repeatedly execute resource-intensive queries.
Benefits:
- CPU Conservation: Reduce computational load on database servers
- Memory Efficiency: Lower working memory requirements for query execution
- I/O Optimization: Minimize disk reads from large base tables
- Concurrent User Support: Handle more simultaneous users without performance degradation
Real-world Example:
Data Warehousing and ETL:
Materialized views serve as an excellent tool for building efficient data processing pipelines and maintaining hierarchical data transformations in data warehouses.
ETL Applications:
- Staging Area Management: Create intermediate processing stages
- Data Quality Checks: Store validation results and data profiling metrics
- Incremental Processing: Build efficient change-detection mechanisms
- Data Lineage: Track data transformation stages and dependencies
Multi-tier Processing Example:
Characteristics of Materialized Views
Materialized views offer several key advantages:
- Pre-computed Results: Query results are calculated once and stored physically, eliminating the need to re-execute complex queries
- Table-like Access: Can be queried using standard SQL just like regular tables, with full index support
- Automatic Synchronization: Data consistency is maintained through controlled refresh mechanisms
- Resource Efficiency: Reduces CPU and I/O overhead by avoiding repeated complex calculations
- Caching Layer: Acts as a high-performance data cache, reducing load on source tables
How to Use Materialized Views
Materialized views can be accessed in two ways:
1. Direct Querying
- Query materialized views like regular tables using standard SQL
- Create indexes on materialized views for enhanced performance
- Note:
INSERT
,UPDATE
,DELETE
operations are not allowed - Data updates require
REFRESH
commands
2. Automatic Query Optimization
- The optimizer automatically rewrites queries to use materialized views when beneficial
- This optimization only occurs when the materialized view is considered "fresh"
Freshness Rules:
- Views are "fresh" when created and after each refresh
- Views become "expired" when underlying tables change
Key Trade-off:
- Automatic optimization: Convenient but may have unstable performance with expired views
- Direct querying: More predictable performance but data may not be real-time
Categories of Materialized Views
When underlying data changes, materialized views must be updated through a refresh process. Based on refresh strategies, there are two main types:
Materialized views can be categorized based on their refresh strategies into two types:
-
Full Materialized View: In this approach, the materialized view is refreshed by re-executing the query statement in its definition, entirely replacing the existing content with the latest results from the query.
-
Incremental Materialized View: This refresh method analyzes the materialized view's defined query statement, assesses changes since the last update, and merges those changes into the existing materialized view.
Incremental and real-time materialized views require analysis of the query statement and a transformation into a format suitable for incremental calculation. However, due to the intricacies of incremental calculations, not all SQL functions are fully compatible, imposing certain limitations. We will also compare these materialized views across the following dimensions:
Full Materialized View | Incremental Materialized View | |
---|---|---|
Supported Query Types | All | - Project - ProjectSet - Where - Distinct - Over Window - Aggregate - SUM/COUNT/MAX/MIN/AVG - Agg with Filter and Distinct Agg - Inner Join - Left/Right/Full Outer Join - Outer Join must have equivalent conditions |
Unsupported Query Types | None | - All VOLATILE functions - Order By - Subqueries such as Exists / In - Union |
Is It Expired | The data in a materialized view is outdated if the underlying table has changed since the last refresh | The data in a materialized view is outdated if the underlying table has changed since the last refresh |
Refresh Cost | Associated with the total data volume in the base table | Relates to the volume of data changes in the base table since the last refresh and the incremental calculation complexity of the query. |
Typical Refresh Interval | Daily, hourly | Every few seconds to minutes |
VOLATILE
functions return results that may change with every call, even when given the same input parameters. Typical examples include:
- Functions returning the current time: clock_timestamp() etc.
- Random number generators: random(), setseed() etc.
- Sequence number functions: nextval(), setval(), lastval() etc.
- Functions related to environment variables: current_query(), timezone(), timeofday() etc.
To get the full list of VOLATILE
functions, query
These functions are not supported in incremental materialized views.
In real-world situations, we must decide which materialized views to use based on queries' complexity, frequency, timeliness needs, and a thorough assessment of the associated refresh costs.
Fully Materialized View Management
Creating Fully Materialized Views:
Syntax
Parameter description:
IF NOT EXISTS
: No error will be reported if the view exists.view_name
: Name of the materialized view.column_name
: Optional; specify the column's name in the view.USING
: Optional; specify the storage type.query
: SELECT statement defining the view's content.WITH [ NO ] DATA
: Indicates whether to populate data during creation.
Example
When creating a materialized view, using WITH DATA
(the default) immediately executes a query to populate the data. If the data volume is significant, this process may take considerable time and block other operations. Alternatively, you might opt for WITH NO DATA
, which allows you to refresh and fill out the data separately later.
Refreshing:
Syntax:
Parameter description:
CONCURRENTLY
: This parameter is optional. It allows simultaneous viewing refresh to mitigate the obstruction of other query operations; however, certain restrictions apply:- This operation cannot be executed within a transaction block.
- The
WITH NO DATA
option is unsupported.
view_name
: Denotes the name of the materialized view designated for refreshment.WITH [ NO ] DATA
:WITH DATA
(Default): This option refreshes the data immediately using the query statement outlined by the view.WITH NO DATA
: This option updates only the view's definition without refreshing the data. It's commonly used when creating a new view, which is then complemented by a separate data refresh.
Example
Note:
-
Updating materialized views uses resources and requires balancing the frequency of refreshes with the need for real-time data.
-
It’s advisable to schedule refreshes during off-peak times or to implement an automated scheduling system for updates.
Deleting:
Syntax
Parameter Details
IF EXISTS
: No error will be raised if the specified view doesn’t exist.view_name
: This represents the name of the materialized view intended for deletion. You can specify multiple values separated by commas.CASCADE
: This option removes any dependent objects associated with the view, like indexes.RESTRICT
: If there are dependent objects, the deletion will not proceed—this is the default behavior.
-
Deleting a materialized view is permanent and cannot be undone, so please take care when proceeding.
-
Before deletion, confirm that the view is no longer necessary and that no dependent objects exist. Check the view's dependencies to prevent accidental deletion.
-
When using
CASCADE
, exercise caution, as all dependent objects will be deleted simultaneously, potentially causing significant consequences. -
You must have the necessary permissions to delete a view. Insufficient permissions will result in an error.
-
Deleting a view does not remove data from the source table; it only deletes the precomputed and stored results.
-
Regularly review and delete materialized views that are no longer used to conserve storage space.
-
Maintain a reasonable number of materialized views, as an excessive amount can raise maintenance expenses and complicate the system.
Incremental Materialized View Management
Syntax
Parameter Explanation:
refresh_mode
: Refresh mode, incremental indicates incremental refresh mode.- For other parameter meanings, refer to the definition of a full materialized view.
Example:
Refreshing Incremental Views:
Syntax:
Parameter Explanation:
view_name
: The name of the materialized view to refresh.
Example:
Notes:
- Refreshing a materialized view is resource-intensive, so it is essential to balance the refresh frequency and real-time requirements.
- Consider executing scheduled refreshes during non-peak business hours or using a scheduler for automated refreshing.
Modifying Incremental Materialized View Definition:
TacNode supports partial modification of incremental materialized view definitions, allowing you to add or remove columns in the final result set. After modifying the materialized view definition, a refresh will trigger a full refresh of the view.
Syntax:
Example:
Important Notes:
If the modification scope exceeds adding or removing columns scenarios, it may trigger structural incompatibility of the materialized view. During updates, you may see the error "unsupported feature: alter primary key while altering query", indicating that this definition change is not supported.
Deleting Incremental Views:
Parameter Explanation:
- Same meanings as full materialized views.
Altering Storage:
Syntax:
Creating Indexes:
Syntax:
Materialized View Metadata
TacNode provides the pg_matviews
system view for querying metadata of materialized views created in the database. This allows you to efficiently manage materialized views, including querying, monitoring, and optimizing their performance. In practical scenarios, you can use pg_matviews
to perform the following common management tasks:
Listing All Materialized Views
If you need to see which materialized views exist in the current database, you can simply query pg_matviews
:
This allows you to quickly locate all materialized views and their schemas, providing information for further management.
Viewing Materialized View Definitions
When you need to identify the underlying query of a materialized view, you can query the definition
column in pg_matviews
:
This is very useful for understanding the underlying logic of materialized views, debugging, or optimization.
Checking if a Materialized View is Populated
By checking the ispopulated
field, you can determine whether a materialized view was populated with data when created:
If ispopulated
is false
, it means the view is not populated and you need to manually execute REFRESH MATERIALIZED VIEW
.
Reviewing and Optimizing Materialized Views
By combining metadata, you can optimize storage and performance:
- Avoid using OUTER JOIN, try to use INNER JOIN instead
- Review overly nested SQL in definitions and optimize query structure
- Use indexes to improve materialized view query performance
For example, for frequently queried materialized views, add indexes to speed up queries:
For more materialized view commands, refer to CREATE MATERIALIZED VIEW, ALTER MATERIALIZED VIEW, and DROP MATERIALIZED VIEW documentation.
Materialized View Permissions
In TacNode, the permission system for materialized views maintains complete compatibility with PostgreSQL, supporting standard object access control, role and permission inheritance, GRANT/REVOKE authorization, ownership management, etc. The main rules are as follows:
Permission Overview
Operation | Required Permissions | Description |
---|---|---|
Create materialized view | CREATE on schema | Need CREATE permission on schema, SELECT permission on tables/views involved in definition |
Query materialized view | SELECT on materialized view | Need SELECT permission on the materialized view |
Refresh materialized view | SELECT on source table, USAGE on schema | SELECT permission on tables/views involved in definition and USAGE permission on schema |
Modify definition/storage format | ALTER on materialized view | Need ALTER permission on the materialized view |
Delete materialized view | DROP on materialized view | Need DROP permission on the materialized view |
Permission Details
- Object ownership: The owner of a materialized view has all operation permissions (including REFRESH, ALTER, DROP, etc.), and ownership can be transferred via
ALTER MATERIALIZED VIEW ... OWNER TO ...
. - Authorization and revocation: You can use GRANT/REVOKE to authorize or revoke SELECT, ALTER, DROP and other permissions, with syntax completely consistent with PostgreSQL.
- Roles and inheritance: Supports role inheritance and group roles, where permissions can be passed through role chains.
- Access control: All permission checks remain consistent with PostgreSQL, supporting fine-grained object-level authorization.
- Refresh permissions: Only the materialized view owner can execute REFRESH, which cannot be granted to other users via GRANT. If multiple users need refresh capability, this can be achieved through ownership transfer, using SECURITY DEFINER functions, or role switching.
Common Authorization Operation Examples
Materialized Views and Indexes
As a type of physically stored table, materialized views fully support index creation and usage, which can significantly improve query performance:
Index Advantages
- Query acceleration: Creating indexes on materialized views can significantly improve query performance, especially for point queries and range queries
- Sorting optimization: Indexes can help optimize ORDER BY operations, avoiding additional sorting overhead
- Join optimization: Creating indexes on join keys can optimize joins between materialized views and other tables
Index Creation Examples
Best Practices:
- Create indexes on columns frequently used in WHERE conditions, JOIN conditions, and ORDER BY clauses
- Regularly review index usage and remove unused indexes
- For materialized views that are frequently queried but not frequently refreshed, you can create more indexes to optimize query performance