Table Design Suggesiton
Table Structure Design Guide
In modern database systems, a well-designed table structure is crucial for significantly enhancing query performance and optimizing storage efficiency. Tacnode is a high-performance, flexible database platform that supports various storage models and index types. This guide delves into the table structure design in Tacnode, including the selection of table storage types and index design, to help you fully leverage the advantages of Tacnode.
1. Choosing Table Storage Types
Tacnode supports three main table storage types: Row, Columnar, and Hybrid. Each storage type has specific use cases, advantages, and disadvantages. Selecting the appropriate storage model can greatly improve performance.
1.1 Row Storage
Row storage is the default storage mode in Tacnode. In this mode, data is stored by rows, with all data for a row kept together. Row storage is ideal for OLTP (Online Transaction Processing) applications, where frequent inserts, updates, and queries based on entire rows are required. It offers high write performance and low query latency.
Suitable Scenarios:
- Environments with frequent data inserts and updates.
- Applications requiring queries based on single rows of data.
- Traditional transactional applications, such as e-commerce order management and banking systems.
Example: Creating a Row Storage Table
1.2 Columnar Storage
Columnar storage organizes data by columns rather than by rows. In this mode, data for each column is stored together. This storage model is highly efficient for analytical queries, as these queries often focus on specific columns without needing to access all rows of the table. Columnar storage can significantly reduce I/O operations, especially in large-scale data queries.
Suitable Scenarios:
- Complex data queries involving extensive statistics, analysis, and aggregation operations.
- Situations where only specific columns of large datasets need to be read, and data changes between columns are minimal.
- OLAP (Online Analytical Processing) applications, such as data warehousing and log analysis.
Example: Creating a Columnar Storage Table
1.3 Hybrid Storage
Hybrid storage is a special storage mode in Tacnode that combines the advantages of both row and columnar storage, making it suitable for specific hybrid workloads (HTAP: Hybrid Transactional/Analytical Processing). In this mode, some columns of a table are stored in row format, while others are stored in columnar format. This hybrid model is ideal for scenarios that require support for both OLTP and OLAP.
Suitable Scenarios:
- Hybrid applications that need to support both transaction processing and large-scale data analysis.
- Tables that contain both frequently updated data and analytical data that requires efficient querying.
Example: Creating a Hybrid Storage Table
Summary of Storage Mode Selection
-
Row Storage: Best suited for scenarios requiring frequent inserts, updates, and queries based on entire rows, such as OLTP systems.
-
Columnar Storage: Ideal for large-scale data queries and analytical scenarios, such as OLAP systems and data warehouses.
-
Hybrid Storage: Suitable for applications that need to support both OLTP and OLAP, typically found in HTAP scenarios.
Storage Mode Conversion
Tacnode supports Schema Evolution, allowing you to change the storage format of a table online. During this modification, the table remains unlocked, and read/write operations are not affected.
Modification Examples:
2. Index Design
Indexes are crucial tools for enhancing query performance. In Tacnode, there are various index types available, allowing you to choose the most suitable one based on the characteristics of your data and query requirements.
2.1 Primary Key Index
The primary key index is the most common type of index, typically used to ensure data uniqueness and integrity. Every table should have a primary key, and an index is usually automatically created on the primary key column. Primary key indexes are particularly useful for quickly locating a specific row based on the primary key.
Considerations:
- The primary key must be unique and cannot be null.
- For composite primary keys, the order of columns is crucial. The query optimizer will prioritize matching based on the leftmost column, so the order should reflect the sequence of query conditions.
Example: Creating a Primary Key Index
Composite Primary Key Design
- When the primary key is a composite key consisting of multiple columns, queries should follow the leftmost prefix principle. This means that the query conditions must include the leftmost column of the composite primary key. For example, if the primary key is
(a, b, c)
, the query should prioritize equality conditions ona
. If range queries involveb
andc
, the index efficiency may significantly decrease.
2.2 Global Secondary Index (GSI)
Global Secondary Indexes are useful for efficiently querying non-primary key columns. Unlike primary key indexes, secondary indexes do not require data to be unique, but they can significantly enhance query performance for specific columns.
Suitable Scenarios:
- Queries frequently use non-primary key columns as filter conditions.
- Sorting, grouping, or joining operations are needed on certain non-primary key columns.
Example: Creating a Secondary Index
This example demonstrates how to create a secondary index on the username
column, allowing for more efficient queries involving this column in Tacnode.
2.3 Bitmap Index
Bitmap indexes are ideal for columns with a low cardinality of values, such as gender or status. These columns typically have a limited set of discrete values, and using a bitmap index can significantly enhance query performance.
Note: Tacnode automatically constructs bitmap indexes based on the distribution of data within columns, so manual creation of bitmap indexes is rarely necessary.
Suitable Scenarios:
- Columns with a small variety of data values, usually limited discrete values (e.g., boolean, gender, categories).
- High-performance requirements for conditional filtering queries.
Example: Creating a Bitmap Index
2.4 GIN Index (Inverted Index)
GIN (Generalized Inverted Index) is suitable for data types like arrays, JSON, or other complex types. This index accelerates queries by treating each element within the data as an independent index entry, making it ideal for handling multi-valued columns or text searches.
Suitable Scenarios:
- Columns with data types such as arrays or JSON that contain multiple values.
- Need for fast containment queries (e.g.,
@>
operator) or full-text search.
Example: Creating a GIN Index
2.5 Vector Index
Vector indexes are typically used for handling high-dimensional data or applications requiring similarity searches, such as image processing and natural language processing (NLP). These indexes allow efficient retrieval of records with high similarity (e.g., most similar texts or images).
For more information on using vector indexes, refer to the Vector Search documentation.
Suitable Scenarios:
- High-dimensional data, such as vectors generated by machine learning models.
- Similarity searches, such as retrieval based on text, images, or audio.
Summary of Index Selection
- Primary Key Index: Used to uniquely identify a row of data, usually created automatically.
- Global Secondary Index: Suitable for non-primary key columns that are frequently queried.
- Bitmap Index: Ideal for low-cardinality columns (e.g., gender, status).
- GIN Index: Suitable for handling multi-valued columns (e.g., arrays, JSON) or full-text search.
- Vector Index: Suitable for similarity searches on high-dimensional data.
3. Considerations for Primary Key Design
3.1 Choosing Appropriate Primary Key Columns
-
Uniqueness: The primary key column must be unique to ensure that each row of data can be uniquely identified by the primary key.
-
Stability: Avoid selecting columns that are frequently updated as primary keys. For example, avoid using an auto-incrementing
ID
field as a primary key unless it is unique and does not change frequently.
3.2 Order Selection for Composite Primary Keys
For composite primary keys, follow the leftmost prefix principle. The query optimizer will prioritize using the leftmost column of the composite primary key to filter data, so the order of columns should be determined based on the frequency of their use in query conditions.
For example, for a primary key (a, b, c)
:
- In a query
WHERE a = ? AND b = ?
, the index will first filter using columna
. - In a query
WHERE b = ?
, the index will not be used.