As data in a single table grows over time, managing and querying it can become difficult. To address this, tables can be partitioned, using their characteristics to divide the data into smaller storage blocks. This method can significantly improve query performance, simplify data management, and enhance availability.
A partitioned table is divided into smaller segments called partitions. Each partition contains a portion of the table's data. These partitions are invisible to higher-level applications, which still see the table as a single entity.
Each partition is named and can have unique storage characteristics; for instance, different partitions might be stored in separate tablespaces.
When adding data to a partitioned table, a partition key is typically defined to determine which partition the data should go to. The specific partition for each data piece is identified based on the partition key's value or its calculated result. This partition key can consist of one or more columns.
Partition tables are a data management method rather than a traditional approach to optimizing query performance. If not designed effectively and without proper partition pruning, they can hinder query performance for many statements.
The core idea involves logically dividing a large table into smaller, manageable segments. When a query or update targets a significant portion of a single partition, performance can be enhanced by leveraging sequential scans instead of relying on indexes and the scattered random access reads across the table.
The table holds vast historical data in a clear chronological order. It is often essential to clean up or archive this data.
Data must be organized hierarchically, storing various segments on different media. Implementing partition tables is a viable option in this context.
Planning the number of partitions, estimating the maximum required partitions, and establishing a sensible partitioning strategy are crucial.
Incorporating partition keys in the WHERE clause of SQL queries allows for quick identification of the relevant partitions. This process, known as Partition Pruning, significantly reduces the data scanning range, dramatically improving query performance.
Additionally, when JOIN operations involve multiple partitioned tables based on the join key, the engine can convert the join of two large tables into a more efficient partition-level join. This optimization dramatically accelerates the joining process, known as a partition-wise join.
Enhance table manageability
With partitioned tables, traditional table-level management operations are now decentralized to each partition, allowing independent maintenance tasks. Maintenance on large tables can be broken down into smaller tasks executed during different time intervals. For instance, a typical table backup can now focus on a single partition.
Enhance data availability
After partitioning, each partition operates independently. Actions performed on one partition do not impact the data accessibility of others. If a partition fails, it won’t disrupt transactions on other partitions. Furthermore, partitions can be stored across various table spaces to alleviate I/O pressure.
Depending on different application scenarios, you can choose different partitioning strategies for the table. The basic partitioning strategy provided by the engine is as follows:
Range Partition
Hash Partition
List Partition
Based on the basic partitioning strategy, there can be further Composite Partition strategies.
Range partitioning organizes partitions based on a set range. It works best for similar, clearly ordered data, allowing easy definition of the partition range in line with the data's sequence. A common use case for range partitioning is dividing data by time, often utilizing a time-based partition key. The partition range should be continuous, without gaps or overlaps, and the boundary values must be compatible with the partition key type of the main table.
Parameter Description
start_value: The partition's initial value, defined by the FOR VALUES FROM clause; this value is included.
end_value: The final value of the partition specified in FOR VALUES TO. This value is excluded, meaning all data greater than or equal to this value will move to the next partition. In the case of the first partition, the lower limit of each partition is the upper limit of the one before it.
MIN_VALUE indicates RANGE, representing the minimum possible value in the partition. This particular constant sets the lower bound; when used as the starting value, it signifies that the partition includes the minimum value of the partition key. FOR VALUES FROM (MIN_VALUE) TO (value) shows that the partition range starts at the minimum value and extends to the specified value, excluding that value.
MAX_VALUE indicates RANGE, representing the maximum possible value in a partition. This particular constant sets the upper bound; when used as the end value, it signifies that the partition includes the maximum value of the partition key.
MIN_VALUE and MAX_VALUE apply only for RANGE partitions and cannot be used for LIST or HASH sub-partitions.
You can now view the table structure:
Default partition: You can create a DEFAULT partition for RANGE PARTITION. If data is not assigned to an existing partition, it will be written to the default partition.
Partitioned tables can be defined using expressions.
Hash Partition uses a hash algorithm on the designated PARTITION KEY to determine the data's storage partition. This technique distributes data evenly and randomly across partitions to achieve consistent partition sizes.
Given its random distribution, hash partitioning may not be suitable for handling historical data that follows a specific chronological order. However, it excels when there is a need for an even data allocation across different storage devices. To maximize the effectiveness of hash partitioning, keep these guidelines in mind:
Select a partition key with a unique column (Unique) or numerous unique values (Almost Unique).
When creating hash partitions, choose a quantity that is a power of 2, such as 2, 4, 8, 16, etc.
Tables that use hash partitioning are created with the partition clause in the CREATE TABLE statement, specifically by employing PARTITION BY HASH. During partition creation, you can assign a name to each partition and its corresponding tablespace.
Parameter Description:
MODULUS: specifies the total number of hash partitions, which must be a positive integer.
REMAINDER: Specifies the partition number, ranging from 0 to modulus -1.
List partitioning involves defining unique values (lists) for each partition. When the partition key aligns with a distinct value, the related data is saved in that specific partition. This approach facilitates personalized data organization, such as sorting data by region.
To set up list-partitioned tables, use the CREATE TABLE clause alongside the SELECT statement, employ PARTITION BY LIST, and specify a list of unique values for each partition. The partition key values must remain distinct.
Parameter Description
value1, value2, ...: One or more values of the partition key, separated by commas. Use the FOR VALUES IN clause to specify. Each partition can contain one or more discrete values. Partition values cannot appear repeatedly in multiple partitions.
Composite partitions, as the term suggests, integrate various partitioning strategies. Each partition undergoes another application of the original partitioning method. For instance, range partitioning can be applied to each resulting partition, leading to several sub-partitions. This concept can be likened to China, which is divided into provinces (partitions) and cities (sub-partitions).
The three fundamental partitioning strategies can be freely combined when employing composite partitioning. Range partitioning is the primary strategy, and its sub-partitions utilize range, hash, and list partitioning methods.
A practical example is a secondary branch office where all sub-bureaus are organized as LIST partitions.
An example of a first-level partition of LIST and a second-level partition of RANGE.
An example is where the first-level partition is LIST, and the second-level partition is partly RANGE and partly HASH.
Create a new table using CREATE TABLE PARTITION OF statements
Example
ALTER TABLE ... ATTACH PARTITION ... First, build the table, and then add partitions to the parent table by
Notices:
It's advisable to set the partitioning method when establishing the base table. If you choose the ATTACH method, the new partition table must match the main table's structure exactly, and its range must not overlap with any existing partitions.
You can query partitioned tables just like regular ones.
When adding new data, you can use standard INSERT statements. The partition table will automatically direct the data to the correct partition based on the partition key.
Querying a partitioned table resembles a standard table using a SELECT statement. However, Tacnode efficiently identifies and scans only the necessary partitions for the query. To optimize performance, it's advisable to include the partition's key condition in the WHERE clause to narrow the query's focus.
If the table has a primary key, the partition key must include part or all of that primary key. The primary key constraint can only be applied at the partition table level, not individually for each partition.