Grouping Sets, Rollup, and Cube
Master advanced SQL grouping capabilities in TacNode with GROUPING SETS, ROLLUP, and CUBE for multi-dimensional data analysis and reporting.
Grouping Sets, Rollup, and Cube
In data analysis and OLAP (Online Analytical Processing) scenarios, we frequently need to perform multi-dimensional, hierarchical grouping and aggregation calculations. TacNode provides powerful GROUP BY extensions: GROUPING SETS, ROLLUP, and CUBE, which efficiently generate subtotals, grand totals, and multi-dimensional cross-tabulated reports, greatly simplifying the creation of complex aggregation queries.
Prerequisites and Basic Concepts
Before diving into these three features, ensure you understand basic GROUP BY and aggregate functions (such as SUM(), COUNT(), AVG()).
Example Table sales: For the following explanations, we’ll assume we have a sales record table called sales.
CREATE TABLE sales (
region VARCHAR(50),
country VARCHAR(50),
product_category VARCHAR(50),
sales_amount NUMERIC
);
-- Sample data
INSERT INTO sales VALUES
('Asia', 'China', 'Electronics', 10000),
('Asia', 'China', 'Clothing', 5000),
('Asia', 'Japan', 'Electronics', 8000),
('Asia', 'Japan', 'Clothing', 3000),
('Europe', 'UK', 'Electronics', 7000),
('Europe', 'UK', 'Clothing', 2000),
('Europe', 'Germany', 'Electronics', 9000),
('Europe', 'Germany', 'Clothing', 4000);
GROUPING SETS: Custom Grouping Collections
GROUPING SETS is the most fundamental and flexible primitive. It allows you to specify multiple grouping combinations that you want to calculate simultaneously. ROLLUP and CUBE are essentially syntactic sugar for specific GROUPING SETS patterns.
Syntax:
SELECT
column1,
column2,
aggregate_function(column3)
FROM
table_name
GROUP BY
GROUPING SETS (
(column1, column2),
(column1),
(column2),
() -- Empty parentheses represent grand total
);
Functionality: For each grouping set listed in the GROUPING SETS clause, a separate GROUP BY operation is performed, and all results are combined (UNION ALL) together.
Example: We want to simultaneously obtain:
- Fine-grained grouping totals by
regionandcountry. - Grouping totals by
regiononly. - Grouping totals by
countryonly. - Grand total.
Using GROUPING SETS accomplishes this in one step:
SELECT
region,
country,
SUM(sales_amount) AS total_sales
FROM
sales
GROUP BY
GROUPING SETS (
(region, country), -- Combination 1: Fine-grained
(region), -- Combination 2: Region only
(country), -- Combination 3: Country only
() -- Combination 4: Grand total
)
ORDER BY
region, country;
Result:
region | country | total_sales
--------+---------+-------------
Asia | China | 15000
Asia | Japan | 11000
Asia | | 26000 <- This is Asia region subtotal
Europe | Germany | 13000
Europe | UK | 9000
Europe | | 22000 <- This is Europe region subtotal
| China | 15000 <- All regions China total
| Germany | 13000
| Japan | 11000
| UK | 9000
| | 48000 <- This is grand total
(11 rows)
ROLLUP: Hierarchical Subtotals and Grand Totals
ROLLUP is used to generate hierarchical or dimensional subtotals and grand totals. It assumes a hierarchical relationship exists between grouping columns (such as Year > Month > Day, or Region > Country > City).
Syntax:
GROUP BY ROLLUP (column1, column2, ..., columnN)
This is equivalent to:
GROUP BY GROUPING SETS (
(column1, column2, ..., columnN), -- Most detailed level
(column1, column2, ...),
...,
(column1), -- First level subtotal
() -- Grand total
)
Functionality: Starting from the most detailed granularity, it progressively reduces grouping columns from right to left and generates subtotals, finally producing a grand total.
Example: We want to analyze hierarchical sales data for region > country > product_category and obtain subtotals for each level.
SELECT
region,
country,
product_category,
SUM(sales_amount) AS total_sales
FROM
sales
GROUP BY
ROLLUP (region, country, product_category)
ORDER BY
region, country, product_category;
Result interpretation:
region | country | product_category | total_sales
--------+---------+------------------+-------------
Asia | China | Clothing | 5000 -> 5000 (most detailed level)
Asia | China | Electronics | 10000 -> 10000 (most detailed level)
Asia | China | | 15000 -> **15000** (China subtotal)
Asia | Japan | Clothing | 3000 -> 3000
Asia | Japan | Electronics | 8000 -> 8000
Asia | Japan | | 11000 -> **11000** (Japan subtotal)
Asia | | | 26000 -> **26000** (Asia region subtotal)
Europe | Germany | Clothing | 4000
Europe | Germany | Electronics | 9000
Europe | Germany | | 13000
Europe | UK | Clothing | 2000
Europe | UK | Electronics | 7000
Europe | UK | | 9000
Europe | | | 22000
| | | 48000 -> **48000** (grand total)
CUBE: Generate All Combination Subtotals
CUBE is the most powerful feature that generates all possible grouping column combinations. If your dimensions are at the same level with no clear hierarchical relationship, and you want to analyze all dimensional intersections, CUBE is the best choice.
Syntax:
GROUP BY CUBE (column1, column2, ..., columnN)
For N columns, CUBE generates 2^N grouping combinations. For example, CUBE(a, b, c) is equivalent to:
GROUP BY GROUPING SETS (
(a, b, c),
(a, b),
(a, c),
(b, c),
(a),
(b),
(c),
()
)
Functionality: Generates grouping aggregations for all possible subset combinations.
Example: We want to perform full cross-analysis on the three dimensions: region, country, product_category.
SELECT
region,
country,
product_category,
SUM(sales_amount) AS total_sales
FROM
sales
GROUP BY
CUBE (region, country, product_category)
ORDER BY
region, country, product_category;
Result interpretation (in addition to ROLLUP results, it also includes):
region | country | product_category | total_sales
--------+---------+------------------+-------------
Asia | China | Clothing | 5000
Asia | China | Electronics | 10000
Asia | China | | 15000 -> All product categories under China (15000)
Asia | Japan | Clothing | 3000
Asia | Japan | Electronics | 8000
Asia | Japan | | 11000
Asia | | Clothing | 8000
Asia | | Electronics | 18000
Asia | | | 26000
Europe | Germany | Clothing | 4000
Europe | Germany | Electronics | 9000
Europe | Germany | | 13000 -> All product categories under Germany (13000)
Europe | UK | Clothing | 2000
Europe | UK | Electronics | 7000
Europe | UK | | 9000
Europe | | Clothing | 6000
Europe | | Electronics | 16000
Europe | | | 22000
| China | Clothing | 5000
| China | Electronics | 10000
| China | | 15000
| Germany | Clothing | 4000
| Germany | Electronics | 9000
| Germany | | 13000
| Japan | Clothing | 3000
| Japan | Electronics | 8000
| Japan | | 11000
| UK | Clothing | 2000
| UK | Electronics | 7000
| UK | | 9000
| | Clothing | 14000 -> All regions all countries clothing total (5000+3000+2000+4000=14000)
| | Electronics | 34000 -> All regions all countries electronics total
| | | 48000
Distinguishing Original NULL vs Grouping-Generated NULL: GROUPING() Function
When data itself contains NULL values, we cannot distinguish whether NULL in results comes from original data or is generated by grouping functionality. TacNode provides the GROUPING() function to solve this problem.
Syntax: GROUPING(column_name)
- Returns
0: Indicates the column is part of the current grouping (whereNULLis from original data). - Returns
1: Indicates the column was aggregated away by grouping operations (whereNULLis generated by subtotal/grand total).
Advanced Example: Using GROUPING() to beautify output.
SELECT
CASE
WHEN GROUPING(region) = 1 THEN 'All Regions'
ELSE COALESCE(region, 'N/A')
END AS region,
CASE
WHEN GROUPING(country) = 1 THEN 'All Countries'
ELSE COALESCE(country, 'N/A')
END AS country,
SUM(sales_amount) AS total_sales,
-- This is a convenient bitmask representing grouping state
GROUPING(region, country) AS grouping_mask
FROM
sales
GROUP BY
ROLLUP (region, country)
ORDER BY
region, country;
Result:
region | country | total_sales | grouping_mask
-------------+---------------+-------------+---------------
All Regions | All Countries | 48000 | 3
Asia | All Countries | 26000 | 1
Asia | China | 15000 | 0
Asia | Japan | 11000 | 0
Europe | All Countries | 22000 | 1
Europe | Germany | 13000 | 0
Europe | UK | 9000 | 0
GROUPING(region, country) returns an integer whose binary bits represent each field’s grouping state. For example, 3 in binary is 11, indicating both the first column (region) and second column (country) have been aggregated.
Advanced Use Cases and Examples
Financial Reporting with ROLLUP
-- Quarterly sales report with hierarchical totals
SELECT
EXTRACT(YEAR FROM sale_date) AS year,
EXTRACT(QUARTER FROM sale_date) AS quarter,
EXTRACT(MONTH FROM sale_date) AS month,
SUM(sales_amount) AS total_sales,
COUNT(*) AS transaction_count,
AVG(sales_amount) AS avg_transaction
FROM
sales_detailed
GROUP BY
ROLLUP (
EXTRACT(YEAR FROM sale_date),
EXTRACT(QUARTER FROM sale_date),
EXTRACT(MONTH FROM sale_date)
)
ORDER BY
year, quarter, month;
Customer Segmentation Analysis with CUBE
-- Multi-dimensional customer analysis
SELECT
customer_segment,
product_line,
sales_channel,
COUNT(DISTINCT customer_id) AS unique_customers,
SUM(order_value) AS total_revenue,
AVG(order_value) AS avg_order_value
FROM
customer_orders
GROUP BY
CUBE (customer_segment, product_line, sales_channel)
ORDER BY
customer_segment, product_line, sales_channel;
Custom Grouping Sets for Specific Analysis
-- Specific grouping combinations for executive dashboard
SELECT
region,
product_category,
sales_rep,
SUM(sales_amount) AS total_sales,
COUNT(*) AS deal_count
FROM
sales_data
GROUP BY
GROUPING SETS (
(region, product_category), -- Regional product performance
(sales_rep), -- Individual rep performance
(region, sales_rep), -- Rep performance by region
() -- Company total
)
ORDER BY
region, product_category, sales_rep;
Performance Optimization Techniques
-- Using GROUPING() for conditional formatting
SELECT
CASE
WHEN GROUPING(region) = 1 AND GROUPING(product_category) = 1 THEN '🌍 GLOBAL TOTAL'
WHEN GROUPING(region) = 0 AND GROUPING(product_category) = 1 THEN '📍 ' || region || ' REGIONAL TOTAL'
WHEN GROUPING(region) = 1 AND GROUPING(product_category) = 0 THEN '📦 ' || product_category || ' PRODUCT TOTAL'
ELSE region || ' - ' || product_category
END AS breakdown,
SUM(sales_amount) AS total_sales,
ROUND(
100.0 * SUM(sales_amount) /
SUM(SUM(sales_amount)) OVER (PARTITION BY GROUPING(region, product_category)),
2
) AS percentage
FROM
sales
GROUP BY
CUBE (region, product_category)
ORDER BY
GROUPING(region, product_category),
region,
product_category;
Summary and Selection Guide
| Feature | GROUPING SETS | ROLLUP | CUBE |
|---|---|---|---|
| Flexibility | Extremely high, can specify any combination | Medium, follows hierarchical structure | High, generates all combinations |
| Output Rows | Custom | Fewer (N+1 levels) | Most (2^N combinations) |
| Use Cases | Need specific, non-continuous aggregation combinations | Clear hierarchical structure reports (geographical, time hierarchy) | Multi-dimensional cross-analysis, explore all dimensional relationships |
| Performance | Depends on specified set count | Usually better than CUBE | Most resource-intensive, use cautiously with many dimensions |
Selection Recommendations:
- Use ROLLUP when you need totals and hierarchical subtotals (such as each region, each country total).
- Use CUBE when you need all possible dimensional combinations for aggregation (such as seeing region totals, product totals, and region-product cross totals simultaneously).
- Use GROUPING SETS when you only need specific combinations of aggregation (such as seeing region aggregation and product aggregation but not their cross combinations) for best performance.
Best Practices
1. Performance Considerations
-- Avoid unnecessary combinations in CUBE
-- Instead of CUBE with many dimensions:
GROUP BY CUBE (region, country, city, product, brand, channel) -- 2^6 = 64 combinations!
-- Use targeted GROUPING SETS:
GROUP BY GROUPING SETS (
(region, product),
(country, brand),
(city, channel),
()
)
2. Indexing Strategy
-- Create composite indexes for common grouping patterns
CREATE INDEX idx_sales_region_country ON sales (region, country);
CREATE INDEX idx_sales_product_category ON sales (product_category);
CREATE INDEX idx_sales_rollup ON sales (region, country, product_category);
3. Memory Management
-- For large datasets, consider using CTEs to pre-filter
WITH filtered_sales AS (
SELECT region, country, product_category, sales_amount
FROM sales
WHERE sale_date >= '2024-01-01'
)
SELECT
region,
country,
product_category,
SUM(sales_amount) AS total_sales
FROM filtered_sales
GROUP BY CUBE (region, country, product_category);
4. Result Set Management
-- Use HAVING to filter aggregated results
SELECT
region,
country,
SUM(sales_amount) AS total_sales
FROM sales
GROUP BY ROLLUP (region, country)
HAVING SUM(sales_amount) > 10000
ORDER BY total_sales DESC;
This comprehensive guide covers all aspects of TacNode’s advanced grouping capabilities, enabling you to perform sophisticated multi-dimensional analysis and generate powerful analytical reports with ease.