Grouping Sets, Rollup, and Cube

Grouping Sets, Rollup, and Cube

In data analysis and OLAP (Online Analytical Processing) scenarios, we frequently need to perform multi-dimensional, hierarchical grouping and aggregation calculations. TacNode provides powerful GROUP BY extensions: GROUPING SETS, ROLLUP, and CUBE, which efficiently generate subtotals, grand totals, and multi-dimensional cross-tabulated reports, greatly simplifying the creation of complex aggregation queries.

Prerequisites and Basic Concepts

Before diving into these three features, ensure you understand basic GROUP BY and aggregate functions (such as SUM(), COUNT(), AVG()).

Example Table sales: For the following explanations, we'll assume we have a sales record table called sales.

CREATE TABLE sales (
    region VARCHAR(50),
    country VARCHAR(50),
    product_category VARCHAR(50),
    sales_amount NUMERIC
);
 
-- Sample data
INSERT INTO sales VALUES
('Asia', 'China', 'Electronics', 10000),
('Asia', 'China', 'Clothing', 5000),
('Asia', 'Japan', 'Electronics', 8000),
('Asia', 'Japan', 'Clothing', 3000),
('Europe', 'UK', 'Electronics', 7000),
('Europe', 'UK', 'Clothing', 2000),
('Europe', 'Germany', 'Electronics', 9000),
('Europe', 'Germany', 'Clothing', 4000);

GROUPING SETS: Custom Grouping Collections

GROUPING SETS is the most fundamental and flexible primitive. It allows you to specify multiple grouping combinations that you want to calculate simultaneously. ROLLUP and CUBE are essentially syntactic sugar for specific GROUPING SETS patterns.

Syntax:

SELECT
    column1,
    column2,
    aggregate_function(column3)
FROM
    table_name
GROUP BY
    GROUPING SETS (
        (column1, column2),
        (column1),
        (column2),
        () -- Empty parentheses represent grand total
    );

Functionality: For each grouping set listed in the GROUPING SETS clause, a separate GROUP BY operation is performed, and all results are combined (UNION ALL) together.

Example: We want to simultaneously obtain:

  1. Fine-grained grouping totals by region and country.
  2. Grouping totals by region only.
  3. Grouping totals by country only.
  4. Grand total.

Using GROUPING SETS accomplishes this in one step:

SELECT
    region,
    country,
    SUM(sales_amount) AS total_sales
FROM
    sales
GROUP BY
    GROUPING SETS (
        (region, country), -- Combination 1: Fine-grained
        (region),          -- Combination 2: Region only
        (country),         -- Combination 3: Country only
        ()                 -- Combination 4: Grand total
    )
ORDER BY
    region, country;

Result:

 region | country | total_sales
--------+---------+-------------
 Asia   | China   |       15000
 Asia   | Japan   |       11000
 Asia   |         |       26000  <- This is Asia region subtotal
 Europe | Germany |       13000
 Europe | UK      |        9000
 Europe |         |       22000  <- This is Europe region subtotal
        | China   |       15000  <- All regions China total
        | Germany |       13000
        | Japan   |       11000
        | UK      |        9000
        |         |       48000  <- This is grand total
(11 rows)

ROLLUP: Hierarchical Subtotals and Grand Totals

ROLLUP is used to generate hierarchical or dimensional subtotals and grand totals. It assumes a hierarchical relationship exists between grouping columns (such as Year > Month > Day, or Region > Country > City).

Syntax:

GROUP BY ROLLUP (column1, column2, ..., columnN)

This is equivalent to:

GROUP BY GROUPING SETS (
    (column1, column2, ..., columnN), -- Most detailed level
    (column1, column2, ...),
    ...,
    (column1), -- First level subtotal
    ()         -- Grand total
)

Functionality: Starting from the most detailed granularity, it progressively reduces grouping columns from right to left and generates subtotals, finally producing a grand total.

Example: We want to analyze hierarchical sales data for region > country > product_category and obtain subtotals for each level.

SELECT
    region,
    country,
    product_category,
    SUM(sales_amount) AS total_sales
FROM
    sales
GROUP BY
    ROLLUP (region, country, product_category)
ORDER BY
    region, country, product_category;

Result interpretation:

 region | country | product_category | total_sales
--------+---------+------------------+-------------
 Asia   | China   | Clothing         |        5000  -> 5000 (most detailed level)
 Asia   | China   | Electronics      |       10000  -> 10000 (most detailed level)
 Asia   | China   |                  |       15000  -> **15000** (China subtotal)
 Asia   | Japan   | Clothing         |        3000  -> 3000
 Asia   | Japan   | Electronics      |        8000  -> 8000
 Asia   | Japan   |                  |       11000  -> **11000** (Japan subtotal)
 Asia   |         |                  |       26000  -> **26000** (Asia region subtotal)
 Europe | Germany | Clothing         |        4000
 Europe | Germany | Electronics      |        9000
 Europe | Germany |                  |       13000
 Europe | UK      | Clothing         |        2000
 Europe | UK      | Electronics      |        7000
 Europe | UK      |                  |        9000
 Europe |         |                  |       22000
        |         |                  |       48000  -> **48000** (grand total)

CUBE: Generate All Combination Subtotals

CUBE is the most powerful feature that generates all possible grouping column combinations. If your dimensions are at the same level with no clear hierarchical relationship, and you want to analyze all dimensional intersections, CUBE is the best choice.

Syntax:

GROUP BY CUBE (column1, column2, ..., columnN)

For N columns, CUBE generates 2^N grouping combinations. For example, CUBE(a, b, c) is equivalent to:

GROUP BY GROUPING SETS (
    (a, b, c),
    (a, b),
    (a, c),
    (b, c),
    (a),
    (b),
    (c),
    ()
)

Functionality: Generates grouping aggregations for all possible subset combinations.

Example: We want to perform full cross-analysis on the three dimensions: region, country, product_category.

SELECT
    region,
    country,
    product_category,
    SUM(sales_amount) AS total_sales
FROM
    sales
GROUP BY
    CUBE (region, country, product_category)
ORDER BY
    region, country, product_category;

Result interpretation (in addition to ROLLUP results, it also includes):

 region | country | product_category | total_sales
--------+---------+------------------+-------------
 Asia   | China   | Clothing         |        5000
 Asia   | China   | Electronics      |       10000
 Asia   | China   |                  |       15000  -> All product categories under China (15000)
 Asia   | Japan   | Clothing         |        3000
 Asia   | Japan   | Electronics      |        8000
 Asia   | Japan   |                  |       11000
 Asia   |         | Clothing         |        8000
 Asia   |         | Electronics      |       18000
 Asia   |         |                  |       26000
 Europe | Germany | Clothing         |        4000
 Europe | Germany | Electronics      |        9000
 Europe | Germany |                  |       13000  -> All product categories under Germany (13000)
 Europe | UK      | Clothing         |        2000
 Europe | UK      | Electronics      |        7000
 Europe | UK      |                  |        9000
 Europe |         | Clothing         |        6000
 Europe |         | Electronics      |       16000
 Europe |         |                  |       22000
        | China   | Clothing         |        5000
        | China   | Electronics      |       10000
        | China   |                  |       15000
        | Germany | Clothing         |        4000
        | Germany | Electronics      |        9000
        | Germany |                  |       13000
        | Japan   | Clothing         |        3000
        | Japan   | Electronics      |        8000
        | Japan   |                  |       11000
        | UK      | Clothing         |        2000
        | UK      | Electronics      |        7000
        | UK      |                  |        9000
        |         | Clothing         |       14000  -> All regions all countries clothing total (5000+3000+2000+4000=14000)
        |         | Electronics      |       34000  -> All regions all countries electronics total
        |         |                  |       48000

Distinguishing Original NULL vs Grouping-Generated NULL: GROUPING() Function

When data itself contains NULL values, we cannot distinguish whether NULL in results comes from original data or is generated by grouping functionality. TacNode provides the GROUPING() function to solve this problem.

Syntax: GROUPING(column_name)

  • Returns 0: Indicates the column is part of the current grouping (where NULL is from original data).
  • Returns 1: Indicates the column was aggregated away by grouping operations (where NULL is generated by subtotal/grand total).

Advanced Example: Using GROUPING() to beautify output.

SELECT
    CASE
        WHEN GROUPING(region) = 1 THEN 'All Regions'
        ELSE COALESCE(region, 'N/A')
    END AS region,
    CASE
        WHEN GROUPING(country) = 1 THEN 'All Countries'
        ELSE COALESCE(country, 'N/A')
    END AS country,
    SUM(sales_amount) AS total_sales,
    -- This is a convenient bitmask representing grouping state
    GROUPING(region, country) AS grouping_mask
FROM
    sales
GROUP BY
    ROLLUP (region, country)
ORDER BY
    region, country;

Result:

   region    |    country    | total_sales | grouping_mask
-------------+---------------+-------------+---------------
 All Regions | All Countries |       48000 |             3
 Asia        | All Countries |       26000 |             1
 Asia        | China         |       15000 |             0
 Asia        | Japan         |       11000 |             0
 Europe      | All Countries |       22000 |             1
 Europe      | Germany       |       13000 |             0
 Europe      | UK            |        9000 |             0

GROUPING(region, country) returns an integer whose binary bits represent each field's grouping state. For example, 3 in binary is 11, indicating both the first column (region) and second column (country) have been aggregated.

Advanced Use Cases and Examples

Financial Reporting with ROLLUP

-- Quarterly sales report with hierarchical totals
SELECT
    EXTRACT(YEAR FROM sale_date) AS year,
    EXTRACT(QUARTER FROM sale_date) AS quarter,
    EXTRACT(MONTH FROM sale_date) AS month,
    SUM(sales_amount) AS total_sales,
    COUNT(*) AS transaction_count,
    AVG(sales_amount) AS avg_transaction
FROM
    sales_detailed
GROUP BY
    ROLLUP (
        EXTRACT(YEAR FROM sale_date),
        EXTRACT(QUARTER FROM sale_date),
        EXTRACT(MONTH FROM sale_date)
    )
ORDER BY
    year, quarter, month;

Customer Segmentation Analysis with CUBE

-- Multi-dimensional customer analysis
SELECT
    customer_segment,
    product_line,
    sales_channel,
    COUNT(DISTINCT customer_id) AS unique_customers,
    SUM(order_value) AS total_revenue,
    AVG(order_value) AS avg_order_value
FROM
    customer_orders
GROUP BY
    CUBE (customer_segment, product_line, sales_channel)
ORDER BY
    customer_segment, product_line, sales_channel;

Custom Grouping Sets for Specific Analysis

-- Specific grouping combinations for executive dashboard
SELECT
    region,
    product_category,
    sales_rep,
    SUM(sales_amount) AS total_sales,
    COUNT(*) AS deal_count
FROM
    sales_data
GROUP BY
    GROUPING SETS (
        (region, product_category),     -- Regional product performance
        (sales_rep),                    -- Individual rep performance
        (region, sales_rep),            -- Rep performance by region
        ()                              -- Company total
    )
ORDER BY
    region, product_category, sales_rep;

Performance Optimization Techniques

-- Using GROUPING() for conditional formatting
SELECT
    CASE
        WHEN GROUPING(region) = 1 AND GROUPING(product_category) = 1 THEN '🌍 GLOBAL TOTAL'
        WHEN GROUPING(region) = 0 AND GROUPING(product_category) = 1 THEN '📍 ' || region || ' REGIONAL TOTAL'
        WHEN GROUPING(region) = 1 AND GROUPING(product_category) = 0 THEN '📦 ' || product_category || ' PRODUCT TOTAL'
        ELSE region || ' - ' || product_category
    END AS breakdown,
    SUM(sales_amount) AS total_sales,
    ROUND(
        100.0 * SUM(sales_amount) / 
        SUM(SUM(sales_amount)) OVER (PARTITION BY GROUPING(region, product_category)),
        2
    ) AS percentage
FROM
    sales
GROUP BY
    CUBE (region, product_category)
ORDER BY
    GROUPING(region, product_category),
    region,
    product_category;

Summary and Selection Guide

FeatureGROUPING SETSROLLUPCUBE
FlexibilityExtremely high, can specify any combinationMedium, follows hierarchical structureHigh, generates all combinations
Output RowsCustomFewer (N+1 levels)Most (2^N combinations)
Use CasesNeed specific, non-continuous aggregation combinationsClear hierarchical structure reports (geographical, time hierarchy)Multi-dimensional cross-analysis, explore all dimensional relationships
PerformanceDepends on specified set countUsually better than CUBEMost resource-intensive, use cautiously with many dimensions

Selection Recommendations:

  • Use ROLLUP when you need totals and hierarchical subtotals (such as each region, each country total).
  • Use CUBE when you need all possible dimensional combinations for aggregation (such as seeing region totals, product totals, and region-product cross totals simultaneously).
  • Use GROUPING SETS when you only need specific combinations of aggregation (such as seeing region aggregation and product aggregation but not their cross combinations) for best performance.

Best Practices

1. Performance Considerations

-- Avoid unnecessary combinations in CUBE
-- Instead of CUBE with many dimensions:
GROUP BY CUBE (region, country, city, product, brand, channel) -- 2^6 = 64 combinations!
 
-- Use targeted GROUPING SETS:
GROUP BY GROUPING SETS (
    (region, product),
    (country, brand),
    (city, channel),
    ()
)

2. Indexing Strategy

-- Create composite indexes for common grouping patterns
CREATE INDEX idx_sales_region_country ON sales (region, country);
CREATE INDEX idx_sales_product_category ON sales (product_category);
CREATE INDEX idx_sales_rollup ON sales (region, country, product_category);

3. Memory Management

-- For large datasets, consider using CTEs to pre-filter
WITH filtered_sales AS (
    SELECT region, country, product_category, sales_amount
    FROM sales
    WHERE sale_date >= '2024-01-01'
)
SELECT
    region,
    country,
    product_category,
    SUM(sales_amount) AS total_sales
FROM filtered_sales
GROUP BY CUBE (region, country, product_category);

4. Result Set Management

-- Use HAVING to filter aggregated results
SELECT
    region,
    country,
    SUM(sales_amount) AS total_sales
FROM sales
GROUP BY ROLLUP (region, country)
HAVING SUM(sales_amount) > 10000
ORDER BY total_sales DESC;

This comprehensive guide covers all aspects of TacNode's advanced grouping capabilities, enabling you to perform sophisticated multi-dimensional analysis and generate powerful analytical reports with ease.