SQL Statement Analysis

The pg_stat_statements extension is a powerful statistics analysis tool provided by Tacnode that allows users to track and analyze SQL query execution patterns. This extension is invaluable for performance optimization, slow query tracking, and understanding database workload patterns.

In pg_stat_statements, query text is "normalized" - logically identical SQL statements with different parameters are merged for statistics purposes, such as SELECT * FROM table WHERE id = $1.

Use Case Overview

Typical use cases for pg_stat_statements include:

  1. Performance Bottleneck Identification: Identify SQL statements with the longest execution times or highest resource consumption.
  2. Query Optimization: Find inefficient queries (such as frequent full table scans or statements missing indexes).
  3. Database Load Monitoring: Track slow queries and request patterns in the database.
  4. Capacity Planning: Understand the most frequently accessed tables or resource-intensive query patterns to plan future optimization tasks.

Loading the Extension

1. Enable the Extension

First, enable the extension in your database. You need system administrator privileges to execute this command. For detailed information about the extension, refer to the pg_stat_statements documentation:

CREATE EXTENSION pg_stat_statements;

2. Verify Installation

After successful installation, two views will be created in the public schema: pg_stat_statements and pg_stat_statements_info. You can verify the extension is active using the following commands:

SELECT pg_stat_statements_reset(); -- Reset statistics data
SELECT * FROM pg_stat_statements 
LIMIT 1;

Common Usage Patterns

Basic Statistics Query

Retrieve statistical information for executed SQL statements:

SELECT
    query,
    calls,              -- Number of executions
    total_exec_time,    -- Total execution time (milliseconds)
    mean_exec_time,     -- Average execution time
    rows                -- Number of rows returned
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;              -- Find the 10 longest-running SQL statements

Example Output:

                           query                            | calls | total_exec_time | mean_exec_time | rows
-----------------------------------------------------------+-------+-----------------+----------------+------
 SELECT * FROM sales WHERE region = $1 AND date > $2      |   156 |        8543.23  |      54.76     | 2340
 INSERT INTO user_activity (user_id, action, timestamp)   |  1247 |        6789.45  |       5.44     |    0
 UPDATE inventory SET quantity = $1 WHERE product_id = $2 |   892 |        5432.10  |       6.09     |  892
(3 rows)

Analyzing Inefficient Queries

Find queries that return few rows but take a long time to execute:

SELECT
    query,
    calls,
    rows,
    mean_exec_time
FROM pg_stat_statements
WHERE calls > 50        -- Filter out sampling noise by requiring > 50 executions
ORDER BY mean_exec_time DESC
LIMIT 5;

Use Case: This query helps identify queries that might be missing indexes or performing unnecessary full table scans.

Finding Most Frequently Executed Queries

SELECT
    query,
    calls
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 10;

Example Output:

                           query                            | calls
-----------------------------------------------------------+-------
 SELECT id, name FROM users WHERE status = $1             | 12450
 INSERT INTO access_log (user_id, endpoint, timestamp)    |  8932
 SELECT COUNT(*) FROM active_sessions                     |  7234
(3 rows)

Filtering Queries by Specific Tables or Keywords

Use pattern matching to filter statements related to specific tables:

SELECT
    query,
    calls,
    total_exec_time
FROM pg_stat_statements
WHERE query LIKE '%your_table_name%'
ORDER BY total_exec_time DESC;

Advanced Pattern Matching Examples:

-- Find all queries involving user-related tables
SELECT
    query,
    calls,
    total_exec_time,
    mean_exec_time
FROM pg_stat_statements
WHERE query ~* '(users|user_profiles|user_sessions)'
ORDER BY total_exec_time DESC
LIMIT 10;
 
-- Find all UPDATE and DELETE operations
SELECT
    query,
    calls,
    total_exec_time
FROM pg_stat_statements
WHERE query ~* '^(UPDATE|DELETE)'
ORDER BY calls DESC;

Resetting Statistics Data

To clear historical query statistics records:

SELECT pg_stat_statements_reset();

Advanced Analysis Techniques

Identifying Resource-Intensive Operations

-- Find queries with high I/O operations
SELECT
    query,
    calls,
    total_exec_time,
    shared_blks_read,    -- Blocks read from shared memory
    shared_blks_hit,     -- Blocks found in shared memory
    temp_blks_read,      -- Temporary blocks read
    temp_blks_written,   -- Temporary blocks written
    ROUND(
        (shared_blks_hit::numeric / 
         NULLIF(shared_blks_hit + shared_blks_read, 0)) * 100, 2
    ) AS cache_hit_ratio
FROM pg_stat_statements
WHERE shared_blks_read > 1000
ORDER BY shared_blks_read DESC
LIMIT 10;

Performance Trend Analysis

-- Calculate efficiency metrics
SELECT
    query,
    calls,
    total_exec_time,
    mean_exec_time,
    rows,
    CASE 
        WHEN rows > 0 THEN ROUND(mean_exec_time / rows, 4)
        ELSE NULL 
    END AS ms_per_row,
    ROUND(
        (total_exec_time / SUM(total_exec_time) OVER()) * 100, 2
    ) AS pct_total_time
FROM pg_stat_statements
WHERE calls > 10
ORDER BY pct_total_time DESC
LIMIT 15;

Query Classification and Analysis

-- Classify queries by operation type
SELECT
    CASE
        WHEN query ~* '^SELECT' THEN 'SELECT'
        WHEN query ~* '^INSERT' THEN 'INSERT'
        WHEN query ~* '^UPDATE' THEN 'UPDATE'
        WHEN query ~* '^DELETE' THEN 'DELETE'
        ELSE 'OTHER'
    END AS operation_type,
    COUNT(*) AS query_count,
    SUM(calls) AS total_calls,
    SUM(total_exec_time) AS total_time,
    AVG(mean_exec_time) AS avg_mean_time,
    SUM(rows) AS total_rows
FROM pg_stat_statements
GROUP BY operation_type
ORDER BY total_time DESC;

Database Load Analysis

-- Analyze peak load patterns
SELECT
    query,
    calls,
    total_exec_time,
    mean_exec_time,
    stddev_exec_time,    -- Standard deviation of execution time
    min_exec_time,       -- Minimum execution time
    max_exec_time,       -- Maximum execution time
    ROUND(
        (stddev_exec_time / NULLIF(mean_exec_time, 0)) * 100, 2
    ) AS exec_time_variability_pct
FROM pg_stat_statements
WHERE calls > 100
ORDER BY exec_time_variability_pct DESC
LIMIT 10;

Monitoring and Alerting

Performance Threshold Monitoring

-- Identify queries exceeding performance thresholds
SELECT
    query,
    calls,
    mean_exec_time,
    max_exec_time,
    total_exec_time
FROM pg_stat_statements
WHERE 
    mean_exec_time > 1000    -- Average > 1 second
    OR max_exec_time > 5000  -- Maximum > 5 seconds
    OR calls > 10000         -- High frequency queries
ORDER BY mean_exec_time DESC;

Top Resource Consumers Dashboard

-- Create a comprehensive performance dashboard
WITH query_stats AS (
    SELECT
        LEFT(query, 80) || '...' AS query_summary,
        calls,
        total_exec_time,
        mean_exec_time,
        rows,
        shared_blks_read + shared_blks_hit AS total_buffers,
        temp_blks_written,
        ROW_NUMBER() OVER (ORDER BY total_exec_time DESC) as time_rank,
        ROW_NUMBER() OVER (ORDER BY calls DESC) as freq_rank,
        ROW_NUMBER() OVER (ORDER BY mean_exec_time DESC) as avg_time_rank
    FROM pg_stat_statements
    WHERE calls > 5
)
SELECT
    query_summary,
    calls,
    ROUND(total_exec_time, 2) AS total_time_ms,
    ROUND(mean_exec_time, 2) AS avg_time_ms,
    rows,
    total_buffers,
    temp_blks_written,
    CASE 
        WHEN time_rank <= 5 THEN '🔥 High Total Time'
        WHEN freq_rank <= 5 THEN '⚡ High Frequency'
        WHEN avg_time_rank <= 5 THEN '🐌 Slow Average'
        ELSE '📊 Normal'
    END AS performance_category
FROM query_stats
WHERE time_rank <= 20 OR freq_rank <= 20 OR avg_time_rank <= 20
ORDER BY total_exec_time DESC;

Configuration and Best Practices

Extension Configuration

Key configuration parameters that affect pg_stat_statements:

-- View current configuration
SELECT name, setting, unit, short_desc 
FROM pg_settings 
WHERE name LIKE 'pg_stat_statements%';
 
-- Common configuration adjustments (requires restart)
-- In postgresql.conf:
-- pg_stat_statements.max = 10000        # Track more statements
-- pg_stat_statements.track = 'all'      # Track all statements
-- pg_stat_statements.track_utility = on # Track utility commands
-- pg_stat_statements.save = on          # Persist across restarts

Maintenance Procedures

-- Regular maintenance query to prevent view overflow
DO $$
BEGIN
    -- Reset stats if we're tracking too many statements
    IF (SELECT COUNT(*) FROM pg_stat_statements) > 8000 THEN
        PERFORM pg_stat_statements_reset();
        RAISE NOTICE 'pg_stat_statements reset due to high statement count';
    END IF;
END $$;
 
-- Selective cleanup - remove low-impact statements
SELECT pg_stat_statements_reset(
    userid, 
    dbid, 
    queryid
)
FROM pg_stat_statements
WHERE calls < 3 AND total_exec_time < 100;

Performance Impact Considerations

-- Monitor the extension's own performance impact
SELECT 
    schemaname,
    tablename,
    n_tup_ins,
    n_tup_upd,
    n_tup_del,
    last_analyze
FROM pg_stat_user_tables 
WHERE tablename = 'pg_stat_statements';
 
-- Check extension overhead
SELECT 
    'pg_stat_statements' AS extension_name,
    pg_size_pretty(pg_total_relation_size('pg_stat_statements')) AS size,
    COUNT(*) AS statement_count
FROM pg_stat_statements;

Integration Examples

Application Performance Monitoring

-- Create a monitoring view for application dashboards
CREATE OR REPLACE VIEW application_query_performance AS
SELECT
    CASE
        WHEN query LIKE 'SELECT%users%' THEN 'User Queries'
        WHEN query LIKE '%INSERT INTO orders%' THEN 'Order Processing'
        WHEN query LIKE '%payment%' THEN 'Payment Processing'
        WHEN query LIKE '%analytics%' THEN 'Analytics'
        ELSE 'Other'
    END AS business_function,
    COUNT(*) AS unique_queries,
    SUM(calls) AS total_executions,
    ROUND(SUM(total_exec_time), 2) AS total_time_ms,
    ROUND(AVG(mean_exec_time), 2) AS avg_exec_time_ms,
    SUM(rows) AS total_rows_processed
FROM pg_stat_statements
GROUP BY business_function
ORDER BY total_time_ms DESC;

Automated Performance Alerts

-- Function to generate performance alerts
CREATE OR REPLACE FUNCTION check_query_performance_alerts()
RETURNS TABLE (
    alert_type TEXT,
    query_snippet TEXT,
    metric_value NUMERIC,
    threshold NUMERIC,
    recommendation TEXT
) AS $$
BEGIN
    -- Slow query alerts
    RETURN QUERY
    SELECT
        'SLOW_QUERY'::TEXT,
        LEFT(query, 100),
        mean_exec_time,
        1000::NUMERIC,
        'Consider adding indexes or optimizing query structure'::TEXT
    FROM pg_stat_statements
    WHERE mean_exec_time > 1000 AND calls > 10;
    
    -- High frequency alerts
    RETURN QUERY
    SELECT
        'HIGH_FREQUENCY'::TEXT,
        LEFT(query, 100),
        calls::NUMERIC,
        5000::NUMERIC,
        'Consider caching or connection pooling for this query'::TEXT
    FROM pg_stat_statements
    WHERE calls > 5000;
    
    -- Resource intensive alerts
    RETURN QUERY
    SELECT
        'HIGH_IO'::TEXT,
        LEFT(query, 100),
        (shared_blks_read + temp_blks_read)::NUMERIC,
        10000::NUMERIC,
        'Query performs excessive I/O operations - review indexes'::TEXT
    FROM pg_stat_statements
    WHERE (shared_blks_read + temp_blks_read) > 10000;
END;
$$ LANGUAGE plpgsql;
 
-- Run performance check
SELECT * FROM check_query_performance_alerts();

Troubleshooting Common Issues

Statement Limit Reached

-- Check if you're hitting the statement limit
SELECT 
    COUNT(*) as current_statements,
    current_setting('pg_stat_statements.max')::int as max_statements,
    CASE 
        WHEN COUNT(*) >= current_setting('pg_stat_statements.max')::int * 0.9 
        THEN 'WARNING: Approaching statement limit'
        ELSE 'OK'
    END as status
FROM pg_stat_statements;

Missing Statistics

-- Verify extension is properly loaded
SELECT 
    name,
    installed_version,
    default_version,
    comment
FROM pg_available_extensions 
WHERE name = 'pg_stat_statements';
 
-- Check if shared_preload_libraries includes pg_stat_statements
SELECT setting FROM pg_settings WHERE name = 'shared_preload_libraries';

Query Text Truncation

-- Handle truncated query text
SELECT
    queryid,
    CASE 
        WHEN LENGTH(query) >= current_setting('track_activity_query_size')::int 
        THEN query || ' [TRUNCATED]'
        ELSE query
    END AS full_query,
    calls,
    mean_exec_time
FROM pg_stat_statements
WHERE query LIKE '%...'  -- Indicates truncation
ORDER BY calls DESC;

Best Practices Summary

  1. Regular Maintenance: Reset statistics periodically to prevent view overflow and maintain relevance.

  2. Threshold Monitoring: Set up automated alerts for queries exceeding performance thresholds.

  3. Baseline Establishment: Capture baseline performance metrics before making system changes.

  4. Query Categorization: Group queries by business function for better analysis and optimization priority.

  5. Resource Planning: Use statistics to predict resource requirements and plan capacity upgrades.

  6. Index Optimization: Use I/O statistics to identify tables and queries that would benefit from additional indexes.

  7. Application Optimization: Regularly review high-frequency queries for caching opportunities.

The pg_stat_statements extension is an essential tool for maintaining optimal database performance in Tacnode. By implementing regular monitoring and analysis procedures, you can proactively identify and resolve performance bottlenecks before they impact your applications.