Query Tuning
Performance optimization is crucial for maintaining responsive applications and efficient resource utilization. This guide covers systematic approaches to identify, diagnose, and resolve query performance issues in Tacnode.
Quick Diagnosis Checklist
| Issue Type | Check | Tool | Action |
|---|---|---|---|
| Slow Queries | Long-running operations | pg_stat_activity | Identify and analyze |
| Resource Usage | CPU, Memory, I/O | System monitoring | Scale or optimize |
| Index Usage | Scan types | EXPLAIN | Create or optimize indexes |
| Lock Contention | Blocking queries | pg_locks | Resolve conflicts |
| Statistics | Query planner data | ANALYZE | Update table statistics |
System Monitoring and Activity Analysis
Monitoring Active Sessions
The pg_stat_activity view provides real-time insights into database activity and is your first tool for performance diagnosis.
-- Get overview of current database activity
SELECT
datname AS database,
state,
COUNT(*) AS session_count,
AVG(EXTRACT(epoch FROM now() - query_start))::int AS avg_duration_seconds
FROM pg_stat_activity
WHERE state IS NOT NULL
GROUP BY datname, state
ORDER BY database, session_count DESC;
-- Find currently active queries
SELECT
pid,
usename AS username,
datname AS database,
state,
EXTRACT(epoch FROM now() - query_start)::int AS duration_seconds,
LEFT(query, 100) AS query_preview
FROM pg_stat_activity
WHERE state = 'active'
AND query NOT LIKE '%pg_stat_activity%'
ORDER BY duration_seconds DESC;
Identifying Long-Running Queries
-- Find queries running longer than expected thresholds
SELECT
pid,
usename,
datname,
state,
ROUND(EXTRACT(epoch FROM now() - query_start)::numeric, 2) AS duration_seconds,
ROUND(EXTRACT(epoch FROM now() - xact_start)::numeric, 2) AS transaction_duration,
query
FROM pg_stat_activity
WHERE now() - query_start > INTERVAL '30 seconds' -- Adjust threshold as needed
AND state != 'idle'
AND query NOT LIKE '%pg_stat_activity%'
ORDER BY duration_seconds DESC;
-- Monitor transaction duration (long transactions can cause blocking)
SELECT
pid,
usename,
datname,
state,
EXTRACT(epoch FROM now() - xact_start)::int AS transaction_age_seconds,
query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
AND now() - xact_start > INTERVAL '5 minutes'
ORDER BY transaction_age_seconds DESC;
Blocking and Lock Analysis
-- Identify blocking queries and their victims
SELECT
blocked_locks.pid AS blocked_pid,
blocked_activity.usename AS blocked_user,
blocking_locks.pid AS blocking_pid,
blocking_activity.usename AS blocking_user,
blocked_activity.query AS blocked_statement,
blocking_activity.query AS blocking_statement,
blocked_activity.application_name AS blocked_application,
blocking_activity.application_name AS blocking_application
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity
ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks
ON blocking_locks.locktype = blocked_locks.locktype
AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity
ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;
Query Execution Plan Analysis
Understanding EXPLAIN Output
-- Basic execution plan
EXPLAIN
SELECT customer_id, SUM(amount)
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY customer_id;
-- Detailed execution plan with timing information
EXPLAIN ANALYZE
SELECT
c.customer_name,
COUNT(o.order_id) AS order_count,
SUM(o.amount) AS total_amount
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_date >= '2024-01-01'
GROUP BY c.customer_id, c.customer_name
ORDER BY total_amount DESC;
Key Metrics in Execution Plans
-- Example execution plan with annotations
EXPLAIN ANALYZE
SELECT * FROM large_table WHERE indexed_column = 'value';
/*
Example output interpretation:
Index Scan using idx_column on large_table (cost=0.43..8.45 rows=1 width=100)
(actual time=0.025..0.026 rows=1 loops=1)
Index Cond: (indexed_column = 'value'::text)
Buffers: shared hit=4
Key metrics:
- cost=0.43..8.45: Estimated cost (startup..total)
- rows=1: Estimated rows returned
- width=100: Estimated average row size in bytes
- actual time=0.025..0.026: Real execution time (startup..total) in milliseconds
- rows=1 loops=1: Actual rows returned × loop iterations
- Buffers: shared hit=4: Buffer cache hits (no disk reads needed)
*/
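When scanning many plans in scripts, these metrics can be pulled out of a plan line mechanically. The following is an illustrative sketch (the regex and `parse_plan_line` helper are our own, not part of any tool), matching the `cost=..`, `rows=`, `width=`, and `actual time=..` fields shown above:

```python
import re

# Matches planner estimates and (optional) actual timings in one plan line, e.g.
# "(cost=0.43..8.45 rows=1 width=100) (actual time=0.025..0.026 rows=1 loops=1)"
PLAN_LINE = re.compile(
    r"\(cost=(?P<startup_cost>[\d.]+)\.\.(?P<total_cost>[\d.]+) "
    r"rows=(?P<est_rows>\d+) width=(?P<width>\d+)\)"
    r"(?: \(actual time=(?P<startup_ms>[\d.]+)\.\.(?P<total_ms>[\d.]+) "
    r"rows=(?P<actual_rows>\d+) loops=(?P<loops>\d+)\))?"
)

def parse_plan_line(line: str) -> dict:
    """Extract estimated and actual metrics from one EXPLAIN ANALYZE plan line."""
    m = PLAN_LINE.search(line)
    if not m:
        return {}
    return {k: float(v) for k, v in m.groupdict().items() if v is not None}

metrics = parse_plan_line(
    "Index Scan using idx_column on large_table "
    "(cost=0.43..8.45 rows=1 width=100) (actual time=0.025..0.026 rows=1 loops=1)"
)
```

`metrics` then holds `startup_cost`, `total_cost`, `est_rows`, `width`, and, for ANALYZE output, the actual timings and row counts, ready for comparison or alerting.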
Common Performance Patterns
-- Identify expensive operations in execution plans
-- Look for these patterns:
-- 1. Sequential Scans on large tables (usually bad)
EXPLAIN ANALYZE
SELECT * FROM large_table WHERE unindexed_column = 'value';
-- Output: Seq Scan on large_table (high cost, many rows scanned)
-- 2. Nested Loop joins with high iteration counts (often problematic)
EXPLAIN ANALYZE
SELECT * FROM table1 t1 JOIN table2 t2 ON t1.id = t2.foreign_id;
-- Output: Nested Loop (high loops count indicates inefficient join)
-- 3. Sort operations on large datasets (memory intensive)
EXPLAIN ANALYZE
SELECT * FROM large_table ORDER BY unindexed_column;
-- Output: Sort (high cost, potential disk usage)
-- 4. Hash joins with large hash tables (memory issues)
EXPLAIN ANALYZE
SELECT * FROM large_table1 l1 JOIN large_table2 l2 ON l1.col = l2.col;
-- Output: Hash Join with large "Hash Buckets" and "Memory Usage"
Index Analysis and Optimization
Index Usage Statistics
-- Analyze index usage patterns
SELECT
schemaname,
relname AS tablename,
indexrelname AS indexname,
idx_scan AS times_used,
idx_tup_read AS tuples_read,
idx_tup_fetch AS tuples_fetched,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;
-- Find unused indexes (candidates for removal)
SELECT
schemaname,
relname AS tablename,
indexrelname AS indexname,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND indexrelname NOT LIKE '%_pkey' -- Exclude primary keys
ORDER BY pg_relation_size(indexrelid) DESC;
-- Find tables without indexes (may need indexing)
SELECT
schemaname,
relname AS tablename,
n_tup_ins + n_tup_upd + n_tup_del AS total_writes,
seq_scan,
seq_tup_read,
pg_size_pretty(pg_total_relation_size(relid)) AS table_size
FROM pg_stat_user_tables
WHERE relid NOT IN (
SELECT DISTINCT format('%I.%I', schemaname, tablename)::regclass
FROM pg_indexes
WHERE schemaname NOT IN ('information_schema', 'pg_catalog')
)
ORDER BY seq_tup_read DESC;
Index Recommendations
-- Analyze query patterns to suggest indexes
-- Look for frequent WHERE clause patterns
SELECT
query,
calls,
total_exec_time,
mean_exec_time,
rows::numeric / calls AS avg_rows_returned -- cast avoids integer truncation
FROM pg_stat_statements
WHERE calls > 100 -- Frequently executed queries
AND mean_exec_time > 100 -- Slower than 100ms average
ORDER BY total_exec_time DESC
LIMIT 20;
-- Create indexes based on common query patterns
-- Example: If you frequently filter by date ranges
CREATE INDEX CONCURRENTLY idx_orders_date_customer
ON orders (order_date, customer_id)
WHERE order_date >= '2024-01-01';
-- Example: Partial index for active records only
CREATE INDEX CONCURRENTLY idx_users_active_email
ON users (email)
WHERE status = 'active';
-- Example: Composite index for JOIN and ORDER BY
CREATE INDEX CONCURRENTLY idx_orders_composite
ON orders (customer_id, order_date DESC, amount);
Table Statistics and ANALYZE
Maintaining Statistics
-- Check statistics freshness
SELECT
schemaname,
relname AS tablename,
n_tup_ins + n_tup_upd + n_tup_del AS total_changes,
last_analyze,
last_autoanalyze,
CASE
WHEN last_analyze IS NULL AND last_autoanalyze IS NULL THEN 'Never analyzed'
WHEN COALESCE(last_analyze, '1970-01-01') < COALESCE(last_autoanalyze, '1970-01-01')
THEN 'Auto-analyzed on ' || last_autoanalyze::date
ELSE 'Manually analyzed on ' || last_analyze::date
END AS analysis_status
FROM pg_stat_user_tables
WHERE n_tup_ins + n_tup_upd + n_tup_del > 1000 -- Tables with significant changes
ORDER BY total_changes DESC;
-- Update statistics for specific tables
ANALYZE customers;
ANALYZE orders;
-- Check if statistics are helping query planning
EXPLAIN ANALYZE
SELECT COUNT(*) FROM orders WHERE order_date >= '2024-01-01';
-- Look for significant differences between estimated rows and actual rows
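A common rule of thumb is that an estimate off by more than roughly 10x suggests stale statistics and a fresh ANALYZE. A tiny hypothetical helper (names and the 10x threshold are our own choices) to apply that rule when comparing a plan node's estimated and actual row counts:

```python
def estimate_drift(est_rows: int, actual_rows: int) -> float:
    """Return the larger of est/actual and actual/est row ratios (>= 1.0)."""
    est = max(est_rows, 1)      # the planner never estimates below 1 row
    act = max(actual_rows, 1)
    return max(est / act, act / est)

def needs_analyze(est_rows: int, actual_rows: int, factor: float = 10.0) -> bool:
    """Flag a plan node whose row estimate is off by more than `factor`."""
    return estimate_drift(est_rows, actual_rows) > factor
```

For example, an estimate of 100 rows against 5,000 actual rows (50x drift) would be flagged, while 120 versus 100 (1.2x) would not.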
Performance Monitoring Queries
Query Performance Statistics
-- Top slowest queries by total time
SELECT
query,
calls,
ROUND(total_exec_time::numeric, 2) AS total_time_ms,
ROUND(mean_exec_time::numeric, 2) AS avg_time_ms,
ROUND((100 * total_exec_time / SUM(total_exec_time) OVER())::numeric, 2) AS percentage_of_total
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
-- Queries with high variability (inconsistent performance)
SELECT
query,
calls,
ROUND(mean_exec_time::numeric, 2) AS avg_time_ms,
ROUND(stddev_exec_time::numeric, 2) AS stddev_ms,
ROUND((stddev_exec_time / mean_exec_time * 100)::numeric, 2) AS variability_percent
FROM pg_stat_statements
WHERE calls > 10
AND stddev_exec_time > 0
ORDER BY variability_percent DESC
LIMIT 20;
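The `variability_percent` column above is simply the coefficient of variation (standard deviation divided by mean, times 100). The same statistic can be computed from raw timing samples, as in this sketch (population stddev is used here; it is an assumption that this matches pg_stat_statements' running aggregate closely enough for triage):

```python
import statistics

def variability_percent(timings_ms: list) -> float:
    """Coefficient of variation of query timings, as a percentage.

    Mirrors the stddev_exec_time / mean_exec_time * 100 expression above.
    """
    mean = statistics.fmean(timings_ms)
    stddev = statistics.pstdev(timings_ms)
    return stddev / mean * 100
```

A query that usually takes 10 ms but sometimes takes 20 ms shows ~33% variability, which often points to plan instability, lock waits, or cache effects worth investigating.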
-- Resource-intensive queries
SELECT
query,
calls,
ROUND(total_exec_time::numeric, 2) AS total_time_ms,
shared_blks_hit + shared_blks_read AS total_buffer_access,
ROUND((shared_blks_read::numeric / NULLIF(shared_blks_hit + shared_blks_read, 0) * 100), 2) AS cache_miss_percent
FROM pg_stat_statements
WHERE shared_blks_read > 1000 -- Queries causing significant I/O
ORDER BY shared_blks_read DESC
LIMIT 20;
System Resource Analysis
-- Database size and growth analysis
SELECT
datname AS database,
pg_size_pretty(pg_database_size(datname)) AS size,
(SELECT COUNT(*) FROM pg_stat_activity WHERE datname = d.datname) AS connections
FROM pg_database d
WHERE datistemplate = false
ORDER BY pg_database_size(datname) DESC;
-- Table sizes and relation analysis
SELECT
schemaname,
relname AS tablename,
pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
pg_size_pretty(pg_relation_size(relid)) AS table_size,
pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid)) AS index_size,
n_tup_ins + n_tup_upd + n_tup_del AS modifications,
seq_scan,
seq_tup_read / NULLIF(seq_scan, 0) AS avg_seq_read
FROM pg_stat_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 20;
Optimization Strategies
Query Rewriting Techniques
-- 1. Use EXISTS instead of IN (often faster when the subquery returns many rows)
-- Inefficient
SELECT * FROM customers
WHERE customer_id IN (SELECT customer_id FROM orders WHERE amount > 1000);
-- Optimized
SELECT * FROM customers c
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id AND amount > 1000);
-- 2. Use LIMIT to avoid processing unnecessary rows
-- Inefficient
SELECT * FROM large_table ORDER BY created_at DESC;
-- Optimized
SELECT * FROM large_table ORDER BY created_at DESC LIMIT 100;
-- 3. Use specific columns instead of SELECT *
-- Inefficient
SELECT * FROM wide_table WHERE condition = 'value';
-- Optimized
SELECT id, name, important_column FROM wide_table WHERE condition = 'value';
-- 4. Break complex queries into steps using WITH clauses
-- Complex single query
SELECT customer_id, AVG(amount)
FROM orders
WHERE order_date >= '2024-01-01'
AND customer_id IN (SELECT customer_id FROM customers WHERE region = 'North')
GROUP BY customer_id;
-- Optimized with CTE
WITH north_customers AS (
SELECT customer_id FROM customers WHERE region = 'North'
),
recent_orders AS (
SELECT customer_id, amount
FROM orders
WHERE order_date >= '2024-01-01'
)
SELECT ro.customer_id, AVG(ro.amount)
FROM recent_orders ro
JOIN north_customers nc ON ro.customer_id = nc.customer_id
GROUP BY ro.customer_id;
Advanced Optimization Techniques
-- 1. Partitioning for large tables
CREATE TABLE orders_partitioned (
order_id SERIAL,
customer_id INTEGER,
order_date DATE,
amount DECIMAL(10,2)
) PARTITION BY RANGE (order_date);
CREATE TABLE orders_2024_q1 PARTITION OF orders_partitioned
FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');
CREATE TABLE orders_2024_q2 PARTITION OF orders_partitioned
FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');
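Since range partitions must be created ahead of incoming data, many teams script the DDL generation. A minimal sketch that follows the `<parent>_<year>_q<n>` naming scheme used above (the helper itself is hypothetical, not a Tacnode utility):

```python
from datetime import date

def quarterly_partition_ddl(parent: str, year: int, quarter: int) -> str:
    """Generate CREATE TABLE ... PARTITION OF DDL for one calendar quarter."""
    start_month = 3 * (quarter - 1) + 1
    start = date(year, start_month, 1)
    # The upper bound is exclusive: the first day of the next quarter.
    end = date(year + 1, 1, 1) if quarter == 4 else date(year, start_month + 3, 1)
    return (
        f"CREATE TABLE {parent}_{year}_q{quarter} PARTITION OF {parent}\n"
        f"FOR VALUES FROM ('{start}') TO ('{end}');"
    )
```

Running this for each upcoming quarter (e.g. from a scheduled job) keeps partitions available before rows arrive, avoiding insert failures on missing ranges.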
-- 2. Materialized views for complex aggregations
CREATE MATERIALIZED VIEW customer_monthly_summary AS
SELECT
customer_id,
DATE_TRUNC('month', order_date) AS month,
COUNT(*) AS order_count,
SUM(amount) AS total_amount,
AVG(amount) AS avg_amount
FROM orders
GROUP BY customer_id, DATE_TRUNC('month', order_date);
CREATE INDEX ON customer_monthly_summary (customer_id, month);
-- Refresh the materialized view periodically; with a unique index,
-- REFRESH MATERIALIZED VIEW CONCURRENTLY avoids blocking readers
REFRESH MATERIALIZED VIEW customer_monthly_summary;
-- 3. Using query hints for specific optimization
-- (requires hint support, e.g. the pg_hint_plan extension)
-- Force parallel execution
/*+ Parallel(orders 4) */
SELECT COUNT(*) FROM orders WHERE order_date >= '2024-01-01';
-- Force specific join method
/*+ NestLoop(c o) */
SELECT * FROM customers c JOIN orders o ON c.customer_id = o.customer_id;
Troubleshooting Common Issues
Performance Problem Diagnosis Workflow
-- 1. Identify the problem query
SELECT query, calls, mean_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC LIMIT 5;
-- 2. Analyze execution plan
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
[YOUR_SLOW_QUERY_HERE];
-- 3. Check for missing indexes
-- Look for Seq Scan in execution plan
-- Check filter conditions in WHERE clauses
-- 4. Verify statistics are current
SELECT last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'your_table';
-- 5. Check for blocking
-- Use the blocking query provided earlier
-- 6. Monitor resource usage
SELECT * FROM pg_stat_activity WHERE state = 'active';
Common Performance Anti-patterns
-- ❌ Anti-pattern: N+1 Query Problem
-- Don't do this in application code:
-- for each customer:
-- SELECT * FROM orders WHERE customer_id = ?
-- ✅ Solution: Use JOINs or batch queries
SELECT c.*, o.*
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id;
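When a JOIN is impractical, the N+1 pattern can also be fixed application-side by batching lookups. A sketch of the batching half in Python (the `ANY(%s)` placeholder shown in the comment is psycopg-style and an assumption; adapt to your driver):

```python
from itertools import islice

def chunked(ids, size=500):
    """Yield lists of at most `size` ids, for batched IN/ANY queries."""
    it = iter(ids)
    while batch := list(islice(it, size)):
        yield batch

# Instead of one query per customer (N+1), fetch orders for a whole batch:
#   cur.execute("SELECT * FROM orders WHERE customer_id = ANY(%s)", (batch,))
batches = list(chunked(range(1, 1201), 500))
```

1,200 customer ids become three round trips instead of 1,200; the batch size caps parameter-list growth on very large id sets.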
-- ❌ Anti-pattern: Function calls in WHERE clauses
SELECT * FROM orders WHERE UPPER(status) = 'COMPLETED';
-- ✅ Solution: Create functional index or normalize data
CREATE INDEX idx_orders_status_upper ON orders (UPPER(status));
-- Or better: store status in consistent case
-- ❌ Anti-pattern: OR conditions on different columns
SELECT * FROM products WHERE category = 'electronics' OR brand = 'apple';
-- ✅ Solution: Use UNION for better index usage
SELECT * FROM products WHERE category = 'electronics'
UNION
SELECT * FROM products WHERE brand = 'apple';