Index Design

Indexes are fundamental database structures that dramatically improve query performance by providing fast data access paths. In Tacnode, strategic index design can transform slow, sequential table scans into lightning-fast lookups. This guide covers everything you need to know about designing effective indexes.

Index Fundamentals

Indexes dramatically improve query performance by providing fast data access paths. Consider this example:

CREATE TABLE customer_orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INTEGER,
    order_date DATE,
    total_amount DECIMAL(10,2),
    status VARCHAR(20)
);

Without an index, queries require scanning every row:

SELECT * FROM customer_orders WHERE customer_id = 12345;

With an index, the database can locate matching rows in just a few operations:

CREATE INDEX idx_customer_orders_customer_id ON customer_orders (customer_id);

Performance Impact:

  • Small table (1K rows): 10x faster
  • Medium table (100K rows): 1000x faster
  • Large table (10M rows): 100,000x faster

Index Types

B-tree Indexes

B-tree indexes are the most versatile type, excellent for equality and range queries on ordered data.

Best for: High-cardinality columns, range queries (>, <, BETWEEN), equality searches (=), sorting operations (ORDER BY).

-- Basic B-tree index (default type)
CREATE INDEX idx_order_date ON customer_orders (order_date);
 
-- Efficient for range and sorting queries
SELECT * FROM customer_orders
WHERE order_date BETWEEN '2024-01-01' AND '2024-03-31'
ORDER BY order_date;

Bitmap Indexes

Bitmap indexes are designed for low-cardinality data and analytical workloads, particularly effective in columnar storage.

Best for: Low-cardinality columns, analytical queries with multiple conditions, categorical data (status, region, type).

CREATE TABLE sales_data (
    sale_id SERIAL PRIMARY KEY,
    region VARCHAR(20),        -- Low cardinality: North, South, East, West
    product_category VARCHAR(30), -- Low cardinality: Electronics, Clothing, etc.
    quarter VARCHAR(10),       -- Low cardinality: Q1-2024, Q2-2024, etc.
    sales_amount DECIMAL(12,2)
);
 
-- Bitmap indexes for categorical columns
CREATE INDEX idx_sales_region ON sales_data USING split_bitmap (region);
CREATE INDEX idx_sales_category ON sales_data USING split_bitmap (product_category);
 
-- Efficient for multi-condition analytics
SELECT region, product_category, SUM(sales_amount) as total_sales
FROM sales_data
WHERE region IN ('North', 'South')
AND product_category = 'Electronics'
AND quarter = 'Q1-2024'
GROUP BY region, product_category;

GIN Indexes

GIN indexes (Generalized Inverted Index) are specialized for complex data types like arrays, JSONB, and full-text search.

Best for: Array columns, JSONB/JSON data, full-text search scenarios, multi-value containment queries.

Full-Text Search with GIN

GIN indexes excel at full-text search operations, enabling fast text matching across large document collections:

CREATE TABLE documents (
    document_id SERIAL PRIMARY KEY,
    title VARCHAR(255),
    content TEXT
) USING COLUMNAR;
 
-- Create GIN index for full-text search
CREATE INDEX idx_documents_fulltext ON documents
USING split_gin (to_tsvector('english', title || ' ' || content));
 
-- Full-text search queries
SELECT title, content FROM documents
WHERE to_tsvector('english', title || ' ' || content) @@ to_tsquery('english', 'database & performance');
 
-- Search with ranking
SELECT title, ts_rank(to_tsvector('english', content), query) as rank
FROM documents, to_tsquery('english', 'search & optimization') query
WHERE to_tsvector('english', content) @@ query
ORDER BY rank DESC;

For comprehensive full-text search implementation and advanced techniques, see the Full-Text Search Guide.

JSONB Data with GIN

GIN indexes provide efficient querying capabilities for JSONB data structures:

CREATE TABLE products (
    product_id SERIAL PRIMARY KEY,
    name VARCHAR(255),
    specifications JSONB,     -- Flexible product specifications
    tags TEXT[]               -- Array of product tags
);
 
-- GIN indexes for JSONB and array data
CREATE INDEX idx_products_specs ON products USING gin (specifications);
CREATE INDEX idx_products_tags ON products USING gin (tags);
 
-- JSONB containment queries
SELECT name, specifications FROM products
WHERE specifications @> '{"brand": "Apple", "wireless": true}';
 
-- JSONB key existence
SELECT name FROM products
WHERE specifications ? 'warranty_years';
 
-- Array containment queries
SELECT name, tags FROM products
WHERE tags && ARRAY['electronics', 'mobile'];  -- Contains any of these tags
 
SELECT name, tags FROM products
WHERE tags @> ARRAY['smartphone', 'android'];  -- Contains all these tags

For detailed JSONB usage patterns and optimization strategies, see the JSONB Guide.

Vector Indexes

Vector indexes are specialized for similarity search and machine learning applications, enabling efficient nearest-neighbor queries on high-dimensional data.

Best for: AI/ML applications, semantic search, recommendation systems, image/text similarity matching.

CREATE TABLE documents (
    document_id SERIAL PRIMARY KEY,
    title VARCHAR(255),
    content TEXT,
    embedding VECTOR(768)     -- High-dimensional vector representation
) USING COLUMNAR;
 
-- Vector index for similarity search
CREATE INDEX idx_documents_embedding ON documents USING split_hnsw (embedding vector_l2_ops);
 
-- Similarity search queries
SELECT title, embedding <-> '[0.1, 0.2, 0.3, ...]'::vector AS distance
FROM documents
ORDER BY embedding <-> '[0.1, 0.2, 0.3, ...]'::vector
LIMIT 10;

Key Features:

  • High-dimensional data: Handle vectors with hundreds or thousands of dimensions
  • Similarity operators: Distance-based queries using cosine, L2, and inner product
  • AI/ML integration: Perfect for embedding-based applications and semantic search

For comprehensive vector search implementation and optimization strategies, see the Vector Search Guide.

Advanced Index Patterns

The following advanced patterns apply specifically to B-tree indexes, which are the most versatile index type for complex query optimization strategies.

Multi-Column (Composite) Indexes

Composite B-tree indexes span multiple columns and dramatically improve queries that filter on multiple fields.

Design Principles:

  1. Most Selective First - Place highest cardinality columns first
  2. Query Pattern Matching - Order columns to match common query patterns
  3. Equality before Range - Put equality conditions before range conditions
-- Optimized composite index design
CREATE INDEX idx_orders_composite ON customer_orders (
    customer_id,    -- High selectivity, equality queries
    status,         -- Medium selectivity, equality queries
    order_date      -- Range queries, put last
);
 
-- Efficient queries (in order of efficiency):
-- Best: Uses all three columns
SELECT * FROM customer_orders
WHERE customer_id = 12345 AND status = 'completed' AND order_date >= '2024-01-01';
 
-- Good: Uses first two columns
SELECT * FROM customer_orders
WHERE customer_id = 12345 AND status = 'pending';
 
-- OK: Uses first column only
SELECT * FROM customer_orders
WHERE customer_id = 12345;
 
-- Won't use index: Doesn't start with first column
SELECT * FROM customer_orders
WHERE status = 'completed' AND order_date >= '2024-01-01';

Partial (Filtered) Indexes

Partial B-tree indexes include only rows meeting specific conditions, providing excellent performance for subset queries while using minimal storage.

Best for: Sparse data, status-based filtering, time-based subsets, category filtering.

-- Index only active customers
CREATE INDEX idx_active_customers_email ON customers (email, registration_date)
WHERE status = 'active' AND email IS NOT NULL;
 
-- Index for recent transaction analysis
CREATE INDEX idx_recent_transactions ON transactions (customer_id, amount, transaction_date)
WHERE transaction_date >= '2024-01-01' AND amount > 0;
 
-- Index only error logs for troubleshooting
CREATE INDEX idx_error_logs ON application_logs (timestamp, user_id, error_code)
WHERE log_level = 'ERROR' AND error_code IS NOT NULL;

Expression (Function-Based) Indexes

Expression B-tree indexes allow indexing computed values, enabling efficient queries on calculated or transformed data.

-- Case-insensitive username searches
CREATE INDEX idx_username_lower ON users (LOWER(username));
SELECT * FROM users WHERE LOWER(username) = LOWER('JohnDoe');
 
-- Date-based grouping
CREATE INDEX idx_monthly_sales ON orders (DATE_TRUNC('month', order_date));
SELECT DATE_TRUNC('month', order_date), COUNT(*)
FROM orders
GROUP BY DATE_TRUNC('month', order_date);
 
-- Complex calculations
CREATE INDEX idx_profit_margin ON products ((selling_price - cost_price) / cost_price);
SELECT * FROM products
WHERE (selling_price - cost_price) / cost_price > 0.2;

Performance Optimization

Concurrent Index Building

Concurrent index building is crucial for production environments where you cannot afford downtime during index creation. By default, CREATE INDEX blocks all writes and can significantly impact application performance.

Why Use CONCURRENTLY:

  • No table locking: Allows concurrent reads and writes during index creation
  • Zero downtime: Applications continue to function normally
  • Production safe: Essential for high-availability systems
-- Standard index creation (blocks writes)
CREATE INDEX idx_customer_email ON customers (email);  -- ❌ Blocks writes
 
-- Concurrent index creation (non-blocking)
CREATE INDEX CONCURRENTLY idx_customer_email ON customers (email);  -- ✅ Safe

Important Considerations:

  1. Longer Creation Time: Concurrent building takes more time than standard creation
  2. More Resources: Requires additional CPU and I/O during the process
  3. Transaction Limitations: Cannot be run inside a transaction block
  4. Failure Handling: Failed concurrent builds leave invalid indexes that must be cleaned up

Advanced Concurrent Index Management:

-- Monitor concurrent index building progress
SELECT
    pid,
    now() - pg_stat_activity.query_start AS duration,
    query
FROM pg_stat_activity
WHERE query LIKE '%CREATE INDEX CONCURRENTLY%';
 
-- Check for invalid indexes (from failed concurrent builds)
SELECT schemaname, tablename, indexname, indexdef
FROM pg_indexes
WHERE indexdef LIKE '%CONCURRENTLY%'
AND indexname IN (
    SELECT indexname FROM pg_stat_user_indexes WHERE idx_scan = 0
);
 
-- Clean up invalid indexes
DROP INDEX CONCURRENTLY IF EXISTS idx_failed_build;

Best Practices for Concurrent Index Building:

-- 1. Build during low-traffic periods when possible
CREATE INDEX CONCURRENTLY idx_orders_date ON orders (order_date);
 
-- 2. Monitor system resources during building
-- Use htop/iostat to watch CPU and I/O impact
 
-- 3. For very large tables, consider partitioned approach
CREATE INDEX CONCURRENTLY idx_large_table_p1 ON large_table_2024_q1 (column);
CREATE INDEX CONCURRENTLY idx_large_table_p2 ON large_table_2024_q2 (column);
-- Then create on parent table
 
-- 4. Test on staging environment first with similar data volume
CREATE INDEX CONCURRENTLY idx_test_index ON staging_table (test_column);

Index Maintenance

Regular maintenance ensures optimal index performance:

-- Monitor index usage statistics
SELECT schemaname, tablename, indexname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE idx_scan < 10;  -- Find unused indexes
 
-- Drop unused indexes (use CONCURRENTLY for production)
DROP INDEX CONCURRENTLY IF EXISTS idx_rarely_used;

Best Practices

Column Selection:

  • Index columns frequently used in WHERE, JOIN, ORDER BY, and GROUP BY clauses
  • Prioritize high-cardinality columns for B-tree indexes
  • Use bitmap indexes for low-cardinality analytical columns

Efficiency Guidelines:

  • Avoid excessive indexing - every index has write overhead
  • Remove redundant indexes (single-column when composite exists)
  • Always use CONCURRENTLY on production systems - see Concurrent Index Building section above
  • Test with realistic data volumes and query patterns
  • Balance query performance against storage and write costs

Summary

Effective index design requires understanding your data patterns and query requirements:

Index Type Selection:

  • B-tree: High-cardinality columns, range queries, sorting operations
  • Bitmap: Low-cardinality columns, analytical workloads with multiple conditions
  • GIN: Array data, JSONB/JSON fields, full-text search
  • Vector: High-dimensional data, similarity search, AI/ML applications

Advanced B-tree Patterns:

  • Composite indexes: Multi-column queries with proper column ordering
  • Partial indexes: Filtered subsets to save space and improve performance
  • Expression indexes: Computed values and function-based queries

Best Practices:

  • Always use CREATE INDEX CONCURRENTLY on production systems to avoid downtime
  • Monitor index usage and remove unused indexes
  • Balance query performance against write overhead and storage costs
  • Design indexes based on actual query patterns, not assumptions

Strategic index design transforms database performance, but requires ongoing analysis and optimization to maintain effectiveness as data and query patterns evolve.