Full Text Search

Full-text search is essential for modern data applications, enabling users to find relevant information quickly and efficiently. Tacnode provides comprehensive full-text search capabilities through PostgreSQL-compatible extensions and native functions.

Quick Reference

FeatureFunctionUse Case
Trigram Matchingpg_trgm, similarity()Fuzzy string matching, typo tolerance
Basic Text Searchtsvector, tsquery, @@Structured full-text queries
User-Friendly Searchwebsearch_to_tsquery()Google-like search syntax
Search Rankingts_rank(), ts_rank_cd()Relevance scoring
Result Highlightingts_headline()Search result snippets

Fuzzy String Matching with Trigrams

The pg_trgm extension provides trigram-based fuzzy matching, perfect for handling typos and partial matches.

Basic Trigram Operations

-- Calculate string similarity (0.0 to 1.0)
SELECT similarity('PostgreSQL', 'PostGreSQL');
-- Result: 0.8888889
 
-- Find similar strings
SELECT show_trgm('hello');
-- Result: {"  h"," he","el ",ell,hel,llo,"lo "}
 
-- Practical example: Find users by partial email
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email TEXT
) USING COLUMNAR;
 
INSERT INTO users (email) VALUES
('alice@example.com'),
('bob@gmail.com'),
('charlie@company.org'),
('david@yahoo.com');
 
-- Create trigram index for fast fuzzy search
CREATE INDEX users_email_trgm_idx ON users USING SPLIT_GIN(email gin_trgm_ops);
 
-- Find emails similar to input (handles typos)
SELECT email, similarity(email, 'alice@exampl.com') AS score
FROM users 
WHERE email % 'alice@exampl.com'
ORDER BY score DESC;

Trigram Search Operators

-- Similarity operator (adjustable threshold)
SELECT email FROM users WHERE email % 'gmail';
 
-- Pattern matching with index support
SELECT email FROM users WHERE email LIKE '%company%';
SELECT email FROM users WHERE email ILIKE '%COMPANY%';  -- case-insensitive
 
-- Set similarity threshold (default 0.3)
SET pg_trgm.similarity_threshold = 0.5;

For precise word-level searching with Boolean logic, use tsvector and tsquery.

Sample Data Setup

-- Create articles table
CREATE TABLE articles (
    id SERIAL PRIMARY KEY,
    title TEXT,
    content TEXT,
    author TEXT,
    created_at TIMESTAMP DEFAULT NOW()
) USING COLUMNAR;
 
INSERT INTO articles (title, content, author) VALUES
('Getting Started with PostgreSQL', 'PostgreSQL is a powerful relational database system', 'Alice'),
('Advanced SQL Techniques', 'Learn complex queries and optimization strategies', 'Bob'),
('Database Performance Tuning', 'Optimize your PostgreSQL database for better performance', 'Charlie'),
('Web Development with Rust', 'Building fast web applications using Rust programming language', 'David'),
('Machine Learning in Python', 'Implementing ML algorithms with Python libraries', 'Eve');
 
-- Add computed tsvector column for search
ALTER TABLE articles ADD COLUMN search_vector TSVECTOR
GENERATED ALWAYS AS (
    setweight(to_tsvector('english', COALESCE(title, '')), 'A') ||
    setweight(to_tsvector('english', COALESCE(content, '')), 'B') ||
    setweight(to_tsvector('english', COALESCE(author, '')), 'C')
) STORED;
 
-- Create search index
CREATE INDEX articles_search_idx ON articles USING SPLIT_GIN(search_vector);

Basic Search Queries

-- Simple word search
SELECT title, author 
FROM articles 
WHERE search_vector @@ to_tsquery('english', 'postgresql');
 
-- Multiple words (AND)
SELECT title 
FROM articles 
WHERE search_vector @@ to_tsquery('english', 'database & performance');
 
-- Alternative words (OR)
SELECT title 
FROM articles 
WHERE search_vector @@ to_tsquery('english', 'rust | python');
 
-- Exclude words (NOT)
SELECT title 
FROM articles 
WHERE search_vector @@ to_tsquery('english', 'programming & !database');
 
-- Phrase search (exact order)
SELECT title 
FROM articles 
WHERE search_vector @@ to_tsquery('english', 'web <-> development');

User-Friendly Search Syntax

Use websearch_to_tsquery() for Google-like search experience:

-- Simple text search
SELECT title FROM articles 
WHERE search_vector @@ websearch_to_tsquery('english', 'postgresql database');
 
-- Quoted phrases
SELECT title FROM articles 
WHERE search_vector @@ websearch_to_tsquery('english', '"machine learning"');
 
-- OR searches
SELECT title FROM articles 
WHERE search_vector @@ websearch_to_tsquery('english', 'rust or python');
 
-- Exclude terms
SELECT title FROM articles 
WHERE search_vector @@ websearch_to_tsquery('english', 'programming -database');
 
-- Combined syntax
SELECT title FROM articles 
WHERE search_vector @@ websearch_to_tsquery('english', '"web development" rust -postgresql');

Search Ranking and Relevance

Rank search results by relevance to show the most relevant matches first.

Basic Ranking

-- Rank by term frequency
SELECT 
    title,
    ts_rank(search_vector, websearch_to_tsquery('english', 'postgresql database')) AS rank
FROM articles 
WHERE search_vector @@ websearch_to_tsquery('english', 'postgresql database')
ORDER BY rank DESC;
 
-- Rank by term density (considers word proximity)
SELECT 
    title,
    ts_rank_cd(search_vector, websearch_to_tsquery('english', 'web development')) AS rank
FROM articles 
WHERE search_vector @@ websearch_to_tsquery('english', 'web development')
ORDER BY rank DESC;

Advanced Ranking with Normalization

-- Normalize by document length
SELECT 
    title,
    ts_rank(search_vector, websearch_to_tsquery('english', 'python'), 1) AS rank_norm
FROM articles 
WHERE search_vector @@ websearch_to_tsquery('english', 'python')
ORDER BY rank_norm DESC;
 
-- Weight different fields differently
SELECT 
    title,
    author,
    ts_rank(
        '{0.1, 0.2, 0.4, 1.0}',  -- weights for D, C, B, A
        search_vector,
        websearch_to_tsquery('english', 'postgresql')
    ) AS weighted_rank
FROM articles 
WHERE search_vector @@ websearch_to_tsquery('english', 'postgresql')
ORDER BY weighted_rank DESC;

Result Highlighting

Highlight matching terms in search results:

-- Basic highlighting
SELECT 
    title,
    ts_headline('english', content, websearch_to_tsquery('english', 'postgresql database')) AS snippet
FROM articles 
WHERE search_vector @@ websearch_to_tsquery('english', 'postgresql database');
 
-- Custom highlight formatting
SELECT 
    title,
    ts_headline(
        'english',
        content,
        websearch_to_tsquery('english', 'performance optimization'),
        'StartSel=<mark>, StopSel=</mark>, MaxWords=50, MinWords=15'
    ) AS highlighted_snippet
FROM articles 
WHERE search_vector @@ websearch_to_tsquery('english', 'performance optimization');

Complete Search Implementation

Here's a comprehensive search function combining all features:

-- Advanced search with ranking and highlighting
WITH search_query AS (
    SELECT websearch_to_tsquery('english', 'postgresql performance tuning') AS query
),
search_results AS (
    SELECT 
        a.id,
        a.title,
        a.author,
        a.created_at,
        ts_rank_cd(
            a.search_vector, 
            sq.query,
            32  -- logarithmic normalization
        ) AS relevance_score,
        ts_headline(
            'english',
            a.content,
            sq.query,
            'StartSel=<b>, StopSel=</b>, MaxWords=35, MinWords=15'
        ) AS snippet
    FROM articles a, search_query sq
    WHERE a.search_vector @@ sq.query
)
SELECT 
    title,
    author,
    snippet,
    ROUND(relevance_score::numeric, 3) AS score,
    created_at
FROM search_results
ORDER BY relevance_score DESC
LIMIT 10;

Performance Tips

Index Optimization

-- Use SPLIT_GIN for better performance on columnar tables
CREATE INDEX articles_gin_idx ON articles USING SPLIT_GIN(search_vector);
 
-- For frequently searched individual columns
CREATE INDEX articles_title_gin ON articles USING SPLIT_GIN(to_tsvector('english', title));
CREATE INDEX articles_content_gin ON articles USING SPLIT_GIN(to_tsvector('english', content));

Query Optimization

-- Use generated columns for complex tsvector computations
ALTER TABLE products ADD COLUMN search_data TSVECTOR
GENERATED ALWAYS AS (
    setweight(to_tsvector('english', name), 'A') ||
    setweight(to_tsvector('english', description), 'B') ||
    setweight(to_tsvector('english', array_to_string(tags, ' ')), 'C')
) STORED;
 
-- Limit result sets for better performance
SELECT title, content 
FROM articles 
WHERE search_vector @@ websearch_to_tsquery('english', 'database')
ORDER BY ts_rank_cd(search_vector, websearch_to_tsquery('english', 'database')) DESC
LIMIT 20;

Search Configuration

-- Configure similarity threshold for trigram searches
SET pg_trgm.similarity_threshold = 0.4;
 
-- Use appropriate text search configuration
-- For multilingual content, specify language explicitly
SELECT to_tsvector('simple', content) FROM articles;  -- no stemming
SELECT to_tsvector('english', content) FROM articles; -- English stemming

Common Use Cases

-- Products with weighted search
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name TEXT,
    description TEXT,
    brand TEXT,
    tags TEXT[],
    price DECIMAL(10,2)
) USING COLUMNAR;
 
-- Search products with brand boosting
SELECT 
    name,
    brand,
    price,
    ts_rank_cd(
        setweight(to_tsvector('english', name), 'A') ||
        setweight(to_tsvector('english', brand), 'A') ||
        setweight(to_tsvector('english', description), 'B'),
        websearch_to_tsquery('english', 'wireless headphones sony')
    ) AS score
FROM products
WHERE (
    setweight(to_tsvector('english', name), 'A') ||
    setweight(to_tsvector('english', brand), 'A') ||
    setweight(to_tsvector('english', description), 'B')
) @@ websearch_to_tsquery('english', 'wireless headphones sony')
ORDER BY score DESC;

Content Management

-- Blog posts with category and tag search
SELECT 
    title,
    author,
    category,
    ts_headline(
        'english', 
        content, 
        websearch_to_tsquery('english', 'react typescript tutorial')
    ) AS excerpt
FROM blog_posts 
WHERE search_vector @@ websearch_to_tsquery('english', 'react typescript tutorial')
ORDER BY ts_rank_cd(search_vector, websearch_to_tsquery('english', 'react typescript tutorial')) DESC;

Best Practices

  1. Use appropriate search methods:

    • Trigrams for fuzzy/typo-tolerant search
    • tsvector/tsquery for precise word matching
    • websearch_to_tsquery for user-friendly interfaces
  2. Optimize for performance:

    • Create SPLIT_GIN indexes on search columns
    • Use generated columns for complex tsvector computations
    • Limit result sets with appropriate LIMIT clauses
  3. Design user-friendly search:

    • Implement autocomplete with trigram similarity
    • Provide search result highlighting
    • Use ranking to show most relevant results first
  4. Handle multilingual content:

    • Specify appropriate text search configurations
    • Consider using 'simple' configuration for mixed languages

This comprehensive guide covers all aspects of full-text search in Tacnode, from basic fuzzy matching to advanced ranked search with highlighting.