Full Text Search
Complete guide to implementing full-text search in Tacnode using trigram matching, tsvector, tsquery, and ranking functions.
Full-text search is essential for modern data applications, enabling users to find relevant information quickly and efficiently. Tacnode provides comprehensive full-text search capabilities through PostgreSQL-compatible extensions and native functions.
Quick Reference
| Feature | Function | Use Case |
|---|---|---|
| Trigram Matching | pg_trgm, similarity() | Fuzzy string matching, typo tolerance |
| Basic Text Search | tsvector, tsquery, @@ | Structured full-text queries |
| User-Friendly Search | websearch_to_tsquery() | Google-like search syntax |
| Search Ranking | ts_rank(), ts_rank_cd() | Relevance scoring |
| Result Highlighting | ts_headline() | Search result snippets |
Fuzzy String Matching with Trigrams
The pg_trgm extension provides trigram-based fuzzy matching, perfect for handling typos and partial matches.
Basic Trigram Operations
-- Calculate string similarity (0.0 to 1.0)
SELECT similarity('PostgreSQL', 'PostGreSQL');
-- Result: 0.8888889
-- Find similar strings
SELECT show_trgm('hello');
-- Result: {" h"," he","el ",ell,hel,llo,"lo "}
-- Practical example: Find users by partial email
CREATE TABLE users (
id SERIAL PRIMARY KEY,
email TEXT
) USING COLUMNAR;
INSERT INTO users (email) VALUES
('alice@example.com'),
('bob@gmail.com'),
('charlie@company.org'),
('david@yahoo.com');
-- Create trigram index for fast fuzzy search
CREATE INDEX users_email_trgm_idx ON users USING SPLIT_GIN(email gin_trgm_ops);
-- Find emails similar to input (handles typos)
SELECT email, similarity(email, 'alice@exampl.com') AS score
FROM users
WHERE email % 'alice@exampl.com'
ORDER BY score DESC;
Trigram Search Operators
-- Similarity operator (adjustable threshold)
SELECT email FROM users WHERE email % 'gmail';
-- Pattern matching with index support
SELECT email FROM users WHERE email LIKE '%company%';
SELECT email FROM users WHERE email ILIKE '%COMPANY%'; -- case-insensitive
-- Set similarity threshold (default 0.3)
SET pg_trgm.similarity_threshold = 0.5;
Structured Text Search
For precise word-level searching with Boolean logic, use tsvector and tsquery.
Sample Data Setup
-- Create articles table
CREATE TABLE articles (
id SERIAL PRIMARY KEY,
title TEXT,
content TEXT,
author TEXT,
created_at TIMESTAMP DEFAULT NOW()
) USING COLUMNAR;
INSERT INTO articles (title, content, author) VALUES
('Getting Started with PostgreSQL', 'PostgreSQL is a powerful relational database system', 'Alice'),
('Advanced SQL Techniques', 'Learn complex queries and optimization strategies', 'Bob'),
('Database Performance Tuning', 'Optimize your PostgreSQL database for better performance', 'Charlie'),
('Web Development with Rust', 'Building fast web applications using Rust programming language', 'David'),
('Machine Learning in Python', 'Implementing ML algorithms with Python libraries', 'Eve');
-- Add computed tsvector column for search
ALTER TABLE articles ADD COLUMN search_vector TSVECTOR
GENERATED ALWAYS AS (
setweight(to_tsvector('english', COALESCE(title, '')), 'A') ||
setweight(to_tsvector('english', COALESCE(content, '')), 'B') ||
setweight(to_tsvector('english', COALESCE(author, '')), 'C')
) STORED;
-- Create search index
CREATE INDEX articles_search_idx ON articles USING SPLIT_GIN(search_vector);
Basic Search Queries
-- Simple word search
SELECT title, author
FROM articles
WHERE search_vector @@ to_tsquery('english', 'postgresql');
-- Multiple words (AND)
SELECT title
FROM articles
WHERE search_vector @@ to_tsquery('english', 'database & performance');
-- Alternative words (OR)
SELECT title
FROM articles
WHERE search_vector @@ to_tsquery('english', 'rust | python');
-- Exclude words (NOT)
SELECT title
FROM articles
WHERE search_vector @@ to_tsquery('english', 'programming & !database');
-- Phrase search (exact order)
SELECT title
FROM articles
WHERE search_vector @@ to_tsquery('english', 'web <-> development');
User-Friendly Search Syntax
Use websearch_to_tsquery() for Google-like search experience:
-- Simple text search
SELECT title FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', 'postgresql database');
-- Quoted phrases
SELECT title FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', '"machine learning"');
-- OR searches
SELECT title FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', 'rust or python');
-- Exclude terms
SELECT title FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', 'programming -database');
-- Combined syntax
SELECT title FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', '"web development" rust -postgresql');
Search Ranking and Relevance
Rank search results by relevance to show the most relevant matches first.
Basic Ranking
-- Rank by term frequency
SELECT
title,
ts_rank(search_vector, websearch_to_tsquery('english', 'postgresql database')) AS rank
FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', 'postgresql database')
ORDER BY rank DESC;
-- Rank by term density (considers word proximity)
SELECT
title,
ts_rank_cd(search_vector, websearch_to_tsquery('english', 'web development')) AS rank
FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', 'web development')
ORDER BY rank DESC;
Advanced Ranking with Normalization
-- Normalize by document length
SELECT
title,
ts_rank(search_vector, websearch_to_tsquery('english', 'python'), 1) AS rank_norm
FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', 'python')
ORDER BY rank_norm DESC;
-- Weight different fields differently
SELECT
title,
author,
ts_rank(
'{0.1, 0.2, 0.4, 1.0}', -- weights for D, C, B, A
search_vector,
websearch_to_tsquery('english', 'postgresql')
) AS weighted_rank
FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', 'postgresql')
ORDER BY weighted_rank DESC;
Result Highlighting
Highlight matching terms in search results:
-- Basic highlighting
SELECT
title,
ts_headline('english', content, websearch_to_tsquery('english', 'postgresql database')) AS snippet
FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', 'postgresql database');
-- Custom highlight formatting
SELECT
title,
ts_headline(
'english',
content,
websearch_to_tsquery('english', 'performance optimization'),
'StartSel=<mark>, StopSel=</mark>, MaxWords=50, MinWords=15'
) AS highlighted_snippet
FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', 'performance optimization');
Complete Search Implementation
Here’s a comprehensive search function combining all features:
-- Advanced search with ranking and highlighting
WITH search_query AS (
SELECT websearch_to_tsquery('english', 'postgresql performance tuning') AS query
),
search_results AS (
SELECT
a.id,
a.title,
a.author,
a.created_at,
ts_rank_cd(
a.search_vector,
sq.query,
32 -- logarithmic normalization
) AS relevance_score,
ts_headline(
'english',
a.content,
sq.query,
'StartSel=<b>, StopSel=</b>, MaxWords=35, MinWords=15'
) AS snippet
FROM articles a, search_query sq
WHERE a.search_vector @@ sq.query
)
SELECT
title,
author,
snippet,
ROUND(relevance_score::numeric, 3) AS score,
created_at
FROM search_results
ORDER BY relevance_score DESC
LIMIT 10;
Performance Tips
Index Optimization
-- Use SPLIT_GIN for better performance on columnar tables
CREATE INDEX articles_gin_idx ON articles USING SPLIT_GIN(search_vector);
-- For frequently searched individual columns
CREATE INDEX articles_title_gin ON articles USING SPLIT_GIN(to_tsvector('english', title));
CREATE INDEX articles_content_gin ON articles USING SPLIT_GIN(to_tsvector('english', content));
Query Optimization
-- Use generated columns for complex tsvector computations
ALTER TABLE products ADD COLUMN search_data TSVECTOR
GENERATED ALWAYS AS (
setweight(to_tsvector('english', name), 'A') ||
setweight(to_tsvector('english', description), 'B') ||
setweight(to_tsvector('english', array_to_string(tags, ' ')), 'C')
) STORED;
-- Limit result sets for better performance
SELECT title, content
FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', 'database')
ORDER BY ts_rank_cd(search_vector, websearch_to_tsquery('english', 'database')) DESC
LIMIT 20;
Search Configuration
-- Configure similarity threshold for trigram searches
SET pg_trgm.similarity_threshold = 0.4;
-- Use appropriate text search configuration
-- For multilingual content, specify language explicitly
SELECT to_tsvector('simple', content) FROM articles; -- no stemming
SELECT to_tsvector('english', content) FROM articles; -- English stemming
Common Use Cases
E-commerce Product Search
-- Products with weighted search
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name TEXT,
description TEXT,
brand TEXT,
tags TEXT[],
price DECIMAL(10,2)
) USING COLUMNAR;
-- Search products with brand boosting
SELECT
name,
brand,
price,
ts_rank_cd(
setweight(to_tsvector('english', name), 'A') ||
setweight(to_tsvector('english', brand), 'A') ||
setweight(to_tsvector('english', description), 'B'),
websearch_to_tsquery('english', 'wireless headphones sony')
) AS score
FROM products
WHERE (
setweight(to_tsvector('english', name), 'A') ||
setweight(to_tsvector('english', brand), 'A') ||
setweight(to_tsvector('english', description), 'B')
) @@ websearch_to_tsquery('english', 'wireless headphones sony')
ORDER BY score DESC;
Content Management
-- Blog posts with category and tag search
SELECT
title,
author,
category,
ts_headline(
'english',
content,
websearch_to_tsquery('english', 'react typescript tutorial')
) AS excerpt
FROM blog_posts
WHERE search_vector @@ websearch_to_tsquery('english', 'react typescript tutorial')
ORDER BY ts_rank_cd(search_vector, websearch_to_tsquery('english', 'react typescript tutorial')) DESC;
Best Practices
-
Use appropriate search methods:
- Trigrams for fuzzy/typo-tolerant search
- tsvector/tsquery for precise word matching
- websearch_to_tsquery for user-friendly interfaces
-
Optimize for performance:
- Create SPLIT_GIN indexes on search columns
- Use generated columns for complex tsvector computations
- Limit result sets with appropriate LIMIT clauses
-
Design user-friendly search:
- Implement autocomplete with trigram similarity
- Provide search result highlighting
- Use ranking to show most relevant results first
-
Handle multilingual content:
- Specify appropriate text search configurations
- Consider using ‘simple’ configuration for mixed languages
This comprehensive guide covers all aspects of full-text search in Tacnode, from basic fuzzy matching to advanced ranked search with highlighting.