Full-Text Search in PostgreSQL: The Complete Guide to Building Search That Actually Works
Full-text search in PostgreSQL goes far beyond LIKE queries. Learn how tsvector, tsquery, trigram matching, inverted indexes, and relevance ranking work — with practical examples for e-commerce, content search, and production applications.
TL;DR: PostgreSQL full-text search replaces brittle `LIKE` queries with language-aware, indexed search. It tokenizes text into `tsvector` lexemes, matches against `tsquery` expressions using an inverted GIN index, and ranks results by relevance — all inside your database. Add fuzzy matching with `pg_trgm` for typo tolerance. Combine with semantic search for hybrid retrieval. No Elasticsearch or separate search engine required.
Your users don't search the way your database thinks. They type "reccommended headphones noise canceling" — misspelled, no quotes, no wildcard operators — and expect the right products to show up instantly.
If you're still using simple pattern matching like `LIKE '%headphones%'` or basic substring matching against text data, you're already losing. Those approaches can't handle typos, synonyms, word variations, or relevance ranking. They scan every row. They don't understand that "running" and "runs" are the same word — or that "run" is the root form of both.
PostgreSQL full text search solves all of this. It tokenizes documents, stems individual words to their root form, builds inverted indexes for fast searches, ranks search results by relevance, and even handles fuzzy matching for typos — all inside your database, without bolting on a dedicated search engine.
This guide covers the core concepts of how full text search in PostgreSQL works, from the tsvector data type and match operator basics to production-ready implementations with ranking, highlighting, and query performance tuning.
Why LIKE Queries Break Down
Before diving into PostgreSQL full text search, it's worth understanding why basic approaches fail. Consider a products table with a million rows of existing data that needs to be searchable:
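A minimal sketch of the naive approach (the table and column names here are illustrative):

```sql
-- A leading wildcard defeats any B-tree index on description,
-- so this reads all million rows on every search
SELECT name, price
FROM products
WHERE description LIKE '%wireless headphones%';
```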
`LIKE '%wireless headphones%'` has three problems:
No index usage. A leading wildcard forces a sequential scan — the database reads every single row. On very large datasets, that's the difference between 2ms and 2 seconds.
No linguistic awareness. Searching for "running shoes" won't find "run shoe" or "runner's sneakers." Simple pattern matching matches character sequences, not concepts. It can't handle word forms or word variations.
No ranking. Even when results match, they come back in arbitrary order. A product with "wireless headphones" in the title is equally weighted with one that mentions it once in a 500-word description. There's no way to rank search results by relevance.
Full text search addresses all three. It uses specialized data structures for fast searches, understands language through stemming and normalization, and scores results by relevance. Let's see how.
PostgreSQL Full Text Search: Core Concepts
PostgreSQL full text search works by converting raw text data into a structured, searchable format. The process has two sides: document processing (what you're searching through) and query processing (what the user typed).
The tsvector and tsquery Data Types
The foundation of text search in PostgreSQL is two specialized data types:
On the document side, PostgreSQL converts text into a `tsvector` — a sorted list of normalized words (called lexemes) with their positions in the original text. The tsvector data type stores preprocessed documents in a format optimized for fast searches. The word "running" becomes "run". The word "databases" becomes "databas". Stop words like "the", "is", and "a" are dropped entirely. This preprocessing means the database doesn't need to parse text data at query time.
On the query side, search terms are converted into a `tsquery` — a structured representation of what to search for. The tsquery data type supports boolean operators (AND, OR, NOT), phrase search with proximity operators, and prefix matching. Together, the tsvector and tsquery data types give PostgreSQL full text search its search capabilities.
The Match Operator
The match operator `@@` compares a tsvector against a tsquery: does this document satisfy this query? This is the core of every full text search query in PostgreSQL. Behind the scenes, PostgreSQL uses an inverted index — a data structure that maps each unique word to the list of documents containing it — to answer this question without scanning every row.
The match operator is what makes PostgreSQL full text search fundamentally different from `LIKE` queries. Instead of scanning text data character by character, it looks up preprocessed documents through an index. The following example shows the match operator in its simplest form:
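A minimal example, converting both sides inline (in production you would match against an indexed tsvector column instead):

```sql
-- Stemming normalizes both sides: 'searching' in the query
-- matches 'searching' in the document via the shared lexeme 'search'
SELECT to_tsvector('english', 'PostgreSQL makes searching large text fast')
       @@ to_tsquery('english', 'searching');  -- returns true
```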
The GIN Index: Why Full-Text Search Is Fast
A GIN index (Generalized Inverted Index) is the core data structure that makes full text search performant. Instead of scanning documents to find words, it flips the relationship: it maps every unique word to the documents that contain it.
Think of it like the index at the back of a textbook. Instead of reading every page to find where "normalization" is mentioned, you look up "normalization" in the index and get a list of page numbers. A GIN index does the same thing for your database.
In a traditional table scan, searching for "PostgreSQL" means reading every row and checking if the text contains that word. With a GIN index, the database looks up "postgresql" once and immediately gets back the list of matching row IDs.
PostgreSQL implements inverted indexes through GIN indexes. When you create a GIN index on a tsvector column, the database builds this word-to-document mapping automatically and keeps it updated as data changes. Proper indexing is what separates PostgreSQL full text search from naive string matching.
This is what makes full text search scale. A `LIKE '%keyword%'` query gets slower as your data set grows. A GIN-indexed full text search stays fast because the lookup time depends on the number of matching documents, not the total number of rows — even on very large datasets.
Setting Up Full-Text Search
Let's build a working PostgreSQL full text search implementation step by step. The following example uses an articles table:
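A sketch of that setup, assuming title, content, author, and category columns:

```sql
CREATE TABLE articles (
    id            BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    title         TEXT NOT NULL,
    content       TEXT,
    author        TEXT,
    category      TEXT,
    created_at    TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    -- Computed on every insert/update; title outranks content outranks author
    search_vector TSVECTOR GENERATED ALWAYS AS (
        setweight(to_tsvector('english', COALESCE(title, '')),   'A') ||
        setweight(to_tsvector('english', COALESCE(content, '')), 'B') ||
        setweight(to_tsvector('english', COALESCE(author, '')),  'C')
    ) STORED
);

-- The inverted index that makes lookups fast
CREATE INDEX articles_search_idx ON articles USING GIN (search_vector);
```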
A few things happening here that matter:
**`GENERATED ALWAYS AS ... STORED`** — The generated column computes the search vector automatically whenever a row is inserted or updated. No application-side sync logic needed. The generated column approach means you store preprocessed documents once rather than computing them at query time.
**`setweight()`** — Assigns different weights to multiple fields. A title match ranks higher than a content match, which ranks higher than an author match. This is how you control relevance without application code.
**`COALESCE()`** — Handles null values gracefully. Without it, a NULL title would make the entire concatenated tsvector NULL, since concatenating NULL with anything yields NULL.
**`USING GIN`** — Creates the GIN index. Without proper indexing, full text search still works but falls back to sequential scans.
You can also populate a tsvector column from multiple columns of existing data without a generated column, using triggers or manual updates. The generated column approach is simpler for most cases.

Writing Full-Text Search Queries
PostgreSQL's text search functions give you three ways to construct full text search queries, from low-level to user-friendly:
`to_tsquery()` — Requires explicit boolean operators. Precise but not user-facing. The following example shows complex searches with AND, OR, NOT, and phrase search:
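For instance, against the articles table from the setup above:

```sql
-- AND, OR, NOT with explicit operators
SELECT title FROM articles
WHERE search_vector @@ to_tsquery('english', 'postgresql & (performance | tuning) & !mysql');

-- Phrase search: the words must appear adjacent and in order
SELECT title FROM articles
WHERE search_vector @@ to_tsquery('english', 'full <-> text <-> search');
```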
The phrase search operator `<->` ensures the entire phrase appears as written — not just the individual words scattered across the document. You can also match words at a distance: `<2>` means within two words of each other.
**`plainto_tsquery()`** — Converts plain text into query terms with implicit AND. Simpler but no phrase search or exclusions.
**`websearch_to_tsquery()`** — Accepts natural search syntax. This is what you expose to users. The following example shows how it handles the messy, unpredictable input real users type:
```sql
-- Simple text search (implicit AND)
SELECT title FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', 'postgresql database');

-- Quoted phrases for exact matches
SELECT title FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', '"machine learning"');

-- OR and exclusions
SELECT title FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', 'rust or python');

SELECT title FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', 'programming -database');
```
`websearch_to_tsquery()` is the right choice for any user-facing search box. It handles quoted phrases, exclusions, and OR logic without requiring special syntax from the user.

How to Rank Search Results
Matching is only half the problem. The other half is returning relevant results in the right order. PostgreSQL provides two text search functions for ranking:
`ts_rank()` scores based on how frequently the search terms appear in the document. More matches = higher score.
`ts_rank_cd()` (cover density) also considers how close the matching query terms are to each other. If a user searches "web development" and a document contains both words in the same sentence, it scores higher than a document where "web" appears in the title text and "development" in the last paragraph.
The following example shows how to rank search results using cover density:
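A sketch of a ranked query against the articles table from earlier:

```sql
SELECT title,
       ts_rank_cd(search_vector, query) AS rank
FROM articles,
     websearch_to_tsquery('english', 'web development') AS query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 10;
```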
Remember the different weights from setup? They factor into ranking automatically. A title match (weight A = 1.0) scores higher than a body match (weight B = 0.4). You can also customize weights at query time:
```sql
SELECT
  title,
  ts_rank(
    '{0.1, 0.2, 0.4, 1.0}', -- weights for D, C, B, A
    search_vector,
    websearch_to_tsquery('english', 'postgresql')
  ) AS weighted_rank
FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', 'postgresql')
ORDER BY weighted_rank DESC;
```
For large data sets, normalize by document length to prevent long documents from dominating short ones:
```sql
SELECT title,
       -- Normalization flag 1 divides the rank by 1 + log(document length)
       ts_rank(search_vector, query, 1) AS rank
FROM articles, websearch_to_tsquery('english', 'database') AS query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 20;
```

Search Result Highlighting
Good search functionality doesn't just find results — it shows the user why each result matched. `ts_headline()` generates highlighted snippets with matching search terms wrapped in custom tags:
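For example (the option string here is one reasonable starting point, not a requirement):

```sql
SELECT title,
       ts_headline('english', content,
           websearch_to_tsquery('english', 'inverted index'),
           'StartSel=<mark>, StopSel=</mark>, MinWords=15, MaxWords=35, MaxFragments=2'
       ) AS snippet
FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', 'inverted index');
```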
The options let you control snippet length (`MaxWords`, `MinWords`), highlight tags (`StartSel`, `StopSel`), and how many fragments to return (`MaxFragments`). For a search results page, this is what turns a wall of titles into something a user can actually scan and evaluate.

Fuzzy Search with Trigrams
Full text search handles stemming and word forms, but it doesn't handle typos. If a user searches for "postgrsql" (missing the 'e'), the match operator on tsvector data will return nothing — no matter how good your GIN index is.
This is where fuzzy search comes in. The `pg_trgm` extension enables fuzzy matching by breaking strings into three-character sequences (trigrams) and calculating similarity between them. After lowercasing and space-padding, "PostgreSQL" becomes `{"  p", " po", "pos", "ost", "stg", ...}`. Even with a typo, the trigram overlap between the misspelled and correct versions is high enough to find a match.
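A quick sketch of trigram matching in action:

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- similarity() returns a score between 0 and 1
SELECT similarity('postgrsql', 'postgresql');

-- The % operator is true when similarity exceeds the threshold
SELECT title FROM articles
WHERE title % 'postgrsql'
ORDER BY similarity(title, 'postgrsql') DESC;
```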
You can tune the sensitivity by adjusting the similarity threshold:
```sql
-- Default threshold is 0.3; lowering it allows fuzzier matches,
-- raising it demands closer ones
SET pg_trgm.similarity_threshold = 0.4;
```
Trigram indexes also accelerate `LIKE` and `ILIKE` queries with leading wildcards — the one pattern that normally forces sequential scans even on very large datasets:
```sql
-- Create a trigram index for fuzzy search
CREATE INDEX articles_title_trgm_idx ON articles USING GIN (title gin_trgm_ops);

-- Now this uses the index instead of a sequential scan
SELECT title FROM articles WHERE title ILIKE '%postgresql%';
```
**Best practice:** combine full text search for primary results with fuzzy search as a fallback. If the match operator returns nothing, fall back to trigram similarity. This gives users both precision and the typo tolerance they expect from modern search.

Custom Dictionaries and Multiple Languages
PostgreSQL full text search ships with text search configurations for multiple languages. Using the right one (e.g., `'english'` for English content) ensures proper stemming — converting word forms like "running," "runs," and "ran" to the same root form "run."
For multilingual content, use `'simple'` to skip stemming entirely — it's less precise but works across multiple languages without favoring one. You can also create custom dictionaries for domain-specific terminology, synonyms, or stop words that the built-in configurations don't handle. Custom dictionaries are particularly useful when your text data contains technical jargon or industry terms that standard stemming mishandles.
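As a sketch, a synonym dictionary can be layered into a copied configuration. This assumes a `my_synonyms.syn` file installed in the server's `tsearch_data` directory; the dictionary and configuration names are illustrative:

```sql
CREATE TEXT SEARCH DICTIONARY domain_syns (
    TEMPLATE = synonym,
    SYNONYMS = my_synonyms
);

CREATE TEXT SEARCH CONFIGURATION domain_english (COPY = english);

-- Apply synonyms before stemming for plain ASCII words
ALTER TEXT SEARCH CONFIGURATION domain_english
    ALTER MAPPING FOR asciiword WITH domain_syns, english_stem;
```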
Full-Text Search vs. Semantic Search
Full text search and semantic search solve different problems, and understanding when to use each — or both — is critical for search performance.
Full text search matches on words: stemmed, normalized, and indexed. It knows that "running" and "runs" are the same word. It handles boolean operators, phrase search, prefix matching, and field weighting. It's deterministic — the same query always returns the same results.
Semantic search matches on meaning: vector embeddings capture conceptual similarity. It knows that "cheap flights" and "affordable airfare" are related concepts even though they share no words. It's probabilistic — results depend on the embedding model's understanding.
PostgreSQL full text search is the right tool when users are searching for specific terms, product names, technical content, or any domain where exact matches and partial matches matter. Semantic search is the right tool when users ask natural-language questions or when the vocabulary gap between queries and documents is large.
The best implementations use both. A hybrid search approach combines full text scoring with vector similarity, giving you the precision of keyword matching and the recall of meaning-based search. PostgreSQL-compatible databases that support both tsvector and vector indexes can run hybrid queries in a single operation — no other search engines required.
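One possible shape for such a hybrid query, assuming the pgvector extension, an `embedding` column on articles, and a query embedding supplied as a parameter (the 0.6/0.4 blend is an arbitrary starting point to tune, not a recommendation):

```sql
SELECT title,
       0.6 * ts_rank_cd(search_vector, q)
     + 0.4 * (1 - (embedding <=> :query_embedding)) AS hybrid_score
FROM articles,
     websearch_to_tsquery('english', 'affordable airfare') AS q
WHERE search_vector @@ q
   OR embedding <=> :query_embedding < 0.5
ORDER BY hybrid_score DESC
LIMIT 20;
```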
Production Patterns
Here's a complete e-commerce product search pulling together everything above — ranking, highlighting, and fuzzy search. The following example shows a production-ready implementation:
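A sketch of what that can look like, assuming a products table set up with a weighted, generated search_vector like the articles table earlier:

```sql
WITH q AS (
    SELECT websearch_to_tsquery('english', 'wireless headphones') AS query
)
SELECT p.name,
       ts_rank_cd(p.search_vector, q.query) AS rank,
       ts_headline('english', p.description, q.query,
                   'StartSel=<mark>, StopSel=</mark>, MaxWords=25') AS snippet
FROM products p, q
WHERE p.search_vector @@ q.query
ORDER BY rank DESC
LIMIT 20;

-- If this returns zero rows, the application can rerun the search with
-- trigram similarity as the typo-tolerant fallback:
-- SELECT name FROM products WHERE name % 'wireles hedphones'
-- ORDER BY similarity(name, 'wireles hedphones') DESC LIMIT 20;
```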
For a content management system, add metadata filtering on top of full text ranking. The following query shows complex searches with category filters and date ranges:
```sql
SELECT
  title, author, category,
  ts_headline('english', content,
    websearch_to_tsquery('english', 'react typescript tutorial')
  ) AS excerpt
FROM blog_posts
WHERE search_vector @@ websearch_to_tsquery('english', 'react typescript tutorial')
  AND category = 'tutorials'
  AND created_at > NOW() - INTERVAL '6 months'
ORDER BY ts_rank_cd(search_vector,
  websearch_to_tsquery('english', 'react typescript tutorial')) DESC;
```

Search Performance Tuning
Full text search performance — query performance specifically — comes down to three things: proper indexing, query structure, and column design.
Always create GIN indexes on tsvector columns. Without a GIN index, every full text search query is a sequential scan. With proper indexing, lookups are near-instant even on very large datasets with millions of rows.
Use a generated column for complex tsvectors. If your search vector combines multiple fields with different weights, computing it at query time is expensive. A generated column computes it once on write and indexes the result. This dramatically improves search performance.
Limit your result sets. Full text search returns all matching documents by default. Always add `LIMIT` — most applications only need the top 10-50 relevant results. This reduces the amount of ranking computation the database has to do.
Use `ts_rank_cd()` over `ts_rank()` when proximity matters. Cover density ranking is slightly more expensive but produces better relevance ordering for multi-word search terms.
Tune for your text search configuration. Using the right configuration for your language ensures the text search feature handles word forms correctly. For multilingual content spanning multiple languages, use `'simple'` to skip stemming entirely.
Full-Text Search Is a Database Feature, Not a Separate System
The biggest architectural mistake teams make with full text search is the same one they make with semantic search: they assume they need a dedicated search engine.
Elasticsearch, Solr, Meilisearch — these are powerful search engines. But adopting other search engines means maintaining a separate cluster, building a sync pipeline to keep it updated, handling consistency gaps when your primary database and search engine disagree — leading to stale data that silently degrades search results — and managing two sets of access controls.
For most applications, PostgreSQL's built-in text search feature provides all the search capabilities you need. You get GIN indexes, relevance ranking, highlighting, fuzzy search, and language-aware stemming — all running inside the same database where your data already lives. No sync. No consistency issues. No extra infrastructure bill. The search functionality rivals dedicated search engines for the majority of use cases.
And when you need to combine full text search with vector search, analytical queries, and transactional workloads, a Context Lake handles all of it in a single PostgreSQL-compatible system. One query plan across structured filters, keyword matching, and semantic similarity. That's the architecture that scales without the duct tape.
Written by Boyd Stowe