Data Engineering

Full-Text Search in PostgreSQL: The Complete Guide to Building Search That Actually Works

Full-text search in PostgreSQL goes far beyond LIKE queries. Learn how tsvector, tsquery, trigram matching, inverted indexes, and relevance ranking work — with practical examples for e-commerce, content search, and production applications.

Boyd Stowe
Solutions Engineering
16 min read
Diagram showing a search query being tokenized, matched against an inverted index, and returning ranked results

Your users don't search the way your database thinks. They type "reccommended headphones noise canceling" — misspelled, no quotes, no Boolean operators — and expect the right products to show up instantly.

If you're still using `LIKE '%headphones%'` or basic substring matching, you're already losing. Those approaches can't handle typos, synonyms, word variants, or relevance ranking. They scan every row. They don't understand that "running" and "runs" are the same concept.

Full-text search in PostgreSQL solves all of this. It tokenizes documents, stems words to their roots, builds inverted indexes for fast lookups, ranks results by relevance, and even handles fuzzy matching for typos — all inside your database, without bolting on a separate search engine.

This guide covers everything from the fundamentals of how full-text search works to production-ready implementations with ranking, highlighting, and performance tuning.

Why LIKE Queries Break Down

Before diving into full-text search, it's worth understanding why basic approaches fail. Consider a products table with a million rows:

`LIKE '%wireless headphones%'` has three problems:

No index usage. A leading wildcard forces a sequential scan — the database reads every single row. On a million-row table, that's the difference between 2ms and 2 seconds.

No linguistic awareness. Searching for "running shoes" won't find "run shoe" or "runner's sneakers." The query is matching character sequences, not concepts.

No ranking. Even when results match, they come back in arbitrary order. A product with "wireless headphones" in the title is equally weighted with one that mentions it once in a 500-word description.

Full-text search addresses all three. It uses specialized data structures for fast lookups, understands language through stemming and normalization, and scores results by relevance. Let's see how.

How Full-Text Search Works

Full-text search in PostgreSQL works by converting raw text into a structured, searchable format. The process has two sides: document processing (what you're searching through) and query processing (what the user typed).

On the document side, PostgreSQL converts text into a `tsvector` — a sorted list of normalized words (called lexemes) with their positions in the original text. The word "running" becomes "run". The word "databases" becomes "databas". Stop words like "the", "is", and "a" are dropped entirely.

On the query side, the user's search terms are converted into a `tsquery` — a structured representation of what to search for, with support for AND, OR, NOT, and phrase proximity operators.

The match operator `@@` compares the two: does this document's tsvector satisfy this tsquery? Behind the scenes, PostgreSQL uses an inverted index — a data structure that maps each unique word to the list of documents containing it — to answer this question without scanning every row.
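As a rough mental model (a sketch, not PostgreSQL's actual implementation), the two sides can be written in a few lines of Python; this stand-in skips real stemming and uses a tiny stop-word list:

```python
# Toy model of document and query processing; illustrative only --
# real PostgreSQL stemming and stop-word handling are far richer.
STOP_WORDS = {"the", "is", "a", "an", "and"}

def to_lexemes(text: str) -> dict[str, list[int]]:
    """Stand-in for to_tsvector: lowercase, drop stop words,
    record each remaining word's positions."""
    lexemes: dict[str, list[int]] = {}
    for pos, word in enumerate(text.lower().split(), start=1):
        if word not in STOP_WORDS:
            lexemes.setdefault(word, []).append(pos)
    return lexemes

def matches(doc_vector: dict[str, list[int]], query_terms: list[str]) -> bool:
    """Stand-in for the @@ operator with implicit AND."""
    return all(term in doc_vector for term in query_terms)

vector = to_lexemes("The database is fast")
print(vector)                                 # {'database': [2], 'fast': [4]}
print(matches(vector, ["database", "fast"]))  # True
print(matches(vector, ["database", "slow"]))  # False
```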

The Inverted Index: Why Full-Text Search Is Fast

An inverted index is the core data structure that makes full-text search performant. Instead of scanning documents to find words, it flips the relationship: it maps every unique word to the documents that contain it.

Think of it like the index at the back of a textbook. Instead of reading every page to find where "normalization" is mentioned, you look up "normalization" in the index and get a list of page numbers. An inverted index does the same thing for your database.

In a traditional table scan, searching for "PostgreSQL" means reading every row and checking if the text contains that word. With an inverted index, the database looks up "postgresql" once and immediately gets back the list of matching row IDs.
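The idea reduces to a single mapping. Here is a minimal Python sketch (the `documents` data is made up for illustration, and this is the concept behind a GIN index, not its on-disk layout):

```python
from collections import defaultdict

# Minimal inverted index: word -> set of document IDs.
documents = {
    1: "postgresql full text search",
    2: "mysql replication guide",
    3: "tuning postgresql performance",
}

index: dict[str, set[int]] = defaultdict(set)
for doc_id, text in documents.items():
    for word in text.split():
        index[word].add(doc_id)

# Lookup is a single dictionary access -- no scan over all documents.
print(sorted(index["postgresql"]))  # [1, 3]
```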

PostgreSQL implements inverted indexes through GIN indexes (Generalized Inverted Indexes). When you create a GIN index on a `tsvector` column, the database builds this word-to-document mapping automatically and keeps it updated as data changes.

This is what makes full-text search scale. A `LIKE '%keyword%'` query gets slower as your table grows. A GIN-indexed full-text search stays fast because the lookup time depends on the number of matching documents, not the total number of rows.

Setting Up Full-Text Search

Let's build a working full-text search implementation step by step. We'll use an articles table as the example:

```sql
-- Create the table
CREATE TABLE articles (
    id SERIAL PRIMARY KEY,
    title TEXT,
    content TEXT,
    author TEXT,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Add a computed tsvector column with weighted fields
ALTER TABLE articles ADD COLUMN search_vector TSVECTOR
GENERATED ALWAYS AS (
    setweight(to_tsvector('english', COALESCE(title, '')), 'A') ||
    setweight(to_tsvector('english', COALESCE(content, '')), 'B') ||
    setweight(to_tsvector('english', COALESCE(author, '')), 'C')
) STORED;

-- Create the inverted index
CREATE INDEX articles_search_idx ON articles USING GIN(search_vector);
```
A few things happening here that matter:
**`GENERATED ALWAYS AS ... STORED`** — The search vector is computed automatically whenever a row is inserted or updated. No application-side sync logic needed.
**`setweight()`** — Assigns importance levels (A through D) to different fields. A title match ranks higher than a content match, which ranks higher than an author match. This is how you control relevance without application code.
**`COALESCE()`** — Handles NULLs gracefully. Without it, a NULL title would make the entire tsvector NULL.
**`USING GIN`** — Creates the inverted index. Without this, full-text search still works but falls back to sequential scans.

Writing Full-Text Search Queries

PostgreSQL gives you three ways to construct search queries, from low-level to user-friendly:

`to_tsquery()` — Requires explicit Boolean operators. Precise but not user-facing.

```sql
-- AND: both words must appear
SELECT title FROM articles
WHERE search_vector @@ to_tsquery('english', 'database & performance');

-- OR: either word matches
SELECT title FROM articles
WHERE search_vector @@ to_tsquery('english', 'rust | python');

-- NOT: exclude results with a term
SELECT title FROM articles
WHERE search_vector @@ to_tsquery('english', 'programming & !database');

-- Phrase: words must appear adjacent and in order
SELECT title FROM articles
WHERE search_vector @@ to_tsquery('english', 'web <-> development');
```
**`websearch_to_tsquery()`** — Accepts natural search syntax. This is what you expose to users.
```sql
-- Simple text search (implicit AND)
SELECT title FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', 'postgresql database');

-- Quoted phrases
SELECT title FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', '"machine learning"');

-- OR and exclusions
SELECT title FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', 'rust or python');

SELECT title FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', 'programming -database');
```
`websearch_to_tsquery()` is the right choice for any user-facing search box. It handles the messy, unpredictable input real users type — no special syntax required.

Ranking Results by Relevance

Matching is only half the problem. The other half is returning results in the right order. PostgreSQL provides two ranking functions:

`ts_rank()` scores based on how frequently the search terms appear in the document. More matches = higher score.

`ts_rank_cd()` (cover density) also considers how close the matching terms are to each other. If a user searches "web development" and a document contains both words in the same sentence, it scores higher than a document where "web" appears in the title and "development" in the last paragraph.

```sql
SELECT
    title,
    ts_rank_cd(search_vector, query) AS score
FROM articles, websearch_to_tsquery('english', 'postgresql performance') AS query
WHERE search_vector @@ query
ORDER BY score DESC;
```
Remember the weights from setup? They factor into ranking automatically. A title match (weight A = 1.0) scores higher than a body match (weight B = 0.4). You can also customize weights at query time:
```sql
SELECT
    title,
    ts_rank(
        '{0.1, 0.2, 0.4, 1.0}',  -- weights for D, C, B, A
        search_vector,
        websearch_to_tsquery('english', 'postgresql')
    ) AS weighted_rank
FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', 'postgresql')
ORDER BY weighted_rank DESC;
```
Ranking scores are unbounded by default. `ts_rank()` accepts a normalization bitmask as an optional final argument: flags 1 and 2 penalize longer documents (dividing by the log of the document length and by the length, respectively, so long documents don't drown out short ones), and flag 32 rescales the score into the 0-1 range:
```sql
-- Rescale scores into the 0-1 range
SELECT title,
    ts_rank(search_vector, query, 32) AS score  -- 32 = rank / (rank + 1)
FROM articles, websearch_to_tsquery('english', 'database') AS query
WHERE search_vector @@ query
ORDER BY score DESC
LIMIT 20;
```
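The effect of the rank/(rank+1) rescaling is easy to verify with a quick calculation: whatever the raw rank, the result stays below 1, and larger ranks approach it asymptotically.

```python
def normalize(rank: float) -> float:
    """The rank / (rank + 1) rescaling applied by ts_rank's flag 32:
    maps any non-negative rank into the range [0, 1)."""
    return rank / (rank + 1)

print(normalize(0.5))    # 0.3333...
print(normalize(50.0))   # ~0.98, bounded no matter how large the raw rank
```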

Search Result Highlighting

Good search doesn't just find results — it shows the user why each result matched. `ts_headline()` generates highlighted snippets with matching terms wrapped in custom tags:

```sql
SELECT
    title,
    ts_headline(
        'english',
        content,
        websearch_to_tsquery('english', 'postgresql database'),
        'StartSel=<mark>, StopSel=</mark>, MaxWords=50, MinWords=15'
    ) AS snippet
FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', 'postgresql database');
```
The options let you control snippet length (`MaxWords`, `MinWords`), highlight tags (`StartSel`, `StopSel`), and how many fragments to return (`MaxFragments`). For a search results page, this is what turns a wall of titles into something a user can actually scan and evaluate.

Fuzzy Matching with Trigrams

Full-text search handles stemming and word variants, but it doesn't handle typos. If a user searches for "postgrsql" (missing the 'e'), `tsvector` matching will return nothing.

This is where trigram matching comes in. The `pg_trgm` extension breaks strings into three-character sequences and calculates similarity between them. "PostgreSQL" is lowercased and becomes `{"  p", " po", "pos", "ost", "stg", ...}`. Even with a typo, the trigram overlap between the misspelled and correct versions is high enough to find a match.
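The mechanism is simple enough to sketch in Python. This mirrors pg_trgm's approach of lowercasing, padding each word, and comparing trigram sets, though the extension's exact padding details may differ slightly:

```python
def trigrams(s: str) -> set[str]:
    """Extract trigrams roughly the way pg_trgm does: lowercase,
    pad each word with two leading spaces and one trailing space."""
    grams: set[str] = set()
    for word in s.lower().split():
        padded = "  " + word + " "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams

def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by total distinct trigrams."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)

# The typo still shares most of its trigrams with the correct spelling.
print(round(similarity("postgresql", "postgrsql"), 2))  # 0.62
```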

```sql
-- Enable the extension
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Calculate similarity (0.0 to 1.0); trigram matching is
-- case-insensitive, and a typo still scores well
SELECT similarity('postgresql', 'postgrsql');

-- Find similar strings using the % operator
SELECT email, similarity(email, 'alice@exampl.com') AS score
FROM users
WHERE email % 'alice@exampl.com'
ORDER BY score DESC;
```
You can tune the sensitivity by adjusting the similarity threshold:
```sql
-- Default is 0.3; raising it makes % matching stricter,
-- lowering it admits fuzzier matches
SET pg_trgm.similarity_threshold = 0.4;
```
Trigram indexes also accelerate `LIKE` and `ILIKE` queries with leading wildcards — the one pattern that normally forces sequential scans:
```sql
-- Create a trigram index
CREATE INDEX users_email_trgm_idx ON users USING GIN(email gin_trgm_ops);

-- Now this uses the index instead of a sequential scan
SELECT email FROM users WHERE email ILIKE '%company%';
```
**Best practice:** combine full-text search for primary results with trigram matching as a fallback. If `tsvector` matching returns nothing, fall back to trigram similarity. This gives users both precision and typo tolerance.
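In application code, the fallback amounts to a few lines. Here is a sketch with hypothetical in-memory stand-ins for the two SQL queries (a real version would run the full-text and trigram statements shown earlier):

```python
# Sketch of the precision-then-fuzziness fallback pattern.
def search_with_fallback(query, full_text_hits, trigram_hits):
    """Return precise full-text results when available; fall back to
    fuzzy trigram results only when the precise search is empty."""
    results = full_text_hits(query)
    return results if results else trigram_hits(query)

# Hypothetical stand-in result sets, keyed by query string.
exact = {"headphones": ["Acme X1 Headphones"]}
fuzzy = {"headphnes": ["Acme X1 Headphones"]}  # tolerates the typo

primary = lambda q: exact.get(q, [])
fallback = lambda q: fuzzy.get(q, [])

print(search_with_fallback("headphones", primary, fallback))  # exact hit
print(search_with_fallback("headphnes", primary, fallback))   # fuzzy fallback
```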

Full-Text Search vs. Semantic Search

Full-text search and semantic search solve different problems, and understanding when to use each — or both — is critical.

Full-text search matches on words: stemmed, normalized, and indexed. It knows that "running" and "runs" are the same root word. It handles Boolean logic, phrase proximity, and field weighting. It's deterministic — the same query always returns the same results.

Semantic search matches on meaning: vector embeddings capture conceptual similarity. It knows that "cheap flights" and "affordable airfare" are related concepts even though they share no words. It's probabilistic — results depend on the embedding model's understanding.

Full-text search is the right tool when users are searching for specific terms, product names, technical content, or any domain where exact word matching matters. Semantic search is the right tool when users ask natural-language questions or when the vocabulary gap between queries and documents is large.

The best implementations use both. A hybrid search approach combines full-text scoring with vector similarity, giving you the precision of keyword matching and the recall of meaning-based search. PostgreSQL-compatible databases that support both `tsvector` and vector indexes can run hybrid queries in a single operation — no separate systems required.
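One common fusion strategy is a weighted sum of the two scores; this sketch uses made-up scores and an illustrative weight (reciprocal rank fusion is a popular alternative):

```python
# Toy hybrid ranking: blend a keyword-relevance score with a
# vector-similarity score. All numbers here are illustrative.
keyword_scores = {"doc1": 0.9, "doc2": 0.1, "doc3": 0.5}
vector_scores  = {"doc1": 0.2, "doc2": 0.8, "doc3": 0.6}

alpha = 0.6  # weight on keyword relevance vs. semantic similarity
hybrid = {
    doc: alpha * keyword_scores[doc] + (1 - alpha) * vector_scores[doc]
    for doc in keyword_scores
}
ranked = sorted(hybrid, key=hybrid.get, reverse=True)
print(ranked)  # ['doc1', 'doc3', 'doc2']
```

Tuning `alpha` shifts the balance: closer to 1.0 favors exact keyword matches, closer to 0.0 favors conceptual similarity.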

Production Patterns

Here's a complete e-commerce product search pulling together everything above — ranking, highlighting, and fuzzy matching:

```sql
-- Full-featured product search
WITH search AS (
    SELECT websearch_to_tsquery('english', 'wireless headphones') AS query
),
results AS (
    SELECT
        p.id,
        p.name,
        p.brand,
        p.price,
        ts_rank_cd(
            setweight(to_tsvector('english', p.name), 'A') ||
            setweight(to_tsvector('english', p.brand), 'A') ||
            setweight(to_tsvector('english', p.description), 'B'),
            s.query
        ) AS score,
        ts_headline(
            'english', p.description, s.query,
            'StartSel=<b>, StopSel=</b>, MaxWords=35, MinWords=15'
        ) AS snippet
    FROM products p, search s
    WHERE (
        setweight(to_tsvector('english', p.name), 'A') ||
        setweight(to_tsvector('english', p.brand), 'A') ||
        setweight(to_tsvector('english', p.description), 'B')
    ) @@ s.query
)
SELECT name, brand, price, ROUND(score::numeric, 3) AS relevance, snippet
FROM results
ORDER BY score DESC
LIMIT 20;
```
For a content management system, add metadata filtering on top of full-text ranking:
```sql
-- Blog search with category filter and date range
SELECT
    title, author, category,
    ts_headline('english', content,
        websearch_to_tsquery('english', 'react typescript tutorial')
    ) AS excerpt
FROM blog_posts
WHERE search_vector @@ websearch_to_tsquery('english', 'react typescript tutorial')
    AND category = 'tutorials'
    AND created_at > NOW() - INTERVAL '6 months'
ORDER BY ts_rank_cd(search_vector,
    websearch_to_tsquery('english', 'react typescript tutorial')) DESC;
```

Performance Tuning

Full-text search performance comes down to three things: indexes, query structure, and column design.

Always create GIN indexes on tsvector columns. Without an index, every full-text query is a sequential scan. With a GIN index, lookups are near-instant even on tables with millions of rows.

Use stored generated columns for complex tsvectors. If your search vector combines multiple fields with weights, computing it at query time is expensive. A stored generated column computes it once on write and indexes the result.

Limit your result sets. Full-text search returns all matching rows by default. Always add `LIMIT` — most applications only need the top 10-50 results. This reduces the amount of ranking computation the database has to do.

Use `ts_rank_cd()` over `ts_rank()` when proximity matters. Cover density ranking is slightly more expensive but produces better relevance ordering for multi-word queries.

Tune for your text search configuration. PostgreSQL ships with configurations for multiple languages. Using the right one (e.g., `'english'` for English content) ensures proper stemming. For multilingual content, use `'simple'` to skip stemming entirely — it's less precise but works across languages.

Tags: Full-Text Search, PostgreSQL, Text Search, Inverted Index, Search Ranking, Trigram, Database Indexing

Written by Boyd Stowe

Building the infrastructure layer for AI-native applications. We write about Decision Coherence, Tacnode Context Lake, and the future of data systems.


Ready to see Tacnode Context Lake in action?

Book a demo and discover how Tacnode can power your AI-native applications.
