Understand how tokenization and token filters change matching behavior, recall, and precision.

Analyzer and Tokenization Deep Dive

Search indexes do not store raw text as-is. They store analyzed terms. Querylight TS makes that analysis explicit so you can control how text is transformed during indexing and querying.

What an analyzer does

An analyzer turns input text into terms.

Conceptually:

tokenize the text
normalize or filter the tokens
index the resulting terms
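
The three steps above can be sketched in plain TypeScript. This is a minimal, library-independent illustration, not how Querylight TS is implemented: `tokenize`, `lowercase`, `analyze`, `indexDocument`, and `search` are hypothetical names chosen for this example, and the tokenizer only handles ASCII letters and digits.

```typescript
// Library-independent sketch; every name here is hypothetical.
type TokenFilter = (tokens: string[]) => string[];

// Step 1: tokenize. Split on anything that is not an ASCII letter or digit.
function tokenize(text: string): string[] {
  return text.split(/[^A-Za-z0-9]+/).filter((token) => token.length > 0);
}

// Step 2: normalize or filter. Here the only filter lowercases each token.
const lowercase: TokenFilter = (tokens) => tokens.map((token) => token.toLowerCase());

function analyze(text: string, filters: TokenFilter[] = [lowercase]): string[] {
  let tokens = tokenize(text);
  for (const filter of filters) {
    tokens = filter(tokens);
  }
  return tokens;
}

// Step 3: index. Map each term to the ids of the documents that contain it.
const termIndex = new Map<string, Set<string>>();

function indexDocument(id: string, text: string): void {
  for (const term of analyze(text)) {
    let postings = termIndex.get(term);
    if (postings === undefined) {
      postings = new Set<string>();
      termIndex.set(term, postings);
    }
    postings.add(id);
  }
}

// Queries go through the same analysis, so "QUERYLIGHT" finds "Querylight".
function search(query: string): string[] {
  const ids = new Set<string>();
  for (const term of analyze(query)) {
    const postings = termIndex.get(term);
    if (postings !== undefined) {
      postings.forEach((id) => ids.add(id));
    }
  }
  return Array.from(ids);
}

indexDocument("doc-1", "Querylight TS makes analysis explicit.");
console.log(analyze("Querylight TS")); // terms: "querylight", "ts"
console.log(search("QUERYLIGHT"));     // ids: "doc-1"
```

Because the query passes through the same `analyze` function as the indexed text, case differences disappear before any lookup happens.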

The choice of analyzer affects:

whether a query for querylight matches indexed text containing Querylight
whether prefixes are easy to find
whether typos can be recovered
how large the index becomes

Keyword vs tokenized text

KeywordTokenizer treats the whole input as one token. That is useful for fields such as:

tags
section names
ids
exact categories

Free text fields usually need a normal analyzer so a sentence becomes multiple searchable terms.
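
To make the difference concrete, here is a small sketch that compares the two behaviors on the same value. `keywordTokenize` and `wordTokenize` are hypothetical stand-ins written for this example, not the library's tokenizers.

```typescript
// Hypothetical stand-ins that only illustrate the two behaviors.
function keywordTokenize(text: string): string[] {
  // The whole input is a single token, so only the exact value matches.
  return [text];
}

function wordTokenize(text: string): string[] {
  // Lowercase, then split on anything that is not an ASCII letter or digit.
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((token) => token.length > 0);
}

const tag = "getting started";
const keywordTerms = keywordTokenize(tag); // one term: "getting started"
const wordTerms = wordTokenize(tag);       // two terms: "getting", "started"

// A lookup for "started" hits the tokenized terms but not the keyword term.
const matchesKeyword = keywordTerms.indexOf("started") !== -1; // false
const matchesWords = wordTerms.indexOf("started") !== -1;      // true

console.log({ keywordTerms, wordTerms, matchesKeyword, matchesWords });
```

The keyword form only answers the question "is the value exactly this?", which is usually what you want for tags and ids.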

Ngrams and edge ngrams

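NgramTokenFilter helps with fuzzy recovery because it breaks text into overlapping slices. A misspelled word still shares most of its slices with the correctly spelled one, so the two can match.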

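import { Analyzer, NgramTokenFilter, TextFieldIndex } from "@tryformation/querylight-ts";

// Leave the first two constructor arguments at their defaults and add an ngram token filter.
const fuzzyAnalyzer = new Analyzer(undefined, undefined, [new NgramTokenFilter(3)]);
// Pass the same analyzer twice so the field uses it for both indexing and querying.
const field = new TextFieldIndex(fuzzyAnalyzer, fuzzyAnalyzer);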

This improves recall for misspellings, but it also:

increases index size
can introduce noisier matches
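
The following sketch shows why overlapping slices recover typos and why they enlarge the index. The `ngrams` and `sharedGrams` helpers are hypothetical and written for this example. In particular, returning a short token unchanged is a choice made here, not a statement about how NgramTokenFilter treats short tokens.

```typescript
// Hypothetical helper: overlapping slices of a fixed size.
function ngrams(token: string, size: number): string[] {
  if (token.length <= size) {
    return [token]; // sketch-only choice for short tokens
  }
  const grams: string[] = [];
  for (let start = 0; start + size <= token.length; start++) {
    grams.push(token.slice(start, start + size));
  }
  return grams;
}

// Slices of `a` that also occur among the slices of `b`.
function sharedGrams(a: string, b: string, size: number): string[] {
  const other = ngrams(b, size);
  return ngrams(a, size).filter((gram) => other.indexOf(gram) !== -1);
}

const correct = ngrams("analyzer", 3); // ana, nal, aly, lyz, yze, zer
const typo = ngrams("analizer", 3);    // ana, nal, ali, liz, ize, zer
const shared = sharedGrams("analyzer", "analizer", 3); // ana, nal, zer

console.log(correct.length, typo.length, shared);
```

The misspelling still shares three of six trigrams with the correct word, which is enough for a match. The cost is visible too: one eight-letter token became six indexed terms.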

EdgeNgramsTokenFilter is different. It keeps prefixes from the start of a token and is therefore useful for autocomplete:

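import { Analyzer, EdgeNgramsTokenFilter, TextFieldIndex } from "@tryformation/querylight-ts";

// Leave the first two constructor arguments at their defaults and add an edge ngram token filter.
const suggestAnalyzer = new Analyzer(undefined, undefined, [new EdgeNgramsTokenFilter(2, 6)]);
// Pass the same analyzer twice so the field uses it for both indexing and querying.
const suggestField = new TextFieldIndex(suggestAnalyzer, suggestAnalyzer);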
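
As a rough illustration of why prefixes help autocomplete, the sketch below builds a tiny prefix lookup by hand. It assumes the two numbers mean minimum and maximum prefix length, which is an interpretation rather than something stated above. `edgeNgrams`, `suggestIndex`, and `suggest` are hypothetical names, and the lookup treats the typed text as a single term instead of analyzing it.

```typescript
// Hypothetical helper: prefixes of a token between minLength and maxLength.
function edgeNgrams(token: string, minLength: number, maxLength: number): string[] {
  const prefixes: string[] = [];
  const limit = Math.min(maxLength, token.length);
  for (let length = minLength; length <= limit; length++) {
    prefixes.push(token.slice(0, length));
  }
  return prefixes;
}

// Index every prefix of every title: prefix -> titles that start with it.
const titles = ["querylight", "query", "queue", "tokenizer"];
const suggestIndex = new Map<string, string[]>();
for (const title of titles) {
  for (const prefix of edgeNgrams(title, 2, 6)) {
    const bucket = suggestIndex.get(prefix) ?? [];
    bucket.push(title);
    suggestIndex.set(prefix, bucket);
  }
}

// A suggestion is a single exact lookup of what the user has typed so far.
function suggest(typed: string): string[] {
  return suggestIndex.get(typed.toLowerCase()) ?? [];
}

console.log(edgeNgrams("querylight", 2, 6)); // qu, que, quer, query, queryl
console.log(suggest("que"));  // querylight, query, queue
console.log(suggest("quer")); // querylight, query
console.log(suggest("q"));    // nothing: shorter than the minimum length
```

Each suggestion is one exact term lookup, which is where the speed comes from. The length limits matter in this sketch: input shorter than the minimum or longer than the maximum finds nothing.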

Analysis is a relevance decision

Analyzer choice is not just a technical detail. It changes what counts as “similar enough” to match.

Examples:

keyword-style analysis favors precision
broader tokenization favors recall
ngrams can recover typos
edge ngrams can make prefix suggestions feel fast

Field-by-field analysis works best

Different fields usually need different treatment.

title: normal text analysis
body: normal text analysis
tags: keyword-like analysis
suggest: edge ngrams
typo-recovery field: ngrams

That is usually better than applying one global strategy to everything.
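
One way to picture a field-by-field setup is a table from field name to analysis function. The sketch below is plain TypeScript with hypothetical names (`fieldAnalyzers`, `analyzeField`, and the helper functions). It mirrors the list above but does not use the Querylight TS API, and the keyword function keeping the original casing is a choice made for this example.

```typescript
// Hypothetical per-field analysis table; not the Querylight TS API.
type Analyze = (text: string) => string[];

const words: Analyze = (text) =>
  text.toLowerCase().split(/[^a-z0-9]+/).filter((token) => token.length > 0);

// Keyword-like: the whole trimmed value is one term, casing kept.
const keyword: Analyze = (text) => [text.trim()];

// Build an analyzer that expands every word with a per-token function.
function expandWords(perToken: (token: string) => string[]): Analyze {
  return (text) => {
    const terms: string[] = [];
    for (const token of words(text)) {
      terms.push(...perToken(token));
    }
    return terms;
  };
}

const prefixes = (token: string): string[] => {
  const out: string[] = [];
  for (let length = 2; length <= Math.min(6, token.length); length++) {
    out.push(token.slice(0, length));
  }
  return out;
};

const trigrams = (token: string): string[] => {
  if (token.length <= 3) return [token];
  const out: string[] = [];
  for (let start = 0; start + 3 <= token.length; start++) {
    out.push(token.slice(start, start + 3));
  }
  return out;
};

const fieldAnalyzers: Record<string, Analyze> = {
  title: words,
  body: words,
  tags: keyword,
  suggest: expandWords(prefixes),
  typo: expandWords(trigrams),
};

function analyzeField(field: string, text: string): string[] {
  const analyzeText = fieldAnalyzers[field];
  if (analyzeText === undefined) {
    throw new Error(`no analyzer configured for field ${field}`);
  }
  return analyzeText(text);
}

console.log(analyzeField("title", "Token Filters")); // token, filters
console.log(analyzeField("tags", "Token Filters"));  // Token Filters
console.log(analyzeField("suggest", "Token"));       // to, tok, toke, token
console.log(analyzeField("typo", "Token"));          // tok, oke, ken
```

The same input produces very different terms depending on the field, which is exactly what lets each field serve one purpose well.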

A practical mental model

Ask three questions for every field:

Should this field behave like free text or exact metadata?
Do I need typo recovery here?
Do I need prefix suggestions here?

The answers usually tell you which analyzer shape you need.
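
The three questions can be written down as a small decision helper. This is only a sketch of the reasoning: `FieldNeeds`, `AnalyzerShape`, and `analyzerShapes` are hypothetical names, and the shape labels are informal descriptions, not library types.

```typescript
// Hypothetical encoding of the three questions.
interface FieldNeeds {
  exactMetadata: boolean;     // question 1: exact metadata rather than free text?
  typoRecovery: boolean;      // question 2: typo recovery needed?
  prefixSuggestions: boolean; // question 3: prefix suggestions needed?
}

type AnalyzerShape = "keyword" | "text" | "ngram" | "edge-ngram";

function analyzerShapes(needs: FieldNeeds): AnalyzerShape[] {
  // The first answer picks the base shape; the other two add companions.
  const shapes: AnalyzerShape[] = [needs.exactMetadata ? "keyword" : "text"];
  if (needs.typoRecovery) {
    shapes.push("ngram");
  }
  if (needs.prefixSuggestions) {
    shapes.push("edge-ngram");
  }
  return shapes;
}

const tagShapes = analyzerShapes({
  exactMetadata: true,
  typoRecovery: false,
  prefixSuggestions: false,
}); // keyword

const titleShapes = analyzerShapes({
  exactMetadata: false,
  typoRecovery: true,
  prefixSuggestions: true,
}); // text, ngram, edge-ngram

console.log(tagShapes, titleShapes);
```

When the helper returns more than one shape, each extra shape would typically become its own field rather than being stacked onto a single analyzer.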

Tradeoffs to watch

More aggressive analysis improves recall but may lower precision.
Ngram-heavy fields cost more memory.
Very broad analysis can make short queries noisy.
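
Two of these tradeoffs are easy to see with a few lines of code. The sketch below uses a hypothetical `trigrams` helper and made-up sample words. It is not a measurement of Querylight TS itself, only an illustration of how quickly terms multiply and how a short query picks up unrelated words.

```typescript
// Hypothetical helper: overlapping three-character slices of a token.
function trigrams(token: string): string[] {
  if (token.length <= 3) return [token];
  const grams: string[] = [];
  for (let start = 0; start + 3 <= token.length; start++) {
    grams.push(token.slice(start, start + 3));
  }
  return grams;
}

// Memory: count emitted terms with and without trigram expansion.
const sentence = "analysis changes matching behavior";
const plainTerms = sentence.split(" ");
let trigramTermCount = 0;
for (const word of plainTerms) {
  trigramTermCount += trigrams(word).length;
}
console.log(plainTerms.length, trigramTermCount); // 4 versus 23

// Noise: a short query shares a trigram with unrelated words.
function sharesTrigram(query: string, word: string): boolean {
  const wordGrams = trigrams(word);
  return trigrams(query).some((gram) => wordGrams.indexOf(gram) !== -1);
}

const vocabulary = ["category", "concatenate", "education", "dog"];
const noisyMatches = vocabulary.filter((word) => sharesTrigram("cat", word));
console.log(noisyMatches); // category, concatenate, education
```

Four words turn into 23 trigram terms, and the query cat matches education only because the letters happen to appear in the middle of the word.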

Start with simple field-specific analyzers, then expand only when actual queries show gaps.
