Reference Entry
Analyzer and Tokenization Deep Dive
Analysis · indexing · order 20
Understand how tokenization and token filters change matching behavior, recall, and precision.
Analyzer and Tokenization Deep Dive
Search indexes do not store raw text as-is. They store analyzed terms. Querylight TS makes that analysis explicit so you can control how text is transformed during indexing and querying.
What an analyzer does
An analyzer turns input text into terms.
Conceptually:
- tokenize the text
- normalize or filter the tokens
- index the resulting terms
That choice affects:
- whether
Querylightmatchesquerylight - whether prefixes are easy to find
- whether typos can be recovered
- how large the index becomes
Keyword vs tokenized text
KeywordTokenizer treats the whole input as one token. That is useful for fields such as:
- tags
- section names
- ids
- exact categories
Free text fields usually need a normal analyzer so a sentence becomes multiple searchable terms.
Ngrams and edge ngrams
NgramTokenFilter helps with fuzzy recovery because it breaks text into overlapping slices.
import { Analyzer, NgramTokenFilter, TextFieldIndex } from "@tryformation/querylight-ts";
const fuzzyAnalyzer = new Analyzer(undefined, undefined, [new NgramTokenFilter(3)]);
const field = new TextFieldIndex(fuzzyAnalyzer, fuzzyAnalyzer);
This improves recall for misspellings, but it also:
- increases index size
- can introduce noisier matches
EdgeNgramsTokenFilter is different. It keeps prefixes from the start of a token and is therefore useful for autocomplete:
import { Analyzer, EdgeNgramsTokenFilter, TextFieldIndex } from "@tryformation/querylight-ts";
const suggestAnalyzer = new Analyzer(undefined, undefined, [new EdgeNgramsTokenFilter(2, 6)]);
const suggestField = new TextFieldIndex(suggestAnalyzer, suggestAnalyzer);
Analysis is a relevance decision
Analyzer choice is not just a technical detail. It changes what counts as “similar enough” to match.
Examples:
- keyword-style analysis favors precision
- broader tokenization favors recall
- ngrams can recover typos
- edge ngrams can make prefix suggestions feel fast
Field-by-field analysis works best
Different fields usually need different treatment.
title: normal text analysisbody: normal text analysistags: keyword-like analysissuggest: edge ngrams- typo-recovery field: ngrams
That is usually better than applying one global strategy to everything.
A practical mental model
Ask three questions for every field:
- Should this field behave like free text or exact metadata?
- Do I need typo recovery here?
- Do I need prefix suggestions here?
The answers usually tell you which analyzer shape you need.
Tradeoffs to watch
- More aggressive analysis improves recall but may lower precision.
- Ngram-heavy fields cost more memory.
- Very broad analysis can make short queries noisy.
Start with simple field-specific analyzers, then expand only when actual queries show gaps.