Reference Entry
Relevance Tuning with BM25, TF-IDF, and RRF
Ranking · advanced · order 30
Choose ranking strategies deliberately and combine lexical, field, feature, and scripted signals without losing control.
Good search is not just about matching documents. It is about ordering them well.
Querylight TS gives you three useful tools for that:
- TF-IDF for classic term weighting
- BM25 for stronger length normalization and more modern lexical ranking
- reciprocal rank fusion for combining different ranked lists
On top of that, you can now tune within a ranked result set using:
- DisMaxQuery for best-field scoring
- BoostingQuery for soft demotion
- RankFeatureQuery for numeric feature influence
- DistanceFeatureQuery for recency or closeness boosts
- ScriptScoreQuery when you need custom JS scoring logic
When to prefer BM25
BM25 is usually the better default for full-text search over titles, summaries, and bodies.
It tends to behave better when:
- document lengths vary a lot
- query terms repeat unevenly
- you want more Lucene-like lexical scoring
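The length-normalization difference can be made concrete with the textbook BM25 term-scoring formula. This is an illustrative sketch of the standard formula, not Querylight TS internals; the parameters k1 and b and their defaults are the conventional ones.

```typescript
// Illustrative BM25 score for one term in one document.
function bm25Term(
  tf: number,     // term frequency in the document
  df: number,     // number of documents containing the term
  N: number,      // total documents in the corpus
  dl: number,     // this document's length in tokens
  avgdl: number,  // average document length
  k1 = 1.2,
  b = 0.75
): number {
  // rare terms get a higher idf
  const idf = Math.log(1 + (N - df + 0.5) / (df + 0.5));
  // b controls how strongly long documents are penalized
  const norm = 1 - b + b * (dl / avgdl);
  // tf saturates: the 5th repetition adds less than the 2nd
  return idf * (tf * (k1 + 1)) / (tf + k1 * norm);
}
```

With b at its default, the same term frequency in a document three times the average length scores noticeably lower, which is why BM25 copes well with varied document lengths.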
When TF-IDF is still useful
TF-IDF is simpler and sometimes easier to reason about for small corpora or experiments. If your documents are short and fairly uniform, the difference may not be dramatic.
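For comparison, a minimal classic TF-IDF weight looks like this (again an illustrative sketch, not the library's implementation):

```typescript
// Illustrative TF-IDF weight for one term in one document:
// log-scaled tf damps repeated terms, idf rewards rare terms.
function tfidf(tf: number, df: number, N: number): number {
  return (1 + Math.log(tf)) * Math.log(N / df);
}
```

Note there is no document-length term at all, which is exactly what BM25's b parameter adds.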
Ranking is field-sensitive
Not every field should contribute equally.
- title usually deserves stronger influence
- body provides recall
- tags and section often work better as filters than scoring drivers
That is why schema design and ranking design are tightly connected.
Use RRF when signals disagree
Sometimes you have more than one retrieval strategy:
- lexical search over text
- typo recovery over ngrams
- vector similarity
- geo or filtered candidate lists
Those scores are not directly comparable. reciprocalRankFusion solves that by combining rank positions instead of raw scores.
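The fusion itself is simple. The sketch below shows the standard RRF formula over ranked lists of document ids; it illustrates the math, not reciprocalRankFusion's actual signature, and the constant k = 60 is the conventional default (an assumption here).

```typescript
// Illustrative reciprocal rank fusion: each document's fused score is
// the sum over input lists of 1 / (k + rank), with rank starting at 1.
// Raw scores are ignored entirely; only positions matter.
function rrf(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, i) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + i + 1));
    });
  }
  // highest fused score first
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

A document that appears near the top of several lists outranks one that tops a single list, which is the behavior you want when the underlying scores are incomparable.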
A practical hybrid pattern
- run a lexical query over title and body
- run a fuzzy query over an ngram field
- optionally run vector retrieval
- fuse the ranked lists with RRF
This often produces better top results than trying to force all behavior into one query.
Prefer best-field scoring when clauses overlap
If the same idea is searched across several fields, additive bool scoring can over-reward documents that repeat the same terms everywhere.
Use DisMaxQuery when you want the strongest field to dominate:
import { DisMaxQuery, MatchQuery, OP } from "@tryformation/querylight-ts";
const query = new DisMaxQuery([
  // per-field boosts: title counts most, body least
  new MatchQuery("title", "portable search", OP.AND, false, 3.0),
  new MatchQuery("tagline", "portable search", OP.AND, false, 2.0),
  new MatchQuery("body", "portable search", OP.AND, false, 1.0)
], 0.2); // tie breaker applied to the non-best clauses
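The second argument is the tie breaker. Conceptually, a dis-max score is the best clause score plus a fraction of the remaining clause scores; the helper below is an illustrative sketch of that combination, not the library's code.

```typescript
// Illustrative dis-max combination: the strongest clause dominates,
// and the other clauses contribute only tieBreaker times their score.
function disMaxScore(clauseScores: number[], tieBreaker: number): number {
  const best = Math.max(...clauseScores);
  const rest = clauseScores.reduce((sum, s) => sum + s, 0) - best;
  return best + tieBreaker * rest;
}
```

At tieBreaker 0 only the best field counts; at 1 it degenerates into additive scoring, so small values like 0.2 keep the best field dominant while still breaking ties.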
Soft-demote instead of excluding
Use BoostingQuery when a document is still acceptable but should lose rank because of some secondary signal:
import { BoostingQuery, MatchQuery, TermQuery } from "@tryformation/querylight-ts";
const query = new BoostingQuery(
  new MatchQuery("title", "querylight"), // positive: what to find
  new TermQuery("tags", "deprecated"),   // negative: what to demote
  0.25                                   // negative boost factor
);
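The effect is a simple scaling, sketched below for illustration: documents matching the negative query keep their positive score, multiplied down by the negative boost, rather than being excluded.

```typescript
// Illustrative soft demotion: negative matches are scaled, not dropped.
function boostingScore(
  positiveScore: number,
  matchesNegative: boolean,
  negativeBoost: number
): number {
  return matchesNegative ? positiveScore * negativeBoost : positiveScore;
}
```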
Use numeric and date features directly
For business metrics and time-aware ranking, map fields with NumericFieldIndex or DateFieldIndex.
Then you can use:
- RankFeatureQuery for signals such as popularity, clicks, or quality
- DistanceFeatureQuery for recency or numeric closeness
import {
  DateFieldIndex,
  DistanceFeatureQuery,
  DocumentIndex,
  NumericFieldIndex,
  RankFeatureQuery
} from "@tryformation/querylight-ts";

const index = new DocumentIndex({
  popularity: new NumericFieldIndex(),
  publishedAt: new DateFieldIndex()
});

// reward documents with higher popularity values
const popularityBoost = new RankFeatureQuery("popularity");

// reward documents published close to the origin date;
// the pivot (7 days in ms) controls how quickly the boost decays
const recencyBoost = new DistanceFeatureQuery(
  "publishedAt",
  new Date("2025-01-01T00:00:00.000Z"),
  7 * 24 * 60 * 60 * 1000
);
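Distance features are commonly scored with the decay curve pivot / (pivot + distance), so a document exactly one pivot away from the origin receives half the maximum boost. Whether Querylight TS uses exactly this curve is an assumption; the sketch below shows the common shape.

```typescript
// Illustrative distance-feature decay (assumed pivot/(pivot + distance)
// shape): 1 at the origin, 0.5 at one pivot away, approaching 0 beyond.
function distanceFeatureScore(
  value: number,   // e.g. a document timestamp in ms
  origin: number,  // e.g. "now" in ms
  pivot: number    // e.g. 7 days in ms
): number {
  return pivot / (pivot + Math.abs(value - origin));
}
```

This is why the pivot choice matters: a 7-day pivot halves the recency boost for week-old documents, while a 30-day pivot treats anything from the past month as nearly fresh.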
Use script scoring sparingly
ScriptScoreQuery lets you write a JavaScript function that receives the document, the current _score, and helpers such as numericValue(field).
That is useful when:
- you need a one-off ranking formula
- you want to mix base lexical score with a business metric
- the scoring rule is too specific for a built-in query
import { ScriptScoreQuery, TermQuery } from "@tryformation/querylight-ts";
const query = new ScriptScoreQuery(
  new TermQuery("title", "querylight"),
  // multiply the lexical score by popularity, defaulting to 1
  // when the field is missing
  ({ score, numericValue }) => score * (numericValue("popularity") ?? 1)
);
Tuning questions to ask
When results feel wrong, check:
- Is the field design right?
- Is the query too broad?
- Is one field dominating too much?
- Should this signal be fused separately instead?
Keep tuning empirical
Do not guess from theory alone. Collect representative queries and inspect:
- the top 5 results
- where obvious results land
- whether short exact matches are being buried
- whether fuzzy/vector behavior introduces noise
Relevance tuning is iterative. Stable schemas and realistic test queries matter more than clever scoring tricks.