Reference Entry
Ask the Docs End to End
Demo Internals · advanced · order 10
How the demo builds semantic embeddings, ships them to the browser, and answers natural-language questions locally.
The demo has two retrieval experiences:
- regular search, which is mostly lexical
- “Ask the Docs”, which turns a question into an embedding and matches it against precomputed documentation chunks
This article explains the semantic path from markdown files to in-browser answers.
Short introduction: what embeddings are
An embedding is a vector: a long array of numbers that represents the meaning of a text.
You can think of it like this:
- text goes into a model
- the model outputs numbers
- similar texts produce vectors that are close together
That lets you ask a question such as “how do I preload search indexes in the browser?” and still retrieve an article that mostly talks about build-time serialization and hydration, even if it does not use your exact wording.
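To make "close together" concrete, here is a minimal sketch of cosine similarity, a common way to compare embedding vectors. The three-dimensional vectors are invented for illustration; real embeddings from this kind of model have hundreds of dimensions, and the demo's actual comparison code may differ.

```typescript
// Cosine similarity: close to 1 for vectors pointing the same way,
// close to 0 for unrelated (near-orthogonal) vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy vectors, made up for this example (not real model output).
const preloadQuestion = [0.9, 0.1, 0.3];
const hydrationArticle = [0.8, 0.2, 0.4];
const unrelatedArticle = [0.1, 0.9, 0.0];

const relatedScore = cosineSimilarity(preloadQuestion, hydrationArticle);
const unrelatedScore = cosineSimilarity(preloadQuestion, unrelatedArticle);
```

With these toy values, relatedScore is much higher than unrelatedScore, which is the property semantic retrieval relies on.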
The model used in the demo
The demo uses Xenova/all-MiniLM-L6-v2.
Why this model is a practical choice here:
- it is small enough to run in the browser
- it is available through @huggingface/transformers
- it produces useful sentence-level embeddings for short docs and chunked paragraphs
The semantic metadata lives in the demo source here:
That file defines:
- the model id
- chunk sizing constants
- markdown cleanup helpers
- article and chunk payload types used by both the build step and the browser runtime
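As a rough picture of what such a shared module could contain, here is a sketch. Only the model id comes from this article; the constant values, type names, field names, and the cleanMarkdown helper are hypothetical and chosen for illustration.

```typescript
// Model id as named in this article.
const SEMANTIC_MODEL_ID = "Xenova/all-MiniLM-L6-v2";

// Hypothetical chunk sizing constants (illustrative values only).
const MAX_CHUNK_CHARS = 800;
const MIN_CHUNK_CHARS = 120;

// Hypothetical payload types shared by the build step and the browser.
interface SemanticArticlePayload {
  id: string;
  title: string;
  vector: number[];
}

interface SemanticChunkPayload {
  id: string;
  articleId: string;
  heading: string;
  text: string;
  vector: number[];
}

interface SemanticPayload {
  modelId: string;
  articles: SemanticArticlePayload[];
  chunks: SemanticChunkPayload[];
}

// An empty payload tagged with the model that will produce its vectors.
function emptyPayload(): SemanticPayload {
  return { modelId: SEMANTIC_MODEL_ID, articles: [], chunks: [] };
}

// Hypothetical cleanup helper: strip common markdown syntax so the model
// sees plain prose. Note that it also removes underscores inside words.
function cleanMarkdown(markdown: string): string {
  return markdown
    .replace(/`{3}[\s\S]*?`{3}/g, " ")         // drop fenced code blocks
    .replace(/!\[([^\]]*)\]\([^)]*\)/g, "$1")  // images -> alt text
    .replace(/\[([^\]]+)\]\([^)]*\)/g, "$1")   // links -> link text
    .replace(/^#{1,6}\s+/gm, "")               // heading markers
    .replace(/[*_`]/g, "")                     // emphasis and inline code marks
    .replace(/\s+/g, " ")                      // collapse whitespace
    .trim();
}
```

Keeping these definitions in one file means the build step and the browser runtime cannot drift apart on the model id or the payload shape.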
Build-time flow
The build step reads all markdown docs, extracts metadata, builds lexical indexes, and precomputes semantic embeddings.
Relevant source:
The high-level flow is:
- Read every file in docs/.
- Parse frontmatter and markdown into DocEntry records.
- Turn each article into one article-level semantic text.
- Split each article into heading-aware chunks.
- Run the embedding model over the article text and each chunk.
- Store the vectors in generated demo JSON.
- Bundle that JSON into the demo app.
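The embedding part of that flow can be sketched as follows. This is not the demo's build code: the type and function names are hypothetical, paragraphs split on blank lines stand in for heading-aware chunks, and the embedding function is injected so the sketch runs without downloading a model. In a real build the injected function would presumably wrap a feature-extraction pipeline from @huggingface/transformers.

```typescript
interface DocInput {
  id: string;
  title: string;
  markdown: string;
}

interface EmbeddedText {
  id: string;
  articleId: string;
  text: string;
  vector: number[];
}

interface SemanticData {
  articles: EmbeddedText[];
  chunks: EmbeddedText[];
}

type EmbedFn = (text: string) => Promise<number[]>;

async function buildSemanticData(docs: DocInput[], embed: EmbedFn): Promise<SemanticData> {
  const articles: EmbeddedText[] = [];
  const chunks: EmbeddedText[] = [];
  for (const doc of docs) {
    // One article-level text per document.
    const articleText = doc.title + "\n" + doc.markdown;
    articles.push({ id: doc.id, articleId: doc.id, text: articleText, vector: await embed(articleText) });

    // Simplified chunking: one chunk per non-empty paragraph.
    const parts = doc.markdown
      .split(/\n\s*\n/)
      .map((p) => p.trim())
      .filter((p) => p.length > 0);
    for (let i = 0; i < parts.length; i++) {
      chunks.push({ id: doc.id + "#" + i, articleId: doc.id, text: parts[i], vector: await embed(parts[i]) });
    }
  }
  return { articles, chunks };
}

// Stand-in embedder for the sketch: [character count, word count].
const fakeEmbed: EmbedFn = async (text) => [text.length, text.split(/\s+/).length];
```

The returned object can be passed to JSON.stringify and written to disk, which matches the "store the vectors in generated demo JSON" step.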
The Vite plugin in apps/demo/vite.config.ts runs this automatically:
- during local development
- during production builds
- when markdown docs change in dev mode
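A plugin with that behavior could look roughly like the sketch below. It is not the plugin from apps/demo/vite.config.ts: the plugin name, the demoDataPlugin function, and the regenerate callback are hypothetical, and the hook context type is a minimal local stand-in for Vite's richer type. It relies on two hooks that Vite supports, buildStart and handleHotUpdate, and assumes the docs directory is among the files the dev server watches.

```typescript
// Minimal local shape of the hot-update context; Vite's real type has more fields.
interface HotUpdateContext {
  file: string;
}

// regenerate is a hypothetical callback that rebuilds the generated demo data.
function demoDataPlugin(regenerate: () => void | Promise<void>, docsDir: string = "docs/") {
  return {
    name: "demo-data", // hypothetical plugin name
    // Called when the dev server starts and at the start of a production build.
    buildStart(): void | Promise<void> {
      return regenerate();
    },
    // Called in dev mode when a watched file changes.
    handleHotUpdate(ctx: HotUpdateContext): void | Promise<void> {
      const file = ctx.file.replace(/\\/g, "/"); // normalize Windows paths
      if (file.indexOf(docsDir) !== -1 && /\.md$/.test(file)) {
        return regenerate();
      }
    },
  };
}
```

Filtering on the file path keeps unrelated source edits from triggering an expensive embedding run.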
Why the docs are chunked
Whole-page embeddings are useful for related-article suggestions, but question answering works better when you search smaller pieces.
The demo therefore creates:
- article embeddings, used for related articles
- chunk embeddings, used for Ask the Docs answers
Chunks are built from heading sections and grouped paragraphs. That preserves enough structure to show a useful answer snippet while keeping each vector focused on one topic.
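One way to build chunks like that is sketched below. It is an assumption about the approach rather than the demo's chunker: the function name and the 600-character default are hypothetical, a single oversized paragraph is kept whole, and heading-like lines inside code fences are not treated specially.

```typescript
interface DocChunk {
  heading: string;
  text: string;
}

function chunkMarkdown(markdown: string, maxChars: number = 600): DocChunk[] {
  const chunks: DocChunk[] = [];
  let heading = "";
  let paragraphs: string[] = []; // paragraphs of the current section
  let lines: string[] = [];      // lines of the current paragraph

  const endParagraph = () => {
    if (lines.length > 0) paragraphs.push(lines.join(" "));
    lines = [];
  };

  // Group the section's paragraphs into chunks of at most maxChars.
  const endSection = () => {
    endParagraph();
    let current = "";
    for (const p of paragraphs) {
      if (current.length > 0 && current.length + 1 + p.length > maxChars) {
        chunks.push({ heading, text: current });
        current = "";
      }
      current = current.length > 0 ? current + "\n" + p : p;
    }
    if (current.length > 0) chunks.push({ heading, text: current });
    paragraphs = [];
  };

  for (const rawLine of markdown.split("\n")) {
    const line = rawLine.trim();
    const match = /^#{1,6}\s+(.+)$/.exec(line);
    if (match) {
      endSection();        // a heading closes the previous section
      heading = match[1];
    } else if (line === "") {
      endParagraph();      // a blank line closes the current paragraph
    } else {
      lines.push(line);
    }
  }
  endSection();
  return chunks;
}
```

Each chunk carries its heading, so the UI can show where in the article a snippet came from.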
Browser-time flow
When a user switches to “Ask the Docs” and submits a question:
- The browser lazily loads the transformer pipeline.
- The question is embedded locally in the browser.
- The query vector is compared to the precomputed chunk vectors.
- The best-matching chunks are mapped back to documentation pages.
- The UI shows the chunk text as the answer preview.
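The comparison and mapping steps can be sketched as below, assuming the question has already been embedded. A brute-force scan with cosine similarity stands in here for the VectorFieldIndex that the demo uses, and all type and function names are hypothetical.

```typescript
interface ChunkVector {
  chunkId: string;
  docId: string;
  text: string;
  vector: number[];
}

interface AskResult {
  docId: string;
  score: number;
  preview: string;
}

function similarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

function rankChunks(queryVector: number[], chunks: ChunkVector[], limit: number = 3): AskResult[] {
  // Score every chunk against the question, best first.
  const scored = chunks.map((chunk) => ({ chunk, score: similarity(queryVector, chunk.vector) }));
  scored.sort((x, y) => y.score - x.score);

  // Map chunk hits back to pages, keeping only the best chunk per page.
  const seen: { [docId: string]: boolean } = {};
  const results: AskResult[] = [];
  for (const hit of scored) {
    if (seen[hit.chunk.docId]) continue;
    seen[hit.chunk.docId] = true;
    results.push({ docId: hit.chunk.docId, score: hit.score, preview: hit.chunk.text });
    if (results.length >= limit) break;
  }
  return results;
}
```

A linear scan is usually fast enough for a few hundred documentation chunks; an index structure matters more as the corpus grows.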
Relevant source:
In the runtime code, look for:
- embedSemanticQuery(...): runs the embedding model in the browser
- createSemanticRuntime(...): loads precomputed chunk vectors into a VectorFieldIndex
- getSemanticQuestionResults(...): maps nearest chunk hits back to docs
- renderAskResultsPage(...): renders the semantic matches
Running the model in the browser and in GitHub Actions
This demo intentionally uses the same model family in two places:
- at build time in Node.js to precompute embeddings for all docs
- in the browser to embed the user’s question on demand
That gives you a fully local semantic experience:
- the browser does not need a search backend
- the shipped vectors are ready to query as soon as they load
- only the question embedding needs to be computed at interaction time
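Because only the question needs embedding at interaction time, the model load can be deferred until the first question and then reused. The sketch below shows that pattern with a hypothetical createLazyEmbedder helper; the loader is injected so the example does not depend on a model download, and the commented loader is an assumption about how a @huggingface/transformers pipeline could be wired in, not the demo's code.

```typescript
type Embedder = (text: string) => Promise<number[]>;

// Wraps a loader so the model is loaded at most once, on first use.
function createLazyEmbedder(load: () => Promise<Embedder>): Embedder {
  let pending: Promise<Embedder> | null = null;
  return (text: string) => {
    if (pending === null) {
      // The first question triggers the load; later questions reuse it.
      pending = load().catch((err) => {
        pending = null; // allow a retry after a failed load
        throw err;
      });
    }
    return pending.then((embed) => embed(text));
  };
}

// A browser loader might look like this (assumption, shown as a comment
// so the sketch stays dependency-free):
//
//   const { pipeline } = await import("@huggingface/transformers");
//   const extractor = await pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2");
//   return async (text) =>
//     Array.from((await extractor(text, { pooling: "mean", normalize: true })).data);
```

Caching the promise rather than the loaded model also means two questions submitted in quick succession share a single load.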
The CI workflow builds the demo on GitHub Actions:
During npm run build --workspace @querylight/demo, the demo-data build code runs, which means GitHub Actions can pre-calculate embeddings as part of the site build before deploying the static app.
Expected behavior
A lexical search for “serialize index state” and an Ask the Docs question like “How do I build the index ahead of time and load it in the browser later?” may both lead you to the same article, but they get there differently:
- lexical search depends on matching the actual terms
- Ask the Docs depends on semantic closeness between the question vector and the chunk vectors
That is why Ask the Docs can recover pages even when the wording is less exact.
Trade-offs
This design is simple and practical, but it is not magic:
- the first semantic query pays the cost of loading the model in the browser
- embeddings increase build time
- embeddings also increase the size of generated demo data
- semantic retrieval is approximate and should still be evaluated against real questions
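The data-size cost is easy to estimate. all-MiniLM-L6-v2 produces 384-dimensional vectors, so the raw numeric payload grows linearly with the number of articles and chunks. The helper names below are hypothetical, and rounding is shown only to illustrate how the number of digits kept per value affects JSON size; whether the demo rounds its vectors is not stated here.

```typescript
const EMBEDDING_DIMS = 384; // output size of all-MiniLM-L6-v2

// Size of the vectors if stored as raw 32-bit floats.
function float32Bytes(vectorCount: number, dims: number = EMBEDDING_DIMS): number {
  return vectorCount * dims * 4;
}

// Size of the vectors as JSON text, keeping a fixed number of decimals.
function jsonChars(vectors: number[][], decimals: number): number {
  const rounded = vectors.map((v) => v.map((x) => Number(x.toFixed(decimals))));
  return JSON.stringify(rounded).length;
}

// 500 vectors of 384 floats: 500 * 384 * 4 = 768000 bytes (750 KiB).
const rawBytesFor500 = float32Bytes(500);
```

JSON text is typically larger than the raw float size because each number is written out as decimal digits, so measuring the generated file is the reliable check.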
For a documentation demo, those trade-offs are reasonable because the architecture stays fully static and easy to deploy.