Practical guide to indexing, querying, batching, persistent indexes, vector search, and xgoja integration with the Bleve JavaScript module.
This guide explains how to use goja-bleve as a JavaScript search toolkit inside an xgoja runtime. It covers the whole workflow: choose an index shape, define mappings, index documents, run queries, use batches, manage persistent indexes, and decide when vector search is appropriate.
Use this guide when you are building an application script, a RAG retrieval prototype, an indexing utility, or a generated xgoja binary that should expose Bleve to JavaScript authors.
goja-bleve puts Bleve's Go search engine behind a JavaScript module:
const bleve = require("bleve");
The JavaScript API is intentionally builder-oriented. Builders let scripts describe Go-backed Bleve objects without exposing raw Go pointers or mutable internals. Terminal .build() calls produce opaque wrapper objects that can be passed to other builders and index methods.
The usual data flow is:
JavaScript documents
-> mapping builders
-> index builder
-> index.index(...) or batch.index(...)
-> query builders
-> search request builder
-> index.search(...)
-> JavaScript result object
The same module supports short-lived in-memory indexes and persistent on-disk indexes. It also exposes vector mapping and KNN request builders for binaries compiled with vector support.
Use an in-memory index when scripts are temporary, tests are deterministic, or documents are loaded fresh every run:
const index = bleve.memory()
.mapping(mapping)
.name("scratch")
.build();
Use a persistent index when you want to reuse indexed data across runs:
const index = bleve.create("./data/articles.bleve")
.mapping(mapping)
.build();
// Later:
const sameIndex = bleve.open("./data/articles.bleve").build();
Persistent indexes must be closed:
index.close();
Treat close as part of your script's normal lifecycle, not as optional cleanup. The runtime can clean up some open resources, but explicit close is easier to reason about and reduces file-lock surprises.
Mappings determine how documents are indexed. Start from the questions users will ask.
Use text() for human language:
const title = bleve.field().text().store(true).build();
const body = bleve.field().text().includeTermVectors(true).build();
Use keyword() for exact matching and faceted-style values:
const status = bleve.field().keyword().docValues(true).build();
Use numeric, datetime, boolean, IP, geo, and disabled fields when the document shape requires them:
const publishedAt = bleve.field().datetime().dateFormat("dateTimeOptional").build();
const views = bleve.field().number().build();
const archived = bleve.field().boolean().build();
const rawPayload = bleve.field().disabled().build();
Then assemble a document mapping:
const article = bleve.docMapping()
.field("title", title)
.field("body", body)
.field("status", status)
.field("publishedAt", publishedAt)
.build();
const mapping = bleve.mapping()
.defaultMapping(article)
.defaultField("body")
.defaultAnalyzer("standard")
.build();
The current API intentionally does not expose custom analyzer registration. Use built-in analyzer names and let Bleve validate the final mapping. If you need application-specific analysis, design it as a separate provider/host concern rather than hiding it in ad hoc JavaScript.
For individual documents:
index.index("article-1", {
title: "Search from JavaScript",
body: "Bleve runs inside a goja-hosted runtime.",
status: "published",
});
For updates, call index.index(id, doc) again with the same ID. For deletes:
index.delete("article-1");
For bulk indexing, use a batch:
const batch = index.batch();
for (const row of rows) {
if (row.deleted) {
batch.delete(row.id);
} else {
batch.index(row.id, row);
}
}
console.log(`queued ${batch.operationCount()} operations`);
batch.execute();
After execute(), do not reuse that batch. Create a new one. This single-use rule prevents subtle bugs where a script accidentally submits old operations twice.
Use match for analyzed full-text terms:
bleve.match("privacy preserving search").field("body").boost(2.0)
Use matchPhrase when order matters:
bleve.matchPhrase("vector search").field("title")
Use term, prefix, wildcard, regexp, and queryString for exact or parser-driven searches:
bleve.term("published").field("status")
bleve.prefix("goja").field("title")
bleve.queryString("title:goja +status:published")
Use Boolean composition when you need multiple conditions:
const query = bleve.bool()
.addMust(bleve.match("search").field("body"))
.addShould(bleve.match("goja").field("title"))
.addMustNot(bleve.term("archived").field("status"));
There are also convenience constructors:
bleve.conj(q1, q2) // conjunction
bleve.disj(q1, q2, q3) // disjunction
bleve.matchAll()
bleve.matchNone()
A query describes what to match. A search request describes how to execute and return it.
const request = bleve.search()
.query(query)
.from(0)
.size(20)
.fields(["title", "status", "publishedAt"])
.sort(["-publishedAt", "_score"])
.highlight(["body"], "html")
.explain(false)
.build();
const result = index.search(request);
Results are plain JavaScript-friendly objects:
{
total: 12,
maxScore: 1.74,
took: "3.2ms",
hits: [
{ id: "article-1", score: 1.74, fields: { title: "..." } }
]
}
Request only fields you need. Stored fields cost index space, and returning large fields can dominate script runtime.
Vector support is build-time gated. The API includes vector builders, but KNN execution requires a binary built with the vectors tag and FAISS linked in.
Define a vector field:
const embedding = bleve.field()
.vector(384)
.similarity("cosine")
.optimizedFor("recall")
.build();
Add it to a mapping:
const chunk = bleve.docMapping()
.field("text", bleve.field().text())
.field("embedding", embedding)
.build();
Run KNN:
const request = bleve.search()
.query(bleve.matchNone())
.knn("embedding", queryVector, 10, 1.0)
.build();
Run hybrid text + vector search:
const request = bleve.search()
.query(bleve.match("database migrations").field("text"))
.knn("embedding", queryVector, 20, 1.0)
.score("rrf")
.scoreRankConstant(60)
.scoreWindowSize(50)
.build();
Use RRF when text and vector scores are not directly comparable. Use RSF when score scales are meaningful enough for relative score fusion. Keep scoreWindowSize at least as large as the result size and large enough to include candidates from each retrieval leg.
A generated xgoja spec mounts the provider package and the JavaScript module alias:
packages:
- id: goja-bleve
import: github.com/go-go-golems/goja-bleve/pkg/xgoja/providers/bleve
modules:
- package: goja-bleve
name: bleve
as: bleve
Then scripts use:
const bleve = require("bleve");
If your runtime also needs file access, database access, LLM calls, or other modules, mount them alongside bleve. For RAG scripts, a common shape is:
const fs = require("fs");
const bleve = require("bleve");
const geppetto = require("geppetto");
Use host modules to read documents and produce embeddings. Use goja-bleve to index and retrieve.
__bleve_ref.| Problem | Cause | Solution |
|---|---|---|
| A field never matches | It may be disabled, not indexed, mapped as a keyword when you expect analyzed text, or queried with the wrong field name. | Inspect the mapping construction and start with a small matchAll() request returning stored fields. |
Returned fields are empty | The field was not stored or not requested. | Set .store(true) on the field mapping and call .fields([...]) on the search request. |
| Query builder panics or throws wrapper errors | A builder expected a Query, Mapping, DocumentMapping, or FieldMapping wrapper created by this module. | Always pass objects returned by bleve.* builders; do not construct wrapper-looking objects by hand. |
| KNN reports dimension mismatch | The query vector length does not match the mapped vector field dimensions. | Check .vector(dims) in the mapping and validate embedding lengths before search. |
| Hybrid results look text-only | KNN may not be contributing candidates, vector support may be absent, or the KNN field may be wrong. | Test pure KNN with .query(bleve.matchNone()), then add text and fusion once KNN works. |
| Reopening a persistent index fails after mapping changes | Existing index data was built with a different mapping. | Create a new index directory or rebuild the index after mapping changes. |
goja-bleve-getting-startedgoja-bleve-js-api-reference