📖 Documentation

PackageVersion

Navigation

5 sectionsv0.1

📄 Goja Bleve user guide — glaze help goja-bleve-user-guide

goja-bleve-user-guide

Goja Bleve user guide

Practical guide to indexing, querying, batching, persistent indexes, vector search, and xgoja integration with the Bleve JavaScript module.

Appgoja-bleveblevejavascriptsearchvector-searchxgojagoja-bleveevalrunreplsearchbatchvectoroutput

This guide explains how to use goja-bleve as a JavaScript search toolkit inside an xgoja runtime. It covers the whole workflow: choose an index shape, define mappings, index documents, run queries, use batches, manage persistent indexes, and decide when vector search is appropriate.

Use this guide when you are building an application script, a RAG retrieval prototype, an indexing utility, or a generated xgoja binary that should expose Bleve to JavaScript authors.

Mental model

goja-bleve puts Bleve's Go search engine behind a JavaScript module:

const bleve = require("bleve");

The JavaScript API is intentionally builder-oriented. Builders let scripts describe Go-backed Bleve objects without exposing raw Go pointers or mutable internals. Terminal .build() calls produce opaque wrapper objects that can be passed to other builders and index methods.

The usual data flow is:

JavaScript documents
  -> mapping builders
  -> index builder
  -> index.index(...) or batch.index(...)
  -> query builders
  -> search request builder
  -> index.search(...)
  -> JavaScript result object

The same module supports short-lived in-memory indexes and persistent on-disk indexes. It also exposes vector mapping and KNN request builders for binaries compiled with vector support.

Choose the right index lifecycle

Use an in-memory index when scripts are temporary, tests are deterministic, or documents are loaded fresh every run:

const index = bleve.memory()
  .mapping(mapping)
  .name("scratch")
  .build();

Use a persistent index when you want to reuse indexed data across runs:

const index = bleve.create("./data/articles.bleve")
  .mapping(mapping)
  .build();

// Later:
const sameIndex = bleve.open("./data/articles.bleve").build();

Persistent indexes must be closed:

index.close();

Treat close as part of your script's normal lifecycle, not as optional cleanup. The runtime can clean up some open resources, but explicit close is easier to reason about and reduces file-lock surprises.

Design mappings from query behavior

Mappings determine how documents are indexed. Start from the questions users will ask.

Use text() for human language:

const title = bleve.field().text().store(true).build();
const body = bleve.field().text().includeTermVectors(true).build();

Use keyword() for exact matching and faceted-style values:

const status = bleve.field().keyword().docValues(true).build();

Use numeric, datetime, boolean, IP, geo, and disabled fields when the document shape requires them:

const publishedAt = bleve.field().datetime().dateFormat("dateTimeOptional").build();
const views = bleve.field().number().build();
const archived = bleve.field().boolean().build();
const rawPayload = bleve.field().disabled().build();

Then assemble a document mapping:

const article = bleve.docMapping()
  .field("title", title)
  .field("body", body)
  .field("status", status)
  .field("publishedAt", publishedAt)
  .build();

const mapping = bleve.mapping()
  .defaultMapping(article)
  .defaultField("body")
  .defaultAnalyzer("standard")
  .build();

The current API intentionally does not expose custom analyzer registration. Use built-in analyzer names and let Bleve validate the final mapping. If you need application-specific analysis, design it as a separate provider/host concern rather than hiding it in ad hoc JavaScript.

Index documents safely

For individual documents:

index.index("article-1", {
  title: "Search from JavaScript",
  body: "Bleve runs inside a goja-hosted runtime.",
  status: "published",
});

For updates, call index.index(id, doc) again with the same ID. For deletes:

index.delete("article-1");

For bulk indexing, use a batch:

const batch = index.batch();

for (const row of rows) {
  if (row.deleted) {
    batch.delete(row.id);
  } else {
    batch.index(row.id, row);
  }
}

console.log(`queued ${batch.operationCount()} operations`);
batch.execute();

After execute(), do not reuse that batch. Create a new one. This single-use rule prevents subtle bugs where a script accidentally submits old operations twice.

Build queries intentionally

Use match for analyzed full-text terms:

bleve.match("privacy preserving search").field("body").boost(2.0)

Use matchPhrase when order matters:

bleve.matchPhrase("vector search").field("title")

Use term, prefix, wildcard, regexp, and queryString for exact or parser-driven searches:

bleve.term("published").field("status")
bleve.prefix("goja").field("title")
bleve.queryString("title:goja +status:published")

Use Boolean composition when you need multiple conditions:

const query = bleve.bool()
  .addMust(bleve.match("search").field("body"))
  .addShould(bleve.match("goja").field("title"))
  .addMustNot(bleve.term("archived").field("status"));

There are also convenience constructors:

bleve.conj(q1, q2)      // conjunction
bleve.disj(q1, q2, q3)  // disjunction
bleve.matchAll()
bleve.matchNone()

Shape search requests

A query describes what to match. A search request describes how to execute and return it.

const request = bleve.search()
  .query(query)
  .from(0)
  .size(20)
  .fields(["title", "status", "publishedAt"])
  .sort(["-publishedAt", "_score"])
  .highlight(["body"], "html")
  .explain(false)
  .build();

const result = index.search(request);

Results are plain JavaScript-friendly objects:

{
  total: 12,
  maxScore: 1.74,
  took: "3.2ms",
  hits: [
    { id: "article-1", score: 1.74, fields: { title: "..." } }
  ]
}

Request only fields you need. Stored fields cost index space, and returning large fields can dominate script runtime.

Use vectors and hybrid search when the host supports them

Vector support is build-time gated. The API includes vector builders, but KNN execution requires a binary built with the vectors tag and FAISS linked in.

Define a vector field:

const embedding = bleve.field()
  .vector(384)
  .similarity("cosine")
  .optimizedFor("recall")
  .build();

Add it to a mapping:

const chunk = bleve.docMapping()
  .field("text", bleve.field().text())
  .field("embedding", embedding)
  .build();

Run KNN:

const request = bleve.search()
  .query(bleve.matchNone())
  .knn("embedding", queryVector, 10, 1.0)
  .build();

Run hybrid text + vector search:

const request = bleve.search()
  .query(bleve.match("database migrations").field("text"))
  .knn("embedding", queryVector, 20, 1.0)
  .score("rrf")
  .scoreRankConstant(60)
  .scoreWindowSize(50)
  .build();

Use RRF when text and vector scores are not directly comparable. Use RSF when score scales are meaningful enough for relative score fusion. Keep scoreWindowSize at least as large as the result size and large enough to include candidates from each retrieval leg.

Integrate with xgoja

A generated xgoja spec mounts the provider package and the JavaScript module alias:

packages:
  - id: goja-bleve
    import: github.com/go-go-golems/goja-bleve/pkg/xgoja/providers/bleve
modules:
  - package: goja-bleve
    name: bleve
    as: bleve

Then scripts use:

const bleve = require("bleve");

If your runtime also needs file access, database access, LLM calls, or other modules, mount them alongside bleve. For RAG scripts, a common shape is:

const fs = require("fs");
const bleve = require("bleve");
const geppetto = require("geppetto");

Use host modules to read documents and produce embeddings. Use goja-bleve to index and retrieve.

Operational guidelines

Keep index paths explicit and outside source directories.
Use in-memory indexes for examples and tests.
Store only fields that must appear in search results.
Close persistent indexes.
Keep vector dimensions fixed per field and validate embeddings before indexing.
Prefer batches for bulk indexing.
Treat mapping changes as index-version changes; rebuild indexes when mappings change materially.
Keep JavaScript wrappers opaque. Do not inspect or modify internal properties such as __bleve_ref.

Troubleshooting

Problem	Cause	Solution
A field never matches	It may be disabled, not indexed, mapped as a keyword when you expect analyzed text, or queried with the wrong field name.	Inspect the mapping construction and start with a small `matchAll()` request returning stored fields.
Returned `fields` are empty	The field was not stored or not requested.	Set `.store(true)` on the field mapping and call `.fields([...])` on the search request.
Query builder panics or throws wrapper errors	A builder expected a `Query`, `Mapping`, `DocumentMapping`, or `FieldMapping` wrapper created by this module.	Always pass objects returned by `bleve.*` builders; do not construct wrapper-looking objects by hand.
KNN reports dimension mismatch	The query vector length does not match the mapped vector field dimensions.	Check `.vector(dims)` in the mapping and validate embedding lengths before search.
Hybrid results look text-only	KNN may not be contributing candidates, vector support may be absent, or the KNN field may be wrong.	Test pure KNN with `.query(bleve.matchNone())`, then add text and fusion once KNN works.
Reopening a persistent index fails after mapping changes	Existing index data was built with a different mapping.	Create a new index directory or rebuild the index after mapping changes.

Goja Bleve user guide

Practical guide to indexing, querying, batching, persistent indexes, vector search, and xgoja integration with the Bleve JavaScript module.

Sections

Goja Bleve user guide

Mental model

Choose the right index lifecycle

Design mappings from query behavior

Index documents safely

Build queries intentionally

Shape search requests

Use vectors and hybrid search when the host supports them

Integrate with xgoja

Operational guidelines

Troubleshooting

See Also