Embeddings Workflows

Build applications using embeddings for semantic search, with single/batch generation and caching strategies.

This tutorial teaches you how to use Geppetto's embeddings package to build semantic search and similarity applications. You'll learn single and batch embedding generation, caching strategies, and how to compute similarity between texts.

What You'll Build

A semantic document search application that:

  • Generates embeddings for a document collection
  • Caches embeddings to avoid redundant API calls
  • Finds similar documents based on query similarity
  • Uses batch processing for efficiency

Prerequisites

  • Basic Go knowledge
  • API key for OpenAI or a running Ollama instance
  • Understanding of what embeddings are (see Embeddings)

Learning Objectives

  • Create and configure embedding providers
  • Choose between memory and file caching
  • Generate single and batch embeddings
  • Compute cosine similarity for search
  • Use the Emrichen !Embeddings tag in templates

Step 1: Create an Embedding Provider

Choose between OpenAI (cloud) and Ollama (local):

OpenAI Provider

package main

import (
    "context"
    "fmt"

    "github.com/go-go-golems/geppetto/pkg/embeddings"
)

func main() {
    ctx := context.Background()
    // resolveEmbeddingProfile is a helper defined in your own application (not shown here).
    resolved := resolveEmbeddingProfile("openai-embedding-small") // Loads ~/.config/pinocchio/profiles.yaml.
    if err := embeddings.ValidateInferenceSettingsForEmbeddings(resolved.FinalInferenceSettings); err != nil {
        panic(err)
    }
    provider, err := embeddings.NewSettingsFactoryFromInferenceSettings(resolved.FinalInferenceSettings).NewProvider()
    if err != nil {
        panic(err)
    }
    vector, err := provider.GenerateEmbedding(ctx, "Hello, world!")
    if err != nil {
        panic(err)
    }

    fmt.Printf("Generated %d-dimensional vector\n", len(vector))
    fmt.Printf("First 5 values: %v\n", vector[:5])
}

Ollama Provider (Local)

    // Create Ollama provider (runs locally)
    provider := embeddings.NewOllamaProvider(
        "http://localhost:11434",  // Ollama server URL
        "all-minilm",              // Model name
        384,                       // Dimensions (model-specific)
    )

Common models and dimensions:

Provider  Model                   Dimensions
OpenAI    text-embedding-3-small  1536
OpenAI    text-embedding-3-large  3072
Ollama    all-minilm              384
Ollama    nomic-embed-text        768
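
Because vectors produced by different models cannot be compared with each other, it can help to check a vector's length against the table above before storing it. The following is a minimal sketch: `knownDimensions` and `checkDimensions` are hypothetical names, not part of Geppetto, and the map only covers the four models listed here.

```go
package main

import "fmt"

// knownDimensions mirrors the table above. It is an illustrative lookup,
// not something Geppetto provides.
var knownDimensions = map[string]int{
	"text-embedding-3-small": 1536,
	"text-embedding-3-large": 3072,
	"all-minilm":             384,
	"nomic-embed-text":       768,
}

// checkDimensions reports an error when a vector's length does not match
// the dimension count expected for the given model.
func checkDimensions(model string, vector []float32) error {
	want, ok := knownDimensions[model]
	if !ok {
		return fmt.Errorf("unknown model %q", model)
	}
	if len(vector) != want {
		return fmt.Errorf("model %q: got %d dimensions, want %d", model, len(vector), want)
	}
	return nil
}

func main() {
	vector := make([]float32, 384) // stand-in for a real embedding

	fmt.Println(checkDimensions("all-minilm", vector))             // matches: prints <nil>
	fmt.Println(checkDimensions("text-embedding-3-small", vector)) // mismatch: prints an error
}
```

A check like this is cheap to run at indexing time and turns a silent mix-up of models into an explicit error.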

Step 2: Add In-Memory Caching

Wrap your provider with a cache to avoid redundant API calls:

    // Wrap with memory cache (stores up to 1000 embeddings)
    cachedProvider := embeddings.NewCachedProvider(provider, 1000)

    // First call: hits the API
    vec1, err := cachedProvider.GenerateEmbedding(ctx, "Hello, world!")
    if err != nil {
        panic(err)
    }

    // Second call: served from the cache, no API request
    vec2, err := cachedProvider.GenerateEmbedding(ctx, "Hello, world!")
    if err != nil {
        panic(err)
    }
    fmt.Printf("Same length: %v\n", len(vec1) == len(vec2))

    // Check cache stats
    fmt.Printf("Cache size: %d/%d\n", cachedProvider.Size(), cachedProvider.MaxSize())

When to use memory cache:

  • Short-lived scripts
  • Tests
  • Same texts embedded multiple times in one run
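
To make the idea of a bounded in-memory cache concrete, here is a minimal sketch of a least-recently-used cache wrapped around an arbitrary embedding function. It is not Geppetto's implementation, and its eviction policy may differ from what `NewCachedProvider` does; `embedFunc`, `lruCache`, and `newLRUCache` are hypothetical names, and the sketch is not safe for concurrent use.

```go
package main

import (
	"container/list"
	"fmt"
)

// embedFunc stands in for any function that turns text into a vector.
type embedFunc func(text string) ([]float32, error)

type entry struct {
	text   string
	vector []float32
}

// lruCache is a hypothetical bounded cache keyed by the input text.
type lruCache struct {
	maxSize int
	order   *list.List               // front = most recently used
	items   map[string]*list.Element // text -> element in order
	embed   embedFunc
}

func newLRUCache(embed embedFunc, maxSize int) *lruCache {
	return &lruCache{
		maxSize: maxSize,
		order:   list.New(),
		items:   map[string]*list.Element{},
		embed:   embed,
	}
}

// Get returns the cached vector, or computes and stores it on a miss.
func (c *lruCache) Get(text string) ([]float32, error) {
	if el, ok := c.items[text]; ok {
		c.order.MoveToFront(el)
		return el.Value.(*entry).vector, nil
	}
	vector, err := c.embed(text)
	if err != nil {
		return nil, err // errors are not cached
	}
	c.items[text] = c.order.PushFront(&entry{text: text, vector: vector})
	if c.order.Len() > c.maxSize {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).text)
	}
	return vector, nil
}

func main() {
	calls := 0
	fake := func(text string) ([]float32, error) {
		calls++ // count how often the "API" is actually hit
		return []float32{float32(len(text))}, nil
	}

	cache := newLRUCache(fake, 2)
	cache.Get("a") // miss
	cache.Get("a") // hit
	cache.Get("b") // miss
	cache.Get("c") // miss, evicts "a"
	cache.Get("a") // miss again
	fmt.Println("underlying calls:", calls) // 4
}
```

The size limit is what keeps memory use predictable: once the cache is full, each new text pushes out the entry that was used longest ago.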

Step 3: Add Persistent File Caching

For long-running applications or CLI tools that restart, persist embeddings to disk so they survive between runs:

    // Wrap with disk cache
    diskProvider, err := embeddings.NewDiskCacheProvider(
        provider,
        embeddings.WithDirectory("./cache/embeddings"),  // Custom directory
        embeddings.WithMaxSize(1<<30),                   // 1GB max
        embeddings.WithMaxEntries(10000),                // 10,000 entries max
    )
    if err != nil {
        panic(err)
    }

    // Embeddings persist across program restarts
    vec, err := diskProvider.GenerateEmbedding(ctx, "Hello, world!")
    if err != nil {
        panic(err)
    }
    fmt.Printf("Loaded %d-dimensional vector\n", len(vec))

Default location: ~/.geppetto/cache/embeddings/<model>/

When to use file cache:

  • CLI tools run repeatedly
  • Static document collections
  • Development/testing iterations

Step 4: Batch Embedding Generation

For multiple texts, batch calls are more efficient:

    texts := []string{
        "Go is a statically typed language.",
        "Python is dynamically typed.",
        "Rust focuses on memory safety.",
        "JavaScript runs in browsers.",
    }

    // Generate all embeddings in one API call
    vectors, err := provider.GenerateBatchEmbeddings(ctx, texts)
    if err != nil {
        panic(err)
    }

    for i, vec := range vectors {
        fmt.Printf("%s → %d dimensions\n", texts[i][:20], len(vec))
    }
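
For large collections, a single batch call may exceed what a provider accepts in one request; the actual limit depends on the provider and is not stated in this tutorial. A common approach, sketched here with a hypothetical `chunk` helper, is to split the texts into fixed-size batches and send one batch call per chunk.

```go
package main

import "fmt"

// chunk splits texts into consecutive slices of at most size elements.
// It returns nil when size is not positive.
func chunk(texts []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var batches [][]string
	for start := 0; start < len(texts); start += size {
		end := start + size
		if end > len(texts) {
			end = len(texts)
		}
		batches = append(batches, texts[start:end])
	}
	return batches
}

func main() {
	texts := []string{"a", "b", "c", "d", "e"}
	for i, batch := range chunk(texts, 2) {
		// Each batch would be passed to a batch embedding call.
		fmt.Printf("batch %d: %v\n", i, batch)
	}
}
```

The batches share memory with the input slice, so no text is copied; the last batch is simply shorter when the count does not divide evenly.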

If your provider doesn't support batch natively:

    // Parallel generation with concurrency limit
    vectors, err := embeddings.ParallelGenerateBatchEmbeddings(
        ctx, 
        provider, 
        texts, 
        4,  // Max 4 concurrent requests
    )

Step 5: Compute Cosine Similarity

Find how similar two texts are:

import (
    "context"
    "fmt"
    "math"
)

// Cosine similarity: 1.0 = same direction, 0.0 = orthogonal, -1.0 = opposite.
// Returns 0 if the vectors differ in length or either has zero magnitude.
func cosineSimilarity(a, b []float32) float64 {
    if len(a) != len(b) {
        return 0
    }
    var dot, normA, normB float64
    for i := range a {
        dot += float64(a[i]) * float64(b[i])
        normA += float64(a[i]) * float64(a[i])
        normB += float64(b[i]) * float64(b[i])
    }
    if normA == 0 || normB == 0 {
        return 0
    }
    return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

func main() {
    ctx := context.Background()
    provider := createProvider() // placeholder for the provider setup from Step 1

    texts := []string{
        "Golang is a programming language.",
        "Go is a compiled language.",
        "The weather is nice today.",
    }
    vecs := make([][]float32, len(texts))
    for i, text := range texts {
        vec, err := provider.GenerateEmbedding(ctx, text)
        if err != nil {
            panic(err)
        }
        vecs[i] = vec
    }

    fmt.Printf("Go vs Go: %.4f\n", cosineSimilarity(vecs[0], vecs[1]))      // High similarity
    fmt.Printf("Go vs Weather: %.4f\n", cosineSimilarity(vecs[0], vecs[2])) // Low similarity
}

Example output (exact scores depend on the model):

Go vs Go: 0.9234
Go vs Weather: 0.1876

Step 6: Build a Semantic Search System

Put it all together:

package main

import (
    "context"
    "fmt"
    "math"
    "sort"

    "github.com/go-go-golems/geppetto/pkg/embeddings"
)

type Document struct {
    ID        string
    Text      string
    Embedding []float32
}

type SearchResult struct {
    Document   Document
    Similarity float64
}

func main() {
    ctx := context.Background()

    // Create cached provider from a resolved profile. The profile stack supplies
    // provider credentials, model, and dimensions.
    resolved := resolveEmbeddingProfile("openai-embedding-small")
    provider, err := embeddings.NewSettingsFactoryFromInferenceSettings(resolved.FinalInferenceSettings).NewProvider()
    if err != nil {
        panic(err)
    }
    cachedProvider := embeddings.NewCachedProvider(provider, 1000)

    // Document collection
    documents := []Document{
        {ID: "1", Text: "Go is a statically typed, compiled language designed at Google."},
        {ID: "2", Text: "Python is an interpreted, high-level programming language."},
        {ID: "3", Text: "Rust is a systems programming language focused on safety."},
        {ID: "4", Text: "JavaScript is the language of the web browser."},
        {ID: "5", Text: "TypeScript adds static typing to JavaScript."},
    }

    // Pre-compute embeddings for all documents
    fmt.Println("Indexing documents...")
    for i := range documents {
        embedding, err := cachedProvider.GenerateEmbedding(ctx, documents[i].Text)
        if err != nil {
            panic(err)
        }
        documents[i].Embedding = embedding
    }

    // Search query
    query := "Which languages have static typing?"
    fmt.Printf("\nQuery: %s\n\n", query)

    queryEmbedding, err := cachedProvider.GenerateEmbedding(ctx, query)
    if err != nil {
        panic(err)
    }

    // Compute similarities
    var results []SearchResult
    for _, doc := range documents {
        sim := cosineSimilarity(queryEmbedding, doc.Embedding)
        results = append(results, SearchResult{Document: doc, Similarity: sim})
    }

    // Sort by similarity (descending)
    sort.Slice(results, func(i, j int) bool {
        return results[i].Similarity > results[j].Similarity
    })

    // Print results
    fmt.Println("Results:")
    for i, r := range results {
        fmt.Printf("%d. [%.4f] %s\n", i+1, r.Similarity, r.Document.Text)
    }
}

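// cosineSimilarity returns 0 if the vectors differ in length or either has zero magnitude.
func cosineSimilarity(a, b []float32) float64 {
    if len(a) != len(b) {
        return 0
    }
    var dot, normA, normB float64
    for i := range a {
        dot += float64(a[i]) * float64(b[i])
        normA += float64(a[i]) * float64(a[i])
        normB += float64(b[i]) * float64(b[i])
    }
    if normA == 0 || normB == 0 {
        return 0
    }
    return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}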

Example output (exact scores depend on the model):

Indexing documents...

Query: Which languages have static typing?

Results:
1. [0.8912] Go is a statically typed, compiled language designed at Google.
2. [0.8654] TypeScript adds static typing to JavaScript.
3. [0.7823] Rust is a systems programming language focused on safety.
4. [0.6234] Python is an interpreted, high-level programming language.
5. [0.5987] JavaScript is the language of the web browser.
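
A search interface usually shows only the best few matches rather than the whole ranked list. As a sketch of that last step, the hypothetical `topK` helper below keeps results above a minimum score and returns at most `k` of them; the `scored` type and the scores in `main` are made up for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// scored pairs a document ID with its similarity to the query.
type scored struct {
	ID    string
	Score float64
}

// topK returns up to k results with Score >= minScore, best first.
// The input slice is left unmodified.
func topK(results []scored, k int, minScore float64) []scored {
	kept := make([]scored, 0, len(results))
	for _, r := range results {
		if r.Score >= minScore {
			kept = append(kept, r)
		}
	}
	sort.Slice(kept, func(i, j int) bool {
		return kept[i].Score > kept[j].Score
	})
	if k < 0 {
		k = 0
	}
	if len(kept) > k {
		kept = kept[:k]
	}
	return kept
}

func main() {
	results := []scored{
		{ID: "a", Score: 0.20},
		{ID: "b", Score: 0.90},
		{ID: "c", Score: 0.80},
		{ID: "d", Score: 0.75},
	}
	for _, r := range topK(results, 2, 0.5) {
		fmt.Printf("%s %.2f\n", r.ID, r.Score)
	}
	// Prints:
	// b 0.90
	// c 0.80
}
```

A sensible threshold depends on the model, since different models spread their similarity scores differently, so it is worth tuning against real queries.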

Step 7: Use Configuration-Based Setup

For CLI applications, use the settings factory:

import (
    "github.com/go-go-golems/geppetto/pkg/embeddings"
)

func createProviderFromProfile() (embeddings.Provider, error) {
    resolved := resolveEmbeddingProfile("openai-embedding-small") // Loads ~/.config/pinocchio/profiles.yaml.
    if err := embeddings.ValidateInferenceSettingsForEmbeddings(resolved.FinalInferenceSettings); err != nil {
        return nil, err
    }

    factory := embeddings.NewSettingsFactoryFromInferenceSettings(resolved.FinalInferenceSettings)
    return factory.NewProvider()
}

Step 8: Use Emrichen Templates

Generate embeddings inline in YAML templates:

# template.yaml
documents:
  - text: "Introduction to Go"
    embedding: !Embeddings
      text: "Introduction to Go"
      config:
        type: openai
        engine: text-embedding-3-small

  - text: "Python Tutorial"
    embedding: !Embeddings
      text: "Python Tutorial"

Process with Emrichen:

import "github.com/go-go-golems/geppetto/pkg/js"

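// NOTE: this snippet also needs an import for the emrichen package, and
// embeddingsConfig must be defined by your application; neither is shown here.

// Register the !Embeddings tag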
emrichen.RegisterTag("Embeddings", js.GetEmbeddingTagFunc(embeddingsConfig))

// Process template
result, err := emrichen.ProcessFile("template.yaml")

Cache Decision Guide

Scenario                  Cache Type      Why
One-off script            none            Not worth caching
Unit tests                memory          Fast, isolated
CLI tool (repeated runs)  file            Persists across runs
Long-running server       memory          In-process performance
Large static corpus       file            Don't re-embed on restart
Dynamic content           none or memory  Content changes frequently

Performance Tips

  1. Batch when possible: 100 texts in one call is faster than 100 individual calls
  2. Cache aggressively: every uncached text costs an API round trip, and hosted providers bill for it
  3. Pre-compute static content: index documents once, store embeddings
  4. Use smaller models for prototyping: all-minilm (384d) is faster than text-embedding-3-large (3072d)
  5. Consider vector databases: for large collections, use Pinecone, Weaviate, or pgvector
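
Batching and caching combine well with a simple deduplication pass: if a collection contains repeated texts, only the distinct ones need to be sent to the provider. The `dedupe` helper below is a hypothetical sketch of that idea; it returns the unique texts plus an index that maps every original position back to its unique entry.

```go
package main

import "fmt"

// dedupe returns the distinct texts in first-seen order, plus an index
// slice such that unique[index[i]] == texts[i] for every i.
func dedupe(texts []string) (unique []string, index []int) {
	seen := make(map[string]int, len(texts))
	index = make([]int, len(texts))
	for i, t := range texts {
		pos, ok := seen[t]
		if !ok {
			pos = len(unique)
			seen[t] = pos
			unique = append(unique, t)
		}
		index[i] = pos
	}
	return unique, index
}

func main() {
	texts := []string{"go", "rust", "go", "python", "rust"}
	unique, index := dedupe(texts)

	fmt.Println(unique) // [go rust python]
	fmt.Println(index)  // [0 1 0 2 1]

	// After embedding only the unique texts, the vector for texts[i]
	// is the one at position index[i] in the batch result.
}
```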

Troubleshooting

Problem                    Cause                                    Solution
Dimension mismatch         Comparing vectors from different models  Use same model for all vectors
API rate limits            Too many concurrent requests             Use ParallelGenerateBatchEmbeddings with low concurrency
Cache not working          Wrong cache type                         Check CacheType in config
Slow first run             Cache empty                              Expected; subsequent runs use cache
Ollama connection refused  Server not running                       Start with ollama serve

See Also

  • Embeddings — Full embeddings reference
  • Example: geppetto/cmd/examples/ (check for embedding examples)