Understanding and Using the Geppetto Inference Engine Architecture (Turn-based)

A comprehensive guide to the Turn-based inference engine architecture, covering engines, streaming, and provider implementations.



Audience: Developers familiar with Go who want to embed AI capabilities into their applications using Geppetto.

Outcome: You will understand the architecture's core concepts and learn how to instantiate engines, orchestrate tool calls, and apply best practices in production.

30-Second Overview

// 1. Create an engine from configuration
engine, _ := factory.NewEngineFromParsedValues(parsedValues)

// 2. Build a Turn with your prompt
turn := &turns.Turn{}
turns.AppendBlock(turn, turns.NewSystemTextBlock("You are a helpful assistant."))
turns.AppendBlock(turn, turns.NewUserTextBlock("Hello!"))

// 3. Run inference
result, _ := engine.RunInference(ctx, turn)

// 4. Read the response
for _, block := range result.Blocks {
    if block.Kind == turns.BlockKindLLMText {
        fmt.Println(block.Payload[turns.PayloadKeyText])
    }
}

That's it. The engine handles provider-specific API calls, streaming, and response parsing. You work with Turns and Blocks.

For chat-style, multi-turn apps, prefer session.Session. It centralizes the "clone latest + append user prompt" step via AppendNewTurnFromUserPrompt(s), and StartInference runs inference in place against the latest appended turn.

Table of Contents

  1. Core Architecture Principles
  2. The Engine Interface
  3. Creating Engines with Factories
  4. Basic Inference Without Tools
  5. Tool Calling with the Tool Loop
  6. Provider-Specific Implementations
  7. Middleware and Cross-Cutting Concerns
  8. Testing and Mocking
  9. Best Practices
  10. Debugging and Troubleshooting
  11. Conclusion

This tutorial explains the Turn-based inference architecture in Geppetto. Engines operate on a Turn (ordered Blocks plus metadata), handle provider I/O, and publish streaming events via sinks. Tool orchestration lives outside the engine, in the tool loop and middleware. The result is an engine layer that is simple, testable, and provider-agnostic.

Packages

import (
    "github.com/go-go-golems/geppetto/pkg/inference/engine"
    "github.com/go-go-golems/geppetto/pkg/inference/engine/factory"
    "github.com/go-go-golems/geppetto/pkg/inference/middleware"
    "github.com/go-go-golems/geppetto/pkg/turns"
)

Core Architecture Principles

The Geppetto inference architecture is built around a clean separation of concerns:

  • Engines: Handle provider-specific API calls and streaming (emit events)
  • Tool Loop: Manages tool calling orchestration and workflows (toolloop.Loop)
  • Factories: Create engines from parsed values
  • Middleware: Add cross-cutting concerns like logging and event publishing

Key Benefits

  • Simplicity: Single RunInference method on engines
  • Provider Agnostic: Works with OpenAI, Claude, Gemini, or any provider
  • Testable: Easy to mock engines for testing
  • Composable: Mix and match engines, helpers, and middleware

Context-Based Dependency Injection

Geppetto uses context.Context to carry runtime dependencies rather than global state or struct fields:

  • Event sinks: events.WithEventSinks(ctx, sink) — engines and middleware publish streaming events to sinks found on the context.
  • Tool registries: tools.WithRegistry(ctx, registry) — engines discover available tools from the context.
  • Snapshot hooks: toolloop.WithTurnSnapshotHook(ctx, hook) — the tool loop invokes snapshot callbacks found on the context.

This pattern avoids global state, makes testing straightforward (just pass a different context), and supports multiple parallel inference runs with independent configuration.
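
The pattern behind these helpers can be sketched with the standard library alone. The block below is an illustration under stated assumptions, not Geppetto's implementation: EventSink, recordingSink, withEventSinks, sinksFrom, and publish are hypothetical stand-ins for the events package, and merging new sinks with any already on the context is an assumed behavior.

```go
package main

import (
    "context"
    "fmt"
)

// EventSink is a hypothetical stand-in for Geppetto's event sink interface.
type EventSink interface {
    Publish(event string)
}

// recordingSink stores every event it receives, which is convenient in tests.
type recordingSink struct{ events []string }

func (s *recordingSink) Publish(event string) { s.events = append(s.events, event) }

// sinksKey is unexported so no other package can collide with this context key.
type sinksKey struct{}

// withEventSinks returns a child context carrying the existing sinks plus the new ones.
// Merging with existing sinks is an assumption made for this sketch.
func withEventSinks(ctx context.Context, sinks ...EventSink) context.Context {
    merged := append([]EventSink{}, sinksFrom(ctx)...)
    merged = append(merged, sinks...)
    return context.WithValue(ctx, sinksKey{}, merged)
}

// sinksFrom returns the sinks stored on the context, or nil when none are attached.
func sinksFrom(ctx context.Context) []EventSink {
    sinks, _ := ctx.Value(sinksKey{}).([]EventSink)
    return sinks
}

// publish fans an event out to every sink found on the context.
func publish(ctx context.Context, event string) {
    for _, s := range sinksFrom(ctx) {
        s.Publish(event)
    }
}

// runDemo simulates two independent runs, each with its own sink.
func runDemo() (int, int) {
    a, b := &recordingSink{}, &recordingSink{}
    ctxA := withEventSinks(context.Background(), a)
    ctxB := withEventSinks(context.Background(), b)

    publish(ctxA, "start")
    publish(ctxB, "start")
    publish(ctxB, "final")
    return len(a.events), len(b.events)
}

func main() {
    fmt.Println(runDemo())
}
```

Because each run only sees the sinks on its own context, the two sinks never receive each other's events, which is what makes parallel runs and test doubles cheap to set up.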

The Engine Interface

The heart of the architecture is the simple Engine interface. There is no explicit streaming method; streaming happens when event sinks are attached to the context passed to RunInference:

import (
    "context"
    "github.com/go-go-golems/geppetto/pkg/inference/engine"
    "github.com/go-go-golems/geppetto/pkg/turns"
)

// Engine processes a Turn and returns the updated Turn.
type Engine interface {
    RunInference(ctx context.Context, t *turns.Turn) (*turns.Turn, error)
}

Canonical Inference Result Metadata

RunInference remains the core provider contract. In addition, Geppetto exposes a helper that returns a normalized inference outcome and persists it on the turn:

out, result, err := engine.RunInferenceWithResult(ctx, eng, turn)
if err != nil {
    return err
}
_ = out
fmt.Println(result.StopReason, result.FinishClass, result.Truncated)

The helper guarantees a canonical metadata envelope on Turn.Metadata via turns.KeyTurnMetaInferenceResult. Engines that already persist canonical inference metadata keep their metadata normalized; bare custom engines that do not will still get a minimal synthesized result.

Engine Responsibilities

Engines focus solely on API communication:

  1. API Calls: Make provider-specific HTTP requests
  2. Response Parsing: Convert API responses to Turn blocks (llm_text, tool_call)
  3. Streaming: Publish events for real-time updates
  4. Error Handling: Manage API-level errors and retries

Engines do NOT handle:

  • Tool execution
  • Tool calling loops
  • Complex orchestration logic

Creating Engines with Factories

The factory pattern creates engines from parsed values, providing a provider-agnostic way to instantiate engines:

package main

import (
    "fmt"

    "github.com/go-go-golems/geppetto/pkg/inference/engine"
    "github.com/go-go-golems/geppetto/pkg/inference/engine/factory"
    "github.com/go-go-golems/glazed/pkg/cmds/values"
)

func createEngine(parsedValues *values.Values) (engine.Engine, error) {
    // Create engine from configuration - works with any provider
    baseEngine, err := factory.NewEngineFromParsedValues(parsedValues)
    if err != nil {
        return nil, fmt.Errorf("failed to create engine: %w", err)
    }

    return baseEngine, nil
}

Programmatic Engine Creation (without CLI)

If you are building a library, a test harness, or an embedded tool and do not want the full CLI framework (clay, glazed, cobra), use factory.NewEngineFromStepSettings with a manually constructed StepSettings:

import (
    "context"
    "fmt"
    "os"

    "github.com/go-go-golems/geppetto/pkg/inference/engine/factory"
    "github.com/go-go-golems/geppetto/pkg/steps/ai/settings"
    "github.com/go-go-golems/geppetto/pkg/steps/ai/types"
    "github.com/go-go-golems/geppetto/pkg/turns"
)

func programmaticInference(ctx context.Context, prompt string) error {
    apiType := types.ApiTypeOpenAI
    model := "gpt-4o-mini"
    temp := 0.7

    stepSettings, err := settings.NewStepSettings()
    if err != nil {
        return err
    }

    stepSettings.Chat.ApiType = &apiType
    stepSettings.Chat.Engine = &model
    stepSettings.Chat.Temperature = &temp
    stepSettings.Chat.APIKeys = map[string]string{
        "openai-api-key": os.Getenv("OPENAI_API_KEY"),
    }

    eng, err := factory.NewEngineFromStepSettings(stepSettings)
    if err != nil {
        return err
    }

    seed := &turns.Turn{}
    turns.AppendBlock(seed, turns.NewSystemTextBlock("You are a helpful assistant."))
    turns.AppendBlock(seed, turns.NewUserTextBlock(prompt))

    result, err := eng.RunInference(ctx, seed)
    if err != nil {
        return err
    }

    for _, block := range result.Blocks {
        if block.Kind == turns.BlockKindLLMText {
            if text, ok := block.Payload[turns.PayloadKeyText].(string); ok {
                fmt.Println(text)
            }
        }
    }
    return nil
}

StepSettings Reference

StepSettings is the central configuration object. It carries all provider-specific settings:

type StepSettings struct {
    API        *APISettings              // API keys and base URLs for all providers
    Chat       *ChatSettings             // Provider type, model, temperature, top-p, stop sequences
    OpenAI     *openai.Settings          // OpenAI-specific: frequency/presence penalty, logit bias, reasoning
    Client     *ClientSettings           // HTTP client: timeout, user agent
    Claude     *claude.Settings          // Claude-specific: top-k, user ID
    Gemini     *gemini.Settings          // Gemini-specific settings
    Ollama     *ollama.Settings          // Ollama-specific settings
    Embeddings *config.EmbeddingsConfig  // Embeddings provider config
    Inference  *engine.InferenceConfig   // Per-turn overrides: thinking budget, reasoning effort
}

The most important fields for basic usage:

  • Chat.ApiType — which provider to use: "openai", "openai-responses", "claude", "gemini" (see types.ApiType* constants)
  • Chat.Engine — the model name, e.g. "gpt-4o-mini", "claude-sonnet-4-20250514", "gemini-pro"
  • Chat.Temperature — sampling temperature
  • Chat.APIKeys — map of API keys, keyed by provider prefix: "openai-api-key", "claude-api-key", "gemini-api-key"

You can also load settings from YAML:

f, _ := os.Open("settings.yaml")
stepSettings, err := settings.NewStepSettingsFromYAML(f)

Provider Examples

Claude:

apiType := types.ApiTypeClaude
model := "claude-sonnet-4-20250514"
stepSettings.Chat.ApiType = &apiType
stepSettings.Chat.Engine = &model
stepSettings.Chat.APIKeys = map[string]string{
    "claude-api-key": os.Getenv("ANTHROPIC_API_KEY"),
}

Gemini:

apiType := types.ApiTypeGemini
model := "gemini-pro"
stepSettings.Chat.ApiType = &apiType
stepSettings.Chat.Engine = &model
stepSettings.Chat.APIKeys = map[string]string{
    "gemini-api-key": os.Getenv("GOOGLE_API_KEY"),
}

Engine Options

Provider engines are created without options. Event sinks are attached to the runtime context.Context:

func createEngine(parsedValues *values.Values) (engine.Engine, error) {
    return factory.NewEngineFromParsedValues(parsedValues)
}

func runWithSinks(ctx context.Context, eng engine.Engine, sink events.EventSink, seed *turns.Turn) (*turns.Turn, error) {
    runCtx := events.WithEventSinks(ctx, sink)
    return eng.RunInference(runCtx, seed)
}

Basic Inference Without Tools

For simple text generation without tool calling:

import (
    "context"
    "fmt"

    "github.com/go-go-golems/geppetto/pkg/inference/engine/factory"
    "github.com/go-go-golems/geppetto/pkg/turns"
    "github.com/go-go-golems/glazed/pkg/cmds/values"
)

func simpleInference(ctx context.Context, parsedValues *values.Values, prompt string) error {
    e, err := factory.NewEngineFromParsedValues(parsedValues)
    if err != nil { return fmt.Errorf("failed to create engine: %w", err) }

    seed := &turns.Turn{}
    turns.AppendBlock(seed, turns.NewSystemTextBlock("You are a helpful assistant."))
    turns.AppendBlock(seed, turns.NewUserTextBlock(prompt))

    updated, err := e.RunInference(ctx, seed)
    if err != nil { return fmt.Errorf("inference failed: %w", err) }

    for _, block := range updated.Blocks {
        if block.Kind != turns.BlockKindLLMText {
            continue
        }
        if text, ok := block.Payload[turns.PayloadKeyText].(string); ok {
            fmt.Println(text)
        }
    }
    return nil
}

Tool Calling with the Tool Loop (Per-Turn tools)

Geppetto’s canonical tool orchestration lives in toolloop.Loop. Engines focus on provider I/O, while the loop handles extracting tool calls, executing tools, appending tool results, and iterating.

Providers learn about available tools from the runtime registry attached to context.Context (see tools.WithRegistry) plus any serializable tool config stored on Turn.Data (written automatically by the loop via engine.KeyToolConfig).

Tool Calling Building Blocks

The main building blocks are:

  • toolblocks.ExtractPendingToolCalls: find tool calls without matching tool_use blocks
  • tools.NewDefaultToolExecutor: execute tool calls against a registry
  • toolblocks.AppendToolResultsBlocks: append tool_use blocks from results
  • toolloop.Loop.RunLoop: complete automated Turn-based workflow
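
To make the notion of a "pending" tool call concrete, here is a minimal sketch of the matching rule using only the standard library. Block and pendingToolCalls are hypothetical simplifications; the real block type and toolblocks.ExtractPendingToolCalls carry more fields, and matching on a single ID string is an assumption.

```go
package main

import "fmt"

// Block is a simplified stand-in for a Turn block.
type Block struct {
    Kind   string // "tool_call", "tool_use", "llm_text", ...
    ToolID string // links a tool_use block to the tool_call it answers
    Name   string
}

// pendingToolCalls returns the tool_call blocks that have no tool_use block
// with the same ToolID, preserving their original order.
func pendingToolCalls(blocks []Block) []Block {
    answered := make(map[string]bool)
    for _, b := range blocks {
        if b.Kind == "tool_use" {
            answered[b.ToolID] = true
        }
    }

    var pending []Block
    for _, b := range blocks {
        if b.Kind == "tool_call" && !answered[b.ToolID] {
            pending = append(pending, b)
        }
    }
    return pending
}

func main() {
    blocks := []Block{
        {Kind: "tool_call", ToolID: "call-1", Name: "get_weather"},
        {Kind: "tool_use", ToolID: "call-1"},
        {Kind: "tool_call", ToolID: "call-2", Name: "get_weather"},
    }
    for _, b := range pendingToolCalls(blocks) {
        fmt.Println(b.ToolID, b.Name)
    }
}
```

Only call-2 is reported here, because call-1 already has a matching result block. An empty result is the signal that no further tool execution is needed.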

Setting Up Tools

First, create and register your tools:

package main

import (
    "github.com/go-go-golems/geppetto/pkg/inference/tools"
)

// WeatherRequest represents the input for the weather tool
type WeatherRequest struct {
    Location string `json:"location" jsonschema:"required,description=The city to get weather for"`
    Units    string `json:"units,omitempty" jsonschema:"description=Temperature units,default=celsius,enum=celsius,enum=fahrenheit"`
}

// WeatherResponse represents the weather tool's response
type WeatherResponse struct {
    Location    string  `json:"location"`
    Temperature float64 `json:"temperature"`
    Conditions  string  `json:"conditions"`
    Units       string  `json:"units"`
}

// weatherTool is a mock weather tool
func weatherTool(req WeatherRequest) WeatherResponse {
    // Mock implementation
    return WeatherResponse{
        Location:    req.Location,
        Temperature: 22.0,
        Conditions:  "Sunny",
        Units:       req.Units,
    }
}

func setupTools() (tools.ToolRegistry, error) {
    // Create registry
    registry := tools.NewInMemoryToolRegistry()

    // Create tool definition from function
    weatherToolDef, err := tools.NewToolFromFunc(
        "get_weather",
        "Get current weather information for a specific location",
        weatherTool,
    )
    if err != nil {
        return nil, err
    }

    // Register tool
    err = registry.RegisterTool("get_weather", *weatherToolDef)
    if err != nil {
        return nil, err
    }

    return registry, nil
}

Manual Tool Calling

For fine-grained control, use the Turn helpers directly:

import (
    "context"
    "encoding/json"

    "github.com/go-go-golems/geppetto/pkg/inference/engine"
    "github.com/go-go-golems/geppetto/pkg/turns/toolblocks"
    "github.com/go-go-golems/geppetto/pkg/inference/tools"
    "github.com/go-go-golems/geppetto/pkg/turns"
)

func manualToolCalling(ctx context.Context, eng engine.Engine, registry tools.ToolRegistry, seed *turns.Turn) (*turns.Turn, error) {
    // Run inference
    updated, err := eng.RunInference(ctx, seed)
    if err != nil {
        return nil, err
    }

    calls := toolblocks.ExtractPendingToolCalls(updated)
    if len(calls) == 0 {
        return updated, nil
    }

    exec := tools.NewDefaultToolExecutor(tools.DefaultToolConfig())
    var execCalls []tools.ToolCall
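    for _, call := range calls {
        // Re-encode the arguments as JSON for the executor; surface encoding failures.
        args, err := json.Marshal(call.Arguments)
        if err != nil {
            return nil, err
        }
        execCalls = append(execCalls, tools.ToolCall{
            ID: call.ID, Name: call.Name, Arguments: args,
        })
    }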

    results, err := exec.ExecuteToolCalls(ctx, execCalls, registry)
    if err != nil {
        return nil, err
    }

    var shared []toolblocks.ToolResult
    for _, res := range results {
        if res == nil {
            continue
        }
        content := ""
        if res.Result != nil {
            if b, err := json.Marshal(res.Result); err == nil {
                content = string(b)
            }
        }
        shared = append(shared, toolblocks.ToolResult{
            ID: res.ID, Content: content, Error: res.Error,
        })
    }

    toolblocks.AppendToolResultsBlocks(updated, shared)
    return updated, nil
}

Automated Tool Calling Loop

For most use cases, use the automated loop:

import (
    "context"
    "time"

    "github.com/go-go-golems/geppetto/pkg/inference/engine"
    "github.com/go-go-golems/geppetto/pkg/inference/toolloop"
    "github.com/go-go-golems/geppetto/pkg/inference/tools"
    "github.com/go-go-golems/geppetto/pkg/turns"
)

func automatedToolCalling(ctx context.Context, eng engine.Engine, registry tools.ToolRegistry, seed *turns.Turn) (*turns.Turn, error) {
    loop := toolloop.New(
        toolloop.WithEngine(eng),
        toolloop.WithRegistry(registry),
        toolloop.WithLoopConfig(toolloop.NewLoopConfig().WithMaxIterations(5)),
        toolloop.WithToolConfig(tools.DefaultToolConfig().
            WithExecutionTimeout(30*time.Second).
            WithMaxParallelTools(3).
            WithToolChoice(tools.ToolChoiceAuto)),
    )

    return loop.RunLoop(ctx, seed)
}

Complete Tool Calling Example (with per-Turn tools)

Here's a complete example showing tool calling with streaming events (engine emits start/partial/final). Tools are attached to the Turn instead of the Engine.

package main

import (
    "context"
    "fmt"
    "io"
    "time"

    "github.com/go-go-golems/geppetto/pkg/events"
    "github.com/go-go-golems/geppetto/pkg/inference/engine/factory"
    "github.com/go-go-golems/geppetto/pkg/inference/middleware"
    "github.com/go-go-golems/geppetto/pkg/inference/toolloop"
    "github.com/go-go-golems/geppetto/pkg/inference/tools"
    "github.com/go-go-golems/geppetto/pkg/turns"
    "github.com/go-go-golems/glazed/pkg/cmds/values"
    "golang.org/x/sync/errgroup"
)

func completeToolCallingExample(ctx context.Context, parsedValues *values.Values, prompt string, w io.Writer) error {
    // 1. Create event router for streaming
    router, err := events.NewEventRouter()
    if err != nil {
        return fmt.Errorf("failed to create event router: %w", err)
    }
    defer router.Close()

    // 2. Add console printer for events (or use structured printer)
    // Handler signature is func(*message.Message) error
    router.AddHandler("chat", "chat", events.StepPrinterFunc("", w))
    // Alternative structured printer:
    // printer := events.NewStructuredPrinter(w, events.PrinterOptions{Format: events.FormatText})
    // router.AddHandler("chat", "chat", printer)

    // 3. Create watermill sink for publishing events
    watermillSink := middleware.NewWatermillSink(router.Publisher, "chat")

    // 4. Create engine (sinks are attached at runtime via context)
    baseEngine, err := factory.NewEngineFromParsedValues(parsedValues)
    if err != nil {
        return fmt.Errorf("failed to create engine: %w", err)
    }

    // 5. Set up tools
    registry := tools.NewInMemoryToolRegistry()

    // Register weather tool (from previous example)
    weatherToolDef, err := tools.NewToolFromFunc(
        "get_weather",
        "Get current weather information for a specific location",
        weatherTool,
    )
    if err != nil {
        return fmt.Errorf("failed to create weather tool: %w", err)
    }

    err = registry.RegisterTool("get_weather", *weatherToolDef)
    if err != nil {
        return fmt.Errorf("failed to register weather tool: %w", err)
    }

    // 6. Build seed Turn
    seed := &turns.Turn{Data: map[turns.TurnDataKey]any{}}
    turns.AppendBlock(seed, turns.NewSystemTextBlock(
        "You are a helpful assistant with access to weather information. Use the get_weather tool when users ask about weather.",
    ))
    turns.AppendBlock(seed, turns.NewUserTextBlock(prompt))

    // 7. Run inference with streaming in parallel
    eg := errgroup.Group{}
    ctx, cancel := context.WithCancel(ctx)
    defer cancel()

    // Start event router
    eg.Go(func() error {
        defer cancel()
        return router.Run(ctx)
    })

    // Run inference with tool calling
    eg.Go(func() error {
        defer cancel()
        <-router.Running() // Wait for router to be ready

        // Attach engine sink to context so engine can stream
        runCtx := events.WithEventSinks(ctx, watermillSink)

        loop := toolloop.New(
            toolloop.WithEngine(baseEngine),
            toolloop.WithRegistry(registry),
            toolloop.WithLoopConfig(toolloop.NewLoopConfig().WithMaxIterations(5)),
            toolloop.WithToolConfig(tools.DefaultToolConfig().
                WithExecutionTimeout(30*time.Second).
                WithMaxParallelTools(3).
                WithToolChoice(tools.ToolChoiceAuto)),
        )

        // Run complete tool calling workflow
        updatedTurn, err := loop.RunLoop(runCtx, seed)
        if err != nil {
            return fmt.Errorf("tool calling failed: %w", err)
        }

        // Process final results
        for _, block := range updatedTurn.Blocks {
            if block.Kind == turns.BlockKindLLMText {
                if text, ok := block.Payload[turns.PayloadKeyText].(string); ok {
                    fmt.Fprintln(w, text)
                }
            }
        }

        return nil
    })

    return eg.Wait()
}

Important: events.NewEventRouter() defaults to an in-memory gochannel pub/sub with publish→ACK blocking. For streaming UIs (many partial events) or slow handlers (DB/UI), you may need to configure buffering and disable publish blocking (see glaze help geppetto-events-streaming-watermill), or use an external transport.

Complete Runtime Flow

The sections above describe individual components. Here is how they connect into a single request flow, from user prompt to final response, in a multi-turn application:

1. Session.AppendNewTurnFromUserPrompt("question")
   ├── Clones the latest turn (preserving full conversation history)
   ├── Appends a new user block
   └── Assigns a new TurnID

2. Session.StartInference(ctx)
   ├── Creates an ExecutionHandle (tracks async result)
   └── Launches goroutine with runner.RunInference()

3. Runner setup
   ├── Attaches event sinks to context
   ├── Sets SessionID and InferenceID on Turn metadata
   └── Creates toolloop.Loop (if tool registry is present)

4. Tool loop iterates (up to maxIterations):
   │
   ├─ a. Snapshot: "pre_inference"
   │     Turn captured before any processing
   │
   ├─ b. Middleware chain (pre-processing)
   │     System prompt → agent mode → tool reorder → ...
   │     Each middleware can inspect/mutate the Turn
   │
   ├─ c. Engine.RunInference()
   │     ├── Translates Turn blocks to provider wire format
   │     ├── Calls LLM API
   │     ├── Streams events: start → delta → delta → ...
   │     └── Appends output blocks: llm_text, tool_call
   │
   ├─ d. Middleware chain (post-processing)
   │     Each middleware can inspect/mutate the result
   │
   ├─ e. Snapshot: "post_inference"
   │     Turn captured with model output
   │
   ├─ f. Extract pending tool calls
   │     If none: loop exits (done)
   │
   ├─ g. Execute tools in parallel
   │     Append tool_use blocks with results
   │
   ├─ h. Snapshot: "post_tools"
   │     Turn captured with tool results
   │
   └─ i. Loop back to (a) with updated Turn

5. Final turn persisted (if persister configured)

6. ExecutionHandle receives result
   Caller retrieves via handle.Wait()

This flow shows why snapshot phases are valuable for debugging: you can see the Turn at each critical moment and understand exactly what the model received and produced.
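
Step 4 of the flow can be condensed into a short, standard-library sketch. Everything here is a hypothetical simplification: Turn holds only block kinds, engineFunc stands in for Engine.RunInference, the middleware steps (b) and (d) are omitted, and returning an error when the iteration limit is reached is an assumption rather than documented toolloop.Loop behavior.

```go
package main

import (
    "errors"
    "fmt"
)

// Turn is a simplified stand-in: just an ordered list of block kinds.
type Turn struct{ Blocks []string }

// engineFunc stands in for Engine.RunInference; it appends model output blocks.
type engineFunc func(t *Turn) error

// snapshotHook receives the phase name and the Turn at that moment.
type snapshotHook func(phase string, t *Turn)

var errMaxIterations = errors.New("tool loop: max iterations reached")

// pendingCalls counts tool_call blocks not yet answered by a tool_use block.
func pendingCalls(t *Turn) int {
    n := 0
    for _, b := range t.Blocks {
        switch b {
        case "tool_call":
            n++
        case "tool_use":
            n--
        }
    }
    return n
}

// runLoop follows the flow: snapshot, infer, snapshot, extract pending
// calls, execute tools, snapshot, then start the next iteration.
func runLoop(t *Turn, eng engineFunc, maxIterations int, hook snapshotHook) error {
    for i := 0; i < maxIterations; i++ {
        hook("pre_inference", t)
        if err := eng(t); err != nil {
            return err
        }
        hook("post_inference", t)

        pending := pendingCalls(t)
        if pending == 0 {
            return nil // no tool calls left: the loop is done
        }
        for j := 0; j < pending; j++ {
            // A real loop executes the tool here and appends its result.
            t.Blocks = append(t.Blocks, "tool_use")
        }
        hook("post_tools", t)
    }
    return errMaxIterations
}

// demoPhases runs an engine that asks for one tool, then answers with text.
func demoPhases() []string {
    calls := 0
    eng := func(t *Turn) error {
        calls++
        if calls == 1 {
            t.Blocks = append(t.Blocks, "tool_call")
        } else {
            t.Blocks = append(t.Blocks, "llm_text")
        }
        return nil
    }

    var phases []string
    hook := func(phase string, _ *Turn) { phases = append(phases, phase) }
    if err := runLoop(&Turn{Blocks: []string{"system", "user"}}, eng, 5, hook); err != nil {
        phases = append(phases, "error: "+err.Error())
    }
    return phases
}

func main() {
    fmt.Println(demoPhases())
}
```

With one tool round-trip, the hook sees five phases: a full first iteration including post_tools, then a second iteration that ends after post_inference because nothing is pending.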

Provider-Specific Implementations

The factory automatically selects the correct provider based on configuration:

OpenAI Engine (Chat Completions)

# Configuration for OpenAI
api:
  openai:
    api_key: "your-openai-key"
    model: "gpt-4"
    base_url: "https://api.openai.com/v1"

OpenAI Responses Engine (Reasoning + Tools)

The OpenAI Responses API is supported via a dedicated engine package and is selected by setting ai-api-type to openai-responses. This engine streams reasoning summary ("thinking") and tool-call arguments in addition to normal output text deltas. Thinking text is emitted as EventThinkingPartial / partial-thinking; Delta is the latest increment and Completion is the accumulated reasoning text.
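
The relationship between Delta and Completion can be illustrated with a small sketch. ThinkingPartial and accumulate are hypothetical names used only for this example; the real event type carries additional metadata.

```go
package main

import (
    "fmt"
    "strings"
)

// ThinkingPartial is a hypothetical stand-in for the partial-thinking event payload.
type ThinkingPartial struct {
    Delta      string // latest increment only
    Completion string // all reasoning text accumulated so far
}

// accumulate turns a stream of raw reasoning chunks into partial events.
func accumulate(chunks []string) []ThinkingPartial {
    var sb strings.Builder
    events := make([]ThinkingPartial, 0, len(chunks))
    for _, c := range chunks {
        sb.WriteString(c)
        events = append(events, ThinkingPartial{Delta: c, Completion: sb.String()})
    }
    return events
}

func main() {
    for _, e := range accumulate([]string{"Sum of ", "odd numbers ", "is n^2"}) {
        fmt.Printf("delta=%q completion=%q\n", e.Delta, e.Completion)
    }
}
```

A UI that appends each Delta and a UI that replaces its buffer with each Completion end up showing the same text; the second approach is more forgiving if an event is missed.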

These examples use direct --ai-* flags because this page documents the low-level engine surface. In profile-first applications, prefer registry-backed profiles and bootstrap final runtime settings through the defaults/config/registries/profile playbook.

Key notes:

  • Use --ai-api-type=openai-responses and a reasoning-capable model (e.g., o4-mini).
  • The engine omits temperature and top_p for o3/o4 families (these models reject sampling params).
  • For function tools, the engine omits tool_choice (vendor-only values like file_search are not applicable).
  • User or system blocks that carry turns.PayloadKeyImages are serialized as mixed Responses content arrays with input_text plus one or more input_image parts.
  • Reasoning blocks replay payload.text as official Responses content: [{type: "reasoning_text", text: ...}], replay payload.summary as reasoning summaries, and replay payload.encrypted_content as encrypted reasoning continuation state.
  • Provider item IDs are stored in payload.item_id and are the only values replayed as Responses input[].id; local Block.ID is never used as a provider item ID.
  • Parsed Responses output items also carry OpenAI-specific block metadata such as openai_responses.response_id@v1, openai_responses.output_index@v1, openai_responses.item_type@v1, and openai_responses.status@v1.
  • The Responses token-count path (/responses/input_tokens) reuses the same request builder, so image-bearing turns are counted with the same request shape used for inference.
  • At trace log level, the engine prints a full YAML dump of the request payload to aid debugging; debug request previews redact provider IDs and encrypted reasoning blobs while still showing item types, summary counts, and content part types.
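
The note above about omitting temperature and top_p can be pictured as follows. This is a sketch under assumptions, not the engine's request builder: the request struct is heavily trimmed, and detecting the o3/o4 families by name prefix is a guess made for illustration.

```go
package main

import (
    "encoding/json"
    "fmt"
    "strings"
)

// request is a hypothetical, heavily trimmed request body.
// Pointer fields with omitempty disappear from the JSON when left nil.
type request struct {
    Model       string   `json:"model"`
    Temperature *float64 `json:"temperature,omitempty"`
    TopP        *float64 `json:"top_p,omitempty"`
}

// isReasoningFamily reports whether the model name looks like an o3/o4 model.
// Prefix matching is an assumption made for this sketch.
func isReasoningFamily(model string) bool {
    return strings.HasPrefix(model, "o3") || strings.HasPrefix(model, "o4")
}

// buildRequest copies sampling parameters only for models that accept them.
func buildRequest(model string, temperature, topP *float64) request {
    req := request{Model: model}
    if !isReasoningFamily(model) {
        req.Temperature = temperature
        req.TopP = topP
    }
    return req
}

// encode renders the request for a model with a configured temperature.
func encode(model string, temperature float64) string {
    b, err := json.Marshal(buildRequest(model, &temperature, nil))
    if err != nil {
        return "error: " + err.Error()
    }
    return string(b)
}

func main() {
    fmt.Println(encode("o4-mini", 0.7))
    fmt.Println(encode("gpt-4o-mini", 0.7))
}
```

The configured temperature stays in the settings either way; it is simply left out of the wire payload for models that would reject it.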

Example (multimodal turn construction):

turn := &turns.Turn{}
turns.AppendBlock(turn, turns.NewUserMultimodalBlock(
    "What's in this screenshot?",
    []map[string]any{{
        "media_type": "image/png",
        "url":        "https://example.com/screenshot.png",
        "detail":     "high",
    }},
))

Example (tools):

go run ./cmd/examples/advanced/openai-tools test-openai-tools \
  --ai-api-type=openai-responses \
  --ai-engine=o4-mini \
  --mode=tools \
  --prompt='Please use get_weather to check the weather in San Francisco, in celsius.' \
  --log-level trace --verbose

Example (thinking only):

go run ./cmd/examples/advanced/openai-tools test-openai-tools \
  --ai-api-type=openai-responses \
  --ai-engine=o4-mini \
  --mode=thinking \
  --prompt='Prove the sum of first n odd numbers equals n^2, stream reasoning summary.' \
  --log-level info

Structured Output Schema (OpenAI + Claude + OpenAI Responses)

The structured-output setting is intentionally owned by chat settings (--ai-structured-output-*) because provider request-shaping belongs at the engine layer, not in prompt text.

Again, these flags are the explicit engine interface. Application binaries that want a smaller public CLI should resolve them through config plus profile registries and pass the resulting StepSettings into engine construction.

Configuration surface:

  • --ai-structured-output-mode (off or json_schema, default: off)
  • --ai-structured-output-name (schema identifier)
  • --ai-structured-output-description (provider hint text)
  • --ai-structured-output-schema (JSON object string)
  • --ai-structured-output-strict (default: true)
  • --ai-structured-output-require-valid (default: false)

Provider mapping when mode=json_schema:

  • OpenAI Chat Completions: response_format: {type: "json_schema", json_schema: {...}}
  • OpenAI Responses: text.format: {type: "json_schema", ...}
  • Claude Messages: output_format: {type: "json_schema", name, schema}
  • Gemini: no provider-native schema mapping in this path yet

Field support differences:

  • OpenAI Chat and OpenAI Responses use: name, description, schema, strict.
  • Claude currently uses: name, schema (description/strict are not emitted).

Validation behavior:

  • --ai-structured-output-require-valid=true: invalid config fails the request.
  • --ai-structured-output-require-valid=false: invalid config is ignored, request continues as normal text output, and a warning is logged.
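
The two validation modes can be sketched as a single decision function. The names structuredOutput, resolveSchema, and validate are hypothetical, and what counts as "invalid" here (missing name, schema that is not a JSON object) is an assumption for illustration rather than the engine's actual rule set.

```go
package main

import (
    "encoding/json"
    "errors"
    "fmt"
    "log"
)

// structuredOutput is a hypothetical mirror of the --ai-structured-output-* flags.
type structuredOutput struct {
    Mode         string // "off" or "json_schema"
    Name         string
    Schema       string // JSON object string
    RequireValid bool
}

// validate parses the schema and applies the sketch's validity rules.
func validate(cfg structuredOutput) (map[string]any, error) {
    if cfg.Name == "" {
        return nil, errors.New("name is required")
    }
    var schema map[string]any
    if err := json.Unmarshal([]byte(cfg.Schema), &schema); err != nil {
        return nil, fmt.Errorf("schema is not a JSON object: %w", err)
    }
    if schema == nil {
        return nil, errors.New("schema is null")
    }
    return schema, nil
}

// resolveSchema returns the parsed schema when structured output applies.
// ok is false when the request should continue as plain text output.
func resolveSchema(cfg structuredOutput) (schema map[string]any, ok bool, err error) {
    if cfg.Mode != "json_schema" {
        return nil, false, nil
    }
    schema, verr := validate(cfg)
    if verr == nil {
        return schema, true, nil
    }
    if cfg.RequireValid {
        // Strict mode: the request fails.
        return nil, false, fmt.Errorf("invalid structured output config: %w", verr)
    }
    // Lenient mode: log a warning and fall back to normal text output.
    log.Printf("warning: ignoring structured output config: %v", verr)
    return nil, false, nil
}

func main() {
    bad := structuredOutput{Mode: "json_schema", Name: "person", Schema: "not json"}
    _, ok, err := resolveSchema(bad)
    fmt.Println("lenient:", ok, err)

    bad.RequireValid = true
    _, ok, err = resolveSchema(bad)
    fmt.Println("strict:", ok, err != nil)
}
```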

Turn-level note:

  • engine.KeyStructuredOutputConfig exists as a typed Turn.Data key for structured-output config.
  • Current provider request builders consume chat settings directly; this key is ready for per-turn override wiring.

Example (OpenAI Chat Completions):

go run ./cmd/examples/advanced/openai-tools test-openai-tools \
  --ai-api-type=openai \
  --ai-engine=gpt-4o-mini \
  --ai-structured-output-mode=json_schema \
  --ai-structured-output-name=person \
  --ai-structured-output-schema='{"type":"object","properties":{"name":{"type":"string"}},"required":["name"],"additionalProperties":false}' \
  --prompt='Return a person object with a name.'

Example (OpenAI Responses):

go run ./cmd/examples/advanced/openai-tools test-openai-tools \
  --ai-api-type=openai-responses \
  --ai-engine=o4-mini \
  --ai-structured-output-mode=json_schema \
  --ai-structured-output-name=person \
  --ai-structured-output-schema='{"type":"object","properties":{"name":{"type":"string"}},"required":["name"],"additionalProperties":false}' \
  --prompt='Return a person object with a name.'

Example (Claude):

go run ./cmd/examples/advanced/claude-tools main \
  --ai-api-type=claude \
  --ai-engine=claude-sonnet-4-20250514 \
  --ai-structured-output-mode=json_schema \
  --ai-structured-output-name=person \
  --ai-structured-output-schema='{"type":"object","properties":{"name":{"type":"string"}},"required":["name"],"additionalProperties":false}' \
  --prompt='Return a person object with a name.'

Claude Engine

# Configuration for Claude
api:
  claude:
    api_key: "your-claude-key"
    model: "claude-3-opus-20240229"
    base_url: "https://api.anthropic.com"

Gemini Engine

# Configuration for Gemini
api:
  gemini:
    api_key: "your-gemini-key"
    model: "gemini-pro"

Middleware and Cross-Cutting Concerns

Add middleware for logging, metrics, and other cross-cutting concerns:

import (
    "context"

    "github.com/go-go-golems/geppetto/pkg/inference/engine"
    "github.com/go-go-golems/geppetto/pkg/inference/middleware"
    "github.com/go-go-golems/geppetto/pkg/inference/session"
    "github.com/go-go-golems/geppetto/pkg/inference/toolloop/enginebuilder"
    "github.com/go-go-golems/geppetto/pkg/turns"
    "github.com/rs/zerolog/log"
)

func addMiddleware(baseEngine engine.Engine) session.EngineBuilder {
    // Add logging middleware
    loggingMiddleware := func(next middleware.HandlerFunc) middleware.HandlerFunc {
        return func(ctx context.Context, t *turns.Turn) (*turns.Turn, error) {
            log.Info().Int("block_count", len(t.Blocks)).Msg("Starting inference")

            result, err := next(ctx, t)
            if err != nil {
                log.Error().Err(err).Msg("Inference failed")
            } else {
                log.Info().Int("result_count", len(result.Blocks)).Msg("Inference completed")
            }

            return result, err
        }
    }

    return enginebuilder.New(
        enginebuilder.WithBase(baseEngine),
        enginebuilder.WithMiddlewares(loggingMiddleware),
    )
}
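
When several middlewares are combined, the order in which they wrap the engine determines what each one sees. The sketch below shows one common composition rule using local stand-in types; chain and tag are hypothetical helpers, and "first listed is outermost" is an assumption about ordering, not a statement about enginebuilder.WithMiddlewares.

```go
package main

import (
    "context"
    "fmt"
)

// Turn, HandlerFunc, and Middleware are simplified stand-ins for the real types.
type Turn struct{ Blocks []string }

type HandlerFunc func(ctx context.Context, t *Turn) (*Turn, error)

type Middleware func(next HandlerFunc) HandlerFunc

// chain wraps base so that the first middleware listed is the outermost one.
func chain(base HandlerFunc, mws ...Middleware) HandlerFunc {
    h := base
    for i := len(mws) - 1; i >= 0; i-- {
        h = mws[i](h)
    }
    return h
}

// tag returns a middleware that records when it runs before and after next.
func tag(name string, trace *[]string) Middleware {
    return func(next HandlerFunc) HandlerFunc {
        return func(ctx context.Context, t *Turn) (*Turn, error) {
            *trace = append(*trace, name+":pre")
            out, err := next(ctx, t)
            *trace = append(*trace, name+":post")
            return out, err
        }
    }
}

// demoTrace runs a two-middleware chain around a fake engine and returns the call order.
func demoTrace() []string {
    var trace []string
    engine := func(ctx context.Context, t *Turn) (*Turn, error) {
        trace = append(trace, "engine")
        t.Blocks = append(t.Blocks, "llm_text")
        return t, nil
    }

    h := chain(engine, tag("logging", &trace), tag("system-prompt", &trace))
    if _, err := h(context.Background(), &Turn{}); err != nil {
        trace = append(trace, "error")
    }
    return trace
}

func main() {
    fmt.Println(demoTrace())
}
```

The trace shows the usual onion shape: pre-processing runs from the outermost middleware inward, the engine runs once, and post-processing unwinds in reverse.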

Testing and Mocking

The simple engine interface makes testing straightforward:

type MockEngine struct{ add func(*turns.Turn) }

func (m *MockEngine) RunInference(ctx context.Context, t *turns.Turn) (*turns.Turn, error) {
    if m.add != nil { m.add(t) }
    return t, nil
}
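
A mock like this is most useful when application code depends only on the interface. The sketch below shows that shape with local stand-in types so it runs without Geppetto; scriptedEngine, ask, and demoAsk are hypothetical names, and the Block fields are simplified.

```go
package main

import (
    "context"
    "errors"
    "fmt"
)

// Block, Turn, and Engine are simplified stand-ins for the real types.
type Block struct{ Kind, Text string }

type Turn struct{ Blocks []Block }

type Engine interface {
    RunInference(ctx context.Context, t *Turn) (*Turn, error)
}

// scriptedEngine replies with a fixed text, or fails when err is set.
type scriptedEngine struct {
    reply string
    err   error
}

func (e scriptedEngine) RunInference(_ context.Context, t *Turn) (*Turn, error) {
    if e.err != nil {
        return nil, e.err
    }
    t.Blocks = append(t.Blocks, Block{Kind: "llm_text", Text: e.reply})
    return t, nil
}

// ask is the application code under test; it depends only on the Engine interface.
func ask(ctx context.Context, eng Engine, prompt string) (string, error) {
    t := &Turn{Blocks: []Block{{Kind: "user", Text: prompt}}}
    out, err := eng.RunInference(ctx, t)
    if err != nil {
        return "", fmt.Errorf("inference failed: %w", err)
    }
    for _, b := range out.Blocks {
        if b.Kind == "llm_text" {
            return b.Text, nil
        }
    }
    return "", errors.New("no llm_text block in result")
}

// demoAsk exercises both the success path and the provider-failure path.
func demoAsk(reply string, fail bool) (string, bool) {
    eng := scriptedEngine{reply: reply}
    if fail {
        eng.err = errors.New("provider unavailable")
    }
    text, err := ask(context.Background(), eng, "Hello!")
    return text, err == nil
}

func main() {
    fmt.Println(demoAsk("Hi there", false))
    fmt.Println(demoAsk("Hi there", true))
}
```

Scripting both a reply and an error lets tests cover the failure path without any network access or API keys.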

Best Practices

When working with the inference engine architecture:

Engine Design

  • Keep engines focused: Only handle API communication
  • Use factories: Always create engines through the factory pattern
  • Provider agnostic: Write code that works with any provider
  • Error handling: Handle API-specific errors appropriately

Tool Calling

  • Use canonical surfaces: wire tools via toolloop.Loop + toolloop/enginebuilder (no toolhelpers)
  • Configure limits: Set reasonable iteration and timeout limits
  • Handle errors: Configure appropriate error handling strategies
  • Test tools: Test tool functions independently

Performance

  • Streaming: Use event sinks for real-time updates
  • Parallel execution: Allow parallel tool execution when possible
  • Caching: Consider caching for repeated operations
  • Timeouts: Set appropriate timeouts for all operations

Development

  • Mock engines: Use mock engines for testing
  • Logging: Add logging middleware for debugging
  • Configuration: Use parsed values for flexibility
  • Error handling: Implement comprehensive error handling

Debugging and Troubleshooting

Enable Debug Logging

import (
    "github.com/rs/zerolog"
    "github.com/rs/zerolog/log"
)

// Set log level to debug
log.Logger = log.Level(zerolog.DebugLevel)

Debug Tool Execution

Tool execution emits logs and (optionally) events depending on your wiring (engine event sinks on context.Context, and tool execution settings in tools.ToolConfig):

// Tool calling components can log detailed information about:
// - Tool call extraction
// - Tool execution steps
// - Result processing
// - Error handling

Event Monitoring

Monitor events for real-time debugging through the Watermill router:

// Add debug event handler that parses messages into events
router.AddHandler("debug", "chat", func(msg *message.Message) error {
    e, err := events.NewEventFromJson(msg.Payload)
    if err != nil { return err }
    log.Debug().Interface("event", e).Msg("Received event")
    msg.Ack()
    return nil
})

Conclusion

The Geppetto inference engine architecture provides a clean, testable, and provider-agnostic foundation for AI applications. By separating API communication (engines) from orchestration logic (tool loop + middleware), the architecture achieves:

  • Simplicity: Easy to understand and maintain
  • Flexibility: Works with any AI provider
  • Testability: Simple interfaces enable comprehensive testing
  • Composability: Mix and match components as needed

The combination of engines, tool loop, factories, and middleware provides all the tools needed to build sophisticated AI applications while maintaining clean separation of concerns.

See Also

  • Turns and Blocks — The Turn data model that engines operate on; see "How Blocks Accumulate"
  • Sessions — Multi-turn session management built on top of engines
  • Tools — Defining and executing tools
  • Events — How engines publish streaming events
  • Middlewares — Adding cross-cutting behavior; see "Middleware as Composable Prompting"
  • Structured Sinks — Extracting structured data from LLM text streams
  • Streaming Tutorial — Complete working example
  • Examples: geppetto/cmd/examples/streaming-inference/, geppetto/cmd/examples/advanced/generic-tool-calling/