A comprehensive guide to the Turn-based inference engine architecture, covering engines, streaming, and provider implementations.
Audience: Developers familiar with Go who want to embed AI capabilities into their applications using Geppetto.
Outcome: You will understand the architecture's core concepts and learn how to instantiate engines, orchestrate tool calls, and apply best practices in production.
// 1. Create an engine from configuration
engine, _ := factory.NewEngineFromParsedValues(parsedValues)
// 2. Build a Turn with your prompt
turn := &turns.Turn{}
turns.AppendBlock(turn, turns.NewSystemTextBlock("You are a helpful assistant."))
turns.AppendBlock(turn, turns.NewUserTextBlock("Hello!"))
// 3. Run inference
result, _ := engine.RunInference(ctx, turn)
// 4. Read the response
for _, block := range result.Blocks {
if block.Kind == turns.BlockKindLLMText {
fmt.Println(block.Payload[turns.PayloadKeyText])
}
}
That's it. The engine handles provider-specific API calls, streaming, and response parsing. You work with Turns and Blocks.
For chat-style, multi-turn apps, prefer session.Session (it centralizes “clone latest + append user
prompt” via AppendNewTurnFromUserPrompt(s) and runs inference against the latest appended turn
in-place via StartInference).
This tutorial explains the Turn-based inference architecture in Geppetto. Engines operate on a Turn (ordered Blocks plus metadata), handle provider I/O, and publish streaming events via sinks. Tool orchestration can be handled by middleware or helpers. The result is simpler, more testable, and provider-agnostic.
import (
"github.com/go-go-golems/geppetto/pkg/inference/engine"
"github.com/go-go-golems/geppetto/pkg/inference/engine/factory"
"github.com/go-go-golems/geppetto/pkg/inference/middleware"
"github.com/go-go-golems/geppetto/pkg/turns"
)
The Geppetto inference architecture is built around a clean separation of concerns:
RunInference method on engines
Geppetto uses context.Context to carry runtime dependencies rather than global state or struct fields:
- events.WithEventSinks(ctx, sink): engines and middleware publish streaming events to sinks found on the context.
- tools.WithRegistry(ctx, registry): engines discover available tools from the context.
- toolloop.WithTurnSnapshotHook(ctx, hook): the tool loop invokes snapshot callbacks found on the context.

This pattern avoids global state, makes testing straightforward (just pass a different context), and supports multiple parallel inference runs with independent configuration.
The heart of the architecture is the simple Engine interface (no explicit streaming method; streaming happens when event sinks are attached to the context passed to RunInference):
import (
"context"
"github.com/go-go-golems/geppetto/pkg/inference/engine"
"github.com/go-go-golems/geppetto/pkg/turns"
)
// Engine processes a Turn and returns the updated Turn.
type Engine interface {
RunInference(ctx context.Context, t *turns.Turn) (*turns.Turn, error)
}
RunInference remains the core provider contract. In addition, Geppetto exposes a helper that returns a normalized inference outcome and persists it on the turn:
out, result, err := engine.RunInferenceWithResult(ctx, eng, turn)
if err != nil {
return err
}
_ = out
fmt.Println(result.StopReason, result.FinishClass, result.Truncated)
The helper guarantees a canonical metadata envelope on Turn.Metadata via turns.KeyTurnMetaInferenceResult. Engines that already persist canonical inference metadata keep their metadata normalized; bare custom engines that do not will still get a minimal synthesized result.
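Engines focus solely on API communication:
- Translate Turn blocks to the provider wire format
- Call the provider API
- Publish streaming events to the sinks found on the context
- Append output blocks to the Turn (llm_text, tool_call)
Engines do NOT handle tool orchestration: extracting tool calls, executing tools, appending tool results, and iterating belong to the tool loop.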
The factory pattern creates engines from parsed values, providing a provider-agnostic way to instantiate engines:
package main
import (
"fmt"
"github.com/go-go-golems/geppetto/pkg/inference/engine"
"github.com/go-go-golems/geppetto/pkg/inference/engine/factory"
"github.com/go-go-golems/glazed/pkg/cmds/values"
)
func createEngine(parsedValues *values.Values) (engine.Engine, error) {
// Create engine from configuration - works with any provider
baseEngine, err := factory.NewEngineFromParsedValues(parsedValues)
if err != nil {
return nil, fmt.Errorf("failed to create engine: %w", err)
}
return baseEngine, nil
}
If you are building a library, a test harness, or an embedded tool and do not want the full CLI framework (clay, glazed, cobra), use factory.NewEngineFromStepSettings with a manually constructed StepSettings:
import (
"context"
"fmt"
"os"
"github.com/go-go-golems/geppetto/pkg/inference/engine/factory"
"github.com/go-go-golems/geppetto/pkg/steps/ai/settings"
"github.com/go-go-golems/geppetto/pkg/steps/ai/types"
"github.com/go-go-golems/geppetto/pkg/turns"
)
func programmaticInference(ctx context.Context, prompt string) error {
apiType := types.ApiTypeOpenAI
model := "gpt-4o-mini"
temp := 0.7
stepSettings, err := settings.NewStepSettings()
if err != nil {
return err
}
stepSettings.Chat.ApiType = &apiType
stepSettings.Chat.Engine = &model
stepSettings.Chat.Temperature = &temp
stepSettings.Chat.APIKeys = map[string]string{
"openai-api-key": os.Getenv("OPENAI_API_KEY"),
}
eng, err := factory.NewEngineFromStepSettings(stepSettings)
if err != nil {
return err
}
seed := &turns.Turn{}
turns.AppendBlock(seed, turns.NewSystemTextBlock("You are a helpful assistant."))
turns.AppendBlock(seed, turns.NewUserTextBlock(prompt))
result, err := eng.RunInference(ctx, seed)
if err != nil {
return err
}
for _, block := range result.Blocks {
if block.Kind == turns.BlockKindLLMText {
if text, ok := block.Payload[turns.PayloadKeyText].(string); ok {
fmt.Println(text)
}
}
}
return nil
}
StepSettings Reference
StepSettings is the central configuration object. It carries all provider-specific settings:
type StepSettings struct {
API *APISettings // API keys and base URLs for all providers
Chat *ChatSettings // Provider type, model, temperature, top-p, stop sequences
OpenAI *openai.Settings // OpenAI-specific: frequency/presence penalty, logit bias, reasoning
Client *ClientSettings // HTTP client: timeout, user agent
Claude *claude.Settings // Claude-specific: top-k, user ID
Gemini *gemini.Settings // Gemini-specific settings
Ollama *ollama.Settings // Ollama-specific settings
Embeddings *config.EmbeddingsConfig // Embeddings provider config
Inference *engine.InferenceConfig // Per-turn overrides: thinking budget, reasoning effort
}
The most important fields for basic usage:
- Chat.ApiType: which provider to use ("openai", "openai-responses", "claude", "gemini"; see types.ApiType* constants)
- Chat.Engine: the model name, e.g. "gpt-4o-mini", "claude-sonnet-4-20250514", "gemini-pro"
- Chat.Temperature: sampling temperature
- Chat.APIKeys: map of API keys, keyed by provider prefix: "openai-api-key", "claude-api-key", "gemini-api-key"

You can also load settings from YAML:
f, _ := os.Open("settings.yaml")
stepSettings, err := settings.NewStepSettingsFromYAML(f)
Claude:
apiType := types.ApiTypeClaude
model := "claude-sonnet-4-20250514"
stepSettings.Chat.ApiType = &apiType
stepSettings.Chat.Engine = &model
stepSettings.Chat.APIKeys = map[string]string{
"claude-api-key": os.Getenv("ANTHROPIC_API_KEY"),
}
Gemini:
apiType := types.ApiTypeGemini
model := "gemini-pro"
stepSettings.Chat.ApiType = &apiType
stepSettings.Chat.Engine = &model
stepSettings.Chat.APIKeys = map[string]string{
"gemini-api-key": os.Getenv("GOOGLE_API_KEY"),
}
Provider engines are created without options. Event sinks are attached to the runtime
context.Context:
func createEngine(parsedValues *values.Values) (engine.Engine, error) {
return factory.NewEngineFromParsedValues(parsedValues)
}
func runWithSinks(ctx context.Context, eng engine.Engine, sink events.EventSink, seed *turns.Turn) (*turns.Turn, error) {
runCtx := events.WithEventSinks(ctx, sink)
return eng.RunInference(runCtx, seed)
}
For simple text generation without tool calling:
import (
"context"
"fmt"
"github.com/go-go-golems/geppetto/pkg/inference/engine/factory"
"github.com/go-go-golems/geppetto/pkg/turns"
"github.com/go-go-golems/glazed/pkg/cmds/values"
)
func simpleInference(ctx context.Context, parsedValues *values.Values, prompt string) error {
e, err := factory.NewEngineFromParsedValues(parsedValues)
if err != nil { return fmt.Errorf("failed to create engine: %w", err) }
seed := &turns.Turn{}
turns.AppendBlock(seed, turns.NewSystemTextBlock("You are a helpful assistant."))
turns.AppendBlock(seed, turns.NewUserTextBlock(prompt))
updated, err := e.RunInference(ctx, seed)
if err != nil { return fmt.Errorf("inference failed: %w", err) }
for _, block := range updated.Blocks {
if block.Kind != turns.BlockKindLLMText {
continue
}
if text, ok := block.Payload[turns.PayloadKeyText].(string); ok {
fmt.Println(text)
}
}
return nil
}
Geppetto’s canonical tool orchestration lives in toolloop.Loop. Engines focus on provider I/O, while the loop handles extracting tool calls, executing tools, appending tool results, and iterating.
Providers learn about available tools from the runtime registry attached to context.Context (see tools.WithRegistry) plus any serializable tool config stored on Turn.Data (written automatically by the loop via engine.KeyToolConfig).
The main building blocks are:
- toolblocks.ExtractPendingToolCalls: find tool calls without matching tool_use blocks
- tools.NewDefaultToolExecutor: execute tool calls against a registry
- toolblocks.AppendToolResultsBlocks: append tool_use blocks from results
- toolloop.Loop.RunLoop: complete automated Turn-based workflow

First, create and register your tools:
package main
import (
"github.com/go-go-golems/geppetto/pkg/inference/tools"
)
// WeatherRequest represents the input for the weather tool
type WeatherRequest struct {
Location string `json:"location" jsonschema:"required,description=The city to get weather for"`
Units string `json:"units,omitempty" jsonschema:"description=Temperature units,default=celsius,enum=celsius,enum=fahrenheit"`
}
// WeatherResponse represents the weather tool's response
type WeatherResponse struct {
Location string `json:"location"`
Temperature float64 `json:"temperature"`
Conditions string `json:"conditions"`
Units string `json:"units"`
}
// weatherTool is a mock weather tool
func weatherTool(req WeatherRequest) WeatherResponse {
// Mock implementation
return WeatherResponse{
Location: req.Location,
Temperature: 22.0,
Conditions: "Sunny",
Units: req.Units,
}
}
func setupTools() (tools.ToolRegistry, error) {
// Create registry
registry := tools.NewInMemoryToolRegistry()
// Create tool definition from function
weatherToolDef, err := tools.NewToolFromFunc(
"get_weather",
"Get current weather information for a specific location",
weatherTool,
)
if err != nil {
return nil, err
}
// Register tool
err = registry.RegisterTool("get_weather", *weatherToolDef)
if err != nil {
return nil, err
}
return registry, nil
}
For fine-grained control, use the Turn helpers directly:
import (
"context"
"encoding/json"
"github.com/go-go-golems/geppetto/pkg/inference/engine"
"github.com/go-go-golems/geppetto/pkg/turns/toolblocks"
"github.com/go-go-golems/geppetto/pkg/inference/tools"
"github.com/go-go-golems/geppetto/pkg/turns"
)
func manualToolCalling(ctx context.Context, eng engine.Engine, registry tools.ToolRegistry, seed *turns.Turn) (*turns.Turn, error) {
// Run inference
updated, err := eng.RunInference(ctx, seed)
if err != nil {
return nil, err
}
calls := toolblocks.ExtractPendingToolCalls(updated)
if len(calls) == 0 {
return updated, nil
}
exec := tools.NewDefaultToolExecutor(tools.DefaultToolConfig())
var execCalls []tools.ToolCall
for _, call := range calls {
args, _ := json.Marshal(call.Arguments)
execCalls = append(execCalls, tools.ToolCall{
ID: call.ID, Name: call.Name, Arguments: args,
})
}
results, err := exec.ExecuteToolCalls(ctx, execCalls, registry)
if err != nil {
return nil, err
}
var shared []toolblocks.ToolResult
for _, res := range results {
if res == nil {
continue
}
content := ""
if res.Result != nil {
if b, err := json.Marshal(res.Result); err == nil {
content = string(b)
}
}
shared = append(shared, toolblocks.ToolResult{
ID: res.ID, Content: content, Error: res.Error,
})
}
toolblocks.AppendToolResultsBlocks(updated, shared)
return updated, nil
}
For most use cases, use the automated loop:
import (
"context"
"time"
"github.com/go-go-golems/geppetto/pkg/inference/engine"
"github.com/go-go-golems/geppetto/pkg/inference/toolloop"
"github.com/go-go-golems/geppetto/pkg/inference/tools"
"github.com/go-go-golems/geppetto/pkg/turns"
)
func automatedToolCalling(ctx context.Context, eng engine.Engine, registry tools.ToolRegistry, seed *turns.Turn) (*turns.Turn, error) {
loop := toolloop.New(
toolloop.WithEngine(eng),
toolloop.WithRegistry(registry),
toolloop.WithLoopConfig(toolloop.NewLoopConfig().WithMaxIterations(5)),
toolloop.WithToolConfig(tools.DefaultToolConfig().
WithExecutionTimeout(30*time.Second).
WithMaxParallelTools(3).
WithToolChoice(tools.ToolChoiceAuto)),
)
return loop.RunLoop(ctx, seed)
}
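Conceptually, the loop alternates inference and tool execution until no tool calls are pending. The sketch below shows that control flow with simplified stand-in types. `Block`, `Turn`, `Engine`, `Tool`, `RunLoop`, and `fakeEngine` are hypothetical, tools run sequentially for brevity, and returning an error once the iteration budget is spent is an assumption of this sketch rather than documented `toolloop` behavior.

```go
package main

import (
	"errors"
	"fmt"
)

// Block and Turn are simplified stand-ins for the turns package types.
type Block struct {
	Kind string // "user", "llm_text", "tool_call", or "tool_use"
	ID   string // links a tool_call to its tool_use result
	Name string // tool name, set on tool_call blocks
	Text string // prompt text, tool arguments, or tool output
}

type Turn struct{ Blocks []Block }

// Engine and Tool are hypothetical function types for this sketch.
type Engine func(t *Turn) (*Turn, error)
type Tool func(args string) (string, error)

// pendingCalls returns tool_call blocks that have no tool_use block with the same ID.
func pendingCalls(t *Turn) []Block {
	answered := map[string]bool{}
	for _, b := range t.Blocks {
		if b.Kind == "tool_use" {
			answered[b.ID] = true
		}
	}
	var pending []Block
	for _, b := range t.Blocks {
		if b.Kind == "tool_call" && !answered[b.ID] {
			pending = append(pending, b)
		}
	}
	return pending
}

// RunLoop runs inference, executes pending tool calls, appends results, and repeats.
func RunLoop(eng Engine, registry map[string]Tool, t *Turn, maxIterations int) (*Turn, error) {
	for i := 0; i < maxIterations; i++ {
		updated, err := eng(t)
		if err != nil {
			return nil, err
		}
		t = updated
		calls := pendingCalls(t)
		if len(calls) == 0 {
			return t, nil // the model produced a final answer
		}
		for _, call := range calls {
			out := "unknown tool: " + call.Name
			if tool, ok := registry[call.Name]; ok {
				res, err := tool(call.Text)
				if err != nil {
					res = "error: " + err.Error()
				}
				out = res
			}
			t.Blocks = append(t.Blocks, Block{Kind: "tool_use", ID: call.ID, Text: out})
		}
	}
	return t, errors.New("max iterations reached before a final answer")
}

// fakeEngine requests a tool on the first pass and answers once a result exists.
func fakeEngine(t *Turn) (*Turn, error) {
	for _, b := range t.Blocks {
		if b.Kind == "tool_use" {
			t.Blocks = append(t.Blocks, Block{Kind: "llm_text", Text: "It is " + b.Text})
			return t, nil
		}
	}
	t.Blocks = append(t.Blocks, Block{Kind: "tool_call", ID: "call-1", Name: "get_weather", Text: "Paris"})
	return t, nil
}

func main() {
	registry := map[string]Tool{
		"get_weather": func(city string) (string, error) { return "sunny in " + city, nil },
	}
	seed := &Turn{Blocks: []Block{{Kind: "user", Text: "Weather in Paris?"}}}
	final, err := RunLoop(fakeEngine, registry, seed, 5)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(final.Blocks[len(final.Blocks)-1].Text) // prints "It is sunny in Paris"
}
```

The iteration cap matters: without it, a model that keeps requesting tools would loop indefinitely.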
Here's a complete example showing tool calling with streaming events (the engine emits start, partial, and final events). Tools are supplied through the registry passed to the tool loop, not configured on the engine.
package main
import (
"context"
"fmt"
"io"
"time"
"github.com/go-go-golems/geppetto/pkg/events"
"github.com/go-go-golems/geppetto/pkg/inference/engine/factory"
"github.com/go-go-golems/geppetto/pkg/inference/middleware"
"github.com/go-go-golems/geppetto/pkg/inference/toolloop"
"github.com/go-go-golems/geppetto/pkg/inference/tools"
"github.com/go-go-golems/geppetto/pkg/turns"
"github.com/go-go-golems/glazed/pkg/cmds/values"
"golang.org/x/sync/errgroup"
)
func completeToolCallingExample(ctx context.Context, parsedValues *values.Values, prompt string, w io.Writer) error {
// 1. Create event router for streaming
router, err := events.NewEventRouter()
if err != nil {
return fmt.Errorf("failed to create event router: %w", err)
}
defer router.Close()
// 2. Add console printer for events (or use structured printer)
// Handler signature is func(*message.Message) error
router.AddHandler("chat", "chat", events.StepPrinterFunc("", w))
// Alternative structured printer:
// printer := events.NewStructuredPrinter(w, events.PrinterOptions{Format: events.FormatText})
// router.AddHandler("chat", "chat", printer)
// 3. Create watermill sink for publishing events
watermillSink := middleware.NewWatermillSink(router.Publisher, "chat")
// 4. Create engine (sinks are attached at runtime via context)
baseEngine, err := factory.NewEngineFromParsedValues(parsedValues)
if err != nil {
return fmt.Errorf("failed to create engine: %w", err)
}
// 5. Set up tools
registry := tools.NewInMemoryToolRegistry()
// Register weather tool (from previous example)
weatherToolDef, err := tools.NewToolFromFunc(
"get_weather",
"Get current weather information for a specific location",
weatherTool,
)
if err != nil {
return fmt.Errorf("failed to create weather tool: %w", err)
}
err = registry.RegisterTool("get_weather", *weatherToolDef)
if err != nil {
return fmt.Errorf("failed to register weather tool: %w", err)
}
// 6. Build seed Turn
seed := &turns.Turn{Data: map[turns.TurnDataKey]any{}}
turns.AppendBlock(seed, turns.NewSystemTextBlock(
"You are a helpful assistant with access to weather information. Use the get_weather tool when users ask about weather.",
))
turns.AppendBlock(seed, turns.NewUserTextBlock(prompt))
// 7. Run inference with streaming in parallel
eg := errgroup.Group{}
ctx, cancel := context.WithCancel(ctx)
defer cancel()
// Start event router
eg.Go(func() error {
defer cancel()
return router.Run(ctx)
})
// Run inference with tool calling
eg.Go(func() error {
defer cancel()
<-router.Running() // Wait for router to be ready
// Attach engine sink to context so engine can stream
runCtx := events.WithEventSinks(ctx, watermillSink)
loop := toolloop.New(
toolloop.WithEngine(baseEngine),
toolloop.WithRegistry(registry),
toolloop.WithLoopConfig(toolloop.NewLoopConfig().WithMaxIterations(5)),
toolloop.WithToolConfig(tools.DefaultToolConfig().
WithExecutionTimeout(30*time.Second).
WithMaxParallelTools(3).
WithToolChoice(tools.ToolChoiceAuto)),
)
// Run complete tool calling workflow
updatedTurn, err := loop.RunLoop(runCtx, seed)
if err != nil {
return fmt.Errorf("tool calling failed: %w", err)
}
// Process final results
for _, block := range updatedTurn.Blocks {
if block.Kind == turns.BlockKindLLMText {
if text, ok := block.Payload[turns.PayloadKeyText].(string); ok {
fmt.Fprintln(w, text)
}
}
}
return nil
})
return eg.Wait()
}
Important: events.NewEventRouter() defaults to an in-memory gochannel pub/sub with publish→ACK blocking. For streaming UIs (many partial events) or slow handlers (DB/UI), you may need to configure buffering and disable publish blocking (see glaze help geppetto-events-streaming-watermill), or use an external transport.
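As a plain-Go analogy for why blocking publish can stall a streaming producer, the sketch below probes whether a send would be accepted immediately on an unbuffered versus a buffered channel. It does not use Watermill and does not show how to configure the router; `tryPublish` and `countAccepted` are hypothetical helpers.

```go
package main

import "fmt"

// tryPublish attempts a send without blocking and reports whether it was accepted.
func tryPublish(ch chan string, ev string) bool {
	select {
	case ch <- ev:
		return true
	default:
		return false // a blocking publisher would be stuck here until a handler reads
	}
}

// countAccepted publishes n events and counts how many went through immediately.
func countAccepted(ch chan string, n int) int {
	accepted := 0
	for i := 0; i < n; i++ {
		if tryPublish(ch, fmt.Sprintf("delta-%d", i)) {
			accepted++
		}
	}
	return accepted
}

func main() {
	unbuffered := make(chan string)  // no handler is reading: every send would block
	buffered := make(chan string, 8) // the buffer absorbs a burst of partial events

	fmt.Println(countAccepted(unbuffered, 5), countAccepted(buffered, 5)) // prints "0 5"
}
```

With many partial events and a slow handler, the unbuffered case is the one where the engine ends up waiting on the consumer.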
The sections above describe individual components. Here is how they connect into a single request flow, from user prompt to final response, in a multi-turn application:
1. Session.AppendNewTurnFromUserPrompt("question")
├── Clones the latest turn (preserving full conversation history)
├── Appends a new user block
└── Assigns a new TurnID
2. Session.StartInference(ctx)
├── Creates an ExecutionHandle (tracks async result)
└── Launches goroutine with runner.RunInference()
3. Runner setup
├── Attaches event sinks to context
├── Sets SessionID and InferenceID on Turn metadata
└── Creates toolloop.Loop (if tool registry is present)
4. Tool loop iterates (up to maxIterations):
│
├─ a. Snapshot: "pre_inference"
│ Turn captured before any processing
│
├─ b. Middleware chain (pre-processing)
│ System prompt → agent mode → tool reorder → ...
│ Each middleware can inspect/mutate the Turn
│
├─ c. Engine.RunInference()
│ ├── Translates Turn blocks to provider wire format
│ ├── Calls LLM API
│ ├── Streams events: start → delta → delta → ...
│ └── Appends output blocks: llm_text, tool_call
│
├─ d. Middleware chain (post-processing)
│ Each middleware can inspect/mutate the result
│
├─ e. Snapshot: "post_inference"
│ Turn captured with model output
│
├─ f. Extract pending tool calls
│ If none: loop exits (done)
│
├─ g. Execute tools in parallel
│ Append tool_use blocks with results
│
├─ h. Snapshot: "post_tools"
│ Turn captured with tool results
│
└─ i. Loop back to (a) with updated Turn
5. Final turn persisted (if persister configured)
6. ExecutionHandle receives result
Caller retrieves via handle.Wait()
This flow shows why snapshot phases are valuable for debugging: you can see the Turn at each critical moment and understand exactly what the model received and produced.
The factory automatically selects the correct provider based on configuration:
# Configuration for OpenAI
api:
  openai:
    api_key: "your-openai-key"
    model: "gpt-4"
    base_url: "https://api.openai.com/v1"
The OpenAI Responses API is supported via a dedicated engine package and is selected by setting ai-api-type to openai-responses. This engine streams reasoning summary ("thinking") and tool-call arguments in addition to normal output text deltas. Thinking text is emitted as EventThinkingPartial / partial-thinking; Delta is the latest increment and Completion is the accumulated reasoning text.
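The relationship between Delta and Completion can be illustrated with a small accumulator. `ThinkingPartial` and `accumulate` are hypothetical stand-ins for this sketch and are not the actual event type from the `events` package.

```go
package main

import (
	"fmt"
	"strings"
)

// ThinkingPartial is a hypothetical event carrying the two fields described above.
type ThinkingPartial struct {
	Delta      string // the latest increment
	Completion string // everything accumulated so far
}

// accumulate converts raw streamed chunks into partial events.
func accumulate(chunks []string) []ThinkingPartial {
	var sb strings.Builder
	events := make([]ThinkingPartial, 0, len(chunks))
	for _, c := range chunks {
		sb.WriteString(c)
		events = append(events, ThinkingPartial{Delta: c, Completion: sb.String()})
	}
	return events
}

func main() {
	for _, ev := range accumulate([]string{"Odd ", "numbers ", "sum to n^2"}) {
		fmt.Printf("delta=%q completion=%q\n", ev.Delta, ev.Completion)
	}
}
```

A UI that appends text can consume Delta, while a UI that re-renders the whole message can use Completion and ignore earlier events.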
These examples use direct --ai-* flags because this page documents the low-level engine surface. In profile-first applications, prefer registry-backed profiles and bootstrap final runtime settings through the defaults/config/registries/profile playbook.
Key notes:
- --ai-api-type=openai-responses and a reasoning-capable model (e.g., o4-mini).
- temperature and top_p for o3/o4 families (these models reject sampling params).
- tool_choice (vendor-only values like file_search are not applicable).
- turns.PayloadKeyImages are serialized as mixed Responses content arrays with input_text plus one or more input_image parts.
- payload.text as official Responses content: [{type: "reasoning_text", text: ...}], replay payload.summary as reasoning summaries, and replay payload.encrypted_content as encrypted reasoning continuation state.
- payload.item_id and are the only values replayed as Responses input[].id; local Block.ID is never used as a provider item ID.
- openai_responses.response_id@v1, openai_responses.output_index@v1, openai_responses.item_type@v1, and openai_responses.status@v1.
- /responses/input_tokens) reuses the same request builder, so image-bearing turns are counted with the same request shape used for inference.

Example (multimodal turn construction):
turn := &turns.Turn{}
turns.AppendBlock(turn, turns.NewUserMultimodalBlock(
"What's in this screenshot?",
[]map[string]any{{
"media_type": "image/png",
"url": "https://example.com/screenshot.png",
"detail": "high",
}},
))
Example (tools):
go run ./cmd/examples/advanced/openai-tools test-openai-tools \
--ai-api-type=openai-responses \
--ai-engine=o4-mini \
--mode=tools \
--prompt='Please use get_weather to check the weather in San Francisco, in celsius.' \
--log-level trace --verbose
Example (thinking only):
go run ./cmd/examples/advanced/openai-tools test-openai-tools \
--ai-api-type=openai-responses \
--ai-engine=o4-mini \
--mode=thinking \
--prompt='Prove the sum of first n odd numbers equals n^2, stream reasoning summary.' \
--log-level info
The structured-output setting is intentionally owned by chat settings (--ai-structured-output-*) because provider request-shaping belongs at the engine layer, not in prompt text.
Again, these flags are the explicit engine interface. Application binaries that want a smaller public CLI should resolve them through config plus profile registries and pass the resulting StepSettings into engine construction.
Configuration surface:
- --ai-structured-output-mode (off or json_schema, default: off)
- --ai-structured-output-name (schema identifier)
- --ai-structured-output-description (provider hint text)
- --ai-structured-output-schema (JSON object string)
- --ai-structured-output-strict (default: true)
- --ai-structured-output-require-valid (default: false)

Provider mapping when mode=json_schema:
- OpenAI Chat Completions: response_format: {type: "json_schema", json_schema: {...}}
- OpenAI Responses: text.format: {type: "json_schema", ...}
- Claude: output_format: {type: "json_schema", name, schema}

Field support differences:
- name, description, schema, strict.
- name, schema (description/strict are not emitted).

Validation behavior:
- --ai-structured-output-require-valid=true: invalid config fails the request.
- --ai-structured-output-require-valid=false: invalid config is ignored, the request continues as normal text output, and a warning is logged.

Turn-level note:
- engine.KeyStructuredOutputConfig exists as a typed Turn.Data key for structured-output config.

Example (OpenAI Chat Completions):
go run ./cmd/examples/advanced/openai-tools test-openai-tools \
--ai-api-type=openai \
--ai-engine=gpt-4o-mini \
--ai-structured-output-mode=json_schema \
--ai-structured-output-name=person \
--ai-structured-output-schema='{"type":"object","properties":{"name":{"type":"string"}},"required":["name"],"additionalProperties":false}' \
--prompt='Return a person object with a name.'
Example (OpenAI Responses):
go run ./cmd/examples/advanced/openai-tools test-openai-tools \
--ai-api-type=openai-responses \
--ai-engine=o4-mini \
--ai-structured-output-mode=json_schema \
--ai-structured-output-name=person \
--ai-structured-output-schema='{"type":"object","properties":{"name":{"type":"string"}},"required":["name"],"additionalProperties":false}' \
--prompt='Return a person object with a name.'
Example (Claude):
go run ./cmd/examples/advanced/claude-tools main \
--ai-api-type=claude \
--ai-engine=claude-sonnet-4-20250514 \
--ai-structured-output-mode=json_schema \
--ai-structured-output-name=person \
--ai-structured-output-schema='{"type":"object","properties":{"name":{"type":"string"}},"required":["name"],"additionalProperties":false}' \
--prompt='Return a person object with a name.'
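The require-valid behavior described earlier in this section (fail the request versus ignore and warn) can be sketched as follows. `StructuredOutput`, `validate`, and `resolve` are hypothetical, and the specific validity checks (non-empty name, schema parses as a JSON object) are assumptions about what "invalid config" could mean, not Geppetto's actual rules.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"log"
)

// StructuredOutput is a hypothetical, reduced view of the structured-output settings.
type StructuredOutput struct {
	Mode         string // "off" or "json_schema"
	Name         string
	Schema       string // JSON object encoded as a string
	RequireValid bool
}

// validate applies the assumed checks for this sketch.
func validate(c StructuredOutput) error {
	if c.Name == "" {
		return errors.New("schema name is required")
	}
	var schema map[string]any
	if err := json.Unmarshal([]byte(c.Schema), &schema); err != nil {
		return fmt.Errorf("schema must be a JSON object: %w", err)
	}
	if schema == nil {
		return errors.New("schema must be a JSON object, got null")
	}
	return nil
}

// resolve reports whether structured output should be sent with the request.
// Invalid config fails the request only when RequireValid is set; otherwise it
// is ignored with a warning and the request proceeds as plain text output.
func resolve(c StructuredOutput) (bool, error) {
	if c.Mode != "json_schema" {
		return false, nil
	}
	if err := validate(c); err != nil {
		if c.RequireValid {
			return false, fmt.Errorf("invalid structured-output config: %w", err)
		}
		log.Printf("warning: ignoring invalid structured-output config: %v", err)
		return false, nil
	}
	return true, nil
}

func main() {
	broken := StructuredOutput{Mode: "json_schema", Name: "person", Schema: "[1, 2]"}

	enabled, err := resolve(broken)
	fmt.Println(enabled, err == nil) // prints "false true": ignored with a warning

	broken.RequireValid = true
	_, err = resolve(broken)
	fmt.Println(err != nil) // prints "true": the request fails
}
```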
# Configuration for Claude
api:
  claude:
    api_key: "your-claude-key"
    model: "claude-3-opus-20240229"
    base_url: "https://api.anthropic.com"
# Configuration for Gemini
api:
  gemini:
    api_key: "your-gemini-key"
    model: "gemini-pro"
Add middleware for logging, metrics, and other cross-cutting concerns:
import (
"context"
"github.com/go-go-golems/geppetto/pkg/inference/engine"
"github.com/go-go-golems/geppetto/pkg/inference/middleware"
"github.com/go-go-golems/geppetto/pkg/inference/session"
"github.com/go-go-golems/geppetto/pkg/inference/toolloop/enginebuilder"
"github.com/go-go-golems/geppetto/pkg/turns"
"github.com/rs/zerolog/log"
)
func addMiddleware(baseEngine engine.Engine) session.EngineBuilder {
// Add logging middleware
loggingMiddleware := func(next middleware.HandlerFunc) middleware.HandlerFunc {
return func(ctx context.Context, t *turns.Turn) (*turns.Turn, error) {
log.Info().Int("block_count", len(t.Blocks)).Msg("Starting inference")
result, err := next(ctx, t)
if err != nil {
log.Error().Err(err).Msg("Inference failed")
} else {
log.Info().Int("result_count", len(result.Blocks)).Msg("Inference completed")
}
return result, err
}
}
return enginebuilder.New(
enginebuilder.WithBase(baseEngine),
enginebuilder.WithMiddlewares(loggingMiddleware),
)
}
The simple engine interface makes testing straightforward:
type MockEngine struct{ add func(*turns.Turn) }
func (m *MockEngine) RunInference(ctx context.Context, t *turns.Turn) (*turns.Turn, error) {
if m.add != nil { m.add(t) }
return t, nil
}
When working with the inference engine architecture:
toolloop.Loop + toolloop/enginebuilder (no toolhelpers)
import (
"github.com/rs/zerolog"
"github.com/rs/zerolog/log"
)
// Set log level to debug
log.Logger = log.Level(zerolog.DebugLevel)
Tool execution emits logs and (optionally) events depending on your wiring (engine event sinks on context.Context, and tool execution settings in tools.ToolConfig):
// Tool calling components can log detailed information about:
// - Tool call extraction
// - Tool execution steps
// - Result processing
// - Error handling
Monitor events for real-time debugging through the Watermill router:
// Add debug event handler that parses messages into events
router.AddHandler("debug", "chat", func(msg *message.Message) error {
e, err := events.NewEventFromJson(msg.Payload)
if err != nil { return err }
log.Debug().Interface("event", e).Msg("Received event")
msg.Ack()
return nil
})
The Geppetto inference engine architecture provides a clean, testable, and provider-agnostic foundation for AI applications by separating API communication (engines) from orchestration logic (tool loop + middleware).
The combination of engines, tool loop, factories, and middleware provides all the tools needed to build sophisticated AI applications while maintaining clean separation of concerns.
geppetto/cmd/examples/streaming-inference/, geppetto/cmd/examples/advanced/generic-tool-calling/