Session Management in Geppetto

Managing multi-turn interactions with session.Session — turn history, inference lifecycle, and async execution.

Sections

Terminology & Glossary
📖 Documentation
Navigation
54 sectionsv0.1
📄 Session Management in Geppetto — glaze help geppetto-sessions
geppetto-sessions

Session Management in Geppetto

Managing multi-turn interactions with session.Session — turn history, inference lifecycle, and async execution.

Tutorialgeppettosessionsturnsarchitecture

Session Management in Geppetto

Why Sessions?

A single inference call processes one Turn. But most applications involve multiple exchanges: the user asks a question, the model responds, the user asks a follow-up, and so on. Sessions manage this multi-turn lifecycle.

A Session provides:

  • Turn history — an append-only list of Turn snapshots, each a complete record of one inference cycle.
  • Safe turn creation — clones the latest Turn and appends the new user prompt, preventing accidental mutation of historical snapshots.
  • Exclusive inference — only one inference runs at a time per session, enforced by mutex.
  • Async execution — inference runs in a goroutine; callers get an ExecutionHandle to wait or cancel.

Core Concepts

The Session Struct

type Session struct {
    SessionID string          // Stable identifier for this session
    Turns     []*turns.Turn   // Append-only history of turn snapshots
    Builder   EngineBuilder   // Creates inference runners (wires engine, middleware, tools, etc.)
}

SessionID is auto-generated (UUID) when you create a session. Turns grows as the conversation progresses. Builder is the bridge to the inference pipeline — it produces a runner that knows how to execute inference with the right engine, middleware, event sinks, and tools.

How Turns Grow

Each new Turn starts as a clone of the previous Turn's final state, with the new user prompt appended:

Turn 1 (seed):            [system, user₁]
Turn 1 (after inference): [system, user₁, llm_text₁]

Turn 2 = clone(Turn 1) + user₂:
Turn 2 (seed):            [system, user₁, llm_text₁, user₂]
Turn 2 (after inference): [system, user₁, llm_text₁, user₂, tool_call, tool_use, llm_text₂]

This means every Turn is a complete snapshot — you can examine any Turn in isolation and see the full context the model had.

The ExecutionHandle

When you start inference, you get back an ExecutionHandle immediately (inference runs asynchronously):

type ExecutionHandle struct {
    SessionID   string
    InferenceID string
    Input       *turns.Turn
}

Use it to:

  • Wait: result, err := handle.Wait() — blocks until inference completes.
  • Cancel: handle.Cancel() — cancels the in-flight inference via context cancellation.
  • Check: handle.IsRunning() — non-blocking check.

Basic Usage

import "github.com/go-go-golems/geppetto/pkg/inference/session"

// 1. Create a session
sess := session.NewSession()
sess.Builder = myEngineBuilder // see EngineBuilder below

// 2. Add the first user prompt (creates the seed Turn)
turn, err := sess.AppendNewTurnFromUserPrompt("What's the weather in Paris?")

// 3. Run inference
handle, err := sess.StartInference(ctx)
if err != nil {
    // ErrSessionAlreadyActive if another inference is running
    // ErrSessionEmptyTurn if the turn has no blocks
}

// 4. Wait for the result
result, err := handle.Wait()
// result is the completed Turn (same pointer as sess.Latest())

// 5. Continue the conversation
turn2, _ := sess.AppendNewTurnFromUserPrompt("What about tomorrow?")
handle2, _ := sess.StartInference(ctx)
result2, _ := handle2.Wait()

Multiple Prompts

You can append multiple user prompts at once (useful for multi-modal input or batch scenarios):

turn, err := sess.AppendNewTurnFromUserPrompts("Summarize this:", "Also extract key dates.")

The EngineBuilder Interface

The Session delegates all inference pipeline construction to an EngineBuilder:

type EngineBuilder interface {
    Build(ctx context.Context, sessionID string) (InferenceRunner, error)
}

type InferenceRunner interface {
    RunInference(ctx context.Context, t *turns.Turn) (*turns.Turn, error)
}

The Builder is responsible for wiring together:

  • The inference engine (OpenAI, Claude, Gemini, etc.)
  • Middleware chain (system prompt, agent mode, tool reorder, etc.)
  • Event sinks (for streaming events)
  • Tool registry (for tool calling)
  • Snapshot hooks (for debugging)
  • Persistence (for storing completed turns)

The canonical implementation is enginebuilder.Builder from geppetto/pkg/inference/toolloop/enginebuilder/.

How StartInference Works

Understanding the internal flow helps with debugging:

  1. Validates session state (not nil, has turns, no active inference).
  2. Sets metadata on the latest Turn: SessionID, InferenceID, TurnID.
  3. Calls Builder.Build() to create an InferenceRunner.
  4. Creates an ExecutionHandle with a cancellable context.
  5. Launches a goroutine that calls runner.RunInference(ctx, turn).
  6. On completion: stores the result in the handle, clears the active state.

The latest Turn in sess.Turns is mutated in place by the runner and middleware. This is intentional: middleware modifications (system prompt updates, block reordering, tool results) become part of the Turn's final state and serve as the base for the next Turn.

Cancellation

// Cancel via the handle
handle.Cancel()

// Or cancel via the session (cancels whatever is active)
err := sess.CancelActive()

Cancellation propagates via context.Context: the engine, tool loop, and middleware all see the cancellation and can clean up.

Concurrency Model

  • One active inference at a time. If you call StartInference() while another is running, you get ErrSessionAlreadyActive.
  • Thread-safe. All Session methods are protected by a mutex. Multiple goroutines can safely call AppendNewTurnFromUserPrompt and StartInference (though only one inference runs at a time).
  • Wait from any goroutine. Multiple callers can Wait() on the same handle — they all receive the same result.

Error Handling

ErrorWhenWhat to do
ErrSessionNilSession pointer is nilCheck initialization
ErrSessionNoIDSessionID is emptyEnsure NewSession() was used
ErrSessionAlreadyActiveInference already runningWait for current inference or cancel it
ErrSessionEmptyTurnLatest turn has no blocksAppend user prompt before starting inference
ErrSessionNoBuilderBuilder is nilSet sess.Builder before calling StartInference

Packages

import (
    "github.com/go-go-golems/geppetto/pkg/inference/session"            // Session, ExecutionHandle
    "github.com/go-go-golems/geppetto/pkg/inference/toolloop/enginebuilder" // Canonical EngineBuilder
    "github.com/go-go-golems/geppetto/pkg/turns"                        // Turn, Block types
)

See Also

  • Turns and Blocks — The Turn data model that sessions manage
  • Inference Engines — How engines process Turns; see "Complete Runtime Flow"
  • Middlewares — Middleware applied during inference
  • Events — Streaming events emitted during inference
  • Implementation: geppetto/pkg/inference/session/session.go