Middlewares in Geppetto (Turn-based)

A practical guide to writing, composing, and using middlewares with Turn-based engines.

Why Middlewares?

Middlewares let you add behavior around inference calls without modifying the engine itself. They're the standard pattern for:

  • Logging — Record every inference call with timing and block counts
  • Safety filters — Block harmful requests before they reach the provider
  • Tracing — Add correlation IDs for distributed tracing
  • Rate limiting — Throttle requests per user or globally

Middlewares compose cleanly: wrap an engine once, and all calls to RunInference pass through the chain.

Request  → [Logging (before)] → [Engine]
Response ← [Logging (after)]  ← [Engine]

Middleware as Composable Prompting

The examples above (logging, safety, tracing) are infrastructure middleware — they observe or gate inference for operational concerns. But the most powerful use of middleware in Geppetto is as composable prompting techniques.

Most LLM frameworks treat prompt construction as a single function that builds a string. If you want a system prompt, you concatenate it. If you want tool instructions, you concatenate more. If you want mode-specific guidance, you add more text. The result is a fragile, monolithic prompt builder.

Middleware inverts this: each prompting technique is a separate, composable wrapper that adds its contribution to the Turn. Real examples in the codebase:

| Middleware | What it does | Type of change |
|---|---|---|
| System prompt | Ensures the correct system block exists; adds or replaces it | Block insertion/replacement |
| Tool reorder | Moves tool_use blocks to sit adjacent to their tool_call blocks | Block reordering |
| Agent mode | Injects mode-specific guidance blocks; parses model output for mode switches | Block insertion + output parsing |
| SQLite tool | Registers a database query tool into the runtime registry | Configuration change (no text change) |

Each technique is:

  • Independent — develop and test in isolation
  • Composable — stack with other techniques without interference
  • Observable — tags blocks with provenance metadata (Block.Metadata) for debugging

Not all middleware effects are visible as text changes. Some modify Turn configuration (Turn.Data), register tools, or emit events. A debugging UI must surface these "invisible" changes alongside content diffs.
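One way a debugging tool could surface such "invisible" changes is to snapshot the Turn's data before a middleware runs and diff it afterwards. The following is a minimal sketch under stated assumptions: `map[string]any` stands in for Turn.Data, and the key names (`agent_mode`, `trace_id`, `tools`) are hypothetical.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// dataDiff compares two snapshots of a Turn's data map and reports which keys
// were added, changed, or removed. map[string]any is a stand-in for Turn.Data.
func dataDiff(before, after map[string]any) []string {
	var out []string
	for k, v := range after {
		old, existed := before[k]
		switch {
		case !existed:
			out = append(out, "added:"+k)
		case !reflect.DeepEqual(old, v):
			out = append(out, "changed:"+k)
		}
	}
	for k := range before {
		if _, kept := after[k]; !kept {
			out = append(out, "removed:"+k)
		}
	}
	sort.Strings(out) // map iteration order is random; sort for stable output
	return out
}

// demoDiff uses hypothetical keys to show one change of each kind.
func demoDiff() []string {
	before := map[string]any{"agent_mode": "chat", "trace_id": "abc"}
	after := map[string]any{"agent_mode": "financial_analyst", "tools": []string{"sqlite_query"}}
	return dataDiff(before, after)
}

func main() {
	fmt.Println(demoDiff())
}
```

The `before` snapshot has to be a copy taken ahead of the middleware call, since middlewares mutate the Turn in place.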

What you'll learn

  • The middleware interface and how it composes
  • How to write middlewares that modify Turn content (not just log)
  • How to attach middlewares to engines

Core interfaces

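package middleware

import (
    "context"

    "github.com/go-go-golems/geppetto/pkg/turns"
)

// HandlerFunc processes a Turn and returns the resulting Turn.
type HandlerFunc func(ctx context.Context, t *turns.Turn) (*turns.Turn, error)

// Middleware wraps a HandlerFunc and returns a new HandlerFunc.
type Middleware func(HandlerFunc) HandlerFunc

// Chain composes multiple middleware into a single HandlerFunc.
func Chain(handler HandlerFunc, middlewares ...Middleware) HandlerFunc { /* ... */ }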

Conceptually, a middleware takes a HandlerFunc (the next step) and returns a new HandlerFunc that adds behavior before and/or after calling next.
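To make the wrapping concrete, here is a minimal, self-contained sketch of how such a chain could be built. It is not Geppetto's actual implementation: `Turn` is a simplified stand-in for turns.Turn, `chain`, `named`, and `runTrace` are hypothetical helpers, and the ordering (first middleware outermost) is an assumption based on the troubleshooting note at the end of this page.

```go
package main

import (
	"context"
	"fmt"
)

// Turn is a simplified stand-in for turns.Turn (assumption for this sketch).
type Turn struct {
	Trace []string
}

type HandlerFunc func(ctx context.Context, t *Turn) (*Turn, error)
type Middleware func(HandlerFunc) HandlerFunc

// chain wraps handler so that the first middleware listed is the outermost:
// chain(h, m1, m2) behaves like m1(m2(h)).
func chain(handler HandlerFunc, middlewares ...Middleware) HandlerFunc {
	for i := len(middlewares) - 1; i >= 0; i-- {
		handler = middlewares[i](handler)
	}
	return handler
}

// named returns a middleware that records when it runs before and after next.
func named(name string) Middleware {
	return func(next HandlerFunc) HandlerFunc {
		return func(ctx context.Context, t *Turn) (*Turn, error) {
			t.Trace = append(t.Trace, name+":before")
			res, err := next(ctx, t)
			if err != nil {
				return res, err
			}
			res.Trace = append(res.Trace, name+":after")
			return res, nil
		}
	}
}

// runTrace runs a two-middleware chain around a fake engine and returns the
// order in which each step executed.
func runTrace() []string {
	engine := func(ctx context.Context, t *Turn) (*Turn, error) {
		t.Trace = append(t.Trace, "engine")
		return t, nil
	}
	handler := chain(engine, named("m1"), named("m2"))
	res, err := handler(context.Background(), &Turn{})
	if err != nil {
		return nil
	}
	return res.Trace
}

func main() {
	fmt.Println(runTrace())
}
```

Each middleware gets a "before" and an "after" phase, and the phases nest like function calls: the outermost middleware runs first on the way in and last on the way out.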


Example: Logging middleware

// logMw logs the input block count before inference and the outcome afterwards.
// log is assumed to be zerolog's global logger (github.com/rs/zerolog/log).
logMw := func(next middleware.HandlerFunc) middleware.HandlerFunc {
    return func(ctx context.Context, t *turns.Turn) (*turns.Turn, error) {
        logger := log.With().Int("block_count", len(t.Blocks)).Logger()
        logger.Info().Msg("Starting inference")
        res, err := next(ctx, t) // run the rest of the chain, ending at the engine
        if err != nil {
            logger.Error().Err(err).Msg("Inference failed")
        } else {
            logger.Info().Int("result_block_count", len(res.Blocks)).Msg("Inference completed")
        }
        return res, err
    }
}

builder := enginebuilder.New(
    enginebuilder.WithBase(baseEngine),
    enginebuilder.WithMiddlewares(logMw),
)

Example: Block-mutating middleware (system prompt)

Unlike the logging example above, this middleware modifies the Turn's content before inference — it ensures a system block is always present with the correct text:

systemPromptMw := func(prompt string) middleware.Middleware {
    return func(next middleware.HandlerFunc) middleware.HandlerFunc {
        return func(ctx context.Context, t *turns.Turn) (*turns.Turn, error) {
            // Check if a system block already exists
            found := false
            for i, b := range t.Blocks {
                if b.Kind == turns.BlockKindSystem {
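                    // Update existing system block; writing to a nil map would panic
                    if t.Blocks[i].Payload == nil {
                        t.Blocks[i].Payload = map[string]any{}
                    }
                    t.Blocks[i].Payload[turns.PayloadKeyText] = prompt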
                    _ = turns.KeyBlockMetaMiddleware.Set(&t.Blocks[i].Metadata, "systemprompt")
                    found = true
                    break
                }
            }
            if !found {
                // Insert system block at the beginning
                block := turns.NewSystemTextBlock(prompt)
                _ = turns.KeyBlockMetaMiddleware.Set(&block.Metadata, "systemprompt")
                turns.PrependBlock(t, block)
            }
            return next(ctx, t)
        }
    }
}

Note how the middleware tags the block with KeyBlockMetaMiddleware — this records provenance (which middleware touched this block), enabling debugging tools to show attribution.


Example: Post-processing middleware (output parsing)

Middlewares can also inspect and act on the model's output after inference. This pattern is used by the agent-mode middleware to detect mode-switch signals in the model's response:

postProcessMw := func(next middleware.HandlerFunc) middleware.HandlerFunc {
    return func(ctx context.Context, t *turns.Turn) (*turns.Turn, error) {
        // Call the next handler (or engine) first
        result, err := next(ctx, t)
        if err != nil {
            return result, err
        }

        // Examine model output blocks
        for _, b := range result.Blocks {
            if b.Kind == turns.BlockKindLLMText {
                text, _ := b.Payload[turns.PayloadKeyText].(string)
                // Parse structured content from model output,
                // update Turn.Data, emit events, etc.
                _ = text
            }
        }
        return result, nil
    }
}

This two-phase capability (pre-processing + post-processing) is what makes middleware a full prompting technique rather than just a request filter.


Composing Multiple Middlewares

Middlewares run in the order they're provided: the first one listed is the outermost wrapper, so it sees the request first and the response last.

e := baseEngine
builder := enginebuilder.New(
    enginebuilder.WithBase(e),
    enginebuilder.WithMiddlewares(logMw /*, sysPromptMw, ... */),
)
// Now: RunInference -> logMw -> engine

To stack several middlewares, pass them all in a single WithMiddlewares call:

builder := enginebuilder.New(
    enginebuilder.WithBase(baseEngine),
    enginebuilder.WithMiddlewares(logMw, safetyMw),
)

| Order | Middleware | Why |
|---|---|---|
| 1 | Logging | Capture all requests, including rejected ones |
| 2 | Rate limiting | Block before expensive operations |
| 3 | Safety filters | Block before reaching provider |
| 4 | Mode switching | Set context (e.g., agent mode) before provider call |
| 5 | (Engine) | The actual provider call |

General principle: Middlewares that reject/filter go first; middlewares that modify/augment go last.
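The effect of this ordering can be illustrated with a small sketch: a logging middleware placed outside a safety filter still records requests that the filter rejects, while the engine is never called for them. Everything here is a stand-in written for illustration (`Turn`, `safety`, `logging`, `runOrdered`, and the banned-word check are hypothetical, not Geppetto APIs).

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"strings"
)

// Turn is a simplified stand-in for turns.Turn (assumption for this sketch).
type Turn struct{ Text string }

type HandlerFunc func(ctx context.Context, t *Turn) (*Turn, error)
type Middleware func(HandlerFunc) HandlerFunc

var errBlocked = errors.New("request blocked by safety filter")

// safety rejects turns containing the banned word without calling next.
func safety(banned string) Middleware {
	return func(next HandlerFunc) HandlerFunc {
		return func(ctx context.Context, t *Turn) (*Turn, error) {
			if strings.Contains(t.Text, banned) {
				return nil, errBlocked // short-circuit: the engine is never reached
			}
			return next(ctx, t)
		}
	}
}

// logging appends one entry per call describing how the call ended.
func logging(entries *[]string) Middleware {
	return func(next HandlerFunc) HandlerFunc {
		return func(ctx context.Context, t *Turn) (*Turn, error) {
			res, err := next(ctx, t)
			if err != nil {
				*entries = append(*entries, "failed: "+err.Error())
			} else {
				*entries = append(*entries, "ok")
			}
			return res, err
		}
	}
}

// runOrdered sends each input through logging -> safety -> engine and returns
// the log entries together with the number of engine calls.
func runOrdered(inputs ...string) (entries []string, engineCalls int) {
	engine := func(ctx context.Context, t *Turn) (*Turn, error) {
		engineCalls++
		return t, nil
	}
	handler := logging(&entries)(safety("forbidden")(engine))
	for _, in := range inputs {
		_, _ = handler(context.Background(), &Turn{Text: in})
	}
	return entries, engineCalls
}

func main() {
	entries, calls := runOrdered("hello", "a forbidden topic")
	fmt.Println(entries, calls)
}
```

If the two middlewares were swapped, the rejected request would never reach the logger, which is why observers go outside filters.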


Guidance and best practices

  • Keep middlewares stateless when possible; prefer reading/writing on the provided *turns.Turn
  • Prefer structured Turn data (blocks + typed metadata keys) over parsing raw text when possible
  • Log with context (correlation IDs in Turn.Metadata), but avoid leaking sensitive data
  • Ensure the middleware chain always calls next unless you intend to short-circuit

Lessons learned

  • Prefer per-Turn data hints over global state: attach small hints on Turn.Data using typed keys (e.g., turns.KeyAgentMode from geppetto/pkg/turns, or application-specific keys from moments/backend/pkg/turnkeys) to guide downstream middlewares without tight coupling. Define keys in *_keys.go and reuse the canonical variables everywhere else.
  • Reuse shared parsers/utilities: use a central YAML fenced-block parser to reliably extract structured content from LLM output instead of ad-hoc regex.
  • Compose by concern: keep provider-specific logic in engines and cross-cutting concerns (logging, validation, mode switching) in middleware.
  • Make instructions explicit: when asking models to emit structured control output (like mode switches), provide a clear fenced YAML template and ask for long analysis when needed.
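As an illustration of the shared-parser point, the sketch below extracts the body of the first fenced YAML block from model output using only the standard library. It is a simplified stand-in, not the central parser mentioned above: `extractFencedYAML` and `demoExtract` are hypothetical names, the `mode_switch` / `new_mode` keys are invented for the example, and actually decoding the YAML would need a YAML library.

```go
package main

import (
	"fmt"
	"strings"
)

// fence is built at runtime so this listing contains no literal triple backtick.
var fence = strings.Repeat("`", 3)

// extractFencedYAML returns the body of the first fenced yaml block in text
// and reports whether a complete (opened and closed) block was found.
func extractFencedYAML(text string) (string, bool) {
	open := fence + "yaml"
	start := strings.Index(text, open)
	if start < 0 {
		return "", false
	}
	rest := text[start+len(open):]
	// Skip the remainder of the opening fence line.
	newline := strings.Index(rest, "\n")
	if newline < 0 {
		return "", false
	}
	rest = rest[newline+1:]
	end := strings.Index(rest, fence)
	if end < 0 {
		return "", false // block was opened but never closed
	}
	return strings.TrimRight(rest[:end], "\n"), true
}

// demoExtract runs the extractor on a hypothetical model response.
func demoExtract() (string, bool) {
	output := "Switching modes.\n" +
		fence + "yaml\n" +
		"mode_switch:\n" +
		"  new_mode: financial_analyst\n" +
		fence + "\nDone."
	return extractFencedYAML(output)
}

func main() {
	body, ok := demoExtract()
	fmt.Println(ok)
	fmt.Println(body)
}
```

Keeping this logic in one place means every middleware treats unterminated or missing blocks the same way.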

Profile-Scoped Middleware Configuration

In current app integrations, middleware selection and config are profile-scoped runtime data:

{
  "runtime": {
    "middlewares": [
      {
        "name": "agentmode",
        "id": "default",
        "config": {
          "default_mode": "financial_analyst"
        }
      }
    ]
  }
}

The profile controls:

  • middleware ordering,
  • per-instance identity (id),
  • enabled/disabled flags,
  • config payload values.
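For readers consuming this shape from Go, the following sketch decodes the example profile with encoding/json. The Go type and function names (`Profile`, `MiddlewareUse`, `parseProfile`, `configValue`) are hypothetical; only the JSON keys are taken from the example above. The config payload is kept as raw JSON so that each middleware can decode it against its own schema later.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// MiddlewareUse mirrors one entry of runtime.middlewares. The Go names are
// hypothetical; only the JSON keys come from the example profile.
type MiddlewareUse struct {
	Name   string          `json:"name"`
	ID     string          `json:"id"`
	Config json.RawMessage `json:"config"` // left raw until the middleware is known
}

// Profile models just the part of a profile that this page discusses.
type Profile struct {
	Runtime struct {
		Middlewares []MiddlewareUse `json:"middlewares"`
	} `json:"runtime"`
}

const sampleProfile = `{
  "runtime": {
    "middlewares": [
      {"name": "agentmode", "id": "default", "config": {"default_mode": "financial_analyst"}}
    ]
  }
}`

// parseProfile decodes a profile document.
func parseProfile(data string) (Profile, error) {
	var p Profile
	err := json.Unmarshal([]byte(data), &p)
	return p, err
}

// configValue decodes one middleware's config and returns a string field from it.
func configValue(use MiddlewareUse, key string) (string, error) {
	var cfg map[string]any
	if err := json.Unmarshal(use.Config, &cfg); err != nil {
		return "", err
	}
	s, _ := cfg[key].(string)
	return s, nil
}

func main() {
	p, err := parseProfile(sampleProfile)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	// Slice order is the middleware order declared by the profile.
	for i, use := range p.Runtime.Middlewares {
		mode, _ := configValue(use, "default_mode")
		fmt.Println(i, use.Name, use.ID, mode)
	}
}
```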

Write-Time Validation Model

Profile write APIs validate middleware entries before persistence:

  • unknown middleware names fail hard (400 + validation error),
  • config payloads are coerced and validated against middleware JSON schema,
  • invalid shape/types fail hard (400 + validation error with field path).

This avoids storing profile data that only fails later at compose-time.
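The validation flow can be sketched as follows. This is an illustration under stated assumptions, not the application's validator: a hand-written check stands in for JSON schema validation, the `validators` registry and all function names are hypothetical, and mapping the returned error to an HTTP 400 response is left to the caller.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// MiddlewareUse is a hypothetical, trimmed-down profile entry.
type MiddlewareUse struct {
	Name   string
	Config json.RawMessage
}

// validators maps registered middleware names to a config check. A real system
// would validate against the middleware's JSON schema; this is a stand-in.
var validators = map[string]func(cfg map[string]any) error{
	"agentmode": func(cfg map[string]any) error {
		if v, ok := cfg["default_mode"]; ok {
			if _, isString := v.(string); !isString {
				return errors.New("default_mode: expected a string")
			}
		}
		return nil
	},
}

// validateMiddlewares returns the first problem found, prefixed with a field path.
func validateMiddlewares(uses []MiddlewareUse) error {
	for i, u := range uses {
		check, ok := validators[u.Name]
		if !ok {
			return fmt.Errorf("runtime.middlewares[%d].name: unknown middleware %q", i, u.Name)
		}
		cfg := map[string]any{}
		if len(u.Config) > 0 {
			if err := json.Unmarshal(u.Config, &cfg); err != nil {
				return fmt.Errorf("runtime.middlewares[%d].config: %w", i, err)
			}
		}
		if err := check(cfg); err != nil {
			return fmt.Errorf("runtime.middlewares[%d].config.%w", i, err)
		}
	}
	return nil
}

// validateOne validates a single entry and returns the error text ("" if valid).
func validateOne(name, config string) string {
	err := validateMiddlewares([]MiddlewareUse{{Name: name, Config: json.RawMessage(config)}})
	if err != nil {
		return err.Error()
	}
	return ""
}

func main() {
	fmt.Printf("%q\n", validateOne("agentmode", `{"default_mode": "financial_analyst"}`))
	fmt.Printf("%q\n", validateOne("unknown", `{}`))
	fmt.Printf("%q\n", validateOne("agentmode", `{"default_mode": 42}`))
}
```

Running the check at write time means a bad entry is rejected together with the path of the offending field, instead of surfacing later when the engine is composed.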

Schema Discovery for Frontends

Schema catalogs can be exposed by app APIs:

  • GET /api/chat/schemas/middlewares returns middleware names + metadata + JSON schema payloads,
  • GET /api/chat/schemas/extensions returns typed extension keys + JSON schema payloads.

Middleware schema item contract:

{
  "name": "agentmode",
  "version": 1,
  "display_name": "Agent Mode",
  "description": "Inject mode guidance and parse mode switches.",
  "schema": {
    "type": "object",
    "properties": {
      "default_mode": { "type": "string" }
    }
  }
}

Extension schema item contract:

{
  "key": "middleware.agentmode_config@v1",
  "schema": {
    "type": "object",
    "properties": {
      "instances": {
        "type": "object",
        "additionalProperties": { "type": "object" }
      }
    },
    "required": ["instances"],
    "additionalProperties": false
  }
}

Important behavior:

  • middleware config is stored as profile extensions under typed keys (for example middleware.agentmode_config@v1),
  • extension schema discovery can include middleware-derived keys and codec-discovered keys,
  • explicit app-provided extension schemas win on duplicate keys.
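The precedence rule in the last bullet amounts to a two-pass merge. A minimal sketch, assuming schemas are keyed by extension key and represented here as plain strings for brevity (`mergeSchemas` and `demoMerge` are hypothetical names, and the second key is invented for the example):

```go
package main

import "fmt"

// mergeSchemas combines discovered schemas (middleware-derived or
// codec-discovered) with explicit app-provided ones. Explicit entries are
// written last, so they win when both maps contain the same key.
func mergeSchemas(discovered, explicit map[string]string) map[string]string {
	merged := make(map[string]string, len(discovered)+len(explicit))
	for key, schema := range discovered {
		merged[key] = schema
	}
	for key, schema := range explicit {
		merged[key] = schema
	}
	return merged
}

// demoMerge uses one key from this page and one hypothetical key.
func demoMerge() map[string]string {
	discovered := map[string]string{
		"middleware.agentmode_config@v1": `{"type":"object"}`,
		"example.other_key@v1":           `{"type":"string"}`,
	}
	explicit := map[string]string{
		"middleware.agentmode_config@v1": `{"type":"object","additionalProperties":false}`,
	}
	return mergeSchemas(discovered, explicit)
}

func main() {
	for key, schema := range demoMerge() {
		fmt.Println(key, schema)
	}
}
```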

Frontend editors can use these endpoints to build profile forms and validate payloads before app-owned persistence or registry export/import flows.


Troubleshooting

  • Middleware not running: ensure you’re using enginebuilder.New(... enginebuilder.WithMiddlewares(...)) (or that you’re applying middleware.Chain(...) in your own engine adapter).
  • Wrong ordering: remember middleware.Chain(handler, m1, m2, m3) runs as m1(m2(m3(handler))), so m1 sees the request first and the response last.
  • Nil Turn: most middleware should be defensive if t == nil (either treat as empty turn or error early).
  • validation error (runtime.middlewares[*].name): middleware definition is not registered in the application runtime definition registry.
  • validation error (runtime.middlewares[*].config): payload does not satisfy the middleware JSON schema. Fetch schema from /api/chat/schemas/middlewares and fix payload shape/types.
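For the nil-Turn case, one option is a small guard placed at the front of the chain so that later middlewares can assume a non-nil Turn. The sketch below takes the "error early" route; `Turn`, `Block`, `requireTurn`, and `demoGuard` are stand-ins written for this example, not Geppetto APIs.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Block and Turn are simplified stand-ins for the turns package types.
type Block struct{ Text string }
type Turn struct{ Blocks []Block }

type HandlerFunc func(ctx context.Context, t *Turn) (*Turn, error)

var errNilTurn = errors.New("middleware: nil turn")

// requireTurn fails fast on a nil Turn instead of letting a later
// middleware panic when it dereferences t.
func requireTurn(next HandlerFunc) HandlerFunc {
	return func(ctx context.Context, t *Turn) (*Turn, error) {
		if t == nil {
			return nil, errNilTurn
		}
		return next(ctx, t)
	}
}

// demoGuard calls the guarded handler once with nil and once with an empty
// Turn. It returns the error from the nil call and the block count of the
// successful call.
func demoGuard() (nilErr error, okBlocks int) {
	engine := func(ctx context.Context, t *Turn) (*Turn, error) {
		t.Blocks = append(t.Blocks, Block{Text: "response"})
		return t, nil
	}
	handler := requireTurn(engine)
	_, nilErr = handler(context.Background(), nil)
	res, err := handler(context.Background(), &Turn{})
	if err != nil {
		return nilErr, 0
	}
	return nilErr, len(res.Blocks)
}

func main() {
	nilErr, blocks := demoGuard()
	fmt.Println(nilErr, blocks)
}
```

The alternative mentioned above, treating nil as an empty Turn, would replace the early return with `t = &Turn{}` before calling next.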

See Also

  • Inference Engines — The engines that middlewares wrap; see "Complete Runtime Flow"
  • Turns and Blocks — The Turn data model; see "How Blocks Accumulate"
  • Sessions — Multi-turn session management
  • Events — Event publishing from middlewares
  • Structured Sinks — How middleware and sinks compose for structured output extraction
  • Real-world examples: geppetto/pkg/inference/middleware/agentmode/, geppetto/pkg/inference/middleware/sqlitetool/