A practical guide to writing, composing, and using middlewares with Turn-based engines.
Middlewares let you add behavior around inference calls without modifying the engine itself. They're the standard pattern for cross-cutting concerns such as logging, safety checks, and tracing.
Middlewares compose cleanly: wrap an engine once, and all calls to RunInference pass through the chain.
    Request → [Logging] → [Engine] → Response
                  ↓           ↓
              [Logging]   ←
The examples above (logging, safety, tracing) are infrastructure middleware — they observe or gate inference for operational concerns. But the most powerful use of middleware in Geppetto is as composable prompting techniques.
Most LLM frameworks treat prompt construction as a single function that builds a string. If you want a system prompt, you concatenate it. If you want tool instructions, you concatenate more. If you want mode-specific guidance, you add more text. The result is a fragile, monolithic prompt builder.
Middleware inverts this: each prompting technique is a separate, composable wrapper that adds its contribution to the Turn. Real examples in the codebase:
| Middleware | What it does | Type of change |
|---|---|---|
| System prompt | Ensures the correct system block exists; adds or replaces it | Block insertion/replacement |
| Tool reorder | Moves tool_use blocks to sit adjacent to their tool_call blocks | Block reordering |
| Agent mode | Injects mode-specific guidance blocks; parses model output for mode switches | Block insertion + output parsing |
| SQLite tool | Registers a database query tool into the runtime registry | Configuration change (no text change) |
Each technique is:
- Block.Metadata) for debugging

Not all middleware effects are visible as text changes. Some modify Turn configuration (Turn.Data), register tools, or emit events. A debugging UI must surface these "invisible" changes alongside content diffs.
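package middleware

import (
	"context"
	"github.com/go-go-golems/geppetto/pkg/turns" // import path assumed from the geppetto/pkg/turns reference
)
// HandlerFunc runs one inference step on a Turn and returns the resulting Turn.
type HandlerFunc func(ctx context.Context, t *turns.Turn) (*turns.Turn, error)
// Middleware wraps a HandlerFunc with additional behavior.
type Middleware func(HandlerFunc) HandlerFunc
// Chain composes multiple middleware into a single HandlerFunc.
func Chain(handler HandlerFunc, middlewares ...Middleware) HandlerFunc { /* ... */ }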
Conceptually, a middleware takes a HandlerFunc (the next step) and returns a new HandlerFunc that adds behavior before and/or after calling next.
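The wrapping order is easiest to see in a small, self-contained sketch. The `Turn` type below is a stand-in for `turns.Turn`, the `tag` helper is hypothetical, and the `Chain` body is only one plausible implementation, assuming the first middleware passed ends up outermost (the ordering this guide describes).

```go
package main

import (
	"context"
	"fmt"
)

// Turn is a stand-in for turns.Turn so the sketch compiles on its own.
type Turn struct {
	Blocks []string
}

type HandlerFunc func(ctx context.Context, t *Turn) (*Turn, error)
type Middleware func(HandlerFunc) HandlerFunc

// Chain wraps handler so that the first middleware is the outermost layer:
// Chain(h, m1, m2) behaves like m1(m2(h)). This body is an assumption,
// not a copy of the library's implementation.
func Chain(handler HandlerFunc, middlewares ...Middleware) HandlerFunc {
	// Apply in reverse so middlewares[0] wraps everything else.
	for i := len(middlewares) - 1; i >= 0; i-- {
		handler = middlewares[i](handler)
	}
	return handler
}

// tag is a hypothetical middleware that records when it runs,
// both before and after calling next.
func tag(name string, trace *[]string) Middleware {
	return func(next HandlerFunc) HandlerFunc {
		return func(ctx context.Context, t *Turn) (*Turn, error) {
			*trace = append(*trace, name+":before")
			res, err := next(ctx, t)
			*trace = append(*trace, name+":after")
			return res, err
		}
	}
}

// runTrace runs a two-middleware chain and returns the recorded call order.
func runTrace() []string {
	var trace []string
	engine := func(ctx context.Context, t *Turn) (*Turn, error) {
		trace = append(trace, "engine")
		return t, nil
	}
	h := Chain(engine, tag("m1", &trace), tag("m2", &trace))
	_, _ = h(context.Background(), &Turn{})
	return trace
}

func main() {
	fmt.Println(runTrace())
}
```

Under these assumptions, the "before" half of each middleware runs in the order given and the "after" half runs in reverse, which is why the outermost middleware sees both the original request and the final result.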
logMw := func(next middleware.HandlerFunc) middleware.HandlerFunc {
	return func(ctx context.Context, t *turns.Turn) (*turns.Turn, error) {
		logger := log.With().Int("block_count", len(t.Blocks)).Logger()
		logger.Info().Msg("Starting inference")
		res, err := next(ctx, t)
		if err != nil {
			logger.Error().Err(err).Msg("Inference failed")
		} else {
			logger.Info().Int("result_block_count", len(res.Blocks)).Msg("Inference completed")
		}
		return res, err
	}
}
builder := enginebuilder.New(
	enginebuilder.WithBase(baseEngine),
	enginebuilder.WithMiddlewares(logMw),
)
Unlike the logging example above, this middleware modifies the Turn's content before inference — it ensures a system block is always present with the correct text:
systemPromptMw := func(prompt string) middleware.Middleware {
	return func(next middleware.HandlerFunc) middleware.HandlerFunc {
		return func(ctx context.Context, t *turns.Turn) (*turns.Turn, error) {
			// Check if a system block already exists
			found := false
			for i, b := range t.Blocks {
				if b.Kind == turns.BlockKindSystem {
					// Update existing system block
					t.Blocks[i].Payload[turns.PayloadKeyText] = prompt
					_ = turns.KeyBlockMetaMiddleware.Set(&t.Blocks[i].Metadata, "systemprompt")
					found = true
					break
				}
			}
			if !found {
				// Insert system block at the beginning
				block := turns.NewSystemTextBlock(prompt)
				_ = turns.KeyBlockMetaMiddleware.Set(&block.Metadata, "systemprompt")
				turns.PrependBlock(t, block)
			}
			return next(ctx, t)
		}
	}
}
Note how the middleware tags the block with KeyBlockMetaMiddleware — this records provenance (which middleware touched this block), enabling debugging tools to show attribution.
Middlewares can also inspect and act on the model's output after inference. This pattern is used by the agent-mode middleware to detect mode-switch signals in the model's response:
postProcessMw := func(next middleware.HandlerFunc) middleware.HandlerFunc {
	return func(ctx context.Context, t *turns.Turn) (*turns.Turn, error) {
		// Call the next handler (or engine) first
		result, err := next(ctx, t)
		if err != nil {
			return result, err
		}
		// Examine model output blocks
		for _, b := range result.Blocks {
			if b.Kind == turns.BlockKindLLMText {
				text, _ := b.Payload[turns.PayloadKeyText].(string)
				// Parse structured content from model output,
				// update Turn.Data, emit events, etc.
				_ = text
			}
		}
		return result, nil
	}
}
This two-phase capability (pre-processing + post-processing) is what makes middleware a full prompting technique rather than just a request filter.
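To make the two phases concrete, here is a minimal sketch of a single middleware that does both. Everything in it is illustrative: `Block` and `Turn` are simplified stand-ins for the types in `geppetto/pkg/turns`, and the `SWITCH_MODE:` marker, the `agent_mode` key, and `fakeEngine` are hypothetical, not the conventions used by the real agent-mode middleware.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// Simplified stand-ins for the real Turn and Block types.
type Block struct {
	Kind string
	Text string
}

type Turn struct {
	Blocks []Block
	Data   map[string]string
}

type HandlerFunc func(ctx context.Context, t *Turn) (*Turn, error)
type Middleware func(HandlerFunc) HandlerFunc

// switchPrefix is a hypothetical marker the model is asked to emit.
const switchPrefix = "SWITCH_MODE:"

// modeMw injects guidance before inference and parses the output afterwards.
func modeMw(guidance string) Middleware {
	return func(next HandlerFunc) HandlerFunc {
		return func(ctx context.Context, t *Turn) (*Turn, error) {
			// Phase 1 (pre-processing): add a guidance block to the Turn.
			t.Blocks = append(t.Blocks, Block{Kind: "system", Text: guidance})

			res, err := next(ctx, t)
			if err != nil {
				return res, err
			}

			// Phase 2 (post-processing): scan model text for a mode switch.
			for _, b := range res.Blocks {
				if b.Kind != "llm_text" {
					continue
				}
				for _, line := range strings.Split(b.Text, "\n") {
					line = strings.TrimSpace(line)
					if strings.HasPrefix(line, switchPrefix) {
						if res.Data == nil {
							res.Data = map[string]string{}
						}
						res.Data["agent_mode"] = strings.TrimSpace(strings.TrimPrefix(line, switchPrefix))
					}
				}
			}
			return res, nil
		}
	}
}

// fakeEngine simulates a provider call that answers with a mode switch.
func fakeEngine(ctx context.Context, t *Turn) (*Turn, error) {
	t.Blocks = append(t.Blocks, Block{Kind: "llm_text", Text: "Done.\nSWITCH_MODE: reviewer"})
	return t, nil
}

// run wraps the fake engine with modeMw and performs one inference.
func run() *Turn {
	h := modeMw("You are in analyst mode.")(fakeEngine)
	res, _ := h(context.Background(), &Turn{})
	return res
}

func main() {
	res := run()
	fmt.Println(len(res.Blocks), res.Data["agent_mode"])
}
```

The same closure owns both halves, so the post-processing step can rely on whatever the pre-processing step injected without any shared global state.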
Middlewares run in the order they're provided:
e := baseEngine
builder := enginebuilder.New(
	enginebuilder.WithBase(e),
	enginebuilder.WithMiddlewares(logMw /*, sysPromptMw, ... */),
)
// Now: RunInference -> logMw -> engine
To register several middlewares, pass them all in a single WithMiddlewares call:
builder := enginebuilder.New(
	enginebuilder.WithBase(baseEngine),
	enginebuilder.WithMiddlewares(logMw, safetyMw),
)
| Order | Middleware | Why |
|---|---|---|
| 1 | Logging | Capture all requests, including rejected ones |
| 2 | Rate limiting | Block before expensive operations |
| 3 | Safety filters | Block before reaching provider |
| 4 | Mode switching | Set context (e.g., agent mode) before provider call |
| 5 | (Engine) | The actual provider call |
General principle: Middlewares that reject/filter go first; middlewares that modify/augment go last.
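The ordering principle can be checked with a small sketch in which logging sits outside a rejecting filter. The types are stand-ins for the real ones, and `loggingMw`, `safetyMw`, and the banned-substring check are hypothetical simplifications of whatever a production safety filter would do.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"strings"
)

// Turn is a stand-in for turns.Turn; blocks are plain strings here.
type Turn struct {
	Blocks []string
}

type HandlerFunc func(ctx context.Context, t *Turn) (*Turn, error)
type Middleware func(HandlerFunc) HandlerFunc

var errBlocked = errors.New("blocked by safety filter")

// loggingMw records the start and the outcome of every call, including rejections.
func loggingMw(entries *[]string) Middleware {
	return func(next HandlerFunc) HandlerFunc {
		return func(ctx context.Context, t *Turn) (*Turn, error) {
			*entries = append(*entries, "start")
			res, err := next(ctx, t)
			if err != nil {
				*entries = append(*entries, "rejected: "+err.Error())
			} else {
				*entries = append(*entries, "ok")
			}
			return res, err
		}
	}
}

// safetyMw short-circuits: it returns without calling next when a block
// contains the banned substring, so the engine is never reached.
func safetyMw(banned string) Middleware {
	return func(next HandlerFunc) HandlerFunc {
		return func(ctx context.Context, t *Turn) (*Turn, error) {
			for _, b := range t.Blocks {
				if strings.Contains(b, banned) {
					return nil, errBlocked
				}
			}
			return next(ctx, t)
		}
	}
}

// run sends one input through logging -> safety -> engine and reports
// the log entries and how many times the engine was called.
func run(input string) ([]string, int) {
	var entries []string
	engineCalls := 0
	engine := func(ctx context.Context, t *Turn) (*Turn, error) {
		engineCalls++
		return t, nil
	}
	h := loggingMw(&entries)(safetyMw("forbidden")(engine))
	_, _ = h(context.Background(), &Turn{Blocks: []string{input}})
	return entries, engineCalls
}

func main() {
	fmt.Println(run("a forbidden request"))
	fmt.Println(run("hello"))
}
```

Because logging wraps the filter, the rejected request still produces a log entry, while the engine call count stays at zero for that request.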
- *turns.Turn
- Turn.Metadata), but avoid leaking sensitive data
- Call next unless you intend to short-circuit
- Set hints in Turn.Data using typed keys (e.g., turns.KeyAgentMode from geppetto/pkg/turns, or application-specific keys from moments/backend/pkg/turnkeys) to guide downstream middlewares without tight coupling. Define keys in *_keys.go and reuse the canonical variables everywhere else.

In current app integrations, middleware selection and config are profile-scoped runtime data:
{
  "runtime": {
    "middlewares": [
      {
        "name": "agentmode",
        "id": "default",
        "config": {
          "default_mode": "financial_analyst"
        }
      }
    ]
  }
}
The profile controls:
- id),

Profile write APIs validate middleware entries before persistence:

- 400 + validation error),
- 400 + validation error with field path).

This avoids storing profile data that only fails later at compose-time.
Schema catalogs can be exposed by app APIs:
- GET /api/chat/schemas/middlewares returns middleware names + metadata + JSON schema payloads,
- GET /api/chat/schemas/extensions returns typed extension keys + JSON schema payloads.

Middleware schema item contract:
{
  "name": "agentmode",
  "version": 1,
  "display_name": "Agent Mode",
  "description": "Inject mode guidance and parse mode switches.",
  "schema": {
    "type": "object",
    "properties": {
      "default_mode": { "type": "string" }
    }
  }
}
Extension schema item contract:
{
  "key": "middleware.agentmode_config@v1",
  "schema": {
    "type": "object",
    "properties": {
      "instances": {
        "type": "object",
        "additionalProperties": { "type": "object" }
      }
    },
    "required": ["instances"],
    "additionalProperties": false
  }
}
Important behavior:
- middleware.agentmode_config@v1),

Frontend editors can use these endpoints to build profile forms and validate payloads before app-owned persistence or registry export/import flows.
- enginebuilder.New(... enginebuilder.WithMiddlewares(...)) (or that you’re applying middleware.Chain(...) in your own engine adapter).
- middleware.Chain(m1, m2, m3) runs as m1(m2(m3(next))).
- t == nil (either treat as empty turn or error early).
- validation error (runtime.middlewares[*].name): middleware definition is not registered in the application runtime definition registry.
- validation error (runtime.middlewares[*].config): payload does not satisfy the middleware JSON schema. Fetch schema from /api/chat/schemas/middlewares and fix payload shape/types.
- geppetto/pkg/inference/middleware/agentmode/, geppetto/pkg/inference/middleware/sqlitetool/