---
title: Minitrace Schema Reference
description: Field-by-field reference for the minitrace session JSON format
doc_version: 1
last_updated: 2026-07-02
---


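Every converted session is a single JSON file conforming to the minitrace schema (currently `minitrace-v0.2.0`). This page documents every field, its type, and what it means. In the type columns, a trailing `?` (for example `string?`) marks an optional field that may be null.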

The authoritative Go types live in `pkg/minitrace/schema.go`. The JSON field names match the struct tags exactly.

## Session (top level)

The session is the root object in each `.minitrace.json` file.

| Field | Type | Description |
|-------|------|-------------|
| `id` | string | Unique session identifier, usually the original session UUID |
| `schema_version` | string | Always `minitrace-v0.2.0` for current output |
| `profile` | string | Session profile, typically `organic` for real sessions |
| `scenario_id` | string? | Reserved for synthetic/benchmark scenarios |
| `quality` | string? | Quality tier: `A` (conversation turns, tool calls with output, >10 tool calls, >5 turns), `B` (has conversation turns but misses an A threshold), `C` (no conversation turns); see Quality tiers |
| `title` | string? | Auto-extracted from the first human turn (truncated to 80 chars) |
| `summary` | string? | Optional session summary |
| `classification` | string | Always `internal` for locally converted sessions |
| `provenance` | object | Where this session came from (see below) |
| `flags` | object | Data quality flags |
| `environment` | object | Model, framework, and tools configuration |
| `operational_context` | object | Working directory, git state, autonomy level |
| `timing` | object | Timestamps, duration, and time-of-day information |
| `condition` | object? | Experimental condition metadata |
| `coordination` | object | Multi-session coordination info |
| `handover` | object | Session handover documents |
| `turns` | array | Conversation turns in order |
| `tool_calls` | array | Every tool invocation with input and output |
| `outcome` | object? | Success/failure of the session |
| `annotations` | array | Human or automated annotations |
| `metrics` | object | Computed summary metrics |
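
When consuming these files from Go, one way to keep the `?` fields distinguishable from zero values is to decode them into pointers. The sketch below is a minimal, hypothetical struct covering only a few top-level fields; it is not the authoritative type in `pkg/minitrace/schema.go`, and `sessionHeader` and `parseSession` are illustrative names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sessionHeader is a hypothetical partial view of a minitrace session.
// Optional fields (`string?` in the tables) are pointers so that null or
// absent values decode to nil instead of "".
type sessionHeader struct {
	ID            string            `json:"id"`
	SchemaVersion string            `json:"schema_version"`
	Profile       string            `json:"profile"`
	Quality       *string           `json:"quality"`
	Title         *string           `json:"title"`
	Turns         []json.RawMessage `json:"turns"`      // decoded lazily
	ToolCalls     []json.RawMessage `json:"tool_calls"` // decoded lazily
}

// parseSession decodes the header and rejects files written for another
// schema version (a strict choice; a tolerant reader might only warn).
func parseSession(data []byte) (sessionHeader, error) {
	var s sessionHeader
	if err := json.Unmarshal(data, &s); err != nil {
		return s, err
	}
	if s.SchemaVersion != "minitrace-v0.2.0" {
		return s, fmt.Errorf("unsupported schema_version %q", s.SchemaVersion)
	}
	return s, nil
}

func main() {
	raw := []byte(`{"id":"abc","schema_version":"minitrace-v0.2.0","profile":"organic",
		"quality":"B","title":null,"turns":[{},{}],"tool_calls":[]}`)
	s, err := parseSession(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(s.ID, *s.Quality, s.Title == nil, len(s.Turns), len(s.ToolCalls))
}
```

Keeping `turns` and `tool_calls` as `json.RawMessage` defers decoding the large arrays until you actually need them.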

## Provenance

Tracks where the session came from so you can trace any converted file back to its original source.

| Field | Type | Description |
|-------|------|-------------|
| `source_format` | string | Adapter-specific format identifier, e.g. `claude-code-jsonl-v2`, `pi-agent-jsonl-v3`, `pinocchio-turns-sqlite-v1` |
| `source_path` | string? | Path to the original file (home directory normalized to `~`) |
| `converted_at` | string | RFC 3339 timestamp of when conversion ran |
| `converter_version` | string | Converter identifier, e.g. `go-minitrace-claude-adapter-dev` |
| `original_session_id` | string? | The session ID in the original format |
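
The home-directory normalization applied to `source_path` can be sketched as a prefix rewrite. This assumes POSIX `/` separators and that only the converting user's home directory is replaced; `normalizeHome` is a hypothetical helper, and the converter's actual rules may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeHome rewrites a path under home to start with "~".
// Hypothetical helper; assumes POSIX "/" separators.
func normalizeHome(path, home string) string {
	home = strings.TrimSuffix(home, "/")
	switch {
	case home == "":
		return path
	case path == home:
		return "~"
	case strings.HasPrefix(path, home+"/"):
		return "~" + path[len(home):]
	default:
		return path
	}
}

func main() {
	home := "/home/alice"
	fmt.Println(normalizeHome("/home/alice/sessions/a.jsonl", home))
	// A sibling directory that merely shares the prefix is left unchanged.
	fmt.Println(normalizeHome("/home/alicex/notes.txt", home))
}
```

In practice you would pass the result of `os.UserHomeDir()` as `home`. Checking for `home + "/"` rather than `home` alone avoids rewriting sibling directories such as `/home/alicex`.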

## Flags

Data quality signals set during conversion. Most converters set `needs_cleaning: true` since raw sessions are not curated.

| Field | Type | Description |
|-------|------|-------------|
| `for_research` | bool | Whether this session is flagged for research use |
| `needs_cleaning` | bool | Whether the session needs manual review |
| `contains_error` | bool | Whether conversion detected errors in the session |
| `contains_pii` | bool | Whether file paths contain `/home/` or `/Users/` patterns |
| `category` | string[] | Free-form category tags |
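
The `contains_pii` rule described above is a plain path-pattern check. A minimal sketch, assuming the check runs over whatever file paths the adapter collected (`pathLooksPersonal` and `containsPII` are hypothetical names):

```go
package main

import (
	"fmt"
	"strings"
)

// pathLooksPersonal reports whether a path contains a Linux (/home/) or
// macOS (/Users/) home-directory segment.
func pathLooksPersonal(p string) bool {
	return strings.Contains(p, "/home/") || strings.Contains(p, "/Users/")
}

// containsPII returns true as soon as any collected path matches.
func containsPII(paths []string) bool {
	for _, p := range paths {
		if pathLooksPersonal(p) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(containsPII([]string{"~/src/main.go", "/tmp/build.log"}))
	fmt.Println(containsPII([]string{"/Users/carol/project/README.md"}))
}
```

Because this is a substring test, it errs toward flagging: a path such as `/srv/home/data` also matches.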

## Environment

Captures the model and framework configuration for the session.

| Field | Type | Description |
|-------|------|-------------|
| `model` | string? | Model identifier, e.g. `claude-opus-4-6`, `gpt-5-nano` |
| `model_version` | string? | Specific model version if available |
| `temperature` | float? | Sampling temperature if known |
| `tools_enabled` | string[] | List of tool names available in this session |
| `system_prompt` | string? | System prompt if captured (often null for privacy) |
| `agent_framework` | string? | Framework name: `claude-code`, `codex`, `pi`, `pinocchio`, `claude-ai`, `chatgpt` |
| `agent_version` | string? | Framework version if available |
| `platform_type` | string? | Platform category: `agent`, `chat`, etc. |
| `provider_hint` | string? | API provider: `anthropic`, `openai`, `unknown` |

## Operational Context

Runtime context captured at session start. Availability depends on the source format.

| Field | Type | Description |
|-------|------|-------------|
| `working_directory` | string? | Filesystem path where the agent was running |
| `git_branch` | string? | Active git branch |
| `git_ref` | string? | Git commit reference |
| `autonomy_level` | string? | How autonomous the agent was |
| `sandbox` | bool? | Whether the session ran in a sandbox |
| `framework_config` | any? | Adapter-specific configuration blob, typically for raw session/runtime metadata that does not fit the shared schema |

## Timing

Temporal information about the session. The `privacy_level` field controls how much timing detail is exposed.

| Field | Type | Description |
|-------|------|-------------|
| `privacy_level` | string | Always `full` for locally converted sessions |
| `duration_seconds` | float? | Wall-clock duration from first to last event |
| `active_duration_seconds` | float? | Time excluding gaps longer than 5 minutes (the idle threshold) |
| `started_at` | string? | RFC 3339 start timestamp |
| `ended_at` | string? | RFC 3339 end timestamp |
| `hour_of_day` | int? | Hour (0–23) when the session started |
| `day_of_week` | int? | Day of week (0=Monday, 6=Sunday) |

## Turns

The conversation transcript as an ordered array. Each turn is one message from a participant.

| Field | Type | Description |
|-------|------|-------------|
| `index` | int | Zero-based position in the turn sequence |
| `timestamp` | string? | RFC 3339 timestamp of this turn |
| `role` | string | `user`, `assistant`, or `system` |
| `source` | string? | Origin detail: `human`, `tool_result`, etc. |
| `model` | string? | Model that generated this turn (assistant turns) |
| `content_type` | string? | MIME type hint for the content |
| `input_channel` | string? | How the input arrived |
| `content` | string | The actual message text |
| `framework_metadata` | any? | Adapter-specific turn metadata preserved from the raw transcript |
| `tool_calls_in_turn` | string[] | IDs of tool calls emitted by this turn |
| `thinking` | string? | Chain-of-thought / reasoning text if captured |
| `intent_markers` | object? | Whether this turn was requested, inferred, or proactive |
| `streaming` | object | Whether the turn was streamed (`was_streamed`, `stream_log`) |
| `usage` | object? | Per-turn token usage (see below) |

### Per-turn Usage

| Field | Type | Description |
|-------|------|-------------|
| `input_tokens` | int? | Tokens in the prompt |
| `output_tokens` | int? | Tokens in the response |
| `cache_read_tokens` | int? | Tokens served from cache |
| `cache_creation_tokens` | int? | Tokens that populated the cache |
| `reasoning_tokens` | int? | Reasoning/thinking tokens |
| `tool_tokens` | int? | Tokens consumed by tool use |

## Tool Calls

Every tool invocation is recorded with its input, output, and contextual position within the session.

| Field | Type | Description |
|-------|------|-------------|
| `id` | string | Unique tool call identifier |
| `emitting_turn_index` | int? | Index of the turn that triggered this call |
| `timestamp` | string? | When the tool was invoked |
| `tool_name` | string | Name of the tool, for example `read`, `edit`, `bash`, `write`, `grep`, `agent`. Exact naming and casing can vary by adapter, so prefer checking real data instead of assuming one canonical case. |
| `operation_type` | string | Normalized operation: `READ`, `MODIFY`, `NEW`, `EXECUTE`, `DELEGATE`, `OTHER` |
| `input` | object | Tool input (see below) |
| `output` | object | Tool output (see below) |
| `context` | object | Position and surrounding tool context |
| `framework_metadata` | any? | Adapter-specific tool call metadata preserved from the raw transcript |
| `spawned_agent` | object? | If this tool call delegated to a subagent |

### Tool Call Input

| Field | Type | Description |
|-------|------|-------------|
| `file_path` | string? | File path argument (normalized, `~` for home) |
| `command` | string? | Shell command if applicable |
| `justification` | string? | Tool-use rationale if the source transcript provides one |
| `arguments` | any? | Full arguments blob |

A practical querying note: `input.file_path` is the normalized shared field when the adapter can provide one, while `input.arguments` preserves the tool-specific raw payload. In SQL, with `tc` bound to a single unnested tool call, the safest file-oriented pattern is usually:

```sql
COALESCE(tc->'input'->>'file_path', tc->'input'->'arguments'->>'path')
```

Likewise, shell tools often carry the command in the shared field:

```sql
(tc->'input'->>'command')
```

Tools such as web search may expose their key values under `input.arguments`, for example:

```sql
(tc->'input'->'arguments'->>'query')
```

Do not assume every tool uses the same nested keys. When in doubt, inspect a bounded preview of one unnested tool call first.

### Tool Call Output

| Field | Type | Description |
|-------|------|-------------|
| `success` | bool | Whether the tool call succeeded |
| `result` | string? | Output text (truncated to 10 KB if larger) |
| `error` | string? | Error message if the call failed |
| `exit_code` | int? | Process exit code when the source transcript exposes one |
| `duration_ms` | int? | Execution time in milliseconds |
| `truncated` | bool | Whether the result was truncated |
| `full_bytes` | int? | Original size before truncation |
| `full_hash` | string? | SHA-256 hash of the full output (for deduplication) |
| `full_reference` | string? | External reference to the full output |
| `redacted` | bool? | Whether the output was redacted |
| `content_origin` | string? | Where the content came from |

### Tool Call Context

| Field | Type | Description |
|-------|------|-------------|
| `position_in_session` | float? | Normalized position (0.0 = first tool call, 1.0 = last) |
| `tools_before` | string[] | Names of up to 5 immediately preceding tool calls |
| `time_since_last_user` | float? | Seconds since the last human turn |

### Spawned Agent

Present when a tool call delegates to a subagent (e.g., Claude Code's `Agent` tool).

| Field | Type | Description |
|-------|------|-------------|
| `agent_type` | string | Type of subagent |
| `task_scope` | string | What the subagent was asked to do |
| `sub_session_id` | string? | ID of the subagent's minitrace session |
| `outcome_summary` | string | What the subagent accomplished |

## Framework-specific metadata conventions

Minitrace uses a small number of shared first-class fields plus three explicit escape hatches for source-specific detail:

- `operational_context.framework_config`
- `turns[].framework_metadata`
- `tool_calls[].framework_metadata`

Use these when a raw field is analytically useful but not yet stable enough to become shared schema. See `go-minitrace help framework-metadata-mappings` for the per-adapter mapping tables.

## Metrics

Computed summary statistics. These are calculated during conversion from the turns and tool calls.

| Field | Type | Description |
|-------|------|-------------|
| `turn_count` | int | Total number of turns |
| `tool_call_count` | int | Total number of tool invocations |
| `read_count` | int | Tool calls with operation_type `READ` |
| `modify_count` | int | Tool calls with operation_type `MODIFY` |
| `create_count` | int | Tool calls with operation_type `NEW` |
| `execute_count` | int | Tool calls with operation_type `EXECUTE` |
| `delegate_count` | int | Tool calls with operation_type `DELEGATE` |
| `read_ratio` | float? | `read_count / tool_call_count`: the fraction of tool calls that are reads |
| `time_to_first_action` | float? | Seconds from session start to first tool call |
| `idle_ratio` | float? | `1 - (active_duration / total_duration)` — fraction of time spent idle |
| `total_input_tokens` | int? | Sum of input tokens across all turns |
| `total_output_tokens` | int? | Sum of output tokens across all turns |
| `total_cache_read_tokens` | int? | Sum of cache read tokens |
| `total_cache_creation_tokens` | int? | Sum of cache creation tokens |
| `total_reasoning_tokens` | int? | Sum of reasoning tokens |
| `total_tool_tokens` | int? | Sum of tool tokens |
| `session_cost` | float? | Estimated cost if computable |
| `subagent_count` | int | Number of subagent sessions spawned |
| `subagent_tool_calls` | int | Tool calls made by subagents |
| `model_switches` | int? | Times the model changed during the session |
| `unique_models` | int? | Distinct models used |
| `median_response_tokens` | int? | Median output tokens per assistant turn |
| `max_response_tokens` | int? | Maximum output tokens in any assistant turn |
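
Several metrics are simple derivations from the tool calls, turns, and timing. The sketch below recomputes `read_ratio`, `idle_ratio`, and `median_response_tokens`, returning `nil` where the inputs leave a value undefined, matching the `?` types. For an even number of assistant turns it takes the integer mean of the two middle values; that tie-breaking rule is an assumption, and `summaryInput` and `computeSummary` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// summaryInput is a hypothetical, pre-extracted view of one session.
type summaryInput struct {
	ToolOps          []string // operation_type of every tool call
	AssistantOutputs []int    // output_tokens of every assistant turn
	ActiveSeconds    float64  // timing.active_duration_seconds
	TotalSeconds     float64  // timing.duration_seconds
}

// summary holds the nullable (`?`) metrics as pointers.
type summary struct {
	ReadRatio            *float64
	IdleRatio            *float64
	MedianResponseTokens *int
}

func computeSummary(in summaryInput) summary {
	var s summary
	if n := len(in.ToolOps); n > 0 {
		reads := 0
		for _, op := range in.ToolOps {
			if op == "READ" {
				reads++
			}
		}
		r := float64(reads) / float64(n)
		s.ReadRatio = &r
	}
	if in.TotalSeconds > 0 {
		idle := 1 - in.ActiveSeconds/in.TotalSeconds
		s.IdleRatio = &idle
	}
	if n := len(in.AssistantOutputs); n > 0 {
		sorted := append([]int(nil), in.AssistantOutputs...)
		sort.Ints(sorted)
		m := sorted[n/2]
		if n%2 == 0 { // assumption: integer mean of the two middle values
			m = (sorted[n/2-1] + sorted[n/2]) / 2
		}
		s.MedianResponseTokens = &m
	}
	return s
}

func main() {
	s := computeSummary(summaryInput{
		ToolOps:          []string{"READ", "READ", "MODIFY", "EXECUTE"},
		AssistantOutputs: []int{120, 40, 300},
		ActiveSeconds:    300,
		TotalSeconds:     1200,
	})
	fmt.Printf("read_ratio=%.2f idle_ratio=%.2f median_response_tokens=%d\n",
		*s.ReadRatio, *s.IdleRatio, *s.MedianResponseTokens)
}
```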

## Annotations

Optional human or automated labels attached to a session, turn, or tool call.

At the file-format level, annotations live inside the session JSON as the `annotations` array. In SQL, this appears as the `annotations` column on `sessions_base`, and the normal query pattern is:

```sql
SELECT ...
FROM sessions_base,
     UNNEST(annotations) AS a(ann)
```

The annotation object has these fields:

| Field | Type | Description |
|-------|------|-------------|
| `id` | string | Annotation identifier |
| `timestamp` | string | When the annotation was created |
| `annotator` | string | Who created it (human name or tool ID) |
| `scope.type` | string | What the annotation targets: `session`, `turn`, `tool_call` |
| `scope.target_id` | string | ID of the target |
| `content.category` | string | Annotation category |
| `content.tags` | string[] | Free-form tags |
| `content.title` | string | Short annotation title |
| `content.detail` | string | Full annotation text |
| `taxonomy_mappings` | object | Mappings to minitrace, MAST, and ToolEmu taxonomies |
| `classification` | string? | Annotation classification |

### Annotation scope semantics

The `scope` object is how an annotation is attached to a concrete thing in the transcript.

| `scope.type` | `scope.target_id` meaning |
|--------------|---------------------------|
| `session` | Usually the session ID itself |
| `turn` | The turn index as a string, for example `0` or `14` |
| `tool_call` | The tool-call ID, for example `call_Y70XEopD3Ef1mGctwTXG2CEq` |

This distinction matters in analysis because a session-level label answers a different question than a turn-level or tool-call-level label.
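
A consumer that resolves annotations back to transcript elements has to interpret `target_id` differently per scope type. A hypothetical sketch (`annotationScope` and `describeTarget` are illustrative names) that parses turn targets as zero-based indexes and bounds-checks them:

```go
package main

import (
	"fmt"
	"strconv"
)

// annotationScope mirrors the documented scope object.
type annotationScope struct {
	Type     string `json:"type"`
	TargetID string `json:"target_id"`
}

// describeTarget resolves scope.target_id according to scope.type.
// Turn targets are zero-based turn indexes stored as strings.
func describeTarget(s annotationScope, turnCount int) (string, error) {
	switch s.Type {
	case "session":
		return "session " + s.TargetID, nil
	case "turn":
		idx, err := strconv.Atoi(s.TargetID)
		if err != nil {
			return "", fmt.Errorf("turn target %q is not an index: %w", s.TargetID, err)
		}
		if idx < 0 || idx >= turnCount {
			return "", fmt.Errorf("turn index %d out of range [0,%d)", idx, turnCount)
		}
		return fmt.Sprintf("turns[%d]", idx), nil
	case "tool_call":
		return "tool call " + s.TargetID, nil
	default:
		return "", fmt.Errorf("unknown scope type %q", s.Type)
	}
}

func main() {
	scopes := []annotationScope{
		{Type: "session", TargetID: "abc"},
		{Type: "turn", TargetID: "14"},
		{Type: "tool_call", TargetID: "call_Y70XEopD3Ef1mGctwTXG2CEq"},
		{Type: "turn", TargetID: "99"},
	}
	for _, s := range scopes {
		target, err := describeTarget(s, 20)
		fmt.Println(target, err)
	}
}
```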

### Annotation categories

The current built-in categories are:

- `observation`
- `ai-failure`
- `user-error`
- `environment-issue`
- `success`
- `question`
- `to-discuss`
- `to-improve`

### Taxonomy mappings

`taxonomy_mappings` is an object containing arrays of codes from three different labeling systems:

| Field | Type | Meaning |
|-------|------|---------|
| `taxonomy_mappings.minitrace` | string[] | Minitrace taxonomy codes such as `F-AUT` |
| `taxonomy_mappings.mast` | string[] | MAST taxonomy codes |
| `taxonomy_mappings.toolemu` | string[] | ToolEmu taxonomy codes |

### Annotation query paths

These are the JSON paths you will most often use in SQL after `UNNEST(annotations)`:

| Path | Meaning |
|------|---------|
| `$.annotator` | annotation author |
| `$.scope.type` | annotation scope |
| `$.scope.target_id` | transcript target |
| `$.content.category` | primary label |
| `$.content.title` | short summary |
| `$.content.detail` | detailed note |
| `$.content.tags` | free-form tag array |
| `$.taxonomy_mappings.minitrace` | minitrace taxonomy array |
| `$.taxonomy_mappings.mast` | MAST taxonomy array |
| `$.taxonomy_mappings.toolemu` | ToolEmu taxonomy array |
| `$.classification` | classification level |

A compact SQL example:

```sql
SELECT
  id AS session_id,
  -- json_extract_string returns plain VARCHAR, so no manual quote stripping is needed
  json_extract_string(ann, '$.scope.type') AS scope_type,
  json_extract_string(ann, '$.content.category') AS category,
  json_extract_string(ann, '$.content.title') AS title
FROM sessions_base,
     UNNEST(annotations) AS a(ann);
```

## Coordination

Multi-session coordination metadata.

| Field | Type | Description |
|-------|------|-------------|
| `project_id` | string? | Project this session belongs to |
| `predecessor_session` | string? | Previous session in a chain |
| `concurrent_sessions` | int? | How many sessions were running simultaneously |
| `human_attention` | string | Level of human oversight: `active`, `background`, `unknown` |

## Quality tiers

Quality is assigned automatically during conversion based on content richness:

- **A** — Has conversation turns, tool calls with output, more than 10 tool calls, and more than 5 turns
- **B** — Has conversation turns but does not meet the A threshold
- **C** — No conversation turns at all
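
The tier rules translate directly into a small decision function. This sketch assumes "tool calls with output" means at least one tool call has recorded output, and that the turn count is the number of conversation turns; `qualityTier` is a hypothetical helper, not the converter's code.

```go
package main

import "fmt"

// qualityTier applies the A/B/C rules. anyToolOutput is assumed to mean
// "at least one tool call has recorded output".
func qualityTier(turns, toolCalls int, anyToolOutput bool) string {
	switch {
	case turns == 0:
		return "C"
	case anyToolOutput && toolCalls > 10 && turns > 5:
		return "A"
	default:
		return "B"
	}
}

func main() {
	fmt.Println(qualityTier(12, 11, true)) // meets every A threshold
	fmt.Println(qualityTier(12, 10, true)) // exactly 10 tool calls is not "more than 10"
	fmt.Println(qualityTier(0, 0, false))  // no conversation turns
}
```

Note the strict inequalities: a session with exactly 10 tool calls or exactly 5 turns lands in tier B.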

## See also

- `go-minitrace help what-is-minitrace` — conceptual overview
- `go-minitrace help writing-duckdb-queries` — how to query these fields with SQL
- `go-minitrace help adapter-reference` — how each source format maps to this schema