Minitrace Schema Reference

Field-by-field reference for the minitrace session JSON format

Every converted session is a single JSON file conforming to the minitrace schema (currently minitrace-v0.2.0). This page documents every field, its type, and what it means.

The authoritative Go types live in pkg/minitrace/schema.go. The JSON field names match the struct tags exactly.

Session (top level)

The session is the root object in each .minitrace.json file.

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique session identifier, usually the original session UUID |
| schema_version | string | Always minitrace-v0.2.0 for current output |
| profile | string | Session profile, typically organic for real sessions |
| scenario_id | string? | Reserved for synthetic/benchmark scenarios |
| quality | string? | Quality tier: A (rich conversation + tool I/O + >10 tools + >5 turns), B (has conversation), C (no conversation) |
| title | string? | Auto-extracted from the first human turn (truncated to 80 chars) |
| summary | string? | Optional session summary |
| classification | string | Always internal for locally converted sessions |
| provenance | object | Where this session came from (see below) |
| flags | object | Data quality flags |
| environment | object | Model, framework, and tools configuration |
| operational_context | object | Working directory, git state, autonomy level |
| timing | object | Timestamps, duration, and time-of-day information |
| condition | object? | Experimental condition metadata |
| coordination | object | Multi-session coordination info |
| handover | object | Session handover documents |
| turns | array | Conversation turns in order |
| tool_calls | array | Every tool invocation with input and output |
| outcome | object? | Success/failure of the session |
| annotations | array | Human or automated annotations |
| metrics | object | Computed summary metrics |
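
As a quick sanity check before analysis, a loaded session can be compared against the table above. The sketch below is illustrative only: the helper names `load_session` and `check_session` are hypothetical, and it assumes that every field whose type lacks a trailing `?` is always present in the JSON, which the authoritative Go types may or may not guarantee.

```python
import json

EXPECTED_SCHEMA_VERSION = "minitrace-v0.2.0"

# Assumption: fields whose type has no trailing "?" are always present.
REQUIRED_KEYS = (
    "id", "schema_version", "profile", "classification", "provenance",
    "flags", "environment", "operational_context", "timing",
    "coordination", "handover", "turns", "tool_calls", "annotations",
    "metrics",
)


def load_session(path: str) -> dict:
    """Read one .minitrace.json file into a plain dict."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def check_session(session: dict) -> list[str]:
    """Return a list of problems; an empty list means the shape looks right."""
    problems = []
    version = session.get("schema_version")
    if version != EXPECTED_SCHEMA_VERSION:
        problems.append(f"unexpected schema_version: {version!r}")
    for key in REQUIRED_KEYS:
        if key not in session:
            problems.append(f"missing field: {key}")
    return problems
```

Typical use would be `check_session(load_session(path))`, treating a non-empty result as a reason to inspect the file by hand.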

Provenance

Tracks where the session came from so you can trace any converted file back to its original source.

| Field | Type | Description |
| --- | --- | --- |
| source_format | string | Adapter-specific format identifier, e.g. claude-code-jsonl-v2, pi-agent-jsonl-v3, pinocchio-turns-sqlite-v1 |
| source_path | string? | Path to the original file (home directory normalized to ~) |
| converted_at | string | RFC 3339 timestamp of when conversion ran |
| converter_version | string | Converter identifier, e.g. go-minitrace-claude-adapter-dev |
| original_session_id | string? | The session ID in the original format |

Flags

Data quality signals set during conversion. Most converters set needs_cleaning: true since raw sessions are not curated.

| Field | Type | Description |
| --- | --- | --- |
| for_research | bool | Whether this session is flagged for research use |
| needs_cleaning | bool | Whether the session needs manual review |
| contains_error | bool | Whether conversion detected errors in the session |
| contains_pii | bool | Whether file paths contain /home/ or /Users/ patterns |
| category | string[] | Free-form category tags |
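
The contains_pii check is a path heuristic, and it can be approximated when re-auditing a converted file. This is a minimal sketch, not the converter's actual logic: it assumes the relevant paths are the tool_calls[].input.file_path values, and the function name `paths_suggest_pii` is hypothetical.

```python
PII_PATH_MARKERS = ("/home/", "/Users/")


def paths_suggest_pii(session: dict) -> bool:
    """True if any tool call's file_path contains a home-directory marker.

    Assumption: only input.file_path is scanned; the real converter may
    look at other fields as well.
    """
    for tool_call in session.get("tool_calls", []):
        path = (tool_call.get("input") or {}).get("file_path") or ""
        if any(marker in path for marker in PII_PATH_MARKERS):
            return True
    return False
```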

Environment

Captures the model and framework configuration for the session.

| Field | Type | Description |
| --- | --- | --- |
| model | string? | Model identifier, e.g. claude-opus-4-6, gpt-5-nano |
| model_version | string? | Specific model version if available |
| temperature | float? | Sampling temperature if known |
| tools_enabled | string[] | List of tool names available in this session |
| system_prompt | string? | System prompt if captured (often null for privacy) |
| agent_framework | string? | Framework name: claude-code, codex, pi, pinocchio, claude-ai, chatgpt |
| agent_version | string? | Framework version if available |
| platform_type | string? | Platform category: agent, chat, etc. |
| provider_hint | string? | API provider: anthropic, openai, unknown |

Operational Context

Runtime context captured at session start. Availability depends on the source format.

| Field | Type | Description |
| --- | --- | --- |
| working_directory | string? | Filesystem path where the agent was running |
| git_branch | string? | Active git branch |
| git_ref | string? | Git commit reference |
| autonomy_level | string? | How autonomous the agent was |
| sandbox | bool? | Whether the session ran in a sandbox |
| framework_config | any? | Adapter-specific configuration blob, typically for raw session/runtime metadata that does not fit the shared schema |

Timing

Temporal information about the session. The privacy_level field controls how much timing detail is exposed.

| Field | Type | Description |
| --- | --- | --- |
| privacy_level | string | Always full for locally converted sessions |
| duration_seconds | float? | Wall-clock duration from first to last event |
| active_duration_seconds | float? | Time excluding gaps longer than 5 minutes (the idle threshold) |
| started_at | string? | RFC 3339 start timestamp |
| ended_at | string? | RFC 3339 end timestamp |
| hour_of_day | int? | Hour (0–23) when the session started |
| day_of_week | int? | Day of week (0=Monday, 6=Sunday) |
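
The relationship between these fields can be illustrated from a list of event timestamps. The sketch below is one plausible reading of the definitions, not the converter's code: `timing_summary` is a hypothetical name, gaps of exactly 300 seconds are counted as active, and the hour and weekday are taken in whatever timezone the timestamp carries (the source does not say which the converter uses).

```python
from datetime import datetime

IDLE_THRESHOLD_SECONDS = 300.0  # 5 minutes, per the table above


def parse_rfc3339(ts: str) -> datetime:
    """Parse an RFC 3339 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def timing_summary(timestamps: list[str]) -> dict:
    """Derive duration, active duration, hour and weekday from event times."""
    if not timestamps:
        return {"duration_seconds": None, "active_duration_seconds": None,
                "hour_of_day": None, "day_of_week": None}
    events = sorted(parse_rfc3339(t) for t in timestamps)
    gaps = [(b - a).total_seconds() for a, b in zip(events, events[1:])]
    start = events[0]
    return {
        "duration_seconds": (events[-1] - start).total_seconds(),
        # Only gaps at or below the idle threshold count as active time.
        "active_duration_seconds": sum(g for g in gaps if g <= IDLE_THRESHOLD_SECONDS),
        "hour_of_day": start.hour,
        "day_of_week": start.weekday(),  # Python also uses 0=Monday, 6=Sunday
    }
```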

Turns

The conversation transcript as an ordered array. Each turn is one message from a participant.

| Field | Type | Description |
| --- | --- | --- |
| index | int | Zero-based position in the turn sequence |
| timestamp | string? | RFC 3339 timestamp of this turn |
| role | string | user, assistant, or system |
| source | string? | Origin detail: human, tool_result, etc. |
| model | string? | Model that generated this turn (assistant turns) |
| content_type | string? | MIME type hint for the content |
| input_channel | string? | How the input arrived |
| content | string | The actual message text |
| framework_metadata | any? | Adapter-specific turn metadata preserved from the raw transcript |
| tool_calls_in_turn | string[] | IDs of tool calls emitted by this turn |
| thinking | string? | Chain-of-thought / reasoning text if captured |
| intent_markers | object? | Whether this turn was requested, inferred, or proactive |
| streaming | object | Whether the turn was streamed (was_streamed, stream_log) |
| usage | object? | Per-turn token usage (see below) |

Per-turn Usage

| Field | Type | Description |
| --- | --- | --- |
| input_tokens | int? | Tokens in the prompt |
| output_tokens | int? | Tokens in the response |
| cache_read_tokens | int? | Tokens served from cache |
| cache_creation_tokens | int? | Tokens that populated the cache |
| reasoning_tokens | int? | Reasoning/thinking tokens |
| tool_tokens | int? | Tokens consumed by tool use |
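
Because usage and each of its counters may be null, summing them across turns needs a little care. A minimal sketch follows; `total_tokens` is a hypothetical helper, and returning None when no turn reports the counter is an assumption about how a missing total should be represented.

```python
from typing import Optional


def total_tokens(turns: list[dict], field: str) -> Optional[int]:
    """Sum one usage counter across turns, skipping turns that lack it."""
    known = []
    for turn in turns:
        value = (turn.get("usage") or {}).get(field)
        if value is not None:
            known.append(value)
    # Assumption: no data at all is reported as None rather than 0.
    return sum(known) if known else None
```

For example, `total_tokens(session["turns"], "output_tokens")` adds up the response tokens of every turn that reports them.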

Tool Calls

Every tool invocation is recorded with its input, output, and contextual position within the session.

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique tool call identifier |
| emitting_turn_index | int? | Index of the turn that triggered this call |
| timestamp | string? | When the tool was invoked |
| tool_name | string | Name of the tool, for example read, edit, bash, write, grep, agent. Exact naming and casing can vary by adapter, so prefer checking real data instead of assuming one canonical case. |
| operation_type | string | Normalized operation: READ, MODIFY, NEW, EXECUTE, DELEGATE, OTHER |
| input | object | Tool input (see below) |
| output | object | Tool output (see below) |
| context | object | Position and surrounding tool context |
| framework_metadata | any? | Adapter-specific tool call metadata preserved from the raw transcript |
| spawned_agent | object? | If this tool call delegated to a subagent |
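
Turns and tool calls reference each other: turns[].tool_calls_in_turn lists tool call IDs, and tool_calls[].emitting_turn_index points back at the turn. A small sketch of joining the two in memory is shown here; `tool_calls_by_turn` is a hypothetical name, and IDs that do not resolve are silently skipped, which is a design choice rather than documented behavior.

```python
def tool_calls_by_turn(session: dict) -> dict[int, list[dict]]:
    """Map each turn index to the tool call objects that turn emitted."""
    by_id = {tc["id"]: tc for tc in session.get("tool_calls", [])}
    result = {}
    for turn in session.get("turns", []):
        ids = turn.get("tool_calls_in_turn") or []
        # Keep the order given by the turn; drop IDs with no matching call.
        result[turn["index"]] = [by_id[i] for i in ids if i in by_id]
    return result
```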

Tool Call Input

| Field | Type | Description |
| --- | --- | --- |
| file_path | string? | File path argument (normalized, ~ for home) |
| command | string? | Shell command if applicable |
| justification | string? | Tool-use rationale if the source transcript provides one |
| arguments | any? | Full arguments blob |

A practical querying note: input.file_path is the normalized shared field when the adapter can provide one, while input.arguments preserves the tool-specific raw payload. In SQL, that often means the safest file-oriented pattern is:

COALESCE(tc->'input'->>'file_path', tc->'input'->'arguments'->>'path')

Likewise, shell tools often use:

(tc->'input'->>'command')

and tools such as web search may expose their key values under input.arguments, for example:

(tc->'input'->'arguments'->>'query')

Do not assume every tool uses the same nested keys. When in doubt, inspect a bounded preview of one unnested tool call first.
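
When working with the JSON files directly rather than through SQL, the same fallback can be written in Python. This is a sketch under the same assumption as the COALESCE pattern above, namely that the raw payload uses a `path` key; `target_path` and `preview` are hypothetical helper names.

```python
import json
from typing import Optional


def target_path(tool_call: dict) -> Optional[str]:
    """Prefer the normalized input.file_path, else fall back to arguments.path."""
    tool_input = tool_call.get("input") or {}
    if tool_input.get("file_path") is not None:
        return tool_input["file_path"]
    arguments = tool_input.get("arguments")
    if isinstance(arguments, dict):
        return arguments.get("path")  # key name varies by tool and adapter
    return None


def preview(tool_call: dict, limit: int = 200) -> str:
    """Bounded, single-line preview of a tool call's input for inspection."""
    return json.dumps(tool_call.get("input"), sort_keys=True)[:limit]
```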

Tool Call Output

| Field | Type | Description |
| --- | --- | --- |
| success | bool | Whether the tool call succeeded |
| result | string? | Output text (truncated to 10 KB if larger) |
| error | string? | Error message if the call failed |
| exit_code | int? | Process exit code when the source transcript exposes one |
| duration_ms | int? | Execution time in milliseconds |
| truncated | bool | Whether the result was truncated |
| full_bytes | int? | Original size before truncation |
| full_hash | string? | SHA-256 hash of the full output (for deduplication) |
| full_reference | string? | External reference to the full output |
| redacted | bool? | Whether the output was redacted |
| content_origin | string? | Where the content came from |
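
The interplay of result, truncated, full_bytes, and full_hash can be sketched as follows. Several details here are assumptions rather than documented behavior: that 10 KB means 10240 bytes of UTF-8, that the hash is hex-encoded, and that full_bytes and full_hash stay null when nothing was cut. The function name `truncate_result` is hypothetical.

```python
import hashlib

MAX_RESULT_BYTES = 10 * 1024  # assumption: "10 KB" means 10240 bytes


def truncate_result(text: str) -> dict:
    """Build the truncation-related output fields for one tool result."""
    raw = text.encode("utf-8")
    if len(raw) <= MAX_RESULT_BYTES:
        return {"result": text, "truncated": False,
                "full_bytes": None, "full_hash": None}
    # Cut on a byte boundary and drop any partial trailing character.
    clipped = raw[:MAX_RESULT_BYTES].decode("utf-8", errors="ignore")
    return {
        "result": clipped,
        "truncated": True,
        "full_bytes": len(raw),
        "full_hash": hashlib.sha256(raw).hexdigest(),
    }
```

Hashing the full output before clipping is what lets two truncated results be recognized as the same original content.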

Framework-specific metadata conventions

Minitrace uses a small number of shared first-class fields plus three explicit escape hatches for source-specific detail:

  • operational_context.framework_config
  • turns[].framework_metadata
  • tool_calls[].framework_metadata

Use these when a raw field is analytically useful but not yet stable enough to become shared schema. See go-minitrace help framework-metadata-mappings for the per-adapter mapping tables.

Tool Call Context

| Field | Type | Description |
| --- | --- | --- |
| position_in_session | float? | Normalized position (0.0 = first tool call, 1.0 = last) |
| tools_before | string[] | Names of the 5 preceding tool calls |
| time_since_last_user | float? | Seconds since the last human turn |
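
A minimal sketch of how the first two context fields could be derived from the ordered list of tool names. It assumes a linear mapping i / (n - 1) for position_in_session and returns None for a session with a single tool call, where that ratio is undefined; the converter may handle that case differently. `tool_call_context` is a hypothetical name.

```python
from typing import Optional


def tool_call_context(tool_names: list[str], i: int, window: int = 5) -> dict:
    """Context for the i-th tool call given all tool names in session order."""
    n = len(tool_names)
    position: Optional[float] = i / (n - 1) if n > 1 else None
    return {
        "position_in_session": position,
        # Up to `window` names immediately preceding call i, oldest first.
        "tools_before": tool_names[max(0, i - window):i],
    }
```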

Spawned Agent

Present when a tool call delegates to a subagent (e.g., Claude Code's Agent tool).

| Field | Type | Description |
| --- | --- | --- |
| agent_type | string | Type of subagent |
| task_scope | string | What the subagent was asked to do |
| sub_session_id | string? | ID of the subagent's minitrace session |
| outcome_summary | string | What the subagent accomplished |

Metrics

Computed summary statistics. These are calculated during conversion from the turns and tool calls.

| Field | Type | Description |
| --- | --- | --- |
| turn_count | int | Total number of turns |
| tool_call_count | int | Total number of tool invocations |
| read_count | int | Tool calls with operation_type READ |
| modify_count | int | Tool calls with operation_type MODIFY |
| create_count | int | Tool calls with operation_type NEW |
| execute_count | int | Tool calls with operation_type EXECUTE |
| delegate_count | int | Tool calls with operation_type DELEGATE |
| read_ratio | float? | read_count / tool_call_count: the share of tool calls that are reads, a signal of how much the agent reads before acting |
| time_to_first_action | float? | Seconds from session start to first tool call |
| idle_ratio | float? | 1 - (active_duration_seconds / duration_seconds): the fraction of wall-clock time spent idle |
| total_input_tokens | int? | Sum of input tokens across all turns |
| total_output_tokens | int? | Sum of output tokens across all turns |
| total_cache_read_tokens | int? | Sum of cache read tokens across all turns |
| total_cache_creation_tokens | int? | Sum of cache creation tokens across all turns |
| total_reasoning_tokens | int? | Sum of reasoning tokens across all turns |
| total_tool_tokens | int? | Sum of tool tokens across all turns |
| session_cost | float? | Estimated cost if computable |
| subagent_count | int | Number of subagent sessions spawned |
| subagent_tool_calls | int | Tool calls made by subagents |
| model_switches | int? | Times the model changed during the session |
| unique_models | int? | Distinct models used |
| median_response_tokens | int? | Median output tokens per assistant turn |
| max_response_tokens | int? | Maximum output tokens in any assistant turn |
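
The two ratio metrics can be recomputed from a session to cross-check stored values. The sketch below follows the formulas in the table; `ratio_metrics` is a hypothetical helper, and returning None when a denominator is zero or missing is an assumption about how undefined ratios are represented.

```python
from collections import Counter


def ratio_metrics(session: dict) -> dict:
    """Recompute read_ratio and idle_ratio from tool calls and timing."""
    calls = session.get("tool_calls", [])
    ops = Counter(tc.get("operation_type") for tc in calls)
    total = len(calls)

    timing = session.get("timing") or {}
    duration = timing.get("duration_seconds")
    active = timing.get("active_duration_seconds")
    idle_ratio = None
    if duration and active is not None:  # skips None and zero durations
        idle_ratio = 1 - active / duration

    return {
        "tool_call_count": total,
        "read_count": ops["READ"],
        "read_ratio": ops["READ"] / total if total else None,
        "idle_ratio": idle_ratio,
    }
```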

Annotations

Optional human or automated labels attached to a session, turn, or tool call.

At the file-format level, annotations live inside the session JSON as the annotations array. In SQL, this appears as the annotations column on sessions_base, and the normal query pattern is:

SELECT ...
FROM sessions_base,
     UNNEST(annotations) AS a(ann)

The annotation object has these fields:

| Field | Type | Description |
| --- | --- | --- |
| id | string | Annotation identifier |
| timestamp | string | When the annotation was created |
| annotator | string | Who created it (human name or tool ID) |
| scope.type | string | What the annotation targets: session, turn, tool_call |
| scope.target_id | string | ID of the target |
| content.category | string | Annotation category |
| content.tags | string[] | Free-form tags |
| content.title | string | Short annotation title |
| content.detail | string | Full annotation text |
| taxonomy_mappings | object | Mappings to minitrace, MAST, and ToolEmu taxonomies |
| classification | string? | Annotation classification |

Annotation scope semantics

The scope object is how an annotation is attached to a concrete thing in the transcript.

| scope.type | scope.target_id meaning |
| --- | --- |
| session | Usually the session ID itself |
| turn | The turn index as a string, for example 0 or 14 |
| tool_call | The tool-call ID, for example call_Y70XEopD3Ef1mGctwTXG2CEq |

This distinction matters in analysis because a session-level label answers a different question than a turn-level or tool-call-level label.
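
Resolving an annotation to the object it targets follows directly from the scope table. A minimal sketch, with `resolve_annotation_target` as a hypothetical helper name; it returns None for unknown scope types and for targets that cannot be found.

```python
from typing import Optional


def resolve_annotation_target(session: dict, annotation: dict) -> Optional[dict]:
    """Return the session, turn, or tool call an annotation points at."""
    scope = annotation.get("scope") or {}
    kind, target = scope.get("type"), scope.get("target_id")
    if kind == "session":
        return session
    if kind == "turn":
        try:
            index = int(target)  # turn targets are indexes stored as strings
        except (TypeError, ValueError):
            return None
        turns = session.get("turns", [])
        return next((t for t in turns if t.get("index") == index), None)
    if kind == "tool_call":
        calls = session.get("tool_calls", [])
        return next((tc for tc in calls if tc.get("id") == target), None)
    return None
```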

Annotation categories

The current built-in categories are:

  • observation
  • ai-failure
  • user-error
  • environment-issue
  • success
  • question
  • to-discuss
  • to-improve

Taxonomy mappings

taxonomy_mappings is an object containing arrays of codes from three different labeling systems:

| Field | Type | Meaning |
| --- | --- | --- |
| taxonomy_mappings.minitrace | string[] | Minitrace taxonomy codes such as F-AUT |
| taxonomy_mappings.mast | string[] | MAST taxonomy codes |
| taxonomy_mappings.toolemu | string[] | ToolEmu taxonomy codes |

Annotation query paths

These are the JSON paths you will most often use in SQL after UNNEST(annotations):

| Path | Meaning |
| --- | --- |
| $.annotator | annotation author |
| $.scope.type | annotation scope |
| $.scope.target_id | transcript target |
| $.content.category | primary label |
| $.content.title | short summary |
| $.content.detail | detailed note |
| $.content.tags | free-form tag array |
| $.taxonomy_mappings.minitrace | minitrace taxonomy array |
| $.taxonomy_mappings.mast | MAST taxonomy array |
| $.taxonomy_mappings.toolemu | ToolEmu taxonomy array |
| $.classification | classification level |

A compact SQL example:

-- One row per annotation. json_extract_string returns unquoted text,
-- so double quotes inside a title or category are left intact.
SELECT
  id AS session_id,
  json_extract_string(ann, '$.scope.type') AS scope_type,
  json_extract_string(ann, '$.content.category') AS category,
  json_extract_string(ann, '$.content.title') AS title
FROM sessions_base,
     UNNEST(annotations) AS a(ann);

Coordination

Multi-session coordination metadata.

| Field | Type | Description |
| --- | --- | --- |
| project_id | string? | Project this session belongs to |
| predecessor_session | string? | Previous session in a chain |
| concurrent_sessions | int? | How many sessions were running simultaneously |
| human_attention | string | Level of human oversight: active, background, unknown |

Quality tiers

Quality is assigned automatically during conversion based on content richness:

  • A — Has conversation turns, tool calls with output, more than 10 tool calls, and more than 5 turns
  • B — Has conversation turns but does not meet the A threshold
  • C — No conversation turns at all
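
The tier rules above can be expressed as a small function. This is a sketch of one reading of them, not the converter's implementation: it interprets "tool calls with output" as at least one tool call with a non-empty output.result, and `quality_tier` is a hypothetical name.

```python
def quality_tier(session: dict) -> str:
    """Classify a session as A, B, or C from its turns and tool calls."""
    turns = session.get("turns", [])
    calls = session.get("tool_calls", [])
    if not turns:
        return "C"
    # Assumption: one call with a non-empty result satisfies "with output".
    has_tool_output = any((tc.get("output") or {}).get("result") for tc in calls)
    if has_tool_output and len(calls) > 10 and len(turns) > 5:
        return "A"
    return "B"
```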

See also

  • go-minitrace help what-is-minitrace — conceptual overview
  • go-minitrace help writing-duckdb-queries — how to query these fields with SQL
  • go-minitrace help adapter-reference — how each source format maps to this schema