Learn DuckDB JSON operators and query patterns for the minitrace schema
This tutorial teaches you how to write DuckDB SQL queries against minitrace archives. It covers the JSON access syntax, type casting, array operations, and common patterns you need for analysis.
All examples use the query duckdb command with --sql. The loaded table is called sessions_base by default.
When go-minitrace loads an archive, it creates a table with these columns:
id, title, summary, classification, profile — directly queryableprovenance, flags, environment, operational_context, timing, metrics — access fields with ->>turns, tool_calls, annotations — iterate with UNNESTDuckDB uses the ->> operator to extract a string value from a JSON column:
-- Extract the model from the environment JSON
SELECT environment->>'model' AS model FROM sessions_base;
-- Extract nested fields
SELECT environment->>'agent_framework' AS framework FROM sessions_base;
SELECT provenance->>'source_format' AS source FROM sessions_base;
SELECT timing->>'started_at' AS started FROM sessions_base;
SELECT metrics->>'turn_count' AS turns FROM sessions_base;
The -> operator (single arrow) returns JSON; ->> (double arrow) returns a string. For most queries you want ->> because it gives you a plain value you can GROUP BY, filter, or display.
One important DuckDB parser rule is easy to miss: -> and ->> have low precedence because DuckDB also uses arrow syntax in lambda-related contexts. Inside predicates, parenthesize JSON-arrow expressions so the parser groups the extraction the way you intend.
For example, prefer:
WHERE (provenance->>'source_format') NOT LIKE '%subagent%'
over:
WHERE provenance->>'source_format' NOT LIKE '%subagent%'
The parenthesized form is easier to read and avoids a class of confusing parse/coercion errors.
JSON field extraction with ->> always returns a string. To do math, you must CAST:
-- Integer fields
CAST(metrics->>'turn_count' AS INT)
CAST(metrics->>'tool_call_count' AS INT)
-- Float fields
CAST(metrics->>'read_ratio' AS DOUBLE)
CAST(timing->>'duration_seconds' AS DOUBLE)
-- Large numbers (token counts can be very large)
CAST(metrics->>'total_input_tokens' AS BIGINT)
CAST(metrics->>'total_output_tokens' AS BIGINT)
Common pattern for aggregation:
SELECT
environment->>'agent_framework' AS framework,
COUNT(*) AS sessions,
ROUND(AVG(CAST(metrics->>'tool_call_count' AS INT)), 1) AS avg_tools,
ROUND(AVG(CAST(timing->>'duration_seconds' AS DOUBLE)), 0) AS avg_duration_s
FROM sessions_base
GROUP BY framework;
The turns, tool_calls, and annotations columns are loaded as DuckDB JSON[] columns. The normal way to query individual elements is to UNNEST the array first:
-- Count tool calls by name across all sessions
SELECT
tc->>'tool_name' AS tool_name,
COUNT(*) AS invocations
FROM sessions_base,
UNNEST(tool_calls) AS t(tc)
GROUP BY tool_name
ORDER BY invocations DESC;
The UNNEST(tool_calls) AS t(tc) clause expands each tool call array element into a row. The variable tc holds one JSON element that you can query with either ->> or json_extract().
Once you have unnested into an element like tc, both of these patterns are valid:
-- Direct JSON extraction with ->>
tc->>'tool_name'
tc->>'operation_type'
tc->'output'->>'error'
-- Equivalent json_extract() style
json_extract(tc, '$.tool_name')
json_extract(tc, '$.operation_type')
json_extract(tc, '$.output.success')
json_extract(tc, '$.timestamp')
For string extraction from within json_extract, wrap in CAST(... AS VARCHAR) and strip quotes:
REPLACE(CAST(json_extract(tc, '$.tool_name') AS VARCHAR), '"', '') AS tool_name
A few DuckDB behaviors are easy to miss when you are exploring ad hoc:
tool_calls[1] is the first elementtool_calls[0] returns NULLUNNEST(tool_calls) AS t(tc) is the most reliable patternjson_extract(tool_calls, '$[0].tool_name') are easy to misread and often do not behave the way users expectExamples:
-- First element by direct list indexing (1-based)
SELECT
tool_calls[1]->>'tool_name' AS first_tool,
tool_calls[0]->>'tool_name' AS zeroth_tool
FROM sessions_base
LIMIT 1;
-- Recommended pattern: unnest first, then query each element
SELECT
tc->>'tool_name' AS tool_name,
tc->>'operation_type' AS operation_type
FROM sessions_base,
UNNEST(tool_calls) AS t(tc)
LIMIT 20;
Tool input fields vary a bit by tool. A few patterns are especially useful in practice:
-- Normalized path when available, otherwise raw argument path
COALESCE(tc->'input'->>'file_path', tc->'input'->'arguments'->>'path')
-- Shell commands
(tc->'input'->>'command')
-- Search queries or other tool-specific arguments
(tc->'input'->'arguments'->>'query')
That COALESCE(...) pattern is the safest default when you are inspecting read/write/edit-style calls, because many adapters normalize the path to input.file_path while still preserving the raw original payload under input.arguments.
One more DuckDB parser sharp edge is worth calling out explicitly: when using -> / ->> inside predicates, wrap the extraction in parentheses. This matters especially for LIKE, but the same habit is useful for =, IN, NOT LIKE, and other boolean expressions too.
Do not treat CAST(... AS VARCHAR) as the primary fix for these cases. If a naked arrow expression fails in a predicate, parenthesizing it is the more direct fix; CAST(...) sometimes works only because it also forces the extraction into a grouped subexpression.
-- Safer than writing tc->'input'->>'command' LIKE ... directly
WHERE (tc->'input'->>'command') LIKE '%docmgr%'
-- Same rule for equality and membership checks
WHERE (environment->>'agent_framework') = 'claude-code'
AND (environment->>'agent_framework') IN ('claude-code', 'pi')
Annotations follow the same UNNEST pattern, but there is one workflow-specific nuance: go-minitrace query duckdb reads the .minitrace.json archive files it loads. If you created or edited annotations through go-minitrace annotate ..., sync them first:
go-minitrace annotate sync --output-dir ./output
Then query them like any other JSON array:
SELECT
id AS session_id,
REPLACE(CAST(json_extract(ann, '$.scope.type') AS VARCHAR), '"', '') AS scope_type,
REPLACE(CAST(json_extract(ann, '$.scope.target_id') AS VARCHAR), '"', '') AS target_id,
REPLACE(CAST(json_extract(ann, '$.content.category') AS VARCHAR), '"', '') AS category,
REPLACE(CAST(json_extract(ann, '$.content.title') AS VARCHAR), '"', '') AS title
FROM sessions_base,
UNNEST(annotations) AS a(ann);
Common annotation paths:
$.scope.type$.scope.target_id$.content.category$.content.title$.content.detail$.taxonomy_mappings.minitrace$.classificationExamples:
-- Count annotations by category
SELECT
REPLACE(CAST(json_extract(ann, '$.content.category') AS VARCHAR), '"', '') AS category,
COUNT(*) AS n
FROM sessions_base,
UNNEST(annotations) AS a(ann)
GROUP BY category
ORDER BY n DESC;
-- Filter to tool-call-level annotations only
SELECT
id AS session_id,
REPLACE(CAST(json_extract(ann, '$.scope.target_id') AS VARCHAR), '"', '') AS tool_call_id,
REPLACE(CAST(json_extract(ann, '$.content.title') AS VARCHAR), '"', '') AS title
FROM sessions_base,
UNNEST(annotations) AS a(ann)
WHERE REPLACE(CAST(json_extract(ann, '$.scope.type') AS VARCHAR), '"', '') = 'tool_call';
You can unnest multiple arrays in the same query, but each creates a cross-product. Usually you want to unnest one array per query:
-- Per-turn analysis
SELECT
id,
REPLACE(CAST(json_extract(turn, '$.role') AS VARCHAR), '"', '') AS role,
CAST(json_extract(turn, '$.index') AS INT) AS turn_index
FROM sessions_base,
UNNEST(turns) AS t(turn)
WHERE id = 'some-session-id';
WHERE (environment->>'agent_framework') = 'claude-code'
WHERE (environment->>'agent_framework') IN ('claude-code', 'pi')
WHERE (provenance->>'source_format') NOT LIKE '%subagent%'
WHERE quality = 'A'
WHERE (timing->>'started_at') >= '2026-03-01'
AND (timing->>'started_at') < '2026-04-01'
WHERE CAST(metrics->>'tool_call_count' AS INT) > 10
AND CAST(metrics->>'turn_count' AS INT) > 5
SELECT
environment->>'model' AS model,
COUNT(*) AS sessions,
ROUND(AVG(CAST(metrics->>'tool_call_count' AS INT)), 1) AS avg_tools
FROM sessions_base
GROUP BY model
ORDER BY sessions DESC;
SELECT id, title,
CAST(metrics->>'tool_call_count' AS INT) AS tools
FROM sessions_base
ORDER BY tools DESC
LIMIT 10;
SELECT
environment->>'agent_framework' AS framework,
COUNT(*) FILTER (WHERE quality = 'A') AS quality_a,
COUNT(*) FILTER (WHERE quality = 'B') AS quality_b,
COUNT(*) FILTER (WHERE quality = 'C') AS quality_c
FROM sessions_base
GROUP BY framework;
SELECT
CAST(timing->>'started_at' AS DATE) AS day,
COUNT(*) AS sessions
FROM sessions_base
WHERE (timing->>'started_at') IS NOT NULL
GROUP BY day
ORDER BY day;
For queries you run repeatedly, save them to .sql files:
go-minitrace query duckdb \
--archive-glob './output/active/*/*.minitrace.json' \
--sql-file ./my-analysis.sql
The repo ships query recipes in the queries/ directory. You can use these with the external DuckDB CLI too:
duckdb analysis.duckdb -init queries/load.sql -f queries/framework-summary.sql
ignore_errors = true in the load step means malformed files are silently skipped. If you get unexpected row counts, validate first.--db-path with a file to avoid reloading on every query.LIMIT during exploration to avoid printing thousands of rows.| Problem | Cause | Solution |
|---|---|---|
Invalid Input Error: ... | Trying math on a string extracted by ->> | Wrap in CAST(... AS INT) or CAST(... AS DOUBLE) |
Binder Error: column not found | Typo in column name or missing JSON field | Check spelling against go-minitrace help minitrace-schema |
| NULL values in aggregation | Some sessions don't have the field | Use WHERE field IS NOT NULL or COALESCE |
| Empty results from UNNEST | The array column is empty for all matched sessions | Check that sessions have tool_calls or turns data |
go-minitrace help js-api-reference — use the same SQL patterns inside mt.query() and mt.queryOne() in JS command handlersgo-minitrace help analysis-guide — where SQL queries fit in the end-to-end analysis workflowgo-minitrace help structured-query-commands — promote useful SQL into a named, reusable structured commandgo-minitrace help duckdb-query-recipes — ready-to-use SQL examples for common analysis patternsgo-minitrace help annotation-playbook — operator workflow for creating, syncing, and validating annotationsgo-minitrace help query-commands — query command flags and modesgo-minitrace help minitrace-schema — complete field reference