DuckDB file queries

Query JSON, CSV, and Parquet files directly through sqleton using DuckDB.

Sections

Terminology & Glossary
πŸ“– Documentation
Navigation
35 sectionsv0.1
πŸ“„ DuckDB file queries β€” glaze help duckdb-file-queries
duckdb-file-queries

DuckDB file queries

Query JSON, CSV, and Parquet files directly through sqleton using DuckDB.

Topicduckdbfilesanalyticsqueryrundbdb-typedatabaseoutput

Overview

DuckDB support lets sqleton act as a lightweight analytics CLI for local files. Instead of loading JSON, CSV, or Parquet data into a server database first, you connect sqleton to DuckDB and use DuckDB SQL functions to read files directly.

The common pattern is:

  1. Start a DuckDB connection with --db-type duckdb.
  2. Use --database '' for an in-memory engine, or --database ./file.duckdb for a persistent local database file.
  3. Reference the files inside the SQL statement with DuckDB functions such as read_json_auto, read_csv_auto, and read_parquet.

Quick examples

Query JSON arrays

sqleton query --db-type duckdb --database '' --output json \
  "SELECT user_id, SUM(amount) AS total_amount, COUNT(*) AS event_count
   FROM read_json_auto('./events/*.json', format='array')
   GROUP BY user_id
   ORDER BY user_id"

Query CSV files

sqleton query --db-type duckdb --database '' --output json \
  "SELECT region, SUM(amount) AS revenue, SUM(qty) AS units
   FROM read_csv_auto('./reports/*.csv')
   GROUP BY region
   ORDER BY region"

Query Parquet files

sqleton query --db-type duckdb --database '' --output json \
  "SELECT product, SUM(amount) AS revenue
   FROM read_parquet('./warehouse/*.parquet')
   GROUP BY product
   ORDER BY product DESC"

Connection model

DuckDB in sqleton does not treat a JSON/CSV/Parquet glob as the sqleton --database value. Instead:

  • --database '' means: create an in-memory DuckDB connection.
  • --database ./analytics.duckdb means: open a persistent DuckDB database file.
  • --dsn duckdb:///tmp/app.db means: open /tmp/app.db using a URI-style DSN.
  • --dsn duckdb:///:memory: means: open an in-memory DuckDB connection using a URI-style DSN.
  • read_json_auto('./events/*.json') means: read external files from SQL.

So this is the intended pattern:

sqleton query --db-type duckdb --database '' \
  "SELECT * FROM read_csv_auto('./data/*.csv') LIMIT 10"

and not this:

# Not the intended usage model
sqleton query --db-type duckdb --database './data/*.csv' "SELECT ..."

Supported DuckDB connection styles

The preferred DuckDB connection styles are:

# Preferred: in-memory
sqleton query --db-type duckdb --database '' "SELECT 1"

# Preferred: persistent DuckDB file
sqleton query --db-type duckdb --database ./analytics.duckdb "SELECT 1"

URI-style DSNs are also supported:

# URI-style absolute path
sqleton query --driver duckdb --dsn 'duckdb:///tmp/app.db' "SELECT 1"

# URI-style in-memory DuckDB
sqleton query --driver duckdb --dsn 'duckdb:///:memory:' "SELECT 1"

# URI-style path with query options preserved
sqleton query --driver duckdb --dsn 'duckdb:///tmp/app.db?access_mode=read_only' "SELECT 1"

When sqleton sees a duckdb:// DSN, it normalizes the URI into the file-path form expected by the DuckDB Go driver while preserving query parameters.

When to use in-memory vs persistent DuckDB

In-memory DuckDB

Use this when you want fast ad hoc inspection of raw files:

sqleton query --db-type duckdb --database '' \
  "SELECT COUNT(*) FROM read_json_auto('./events/*.json', format='array')"

Persistent DuckDB database file

Use this when you want to cache results, create tables, or reuse derived data:

sqleton query --db-type duckdb --database ./analytics.duckdb \
  "CREATE TABLE IF NOT EXISTS daily_sales AS
   SELECT * FROM read_parquet('./warehouse/sales/*.parquet')"

Useful DuckDB file functions

read_json_auto

Best for JSON data when you want DuckDB to infer the schema.

SELECT * FROM read_json_auto('./events/*.json', format='array')

If each file contains a top-level array, format='array' is often the right choice.

read_csv_auto

Best for CSV input with header rows and inferred types.

SELECT * FROM read_csv_auto('./exports/*.csv')

read_parquet

Best for Parquet files and Parquet globs.

SELECT * FROM read_parquet('./warehouse/*.parquet')

Smoke-test pattern

A small end-to-end validation loop looks like this:

# 1. JSON
sqleton query --db-type duckdb --database '' --output json \
  "SELECT user_id, COUNT(*)
   FROM read_json_auto('./events/*.json', format='array')
   GROUP BY user_id"

# 2. CSV
sqleton query --db-type duckdb --database '' --output json \
  "SELECT region, SUM(amount)
   FROM read_csv_auto('./reports/*.csv')
   GROUP BY region"

# 3. Parquet
sqleton query --db-type duckdb --database '' --output json \
  "SELECT product, SUM(revenue)
   FROM read_parquet('./warehouse/*.parquet')
   GROUP BY product"
  • Use --output json when you want the result to feed another tool.
  • Use --output csv when you want a quick export after aggregating raw files.
  • Prefer query for ad hoc SQL and run for repeatable SQL files checked into a repo.
  • Keep file paths in the SQL layer; keep connection settings in sqleton flags.