Query JSON, CSV, and Parquet files directly through sqleton using DuckDB.
DuckDB support lets sqleton act as a lightweight analytics CLI for local files. Instead of loading JSON, CSV, or Parquet data into a server database first, you connect sqleton to DuckDB and use DuckDB SQL functions to read files directly.
The common pattern is:
--db-type duckdb.--database '' for an in-memory engine, or --database ./file.duckdb
for a persistent local database file.read_json_auto, read_csv_auto, and read_parquet.sqleton query --db-type duckdb --database '' --output json \
"SELECT user_id, SUM(amount) AS total_amount, COUNT(*) AS event_count
FROM read_json_auto('./events/*.json', format='array')
GROUP BY user_id
ORDER BY user_id"
sqleton query --db-type duckdb --database '' --output json \
"SELECT region, SUM(amount) AS revenue, SUM(qty) AS units
FROM read_csv_auto('./reports/*.csv')
GROUP BY region
ORDER BY region"
sqleton query --db-type duckdb --database '' --output json \
"SELECT product, SUM(amount) AS revenue
FROM read_parquet('./warehouse/*.parquet')
GROUP BY product
ORDER BY product DESC"
DuckDB in sqleton does not treat a JSON/CSV/Parquet glob as the sqleton
--database value. Instead:
--database '' means: create an in-memory DuckDB connection.--database ./analytics.duckdb means: open a persistent DuckDB database file.--dsn duckdb:///tmp/app.db means: open /tmp/app.db using a URI-style DSN.--dsn duckdb:///:memory: means: open an in-memory DuckDB connection using a URI-style DSN.read_json_auto('./events/*.json') means: read external files from SQL.So this is the intended pattern:
sqleton query --db-type duckdb --database '' \
"SELECT * FROM read_csv_auto('./data/*.csv') LIMIT 10"
and not this:
# Not the intended usage model
sqleton query --db-type duckdb --database './data/*.csv' "SELECT ..."
The preferred DuckDB connection styles are:
# Preferred: in-memory
sqleton query --db-type duckdb --database '' "SELECT 1"
# Preferred: persistent DuckDB file
sqleton query --db-type duckdb --database ./analytics.duckdb "SELECT 1"
URI-style DSNs are also supported:
# URI-style absolute path
sqleton query --driver duckdb --dsn 'duckdb:///tmp/app.db' "SELECT 1"
# URI-style in-memory DuckDB
sqleton query --driver duckdb --dsn 'duckdb:///:memory:' "SELECT 1"
# URI-style path with query options preserved
sqleton query --driver duckdb --dsn 'duckdb:///tmp/app.db?access_mode=read_only' "SELECT 1"
When sqleton sees a duckdb:// DSN, it normalizes the URI into the file-path form
expected by the DuckDB Go driver while preserving query parameters.
Use this when you want fast ad hoc inspection of raw files:
sqleton query --db-type duckdb --database '' \
"SELECT COUNT(*) FROM read_json_auto('./events/*.json', format='array')"
Use this when you want to cache results, create tables, or reuse derived data:
sqleton query --db-type duckdb --database ./analytics.duckdb \
"CREATE TABLE IF NOT EXISTS daily_sales AS
SELECT * FROM read_parquet('./warehouse/sales/*.parquet')"
read_json_autoBest for JSON data when you want DuckDB to infer the schema.
SELECT * FROM read_json_auto('./events/*.json', format='array')
If each file contains a top-level array, format='array' is often the right
choice.
read_csv_autoBest for CSV input with header rows and inferred types.
SELECT * FROM read_csv_auto('./exports/*.csv')
read_parquetBest for Parquet files and Parquet globs.
SELECT * FROM read_parquet('./warehouse/*.parquet')
A small end-to-end validation loop looks like this:
# 1. JSON
sqleton query --db-type duckdb --database '' --output json \
"SELECT user_id, COUNT(*)
FROM read_json_auto('./events/*.json', format='array')
GROUP BY user_id"
# 2. CSV
sqleton query --db-type duckdb --database '' --output json \
"SELECT region, SUM(amount)
FROM read_csv_auto('./reports/*.csv')
GROUP BY region"
# 3. Parquet
sqleton query --db-type duckdb --database '' --output json \
"SELECT product, SUM(revenue)
FROM read_parquet('./warehouse/*.parquet')
GROUP BY product"
--output json when you want the result to feed another tool.--output csv when you want a quick export after aggregating raw files.query for ad hoc SQL and run for repeatable SQL files checked into a repo.