Process and analyze source code for LLM context preparation and token analysis
The pinocchio catter command is a tool for preparing and analyzing source code for Large Language Model (LLM) contexts. It offers two main subcommands: print for outputting and processing file contents, and stats for analyzing codebase statistics.
The catter command provides a layered filtering system that controls which files are processed: built-in defaults, extension and regex filters, directory exclusions, size limits, .gitignore rules, and reusable profiles.
By default, catter excludes common binary and non-text files, tool and build directories, and lockfiles:
Binary File Extensions:
.png, .jpg, .jpeg, .gif, .bmp, .tiff, .webp
.mp3, .wav, .ogg, .flac
.mp4, .avi, .mov, .wmv
.zip, .tar, .gz, .rar
.exe, .dll, .so, .dylib
.pdf, .doc, .docx, .xls, .xlsx
.bin, .dat, .db, .sqlite
.woff, .ttf, .eot, .svg, .woff2
.lock
Excluded Directories:
.git, .svn
node_modules, vendor
.history, .idea, .vscode
build, dist, sorbet
.yardoc
Excluded Filenames (regex patterns):
.*-lock\.json$
go\.sum$
yarn\.lock$
package-lock\.json$
These defaults can be disabled using the --disable-default-filters flag.
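To check which filenames the default regex patterns would drop, the patterns can be tried with grep -E before running catter. This is a sketch under two assumptions: that catter matches these patterns against the base filename, and that grep's extended regex syntax behaves like catter's regex engine for patterns this simple. The helper name drop_default_lockfiles is hypothetical.

```shell
# Hypothetical helper: drop filenames matching catter's default
# filename exclusions (approximated with POSIX grep -E).
# Reads one base filename per line on stdin.
drop_default_lockfiles() {
  grep -Ev \
    -e '.*-lock\.json$' \
    -e 'go\.sum$' \
    -e 'yarn\.lock$' \
    -e 'package-lock\.json$'
}

# Only main.go and README.md survive this list.
printf '%s\n' go.sum yarn.lock package-lock.json main.go README.md | drop_default_lockfiles
```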
The following examples use these flags:
-i, --include: Specify file extensions to include
-e, --exclude: Specify file extensions to exclude
# Include only specific extensions
pinocchio catter print -i .go,.js,.py
# Exclude specific extensions
pinocchio catter print -e .test.js,.spec.py
# Combine include and exclude
pinocchio catter print -i .go,.js -e .test.js
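Before running a combined include/exclude filter, a rough preview of the matching files can be produced with find(1). This sketch assumes that catter compares the -i/-e values against the end of each filename, so .test.js removes app.test.js while other .js files are kept; it ignores the default exclusions and .gitignore rules. The helper name preview_ext_filter is hypothetical.

```shell
# Hypothetical helper approximating:
#   pinocchio catter print -i .go,.js -e .test.js <dir>
# $1: directory to scan. Prints matching file paths, sorted.
preview_ext_filter() {
  find "$1" -type f \( -name '*.go' -o -name '*.js' \) ! -name '*.test.js' | sort
}

# Usage: preview_ext_filter .
```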
Flags used:
-f, --match-filename: Match filenames using regex patterns
-p, --match-path: Match file paths using regex patterns
--exclude-match-filename: Exclude files matching regex patterns
--exclude-match-path: Exclude paths matching regex patterns
# Match test files
pinocchio catter print -f "^test_.*\.py$"
# Match multiple patterns
pinocchio catter print -f "^main.*" -f "^app.*"
# Match specific directories while excluding tests
pinocchio catter print -p "src/models/" --exclude-match-path "internal/testing/"
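A regex can be checked against sample filenames before it is passed to -f or --exclude-match-filename. The sketch below uses grep -E as a stand-in and assumes that catter uses Go's regexp package (RE2 syntax), which agrees with grep -E on simple anchors, character classes, and alternation but does not accept Perl-style lookahead such as (?!...). The helper name match_filenames is hypothetical.

```shell
# Hypothetical helper: print the stdin lines (one base filename each)
# that match the extended regex given as $1.
match_filenames() {
  grep -E -- "$1"
}

# Same pattern as: pinocchio catter print -f "^test_.*\.py$"
printf '%s\n' test_models.py models.py test_utils.py conftest.py | match_filenames '^test_.*\.py$'
```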
Using -x, --exclude-dirs to specify directories to skip:
# Exclude multiple directories
pinocchio catter print -x tests,docs,examples,vendor
Flags:
--max-file-size: Maximum size for individual files (bytes)
--filter-binary: Control binary file filtering
# Set maximum file size and include binary files
pinocchio catter print --max-file-size 500000 --filter-binary=false
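To see in advance which files a --max-file-size limit would affect, find(1) can list files above a byte threshold. This assumes catter compares the limit with each file's on-disk size in bytes and skips files strictly larger than it; the helper name oversized_files is hypothetical.

```shell
# Hypothetical helper: list regular files under $1 whose size is
# strictly greater than $2 bytes (the "c" unit means bytes).
oversized_files() {
  find "$1" -type f -size +"$2"c | sort
}

# Usage, matching --max-file-size 500000: oversized_files . 500000
```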
# Use repository's .gitignore rules (default)
pinocchio catter print .
# Disable .gitignore rules
pinocchio catter print --disable-gitignore .
Create a .catter-filter.yaml file to define reusable filter profiles:
profiles:
  go-only:
    include-exts: [.go]
    exclude-dirs: [vendor, test]
    exclude-match-filenames: [".*_test\\.go$"]
    max-file-size: 1048576 # 1MB
    filter-binary-files: true
  docs:
    include-exts: [.md, .rst, .txt]
    match-paths: ["docs/", "README"]
    exclude-dirs: [node_modules, vendor]
  tests:
    match-filenames: ["^test_", "_test\\.go$"]
    exclude-dirs: [vendor]
Use profiles with:
pinocchio catter print --filter-profile go-only .
Use the verbose flag to see which files are being included or excluded:
pinocchio catter print --verbose .
Print current filter configuration:
pinocchio catter print --print-filters
Filters are applied in the following order:
A file must pass all applicable filters to be included in the output.
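The all-filters-must-pass rule can be modeled as a chain of checks that rejects a file at the first failure. The function below is a conceptual sketch, not catter's implementation: it hard-codes the checks behind -i .go, -x vendor, and --exclude-match-filename "_test\.go$", and returns success only when every check passes. The name keep_file is hypothetical.

```shell
# Conceptual model only: returns 0 (keep) when the path passes every
# check, 1 (skip) as soon as one check fails.
keep_file() {
  path=$1
  name=$(basename "$path")

  # Extension include filter (-i .go)
  case "$name" in
    *.go) ;;
    *) return 1 ;;
  esac

  # Excluded directory (-x vendor): reject any path with a vendor component
  case "/$path" in
    */vendor/*) return 1 ;;
  esac

  # Excluded filename pattern (--exclude-match-filename "_test\.go$")
  if printf '%s\n' "$name" | grep -Eq '_test\.go$'; then
    return 1
  fi

  return 0
}
```

For example, keep_file src/main.go succeeds, while src/main_test.go, vendor/lib/a.go, and README.md are each rejected by a different check.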
Start Broad, Then Narrow
# Start with extension filtering
pinocchio catter print --include .py .
# Then drop test files with an exclude pattern
pinocchio catter print --include .py --exclude-match-filename "^test_" .
Use Multiple Filter Types
# Combine different filter types for precision
pinocchio catter print \
--include .go \
--exclude-dirs vendor,test \
--match-path "src/" \
--exclude-match-filename "_test\.go$"
Profile-based Workflow
Performance Considerations
Flags used:
-d, --delimiter: Output format for text (markdown, xml, simple, begin-end)
-s, --stats: Statistics detail level (overview, dir, full)
# Get Python files with context headers (text output)
pinocchio catter print -i .py -x tests/ -d markdown src/
# Process specific files with token statistics
pinocchio catter stats -s full main.go utils.go config.go
Flags used:
-a, --archive-file: Output archive file path (e.g., output.zip, codebase.tar.gz)
--archive-prefix: Directory prefix within the archive (e.g., my-project/)
# Archive all .go files (excluding vendor) into a zip file
pinocchio catter print -i .go -x vendor -a go_files.zip .
# Archive .py and .js files into a tar.gz, placing them under a 'src' prefix
pinocchio catter print -i .py,.js --archive-prefix src/ -a source_archive.tar.gz .
# Archive files matching a path pattern into a zip file
pinocchio catter print -p "internal/api/" -a api_files.zip .
Flags:
--max-tokens: Limit total tokens processed (applies to text output and archive content)
--max-lines: Limit lines per file (applies to text output and archive content)
--glazed: Enable structured output (text output only)
# Limit tokens while getting detailed stats (text output)
pinocchio catter print --max-tokens 4000 --max-lines 100 --glazed src/
# Limit tokens when creating an archive
pinocchio catter print --max-tokens 10000 -i .go -a limited_go.zip .
# Get structured stats output
pinocchio catter stats --glazed -s full . | glazed format -f json
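To find which files would be truncated by a --max-lines limit, line counts can be checked with wc -l. This assumes catter counts newline-terminated lines the way wc -l does; the helper name long_files is hypothetical.

```shell
# Hypothetical helper: $1 is the line limit, remaining arguments are files.
# Prints each file whose line count exceeds the limit.
long_files() {
  limit=$1
  shift
  for f in "$@"; do
    # tr strips the padding some wc implementations add
    n=$(wc -l < "$f" | tr -d ' ')
    if [ "$n" -gt "$limit" ]; then
      printf '%s: %s lines\n' "$f" "$n"
    fi
  done
}

# Usage, matching --max-lines 100: long_files 100 src/*.go
```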
pinocchio catter print [flags] <paths...>
Main flags:
--max-file-size: Limit individual file sizes (default: 1MB)
--max-total-size: Limit total processed size (default: 10MB)
-i, --include: File extensions to include (e.g., .go,.js)
-e, --exclude: File extensions to exclude
-d, --delimiter: Output format for text output (default, xml, markdown, simple, begin-end)
--max-lines: Maximum lines per file (applies to text and archive)
--max-tokens: Maximum tokens per file (applies to text and archive)
-a, --archive-file: Path to output archive file. Format (zip or tar.gz/.tgz) inferred from extension. If set, text output flags (-d, --glazed) are ignored.
--archive-prefix: Directory prefix to add within the archive (e.g., myproject/). Used only with --archive-file.
--glazed: Enable structured output (ignored if --archive-file is used)
Filtering options:
-f, --match-filename: Regex patterns for filenames
-p, --match-path: Regex patterns for file paths
-x, --exclude-dirs: Directories to exclude
--disable-gitignore: Ignore .gitignore rules
--print-filters: Print the resolved filter configuration and exit.
--filter-yaml: Path to a YAML file with filter profiles.
--filter-profile: Name of a filter profile to use from YAML.
--disable-default-filters: Disable built-in default filters.
pinocchio catter stats [flags] <paths...>
Main flags:
-s, --stats: Statistics detail level (overview, dir, full)
--glazed: Enable structured output (default: true)
The stats command provides:
Create a .catter-filter.yaml file for persistent settings:
profiles:
  python-only:
    include-exts: [.py]
    exclude-dirs: [venv, __pycache__]
  api-docs:
    match-paths: ["api/", "docs/"]
    include-exts: [.md, .rst]
Use profiles:
pinocchio catter print --filter-profile python-only .
Generate machine-readable output (only for text output modes):
# Get JSON-formatted stats
pinocchio catter stats --glazed -s full . | glazed format -f json
# Process text output with other tools
pinocchio catter print --glazed src/ | glazed filter --col Content
Maintain code context with delimiters:
# XML format for structured parsing (e.g., for Claude)
pinocchio catter print -d xml src/
# Markdown delimiters
pinocchio catter print -d markdown --include .md,.rst docs/
Respect repository settings:
# Use repository's .gitignore
pinocchio catter print .
# Override gitignore rules
pinocchio catter print --disable-gitignore .
Token Optimization
Use --max-tokens to stay within API limits
Use --max-lines for reasonable chunk sizes
Filtering Strategy
Use --print-filters to verify the configuration
Output Management
Configuration Management
Common error scenarios and solutions:
Size Limits
--max-total-size
--max-tokens
Filter Issues
Performance
# Prepare code for OpenAI API
pinocchio catter print --max-tokens 4000 -d markdown src/ > context.md
# Generate documentation
pinocchio catter print --include .go --exclude-dirs vendor/ . | pinocchio code professional --context - "Generate documentation"
# Code review preparation
pinocchio catter print --match-path "changed/" -d markdown > review.md
# Documentation updates
pinocchio catter stats -s full . > codebase-metrics.json