
Using catter to gather source code for LLMs

Process and analyze source code for LLM context preparation and token analysis


The pinocchio catter command is a tool for preparing and analyzing source code for Large Language Model (LLM) contexts. It offers two main subcommands: print for outputting and processing file contents, and stats for analyzing codebase statistics.

File Filtering System

The catter command filters files by extension, filename and path pattern, directory, size, and binary content, so you can control exactly which files are processed.

Default Filters

By default, catter excludes common binary files, dependency and tooling directories, and lock files:

  1. Binary File Extensions:

    • Images: .png, .jpg, .jpeg, .gif, .bmp, .tiff, .webp
    • Audio: .mp3, .wav, .ogg, .flac
    • Video: .mp4, .avi, .mov, .wmv
    • Archives: .zip, .tar, .gz, .rar
    • Executables: .exe, .dll, .so, .dylib
    • Documents: .pdf, .doc, .docx, .xls, .xlsx
    • Data: .bin, .dat, .db, .sqlite
    • Fonts: .woff, .ttf, .eot, .svg, .woff2
    • Lock files: .lock
  2. Excluded Directories:

    • Version Control: .git, .svn
    • Dependencies: node_modules, vendor
    • IDE/Editor: .history, .idea, .vscode
    • Build: build, dist, sorbet
    • Documentation: .yardoc
  3. Excluded Filenames (regex patterns):

    • .*-lock\.json$
    • go\.sum$
    • yarn\.lock$
    • package-lock\.json$

These defaults can be disabled using the --disable-default-filters flag.
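To see which names the default filename patterns above would catch, the patterns can be run through grep -E. This is a minimal sketch: it assumes grep's extended regex syntax behaves like catter's regex engine for these simple patterns, and the helper name is_default_excluded is hypothetical, not part of catter.

```bash
# is_default_excluded NAME: succeed if NAME matches any of the default
# filename exclusion patterns listed above.
# grep -E stands in for catter's own regex matching (an assumption).
is_default_excluded() {
  printf '%s\n' "$1" | grep -Eq \
    -e '.*-lock\.json$' \
    -e 'go\.sum$' \
    -e 'yarn\.lock$' \
    -e 'package-lock\.json$'
}

# Try a few sample names.
for name in go.sum package-lock.json composer-lock.json main.go; do
  if is_default_excluded "$name"; then
    echo "excluded: $name"
  else
    echo "kept:     $name"
  fi
done
```

None of the patterns is anchored at the start, so a name only has to end with the given suffix to be excluded.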

Filter Configuration Options

Extension-based Filtering

The following examples use these flags:

  • -i, --include: Specify file extensions to include
  • -e, --exclude: Specify file extensions to exclude

# Include only specific extensions
pinocchio catter print -i .go,.js,.py

# Exclude specific extensions
pinocchio catter print -e .test.js,.spec.py

# Combine include and exclude
pinocchio catter print -i .go,.js -e .test.js

Pattern Matching

Flags used:

  • -f, --match-filename: Match filenames using regex patterns
  • -p, --match-path: Match file paths using regex patterns
  • --exclude-match-filename: Exclude files matching regex patterns
  • --exclude-match-path: Exclude paths matching regex patterns

# Match test files
pinocchio catter print -f "^test_.*\.py$"

# Match multiple patterns
pinocchio catter print -f "^main.*" -f "^app.*"

# Match specific directories while excluding tests
pinocchio catter print -p "src/models/" --exclude-match-path "internal/testing/"
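The difference between filename and path patterns is easiest to see with an anchored expression. The sketch below assumes that "filename" means the last path component and that path patterns see the whole path; the helpers matches_filename and matches_path are hypothetical and use grep -E in place of catter's matcher.

```bash
# matches_filename PATTERN PATH: test PATTERN against the last path component.
matches_filename() {
  basename "$2" | grep -Eq -e "$1"
}

# matches_path PATTERN PATH: test PATTERN against the whole path.
matches_path() {
  printf '%s\n' "$2" | grep -Eq -e "$1"
}

path="src/models/test_user.py"

# The anchored pattern matches the filename "test_user.py" ...
matches_filename '^test_.*\.py$' "$path" && echo "filename match: $path"

# ... but not the full path, which starts with "src/".
matches_path '^test_.*\.py$' "$path" || echo "no path match for the anchored pattern"

# An unanchored directory fragment matches the path.
matches_path 'src/models/' "$path" && echo "path match: $path"
```

Under these assumptions, a pattern beginning with ^ is usually meant for -f, while directory fragments belong with -p.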

Directory Exclusion

Use -x, --exclude-dirs to specify directories to skip:

# Exclude multiple directories
pinocchio catter print -x tests,docs,examples,vendor

Size and Binary Filtering

Flags:

  • --max-file-size: Maximum size for individual files (bytes)
  • --filter-binary: Control binary file filtering

# Set maximum file size and include binary files
pinocchio catter print --max-file-size 500000 --filter-binary=false

GitIgnore Integration

# Use repository's .gitignore rules (default)
pinocchio catter print .

# Disable .gitignore rules
pinocchio catter print --disable-gitignore .

YAML Configuration

Create a .catter-filter.yaml file to define reusable filter profiles:

profiles:
  go-only:
    include-exts: [.go]
    exclude-dirs: [vendor, test]
    exclude-match-filenames: [".*_test\\.go$"]
    max-file-size: 1048576 # 1MB
    filter-binary-files: true

  docs:
    include-exts: [.md, .rst, .txt]
    match-paths: ["docs/", "README"]
    exclude-dirs: [node_modules, vendor]

  tests:
    match-filenames: ["^test_", "_test\\.go$"]
    exclude-dirs: [vendor]

Use profiles with:

pinocchio catter print --filter-profile go-only .

Debugging Filters

Use the verbose flag to see which files are being included or excluded:

pinocchio catter print --verbose .

Print current filter configuration:

pinocchio catter print --print-filters

Filter Precedence

Filters are applied in the following order:

  1. GitIgnore rules (unless disabled)
  2. File size limits
  3. Default exclusions (unless disabled)
  4. Extension includes
  5. Extension excludes
  6. Filename pattern matches
  7. Path pattern matches
  8. Directory exclusions
  9. Binary file filtering

A file must pass all applicable filters to be included in the output.
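The "must pass all filters" rule can be pictured as a chain of checks where the first failure rejects the file. The sketch below is illustrative only: passes_filters is a hypothetical helper, it covers a subset of the stages listed above in the same relative order, and the concrete values (1MB limit, .go extension, vendor directory) are example settings rather than catter's internals.

```bash
# passes_filters PATH SIZE_BYTES: succeed only if every check passes.
passes_filters() {
  f_path=$1
  f_size=$2
  # File size limit (1MB, the documented default for --max-file-size).
  [ "$f_size" -le 1048576 ] || return 1
  # Extension include, like: -i .go
  case "$f_path" in *.go) ;; *) return 1 ;; esac
  # Filename pattern exclusion, like: --exclude-match-filename "_test\.go$"
  if basename "$f_path" | grep -Eq -e '_test\.go$'; then return 1; fi
  # Directory exclusion, like: -x vendor
  case "/$f_path" in */vendor/*) return 1 ;; esac
  return 0
}

# Each input line is "path size-in-bytes".
while read -r path size; do
  if passes_filters "$path" "$size"; then
    echo "include $path"
  else
    echo "skip    $path"
  fi
done <<'EOF'
cmd/main.go 2048
cmd/main_test.go 1024
vendor/lib/util.go 512
assets/big.go 5000000
README.md 300
EOF
```

Only cmd/main.go survives: each of the other paths fails at least one check, which mirrors how adding a filter can only narrow the result.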

Best Practices

  1. Start Broad, Then Narrow

    # Start with extension filtering
    pinocchio catter print --include .py .
    
    # Add specific patterns (exclude test files by name)
    pinocchio catter print --include .py --exclude-match-filename "^test_.*\.py$" .
  2. Use Multiple Filter Types

    # Combine different filter types for precision
    pinocchio catter print \
      --include .go \
      --exclude-dirs vendor,test \
      --match-path "src/" \
      --exclude-match-filename "_test\.go$"
  3. Profile-based Workflow

    • Create profiles for common tasks
    • Use environment variables for profile selection
    • Share profiles across team members
  4. Performance Considerations

    • Start with directory exclusions for large codebases
    • Use file size limits for large files
    • Enable binary filtering to avoid processing non-text files

Common Use Cases

1. Preparing Code for LLM Prompts

Flags used:

  • -d, --delimiter: Output format for text (markdown, xml, simple, begin-end)
  • -s, --stats: Statistics detail level (overview, dir, full)

# Get Python files with context headers (text output)
pinocchio catter print -i .py -x tests/ -d markdown src/

# Process specific files with token statistics
pinocchio catter stats -s full main.go utils.go config.go

2. Archiving Filtered Files

Flags used:

  • -a, --archive-file: Output archive file path (e.g., output.zip, codebase.tar.gz)
  • --archive-prefix: Directory prefix within the archive (e.g., my-project/)

# Archive all .go files (excluding vendor) into a zip file
pinocchio catter print -i .go -x vendor -a go_files.zip .

# Archive .py and .js files into a tar.gz, placing them under a 'src' prefix
pinocchio catter print -i .py,.js --archive-prefix src/ -a source_archive.tar.gz .

# Archive files matching a path pattern into a zip file
pinocchio catter print -p "internal/api/" -a api_files.zip .

3. Token-Aware Processing

Flags:

  • --max-tokens: Limit total tokens processed (applies to text output and archive content)
  • --max-lines: Limit lines per file (applies to text output and archive content)
  • --glazed: Enable structured output (text output only)

# Limit tokens while getting detailed stats (text output)
pinocchio catter print --max-tokens 4000 --max-lines 100 --glazed src/

# Limit tokens when creating an archive
pinocchio catter print --max-tokens 10000 -i .go -a limited_go.zip .

# Get structured stats output
pinocchio catter stats --glazed -s full . | glazed format -f json
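When picking a value for --max-tokens, a quick size estimate can help before running catter at all. The sketch below uses the common rule of thumb of roughly four bytes per token for English text and code. This is an approximation and not the tokenizer catter uses; estimate_tokens is a hypothetical helper.

```bash
# estimate_tokens: read text on stdin and print a rough token estimate,
# assuming about 4 bytes per token (a heuristic, not a real tokenizer).
estimate_tokens() {
  # tr strips the padding some wc implementations add around the count.
  bytes=$(wc -c | tr -d ' ')
  echo $(( bytes / 4 ))
}

# Example: estimate a short snippet.
printf '%s' 'func main() { println("hello") }' | estimate_tokens
```

For real numbers, the stats command shown above reports actual token counts; the estimate is only useful as a first guess.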

Command Reference

Print Command

pinocchio catter print [flags] <paths...>

Main flags:

  • --max-file-size: Limit individual file sizes (default: 1MB)
  • --max-total-size: Limit total processed size (default: 10MB)
  • -i, --include: File extensions to include (e.g., .go,.js)
  • -e, --exclude: File extensions to exclude
  • -d, --delimiter: Output format for text output (default, xml, markdown, simple, begin-end)
  • --max-lines: Maximum lines per file (applies to text and archive)
  • --max-tokens: Maximum tokens per file (applies to text and archive)
  • -a, --archive-file: Path to output archive file. Format (zip or tar.gz/.tgz) inferred from extension. If set, text output flags (-d, --glazed) are ignored.
  • --archive-prefix: Directory prefix to add within the archive (e.g., myproject/). Used only with --archive-file.
  • --glazed: Enable structured output (ignored if --archive-file is used)
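Because the archive format is inferred from the file extension, a script can check the output name before invoking catter. The following is a minimal sketch of that rule as described above; archive_format is a hypothetical helper, and whether catter treats extensions case-sensitively is not stated here, so the sketch matches lowercase only.

```bash
# archive_format FILE: print the archive format implied by FILE's extension,
# or "unsupported" (with a non-zero status) for anything else.
archive_format() {
  case "$1" in
    *.zip) echo "zip" ;;
    *.tar.gz | *.tgz) echo "tar.gz" ;;
    *) echo "unsupported"; return 1 ;;
  esac
}

for out in go_files.zip source_archive.tar.gz backup.tgz notes.rar; do
  echo "$out -> $(archive_format "$out")"
done
```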

Filtering options:

  • -f, --match-filename: Regex patterns for filenames
  • -p, --match-path: Regex patterns for file paths
  • -x, --exclude-dirs: Directories to exclude
  • --disable-gitignore: Ignore .gitignore rules
  • --print-filters: Print the resolved filter configuration and exit.
  • --filter-yaml: Path to a YAML file with filter profiles.
  • --filter-profile: Name of a filter profile to use from YAML.
  • --disable-default-filters: Disable built-in default filters.

Stats Command

pinocchio catter stats [flags] <paths...>

Main flags:

  • -s, --stats: Statistics detail level (overview, dir, full)
  • --glazed: Enable structured output (default: true)

The stats command provides:

  • Total token counts
  • File and directory statistics
  • Extension-based analysis
  • Line counts and file sizes

Advanced Usage

1. Using YAML Configuration

Create a .catter-filter.yaml file for persistent settings:

profiles:
  python-only:
    include-exts: [.py]
    exclude-dirs: [venv, __pycache__]
  api-docs:
    match-paths: ["api/", "docs/"]
    include-exts: [.md, .rst]

Use profiles:

pinocchio catter print --filter-profile python-only .

2. Structured Output

Generate machine-readable output (only for text output modes):

# Get JSON-formatted stats
pinocchio catter stats --glazed -s full . | glazed format -f json

# Process text output with other tools
pinocchio catter print --glazed src/ | glazed filter --col Content

3. Context-Aware Processing

Maintain code context with delimiters:

# XML format for structured parsing (e.g., for Claude)
pinocchio catter print -d xml src/

# Markdown format separator
pinocchio catter print -d markdown --include .md,.rst docs/

4. Gitignore Integration

Respect repository settings:

# Use repository's .gitignore
pinocchio catter print .

# Override gitignore rules
pinocchio catter print --disable-gitignore .

Tips and Best Practices

  1. Token Optimization

    • Use --max-tokens to stay within API limits
    • Combine with --max-lines for reasonable chunk sizes
    • Use the stats command to analyze token usage patterns
  2. Filtering Strategy

    • Start with broad filters and refine
    • Use --print-filters to verify configuration
    • Combine path and filename patterns for precision
  3. Output Management

    • Choose appropriate delimiters for your use case
    • Use structured output for automation
    • Consider file size limits for large codebases
  4. Configuration Management

    • Use YAML profiles for repeated tasks
    • Set the CATTER_PROFILE environment variable to select a profile
    • Create project-specific filter configurations
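How CATTER_PROFILE is consumed is not spelled out here. One way to make the selection explicit in scripts is to translate the variable into the documented --filter-profile flag. This is a sketch under that assumption; catter_profile_args is a hypothetical helper and the command is only printed, not run.

```bash
# catter_profile_args: print "--filter-profile NAME" when CATTER_PROFILE is
# set and non-empty, and nothing otherwise.
catter_profile_args() {
  if [ -n "${CATTER_PROFILE:-}" ]; then
    printf '%s %s\n' --filter-profile "$CATTER_PROFILE"
  fi
}

# Show the command that would be run for the go-only profile.
CATTER_PROFILE=go-only
echo "pinocchio catter print $(catter_profile_args) ."
```

With the variable unset, the helper prints nothing and the command falls back to catter's defaults.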

Error Handling

Common error scenarios and solutions:

  1. Size Limits

    • "maximum total size limit reached": Increase --max-total-size
    • "maximum tokens limit reached": Adjust --max-tokens
  2. Filter Issues

    • No files processed: Check filter patterns
    • Unexpected files: Verify .gitignore settings
  3. Performance

    • Large directories: Use specific paths
    • Memory usage: Set appropriate size limits

Integration Examples

1. With LLM Tools

# Prepare code for OpenAI API
pinocchio catter print --max-tokens 4000 -d markdown src/ > context.md

# Generate documentation
pinocchio catter print --include .go --exclude-dirs vendor/ . | pinocchio code professional --context - "Generate documentation"

2. With Development Workflows

# Code review preparation
pinocchio catter print --match-path "changed/" -d markdown > review.md

# Documentation updates
pinocchio catter stats -s full . > codebase-metrics.json