Token Count Modes

Choosing between local estimates and provider-native token counting

Overview

pinocchio tokens count now supports three counting modes through --count-mode:

  • estimate: local tokenizer estimate using the existing tiktoken-based path
  • api: provider-native token counting through Geppetto
  • auto: try provider-native counting first and fall back to a local estimate

Basic Examples

Local estimate:

pinocchio tokens count --count-mode estimate --model gpt-4o-mini prompt.txt

Profile-first provider count:

pinocchio tokens count \
  --count-mode api \
  --profile gpt-4o-mini \
  --profile-registries ~/.config/pinocchio/profiles.yaml \
  prompt.txt

OpenAI Responses API count with explicit flags:

pinocchio tokens count \
  --count-mode api \
  --model gpt-4o-mini \
  --ai-api-type openai-responses \
  --openai-api-key "$OPENAI_API_KEY" \
  prompt.txt

Anthropic count with explicit flags:

pinocchio tokens count \
  --count-mode api \
  --model claude-sonnet-4-20250514 \
  --ai-api-type claude \
  --claude-api-key "$ANTHROPIC_API_KEY" \
  prompt.txt

Automatic fallback:

pinocchio tokens count --count-mode auto --model gpt-4o-mini prompt.txt
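
The fallback behavior of auto can be pictured as a composition of the other two modes. The following is only a conceptual sketch, not how pinocchio implements it internally: it assumes that an api run exits non-zero when provider-native counting fails, and both the count_tokens_auto function and the PINOCCHIO override variable are hypothetical names introduced for illustration.

```shell
# Conceptual sketch of auto: try provider-native counting first,
# then fall back to a local estimate if that attempt fails.
# Assumption: an api run exits non-zero on failure.
# PINOCCHIO is a hypothetical override so the binary can be swapped for a stub.
count_tokens_auto() {
  model=$1
  file=$2
  bin=${PINOCCHIO:-pinocchio}

  # First attempt: provider-native count.
  if "$bin" tokens count --count-mode api --model "$model" "$file"; then
    return 0
  fi

  # Fallback: local tokenizer estimate; the notice goes to stderr
  # so stdout carries only the command's own output.
  echo "provider count failed; using local estimate" >&2
  "$bin" tokens count --count-mode estimate --model "$model" "$file"
}
```

In practice --count-mode auto already does this in one invocation and also reports the provider error that triggered the fallback, so a wrapper like this is useful mainly for reasoning about the behavior.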

How To Choose

  • Use estimate when you want a fast local answer and do not need provider-exact counts.
  • Use api when exact provider accounting matters. Prefer --profile plus --profile-registries for normal operator workflows.
  • Use auto when you prefer provider-exact counts but still want the command to succeed when credentials are missing or provider-native counting is unavailable.
  • Keep the explicit provider flags for ad-hoc debugging and CLI testing.
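
The guidance above can be turned into a small wrapper for scripts. This is a minimal sketch under stated assumptions: choose_count_mode and the STRICT_COUNTS variable are hypothetical names and not pinocchio settings, and checking OPENAI_API_KEY is just one possible way to detect credentials.

```shell
# Hypothetical helper: pick a --count-mode value from the environment.
# STRICT_COUNTS is an illustrative variable, not something pinocchio reads.
choose_count_mode() {
  if [ "${STRICT_COUNTS:-0}" = "1" ]; then
    # Exact provider accounting is required: fail rather than estimate.
    echo api
  elif [ -n "${OPENAI_API_KEY:-}" ]; then
    # Credentials are present: prefer provider counts, allow fallback.
    echo auto
  else
    # No credentials: skip the provider attempt and estimate locally.
    echo estimate
  fi
}
```

The result can be passed straight to the flag:

pinocchio tokens count --count-mode "$(choose_count_mode)" --model gpt-4o-mini prompt.txt

When the helper selects api, the command may also need a profile or the explicit provider flags shown in the basic examples.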

Output Shape

The command prints the requested mode and the actual count source so fallback behavior is explicit.

  • Estimate output includes the tokenizer codec used.
  • API output includes the provider and endpoint used.
  • Auto fallback output includes the provider error that triggered the local estimate.