LLM Proxy Overview

OpenAI-compatible proxy backed by Geppetto profile runtime configuration.

Sections

Terminology & Glossary
πŸ“– Documentation
Navigation
1 sectionv0.1
πŸ“„ LLM Proxy Overview β€” glaze help llm-proxy-overview
llm-proxy-overview

LLM Proxy Overview

OpenAI-compatible proxy backed by Geppetto profile runtime configuration.

Topicllm-proxyopenaigeppettollm-proxy-serverllm-proxy-server servelistenprofiles

llm-proxy-server exposes an OpenAI-compatible HTTP API and translates requests into Geppetto inference calls. Provider credentials and model routing live in Geppetto profile YAML; the proxy itself does not store API keys or provider routing tables.

The main runtime flow is:

  1. Run the Glazed-backed serve command.
  2. Load optional profile YAML from --profiles.
  3. Build OpenAI-compatible model, completion, and chat-completion services from those profiles.
  4. Start an HTTP server on --listen.
  5. Serve /healthz, /v1/models, /v1/completions, and /v1/chat/completions.

Example:

llm-proxy-server serve --profiles examples/profiles.yaml --listen 127.0.0.1:8080