## Two services, two ports
| Port | Service | What lives there |
|---|---|---|
| 8080 | Gateway (Go) | `/v1/chat/completions`, `/v1/route`, `/v1/models`, `/health` |
| 8000 | Management (Py) | Distillation jobs, trace search, clustering, evaluations, datasets |
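A client typically keeps one base URL per service. A minimal sketch, assuming a local deployment (the `localhost` hostnames are an assumption; only the ports come from the table above):

```python
# Base URLs for the two services; localhost is assumed for a local
# deployment, the ports come from the table above.
GATEWAY_URL = "http://localhost:8080"     # Go gateway: completions, routing
MANAGEMENT_URL = "http://localhost:8000"  # Python management: jobs, traces

def endpoint(base: str, path: str) -> str:
    """Join a service base URL and an endpoint path."""
    return base.rstrip("/") + path

print(endpoint(GATEWAY_URL, "/v1/chat/completions"))
# → http://localhost:8080/v1/chat/completions
```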
## Authentication
Out of the box, neither port requires auth. The engine expects you to put it behind your own proxy (Traefik, Caddy, a VPN, a service mesh) before exposing it to the internet. When you configure authentication, both services accept a bearer token in the `Authorization` header.
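A sketch of attaching that token to a request, using Python's standard library (the token value and the host are placeholders, not part of this document):

```python
import urllib.request

# Sketch: attaching a bearer token once auth is configured.
# "YOUR_TOKEN" and the localhost host are placeholders.
token = "YOUR_TOKEN"
req = urllib.request.Request(
    "http://localhost:8080/v1/models",
    headers={"Authorization": f"Bearer {token}"},
)
print(req.get_header("Authorization"))  # → Bearer YOUR_TOKEN
```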
## Request format

Every endpoint is JSON-in, JSON-out.
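For example, a minimal chat-completions request body looks like this (the `"auto"` model name is illustrative; the message schema follows the OpenAI-compatible format):

```python
import json

# A minimal JSON body for /v1/chat/completions. The "auto" model name is
# illustrative; the messages array follows the OpenAI-compatible schema.
payload = {
    "model": "auto",
    "messages": [{"role": "user", "content": "Hello"}],
}
body = json.dumps(payload).encode("utf-8")
# JSON-in, JSON-out, so both content-negotiation headers are JSON:
headers = {"Content-Type": "application/json", "Accept": "application/json"}
```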
## OpenTracy-specific response headers

The gateway adds a few headers to every `/v1/chat/completions` response:
| Header | Meaning |
|---|---|
X-OpenTracy-Selected-Model | Concrete model the request was answered by. |
X-OpenTracy-Cluster-ID | Semantic cluster the prompt landed in (0–99). |
X-OpenTracy-Expected-Error | The router’s predicted error rate for this cluster/model. |
X-OpenTracy-Routing-Ms | Wall time spent on the routing decision. |
X-OpenTracy-Session-Id | For multi-turn tool calls — pass back on follow-up requests. |
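A sketch of consuming that metadata on the client side; the header values below are illustrative stand-ins for a real response:

```python
# Sketch: reading the routing metadata the gateway attaches to a response.
# These values are illustrative stand-ins, not real output.
resp_headers = {
    "X-OpenTracy-Selected-Model": "some-model",
    "X-OpenTracy-Cluster-ID": "42",
    "X-OpenTracy-Expected-Error": "0.03",
    "X-OpenTracy-Routing-Ms": "1.7",
    "X-OpenTracy-Session-Id": "sess-123",
}
cluster_id = int(resp_headers["X-OpenTracy-Cluster-ID"])            # 0-99
expected_error = float(resp_headers["X-OpenTracy-Expected-Error"])
# On a follow-up request in a multi-turn tool call, echo the session id:
follow_up = {"X-OpenTracy-Session-Id": resp_headers["X-OpenTracy-Session-Id"]}
```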
## Pages
### Chat completions
`POST /v1/chat/completions` — the OpenAI-compatible entry point.
### Routing decision

`POST /v1/route` — ask the router which model it would pick, without generating.
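A sketch of such a call with Python's standard library; the host and the prompt are assumptions, and the response schema is not specified in this document:

```python
import json
import urllib.request

# Sketch of a /v1/route call: same message payload as a chat completion,
# but nothing is generated. Host and prompt are assumptions.
body = json.dumps(
    {"messages": [{"role": "user", "content": "Summarize this contract."}]}
).encode("utf-8")
req = urllib.request.Request(
    "http://localhost:8080/v1/route",
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req) would return the routing decision as JSON;
# the exact response schema is not documented here.
```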
### Models & health

`GET /v1/models` and `GET /health` — discover what's configured.
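A minimal liveness probe built on `/health` might look like this (the host is an assumption for a local deployment):

```python
import urllib.request

# Minimal liveness probe against the gateway's /health endpoint.
# The localhost base URL is an assumption for a local deployment.
def gateway_healthy(base: str = "http://localhost:8080") -> bool:
    try:
        with urllib.request.urlopen(base + "/health", timeout=2) as resp:
            return resp.status == 200
    except OSError:
        # Connection refused, DNS failure, timeout, etc.
        return False
```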
### Distillation

Create jobs, poll status, and fetch artifacts over HTTP.
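The create/poll/fetch pattern reduces to a generic poll loop. A sketch; the terminal status names here are hypothetical and should be replaced with the management API's actual values:

```python
import time

# Generic poll loop for a distillation job. The "succeeded"/"failed"
# status names are hypothetical stand-ins for the management API's
# actual terminal states.
def wait_for_job(fetch_status, poll_interval=1.0, max_polls=600):
    """Call fetch_status() until it reports a terminal state."""
    for _ in range(max_polls):
        status = fetch_status()
        if status in ("succeeded", "failed"):
            return status
        time.sleep(poll_interval)
    raise TimeoutError("distillation job did not finish in time")
```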
### Traces
Search captured traces by model, time range, cost, or metadata.
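Filters like these usually travel as query parameters. A sketch of building such a query; the endpoint path and parameter names are hypothetical, since the document only lists the filter dimensions:

```python
from urllib.parse import urlencode

# Sketch: building a trace-search query string. The /v1/traces path and
# the parameter names are hypothetical; only the filter dimensions
# (model, time range, cost, metadata) come from the document.
params = urlencode({"model": "some-model", "max_cost_usd": "0.01"})
url = "http://localhost:8000/v1/traces?" + params
```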

