The OpenTracy engine speaks OpenAI-compatible HTTP, so any language that can do a POST request can use it. Python is a convenience; the REST API is the real surface.

Two services, two ports

Port | Service | What lives there
-----|---------|-----------------
8080 | Gateway (Go) | /v1/chat/completions, /v1/route, /v1/models, /health
8000 | Management (Py) | Distillation jobs, trace search, clustering, evaluations, datasets
The gateway is the hot path — every completion goes through it. The management API is the slow path — you hit it when you’re building datasets, kicking off training jobs, or reviewing analytics.

Authentication

Out of the box, neither port requires auth. The engine expects you to put it behind your own proxy (Traefik, Caddy, a VPN, a service mesh) before exposing it to the internet. When you configure authentication, both services accept a bearer token:
Authorization: Bearer <your-token>
Provider API keys (OpenAI, Anthropic, …) are held by the engine, not passed by the client. See Self-hosting → Configuration.
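Once a token is configured, attaching it is one header. A minimal sketch with Python's standard library, assuming a hypothetical token value and a local deployment:

```python
import urllib.request

TOKEN = "my-secret-token"        # hypothetical; use your deployment's token
BASE = "http://localhost:8080"   # assumed local gateway address

# Build (but don't send) an authenticated request to a gateway endpoint.
req = urllib.request.Request(
    f"{BASE}/v1/models",
    headers={"Authorization": f"Bearer {TOKEN}"},
)
# urllib.request.urlopen(req) would send it; omitted so the sketch stays offline.
```

Note the provider keys never appear here: the client authenticates to the engine, and the engine holds the upstream credentials.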

Request format

Every endpoint is JSON-in, JSON-out:
POST /v1/chat/completions HTTP/1.1
Host: localhost:8080
Content-Type: application/json

{ ... }
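The same JSON-in pattern, sketched with Python's standard library. The `auto` model name, message shape beyond the OpenAI basics, and local address are illustrative assumptions:

```python
import json
import urllib.request

GATEWAY = "http://localhost:8080"  # assumed local deployment

def build_chat_request(model, messages, token=None):
    """Assemble (but do not send) a POST to /v1/chat/completions."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if token:  # only needed once auth is configured
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        f"{GATEWAY}/v1/chat/completions", data=body, headers=headers, method="POST"
    )

req = build_chat_request("auto", [{"role": "user", "content": "hello"}])
# urllib.request.urlopen(req) would send it and return UTF-8 JSON.
```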
All responses are UTF-8 JSON. Errors follow the OpenAI shape:
{
  "error": {
    "message": "Model 'foo' not found",
    "type": "invalid_request",
    "code": "model_not_found"
  }
}
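Because errors follow the OpenAI shape, a client can handle them generically. A small sketch that pulls the `code` and `message` fields out of a non-2xx body:

```python
import json

def parse_error(body):
    """Extract (code, message) from an OpenAI-shaped error response body."""
    err = json.loads(body).get("error", {})
    return err.get("code"), err.get("message")

code, msg = parse_error(
    b'{"error": {"message": "Model \'foo\' not found",'
    b' "type": "invalid_request", "code": "model_not_found"}}'
)
```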

OpenTracy-specific response headers

The gateway adds a few headers to every /v1/chat/completions response:
Header | Meaning
-------|--------
X-OpenTracy-Selected-Model | Concrete model that answered the request.
X-OpenTracy-Cluster-ID | Semantic cluster the prompt landed in (0–99).
X-OpenTracy-Expected-Error | The router's predicted error rate for this cluster/model.
X-OpenTracy-Routing-Ms | Wall time spent on the routing decision.
X-OpenTracy-Session-Id | For multi-turn tool calls; pass it back on follow-up requests.
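HTTP header names are case-insensitive, so a client should not match on exact casing. A sketch that collects the diagnostic headers from any response's header mapping (the field names in the returned dict are this example's own choice):

```python
def routing_info(headers):
    """Collect OpenTracy diagnostic headers, case-insensitively."""
    lower = {k.lower(): v for k, v in headers.items()}
    return {
        "model": lower.get("x-opentracy-selected-model"),
        "cluster": int(lower["x-opentracy-cluster-id"])
                   if "x-opentracy-cluster-id" in lower else None,
        "routing_ms": float(lower.get("x-opentracy-routing-ms", 0.0)),
        "session_id": lower.get("x-opentracy-session-id"),
    }

info = routing_info({
    "X-OpenTracy-Selected-Model": "gpt-4o-mini",  # illustrative model name
    "X-OpenTracy-Cluster-ID": "17",
    "X-OpenTracy-Routing-Ms": "2.4",
})
```

Remembering `session_id` and sending it back is what keeps multi-turn tool-call sequences on the same session.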

Health

curl -s http://localhost:8080/health
curl -s http://localhost:8000/health
A healthy gateway responds with something like:
{
  "status": "healthy",
  "router_initialized": true,
  "num_models": 12,
  "num_clusters": 100,
  "embedder_ready": true
}
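For readiness probes, checking `"status": "healthy"` alone misses a router that is still warming up. A sketch of a stricter check over the fields shown above:

```python
import json

def is_ready(health_body):
    """True only when the gateway reports a fully initialized router."""
    h = json.loads(health_body)
    return (
        h.get("status") == "healthy"
        and h.get("router_initialized", False)
        and h.get("embedder_ready", False)
    )

ready = is_ready(
    b'{"status": "healthy", "router_initialized": true,'
    b' "num_models": 12, "num_clusters": 100, "embedder_ready": true}'
)
```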

Pages

Chat completions

POST /v1/chat/completions — the OpenAI-compatible entry point.

Routing decision

POST /v1/route — ask the router which model it would pick, without generating.

Models & health

GET /v1/models, GET /health — discover what’s configured.

Distillation

Create jobs, poll status, fetch artifacts over HTTP.

Traces

Search captured traces by model, time range, cost, or metadata.

Dropping in over the OpenAI SDK

The gateway is OpenAI-compatible by design. Any client library that lets you set a base URL works unchanged — see the drop-in OpenAI guide for Python, TypeScript, and raw curl.