Most of OpenTracy works from just pip install opentracy. You need the full self-hosted stack if you want:
  • Trace storage and analytics (ClickHouse-backed).
  • The UI at localhost:3000 for browsing traces, datasets, experiments.
  • Distillation (Distiller client — requires the REST API + a GPU).
  • Engine-side routing aliases (the model="smart" → distilled-student swap).

Prerequisites

  • Docker ≥ 24 and Docker Compose v2
  • An NVIDIA GPU + nvidia-container-toolkit if you plan to run distillation
  • ~10 GB free disk for ClickHouse + weights + training artifacts
On Linux with an NVIDIA GPU:
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
docker run --rm --gpus all nvidia/cuda:12.6.3-base-ubuntu22.04 nvidia-smi
If that last command prints your GPU, you’re ready.

Start the stack

git clone https://github.com/OpenTracy/opentracy.git
cd opentracy
make start-full
After a minute or two:
Service             URL                     Purpose
Gateway engine      http://localhost:8080   OpenAI-compatible, routes requests.
REST / Python API   http://localhost:8000   Datasets, distillation, evaluations.
UI                  http://localhost:3000   Browser dashboard.
ClickHouse HTTP     http://localhost:8123   Trace storage.
Health-check them all:
curl -s http://localhost:8080/health | jq .
curl -s http://localhost:8000/health | jq .
make start-full is defined in the repo’s top-level Makefile. It pulls the pinned images, bakes a docker-compose.yml with the four services above, and waits until ClickHouse is reachable before starting the engine. Run make help to see every command.
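If you script the startup (CI, provisioning), the two curl checks above can be folded into a small wait loop. A minimal stdlib-only sketch — the /health URLs are the ones documented above; the function name is illustrative:

```python
# Poll the stack's health endpoints until every one answers HTTP 200.
import time
import urllib.error
import urllib.request

def wait_for_health(urls, timeout=120.0, interval=2.0):
    """Block until each URL returns HTTP 200, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    pending = set(urls)
    while pending:
        for url in list(pending):
            try:
                with urllib.request.urlopen(url, timeout=5) as resp:
                    if resp.status == 200:
                        pending.discard(url)
            except (urllib.error.URLError, OSError):
                pass  # service not up yet; retry on the next pass
        if pending and time.monotonic() > deadline:
            raise TimeoutError(f"services not healthy: {sorted(pending)}")
        if pending:
            time.sleep(interval)
```

Call it with the engine and API endpoints, e.g. `wait_for_health(["http://localhost:8080/health", "http://localhost:8000/health"])`, after `make start-full` returns.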

What each service is

  • opentracy-engine (Go) — the gateway. Receives /v1/chat/completions, authenticates to upstream providers, streams responses back, writes traces to ClickHouse. Also exposes /v1/route for auto-routing.
  • opentracy-api (Python, FastAPI) — business logic on top of traces: datasets, evaluations, distillation jobs. Uses ClickHouse for reads, launches training subprocesses.
  • clickhouse — column-store for traces and analytics.
  • ui (Next.js) — the dashboard.
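Since the engine speaks the OpenAI wire format, you can see exactly what a request to it looks like without any SDK. A sketch that only builds the /v1/chat/completions request (the JSON shape is the standard OpenAI one, which the engine is described as compatible with; the helper name is illustrative):

```python
# Build an OpenAI-style chat request aimed at the gateway engine.
import json
import urllib.request

def chat_request(base_url, model, messages):
    """Return a ready-to-send POST request for /v1/chat/completions."""
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = chat_request(
    "http://localhost:8080",
    "openai/gpt-4o-mini",
    [{"role": "user", "content": "hi"}],
)
# urllib.request.urlopen(req) would send it through the gateway, which
# forwards to the provider and writes the trace to ClickHouse.
```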

Configuration knobs

Provider API keys live under ~/.opentracy/secrets.json on the host (mounted into the engine container):
{
  "openai_api_key": "sk-...",
  "anthropic_api_key": "sk-ant-...",
  "groq_api_key": "gsk_...",
  "deepseek_api_key": "sk-...",
  "huggingface_api_key": "hf_..."
}
Or via env vars passed through to the engine:
OPENAI_API_KEY=... ANTHROPIC_API_KEY=... make start-full
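If you already export the conventional provider env vars, you can generate the secrets file from them instead of writing it by hand. A sketch — the JSON key names match the example file above, the env-var names are the usual provider conventions, and the helper itself is illustrative:

```python
# Write ~/.opentracy/secrets.json from the shell environment, owner-only.
import json
import os
from pathlib import Path

ENV_TO_KEY = {
    "OPENAI_API_KEY": "openai_api_key",
    "ANTHROPIC_API_KEY": "anthropic_api_key",
    "GROQ_API_KEY": "groq_api_key",
    "DEEPSEEK_API_KEY": "deepseek_api_key",
    "HUGGINGFACE_API_KEY": "huggingface_api_key",
}

def write_secrets(path=None):
    """Collect whichever provider keys are set and write the secrets file."""
    path = Path(path or Path.home() / ".opentracy" / "secrets.json")
    secrets = {key: os.environ[var] for var, key in ENV_TO_KEY.items()
               if var in os.environ}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(secrets, indent=2) + "\n")
    path.chmod(0o600)  # API keys are secrets: owner read/write only
    return path
```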
Other env vars worth knowing (all read by the opentracy-api container):
Variable                  Default      Purpose
OPENTRACY_CH_HOST         clickhouse   ClickHouse host inside the compose network.
OPENTRACY_CH_DATABASE     opentracy    ClickHouse database name.
OPENTRACY_CH_ENABLED      false        Enable ClickHouse writes. make start-full sets this.
OPENTRACY_DATA_DIR        /app/data    Training artifacts root (mapped to a volume).
OPENTRACY_TRACE_REDACT    false        Strip PII patterns from trace content.
OPENTRACY_TRACE_CONTENT   true         Set to false to drop prompt/response text.
AUTO_TRAINER              false        Enable the autonomous operator loop.
Legacy LUNAR_* env vars still work and fall back to the new OPENTRACY_* names with a one-time DeprecationWarning. Migrate when you touch the env file — no rush.
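The fallback behavior is simple enough to sketch. This is not the actual opentracy-api internals, just an illustration of the documented contract — new name wins, legacy name works with a one-time DeprecationWarning:

```python
# Resolve a setting: prefer OPENTRACY_*, fall back to legacy LUNAR_*.
import os
import warnings

_warned = set()

def get_setting(name, default=None):
    """name is the suffix, e.g. "CH_HOST" for OPENTRACY_CH_HOST."""
    new, old = f"OPENTRACY_{name}", f"LUNAR_{name}"
    if new in os.environ:
        return os.environ[new]
    if old in os.environ:
        if old not in _warned:  # warn once per variable, not per read
            _warned.add(old)
            warnings.warn(f"{old} is deprecated; use {new}",
                          DeprecationWarning)
        return os.environ[old]
    return default
```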

Point your app at the local stack

import os
os.environ["OPENTRACY_ENGINE_URL"] = "http://localhost:8080"

import opentracy as ot
resp = ot.completion(model="openai/gpt-4o-mini", messages=[{"role": "user", "content": "hi"}])
# Request flowed through the engine, trace persisted in ClickHouse.
Or with the OpenAI SDK — see the drop-in-openai guide.

Running distillation

With the stack up and at least one GPU visible:
from opentracy import Distiller

d = Distiller(base_url="http://localhost:8000")
print([t["id"] for t in d.teacher_models()][:5])
print([s["id"] for s in d.student_models()][:5])

job = d.create(
    name="demo",
    dataset_id="ds_my_dataset",
    teacher_model="openai/gpt-4o-mini",
    student_model="llama-3.2-1b",
    num_prompts=50,
    training_steps=30,
)
job = d.wait(job["id"])
print(d.artifacts(job["id"]))
If the training step fails with “HuggingFace access required”, the student model is gated — add your HuggingFace token in UI → Settings → Integrations, then retry. The engine’s trainer subprocess picks up the token from secrets storage automatically.

Updating

git pull
make start-full    # rebuilds and restarts changed services
Migration: database schema changes are handled automatically on startup. Routing weights update whenever a newer version appears in the Hub (ot.download("weights-default") refreshes them locally).

Stopping

make stop-full
Data persists in volumes (clickhouse-data, weights-data, distillation-data). To wipe everything and start fresh:
docker compose down --volumes

Production considerations

The engine has no auth out of the box. Do not expose port 8080 publicly. Put it behind an auth proxy (Traefik, Caddy with mTLS, a VPN, etc.).
ClickHouse defaults are for dev. The compose file starts ClickHouse with the default password opentracy and no TLS. For production, set a real password (CLICKHOUSE_PASSWORD env var), enable HTTPS, and take regular backups.
Distillation artifacts are on the host. The distillation-data volume ends up under Docker’s volume root; set up regular backups if the adapters matter beyond reproducing them from the dataset.
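Once you set a real ClickHouse password, ad-hoc queries against the HTTP interface on port 8123 need credentials too. A stdlib sketch that builds an authenticated query request using ClickHouse's X-ClickHouse-User / X-ClickHouse-Key headers — the table name in the example is illustrative, and the defaults match the dev setup described above:

```python
# Build an authenticated query against ClickHouse's HTTP interface.
import urllib.parse
import urllib.request

def ch_query(sql, user="default", password="opentracy",
             base="http://localhost:8123"):
    """Return a GET request carrying the SQL and ClickHouse auth headers."""
    url = f"{base}/?{urllib.parse.urlencode({'query': sql})}"
    return urllib.request.Request(
        url,
        headers={
            "X-ClickHouse-User": user,
            "X-ClickHouse-Key": password,
        },
    )

# Table name is illustrative; inspect the opentracy database for the
# actual trace table.
req = ch_query("SELECT count() FROM traces")
# urllib.request.urlopen(req).read() returns the result once the stack is up.
```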

Next

Pipeline

Understand how the self-hosted stack implements the pipeline.

Python SDK

The thin client that talks to the stack you just started.