The Python SDK (opentracy) is the native entry point. Use it if you’re starting a new project or if you want features (auto-routing, distillation, trace ingestion) that aren’t part of the OpenAI API shape.

Install

pip install opentracy
One install pulls a platform-specific wheel with the Go engine binary, the ONNX embedder, and pre-trained routing weights bundled in. No extras needed for the core path.
pip install "opentracy[distill]"    # adds training deps (torch, unsloth, peft, trl)
pip install "opentracy[research]"   # adds sentence-transformers for the Python router backend
pip install "opentracy[server]"     # adds FastAPI + ClickHouse for self-hosting
pip install "opentracy[anthropic]"  # native Anthropic SDK path
pip install "opentracy[all]"        # everything
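If you are unsure which extras an environment already has, the standard library's importlib.util.find_spec can tell you before you hit an ImportError. This is a minimal sketch that is not part of opentracy. It assumes the [distill] extra is satisfied when the modules named in the comment above (torch, unsloth, peft, trl) can be imported under those names.

```python
import importlib.util
from typing import Iterable, List

# Assumption: module names mirror the packages listed for the [distill] extra above.
DISTILL_MODULES = ("torch", "unsloth", "peft", "trl")


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(DISTILL_MODULES)
    if missing:
        print("Missing distill deps:", ", ".join(missing))
        print('Install with: pip install "opentracy[distill]"')
    else:
        print("Distillation extras look installed.")
```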

The four things you’ll do

1. One-off completion

A plain chat completion, with no routing and no trace.
import opentracy as ot

resp = ot.completion(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "What's 2+2?"}],
    temperature=0,
)
print(resp.choices[0].message.content)
Full API: completion reference.

2. Explicit router with fallbacks

When you want explicit rules (for example, “spread traffic across GPT-4o and Claude, then fall back to DeepSeek”), use the Router class:
router = ot.Router(
    model_list=[
        {"model_name": "smart", "model": "openai/gpt-4o"},
        {"model_name": "smart", "model": "anthropic/claude-sonnet-4-6"},
    ],
    fallbacks=[{"smart": ["deepseek/deepseek-chat"]}],
    strategy="round-robin",   # or "least-cost", "lowest-latency", "weighted-random"
    num_retries=2,
    timeout=60,
)

resp = router.completion(
    model="smart",   # logical alias, resolved to one of the deployments
    messages=[{"role": "user", "content": "..."}],
)
Full API: Router reference.
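Here is a hypothetical, dependency-free sketch of what a round-robin alias with a fallback chain does. Deployments that share a model_name form a pool, and each attempt takes the next deployment in rotation. The fallbacks list is tried only after the pool's attempts are used up. The class name RoundRobinWithFallback and the send callable are illustrative. Treating num_retries as "extra attempts on the pool" is an assumption about the real Router, not its documented behavior.

```python
import itertools
from typing import Callable, Dict, Iterator, List, Tuple


class RoundRobinWithFallback:
    """Hypothetical sketch of alias resolution; not opentracy's Router."""

    def __init__(self, model_list: List[Dict[str, str]],
                 fallbacks: List[Dict[str, List[str]]], num_retries: int = 2):
        pools: Dict[str, List[str]] = {}
        for deployment in model_list:
            pools.setdefault(deployment["model_name"], []).append(deployment["model"])
        # One endless iterator per alias gives round-robin order across calls.
        self._rotation = {alias: itertools.cycle(models) for alias, models in pools.items()}
        self._fallbacks: Dict[str, List[str]] = {}
        for entry in fallbacks:
            self._fallbacks.update(entry)
        self._num_retries = num_retries

    def _candidates(self, alias: str) -> Iterator[str]:
        # Assumption: 1 initial attempt + num_retries, each on the next deployment.
        # The generator is lazy, so the rotation only advances when an attempt is made.
        for _ in range(1 + self._num_retries):
            yield next(self._rotation[alias])
        yield from self._fallbacks.get(alias, [])

    def call(self, alias: str, send: Callable[[str], str]) -> Tuple[str, str]:
        """Return (model_used, result); send(model) performs the actual request."""
        errors = []
        for model in self._candidates(alias):
            try:
                return model, send(model)
            except Exception as exc:  # a real router would filter retryable errors
                errors.append((model, exc))
        raise RuntimeError(f"all deployments for {alias!r} failed: {errors}")


if __name__ == "__main__":
    router = RoundRobinWithFallback(
        [{"model_name": "smart", "model": "openai/gpt-4o"},
         {"model_name": "smart", "model": "anthropic/claude-sonnet-4-6"}],
        fallbacks=[{"smart": ["deepseek/deepseek-chat"]}],
    )
    for _ in range(3):
        print(router.call("smart", lambda model: f"echo from {model}"))
```

When every call succeeds, consecutive calls alternate between the two pooled deployments. DeepSeek is reached only when all pooled attempts raise.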

3. Semantic auto-router

Load the pre-trained router once; it picks the right model per prompt:
auto = ot.load_router(cost_weight=0.5)

decision = auto.route("Write a haiku about autumn")
print(decision.selected_model)      # e.g. "ministral-3b-latest"
print(decision.cluster_id)          # e.g. 87
print(decision.expected_error)      # e.g. 0.212
print(decision.all_scores)          # full score dict
Combined with ot.completion, this becomes a cost-optimizing client:
def smart_call(prompt: str, api_key: str) -> str:
    d = auto.route(prompt)
    resp = ot.completion(
        model=d.selected_model,
        messages=[{"role": "user", "content": prompt}],
        api_key=api_key,
    )
    return resp.choices[0].message.content
Full API: load_router reference.
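This page does not spell out the scoring behind load_router. One common way to expose a cost_weight knob is to add each candidate's normalized cost to its expected error and pick the lowest score. The sketch below assumes that additive form and uses made-up model names and numbers. It only builds intuition for how raising cost_weight pushes decisions toward cheaper models. The Decision dataclass reuses two field names from the example above, but it is a stand-in, not the SDK's type.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Decision:
    selected_model: str
    all_scores: Dict[str, float]


def pick_model(expected_error: Dict[str, float],
               cost: Dict[str, float],
               cost_weight: float) -> Decision:
    """Hypothetical scoring: lower is better; cost is scaled to [0, 1] by the max."""
    max_cost = max(cost.values()) or 1.0  # avoid dividing by zero if all costs are 0
    scores = {
        model: err + cost_weight * cost[model] / max_cost
        for model, err in expected_error.items()
    }
    return Decision(selected_model=min(scores, key=scores.get), all_scores=scores)


if __name__ == "__main__":
    # Illustrative numbers only, not real benchmark or pricing data.
    errors = {"large-model": 0.10, "small-model": 0.25}
    prices = {"large-model": 10.0, "small-model": 0.5}
    for w in (0.0, 0.5):
        print(w, pick_model(errors, prices, w).selected_model)
```

With these numbers, cost_weight=0 picks the more accurate large-model. At 0.5 the cost penalty (0.5 vs. 0.025) flips the choice to small-model.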

4. Distillation

The one-call path: ot.distill() runs the full 4-phase pipeline in-process and returns a callable Student. It requires the opentracy[distill] extra and a CUDA GPU.
import opentracy as ot

student = ot.distill(
    dataset="tickets.jsonl",          # path, list[dict], or a callable
    teacher="openai/gpt-4o",
    student="llama-3.2-1b",
    steps=100,
    quantize="q4_k_m",                # or None to skip GGUF export
)

print(student("Classify: refund please"))       # local inference, $0

# Ship it behind a logical name — app code never changes
student.deploy("ticket-classifier")
resp = ot.completion(model="ticket-classifier", messages=[...])
Full API: ot.distill reference. For the long-running, queued REST flow against a self-hosted engine (ClickHouse-backed jobs, UI observability), use Distiller instead. It drives the same engine with a different deployment shape.
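The dataset argument accepts a path, a list[dict], or a callable. How ot.distill normalizes these internally is not documented here. A plausible sketch follows, assuming paths point at JSONL files with one JSON object per line and callables return (or yield) dicts. load_records is a hypothetical helper name.

```python
import json
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Union

Record = Dict[str, object]
DatasetLike = Union[str, Path, List[Record], Callable[[], Iterable[Record]]]


def load_records(dataset: DatasetLike) -> List[Record]:
    """Hypothetical normalizer: turn any accepted dataset form into a list of dicts."""
    if isinstance(dataset, (str, Path)):
        with open(dataset, encoding="utf-8") as fh:
            # Assumption: JSONL, one object per line; blank lines are skipped.
            return [json.loads(line) for line in fh if line.strip()]
    if isinstance(dataset, list):
        return list(dataset)
    if callable(dataset):
        return list(dataset())
    raise TypeError(f"unsupported dataset type: {type(dataset).__name__}")


if __name__ == "__main__":
    print(load_records([{"prompt": "Classify: refund please"}]))
    print(load_records(lambda: ({"prompt": p} for p in ["a", "b"])))
```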

Async

Everything that has a sync version also has an async version:
import asyncio
import opentracy as ot

async def main():
    resp = await ot.acompletion(
        model="openai/gpt-4o-mini",
        messages=[{"role": "user", "content": "hello"}],
    )
    print(resp.choices[0].message.content)

asyncio.run(main())
acompletion shares its request-preparation path with the sync version, so force_engine, force_direct, fallbacks, and engine-prefix handling all behave identically.
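Because acompletion is awaitable, you can send a batch of prompts concurrently with asyncio.gather and cap in-flight requests with an asyncio.Semaphore. The sketch below uses a hypothetical fake_acompletion so it runs offline. In real code you would await ot.acompletion(model=..., messages=[...]) at that spot and read resp.choices[0].message.content.

```python
import asyncio
from typing import List


async def fake_acompletion(model: str, prompt: str) -> str:
    """Hypothetical stand-in for ot.acompletion so the sketch runs without API keys."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"{model} answered: {prompt}"


async def fan_out(prompts: List[str], model: str = "openai/gpt-4o-mini",
                  max_concurrency: int = 4) -> List[str]:
    sem = asyncio.Semaphore(max_concurrency)  # at most N requests in flight

    async def one(prompt: str) -> str:
        async with sem:
            return await fake_acompletion(model, prompt)

    # gather returns results in the same order as the input prompts
    return await asyncio.gather(*(one(p) for p in prompts))


if __name__ == "__main__":
    print(asyncio.run(fan_out(["hello", "what's 2+2?", "name a color"])))
```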

Trace ingestion

If you have existing logs from another LLM provider and want to use them for dataset building or distillation in OpenTracy, you can import them directly:
from opentracy import add_trace, add_traces, import_traces

# Single trace
add_trace({
    "prompt": "Classify: ...",
    "response": "billing",
    "model": "openai/gpt-4o",
    "total_cost_usd": 0.00025,
    "latency_ms": 340,
    "metadata": {"source": "legacy-log-export"},
})

# Batch
add_traces([{...}, {...}, {...}])

# From a JSONL file
import_traces("path/to/exported-traces.jsonl")
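If your legacy logs use different field names, the simplest route is usually to map them into the same shape add_trace takes above and write one JSON object per line for import_traces. This sketch assumes import_traces expects exactly that per-line shape. The legacy field names (input, output, model_name, cost, duration_s) are hypothetical placeholders for whatever your export contains.

```python
import json
from typing import Dict, Iterable


def to_trace(row: Dict) -> Dict:
    """Map one hypothetical legacy log row onto the add_trace field names."""
    return {
        "prompt": row["input"],
        "response": row["output"],
        "model": row["model_name"],
        "total_cost_usd": float(row.get("cost", 0.0)),
        "latency_ms": round(float(row.get("duration_s", 0.0)) * 1000),  # seconds -> ms
        "metadata": {"source": "legacy-log-export"},
    }


def write_jsonl(rows: Iterable[Dict], path: str) -> int:
    """Write converted traces, one JSON object per line; return the count written."""
    count = 0
    with open(path, "w", encoding="utf-8") as fh:
        for row in rows:
            fh.write(json.dumps(to_trace(row)) + "\n")
            count += 1
    return count
```

After writing the file, pass its path to import_traces as shown above.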

Engine routing opt-in

By default the SDK calls providers directly. To route through an OpenTracy engine (for observability, aliases, etc.), set the env var once:
export OPENTRACY_ENGINE_URL="http://localhost:8080"
From that point on, ot.completion(...) routes through the engine. Per-call overrides:
# Always engine (even if OPENTRACY_ENGINE_URL is unset):
ot.completion(..., force_engine=True)

# Always direct (even if OPENTRACY_ENGINE_URL is set):
ot.completion(..., force_direct=True)
Why isn’t this automatic? Because silently routing through whatever happens to be listening on localhost:8080 is a footgun. Opt-in is explicit.
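The precedence described above fits in a small decision function. This is a sketch of the documented behavior, not the SDK's code, and resolve_transport is a hypothetical name. This page does not say what happens when both flags are passed, so the sketch rejects that combination. Treating an empty OPENTRACY_ENGINE_URL as unset is also an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_transport(force_engine: bool = False, force_direct: bool = False,
                      env: Optional[Mapping[str, str]] = None) -> str:
    """Return "engine" or "direct" following the opt-in rules described above."""
    env = os.environ if env is None else env
    if force_engine and force_direct:
        # Assumption: the real SDK's behavior here is unspecified; we refuse it.
        raise ValueError("pass at most one of force_engine / force_direct")
    if force_engine:
        return "engine"
    if force_direct:
        return "direct"
    # Default: use the engine only when the user opted in via the env var
    # (assumption: an empty value counts as unset).
    return "engine" if env.get("OPENTRACY_ENGINE_URL") else "direct"
```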

13 providers via create_client

If you want a first-class LLMClient object (for profiling, or to fit into custom routing code), create_client covers every provider:
c = ot.create_client("openai",   "gpt-4o-mini")       # dedicated class
c = ot.create_client("deepseek", "deepseek-chat")     # UnifiedClient wrapper
c = ot.create_client("together", "meta-llama/Llama-3") # UnifiedClient wrapper

out = c.generate("Hello", max_tokens=64, temperature=0.0)
print(out.text, out.latency_ms, out.tokens_used)
Five providers have dedicated classes (OpenAI, Anthropic, Google, Groq, Mistral). The remaining seven (DeepSeek, Perplexity, Cerebras, Sambanova, Together, Fireworks, Cohere) go through a UnifiedClient that speaks the OpenAI chat protocol. The thirteenth, Bedrock, is registered but raises a clear error on construction because UnifiedClient does not yet handle AWS SigV4 signing; use ot.completion(force_engine=True) instead.
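The split between dedicated classes, the shared UnifiedClient, and the Bedrock stub maps naturally onto a registry lookup. The sketch below uses hypothetical stand-in classes and an illustrative subset of providers. The real error type Bedrock raises is not documented here, so NotImplementedError is an assumption. Only the request payload follows a known format: the standard OpenAI chat-completions body (model, messages, max_tokens, temperature), which any OpenAI-compatible client sends.

```python
from typing import Dict, Type


class BaseClient:
    def __init__(self, provider: str, model: str):
        self.provider, self.model = provider, model

    def build_payload(self, prompt: str, max_tokens: int, temperature: float) -> Dict:
        # Request body in the OpenAI chat-completions shape.
        return {
            "model": self.model,
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": max_tokens,
            "temperature": temperature,
        }


class OpenAIClientSketch(BaseClient):
    """Hypothetical stand-in for a dedicated per-provider class."""


class UnifiedClientSketch(BaseClient):
    """Hypothetical stand-in for a shared client for OpenAI-compatible endpoints."""


DEDICATED: Dict[str, Type[BaseClient]] = {"openai": OpenAIClientSketch}
OPENAI_COMPATIBLE = {"deepseek", "together", "fireworks"}  # illustrative subset


def create_client_sketch(provider: str, model: str) -> BaseClient:
    if provider == "bedrock":
        # Registered but fails fast; the exception type is an assumption.
        raise NotImplementedError("bedrock needs AWS SigV4; use ot.completion(force_engine=True)")
    if provider in DEDICATED:
        return DEDICATED[provider](provider, model)
    if provider in OPENAI_COMPATIBLE:
        return UnifiedClientSketch(provider, model)
    raise ValueError(f"unknown provider: {provider!r}")
```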

Public API

Everything that import opentracy as ot exposes publicly:
# Core
ot.completion, ot.acompletion, ot.Router, ot.ModelResponse, ot.StreamChunk, ot.parse_model
# Multi-provider
ot.create_client, ot.LLMResponse
# Pricing
ot.model_cost, ot.get_model_info, ot.supported_models
# Trace ingestion
ot.add_trace, ot.add_traces, ot.import_traces
# Distillation — one-call + REST client
ot.distill, ot.DistillError, ot.Student, ot.StudentError
ot.Distiller, ot.TrainingClient, ot.DistillerError
# Local alias registry (distilled students map to logical model names)
ot.set_alias, ot.unset_alias, ot.list_aliases, ot.get_alias
# Version
ot.__version__
Lazy research APIs (load_router, UniRouteRouter, RouterEvaluator, LLMJudge, …) are resolved through a module-level __getattr__. Each one is imported the first time you access it, so none of them slow down the initial import opentracy.
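Module-level __getattr__ (PEP 562, Python 3.7+) is the standard hook for this kind of lazy loading. In a package you would define def __getattr__(name) at the top of __init__.py. To stay runnable on its own, the sketch below builds a module object at runtime and lazily loads two stdlib attributes (json.dumps, math.sqrt) as stand-ins for the research APIs. make_lazy_module is a hypothetical helper, not opentracy code.

```python
import importlib
import types
from typing import Dict


def make_lazy_module(name: str, lazy: Dict[str, str]) -> types.ModuleType:
    """Build a module whose listed attributes are imported on first access."""
    mod = types.ModuleType(name)

    def __getattr__(attr: str):
        # Only called when normal lookup fails, i.e. the attribute is not loaded yet.
        target = lazy.get(attr)
        if target is None:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        value = getattr(importlib.import_module(target), attr)
        setattr(mod, attr, value)  # cache it so __getattr__ is not hit again
        return value

    mod.__getattr__ = __getattr__  # PEP 562 looks this up in the module's __dict__
    return mod


if __name__ == "__main__":
    demo = make_lazy_module("demo", {"dumps": "json", "sqrt": "math"})
    print(demo.sqrt(16.0))
    print(demo.dumps({"lazy": True}))
```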
Legacy code using import lunar_router as lr keeps working via a backwards-compat shim that redirects to opentracy and emits a DeprecationWarning. New code should use import opentracy as ot.
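One common way to build such a compatibility alias is to register a forwarding module in sys.modules. The sketch below warns on each attribute access and forwards to the new module. It is not necessarily how opentracy's shim works (it may warn once at import time instead). install_compat_shim is a hypothetical helper, and the standard json module stands in as the target so the sketch runs anywhere.

```python
import sys
import types
import warnings


def install_compat_shim(old_name: str, new_module: types.ModuleType) -> types.ModuleType:
    """Register old_name in sys.modules as a forwarder to new_module (sketch)."""
    shim = types.ModuleType(old_name)

    def __getattr__(attr: str):
        warnings.warn(
            f"{old_name} is deprecated; use {new_module.__name__} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(new_module, attr)

    shim.__getattr__ = __getattr__
    sys.modules[old_name] = shim  # a later `import old_name` returns this object
    return shim


if __name__ == "__main__":
    import json
    install_compat_shim("legacy_json", json)
    import legacy_json  # resolved from sys.modules, no file needed
    print(legacy_json.dumps({"ok": True}))  # forwarded to json.dumps, with a warning
```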

Next

Self-host

Run engine + ClickHouse + UI locally or in your cloud.

API Reference

Every parameter and return value.