The Complete LLM Operations Platform

Everything OpenTracy delivers -- from unified gateway to model distillation. Not marketing. Real capabilities, real architecture.

Unified Gateway

One OpenAI-compatible API that routes to 13 providers and 70+ models. Change one line of code to start.

  • OpenAI-compatible API -- drop-in replacement, same SDK, same format
  • 13 providers: OpenAI, Anthropic, Google Gemini, Mistral, Groq, DeepSeek, Perplexity, Cerebras, SambaNova, Together, Fireworks, Cohere, AWS Bedrock
  • 70+ models with automatic per-token pricing baked in
  • Full streaming support for all providers including Anthropic SSE translation
  • Vision and multimodal support (base64 or URL images)
  • Tool calling with cross-provider format translation
python
import openai

# Just change the base URL — everything else stays the same
client = openai.OpenAI(
    base_url="https://api.opentracy.com/v1",
    api_key="your-opentracy-key"
)

response = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)

Smart Routing

Route requests to the right model based on cost, latency, complexity, or custom rules. Automatic fallbacks when providers go down.

  • Router class with strategies: round-robin, least-cost, lowest-latency, weighted-random
  • Semantic routing -- classifies prompt complexity, sends simple prompts to cheap models, complex ones to powerful models
  • Automatic fallbacks with configurable retry chains (e.g. GPT-4o -> Claude -> Gemini)
  • Load balancing across model pools for high-throughput workloads
  • Go engine for high-performance routing with <2ms overhead
python
import opentracy as ot

# Semantic routing: simple -> cheap, complex -> powerful
router = ot.Router(
    strategy="semantic",
    models={
        "simple": "openai/gpt-4o-mini",
        "complex": "anthropic/claude-sonnet-4-20250514",
    },
    fallbacks=["google/gemini-2.0-flash"]
)

response = router.completion(
    messages=[{"role": "user", "content": prompt}]
)
print(f"Routed to: {response.model}")
print(f"Cost: ${response._cost:.6f}")

Real-Time Traces

Every request logged with full input, output, cost, latency, model, and token counts. Query millions of traces instantly.

  • Full trace logging: input messages, output, cost, latency, model, tokens in/out
  • ClickHouse analytics backend -- query millions of traces in milliseconds
  • Real-time dashboard UI with filters, search, and trace detail view
  • Model-level performance stats: latency P50/P95/P99, error rates, cost per request
  • Export traces for offline analysis or integration with your data pipeline
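The per-model latency stats above come down to percentile math over logged trace records. Here is a minimal sketch -- the field names mirror the trace schema listed above, but the helper itself is illustrative, not part of the OpenTracy SDK:

```python
# Sketch: P50/P95/P99 latency for one model over exported traces.
# Field names follow the trace schema above; the helper is illustrative.
from statistics import quantiles

def latency_percentiles(traces, model):
    """Return P50/P95/P99 latency (ms) for one model's traces."""
    lats = sorted(t["latency_ms"] for t in traces if t["model"] == model)
    # quantiles with n=100 yields the 1st..99th percentile cut points
    q = quantiles(lats, n=100)
    return {"p50": q[49], "p95": q[94], "p99": q[98]}

# 100 synthetic traces with evenly spaced latencies
traces = [{"model": "openai/gpt-4o-mini", "latency_ms": ms}
          for ms in range(100, 300, 2)]
stats = latency_percentiles(traces, "openai/gpt-4o-mini")
```

In production the same aggregation runs as a ClickHouse query rather than in Python, which is what makes it fast over millions of traces.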

Cost Intelligence

Automatic per-token pricing for every model. See exactly where your money goes and how much smart routing saves you.

  • Automatic per-token pricing for 70+ models (continuously updated pricing database)
  • Cost attached to every response -- no more guessing or manual calculation
  • Baseline vs actual cost comparison: see what you'd pay with the most expensive model vs smart routing
  • Net savings calculation with monthly projections
  • Cost breakdown by model, by provider, by time period
  • Budget alerts and anomaly detection for unexpected cost spikes
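The baseline-vs-actual comparison above is straightforward arithmetic over per-token prices. A minimal sketch -- the rates here are illustrative per-million-token (input, output) prices, not OpenTracy's live pricing database:

```python
# Sketch: what smart routing actually cost vs. sending every request
# to the most expensive model in the pool. Prices are illustrative.
PRICING = {  # model -> ($/1M input tokens, $/1M output tokens)
    "openai/gpt-4o-mini": (0.15, 0.60),
    "anthropic/claude-sonnet-4-20250514": (3.00, 15.00),
}

def request_cost(model, tokens_in, tokens_out):
    pin, pout = PRICING[model]
    return (tokens_in * pin + tokens_out * pout) / 1_000_000

requests = [("openai/gpt-4o-mini", 1200, 400),
            ("anthropic/claude-sonnet-4-20250514", 900, 600)]
actual = sum(request_cost(m, ti, to) for m, ti, to in requests)

# Baseline: the priciest model (by output rate) handles everything
baseline_model = max(PRICING, key=lambda m: PRICING[m][1])
baseline = sum(request_cost(baseline_model, ti, to) for _, ti, to in requests)
savings = baseline - actual
```

Multiply `savings` per request by monthly volume and you have the savings projection the dashboard reports.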

Quality Monitoring

7 autonomous AI agents continuously scan your production traffic for issues. Catch problems before your users do.

  • Cluster Labeler -- groups prompts by domain automatically
  • Trace Scanner -- detects hallucinations, refusals, PII leaks, and format issues
  • Outlier Detector -- flags anomalous traces that deviate from normal patterns
  • Coherence Scorer -- rates cluster quality to ensure consistent behavior
  • Heuristic detection: incomplete responses, refusal phrases, latency spikes, cost anomalies
  • LLM-based hallucination detection with confidence scoring (0-1)
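The heuristic layer listed above (refusal phrases, incomplete responses, latency spikes) can be sketched in a few lines. The phrase list and thresholds here are illustrative -- the production scanner is richer than this:

```python
# Sketch of the heuristic trace checks described above.
# Phrase list and thresholds are illustrative, not the shipped rules.
REFUSAL_PHRASES = ("i can't help", "i cannot assist", "as an ai")

def scan_trace(output: str, latency_ms: float, p95_latency_ms: float):
    """Return a list of heuristic issue flags for one trace."""
    issues = []
    text = output.lower().strip()
    if any(p in text for p in REFUSAL_PHRASES):
        issues.append("refusal")
    # Responses cut off mid-sentence usually end without terminal punctuation
    if text and text[-1] not in ".!?\"')]}":
        issues.append("incomplete")
    if latency_ms > 2 * p95_latency_ms:
        issues.append("latency_spike")
    return issues

flags = scan_trace("I can't help with that", 120, 800)
```

Traces flagged here get escalated to the LLM-based checks, which attach the 0-1 confidence score.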

Evaluations

LLM-as-Judge for pairwise comparison and pointwise scoring. Track quality across model updates with real metrics.

  • Pairwise comparison: model A vs B, pick the winner on your production data
  • Pointwise scoring: rate responses 1-5 with customizable rubrics
  • RouterEvaluator: benchmark routing decisions against cached responses
  • AUROC metrics, Pareto curves, and win rate calculations
  • Domain-specific evaluation with AI-suggested quality metrics
  • Track quality over time across model updates and routing changes
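The win-rate bookkeeping behind pairwise comparison is simple to show. In this sketch the judge verdicts are hard-coded; in practice each verdict comes from an LLM-as-Judge call over a production trace:

```python
# Sketch: win rate from pairwise LLM-as-Judge verdicts.
# Verdicts are hard-coded here for illustration.
from collections import Counter

def win_rate(verdicts):
    """verdicts: list of 'A', 'B', or 'tie' from pairwise judging."""
    counts = Counter(verdicts)
    decided = counts["A"] + counts["B"]  # ties are excluded
    return counts["A"] / decided if decided else 0.0

verdicts = ["A", "A", "B", "tie", "A"]
rate = win_rate(verdicts)  # model A wins 3 of the 4 decided comparisons
```

Pointwise scoring works the same way with 1-5 rubric scores averaged per model instead of head-to-head verdicts.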

Model Distillation (BOND Pipeline)

Train smaller, faster, cheaper models from your production data. Full pipeline from teacher model to deployed LoRA.

  • Pipeline: Teacher model -> LLM-as-Judge curation -> LoRA training (Unsloth) -> GGUF export
  • Automatic training data extraction from production traces
  • Preference pair generation for DPO/RLHF alignment
  • Golden dataset augmentation for evaluation benchmarks
  • Own your models -- no vendor lock-in, deploy anywhere
  • Eval Generator creates evaluation datasets from real production data
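The preference-pair step of the pipeline above turns judge scores on production traces into (chosen, rejected) pairs for DPO. A sketch -- field names are illustrative, not the BOND schema:

```python
# Sketch: build DPO preference pairs from judged production traces.
# Field names ("prompt", "output", "judge_score") are illustrative.
def build_preference_pairs(traces, min_gap=1.0):
    """Pair the best- and worst-scoring responses to the same prompt."""
    by_prompt = {}
    for t in traces:
        by_prompt.setdefault(t["prompt"], []).append(t)
    pairs = []
    for prompt, group in by_prompt.items():
        group.sort(key=lambda t: t["judge_score"], reverse=True)
        best, worst = group[0], group[-1]
        # Skip prompts without a clear quality gap between responses
        if best["judge_score"] - worst["judge_score"] >= min_gap:
            pairs.append({"prompt": prompt,
                          "chosen": best["output"],
                          "rejected": worst["output"]})
    return pairs

traces = [
    {"prompt": "q1", "output": "good answer", "judge_score": 4.5},
    {"prompt": "q1", "output": "weak answer", "judge_score": 2.0},
    {"prompt": "q2", "output": "only answer", "judge_score": 3.0},
]
pairs = build_preference_pairs(traces)
```

The resulting pairs feed the LoRA training stage, after which the adapter is exported to GGUF for deployment.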

Prompt Clustering

Automatic domain discovery from your production traffic. Understand what your users actually ask and how each domain performs.

  • Automatic domain discovery from production traffic patterns
  • KMeans + learned map clustering for grouping similar prompts
  • Embedding-based similarity using sentence transformers
  • Per-cluster quality metrics and cost analysis
  • Drift detection when traffic patterns change unexpectedly
  • Merge Checker suggests cluster consolidation to reduce noise
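The core idea of embedding-based grouping can be shown with a toy greedy pass over cosine similarities. Production uses sentence-transformer embeddings and KMeans; the 3-dimensional vectors and threshold here are hand-made for illustration:

```python
# Toy sketch of embedding-based prompt grouping: greedy assignment
# by cosine similarity to a cluster seed. Vectors and threshold
# are illustrative; production uses sentence transformers + KMeans.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def cluster(embeddings, threshold=0.9):
    """Assign each embedding to the first cluster whose seed is close enough."""
    clusters = []  # list of (seed_embedding, member_indices)
    labels = []
    for i, e in enumerate(embeddings):
        for label, (seed, members) in enumerate(clusters):
            if cosine(e, seed) >= threshold:
                members.append(i)
                labels.append(label)
                break
        else:  # no similar cluster found: start a new one
            clusters.append((e, [i]))
            labels.append(len(clusters) - 1)
    return labels

# Two similar "billing" prompts and one unrelated "code" prompt
embs = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1], [0.0, 0.1, 1.0]]
labels = cluster(embs)
```

Once prompts are grouped, per-cluster cost and quality metrics fall out by aggregating the traces in each group.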

Deployment

Full stack with Docker. Self-host with MIT license or use the managed cloud. Production-ready from day one.

  • Full stack Docker deployment: ClickHouse + Go engine + Python API + React UI
  • Self-host option with MIT license -- your data stays on your infrastructure
  • Go engine for high-performance routing (<2ms overhead per request)
  • Python SDK: pip install opentracy
  • OpenAI SDK drop-in: just change base_url to your OpenTracy instance
bash
# Install the SDK
pip install opentracy

# Or self-host the full stack
git clone https://github.com/lunar-org-ai/lunar-router.git
cd lunar-router && docker compose up -d

Ready to take control of your LLM stack?

Open source, self-hostable, MIT licensed. Start in 5 minutes.