The Complete LLM Operations Platform

Everything OpenTracy delivers -- from unified gateway to model distillation. Not marketing. Real capabilities, real architecture.

Unified Gateway

One OpenAI-compatible API that routes to 13 providers and 70+ models. Change one line of code to start.

  • OpenAI-compatible API -- drop-in replacement, same SDK, same format
  • 13 providers: OpenAI, Anthropic, Google Gemini, Mistral, Groq, DeepSeek, Perplexity, Cerebras, SambaNova, Together, Fireworks, Cohere, AWS Bedrock
  • 70+ models with automatic per-token pricing baked in
  • Full streaming support for all providers including Anthropic SSE translation
  • Vision and multimodal support (base64 or URL images)
  • Tool calling with cross-provider format translation
python
import openai

# Just change the base URL — everything else stays the same
client = openai.OpenAI(
    base_url="https://api.opentracy.com/v1",
    api_key="your-opentracy-key"
)

response = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)

Smart Routing

Route requests to the right model based on cost, latency, complexity, or custom rules. Automatic fallbacks when providers go down.

  • Router class with strategies: round-robin, least-cost, lowest-latency, weighted-random
  • Semantic routing -- classifies prompt complexity, sends simple prompts to cheap models, complex ones to powerful models
  • Automatic fallbacks with configurable retry chains (e.g. GPT-4o -> Claude -> Gemini)
  • Load balancing across model pools for high-throughput workloads
  • Go engine for high-performance routing with <2ms overhead
python
import opentracy as ot

# Semantic routing: simple -> cheap, complex -> powerful
router = ot.Router(
    strategy="semantic",
    models={
        "simple": "openai/gpt-4o-mini",
        "complex": "anthropic/claude-sonnet-4-20250514",
    },
    fallbacks=["google/gemini-2.0-flash"]
)

response = router.completion(
    messages=[{"role": "user", "content": prompt}]
)
print(f"Routed to: {response.model}")
print(f"Cost: ${response._cost:.6f}")

Real-Time Traces

Every request logged with full input, output, cost, latency, model, and token counts. Query millions of traces instantly.

  • Full trace logging: input messages, output, cost, latency, model, tokens in/out
  • ClickHouse analytics backend -- query millions of traces in milliseconds
  • Real-time dashboard UI with filters, search, and trace detail view
  • Model-level performance stats: latency P50/P95/P99, error rates, cost per request
  • Export traces for offline analysis or integration with your data pipeline
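The per-model latency stats above come down to percentile math over logged trace records. Here is a minimal sketch -- the field names mirror the trace schema listed above, but the helper itself is illustrative, not part of the OpenTracy SDK:

```python
# Sketch: P50/P95/P99 latency for one model over exported traces.
# Field names follow the trace schema above; the helper is illustrative.
from statistics import quantiles

def latency_percentiles(traces, model):
    """Return P50/P95/P99 latency (ms) for one model's traces."""
    lats = sorted(t["latency_ms"] for t in traces if t["model"] == model)
    # quantiles with n=100 yields the 1st..99th percentile cut points
    q = quantiles(lats, n=100)
    return {"p50": q[49], "p95": q[94], "p99": q[98]}

# 100 synthetic traces with evenly spaced latencies
traces = [{"model": "openai/gpt-4o-mini", "latency_ms": ms}
          for ms in range(100, 300, 2)]
stats = latency_percentiles(traces, "openai/gpt-4o-mini")
```

In production the same aggregation runs as a ClickHouse query rather than in Python, which is what makes it fast over millions of traces.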

Cost Intelligence

Automatic per-token pricing for every model. See exactly where your money goes and how much smart routing saves you.

  • Automatic per-token pricing for 70+ models (continuously updated pricing database)
  • Cost attached to every response -- no more guessing or manual calculation
  • Baseline vs actual cost comparison: see what you'd pay with the most expensive model vs smart routing
  • Net savings calculation with monthly projections
  • Cost breakdown by model, by provider, by time period
  • Budget alerts and anomaly detection for unexpected cost spikes
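The baseline-vs-actual comparison above is straightforward arithmetic over per-token prices. A minimal sketch -- the rates here are illustrative per-million-token (input, output) prices, not OpenTracy's live pricing database:

```python
# Sketch: what smart routing actually cost vs. sending every request
# to the most expensive model in the pool. Prices are illustrative.
PRICING = {  # model -> ($/1M input tokens, $/1M output tokens)
    "openai/gpt-4o-mini": (0.15, 0.60),
    "anthropic/claude-sonnet-4-20250514": (3.00, 15.00),
}

def request_cost(model, tokens_in, tokens_out):
    pin, pout = PRICING[model]
    return (tokens_in * pin + tokens_out * pout) / 1_000_000

requests = [("openai/gpt-4o-mini", 1200, 400),
            ("anthropic/claude-sonnet-4-20250514", 900, 600)]
actual = sum(request_cost(m, ti, to) for m, ti, to in requests)

# Baseline: the priciest model (by output rate) handles everything
baseline_model = max(PRICING, key=lambda m: PRICING[m][1])
baseline = sum(request_cost(baseline_model, ti, to) for _, ti, to in requests)
savings = baseline - actual
```

Multiply `savings` per request by monthly volume and you have the savings projection the dashboard reports.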

Quality Monitoring

7 autonomous AI agents continuously scan your production traffic for issues. Catch problems before your users do.

  • Cluster Labeler -- groups prompts by domain automatically
  • Trace Scanner -- detects hallucinations, refusals, PII leaks, and format issues
  • Outlier Detector -- flags anomalous traces that deviate from normal patterns
  • Coherence Scorer -- rates cluster quality to ensure consistent behavior
  • Heuristic detection: incomplete responses, refusal phrases, latency spikes, cost anomalies
  • LLM-based hallucination detection with confidence scoring (0-1)
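The heuristic layer listed above (refusal phrases, incomplete responses, latency spikes) can be sketched in a few lines. The phrase list and thresholds here are illustrative -- the production scanner is richer than this:

```python
# Sketch of the heuristic trace checks described above.
# Phrase list and thresholds are illustrative, not the shipped rules.
REFUSAL_PHRASES = ("i can't help", "i cannot assist", "as an ai")

def scan_trace(output: str, latency_ms: float, p95_latency_ms: float):
    """Return a list of heuristic issue flags for one trace."""
    issues = []
    text = output.lower().strip()
    if any(p in text for p in REFUSAL_PHRASES):
        issues.append("refusal")
    # Responses cut off mid-sentence usually end without terminal punctuation
    if text and text[-1] not in ".!?\"')]}":
        issues.append("incomplete")
    if latency_ms > 2 * p95_latency_ms:
        issues.append("latency_spike")
    return issues

flags = scan_trace("I can't help with that", 120, 800)
```

Traces flagged here get escalated to the LLM-based checks, which attach the 0-1 confidence score.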

Evaluations

LLM-as-Judge for pairwise comparison and pointwise scoring. Track quality across model updates with real metrics.

  • Pairwise comparison: model A vs B, pick the winner on your production data
  • Pointwise scoring: rate responses 1-5 with customizable rubrics
  • RouterEvaluator: benchmark routing decisions against cached responses
  • AUROC metrics, Pareto curves, and win rate calculations
  • Domain-specific evaluation with AI-suggested quality metrics
  • Track quality over time across model updates and routing changes
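The win-rate bookkeeping behind pairwise comparison is simple to show. In this sketch the judge verdicts are hard-coded; in practice each verdict comes from an LLM-as-Judge call over a production trace:

```python
# Sketch: win rate from pairwise LLM-as-Judge verdicts.
# Verdicts are hard-coded here for illustration.
from collections import Counter

def win_rate(verdicts):
    """verdicts: list of 'A', 'B', or 'tie' from pairwise judging."""
    counts = Counter(verdicts)
    decided = counts["A"] + counts["B"]  # ties are excluded
    return counts["A"] / decided if decided else 0.0

verdicts = ["A", "A", "B", "tie", "A"]
rate = win_rate(verdicts)  # model A wins 3 of the 4 decided comparisons
```

Pointwise scoring works the same way with 1-5 rubric scores averaged per model instead of head-to-head verdicts.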

Model Distillation (BOND Pipeline)

Train smaller, faster, cheaper models from your production data. Full pipeline from teacher model to deployed LoRA.

  • Pipeline: Teacher model -> LLM-as-Judge curation -> LoRA training (Unsloth) -> GGUF export
  • Automatic training data extraction from production traces
  • Preference pair generation for DPO/RLHF alignment
  • Golden dataset augmentation for evaluation benchmarks
  • Own your models -- no vendor lock-in, deploy anywhere
  • Eval Generator creates evaluation datasets from real production data
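The preference-pair step of the pipeline above turns judge scores on production traces into (chosen, rejected) pairs for DPO. A sketch -- field names are illustrative, not the BOND schema:

```python
# Sketch: build DPO preference pairs from judged production traces.
# Field names ("prompt", "output", "judge_score") are illustrative.
def build_preference_pairs(traces, min_gap=1.0):
    """Pair the best- and worst-scoring responses to the same prompt."""
    by_prompt = {}
    for t in traces:
        by_prompt.setdefault(t["prompt"], []).append(t)
    pairs = []
    for prompt, group in by_prompt.items():
        group.sort(key=lambda t: t["judge_score"], reverse=True)
        best, worst = group[0], group[-1]
        # Skip prompts without a clear quality gap between responses
        if best["judge_score"] - worst["judge_score"] >= min_gap:
            pairs.append({"prompt": prompt,
                          "chosen": best["output"],
                          "rejected": worst["output"]})
    return pairs

traces = [
    {"prompt": "q1", "output": "good answer", "judge_score": 4.5},
    {"prompt": "q1", "output": "weak answer", "judge_score": 2.0},
    {"prompt": "q2", "output": "only answer", "judge_score": 3.0},
]
pairs = build_preference_pairs(traces)
```

The resulting pairs feed the LoRA training stage, after which the adapter is exported to GGUF for deployment.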

Prompt Clustering

Automatic domain discovery from your production traffic. Understand what your users actually ask and how each domain performs.

  • Automatic domain discovery from production traffic patterns
  • KMeans + learned map clustering for grouping similar prompts
  • Embedding-based similarity using sentence transformers
  • Per-cluster quality metrics and cost analysis
  • Drift detection when traffic patterns change unexpectedly
  • Merge Checker suggests cluster consolidation to reduce noise
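The core idea of embedding-based grouping can be shown with a toy greedy pass over cosine similarities. Production uses sentence-transformer embeddings and KMeans; the 3-dimensional vectors and threshold here are hand-made for illustration:

```python
# Toy sketch of embedding-based prompt grouping: greedy assignment
# by cosine similarity to a cluster seed. Vectors and threshold
# are illustrative; production uses sentence transformers + KMeans.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def cluster(embeddings, threshold=0.9):
    """Assign each embedding to the first cluster whose seed is close enough."""
    clusters = []  # list of (seed_embedding, member_indices)
    labels = []
    for i, e in enumerate(embeddings):
        for label, (seed, members) in enumerate(clusters):
            if cosine(e, seed) >= threshold:
                members.append(i)
                labels.append(label)
                break
        else:  # no similar cluster found: start a new one
            clusters.append((e, [i]))
            labels.append(len(clusters) - 1)
    return labels

# Two similar "billing" prompts and one unrelated "code" prompt
embs = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1], [0.0, 0.1, 1.0]]
labels = cluster(embs)
```

Once prompts are grouped, per-cluster cost and quality metrics fall out by aggregating the traces in each group.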

Deployment

Full stack with Docker. Self-host with MIT license or use the managed cloud. Production-ready from day one.

  • Full stack Docker deployment: ClickHouse + Go engine + Python API + React UI
  • Self-host option with MIT license -- your data stays on your infrastructure
  • Go engine for high-performance routing (<2ms overhead per request)
  • Python SDK: pip install opentracy
  • OpenAI SDK drop-in: just change base_url to your OpenTracy instance
bash
# Install the SDK
pip install opentracy

# Or self-host the full stack
git clone https://github.com/lunar-org-ai/lunar-router.git
cd lunar-router && docker compose up -d

Ready to take control of your LLM stack?

Open source, self-hostable, MIT licensed. Start in 5 minutes.