Two lines to your first completion with cost + latency — no server, no setup
By the end of this page — in under three minutes — you’ll have made a
real LLM call, seen the cost and latency on the response, swapped providers
with one string change, and added automatic fallbacks. No server, no
Docker, no config files.
What you need right now: an OpenAI API key (or a key from Anthropic, Groq,
or any of the 13 supported providers). Nothing else.
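A quick sketch of making your key visible to the library, assuming opentracy follows the standard convention of reading provider keys from environment variables like `OPENAI_API_KEY` (an assumption — check the configuration docs for your provider's exact variable name):

```python
import os

# Assumption: opentracy reads provider keys from the usual env vars.
# Set it in your shell (export OPENAI_API_KEY=...) or, for a quick script:
os.environ.setdefault("OPENAI_API_KEY", "sk-...")  # placeholder; use your real key
```

Setting it in your shell profile is the better long-term option; the in-script form is only convenient for throwaway experiments.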
```python
import opentracy as ot

resp = ot.completion(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Say hello in three words."}],
)
print(resp.choices[0].message.content)
print(f"cost: ${resp._cost:.6f} latency: {resp._latency_ms:.0f}ms")
```
```
Hi there, friend!
cost: $0.000008 latency: 612ms
```
This is the hook. Every response already carries _cost and
_latency_ms. You didn’t wire up any observability — it’s on by default.
ot.completion is OpenAI-compatible, so resp.choices[0].message.content,
resp.usage, and streaming all work like you’d expect.
If you want to see the full pipeline in action — including the model
picking itself per prompt based on learned error profiles — load the
pre-trained router. This downloads ~100 MB of weights on first run and
caches them in ~/.local/share/opentracy/.
```python
import opentracy as ot

router = ot.load_router(cost_weight=0.5)
for prompt in [
    "What is the capital of France?",
    "Prove the square root of 2 is irrational.",
    "Write a haiku about autumn.",
]:
    d = router.route(prompt)
    print(f"[{d.selected_model:<24}] cluster={d.cluster_id:>3} {prompt}")
```
```
[ministral-3b-latest     ] cluster= 84 What is the capital of France?
[gpt-4o                  ] cluster= 47 Prove the square root of 2 is irrational.
[ministral-3b-latest     ] cluster= 29 Write a haiku about autumn.
```
Easy trivia → a cheap small model. Math proof → a strong model. No
rules from you. See Auto-routing for the
full picture.
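The `cost_weight` knob trades price against expected answer quality. As a toy sketch of what a cost-weighted selection rule can look like (this is not opentracy's learned router — the model names are reused from the output above, but every number here is invented for illustration):

```python
# Toy cost-weighted model selection. All (quality, cost) numbers are made up.
CANDIDATES = {
    "ministral-3b-latest": {"quality": 0.62, "cost_per_call": 0.00001},
    "gpt-4o-mini":         {"quality": 0.78, "cost_per_call": 0.00060},
    "gpt-4o":              {"quality": 0.95, "cost_per_call": 0.00250},
}

def pick_model(cost_weight: float) -> str:
    """Higher cost_weight penalizes expensive models more heavily."""
    def score(stats):
        # Normalize cost into roughly the same 0-1 range as quality.
        return stats["quality"] - cost_weight * (stats["cost_per_call"] / 0.0025)
    return max(CANDIDATES, key=lambda m: score(CANDIDATES[m]))

print(pick_model(cost_weight=0.0))  # quality only -> gpt-4o
print(pick_model(cost_weight=0.9))  # strongly prefer cheap -> ministral-3b-latest
```

The real router goes further — it scores per prompt cluster rather than globally — but the tradeoff it exposes through `cost_weight` has this same shape.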