One-call distillation — train a custom student from a dataset and get back a callable Student
ot.distill() runs the full 4-phase BOND distillation pipeline
(data-generation → curation → training → export) in-process and
returns a callable Student. No
FastAPI service, no ClickHouse, no job-polling — if you just want to
train a student from a dataset and use it, this is the API.

For the long-running, queued, multi-tenant REST flow, see the
Distiller client (same engine, different surface).

    import opentracy as ot

    student = ot.distill(
        dataset="tickets.jsonl",
        teacher="openai/gpt-4o",
        student="llama-3.2-1b",
        steps=60,
    )

    # Use it directly
    print(student("Classify: please refund the double charge"))

    # Or serve it under a logical name
    student.deploy("ticket-classifier")
    ot.completion(model="ticket-classifier", messages=[...])

ot.distill() needs opentracy[distill] and a CUDA GPU. It fails fast
(before any teacher API spend) if torch can’t import or
torch.cuda.is_available() is False.

    student = ot.distill(...)

    student.backend      # "gguf" if a quantization was exported, else "peft"
    student.model_path   # absolute path to the .gguf file or the adapter dir
    student.base_model   # HF repo id of the base model (needed for PEFT load)

    student("Classify: refund please")                  # direct inference
    student.batch(["Classify: ...", "Classify: ..."])
    student.generate(messages=[...])                    # full OpenAI-shape response

    student.save("./ticket-classifier-v1")              # copy artifact to a durable path
    student.deploy("ticket-classifier")                 # register under a local alias

After .deploy(alias), calling ot.completion(model=alias, ...)
dispatches to this student locally: no provider call, no HTTP hop.

See the Student reference below for the full API.

    # 1. Path to a .jsonl (one dict per line) or .json (single list)
    ot.distill(dataset="tickets.jsonl", ...)

    # 2. List of dicts
    rows = [
        {"prompt": "Classify: ...", "response": "billing"},
        {"prompt": "Classify: ...", "response": "technical"},
    ]
    ot.distill(dataset=rows, ...)

    # 3. Callable that yields dicts (useful for streaming from traces)
    def from_clickhouse():
        for trace in my_trace_source():
            yield {"prompt": trace["prompt"], "response": trace["label"]}

    ot.distill(dataset=from_clickhouse, ...)

Row field aliases: prompt / input / text all work for the input;
response / expected_output both work for the gold answer.
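
A minimal sketch of how the three dataset forms and the field aliases could be folded into one canonical shape. `load_rows` and `normalize_row` are hypothetical helpers written for illustration, not part of opentracy, and the alias precedence (first match wins) is an assumption.

```python
import json
from pathlib import Path

# Alias names come from the text above; the precedence order is an assumption.
PROMPT_KEYS = ("prompt", "input", "text")
RESPONSE_KEYS = ("response", "expected_output")


def load_rows(dataset):
    """Yield raw dict rows from a path, a list of dicts, or a callable."""
    if callable(dataset):
        yield from dataset()
    elif isinstance(dataset, (str, Path)):
        path = Path(dataset)
        text = path.read_text(encoding="utf-8")
        if path.suffix == ".jsonl":
            # One JSON object per non-empty line.
            for line in text.splitlines():
                if line.strip():
                    yield json.loads(line)
        else:
            # A .json file holding a single list of dicts.
            yield from json.loads(text)
    else:
        yield from dataset


def _first(row, keys):
    """Return the value of the first key present in the row, else None."""
    for key in keys:
        if key in row:
            return row[key]
    return None


def normalize_row(row):
    """Map any accepted alias onto canonical prompt/response keys."""
    prompt = _first(row, PROMPT_KEYS)
    if prompt is None:
        raise ValueError(f"row has no prompt field: {sorted(row)}")
    return {"prompt": prompt, "response": _first(row, RESPONSE_KEYS)}


rows = [
    {"input": "Classify: card charged twice", "expected_output": "billing"},
    {"text": "Classify: app crashes on login", "response": "technical"},
]
normalized = [normalize_row(r) for r in load_rows(rows)]
```

Normalizing up front means later phases only ever see `prompt` and `response`, whichever alias the source rows used.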
If phase 4 (GGUF conversion) fails — for example, llama.cpp isn’t
installed on the host — ot.distill() does not crash. It logs a
warning and returns a Student(backend="peft", model_path=<adapter>)
pointing at the LoRA adapter that was successfully trained. You still
get a working model; you just serve it via PEFT (1 GB base model in VRAM)
instead of a standalone GGUF file.

To force a GGUF-only path and raise on failure, call the REST-backed
Distiller instead.
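
The fallback described above amounts to a try/except around the export phase. The sketch below illustrates that control flow only: `StudentRef`, `finish_export`, and `broken_export` are hypothetical names, and the real pipeline may structure this differently.

```python
import logging
from dataclasses import dataclass

log = logging.getLogger("distill-sketch")


@dataclass
class StudentRef:
    """Hypothetical stand-in for two fields of the returned Student."""
    backend: str
    model_path: str


def finish_export(adapter_dir, export_gguf):
    """Try the GGUF export; fall back to the trained adapter if it fails."""
    try:
        gguf_path = export_gguf(adapter_dir)
    except Exception as exc:  # for example, llama.cpp missing on the host
        log.warning("GGUF export failed (%s); returning the PEFT adapter", exc)
        return StudentRef(backend="peft", model_path=adapter_dir)
    return StudentRef(backend="gguf", model_path=gguf_path)


def broken_export(adapter_dir):
    """Simulates a host where the conversion tool is not installed."""
    raise FileNotFoundError("llama.cpp not found")


student = finish_export("/tmp/adapter", broken_export)
```

Because the adapter is already on disk when the export runs, a failed export costs only the conversion step, not the training run.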
ot.distill() raises opentracy.DistillError for pipeline failures.
Common causes:

| Message | Cause | Fix |
| --- | --- | --- |
| ``Training needs PyTorch, but `import torch` failed.`` | torch not installed. | `pip install -U 'opentracy[distill]'` |
| `No CUDA GPU is visible to PyTorch.` | Training phase is CUDA-only. | Run on a GPU host; on Colab switch Runtime → T4. |
| `Dataset is empty — distill() needs at least one prompt.` | Empty dataset file or all rows filtered out. | Check the file format and field names. |
| ``Distillation requires the `[distill]` extra.`` | opentracy installed without training deps. | `pip install -U 'opentracy[distill]'` |

The preflight that raises the torch/CUDA errors can be bypassed in tests
by setting OPENTRACY_SKIP_DISTILL_PREFLIGHT=1. Don’t set this in
production — it’ll let a GPU-less job burn money on teacher calls before
dying in phase 3.
opentracy.Student is callable and serializes to disk. It’s what
ot.distill() returns, but you can also instantiate it yourself to load
a previously trained adapter.

    from opentracy import Student

    # Load a PEFT adapter from a path you saved earlier
    student = Student(
        backend="peft",
        model_path="./ticket-classifier-v1",
        base_model="unsloth/Llama-3.2-1B-Instruct-bnb-4bit",
    )
    student("Classify: refund the double charge")


| Parameter | Type | Description |
| --- | --- | --- |
| backend | str | "peft" loads a LoRA adapter onto a base model (requires transformers, peft, torch). "gguf" loads a GGUF file via llama_cpp: CPU-friendly, no base model needed. |
| model_path | str | Absolute path to the adapter directory (PEFT) or the .gguf file (GGUF). |
| base_model | str? | HF repo id. Required for peft; read from adapter_config.json if omitted. |
| metadata | dict | Free-form metadata persisted to disk via .save() and surfaced through the alias registry. |

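
The base_model fallback can be illustrated with a short sketch. PEFT writes the base model id into adapter_config.json under the `base_model_name_or_path` key when an adapter is saved; `resolve_base_model` is a hypothetical helper, and how opentracy itself performs the lookup is an assumption.

```python
import json
import tempfile
from pathlib import Path


def resolve_base_model(model_path, base_model=None):
    """Return the explicit base_model, else the one recorded by PEFT."""
    if base_model:
        return base_model
    config_file = Path(model_path) / "adapter_config.json"
    config = json.loads(config_file.read_text(encoding="utf-8"))
    # PEFT stores the base model id under this key when saving an adapter.
    name = config.get("base_model_name_or_path")
    if not name:
        raise ValueError(f"{config_file} does not name a base model")
    return name


# Demo against a throwaway adapter directory.
with tempfile.TemporaryDirectory() as adapter_dir:
    config = {"base_model_name_or_path": "unsloth/Llama-3.2-1B-Instruct-bnb-4bit"}
    (Path(adapter_dir) / "adapter_config.json").write_text(
        json.dumps(config), encoding="utf-8"
    )
    inferred = resolve_base_model(adapter_dir)
    explicit = resolve_base_model(adapter_dir, base_model="my-org/other-base")
```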
deploy(alias) registers the student under alias in the local file-based registry
(~/.opentracy/aliases.json). After this, ot.completion(model=alias, ...)
resolves to this student. If engine_url is provided, the call
also POSTs to the engine’s /v1/models/register so server-side callers
see the alias; failures there only emit a warning.
- torch, transformers, and peft are importable; otherwise StudentError is
  raised, pointing at pip install opentracy[distill].
- jinja2 >= 3.1 is present; otherwise StudentError is raised with the exact
  {sys.executable} -m pip install -U 'jinja2>=3.1' command for the
  interpreter currently running. A stale jinja2 3.0.x in a system
  Python is a common footgun when a uvicorn on PATH picks up a
  different interpreter than the one that has opentracy installed.
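
The jinja2 check described above can be sketched as a version comparison plus a command string built from the running interpreter. `jinja2_upgrade_hint` is a hypothetical helper; in real use the installed version could be read with `importlib.metadata.version("jinja2")` from the standard library.

```python
import re
import sys


def jinja2_upgrade_hint(installed_version):
    """Return None if jinja2 is new enough, else the pip command to fix it."""
    match = re.match(r"(\d+)\.(\d+)", installed_version)
    if match is None:
        raise ValueError(f"unrecognized version: {installed_version!r}")
    if (int(match[1]), int(match[2])) >= (3, 1):
        return None
    # sys.executable pins the command to the interpreter that is running now,
    # which avoids upgrading jinja2 in the wrong environment.
    return f"{sys.executable} -m pip install -U 'jinja2>=3.1'"


stale = jinja2_upgrade_hint("3.0.3")
fresh = jinja2_upgrade_hint("3.1.4")
```

Building the command from `sys.executable` rather than a bare `pip` is the point of the design: the fix lands in the same interpreter that reported the problem.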
Aliases map a logical name to a Student. The registry lives at
~/.opentracy/aliases.json (or $OPENTRACY_DATA_HOME/aliases.json) and
is read by ot.completion() on every call.

    import opentracy as ot

    # Register (equivalent to student.deploy("ticket-classifier"))
    ot.set_alias(
        "ticket-classifier",
        backend="peft",
        model_path="/abs/path/to/adapter",
        base_model="unsloth/Llama-3.2-1B-Instruct-bnb-4bit",
    )

    # Introspect
    ot.list_aliases()
    # {"ticket-classifier": {"backend": "peft", "model_path": "...", "base_model": "...", "metadata": {...}, "registered_at": "..."}}
    ot.get_alias("ticket-classifier")  # single entry or None

    # Remove
    ot.unset_alias("ticket-classifier")  # True if removed, False if not registered

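
For intuition, here is a sketch of how a file-based registry like this could be stored. It is not the opentracy implementation: every function below is a hypothetical stand-in, and the write-to-temp-then-rename step is an assumption about how a re-point might be kept atomic.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def registry_path(env=None):
    """Resolve aliases.json, honoring OPENTRACY_DATA_HOME when it is set."""
    env = os.environ if env is None else env
    home = env.get("OPENTRACY_DATA_HOME")
    base = Path(home) if home else Path.home() / ".opentracy"
    return base / "aliases.json"


def load_aliases(path):
    """Read the registry; a missing file means no aliases yet."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def save_aliases(path, aliases):
    """Write to a temp file, then rename, so readers never see a partial file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(aliases, handle, indent=2)
    os.replace(tmp, path)


def set_alias(path, name, **entry):
    """Add or overwrite one alias entry and stamp the registration time."""
    aliases = load_aliases(path)
    entry["registered_at"] = datetime.now(timezone.utc).isoformat()
    aliases[name] = entry
    save_aliases(path, aliases)


def unset_alias(path, name):
    """Remove an alias; return True if it existed, False otherwise."""
    aliases = load_aliases(path)
    removed = aliases.pop(name, None) is not None
    if removed:
        save_aliases(path, aliases)
    return removed


# Demo against a throwaway directory instead of the real ~/.opentracy.
with tempfile.TemporaryDirectory() as data_home:
    path = registry_path({"OPENTRACY_DATA_HOME": data_home})
    set_alias(path, "ticket-classifier", backend="peft",
              model_path="/abs/path/to/adapter")
    registered = load_aliases(path)["ticket-classifier"]["backend"]
    removed_first = unset_alias(path, "ticket-classifier")
    removed_again = unset_alias(path, "ticket-classifier")
```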
Any ot.completion(model="ticket-classifier", ...) call from any Python
process owned by the same user will resolve through this registry and
dispatch locally — no provider call, no HTTP hop, no shared state with a
remote engine.

The alias-swap pattern:

    # Day 1: alias points at a provider
    ot.set_alias("smart", backend="peft", ...)  # or call through the engine

    # Day 10: distill a student from traffic
    student = ot.distill(dataset=recent_traces, ...)
    student.deploy("smart")  # atomic re-point

    # App code for "smart" never changed
    ot.completion(model="smart", messages=[...])

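
The reason application code never changes is that the model name is resolved at call time. A toy illustration of that lookup follows; the `route` function is hypothetical, and the rule that unregistered names fall through to a provider is an assumption made for the example.

```python
def route(model, aliases):
    """Decide where a completion call for `model` would go."""
    entry = aliases.get(model)
    if entry is None:
        # Assumption: names without a local alias go to a remote provider.
        return ("provider", model)
    return ("local", entry["model_path"])


aliases = {}
before = route("smart", aliases)

# Deploying a student re-points the name; the caller's code is unchanged.
aliases["smart"] = {"backend": "peft", "model_path": "/abs/path/to/adapter"}
after = route("smart", aliases)
```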