class ot.Distiller(
    base_url: str = "http://localhost:8000",
    api_key: Optional[str] = None,
    timeout: float = 60.0,
)
Thin HTTP client for the engine’s /v1/distillation/* endpoints. Requires the self-hosted stack running — see the self-host guide.

Constructor

| Name | Type | Description |
| --- | --- | --- |
| base_url | str | REST API base. Default assumes local self-host on :8000. |
| api_key | str? | Bearer token if the API is behind auth. |
| timeout | float | HTTP timeout in seconds. |
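A minimal sketch of wiring the constructor from the environment. The env var names (OPENTRACY_BASE_URL, OPENTRACY_API_KEY) are assumptions for illustration, not something the SDK reads itself:

```python
import os

# Assumed env var names -- the SDK does not read these itself.
cfg = {
    "base_url": os.environ.get("OPENTRACY_BASE_URL", "http://localhost:8000"),
    "api_key": os.environ.get("OPENTRACY_API_KEY"),  # None => no auth header
    "timeout": 120.0,  # raise for slow self-hosted engines
}
print(cfg["base_url"])

# from opentracy import Distiller
# d = Distiller(**cfg)
```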

Methods

.create(...) → dict

Create a new distillation job.
job = d.create(
    name="ticket-triage v1",            # label, any string
    dataset_id=None,                    # existing dataset to train on
    student_model="llama-3.2-1b",       # small open model to fine-tune
    teacher_model="openai/gpt-4o",      # model that produces labels
    num_prompts=1000,                   # how many prompts from the dataset
    n_samples=4,                        # BOND candidates per prompt
    training_steps=500,                 # fine-tune steps
    bond_beta=0.5,                      # BOND preference weight
    bond_gamma=0.1,                     # KL regularization
    temperature=0.8,                    # teacher sampling temperature
    export_gguf=True,                   # convert trained adapter to GGUF
    quantization_types=["q4_k_m", "q8_0"],
    description="",
    extra_config=None,                  # dict — passed through to engine
)

# Returns:
# {"id": "job_abc123", "status": "queued", "created_at": "...", ...}

.estimate(...) → dict

Dry-run cost estimation — no job is created.
est = d.estimate(
    student_model="llama-3.2-1b",
    num_prompts=200,
    n_samples=2,
)
# {"estimated_cost": 0.94, "is_sandbox": False, "tier": "local",
#  "balance": 999999, "sufficient": True}

.get(job_id) → dict

Fetch current state of a job.
job = d.get("job_abc123")
# {
#   "id": "job_abc123",
#   "status": "training",          # queued | generating | curating | training | exporting | completed | failed
#   "phase": "data_generation",
#   "progress": {"prompts_done": 120, "prompts_total": 500},
#   "metrics": {"teacher_cost_total": 0.82, ...},
#   ...
# }

.wait(job_id, timeout=3600, poll_interval=5.0, on_update=None) → dict

Block until the job reaches a terminal state (completed or failed).
def show(update):
    print(update["status"], update.get("phase"), update.get("progress"))

job = d.wait("job_abc123", on_update=show)

.stream_progress(job_id, poll_interval=5.0) → Iterable[dict]

Generator yielding status updates as they change.
for update in d.stream_progress("job_abc123"):
    print(update)

.metrics(job_id, limit=5000) → list[dict]

Per-step training metrics (loss, ot, memory) — the series you’d plot.
for m in d.metrics("job_abc123"):
    print(m["step"], m["loss"], m["ot"])
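Before plotting, you may want to smooth the noisy per-step series. A small sketch, assuming the "step"/"loss" field names from the example above; the window size is arbitrary:

```python
def smooth(values, window=3):
    """Trailing moving average over a metric series."""
    out = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        out.append(sum(values[start:i + 1]) / (i + 1 - start))
    return out

losses = [2.0, 1.0, 1.5, 0.5]
print(smooth(losses))  # [2.0, 1.5, 1.5, 1.0]

# smoothed = smooth([m["loss"] for m in d.metrics("job_abc123")])
```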

.candidates(job_id, limit=100) → list[dict]

Teacher-generated candidates (before curation), with judge scores.
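A sketch of post-hoc candidate inspection. The exact response fields ("prompt_id", "text", "judge_score") are assumptions, not a documented schema; check your engine's actual payload:

```python
def top_candidates(candidates, min_score=0.8):
    """Keep candidates whose judge score clears a threshold, best first."""
    kept = [c for c in candidates if c.get("judge_score", 0.0) >= min_score]
    return sorted(kept, key=lambda c: c["judge_score"], reverse=True)

# Field names below are assumed for illustration.
sample = [
    {"prompt_id": "p1", "text": "label: billing", "judge_score": 0.92},
    {"prompt_id": "p1", "text": "label: refund",  "judge_score": 0.41},
    {"prompt_id": "p2", "text": "label: outage",  "judge_score": 0.88},
]
print(top_candidates(sample))  # keeps the 0.92 and 0.88 entries

# best = top_candidates(d.candidates("job_abc123", limit=500))
```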

.logs(job_id) → str

Full text logs from the training subprocess.
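The log blob can be large; a small sketch for tailing just the most recent lines:

```python
def tail(log_text, n=20):
    """Return the last n lines of a log blob."""
    return "\n".join(log_text.splitlines()[-n:])

sample = "\n".join(f"step {i}" for i in range(100))
print(tail(sample, n=3))  # step 97 / step 98 / step 99

# print(tail(d.logs("job_abc123")))
```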

.artifacts(job_id) → dict

Paths to the trained artifacts on the engine side.
artifacts = d.artifacts("job_abc123")
# {
#   "adapter_path": "/app/data/distillation/job_abc123/adapter/",
#   "gguf_paths": {
#     "q4_k_m": ".../gguf/model-q4_k_m.gguf",
#     "q8_0":   ".../gguf/model-q8_0.gguf",
#   },
#   "tokenizer_path": ".../adapter/tokenizer.model",
#   "config_path": ".../train_config.json",
# }

.cancel(job_id) → dict

Cancel a running job. Safe at any phase; partial artifacts are kept.
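A sketch of a bulk-cancel pass: treat the terminal statuses documented under .get as done, and cancel everything else. The helper operates on plain job dicts, so the only assumption is the "id"/"status" shape shown above:

```python
TERMINAL = {"completed", "failed"}

def cancellable(jobs):
    """IDs of jobs still in a non-terminal state."""
    return [j["id"] for j in jobs if j["status"] not in TERMINAL]

jobs = [
    {"id": "job_a", "status": "training"},
    {"id": "job_b", "status": "completed"},
    {"id": "job_c", "status": "generating"},
]
print(cancellable(jobs))  # ['job_a', 'job_c']

# for job_id in cancellable(d.list()):
#     d.cancel(job_id)
```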

.delete(job_id) → dict

Delete the job record and all its artifacts from disk.

.list(status=None, limit=50, offset=0) → list[dict]

List jobs, optionally filtered by status.
running = d.list(status="training")
recent  = d.list(limit=5)

.teacher_models() → list[dict]

Available teachers (populated from the engine’s model registry).
for t in d.teacher_models():
    print(t["id"], t["provider"], t["available"])

.student_models() → list[dict]

Available students (populated from the engine’s HF whitelist + local models). Typical entries include llama-3.2-1b, llama-3.2-3b, qwen3-0.6b, qwen3-1.7b, qwen3-4b, mistral-small, phi-3.5-mini.

TrainingClient (lower-level)

Distiller wraps TrainingClient — the raw HTTP layer. Use TrainingClient directly only if you need fine control over retries, custom endpoints, or headers.
from opentracy import TrainingClient

tc = TrainingClient(base_url="http://localhost:8000")
response = tc.post("/v1/distillation/jobs", json={...})

Errors

All network / server errors raise DistillerError with the HTTP status and response body attached. Wrap .create and .wait in try/except if you need graceful degradation.
from opentracy import Distiller, DistillerError

try:
    job = d.create(...)
    job = d.wait(job["id"])
except DistillerError as e:
    print(f"distillation failed: {e.status} {e.message}")
    print(e.response_body)

Typical flow

from opentracy import Distiller

d = Distiller()

# 1. Discover
print([t["id"] for t in d.teacher_models()][:3])
print([s["id"] for s in d.student_models()][:3])

# 2. Estimate
est = d.estimate(student_model="llama-3.2-1b", num_prompts=500, n_samples=4)
assert est["sufficient"], "not enough credits"

# 3. Submit
job = d.create(
    name="ticket-triage v1",
    dataset_id="ds_support_tickets",
    teacher_model="openai/gpt-4o",
    student_model="llama-3.2-1b",
    num_prompts=500,
    n_samples=4,
    training_steps=100,
)

# 4. Wait + track
job = d.wait(
    job["id"],
    on_update=lambda u: print(u["status"], u.get("phase")),
)

# 5. Inspect
print(d.metrics(job["id"])[-1])           # last training step metric
artifacts = d.artifacts(job["id"])
print(artifacts["gguf_paths"])