class ot.Distiller(
    base_url: str = "http://localhost:8000",
    api_key: Optional[str] = None,
    timeout: float = 60.0,
)
Thin HTTP client for the engine’s /v1/distillation/* endpoints. Requires the self-hosted stack running — see the self-host guide.

Constructor

| Name | Type | Description |
| --- | --- | --- |
| base_url | str | REST API base. Default assumes local self-host on :8000. |
| api_key | str? | Bearer token if the API is behind auth. |
| timeout | float | HTTP timeout in seconds. |
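A minimal sketch of wiring the constructor from the environment. The env var names (OPENTRACY_BASE_URL, OPENTRACY_API_KEY) are assumptions for illustration, not something the SDK reads itself:

```python
import os

# Assumed env var names -- the SDK does not read these itself.
cfg = {
    "base_url": os.environ.get("OPENTRACY_BASE_URL", "http://localhost:8000"),
    "api_key": os.environ.get("OPENTRACY_API_KEY"),  # None => no auth header
    "timeout": 120.0,  # raise for slow self-hosted engines
}
print(cfg["base_url"])

# from opentracy import Distiller
# d = Distiller(**cfg)
```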

Methods

.create(...) → dict

Create a new distillation job.
job = d.create(
    name="ticket-triage v1",            # label, any string
    dataset_id=None,                    # existing dataset to train on
    student_model="llama-3.2-1b",       # small open model to fine-tune
    teacher_model="openai/gpt-4o",      # model that produces labels
    num_prompts=1000,                   # how many prompts from the dataset
    n_samples=4,                        # BOND candidates per prompt
    training_steps=500,                 # fine-tune steps
    bond_beta=0.5,                      # BOND preference weight
    bond_gamma=0.1,                     # KL regularization
    temperature=0.8,                    # teacher sampling temperature
    export_gguf=True,                   # convert trained adapter to GGUF
    quantization_types=["q4_k_m", "q8_0"],
    description="",
    extra_config=None,                  # dict — passed through to engine
)

# Returns:
# {"id": "job_abc123", "status": "queued", "created_at": "...", ...}

.estimate(...) → dict

Dry-run cost estimation — no job is created.
est = d.estimate(
    student_model="llama-3.2-1b",
    num_prompts=200,
    n_samples=2,
)
# {"estimated_cost": 0.94, "is_sandbox": False, "tier": "local",
#  "balance": 999999, "sufficient": True}

.get(job_id) → dict

Fetch current state of a job.
job = d.get("job_abc123")
# {
#   "id": "job_abc123",
#   "status": "training",          # queued | generating | curating | training | exporting | completed | failed
#   "phase": "data_generation",
#   "progress": {"prompts_done": 120, "prompts_total": 500},
#   "metrics": {"teacher_cost_total": 0.82, ...},
#   ...
# }

.wait(job_id, timeout=3600, poll_interval=5.0, on_update=None) → dict

Block until the job reaches a terminal state (completed or failed).
def show(update):
    print(update["status"], update.get("phase"), update.get("progress"))

job = d.wait("job_abc123", on_update=show)

.stream_progress(job_id, poll_interval=5.0) → Iterable[dict]

Generator yielding status updates as they change.
for update in d.stream_progress("job_abc123"):
    print(update)

.metrics(job_id, limit=5000) → list[dict]

Per-step training metrics (loss, ot, memory) — the series you’d plot.
for m in d.metrics("job_abc123"):
    print(m["step"], m["loss"], m["ot"])
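Before plotting, you may want to smooth the noisy per-step series. A small sketch, assuming the "step"/"loss" field names from the example above; the window size is arbitrary:

```python
def smooth(values, window=3):
    """Trailing moving average over a metric series."""
    out = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        out.append(sum(values[start:i + 1]) / (i + 1 - start))
    return out

losses = [2.0, 1.0, 1.5, 0.5]
print(smooth(losses))  # [2.0, 1.5, 1.5, 1.0]

# smoothed = smooth([m["loss"] for m in d.metrics("job_abc123")])
```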

.candidates(job_id, limit=100) → list[dict]

Teacher-generated candidates (before curation), with judge scores.
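A sketch of post-hoc candidate inspection. The exact response fields ("prompt_id", "text", "judge_score") are assumptions, not a documented schema; check your engine's actual payload:

```python
def top_candidates(candidates, min_score=0.8):
    """Keep candidates whose judge score clears a threshold, best first."""
    kept = [c for c in candidates if c.get("judge_score", 0.0) >= min_score]
    return sorted(kept, key=lambda c: c["judge_score"], reverse=True)

# Field names below are assumed for illustration.
sample = [
    {"prompt_id": "p1", "text": "label: billing", "judge_score": 0.92},
    {"prompt_id": "p1", "text": "label: refund",  "judge_score": 0.41},
    {"prompt_id": "p2", "text": "label: outage",  "judge_score": 0.88},
]
print(top_candidates(sample))  # keeps the 0.92 and 0.88 entries

# best = top_candidates(d.candidates("job_abc123", limit=500))
```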

.logs(job_id) → str

Full text logs from the training subprocess.
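The log blob can be large; a small sketch for tailing just the most recent lines:

```python
def tail(log_text, n=20):
    """Return the last n lines of a log blob."""
    return "\n".join(log_text.splitlines()[-n:])

sample = "\n".join(f"step {i}" for i in range(100))
print(tail(sample, n=3))  # step 97 / step 98 / step 99

# print(tail(d.logs("job_abc123")))
```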

.artifacts(job_id) → dict

Paths to the trained artifacts on the engine side.
artifacts = d.artifacts("job_abc123")
# {
#   "adapter_path": "/app/data/distillation/job_abc123/adapter/",
#   "gguf_paths": {
#     "q4_k_m": ".../gguf/model-q4_k_m.gguf",
#     "q8_0":   ".../gguf/model-q8_0.gguf",
#   },
#   "tokenizer_path": ".../adapter/tokenizer.model",
#   "config_path": ".../train_config.json",
# }

.cancel(job_id) → dict

Cancel a running job. Safe at any phase; partial artifacts are kept.
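A sketch of a bulk-cancel pass: treat the terminal statuses documented under .get as done, and cancel everything else. The helper operates on plain job dicts, so the only assumption is the "id"/"status" shape shown above:

```python
TERMINAL = {"completed", "failed"}

def cancellable(jobs):
    """IDs of jobs still in a non-terminal state."""
    return [j["id"] for j in jobs if j["status"] not in TERMINAL]

jobs = [
    {"id": "job_a", "status": "training"},
    {"id": "job_b", "status": "completed"},
    {"id": "job_c", "status": "generating"},
]
print(cancellable(jobs))  # ['job_a', 'job_c']

# for job_id in cancellable(d.list()):
#     d.cancel(job_id)
```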

.delete(job_id) → dict

Delete the job record and all its artifacts from disk.

.list(status=None, limit=50, offset=0) → list[dict]

List jobs, optionally filtered by status.
running = d.list(status="training")
recent  = d.list(limit=5)

.teacher_models() → list[dict]

Available teachers (populated from the engine’s model registry).
for t in d.teacher_models():
    print(t["id"], t["provider"], t["available"])

.student_models() → list[dict]

Available students (populated from the engine’s HF whitelist + local models). Typical entries include llama-3.2-1b, llama-3.2-3b, qwen3-0.6b, qwen3-1.7b, qwen3-4b, mistral-small, phi-3.5-mini.

TrainingClient (lower-level)

Distiller wraps TrainingClient — the raw HTTP layer. Use TrainingClient directly only if you need fine control over retries, custom endpoints, or headers.
from opentracy import TrainingClient

tc = TrainingClient(base_url="http://localhost:8000")
response = tc.post("/v1/distillation/jobs", json={...})

Errors

All network / server errors raise DistillerError with the HTTP status and response body attached. Wrap .create and .wait in try/except if you need graceful degradation.
from opentracy import Distiller, DistillerError

try:
    job = d.create(...)
    job = d.wait(job["id"])
except DistillerError as e:
    print(f"distillation failed: {e.status} {e.message}")
    print(e.response_body)

Typical flow

from opentracy import Distiller

d = Distiller()

# 1. Discover
print([t["id"] for t in d.teacher_models()][:3])
print([s["id"] for s in d.student_models()][:3])

# 2. Estimate
est = d.estimate(student_model="llama-3.2-1b", num_prompts=500, n_samples=4)
assert est["sufficient"], "not enough credits"

# 3. Submit
job = d.create(
    name="ticket-triage v1",
    dataset_id="ds_support_tickets",
    teacher_model="openai/gpt-4o",
    student_model="llama-3.2-1b",
    num_prompts=500,
    n_samples=4,
    training_steps=100,
)

# 4. Wait + track
job = d.wait(
    job["id"],
    on_update=lambda u: print(u["status"], u.get("phase")),
)

# 5. Inspect
print(d.metrics(job["id"])[-1])           # last training step metric
artifacts = d.artifacts(job["id"])
print(artifacts["gguf_paths"])