Two lightweight endpoints you’ll use when scripting against the engine without a client library.

GET /v1/models

Returns every model the engine knows about, with pricing and per-model routing stats.
GET /v1/models HTTP/1.1
Host: localhost:8080

Response

{
  "models": [
    {
      "model_id": "gpt-4o",
      "cost_per_1k_tokens": 0.015,
      "num_clusters": 100,
      "overall_accuracy": 0.92
    },
    {
      "model_id": "gpt-4o-mini",
      "cost_per_1k_tokens": 0.00015,
      "num_clusters": 100,
      "overall_accuracy": 0.81
    }
  ],
  "default_model": "gpt-4o-mini"
}
| Field | Type | Meaning |
| --- | --- | --- |
| models[].model_id | string | Canonical ID. Pair with a provider prefix for completions. |
| models[].cost_per_1k_tokens | float | USD per 1,000 tokens (blended input+output; see pricing tables). |
| models[].num_clusters | int | Number of clusters this model has an error profile for. |
| models[].overall_accuracy | float | Average accuracy across all profiled clusters. |
| default_model | string | Model used when "model" is "auto" and the router is unset. |

Curl

curl -s http://localhost:8080/v1/models | jq .
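
Python

If you are scripting in Python rather than shell, the same call can be made with the standard library alone. The sketch below is illustrative, not part of the engine: `fetch_models` and `cheapest_model` are hypothetical helper names, the base URL is taken from the curl example above, and falling back to `default_model` when no model meets the accuracy threshold is an assumed policy you may want to change.

```python
import json
import urllib.request

# Assumed gateway address, copied from the curl example; adjust for your deployment.
BASE_URL = "http://localhost:8080"


def fetch_models(base_url=BASE_URL, timeout=5.0):
    """GET /v1/models and return the decoded JSON body as a dict."""
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout) as resp:
        return json.load(resp)


def cheapest_model(payload, min_accuracy):
    """Return the model_id of the cheapest model whose overall_accuracy
    is at least min_accuracy.

    Hypothetical policy: if no model qualifies, fall back to default_model.
    """
    candidates = [
        m for m in payload["models"] if m["overall_accuracy"] >= min_accuracy
    ]
    if not candidates:
        return payload["default_model"]
    return min(candidates, key=lambda m: m["cost_per_1k_tokens"])["model_id"]


if __name__ == "__main__":
    # Requires a running engine on BASE_URL.
    print(cheapest_model(fetch_models(), min_accuracy=0.9))
```

Keeping the selection logic separate from the HTTP call means it can be exercised against a saved response without a running engine.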

GET /health

Used by load balancers and health checks. Both the gateway (:8080) and the management API (:8000) expose it.
curl -s http://localhost:8080/health
curl -s http://localhost:8000/health

Response (:8080 — gateway)

{
  "status": "healthy",
  "router_initialized": true,
  "num_models": 12,
  "num_clusters": 100,
  "embedder_ready": true
}

Response (:8000 — management API)

{
  "status": "healthy",
  "router_initialized": true,
  "num_models": 12,
  "num_clusters": 100
}
status is one of healthy, degraded (router loaded but embedder down, for example), or unhealthy. Treat anything other than healthy as “don’t route new traffic”.
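
The routing rule above can be sketched as a small Python probe. This is a minimal example under stated assumptions, not the engine's own tooling: `should_route` and `check_health` are hypothetical names, and because the page does not say which HTTP status code accompanies degraded or unhealthy, the sketch treats any non-2xx response, network error, or unparseable body as "don't route".

```python
import json
import urllib.request


def should_route(payload):
    """Apply the rule from the docs: only status == "healthy" accepts new traffic."""
    return isinstance(payload, dict) and payload.get("status") == "healthy"


def check_health(url, timeout=2.0):
    """Fetch a /health endpoint and return True only if it reports healthy.

    Assumption: connection failures, timeouts, non-2xx responses, and
    invalid JSON are all treated as not healthy.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return should_route(json.load(resp))
    except (OSError, ValueError):
        # OSError covers urllib.error.URLError/HTTPError and socket timeouts;
        # ValueError covers json.JSONDecodeError.
        return False


if __name__ == "__main__":
    # Ports taken from the curl examples above.
    for url in ("http://localhost:8080/health", "http://localhost:8000/health"):
        print(url, "ok" if check_health(url) else "do not route")
```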