FAQ
For a model you call a few hundred or a few thousand times a day, self-hosting means renting and babysitting a GPU that sits idle most of the time. We keep the model warm across many customers, so you pay per call and skip the ops. Above roughly 10-15M calls a month, self-hosting does get cheaper - at that scale, run it yourself.
Specialized models with no good hosted API - reranking, embeddings, grounding, detection, OCR, depth, speech, and more. For each task we benchmark the field and serve the best model, not whatever's easiest to host. See the live catalog and the roadmap; tell us what you need at [email protected] and we'll benchmark and onboard it.
We benchmark the credible candidates for a task on quality and throughput, then host the ones that win - not just whatever's easiest to serve. As better models ship, the catalog moves with them.
Rarely-used models sleep to keep costs (and prices) down. The first call after a nap returns a 503 while the model warms - about 90 seconds. Retry, or send Prefer: wait=30 to hold the request. Popular models stay warm, so most calls are instant.
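One way to handle the cold-start 503 on the client side is sketched below, using only the Python standard library. The helper names, the Bearer-token Authorization header, and the retry count and delay are assumptions for illustration; only the 503 status, the roughly 90 second warm-up, and the Prefer: wait=30 header come from the answer above.

```python
import time
import urllib.error
import urllib.request


def build_request(url, body, api_key, wait_seconds=None):
    """Build a POST request; optionally ask the server to hold it while the model warms."""
    headers = {
        "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        "Content-Type": "application/json",
    }
    if wait_seconds is not None:
        headers["Prefer"] = f"wait={wait_seconds}"
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def call_with_warmup(request, send=urllib.request.urlopen,
                     retries=5, delay=20.0, sleep=time.sleep):
    """Send `request`, retrying only while the service answers 503."""
    for attempt in range(retries + 1):
        try:
            return send(request)
        except urllib.error.HTTPError as err:
            # A 503 is treated as "model still warming"; any other status is a real error.
            if err.code != 503 or attempt == retries:
                raise
            sleep(delay)
```

With these illustrative defaults, five retries spaced 20 seconds apart span about 100 seconds, which is enough to outlast a 90 second warm-up. Tune both values to your own latency budget.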
Prepaid. You get free credit to start, then top up by card. Each task is billed in the unit that fits it: rerank per document, embeddings per token, rather than forcing every task into a token count. Failed requests cost nothing. Every call shows up on your dashboard.
Yes for embeddings - they are OpenAI-compatible, so point the OpenAI client's base URL at https://gigarouter.ai/v1 and change nothing else. Rerank uses the common /v1/rerank shape: send a query plus a list of documents, get back scored results.
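A minimal sketch of both calls follows. The base URL and the query + documents request shape are taken from the answer above; the model argument, the exact JSON field names, and the helper names are assumptions, so check the catalog and API reference for the real values. The embeddings helper needs the openai package, imported lazily so the rest of the file runs without it.

```python
import json

BASE_URL = "https://gigarouter.ai/v1"


def embed(texts, model, api_key):
    """Embed `texts` through the OpenAI client pointed at a different base URL."""
    from openai import OpenAI  # pip install openai

    client = OpenAI(base_url=BASE_URL, api_key=api_key)
    response = client.embeddings.create(model=model, input=texts)
    return [item.embedding for item in response.data]


def rerank_payload(query, documents, model):
    """Build a JSON body in the common rerank shape (field names assumed)."""
    body = {"model": model, "query": query, "documents": documents}
    return json.dumps(body).encode("utf-8")


# The rerank body would be POSTed to this URL.
RERANK_URL = BASE_URL + "/rerank"
```

Existing OpenAI embedding code should only need the base_url and api_key arguments changed, plus a model name from the catalog.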
No. Prepaid credit stays on your key until you use it.