FAQ

Why not just self-host? It's one pip install.

For a model you call a few hundred or a few thousand times a day, self-hosting means renting and babysitting a GPU that sits idle most of the time. We keep the model warm across many customers, so you pay per call and skip the ops. Above roughly 10-15M calls a month, self-hosting does get cheaper - at that scale, run it yourself.

Which models do you serve?

Specialized models with no good hosted API - reranking, embeddings, grounding, detection, OCR, depth, speech, and more. For each task we benchmark the field and serve the best model, not whatever's easiest to host. See the live catalog and the roadmap; tell us what you need at [email protected] and we'll benchmark and onboard it.

How do you pick the best model for a task?

We benchmark the credible candidates for a task on quality and throughput, then host the ones that win - not just whatever's easiest to serve. As better models ship, the catalog moves with them.

What's the cold start about?

Rarely used models sleep to keep costs (and prices) down. The first call after a nap returns a 503 while the model warms up, which takes about 90 seconds. Retry, or send the header Prefer: wait=30 to hold the request. Popular models stay warm, so most calls are instant.
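A minimal sketch of the retry path, using only the Python standard library. The 503 status and the Prefer: wait=30 header come from the answer above; the helper names, the Bearer-token header, the attempt count, and the pause between attempts are assumptions for illustration, not documented behavior.

```python
import time
import urllib.error
import urllib.request


def call_with_cold_start_retry(send, max_attempts=6, backoff_seconds=15.0,
                               sleep=time.sleep):
    """Call send() until it returns a non-503 status or attempts run out.

    send is any zero-argument callable returning (status_code, body).
    The defaults are illustrative guesses meant to span a ~90 s warm-up.
    """
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status != 503:
            return status, body
        if attempt < max_attempts:
            # Model is still warming: pause before the next try.
            sleep(backoff_seconds)
    # Still cold after the last attempt: hand the 503 back to the caller.
    return status, body


def make_sender(url, payload_bytes, api_key, wait_seconds=30):
    """Build a send() callable that POSTs once with Prefer: wait=<seconds>."""
    def send():
        request = urllib.request.Request(
            url,
            data=payload_bytes,
            method="POST",
            headers={
                # Assumption: the API key is sent as a Bearer token.
                "Authorization": f"Bearer {api_key}",
                "Content-Type": "application/json",
                # From the FAQ: ask the server to hold the request.
                "Prefer": f"wait={wait_seconds}",
            },
        )
        try:
            # Client timeout is longer than the hold so the server answers first.
            with urllib.request.urlopen(request, timeout=wait_seconds + 30) as resp:
                return resp.status, resp.read()
        except urllib.error.HTTPError as err:
            # urllib raises on 4xx/5xx; convert to a (status, body) pair.
            return err.code, err.read()
    return send
```

Usage would look like `call_with_cold_start_retry(make_sender(url, body, key))`. Keeping the retry loop separate from the HTTP call lets the loop be tested without touching the network.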

How does billing work?

Prepaid. You get free credit to start, then top up by card. Each task is billed in the unit that fits it: rerank per document, embeddings per token, rather than forcing every task into a token count. Failed requests cost nothing. Every call shows up on your dashboard.

Is it really OpenAI-compatible?

Yes for embeddings - point the OpenAI client at https://gigarouter.ai/v1 and change nothing else. Rerank uses the common /v1/rerank shape (query + documents, returns scored results).
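As a rough illustration of the two request shapes, the sketch below builds (but does not send) one embeddings request and one rerank request with the Python standard library. The base URL and the /v1/rerank path with query + documents come from the answer above; the /embeddings body follows the OpenAI embeddings shape. The Bearer-token header, the `model` field on rerank, and the model names in the usage note are assumptions.

```python
import json
import urllib.request

BASE_URL = "https://gigarouter.ai/v1"  # from the FAQ


def build_request(path, payload, api_key):
    """Build a JSON POST request against BASE_URL without sending it."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            # Assumption: the API key is sent as a Bearer token.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def embeddings_request(model, texts, api_key):
    """OpenAI-style embeddings body: a model name plus a list of inputs."""
    return build_request("/embeddings", {"model": model, "input": list(texts)}, api_key)


def rerank_request(model, query, documents, api_key):
    """Rerank body: one query plus the documents to score against it.

    The "model" field is an assumption; the FAQ names only query + documents.
    """
    payload = {"model": model, "query": query, "documents": list(documents)}
    return build_request("/rerank", payload, api_key)
```

For example, `rerank_request("some-reranker", "what is a cold start?", ["doc a", "doc b"], key)` yields a request that `urllib.request.urlopen` can send; "some-reranker" is a placeholder, not a catalog entry. With the official OpenAI Python client, the equivalent for embeddings would be constructing `OpenAI(base_url="https://gigarouter.ai/v1", api_key=key)` and calling `client.embeddings.create(...)` as usual.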

Does credit expire?

No. Prepaid credit stays on your key until you use it.