Question 1

Why not just self-host? It's one pip install.

Accepted Answer

For a model you call a few hundred or a few thousand times a day, self-hosting means renting and babysitting a GPU that sits idle most of the time. We keep the model warm across many customers, so you pay per call and skip the ops. Above roughly 10-15M calls a month, self-hosting does get cheaper - at that scale, run it yourself.

Question 2

Which models do you serve?

Accepted Answer

Specialized models with no good hosted API - reranking, embeddings, grounding, detection, OCR, depth, speech, and more. For each task we benchmark the field and serve the best model, not whatever's easiest to host. See the live catalog and the roadmap; tell us what you need at support@gigarouter.ai and we'll benchmark and onboard it.

Question 3

How do you pick the best model for a task?

Accepted Answer

We benchmark the credible candidates for a task on quality and throughput, then host the ones that win - not just whatever's easiest to serve. As better models ship, the catalog moves with them.

Question 4

What's the cold start about?

Accepted Answer

Rarely-used models sleep to keep costs (and prices) down. The first call after a nap returns a 503 while the model warms up, which takes about 90 seconds. Retry until the call succeeds, or send the header Prefer: wait=30 to hold the request. Popular models stay warm, so most calls are instant.
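
A minimal sketch of the retry path, assuming you retry the 503 on a fixed delay until a deadline a little past the roughly 90-second warm-up. The names call_with_warmup and send, and the delay values, are hypothetical and chosen for illustration; only the 503 status and the Prefer: wait=30 header come from the answer above.

```python
import time
from typing import Callable, Tuple

# Header named in the answer above; add it to your request headers
# if you would rather have the request held than retry yourself.
PREFER_WAIT = {"Prefer": "wait=30"}


def call_with_warmup(
    send: Callable[[], Tuple[int, str]],
    max_wait_s: float = 120.0,    # assumption: a little more than the ~90 s warm-up
    retry_delay_s: float = 10.0,  # assumption: fixed pause between attempts
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[int, str]:
    """Call send() until it stops returning 503 or the deadline passes.

    send is a hypothetical hook: any zero-argument function that performs
    one HTTP request and returns (status_code, body), so the sketch works
    with whichever HTTP client you already use.
    """
    deadline = clock() + max_wait_s
    while True:
        status, body = send()
        if status != 503:
            return status, body
        if clock() + retry_delay_s > deadline:
            # Still cold after the deadline; hand the 503 back to the caller.
            return status, body
        sleep(retry_delay_s)
```

Wrap your existing request in a small function, pass it as send, and treat a returned 503 as "still warming after the deadline" rather than as a hard failure.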

Question 5

How does billing work?

Accepted Answer

Prepaid. You get free credit to start, then top up by card. Each task is billed in the unit that fits it, rather than forcing everything into a token count: rerank per document, embeddings per token. Failed requests cost nothing. Every call shows up on your dashboard.

Question 6

Is it really OpenAI-compatible?

Accepted Answer

Yes for embeddings - point the OpenAI client at https://gigarouter.ai/v1 and change nothing else. Rerank uses the common /v1/rerank shape (query + documents, returns scored results).
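
As a rough sketch of the two request shapes, the snippet below builds (but does not send) an embeddings request and a rerank request using only the standard library. The model names, the bearer-token header, and the helper names are assumptions for illustration. The /embeddings path and the model/input fields follow the OpenAI embeddings API. The rerank body uses the query and documents fields mentioned above, plus an assumed model field.

```python
import json
import urllib.request

BASE_URL = "https://gigarouter.ai/v1"  # from the answer above


def _post(path: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build a JSON POST request; pass it to urllib.request.urlopen to send."""
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumption: OpenAI-style bearer authentication.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def embeddings_request(model: str, texts: list[str], api_key: str) -> urllib.request.Request:
    # Same body the OpenAI embeddings endpoint takes: model + input.
    return _post("/embeddings", {"model": model, "input": texts}, api_key)


def rerank_request(model: str, query: str, documents: list[str], api_key: str) -> urllib.request.Request:
    # query + documents come from the answer above; the "model" field is assumed.
    return _post("/rerank", {"model": model, "query": query, "documents": documents}, api_key)
```

With the official openai Python package, the embeddings case reduces to constructing the client as OpenAI(base_url="https://gigarouter.ai/v1", api_key=...) and calling client.embeddings.create as usual. Take the exact model identifiers from the live catalog.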

Question 7

Does credit expire?

Accepted Answer

No. Prepaid credit stays on your key until you use it.

FAQ