gigarouter
hosted inference · 0-provider models

The most-downloaded models nobody serves — now an API.

Rerankers and embedders that everyone self-hosts because no provider offers them. Priced in the unit that fits — per document, per token — and OpenAI-compatible, so you swap a URL and go.

30-second start
# rerank documents against a query — no GPU, no self-hosting
curl https://gr.tbb.rip/v1/rerank \
  -H "Authorization: Bearer $GR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"cross-encoder/ms-marco-MiniLM-L6-v2","query":"capital of France",
       "documents":["Paris is the capital of France.","Bananas are yellow."]}'
live catalog
all models →
cross-encoder/ms-marco-MiniLM-L6-v2
reranker
$0.008 / 1k docs
jinaai/jina-reranker-v2-base-multilingual
reranker
$0.008 / 1k docs
Qwen/Qwen3-Reranker-0.6B
reranker
$0.008 / 1k docs
Qwen/Qwen3-Embedding-0.6B
embeddings
$0.008 / 1M tokens
BAAI/bge-reranker-base
reranker
$0.008 / 1k docs
Only seller

These models have zero hosted providers — the ones with millions of downloads and no API. We serve them so you don't run a GPU for a call you make a hundred times a day.

Right unit

Rerank bills per document, not per awkward token. Embeddings per token. You pay for what the task actually is.

Drop-in

OpenAI-style endpoints. Point your existing client at the base URL, keep your code.