hosted inference · 0-provider models

The most-downloaded models nobody serves — now an API.

Rerankers and embedders that everyone self-hosts because no provider offers them. Priced in the unit that fits — per document, per token — and OpenAI-compatible, so you swap a URL and go.

get a key + $0.50 free →

read the docs

30-second start

# rerank documents against a query — no GPU, no self-hosting
curl https://gr.tbb.rip/v1/rerank \
  -H "Authorization: Bearer $GR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"cross-encoder/ms-marco-MiniLM-L6-v2","query":"capital of France",
       "documents":["Paris is the capital of France.","Bananas are yellow."]}'

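For callers who would rather not shell out to curl, the same request can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: the URL, headers, and JSON fields are copied from the curl command above, while the function name `build_rerank_request` is hypothetical. It only builds the request and does not send it.

```python
import json
import urllib.request

# Endpoint taken from the curl example; nothing else about the API is assumed.
RERANK_URL = "https://gr.tbb.rip/v1/rerank"


def build_rerank_request(api_key, model, query, documents):
    """Build (but do not send) the same POST the curl command issues."""
    payload = {"model": model, "query": query, "documents": documents}
    return urllib.request.Request(
        RERANK_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the returned object to `urllib.request.urlopen` and decode the body with `json.load`. The response schema is not shown on this page, so the sketch stops before parsing.
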
live catalog

all models →

cross-encoder/ms-marco-MiniLM-L6-v2 · reranker · $0.008 / 1k docs

jinaai/jina-reranker-v2-base-multilingual · reranker · $0.008 / 1k docs

Qwen/Qwen3-Reranker-0.6B · reranker · $0.008 / 1k docs

Qwen/Qwen3-Embedding-0.6B · embeddings · $0.008 / 1M tokens

BAAI/bge-reranker-base · reranker · $0.008 / 1k docs

Only seller

These models have millions of downloads and zero hosted providers. We serve them so you don't run a GPU for a call you make a hundred times a day.

Right unit

Reranking is billed per document, not per token. Embeddings are billed per token. You pay in the unit that matches the task.
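To make the two units concrete, here is a rough cost estimate using the catalog prices ($0.008 per 1k documents for rerankers, $0.008 per 1M tokens for embeddings). It is a sketch under assumptions: it assumes only the documents in a rerank call are counted (the query is not billed separately) and ignores any rounding or minimums the service may apply. All function names are hypothetical.

```python
from decimal import Decimal

# Prices as listed in the catalog on this page.
RERANK_USD_PER_1K_DOCS = Decimal("0.008")
EMBED_USD_PER_1M_TOKENS = Decimal("0.008")


def rerank_cost(num_docs):
    """Estimated USD cost of reranking num_docs documents."""
    return Decimal(num_docs) / 1000 * RERANK_USD_PER_1K_DOCS


def embedding_cost(num_tokens):
    """Estimated USD cost of embedding num_tokens tokens."""
    return Decimal(num_tokens) / 1_000_000 * EMBED_USD_PER_1M_TOKENS


def docs_covered_by_credit(credit_usd):
    """How many reranked documents a given credit would cover."""
    return int(Decimal(credit_usd) / RERANK_USD_PER_1K_DOCS * 1000)
```

Under these assumptions, a hundred rerank calls a day with 50 documents each (the batch size is an illustrative figure) is 5,000 documents, so `rerank_cost(5000)` comes to $0.04 per day, and `docs_covered_by_credit("0.50")` puts the free $0.50 at 62,500 reranked documents.
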

Drop-in

OpenAI-style endpoints. Point your existing client at the base URL, keep your code.
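What "swap a URL" could look like in practice is sketched below, under assumptions. The request body follows OpenAI's embeddings format (`model` and `input`). The base URL `https://gr.tbb.rip/v1` is inferred from the rerank URL in the quick start, and the existence of an `/embeddings` path on this service is an assumption, not something this page confirms. The helper `embeddings_request` is hypothetical and only builds the request without sending it.

```python
import json
import urllib.request

OPENAI_BASE_URL = "https://api.openai.com/v1"
# Assumption: base URL inferred from the rerank endpoint in the quick start.
GR_BASE_URL = "https://gr.tbb.rip/v1"


def embeddings_request(base_url, api_key, model, inputs):
    """Build an OpenAI-format embeddings request against any base URL."""
    payload = {"model": model, "input": inputs}
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/embeddings",  # assumed path on this service
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Only the first argument changes between providers: calling `embeddings_request(GR_BASE_URL, key, "Qwen/Qwen3-Embedding-0.6B", ["hello"])` produces the same body and headers as the call against `OPENAI_BASE_URL`, just with a different host. Clients that accept a configurable base URL should need the same one-line change.
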