granite-embedding-small-english-r2

ibm-granite/granite-embedding-small-english-r2

A popular open embedding model with 2.2M downloads a month. gigarouter is benchmarking it and will host it as an OpenAI-compatible API.

est. price: ~$0.008 / 1M tokens (estimated, set at launch)

downloads / mo: 2.2M

license: apache-2.0
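
To make the estimated price concrete, here is a minimal sketch of the cost arithmetic. It assumes the ~$0.008 per 1M tokens estimate holds at launch, which this page does not guarantee. The workload (10,000 documents of 512 tokens each) is hypothetical, and the function name is illustrative only.

```python
# Estimated price from this page; the final price is set at launch.
EST_PRICE_PER_MILLION_TOKENS_USD = 0.008


def estimate_cost_usd(total_tokens, price_per_million=EST_PRICE_PER_MILLION_TOKENS_USD):
    """Return the estimated cost in USD for embedding total_tokens tokens."""
    if total_tokens < 0:
        raise ValueError("total_tokens must be non-negative")
    return total_tokens / 1_000_000 * price_per_million


# Hypothetical workload: 10,000 documents averaging 512 tokens each.
tokens = 10_000 * 512
print(f"{tokens:,} tokens -> about ${estimate_cost_usd(tokens):.4f}")
```

At the estimated rate, that workload of 5.12M tokens comes to roughly four cents.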

about this model

granite-embedding-small-english-r2 is a 47M-parameter dense bi-encoder embedding model from the Granite Embeddings collection. It produces 384-dimensional embeddings and accepts inputs of up to 8192 tokens. Its training data is limited to open-source relevance-pair datasets with permissive, enterprise-friendly licenses plus IBM-collected and IBM-generated data, and the model is optimized for text similarity, retrieval, and search applications.
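
As an illustration of how bi-encoder embeddings are typically used for retrieval, the sketch below ranks documents by cosine similarity to a query vector. It does not call the model: the 4-dimensional vectors are made-up stand-ins for the real 384-dimensional embeddings, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (doc_index, score) pairs sorted from most to least similar."""
    scored = [(i, cosine_similarity(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings (assumption: real ones have 384 dims).
query = [1.0, 0.0, 0.0, 0.0]
docs = [
    [0.0, 1.0, 0.0, 0.0],  # orthogonal to the query
    [1.0, 0.0, 0.0, 0.0],  # same direction as the query
    [1.0, 1.0, 0.0, 0.0],  # partially aligned
]
for index, score in rank_documents(query, docs):
    print(index, round(score, 3))
```

In a bi-encoder setup, the query and the documents are embedded independently, so document vectors can be computed once and reused across queries.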

Key Strengths and Benchmarks

The model demonstrates strong performance across standard and enterprise information retrieval tasks. On the MTEB Retrieval (BEIR) benchmark it achieves 50.9, and on MTEB-v2 (41 tasks) it scores 61.1. For code retrieval (CoIR) it reaches 53.8, long-document search (MLDR) 39.8, and conversational multi-turn retrieval (MTRAG) 48.1. Encoding speed is approximately 199 documents per second on a single H100 GPU with a 512-token sliding window.

Model	Params (M)	Emb. Size	BEIR (15)	MTEB-v2 (41)	CoIR (10)	MLDR (En)	MTRAG (4)	Speed (docs/s)
granite-embedding-small-english-r2	47	384	50.9	61.1	53.8	39.8	48.1	199
granite-embedding-english-r2	149	768	53.1	62.8	55.3	40.7	56.7	144

Compared to other compact embedding models on a composite average of retrieval, code, long-document, table, and multi-turn benchmarks, granite-embedding-small-english-r2 scores 55.6, versus 45.39 for e5-small-v2 and 45.22 for bge-small-en-v1.5, while encoding at 199 docs/s (vs. 138 docs/s).
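
The encoding speeds in the table translate directly into indexing time. The sketch below is a rough estimate that assumes the quoted docs/s figures (single H100, 512-token sliding window) hold for your corpus, which may not be the case on other hardware or document lengths. The corpus size is hypothetical.

```python
# Speeds (docs/s) copied from the table above: single H100, 512-token sliding window.
DOCS_PER_SECOND = {
    "granite-embedding-small-english-r2": 199,
    "granite-embedding-english-r2": 144,
}


def encoding_hours(num_docs, docs_per_second):
    """Hours needed to encode num_docs at a constant docs_per_second rate."""
    if docs_per_second <= 0:
        raise ValueError("docs_per_second must be positive")
    return num_docs / docs_per_second / 3600


corpus_size = 1_000_000  # hypothetical corpus size
for name, speed in DOCS_PER_SECOND.items():
    print(f"{name}: {encoding_hours(corpus_size, speed):.2f} h")
```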

Architecture

Built on the ModernBERT architecture, the model uses GeGLU activation, rotary position embeddings, alternating attention lengths, and Flash Attention 2.0. It has 12 layers, 12 attention heads, an intermediate size of 1536, and a vocabulary of 50,368 tokens. The model supports a maximum sequence length of 8192 tokens and does not include bias terms.
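
As a sanity check, the figures above can be combined into a rough parameter count. This is a back-of-the-envelope sketch, not the model's actual implementation. It assumes the hidden size equals the 384-dimensional embedding size and that the GeGLU feed-forward block uses an input projection of twice the intermediate size (gate and value halves), and it ignores normalization weights. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderConfig:
    hidden_size: int = 384          # assumption: equal to the embedding size
    num_layers: int = 12
    num_heads: int = 12
    intermediate_size: int = 1536
    vocab_size: int = 50_368

    @property
    def head_dim(self):
        """Per-head dimension, assuming heads split the hidden size evenly."""
        return self.hidden_size // self.num_heads


def estimate_parameters(cfg):
    """Approximate weight count with no bias terms and no norm weights."""
    embeddings = cfg.vocab_size * cfg.hidden_size
    # Query, key, value, and output projections, each hidden x hidden.
    attention = 4 * cfg.hidden_size * cfg.hidden_size
    # GeGLU (assumed layout): hidden -> 2 * intermediate, then intermediate -> hidden.
    mlp = cfg.hidden_size * 2 * cfg.intermediate_size + cfg.intermediate_size * cfg.hidden_size
    return embeddings + cfg.num_layers * (attention + mlp)


cfg = EncoderConfig()
print(f"head_dim={cfg.head_dim}, params ~ {estimate_parameters(cfg) / 1e6:.1f}M")
```

Under these assumptions the estimate lands close to the stated 47M parameters, with the token embedding table accounting for a large share of the total.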

Data sources include unsupervised web title-body pairs, permissively licensed public pairs, IBM-internal technical domain data, and IBM-generated synthetic data. The popular MS-MARCO dataset is not used due to its non-commercial license. The model is intended for English text only and has been filtered for hate, abuse, and profanity.

not yet live

We're benchmarking and onboarding granite-embedding-small-english-r2 as a hosted, OpenAI-compatible API.
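
Since the endpoint is not live yet, the following is only a sketch of what a request in the OpenAI embeddings format might look like. The base URL is a placeholder, and using the Hugging Face repository name as the model ID is an assumption; neither is confirmed by this page. The code builds the request without sending it and shows how an OpenAI-style response could be parsed.

```python
import json
import urllib.request

# Hypothetical placeholder: the real base URL is not given on this page.
BASE_URL = "https://api.example.invalid/v1"
# Assumption: the hosted model ID matches the Hugging Face repository name.
MODEL_ID = "ibm-granite/granite-embedding-small-english-r2"


def build_embeddings_request(texts, api_key, base_url=BASE_URL, model=MODEL_ID):
    """Build (but do not send) a POST request in the OpenAI embeddings format."""
    body = json.dumps({"model": model, "input": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/embeddings",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_embeddings_response(payload):
    """Return the embedding vectors ordered by their 'index' field."""
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


request = build_embeddings_request(["What is a bi-encoder?"], api_key="YOUR_API_KEY")
print(request.get_method(), request.full_url)
```

Once a real endpoint exists, the request could be sent with `urllib.request.urlopen(request)` and the decoded JSON body passed to `parse_embeddings_response`.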