models / vision-language · coming soon
gemma-4-31B-it-FP8-block
RedHatAI/gemma-4-31B-it-FP8-block
A popular open vision-language model, with 3.2M downloads a month. gigarouter benchmarks and hosts it as an OpenAI-compatible API.
est. price
~$1.341
/ 1k images · estimated, set at launch
API providers
0
downloads / mo
3.2M
license
apache-2.0
about this model
RedHatAI/gemma-4-31B-it-FP8-block is a vision-language model (VLM) that accepts text and image inputs and generates text outputs. It is a quantized version of Google's Gemma 4 31B instruction-tuned model, optimized with FP8 block quantization for both weights and activations. This reduces disk size and GPU memory requirements by approximately 50% compared to the original 16-bit model, while preserving high accuracy across benchmarks.
Key Strengths
- Efficient deployment: FP8 quantization (128x128 block-wise for weights, dynamic per-group for activations) enables lower memory footprint and faster inference with minimal quality loss.
- Strong benchmark recovery: Evaluated with thinking enabled across instruction following, reasoning, and coding tasks. The quantized model matches or exceeds the original on most benchmarks.
- Multimodal capability: Supports image inputs alongside text, with vision tower, embedding, and output head layers kept in original precision.
Benchmark Results (0-shot, thinking enabled)
| Benchmark | Original | Quantized | Recovery |
|---|---|---|---|
| IFEval (prompt-level strict) | 90.70 | 91.25 | 100.6% |
| IFEval (inst-level strict) | 93.45 | 94.00 | 100.6% |
| GSM8K Platinum | 95.78 | 95.78 | 100.0% |
| MMLU-Pro | 85.41 | 85.44 | 100.0% |
| MATH-500 | 89.40 | 88.67 | 99.2% |
| AIME 2025 | 65.83 | 68.33 | 103.8% |
| GPQA Diamond | 77.44 | 77.95 | 100.7% |
| LiveCodeBench v6 | 71.43 | 73.52 | 102.9% |
Best For
- Multimodal reasoning tasks requiring both image and text understanding.
- Production deployments where GPU memory and cost are constraints, without sacrificing accuracy.
- Instruction following, mathematical reasoning, and code generation with thinking/reasoning enabled.
not yet live
We're benchmarking and onboarding gemma-4-31B-it-FP8-block as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.