skip to content
gigarouter gigarouter
models / vision-language · coming soon

gemma-4-31B-it-FP8-block

RedHatAI/gemma-4-31B-it-FP8-block

A popular open vision-language model, with 3.2M downloads a month. gigarouter benchmarks and hosts it as an OpenAI-compatible API.

est. price
~$1.341
/ 1k images · estimated, set at launch
API providers
0
downloads / mo
3.2M
license
apache-2.0

about this model

RedHatAI/gemma-4-31B-it-FP8-block is a vision-language model (VLM) that accepts text and image inputs and generates text outputs. It is a quantized version of Google's Gemma 4 31B instruction-tuned model, optimized with FP8 block quantization for both weights and activations. This reduces disk size and GPU memory requirements by approximately 50% compared to the original 16-bit model, while preserving high accuracy across benchmarks.

Key Strengths

  • Efficient deployment: FP8 quantization (128x128 block-wise for weights, dynamic per-group for activations) enables lower memory footprint and faster inference with minimal quality loss.
  • Strong benchmark recovery: Evaluated with thinking enabled across instruction following, reasoning, and coding tasks. The quantized model matches or exceeds the original on most benchmarks.
  • Multimodal capability: Supports image inputs alongside text, with vision tower, embedding, and output head layers kept in original precision.

Benchmark Results (0-shot, thinking enabled)

Benchmark Original Quantized Recovery
IFEval (prompt-level strict)90.7091.25100.6%
IFEval (inst-level strict)93.4594.00100.6%
GSM8K Platinum95.7895.78100.0%
MMLU-Pro85.4185.44100.0%
MATH-50089.4088.6799.2%
AIME 202565.8368.33103.8%
GPQA Diamond77.4477.95100.7%
LiveCodeBench v671.4373.52102.9%

Best For

  • Multimodal reasoning tasks requiring both image and text understanding.
  • Production deployments where GPU memory and cost are constraints, without sacrificing accuracy.
  • Instruction following, mathematical reasoning, and code generation with thinking/reasoning enabled.
not yet live

We're benchmarking and onboarding gemma-4-31B-it-FP8-block as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.