models / vision-language · coming soon

gemma-4-31B-it-FP8-block

RedHatAI/gemma-4-31B-it-FP8-block

A popular open vision-language model, with 3.2M downloads a month. gigarouter benchmarks and hosts it as an OpenAI-compatible API.

est. price

~$1.341

/ 1k images · estimated, set at launch

API providers

downloads / mo

3.2M

license

apache-2.0

about this model

RedHatAI/gemma-4-31B-it-FP8-block is a vision-language model (VLM) that accepts text and image inputs and generates text outputs. It is a quantized version of Google's Gemma 4 31B instruction-tuned model, optimized with FP8 block quantization for both weights and activations. This reduces disk size and GPU memory requirements by approximately 50% compared to the original 16-bit model, while preserving high accuracy across benchmarks.

Key Strengths

Efficient deployment: FP8 quantization (128x128 block-wise for weights, dynamic per-group for activations) enables lower memory footprint and faster inference with minimal quality loss.
Strong benchmark recovery: Evaluated with thinking enabled across instruction following, reasoning, and coding tasks. The quantized model matches or exceeds the original on most benchmarks.
Multimodal capability: Supports image inputs alongside text, with vision tower, embedding, and output head layers kept in original precision.

Benchmark Results (0-shot, thinking enabled)

Benchmark	Original	Quantized	Recovery
IFEval (prompt-level strict)	90.70	91.25	100.6%
IFEval (inst-level strict)	93.45	94.00	100.6%
GSM8K Platinum	95.78	95.78	100.0%
MMLU-Pro	85.41	85.44	100.0%
MATH-500	89.40	88.67	99.2%
AIME 2025	65.83	68.33	103.8%
GPQA Diamond	77.44	77.95	100.7%
LiveCodeBench v6	71.43	73.52	102.9%

Best For

Multimodal reasoning tasks requiring both image and text understanding.
Production deployments where GPU memory and cost are constraints, without sacrificing accuracy.
Instruction following, mathematical reasoning, and code generation with thinking/reasoning enabled.

not yet live

We're benchmarking and onboarding gemma-4-31B-it-FP8-block as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.