skip to content
gigarouter gigarouter
models / text generation · coming soon

Qwen3.6 35B A3B NVFP4

nvidia/Qwen3.6-35B-A3B-NVFP4

published May 2026 · updated Jun 2026

A popular open text generation model, with 6.2M downloads a month. gigarouter benchmarks and hosts it as an OpenAI-compatible API.

status
coming soon
API providers
0
downloads / mo
6.2M
license
apache-2.0

about this model

nvidia/Qwen3.6-35B-A3B-NVFP4 is a text-generation model that combines a causal language model with a vision encoder, optimized via 4-bit NVFP4 quantization using NVIDIA Model Optimizer. It is based on Alibaba’s Qwen3.6-35B-A3B and is hosted as a managed API on gigarouter, ready for inference with vLLM on NVIDIA Hopper and Blackwell architectures.

The model uses a Mixture-of-Experts (MoE) transformer with hybrid attention (Gated DeltaNet and Gated Attention). It has 35 billion total parameters with 3 billion activated per token, 256 experts (8 routed + 1 shared), 40 layers, hidden dimension 2048, and a native context length of 262,144 tokens (extensible to over 1 million). It accepts text, image, and video inputs and outputs text.

NVFP4 quantization reduces disk size and GPU memory requirements by approximately 3.06× compared to BF16, while retaining nearly all accuracy. The following benchmarks were measured on an NVIDIA GB300 with vLLM:

PrecisionMMLU ProGPQA Diamondτ²-Bench TelecomSciCodeAIME 2025AA-LCRIFBenchMMMU Pro
BF1685.684.995.540.889.262.062.374.1
NVFP485.084.894.740.688.862.062.874.5

In additional evaluations of the underlying BF16 model, it achieved 73.4% on SWE-bench Verified and 67.8% on SWE-bench Multilingual, outperforming comparable models such as Qwen3.5-27B, Gemma4-31B, and Gemma4-26BA4B.

The model is released under the Apache 2.0 license. As with all large language models, it may produce biased, inaccurate, or socially unacceptable outputs; developers should evaluate it for their specific use case and implement appropriate safeguards.

not yet live

We're benchmarking and onboarding Qwen3.6 35B A3B NVFP4 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related text generation models

compare all →