Qwen3.6 35B A3B NVFP4
nvidia/Qwen3.6-35B-A3B-NVFP4
published May 2026 · updated Jun 2026
A popular open text generation model, with 6.2M downloads a month. gigarouter benchmarks and hosts it as an OpenAI-compatible API.
about this model
nvidia/Qwen3.6-35B-A3B-NVFP4 is a text-generation model that combines a causal language model with a vision encoder, optimized via 4-bit NVFP4 quantization using NVIDIA Model Optimizer. It is based on Alibaba’s Qwen3.6-35B-A3B and is hosted as a managed API on gigarouter, ready for inference with vLLM on NVIDIA Hopper and Blackwell architectures.
The model uses a Mixture-of-Experts (MoE) transformer with hybrid attention (Gated DeltaNet and Gated Attention). It has 35 billion total parameters with 3 billion activated per token, 256 experts (8 routed + 1 shared), 40 layers, hidden dimension 2048, and a native context length of 262,144 tokens (extensible to over 1 million). It accepts text, image, and video inputs and outputs text.
NVFP4 quantization reduces disk size and GPU memory requirements by approximately 3.06× compared to BF16, while retaining nearly all accuracy. The following benchmarks were measured on an NVIDIA GB300 with vLLM:
| Precision | MMLU Pro | GPQA Diamond | τ²-Bench Telecom | SciCode | AIME 2025 | AA-LCR | IFBench | MMMU Pro |
|---|---|---|---|---|---|---|---|---|
| BF16 | 85.6 | 84.9 | 95.5 | 40.8 | 89.2 | 62.0 | 62.3 | 74.1 |
| NVFP4 | 85.0 | 84.8 | 94.7 | 40.6 | 88.8 | 62.0 | 62.8 | 74.5 |
In additional evaluations of the underlying BF16 model, it achieved 73.4% on SWE-bench Verified and 67.8% on SWE-bench Multilingual, outperforming comparable models such as Qwen3.5-27B, Gemma4-31B, and Gemma4-26BA4B.
The model is released under the Apache 2.0 license. As with all large language models, it may produce biased, inaccurate, or socially unacceptable outputs; developers should evaluate it for their specific use case and implement appropriate safeguards.
We're benchmarking and onboarding Qwen3.6 35B A3B NVFP4 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.