Qwen3.6 35B A3B NVFP4

nvidia/Qwen3.6-35B-A3B-NVFP4

published May 2026 · updated Jun 2026

A popular open text generation model, with 6.2M downloads a month. gigarouter benchmarks and hosts it as an OpenAI-compatible API.

status

coming soon

API providers

downloads / mo

6.2M

license

apache-2.0

about this model

nvidia/Qwen3.6-35B-A3B-NVFP4 is a text-generation model that combines a causal language model with a vision encoder, optimized via 4-bit NVFP4 quantization using NVIDIA Model Optimizer. It is based on Alibaba’s Qwen3.6-35B-A3B and is hosted as a managed API on gigarouter, ready for inference with vLLM on NVIDIA Hopper and Blackwell architectures.

The model uses a Mixture-of-Experts (MoE) transformer with hybrid attention (Gated DeltaNet and Gated Attention). It has 35 billion total parameters with 3 billion activated per token, 256 experts (8 routed + 1 shared), 40 layers, hidden dimension 2048, and a native context length of 262,144 tokens (extensible to over 1 million). It accepts text, image, and video inputs and outputs text.

NVFP4 quantization reduces disk size and GPU memory requirements by approximately 3.06× compared to BF16, while retaining nearly all accuracy. The following benchmarks were measured on an NVIDIA GB300 with vLLM:

Precision	MMLU Pro	GPQA Diamond	τ²-Bench Telecom	SciCode	AIME 2025	AA-LCR	IFBench	MMMU Pro
BF16	85.6	84.9	95.5	40.8	89.2	62.0	62.3	74.1
NVFP4	85.0	84.8	94.7	40.6	88.8	62.0	62.8	74.5

In additional evaluations of the underlying BF16 model, it achieved 73.4% on SWE-bench Verified and 67.8% on SWE-bench Multilingual, outperforming comparable models such as Qwen3.5-27B, Gemma4-31B, and Gemma4-26BA4B.

The model is released under the Apache 2.0 license. As with all large language models, it may produce biased, inaccurate, or socially unacceptable outputs; developers should evaluate it for their specific use case and implement appropriate safeguards.

not yet live

We're benchmarking and onboarding Qwen3.6 35B A3B NVFP4 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related text generation models

tiny-Qwen2ForCausalLM-2.5

dolphin-2.9.1-yi-1.5-34b

4.6M dl/mo