skip to content
gigarouter gigarouter
models / text generation · coming soon

DeepSeek V4 Flash

bartowski/DeepSeek-V4-Flash-GGUF

published Jun 2026 · updated Jun 2026

DeepSeek V4 Flash is a text-generation model optimized for fast inference, with 284B total parameters and 13B activated, supporting a 1M token context.

status
coming soon
API providers
0
downloads / mo
234.8K
license
mit

specs

TaskText Generation
ArchitectureHybrid Attention (CSA + HCA), FP4 + FP8 mixed precision
Parameters284B total, 13B activated
Context Length1,000,000 tokens
LicenseMIT

about this model

DeepSeek-V4-Flash is a text-generation model that combines a 284B total parameter Mixture-of-Experts (MoE) architecture with 13B activated parameters per token, supporting a 1 million token context window and using FP4 + FP8 mixed precision. It is hosted on gigarouter as an OpenAI-compatible API, eliminating the need for local installation or quantization.

Architecture and Key Strengths

The model employs hybrid attention combining Cross-Layer Attention (CSA) and Hybrid-Chunk Attention (HCA), Manifold-Constrained Hyper-Connections, and the Muon optimizer, as detailed in the technical report (arXiv:2606.19348). Its MoE design activates only 13B of its 284B total parameters per forward pass, enabling efficient inference while maintaining high capacity.

Benchmark Performance

DeepSeek-V4-Flash achieves the following scores on standard evaluations:

Benchmark Score Rank
SWE-bench Verified 79.0% resolved 5 (among <500B models)
MMLU-Pro 86.4% 9
GPQA Diamond 88.1% 10
Terminal-Bench 2.0 56.9% 8
SkillsBench v1.1 44.7% 5 (among <500B models)

Inference and Availability

The model is provided in MXFP4 GGUF format (156 GB, split). It is released under the MIT license. No prompt format is specified in the original model card. Through gigarouter's hosted API, developers can access this model without managing local infrastructure or quantization.

best for

FAQ

What prompt format should I use for this model?

No prompt format is specified; standard conversational or instruction-based formats may work, but you should experiment or refer to the original DeepSeek model documentation.

How large is the MXFP4 GGUF file?

The single-file MXFP4 quant is 156.00 GB.

What is the license for this model?

The original DeepSeek-V4-Flash model is released under the MIT license.

How can I call this model via API on gigarouter?

Use the OpenAI-compatible endpoint on gigarouter with your API key, specifying the model ID bartowski/DeepSeek-V4-Flash-GGUF.

What inference performance does this model offer?

Hosted providers have shown throughput ranging from 23.69 tok/s (DeepInfra) to 109.84 tok/s (Fireworks AI). Pricing varies from $0.18 to $0.28 per million output tokens.

not yet live

We're benchmarking and onboarding DeepSeek V4 Flash as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related text generation models

compare all →