DeepSeek V4 Flash DSpark

deepseek-ai/DeepSeek-V4-Flash-DSpark

published Jun 2026 · updated Jun 2026

DeepSeek V4 Flash DSpark is a text-generation model that uses a Mixture-of-Experts architecture with 284B total parameters (13B activated) and supports a context length of one million tokens, enhanced with speculative decoding for faster inference.

status

coming soon

API providers

downloads / mo

32.7K

license

mit

specs

Task	Text Generation
Architecture	Mixture-of-Experts (MoE) with Hybrid Attention
Parameters	284B total, 13B activated
Context Length	1,000,000 tokens
License	MIT

about this model

DeepSeek-V4-Flash-DSpark is a text-generation model, specifically a 284B-parameter Mixture-of-Experts (MoE) language model with 13B activated parameters that supports a context length of one million tokens and includes an attached DSpark speculative decoding module for improved inference throughput. The architecture combines Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) in a hybrid mechanism, together with Manifold-Constrained Hyper-Connections (mHC) and the Muon optimizer. In the 1M-token context setting, this model requires only 27% of single-token inference FLOPs and 10% of KV cache compared with DeepSeek-V3.2. It was pre-trained on more than 32T tokens. DeepSeek-V4 architecture diagram

On base model benchmarks, the model achieves MMLU 88.7 (5-shot), HumanEval 69.5 (0-shot Pass@1), GSM8K 90.8 (8-shot), MATH 57.4 (4-shot), and LongBench-V2 44.7 (1-shot). The instruct model supports three reasoning effort modes: Non-Think, Think High, and Think Max. In Max mode, it achieves LiveCodeBench 91.6 (Pass@1), GPQA Diamond 88.1 (Pass@1), HMMT 2026 Feb 94.8 (Pass@1), and SWE Verified 79.0% resolved. In Non-Think mode, it scores MMLU-Pro 83.0 and GPQA Diamond 71.2. The model supports tool calling across all modes.

best for

·Processing long documents up to one million tokens
·Complex reasoning and problem-solving with think modes
·Agentic workflows and tool calling

FAQ

What is DeepSeek V4 Flash DSpark?

It is a preview of the DeepSeek-V4 series with a Mixture-of-Experts architecture, 284B total parameters, 13B activated, supporting one-million-token contexts and speculative decoding for faster inference.

How does DSpark differ from the base DeepSeek V4 Flash model?

DSpark is the same checkpoint with an additional speculative decoding module attached to improve inference speed, not a new model.

What license is this model released under?

It is released under the MIT license.

How can I call this model via the API?

Use the gigarouter OpenAI-compatible endpoint with an API key; the model supports standard text generation and tool calling.

What input format does the model expect?

The model uses OpenAI-compatible chat messages; refer to the encoding folder in the model repository for encoding and decoding details.

not yet live

We're benchmarking and onboarding DeepSeek V4 Flash DSpark as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related text generation models

tiny-Qwen2ForCausalLM-2.5

9.2M dl/mo

deepseek-v4-gguf

6.4M dl/mo

Qwen3.6-35B-A3B-NVFP4

6.2M dl/mo

gemma-3-270m

5.1M dl/mo