deepseek-ai/DeepSeek-V4-Pro · Hugging Face
We believe that this document is fully human-written.
Hacker News Article AI Analysis
Content Label
Human
AI Generated
0%
Human
100%
Window 1 - Human
DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence
Technical Report👁️
Introduction
We present a preview version of DeepSeek-V4 series, including two strong Mixture-of-Experts (MoE) language models — DeepSeek-V4-Pro with 1.6T parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated) — both supporting a context length of one million tokens.DeepSeek-V4 series incorporate several key upgrades in architecture and optimization:
Hybrid Attention Architecture: We design a hybrid attention mechanism combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) to dramatically improve long-context efficiency. In the 1M-token context setting, DeepSeek-V4-Pro requires only 27% of single-token inference FLOPs and 10% of KV cache compared with DeepSeek-V3.2.
Manifold-Constrained Hyper-Connections (mHC): We incorporate mHC to strengthen conventional residual connections, enhancing stability of signal propagation across layers while preserving model expressivity.
Muon Optimizer: We employ the Muon optimizer for faster convergence and greater training stability.
We pre-train both models on more than 32T diverse and high-quality tokens, followed by a comprehensive post-training pipeline. The post-training features a two-stage paradigm: independent cultivation of domain-specific experts (through SFT and RL with GRPO), followed by unified model consolidation via on-policy distillation, integrating distinct proficiencies across diverse domains into a single model.DeepSeek-V4-Pro-Max, the maximum reasoning effort mode of DeepSeek-V4-Pro, significantly advances the knowledge capabilities of open-source models, firmly establishing itself as the best open-source model available today. It achieves top-tier performance in coding benchmarks and significantly bridges the gap with leading closed-source models on reasoning and agentic tasks.
Window 2 - Human
Meanwhile, DeepSeek-V4-Flash-Max achieves comparable reasoning performance to the Pro version when given a larger thinking budget, though its smaller parameter scale naturally places it slightly behind on pure knowledge tasks and the most complex agentic workflows.
Model Downloads
Model
#Total Params
#Activated Params
Context Length
Precision
Download
DeepSeek-V4-Flash-Base
284B
13B
1M
FP8 Mixed
HuggingFace | ModelScope
DeepSeek-V4-Flash
284B
13B
1M
FP4 + FP8 Mixed*
HuggingFace | ModelScope
DeepSeek-V4-Pro-Base
1.6T
49B
1M
FP8 Mixed
HuggingFace | ModelScope
DeepSeek-V4-Pro
1.6T
49B
1M
FP4 + FP8 Mixed*
HuggingFace | ModelScope
*FP4 + FP8 Mixed: MoE expert parameters use FP4 precision; most other parameters use FP8.
Evaluation Results
Base Model
Benchmark (Metric)
# Shots
DeepSeek-V3.2-Base
DeepSeek-V4-Flash-Base
DeepSeek-V4-Pro-Base
Architecture
-
MoE
MoE
MoE
# Activated Params
-
37B
13B
49B
# Total Params
-
671B
284B
1.6T
World Knowledge
AGIEval (EM)
0-shot
80.1
82.6
83.1
MMLU (EM)
5-shot
87.8
88.7
90.1
MMLU-Redux (EM)
5-shot
87.5
89.4
90.8
MMLU-Pro (EM)
5-shot
65.5
68.3
73.5
MMMLU (EM)
Window 3 - Human
5-shot
87.9
88.8
90.3
C-Eval (EM)
5-shot
90.4
92.1
93.1
CMMLU (EM)
5-shot
88.9
90.4
90.8
MultiLoKo (EM)
5-shot
38.7
42.2
51.1
Simple-QA verified (EM)
25-shot
28.3
30.1
55.2
SuperGPQA (EM)
5-shot
45.0
46.5
53.9
FACTS Parametric (EM)
25-shot
27.1
33.9
62.6
TriviaQA (EM)
5-shot
83.3
82.8
85.6
Language & Reasoning
BBH (EM)
3-shot
87.6
86.9
87.5
DROP (F1)
1-shot
88.2
88.6
88.7
HellaSwag (EM)
0-shot
86.4
85.7
88.0
WinoGrande (EM)
0-shot
78.9
79.5
81.5
CLUEWSC (EM)
5-shot
83.5
82.2
85.2
Code & Math
BigCodeBench (Pass@1)
3-shot
63.9
56.8
59.2
HumanEval (Pass@1)
0-shot
62.8
69.5
76.8
GSM8K (EM)
8-shot
91.1
90.8
92.6
MATH (EM)
4-shot
60.5
57.4
64.5
MGSM (EM)
8-shot
81.3
85.7
Window 4 - Human
84.4
CMath (EM)
3-shot
92.6
93.6
90.9
Long Context
LongBench-V2 (EM)
1-shot
40.2
44.7
51.5
Instruct Model
DeepSeek-V4-Pro and DeepSeek-V4-Flash both support three reasoning effort modes:
Reasoning Mode
Characteristics
Typical Use Cases
Response Format
Non-think
Fast, intuitive responses
Routine daily tasks, low-risk decisions
</think> summary
Think High
Conscious logical analysis, slower but more accurate
Complex problem-solving, planning
<think> thinking </think> summary
Think Max
Push reasoning to its fullest extent
Exploring the boundary of model reasoning capability
Special system prompt + <think> thinking </think> summary
DeepSeek-V4-Pro-Max vs Frontier Models
Benchmark (Metric)
Opus-4.6 Max
GPT-5.4 xHigh
Gemini-3.1-Pro High
K2.6 Thinking
GLM-5.1 Thinking
DS-V4-Pro Max
Knowledge & Reasoning
MMLU-Pro (EM)
89.1
87.5
91.0
87.1
86.0
87.5
SimpleQA-Verified (Pass@1)
46.2
45.3
75.6
36.9
38.1
57.9
Chinese-SimpleQA (Pass@1)
76.4
76.8
85.9
75.9
75.0
84.4
GPQA Diamond (Pass@1)
91.3
93.0
94.3
90.5
86.2
90.1
HLE (Pass@1)
40.0
39.8
44.4
36.4
34.7
37.7
LiveCodeBench
Window 5 - Human
(Pass@1)
88.8
-
91.7
89.6
-
93.5
Codeforces (Rating)
-
3168
3052
-
-
3206
HMMT 2026 Feb (Pass@1)
96.2
97.7
94.7
92.7
89.4
95.2
IMOAnswerBench (Pass@1)
75.3
91.4
81.0
86.0
83.8
89.8
Apex (Pass@1)
34.5
54.1
60.9
24.0
11.5
38.3
Apex Shortlist (Pass@1)
85.9
78.1
89.1
75.5
72.4
90.2
Long Context
MRCR 1M (MMR)
92.9
-
76.3
-
-
83.5
CorpusQA 1M (ACC)
71.7
-
53.8
-
-
62.0
Agentic
Terminal Bench 2.0 (Acc)
65.4
75.1
68.5
66.7
63.5
67.9
SWE Verified (Resolved)
80.8
-
80.6
80.2
-
80.6
SWE Pro (Resolved)
57.3
57.7
54.2
58.6
58.4
55.4
SWE Multilingual (Resolved)
77.5
-
-
76.7
73.3
76.2
BrowseComp (Pass@1)
83.7
82.7
85.9
83.2
79.3
83.4
HLE w/ tools (Pass@1)
53.1
52.0
51.6
54.0
50.4
48.2
Window 6 - Human
GDPval-AA (Elo)
1619
1674
1314
1482
1535
1554
MCPAtlas Public (Pass@1)
73.8
67.2
69.2
66.6
71.8
73.6
Toolathlon (Pass@1)
47.2
54.6
48.8
50.0
40.7
51.8
Comparison across Modes
Benchmark (Metric)
V4-Flash Non-Think
V4-Flash High
V4-Flash Max
V4-Pro Non-Think
V4-Pro High
V4-Pro Max
Knowledge & Reasoning
MMLU-Pro (EM)
83.0
86.4
86.2
82.9
87.1
87.5
SimpleQA-Verified (Pass@1)
23.1
28.9
34.1
45.0
46.2
57.9
Chinese-SimpleQA (Pass@1)
71.5
73.2
78.9
75.8
77.7
84.4
GPQA Diamond (Pass@1)
71.2
87.4
88.1
72.9
89.1
90.1
HLE (Pass@1)
8.1
29.4
34.8
7.7
34.5
37.7
LiveCodeBench (Pass@1)
55.2
88.4
91.6
56.8
89.8
93.5
Codeforces (Rating)
-
2816
3052
-
2919
3206
HMMT 2026 Feb (Pass@1)
Window 7 - Human
40.8
91.9
94.8
31.7
94.0
95.2
IMOAnswerBench (Pass@1)
41.9
85.1
88.4
35.3
88.0
89.8
Apex (Pass@1)
1.0
19.1
33.0
0.4
27.4
38.3
Apex Shortlist (Pass@1)
9.3
72.1
85.7
9.2
85.5
90.2
Long Context
MRCR 1M (MMR)
37.5
76.9
78.7
44.7
83.3
83.5
CorpusQA 1M (ACC)
15.5
59.3
60.5
35.6
56.5
62.0
Agentic
Terminal Bench 2.0 (Acc)
49.1
56.6
56.9
59.1
63.3
67.9
SWE Verified (Resolved)
73.7
78.6
79.0
73.6
79.4
80.6
SWE Pro (Resolved)
49.1
52.3
52.6
52.1
54.4
55.4
SWE Multilingual (Resolved)
69.7
70.2
73.3
69.8
74.1
76.2
BrowseComp (Pass@1)
-
53.5
73.2
Window 8 - Human
-
80.4
83.4
HLE w/ tools (Pass@1)
-
40.3
45.1
-
44.7
48.2
MCPAtlas (Pass@1)
64.0
67.4
69.0
69.4
74.2
73.6
GDPval-AA (Elo)
-
-
1395
-
-
1554
Toolathlon (Pass@1)
40.7
43.5
47.8
46.3
49.0
51.8
Chat Template
This release does not include a Jinja-format chat template. Instead, we provide a dedicated encoding folder with Python scripts and test cases demonstrating how to encode messages in OpenAI-compatible format into input strings for the model, and how to parse the model's text output. Please refer to the encoding folder for full documentation.A brief example:from encoding_dsv4 import encode_messages, parse_message_from_completion_text
messages = [
{"role": "user", "content": "hello"},
{"role": "assistant", "content": "Hello! I am DeepSeek.", "reasoning_content": "thinking..."},
{"role": "user", "content": "1+1=?"}
]
# messages -> string
prompt = encode_messages(messages, thinking_mode="thinking")
# string -> tokens
import transformers
tokenizer = transformers.AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V4-Pro")
tokens = tokenizer.encode(prompt)
How to Run Locally
Please refer to the inference folder for detailed instructions on running DeepSeek-V4 locally, including model weight conversion and interactive chat demos.For local deployment, we recommend setting the sampling parameters to temperature = 1.0, top_p = 1.0. For the Think Max reasoning mode, we recommend setting the context window to at least 384K tokens.
License
This repository and the model weights are licensed under the MIT License.
Window 9 - Human
Citation
@misc{deepseekai2026deepseekv4,
title={DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence},
author={DeepSeek-AI},
year={2026},
}
Contact
If you have any questions, please raise an issue or contact us at service@deepseek.com.