Quick Answer: DeepSeek V4 Preview, released April 24, 2026, is an open-source MoE model with 1.6T total parameters (49B active) and a 1M-token context window. Claimed benchmarks: 83.7% SWE-bench Verified, 90% HumanEval, 92.8% MMLU. API pricing: $0.30 input / $0.50 output per million tokens, roughly 17x cheaper than Claude Opus 4.7 on input and 50x cheaper on output. This is a preview release; some benchmark numbers circulated before the official release and should be treated with appropriate skepticism.

Last updated: April 2026


What DeepSeek V4 actually is

DeepSeek V4 Preview is not a full release. It’s a preview of DeepSeek’s next-generation model, released April 24, 2026, and immediately open-sourced under the MIT License. Weights are available on Hugging Face and ModelScope.

Two model variants shipped:

Model | Total Params | Active Params | Training Data
DeepSeek-V4-Pro | 1.6T | 49B | 33T tokens
DeepSeek-V4-Flash | 284B | 13B | 32T tokens

Both use a sparse Mixture-of-Experts (MoE) architecture with three key innovations: Engram Conditional Memory (hash-based knowledge lookup), Manifold-Constrained Hyper-Connections (mHC), and DeepSeek Sparse Attention for long-context efficiency.
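To put the total-versus-active split in concrete terms, here is a small arithmetic sketch using only the parameter counts quoted above. The dictionary layout and the function name are illustrative choices, not anything from DeepSeek.

```python
# Parameter counts as quoted in this article, in billions of parameters.
MODELS = {
    "DeepSeek-V4-Pro": {"total_b": 1600, "active_b": 49},
    "DeepSeek-V4-Flash": {"total_b": 284, "active_b": 13},
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the model's parameters used for each token in a sparse MoE."""
    return active_b / total_b


for name, params in MODELS.items():
    share = active_fraction(**params)
    print(f"{name}: {share:.1%} of parameters active per token")
```

By these figures, V4-Pro activates about 3.1% of its parameters per token and V4-Flash about 4.6%, which is why per-token compute is far lower than the headline parameter counts suggest.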

The timeline: Reuters reported on April 3 that V4 would launch “within the next few weeks” on Huawei Ascend 950PR chips. DeepSeek founder Liang Wenfeng confirmed internally that V4 would launch in late April. The preview dropped on April 24.

Benchmark claims and caveats

Here’s where things get complicated. Multiple sources report these numbers for V4-Pro:

Benchmark | Claimed Score | Context
SWE-bench Verified | 83.7% | vs. Opus 4.7’s 87.6%, GPT-5.4’s ~80%
HumanEval | 90% | vs. Claude’s 88%, GPT-4’s 82%
MMLU | 92.8% | not given
AIME 2026 | 99.4% | not given

Important caveat: Many of these numbers circulated before the official release and may originate from leaks or internal tests rather than the official model card. One analysis explicitly warns that many sites “cite the same unverified leak.”

The most reliable comparison point: DeepSeek V4 appears competitive with GPT-5.4 and Claude Opus 4.6 (the previous generation), but likely trails the newer GPT-5.5 and Claude Opus 4.7 by a few percentage points on coding benchmarks, while costing roughly 17x less on input and 50-60x less on output at the API prices quoted in this article.

The three architectural innovations

1. Engram Conditional Memory

Published January 12-13, 2026 as “Conditional Memory via Scalable Lookup” (arXiv 2601.07372). Co-authored by DeepSeek founder Liang Wenfeng.

The core idea: decouple knowledge storage from neural computation using hash-based lookup tables. This replaces about 20% of MoE parameters with a constant-time O(1) knowledge retrieval system.

Practical impact: faster inference, lower memory footprint, and the ability to update knowledge without retraining the entire model.
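The general idea of a hash-addressed lookup can be sketched in a few lines. This is a toy illustration of constant-time retrieval keyed by hashed n-grams, not the paper's implementation: the table size, vector width, hashing scheme, and every name below are assumptions made for the example.

```python
import hashlib

TABLE_SIZE = 1 << 16  # hypothetical number of memory slots
EMBED_DIM = 4         # tiny vector width, for illustration only


def slot_for(ngram: tuple[str, ...]) -> int:
    """Map an n-gram to a table slot with a stable hash."""
    digest = hashlib.blake2b(" ".join(ngram).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % TABLE_SIZE


class HashedMemory:
    """Sparse table of vectors addressed by hashed n-grams."""

    def __init__(self) -> None:
        self.table: dict[int, list[float]] = {}

    def write(self, ngram: tuple[str, ...], vector: list[float]) -> None:
        # Overwriting a slot changes the stored knowledge without touching
        # any other part of the system.
        self.table[slot_for(ngram)] = list(vector)

    def read(self, ngram: tuple[str, ...]) -> list[float]:
        # One hash plus one dictionary access, independent of table size.
        return self.table.get(slot_for(ngram), [0.0] * EMBED_DIM)


memory = HashedMemory()
memory.write(("capital", "of", "France"), [0.1, 0.9, 0.0, 0.3])
print(memory.read(("capital", "of", "France")))
```

Two different n-grams can hash to the same slot in a scheme like this; how a real system handles such collisions is outside what this sketch shows.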

2. Manifold-Constrained Hyper-Connections (mHC)

From the Hugging Face model card: “We incorporate mHC to strengthen conventional residual connections, enhancing stability of signal propagation across layers while preserving model expressiveness.”

This is an architectural stability improvement for very deep networks. Less relevant to end users, but matters for training efficiency.
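One way to picture a constrained residual mix is sketched below. It assumes, and this is an assumption not stated in this article, that the residual stream is split into several parallel streams combined by a mixing matrix, and that the constraint keeps that matrix doubly stochastic (every row and column sums to 1) via Sinkhorn-style normalization. The function names and sample numbers are hypothetical.

```python
def sinkhorn(matrix, iterations=50):
    """Alternately normalize rows and columns of a positive square matrix."""
    m = [[float(x) for x in row] for row in matrix]
    n = len(m)
    for _ in range(iterations):
        m = [[x / sum(row) for x in row] for row in m]  # each row sums to 1
        col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return m


def mix_streams(mixing, streams):
    """Each output stream is a weighted average of the input streams."""
    return [sum(w * s for w, s in zip(row, streams)) for row in mixing]


raw = [[1.0, 2.0, 3.0], [2.0, 1.0, 1.0], [1.0, 3.0, 2.0]]  # arbitrary positive weights
mixing = sinkhorn(raw)
streams = [1.0, -2.0, 3.0]  # one scalar per residual stream, for brevity
mixed = mix_streams(mixing, streams)
print(sum(streams), sum(mixed))  # totals match up to rounding
```

Because the columns sum to 1, the total signal across streams is preserved, and because the rows sum to 1 with non-negative weights, no output stream can exceed the largest input in magnitude. That is the kind of bounded signal propagation the model card's quote alludes to, under the assumption stated above.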

3. DeepSeek Sparse Attention

Enables efficient processing of the 1M token context window. The model requires only 27% of single-token inference FLOPs and 10% of KV cache compared to DeepSeek-V3.

Translation: 8x larger context window (128K → 1M) with dramatically lower computational cost.
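For readers unfamiliar with sparse attention, here is a generic top-k sketch for a single query. It is not DeepSeek's mechanism, which this article does not describe in detail; it only illustrates the basic idea of attending to a small subset of positions. All names are illustrative.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sparse_attention(query, keys, values, top_k):
    """Attend only to the top_k keys with the highest scores for this query."""
    scale = math.sqrt(len(query))
    # Note: scoring every key is still linear in context length. Practical
    # designs use a cheaper selection step; this toy does not model that.
    scores = [dot(query, key) / scale for key in keys]
    keep = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Softmax restricted to the kept positions.
    peak = max(scores[i] for i in keep)
    weights = [math.exp(scores[i] - peak) for i in keep]
    total = sum(weights)
    dim = len(values[0])
    return [
        sum(w / total * values[i][d] for w, i in zip(weights, keep))
        for d in range(dim)
    ]


keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[1.0], [2.0], [3.0]]
print(sparse_attention([1.0, 0.0], keys, values, top_k=2))
```

With `top_k` equal to the number of keys, this reduces to ordinary softmax attention; smaller values restrict the softmax and value mixing to the selected positions.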

Pricing: the headline story

DeepSeek-V4-Pro API pricing (per 1M tokens):

Model | Input (cache miss) | Input (cache hit) | Output
V4-Pro | $0.30 | $0.03 | $0.50
Claude Opus 4.7 | $5.00 | not listed | $25.00
GPT-5.5 | $5.00 | not listed | $30.00

That’s about 17x cheaper than Opus 4.7 on input and 50x cheaper on output. Cached input is billed at $0.03 per million tokens, so at a 90% cache hit rate the blended input cost works out to about $0.057 per million tokens (0.9 × $0.03 + 0.1 × $0.30).
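The effect of caching on input cost is a weighted average of the cache-hit and cache-miss prices. The sketch below uses only the V4-Pro prices quoted above; the function name and the sample hit rates are illustrative.

```python
# V4-Pro input prices per 1M tokens, as quoted in this article (USD).
V4_PRO_INPUT_MISS = 0.30
V4_PRO_INPUT_HIT = 0.03


def blended_input_price(hit_rate: float) -> float:
    """Average input price per 1M tokens for a given cache hit rate (0 to 1)."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * V4_PRO_INPUT_HIT + (1.0 - hit_rate) * V4_PRO_INPUT_MISS


for rate in (0.0, 0.5, 0.9, 1.0):
    print(f"hit rate {rate:.0%}: ${blended_input_price(rate):.3f} per 1M input tokens")
```

At a 90% hit rate the blended price is about $0.057 per million input tokens. The $0.03 figure applies only to the cached portion, so it is reached only when every input token is a cache hit.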

New accounts get 5M free tokens with no credit card required.

DeepSeek-V4-Flash is even cheaper, though its exact pricing is less clearly documented.

What improved over DeepSeek-V3

  • Context window: 128K → 1M (8x increase)
  • Inference efficiency: 27% of V3’s FLOPs per token
  • KV cache: 10% of V3’s memory footprint
  • Architecture: Added Engram Memory, mHC, and improved sparse attention

The model is faster, cheaper to run, and handles longer contexts than V3 while maintaining or improving performance.

Open source under MIT License

This is not a research-only release. DeepSeek V4 Preview is fully open-sourced under the MIT License, one of the most permissive licenses available.

You can:

  • Download weights from Hugging Face or ModelScope
  • Run it on your own infrastructure
  • Fine-tune it for your use case
  • Deploy it commercially, subject only to the MIT License’s requirement to retain the copyright and license notice

For teams that need full control over their AI stack or operate in regulated environments where data cannot leave their infrastructure, this matters more than benchmark scores.

The Huawei chip angle

DeepSeek V4 runs on Huawei’s Ascend 950PR chips. This is significant in the context of US export controls on NVIDIA GPUs to China.

DeepSeek is demonstrating that competitive frontier models can be trained and served on non-NVIDIA hardware. Whether this is a strategic hedge, a cost optimization, or a necessity depends on your perspective — but the technical result is the same: V4 exists and performs well on Ascend chips.

Where V4 Preview wins and loses

Wins:

  • Cost (roughly 17x cheaper on input and 50-60x cheaper on output than Opus 4.7 and GPT-5.5)
  • Open source (MIT License, full weights available)
  • Context window (1M tokens, at $0.30 per million input tokens)
  • Inference efficiency (27% of V3’s FLOPs)
  • No vendor lock-in

Loses:

  • Absolute performance (trails Opus 4.7 and GPT-5.5 by a few points on coding)
  • Benchmark verification (some numbers are unverified leaks)
  • Preview status (not a final release)
  • Ecosystem maturity (fewer integrations than OpenAI/Anthropic)

The competitive context

Three frontier models shipped within nine days:

  • Claude Opus 4.7 (April 16): Best coding performance, $5/$25 per million tokens
  • GPT-5.5 (April 23): Best aggregate intelligence, $5/$30 per million tokens
  • DeepSeek V4 Preview (April 24): Competitive performance, $0.30/$0.50 per million tokens

If you need the absolute best performance and cost is not a constraint, Opus 4.7 or GPT-5.5 are the choices. If you need 80-90% of that performance at 2-5% of the cost, DeepSeek V4 changes the equation.

For context: running 1 billion input tokens and 1 billion output tokens through DeepSeek V4-Pro costs $300 input + $500 output = $800 total. The same workload on GPT-5.5 costs $5,000 input + $30,000 output = $35,000 total.
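The same comparison can be reproduced for any token mix with a short calculator. The prices are the per-million-token figures quoted in this article; the class and function names are illustrative, and cache-hit discounts are ignored for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens


# Prices as quoted in this article.
PRICES = {
    "DeepSeek V4-Pro": Pricing(0.30, 0.50),
    "Claude Opus 4.7": Pricing(5.00, 25.00),
    "GPT-5.5": Pricing(5.00, 30.00),
}


def workload_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Total USD cost for a workload, with no caching discount applied."""
    total = input_tokens * pricing.input_per_m + output_tokens * pricing.output_per_m
    return total / 1_000_000


for name, pricing in PRICES.items():
    cost = workload_cost(pricing, 1_000_000_000, 1_000_000_000)
    print(f"{name}: ${cost:,.0f}")
```

For 1 billion input tokens plus 1 billion output tokens this gives $800 for V4-Pro, $30,000 for Claude Opus 4.7, and $35,000 for GPT-5.5. Real workloads are usually input-heavy, so substituting your own token counts will give a more realistic ratio.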

Who should try V4 Preview

Try now if:

  • You’re cost-sensitive and running high-volume workloads
  • You need open-source weights for compliance or control reasons
  • You’re building on Huawei infrastructure
  • You want to experiment with 1M context at low cost

Wait if:

  • You need the absolute highest performance on coding benchmarks
  • You require fully validated, independently verified benchmark numbers
  • You need a production-stable release rather than a preview
  • Your workflow depends on tight integration with OpenAI/Anthropic ecosystems

The preview caveat

This is explicitly a preview release. DeepSeek may ship a full V4 release later with different specifications, updated benchmarks, or architectural changes.

Treat the benchmark numbers with appropriate skepticism until independently verified. The pricing and open-source license are confirmed. The performance claims are plausible but not yet rigorously validated by third parties.


Sources: Hugging Face - DeepSeek-V4-Pro, Hugging Face - DeepSeek-V4-Flash, Gate.com, Binance Square, Futunn, Linos.ai, nxcode.io, Odaily