What DeepSeek Released on April 24
The DeepSeek V4 family launched as a public preview on April 24, 2026, with two model sizes published simultaneously. V4-Pro is a 1.6 trillion-parameter Mixture-of-Experts model with 49 billion active parameters per token, while V4-Flash is a 284 billion total / 13 billion active variant. Both ship with a 1 million-token context window, putting open source at parity, in raw context length, with the longest-context closed-source frontier offerings from Anthropic, Google, and OpenAI.
The architectural detail that matters most is the attention mechanism. Both V4 models use what DeepSeek calls token-wise compression combined with DSA (DeepSeek Sparse Attention). Sparse attention is the technique that makes 1M-token context economically viable — without it, attention compute scales quadratically with sequence length and inference cost becomes prohibitive at frontier context windows. DeepSeek’s approach is the open-source community’s most aggressive bet to date that sparse attention can deliver near-dense attention quality at a fraction of the compute, and the V4 release is the first time it has been deployed at this scale.
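The scaling argument above can be made concrete with back-of-envelope arithmetic. DeepSeek has not published DSA's exact sparsity pattern, so the per-query key budget below is an illustrative assumption, not a disclosed figure:

```python
# Back-of-envelope attention cost at a 1M-token context.
# Dense self-attention scores every token against every other token;
# a sparse scheme attends each query to only k selected keys.
# DSA's actual sparsity level is not public -- TOP_K is an assumption.

SEQ_LEN = 1_000_000   # 1M-token context window
TOP_K = 2_048         # assumed keys attended per query (illustrative)

dense_pairs = SEQ_LEN * SEQ_LEN   # O(n^2) query-key interactions
sparse_pairs = SEQ_LEN * TOP_K    # O(n*k) interactions

print(f"dense : {dense_pairs:.2e} query-key pairs")
print(f"sparse: {sparse_pairs:.2e} query-key pairs")
print(f"reduction: {dense_pairs / sparse_pairs:.0f}x")  # ~488x fewer pairs
```

The quadratic term is what makes dense attention prohibitive at this length: doubling the context quadruples the dense cost but only doubles the sparse cost.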
Where V4-Pro Sits Against the Frontier
DeepSeek’s own positioning is precise: V4-Pro “leads all current open models” in world knowledge while “trailing only Gemini-3.1-Pro” on the same axis, and demonstrates “open-source SOTA in Agentic Coding benchmarks.” That phrasing matters. The world-knowledge claim places V4-Pro ahead of Llama, Qwen, and Mistral’s open-weights flagships. The agentic-coding claim places it as the strongest open-source option for code-generation and tool-use workflows — the workload that has driven the most enterprise adoption of frontier AI in 2025 and 2026.
The CNBC and Bloomberg coverage of the launch framed V4 as the most credible open-source challenge to closed-source frontier models since DeepSeek-V3 in December 2024 and DeepSeek-R2 in early 2026. Simon Willison’s analysis on April 24 noted that V4-Flash is the more interesting model for most builders: at 13B active parameters, it is small enough to run on a single high-memory GPU node while still claiming reasoning capabilities that “closely approach V4-Pro.” For startups and mid-size enterprises, V4-Flash is the model that actually changes deployment economics; V4-Pro is the model that changes the industry narrative.
Why “Open-Source SOTA in Agentic Coding” Matters
Agentic coding has emerged as the highest-value AI workload in 2025 and 2026 — Cursor, Windsurf, Claude Code, and the wave of AI-IDE startups have built billion-dollar valuations on the backs of frontier models that can plan, execute, and self-correct multi-step coding tasks. Until V4, every credible agentic-coding deployment in production was running on a closed-source model: Claude Opus 4 or Sonnet 4, GPT-5, or Gemini 2.5 Pro. The cost per developer per month was a major line item, and every enterprise engineering leader had to decide how much of their model spend to commit to a single closed-source vendor.
If V4-Pro genuinely delivers SOTA agentic-coding performance at open-source pricing, the calculus shifts. Self-hosted V4-Flash on a four-GPU node could run an internal Cursor-equivalent for a 200-engineer team at a fraction of the per-seat closed-source cost. The catch — and there is always a catch with open-source frontier claims — is that “SOTA in Agentic Coding benchmarks” is benchmark-specific. The detailed scores and the comparison with the latest closed-source models will determine whether enterprise teams actually switch, and the third-party benchmark cycle on V4 is just beginning. Expect a flood of independent evaluations over the next 60 days.
What This Tells Us About the Frontier Race
The release timing — April 24, 2026 — is significant. DeepSeek shipped V4 roughly 16 months after V3 (December 2024), three months after R2 (January 2026), and over a year after the R1 release (January 2025) that triggered Western policy panic about Chinese open-source AI. The cadence is now closer to the closed-source frontier: OpenAI, Anthropic, and Google ship major model updates every 4-9 months, and DeepSeek has matched that rhythm while keeping weights open. Training cost figures have not been disclosed for V4, but the V3 baseline of $5.576 million in compute for the final training run remains the benchmark the community will evaluate V4 against.
The MIT Technology Review framing — “Why DeepSeek’s V4 Matters” — captures the strategic point: V4 is no longer a curiosity from a Chinese hedge-fund-spun lab. It is a serial release on a closed-source-cadence schedule, and it is the strongest signal yet that the frontier is no longer a single-country, single-paradigm race. A year ago the open-vs-closed gap was measured in months rather than years; on specific tasks it is now measured in weeks. For enterprise AI buyers, this is the most consequential development of 2026 H1.
What Enterprise CTOs and AI Leads Should Do Now
1. Re-run your closed-source vs open-source TCO model with V4-Flash plugged in
If your last cost-of-ownership analysis was done in late 2025 or early 2026, the open-source side of the comparison was running on Llama 3.3, Qwen 2.5, or DeepSeek-V3. V4-Flash at 13 billion active parameters changes the inference economics meaningfully: a single 8xH100 or H200 node can serve V4-Flash for a 200-developer team with reasonable latency. Rebuild the TCO model with realistic V4-Flash inference costs (call it $3-5 per million output tokens self-hosted, versus $15-30 per million for closed-source frontier offerings) and a 60-day amortisation horizon. If the gap exceeds 3x in your favour, you have a procurement case that did not exist 90 days ago.
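The comparison above reduces to a few lines of arithmetic. The per-token prices are the midpoints of the ranges quoted in the paragraph; the monthly token volume per developer is an assumption you should replace with your own usage telemetry:

```python
# Sketch of the closed-vs-open TCO comparison described above.
# Token volume per developer is an assumption -- substitute real telemetry.

DEVS = 200
TOKENS_PER_DEV_PER_MONTH = 15_000_000   # assumed output tokens/dev/month

SELF_HOSTED_PER_M = 4.0    # $/M output tokens, midpoint of the $3-5 estimate
CLOSED_PER_M = 22.5        # $/M output tokens, midpoint of the $15-30 range

monthly_tokens_m = DEVS * TOKENS_PER_DEV_PER_MONTH / 1_000_000
self_hosted = monthly_tokens_m * SELF_HOSTED_PER_M
closed = monthly_tokens_m * CLOSED_PER_M

gap = closed / self_hosted
print(f"self-hosted: ${self_hosted:,.0f}/mo  closed: ${closed:,.0f}/mo")
# Decision rule from the paragraph above: a gap above 3x supports
# a procurement case.
print(f"gap: {gap:.1f}x, procurement case: {gap > 3.0}")
```

Note this sketch counts inference tokens only; a full TCO model would add GPU amortisation, hosting, and the engineering time to run the node.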
2. Pilot V4-Flash on your highest-volume agentic workflow before committing
Do not switch your entire stack on the basis of a vendor’s “SOTA in Agentic Coding” claim. Pilot V4-Flash on the single workflow where you spend the most on closed-source inference today — internal code review, automated test generation, ticket triage, or data pipeline maintenance — and run a structured A/B against your current production model for 30 days. Track output quality, latency, error rate, and cost. If V4-Flash matches the closed-source model on the dimensions that matter to that workflow, expand. If it lags by 10%+ on a quality metric you actually care about, hold and re-evaluate after the 60-day independent benchmark cycle completes.
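The pilot's exit criterion can be written down explicitly so the decision is mechanical rather than vibes-based. The 10% hold threshold comes from the paragraph above; the metric names and scores are placeholders:

```python
# Sketch of the 30-day pilot decision rule described above.
# Quality scores are placeholders, e.g. pass rate on automated
# test generation; higher is better.

def pilot_verdict(quality_open: float, quality_closed: float) -> str:
    """Apply the 10% hold threshold from the pilot guidance above."""
    if quality_closed <= 0:
        raise ValueError("closed-source baseline score must be positive")
    shortfall = (quality_closed - quality_open) / quality_closed
    if shortfall >= 0.10:   # lags by 10%+ -> hold and re-evaluate
        return "hold"
    return "expand"

print(pilot_verdict(quality_open=0.86, quality_closed=0.91))  # expand
```

The same structure extends naturally to latency, error rate, and cost: compute a shortfall per dimension and hold if any metric you actually care about crosses its threshold.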
3. Lock in your sparse-attention deployment expertise now
V4 is unlikely to be the last open-source frontier model to ship sparse attention — the technique is too compute-efficient at long context for any serious open-source effort to ignore it. The teams that develop deployment expertise on DSA-style attention now will be ahead of competitors who treat each sparse-attention model as a one-off engineering challenge. Designate one or two engineers to own the V4 deployment, document the inference-stack choices (vLLM, SGLang, TensorRT-LLM), and treat that documentation as a strategic asset, not a one-time engineering note.
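The documentation habit can start with something as small as a checked-in launch script. This sketch assumes vLLM as the serving stack and uses a hypothetical Hugging Face model ID for V4-Flash; confirm the real ID and supported context length against DeepSeek's official release notes:

```shell
#!/usr/bin/env bash
# Launch script for the internal V4-Flash deployment. Keep this file, and
# the reasoning behind each flag, in the team's deployment docs.
# The model ID below is a placeholder, not a confirmed release path.
# --tensor-parallel-size 8 : one 8-GPU H100/H200 node
# --max-model-len          : the advertised 1M-token context window
# --gpu-memory-utilization : leave headroom for the long-context KV cache

vllm serve deepseek-ai/DeepSeek-V4-Flash \
  --tensor-parallel-size 8 \
  --max-model-len 1000000 \
  --gpu-memory-utilization 0.90
```

Whichever stack you pick (vLLM, SGLang, TensorRT-LLM), the point is the same: the flags and the rationale behind them are the reusable asset.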
4. Re-negotiate your closed-source contracts with the V4 release as evidence
Your closed-source vendor’s account team already knows V4 shipped. Use the release as evidence in your next contract renewal: request volume-based discounts, multi-year price locks, or capability-based renegotiation triggers. The strongest negotiating position is “we have a credible open-source fallback” — V4 makes that statement true for the first time at frontier-level capability. Even if you do not actually intend to switch, the fact that you could materially changes the terms a closed-source vendor will offer.
The Correction Scenario
The case against switching to V4 in production is straightforward: vendor-published claims have a tendency to overstate against benchmarks that the vendor itself selected, and “SOTA in Agentic Coding” is a benchmark-specific claim until independent third parties replicate it. If the 60-day independent benchmark cycle reveals V4-Pro is competitive on a narrow set of tasks but materially weaker than the closed-source frontier on long-horizon planning or tool-use reliability, the enterprise procurement case weakens fast.
A second correction scenario worth considering: closed-source frontier vendors will respond. OpenAI, Anthropic, and Google have not historically priced aggressively against open-source — they have priced for capability premium. V4 may be the release that forces a closed-source price cut for the first time, in which case the TCO gap that justifies switching to open-source narrows from the closed-source side. Enterprise AI buyers should plan for both scenarios and not over-commit to either path before the third-party evidence settles. The right move for most teams in May and June 2026 is structured pilots, not stack migrations.
Frequently Asked Questions
What are DeepSeek V4-Pro and V4-Flash?
V4-Pro and V4-Flash are two open-source AI models released by DeepSeek as a preview on April 24, 2026. V4-Pro has 1.6 trillion total parameters with 49 billion active per token. V4-Flash has 284 billion total parameters with 13 billion active. Both ship with a 1 million-token context window and use DeepSeek Sparse Attention to make long-context inference economically viable.
How does V4 compare to closed-source frontier models?
DeepSeek positions V4-Pro as “open-source SOTA in Agentic Coding benchmarks” and notes it “leads all current open models” in world knowledge while “trailing only Gemini-3.1-Pro” on the same axis. Independent third-party benchmarks against Claude, GPT, and Gemini are still in progress and will determine whether the vendor claims hold up across the workloads enterprise buyers care about.
Should an enterprise switch from closed-source to V4 right now?
Most enterprises should pilot, not switch. Run V4-Flash on a single high-volume agentic workflow for 30 days, measure output quality and total cost of ownership, and wait for the 60-90 day independent benchmark cycle to complete before committing to a stack migration. The right move in May and June 2026 is structured pilots; full migrations should wait until third-party evidence settles.