An evaluation by NIST's CAISI says DeepSeek V4 Pro lags behind leading US AI models by about eight months and is the most capable Chinese AI model to date

In April 2026, the Center for AI Standards and Innovation (CAISI) evaluated the open-weight AI model DeepSeek V4 Pro ("DeepSeek V4").

NIST 2026-05-03

Context & Ripple Effects

DeepSeek’s V4 launch followed a progression from V3 and V3.1, where the company emphasized benchmark gains and adaptation to next-generation Chinese-made chips. Earlier reporting also framed export controls as a constraint that pushed the lab toward more resource-efficient development.

The company previewed V4 Pro and V4 Flash while positioning V4 Pro as several months behind the frontier. CAISI’s independent evaluation puts a number on that claim, roughly an eight-month gap, while identifying V4 Pro as China’s strongest model so far. That gives buyers a clearer relative-performance reference.

First-order effects

DeepSeek gains a credible external marker as the leading Chinese model provider, but the evaluation also establishes that V4 Pro remains behind leading US systems on CAISI’s assessment.
Enterprise and public-sector model evaluators can assess DeepSeek’s own positioning more rigorously: leading among Chinese models and matching the global frontier are distinct conclusions.

Second-order effects

US model providers retain a performance-based sales argument against DeepSeek, while Chinese competitors face a higher bar to displace it as the domestic capability leader.
The result increases the value of workload-specific procurement: a model can be the strongest locally available option without being the best choice for tasks that require frontier-level performance.

Third-order effects

If independent evaluations become a regular reference point, model competition will be judged less by launch claims and more by repeatable comparisons across capability, deployment constraints, and fit for particular workloads.
The gap between national model ecosystems may increasingly matter alongside absolute benchmark leadership, as buyers balance access, sovereignty, and performance rather than treating AI models as interchangeable.

The trend: AI procurement is shifting from headline benchmark competition toward independently assessed, sovereignty-aware selection among increasingly capable regional model ecosystems.

Discussion

@nikostro Nikita Ostrovsky on x
DeepSeek V4 has a similar capability to GPT-5, released 8 months ago, according to a new @NIST report. If the current trend continues, we'll see a Chinese model at GPT-5.5 (roughly Mythos-level) model around February 2027. [image]
@niubi Bill Bishop on x
In April 2026, the Center for AI Standards and Innovation (CAISI) evaluated the open-weight AI model DeepSeek V4 Pro ("DeepSeek V4"). CAISI evaluations indicate that DeepSeek V4's capabilities lag behind the frontier by about 8 months https://www.nist.gov/...
@hamandcheese Samuel Hammond on x
New composite eval of DeepSeek V4 from CAISI suggests China is falling behind. Notice the relative steepness of their improvement trend. https://www.nist.gov/... [image]
@natolambert Nathan Lambert on x
So much rests on which of these trend lines is more representative. [image]
@alecstapp Alec Stapp on x
The export controls are working. Don't let NVIDIA lobbyists tell you any different.
@dorialexander Alexander Doria on x
The issue is that benchmarks simultaneously undersell and oversell the gap. DeepSeekv4 belongs to an entirely new category of model by side and by design, and the most dramatic step taken this year to bring the open architecture ecosystem closer to frontier.
@scaling01 @scaling01 on x
chinese models are ~8 months behind and are falling further behind [image]
@emollick Ethan Mollick on x
This is a good explanation of why the gap between open and closed models is larger than it appears in benchmarks. I would add in that current open models are also more fragile than closed: they handle out-of-distribution problems far less well & have lower emergent capabilities.
@scaling01 @scaling01 on x
The AI model gap is bigger than you think
@scaling01 @scaling01 on x
@rasbt They all cluster around DeepSeek-V4. I doubt it would change the entire trend
@rasbt Sebastian Raschka on x
@scaling01 Since GPT 5.5 is on that chart, it would have been interesting to include GLM 5.1 and Kimi K2.6 as well (and perhaps Qwen3.6 Max)

Chronicles