The Two Prices

Seventy-five percent. That is how much DeepSeek cut the price of its flagship V4-Pro model — and on May 24 it announced it would make the cut permanent, retiring the "promotional" framing it had hidden behind since the discount first appeared. Input tokens now cost $0.435 per million, output $0.87. A quarter of what they cost a year ago. By the only benchmark that matters in this market, a unit of machine intelligence got cheaper again.

This is not news. It is the opposite of news. The price of a token has fallen so reliably, for so long, that the decline has become the ambient condition of the industry, the thing everyone assumes and no one measures. Alibaba cut its model prices up to 97% in 2024. OpenAI dropped o3 by 80% in 2025. Each cut is reported as a skirmish in a price war. None of them is surprising. The surprising number was in a different article, on the same day.

The other half of the trade

The second price was in the same week's coverage, filed under hardware. Contract prices for NAND memory chips are up more than 600% since the end of September 2025. DRAM is up nearly 400%. These are not consumer markups passed along a supply chain. These are the wholesale prices of the physical substrate — the chips that hold the weights, the context windows, the inference state — moving in the opposite direction from the thing they compute.

Down 75%: DeepSeek V4-Pro token price, cut made permanent May 24

Up more than 600%: NAND contract price since the end of September 2025

The ratio is the story. One number falls by three-quarters. The other rises by more than 600%, to over seven times where it started. They are reported as unrelated: a software story and a hardware story, filed by different desks, written by different reporters. They are the same trade. The cost of thinking is collapsing. The cost of remembering is exploding. And the collapse and the explosion are powered by the same engine.
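For readers who want to check the arithmetic, here is a minimal sketch in Python. It uses only figures quoted in this piece; the function name and variable names are illustrative. Two caveats: "more than 600%" is treated as exactly 600%, so the memory multiplier is a lower bound, and the two changes cover different windows (a year for the token price, since the end of September 2025 for NAND), so the combined figure is a rough indication rather than a like-for-like measurement.

```python
def multiplier_from_percent_change(pct_change: float) -> float:
    """Convert a percent change (e.g. -75 or +600) into a price multiplier."""
    return 1.0 + pct_change / 100.0


# A 75% cut leaves a quarter of the old price.
token_mult = multiplier_from_percent_change(-75.0)

# A 600% rise means the new price is seven times the old one, not six.
# The article says "more than 600%", so this is a lower bound.
nand_mult = multiplier_from_percent_change(600.0)

# How far the price of memory has moved relative to the price of a token.
# Assumption: the two windows are treated as comparable, which they only roughly are.
relative_shift = nand_mult / token_mult

# Pre-cut token prices implied by the quoted post-cut prices (USD per million tokens).
input_now, output_now = 0.435, 0.87
input_before = input_now / token_mult
output_before = output_now / token_mult

print(f"token multiplier: {token_mult:.2f}x")
print(f"NAND multiplier:  {nand_mult:.1f}x")
print(f"memory vs. token: {relative_shift:.0f}x")
print(f"implied old prices: ${input_before:.2f} in, ${output_before:.2f} out")
```

On these assumptions, memory has become roughly 28 times more expensive relative to a token than it was at the start of the respective windows.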

The mechanism is not mysterious once you stop treating the two prices as separate. A token gets cheaper because the model that produces it gets more efficient — better architectures, better quantization, more output squeezed from each pass. That is an algorithmic deflation, and it has no floor anyone can see. Memory gets more expensive because the physical demand for compute — the data centers being poured in concrete from Texas to Johor — is consuming the world's supply of high-bandwidth memory faster than three companies can fabricate it. That is a physical inflation, and it has a very hard floor: global DRAM supply is expected to meet only 60% of demand through 2027. The abstraction is getting cheaper. The matter underneath it is getting scarce. Same boom. Opposite signs.

Tokens are being priced like a commodity racing to zero. Memory is being priced like oil. Both are inputs to the same machine.

Compared to what?

A skeptic should push here. Falling token prices and rising memory prices could be a coincidence of timing — two cycles that happen to be out of phase. The way to test that is to ask whether the same dynamic shows up anywhere else in the same week, or whether it is confined to these two articles. It is not confined.

On the same Sunday, the Financial Times reported that the AI boom has rewired global M&A — that dealmaking has stopped being a race to buy companies and become a race to buy capacity: energy, fiber networks, raw computing. Unloved utilities turned sexy. Private equity found a new gold mine in power plants. Three months earlier, the Wall Street Journal had documented AI companies outbidding Apple — Apple, the entity that dominated the electronics supply chain for fifteen years — for chips, memory, and glass fiber, handing pricing leverage to suppliers for the first time in a generation.

This is not a coincidence of timing. It is one structure expressing itself wherever you look. Wherever the AI boom touches something physical — a power contract, a fiber route, a memory die, a supply chain — the price goes up and the leverage moves to whoever controls the scarce thing. Wherever it touches something informational — a token, a model weight, an inference call — the price goes down. The boom is a machine for converting abstract abundance into physical scarcity. The two prices are the input and the output of the same conversion.

Who pays the second number

The deflation and the inflation do not land on the same people. This is the part the two-articles framing hides, and it is the part that matters.

The falling token price is captured by whoever already runs inference at scale — developers, enterprises, the venture-funded layer building on top of the models. To benefit from DeepSeek's 75% cut you need an API key, a credit card, and a product that turns cheap tokens into something worth selling. The cheaper intelligence gets, the more valuable it is to the people positioned to consume it in volume. Abundance flows up the stack.

The rising memory price lands somewhere else entirely. Rising DRAM prices are driving what one analyst calls "forced premiumization" in the smartphone markets of India and Africa — the sub-$200 segment, the cheapest rung of the ladder into the digital economy, is being priced out of existence. Memory is on track to reach roughly 40% of a low-end phone's manufacturing cost by mid-2026, up from 20%. The phone that cost $180 last year cannot be built for $180 this year, so it is not built at all. The buyer does not get a more expensive phone. The buyer gets no phone.
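The jump from 20% to 40% understates what happened to the memory line itself. A rough sketch of the arithmetic, under a loudly stated assumption that is not from the reporting: every non-memory cost in the phone stays flat, and the bill of materials is normalized to a hypothetical 100 units. The function name and parameters are illustrative only.

```python
def cost_after_share_shift(old_share: float, new_share: float, base_cost: float = 100.0):
    """Return (new_memory_cost, new_total_cost) when memory's share of the
    manufacturing cost moves from old_share to new_share.

    Assumption (not from the article): all non-memory costs stay constant.
    base_cost is a hypothetical normalized bill of materials.
    """
    other = base_cost * (1.0 - old_share)              # non-memory costs, held fixed
    new_memory = other * new_share / (1.0 - new_share)  # solves m / (m + other) = new_share
    return new_memory, other + new_memory


base_cost = 100.0
old_share, new_share = 0.20, 0.40

new_memory, new_total = cost_after_share_shift(old_share, new_share, base_cost)
memory_multiplier = new_memory / (base_cost * old_share)
total_increase = new_total / base_cost - 1.0

print(f"memory cost multiplier: {memory_multiplier:.2f}x")
print(f"total manufacturing cost increase: {total_increase:.1%}")
```

Under that assumption, a doubling of the share implies the memory cost itself rose by about 2.7 times, and the total cost of building the phone rose by about a third. That is the gap a $180 price point has to absorb.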

Hold the two facts together. The cost of intelligence fell 75% for the developer with an API key. The cost of the cheapest device that could reach that intelligence rose past the budget of the buyer who needed it most. The same boom did both. The promise of AI was that intelligence would become abundant and therefore democratic — that the marginal cost of a smart answer approaching zero would put the same capability in every hand. The ledger says the abstraction got cheaper for everyone who can already afford the hardware, and the hardware got more expensive for everyone who can't.

The leverage is visible inside the boom too

If memory is the scarce input, you can read the whole industry by asking who sits next to it. Samsung's bonus deal this month opened a 100x payout gap between its memory-division employees and the colleagues making smartphones, TVs, and home appliances — a gap large enough to fuel open resentment across the company. Same firm, same boom, two divisions: one makes the scarce thing and one makes the things that need it. The pay gap is the price ratio, expressed in human terms, inside a single org chart.

Three companies — Samsung, SK Hynix, Micron — fabricate the memory the entire AI buildout depends on, and they have spent the year discovering exactly how much leverage that gives them. They are now using it to lock customers into long-term supply agreements, restructuring the industry's business model around the simple fact that demand cannot be met — a position one analyst summed up by calling memory chips more valuable than oil. The token market has thousands of sellers racing each other to zero. The memory market has three sellers who have stopped racing. That asymmetry is the entire shape of the AI economy, compressed into a count: many sellers of the abundant thing, three sellers of the scarce one.

The two prices are one number

Seventy-five percent is a real number and it means what it says: the cost of a unit of machine intelligence fell by three-quarters in a year. It is the number the industry quotes to prove that AI is getting cheaper, more accessible, more democratic — the number that anchors every projection of abundance.

It is also exactly half of the trade. The other half is the 600% on the chips and the 40% of a budget phone and the 100x in a Samsung bonus envelope, and that half does not point toward abundance. It points toward a world where the thing that thinks is nearly free and the thing it runs on is nearly impossible to get, and where those two facts are sorted onto different people by an accident of where they were standing when the boom arrived.

Intelligence is becoming abundant. The matter it runs on is becoming scarce. Both numbers come from the same machine, and only one of them was ever going to be distributed evenly — because abundance can be copied, and scarcity has to be allocated. Seventy-five percent is what the boom gives the people who already had a keyboard. The other number is what it charges everyone else.