$2.1M. That's what Sora generated in total revenue across its entire commercial life — fifteen months from general release in December 2024 to the shutdown announcement on March 24, 2026. In those fifteen months, it was used by millions of people to generate videos that cost $1.30 each to produce. On its best days, the compute bill reached $15 million. A Disney partnership worth $1 billion, which would have changed the math entirely, died without a dollar changing hands.

$15M: OpenAI's peak daily compute cost to run Sora
$2.1M: Sora's total lifetime revenue

"OpenAI says it will discontinue products that use its Sora models, including its consumer app, a Sora version for developers, and a video feature inside ChatGPT." (Wall Street Journal, March 2026)

The ratio is roughly 7:1. A single peak day of operating costs outweighed everything the product ever earned, seven times over. It is not a close call.

But Sora isn't an anomaly. It's a test case for the proposition that falling token prices will eventually produce profitable AI products, and evidence that the proposition fails in a specific way the industry hasn't fully absorbed.

The Paradox

LLM inference prices have fallen by roughly 50x per year since 2023, accelerating to 200x per year since January 2024. The cheapest models available today, such as DeepSeek V3.2-Exp at $0.07 per million input tokens, score comparably on benchmarks to models that cost $30 per million tokens eighteen months ago. Gartner projects a further 90%-plus cost reduction by 2030. The direction is clear and the trajectory is steep.

And yet: big tech's AI infrastructure spending is on track for $700 billion in 2026. Amazon, $200 billion. Google, $185 billion. Meta, $135 billion. Microsoft, $120 billion. The total compute bill is rising, not falling, even as the unit cost of inference collapses.

The mechanism: cheaper tokens don't reduce demand — they create it. When a task that cost $10 to run costs $0.10, it gets automated. When it's automated, it runs ten times more often. The boundary of what's worth routing through AI expands faster than the cost per token falls. In 2023, inference accounted for 33% of total AI compute demand. By 2026, it's 60 to 70%. Volume overwhelmed the price reduction.
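The mechanism can be sketched in a few lines. The prices and volumes below are purely illustrative assumptions, not measured data; they only show how total spend rises even as unit price collapses:

```python
# Illustrative only: hypothetical prices and volumes for the
# induced-demand mechanism described above; not measured data.
price_2023 = 10.00            # $ per task when running it meant paying $10
price_2026 = 0.10             # $ per task after a ~100x price collapse

volume_2023 = 1_000_000       # tasks worth running at $10 each
volume_2026 = 500_000_000     # at $0.10, the task is automated and re-run constantly

spend_2023 = price_2023 * volume_2023   # $10M total
spend_2026 = price_2026 * volume_2026   # $50M total: the bill rises 5x anyway
```

The toy numbers encode the article's claim: the volume multiplier (500x here) outruns the price divisor (100x), so aggregate spend grows.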

Agentic workflows — what Andrej Karpathy describes as programming at a higher level, where "the basic unit of interest is not one file but one agent" — consume 10 to 100 times more tokens per task than a single chat exchange. Video generation consumes more still. A 10-second Sora clip cost $1.30 to produce. At that unit economics, unlimited consumer access is not a business model. It's a runway drain.
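A rough per-task cost ladder makes the gap concrete. The token counts and the blended price per million tokens are assumptions for illustration; only the $1.30 clip cost comes from the text:

```python
# Rough per-task cost ladder. Token counts and the blended $/MTok price
# are illustrative assumptions; the $1.30 clip cost is from the text.
price_per_mtok = 3.00             # assumed blended price, $ per million tokens
chat_tokens = 2_000               # one chat exchange (assumed)
agent_tokens = chat_tokens * 50   # mid-range of the 10-100x agentic multiplier

chat_cost = chat_tokens / 1e6 * price_per_mtok    # about $0.006 per exchange
agent_cost = agent_tokens / 1e6 * price_per_mtok  # about $0.30 per agent task
video_cost = 1.30                                 # one 10-second Sora clip
```

Each rung is one to two orders of magnitude above the last, which is why a pricing model that survives chat can fail outright at video.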

Who Pays

5.5% of ChatGPT's 900 million weekly users pay for it. The other 94.5% generate compute costs too — every chat query, every code completion, every image request. When the cost per query is a fraction of a cent, the free tier is manageable. When it's $1.30 per generation, the math breaks. Sora's downloads fell 65% from 3.33 million in November 2025 to 1.13 million in February 2026 — the market telling OpenAI something about the gap between what users wanted to pay and what it cost to serve them.
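The free-tier arithmetic can be checked directly. The 900 million weekly users and 5.5% paying share are the article's figures; the per-query costs and usage rate below are hypothetical assumptions:

```python
# Hypothetical free-tier arithmetic. The 900M users and 5.5% paying share
# are figures from the text; per-query costs and usage are assumptions.
weekly_users = 900_000_000
free_users = weekly_users * (1 - 0.055)   # ~850.5M non-paying users

queries_per_free_user = 10                # assumed weekly usage per free user
chat_cost_per_query = 0.003               # a fraction of a cent: manageable
video_cost_per_gen = 1.30                 # Sora's reported per-clip cost

chat_bill = free_users * queries_per_free_user * chat_cost_per_query   # ~$25.5M/week
video_bill = free_users * queries_per_free_user * video_cost_per_gen   # ~$11B/week
```

Same users, same usage pattern: swapping the unit cost turns a manageable subsidy into a bill larger than most companies' annual revenue.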

OpenAI projects losing $14 billion on $20 billion in revenue in 2026, with breakeven pushed to 2030. The 2030 projection assumes the cost curve keeps falling and the demand curve stays bounded. Sora is evidence for what happens when a product violates the second assumption.

| Entity | 2026 Revenue | 2026 Net | Enterprise Share | Growth Rate |
| --- | --- | --- | --- | --- |
| OpenAI | $20B | −$14B | 27% (was 50%) | ~2x |
| Anthropic | $19B ARR | approaching breakeven | 40% | 10x |
| Sora | $2.1M (lifetime) | ~−$5.4B annualized | — | −65% MAU |

Read the Enterprise Share column. OpenAI held 50% of enterprise LLM spending in early 2025. By early 2026, it held 27%. Anthropic now holds 40%, and wins approximately 70% of new enterprise deals according to Ramp's vendor data. The reversal happened in twelve months.

"Anthropic recently passed $19B in run-rate revenue, up from $9B at the end of 2025 and ~$14B a few weeks ago, as its clash with the US DOD casts doubt" (Bloomberg, March 2026)

The Enterprise Hedge

The reversal isn't primarily about model quality; both companies' models score comparably on benchmarks. It's about token demand structure.

Enterprise software has bounded token consumption. An IT department deploying Claude for internal use cases sets budgets, defines workflows, and controls what gets routed through the model. The monthly API spend is predictable. Consumer products don't have this. A flat subscription — $20/month, unlimited queries — eliminates the user's incentive to self-limit. Heavy users generate the highest costs and pay the same price as occasional users. When the heavy users are running Sora at $1.30/clip, the flat-subscription economics collapse.
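The two pricing structures can be sketched side by side. The fee, per-unit prices, and usage figures below are illustrative, not either company's actual rates; only the $1.30 per-clip cost comes from the text:

```python
# Sketch of the two pricing structures contrasted above. The fee, prices,
# and usage numbers are illustrative, not either company's actual rates.
def flat_margin(monthly_fee, units, cost_per_unit):
    """Flat subscription: revenue is fixed, cost scales with usage."""
    return monthly_fee - units * cost_per_unit

def metered_margin(units, price_per_unit, cost_per_unit):
    """Consumption pricing: revenue scales with cost."""
    return units * (price_per_unit - cost_per_unit)

light_user = flat_margin(20, 5, 1.30)            # $13.50: profitable
heavy_user = flat_margin(20, 500, 1.30)          # -$630: loss grows with usage
metered_heavy = metered_margin(500, 1.60, 1.30)  # ~$150: margin holds at any volume
```

Under the flat model, the worst customers are the most engaged ones; under the metered model, heavy usage is the best outcome the vendor can have.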

Anthropic's 10x annual growth — from approximately $1 billion in ARR fifteen months ago to $19 billion today — reflects a bet on the bounded side of the token economy. Enterprise contracts price at consumption, not access. Revenue scales with cost. That's a different risk profile than what OpenAI is running with consumer ChatGPT.

Buyers are making the same calculation from the other side. In February 2026, Jack Dorsey announced that Block would shrink from 10,000 employees to under 6,000. His note to the company was direct about the cause:

"The intelligence tools we're creating and using, paired with smaller and flatter teams, are enabling a new way of working." Block isn't cutting because it can't afford AI. It's cutting because AI let it reduce the human cost of specific workflows — with the AI spend bounded to those workflows and billed accordingly. The enterprise relationship with AI is a cost control story. The consumer relationship is not.

The Demand Floor

On the other side of this calculation is the solo founder hitting the same economics from below.

The structure is identical to Sora's: the cost of running the AI is real and immediate, the revenue theoretical and deferred. The difference is that a founder like @kloss_xyz is choosing this trade and paying the compute bill directly. Sora's users weren't paying the compute cost at all; they were on a flat subscription, and the gap between what they paid and what they consumed was OpenAI's problem.

The demand for AI-augmented work is structural, not cyclical. @levelsio's calculation that an MVP now takes 24 minutes — "theoretically you can build 40 ideas per day," he wrote — reflects a real change in development throughput. That kind of throughput is token-intensive. The developers building at this speed aren't consuming chatbot-level inference. They're running agents, loops, and generations. They are the demand surge the $700 billion infrastructure bet is designed to serve. The question is whether the price they pay for access covers what it costs to serve them, or whether the cost gets absorbed somewhere else in the stack.

At Sora, it was absorbed in the P&L. The product couldn't find a model where user payment covered generation cost. Per-seat pricing at consumer rates was the wrong unit for a product where cost scales with output, not with access.

$2.1M

At $1.30 per 10-second clip, Sora's $2.1 million in lifetime revenue would have covered the generation cost of only about 1.6 million clips across fifteen months, in a product used by millions of people. Revenue per clip was effectively zero, because most users weren't paying at all. The product gave away at the subscription layer what cost $1.30 per unit to produce.

"Anthropic expects to break even in 2028, while OpenAI projects ~$74B in operating losses, or ~75% of revenue, that year before turning a profit in 2030" (Wall Street Journal, November 2025)

The token economy has a price floor: the cost of generating a response, minus what the user pays for it. When the user pays a flat fee that doesn't scale with consumption, the floor is the entire generation cost. At chatbot-level inference, that floor is survivable. At video-level inference, it isn't.
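The floor can be written as a one-line formula. The chat cost below is an illustrative assumption; the $1.30 figure is from the text, and a free user attributes $0 of revenue to any single response:

```python
# The price floor as defined above: generation cost minus the revenue a
# response earns. The $0.002 chat cost is an assumption; $1.30 is from the text.
def subsidy_per_response(generation_cost, revenue_attributed):
    """What the provider eats per response when payment doesn't scale with use."""
    return generation_cost - revenue_attributed

chat_subsidy = subsidy_per_response(0.002, 0.0)   # fractions of a cent: survivable
video_subsidy = subsidy_per_response(1.30, 0.0)   # the full $1.30, 650x larger
```

Same formula, same zero revenue term; only the generation cost changes, and it changes by nearly three orders of magnitude.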

The $700 billion in AI infrastructure spending is a bet that most use cases won't look like Sora — that inference will be cheap enough, and revenue models aligned enough with consumption, that the cost and revenue curves eventually cross. For OpenAI, the crossover is penciled in for 2030. For Anthropic, it comes sooner: enterprise contracts price at token consumption, not at access, so the curves already track each other. For Sora, the curves never crossed.
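The crossover bet can be made explicit with a stylized search. The 2026 starting point ($20B revenue, $34B cost) follows from the article's figures; the growth rates are hypothetical, and only the shape of the comparison matters:

```python
# Stylized crossover search for the bet described above. Starting figures
# follow the text ($20B revenue, $14B loss => ~$34B cost in 2026);
# the compound growth rates are hypothetical assumptions.
def crossover_year(revenue, cost, rev_growth, cost_growth,
                   start=2026, horizon=2040):
    """First year revenue covers cost under compound growth, else None."""
    for year in range(start, horizon + 1):
        if revenue >= cost:
            return year
        revenue *= rev_growth
        cost *= cost_growth
    return None

crossover_year(20, 34, 1.5, 1.3)   # -> 2030 with these toy rates
crossover_year(20, 34, 1.3, 1.3)   # -> None: equal growth never closes the gap
```

The structure makes the article's point: the crossover exists only if revenue compounds faster than cost, which is exactly the assumption a Sora-shaped product violates.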

$2.1M against $15M per day. The demo was excellent. The math wasn't there.

More on OpenAI, Anthropic, and AI economics. Explore entity coverage via the Pulse API.