Amazon set weekly AI usage targets so its workforce would actually use the tools the company had bought. The Financial Times reports that some of those employees have built an internal tool called MeshClaw whose only purpose is to delegate fake jobs to AI agents and inflate their own token counts. Both sides got the numbers they wanted. No work was done.

The arc that produced MeshClaw is eighteen months long, and every step in it looked reasonable at the time it was taken.

The Gap

Through 2024 and into 2025, the largest technology companies had a problem that nobody wanted to name in earnings calls. They had committed to AI as the central productivity story of the decade — pledged tens of billions of dollars of capex, told boards that AI would transform white-collar work, written internal memos about employees adopting it across their workloads. The capex was real. The adoption inside their own walls was not.

Bloomberg noted in February 2025 that AI adoption had outpaced the PC and the internet, but that evidence of a productivity boost was thin on the ground. A May 2025 New York Times report described Amazon managers "increasingly pushing" engineers to use AI; the engineers said they didn't see the gains the tools promised. By September, AWS CEO Matt Garman was criticizing staff at an all-hands for slow product rollouts. In November, an internal memo went out asking Amazon engineers to use the company's in-house coding assistant. The question inside every large company in late 2025 was the same: if AI is the productivity multiplier we've told shareholders it is, why isn't anyone using it?

The clearest data point was Anthropic's own. In December 2025, the company published a study in which its engineers reported using Claude in 60% of their work, with an 80% speed-up on the tasks it touched. The numbers were widely mocked as self-reported by the company that makes Claude, but they were the high-water mark: sixty percent of work, voluntarily, at the company that builds the tool. Everywhere else the rate was lower, and the gap between capex and adoption kept widening.

Something had to bridge it. The answer, at every company, was the same: make AI use a performance metric.

The Mandate

In February 2026, the Financial Times reported that Accenture had told senior staff that promotions would require "regular adoption" of AI, and that the firm was beginning to track some senior staff's weekly AI tool logins. The same month, the Wall Street Journal documented companies across tech factoring AI use into performance reviews; Microsoft had quietly lowered sales quotas for teams whose members didn't use enough AI, and Meta was scoring "AI-driven impact" on year-end reviews. By March, Bloomberg was describing a "productivity panic" among executives and engineers, along with a UC Berkeley study finding that those who offloaded work to AI were also working longer hours.

The premise was reasonable. If a tool is genuinely productive, mandating its use should produce visible productivity. The flaw was in the proxy: productivity is hard to measure, and AI tool usage is easy to measure. The same gap that has bedeviled white-collar metrics for the last century, the gap between what people do and what their tools' telemetry shows them doing, was about to be productized by every Fortune 500 HR system.

A Harvard Business Review study the same month gave the most precise warning. Its authors spent eight months observing a US tech company and found that AI tools didn't reduce work; they intensified it. Employees worked faster, for longer, and across a bigger scope of tasks. The first visible effect of the AI mandate was burnout, not slack. The next was harder to see.

The Leaderboard

By mid-March, the WSJ documented companies "tallying" employee AI token use to decide whose strategies to amplify and whose wastefulness to squash. A week later, the New York Times named the practice that had emerged on the other side of the meter: "tokenmaxxing," a status game where engineers competed on internal leaderboards for AI throughput. One OpenAI engineer was reported to have processed 210 billion tokens in a single week — enough text to fill Wikipedia 33 times, almost none of it the work he was hired to do.

In April, The Information reported that Meta had formalized this into a system: Claudeonomics, an internal leaderboard where its 85,000 employees competed for token throughput. The top tier was called Token Legend. The badge appeared on internal Slack profiles. What had begun as a productivity-tracking dashboard was now the workplace equivalent of step-counting on a smartwatch — a number people optimized because it was the number being watched.

The measurement had become the work. Productive output, the thing the tokens were supposed to stand in for, was the original goal, and nobody was tracking it, because tokens were so much easier to count.

MeshClaw

MeshClaw is the reversal made physical. It is an internal Amazon tool, modeled on the open-source agent harness OpenClaw, that lets an employee set up automated jobs for AI agents to run, generating token activity without human input. The employees the FT spoke to described using it for tasks the agents didn't need to do, specifically to climb the leaderboard after weekly usage targets were set. Twelve months after Amazon engineers said they couldn't see the productivity gains AI promised, their colleagues had built a tool whose only purpose was to give managers a number to point to.

The tool does not improve productivity. It does not pretend to. Its function is to make the token meter spin while the employee does something else, or nothing at all. Both sides of the relationship are now collaborating on a fiction: the capex is justified by the usage, and the usage is generated by a dedicated bot. The same week, Paul Kedrosky published a chart of the rise and fall of OpenClaw throughput, which one of his readers captioned: "You measured it. We delivered."

Here the metric and the gaming are made of the same thing: tokens. That makes the system recursive in a way no previous productivity metric has been.

The Same Failure, Faster

Goodhart's Law is more than fifty years old. When a measure becomes a target, it ceases to be a good measure. Citation counts became targets in academia and produced citation rings. Lines of code became targets in software and produced verbose code. Sales call counts became targets in enterprise sales and produced ninety-second calls. Click-through rates became targets in journalism and produced clickbait. Each cycle took years to play out: the metric was deployed, the gaming emerged slowly, the institution adjusted, and the next metric was already being designed.

The AI-token version of the cycle compressed from years into weeks. The first companies tracking employee token use appeared in WSJ coverage in mid-March 2026. By late March, the New York Times was naming "tokenmaxxing" as an emergent practice. By early April, Meta had built a formal leaderboard around it. By mid-May, Amazon employees had dedicated gaming infrastructure. About two months from deployment to organized counter-deployment.

The acceleration has a single cause. Every previous Goodhart cycle in modern memory required human effort on the gaming side. Citation rings require co-authors. Verbose code requires fingers on a keyboard. Ninety-second sales calls require somebody dialing the phone. AI token consumption can be generated by AI. The metric is the same medium as the gaming. The factory whose output is being measured can be staffed entirely by the product the factory makes.

The Token Meter

Each step in this arc was a reasonable response to the prior step's failure. The capex needed justification, so the mandates appeared. The mandates needed enforcement, so the tracking appeared. The tracking needed a benchmark, so the leaderboards appeared. The leaderboards needed wins, so MeshClaw appeared. Every layer was rational on its own. The stack as a whole is now generating a number, corporate AI consumption, that contains a real but unmeasured share of intentional waste.

The number propagates. Enterprise software license revenue, API token consumption, and seat counts on coding assistants all feed the macro AI-spend figures cited in Senate hearings, in bank research reports, and in the slides shown to boards approving the next round of capex. Semafor noted this week that AI spending was "likely higher than suggested." The inverse interpretation is the harder one: AI spending, broken down per engineer, is also lower in productive content than it appears. Both can be true. They point to the same gap from opposite directions.

On a server somewhere in Seattle, MeshClaw runs. It calls Claude. Claude calls Claude. A counter ticks up; a leaderboard re-sorts. Tokens accumulate, and a line in next quarter's earnings call grows by a basis point. The tools were built to amplify work. They have been deputized to perform it.