Chronicles

The story behind the story


Claude users accuse Anthropic of degrading Claude Opus 4.6's and Claude Code's performance; Anthropic staff publicly deny it degrades models to manage capacity

Related coverage:
  • Maria Garcia / Implicator.ai: Anthropic Ships Claude Code Routines, Cloud Automations That Run Without Your Mac
  • Leila Sheridan / Inc.com: Users Say Anthropic's Claude Is Getting Worse. A Quiet Change May Be to Blame
  • Craig Hale / TechRadar: 'Claude cannot be trusted to perform complex engineering tasks': AMD AI head slams Anthropic's coding tool after months of frustration

Carl Franzen / VentureBeat

Discussion

  • @bcherny Boris Cherny on x
    This is false.  We defaulted to medium as a result of user feedback about Claude using too many tokens.  When we made the change, we (1) included it in the changelog and (2) showed a dialog when you opened Claude Code so you could choose to opt out.  Literally nothing sneaky abou…
  • @trq212 @trq212 on x
    @Hesamation we don't degrade our models to better serve demand, have said this many times before
  • @petergyang Peter Yang on x
    My entire feed and the Claude subreddit is full of ppl saying opus got nerfed. Why would Anthropic nerf its own models?
  • @hesamation @hesamation on x
    AMD Senior AI Director confirms Claude has been nerfed. She analyzed Claude's session logs from January to March: > median thinking dropped from ~2,200 to ~600 chars > API requests went up 80x from Feb to Mar. less thinking and failed attempts meaning more retries, burning more …
  • @om_patel5 Om Patel on x
    SOMEONE ACTUALLY MEASURED HOW MUCH DUMBER CLAUDE GOT.  THE ANSWER IS 67%. the data shows Opus 4.6 is thinking 67% less than it used to. anthropic said nothing until the numbers went public. then suddenly Boris Cherny (creator of Claude Code) shows up on the GitHub issue. users ar…
  • @paul_cal Paul Calcraft on x
    Despicable clout chasing. They tested Opus today on 30 tasks, previous Opus 4.6 score was on just *6* tasks. DIFFERENT BENCHMARK 6 tasks in common results: 85.4% score today vs. 87.6% prev. Swing is mostly from a *single* fabrication without repeats - easily statistical noise [im…
  • @hesamation @hesamation on x
    this is not a new argument btw. Lex Fridman asked Dario Amodei the same question, “is Claude getting dumber?” back in 2024 for Sonnet 3.5. It's just surprising there's still not an open, transparent daily/weekly performance report of Codex & Claude similar to uptime services [vid…
  • @marcospereeira Marcos Pereira on x
    I think the anthropic people are gaslighting us, sidestepping questions and answering a different question in lawyer speak brand trust erosion speedrun any%
  • @benjamindekr Benjamin De Kraker on x
    “We defaulted to medium as a result of user feedback about Claude using too many tokens.”
  • @mycoliza @mycoliza on x
    > intelligence too cheap to meter > they meter it
  • @ns123abc Nik on x
    >Claude is this true? [image]
  • @tengyanai Teng Yan on x
    basically: anthropic sneakily turned down how hard claude thinks before editing code, changed the default from “high” to “medium” effort, and hid the reasoning from session logs. all without telling users. an amd director had 7k sessions of telemetry to prove the degradation
  • @holdforscott Scott on x
    I completely agree with her assessment that “Claude has regressed to the point it cannot be trusted to perform complex engineering”. There are two problems with building businesses on top of frontier models. The first is the cost and value capture. The frontier labs are aiming
  • @sasuke___420 @sasuke___420 on x
    is the intended meaning that changing the settings in a way that degrades the output isn't “degrading the model” (the model is the same, the settings are just tokens in the context!) or that it is “degrading the model” but that it was done for some other reason?
  • @daveshapi David Shapiro on x
    I feel like this is the kind of betrayal that ultimately lost OpenAI the lead.
  • @yunta_tsai Yun-Ta Tsai on x
    It reminded me of the early days of the Internet, when bandwidth was shared with neighbors. Then the upstairs neighbor would try to find the kids taking up all the bandwidth downloading porn or pirated files.
  • @mweinbach Max Weinbach on x
    @edzitron Sometimes they make tweaks to Claude Code's harness to try to improve something, and there's a weird byproduct of it reducing overall quality of output and burning tokens It happens, but mostly with Claude over the other models. Claude seems more sensitive to stuff like…
  • @ctrlaltdwayne Dwayne on x
    Anthropic allegedly doesn't degrade its models intentionally, and yet they seem to be the only lab that experiences this degradation issue going back to Claude 3 out of every other SOTA lab. Are the models inherently unstable and black box, or is there something more to this?
  • @edzitron Ed Zitron on x
    I have seen scattered reports of Claude burning more tokens, and it does seem like token burn increased on openrouter in this period too, wonder if 4.6 is also part of it?
  • @scaling01 @scaling01 on x
    the compute situation for Anthropic might be even worse than expected
  • @thestalwart Joe Weisenthal on x
    So how much of the limited (virtually non-existent) Mythos rollout a function of Anthropic's compute scarcity?
  • @ziwenxu_ Ziwen on x
    They're killing Claude on purpose: 200,000 tasks prove it. Someone analyzed the data. Claude isn't “having a bad day”. It's being systematically throttled. The Breakdown: > Stopped verifying: Used to check 6 times before acting. Now it glances once and wings it. > Sandbagging: [im…
  • @erikvoorhees Erik Voorhees on x
    Claude being nerf'd and agents being exiled from the $200/mo plan are very consistent behavior if Anthropic is going public soon and will have finances/margin intensely scrutinized...
  • @0xdeployer @0xdeployer on x
    lol vibe coders in shambles on the timeline because claude code switched the default reasoning effort to “medium”
  • @vraserx @vraserx on x
    So the “Anthropic secretly nerfed Claude” narrative looks like fake news. According to an Anthropic dev, they mainly stopped showing thinking summaries by default for latency, which distorted the measurements. That is not the same thing as secretly downgrading the model. [image]
  • @grummz @grummz on x
    This is not enough. Claude and Anthropic need to explain all the measured increases in hallucination and degradation as observed by AMD. You can't just handwave this away.
  • @zoink Dylan Field on x
    [video]
  • @trq212 @trq212 on x
    @Hesamation boris responded to this in depth in the issue- it's mostly just that we stopped showing thinking summaries for latency (you can opt-in to showing it) which was affecting the thinking measurement in the post https://github.com/...
  • @0xdevshah Dev Shah on x
    you can run the nerfing play once, maybe twice. but anthropic will silently degrade production models to farm failure data every time. and this is where it stops being a clever strategy and starts being a trust problem.
  • @hesamation @hesamation on x
    it's also fair to include @bcherny's reply under this issue [image]
  • @daniellefong Danielle Fong on x
    @petergyang additional guardrails => central anxiety vector => they don't really feel it because ANT = 1 with rich profiles.
  • @benbajarin Ben Bajarin on x
    Not enough compute is the correct take IMO. Which has quite a lot more implications if you think about it and play that out to its logical conclusion. Software still burdened by hardware's inability to keep up. Maybe software is 2-3 years ahead of hardware?
  • @patrickmoorhead Patrick Moorhead on x
    @petergyang They've run out of compute. CLI isn't nerfed, everything else is.
  • @carnage4life Dare Obasanjo on bluesky
    Imagine being one of those CEOs who laid off thousands over “AI efficiency” only for the AI to get dumber than a pile of bricks weeks later.  —  Given how tokens work, you're paying more for a worse product.  It now takes 5 minutes to be wrong which took 30 seconds for a right an…
  • r/ClaudeAI r on reddit
    Claude Performance and Bugs Megathread Ongoing (Sort this by New!)