
Chronicles

The story behind the story


HyperWrite's 70B-parameter AI model, Reflection, has its performance questioned after CEO Matt Shumer said something went wrong during its upload to Hugging Face

"Something got fucked up during the upload process. Will fix today." — Matt Shumer on X

VentureBeat Carl Franzen

Discussion

  • @shinboson on x
    A story about fraud in the AI research community: On September 5th, Matt Shumer, CEO of OthersideAI, announces to the world that they've made a breakthrough, allowing them to train a mid-size model to top-tier levels of performance. This is huge. If it's real. It isn't. [image]
  • @shinboson on x
    Matt starts making claims that there's something wrong with the API. There's something wrong with the upload. For *some* reason there's some glitch that's just about to be fixed. [image]
  • @shinboson on x
    tl;dr Matt Shumer is a liar and a fraud. Presumably he'll eventually throw some poor sap engineer under the bus and pretend he was lied to. Grifters shit in the communal pool, sucking capital, attention, and other resources away from people who could actually make use of them. [image]
  • @alexandr_wang Alexandr Wang on x
    The whole Reflection-70B debacle points to the desperate need for a better AI evaluation ecosystem. It needs to be extremely easy to adjudicate: (1) is the model overfit to benchmarks (2) is the model truly unique (i.e. not a wrapper or thin fine-tune)
  • @shinboson on x
    They get massive news coverage and are the talk of the town, so to speak. *If* this were real, it would represent a substantial advance in tuning LLMs at the *abstract* level, and could perhaps even lead to whole new directions of R&D. But soon, cracks appear in the story. [image]
  • @shinboson on x
    On September 7th, the first independent attempts to replicate their claimed results fail. Miserably, actually. The performance is awful. Further, it is discovered that Matt isn't being truthful about what the released model actually is based on under the hood. [image]
  • @shinboson on x
    But the thing about a private API is it's not really clear what it's calling on the backend. They could be calling a more powerful proprietary model under the hood. We should test and see. Trust, but verify. And it turns out that Matt is a liar. [image]
  • @borismpower Boris Power on x
    I got fooled by the Reflection 70B announcement. tl;dr - the model performs very badly
  • @mattshumer_ Matt Shumer on x
    We've figured out the issue. The reflection weights on Hugging Face are actually a mix of a few different models — something got fucked up during the upload process. Will fix today.
  • r/LocalLLaMA on reddit
    Smh: Reflection was too good to be true - reference article