HyperWrite's 70B-parameter AI model, Reflection, has its performance questioned after CEO Matt Shumer said something went wrong during its upload to Hugging Face
VentureBeat Carl Franzen
Discussion
-
@shinboson
@shinboson
on x
A story about fraud in the AI research community: On September 5th, Matt Shumer, CEO of OthersideAI, announces to the world that they've made a breakthrough, allowing them to train a mid-size model to top-tier levels of performance. This is huge. If it's real. It isn't. [image]
-
@shinboson
@shinboson
on x
Matt starts making claims that there's something wrong with the API. There's something wrong with the upload. For *some* reason there's some glitch that's just about to be fixed. [image]
-
@shinboson
@shinboson
on x
tl;dr Matt Shumer is a liar and a fraud. Presumably he'll eventually throw some poor sap engineer under the bus and pretend he was lied to. Grifters shit in the communal pool, sucking capital, attention, and other resources away from people who could actually make use of them. [i…
-
@alexandr_wang
Alexandr Wang
on x
The whole Reflection-70B debacle points to the desperate need for a better AI evaluation ecosystem. It needs to be extremely easy to adjudicate: (1) is the model overfit to benchmarks (2) is the model truly unique (i.e. not a wrapper or thin fine-tune)
-
@shinboson
@shinboson
on x
They get massive news coverage and are the talk of the town, so to speak. *If* this were real, it would represent a substantial advance in tuning LLMs at the *abstract* level, and could perhaps even lead to whole new directions of R&D. But soon, cracks appear in the story. [image…
-
@shinboson
@shinboson
on x
On September 7th, the first independent attempts to replicate their claimed results fail. Miserably, actually. The performance is awful. Further, it is discovered that Matt isn't being truthful about what the released model actually is based on under the hood. [image]
-
@shinboson
@shinboson
on x
But the thing about a private API is it's not really clear what it's calling on the backend. They could be calling a more powerful proprietary model under the hood. We should test and see. Trust, but verify. And it turns out that Matt is a liar. [image]
-
@borismpower
Boris Power
on x
I got fooled by the Reflection 70B announcement. tl;dr - the model performs very badly
-
@mattshumer_
Matt Shumer
on x
We've figured out the issue. The reflection weights on Hugging Face are actually a mix of a few different models — something got fucked up during the upload process. Will fix today.
-
r/LocalLLaMA
on reddit
Smh: Reflection was too good to be true - reference article