TEXXR

Chronicles

The story behind the story

HyperWrite's 70B-parameter AI model, Reflection, has its performance questioned after CEO Matt Shumer said something went wrong during its upload to Hugging Face

VentureBeat 2024-09-10 Carl Franzen

Discussion

@shinboson @shinboson on x
A story about fraud in the AI research community: On September 5th, Matt Shumer, CEO of OthersideAI, announces to the world that they've made a breakthrough, allowing them to train a mid-size model to top-tier levels of performance. This is huge. If it's real. It isn't. [image]
@shinboson @shinboson on x
Matt starts making claims that there's something wrong with the API. There's something wrong with the upload. For *some* reason there's some glitch that's just about to be fixed. [image]
@shinboson @shinboson on x
tl;dr Matt Shumer is a liar and a fraud. Presumably he'll eventually throw some poor sap engineer under the bus and pretend he was lied to. Grifters shit in the communal pool, sucking capital, attention, and other resources away from people who could actually make use of them. [i…
@alexandr_wang Alexandr Wang on x
The whole Reflection-70B debacle points to the desperate need for a better AI evaluation ecosystem. It needs to be extremely easy to adjudicate: (1) is the model overfit to benchmarks (2) is the model truly unique (i.e. not a wrapper or thin fine-tune)
@shinboson @shinboson on x
They get massive news coverage and are the talk of the town, so to speak. *If* this were real, it would represent a substantial advance in tuning LLMs at the *abstract* level, and could perhaps even lead to whole new directions of R&D. But soon, cracks appear in the story. [image…
@shinboson @shinboson on x
On September 7th, the first independent attempts to replicate their claimed results fail. Miserably, actually. The performance is awful. Further, it is discovered that Matt isn't being truthful about what the released model actually is based on under the hood. [image]
@shinboson @shinboson on x
But the thing about a private API is it's not really clear what it's calling on the backend. They could be calling a more powerful proprietary model under the hood. We should test and see. Trust, but verify. And it turns out that Matt is a liar. [image]
@borismpower Boris Power on x
I got fooled by the Reflection 70B announcement. tl;dr - the model performs very badly
@mattshumer_ Matt Shumer on x
We've figured out the issue. The reflection weights on Hugging Face are actually a mix of a few different models — something got fucked up during the upload process. Will fix today.
r/LocalLLaMA on Reddit
Smh: Reflection was too good to be true - reference article