VOICE ARCHIVE

Chase Brower

@chasebrowe32432
4 posts
2025-11-18
Gemini 3 Pro (preview) scores 91% on VPCT (spatial reasoning) Uhhhh jesus christ [image]
2025-11-18 View on X
The Verge

Google unveils Gemini 3, its “most intelligent” and “factually accurate” model yet, with improvements across coding and reasoning, and offering less “flattery”

The flagship Gemini 3 Pro model is coming to the Gemini app and Search, with improvements across coding, reasoning, and less ‘flattery.’

9to5Google

Google says Gemini 3 Pro scores 1,501 on LMArena, above 2.5 Pro, and demonstrates PhD-level reasoning with top scores on Humanity's Last Exam and GPQA Diamond

Google today announced Gemini 3 with the goal of bringing “any idea to life.”  The first model available in this family …

2025-04-08
This is the first time for any major LLM that I'm genuinely thinking they just straight up trained on the benchmark answers for the mainline benchmarks Llama 4 is failing spectacularly on like every 3rd party bench i've seen
2025-04-08 View on X
TechCrunch

Meta VP of Generative AI Ahmad Al-Dahle denies a rumor that the company trained Llama 4 Maverick and Scout on test sets, saying that Meta “would never do that”


The Verge

LMArena says it is updating its leaderboard policies after a Llama 4 Maverick version, which Meta said in fine print is not public, secured the number two spot

With Llama 4, Meta fudged benchmarks to appear as though its new AI model is better than the competition.