VOICE ARCHIVE

Chase Brower

@chasebrowe32432
4 posts
2025-11-18
Gemini 3 Pro (preview) scores 91% on VPCT (spatial reasoning) Uhhhh jesus christ [image]
2025-11-18 View on X
The Verge

Google unveils Gemini 3, its “most intelligent” and “factually accurate” model yet, with improvements across coding and reasoning, and offering less “flattery”

The flagship Gemini 3 Pro model is coming to the Gemini app and Search, with improvements across coding, reasoning, and less ‘flattery.’

9to5Google

Google says Gemini 3 Pro scores 1,501 on LMArena, above 2.5 Pro, and demonstrates PhD-level reasoning with top scores on Humanity's Last Exam and GPQA Diamond

Google today announced Gemini 3 with the goal of bringing “any idea to life.”  The first model available in this family …

2025-04-08
This is the first time for any major LLM that I'm genuinely thinking they just straight up trained on the benchmark answers for the mainline benchmarks Llama 4 is failing spectacularly on like every 3rd party bench i've seen
2025-04-08 View on X
TechCrunch

Meta VP of Generative AI Ahmad Al-Dahle denies a rumor that the company trained Llama 4 Maverick and Scout on test sets, saying that Meta “would never do that”


The Verge

LMArena says it is updating its leaderboard policies after a Llama 4 Maverick version, which Meta said in fine print is not public, secured the number two spot

With Llama 4, Meta fudged benchmarks to appear as though its new AI model is better than the competition.