Anthropic's Claude 3 Opus has surpassed OpenAI's GPT-4 on Chatbot Arena, a crowdsourced LLM leaderboard used by AI researchers; GPT-4 had held first place since it joined the leaderboard in May 2023
Anthropic's Claude 3 is first to unseat GPT-4 since launch of Chatbot Arena in May '23. — On Tuesday, Anthropic's Claude 3 …
Ars Technica Benj Edwards
Related Coverage
- LMSYS Chatbot Arena Leaderboard lmsys on Hugging Face
- GPT-4 loses its position as “best” LLM to Claude-3 in LMSYS benchmark TechSpot · Cal Jeffrey
- Chatbot Arena: Benchmarking LLMs in the Wild LMSYS Org
- Anthropic's Claude AI Overthrows ChatGPT on Chatbot Arena Leaderboard Decrypt · Jose Antonio Lanz
- Claude takes the top spot in AI chatbot ranking — finally knocking GPT-4 down to second place Tom's Guide · Ryan Morrison
- “The king is dead”-Claude 3 surpasses GPT-4 on Chatbot Arena Hacker News
- “The king is dead”—Claude 3 surpasses GPT-4 on Chatbot Arena for the first time Ars OpenForum
- NVIDIA Hopper Leaps Ahead in Generative AI at MLPerf NVIDIA · Dave Salvator
- New MLPerf Inference Benchmark Results Highlight The Rapid Growth of Generative AI Models MLCommons
- New AI benchmark tests speed of responses to user queries Reuters · Max A. Cherney
- Nvidia triples and Intel doubles generative AI inference performance on new MLPerf benchmark VentureBeat · Sean Michael Kerner
- New AI benchmark test results prove rapid growth for genAI models Cybernews.com · Stefanie Schappert
- Nvidia, Intel tout marks in MLPerf benchmarks running Llama 2 70B Fierce Electronics · Matt Hamblen
- Llama 2 70B: An MLPerf Inference Benchmark for Large Language Models MLCommons
- Nvidia Hopper H200 breaks MLPerf benchmark record with TensorRT — no Blackwell submissions yet, sorry Tom's Hardware · Aaron Klotz
- MLPerf Inference v4.0: NVIDIA Reigns Supreme, Intel Shows Impressive Performance Gains Maginative · Chris McKay
- Nvidia Sweeps AI Benchmarks While AMD Misses The Boat. Again. Forbes · Karl Freund
- Intel Gaudi 2 Remains Only Benchmarked Alternative to NV H100 for GenAI Performance Intel
- NVIDIA MLPerf Inference v4.0 is Out ServeTheHome · Cliff Robinson
- NVIDIA Hopper H200 GPU Continues To Dominate In Latest MLPerf 4.0 Results: Up To 3x Gain In GenAI With TensorRT-LLM Wccftech · Hassan Mujtaba
- Intel Gaudi 2 Accelerators Showcase Competitive Performance Per Dollar Against NVIDIA H100 In MLPerf 4.0 GenAI Benchmarks Wccftech · Hassan Mujtaba
Discussion
-
@lmsysorg
on x
[Arena Update] 70K+ new Arena votes🗳️ are in! Claude-3 Haiku has impressed all, even reaching GPT-4 level by our user preference! Its speed, capabilities & context length are unmatched now in the market🔥 Congrats @AnthropicAI on the incredible Claude-3 launch! More exciting... [i…
-
@skirano
Pietro Schirano
on x
Honestly, the wildest thing about this whole Claude 3 > GPT-4 is how easy it is to just... switch?? I've rarely used ChatGPT since the day Opus launched, or the OA APIs. There's no “stickiness” in AI experiences, at least not yet. Not until better agentic frameworks drop.
-
@nickadobos
Nick Dobos
on x
The king is dead RIP GPT-4 Claude opus #1 ELo Haiku beats GPT-4 0613 & Mistral large That's insane for how cheap & fast it is [image]
-
@nickadobos
Nick Dobos
on x
Sonnet is free in the Claude website Compare vs gpt 3.5 That's amazing [image]
-
@benjedwards
Benj Edwards
on x
For the first time since it appeared on the Chatbot Arena in May 2023, reigning champ GPT-4 (and family) has been surpassed in #1 ranking Anthropic's Claude 3 Opus is now the top-ranked LLM on the leaderboard, GPT-4 Turbo is #2. https://arstechnica.com/...
-
@lmsysorg
on x
Links & plots: - Vote @ https://chat.lmsys.org/ - Leaderboard https://huggingface.co/... - CI on model strength [image]
-
@sullyomarr
on x
Looks like GPT4 has been officially overthrown. It did pretty well considering it's nearly a 2 year old model. But the real question is how long till we see gpt4.5/gpt5?
-
@max_paperclips
Shannon Sands
on x
Haiku is honestly the biggest piece of news here - it's the “cheap and fast model”, analogous to GPT-3.5, except it's actually as good as earlier model 4. That's absolutely nuts.
-
r/singularity
on reddit
“The king is dead”—Claude 3 surpasses GPT-4 on Chatbot Arena for the first time | Ars Technica - Benj Edwards | …
-
@aiatmeta
on x
Announced today: @MLCommons is adopting Meta Llama 2 70B for MLPerf Inference v4.0 ➡️ https://mlcommons.org/... The benchmark is a standard for measuring ML & AI performance across domains and we're excited to support the community in using Llama 2 as part of the benchmark suite.
-
@tonymongkolsmai
Tony Mongkolsmai
on x
@MLPerf results are back baby! Always impressed by my colleagues pushing out performance on the #IntelGaudi 2 AI Accelerators. MLPerf submissions are hard, you have to get it working and make it fast. Two things that aren't trivial when you talk about the scale of things like...
-
@mlcommons
on x
@MLPerf Inference v4.0 results are out! This round includes two new benchmarks focused on gen AI: @Meta's Llama 2 70B model and @StableDiffusion XL. See the complete results and learn more: https://mlcommons.org/... #GenAI #LLM
-
@nvidiadc
on x
In the latest #MLPerf benchmarks, NVIDIA H200 Tensor Core GPUs running TensorRT-LLM software delivered the fastest Llama 2 70B inference performance in MLPerf's biggest test of #generativeAI to date. https://blogs.nvidia.com/...
-
@mlcommons
on x
The @MLPerf Inference v4.0 benchmark suite includes our largest model to date, @Meta's Llama 2 70B large language model with more than 70 billion parameters. Learn more about the selection process, and performance metrics in the benchmark: https://mlcommons.org/... #GenAI
-
@typewriters
Lauren Wagner
on x
One of the best things I've done all year is collaborate with @MLCommons on AI governance and benchmarking They're my favorite kinds of people to work with: pragmatic, optimistic about the future of technology and peoples' ability to shape it, and focused on building solutions
-
@intel
on x
The @MLPerf results are in! We're raising the bar with competitive solutions for your high-performance, high-efficiency deep learning inference needs — even on challenging LLMs. Read more about the results. https://www.intel.com/... #IntelXeon #IntelGaudi #Intel [video]