TEXXR

Chronicles

The story behind the story


Anthropic's Claude 3 Opus surpassed OpenAI's GPT-4 on Chatbot Arena, a crowdsourced LLM leaderboard used by AI researchers; GPT-4 has been first since launch

Anthropic's Claude 3 is first to unseat GPT-4 since launch of Chatbot Arena in May '23.  —  On Tuesday, Anthropic's Claude 3 …

Ars Technica · Benj Edwards
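For context on the ranking being discussed: Chatbot Arena scores models from crowdsourced head-to-head votes using an Elo-style rating (the leaderboard's actual methodology is a Bradley-Terry-style estimation with confidence intervals; the classic online Elo update below is a simplified illustrative sketch, with all numbers hypothetical):

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Modeled probability that model A beats model B, given Elo ratings."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a: float, r_b: float, a_won: bool, k: float = 32.0):
    """Return both models' updated ratings after one head-to-head vote."""
    e_a = expected_score(r_a, r_b)
    s_a = 1.0 if a_won else 0.0
    # Winner gains what the loser sheds; upsets move ratings the most.
    return r_a + k * (s_a - e_a), r_b + k * ((1.0 - s_a) - (1.0 - e_a))

# Hypothetical example: a slightly lower-rated challenger beats the incumbent.
challenger, incumbent = elo_update(1240.0, 1260.0, a_won=True)
```

Because the update is zero-sum and weighted by surprise, a new model that keeps winning votes against the long-time leader climbs past it quickly, which is how a #1 spot changes hands on the leaderboard.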

Discussion

  • @lmsysorg @lmsysorg on x
    [Arena Update] 70K+ new Arena votes🗳️ are in! Claude-3 Haiku has impressed all, even reaching GPT-4 level by our user preference! Its speed, capabilities & context length are unmatched now in the market🔥 Congrats @AnthropicAI on the incredible Claude-3 launch! More exciting... [i…
  • @skirano Pietro Schirano on x
    Honestly, the wildest thing about this whole Claude 3 > GPT-4 is how easy it is to just... switch?? I've rarely used ChatGPT since the day Opus launched, or the OA APIs. There's no “stickiness” in AI experiences, at least not yet. Not until better agentic frameworks drop.
  • @nickadobos Nick Dobos on x
    The king is dead. RIP GPT-4. Claude Opus #1 Elo. Haiku beats GPT-4 0613 & Mistral Large. That's insane for how cheap & fast it is [image]
  • @nickadobos Nick Dobos on x
    Sonnet is free on the Claude website. Compare vs GPT-3.5. That's amazing [image]
  • @benjedwards Benj Edwards on x
    For the first time since it appeared on the Chatbot Arena in May 2023, reigning champ GPT-4 (and family) has been surpassed in #1 ranking Anthropic's Claude 3 Opus is now the top-ranked LLM on the leaderboard, GPT-4 Turbo is #2. https://arstechnica.com/...
  • @lmsysorg @lmsysorg on x
    Links & plots: - Vote @ https://chat.lmsys.org/ - Leaderboard https://huggingface.co/... - CI on model strength [image]
  • @sullyomarr @sullyomarr on x
    Looks like GPT4 has been officially overthrown. It did pretty well considering it's nearly a 2 year old model. But the real question is how long till we see gpt4.5/gpt5?
  • @max_paperclips Shannon Sands on x
    Haiku is honestly the biggest piece of news here - it's the “cheap and fast model”, analogous to GPT-3.5, except it's actually as good as earlier GPT-4 models. That's absolutely nuts.
  • r/singularity on reddit
    “The king is dead”—Claude 3 surpasses GPT-4 on Chatbot Arena for the first time |  Ars Technica - Benj Edwards | …
  • @aiatmeta @aiatmeta on x
    Announced today: @MLCommons is adopting Meta Llama 2 70B for MLPerf Inference v4.0 ➡️ https://mlcommons.org/... The benchmark is a standard for measuring ML & AI performance across domains and we're excited to support the community in using Llama 2 as part of the benchmark suite.
  • @tonymongkolsmai Tony Mongkolsmai on x
    @MLPerf results are back baby! Always impressed by my colleagues pushing out performance on the #IntelGaudi 2 AI Accelerators. MLPerf submissions are hard, you have to get it working and make it fast. Two things that aren't trivial when you talk about the scale of things like...
  • @mlcommons @mlcommons on x
    @MLPerf Inference v4.0 results are out! This round includes two new benchmarks focused on gen AI: @Meta's Llama 2 70B model and @StableDiffusion XL. See the complete results and learn more: https://mlcommons.org/... #GenAI #LLM
  • @nvidiadc @nvidiadc on x
    In the latest #MLPerf benchmarks, NVIDIA H200 Tensor Core GPUs running TensorRT-LLM software delivered the fastest Llama 2 70B inference performance in MLPerf's biggest test of #generativeAI to date. https://blogs.nvidia.com/...
  • @mlcommons @mlcommons on x
    The @MLPerf Inference v4.0 benchmark suite includes our largest model to date, @Meta's Llama 2 70B large language model with more than 70 billion parameters. Learn more about the selection process, and performance metrics in the benchmark: https://mlcommons.org/... #GenAI
  • @typewriters Lauren Wagner on x
    One of the best things I've done all year is collaborate with @MLCommons on AI governance and benchmarking They're my favorite kinds of people to work with: pragmatic, optimistic about the future of technology and peoples' ability to shape it, and focused on building solutions
  • @intel @intel on x
    The @MLPerf results are in! We're raising the bar with competitive solutions for your high-performance, high-efficiency deep learning inference needs — even on challenging LLMs. Read more about the results. https://www.intel.com/... #IntelXeon #IntelGaudi #Intel [video]