TEXXR

Chronicles

The story behind the story


Anthropic's Claude 3 Opus surpassed OpenAI's GPT-4 on Chatbot Arena, a crowdsourced LLM leaderboard used by AI researchers; GPT-4 has been first since launch

Anthropic's Claude 3 is first to unseat GPT-4 since launch of Chatbot Arena in May '23.  —  On Tuesday, Anthropic's Claude 3 …

Ars Technica · Benj Edwards
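For context on the ranking being discussed: Chatbot Arena scores models from crowdsourced head-to-head votes using an Elo-style rating (the leaderboard's actual methodology is a Bradley-Terry-style estimation with confidence intervals; the classic online Elo update below is a simplified illustrative sketch, with all numbers hypothetical):

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Modeled probability that model A beats model B, given Elo ratings."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a: float, r_b: float, a_won: bool, k: float = 32.0):
    """Return both models' updated ratings after one head-to-head vote."""
    e_a = expected_score(r_a, r_b)
    s_a = 1.0 if a_won else 0.0
    # Winner gains what the loser sheds; upsets move ratings the most.
    return r_a + k * (s_a - e_a), r_b + k * ((1.0 - s_a) - (1.0 - e_a))

# Hypothetical example: a slightly lower-rated challenger beats the incumbent.
challenger, incumbent = elo_update(1240.0, 1260.0, a_won=True)
```

Because the update is zero-sum and weighted by surprise, a new model that keeps winning votes against the long-time leader climbs past it quickly, which is how a #1 spot changes hands on the leaderboard.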

Discussion

  • @lmsysorg @lmsysorg on x
    [Arena Update] 70K+ new Arena votes🗳️ are in! Claude-3 Haiku has impressed all, even reaching GPT-4 level by our user preference! Its speed, capabilities & context length are unmatched now in the market🔥 Congrats @AnthropicAI on the incredible Claude-3 launch! More exciting... [i…
  • @skirano Pietro Schirano on x
    Honestly, the wildest thing about this whole Claude 3 > GPT-4 is how easy it is to just... switch?? I've rarely used ChatGPT since the day Opus launched, or the OA APIs. There's no “stickiness” in AI experiences, at least not yet. Not until better agentic frameworks drop.
  • @nickadobos Nick Dobos on x
    The king is dead. RIP GPT-4. Claude Opus #1 Elo. Haiku beats GPT-4 0613 & Mistral Large. That's insane for how cheap & fast it is [image]
  • @nickadobos Nick Dobos on x
    Sonnet is free on the Claude website. Compare vs GPT-3.5. That's amazing [image]
  • @benjedwards Benj Edwards on x
    For the first time since it appeared on the Chatbot Arena in May 2023, reigning champ GPT-4 (and family) has been surpassed in #1 ranking Anthropic's Claude 3 Opus is now the top-ranked LLM on the leaderboard, GPT-4 Turbo is #2. https://arstechnica.com/...
  • @lmsysorg @lmsysorg on x
    Links & plots: - Vote @ https://chat.lmsys.org/ - Leaderboard https://huggingface.co/... - CI on model strength [image]
  • @sullyomarr @sullyomarr on x
    Looks like GPT4 has been officially overthrown. It did pretty well considering it's nearly a 2 year old model. But the real question is how long till we see gpt4.5/gpt5?
  • @max_paperclips Shannon Sands on x
    Haiku is honestly the biggest piece of news here - it's the “cheap and fast model”, analogous to GPT-3.5, except it's actually as good as earlier GPT-4 models. That's absolutely nuts.
  • r/singularity on reddit
    “The king is dead”—Claude 3 surpasses GPT-4 on Chatbot Arena for the first time |  Ars Technica - Benj Edwards | …
  • @aiatmeta @aiatmeta on x
    Announced today: @MLCommons is adopting Meta Llama 2 70B for MLPerf Inference v4.0 ➡️ https://mlcommons.org/... The benchmark is a standard for measuring ML & AI performance across domains and we're excited to support the community in using Llama 2 as part of the benchmark suite.
  • @tonymongkolsmai Tony Mongkolsmai on x
    @MLPerf results are back baby! Always impressed by my colleagues pushing out performance on the #IntelGaudi 2 AI Accelerators. MLPerf submissions are hard, you have to get it working and make it fast. Two things that aren't trivial when you talk about the scale of things like...
  • @mlcommons @mlcommons on x
    @MLPerf Inference v4.0 results are out! This round includes two new benchmarks focused on gen AI: @Meta's Llama 2 70B model and @StableDiffusion XL. See the complete results and learn more: https://mlcommons.org/... #GenAI #LLM
  • @nvidiadc @nvidiadc on x
    In the latest #MLPerf benchmarks, NVIDIA H200 Tensor Core GPUs running TensorRT-LLM software delivered the fastest Llama 2 70B inference performance in MLPerf's biggest test of #generativeAI to date. https://blogs.nvidia.com/...
  • @mlcommons @mlcommons on x
    The @MLPerf Inference v4.0 benchmark suite includes our largest model to date, @Meta's Llama 2 70B large language model with more than 70 billion parameters. Learn more about the selection process, and performance metrics in the benchmark: https://mlcommons.org/... #GenAI
  • @typewriters Lauren Wagner on x
    One of the best things I've done all year is collaborate with @MLCommons on AI governance and benchmarking They're my favorite kinds of people to work with: pragmatic, optimistic about the future of technology and peoples' ability to shape it, and focused on building solutions
  • @intel @intel on x
    The @MLPerf results are in! We're raising the bar with competitive solutions for your high-performance, high-efficiency deep learning inference needs — even on challenging LLMs. Read more about the results. https://www.intel.com/... #IntelXeon #IntelGaudi #Intel [video]