Claude 3.5 Sonnet appears to be a tremendous leap for Anthropic and LLMs generally, and shows that AI model makers' performance gains are not slowing down

Carl Franzen / VentureBeat, 2024-06-21

Discussion

@skirano Pietro Schirano on x
Claude 3.5 Sonnet + Maestro = Sparks of AGI? I asked to make a Mario clone using just geometric shapes, and the wildest part is that it gave the character animations as well, and the shapes seem like novel concepts. It took 3 minutes. Look at the game! [video]
@alliekmiller Allie K. Miller on x
This is wild. In just 25 seconds, Claude 3.5 Sonnet coded a fully functional Mancala web app for me 🕹️ I only provided ONE screenshot of the game's instructions. It did the rest: - Coded the entire game - Previewed it so I could test - Provided rules of play [video]
@altryne Alex Volkov on x
there's model releases, and there's model releases in MY TOOLS! 😮
@simonw Simon Willison on x
Correction: earlier I said Sonnet 3.5 was half the price of Opus - it's actually 1/5th of the price, $3/million input, $15/million output, compared to Opus, which is $15/$75. GPT-4o is $5/$15, so the new Sonnet 3.5 undercuts it a little on input token cost.
@krishnanrohit Rohit on x
I'm probably gonna switch subs to Claude now I think, its too good now and cant keep waiting for GPT-5
@polynoamial Noam Brown on x
Frontier models like GPT-4o (and now Claude 3.5 Sonnet) may be at the level of a “Smart High Schooler” in some respects, but they still struggle on basic tasks like tic-tac-toe. There was hope that native multimodal training would help but that hasn't been the case. [image]
@kimmonismus @kimmonismus on x
Hey, @OpenAI. You sleep through AGI. While you make promises all the time ("Patience Jimmy, it will be worth the wait") and announce without delivering ("GPT-4o-Voice within weeks") the competition manages to deliver without making big announcements beforehand! Take a leaf out of…
@literallydenis Denis Shiryaev on x
Claude 3.5 Sonnet is the first model to recreate the 3D scene “Data flow” from the movie Hackers on the first try. Great job, Anthropic! [video]
@testingcatalog @testingcatalog on x
Claude 3.5 just generated React jsx code with a simple contact form and managed to run it in the Artifacts playground 🤯 [image]
@emollick Ethan Mollick on x
Been using the new Claude 3.5 model as a tester and now that it is out, I can say it is very very impressive, and the “artifacts” that it generates are like a simpler version of Code Interpreter This is a real-time video of me creating a playable game and editing it with Claude […
@yomaggievo Maggie Vo on x
half my job is doable by 3.5 Sonnet now and I couldn't be happier
@binarybits Timothy B. Lee on x
So far I'm impressed with Claude 3.5 overall but it still makes goofy errors sometimes. [image]
@ajassy Andy Jassy on threads
Anthropic's latest model, Claude 3.5 Sonnet, just released on Amazon Bedrock today. This model combines the speed and affordability customers have really liked thus far from the original Sonnet model, but with much better performance. …
@luokai @luokai on threads
@anthropicai has released its latest model: Claude 3.5 Sonnet. This model operates at twice the speed of Claude 3 Opus, with one-fifth the cost. Here are some key points : - Artifacts Feature: Claude 3.5 Sonnet has introduced a new feature that allows users to interact with cod…
@anthropicai @anthropicai on x
Introducing Claude 3.5 Sonnet—our most intelligent model yet. This is the first release in our 3.5 model family. Sonnet now outperforms competitor models on key evaluations, at twice the speed of Claude 3 Opus and one-fifth the cost. Try it for free: https://claude.ai/ [image]
@mikeyk Mike Krieger on x
Thrilled to introduce Claude 3.5 Sonnet—our most intelligent model yet. This is the first release in our 3.5 model family. Sonnet now outperforms competitor models on key evaluations, at twice the speed of Claude 3 Opus and one-fifth of the cost. The team pulled together... [imag…
@simonw Simon Willison on x
This model is really, really good - I think this is the new best overall model (and both faster and half the price of Opus, similar to the GPT-4 Turbo to GPT-4o jump)
@emollick Ethan Mollick on x
It is pretty thrilling to use a tool that is this fast and responsive and willing to roll with it: “Claude 3.5, build me a game as an workable prototype that teaches about opportunity cost, but is an arcade game with lovecraftian elements” “Make it better” etc. All realtime. [vid…
@maxwinebach Max Weinbach on x
Claude 3.5 Sonnet is so good I'm already hitting my rate limits because I keep giving it more and more to do each time how do I get rid of rate limits in Claude Pro lol
@adamscochran Adam Cochran on x
Claude's new sandbox feature and new model are kicking the ass of the GPT marketplace. Altman is probably going to feel a lot of pressure to push out V5 here. The leap between 4o and V5 will tell us a lot about the economy in the months to come. If like many speculate, OpenAI...
@aravsrinivas Aravind Srinivas on x
Claude 3.5 is now available on Perplexity Pro. In our internal evaluations, it's outperformed GPT 4o. Try it out! [image]
@alexalbert__ Alex Albert on x
Claude is starting to get really good at coding and autonomously fixing pull requests. It's becoming clear that in a year's time, a large percentage of code will be written by LLMs. Let me show you what I mean:
@emollick Ethan Mollick on x
I wouldn't be surprised if Claude 3.5 Sonnet takes the top spot (for now) on the leaderboards. But note the pattern with current models - Gemini, GPT and Claude are now all running smaller, faster, cheaper next-gen models at GPT-4 level, saying bigger versions are coming soon.
@blader Siqi Chen on x
the sheer pettiness of anthropic saying “good evening, sam” in every single one of their demo videos for sonnet 3.5 sends me 💀 how many more days will “sam” sit on gpt5? [image]
@basedbeffjezos @basedbeffjezos on x
Can you feel it, anon? [image]
@pelzeric Eric Pelz on x
We've been testing this model @asana, and we were immediately impressed by its latency and accuracy. But as we dug deeper, we were amazed by its deep understanding of complex context in a way that we hadn't seen in other models. We'll have to up-level our eval grading rubrics!
@eugeneyan Eugene Yan on x
Models will get better/cheaper, and swapping could be as simple as updating the model id. Thus, focus on the durable components of your system, such as evals, guardrails, data flywheel for finetuning, serving infra if self-hosting, etc. https://applied-llms.org/... [image]
@chriscundy Chris Cundy on x
Improving to 60% on GPQA from the previous SoTA of 54% is *really* impressive — the GPQA questions are very difficult! (That is, assuming no test-set leakage...)
@matthewclifford Matt Clifford on x
Impressive from Anthropic... plus UK state capacity in action! [image]
@alexalbert__ Alex Albert on x
Claude 3.5 Sonnet is now available to @AnthropicAI devs everywhere. It's our best model yet - smarter than Claude 3 Opus and twice as fast. And it costs just $3 per million input tokens and $15 per million output tokens. [image]
@hursh Hursh Agrawal on x
We've been blown away by Claude 3.5 Sonnet! It's fast, really incredible at writing, and feels better than any other model at instruction following and reasoning. Congrats to the @AnthropicAI team! What a feat!
@janleike Jan Leike on x
I like the new Sonnet. I'm frequently asking it to explain ML papers to me. Doesn't always get everything right, but probably better than my skim reading, and way faster. Automated alignment research is getting closer...
@andy_l_jones Andy Jones on x
A small part of the 3.5 launch I'm especially excited by - the @AISafetyInst tested 3.5 pre-release! AFAIK this is the first time a government's assessed a frontier model before its release. [image]
@alex @alex on x
back to AI progress, feels good
@joandthezhus Jo Zhu on x
We just released Claude 3.5, a new model from @AnthropicAI that surpasses GPT-4o on multiple benchmarks while being 2x faster and 80% cheaper than Claude 3 Opus. The model is already available for free on https://claude.ai/ and the Claude iOS app. Take it... [image]
@anthropicai @anthropicai on x
Claude 3.5 Sonnet is now our strongest vision model. Sonnet now surpasses Claude 3 Opus across all standard vision benchmarks. Improvements are most noticeable in tasks requiring visual reasoning, like interpreting charts, graphs, or transcribing text from imperfect images. [vide…
@robinai_uk @robinai_uk on x
We hope you love @AnthropicAI 's latest model as much as we do We've been testing Claude Sonnet 3.5 and it has significantly enhanced our Legal AI Assistant's capabilities - it outperformed Opus or GPT4o in our testing Check it out ⤵️ https://www.anthropic.com/... [image]
@jasondclinton Jason D. Clinton on x
Incredibly excited to share this model with the world. I've been extremely impressed by its intelligence: a global, new high watermark. A huge range of new features and understanding are in the release. Check out the game authoring demo on the blog. https://www.anthropic.com/...
@anthropicai @anthropicai on x
Claude 3.5 Sonnet sets new industry benchmarks for graduate-level reasoning (GPQA), undergraduate-level knowledge (MMLU), and coding proficiency (HumanEval). It shows marked improvement in grasping nuance, humor, and complex instructions, all while writing with a natural tone. [v…
@menloventures @menloventures on x
Congratulations to the @anthropicAI team on launching Claude 3.5 Sonnet! State-of-the-art vision capabilities, improved reasoning, and a new ‘Artifacts’ feature. Exciting times in the AI world! https://www.anthropic.com/...
@deedydas Deedy on x
Claude 3.5 is here. Sonnet is the first release and has: — 2x the speed of Opus — 1/5th the cost of Opus — 200k token context window — Better quality than Opus and GPT-4o I don't trust benchmarks so I tried a Physics q that GPT 4-o failed and Sonnet nailed it. Insane launch. [ima…
@garymarcus Gary Marcus on x
When did AI stop being a science? You can't conclude that Claude 3.5 is “better than 4o” when there are no error bars, and GPT 4o actually did better than Claude in 2 of the 6 comparisons. (On a t-test someone ran, they aren't even different).* And no, a single anecdotal physics.…
@binarybits Timothy B. Lee on x
Feel the AGI. [image]
@8teapi @8teapi on x
Hmm that means the pipeline is different.
@mattshumer_ Matt Shumer on x
Claude 3.5 Sonnet has a super recent knowledge cutoff. This is super useful for many applications: [image]
@krishnanrohit Rohit on x
Trying Claude Haiku to extract 18 items from a long-ish text string in Json mode, using instructor. It seems to hallucinate a few items consistently, esp if you run it repeatedly, which is very frustrating. Tips?
@shakeelhashim Shakeel on x
Anthropic confirms that the UK AI Safety Institute tested Claude 3.5 Sonnet before deployment, and shared its results with the US AISI. [image]
@lexnfx Alexei Oreskovic on x
Anthropic's rivalry with OpenAI heats up with its claim new Claude AI surpasses GPT-4o https://fortune.com/...
@binarybits Timothy B. Lee on x
All three leading foundation models, GPT-4o, Gemini (1.5 Pro?), and Claude 3.5 seem to be able to count 5 pieces of fruit reliably. Six months ago none of the leading models could do this. [image]
@natfriedman Nat Friedman on x
We're gonna need some new benchmarks, fellas [image]
@sammcallister Sam Mcallister on x
powerful, fast, or safe? pick three. [image]
@mmarshall Matt Marshall on x
BREAKING: @Anthropic just dropped Claude 3.5 Sonnet, which outperforms @OpenAI's GPT-4o on most measures, and at “half the price.” We have a new top dog in the #genAI race! Here's everything else happening, including MSFT's drop of Florence-2, a game-changing vision model.
@altryne Alex Volkov on x
#BREAKING news - Anthropic is gunning for GPT-4o with the middle child, Claude Sonnet 3.5! Anthropic releases their first of the 3.5 family, 3.5 sonnet (aka the middle child) is beating GPT-4o at multiple benchmarks, including HumanEval!? and is 5 times cheaper than Opus 😮
@aravsrinivas Aravind Srinivas on x
That was quick! [image]
@mpsellitto Michael Sellitto on x
Excited to release Claude 3.5 Sonnet today - the top performing model across a number of benchmarks In addition to our internal predeployment testing, we were also pleased to work with the UK AI Safety Institute
@jackclarksf Jack Clark on x
Very pleased to have done a pre-dep test of Sonnet 3.5!
@alphasignalai Lior on x
Anthropic just released Claude 3.5, a new model that surpasses GPT-4o on multiple benchmarks while being 2x faster and 80% cheaper than Claude 3 Opus. The model is already available for free on https://claude.ai/ and the Claude iOS app. [image]
r/OpenAI on reddit
GPT-4o's closest competitor: Claude 3.5 Sonnet
