OpenAI says GPT‑5.2 Thinking beats or ties industry professionals on 70.9% of GDPval knowledge work tasks, delivering outputs at >11x the speed and <1% the cost
OpenAI eyes January exit from “code red” John Werner / Forbes : The Wonder And The Promise Of GPT 5.2 Is Here Benj Edwards / Ars Technica : OpenAI releases GPT-5.2 after “code red” Google threat alert...
OpenAI launches AgentKit, a toolkit for building and deploying AI agents, including Agent Builder, which Sam Altman described as like Canva for building agents
New tools for building, deploying, and optimizing agents. NDTV Profit : What Is AI Agent Builder And How Does It Work? OpenAI Launches New Set Of Tools For Developers Aman Gupta / Livemint : OpenAI ag...
GPT-5 hands-on: it exudes competence but doesn't feel like a dramatic leap ahead of other LLMs, and the pricing is aggressively competitive with other providers
And It Changes Everything Tyler Cowen / Marginal Revolution : GPT-5, a short and enthusiastic review GPT-5 : GPT-5 — Our hands-on review of OpenAI's newest model based on weeks of testing — The Ve...
OpenAI releases gpt-oss-120b and gpt-oss-20b, its first open-weight models since GPT-2; the smaller gpt-oss-20b can run locally on a device with 16GB+ of RAM
gpt-oss-120b and gpt-oss-20b push the frontier of open-weight reasoning models Simon Willison / Simon Willison's Weblog : OpenAI's new open weight (Apache 2) models are really good OpenAI on GitHub : ...
Anthropic releases Claude 3.7 Sonnet, a hybrid model that can produce fast responses or extended, step-by-step thinking, and Claude Code, an agentic coding tool
and it could be a game changer Ghacks : Anthropic Unveils Claude 3.7: First Hybrid Reasoning AI Model Rowan Cheung / The Rundown AI : Claude enters the reasoning era Siddharth Jindal / Analytics India...
OpenAI unveils o3 and o3-mini, trained to “think” before responding via what OpenAI calls a “private chain of thought”, and plans to launch them in early 2025
12 Days of OpenAI: Day 12 Naomi Li Gan / Tech in Asia : OpenAI unveils AI model for advanced reasoning Bojan Stojkovski / Interesting Engineering : OpenAI unveils o3 reasoning AI model to tackle compl...
HyperWrite CEO unveils Reflection 70B, based on Llama 3.1 70B Instruct and trained using reflection-tuning, and says it beats GPT-4o in all benchmarks tested
There's a new king in town: Matt Shumer, co-founder and CEO of AI writing startup HyperWrite, today unveiled Reflection 70B …
OpenAI launches GPT-4o mini, a smaller, cheaper offshoot of GPT-4o, replacing GPT-3.5 Turbo, with support for all multimodal inputs and outputs coming soon
each step more refined Aiming for the ‘perfect training set’ - the ultimate concentrate of all human knowledge and creativity Whether Gemini 1.5 Jeff Harris / @jeffintime : hidden gem: GPT-4o mini sup...
Anthropic announces Claude 3 Opus, Sonnet, and Haiku, aiming to reduce AI model hallucinations; Opus and Sonnet are available now, and Haiku in the coming weeks
Claude 3 Opus, Claude 3 Sonnet, and Claude 3 Haiku—set new industry benchmarks across reasoning, math, coding, multilingual understanding, and vision. [image] Flo Crivello / @altimor : Claude v3's sco...
Anthropic debuts Claude 2.1, offering a context window of up to 200K tokens, or ~150K words, a claimed 50%+ increase in accuracy, new API integrations, and more
Our latest model, Claude 2.1, is now available over API in our Console … Aaron Drapkin / Tech.co : Claude 2.1: Anthropic's ChatGPT Rival Gets New Killer Feature Jose Antonio Lanz / Decrypt : Anthropic...