METR: Claude Opus 4.5 has a 50% task completion time horizon of about 4 hours and 49 minutes, more than double that of Claude Opus 4 released earlier this year
just careful, meticulous rigor. Nikola Jurkovic / @nikolaj2030: This result updates me towards 4 month doubling times being my median estimate for the next two years. That means by EOY 2026 the time ...
OpenAI says GPT‑5.2 Thinking beats or ties industry professionals on 70.9% of GDPval knowledge work tasks, delivering outputs at >11x the speed and <1% the cost
OpenAI eyes January exit from “code red” John Werner / Forbes: The Wonder And The Promise Of GPT 5.2 Is Here Benj Edwards / Ars Technica: OpenAI releases GPT-5.2 after “code red” Google threat alert...
Thoughts on AI progress and why AI labs' actions hint at a worldview in which AI models will continue to fare poorly at generalization and on-the-job learning
Why I'm moderately bearish in the short term, and explosively bullish in the long term - What are we scaling? X: @sriramk, @_simonsmith, @dwarkesh_sp, @emollick, @m...
Google launches Gemini 3 Pro Image, aka Nano Banana Pro, with more control, improved text rendering, and enhanced world knowledge, for free in the Gemini app
except when it gaslit me Ryan Whitwam / Ars Technica: Google's new Nano Banana Pro uses Gemini 3 power to generate more realistic AI images Robert Hart / The Verge: Google's new AI image creator too...
Artificial Analysis announces AA-Omniscience, a benchmark for knowledge and hallucination across 40+ topics; Claude 4.1 Opus takes first place in its key metric
@artificialanlys: X: @artificialanlys, @emollick, @scaling01, @teortaxestex, @zephyr_z9, @mweinbach, and @artificial...
OpenAI estimates ~0.07% of ChatGPT's weekly active users “indicate possible signs of mental health emergencies” like mania, and details its safety improvements
Millions use ChatGPT like a therapist, but that's about to change Viktor Eriksson / PCWorld: Over 1 million ChatGPT users mention suicidal intent every week BBC: ChatGPT shares data on how many user...
Michel Devoret, a Google Quantum AI chief scientist, John Martinis, who left Google in 2020, and John Clarke win the Nobel in Physics for quantum computing work
all 3 @UofCalifornia professors. Home to groundbreaking physicists, including 2 immigrants leading the world in innovation and possibility, California is proud to dream big and deliver even bigger. @e...
Alibaba releases the Qwen3-VL vision models, the Qwen3Guard “safety moderation” models, and three closed-weight models, including Qwen3-Max with 1T+ parameters
Julian Nabil / Forbes Middle East: Alibaba Introduces Qwen3-Max AI Model With Over 1T Parameters Markus Kasanmascheff / WinBuzzer: Alibaba Releases Qwen3-VL O...
OpenAI debuts GPT‑5-Codex, a version of GPT‑5 optimized for agentic coding in Codex, and says it spends its “thinking” time more dynamically than previous models
If You Ask Nicely Frederic Lardinois / The New Stack: OpenAI Launches a New GPT-5 Model for Its Codex Coding Agent David Gewirtz / ZDNET: OpenAI has new agentic coding partner for you now: GPT-5-Cod...
The price per token for AI models has fallen, but costs for developers are rising as newer reasoning models require more tokens to complete tasks
With models doing more ‘thinking,’ the small companies that buy AI from the giants to create apps and services are feeling the pinch X: @ericjhonsa, @emollick, @mims, and @m...