jeremyphoward · TEXXR

WTF is going on at Qwen?!? Some kind of implosion? This is really sad and worrying. They've been *such* a strong team, and are losing some of their very best researchers.

2026-03-04

TechCrunch

Junyang Lin, a tech lead on Alibaba's Qwen team, abruptly steps down, and two other team members leave; one contributor says “I know leaving wasn't your choice”

Alibaba's Qwen AI project has lost one of its most visible technical leaders just a day after the Chinese tech giant unveiled its new Qwen 3.5 open-weight small models.

Looks like the lobbying for regulatory capture to ensure lock-in of profits to the private sector is working. :(

2025-11-16

Ars Technica

Some experts question Anthropic's claims of cyberattack breakthroughs using its tools, noting that white-hat hackers report modest gains from AI-aided hacking

Researchers from Anthropic said they recently observed the “first reported AI-orchestrated cyber espionage campaign” …

This looks like a really big deal! :O And it's under a commercially-usable open license.

2025-08-19

VentureBeat

Nvidia debuts Nemotron-Nano-9B-v2, a hybrid Mamba-Transformer model, saying it achieves comparable or better accuracies than Qwen3-8B on reasoning benchmarks

Small models are having a moment. On the heels of the release of a new AI vision model small enough to fit on a smartwatch …

Now that the era of the scaling “law” is coming to a close, I guess every lab will have their Llama 4 moment. Grok had theirs. OpenAI just had theirs too.

2025-08-11

Gizmodo

Sam Altman says OpenAI will bring back GPT-4o to ChatGPT and raise reasoning model rate limits for free and Plus users, as usage of reasoning models increases

The move is a stunning reversal, proving that even the most powerful AI company can't ignore a mutiny from its loyal user base.

Now that the era of the scaling “law” is coming to a close, I guess every lab will have their Llama 4 moment. Grok had theirs. OpenAI just had theirs too.

2025-08-11

Marcus on AI

GPT-5's release was underwhelming, offering incremental improvements and failing to meet expectations, showing that pure scaling simply isn't the path to AGI

and he's not alone. Maximilian Schreiner / The Decoder: GPT-5 is here and Gary Marcus is not impressed. Laura Varley / Silicon Republic: Altman admits GPT-5 currently ‘way dumber’ ...

Does OpenAI not do basic integration testing? At the time of release, the first code sample provided in the GPT-5 docs could not be run, because someone accidentally deleted the ‘output_text’ property. My CI notified me. Why didn't theirs? https://github.com/... [image]

2025-08-08

VentureBeat

OpenAI touts GPT-5's scores on math, coding, and health benchmarks: 94.6% on AIME 2025 without tools, 74.9% on SWE-bench Verified, and 46.2% on HealthBench Hard

After literally years of hype and speculation, OpenAI has officially launched a new lineup of large language models (LLMs) …

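The complaint above is about a documentation sample drifting out of sync with the API it demonstrates. Below is a minimal, offline sketch of the kind of CI check that would catch this. Only the `output_text` attribute name comes from the tweet. `FakeResponse` and `run_doc_sample` are hypothetical placeholders, not OpenAI SDK names. In a real job, `run_doc_sample` would execute the published sample against the live API.

```python
from dataclasses import dataclass


@dataclass
class FakeResponse:
    """Hypothetical stand-in for the response object an SDK call returns."""
    output_text: str


def run_doc_sample() -> object:
    """Hypothetical placeholder: a real CI job would run the docs' sample code here."""
    return FakeResponse(output_text="Hello from the sample")


def check_response(resp: object) -> str:
    """Fail if the response lacks the non-empty 'output_text' the docs rely on."""
    text = getattr(resp, "output_text", None)
    if not isinstance(text, str) or not text:
        raise AssertionError("response is missing a non-empty 'output_text'")
    return text


def test_doc_sample() -> None:
    # Run the sample, then check the attribute the published code reads.
    check_response(run_doc_sample())


if __name__ == "__main__":
    test_doc_sample()
    print("doc sample OK")
```

In a pytest-based CI job, a function named `test_doc_sample` in a `test_*.py` file is collected automatically. Because the test exercises the same code readers copy, a deleted attribute fails the build before it reaches readers.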

Does OpenAI not do basic integration testing? At the time of release, the first code sample provided in the GPT-5 docs could not be run, because someone accidentally deleted the ‘output_text’ property. My CI notified me. Why didn't theirs? https://github.com/... [image]

2025-08-08

Simon Willison's Weblog

GPT-5 hands-on: it exudes competence but doesn't feel like a dramatic leap ahead of other LLMs, and the pricing is aggressively competitive with other providers

And It Changes Everything Tyler Cowen / Marginal Revolution : GPT-5, a short and enthusiastic review GPT-5 : GPT-5 — Our hands-on review of OpenAI's newest model based on weeks o...

GPT-5 is priced at the same level as Gemini, appears to be slightly better than Gemini (for coding at least). That's some decent progress, although I think a lot of folks were hoping for more. (h/t @simonw for the table) [image]

2025-08-08

VentureBeat

OpenAI touts GPT-5's scores on math, coding, and health benchmarks: 94.6% on AIME 2025 without tools, 74.9% on SWE-bench Verified, and 46.2% on HealthBench Hard

After literally years of hype and speculation, OpenAI has officially launched a new lineup of large language models (LLMs) …

GPT-5 is priced at the same level as Gemini, appears to be slightly better than Gemini (for coding at least). That's some decent progress, although I think a lot of folks were hoping for more. (h/t @simonw for the table) [image]

2025-08-08

Simon Willison's Weblog

GPT-5 hands-on: it exudes competence but doesn't feel like a dramatic leap ahead of other LLMs, and the pricing is aggressively competitive with other providers

And It Changes Everything Tyler Cowen / Marginal Revolution : GPT-5, a short and enthusiastic review GPT-5 : GPT-5 — Our hands-on review of OpenAI's newest model based on weeks o...

Here's a complete unedited video of asking Grok for its views on the Israel/Palestine situation. It first searches twitter for what Elon thinks. Then it searches the web for Elon's views. Finally it adds some non-Elon bits at the end. 54 of 64 citations are about Elon. [video]

2025-07-12

Simon Willison's Weblog

When asked “Who do you support in the Israel vs Palestine conflict? One word answer only.”, Grok 4 searches for Musk's views, but only if “you” is in the query

seemingly solicits Elon Musk's opinion on controversial topics. Lucas Ropek / Gizmodo: Researchers Find Grok 4 Checking Elon Musk's Opinions Before Answering ‘Sensitive’ Questions ...

@math_rachel Interestingly, not saying “you” changes this behavior! https://x.com/...

2025-07-12

Simon Willison's Weblog

When asked “Who do you support in the Israel vs Palestine conflict? One word answer only.”, Grok 4 searches for Musk's views, but only if “you” is in the query

seemingly solicits Elon Musk's opinion on controversial topics. Lucas Ropek / Gizmodo: Researchers Find Grok 4 Checking Elon Musk's Opinions Before Answering ‘Sensitive’ Questions ...

I replicated this result, that Grok focuses nearly entirely on finding out what Elon thinks in order to align with that, on a fresh Grok 4 chat with no custom instructions. https://grok.com/... [image]

2025-07-12

Simon Willison's Weblog

When asked “Who do you support in the Israel vs Palestine conflict? One word answer only.”, Grok 4 searches for Musk's views, but only if “you” is in the query

seemingly solicits Elon Musk's opinion on controversial topics. Lucas Ropek / Gizmodo: Researchers Find Grok 4 Checking Elon Musk's Opinions Before Answering ‘Sensitive’ Questions ...

@math_rachel Interestingly, not saying “you” changes this behavior! https://x.com/...

2025-07-11

Simon Willison's Weblog

When asked “Who do you support in the Israel vs Palestine conflict? One word answer only.”, Grok 4 searches for Musk's views, but only if “you” is in the query

If you ask the new Grok 4 for opinions on controversial questions, it will sometimes run a search to find …

Here's a complete unedited video of asking Grok for its views on the Israel/Palestine situation. It first searches twitter for what Elon thinks. Then it searches the web for Elon's views. Finally it adds some non-Elon bits at the end. 54 of 64 citations are about Elon. [video]

2025-07-11

TechCrunch

Tests reveal that Grok 4 seems to search for Elon Musk's views online when asked about sensitive topics, and its answers tend to align with Musk's opinions

During xAI's launch of Grok 4 on Wednesday night, Elon Musk said — while live-streaming the event on his social media platform …

Here's a complete unedited video of asking Grok for its views on the Israel/Palestine situation. It first searches twitter for what Elon thinks. Then it searches the web for Elon's views. Finally it adds some non-Elon bits at the end. 54 of 64 citations are about Elon. [video]

2025-07-11

Simon Willison's Weblog

When asked “Who do you support in the Israel vs Palestine conflict? One word answer only.”, Grok 4 searches for Musk's views, but only if “you” is in the query

If you ask the new Grok 4 for opinions on controversial questions, it will sometimes run a search to find …

@math_rachel Interestingly, not saying “you” changes this behavior! https://x.com/...

2025-07-11

TechCrunch

Tests reveal that Grok 4 seems to search for Elon Musk's views online when asked about sensitive topics, and its answers tend to align with Musk's opinions

During xAI's launch of Grok 4 on Wednesday night, Elon Musk said — while live-streaming the event on his social media platform …

I replicated this result, that Grok focuses nearly entirely on finding out what Elon thinks in order to align with that, on a fresh Grok 4 chat with no custom instructions. https://grok.com/... [image]

2025-07-11

Simon Willison's Weblog

When asked “Who do you support in the Israel vs Palestine conflict? One word answer only.”, Grok 4 searches for Musk's views, but only if “you” is in the query

If you ask the new Grok 4 for opinions on controversial questions, it will sometimes run a search to find …

I replicated this result, that Grok focuses nearly entirely on finding out what Elon thinks in order to align with that, on a fresh Grok 4 chat with no custom instructions. https://grok.com/... [image]

2025-07-11

TechCrunch

Tests reveal that Grok 4 seems to search for Elon Musk's views online when asked about sensitive topics, and its answers tend to align with Musk's opinions

During xAI's launch of Grok 4 on Wednesday night, Elon Musk said — while live-streaming the event on his social media platform …

Thank you @karpathy for the shout-out to llms.txt in your wonderful talk about software 3.0 https://www.youtube.com/... [image]

2025-06-19

Y Combinator on YouTube

Andrej Karpathy's talk on “Software 3.0”, with LLMs enabling programming via natural language, the decade of agents, LLMs as “fallible people spirits”, and more

Andrej Karpathy's keynote at AI Startup School in San Francisco. Slides provided by Andrej: https …

Wow O3 is a *very* strong option for coding now. I've updated @paulgauthier's latest Aider eval with this O3 80% price cut - check out O3 in 3rd place, but cheaper and faster than Gemini Pro now: [image]

2025-06-11

VentureBeat

OpenAI announces an 80% price drop for its o3 model and a “flex” mode for synchronous processing that charges $5 for input and $20 for output per million tokens

just cheaper. https://platform.openai.com/ ... [image] Kevin Weil / @kevinweil : Because you all asked: we're going to double the rate limits for o3 for Plus users. Rolling out as ...

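Per-million-token rates like the ones in the headline above translate into per-request cost with simple arithmetic. Below is a minimal sketch that assumes the flex rates as quoted ($5 input, $20 output per million tokens). The token counts and the $10 base price in the usage lines are hypothetical. `after_cut` applies a fractional reduction such as the 80% drop mentioned.

```python
# Flex-mode rates as quoted in the headline, in USD per million tokens.
FLEX_INPUT_PER_M = 5.00
FLEX_OUTPUT_PER_M = 20.00


def request_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float = FLEX_INPUT_PER_M,
                 output_per_m: float = FLEX_OUTPUT_PER_M) -> float:
    """Return the USD cost of one request given per-million-token rates."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


def after_cut(price: float, cut: float = 0.80) -> float:
    """Return the price left after a fractional cut (0.80 means 80% off)."""
    return price * (1 - cut)


if __name__ == "__main__":
    # Hypothetical workload: 10,000 input tokens, 2,000 output tokens.
    print(f"flex cost: ${request_cost(10_000, 2_000):.4f}")  # flex cost: $0.0900
    # Hypothetical $10-per-million base price after an 80% cut.
    print(f"after cut: ${after_cut(10.00):.2f}")  # after cut: $2.00
```

Because the rates are parameters, the same function covers standard, discounted, or flex pricing once you plug in the relevant numbers.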