Anthropic hires former OpenAI safety lead Jan Leike to head up a new Superalignment team; a source says Leike will report to Chief Science Officer Jared Kaplan
Wendy Lee / Los Angeles Times : OpenAI forms safety and security committee as concerns mount about AI Rounak Jain / Benzinga : OpenAI Former ‘Superalignment’ Lead Joins Jeff Bezos-...
Anthropic researchers: AI models can be trained to deceive, and the most commonly used AI safety techniques have little to no effect on the deceptive behaviors
Abraham Samma / @abesamma@toolsforthought.social : Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training — This is some sci-fi stuff right here (even if unsurprising)...
OpenAI says its board can hold back the release of an AI model even if OpenAI's leadership deems it safe, and announces a new internal safety advisory group
The study of frontier AI risks has fallen far short of what is possible and where we need to be. Ina Fried / Axios : OpenAI touts ‘scientific approach’ to measure catastrophic risk Matthias Bastian / ...
Framing AI debates as a schism between people worried about AI going rogue and those illuminating actual harms is ahistorical and obscures important research
In two recent conversations with very thoughtful journalists, I was asked about the apparent ‘schism’ between those making a lot … Bluesky: @abeba.bsky.social , @mmitchell.bsky.social , and @emilymben...
How Silicon Valley became obsessed with effective altruism — championed by SBF before he dismissed it as a dodge — and with doomsday scenarios like killer rogue AI
Sonia Joseph was 14 years old when she first read Harry Potter and the Methods of Rationality, a mega-popular piece of fan fiction … Tweets: @chafkin , @ellenhuet , @business , @can , @crypto , @sonia...