Google DeepMind's mechanistic interpretability team details why it shifted from fully reverse-engineering neural nets to a focus on “pragmatic interpretability”
we're calling it “pragmatic” interpretability — Neel Nanda / @neelnanda5: The GDM mechanistic interpretability team has pivoted to a new approach: pragmatic interpretability. Our pos…
Nathan Calvin, general counsel of AI safety nonprofit Encode, says OpenAI used intimidation tactics to undermine California's SB 53 while it was being debated
one which buried that the recipient, the GC of a company who amicus-ed us, received a broad subpoena with advance notice — was not the way. @lessig: This is awful. Aggressive lawye…
Anthropic revoked OpenAI's API access to Claude, citing ToS violations; sources: OpenAI's use of the API let it compare its models' behavior against Claude's
OpenAI lost access to the Claude API this week after Anthropic claimed the company was violating its terms of service.
[Thread] An OpenAI researcher says the company's latest experimental reasoning LLM achieved gold medal-level performance on the 2025 International Math Olympiad
1/N I'm excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world's most pres...
In a paper, AI researchers from OpenAI, Google DeepMind, Anthropic, and others recommend “further research into chain-of-thought monitorability” for AI safety
AI researchers from OpenAI, Google DeepMind, Anthropic, and a broad coalition of companies and nonprofit groups …
OpenAI details why “emergent misalignment”, where training models on wrong answers in one area can lead to issues in many others, happens and how to mitigate it
Maxwell Zeff / TechCrunch:
OpenAI warns that its upcoming models could pose a higher risk of helping create bioweapons and is partnering to build diagnostics, countermeasures, and testing
OpenAI cautioned Wednesday that upcoming models will head into a higher level of risk when it comes to the creation of biological weapons …
Interpretability, or understanding how AI models work, can help mitigate many AI risks, such as misalignment and misuse, that stem from AI systems' opacity
In the decade that I have been working on AI, I've watched it grow from a tiny academic field to arguably the most important economic and geopolitical issue in the world.
Google DeepMind outlines its approach to AGI safety in four key risk areas: misuse, misalignment, mistakes, and structural risks, with a focus on the first two
Matthias Bastian / The Decoder:
Former OpenAI researcher Leopold Aschenbrenner says he was fired in April 2024 for writing a memo to the board over concerns about OpenAI's security practices
Leopold Aschenbrenner also said he was interrogated about his team's “loyalty to the company” — Leopold Aschenbrenner …
OpenAI has an unusual, extremely restrictive off-boarding agreement with a lifelong nondisparagement commitment; those who don't sign it lose all vested equity
Why is OpenAI's superalignment team imploding? …
Google DeepMind releases its Frontier Safety Framework, a set of protocols for analyzing and mitigating future risks posed by advanced AI models
The Scoop — Preparing for a time when artificial intelligence is so powerful that it can pose a serious, immediate threat to people …
A research paper details how decomposing groups of neurons in a neural network into interpretable “features” may improve safety by enabling monitoring of LLMs
Neural networks are trained on data, not programmed to follow rules. With each step of training …