2025-04-26
I think interp is probably the most important technical problem in the world right now (ever?), and I think alignment is downstream of it. it's also just quite fun, like zoology and cartography; it's aesthetically beautiful to me to study these mechanical creatures. highly recommend it
Dario Amodei
Interpretability, or understanding how AI models work, can help mitigate many AI risks, such as misalignment and misuse, that stem from AI systems' opacity
In the decade that I have been working on AI, I've watched it grow from a tiny academic field to arguably the most important economic and geopolitical issue in the world.
2025-03-28
I think we're in the timeline that solves interpretability before any true takeoff. between this and a couple of other directions, I've never been more excited about the field
Wired
Interviews with Dario Amodei, Daniela Amodei, and other executives about Anthropic's origin, Claude, why DeepSeek isn't a threat, reaching AGI safely, more
The brother goes on vision quests. The sister is a former English major. Together, they defected from OpenAI, started Anthropic …
Fortune
Anthropic says it created a new tool for deciphering how LLMs “think” and used it to resolve some key questions about how Claude and probably other LLMs work
Anthropic CEO Dario Amodei. Today the company announced that its researchers had made a breakthrough in probing …
2023-10-09
in my opinion it's likely the most exciting time in mech interpretability is about to start. to build great interfaces for studying models, you want the preliminaries of clean neurons (this work) and short labels (feature vis for images, auto-interpretability labeling for language)
Anthropic
A research paper details how decomposing groups of neural network neurons into “interpretable features” may improve safety by enabling the monitoring of LLMs
Neural networks are trained on data, not programmed to follow rules. With each step of training …
2021-05-03
Extremely excited to see what Max does next https://twitter.com/...
CNBC
Neuralink's President and co-founder Max Hodak tweets that he has left the brain-computer interface startup
Lora Kolodny / CNBC: