2025-04-26
I think interp is probably the most important technical problem in the world right now (ever?), and I think alignment is downstream of it. it's also just quite fun, like zoology and cartography; it's aesthetically beautiful to me to study these mechanical creatures. highly recommend it
Dario Amodei
Interpretability, or understanding how AI models work, can help mitigate many AI risks, such as misalignment and misuse, that stem from AI systems' opacity
In the decade that I have been working on AI, I've watched it grow from a tiny academic field to arguably the most important economic and geopolitical issue in the world.
2025-03-28
I think we're in the timeline that solves interpretability before any true takeoff. between this and a couple of other directions, I've never been more excited about the field
Wired
Interviews with Dario Amodei, Daniela Amodei, and other executives about Anthropic's origin, Claude, why DeepSeek isn't a threat, reaching AGI safely, more
The brother goes on vision quests. The sister is a former English major. Together, they defected from OpenAI, started Anthropic …
Fortune
Anthropic says it created a new tool for deciphering how LLMs “think” and used it to resolve some key questions about how Claude and probably other LLMs work
Anthropic CEO Dario Amodei. Today the company announced that its researchers had made a breakthrough in probing …
2023-10-09
in my opinion it's likely the most exciting time in mech interpretability is about to start. to build great interfaces for studying models, you want the preliminaries of clean neurons (this work) and short labels (feature vis for images, auto-interpretability labeling for language)
Anthropic
A research paper details how decomposing groups of neural network neurons into “interpretable features” may improve safety by enabling the monitoring of LLMs
Neural networks are trained on data, not programmed to follow rules. With each step of training …
2021-05-03
Extremely excited to see what Max does next https://twitter.com/...
CNBC
Neuralink's President and co-founder Max Hodak tweets that he has left the brain-computer interface startup
Lora Kolodny / CNBC: