State of AI safety: as capabilities grow and models can monitor other models, issues like adversarial robustness persist and society is still not ready for AI
Appreciate Sam endorsing this post, which contains some pretty frank talk about the good, bad, and ugly of AI safety in 2026. I will keep saying that the actions of OpenAI's Global Affairs team (and related super PACs) do not seem consistent with taking these concerns seriously! [im…
Safety folks at the AI companies apparently can't tell the difference between “the AI superficially does mostly what I ask” and the deep alignment properties that'd be needed for superintelligence, which casts doubt on their ability to pull off alignment.
My views are similar. Alignment progress better than I expected (though still need lots more work, and better assurances that progress will remain robust). Societal readiness worse than I hoped. (Yet another $100M anti-guardrails AI super PAC announced on Sunday, unlikely to help) […
@boazbaraktcs i think it is a mistake to think that there ever can be societal readiness for a disruptive technology before the disruptive effects are felt. governments can move very fast in a short time when faced with an obvious effect (e.g. 2008, covid) but not otherwise.