State of AI safety: as capabilities grow and models can monitor other models, issues like adversarial robustness persist and society is still not ready for AI

Windows On Theory 2026-03-31 Boaz Barak

@_nathancalvin Nathan Calvin on x
Appreciate Sam endorsing this post which contains some pretty frank talk about the good, bad, and ugly of AI safety in 2026. I will keep saying that the actions of OpenAI's Global Affairs team (and related Super Pacs) do not seem consistent with taking these concerns seriously! [im…
@so8res Nate Soares on x
Safety folks at the AI companies apparently can't tell the difference between “the AI superficially does mostly what I ask” and the deep alignment properties that'd be needed for superintelligence, which casts doubt on their ability to pull off alignment.
@sama Sam Altman on x
This is a very good post:
@_nathancalvin Nathan Calvin on x
My views are similar. Alignment progress better than I expected (though still need lots more work, and better assurances that progress will remain robust). Societal readiness worse than I hoped. (Yet another 100m anti guardrails AI superpac announced on Sunday unlikely to help) […
@fleetingbits @fleetingbits on x
@boazbaraktcs i think it is a mistake to think that there ever can be societal readiness for a disruptive technology before the disruptive effects are felt. governments can move very fast in a short time when faced with an obvious effect (e.g. 2008, covid) but not otherwise.
@boazbaraktcs Boaz Barak on x
New blog post: the state of AI safety in four fake graphs. [image]
