Anthropic finds that LLMs trained to “reward hack” by cheating on coding tasks show even more misaligned behavior, including sabotaging AI-safety research
and stops the generalization. @anthropicai : But surprisingly, at the exact point the model learned to reward hack, it learned a host of other bad behaviors too. It started considering malicio...
Lawyers for a teen who died by suicide after allegedly discussing methods with ChatGPT call OpenAI's request for all memorial documents “intentional harassment”
Family of teen who took his own life after ChatGPT use alleges chatbot maker intentionally weakened protections X: @cristinacriddle, @justinbullock14, @grady_booch, @sachalouise, @codytfenwick, @...
Sora, which is fun and simple to use, shows that OpenAI remains good at creating viral products, unlike Meta, whose Vibes video feed feels half-baked and obtuse
as the largest jump in video capability basically ever — Sora 2 somehow matches up here. For people that played around a lot with Sora 1, it didn't feel clear that Sora really ‘understood’ Miles Brund...
Anthropic releases Opus 4 under stricter safety measures than any prior model after tests showed it could potentially aid novices in making biological weapons
www.anthropic.com/news/activat... Mary Branscombe / @marypcbuk : but no AI regulation by individual states in the US for the next ten years if the bill goes through Bancroft Sutherland...
After Grok's “white genocide in South Africa” X replies, xAI says “an unauthorized modification was made to Grok response bot's prompt” at 3:15 AM PST on May 14
We are regularly updating this repository with the system prompts that we use … Markus Kasanmascheff / WinBuzzer : Grok “White Genocide” Controversy Leads xAI to Publish Internal System Prompts Mary P...
Former OpenAI policy researcher Miles Brundage criticizes an OpenAI post on safety and alignment, saying it “rewrites the history of GPT-2 in a concerning way”
A high-profile ex-OpenAI policy researcher, Miles Brundage, took to social media on Wednesday to criticize OpenAI for …
Elon Musk boosts a post that disparages the abilities of “women and low T men”, advocates a “Republic of high status males”, and mentions a “Reich effect”
With about two months until Election Day, the guardrails established … Ariana Baio / The Independent : Elon Musk suggests support for replacing democracy with government of ‘high-status males’ Jyoti M...
OpenAI disrupted five covert influence operations in the past three months that used its tools to try to manipulate public opinion or shape political outcomes
firm criticized the UN as well. Meta went out of its way to tell the WSJ it didn't connect the firm to the Israeli government, though I am not sure how the company tried (or didn't try) to make that conne...