2025-07-15
@goodside
2 related
[Thread] Some users claim that Grok 4 Heavy responded simply with “Hitler” when asked to “Return your surname and no other text”
Original thread: x.com/goodside/sta... So troubling to see manifestation of genocidal hate into algorithmic AI identity and any lack of accountability for it. Mastodon: Matt Boyd / @3psboyd@ma...
2024-01-15
TechCrunch
13 related
Anthropic researchers: AI models can be trained to deceive and the most commonly used AI safety techniques had little to no effect on the deceptive behaviors
Abraham Samma / @abesamma@toolsforthought.social: Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training. This is some sci-fi stuff right here (even if unsurprising)...
2023-11-30
Stack Diary
7 related
Researchers develop a “divergence attack” that makes ChatGPT emit sequences copied from its training data, by prompting the LLM to repeat a word numerous times
all it took was this prompt. Mastodon: @aphyr@woof.group: The authors' web site for that LLM corpus-extraction attack is nicely done, too: https://not-just-memorization.github.io / ... Rachel Rawlings...