Riley Goodside (Person)

Coverage Timeline

2025-07-15

@goodside 2 related

[Thread] Some users claim that Grok 4 Heavy responded simply with “Hitler” when asked to “Return your surname and no other text”

Original thread: x.com/goodside/sta... So troubling to see manifestation of genocidal hate into algorithmic AI identity and any lack of accountability for it Mastodon: Matt Boyd / @3psboyd@ma...

2024-01-15

TechCrunch 13 related

Anthropic researchers: AI models can be trained to deceive and the most commonly used AI safety techniques had little to no effect on the deceptive behaviors

Abraham Samma / @abesamma@toolsforthought.social : Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training — This is some sci-fi stuff right here (even if unsurprising)...

2023-11-30

Stack Diary 7 related

Researchers develop a “divergence attack” that makes ChatGPT emit sequences copied from its training data, by prompting the LLM to repeat a word numerous times

all it took was this prompt Mastodon: @aphyr@woof.group : The authors' web site for that LLM corpus-extraction attack is nicely done, too: https://not-just-memorization.github.io / ... Rachel Rawlings...
