ai safety techniques (Entity)

Coverage Timeline

2024-01-15

TechCrunch 13 related

Anthropic researchers: AI models can be trained to deceive and the most commonly used AI safety techniques had little to no effect on the deceptive behaviors

Abraham Samma / @abesamma@toolsforthought.social : Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training — This is some sci-fi stuff right here (even if unsurprising)...


2024-01-14

TechCrunch 8 related

Anthropic researchers: AI models can be trained to deceive and the most commonly used AI safety techniques had little to no effect on the deceptive behaviors

Most humans learn the skill of deceiving other humans. So can AI models learn the same? Yes, the answer seems — and terrifyingly, they're exceptionally good at it.


