2024-01-15
@karpathy this is why, beyond a certain size, AI training data sets should be required to be open and publicly inspectable, and there should be a way to verify a chain of trust to know that other data was not part of the training. Open is safer.
TechCrunch
Anthropic researchers: AI models can be trained to deceive, and the most commonly used AI safety techniques had little to no effect on the deceptive behavior
Abraham Samma / @abesamma@toolsforthought.social: Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training — This is some sci-fi stuff right here (e...