How Anthropic, OpenAI, and Google are testing their AI models' ability to reason and make decisions by having them play Pokémon Blue on Twitch
Nintendo's original Pokémon games are becoming a popular and strangely effective way to test and benchmark new artificial-intelligence models.