Q&A with mathematicians behind the “First Proof” experiment, which tests AI's mathematical competency on questions drawn from the authors' unpublished research
Large language models struggle to solve research-level math questions. It takes a human to measure just how poorly they perform.
Ten math problems with proofs known to the authors. Proofs are encrypted until Feb 13. For all problems, the authors claim that both AI-based literature searches and zero-shot attempts at proofs failed. If you want to take a crack at them, you have until next Friday (2/13)!
To assess the ability of current AI systems to correctly answer research-level mathematics questions, we share a set of ten math questions which have arisen naturally in the research process of the authors. — arxiv.org/abs/2602.05192