OpenAI and Paradigm announce EVMbench, a benchmark that measures how well AI agents can detect, exploit, and patch high-severity smart contract vulnerabilities
Making smart contracts safer by evaluating AI agents' ability to detect, patch, and exploit vulnerabilities in blockchain environments.
OpenAI
Related Coverage
- OpenAI and Paradigm Launch EVMbench to Test AI Agents Against Smart Contract Vulnerabilities Blockonomi · Brenda Mary
- OpenAI and Paradigm partner on AI agent tool for smart contract security The Block · RT Watson
- OpenAI and Paradigm launch smart contract security evaluation system crypto.news
- OpenAI and Paradigm Launch EVMbench Stress-Test AI on DeFi Bugs The Crypto Times · Jahnu Jagtap
- OpenAI Launches EVMbench to Detect, Patch, and Exploit Vulnerabilities in Blockchain Environments Cyber Security News · Guru Baran
- 🗞️ OpenAI teamed up with crypto firm Paradigm to launch EVMBench to test AI agents on smart contract security Rohan's Bytes · Rohan Paul
- OpenAI releases crypto security tool as Claude blamed for $2.7m Moonwell bug DL News · Aleks Gilbert
- OpenAI Introduces Smart Contract Benchmark for AI Agents as AI and Crypto Converge CoinGape
- Can AI Agents Boost Ethereum Security? OpenAI and Paradigm Created a Testing Ground Decrypt · Stacy Jones
- OpenAI pits AI agents against each other to detect smart contract flaws Cointelegraph · Brayden Lindrea
- Sam Altman's OpenAI unveils ‘EVMbench’ to test whether AI can keep crypto's smart contracts safe CoinDesk
- EVMbench: An Open Benchmark for Smart Contract Security Agents Paradigm · Alpin Yukseloglu
- Open-source benchmark EVMbench tests how well AI agents handle smart contract exploits Help Net Security · Sinisa Markovic
- OpenAI, crypto investor Paradigm launch AI tool for smart contracts Tech in Asia · Minh Le
- OpenAI and Paradigm Launch EVMbench to Stress Test AI on Smart Contract Security Unchained
- OpenAI EVMbench Results: How Claude, GPT-5 and Gemini Ranked on Crypto Security Blockonomi · Trader Edge
- OpenAI Just Showed That AI Can Drain a Crypto Wallet... on Purpose eWeek · Grant Harvey
- Morning Minute: OpenAI and Paradigm Turn Focus to Smart Contracts Decrypt · Tyler Warner
- New benchmark shows AI agents can exploit most smart contract vulnerabilities on their own The Decoder · Maximilian Schreiner
Discussion
-
@scaling01
@scaling01
on x
EVMbench measures the ability of agents to detect, patch, and exploit smart contract vulnerabilities. Opus 4.6 getting mogged by GPT-5.2 and GPT-5.3. Although its detection accuracy is technically higher, it's precision is much lower. (Opus is going shizo) [image]
-
@gdb
Greg Brockman
on x
measuring agentic security capabilities with smart contracts:
-
@0xalpo
Alpin Yukseloglu
on x
new collab from @paradigm and @OpenAI: evmbench is a benchmark and agent harness for exploiting smart contract bugs a few months ago, the best models found <20% of critical, fund-draining @Code4rena bugs in our benchmark. today they find > 70% [video]
-
@openai
@openai
on x
Introducing EVMbench—a new benchmark that measures how well AI agents can detect, exploit, and patch high-severity smart contract vulnerabilities. https://openai.com/...
-
@antoniogm
Antonio García Martínez
on x
Finally a non-bullshit crypto/AI crossover.
-
r/singularity
r
on reddit
OpenAI introduces EVMbench, new Benchmark to test AI Agents
-
r/OpenAI
r
on reddit
OpenAI: Introducing EVMbench, a new benchmark