TLDR
- OpenAI launched EVMbench, a benchmark that tests how well AI models detect and fix smart contract security flaws
- The tool was built with crypto investment firm Paradigm and security firm OtterSec
- Anthropic’s Claude Opus 4.6 ranked first with an average detect award of $37,824
- EVMbench tests three skills: spotting bugs, exploiting them in a controlled setting, and fixing vulnerable code
- Crypto attackers stole $3.4 billion in 2025, making AI-powered security tools more urgent
OpenAI has released a new benchmark called EVMbench, designed to measure how well AI models can find and fix security flaws in smart contracts.
Introducing EVMbench—a new benchmark that measures how well AI agents can detect, exploit, and patch high-severity smart contract vulnerabilities. https://t.co/op5zufgAGH
— OpenAI (@OpenAI) February 18, 2026
The tool was developed with crypto investment firm Paradigm and security firm OtterSec. It was described in a research paper published on Wednesday.
Smart contracts are self-executing pieces of code deployed on blockchains like Ethereum. They power decentralized exchanges and lending platforms and typically cannot be changed once deployed, so a flaw cannot simply be patched after the fact and can be costly.
EVMbench pulls from 120 real-world vulnerabilities sourced from 40 smart contract audits. Most of these came from open-source audit competitions.
The benchmark tests AI agents across three areas: detecting security bugs, exploiting those bugs in a controlled environment, and patching the vulnerable code.
OpenAI used a metric called a “detect award” to score each model, based on how much value an AI could theoretically recover by identifying a flaw.
Anthropic’s Claude Opus 4.6 came in first place with an average detect award of $37,824. OpenAI’s own OC-GPT-5.2 came second at $31,623, followed by Google’s Gemini 3 Pro at $25,112.
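The article does not spell out how a detect award is calculated. As a rough illustration only, one plausible reading is that each known vulnerability carries a dollar value, an agent is credited with the value of every vulnerability it flags, and the per-audit totals are then averaged. The sketch below assumes exactly that; the audit names, vulnerability labels, dollar figures and function names are all hypothetical and are not taken from EVMbench.

```python
from statistics import mean

# Hypothetical data: award value (USD) attached to each known vulnerability,
# grouped by audit. None of these figures come from EVMbench.
AUDIT_AWARDS = {
    "audit_a": {"reentrancy": 12000.0, "oracle_manipulation": 30000.0},
    "audit_b": {"access_control": 8000.0},
}


def detect_award(awards, detected):
    """Sum the award value of the known vulnerabilities the agent flagged."""
    return sum(value for vuln, value in awards.items() if vuln in detected)


def average_detect_award(audit_awards, agent_findings):
    """Average the per-audit detect award across all audits (assumed scoring rule)."""
    return mean(
        detect_award(awards, agent_findings.get(audit, set()))
        for audit, awards in audit_awards.items()
    )


# Hypothetical agent output: it misses the oracle bug in audit_a.
findings = {"audit_a": {"reentrancy"}, "audit_b": {"access_control"}}
print(average_detect_award(AUDIT_AWARDS, findings))  # 10000.0
```

Under this assumed rule, a missed high-value bug drags the average down more than a missed low-value one, which is consistent with the idea of scoring models by the value they could theoretically recover.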
Why OpenAI Is Testing This Now
Crypto attackers stole $3.4 billion in 2025, a slight increase from 2024. Against that backdrop, OpenAI argued that AI performance needs to be evaluated in settings where real money is at stake.
“Smart contracts routinely secure $100B+ in open-source crypto assets,” OpenAI wrote in a blog post. “It becomes increasingly important to measure [AI agents’] capabilities in economically meaningful environments.”
OpenAI added that it expects agentic stablecoin payments to grow, and said EVMbench helps ground its research in an area of emerging practical importance.
Circle CEO Jeremy Allaire predicted in January that billions of AI agents will be handling stablecoin payments within five years. Former Binance CEO Changpeng Zhao has also said crypto will become the native currency for AI agents.
What Industry Observers Are Saying
Dragonfly managing partner Haseeb Qureshi posted on X on Wednesday that smart contracts were never designed for human intuition. He said signing large transactions still feels “terrifying” compared to a standard bank transfer.
Qureshi believes AI-intermediated wallets will eventually manage these risks for users. He compared the pairing of AI and crypto to GPS meeting the smartphone, or TCP/IP meeting the browser.
OpenAI said it hopes EVMbench will become a standard for tracking AI progress in smart contract security over time.
For now, Claude Opus 4.6 holds the top spot on the benchmark.





