AI coding tools have enabled a flood of bad code that threatens to overwhelm many projects. Building new features is easier ...
Another day, another Google AI model. Google has really been pumping out new AI tools lately, having just released Gemini 3 in November. Today, it’s bumping the flagship model to version 3.1. The new ...
Use the vitals package with ellmer to evaluate and compare the accuracy of LLMs, including writing evals to test local models.
OpenAI and Paradigm have released EVMbench—a framework for evaluating AI agents' ability to find vulnerabilities in Ethereum smart contracts.
Google says that its most advanced thinking model yet outperforms Claude and ChatGPT on Humanity's Last Exam and other key benchmarks.
Claude 4.5 costs more than Gemini 3 Pro; it gives step-by-step plans and stronger web layouts, choose based on detail vs budget.
OpenAI's EVMbench tests AI on smart contract security. Claude Opus 4.6 ranked first, beating GPT-5 and Gemini 3 Pro across 120 real crypto vulnerabilities.
Some results have been hidden because they may be inaccessible to you
Show inaccessible results