Anthropic's latest flagship model, Claude Sonnet 4.6, is out now.
Backboard.io announced it has achieved state-of-the-art performance across both leading AI memory benchmarks, a first ...
Office Productivity: The Apex Agents benchmark, which evaluates productivity in office-like environments, saw Gemini 3.1 Pro ...
The most significant advancement in Gemini 3.1 Pro lies in its performance on rigorous logic benchmarks. Most notably, the model achieved a verified score of 77.1% on ARC-AGI-2.
Google just released Gemini 3.1 Pro, its most capable AI model, which beats all frontier models on Humanity's Last Exam and ...
OpenAI launches EVMbench with Paradigm to test AI on smart contract vulnerabilities and commits $10M to cybersecurity research.
Opus AI performance for coding, computer use, and agents at Sonnet pricing ($3/$15 per million tokens), reshaping enterprise automation economics with a 1M-token context window and stronger ...
Aquant today released The 2026 Field Service KPI Benchmark Report, an industry-wide analysis of anonymized performance data from 161 service organizations. The report spans nearly 30 million service ...
Google says its latest Deep Think upgrade is designed to tackle research-grade problems in maths, science, and engineering, with access expanding to the Gemini app and API.
Yi Yang (Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; School of Cyber Security, University of Chinese Academy of Sciences, China), Jinghua Liu (Institute of ...
India Today on MSN
Google Gemini 3 Deep Think AI scores passing marks in Humanity's Last Exam, crushes toughest benchmarks
Google is rolling out a major upgrade to Gemini 3 Deep Think, its powerhouse AI reasoning model. The enhanced version is now ...
OpenAI introduces EVMbench to measure AI crypto security. Benchmark evaluates detection, patching and exploit skills. OpenAI has launched a benchmarking system called EVMbench to evaluate how ...