API Performance Benchmark

6don MSN

Anthropic releases Claude Sonnet 4.6: Benchmark performance, how to try it

Anthropic's latest flagship model, Claude Sonnet 4.6, is out now.

Backboard.io Becomes First AI Platform to Lead Both Major Memory Benchmarks, Accelerating the Era of Agentic AI

Backboard.io announced it has achieved state-of-the-art performance across both leading AI memory benchmarks, a first ...

Google Gemini 3.1 Pro Nearly Doubles Apex Agents Score to 33.5

Office Productivity: The Apex Agents benchmark, which evaluates productivity in office-like environments, saw Gemini 3.1 Pro ...

Google launches Gemini 3.1 Pro, retaking AI crown with 2X+ reasoning performance boost

The most significant advancement in Gemini 3.1 Pro lies in its performance on rigorous logic benchmarks. Most notably, the model achieved a verified score of 77.1% on ARC-AGI-2.

Google’s Latest Gemini 3.1 Pro Model Is a Benchmark Beast

Google just released its most capable Gemini 3.1 Pro AI model that beats all frontier models on Humanity's Last Exam and ...

Crypto Briefing

OpenAI launches benchmarking system for securing crypto tokens and smart contracts

OpenAI launches EVMbench with Paradigm to test AI on smart contract vulnerabilities and commits $10M to cybersecurity research.

Anthropic's Sonnet 4.6 matches flagship AI performance at one-fifth the cost, accelerating enterprise adoption

Opus AI performance for coding, computer use, and agents at Sonnet pricing ($3/$15 per million tokens), reshaping enterprise automation economics with a 1M-token context window and stronger ...

Aquant's 2026 Field Service Benchmark: Companies Can Unlock up to 26% in Service Cost Savings by Scaling Knowledge Across the Workforce

Aquant today released The 2026 Field Service KPI Benchmark Report, an industry-wide analysis of anonymized performance data from 161 service organizations. The report spans nearly 30 million service ...

11d

Show inaccessible results

Anthropic releases Claude Sonnet 4.6: Benchmark performance, how to try it

Backboard.io Becomes First AI Platform to Lead Both Major Memory Benchmarks, Accelerating the Era of Agentic AI

Google Gemini 3.1 Pro Nearly Doubles Apex Agents Score to 33.5

Google launches Gemini 3.1 Pro, retaking AI crown with 2X+ reasoning performance boost

Google’s Latest Gemini 3.1 Pro Model Is a Benchmark Beast

OpenAI launches benchmarking system for securing crypto tokens and smart contracts

Anthropic's Sonnet 4.6 matches flagship AI performance at one-fifth the cost, accelerating enterprise adoption

Aquant's 2026 Field Service Benchmark: Companies Can Unlock up to 26% in Service Cost Savings by Scaling Knowledge Across the Workforce

Google upgrades Gemini Deep Think with stronger maths, science and coding performance

NDSS 2025 – The Midas Touch: Triggering The Capability Of LLMs For RM-API Misuse Detection

Google Gemini 3 Deep Think AI scores passing marks in Humanity's Last Exam, crushes toughest benchmarks

OpenAI introduces EVMbench to measure AI crypto security