Claude4 is here... It's so hot, it's surpassing Gemini2.5Pro

The latest Claude 4 series models released by Anthropic mark a leap forward for AI technology towards a "thinking machine".
Core content:
1. The comprehensive leadership of the Claude 4 series models in programming and reasoning capabilities
2. Dual-model strategy: the different positioning and advantages of Opus 4 and Sonnet 4
3. Excellent performance in benchmark tests and breakthroughs in the "high computing" mode
Anthropic officially released the Claude 4 series model, which is leading in programming ability and reasoning performance. 10 minutes after the release, Cursor can also be used...
A big debut: A new benchmark for the AI industry
On May 22, 2025, Anthropic officially released the new generation of Claude models, Claude Opus 4 and Claude Sonnet 4, which set new standards in programming capabilities , advanced reasoning , and AI agents . This upgrade is not just a simple performance improvement, but represents an important milestone in the transformation of artificial intelligence into a true "thinking machine."
With the release of these two models, the competition in the field of AI development has officially entered a new stage where "supercomputing power" and "advanced cognition" are equally important, bringing new possibilities and challenges to all developers.
Dual models in parallel: each plays a different role
Claude 4 adopts a dual-model strategy, with Opus 4 positioned as the flagship top model and Sonnet 4 as a cost-effective option. The two together constitute a complete solution covering various application scenarios.
Known as the " world's best encoding model ", Opus 4 demonstrates sustained excellent performance in complex and long-term tasks, and can work continuously for hours while maintaining high-quality output. This feature makes it particularly suitable for complex development projects that require long-term focus.
Although Sonnet 4 is positioned slightly lower, its score of 72.7% on SWE-bench has surpassed most competitors, and it performs well in balancing performance and efficiency, providing an ideal choice for daily development.
Benchmark test: Ahead of competitors in all aspects
In an authoritative software engineering capability evaluation, the Claude 4 series models demonstrated impressive performance, surpassing major competitors including the Gemini 2.5 Pro in a number of key indicators.
• Opus 4 : SWE-bench score reached 72.5% , Terminal-bench reached 43.2% • High Compute Mode : Opus 4 and Sonnet 4 achieved scores of 79.4 % and 80.2% respectively
These data demonstrate an unprecedented level of capability in handling real-world programming tasks. Even more impressive, performance in "high compute" mode means that when paired with appropriate test-time computational methods, these models can solve nearly all common programming challenges.
Seamless integration of thinking and action
The most exciting innovation of the Claude 4 series is its " extended thinking and tool use " capability, which enables the model to flexibly call on tools during the deep thinking process, forming a closed loop of thinking-action-rethinking.
This capability means that AI is no longer limited to static knowledge, but can actively acquire information, verify assumptions and adjust its thinking based on new information, greatly improving its ability and efficiency in solving complex problems.
More notably, Claude 4 also supports parallel tool execution , which can handle multiple tasks at the same time, which means that your AI assistant can now advance multiple work threads at the same time like a real team member.
Memory breakthrough
Claude Opus 4 has achieved a revolutionary breakthrough in memory capabilities. When developers provide local file access, it can autonomously create and maintain " memory files " to store key information and build a knowledge base over time.
This feature completely changes the way AI assistants are used, transforming them from participants in short conversations to collaborative partners who can maintain long-term task awareness.
In actual testing, Opus 4 demonstrated amazing application cases, such as autonomously creating navigation guides while playing the Pokémon game, indicating that it has a certain form of " continuous learning " capability that can accumulate experience and optimize behavior in long-term tasks.
Claude Code is officially released
With the official release of Claude Code, the powerful capabilities of Claude 4 are seamlessly integrated into developers' daily workflows, covering every aspect from the command line to the integrated development environment.
The newly launched VS Code and JetBrains beta extensions allow Claude’s code editing suggestions to appear directly in your files, greatly simplifying the code review and collaboration process.
In addition to IDE integration, Claude Code also provides an extensible SDK that allows developers to build their own AI agents and applications. The GitHub integration makes code review and repair extremely simple. You only need to mark Claude Code on the PR, and it will respond to reviewer feedback, fix CI errors, or make code modifications.
API empowerment: Building more powerful AI agents
Anthropic has introduced four important new features at the API level to provide powerful support for developers to build advanced AI agents. Together, these features form a complete AI agent development ecosystem:
1. Code execution tools : Allow AI to directly run and test code 2. MCP connector : provides an easy way to integrate with external systems 3. File API : enables models to handle more complex documents and data 4. Prompt cache function : allows cache prompts for up to one hour, greatly improving system efficiency
The combination of these API capabilities enables developers to build AI agents with unprecedented autonomy and capabilities, bringing new possibilities for automation and intelligence to various industries.
Safer: Fewer shortcuts and exploits
The Claude 4 series of models achieves significant improvements in safety and reliability, reducing the occurrence of problematic behaviors by 65% on proxy tasks that are susceptible to shortcuts and vulnerabilities compared to Sonnet 3.7.
This improvement means that the model follows instructions more faithfully and does not try to complete tasks through shortcuts or loopholes, greatly improving its reliability and trustworthiness in critical tasks.
Anthropic has also implemented higher levels of AI safety measures, including ASL-3 protection , which minimizes the risk of use while ensuring the safety performance of the model through extensive testing and evaluation, making Claude 4 an ideal choice for scenarios that require high reliability.
From everyday programming to cutting-edge research
The Claude 4 series models are suitable for a wide range of scenarios, from daily coding assistance to complex scientific research projects, providing strong support for developers of different sizes and needs.
Opus 4 is ideal for pushing the boundaries of programming, research, writing, and scientific discovery, and its sustained high performance makes it an ideal companion for long-term, complex projects.
Sonnet 4 brings cutting-edge performance to everyday use scenarios, providing significantly improved support for routine development tasks as a seamless upgrade from Sonnet 3.7. Industry leaders such as GitHub , Cursor , Replit , etc. have integrated these models into their products and reported significant performance improvements, which proves the strong value of Claude 4 in practical applications.
Towards a virtual collaborative partner
Claude 4 represents a significant step toward a true virtual collaboration partner that can go beyond single tasks and maintain full context, sustain focus on long-term projects, and deliver transformative impact.
This advancement marks the transition of AI from a simple command-executing tool to a collaborative partner that can understand complex tasks, remember important information, and continuously provide value in long-term projects.
As these models are applied and developed, we can expect to see more innovative use cases and application scenarios emerge, and the way AI and humans collaborate will fundamentally change. For all AI developers, now is the best time to explore these new capabilities and incorporate them into products and services. Claude 4 has shown us the direction of future development of AI.