Agent revolution! Claude 4 continuously programs for 7 hours, setting a new world record

Written by
Iris Vance
Updated on:June-28th-2025
Recommendation

The birth of Claude 4 marks a major breakthrough in the field of agent programming.

Core content:
1. Opus 4 and Sonnet 4, two major versions of Claude 4, and their performance highlights
2. Two thinking modes of hybrid reasoning models and their application scenarios
3. New features and their impact on the developer programming experience

Yang Fangxian
Founder of 53AI/Most Valuable Expert of Tencent Cloud (TVP)

At 1 a.m. today , the famous large model platform Anthropic held its first developer conference and released its latest large model - Claude 4 .

Claude 4 has two versions: Opus 4 and Sonnet 4. Opus 4 is the world's top programming model, very good at handling complex and long-term reasoning tasks, especially in the Agent field. According to Rakuten test data, the programmed agent built with Opus 4 can work independently and stably for 7 hours continuously, surpassing the previous record set by OpenAI .

Sonnet 4 is an iterative version of  Sonnet 3.7  . It also excels in the field of programming, reaching 72.7% on  SWE-bench  , surpassing OpenAI's latest cutting-edge models such as Codex-1 and o3 .

Opus 4 and Sonnet 4 are hybrid reasoning models with two thinking modes: the standard thinking mode is used for quick response and is suitable for tasks that are time-sensitive or require immediate feedback.

The extended thinking mode allows the model to spend more time reasoning about the problem and generate more accurate and comprehensive answers through deeper thinking. The original intention of this design is to meet the needs of different scenarios. Users can flexibly switch between the two modes according to the complexity of specific tasks and the requirements for response speed.

Full press conference

However, compared with the previous Sonnet 3.7 , the performance of the extended thinking mode in Opus 4 and Sonnet 4 is significantly different. In Sonnet 3.7 , the original thinking process in the extended thinking mode is usually fully displayed unless encountering some extreme situations.

These two models have added a new "Thinking Summary" function: when the thinking process is too long, an additional small model will be used to summarize the thinking process. This summary method is very effective in practical applications, because only about 5% of the thinking process will trigger the summary mechanism, and in most cases users can still see the complete thinking process.

For developers who need a complete thought process and do not want to summarize, Anthropic also provides a developer mode in which developers can obtain the complete thought process without summarizing.

The extended thinking mode has shown its unique advantages in many scenarios. For example, when dealing with complex programming tasks, the model can use the extended thinking mode to deeply analyze the code logic, so as to more effectively find potential errors or optimization points. In-depth thinking in this mode can help developers better understand the structure and function of the code, thereby improving the quality and efficiency of the code.

In addition to the significant improvement in model performance, Anthropic also announced a series of new features to go with it. Extended thinking and tool use, these two models can use tools in the extended thinking process, such as web search, thus alternating between reasoning and tool use to improve the quality of answers.

The new model has the ability to execute tools in parallel, can follow instructions more accurately, and when developers grant it access to local files, the model can significantly improve its memory ability, extract and save key information to maintain continuity and accumulate tacit knowledge.

In addition, Claude Code is now officially open to all developers, supports background tasks through GitHub Actions , and has native integration with development tools such as VS Code and JetBrains , which can display edits directly in the user's files for a seamless collaborative programming experience.

Anthropic API also released four new features, including a code execution tool, an MCP connector, a file API , and the ability to cache prompts for up to one hour, which will help developers build more powerful AI agents.

It’s worth mentioning that Sonnet 4 will be available for free, but there will be some daily restrictions.

The source of this article is Anthropic. If there is any infringement, please contact us to delete it.

END