Claude3.7 late night explosion: "Programming + reasoning" double kill

Written by

Jasper Cole

Updated on:July-15th-2025

Preface

At 2:30 this morning, Anthropic suddenly released its ultimate weapon - Claude 3.7 Sonnet and the new Claude Code programming tool!

This hybrid reasoning model, which is said to be the "smartest to date", is expected to make the already hot AI programmer race even more intense.

Publish content

Hybrid Reasoning: One Model, Two Modes

Claude 3.7 SonnetFor the first time, the normal mode and the extended mode (inference mode) are combined into one:

Normal mode : smooth conversation, suitable for daily Q&A and creative writing (such as generating tear-jerking love stories with writing so delicate that they are indistinguishable from the real thing).
Extended mode : Use reverse reasoning and thinking chains to disassemble complex problems (such as the classic pirate gold coin game), specializing in hard-core scenarios such as mathematics, programming, and logical analysis.

Friends who have read my previous sharing should know that Cursor is currently one of my main productivity tools.

When using Cursor, Claude 3.5 Sonnet has always been my first choice. I personally feel that the quality of the generated data is far superior to GPT-4o. Now that the thinking mode has been added, I don’t know to what extent it will evolve.

Flexible API control

Claude 3.7 Sonnet first introducedThink about your budgetMechanism that allows developers to have fine-grained control over the model's thinking process through an API:

Dynamically adjust the depth of thinking : Users can set the upper limit of the tokens that the model thinks (up to 128K tokens are supported), and flexibly balance speed, cost, and answer quality. For example, simple tasks can be limited to 500 tokens for quick response, while complex math problems can open up more tokens for deep reasoning.
Seamless switching between two modes : Standard mode (quick response) and extended mode (deep thinking) share the same model architecture. Function calls can be implemented without switching interfaces. Developers only need to adjust parameters to adapt to different scenario requirements.
Cost transparency : API pricing follows the previous standard ($3/million tokens for input, $15/million tokens for output), and token consumption is included in the output cost to avoid hidden fees.

Although you no longer need to pay attention to the consumption details of the API after using Cursor, this part of the upgrade is still very authentic.

Claude Code

Simultaneously launched Claude Code Tools embed AI collaboration capabilities directly into developers’ workflows:

Terminal-level engineering agent : supports the entire process from code search, file editing to test running and Git submission, and can even call tool chains through the command line (such as Replit to build web applications).
Revolutionary improvement in efficiency : In early tests, the tool can save more than 45 minutes of manual work for a single task (such as refactoring a code base or fixing a complex bug).
Deep GitHub integration : All subscription plan users can connect their code repositories directly to Claude, and the model can provide accurate suggestions based on the complete project context (such as fixing version conflicts or generating API documentation).

Compared with Cursor, Claude Code may be more of an agent. It has no code review or editing interface. It generates the final application directly through dialogue and the "write-and-modify" thinking chain.

This model seems to be more in line with the future where everyone is a programmer.

Performance leap

Although Anthropic claims to focus on the optimization of AI application practices, its various ranking indicators are still very impressive.

Leading in coding capabilities: In the SWE-bench test, the extended mode set a new industry record with a 70.3% pass rate (compared to Claude 3.5 Sonnet's 62.3%), and demonstrated super stability when dealing with full-stack updates and complex dependencies.
Mathematical and scientific reasoning upgrades: In extended mode, the accuracy of math competition questions (such as AIME) is greatly improved, and the accuracy of physics problem solving is close to the level of human experts.
Multimodal and game testing breakthroughs: In the "Pokémon Red" simulation test, the model defeated three gym leaders through tens of thousands of virtual key interactions, verifying its long-term task processing capabilities.

Cursor Integration

Cursor already supports Claude 3.7 and is divided into two modes.

Summarize

I also pay attention to general model updates, but not too much. However, since Claude, which is the most frequently used in programming, has been updated, I must make good use of it. Let's look forward to the subsequent sharing~