DeepSeek R1 is now open source: performance close to OpenAI o3, with programming capabilities that are stunning the AI world

DeepSeek R1 is now open source, with performance comparable to OpenAI o3 and a major breakthrough in AI programming capability.
Core content:
1. DeepSeek R1-0528 is now open source, with performance close to OpenAI o3
2. Amazing programming capabilities, from code generation to test cases in one go
3. Notable features including industrial-grade code quality, automatic test-case generation, and enhanced debugging awareness
This morning, DeepSeek, the world-renowned open-source large-model lab, once again shocked the AI community by quietly open-sourcing the latest version of R1, 0528. The move continued DeepSeek's usual low-key style: no official announcement, no detailed explanation; the model was simply uploaded to the Hugging Face platform.
The AI community, however, has sharp eyes. Within a few hours, the new R1 had sparked widespread discussion and testing. According to feedback from multiple independent testers, the new R1 matches OpenAI's latest o3 (high) model on several key metrics, especially programming, marking another major breakthrough for open-source large models.
According to preliminary results from the well-known coding benchmark LiveCodeBench, DeepSeek R1-0528 performs on par with the OpenAI o3 model on programming tasks. An unnamed AI researcher said after testing:
"We originally thought that o3 would be an insurmountable peak in the short term, but the performance of DeepSeek R1-0528 completely overturned this expectation."
It is particularly noteworthy that in code-generation tasks, the new R1 not only produces functionally complete code but also automatically generates accompanying test cases. Such end-to-end programming capability was previously seen only in OpenAI's top models.
The co-founder and CEO of Hyperbolic Labs pointed out on social media that the new R1 is still the only AI model that can correctly answer the classic trap question "Which is bigger, 9.9 or 9.11?" That detail reflects the model's significant progress in logical reasoning.
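For readers unfamiliar with the trap: the numerically correct answer is 9.9, since 9.90 > 9.11; models often stumble by reading the values as version numbers. A one-line sanity check in Python (purely illustrative, not part of any benchmark):

```python
# As decimals, 9.9 is the larger value: 9.90 > 9.11.
# The trap is reading them like version numbers, where "9.11" would come after "9.9".
print(9.9 > 9.11)   # True
print(9.9 - 9.11)   # roughly 0.79 (floating-point rounding aside)
```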
Amazing programming capabilities: from code generation to test cases in one go
The test case shared by AI reviewer Haider was particularly eye-catching. He designed a programming challenge for a word-scoring system, and the new R1 performed impressively. Unlike ordinary models that only return code snippets, R1-0528, after a brief round of "thinking", directly delivered two files:
✅Main program code with clear structure and complete comments
✅Test cases covering various boundary conditions
Even more surprising is that the code and test cases passed perfectly on the first run without any errors.
"I've only seen this level of programmability in the o3 model before," Haider said. "But now, an open source model can do this, which is definitely a game changer."
Analyzing the cases shared by multiple testers, several standout characteristics of the new R1 in programming tasks emerge:
- Industrial-grade code quality: the output no longer stops at textbook examples but accounts for the needs of real production environments.
- Automatic test-case generation: this reflects the model's deep understanding of code reliability.
- Markedly stronger debugging awareness: when problems occur in the generated code, the model can autonomously diagnose and correct them.
Well-known technology blogger **"AI Explorer"** pointed out after testing:
"The Python code generated by R1-0528 includes details such as exception handling and logging that professional developers would consider, which is far beyond the level of general open source models."
These advances make the new R1 more reliable in solving real-world programming problems, greatly enhancing its practical value.
It is worth noting that many testers observed the new R1's "thinking time" seems longer than the previous generation's. AI researcher Zhang Ming (a pseudonym) offered this analysis:
"This is not a sign of degraded performance. On the contrary, the model may be doing deeper reasoning and verification. Judging by the results, the extra 'thinking' does translate into a significant improvement in quality."
This behavior closely mirrors that of the OpenAI o3 model, further underscoring how close their capabilities have become.
A senior member of the programming community HackerRank lamented after testing:
"If this is the current state of open source models, how long can proprietary models maintain their advantages? This question is worth pondering for all AI companies."
Style and reasoning: comprehensive progress toward top commercial models
Beyond hardcore programming capability, the new R1 also shows striking similarities to OpenAI o3 in response style and reasoning. A careful comparison of the two models' outputs shows that R1-0528 has picked up o3's distinctive professional style:
- Using arrows and asterisks to organize information
- Explaining complex concepts in a layered way
- Adding a summary paragraph at the end to strengthen persuasiveness
A linguist who has long studied AI writing style pointed out:
"This consistency is no accident; it reflects the new level of sophistication in DeepSeek's model training and tuning."
In chain-of-thought correction, the new R1's performance is particularly noteworthy. Tests show that when its initial reasoning goes astray, the model can, like o3, autonomously detect and correct the faulty line of thought. This self-monitoring ability is extremely rare among open-source models.
Even more surprising, R1-0528 also demonstrated a creative worldview-building ability similar to Anthropic's Claude. In a test set in a fictional world, the model not only designed a complete worldview framework but also created logically consistent behavior patterns for characters from different cultural backgrounds, an ability entirely absent from the previous-generation R1.
Comparison with today's top commercial models is equally striking: Anthropic's recently released Opus 4 performs only slightly better than R1-0528 on the same programming tasks.
AI product manager Lisa Chen commented:
"Considering that Opus 4 is the leading commercial model and R1 is free and open source, this small difference is a huge win in itself."
It is particularly noteworthy that in some programming tasks requiring creative solutions, R1-0528 can even propose more innovative implementations than Opus 4, showing that the open-source model can challenge commercial models in specific areas.
Strategic thinking behind version naming: R1 or R2?
The superior performance of the new R1 raises an interesting question: Why didn't DeepSeek name this version, which is clearly superior to its predecessor, R2?
Many industry observers have expressed their opinions. AI strategy consultant Wang Tao believes:
"This may be a product strategy - DeepSeek has recently released several breakthrough products. If this update is named R2, it may raise users' expectations for the next version and cause unnecessary pressure. Positioning it as a major update to R1 not only reflects progress, but also leaves room for imagination for the real R2."
Another view is that the cautious version naming reflects DeepSeek's rigorous attitude towards technology evaluation. Machine learning engineer Li Mingyuan pointed out:
"Model evaluation is a comprehensive task, and programming ability is only one dimension. DeepSeek may still be verifying performance in other aspects, so it chose a conservative version name."
Regardless of the considerations behind the naming, one indisputable fact is that the new R1 has raised the bar for open-source large models to a new height. Its arrival not only narrows the gap between open-source and commercial models but, more importantly, gives the entire AI community a high-quality base model that can be freely studied and improved.
As one open source advocate put it:
"Every time there is such progress, it is a powerful boost to the democratization of AI. When open source models can reach the level of commercial products, the speed of innovation in the entire industry will be greatly accelerated."
A milestone for the open-source ecosystem: community response and future prospects
The open-source release of the new R1 immediately drew an enthusiastic response from the developer community. Several projects built on R1-0528 have already appeared on GitHub, covering code-generation assistance, technical-documentation writing, educational applications, and more.
A developer who participated in early testing shared:
"After integrating R1 into our development process, code review time has been reduced by about 30% because the code it generates is already well-formulated."
The rapid emergence of such benefits underscores the new R1's practical value.
The technical community is also awaiting DeepSeek's official model card, which typically details training data, architecture, intended use, and limitations, information that is crucial for researchers and developers to use the model correctly.
Professional platforms such as "AIGC Open Community" have stated that they will provide in-depth interpretation as soon as the official information is released to help users fully understand and utilize this powerful new tool.
From a broader perspective, the successful open-sourcing of DeepSeek R1-0528 once again demonstrates the key role of Chinese AI teams in the global open-source ecosystem. At a time when companies such as OpenAI and Anthropic are increasingly leaning toward closed-source business models, DeepSeek's commitment to open-sourcing high-quality models provides valuable infrastructure for AI researchers worldwide.
This spirit of openness and sharing is one of the core driving forces behind the healthy development of artificial intelligence technology. As more developers and companies begin to adopt and improve R1-0528, we have reason to expect to see more innovative application scenarios and further performance breakthroughs.