ChatGPT o3 vs DeepSeek R1 performance comparison, which one is better?

The latest AI performance showdown, ChatGPT o3 and DeepSeek R1, which one is better?
Core content:
1. The core capabilities and market positioning of ChatGPT o3 and DeepSeek R1
2. Performance comparison in various fields: mathematical science reasoning, programming engineering capabilities
3. How to use ChatGPT in China, and related resource recommendations
ChatGPT o3 and DeepSeek R1
• ChatGPT o3 focuses on "deep reasoning" capabilities, optimizing the efficiency of solving mathematics, programming and scientific problems by dynamically adjusting the reasoning intensity (low/medium/high). For the first time, the basic version (o3-mini) is open to free users, aiming to expand the user base and lower the threshold for using AI.
• DeepSeek R1 takes "cost revolution" as its core selling point, adopts an open source ecosystem and extremely compressed training costs (only US$5.6 million), adapts to domestic chips (such as Huawei Ascend), and focuses on small and medium-sized developers and the enterprise market. It is called the "Pinduoduo of the AI world."
Performance comparison
1. Mathematical and Scientific Reasoning
• AIME 2024 Mathematics Competition : o3-mini’s accuracy rate under high reasoning intensity is 87.3% vs R1’s 79.8%; but in low-intensity mode, R1 (71.5%) surpasses o3 (60%).
• Doctoral-level scientific questions (GPQA) : o3 has a maximum accuracy of 79.7%, slightly better than R1’s 71.5%; however, R1 has a lower error rate in unstructured data processing.
• Interdisciplinary comprehensive capabilities : o3 achieved 87.5% accuracy in the ARC-AGI test (the human level threshold is 85%), while DeepSeek did not disclose similar data.
2. Programming and engineering skills
• Code generation (SWE-bench) : o3 scored 71.7 vs R1’s 71.6, but the code generated by R1 has better execution integrity and stability (such as no "penetration" problem).
• Competitive Programming (Codeforces) : o3 Elo score is 2727, significantly higher than R1 (specific value not disclosed).
3. Anti-hallucination and reasoning stability
• Bayesian reasoning experiment : o3-mini had the highest accuracy rate (88%) under the prompt condition, and the reasoning process was concise and logically clear; R1 had the correct conclusion but the process was lengthy and confusing, and the number of words used was 3-10 times that of o3.
• Security audit : o3 filters harmful content through deep alignment technology, while R1 has a jailbreak attack vulnerability.
How to use ChatGPT in China
To use chatgpt in China, you usually go through a mirror website or share a house. You can follow me and send " share a house " to get detailed information.