ChatGPT o3 vs DeepSeek R1 performance comparison: which one is better?

Written by Audrey Miles

Updated on: July 10, 2025

ChatGPT o3 and DeepSeek R1

• ChatGPT o3 focuses on "deep reasoning." It aims to solve mathematics, programming, and science problems more efficiently by letting users adjust the reasoning effort (low, medium, or high). For the first time, the entry-level version, o3-mini, is available to free users, with the goal of expanding the user base and lowering the barrier to using AI.

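• DeepSeek R1 makes a "cost revolution" its core selling point. It is open source, was built on an extremely low training budget (the widely cited figure of about US$5.6 million is what DeepSeek reported for the final training run of its V3 base model, on which R1 is built), supports Chinese-made chips such as Huawei Ascend, and targets small and medium-sized developers and the enterprise market. It has been nicknamed the "Pinduoduo of the AI world," after the Chinese discount shopping platform.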

Performance comparison

1. Mathematical and scientific reasoning

• AIME 2024 mathematics competition: at high reasoning effort, o3-mini scores 87.3% versus R1’s 79.8%; at low effort, however, o3-mini drops to 60%, falling behind R1.

• Doctoral-level science questions (GPQA): o3-mini reaches a maximum accuracy of 79.7%, ahead of R1’s 71.5%; R1, however, reportedly makes fewer errors when processing unstructured data.

• General abstract reasoning (ARC-AGI): o3 scored 87.5% on the ARC-AGI benchmark in its high-compute configuration, above the 85% threshold commonly treated as human level; DeepSeek has not published a comparable result for R1.

2. Programming and engineering skills

• Code generation (SWE-bench Verified): o3 resolves 71.7% of tasks versus R1’s 49.2%, but code generated by R1 is reported to run more completely and stably (for example, without "clipping," where objects in a generated physics simulation pass through each other).

• Competitive programming (Codeforces): o3 has an Elo rating of 2727, well above R1’s reported rating of 2029.

3. Hallucination resistance and reasoning stability

• Bayesian reasoning experiment: o3-mini achieved the highest accuracy (88%) in the prompted condition, with concise, logically clear reasoning; R1 reached the correct conclusion, but its reasoning was long and muddled and used 3 to 10 times as many words as o3-mini.

• Safety audit: o3 filters harmful content through deep safety alignment, while R1 is vulnerable to jailbreak attacks.

How to use ChatGPT in China

To use ChatGPT in China, people usually go through a mirror website or share an account with others. You can follow me and send the keyword "share a house" for detailed information.