OpenAI releases its most powerful model, OpenAI o3-pro: industry reviewers find it very effective at solving complex problems, but it takes three minutes to reply to a "Hi"

Written by

Caleb Hayes

Updated on: June 13, 2025

OpenAI has officially released its latest model, OpenAI o3-pro, an enhanced professional edition of its flagship o3 model. o3-pro is designed for complex tasks that require "longer thinking". Its core highlight is extreme reliability and accuracy, especially in professional fields such as mathematics, science, and programming. Under the new "4/4 reliability" evaluation standard introduced by OpenAI, o3-pro far exceeds its predecessor. OpenAI emphasized that o3-pro represents a qualitative leap in handling high-difficulty, high-risk tasks.

OpenAI o3-pro model features
Comparison of OpenAI o3-pro and other models
The industry's most authoritative evaluation of OpenAI o3-pro
OpenAI o3-pro is already available for Pro users

OpenAI o3-pro model features

As the direct successor to o1-pro, the o3-based OpenAI o3-pro has a very clear design philosophy: between speed and reliability, it firmly chooses reliability. This makes it the tool of choice for complex problems with strict requirements on answer accuracy. In other words, this model is very slow. According to earlier user tests, even if you type "Hi, I am Sam Altman", the model takes 3 minutes to reply.

OpenAI also stated that o3-pro is not designed for routine exchanges like this. Its main features include:

Designed for Ultimate Reliability
The core design idea of o3-pro is to "think longer": the model invests more computing resources in analyzing the question in depth so that it can give the most reliable answer. OpenAI recommends choosing o3-pro when reliability matters more than speed and users are willing to wait a few minutes for a high-quality answer. It is not recommended for simple chat.
Significant Advantage in Professional Domains
Academic and expert evaluations show that o3-pro exceeds its base version, o3, in several key areas. Expert reviewers consistently preferred o3-pro's output in science, education, programming, business, and writing assistance, and rated it consistently higher for clarity, comprehensiveness, instruction-following, and accuracy.
Comprehensive Tool Integration
Like o3, o3-pro can seamlessly use a range of powerful tools to enhance its capabilities, including web search, file analysis, visual input understanding, Python code execution, and personalized responses using memory. In short, o3-pro is better at planning tasks and at using the tools you provide to solve problems.
Clear Limitations
In the early stages of release, o3-pro has some temporary functional limitations that developers and users should be aware of:

Temporary chats are currently disabled.
Image generation is not supported.
Canvas is not yet supported.

For the second point, OpenAI also gave an example in the form of a blind test, similar to anonymous voting: in scientific analysis, writing, computer use, and data analysis, human evaluators preferred the results of o3-pro by a significant margin.

As for the fourth point, OpenAI may have disabled temporary chats partly because of resource constraints and partly because it does not want a poor experience in ordinary chat to color people's view of o3-pro.

Comparison of OpenAI o3-pro and other models

So far there are only a few official o3-pro evaluations, mainly three: the difficult graduate-level science reasoning benchmark GPQA Diamond, the 2024 American Invitational Mathematics Examination (AIME 2024), and the competitive programming benchmark Codeforces. DataLearner compared o3-pro with the strongest earlier models:

Data source: DataLearnerAI website: https://www.datalearner.com/ai-models/ai-benchmarks-tests/compare-result?benchmarkInputString=32,37&modelInputString=587,578,576,575,574,508,558

As we can see, o3-pro outperforms both the regular version of o3 and DeepSeek R1 across the board. However, its GPQA Diamond score still trails the latest Gemini-2.5 Pro 0605 version; the Gemini model's benchmark results are very strong. With so little data published, no further comparison is possible yet.

In addition, OpenAI has introduced a more rigorous internal evaluation method to measure the reliability of the model under extreme situations.

According to the expert and academic evaluation results officially released by OpenAI, o3-pro surpasses both o1-pro and o3 across the board. To quantify its core advantage, OpenAI adopted the "4/4 reliability" evaluation standard. Under this standard, the model is counted as successful on a question only if it answers that question correctly in all four of four attempts, which is a demanding test of its stability and accuracy.

The following is a comparison of the performance of each model in this evaluation:

From the data, we can see that o3-pro's success rate in this rigorous test reached 80%, a significant improvement over the 65% of its predecessor, o1-pro, and far ahead of the base model o3. This result is strong evidence of o3-pro's reliability when tackling tough problems.
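The "4/4 reliability" criterion is easy to state in code. The following is a minimal sketch, not OpenAI's evaluation harness: the function names and the sample data are hypothetical and chosen only for illustration. It shows why the strict score is lower than ordinary per-attempt accuracy: a single miss among the four attempts fails the whole question.

```python
from typing import Dict, List


def per_attempt_accuracy(attempts: Dict[str, List[bool]]) -> float:
    """Fraction of all individual attempts that were correct."""
    total = sum(len(results) for results in attempts.values())
    correct = sum(sum(results) for results in attempts.values())
    return correct / total


def four_of_four_reliability(attempts: Dict[str, List[bool]], k: int = 4) -> float:
    """Fraction of questions answered correctly in all k attempts.

    Assumption: every question was attempted exactly k times.
    """
    for question, results in attempts.items():
        if len(results) != k:
            raise ValueError(f"{question!r} has {len(results)} attempts, expected {k}")
    passed = sum(1 for results in attempts.values() if all(results))
    return passed / len(attempts)


# Hypothetical results: question id -> correctness of each of four attempts.
sample_attempts = {
    "q1": [True, True, True, True],
    "q2": [True, True, False, True],   # one miss fails the whole question
    "q3": [True, True, True, True],
    "q4": [False, False, True, False],
    "q5": [True, True, True, True],
}

print(per_attempt_accuracy(sample_attempts))      # 16 of 20 attempts correct
print(four_of_four_reliability(sample_attempts))  # 3 of 5 questions pass 4/4
```

In this made-up sample the model is right on 80% of individual attempts but passes only 60% of questions under the 4/4 rule, which is why the strict metric separates models that look similar on single-attempt accuracy.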

The industry's most authoritative evaluation of OpenAI o3-pro

Latent Space is a media outlet and blog focused on artificial intelligence (AI), particularly large language models (LLMs) and their applications. As one of the first in the industry to get early access to o3-pro, its authors used the model for some time and published a blog post sharing their views. One of the core views is:

The power of o3-pro cannot be demonstrated through simple questions and answers or chats. The correct way to use it is "non-conversational": users need to provide it with massive, high-quality context, set a clear goal, and then let the model work autonomously like a "report generator".

For example, when the author and his co-founder provided o3-pro with all of the company’s historical planning meetings, goals, and even voice memos as context, the model generated an extremely specific, actionable business plan with target metrics, timelines, and priorities that was deep and insightful enough to change their thinking about the company’s future. In contrast, the plan generated by the standard version of o3 was reasonable but more general.
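The "report generator" style of use described above can be sketched as a small prompt builder. This is only an illustration under stated assumptions: `ContextDoc`, `build_report_prompt`, and the section headings are hypothetical names, not part of any OpenAI SDK or of the Latent Space post, and sending the resulting prompt to the model is deliberately left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ContextDoc:
    """One piece of background material, e.g. meeting notes or a memo transcript."""
    title: str
    text: str


def build_report_prompt(goal: str, docs: List[ContextDoc], deliverable: str) -> str:
    """Assemble one large, non-conversational prompt: goal, all context, deliverable."""
    if not docs:
        raise ValueError("provide at least one context document")
    sections = [f"## {doc.title}\n{doc.text.strip()}" for doc in docs]
    parts = ["# Goal", goal.strip(), "# Context", *sections, "# Deliverable", deliverable.strip()]
    return "\n\n".join(parts)


# Hypothetical inputs for illustration only.
docs = [
    ContextDoc("Q1 planning meeting", "Notes from the planning meeting..."),
    ContextDoc("Founder voice memo", "Transcript of the voice memo..."),
]
prompt = build_report_prompt(
    goal="Propose next year's company plan.",
    docs=docs,
    deliverable="A prioritized plan with target metrics and timelines.",
)
print(prompt)
```

The design choice here is to put everything into a single request rather than a chat: one clear goal, as much relevant context as is available, and an explicit description of the expected output.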

o3-pro has made significant progress in "tool use". It can better understand its own environment and limitations, know when to ask questions to obtain external information (rather than pretending to know), and can more accurately choose the right tools to complete the task. The author calls it an excellent "orchestrator".

However, o3-pro tends to "overthink" if not given enough context. It is good at analyzing and getting things done with tools, but may not perform as well as standard o3 when it comes to directly executing certain specific tasks (such as specific SQL queries).

OpenAI o3-pro is already available for Pro users

OpenAI is rolling out o3-pro to different user groups in stages:

ChatGPT Pro and Team users: o3-pro has been available directly in the model selector since June 10, 2025, where it replaces the original o1-pro. It is currently available to all Pro users.
API users: o3-pro is also available in the API, so developers can start integrating it right away.
Enterprise and Edu users: Access will be granted within the next week.

The release of OpenAI o3-pro is not about faster responses but about meeting the needs of AI applications in serious, complex scenarios. It is a tool designed for extreme reliability. Its strong performance in professional fields such as science and programming, together with its clear lead in the new "4/4 reliability" evaluation, demonstrates its core value in handling high-risk, high-value tasks. For developers and professionals who care far more about answer accuracy than speed, o3-pro is a very powerful new option in the current market.