OpenAI strikes back at DeepSeek: newly released Deep Research model sets a new record

Written by

Audrey Miles

Updated on: July 17, 2025

At 8 a.m. this morning (China time), OpenAI's Tokyo office held a technical livestream and released a new model, Deep Research.

Unlike traditional large models, Deep Research can break complex tasks down step by step, as a human analyst would, and carry out multiple rounds of searching and verification on the internet. It adjusts its research direction and strategy based on the information it has gathered, and keeps digging into the problem until it arrives at the most appropriate answer.

For example, when handling a research task on a specific market trend, the model first gathers preliminary information through keyword searches. Based on that, it searches further for relevant industry reports, statistics, and expert opinions, compares information from different sources, and finally produces a comprehensive research report.
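
The search-then-refine loop described above can be sketched in a few lines of Python. This is only an illustration, not OpenAI's implementation: the in-memory FAKE_INDEX, the search and next_queries helpers, and the max_rounds cap are all hypothetical stand-ins for a real search backend and a learned policy.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory "web": query -> list of (source, snippet) pairs.
# A real system would call a search API; this stub keeps the sketch runnable.
FAKE_INDEX = {
    "ev market trend": [("news-site", "EV sales grew in 2024; see industry report")],
    "industry report": [("analyst-firm", "Report: battery costs fell; see statistics")],
    "statistics": [("stats-agency", "Registrations rose year over year")],
}


@dataclass
class ResearchState:
    findings: list = field(default_factory=list)  # (source, snippet) pairs
    visited: set = field(default_factory=set)     # queries already issued


def search(query):
    """Look up a query in the stub index."""
    return FAKE_INDEX.get(query, [])


def next_queries(snippet):
    """Toy rule: a snippet that mentions a known topic triggers a follow-up search."""
    return [q for q in FAKE_INDEX if q in snippet.lower()]


def research(initial_query, max_rounds=5):
    """Search, read the results, and queue follow-up queries for later rounds."""
    state = ResearchState()
    frontier = [initial_query]
    for _ in range(max_rounds):
        if not frontier:
            break
        query = frontier.pop(0)
        if query in state.visited:
            continue
        state.visited.add(query)
        for source, snippet in search(query):
            state.findings.append((source, snippet))
            frontier.extend(q for q in next_queries(snippet) if q not in state.visited)
    return state


def report(state):
    """Format the collected findings as a plain-text list."""
    return "\n".join(f"- [{source}] {snippet}" for source, snippet in state.findings)


print(report(research("ev market trend")))
```

Each round uses what was just read to decide what to search for next, which is the core of the behavior the article describes; comparing sources and writing the final report are reduced here to a simple listing.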

Notably, OpenAI made a rare direct comparison with DeepSeek-R1, the globally popular open-source model. Deep Research scored 26.6% on Humanity's Last Exam, 2.8 times R1's score, and beat the previous best record of 18.2%.

In fact, it is the weekend in the United States. Even though the Tokyo office started the livestream at 9 a.m. local time (Tokyo is one hour ahead of China), it was still quite an effort.

By OpenAI's past practice, major technical product releases usually happen on Tuesdays, which shows how much of an impact DeepSeek has had. OpenAI appears to be preparing a full-scale counterattack; the intent is apparent from the model's name alone.

A brief introduction to Deep Research

Based on the livestream, "AIGC Open Community" briefly introduces the technical features and advantages of Deep Research.

Deep Research is built on OpenAI's o3 model and has been optimized and fine-tuned for a variety of specific tasks.

End-to-end reinforcement learning is the key to Deep Research. Traditional machine learning approaches to complex tasks often have to be split by hand into multiple stages that are trained and optimized separately, whereas end-to-end reinforcement learning lets the model learn and optimize as a whole, from input to output.

Through this training method, Deep Research has learned to plan and execute multi-step research trajectories. Faced with a complex research topic, it can draw up a sensible research plan as a human researcher would: it first decides which channels to get information from, then analyzes what it finds to decide the next research direction.

If it discovers during the research that it has drifted from the original plan, it can backtrack and readjust its strategy, as an experienced researcher would, so that the final results are accurate and valuable.

During this training process, the model continuously interacts with the environment and learns the optimal behavior policy from the feedback it receives. When browsing the web, the model decides whether to read a page in depth and how to extract useful information from it, based on factors such as the relevance and credibility of the content.

This ability to make decisions and adjust based on real-time information is an important reason Deep Research can complete complex research tasks efficiently.

In addition to end-to-end reinforcement learning, removing the limit on the model's response time is another important technical breakthrough in Deep Research. Because they aim for fast responses, traditional large models can only scratch the surface of complex problems and cannot think and analyze in depth.

Deep Research removes this limitation, allowing the model to spend 5 to 30 minutes, or even longer, on a problem. This gives the model enough time to screen, analyze, and integrate large amounts of online information, and so produce more comprehensive, in-depth, and accurate research results.
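
One simple way to picture a longer time budget is a loop that keeps running research steps until a deadline passes. The sketch below is an assumption for illustration only: run_with_budget and its clock parameter are hypothetical names, and the livestream did not describe how the real system schedules its work.

```python
import time


def run_with_budget(steps, budget_seconds, clock=time.monotonic):
    """Run callables from `steps` in order until they run out or the budget is spent.

    `clock` is injectable so the behavior can be checked without waiting.
    """
    deadline = clock() + budget_seconds
    results = []
    for step in steps:
        if clock() >= deadline:
            break  # out of time: stop and return what has been gathered so far
        results.append(step())
    return results


# Example: a 30-minute budget would be budget_seconds=30 * 60.
steps = [lambda: "collect regional data", lambda: "compare time periods"]
print(run_with_budget(steps, budget_seconds=30 * 60))
```

A larger budget simply lets more steps run before the loop stops, which is the trade-off the article describes between fast responses and deeper analysis.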

For example, in market research, the model can take the time to collect market data from different regions and time periods and so make more accurate predictions of market trends.

In academic research, it can study a large body of literature in depth, explore potential connections between different studies, and give researchers more valuable research ideas.

Deep Research Main Modules

The Deep Research model consists of multiple modules, somewhat like layered AI agents working together. The information discovery module can quickly locate information sources such as websites, documents, and databases, and extract valuable leads from them. When a user wants to know the latest research progress on a specific disease, for example, the module quickly searches academic databases, research institutions' websites, medical forums, and other platforms for relevant papers, research reports, and expert opinions, providing rich material for subsequent analysis and synthesis.

The information discovery module also has strong screening capabilities. It performs a preliminary screening of search results along multiple dimensions, such as keywords, semantic relevance, timeliness, and credibility, and excludes information that is irrelevant to the user's question or of low value, which greatly improves the efficiency and quality of information processing. During screening, it uses natural language processing to analyze the content and understand its meaning, so that the retained information closely matches the user's needs.
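
A multi-dimensional screen of this kind can be sketched as a plain filter. Everything here is an assumption for illustration: the Document fields, the thresholds, and the use of simple substring matching in place of the semantic analysis the article mentions.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Document:
    title: str
    text: str
    published: date
    credibility: float  # 0.0 to 1.0, assumed to come from some upstream rating


def screen(docs, keywords, today, max_age_days=365, min_credibility=0.5):
    """Keep documents that mention a keyword, are recent, and are credible enough."""
    kept = []
    for doc in docs:
        text = doc.text.lower()
        on_topic = any(k.lower() in text for k in keywords)   # keyword dimension
        recent = (today - doc.published).days <= max_age_days  # timeliness dimension
        credible = doc.credibility >= min_credibility          # credibility dimension
        if on_topic and recent and credible:
            kept.append(doc)
    # Most credible first; among equally credible documents, newest first.
    return sorted(kept, key=lambda d: (-d.credibility, -d.published.toordinal()))
```

A document has to pass all three checks to survive, so an old but authoritative review and a recent but unreliable forum post are both dropped.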

The information synthesis module integrates and organizes information from different channels, identifies the logical relationships between pieces of information, and assembles scattered information into a coherent whole.

For example, in science and technology research, the information synthesis module may combine information on a new technology's principles, application cases, and development trends into a systematic technical report. In this process it integrates not only text but also processes and analyzes images, tables, and data, making the final research results richer and more comprehensive.

The information synthesis module can also distill information: it extracts the key points from a large volume of material and removes redundancy, making the research results more concise and clear. When dealing with a lengthy academic paper, it can accurately extract the core ideas, research methods, main conclusions, and other important content, helping users quickly grasp the essence of the paper and save time on reading and analysis.

The reasoning module is one of Deep Research's core components and can think and judge much as a human does. When faced with complex problems, it can use logical reasoning, knowledge graphs, and other techniques to analyze and reason in depth about the collected information. When answering scientific questions, it works through the problem step by step from known scientific principles and facts to reach a reasonable conclusion. When analyzing market trends, it combines historical data, market dynamics, industry policies, and other information, and uses economic principles and data analysis methods to predict where the market is heading.

The reasoning module can also correct and optimize itself. If, during reasoning, it finds that new information is inconsistent with its earlier results, it re-examines the reasoning process and adjusts its strategy so that the final conclusion is more accurate and reliable. When studying a historical event, for example, as new historical material is discovered, the reasoning module revises and refines its earlier conclusions based on that material so that the results are more consistent with historical fact.
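
The idea of revising a conclusion when new evidence disagrees with it can be illustrated with a small evidence tracker. This is a deliberately simple sketch: the Conclusion class and its additive weighting are hypothetical, and the article does not say how the real reasoning module weighs evidence.

```python
from collections import defaultdict


class Conclusion:
    """Tracks weighted evidence per candidate answer and revises the current pick."""

    def __init__(self):
        self.support = defaultdict(float)  # candidate answer -> accumulated weight
        self.current = None                # best-supported answer so far
        self.revisions = []                # (old, new) pairs, kept for auditability

    def add_evidence(self, answer, weight):
        """Record new evidence and switch conclusions if another answer now leads."""
        self.support[answer] += weight
        best = max(self.support, key=self.support.get)
        if best != self.current:
            if self.current is not None:
                self.revisions.append((self.current, best))
            self.current = best
        return self.current


c = Conclusion()
c.add_evidence("account A", 0.6)   # first source supports account A
c.add_evidence("account B", 0.5)   # a conflicting source, not yet enough to switch
c.add_evidence("account B", 0.4)   # newly found material tips the balance
print(c.current, c.revisions)
```

Keeping the list of revisions makes it possible to show the user not just the final conclusion but also when and why it changed.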

The output module of Deep Research aims to provide users with high-quality research results. It can output results in different formats, such as reports, papers, and charts, according to user needs. When a user needs a market analysis, for example, the output module can generate a market research report with a standard format and detailed content, including clear text descriptions, intuitive charts, and accurate data citations, which makes it easy for users to make decisions and report on them.
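
As a rough picture of format-aware output, the same findings can be rendered as a numbered text report, JSON, or CSV. The render function and the 'claim' and 'source' keys are assumptions made for this sketch; they are not part of any published Deep Research interface.

```python
import csv
import io
import json


def render(findings, fmt="text"):
    """Render findings (a list of dicts with 'claim' and 'source' keys) in one format."""
    if fmt == "text":
        return "\n".join(
            f"{i}. {f['claim']} (source: {f['source']})"
            for i, f in enumerate(findings, start=1)
        )
    if fmt == "json":
        return json.dumps(findings, indent=2)
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=["claim", "source"])
        writer.writeheader()
        writer.writerows(findings)
        return buf.getvalue()
    raise ValueError(f"unsupported format: {fmt}")


findings = [
    {"claim": "Demand rose in the last quarter", "source": "industry report"},
    {"claim": "Prices were stable", "source": "statistics office"},
]
print(render(findings, "text"))
```

Keeping each claim paired with its source in the underlying data is what allows every output format to carry citations along with the text.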

Deep Research Test Data

"Humanity's Last Exam", released by the Center for AI Safety and Scale AI, is a benchmark covering a wide range of knowledge. It contains about 3,000 short-answer and multiple-choice questions across about 100 disciplines. On this test, Deep Research reached an accuracy of 26.6%, surpassing well-known open- and closed-source models such as R1, o1, and Grok-2.

GAIA mainly measures a model's agentic capabilities. It places strict demands on web browsing, multimodal ability, code execution, and file reasoning, and has three difficulty levels. Deep Research set new highs at all three difficulty levels of the GAIA test.
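
To put the reported figures in perspective, the arithmetic can be worked through directly. The inputs below are the numbers quoted in this article; because the question count is only "about 3,000" and the ratio is rounded, the derived values are approximations, not official results.

```python
total_questions = 3000        # "about 3,000" per the article
deep_research_score = 26.6    # percent accuracy reported for Deep Research
previous_best_score = 18.2    # percent, previous record quoted in the article
ratio_to_r1 = 2.8             # Deep Research score as a multiple of R1's score

# Roughly how many questions a 26.6% accuracy corresponds to (about 798).
approx_correct = round(total_questions * deep_research_score / 100)

# R1 score implied by the 2.8x ratio (about 9.5 percent).
implied_r1_score = deep_research_score / ratio_to_r1

# Improvement over the previous record in percentage points (about 8.4).
gain_over_previous_best = deep_research_score - previous_best_score

print(approx_correct, round(implied_r1_score, 1), round(gain_over_previous_best, 1))
```

Even at the record level, roughly three out of four questions in the benchmark remain unanswered correctly, which shows how difficult the test is.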

In addition, OpenAI has designed a series of internal benchmarks covering practical application scenarios such as market research, academic research, and consumer decision-making. On expert-level tasks, Deep Research can complete work that would take human experts hours.

Deep Research will soon be available to Pro users and will then be expanded to Plus and Team users.