Drink Some VC | Sequoia talks with the OpenAI Deep Research team: AI agents will be this year's biggest breakthrough technology, and reinforcement learning will return to the mainstream

How AI agents can revolutionize business and personal application scenarios through end-to-end reinforcement learning.
Core content:
1. The application and advantages of end-to-end reinforcement learning in complex tasks
2. The optimization effect of AI agents on business processes and personal life
3. The latest progress and future plans of the OpenAI Deep Research team
Z Highlights
Deep Research uses end-to-end reinforcement learning to improve agent performance on complex search and reasoning tasks, making it more efficient and accurate than previous approaches.
The model is already widely used in market analysis, medical research, and code development, and will provide automated solutions in more professional fields in the future.
AI agents will not only optimize business processes but also help individuals with shopping, travel planning, and learning, improving quality of life.
OpenAI plans to extend Deep Research to private data search and to further strengthen its analytical capabilities, advancing the evolution of AI agent systems.
Reinforcement learning fine-tuning has become an important method for building powerful AI agents, significantly improving their reasoning and decision-making in open-ended environments.
Training Data is a podcast on AI research and innovation, hosted by Sonya Huang and Lauren Reeder of Sequoia Capital. This episode invites Isa Fulford and Josh Tobin of OpenAI to discuss how their latest agent, Deep Research, breaks with traditional AI research methods through end-to-end reinforcement learning and compresses hours of knowledge work into minutes, transforming both business and personal applications.
The origin and technological innovation of Deep Research
Josh: In AI, people have learned the same lesson over and over. Initially, we thought that by writing code ourselves we could build systems smarter than the models. As the field developed, it became clear that models often arrive at better solutions than humans do. A fundamental principle of machine learning is that "what you optimize determines what you get." So if you can build a system that directly optimizes for the target result, it will usually outperform a combination of components that were not end-to-end optimized for the task. My long-held view is therefore that reinforcement-learning-style strategies, that is, tuning the model itself, may be a key part of building the most powerful AI agents.
Sonya: We are very honored to have Isa Fulford and Josh Tobin, the leads of OpenAI's "Deep Research" product. "Deep Research" was released three weeks ago and has quickly won market acclaim, being used by many well-known people in the technology industry (such as the Carltons) across a variety of fields, including industry analysis, medical research, and event planning.
"Deep Research" uses end-to-end reinforcement learning methods to train for complex web browsing and reasoning tasks. This product is the second in the Agent product series launched by OpenAI , the first product is "Operator" . In this interview, we will discuss with Isa and Josh the use cases of "Deep Research" , its technical architecture, and the outlook for OpenAI 's future Agent products. Isa and Josh , welcome to the show.
Lauren: Thank you for coming, we are very much looking forward to today's communication.
Isa: I am very happy to participate in this interview. Thank you for the invitation.
Lauren: Let’s start with a basic introduction to the Deep Research product. Could you please tell us about its development background and key features?
Isa: "Deep Research" is an AI agent that retrieves information from many online sources and generates extremely detailed reports. Work that would take a human hours of collation, it can answer in just 5 to 30 minutes inside ChatGPT. It can therefore research more deeply and answer user questions in a more detailed, better-sourced way than ordinary ChatGPT. It is one of the first agents we released, alongside the recently launched "Operator" agent; a further agent, "Shards Seeker", will be released in the future, and functionality will continue to expand.
Sonya: What is the background of the development of the “Deep Research” product? What was the initial development decision? Where did the inspiration for the project come from? How big is the team? How long did the project take to complete?
Isa: About a year ago, we made a major breakthrough with a new reasoning paradigm, training models to think before they answer. At the time, our main research areas were mathematics and science, but this new reasoning mechanism also opened up the possibility of solving longer-horizon, more complex agent tasks.
We found that many tasks require a lot of online research and external information acquisition, and these tasks involve a lot of reasoning and information screening, and also require strong creativity to complete efficiently. With the advancement of technology, we finally have a model training method that can handle these tasks. Therefore, we decided to explore how to train models to perform browsing tasks and adopt a training method similar to that of inference models, but more in line with actual scenarios.
Sonya: How did Josh get involved in the project?
Isa: Initially, Josh Patel and I were working on a similar study and were planning to publish the results in due course. We were very excited about it. At the time, we developed an initial demo and worked with engineer Thomas Simpson. Thomas is an excellent engineer who is good at digging deep into complex problems and driving continuous product optimization. The whole process was very interesting and productive.
Josh: Yes, I rejoined OpenAI about six months ago, after working on my own startup. During my early weeks back I looked at a number of projects and was particularly interested in research related to AI agents, one of which was "Deep Research".
Lauren: That’s awesome that you’re involved. So, tell us, who is the target user for this product?
Josh: Yes, this product is for anyone who does knowledge work, whether in their day-to-day job or their personal life. We find that many users mainly use it for market research, business analysis, real estate research, and so on.
Isa: In addition, we have also observed that this product is widely used in scientific research and medicine. For example, medical researchers use it to find relevant literature and data support.
Josh: What makes us particularly excited is that the product not only performs well in the professional field, but also has significant value in personal life scenarios. For example, users can use it to shop or plan a trip without spending a lot of time searching the Internet and sorting out information.
Isa: Therefore, we are looking forward to the upcoming Plus version, which will enable more people to experience the capabilities of “Deep Research” and may open up new usage scenarios.
Lauren: This has definitely been one of the products I’ve used the most in the last few weeks. It works like a charm.
Sonya: What do you use it for?
Lauren: For me, I'm considering buying a new car and want to know when the next-generation version of that model will be released. There are many speculative blog posts about new models, but the information is often inconclusive.
So I asked Deep Research to collate all the rumors about the model and analyze their factual basis, along with the automaker's past official statements and product release patterns. It produced a very detailed report concluding that the model will likely be released in the next few months. That analysis greatly helped my decision.
Josh: Indeed, the strength of this product is not only broad information gathering on a topic, but also the ability to dig up extremely subtle and rare facts on the Internet. If a user is looking for the answer to a very specific question whose information does not appear on the first page of search engine results, Deep Research performs particularly well.
Lauren: That sounds great. So what are some of the unexpected use cases you’ve seen?
Isa: What surprised me the most was how many users used it for programming-related searches. This wasn't a use case we originally envisioned, but on Twitter I saw a lot of developers using it to find code, search programming documentation, and even automate script writing. It's particularly good at finding the latest documentation or technical guides for a given software package, helping developers complete programming tasks efficiently.
Josh: Yes, this is perhaps an obvious use case for technical users of ChatGPT . However, it is still impressive how well the product actually performs in this area.
Application scenarios and user experience of Deep Research
Sonya: How do you see the adoption ratio between enterprise users and individual users? Over time, do you think the product will be more oriented towards business use or will it be mainly for ordinary consumers?
Isa: I think both will play an important role. This is a very universal technology that can meet the needs of users in work and personal life.
Josh: Yes, I am very excited about both of these applications. The biggest value of this product is saving users time. If a task usually takes hours or even days to complete, "Deep Research" can often provide 90% or more of the core information in a short time. Demand for such tools may be more prominent in enterprise environments, but it will also gradually become part of personal life as an important aid to everyday decision-making.
Lauren: Indeed, when I use ChatGPT , I almost always choose Deep Research mode over Normal mode. What are your observations in terms of consumer use cases?
Isa: I find the product particularly useful for shopping and travel planning. I personally use it a lot to find relevant information. We were at the launch of Deep Research in Japan a few months ago , and it was great for finding restaurants that match specific requirements, as well as finding information that is difficult to find with regular search engines.
Josh: Yes, especially when it comes to high-value decisions, such as buying expensive items, planning special trips, etc., people tend to invest a lot of time in in-depth research. Personally, before buying a product, I usually browse various reviews and forums to get as comprehensive information as possible. Deep Research can quickly integrate this data and provide users with more accurate and efficient analysis, so it is extremely valuable in such tasks.
Isa: The model is very good at following instructions. So if a user's query is multifaceted or contains multiple questions, for example, the user not only wants information about a certain product, but also wants to compare it with other products, or even needs to find user reviews from Reddit or similar websites, the model will be able to accurately execute these requests. Users can make multiple different requests, and the system will process them according to the instructions and provide detailed analysis results.
Josh: In addition, a recommended usage is to ask the model to present the results in a tabular format. Although it will usually do so automatically, if the user explicitly requests a tabular data presentation, such as listing the citation information and related content of the research object, this will greatly improve the readability of the information and the efficiency of analysis.
Isa: Yes, and we hope to further expand the functionality in future products. For example, the underlying architecture of the model already has the ability to embed images, so it can be used to find images of products. In addition, although this is not the main consumer application scenario at present, the model can also generate data charts and embed them into the final analysis report. We look forward to ChatGPT realizing this function as soon as possible in the future.
Sonya: This sounds like a product designed for technical users.
Isa: That’s true.
Josh: Speaking of technical users, a very promising application scenario is personalized education. Suppose a user wants to learn more about a topic, such as tutoring in biology or getting detailed information about global hot events, this model can meet this need very well. Users can enter what they want to learn, and the system will automatically organize all relevant information and generate a structured report, making the learning process more systematic and efficient.
Isa: One of my friends is currently preparing to set up an ACP company. He has been using the model to search for similar products in the market and evaluate the availability of related brand names, such as whether a domain name has been registered, how big the market is, etc. The whole process is very interesting. Whenever he generates a new report, he will share it with me, and I will also read these analysis results carefully.
Josh: Another interesting application scenario is to find single facts that are rare and difficult to obtain on the Internet. For example, if a user wants to find a specific plot or related information in a less popular TV series, it is usually not directly found on the search engine homepage, but the model can dig deep into the Internet data, find the corresponding reference materials, and compile an accurate answer.
Isa: I can also share a real-life example: the father of one of my brother's friends asked a very specific question about an Austrian general who rose to power because of a specific event in a certain battle. This kind of information is hard to find through ordinary search engines, and ChatGPT's early answers were wrong. He was sure enough of this that he went to the public library to check the historical record and confirmed that ChatGPT's answer was indeed wrong. Then he used "Deep Research" to run a deep query and finally found the correct answer. When we sent him the research results, he was surprised and delighted.
Sonya: From the current application scenarios, what types of thinking models does Deep Research mainly apply to? How should users effectively use this model?
Josh: Deep Research is particularly good at conducting in-depth analysis of a user's needs and providing detailed answers. It shines on questions that require reading a lot of material from across the Internet. If the query is vague, the model can help the user clarify what they need; but when the user already knows they are looking for specific information, the model's advantages are maximized.
Isa: In addition, the model is very capable at information integration. It is not only good at searching for specific information but also able to draw new insights from existing data. Although it cannot yet make scientific discoveries independently, it performs well at analyzing existing data. In practice, for programming-related queries, if the content falls within the model's existing knowledge (such as common programming problems or code snippets), it usually gives accurate answers without additional training data. Personally, when coding I prefer the o-series models, such as o1 Pro or o3-mini-high, for more accurate technical support.
Lauren: This sounds very consistent with some of the new product directions that OpenAI has recently introduced. Can you tell us more about how this model works?
Isa: Deep Research uses a fine-tuned version of OpenAI's most advanced reasoning model, o3 . We trained it specifically for complex web browsing tasks and other difficult reasoning tasks, making it excellent at analyzing and integrating information. In addition, the model can call browsing tools and Python computing tools to provide stronger support for information collection and data processing. Through continuous training and optimization, the model has been able to effectively deal with complex problems and generate systematic analysis results.
Josh: From an intuitive point of view, the model works like this: when a user enters a query, the system first thinks deeply and develops an information search strategy. It then retrieves relevant data, extracts key information, understands the relevance of the content to the query, and decides the next search direction based on the results. The entire process goes through multiple rounds of optimization to ensure that the final report is well-structured and logically clear, with complete references and data sources.
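The loop Josh describes, plan, search, extract, decide the next direction, can be sketched in a few lines. Everything below is a hypothetical illustration with stubbed-in functions (`plan_strategy`, `search`, `extract_facts`), not OpenAI's implementation:

```python
# Minimal sketch of an iterative research loop. All functions here are
# hypothetical stand-ins for what a trained model decides dynamically.

def plan_strategy(query):
    """Stand-in for the model's initial 'deep think' planning step."""
    return [query, query + " reviews", query + " comparison"]

def search(term):
    """Stand-in for the browsing tool; returns fake documents."""
    return [f"doc about {term}"]

def extract_facts(docs, query):
    """Stand-in for reading pages and pulling out relevant facts."""
    return [d for d in docs if query.split()[0] in d]

def research(query, max_rounds=3):
    facts, frontier = [], plan_strategy(query)
    for _ in range(max_rounds):      # multiple rounds of refinement
        if not frontier:
            break
        term = frontier.pop(0)       # decide the next search direction
        docs = search(term)
        facts.extend(extract_facts(docs, query))
    return {"query": query, "facts": facts}  # final structured report

report = research("electric bikes")
print(len(report["facts"]))
```

In the real product, each stubbed step is a decision the fine-tuned model makes itself rather than a hand-coded function; the sketch only shows the shape of the loop.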
Isa: The main difference between Deep Research and traditional search engines is that it can not only perform search tasks but also reason with a high degree of flexibility. Because we use end-to-end training, the model can dynamically adjust its strategy during the research process rather than relying on a preset query pipeline. This makes it far more adaptable when faced with unpredictable information-retrieval tasks. During a query, users can read the model's chain-of-thought summaries, which describe how the model arrived at each step of its search strategy; this is especially valuable in complex research tasks.
Sonya: John Carlson posted a widely read tweet arguing that the core capability of Deep Research comes partly from its real-time access to Internet content and partly from its chain of thought. How do you weigh the roles of these two in the model?
Isa: In fact, the combination of those two is the key to the success of "Deep Research". Many existing search products have information retrieval capabilities, but because they are not trained end-to-end, they lack flexibility and focus when handling complex queries. "Deep Research" relies on a fine-tuned version of the o3 model, which has strong analytical capabilities, combined with the reasoning chain formed by the underlying o3 training, letting it show a high degree of creativity in integrating information. So I believe its core competitiveness comes from the combination of these two capabilities.
Josh: Before joining OpenAI , I worked at a startup company where our research focused on how to build an AI agent . At the time, many researchers on the Internet were exploring similar construction methods, with the core idea being to build a workflow graph in which some nodes were controlled by a language model.
That is, the language model can decide which task to perform next, while the overall logical framework is defined in advance by humans. Although this method can quickly build a prototype, its limitations quickly become apparent in practical applications. Because in the real world, the model may encounter various unpredictable situations, and traditional preset flowcharts are difficult to cover all possible branch paths. In addition, at these decision nodes, language models are often not the best decision makers, because they are not specially trained for such tasks, but make decisions based on existing language reasoning capabilities.
Therefore, I think the real advantage of "Deep Research" is that it is trained directly on the specific problems faced by users, rather than relying on a preset process structure. In this way, it can adapt to complex practical application scenarios more flexibly.
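The hand-built pattern Josh contrasts against can be sketched as a fixed workflow graph whose branch decisions are delegated to a model. The graph, node names, and the stubbed `model_choose` below are all hypothetical, for illustration only:

```python
# Sketch of the "workflow graph" pattern: humans define nodes and edges;
# a model (stubbed here) only picks which pre-defined branch to take.

GRAPH = {
    "classify":  ["search", "answer"],   # hand-written branch structure
    "search":    ["summarize"],
    "summarize": ["answer"],
    "answer":    [],
}

def model_choose(node, options, query):
    """Stand-in for the LLM decision at a branch node."""
    if node == "classify":
        return "search" if "?" in query else "answer"
    return options[0] if options else None

def run(query):
    node, trace = "classify", []
    while node is not None:
        trace.append(node)
        node = model_choose(node, GRAPH[node], query)
    return trace

print(run("what is RL?"))  # classify -> search -> summarize -> answer
```

The limitation Josh points out is visible in the sketch: every possible path must already exist in `GRAPH`, so edge cases the designer did not anticipate have nowhere to go, whereas an end-to-end trained agent chooses its next step freely.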
Lauren: This way, users don't have to manually build a schema or a flowchart in the background to make decisions.
Isa: Yes, the whole system is completely driven automatically by the model.
Sonya: Can you expand on that? This seems like an important decision you made in product design, and it's proven to work well. Currently, many companies are building applications on top of OpenAI 's API to help users complete specific tasks. So, in all of these applications, is it more appropriate to use a trained end-to-end model to optimize a specific workflow?
Isa: It depends on the specific application scenario. If a workflow is highly standardized and predictable, then it is reasonable to build an agent in the way Josh describes . However, if the task involves a lot of edge cases or requires highly flexible reasoning capabilities, then an approach like "Deep Research" may be more appropriate.
Josh: Yes, the advice I usually give is to minimize the rigid constraints on the model. For example, if certain data should not be accessed by the model, or certain operations need to be strictly controlled, then handwritten logic can be used to implement these restrictions. However, in the process of optimizing the model, an important experience we have repeatedly learned is that many times, people think that they can build smarter logic than the model through handwritten code, but in fact, with the development of technology, models can often come up with better solutions than humans.
One of the basic principles of machine learning is: " You get what you optimize. " If you can directly optimize the final goal of the system, rather than manually stitching together multiple subsystems that are not end-to-end optimized, the final result will often be superior. Therefore, I believe that introducing reinforcement learning to fine-tune the model may be a key step in building an efficient agent .
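As a toy illustration of "you get what you optimize": a two-armed bandit trained with a REINFORCE-style policy-gradient update drifts toward whichever action the reward function actually favors. This is a deliberately minimal sketch of the principle, not OpenAI's training setup:

```python
import math
import random

# Toy "you get what you optimize" demo: a two-armed bandit with a
# REINFORCE-style update. The policy ends up doing what the reward rewards.

random.seed(0)
prefs = [0.0, 0.0]  # learnable preference (logit) for each action

def softmax(p):
    exps = [math.exp(x) for x in p]
    total = sum(exps)
    return [e / total for e in exps]

def reward(action):
    # The objective we directly optimize: action 1 is the desired outcome.
    return 1.0 if action == 1 else 0.2

for _ in range(2000):
    probs = softmax(prefs)
    a = 0 if random.random() < probs[0] else 1
    r = reward(a)
    # Policy-gradient step: gradient of log-prob of the chosen action,
    # scaled by the reward it earned.
    for i in range(2):
        grad = (1 - probs[i]) if i == a else -probs[i]
        prefs[i] += 0.1 * r * grad

print(round(softmax(prefs)[1], 2))  # policy concentrates on action 1
```

The point of the toy is the one Josh makes: nothing about "action 1" is hand-coded into the policy; it emerges purely because the optimization target rewards it.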
Sonya: What was the biggest technical challenge in achieving this?
Josh: From my perspective, as someone who joined the team later, I observed that one of the most critical success factors of the project was control of data quality. Isa and other members of the team invested a lot of effort here, and the data quality almost determined the performance of the final model.
In the field of machine learning, people often re-recognize the fact that the performance of the model depends largely on the quality of the input data . If the training data quality is low, then no matter how advanced the model architecture is, the final effect will be difficult to achieve the expected results.
Isa: In addition, finding the right talent is also crucial. For example, Edward Son in our team played a key role in data optimization. He was able to ensure the quality of training data and provide the most suitable data set according to different task requirements. This is also one of the important factors for the success of the project.
Lauren: In other words, finding your “Edward” means success?
Josh: You can say that. Excellent machine learning and model training rely heavily on the contributions of data experts.
Lauren: How do you ensure the reliability of model output?
Isa: This is something we focus on in product design. Our goal is to enable users to trust the content generated by the model. To this end, we use a variety of mechanisms, such as providing citation support so that users can trace the source of information cited by the model. In addition, during the training process, we try to minimize hallucinations and avoid the model from generating wrong or unverified information.
Of course, we are still optimizing this. Although the model has a strong credibility, it is still possible to make mistakes or cite less authoritative sources of information. Therefore, how to further improve the reliability of the model is the focus of our continuous improvement.
Sonya: How should we understand the relationship between "Deep Research" and o3 and Operator ? Do they share the same architecture or are they independent?
Josh: Currently, they are independent products. However, our ultimate goal is to build a truly general AGI Agent . In the future, the system will not only be able to perform web searches, but also complete various computer operations, or even any tasks that human assistants can complete. And we hope that it can integrate these capabilities in the most natural way to achieve more powerful automation capabilities.
Sonya: What other less obvious but crucial design decisions did you make during the development process?
Isa: One of the key decisions was optimizing the user interaction flow. We found that if the model clarifies the query with the user before answering, the final answer quality is higher. So in "Deep Research" we specifically designed an interaction mode in which the model proactively asks the user questions before formally starting its research, to ensure the question is clear.
In contrast, regular ChatGPT may only ask users whether they need additional information after it has answered. In "Deep Research" we deliberately moved this step up front to improve the accuracy of the answer. This change lets the model understand the user's needs more completely; even though the research process takes 5 to 30 minutes, the final report is more detailed and reliable.
Isa: Interestingly, I have seen some users on Twitter have figured out a workflow on their own: they will first use GPT-4 or GPT-4 Pro to refine the query content to make it more specific and detailed, and then submit the final optimized query to " Deep Research " for in-depth analysis. This shows that users are exploring a workflow that suits them, and this approach can indeed improve the quality of research.
Lauren: You’ve launched several versions of your “Deep Research” product over the past few months . What are the differences between the different versions and how should we understand them?
Sonya: They are all called “Deep Research” , right?
Josh: A lot of people are asking how all these "Deep Research" products compare. Indeed, there aren't many creative ways to name products in this space. I think the best way is for users to try them for themselves and form an intuitive impression.
Although the various "Deep Research" products each have their strengths and weaknesses in quality, the differences between them are clear. Ultimately, these differences come mainly from how the models are trained and how the datasets are constructed. We build on the o-series reasoning models and optimize them into a truly intelligent, high-quality research tool.
Sonya: Last year, the o1 team was on our podcast and we joked that OpenAI wasn’t very good at naming its products. Now, it seems like you guys are pretty good at it.
Josh: At least the name “Deep Research” accurately summarizes its core functionality.
Lauren: So, please share your plans for the future. How do you expect the product to look like in a year? What new features or improvements can you expect along the way?
Isa: We are very much looking forward to expanding the data sources that the model can access. Currently, the model is mainly used to browse public information, but in the future it should be able to retrieve private data. In addition, we hope to further improve its analytical capabilities so that it can perform better on more complex research tasks.
Josh: From a broader perspective, we hope to further integrate this technology into OpenAI 's AI Agent roadmap. In fact, the design concept of this model can be extended to many application scenarios and bring surprising results.
The core idea is to take advantage of the most advanced reasoning models, enable them to use the tools that humans use in their daily work and life, and optimize directly for the desired goal. This approach is not only applicable to current research tasks, but can also be extended to more complex task areas. Therefore, I believe that although the current AGI is still a problem to be solved, this general formula still has a lot of room for development and optimization potential.
Future Outlook: The Evolution of Deep Research and the Rise of AI Agents
Lauren: Sam once said something thought-provoking: “ Deep Research will account for a small portion of all economically viable and valuable tasks in the world. ” What do you think of this view?
Josh: I don’t think Deep Research will completely replace all jobs, but it can significantly reduce the time required to perform certain tasks. For example, it can help users save hours or even days of research time.
On this basis, the AGI agents we develop in the future will be further optimized beyond Deep Research, improving the efficiency of users' research and analysis tasks by 1%, 5%, 10%, or even 25%, depending on the type of work and the application scenario.
Sonya: I feel like it automates 80% of my work, actually.
Lauren: To me, this is definitely a high-end tool.
Josh: It seems like maybe we should start charging.
Sonya: Do you think certain professions are at greater risk? For example, in the consulting industry, the application of deep research may have a greater impact. Which specific job categories do you think are most likely to be affected?
Josh: Yes, I have a background in consulting. Although many people are concerned about AI replacing certain jobs, I don’t think it’s a complete replacement of the workforce. Instead, I think of it more as an enabling tool that enables knowledge workers to complete tasks more efficiently.
Many professions require a lot of time to review information, integrate data and draw conclusions, and Deep Research can provide strong support in this process, thereby empowering people.
Isa: I am particularly interested in applications in the medical field. Currently, many doctors are already using Deep Research to find the latest medical literature or to query the latest clinical research for specific conditions. I have seen many doctors share their experience on social platforms, and some even contacted us to say that they have successfully used the tool to help their patients find appropriate clinical trials.
This is undoubtedly a great help for doctors who have busy schedules and find it difficult to find time for in-depth research.
Josh: In fact, the impact of this technology may run deeper than it appears. It doesn't just save you 5% of your time; it can dramatically compress tasks that usually take 4 to 8 hours. For example, for the cost of a ChatGPT subscription, you can now complete such a research task in about five minutes.
If time is no longer a limiting factor, what would you choose to do? For example, you could research every potential startup opportunity in depth, not just those projects that you have time to personally approach. This time optimization will have a profound impact on personal and business decision-making.
Sonya: Exactly. Take a consumer scenario, for example, a busy mother who needs to plan a birthday party for her child can now use deep research to quickly plan the best plan. In the past, this might have taken hours or even days of searching and comparing.
Lauren: It really makes a lot of things possible that weren’t possible before.
Isa: Absolutely right.
Sonya: Will Deep Research 's AGI Agent capabilities change the way we learn? How will you educate your children in such a technological environment?
Josh: Yes, education has always been an important area for AI applications. I believe that AI -driven conversational learning models are more engaging than traditional reading textbooks and can enable a highly personalized educational experience .
The artificial intelligence system can adjust the learning content based on user feedback and provide targeted knowledge explanations. This interactive learning method can not only improve learning efficiency, but also stimulate learning interest.
Quick Questions and Answers
Lauren: Next, we'll move on to a quick Q&A session.
Josh: Okay.
Sonya: What is your favorite application scenario of Deep Research ?
Josh: Personalized education. Whether it is learning new knowledge or delving into a specific field, it can provide great help.
Isa: The use cases that interest me most are when users use Deep Research to find out about medical diagnoses for themselves or their family members. These real personal stories are very impressive.
Sonya: Last year, we saw some breakthroughs in AI applications, such as code generation becoming an obvious breakthrough point. Which application categories do you think will see significant progress this year?
Isa: I think AI Agent is undoubtedly the next breakthrough point.
Sonya: In 2025 , AI agents will become mainstream .
Lauren: Yes, that's true. So what resources would you recommend people read to get a deeper understanding of AI agents or trends in AI in general? Are there any authors or training courses that you recommend?
Sonya: What about the data?
Isa: Maybe start with this podcast?
Josh: It is not easy to keep up with the technological progress in the field of artificial intelligence. My suggestion is to choose one or two sub-fields that you are really interested in and organize a learning list around them. This list can include core researchers in the field, relevant papers, forums or discussion communities, and how to obtain relevant resources.
In fact, this is itself a good application scenario for "Deep Research" . For example, users can use "Deep Research" to dig deeper into topics of interest to obtain the most comprehensive and accurate information.
Isa: Although this book is a few years old, I still recommend the foundational reinforcement learning text by Pieter Abbeel . I think it is a very systematic introduction to the theory and application of reinforcement learning and a good starting point.
Josh: If my graduate advisor Pieter published a book on this topic, I would definitely strongly recommend it.
Isa: That’s true.
Sonya: Reinforcement learning seems to have had its ups and downs — it had its peak, then it went quiet, and now it’s back in the mainstream. Do you think this is a correct interpretation of the development trend of reinforcement learning?
Josh: Yes, reinforcement learning has been somewhat overlooked in the past few years.
Sonya: So why is it resurfacing now?
Josh: This is mainly because other key technologies in AI have made breakthroughs. If you have been following this field for a while, you may remember Yann LeCun 's " cake metaphor " . He compared the AI training process to making a cake:
• Unsupervised learning is the bulk of the cake;
• Supervised learning is the icing on the cake;
• Reinforcement learning is the cherry on top.
In 2015 and 2016 , we focused on reinforcement learning research, but progress was limited because the " cake " was not yet formed. Today, with the maturity of language models, we already have powerful models pre-trained on large-scale data, and have the ability to perform supervised fine-tuning on these models to make them better at following instructions and performing specific tasks.
Therefore, reinforcement learning's moment has finally arrived: it can now be applied on top of strong pre-trained models and optimized against well-defined reward functions, making AI agents and complex decision-making systems more efficient and feasible.
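The "cherry on top" step Josh describes, optimizing a policy against an explicit reward function, can be illustrated with a minimal sketch. This is not OpenAI's actual training setup; it is a toy REINFORCE-style policy gradient over a two-action policy, where the `reward` function is a hypothetical stand-in for a task-specific grader (for example, one scoring an agent's answer):

```python
import math
import random

random.seed(0)

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def reward(action):
    # Illustrative stand-in for a clear, task-specific reward
    # function; here action 0 is defined as the "correct" one.
    return 1.0 if action == 0 else 0.0

# Two-action policy parameterized by raw logits.
logits = [0.0, 0.0]
lr = 0.1

for step in range(500):
    probs = softmax(logits)
    action = random.choices([0, 1], weights=probs)[0]
    advantage = reward(action) - 0.5  # fixed baseline reduces variance
    # REINFORCE gradient: d log pi(a) / d logit_k = 1[k == a] - pi(k)
    for k in range(2):
        grad = (1.0 if k == action else 0.0) - probs[k]
        logits[k] += lr * advantage * grad

final_probs = softmax(logits)
```

After training, `final_probs` concentrates almost all probability on the rewarded action, which is the core dynamic Josh is pointing at: once a reward signal is well defined, the policy shifts toward whatever the reward prefers, on top of whatever the pre-trained "cake" already provides.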
Sonya: From this interview, AI Agent will become the most breakthrough technology category in 2025 , and reinforcement learning will return to the mainstream. This is an exciting trend!