In-depth analysis: Why is the Agentic Browser the next stop for the Universal Agent?

Written by
Audrey Miles
Updated on: June 13, 2025
Recommendation

An in-depth analysis of how the Agentic Browser is leading the next wave of AI technology.

Core content:
1. The industry consensus and technical innovation behind the Agentic Browser
2. The "AI Renaissance" of the browser and how the industry giants are positioning themselves
3. How the Agentic Browser differs from the traditional AI browser, and its future prospects

Yang Fangxian
Founder of 53A / Tencent Cloud Valuable Professional (TVP)


01  / Origin

The AI technology scene in 2025 is as noisy as ever, but a new eye of the storm is quietly forming. While most people still divide agents into general and vertical categories by the domains they serve, a new species of agent, defined instead by its technical carrier, is quietly becoming an industry consensus: the Agentic Browser.


From the highly anticipated Comet and Dia overseas, to Fellou and Doubao in China, to traditional browsers such as Chrome, Quark, and QQ Browser, an "AI Renaissance" around the browser is ready to take off. At the end of 2024 OpenAI even poached engineers from Google's Chrome team, and it is rumored to be quietly building its own browser as a new carrier for ChatGPT.


Faced with such rapid changes in the industry, I can't help but think:

  • In this "Year One of the Agent", agent applications should be flourishing, with apps and web applications emerging one after another. So why has an "Agentic Browser" suddenly appeared?
  • What force is quietly driving this?
  • What does the somewhat unfamiliar term "Agentic Browser" mean? How does it differ from the AI browser we often talk about?
  • Why are the players who have invested deeply in general agents all turning to the seemingly traditional browser track?
  • Will the Agentic Browser really be the next stop in the evolution of general agents?
  • How can this "antique", born in the last century, carry the industry's ultimate vision of the general AI agent?


This article attempts to answer these questions through research and systematic analysis. It runs to more than 9,000 words and takes about 15 minutes to read.

❄️

30-second quick-read version (reading it or not makes little difference):

  • Ecosystem cage: Traditional operating systems and browsers are using their ecosystem dominance to limit the capabilities and growth of general AI agents.
  • Paradigm shift: The core of the Agentic Browser is "taking actions on behalf of users", not just "assisting users in browsing".
  • Key battlefield: At its core, the coming AI competition is a fight for control over the user's cross-application, end-to-end "context".
  • Best carrier: Thanks to its universal content, entrenched user habits, and cross-application reach, the browser is the natural carrier of universal agents.
  • Ultimate path: The future of the Agentic Browser is to become an entirely new AI operating system and to spawn an ecosystem of customized hardware.





02  / The story starts with Perplexity

Source: Adapted from public interviews with the Perplexity CEO; lightly edited without altering the key facts.

To understand the beginning of this change, we might as well turn our attention to a company called Perplexity.


In early 2024, Perplexity founder Aravind Srinivas stood in a Motorola conference room, pitching his vision of the future and trying to persuade the veteran phone maker to make Perplexity the default AI assistant on its new handsets. But right after the presentation, a warning call from Google landed like a bucket of cold water and put out the spark: "If you pre-install Perplexity, Motorola may lose its official Android license and its access to the Play Store." Under that pressure, the deal evaporated.


This is not an isolated case. Microsoft has bundled Copilot deep into Windows, where users cannot even uninstall it easily; Apple's iOS ecosystem erects layer upon layer of permission barriers for third-party AI assistants. Aravind Srinivas keenly recognized that the traditional operating system vendors are quietly strangling AI innovators' room to survive through their "ecosystem hegemony".


Even more troublesome for the Perplexity team, the browsers we use every day, such as Chrome, act like data fortresses: the browser's same-origin policy, which stops code from one site reading data that belongs to another, keeps users' valuable data locked inside each site owner's "walled garden". As a result, a general agent like Perplexity cannot see users' real shopping records or social activity, and cannot even carry out basic cross-site tasks such as "help me compare the prices of several hotels".
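At its core, the same-origin policy treats two URLs as belonging to the same origin only when scheme, host, and port all match. The sketch below models just that comparison with Python's standard `urllib.parse`; it is a simplification (real browsers layer CORS, opaque origins, and other rules on top), and the helper names `origin` and `same_origin` are hypothetical.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Ports a browser assumes when a URL does not spell one out.
DEFAULT_PORTS = {"http": 80, "https": 443}

Origin = Tuple[str, str, Optional[int]]

def origin(url: str) -> Origin:
    """Return the (scheme, host, port) triple that defines a URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""  # urlsplit already lower-cases hostnames
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return scheme, host, port

def same_origin(a: str, b: str) -> bool:
    """Same origin only if scheme, host, and port all match."""
    return origin(a) == origin(b)

# Same site with the default port written out: same origin.
print(same_origin("https://www.booking.com/hotels", "https://www.booking.com:443/account"))  # True
# Different hosts: cross-origin, so the policy blocks cross-site reads.
print(same_origin("https://www.booking.com/", "https://www.skyscanner.net/"))  # False
```

This is why an agent living inside one site's page cannot simply read the user's data on another site; it needs to sit at the browser level, above individual origins.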


At an internal meeting, the data the engineers presented revealed a harsh reality: when a user asks to "book me a trip to Bali", Perplexity can offer only generic, non-personalized suggestions, because it cannot access the user's logged-in data on platforms such as Booking.com or Skyscanner. Google, by contrast, can easily pull calendar information from Gmail and location history from Maps to generate a highly personalized travel plan directly.


“It’s like we’re dancing in shackles.”

Aravind Srinivas's sigh captures the helplessness felt by many AI agent developers.


The turning point came with the U.S. Department of Justice's antitrust lawsuit against Google. When the media reported the proposal to force Google to divest Chrome, the Perplexity team sensed an opportunity and held an overnight strategy session. Their key insight was that the browser is the lever for breaking the operating systems' grip and unleashing the potential of AI agents. It can neatly bypass the pre-installation blockades of iOS and Android, and it can potentially obtain, legally and with the user's consent, full-dimensional user data (from browsing time and private transaction information to complex cross-site behavior patterns), all of which becomes the "fuel" that lets AI agents act effectively.


Just three months later, a browser called Comet was officially launched to the public. Its ambition is not merely to be a better information portal but to become an AI operating system.


❄️

Perplexity sees Comet, its Agentic Browser, not merely as a tool upgrade but as a "Normandy landing" against operating system hegemony in pursuit of its AI agent vision. From this story we can glimpse the ideal of the general agent, the limits of AI search, the transitional form of the traditional AI browser, and the disruptive potential of the Agentic Browser.


03  / What are Universal Agent, AI Search, AI Browser, and Agentic Browser?


Before we examine why the Agentic Browser may be the next stop for universal agents, we need to clarify a few easily confused concepts:


  • General Agent:

Think of it as an intelligent entity capable of autonomous understanding, planning, and execution. Its goal is to help across a wide range of domains and tasks, much like a human assistant, rather than being limited to specific functions. It emphasizes autonomy and versatility. Representative products include ChatGPT, Manus, Flowith, and Doubao.


  • AI Search:

This focuses on using artificial intelligence to improve the relevance and presentation of search results: for example, using natural language processing to understand more complex query intent, or answering directly with summaries or Q&A instead of just listing links. Perplexity's original form was closer to an advanced AI search engine; its core is acquiring and understanding information. China's Metaso (Mita) AI search also falls into this category.


  • AI Browser:

This usually refers to a traditional browser with some AI features added on top, such as a sidebar AI assistant, page summaries, or smart translation. These improve browsing efficiency and experience to a degree, but the core architecture and interaction model remain fundamentally unchanged. An AI browser is more like fitting a navigation system to an existing car to improve its driver-assistance capabilities. Chinese browsers such as QQ Browser and Quark have so far belonged to this category.


  • Agentic Browser:

This represents a deeper evolution. Rather than merely integrating AI as an auxiliary tool, the Agentic Browser treats the browser itself as the platform and environment in which agents carry out their tasks. The browser gives agents stronger context awareness, task execution, and cross-application operation capabilities. The aim is for agents to "act" proactively and deeply within the browser environment, rather than just passively responding to user instructions.
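The shift from "assisting" to "acting" is easiest to picture as a loop: observe the page, decide on an action, execute it, and repeat until the goal is met. Below is a minimal sketch of such an observe-plan-act loop against a toy page model. Everything in it (`Page`, `Action`, `observe`, `plan`, `act`, `run_agent`) is hypothetical, and `plan` is a hard-coded rule standing in for the model call a real Agentic Browser would make.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Page:
    """Toy stand-in for a rendered page: its URL and the form fields on it."""
    url: str
    fields: Dict[str, str] = field(default_factory=dict)
    submitted: bool = False

@dataclass
class Action:
    kind: str        # "fill", "submit" or "done"
    target: str = ""
    value: str = ""

def observe(page: Page) -> dict:
    """Perception: summarize what the agent can currently see."""
    empty = [name for name, value in page.fields.items() if not value]
    return {"url": page.url, "empty": empty, "submitted": page.submitted}

def plan(goal: Dict[str, str], obs: dict) -> Action:
    """Planning: a rule-based placeholder for an LLM call."""
    if obs["submitted"]:
        return Action("done")
    for name in obs["empty"]:
        if name in goal:
            return Action("fill", name, goal[name])
    return Action("submit")

def act(page: Page, action: Action) -> None:
    """Execution: apply the chosen action to the page."""
    if action.kind == "fill":
        page.fields[action.target] = action.value
    elif action.kind == "submit":
        page.submitted = True

def run_agent(goal: Dict[str, str], page: Page, max_steps: int = 10) -> List[Action]:
    """Loop observe -> plan -> act until the planner says done or steps run out."""
    history: List[Action] = []
    for _ in range(max_steps):
        action = plan(goal, observe(page))
        history.append(action)
        if action.kind == "done":
            break
        act(page, action)
    return history

page = Page("https://booking.example/search", {"destination": "", "checkin": "", "guests": "2"})
steps = run_agent({"destination": "Bali", "checkin": "2025-07-01"}, page)
print([a.kind for a in steps])  # ['fill', 'fill', 'submit', 'done']
```

Capping the loop with `max_steps` is a deliberate choice in this sketch: an agent that acts on the user's behalf needs a hard stop in case its planner never converges.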



In short:

  • The universal agent is everyone's "all-purpose AI assistant".
  • The core of AI search is "finding the most timely, accurate, and in-depth information and answering questions better".
  • The AI browser focuses on "assisting browsing better".
  • The key to the Agentic Browser is "acting on your behalf better".


❄️

If the general agent represents the industry's ultimate vision of AGI-level product capability (an intelligent entity that can understand and perform any task), AI search focuses on information acquisition, helping machines "know" more. The browser, as the universal carrier of the digital world, carries the ability to acquire and understand user context and to act across applications and scenarios. The Agentic Browser is therefore the best current fusion of general-agent and AI-search capabilities.


04  / Why is the universal agent's carrier the browser, not an app or the Web?


There are growing signs that the browser, the Internet gateway we can hardly live without, has great potential to become the best carrier for universal agents. This is no accident: it follows from the browser's unique attributes and the key role it plays in the digital ecosystem. We can examine this trend along several key dimensions:

  1. Ability to control context
  2. Local OS control capabilities
  3. Cross-application connectivity
  4. Browsers are natural universal agent carriers
  5. Browsers are the best path for latecomers to the OS and devices


A: The battle in the Agent era is not just a battle for attention, but also a battle for context control.

In the age of agents, competing for user attention is no longer enough; the deeper competition is over control of "context". Context is the foundation that lets agents understand user intent, personalize services, and execute tasks efficiently. Under reinforcement-learning-based training paradigms it is also the top priority for collecting end-to-end feedback signals from users. Here, the depth and breadth of context a browser can provide is hard for any other application form to match.


Context Depth

The browser is the most direct and persistent window through which users interact with the digital world. Every click, every keystroke, how long a user stays on a page, even the speed and trajectory of mouse scrolling: all of it carries rich signals about behavioral habits, latent preferences, and immediate needs.


  • User preference context: Because the agent can perform more diverse tasks in the browser, it can collect richer user behavior data. Analyzing this data as a time series lets it build accurate user profiles and dynamic preference models, something single-function apps can hardly do.
  • For example, in one cutting-edge Agentic Browser exploration, Fellou's proposed VIEP (Visual Interaction Element Perception) technology analyzes the acceleration curve of the user's mouse trajectory (a quick swipe, or a long hover?) to estimate how much attention the user pays to different elements on the page. This fine-grained observation lets the AI adjust its intervention strategy dynamically and offer more appropriate help.
  • Another example is the Dia browser's "smart cursor" concept, which converts the user's highlighting actions into semantic tags (for example, yellow highlighting might mark "key arguments" and blue highlighting "questionable content"). This builds a finer-grained map of user intent for the AI, far more precise than simple keyword search.
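Fellou has not published VIEP's internals, so the following is only a hypothetical illustration of the general idea: derive speed and acceleration from timestamped pointer samples over one page element, then map them to a coarse attention label. The function names and every threshold (`dwell_s`, `slow_px_s`, `fast_px_s`, `jerk_px_s2`) are made-up assumptions, not VIEP's actual parameters.

```python
import math
from typing import List, Tuple

Sample = Tuple[float, float, float]  # (timestamp in s, x in px, y in px)

def velocity_profile(samples: List[Sample]) -> List[Tuple[float, float]]:
    """(midpoint time, speed in px/s) for each pair of consecutive samples."""
    profile = []
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        if t1 <= t0:
            continue  # skip duplicate or out-of-order timestamps
        profile.append(((t0 + t1) / 2, math.hypot(x1 - x0, y1 - y0) / (t1 - t0)))
    return profile

def acceleration_curve(samples: List[Sample]) -> List[float]:
    """Change in speed per second between consecutive velocity estimates (px/s^2)."""
    prof = velocity_profile(samples)
    return [(v1 - v0) / (t1 - t0)
            for (t0, v0), (t1, v1) in zip(prof, prof[1:]) if t1 > t0]

def attention_label(samples: List[Sample], dwell_s: float = 1.0,
                    slow_px_s: float = 50.0, fast_px_s: float = 800.0,
                    jerk_px_s2: float = 15000.0) -> str:
    """Map pointer movement over one page element to a coarse attention label."""
    prof = velocity_profile(samples)
    if not prof:
        return "unknown"
    duration = samples[-1][0] - samples[0][0]
    mean_speed = sum(v for _, v in prof) / len(prof)
    peak_accel = max((abs(a) for a in acceleration_curve(samples)), default=0.0)
    if duration >= dwell_s and mean_speed < slow_px_s:
        return "long hover"   # slow and sustained: likely high attention
    if mean_speed > fast_px_s or peak_accel > jerk_px_s2:
        return "quick swipe"  # fast or abrupt pass: likely low attention
    return "scanning"

hover = [(i * 0.1, 100 + i, 200) for i in range(16)]  # about 10 px/s for 1.5 s
swipe = [(i * 0.01, i * 20, 300) for i in range(10)]  # about 2000 px/s
print(attention_label(hover), "|", attention_label(swipe))  # long hover | quick swipe
```

A production system would presumably learn these boundaries from data rather than hard-code them; fixed thresholds are used here only to keep the idea visible.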


Context Breadth

Browsers naturally have the ability to cross application boundaries.

  • It can record every web page the user has visited, no matter which site or platform it belongs to (GitHub, Taobao, Ctrip, and so on).
  • It knows every tab the user currently has open and can infer the tasks the user may be working on in parallel.
  • It can record the user's conversations with the built-in AI assistant, forming a continuous interaction memory.
  • Furthermore, with user authorization and native login flows, an Agentic Browser could even obtain user context from inside otherwise isolated applications.
  • It can also connect to applications on the user's local computer, such as the calendar, email client, local file system, and even notes, truly linking up the full context of the user's digital life.
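One way to picture this breadth is a single context object the agent reads before acting. The sketch below assumes a hypothetical `BrowserContext` with one field per source in the list (history, open tabs, assistant memory, local apps) and exposes a local source only after the user has granted access to it, echoing the point about user authorization. Field names and the `snapshot` method are illustrative, not any product's API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

@dataclass
class BrowserContext:
    """Hypothetical container for the context sources an Agentic Browser can see."""
    history: List[str] = field(default_factory=list)      # visited URLs, oldest first
    open_tabs: List[str] = field(default_factory=list)    # titles of tabs open right now
    chat_memory: List[str] = field(default_factory=list)  # past turns with the assistant
    local_sources: Dict[str, List[str]] = field(default_factory=dict)  # e.g. "calendar" -> entries
    granted: Set[str] = field(default_factory=set)        # local sources the user authorised

    def snapshot(self, recent: int = 5) -> Dict[str, List[str]]:
        """Bundle recent browser context plus any local sources the user has authorised."""
        def tail(items: List[str]) -> List[str]:
            return items[-recent:] if recent > 0 else []  # guard: xs[-0:] would return everything

        snap = {
            "recent_history": tail(self.history),
            "open_tabs": list(self.open_tabs),
            "chat_memory": tail(self.chat_memory),
        }
        for name, items in self.local_sources.items():
            if name in self.granted:  # ungranted sources never reach the agent
                snap[name] = list(items)
        return snap

ctx = BrowserContext(
    history=["https://github.com/", "https://www.taobao.com/", "https://www.ctrip.com/"],
    open_tabs=["Bali hotels", "Flight search"],
    local_sources={"calendar": ["Wed 10:00 report review"], "email": ["Re: Q2 report"]},
    granted={"calendar"},
)
print(sorted(ctx.snapshot(recent=2)))  # ['calendar', 'chat_memory', 'open_tabs', 'recent_history']
```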


❄️

This deep and broad context acquisition makes the browser an unparalleled "perception organ". It lets the agent see the user's real needs and environment as a complete, dynamic picture, rather than groping like the blind men around the elephant, and so make smarter decisions and take smarter actions.


B: Local OS control is a must-have for solving problems in every scenario


For a general agent to truly deliver on the word "general", staying in the cloud or inside a sandboxed application is far from enough. It needs to interact more deeply with the user's computing environment, including controlling the local operating system. Here the browser, as a special kind of application, has unique advantages over traditional apps and pure web services in reaching local resources.


Take Fellou's architecture as an example. It shows how combining three capabilities (browser, agent, and workflow automation) can achieve deep control of, and coordination with, local resources.


  • Direct access to the operating system and file system: Traditional web applications are confined by the browser's security sandbox, and their access to the local file system is strictly limited. Emerging Agentic Browsers are trying to break through this limit. Fellou, for example, is designed to interact directly with the operating system, manipulate the file system, and even invoke locally installed applications such as calendars, email clients, and command-line tools. The browser is then no longer just a page renderer but a "super terminal" connecting every part of the user's digital ecosystem.
  • Flexible hardware resource scheduling: To execute tasks efficiently, an Agentic Browser also needs to schedule hardware resources intelligently. Fellou's proposed Hybrid Shadow Workspace is an interesting attempt that allocates compute dynamically according to task type:
    • Local instant response: For short tasks that need a fast response, such as "create a new calendar event for me", the agent can use the local computer's resources directly, keeping latency low.
    • Local virtualized execution: For tasks that take longer but need local data, such as "organize my emails from the past week and generate a to-do list", the agent can run in an isolated local sandbox. This lets it use local data without interfering with whatever the user is currently doing.
    • Cloud desktop collaboration: Tasks that depend little on the local environment or need substantial compute, such as "analyze contributor activity in a large GitHub repository", can be handed off seamlessly to the cloud.
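Fellou has not documented how Hybrid Shadow Workspace decides where a task runs, so here is a hypothetical routing rule that simply encodes the three tiers just described. The `Task` fields, the `route` function, and the five-second cut-off are assumptions; keeping local-data tasks on the device even when they are heavy is a privacy-first choice made for this sketch only.

```python
from dataclasses import dataclass
from enum import Enum

class Venue(Enum):
    LOCAL_INSTANT = "local instant response"
    LOCAL_SANDBOX = "local virtualized execution"
    CLOUD = "cloud desktop"

@dataclass
class Task:
    description: str
    est_seconds: float      # expected run time
    needs_local_data: bool  # must read files, mail, or calendar on this machine
    heavy_compute: bool     # benefits from more compute than the local device offers

def route(task: Task, instant_limit_s: float = 5.0) -> Venue:
    """Pick an execution venue for a task using the three tiers above."""
    if task.needs_local_data:
        # Local data stays on the device: quick jobs run inline,
        # longer ones in an isolated sandbox so the user is not disturbed.
        if task.est_seconds <= instant_limit_s:
            return Venue.LOCAL_INSTANT
        return Venue.LOCAL_SANDBOX
    if task.heavy_compute or task.est_seconds > instant_limit_s:
        return Venue.CLOUD
    return Venue.LOCAL_INSTANT

print(route(Task("create a calendar event", 1, True, False)).value)  # local instant response
print(route(Task("summarize last week's email into a to-do list", 60, True, False)).value)  # local virtualized execution
print(route(Task("analyze contributors of a large GitHub repo", 600, False, True)).value)  # cloud desktop
```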


This deeper control of the local OS means the agent can handle a wider range of more complex tasks. It is no longer limited to browsing and simple interaction with web content; it can go deep into the user's workflow and become a genuinely effective digital assistant. Perplexity CEO Aravind Srinivas has also said plainly that the browser is the best container for building AI agents, one key reason being its potential for OS-level resource scheduling.


Fellou's team even plans to release an Agentic Browser benchmark: a suite of tasks spanning different operating environments, device types, and application scenarios, intended to systematically measure how much an Agentic Browser actually improves user productivity in real production environments. This, too, reflects the industry's expectations for the browser's local control capabilities.


❄️

The ultimate vision is to develop the Agentic Browser into a new AI operating system (AIOS).

It would be not only a carrier for applications but also a reconstruction of the human-machine collaboration paradigm. Through an Agent Store ecosystem, users could package their own experience and knowledge into vertical agents for specific domains (for example, a "cross-border e-commerce best-seller selection assistant") and share or use them on the platform. This would form an open platform akin to an "AI App Store", further strengthening its system-level extensibility and its integration with local resources.

OpenAI's GPTs are effectively an Agent Store that has already taken shape; what they lack is a powerful carrier.
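At bottom, an Agent Store is a registry: creators publish a manifest describing what their vertical agent does and where it operates, and users search it. The in-memory sketch below is hypothetical; `AgentManifest`, `AgentStore`, and their fields (including the declared `permissions`) do not mirror the GPT Store or any real platform.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class AgentManifest:
    """Hypothetical metadata a creator publishes alongside a vertical agent."""
    name: str
    author: str
    description: str
    sites: Tuple[str, ...] = ()        # domains the agent knows how to operate on
    permissions: Tuple[str, ...] = ()  # capabilities it asks the user to grant

class AgentStore:
    def __init__(self) -> None:
        self._agents: Dict[str, AgentManifest] = {}

    def publish(self, manifest: AgentManifest) -> None:
        """Register an agent; names must be unique."""
        if manifest.name in self._agents:
            raise ValueError(f"agent {manifest.name!r} is already published")
        self._agents[manifest.name] = manifest

    def find(self, site: str = "", keyword: str = "") -> List[AgentManifest]:
        """Agents covering `site` (if given) whose description mentions `keyword`."""
        kw = keyword.lower()
        return [m for m in self._agents.values()
                if (not site or site in m.sites) and kw in m.description.lower()]

store = AgentStore()
store.publish(AgentManifest("taobao-price-compare", "alice",
                            "Find the most cost-effective Taobao listings",
                            sites=("taobao.com",), permissions=("read_page",)))
store.publish(AgentManifest("cross-border-picker", "bob",
                            "Cross-border e-commerce best-seller product selection",
                            sites=("amazon.com",)))
print([m.name for m in store.find(site="taobao.com")])  # ['taobao-price-compare']
```

Declaring `permissions` in the manifest is one plausible way such a store could let users see what a shared agent will touch before installing it.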


C: Cross-application connectivity


In today's digital world of countless applications and services, information silos and fragmented workflows are common pain points. One core mission of the universal agent is to break down these barriers and connect and coordinate seamlessly across applications. With its unique ecological niche and technical characteristics, the browser is becoming an ideal platform for that mission.


Think about how many everyday tasks require switching between different apps or websites:

  • Booking a trip may mean having the Ctrip or airline app, a hotel booking platform, a map service, restaurant reviews, and a travel-guide community open at the same time.
  • Writing a market research report may mean collecting information from news portals, industry databases, social media, and other sources, then pulling it all together in a document editor.

The cost of all this switching and integration is often huge.


The Agentic Browser offers a new way to solve this. It aims to be not just an aggregator of information but a coordinator of actions.


  • The browser controls the local operating system and applications: As discussed earlier, the Agentic Browser is breaking through the limits of traditional web pages and seeking deeper integration with the local operating system. This means agents can call locally installed applications directly through the browser. For example, a user might tell the agent in the browser, by voice or text: "Please email the report I just downloaded to Zhang San and remind him we'll discuss it at next Wednesday's meeting." The agent understands the instruction, locates the file, opens the email client, fills in the recipient, subject, and body, attaches the file, and waits for the user's final confirmation before sending. Going further, through protocols such as MCP (Model Context Protocol), an open standard for connecting models to external tools and data sources, the agent in the browser can work with local or cloud services that speak the same protocol, recombining atomic operations across platforms into more complex workflows.


  • Cross-site workflow automation: This is one of the areas where the Agentic Browser can be most effective. We repeat the same operations across different websites every day: a content creator may need to regularly collect and back up what they publish on Xiaohongshu, Douyin, and WeChat Official Accounts; a market analyst may need to scrape competing products' prices and sales from several e-commerce platforms every day. An Agentic Browser can automate these tedious cross-site workflows through its built-in workflow automation layer.


  • Fellou has shown some striking examples:
    • "Automatically follow every Twitter account mentioned in an article", which involves jumping from a content platform to a social platform to perform the follows.
    • "Filter graphics cards on Amazon that meet specific criteria and add them to the shopping cart automatically", a typical e-commerce automation scenario.
    • "Sync the top eight products on Product Hunt to a Notion database every day", which integrates and moves data across applications.
    • Fellou has even built a framework called Eko for dynamic task planning and resilient execution, so that automations can adapt to changes such as website redesigns and stay stable and reliable.


  • Breaking the closed ecosystems of the traditional giants: Large Internet companies tend to build closed ecosystems in which data and services circulate internally but are hard to use efficiently from outside. With its "Deep Action" capability, the Agentic Browser could break this pattern to some extent.
    • Accessing private sites: Traditional AI tools can only process publicly accessible web data. An Agentic Browser, with the user's authorization, can log in to private sites that require authentication, such as LinkedIn, Taobao, Feishu Docs, or a corporate intranet, and perform more complex operations, such as "collect every LinkedIn job posting that matches a specific job description".
    • Building an open intelligent ecosystem: Through the Agent Store concept, developers can package their solutions for specific websites or tasks into standalone agents and share them. For example, a "Taobao precision price-comparison assistant" agent could help users find the best-value options among Taobao's sprawling product listings. Such an open ecosystem indirectly challenges the platforms' data monopolies and lets data and capabilities flow and combine more freely.
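The resilience attributed to Eko, surviving website redesigns, can be illustrated with a simple pattern: give each workflow step several ways to find its target, fall back through them in order, and only then ask a planner to locate the element afresh. This is a hypothetical sketch of that pattern, not Eko's actual API. The page is modeled as a plain mapping from CSS-style locator to visible label, and `replan_by_label` is a toy stand-in for a model-driven replanner.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Page = Dict[str, str]  # toy page model: locator -> visible label text

@dataclass
class Step:
    """One workflow step: an intent plus several ways to find its target element."""
    intent: str          # e.g. "click buy"
    locators: List[str]  # tried in order; later entries are fallbacks

class StepFailed(Exception):
    """Raised when neither the locators nor the replanner can find a target."""

Replanner = Callable[[Step, Page], Optional[str]]

def execute(steps: List[Step], page: Page, replan: Optional[Replanner] = None) -> List[str]:
    """Resolve each step's target, falling back through locators, then replanning.

    Returns one log line per step recording which locator was finally used.
    """
    log = []
    for step in steps:
        chosen = next((loc for loc in step.locators if loc in page), None)
        if chosen is None and replan is not None:
            suggestion = replan(step, page)  # a real system might ask a model here
            chosen = suggestion if suggestion in page else None
        if chosen is None:
            raise StepFailed(f"could not locate a target for {step.intent!r}")
        log.append(f"{step.intent} via {chosen}")
    return log

def replan_by_label(step: Step, page: Page) -> Optional[str]:
    """Toy replanner: match the intent's last word against visible labels."""
    keyword = step.intent.split()[-1].lower()
    return next((loc for loc, label in page.items() if keyword in label.lower()), None)

# After a redesign the cart button was renamed and "#purchase" no longer exists.
page = {"#add-to-cart-v2": "Add to Cart", "button.buy-now": "Buy Now"}
steps = [Step("click add-to-cart", ["#add-to-cart", "#add-to-cart-v2"]),
         Step("click buy", ["#purchase"])]
print(execute(steps, page, replan_by_label))
# ['click add-to-cart via #add-to-cart-v2', 'click buy via button.buy-now']
```

In this toy run the first step survives the renamed button through its fallback locator, and the second recovers only because the replanner matches the visible label "Buy Now"; without a replanner, `execute` raises `StepFailed` instead of silently clicking the wrong thing.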


❄️

As the main gateway to the Internet, with a natural affinity for every kind of Web service, the browser is evolving into a powerful cross-application hub. It is no longer just a "gate" to individual websites but a busy "crossroads" and an efficient "traffic-dispatch system" that lets information and operations flow smoothly between destinations.


D: The browser is a natural carrier for universal agents


When we look past specific technical features to the browser's role in the digital ecosystem as a whole, we find that it naturally has many qualities that make it an ideal carrier for a universal agent. This is not deliberate hype but the product of its historical evolution and of user habits.


First, the browser is a universal container for Internet content. The founding idea of the World Wide Web was to connect the world's information through hyperlinks, and the browser is the client that makes that idea real. News portals, social media, e-commerce platforms, online office tools, entertainment and video: almost every Internet service is ultimately presented as web pages. This universality makes the browser a natural "universal interaction interface" that needs no extra adaptation. If a universal agent wants to roam freely through the vast digital world, the browser is undoubtedly the flattest and broadest starting point.


Second, browsers carry deeply ingrained user habits. Decades of Internet development have trained users to get information and use services through the browser. When we hit a problem, we instinctively open a browser and search; when we shop online, we type the e-commerce site's address from memory. This inherited habit is crucial for the adoption of general agents: integrating agent capabilities into an environment users already know is far easier than forcing them to learn an entirely new interaction paradigm. The Agentic Browser stands on the shoulders of this "giant", letting agent capabilities reach a huge number of users in a more natural, lower-barrier way.


Furthermore, the browser is itself a constantly evolving "mini operating system". It has grown from a simple HTML rendering engine into a powerful platform that supports complex web applications, extensions, local storage, and hardware acceleration, and its functional boundaries keep expanding. It has its own process management, memory management, and security mechanisms, and it is even moving into territory that once belonged to the operating system core, such as file system access and device API calls. This "quasi-operating system" nature provides the soil and infrastructure that general agents need: agents need an environment to perceive and tools to act with, and the modern browser can supply both.


❄️

The universality of content, the inheritance of user habits, and the evolution of platform capabilities together give the browser a solid foundation as the natural carrier of universal agents. It is neither confined to specific scenarios like single-domain apps nor cut off from the user like pure back-end services. The browser, a seemingly "old" tool, is showing new vitality and possibilities in this new technological wave.


E: An Agentic Browser maker has the potential to become another Apple


When we talk about browsers as carriers of universal agents, we can look even further ahead. The browser's future need not stop at being a smarter Internet tool. Following the proven path of "browser → operating system → hardware", it could become a new ecosystem core of the digital age, with room to become "another Apple".


Looking back at the history of technology, Chrome OS and the Chromebook have already shown that this path is technically feasible. By combining the Chrome browser with an underlying Linux base, Google built a lightweight, cloud-first operating system and spawned a new hardware category, the Chromebook. Although Chrome OS's ecosystem and market share still lag far behind those of Windows and macOS, it clearly proves the path works: put the browser at the core, build up into an operating system, then extend down into customized hardware.


So, what new imagination does the emergence of Agentic Browser add to this path?


  • From "information portal" to "task center": Traditional browsers mostly serve as portals for obtaining information. By deeply integrating agent capabilities, the Agentic Browser is turning the browser into the "task execution center" of the user's digital life. When a browser can proactively understand user needs, plan tasks intelligently, and act across applications, it is already playing part of the operating system's role: managing the user's flow of digital tasks.


  • "AI-first" operating system kernel: The kernel design philosophy of a future Agentic OS may shift from the traditional "application-first" or "file-first" to "AI-first" or "task-first". Resource scheduling, interaction design, and the security model would then be built around how best to run agents and complete the tasks users delegate to them. This could give rise to an entirely new operating system architecture and human-computer interaction paradigm.


  • Customized hardware for agents: Once an operating system ecosystem built on the Agentic Browser matures, customized hardware that delivers the best possible performance and experience will follow naturally. Such devices may integrate dedicated AI acceleration units at the chip level, carry more advanced sensors to extend the agent's perception of the physical world, and emphasize multimodal interaction and immersive experiences in their design. Imagine a future "AgentBook" or "AgentPad" whose core selling point is no longer CPU clock speed or memory size but the intelligence and task-execution efficiency of its built-in agent.


  • Rebalancing openness and closedness: Apple's success owes much to the polished experience and high margins of its closed, vertically integrated hardware-software ecosystem. The new path led by the Agentic Browser may strike a new balance between the two. On one hand, it may stay as open as the Web to attract a broad base of developers to the agent ecosystem; on the other, it may build a differentiated advantage by controlling core agent capabilities, operating system features, and hardware design.


❄️

This is not a far-fetched fantasy. Once the browser can deeply understand our intentions, manage our digital lives, and seamlessly connect cloud and local, software and hardware, it could become the core of the next generation of computing platforms. Just as the personal computer and the smartphone each defined an era, a new species that starts with the Agentic Browser may be gathering the energy to define the next one. The road from browser to operating system to customized hardware is full of challenges, and also full of exciting possibilities.



05  / Summary


The direction of the tide is clear. From Perplexity's predicament and breakthrough to the bold explorations of pioneers such as Fellou and Dia, it is easy to see that the Agentic Browser is not a passing hype concept but an inevitable choice for general agents seeking a better habitat and greater potential.


It is not just an extension of AI search or a simple upgrade of the traditional AI browser. The core of the Agentic Browser is to turn the browser from a passive "information window" into an active "smart workshop". It gives agents unprecedented depth and breadth of context perception, opens the key link between the local operating system and cross-application services, and inherits the browser's natural content universality and huge user base.


In this revolution, the battle is no longer for user attention but for deep control of the "context" of the user's digital world and of the "actions" built on it. With its unique ecological niche, the browser is becoming the central battlefield.


In the future, the Agentic Browser may follow in the footsteps of Chrome OS, growing into a new AI-first operating system and extending down into smart hardware tailored for agents. This would be not only a reconstruction of the existing digital ecosystem but also a profound change in the human-computer collaboration paradigm.


The road ahead is long and hard, but now that we know the direction, we can press on whatever the weather. The story of the Agentic Browser has only just begun, and we have reason to believe that the door to a smarter, more autonomous digital world is slowly opening.


Finally, my prediction: OpenAI's Agentic Browser will be officially released to the public before the fall. I'm putting it on record here; feel free to slap me in the face if I'm wrong.