Browser Use - Let AI control your browser

Written by
Clara Bennett
Updated on:June-25th-2025
Recommendation

Explore how AI revolutionizes the browser interaction experience. Browser Use takes you into a new era of automated web operations.

Core content:
1. The connection between AI and browsers: the innovative integration of Browser Use and its trends
2. The core functions of Browser Use: automation, visual perception, multi-tab management, etc.
3. Quick start guide: installation, settings and integration with large language models

Yang Fangxian
Founder of 53A/Most Valuable Expert of Tencent Cloud (TVP)
1. Introduction

As technology develops rapidly, artificial intelligence (AI) has penetrated into every aspect of our lives, and the connection between AI agents and browsers is gradually becoming a new trend in the development of the Internet, triggering the public's infinite expectations for changes in the Internet experience. This innovative integration is like adding intelligent wings to traditional browsers, which will bring us unprecedented Internet interactive experience.


This article will introduce an automation framework that allows you to easily connect your AI agent to the browser and access various websites through the AI ​​agent - Browser Use



2. Introduction

Browser Use is the easiest way to connect your AI agent to a browser. It allows AI agents to access a variety of websites by providing a powerful and easy-to-use browser automation interface.

Features:

  • Powerful browser automation capabilities: Browser Use combines advanced AI capabilities with powerful browser automation technology to achieve a smooth and seamless web interaction experience for AI agents.

  • Visual Perception and HTML Structure Extraction: Combines visual understanding capabilities with HTML structure extraction capabilities to achieve comprehensive web page interactivity.

  • Multi-tab management: Automatically handle multiple browser tabs to accommodate complex workflows and parallel processing needs.

  • Element Tracking: Extract the XPath (path expression) of the clicked element and repeat the exact same Large Language Model (LLM) operation for consistent automation.

  • Custom actions: Add your own actions, such as saving to a file, performing database operations, sending notifications, or processing human input.

  • Self-correction: It has intelligent error handling mechanisms and automatic recovery functions to ensure the robust operation of automated workflows.

  • Support for any large language model: Compatible with all large language models based on LangChain, including GPT-4, Claude 3, and Llama 2.


Official website address:

https://browser-use.com/



3. Get started quickly

1. Installation and Agent Setup

1. Browser Use requires Python 3.11 or higher.

pip install browser-use

2. Install Playwright.

playwright install

3. Create an agent.

You can then use the agent as follows:

from  langchain_openai import  ChatOpenAI
from  browser_use import  Agent
import  asyncio
from  dotenv import  load_dotenv
load_dotenv()

async  def  main () :
    agent = Agent(
        task= "Compare the price of gpt-4o and DeepSeek-V3" ,
        llm=ChatOpenAI(model= "gpt-4o" ),
    )
    await  agent.run()

asyncio.run(main())

4. Set up your Large Language Model (LLM) API key.

ChatOpenAI and other Langchain-based chat models require API keys. You can store these keys in your .env file.

OPENAI_API_KEY=

2. Browser Use + DeepSeek-R1

In this example, the author uses the DeepSeek-R1 model.

Visit DeepSeek's API open platform, purchase traffic, and create an API key.

https://platform.deepseek.com/

Sample script:

Put the imported agent (DeepSeek-R1) and API key in the same script file.

The operation steps are to open the shopping website, enter the account password to log in, view the product details, add to the shopping cart, and close the browser.

import  asyncio
import  os

from  dotenv import  load_dotenv
from  langchain_openai import  ChatOpenAI
from  pydantic import  SecretStr

from  browser_use import  Agent

# dotenv
load_dotenv()

api_key = os.getenv( 'DEEPSEEK_API_KEY' , 'sk-……)
if not api_key:
    raise ValueError('
DEEPSEEK_API_KEY is  not  set ')

async def run_search():
    agent = Agent(
        task=(
            '
1.  Visit https://www.saucedemo.com/ '
            '
2.  Enter username standard_user, password secret_sauce, and log in '
            '
3.  Click on the black T-Shirt to view details '
            '
4.  Add black T-shirt to cart '
            '
5.  Close the browser '
        ),
        llm=ChatOpenAI(
            base_url='
https://api.deepseek.com/v1 ',
            model='
deepseek-chat ',
            api_key=SecretStr(api_key),
        ),
        use_vision=False,
    )

    await agent.run()

if __name__ == '
__main__ ':
    asyncio.run(run_search())

At runtime, AI identifies page elements:

Console log information:

Complete operation process: