Browser Use - Let AI control your browser

Explore how AI revolutionizes the browser interaction experience. Browser Use takes you into a new era of automated web operations.
Core content:
1. The connection between AI and browsers: what Browser Use integrates and where the trend is heading
2. The core functions of Browser Use: automation, visual perception, multi-tab management, and more
3. Quick start guide: installation, setup, and integration with large language models
Artificial intelligence (AI) now reaches into many areas of daily life, and connecting AI agents to the browser is emerging as a trend in how people use the web. Browser Use is one example of this integration: it lets an AI agent operate an ordinary browser on the user's behalf.
Features:
Browser automation: Browser Use combines AI capabilities with browser automation so that AI agents can interact with web pages smoothly.
Visual perception and HTML structure extraction: Combines visual understanding with HTML structure extraction so the agent can work with both what a page looks like and how it is built.
Multi-tab management: Automatically handle multiple browser tabs to accommodate complex workflows and parallel processing needs.
Element tracking: Extracts the XPath (path expression) of each clicked element so that the exact actions chosen by the large language model (LLM) can be repeated for consistent automation.
Custom actions: Add your own actions, such as saving to a file, performing database operations, sending notifications, or processing human input.
Self-correction: Error handling and automatic recovery keep automated workflows running when a step fails.
Support for many large language models: Compatible with chat models available through LangChain, including GPT-4, Claude 3, and Llama 2.
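To make the element-tracking idea more concrete, here is a minimal sketch using only Python's standard library. It is not how Browser Use implements tracking internally; the helper name xpath_of and the tiny sample page are assumptions made purely for illustration. The sketch records a positional path for a "clicked" element and then uses that path to find the same element again.

```python
import xml.etree.ElementTree as ET


def xpath_of(root, target):
    """Return a positional path (e.g. './body[1]/div[2]/button[1]') for target.

    Hypothetical helper for illustration only, not part of Browser Use.
    """
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = target
    while node is not root:
        parent = parent_of[node]
        # Position among siblings with the same tag (XPath positions are 1-based).
        same_tag = [c for c in parent if c.tag == node.tag]
        position = next(i for i, c in enumerate(same_tag, start=1) if c is node)
        steps.append(f"{node.tag}[{position}]")
        node = parent
    return "./" + "/".join(reversed(steps))


# A tiny stand-in for a web page (assumed markup, well-formed XML).
page = ET.fromstring(
    "<html><body>"
    "<div><button>Cancel</button></div>"
    "<div><button>Buy</button></div>"
    "</body></html>"
)

clicked = page.findall(".//button")[1]  # pretend the agent clicked "Buy"
path = xpath_of(page, clicked)          # record where the click happened
replayed = page.find(path)              # later: locate the same element again
```

Because the recorded path is positional, replaying it finds the same element as long as the page structure has not changed, which is the property that makes repeated runs consistent.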
Official website address:
https://browser-use.com/
1. Install Browser Use with pip. Python 3.11 or higher is required.
pip install browser-use
2. Install the browsers that Playwright needs.
playwright install
3. Create an agent.
You can then use the agent as follows:
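import asyncio

from dotenv import load_dotenv
from langchain_openai import ChatOpenAI

from browser_use import Agent

# Load environment variables (such as the API key) from the .env file
load_dotenv()


async def main():
    # Describe the task in natural language and choose the model that drives the agent
    agent = Agent(
        task="Compare the price of gpt-4o and DeepSeek-V3",
        llm=ChatOpenAI(model="gpt-4o"),
    )
    await agent.run()


asyncio.run(main())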
4. Set up your Large Language Model (LLM) API key.
ChatOpenAI and other LangChain-based chat models require API keys. You can store these keys in your .env file.
OPENAI_API_KEY=
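The two requirements from the steps above (Python 3.11 or higher, and a non-empty API key) can be checked before the agent starts. The following is a minimal sketch using only the standard library; the function names python_ok and require_env are hypothetical helpers, not part of Browser Use or LangChain.

```python
import os
import sys

MIN_PYTHON = (3, 11)  # minimum version stated in step 1


def python_ok(version=None):
    """Return True if the given (or running) Python version is new enough."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= MIN_PYTHON


def require_env(name, environ=None):
    """Return the value of an environment variable, failing loudly if it is blank."""
    environ = os.environ if environ is None else environ
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is missing or empty; set it in your .env file")
    return value
```

One way to use it is to call python_ok() and require_env("OPENAI_API_KEY") right after load_dotenv(), so a missing key produces a clear error at startup rather than a failure partway through a browser session.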
2. Browser Use + DeepSeek-R1
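This example uses a DeepSeek model. Note that the sample script sets the model name to deepseek-chat; DeepSeek's API documentation has listed the R1 reasoning model under the name deepseek-reasoner, so change the model name if you specifically want R1.
Visit DeepSeek's API open platform, top up your account balance, and create an API key.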
https://platform.deepseek.com/
Sample script:
The script configures the model and reads the API key in a single file.
The task tells the agent to open the shopping website, log in with the username and password, view a product's details, add the product to the cart, and close the browser.
import asyncio
import os

from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from pydantic import SecretStr

from browser_use import Agent

# Load environment variables from the .env file
load_dotenv()

# Read the DeepSeek API key; 'sk-……' is a placeholder fallback, replace it with your own key
api_key = os.getenv('DEEPSEEK_API_KEY', 'sk-……')
if not api_key:
    raise ValueError('DEEPSEEK_API_KEY is not set')


async def run_search():
    agent = Agent(
        # Adjacent string literals are joined into one task description
        task=(
            '1. Visit https://www.saucedemo.com/ '
            '2. Enter username standard_user, password secret_sauce, and log in '
            '3. Click on the black T-Shirt to view details '
            '4. Add black T-shirt to cart '
            '5. Close the browser'
        ),
        # DeepSeek offers an OpenAI-compatible endpoint, so ChatOpenAI can be pointed at it
        llm=ChatOpenAI(
            base_url='https://api.deepseek.com/v1',
            model='deepseek-chat',
            api_key=SecretStr(api_key),
        ),
        # Do not send screenshots to the model
        use_vision=False,
    )
    await agent.run()


if __name__ == '__main__':
    asyncio.run(run_search())
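The task passed to Agent in the script is one long string assembled from adjacent literals. If the steps change often, it may be easier to keep them in a list and number them automatically. The sketch below shows one way to do that; build_task is a hypothetical helper, not part of Browser Use, and it uses only the standard library.

```python
def build_task(steps):
    """Join steps into one numbered instruction string, e.g. '1. A 2. B'."""
    return " ".join(
        f"{number}. {step.strip()}" for number, step in enumerate(steps, start=1)
    )


# The same five steps used in the sample script.
steps = [
    "Visit https://www.saucedemo.com/",
    "Enter username standard_user, password secret_sauce, and log in",
    "Click on the black T-Shirt to view details",
    "Add black T-shirt to cart",
    "Close the browser",
]

task = build_task(steps)
```

The resulting task string can be passed to Agent(task=task, ...) in place of the hand-numbered literals, so inserting or removing a step no longer requires renumbering the rest.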
Screenshots in the original post (not reproduced here) show the AI identifying page elements at runtime, the console log output, and the complete operation process.