Developed in 4 days, $17 million in financing: why is the open-source Browser Use so popular?

Written by Jasper Cole

Updated on: July 8, 2025

What is the essence of the Internet?

Some say it is the flow of information; others say it is connecting everything.

But if you ask me for an answer, I would say: obtaining information efficiently and automating repetitive work.

Browser Use, which was developed in 4 days and went on to raise $17 million, is a vivid illustration of this first principle.

Browser Use is an open-source project created by ETH Zurich alumni Magnus Müller and Gregor Zunic, and it has become almost synonymous with "overnight success".

Within a few months it passed 50,000 GitHub stars. After Manus took off, venture capitalists rushed to place their bets, and AI browser-automation agents became a hot category.

The core of Browser Use is actually very simple: let AI use the browser like a human.

There are many automation tools on the market, but most of them are still at the stage of mechanically simulating clicks or filling out forms. Browser Use, on the other hand, lets AI “understand” web pages and then think and act like a real person.

For example, if you ask “help me find the three most cost-effective hotels in Dali, Yunnan,” a traditional AI assistant will at best hand you a few links and leave the real work to you.

Browser Use, by contrast, opens booking.com itself, pages through the results, compares ratings and prices, and finally presents a detailed report. This closed loop, from receiving an instruction to executing the task, hits squarely on the main pain point of putting AI to practical use.

Even more striking are its price and openness. It is open source under the MIT license, so anyone can modify and use it freely. It also lets you switch between models such as Claude, GPT, and Gemini, so you are not tied to any single vendor. By comparison, the closed-source competitor OpenAI Operator launched as part of the $200-a-month ChatGPT Pro plan!

Three killer features made it popular: it is open source, it supports multiple models, and it costs little. Together they put it nearly in a class of its own in the automation field.

The technical architecture of Browser Use is not complicated, but it is cleverly designed. The system is built in four layers, from page perception to action execution, and each layer addresses a core problem of web automation.

The DOM layer is the AI's eyes. Through it the AI recognizes buttons and text on a web page and determines whether “this button can be clicked now” or “this element is blocked”.

The source code shows that it stores page elements in a flat hash map rather than a traditional tree structure, which makes serialization and lookup more efficient.
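To make the "flat hash map instead of a tree" idea concrete, here is a minimal sketch; it is not Browser Use's actual data structure. A nested DOM-like tree is flattened into a dict keyed by a node id, and each node refers to its children by id rather than holding them as nested objects. The field names (`tag`, `text`, `visible`, `children`) and the `flatten` helper are hypothetical, chosen only for illustration.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class FlatNode:
    # One DOM node; children are stored as ids, not nested objects
    tag: str
    text: str = ""
    visible: bool = True
    children: list[int] = field(default_factory=list)


def flatten(tree: dict) -> dict[int, FlatNode]:
    """Flatten a nested {'tag', 'text', 'visible', 'children'} tree into {id: FlatNode}."""
    nodes: dict[int, FlatNode] = {}

    def visit(node: dict) -> int:
        node_id = len(nodes)  # ids are assigned in pre-order, i.e. document order
        flat = FlatNode(node["tag"], node.get("text", ""), node.get("visible", True))
        nodes[node_id] = flat
        flat.children = [visit(child) for child in node.get("children", [])]
        return node_id

    visit(tree)
    return nodes


page = {
    "tag": "body",
    "children": [
        {"tag": "button", "text": "Search"},
        {"tag": "div", "children": [{"tag": "a", "text": "Hidden link", "visible": False}]},
    ],
}
nodes = flatten(page)

# Lookup by id is a single dict access, with no tree walk
print(nodes[1].tag, nodes[1].text)  # button Search

# "Which elements can be clicked right now?" becomes a flat scan over the values
clickable = [i for i, n in nodes.items() if n.tag in ("a", "button") and n.visible]
print(clickable)  # [1]

# Serializing for the language model is a dict comprehension over flat records
serialized = json.dumps({i: asdict(n) for i, n in nodes.items()})
```

The trade-off is that parent/child relationships become id references you follow explicitly, in exchange for constant-time lookup and trivially flat serialization.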

The Browser layer does the hands-on work and is built on the Playwright library.

The Controller layer is the "translator" that turns the AI's decisions into concrete actions. Its decorator-based design is elegant: adding a new action, such as taking a screenshot, takes only about three lines of code, which makes the system very easy to extend.
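The decorator idea can be illustrated with a small action registry. This is a hypothetical sketch, not Browser Use's real `Controller` API: the `Registry` class, its `action` decorator, and the `click` and `screenshot` handlers are invented names, and the handlers just return strings where a real implementation would drive the browser. The point is that registering a capability is one decorator plus a function, and the dispatcher never has to change.

```python
from typing import Callable


class Registry:
    """Maps action names (as the model would emit them) to handler functions."""

    def __init__(self) -> None:
        self.actions: dict[str, Callable[..., str]] = {}

    def action(self, name: str):
        # Decorator factory: @registry.action("name") registers the function under that name
        def register(fn: Callable[..., str]) -> Callable[..., str]:
            self.actions[name] = fn
            return fn

        return register

    def execute(self, decision: dict) -> str:
        # Dispatch a model decision such as {"action": "click", "index": 2}
        handler = self.actions.get(decision["action"])
        if handler is None:
            return f"unknown action: {decision['action']}"
        params = {k: v for k, v in decision.items() if k != "action"}
        return handler(**params)


registry = Registry()


@registry.action("click")
def click(index: int) -> str:
    return f"clicked element {index}"


# A new capability: one decorator line, one signature, one body line
@registry.action("screenshot")
def screenshot(path: str = "page.png") -> str:
    return f"saved screenshot to {path}"


print(registry.execute({"action": "click", "index": 2}))  # clicked element 2
print(registry.execute({"action": "screenshot"}))  # saved screenshot to page.png
```

Because the registry is just a dict, the list of available actions can also be generated from it and inserted into the prompt, so the model and the executor always agree on what exists.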

The last layer is the Agent layer, which runs a classic “observe-think-act” loop. It calls the language model to decide what to do and, after each action, can adjust its strategy based on the result.
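Here is a stripped-down version of that loop, with the browser and the language model replaced by stand-ins so it runs on its own. `FakePage`, `scripted_model`, and the action dictionaries are hypothetical; what the sketch shows is the control flow: observe, ask the model, act, and record each action's outcome in a history that the next decision can react to.

```python
from typing import Callable


class FakePage:
    """Stand-in for a browser page: a list of links and the page currently open."""

    def __init__(self, links: list[str]) -> None:
        self.links = links
        self.current = "home"

    def observe(self) -> dict:
        return {"url": self.current, "links": list(self.links)}

    def act(self, action: dict) -> str:
        if action["action"] == "click" and 0 <= action["index"] < len(self.links):
            self.current = self.links[action["index"]]
            return f"ok: opened {self.current}"
        return f"error: cannot perform {action}"


def run_agent(
    page: FakePage,
    decide: Callable[[dict, list[str]], dict],
    max_steps: int = 5,
) -> list[str]:
    """Observe-think-act loop; returns a log of what happened at each step."""
    history: list[str] = []
    for _ in range(max_steps):
        state = page.observe()  # observe
        action = decide(state, history)  # think
        if action["action"] == "done":
            history.append(f"done: {action['reason']}")
            break
        history.append(page.act(action))  # act, and remember the outcome
    return history


def scripted_model(state: dict, history: list[str]) -> dict:
    # Stand-in for an LLM call that adjusts based on the previous result
    if state["url"] == "pricing":
        return {"action": "done", "reason": "found pricing page"}
    if history and history[-1].startswith("error"):
        # Recover from the failed step by picking the right link
        return {"action": "click", "index": state["links"].index("pricing")}
    return {"action": "click", "index": 7}  # first guess is out of range


for line in run_agent(FakePage(["about", "pricing", "blog"]), scripted_model):
    print(line)
```

Run as-is, the log shows the failed first click, the corrective second click, and then the finish, which is the "adjust the strategy based on the results" behavior in miniature.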

The four layers work closely together and cover the whole chain from perception to execution. No wonder some people say Browser Use "does the most work with the least code".

Of course, venture capitalists never pay for technology alone; they are also buying market potential.

Browser Use's $17 million financing is a bet on a trillion-dollar market.

There are 600 million knowledge workers in the world, and they spend 25%-40% of their working week on repetitive tasks, half of which involve web operations. If automation saves each of them 5 hours a week, at an hourly wage of $25, that adds up to a market worth at least $750 billion a year.
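The paragraph's inputs can be multiplied out directly. This sketch assumes the 5 saved hours apply to every one of the 600 million workers for all 52 weeks of the year, an assumption the text does not state; under it the gross upper bound is about $3.9 trillion a year, so the $750 billion figure corresponds to roughly a fifth of that. The variable names are illustrative.

```python
workers = 600_000_000  # knowledge workers worldwide
hours_saved_per_week = 5  # hours saved per worker per week
hourly_wage = 25  # USD per hour
weeks_per_year = 52  # assumption: savings apply every week of the year

weekly_value = workers * hours_saved_per_week * hourly_wage
annual_upper_bound = weekly_value * weeks_per_year
print(f"${weekly_value / 1e9:.0f} billion per week")  # $75 billion per week
print(f"${annual_upper_bound / 1e12:.1f} trillion per year")  # $3.9 trillion per year

# Share of that gross upper bound represented by the $750 billion/year estimate
print(f"{750e9 / annual_upper_bound:.0%}")  # 19%
```

Read this way, the $750 billion estimate is a conservative slice of the gross figure, which fits the idea that not every worker or every saved hour will actually be automated.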

Not to mention enterprise scenarios: automatically capturing sales leads, monitoring competitor prices in real time, tracking media sentiment, screening résumés for recruitment... each of them is a must-have need.

The open-source strategy is even more clever.

50,000 GitHub stars are just the starting point. As developers pour into the community, code contributions and new features could grow rapidly and form a technical moat.

By comparison, closed-source competitors would find it hard to catch up. This "low-cost takeoff, high-barrier landing" approach is what attracts venture capitalists.

The core logic of Browser Use is simple enough to reproduce in about 100 lines of code.

Using the source code as a reference, I put together a minimal version built on Playwright and OpenAI. Its core is an "observe-think-act" loop: after startup it opens the browser, captures the page state, and hands it to the AI to decide the next step.

```python
import asyncio
import json

from openai import AsyncOpenAI
from playwright.async_api import async_playwright

client = AsyncOpenAI()  # Replace with your AI model client (reads OPENAI_API_KEY)

# Hypothetical example task; replace it with your own instruction
TASK = "Find out what this website is for, then finish."

# CSS selector for the interactive elements the AI is allowed to use
INTERACTIVE = "a, button, input, select, textarea"


async def main():
    # Initialize the browser
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=False)
        page = await browser.new_page()
        # Navigate to the target website
        await page.goto("https://example.com")

        # Main loop: execute up to 5 steps
        for step in range(5):
            # 1. Observe: get the page state
            state = await get_page_state(page)
            # 2. Think: let the AI decide the next step
            next_action = await get_ai_decision(TASK, state)
            print(f"Step {step + 1}, AI decision: {next_action}")

            # If the AI decides the task is complete, exit the loop
            if next_action.get("action") == "done":
                print(f"Task completed: {next_action.get('reason', '')}")
                break

            # 3. Act: execute the AI's decision
            await execute_action(page, next_action)
            await page.wait_for_load_state("networkidle")

        await browser.close()


async def get_page_state(page):
    """Get the current page state."""
    # Basic information about the page
    url = page.url
    title = await page.title()

    # Collect the visible interactive elements; `index` is the element's position
    # among all matches, so execute_action can locate the same element again
    elements = await page.evaluate(
        """(selector) => {
            const results = [];
            document.querySelectorAll(selector).forEach((el, index) => {
                const rect = el.getBoundingClientRect();
                // Only collect visible elements
                if (rect.width > 0 && rect.height > 0) {
                    results.push({
                        index: index,
                        tag: el.tagName.toLowerCase(),
                        text: (el.innerText || el.value || '').trim().slice(0, 100),
                        type: el.type || '',
                        id: el.id || '',
                        name: el.name || '',
                        className: el.className || ''
                    });
                }
            });
            return results;
        }""",
        INTERACTIVE,
    )

    # Format the element list so it is readable for the model
    elements_text = ""
    for el in elements:
        elements_text += f"[{el['index']}] <{el['tag']}> {el['text']} {el['id']}\n"

    return {
        "url": url,
        "title": title,
        "elements": elements,
        "elements_text": elements_text,
    }


async def get_ai_decision(task, state):
    """Use the AI to decide the next action."""
    # The JSON example uses doubled braces because this is an f-string
    prompt = f"""You are a browser automation assistant.
Your task: {task}
Decide what to do next based on the current page state.

Current URL: {state['url']}
Page title: {state['title']}
Interactive elements:
{state['elements_text']}

Choose one of the following actions:
1. click - click an element; arguments: index (integer)
2. input - type text into an element; arguments: index (integer), text (string)
3. goto - navigate to a URL; arguments: url (string)
4. done - the task is complete; arguments: reason (string)

Return the decision in JSON format, for example: {{"action": "click", "index": 2}}
"""
    response = await client.chat.completions.create(
        model="gpt-4o",  # JSON mode requires a model that supports response_format
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )

    # Parse the AI response
    try:
        return json.loads(response.choices[0].message.content)
    except (json.JSONDecodeError, TypeError):
        # Fallback strategy in case parsing fails
        return {"action": "done", "reason": "Unable to parse AI response"}


async def execute_action(page, action):
    """Execute the action decided by the AI."""
    kind = action.get("action")
    if kind == "click":
        # Find the element by the same querySelectorAll index used in
        # get_page_state, scroll it into view, and return its center point
        center = await page.evaluate(
            """([selector, index]) => {
                const el = document.querySelectorAll(selector)[index];
                if (!el) return null;
                el.scrollIntoView({block: 'center'});
                const rect = el.getBoundingClientRect();
                return {x: rect.x + rect.width / 2, y: rect.y + rect.height / 2};
            }""",
            [INTERACTIVE, action["index"]],
        )
        if center:
            # Click the center of the element
            await page.mouse.click(center["x"], center["y"])
    elif kind == "input":
        # Click the input box first, then type the text
        await execute_action(page, {"action": "click", "index": action["index"]})
        await page.keyboard.type(action["text"])
    elif kind == "goto":
        # Navigate to the specified URL
        await page.goto(action["url"])


if __name__ == "__main__":
    asyncio.run(main())
```

Of course, all of the above is just technique. In the AI era, real innovation may lie not in reinventing the wheel but in assembling existing technologies into tools that solve real pain points. Solving real needs and problems is what matters.

Browser Use proved in 4 days that once you find the right scenario, simple technology can open up a billion-dollar market.

Perhaps the next game-changing AI application is hidden in your inspiration or mine, waiting to be written in 4 days.