2025: The era of agent intelligence is coming - Understanding OpenHands, an agent that thinks autonomously

Written by
Iris Vance
Updated on: July 8, 2025

Explore the future of AI Agents and witness how OpenHands leads a new era of autonomous thinking.

Core content:
1. Autonomous operation and collaboration capabilities of OpenHands agents
2. In-depth analysis of technical principles and project architecture
3. Future trends and challenges of AI application layer


OpenHands: a smarter agent

Let's talk about something substantial today.

I have predicted before that "2025 will be the year AI Agents explode."

Today I want to share an open-source project about agent intelligence, OpenHands. In short, it is an agent with more freedom.

Compared with traditional agents (which can only call APIs to check the weather or the news), OpenHands has much greater freedom:

  1. It can operate a computer autonomously: OpenHands can use a computer the way a human does, including opening applications, writing code, running command-line operations, browsing the web, and so on.

  2. It supports multi-agent collaboration: multiple OpenHands agents can work together to complete complex tasks.

  3. It has memory and planning abilities: it can remember previous operations and experiences, formulate a reasonable action plan accordingly, and monitor and optimize the plan in real time as it progresses.
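The three capabilities above boil down to a plan-act-observe-remember loop. Here is a minimal, self-contained sketch of that loop; the class and method names are illustrative assumptions, not actual OpenHands APIs, and a real agent would call an LLM in `plan_next` and a sandbox in `act`.

```python
from dataclasses import dataclass, field

@dataclass
class MiniAgent:
    goal: str
    memory: list = field(default_factory=list)  # (action, observation) history

    def plan_next(self) -> str:
        # A real agent would query an LLM, passing goal + memory as context.
        step = len(self.memory)
        return f"step-{step}: work toward {self.goal!r}"

    def act(self, action: str) -> str:
        # A real agent would execute the action in a sandbox and capture output.
        return f"observed result of {action!r}"

    def run(self, max_steps: int = 3) -> list:
        for _ in range(max_steps):
            action = self.plan_next()
            observation = self.act(action)
            self.memory.append((action, observation))  # remember for the next plan
        return self.memory
```

The key point is that each planning step sees the accumulated memory, which is what lets the agent adjust its plan as it goes.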

Before diving into the technology, a few opinions first.

First, at the model layer, I think we have reached the peak of a small stage, and the next breakthrough may take a long time.

In addition, AWS and even Alibaba Cloud have begun to reduce developer costs at the cloud-service level, so the overall progress of agents should continue to accelerate.

Simple AI applications may find it increasingly difficult to survive, because a growing number of agents will keep raising everyone's baseline expectations for AI applications.

There will be more and more "Agent+" products, and this pattern is expected to become the basic architecture of the AI application layer.

Second, some developers argue that agents will be integrated directly into the model layer. That is possible in the future, but the Agent+ approach has a real drawback today: its time cost and token consumption are extremely high.

The technical principles of OpenHands

  • Event Stream Architecture: Manages the interaction between an agent and its environment based on event streams, including the agent's actions and the environment's observations.
  • Docker Sandbox: A secure and isolated Docker container sandbox is started for each task session, and all actions are performed in the sandbox.
  • Action execution API: The API server runs in the Docker sandbox and handles actions such as command execution, Python code execution, and web browsing.
  • Arbitrary Docker image support: The runtime can be built on any Docker image, so the agent can run in any operating system and software environment.
  • Agent Skills: The AgentSkills library provides practical functions that basic tools cannot cover, such as file editing and document reading.
  • Multi-agent delegation: One agent can delegate a specific subtask to another agent for execution, enabling collaboration among multiple agents.
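The event stream mentioned in the first bullet can be pictured as a single ordered log of Actions (what the agent does) and Observations (what the environment reports back). The sketch below is illustrative, assuming hypothetical `Action`/`Observation`/`EventStream` classes rather than the actual OpenHands types.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Action:
    kind: str          # e.g. "run_command", "edit_file", "browse"
    payload: str

@dataclass
class Observation:
    source: str        # which kind of action produced this observation
    content: str

class EventStream:
    """One ordered log of every agent/environment interaction."""

    def __init__(self):
        self.events = []

    def add(self, event) -> None:
        self.events.append(event)

    def replay(self) -> str:
        # Serializing the stream gives the agent its full interaction history,
        # which it can feed back into the model as context.
        return json.dumps([asdict(e) for e in self.events])

stream = EventStream()
stream.add(Action("run_command", "ls /workspace"))
stream.add(Observation("run_command", "main.py  README.md"))
```

Because every action and observation lives in one stream, replaying it reconstructs the entire session, which is also what makes delegation between agents tractable: a sub-agent's events simply join the same log.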

Project Architecture

The overall architecture diagram is as follows. The system is divided into two parts: a web front end, which handles user interaction and displays results, and a back end, which handles the business logic and runs the agent. At deployment time, the front end is compiled into static HTML, and the server is implemented in Python; it listens on port 3000 by default and serves the web application. When the user creates a new project or a new session, the Python back-end service starts a Docker container to handle it.
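The per-session container launch described above can be sketched as assembling a `docker run` invocation. This is a hedged illustration only: the image name, container naming scheme, and flags are assumptions for the sketch, not the actual commands OpenHands issues.

```python
def sandbox_command(session_id: str, image: str = "example/sandbox:latest") -> list:
    """Build an (illustrative) docker run command for one session's sandbox."""
    return [
        "docker", "run",
        "--rm",                                      # clean up when the session ends
        "--detach",                                  # run in the background
        "--name", f"openhands-session-{session_id}", # one container per session
        image,
        "sleep", "infinity",                         # keep alive so actions can exec inside
    ]

cmd = sandbox_command("abc123")
```

A real backend would pass this to `subprocess.run` (or use the Docker SDK) and then exec each agent action inside the named container, so that all side effects stay isolated from the host.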

The front-end architecture diagram is as follows

The front end has three main pages: a settings page, mainly for configuring the large model; a chat page, which handles user requests and calls the large model; and a VS Code page.

The server architecture is as follows:

The core job of the server is to accept requests and create sessions. When the user sends a request, the agent analyzes it, creates a Plan, and creates a CommandManager. On the first request, a Docker sandbox is created; actions such as command execution and file creation are sent to Docker for execution through the CommandManager. Each session has a State that maintains the interaction across the whole cycle, including the task's Plan, an incrementing interaction id, the list of Observations of background behavior, the execution history, and a list of update information. For each command issued, an Observer monitors its execution.
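The session bookkeeping described above can be sketched as a small `State` object that carries the Plan, an incrementing interaction id, the observations, and the execution history. Field and class names here are guesses based on the description, not the real OpenHands code.

```python
from dataclasses import dataclass, field

@dataclass
class State:
    plan: list                                   # ordered task steps for this session
    next_id: int = 0                             # incrementing interaction id
    observations: list = field(default_factory=list)  # outputs observed per command
    history: list = field(default_factory=list)       # (id, command) execution history

    def record(self, command: str, output: str) -> int:
        """Record one executed command and its observed output; return its id."""
        step_id = self.next_id
        self.next_id += 1
        self.history.append((step_id, command))
        self.observations.append(output)
        return step_id

state = State(plan=["create file", "run tests"])
state.record("touch main.py", "ok")
state.record("pytest", "1 passed")
```

In the real system, the role of the Observer would be to call something like `record` once a command finishes in the sandbox, so the State always reflects the full interaction history that the next planning step can consult.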