Microsoft open-sources OmniParser V2, turning DeepSeek-R1 into an AI agent that can use a computer

Updated on: July 12, 2025
Recommendation
Microsoft's latest technology turns any LLM into an AI operator.
Core content:
1. OmniParser V2 is open source and turns LLMs into AI agents
2. Supports models such as DeepSeek R1 and improves UI agent performance
3. Significant performance improvement, with latency reduced by 60%
Yang Fangxian
Founder of 53AI/Most Valuable Expert of Tencent Cloud (TVP)
Microsoft has released and open-sourced OmniParser V2 on its official website. It can turn any LLM into an agent capable of using a computer: GPT-4o, DeepSeek R1, Sonnet 3.5, Qwen, and other models can be enabled to understand the content on the screen and take relevant actions.
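To make the idea concrete, here is a minimal sketch of how parsed screen elements might be handed to an LLM as text, and how the model's answer might be mapped back to a click position. The `UIElement` structure and the `build_prompt` and `resolve_click` helpers are hypothetical names chosen for illustration, not OmniParser's actual API, and the bounding boxes are made-up sample data.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class UIElement:
    """One parsed screen element (hypothetical structure, not OmniParser's schema)."""
    element_id: int
    description: str                   # what the element does, e.g. "Search button"
    bbox: tuple[int, int, int, int]    # (left, top, right, bottom) in pixels
    interactable: bool


def build_prompt(task: str, elements: list[UIElement]) -> str:
    """Turn the parsed screen into plain text that a text-only LLM can read."""
    lines = [f"Task: {task}", "Interactable elements on screen:"]
    for el in elements:
        if el.interactable:
            lines.append(f"[{el.element_id}] {el.description}")
    lines.append("Reply with the ID of the element to click.")
    return "\n".join(lines)


def resolve_click(element_id: int, elements: list[UIElement]) -> tuple[int, int]:
    """Map the ID chosen by the model back to the centre of its bounding box."""
    for el in elements:
        if el.element_id == element_id:
            left, top, right, bottom = el.bbox
            return ((left + right) // 2, (top + bottom) // 2)
    raise ValueError(f"unknown element id: {element_id}")


# Made-up sample data standing in for a parsed screenshot.
screen = [
    UIElement(0, "Search box", (100, 20, 500, 60), True),
    UIElement(1, "Search button", (510, 20, 570, 60), True),
    UIElement(2, "Page title text", (100, 80, 400, 110), False),
]

print(build_prompt("Search for OmniParser", screen))
print(resolve_click(1, screen))
```

In this sketch the model never sees pixels. It only picks an element ID, and the surrounding code turns that ID into coordinates, which is one plausible reason a text-only model could drive a graphical interface.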
Two datasets underpin the model: an interactive icon detection dataset, built from popular web pages that are automatically annotated to highlight clickable and actionable areas; and an icon description dataset, which associates each UI element with its corresponding functionality.
1. Larger, cleaner icon caption and grounding datasets.
2. A 60% latency improvement compared to V1. Average latency is 0.6 seconds/frame on an A100 and 0.8 seconds/frame on a single 4090.
3. Strong performance: average accuracy on ScreenSpot Pro is 39.6.
4. Agents only need one tool: OmniTool. It controls a Windows 11 VM with OmniParser plus a vision model of your choice. OmniTool supports the following large language models out of the box: OpenAI (4o/o1/o3-mini), DeepSeek (R1), Qwen (2.5VL), and Anthropic Computer Use.
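As a back-of-the-envelope check, the per-frame latencies quoted above can be converted into throughput. The sketch below assumes frames are parsed one at a time with no batching. It also reads the 60% figure as "V2 takes 60% less time per frame than V1", which is only one possible interpretation, so the implied V1 latency it computes is a derived estimate and not a published number.

```python
def frames_per_second(latency_s: float) -> float:
    """Throughput when frames are processed strictly one after another."""
    return 1.0 / latency_s


def implied_baseline_latency(new_latency_s: float, reduction: float) -> float:
    """Baseline latency implied by a fractional reduction (assumed interpretation)."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return new_latency_s / (1.0 - reduction)


# Latencies as quoted in the article, in seconds per frame.
latencies = {"A100": 0.6, "4090": 0.8}

for gpu, latency in latencies.items():
    print(f"{gpu}: {frames_per_second(latency):.2f} frames/s")

# Assumption: "60% latency improvement" means a 60% reduction in time per frame.
v1_estimate = implied_baseline_latency(latencies["A100"], 0.60)
print(f"Implied V1 latency on A100 (estimate): {v1_estimate:.2f} s/frame")
```

At under two frames per second on either GPU, screen parsing at these latencies suits step-by-step agent actions rather than continuous video.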
Model: https://huggingface.co/microsoft/OmniParser-v2.0
Blog: https://www.microsoft.com/en-us/research/articles/omniparser-v2-turning-any-llm-into-a-computer-use-agent/
Code: https://github.com/microsoft/OmniParser/tree/master
Demo: http://hf.co/spaces/microsoft/OmniParser-v2