Microsoft open-sources OmniParser V2, turning DeepSeek-R1 into a computer-using AI agent

Written by
Jasper Cole
Updated on: July 12, 2025
Recommendation

Microsoft's latest technology turns any LLM into an AI operator.

Core content:
1. OmniParser V2 is open source and turns LLMs into AI agents
2. It supports models such as DeepSeek R1 and improves UI agent performance
3. Significant performance gains, with a 60% latency improvement over V1

Yang Fangxian
Founder of 53AI/Most Valuable Expert of Tencent Cloud (TVP)

Microsoft has released and open-sourced OmniParser V2 on its official website. It can turn any LLM into an agent capable of using a computer: models such as GPT-4o, DeepSeek R1, Sonnet 3.5, and Qwen can understand what is on the screen and take relevant actions.



OmniParser is a general-purpose screen parsing tool that converts UI screenshots into a structured format to improve existing LLM-based UI agents.
The training dataset includes:
  • An interactable icon detection dataset, collected from popular web pages and automatically annotated to highlight clickable and actionable regions;
  • An icon description dataset, designed to associate each UI element with its corresponding function.
The model hub release contains a fine-tuned version of YOLOv8 and a fine-tuned Florence-2 base model, both trained on the above datasets.
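To make the "structured format" more concrete, here is a minimal sketch of what a parsed screen could look like once a detector has proposed interactable regions and a captioner has described each one. The `UIElement` class, its field names, and the `merge` and `to_prompt` helpers are illustrative assumptions, not OmniParser's actual output schema, and the boxes and captions are hard-coded rather than produced by the real YOLOv8 or Florence-2 models.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class UIElement:
    """One parsed screen element (hypothetical schema, not OmniParser's)."""
    element_id: int
    bbox: Tuple[float, float, float, float]  # normalized (x1, y1, x2, y2)
    interactable: bool
    description: str

    def center(self) -> Tuple[float, float]:
        x1, y1, x2, y2 = self.bbox
        return ((x1 + x2) / 2, (y1 + y2) / 2)


def merge(boxes, captions) -> List[UIElement]:
    """Pair each detected box with its caption and assign sequential ids."""
    if len(boxes) != len(captions):
        raise ValueError("need exactly one caption per box")
    return [
        UIElement(i, box, True, caption)
        for i, (box, caption) in enumerate(zip(boxes, captions))
    ]


def to_prompt(elements: List[UIElement]) -> str:
    """Serialize elements as plain text lines that an LLM can read."""
    lines = []
    for e in elements:
        cx, cy = e.center()
        lines.append(f"[{e.element_id}] {e.description} @ ({cx:.2f}, {cy:.2f})")
    return "\n".join(lines)


# Hard-coded stand-ins for detector and captioner output.
boxes = [(0.10, 0.10, 0.20, 0.14), (0.80, 0.90, 0.96, 0.98)]
captions = ["Search box", "Submit button"]
elements = merge(boxes, captions)
print(to_prompt(elements))
```

Giving each element a short id and a text description lets a text-only or vision LLM refer to a target by id instead of guessing pixel coordinates.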
What’s new in OmniParser V2?
  • A larger, cleaner icon caption and grounding dataset.
  • 60% latency improvement compared to V1.
  • Average latency: 0.6 seconds/frame on an A100, 0.8 seconds/frame on a single 4090.
  • Strong performance: an average accuracy of 39.6 on ScreenSpot Pro.
  • Agents need only one tool: OmniTool. It controls a Windows 11 VM with OmniParser plus a vision model of your choice. OmniTool supports the following large language models out of the box: OpenAI (4o/o1/o3-mini), DeepSeek (R1), Qwen (2.5VL), and Anthropic Computer Use.
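The OmniTool setup described above amounts to a loop: capture the screen, parse it, let the chosen model pick an action, and carry that action out in the VM. The sketch below shows only that control flow. The `Action` class and the `capture`, `parse`, `choose_action`, and `execute` callables are hypothetical placeholders supplied by the caller, not OmniTool's API, and the scripted stubs stand in for a real VM and model.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Action:
    """A single agent step (hypothetical shape, for illustration only)."""
    kind: str                       # e.g. "click", "type", "done"
    target_id: Optional[int] = None
    text: Optional[str] = None


def run_agent(
    capture: Callable[[], bytes],
    parse: Callable[[bytes], List[str]],
    choose_action: Callable[[str, List[str]], Action],
    execute: Callable[[Action], None],
    goal: str,
    max_steps: int = 10,
) -> List[Action]:
    """Loop screenshot -> parse -> decide -> act until "done" or max_steps."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture()
        elements = parse(screenshot)
        action = choose_action(goal, elements)
        history.append(action)
        if action.kind == "done":
            break
        execute(action)
    return history


# Scripted stubs so the loop runs without a VM or a model.
script = iter([
    Action("click", target_id=0),
    Action("type", text="hello"),
    Action("done"),
])
executed: List[Action] = []
history = run_agent(
    capture=lambda: b"",
    parse=lambda image: ["[0] Search box"],
    choose_action=lambda goal, elements: next(script),
    execute=executed.append,
    goal="search for hello",
)
print([a.kind for a in history])
```

Because the screen is re-parsed on every step, the per-frame parsing latency quoted above is paid once per action, which is why the V2 latency improvement matters for agent responsiveness.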
https://huggingface.co/microsoft/OmniParser-v2.0
https://www.microsoft.com/en-us/research/articles/omniparser-v2-turning-any-llm-into-a-computer-use-agent/
https://github.com/microsoft/OmniParser/tree/master
demo: http://hf.co/spaces/microsoft/OmniParser-v2