Alibaba open-sources WebDancer to solve complex information retrieval problems in DeepResearch

Written by
Silas Grey
Updated on: June 18, 2025
Recommendation

Explore how Alibaba revolutionizes deep research information retrieval through WebDancer.
Core content:
1. The challenges of high-quality datasets and reliable trajectory construction faced by Deep Research
2. WebDancer end-to-end autonomous information retrieval agent construction paradigm
3. WebDancer's outstanding performance in benchmarks and its core advantages



Problems facing Deep Research

  • High-quality datasets:
    • Most existing QA datasets are shallow and cannot support multi-step reasoning; high-quality, fine-grained browsing data that reflects diverse user intents and rich interaction contexts must be constructed.
  • Reliable trajectory construction:
    • Building reliable trajectories that support long-horizon reasoning and task decomposition.
  • Scalable and generalizable training strategies:
    • Designing scalable, generalizable training strategies so that agents remain robust in out-of-distribution web environments, under complex interaction patterns, and over long-horizon goals.

WebDancer

From the data and training perspective, the authors propose an end-to-end paradigm for building an autonomous information retrieval agent through four key stages:

  • Browsing data construction
    • CRAWLQA crawls knowledge-centric websites, imitating human browsing behavior by recursively visiting sub-pages, and uses GPT-4o to generate QA pairs from the collected content.
    • This captures rich background knowledge and provides a basis for constructing complex questions.
    • E2HQA (easy-to-hard QA) starts from simple QA pairs and progressively increases question complexity through iterative searching and rewriting (a sketch of this loop follows the list).
    • This lets the agent transition gradually from simple to complex tasks and improves its reasoning ability.
  • Trajectory sampling
    • Based on the ReAct framework, the agent interacts through Thought-Action-Observation rounds (see the ReAct loop sketch after this list).
    • Through rejection sampling, short-chain-of-thought (Short-CoT) and long-chain-of-thought (Long-CoT) strategies are combined to generate high-quality trajectories.
    • A three-stage filtering framework is applied: validity control, correctness verification, and quality assessment, ensuring the high quality of the trajectories.
  • Supervised Fine-tuning (SFT) for Efficient Cold Start
    • Using the synthesized trajectory data, the agent is fine-tuned for multi-step reasoning tasks.
    • The loss contribution of externally generated observation tokens is masked out, so environment feedback does not interfere with learning, improving performance and robustness (see the loss-masking sketch after this list).
  • Optimizing the agent’s decision-making and generalization capabilities through reinforcement learning (RL)
    • The DAPO algorithm optimizes the agent's decision-making process through a dynamic sampling mechanism, improving its generalization in real-world web environments (a sketch of the dynamic-sampling filter follows the list).
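
The easy-to-hard idea behind E2HQA can be pictured as an iterative rewrite loop: retrieve a fact related to the current question and fold it into the question while keeping the answer fixed. The sketch below is only an illustration of that idea; the llm and search helpers are hypothetical, not the paper's actual tooling.

```python
# Illustrative easy-to-hard QA construction in the spirit of E2HQA.
# `llm` and `search` are hypothetical helpers, not the paper's actual tooling.

def complexify_question(seed_qa: dict, llm, search, n_steps: int = 3) -> dict:
    """Iteratively rewrite a simple QA pair into a harder, multi-hop one."""
    question, answer = seed_qa["question"], seed_qa["answer"]
    for _ in range(n_steps):
        # Retrieve a fact related to the current question ...
        evidence = search(question)
        # ... and ask the model to fold that fact into a harder question
        # whose answer stays the same.
        question = llm(
            "Rewrite the question so that answering it additionally requires "
            f"the following fact, keeping the answer '{answer}' unchanged.\n"
            f"Fact: {evidence}\nQuestion: {question}"
        )
    return {"question": question, "answer": answer}
```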
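
A ReAct-style rollout alternates the model's own Thought and Action with an Observation returned by a tool. The following minimal sketch assumes hypothetical search/visit-style tools and a simple text protocol; it is not WebDancer's exact interface.

```python
# Minimal ReAct-style rollout: the model alternates Thought and Action,
# and the environment returns an Observation after each tool call.
# Tool names, prompt format, and the stop condition are illustrative assumptions.

def parse_action(step: str) -> tuple[str, str]:
    """Illustrative parser: expects a line like 'Action: search[some query]'."""
    line = [l for l in step.splitlines() if l.startswith("Action:")][-1]
    name, _, arg = line.removeprefix("Action:").strip().partition("[")
    return name.strip(), arg.rstrip("]")

def react_rollout(question: str, llm, tools: dict, max_turns: int = 10) -> str:
    context = f"Question: {question}\n"
    for _ in range(max_turns):
        step = llm(context)                  # model emits a Thought and an Action
        context += step + "\n"
        if "Final Answer:" in step:          # agent decides it has enough evidence
            return step.split("Final Answer:")[-1].strip()
        tool_name, arg = parse_action(step)  # e.g. ("search", "some query")
        observation = tools[tool_name](arg)  # execute the tool call
        context += f"Observation: {observation}\n"
    return ""
```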
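
"Shielding the loss contribution of external feedback" during SFT amounts to masking observation tokens in the label sequence, so only the agent's own thoughts and actions are supervised. A minimal sketch, assuming the common convention that a label of -100 is ignored by the loss:

```python
# Sketch of observation masking for agent SFT. Assumes the common convention
# that label -100 is ignored by the cross-entropy loss (as in Hugging Face
# Transformers); how observation spans are located is simplified here.

IGNORE_INDEX = -100

def mask_observation_labels(input_ids: list[int],
                            observation_spans: list[tuple[int, int]]) -> list[int]:
    """Copy input_ids into labels, but ignore tokens that came from the
    environment (tool observations) so they contribute no loss."""
    labels = list(input_ids)
    for start, end in observation_spans:   # [start, end) token ranges of observations
        for i in range(start, end):
            labels[i] = IGNORE_INDEX
    return labels
```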
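
DAPO's dynamic sampling keeps only prompt groups whose rollouts receive differing rewards, since groups that are all-correct or all-wrong produce zero advantage and thus no gradient signal. The sketch below shows that filtering step only, under an assumed 0/1 outcome reward; it is not a full DAPO implementation.

```python
# Sketch of DAPO-style dynamic sampling: groups of rollouts whose rewards are
# all identical (all correct or all wrong) carry no advantage signal and are
# dropped; sampling continues upstream until the batch is filled with useful groups.
# The policy interface and the 0/1 reward are assumptions for illustration.

def dynamic_sample(prompts, policy, reward_fn, group_size: int = 8):
    kept = []
    for prompt in prompts:
        rollouts = [policy.generate(prompt) for _ in range(group_size)]
        rewards = [reward_fn(prompt, r) for r in rollouts]   # e.g. 0/1 correctness
        if len(set(rewards)) > 1:                            # non-degenerate group only
            kept.append((prompt, rollouts, rewards))
    return kept
```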

Experimental Results

In the experiments, WebDancer performs well on two benchmarks: GAIA and WebWalkerQA.

  • On the Level 1, Level 2, and Level 3 splits of GAIA, WebDancer achieved pass rates of 41.0%, 30.7%, and 0%, respectively, significantly outperforming other open-source frameworks and indicating a clear advantage in handling complex information retrieval tasks.

At its core, WebDancer shows that high-quality data and effective training methods can enable agents to perform well in dynamic, changing web environments.