Dissecting DeepSearcher, a locally deployable deep research framework

Explore the full picture of DeepSearcher, an innovative tool in the field of AI, and gain insight into its architecture, principles, and applications.
Core content:
1. Comparative analysis of DeepSearcher and similar tools
2. Analysis of DeepSearcher architecture and research process
3. Application and advantages of DeepSearcher in intelligent retrieval generation technology
Recently, OpenAI's Deep Research has attracted widespread attention in the AI field, and many similar tools have been launched, such as Perplexity Deep Research and Hugging Face's Open DeepResearch. Although these tools differ in architecture and method, they all aim to produce a detailed, well-organized report by iteratively researching web pages or internal documents. Moreover, the underlying agents in these tools decide for themselves what to do at each intermediate step.
Today I would like to introduce DeepSearcher, an open-source project from Zilliz. DeepSearcher incorporates concepts such as query routing and conditional execution flow, supports web crawling, and is packaged as both a Python library and a command-line tool. It is feature-rich: it can handle documents from multiple sources, and the embedding model and vector database are configurable. Although its design is not especially complex, it performs well as an example of agentic retrieval-augmented generation (RAG) and points toward where agent-based AI applications are heading.
Reasoning models rely on inference-time scaling to improve output quality, and their frequent calls to the large language model make inference bandwidth a bottleneck. We run the DeepSeek-R1 reasoning model on SambaNova's custom hardware, where its output speed is twice that of competing providers (see figure).
SambaNova Cloud provides inference services for open-source models such as Llama 3.x, Qwen2.5, and QwQ, running on its custom RDU chip, which is designed for efficient inference of generative AI models, reducing cost and increasing speed. For more details, see the official site (https://sambanova.ai/technology/sn40l-rdu-ai-chip).
1 DeepSearcher Architecture
The DeepSearcher architecture divides the research process into four steps: defining/refining the question, researching, analyzing, and synthesizing. The steps overlap and feed into one another rather than running strictly in sequence. Below we walk through each step and highlight the improvements DeepSearcher makes.
1.1 Defining and refining the problem: digging deep into the core of the query
Take the query “How has The Simpsons changed over time?” for example. DeepSearcher will break it down into multiple subqueries:
1. How has its cultural impact and social significance evolved from its first broadcast to now?
2. What changes have occurred in character development, humor, and storytelling between seasons?
3. How have animation styles and production techniques changed over time?
4. How have the audience, viewer response, and ratings changed over the course of the broadcast?
What makes DeepSearcher special is that the boundary between researching and refining is not fixed. After the initial decomposition, the research process further refines the questions as needed, flexibly adjusting the research direction so that information is mined accurately and later steps rest on a solid foundation.
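In code, this first decomposition can be driven by a prompt that asks the model for a list of sub-questions. The sketch below is hypothetical: the function name and exact wording are illustrative, and DeepSearcher's actual sub-query prompt differs in detail.

```python
def get_sub_query_prompt(question: str) -> str:
    """Build a decomposition prompt. Hypothetical wording; DeepSearcher's
    actual sub-query prompt differs in detail."""
    return f"""To answer this question more comprehensively, break the original question down into up to four sub-questions.

Original question: {question}

Respond only with a Python list of strings, for example: ["sub-question 1", "sub-question 2"]."""

prompt = get_sub_query_prompt("How has The Simpsons changed over time?")
```

Asking for a Python list (rather than free text) is what lets the later steps iterate over the sub-questions programmatically.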
1.2 Research and Analysis
After decomposing the query into subqueries, the agent’s research part begins. Roughly speaking, this part consists of four steps: routing, searching, reflection, and conditional repetition.
Routing
Our database contains multiple tables or collections from different sources. It would be more efficient to restrict semantic search to sources that are relevant to the current query. The query router prompts the large language model to decide from which collections to retrieve information.
Here is how the query-routing prompt is constructed:
```python
from typing import List

def get_vector_db_search_prompt(
    question: str,
    collection_names: List[str],
    collection_descriptions: List[str],
    context: List[str] = None,
):
    sections = []
    # Common prompt
    common_prompt = f"""You are an advanced AI question analyst. Use your reasoning skills and historical conversation information to accurately answer the following questions based on all existing datasets, and generate a suitable question for each dataset based on the dataset description that may be related to the question.
Question: {question}
"""
    sections.append(common_prompt)
    # Dataset prompt
    data_set = []
    for i, collection_name in enumerate(collection_names):
        data_set.append(f"{collection_name}: {collection_descriptions[i]}")
    data_set_prompt = """Below is all the dataset information. The format of dataset information is dataset name: dataset description.
Dataset and description:
"""
    sections.append(data_set_prompt + "\n".join(data_set))
    # Context prompt
    if context:
        context_prompt = """Below is a condensed version of the historical conversation. In this analysis, you need to combine this information to generate questions that are closer to the answer. You cannot generate the same or similar questions for the same dataset, nor can you regenerate questions for datasets that have been determined to be unrelated.
Historical dialogue:
"""
        sections.append(context_prompt + "\n".join(context))
    # Response prompt
    response_prompt = """Based on the above, you can only select a few datasets from the following dataset list and generate suitable related questions for the selected datasets to solve the above problems. The output format is JSON, where the key is the dataset name and the value is the corresponding generated question.
Dataset:
"""
    sections.append(response_prompt + "\n".join(collection_names))
    footer = """Only respond with valid JSON format that conforms to the exact JSON schema.
Key requirements:
- Contains only one type of operation
- Do not add unsupported keys
- Exclude all non-JSON text, Markdown formatted content or explanations
- Strictly follow JSON syntax"""
    sections.append(footer)
    return "\n\n".join(sections)
```
Having a large language model return structured output in JSON format makes it easy to convert its output into a decision basis for the next action.
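For example, the reply can be turned into a retrieval plan with a small parsing step. The helper below is a hypothetical sketch, not DeepSearcher's own code; it also strips the Markdown code fences that models sometimes emit despite instructions to the contrary.

```python
import json

def parse_routing_decision(llm_output: str) -> dict:
    """Turn the router's JSON reply into {collection_name: generated_question}.
    Hypothetical helper, not DeepSearcher's own code."""
    text = llm_output.strip()
    if text.startswith("```"):
        # drop an opening ``` fence (with optional language tag) and its close
        text = text.split("\n", 1)[1]
        text = text.rsplit("```", 1)[0]
    decision = json.loads(text)
    # keep only well-formed string-to-string entries
    return {k: v for k, v in decision.items() if isinstance(v, str)}

plan = parse_routing_decision('{"simpsons_wiki": "How has the humor of The Simpsons changed?"}')
```

Each key in the resulting dictionary names a collection to search, and each value is the question rewritten for that collection.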
Search
After the relevant collections have been selected in the previous step, the search step uses Milvus to perform a similarity search. The source data must be specified, chunked, embedded, and stored in the vector database ahead of time. For DeepSearcher, both local and online data sources must be specified manually; live web search is left as future work.
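DeepSearcher delegates this step to Milvus, but the core operation is a top-k nearest-neighbor search over embedded chunks. The dependency-free sketch below illustrates the idea with cosine similarity over toy vectors; it stands in for Milvus and a real embedding model rather than reproducing them.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, chunks, k=2):
    """chunks: list of (text, embedding) pairs; returns the k most similar texts."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Toy 3-d "embeddings"; a real system would use a configured embedding model.
corpus = [
    ("The Simpsons premiered in 1989.", [0.9, 0.1, 0.0]),
    ("Animation moved from cel to digital.", [0.1, 0.9, 0.1]),
    ("Ratings peaked in the 1990s.", [0.8, 0.2, 0.1]),
]
hits = top_k([1.0, 0.0, 0.0], corpus, k=2)
```

A production vector database replaces the linear scan with an approximate index, but the contract is the same: a query embedding in, the k most similar chunks out.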
Reflection
DeepSearcher demonstrates a true form of agentic reflection: it feeds previous output back as context into a prompt that "reflects" on the questions asked so far and the text chunks retrieved, looking for information gaps. This can be seen as the analysis step.
Here is how the reflection prompt is created:
```python
def get_reflect_prompt(
    question: str,
    mini_questions: List[str],
    mini_chuncks: List[str],
):
    mini_chunk_str = ""
    for i, chunk in enumerate(mini_chuncks):
        mini_chunk_str += f"""<chunk_{i}>\n{chunk}\n</chunk_{i}>\n"""
    reflect_prompt = f"""Determine if additional search queries are needed based on the original query, previous subqueries, and all retrieved document chunks. If further research is needed, provide a Python list of up to 3 search queries. If no further research is needed, return an empty list.
If the original query was to write a report, then you'd be better off generating some further queries rather than returning an empty list.
Original query: {question}
Previous subqueries: {mini_questions}
Related text chunks:
{mini_chunk_str}
"""
    footer = """Respond only with a valid string list format, do not include any other text."""
    return reflect_prompt + footer
```
We again have our large language model return structured output, this time as data interpretable by Python.
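Turning that reply back into a Python list is safest with `ast.literal_eval` rather than `eval`. The helper below is an illustrative sketch (the function name is mine, not from the repo):

```python
import ast

def parse_reflection(llm_output: str) -> list:
    """Parse the reflection reply, which should be a Python list of strings;
    an empty list means no further research is needed. Hypothetical helper."""
    start, end = llm_output.find("["), llm_output.rfind("]")
    if start == -1 or end == -1:
        return []  # no list found: treat as "stop researching"
    try:
        queries = ast.literal_eval(llm_output[start : end + 1])
    except (ValueError, SyntaxError):
        return []
    return [q for q in queries if isinstance(q, str)]
```

Unlike `eval`, `ast.literal_eval` only accepts Python literals, so a malicious or malformed model reply cannot execute arbitrary code.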
Here are examples of new subqueries "discovered" through reflection after the initial subqueries above were answered:
1. How have the changes in voice actors and production team of The Simpsons across different seasons affected the development of the show?
2. What role have satire and social commentary in The Simpsons played in its adaptation to contemporary issues over the decades?
3. How has The Simpsons responded to and incorporated changes in media consumption, such as streaming services, into its distribution and content strategy?
Conditional repetition
DeepSearcher demonstrates conditional execution flow. After reflecting on whether the questions and answers so far are complete, the agent repeats the above steps if there are more questions to ask. Importantly, the execution flow (a while loop) depends on the output of the large language model and is not hard-coded. In this case, there are only two choices: repeat the research or generate a report. In a more complex agent, there may be more choices, such as: follow a hyperlink, retrieve a block of text, store in memory, reflect, etc. In this way, the agent continues to refine the questions as needed until it decides to exit the loop and generate a report. In our example about The Simpsons, DeepSearcher performed two more rounds of filling in information gaps with additional subqueries.
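The loop can be sketched schematically as follows. Here `decompose`, `route`, `search`, and `reflect` are toy stand-ins for the real LLM-backed components; only the control flow mirrors DeepSearcher, the rest is illustrative.

```python
def research(question, decompose, route, search, reflect, max_rounds=5):
    """Schematic research loop: the model's reflection output, not a
    hard-coded condition, decides whether another round runs."""
    sub_queries = decompose(question)          # initial decomposition
    all_queries, all_chunks = [], []
    for _ in range(max_rounds):                # safety cap on rounds
        for sq in sub_queries:
            # query routing: pick collections and a rewritten question for each
            for collection, rewritten in route(sq).items():
                all_chunks.extend(search(collection, rewritten))
        all_queries.extend(sub_queries)
        # reflection: the model proposes follow-up queries, or [] to stop
        sub_queries = reflect(question, all_queries, all_chunks)
        if not sub_queries:
            break
    return all_queries, all_chunks

# Toy stand-ins that show the control flow; a real agent would call an LLM.
state = {"rounds": 0}
def decompose(q): return [f"{q} (sub-question)"]
def route(sq): return {"wiki": sq}
def search(coll, q): return [f"chunk from {coll} for: {q}"]
def reflect(q, qs, chunks):
    state["rounds"] += 1
    # pretend the model finds one gap, then declares the research complete
    return [] if state["rounds"] >= 2 else ["follow-up question"]

queries, chunks = research("How has The Simpsons changed?", decompose, route, search, reflect)
```

The `max_rounds` cap is a pragmatic guard I have added here: since the exit condition is model-driven, a bound keeps a confused model from looping indefinitely.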
1.3 Synthesis
Finally, the fully decomposed questions and the retrieved text chunks are combined into a report via a prompt. Here is the code that creates that prompt:
```python
def get_final_answer_prompt(
    question: str,
    mini_questions: List[str],
    mini_chuncks: List[str],
):
    mini_chunk_str = ""
    for i, chunk in enumerate(mini_chuncks):
        mini_chunk_str += f"""<chunk_{i}>\n{chunk}\n</chunk_{i}>\n"""
    summary_prompt = f"""You are an AI content analysis expert who excels at summarizing content. Please summarize a specific and detailed answer or report based on the previous query and retrieved document blocks.
Original query: {question}
Previous subqueries: {mini_questions}
Related text chunks:
{mini_chunk_str}
"""
    return summary_prompt
```
1.4 Results
Here is an example report generated using DeepSeek-R1 for the query “How has The Simpsons changed over time?” using The Simpsons Wikipedia page as source material:
Report: The Evolution of The Simpsons (1989-Present)
Cultural Impact and Social Significance: The Simpsons premiered as a subversive critique of middle-class American life, rising to fame in the 1990s for its bold satire. Initially it was a countercultural phenomenon that challenged traditional norms, with episodes exploring themes such as religion, politics, and consumerism. Over time, its cultural influence waned as competitors such as South Park and Family Guy emerged. By the 2010s, the show had transformed from trendsetter to nostalgic classic, balancing its traditional charm while attempting to engage modern issues such as climate change and LGBTQ+ rights, though with less social impact than it once had.

Conclusion: The Simpsons evolved from radical satire into a television classic, weathering changes in technology, politics, and audience expectations. While its golden age is hard to top, its adaptability through streaming, newer humor, and global distribution has secured its place as a cultural icon. The show's longevity reflects both nostalgia and a pragmatic acceptance of change, despite the challenges of staying relevant in a fragmented media landscape.