Local Deep Research: A Powerful AI Research Assistant

Written by
Audrey Miles
Updated on: July 8, 2025

An AI-driven research assistant with deep iterative analysis, privacy protection, support for multiple LLMs and search integration.

Core content:
1. Project introduction and advanced research capabilities
2. Flexible LLM support and rich output options
3. Privacy priority and enhanced search integration
4. Local document search and example research: fusion energy development
5. Installation guide and dependency installation


Project Introduction

A powerful AI-driven research assistant that uses multiple LLMs and web searches for deep, iterative analysis. The system can run locally to protect privacy or be configured to use cloud-based LLMs for enhanced functionality.


Features

Advanced research capabilities

    • Automatic in-depth research and intelligent follow-up questions
    • Citation tracking and source verification
    • Multiple iterations of analysis for comprehensive coverage
    • Full-text web page content analysis (not just snippets)

Flexible LLM support

    • Local AI processing, using the Ollama model
    • Cloud LLM support (Claude, GPT)
    • Supports all Langchain models
    • Configurable model selection based on your needs

Rich output options

    • Detailed research findings with citations
    • Comprehensive research reports
    • Quick summaries for fast insights
    • Source tracking and verification

Privacy first

    • Runs entirely on your machine, using local models
    • Configurable search settings
    • Transparent data processing

Enhanced search integration

    • Automatic search-source selection: the smart "auto" engine intelligently analyzes your query and selects the most appropriate search engine for its content (see the sketch after this list)
    • Wikipedia factual knowledge integration
    • arXiv scientific papers and academic research collection
    • PubMed biomedical literature and medical research integration
    • DuckDuckGo web search integration (may encounter rate limits)
    • SerpAPI integration for Google search results (API key required)
    • Google programmable search engine integration for custom search experience (API key required)
    • Guardian news articles and journalism integration (API key required)
    • Local RAG search over private documents: search your own files using vector embeddings
    • Full-text web content retrieval
    • Source filtering and verification
    • Configurable search parameters
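
The article does not spell out how the "auto" selector makes its choice; as a rough illustration, a minimal keyword-based router might look like the sketch below. The function name and routing rules are hypothetical, and the real selector may use an LLM or richer heuristics.

```python
# Hypothetical sketch of query-based engine routing; not the project's code.
def pick_search_engine(query: str) -> str:
    q = query.lower()
    if q.startswith("collection:"):
        return "local_all"   # explicit local-document queries
    if any(term in q for term in ("preprint", "arxiv", "theorem")):
        return "arxiv"       # scientific/academic queries
    if any(term in q for term in ("clinical", "disease", "drug")):
        return "pubmed"      # biomedical queries
    if any(term in q for term in ("who is", "history of", "overview")):
        return "wiki"        # encyclopedic/factual queries
    return "duckduckgo"      # general web fallback

print(pick_search_engine("arxiv preprints on plasma confinement"))  # arxiv
```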

Local Document Search (RAG)

    • Local document search based on vector embeddings (see the sketch after this list)
    • Create custom document collections on different topics
    • Privacy protection - your documents stay on your device
    • Smart chunking and retrieval
    • Compatible with various document formats (PDF, text, Markdown, etc.)
    • Automatically integrate with metasearch engines for unified querying
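
To make the embedding-based retrieval idea concrete, here is a minimal sketch assuming the sentence-transformers package and the all-MiniLM-L6-v2 model that the collection configuration later in this article references. The file path is illustrative, and the project's actual chunking and retrieval pipeline will differ.

```python
# Minimal retrieval sketch; the path and chunk parameters are illustrative.
from sentence_transformers import SentenceTransformer, util

def chunk(text: str, size: int = 800, overlap: int = 150) -> list[str]:
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

model = SentenceTransformer("all-MiniLM-L6-v2")
text = open("local_search_files/research_papers/paper.txt").read()  # hypothetical file
chunks = chunk(text)
corpus = model.encode(chunks, convert_to_tensor=True)

query_vec = model.encode("plasma confinement results", convert_to_tensor=True)
best = util.cos_sim(query_vec, corpus).argmax().item()
print(chunks[best])  # most relevant chunk for the query
```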



Example Study: Fusion Energy Development

The repository contains complete research examples that showcase the tool's capabilities. For example, our Fusion Energy research analysis provides the following comprehensive overview:

  • Latest scientific breakthroughs in nuclear fusion research (2022-2025)
  • Private sector funding exceeding $6 billion
  • Expert predictions on the timeline for commercial fusion energy
  • Regulatory frameworks being developed for fusion deployment
  • Technical challenges that must be overcome to achieve commercial viability


This example demonstrates the system's ability to perform multiple research iterations, trace evidence trails across scientific and commercial domains, and synthesize information from diverse sources while maintaining appropriate citations.


Installation


  1. Clone the repository:

```bash
git clone https://github.com/yourusername/local-deep-research.git
cd local-deep-research
```
  2. Install dependencies:

```bash
pip install -r requirements.txt
```
  3. Install Ollama (for local models):

```bash
# Install Ollama from https://ollama.ai
ollama pull mistral  # Default model - many models work well; pick the best fit for your hardware (one that fits in your GPU)
```
  4. Configure environment variables:

```bash
# Copy the template
cp .env.template .env

# Edit .env with your API keys (if using cloud LLMs)
ANTHROPIC_API_KEY=your-api-key-here           # For Claude
OPENAI_API_KEY=your-openai-key-here           # For GPT models
GUARDIAN_API_KEY=your-guardian-api-key-here   # For The Guardian search
```
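
How the application consumes these keys is not shown here; a common pattern, assuming the python-dotenv package, would be:

```python
# Sketch of reading the .env keys above, assuming python-dotenv;
# the project may load its configuration differently.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current directory
anthropic_key = os.getenv("ANTHROPIC_API_KEY")  # None if unset
if anthropic_key is None:
    print("No Claude key found; falling back to local Ollama models")
```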

Usage


Terminal usage (not recommended):

```bash
python main.py
```

Web interface
The project includes a web interface for a more user-friendly experience:

```bash
python app.py
```

This will start a local web server that you can access in your browser at http://127.0.0.1:5000.

Web interface features:

  • Dashboard: An intuitive interface for launching and managing research queries
  • Live Updates: Track research progress with real-time updates
  • Research History: Access and manage past inquiries
  • PDF Export: Download the completed research report as a PDF document
  • Research management: terminate ongoing research processes or delete past records




Configuration



Please report your best settings in issues so we can improve the default settings.

Key settings in config.py:

```python
# LLM Configuration
DEFAULT_MODEL = "mistral"  # Change based on your needs
DEFAULT_TEMPERATURE = 0.7
MAX_TOKENS = 8000

# Search Configuration
MAX_SEARCH_RESULTS = 40
SEARCH_REGION = "us-en"
TIME_PERIOD = "y"
SAFE_SEARCH = True
SEARCH_SNIPPETS_ONLY = False

# Choose search tool: "wiki", "arxiv", "duckduckgo", "guardian", "serp", "local_all", or "auto"
search_tool = "auto"  # "auto" will intelligently select the best search engine for your query
```
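
Because these are plain module-level constants, other code can import them directly. A hypothetical consumer (the function is illustrative, not the project's API) might look like:

```python
# Hypothetical consumer of config.py values.
import config

def build_llm_kwargs() -> dict:
    return {
        "model": config.DEFAULT_MODEL,             # "mistral" by default
        "temperature": config.DEFAULT_TEMPERATURE,
        "max_tokens": config.MAX_TOKENS,
    }

print(build_llm_kwargs())
```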


Local Document Search (RAG)
The system includes powerful local document search capabilities using Retrieval-Augmented Generation (RAG), allowing you to search and retrieve content from your own document collections.



Set up local collections


Create a file called local_collections.py in the project root directory:

```python
# local_collections.py
import os
from typing import Dict, Any

# Registry of local document collections
LOCAL_COLLECTIONS = {
    # Research Papers Collection
    "research_papers": {
        "name": "Research Papers",
        "description": "Academic research papers and articles",
        "paths": [os.path.abspath("local_search_files/research_papers")],  # Use absolute paths
        "enabled": True,
        "embedding_model": "all-MiniLM-L6-v2",
        "embedding_device": "cpu",
        "embedding_model_type": "sentence_transformers",
        "max_results": 20,
        "max_filtered_results": 5,
        "chunk_size": 800,  # Smaller chunks for academic content
        "chunk_overlap": 150,
        "cache_dir": ".cache/local_search/research_papers"
    },

    # Personal Notes Collection
    "personal_notes": {
        "name": "Personal Notes",
        "description": "Personal notes and documents",
        "paths": [os.path.abspath("local_search_files/personal_notes")],  # Use absolute paths
        "enabled": True,
        "embedding_model": "all-MiniLM-L6-v2",
        "embedding_device": "cpu",
        "embedding_model_type": "sentence_transformers",
        "max_results": 30,
        "max_filtered_results": 10,
        "chunk_size": 500,  # Smaller chunks for notes
        "chunk_overlap": 100,
        "cache_dir": ".cache/local_search/personal_notes"
    }
}
```
Create the directories for your collections:

```bash
mkdir -p local_search_files/research_papers
mkdir -p local_search_files/personal_notes
```

Add your documents to these folders and they will be automatically indexed and made searchable.
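
The article does not show how the registry is consumed; as a sketch, an indexer could walk every enabled collection like this (the indexing step is hypothetical and reduced to a print):

```python
# Hypothetical walk over the registry defined in local_collections.py.
import os
from local_collections import LOCAL_COLLECTIONS

for key, cfg in LOCAL_COLLECTIONS.items():
    if not cfg["enabled"]:
        continue  # skip disabled collections
    for root in cfg["paths"]:
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                path = os.path.join(dirpath, name)
                print(f"[{key}] would index {path} in chunks of {cfg['chunk_size']}")
```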


Using Local Search


There are several ways you can use local search:

  1. Automatic selection: set search_tool = "auto" in config.py; the system will automatically use your local collections when a query is appropriate.

  2. Explicit selection: set search_tool = "research_papers" to search only that specific collection.

  3. Search all local collections: set search_tool = "local_all" to search across all of your local document collections.

  4. Query syntax: prefix your query with collection:collection_name to target a specific collection (see the sketch after this list).
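
As a minimal sketch, the collection: prefix could be parsed like this (the function is hypothetical, not the project's actual parser):

```python
# Hypothetical parsing of the "collection:" query prefix.
def split_collection_prefix(query: str) -> tuple[str | None, str]:
    if query.startswith("collection:"):
        name, _, rest = query[len("collection:"):].partition(" ")
        return name, rest
    return None, query

print(split_collection_prefix("collection:research_papers tokamak results"))
# ('research_papers', 'tokamak results')
```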


Search Engine Options



The system supports multiple search engines, selected via the search_tool variable in config.py:

  • Automatic (auto): smart search engine selector that analyzes your query and picks the most appropriate source (Wikipedia, arXiv, local collections, etc.)
  • Wikipedia (wiki): best for general knowledge, facts, and overview information
  • arXiv (arxiv): great for scientific and academic research, with access to preprints and papers
  • PubMed (pubmed): ideal for biomedical literature, medical research, and health information
  • DuckDuckGo (duckduckgo): general web search that requires no API key
  • The Guardian (guardian): high-quality news and journalism (API key required)
  • SerpAPI (serp): Google search results (API key required)
  • Google Programmable Search Engine (google_pse): customized search experience with control over search scope and domains (requires an API key and search engine ID)
  • Local collections: any collections defined in your local_collections.py file