Local Deep Research: A Powerful AI Research Assistant

An AI-driven research assistant with deep iterative analysis, privacy protection, support for multiple LLMs and search integration.
Core content:
1. Project introduction and advanced research capabilities
2. Flexible LLM support and rich output options
3. Privacy priority and enhanced search integration
4. Local document search and example research: fusion energy development
5. Installation guide and dependency installation
Project Introduction
A powerful AI-driven research assistant that uses multiple LLMs and web searches for deep, iterative analysis. The system can run locally to protect privacy or be configured to use cloud-based LLMs for enhanced functionality.
Features
Advanced research capabilities
- Automatic in-depth research with intelligent follow-up questions
- Citation tracking and source verification
- Multiple iterations of analysis for comprehensive coverage
- Full-text web page content analysis (not just snippets)

Flexible LLM support
- Local AI processing using Ollama models
- Cloud LLM support (Claude, GPT)
- Supports all LangChain models
- Configurable model selection based on your needs

Rich output options
- Detailed research findings with citations
- Comprehensive research reports
- Quick summaries for fast insights
- Source tracking and verification

Privacy first
- Runs entirely on your machine using local models
- Configurable search settings
- Transparent data handling

Enhanced search integration
- Automatic search source selection: the smart "auto" search engine analyzes your query and selects the most appropriate search engine based on its content
- Wikipedia integration for factual knowledge
- arXiv integration for scientific papers and academic research
- PubMed integration for biomedical literature and medical research
- DuckDuckGo web search integration (may encounter rate limits)
- SerpAPI integration for Google search results (API key required)
- Google Programmable Search Engine integration for a custom search experience (API key required)
- The Guardian integration for news articles and journalism (API key required)
- Local RAG search over private documents using vector embeddings
- Full-text web content retrieval
- Source filtering and verification
- Configurable search parameters

Local document search (RAG)
- Local document search based on vector embeddings
- Create custom document collections for different topics
- Privacy-preserving: your documents stay on your device
- Smart chunking and retrieval
- Compatible with various document formats (PDF, text, Markdown, etc.)
- Automatic integration with the meta search engine for unified querying
Example Research: Fusion Energy Development
The repository contains complete research examples that showcase the tool's capabilities. For example, our Fusion Energy research analysis provides the following comprehensive overview:
- Latest scientific breakthroughs in nuclear fusion research (2022-2025)
- Private-sector funding exceeding $6 billion
- Expert predictions for the commercial fusion energy timeline
- Regulatory frameworks being developed for fusion deployment
- Technical challenges that must be overcome to achieve commercial viability
This example demonstrates the system’s ability to perform multiple research iterations, tracing evidence trails across scientific and commercial domains and synthesizing information from different sources while maintaining appropriate citations.
Installation
Clone the repository:
```bash
git clone https://github.com/yourusername/local-deep-research.git
cd local-deep-research
```
Install dependencies:
```bash
pip install -r requirements.txt
```
Install Ollama (for local models):
```bash
# Install Ollama from https://ollama.ai
ollama pull mistral  # Default model - many models work well; choose the best one for your hardware (ideally one that fits in your GPU)
```
Configure environment variables:
```bash
# Copy the template
cp .env.template .env

# Edit .env with your API keys (if using cloud LLMs)
ANTHROPIC_API_KEY=your-api-key-here          # For Claude
OPENAI_API_KEY=your-openai-key-here          # For GPT models
GUARDIAN_API_KEY=your-guardian-api-key-here  # For The Guardian search
```
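Since cloud LLMs are optional, the application only needs a given provider's key when that provider is actually used. A minimal sketch of how key presence might be checked (the helper name `get_api_key` is a hypothetical illustration, not the project's actual code):

```python
import os
from typing import Optional


def get_api_key(name: str) -> Optional[str]:
    """Hypothetical helper: return the API key from the environment,
    or None if it is unset or empty (i.e. the provider is disabled)."""
    key = os.environ.get(name)
    return key if key else None


# Local-only runs need no keys; cloud providers are enabled per key
anthropic_key = get_api_key("ANTHROPIC_API_KEY")  # None unless configured
```

With this pattern, a fully local setup (Ollama only) works out of the box with an empty `.env`.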
Usage
Terminal usage (not recommended):
```bash
python main.py
```
Web interface
The project includes a web interface for a more user-friendly experience:
```bash
python app.py
```
This starts a local web server that you can access in your browser at http://127.0.0.1:5000.
Web interface features:
- Dashboard: an intuitive interface for launching and managing research queries
- Live updates: track research progress in real time
- Research history: access and manage past queries
- PDF export: download completed research reports as PDF documents
- Research management: terminate in-progress research or delete past records
Configuration
Please report your best settings in issues so we can improve the default settings.
Key settings in config.py:
```python
# LLM Configuration
DEFAULT_MODEL = "mistral"  # Change based on your needs
DEFAULT_TEMPERATURE = 0.7
MAX_TOKENS = 8000

# Search Configuration
MAX_SEARCH_RESULTS = 40
SEARCH_REGION = "us-en"
TIME_PERIOD = "y"
SAFE_SEARCH = True
SEARCH_SNIPPETS_ONLY = False

# Choose search tool: "wiki", "arxiv", "duckduckgo", "guardian", "serp", "local_all", or "auto"
search_tool = "auto"  # "auto" will intelligently select the best search engine for your query
```
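To give a feel for what `search_tool = "auto"` does, here is a minimal, hypothetical sketch of keyword-based engine routing. The function name and keyword lists are illustrative assumptions; the project's real selector analyzes queries more thoroughly:

```python
def pick_engine(query: str) -> str:
    """Hypothetical sketch of "auto" routing: pick a search engine
    from simple keyword heuristics over the query text."""
    q = query.lower()
    if any(w in q for w in ("paper", "preprint", "arxiv")):
        return "arxiv"       # academic/scientific queries
    if any(w in q for w in ("gene", "clinical", "disease")):
        return "pubmed"      # biomedical queries
    if any(w in q for w in ("news", "today")):
        return "duckduckgo"  # current-events queries
    return "wiki"            # default: general knowledge
```

For example, a query mentioning a preprint would route to arXiv, while a plain factual question would fall through to Wikipedia.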
Local Document Search (RAG)
The system includes powerful local document search capabilities using Retrieval-Augmented Generation (RAG). This allows you to search and retrieve content from your own document collections.
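For intuition, indexing for RAG splits each document into overlapping chunks (compare the `chunk_size` and `chunk_overlap` collection settings below). A minimal sketch of fixed-size chunking with overlap; the project's actual chunker may additionally respect sentence or paragraph boundaries:

```python
def chunk_text(text: str, chunk_size: int = 800, overlap: int = 150) -> list:
    """Split text into fixed-size chunks; consecutive chunks share
    `overlap` characters so no passage is cut off at a boundary."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # last chunk reached the end of the text
        start += chunk_size - overlap  # step forward, keeping overlap
    return chunks
```

Each chunk is then embedded with the configured model (e.g. all-MiniLM-L6-v2) and stored for vector search.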
Set up local collection
Create a file called local_collections.py in the project root directory:
```python
# local_collections.py
import os
from typing import Any, Dict

# Registry of local document collections
LOCAL_COLLECTIONS = {
    # Research Papers Collection
    "research_papers": {
        "name": "Research Papers",
        "description": "Academic research papers and articles",
        "paths": [os.path.abspath("local_search_files/research_papers")],  # Use absolute paths
        "enabled": True,
        "embedding_model": "all-MiniLM-L6-v2",
        "embedding_device": "cpu",
        "embedding_model_type": "sentence_transformers",
        "max_results": 20,
        "max_filtered_results": 5,
        "chunk_size": 800,  # Smaller chunks for academic content
        "chunk_overlap": 150,
        "cache_dir": ".cache/local_search/research_papers",
    },
    # Personal Notes Collection
    "personal_notes": {
        "name": "Personal Notes",
        "description": "Personal notes and documents",
        "paths": [os.path.abspath("local_search_files/personal_notes")],  # Use absolute paths
        "enabled": True,
        "embedding_model": "all-MiniLM-L6-v2",
        "embedding_device": "cpu",
        "embedding_model_type": "sentence_transformers",
        "max_results": 30,
        "max_filtered_results": 10,
        "chunk_size": 500,  # Smaller chunks for notes
        "chunk_overlap": 100,
        "cache_dir": ".cache/local_search/personal_notes",
    },
}
```
Create the directories for your collections:
```bash
mkdir -p local_search_files/research_papers
mkdir -p local_search_files/personal_notes
```
Add your documents to these folders and they will be automatically indexed and made searchable.
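Instead of creating directories by hand, you could derive them from the registry itself. A hedged sketch (the helper `ensure_collection_dirs` is hypothetical, not part of the project):

```python
import os


def ensure_collection_dirs(collections: dict) -> list:
    """Hypothetical helper: create any missing directories for enabled
    collections and return the list of paths that were created."""
    created = []
    for cfg in collections.values():
        if not cfg.get("enabled", False):
            continue  # skip disabled collections
        for path in cfg["paths"]:
            if not os.path.isdir(path):
                os.makedirs(path, exist_ok=True)
                created.append(path)
    return created
```

Running this once after editing local_collections.py (e.g. `ensure_collection_dirs(LOCAL_COLLECTIONS)`) keeps the folder layout in sync with the registry.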
Using Local Search
There are several ways to use local search:

1. Automatic selection: with search_tool = "auto" in config.py, the system will automatically use your local collections when the query is appropriate.
2. Explicit collection: set search_tool = "research_papers" (or any collection name) to search only that collection.
3. All local collections: set search_tool = "local_all" to search all of your local document collections.
4. Query syntax: use collection:collection_name your query to target a specific collection from within the query itself.
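The collection: query syntax described above could be parsed roughly as follows; this is an illustrative sketch, and `parse_collection_query` is a hypothetical name, not the project's actual function:

```python
def parse_collection_query(query: str):
    """Split a query of the form 'collection:name rest of query' into
    (collection_name, query); return (None, query) if no prefix is present."""
    if query.startswith("collection:"):
        prefix, _, rest = query.partition(" ")
        return prefix.split(":", 1)[1], rest
    return None, query
```

So `collection:research_papers fusion timeline` targets the research_papers collection with the query "fusion timeline", while an unprefixed query is routed normally.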
Search Engine Options
The system supports multiple search engines, selected via the search_tool variable in config.py:

- Auto (auto): smart search engine selector that analyzes your query and picks the most appropriate source (Wikipedia, arXiv, local collections, etc.)
- Wikipedia (wiki): best for general knowledge, facts, and overview information
- arXiv (arxiv): great for scientific and academic research, with access to preprints and papers
- PubMed (pubmed): ideal for biomedical literature, medical research, and health information
- DuckDuckGo (duckduckgo): general-purpose web search, no API key required
- The Guardian (guardian): high-quality news and articles (API key required)
- SerpAPI (serp): Google search results (API key required)
- Google Programmable Search Engine (google_pse): customized search experience with control over search scope and domains (requires an API key and a search engine ID)
- Local collections: any collection defined in your local_collections.py file