The era of self-hosted DeepSeek has arrived: how do you implement efficient online search?

Self-hosting DeepSeek makes low-cost, enterprise-grade intelligent question answering a reality.
Core content:
1. DeepSeek's technical innovations, which sharply reduce the cost for enterprises to build their own intelligent question-answering systems
2. Higress, a cloud-native API gateway and multi-purpose Swiss Army knife that enhances LLMs with zero code
3. The technical implementation and scenario value of online search, including multi-engine intelligent dispatch and an analysis of the core ideas
With the emergence of high-quality open-source models such as DeepSeek, the cost for enterprises to build an intelligent question-answering system has dropped by more than 90%. Models in the 7B/13B parameter range can deliver commercial-grade responses on ordinary GPU servers. Combined with the enhancement capabilities of the open-source Higress AI gateway, developers can quickly build an intelligent question-answering system with real-time online search.
Higress: The Swiss Army Knife of Zero-Code Enhanced LLM
Cloud Native
As a cloud-native API gateway, Higress provides out-of-the-box AI enhancement capabilities through Wasm plugins:
Online search: real-time access to the latest information on the Internet
Intelligent routing: multi-model load balancing and automatic failover
Security protection: sensitive-word filtering and injection-attack defense
Performance optimization: request caching + token quota management
Observability: full-link monitoring and audit logs
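The intelligent-routing capability above can be sketched in a few lines: a primary model is tried first, and a backup takes over on failure. This is an illustrative sketch of the idea only; the function and model names are hypothetical, not the Higress plugin's actual implementation.

```python
from typing import Callable, List

def route_with_fallback(backends: List[Callable[[str], str]], prompt: str) -> str:
    """Try each backend in order; return the first successful answer."""
    last_error = None
    for call in backends:
        try:
            return call(prompt)
        except Exception as err:  # a real gateway would also weigh latency and quota
            last_error = err
    raise RuntimeError("all backends failed") from last_error

# Toy backends standing in for two model deployments
def primary(prompt: str) -> str:
    raise TimeoutError("primary model overloaded")

def backup(prompt: str) -> str:
    return f"answer from backup: {prompt}"

print(route_with_fallback([primary, backup], "hello"))
```

In the gateway this decision happens transparently per request, so the client keeps sending a single, unchanged OpenAI-style call.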
Technical implementation and scenario value of online search
1. Multi-engine support
Public search (Google/Bing/Quark) for real-time information
Academic search (Arxiv) for scientific research scenarios
Private search (Elasticsearch) for corporate/personal knowledge bases
2. Core ideas of search enhancement
LLM query rewriting: an LLM identifies the user's intent and generates search commands, which greatly improves the quality of search enhancement.
Keyword extraction: different engines need different prompts; for example, most Arxiv papers are in English, so keywords must be extracted in English.
Field identification: Arxiv, for instance, divides disciplines into fields such as computer science, physics, mathematics, and biology; searching within a specific field improves accuracy.
Long-query splitting: a long query can be split into multiple short queries to improve search efficiency.
High-quality data: Google/Bing/Arxiv searches return only article summaries, but connecting to Quark search (based on Alibaba Cloud Information Retrieval) provides full text, which improves the quality of LLM-generated content.
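The rewrite ideas above can be sketched as a small pipeline: per-engine prompt templates for keyword extraction (English for Arxiv), plus splitting of long queries into shorter ones. The template texts and helper names here are illustrative assumptions, not the ai-search plugin's real internals.

```python
# Hypothetical per-engine rewrite templates; an LLM fills in the actual search command
REWRITE_PROMPTS = {
    # Arxiv is mostly English, so its template asks for English keywords plus a field
    "arxiv": "Extract English keywords and an Arxiv field (cs/physics/math/bio) for: {query}",
    "google": "Extract concise search keywords for: {query}",
}

def build_rewrite_prompt(engine: str, query: str) -> str:
    """Pick the engine-specific prompt used to ask the LLM for a search command."""
    return REWRITE_PROMPTS[engine].format(query=query)

def split_long_query(query: str, max_words: int = 8) -> list:
    """Split a long query into shorter chunks that search engines handle better."""
    words = query.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

print(build_rewrite_prompt("arxiv", "transformer inference optimization"))
print(split_long_query("compare gold price trends across the last five years in Asia and Europe", 6))
```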
Example scenarios: financial information Q&A and medical Q&A.
From open source to implementation: three steps to build an intelligent question-answering system
```shell
# Install and start the Higress gateway with one command
curl -sS https://higress.cn/ai-gateway/install.sh | bash

# Deploy DeepSeek-R1-Distill-Qwen-7B with vLLM
python3 -m vllm.entrypoints.openai.api_server \
    --model=deepseek-ai/DeepSeek-R1-Distill-Qwen-7B \
    --dtype=half \
    --tensor-parallel-size=4 \
    --enforce-eager
```
You can then access the Higress console at http://127.0.0.1:8001 and configure the ai-search plugin as follows:
```yaml
searchFrom:
- type: quark
  apiKey: "your-aliyun-ak"
  keySecret: "your-aliyun-sk"
  serviceName: "aliyun-svc.dns"
  servicePort: 443
- type: google
  apiKey: "your-google-api-key"
  cx: "search-engine-id"
  serviceName: "google-svc.dns"
  servicePort: 443
- type: bing
  apiKey: "bing-key"
  serviceName: "bing-svc.dns"
  servicePort: 443
- type: arxiv
  serviceName: "arxiv-svc.dns"
  servicePort: 443
searchRewrite:
  llmServiceName: "llm-svc.dns"
  llmServicePort: 443
  llmApiKey: "your-llm-api-key"
  llmUrl: "https://api.example.com/v1/chat/completions"
  llmModelName: "deepseek-chat"
  timeoutMillisecond: 15000
```
Set http://127.0.0.1:8080/v1 as the OpenAI-protocol BaseUrl, and you can chat through ChatBox, LobeChat, or any other conversation tool that supports the OpenAI protocol.
You can also connect directly with OpenAI's SDK, as shown below:
```python
from openai import OpenAI

# Point the client at the Higress gateway; no real API key is needed locally
client = OpenAI(
    api_key="none",
    base_url="http://localhost:8080/v1",
)

completion = client.chat.completions.create(
    model="deepseek-r1",
    messages=[
        {"role": "user", "content": "Analyze the trend of international gold prices"}
    ],
    stream=False,
)
print(completion.choices[0].message.content)
```