The era of self-built DeepSeek has arrived: how do you achieve efficient online search?

Written by Clara Bennett
Updated on: July 15, 2025

In the era of self-built DeepSeek, enterprises can now build low-cost, enterprise-grade intelligent question-answering systems.

Core content:
1. How DeepSeek's technical innovations greatly reduce the cost for enterprises to build their own intelligent question-answering systems
2. Higress, a cloud-native API gateway that serves as a multi-functional Swiss Army knife for enhancing LLMs with zero code
3. The technical implementation and scenario value of online search, including multi-engine intelligent diversion and the core ideas behind it


With the emergence of high-quality open-source models such as DeepSeek, the cost for enterprises to build an intelligent question-answering system has fallen by more than 90%. Models with 7B/13B parameters can deliver commercial-grade responses on ordinary GPU servers, and with the enhancements provided by the open-source Higress AI gateway, developers can quickly build an intelligent question-answering system with real-time online search.

02

Higress: The Swiss Army Knife for Zero-Code LLM Enhancement


As a cloud-native API gateway, Higress provides out-of-the-box AI enhancement capabilities through wasm plugins:

Main capability matrix:
  • Online search: real-time access to the latest information on the Internet
  • Intelligent routing: multi-model load balancing and automatic fallback
  • Security protection: sensitive-word filtering and injection-attack defense
  • Performance optimization: request caching + token quota management (see the sketch after this list)
  • Observability: full-link monitoring and audit logs
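
To make the caching idea concrete, here is a minimal Python sketch, assuming a simple in-process dict keyed by a prompt hash. The actual capability is a gateway-side Wasm plugin; the helper names below are purely illustrative.

# Illustrative request cache: identical prompts are answered from an
# in-memory dict instead of calling the model again (hypothetical helper,
# not the Higress plugin itself).
import hashlib

_cache: dict[str, str] = {}

def cached_answer(prompt: str, call_llm) -> str:
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    if key not in _cache:
        _cache[key] = call_llm(prompt)
    return _cache[key]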
03

Technical implementation and scenario value of online search


The Higress AI search enhancement plugin has been open sourced; see its documentation and code for details.

Core architecture analysis

Key technical features
1. Multi-engine intelligent traffic diversion (a minimal dispatch sketch follows this list)
  • Public search (Google/Bing/Quark) for real-time information
  • Academic search (Arxiv) for scientific-research scenarios
  • Private search (Elasticsearch) for corporate and personal knowledge bases
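
A minimal Python sketch of the diversion idea, assuming illustrative engine names and an upstream intent label; the real plugin is a Wasm module inside the gateway:

# Illustrative multi-engine dispatch: route a query to public, academic,
# or private search backends depending on detected intent.
from dataclasses import dataclass

@dataclass
class SearchEngine:
    name: str
    kind: str  # "public" | "academic" | "private"

ENGINES = [
    SearchEngine("google", "public"),
    SearchEngine("bing", "public"),
    SearchEngine("quark", "public"),
    SearchEngine("arxiv", "academic"),
    SearchEngine("elasticsearch", "private"),
]

def select_engines(intent: str) -> list[SearchEngine]:
    # Paper-style questions go to Arxiv, internal-knowledge questions to
    # the private index, everything else to the public web engines.
    kind = {"academic": "academic", "internal": "private"}.get(intent, "public")
    return [e for e in ENGINES if e.kind == kind]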

2. Core ideas of search enhancement (a query-rewrite sketch follows this list)
  • LLM query rewriting: the LLM identifies user intent and generates search commands, which greatly improves the search-enhancement effect.
  • Keyword extraction: different engines need different prompts; for example, most Arxiv papers are in English, so keywords for Arxiv must be generated in English.
  • Field identification: Arxiv, for example, divides disciplines into categories such as computer science, physics, mathematics, and biology; searching within a specific category improves accuracy.
  • Long-query splitting: a long query can be split into multiple short queries to improve search efficiency.
  • High-quality data: Google/Bing/Arxiv searches return only article summaries, but connecting to Quark search, backed by Alibaba Cloud Information Retrieval, can return full text, which improves the quality of LLM-generated content.
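
A hedged sketch of the rewrite step: the plugin's actual prompt and output schema live in its open-source code, so the prompt, model name, and JSON shape below are assumptions for illustration only.

# Illustrative LLM query rewrite: ask the model to turn a user question
# into engine-specific search commands. Prompt and schema are assumed.
import json
from openai import OpenAI

client = OpenAI(api_key="your-llm-api-key", base_url="http://127.0.0.1:8080/v1")

REWRITE_PROMPT = (
    "Rewrite the user question into search commands. Reply with JSON only: "
    '{"engine": "web|arxiv|private", "category": "<arxiv category or empty>", '
    '"queries": ["short query", ...]}. '
    "For arxiv, use English keywords and pick a category such as cs/physics/math. "
    "Split long questions into several short queries."
)

def rewrite_query(question: str) -> dict:
    resp = client.chat.completions.create(
        model="deepseek-chat",
        messages=[
            {"role": "system", "content": REWRITE_PROMPT},
            {"role": "user", "content": question},
        ],
    )
    return json.loads(resp.choices[0].message.content)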


Typical application scenario demos (screenshots omitted): financial information Q&A, cutting-edge technology exploration, and medical Q&A.


04

From open source to implementation: three steps to build an intelligent question-answering system



1. Basic deployment
# One-line command to install and start the Higress gateway
curl -sS https://higress.cn/ai-gateway/install.sh | bash

# Use vLLM to deploy DeepSeek-R1-Distill-Qwen-7B
python3 -m vllm.entrypoints.openai.api_server --model=deepseek-ai/DeepSeek-R1-Distill-Qwen-7B --dtype=half --tensor-parallel-size=4 --enforce-eager
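
Optionally, a quick smoke test against the vLLM endpoint, assuming vLLM's default port 8000 (adjust if you pass --port):

# List the served model to confirm vLLM is up (default port 8000).
from openai import OpenAI

vllm = OpenAI(api_key="none", base_url="http://localhost:8000/v1")
print(vllm.models.list().data[0].id)  # e.g. deepseek-ai/DeepSeek-R1-Distill-Qwen-7B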

2. Plugin Configuration

Access the Higress console at http://127.0.0.1:8001 and configure the ai-search plugin as follows.

plugins:
  searchFrom:
  - type: quark
    apiKey: "your-aliyun-ak"
    keySecret: "your-aliyun-sk"
    serviceName: "aliyun-svc.dns"
    servicePort: 443
  - type: google
    apiKey: "your-google-api-key"
    cx: "search-engine-id"
    serviceName: "google-svc.dns"
    servicePort: 443
  - type: bing
    apiKey: "bing-key"
    serviceName: "bing-svc.dns"
    servicePort: 443
  - type: arxiv
    serviceName: "arxiv-svc.dns"
    servicePort: 443
  searchRewrite:
    llmServiceName: "llm-svc.dns"
    llmServicePort: 443
    llmApiKey: "your-llm-api-key"
    llmUrl: "https://api.example.com/v1/chat/completions"
    llmModelName: "deepseek-chat"
    timeoutMillisecond: 15000

3. Connect an SDK or front end

Use http://127.0.0.1:8080/v1 as the OpenAI-protocol base URL; any conversation tool that supports the OpenAI protocol, such as ChatBox or LobeChat, can then connect and hold conversations.

You can also use OpenAI's SDK to connect directly, as shown below:

from openai import OpenAI

client = OpenAI(
    api_key="none",
    base_url="http://localhost:8080/v1",
)
completion = client.chat.completions.create(
    model="deepseek-r1",
    messages=[{"role": "user", "content": "Analyze the trend of international gold prices"}],
    stream=False,
)
print(completion.choices[0].message.content)
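
For interactive front ends, a streaming variant works the same way, assuming the gateway forwards OpenAI-protocol streaming responses unchanged:

# Streaming variant: print tokens as they arrive.
stream = client.chat.completions.create(
    model="deepseek-r1",
    messages=[{"role": "user", "content": "Analyze the trend of international gold prices"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)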
Through the open-source combination of Higress + DeepSeek, an enterprise can take an intelligent question-answering system from zero to production within 24 hours, making the LLM a true intelligence engine for business growth.