The era of self-built DeepSeek has arrived: how do you achieve efficient online search?

Written by Clara Bennett
Updated on: July 15, 2025

In the era of self-built DeepSeek, enterprises can now build low-cost, enterprise-grade intelligent question-answering systems.

Core content:
1. How DeepSeek's technical innovations greatly reduce the cost for enterprises to build their own intelligent question-answering systems
2. Higress, a cloud-native API gateway that serves as a multi-functional Swiss Army knife for enhancing LLMs with zero code
3. The technical implementation and scenario value of online search, including multi-engine intelligent diversion and the core ideas behind it


With the emergence of high-quality open-source models such as DeepSeek, the cost for enterprises to build an intelligent question-answering system has fallen by more than 90%. Models with 7B/13B parameters can deliver commercial-grade responses on ordinary GPU servers, and with the enhancements provided by the open-source Higress AI gateway, developers can quickly build an intelligent question-answering system with real-time online search.

02

Higress: The Swiss Army Knife for Zero-Code LLM Enhancement


As a cloud-native API gateway, Higress provides out-of-the-box AI enhancement capabilities through wasm plugins:

Main capability matrix:
  • Online search: real-time access to the latest information on the Internet
  • Intelligent routing: multi-model load balancing and automatic fallback
  • Security protection: sensitive-word filtering and injection-attack defense
  • Performance optimization: request caching + token quota management (see the sketch after this list)
  • Observability: full-link monitoring and audit logs
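
To make the caching idea concrete, here is a minimal Python sketch, assuming a simple in-process dict keyed by a prompt hash. The actual capability is a gateway-side Wasm plugin; the helper names below are purely illustrative.

# Illustrative request cache: identical prompts are answered from an
# in-memory dict instead of calling the model again (hypothetical helper,
# not the Higress plugin itself).
import hashlib

_cache: dict[str, str] = {}

def cached_answer(prompt: str, call_llm) -> str:
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    if key not in _cache:
        _cache[key] = call_llm(prompt)
    return _cache[key]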
03

Technical implementation and scenario value of online search


The Higress AI search enhancement plugin has been open sourced; see its documentation and code for details.

Core architecture analysis

Key technical features
1. Multi-engine intelligent traffic diversion (a minimal dispatch sketch follows this list)
  • Public search (Google/Bing/Quark) for real-time information
  • Academic search (Arxiv) for scientific-research scenarios
  • Private search (Elasticsearch) for corporate and personal knowledge bases
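
A minimal Python sketch of the diversion idea, assuming illustrative engine names and an upstream intent label; the real plugin is a Wasm module inside the gateway:

# Illustrative multi-engine dispatch: route a query to public, academic,
# or private search backends depending on detected intent.
from dataclasses import dataclass

@dataclass
class SearchEngine:
    name: str
    kind: str  # "public" | "academic" | "private"

ENGINES = [
    SearchEngine("google", "public"),
    SearchEngine("bing", "public"),
    SearchEngine("quark", "public"),
    SearchEngine("arxiv", "academic"),
    SearchEngine("elasticsearch", "private"),
]

def select_engines(intent: str) -> list[SearchEngine]:
    # Paper-style questions go to Arxiv, internal-knowledge questions to
    # the private index, everything else to the public web engines.
    kind = {"academic": "academic", "internal": "private"}.get(intent, "public")
    return [e for e in ENGINES if e.kind == kind]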

2. Core ideas of search enhancement (a query-rewrite sketch follows this list)
  • LLM query rewriting: the LLM identifies user intent and generates search commands, which greatly improves the search-enhancement effect.
  • Keyword extraction: different engines need different prompts; for example, most Arxiv papers are in English, so keywords for Arxiv must be generated in English.
  • Field identification: Arxiv, for example, divides disciplines into categories such as computer science, physics, mathematics, and biology; searching within a specific category improves accuracy.
  • Long-query splitting: a long query can be split into multiple short queries to improve search efficiency.
  • High-quality data: Google/Bing/Arxiv searches return only article summaries, but connecting to Quark search, backed by Alibaba Cloud Information Retrieval, can return full text, which improves the quality of LLM-generated content.
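
A hedged sketch of the rewrite step: the plugin's actual prompt and output schema live in its open-source code, so the prompt, model name, and JSON shape below are assumptions for illustration only.

# Illustrative LLM query rewrite: ask the model to turn a user question
# into engine-specific search commands. Prompt and schema are assumed.
import json
from openai import OpenAI

client = OpenAI(api_key="your-llm-api-key", base_url="http://127.0.0.1:8080/v1")

REWRITE_PROMPT = (
    "Rewrite the user question into search commands. Reply with JSON only: "
    '{"engine": "web|arxiv|private", "category": "<arxiv category or empty>", '
    '"queries": ["short query", ...]}. '
    "For arxiv, use English keywords and pick a category such as cs/physics/math. "
    "Split long questions into several short queries."
)

def rewrite_query(question: str) -> dict:
    resp = client.chat.completions.create(
        model="deepseek-chat",
        messages=[
            {"role": "system", "content": REWRITE_PROMPT},
            {"role": "user", "content": question},
        ],
    )
    return json.loads(resp.choices[0].message.content)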


Typical application scenario demos (screenshots omitted): financial information Q&A, cutting-edge technology exploration, and medical Q&A.


04

From open source to implementation: three steps to build an intelligent question-answering system



1. Basic deployment
# One-line command to install and start the Higress gateway
curl -sS https://higress.cn/ai-gateway/install.sh | bash

# Use vLLM to deploy DeepSeek-R1-Distill-Qwen-7B
python3 -m vllm.entrypoints.openai.api_server --model=deepseek-ai/DeepSeek-R1-Distill-Qwen-7B --dtype=half --tensor-parallel-size=4 --enforce-eager
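
Optionally, a quick smoke test against the vLLM endpoint, assuming vLLM's default port 8000 (adjust if you pass --port):

# List the served model to confirm vLLM is up (default port 8000).
from openai import OpenAI

vllm = OpenAI(api_key="none", base_url="http://localhost:8000/v1")
print(vllm.models.list().data[0].id)  # e.g. deepseek-ai/DeepSeek-R1-Distill-Qwen-7B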

2. Plugin Configuration

Access the Higress console at http://127.0.0.1:8001 and configure the ai-search plugin as follows.

plugins:
  searchFrom:
  - type: quark
    apiKey: "your-aliyun-ak"
    keySecret: "your-aliyun-sk"
    serviceName: "aliyun-svc.dns"
    servicePort: 443
  - type: google
    apiKey: "your-google-api-key"
    cx: "search-engine-id"
    serviceName: "google-svc.dns"
    servicePort: 443
  - type: bing
    apiKey: "bing-key"
    serviceName: "bing-svc.dns"
    servicePort: 443
  - type: arxiv
    serviceName: "arxiv-svc.dns"
    servicePort: 443
  searchRewrite:
    llmServiceName: "llm-svc.dns"
    llmServicePort: 443
    llmApiKey: "your-llm-api-key"
    llmUrl: "https://api.example.com/v1/chat/completions"
    llmModelName: "deepseek-chat"
    timeoutMillisecond: 15000

3. Connect an SDK or front end

Use http://127.0.0.1:8080/v1 as the OpenAI-protocol base URL; any conversation tool that supports the OpenAI protocol, such as ChatBox or LobeChat, can then connect and hold conversations.

You can also use OpenAI's SDK to connect directly, as shown below:

from openai import OpenAI

client = OpenAI(
    api_key="none",
    base_url="http://localhost:8080/v1",
)
completion = client.chat.completions.create(
    model="deepseek-r1",
    messages=[{"role": "user", "content": "Analyze the trend of international gold prices"}],
    stream=False,
)
print(completion.choices[0].message.content)
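
For interactive front ends, a streaming variant works the same way, assuming the gateway forwards OpenAI-protocol streaming responses unchanged:

# Streaming variant: print tokens as they arrive.
stream = client.chat.completions.create(
    model="deepseek-r1",
    messages=[{"role": "user", "content": "Analyze the trend of international gold prices"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)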
Through the open-source combination of Higress + DeepSeek, an enterprise can take an intelligent question-answering system from zero to production within 24 hours, making the LLM a true intelligence engine for business growth.