Is it necessary to deploy large models locally?

Written by
Silas Grey
Updated on: June 29th, 2025
Recommendation

Explore the pros, cons, and application scenarios of deploying large models locally.

Core content:
1. Analysis of whether local deployment of large models is necessary
2. The different ways to use large model capabilities
3. Applicable scenarios and technical considerations of the various deployment options


1. Local deployment of large models

    Some time ago, a friend asked me whether it was necessary to deploy DeepSeek locally. The question sounds simple, but it is not easy to answer. My personal view: local deployment is necessary when there are strong data security requirements, or when accuracy requirements are high enough that the model must be fine-tuned on data that cannot leave the organization. If there are no special data security requirements, I recommend not deploying privately and using a public cloud large model service instead.

    Of course, this is not a precise answer; it is more of a discussion among friends. Choosing any solution requires understanding the background and requirements, as well as the constraints of technology, security, and so on, before a reasonable decision can be made. This article discusses the question in more detail.



2. Ways to use large model capabilities

    At present, mainstream vendors (Alibaba's Tongyi Qianwen, Baidu's Wenxin Yiyan, DeepSeek) provide large model capabilities in the following ways:

1. Web version:

    This is the one everyone is familiar with: you talk to the large model directly through a web page, with support for uploading attachments, online search, and so on. With the popularity of DeepSeek, the web versions of the major models now also offer a deep-thinking option.

2. API calls (the most common approach):

    Large model capabilities are accessed programmatically through an API. The most commonly used is the chat (conversational) interface; every large model provides at least this interface (otherwise it could hardly be used at all). It supports both non-streaming and streaming output, and usually an option to enable online search. Some large models (such as Tongyi Qianwen) also provide embedding (text vectorization), image generation, and other capabilities.
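As a rough illustration (a sketch only: the field names follow the widely used OpenAI-style schema, and the model name is a placeholder; each vendor's documentation gives the exact endpoint and fields), a chat request body with the streaming switch looks like this:

```java
// Sketch of a chat request body for an OpenAI-style chat endpoint.
// Exact URLs, model names, and extra options vary by vendor; treat
// these values as placeholders, not any specific product's API.
public class ChatPayloads {
    // Non-streaming: the full answer comes back in one JSON response.
    static final String NON_STREAMING = """
        {
          "model": "example-model",
          "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user",   "content": "Hello"}
          ],
          "stream": false
        }""";

    // Streaming: the same request with "stream": true; the server then
    // returns the answer incrementally as server-sent events (SSE).
    static final String STREAMING =
        NON_STREAMING.replace("\"stream\": false", "\"stream\": true");
}
```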

3. Enterprise-exclusive large model training:

    In addition to calling the public version of a large model directly, an enterprise can combine its own industry knowledge and application scenarios to train an exclusive enterprise model. The model vendor provides the enterprise with a dedicated data space, into which data such as PPTs, PDFs, images, and enterprise databases can be uploaded; through training and fine-tuning (mostly SFT), this yields a model better suited to the enterprise's own domain. The enterprise can also purchase resources for a separate deployment, avoiding the instability that comes from sharing the public version with everyone (DeepSeek, for example, was attacked some time ago and was frequently unavailable) and obtaining higher QPS guarantees. Of course, the cost is very high.

4. Private deployment of open-source large models:

    Many large model vendors release open-source models, such as DeepSeek R1 and V3. If you have enough GPU resources, you can deploy them on your own servers to explore and use the models in more depth.

3. Applicable scenarios of each solution

1. Web version

    This is the most convenient option for ordinary users, who can start experiencing the model quickly. However, it cannot be called from a program, so it is usually only suitable for trying a model out (testing its quality) or for occasional daily use; it cannot support application development.

2. API calls to the large model (public version)

    This is the most common approach in application development. You obtain the API key required for the call and some call quota (most services charge by token; a free quota is usually provided, and you can also apply for vouchers or top up), read the API documentation, and then write the call following the examples. Calls can be made from Python, Node.js, Java, cURL, and so on, with slight differences between products. If you are a Java developer, consider using RestTemplate to issue GET/POST requests directly rather than introducing an SDK; after all, if you use several large models at the same time, pulling in a pile of SDKs is a lot of work, and switching between them becomes complicated.
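As a rough sketch of this SDK-free approach: the snippet below posts a chat request with Spring's RestTemplate. It assumes an OpenAI-compatible endpoint; the URL, model name, and environment variable are placeholders, not any specific vendor's values.

```java
import org.springframework.http.HttpEntity;
import org.springframework.http.HttpHeaders;
import org.springframework.http.MediaType;
import org.springframework.web.client.RestTemplate;

// Minimal sketch: calling an OpenAI-compatible chat endpoint with Spring's
// RestTemplate and no vendor SDK. The endpoint, model name, and env var are
// placeholders; substitute the values from your vendor's API documentation.
public class ChatClient {
    public static void main(String[] args) {
        String endpoint = "https://api.example.com/v1/chat/completions"; // placeholder
        String apiKey = System.getenv("LLM_API_KEY"); // read the key from the environment

        HttpHeaders headers = new HttpHeaders();
        headers.setContentType(MediaType.APPLICATION_JSON);
        headers.setBearerAuth(apiKey); // most vendors use "Authorization: Bearer <key>"

        String body = """
            {
              "model": "example-model",
              "messages": [{"role": "user", "content": "Hello"}],
              "stream": false
            }""";

        RestTemplate rest = new RestTemplate();
        String response = rest.postForObject(endpoint, new HttpEntity<>(body, headers), String.class);
        System.out.println(response); // raw JSON; parse choices[0].message.content as needed
    }
}
```

Because the request is plain HTTP plus JSON, switching vendors usually only means changing the URL, the model name, and the key.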

3. Exclusive large model:

    From an ownership perspective, an independent model instance is provided to the enterprise. The model service is still hosted on the Internet, so the calling method is basically the same (generally only the domain name used in the call differs). But since it is exclusive, stability and security are better than the public version, and to a certain extent it can resist the harm caused by "poisoning" of the shared large model.
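Since only the endpoint differs, it helps to keep the base URL in configuration rather than in code; a minimal sketch (the environment variable name and URLs here are made up for illustration):

```java
// Minimal sketch: externalize the endpoint so that switching between the
// public API, an exclusive instance, or a private deployment is a config
// change only. The variable name and URLs below are hypothetical.
public class LlmConfig {
    // e.g. public cloud:       https://api.example.com
    // e.g. exclusive instance: https://acme.example.com
    // e.g. private deployment: http://10.0.0.5:8000
    static String chatEndpoint() {
        return System.getenv().getOrDefault(
            "LLM_BASE_URL", "https://api.example.com") + "/v1/chat/completions";
    }
}
```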

4. Private deployment:

    If you have enough manpower, servers, and time, you can purchase servers for a private deployment: build a cluster, train, fine-tune, and even upgrade models to improve your service. Of course, this is also the most expensive of the options. Taking the full DeepSeek R1 (671B) as an example, the resource requirements are roughly as follows:

CPU: at least 32 cores; a server-grade processor is recommended.

Memory: at least 1TB DDR4 RAM.

Hard disk: at least 500GB SSD for the operating system and model files.

GPU: 8 × A100/H100 cards, each with at least 80GB of video memory.

The resource requirements for the 70B model are as follows:

CPU: 32 cores or above.

Memory: 128GB or more.

Hard disk: 70GB or more.

GPU: multiple cards required, e.g. 2 × A100 80GB or 4 × RTX 4090.
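These GPU counts can be sanity-checked with a back-of-the-envelope rule of thumb (my own rough reasoning, ignoring KV cache, activations, and quantization): the weights alone occupy roughly the parameter count times the bytes per parameter.

$$\text{weight memory} \approx N_{\text{params}} \times \text{bytes per parameter}$$

$$70\text{B} \times 2\ \text{bytes (FP16)} \approx 140\ \text{GB} \le 2 \times 80\ \text{GB}$$

$$671\text{B} \times 2\ \text{bytes (FP16)} \approx 1.34\ \text{TB}$$

This is why the 70B model fits on a pair of A100-80GB cards, while the full 671B model needs a multi-GPU cluster, in practice usually combined with lower-precision weights.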

    The price range for NVIDIA A100 cards is quite wide. The lowest price found so far for an NVIDIA Tesla A100 40G (customized PCIe) is about 45,000 yuan, while some channels quote 120,000 to 150,000 yuan. At the higher quotes, even the two cards for the 70B model come to roughly 240,000-300,000 yuan in GPUs alone, so the overall deployment cost reaches hundreds of thousands of yuan; with eight cards, the full version can run to millions. The high cost of private deployment is obvious.

4. Why is private deployment not recommended?

    Being able to deploy a large model privately is undoubtedly the most thorough way to use one. But you must be clear about the goal: is it to build a toy for yourself, to impress leadership (a showcase project), or to apply it in real scenarios? These are worlds apart. If it is just a toy or a showcase project, deploying a 14B model that can run some demos is enough; there is no need to worry about the many technical issues of enterprise-grade applications.

If you are determined to develop enterprise-level applications, you must consider the following questions:

1. Cost

As estimated above, the hardware alone costs hundreds of thousands to millions of yuan, and on top of the hardware comes the ongoing cost of computing power management, operations, and maintenance.

2. Model update

A privately deployed model cannot stay consistent with the public version and will not be updated synchronously; updates must be done manually. But, like a heavily customized fork of an open-source codebase, it is very hard to keep up with the upstream pace of updates.

3. Knowledge and talent reserves

Model principles, fine-tuning, retraining, and the computing power management and operations mentioned in point 1 all demand a considerable technical threshold, and building this up from scratch is very difficult.

    In summary, unless there are strong scientific research or data security requirements, private deployment of large models is not recommended. Even in scenarios with strong data security requirements, it is better to work with a vendor on an exclusive solution than to deploy privately without sufficient support; otherwise the expected results are very likely not to be achieved.