DeepSeek Large Model All-in-One Machine Selection Guide

Written by Audrey Miles
Updated on: July 8, 2025

An in-depth look at the classification, technical definitions, and market selection of DeepSeek large model all-in-one machines.

Core content:
1. Three categories of DeepSeek all-in-one machines and their characteristics
2. The technical divisions among "full-blooded" versions and their practical impact
3. Official FP8 precision and the state of domestic AI chip support

The key features of a DeepSeek large model all-in-one machine are private deployment, easy delivery, and low price. It is a natural fit for intranet IT transformation and an innovative option for CIOs, which is why these machines are currently being snapped up.
Large model all-in-one machine classification



Large model all-in-one machines fall into three main categories, A, B, and C:
Category A: hardware only, a server head (chassis) plus AI cards; a pure AI hardware server.
Category B: Category A hardware plus DeepSeek models and a development platform such as Dify or DB-GPT, packaged together as an all-in-one machine.
Category C: Category B plus ready-made applications, such as knowledge base management, sold as a complete product.
Technical Definition

   
Definition of the full-blooded version: any DeepSeek with 671B parameters, whether V3 or R1, is called a full-blooded version.
Full-blooded version classification: it can usually be divided into the native full-blooded version (FP8 compute precision), the translated full-blooded version (BF16 or FP16 compute precision), and the quantized full-blooded version (INT8, INT4, Q4, Q2, and other compute precisions). But vendors will not advertise "XX full-blooded version"; they only say "full-blooded version", so keep your eyes open when choosing.
Native full-blooded version: the FP8 mixed precision officially used by DeepSeek. Don't doubt it; we think the official version is the best. Personally, I believe no one understands DeepSeek better than DeepSeek itself.
Translated full-blooded version: the official DeepSeek uses FP8 mixed precision, but most domestic graphics cards do not support FP8. So, to adapt DeepSeek, vendors compute in BF16 or FP16 instead. In theory this has little effect on accuracy, but the compute and memory requirements nearly double.
Regarding memory calculations: deploying the official 671B model with FP8 mixed precision requires a minimum of about 750 GB of cluster GPU memory; with FP16 or BF16 it takes more than 1.4 TB.
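As a rough sanity check on those figures, here is a minimal back-of-the-envelope sketch in Python; the bytes-per-parameter values and the ~10% runtime overhead factor are illustrative assumptions, not vendor specifications:

```python
# Back-of-the-envelope GPU memory estimate for serving a 671B model.
# Weights only, plus an assumed ~10% overhead for KV cache and buffers
# (an illustrative assumption, not a vendor specification).

PARAMS = 671e9  # DeepSeek V3/R1 total parameter count

BYTES_PER_PARAM = {
    "FP8":  1.0,
    "BF16": 2.0,
    "FP16": 2.0,
    "INT8": 1.0,
    "INT4": 0.5,
}

OVERHEAD = 1.10  # assumed extra for KV cache, activations, buffers

for dtype, nbytes in BYTES_PER_PARAM.items():
    total_gb = PARAMS * nbytes * OVERHEAD / 1e9
    print(f"{dtype:>4}: ~{total_gb:,.0f} GB")
```

FP8 comes out around 740 GB and BF16/FP16 around 1.5 TB, consistent with the ~750 GB and 1.4 TB figures above.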
According to public information, only three domestic AI chip makers support FP8 precision: Suanneng, Moore Threads, and Hanbo Semiconductor. These three have publicly stated FP8 support. Other companies have not publicly claimed it, and falsely claiming a capability their chips lack would expose them to legal trouble, so the silence is telling. If you know of other companies that support FP8, leave a message to let the editor know.
Quantized full-blooded version: many manufacturers' AI cards only support INT8, FP16, FP32, and similar formats. With FP16, a single machine needs more than 1.4 TB of GPU memory, and most domestic single-node AI machines do not have that much. So to run the 671B DeepSeek on a single machine, quantization is forced on you. Quantization lowers compute precision to cut memory usage and raise throughput, and any quantization pays for it by lowering the model's IQ.
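To make "lower precision in exchange for less memory" concrete, here is a toy sketch of symmetric INT8 quantization: store one byte per weight plus a single scale factor, then reconstruct and measure the error. This illustrates the mechanism only; it is not how any particular vendor quantizes DeepSeek:

```python
import numpy as np

# Toy symmetric INT8 quantization: w ≈ scale * q, with q in [-127, 127].
rng = np.random.default_rng(0)
w = rng.normal(0, 0.02, size=4096).astype(np.float32)  # fake weight row

scale = np.abs(w).max() / 127.0                  # one scale for the tensor
q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
w_hat = q.astype(np.float32) * scale             # dequantized weights

print("bytes at FP32:", w.nbytes)                # 4 bytes per weight
print("bytes at INT8:", q.nbytes)                # 1 byte per weight
print("max abs error:", float(np.abs(w - w_hat).max()))
```

Memory drops 4x versus FP32 (2x versus FP16), and the reconstruction error is what "lowering the IQ" refers to: every weight is now slightly wrong.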
To give an illustrative (not rigorous) example: say FP8 keeps 7 digits after the decimal point, while INT8 keeps 2.
FP8: 3.1415926 × 3.1415926 = 9.8696040
INT8: 3.14 × 3.14 = 9.8596 ≈ 9.86
We would call these two results approximately equivalent, but FP8 is clearly more accurate. In large models we roughly equate higher precision with higher IQ, so we roughly conclude that FP8 has the higher IQ.
There is a point of controversy here. Many people claim that a 671B model computed in BF16 or FP16 has exactly the same IQ as the original FP8 version. In principle they can indeed stay consistent, but in practice the translation process introduces some differences and the IQ drops slightly. How much it drops depends on the skill of the vendor's technical team doing the translation.
In addition, how much IQ the 671B model loses during translation or quantization is an open question. A translated or quantized version will inevitably differ from the original; how much depends on the trade-offs and craftsmanship of the technical team. Doing the same Q4 quantization, a master and a rookie will produce 671B models with very different IQs. So the perception that a translated full-blooded version must be smarter than a quantized one is simply wrong.
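The "master versus rookie" point can be demonstrated even in a toy setting: the same INT8 bit width with a smarter scaling choice (per-channel instead of per-tensor, a standard trick) gives far lower error on exactly the same weights. A hypothetical sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy weight matrix whose rows differ wildly in magnitude -- the case
# where a single naive scale loses the most fidelity.
W = rng.normal(0.0, 1.0, (8, 4096)).astype(np.float32)
W *= np.logspace(-3, 0, 8, dtype=np.float32)[:, None]

def int8_roundtrip(W, scale):
    # Quantize to INT8 with the given scale(s), then dequantize.
    q = np.clip(np.round(W / scale), -127, 127)
    return q * scale

def mean_row_rel_err(W, W_hat):
    # Average per-row relative error, so small rows count too.
    return float(np.mean(np.linalg.norm(W - W_hat, axis=1)
                         / np.linalg.norm(W, axis=1)))

# "Rookie": one scale for the whole tensor.
per_tensor = int8_roundtrip(W, np.abs(W).max() / 127.0)

# "Master": one scale per output row (per-channel).
row_scales = np.abs(W).max(axis=1, keepdims=True) / 127.0
per_channel = int8_roundtrip(W, row_scales)

print(f"per-tensor  mean relative error: {mean_row_rel_err(W, per_tensor):.4f}")
print(f"per-channel mean relative error: {mean_row_rel_err(W, per_channel):.4f}")
```

Same bit width, very different fidelity. Real-world quantization (calibration data, outlier handling, which layers to keep in higher precision) has many more such knobs, which is exactly where teams differ.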
The native full-blooded version is the best; with the other versions, anything is possible. Could a translated full-blooded version have a higher IQ than the native one? Possibly, but the probability is extremely low. One PhD startup team claimed they understand the DeepSeek architecture better than DeepSeek itself; I can only laugh.
There are many full-blooded DeepSeek all-in-one machines; how do you tell the good from the bad? This question is simple: practice is the only criterion for testing truth. I have tested several domestic 671B DeepSeek all-in-one machines, and many of them are reduced-IQ versions.
Test method: DeepSeek has officially stated that the online version and the open-source version are identical. So ask the official DeepSeek website a question first, then ask the all-in-one machine the same question. If the thinking process and answer are consistent, the IQ is consistent; otherwise it is a lowered version, at least lower than the official website.
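If you want to run this comparison yourself, here is a minimal sketch. It assumes the all-in-one machine exposes an OpenAI-compatible endpoint on the intranet (as vLLM, Ollama, and most vendor stacks do) and uses the official DeepSeek API, which is also OpenAI-compatible; the local URL, model names, and API keys are placeholders you must replace:

```python
from openai import OpenAI

PROMPT = "9.11 and 9.8, which is greater? Explain your reasoning."

# Official DeepSeek API (OpenAI-compatible); key is a placeholder.
official = OpenAI(base_url="https://api.deepseek.com",
                  api_key="sk-your-deepseek-key")

# The all-in-one machine, assumed to expose an OpenAI-compatible
# endpoint; URL and model name depend on the vendor's stack.
local = OpenAI(base_url="http://192.168.1.100:8000/v1",
               api_key="not-needed")

def ask(client, model):
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
        temperature=0,  # reduce sampling noise for a fairer comparison
    )
    return resp.choices[0].message.content

a = ask(official, "deepseek-reasoner")   # R1 on the official API
b = ask(local, "deepseek-r1-671b")       # hypothetical local model name
print("MATCH" if a.strip() == b.strip() else "DIFFERS")
```

One caveat: even at temperature 0, inference stacks are not always bit-deterministic, so compare the substance and depth of the reasoning rather than demanding byte-identical text.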
I invited several friends who bought or deployed full-blooded DeepSeek 671B machines and tested what they had purchased. Of the 5 companies, only 1 produced answers matching the official website; the other 4 did not and were obviously low-IQ versions. That is an 80% low-IQ rate in my sample (the sample is far too small to be statistically meaningful, so don't argue). You can test the same way; if the IQ comes up short, whether to hold the supplier accountable afterward is up to you.
Considerations for selecting large model all-in-one machine

    
1. Domestic, Xinchuang: "domestic" means produced in mainland China; that is, everything except brands like HP and Dell counts as domestic.
Xinchuang (trusted innovation) splits into full Xinchuang and semi-Xinchuang. Full Xinchuang means both the CPU and the AI card are Xinchuang parts; semi-Xinchuang means only the AI card is, and the CPU is not.
2. Demand: is it for trying something new, or just for show? Then the cheaper the better, and experience comes first. If it is for business use, first sort out whether the business is actually suited to a large model.
3. Concurrency: a rule of thumb is headcount / 20 for the required concurrency; for example, a 1,000-person company should plan for about 50 concurrent users. Everyone can be online at the same time, but the number of simultaneously active users cannot be too high.
4. Security: security is the most important issue with large models, and there is currently no good technical solution. The best approach today is for each department to deploy its own all-in-one machine and keep the models isolated from one another, e.g., the finance, legal, and contract departments all separated.
In reality, it is very easy to end up with a model that, when anyone asks "how much is Zhang San's salary?", accurately queries the HR database and gives a precise answer. (This is an industry-wide problem, so don't argue; it is hard to handle and one of the biggest technical obstacles to landing DeepSeek in the enterprise.)
5. Cost: if you have the money, choose the native full-blooded version first, then the translated full-blooded version, then the quantized full-blooded version, and only then a distilled version. Just remember the order; it mainly comes down to budget.
Currently the cheapest quantized full-blooded machine is 98,000 yuan, and the most expensive native full-blooded machine (H200-based) is over 2 million yuan, so it depends on the money.
6. Implementation: which product do you buy to try out? Out of the box, or do you have your own engineers to tinker with it? In the enterprise, DeepSeek will inevitably be integrated with ERP, CRM, OA, and the like to take real workload off people.
7. Operation: there are three ways to run the 671B large model: from GPU memory (VRAM), from system RAM, and from hard disk. The tokens/s speeds and the prices of the three differ greatly; choose the one that suits you. A rough comparison is sketched below.
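Why the three modes differ so much in tokens/s: single-stream decoding is largely memory-bandwidth-bound, so a crude upper bound is bandwidth divided by the bytes read per token. DeepSeek V3/R1 is a mixture-of-experts model that activates roughly 37B of its 671B parameters per token; the bandwidth figures below are illustrative assumptions:

```python
# Crude decode-speed ceiling: tokens/s ≈ bandwidth / bytes_per_token.
# DeepSeek V3/R1 activates ~37B of 671B parameters per token (MoE),
# so at FP8 each generated token reads roughly 37 GB of weights.

ACTIVE_PARAMS = 37e9
BYTES_PER_PARAM = 1.0  # FP8

TIERS_GBPS = {                 # illustrative bandwidth assumptions
    "GPU VRAM (HBM)":   3000,  # ~3 TB/s class accelerators
    "System RAM (DDR5)": 300,  # multi-channel server memory
    "NVMe SSD":           10,  # fast PCIe 4/5 drives
}

bytes_per_token = ACTIVE_PARAMS * BYTES_PER_PARAM
for tier, gbps in TIERS_GBPS.items():
    tps = gbps * 1e9 / bytes_per_token
    print(f"{tier:<17}: ~{tps:6.1f} tokens/s ceiling")
```

Tens of tokens per second from VRAM, single digits from RAM, and well under one token per second from disk: that is the price/performance trade-off in a nutshell.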