Top 10 DeepSeek all-in-one machine questions from 100+ customers

Written by Audrey Miles
Updated on: July 15, 2025

Explore the selection tips for DeepSeek all-in-one machines and gain insight into the latest trends in the smart hardware market.

Core content:
1. A practical guide to DeepSeek all-in-one machine selection and answers to frequently asked questions
2. Parameter scale and performance comparison of the full-blooded version of DeepSeek
3. Parameter scale selection strategy and actual application scenario analysis

Yang Fangxian, Founder of 53AI, Most Valuable Expert of Tencent Cloud (TVP)

Question 1: My boss asked me to research DeepSeek all-in-one machines, but I don't know where to start.
Answer: Many leaders currently have this idea. They worry about the risks and liabilities that future leaks on the public network could bring, so private deployment is the safest option. Since they do not understand the technology themselves and do not know which scenarios to apply it to, they delegate the research to their staff. Our suggestion is to start from reality and use a small budget for trial and error. In the beginning, combine assistant-type applications with a knowledge base for scenarios such as enterprise internal search. There are two approaches:
A. If you already have the hardware, invest 200,000 to 500,000 yuan to have a reliable team deploy the software, and let your own IT and business staff explore and try it out.
B. Buy an all-in-one machine, which costs 100,000 to 500,000 yuan, and let your own IT and business staff explore and try it out.
Question 2: What is the full-blooded version? What are the differences between its three variants?
    Answer: Currently on the market, any DeepSeek model with 671B parameters is called a "full-blooded version".
    The full-blooded version comes in three forms:
    Native full-blooded version (FP8 precision, about 671 GB of GPU memory)
    Non-quantized full-blooded version (BF16 or FP16 precision, about 1342 GB of GPU memory)
    Quantized full-blooded version (INT8 / Q8, about 671 GB of GPU memory; INT4 / Q4, about 335 GB; Q2 and Q1 precisions also exist)
    This three-way classification and naming was first proposed and demonstrated by several industry experts at an offline salon organized by "Computing Power Encyclopedia" on February 9, and has been widely recognized by peers.
    When manufacturers promote their products, they only advertise a "full-blooded version" without saying which kind. By default, the official native full-blooded version is the DeepSeek version with the highest "IQ".
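    The memory figures above follow directly from parameter count times bytes per parameter. The sketch below is a minimal illustration of that arithmetic, assuming weights-only memory (it ignores KV cache and runtime overhead); the precision labels mirror the list above.

```python
# Weights-only GPU memory estimate: parameters x bytes per parameter.
# Real deployments also need headroom for KV cache, activations, and
# framework overhead, so these numbers are a lower bound.

PARAMS_BILLION = 671  # DeepSeek 671B parameter count

BYTES_PER_PARAM = {
    "FP8 (native full-blooded)": 1.0,
    "BF16/FP16 (non-quantized full-blooded)": 2.0,
    "INT8 / Q8 (quantized)": 1.0,
    "INT4 / Q4 (quantized)": 0.5,
}

for precision, bytes_per_param in BYTES_PER_PARAM.items():
    weights_gb = PARAMS_BILLION * bytes_per_param  # 1B params at 1 byte ~ 1 GB
    print(f"{precision:42s} ~{weights_gb:.0f} GB of weights")
```

    Running this reproduces the figures quoted above: roughly 671 GB for FP8 and INT8, 1342 GB for BF16/FP16, and about 335 GB for INT4.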

Question 3: What parameter size should I choose? Is 70B or 32B enough?
   
    Answer: Model parameters are analogous to neurons in the human brain. 671B is definitely the best, although its hardware requirements are correspondingly large. Practice has shown that a DeepSeek model's parameter count is proportional to its "IQ": the more parameters, the smarter the model.
    You can think of 671B as a doctoral student, 70B as an undergraduate, and 32B as a junior-college student. Not every position requires a doctoral student, so choose according to your needs.
    In most cases, however, once people have "employed" a doctoral student for a position, they are no longer willing to settle for an undergraduate, let alone a junior-college student.
    Personally, I think customers who refuse 671B and insist on the so-called 70B or 32B are making a foolish choice, and I urge vendors (Party B) to stay away from them.
    Reason for staying away: the experience of a customer (Party A) who does not use 671B is likely to be poor. The customer's leadership will not blame themselves for the wrong choice; they will blame Party B for a bad system. Party B ends up taking the fall and ruining its own reputation.
Question 4: How to choose an AI chip?
Answer: AI chips that support FP8 are preferred, then cost-effectiveness, and finally other factors. All of DeepSeek's optimizations target the FP8 computing architecture. In short, choose in a way that keeps trial and error low-cost, low-stress, flexible, and convenient.
    Except for the native full-blooded version, the "IQ" of the other full-blooded versions depends mainly on the technical level of the adaptation and optimization team. The resulting IQ varies greatly, and different teams charge different software fees; this is mainly where the difference shows.
Question 5: How much concurrency is needed? How to calculate it?
Answer: As a rule of thumb, total number of employees / 20 gives the required number of concurrent users. The same machine can deliver very different concurrency depending on the technical team behind it. For example, the R1 FP8 native full-blooded version is better suited to cluster deployment, and single machines need to be optimized by the vendor. Depending on the team's capability, software concurrency varies by 3 to 10 times: top-tier optimization can reach about 300 concurrent users, while the unoptimized official version handles about 50.
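    As a minimal sketch of this rule of thumb: the /20 ratio and the 50 vs. 300 per-machine concurrency figures come from the answer above, while the function names and the 3,000-employee example are illustrative assumptions.

```python
import math

def required_concurrency(total_employees: int, ratio: int = 20) -> int:
    """Rule of thumb from the answer above: ~1 concurrent user per 20 employees."""
    return math.ceil(total_employees / ratio)

def machines_needed(total_employees: int, per_machine_concurrency: int) -> int:
    """All-in-one machines needed at a given per-machine concurrency level."""
    return math.ceil(required_concurrency(total_employees) / per_machine_concurrency)

if __name__ == "__main__":
    employees = 3000  # hypothetical company size
    print(required_concurrency(employees))   # 150 concurrent users
    print(machines_needed(employees, 50))    # unoptimized (~50 concurrent): 3 machines
    print(machines_needed(employees, 300))   # well-optimized (~300 concurrent): 1 machine
```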

Question 6: If I buy an all-in-one machine now, what should I do to expand the capacity in the future?
   Answer: Not all all-in-one machines support flexible capacity expansion. There are two ways to expand: a "true cluster" and a "pseudo cluster".
    A true cluster is a computing-power cluster formed by connecting multiple physical AI servers through high-speed networking (InfiniBand, RoCE, P2P, etc.), scaling out into one complete large-model deployment. A true cluster is a single computing-power pool running a single large model. Customers with such expansion plans should reserve network cards and an expansion plan before purchasing.
    Pseudo cluster: if the concurrency of one all-in-one machine is not enough, buy another; the machines run independently of each other. This is easy to expand, and the data on each machine stays separate, effectively multiple large models running independently. It also has an advantage: machines can be managed per department and physically isolated.
 
Question 7: If I already have the hardware, can I purchase just the software for deployment?
    Answer: Not all hardware can deploy the full-blooded version of DeepSeek, but most AI cards can certainly deploy some version of it, just with different parameter sizes.

Question 8: Can a full-blooded version be deployed on a desktop computer?
    Answer: Tsinghua University has indeed open-sourced the KT solution. The main idea of the framework is to move the model from running in GPU memory to running in main memory.
    There is nothing wrong with the idea behind this solution, but most of the code was written by students and its quality is poor; a professional team is needed to maintain it and fix bugs. The Computing Power Encyclopedia team has contacted Tsinghua's open-source laboratory. They currently have no plans to adapt it to domestic AI chips, but they support third-party teams doing the domestic adaptation.
    This solution depends on DDR bandwidth. Generally speaking, reaching more than 10 tokens per second requires at least a high-end dual-channel server with a single 4090, costing on the order of 100,000 yuan. Being able to run and being usable in production are two different things.
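    To see why DDR bandwidth is the bottleneck, here is a hedged back-of-envelope sketch. It assumes decoding is memory-bandwidth-bound, that roughly 37B of DeepSeek-R1's 671B parameters are active per token (it is a MoE model), and INT4 weights; the bandwidth figures are illustrative assumptions, not measurements.

```python
# Back-of-envelope decoding-speed estimate for a CPU-offload setup, assuming
# each generated token must stream the active weights from main memory once.
# All figures are illustrative assumptions, not benchmarks.

ACTIVE_PARAMS_BILLION = 37   # DeepSeek-R1 (MoE): ~37B of 671B params active per token
BYTES_PER_PARAM = 0.5        # INT4 (Q4) quantized weights

DDR_BANDWIDTH_GB_S = {
    "desktop dual-channel DDR5 (~90 GB/s)": 90,
    "server 8-channel DDR5 (~300 GB/s)": 300,
}

gb_read_per_token = ACTIVE_PARAMS_BILLION * BYTES_PER_PARAM  # ~18.5 GB per token

for setup, bandwidth_gb_s in DDR_BANDWIDTH_GB_S.items():
    tokens_per_second = bandwidth_gb_s / gb_read_per_token
    print(f"{setup}: at most ~{tokens_per_second:.1f} tokens/s")
```

    Under these assumptions a desktop-class dual-channel machine tops out at a few tokens per second, which is why reaching 10+ tokens per second requires server-class memory bandwidth, and why "able to run" is not the same as "usable in production".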

Question 9: How should this be combined with the business?
    Answer: DeepSeek became popular very quickly, and everyone needs time to understand it. When ERP was first introduced, there was a saying: "use ERP and you will die; don't use it and you will die." After 20 years of development, it now seems that almost everyone uses ERP.
    The same applies to large models. Now is the time to start trying them: "use DS and you will die; don't use it and you will die." Eventually you will still have to use DS-type large models; it is only a question of when to start.
    Our suggestion is to start exploring now: deploy one privately at low cost and begin the journey of trying out large models.

Question 10: What about security issues?
    Answer: We have written a detailed article on this for your reference; please read it carefully.