Personal local deployment of DeepSeek: the VRAM formula and graphics card recommendations

Written by Caleb Hayes
Updated on: July 9, 2025

A DeepSeek deployment guide for personal use: VRAM calculation and graphics card selection.

Core content:
1. How VRAM requirements are calculated and how they relate to parameter scale
2. Comparison of model scales and recommended graphics cards
3. Optimization strategies, cost-effectiveness improvements, and future deployment advice

With the rapid development of artificial intelligence technology, locally deploying large language models such as DeepSeek has become an important way for individual developers and small teams to explore AI applications. However, estimating VRAM requirements and choosing a hardware configuration are often the core difficulties in the deployment process. Starting from the principles of VRAM calculation, this article combines model scale with graphics card performance to provide a systematic deployment plan for individual users.


1. Calculation logic of VRAM requirements

Relationship between parameter scale and VRAM

A model's VRAM usage consists of three main parts:

  • Model parameters: each parameter occupies 2 bytes at FP16 precision and 1 byte at INT8 precision
  • Inference cache: intermediate variables such as activation values and attention matrices
  • System overhead: additional consumption such as the CUDA context and the framework's memory management

Basic calculation formula:

VRAM requirement ≈ parameter count × precision factor × safety factor

where:

  • Precision factor (bytes per parameter): 2 for FP16, 1 for INT8, and 0.5 for 4-bit quantization
  • Safety factor: 1.2-1.5 is recommended (to allow for cache and system overhead)

Worked example using the DeepSeek-7B model (reproduced in code after this list):

  • FP16 mode: 7B × 2 × 1.3 = 18.2 GB
  • 8-bit quantization: 7B × 1 × 1.3 = 9.1 GB
  • 4-bit quantization: 7B × 0.5 × 1.3 = 4.55 GB
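As a sanity check, here is a minimal Python sketch of the formula above; the 7B parameter count and the 1.3 safety factor mirror the worked example, and nothing here is specific to any library.

```python
# Minimal sketch of: VRAM requirement ≈ parameter count × precision factor × safety factor
PRECISION_FACTOR = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}  # bytes per parameter

def vram_gb(params_billions: float, precision: str, safety: float = 1.3) -> float:
    """VRAM requirement in GB, using the article's approximation."""
    return params_billions * PRECISION_FACTOR[precision] * safety

for prec in ("fp16", "int8", "int4"):
    print(f"DeepSeek-7B @ {prec}: {vram_gb(7, prec):.2f} GB")
# -> 18.20 GB, 9.10 GB, 4.55 GB, matching the worked example above
```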


2. Comparison table of model scale and graphics card recommendations
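Applying the Section 1 formula across a range of model scales gives a rough sizing guide to read alongside any card comparison. The sketch below is illustrative only: the model sizes and the 8/12/16/24 GB consumer VRAM tiers are assumptions for illustration, not the article's table.

```python
# Illustrative sizing guide: applies the Section 1 formula to assumed model
# scales and reports the smallest common consumer VRAM tier that would fit.
PRECISION_FACTOR = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}
VRAM_TIERS_GB = (8, 12, 16, 24)  # typical consumer card capacities (assumed)

def vram_gb(params_b: float, precision: str, safety: float = 1.3) -> float:
    return params_b * PRECISION_FACTOR[precision] * safety

def min_tier(required_gb: float) -> str:
    fits = [t for t in VRAM_TIERS_GB if t >= required_gb]
    return f"{fits[0]} GB card" if fits else "24 GB+ / multi-GPU"

for size in (1.5, 7, 14, 32):  # assumed common model scales, in billions
    for prec in ("fp16", "int8", "int4"):
        need = vram_gb(size, prec)
        print(f"{size:>4}B @ {prec}: ~{need:5.1f} GB -> {min_tier(need)}")
```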


3. Optimization strategies and cost-effectiveness improvements
1. Comparison of quantization techniques

Quantization type    VRAM compression ratio    Performance loss
FP32 → FP16          50%                       <1%
FP16 → INT8          50%                       3-5%
INT8 → INT4          50%                       8-12%
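In practice, 4-bit weights can be loaded directly at inference time. Below is a minimal sketch using Hugging Face transformers with bitsandbytes 4-bit quantization; the model id is an assumption for illustration, and it requires a CUDA GPU with the bitsandbytes package installed.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "deepseek-ai/deepseek-llm-7b-chat"  # hypothetical choice; substitute your checkpoint

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # INT4 weights: ~0.5 bytes per parameter
    bnb_4bit_compute_dtype=torch.float16,  # compute in FP16 for speed
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",                     # spill layers to CPU if VRAM is tight
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```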


2. Framework-level optimization

  • vLLM: uses PagedAttention to reduce KV cache fragmentation, cutting the 32B model's memory usage by 40% (see the sketch after this list)
  • Ollama + IPEX-LLM: enables 7B-model deployment on Intel Arc integrated graphics, with collaborative CPU acceleration
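For reference, a minimal vLLM sketch: PagedAttention is vLLM's default KV-cache mechanism, so no extra flag is needed; the model id, memory fraction, and context length below are illustrative assumptions.

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/deepseek-llm-7b-chat",  # hypothetical model choice
    gpu_memory_utilization=0.90,               # cap vLLM's VRAM pool
    max_model_len=4096,                        # shorter context -> smaller KV cache
)

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain KV-cache paging in one sentence."], params)
print(outputs[0].outputs[0].text)
```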


3. Hardware purchasing suggestions

Cost-effectiveness priorities:

  • VRAM capacity > compute power (compute cannot be fully utilized when VRAM is insufficient); a quick self-check sketch follows this list
  • Choose a graphics card that supports Resizable BAR (improves multi-card communication efficiency by 30%)
  • Prioritize energy efficiency (e.g., the RTX 4090's TOPS per watt is 58% higher than the 3090's)
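Before buying or deploying, it helps to compare a card's actual VRAM against the Section 1 formula. A minimal sketch, assuming PyTorch with CUDA is installed:

```python
import torch

PRECISION_FACTOR = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def vram_gb(params_b: float, precision: str, safety: float = 1.3) -> float:
    return params_b * PRECISION_FACTOR[precision] * safety

if torch.cuda.is_available():
    total = torch.cuda.get_device_properties(0).total_memory / 1e9  # bytes -> GB
    print(f"GPU 0: {torch.cuda.get_device_name(0)}, {total:.1f} GB VRAM")
    for prec in ("fp16", "int8", "int4"):
        need = vram_gb(7, prec)  # 7B example, as in Section 1
        verdict = "fits" if need <= total else "does NOT fit"
        print(f"  DeepSeek-7B @ {prec}: needs ~{need:.1f} GB -> {verdict}")
else:
    print("No CUDA device detected.")
```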



4. Future trends and deployment suggestions

With the iteration of DeepSeek technology, VRAM demand shows two major trends:

  • Model lightweighting: through MoE architectures and dynamic routing, 670B-class models can be compressed to run within the 24GB VRAM of a single card
  • Hardware democratization: Intel integrated graphics already supports 7B models through IPEX-LLM, and XeSS technology may enable consumer-grade deployment of 32B models in the future

Action guide for individual users:

  • Short term: reserve headroom according to the "VRAM formula × 1.2" and choose a graphics card that supports quantization (such as the RTX 4060 Ti 16GB)
  • Long term: watch 4-bit quantization support in the Blackwell architecture (RTX 50 series), which is expected to enable single-card deployment of 70B models by the end of 2025

By scientifically calculating VRAM requirements and rationally selecting hardware, individual users can build a high-performance local DeepSeek deployment within a budget of tens of thousands of yuan, opening the next chapter of AI innovation.