Which version of DeepSeek-R1 can you run locally on your computer? This article explains it all!

A complete guide to deploying the Chinese open-source large model DeepSeek-R1 locally, helping you quickly determine whether your computer's configuration meets the requirements.
Core content:
1. Determine your operating system: the hardware-architecture differences between Windows and Mac
2. Windows: how to check your graphics card's VRAM, with recommended configurations
3. Mac: the unified memory concept and which model sizes suit each configuration
Recently, the Chinese open-source large model DeepSeek-R1 has taken off!
However, faced with all the parameters and configuration requirements, many readers are left wondering: Can my computer run DeepSeek-R1? How large a model can it handle? And why can even phones run large models? (See: Deploy and run DeepSeek-R1 on an Android phone.)
Let's clear up these questions together today.
1. Check your computer configuration first
Before you begin, you need to confirm what operating system you are using, because the hardware architecture of Windows computers and Macs is very different.
Windows: check your graphics card's VRAM
On a Windows PC, the key figure is the VRAM of your discrete graphics card. Common configurations on the market include:
Entry-level graphics cards:
- RTX 3060: 12GB VRAM
- RTX 3070: 8GB VRAM
- RTX 3070 Ti: 8GB VRAM
Mid-range graphics cards:
- RTX 3080: 10GB VRAM
- RTX 3080 Ti: 12GB VRAM
- RTX 4070: 12GB VRAM
High-end graphics cards:
- RTX 4080: 16GB VRAM
- RTX 4090: 24GB VRAM
How to check VRAM:
- Press Win+R and enter "dxdiag"
- Click the Display tab
- Check the "Display Memory" value
Note that the usable VRAM on Windows is slightly less than the nominal value, since the system reserves some for its own use.
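If you'd rather check from code, here's a minimal sketch using PyTorch (this assumes you have PyTorch with CUDA support installed; any tool that queries the GPU works just as well):

```python
# Minimal sketch: query total VRAM via PyTorch (assumes torch with CUDA is installed).
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)  # first GPU
    print(f"GPU: {props.name}, total VRAM: {props.total_memory / 1024**3:.1f} GB")
else:
    print("No CUDA-capable GPU detected.")
```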
Mac: check unified memory
Macs (especially those with Apple Silicon) use a unified memory architecture, so there is no separate VRAM; what matters is the total memory size:
Entry-level configuration:
- MacBook Air M1/M2: 8GB unified memory
- MacBook Pro M1/M2 (base version): 8GB unified memory
Mid-range configuration:
- MacBook Air M2: 16GB/24GB unified memory
- MacBook Pro M2 Pro: 16GB/32GB unified memory
High-end configuration:
- MacBook Pro M2 Max: 32GB/64GB/96GB unified memory
- Mac Studio M2 Ultra: 64GB/128GB/192GB unified memory
How to check memory:
- Click the Apple icon in the upper left corner
- Select "About This Mac"
- Click "More Information" to view the memory size
2. What are model "parameters"?
Before choosing a model, let's use a real-life analogy to understand what the "parameters" of a large language model are:
Imagine teaching a child to understand the world with flashcards. You point at a card and explain:
- To recognize a cat, he remembers: four legs, a tail, and it meows.
- To recognize a dog, he remembers: four legs, a tail, and it barks.
- To recognize a bird, he remembers: two legs, wings, and it can fly.
Each feature here is like a "parameter" of the model. AI models learn the same way, except they have to remember vastly more features:
- A 1.5B model remembers the equivalent of 1.5 billion features
- A 7B model remembers 7 billion features
- A 70B model remembers 70 billion features!
The more parameters, the "smarter" the model, but the more memory is needed to store all that "knowledge". The more you want to learn, the more bookshelf space (memory) you need.
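To turn that intuition into numbers: each parameter stored at FP16 takes 2 bytes, so a rough sketch of the memory needed just to hold the weights looks like this (a back-of-envelope estimate only; real inference needs extra headroom):

```python
# Back-of-envelope: memory needed to store the weights alone, at a given precision.
def weight_memory_gb(params_billions, bytes_per_param=2.0):  # 2 bytes/param = FP16
    return params_billions * 1e9 * bytes_per_param / 1024**3

for size in (1.5, 7, 70):
    print(f"{size}B parameters ≈ {weight_memory_gb(size):.1f} GB at FP16")
# 1.5B ≈ 2.8 GB, 7B ≈ 13.0 GB, 70B ≈ 130.4 GB
```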
3. What versions of DeepSeek-R1 are available?
DeepSeek-R1 provides multiple versions to adapt to different hardware environments:
- Lightweight versions (1.5B - 14B):
  - DeepSeek-R1-Distill-Qwen-1.5B: requires only 0.7GB of VRAM
  - DeepSeek-R1-Distill-Qwen-7B: requires 3.3GB of VRAM
  - DeepSeek-R1-Distill-Qwen-14B: requires 6.5GB of VRAM
- Medium versions (32B - 70B):
  - DeepSeek-R1-Distill-Qwen-32B: requires 14.9GB of VRAM
  - DeepSeek-R1-Distill-Llama-70B: requires 32.7GB of VRAM
- Full version (also known as the "full-blooded" version): DeepSeek-R1 requires up to 1,342GB of VRAM (a multi-GPU setup)
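To map these figures onto your own card, here's an illustrative helper; the VRAM numbers are the estimates listed above, not official requirements:

```python
# Illustrative: pick the largest distilled version that fits a given VRAM budget.
# The figures are this article's estimates, not official requirements.
REQUIRED_VRAM_GB = {
    "DeepSeek-R1-Distill-Qwen-1.5B": 0.7,
    "DeepSeek-R1-Distill-Qwen-7B": 3.3,
    "DeepSeek-R1-Distill-Qwen-14B": 6.5,
    "DeepSeek-R1-Distill-Qwen-32B": 14.9,
    "DeepSeek-R1-Distill-Llama-70B": 32.7,
}

def largest_fit(vram_gb):
    fits = [(req, name) for name, req in REQUIRED_VRAM_GB.items() if req <= vram_gb]
    return max(fits)[1] if fits else None

print(largest_fit(12.0))  # -> DeepSeek-R1-Distill-Qwen-14B
```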
4. Quantization: putting the model on a diet
If your VRAM falls short, don't worry! Let's look at how the magic of "quantization" helps a model "lose weight".
What is Quantization?
We actually run into the idea of quantization all the time in daily life:
Image compression: a high-definition SLR photo may use millions of colors (even hundreds of millions!). If we group similar colors together (for example, merging every dark blue into one blue), the file gets much smaller, yet the difference is barely visible.
MP3 music: lossless audio files are huge, but once compressed to MP3, some detail is lost while most people can hardly hear the difference, and the file is far smaller.
Phone photos: choose "Standard" instead of "HD" mode when shooting, and the photos take up less space, with no obvious quality loss in everyday viewing.
Model quantization works the same way:
- The original model is like an artist who can distinguish hundreds of shades of blue.
- The quantized model is like an ordinary painter who uses only a dozen shades of blue, yet the paintings still look beautiful.
With quantization, we can significantly cut a model's VRAM requirements. Take the 7B model as an example:
- Original version (FP16): requires about 13GB of VRAM
- 8-bit quantization (INT8): requires about 6.5GB of VRAM
- 4-bit quantization (INT4): requires only about 3.25GB of VRAM
Simply put, it's like neatly fitting the furniture from a two-bedroom apartment into a one-bedroom, while largely keeping the original quality of life!
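The 7B numbers above fall straight out of the bytes-per-parameter arithmetic:

```python
# Worked example: the 7B model's weight footprint at each precision.
PARAMS = 7e9
for label, bytes_per_param in [("FP16", 2.0), ("INT8", 1.0), ("INT4", 0.5)]:
    print(f"{label}: ≈ {PARAMS * bytes_per_param / 1024**3:.2f} GB")
# FP16 ≈ 13.04 GB, INT8 ≈ 6.52 GB, INT4 ≈ 3.26 GB
```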
5. Which version should I choose?
First, take a look at my hands-on deployment demo: running a 32B model locally on an RTX 4090, combined with Cline:
(See: Cline+DeepSeek-R1 pure local development hands-on: smoother than Dove! My deployment and usage process)
Based on your graphics card (Windows) or memory (Mac) configuration, the specific recommendations are:
- 8GB VRAM card (e.g., RTX 3070):
  - Recommended: DeepSeek-R1-Distill-Qwen-7B (4-bit quantized)
  - Runs basic dialogue and code-generation tasks smoothly
- 12GB VRAM card (e.g., RTX 3060):
  - Recommended: DeepSeek-R1-Distill-Qwen-14B (4-bit quantized)
  - Handles more complex conversations and programming tasks
- 24GB VRAM card (e.g., RTX 4090):
  - Recommended: DeepSeek-R1-Distill-Qwen-32B
  - Runs a larger model for a near-"full-blooded" experience
6. Practical deployment suggestions
- It's best to leave 20-30% of VRAM as headroom. For example, with 12GB of VRAM, choose a configuration that uses no more than 8-9GB (see the quick calculation below).
- If answer quality matters most, prefer a larger model; if response speed matters most, use a quantized version of a small model.
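Here's the headroom rule as a quick calculation:

```python
# Quick sketch of the 20-30% VRAM headroom rule.
vram_gb = 12
for reserve in (0.20, 0.30):
    print(f"Reserving {reserve:.0%}: keep the model under ~{vram_gb * (1 - reserve):.1f} GB")
# A 12GB card -> keep the model under roughly 8.4-9.6 GB
```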
Of course, these are all theoretical values. Start with a smaller model and work your way up based on actual results; if you run out of VRAM, try a quantized version!