vLLM Private Deployment of the Full DeepSeek-R1-671B Model

Written by
Clara Bennett
Updated on: July 13, 2025
Recommendation

Master an efficient approach to private deployment with vLLM.

Core content:
1. Server environment and hardware configuration requirements
2. Miniconda installation and environment setup steps
3. Key operations of Docker deployment



Server Environment Preparation

Hardware configuration: 16 × H800 GPUs across multiple machines (two 8-GPU nodes in this guide), using bfloat16 + FP8 quantized mixed-precision inference; output speed is roughly 3 seconds per token.
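Before going further, it is worth confirming that each node can actually see its GPUs. A minimal check, assuming the NVIDIA driver is already installed on every machine:

# List the GPUs visible on this node; with the setup above, expect 8 H800 entries per machine
nvidia-smi --list-gpus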

Install Miniconda

Download the Miniconda installation script

Open a terminal and use the wget command to download the latest version of the Miniconda installation script:

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

Note: If your system architecture is not x86_64, visit the Miniconda official download page to get an installer suited to your system.
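If you are unsure of your architecture, you can check it first. The sketch below assumes the aarch64 installer follows the same naming convention in the Miniconda repository:

# Print the machine architecture (x86_64, aarch64, ...)
uname -m
# Example: download the aarch64 installer instead
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-aarch64.sh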

Add execution permissions to the installation script

After the download is complete, grant executable permissions to the script file:

chmod +x Miniconda3-latest-Linux-x86_64.sh

Run the installation script

Execute the script to start the installation:

./Miniconda3-latest-Linux-x86_64.sh

During the installation, you will see some prompts:

  • License agreement: press Enter to page through the text, then type yes to accept.

  • Installation path: accept the default (usually ~/miniconda3) or specify another path.

  • Initialization: the installer asks whether to initialize conda (that is, automatically modify your shell configuration file, such as ~/.bashrc). Selecting yes is recommended.
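On a headless server you may prefer a non-interactive install. A minimal sketch using the installer's batch mode:

# -b accepts the license without prompting, -p sets the install prefix
./Miniconda3-latest-Linux-x86_64.sh -b -p ~/miniconda3
# Batch mode skips shell initialization, so run it manually afterwards
~/miniconda3/bin/conda init bash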

Activate conda environment

After the installation is complete, reload the shell configuration file to make the changes take effect (or restart the terminal):

source ~/.bashrc

Now you can check whether conda is installed successfully by running:

conda --version

If you see version information such as conda 4.x or later, the installation was successful.

Create and activate a virtual environment

conda create -n deepseek python=3.10
conda activate deepseek
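A quick sanity check that the new environment is active and using the expected interpreter:

python --version    # should report Python 3.10.x
which python        # should point inside ~/miniconda3/envs/deepseek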

Docker Deployment (Ubuntu 20.04 server)

Update package index

sudo apt-get update

Install Necessary Packages

sudo apt-get install apt-transport-https ca-certificates curl gnupg lsb-release

Add Alibaba Cloud official GPG key

curl -fsSL http://mirrors.aliyun.com/docker-ce/linux/ubuntu/gpg | sudo apt-key add -

Setting up Alibaba Cloud Docker repository

sudo sh -c 'echo "deb [arch=amd64] http://mirrors.aliyun.com/docker-ce/linux/ubuntu $(lsb_release -cs) stable" > /etc/apt/sources.list.d/docker.list'

Install Docker

sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io

Verify Installation

sudo systemctl status docker

If the output shows active (running) in green, Docker has started. (Press Ctrl+C to exit the status view.)

Check Docker version

docker --version

Set up Docker to start automatically

sudo systemctl enable docker

Verify whether the automatic start function is enabled

(If the output is enabled, autostart is configured successfully. If not, further inspection is required.)

sudo systemctl is-enabled docker

Configure Docker image acceleration

Create (or open) the daemon.json file for editing:

sudo nano /etc/docker/daemon.json

Edit the content as follows (paste it in, press Ctrl+O then Enter to save, and Ctrl+X to exit):

{
    "registry-mirrors": [
        "https://docker.211678.top",
        "https://docker.1panel.live",
        "https://hub.rat.dev",
        "https://docker.m.daocloud.io",
        "https://do.nark.eu.org",
        "https://dockerpull.com",
        "https://dockerproxy.cn"
    ]
}
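A malformed daemon.json will prevent Docker from starting, so it is worth validating the file before reloading the service. One simple check:

# Prints the parsed JSON, or a parse error if the file is invalid
python3 -m json.tool /etc/docker/daemon.json

After Docker restarts (next steps), running docker info should list these addresses under Registry Mirrors.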

Reload the Docker service configuration file

sudo systemctl reload docker

Stop and restart the Docker service

sudo systemctl restart docker

Check the Docker service status again

sudo systemctl status docker

Running a simple Docker container test

docker run hello-world

Install the NVIDIA Container Toolkit (GPU support for Docker)

wget https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu2204/x86_64/nvidia-container-toolkit-base_1.13.5-1_amd64.deb
wget https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu2204/x86_64/libnvidia-container-tools_1.13.5-1_amd64.deb
wget https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu2204/x86_64/libnvidia-container1_1.13.5-1_amd64.deb
wget https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu2204/x86_64/nvidia-container-toolkit_1.13.5-1_amd64.deb

Install the packages on the server (the paths below assume the .deb files were downloaded to /home/data01):

sudo dpkg -i /home/data01/libnvidia-container1_1.13.5-1_amd64.deb
sudo dpkg -i /home/data01/libnvidia-container-tools_1.13.5-1_amd64.deb
sudo dpkg -i /home/data01/nvidia-container-toolkit-base_1.13.5-1_amd64.deb
sudo dpkg -i /home/data01/nvidia-container-toolkit_1.13.5-1_amd64.deb
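After installing the toolkit packages, Docker usually still needs to be configured to use the NVIDIA runtime. A sketch of the typical follow-up (the CUDA image tag below is just an example; any CUDA base image works for the smoke test):

# Register the NVIDIA runtime in Docker's configuration and restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
# Smoke test: nvidia-smi inside a container should list the host GPUs
sudo docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi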

Pull DeepSeek-R1-671B model

Pull DeepSeek-R1 from the ModelScope community

Repository address: https://www.modelscope.cn/deepseek-ai/DeepSeek-R1.git (cloned with git below)

Remove the rdma-userspace-config-bbc package

If you do not need RDMA services, you can try to remove the problematic packages:

sudo apt remove rdma-userspace-config-bbc

Then execute the following command to clean up the residual configuration in the system:

sudo apt autoremove
sudo apt clean

Install Git LFS (Large File Storage) before pulling the model files

Make sure Git is installed

First check if Git is installed:

git --version

If it is not installed, you can install it using the following command:

sudo apt update && sudo apt install git -y

Install Git LFS

If Git LFS is not installed, you can install it using the following command:

# Ubuntu/Debian
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt install git-lfs -y

Initialize Git LFS

Once the installation is complete, run the following command to initialize Git LFS:

git lfs install
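To confirm that LFS is wired up correctly:

git lfs version    # prints the installed git-lfs version
git lfs env        # shows the LFS configuration git will use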

Pull DeepSeek-R1-671B project command

git clone https://www.modelscope.cn/deepseek-ai/DeepSeek-R1.git
  • This command may appear to hang while the model downloads; you can confirm progress by checking that free disk space is decreasing (see the monitoring sketch below the disk listing).

  • The model files total roughly 1.3 TB, so make sure sufficient disk space is available.

df -h
Sample output of the server's disks:
Filesystem Size Used Avail Use% Mounted on
tmpfs 1.5T 0 1.5T 0% /dev/shm
tmpfs 605G 13M 605G 1% /run
tmpfs 5.0M 0 5.0M 0% /run/lock
/dev/sda2 91G 14G 77G 16% /
/dev/sda3 1.9G 5.3M 1.9G 1% /boot/efi
/dev/sda4 341G 40K 340G 1% /home
/dev/nvme0n1p1 3.4T 3.7G 3.4T 1% /home/data01
/dev/nvme2n1p1 3.4T 24K 3.4T 1% /home/data03
/dev/nvme1n1p1 3.4T 24K 3.4T 1% /home/data02
/dev/nvme3n1p1 3.4T 24K 3.4T 1% /home/data04
tmpfs 303G 4.0K 303G 1% /run/user/0

(On a 200 Mbps connection, the download takes about 8-10 hours.)
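Since git clone prints little progress information for LFS objects, a simple way to monitor the download (paths assume the clone lives under /home/data01):

# Refresh free disk space every 60 seconds while the clone runs
watch -n 60 df -h /home/data01
# Or track the growing size of the model directory itself
du -sh /home/data01/DeepSeek-R1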

vLLM Official Startup Script

Official distributed-serving reference: https://docs.vllm.ai/en/latest/serving/distributed_serving.html

Find the .sh inference run script on the official website (see the wget sketch after this list):

  1. Open the documentation link above and scroll down to the multi-node vLLM section.

  2. In the first line of the second paragraph under the multi-node heading, find the GitHub link.

  3. Following that link downloads the script file to your local computer.
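For convenience, the helper script can also be fetched directly from the vLLM GitHub repository. The path below reflects the repository layout at the time of writing and may have moved since:

wget https://raw.githubusercontent.com/vllm-project/vllm/main/examples/online_serving/run_cluster.sh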

First, pull the vLLM image on a server or computer with unrestricted Internet access (for example, through a proxy)

Make sure the image exists

First, check whether the vllm/vllm-openai image has already been pulled successfully. Run on the command line:

docker images

This lists all images that exist locally. If vllm/vllm-openai is not in the list, you need to pull it first.

Pull the image

If the image is not found, pull the vllm/vllm-openai image using the following command.

The following pulls the image locally on Windows (open Command Prompt and make sure the local Docker installation is usable):

Pull vllm and wait for the download to complete:

docker pull vllm/vllm-openai

Confirm that the pull was successful

Run docker images again and confirm that the vllm/vllm-openai image now appears in the local image list.

Export the image with docker save

After the image is pulled, run:

docker save -o vllm.tar vllm/vllm-openai

If everything goes well, the image is saved as a .tar file.

At this point, vllm.tar and run_cluster.sh are both on the local computer and must be transferred to the servers manually.
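One way to transfer both files to the master node (hostname and destination path are placeholders):

# Copy the saved image and the cluster script to the master server
scp vllm.tar run_cluster.sh root@<master-ip>:/home/data01/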

Multi-Server File Operations

The server now holds three items: the model files, the vLLM image archive, and the run script.

Upload all files from the master node to the same location on the slave node (the two machines are connected over an intranet; other transfer methods can also be used):

rsync -avz /home/model/DeepSeek-R1 root@XXX.XXX.XX.XX:/XXX/XX/

On each server, change to the directory containing vllm.tar and load the image:

docker load -i vllm.tar

After the image has loaded, check the IP address of each of the two machines:

ip a

On the master server, run:

bash run_cluster.sh vllm/vllm-openai 192.168.0.113 --head /home/data01/ -v /home/data01/:/models -e VLLM_HOST_IP=192.168.0.113 -e GLOO_SOCKET_IFNAME=eth0 -e NCCL_SOCKET_IFNAME=eth0

On the slave server, run:

bash run_cluster.sh vllm/vllm-openai 192.168.0.113 --worker /home/data01/ -v /home/data01/:/models -e VLLM_HOST_IP=192.168.0.116 -e GLOO_SOCKET_IFNAME=eth0 -e NCCL_SOCKET_IFNAME=eth0

Open a new terminal to view the running containers:

docker ps -a

Enter the container:

docker exec -it <container ID> /bin/bash

Inside the container, check that all GPUs across both nodes are visible to the Ray cluster:

ray status

Start the vllm service:

vllm serve /models/DeepSeek-R1/ --tensor-parallel-size 8 --pipeline-parallel-size 2 --served-model-name deepseek_r1 --enforce-eager --trust-remote-code --dtype float16
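Once the server is up, you can exercise the OpenAI-compatible API. A minimal sketch, assuming the default port 8000 and that the container shares the host network:

# List the models the server exposes
curl http://localhost:8000/v1/models
# Send a simple chat completion request using the served model name from above
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "deepseek_r1", "messages": [{"role": "user", "content": "Hello"}]}'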

Done!