Deployment and maintenance of the SRE-specific large model

Written by
Caleb Hayes
Updated on: June 29th, 2025
Recommendation

A new tool for SRE (Site Reliability Engineering) is here: a 7B-parameter domain-specific large model that helps automate operations and maintenance work.

Core content:
1. Introduction to a 7B-parameter SRE-specific large model based on the DeepSeek architecture
2. Guide to setting up the deployment environment on an Alibaba Cloud GPU server
3. Steps for deploying the Docker image and installing dependent components

A while ago, a former colleague told me that he had fine-tuned a large model dedicated to the SRE field (a 7B-parameter model based on the DeepSeek architecture, fine-tuned with LoRA and designed specifically for operations and maintenance tasks; it has three enhanced capabilities: automated script generation, system monitoring and analysis, and troubleshooting with root-cause location). I planned to deploy it and try it out over the two-day holiday.
Preparation
  • We chose an Alibaba Cloud GPU server as the deployment environment, because a local Mac cannot run the model.
  • Recommended configuration: a system disk of at least 100 GB and 60 GB of memory (a quick check is sketched right after this list).
  • Dependent components are packaged and installed through a Docker image; the component versions can be read from the image tag used below (vLLM 0.8.2, PyTorch 2.6, CUDA 12.4).
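Before installing anything, it is worth confirming that the instance actually meets these numbers; a minimal check with standard Linux tools:

# Confirm the system disk has at least 100 GB
df -h /
# Confirm total memory is around 60 GB
free -g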

Installation Steps
1. Create a GPU instance and install the Tesla driver

The following GPU instance families are supported:

  • gn6e, ebmgn6e

  • gn7i, ebmgn7i, ebmgn7ix

  • gn7e, ebmgn7e, ebmgn7ex

  • ebmgn8v, ebmgn8is

Image: this walkthrough uses the Ubuntu 20.04 operating system as an example. To use the vLLM container image on a GPU instance, the Tesla driver (version 535 or higher) must already be installed on the instance; we recommend selecting the Install GPU Driver option when purchasing the GPU instance in the ECS console. A quick way to verify the driver is shown below.
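A minimal driver check with nvidia-smi, which is installed together with the Tesla driver:

# Print GPU model and driver version; the driver column should read 535 or higher
nvidia-smi --query-gpu=name,driver_version --format=csv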


2. Remotely connect to the GPU instance. We recommend Warp, an AI-powered terminal (strongly recommended).
3. Execute the following commands to install the Docker environment.

sudo apt-get update
sudo apt-get -y install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL http://mirrors.cloud.aliyuncs.com/docker-ce/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] http://mirrors.cloud.aliyuncs.com/docker-ce/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io

4. Execute the following command to check whether Docker was installed successfully.

docker -v

5. Execute the following command to install the nvidia-container-toolkit.

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
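Note that on a fresh install, the toolkit usually also has to be registered as a Docker runtime before --gpus all works; a minimal sketch (the CUDA image tag is an assumption, any CUDA base image will do):

# Register the NVIDIA runtime with Docker (writes /etc/docker/daemon.json)
sudo nvidia-ctk runtime configure --runtime=docker
# Restart Docker so the change takes effect (step 6 below restarts it as well)
sudo systemctl restart docker
# Sanity check: nvidia-smi should list the GPUs from inside a container
sudo docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu20.04 nvidia-smi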


6. Set Docker to start on boot and restart the Docker service.

sudo systemctl enable docker 
sudo systemctl restart docker
7. Execute the following command to check whether Docker is running.
sudo systemctl status docker

8. Execute the following command to pull the vLLM image.

sudo docker pull egs-registry.cn-hangzhou.cr.aliyuncs.com/egs/vllm:0.8.2-pytorch2.6-cu124-20250328

9. Execute the following command to run the vLLM container.

sudo docker run -d -t --net=host --gpus all \
  --privileged \
  --ipc=host \
  --name vllm \
  -v /root:/root \
  egs-registry.cn-hangzhou.cr.aliyuncs.com/egs/vllm:0.8.2-pytorch2.6-cu124-20250328

10. Execute the following command to check whether the vLLM container started successfully.

docker ps
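If the vllm container does not show up, its logs usually explain why; a quick check (the container name vllm comes from step 9):

# Show only the vllm container and its status
sudo docker ps --filter name=vllm --format '{{.Names}}: {{.Status}}'
# Inspect the most recent log lines if it exited or keeps restarting
sudo docker logs --tail 50 vllm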
Verification steps
1. Install the git-lfs command and download the large model locally.

apt install git-lfs
cd /root
git lfs clone https://www.modelscope.cn/phpcool/DeepSeek-R1-Distill-SRE-Qwen-7B.git
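Cloning pulls several multi-gigabyte weight files through Git LFS, so it is worth verifying the download completed; a rough sanity check (the exact file layout of this repository is an assumption, typical of safetensors checkpoints):

# Total size on disk; a 7B checkpoint in 16-bit weights is typically ~14-15 GB
du -sh /root/DeepSeek-R1-Distill-SRE-Qwen-7B
# Weight files of only a few hundred bytes are unfetched LFS pointers, not real weights
ls -lh /root/DeepSeek-R1-Distill-SRE-Qwen-7B/*.safetensors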
2. Enter the vLLM container
docker exec -it vllm bash
3. Start the vLLM inference service
vllm serve /root/DeepSeek-R1-Distill-SRE-Qwen-7B --tensor-parallel-size 1 --max-model-len 2048 --enforce-eager
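Here --tensor-parallel-size 1 runs the model on a single GPU, --max-model-len 2048 caps the context window, and --enforce-eager disables CUDA graph capture to save GPU memory at some throughput cost. Loading the weights takes a while; one way to wait for readiness is to poll the OpenAI-compatible /v1/models endpoint (a minimal sketch, assuming the default port 8000):

# Poll until the API server responds; vLLM serves an OpenAI-compatible API on port 8000 by default
until curl -sf http://localhost:8000/v1/models > /dev/null; do
  echo "waiting for vLLM to finish loading the model..."
  sleep 5
done
echo "vLLM inference service is ready"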

Once the startup log shows the API server listening (port 8000 by default), the vLLM inference service has started.



4. Test it with curl on the GPU server.

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "/root/DeepSeek-R1-Distill-SRE-Qwen-7B",
    "messages": [
      {"role": "system", "content": "You are an intelligent operation and maintenance assistant."},
      {"role": "user", "content": "How to optimize the storage performance of the server to increase data reading and writing speed?"}
    ]
  }'
The service returns an OpenAI-compatible JSON chat completion containing the model's answer.
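To keep only the assistant's reply instead of the full JSON, the same request can be piped through jq (a sketch; jq may first need apt install jq):

# Same request as above, printing only the generated answer text
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "/root/DeepSeek-R1-Distill-SRE-Qwen-7B",
    "messages": [
      {"role": "user", "content": "How to optimize the storage performance of the server to increase data reading and writing speed?"}
    ]
  }' | jq -r '.choices[0].message.content'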
Summary
The large model is now deployed. Next, we will replay real production logs to simulate online failures and test the model's ability to identify their root causes. We will share more in a future post.