Xinference local deployment: full walkthrough and troubleshooting

Written by Audrey Miles
Updated on: July 9, 2025

Master Xinference local deployment and learn to resolve common deployment problems.

Core content:
1. Basic environment configuration: Docker and NVIDIA driver verification, CUDA toolchain setup
2. Docker container deployment: image pulling, container startup, GPU acceleration mode
3. Windows-specific configuration: network stack limitations and fixes


1. Basic environment configuration

1.  Docker and NVIDIA driver verification

Core steps:

  • Docker installation verification:
    docker --version   # Requires ≥ 24.0.5 (2025 compatibility requirement)
  • NVIDIA driver compatibility (GPU passthrough also needs the NVIDIA Container Toolkit; see the sketch after this list):
    • Check the driver version (must be ≥ 535.129.03):
      nvidia-smi | grep "Driver Version"   # Output example: Driver Version: 571.96.03
    • If the driver is missing or too old:
      sudo apt install -y nvidia-driver-570-server   # Enterprise-grade stable driver
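
Because the container is later started with --gpus all, Docker on the host must also be able to hand GPUs to containers. The article skips this step, so here is a minimal sketch assuming Ubuntu with NVIDIA's apt repository already configured:

# Install the NVIDIA Container Toolkit so Docker can expose GPUs to containers
sudo apt install -y nvidia-container-toolkit
# Register the NVIDIA runtime with Docker, then restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
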
2.  CUDA toolchain configuration

Key operations:

# Rebuild the CUDA repository (for Ubuntu 24.04)
sudo tee /etc/apt/sources.list.d/cuda.list <<EOF
deb https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu2404/x86_64/ /
EOF
# Move the signing key to the new canonical keyring path (apt-key is deprecated)
sudo mkdir -p /etc/apt/keyrings && sudo cp /etc/apt/trusted.gpg /etc/apt/keyrings/nvidia-cuda.gpg
sudo apt update
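
With the repository in place, the toolkit itself still needs to be installed. A minimal sketch follows; the package version 12-4 is an assumption, so pick the release that matches your driver:

sudo apt install -y cuda-toolkit-12-4   # Version is an assumption; match it to your driver
nvcc --version   # Verify the compiler (you may need /usr/local/cuda/bin on PATH)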

2. Docker container deployment

1.  Image pulling and container startup

GPU acceleration mode:

# -e XINFERENCE_MODEL_SRC=modelscope : specify the model download source
# -p 9998:9997                       : port mapping (host:container)
# --gpus all                         : enable GPU passthrough
# -v ...:ro                          : mount host driver libraries read-only
docker run -d --name xinference \
  -e XINFERENCE_MODEL_SRC=modelscope \
  -p 9998:9997 \
  --gpus all \
  -v /host/cuda/libs:/usr/lib/x86_64-linux-gnu:ro \
  xprobe/xinference:latest \
  xinference-local -H 0.0.0.0 --log-level debug

Verify GPU passthrough:

docker exec xinference nvidia-smi   # Output should match the host machine
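
It is also worth confirming that the API itself is up. A quick probe against the mapped host port (9998 in the run command above):

curl http://localhost:9998/v1/models   # Should return JSON (an empty list until a model is launched)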

2.  Windows-specific configuration

Source of the problem:

  • The Windows network stack has limited support for binding to 0.0.0.0; use 127.0.0.1 instead.

Fix:
# Container startup command adjustment (PowerShell)
# -v C:\xinference:/xinference : Windows path mount
docker run -d --name xinference `
  -v C:\xinference:/xinference `
  -p 9997:9997 `
  --gpus all `
  xprobe/xinference:latest `
  xinference-local -H 127.0.0.1 --log-level debug

Firewall configuration:

netsh advfirewall firewall add rule name="Xinference" dir=in action=allow protocol=TCP localport=9997
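
After adding the rule, confirm the port actually answers locally (curl.exe ships with current Windows builds):

curl.exe http://127.0.0.1:9997/v1/models   # Should return JSON rather than a connection error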

3. Model loading and API calls

1.  Model deployment process

Steps in detail:

  1. Upload the model files:
    docker cp qwen2.5-instruct/ xinference:/xinference/models/   # Host to container
  2. Start the model service:
    xinference launch -n qwen2.5-instruct -f pytorch -s 0_5   # Specify the model format and size in billions
  3. Verify the model status (see the end-to-end sketch after this list):
    curl http://localhost:9997/v1/models   # Check that the status is "Running"
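
Once the model is running, an end-to-end request can go through the OpenAI-compatible endpoint that Xinference exposes. A minimal sketch, with the port and model name taken from the launch command above:

curl http://localhost:9997/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen2.5-instruct", "messages": [{"role": "user", "content": "Hello"}]}'
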
2.  API Integration Examples

Python SDK call:

from xinference.client import Client

client = Client("http://localhost:9998")   # Note: the port mapped on the host machine
model_uid = client.launch_model(
    model_name="rerank-chinese",   # Custom rerank model from the deployment above
    model_type="rerank",
)
model = client.get_model(model_uid)
response = model.rerank(
    ["TensorFlow", "PyTorch", "Xinference"],   # Candidate documents
    "deep learning framework",                 # Query
)
print(response["results"])   # Each result carries a relevance_score
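
The same call is also available over plain HTTP. A sketch against the /v1/rerank endpoint, where <model_uid> is a placeholder for the UID returned by launch_model:

curl http://localhost:9998/v1/rerank \
  -H "Content-Type: application/json" \
  -d '{"model": "<model_uid>", "query": "deep learning framework", "documents": ["TensorFlow", "PyTorch", "Xinference"]}'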

4. Production environment optimization

1.  Stability enhancement configuration
  • Container persistence (automatic restart plus a named data volume):
    docker run -d --restart unless-stopped \
      -v xinference_data:/root/.xinference \
      xprobe/xinference:latest
  • Enterprise-level mirror source (mainland China acceleration):
    sed -i 's|developer.download.nvidia.com|mirrors.aliyun.com/nvidia|g' /etc/apt/sources.list.d/cuda.list
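
To confirm the named volume really persists data across restarts, a quick check (names follow the commands above):

docker volume inspect xinference_data   # Shows the volume's mountpoint on the host
docker restart xinference && docker exec xinference ls /root/.xinference   # Contents should survive the restart
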
2.  Performance monitoring and tuning
  • Real-time resource monitoring:
    watch -n 1 nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv
  • Flame graph generation:
    xinference profile -m rerank-chinese -o profile.html   # Locate inference bottlenecks
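
For longer test runs, nvidia-smi can log the same counters to a file by itself. A minimal sketch:

# Sample GPU utilization and memory once per second into a CSV log
nvidia-smi --query-gpu=timestamp,utilization.gpu,memory.used --format=csv -l 1 >> gpu_usage.csv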

5. Cross-platform compatibility overview

  • Ubuntu 24.04
    • Key configuration: driver mount -v /usr/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:ro
    • Verify command: ls /usr/lib/x86_64-linux-gnu/libcuda*
  • Windows 11
    • Key configuration: address binding -H 127.0.0.1; directory mount -v C:\xinference:/xinference
    • Verify command: docker logs xinference --tail 100