Test report: running the DeepSeek-R1 and QwQ-32B models on the Mac mini

This report explores the Mac mini's application potential for deep learning and presents measured performance of the 2025 Mac mini with M4/M4 Pro chips.
Core content:
1. Overview of the Mac mini hardware configuration and chip performance comparison
2. Measured performance of the DeepSeek-R1 and QwQ-32B models on the Mac mini
3. Detailed analysis of inference speed, memory usage, and hardware load
Test object: 2025 Mac mini (M4 / M4 Pro chips)
Test models: DeepSeek-R1 (14B/32B), QwQ-32B (original / quantized versions)
Test objectives: hardware suitability, inference speed, memory usage, and optimization approaches
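Assuming the models are served through Ollama (as listed in the test environment appendix), the locally pulled model tags and their on-disk sizes can be verified with a small script against the local API. A minimal sketch, assuming the default port 11434:

```python
# List locally pulled Ollama models and their on-disk sizes, to confirm which
# DeepSeek-R1 / QwQ variants are present. Assumes Ollama's default port 11434.
import json
import urllib.request

with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
    tags = json.load(resp)

for m in tags.get("models", []):
    print(f"{m['name']:<24} {m['size'] / 1024**3:6.1f} GB")
```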
1. Mac mini hardware configuration overview
[Table: M4 vs. M4 Pro configuration comparison — chip, memory, storage, memory bandwidth, interfaces, power consumption/heat dissipation; key specifications are summarized below.]
Chip performance
• M4: 10-core CPU (4 performance cores + 6 efficiency cores) and 10-core GPU, 16GB unified memory standard (configurable up to 32GB), SSD storage up to 2TB.
• M4 Pro: base configuration of a 12-core CPU (8 performance cores + 4 efficiency cores) and 16-core GPU, 16GB memory standard (configurable up to 64GB), SSD storage up to 8TB; top configuration: 14-core CPU + 20-core GPU.
• The unified memory architecture lets the CPU, GPU, and Neural Engine share data at high speed, which makes it particularly well suited to AI workloads.
Expandability and interfaces
• Thunderbolt 4 ports (M4, 40Gb/s) or Thunderbolt 5 ports (M4 Pro), HDMI output supporting 4K/6K displays, Gigabit/10Gb Ethernet, and more, meeting the needs of multi-display setups and high-speed peripheral connections.
Heat dissipation and power consumption
• The redesigned cooling system optimizes airflow and, combined with the high energy efficiency of the M4-series chips, keeps noise low (about 5 dBA) even when running AI models under heavy load.
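To confirm the chip and unified memory size of a given test machine, a quick check along these lines can be used (a minimal sketch calling macOS's sysctl through the standard library; the keys shown exist on Apple silicon, but output formatting may vary):

```python
# Quick sanity check of the test machine's chip and unified memory size on macOS.
# Uses sysctl keys available on Apple silicon; printed values are examples only.
import subprocess

def sysctl(key: str) -> str:
    return subprocess.run(
        ["sysctl", "-n", key], capture_output=True, text=True, check=True
    ).stdout.strip()

chip = sysctl("machdep.cpu.brand_string")     # e.g. "Apple M4 Pro"
mem_gb = int(sysctl("hw.memsize")) / 1024**3  # unified memory in GiB
print(f"Chip: {chip}, unified memory: {mem_gb:.0f} GB")
```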
2. Measured model performance comparison
1. DeepSeek-R1 Series
[Table: DeepSeek-R1 (14B/32B) measured results — memory usage, inference speed, first-load time, typical-scenario latency, hardware load.]
2. QwQ-32B Series
[Table: QwQ-32B (original/quantized) measured results — memory usage, inference speed, first-load time, typical-scenario latency, hardware load.]
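Throughput figures like those in the tables above can be collected with a short script against Ollama's local HTTP API. The sketch below is a minimal example; it assumes Ollama is running on the default port 11434 and that the listed model tags have been pulled, and the tags and prompt are illustrative rather than the exact test inputs:

```python
# Rough throughput benchmark against Ollama's local HTTP API.
# Assumes Ollama is running on the default port (11434) and that the model tags
# below have already been pulled; tags and prompt are illustrative placeholders.
import json
import urllib.request

def measure(model: str, prompt: str) -> dict:
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    ns = 1e9  # Ollama reports durations in nanoseconds
    return {
        "model": model,
        "load_s": data.get("load_duration", 0) / ns,
        "prompt_tok_per_s": data.get("prompt_eval_count", 0) * ns
        / max(data.get("prompt_eval_duration", 1), 1),
        "gen_tok_per_s": data.get("eval_count", 0) * ns
        / max(data.get("eval_duration", 1), 1),
    }

if __name__ == "__main__":
    for tag in ("deepseek-r1:14b", "deepseek-r1:32b", "qwq:32b"):
        print(measure(tag, "Write a Python script that draws a line chart with matplotlib."))
```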
3. Key scenario testing
1. Code generation (Python line-chart script; an illustrative example of the target script follows this list)
2. Mathematical reasoning (AIME24, Question 7)
3. Long-text processing (32K legal contract comparison)
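For reference, the code-generation scenario asks for roughly this kind of script (an illustrative sketch only, with placeholder data; matplotlib is assumed to be installed, and this is not actual model output from the test):

```python
# Illustrative target for the code-generation scenario: a minimal matplotlib line chart.
# The data and labels here are placeholders, not the prompt or output used in the test.
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
revenue = [12.0, 13.5, 11.8, 15.2, 16.9, 18.4]

plt.figure(figsize=(8, 4))
plt.plot(months, revenue, marker="o", linewidth=2)
plt.title("Monthly Revenue")
plt.xlabel("Month")
plt.ylabel("Revenue (10k)")
plt.grid(True, linestyle="--", alpha=0.5)
plt.tight_layout()
plt.savefig("revenue_line_chart.png", dpi=150)
```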
4. Hardware adaptation and optimization suggestions
Configuration selection priority
• Limited budget: M4 + 16GB + QwQ-32B Q4 quantized version (best price/performance ratio).
• Professional development: M4 Pro + 64GB + DeepSeek-R1:32B (covers complex tasks in full).
Performance optimization
• Required:
◦ Use quantized models (Q4_K_M or Q5_K_S) to reduce memory usage.
◦ Attach an external Thunderbolt 5 NVMe SSD (such as a Samsung T9) to speed up model loading.
• Advanced:
◦ Use the vmtouch tool to lock the model files in the file cache and reduce swap latency.
◦ Enable --metal_flash_attention in the MLX framework to improve GPU utilization.
Tips for avoiding pitfalls
• Avoid running Docker or Xcode at the same time on 16GB models.
• DeepSeek-R1:32B requires turning off macOS memory compression (sudo nvram boot-args="vm_compressor=0").
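To check whether a 16GB configuration is actually swapping while a model generates (the situation the vmtouch and background-app tips above are meant to prevent), memory and swap usage can be sampled during a run. A minimal sketch, assuming the third-party psutil package is installed:

```python
# Sample unified-memory and swap usage while a model is generating, to spot swapping
# on 16GB configurations. psutil is a third-party package: pip install psutil
import time
import psutil

def sample(seconds: int = 60, interval: float = 2.0) -> None:
    for _ in range(int(seconds / interval)):
        vm = psutil.virtual_memory()
        sw = psutil.swap_memory()
        print(
            f"mem used {vm.used / 1024**3:5.1f} GB ({vm.percent:4.1f}%)  "
            f"swap used {sw.used / 1024**3:5.1f} GB"
        )
        # Sustained swap growth during generation usually means the model or
        # quantization level is too large for the available unified memory.
        time.sleep(interval)

if __name__ == "__main__":
    sample()
```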
5. Conclusion
The Mac mini's ability to run large models approaches that of a mid-range GPU workstation:
• ✅ DeepSeek-R1:32B: suited to complex enterprise-level scenarios, but requires the top configuration of 14-core CPU + 20-core GPU + 64GB memory.
• ✅ QwQ-32B: the first choice for individual developers. The quantized version runs smoothly on a 16GB machine, though quantization makes its output quality merely average; the full version is comparable to DeepSeek-R1:32B.
Final recommendation: choose the model based on task complexity first, then reduce cost through quantization and hardware optimization.
Appendix: Test environment
• System version: macOS Sequoia 15.0
• Framework tools: MLX 0.8.2 + Ollama 0.6.2
• Test tools: custom Python scripts, AIME24 question bank, LiveCodeBench