AI Xiaozhi connects to Qwen 3: responses 30% faster, and Opus transmission raises concurrency by 1000%

The Qwen 3 model and Opus transmission bring new breakthroughs in AI processing speed and concurrency.
Core content:
1. The Qwen 3 (Tongyi Qianwen) model release and the optimization behind the 30% speed increase
2. Local deployment of the Qwen3 model: how response latency drops to about 0.3 s
3. Opus transmission: reducing bandwidth pressure and raising concurrency to 1000% of the original
We made two optimizations on the server side.
Qwen 3 was released on April 28 and is currently the fastest Qwen model. The fast-charging AI integrated software has been adapted to it; thinking mode must be turned off to realize the speed advantage. In testing, the original 1.6 s response time improved to about 1.1 s.
Online deployment
In the website backend, configure the model name as: qwen-turbo-latest
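To make the configuration concrete, here is a minimal sketch of the chat request an OpenAI-compatible backend would send for this model. The endpoint URL and the `enable_thinking` flag are assumptions based on common Qwen3 usage, not taken from the article; check your provider's documentation before relying on them.

```python
import json

# Assumed OpenAI-compatible endpoint for Qwen models (verify against your provider).
DASHSCOPE_BASE = "https://dashscope.aliyuncs.com/compatible-mode/v1"

def build_chat_payload(user_text: str) -> dict:
    """Build a chat-completion payload with thinking mode off for speed."""
    return {
        "model": "qwen-turbo-latest",   # the model name configured in the backend
        "messages": [{"role": "user", "content": user_text}],
        "stream": True,                  # stream tokens for lower perceived latency
        "enable_thinking": False,        # assumed flag: disable thinking mode on Qwen3
    }

payload = build_chat_payload("Hello")
print(json.dumps(payload, ensure_ascii=False))
```

Streaming plus disabled thinking mode is what yields the ~1.1 s first-response times reported above.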
Local deployment
If the qwen3 model is deployed locally, response latency is roughly 0.3 s, as shown in the figure below.
However, local deployment is expensive and hard to maintain: it requires an RTX 4090 GPU, a public IP address, and a machine powered on 24 hours a day. It suits those who pursue extreme performance; roughly speaking, an investment of 30,000 to 50,000 yuan buys about 1 second of latency.
The second optimization: both ASR and TTS now use Opus transmission, which greatly reduces bandwidth pressure. The bitrate is about 1/10 of raw PCM, and concurrency rises to 1000% of the original.
Why do I say this? As the figure below shows, at the same sound quality the ogg (Opus) file is much smaller than the pcm file.
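The 1/10 figure can be sanity-checked with simple arithmetic: raw 16 kHz / 16-bit / mono PCM runs at 256 kbit/s, while Opus speech is commonly encoded around 24 kbit/s (the exact Opus bitrate is my assumption, not stated in the article).

```python
# Compare the raw PCM bitrate with a typical Opus speech bitrate.
pcm_bitrate = 16_000 * 16 * 1   # sample rate * bits per sample * channels = 256000 bit/s
opus_bitrate = 24_000           # a common Opus setting for speech (assumed)

print(pcm_bitrate)                 # 256000
print(pcm_bitrate / opus_bitrate)  # roughly 10x, i.e. about 1/10 the bandwidth
```

Since the same server bandwidth now carries about ten times as many audio streams, the claimed tenfold (1000%) concurrency increase follows directly.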
The project used here is PyOgg, which wraps the Opus codec. The encoded audio is handed to Volcano Engine's big-model speech recognition; Alibaba's speech recognition is also supported.
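Opus encodes fixed-duration frames rather than arbitrary byte chunks, so the raw PCM stream has to be sliced before encoding. A minimal sketch of that slicing (the 100 ms segment length mirrors the seg_duration default in the client code; the helper name is my own):

```python
def split_pcm_frames(pcm: bytes, rate=16000, bits=16, channels=1, frame_ms=100):
    """Slice raw PCM into fixed-duration frames ready for an Opus encoder."""
    bytes_per_frame = rate * (bits // 8) * channels * frame_ms // 1000
    return [pcm[i:i + bytes_per_frame] for i in range(0, len(pcm), bytes_per_frame)]

# 1 second of 16 kHz / 16-bit mono silence -> ten 100 ms frames of 3200 bytes each
frames = split_pcm_frames(b"\x00" * 32000)
print(len(frames), len(frames[0]))  # 10 3200
```

Each frame would then be passed through PyOgg's Opus encoder before being streamed to the recognition service.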
The code is as follows:

def __init__(self, **kwargs):
    self.success_code = 1000  # success code, default is 1000
    self.seg_duration = int(kwargs.get("seg_duration", 100))
    self.ws_url = kwargs.get("ws_url", "wss://openspeech.bytedance.com/api/v3/sauc/bigmodel")
    self.uid = kwargs.get("uid", "test")
    self.format = kwargs.get("format", "ogg")
    self.rate = kwargs.get("rate", 16000)
    self.bits = kwargs.get("bits", 16)
    self.channel = kwargs.get("channel", 1)
    self.codec = kwargs.get("codec", "opus")
    self.auth_method = kwargs.get("auth_method", "none")
    self.hot_words = kwargs.get("hot_words", None)
    self.streaming = kwargs.get("streaming", True)
    self.mp3_seg_size = kwargs.get("mp3_seg_size", 1000)
    self.req_event = 1

def construct_request(self, reqid, data=None):
    req = {
        "user": {
            "uid": self.uid,
        },
        "audio": {
            "format": self.format,
            "sample_rate": self.rate,
            "bits": self.bits,
            "channel": self.channel,
            "codec": self.codec,
        },
        "request": {
            "model_name": "bigmodel",
            "enable_punc": True,
        }
    }
    return req
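For reference, before a request like the one built above goes onto the websocket, the JSON body is typically compressed. The sketch below shows that serialization step; the binary frame header is omitted, and treating the compression as gzip is my assumption about the protocol, not something stated in the article.

```python
import gzip
import json

def serialize_request(req: dict) -> bytes:
    """JSON-encode and gzip-compress a request payload before it is framed
    onto the websocket (binary header omitted; gzip is an assumption here)."""
    return gzip.compress(json.dumps(req).encode("utf-8"))

# A request shaped like the one construct_request() returns.
req = {
    "user": {"uid": "test"},
    "audio": {"format": "ogg", "sample_rate": 16000, "bits": 16,
              "channel": 1, "codec": "opus"},
    "request": {"model_name": "bigmodel", "enable_punc": True},
}
payload = serialize_request(req)
assert json.loads(gzip.decompress(payload)) == req  # round-trips losslessly
```

Compressing the JSON body on top of the Opus audio payload keeps the per-connection bandwidth low, which is what allows the concurrency gain described above.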