AI Xiaozhi connects to Qwen 3: responses 30% faster, and Opus transmission raises concurrency by 1000%

The Qwen 3 model and Opus transmission bring new breakthroughs in AI processing speed and concurrency.
Core content:
1. The Qwen 3 (Tongyi Qianwen) model release and the optimization behind the 30% speed increase
2. Local deployment of the Qwen3 model: how response latency drops to about 0.3 s
3. Opus transmission: reducing bandwidth pressure and raising concurrency to 1000% of the original
We made two optimizations on the server side.
Qwen 3 was released on April 28 and is currently the fastest Qwen model. The fast-charging AI integrated software has been adapted to it; thinking mode must be turned off to realize the speed advantage. In testing, the original 1.6 s response time improved to about 1.1 s.
Online deployment
In the website backend, configure the model name as: qwen-turbo-latest
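To make the configuration concrete, here is a minimal sketch of the chat request an OpenAI-compatible backend would send for this model. The endpoint URL and the `enable_thinking` flag are assumptions based on common Qwen3 usage, not taken from the article; check your provider's documentation before relying on them.

```python
import json

# Assumed OpenAI-compatible endpoint for Qwen models (verify against your provider).
DASHSCOPE_BASE = "https://dashscope.aliyuncs.com/compatible-mode/v1"

def build_chat_payload(user_text: str) -> dict:
    """Build a chat-completion payload with thinking mode off for speed."""
    return {
        "model": "qwen-turbo-latest",   # the model name configured in the backend
        "messages": [{"role": "user", "content": user_text}],
        "stream": True,                  # stream tokens for lower perceived latency
        "enable_thinking": False,        # assumed flag: disable thinking mode on Qwen3
    }

payload = build_chat_payload("Hello")
print(json.dumps(payload, ensure_ascii=False))
```

Streaming plus disabled thinking mode is what yields the ~1.1 s first-response times reported above.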
Local deployment
If the qwen3 model is deployed locally, response latency is roughly 0.3 s, as shown in the figure below.
However, local deployment is expensive and hard to maintain: it requires an RTX 4090 GPU, a public IP address, and a machine powered on 24 hours a day. It suits those who pursue extreme performance; roughly speaking, an investment of 30,000 to 50,000 yuan buys about 1 second of latency.
The second optimization: both ASR and TTS now use Opus transmission, which greatly reduces bandwidth pressure. The bitrate is about 1/10 of raw PCM, and concurrency rises to 1000% of the original.
Why do I say this? As the figure below shows, at the same sound quality the ogg (Opus) file is much smaller than the pcm file.
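The 1/10 figure can be sanity-checked with simple arithmetic: raw 16 kHz / 16-bit / mono PCM runs at 256 kbit/s, while Opus speech is commonly encoded around 24 kbit/s (the exact Opus bitrate is my assumption, not stated in the article).

```python
# Compare the raw PCM bitrate with a typical Opus speech bitrate.
pcm_bitrate = 16_000 * 16 * 1   # sample rate * bits per sample * channels = 256000 bit/s
opus_bitrate = 24_000           # a common Opus setting for speech (assumed)

print(pcm_bitrate)                 # 256000
print(pcm_bitrate / opus_bitrate)  # roughly 10x, i.e. about 1/10 the bandwidth
```

Since the same server bandwidth now carries about ten times as many audio streams, the claimed tenfold (1000%) concurrency increase follows directly.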
The project used here is PyOgg, which wraps the Opus codec. The encoded audio is handed to Volcano Engine's big-model speech recognition; Alibaba's speech recognition is also supported.
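Opus encodes fixed-duration frames rather than arbitrary byte chunks, so the raw PCM stream has to be sliced before encoding. A minimal sketch of that slicing (the 100 ms segment length mirrors the seg_duration default in the client code; the helper name is my own):

```python
def split_pcm_frames(pcm: bytes, rate=16000, bits=16, channels=1, frame_ms=100):
    """Slice raw PCM into fixed-duration frames ready for an Opus encoder."""
    bytes_per_frame = rate * (bits // 8) * channels * frame_ms // 1000
    return [pcm[i:i + bytes_per_frame] for i in range(0, len(pcm), bytes_per_frame)]

# 1 second of 16 kHz / 16-bit mono silence -> ten 100 ms frames of 3200 bytes each
frames = split_pcm_frames(b"\x00" * 32000)
print(len(frames), len(frames[0]))  # 10 3200
```

Each frame would then be passed through PyOgg's Opus encoder before being streamed to the recognition service.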
The code is as follows:

def __init__(self, **kwargs):
    self.success_code = 1000  # success code, default is 1000
    self.seg_duration = int(kwargs.get("seg_duration", 100))
    self.ws_url = kwargs.get("ws_url", "wss://openspeech.bytedance.com/api/v3/sauc/bigmodel")
    self.uid = kwargs.get("uid", "test")
    self.format = kwargs.get("format", "ogg")
    self.rate = kwargs.get("rate", 16000)
    self.bits = kwargs.get("bits", 16)
    self.channel = kwargs.get("channel", 1)
    self.codec = kwargs.get("codec", "opus")
    self.auth_method = kwargs.get("auth_method", "none")
    self.hot_words = kwargs.get("hot_words", None)
    self.streaming = kwargs.get("streaming", True)
    self.mp3_seg_size = kwargs.get("mp3_seg_size", 1000)
    self.req_event = 1

def construct_request(self, reqid, data=None):
    req = {
        "user": {
            "uid": self.uid,
        },
        "audio": {
            "format": self.format,
            "sample_rate": self.rate,
            "bits": self.bits,
            "channel": self.channel,
            "codec": self.codec,
        },
        "request": {
            "model_name": "bigmodel",
            "enable_punc": True,
        }
    }
    return req
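For reference, before a request like the one built above goes onto the websocket, the JSON body is typically compressed. The sketch below shows that serialization step; the binary frame header is omitted, and treating the compression as gzip is my assumption about the protocol, not something stated in the article.

```python
import gzip
import json

def serialize_request(req: dict) -> bytes:
    """JSON-encode and gzip-compress a request payload before it is framed
    onto the websocket (binary header omitted; gzip is an assumption here)."""
    return gzip.compress(json.dumps(req).encode("utf-8"))

# A request shaped like the one construct_request() returns.
req = {
    "user": {"uid": "test"},
    "audio": {"format": "ogg", "sample_rate": 16000, "bits": 16,
              "channel": 1, "codec": "opus"},
    "request": {"model_name": "bigmodel", "enable_punc": True},
}
payload = serialize_request(req)
assert json.loads(gzip.decompress(payload)) == req  # round-trips losslessly
```

Compressing the JSON body on top of the Opus audio payload keeps the per-connection bandwidth low, which is what allows the concurrency gain described above.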