Conversational AI agent, a ray of light that brings fairy tales into reality!
Updated on: July 9, 2025
Recommendation
The fairy tale world is no longer far away, and AI smart toys make companionship more intimate.
Core content:
1. How AI technology transforms traditional toys into interactive partners
2. The expansion of the AI toy market size and application scenarios
3. The potential and challenges of AI toys in emotional companionship and educational value
Yang Fangxian
Founder of 53AI/Most Valuable Expert of Tencent Cloud (TVP)
Fairy tales become reality, and toys can also accompany growth
When we were young, the fairy tale world we saw on TV was vivid and lively. Plants, animals, vehicles and everything else in the world could play and chat, teach us knowledge, and accompany us as we grew.
As we grow up, we realize that the "fairy tale world" is just a beautiful wish, and growing up makes us feel more lonely. We often think, how wonderful it would be if fairy tales could become reality.
Today that wish is taking shape in products such as the AI pet Moflin, FoloToy AI Fire Rabbit, BubblePal, and Talking Tom AI children's companion robots. With the help of AI, these "toys" have moved from "mechanical interaction", where they could only emit light, make sounds, and move, to "cognitive interaction": they are playmates that can understand complex language commands and images and give users real-time emotional feedback.
According to available market data, the AI toy market reached US$18.1 billion in 2024 and is expected to grow to US$60 billion globally by 2033. Within China, cities such as Dongguan and Shantou in Guangdong have become important bases for AI toy development thanks to their complete toy manufacturing supply chains.
As the forms and application scenarios of AI toys expand, demand has also shifted from children's entertainment toward stress relief and emotional companionship. Survey data indicate that urban white-collar workers aged 25-35 have become the core consumers of AI toys, and AI therapeutic toys designed for elderly people living alone have appeared in the Japanese market.
Dialogue and storytelling, "emotional value" is maximized
Compared with the mechanical interaction of traditional toys, AI toys offer a new intelligent interactive experience, personalized companionship, and broad support for learning and extending knowledge.
In terms of intelligent interaction, AI toys can hold natural, fluent conversations with users thanks to speech recognition, natural language processing, and related technologies. With the spread of reinforcement learning and large models, AI toys can also learn a user's conversation habits, interests, and conversation history, so that they produce content the user "likes to listen to" and can even extend and expand the conversation.
Image source: Baidu AI Image Assistant
For example, storytelling AI toys can create new story series based on user preferences while telling themed stories, satisfying users' curiosity for novel tales.
In addition, as AI toys become smarter, their value as a stand-in for human company is gradually increasing, and in some companionship scenarios it is amplified further.
Human companionship is usually limited by time and space, and human emotions fluctuate. When people are tired or irritable, they may unconsciously lose patience or become short-tempered.
An AI toy only needs to be powered on and connected to the Internet to wake up at any time and start chatting with the user. It can also remain patient and gentle, responding consistently no matter how many questions the user asks or how often they are repeated, which gives the user continuous and stable emotional feedback.
However, AI toys also face many problems as they develop rapidly. While AI recognition technology is still immature, unclear pronunciation or expression makes it hard for the model to recognize a command accurately, which leads to misjudgment.
AI toys that accompany children also need to deliver targeted, valuable interactive content as the children grow older, and after a certain amount of training they may output content that is not suitable for children.
In addition, interaction limited to speech alone can feel monotonous, and long-winded output may reduce the toys' appeal to users.
AI toys, the "magic wand" of conversational voice AI
To address the above issues, NetEase Cloud launched an embedded conversational voice AI agent that not only lets AI toys speak, but also tackles a range of problems including recognition, interaction, topic expansion, content security, customized voice tone, power consumption, and chip adaptation.
The biggest difference between AI toys and traditional toys is "interaction", and today's mainstream interaction channels are "video" and "audio". The interaction process can be broken down into three parts: input, recognition, and response.
- Input
Traditional hardware usually starts a dialogue through a button press or a wake word and then responds turn by turn; the next round cannot begin until the previous one has ended. The embedded conversational voice AI interaction provided by NetEase Cloud responds quickly and can be interrupted intelligently, which is closer to a real-life conversation.
- Recognition
Traditional speech recognition relies mainly on keyword lexicons and NLP for semantic understanding and triggering. NetEase Cloud integrates "LLM+ASR+TTS", which better identifies the speaker's intentions and even emotions, and responds with more fitting content.
- Response
Conversational responses need to be judged on both content and emotion. Traditional hardware responses focus on a single scenario and depend on the design of a knowledge base or on online search. By introducing a large model, NetEase Cloud can chain multiple intelligent agents and define their language style and personality, producing emotionally intelligent responses across multiple scenarios. It can also layer on a rich set of voice tone configurations to maximize the emotional value of the conversation.
Image source: Baidu AI Image Assistant
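The difference between turn-based playback and an interruptible "LLM+ASR+TTS" loop can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `fake_asr`, `fake_llm`, `fake_tts`, and `handle_turn` are hypothetical stand-ins, not NetEase Cloud APIs, and the "audio" is plain text so the example runs without any hardware or model.

```python
from typing import Callable, List


def fake_asr(audio: str) -> str:
    """Hypothetical ASR stand-in: the 'audio' is already text, so just normalize it."""
    return audio.strip().lower()


def fake_llm(text: str) -> str:
    """Hypothetical LLM stand-in: returns a canned multi-sentence reply."""
    if "story" in text:
        return "Once upon a time there was a rabbit. It lived by a river. It loved carrots."
    return "I am here. Let us talk."


def fake_tts(reply: str) -> List[str]:
    """Hypothetical TTS stand-in: one playable 'audio chunk' per sentence."""
    return [s.strip() + "." for s in reply.split(".") if s.strip()]


def play_with_barge_in(chunks: List[str], voice_detected: Callable[[int], bool]) -> List[str]:
    """Play chunks in order, stopping as soon as the user is heard speaking.

    voice_detected(i) is checked before chunk i is played; a real device
    would poll a voice activity detector on the microphone instead.
    """
    played = []
    for i, chunk in enumerate(chunks):
        if voice_detected(i):
            break  # the user interrupted: stop talking and listen
        played.append(chunk)
    return played


def handle_turn(audio: str, voice_detected: Callable[[int], bool] = lambda i: False) -> List[str]:
    """One dialogue turn: recognize, generate, synthesize, then play interruptibly."""
    text = fake_asr(audio)
    reply = fake_llm(text)
    chunks = fake_tts(reply)
    return play_with_barge_in(chunks, voice_detected)


if __name__ == "__main__":
    # Turn-based behaviour: the whole reply is played.
    print(handle_turn("Tell me a story"))
    # Interruptible behaviour: the user speaks before the second chunk.
    print(handle_turn("Tell me a story", lambda i: i >= 1))
```

Splitting the synthesized reply into small chunks is what makes interruption cheap in this sketch: the loop only has to check for speech between chunks, so a long story can be cut off after one sentence instead of playing to the end.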
In terms of specific implementation, NetEase Cloud has further coupled converged communication technology with AI technology, allowing AI toy manufacturers to iterate on intelligent dialogue features as quickly as possible:
- Real-time response, low-latency call experience
The audio and video communication foundation, built on accumulated converged communication technology, delivers an end-to-end call experience with latency under 1 second across a variety of hardware environments.
- Intelligent interruption, efficient call experience
Based on human voice detection working together with AI task workflows, it intelligently recognizes human voice input and cuts the time spent waiting for long text playback to finish.
- Intelligent agent construction
Drawing on experience in tuning large-model output and on agent designs proven in several scenarios, combined with supporting model capabilities such as function calling and RAG, it helps developers build more engaging intelligent agents.
In addition, NetEase Cloud continues to optimize foundational capabilities such as low-power hardware adaptation, global access link optimization, and audio and video codec performance, providing reliable support for low-cost integration and global expansion of AI toys.
Currently, NetEase Cloud's embedded conversational voice AI agent already supports scenarios such as AI toys, AI speakers, AI TVs, conference assistants, and home voice interaction.
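The agent-construction item above mentions function calling and RAG. As a rough illustration of how the two fit together in a toy, the sketch below routes a request either to a device function or to a small retrieval step. All names (`STORY_NOTES`, `retrieve`, `set_volume`, `TOOLS`, `agent_step`) are hypothetical and invented for this example; a real agent would let the model pick the tool and would use embedding-based search rather than word overlap.

```python
from typing import Callable, Dict, List

# Hypothetical knowledge snippets the toy can ground its answers in.
STORY_NOTES: List[str] = [
    "The fire rabbit lives in a warm burrow under the old oak tree.",
    "The little train carries apples to the market every morning.",
    "The moon fairy paints the sky silver at night.",
]


def retrieve(query: str, docs: List[str], k: int = 1) -> List[str]:
    """Toy retrieval: rank documents by how many words they share with the query."""
    q = set(query.lower().split())
    ranked = sorted(
        docs,
        key=lambda d: len(q & set(d.lower().rstrip(".").split())),
        reverse=True,
    )
    return ranked[:k]


def set_volume(level: int) -> str:
    """Hypothetical device function exposed to the agent."""
    return f"volume set to {level}"


# Registry of callable tools, keyed by the name the agent refers to.
TOOLS: Dict[str, Callable[..., str]] = {"set_volume": set_volume}


def agent_step(user_text: str) -> str:
    """Keyword router standing in for the model's tool choice (an assumption)."""
    words = user_text.lower().split()
    if "volume" in words:
        # Function-calling path: extract the argument and invoke the tool.
        level = next((int(w) for w in words if w.isdigit()), 5)
        return TOOLS["set_volume"](level=level)
    # RAG path: fetch the most relevant note and answer from it.
    context = retrieve(user_text, STORY_NOTES)[0]
    return f"Based on what I know: {context}"


if __name__ == "__main__":
    print(agent_step("set the volume to 3"))
    print(agent_step("where does the fire rabbit live"))
```

Keeping tools in a registry and knowledge in a separate store means a manufacturer could add a new device function or a new story pack without touching the dialogue logic, which is the kind of fast iteration the passage describes.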
NetEase Cloud has always believed that the combination of AI and real-time audio and video call technology will drive hardware from merely "being able to speak" to truly "knowing how to converse".
Of course, we also firmly believe that technological progress should always serve people's core needs and a better life.
Even as AI toys and AI companion software and hardware develop rapidly, we encourage everyone to enjoy the company of family and friends and the beautiful sunshine of nature.