Woter AI detection.Hurry - ends Jul 27th

New Year Sales :up to 80% OFF

AI Humanize AI Translator Bypass AI AI Rewriter AI Detector

PRICING

TRY FOR FREE

Voice Agent open source framework TEN allows your AI Agent to listen and speak!

Written by

Jasper Cole

Updated on:July-10th-2025

Building a Voice Agent is like putting an elephant into a refrigerator. It seems simple with only three steps:

1) Select LLM/STT/TTS large model

2) Connect to WebRTC or WebSockets for real-time transmission

3) Adjust parameter encapsulation

However, in actual use, there are many difficulties:

"?The echo is too loud, there is too much noise", "The voices are too mixed to be heard clearly?"

"Is artificial intelligence like a mentally retarded person who can't even be interrupted when someone is talking?"

"The delay is too high and the response is slow?", "There is a new model and I have to reconnect it?"

"The three-stage project looks simple, but is it too difficult to implement?"

"Real-time transmission of multimodal data is too troublesome and difficult to handle?"

“ Why is the CPU consumption so high?! ?

Thus, the conversational Voice Agent open source framework - TEN Framework came into being!

TEN solves the problems of complex multimodal data transmission and high latency in the process of building Voice Agent, and modularizes and freely calls models such as LLM, STT, and TTS, which reduces engineering problems for developers during implementation, allows them to focus more on scenarios and business content, quickly complete product implementation and verification, and can be truly used in actual production.

So, what is TEN?

TEN is a real-time conversational Voice Agent engine that can help developers quickly build AI Agents that can interact with audio and video.

Currently, it supports major global STT, LLM, and TTS manufacturers including Deepseek, OpenAI, Gemini, etc.

At the same time, TEN can support access to dify and Coze . You only need to configure the bot ID/API to make your bot speak.

What are the advantages of TEN?

1. Support multi-modal transmission: can meet the input and output of voice, text and images

Supports voice, text, image and other data transmission, giving full play to the multi-modal advantages
Supports both cascade mode (STT-LLM-TTS) and end-to-end mode (End to End) to create audio and video interaction

2. Low latency and interruptibility: Built-in optimized real-time communication capabilities provide low latency and interruptibility for interactive experiences

Built-in RTC to solve the delay problem during voice interaction. The Agent built based on TEN Framework optimizes the delay to only 650ms under the best conditions.
With built-in VAD, you can interrupt and restore the real conversation at any time during the communication with AI voice

3. Rich plug-ins and flexible arrangement: support access to global mainstream STT, LLM and TTS for quick use

Already supports the world's mainstream STT, LLM, TTS and other plug-ins, just configure the key
Keep up with the latest technology and complete access to OpenAI Realtime API and Gemini 2.0 within 24 hours

4. Multi-language, cross-platform: Supports mainstream languages, and Agent can be seamlessly connected across platforms

Supports various programming languages such as C++/Go/Python/Node.JS (JavaScript will be supported soon)
Support cross-platform use of Agent on Windows/Mac/Linux/mobile terminals, etc.

? What can you do with TEN?

1. TEN + SIP: AI Outbound Call Center

AI outbound call center, such as: corporate customer service/outbound call center/professional consulting...

Let customers call your customized AI Agent experts!

The demo shows a psychological counseling expert. You can see that the Agent's tone lowered when he heard "I" say I was in a bad mood. Voice is more suitable than text in this scenario.

2. TEN + Hardware: Smart Toys

Story machine/smart speaker/AI toys/smart home......

ESP 32 is now supported. You can have a low-latency, interruptible conversation directly with ESP 32 and let it tell you a story.

3. TEN + Digital Human: Virtual Companionship

TEN currently supports Trulience avatars, which can be your AI shopping guide/virtual pet/AI game companion...

You can let the puppy switch dialects and communicate with you by voice;

You can also play chess with AI, controlling it with your mouth, freeing your hands.

4. TEN + Computer Use: Voice control of computer

Natural language interface (LUI) will become more and more integrated into our lives.

Use voice to open browsers, computer apps, memos... You can also use TEN to create your own "Jarvis".

5. TEN + Games: AI game companion

Voice script of Murder on the Orient Express.

Chat with NPCs about what they were doing when the case happened. It is an immersive experience and you can play the script-killing game alone.

6. TEN + Gemini 2.0: Visible Personal Assistant

When using the Gemini 2.0 model, TEN can not only hear, but also see!

When sharing pictures with TEN via webcam/screen sharing, he can not only accurately identify the kitten's color, but also the specific breed! ?

7. TEN + Storytelling Machine that can speak and draw

TEN provides Storyteller as a usecase, with a built-in text-image model plug-in that can guide users to complete a story together and generate wonderful supporting images!

How to use TEN?

If you are a beginner and want to learn how to use TEN Agent step by step, please refer to the tutorial by YouTube blogger Developer Digest?

The following video is from Xiaohongshu blogger @T8.star ?

If you already have a basic understanding of TEN, you are also welcome to try the latest virtual person TEN + Trulience?

Finally, if you are interested in TEN, you are welcome to star the project, support it and follow the latest developments!