ByteDance's MegaTTS3: a super fast, powerful voice cloner that produces near-identical clones and works across languages.

Explore ByteDance's MegaTTS3 technology and experience the magical journey of high-quality voice cloning.
Core content:
1. Introduction to MegaTTS3 voice cloning technology and its cross-language cloning capabilities
2. Installation and model download guide, as well as specific steps for voice cloning
3. Voice cloning safety considerations and official voice library resource sharing
MegaTTS3 voice cloning node for ComfyUI
https://github.com/billwuhao/ComfyUI_MegaTTS3
The cloning quality is very high. The node supports Chinese and English and can clone voices across languages.
Updates
[2025-04-06]⚒️: Released v1.0.0.
Install
cd ComfyUI/custom_nodes
git clone https://github.com/billwuhao/ComfyUI_MegaTTS3.git
cd ComfyUI_MegaTTS3
pip install -r requirements.txt
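# Portable build: install with the embedded interpreter instead (adjust the path to wherever your python_embeded folder lives)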
./python_embeded/python.exe -m pip install -r requirements.txt
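The second command matters because the requirements have to be installed into the same interpreter that runs ComfyUI. A minimal sketch of that idea follows; the helper name `pip_install_command` is hypothetical and not part of the node. It only builds the command around an explicit interpreter path and does not run it.

```python
import sys
from pathlib import Path


def pip_install_command(requirements, python=sys.executable):
    """Build a pip command bound to one specific interpreter.

    Hypothetical helper for illustration only; not part of ComfyUI_MegaTTS3.
    """
    # Running `<python> -m pip` installs into that interpreter's own
    # site-packages, which is the point of the python_embeded variant.
    return [str(python), "-m", "pip", "install", "-r", str(Path(requirements))]


# Show the command for the current interpreter without executing it.
cmd = pip_install_command("requirements.txt")
print(" ".join(cmd))
```

To actually install, the list could be passed to `subprocess.run(cmd, check=True)` from inside the ComfyUI_MegaTTS3 folder, with `python` set to the embedded `python.exe` on a portable build.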
Model Download
Download the entire [MegaTTS3](https://huggingface.co/ByteDance/MegaTTS3/tree/main) folder and put it under the ComfyUI\models\TTS path.
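The post describes a manual download. As an alternative, the same folder could probably be fetched from a script. The sketch below assumes the `huggingface_hub` package is installed (it is not mentioned in the original post) and uses its `snapshot_download` function. The helper names `megatts3_model_dir` and `download_megatts3` are hypothetical.

```python
from pathlib import Path


def megatts3_model_dir(comfyui_root):
    """Return the folder described above: <ComfyUI>/models/TTS/MegaTTS3."""
    return Path(comfyui_root) / "models" / "TTS" / "MegaTTS3"


def download_megatts3(comfyui_root):
    """Fetch the whole ByteDance/MegaTTS3 repository into the model folder.

    Assumes `pip install huggingface_hub`. The import is done lazily so the
    path helper above still works without that package.
    """
    from huggingface_hub import snapshot_download

    target = megatts3_model_dir(comfyui_root)
    target.mkdir(parents=True, exist_ok=True)
    snapshot_download(repo_id="ByteDance/MegaTTS3", local_dir=str(target))
    return target


# Usage (triggers a large download, so it is left commented out):
# download_megatts3("ComfyUI")
```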
In the MegaTTS3 folder, create a new folder named speakers. From [Google drive](https://drive.google.com/drive/folders/1QhcHWcy20JfqWjgqZX1YM3I6i9u4oNlr), download all the .wav and .npy files and put them in the speakers folder.
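Once the files are in place, a quick check of the speakers folder can catch incomplete downloads. This is a minimal sketch: `check_speakers` is a hypothetical helper, and the idea that each .wav should have a same-named .npy is an assumption inferred from the download instructions, not something the post states.

```python
from pathlib import Path


def check_speakers(megatts3_dir):
    """List reference voices found in <MegaTTS3>/speakers.

    Returns (paired, unpaired): file stems that have both a .wav and a .npy,
    and stems that have only one of the two. The pairing rule is an assumption.
    """
    speakers = Path(megatts3_dir) / "speakers"
    if not speakers.is_dir():
        raise FileNotFoundError(f"missing folder: {speakers}")
    wavs = {p.stem for p in speakers.glob("*.wav")}
    npys = {p.stem for p in speakers.glob("*.npy")}
    paired = sorted(wavs & npys)    # present in both formats
    unpaired = sorted(wavs ^ npys)  # present in only one format
    return paired, unpaired
```

Calling `check_speakers` on the `ComfyUI/models/TTS/MegaTTS3` folder returns the two lists. A non-empty second list suggests that some files were missed during the download.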
[MegaTTS3](https://github.com/bytedance/MegaTTS3)
- Demo: the original post included five audio samples (01 to 05). In each one, the original voice plays first, followed by the clone.
Reply 250406 in the chat window of the official account to get it.
Plaintext Vision AI Resource Station:
https://aiart.website/
Plaintext Vision GitHub ComfyUI node project:
- ComfyUI_MegaTTS3: ByteDance's super fast and powerful voice cloner, with cross-language cloning.
- ComfyUI_Prompt-All-In-One: A ComfyUI node that generates prompts for all video, audio, image, and text creation.
- ComfyUI_OneButtonPrompt: A node for one-button assisted prompt generation in ComfyUI (for image and video generation, etc.).
- ComfyUI_AudioTools: ComfyUI nodes for audio processing, including automatic subtitling of video; cropping audio to an arbitrary time range; volume, speed, pitch, and echo processing; removing silent parts of audio; recording; and audio watermark embedding.
- ComfyUI_StepAudioTTS: ComfyUI node for Step-Audio-TTS. Text-to-speech that can talk, sing, rap, or clone voices.
- ComfyUI_SparkTTS: Spark-TTS in ComfyUI. Spark-TTS is an efficient LLM-based text-to-speech model that can clone voices in various languages.
- ComfyUI_NotaGen: ComfyUI node for NotaGen. Generates classical music and its score at the same time.
- ComfyUI_KokoroTTS_MW: Fast text-to-speech node for Kokoro-TTS. Supports 8 languages and 150 voices.
- ComfyUI_gemmax: ComfyUI node for Xiaomi GemmaX translation across 28 languages.
- ComfyUI_EraX-WoW-Turbo: ComfyUI node for ultra-fast multilingual speech recognition with timestamps.
- ComfyUI_DiffRhythm: ComfyUI node for quick and easy song generation.
- ComfyUI_CSM: Voice cloning and multi-turn dialogue node that can change emotion to match the mood of the conversation. English only.
Plaintext Vision Xiangong Cloud (xiangongyun) image:
No local deployment or high-end graphics card is required; run AI directly in the cloud.
https://www.xiangongyun.com/image/detail/a1cb959b-a750-4ce6-9418-3659906955d2?r=I9YXP1
Usage tutorial: Plaintext Vision Xiangong Cloud image usage tutorial
LIBLIB AI:
https://www.liblib.art/userpage/53a1edbdf5394aaba7028eff2aaec867