ByteDance's fast and powerful voice cloner MegaTTS3: the cloned voices are almost identical to the originals, and cloning works across languages.

Written by

Caleb Hayes

Updated on: July 3, 2025

MegaTTS3 voice cloning node for ComfyUI

https://github.com/billwuhao/ComfyUI_MegaTTS3

The voice cloning quality is very high. The node supports Chinese and English and can clone a voice across the two languages.

Updates

[2025-04-06]⚒️: Released v1.0.0.

Install

cd ComfyUI/custom_nodes
git clone https://github.com/billwuhao/ComfyUI_MegaTTS3.git
cd ComfyUI_MegaTTS3
pip install -r requirements.txt

# Portable build: use the bundled python_embeded interpreter instead
./python_embeded/python.exe -m pip install -r requirements.txt
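The choice between the two pip commands above can be sketched in Python. This is a minimal illustration, not part of the node: the function name `pip_install_command` and the `comfy_root` argument are hypothetical, and it assumes the portable build keeps its interpreter at `python_embeded/python.exe` under that root. The function only builds the command; it does not run it.

```python
import sys
from pathlib import Path


def pip_install_command(comfy_root: Path, node_dir: Path) -> list[str]:
    """Build the pip command that installs a custom node's requirements.

    comfy_root is a hypothetical name for the folder that would contain
    python_embeded in a portable install; node_dir is the cloned node folder.
    """
    embedded = comfy_root / "python_embeded" / "python.exe"
    # Prefer the bundled interpreter when it exists, else the current one.
    python = str(embedded) if embedded.is_file() else sys.executable
    requirements = node_dir / "requirements.txt"
    return [python, "-m", "pip", "install", "-r", str(requirements)]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once you have confirmed it points at the interpreter that ComfyUI actually uses.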

Model Download

The model and the voice files must be downloaded manually into the ComfyUI\models\TTS directory:

Download the entire [MegaTTS3](https://huggingface.co/ByteDance/MegaTTS3/tree/main) folder and put it in the TTS folder.

Create a new speakers folder inside the MegaTTS3 folder, then download all the .wav and .npy files from [Google Drive](https://drive.google.com/drive/folders/1QhcHWcy20JfqWjgqZX1YM3I6i9u4oNlr) and put them in the speakers folder.
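After copying the files, a quick check of the folder layout can catch a missed download. The sketch below is an illustration only: `check_speakers` is a hypothetical helper, and it assumes each voice consists of a .wav and a .npy file sharing the same base name, which the article does not state explicitly.

```python
from pathlib import Path


def check_speakers(tts_dir: Path) -> dict[str, list[str]]:
    """Report which voices in <tts_dir>/MegaTTS3/speakers lack a .wav or .npy.

    Assumption: a voice is a .wav/.npy pair with the same file stem.
    """
    speakers = tts_dir / "MegaTTS3" / "speakers"
    if not speakers.is_dir():
        raise FileNotFoundError(f"speakers folder not found: {speakers}")
    wavs = {p.stem for p in speakers.glob("*.wav")}
    npys = {p.stem for p in speakers.glob("*.npy")}
    return {
        "paired": sorted(wavs & npys),        # both files present
        "missing_npy": sorted(wavs - npys),   # .wav without .npy
        "missing_wav": sorted(npys - wavs),   # .npy without .wav
    }
```

For example, `check_speakers(Path("ComfyUI/models/TTS"))` returns the three lists for the layout described above; non-empty `missing_*` lists suggest an incomplete download.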

The one drawback is that you cannot clone an arbitrary voice yourself. Because the clone quality is so high, the official team has, for safety reasons, not released the parameters needed for custom cloning. You can, however, submit a recording of the voice you want cloned (no longer than 24 seconds) to the request folder at:

https://drive.google.com/drive/folders/1gCWL1y_2xu9nIFhUX_OW5MbcFuB7J5Cl
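Before submitting a recording, you may want to confirm that it fits the 24-second limit. The following is a minimal sketch using Python's standard `wave` module, which only reads uncompressed PCM .wav files; the helper names and the `MAX_SECONDS` constant are hypothetical, and treating exactly 24 seconds as acceptable is an assumption.

```python
import wave

MAX_SECONDS = 24.0  # limit quoted in the article


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM .wav file in seconds."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def within_limit(path: str, limit: float = MAX_SECONDS) -> bool:
    """True if the recording is no longer than the limit (assumed inclusive)."""
    return wav_duration_seconds(path) <= limit
```

Calling `within_limit("my_voice.wav")` on a file of your own returns `False` when the clip needs trimming before it is submitted.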

Nearly 300 voices are available so far. I have uploaded them to a cloud drive; see the end of the article for how to get them.

Acknowledgements

[MegaTTS3](https://github.com/bytedance/MegaTTS3)

- Demo: the first part is the original voice, and the second part is the clone:

Reply 250406 in the chat window of the official account to get the cloud drive link.

Plaintext Vision AI Resource Station:
https://aiart.website/
Plaintext Vision ComfyUI node projects on GitHub:

ComfyUI_MegaTTS3: ByteDance's fast and powerful voice cloner, with cross-language cloning.
ComfyUI_Prompt-All-In-One: A ComfyUI node that generates prompts for all video, audio, image, and text creation.
ComfyUI_OneButtonPrompt: A ComfyUI node for one-click assisted prompt generation (for image and video generation, etc.).
ComfyUI_AudioTools: ComfyUI nodes for audio processing, including automatic subtitling of videos; cropping audio to arbitrary time ranges; adjusting volume, speed, pitch, and echo; removing silent parts; recording; and embedding audio watermarks.
ComfyUI_StepAudioTTS: ComfyUI node for Step-Audio-TTS, a text-to-speech model that can talk, sing, rap, or clone voices.
ComfyUI_SparkTTS: ComfyUI node for Spark-TTS, an efficient LLM-based text-to-speech model that can clone voices in various languages.
ComfyUI_NotaGen: ComfyUI node for NotaGen. Generates classical music and scores at the same time.
ComfyUI_KokoroTTS_MW: Fast text-to-speech node for Kokoro-TTS. Supports 8 languages and 150 voices.
ComfyUI_gemmax: ComfyUI node for Xiaomi GemmaX translation, supporting 28 languages.
ComfyUI_EraX-WoW-Turbo: ComfyUI node for ultra-fast multilingual speech recognition with timestamps.
ComfyUI_DiffRhythm: ComfyUI node for quick and easy song generation.
ComfyUI_CSM: ComfyUI node for voice cloning and multi-turn dialogue that can change emotion according to the mood of the conversation. Supports English only.

Plaintext Vision image on Xiangong Cloud:
No local deployment or high-end graphics card is needed; run AI directly in the cloud.
https://www.xiangongyun.com/image/detail/a1cb959b-a750-4ce6-9418-3659906955d2?r=I9YXP1
Usage tutorial: Plaintext Vision Xiangong Cloud Image Tutorial
LIBLIB AI:
https://www.liblib.art/userpage/53a1edbdf5394aaba7028eff2aaec867