A new star in open-source TTS! Dia-1.6B: ultra-realistic dialogue generation, 6.5K stars within two days of being open-sourced!

A new breakthrough in text-to-speech technology, Dia-1.6B has quickly become popular for its ability to generate realistic dialogue.
Core content:
1. The main functions and features of Dia-1.6B, including multi-speaker dialogue generation and lifelike non-verbal expression
2. Zero-shot voice cloning, high-quality speech synthesis, and fast real-time inference
3. A quick-start guide covering installation and usage, including the official demo and Python API examples
The field of text-to-speech (TTS) has welcomed a new star! Dia-1.6B, developed by Nari Labs, has sparked heated discussion with its ultra-realistic dialogue generation, earning 6.5K+ GitHub stars after only two days of being open source!
It is reportedly more capable than ElevenLabs and Sesame: with only 1.6B parameters it achieves emotion control, non-verbal sounds (such as laughter and coughing), and zero-shot voice cloning, and it runs impressively fast.
It generates multi-speaker dialogue from text scripts, distinguishing speakers with [S1] and [S2] tags, and produces natural speech with support for non-verbal expression and voice cloning. It is currently limited to English.
Model weights and a Gradio demo are also available on Hugging Face.
Key Features
• Multi-speaker dialogue generation: use tags such as [S1] and [S2] to distinguish speakers and generate a multi-speaker dialogue in a single pass, maintaining natural rhythm and emotional transitions
• Lifelike expression: supports non-verbal sounds such as laughter, sighs, and coughs (see the example after this list)
• Zero-shot voice cloning: fine-tune or condition on a reference voice to clone a user's or character's voice
• High-quality speech synthesis: sound quality comparable to ElevenLabs and Sesame, with natural detail and realistic emotional variation
• Real-time inference: about 40 tokens/s on an A4000 GPU, a smooth experience with no waiting
• Gradio interface support: ships with a ready-to-use Web UI; enter text and listen immediately
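For example, a script combining speaker tags with non-verbal cues might look like this (the cues shown are the ones named above; the full supported list is in the official repo):
[S1] Welcome back to the show! (laughs) [S2] Glad to be here. (sighs) It has been a long week. [S1] Then let's make it a good one.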
Quick start
Nari Labs provides a detailed installation guide and a Gradio demo for Dia-1.6B.
Online Experience:
No environment setup required: open the Hugging Face demo and enter a script or upload audio to preview the results:
Demo: https://huggingface.co/spaces/nari-labs/Dia-1.6B
Installation, deployment and usage steps:
1. Clone the project
git clone https://github.com/nari-labs/dia.git
cd dia
2. Create a virtual environment
python -m venv .venv
source .venv/bin/activate
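(On Windows, the activation command differs: .venv\Scripts\activate)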
3. Install dependencies
pip install -e .
4. Start Gradio UI
python app.py
Access http://localhost:7860, then enter a script or upload audio to generate a dialogue.
Sample script:
[S1] Dia is amazing! [S2] Yeah, it generates laughs too! (laughs)
You can also install Dia as a Python package and call its API:
# Install directly from GitHub
pip install git+https://github.com/nari-labs/dia.git
Python call example:
import soundfile as sf
from dia.model import Dia

# Load the pretrained model weights from Hugging Face
model = Dia.from_pretrained("nari-labs/Dia-1.6B")

# [S1]/[S2] tags mark the speakers; (laughs) produces a non-verbal sound
text = "[S1] Dia is an open weights text to dialogue model. [S2] You get full control over scripts and voices. [S1] Wow. Amazing. (laughs) [S2] Try it now on Git hub or Hugging Face."

# Generate the waveform and save it at 44.1 kHz
output = model.generate(text)
sf.write("simple.mp3", output, 44100)
A PyPI package and CLI tools are also planned for a later release.
Recommended usage scenarios
• Audiobooks / novel narration: give each character its own voice and use emotional cues to recreate the original context
• Podcast dubbing: quickly synthesize expressive, stylized interview voices
• AI role-playing: pair with an agent for multi-character simulated dialogue systems (see the sketch after this list)
• TTS research and fine-tuning: voice cloning, emotion control, and non-verbal expression
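As a sketch of the role-playing / audiobook scenario, here is a small example built only on the model.generate call shown earlier: a hypothetical helper that maps named characters onto Dia's two speaker tags and renders a short scene in one pass:

import soundfile as sf
from dia.model import Dia

model = Dia.from_pretrained("nari-labs/Dia-1.6B")

# Hypothetical mapping from character names to Dia's two speaker tags
speakers = {"Narrator": "[S1]", "Hero": "[S2]"}

# A short scene as (character, line) pairs; non-verbal cues go in parentheses
dialogue = [
    ("Narrator", "The door creaked open. (sighs)"),
    ("Hero", "Who's there? (coughs)"),
    ("Narrator", "Only the wind answered."),
]

# Flatten into a single tagged script and generate the whole scene at once
script = " ".join(f"{speakers[name]} {line}" for name, line in dialogue)
output = model.generate(script)
sf.write("scene.mp3", output, 44100)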