A new star in open-source TTS! Dia-1.6B: ultra-realistic dialogue generation, 6.5K GitHub stars within two days of being open-sourced!

Written by
Iris Vance
Updated on: June 27, 2025
Recommendation

A new breakthrough in text-to-speech technology, Dia-1.6B has quickly become popular for its ability to generate realistic dialogues.

Core content:
1. The main functions and features of Dia-1.6B, including multi-speaker dialogue generation and lifelike non-verbal expression
2. Zero-shot voice cloning, high-quality speech synthesis, and fast real-time inference
3. A quick-start guide from installation to usage, including the official demo and a Python call example

Yang Fangxian, founder of 53A, Most Valuable Expert of Tencent Cloud (TVP)

 

The field of text-to-speech (TTS) has welcomed a new star!

Dia-1.6B, developed by Nari Labs, has sparked heated discussion with its ultra-realistic dialogue generation, earning 6.5K+ GitHub stars within just two days of being open-sourced!

Its capabilities are said to surpass those of ElevenLabs and Sesame: with only 1.6B parameters it achieves emotion control, non-verbal sounds (such as laughter and coughing), and zero-shot voice cloning, with impressive runtime efficiency.

It generates multi-speaker dialogues from text scripts, distinguishing speakers with [S1] and [S2] tags and producing natural speech, with support for non-verbal expressions and voice cloning. It is currently limited to English.

Model weights and a Gradio demo are also available on Hugging Face.

Key Features

  •  Multi-speaker dialogue generation: use tags such as [S1] and [S2] to distinguish speakers and generate a multi-speaker dialogue in a single pass, with natural rhythm and emotional transitions.
  •  Non-verbal expression: supports laughter, sighs, coughs, and other non-verbal sounds.
  •  Zero-shot voice cloning: fine-tune or condition on a voice style to clone a user's or character's voice.
  •  High-quality speech synthesis: sound quality comparable to ElevenLabs and Sesame, with natural detail and realistic emotional variation.
  •  Real-time inference: about 40 tokens/s on an A4000 GPU, a smooth no-wait experience.
  •  Gradio interface: ships with a ready-to-use web UI; enter text and listen immediately.

Get started quickly

The official Nari Labs repository provides a detailed installation guide and a Gradio demo.

Online Experience:

No environment setup needed: open the Hugging Face demo and enter a script or audio to preview the results:

Demo: https://huggingface.co/spaces/nari-labs/Dia-1.6B

Installation, deployment, and usage:

1. Clone the project

git clone https://github.com/nari-labs/dia.git
cd dia

2. Create a virtual environment

python -m venv .venv
source .venv/bin/activate

3. Install dependencies

pip install -e .

4. Start Gradio UI

python app.py

Access http://localhost:7860, then enter a script or upload audio to generate a dialogue.

Sample script:

[S1] Dia is amazing! [S2] Yeah, it generates laughs too! (laughs)
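Since Dia takes the whole script in one pass, it can be useful to sanity-check the tag format before generation. The helper below is purely illustrative and not part of the Dia API: it simply splits a script on the [S1]/[S2] tag convention shown above.

```python
import re


def speaker_turns(script: str):
    """Split a Dia-style script into (speaker_tag, text) turns.

    Illustrative helper, not part of the Dia API: it only parses
    the [S1]/[S2] tag convention used in Dia scripts.
    """
    # Split on the tags while keeping them via the capturing group.
    parts = re.split(r"(\[S[12]\])", script)
    turns = []
    for i in range(1, len(parts) - 1, 2):
        tag, text = parts[i], parts[i + 1].strip()
        if text:
            turns.append((tag, text))
    return turns


script = "[S1] Dia is amazing! [S2] Yeah, it generates laughs too! (laughs)"
print(speaker_turns(script))
# [('[S1]', 'Dia is amazing!'), ('[S2]', 'Yeah, it generates laughs too! (laughs)')]
```

A check like this catches scripts with missing or malformed speaker tags before spending GPU time on generation.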

You can also install Dia as a Python package:

# Install directly from GitHub
pip install git+https://github.com/nari-labs/dia.git

Python call example:

import soundfile as sf

from dia.model import Dia

model = Dia.from_pretrained("nari-labs/Dia-1.6B")

text = "[S1] Dia is an open weights text to dialogue model. [S2] You get full control over scripts and voices. [S1] Wow. Amazing. (laughs) [S2] Try it now on Git hub or Hugging Face."

output = model.generate(text)

sf.write("simple.mp3", output, 44100)

A PyPI package and CLI tools are also planned for later release.

Recommended usage scenarios

  •  Audiobooks / novel narration: give different characters their own voices and use emotional cues to recreate the original context
  •  Podcast dubbing: quickly synthesize expressive, stylized interview voices
  •  AI role-play: pair with an Agent to build multi-character simulated dialogue systems
  •  TTS research and fine-tuning: voice cloning, emotion control, non-verbal expression