Irodori-TTS-500M-v2 Demo


Flow-matching-based Japanese text-to-speech (TTS) model with 500M parameters. It generates speech from text using rectified flow over DACVAE latents.
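To show what "rectified flow" sampling means in practice, here is a minimal sketch (not the model's actual code). It assumes the common convention that t=0 is Gaussian noise and t=1 is a clean latent, and integrates dx/dt = v(x, t) with plain Euler steps. The learned network is replaced by a hypothetical toy velocity that points straight at a fixed target; `euler_sample`, `toy_velocity`, and `num_steps` are illustrative names only.

```python
import numpy as np

def euler_sample(velocity, x0, num_steps):
    """Integrate dx/dt = velocity(x, t) from t=0 (noise) to t=1 (latent) with Euler steps."""
    x = np.asarray(x0, dtype=float)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        x = x + dt * velocity(x, t)
    return x

# Hypothetical stand-in for the trained network: the exact velocity of the
# straight path x_t = (1 - t) * x0 + t * target, written in terms of x_t.
target = np.array([0.5, -1.0, 2.0])  # toy "clean latent", not real DACVAE data

def toy_velocity(x, t):
    return (target - x) / (1.0 - t)

rng = np.random.default_rng(0)
noise = rng.standard_normal(target.shape)
latent = euler_sample(toy_velocity, noise, num_steps=8)
print(np.allclose(latent, target))  # True
```

Because rectified flow is trained toward straight noise-to-data paths, even a coarse Euler integrator lands on the target here; in the real model the velocity comes from the network (conditioned on text and, optionally, reference audio), and the latent would then be decoded to a waveform.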

  • Reference audio (optional): upload a clip to condition the speaker's voice, or leave it blank for unconditional generation.
  • Generates up to 30 seconds of audio, automatically trimmed to the length of the spoken content.
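The page does not say how the 30-second output is trimmed. One plausible approach, sketched here purely as an assumption, is to cut trailing frames whose RMS energy falls far below the loudest frame. `trim_trailing_silence`, the 20 ms frame size, the -40 dB threshold, and the 24 kHz sample rate are all hypothetical choices for illustration.

```python
import numpy as np

def trim_trailing_silence(audio, sample_rate, frame_ms=20.0, threshold_db=-40.0):
    """Hypothetical trimmer: drop trailing frames more than |threshold_db| below the loudest frame."""
    audio = np.asarray(audio, dtype=float)
    if len(audio) == 0:
        return audio
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    n_frames = int(np.ceil(len(audio) / frame_len))
    # Zero-pad so the signal splits evenly into frames, then take per-frame RMS.
    padded = np.pad(audio, (0, n_frames * frame_len - len(audio)))
    rms = np.sqrt(np.mean(padded.reshape(n_frames, frame_len) ** 2, axis=1))
    peak = rms.max()
    if peak == 0.0:
        return audio[:0]  # the whole buffer is silent
    rms_db = 20.0 * np.log10(np.maximum(rms, 1e-12) / peak)
    voiced = np.nonzero(rms_db > threshold_db)[0]
    end = min(len(audio), (voiced[-1] + 1) * frame_len)
    return audio[:end]

# Usage: 1 s of a 440 Hz tone followed by 29 s of silence (a 30 s buffer).
sr = 24000  # assumed sample rate, for illustration only
tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
buffer = np.concatenate([tone, np.zeros(29 * sr)])
trimmed = trim_trailing_silence(buffer, sr)
print(len(trimmed) / sr)  # 1.0
```

A relative (peak-referenced) threshold keeps the cut independent of overall loudness; a production system might instead trim in latent space or use a learned end-of-speech signal.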