A flow-matching-based Japanese TTS model (500M parameters). It generates speech from text using rectified flow over DACVAE latents.
- Reference audio: Optional. Upload a clip to condition generation on that speaker's voice. Leave blank for unconditional generation.
- Output length: Generates up to 30 seconds of audio, automatically trimmed to the length of the spoken content.
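
For readers unfamiliar with rectified flow, the sampling step can be sketched as a fixed-step Euler integration of a velocity field from noise (t = 0) to data (t = 1). The snippet below is only an illustration under stated assumptions: `euler_sample`, `make_toy_velocity`, the step count, and the flat 8-value "latent" are all hypothetical, and the toy velocity field stands in for the trained network. The model's actual sampler, latent shape, and conditioning inputs are not described here.

```python
import random


def euler_sample(velocity_fn, x0, num_steps=32):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = velocity_fn(x, t)
        # One Euler update: move along the predicted velocity for a time step dt.
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_toy_velocity(target):
    """Hypothetical stand-in for the trained network (assumption, not the real model).

    For straight-line paths x_t = (1 - t) * x0 + t * x1 that all end at a single
    target x1, the velocity at (x, t) is (x1 - x) / (1 - t).
    """
    def velocity(x, t):
        return [(ti - xi) / (1.0 - t) for xi, ti in zip(x, target)]
    return velocity


random.seed(0)
target = [random.gauss(0.0, 1.0) for _ in range(8)]  # pretend "clean" latent
noise = [random.gauss(0.0, 1.0) for _ in range(8)]   # starting Gaussian noise
latents = euler_sample(make_toy_velocity(target), noise, num_steps=16)
```

In a real system the velocity function would be the neural network conditioned on the text (and, optionally, the reference audio), and the resulting latents would be passed to the DACVAE decoder to produce a waveform.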