This is a work in progress and the first step of a larger audio-visual project. For now, I have added a simple Streamlit app to this repository for previewing audio tracks from a specific folder ...
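
To give a rough idea of what such a preview app can look like, here is a minimal sketch. It is not the code from this repository: the helper `list_audio_files`, the `AUDIO_EXTENSIONS` set, and the widget labels are all assumptions made for illustration. Only `st.title`, `st.text_input`, `st.info`, `st.write`, and `st.audio` are actual Streamlit calls.

```python
from pathlib import Path

# Assumed set of extensions; adjust to whatever formats the folder holds.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".ogg", ".flac"}


def list_audio_files(folder):
    """Return the audio files directly inside `folder`, sorted by path.

    Hypothetical helper: returns an empty list if the folder does not exist.
    """
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )


def main():
    # Imported here so the helper above can be used without Streamlit installed.
    import streamlit as st

    st.title("Audio preview")
    folder = st.text_input("Folder containing audio tracks", value=".")
    tracks = list_audio_files(folder)
    if not tracks:
        st.info("No audio files found in this folder.")
    for track in tracks:
        st.write(track.name)   # show the file name above its player
        st.audio(str(track))   # st.audio accepts a path to a local file
```

To try it, one option is to save the block as a hypothetical `app.py`, add a call to `main()` as the last line, and start it with `streamlit run app.py`.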
Diffusion Speech is a diffusion-based text-to-speech model. Our speech synthesis pipeline is quite simple. We use a diffusion transformer model (DiT) to predict the duration of each phoneme. Then we ...
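
As an illustration of what a per-phoneme duration represents, the sketch below turns a list of durations into start and end positions for each phoneme. It does not run or reproduce the DiT model: the function name `phoneme_spans`, the choice of frames as the unit, and the example values are assumptions, and the durations are hand-written rather than predicted.

```python
from itertools import accumulate


def phoneme_spans(phonemes, durations):
    """Pair each phoneme with its (start, end) position in frames.

    `durations` holds one non-negative integer frame count per phoneme.
    Hypothetical helper, for illustration only.
    """
    if len(phonemes) != len(durations):
        raise ValueError("need exactly one duration per phoneme")
    ends = list(accumulate(durations))   # running total gives each end frame
    starts = [0] + ends[:-1]             # each phoneme starts where the previous one ended
    return list(zip(phonemes, starts, ends))


# Hand-written example values, not model output.
spans = phoneme_spans(["HH", "AH", "L", "OW"], [3, 5, 2, 6])
for phoneme, start, end in spans:
    print(f"{phoneme}: frames {start} to {end}")
```

The end position of the last phoneme is the total length of the utterance in frames, which is why duration prediction fixes the overall timing before any audio is generated.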