This list contains datasets aimed at both ASR (automatic speech recognition, sometimes called STT) and TTS (text-to-speech). Rule of thumb: ASR and TTS datasets are interchangeable if handled carefully.
- AudioMNIST
  - spoken digits (0-9) by 60 different speakers
- Common Voice
  - provides samples in many languages
- FSDD (Free Spoken Digit Dataset)
  - spoken digits by 6 speakers
- OpenSLR Datasets
  - best known for hosting LibriSpeech and LibriTTS
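
The rule of thumb about interchangeability can be illustrated with a minimal sketch: a single (audio, transcript) pair can feed either task, with input and target swapped. The `Utterance` class, its field names, and the example path are all hypothetical and do not come from any of the datasets listed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    """One (audio, transcript) pair; all field names are illustrative."""
    audio_path: str
    transcript: str
    speaker_id: str


def as_asr_example(utt: Utterance) -> tuple[str, str]:
    """ASR view: audio is the input, text is the target."""
    return (utt.audio_path, utt.transcript)


def as_tts_example(utt: Utterance) -> tuple[tuple[str, str], str]:
    """TTS view: text (plus speaker) is the input, audio is the target."""
    return ((utt.transcript, utt.speaker_id), utt.audio_path)


# Hypothetical record; the path does not point at a real file.
utt = Utterance("clips/speaker01_seven.wav", "seven", "speaker01")
asr_input, asr_target = as_asr_example(utt)
tts_input, tts_target = as_tts_example(utt)
```

This only shows the direction swap. What "carefully" means in practice depends on the dataset, and the sketch does not address audio quality or speaker balance.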