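This list contains datasets aimed at both ASR (automatic speech recognition, sometimes called STT, speech-to-text) and TTS (text-to-speech). Rule of thumb: ASR and TTS datasets are interchangeable if used carefully, since both pair audio with text.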
- AudioMNIST
    - spoken digits (0-9) by 60 different speakers
- Common Voice
    - provides samples for various languages
- FSDD (Free Spoken Digit Dataset)
    - spoken digits by 6 speakers
- OpenSLR Datasets
    - famous for LibriSpeech and LibriTTS
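
The rule of thumb at the top of this list can be illustrated with a minimal sketch. It assumes a corpus is stored as paired (audio, transcript) records; the `Utterance` class, the file names, and both helper functions are hypothetical and do not come from any of the datasets above. Restricting TTS data to a single speaker is only one plausible reading of "carefully".

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class Utterance:
    """One paired sample: an audio file and the text spoken in it."""
    audio_path: str   # hypothetical path, not a real file from any dataset
    transcript: str
    speaker_id: str


def as_asr_pairs(corpus: List[Utterance]) -> Iterator[Tuple[str, str]]:
    """ASR direction: audio is the input, text is the target."""
    for utt in corpus:
        yield utt.audio_path, utt.transcript


def as_tts_pairs(corpus: List[Utterance], speaker_id: str) -> Iterator[Tuple[str, str]]:
    """TTS direction: text is the input, audio is the target.

    Assumption: only utterances from one speaker are kept, as a stand-in
    for the extra care TTS data usually needs.
    """
    for utt in corpus:
        if utt.speaker_id == speaker_id:
            yield utt.transcript, utt.audio_path


# Made-up records, for illustration only.
corpus = [
    Utterance("a.wav", "one", "s1"),
    Utterance("b.wav", "two", "s2"),
    Utterance("c.wav", "three", "s1"),
]

asr_pairs = list(as_asr_pairs(corpus))
tts_pairs = list(as_tts_pairs(corpus, speaker_id="s1"))
```

The same records feed both tasks; only the input and target roles swap, and the TTS view applies a stricter filter.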