This list contains datasets aimed at both ASR (sometimes called STT) and TTS. Rule of thumb: ASR and TTS datasets are interchangeable if handled carefully.


  • AudioMNIST
    • spoken digits (0 - 9) by 60 different speakers