## ASR

- [ ] ASR2K: Speech Recognition for Around 2000 Languages without Audio https://arxiv.org/abs/2209.02842
- [x] Whisper: Whisper is a general-purpose speech recognition model. https://github.com/openai/whisper

**Dataset**

- [x] FLEURS: Fleurs is the speech version of the [FLoRes machine translation benchmark](https://arxiv.org/abs/2106.03193) https://huggingface.co/datasets/google/fleurs

## LM

- [x] mLUKE https://huggingface.co/studio-ousia/mluke-base

## Speech

- [x] CharsiuG2P: Multilingual G2P in 100 languages https://github.com/lingjzhu/CharsiuG2P

## Text corpus

- [x] Multilingual Open Text (MOT) https://github.com/bltlab/mot/
- [x] Thai depression detection dataset and baseline models https://zenodo.org/record/4734552

## Coreference resolution

- [x] 🪿 Han-Coref: Thai Coreference resolution by PyThaiNLP [GitHub](https://github.com/PyThaiNLP/han-coref)