Tested the https://huggingface.co/kyutai/stt-1b-en_fr model on some diverse data. Accuracy is on the lower side.
For example, CMUKids WER is 11.3, compared to 4.8 for parakeet-tdt-0.6b-v2. LibriSpeech test-clean WER is 4+ too.
The output is sometimes in Chinese, sometimes in Arabic.
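
For reference, WER figures like the ones above are word-level edit distance divided by the number of reference words. Below is a minimal sketch of that calculation. It assumes simple lowercasing and whitespace tokenization, which is not necessarily the normalization used for the numbers in this post, and the function name `wer` is just illustrative.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent: (substitutions + deletions + insertions) / reference words."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into an empty hypothesis takes i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    print(wer("the cat sat on the mat", "the cat sit on mat"))
```

Note that a transcript in the wrong script (e.g. Chinese for English audio) shares no words with the reference, so each such utterance counts as close to 100% errors and can pull the corpus-level WER up noticeably.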
