
Hey! I am building a translation app as a side project. Is there any chance you could share how you're building the voice-to-voice part?

My strategy is to split everything into separate STT, translation, and TTS stages, instead of building one end-to-end model (which constantly needs to be retrained).
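
Roughly, the split looks like this in Python. This is only a sketch of the idea: the stage signatures, the `VoicePipeline` name, and the stub functions are all made up for illustration, and real STT, translation, and TTS models would be plugged in where the stubs are.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures; none of these come from a real library.
SpeechToText = Callable[[bytes, str], str]   # (audio, source_lang) -> text
Translate = Callable[[str, str, str], str]   # (text, source_lang, target_lang) -> text
TextToSpeech = Callable[[str, str], bytes]   # (text, target_lang) -> audio


@dataclass
class VoicePipeline:
    """Chains three independent stages so each one can be swapped on its own."""
    stt: SpeechToText
    translate: Translate
    tts: TextToSpeech

    def run(self, audio: bytes, source_lang: str, target_lang: str) -> bytes:
        text = self.stt(audio, source_lang)
        translated = self.translate(text, source_lang, target_lang)
        return self.tts(translated, target_lang)


# Stub stages standing in for real models, so the sketch runs as-is.
def fake_stt(audio: bytes, lang: str) -> str:
    return audio.decode("utf-8")


def fake_translate(text: str, src: str, dst: str) -> str:
    return f"[{src}->{dst}] {text}"


def fake_tts(text: str, lang: str) -> bytes:
    return text.encode("utf-8")


pipeline = VoicePipeline(fake_stt, fake_translate, fake_tts)
```

The point of the split is that any one stage can be replaced (say, a smaller STT model for a specific language) without touching the other two.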

But the problem I am running into is that there aren't any great STT or TTS models. They either support only the top 10 languages, are huge (whisper-v3-turbo), or come with a non-commercial license (fb's tts/mms models).

Are you training your own models? Just targeting the languages you need? Planning on running in the cloud?

My email is in my profile bio if you want to email/chat!


