Once again I have a special question. We produce training videos for our colleagues worldwide. We speak English in them, but since most of us are German, neither the pronunciation nor the grammar is perfect. I would now like to improve the language of the videos we have already created. I transcribe the spoken text and correct it in writing. From this, an SRT file is created, which is displayed as subtitles below the video. So far so good.
However, I haven’t found a single AI tool that converts an SRT file into speech with correct timing. The text is pronounced well, but the audio is not synchronized with the SRT timestamps. The deviations are up to 2 minutes for a 15-minute video, which makes the result unusable.
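To illustrate what I mean by synchronization, here is a minimal Python sketch. It reads the cue times from an SRT file and works out how much silence has to be inserted before each spoken clip so that it starts exactly at its cue time. The names `parse_srt` and `plan_timeline` are my own, and the clip lengths are assumed to come from whatever text-to-speech service is used; this is not how any particular program does it.

```python
import re
from dataclasses import dataclass

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,000"
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class Cue:
    start_ms: int
    end_ms: int
    text: str


def to_ms(h, m, s, ms):
    """Convert the hour/minute/second/millisecond strings to milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_srt(srt_text):
    """Return the cues of an SRT document in file order."""
    cues = []
    # Cues are separated by blank lines
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                g = match.groups()
                # Everything after the timing line is the subtitle text
                text = " ".join(part.strip() for part in lines[i + 1:])
                cues.append(Cue(to_ms(*g[:4]), to_ms(*g[4:]), text))
                break
    return cues


def plan_timeline(cues, clip_lengths_ms):
    """Return one (silence_before_ms, overrun_ms) pair per cue.

    clip_lengths_ms is the duration of the synthesized audio for each cue
    (assumed to be measured from the files the speech service returns).
    """
    plan = []
    cursor = 0  # position on the audio track where the previous clip ended
    for cue, length in zip(cues, clip_lengths_ms):
        start = max(cue.start_ms, cursor)  # never overlap the previous clip
        silence = start - cursor           # padding needed to stay in sync
        end = start + length
        overrun = max(0, end - cue.end_ms)  # speech longer than the cue slot
        plan.append((silence, overrun))
        cursor = end
    return plan
```

If the speech is simply generated sentence after sentence without this padding, every small difference adds up, which would explain the drift of up to 2 minutes. A non-zero overrun shows the cues where the corrected text is too long for its time slot and would have to be shortened or spoken faster.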
It’s hard to believe, but the only program I found that offers this is Active Presenter, which does it via other services. The imported SRT file is converted to speech exactly on time. But the services offered here (Polly, Azure and Google) sound horrible. Would it be possible to include a freely configurable service? For example, I found this service, which also provides API integration: Text To Speech API - Developers APIs Documents. Its MP3 output is quite good.
It would be great if something like this were also possible.
Thanks a lot.