Lexicon for TTS in AP

Vit · July 25, 2022, 7:15pm

Hello,

My colleagues and I would love AP’s TTS (Text to Speech) to have a custom, user-definable lexicon feature. By lexicon I mean a file that lets users specify how certain words or character sequences should be pronounced by the TTS engine. Basically, I am joining the earlier request at https://talk.atomisystems.com/t/applying-amazon-polly-lexicons/5205 and would just need it to work for the MS Azure voices as well.

Reasoning behind the request:
It often happens that a company name, a product name, or a certain abbreviation is unusual enough that the TTS software does not read it correctly. This means manual editing every time such a word appears in the captions, which is unpleasant and tedious and increases the chance of human error in the project. Trust me, it is really embarrassing when the name of the company you work for is mispronounced in your own video ;-).

In a real-life scenario, after I finish a video in AP, I send a closed captions (.srt) file together with the video to a native English speaker for a language check. They send the .srt file back with suggested changes, which I accept or decline. I then import the .srt file back into AP and run a Batch Operation to Convert CC to Audio. After that, I need to go through the whole project and fix the pronunciation of specific words and sentences manually by adding SSML tags like this:

<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
<voice name="en-US-JennyNeural">
<phoneme alphabet="ipa" ph="təˈmeɪtoʊ"> tomato </phoneme>
</voice>
</speak>

At this point, once the SSML tags are in place, I can no longer make any extensive changes to the project, because exporting and re-importing the .srt file (to send it for a language check) would overwrite the SSML content with the plain caption text.

Those are the reasons why I strongly believe a lexicon feature is necessary for any software using TTS, and why my colleagues and I would love to have it in ActivePresenter as well.

Here is some info on the lexicon and its functionality on the MS Azure side:

https://docs.microsoft.com/cs-cz/azure/cognitive-services/speech-service/speech-synthesis-markup?tabs=csharp#use-custom-lexicon-to-improve-pronunciation
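
As a rough sketch of what that documentation describes: a custom lexicon is referenced from SSML with a `<lexicon>` element placed inside `<voice>`. The URI below is only a placeholder, and the file it points to is assumed to be a lexicon in the W3C PLS format that the speech service can download:

```xml
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice name="en-US-JennyNeural">
    <!-- Placeholder URI (assumption): replace with the location of your own lexicon file -->
    <lexicon uri="https://example.com/customlexicon.xml"/>
    I like my tomato fresh.
  </voice>
</speak>
```

With something like this available in AP, the pronunciation rules would live in one file instead of being repeated as `<phoneme>` tags on every slide.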

namnt · July 26, 2022, 6:51am

Hi Vit,

Thank you for your suggestion and detailed explanation. We will try adding more options for TTS in ActivePresenter (volume, speed, pitch, lexicon…) in future releases.

Regards,

Vit · July 27, 2023, 9:39am

Long time no see. It has been a year; any progress here?

Having a lexicon would be useful for anyone working with TTS. We still have to use a third-party tool and insert the audio manually, because that is more convenient than defining and editing SSML tags on every slide where our company name is mentioned. Here is a sample of what such a lexicon can look like:
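
A minimal sketch, assuming the W3C Pronunciation Lexicon Specification (PLS) format that the Azure documentation describes. The product name "Acmetix" and its pronunciation are hypothetical placeholders, and the "AP" alias is only an illustration of expanding an abbreviation:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
         xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-US">
  <!-- Hypothetical product name with an explicit IPA pronunciation -->
  <lexeme>
    <grapheme>Acmetix</grapheme>
    <phoneme>ˈækmətɪks</phoneme>
  </lexeme>
  <!-- Illustrative abbreviation that should be read out as the full name -->
  <lexeme>
    <grapheme>AP</grapheme>
    <alias>ActivePresenter</alias>
  </lexeme>
</lexicon>
```

Each `<lexeme>` maps a written form (`<grapheme>`) either to a phonetic spelling (`<phoneme>`) or to replacement text (`<alias>`), so a term only has to be defined once per project.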

Hang · July 28, 2023, 3:47am

Hi Vit,

This feature is currently on our roadmap. However, due to higher-priority tasks, it has not been implemented yet.

Best regards,