Hi,
If you use the default Microsoft TTS voices, you need to enter the full <speak> root tag in the Text field, wrapped around your text, when generating audio. For details, please take a look at this topic: Pause text-to-speech
For external voices, such as Amazon Polly, you can use SSML tags directly, without the <speak> tag.
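As a rough illustration, the two cases might look something like this. The attributes on <speak> follow the W3C SSML 1.0 convention and are an assumption on my part, as are the sample sentence and the 500 ms pause, so please check the linked topic for the exact form the Text field expects.

```xml
<!-- Microsoft TTS voices: full <speak> root tag (attributes assumed from W3C SSML 1.0) -->
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  Welcome to the course. <break time="500ms"/> Let's get started.
</speak>

<!-- External voices such as Amazon Polly: same markup, no <speak> wrapper -->
Welcome to the course. <break time="500ms"/> Let's get started.
```

In both cases the <break> tag is what produces the pause; only the outer wrapper differs.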
Regards,