TTS Volume

Is there any way to increase the volume of TTS?  I tried to make an adjustment in the Engine cfg file under Synth_Prosody_Volume but nothing seems to make a difference.  I tried Loud and X-Loud...just seems like it's still following the default setting.

Nigel Quinnin

Hi There,

The answer depends on which interface you are using to perform the synthesis, and also whether you're using SSML to perform the request.

We support an API interface to our TTS functionality, which allows you to pass either plain text or SSML as part of the request.

We also support MRCP via our Media Server. This is the most common way to connect to our products from various platforms these days. This interface also accepts either plain text or SSML as the request.

If sending plain text, the Content-Type of the MRCP request should be text/plain, as shown here:

    MRCP/2.0 00354 SPEAK 5
    Channel-Identifier: 8AD45B229BDFA07FE287@speechsynth
    Content-Length: 11
    Content-Type: text/plain
    Kill-On-Barge-In: false
    Speech-Language: en-US

    Hello World

and when sending SSML over MRCP, the Content-Type should be application/ssml+xml, as shown here:

    MRCP/2.0 000354 SPEAK 6
    Channel-Identifier: 8AD45B229BDFA07FE287@speechsynth
    Content-Length: 269
    Content-Type: application/ssml+xml
    Kill-On-Barge-In: false
    Speech-Language: en-US

    <?xml version="1.0" encoding="UTF-8"?>
    <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
    <voice name="Amanda" xml:lang="en-US">
    Hello world, my name is <prosody volume="x-loud">Amanda</prosody>
    </voice>
    </speak>
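
As a rough illustration of the two request shapes above, here is a minimal Python sketch that picks the Content-Type and computes the Content-Length for a SPEAK body. The function name `build_speak_headers` and the SSML detection rule are assumptions made for this example, not part of our API, and the MRCP start line (which carries the total message length) is deliberately left out.

```python
def build_speak_headers(body: str, channel_id: str, language: str = "en-US") -> str:
    """Return the header block plus body for an MRCP SPEAK request.

    Hypothetical helper: the start line (MRCP/2.0 <length> SPEAK <id>)
    is not generated here.
    """
    # Assumption: a body that starts with an XML declaration or <speak is SSML.
    stripped = body.lstrip()
    is_ssml = stripped.startswith("<?xml") or stripped.startswith("<speak")
    content_type = "application/ssml+xml" if is_ssml else "text/plain"

    # Content-Length counts the bytes of the encoded body, not characters.
    body_bytes = body.encode("utf-8")
    headers = [
        f"Channel-Identifier: {channel_id}",
        f"Content-Length: {len(body_bytes)}",
        f"Content-Type: {content_type}",
        "Kill-On-Barge-In: false",
        f"Speech-Language: {language}",
    ]
    # An empty line separates the headers from the body.
    return "\r\n".join(headers) + "\r\n\r\n" + body


print(build_speak_headers("Hello World", "8AD45B229BDFA07FE287@speechsynth"))
```

For the plain-text body `Hello World` this yields `Content-Length: 11`, matching the first request shown above.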

The default TTS configurations in our client_property.conf settings file, including the SYNTH_PROSODY_VOLUME setting, are used whenever plain text is passed for the synthesis request. So, for example, if you were to use our SimpleTTSClient utility to synthesize something, perhaps like this:

    SimpleTTSClient -voice amanda -t "Hello World"

...the -t here indicates that it should perform a plain-text synthesis, and therefore use the default settings from the client_property.conf file. If you change these defaults between calls to SimpleTTSClient, you should hear the difference in the output.

If, on the other hand, you are using SSML as part of your request, then these default (plain-text) settings are not used - whatever settings are within the SSML request will instead be used.

You can include such markup for specific words or phrases within the SSML if you like, whereas when using plain-text, any settings apply to the entire synthesis. Here is an example of an SSML request that contains the volume setting:

    <?xml version="1.0" encoding="UTF-8"?>
    <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
    <voice name="Amanda" xml:lang="en-US">
    Hello world, my name is <prosody volume="x-loud">Amanda</prosody>
    </voice>
    </speak>

As you can see, the word Amanda will have the x-loud volume attribute assigned to it in this case.
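
If you generate such requests from code, a small helper along these lines may be useful. This is only a sketch: the function name `ssml_with_loud_phrase` and its parameters are made up for illustration, and it uses nothing beyond the Python standard library to apply the volume attribute to one phrase and confirm that the result parses as XML.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def ssml_with_loud_phrase(lead_in: str, phrase: str, volume: str = "x-loud",
                          voice: str = "Amanda", lang: str = "en-US") -> str:
    """Build an SSML document in which only `phrase` gets the volume attribute."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang={quoteattr(lang)}>\n'
        f'<voice name={quoteattr(voice)} xml:lang={quoteattr(lang)}>\n'
        # escape() protects characters such as & and < in the spoken text.
        f'{escape(lead_in)}<prosody volume={quoteattr(volume)}>{escape(phrase)}</prosody>\n'
        '</voice>\n'
        '</speak>\n'
    )


doc = ssml_with_loud_phrase("Hello world, my name is ", "Amanda")
# Parsing the encoded document confirms that it is well-formed XML.
root = ET.fromstring(doc.encode("utf-8"))
print(root.find(f".//{{{SSML_NS}}}prosody").get("volume"))
```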

If you wanted to synthesize using SSML from our API instead of from MRCP, you could refer to a saved SSML document that can be accessed somewhere - something like this should work, assuming you saved the above SSML to the specified location:

    SimpleTTSClient -voice amanda -s http://192.168.10.10/ssml/Test.ssml

In this case, the -s indicates that the request contains SSML instead of plain text (and therefore the default settings in client_property.conf will not be used).

I hope this answers your question.

Hi Nigel,

Thanks for the information.  We were able to make it louder (x-loud), so I hope the customer is pleased.  Enjoy the weekend!

We built into our application a way for the customer to adjust the speed of speaking.  Is there a way to include volume along with speed?  Do we have to use two statements or is there a way to combine them?

We use the following:

TTS prefix <prosody rate="0.9">

TTS suffix </prosody>

If I change the prefix to <prosody rate="0.9"> <prosody volume="x-loud"> I end up with SSML parser errors in my logs.  I think it's a field length issue, and since my key developer is incapacitated, I don't think it will be addressed anytime soon.  Is there a way to combine the two into one statement, for example: <prosody rate="0.9"; volume="x-loud">

Just wondering...still might run into a field length issue.

EDIT:  Figured it out.  <prosody rate="0.9" volume="x-loud">  Works perfectly.
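
For anyone hitting the same parser errors, here is a small Python sketch for checking prefix/suffix pairs before putting them into an application. It assumes the suffix is the single closing tag </prosody>, and the helper names `wrap` and `is_well_formed` are invented for this example. It only tests XML well-formedness with a bare <speak> wrapper and does not call the TTS engine, so treat it as a sanity check rather than a full SSML validation. One plausible cause of the errors, rather than field length, is that two opening tags paired with one closing tag leave an element unclosed.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def wrap(text: str, prefix: str, suffix: str) -> str:
    """Surround the escaped text with the configured TTS prefix and suffix."""
    return f"<speak>{prefix}{escape(text)}{suffix}</speak>"


def is_well_formed(ssml: str) -> bool:
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(ssml)
        return True
    except ET.ParseError:
        return False


SUFFIX = "</prosody>"  # assumption: one closing tag is used as the suffix

# Both attributes on one element, as in the EDIT above.
combined = wrap("Hello world", '<prosody rate="0.9" volume="x-loud">', SUFFIX)
# Two opening tags but only one closing tag.
two_open = wrap("Hello world", '<prosody rate="0.9"> <prosody volume="x-loud">', SUFFIX)
# Attributes separated by a semicolon, which XML does not allow.
semicolon = wrap("Hello world", '<prosody rate="0.9"; volume="x-loud">', SUFFIX)

for name, doc in [("combined", combined), ("two_open", two_open), ("semicolon", semicolon)]:
    print(name, is_well_formed(doc))
```

Only the combined form parses. If nesting two elements were ever needed, the suffix would have to close both, for example </prosody></prosody>.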