Field | Type | Label | Description |
standard_audio_format | AudioFormat.StandardAudioFormat | Standard audio format |
|
sample_rate_hertz | OptionalInt32 | Sample rate of the audio data, in hertz. This field is mandatory for raw PCM audio and optional for the other formats. For audio formats with headers, this value is ignored and the value from the file header is used instead. Default: 8000 (8 kHz) |
Specification for the audio format
Not all standard formats are supported in all cases. Different operations
may natively handle a subset of the total audio formats.
Name | Number | Description |
STANDARD_AUDIO_FORMAT_UNSPECIFIED | 0 | |
STANDARD_AUDIO_FORMAT_LINEAR16 | 1 | Uncompressed 16-bit signed little-endian samples (Linear PCM). |
STANDARD_AUDIO_FORMAT_ULAW | 2 | 8-bit audio samples using G.711 PCMU/mu-law. |
STANDARD_AUDIO_FORMAT_ALAW | 3 | 8-bit audio samples using G.711 PCMA/a-law. |
STANDARD_AUDIO_FORMAT_WAV | 4 | WAV formatted audio |
STANDARD_AUDIO_FORMAT_FLAC | 5 | FLAC formatted audio |
STANDARD_AUDIO_FORMAT_MP3 | 6 | MP3 formatted audio |
STANDARD_AUDIO_FORMAT_OPUS | 7 | OPUS formatted audio |
STANDARD_AUDIO_FORMAT_M4A | 8 | M4A formatted audio |
STANDARD_AUDIO_FORMAT_MP4 | 9 | Audio packed into MP4 container |
STANDARD_AUDIO_FORMAT_NO_AUDIO_RESOURCE | 100 | Explicitly indicate that no audio resource should be allocated |
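For the headerless formats in the table above, buffer sizes follow directly from the sample width and `sample_rate_hertz`. The following is a minimal sketch, assuming single-channel audio; the `StandardAudioFormat` enum, `BYTES_PER_SAMPLE` and `bytes_for_duration` are illustrative names written for this example, not part of the generated API.

```python
from enum import IntEnum


class StandardAudioFormat(IntEnum):
    """Illustrative mirror of the headerless entries in the table above."""
    LINEAR16 = 1
    ULAW = 2
    ALAW = 3


# Sample widths as stated in the table: 16-bit linear PCM, 8-bit G.711.
BYTES_PER_SAMPLE = {
    StandardAudioFormat.LINEAR16: 2,
    StandardAudioFormat.ULAW: 1,
    StandardAudioFormat.ALAW: 1,
}

DEFAULT_SAMPLE_RATE_HERTZ = 8000  # documented default for sample_rate_hertz


def bytes_for_duration(fmt, duration_ms, sample_rate_hertz=DEFAULT_SAMPLE_RATE_HERTZ):
    """Return the byte count of `duration_ms` of mono audio in format `fmt`."""
    samples = sample_rate_hertz * duration_ms // 1000
    return samples * BYTES_PER_SAMPLE[fmt]
```

With these assumptions, one second of mu-law audio at the default rate occupies 8000 bytes, and a 20 ms frame of 16 kHz LINEAR16 occupies 640 bytes.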
Field | Type | Label | Description |
audio_id | string | ID of the requested audio. This can be the session_id, to request the inbound audio resource |
|
audio_channel | OptionalInt32 | For multi-channel audio, this is the channel number being referenced. Range is 0 to N. Channel 0 is used if not specified |
|
audio_start | OptionalInt32 | Number of milliseconds from the beginning of the audio at which the returned audio starts. Default is the beginning of the audio |
|
audio_length | OptionalInt32 | Maximum number of milliseconds to return. A zero value returns all available audio (from the requested start point). Default is all audio from the start point |
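The interaction between `audio_start` and `audio_length` can be sketched as a small window calculation. This is an illustration only: `pull_window` is a hypothetical helper, and clamping the window to the available audio is an assumption rather than documented server behavior.

```python
def pull_window(available_ms, audio_start=None, audio_length=None):
    """Return (start_ms, end_ms) of the audio a pull request would cover.

    Unset audio_start means "from the beginning"; an unset or zero
    audio_length means "all available audio from the start point".
    """
    start = 0 if audio_start is None else audio_start
    start = min(max(start, 0), available_ms)  # clamping is an assumption
    if not audio_length:  # None or 0
        return start, available_ms
    return start, min(start + audio_length, available_ms)
```

For example, with 5000 ms of audio available, `pull_window(5000, 1000, 2000)` covers 1000 ms to 3000 ms, while `pull_window(5000, 1000, 0)` covers everything from 1000 ms onward.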
Field | Type | Label | Description |
audio_data | bytes | Binary audio data that was requested |
|
audio_channel | OptionalInt32 | For multi-channel audio, this is the channel number being referenced. |
|
final_data_chunk | bool | Large audio is split across multiple AudioPullResponse messages. final_data_chunk is set to true on the last message |
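Because large audio arrives as several AudioPullResponse messages, a client typically concatenates `audio_data` until it sees `final_data_chunk`. A minimal sketch follows; the `AudioPullResponse` dataclass is a stand-in for the generated message class, and `collect_audio` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class AudioPullResponse:
    """Stand-in for the generated message class (illustrative only)."""
    audio_data: bytes
    final_data_chunk: bool = False


def collect_audio(responses: Iterable[AudioPullResponse]) -> bytes:
    """Concatenate chunks until the message flagged as final is seen."""
    buffer = bytearray()
    for response in responses:
        buffer.extend(response.audio_data)
        if response.final_data_chunk:
            return bytes(buffer)
    raise ValueError("stream ended before final_data_chunk was set")
```

Raising on a stream that ends without the final flag is a design choice of this sketch; a client could equally return the partial audio.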
Note that SessionInboundAudioFormatRequest should be called before
using this message, so that the audio format is defined.
Field | Type | Label | Description |
audio_data | bytes | Binary audio data to be added to the audio resource |
Field | Type | Label | Description |
audio_push | AudioPushRequest | Streamed binary audio data to be added to the session audio resource |
|
audio_pull | AudioPullRequest | Returns a block of audio data from an audio resource. |
Field | Type | Label | Description |
interaction_id | string | ASR interaction to associate this dtmf_key with |
|
dtmf_key | string | DTMF key press to be added to interaction stream for processing. Valid keys are 0-9, A-F, *, # |
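A client may want to reject invalid keys before sending them. The check below is a sketch based on the key set listed above (0-9, A-F, *, #); it assumes one key per message and upper-case letters only, neither of which the table states explicitly. `is_valid_dtmf_key` is a hypothetical helper.

```python
import re

# Key set taken from the dtmf_key description: 0-9, A-F, *, #
_DTMF_KEY = re.compile(r"[0-9A-F*#]")


def is_valid_dtmf_key(key: str) -> bool:
    """Return True if `key` is exactly one of the documented DTMF keys."""
    return _DTMF_KEY.fullmatch(key) is not None
```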
Event can be either a VadEvent or a SessionEvent
Field | Type | Label | Description |
vad_event | VadEvent | Event returned from VAD (AudioManager) |
|
session_event | SessionEvent | Session Events used to report errors to the API user |
Field | Type | Label | Description |
grammar_url | string | A grammar URL to be loaded |
|
inline_grammar_text | string | A string containing the raw grammar text |
|
global_grammar_label | string | Deprecated. Reference to a previously defined "global" grammar. Note: label must consist of letters, digits, hyphens, underscores only |
|
session_grammar_label | string | Reference to a previously defined "session" grammar. Note: label must consist of letters, digits, hyphens, underscores only |
|
builtin_voice_grammar | Grammar.BuiltinGrammar | Reference to a "builtin" voice grammar |
|
builtin_dtmf_grammar | Grammar.BuiltinGrammar | Reference to a "builtin" DTMF grammar |
|
label | OptionalString | Optional label assigned to grammar, used for error reporting. Note: label must consist of letters, digits, hyphens, underscores only |
Name | Option |
global_grammar_label | true |
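The label rule repeated above (letters, digits, hyphens, underscores only) can be checked client-side before a request is sent. A minimal sketch, assuming ASCII letters and a non-empty label; `is_valid_label` is a hypothetical helper.

```python
import re

# Letters, digits, hyphens and underscores only (ASCII assumed).
_LABEL = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_label(label: str) -> bool:
    """Return True if `label` satisfies the documented character rule."""
    return _LABEL.fullmatch(label) is not None
```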
A single event with a timestamp, to be logged to the database.
The LogEvent is returned via the reporting API.
Field | Type | Label | Description |
time_stamp | google.protobuf.Timestamp | Log Event Timestamp (UTC) |
|
event | Event | Can be either a VadEvent or a SessionEvent |
Field | Type | Label | Description |
phrase_list_label | string | The label of a previously defined global phrase list |
Field | Type | Label | Description |
interaction_id | OptionalString | Optional interaction object being referenced |
|
status_message | google.rpc.Status | String containing event information |
Message used to signal events over the course of Voice Activity Detection
processing.
The audio_offset will signify at what point within the session audio
resource the event occurred.
Field | Type | Label | Description |
interaction_id | string | The interaction object being referenced |
|
vad_event_type | VadEvent.VadEventType | The type of event this message represents |
|
audio_offset | OptionalInt32 | The offset in milliseconds from the beginning of the audio resource at which this event occurred |
Note that all builtin grammars are language-specific
Name | Number | Description |
BUILTIN_GRAMMAR_UNSPECIFIED | 0 | Undefined built-in grammar |
BUILTIN_GRAMMAR_BOOLEAN | 1 | "yes" => true |
BUILTIN_GRAMMAR_CURRENCY | 2 | "one dollar ninety seven" => USD1.97 |
BUILTIN_GRAMMAR_DATE | 3 | "march sixteenth nineteen seventy nine" => 19790316 |
BUILTIN_GRAMMAR_DIGITS | 4 | "one two three four" => 1234 |
BUILTIN_GRAMMAR_NUMBER | 5 | "three point one four one five nine two six" => 3.1415926 |
BUILTIN_GRAMMAR_PHONE | 6 | "eight five eight seven oh seven oh seven oh seven" => 8587070707 |
BUILTIN_GRAMMAR_TIME | 7 | "six o clock" => 0600 |
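To make the shape of these interpretations concrete, the sketch below reproduces the documented DIGITS and PHONE examples with a plain word-to-digit lookup. It is not the engine's grammar and says nothing about how the built-in grammars are implemented; `digits_interpretation` and its word table are illustrative, English-only assumptions.

```python
# English digit words, including the spoken "oh" for zero.
_DIGIT_WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7", "eight": "8",
    "nine": "9",
}


def digits_interpretation(utterance: str) -> str:
    """Map a sequence of spoken digit words to a digit string."""
    return "".join(_DIGIT_WORDS[word] for word in utterance.lower().split())
```

Calling `digits_interpretation("one two three four")` yields `"1234"`, matching the DIGITS row above.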
List of all grammar modes.
Name | Number | Description |
GRAMMAR_MODE_UNSPECIFIED | 0 | Mode not specified |
GRAMMAR_MODE_VOICE | 1 | Voice mode |
GRAMMAR_MODE_DTMF | 2 | DTMF mode |
GRAMMAR_MODE_VOICE_AND_DTMF | 3 | Voice and DTMF mode. Deprecated - should not be used |
List of all Interaction statuses.
Name | Number | Description |
INTERACTION_STATUS_UNSPECIFIED | 0 | This status is not expected or valid. Indicates an empty message. |
INTERACTION_STATUS_CREATED | 1 | Interaction has been created; no additional processing has been done yet. |
INTERACTION_STATUS_RESULTS_READY | 2 | Interaction results are ready. Most results are sent automatically when ready. |
INTERACTION_STATUS_CLOSED | 3 | Used to indicate a successfully closed interaction |
INTERACTION_STATUS_CANCELED | 4 | Used to indicate a successfully canceled interaction |
INTERACTION_STATUS_ASR_WAITING_ON_GRAMMARS | 101 | Audio processing not started yet. Waiting on grammars to be loaded. |
INTERACTION_STATUS_ASR_WAITING_ON_BARGIN | 102 | Audio processing not started yet. Waiting on BARGE_IN event from VAD |
INTERACTION_STATUS_ASR_STREAM_REQUEST | 103 | Initial status or post BARGE_IN status of interaction, stream processing not started yet |
INTERACTION_STATUS_ASR_STOP_REQUESTED_WAITING | 104 | Batch mode, waiting for STOP request |
INTERACTION_STATUS_ASR_STREAM_STARTED | 105 | ASR started reading stream |
INTERACTION_STATUS_ASR_STREAM_STOP_REQUESTED | 106 | Set in case of Finalize request |
INTERACTION_STATUS_ASR_WAITING_FOR_CPA_AMD_RESPONSE | 107 | Used for CPA and AMD interactions |
INTERACTION_STATUS_ASR_TIMEOUT | 109 | No VAD event or interaction finalize, ASR processing timed out |
INTERACTION_STATUS_ASR_WAITING_ON_BARGEOUT | 110 | Audio processing started. Waiting on BARGE_OUT event from VAD |
INTERACTION_STATUS_TTS_PROCESSING | 200 | TTS processing |
INTERACTION_STATUS_GRAMMAR_PARSE_WAITING_ON_GRAMMARS | 400 | Grammar(s) loading in progress, interaction not started yet |
INTERACTION_STATUS_GRAMMAR_PARSE_REQUESTED_PROCESSING | 401 | Interaction processing in progress |
INTERACTION_STATUS_NORMALIZE_TEXT_REQUESTED_PROCESSING | 500 | Normalize Text |
INTERACTION_STATUS_ASR_TRANSCRIPTION_WAITING_ON_PHRASE_LISTS | 600 | Asr Transcription |
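The status numbers in the table above fall into blocks by interaction kind (1xx for ASR, 200 for TTS, 4xx for grammar parse, and so on). The sketch below groups a status number accordingly. The block boundaries are inferred from the table, not stated by the API, and `status_family` and its return values are hypothetical.

```python
def status_family(number: int) -> str:
    """Classify an interaction status number by its numeric block."""
    if 0 <= number < 100:
        return "general"
    if 100 <= number < 200:
        return "asr"
    if 200 <= number < 300:
        return "tts"
    if 400 <= number < 500:
        return "grammar_parse"
    if 500 <= number < 600:
        return "normalize_text"
    if 600 <= number < 700:
        return "asr_transcription"
    return "unknown"
```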
List of all interaction sub-types for ASR Interactions
Name | Number | Description |
INTERACTION_SUB_TYPE_UNSPECIFIED | 0 | This is not a valid type. Indicates an empty gRPC message. |
INTERACTION_SUB_TYPE_GRAMMAR_BASED_CPA | 1 | Call process analysis interaction type with grammars |
INTERACTION_SUB_TYPE_GRAMMAR_BASED_AMD | 2 | Answering machine detection interaction type with grammars |
INTERACTION_SUB_TYPE_ENHANCED_TRANSCRIPTION | 3 | ASR transcription interaction with multiple grammars |
INTERACTION_SUB_TYPE_CONTINUOUS_TRANSCRIPTION | 4 | Deprecated - ASR continuous transcription |
INTERACTION_SUB_TYPE_TRANSCRIPTION_WITH_NORMALIZATION | 5 | Deprecated - Transcription result with normalized text. Normalization can be enabled for different interaction types/subtypes in parallel, e.g. GRAMMAR_BASED_TRANSCRIPTION can have a normalization setting as well. If needed for filtering, this flag will be added separately. |
INTERACTION_SUB_TYPE_GRAMMAR_BASED_TRANSCRIPTION | 6 | Transcription interaction type with grammars |
List of all Interaction types.
Name | Number | Description |
INTERACTION_TYPE_UNSPECIFIED | 0 | This is not a valid type. Indicates an empty gRPC message. |
INTERACTION_TYPE_ASR | 2 | ASR processing interaction |
INTERACTION_TYPE_TTS | 3 | TTS processing interaction |
INTERACTION_TYPE_GRAMMAR_PARSE | 4 | Validate grammar content. Can be url, inline or file reference (label) |
INTERACTION_TYPE_NORMALIZATION | 5 | Normalization interaction type |
INTERACTION_TYPE_CPA | 6 | Call process analysis interaction type |
INTERACTION_TYPE_AMD | 7 | Answering machine detection interaction type |
INTERACTION_TYPE_ASR_TRANSCRIPTION | 8 | ASR transcription interaction type |
Name | Number | Description |
VAD_EVENT_TYPE_UNSPECIFIED | 0 | Undefined VAD event type |
VAD_EVENT_TYPE_BEGIN_PROCESSING | 1 | VAD begins processing audio |
VAD_EVENT_TYPE_BARGE_IN | 2 | Barge-in occurred; the audio that will be processed by the ASR starts here. This notification can be used, for example, to stop prompt playback |
VAD_EVENT_TYPE_END_OF_SPEECH | 3 | End-of-speech occurred, no further audio will be processed by VAD for the specified interaction. If the setting VadSettings.auto_finalize_on_eos is true, the ASR will immediately finish processing audio at this point |
VAD_EVENT_TYPE_BARGE_IN_TIMEOUT | 4 | VAD timed out waiting for audio barge-in (start-of-speech). The audio manager will no longer process audio for this interaction. |
VAD_EVENT_TYPE_END_OF_SPEECH_TIMEOUT | 5 | VAD timed out waiting for audio barge-out (end-of-speech). The audio manager will no longer process audio for this interaction. |
VAD_EVENT_TYPE_END_OF_AUDIO_BEFORE_BARGEIN | 6 | VAD has reached audio_consume_max_ms before barge-in has occurred. |
VAD_EVENT_TYPE_END_OF_AUDIO_AFTER_BARGEIN | 7 | VAD has reached audio_consume_max_ms before barge-out (end-of-speech) has occurred. |
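A client can dispatch on these event types, for example to stop prompt playback on barge-in. The following is a sketch only: the `VadEventType` enum mirrors the numbers in the table, while `client_action` and the action names it returns are hypothetical and not part of the API.

```python
from enum import IntEnum


class VadEventType(IntEnum):
    """Illustrative mirror of the table above."""
    UNSPECIFIED = 0
    BEGIN_PROCESSING = 1
    BARGE_IN = 2
    END_OF_SPEECH = 3
    BARGE_IN_TIMEOUT = 4
    END_OF_SPEECH_TIMEOUT = 5
    END_OF_AUDIO_BEFORE_BARGEIN = 6
    END_OF_AUDIO_AFTER_BARGEIN = 7


# Events whose descriptions say no further audio is processed.
_VAD_FINISHED = {
    VadEventType.END_OF_SPEECH,
    VadEventType.BARGE_IN_TIMEOUT,
    VadEventType.END_OF_SPEECH_TIMEOUT,
}

# Events raised when audio_consume_max_ms is reached.
_AUDIO_LIMIT = {
    VadEventType.END_OF_AUDIO_BEFORE_BARGEIN,
    VadEventType.END_OF_AUDIO_AFTER_BARGEIN,
}


def client_action(event_type: VadEventType) -> str:
    """Suggest a client-side reaction to a VAD event (names are made up)."""
    if event_type == VadEventType.BARGE_IN:
        return "stop_prompt_playback"
    if event_type in _VAD_FINISHED:
        return "vad_finished"
    if event_type in _AUDIO_LIMIT:
        return "audio_limit_reached"
    return "continue"
```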
Field | Type | Label | Description |
status_message | google.rpc.Status | String containing event information |
Field | Type | Label | Description |
settings_type | GlobalGetSettingsRequest.GetSettingsType | Used to specify the type of settings to request |
Field | Type | Label | Description |
language | string | The language selector for the specified grammar, e.g. "en-US", "de-DE", or dialect-independent "en", "de", etc. |
|
grammar_label | string | Reference label for global grammar. Note: label must consist of letters, digits, hyphens, underscores only |
|
grammar_url | string | A grammar URL to be loaded |
|
inline_grammar_text | string | A string containing the raw grammar text |
|
grammar_settings | GrammarSettings | Optional grammar settings applied to this request |
Field | Type | Label | Description |
status | google.rpc.Status | The status of the grammar load |
|
mode | GrammarMode | The mode of the loaded grammar |
|
label | string | The label for the loaded grammar |
Field | Type | Label | Description |
phrases | string | repeated | A list of word and phrase "hints" that make the transcriber more likely to recognize them. This can improve accuracy for specific words and phrases, for example when users typically speak specific commands. It can also be used to add words or phrases to the transcriber's vocabulary. |
phase_list_label | string | A label that can be used to reference this list within a transcription request |
|
language | string | The language selector describing which ASR resource will process the request, e.g. "en-US", "de-DE", or dialect-independent "en", "de", etc. Note that phrase lists are inherently language-independent, so this field is only used to direct which language-dependent resource will process the phrase load request |
|
phrase_list_settings | PhraseListSettings | Optional settings specifying boost options for phrases |
Field | Type | Label | Description |
status | google.rpc.Status | The status of the phrase list load. |
|
label | string | The label for the phrase list. |
Field | Type | Label | Description |
correlation_id | OptionalString | Optional unique reference per request message. A UUID value will be auto generated if not supplied by client |
|
deployment_id | string | Valid deployment identifier (UUID) to associate the request with |
|
operator_id | string | UUID related to the operator (entity or person making request) |
|
global_load_grammar_request | GlobalLoadGrammarRequest | Load a globally defined grammar |
|
global_load_phrase_list | GlobalLoadPhraseList | Load a globally defined phrase list |
|
global_get_settings_request | GlobalGetSettingsRequest | Get specified global default settings |
|
session_settings | SessionSettings | Default session settings |
|
interaction_settings | InteractionSettings | Deprecated. Default interaction settings |
|
grammar_settings | GrammarSettings | Deprecated. Default grammar settings |
|
recognition_settings | RecognitionSettings | Deprecated. Default recognition settings |
|
normalization_settings | NormalizationSettings | Deprecated. Default normalization settings |
|
vad_settings | VadSettings | Deprecated. Default VAD settings |
|
cpa_settings | CpaSettings | Deprecated. Default CPA settings |
|
amd_settings | AmdSettings | Deprecated. Default tone detection settings |
|
audio_consume_settings | AudioConsumeSettings | Deprecated. Default audio consume settings |
|
logging_settings | LoggingSettings | Default logging settings |
|
phrase_list_settings | PhraseListSettings | Deprecated. Optional settings specifying boost options for phrases |
|
reset_settings | ResetSettings | Will reset all of the settings to default |
Name | Option |
interaction_settings | true |
grammar_settings | true |
recognition_settings | true |
normalization_settings | true |
vad_settings | true |
cpa_settings | true |
amd_settings | true |
audio_consume_settings | true |
phrase_list_settings | true |
Field | Type | Label | Description |
correlation_id | OptionalString | Reference to corresponding request correlation_id |
|
global_event | GlobalEvent | Global event notification (typically errors) |
|
global_settings | GlobalSettings | Global default settings (which were requested) |
|
global_grammar | GlobalLoadGrammarResponse | Deprecated. Response to a global load grammar request |
|
global_phrase_list | GlobalLoadPhraseListResponse | Response to a global load phrase list request |
Name | Option |
global_grammar | true |
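Since GlobalResponse echoes the request's `correlation_id`, a client that sets its own value can match responses to requests on the shared stream. A minimal sketch using plain dictionaries in place of the generated message classes; `with_correlation_id` and `responses_for` are hypothetical helpers.

```python
import uuid


def with_correlation_id(request: dict) -> dict:
    """Return a copy of `request` that always carries a correlation_id."""
    tagged = dict(request)
    if not tagged.get("correlation_id"):
        # Generated client-side here so the caller knows the value up front.
        tagged["correlation_id"] = str(uuid.uuid4())
    return tagged


def responses_for(request: dict, responses: list) -> list:
    """Select the responses whose correlation_id matches the request."""
    wanted = request["correlation_id"]
    return [r for r in responses if r.get("correlation_id") == wanted]
```

Leaving `correlation_id` unset is also valid according to the table above, in which case a UUID is generated for the request automatically.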
Container for all session and interaction related (global) settings
Field | Type | Label | Description |
session_settings | SessionSettings | Default session settings |
|
interaction_settings | InteractionSettings | Deprecated. Default interaction settings |
|
grammar_settings | GrammarSettings | Deprecated. Default grammar settings |
|
recognition_settings | RecognitionSettings | Deprecated. Default recognition settings |
|
normalization_settings | NormalizationSettings | Deprecated. Default normalization settings |
|
vad_settings | VadSettings | Deprecated. Default VAD settings |
|
cpa_settings | CpaSettings | Deprecated. Default CPA settings |
|
amd_settings | AmdSettings | Deprecated. Default tone detection settings |
|
audio_consume_settings | AudioConsumeSettings | Deprecated. Default audio consume settings |
|
logging_settings | LoggingSettings | Default logging settings |
|
phrase_list_settings | PhraseListSettings | Deprecated. Optional settings specifying boost options for phrases |
|
tts_settings | TtsSettings | Deprecated. Optional settings for Text-To-Speech (TTS) |
Name | Option |
interaction_settings | true |
grammar_settings | true |
recognition_settings | true |
normalization_settings | true |
vad_settings | true |
cpa_settings | true |
amd_settings | true |
audio_consume_settings | true |
phrase_list_settings | true |
tts_settings | true |
Name | Number | Description |
GET_SETTINGS_TYPE_UNSPECIFIED | 0 | |
GET_SETTINGS_TYPE_SESSION | 1 | SessionSettings type |
GET_SETTINGS_TYPE_INTERACTION | 2 | InteractionSettings type |
GET_SETTINGS_TYPE_GRAMMAR | 3 | GrammarSettings type |
GET_SETTINGS_TYPE_RECOGNITION | 4 | RecognitionSettings type |
GET_SETTINGS_TYPE_NORMALIZATION | 5 | NormalizationSettings type |
GET_SETTINGS_TYPE_VAD | 6 | VadSettings type |
GET_SETTINGS_TYPE_CPA | 7 | CpaSettings type |
GET_SETTINGS_TYPE_AMD | 8 | AmdSettings type |
GET_SETTINGS_TYPE_AUDIO_CONSUME | 9 | AudioConsumeSettings type |
GET_SETTINGS_TYPE_LOGGING_SETTINGS | 10 | LoggingSettings type |
GET_SETTINGS_TYPE_PHRASE_LIST | 11 | PhraseList type |
Field | Type | Label | Description |
interaction_id | string | The interaction object being referenced |
Field | Type | Label | Description |
interaction_id | string | The interaction object being referenced |
Field | Type | Label | Description |
interaction_id | string | The interaction object being referenced |
|
close_status | google.rpc.Status | Status of request |
Field | Type | Label | Description |
interaction_id | string | The interaction object being referenced |
Field | Type | Label | Description |
interaction_id | string | The interaction object being referenced |
|
close_status | google.rpc.Status | Status of request |
Field | Type | Label | Description |
amd_settings | AmdSettings | Parameters for this interaction |
|
audio_consume_settings | AudioConsumeSettings | Optional settings specifying audio to process for interaction |
|
vad_settings | VadSettings | Optional settings related to voice activity detection |
|
general_interaction_settings | GeneralInteractionSettings | Optional settings related to all interactions |
Field | Type | Label | Description |
interaction_id | string | Interaction ID (uuid) that can be used during subsequent AMD processing |
Field | Type | Label | Description |
language | string | The language selector for the specified grammars, e.g. "en-US", "de-DE", or dialect-independent "en", "de", etc. |
|
grammars | Grammar | repeated | List of grammars to use, one for each root grammar to activate |
grammar_settings | GrammarSettings | Optional grammar settings to apply to this interaction |
|
recognition_settings | RecognitionSettings | Optional recognition settings for this interaction |
|
vad_settings | VadSettings | Optional settings related to voice activity detection |
|
audio_consume_settings | AudioConsumeSettings | Optional settings specifying audio to process for interaction |
|
general_interaction_settings | GeneralInteractionSettings | Optional settings related to all interactions |
Field | Type | Label | Description |
interaction_id | string | Interaction ID (uuid) that can be used during subsequent ASR processing |
Field | Type | Label | Description |
cpa_settings | CpaSettings | Parameters for this interaction |
|
audio_consume_settings | AudioConsumeSettings | Optional settings specifying audio to process for interaction |
|
vad_settings | VadSettings | Optional settings related to voice activity detection |
|
general_interaction_settings | GeneralInteractionSettings | Optional settings related to all interactions |
Field | Type | Label | Description |
interaction_id | string | Interaction ID (uuid) that can be used during subsequent CPA processing |
Field | Type | Label | Description |
language | string | The language selector for the specified grammars, e.g. "en-US", "de-DE", or dialect-independent "en", "de", etc. |
|
grammars | Grammar | repeated | List of grammars to use, one for each root grammar to activate |
grammar_settings | GrammarSettings | Optional grammar settings to apply to this interaction |
|
input_text | string | Input text to be parsed against specified grammar[s] |
|
parse_timeout_ms | OptionalInt32 | Maximum milliseconds to allow for a grammar parse. If this is exceeded, a timeout error will be raised. Range: 0-10000000 (~166 minutes). Default: 10000 (10 seconds) |
|
general_interaction_settings | GeneralInteractionSettings | Optional settings related to all interactions |
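The documented range and default for `parse_timeout_ms` can be enforced before the request is built. A minimal sketch; `resolve_parse_timeout_ms` is a hypothetical helper, and rejecting out-of-range values with an exception is a choice made for this example.

```python
DEFAULT_PARSE_TIMEOUT_MS = 10_000    # documented default (10 seconds)
MAX_PARSE_TIMEOUT_MS = 10_000_000    # documented upper bound (~166 minutes)


def resolve_parse_timeout_ms(value=None) -> int:
    """Return the timeout to use, applying the default and range check."""
    if value is None:
        return DEFAULT_PARSE_TIMEOUT_MS
    if not 0 <= value <= MAX_PARSE_TIMEOUT_MS:
        raise ValueError(
            f"parse_timeout_ms must be within 0-{MAX_PARSE_TIMEOUT_MS}, got {value}"
        )
    return value
```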
Field | Type | Label | Description |
interaction_id | string | The interaction object being referenced by the request |
Field | Type | Label | Description |
language | string | Language to use for normalization (e.g. en-us) |
|
transcript | string | All words in single string. |
|
normalization_settings | NormalizationSettings | Optional settings specifying whether text normalization step should be performed on output of this interaction. |
|
general_interaction_settings | GeneralInteractionSettings | Optional settings related to all interactions |
Field | Type | Label | Description |
interaction_id | string | Interaction ID (UUID) that can be used during subsequent Normalize Text processing |
Field | Type | Label | Description |
language | string | Transcription language selector for this request, e.g. "en-US", "de-DE", or dialect-independent "en", "de", etc. |
|
phrases | TranscriptionPhraseList | repeated | Optional phrase lists for interaction |
continuous_utterance_transcription | OptionalBool | If `true`, transcription performs continuous recognition (continuing to wait for and process audio even if the user pauses speaking) until the client closes the input stream (gRPC API). This may return multiple FinalResult callback messages. If `false`, the recognizer detects a single spoken utterance. When it detects that the user has paused or stopped speaking, it returns a FinalResult callback and ceases recognition. It will return no more than one FinalResult. Default: false |
|
recognition_settings | RecognitionSettings | Optional recognition settings for this interaction |
|
vad_settings | VadSettings | Optional settings related to voice activity detection |
|
audio_consume_settings | AudioConsumeSettings | Optional settings specifying audio to process for interaction |
|
normalization_settings | NormalizationSettings | Optional settings specifying whether text normalization step should be performed on output of this interaction. |
|
phrase_list_settings | PhraseListSettings | Optional settings specifying boost options for phrases |
|
general_interaction_settings | GeneralInteractionSettings | Optional settings related to all interactions |
|
embedded_grammars | Grammar | repeated | Optional list of grammars to use during transcription. When a grammar matches during transcription, the semantic results of the grammar are also returned |
embedded_grammar_settings | GrammarSettings | Optional grammar settings for embedded grammars |
|
language_model_name | OptionalString | Optional name of a language model (decoder) to use when processing transcription. Default is to not specify this, allowing engine to use default language decoder |
|
acoustic_model_name | OptionalString | Optional name of an acoustic model (encoder) to use when processing transcription. Default is to not specify this, allowing engine to use default language encoder |
|
enable_postprocessing | OptionalString | Optional custom postprocessing to enhance decoder functionality. Default is to not specify this, allowing engine to use default postprocessing |
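The effect of `continuous_utterance_transcription` on the client side is that it either stops after the first FinalResult or keeps collecting them until the stream ends. The sketch below models callbacks as `(kind, payload)` tuples, which is an assumption made for illustration; `collect_final_results` is a hypothetical helper, not an API call.

```python
def collect_final_results(callbacks, continuous=False):
    """Collect FinalResult payloads from an iterable of (kind, payload).

    With continuous=False at most one final result is expected, so
    collection stops at the first one. With continuous=True, collection
    runs until the callback stream is exhausted.
    """
    finals = []
    for kind, payload in callbacks:
        if kind != "final_result":
            continue  # e.g. partial results or VAD events
        finals.append(payload)
        if not continuous:
            break
    return finals
```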
Field | Type | Label | Description |
interaction_id | string | Interaction ID (uuid) that can be used during subsequent ASR processing |
Field | Type | Label | Description |
language | string | Synthesis language for this request (e.g.: "en-US", "de-DE", etc.) |
|
ssml_request | InteractionCreateTtsRequest.SsmlUrlRequest | SSML type request and parameters |
|
inline_request | InteractionCreateTtsRequest.InlineTtsRequest | Inline TTS definition (text and optional parameters) |
|
audio_format | AudioFormat | Audio format to be generated by TTS synthesis. Note: this is not configurable at Session or Global level, since it is explicitly required for each interaction request. |
|
synthesis_timeout_ms | OptionalInt32 | Optional timeout limiting the maximum time allowed for a synthesis. Default: 5000 milliseconds |
|
general_interaction_settings | GeneralInteractionSettings | Optional settings related to all interactions |
Inline TTS definition (text and optional parameters)
Field | Type | Label | Description |
text | string | Text to synthesize; can be simple text or SSML |
|
tts_inline_synthesis_settings | TtsInlineSynthesisSettings | Optional settings for voice synthesis. |
|
ssl_verify_peer | OptionalBool | Enables or disables the verification of a peer's certificate using a local certificate authority file upon HTTPS requests. Set to false (disabled) to skip verification for trusted sites. Default: true |
Field | Type | Label | Description |
ssml_url | string | URL from which to fetch synthesis request SSML |
|
ssl_verify_peer | OptionalBool | Enables or disables the verification of a peer's certificate using a local certificate authority file upon HTTPS requests. Set to false (disabled) to skip verification for trusted sites. Default: true |
Field | Type | Label | Description |
interaction_id | string | Interaction ID (uuid) that can be used during subsequent TTS processing |
Field | Type | Label | Description |
interaction_id | string | The interaction object being referenced |
Field | Type | Label | Description |
interaction_create_amd | InteractionCreateAmdRequest | Create AMD interaction request |
|
interaction_create_asr | InteractionCreateAsrRequest | Create ASR interaction request |
|
interaction_create_cpa | InteractionCreateCpaRequest | Create CPA interaction request |
|
interaction_create_transcription | InteractionCreateTranscriptionRequest | Create transcription interaction request |
|
interaction_create_tts | InteractionCreateTtsRequest | Create TTS interaction request |
|
interaction_create_grammar_parse | InteractionCreateGrammarParseRequest | Create a grammar parse request |
|
interaction_begin_processing | InteractionBeginProcessingRequest | Interaction begin processing |
|
interaction_finalize_processing | InteractionFinalizeProcessingRequest | Interaction finalize processing |
|
interaction_request_results | InteractionRequestResultsRequest | Interaction request results |
|
interaction_create_normalize_text | InteractionCreateNormalizeTextRequest | Create a normalize text request |
|
interaction_cancel | InteractionCancelRequest | Interaction cancel |
|
interaction_close | InteractionCloseRequest | Explicit request to close interaction |
Field | Type | Label | Description |
interaction_id | string | The interaction object being referenced |
Field | Type | Label | Description |
interaction_id | string | The interaction object being referenced |
|
interaction_results | Result | Requested results |
Field | Type | Label | Description |
phrases | string | repeated | Optional list of word and phrase "hints" that make the transcriber more likely to recognize them. This can improve accuracy for specific words and phrases, for example when users typically speak specific commands. It can also be used to add words or phrases to the transcriber's vocabulary. |
global_phrase_list | PhraseList | Optional reference to previously defined global phrase list(s) |
|
session_phrase_list | PhraseList | Optional reference to previously defined session phrase list(s) |
LumenVox Service
The LumenVox API can be used to access various speech resources,
such as Automatic Speech Recognition (ASR), Text-To-Speech (TTS),
Transcription, Call-Progress-Analysis (CPA), etc.
Method Name | Request Type | Response Type | Description |
Session | SessionRequest stream | SessionResponse stream | Creates a new session and establishes a bidirectional stream that can process all messages on this single connection |
Global | GlobalRequest stream | GlobalResponse stream | Manages globally defined (deployment-level) objects |
Wrapper message for optional `bool`.
The JSON representation for `OptionalBool` is JSON `true` and `false`.
Field | Type | Label | Description |
value | bool | The bool value. |
Wrapper message for optional `bytes`.
The JSON representation for `OptionalBytes` is JSON string.
Field | Type | Label | Description |
value | bytes | The bytes value. |
Wrapper message for optional `double`.
The JSON representation for `OptionalDouble` is JSON number.
Field | Type | Label | Description |
value | double | The double value. |
Wrapper message for optional `float`.
The JSON representation for `OptionalFloat` is JSON number.
Field | Type | Label | Description |
value | float | The float value. |
Wrapper message for optional `int32`.
The JSON representation for `OptionalInt32` is JSON number.
Field | Type | Label | Description |
value | int32 | The int32 value. |
Wrapper message for optional `int64`.
The JSON representation for `OptionalInt64` is JSON string.
Field | Type | Label | Description |
value | int64 | The int64 value. |
Wrapper message for optional `string`.
The JSON representation for `OptionalString` is JSON string.
Field | Type | Label | Description |
value | string | The string value. |
Wrapper message for optional `uint32`.
The JSON representation for `OptionalUInt32` is JSON number.
Field | Type | Label | Description |
value | uint32 | The uint32 value. |
Wrapper message for optional `uint64`.
The JSON representation for `OptionalUInt64` is JSON string.
Field | Type | Label | Description |
value | uint64 | The uint64 value. |
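The JSON representations listed for the wrapper types differ by width: the 64-bit integer wrappers are JSON strings, while the 32-bit and floating-point wrappers are JSON numbers. The sketch below illustrates that mapping with the standard library. `wrapper_to_json` is a hypothetical helper; encoding `OptionalBytes` as base64 follows the standard proto3 JSON mapping for `bytes` rather than anything stated in this section.

```python
import base64
import json


def wrapper_to_json(wrapper_type: str, value) -> str:
    """Render a wrapper's value the way its JSON representation is described."""
    if wrapper_type in ("OptionalInt64", "OptionalUInt64"):
        # 64-bit integers are carried as JSON strings.
        return json.dumps(str(value))
    if wrapper_type == "OptionalBytes":
        # bytes become a base64-encoded JSON string (proto3 JSON mapping).
        return json.dumps(base64.b64encode(value).decode("ascii"))
    # bool, string, 32-bit integers, float and double map directly.
    return json.dumps(value)
```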
Result returned from an AMD interaction.
Field | Type | Label | Description |
amd_result | AsrGrammarResult | AMD result in the form of an ASR-type message. |
Structure to hold data provided from ASR as final results
Field | Type | Label | Description |
asr_result_meta_data | AsrResultMetaData | Raw ASR output used to produce semantic interpretations |
|
semantic_interpretations | SemanticInterpretation | repeated | List of all possible semantic interpretations for given transcript. |
Result returned from an ASR interaction.
Field | Type | Label | Description |
n_bests | AsrGrammarResult | repeated | List of the N best possible matches provided via ASR. |
input_mode | string | The modality of the input, for example, speech, dtmf, etc. |
|
language | string | Language defined when creating the interaction. |
Raw transcript of words decoded by ASR
Field | Type | Label | Description |
words | Word | repeated | All words in Phrase so far. |
transcript | string | All words in single string. |
|
start_time_ms | int32 | Time in milliseconds since beginning of audio stream where recognition starts. |
|
duration_ms | int32 | Length of transcript in milliseconds. |
|
confidence | uint32 | Overall confidence of the entire transcript. |
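Given these fields, a client can derive where a transcript ends in the audio stream and pick the highest-confidence entry from an `n_bests` list. The sketch below uses plain dictionaries shaped like the messages above; `end_time_ms` and `best_match` are hypothetical helpers, and the service may already return `n_bests` in ranked order.

```python
def end_time_ms(meta: dict) -> int:
    """Offset in the audio stream at which the transcript ends."""
    return meta["start_time_ms"] + meta["duration_ms"]


def best_match(n_bests: list):
    """Return the n-best entry with the highest overall confidence."""
    if not n_bests:
        return None
    return max(n_bests, key=lambda r: r["asr_result_meta_data"]["confidence"])
```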
Result returned from a CPA interaction.
Field | Type | Label | Description |
cpa_result | AsrGrammarResult | CPA result in the form of an ASR-type message. |
Callback sent when a final interaction result is ready.
Field | Type | Label | Description |
interaction_id | string | The interaction object being referenced |
|
final_result | Result | Final result for the specified interaction. Null if status error > 0 |
|
final_result_status | FinalResultStatus | Final status of the interaction |
|
status | google.rpc.Status | Status code produced. Returns 0 on success. This value comes from the internal result message and should be passed to the caller. |
Result returned from grammar parse interaction.
Field | Type | Label | Description |
input_text | string | Input string used during grammar parse |
|
semantic_interpretations | SemanticInterpretation | repeated | List of all possible semantic interpretations for given text. |
input_mode | string | The modality of the input, for example, speech, dtmf, etc. |
|
language | string | Language defined when creating the interaction. |
|
has_next_transition | bool | Set to true if further input following the input text could still be valid for the interaction grammars. |
Token used in Inverse Text Normalization
Field | Type | Label | Description |
tag | string | Type of token. |
|
data | google.protobuf.Struct | All data in token |
One segment (one or more words) that is part of a result phrase.
Field | Type | Label | Description |
original_segment | string | Input word used to create segment. |
|
original_word_indices | uint32 | repeated | Index to words in original input. |
vocalization | string | Output after Inverse Text normalization. |
|
token | InverseTextNormalizationToken | Token information used in Inverse Text normalization. |
|
redaction | RedactionData | Data added for redaction. |
|
final | string | Final output for segment. |
Result returned from a Normalize Text interaction.
Field | Type | Label | Description |
transcript | string | Input string used for the text normalization request |
|
normalized_result | NormalizedResult | Normalized result message |
Result returned from a Normalize Text request. Used in either a Transcription
interaction or a Text Normalization interaction.
Field | Type | Label | Description |
segments | NormalizationSegment | repeated | All segments in result. |
verbalized | string | Output after Inverse Text normalization. |
|
verbalized_redacted | string | Output after Inverse Text normalization and redacted. |
|
final | string | Final output after Inverse Text normalization and punctuation and capitalization_normalization |
|
final_redacted | string | Final output after Inverse Text normalization, punctuation and capitalization_normalization, and redaction |
Callback sent when a partial interaction result is available.
Field | Type | Label | Description |
interaction_id | string | The interaction object being referenced |
|
partial_result | Result | Partial result for the specified interaction |
More detail on Redacted tokens
Field | Type | Label | Description |
personal_identifiable_information | bool | Redacted Personal Identifiable Information. |
|
entity | string | Type of redaction |
|
score | float | Redaction Score |
Contains results of various types that may be returned
Field | Type | Label | Description |
asr_interaction_result | AsrInteractionResult | Results for an ASR interaction |
|
transcription_interaction_result | TranscriptionInteractionResult | Results for a transcription interaction |
|
grammar_parse_interaction_result | GrammarParseInteractionResult | Results for a grammar parse interaction |
|
tts_interaction_result | TtsInteractionResult | Results for a TTS interaction |
|
normalize_text_result | NormalizeTextResult | Result for a Normalize Text interaction |
|
amd_interaction_result | AmdInteractionResult | Result for an AMD interaction |
|
cpa_interaction_result | CpaInteractionResult | Result for a CPA interaction |
Semantic Interpretation of an ASR result
Field | Type | Label | Description |
interpretation | google.protobuf.Struct | Structure containing Semantic Interpretation. |
|
interpretation_json | string | JSON string containing Semantic Interpretation. |
|
grammar_label | string | The label of the grammar used to generate this Semantic Interpretation. |
|
confidence | uint32 | Value from 0 to 1000 indicating how confident the ASR is that the result is a correct match |
|
tag_format | string | Tag format of the grammar used to generate this Semantic Interpretation. |
|
input_text | string | Raw input text for the interpretation |
Description of some artifact within the synthesis
Field | Type | Label | Description |
name | string | Name of the artifact being referenced |
|
offset_ms | uint32 | Offset in milliseconds to the named artifact |
Warning generated by a synthesis
Field | Type | Label | Description |
message | string | String containing warning message returned from synthesizer |
|
line | OptionalInt32 | Optional line indicating where the issue was detected |
Result returned from a transcription interaction.
Field | Type | Label | Description |
n_bests | TranscriptionResult | repeated | List of the N best possible matches provided via ASR. |
language | string | Language defined when creating the interaction. |
Structure to hold data provided from ASR as final results
Field | Type | Label | Description |
asr_result_meta_data | AsrResultMetaData | Raw ASR output which includes the transcript of the audio. |
|
normalized_result | NormalizedResult | If results are to be normalized, Normalized Result is added here. |
|
grammar_results | AsrGrammarResult | repeated | If enhanced transcription with grammars is used results are added here. |
srt_file | bytes | If SRT generation is enabled, the SRT file is added here. |
|
vtt_file | bytes | If VTT generation is enabled, the VTT file is added here. |
|
blended_score | OptionalFloat | Optional blended quality transcription score |
Contains a TTS interaction result.
Field | Type | Label | Description |
audio_format | AudioFormat | Format of returned audio. |
|
audio_length_ms | uint32 | Length of generated audio data. |
|
sentence_offsets_ms | uint32 | repeated | Offsets in milliseconds to where in audio buffer each synthesized sentence begins. |
word_offsets_ms | uint32 | repeated | Offsets in milliseconds to where in audio buffer each synthesized word begins. |
ssml_mark_offsets | SynthesisOffset | repeated | Offsets to where in audio buffer each synthesized SSML mark begins. |
voice_offsets | SynthesisOffset | repeated | Offsets to where in audio buffer each synthesized voice begins. |
synthesis_warnings | SynthesisWarning | repeated | List of any Synthesis warnings. |
One word that is part of an ASR result.
Field | Type | Label | Description |
start_time_ms | int32 | Time in milliseconds since beginning of audio where word starts. |
|
duration_ms | int32 | Length of word in milliseconds. |
|
word | string | String output of word. |
|
confidence | uint32 | Value 0 to 1000 on how confident the result is. |
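The timing and confidence fields combine in a straightforward way. The sketch below uses a plain dataclass as a stand-in for the generated `Word` message. The helper names and the threshold of 500 are illustrative assumptions, not part of the API.

```python
from dataclasses import dataclass

@dataclass
class Word:
    """Stand-in for the Word message, using the field names from the table above."""
    start_time_ms: int
    duration_ms: int
    word: str
    confidence: int  # 0 to 1000

def end_time_ms(word):
    """Time in milliseconds at which the word ends."""
    return word.start_time_ms + word.duration_ms

def low_confidence(words, threshold=500):
    """Words whose confidence falls below the (assumed) threshold."""
    return [w.word for w in words if w.confidence < threshold]

words = [Word(0, 320, "twenty", 940), Word(320, 280, "two", 410)]
flagged = low_confidence(words)
```

Dividing `confidence` by 1000 gives a fraction between 0 and 1 for code that expects probabilities.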
List of Interaction FinalResult Statuses
Name | Number | Description |
FINAL_RESULT_STATUS_UNSPECIFIED | 0 | No final status specified |
FINAL_RESULT_STATUS_NO_INPUT | 1 | No voice audio detected within the audio. The final_result field in FinalResult will be empty |
FINAL_RESULT_STATUS_ERROR | 2 | An error occurred that stopped processing |
FINAL_RESULT_STATUS_CANCELLED | 3 | Interaction cancelled or closed before results can be returned |
FINAL_RESULT_STATUS_TRANSCRIPTION_MATCH | 11 | A transcription result was returned |
FINAL_RESULT_STATUS_TRANSCRIPTION_CONTINUOUS_MATCH | 12 | A transcription “intermediate” final result was returned |
FINAL_RESULT_STATUS_TRANSCRIPTION_GRAMMAR_MATCHES | 13 | A transcription result was returned, which contains one or more embedded grammar matches |
FINAL_RESULT_STATUS_TRANSCRIPTION_PARTIAL_MATCH | 14 | An enhanced transcription result was returned, but no SISR |
FINAL_RESULT_STATUS_GRAMMAR_MATCH | 21 | A complete grammar match was returned |
FINAL_RESULT_STATUS_GRAMMAR_NO_MATCH | 22 | No result could be obtained for the audio with the supplied grammars |
FINAL_RESULT_STATUS_GRAMMAR_PARTIAL_MATCH | 23 | Raw text is returned, but could not be parsed with the supplied grammars |
FINAL_RESULT_STATUS_AMD_TONE | 31 | An AMD interaction found one or more tones within the audio |
FINAL_RESULT_STATUS_AMD_NO_TONES | 32 | An AMD interaction found no tones within the audio |
FINAL_RESULT_STATUS_CPA_RESULT | 41 | A CPA interaction result was returned |
FINAL_RESULT_STATUS_CPA_SILENCE | 42 | No voice audio was detected for a CPA interaction |
FINAL_RESULT_STATUS_TTS_READY | 51 | TTS audio is available to pull |
FINAL_RESULT_STATUS_TEXT_NORMALIZE_RESULT | 61 | An inverse text normalization result was returned for a NormalizeText interaction. |
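The status numbers in the table appear to be grouped by tens per interaction type. The sketch below mirrors the table in a local `IntEnum` and uses that grouping to route a status. The grouping is an observation from the numbering, not a documented guarantee, and the family names are illustrative.

```python
from enum import IntEnum

class FinalResultStatus(IntEnum):
    """Local mirror of the table above (FINAL_RESULT_STATUS_ prefix dropped)."""
    UNSPECIFIED = 0
    NO_INPUT = 1
    ERROR = 2
    CANCELLED = 3
    TRANSCRIPTION_MATCH = 11
    TRANSCRIPTION_CONTINUOUS_MATCH = 12
    TRANSCRIPTION_GRAMMAR_MATCHES = 13
    TRANSCRIPTION_PARTIAL_MATCH = 14
    GRAMMAR_MATCH = 21
    GRAMMAR_NO_MATCH = 22
    GRAMMAR_PARTIAL_MATCH = 23
    AMD_TONE = 31
    AMD_NO_TONES = 32
    CPA_RESULT = 41
    CPA_SILENCE = 42
    TTS_READY = 51
    TEXT_NORMALIZE_RESULT = 61

# Assumption: the tens digit identifies the interaction family.
FAMILIES = {
    0: "general",
    1: "transcription",
    2: "grammar",
    3: "amd",
    4: "cpa",
    5: "tts",
    6: "text_normalize",
}

def status_family(number):
    """Map a status number to its family; raises ValueError for unknown numbers."""
    return FAMILIES[FinalResultStatus(number) // 10]
```

In production code the enum generated from the .proto file should be preferred over a hand-written mirror like this one.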
Field | Type | Label | Description |
deployment_id | string | Deployment identifier associated to the session |
|
session_id | string | Valid session identifier to attach to the request |
|
operator_id | string | UUID related to the operator (entity or person making request) |
Currently no fields defined
Field | Type | Label | Description |
close_status | google.rpc.Status | Status of request |
Currently no fields defined
Field | Type | Label | Description |
close_status | google.rpc.Status | Status of request |
Field | Type | Label | Description |
deployment_id | string | Deployment identifier to associate the session with |
|
session_id | OptionalString | Optional unique reference for session (must be UUID) A UUID value will be auto generated if not supplied by client |
|
operator_id | string | UUID related to the operator (entity or person making request) |
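Because `session_id` must be a UUID when supplied, a client may want to validate or generate it before sending the request. A minimal sketch follows. `resolve_session_id` is a hypothetical helper; when no ID is supplied, the service generates one itself as described above.

```python
import uuid

def resolve_session_id(session_id=None):
    """Return a canonical UUID string, generating one when none is supplied.

    Raises ValueError if the supplied value is not a valid UUID.
    """
    if session_id is None:
        return str(uuid.uuid4())
    return str(uuid.UUID(session_id))

supplied = resolve_session_id("12345678-1234-5678-1234-567812345678")
generated = resolve_session_id()
```

Generating the ID on the client is optional. It is mainly useful when the same identifier needs to be logged before the session create response arrives.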
Currently no fields defined
Field | Type | Label | Description |
audio_format | AudioFormat | Parameters for the inbound audio resource associated with the session |
Field | Type | Label | Description |
language | string | The language selector for the specified grammar e.g.: "en-US", "de-DE" or dialect independent "en", "de", etc. |
|
grammar_label | string | Reference label for session grammar Note: label must consist of letters, digits, hyphens, underscores only |
|
grammar_url | string | A grammar URL to be loaded |
|
inline_grammar_text | string | A string containing the raw grammar text |
|
grammar_settings | GrammarSettings | Optional grammar settings applied to this request |
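The character rule for `grammar_label` can be checked on the client before the load request is sent. A minimal sketch, where `is_valid_grammar_label` is a hypothetical helper and rejecting the empty string is an assumption:

```python
import re

# Letters, digits, hyphens and underscores only, as stated in the note above.
_LABEL_PATTERN = re.compile(r"[A-Za-z0-9_-]+")

def is_valid_grammar_label(label):
    """True when the label contains only the permitted characters."""
    return _LABEL_PATTERN.fullmatch(label) is not None
```

For example, `is_valid_grammar_label("digits_en-US")` passes, while a label containing a space or a dot does not.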
Field | Type | Label | Description |
status | google.rpc.Status | The status of the grammar load |
|
mode | GrammarMode | The mode of the loaded grammar |
|
label | string | The label for the loaded grammar |
Field | Type | Label | Description |
phrases | string | repeated | A list of strings containing word and phrase "hints" so that the transcriber is more likely to recognize them. This can be used to improve the accuracy for specific words and phrases, for example, if specific commands are typically spoken by the user. This can also be used to add additional words or phrases to the transcriber's vocabulary. |
phase_list_label | string | A label that can be used to reference this list within a transcription request |
|
language | string | The language selector describing which ASR resource will process request e.g.: "en-US", "de-DE" or dialect independent "en", "de", etc. Note that phrase lists are inherently language-independent, so this field is only used to direct which language-dependent resource will process the phrase load request |
Field | Type | Label | Description |
status | google.rpc.Status | The status of the phrase list load. |
|
label | string | The label for the phrase list. |
Field | Type | Label | Description |
correlation_id | OptionalString | Optional unique reference per request message. A UUID value will be auto generated if not supplied by client |
|
session_request | SessionRequestMessage | For session-specific requests |
|
audio_request | AudioRequestMessage | For audio-specific requests |
|
interaction_request | InteractionRequestMessage | For interaction-specific requests |
|
dtmf_request | DtmfPushRequest | For DTMF events (part of ASR interaction) |
Field | Type | Label | Description |
session_create | SessionCreateRequest | Creates a new session and returns its ID and session related messages through response streamed callback messages |
|
session_audio_format | SessionInboundAudioFormatRequest | Defines the inbound audio format for the session. Must be assigned before any audio is sent and cannot later be changed. |
|
session_attach | SessionAttachRequest | Attach to an existing session |
|
session_close | SessionCloseRequest | Explicit request to close session |
|
session_set_settings | SessionSettings | Set settings to be configured for session. |
|
session_get_settings | SessionGetSettingsRequest | Get settings for session. |
|
session_load_grammar | SessionLoadGrammarRequest | Load session-specific grammar |
|
session_load_phrase_list | SessionLoadPhraseList | Load session-specific phrase list |
|
session_cancel | SessionCancelRequest | Explicit request to cancel all session related interactions and processing in progress |
Field | Type | Label | Description |
session_id | OptionalString | Session identifier (will be returned from initial call) |
|
correlation_id | OptionalString | Optional reference to corresponding request correlation_id |
|
vad_event | VadEvent | VAD event notification |
|
final_result | FinalResult | Final result notification |
|
partial_result | PartialResult | Partial result notification |
|
session_event | SessionEvent | Session event notification (typically errors) |
|
session_close | SessionCloseResponse | Response for explicit session close request |
|
audio_pull | AudioPullResponse | Response to audio pull request |
|
session_get_settings | SessionSettings | Response to get settings for session. |
|
interaction_create_amd | InteractionCreateAmdResponse | Response to create AMD interaction request |
|
interaction_create_asr | InteractionCreateAsrResponse | Response to create ASR interaction request |
|
interaction_create_cpa | InteractionCreateCpaResponse | Response to create CPA interaction request |
|
interaction_create_tts | InteractionCreateTtsResponse | Response to create TTS interaction request |
|
interaction_create_grammar_parse | InteractionCreateGrammarParseResponse | Response to create a grammar parse request |
|
interaction_create_normalize_text | InteractionCreateNormalizeTextResponse | Response to create a normalize text request |
|
interaction_get_settings | InteractionSettings | Response to interaction get settings request |
|
interaction_request_results | InteractionRequestResultsResponse | Response to interaction request results |
|
interaction_create_transcription | InteractionCreateTranscriptionResponse | Response to create Transcription interaction request |
|
session_phrase_list | SessionLoadPhraseListResponse | Response for session load phrase list request |
|
session_grammar | SessionLoadGrammarResponse | Response for session load grammar request |
|
interaction_cancel | InteractionCancelResponse | Response to interaction cancel |
|
interaction_close | InteractionCloseResponse | Response to explicit request to close interaction |
|
session_cancel | SessionCancelResponse | Response for explicit session cancel request |
Settings related to answering machine / tone detection
and other tones such as FAX or SIT tone
Field | Type | Label | Description |
amd_enable | OptionalBool | Enable answering machine beep detection Default: true |
|
amd_input_text | OptionalString | Which string is returned in response to an AMD beep detection Default: AMD |
|
fax_enable | OptionalBool | Enable fax tone detection Default: true |
|
fax_input_text | OptionalString | Which string is returned in response to a fax tone detection Default: FAX |
|
sit_enable | OptionalBool | Enable SIT detection Default: true |
|
sit_reorder_local_input_text | OptionalString | Which string is returned in response to specified SIT detection Default: "SIT REORDER LOCAL" |
|
sit_vacant_code_input_text | OptionalString | Which string is returned in response to specified SIT detection Default: "SIT VACANT CODE" |
|
sit_no_circuit_local_input_text | OptionalString | Which string is returned in response to specified SIT detection Default: "SIT NO CIRCUIT LOCAL" |
|
sit_intercept_input_text | OptionalString | Which string is returned in response to specified SIT detection Default: "SIT INTERCEPT" |
|
sit_reorder_distant_input_text | OptionalString | Which string is returned in response to specified SIT detection Default: "SIT REORDER DISTANT" |
|
sit_no_circuit_distant_input_text | OptionalString | Which string is returned in response to specified SIT detection Default: "SIT NO CIRCUIT DISTANT" |
|
sit_other_input_text | OptionalString | Which string is returned in response to specified SIT detection Default: "SIT OTHER" |
|
busy_enable | OptionalBool | Enable busy tone detection Default: true |
|
busy_input_text | OptionalString | Which string is returned in response to a busy tone detection Default: BUSY |
|
tone_detect_timeout_ms | OptionalInt32 | Maximum number of milliseconds the tone detection algorithm should listen for input before timing out. |
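With the default strings listed above, a client can map the returned text back to a tone category. The sketch below assumes none of the `*_input_text` settings were overridden. The category names and the `tone_kind` helper are illustrative.

```python
# Default strings taken from the settings table above.
DEFAULT_TONE_TEXT = {
    "AMD": "answering machine beep",
    "FAX": "fax tone",
    "BUSY": "busy tone",
}

def tone_kind(input_text):
    """Classify a returned tone string, assuming default *_input_text values."""
    if input_text.startswith("SIT "):
        return "special information tone"
    return DEFAULT_TONE_TEXT.get(input_text, "unknown")
```

If any of the strings are customized for a session or interaction, the lookup table needs to be built from the configured values instead.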
Field | Type | Label | Description |
audio_channel | OptionalInt32 | For multi-channel audio, this is the channel number being referenced. Range is from 0 to N. Default channel 0 will be used if not specified |
|
audio_consume_mode | AudioConsumeSettings.AudioConsumeMode | Select which audio mode is used Default: AUDIO_CONSUME_MODE_STREAMING |
|
stream_start_location | AudioConsumeSettings.StreamStartLocation | Specify where audio consume starts when "streaming" mode is used Default: STREAM_START_LOCATION_STREAM_BEGIN |
|
start_offset_ms | OptionalInt32 | Optional offset in milliseconds to adjust the audio start point. Range: Value in milliseconds, positive or negative. Default: 0 |
|
audio_consume_max_ms | OptionalInt32 | Optional maximum audio to process. Value of 0 means process all audio sent Range: Positive value in milliseconds Default: 0 |
Settings related to Call Progress Analysis
Field | Type | Label | Description |
human_residence_time_ms | OptionalInt32 | Maximum amount of speech for human residence classification Default: 1800 |
|
human_business_time_ms | OptionalInt32 | Maximum amount of speech for human business classification. Human speech lasting longer than this will be classified as unknown speech Default: 3000 |
|
unknown_silence_timeout_ms | OptionalInt32 | Maximum amount of silence to allow before human speech is detected. If this timeout is reached, the classification will be returned as unknown silence. Default: 5000 |
|
max_time_from_connect_ms | OptionalInt32 | Maximum amount of time the CPA algorithm is allowed to perform human or machine classification. Only use this if you understand the implications (lower accuracy). Default: 0 (disabled) |
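The two speech thresholds can be read as cut-off points on the length of the initial speech. The sketch below is a simplified reading of the descriptions above, not the actual CPA algorithm. The boundary handling (inclusive limits) and the returned labels are assumptions.

```python
def classify_speech(speech_ms,
                    human_residence_time_ms=1800,
                    human_business_time_ms=3000):
    """Classify a stretch of speech by its length, using the default thresholds."""
    if speech_ms <= human_residence_time_ms:
        return "human residence"
    if speech_ms <= human_business_time_ms:
        return "human business"
    # Speech longer than the business threshold is reported as unknown speech.
    return "unknown speech"
```

Under these assumptions a short greeting of about one second falls under the residence threshold, while a long recorded greeting exceeds both.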
Settings that apply to all interaction types
Field | Type | Label | Description |
secure_context | OptionalBool | When true (enabled), certain ASR and TTS data will not be logged. This provides additional security for sensitive data such as account numbers and passwords that may be used within applications. Anywhere that potentially sensitive data would have been recorded will be replaced with _SUPPRESSED in the logs. Default: false |
|
custom_interaction_data | OptionalString | Optional data (i.e. could be string, JSON, delimited lists, etc.) set by user, for external purposes. Not used by LumenVox |
|
logging_tag | OptionalString | repeated | Optional tag for logging. Reserved for future use. |
Settings related to SRGS grammar usage
Field | Type | Label | Description |
default_tag_format | GrammarSettings.TagFormat | The default tag-format for loaded grammars if not otherwise specified. Default: TAG_FORMAT_SEMANTICS_1_2006 |
|
ssl_verify_peer | OptionalBool | Enables or disables the verification of a peer's certificate using a local certificate authority file upon HTTPS requests. Set to false (disabled) to skip verification for trusted sites. Default: true |
|
load_grammar_timeout_ms | OptionalInt32 | Maximum milliseconds to allow for grammar loading. If this is exceeded, a timeout error will be raised. Range 1000-2147483647 (~600 hours) Default: 200000 (~3.333 minutes) |
|
compatibility_mode | OptionalInt32 | Compatibility mode for certain media server operations. Only change from the default if you understand the consequences. Range: 0-1 Default: 0 |
Describes the interaction specific settings
Field | Type | Label | Description |
general_interaction_settings | GeneralInteractionSettings | Optional settings related to all interactions |
|
audio_consume_settings | AudioConsumeSettings | Optional settings defining how audio is consumed/used by the interaction |
|
vad_settings | VadSettings | Optional Voice Activity Detection settings for interaction |
|
grammar_settings | GrammarSettings | Optional grammar settings for interaction |
|
recognition_settings | RecognitionSettings | Optional recognition settings for interaction |
|
cpa_settings | CpaSettings | Optional Call Progress Analysis settings for interaction |
|
amd_settings | AmdSettings | Optional Tone Detection (AMD) settings for interaction |
|
normalization_settings | NormalizationSettings | Optional settings specifying which text normalization steps should be performed on output of interaction. |
|
phrase_list_settings | PhraseListSettings | Optional settings specifying boost options for phrases |
|
tts_settings | TtsSettings | Optional settings for Text-To-Speech (TTS) |
Field | Type | Label | Description |
logging_verbosity | LoggingSettings.LoggingVerbosity | Logging verbosity setting Default: LOGGING_VERBOSITY_INFO |
Settings related to text Normalization results
Field | Type | Label | Description |
enable_inverse_text | OptionalBool | Set to true to enable inverse text normalization, going from spoken form → written form (e.g. twenty two → 22) Default: false |
|
enable_punctuation_capitalization | OptionalBool | Set to true to enable punctuation and capitalization normalization Default: false |
|
enable_redaction | OptionalBool | Set to true to enable redaction of sensitive information Default: false |
|
request_timeout_ms | OptionalInt32 | Number of milliseconds text normalization should await results before timing out Possible values: 0 - 1000000 Default: 5000 |
|
enable_srt_generation | OptionalBool | Set to true to enable generation of SRT file (SubRip file format) Default: false |
|
enable_vtt_generation | OptionalBool | Set to true to enable generation of VTT file (WebVTT file format) Default: false |
Field | Type | Label | Description |
probability_boost | OptionalInt32 | Probability score boost raises or lowers the probability the words or phrases are recognized. A negative value lowers the probability the word is returned in results. Range: -10.0 to 5.0 (very probable) Default: 0 |
Settings related to recognition results
Field | Type | Label | Description |
max_alternatives | OptionalInt32 | Maximum number of recognition hypotheses to be returned. Specifically, the maximum number of `NBest` messages within each `AsrInteractionResult`. Default: 1 |
|
trim_silence_value | OptionalInt32 | Controls how aggressively the ASR trims leading silence from input audio. Range: 0 (very aggressive) to 1000 (no silence trimmed) Default: 970 |
|
enable_partial_results | OptionalBool | When true, partial results callbacks will be enabled for the interaction Default: false |
|
confidence_threshold | OptionalInt32 | Confidence threshold. Range 0 to 1000; applies to grammar based asr interactions Default: 0 |
|
decode_timeout | OptionalInt32 | Number of milliseconds the ASR should await results before timing out Possible values: 0 - 100,000,000 Default: 10,000,000 (~2.7 hours) |
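The documented ranges lend themselves to a client-side check before settings are sent. A minimal sketch follows, with settings represented as a plain dict. The `out_of_range` helper is hypothetical, and only the fields with an explicit range in the table above are covered.

```python
# Inclusive ranges copied from the table above.
RANGES = {
    "trim_silence_value": (0, 1000),
    "confidence_threshold": (0, 1000),
    "decode_timeout": (0, 100_000_000),  # milliseconds
}

def out_of_range(settings):
    """Return the names of settings whose values fall outside their range."""
    problems = []
    for name, value in settings.items():
        if name in RANGES:
            low, high = RANGES[name]
            if not low <= value <= high:
                problems.append(name)
    return problems
```

Settings without a documented range, such as `max_alternatives`, are passed through unchecked in this sketch.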
No additional fields needed
Optional settings to be used for the duration of a session for all
interactions created within the session.
These can be overridden at the interaction level
All settings are optional, not specifying a setting at any level means the
default or parent context's value will be used. As a rule, only settings
that need to be changed from default should be set explicitly
Field | Type | Label | Description |
archive_session | OptionalBool | Whether the session data should be archived when closed, for tuning and other diagnostic purposes Default: false |
|
custom_session_data | OptionalString | Optional data (e.g. a string, JSON, delimited lists, etc.) set by user, for external purposes. Not used by LumenVox |
|
interaction_settings | InteractionSettings | Optional settings to be used for duration of session for all interactions created. These can be overridden at the interaction level |
|
call_id | OptionalString | Optional call identifier string used for CDR tracking. This is often associated with the telephony call-id value or equivalent. |
|
channel_id | OptionalString | repeated | Optional channel identifier string used for CDR tracking. This is often associated with a telephony/MRCP channel SDP value or equivalent. |
archive_session_delay_seconds | OptionalInt32 | Optional delay interval for archiving, in seconds. Session data will persist longer in Redis before being written to the database |
|
logging_tag | OptionalString | repeated | Optional tag for logging. Reserved for future use. |
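The fallback rule described above (interaction value, else session value, else default) can be pictured as layered dictionaries. This is a sketch of the concept only, not how the service implements it. The defaults shown are the RecognitionSettings defaults from this reference, and `None` stands in for an unset optional wrapper.

```python
# Defaults copied from the RecognitionSettings table.
DEFAULTS = {
    "max_alternatives": 1,
    "trim_silence_value": 970,
    "enable_partial_results": False,
    "confidence_threshold": 0,
}

def effective_settings(session=None, interaction=None):
    """Resolve settings: interaction overrides session, session overrides defaults."""
    resolved = dict(DEFAULTS)
    for layer in (session or {}, interaction or {}):
        # A value of None means "not set at this level" and is skipped.
        resolved.update({k: v for k, v in layer.items() if v is not None})
    return resolved

resolved = effective_settings(
    session={"max_alternatives": 3, "confidence_threshold": 400},
    interaction={"max_alternatives": 5, "confidence_threshold": None},
)
```

Here `max_alternatives` resolves to the interaction value, `confidence_threshold` to the session value, and the remaining settings to their defaults.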
Field | Type | Label | Description |
voice | OptionalString | Optional voice (if using simple text, or if not specified within SSML) |
|
synth_emphasis_level | OptionalString | The strength of the emphasis used in the voice during synthesis. Possible Values: "strong", "moderate", "none" or "reduced". |
|
synth_prosody_pitch | OptionalString | The pitch of the audio being synthesized. Possible Values: A number followed by "Hz", a relative change, or one of the following values: "x-low", "low", "medium", "high", "x-high", or "default". See the SSML standard for details. |
|
synth_prosody_contour | OptionalString | The contour of the audio being synthesized. Possible Values: Please refer to the SSML standard on pitch contour for details. |
|
synth_prosody_rate | OptionalString | The speaking rate of the audio being synthesized. Possible Values: A relative change or "x-slow", "slow", "medium", "fast", "x-fast", or "default". See the SSML standard for details. |
|
synth_prosody_duration | OptionalString | The duration of time it will take for the synthesized text to play. Possible Values: A time, such as "250ms" or "3s". |
|
synth_prosody_volume | OptionalString | The volume of the audio being synthesized. Possible Values: A number, a relative change or one of: "silent", "x-soft", "soft", "medium", "loud", "x-loud", or "default". See the SSML specification for details. |
|
synth_voice_age | OptionalString | The age of the voice used for synthesis. Possible Values: A non-negative integer. |
|
synth_voice_gender | OptionalString | The default TTS gender to use if none is specified. Possible Values: Either neutral (which uses the default), male, or female. |
Field | Type | Label | Description |
voice_mappings | TtsSettings.VoiceMappingsEntry | repeated | Voice mappings allow alternative voice names to map to LumenVox voices. The key is the language of the voice mappings. |
Field | Type | Label | Description |
key | string |
|
|
value | VoiceMapping |
|
Settings related to Voice Activity Detection (VAD)
VAD is used to begin audio processing once a person starts speaking and
is used to detect when a person has stopped speaking
Field | Type | Label | Description |
use_vad | OptionalBool | When `false`, all audio as specified in AudioConsumeSettings is used for processing. In streaming audio mode, InteractionFinalizeProcessing() would need to be called to finish processing. When `true`, VAD is used to determine when the speaker starts and stops speaking. When using VAD in batch audio mode, the engine will look for the beginning of speech within the designated audio to process and will stop processing audio when end of speech is found, which may mean that not all of the loaded audio is processed. |
|
barge_in_timeout_ms | OptionalInt32 | Maximum silence, in ms, allowed while waiting for user input (barge-in) before a timeout is reported. Range: -1 (infinite) to positive integer number of milliseconds Default: -1 (infinite) |
|
end_of_speech_timeout_ms | OptionalInt32 | After barge-in, STREAM_STATUS_END_SPEECH_TIMEOUT will occur if end-of-speech is not detected in the time specified by this property. This is different from eos_delay_ms; this value represents the total amount of time a caller is permitted to speak after barge-in is detected. Range: a positive number of milliseconds or -1 (infinite) Default: -1 (infinite) |
|
noise_reduction_mode | VadSettings.NoiseReductionMode | Determines noise reduction mode. Default: NOISE_REDUCTION_MODE_DEFAULT |
|
bargein_threshold | OptionalInt32 | A higher value makes the VAD more sensitive towards speech, and less sensitive towards non-speech, which means that the VAD algorithm must be more sure that the audio is speech before triggering barge in. Raising the value will reject more false positives/noises. However, it may mean that some speech that is on the borderline may be rejected. This value should not be changed from the default without significant tuning and verification. Range: Integer value from 0 to 100 Default: 50 |
|
eos_delay_ms | OptionalInt32 | Milliseconds of silence after speech before processing begins. Range: A positive integer number of milliseconds Default: 800 |
|
snr_sensitivity | OptionalInt32 | Determines how much louder the speaker must be than the background noise in order to trigger barge-in. The smaller this value, the easier it will be to trigger barge-in. Range: Integer range from 0 to 100 Default: 50 |
|
stream_init_delay | OptionalInt32 | Accurate VAD depends on a good estimation of the acoustic environment. The VAD module uses the first couple frames of audio to estimate the acoustic environment, such as noise level. The length of this period is defined by this parameter. Range: A positive integer number of milliseconds. Default: 100 |
|
volume_sensitivity | OptionalInt32 | The volume required to trigger barge-in. The smaller the value, the more sensitive barge-in will be. This is primarily used to deal with poor echo cancellation. By setting this value higher (less sensitive) prompts that are not properly cancelled will be less likely to falsely cancel barge-in. Range: Integer range from 0 to 100 Default: 50 |
|
wind_back_ms | OptionalInt32 | The length of audio to be wound back at the beginning of voice activity. This is used primarily to counter instances where barge-in does not accurately capture the very start of speech. The resolution of this parameter is 1/8 of a second. Range: A positive integer number of milliseconds Default: 480 |
Field | Type | Label | Description |
voicePairs | VoiceMapping.VoicePairsEntry | repeated | A map of custom voice pairs. The key is the voice that will be requested by the API user. The value is the LumenVox voice that will be used for synthesis. Use the key "default" to set a default voice for the given language. |
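A lookup against these mappings might look as follows. This sketch models `voice_mappings` as a dict keyed by language, each holding a dict of voice pairs. All voice names are hypothetical, and returning the requested name unchanged when no mapping applies is an assumption about fallback behavior.

```python
def resolve_voice(voice_mappings, language, requested_voice):
    """Map a requested voice name to the voice used for synthesis."""
    pairs = voice_mappings.get(language, {})
    if requested_voice in pairs:
        return pairs[requested_voice]
    # The key "default" supplies a default voice for the language.
    return pairs.get("default", requested_voice)

# Hypothetical mapping table for illustration only.
mappings = {
    "en-US": {"legacy_female": "Voice_A", "default": "Voice_B"},
}
```

With this table, a request for `legacy_female` in `en-US` resolves to `Voice_A`, and any other `en-US` voice name resolves to `Voice_B`.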
Field | Type | Label | Description |
key | string |
|
|
value | string |
|
Name | Number | Description |
AUDIO_CONSUME_MODE_UNSPECIFIED | 0 | No mode specified |
AUDIO_CONSUME_MODE_STREAMING | 1 | Specify streaming mode is used |
AUDIO_CONSUME_MODE_BATCH | 2 | Specify batch mode is used |
Only used when AUDIO_CONSUME_MODE_STREAMING is used
Name | Number | Description |
STREAM_START_LOCATION_UNSPECIFIED | 0 | No location specified |
STREAM_START_LOCATION_STREAM_BEGIN | 1 | Start processing from the beginning of the stream. Note: Only valid option for AUDIO_CONSUME_MODE_BATCH |
STREAM_START_LOCATION_BEGIN_PROCESSING_CALL | 2 | Start processing from the audio streamed after the API call InteractionBeginProcessing() was made. Note: Not valid for AUDIO_CONSUME_MODE_BATCH |
STREAM_START_LOCATION_INTERACTION_CREATED | 3 | Start processing from the audio streamed after the interaction was created. Note: Not valid for AUDIO_CONSUME_MODE_BATCH |
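The notes in the two tables above imply that batch mode only accepts `STREAM_START_LOCATION_STREAM_BEGIN`. A client-side check could be sketched as below. Enum values are represented as plain strings here, and `validate_consume_settings` is a hypothetical helper.

```python
BATCH = "AUDIO_CONSUME_MODE_BATCH"
STREAMING = "AUDIO_CONSUME_MODE_STREAMING"
STREAM_BEGIN = "STREAM_START_LOCATION_STREAM_BEGIN"

def validate_consume_settings(mode, start_location):
    """Raise ValueError for a start location that is not valid in batch mode."""
    if mode == BATCH and start_location != STREAM_BEGIN:
        raise ValueError(
            f"{start_location} is not valid for {BATCH}; use {STREAM_BEGIN}"
        )
    return True
```

In streaming mode all three start locations are accepted by this check.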
Name | Number | Description |
TAG_FORMAT_UNSPECIFIED | 0 | |
TAG_FORMAT_LUMENVOX_1 | 1 | lumenvox/1.0 tag format |
TAG_FORMAT_SEMANTICS_1 | 2 | semantics/1.0 tag format |
TAG_FORMAT_SEMANTICS_1_LITERALS | 3 | semantics/1.0-literals tag format |
TAG_FORMAT_SEMANTICS_1_2006 | 4 | semantics/1.0.2006 tag format |
TAG_FORMAT_SEMANTICS_1_2006_LITERALS | 5 | semantics/1.0.2006-literals tag format |
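When a grammar's tag-format string has to be translated into the enum number from the table above, a small lookup is enough. The mapping below copies the numbers and strings from the table; the names `TAG_FORMAT_STRINGS` and `tag_format_number` are hypothetical and exist only for this example.

```python
# Enum number -> tag-format string, copied from the table above.
TAG_FORMAT_STRINGS = {
    1: "lumenvox/1.0",
    2: "semantics/1.0",
    3: "semantics/1.0-literals",
    4: "semantics/1.0.2006",
    5: "semantics/1.0.2006-literals",
}


def tag_format_number(tag_format: str) -> int:
    """Return the enum number for a tag-format string, or 0 (unspecified)."""
    for number, name in TAG_FORMAT_STRINGS.items():
        if name == tag_format:
            return number
    return 0


print(tag_format_number("semantics/1.0-literals"))  # 3
print(tag_format_number("something-else"))          # 0
```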
Name | Number | Description |
LOGGING_VERBOSITY_UNSPECIFIED | 0 | Logging verbosity is not specified |
LOGGING_VERBOSITY_DEBUG | 1 | Internal system events that are not usually observable |
LOGGING_VERBOSITY_INFO | 2 | Routine logging, such as ongoing status or performance |
LOGGING_VERBOSITY_WARNING | 3 | Warnings and above only - service degradation or danger |
LOGGING_VERBOSITY_ERROR | 4 | Functionality is unavailable, invariants are broken, or data is lost |
LOGGING_VERBOSITY_CRITICAL | 5 | Only log exceptions and critical errors (not recommended) |
Name | Number | Description |
NOISE_REDUCTION_MODE_UNSPECIFIED | 0 | No change to setting |
NOISE_REDUCTION_MODE_DISABLED | 1 | Noise reduction disabled |
NOISE_REDUCTION_MODE_DEFAULT | 2 | Default (recommended) noise reduction algorithm is enabled. |
NOISE_REDUCTION_MODE_ALTERNATE | 3 | Alternate noise reduction algorithm. Similar to default, but we have seen varied results based on differing noise types and levels. |
NOISE_REDUCTION_MODE_ADAPTIVE | 4 | Uses an adaptive noise reduction algorithm that is most suited to varying levels of background noise, such as changing car noise, etc. |
.proto Type | Notes | C++ | Java | Python | Go | C# | PHP | Ruby |
double | | double | double | float | float64 | double | float | Float |
float | | float | float | float | float32 | float | float | Float |
int32 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead. | int32 | int | int | int32 | int | integer | Bignum or Fixnum (as required) |
int64 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead. | int64 | long | int/long | int64 | long | integer/string | Bignum |
uint32 | Uses variable-length encoding. | uint32 | int | int/long | uint32 | uint | integer | Bignum or Fixnum (as required) |
uint64 | Uses variable-length encoding. | uint64 | long | int/long | uint64 | ulong | integer/string | Bignum or Fixnum (as required) |
sint32 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s. | int32 | int | int | int32 | int | integer | Bignum or Fixnum (as required) |
sint64 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s. | int64 | long | int/long | int64 | long | integer/string | Bignum |
fixed32 | Always four bytes. More efficient than uint32 if values are often greater than 2^28. | uint32 | int | int | uint32 | uint | integer | Bignum or Fixnum (as required) |
fixed64 | Always eight bytes. More efficient than uint64 if values are often greater than 2^56. | uint64 | long | int/long | uint64 | ulong | integer/string | Bignum |
sfixed32 | Always four bytes. | int32 | int | int | int32 | int | integer | Bignum or Fixnum (as required) |
sfixed64 | Always eight bytes. | int64 | long | int/long | int64 | long | integer/string | Bignum |
bool | | bool | boolean | boolean | bool | bool | boolean | TrueClass/FalseClass |
string | A string must always contain UTF-8 encoded or 7-bit ASCII text. | string | String | str/unicode | string | string | string | String (UTF-8) |
bytes | May contain any arbitrary sequence of bytes. | string | ByteString | str | []byte | ByteString | string | String (ASCII-8BIT) |
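The size notes in the table follow from how Protocol Buffers encodes integers on the wire. The sketch below is a minimal standalone illustration and does not use the protobuf library itself: the function names are made up for this example. It shows base-128 varint encoding, the 64-bit sign extension applied to negative int32 values, and the ZigZag mapping used by sint32.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("varint input must be non-negative")
    out = bytearray()
    while True:
        low_bits = value & 0x7F
        value >>= 7
        if value:
            out.append(low_bits | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low_bits)
            return bytes(out)


def encode_int32(value: int) -> bytes:
    """int32: negative values are sign-extended to 64 bits before encoding."""
    return encode_varint(value & 0xFFFFFFFFFFFFFFFF)


def encode_sint32(value: int) -> bytes:
    """sint32: ZigZag maps small magnitudes to small unsigned numbers."""
    zigzag = ((value << 1) ^ (value >> 31)) & 0xFFFFFFFF
    return encode_varint(zigzag)


print(len(encode_int32(-1)))           # 10
print(len(encode_sint32(-1)))          # 1
print(len(encode_varint(2**28 - 1)))   # 4
print(len(encode_varint(2**28)))       # 5, versus a constant 4 bytes for fixed32
```

The last two lines show why fixed32 is the better choice once values are often 2^28 or larger: from that point a varint needs five bytes, while fixed32 always uses four.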
Copyright (C) 2001-2024, Ai Software, LLC d/b/a LumenVox