Protocol Documentation

lumenvox/api/audio_formats.proto

Top

AudioFormat

FieldTypeLabelDescription
standard_audio_format AudioFormat.StandardAudioFormat

Standard audio format

sample_rate_hertz OptionalInt32

Sample rate in Hertz of the audio data. This field is mandatory for RAW PCM audio format. It is optional for the other formats. For audio formats with headers, this value will be ignored and the value from the file header will be used instead. Default: 8000 (8 kHz)

AudioFormat.StandardAudioFormat

Specification for the audio format

Not all standard formats are supported in all cases. Different operations may natively handle a subset of the total audio formats.

NameNumberDescription
STANDARD_AUDIO_FORMAT_UNSPECIFIED 0

STANDARD_AUDIO_FORMAT_LINEAR16 1

Uncompressed 16-bit signed little-endian samples (Linear PCM).

STANDARD_AUDIO_FORMAT_ULAW 2

8-bit audio samples using G.711 PCMU/mu-law.

STANDARD_AUDIO_FORMAT_ALAW 3

8-bit audio samples using G.711 PCMA/a-law.

STANDARD_AUDIO_FORMAT_WAV 4

WAV formatted audio

STANDARD_AUDIO_FORMAT_FLAC 5

FLAC formatted audio

STANDARD_AUDIO_FORMAT_MP3 6

MP3 formatted audio

STANDARD_AUDIO_FORMAT_OPUS 7

OPUS formatted audio

STANDARD_AUDIO_FORMAT_M4A 8

M4A formatted audio

STANDARD_AUDIO_FORMAT_MP4 9

Audio packed into MP4 container

STANDARD_AUDIO_FORMAT_NO_AUDIO_RESOURCE 100

Explicitly indicate that no audio resource should be allocated
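
As an illustration, here is a minimal Python sketch of building an AudioFormat message for raw LINEAR16 PCM. The module names (lumenvox.api.audio_formats_pb2, lumenvox.api.optional_values_pb2) are assumed names for stubs generated from these protos and are not confirmed by this documentation.

    # Assumed stub module names; adjust to your generated package layout.
    from lumenvox.api import audio_formats_pb2, optional_values_pb2

    # Raw PCM has no header, so sample_rate_hertz must be set explicitly.
    audio_format = audio_formats_pb2.AudioFormat(
        standard_audio_format=audio_formats_pb2.AudioFormat.STANDARD_AUDIO_FORMAT_LINEAR16,
        sample_rate_hertz=optional_values_pb2.OptionalInt32(value=8000),
    )

For formats with headers (WAV, FLAC, MP3, etc.) the sample rate is read from the file header and any sample_rate_hertz value is ignored.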

lumenvox/api/common.proto

Top

AudioPullRequest

FieldTypeLabelDescription
audio_id string

Id of the audio requested (Note that this could be session_id to request the inbound audio resource)

audio_channel OptionalInt32

For multi-channel audio, this is the channel number being referenced. Range is from 0 to N. Default channel 0 will be used if not specified

audio_start OptionalInt32

Number of milliseconds from the beginning of the audio to return. Default is from the beginning

audio_length OptionalInt32

Maximum number of milliseconds to return. A zero value returns all available audio (from requested start point). Default is all audio, from start point

AudioPullResponse

FieldTypeLabelDescription
audio_data bytes

Binary audio data that was requested

audio_channel OptionalInt32

For multi-channel audio, this is the channel number being referenced.

final_data_chunk bool

In the case of large audio, the data will be split across multiple AudioPullResponse messages. The final_data_chunk field is set to true for the last message
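
Because large audio is split across multiple AudioPullResponse messages, a client needs to concatenate audio_data chunks until final_data_chunk is true. A minimal Python sketch (the responses iterable is assumed to yield SessionResponse messages carrying the audio_pull field described here):

    def collect_pulled_audio(responses):
        # Reassemble chunked AudioPullResponse messages into a single bytes object.
        buffer = bytearray()
        for response in responses:
            if response.HasField("audio_pull"):
                buffer.extend(response.audio_pull.audio_data)
                if response.audio_pull.final_data_chunk:
                    break
        return bytes(buffer)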

AudioPushRequest

Note that SessionInboundAudioFormatRequest should be called before using this message, so that the audio format is defined.

FieldTypeLabelDescription
audio_data bytes

Binary audio data to be added to the audio resource

AudioRequestMessage

FieldTypeLabelDescription
audio_push AudioPushRequest

Streamed binary audio data to be added to the session audio resource

audio_pull AudioPullRequest

Returns a block of audio data from an audio resource.

DtmfPushRequest

FieldTypeLabelDescription
interaction_id string

ASR interaction to associate this dtmf_key with

dtmf_key string

DTMF key press to be added to interaction stream for processing. Valid keys are 0-9, A-F, *, #

Event

Event can be either a VadEvent or a SessionEvent

FieldTypeLabelDescription
vad_event VadEvent

Event returned from VAD (AudioManager)

session_event SessionEvent

Session Events used to report errors to the API user

Grammar

FieldTypeLabelDescription
grammar_url string

A grammar URL to be loaded

inline_grammar_text string

A string containing the raw grammar text

global_grammar_label string

Deprecated. Reference to a previously defined "global" grammar. Note: the label must consist of letters, digits, hyphens, and underscores only

session_grammar_label string

Reference to a previously defined "session" grammar. Note: the label must consist of letters, digits, hyphens, and underscores only

builtin_voice_grammar Grammar.BuiltinGrammar

Reference to a "builtin" voice grammar

builtin_dtmf_grammar Grammar.BuiltinGrammar

Reference to a "builtin" DTMF grammar

label OptionalString

Optional label assigned to the grammar, used for error reporting. Note: the label must consist of letters, digits, hyphens, and underscores only

Fields with deprecated option

Name Option
global_grammar_label

true
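
The Grammar message is effectively a choice between a URL, inline text, a previously loaded label, or a builtin grammar. A hedged Python sketch (lumenvox.api.common_pb2 is an assumed stub module name; the URL and inline SRGS text are placeholders):

    from lumenvox.api import common_pb2  # assumed stub module name

    # Reference a grammar by URL ...
    url_grammar = common_pb2.Grammar(
        grammar_url="https://example.com/grammars/yes_no.grxml")  # placeholder URL

    # ... or supply the SRGS text inline ...
    inline_grammar = common_pb2.Grammar(
        inline_grammar_text='<grammar xml:lang="en-US" root="root"> ... </grammar>')

    # ... or point at a language-specific builtin voice grammar.
    builtin_grammar = common_pb2.Grammar(
        builtin_voice_grammar=common_pb2.Grammar.BUILTIN_GRAMMAR_BOOLEAN)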

LogEvent

A single event with a timestamp to be logged to the database. The LogEvent will be returned via the reporting API.

FieldTypeLabelDescription
time_stamp google.protobuf.Timestamp

Log Event Timestamp (UTC)

event Event

can be either a VadEvent or a SessionEvent

PhraseList

FieldTypeLabelDescription
phrase_list_label string

The label of a previously defined global phrase list

SessionEvent

FieldTypeLabelDescription
interaction_id OptionalString

Optional interaction object being referenced

status_message google.rpc.Status

String containing event information

VadEvent

Message used to signal events over the course of Voice Activity Detection processing. The audio_offset will signify at what point within the session audio resource the event occurred.

FieldTypeLabelDescription
interaction_id string

The interaction object being referenced

vad_event_type VadEvent.VadEventType

The type of event this message represents

audio_offset OptionalInt32

The offset in milliseconds from the beginning of the audio resource that this event occurred

Grammar.BuiltinGrammar

Note that all builtin grammars are language-specific

NameNumberDescription
BUILTIN_GRAMMAR_UNSPECIFIED 0

Undefined built-in grammar

BUILTIN_GRAMMAR_BOOLEAN 1

"yes" => true

BUILTIN_GRAMMAR_CURRENCY 2

"one dollar ninety seven" => USD1.97

BUILTIN_GRAMMAR_DATE 3

"march sixteenth nineteen seventy nine" => 19790316

BUILTIN_GRAMMAR_DIGITS 4

"one two three four" => 1234

BUILTIN_GRAMMAR_NUMBER 5

"three point one four one five nine two six" => 3.1415926

BUILTIN_GRAMMAR_PHONE 6

"eight five eight seven oh seven oh seven oh seven" => 8587070707

BUILTIN_GRAMMAR_TIME 7

"six o clock" => 0600

GrammarMode

List of all grammar modes.

NameNumberDescription
GRAMMAR_MODE_UNSPECIFIED 0

Mode not specified

GRAMMAR_MODE_VOICE 1

Voice mode

GRAMMAR_MODE_DTMF 2

DTMF mode

GRAMMAR_MODE_VOICE_AND_DTMF 3

Voice and DTMF mode Deprecated - should not be used

InteractionStatus

List of all Interaction statuses.

NameNumberDescription
INTERACTION_STATUS_UNSPECIFIED 0

This status is not expected or valid. It indicates an empty message.

INTERACTION_STATUS_CREATED 1

Interaction is in created only state, no additional processing is done yet.

INTERACTION_STATUS_RESULTS_READY 2

Interaction results are ready. Most results are sent automatically when ready.

INTERACTION_STATUS_CLOSED 3

Used to indicate a successfully closed interaction state

INTERACTION_STATUS_CANCELED 4

Used to indicate a successfully canceled interaction state

INTERACTION_STATUS_ASR_WAITING_ON_GRAMMARS 101

Audio processing not started yet. Waiting on grammars to be loaded.

INTERACTION_STATUS_ASR_WAITING_ON_BARGIN 102

Audio processing not started yet. Waiting on BARGE_IN event from VAD

INTERACTION_STATUS_ASR_STREAM_REQUEST 103

Initial status or post BARGE_IN status of interaction, stream processing not started yet

INTERACTION_STATUS_ASR_STOP_REQUESTED_WAITING 104

Batch mode, waiting for STOP request

INTERACTION_STATUS_ASR_STREAM_STARTED 105

ASR started reading stream

INTERACTION_STATUS_ASR_STREAM_STOP_REQUESTED 106

Set in case of Finalize request

INTERACTION_STATUS_ASR_WAITING_FOR_CPA_AMD_RESPONSE 107

Used for CPA and AMD interactions

INTERACTION_STATUS_ASR_TIMEOUT 109

No VAD event or interaction finalize, ASR processing timed out

INTERACTION_STATUS_ASR_WAITING_ON_BARGEOUT 110

Audio processing started. Waiting on BARGE_OUT event from VAD

INTERACTION_STATUS_TTS_PROCESSING 200

TTS processing

INTERACTION_STATUS_GRAMMAR_PARSE_WAITING_ON_GRAMMARS 400

Grammar(s) loading in progress, interaction not started yet

INTERACTION_STATUS_GRAMMAR_PARSE_REQUESTED_PROCESSING 401

Interaction processing in progress

INTERACTION_STATUS_NORMALIZE_TEXT_REQUESTED_PROCESSING 500

Normalize Text interaction processing requested

INTERACTION_STATUS_ASR_TRANSCRIPTION_WAITING_ON_PHRASE_LISTS 600

ASR transcription waiting on phrase lists to be loaded

InteractionSubType

List of all interaction sub-types for ASR Interactions

NameNumberDescription
INTERACTION_SUB_TYPE_UNSPECIFIED 0

This is not a valid type. It indicates an empty gRPC message.

INTERACTION_SUB_TYPE_GRAMMAR_BASED_CPA 1

Call progress analysis interaction type with grammars

INTERACTION_SUB_TYPE_GRAMMAR_BASED_AMD 2

Answering machine detection interaction type with grammars

INTERACTION_SUB_TYPE_ENHANCED_TRANSCRIPTION 3

ASR transcription interaction with multiple grammars

INTERACTION_SUB_TYPE_CONTINUOUS_TRANSCRIPTION 4

Deprecated - ASR continuous transcription

INTERACTION_SUB_TYPE_TRANSCRIPTION_WITH_NORMALIZATION 5

Deprecated - Transcription result with normalized text Normalization can be enabled for different interaction types/subtypes in parallel, e.g. GRAMMAR_BASED_TRANSCRIPTION can have normalization setting as well. If needed for filtering, this flag will be added separately

INTERACTION_SUB_TYPE_GRAMMAR_BASED_TRANSCRIPTION 6

Transcription interaction type with grammars

InteractionType

List of all Interaction types.

NameNumberDescription
INTERACTION_TYPE_UNSPECIFIED 0

This is not a valid type. It indicates an empty gRPC message.

INTERACTION_TYPE_ASR 2

ASR processing interaction

INTERACTION_TYPE_TTS 3

TTS processing interaction

INTERACTION_TYPE_GRAMMAR_PARSE 4

Validate grammar content. Can be a URL, inline text, or a file reference (label)

INTERACTION_TYPE_NORMALIZATION 5

Normalization interaction type

INTERACTION_TYPE_CPA 6

Call progress analysis interaction type

INTERACTION_TYPE_AMD 7

Answering machine detection interaction type

INTERACTION_TYPE_ASR_TRANSCRIPTION 8

ASR transcription interaction type

VadEvent.VadEventType

NameNumberDescription
VAD_EVENT_TYPE_UNSPECIFIED 0

Undefined VAD event type

VAD_EVENT_TYPE_BEGIN_PROCESSING 1

VAD begins processing audio

VAD_EVENT_TYPE_BARGE_IN 2

Barge-in occurred, audio that will be processed by the ASR starts here. This notification might be useful to stop prompt playback for example

VAD_EVENT_TYPE_END_OF_SPEECH 3

End-of-speech occurred, no further audio will be processed by VAD for the specified interaction. If the setting VadSettings.auto_finalize_on_eos is true, the ASR will immediately finish processing audio at this point

VAD_EVENT_TYPE_BARGE_IN_TIMEOUT 4

VAD timed out waiting for audio barge-in (start-of-speech). The audio manager will no longer process audio for this interaction.

VAD_EVENT_TYPE_END_OF_SPEECH_TIMEOUT 5

VAD timed out waiting for audio barge-out (end-of-speech). The audio manager will no longer process audio for this interaction.

VAD_EVENT_TYPE_END_OF_AUDIO_BEFORE_BARGEIN 6

VAD has reached audio_consume_max_ms before barge-in has occurred.

VAD_EVENT_TYPE_END_OF_AUDIO_AFTER_BARGEIN 7

VAD has reached audio_consume_max_ms before barge-out (end-of-speech) has occurred.
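
A client typically reacts to a few of these VAD event types as they arrive on the session response stream. A sketch under the same assumed Python stubs (stop_prompt_playback and handle_no_input are hypothetical application callbacks, not part of this API):

    from lumenvox.api import common_pb2  # assumed stub module name

    def on_vad_event(vad_event):
        event_types = common_pb2.VadEvent
        if vad_event.vad_event_type == event_types.VAD_EVENT_TYPE_BARGE_IN:
            stop_prompt_playback()  # hypothetical: caller started speaking
        elif vad_event.vad_event_type == event_types.VAD_EVENT_TYPE_END_OF_SPEECH:
            pass  # ASR may finalize automatically depending on VadSettings
        elif vad_event.vad_event_type == event_types.VAD_EVENT_TYPE_BARGE_IN_TIMEOUT:
            handle_no_input()  # hypothetical: nobody spoke before the timeout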

lumenvox/api/global.proto

Top

GlobalEvent

FieldTypeLabelDescription
status_message google.rpc.Status

String containing event information

GlobalGetSettingsRequest

FieldTypeLabelDescription
settings_type GlobalGetSettingsRequest.GetSettingsType

Used to specify the type of settings to request

GlobalLoadGrammarRequest

FieldTypeLabelDescription
language string

The language selector for the specified grammar, e.g. "en-US", "de-DE", or dialect-independent "en", "de", etc.

grammar_label string

Reference label for the global grammar. Note: the label must consist of letters, digits, hyphens, and underscores only

grammar_url string

A grammar URL to be loaded

inline_grammar_text string

A string containing the raw grammar text

grammar_settings GrammarSettings

Optional grammar settings applied to this request

GlobalLoadGrammarResponse

FieldTypeLabelDescription
status google.rpc.Status

The status of the grammar load

mode GrammarMode

The mode of the loaded grammar

label string

The label for the loaded grammar

GlobalLoadPhraseList

FieldTypeLabelDescription
phrases string repeated

A list of strings containing word and phrase "hints" so that the transcriber is more likely to recognize them. This can be used to improve the accuracy for specific words and phrases, for example, if specific commands are typically spoken by the user. This can also be used to add additional words or phrases to the transcriber's vocabulary.

phase_list_label string

A label that can be used to reference this list within a transcription request

language string

The language selector describing which ASR resource will process the request, e.g. "en-US", "de-DE", or dialect-independent "en", "de", etc. Note that phrase lists are inherently language-independent, so this field is only used to direct which language-dependent resource will process the phrase load request

phrase_list_settings PhraseListSettings

Optional settings specifying boost options for phrases

GlobalLoadPhraseListResponse

FieldTypeLabelDescription
status google.rpc.Status

The status of the phrase list load.

label string

The label for the phrase list.

GlobalRequest

FieldTypeLabelDescription
correlation_id OptionalString

Optional unique reference per request message. A UUID value will be auto generated if not supplied by client

deployment_id string

Valid deployment identifier (UUID) to associate the request with

operator_id string

UUID related to the operator (entity or person making request)

global_load_grammar_request GlobalLoadGrammarRequest

Load a globally defined grammar

global_load_phrase_list GlobalLoadPhraseList

Load a globally defined phrase list

global_get_settings_request GlobalGetSettingsRequest

Get specified global default settings

session_settings SessionSettings

Default session settings

interaction_settings InteractionSettings

Deprecated. Default interaction settings

grammar_settings GrammarSettings

Deprecated. Default grammar settings

recognition_settings RecognitionSettings

Deprecated. Default recognition settings

normalization_settings NormalizationSettings

Deprecated. Default normalization settings

vad_settings VadSettings

Deprecated. Default VAD settings

cpa_settings CpaSettings

Deprecated. Default CPA settings

amd_settings AmdSettings

Deprecated. Default tone detection settings

audio_consume_settings AudioConsumeSettings

Deprecated. Default audio consume settings

logging_settings LoggingSettings

Default logging settings

phrase_list_settings PhraseListSettings

Deprecated. Optional settings specifying boost options for phrases

reset_settings ResetSettings

Will reset all of the settings to default

Fields with deprecated option

Name Option
interaction_settings

true

grammar_settings

true

recognition_settings

true

normalization_settings

true

vad_settings

true

cpa_settings

true

amd_settings

true

audio_consume_settings

true

phrase_list_settings

true
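
For example, a GlobalRequest that loads a global phrase list might be built as follows (a sketch only: lumenvox.api.global_pb2 is an assumed stub module name, and the deployment_id/operator_id values are placeholders):

    import uuid
    from lumenvox.api import global_pb2  # assumed stub module name

    request = global_pb2.GlobalRequest(
        deployment_id="00000000-0000-0000-0000-000000000000",  # placeholder UUID
        operator_id=str(uuid.uuid4()),
        global_load_phrase_list=global_pb2.GlobalLoadPhraseList(
            phrases=["account balance", "wire transfer"],
            phase_list_label="banking-terms",   # field name as defined above
            language="en-US",
        ),
    )

The matching GlobalLoadPhraseListResponse (or a GlobalEvent on error) is returned on the Global response stream.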

GlobalResponse

FieldTypeLabelDescription
correlation_id OptionalString

Reference to corresponding request correlation_id

global_event GlobalEvent

Global event notification (typically errors)

global_settings GlobalSettings

Global default settings (which were requested)

global_grammar GlobalLoadGrammarResponse

Deprecated. Response to a global load grammar request

global_phrase_list GlobalLoadPhraseListResponse

Response to a global load phrase list request

Fields with deprecated option

Name Option
global_grammar

true

GlobalSettings

Container for all session and interaction related (global) settings

FieldTypeLabelDescription
session_settings SessionSettings

Default session settings

interaction_settings InteractionSettings

Deprecated. Default interaction settings

grammar_settings GrammarSettings

Deprecated. Default grammar settings

recognition_settings RecognitionSettings

Deprecated. Default recognition settings

normalization_settings NormalizationSettings

Deprecated. Default normalization settings

vad_settings VadSettings

Deprecated. Default VAD settings

cpa_settings CpaSettings

Deprecated. Default CPA settings

amd_settings AmdSettings

Deprecated. Default tone detection settings

audio_consume_settings AudioConsumeSettings

Deprecated. Default audio consume settings

logging_settings LoggingSettings

Default logging settings

phrase_list_settings PhraseListSettings

Deprecated. Optional settings specifying boost options for phrases

tts_settings TtsSettings

Deprecated. Optional settings for Text-To-Speech (TTS)

Fields with deprecated option

Name Option
interaction_settings

true

grammar_settings

true

recognition_settings

true

normalization_settings

true

vad_settings

true

cpa_settings

true

amd_settings

true

audio_consume_settings

true

phrase_list_settings

true

tts_settings

true

GlobalGetSettingsRequest.GetSettingsType

NameNumberDescription
GET_SETTINGS_TYPE_UNSPECIFIED 0

GET_SETTINGS_TYPE_SESSION 1

SessionSettings type

GET_SETTINGS_TYPE_INTERACTION 2

InteractionSettings type

GET_SETTINGS_TYPE_GRAMMAR 3

GrammarSettings type

GET_SETTINGS_TYPE_RECOGNITION 4

RecognitionSettings type

GET_SETTINGS_TYPE_NORMALIZATION 5

NormalizationSettings type

GET_SETTINGS_TYPE_VAD 6

VadSettings type

GET_SETTINGS_TYPE_CPA 7

CpaSettings type

GET_SETTINGS_TYPE_AMD 8

AmdSettings type

GET_SETTINGS_TYPE_AUDIO_CONSUME 9

AudioConsumeSettings type

GET_SETTINGS_TYPE_LOGGING_SETTINGS 10

LoggingSettings type

GET_SETTINGS_TYPE_PHRASE_LIST 11

PhraseList type

lumenvox/api/interaction.proto

Top

InteractionBeginProcessingRequest

FieldTypeLabelDescription
interaction_id string

The interaction object being referenced

InteractionCancelRequest

FieldTypeLabelDescription
interaction_id string

The interaction object being referenced

InteractionCancelResponse

FieldTypeLabelDescription
interaction_id string

The interaction object being referenced

close_status google.rpc.Status

Status of request

InteractionCloseRequest

FieldTypeLabelDescription
interaction_id string

The interaction object being referenced

InteractionCloseResponse

FieldTypeLabelDescription
interaction_id string

The interaction object being referenced

close_status google.rpc.Status

Status of request

InteractionCreateAmdRequest

FieldTypeLabelDescription
amd_settings AmdSettings

Parameters for this interaction

audio_consume_settings AudioConsumeSettings

Optional settings specifying audio to process for interaction

vad_settings VadSettings

Optional settings related to voice activity detection

general_interaction_settings GeneralInteractionSettings

Optional settings related to all interactions

InteractionCreateAmdResponse

FieldTypeLabelDescription
interaction_id string

Interaction ID (uuid) that can be used during subsequent AMD processing

InteractionCreateAsrRequest

FieldTypeLabelDescription
language string

The language selector for the specified grammars, e.g. "en-US", "de-DE", or dialect-independent "en", "de", etc.

grammars Grammar repeated

List of grammars to use, one for each root grammar to activate

grammar_settings GrammarSettings

Optional grammar settings to apply to this interaction

recognition_settings RecognitionSettings

Optional recognition settings for this interaction

vad_settings VadSettings

Optional settings related to voice activity detection

audio_consume_settings AudioConsumeSettings

Optional settings specifying audio to process for interaction

general_interaction_settings GeneralInteractionSettings

Optional settings related to all interactions

InteractionCreateAsrResponse

FieldTypeLabelDescription
interaction_id string

Interaction ID (uuid) that can be used during subsequent ASR processing

InteractionCreateCpaRequest

FieldTypeLabelDescription
cpa_settings CpaSettings

Parameters for this interaction

audio_consume_settings AudioConsumeSettings

Optional settings specifying audio to process for interaction

vad_settings VadSettings

Optional settings related to voice activity detection

general_interaction_settings GeneralInteractionSettings

Optional settings related to all interactions

InteractionCreateCpaResponse

FieldTypeLabelDescription
interaction_id string

Interaction ID (uuid) that can be used during subsequent CPA processing

InteractionCreateGrammarParseRequest

FieldTypeLabelDescription
language string

The language selector for the specified grammars, e.g. "en-US", "de-DE", or dialect-independent "en", "de", etc.

grammars Grammar repeated

List of grammars to use, one for each root grammar to activate

grammar_settings GrammarSettings

Optional grammar settings to apply to this interaction

input_text string

Input text to be parsed against specified grammar[s]

parse_timeout_ms OptionalInt32

Maximum milliseconds to allow for a grammar parse. If this is exceeded, a timeout error will be raised. Range 0-10000000 (~166 minutes) Default: 10000 (10 seconds)

general_interaction_settings GeneralInteractionSettings

Optional settings related to all interactions

InteractionCreateGrammarParseResponse

FieldTypeLabelDescription
interaction_id string

The interaction object being referenced by the request

InteractionCreateNormalizeTextRequest

FieldTypeLabelDescription
language string

Language to use for normalization (e.g. en-us)

transcript string

All words in a single string.

normalization_settings NormalizationSettings

Optional settings specifying whether text normalization step should be performed on output of this interaction.

general_interaction_settings GeneralInteractionSettings

Optional settings related to all interactions

InteractionCreateNormalizeTextResponse

FieldTypeLabelDescription
interaction_id string

Interaction ID (UUID) that can be used during subsequent Normalize Text processing

InteractionCreateTranscriptionRequest

FieldTypeLabelDescription
language string

Transcription language selector for this request, e.g. "en-US", "de-DE", or dialect-independent "en", "de", etc.

phrases TranscriptionPhraseList repeated

Optional phrase lists for interaction

continuous_utterance_transcription OptionalBool

If `true`, transcription will perform continuous recognition (continuing to wait for and process audio even if the user pauses speaking) until the client closes the input stream (gRPC API). This may return multiple FinalResult callback messages. If `false`, the recognizer will detect a single spoken utterance. When it detects that the user has paused or stopped speaking, it will return a FinalResult callback and cease recognition. It will return no more than one FinalResult. Default: false

recognition_settings RecognitionSettings

Optional recognition settings for this interaction

vad_settings VadSettings

Optional settings related to voice activity detection

audio_consume_settings AudioConsumeSettings

Optional settings specifying audio to process for interaction

normalization_settings NormalizationSettings

Optional settings specifying whether text normalization step should be performed on output of this interaction.

phrase_list_settings PhraseListSettings

Optional settings specifying boost options for phrases

general_interaction_settings GeneralInteractionSettings

Optional settings related to all interactions

embedded_grammars Grammar repeated

Optional list of grammars to use during transcription. When a grammar matches during transcription, the semantic results of the grammar will also be returned

embedded_grammar_settings GrammarSettings

Optional grammar settings for embedded grammars

language_model_name OptionalString

Optional name of a language model (decoder) to use when processing transcription. Default is to not specify this, allowing the engine to use the default language decoder

acoustic_model_name OptionalString

Optional name of an acoustic model (encoder) to use when processing transcription. Default is to not specify this, allowing the engine to use the default encoder

enable_postprocessing OptionalString

Optional custom postprocessing to enhance decoder functionality. Default is to not specify this, allowing the engine to use default postprocessing
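
Putting a few of these fields together, a transcription interaction request might look like the following sketch (module names such as lumenvox.api.interaction_pb2 and lumenvox.api.settings_pb2 are assumptions):

    from lumenvox.api import interaction_pb2, settings_pb2, optional_values_pb2  # assumed names

    transcription_request = interaction_pb2.InteractionCreateTranscriptionRequest(
        language="en-US",
        continuous_utterance_transcription=optional_values_pb2.OptionalBool(value=False),
        recognition_settings=settings_pb2.RecognitionSettings(
            enable_partial_results=optional_values_pb2.OptionalBool(value=True),
        ),
        audio_consume_settings=settings_pb2.AudioConsumeSettings(
            audio_consume_mode=settings_pb2.AudioConsumeSettings.AUDIO_CONSUME_MODE_STREAMING,
        ),
    )

The request is then wrapped in an InteractionRequestMessage inside a SessionRequest (see lumenvox/api/session.proto below).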

InteractionCreateTranscriptionResponse

FieldTypeLabelDescription
interaction_id string

Interaction ID (uuid) that can be used during subsequent ASR processing

InteractionCreateTtsRequest

FieldTypeLabelDescription
language string

Synthesis language for this request (e.g.: "en-US", "de-DE", etc.)

ssml_request InteractionCreateTtsRequest.SsmlUrlRequest

SSML type request and parameters

inline_request InteractionCreateTtsRequest.InlineTtsRequest

Inline TTS definition (text and optional parameters)

audio_format AudioFormat

Audio format to be generated by TTS Synthesis Note: this is not configurable at Session or Global level, since it is explicitly required for each interaction request.

synthesis_timeout_ms OptionalInt32

Optional timeout to limit the maximum time allowed for a synthesis Default: 5000 milliseconds

general_interaction_settings GeneralInteractionSettings

Optional settings related to all interactions

InteractionCreateTtsRequest.InlineTtsRequest

Inline TTS definition (text and optional parameters)

FieldTypeLabelDescription
text string

Text to synthesize; can be simple text or SSML

tts_inline_synthesis_settings TtsInlineSynthesisSettings

Optional settings for voice synthesis.

ssl_verify_peer OptionalBool

Enables or disables the verification of a peer's certificate using a local certificate authority file upon HTTPS requests. Set to false (disabled) to skip verification for trusted sites. Default: true

InteractionCreateTtsRequest.SsmlUrlRequest

FieldTypeLabelDescription
ssml_url string

URL from which to fetch synthesis request SSML

ssl_verify_peer OptionalBool

Enables or disables the verification of a peer's certificate using a local certificate authority file upon HTTPS requests. Set to false (disabled) to skip verification for trusted sites. Default: true
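
A minimal inline TTS request, as a sketch under the same assumed stub names as the earlier sketches (interaction_pb2, settings_pb2, optional_values_pb2, audio_formats_pb2); the text and voice name are placeholders:

    tts_request = interaction_pb2.InteractionCreateTtsRequest(
        language="en-US",
        inline_request=interaction_pb2.InteractionCreateTtsRequest.InlineTtsRequest(
            text="Hello, and welcome.",  # plain text or SSML
            tts_inline_synthesis_settings=settings_pb2.TtsInlineSynthesisSettings(
                voice=optional_values_pb2.OptionalString(value="example-voice"),  # placeholder
            ),
        ),
        # audio_format is required for every TTS interaction request.
        audio_format=audio_formats_pb2.AudioFormat(
            standard_audio_format=audio_formats_pb2.AudioFormat.STANDARD_AUDIO_FORMAT_ULAW,
        ),
    )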

InteractionCreateTtsResponse

FieldTypeLabelDescription
interaction_id string

Interaction ID (uuid) that can be used during subsequent TTS processing

InteractionFinalizeProcessingRequest

FieldTypeLabelDescription
interaction_id string

The interaction object being referenced

InteractionRequestMessage

FieldTypeLabelDescription
interaction_create_amd InteractionCreateAmdRequest

Create AMD interaction request

interaction_create_asr InteractionCreateAsrRequest

Create ASR interaction request

interaction_create_cpa InteractionCreateCpaRequest

Create CPA interaction request

interaction_create_transcription InteractionCreateTranscriptionRequest

Create transcription interaction request

interaction_create_tts InteractionCreateTtsRequest

Create TTS interaction request

interaction_create_grammar_parse InteractionCreateGrammarParseRequest

Create a grammar parse request

interaction_begin_processing InteractionBeginProcessingRequest

Interaction begin processing

interaction_finalize_processing InteractionFinalizeProcessingRequest

Interaction finalize processing

interaction_request_results InteractionRequestResultsRequest

Interaction request results

interaction_create_normalize_text InteractionCreateNormalizeTextRequest

Create a normalize text request

interaction_cancel InteractionCancelRequest

Interaction cancel

interaction_close InteractionCloseRequest

Explicit request to close interaction

InteractionRequestResultsRequest

FieldTypeLabelDescription
interaction_id string

The interaction object being referenced

InteractionRequestResultsResponse

FieldTypeLabelDescription
interaction_id string

The interaction object being referenced

interaction_results Result

Requested results

TranscriptionPhraseList

FieldTypeLabelDescription
phrases string repeated

Optional list of strings containing word and phrase "hints" so that the transcriber is more likely to recognize them. This can be used to improve the accuracy for specific words and phrases, for example, if specific commands are typically spoken by the user. This can also be used to add additional words or phrases to the transcriber's vocabulary.

global_phrase_list PhraseList

Optional reference to previously defined global phrase list(s)

session_phrase_list PhraseList

Optional reference to previously defined session phrase list(s)

lumenvox/api/lumenvox.proto

Top

LumenVox

LumenVox Service

The LumenVox API can be used to access various speech resources, such as Automatic Speech Recognition (ASR), Text-To-Speech (TTS), Transcription, Call-Progress-Analysis (CPA), etc.

Method NameRequest TypeResponse TypeDescription
Session SessionRequest stream SessionResponse stream

Session: Creates a new session and establishes a bidirectional stream, able to process all messages on this single bidirectional connection

Global GlobalRequest stream GlobalResponse stream

Global: Manages globally defined (deployment-level) objects
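
Both methods are bidirectional streams, so a client feeds an iterator of requests and consumes a stream of responses. A hedged Python/gRPC sketch (lumenvox.api.lumenvox_pb2_grpc and the LumenVoxStub class name follow standard protoc naming but are assumptions; the endpoint address is a placeholder):

    import queue
    import grpc
    from lumenvox.api import lumenvox_pb2_grpc, session_pb2  # assumed stub module names

    channel = grpc.insecure_channel("localhost:8280")  # placeholder endpoint
    stub = lumenvox_pb2_grpc.LumenVoxStub(channel)

    outgoing = queue.SimpleQueue()

    def request_iterator():
        while True:
            item = outgoing.get()
            if item is None:  # sentinel to end the request stream
                break
            yield item

    # Bidirectional SessionRequest/SessionResponse stream.
    responses = stub.Session(request_iterator())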

lumenvox/api/optional_values.proto

Top

OptionalBool

Wrapper message for optional `bool`.

The JSON representation for `OptionalBool` is JSON `true` and `false`.

FieldTypeLabelDescription
value bool

The bool value.

OptionalBytes

Wrapper message for optional `bytes`.

The JSON representation for `OptionalBytes` is JSON string.

FieldTypeLabelDescription
value bytes

The bytes value.

OptionalDouble

Wrapper message for optional `double`.

The JSON representation for `OptionalDouble` is JSON number.

FieldTypeLabelDescription
value double

The double value.

OptionalFloat

Wrapper message for optional `float`.

The JSON representation for `OptionalFloat` is JSON number.

FieldTypeLabelDescription
value float

The float value.

OptionalInt32

Wrapper message for optional `int32`.

The JSON representation for `OptionalInt32` is JSON number.

FieldTypeLabelDescription
value int32

The int32 value.

OptionalInt64

Wrapper message for optional `int64`.

The JSON representation for `OptionalInt64` is JSON string.

FieldTypeLabelDescription
value int64

The int64 value.

OptionalString

Wrapper message for optional `string`.

The JSON representation for `OptionalString` is JSON string.

FieldTypeLabelDescription
value string

The string value.

OptionalUInt32

Wrapper message for optional `uint32`.

The JSON representation for `OptionalUInt32` is JSON number.

FieldTypeLabelDescription
value uint32

The uint32 value.

OptionalUInt64

Wrapper message for optional `uint64`.

The JSON representation for `OptionalUInt64` is JSON string.

FieldTypeLabelDescription
value uint64

The uint64 value.

lumenvox/api/results.proto

Top

AmdInteractionResult

Result returned from an AMD interaction.

FieldTypeLabelDescription
amd_result AsrGrammarResult

AMD result in the form of an ASR-type message.

AsrGrammarResult

Structure to hold data provided from ASR as final results

FieldTypeLabelDescription
asr_result_meta_data AsrResultMetaData

Raw ASR output used to produce semantic interpretations

semantic_interpretations SemanticInterpretation repeated

List of all possible semantic interpretations for given transcript.

AsrInteractionResult

Result returned from an ASR interaction.

FieldTypeLabelDescription
n_bests AsrGrammarResult repeated

List of the N best possible matches provided via ASR.

input_mode string

The modality of the input, for example, speech, dtmf, etc.

language string

Language defined when creating the interaction.

AsrResultMetaData

Raw transcript of words decoded by ASR

FieldTypeLabelDescription
words Word repeated

All words in Phrase so far.

transcript string

All words in a single string.

start_time_ms int32

Time in milliseconds since beginning of audio stream where recognition starts.

duration_ms int32

Length of transcript in milliseconds.

confidence uint32

Overall confidence of the entire transcript.

CpaInteractionResult

Result returned from a CPA interaction.

FieldTypeLabelDescription
cpa_result AsrGrammarResult

CPA result in the form of an ASR-type message.

FinalResult

Callback sent when a final interaction result is ready.

FieldTypeLabelDescription
interaction_id string

The interaction object being referenced

final_result Result

Final result for the specified interaction. Null if status error > 0

final_result_status FinalResultStatus

Final status of the interaction

status google.rpc.Status

Status code produced. Returns 0 on success. This comes from the internal result message and is passed to the caller.

GrammarParseInteractionResult

Result returned from grammar parse interaction.

FieldTypeLabelDescription
input_text string

Input string used during grammar parse

semantic_interpretations SemanticInterpretation repeated

List of all possible semantic interpretations for given text.

input_mode string

The modality of the input, for example, speech, dtmf, etc.

language string

Language defined when creating the interaction.

has_next_transition bool

Set to true if the interaction grammars would accept additional input appended to the input text.

InverseTextNormalizationToken

Token used in Inverse Text Normalization

FieldTypeLabelDescription
tag string

Type of token.

data google.protobuf.Struct

All data in token

NormalizationSegment

One segment (one or more words) that is part of a result phrase.

FieldTypeLabelDescription
original_segment string

Input word used to create segment.

original_word_indices uint32 repeated

Index to words in original input.

vocalization string

Output after Inverse Text normalization.

token InverseTextNormalizationToken

Token information used in Inverse Text normalization.

redaction RedactionData

Data added for redaction.

final string

Final output for segment.

NormalizeTextResult

Result returned from a Normalize Text interaction.

FieldTypeLabelDescription
transcript string

Input string used for the text normalization request

normalized_result NormalizedResult

Normalized result message

NormalizedResult

Result returned from text normalization. Used in either a Transcription interaction or a Text Normalization interaction.

FieldTypeLabelDescription
segments NormalizationSegment repeated

All segments in result.

verbalized string

Output after Inverse Text normalization.

verbalized_redacted string

Output after Inverse Text normalization and redacted.

final string

Final output after inverse text normalization, punctuation, and capitalization normalization

final_redacted string

Final output after inverse text normalization, punctuation and capitalization normalization, and redaction

PartialResult

Callback sent when a partial interaction result is available.

FieldTypeLabelDescription
interaction_id string

The interaction object being referenced

partial_result Result

Partial result for the specified interaction

RedactionData

More detail on Redacted tokens

FieldTypeLabelDescription
personal_identifiable_information bool

Redacted Personally Identifiable Information.

entity string

Type of redaction

score float

Redaction Score

Result

Contains results of various types that may be returned

FieldTypeLabelDescription
asr_interaction_result AsrInteractionResult

Results for an ASR interaction

transcription_interaction_result TranscriptionInteractionResult

Results for a transcription interaction

grammar_parse_interaction_result GrammarParseInteractionResult

Results for a grammar parse interaction

tts_interaction_result TtsInteractionResult

Results for a TTS interaction

normalize_text_result NormalizeTextResult

Result for a Normalize Text interaction

amd_interaction_result AmdInteractionResult

Result for an AMD interaction

cpa_interaction_result CpaInteractionResult

Result for a CPA interaction

SemanticInterpretation

Semantic Interpretation of an ASR result

FieldTypeLabelDescription
interpretation google.protobuf.Struct

Structure containing Semantic Interpretation.

interpretation_json string

Json string containing Semantic interpretation.

grammar_label string

The label of the grammar used to generate this Semantic Interpretation.

confidence uint32

Value from 0 to 1000 indicating how confident the ASR is that the result is a correct match

tag_format string

Tag format of the grammar used to generate this Semantic Interpretation.

input_text string

Raw input text for the interpretation

SynthesisOffset

Description of some artifact within the synthesis

FieldTypeLabelDescription
name string

Name of the artifact being referenced

offset_ms uint32

Offset in milliseconds to the named artifact

SynthesisWarning

Warning generated by a synthesis

FieldTypeLabelDescription
message string

String containing warning message returned from synthesizer

line OptionalInt32

Optional line indicating where the issue was detected

TranscriptionInteractionResult

Result returned from a transcription interaction.

FieldTypeLabelDescription
n_bests TranscriptionResult repeated

List of the N best possible matches provided via ASR.

language string

Language defined when creating the interaction.

TranscriptionResult

Structure to hold data provided from ASR as final results

FieldTypeLabelDescription
asr_result_meta_data AsrResultMetaData

Raw ASR output which includes the transcript of the audio.

normalized_result NormalizedResult

If results are to be normalized, Normalized Result is added here.

grammar_results AsrGrammarResult repeated

If enhanced transcription with grammars is used results are added here.

srt_file bytes

If SRT generation is enabled, the SRT file is added here.

vtt_file bytes

If VTT generation is enabled, the VTT file is added here.

blended_score OptionalFloat

Optional blended quality transcription score

TtsInteractionResult

Contains a TTS interaction result.

FieldTypeLabelDescription
audio_format AudioFormat

Format of returned audio.

audio_length_ms uint32

Length of generated audio data.

sentence_offsets_ms uint32 repeated

Offsets in milliseconds to where in audio buffer each synthesized sentence begins.

word_offsets_ms uint32 repeated

Offsets in milliseconds to where in audio buffer each synthesized word begins.

ssml_mark_offsets SynthesisOffset repeated

Offsets to where in audio buffer each synthesized SSML mark begins.

voice_offsets SynthesisOffset repeated

Offsets to where in the audio buffer each synthesized voice begins.

synthesis_warnings SynthesisWarning repeated

List of any Synthesis warnings.

Word

One word that is part of an ASR result.

FieldTypeLabelDescription
start_time_ms int32

Time in milliseconds since beginning of audio where word starts.

duration_ms int32

Length of word in milliseconds.

word string

String output of word.

confidence uint32

Value from 0 to 1000 indicating how confident the result is.

FinalResultStatus

List of Interaction FinalResult Statuses

NameNumberDescription
FINAL_RESULT_STATUS_UNSPECIFIED 0

No final status specified

FINAL_RESULT_STATUS_NO_INPUT 1

No voice audio detected within the audio. The final_result field in FinalResult will be empty

FINAL_RESULT_STATUS_ERROR 2

An error occurred that stopped processing

FINAL_RESULT_STATUS_CANCELLED 3

Interaction cancelled or closed before results can be returned

FINAL_RESULT_STATUS_TRANSCRIPTION_MATCH 11

A transcription result was returned

FINAL_RESULT_STATUS_TRANSCRIPTION_CONTINUOUS_MATCH 12

A transcription “intermediate” final result was returned

FINAL_RESULT_STATUS_TRANSCRIPTION_GRAMMAR_MATCHES 13

A transcription result was returned, which contains one or more embedded grammar matches

FINAL_RESULT_STATUS_TRANSCRIPTION_PARTIAL_MATCH 14

An enhanced transcription result was returned, but no SISR

FINAL_RESULT_STATUS_GRAMMAR_MATCH 21

A complete grammar match was returned

FINAL_RESULT_STATUS_GRAMMAR_NO_MATCH 22

No result could be obtained for the audio with the supplied grammars

FINAL_RESULT_STATUS_GRAMMAR_PARTIAL_MATCH 23

Raw text is returned, but could not be parsed with the supplied grammars

FINAL_RESULT_STATUS_AMD_TONE 31

An AMD interaction found one or more tones within the audio

FINAL_RESULT_STATUS_AMD_NO_TONES 32

An AMD interaction found no tones within the audio

FINAL_RESULT_STATUS_CPA_RESULT 41

A CPA interaction result was returned

FINAL_RESULT_STATUS_CPA_SILENCE 42

No voice audio was detected for a CPA interaction

FINAL_RESULT_STATUS_TTS_READY 51

TTS audio is available to pull

FINAL_RESULT_STATUS_TEXT_NORMALIZE_RESULT 61

An inverse text normalization result was returned for a NormalizeText interaction.
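
As an example of consuming these results, the sketch below (under the assumed lumenvox.api.results_pb2 stub name) pulls the top transcript out of a FinalResult callback when the status indicates a transcription match:

    from lumenvox.api import results_pb2  # assumed stub module name

    def extract_transcript(final_result_msg):
        status = final_result_msg.final_result_status
        if status == results_pb2.FINAL_RESULT_STATUS_TRANSCRIPTION_MATCH:
            result = final_result_msg.final_result.transcription_interaction_result
            if result.n_bests:
                return result.n_bests[0].asr_result_meta_data.transcript
        # No input, error, cancellation, etc.: no transcript to return.
        return None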

lumenvox/api/session.proto

Top

SessionAttachRequest

FieldTypeLabelDescription
deployment_id string

Deployment identifier associated to the session

session_id string

Valid session identifier to attach to the request

operator_id string

UUID related to the operator (entity or person making request)

SessionCancelRequest

Currently no fields defined

SessionCancelResponse

FieldTypeLabelDescription
close_status google.rpc.Status

Status of request

SessionCloseRequest

Currently no fields defined

SessionCloseResponse

FieldTypeLabelDescription
close_status google.rpc.Status

Status of request

SessionCreateRequest

FieldTypeLabelDescription
deployment_id string

Deployment identifier to associate the session with

session_id OptionalString

Optional unique reference for session (must be UUID). A UUID value will be auto generated if not supplied by client

operator_id string

UUID related to the operator (entity or person making request)

SessionGetSettingsRequest

Currently no fields defined

SessionInboundAudioFormatRequest

FieldTypeLabelDescription
audio_format AudioFormat

Parameters for the inbound audio resource associated with the session

SessionLoadGrammarRequest

FieldTypeLabelDescription
language string

The language selector for the specified grammar, e.g. "en-US", "de-DE", or dialect-independent "en", "de", etc.

grammar_label string

Reference label for session grammar Note: label must consist of letters, digits, hyphens, underscores only

grammar_url string

A grammar URL to be loaded

inline_grammar_text string

A string containing the raw grammar text

grammar_settings GrammarSettings

Optional grammar settings applied to this request

SessionLoadGrammarResponse

FieldTypeLabelDescription
status google.rpc.Status

The status of the grammar load

mode GrammarMode

The mode of the loaded grammar

label string

The label for the loaded grammar

SessionLoadPhraseList

FieldTypeLabelDescription
phrases string repeated

A list of strings containing word and phrase "hints" so that the transcriber is more likely to recognize them. This can be used to improve the accuracy for specific words and phrases, for example, if specific commands are typically spoken by the user. This can also be used to add additional words or phrases to the transcriber's vocabulary.

phase_list_label string

A label that can be used to reference this list within a transcription request

language string

The language selector describing which ASR resource will process the request, e.g. "en-US", "de-DE", or dialect-independent "en", "de", etc. Note that phrase lists are inherently language-independent, so this field is only used to direct which language-dependent resource will process the phrase load request

SessionLoadPhraseListResponse

FieldTypeLabelDescription
status google.rpc.Status

The status of the phrase list load.

label string

The label for the phrase list.

SessionRequest

FieldTypeLabelDescription
correlation_id OptionalString

Optional unique reference per request message. A UUID value will be auto generated if not supplied by client

session_request SessionRequestMessage

For session-specific requests

audio_request AudioRequestMessage

For audio-specific requests

interaction_request InteractionRequestMessage

For interaction-specific requests

dtmf_request DtmfPushRequest

For DTMF events (part of ASR interaction)

SessionRequestMessage

FieldTypeLabelDescription
session_create SessionCreateRequest

Creates a new session and returns its ID and session-related messages through streamed callback response messages

session_audio_format SessionInboundAudioFormatRequest

Defines the inbound audio format for the session. Must be assigned before any audio is sent and cannot later be changed.

session_attach SessionAttachRequest

Attach to an existing session

session_close SessionCloseRequest

Explicit request to close session

session_set_settings SessionSettings

Set settings to be configured for session.

session_get_settings SessionGetSettingsRequest

Get settings for session.

session_load_grammar SessionLoadGrammarRequest

Load session-specific grammar

session_load_phrase_list SessionLoadPhraseList

Load session-specific phrase list

session_cancel SessionCancelRequest

Explicit request to cancel all session related interactions and processing in progress
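
A typical ordering of these requests: create the session, define the inbound audio format, create an interaction, then stream audio. Continuing the earlier sketches (the outgoing queue, audio_format, and pcm_chunk variables come from those sketches; session_pb2 and common_pb2 remain assumed module names, and the deployment_id is a placeholder):

    # Create the session first ...
    outgoing.put(session_pb2.SessionRequest(
        session_request=session_pb2.SessionRequestMessage(
            session_create=session_pb2.SessionCreateRequest(
                deployment_id="00000000-0000-0000-0000-000000000000"))))  # placeholder

    # ... define the inbound audio format before any audio is pushed ...
    outgoing.put(session_pb2.SessionRequest(
        session_request=session_pb2.SessionRequestMessage(
            session_audio_format=session_pb2.SessionInboundAudioFormatRequest(
                audio_format=audio_format))))

    # ... then stream binary audio in chunks.
    outgoing.put(session_pb2.SessionRequest(
        audio_request=common_pb2.AudioRequestMessage(
            audio_push=common_pb2.AudioPushRequest(audio_data=pcm_chunk))))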

SessionResponse

FieldTypeLabelDescription
session_id OptionalString

Session identifier (will be returned from initial call)

correlation_id OptionalString

Optional reference to corresponding request correlation_id

vad_event VadEvent

VAD event notification

final_result FinalResult

Final result notification

partial_result PartialResult

Partial result notification

session_event SessionEvent

Session event notification (typically errors)

session_close SessionCloseResponse

Response for explicit session close request

audio_pull AudioPullResponse

Response to audio pull request

session_get_settings SessionSettings

Response to get settings for session.

interaction_create_amd InteractionCreateAmdResponse

Response to create AMD interaction request

interaction_create_asr InteractionCreateAsrResponse

Response to create ASR interaction request

interaction_create_cpa InteractionCreateCpaResponse

Response to create CPA interaction request

interaction_create_tts InteractionCreateTtsResponse

Response to create TTS interaction request

interaction_create_grammar_parse InteractionCreateGrammarParseResponse

Response to create a grammar parse request

interaction_create_normalize_text InteractionCreateNormalizeTextResponse

Response to create a normalize text request

interaction_get_settings InteractionSettings

Response to interaction get settings request

interaction_request_results InteractionRequestResultsResponse

Response to interaction request results

interaction_create_transcription InteractionCreateTranscriptionResponse

Response to create Transcription interaction request

session_phrase_list SessionLoadPhraseListResponse

Response for session load phrase list request

session_grammar SessionLoadGrammarResponse

Response for session load grammar request

interaction_cancel InteractionCancelResponse

Response to interaction cancel

interaction_close InteractionCloseResponse

Response to explicit request to close interaction

session_cancel SessionCancelResponse

Response for explicit session cancel request
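
On the receiving side, a client inspects which field of SessionResponse is populated and dispatches accordingly. A sketch reusing the earlier hypothetical helpers (on_vad_event, extract_transcript):

    def pump_responses(responses):
        for response in responses:
            if response.HasField("vad_event"):
                on_vad_event(response.vad_event)
            elif response.HasField("final_result"):
                print("transcript:", extract_transcript(response.final_result))
            elif response.HasField("session_event"):
                print("session event:", response.session_event.status_message)
            elif response.HasField("session_close"):
                break  # server acknowledged the session close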

lumenvox/api/settings.proto

Top

AmdSettings

Settings related to answering machine / tone detection and other tones such as FAX or SIT tones

FieldTypeLabelDescription
amd_enable OptionalBool

Enable answering machine beep detection Default: true

amd_input_text OptionalString

Which string is returned in response to an AMD beep detection Default: AMD

fax_enable OptionalBool

Enable fax tone detection Default: true

fax_input_text OptionalString

Which string is returned in response to a fax tone detection Default: FAX

sit_enable OptionalBool

Enable SIT detection Default: true

sit_reorder_local_input_text OptionalString

Which string is returned in response to specified SIT detection Default: "SIT REORDER LOCAL"

sit_vacant_code_input_text OptionalString

Which string is returned in response to specified SIT detection Default: "SIT VACANT CODE"

sit_no_circuit_local_input_text OptionalString

Which string is returned in response to specified SIT detection Default: "SIT NO CIRCUIT LOCAL"

sit_intercept_input_text OptionalString

Which string is returned in response to specified SIT detection Default: "SIT INTERCEPT"

sit_reorder_distant_input_text OptionalString

Which string is returned in response to specified SIT detection Default: "SIT REORDER DISTANT"

sit_no_circuit_distant_input_text OptionalString

Which string is returned in response to specified SIT detection Default: "SIT NO CIRCUIT DISTANT"

sit_other_input_text OptionalString

Which string is returned in response to specified SIT detection Default: "SIT OTHER"

busy_enable OptionalBool

Enable busy tone detection Default: true

busy_input_text OptionalString

Which string is returned in response to a busy tone detection Default: BUSY

tone_detect_timeout_ms OptionalInt32

Maximum number of milliseconds the tone detection algorithm should listen for input before timing out.

AudioConsumeSettings

FieldTypeLabelDescription
audio_channel OptionalInt32

For multi-channel audio, this is the channel number being referenced. Range is from 0 to N. Default channel 0 will be used if not specified

audio_consume_mode AudioConsumeSettings.AudioConsumeMode

Select which audio mode is used Default: AUDIO_CONSUME_MODE_STREAMING

stream_start_location AudioConsumeSettings.StreamStartLocation

Specify where audio consume starts when "streaming" mode is used Default: STREAM_START_LOCATION_STREAM_BEGIN

start_offset_ms OptionalInt32

Optional offset in milliseconds to adjust the audio start point. Range: Value in milliseconds, positive or negative. Default: 0

audio_consume_max_ms OptionalInt32

Optional maximum audio to process. Value of 0 means process all audio sent Range: Positive value in milliseconds Default: 0

CpaSettings

Settings related to Call Progress Analysis

FieldTypeLabelDescription
human_residence_time_ms OptionalInt32

Maximum amount of speech for human residence classification Default: 1800

human_business_time_ms OptionalInt32

Maximum amount of speech for human business classification. Human speech lasting longer than this will be classified as unknown speech Default: 3000

unknown_silence_timeout_ms OptionalInt32

Maximum amount of silence to allow before human speech is detected. If this timeout is reached, the classification will be returned as unknown silence. Default: 5000

max_time_from_connect_ms OptionalInt32

Maximum amount of time the CPA algorithm is allowed to perform human or machine classification. Only use this if you understand the implications (lower accuracy). Default: 0 (disabled)

GeneralInteractionSettings

Settings that apply to all interaction types

FieldTypeLabelDescription
secure_context OptionalBool

When true (enabled), certain ASR and TTS data will not be logged. This provides additional security for sensitive data such as account numbers and passwords that may be used within applications. Anywhere that potentially sensitive data would have been recorded will be replaced with _SUPPRESSED in the logs. Default: false

custom_interaction_data OptionalString

Optional data (i.e. could be string, JSON, delimited lists, etc.) set by user, for external purposes. Not used by LumenVox

logging_tag OptionalString repeated

Optional tag for logging. Reserved for future use.

GrammarSettings

Settings related to SRGS grammar usage

FieldTypeLabelDescription
default_tag_format GrammarSettings.TagFormat

The default tag-format for loaded grammars if not otherwise specified. Default: TAG_FORMAT_SEMANTICS_1_2006

ssl_verify_peer OptionalBool

Enables or disables the verification of a peer's certificate using a local certificate authority file upon HTTPS requests. Set to false (disabled) to skip verification for trusted sites. Default: true

load_grammar_timeout_ms OptionalInt32

Maximum milliseconds to allow for grammar loading. If this is exceeded, a timeout error will be raised. Range 1000-2147483647 (~600 hours) Default: 200000 (~3.333 minutes)

compatibility_mode OptionalInt32

Compatibility mode for certain media server operations. Only change from the default if you understand the consequences. Range: 0-1 Default: 0

InteractionSettings

Describes the interaction specific settings

FieldTypeLabelDescription
general_interaction_settings GeneralInteractionSettings

Optional settings related to all interactions

audio_consume_settings AudioConsumeSettings

Optional settings defining how audio is consumed/used by the interaction

vad_settings VadSettings

Optional Voice Activity Detection settings for interaction

grammar_settings GrammarSettings

Optional grammar settings for interaction

recognition_settings RecognitionSettings

Optional recognition settings for interaction

cpa_settings CpaSettings

Optional Call Progress Analysis settings for interaction

amd_settings AmdSettings

Optional Tone Detection (AMD) settings for interaction

normalization_settings NormalizationSettings

Optional settings specifying which text normalization steps should be performed on output of interaction.

phrase_list_settings PhraseListSettings

Optional settings specifying boost options for phrases

tts_settings TtsSettings

Optional settings for Text-To-Speech (TTS)

LoggingSettings

FieldTypeLabelDescription
logging_verbosity LoggingSettings.LoggingVerbosity

Logging verbosity setting Default: LOGGING_VERBOSITY_INFO

NormalizationSettings

Settings related to text Normalization results

FieldTypeLabelDescription
enable_inverse_text OptionalBool

Set to true to enable inverse text normalization, going from spoken form to written form (e.g. twenty two → 22) Default: false

enable_punctuation_capitalization OptionalBool

Set to true to enable punctuation and capitalization normalization Default: false

enable_redaction OptionalBool

Set to true to enable redaction of sensitive information Default: false

request_timeout_ms OptionalInt32

Number of milliseconds text normalization should await results before timing out Possible values: 0 - 1000000 Default: 5000

enable_srt_generation OptionalBool

Set to true to enable generation of SRT file (SubRip file format) Default: false

enable_vtt_generation OptionalBool

Set to true to enable generation of VTT file (WebVTT file format) Default: false

PhraseListSettings

FieldTypeLabelDescription
probability_boost OptionalInt32

Probability score boost raises or lowers the probability the words or phrases are recognized. A negative value lowers the probability the word is returned in results. Range: -10.0 to 5.0 (very probable) Default: 0

RecognitionSettings

Settings related to recognition results

FieldTypeLabelDescription
max_alternatives OptionalInt32

Maximum number of recognition hypotheses to be returned. Specifically, the maximum number of `NBest` messages within each `AsrInteractionResult`. Default: 1

trim_silence_value OptionalInt32

Controls how aggressively the ASR trims leading silence from input audio. Range: 0 (very aggressive) to 1000 (no silence trimmed) Default: 970

enable_partial_results OptionalBool

When true, partial results callbacks will be enabled for the interaction Default: false

confidence_threshold OptionalInt32

Confidence threshold. Range 0 to 1000; applies to grammar based asr interactions Default: 0

decode_timeout OptionalInt32

Number of milliseconds the ASR should await results before timing out Possible values: 0 - 100,000,000 Default: 10,000,000 (~2.7 hours)

ResetSettings

No additional fields needed

SessionSettings

Optional settings to be used for the duration of a session for all interactions created within the session. These can be overridden at the interaction level.

All settings are optional; not specifying a setting at any level means the default or parent context's value will be used. As a rule, only settings that need to be changed from the default should be set explicitly.

FieldTypeLabelDescription
archive_session OptionalBool

Whether the session data should be archived when closed, for tuning and other diagnostic purposes Default: false

custom_session_data OptionalString

Optional data (i.e. could be string, JSON, delimited lists, etc.) set by user, for external purposes. Not used by LumenVox

interaction_settings InteractionSettings

Optional settings to be used for duration of session for all interactions created. These can be over-ridden at the interaction level

call_id OptionalString

Optional call identifier string used for CDR tracking. This is often associated with the telephony call-id value or equivalent.

channel_id OptionalString repeated

Optional channel identifier string used for CDR tracking. This is often associated with a telephony/MRCP channel SDP value or equivalent.

archive_session_delay_seconds OptionalInt32

Optional delay interval for archiving, in seconds. Session data will persist longer in Redis before being written to the database

logging_tag OptionalString repeated

Optional tag for logging. Reserved for future use.

TtsInlineSynthesisSettings

FieldTypeLabelDescription
voice OptionalString

Optional voice (if using simple text, or if not specified within SSML)

synth_emphasis_level OptionalString

The strength of the emphasis used in the voice during synthesis. Possible Values: "strong", "moderate", "none" or "reduced".

synth_prosody_pitch OptionalString

The pitch of the audio being synthesized. Possible Values: A number followed by "Hz", a relative change, or one of the following values: "x-low", "low", "medium", "high", "x-high", or "default". See the SSML standard for details.

synth_prosody_contour OptionalString

The contour of the audio being synthesized. Possible Values: Please refer to the SSML standard on pitch contour for details.

synth_prosody_rate OptionalString

The speaking rate of the audio being synthesized. Possible Values: A relative change or "x-slow", "slow", "medium", "fast", "x-fast", or "default". See the SSML standard for details.

synth_prosody_duration OptionalString

The duration of time it will take for the synthesized text to play. Possible Values: A time, such as "250ms" or "3s".

synth_prosody_volume OptionalString

The volume of the audio being synthesized. Possible Values: A number, a relative change or one of: "silent", "x-soft", "soft", "medium", "loud", "x-loud", or "default". See the SSML specification for details.

synth_voice_age OptionalString

The age of the voice used for synthesis. Possible Values: A non-negative integer.

synth_voice_gender OptionalString

The default TTS gender to use if none is specified. Possible Values: "neutral" (which uses the default), "male", or "female".
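
A minimal sketch of TtsInlineSynthesisSettings in protobuf text format (the voice name is hypothetical, and the OptionalString wrapper is assumed to carry its payload in a value field):

    voice { value: "SomeVoice" }            # hypothetical voice name
    synth_prosody_rate { value: "slow" }    # SSML prosody rate keyword
    synth_prosody_volume { value: "loud" }  # SSML prosody volume keyword
    synth_voice_gender { value: "female" }  # fallback gender if none specified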

TtsSettings

FieldTypeLabelDescription
voice_mappings TtsSettings.VoiceMappingsEntry repeated

Voice mappings allow alternative voice names to map to LumenVox voices. The key is the language of the voice mappings.

TtsSettings.VoiceMappingsEntry

FieldTypeLabelDescription
key string

value VoiceMapping

VadSettings

Settings related to Voice Activity Detection (VAD)

VAD is used to begin audio processing once a person starts speaking and to detect when a person has stopped speaking.

FieldTypeLabelDescription
use_vad OptionalBool

When `false`, all audio as specified in AudioConsumeSettings is used for processing. In streaming audio mode, InteractionFinalizeProcessing() would need to be called to finish processing. When `true`, VAD is used to determine when the speaker starts and stops speaking. When using VAD in batch audio mode, the engine will look for the beginning of speech within the designated audio to process and will stop processing audio when the end of speech is found, which may mean that not all loaded audio is processed.

barge_in_timeout_ms OptionalInt32

Maximum silence, in ms, allowed while waiting for user input (barge-in) before a timeout is reported. Range: -1 (infinite) to positive integer number of milliseconds Default: -1 (infinite)

end_of_speech_timeout_ms OptionalInt32

After barge-in, STREAM_STATUS_END_SPEECH_TIMEOUT will occur if end-of-speech is not detected within the time specified by this property. This is different from eos_delay_ms: this value represents the total amount of time a caller is permitted to speak after barge-in is detected. Range: a positive number of milliseconds or -1 (infinite) Default: -1 (infinite)

noise_reduction_mode VadSettings.NoiseReductionMode

Determines the noise reduction mode. Default: NOISE_REDUCTION_MODE_DEFAULT

bargein_threshold OptionalInt32

A higher value requires the VAD algorithm to be more confident that the audio is speech before triggering barge-in. Raising the value will reject more false positives/noise; however, it may mean that some speech that is on the borderline is rejected. This value should not be changed from the default without significant tuning and verification. Range: Integer value from 0 to 100 Default: 50

eos_delay_ms OptionalInt32

Milliseconds of silence after speech before processing begins. Range: A positive integer number of milliseconds Default: 800

snr_sensitivity OptionalInt32

Determines how much louder the speaker must be than the background noise in order to trigger barge-in. The smaller this value, the easier it will be to trigger barge-in. Range: Integer range from 0 to 100 Default: 50

stream_init_delay OptionalInt32

Accurate VAD depends on a good estimation of the acoustic environment. The VAD module uses the first few frames of audio to estimate the acoustic environment, such as the noise level; the length of this period is defined by this parameter. Range: A positive integer number of milliseconds. Default: 100

volume_sensitivity OptionalInt32

The volume required to trigger barge-in. The smaller the value, the more sensitive barge-in will be. This is primarily used to deal with poor echo cancellation: by setting this value higher (less sensitive), prompts that are not properly cancelled will be less likely to falsely trigger barge-in. Range: Integer range from 0 to 100 Default: 50

wind_back_ms OptionalInt32

The length of audio to be wound back at the beginning of voice activity. This is used primarily to counter instances where barge-in does not accurately capture the very start of speech. The resolution of this parameter is 1/8 of a second. Range: A positive integer number of milliseconds Default: 480
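
For illustration, a VadSettings message in protobuf text format might look like the following sketch (assuming the Optional* wrapper types carry a value field):

    use_vad { value: true }               # let VAD find speech begin and end
    barge_in_timeout_ms { value: 30000 }  # report a timeout after 30 s with no barge-in
    eos_delay_ms { value: 800 }           # 800 ms of trailing silence ends the utterance
    noise_reduction_mode: NOISE_REDUCTION_MODE_DEFAULT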

VoiceMapping

FieldTypeLabelDescription
voicePairs VoiceMapping.VoicePairsEntry repeated

A map of custom voice pairs. The key is the voice that will be requested by the API user. The value is the LumenVox voice that will be used for synthesis. Use the key "default" to set a default voice for the given language.

VoiceMapping.VoicePairsEntry

FieldTypeLabelDescription
key string

value string
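
Because voice_mappings is a map of maps, an example helps: the outer key is the language, and each inner voicePairs entry maps the voice name requested by the API user to the LumenVox voice used for synthesis. A sketch in protobuf text format, with hypothetical voice names:

    voice_mappings {
      key: "en-US"
      value {
        voicePairs { key: "default" value: "VoiceA" }   # default voice for en-US
        voicePairs { key: "agent"   value: "VoiceB" }   # requests for "agent" resolve to "VoiceB"
      }
    }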

AudioConsumeSettings.AudioConsumeMode

NameNumberDescription
AUDIO_CONSUME_MODE_UNSPECIFIED 0

No mode specified

AUDIO_CONSUME_MODE_STREAMING 1

Specify streaming mode is used

AUDIO_CONSUME_MODE_BATCH 2

Specify batch mode is used

AudioConsumeSettings.StreamStartLocation

Only used when AUDIO_CONSUME_MODE_STREAMING is used

NameNumberDescription
STREAM_START_LOCATION_UNSPECIFIED 0

No location specified

STREAM_START_LOCATION_STREAM_BEGIN 1

Start processing from the beginning of the stream. Note: Only valid option for AUDIO_CONSUME_MODE_BATCH

STREAM_START_LOCATION_BEGIN_PROCESSING_CALL 2

Start processing from the audio streamed after the API call InteractionBeginProcessing() was made. Note: Not valid for AUDIO_CONSUME_MODE_BATCH

STREAM_START_LOCATION_INTERACTION_CREATED 3

Start processing from the audio streamed after the interaction was created. Note: Not valid for AUDIO_CONSUME_MODE_BATCH
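
As an illustrative sketch only, the two enums above might be combined in an AudioConsumeSettings message in protobuf text format; the field names audio_consume_mode and stream_start_location are assumed here, as they are defined elsewhere in the protocol rather than in this excerpt:

    audio_consume_mode: AUDIO_CONSUME_MODE_STREAMING
    stream_start_location: STREAM_START_LOCATION_BEGIN_PROCESSING_CALL

A batch interaction would instead use AUDIO_CONSUME_MODE_BATCH together with STREAM_START_LOCATION_STREAM_BEGIN, the only start location valid in batch mode.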

GrammarSettings.TagFormat

NameNumberDescription
TAG_FORMAT_UNSPECIFIED 0

TAG_FORMAT_LUMENVOX_1 1

lumenvox/1.0 tag format

TAG_FORMAT_SEMANTICS_1 2

semantics/1.0 tag format

TAG_FORMAT_SEMANTICS_1_LITERALS 3

semantics/1.0-literals tag format

TAG_FORMAT_SEMANTICS_1_2006 4

semantics/1.0.2006 tag format

TAG_FORMAT_SEMANTICS_1_2006_LITERALS 5

semantics/1.0.2006-literals tag format

LoggingSettings.LoggingVerbosity

NameNumberDescription
LOGGING_VERBOSITY_UNSPECIFIED 0

Logging verbosity is not specified

LOGGING_VERBOSITY_DEBUG 1

Internal system events that are not usually observable

LOGGING_VERBOSITY_INFO 2

Routine logging, such as ongoing status or performance

LOGGING_VERBOSITY_WARNING 3

Warnings and above only - service degradation or danger

LOGGING_VERBOSITY_ERROR 4

Functionality is unavailable, invariants are broken, or data is lost

LOGGING_VERBOSITY_CRITICAL 5

Only log exceptions and critical errors (not recommended)

VadSettings.NoiseReductionMode

NameNumberDescription
NOISE_REDUCTION_MODE_UNSPECIFIED 0

No change to setting

NOISE_REDUCTION_MODE_DISABLED 1

Noise reduction disabled

NOISE_REDUCTION_MODE_DEFAULT 2

Default (recommended) noise reduction algorithm is enabled.

NOISE_REDUCTION_MODE_ALTERNATE 3

Alternate noise reduction algorithm. Similar to default, but we have seen varied results based on differing noise types and levels.

NOISE_REDUCTION_MODE_ADAPTIVE 4

Uses an adaptive noise reduction algorithm that is most suited to varying levels of background noise, such as changing car noise, etc.

Scalar Value Types

.proto TypeNotesC++JavaPythonGoC#PHPRuby
double double double float float64 double float Float
float float float float float32 float float Float
int32 Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead. int32 int int int32 int integer Bignum or Fixnum (as required)
int64 Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead. int64 long int/long int64 long integer/string Bignum
uint32 Uses variable-length encoding. uint32 int int/long uint32 uint integer Bignum or Fixnum (as required)
uint64 Uses variable-length encoding. uint64 long int/long uint64 ulong integer/string Bignum or Fixnum (as required)
sint32 Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s. int32 int int int32 int integer Bignum or Fixnum (as required)
sint64 Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s. int64 long int/long int64 long integer/string Bignum
fixed32 Always four bytes. More efficient than uint32 if values are often greater than 2^28. uint32 int int uint32 uint integer Bignum or Fixnum (as required)
fixed64 Always eight bytes. More efficient than uint64 if values are often greater than 2^56. uint64 long int/long uint64 ulong integer/string Bignum
sfixed32 Always four bytes. int32 int int int32 int integer Bignum or Fixnum (as required)
sfixed64 Always eight bytes. int64 long int/long int64 long integer/string Bignum
bool bool boolean boolean bool bool boolean TrueClass/FalseClass
string A string must always contain UTF-8 encoded or 7-bit ASCII text. string String str/unicode string string string String (UTF-8)
bytes May contain any arbitrary sequence of bytes. string ByteString str []byte ByteString string String (ASCII-8BIT)
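
To make the int32 versus sint32 note concrete: a negative int32 is always encoded as a 10-byte varint, whereas sint32 applies ZigZag encoding, (n << 1) ^ (n >> 31), so that values close to zero stay small on the wire:

    n =  0  ->  ZigZag 0
    n = -1  ->  ZigZag 1
    n =  1  ->  ZigZag 2
    n = -2  ->  ZigZag 3
    n =  2  ->  ZigZag 4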

Copyright (C) 2001-2024, Ai Software, LLC d/b/a LumenVox