Protocol Documentation

lumenvox/api/audio_formats.proto
- AudioFormat
- AudioFormat.StandardAudioFormat
lumenvox/api/common.proto
lumenvox/api/global.proto
lumenvox/api/interaction.proto
lumenvox/api/lumenvox.proto
- LumenVox
lumenvox/api/optional_values.proto
lumenvox/api/results.proto
lumenvox/api/session.proto
lumenvox/api/settings.proto
Scalar Value Types

lumenvox/api/audio_formats.proto

AudioFormat

Field	Type	Label	Description
standard_audio_format	AudioFormat.StandardAudioFormat		Standard audio format
sample_rate_hertz	OptionalInt32		Sample rate in Hertz of the audio data. This field is mandatory for the raw PCM audio format and optional for the other formats. For audio formats with headers, this value is ignored and the value from the file header is used instead. Default: 8000 (8 kHz)

AudioFormat.StandardAudioFormat

Specification for the audio format

Not all standard formats are supported in all cases. Different operations may natively handle a subset of the total audio formats.

Name	Number	Description
STANDARD_AUDIO_FORMAT_UNSPECIFIED	0
STANDARD_AUDIO_FORMAT_LINEAR16	1	Uncompressed 16-bit signed little-endian samples (Linear PCM).
STANDARD_AUDIO_FORMAT_ULAW	2	8-bit audio samples using G.711 PCMU/mu-law.
STANDARD_AUDIO_FORMAT_ALAW	3	8-bit audio samples using G.711 PCMA/a-law.
STANDARD_AUDIO_FORMAT_WAV	4	WAV formatted audio
STANDARD_AUDIO_FORMAT_FLAC	5	FLAC formatted audio
STANDARD_AUDIO_FORMAT_MP3	6	MP3 formatted audio
STANDARD_AUDIO_FORMAT_OPUS	7	OPUS formatted audio
STANDARD_AUDIO_FORMAT_M4A	8	M4A formatted audio
STANDARD_AUDIO_FORMAT_MP4	9	Audio packed into MP4 container
STANDARD_AUDIO_FORMAT_NO_AUDIO_RESOURCE	100	Explicitly indicate that no audio resource should be allocated
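
The sample-rate rule described for sample_rate_hertz can be sketched as follows. This is a minimal Python illustration, not generated client code: the function name effective_sample_rate and the header_rate argument are hypothetical. It assumes that "raw PCM" means LINEAR16 only, and that ULAW and ALAW are headerless formats for which the rate is optional and falls back to the documented default.

```python
from typing import Optional

# Assumption: "RAW PCM" refers to LINEAR16 only.
RAW_PCM = {"STANDARD_AUDIO_FORMAT_LINEAR16"}
# Assumption: these headerless formats fall back to the documented default.
HEADERLESS_OPTIONAL = {"STANDARD_AUDIO_FORMAT_ULAW", "STANDARD_AUDIO_FORMAT_ALAW"}
DEFAULT_RATE_HZ = 8000  # documented default (8 kHz)


def effective_sample_rate(fmt: str,
                          sample_rate_hertz: Optional[int] = None,
                          header_rate: Optional[int] = None) -> int:
    """Return the sample rate that would apply to the given format."""
    if fmt in RAW_PCM:
        # Mandatory for raw PCM: there is no header to read the rate from.
        if sample_rate_hertz is None:
            raise ValueError("sample_rate_hertz is mandatory for raw PCM")
        return sample_rate_hertz
    if fmt in HEADERLESS_OPTIONAL:
        return DEFAULT_RATE_HZ if sample_rate_hertz is None else sample_rate_hertz
    # Formats with headers: the supplied value is ignored in favour of the header.
    if header_rate is None:
        raise ValueError("no sample rate found in the file header")
    return header_rate
```

For example, a WAV file whose header reports 44100 Hz resolves to 44100 even if sample_rate_hertz is set to 8000.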

lumenvox/api/common.proto

AudioPullRequest

Field	Type	Label	Description
audio_id	string		Id of the audio requested (Note that this could be session_id to request the inbound audio resource)
audio_channel	OptionalInt32		For multi-channel audio, this is the channel number being referenced. Range is from 0 to N. Default channel 0 will be used if not specified
audio_start	OptionalInt32		Number of milliseconds from the beginning of the audio to return. Default is from the beginning
audio_length	OptionalInt32		Maximum number of milliseconds to return. A zero value returns all available audio (from requested start point). Default is all audio, from start point

AudioPullResponse

Field	Type	Label	Description
audio_data	bytes		Binary audio data that was requested
audio_channel	OptionalInt32		For multi-channel audio, this is the channel number being referenced.
final_data_chunk	bool		In case of large audio, data will be split and there will be multiple AudioPullResponse messages. final_data_chunk field is set to true for the last message
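
Since large audio is split across several AudioPullResponse messages, a client has to reassemble the chunks. A minimal sketch, assuming chunks arrive in order on the response stream: the dataclass below is a stand-in that mirrors the documented fields, not the generated class, and collect_audio is a hypothetical helper name.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class AudioPullResponse:
    """Stand-in mirroring the documented fields (not the generated class)."""
    audio_data: bytes
    final_data_chunk: bool = False


def collect_audio(responses: Iterable[AudioPullResponse]) -> bytes:
    """Concatenate chunks until the one flagged final_data_chunk is seen."""
    buffer = bytearray()
    for response in responses:
        buffer.extend(response.audio_data)
        if response.final_data_chunk:
            return bytes(buffer)
    # The stream ended without a final chunk, so the audio is incomplete.
    raise ValueError("stream ended before final_data_chunk was received")
```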

AudioPushRequest

Note that SessionInboundAudioFormatRequest should be called before using this message, so that the audio format is defined.

Field	Type	Label	Description
audio_data	bytes		Binary audio data to be added to the audio resource

AudioRequestMessage

Field	Type	Label	Description
audio_push	AudioPushRequest		Streamed binary audio data to be added to the session audio resource
audio_pull	AudioPullRequest		Returns a block of audio data from an audio resource.

DtmfPushRequest

Field	Type	Label	Description
interaction_id	string		ASR interaction to associate this dtmf_key with
dtmf_key	string		DTMF key press to be added to interaction stream for processing. Valid keys are 0-9, A-F, *, #
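
A client may want to check dtmf_key before sending it. The sketch below encodes the key set listed above (0-9, A-F, *, #). It assumes one key per request and upper-case letters only, neither of which the table states; is_valid_dtmf_key is a hypothetical helper.

```python
import re

# Key set as listed in the dtmf_key description: 0-9, A-F, *, #.
# Assumption: exactly one key per DtmfPushRequest, letters in upper case.
_DTMF_KEY = re.compile(r"[0-9A-F*#]")


def is_valid_dtmf_key(key: str) -> bool:
    """Return True if key is a single character from the documented set."""
    return _DTMF_KEY.fullmatch(key) is not None
```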

Event

Event can be either a VadEvent or a SessionEvent

Field	Type	Label	Description
vad_event	VadEvent		Event returned from VAD (AudioManager)
session_event	SessionEvent		Session Events used to report errors to the API user

Grammar

Field	Type	Label	Description
grammar_url	string		A grammar URL to be loaded
inline_grammar_text	string		A string containing the raw grammar text
global_grammar_label	string		Deprecated. Reference to a previously defined "global" grammar. Note: label must consist of letters, digits, hyphens, and underscores only
session_grammar_label	string		Reference to a previously defined "session" grammar. Note: label must consist of letters, digits, hyphens, and underscores only
builtin_voice_grammar	Grammar.BuiltinGrammar		Reference to a "builtin" voice grammar
builtin_dtmf_grammar	Grammar.BuiltinGrammar		Reference to a "builtin" DTMF grammar
label	OptionalString		Optional label assigned to grammar, used for error reporting. Note: label must consist of letters, digits, hyphens, and underscores only

Fields with deprecated option

Name	Option
global_grammar_label	true
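
The label constraint repeated in the Grammar fields (letters, digits, hyphens, underscores only) can be checked client-side before a request is sent. A minimal sketch, assuming ASCII letters and a non-empty label; the helper name is hypothetical and the service may apply additional rules such as a length limit.

```python
import re

# Letters, digits, hyphens and underscores only.
# Assumption: ASCII letters, and at least one character.
_LABEL = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_grammar_label(label: str) -> bool:
    """Return True if label uses only the documented character set."""
    return _LABEL.fullmatch(label) is not None
```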

LogEvent

A single event with a timestamp, to be logged to the database.

The LogEvent is returned via the reporting API.

Field	Type	Label	Description
time_stamp	google.protobuf.Timestamp		Log Event Timestamp (UTC)
event	Event		Can be either a VadEvent or a SessionEvent

PhraseList

Field	Type	Label	Description
phrase_list_label	string		The label of a previously defined global phrase list

SessionEvent

Field	Type	Label	Description
interaction_id	OptionalString		Optional interaction object being referenced
status_message	google.rpc.Status		String containing event information

VadEvent

Message used to signal events over the course of Voice Activity Detection processing.

The audio_offset signifies the point within the session audio resource at which the event occurred.

Field	Type	Label	Description
interaction_id	string		The interaction object being referenced
vad_event_type	VadEvent.VadEventType		The type of event this message represents
audio_offset	OptionalInt32		The offset in milliseconds from the beginning of the audio resource at which this event occurred

Grammar.BuiltinGrammar

Note that all builtin grammars are language-specific

Name	Number	Description
BUILTIN_GRAMMAR_UNSPECIFIED	0	Undefined built-in grammar
BUILTIN_GRAMMAR_BOOLEAN	1	"yes" => true
BUILTIN_GRAMMAR_CURRENCY	2	"one dollar ninety seven" => USD1.97
BUILTIN_GRAMMAR_DATE	3	"march sixteenth nineteen seventy nine" => 19790316
BUILTIN_GRAMMAR_DIGITS	4	"one two three four" => 1234
BUILTIN_GRAMMAR_NUMBER	5	"three point one four one five nine two six" => 3.1415926
BUILTIN_GRAMMAR_PHONE	6	"eight five eight seven oh seven oh seven oh seven" => 8587070707
BUILTIN_GRAMMAR_TIME	7	"six o clock" => 0600

GrammarMode

List of all grammar modes.

Name	Number	Description
GRAMMAR_MODE_UNSPECIFIED	0	Mode not specified
GRAMMAR_MODE_VOICE	1	Voice mode
GRAMMAR_MODE_DTMF	2	DTMF mode
GRAMMAR_MODE_VOICE_AND_DTMF	3	Voice and DTMF mode. Deprecated; should not be used

InteractionStatus

List of all Interaction statuses.

Name	Number	Description
INTERACTION_STATUS_UNSPECIFIED	0	This status is not expected and is not valid; it indicates an empty message.
INTERACTION_STATUS_CREATED	1	Interaction has been created only; no additional processing has been done yet.
INTERACTION_STATUS_RESULTS_READY	2	Interaction results are ready. Most results are sent automatically when ready.
INTERACTION_STATUS_CLOSED	3	Used to indicate a successfully closed interaction
INTERACTION_STATUS_CANCELED	4	Used to indicate a successfully canceled interaction
INTERACTION_STATUS_ASR_WAITING_ON_GRAMMARS	101	Audio processing not started yet. Waiting on grammars to be loaded.
INTERACTION_STATUS_ASR_WAITING_ON_BARGIN	102	Audio processing not started yet. Waiting on BARGE_IN event from VAD
INTERACTION_STATUS_ASR_STREAM_REQUEST	103	Initial status or post BARGE_IN status of interaction, stream processing not started yet
INTERACTION_STATUS_ASR_STOP_REQUESTED_WAITING	104	Batch mode, waiting for STOP request
INTERACTION_STATUS_ASR_STREAM_STARTED	105	ASR started reading stream
INTERACTION_STATUS_ASR_STREAM_STOP_REQUESTED	106	Set in case of Finalize request
INTERACTION_STATUS_ASR_WAITING_FOR_CPA_AMD_RESPONSE	107	Used for CPA and AMD interactions
INTERACTION_STATUS_ASR_TIMEOUT	109	No VAD event or interaction finalize, ASR processing timed out
INTERACTION_STATUS_ASR_WAITING_ON_BARGEOUT	110	Audio processing started. Waiting on BARGE_OUT event from VAD
INTERACTION_STATUS_TTS_PROCESSING	200	TTS processing
INTERACTION_STATUS_GRAMMAR_PARSE_WAITING_ON_GRAMMARS	400	Grammar(s) loading in progress, interaction not started yet
INTERACTION_STATUS_GRAMMAR_PARSE_REQUESTED_PROCESSING	401	Interaction processing in progress
INTERACTION_STATUS_NORMALIZE_TEXT_REQUESTED_PROCESSING	500	Normalize Text
INTERACTION_STATUS_ASR_TRANSCRIPTION_WAITING_ON_PHRASE_LISTS	600	ASR transcription

InteractionSubType

List of all interaction sub-types for ASR Interactions

Name	Number	Description
INTERACTION_SUB_TYPE_UNSPECIFIED	0	This is not a valid type; it indicates an empty gRPC message.
INTERACTION_SUB_TYPE_GRAMMAR_BASED_CPA	1	Call progress analysis interaction type with grammars
INTERACTION_SUB_TYPE_GRAMMAR_BASED_AMD	2	Answering machine detection interaction type with grammars
INTERACTION_SUB_TYPE_ENHANCED_TRANSCRIPTION	3	ASR transcription interaction with multiple grammars
INTERACTION_SUB_TYPE_CONTINUOUS_TRANSCRIPTION	4	Deprecated - ASR continuous transcription
INTERACTION_SUB_TYPE_TRANSCRIPTION_WITH_NORMALIZATION	5	Deprecated - Transcription result with normalized text. Normalization can be enabled for different interaction types/subtypes in parallel, e.g. GRAMMAR_BASED_TRANSCRIPTION can have a normalization setting as well. If needed for filtering, this flag will be added separately
INTERACTION_SUB_TYPE_GRAMMAR_BASED_TRANSCRIPTION	6	Transcription interaction type with grammars

InteractionType

List of all Interaction types.

Name	Number	Description
INTERACTION_TYPE_UNSPECIFIED	0	This is not a valid type; it indicates an empty gRPC message.
INTERACTION_TYPE_ASR	2	ASR processing interaction
INTERACTION_TYPE_TTS	3	TTS processing interaction
INTERACTION_TYPE_GRAMMAR_PARSE	4	Validate grammar content. Can be url, inline or file reference (label)
INTERACTION_TYPE_NORMALIZATION	5	Normalization interaction type
INTERACTION_TYPE_CPA	6	Call progress analysis interaction type
INTERACTION_TYPE_AMD	7	Answering machine detection interaction type
INTERACTION_TYPE_ASR_TRANSCRIPTION	8	ASR transcription interaction type

VadEvent.VadEventType

Name	Number	Description
VAD_EVENT_TYPE_UNSPECIFIED	0	Undefined VAD event type
VAD_EVENT_TYPE_BEGIN_PROCESSING	1	VAD begins processing audio
VAD_EVENT_TYPE_BARGE_IN	2	Barge-in occurred; audio that will be processed by the ASR starts here. This notification can be used, for example, to stop prompt playback.
VAD_EVENT_TYPE_END_OF_SPEECH	3	End-of-speech occurred; no further audio will be processed by VAD for the specified interaction. If the setting VadSettings.auto_finalize_on_eos is true, the ASR will immediately finish processing audio at this point.
VAD_EVENT_TYPE_BARGE_IN_TIMEOUT	4	VAD timed out waiting for audio barge-in (start-of-speech). The audio manager will no longer process audio for this interaction.
VAD_EVENT_TYPE_END_OF_SPEECH_TIMEOUT	5	VAD timed out waiting for audio barge-out (end-of-speech). The audio manager will no longer process audio for this interaction.
VAD_EVENT_TYPE_END_OF_AUDIO_BEFORE_BARGEIN	6	VAD has reached audio_consume_max_ms before barge-in has occurred.
VAD_EVENT_TYPE_END_OF_AUDIO_AFTER_BARGEIN	7	VAD has reached audio_consume_max_ms before barge-out (end-of-speech) has occurred.
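
How a client reacts to these events is application-specific. The sketch below mirrors the enum numbers from the table above in a local IntEnum (it is not the generated enum) and adds two hypothetical helpers: one for the prompt-playback case mentioned under BARGE_IN, and one that groups the events after which VAD stops processing audio for the interaction. Treating the two END_OF_AUDIO_* events as terminal is an assumption.

```python
from enum import IntEnum


class VadEventType(IntEnum):
    """Local mirror of VadEvent.VadEventType (numbers from the table above)."""
    UNSPECIFIED = 0
    BEGIN_PROCESSING = 1
    BARGE_IN = 2
    END_OF_SPEECH = 3
    BARGE_IN_TIMEOUT = 4
    END_OF_SPEECH_TIMEOUT = 5
    END_OF_AUDIO_BEFORE_BARGEIN = 6
    END_OF_AUDIO_AFTER_BARGEIN = 7


# Events after which no further audio is processed for the interaction.
# Including the END_OF_AUDIO_* events here is an assumption.
_FINISHED = frozenset({
    VadEventType.END_OF_SPEECH,
    VadEventType.BARGE_IN_TIMEOUT,
    VadEventType.END_OF_SPEECH_TIMEOUT,
    VadEventType.END_OF_AUDIO_BEFORE_BARGEIN,
    VadEventType.END_OF_AUDIO_AFTER_BARGEIN,
})


def should_stop_prompt(event_type: int) -> bool:
    """Barge-in is the point at which prompt playback would usually stop."""
    return event_type == VadEventType.BARGE_IN


def vad_is_finished(event_type: int) -> bool:
    """True once VAD will process no more audio for the interaction."""
    return event_type in _FINISHED
```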

lumenvox/api/global.proto

GlobalEvent

Field	Type	Label	Description
status_message	google.rpc.Status		String containing event information

GlobalGetSettingsRequest

Field	Type	Label	Description
settings_type	GlobalGetSettingsRequest.GetSettingsType		Used to specify the type of settings to request

GlobalLoadGrammarRequest

Field	Type	Label	Description
language	string		The language selector for the specified grammar, e.g. "en-US", "de-DE", or dialect-independent "en", "de", etc.
grammar_label	string		Reference label for global grammar. Note: label must consist of letters, digits, hyphens, and underscores only
grammar_url	string		A grammar URL to be loaded
inline_grammar_text	string		A string containing the raw grammar text
grammar_settings	GrammarSettings		Optional grammar settings applied to this request

GlobalLoadGrammarResponse

Field	Type	Label	Description
status	google.rpc.Status		The status of the grammar load
mode	GrammarMode		The mode of the loaded grammar
label	string		The label for the loaded grammar

GlobalLoadPhraseList

Field	Type	Label	Description
phrases	string	repeated	A list of strings containing word and phrase "hints" so that the transcriber is more likely to recognize them. This can be used to improve the accuracy for specific words and phrases, for example, if specific commands are typically spoken by the user. This can also be used to add additional words or phrases to the transcriber's vocabulary.
phase_list_label	string		A label that can be used to reference this list within a transcription request
language	string		The language selector describing which ASR resource will process the request, e.g. "en-US", "de-DE", or dialect-independent "en", "de", etc. Note that phrase lists are inherently language-independent, so this field is only used to direct which language-dependent resource will process the phrase load request
phrase_list_settings	PhraseListSettings		Optional settings specifying boost options for phrases

GlobalLoadPhraseListResponse

Field	Type	Label	Description
status	google.rpc.Status		The status of the phrase list load.
label	string		The label for the phrase list.

GlobalRequest

Field	Type	Label	Description
correlation_id	OptionalString		Optional unique reference per request message. A UUID value will be auto generated if not supplied by client
deployment_id	string		Valid deployment identifier (UUID) to associate the request with
operator_id	string		UUID related to the operator (entity or person making request)
global_load_grammar_request	GlobalLoadGrammarRequest		Load a globally defined grammar
global_load_phrase_list	GlobalLoadPhraseList		Load a globally defined phrase list
global_get_settings_request	GlobalGetSettingsRequest		Get specified global default settings
session_settings	SessionSettings		Default session settings
interaction_settings	InteractionSettings		Deprecated. Default interaction settings
grammar_settings	GrammarSettings		Deprecated. Default grammar settings
recognition_settings	RecognitionSettings		Deprecated. Default recognition settings
normalization_settings	NormalizationSettings		Deprecated. Default normalization settings
vad_settings	VadSettings		Deprecated. Default VAD settings
cpa_settings	CpaSettings		Deprecated. Default CPA settings
amd_settings	AmdSettings		Deprecated. Default tone detection settings
audio_consume_settings	AudioConsumeSettings		Deprecated. Default audio consume settings
logging_settings	LoggingSettings		Default logging settings
phrase_list_settings	PhraseListSettings		Deprecated. Optional settings specifying boost options for phrases
reset_settings	ResetSettings		Will reset all of the settings to default

Fields with deprecated option

Name	Option
interaction_settings	true
grammar_settings	true
recognition_settings	true
normalization_settings	true
vad_settings	true
cpa_settings	true
amd_settings	true
audio_consume_settings	true
phrase_list_settings	true
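
Because GlobalResponse echoes the request's correlation_id, a client can supply its own value and use it to pair responses with requests. The sketch below builds a plain dict that mirrors the field layout above; it is not the generated message class, the helper names are hypothetical, and the {"value": ...} shape for OptionalString is an assumption based on the wrapper's single value field.

```python
import uuid
from typing import List


def build_global_phrase_list_request(deployment_id: str, operator_id: str,
                                     label: str, phrases: List[str],
                                     language: str) -> dict:
    """Build a dict shaped like a GlobalRequest that loads a phrase list."""
    return {
        # Supplying the id client-side; the table says one is generated if omitted.
        "correlation_id": {"value": str(uuid.uuid4())},
        "deployment_id": deployment_id,
        "operator_id": operator_id,
        "global_load_phrase_list": {
            "phrases": list(phrases),
            # Field name spelled exactly as in the GlobalLoadPhraseList table.
            "phase_list_label": label,
            "language": language,
        },
    }


def is_response_for(request: dict, response: dict) -> bool:
    """Match a GlobalResponse-shaped dict to its request by correlation_id."""
    return response.get("correlation_id") == request["correlation_id"]
```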

GlobalResponse

Field	Type	Label	Description
correlation_id	OptionalString		Reference to corresponding request correlation_id
global_event	GlobalEvent		Global event notification (typically errors)
global_settings	GlobalSettings		Global default settings (which were requested)
global_grammar	GlobalLoadGrammarResponse		Deprecated. Response to a global load grammar request
global_phrase_list	GlobalLoadPhraseListResponse		Response to a global load phrase list request

Fields with deprecated option

Name	Option
global_grammar	true

GlobalSettings

Container for all session and interaction related (global) settings

Field	Type	Label	Description
session_settings	SessionSettings		Default session settings
interaction_settings	InteractionSettings		Deprecated. Default interaction settings
grammar_settings	GrammarSettings		Deprecated. Default grammar settings
recognition_settings	RecognitionSettings		Deprecated. Default recognition settings
normalization_settings	NormalizationSettings		Deprecated. Default normalization settings
vad_settings	VadSettings		Deprecated. Default VAD settings
cpa_settings	CpaSettings		Deprecated. Default CPA settings
amd_settings	AmdSettings		Deprecated. Default tone detection settings
audio_consume_settings	AudioConsumeSettings		Deprecated. Default audio consume settings
logging_settings	LoggingSettings		Default logging settings
phrase_list_settings	PhraseListSettings		Deprecated. Optional settings specifying boost options for phrases
tts_settings	TtsSettings		Deprecated. Optional settings for Text-To-Speech (TTS)

Fields with deprecated option

Name	Option
interaction_settings	true
grammar_settings	true
recognition_settings	true
normalization_settings	true
vad_settings	true
cpa_settings	true
amd_settings	true
audio_consume_settings	true
phrase_list_settings	true
tts_settings	true

GlobalGetSettingsRequest.GetSettingsType

Name	Number	Description
GET_SETTINGS_TYPE_UNSPECIFIED	0
GET_SETTINGS_TYPE_SESSION	1	SessionSettings type
GET_SETTINGS_TYPE_INTERACTION	2	InteractionSettings type
GET_SETTINGS_TYPE_GRAMMAR	3	GrammarSettings type
GET_SETTINGS_TYPE_RECOGNITION	4	RecognitionSettings type
GET_SETTINGS_TYPE_NORMALIZATION	5	NormalizationSettings type
GET_SETTINGS_TYPE_VAD	6	VadSettings type
GET_SETTINGS_TYPE_CPA	7	CpaSettings type
GET_SETTINGS_TYPE_AMD	8	AmdSettings type
GET_SETTINGS_TYPE_AUDIO_CONSUME	9	AudioConsumeSettings type
GET_SETTINGS_TYPE_LOGGING_SETTINGS	10	LoggingSettings type
GET_SETTINGS_TYPE_PHRASE_LIST	11	PhraseList type

lumenvox/api/interaction.proto

InteractionBeginProcessingRequest

Field	Type	Label	Description
interaction_id	string		The interaction object being referenced

InteractionCancelRequest

Field	Type	Label	Description
interaction_id	string		The interaction object being referenced

InteractionCancelResponse

Field	Type	Label	Description
interaction_id	string		The interaction object being referenced
close_status	google.rpc.Status		Status of request

InteractionCloseRequest

Field	Type	Label	Description
interaction_id	string		The interaction object being referenced

InteractionCloseResponse

Field	Type	Label	Description
interaction_id	string		The interaction object being referenced
close_status	google.rpc.Status		Status of request

InteractionCreateAmdRequest

Field	Type	Label	Description
amd_settings	AmdSettings		Parameters for this interaction
audio_consume_settings	AudioConsumeSettings		Optional settings specifying audio to process for interaction
vad_settings	VadSettings		Optional settings related to voice activity detection
general_interaction_settings	GeneralInteractionSettings		Optional settings related to all interactions

InteractionCreateAmdResponse

Field	Type	Label	Description
interaction_id	string		Interaction ID (uuid) that can be used during subsequent AMD processing

InteractionCreateAsrRequest

Field	Type	Label	Description
language	string		The language selector for the specified grammars, e.g. "en-US", "de-DE", or dialect-independent "en", "de", etc.
grammars	Grammar	repeated	List of grammars to use, one for each root grammar to activate
grammar_settings	GrammarSettings		Optional grammar settings to apply to this interaction
recognition_settings	RecognitionSettings		Optional recognition settings for this interaction
vad_settings	VadSettings		Optional settings related to voice activity detection
audio_consume_settings	AudioConsumeSettings		Optional settings specifying audio to process for interaction
general_interaction_settings	GeneralInteractionSettings		Optional settings related to all interactions

InteractionCreateAsrResponse

Field	Type	Label	Description
interaction_id	string		Interaction ID (uuid) that can be used during subsequent ASR processing

InteractionCreateCpaRequest

Field	Type	Label	Description
cpa_settings	CpaSettings		Parameters for this interaction
audio_consume_settings	AudioConsumeSettings		Optional settings specifying audio to process for interaction
vad_settings	VadSettings		Optional settings related to voice activity detection
general_interaction_settings	GeneralInteractionSettings		Optional settings related to all interactions

InteractionCreateCpaResponse

Field	Type	Label	Description
interaction_id	string		Interaction ID (uuid) that can be used during subsequent CPA processing

InteractionCreateGrammarParseRequest

Field	Type	Label	Description
language	string		The language selector for the specified grammars, e.g. "en-US", "de-DE", or dialect-independent "en", "de", etc.
grammars	Grammar	repeated	List of grammars to use, one for each root grammar to activate
grammar_settings	GrammarSettings		Optional grammar settings to apply to this interaction
input_text	string		Input text to be parsed against specified grammar[s]
parse_timeout_ms	OptionalInt32		Maximum milliseconds to allow for a grammar parse. If this is exceeded, a timeout error will be raised. Range: 0-10000000 (~166 minutes). Default: 10000 (10 seconds)
general_interaction_settings	GeneralInteractionSettings		Optional settings related to all interactions
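
The documented range and default for parse_timeout_ms can be applied client-side before a request is built. A minimal sketch: resolve_parse_timeout_ms is a hypothetical helper, and rejecting out-of-range values (rather than clamping them) is an assumption about how a client might choose to behave, not a description of the service.

```python
from typing import Optional

PARSE_TIMEOUT_DEFAULT_MS = 10_000      # documented default (10 seconds)
PARSE_TIMEOUT_MAX_MS = 10_000_000      # documented upper bound (~166 minutes)


def resolve_parse_timeout_ms(value: Optional[int] = None) -> int:
    """Return the timeout to use, falling back to the documented default."""
    if value is None:
        return PARSE_TIMEOUT_DEFAULT_MS
    if not 0 <= value <= PARSE_TIMEOUT_MAX_MS:
        raise ValueError(
            f"parse_timeout_ms must be within 0-{PARSE_TIMEOUT_MAX_MS}, got {value}"
        )
    return value
```

The upper bound corresponds to 10,000,000 ms / 60,000 ms per minute, which is roughly 166.7 minutes.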

InteractionCreateGrammarParseResponse

Field	Type	Label	Description
interaction_id	string		The interaction object being referenced by the request

InteractionCreateNormalizeTextRequest

Field	Type	Label	Description
language	string		Language to use for normalization (e.g. en-us)
transcript	string		All words in single string.
normalization_settings	NormalizationSettings		Optional settings specifying whether text normalization step should be performed on output of this interaction.
general_interaction_settings	GeneralInteractionSettings		Optional settings related to all interactions

InteractionCreateNormalizeTextResponse

Field	Type	Label	Description
interaction_id	string		Interaction ID (UUID) that can be used during subsequent Normalize Text processing

InteractionCreateTranscriptionRequest

Field	Type	Label	Description
language	string		Transcription language selector for this request, e.g. "en-US", "de-DE", or dialect-independent "en", "de", etc.
phrases	TranscriptionPhraseList	repeated	Optional phrase lists for interaction
continuous_utterance_transcription	OptionalBool		If `true`, transcription will perform continuous recognition (continuing to wait for and process audio even if the user pauses speaking) until the client closes the input stream (gRPC API). This may return multiple FinalResult callback messages. If `false`, the recognizer will detect a single spoken utterance. When it detects that the user has paused or stopped speaking, it will return a FinalResult callback and cease recognition. It will return no more than one FinalResult. Default: false
recognition_settings	RecognitionSettings		Optional recognition settings for this interaction
vad_settings	VadSettings		Optional settings related to voice activity detection
audio_consume_settings	AudioConsumeSettings		Optional settings specifying audio to process for interaction
normalization_settings	NormalizationSettings		Optional settings specifying whether text normalization step should be performed on output of this interaction.
phrase_list_settings	PhraseListSettings		Optional settings specifying boost options for phrases
general_interaction_settings	GeneralInteractionSettings		Optional settings related to all interactions
embedded_grammars	Grammar	repeated	Optional list of grammars to use during transcription. When a grammar matches during transcription, the semantic results of the grammar will also be returned
embedded_grammar_settings	GrammarSettings		Optional grammar settings for embedded grammars
language_model_name	OptionalString		Optional name of a language model (decoder) to use when processing transcription. Default is to not specify this, allowing the engine to use the default language decoder
acoustic_model_name	OptionalString		Optional name of an acoustic model (encoder) to use when processing transcription. Default is to not specify this, allowing the engine to use the default language encoder
enable_postprocessing	OptionalString		Optional custom postprocessing to enhance decoder functionality. Default is to not specify this, allowing engine to use default postprocessing

InteractionCreateTranscriptionResponse

Field	Type	Label	Description
interaction_id	string		Interaction ID (uuid) that can be used during subsequent ASR processing

InteractionCreateTtsRequest

Field	Type	Label	Description
language	string		Synthesis language for this request (e.g.: "en-US", "de-DE", etc.)
ssml_request	InteractionCreateTtsRequest.SsmlUrlRequest		SSML type request and parameters
inline_request	InteractionCreateTtsRequest.InlineTtsRequest		Inline TTS definition (text and optional parameters)
audio_format	AudioFormat		Audio format to be generated by TTS synthesis. Note: this is not configurable at Session or Global level, since it is explicitly required for each interaction request.
synthesis_timeout_ms	OptionalInt32		Optional timeout to limit the maximum time allowed for a synthesis. Default: 5000 milliseconds
general_interaction_settings	GeneralInteractionSettings		Optional settings related to all interactions

InteractionCreateTtsRequest.InlineTtsRequest

Inline TTS definition (text and optional parameters)

Field	Type	Label	Description
text	string		Text to synthesize; can be simple text or SSML
tts_inline_synthesis_settings	TtsInlineSynthesisSettings		Optional settings for voice synthesis.
ssl_verify_peer	OptionalBool		Enables or disables the verification of a peer's certificate using a local certificate authority file upon HTTPS requests. Set to false (disabled) to skip verification for trusted sites. Default: true

InteractionCreateTtsRequest.SsmlUrlRequest

Field	Type	Label	Description
ssml_url	string		URL from which to fetch synthesis request SSML
ssl_verify_peer	OptionalBool		Enables or disables the verification of a peer's certificate using a local certificate authority file upon HTTPS requests. Set to false (disabled) to skip verification for trusted sites. Default: true

InteractionCreateTtsResponse

Field	Type	Label	Description
interaction_id	string		Interaction ID (uuid) that can be used during subsequent TTS processing

InteractionFinalizeProcessingRequest

Field	Type	Label	Description
interaction_id	string		The interaction object being referenced

InteractionRequestMessage

Field	Type	Label	Description
interaction_create_amd	InteractionCreateAmdRequest		Create AMD interaction request
interaction_create_asr	InteractionCreateAsrRequest		Create ASR interaction request
interaction_create_cpa	InteractionCreateCpaRequest		Create CPA interaction request
interaction_create_transcription	InteractionCreateTranscriptionRequest		Create transcription interaction request
interaction_create_tts	InteractionCreateTtsRequest		Create TTS interaction request
interaction_create_grammar_parse	InteractionCreateGrammarParseRequest		Create a grammar parse request
interaction_begin_processing	InteractionBeginProcessingRequest		Interaction begin processing
interaction_finalize_processing	InteractionFinalizeProcessingRequest		Interaction finalize processing
interaction_request_results	InteractionRequestResultsRequest		Interaction request results
interaction_create_normalize_text	InteractionCreateNormalizeTextRequest		Create a normalize text request
interaction_cancel	InteractionCancelRequest		Interaction cancel
interaction_close	InteractionCloseRequest		Explicit request to close interaction

InteractionRequestResultsRequest

Field	Type	Label	Description
interaction_id	string		The interaction object being referenced

InteractionRequestResultsResponse

Field	Type	Label	Description
interaction_id	string		The interaction object being referenced
interaction_results	Result		Requested results

TranscriptionPhraseList

Field	Type	Label	Description
phrases	string	repeated	Optional list of strings containing word and phrase "hints" so that the transcriber is more likely to recognize them. This can be used to improve the accuracy for specific words and phrases, for example, if specific commands are typically spoken by the user. This can also be used to add additional words or phrases to the transcriber's vocabulary.
global_phrase_list	PhraseList		Optional reference to previously defined global phrase list(s)
session_phrase_list	PhraseList		Optional reference to previously defined session phrase list(s)

lumenvox/api/lumenvox.proto

LumenVox

LumenVox Service

The LumenVox API can be used to access various speech resources, such as Automatic Speech Recognition (ASR), Text-To-Speech (TTS), Transcription, Call-Progress-Analysis (CPA), etc.

Method Name	Request Type	Response Type	Description
Session	SessionRequest stream	SessionResponse stream	Creates a new session and establishes a bidirectional stream, able to process all messages on this single bidirectional connection
Global	GlobalRequest stream	GlobalResponse stream	Manages globally defined (deployment-level) objects

lumenvox/api/optional_values.proto

OptionalBool

Wrapper message for optional `bool`.

The JSON representation for `OptionalBool` is JSON `true` and `false`.

Field	Type	Label	Description
value	bool		The bool value.

OptionalBytes

Wrapper message for optional `bytes`.

The JSON representation for `OptionalBytes` is JSON string.

Field	Type	Label	Description
value	bytes		The bytes value.

OptionalDouble

Wrapper message for optional `double`.

The JSON representation for `OptionalDouble` is JSON number.

Field	Type	Label	Description
value	double		The double value.

OptionalFloat

Wrapper message for optional `float`.

The JSON representation for `OptionalFloat` is JSON number.

Field	Type	Label	Description
value	float		The float value.

OptionalInt32

Wrapper message for optional `int32`.

The JSON representation for `OptionalInt32` is JSON number.

Field	Type	Label	Description
value	int32		The int32 value.

OptionalInt64

Wrapper message for optional `int64`.

The JSON representation for `OptionalInt64` is JSON string.

Field	Type	Label	Description
value	int64		The int64 value.

OptionalString

Wrapper message for optional `string`.

The JSON representation for `OptionalString` is JSON string.

Field	Type	Label	Description
value	string		The string value.

OptionalUInt32

Wrapper message for optional `uint32`.

The JSON representation for `OptionalUInt32` is JSON number.

Field	Type	Label	Description
value	uint32		The uint32 value.

OptionalUInt64

Wrapper message for optional `uint64`.

The JSON representation for `OptionalUInt64` is JSON string.

Field	Type	Label	Description
value	uint64		The uint64 value.
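
A wrapper message lets a request distinguish "not set" from an explicitly supplied value, which matters for fields whose default is not the type's zero value, such as ssl_verify_peer (Default: true). The sketch below illustrates that pattern with a local stand-in dataclass; it is not the generated OptionalBool class, and resolve_bool is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OptionalBool:
    """Stand-in for the OptionalBool wrapper (single field: value)."""
    value: bool = False


def resolve_bool(wrapper: Optional[OptionalBool], default: bool) -> bool:
    """Use the documented default only when the wrapper is absent."""
    return default if wrapper is None else wrapper.value


# ssl_verify_peer defaults to true, so an explicit false must stay false.
unset = resolve_bool(None, default=True)
explicit_false = resolve_bool(OptionalBool(False), default=True)
```

With a bare bool field, an explicit false could not be told apart from an omitted field; the wrapper keeps the two cases separate.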

lumenvox/api/results.proto

AmdInteractionResult

Result returned from an AMD interaction.

Field	Type	Label	Description
amd_result	AsrGrammarResult		AMD result in the form of an ASR-type message.

AsrGrammarResult

Structure to hold data provided from ASR as final results

Field	Type	Label	Description
asr_result_meta_data	AsrResultMetaData		Raw ASR output used to produce semantic interpretations
semantic_interpretations	SemanticInterpretation	repeated	List of all possible semantic interpretations for given transcript.

AsrInteractionResult

Result returned from an ASR interaction.

Field	Type	Label	Description
n_bests	AsrGrammarResult	repeated	List of the N best possible matches provided via ASR.
input_mode	string		The modality of the input, for example, speech, dtmf, etc.
language	string		Language defined when creating the interaction.

AsrResultMetaData

Raw transcript of words decoded by ASR

Field	Type	Label	Description
words	Word	repeated	All words in Phrase so far.
transcript	string		All words in single string.
start_time_ms	int32		Time in milliseconds since beginning of audio stream where recognition starts.
duration_ms	int32		Length of transcript in milliseconds.
confidence	uint32		Overall confidence of the entire transcript.

CpaInteractionResult

Result returned from a CPA interaction.

Field	Type	Label	Description
cpa_result	AsrGrammarResult		CPA result in the form of an ASR-type message.

FinalResult

Callback sent when a final interaction result is ready.

Field	Type	Label	Description
interaction_id	string		The interaction object being referenced
final_result	Result		Final result for the specified interaction. Null if status error > 0
final_result_status	FinalResultStatus		Final status of the interaction
status	google.rpc.Status		Status code produced. Returns 0 on success. This value comes from the internal result message and is passed through to the caller.

GrammarParseInteractionResult

Result returned from grammar parse interaction.

Field	Type	Label	Description
input_text	string		Input string used during grammar parse
semantic_interpretations	SemanticInterpretation	repeated	List of all possible semantic interpretations for given text.
input_mode	string		The modality of the input, for example, speech, dtmf, etc.
language	string		Language defined when creating the interaction.
has_next_transition	bool		Set to true if further input following the input text would still be valid for the interaction grammars.

InverseTextNormalizationToken

Token used in Inverse Text Normalization

Field	Type	Label	Description
tag	string		Type of token.
data	google.protobuf.Struct		All data in token

NormalizationSegment

One segment (one or more words) that is part of a result phrase.

Field	Type	Label	Description
original_segment	string		Input word used to create segment.
original_word_indices	uint32	repeated	Index to words in original input.
vocalization	string		Output after Inverse Text normalization.
token	InverseTextNormalizationToken		Token information used in Inverse Text normalization.
redaction	RedactionData		Data added for redaction.
final	string		Final output for segment.

NormalizeTextResult

Result returned from a Normalize Text interaction.

Field	Type	Label	Description
transcript	string		Input string used for the text normalization request
normalized_result	NormalizedResult		Normalized result message

NormalizedResult

Result returned from Normalize Text. Used in either a Transcription interaction or a Text Normalization interaction.

Field	Type	Label	Description
segments	NormalizationSegment	repeated	All segments in result.
verbalized	string		Output after Inverse Text normalization.
verbalized_redacted	string		Output after Inverse Text normalization and redacted.
final	string		Final output after Inverse Text normalization and punctuation and capitalization_normalization
final_redacted	string		Final output after Inverse Text normalization, punctuation and capitalization_normalization, and redaction
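
Which of the four strings is populated depends on which normalization steps were enabled. The following is a minimal Python sketch for choosing one, assuming steps that did not run leave their field as an empty string (the proto3 default for `string`). The helper `pick_output` and its fallback order are hypothetical.

```python
def pick_output(normalized_result: dict, redacted: bool = False) -> str:
    """Return the most processed non-empty string available.

    When redacted output is requested, this never falls back to an
    unredacted field; it returns "" instead.
    """
    if redacted:
        order = ("final_redacted", "verbalized_redacted")
    else:
        order = ("final", "verbalized")
    for field in order:
        text = normalized_result.get(field, "")
        if text:
            return text
    return ""


# Hypothetical sample shaped like NormalizedResult.
sample = {"verbalized": "i have 22 apples", "final": "I have 22 apples."}
print(pick_output(sample))
```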

PartialResult

Callback sent when a partial interaction result is available.

Field	Type	Label	Description
interaction_id	string		The interaction object being referenced
partial_result	Result		Partial result for the specified interaction

RedactionData

More detail on Redacted tokens

Field	Type	Label	Description
personal_identifiable_information	bool		Redacted Personal Identifiable Information.
entity	string		Type of redaction
score	float		Redaction Score

Result

Contains results of various types that may be returned

Field	Type	Label	Description
asr_interaction_result	AsrInteractionResult		Results for an ASR interaction
transcription_interaction_result	TranscriptionInteractionResult		Results for a transcription interaction
grammar_parse_interaction_result	GrammarParseInteractionResult		Results for a grammar parse interaction
tts_interaction_result	TtsInteractionResult		Results for a TTS interaction
normalize_text_result	NormalizeTextResult		Result for a Normalize Text interaction
amd_interaction_result	AmdInteractionResult		Result for an AMD interaction
cpa_interaction_result	CpaInteractionResult		Result for a CPA interaction

SemanticInterpretation

Semantic Interpretation of an ASR result

Field	Type	Label	Description
interpretation	google.protobuf.Struct		Structure containing Semantic Interpretation.
interpretation_json	string		JSON string containing Semantic Interpretation.
grammar_label	string		The label of the grammar used to generate this Semantic Interpretation.
confidence	uint32		Value 0 to 1000 of how confident the ASR is that result is correct match
tag_format	string		Tag format of the grammar used to generate this Semantic Interpretation.
input_text	string		Raw input text for the interpretation
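
The following is a minimal Python sketch of consuming a `SemanticInterpretation`, assuming the message has been converted to a plain dictionary. It parses `interpretation_json` and rescales the 0 to 1000 confidence to a fraction. The helper name and the sample values are hypothetical.

```python
import json


def summarize_interpretation(si: dict) -> dict:
    """Parse interpretation_json and rescale confidence from 0-1000 to 0.0-1.0."""
    return {
        "grammar_label": si["grammar_label"],
        "confidence": si["confidence"] / 1000,
        "interpretation": json.loads(si["interpretation_json"]),
    }


# Hypothetical sample shaped like SemanticInterpretation.
sample = {
    "interpretation_json": '{"answer": "yes"}',
    "grammar_label": "yes_no",
    "confidence": 850,
    "tag_format": "semantics/1.0.2006",
    "input_text": "yeah",
}
print(summarize_interpretation(sample))
```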

SynthesisOffset

Description of some artifact within the synthesis

Field	Type	Label	Description
name	string		Name of the artifact being referenced
offset_ms	uint32		Offset in milliseconds to the named artifact

SynthesisWarning

Warning generated by a synthesis

Field	Type	Label	Description
message	string		String containing warning message returned from synthesizer
line	OptionalInt32		Optional line indicating where the issue was detected

TranscriptionInteractionResult

Result returned from a transcription interaction.

Field	Type	Label	Description
n_bests	TranscriptionResult	repeated	List of the N best possible matches provided via ASR.
language	string		Language defined when creating the interaction.

TranscriptionResult

Structure to hold data provided from ASR as final results

Field	Type	Label	Description
asr_result_meta_data	AsrResultMetaData		Raw ASR output which includes the transcript of the audio.
normalized_result	NormalizedResult		If results are to be normalized, Normalized Result is added here.
grammar_results	AsrGrammarResult	repeated	If enhanced transcription with grammars is used results are added here.
srt_file	bytes		If SRT generation is enabled, the SRT file is added here.
vtt_file	bytes		If VTT generation is enabled, the VTT file is added here.
blended_score	OptionalFloat		Optional blended quality transcription score

TtsInteractionResult

Contains a TTS interaction result.

Field	Type	Label	Description
audio_format	AudioFormat		Format of returned audio.
audio_length_ms	uint32		Length of generated audio data in milliseconds.
sentence_offsets_ms	uint32	repeated	Offsets in milliseconds to where in audio buffer each synthesized sentence begins.
word_offsets_ms	uint32	repeated	Offsets in milliseconds to where in audio buffer each synthesized word begins.
ssml_mark_offsets	SynthesisOffset	repeated	Offsets to where in audio buffer each synthesized SSML mark begins.
voice_offsets	SynthesisOffset	repeated	Offsets to where in audio buffer each synthesized voice begins.
synthesis_warnings	SynthesisWarning	repeated	List of any Synthesis warnings.
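
The offsets mark where each unit begins, so the end of one sentence has to be inferred from the start of the next. The following is a minimal Python sketch, assuming the offsets are in ascending order and that `audio_length_ms` marks the end of the last sentence. The helper `sentence_spans` and the sample values are hypothetical.

```python
from typing import List, Tuple


def sentence_spans(tts_result: dict) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) for each synthesized sentence."""
    starts = list(tts_result["sentence_offsets_ms"])
    # Each sentence ends where the next begins; the last ends with the audio.
    ends = starts[1:] + [tts_result["audio_length_ms"]]
    return list(zip(starts, ends))


# Hypothetical sample shaped like TtsInteractionResult.
sample = {"audio_length_ms": 5000, "sentence_offsets_ms": [0, 1800, 3200]}
print(sentence_spans(sample))
```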

Word

One word that is part of an ASR result.

Field	Type	Label	Description
start_time_ms	int32		Time in milliseconds since beginning of audio where word starts.
duration_ms	int32		Length of word in milliseconds.
word	string		String output of word.
confidence	uint32		Value 0 to 1000 on how confident the result is.

FinalResultStatus

List of Interaction FinalResult Statuses

Name	Number	Description
FINAL_RESULT_STATUS_UNSPECIFIED	0	No final status specified
FINAL_RESULT_STATUS_NO_INPUT	1	No voice audio detected within the audio. The final_result field in FinalResult will be empty
FINAL_RESULT_STATUS_ERROR	2	An error occurred that stopped processing
FINAL_RESULT_STATUS_CANCELLED	3	Interaction cancelled or closed before results can be returned
FINAL_RESULT_STATUS_TRANSCRIPTION_MATCH	11	A transcription result was returned
FINAL_RESULT_STATUS_TRANSCRIPTION_CONTINUOUS_MATCH	12	A transcription “intermediate” final result was returned
FINAL_RESULT_STATUS_TRANSCRIPTION_GRAMMAR_MATCHES	13	A transcription result was returned, which contains one or more embedded grammar matches
FINAL_RESULT_STATUS_TRANSCRIPTION_PARTIAL_MATCH	14	An enhanced transcription result was returned, but no SISR
FINAL_RESULT_STATUS_GRAMMAR_MATCH	21	A complete grammar match was returned
FINAL_RESULT_STATUS_GRAMMAR_NO_MATCH	22	No result could be obtained for the audio with the supplied grammars
FINAL_RESULT_STATUS_GRAMMAR_PARTIAL_MATCH	23	Raw text is returned, but could not be parsed with the supplied grammars
FINAL_RESULT_STATUS_AMD_TONE	31	An AMD interaction found one or more tones within the audio
FINAL_RESULT_STATUS_AMD_NO_TONES	32	An AMD interaction found no tones within the audio
FINAL_RESULT_STATUS_CPA_RESULT	41	A CPA interaction result was returned
FINAL_RESULT_STATUS_CPA_SILENCE	42	No voice audio was detected for a CPA interaction
FINAL_RESULT_STATUS_TTS_READY	51	TTS audio is available to pull
FINAL_RESULT_STATUS_TEXT_NORMALIZE_RESULT	61	An inverse text normalization result was returned for a NormalizeText interaction.
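
The following is a minimal Python sketch that mirrors the table above as an `IntEnum`, for code that receives the numeric status without generated stubs. The shortened member names (common prefix dropped), the `WITHOUT_RESULT` set and the `expects_result` helper are illustrative assumptions. Treating ERROR and CANCELLED as carrying no result is a reading of the descriptions above, not a documented guarantee.

```python
from enum import IntEnum


class FinalResultStatus(IntEnum):
    """Numbers copied from the table above; the common prefix is dropped."""
    UNSPECIFIED = 0
    NO_INPUT = 1
    ERROR = 2
    CANCELLED = 3
    TRANSCRIPTION_MATCH = 11
    TRANSCRIPTION_CONTINUOUS_MATCH = 12
    TRANSCRIPTION_GRAMMAR_MATCHES = 13
    TRANSCRIPTION_PARTIAL_MATCH = 14
    GRAMMAR_MATCH = 21
    GRAMMAR_NO_MATCH = 22
    GRAMMAR_PARTIAL_MATCH = 23
    AMD_TONE = 31
    AMD_NO_TONES = 32
    CPA_RESULT = 41
    CPA_SILENCE = 42
    TTS_READY = 51
    TEXT_NORMALIZE_RESULT = 61


# Assumption: these statuses arrive without a usable final_result.
WITHOUT_RESULT = {
    FinalResultStatus.UNSPECIFIED,
    FinalResultStatus.NO_INPUT,
    FinalResultStatus.ERROR,
    FinalResultStatus.CANCELLED,
}


def expects_result(status_number: int) -> bool:
    """True when a final_result payload is expected for this status.

    Raises ValueError for numbers that are not in the table.
    """
    return FinalResultStatus(status_number) not in WITHOUT_RESULT


print(FinalResultStatus(21).name, expects_result(21))
```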

lumenvox/api/session.proto

SessionAttachRequest

Field	Type	Label	Description
deployment_id	string		Deployment identifier associated to the session
session_id	string		Valid session identifier to attach to the request
operator_id	string		UUID related to the operator (entity or person making request)

SessionCancelRequest

Currently no fields defined

SessionCancelResponse

Field	Type	Label	Description
close_status	google.rpc.Status		Status of request

SessionCloseRequest

Currently no fields defined

SessionCloseResponse

Field	Type	Label	Description
close_status	google.rpc.Status		Status of request

SessionCreateRequest

Field	Type	Label	Description
deployment_id	string		Deployment identifier to associate the session with
session_id	OptionalString		Optional unique reference for session (must be UUID). A UUID value will be auto generated if not supplied by client
operator_id	string		UUID related to the operator (entity or person making request)
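
Supplying `session_id` from the client side lets the caller know the identifier before the first response arrives. The following is a minimal Python sketch, assuming a plain-dictionary representation in which the `OptionalString` wrapper is written as `{"value": ...}`. The helper `build_session_create` is hypothetical, and real clients would populate the generated message classes instead.

```python
import uuid
from typing import Optional


def build_session_create(deployment_id: str, operator_id: str,
                         session_id: Optional[str] = None) -> dict:
    """Build a SessionCreateRequest-shaped dict with a validated UUID session_id."""
    if session_id is None:
        session_id = str(uuid.uuid4())
    else:
        # uuid.UUID raises ValueError when the string is not a UUID.
        session_id = str(uuid.UUID(session_id))
    return {
        "deployment_id": deployment_id,
        "session_id": {"value": session_id},
        "operator_id": operator_id,
    }


# Placeholder identifiers, for illustration only.
request = build_session_create("my-deployment-id", "my-operator-id")
print(request["session_id"]["value"])
```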

SessionGetSettingsRequest

Currently no fields defined

SessionInboundAudioFormatRequest

Field	Type	Label	Description
audio_format	AudioFormat		Parameters for the inbound audio resource associated with the session

SessionLoadGrammarRequest

Field	Type	Label	Description
language	string		The language selector for the specified grammar, e.g. "en-US", "de-DE", or dialect-independent "en", "de", etc.
grammar_label	string		Reference label for session grammar. Note: label must consist of letters, digits, hyphens, underscores only
grammar_url	string		A grammar URL to be loaded
inline_grammar_text	string		A string containing the raw grammar text
grammar_settings	GrammarSettings		Optional grammar settings applied to this request
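
The label restriction can be checked on the client before a load request is sent. The following is a minimal Python sketch, assuming "letters" means ASCII letters and that an empty label is invalid; neither is stated above. The helper name is hypothetical.

```python
import re

# Letters, digits, hyphens and underscores only (ASCII assumed).
_LABEL_RE = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_grammar_label(label: str) -> bool:
    """True when the whole label uses only the permitted characters."""
    return _LABEL_RE.fullmatch(label) is not None


print(is_valid_grammar_label("yes_no-v2"))
print(is_valid_grammar_label("yes no"))
```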

SessionLoadGrammarResponse

Field	Type	Label	Description
status	google.rpc.Status		The status of the grammar load
mode	GrammarMode		The mode of the loaded grammar
label	string		The label for the loaded grammar

SessionLoadPhraseList

Field	Type	Label	Description
phrases	string	repeated	A list of strings containing word and phrase "hints" so that the transcriber is more likely to recognize them. This can be used to improve the accuracy for specific words and phrases, for example, if specific commands are typically spoken by the user. This can also be used to add additional words or phrases to the transcriber's vocabulary.
phase_list_label	string		A label that can be used to reference this list within a transcription request
language	string		The language selector describing which ASR resource will process request e.g.: "en-US", "de-DE" or dialect independent "en", "de", etc. Note that phrase lists are inherently language-independent, so this field is only used to direct which language-dependent resource will process the phrase load request

SessionLoadPhraseListResponse

Field	Type	Label	Description
status	google.rpc.Status		The status of the phrase list load.
label	string		The label for the phrase list.

SessionRequest

Field	Type	Label	Description
correlation_id	OptionalString		Optional unique reference per request message. A UUID value will be auto generated if not supplied by client
session_request	SessionRequestMessage		For session-specific requests
audio_request	AudioRequestMessage		For audio-specific requests
interaction_request	InteractionRequestMessage		For interaction-specific requests
dtmf_request	DtmfPushRequest		For DTMF events (part of ASR interaction)

SessionRequestMessage

Field	Type	Label	Description
session_create	SessionCreateRequest		Creates a new session and returns its ID and session related messages through response streamed callback messages
session_audio_format	SessionInboundAudioFormatRequest		Defines the inbound audio format for the session. Must be assigned before any audio is sent and cannot later be changed.
session_attach	SessionAttachRequest		Attach to an existing session
session_close	SessionCloseRequest		Explicit request to close session
session_set_settings	SessionSettings		Set settings to be configured for session.
session_get_settings	SessionGetSettingsRequest		Get settings for session.
session_load_grammar	SessionLoadGrammarRequest		Load session-specific grammar
session_load_phrase_list	SessionLoadPhraseList		Load session-specific phrase list
session_cancel	SessionCancelRequest		Explicit request to cancel all session related interactions and processing in progress

SessionResponse

Field	Type	Label	Description
session_id	OptionalString		Session identifier (will be returned from initial call)
correlation_id	OptionalString		Optional reference to corresponding request correlation_id
vad_event	VadEvent		VAD event notification
final_result	FinalResult		Final result notification
partial_result	PartialResult		Partial result notification
session_event	SessionEvent		Session event notification (typically errors)
session_close	SessionCloseResponse		Response for explicit session close request
audio_pull	AudioPullResponse		Response to audio pull request
session_get_settings	SessionSettings		Response to get settings for session.
interaction_create_amd	InteractionCreateAmdResponse		Response to create AMD interaction request
interaction_create_asr	InteractionCreateAsrResponse		Response to create ASR interaction request
interaction_create_cpa	InteractionCreateCpaResponse		Response to create CPA interaction request
interaction_create_tts	InteractionCreateTtsResponse		Response to create TTS interaction request
interaction_create_grammar_parse	InteractionCreateGrammarParseResponse		Response to create a grammar parse request
interaction_create_normalize_text	InteractionCreateNormalizeTextResponse		Response to create a normalize text request
interaction_get_settings	InteractionSettings		Response to interaction get settings request
interaction_request_results	InteractionRequestResultsResponse		Response to interaction request results
interaction_create_transcription	InteractionCreateTranscriptionResponse		Response to create Transcription interaction request
session_phrase_list	SessionLoadPhraseListResponse		Response for session load phrase list request
session_grammar	SessionLoadGrammarResponse		Response for session load grammar request
interaction_cancel	InteractionCancelResponse		Response to interaction cancel
interaction_close	InteractionCloseResponse		Response to explicit request to close interaction
session_cancel	SessionCancelResponse		Response for explicit session cancel request

lumenvox/api/settings.proto

AmdSettings

Settings related to answering machine / tone detection and other tones such as FAX or SIT tones

Field	Type	Label	Description
amd_enable	OptionalBool		Enable answering machine beep detection Default: true
amd_input_text	OptionalString		Which string is returned in response to an AMD beep detection Default: AMD
fax_enable	OptionalBool		Enable fax tone detection Default: true
fax_input_text	OptionalString		Which string is returned in response to a fax tone detection Default: FAX
sit_enable	OptionalBool		Enable SIT detection Default: true
sit_reorder_local_input_text	OptionalString		Which string is returned in response to specified SIT detection Default: "SIT REORDER LOCAL"
sit_vacant_code_input_text	OptionalString		Which string is returned in response to specified SIT detection Default: "SIT VACANT CODE"
sit_no_circuit_local_input_text	OptionalString		Which string is returned in response to specified SIT detection Default: "SIT NO CIRCUIT LOCAL"
sit_intercept_input_text	OptionalString		Which string is returned in response to specified SIT detection Default: "SIT INTERCEPT"
sit_reorder_distant_input_text	OptionalString		Which string is returned in response to specified SIT detection Default: "SIT REORDER DISTANT"
sit_no_circuit_distant_input_text	OptionalString		Which string is returned in response to specified SIT detection Default: "SIT NO CIRCUIT DISTANT"
sit_other_input_text	OptionalString		Which string is returned in response to specified SIT detection Default: "SIT OTHER"
busy_enable	OptionalBool		Enable busy tone detection Default: true
busy_input_text	OptionalString		Which string is returned in response to a busy tone detection Default: BUSY
tone_detect_timeout_ms	OptionalInt32		Maximum number of milliseconds the tone detection algorithm should listen for input before timing out.

AudioConsumeSettings

Field	Type	Label	Description
audio_channel	OptionalInt32		For multi-channel audio, this is the channel number being referenced. Range is from 0 to N. Default channel 0 will be used if not specified
audio_consume_mode	AudioConsumeSettings.AudioConsumeMode		Select which audio mode is used Default: AUDIO_CONSUME_MODE_STREAMING
stream_start_location	AudioConsumeSettings.StreamStartLocation		Specify where audio consume starts when "streaming" mode is used Default: STREAM_START_LOCATION_STREAM_BEGIN
start_offset_ms	OptionalInt32		Optional offset in milliseconds to adjust the audio start point. Range: Value in milliseconds, positive or negative. Default: 0
audio_consume_max_ms	OptionalInt32		Optional maximum audio to process. Value of 0 means process all audio sent Range: Positive value in milliseconds Default: 0

CpaSettings

Settings related to Call Progress Analysis

Field	Type	Label	Description
human_residence_time_ms	OptionalInt32		Maximum amount of speech for human residence classification Default: 1800
human_business_time_ms	OptionalInt32		Maximum amount of speech for human business classification. Human speech lasting longer than this will be classified as unknown speech Default: 3000
unknown_silence_timeout_ms	OptionalInt32		Maximum amount of silence to allow before human speech is detected. If this timeout is reached, the classification will be returned as unknown silence. Default: 5000
max_time_from_connect_ms	OptionalInt32		Maximum amount of time the CPA algorithm is allowed to perform human or machine classification. Only use this if you understand the implications (lower accuracy). Default: 0 (disabled)
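
One way to read the two speech thresholds is as successive upper bounds on the length of the initial greeting. The following Python sketch illustrates that reading only. It does not reproduce the actual CPA algorithm, the returned labels are hypothetical, and the inclusive boundaries are an assumption. Silence handling (`unknown_silence_timeout_ms`) is not modelled.

```python
def classify_speech(speech_ms: int,
                    human_residence_time_ms: int = 1800,
                    human_business_time_ms: int = 3000) -> str:
    """Map a speech duration to a label using the two CPA thresholds."""
    if speech_ms <= human_residence_time_ms:
        return "human_residence"
    if speech_ms <= human_business_time_ms:
        return "human_business"
    # Longer than human_business_time_ms: unknown speech, per the table above.
    return "unknown_speech"


for duration_ms in (900, 2500, 7000):
    print(duration_ms, classify_speech(duration_ms))
```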

GeneralInteractionSettings

Settings that apply to all interaction types

Field	Type	Label	Description
secure_context	OptionalBool		When true (enabled), certain ASR and TTS data will not be logged. This provides additional security for sensitive data such as account numbers and passwords that may be used within applications. Anywhere that potentially sensitive data would have been recorded will be replaced with _SUPPRESSED in the logs. Default: false
custom_interaction_data	OptionalString		Optional data (i.e. could be string, JSON, delimited lists, etc.) set by user, for external purposes. Not used by LumenVox
logging_tag	OptionalString	repeated	Optional tag for logging. Reserved for future use.

GrammarSettings

Settings related to SRGS grammar usage

Field	Type	Label	Description
default_tag_format	GrammarSettings.TagFormat		The default tag-format for loaded grammars if not otherwise specified. Default: TAG_FORMAT_SEMANTICS_1_2006
ssl_verify_peer	OptionalBool		Enables or disables the verification of a peer's certificate using a local certificate authority file upon HTTPS requests. Set to false (disabled) to skip verification for trusted sites. Default: true
load_grammar_timeout_ms	OptionalInt32		Maximum milliseconds to allow for grammar loading. If this is exceeded, a timeout error will be raised. Range 1000-2147483647 (~600 hours) Default: 200000 (~3.333 minutes)
compatibility_mode	OptionalInt32		Compatibility mode for certain media server operations. Only change from the default if you understand the consequences. Range: 0-1 Default: 0

InteractionSettings

Describes the interaction specific settings

Field	Type	Label	Description
general_interaction_settings	GeneralInteractionSettings		Optional settings related to all interactions
audio_consume_settings	AudioConsumeSettings		Optional settings defining how audio is consumed/used by the interaction
vad_settings	VadSettings		Optional Voice Activity Detection settings for interaction
grammar_settings	GrammarSettings		Optional grammar settings for interaction
recognition_settings	RecognitionSettings		Optional recognition settings for interaction
cpa_settings	CpaSettings		Optional Call Progress Analysis settings for interaction
amd_settings	AmdSettings		Optional Tone Detection (AMD) settings for interaction
normalization_settings	NormalizationSettings		Optional settings specifying which text normalization steps should be performed on output of interaction.
phrase_list_settings	PhraseListSettings		Optional settings specifying boost options for phrases
tts_settings	TtsSettings		Optional settings for Text-To-Speech (TTS)

LoggingSettings

Field	Type	Label	Description
logging_verbosity	LoggingSettings.LoggingVerbosity		Logging verbosity setting Default: LOGGING_VERBOSITY_INFO

NormalizationSettings

Settings related to text Normalization results

Field	Type	Label	Description
enable_inverse_text	OptionalBool		Set to true to enable inverse text normalization (going from spoken form → written form, e.g. twenty two → 22) Default: false
enable_punctuation_capitalization	OptionalBool		Set to true to enable punctuation and capitalization normalization Default: false
enable_redaction	OptionalBool		Set to true to enable redaction of sensitive information Default: false
request_timeout_ms	OptionalInt32		Number of milliseconds text normalization should await results before timing out Possible values: 0 - 1000000 Default: 5000
enable_srt_generation	OptionalBool		Set to true to enable generation of SRT file (SubRip file format) Default: false
enable_vtt_generation	OptionalBool		Set to true to enable generation of VTT file (WebVTT file format) Default: false

PhraseListSettings

Field	Type	Label	Description
probability_boost	OptionalInt32		Probability score boost raises or lowers the probability the words or phrases are recognized. A negative value lowers the probability the word is returned in results. Range: -10.0 to 5.0 (very probable) Default: 0

RecognitionSettings

Settings related to recognition results

Field	Type	Label	Description
max_alternatives	OptionalInt32		Maximum number of recognition hypotheses to be returned. Specifically, the maximum number of `NBest` messages within each `AsrInteractionResult`. Default: 1
trim_silence_value	OptionalInt32		Controls how aggressively the ASR trims leading silence from input audio. Range: 0 (very aggressive) to 1000 (no silence trimmed) Default: 970
enable_partial_results	OptionalBool		When true, partial results callbacks will be enabled for the interaction Default: false
confidence_threshold	OptionalInt32		Confidence threshold. Range 0 to 1000; applies to grammar based asr interactions Default: 0
decode_timeout	OptionalInt32		Number of milliseconds the ASR should await results before timing out Possible values: 0 - 100,000,000 Default: 10,000,000 (~2.7 hours)

ResetSettings

No additional fields needed

SessionSettings

Optional settings to be used for the duration of a session for all interactions created within the session. These can be overridden at the interaction level.

All settings are optional. Not specifying a setting at any level means the default or parent context's value will be used. As a rule, only settings that need to be changed from the default should be set explicitly.

Field	Type	Label	Description
archive_session	OptionalBool		Whether the session data should be archived when closed, for tuning and other diagnostic purposes Default: false
custom_session_data	OptionalString		Optional data (i.e. could be string, JSON, delimited lists, etc.) set by user, for external purposes. Not used by LumenVox
interaction_settings	InteractionSettings		Optional settings to be used for duration of session for all interactions created. These can be overridden at the interaction level
call_id	OptionalString		Optional call identifier string used for CDR tracking. This is often associated with the telephony call-id value or equivalent.
channel_id	OptionalString	repeated	Optional channel identifier string used for CDR tracking. This is often associated with a telephony/MRCP channel SDP value or equivalent.
archive_session_delay_seconds	OptionalInt32		Optional delay interval for archiving, in seconds. Session data will persist longer in Redis before being written to the database
logging_tag	OptionalString	repeated	Optional tag for logging. Reserved for future use.
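
The layering described for `SessionSettings` (interaction value, then session value, then default) can be sketched in Python as follows. This is an illustration under stated assumptions, not the server's implementation: settings are held in flat dictionaries, `None` stands for "not set", and the helper `effective_setting` is hypothetical. The defaults are copied from the `RecognitionSettings` table.

```python
from typing import Any, Mapping

# Defaults taken from the RecognitionSettings table.
DEFAULTS = {
    "max_alternatives": 1,
    "trim_silence_value": 970,
    "enable_partial_results": False,
    "confidence_threshold": 0,
}


def effective_setting(name: str,
                      session: Mapping[str, Any],
                      interaction: Mapping[str, Any]) -> Any:
    """Resolve a setting: interaction level first, then session, then default."""
    for scope in (interaction, session):
        value = scope.get(name)
        if value is not None:   # explicit 0 or False still counts as "set"
            return value
    return DEFAULTS[name]


session = {"max_alternatives": 3, "trim_silence_value": 0}
interaction = {"max_alternatives": 5}
print(effective_setting("max_alternatives", session, interaction))
print(effective_setting("trim_silence_value", session, interaction))
print(effective_setting("confidence_threshold", session, interaction))
```

Testing for `None` rather than truthiness is what the `Optional*` wrapper types make possible: an explicit `0` or `false` is distinguishable from a field that was never set.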

TtsInlineSynthesisSettings

Field	Type	Label	Description
voice	OptionalString		Optional voice (if using simple text, or if not specified within SSML)
synth_emphasis_level	OptionalString		The strength of the emphasis used in the voice during synthesis. Possible Values: "strong", "moderate", "none" or "reduced".
synth_prosody_pitch	OptionalString		The pitch of the audio being synthesized. Possible Values: A number followed by "Hz", a relative change, or one of the following values: "x-low", "low", "medium", "high", "x-high", or "default". See the SSML standard for details.
synth_prosody_contour	OptionalString		The contour of the audio being synthesized. Possible Values: Please refer to the SSML standard on pitch contour for details.
synth_prosody_rate	OptionalString		The speaking rate of the audio being synthesized. Possible Values: A relative change or "x-slow", "slow", "medium", "fast", "x-fast", or "default". See the SSML standard for details.
synth_prosody_duration	OptionalString		The duration of time it will take for the synthesized text to play. Possible Values: A time, such as "250ms" or "3s".
synth_prosody_volume	OptionalString		The volume of the audio being synthesized. Possible Values: A number, a relative change or one of: "silent", "x-soft", "soft", "medium", "loud", "x-loud", or "default". See the SSML specification for details.
synth_voice_age	OptionalString		The age of the voice used for synthesis. Possible Values: A non-negative integer.
synth_voice_gender	OptionalString		The default TTS gender to use if none is specified. Possible Values: Either neutral (which uses the default), male, or female.

TtsSettings

Field	Type	Label	Description
voice_mappings	TtsSettings.VoiceMappingsEntry	repeated	Voice mappings allow alternative voice names to map to LumenVox voices. The key is the language of the voice mappings.

TtsSettings.VoiceMappingsEntry

Field	Type	Label	Description
key	string
value	VoiceMapping

VadSettings

Settings related to Voice Activity Detection (VAD)

VAD is used to begin audio processing once a person starts speaking and is used to detect when a person has stopped speaking

Field	Type	Label	Description
use_vad	OptionalBool		When `false`, all audio as specified in AudioConsumeSettings is used for processing. In streaming audio mode, InteractionFinalizeProcessing() would need to be called to finish processing. When `true`, VAD is used to determine when the speaker starts and stops speaking. When using VAD in batch audio mode, the engine will look for speech begin within the designated audio to process and will stop processing audio when end of speech is found, which may mean that not all of the loaded audio is processed.
barge_in_timeout_ms	OptionalInt32		Maximum silence, in ms, allowed while waiting for user input (barge-in) before a timeout is reported. Range: -1 (infinite) to positive integer number of milliseconds Default: -1 (infinite)
end_of_speech_timeout_ms	OptionalInt32		After barge-in, STREAM_STATUS_END_SPEECH_TIMEOUT will occur if end-of-speech not detected in time specified by this property. This is different from the eos_delay_ms; This value represents the total amount of time a caller is permitted to speak after barge-in is detected. Range: a positive number of milliseconds or -1 (infinite) Default: -1 (infinite)
noise_reduction_mode	VadSettings.NoiseReductionMode		Determines noise reduction mode. Default: NOISE_REDUCTION_MODE_DEFAULT
bargein_threshold	OptionalInt32		A higher value makes the VAD more sensitive towards speech, and less sensitive towards non-speech, which means that the VAD algorithm must be more sure that the audio is speech before triggering barge in. Raising the value will reject more false positives/noises. However, it may mean that some speech that is on the borderline may be rejected. This value should not be changed from the default without significant tuning and verification. Range: Integer value from 0 to 100 Default: 50
eos_delay_ms	OptionalInt32		Milliseconds of silence after speech before processing begins. Range: A positive integer number of milliseconds Default: 800
snr_sensitivity	OptionalInt32		Determines how much louder the speaker must be than the background noise in order to trigger barge-in. The smaller this value, the easier it will be to trigger barge-in. Range: Integer range from 0 to 100 Default: 50
stream_init_delay	OptionalInt32		Accurate VAD depends on a good estimation of the acoustic environment. The VAD module uses the first couple frames of audio to estimate the acoustic environment, such as noise level. The length of this period is defined by this parameter. Range: A positive integer number of milliseconds. Default: 100
volume_sensitivity	OptionalInt32		The volume required to trigger barge-in. The smaller the value, the more sensitive barge-in will be. This is primarily used to deal with poor echo cancellation. By setting this value higher (less sensitive) prompts that are not properly cancelled will be less likely to falsely cancel barge-in. Range: Integer range from 0 to 100 Default: 50
wind_back_ms	OptionalInt32		The length of audio to be wound back at the beginning of voice activity. This is used primarily to counter instances where barge-in does not accurately capture the very start of speech. The resolution of this parameter is 1/8 of a second. Range: A positive integer number of milliseconds Default: 480

VoiceMapping

Field	Type	Label	Description
voicePairs	VoiceMapping.VoicePairsEntry	repeated	A map of custom voice pairs. The key is the voice that will be requested by the API user. The value is the LumenVox voice that will be used for synthesis. Use the key "default" to set a default voice for the given language.

VoiceMapping.VoicePairsEntry

Field	Type	Label	Description
key	string
value	string
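
The two nested maps (language to `VoiceMapping`, then requested voice to LumenVox voice) can be resolved as in the following Python sketch. It assumes plain dictionaries, and it assumes that a request with no mapping and no "default" entry passes through unchanged, which is not stated above. The voice names are placeholders.

```python
def resolve_voice(voice_mappings: dict, language: str, requested: str) -> str:
    """Map a requested voice name to the voice used for synthesis."""
    pairs = voice_mappings.get(language, {})
    if requested in pairs:
        return pairs[requested]
    # Fall back to the per-language "default" entry, else pass through.
    return pairs.get("default", requested)


# Placeholder voice names, keyed by language.
voice_mappings = {
    "en-US": {"Anna": "voice_a", "default": "voice_b"},
}
print(resolve_voice(voice_mappings, "en-US", "Anna"))
print(resolve_voice(voice_mappings, "en-US", "Unknown"))
```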

AudioConsumeSettings.AudioConsumeMode

Name	Number	Description
AUDIO_CONSUME_MODE_UNSPECIFIED	0	No mode specified
AUDIO_CONSUME_MODE_STREAMING	1	Specify streaming mode is used
AUDIO_CONSUME_MODE_BATCH	2	Specify batch mode is used

AudioConsumeSettings.StreamStartLocation

Only used when AUDIO_CONSUME_MODE_STREAMING is used

Name	Number	Description
STREAM_START_LOCATION_UNSPECIFIED	0	No location specified
STREAM_START_LOCATION_STREAM_BEGIN	1	Start processing from the beginning of the stream. Note: Only valid option for AUDIO_CONSUME_MODE_BATCH
STREAM_START_LOCATION_BEGIN_PROCESSING_CALL	2	Start processing from the audio streamed after the API call InteractionBeginProcessing() was made. Note: Not valid for AUDIO_CONSUME_MODE_BATCH
STREAM_START_LOCATION_INTERACTION_CREATED	3	Start processing from the audio streamed after the interaction was created. Note: Not valid for AUDIO_CONSUME_MODE_BATCH
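
The notes above restrict which start locations may be combined with batch mode. The following is a minimal client-side check in Python, assuming enum values are handled as their string names and that an unspecified start location falls back to the documented default. The helper `check_consume_settings` is hypothetical.

```python
from typing import Optional

BATCH = "AUDIO_CONSUME_MODE_BATCH"
ALLOWED_IN_BATCH = {
    "STREAM_START_LOCATION_UNSPECIFIED",   # assumption: default applies
    "STREAM_START_LOCATION_STREAM_BEGIN",
}


def check_consume_settings(mode: str, start_location: str) -> Optional[str]:
    """Return an error message, or None when the combination is allowed."""
    if mode == BATCH and start_location not in ALLOWED_IN_BATCH:
        return f"{start_location} is not valid for {BATCH}"
    return None


print(check_consume_settings(BATCH, "STREAM_START_LOCATION_INTERACTION_CREATED"))
print(check_consume_settings("AUDIO_CONSUME_MODE_STREAMING",
                             "STREAM_START_LOCATION_INTERACTION_CREATED"))
```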

GrammarSettings.TagFormat

Name	Number	Description
TAG_FORMAT_UNSPECIFIED	0
TAG_FORMAT_LUMENVOX_1	1	lumenvox/1.0 tag format
TAG_FORMAT_SEMANTICS_1	2	semantics/1.0 tag format
TAG_FORMAT_SEMANTICS_1_LITERALS	3	semantics/1.0-literals tag format
TAG_FORMAT_SEMANTICS_1_2006	4	semantics/1.0.2006 tag format
TAG_FORMAT_SEMANTICS_1_2006_LITERALS	5	semantics/1.0.2006-literals tag format

LoggingSettings.LoggingVerbosity

Name	Number	Description
LOGGING_VERBOSITY_UNSPECIFIED	0	Logging verbosity is not specified
LOGGING_VERBOSITY_DEBUG	1	Internal system events that are not usually observable
LOGGING_VERBOSITY_INFO	2	Routine logging, such as ongoing status or performance
LOGGING_VERBOSITY_WARNING	3	Warnings and above only - service degradation or danger
LOGGING_VERBOSITY_ERROR	4	Functionality is unavailable, invariants are broken, or data is lost
LOGGING_VERBOSITY_CRITICAL	5	Only log exceptions and critical errors (not recommended)
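
The phrase "Warnings and above only" suggests that the verbosity acts as a severity threshold. The sketch below illustrates that reading; it assumes the enum numbers order the levels by severity, and the function name `should_log` is hypothetical rather than part of the API.

```python
from enum import IntEnum


class LoggingVerbosity(IntEnum):
    UNSPECIFIED = 0
    DEBUG = 1
    INFO = 2
    WARNING = 3
    ERROR = 4
    CRITICAL = 5


def should_log(event_level: LoggingVerbosity,
               configured: LoggingVerbosity) -> bool:
    """Return True when an event is at or above the configured verbosity.

    Assumption: enum numbers increase with severity. UNSPECIFIED is
    rejected because its effective level is not documented here.
    """
    if LoggingVerbosity.UNSPECIFIED in (event_level, configured):
        raise ValueError("a specific verbosity level is required")
    return event_level >= configured


if __name__ == "__main__":
    print(should_log(LoggingVerbosity.ERROR, LoggingVerbosity.WARNING))  # True
    print(should_log(LoggingVerbosity.INFO, LoggingVerbosity.WARNING))   # False
```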

VadSettings.NoiseReductionMode

Name	Number	Description
NOISE_REDUCTION_MODE_UNSPECIFIED	0	No change to setting
NOISE_REDUCTION_MODE_DISABLED	1	Noise reduction disabled
NOISE_REDUCTION_MODE_DEFAULT	2	Default (recommended) noise reduction algorithm is enabled.
NOISE_REDUCTION_MODE_ALTERNATE	3	Alternate noise reduction algorithm. Similar to default, but results may vary with the type and level of noise.
NOISE_REDUCTION_MODE_ADAPTIVE	4	Uses an adaptive noise reduction algorithm that is best suited to varying levels of background noise, such as changing car noise.

Scalar Value Types

.proto Type	Notes	C++	Java	Python	Go	C#	PHP	Ruby
double		double	double	float	float64	double	float	Float
float		float	float	float	float32	float	float	Float
int32	Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead.	int32	int	int	int32	int	integer	Bignum or Fixnum (as required)
int64	Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead.	int64	long	int/long	int64	long	integer/string	Bignum
uint32	Uses variable-length encoding.	uint32	int	int/long	uint32	uint	integer	Bignum or Fixnum (as required)
uint64	Uses variable-length encoding.	uint64	long	int/long	uint64	ulong	integer/string	Bignum or Fixnum (as required)
sint32	Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s.	int32	int	int	int32	int	integer	Bignum or Fixnum (as required)
sint64	Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s.	int64	long	int/long	int64	long	integer/string	Bignum
fixed32	Always four bytes. More efficient than uint32 if values are often greater than 2^28.	uint32	int	int	uint32	uint	integer	Bignum or Fixnum (as required)
fixed64	Always eight bytes. More efficient than uint64 if values are often greater than 2^56.	uint64	long	int/long	uint64	ulong	integer/string	Bignum
sfixed32	Always four bytes.	int32	int	int	int32	int	integer	Bignum or Fixnum (as required)
sfixed64	Always eight bytes.	int64	long	int/long	int64	long	integer/string	Bignum
bool		bool	boolean	bool	bool	bool	boolean	TrueClass/FalseClass
string	A string must always contain UTF-8 encoded or 7-bit ASCII text.	string	String	str/unicode	string	string	string	String (UTF-8)
bytes	May contain any arbitrary sequence of bytes.	string	ByteString	str	[]byte	ByteString	string	String (ASCII-8BIT)
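
The efficiency notes in the table follow from how variable-length (varint) encoding works: each byte carries 7 bits of payload, negative int32 values are sign-extended to 64 bits, and sint32 applies ZigZag mapping first. The sketch below is a standalone illustration of the encoded sizes, not the protobuf library's own code; the function names are chosen for this example.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("varint input must be non-negative")
    out = bytearray()
    while True:
        byte = value & 0x7F          # low 7 bits of payload
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_int32(value: int) -> bytes:
    """int32 field value: negatives are sign-extended to 64 bits."""
    return encode_varint(value & 0xFFFFFFFFFFFFFFFF)


def encode_sint32(value: int) -> bytes:
    """sint32 field value: ZigZag-map to unsigned, then varint-encode."""
    zigzag = ((value << 1) ^ (value >> 31)) & 0xFFFFFFFF
    return encode_varint(zigzag)


if __name__ == "__main__":
    print(len(encode_int32(-1)))          # 10 bytes as int32
    print(len(encode_sint32(-1)))         # 1 byte as sint32
    print(len(encode_varint(2**28 - 1)))  # 4 bytes, same as fixed32
    print(len(encode_varint(2**28)))      # 5 bytes, larger than fixed32
```

The last two lines show why fixed32 becomes the more compact choice once values reach 2^28: the varint form then needs a fifth byte.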
