Protocol Documentation

speech.proto

AudioFormat

Field	Type	Label	Description
standard_audio_format	AudioFormat.StandardAudioFormat		Standard audio format

AudioPullRequest

Field	Type	Label	Description
session_id	string		References an existing session_id (uuid)
audio_id	string		ID of the audio requested (note that this can be the session_id, to request the inbound audio resource)
audio_start	int64		Number of milliseconds from the beginning of the audio at which to start returning data (default is from the beginning)
audio_length	int64		Maximum number of milliseconds to return. A zero value (the default) returns all available audio from the requested start point
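
Since a zero audio_length returns everything from the start point, a client that wants bounded responses can request fixed windows instead. The following is a minimal sketch of computing the (audio_start, audio_length) pairs. The helper name pull_windows and its parameters total_ms and window_ms are hypothetical, and it assumes the client already knows the total duration of the audio.

```python
from typing import Iterator, Tuple


def pull_windows(total_ms: int, window_ms: int) -> Iterator[Tuple[int, int]]:
    """Yield (audio_start, audio_length) pairs that cover total_ms of audio.

    Hypothetical helper: the values are meant to fill the audio_start and
    audio_length fields of successive AudioPullRequest messages.
    """
    if window_ms <= 0:
        raise ValueError("window_ms must be positive")
    start = 0
    while start < total_ms:
        # The last window may be shorter than window_ms. The length is never
        # zero here, because zero would mean "all available audio".
        yield start, min(window_ms, total_ms - start)
        start += window_ms
```

For example, 2500 ms of audio pulled in 1000 ms windows gives three requests, the last one covering the remaining 500 ms.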

AudioPullResponse

Field	Type	Label	Description
audio_data	bytes		Binary audio data that was requested

AudioPushRequest

Field	Type	Label	Description
session_id	string		References an existing session_id (uuid) to which the audio will be sent
audio_data	bytes		Binary audio data to be added to the audio resource
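
Each AudioPushRequest carries one block of audio, so a long recording is usually split before sending. The sketch below only shows the splitting step. The helper name split_audio and the 64 KiB default block size are assumptions chosen for illustration; the right size depends on the message size limits configured for the gRPC channel.

```python
from typing import Iterator


def split_audio(audio: bytes, block_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield consecutive blocks of at most block_size bytes.

    Hypothetical helper: each yielded block would become the audio_data
    field of one AudioPushRequest.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    for offset in range(0, len(audio), block_size):
        yield audio[offset:offset + block_size]
```

Concatenating the yielded blocks in order reproduces the original audio, so the audio resource receives the same bytes as a single large push would have delivered.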

AudioPushResponse

Currently no fields returned

AudioStreamRequest

Field	Type	Label	Description
session_id	string		References an existing session_id (uuid). Set it in the first AudioStreamRequest message of the stream
audio_data	bytes		Streamed binary audio data to be added to the audio resource
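
The session_id note above can be sketched as a request generator. The AudioStreamRequest class below is a stand-in dataclass, not the generated gRPC class, and leaving session_id empty after the first message is an assumption drawn from the field description.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class AudioStreamRequest:
    """Stand-in for the generated message class (assumption, not the real stub)."""
    session_id: str = ""
    audio_data: bytes = b""


def stream_requests(session_id: str,
                    blocks: Iterable[bytes]) -> Iterator[AudioStreamRequest]:
    """Yield one request per audio block, naming the session only in the first."""
    first = True
    for block in blocks:
        yield AudioStreamRequest(
            session_id=session_id if first else "",
            audio_data=block,
        )
        first = False
```

With generated stubs, an iterator like this is what a client-streaming call would typically consume.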

AudioStreamResponse

Currently no fields returned

FinalResultsReady

Callback sent when final interaction results are ready.

Subsequent call(s) to InteractionRequestResults() can be used to obtain the results object.

This callback signals that all processing related to this interaction is finished.

Field	Type	Label	Description
session_id	string		The session object being referenced
interaction_id	string		The interaction object being referenced

InteractionBeginProcessingRequest

Field	Type	Label	Description
session_id	string		The session object being referenced
interaction_id	string		The interaction object being referenced

InteractionBeginProcessingResponse

Currently no fields returned

InteractionCancelRequest

Field	Type	Label	Description
session_id	string		The session object being referenced
interaction_id	string		The interaction object being referenced

InteractionCancelResponse

Currently no fields returned

InteractionCloseRequest

Field	Type	Label	Description
session_id	string		The session object being referenced
interaction_id	string		The interaction object being referenced

InteractionCloseResponse

Currently no fields returned

InteractionCreateASRRequest

Field	Type	Label	Description
session_id	string		The session object being referenced
interaction_ids	string	repeated	List of grammar load interaction IDs, one for each root grammar to activate

InteractionCreateASRResponse

Field	Type	Label	Description
interaction_id	string		Interaction ID (uuid) that can be used during subsequent ASR processing

InteractionCreateGrammarLoadRequest

Field	Type	Label	Description
session_id	string		The session object being referenced
language	string		The language selector for the specified grammar (e.g.: "en-US", "de-DE" or dialect independent "en", "de", etc.)
grammar_url	string		A grammar URL to be loaded
inline_grammar_text	string		A string containing the raw grammar text

InteractionCreateGrammarLoadResponse

Field	Type	Label	Description
interaction_id	string		Interaction ID (uuid) that can be used to reference the grammar during subsequent API calls

InteractionCreateGrammarParseRequest

Field	Type	Label	Description
session_id	string		The session object being referenced
grammar_ids	string	repeated	List of grammar load interaction IDs, one for each root grammar to activate
input_text	string		Input text to be parsed against the grammar(s)

InteractionCreateGrammarParseResponse

Field	Type	Label	Description
interaction_id	string		The interaction object being referenced by the request

InteractionCreateTTSRequest

Field	Type	Label	Description
session_id	string		The session object being referenced
language	string		Synthesis language for this request (e.g.: "en-US", "de-DE", etc.)
ssml_url	string		URL from which to fetch the synthesis request SSML
inline_request	InteractionCreateTTSRequest.InlineTTSRequest		Inline TTS definition (text and optional parameters)
audio_format	AudioFormat		Audio format to be generated by TTS Synthesis

InteractionCreateTTSRequest.InlineTTSRequest

Inline TTS definition (text and optional parameters)

Field	Type	Label	Description
text	string		Text to synthesize; can be simple text or SSML
voice	string		Optional TTS voice (if using simple text, or if not specified within SSML)

InteractionCreateTTSResponse

Field	Type	Label	Description
interaction_id	string		Interaction ID (uuid) that can be used during subsequent TTS processing

InteractionCreateTextNormalizationRequest

Field	Type	Label	Description
session_id	string		The session object being referenced
language	string		The language selector for this request (e.g.: "en-US", "de-DE" or dialect independent "en", "de", etc.)
input_text	string		Input text to be normalized.
enable_inverse_text_normalization	bool		Set to true to enable inverse text normalization, going from spoken form → written form (e.g. twenty two → 22)
enable_punctuation_capitalization_normalization	bool		Set to true to enable punctuation and capitalization normalization
enable_redaction_normalization	bool		Set to true to enable redaction of sensitive information

InteractionCreateTextNormalizationResponse

Field	Type	Label	Description
interaction_id	string		The interaction object being referenced by the request

InteractionFinalizeProcessingRequest

Field	Type	Label	Description
session_id	string		The session object being referenced
interaction_id	string		The interaction object being referenced

InteractionFinalizeProcessingResponse

Currently no fields returned

InteractionGetSettingsRequest

Field	Type	Label	Description
session_id	string		The session object being referenced
interaction_id	string		The interaction object to get the current settings from

InteractionGetSettingsResponse

Field	Type	Label	Description
json_settings	string		A JSON encoded string containing the requested settings

InteractionRequestResultsRequest

Field	Type	Label	Description
session_id	string		The session object being referenced
interaction_id	string		The interaction object being referenced

InteractionRequestResultsResponse

Field	Type	Label	Description
result_ready	bool		The result status
results_json	string		The JSON object containing the result being requested or empty if result_ready is false
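
Where the FinalResultsReady callback is not being consumed, a client could instead poll until result_ready is true. The following is a minimal sketch of that loop. ResultsResponse is a stand-in dataclass with the two documented fields, and fetch is a hypothetical callable that would wrap the real InteractionRequestResults call; the attempt count and delay are arbitrary.

```python
import json
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ResultsResponse:
    """Stand-in carrying the two documented response fields."""
    result_ready: bool
    results_json: str


def wait_for_results(fetch: Callable[[], ResultsResponse],
                     attempts: int = 10,
                     delay_s: float = 0.1) -> Optional[dict]:
    """Poll fetch() until a result is ready; return None if it never is."""
    for _ in range(attempts):
        response = fetch()
        if response.result_ready:
            # results_json is only meaningful once result_ready is true.
            return json.loads(response.results_json)
        time.sleep(delay_s)
    return None
```

Checking result_ready before parsing avoids calling json.loads on the empty string that accompanies a not-ready response.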

InteractionSetSettingsRequest

Field	Type	Label	Description
session_id	string		The session object being referenced
interaction_id	string		The interaction object to set the settings for
json_settings	string		JSON formatted settings to be configured.

InteractionSetSettingsResponse

Currently no fields returned

IntermediateResultsReady

Callback sent when intermediate interaction results are available.

Call(s) to InteractionRequestResults() can be used to obtain the results object.

Field	Type	Label	Description
session_id	string		The session object being referenced
interaction_id	string		The interaction object being referenced

SessionCloseRequest

Field	Type	Label	Description
session_id	string		Reference to session to close

SessionCloseResponse

Currently no fields returned

SessionCreateRequest

Field	Type	Label	Description
audio_format	AudioFormat		Audio parameters for the audio resource object associated with the new session being created. These audio parameters include various attributes such as encoding format, sample rate, etc. Use STANDARD_AUDIO_FORMAT_NO_AUDIO_RESOURCE if no audio resource needs to be created

SessionCreateResponse

Field	Type	Label	Description
session_id	string		Session ID of newly created session (will be returned from initial call)
vad_event	VadEvent		VAD event notification
final_result	FinalResultsReady		Final results ready notification
partial_result	IntermediateResultsReady		Intermediate results ready notification
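
Because SessionCreate returns a stream, each received SessionCreateResponse has to be inspected to see which field it carries. The sketch below uses a stand-in dataclass in which unset notification fields are None; with real generated classes, presence of message-typed fields would normally be checked with HasField. The function name classify is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionCreateResponse:
    """Stand-in for the generated message; unset notifications are None."""
    session_id: str = ""
    vad_event: Optional[object] = None
    final_result: Optional[object] = None
    partial_result: Optional[object] = None


def classify(response: SessionCreateResponse) -> str:
    """Return the name of the field a streamed response carries."""
    if response.session_id:
        return "session_id"
    if response.vad_event is not None:
        return "vad_event"
    if response.final_result is not None:
        return "final_result"
    if response.partial_result is not None:
        return "partial_result"
    return "empty"
```

A client loop would call classify on every message read from the stream and route it to the matching handler.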

SessionGetSettingsRequest

Field	Type	Label	Description
session_id	string		Which session object to get the settings from

SessionGetSettingsResponse

Field	Type	Label	Description
json_settings	string		A JSON encoded string containing the requested settings

SessionSetSettingsRequest

Field	Type	Label	Description
session_id	string		Which session to set the settings for
json_settings	string		JSON formatted settings to be configured.

SessionSetSettingsResponse

Currently no fields returned

VadEvent

Message used to signal events over the course of Voice Activity Detection processing.

The audio_offset will signify at what point within the session audio resource the event occurred.

Field	Type	Label	Description
session_id	string		The session object being referenced
interaction_id	string		The interaction object being referenced
vad_event_type	VadEvent.VadEventType		The type of event this message represents
audio_offset	int32		The offset in milliseconds from the beginning of the audio resource at which this event occurred

AudioFormat.StandardAudioFormat

Specification for the audio format

Not all standard formats are supported in all cases. Different interactions can natively handle a subset of the total audio formats.

Name	Number	Description
STANDARD_AUDIO_FORMAT_UNSPECIFIED	0	Undefined audio
STANDARD_AUDIO_FORMAT_ULAW_8KHZ	1	ULAW 8000 Hz, 1 byte per sample
STANDARD_AUDIO_FORMAT_ALAW_8KHZ	2	ALAW 8000 Hz, 1 byte per sample
STANDARD_AUDIO_FORMAT_PCM_8KHZ	3	PCM 8000 Hz, 2 bytes per sample
STANDARD_AUDIO_FORMAT_PCM_16KHZ	10	PCM 16000 Hz, 2 bytes per sample
STANDARD_AUDIO_FORMAT_PCM_22KHZ	20	PCM 22050 Hz, 2 bytes per sample
STANDARD_AUDIO_FORMAT_NO_AUDIO_RESOURCE	100	Used to indicate that no audio resource should be allocated
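
The sample rates and sample widths in the table make it possible to convert between audio durations in milliseconds and byte counts. The following sketch does that conversion. It assumes single-channel audio, which the table does not state, and the names FORMATS and bytes_for_duration are hypothetical.

```python
# (sample rate in Hz, bytes per sample), copied from the table above.
FORMATS = {
    "STANDARD_AUDIO_FORMAT_ULAW_8KHZ": (8000, 1),
    "STANDARD_AUDIO_FORMAT_ALAW_8KHZ": (8000, 1),
    "STANDARD_AUDIO_FORMAT_PCM_8KHZ": (8000, 2),
    "STANDARD_AUDIO_FORMAT_PCM_16KHZ": (16000, 2),
    "STANDARD_AUDIO_FORMAT_PCM_22KHZ": (22050, 2),
}


def bytes_for_duration(format_name: str, duration_ms: int) -> int:
    """Return the byte count of duration_ms of audio (mono assumed)."""
    sample_rate_hz, bytes_per_sample = FORMATS[format_name]
    # Integer arithmetic; any fractional trailing byte is dropped.
    return sample_rate_hz * bytes_per_sample * duration_ms // 1000
```

For example, 20 ms of 16 kHz PCM is 640 bytes, and one second of 8 kHz ULAW is 8000 bytes.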

VadEvent.VadEventType

Name	Number	Description
VAD_EVENT_TYPE_UNSPECIFIED	0	Undefined VAD event type
VAD_EVENT_TYPE_BEGIN_PROCESSING	1	VAD begins processing audio
VAD_EVENT_TYPE_BARGE_IN	2	Barge-in occurred; audio that will be processed by the ASR starts here. This notification might be useful to stop prompt playback, for example
VAD_EVENT_TYPE_END_OF_SPEECH	3	End-of-speech occurred, no further audio will be processed by VAD for the specified interaction. If the setting InteractionASR_VoiceActivityDetection.AUTO_FINALIZE_ON_EOS is true, the ASR will immediately finish processing audio at this point
VAD_EVENT_TYPE_BARGE_IN_TIMEOUT	4	VAD timed out waiting for audio barge-in (start-of-speech). The audio manager will no longer process audio for this interaction.
VAD_EVENT_TYPE_END_OF_SPEECH_TIMEOUT	5	VAD timed out waiting for audio barge-out (end-of-speech). The audio manager will no longer process audio for this interaction.
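
Three of the event types above state that no further audio will be processed for the interaction, so a client may want to stop sending audio when one of them arrives. The sketch below mirrors the documented numbers in a local enum. The enum is a stand-in for the generated one, and should_stop_sending_audio is a hypothetical helper.

```python
from enum import IntEnum


class VadEventType(IntEnum):
    """Local mirror of VadEvent.VadEventType (numbers from the table above)."""
    UNSPECIFIED = 0
    BEGIN_PROCESSING = 1
    BARGE_IN = 2
    END_OF_SPEECH = 3
    BARGE_IN_TIMEOUT = 4
    END_OF_SPEECH_TIMEOUT = 5


# Events whose description says no further audio is processed.
STOPS_AUDIO = frozenset({
    VadEventType.END_OF_SPEECH,
    VadEventType.BARGE_IN_TIMEOUT,
    VadEventType.END_OF_SPEECH_TIMEOUT,
})


def should_stop_sending_audio(event_type: int) -> bool:
    """Return True if the event means audio is no longer being processed."""
    return VadEventType(event_type) in STOPS_AUDIO
```

BARGE_IN is deliberately not in the set: it marks where speech starts, so audio should keep flowing after it.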

SpeechAPIService

The LumenVox Speech API can be used to access various speech resources, such as Automatic Speech Recognition (ASR), Text-To-Speech (TTS), Transcription, and Call-Progress-Analysis (CPA).

Method Name	Request Type	Response Type	Description
SessionCreate	SessionCreateRequest	SessionCreateResponse stream	SessionCreate Creates a new session and returns its ID and session-related messages through streamed response callback messages. The returned session_id (uuid) can be used when making other requests for the session. Also optionally creates a new audio resource using the specified audio_format parameters. Only one audio resource can be created per session. Audio is added to this audio resource via gRPC, typically either streamed in with AudioStream or sent in blocks using AudioPush. Both methods push audio into the same internal audio resource for processing.
SessionClose	SessionCloseRequest	SessionCloseResponse	SessionClose Closes the specified session. Once closed, a session can no longer be referenced for requests and should be assumed to be no longer valid.
AudioStream	AudioStreamRequest stream	AudioStreamResponse	AudioStream Sends a stream of binary audio data into the specified audio resource. Note that this may be called before an interaction exists, which allows audio to be added before creating interactions that will process the audio.
AudioPush	AudioPushRequest	AudioPushResponse	AudioPush Sends a block of binary audio data into the specified audio resource. Note that this may be called before an interaction exists, which allows audio to be added before creating interactions that will process the audio. Please consider the gRPC maximum message size limits.
AudioPull	AudioPullRequest	AudioPullResponse	AudioPull Returns a block of audio data from an audio resource. A begin point in milliseconds and maximum length can be specified to return a segment of the audio data. By default, all audio data within the resource is returned. Because of gRPC maximum message size limits, API clients may want to retrieve audio in more manageable chunks; always returning the entire audio buffer may not be advisable in all situations.
SessionSetSettings	SessionSetSettingsRequest	SessionSetSettingsResponse	SessionSetSettings Applies configuration changes to specified session settings.
SessionGetSettings	SessionGetSettingsRequest	SessionGetSettingsResponse	SessionGetSettings Returns a JSON encoded string containing the requested session settings.
InteractionCreateASR	InteractionCreateASRRequest	InteractionCreateASRResponse	InteractionCreateASR Creates a new ASR interaction for the specified session. This type of object is required to access ASR functionality. Use the returned interaction_id in subsequent ASR requests.
InteractionCreateTTS	InteractionCreateTTSRequest	InteractionCreateTTSResponse	InteractionCreateTTS Creates a new TTS interaction for the specified session. This type of object is required to access TTS functionality. Use the returned interaction_id in subsequent TTS requests.
InteractionCreateGrammarLoad	InteractionCreateGrammarLoadRequest	InteractionCreateGrammarLoadResponse	InteractionCreateGrammarLoad Requests a grammar be loaded within the specified session/interaction. The returned interaction_id may be referenced in subsequent ASR requests.
InteractionCreateGrammarParse	InteractionCreateGrammarParseRequest	InteractionCreateGrammarParseResponse	InteractionCreateGrammarParse Creates a new grammar parse interaction for the specified session. A grammar parse interaction allows sending text directly, to be parsed by the active grammars. Essentially this is the same as an ASR interaction, but the speech-to-text functionality is skipped. The raw text is passed in directly instead of having the ASR engine supply the text from the audio. The text is parsed with the active grammars in the same way as an ASR interaction. The returned interaction_id may be used to determine the status of this request as well as to access results when processing is completed.
InteractionSetSettings	InteractionSetSettingsRequest	InteractionSetSettingsResponse	InteractionSetSettings Adds or modifies the specified settings to the specified interaction. Settings not mentioned in this call will remain unaffected.
InteractionGetSettings	InteractionGetSettingsRequest	InteractionGetSettingsResponse	InteractionGetSettings Returns a JSON encoded string containing the current settings for the specified interaction.
InteractionBeginProcessing	InteractionBeginProcessingRequest	InteractionBeginProcessingResponse	InteractionBeginProcessing Begins processing the specified interaction. Typically, any interaction settings that are needed should be set before calling InteractionBeginProcessing. Calling this function triggers backend services to begin processing the audio or text input.
InteractionFinalizeProcessing	InteractionFinalizeProcessingRequest	InteractionFinalizeProcessingResponse	InteractionFinalizeProcessing Forces VAD to complete when VAD is in use, including after VAD has detected the beginning of speech. Takes all available audio in the resource and triggers an ASR decode. This call is usually optional when the default auto-decode setting is used. It can also be used when performing DTMF or text interactions. Results for the interaction may be available during subsequent calls to InteractionRequestResults.
InteractionRequestResults	InteractionRequestResultsRequest	InteractionRequestResultsResponse	InteractionRequestResults Returns an interaction's results as a JSON encoded string. Note that an empty JSON object may be returned if no results are currently available.
InteractionCancel	InteractionCancelRequest	InteractionCancelResponse	InteractionCancel Cancels the specified interaction. Any active processing related to the interaction is stopped.
InteractionClose	InteractionCloseRequest	InteractionCloseResponse	InteractionClose Closes the specified interaction.

Scalar Value Types

.proto Type	Notes	C++	Java	Python	Go	C#	PHP	Ruby
double		double	double	float	float64	double	float	Float
float		float	float	float	float32	float	float	Float
int32	Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead.	int32	int	int	int32	int	integer	Bignum or Fixnum (as required)
int64	Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead.	int64	long	int/long	int64	long	integer/string	Bignum
uint32	Uses variable-length encoding.	uint32	int	int/long	uint32	uint	integer	Bignum or Fixnum (as required)
uint64	Uses variable-length encoding.	uint64	long	int/long	uint64	ulong	integer/string	Bignum or Fixnum (as required)
sint32	Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s.	int32	int	int	int32	int	integer	Bignum or Fixnum (as required)
sint64	Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s.	int64	long	int/long	int64	long	integer/string	Bignum
fixed32	Always four bytes. More efficient than uint32 if values are often greater than 2^28.	uint32	int	int	uint32	uint	integer	Bignum or Fixnum (as required)
fixed64	Always eight bytes. More efficient than uint64 if values are often greater than 2^56.	uint64	long	int/long	uint64	ulong	integer/string	Bignum
sfixed32	Always four bytes.	int32	int	int	int32	int	integer	Bignum or Fixnum (as required)
sfixed64	Always eight bytes.	int64	long	int/long	int64	long	integer/string	Bignum
bool		bool	boolean	boolean	bool	bool	boolean	TrueClass/FalseClass
string	A string must always contain UTF-8 encoded or 7-bit ASCII text.	string	String	str/unicode	string	string	string	String (UTF-8)
bytes	May contain any arbitrary sequence of bytes.	string	ByteString	str	[]byte	ByteString	string	String (ASCII-8BIT)
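
The size notes in the table follow from how variable-length (varint) encoding works: each byte carries 7 bits of payload. The sketch below computes encoded sizes only, not the bytes themselves, to show why negative int32 values are expensive, why sint32 avoids that, and where fixed32 starts to win. The function names are hypothetical.

```python
def varint_size(value: int) -> int:
    """Bytes needed to varint-encode a non-negative integer below 2**64."""
    size = 1
    while value >= 0x80:
        value >>= 7  # each encoded byte holds 7 payload bits
        size += 1
    return size


def int32_size(value: int) -> int:
    """Encoded size of an int32 field value."""
    # Negative int32 values are sign-extended to 64 bits before encoding.
    return varint_size(value & 0xFFFFFFFFFFFFFFFF)


def sint32_size(value: int) -> int:
    """Encoded size of a sint32 field value (ZigZag, then varint)."""
    # ZigZag maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ...
    zigzag = ((value << 1) ^ (value >> 31)) & 0xFFFFFFFF
    return varint_size(zigzag)
```

Under these rules -1 takes 10 bytes as an int32 but 1 byte as a sint32, and a uint32 needs 5 bytes once the value reaches 2^28, which is one more than the 4 bytes a fixed32 always uses.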
