ExAzureSpeech.TextToSpeech.Websocket (ex_azure_speech v0.2.2)
Module for handling the websocket connection to the Azure Text to Speech service.
The Text-to-Speech websocket internals work as follows:
- The client opens a WebSocket connection to the Azure Text to Speech service.
- The client sends an ExAzureSpeech.Common.Messages.SpeechConfigMessage carrying the basic speech configuration.
- The client sends an ExAzureSpeech.TextToSpeech.Messages.SynthesisContextMessage describing the synthesis context.
- The client sends an ExAzureSpeech.TextToSpeech.Messages.SynthesisMessage to start the synthesis.
- The client receives audio metadata from the service, which can be processed by the asynchronous callbacks.
- The client receives audio data from the service in a binary format.
- The client receives an ExAzureSpeech.TextToSpeech.Responses.AudioMetadata.session_end message when the synthesis ends.
Summary
Types
callbacks(): Callbacks for handling audio metadata.
expected_responses(): Expected websocket frame responses from the Azure Text-to-Speech Service.
Functions
open_connection/3: Opens a connection to the Azure Text to Speech service.
synthesize/3: Synthesises the given text using the Azure Text to Speech service.
Types
@type callbacks() :: [
  viseme_callback: (ExAzureSpeech.TextToSpeech.Responses.AudioMetadata.viseme() -> any()),
  word_boundary_callback: (ExAzureSpeech.TextToSpeech.Responses.AudioMetadata.word_boundary() -> any()),
  sentence_boundary_callback: (ExAzureSpeech.TextToSpeech.Responses.AudioMetadata.sentence_boundary() -> any()),
  session_end_callback: (ExAzureSpeech.TextToSpeech.Responses.AudioMetadata.session_end() -> any())
]
Callbacks for handling audio metadata.
viseme_callback: Executes every time Viseme metadata is received.
word_boundary_callback: Executes every time Word Boundary metadata is received.
sentence_boundary_callback: Executes every time Sentence Boundary metadata is received.
session_end_callback: Executes every time Session End metadata is received.
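As a minimal sketch, a keyword list matching the callbacks() type could be built as shown here. The shape of each metadata payload is not documented in this section, so every callback simply forwards what it receives to a target process. `CallbacksSketch` and `build/1` are hypothetical names used for illustration and are not part of the library.

```elixir
defmodule CallbacksSketch do
  # Hypothetical helper, not part of ex_azure_speech.
  # Builds a keyword list with the four keys of the callbacks() type.
  # Each callback forwards the metadata it receives to `target`
  # as a tagged tuple, without inspecting the payload.
  def build(target) when is_pid(target) do
    [
      viseme_callback: fn viseme -> send(target, {:viseme, viseme}) end,
      word_boundary_callback: fn boundary -> send(target, {:word_boundary, boundary}) end,
      sentence_boundary_callback: fn boundary -> send(target, {:sentence_boundary, boundary}) end,
      session_end_callback: fn session_end -> send(target, {:session_end, session_end}) end
    ]
  end
end

# Forward all metadata to the current process.
callbacks = CallbacksSketch.build(self())
```

A list built this way has the shape that open_connection/3 accepts as its third argument, according to its spec.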
@type expected_responses() ::
:turn_start | :response | :audio_metadata | :audio | :turn_end
Expected websocket frame responses from the Azure Text-to-Speech Service.
turn_start: The start of a new synthesis turn.
response: Returns information about the stream; nothing useful.
audio_metadata: Returns metadata about the audio, such as boundaries and visemes.
audio: Returns the audio data in binary format.
turn_end: The end of a synthesis turn.
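To illustrate how the five frame types relate to one another, the sketch below folds a list of frames and collects the audio until the turn ends. It assumes each frame is represented as a `{type, payload}` tuple, which is an illustrative choice rather than the library's internal representation; `FrameSketch` and `collect_audio/1` are hypothetical names.

```elixir
defmodule FrameSketch do
  # Hypothetical illustration, not the library's implementation.
  # Expects a list of {type, payload} tuples where type is one of the
  # expected_responses() atoms. Audio chunks are accumulated in order
  # and everything after the first :turn_end is ignored.
  def collect_audio(frames) do
    frames
    |> Enum.reduce_while([], fn
      {:turn_start, _payload}, acc -> {:cont, acc}
      {:response, _payload}, acc -> {:cont, acc}
      {:audio_metadata, _payload}, acc -> {:cont, acc}
      {:audio, chunk}, acc when is_binary(chunk) -> {:cont, [chunk | acc]}
      {:turn_end, _payload}, acc -> {:halt, acc}
    end)
    |> Enum.reverse()
    |> IO.iodata_to_binary()
  end
end
```

In a real session the audio_metadata frames would be handed to the callbacks instead of being skipped; they are dropped here only to keep the sketch short.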
Functions
@spec open_connection(
  ExAzureSpeech.TextToSpeech.SocketConfig.t(),
  ExAzureSpeech.TextToSpeech.SpeechSynthesisConfig.t(),
  callbacks()
) ::
  {:ok, pid()}
  | {:error,
     ExAzureSpeech.Auth.Errors.Unauthorized.t()
     | ExAzureSpeech.Auth.Errors.Failure.t()}
Opens a connection to the Azure Text to Speech service.
@spec synthesize(
  pid(),
  ExAzureSpeech.TextToSpeech.Messages.SynthesisMessage.t(),
  (pid() -> any())
) ::
  {:ok, Enumerable.t()}
  | {:error,
     ExAzureSpeech.Common.Errors.FailedToDispatchCommand.t()
     | ExAzureSpeech.TextToSpeech.Errors.SpeechSynthError.t()
     | ExAzureSpeech.Common.Errors.WebsocketConnectionFailed.t()}
Synthesises the given text using the Azure Text to Speech service.
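Because synthesize/3 returns `{:ok, Enumerable.t()}` on success, its result can be consumed lazily. The sketch below shows one possible way to handle both return shapes by writing the enumerable to a file. It assumes each element is a binary audio chunk, which this section does not state explicitly; `SynthesisResultSketch` and `save/2` are hypothetical names and not part of the library.

```elixir
defmodule SynthesisResultSketch do
  # Hypothetical helper, not part of ex_azure_speech.
  # Takes a value shaped like the return of synthesize/3 and a file path.
  # On success, writes every element of the enumerable to the file in order.
  # Assumption: each element is a binary audio chunk.
  def save({:ok, stream}, path) do
    Enum.into(stream, File.stream!(path))
    {:ok, path}
  end

  # Errors are passed through unchanged so the caller can match on them.
  def save({:error, _reason} = error, _path), do: error
end
```

The result of synthesize/3 could be passed directly as the first argument, together with a path of the caller's choosing.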