ExAzureSpeech.TextToSpeech.Websocket (ex_azure_speech v0.2.2)

Module for handling the websocket connection to the Azure Text to Speech service.

The Text-to-Speech WebSocket flow works as follows:

The client opens a WebSocket connection to the Azure Text to Speech service.
The client sends an ExAzureSpeech.Common.Messages.SpeechConfigMessage with the basic configuration for the synthesis session.
The client sends an ExAzureSpeech.TextToSpeech.Messages.SynthesisContextMessage describing the synthesis context.
The client sends an ExAzureSpeech.TextToSpeech.Messages.SynthesisMessage to start the synthesis.
The client receives audio metadata from the service, which can be processed by the asynchronous callbacks.
The client receives audio data from the service in binary format.
The client receives an ExAzureSpeech.TextToSpeech.Responses.AudioMetadata.session_end message when the synthesis ends.
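
To make the outgoing half of this sequence concrete, here is a minimal sketch of the three client text frames (configuration, synthesis context, synthesis request). It assumes the generic Speech service text framing, where header lines such as `X-RequestId`, `Content-Type` and `Path` are separated by CRLF, followed by a blank line and the body, and it assumes the path names `speech.config`, `synthesis.context` and `ssml`. `FrameBuilderSketch` and its functions are hypothetical and not part of ex_azure_speech; the library's message structs may serialize differently or add further headers.

```elixir
defmodule FrameBuilderSketch do
  # Hypothetical helper, not part of ex_azure_speech. Lays out a text frame
  # as "Name:value" header lines joined by CRLF, a blank line, then the body.
  def text_frame(path, request_id, content_type, body) do
    headers = [
      {"X-RequestId", request_id},
      {"Content-Type", content_type},
      {"Path", path}
    ]

    header_block =
      Enum.map_join(headers, "\r\n", fn {name, value} -> name <> ":" <> value end)

    header_block <> "\r\n\r\n" <> body
  end

  # The three client frames, in the order listed above. The path names are
  # assumptions taken from the Speech service protocol.
  def outgoing_frames(request_id, config_json, context_json, ssml) do
    [
      text_frame("speech.config", request_id, "application/json", config_json),
      text_frame("synthesis.context", request_id, "application/json", context_json),
      text_frame("ssml", request_id, "application/ssml+xml", ssml)
    ]
  end
end
```

Sharing one request id across the three frames ties them to a single synthesis turn, which is why the sketch takes it once and reuses it.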

Summary

Types

callbacks()

Callbacks for handling audio metadata.

expected_responses()

Expected websocket frame responses from the Azure Text-to-Speech Service.

Functions

open_connection(opts, context, callbacks)

Opens a connection to the Azure Text to Speech service.

synthesize(pid, command, close_connection_callback)

Synthesises the given text using the Azure Text to Speech service.

Types

callbacks()

@type callbacks() :: [
  viseme_callback:
    (ExAzureSpeech.TextToSpeech.Responses.AudioMetadata.viseme() -> any()),
  word_boundary_callback:
    (ExAzureSpeech.TextToSpeech.Responses.AudioMetadata.word_boundary() ->
       any()),
  sentence_boundary_callback:
    (ExAzureSpeech.TextToSpeech.Responses.AudioMetadata.sentence_boundary() ->
       any()),
  session_end_callback:
    (ExAzureSpeech.TextToSpeech.Responses.AudioMetadata.session_end() ->
       any())
]

Callbacks for handling audio metadata.

viseme_callback: Executed every time viseme metadata is received.
word_boundary_callback: Executed every time word boundary metadata is received.
sentence_boundary_callback: Executed every time sentence boundary metadata is received.
session_end_callback: Executed when session end metadata is received, which signals the end of the synthesis.
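
The sketch below shows how a keyword list matching callbacks() might be routed to. `CallbackSketch.dispatch/2` takes a decoded metadata event and invokes the matching callback. The `{type, payload}` event shape, the module name, and the choice to skip unregistered callbacks are assumptions for illustration only. The library decodes metadata into its own `AudioMetadata` types and does its own dispatching.

```elixir
defmodule CallbackSketch do
  # Hypothetical mapping from an event type to its key in callbacks().
  @key_for_type %{
    viseme: :viseme_callback,
    word_boundary: :word_boundary_callback,
    sentence_boundary: :sentence_boundary_callback,
    session_end: :session_end_callback
  }

  # Invokes the registered callback for the event. In this sketch a missing
  # callback is not an error: the event is simply skipped.
  def dispatch({type, payload}, callbacks) do
    key = Map.fetch!(@key_for_type, type)

    case Keyword.get(callbacks, key) do
      nil -> :no_callback
      fun when is_function(fun, 1) -> {:ok, fun.(payload)}
    end
  end
end

# Example callbacks() value: only the events of interest are registered.
callbacks = [
  word_boundary_callback: fn boundary -> IO.inspect(boundary, label: "word") end,
  session_end_callback: fn _ -> IO.puts("synthesis finished") end
]

CallbackSketch.dispatch({:word_boundary, %{text: "Hello"}}, callbacks)
```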

expected_responses()

@type expected_responses() ::
  :turn_start | :response | :audio_metadata | :audio | :turn_end

Expected websocket frame responses from the Azure Text-to-Speech Service.

turn_start: The start of a new synthesis turn.
response: Returns stream information that the client does not need.
audio_metadata: Returns metadata about the audio, such as word and sentence boundaries and visemes.
audio: Returns the audio data in binary format.
turn_end: The end of a synthesis turn.
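
Here is a minimal sketch of how incoming frames could be mapped onto these atoms. It assumes the Speech service framing. Text frames carry CRLF-separated headers, a blank line and a body. Binary audio frames start with a 2-byte big-endian header length, followed by the headers and then the audio bytes. The `Path` values (`turn.start`, `response`, `audio.metadata`, `audio`, `turn.end`) are assumed from the atom names. `FrameParserSketch` is hypothetical and is not this library's parser.

```elixir
defmodule FrameParserSketch do
  # Assumed Path header values for each expected_responses() atom.
  @paths %{
    "turn.start" => :turn_start,
    "response" => :response,
    "audio.metadata" => :audio_metadata,
    "audio" => :audio,
    "turn.end" => :turn_end
  }

  # Text frame: "Name:value" header lines, a blank line, then the body.
  def classify({:text, frame}) do
    [header_block, body] = String.split(frame, "\r\n\r\n", parts: 2)
    {path_type(header_block), body}
  end

  # Binary frame: 2-byte big-endian header length, headers, then audio bytes.
  def classify({:binary, <<len::big-integer-size(16), headers::binary-size(len), audio::binary>>}) do
    {path_type(headers), audio}
  end

  # Finds the Path header and maps it to an atom; :unknown if absent or unexpected.
  defp path_type(header_block) do
    header_block
    |> String.split("\r\n")
    |> Enum.find_value(:unknown, fn line ->
      case String.split(line, ":", parts: 2) do
        ["Path", value] -> Map.get(@paths, String.trim(value), :unknown)
        _ -> nil
      end
    end)
  end
end
```

Reading the header length from the frame itself lets the parser split headers from audio without scanning the binary payload for a separator that could occur by chance in the audio bytes.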

Functions

open_connection(opts, context, callbacks)

@spec open_connection(
  ExAzureSpeech.TextToSpeech.SocketConfig.t(),
  ExAzureSpeech.TextToSpeech.SpeechSynthesisConfig.t(),
  callbacks()
) ::
  {:ok, pid()}
  | {:error,
     ExAzureSpeech.Auth.Errors.Unauthorized.t()
     | ExAzureSpeech.Auth.Errors.Failure.t()}

Opens a connection to the Azure Text to Speech service.
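
open_connection/3 and synthesize/3 are designed to be chained: the pid returned by the first becomes the first argument of the second. The sketch below composes them with `with`, so an authentication error from opening the connection short-circuits before synthesis is attempted. To keep it runnable without Azure credentials, `open_fun` and `synth_fun` are hypothetical stand-ins for the two functions, and `FlowSketch` is a hypothetical module name.

```elixir
defmodule FlowSketch do
  # Hypothetical composition of open_connection/3 and synthesize/3.
  # open_fun: zero-arity stand-in for open_connection/3.
  # synth_fun: three-arity stand-in for synthesize/3.
  # close_fun: the close_connection_callback; receives the socket pid.
  def run(open_fun, synth_fun, command, close_fun) do
    with {:ok, pid} <- open_fun.(),
         {:ok, stream} <- synth_fun.(pid, command, close_fun) do
      {:ok, stream}
    else
      # Any {:error, reason} from either step is returned untouched.
      {:error, _reason} = error -> error
    end
  end
end
```

In real code, `open_fun` would be something like `fn -> ExAzureSpeech.TextToSpeech.Websocket.open_connection(opts, context, callbacks) end` and `synth_fun` would be `&ExAzureSpeech.TextToSpeech.Websocket.synthesize/3`, with the arguments built according to the specs on this page.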

synthesize(pid, command, close_connection_callback)

@spec synthesize(
  pid(),
  ExAzureSpeech.TextToSpeech.Messages.SynthesisMessage.t(),
  (pid() -> any())
) ::
  {:ok, Enumerable.t()}
  | {:error,
     ExAzureSpeech.Common.Errors.FailedToDispatchCommand.t()
     | ExAzureSpeech.TextToSpeech.Errors.SpeechSynthError.t()
     | ExAzureSpeech.Common.Errors.WebsocketConnectionFailed.t()}

Synthesises the given text using the Azure Text to Speech service.
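
On success, synthesize/3 returns `{:ok, Enumerable.t()}`. The hypothetical helpers below assume this enumerable yields the binary audio chunks mentioned in the module overview. One helper concatenates the chunks into a single binary. The other streams them to a file without buffering the whole clip in memory. Both pass error tuples through unchanged. The module name and the chunk format are assumptions.

```elixir
defmodule AudioSketch do
  # Assumption: the enumerable from synthesize/3 yields binary audio chunks.

  # Concatenates every chunk into a single binary.
  def to_binary({:ok, stream}) do
    {:ok, stream |> Enum.to_list() |> IO.iodata_to_binary()}
  end

  def to_binary({:error, _reason} = error), do: error

  # Writes chunks to `path` as they arrive, without buffering the whole clip.
  def to_file({:ok, stream}, path) do
    stream
    |> Stream.into(File.stream!(path))
    |> Stream.run()

    {:ok, path}
  end

  def to_file({:error, _reason} = error, _path), do: error
end
```

`to_binary/1` is convenient for short clips. `to_file/2` is the safer choice for long texts, because `Stream.into/2` writes each chunk as it is consumed.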