Membrane.Element.GCloud.SpeechToText (Membrane Element: GCloud SpeechToText v0.10.0)
An element providing speech recognition via the Google Cloud Speech To Text service, using its Streaming API.
The element has to handle a connection time limit (currently 5 minutes). It does so by spawning multiple streaming clients: streaming is stopped after streaming_time_limit (see t/0) and a new client is spawned to continue streaming. The old client is kept alive for results_await_time and will receive recognition results for the audio it streamed.
This means that the first results from the new client might arrive before the last result from the old client.
Bear in mind that streaming_time_limit + results_await_time must be smaller than the recognition time limit for the Google Streaming API (currently 5 minutes).
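The constraint above can be checked with plain arithmetic before the element is started. The following is a minimal sketch, not part of the library: the module TimeLimitCheck and its functions are hypothetical, and it assumes that time values are integers expressed in nanoseconds (under that assumption the documented defaults 200000000000 and 90000000000 correspond to 200 and 90 seconds).

```elixir
defmodule TimeLimitCheck do
  # Hypothetical helper for illustration only; not provided by the element.

  # Assumption: time values are integers in nanoseconds.
  @second 1_000_000_000

  # Recognition time limit stated in the documentation: currently 5 minutes.
  @recognition_limit 5 * 60 * @second

  # True when streaming plus result-await time stays strictly below the limit.
  def fits?(streaming_time_limit, results_await_time) do
    streaming_time_limit + results_await_time < @recognition_limit
  end

  # Whole seconds left between the configured total and the limit.
  def headroom_seconds(streaming_time_limit, results_await_time) do
    div(@recognition_limit - (streaming_time_limit + results_await_time), @second)
  end
end

# Documented defaults: 200 s of streaming and 90 s of awaiting results.
streaming = 200 * 1_000_000_000
await = 90 * 1_000_000_000

IO.inspect(TimeLimitCheck.fits?(streaming, await))            # true
IO.inspect(TimeLimitCheck.headroom_seconds(streaming, await)) # 10
```

With the defaults, the two values add up to 290 seconds, which leaves 10 seconds of headroom under the 5-minute limit. Raising either option without lowering the other quickly uses up that margin.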
Element options
Passed via struct Membrane.Element.GCloud.SpeechToText.t/0
language_code
String.t()
Default value: "en-US"
The language of the supplied audio. See Language Support for a list of supported language codes.
interim_results
boolean()
Default value: false
If set to true, interim results may be returned by the recognition API. See Google API docs for more info.
word_time_offsets
boolean()
Default value: false
If true, the top result includes a list of words and the start and end time offsets (timestamps) for those words.
speech_contexts
[%SpeechContext{}]
Default value: []
A list of speech recognition contexts. See the docs for more info.
model
:default | :video | :phone_call | :command_and_search
Default value: :default
Model used for speech recognition. Bear in mind that the :video model is a premium model that costs more than the standard rate.
streaming_time_limit
Time.t()
Default value: 200000000000 (200 seconds, in nanoseconds)
Determines how much audio can be sent to the recognition API in one client session. After this time, a new client session is created while the old one is kept alive for some time to receive recognition results.
Bear in mind that streaming_time_limit + results_await_time must be smaller than the recognition time limit for the Google Streaming API (currently 5 minutes).
results_await_time
Time.t()
Default value: 90000000000 (90 seconds, in nanoseconds)
The amount of time a client that stopped streaming is kept alive awaiting results from the recognition API.
reconnection_overlap_time
Time.t()
Default value: 2000000000 (2 seconds, in nanoseconds)
Duration of audio re-sent in a new client session after reconnection.
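As an illustration of how the options listed above fit together, the element's struct could be filled in as follows. This is a sketch under assumptions: it only compiles in a project that depends on the package providing this element, the chosen values ("en-GB", the example phrases) are arbitrary, and passing a list of strings as phrases is an assumption, since the type spec only declares that field as term().

```elixir
# Sketch only: requires the package providing this element and its dependencies.
alias Membrane.Element.GCloud.SpeechToText
alias Google.Cloud.Speech.V1.SpeechContext

stt = %SpeechToText{
  # Arbitrary example values; any option left out keeps its documented default.
  language_code: "en-GB",
  interim_results: true,
  word_time_offsets: true,
  # Assumption: phrases is given as a list of strings.
  speech_contexts: [%SpeechContext{phrases: ["Membrane", "Elixir"]}],
  model: :default,
  # 200 s + 90 s stays below the 5-minute recognition time limit.
  streaming_time_limit: Membrane.Time.seconds(200),
  results_await_time: Membrane.Time.seconds(90),
  reconnection_overlap_time: Membrane.Time.seconds(2)
}
```

Writing the durations with Membrane.Time.seconds/1 rather than raw integers makes it easier to see at a glance that the sum of the two limits respects the constraint described above.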
Pads
:input
Accepted formats: FLAC
Direction: :input
Availability: :always
Flow control: :manual
Demand unit: :buffers
Types
@type t() :: %Membrane.Element.GCloud.SpeechToText{
  interim_results: boolean(),
  language_code: String.t(),
  model: :default | :video | :phone_call | :command_and_search,
  reconnection_overlap_time: Membrane.Time.t(),
  results_await_time: Membrane.Time.t(),
  speech_contexts: [%Google.Cloud.Speech.V1.SpeechContext{phrases: term()}],
  streaming_time_limit: Membrane.Time.t(),
  word_time_offsets: boolean()
}
Struct containing options for Membrane.Element.GCloud.SpeechToText
Functions
@spec options() :: keyword()
Returns a description of the options available for this module.