View Source Bumblebee.Audio (Bumblebee v0.2.0)
High-level tasks related to audio processing.
Link to this section Summary
Types
A term representing audio.
Functions
Builds serving for speech-to-text generation.
Link to this section Types
A term representing audio.
Can be either of:
a 1-dimensional
Nx.Tensor
with audio samples{:file, path}
with path to an audio file (note that this requiresffmpeg
installed)
@type speech_to_text_output() :: %{results: [speech_to_text_result()]}
@type speech_to_text_result() :: %{text: String.t()}
Link to this section Functions
@spec speech_to_text( Bumblebee.model_info(), Bumblebee.Featurizer.t(), Bumblebee.Tokenizer.t(), keyword() ) :: Nx.Serving.t()
Builds serving for speech-to-text generation.
The serving accepts speech_to_text_input/0
and returns
speech_to_text_output/0
. A list of inputs is also supported.
Note that either :max_new_tokens
or :max_length
must be specified.
The generation should generally finish based on the audio input,
however you still need to specify the upper limit.
options
Options
:max_new_tokens
- the maximum number of tokens to be generated, ignoring the number of tokens in the prompt:compile
- compiles all computations for predefined input shapes during serving initialization. Should be a keyword list with the following keys::batch_size
- the maximum batch size of the input. Inputs are optionally padded to always match this batch size
It is advised to set this option in production and also configure a defn compiler using
:defn_options
to maximally reduce inference time.:defn_options
- the options for JIT compilation. Defaults to[]
Also accepts all the other options of Bumblebee.Text.Generation.build_generate/3
.
examples
Examples
{:ok, whisper} = Bumblebee.load_model({:hf, "openai/whisper-tiny"})
{:ok, featurizer} = Bumblebee.load_featurizer({:hf, "openai/whisper-tiny"})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "openai/whisper-tiny"})
serving =
Bumblebee.Audio.speech_to_text(whisper, featurizer, tokenizer,
max_new_tokens: 100,
defn_options: [compiler: EXLA]
)
Nx.Serving.run(serving, {:file, "/path/to/audio.wav"})
#=> %{results: [%{text: "There is a cat outside the window."}]}