Bumblebee.Audio.WhisperFeaturizer (Bumblebee v0.5.3)
Whisper featurizer for audio data.
Configuration
:feature_size - the dimension of the extracted features. This corresponds to the number of Mel bins. Defaults to 80
:sampling_rate - the sampling rate, in Hertz, at which the audio files should be digitally expressed. Defaults to 16000
:num_seconds - the maximum duration of the audio sequence, in seconds. This implies that the maximum length of the input sequence is :num_seconds * :sampling_rate. Defaults to 30
:hop_length - the hop between consecutive overlapping windows for the STFT used to obtain Mel frequency coefficients. Defaults to 160
:fft_length - the size of the Fourier transform. Defaults to 400
:padding_value - the value used to pad the audio. Should correspond to silence. Defaults to 0.0
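To make the relationship between these options concrete, here is a minimal sketch in plain Elixir that derives the maximum input length from the defaults listed above. The module `WhisperFeaturizerMath` and its functions are hypothetical helpers written for illustration and are not part of Bumblebee. The frame count assumes one STFT frame per hop over the padded input, which is an assumption about the featurizer's internals rather than something stated in this page.

```elixir
# Hypothetical helper module for illustration only, not part of Bumblebee.
defmodule WhisperFeaturizerMath do
  # Default values copied from the configuration list above.
  @defaults [
    feature_size: 80,
    sampling_rate: 16_000,
    num_seconds: 30,
    hop_length: 160,
    fft_length: 400,
    padding_value: 0.0
  ]

  # Maximum number of raw audio samples: :num_seconds * :sampling_rate.
  def max_samples(opts \\ []) do
    opts = Keyword.merge(@defaults, opts)
    opts[:num_seconds] * opts[:sampling_rate]
  end

  # Assumption: the STFT yields one frame per :hop_length samples.
  def num_frames(opts \\ []) do
    opts = Keyword.merge(@defaults, opts)
    div(max_samples(opts), opts[:hop_length])
  end
end

IO.inspect(WhisperFeaturizerMath.max_samples())
# prints 480000
IO.inspect(WhisperFeaturizerMath.num_frames())
# prints 3000
```

With Bumblebee installed, overriding an option would typically go through `Bumblebee.configure/2`, for example `Bumblebee.configure(Bumblebee.Audio.WhisperFeaturizer, num_seconds: 10)`. Treat that call as a sketch and check it against the version of Bumblebee you are using.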