Bumblebee.Audio.WhisperFeaturizer (Bumblebee v0.5.3)

Whisper featurizer for audio data.

Configuration

:feature_size - the dimension of the extracted features. This corresponds to the number of Mel bins. Defaults to 80
:sampling_rate - the sampling rate, in Hertz, at which the audio should be digitally expressed. Defaults to 16000
:num_seconds - the maximum duration of the audio sequence, in seconds. This implies that the maximum length of the input sequence is :num_seconds * :sampling_rate samples. Defaults to 30
:hop_length - the hop between consecutive overlapping windows for the STFT used to obtain Mel Frequency coefficients. Defaults to 160
:fft_length - the size of the Fourier transform. Defaults to 400
:padding_value - the value used to pad the audio. Should correspond to silence. Defaults to 0.0
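
To see how these options relate to one another, the arithmetic implied by the defaults can be worked through in plain Elixir. This is a minimal sketch and does not call Bumblebee: the keyword list `opts` and the variable names are illustrative only. The frame count assumes one STFT frame per hop, and the window duration assumes the analysis window spans :fft_length samples; neither assumption is stated in the option list above.

```elixir
# Defaults copied from the option list above (illustrative keyword list,
# not a Bumblebee struct).
opts = [
  feature_size: 80,
  sampling_rate: 16_000,
  num_seconds: 30,
  hop_length: 160,
  fft_length: 400,
  padding_value: 0.0
]

# Maximum number of raw audio samples: :num_seconds * :sampling_rate.
max_samples = opts[:num_seconds] * opts[:sampling_rate]
# => 480000

# Assumption: one STFT frame per hop, so the number of frames is
# max_samples divided by :hop_length.
num_frames = div(max_samples, opts[:hop_length])
# => 3000

# Hop and window durations in milliseconds (assuming the window
# spans :fft_length samples).
hop_ms = opts[:hop_length] * 1000 / opts[:sampling_rate]
window_ms = opts[:fft_length] * 1000 / opts[:sampling_rate]

IO.inspect({max_samples, num_frames, hop_ms, window_ms})
```

Under these assumptions, a full 30-second input yields 3000 frames of :feature_size (80) Mel bins each, and shorter audio is padded with :padding_value up to the same length.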