Bumblebee.Audio.WhisperFeaturizer (Bumblebee v0.6.0)

Whisper featurizer for audio data.

Configuration

  • :feature_size - the dimension of the extracted features. This corresponds to the number of Mel bins. Defaults to 80

  • :sampling_rate - the sampling rate, in hertz, at which the audio should be digitally expressed. Defaults to 16000

  • :num_seconds - the maximum duration of the audio sequence, in seconds. This implies that the maximum length of the input sequence is :num_seconds * :sampling_rate samples. Defaults to 30

  • :hop_length - the hop, in samples, between consecutive overlapping windows for the STFT used to obtain Mel frequency coefficients. Defaults to 160

  • :fft_length - the size of the Fourier transform. Defaults to 400

  • :padding_value - the value used to pad the audio. Should correspond to silence. Defaults to 0.0
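
To make the relationship between these options concrete, the sketch below works through the arithmetic implied by the defaults. It is plain Elixir and does not call Bumblebee. The `defaults` map and the variable names are illustrative, not part of the library, and the frame count assumes one STFT frame per hop, which is an assumption rather than something stated above.

```elixir
# Documented defaults, copied into a plain map for illustration only.
# This is NOT the featurizer struct; the map name is hypothetical.
defaults = %{
  feature_size: 80,
  sampling_rate: 16_000,
  num_seconds: 30,
  hop_length: 160,
  fft_length: 400,
  padding_value: 0.0
}

# Maximum input length in samples: :num_seconds * :sampling_rate
max_samples = defaults.num_seconds * defaults.sampling_rate

# Assumption: one STFT frame per hop, so frames = samples / hop length
num_frames = div(max_samples, defaults.hop_length)

# Window and hop durations in milliseconds at the given sampling rate
window_ms = defaults.fft_length * 1000 / defaults.sampling_rate
hop_ms = defaults.hop_length * 1000 / defaults.sampling_rate

IO.inspect(max_samples, label: "max samples")
IO.inspect(num_frames, label: "frames (assuming one per hop)")
IO.inspect(window_ms, label: "window (ms)")
IO.inspect(hop_ms, label: "hop (ms)")
```

With the defaults, a full-length input is 480000 samples, each analysis window spans 25 ms, and consecutive windows start 10 ms apart. Under the one-frame-per-hop assumption this gives 3000 frames, each with :feature_size (80) Mel bins.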