Membrane.RTP.VAD (Membrane RTP plugin v0.10.0)

Simple voice activity detection (VAD) based on the audio level sent in the RTP header.

For this module to work, the appropriate RTP header extension has to be set in the SDP offer/answer.

If the average audio level of the packets in time_window exceeds vad_threshold, the element emits the notification speech_notification_t/0.

When the average falls below vad_threshold and does not exceed it again within the next vad_silence_time, the element emits the notification silence_notification_t/0.
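The threshold rule described above can be sketched as a pure function. This is a simplified model under stated assumptions, not the element's actual code: the module name VadSketch and its classify/3 function are hypothetical, the caller is assumed to have already selected the packets that fall inside time_window, and the delay governed by vad_silence_time is left out.

```elixir
defmodule VadSketch do
  @moduledoc "Hypothetical, simplified model of threshold-based VAD."

  # `levels` holds the audio levels in dBov (-127..0) of the packets
  # that fall inside the current time window.
  def classify(levels, vad_threshold, min_packet_num) do
    count = length(levels)

    cond do
      # Too few packets: speech is never reported.
      count < min_packet_num -> :silence
      # Average level above the threshold counts as voice activity.
      Enum.sum(levels) / count > vad_threshold -> :speech
      true -> :silence
    end
  end
end
```

With the default vad_threshold of -50 and min_packet_num of 50, fifty packets at -30 dBov would classify as :speech, while ten such packets would not, because there are too few of them.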

Buffers processed by this element may or may not have passed through a depayloader and a jitter buffer. If they have not, the only timestamp available for time comparison is the RTP timestamp. The delta between RTP timestamps depends on the clock rate used by the encoding. For Opus the clock rate is 48 kHz and packets are sent every 20 ms, so the RTP timestamp delta between sequential packets should be 48000 / 1000 * 20, or 960.
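The arithmetic above generalizes to any clock rate and packet interval. The following sketch shows the conversion in both directions. The module RtpDeltaSketch and its functions are hypothetical helpers for illustration and are not part of the plugin.

```elixir
defmodule RtpDeltaSketch do
  @moduledoc "Hypothetical helper, not part of Membrane.RTP.VAD."

  # Number of RTP timestamp ticks that elapse in `duration_ms`
  # milliseconds for an encoding clocked at `clock_rate` Hz.
  def ticks(clock_rate, duration_ms) do
    div(clock_rate * duration_ms, 1000)
  end

  # Converts a difference in RTP ticks back into whole milliseconds.
  def to_ms(tick_delta, clock_rate) do
    div(tick_delta * 1000, clock_rate)
  end
end

IO.inspect(RtpDeltaSketch.ticks(48_000, 20))   # prints 960
IO.inspect(RtpDeltaSketch.to_ms(960, 48_000))  # prints 20
```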

When calculating the epoch of the timestamp, we need to account for 32-bit integer wrapping. The epoch of a new timestamp is one of:

  • :current - the difference between timestamps is low: the timestamp has not wrapped around.
  • :next - the timestamp has wrapped around to 0. To simplify queue processing we reset the state.
  • :prev - the timestamp has recently wrapped around. We might receive an out-of-order packet from before the rollover, which we ignore.
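One way to tell the three cases apart is to pick the interpretation that places the new timestamp closest to the previous one. The sketch below illustrates that idea only. The module RtpEpochSketch and its classify/2 function are hypothetical, and the element's real implementation may differ.

```elixir
defmodule RtpEpochSketch do
  @moduledoc "Hypothetical illustration of 32-bit RTP timestamp wraparound."

  # RTP timestamps are 32-bit unsigned integers, so they wrap at 2^32.
  @rollover 4_294_967_296

  # Returns :current, :next or :prev, whichever interpretation puts
  # `new_ts` closest to `prev_ts`.
  def classify(prev_ts, new_ts) do
    [
      current: abs(new_ts - prev_ts),
      next: abs(new_ts + @rollover - prev_ts),
      prev: abs(new_ts - @rollover - prev_ts)
    ]
    |> Enum.min_by(fn {_epoch, distance} -> distance end)
    |> elem(0)
  end
end
```

For example, going from 4_294_966_816 to 480 is a step of 960 ticks across the wraparound and classifies as :next, while the reverse order (a late packet from before the rollover) classifies as :prev.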

Element options

Passed via struct Membrane.RTP.VAD.t/0

  • vad_id

    1..14

    Required
    ID of VAD header extension.

  • clock_rate

    Membrane.RTP.clock_rate_t()

    Default value: 48000
    Clock rate (in Hz) for the encoding.

  • time_window

    pos_integer()

    Default value: 2000
    Time window (in ms) in which the average audio level is measured.

  • min_packet_num

    pos_integer()

    Default value: 50
    Minimum number of packets from which to compute the average audio level. Speech is not detected until enough packets have arrived.

  • vad_threshold

    -127..0

    Default value: -50
    Audio level in dBov representing the VAD threshold. Values above it are considered to represent voice activity. A value of -127 represents digital silence.

  • vad_silence_time

    pos_integer()

    Default value: 300
    Time to wait before emitting the notification silence_notification_t/0 after the audio track is no longer considered to represent speech. If the audio track is considered to represent speech again within this time, the notification is not sent.
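To make the defaults and value ranges listed above concrete, the sketch below merges caller-supplied options with the documented defaults and checks the two range-typed options. The module VadOptionsSketch and its build/1 function are hypothetical and exist only for illustration. The element itself takes its options through the Membrane.RTP.VAD struct.

```elixir
defmodule VadOptionsSketch do
  @moduledoc "Hypothetical checker mirroring the documented option types."

  # Defaults copied from the option list; `vad_id` is required and has none.
  @defaults [
    clock_rate: 48_000,
    time_window: 2_000,
    min_packet_num: 50,
    vad_threshold: -50,
    vad_silence_time: 300
  ]

  # Returns {:ok, options} or {:error, name_of_invalid_option}.
  def build(opts) do
    opts = Keyword.merge(@defaults, opts)

    cond do
      Keyword.get(opts, :vad_id) not in 1..14 -> {:error, :vad_id}
      Keyword.fetch!(opts, :vad_threshold) not in -127..0 -> {:error, :vad_threshold}
      true -> {:ok, opts}
    end
  end
end
```

With the plugin installed, the equivalent options would be written as a struct literal such as %Membrane.RTP.VAD{vad_id: 1, vad_threshold: -40}, with the remaining fields taking their defaults.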

Pads

:input

Availability: always
Caps: any
Demand unit: buffers
Direction: input
Mode: pull
Name: input

:output

Availability: always
Caps: any
Direction: output
Mode: pull
Name: output

Summary

Types

silence_notification_t()

Notification sent after detecting silence.

speech_notification_t()

Notification sent after detecting speech activity.

t()

Struct containing options for Membrane.RTP.VAD

Functions

membrane_pads()

Returns pads descriptions for Membrane.RTP.VAD

options()

Returns description of options available for this module

Types

silence_notification_t()

Specs

silence_notification_t() :: {:vad, :silence}

Notification sent after detecting silence.

speech_notification_t()

Specs

speech_notification_t() :: {:vad, :speech}

Notification sent after detecting speech activity.
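Both notifications are plain tuples, so a consumer can distinguish them by pattern matching. The sketch below keeps a simple "is speaking" flag up to date. The module VadNotificationSketch and its speaking?/2 function are hypothetical, and how notifications reach this code (for example through a parent's notification callback) is not shown here.

```elixir
defmodule VadNotificationSketch do
  @moduledoc "Hypothetical consumer of the documented VAD notifications."

  # Maps a notification to the new "is speaking" flag and leaves the
  # flag unchanged for anything that is not a VAD notification.
  def speaking?({:vad, :speech}, _current), do: true
  def speaking?({:vad, :silence}, _current), do: false
  def speaking?(_other, current), do: current
end
```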

t()

Specs

t() :: %Membrane.RTP.VAD{
  clock_rate: Membrane.RTP.clock_rate_t(),
  min_packet_num: pos_integer(),
  time_window: pos_integer(),
  vad_id: 1..14,
  vad_silence_time: pos_integer(),
  vad_threshold: -127..0
}

Struct containing options for Membrane.RTP.VAD

Functions

membrane_pads()

Specs

membrane_pads() :: [{Membrane.Pad.name_t(), Membrane.Pad.description_t()}]

Returns pads descriptions for Membrane.RTP.VAD

options()

Specs

options() :: keyword()

Returns description of options available for this module