Nostrum.Voice (Nostrum v0.8.0)

Interface for playing and listening to audio through Discord's voice channels.

Using Discord Voice Channels

To play sound in Discord with Nostrum, ffmpeg must be installed. If the ffmpeg executable is not in your PATH, its absolute path may be configured through the config keys :nostrum, :ffmpeg. If you don't want to use ffmpeg, read on to the next section.
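As a minimal sketch of that configuration, assuming a standard config/config.exs file and a hypothetical install location for the binary (the path below is illustrative only):

```elixir
# config/config.exs
import Config

# Hypothetical path; point this at wherever ffmpeg lives on your system.
config :nostrum, ffmpeg: "/usr/local/bin/ffmpeg"
```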

A bot may be connected to at most one voice channel per guild. For this reason, most of the functions in this module take a guild id, and the resulting action will be performed in the given guild's voice channel that the bot is connected to.

The primary Discord gateway, responsible for all text-based communication, relies on one websocket connection per shard; small bots typically have only one shard. The Discord voice gateways work by establishing a websocket connection per guild/channel. After some handshaking on this connection, audio data can be sent over UDP/RTP. Behind the scenes, the voice websocket connections are implemented nearly the same way as the main shard websocket connections, and require no developer intervention.

In addition to playing audio, listening to incoming audio is supported through the functions listen/3 and start_listen_async/1.

Voice Without FFmpeg

If you wish to BYOE (Bring Your Own Encoder), there are a few options.

  • Use :raw as type for play/4
    • Provide the complete list of opus frames as the input
  • Use :raw_s as type for play/4
    • Provide a stateful enumerable of opus frames as input (think GenServer wrapped in Stream.unfold/2)
  • Use lower level functions to send opus frames at your leisure
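The "stateful enumerable" idea behind :raw_s can be sketched as follows. This is only an illustration under stated assumptions: an Agent stands in for the GenServer mentioned above, the frame binaries are placeholders rather than real opus data, and the names queue, pop, and frames are hypothetical.

```elixir
# Hypothetical queue of pre-encoded opus frames (placeholder bytes, not real audio).
{:ok, queue} = Agent.start_link(fn -> [<<1>>, <<2>>, <<3>>] end)

# Pops one frame from the queue state, or returns nil when it is empty.
pop = fn
  [] -> {nil, []}
  [frame | rest] -> {frame, rest}
end

# Each element pulled from the stream removes a frame from the Agent,
# so consuming the stream has a side effect on shared state.
frames =
  Stream.unfold(queue, fn pid ->
    case Agent.get_and_update(pid, pop) do
      nil -> nil
      frame -> {frame, pid}
    end
  end)

# Taking twice yields different results because the first call consumed frames.
first = Enum.take(frames, 2)
second = Enum.take(frames, 2)
```

An enumerable of this shape is what would be passed as the input to play/4 with type :raw_s, with the Agent replaced by whatever process produces your encoded frames.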

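Summary

Types

opus_packet()
Opus packet

play_input()
The play input

play_type()
The type of play input

rtp_opus()
Tuple with RTP header elements and opus packet

rtp_sequence()
RTP sequence

rtp_ssrc()
RTP SSRC

rtp_timestamp()
RTP timestamp

Functions

child_spec(init_arg)
Returns a specification to start this module under a supervisor.

connect_to_gateway(guild_id)
Low-level. Manually connect to voice websockets gateway.

create_ogg_bitstream(opus_packets)
Create a complete Ogg logical bitstream from a list of Opus packets.

extract_opus_packet(packet)
Extract the opus packet from the RTP packet received from Discord.

get_channel_id(guild_id)
Gets the id of the voice channel that the bot is connected to.

get_current_url(guild_id)
Gets the current URL being played.

get_ssrc_map(guild_id)
Gets a map of RTP SSRC to user id.

join_channel(guild_id, channel_id, self_mute \\ false, self_deaf \\ false, persist \\ true)
Joins or moves the bot to a voice channel.

leave_channel(guild_id)
Leaves the voice channel of the given guild id.

listen(guild_id, num_packets, raw_rtp \\ false)
Listen for incoming voice RTP packets.

pad_opus(packets)
Pad discontinuous chunks of opus audio with silence.

pause(guild_id)
Pauses the current sound being played in a voice channel.

play(guild_id, input, type \\ :url, options \\ [])
Plays sound in the voice channel the bot is in.

playing?(guild_id)
Checks if the bot is playing sound in a voice channel.

ready?(guild_id)
Checks if the connection is up and ready to play audio.

resume(guild_id)
Resumes playing the current paused sound in a voice channel.

send_frames(guild_id, frames)
Low-level. Send pre-encoded audio packets directly.

set_is_speaking(guild_id, speaking)
Low-level. Set speaking flag in voice channel.

start_listen_async(guild_id)
Start asynchronously receiving events for incoming RTP packets for an active voice session.

stop(guild_id)
Stops the current sound being played in a voice channel.

stop_listen_async(guild_id)
Stop asynchronously receiving events for incoming RTP packets for an active voice session.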

Types

opus_packet()

(since 0.6.0)
@type opus_packet() :: binary()

Opus packet

play_input()

(since 0.6.0)
@type play_input() :: String.t() | binary() | Enum.t()

The play input

The input given to play/4: a compatible URL, binary audio data, or an enumerable of opus packets. See play/4 for more information.

play_type()

(since 0.6.0)
@type play_type() :: :url | :pipe | :ytdl | :stream | :raw | :raw_s

The type of play input

The type given to play/4 determines how the input parameter is interpreted. See play/4 for more information.

rtp_opus()

(since 0.6.0)
@type rtp_opus() :: {{rtp_sequence(), rtp_timestamp(), rtp_ssrc()}, opus_packet()}

Tuple with RTP header elements and opus packet

rtp_sequence()

(since 0.6.0)
@type rtp_sequence() :: non_neg_integer()

RTP sequence

rtp_ssrc()

(since 0.6.0)
@type rtp_ssrc() :: non_neg_integer()

RTP SSRC

rtp_timestamp()

(since 0.6.0)
@type rtp_timestamp() :: non_neg_integer()

RTP timestamp

Functions

child_spec(init_arg)

Returns a specification to start this module under a supervisor.

See Supervisor.

connect_to_gateway(guild_id)

(since 0.5.0)
@spec connect_to_gateway(Nostrum.Struct.Guild.id()) :: :ok | {:error, String.t()}

Low-level. Manually connect to voice websockets gateway.

This function should only be called if config option :voice_auto_connect is set to false. By default Nostrum will automatically create a voice gateway when joining a channel.

create_ogg_bitstream(opus_packets)

(since 0.5.1)
@spec create_ogg_bitstream([opus_packet()]) :: [binary()]

Create a complete Ogg logical bitstream from a list of Opus packets.

This function takes a list of opus packets and returns a list of Ogg encapsulated Opus pages for a single Ogg logical bitstream.

It is highly recommended to learn about the Ogg container format to understand how to use the data.

To get started, assuming you have a list of evenly temporally spaced and consecutive opus packets from a single source that you want written to a file, you can run the following:

bitstream =
  opus_packets
  |> create_ogg_bitstream()
  |> :binary.list_to_bin()

File.write!("my_recording.ogg", bitstream)

When creating a logical bitstream, ensure that the packets are all from a single SSRC. When listening in a channel with multiple speakers, you should be storing the received packets in unique buckets for each SSRC so that the multiple audio sources don't become jumbled. A single logical bitstream should represent audio data from a single speaker. An Ogg physical bitstream (e.g. a file) may be composed of multiple interleaved Ogg logical bitstreams as each logical bitstream and its constituent pages contain a unique and randomly generated bitstream serial number, but this is a story for another time.
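The per-SSRC buckets described above can be sketched with Enum.group_by/3. The sample packets below are hypothetical placeholders in the rtp_opus/0 shape, not real audio, and the variable names are illustrative.

```elixir
# Hypothetical sample in the rtp_opus/0 shape: {{sequence, timestamp, ssrc}, opus}.
jumbled_packets = [
  {{1, 960, 111}, <<0xAA>>},
  {{1, 960, 222}, <<0xBB>>},
  {{2, 1920, 111}, <<0xCC>>}
]

# Group by SSRC and keep only the opus payloads, preserving arrival order.
buckets =
  Enum.group_by(
    jumbled_packets,
    fn {{_seq, _time, ssrc}, _opus} -> ssrc end,
    fn {_header, opus} -> opus end
  )
```

Each value in buckets is then a list of opus packets from one source, which is the input shape create_ogg_bitstream/1 expects.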

Assuming you have a list of rtp_opus/0 packets that are not separated by SSRC, you may do the following to build a bitstream for one particular SSRC:

jumbled_packets
|> Stream.filter(fn {{_seq, _time, ssrc}, _opus} -> ssrc == particular_ssrc end)
|> Enum.map(fn {{_seq, _time, _ssrc}, opus} -> opus end)
|> create_ogg_bitstream()

extract_opus_packet(packet)

(since 0.6.0)
@spec extract_opus_packet(binary()) :: opus_packet()

Extract the opus packet from the RTP packet received from Discord.

Incoming voice RTP packets contain a fixed length RTP header and an optional RTP header extension, which must be stripped to retrieve the underlying opus packet.
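To illustrate what stripping the header involves, here is a generic sketch based on the RTP layout in RFC 3550. It is not Nostrum's implementation: the module name RtpSketch is hypothetical, the payload is assumed to be unencrypted, and the CSRC count is assumed to be zero.

```elixir
defmodule RtpSketch do
  @moduledoc false

  # Fixed 12-byte RTP header (RFC 3550): version 2, padding bit, extension bit,
  # CSRC count (assumed 0 here), marker, payload type, sequence, timestamp, SSRC.
  def payload(
        <<2::2, _padding::1, extension::1, 0::4, _marker::1, _type::7, _seq::16,
          _timestamp::32, _ssrc::32, rest::binary>>
      ) do
    strip_extension(extension, rest)
  end

  # No extension present: everything after the fixed header is the payload.
  defp strip_extension(0, rest), do: rest

  # Extension header: 16-bit profile id, then a 16-bit length in 32-bit words.
  defp strip_extension(1, <<_profile::16, words::16, rest::binary>>) do
    ext_bytes = words * 4
    <<_ext::binary-size(ext_bytes), payload::binary>> = rest
    payload
  end
end
```

In practice, prefer extract_opus_packet/1 on packets obtained with raw_rtp set to true; the sketch only shows why the header and extension have to be skipped before the opus data begins.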

get_channel_id(guild_id)

@spec get_channel_id(Nostrum.Struct.Guild.id()) :: Nostrum.Struct.Channel.id()

Gets the id of the voice channel that the bot is connected to.

Parameters

  • guild_id - ID of guild that the resultant channel belongs to.

Returns the channel_id for the channel the bot is connected to, otherwise nil.

Examples

iex> Nostrum.Voice.join_channel(123456789, 420691337)

iex> Nostrum.Voice.get_channel_id(123456789)
420691337

iex> Nostrum.Voice.leave_channel(123456789)

iex> Nostrum.Voice.get_channel_id(123456789)
nil

get_current_url(guild_id)

(since 0.6.0)
@spec get_current_url(Nostrum.Struct.Guild.id()) :: String.t() | nil

Gets the current URL being played.

If play/4 was invoked with type :url, :ytdl, or :stream, this function will return the URL given as input the last time play/4 was called.

If play/4 was invoked with type :pipe, :raw, or :raw_s, this will return nil, as the input is raw audio data, not a readable URL string.

get_ssrc_map(guild_id)

(since 0.6.0)

Gets a map of RTP SSRC to user id.

Within a voice channel, an SSRC (synchronization source) will uniquely map to a user id of a user who is speaking.

If listening to incoming voice packets asynchronously, this function will not be needed as the Nostrum.Struct.VoiceWSState.ssrc_map/0 will be available with every event. If listening with listen/3, this function may be used. It is recommended to cache the result of this function and only call it again when you encounter an SSRC that is not present in the cached result. This is to reduce excess load on the voice websocket and voice state processes.
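The caching recommendation can be sketched as below. The module SsrcCacheSketch and its lookup/3 function are hypothetical helpers, and the result of get_ssrc_map/1 is assumed to be a plain map of SSRC to user id; the fetch function is injected so that the expensive call only happens on a cache miss.

```elixir
defmodule SsrcCacheSketch do
  @moduledoc false

  # Returns {user_id_or_nil, cache}. `fetch` is only invoked when the SSRC
  # is missing from the cached map, and its result replaces the cache.
  def lookup(cache, ssrc, fetch) when is_map(cache) and is_function(fetch, 0) do
    case cache do
      %{^ssrc => user_id} ->
        {user_id, cache}

      _ ->
        fresh = fetch.()
        {Map.get(fresh, ssrc), fresh}
    end
  end
end
```

With Nostrum, the fetch function would be something like fn -> Nostrum.Voice.get_ssrc_map(guild_id) end, and the returned cache would be kept in your own process state between calls.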

join_channel(guild_id, channel_id, self_mute \\ false, self_deaf \\ false, persist \\ true)

@spec join_channel(
  Nostrum.Struct.Guild.id(),
  Nostrum.Struct.Channel.id(),
  boolean(),
  boolean(),
  boolean()
) :: no_return() | :ok

Joins or moves the bot to a voice channel.

This function calls Nostrum.Api.update_voice_state/4.

The fifth argument persist defaults to true. When true, if calling join_channel/5 while already in a different channel in the same guild, the audio source will be persisted in the new channel. If the audio is actively playing at the time of changing channels, it will resume playing automatically upon joining. If there is an active audio source that has been paused before changing channels, the audio will be able to be resumed manually if resume/1 is called.

If persist is set to false, the audio source will be destroyed before changing channels. The same effect is achieved by calling stop/1 or leave_channel/1 before join_channel/5.

leave_channel(guild_id)

@spec leave_channel(Nostrum.Struct.Guild.id()) :: no_return() | :ok

Leaves the voice channel of the given guild id.

This function is equivalent to calling Nostrum.Api.update_voice_state(guild_id, nil).

listen(guild_id, num_packets, raw_rtp \\ false)

(since 0.6.0)
@spec listen(Nostrum.Struct.Guild.id(), pos_integer(), raw_rtp :: false) ::
  [rtp_opus()] | {:error, String.t()}
@spec listen(Nostrum.Struct.Guild.id(), pos_integer(), raw_rtp :: true) ::
  [binary()] | {:error, String.t()}

Listen for incoming voice RTP packets.

Parameters

  • guild_id - ID of guild that the bot is listening to.
  • num_packets - Number of packets to wait for.
  • raw_rtp - Whether to return raw RTP packets. Defaults to false.

Returns a list of tuples of type rtp_opus/0.

The inner tuple contains fields from the RTP header and can be matched against to retrieve information about the packet such as the SSRC, which identifies the source. Note that RTP timestamps are completely unrelated to Unix timestamps.

If raw_rtp is set to true, a list of raw RTP packets is returned instead. To extract an opus packet from an RTP packet, see extract_opus_packet/1.

This function will block until the specified number of packets is received.

pad_opus(packets)

(since 0.6.0)
@spec pad_opus([rtp_opus(), ...]) :: [opus_packet()]

Pad discontinuous chunks of opus audio with silence.

This function takes a list of rtp_opus/0, which is a tuple containing RTP bits and opus audio data. It returns a list of opus audio packets. The reason the input has to be in the rtp_opus/0 tuple format returned by listen/3 and async listen events is that the RTP packet header contains info on the relative timestamps of incoming packets; the opus packets themselves don't contain information relating to timing.

The Discord client will continue to internally increment the rtp_timestamp/0 when the user is not speaking such that the duration of pauses can be determined from the RTP packets. Bots will typically not behave this way, so if you call this function on audio produced by a bot it is very likely that no silence will be inserted.

The use case of this function is as follows: Consider a user speaks for two seconds, pauses for ten seconds, then speaks for another two seconds. During the pause, no RTP packets will be received, so if you create a bitstream from it, the resulting audio will be both two-second speaking segments consecutively without the long pause in the middle. If you wish to preserve the timing of the speaking and include the pause, calling this function will interleave the appropriate amount of opus silence packets to maintain temporal fidelity.

Note that the Discord client currently sends about 10 silence packets (200 ms) each time it detects end of speech, so creating a bitstream without first padding your audio with this function will maintain short silences between speech segments.
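How a pause can be measured from RTP timestamps can be sketched as follows. This is not the implementation of pad_opus/1; the module SilenceGapSketch is hypothetical, and it assumes 20 ms frames on the 48 kHz Opus RTP clock (960 timestamp units per frame) and ignores timestamp wraparound.

```elixir
defmodule SilenceGapSketch do
  @moduledoc false

  # 20 ms of audio at the 48 kHz Opus RTP clock.
  @samples_per_frame 960

  # For each pair of consecutive packets, returns how many 20 ms frames
  # are missing between them (0 when the packets are contiguous).
  def missing_frames(packets) do
    packets
    |> Enum.map(fn {{_seq, time, _ssrc}, _opus} -> time end)
    |> Enum.chunk_every(2, 1, :discard)
    |> Enum.map(fn [a, b] -> max(div(b - a, @samples_per_frame) - 1, 0) end)
  end
end
```

A gap of 499 missing frames between two packets, for example, corresponds to roughly ten seconds of silence that would need to be filled to keep the timing intact.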

This function should only be called on a collection of RTP packets from a single SSRC.

pause(guild_id)

@spec pause(Nostrum.Struct.Guild.id()) :: :ok | {:error, String.t()}

Pauses the current sound being played in a voice channel.

The bot must be connected to a voice channel in the guild specified.

Parameters

  • guild_id - ID of guild whose voice channel the sound will be paused in.

Returns {:error, reason} if unable to pause or no sound is playing, else :ok.

This function is similar to stop/1, except that the sound may be resumed after being paused.

Examples

iex> Nostrum.Voice.join_channel(123456789, 420691337)

iex> Nostrum.Voice.play(123456789, "~/files/twelve_hour_loop_of_waterfall_sounds.mp3")

iex> Nostrum.Voice.pause(123456789)

play(guild_id, input, type \\ :url, options \\ [])

@spec play(Nostrum.Struct.Guild.id(), play_input(), play_type(), keyword()) ::
  :ok | {:error, String.t()}

Plays sound in the voice channel the bot is in.

The bot must be connected to a voice channel in the guild specified.

Parameters

  • guild_id - ID of guild whose voice channel the sound will be played in.
  • input - Audio to be played, play_input/0. Input type determined by type parameter.
  • type - Type of input, play_type/0 (defaults to :url).
    • :url Input will be any url that ffmpeg can read.
    • :pipe Input will be data that is piped to stdin of ffmpeg.
    • :ytdl Input will be url for youtube-dl, which gets automatically piped to ffmpeg.
    • :stream Input will be livestream url for streamlink, which gets automatically piped to ffmpeg.
    • :raw Input will be an enumerable of raw opus packets. This bypasses ffmpeg and all options.
    • :raw_s Same as :raw but input must be stateful, i.e. calling Enum.take/2 on input is not idempotent.
  • options - See options section below.

Returns {:error, reason} if unable to play or a sound is playing, else :ok.

Options

  • :start_pos (string) - The start position of the audio to be played. Defaults to beginning.
  • :duration (string) - The duration of the audio to be played. Defaults to entire duration.
  • :realtime (boolean) - Make ffmpeg process the input in realtime instead of as fast as possible. Defaults to true.
  • :volume (number) - The output volume of the audio. Default volume is 1.0.
  • :filter (string) - Filter(s) to be applied to the audio. No filters applied by default.

The values of :start_pos and :duration can be any time duration that ffmpeg can read. The :filter can be used multiple times in a single call (see examples). The values of :filter can be any audio filters that ffmpeg can read. Filters will be applied in order and can be as complex as you want. The world is your oyster!

Note that using the :volume option is a shortcut for the "volume" filter, and will be added to the end of the filter chain, acting as a master volume. Volume values between 0.0 and 1.0 act as the standard operating range, where 0 is off and 1 is max. Values greater than 1.0 will add saturation and distortion to the audio. Negative values act the same as their positive counterparts but reverse the polarity of the waveform.
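Because options is a keyword list, the same key may appear more than once, which is what allows several :filter entries in one call. The sketch below shows how such options could be collected into a single ffmpeg filter chain with the volume filter last; how Nostrum assembles its ffmpeg arguments internally is an assumption here, and the variable names are illustrative.

```elixir
# Keyword lists keep duplicate keys in order.
options = [filter: "lowpass=f=1200", filter: "highpass=f=300", volume: 0.5]

# Collect every :filter value, preserving the order they were given in.
filters = Keyword.get_values(options, :filter)

# Append the volume filter at the end so it acts as a master volume,
# then join with commas as ffmpeg expects for a linear filter chain.
chain = Enum.join(filters ++ ["volume=#{options[:volume]}"], ",")
```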

Having all the ffmpeg audio filters available is extremely powerful so it may be worth learning some of them for your use cases. If you use any filters to increase the playback speed of your audio, it's recommended to set the :realtime option to false because realtime processing is relative to the original playback speed.

Examples

iex> Nostrum.Voice.join_channel(123456789, 420691337)

iex> Nostrum.Voice.play(123456789, "~/music/FavoriteSong.mp3", :url)

iex> Nostrum.Voice.play(123456789, "~/music/NotFavoriteButStillGoodSong.mp3", :url, volume: 0.5)

iex> Nostrum.Voice.play(123456789, "~/music/ThisWillBeHeavilyDistorted.mp3", :url, volume: 1000)

iex> Nostrum.Voice.join_channel(123456789, 420691337)

iex> raw_data = File.read!(Path.expand("~/music/sound_effect.wav"))

iex> Nostrum.Voice.play(123456789, raw_data, :pipe)

iex> Nostrum.Voice.join_channel(123456789, 420691337)

iex> Nostrum.Voice.play(123456789, "https://www.youtube.com/watch?v=b4RJ-QGOtw4", :ytdl,
...>   realtime: true, start_pos: "0:17", duration: "30")

iex> Nostrum.Voice.play(123456789, "https://www.youtube.com/watch?v=0ngcL_5ekXo", :ytdl,
...>   filter: "lowpass=f=1200", filter: "highpass=f=300", filter: "asetrate=44100*0.5")

iex> Nostrum.Voice.join_channel(123456789, 420691337)

iex> Nostrum.Voice.play(123456789, "https://www.twitch.tv/pestily", :stream)

iex> Nostrum.Voice.play(123456789, "https://youtu.be/LN4r-K8ZP5Q", :stream)

playing?(guild_id)

@spec playing?(Nostrum.Struct.Guild.id()) :: boolean()

Checks if the bot is playing sound in a voice channel.

Parameters

  • guild_id - ID of guild to check if audio is being played.

Returns true if audio is currently being played in a voice channel, otherwise false.

Examples

iex> Nostrum.Voice.join_channel(123456789, 420691337)

iex> Nostrum.Voice.play(123456789, "https://a-real-site.biz/RickRoll.m4a")

iex> Nostrum.Voice.playing?(123456789)
true

iex> Nostrum.Voice.pause(123456789)

iex> Nostrum.Voice.playing?(123456789)
false

ready?(guild_id)

@spec ready?(Nostrum.Struct.Guild.id()) :: boolean()

Checks if the connection is up and ready to play audio.

Parameters

  • guild_id - ID of guild to check if voice connection is up.

Returns true if the bot is connected to a voice channel, otherwise false.

This function does not check if audio is already playing. For that, use playing?/1.

Examples

iex> Nostrum.Voice.join_channel(123456789, 420691337)

iex> Nostrum.Voice.ready?(123456789)
true

iex> Nostrum.Voice.leave_channel(123456789)

iex> Nostrum.Voice.ready?(123456789)
false

resume(guild_id)

@spec resume(Nostrum.Struct.Guild.id()) :: :ok | {:error, String.t()}

Resumes playing the current paused sound in a voice channel.

The bot must be connected to a voice channel in the guild specified.

Parameters

  • guild_id - ID of guild whose voice channel the sound will be resumed in.

Returns {:error, reason} if unable to resume or no sound has been paused, otherwise returns :ok.

This function is used to resume a sound that had previously been paused.

Examples

iex> Nostrum.Voice.join_channel(123456789, 420691337)

iex> Nostrum.Voice.play(123456789, "~/stuff/Toto - Africa (Bass Boosted)")

iex> Nostrum.Voice.pause(123456789)

iex> Nostrum.Voice.resume(123456789)

send_frames(guild_id, frames)

(since 0.5.0)
@spec send_frames(Nostrum.Struct.Guild.id(), [opus_packet()]) ::
  :ok | {:error, String.t()}

Low-level. Send pre-encoded audio packets directly.

Speaking should be set to true via Nostrum.Voice.set_is_speaking/2 before sending frames.

Opus frames will be encrypted and prefixed with the appropriate RTP header and sent immediately. The length of frames depends on how often you wish to send a sequence of frames. A single frame contains 20ms of audio. Sending more than 50 frames (1 second of audio) in a single function call may result in inconsistent playback rates.
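One way to respect the 50-frame guidance is to split a longer list of frames into batches before sending. The sketch below only shows the batching arithmetic; the frames are placeholder bytes, and the sending loop in the trailing comment is an untested assumption about how you might pace the calls.

```elixir
# Hypothetical list of 120 pre-encoded frames (placeholder bytes, not real opus).
frames = for n <- 1..120, do: <<n>>

# At most 50 frames (1 second of audio) per batch.
batches = Enum.chunk_every(frames, 50)

# Each frame carries 20 ms of audio, so this is the playback time per batch.
batch_ms = Enum.map(batches, fn batch -> length(batch) * 20 end)

# Assumed usage with Nostrum (not executed here):
#   Nostrum.Voice.set_is_speaking(guild_id, true)
#   Enum.zip(batches, batch_ms)
#   |> Enum.each(fn {batch, ms} ->
#     Nostrum.Voice.send_frames(guild_id, batch)
#     Process.sleep(ms)
#   end)
```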

Nostrum.Voice.playing?/1 will not return accurate values when using send_frames/2 instead of Nostrum.Voice.play/4.

set_is_speaking(guild_id, speaking)

(since 0.5.0)
@spec set_is_speaking(Nostrum.Struct.Guild.id(), boolean()) :: :ok

Low-level. Set speaking flag in voice channel.

This function does not need to be called unless you are sending audio frames directly using Nostrum.Voice.send_frames/2.

start_listen_async(guild_id)

(since 0.6.0)
@spec start_listen_async(Nostrum.Struct.Guild.id()) :: :ok | {:error, term()}

Start asynchronously receiving events for incoming RTP packets for an active voice session.

This is an alternative to the blocking listen/3. Events will be generated asynchronously when a user is speaking. See Nostrum.Consumer.voice_incoming_packet/0 for more info.

stop(guild_id)

@spec stop(Nostrum.Struct.Guild.id()) :: :ok | {:error, String.t()}

Stops the current sound being played in a voice channel.

The bot must be connected to a voice channel in the guild specified.

Parameters

  • guild_id - ID of guild whose voice channel the sound will be stopped in.

Returns {:error, reason} if unable to stop or no sound is playing, else :ok.

If a sound has finished playing, this function does not need to be called to start playing another sound.

Examples

iex> Nostrum.Voice.join_channel(123456789, 420691337)

iex> Nostrum.Voice.play(123456789, "http://brandthill.com/files/weird_dubstep_noises.mp3")

iex> Nostrum.Voice.stop(123456789)

stop_listen_async(guild_id)

(since 0.6.0)
@spec stop_listen_async(Nostrum.Struct.Guild.id()) :: :ok | {:error, term()}

Stop asynchronously receiving events for incoming RTP packets for an active voice session.