Audio utilities for Live API.
Provides helper functions for working with audio data in the Live API. The Live API uses specific audio formats for input and output.
Audio Formats
- Input: 16-bit PCM, 16kHz, mono
- Output: 16-bit PCM, 24kHz, mono
Usage
# Create an audio blob for sending
blob = Audio.create_input_blob(pcm_data)
Session.send_realtime_input(session, audio: blob)
# Decode audio from server response
pcm_data = Audio.decode_output(base64_data)
Sample Rates
The different sample rates for input and output mean that you may need to resample audio when recording from or playing to standard audio devices.
- Input: 16kHz (16,000 samples per second)
- Output: 24kHz (24,000 samples per second)
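As a rough illustration of the resampling step mentioned above, the following sketch converts between sample rates by linear interpolation. `ResampleSketch.resample/3` is a hypothetical helper, not part of this module, and it assumes mono, 16-bit signed, little-endian PCM (the byte order is an assumption, not something stated here). It applies no low-pass filter, so downsampling may alias; a dedicated audio library is likely a better choice for production use.

```elixir
defmodule ResampleSketch do
  # Hypothetical helper for illustration; not part of the Audio module.
  # Assumes mono, 16-bit signed, little-endian PCM (requires Elixir 1.12+).
  def resample(pcm, from_rate, to_rate)
      when is_binary(pcm) and from_rate > 0 and to_rate > 0 do
    # Decode each 2-byte sample into an integer; a tuple gives O(1) lookups.
    samples = for <<s::little-signed-16 <- pcm>>, do: s
    lookup = List.to_tuple(samples)
    in_count = tuple_size(lookup)
    out_count = div(in_count * to_rate, from_rate)

    # The //1 step keeps the range empty when out_count is 0.
    for i <- 0..(out_count - 1)//1, into: <<>> do
      # Position of output sample i on the input timeline.
      pos = i * from_rate / to_rate
      idx = trunc(pos)
      frac = pos - idx
      a = elem(lookup, idx)
      b = elem(lookup, min(idx + 1, in_count - 1))
      value = round(a + (b - a) * frac)
      <<value::little-signed-16>>
    end
  end
end
```

With a helper like this, output audio could be passed through `ResampleSketch.resample(pcm, 24_000, 16_000)`, or microphone audio captured at another rate could be brought down to 16kHz before sending.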
Types
Functions
@spec bytes_for_duration(non_neg_integer(), pos_integer()) :: non_neg_integer()
Calculates the byte size needed for a given duration of audio.
Parameters
- duration_ms - Duration in milliseconds
- sample_rate - Sample rate (default: input_sample_rate)
Returns
Number of bytes needed for the given duration.
Example
# Get bytes needed for 100ms of input audio
bytes = Audio.bytes_for_duration(100)
#=> 3200 # (16000 samples/sec * 0.1 sec * 2 bytes/sample)
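The arithmetic in the comment above can be sketched as follows. This is an illustrative reimplementation under the stated format (16-bit mono PCM, so 2 bytes per sample), not the library's actual code; `AudioMathSketch` is a hypothetical module name.

```elixir
defmodule AudioMathSketch do
  # Hypothetical module for illustration; not the library's implementation.
  @bytes_per_sample 2
  @input_sample_rate 16_000

  # bytes = samples/sec * seconds * bytes/sample.
  # Whole samples are computed first so the result is always sample-aligned.
  def bytes_for_duration(duration_ms, sample_rate \\ @input_sample_rate) do
    div(sample_rate * duration_ms, 1000) * @bytes_per_sample
  end
end

IO.inspect(AudioMathSketch.bytes_for_duration(100))
#=> 3200
IO.inspect(AudioMathSketch.bytes_for_duration(100, 24_000))
#=> 4800
```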
@spec chunk_audio(binary(), pos_integer(), pos_integer()) :: [binary()]
Splits audio data into chunks of specified duration.
Useful for streaming audio to the Live API in appropriately-sized chunks.
Parameters
- pcm_data - Raw PCM audio data
- chunk_duration_ms - Duration of each chunk in milliseconds
- sample_rate - Sample rate (default: input_sample_rate)
Returns
List of binary chunks, each containing audio for the specified duration. The last chunk may be shorter if the audio doesn't divide evenly.
Example
# Split audio into 100ms chunks for streaming
chunks = Audio.chunk_audio(pcm_data, 100)
Enum.each(chunks, fn chunk ->
  blob = Audio.create_input_blob(chunk)
  Session.send_realtime_input(session, audio: blob)
end)
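To make the chunking behavior concrete, here is a minimal sketch of how a binary could be split into fixed-duration pieces with binary pattern matching. `ChunkSketch` is a hypothetical module; the real function may be implemented differently. It assumes 16-bit mono PCM and clamps the chunk size to at least one sample so the recursion always makes progress.

```elixir
defmodule ChunkSketch do
  # Hypothetical module for illustration; not the library's implementation.
  @bytes_per_sample 2

  def chunk_audio(pcm_data, chunk_duration_ms, sample_rate \\ 16_000)
      when is_binary(pcm_data) and chunk_duration_ms > 0 do
    bytes = div(sample_rate * chunk_duration_ms, 1000) * @bytes_per_sample
    # Never use a chunk smaller than one sample, or the split would not advance.
    do_chunk(pcm_data, max(bytes, @bytes_per_sample), [])
  end

  # No data left: return the chunks in their original order.
  defp do_chunk(<<>>, _size, acc), do: Enum.reverse(acc)

  # Remaining data fits in one chunk: it becomes the (possibly shorter) last chunk.
  defp do_chunk(data, size, acc) when byte_size(data) <= size,
    do: Enum.reverse([data | acc])

  # Otherwise peel off one full chunk and continue with the rest.
  defp do_chunk(data, size, acc) do
    <<chunk::binary-size(size), rest::binary>> = data
    do_chunk(rest, size, [chunk | acc])
  end
end

# 250ms of silence at 16kHz splits into two full chunks and one short one.
silence = :binary.copy(<<0, 0>>, 4000)
IO.inspect(Enum.map(ChunkSketch.chunk_audio(silence, 100), &byte_size/1))
#=> [3200, 3200, 1600]
```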
@spec create_input_blob(binary(), keyword()) :: audio_blob()
Creates an audio blob for sending to the Live API.
Takes raw PCM audio data (16-bit, 16kHz, mono) and returns
a properly formatted blob for use with Session.send_realtime_input/2.
Parameters
- pcm_data - Raw PCM audio data as binary (16-bit, 16kHz, mono)
- opts - Optional keyword list:
  - :encode - Whether to base64 encode the data (default: false)
Returns
A map with :data and :mime_type keys suitable for the Live API.
Examples
# With raw binary data
blob = Audio.create_input_blob(pcm_data)
Session.send_realtime_input(session, audio: blob)
# With pre-encoding (if you want to send encoded data)
blob = Audio.create_input_blob(pcm_data, encode: true)
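The shape of the returned blob can be pictured with the sketch below. It assumes only what is stated on this page: a map with `:data` and `:mime_type` keys, the input MIME type `"audio/pcm;rate=16000"`, and an `:encode` option defaulting to false. `BlobSketch` is a hypothetical module, and the real `audio_blob()` type may carry additional details.

```elixir
defmodule BlobSketch do
  # Hypothetical module for illustration; not the library's implementation.
  @input_mime_type "audio/pcm;rate=16000"

  def create_input_blob(pcm_data, opts \\ []) when is_binary(pcm_data) do
    # Base64 encode only when the caller asks for it.
    data =
      if Keyword.get(opts, :encode, false) do
        Base.encode64(pcm_data)
      else
        pcm_data
      end

    %{data: data, mime_type: @input_mime_type}
  end
end

IO.inspect(BlobSketch.create_input_blob(<<1, 2>>, encode: true))
#=> %{data: "AQI=", mime_type: "audio/pcm;rate=16000"}
```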
Decodes audio data from a server response.
The Live API returns audio data as base64-encoded strings. This function decodes them back to raw PCM data.
Parameters
- base64_data - Base64-encoded audio data from server response
Returns
Raw PCM audio data as binary (16-bit, 24kHz, mono).
Example
# From a server response part
audio_data =
  response.server_content.model_turn.parts
  |> Enum.find(& &1.inline_data)
  |> Map.get(:inline_data)
  |> Map.get(:data)
  |> Audio.decode_output()
Safely decodes audio data, returning an error tuple on failure.
Parameters
- base64_data - Base64-encoded audio data
Returns
- {:ok, binary} - Successfully decoded audio data
- {:error, reason} - Decoding failed
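A tuple-returning decoder of this kind can be sketched with the standard library's `Base.decode64/1`, which returns `{:ok, binary}` or `:error`. The module name, function name, and error reasons below are hypothetical and chosen only for illustration; the library's own function and reasons may differ.

```elixir
defmodule DecodeSketch do
  # Hypothetical module and function names; not the library's implementation.
  def safe_decode(base64_data) when is_binary(base64_data) do
    case Base.decode64(base64_data) do
      {:ok, pcm} -> {:ok, pcm}
      # Base.decode64/1 reports failure as a bare :error atom.
      :error -> {:error, :invalid_base64}
    end
  end

  # Anything that is not a binary cannot be base64 text.
  def safe_decode(_other), do: {:error, :not_a_binary}
end

IO.inspect(DecodeSketch.safe_decode("AQI="))
#=> {:ok, <<1, 2>>}
IO.inspect(DecodeSketch.safe_decode("not base64!"))
#=> {:error, :invalid_base64}
```

Returning a tuple lets callers pattern match on the result in a `case` or `with` expression instead of rescuing an exception.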
@spec duration_ms(binary(), pos_integer()) :: non_neg_integer()
Calculates the duration of audio data in milliseconds.
Parameters
- pcm_data - Raw PCM audio data (16-bit samples)
- sample_rate - Sample rate of the audio (default: input_sample_rate)
Returns
Duration in milliseconds as an integer.
Example
# Calculate duration of input audio
duration_ms = Audio.duration_ms(pcm_data)
# Calculate duration of output audio
duration_ms = Audio.duration_ms(output_data, Audio.output_sample_rate())
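The duration calculation is the inverse of the byte-size calculation: divide the byte count by 2 to get samples, then convert samples to milliseconds. The sketch below assumes 16-bit mono PCM and integer truncation; `DurationSketch` is a hypothetical module, and the library may round differently.

```elixir
defmodule DurationSketch do
  # Hypothetical module for illustration; not the library's implementation.
  @bytes_per_sample 2

  def duration_ms(pcm_data, sample_rate \\ 16_000)
      when is_binary(pcm_data) and sample_rate > 0 do
    samples = div(byte_size(pcm_data), @bytes_per_sample)
    # samples / (samples per second) gives seconds; scale to milliseconds first
    # so the integer division does not lose precision.
    div(samples * 1000, sample_rate)
  end
end

# 3200 bytes of input audio (16kHz) and 4800 bytes of output audio (24kHz)
# both represent 100ms.
IO.inspect(DurationSketch.duration_ms(:binary.copy(<<0>>, 3200)))
#=> 100
IO.inspect(DurationSketch.duration_ms(:binary.copy(<<0>>, 4800), 24_000))
#=> 100
```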
@spec input_mime_type() :: String.t()
Returns the expected input MIME type for audio.
Example
Audio.input_mime_type()
#=> "audio/pcm;rate=16000"
@spec input_sample_rate() :: pos_integer()
Returns the input sample rate (16kHz).
The Live API expects input audio at 16kHz sample rate.
Example
Audio.input_sample_rate()
#=> 16000
@spec output_mime_type() :: String.t()
Returns the output MIME type for audio.
Example
Audio.output_mime_type()
#=> "audio/pcm;rate=24000"
@spec output_sample_rate() :: pos_integer()
Returns the output sample rate (24kHz).
The Live API returns audio at 24kHz sample rate.
Example
Audio.output_sample_rate()
#=> 24000