high-level audio helpers built on top of /chat/completions.
why this module exists alongside Api.Speech and Api.Transcription:
the dedicated /audio/speech and /audio/transcriptions endpoints
on openrouter accept a fixed allowlist of model slugs that aren't
exposed via /models, so consumers can't discover them via the
catalog. transcription via multipart is also broken at the gateway
level: you'll see "No number after minus sign in JSON at position 1",
which is v8 choking on the leading --<boundary> of the multipart body.
the catalog does list audio-input and audio-output chat models —
gpt-audio, gpt-audio-mini, gemini-2.5-flash, voxtral, etc. —
and the documented /chat/completions input_audio (stt) and
audio modality (tts) paths work against any of them. that's what
this module wraps.
transcribe/2 - audio → text; choose the model from OpenrouterSdk.Catalog.Models.audio_input_models/0
speak/2 - text → audio; choose the model from OpenrouterSdk.Catalog.Models.tts_models/0
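As a rough illustration of the two /chat/completions paths described above, the sketch below builds request bodies for each. `AudioRequestSketch` is a hypothetical module, not part of OpenrouterSdk, and the field names (`input_audio`, `modalities`, `audio`) follow the OpenAI-style chat completions schema as an assumption; the source does not show what this module actually sends on the wire.

```elixir
defmodule AudioRequestSketch do
  # Hypothetical helper, not part of OpenrouterSdk: builds JSON-ready maps
  # in the shape the two /chat/completions audio paths are assumed to take.

  # speech-to-text: the clip travels as a base64 `input_audio` content part
  # next to a text instruction.
  def stt_body(model, audio_bytes, format, prompt) do
    %{
      model: model,
      messages: [
        %{
          role: "user",
          content: [
            %{type: "text", text: prompt},
            %{
              type: "input_audio",
              input_audio: %{data: Base.encode64(audio_bytes), format: format}
            }
          ]
        }
      ]
    }
  end

  # text-to-speech: ask for the audio modality plus a voice and format.
  def tts_body(model, text, voice \\ "alloy", format \\ "mp3") do
    %{
      model: model,
      modalities: ["text", "audio"],
      audio: %{voice: voice, format: format},
      messages: [%{role: "user", content: text}]
    }
  end
end
```

Both functions only build maps, so they can be inspected in iex without touching the network.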
Summary
Functions
speak/2 - generate speech audio from text via /chat/completions with an
audio output modality.
transcribe/2 - transcribe a binary audio clip via chat completions + input_audio.
Functions
speak/2
generate speech audio from text via /chat/completions with an
audio output modality.
    {:ok, mp3_binary} =
      OpenrouterSdk.Audio.speak(%{
        text: "hello there",
        model: "openai/gpt-audio-mini"
      })

    File.write!("hello.mp3", mp3_binary)

options on the payload
:text (required) - the text to read aloud
:model (required) - an audio-output chat model id (e.g. from OpenrouterSdk.Catalog.Models.tts_models/0)
:voice - defaults to "alloy". accepts whatever the chosen model's provider supports
:format - defaults to "mp3". accepts "mp3", "wav", "opus", "flac", etc., depending on provider
returns the raw audio bytes — the helper base64-decodes the response audio for you.
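To make the base64 step concrete, here is a minimal sketch of pulling audio bytes out of an already-decoded chat completion. `AudioResponseSketch` is hypothetical, and the `"choices" -> "message" -> "audio" -> "data"` path is an assumption based on the OpenAI-style response shape; the helper's real internals are not shown in the source.

```elixir
defmodule AudioResponseSketch do
  # Hypothetical helper: extracts base64 audio from a decoded chat completion.
  # The nested key path matched below is an assumption, not the SDK's code.
  def extract_audio(%{"choices" => [%{"message" => %{"audio" => %{"data" => b64}}} | _]})
      when is_binary(b64) do
    case Base.decode64(b64) do
      {:ok, bytes} -> {:ok, bytes}
      :error -> {:error, :invalid_base64}
    end
  end

  # Anything without an audio payload in the first choice is reported as such.
  def extract_audio(_other), do: {:error, :no_audio}
end
```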
transcribe/2
transcribe a binary audio clip via chat completions + input_audio.
    {:ok, "hello world"} =
      OpenrouterSdk.Audio.transcribe(%{
        audio: File.read!("clip.webm"),
        mime: "audio/webm",
        model: "google/gemini-2.5-flash"
      })

options on the payload
:audio (required) - the raw audio bytes
:mime (required) - content type. audio/webm, audio/mp4, audio/wav, audio/mpeg, audio/ogg, audio/flac all work
:model (required) - an audio-input chat model id
:prompt (optional) - overrides the default verbatim instruction. use this if you want translation, formatting, etc.
the second argument is forwarded to OpenrouterSdk.Api.Chat.completions/2
so you can pass :config_overrides, custom middleware, etc.
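Since :mime is required, callers often need to derive it from a file name. The sketch below is one way to assemble a transcribe/2 payload, with an optional :prompt override. `AudioMimeSketch` and both of its functions are hypothetical, and the extension table is an assumption that covers only the content types listed above.

```elixir
defmodule AudioMimeSketch do
  # Hypothetical helper: maps a file extension to one of the content types
  # listed for :mime. The extension table itself is an assumption.
  @mime_by_ext %{
    ".webm" => "audio/webm",
    ".mp4" => "audio/mp4",
    ".m4a" => "audio/mp4",
    ".wav" => "audio/wav",
    ".mp3" => "audio/mpeg",
    ".ogg" => "audio/ogg",
    ".flac" => "audio/flac"
  }

  # Returns {:ok, mime} or :error for an unknown extension.
  def mime_for(path) do
    ext = path |> Path.extname() |> String.downcase()
    Map.fetch(@mime_by_ext, ext)
  end

  # Builds a transcribe/2 payload; :prompt is only set when one is given.
  def payload(path, audio_bytes, model, prompt \\ nil) do
    with {:ok, mime} <- mime_for(path) do
      base = %{audio: audio_bytes, mime: mime, model: model}
      {:ok, if(prompt, do: Map.put(base, :prompt, prompt), else: base)}
    end
  end
end
```

The resulting map can be handed to OpenrouterSdk.Audio.transcribe as its first argument; a prompt such as "translate to english" replaces the default verbatim instruction.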