OpenrouterSdk.Audio (OpenRouter SDK v0.1.4)

high-level audio helpers built on top of /chat/completions.

why this module exists alongside Api.Speech and Api.Transcription: OpenRouter's dedicated /audio/speech and /audio/transcriptions endpoints accept only a fixed allowlist of model slugs, and those slugs aren't exposed via /models, so consumers can't discover them through the catalog. transcription via multipart upload is also broken at the gateway level: requests fail with "No number after minus sign in JSON at position 1", an error from V8's JSON parser choking on the --<boundary> line that opens the multipart body.

the catalog does list audio-input and audio-output chat models (gpt-audio, gpt-audio-mini, gemini-2.5-flash, voxtral, etc.), and the documented /chat/completions paths work against them: input_audio content parts for speech-to-text on audio-input models, and the audio output modality for text-to-speech on audio-output models. that's what this module wraps.

transcribe/2: audio → text, with a :model taken from OpenrouterSdk.Catalog.Models.audio_input_models/0
speak/2: text → audio, with a :model taken from OpenrouterSdk.Catalog.Models.tts_models/0
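the docs don't say how those two catalog functions decide which models qualify. a minimal sketch of one way to do it, assuming each catalog entry is a decoded OpenRouter /models object carrying architecture.input_modalities and architecture.output_modalities lists; the data shape is an assumption and CatalogFilterSketch is a hypothetical module, not this SDK's code.

```elixir
defmodule CatalogFilterSketch do
  @moduledoc """
  Hypothetical sketch: selects audio-capable model ids from decoded /models
  entries by their declared modalities.
  """

  # assumption: entries look like %{"id" => ..., "architecture" =>
  # %{"input_modalities" => [...], "output_modalities" => [...]}}
  def audio_input_ids(models), do: ids_with(models, "input_modalities", "audio")
  def audio_output_ids(models), do: ids_with(models, "output_modalities", "audio")

  # entries without the expected keys simply fail the generator pattern and are skipped
  defp ids_with(models, key, modality) do
    for %{"id" => id, "architecture" => %{^key => modalities}} when is_list(modalities) <- models,
        modality in modalities,
        do: id
  end
end
```

filtering on declared modalities rather than a hardcoded list of slugs is what lets new audio models show up without an SDK release.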

Functions

speak(payload, opts \\ [])

@spec speak(
  map(),
  keyword()
) :: {:ok, binary()} | {:error, term()}

generate speech audio from text via /chat/completions with an audio output modality.

```elixir
# speak/2 returns the decoded audio bytes; :format defaults to "mp3"
{:ok, mp3_binary} =
  OpenrouterSdk.Audio.speak(%{
    text: "hello there",
    model: "openai/gpt-audio-mini"
  })

File.write!("hello.mp3", mp3_binary)
```

payload keys

:text (required): the text to read aloud
:model (required): an audio-output chat model id, e.g. one from OpenrouterSdk.Catalog.Models.tts_models/0
:voice (optional): defaults to "alloy"; accepts whatever voices the chosen model's provider supports
:format (optional): defaults to "mp3"; accepts "mp3", "wav", "opus", "flac", etc., depending on the provider
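the keys above don't show the request they become. a minimal sketch of how such a payload could map onto an OpenAI-compatible /chat/completions body with audio output, assuming the "modalities" and "audio" request fields used by OpenAI's audio chat models; SpeakRequestSketch and its system instruction are hypothetical, not this SDK's implementation.

```elixir
defmodule SpeakRequestSketch do
  @moduledoc """
  Hypothetical sketch: turns a speak/2-style payload into a /chat/completions
  request body that asks for audio output.
  """

  @default_voice "alloy"
  @default_format "mp3"

  # hypothetical instruction so the model reads the text instead of replying to it
  @system_prompt "Read the user's message aloud verbatim. Do not add anything."

  def build(payload) when is_map(payload) do
    # :text and :model are required; Map.fetch!/2 raises KeyError when either is missing
    text = Map.fetch!(payload, :text)
    model = Map.fetch!(payload, :model)

    %{
      "model" => model,
      # assumption: audio output is requested alongside text, as with OpenAI's audio models
      "modalities" => ["text", "audio"],
      "audio" => %{
        "voice" => Map.get(payload, :voice, @default_voice),
        "format" => Map.get(payload, :format, @default_format)
      },
      "messages" => [
        %{"role" => "system", "content" => @system_prompt},
        %{"role" => "user", "content" => text}
      ]
    }
  end
end
```

the Map.get/3 calls carry the documented "alloy" and "mp3" defaults, while Map.fetch!/2 makes a missing :text or :model fail loudly instead of sending a half-built request.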

returns the raw audio bytes: the helper base64-decodes the audio in the response for you, so the binary can be written straight to a file.
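the decoding step can be sketched as below, assuming the decoded response follows OpenAI's chat completion shape with base64 audio at choices[0].message.audio.data; SpeakResponseSketch is a hypothetical name, and the real helper may read the response differently.

```elixir
defmodule SpeakResponseSketch do
  @moduledoc """
  Hypothetical sketch: pulls base64 audio out of a decoded chat completion
  response and returns the raw bytes.
  """

  # assumption: OpenAI-style response shape, choices[0].message.audio.data
  def audio_bytes(%{"choices" => [%{"message" => %{"audio" => %{"data" => b64}}} | _]})
      when is_binary(b64) do
    case Base.decode64(b64) do
      {:ok, bytes} -> {:ok, bytes}
      :error -> {:error, :invalid_base64}
    end
  end

  # anything else (no choices, text-only reply, missing audio) is reported as an error
  def audio_bytes(_other), do: {:error, :no_audio_in_response}
end
```

Base.decode64/1 returns :error instead of raising, so a malformed payload becomes an {:error, _} tuple that fits the documented {:ok, binary()} | {:error, term()} return type.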

transcribe(payload, opts \\ [])

@spec transcribe(
  map(),
  keyword()
) :: {:ok, String.t()} | {:error, term()}

transcribe a binary audio clip via /chat/completions with an input_audio content part.

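```elixir
# the transcript's exact wording, casing and punctuation depend on the model,
# so bind the result instead of matching a literal string
{:ok, text} =
  OpenrouterSdk.Audio.transcribe(%{
    audio: File.read!("clip.webm"),
    mime: "audio/webm",
    model: "google/gemini-2.5-flash"
  })

# text holds the transcript, e.g. "hello world"
```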
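turning the response into that {:ok, text} tuple can be sketched as follows, assuming the transcript comes back in choices[0].message.content, either as a plain string or as a list of text parts; TranscriptSketch is hypothetical and not part of the SDK.

```elixir
defmodule TranscriptSketch do
  @moduledoc """
  Hypothetical sketch: extracts the transcript text from a decoded chat
  completion response.
  """

  # assumption: content is a string, or a list of %{"type" => "text", "text" => ...} parts
  def text(%{"choices" => [%{"message" => %{"content" => content}} | _]}) do
    case content do
      s when is_binary(s) ->
        {:ok, String.trim(s)}

      parts when is_list(parts) ->
        # keep only text parts; other part types are skipped by the generator pattern
        texts = for %{"type" => "text", "text" => t} when is_binary(t) <- parts, do: t
        {:ok, texts |> Enum.join() |> String.trim()}

      _ ->
        {:error, :unexpected_content}
    end
  end

  def text(_other), do: {:error, :no_choices}
end
```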

payload keys

:audio (required): the raw audio bytes
:mime (required): the clip's content type; audio/webm, audio/mp4, audio/wav, audio/mpeg, audio/ogg and audio/flac all work
:model (required): an audio-input chat model id, e.g. one from OpenrouterSdk.Catalog.Models.audio_input_models/0
:prompt (optional): replaces the default instruction, which asks for a verbatim transcript. use it when you want translation, formatting, etc.
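a minimal sketch of the request such a payload could become, assuming an OpenAI-compatible input_audio content part that carries base64 data plus a short format name. the mime-to-format table, the default prompt wording and the TranscribeRequestSketch module are all hypothetical; the real module may encode things differently.

```elixir
defmodule TranscribeRequestSketch do
  @moduledoc """
  Hypothetical sketch: turns a transcribe/2-style payload into a
  /chat/completions request body with an input_audio content part.
  """

  # assumption: input_audio expects a short format name rather than a MIME type
  @mime_to_format %{
    "audio/webm" => "webm",
    "audio/mp4" => "mp4",
    "audio/wav" => "wav",
    "audio/mpeg" => "mp3",
    "audio/ogg" => "ogg",
    "audio/flac" => "flac"
  }

  # hypothetical default instruction; the module's real wording isn't documented
  @default_prompt "Transcribe this audio verbatim. Reply with the transcript only."

  def build(payload) when is_map(payload) do
    audio = Map.fetch!(payload, :audio)
    model = Map.fetch!(payload, :model)

    # drop parameters such as ";codecs=opus" before looking up the format
    base_mime =
      payload |> Map.fetch!(:mime) |> String.split(";") |> hd() |> String.trim() |> String.downcase()

    with {:ok, format} <- Map.fetch(@mime_to_format, base_mime) do
      content = [
        %{"type" => "text", "text" => Map.get(payload, :prompt, @default_prompt)},
        %{"type" => "input_audio", "input_audio" => %{"data" => Base.encode64(audio), "format" => format}}
      ]

      {:ok, %{"model" => model, "messages" => [%{"role" => "user", "content" => content}]}}
    else
      :error -> {:error, {:unsupported_mime, base_mime}}
    end
  end
end
```

stripping MIME parameters matters in practice because browser recordings from MediaRecorder are commonly labelled audio/webm;codecs=opus rather than plain audio/webm.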

the second argument is forwarded to OpenrouterSdk.Api.Chat.completions/2, so you can pass :config_overrides, custom middleware, etc.