# `OpenrouterSdk.Audio`
[🔗](https://github.com/sf-voice/openrouter-elixir-sdk/blob/v0.1.4/lib/openrouter_sdk/audio.ex#L1)

high-level audio helpers built on top of `/chat/completions`.

why this module exists alongside `Api.Speech` and `Api.Transcription`:
the dedicated `/audio/speech` and `/audio/transcriptions` endpoints
on openrouter accept a fixed allowlist of model slugs that aren't
exposed via `/models`, so consumers can't discover them via the
catalog. multipart transcription is also broken at the gateway
level: the gateway appears to parse the multipart body as JSON, so
you'll see `No number after minus sign in JSON at position 1` from
v8 choking on the leading `--<boundary>`.

the catalog *does* list audio-input and audio-output chat models —
`gpt-audio`, `gpt-audio-mini`, `gemini-2.5-flash`, `voxtral`, etc. —
and the documented `/chat/completions` paths, `input_audio` for
speech-to-text (stt) and the `audio` output modality for
text-to-speech (tts), work against whichever of them supports the
matching direction. that's what this module wraps.

  * `transcribe/2` (audio → text) expects a model id such as those in
    `OpenrouterSdk.Catalog.Models.audio_input_models/0`
  * `speak/2` (text → audio) expects a model id such as those in
    `OpenrouterSdk.Catalog.Models.tts_models/0`
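
for orientation, here is a minimal sketch of what the two `/chat/completions` request bodies might look like. the field names follow the openai-style chat completions schema and are an assumption; the module name `AudioPayloadSketch` is hypothetical, and the bodies this sdk actually sends may differ.

```elixir
defmodule AudioPayloadSketch do
  @moduledoc false

  # speech-to-text: the clip travels as a base64 `input_audio` content part
  # next to a text instruction. field names are assumed, not taken from the sdk.
  def stt_body(model, audio_bytes, format, prompt) do
    %{
      model: model,
      messages: [
        %{
          role: "user",
          content: [
            %{type: "text", text: prompt},
            %{
              type: "input_audio",
              input_audio: %{data: Base.encode64(audio_bytes), format: format}
            }
          ]
        }
      ]
    }
  end

  # text-to-speech: ask for the audio output modality and name a voice and
  # container format. the defaults mirror the ones documented for `speak/2`.
  def tts_body(model, text, voice \\ "alloy", format \\ "mp3") do
    %{
      model: model,
      modalities: ["text", "audio"],
      audio: %{voice: voice, format: format},
      messages: [%{role: "user", content: text}]
    }
  end
end
```

the point of the sketch is only that both directions are ordinary chat requests: stt adds a content part, tts adds two top-level keys.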

# `speak`

```elixir
@spec speak(
  map(),
  keyword()
) :: {:ok, binary()} | {:error, term()}
```

generate speech audio from text via `/chat/completions` with an
audio output modality.

    {:ok, mp3_binary} =
      OpenrouterSdk.Audio.speak(%{
        text: "hello there",
        model: "openai/gpt-audio-mini"
      })

    File.write!("hello.mp3", mp3_binary)

## options on the payload
  * `:text` (required) — the text to read aloud
  * `:model` (required) — an audio-output chat model id (e.g.
    from `OpenrouterSdk.Catalog.Models.tts_models/0`)
  * `:voice` — defaults to `"alloy"`. accepts whatever the chosen
    model's provider supports
  * `:format` — defaults to `"mp3"`. accepts `"mp3"`, `"wav"`,
    `"opus"`, `"flac"` etc. depending on provider

returns the raw audio bytes — the helper base64-decodes the
response audio for you.
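
to illustrate the decoding step, here is a rough sketch of pulling the audio out of a decoded chat completion. the `"choices"` → `"message"` → `"audio"` → `"data"` path is assumed from the openai-style response shape, and `AudioDecodeSketch` is a hypothetical name; it is not the helper's actual implementation.

```elixir
defmodule AudioDecodeSketch do
  @moduledoc false

  # match the first choice and take its base64 audio payload.
  # the response path is an assumption, not confirmed by the sdk source.
  def extract_audio(%{"choices" => [%{"message" => %{"audio" => %{"data" => b64}}} | _]})
      when is_binary(b64) do
    # Base.decode64/1 returns {:ok, binary} or the bare atom :error
    case Base.decode64(b64) do
      {:ok, bytes} -> {:ok, bytes}
      :error -> {:error, :invalid_base64}
    end
  end

  # anything else (text-only reply, error body, empty choices) has no audio
  def extract_audio(_other), do: {:error, :no_audio_in_response}
end
```

returning tagged tuples keeps the sketch in line with the `{:ok, binary()} | {:error, term()}` spec above.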

# `transcribe`

```elixir
@spec transcribe(
  map(),
  keyword()
) :: {:ok, String.t()} | {:error, term()}
```

transcribe a binary audio clip via `/chat/completions` with an
`input_audio` content part.

    {:ok, "hello world"} =
      OpenrouterSdk.Audio.transcribe(%{
        audio: File.read!("clip.webm"),
        mime: "audio/webm",
        model: "google/gemini-2.5-flash"
      })

## options on the payload
  * `:audio` (required) — the raw audio bytes
  * `:mime` (required) — content type. `audio/webm`, `audio/mp4`,
    `audio/wav`, `audio/mpeg`, `audio/ogg`, `audio/flac` all work
  * `:model` (required) — an audio-input chat model id
  * `:prompt` (optional) — overrides the default verbatim
    instruction. use this if you want translation, formatting, etc.

the second argument is forwarded to `OpenrouterSdk.Api.Chat.completions/2`
so you can pass `:config_overrides`, custom middleware, etc.
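
the payload options above can be assembled from a file path. the sketch below is one way to do that; `TranscribeFileSketch` and its extension table are hypothetical and only cover the mime types listed above.

```elixir
defmodule TranscribeFileSketch do
  @moduledoc false

  # hypothetical extension -> mime table, limited to the types listed above
  @mimes %{
    ".webm" => "audio/webm",
    ".mp4" => "audio/mp4",
    ".wav" => "audio/wav",
    ".mp3" => "audio/mpeg",
    ".ogg" => "audio/ogg",
    ".flac" => "audio/flac"
  }

  # builds the map expected as the first argument of `transcribe/2`.
  # `:prompt` is only added when the caller supplies one.
  def payload(path, model, prompt \\ nil) do
    ext = path |> Path.extname() |> String.downcase()

    with {:ok, mime} <- Map.fetch(@mimes, ext),
         {:ok, audio} <- File.read(path) do
      base = %{audio: audio, mime: mime, model: model}
      {:ok, if(prompt, do: Map.put(base, :prompt, prompt), else: base)}
    else
      # Map.fetch/2 returns the bare atom :error on a missing key
      :error -> {:error, {:unsupported_extension, ext}}
      {:error, reason} -> {:error, {:file, reason}}
    end
  end
end
```

a successful result can be passed straight to `OpenrouterSdk.Audio.transcribe/2`, for example with a prompt such as `"translate to english"` in place of the default instruction.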

