ExOpenAI.Realtime (ex_openai.ex v1.8.0)

Modules for interacting with the realtime group of OpenAI APIs

API Reference: https://platform.openai.com/docs/api-reference/realtime

Functions

create-realtime-session(opts \\ [])

@spec create-realtime-session(
    base_url: String.t(),
    openai_organization_key: String.t(),
    openai_api_key: String.t(),
    voice: :verse | :shimmer | :sage | :echo | :coral | :ballad | :ash | :alloy,
    turn_detection: %{
      create_response: boolean(),
      interrupt_response: boolean(),
      prefix_padding_ms: integer(),
      silence_duration_ms: integer(),
      threshold: float(),
      type: String.t()
    },
    tools: [
      %{
        description: String.t(),
        name: String.t(),
        parameters: map(),
        type: :function
      }
    ],
    tool_choice: String.t(),
    temperature: float(),
    output_audio_format: :g711_alaw | :g711_ulaw | :pcm16,
    model:
      :"gpt-4o-mini-realtime-preview-2024-12-17"
      | :"gpt-4o-mini-realtime-preview"
      | :"gpt-4o-realtime-preview-2024-12-17"
      | :"gpt-4o-realtime-preview-2024-10-01"
      | :"gpt-4o-realtime-preview",
    modalities: [:audio | :text],
    max_response_output_tokens: :inf | integer(),
    instructions: String.t(),
    input_audio_transcription: %{
      language: String.t(),
      model: String.t(),
      prompt: String.t()
    },
    input_audio_format: :g711_alaw | :g711_ulaw | :pcm16,
    stream_to: (... -> any()) | pid()
  ) ::
  {:ok, ExOpenAI.Components.RealtimeSessionCreateResponse.t()} | {:error, any()}

Create an ephemeral API token for use in client-side applications with the Realtime API. Can be configured with the same session parameters as the session.update client event.

It responds with a session object, plus a client_secret key which contains a usable ephemeral API token that can be used to authenticate browser clients for the Realtime API.
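As a minimal sketch of handling that return value, the helper below pulls the ephemeral token out of a successful result. It assumes the decoded response exposes client_secret as a map (or struct) with the atom keys :value and :expires_at, which are the field names used by the OpenAI REST API; how ExOpenAI actually decodes this field has not been verified here. The module EphemeralToken and its extract/1 function are hypothetical and not part of ExOpenAI.

```elixir
defmodule EphemeralToken do
  @moduledoc false

  # Hypothetical helper, not part of ExOpenAI.
  # Assumption: client_secret is a map or struct with :value and :expires_at keys.
  def extract({:ok, %{client_secret: %{value: token} = secret}}) when is_binary(token) do
    {:ok, %{token: token, expires_at: Map.get(secret, :expires_at)}}
  end

  # A successful call whose body carries no usable client_secret.
  def extract({:ok, _session}), do: {:error, :missing_client_secret}

  # Pass library or HTTP errors through unchanged.
  def extract({:error, reason}), do: {:error, reason}
end
```

The result of the session-creation call documented here can be passed straight to EphemeralToken.extract/1. Only the extracted token, never the long-lived API key, should be handed to the browser client.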

Endpoint: https://api.openai.com/v1/realtime/sessions

Method: POST

Docs: https://platform.openai.com/docs/api-reference/realtime

Required Arguments: none

Optional Arguments:

stream_to: PID or function to stream content to.
input_audio_format: The format of input audio. Options are pcm16, g711_ulaw, or g711_alaw. For pcm16, input audio must be 16-bit PCM at a 24kHz sample rate, single channel (mono), and little-endian byte order.
input_audio_transcription: Configuration for input audio transcription. Defaults to off, and can be set to null to turn it off once enabled. Input audio transcription is not native to the model, since the model consumes audio directly. Transcription runs asynchronously through OpenAI Whisper transcription and should be treated as rough guidance rather than the representation understood by the model. The client can optionally set the language and prompt for transcription; these fields will be passed to the Whisper API.
instructions: The default system instructions (i.e. system message) prepended to model calls. This field allows the client to guide the model on desired responses. The model can be instructed on response content and format (e.g. "be extremely succinct", "act friendly", "here are examples of good responses") and on audio behavior (e.g. "talk quickly", "inject emotion into your voice", "laugh frequently"). The instructions are not guaranteed to be followed by the model, but they provide guidance to the model on the desired behavior. Note that the server sets default instructions, which are used if this field is not set and are visible in the session.created event at the start of the session.
max_response_output_tokens: Maximum number of output tokens for a single assistant response, inclusive of tool calls. Provide an integer between 1 and 4096 to limit output tokens, or inf for the maximum available tokens for a given model. Defaults to inf.
modalities: The set of modalities the model can respond with. To disable audio, set this to ["text"].
model: The Realtime model used for this session.
output_audio_format: The format of output audio. Options are pcm16, g711_ulaw, or g711_alaw. For pcm16, output audio is sampled at a rate of 24kHz.
temperature: Sampling temperature for the model, limited to [0.6, 1.2]. Defaults to 0.8.
tool_choice: How the model chooses tools. Options are auto, none, required, or specify a function.
tools: Tools (functions) available to the model.
turn_detection: Configuration for turn detection. Can be set to null to turn off. Server VAD means that the model will detect the start and end of speech based on audio volume and respond at the end of user speech.
voice: The voice the model uses to respond. Voice cannot be changed during the session once the model has responded with audio at least once. Current voice options are alloy, ash, ballad, coral, echo, sage, shimmer, and verse.
openai_api_key: OpenAI API key to pass directly. If this is specified, it will override the api_key config value.
openai_organization_key: OpenAI organization key to pass directly. If this is specified, it will override the organization_key config value.
base_url: Which API endpoint to use as base. Defaults to https://api.openai.com/v1
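
The documented ranges for temperature and max_response_output_tokens can be checked before a request is sent. The following is a rough sketch of such a pre-flight check over an options keyword list; the module RealtimeOpts and its validate/1 function are hypothetical and not part of ExOpenAI, and the library may or may not perform equivalent validation itself.

```elixir
defmodule RealtimeOpts do
  @moduledoc false

  # Hypothetical pre-flight check, not part of ExOpenAI.
  # The ranges come from the option descriptions: temperature in [0.6, 1.2],
  # max_response_output_tokens either :inf or an integer from 1 to 4096.
  def validate(opts) when is_list(opts) do
    with :ok <- check_temperature(Keyword.get(opts, :temperature)),
         :ok <- check_max_tokens(Keyword.get(opts, :max_response_output_tokens)) do
      {:ok, opts}
    end
  end

  # nil means the option was omitted, so the server default applies.
  defp check_temperature(nil), do: :ok
  defp check_temperature(t) when is_number(t) and t >= 0.6 and t <= 1.2, do: :ok
  defp check_temperature(t), do: {:error, {:invalid_temperature, t}}

  defp check_max_tokens(nil), do: :ok
  defp check_max_tokens(:inf), do: :ok
  defp check_max_tokens(n) when is_integer(n) and n >= 1 and n <= 4096, do: :ok
  defp check_max_tokens(n), do: {:error, {:invalid_max_response_output_tokens, n}}
end

# Example options using atoms and values listed in the @spec.
opts = [
  model: :"gpt-4o-realtime-preview",
  voice: :alloy,
  modalities: [:audio, :text],
  temperature: 0.8,
  max_response_output_tokens: :inf
]

{:ok, ^opts} = RealtimeOpts.validate(opts)
```

Options that pass the check are returned unchanged, so the validated list can be forwarded as the opts argument of the session-creation function.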