ExOpenAI.Realtime (ex_openai.ex v1.8.0)
Modules for interacting with the realtime group of OpenAI APIs.
API Reference: https://platform.openai.com/docs/api-reference/realtime
Summary
Functions
Create an ephemeral API token for use in client-side applications with the Realtime API. Can be configured with the same session parameters as the session.update client event.
Functions
@spec create_realtime_session(
  base_url: String.t(),
  openai_organization_key: String.t(),
  openai_api_key: String.t(),
  voice: :verse | :shimmer | :sage | :echo | :coral | :ballad | :ash | :alloy,
  turn_detection: %{
    create_response: boolean(),
    interrupt_response: boolean(),
    prefix_padding_ms: integer(),
    silence_duration_ms: integer(),
    threshold: float(),
    type: String.t()
  },
  tools: [
    %{
      description: String.t(),
      name: String.t(),
      parameters: map(),
      type: :function
    }
  ],
  tool_choice: String.t(),
  temperature: float(),
  output_audio_format: :g711_alaw | :g711_ulaw | :pcm16,
  model:
    :"gpt-4o-mini-realtime-preview-2024-12-17"
    | :"gpt-4o-mini-realtime-preview"
    | :"gpt-4o-realtime-preview-2024-12-17"
    | :"gpt-4o-realtime-preview-2024-10-01"
    | :"gpt-4o-realtime-preview",
  modalities: [:audio | :text],
  max_response_output_tokens: :inf | integer(),
  instructions: String.t(),
  input_audio_transcription: %{
    language: String.t(),
    model: String.t(),
    prompt: String.t()
  },
  input_audio_format: :g711_alaw | :g711_ulaw | :pcm16,
  stream_to: (... -> any()) | pid()
) ::
  {:ok, ExOpenAI.Components.RealtimeSessionCreateResponse.t()}
  | {:error, any()}
Create an ephemeral API token for use in client-side applications with the Realtime API. Can be configured with the same session parameters as the session.update client event.
It responds with a session object, plus a client_secret key which contains a usable ephemeral API token that can be used to authenticate browser clients for the Realtime API.
Endpoint: https://api.openai.com/v1/realtime/sessions
Method: POST
Docs: https://platform.openai.com/docs/api-reference/realtime
Required Arguments:
Optional Arguments:
stream_to: PID or function of where to stream content to.
input_audio_format: The format of input audio. Options are pcm16, g711_ulaw, or g711_alaw. For pcm16, input audio must be 16-bit PCM at a 24kHz sample rate, single channel (mono), and little-endian byte order.
input_audio_transcription: Configuration for input audio transcription. Defaults to off; once enabled, it can be set to null to turn it off again. Input audio transcription is not native to the model, since the model consumes audio directly. Transcription runs asynchronously through OpenAI Whisper transcription and should be treated as rough guidance rather than the representation understood by the model. The client can optionally set the language and prompt for transcription; these fields will be passed to the Whisper API.
instructions: The default system instructions (i.e. system message) prepended to model calls. This field allows the client to guide the model on desired responses. The model can be instructed on response content and format (e.g. "be extremely succinct", "act friendly", "here are examples of good responses") and on audio behavior (e.g. "talk quickly", "inject emotion into your voice", "laugh frequently"). The instructions are not guaranteed to be followed by the model, but they provide guidance to the model on the desired behavior. Note that the server sets default instructions which will be used if this field is not set and are visible in the session.created event at the start of the session.
max_response_output_tokens: Maximum number of output tokens for a single assistant response, inclusive of tool calls. Provide an integer between 1 and 4096 to limit output tokens, or inf for the maximum available tokens for a given model. Defaults to inf.
modalities: The set of modalities the model can respond with. To disable audio, set this to ["text"].
model: The Realtime model used for this session.
output_audio_format: The format of output audio. Options are pcm16, g711_ulaw, or g711_alaw. For pcm16, output audio is sampled at a rate of 24kHz.
temperature: Sampling temperature for the model, limited to [0.6, 1.2]. Defaults to 0.8.
tool_choice: How the model chooses tools. Options are auto, none, required, or specify a function.
tools: Tools (functions) available to the model.
turn_detection: Configuration for turn detection. Can be set to null to turn off. Server VAD means that the model will detect the start and end of speech based on audio volume and respond at the end of user speech.
voice: The voice the model uses to respond. Voice cannot be changed during the session once the model has responded with audio at least once. Current voice options are alloy, ash, ballad, coral, echo, sage, shimmer and verse.
openai_api_key: OpenAI API key to pass directly. If this is specified, it will override the api_key config value.
openai_organization_key: OpenAI organization key to pass directly. If this is specified, it will override the organization_key config value.
base_url: Which API endpoint to use as base, defaults to https://api.openai.com/v1
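
As a rough illustration of how the optional arguments above might be passed, the sketch below builds a keyword list and hands it to create_realtime_session. The module RealtimeSessionExample and its two functions are hypothetical helpers, not part of ex_openai, and the option values are arbitrary picks from the ranges documented above. It assumes an API key is already configured (or supplied via openai_api_key) and that the returned session exposes the client_secret key described earlier; the exact shape of that value is not specified here.

```elixir
defmodule RealtimeSessionExample do
  # Hypothetical helper module for illustration only; not part of ex_openai.

  # Builds the keyword options. Every key used here appears in the @spec,
  # and the values are arbitrary examples within the documented limits.
  def session_opts do
    [
      model: :"gpt-4o-realtime-preview",
      voice: :alloy,
      modalities: [:audio, :text],
      instructions: "Be extremely succinct.",
      temperature: 0.8,
      input_audio_format: :pcm16,
      output_audio_format: :pcm16,
      max_response_output_tokens: 1024
    ]
  end

  # Requests a session and returns the client_secret entry of the response.
  # Map.get/2 is used so that a missing key yields nil instead of raising,
  # since the response layout is an assumption based on the prose above.
  def create_session(opts \\ session_opts()) do
    case ExOpenAI.Realtime.create_realtime_session(opts) do
      {:ok, session} -> {:ok, Map.get(session, :client_secret)}
      {:error, reason} -> {:error, reason}
    end
  end
end
```

Calling RealtimeSessionExample.create_session() would then return either {:ok, client_secret} or {:error, reason}. Keeping the option list in its own function makes it easy to inspect or override individual settings, for example by passing Keyword.put(RealtimeSessionExample.session_opts(), :voice, :sage).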