Ollama (Ollama v0.8.0)
Ollama is a powerful tool for running large language models locally or on your own infrastructure. This library provides an interface for working with Ollama in Elixir.
- 🦙 Full implementation of the Ollama API
- 🧰 Tool use (function calling)
- 🧱 Structured outputs
- 🛜 Streaming requests
- Stream to an Enumerable
- Or stream messages to any Elixir process
Installation
The package can be installed by adding ollama to your list of dependencies in mix.exs.
def deps do
  [
    {:ollama, "0.8.0"}
  ]
end
Quickstart
Assuming you have Ollama running on localhost and have installed a model, use completion/2 or chat/2 to interact with the model.
1. Generate a completion
iex> client = Ollama.init()
iex> Ollama.completion(client, [
...> model: "llama2",
...> prompt: "Why is the sky blue?",
...> ])
{:ok, %{"response" => "The sky is blue because it is the color of the sky.", ...}}
2. Generate the next message in a chat
iex> client = Ollama.init()
iex> messages = [
...> %{role: "system", content: "You are a helpful assistant."},
...> %{role: "user", content: "Why is the sky blue?"},
...> %{role: "assistant", content: "Due to rayleigh scattering."},
...> %{role: "user", content: "How is that different than mie scattering?"},
...> ]
iex> Ollama.chat(client, [
...> model: "llama2",
...> messages: messages,
...> ])
{:ok, %{"message" => %{
"role" => "assistant",
"content" => "Mie scattering affects all wavelengths similarly, while Rayleigh favors shorter ones."
}, ...}}
3. Generate structured data
The :format option can be used with both completion/2 and chat/2.
Ollama.completion(client, [
  model: "llama3.1",
  prompt: "Tell me about Canada",
  format: %{
    type: "object",
    properties: %{
      name: %{type: "string"},
      capital: %{type: "string"},
      languages: %{type: "array", items: %{type: "string"}},
    },
    required: ["name", "capital", "languages"]
  }
])
# {:ok, %{"response" => "{ \"name\": \"Canada\" ,\"capital\": \"Ottawa\" ,\"languages\": [\"English\", \"French\"] }", ...}}
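Because the structured output is returned as a JSON-encoded string, you will usually decode it before use. A minimal sketch, assuming a JSON library such as Jason is available (it is not a dependency of this library):
{:ok, %{"response" => json}} = Ollama.completion(client, [
  model: "llama3.1",
  prompt: "Tell me about Canada",
  format: %{
    type: "object",
    properties: %{name: %{type: "string"}, capital: %{type: "string"}},
    required: ["name", "capital"]
  }
])

# Decode the JSON string into a plain map.
{:ok, country} = Jason.decode(json)
country["capital"]
# => "Ottawa"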
Streaming
Streaming is supported on certain endpoints by setting the :stream option to true or a pid/0.
When :stream is set to true, a lazy Enumerable.t/0 is returned, which can be used with any Stream functions.
iex> Ollama.completion(client, [
...> model: "llama2",
...> prompt: "Why is the sky blue?",
...> stream: true,
...> ])
{:ok, stream}
iex> is_function(stream, 2)
true
iex> stream
...> |> Stream.each(& Process.send(pid, &1, []))
...> |> Stream.run()
:ok
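The returned stream can also be consumed eagerly, for example to accumulate the chunks into the full response text. A brief sketch (the "response" field follows the Ollama completion chunk schema):
{:ok, stream} = Ollama.completion(client, [
  model: "llama2",
  prompt: "Why is the sky blue?",
  stream: true,
])

# Concatenate the text fragment carried by each streamed chunk.
full_response =
  Enum.reduce(stream, "", fn chunk, acc ->
    acc <> Map.get(chunk, "response", "")
  end)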
The approach above builds the Enumerable.t/0 by calling receive, which may cause issues in GenServer callbacks. As an alternative, you can set the :stream option to a pid/0. This returns a Task.t/0 that sends messages to the specified process.
The following example demonstrates a streaming request in a LiveView event, sending each streaming message back to the same LiveView process:
defmodule MyApp.ChatLive do
  use Phoenix.LiveView

  # When the client invokes the "prompt" event, create a streaming request and
  # asynchronously send messages back to self.
  def handle_event("prompt", %{"message" => prompt}, socket) do
    {:ok, task} = Ollama.completion(Ollama.init(), [
      model: "llama2",
      prompt: prompt,
      stream: self(),
    ])

    {:noreply, assign(socket, current_request: task)}
  end

  # The streaming request sends messages back to the LiveView process.
  def handle_info({_request_pid, {:data, _data}} = message, socket) do
    pid = socket.assigns.current_request.pid

    case message do
      {^pid, {:data, %{"done" => false} = data}} ->
        # handle each streaming chunk
        {:noreply, socket}

      {^pid, {:data, %{"done" => true} = data}} ->
        # handle the final streaming chunk
        {:noreply, socket}

      {_pid, _data} ->
        # this message was not expected!
        {:noreply, socket}
    end
  end

  # Tidy up when the request is finished
  def handle_info({ref, {:ok, %Req.Response{status: 200}}}, socket) do
    Process.demonitor(ref, [:flush])
    {:noreply, assign(socket, current_request: nil)}
  end
end
Regardless of the streaming approach used, each streaming message is a plain map/0. For the message schema, refer to the Ollama API docs.
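For completion requests, for example, each streamed chunk is shaped roughly like this (an abridged sketch; the final chunk has "done" set to true and carries additional metadata):
%{
  "model" => "llama2",
  "response" => "The sky ",
  "done" => false
}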
Function calling
Ollama 0.3 and later versions support tool use and function calling on compatible models. Note that Ollama currently doesn't support tool use with streaming requests, so avoid setting :stream to true.
Using tools typically involves at least two round-trip requests to the model. Begin by defining one or more tools using a schema similar to ChatGPT's. Provide clear and concise descriptions for the tool and each argument.
iex> stock_price_tool = %{
...>   type: "function",
...>   function: %{
...>     name: "get_stock_price",
...>     description: "Fetches the live stock price for the given ticker.",
...>     parameters: %{
...>       type: "object",
...>       properties: %{
...>         ticker: %{
...>           type: "string",
...>           description: "The ticker symbol of a specific stock."
...>         }
...>       },
...>       required: ["ticker"]
...>     }
...>   }
...> }
The first round-trip involves sending a prompt in a chat with the tool definitions. The model should respond with a message containing a list of tool calls.
iex> Ollama.chat(client, [
...> model: "mistral-nemo",
...> messages: [
...> %{role: "user", content: "What is the current stock price for Apple?"}
...> ],
...> tools: [stock_price_tool],
...> ])
{:ok, %{"message" => %{
"role" => "assistant",
"content" => "",
"tool_calls" => [
%{"function" => %{
"name" => "get_stock_price",
"arguments" => %{"ticker" => "AAPL"}
}}
]
}, ...}}
Your implementation must intercept these tool calls and execute a corresponding function in your codebase with the specified arguments. The next round-trip involves passing the function's result back to the model as a message with a :role of "tool".
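For example, you might dispatch each tool call to a local function by pattern matching on its name. A minimal sketch (MyApp.Tools and fetch_stock_price/1 are hypothetical, not part of this library):
defmodule MyApp.Tools do
  # Execute a tool call returned by the model and wrap the result in a
  # message with a role of "tool".
  def run(%{"function" => %{"name" => "get_stock_price", "arguments" => %{"ticker" => ticker}}}) do
    %{role: "tool", content: fetch_stock_price(ticker)}
  end

  # Stubbed for the example; a real implementation would call a price API.
  defp fetch_stock_price(_ticker), do: "$217.96"
end

# tool_calls is the "tool_calls" list from the previous response.
tool_messages = Enum.map(tool_calls, &MyApp.Tools.run/1)

The second round-trip then appends these "tool" messages (after the assistant's tool_calls message) to the conversation and calls chat/2 again: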
iex> Ollama.chat(client, [
...> model: "mistral-nemo",
...> messages: [
...> %{role: "user", content: "What is the current stock price for Apple?"},
...> %{role: "assistant", content: "", tool_calls: [%{"function" => %{"name" => "get_stock_price", "arguments" => %{"ticker" => "AAPL"}}}]},
...> %{role: "tool", content: "$217.96"},
...> ],
...> tools: [stock_price_tool],
...> ])
{:ok, %{"message" => %{
"role" => "assistant",
"content" => "The current stock price for Apple (AAPL) is approximately $217.96.",
}, ...}}
After receiving the function tool's value, the model will respond to the user's original prompt, incorporating the function result into its response.
Summary
Functions
Generates the next message in a chat using the specified model. Optionally streamable.
Checks whether a blob exists in Ollama by its digest or binary data.
Generates a completion for the given prompt using the specified model. Optionally streamable.
Creates a model with another name from an existing model.
Creates a blob from its binary data.
Creates a model using the given name and model file. Optionally streamable.
Deletes a model and its data.
Generate embeddings from a model for the given prompt.
Generate embeddings from a model for the given prompt.
Creates a new Ollama API client. Accepts either a base URL for the Ollama API, a keyword list of options passed to Req.new/1, or an existing Req.Request.t/0 struct.
Lists all models that Ollama has available.
Lists currently running models, their memory footprint, and process details.
Load a model into memory without generating a completion. Optionally specify a keep alive value (defaults to 5 minutes, set -1 to permanently keep alive).
Downloads a model from the ollama library. Optionally streamable.
Upload a model to a model library. Requires registering for ollama.ai and adding a public key first. Optionally streamable.
Shows all information for a specific model.
Stops a running model and unloads it from memory.
Types
@type client() :: %Ollama{req: Req.Request.t()}
Client struct
@type message() :: {:role, term()} | {:content, binary()} | {:images, [binary()]} | {:tool_calls, [%{optional(atom() | binary()) => term()}]}
Chat message
A chat message is a map/0 with the following fields:
- :role - Required. The role of the message, either system, user, assistant or tool.
- :content (String.t/0) - Required. The content of the message.
- :images (list of String.t/0) - (optional) List of Base64 encoded images (for multimodal models only).
- :tool_calls - (optional) List of tools the model wants to use.
@type response() :: {:ok, map() | boolean() | Enumerable.t() | Task.t()} | {:error, term()}
Client response
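In practice you will usually pattern match on this result; a brief sketch for a non-streaming completion:
case Ollama.completion(client, model: "llama2", prompt: "Why is the sky blue?") do
  {:ok, %{"response" => text}} -> text
  {:error, reason} -> {:error, reason}
end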
Tool definition
A tool definition is a map/0 with the following fields:
- :type - Required. Type of tool. (Currently only "function" supported.)
- :function (map/0) - Required.
  - :name (String.t/0) - Required. The name of the function to be called.
  - :description (String.t/0) - A description of what the function does.
  - :parameters - Required. The parameters the function accepts.
Functions
Generates the next message in a chat using the specified model. Optionally streamable.
Options
- :model (String.t/0) - Required. The ollama model name.
- :messages (list of map/0) - Required. List of messages - used to keep a chat memory.
- :tools (list of map/0) - Tools for the model to use if supported (requires stream to be false).
- :format - Set the expected format of the response (json or JSON schema map).
- :stream - See section on streaming. The default value is false.
- :keep_alive - How long to keep the model loaded.
- :options - Additional advanced model parameters.
Message structure
Each message is a map with the following fields:
- :role - Required. The role of the message, either system, user, assistant or tool.
- :content (String.t/0) - Required. The content of the message.
- :images (list of String.t/0) - (optional) List of Base64 encoded images (for multimodal models only).
- :tool_calls - (optional) List of tools the model wants to use.
Tool definitions
- :type - Required. Type of tool. (Currently only "function" supported.)
- :function (map/0) - Required.
  - :name (String.t/0) - Required. The name of the function to be called.
  - :description (String.t/0) - A description of what the function does.
  - :parameters - Required. The parameters the function accepts.
Examples
iex> messages = [
...> %{role: "system", content: "You are a helpful assistant."},
...> %{role: "user", content: "Why is the sky blue?"},
...> %{role: "assistant", content: "Due to rayleigh scattering."},
...> %{role: "user", content: "How is that different than mie scattering?"},
...> ]
iex> Ollama.chat(client, [
...> model: "llama2",
...> messages: messages,
...> ])
{:ok, %{"message" => %{
"role" => "assistant",
"content" => "Mie scattering affects all wavelengths similarly, while Rayleigh favors shorter ones."
}, ...}}
# Passing true to the :stream option initiates an async streaming request.
iex> Ollama.chat(client, [
...> model: "llama2",
...> messages: messages,
...> stream: true,
...> ])
{:ok, %Ollama.Streaming{}}
@spec check_blob(client(), Ollama.Blob.digest() | binary()) :: response()
Checks whether a blob exists in Ollama by its digest or binary data.
Examples
iex> Ollama.check_blob(client, "sha256:fe938a131f40e6f6d40083c9f0f430a515233eb2edaa6d72eb85c50d64f2300e")
{:ok, true}
iex> Ollama.check_blob(client, "this should not exist")
{:ok, false}
Generates a completion for the given prompt using the specified model. Optionally streamable.
Options
- :model (String.t/0) - Required. The ollama model name.
- :prompt (String.t/0) - Required. Prompt to generate a response for.
- :images (list of String.t/0) - A list of Base64 encoded images to be included with the prompt (for multimodal models only).
- :system (String.t/0) - System prompt, overriding the model default.
- :template (String.t/0) - Prompt template, overriding the model default.
- :context - The context parameter returned from a previous completion/2 call (enabling short conversational memory).
- :format - Set the expected format of the response (json or JSON schema map).
- :raw (boolean/0) - Set true if specifying a fully templated prompt (:template is ignored).
- :stream - See section on streaming. The default value is false.
- :keep_alive - How long to keep the model loaded.
- :options - Additional advanced model parameters.
Examples
iex> Ollama.completion(client, [
...> model: "llama2",
...> prompt: "Why is the sky blue?",
...> ])
{:ok, %{"response": "The sky is blue because it is the color of the sky.", ...}}
# Passing true to the :stream option initiates an async streaming request.
iex> Ollama.completion(client, [
...> model: "llama2",
...> prompt: "Why is the sky blue?",
...> stream: true,
...> ])
{:ok, %Ollama.Streaming{}}
Creates a model with another name from an existing model.
Options
- :source (String.t/0) - Required. Name of the model to copy from.
- :destination (String.t/0) - Required. Name of the model to copy to.
Example
iex> Ollama.copy_model(client, [
...> source: "llama2",
...> destination: "llama2-backup"
...> ])
{:ok, true}
Creates a blob from its binary data.
Example
iex> Ollama.create_blob(client, data)
{:ok, true}
Creates a model using the given name and model file. Optionally streamable.
Any dependent blobs referenced in the modelfile, such as FROM and ADAPTER instructions, must exist first. See check_blob/2 and create_blob/2.
Options
- :name (String.t/0) - Required. Name of the model to create.
- :modelfile (String.t/0) - Required. Contents of the Modelfile.
- :quantize (String.t/0) - Quantize f16 and f32 models when importing them.
- :stream - See section on streaming. The default value is false.
Example
iex> modelfile = "FROM llama2\nSYSTEM \"You are mario from Super Mario Bros.\""
iex> Ollama.create_model(client, [
...> name: "mario",
...> modelfile: modelfile,
...> stream: true,
...> ])
{:ok, %Ollama.Streaming{}}
Deletes a model and its data.
Options
- :name (String.t/0) - Required. Name of the model to delete.
Example
iex> Ollama.delete_model(client, name: "llama2")
{:ok, true}
Generate embeddings from a model for the given prompt.
Options
- :model (String.t/0) - Required. The name of the model used to generate the embeddings.
- :input - Required. Text or list of text to generate embeddings for.
- :truncate (boolean/0) - Truncates the end of each input to fit within context length.
- :keep_alive - How long to keep the model loaded.
- :options - Additional advanced model parameters.
Example
iex> Ollama.embed(client, [
...> model: "nomic-embed-text",
...> input: ["Why is the sky blue?", "Why is the grass green?"],
...> ])
{:ok, %{"embedding" => [
[ 0.009724553, 0.04449892, -0.14063916, 0.0013168337, 0.032128844,
0.10730086, -0.008447222, 0.010106917, 5.2289694e-4, -0.03554127, ...],
[ 0.028196355, 0.043162502, -0.18592504, 0.035034444, 0.055619627,
0.12082449, -0.0090096295, 0.047170386, -0.032078084, 0.0047163847, ...]
]}}
Generate embeddings from a model for the given prompt.
Options
- :model (String.t/0) - Required. The name of the model used to generate the embeddings.
- :prompt (String.t/0) - Required. The prompt used to generate the embedding.
- :keep_alive - How long to keep the model loaded.
- :options - Additional advanced model parameters.
Example
iex> Ollama.embeddings(client, [
...> model: "llama2",
...> prompt: "Here is an article about llamas..."
...> ])
{:ok, %{"embedding" => [
0.5670403838157654, 0.009260174818336964, 0.23178744316101074, -0.2916173040866852, -0.8924556970596313,
0.8785552978515625, -0.34576427936553955, 0.5742510557174683, -0.04222835972905159, -0.137906014919281
]}}
@spec init(Req.url() | keyword() | Req.Request.t()) :: client()
Creates a new Ollama API client. Accepts either a base URL for the Ollama API, a keyword list of options passed to Req.new/1, or an existing Req.Request.t/0 struct.
If no arguments are given, the client is initialized with the default options:
@default_req_opts [
  base_url: "http://localhost:11434/api",
  receive_timeout: 60_000,
]
Examples
iex> client = Ollama.init("https://ollama.service.ai:11434/api")
%Ollama{}
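A client can also be built from a keyword list of options that are passed through to Req.new/1, for example to raise the receive timeout for slow models (a sketch using standard Req options):
iex> client = Ollama.init(base_url: "http://localhost:11434/api", receive_timeout: 120_000)
%Ollama{}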
Lists all models that Ollama has available.
Example
iex> Ollama.list_models(client)
{:ok, %{"models" => [
%{"name" => "codellama:13b", ...},
%{"name" => "llama2:latest", ...},
]}}
Lists currently running models, their memory footprint, and process details.
Example
iex> Ollama.list_running(client)
{:ok, %{"models" => [
%{"name" => "nomic-embed-text:latest", ...},
]}}
Load a model into memory without generating a completion. Optionally specify a keep alive value (defaults to 5 minutes, set -1 to permanently keep alive).
Options
- :model (String.t/0) - Required. Name of the model to load.
- :keep_alive - How long to keep the model loaded.
Example
iex> Ollama.preload(client, model: "llama3.1", keep_alive: 3_600_000)
true
Downloads a model from the ollama library. Optionally streamable.
Options
- :name (String.t/0) - Required. Name of the model to pull.
- :stream - See section on streaming. The default value is false.
Example
iex> Ollama.pull_model(client, name: "llama2")
{:ok, %{"status" => "success"}}
# Passing true to the :stream option initiates an async streaming request.
iex> Ollama.pull_model(client, name: "llama2", stream: true)
{:ok, %Ollama.Streaming{}}
Upload a model to a model library. Requires registering for ollama.ai and adding a public key first. Optionally streamable.
Options
- :name (String.t/0) - Required. Name of the model to push.
- :stream - See section on streaming. The default value is false.
Example
iex> Ollama.push_model(client, name: "mattw/pygmalion:latest")
{:ok, %{"status" => "success"}}
# Passing true to the :stream option initiates an async streaming request.
iex> Ollama.push_model(client, name: "mattw/pygmalion:latest", stream: true)
{:ok, %Ollama.Streaming{}}
Shows all information for a specific model.
Options
- :name (String.t/0) - Required. Name of the model to show.
Example
iex> Ollama.show_model(client, name: "llama2")
{:ok, %{
"details" => %{
"families" => ["llama", "clip"],
"family" => "llama",
"format" => "gguf",
"parameter_size" => "7B",
"quantization_level" => "Q4_0"
},
"modelfile" => "...",
"parameters" => "...",
"template" => "..."
}}
Stops a running model and unloads it from memory.
Options
- :model (String.t/0) - Required. Name of the model to unload.
Example
iex> Ollama.unload(client, model: "llama3.1")
true