Ollama (Ollama v0.8.0)
Ollama is a powerful tool for running large language models locally or on your own infrastructure. This library provides an interface for working with Ollama in Elixir.
- 🦙 Full implementation of the Ollama API
- 🧰 Tool use (function calling)
- 🧱 Structured outputs
- 🛜 Streaming requests
- Stream to an Enumerable
- Or stream messages to any Elixir process
Installation
The package can be installed by adding ollama to your list of dependencies in mix.exs.
def deps do
  [
    {:ollama, "0.8.0"}
  ]
end
Quickstart
Assuming you have Ollama running on localhost and have installed a model, use completion/2 or chat/2 to interact with the model.
1. Generate a completion
iex> client = Ollama.init()
iex> Ollama.completion(client, [
...> model: "llama2",
...> prompt: "Why is the sky blue?",
...> ])
{:ok, %{"response" => "The sky is blue because it is the color of the sky.", ...}}
2. Generate the next message in a chat
iex> client = Ollama.init()
iex> messages = [
...> %{role: "system", content: "You are a helpful assistant."},
...> %{role: "user", content: "Why is the sky blue?"},
...> %{role: "assistant", content: "Due to rayleigh scattering."},
...> %{role: "user", content: "How is that different than mie scattering?"},
...> ]
iex> Ollama.chat(client, [
...> model: "llama2",
...> messages: messages,
...> ])
{:ok, %{"message" => %{
"role" => "assistant",
"content" => "Mie scattering affects all wavelengths similarly, while Rayleigh favors shorter ones."
}, ...}}
3. Generate structured data
The :format option can be used with both completion/2 and chat/2.
Ollama.completion(client, [
  model: "llama3.1",
  prompt: "Tell me about Canada",
  format: %{
    type: "object",
    properties: %{
      name: %{type: "string"},
      capital: %{type: "string"},
      languages: %{type: "array", items: %{type: "string"}},
    },
    required: ["name", "capital", "languages"]
  }
])
# {:ok, %{"response" => "{ \"name\": \"Canada\" ,\"capital\": \"Ottawa\" ,\"languages\": [\"English\", \"French\"] }", ...}}
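Because the structured output is returned as a JSON-encoded string, you will usually decode it before use. A minimal sketch, assuming a JSON library such as Jason is available (it is not a dependency of this library):
{:ok, %{"response" => json}} = Ollama.completion(client, [
  model: "llama3.1",
  prompt: "Tell me about Canada",
  format: %{
    type: "object",
    properties: %{name: %{type: "string"}, capital: %{type: "string"}},
    required: ["name", "capital"]
  }
])

# Decode the JSON string into a plain map.
{:ok, country} = Jason.decode(json)
country["capital"]
# => "Ottawa"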
Streaming
Streaming is supported on certain endpoints by setting the :stream option to true or a pid/0.
When :stream is set to true, a lazy Enumerable.t/0 is returned, which can be used with any Stream functions.
iex> Ollama.completion(client, [
...> model: "llama2",
...> prompt: "Why is the sky blue?",
...> stream: true,
...> ])
{:ok, stream}
iex> is_function(stream, 2)
true
iex> stream
...> |> Stream.each(& Process.send(pid, &1, []))
...> |> Stream.run()
:ok
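The returned stream can also be consumed eagerly, for example to accumulate the chunks into the full response text. A brief sketch (the "response" field follows the Ollama completion chunk schema):
{:ok, stream} = Ollama.completion(client, [
  model: "llama2",
  prompt: "Why is the sky blue?",
  stream: true,
])

# Concatenate the text fragment carried by each streamed chunk.
full_response =
  Enum.reduce(stream, "", fn chunk, acc ->
    acc <> Map.get(chunk, "response", "")
  end)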
The approach above builds the Enumerable.t/0 by calling receive, which may cause issues in GenServer callbacks. As an alternative, you can set the :stream option to a pid/0. This returns a Task.t/0 that sends messages to the specified process.
The following example demonstrates a streaming request in a LiveView event, sending each streaming message back to the same LiveView process:
defmodule MyApp.ChatLive do
  use Phoenix.LiveView

  # When the client invokes the "prompt" event, create a streaming request and
  # asynchronously send messages back to self.
  def handle_event("prompt", %{"message" => prompt}, socket) do
    {:ok, task} = Ollama.completion(Ollama.init(), [
      model: "llama2",
      prompt: prompt,
      stream: self(),
    ])

    {:noreply, assign(socket, current_request: task)}
  end

  # The streaming request sends messages back to the LiveView process.
  def handle_info({_request_pid, {:data, _data}} = message, socket) do
    pid = socket.assigns.current_request.pid

    case message do
      {^pid, {:data, %{"done" => false} = data}} ->
        # handle each streaming chunk
        {:noreply, socket}

      {^pid, {:data, %{"done" => true} = data}} ->
        # handle the final streaming chunk
        {:noreply, socket}

      {_pid, _data} ->
        # this message was not expected!
        {:noreply, socket}
    end
  end

  # Tidy up when the request is finished
  def handle_info({ref, {:ok, %Req.Response{status: 200}}}, socket) do
    Process.demonitor(ref, [:flush])
    {:noreply, assign(socket, current_request: nil)}
  end
end
Regardless of the streaming approach used, each streaming message is a plain map/0. For the message schema, refer to the Ollama API docs.
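For completion requests, for example, each streamed chunk is shaped roughly like this (an abridged sketch; the final chunk has "done" set to true and carries additional metadata):
%{
  "model" => "llama2",
  "response" => "The sky ",
  "done" => false
}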
Function calling
Ollama 0.3 and later versions support tool use and function calling on compatible models. Note that Ollama currently doesn't support tool use with streaming requests, so avoid setting :stream to true.
Using tools typically involves at least two round-trip requests to the model. Begin by defining one or more tools using a schema similar to ChatGPT's. Provide clear and concise descriptions for the tool and each argument.
iex> stock_price_tool = %{
...>   type: "function",
...>   function: %{
...>     name: "get_stock_price",
...>     description: "Fetches the live stock price for the given ticker.",
...>     parameters: %{
...>       type: "object",
...>       properties: %{
...>         ticker: %{
...>           type: "string",
...>           description: "The ticker symbol of a specific stock."
...>         }
...>       },
...>       required: ["ticker"]
...>     }
...>   }
...> }
The first round-trip involves sending a prompt in a chat with the tool definitions. The model should respond with a message containing a list of tool calls.
iex> Ollama.chat(client, [
...> model: "mistral-nemo",
...> messages: [
...> %{role: "user", content: "What is the current stock price for Apple?"}
...> ],
...> tools: [stock_price_tool],
...> ])
{:ok, %{"message" => %{
"role" => "assistant",
"content" => "",
"tool_calls" => [
%{"function" => %{
"name" => "get_stock_price",
"arguments" => %{"ticker" => "AAPL"}
}}
]
}, ...}}
Your implementation must intercept these tool calls and execute a corresponding function in your codebase with the specified arguments. The next round-trip involves passing the function's result back to the model as a message with a :role of "tool".
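For example, you might dispatch each tool call to a local function by pattern matching on its name. A minimal sketch (MyApp.Tools and fetch_stock_price/1 are hypothetical, not part of this library):
defmodule MyApp.Tools do
  # Execute a tool call returned by the model and wrap the result in a
  # message with a role of "tool".
  def run(%{"function" => %{"name" => "get_stock_price", "arguments" => %{"ticker" => ticker}}}) do
    %{role: "tool", content: fetch_stock_price(ticker)}
  end

  # Stubbed for the example; a real implementation would call a price API.
  defp fetch_stock_price(_ticker), do: "$217.96"
end

# tool_calls is the "tool_calls" list from the previous response.
tool_messages = Enum.map(tool_calls, &MyApp.Tools.run/1)

The second round-trip then appends these "tool" messages (after the assistant's tool_calls message) to the conversation and calls chat/2 again: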
iex> Ollama.chat(client, [
...> model: "mistral-nemo",
...> messages: [
...> %{role: "user", content: "What is the current stock price for Apple?"},
...> %{role: "assistant", content: "", tool_calls: [%{"function" => %{"name" => "get_stock_price", "arguments" => %{"ticker" => "AAPL"}}}]},
...> %{role: "tool", content: "$217.96"},
...> ],
...> tools: [stock_price_tool],
...> ])
{:ok, %{"message" => %{
"role" => "assistant",
"content" => "The current stock price for Apple (AAPL) is approximately $217.96.",
}, ...}}
After receiving the function tool's value, the model will respond to the user's original prompt, incorporating the function result into its response.
Summary
Functions
Generates the next message in a chat using the specified model. Optionally streamable.
Checks whether a blob exists in Ollama by its digest or binary data.
Generates a completion for the given prompt using the specified model. Optionally streamable.
Creates a model with another name from an existing model.
Creates a blob from its binary data.
Creates a model using the given name and model file. Optionally streamable.
Deletes a model and its data.
Generate embeddings from a model for the given prompt.
Generate embeddings from a model for the given prompt.
Creates a new Ollama API client. Accepts either a base URL for the Ollama API, a keyword list of options passed to Req.new/1, or an existing Req.Request.t/0 struct.
Lists all models that Ollama has available.
Lists currently running models, their memory footprint, and process details.
Load a model into memory without generating a completion. Optionally specify a keep alive value (defaults to 5 minutes, set -1 to permanently keep alive).
Downloads a model from the ollama library. Optionally streamable.
Upload a model to a model library. Requires registering for ollama.ai and adding a public key first. Optionally streamable.
Shows all information for a specific model.
Stops a running model and unloads it from memory.
Types
@type client() :: %Ollama{req: Req.Request.t()}
Client struct
@type message() :: {:role, term()} | {:content, binary()} | {:images, [binary()]} | {:tool_calls, [%{optional(atom() | binary()) => term()}]}
Chat message
A chat message is a map/0 with the following fields:
- :role - Required. The role of the message, either system, user, assistant or tool.
- :content (String.t/0) - Required. The content of the message.
- :images (list of String.t/0) - (optional) List of Base64 encoded images (for multimodal models only).
- :tool_calls - (optional) List of tools the model wants to use.
@type response() :: {:ok, map() | boolean() | Enumerable.t() | Task.t()} | {:error, term()}
Client response
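In practice you will usually pattern match on this result; a brief sketch for a non-streaming completion:
case Ollama.completion(client, model: "llama2", prompt: "Why is the sky blue?") do
  {:ok, %{"response" => text}} -> text
  {:error, reason} -> {:error, reason}
end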
Tool definition
A tool definition is a map/0 with the following fields:
- :type - Required. Type of tool. (Currently only "function" supported.)
- :function (map/0) - Required.
  - :name (String.t/0) - Required. The name of the function to be called.
  - :description (String.t/0) - A description of what the function does.
  - :parameters - Required. The parameters the function accepts.
Functions
Generates the next message in a chat using the specified model. Optionally streamable.
Options
- :model (String.t/0) - Required. The ollama model name.
- :messages (list of map/0) - Required. List of messages - used to keep a chat memory.
- :tools (list of map/0) - Tools for the model to use if supported (requires stream to be false).
- :format - Set the expected format of the response (json or JSON schema map).
- :stream - See section on streaming. The default value is false.
- :keep_alive - How long to keep the model loaded.
- :options - Additional advanced model parameters.
Message structure
Each message is a map with the following fields:
- :role - Required. The role of the message, either system, user, assistant or tool.
- :content (String.t/0) - Required. The content of the message.
- :images (list of String.t/0) - (optional) List of Base64 encoded images (for multimodal models only).
- :tool_calls - (optional) List of tools the model wants to use.
Tool definitions
- :type - Required. Type of tool. (Currently only "function" supported.)
- :function (map/0) - Required.
  - :name (String.t/0) - Required. The name of the function to be called.
  - :description (String.t/0) - A description of what the function does.
  - :parameters - Required. The parameters the function accepts.
Examples
iex> messages = [
...> %{role: "system", content: "You are a helpful assistant."},
...> %{role: "user", content: "Why is the sky blue?"},
...> %{role: "assistant", content: "Due to rayleigh scattering."},
...> %{role: "user", content: "How is that different than mie scattering?"},
...> ]
iex> Ollama.chat(client, [
...> model: "llama2",
...> messages: messages,
...> ])
{:ok, %{"message" => %{
"role" => "assistant",
"content" => "Mie scattering affects all wavelengths similarly, while Rayleigh favors shorter ones."
}, ...}}
# Passing true to the :stream option initiates an async streaming request.
iex> Ollama.chat(client, [
...> model: "llama2",
...> messages: messages,
...> stream: true,
...> ])
{:ok, %Ollama.Streaming{}}
@spec check_blob(client(), Ollama.Blob.digest() | binary()) :: response()
Checks whether a blob exists in Ollama by its digest or binary data.
Examples
iex> Ollama.check_blob(client, "sha256:fe938a131f40e6f6d40083c9f0f430a515233eb2edaa6d72eb85c50d64f2300e")
{:ok, true}
iex> Ollama.check_blob(client, "this should not exist")
{:ok, false}
Generates a completion for the given prompt using the specified model. Optionally streamable.
Options
- :model (String.t/0) - Required. The ollama model name.
- :prompt (String.t/0) - Required. Prompt to generate a response for.
- :images (list of String.t/0) - A list of Base64 encoded images to be included with the prompt (for multimodal models only).
- :system (String.t/0) - System prompt, overriding the model default.
- :template (String.t/0) - Prompt template, overriding the model default.
- :context - The context parameter returned from a previous completion/2 call (enabling short conversational memory).
- :format - Set the expected format of the response (json or JSON schema map).
- :raw (boolean/0) - Set true if specifying a fully templated prompt (:template is ignored).
- :stream - See section on streaming. The default value is false.
- :keep_alive - How long to keep the model loaded.
- :options - Additional advanced model parameters.
Examples
iex> Ollama.completion(client, [
...> model: "llama2",
...> prompt: "Why is the sky blue?",
...> ])
{:ok, %{"response": "The sky is blue because it is the color of the sky.", ...}}
# Passing true to the :stream option initiates an async streaming request.
iex> Ollama.completion(client, [
...> model: "llama2",
...> prompt: "Why is the sky blue?",
...> stream: true,
...> ])
{:ok, %Ollama.Streaming{}}
Creates a model with another name from an existing model.
Options
- :source (String.t/0) - Required. Name of the model to copy from.
- :destination (String.t/0) - Required. Name of the model to copy to.
Example
iex> Ollama.copy_model(client, [
...> source: "llama2",
...> destination: "llama2-backup"
...> ])
{:ok, true}
Creates a blob from its binary data.
Example
iex> Ollama.create_blob(client, data)
{:ok, true}
Creates a model using the given name and model file. Optionally streamable.
Any dependent blobs referenced in the modelfile, such as FROM and ADAPTER instructions, must exist first. See check_blob/2 and create_blob/2.
Options
- :name (String.t/0) - Required. Name of the model to create.
- :modelfile (String.t/0) - Required. Contents of the Modelfile.
- :quantize (String.t/0) - Quantize f16 and f32 models when importing them.
- :stream - See section on streaming. The default value is false.
Example
iex> modelfile = "FROM llama2\nSYSTEM \"You are mario from Super Mario Bros.\""
iex> Ollama.create_model(client, [
...> name: "mario",
...> modelfile: modelfile,
...> stream: true,
...> ])
{:ok, %Ollama.Streaming{}}
Deletes a model and its data.
Options
- :name (String.t/0) - Required. Name of the model to delete.
Example
iex> Ollama.delete_model(client, name: "llama2")
{:ok, true}
Generate embeddings from a model for the given prompt.
Options
- :model (String.t/0) - Required. The name of the model used to generate the embeddings.
- :input - Required. Text or list of text to generate embeddings for.
- :truncate (boolean/0) - Truncates the end of each input to fit within context length.
- :keep_alive - How long to keep the model loaded.
- :options - Additional advanced model parameters.
Example
iex> Ollama.embed(client, [
...> model: "nomic-embed-text",
...> input: ["Why is the sky blue?", "Why is the grass green?"],
...> ])
{:ok, %{"embedding" => [
[ 0.009724553, 0.04449892, -0.14063916, 0.0013168337, 0.032128844,
0.10730086, -0.008447222, 0.010106917, 5.2289694e-4, -0.03554127, ...],
[ 0.028196355, 0.043162502, -0.18592504, 0.035034444, 0.055619627,
0.12082449, -0.0090096295, 0.047170386, -0.032078084, 0.0047163847, ...]
]}}
Generate embeddings from a model for the given prompt.
Options
- :model (String.t/0) - Required. The name of the model used to generate the embeddings.
- :prompt (String.t/0) - Required. The prompt used to generate the embedding.
- :keep_alive - How long to keep the model loaded.
- :options - Additional advanced model parameters.
Example
iex> Ollama.embeddings(client, [
...> model: "llama2",
...> prompt: "Here is an article about llamas..."
...> ])
{:ok, %{"embedding" => [
0.5670403838157654, 0.009260174818336964, 0.23178744316101074, -0.2916173040866852, -0.8924556970596313,
0.8785552978515625, -0.34576427936553955, 0.5742510557174683, -0.04222835972905159, -0.137906014919281
]}}
@spec init(Req.url() | keyword() | Req.Request.t()) :: client()
Creates a new Ollama API client. Accepts either a base URL for the Ollama API, a keyword list of options passed to Req.new/1, or an existing Req.Request.t/0 struct.
If no arguments are given, the client is initialized with the default options:
@default_req_opts [
  base_url: "http://localhost:11434/api",
  receive_timeout: 60_000,
]
Examples
iex> client = Ollama.init("https://ollama.service.ai:11434/api")
%Ollama{}
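A client can also be built from a keyword list of options that are passed through to Req.new/1, for example to raise the receive timeout for slow models (a sketch using standard Req options):
iex> client = Ollama.init(base_url: "http://localhost:11434/api", receive_timeout: 120_000)
%Ollama{}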
Lists all models that Ollama has available.
Example
iex> Ollama.list_models(client)
{:ok, %{"models" => [
%{"name" => "codellama:13b", ...},
%{"name" => "llama2:latest", ...},
]}}
Lists currently running models, their memory footprint, and process details.
Example
iex> Ollama.list_running(client)
{:ok, %{"models" => [
%{"name" => "nomic-embed-text:latest", ...},
]}}
Load a model into memory without generating a completion. Optionally specify a keep alive value (defaults to 5 minutes, set -1 to permanently keep alive).
Options
- :model (String.t/0) - Required. Name of the model to load.
- :keep_alive - How long to keep the model loaded.
Example
iex> Ollama.preload(client, model: "llama3.1", keep_alive: 3_600_000)
true
Downloads a model from the ollama library. Optionally streamable.
Options
- :name (String.t/0) - Required. Name of the model to pull.
- :stream - See section on streaming. The default value is false.
Example
iex> Ollama.pull_model(client, name: "llama2")
{:ok, %{"status" => "success"}}
# Passing true to the :stream option initiates an async streaming request.
iex> Ollama.pull_model(client, name: "llama2", stream: true)
{:ok, %Ollama.Streaming{}}
Upload a model to a model library. Requires registering for ollama.ai and adding a public key first. Optionally streamable.
Options
- :name (String.t/0) - Required. Name of the model to push.
- :stream - See section on streaming. The default value is false.
Example
iex> Ollama.push_model(client, name: "mattw/pygmalion:latest")
{:ok, %{"status" => "success"}}
# Passing true to the :stream option initiates an async streaming request.
iex> Ollama.push_model(client, name: "mattw/pygmalion:latest", stream: true)
{:ok, %Ollama.Streaming{}}
Shows all information for a specific model.
Options
- :name (String.t/0) - Required. Name of the model to show.
Example
iex> Ollama.show_model(client, name: "llama2")
{:ok, %{
"details" => %{
"families" => ["llama", "clip"],
"family" => "llama",
"format" => "gguf",
"parameter_size" => "7B",
"quantization_level" => "Q4_0"
},
"modelfile" => "...",
"parameters" => "...",
"template" => "..."
}}
Stops a running model and unloads it from memory.
Options
- :model (String.t/0) - Required. Name of the model to unload.
Example
iex> Ollama.unload(client, model: "llama3.1")
true