Ollixir.HuggingFace (Ollixir v0.1.1)

HuggingFace Hub integration for Ollixir.

Optional Dependency

This module requires the hf_hub package. Add it to your dependencies:

{:hf_hub, "~> 0.1.3"}

The module will not be available if hf_hub is not installed.

This module provides seamless integration with HuggingFace Hub, enabling you to:

  • Discover GGUF model files in HuggingFace repositories
  • Auto-select optimal quantization based on preferences
  • Build Ollama-compatible model references
  • Pull and run HuggingFace models directly through Ollama

Overview

Ollama natively supports running GGUF models from HuggingFace Hub using the hf.co/{username}/{repository}:{quantization} model reference format. This module adds discovery and convenience features on top of that capability.

Quick Start

# Initialize Ollixir client
client = Ollixir.init()

# Discover available GGUF files
{:ok, ggufs} = Ollixir.HuggingFace.list_gguf_files("bartowski/Llama-3.2-1B-Instruct-GGUF")

# Auto-select best quantization
{:ok, model_ref, info} = Ollixir.HuggingFace.auto_select("bartowski/Llama-3.2-1B-Instruct-GGUF")

# Pull and chat
{:ok, _} = Ollixir.HuggingFace.pull(client, "bartowski/Llama-3.2-1B-Instruct-GGUF", quantization: "Q4_K_M")
{:ok, response} = Ollixir.HuggingFace.chat(client, "bartowski/Llama-3.2-1B-Instruct-GGUF",
  [%{role: "user", content: "Hello!"}],
  quantization: "Q4_K_M"
)

Direct Usage (No Discovery)

If you already know the repository and quantization you want, you can skip this module entirely and use Ollixir directly:

Ollixir.chat(client,
  model: "hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF:Q4_K_M",
  messages: [%{role: "user", content: "Hello!"}]
)

Quantization Selection

The module uses a preference order optimized for quality/size balance:

  1. Q4_K_M, Q4_K_S (best balance for most users)
  2. Q5_K_M, Q5_K_S (higher quality, larger size)
  3. Q6_K, Q8_0 (even higher quality)
  4. IQ4_XS, IQ3_M (smaller, for constrained environments)
  5. F16, BF16 (full precision, largest)

You can also specify your own preference or filter by maximum size.
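The selection behavior can be sketched in plain Elixir: filter out files over the size limit, then walk the preference list and take the first quantization that is still available. This is an illustrative reimplementation, not the module's actual source; `ggufs` and `preference` below are made-up sample data in the shape of gguf_info() and quant_preference/0.

```elixir
# Sample GGUF info maps (illustrative values only)
ggufs = [
  %{quantization: "Q8_0", size_gb: 1.23},
  %{quantization: "Q4_K_M", size_gb: 0.75}
]

# Illustrative subset of the default preference order
preference = ["Q4_K_M", "Q4_K_S", "Q5_K_M", "Q6_K", "Q8_0"]

pick = fn files, prefs, max_gb ->
  # Keep only files under the size limit (nil means no limit),
  # then collect the quantizations that remain available.
  available =
    files
    |> Enum.filter(fn f -> is_nil(max_gb) or f.size_gb <= max_gb end)
    |> MapSet.new(& &1.quantization)

  # First preference that is actually available wins.
  Enum.find(prefs, &MapSet.member?(available, &1))
end

pick.(ggufs, preference, nil)
# => "Q4_K_M"
pick.(ggufs, preference, 0.5)
# => nil (nothing small enough)
```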

Summary

Types

  • gguf_info() - Information about a GGUF file in a HuggingFace repository.
  • hf_opts() - Options for HuggingFace operations.

Functions

  • auto_select(repo_id, opts \\ []) - Auto-selects the best model from a HuggingFace repository.
  • best_quantization(gguf_files, opts \\ []) - Finds the best available quantization from a list of GGUF files.
  • chat(client, repo_id, messages, opts \\ []) - Chats with a HuggingFace model through Ollama.
  • embed(client, repo_id, input, opts \\ []) - Generates embeddings from a HuggingFace model through Ollama.
  • extract_quantization(filename) - Extracts the quantization type from a GGUF filename.
  • generate(client, repo_id, prompt, opts \\ []) - Generates a completion from a HuggingFace model through Ollama.
  • hf_model?(model_ref) - Checks if a model reference is a HuggingFace model.
  • list_gguf_files(repo_id, opts \\ []) - Lists all GGUF files in a HuggingFace repository.
  • model_info(repo_id, opts \\ []) - Gets model information from HuggingFace Hub.
  • model_ref(repo_id, opts \\ []) - Builds an Ollama model reference from a HuggingFace repository ID.
  • parse_model_ref(model_ref) - Parses an Ollama HuggingFace model reference into its components.
  • pull(client, repo_id, opts \\ []) - Pulls a HuggingFace model through Ollama.
  • quant_preference() - Returns the default quantization preference order.

Types

gguf_info()

@type gguf_info() :: %{
  filename: String.t(),
  size_bytes: non_neg_integer(),
  size_gb: float(),
  quantization: String.t(),
  ollama_tag: String.t()
}

Information about a GGUF file in a HuggingFace repository.
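For illustration, a gguf_info() value might look like the map below. The numbers are made up; per the list_gguf_files/2 docs, :size_gb is the byte size converted to gigabytes and rounded to 2 decimal places.

```elixir
# Hypothetical gguf_info() value; field values are illustrative.
info = %{
  filename: "Llama-3.2-1B-Instruct-Q4_K_M.gguf",
  size_bytes: 750_000_000,
  size_gb: 0.75,
  quantization: "Q4_K_M",
  ollama_tag: "Q4_K_M"
}

info.quantization
# => "Q4_K_M"
```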

hf_opts()

@type hf_opts() :: [quantization: String.t(), revision: String.t(), token: String.t()]

Options for HuggingFace operations.

Functions

auto_select(repo_id, opts \\ [])

@spec auto_select(
  String.t(),
  keyword()
) :: {:ok, String.t(), gguf_info()} | {:error, term()}

Auto-selects the best model from a HuggingFace repository.

Discovers available GGUF files and selects the optimal quantization based on the preference order.

Parameters

  • repo_id - HuggingFace repository ID
  • opts - Options:
    • :quantization - Force a specific quantization instead of auto-selecting
    • :max_size_gb - Maximum file size in GB
    • :revision - Git revision (default: "main")
    • :token - HuggingFace API token

Returns

A tuple of {:ok, model_ref, gguf_info} where:

  • model_ref - The full Ollama model reference (e.g., "hf.co/repo:Q4_K_M")
  • gguf_info - The selected GGUF file info map

Examples

{:ok, model_ref, info} = Ollixir.HuggingFace.auto_select("bartowski/Llama-3.2-1B-Instruct-GGUF")
# => {:ok, "hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF:Q4_K_M", %{quantization: "Q4_K_M", ...}}

# With size constraint
{:ok, model_ref, info} = Ollixir.HuggingFace.auto_select("bartowski/Llama-3.2-1B-Instruct-GGUF",
  max_size_gb: 0.7
)

# Force specific quantization
{:ok, model_ref, info} = Ollixir.HuggingFace.auto_select("bartowski/Llama-3.2-1B-Instruct-GGUF",
  quantization: "Q8_0"
)

best_quantization(gguf_files, opts \\ [])

@spec best_quantization(
  [gguf_info()],
  keyword()
) :: String.t() | nil

Finds the best available quantization from a list of GGUF files.

Uses the default preference order to select the highest-priority quantization that is available in the given list.

Parameters

  • gguf_files - List of GGUF info maps from list_gguf_files/2
  • opts - Options:
    • :preference - Custom preference list (default: quant_preference/0)
    • :max_size_gb - Maximum file size in GB (filters out larger files)

Examples

{:ok, ggufs} = Ollixir.HuggingFace.list_gguf_files("bartowski/Llama-3.2-1B-Instruct-GGUF")

Ollixir.HuggingFace.best_quantization(ggufs)
# => "Q4_K_M"

Ollixir.HuggingFace.best_quantization(ggufs, max_size_gb: 1.0)
# => "Q4_K_M" (if under 1GB) or next smallest

chat(client, repo_id, messages, opts \\ [])

@spec chat(Ollixir.client(), String.t(), [map()], keyword()) ::
  {:ok, term()} | {:error, term()}

Chats with a HuggingFace model through Ollama.

This is a convenience wrapper around Ollixir.chat/2 that builds the correct model reference format.

Parameters

  • client - Ollixir client from Ollixir.init/1
  • repo_id - HuggingFace repository ID
  • messages - List of message maps with :role and :content
  • opts - Options:
    • :quantization - Quantization tag (recommended)
    • :stream - Stream responses (default: false)
    • Other options passed to Ollixir.chat/2

Examples

client = Ollixir.init()

{:ok, response} = Ollixir.HuggingFace.chat(client, "bartowski/Llama-3.2-1B-Instruct-GGUF",
  [%{role: "user", content: "Hello!"}],
  quantization: "Q4_K_M"
)

IO.puts(response["message"]["content"])

# With streaming
{:ok, stream} = Ollixir.HuggingFace.chat(client, "bartowski/Llama-3.2-1B-Instruct-GGUF",
  [%{role: "user", content: "Tell me a story"}],
  quantization: "Q4_K_M",
  stream: true
)
Enum.each(stream, fn chunk ->
  if content = get_in(chunk, ["message", "content"]), do: IO.write(content)
end)

embed(client, repo_id, input, opts \\ [])

@spec embed(Ollixir.client(), String.t(), String.t() | [String.t()], keyword()) ::
  {:ok, term()} | {:error, term()}

Generates embeddings from a HuggingFace model through Ollama.

This is a convenience wrapper around Ollixir.embed/2 that builds the correct model reference format.

Parameters

  • client - Ollixir client from Ollixir.init/1
  • repo_id - HuggingFace repository ID (must be an embedding model)
  • input - Text or list of texts to embed
  • opts - Options:
    • :quantization - Quantization tag (recommended)
    • Other options passed to Ollixir.embed/2

Examples

client = Ollixir.init()

{:ok, response} = Ollixir.HuggingFace.embed(client, "nomic-ai/nomic-embed-text-v1.5-GGUF",
  "Hello world",
  quantization: "Q4_K_M"
)

embeddings = response["embeddings"]

extract_quantization(filename)

@spec extract_quantization(String.t()) :: String.t()

Extracts the quantization type from a GGUF filename.

Parses common quantization patterns from filenames like:

  • Llama-3.2-1B-Instruct-Q4_K_M.gguf -> "Q4_K_M"
  • model-IQ3_M.gguf -> "IQ3_M"
  • model-Q6_K.gguf -> "Q6_K"
  • model-f16.gguf -> "F16"

Examples

iex> Ollixir.HuggingFace.extract_quantization("Llama-3.2-1B-Instruct-Q4_K_M.gguf")
"Q4_K_M"

iex> Ollixir.HuggingFace.extract_quantization("model-IQ3_M.gguf")
"IQ3_M"

iex> Ollixir.HuggingFace.extract_quantization("model-Q6_K.gguf")
"Q6_K"

iex> Ollixir.HuggingFace.extract_quantization("unknown-format.gguf")
"unknown"

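One plausible way to implement this extraction (not necessarily the module's actual source) is to match the quantization suffix with a case-insensitive regex, upcase the result, and fall back to "unknown":

```elixir
# Illustrative sketch; the real implementation may differ.
extract = fn filename ->
  # Match a quantization token (Q4_K_M, IQ3_M, F16, ...) just before ".gguf"
  case Regex.run(~r/(?:^|[-._])((?:IQ|Q)\d[A-Za-z0-9_]*|F16|BF16|F32)\.gguf$/i, filename) do
    [_, quant] -> String.upcase(quant)
    nil -> "unknown"
  end
end

extract.("Llama-3.2-1B-Instruct-Q4_K_M.gguf")
# => "Q4_K_M"
extract.("model-f16.gguf")
# => "F16"
extract.("unknown-format.gguf")
# => "unknown"
```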
generate(client, repo_id, prompt, opts \\ [])

@spec generate(Ollixir.client(), String.t(), String.t(), keyword()) ::
  {:ok, term()} | {:error, term()}

Generates a completion from a HuggingFace model through Ollama.

This is a convenience wrapper around Ollixir.generate/2 (or Ollixir.completion/2) that builds the correct model reference format.

Parameters

  • client - Ollixir client from Ollixir.init/1
  • repo_id - HuggingFace repository ID
  • prompt - The prompt string
  • opts - Options:
    • :quantization - Quantization tag (recommended)
    • :stream - Stream responses (default: false)
    • Other options passed to Ollixir.generate/2

Examples

client = Ollixir.init()

{:ok, response} = Ollixir.HuggingFace.generate(client, "bartowski/Llama-3.2-1B-Instruct-GGUF",
  "Once upon a time",
  quantization: "Q4_K_M"
)

IO.puts(response["response"])

hf_model?(model_ref)

@spec hf_model?(String.t()) :: boolean()

Checks if a model reference is a HuggingFace model.

Examples

iex> Ollixir.HuggingFace.hf_model?("hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF:Q4_K_M")
true

iex> Ollixir.HuggingFace.hf_model?("llama3.2")
false

list_gguf_files(repo_id, opts \\ [])

@spec list_gguf_files(
  String.t(),
  keyword()
) :: {:ok, [gguf_info()]} | {:error, term()}

Lists all GGUF files in a HuggingFace repository.

Uses the HuggingFace Hub API to discover available GGUF model files, extracting quantization type and file size for each.

Parameters

  • repo_id - HuggingFace repository ID
  • opts - Options passed to HfHub.Api.list_repo_tree/2:
    • :revision - Git revision (default: "main")
    • :token - HuggingFace API token for private repos

Returns

A tuple of {:ok, gguf_list}, where each map in gguf_list contains:

  • :filename - Full filename (e.g., "Llama-3.2-1B-Instruct-Q4_K_M.gguf")
  • :size_bytes - File size in bytes
  • :size_gb - File size in gigabytes (rounded to 2 decimal places)
  • :quantization - Extracted quantization type (e.g., "Q4_K_M")
  • :ollama_tag - The tag to use with Ollama (uppercase quantization)

Examples

{:ok, ggufs} = Ollixir.HuggingFace.list_gguf_files("bartowski/Llama-3.2-1B-Instruct-GGUF")
# => [
#   %{filename: "Llama-3.2-1B-Instruct-Q4_K_M.gguf", size_gb: 0.75, quantization: "Q4_K_M", ...},
#   %{filename: "Llama-3.2-1B-Instruct-Q8_0.gguf", size_gb: 1.23, quantization: "Q8_0", ...},
#   ...
# ]

model_info(repo_id, opts \\ [])

@spec model_info(
  String.t(),
  keyword()
) :: {:ok, map()} | {:error, term()}

Gets model information from HuggingFace Hub.

Returns metadata about the model including downloads, tags, and file list.

Parameters

  • repo_id - HuggingFace repository ID
  • opts - Options:
    • :revision - Git revision (default: "main")
    • :token - HuggingFace API token

Examples

{:ok, info} = Ollixir.HuggingFace.model_info("bartowski/Llama-3.2-1B-Instruct-GGUF")
IO.puts("Downloads: #{info.downloads}")
IO.puts("Tags: #{Enum.join(info.tags, ", ")}")

model_ref(repo_id, opts \\ [])

@spec model_ref(
  String.t(),
  keyword()
) :: String.t()

Builds an Ollama model reference from a HuggingFace repository ID.

Ollama natively supports HuggingFace models using the format: hf.co/{username}/{repository}:{quantization}

Parameters

  • repo_id - HuggingFace repository ID (e.g., "bartowski/Llama-3.2-1B-Instruct-GGUF")
  • opts - Options:
    • :quantization - Quantization tag (e.g., "Q4_K_M", "IQ3_M")

Examples

iex> Ollixir.HuggingFace.model_ref("bartowski/Llama-3.2-1B-Instruct-GGUF")
"hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF"

iex> Ollixir.HuggingFace.model_ref("bartowski/Llama-3.2-1B-Instruct-GGUF", quantization: "Q8_0")
"hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF:Q8_0"

parse_model_ref(model_ref)

@spec parse_model_ref(String.t()) ::
  {:ok, %{repo_id: String.t(), quantization: String.t() | nil}}
  | {:error, :not_hf_model}

Parses an Ollama HuggingFace model reference into its components.

Examples

iex> Ollixir.HuggingFace.parse_model_ref("hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF:Q4_K_M")
{:ok, %{repo_id: "bartowski/Llama-3.2-1B-Instruct-GGUF", quantization: "Q4_K_M"}}

iex> Ollixir.HuggingFace.parse_model_ref("hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF")
{:ok, %{repo_id: "bartowski/Llama-3.2-1B-Instruct-GGUF", quantization: nil}}

iex> Ollixir.HuggingFace.parse_model_ref("llama3.2")
{:error, :not_hf_model}

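The parsing above can be sketched in plain Elixir: strip the "hf.co/" prefix, then split at most once on ":" to separate the repository ID from an optional quantization tag. This is an illustrative reimplementation, not the module's actual source:

```elixir
# Illustrative sketch of the parsing; the real implementation may differ.
parse = fn
  "hf.co/" <> rest ->
    # At most one split: repo IDs contain "/", but the quantization
    # tag is everything after the first ":".
    case String.split(rest, ":", parts: 2) do
      [repo_id, quant] -> {:ok, %{repo_id: repo_id, quantization: quant}}
      [repo_id] -> {:ok, %{repo_id: repo_id, quantization: nil}}
    end

  _other ->
    {:error, :not_hf_model}
end

parse.("hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF:Q4_K_M")
# => {:ok, %{repo_id: "bartowski/Llama-3.2-1B-Instruct-GGUF", quantization: "Q4_K_M"}}
```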
pull(client, repo_id, opts \\ [])

@spec pull(Ollixir.client(), String.t(), keyword()) ::
  {:ok, term()} | {:error, term()}

Pulls a HuggingFace model through Ollama.

This is a convenience wrapper around Ollixir.pull_model/2 that builds the correct model reference format.

Parameters

  • client - Ollixir client from Ollixir.init/1
  • repo_id - HuggingFace repository ID
  • opts - Options:
    • :quantization - Quantization tag (recommended)
    • :stream - Stream progress updates (default: false)
    • Other options passed to Ollixir.pull_model/2

Examples

client = Ollixir.init()

# Pull specific quantization
{:ok, response} = Ollixir.HuggingFace.pull(client, "bartowski/Llama-3.2-1B-Instruct-GGUF",
  quantization: "Q4_K_M"
)

# Pull with streaming progress
{:ok, stream} = Ollixir.HuggingFace.pull(client, "bartowski/Llama-3.2-1B-Instruct-GGUF",
  quantization: "Q4_K_M",
  stream: true
)
Enum.each(stream, &IO.inspect/1)

quant_preference()

@spec quant_preference() :: [String.t()]

Returns the default quantization preference order.

This is the order used by best_quantization/1 and auto_select/2 when choosing the optimal quantization for a model.

Examples

Ollixir.HuggingFace.quant_preference()
# => ["Q4_K_M", "Q4_K_S", "Q4_K", "Q4_K_L", "Q5_K_M", ...]
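To prioritize a different quantization while keeping the rest of the default order, you can reorder the list and pass it as :preference to best_quantization/2. The list below is an illustrative subset of the defaults, not the full return value of quant_preference/0:

```elixir
# Illustrative subset of the default order; the real list is longer.
default = ["Q4_K_M", "Q4_K_S", "Q5_K_M", "Q6_K", "Q8_0"]

# Move Q8_0 to the front, keeping the remaining order intact.
custom = ["Q8_0" | List.delete(default, "Q8_0")]
# => ["Q8_0", "Q4_K_M", "Q4_K_S", "Q5_K_M", "Q6_K"]
```

In practice you would build `custom` from `Ollixir.HuggingFace.quant_preference()` and pass it along, e.g. `Ollixir.HuggingFace.best_quantization(ggufs, preference: custom)`.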