HuggingFace Hub integration for Ollixir.
Optional Dependency
This module requires the hf_hub package. Add it to your dependencies:
{:hf_hub, "~> 0.1.3"}

The module will not be available if hf_hub is not installed.
This module integrates Ollixir with HuggingFace Hub, enabling you to:
- Discover GGUF model files in HuggingFace repositories
- Auto-select optimal quantization based on preferences
- Build Ollama-compatible model references
- Pull and run HuggingFace models directly through Ollama
Overview
Ollama natively supports running GGUF models from HuggingFace Hub using the
hf.co/{username}/{repository}:{quantization} model reference format. This module
adds discovery and convenience features on top of that capability.
Quick Start
# Initialize Ollixir client
client = Ollixir.init()
# Discover available GGUF files
{:ok, ggufs} = Ollixir.HuggingFace.list_gguf_files("bartowski/Llama-3.2-1B-Instruct-GGUF")
# Auto-select best quantization
{:ok, model_ref, info} = Ollixir.HuggingFace.auto_select("bartowski/Llama-3.2-1B-Instruct-GGUF")
# Pull and chat
{:ok, _} = Ollixir.HuggingFace.pull(client, "bartowski/Llama-3.2-1B-Instruct-GGUF", quantization: "Q4_K_M")
{:ok, response} = Ollixir.HuggingFace.chat(client, "bartowski/Llama-3.2-1B-Instruct-GGUF",
[%{role: "user", content: "Hello!"}],
quantization: "Q4_K_M"
)

Direct Usage (No Discovery)
If you already know the repository and quantization you want, you can skip this module entirely and use Ollixir directly:
Ollixir.chat(client,
model: "hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF:Q4_K_M",
messages: [%{role: "user", content: "Hello!"}]
)

Quantization Selection
The module uses a preference order optimized for quality/size balance:
- Q4_K_M, Q4_K_S (best balance for most users)
- Q5_K_M, Q5_K_S (higher quality, larger size)
- Q6_K, Q8_0 (even higher quality)
- IQ4_XS, IQ3_M (smaller, for constrained environments)
- F16, BF16 (full precision, largest)
You can also specify your own preference or filter by maximum size.
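The selection logic amounts to a first-match scan over the preference order. A minimal standalone sketch, independent of the library (the file maps below are made up for illustration and only mimic the shape of gguf_info/0):

```elixir
# Illustrative sketch of preference-based selection (not the library source):
# pick the first quantization in the preference order that the repository offers.
preference = ["Q4_K_M", "Q4_K_S", "Q5_K_M", "Q5_K_S", "Q6_K", "Q8_0"]

# Hypothetical discovery results, shaped like gguf_info maps
ggufs = [
  %{quantization: "Q8_0", size_gb: 1.23},
  %{quantization: "Q4_K_M", size_gb: 0.75}
]

available = MapSet.new(ggufs, & &1.quantization)
best = Enum.find(preference, &MapSet.member?(available, &1))
# best == "Q4_K_M"
```

A custom :preference list or a :max_size_gb filter simply changes the list being scanned or pre-filters ggufs before the scan.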
Summary
Types
Information about a GGUF file in a HuggingFace repository.
Options for HuggingFace operations.
Functions
Auto-selects the best model from a HuggingFace repository.
Finds the best available quantization from a list of GGUF files.
Chats with a HuggingFace model through Ollama.
Generates embeddings from a HuggingFace model through Ollama.
Extracts the quantization type from a GGUF filename.
Generates a completion from a HuggingFace model through Ollama.
Checks if a model reference is a HuggingFace model.
Lists all GGUF files in a HuggingFace repository.
Gets model information from HuggingFace Hub.
Builds an Ollama model reference from a HuggingFace repository ID.
Parses an Ollama HuggingFace model reference into its components.
Pulls a HuggingFace model through Ollama.
Returns the default quantization preference order.
Types
@type gguf_info() :: %{
  filename: String.t(),
  size_bytes: non_neg_integer(),
  size_gb: float(),
  quantization: String.t(),
  ollama_tag: String.t()
}
Information about a GGUF file in a HuggingFace repository.
Options for HuggingFace operations.
Functions
Auto-selects the best model from a HuggingFace repository.
Discovers available GGUF files and selects the optimal quantization based on the preference order.
Parameters
- repo_id - HuggingFace repository ID
- opts - Options:
  - :quantization - Force a specific quantization instead of auto-selecting
  - :max_size_gb - Maximum file size in GB
  - :revision - Git revision (default: "main")
  - :token - HuggingFace API token
Returns
A tuple of {:ok, model_ref, gguf_info} where:
- model_ref - The full Ollama model reference (e.g., "hf.co/repo:Q4_K_M")
- gguf_info - The selected GGUF file info map
Examples
{:ok, model_ref, info} = Ollixir.HuggingFace.auto_select("bartowski/Llama-3.2-1B-Instruct-GGUF")
# => {:ok, "hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF:Q4_K_M", %{quantization: "Q4_K_M", ...}}
# With size constraint
{:ok, model_ref, info} = Ollixir.HuggingFace.auto_select("bartowski/Llama-3.2-1B-Instruct-GGUF",
max_size_gb: 0.7
)
# Force specific quantization
{:ok, model_ref, info} = Ollixir.HuggingFace.auto_select("bartowski/Llama-3.2-1B-Instruct-GGUF",
quantization: "Q8_0"
)
Finds the best available quantization from a list of GGUF files.
Uses the default preference order to select the highest-priority quantization that is available in the given list.
Parameters
- gguf_files - List of GGUF info maps from list_gguf_files/2
- opts - Options:
  - :preference - Custom preference list (default: quant_preference/0)
  - :max_size_gb - Maximum file size in GB (filters out larger files)
Examples
{:ok, ggufs} = Ollixir.HuggingFace.list_gguf_files("bartowski/Llama-3.2-1B-Instruct-GGUF")
Ollixir.HuggingFace.best_quantization(ggufs)
# => "Q4_K_M"
Ollixir.HuggingFace.best_quantization(ggufs, max_size_gb: 1.0)
# => "Q4_K_M" (if under 1GB) or next smallest
Chats with a HuggingFace model through Ollama.
This is a convenience wrapper around Ollixir.chat/2 that builds
the correct model reference format.
Parameters
- client - Ollixir client from Ollixir.init/1
- repo_id - HuggingFace repository ID
- messages - List of message maps with :role and :content
- opts - Options:
  - :quantization - Quantization tag (recommended)
  - :stream - Stream responses (default: false)
  - Other options are passed to Ollixir.chat/2
Examples
client = Ollixir.init()
{:ok, response} = Ollixir.HuggingFace.chat(client, "bartowski/Llama-3.2-1B-Instruct-GGUF",
[%{role: "user", content: "Hello!"}],
quantization: "Q4_K_M"
)
IO.puts(response["message"]["content"])
# With streaming
{:ok, stream} = Ollixir.HuggingFace.chat(client, "bartowski/Llama-3.2-1B-Instruct-GGUF",
[%{role: "user", content: "Tell me a story"}],
quantization: "Q4_K_M",
stream: true
)
Enum.each(stream, fn chunk ->
if content = get_in(chunk, ["message", "content"]), do: IO.write(content)
end)
@spec embed(Ollixir.client(), String.t(), String.t() | [String.t()], keyword()) :: {:ok, term()} | {:error, term()}
Generates embeddings from a HuggingFace model through Ollama.
This is a convenience wrapper around Ollixir.embed/2 that builds
the correct model reference format.
Parameters
- client - Ollixir client from Ollixir.init/1
- repo_id - HuggingFace repository ID (must be an embedding model)
- input - Text or list of texts to embed
- opts - Options:
  - :quantization - Quantization tag (recommended)
  - Other options are passed to Ollixir.embed/2
Examples
client = Ollixir.init()
{:ok, response} = Ollixir.HuggingFace.embed(client, "nomic-ai/nomic-embed-text-v1.5-GGUF",
"Hello world",
quantization: "Q4_K_M"
)
embeddings = response["embeddings"]
Extracts the quantization type from a GGUF filename.
Parses common quantization patterns from filenames like:
- Llama-3.2-1B-Instruct-Q4_K_M.gguf -> "Q4_K_M"
- model-IQ3_M.gguf -> "IQ3_M"
- model-Q6_K.gguf -> "Q6_K"
- model-f16.gguf -> "F16"
Examples
iex> Ollixir.HuggingFace.extract_quantization("Llama-3.2-1B-Instruct-Q4_K_M.gguf")
"Q4_K_M"
iex> Ollixir.HuggingFace.extract_quantization("model-IQ3_M.gguf")
"IQ3_M"
iex> Ollixir.HuggingFace.extract_quantization("model-Q6_K.gguf")
"Q6_K"
iex> Ollixir.HuggingFace.extract_quantization("unknown-format.gguf")
"unknown"
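Extraction boils down to matching a quantization token in the filename. A hypothetical regex-based sketch (the library's actual pattern may differ):

```elixir
# Sketch: match Q*/IQ* quantization tokens or F16/BF16, case-insensitively,
# falling back to "unknown" when no token is found.
extract = fn filename ->
  case Regex.run(~r/(?:IQ|Q)\d[A-Z0-9_]*|(?:BF|F)16/i, filename) do
    [quant] -> String.upcase(quant)
    nil -> "unknown"
  end
end

extract.("Llama-3.2-1B-Instruct-Q4_K_M.gguf")  # => "Q4_K_M"
extract.("unknown-format.gguf")                # => "unknown"
```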
@spec generate(Ollixir.client(), String.t(), String.t(), keyword()) :: {:ok, term()} | {:error, term()}
Generates a completion from a HuggingFace model through Ollama.
This is a convenience wrapper around Ollixir.generate/2 (or Ollixir.completion/2)
that builds the correct model reference format.
Parameters
- client - Ollixir client from Ollixir.init/1
- repo_id - HuggingFace repository ID
- prompt - The prompt string
- opts - Options:
  - :quantization - Quantization tag (recommended)
  - :stream - Stream responses (default: false)
  - Other options are passed to Ollixir.generate/2
Examples
client = Ollixir.init()
{:ok, response} = Ollixir.HuggingFace.generate(client, "bartowski/Llama-3.2-1B-Instruct-GGUF",
"Once upon a time",
quantization: "Q4_K_M"
)
IO.puts(response["response"])
Checks if a model reference is a HuggingFace model.
Examples
iex> Ollixir.HuggingFace.hf_model?("hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF:Q4_K_M")
true
iex> Ollixir.HuggingFace.hf_model?("llama3.2")
false
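The predicate can be pictured as a prefix check on the model reference; a standalone sketch under that assumption (not the library's actual implementation, which may accept additional reference forms):

```elixir
# Sketch: treat anything with an "hf.co/" prefix as a HuggingFace reference.
hf_model? = fn model -> String.starts_with?(model, "hf.co/") end

hf_model?.("hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF:Q4_K_M")  # => true
hf_model?.("llama3.2")                                           # => false
```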
Lists all GGUF files in a HuggingFace repository.
Uses the HuggingFace Hub API to discover available GGUF model files, extracting quantization type and file size for each.
Parameters
- repo_id - HuggingFace repository ID
- opts - Options passed to HfHub.Api.list_repo_tree/2:
  - :revision - Git revision (default: "main")
  - :token - HuggingFace API token for private repos
Returns
A list of maps containing:
- :filename - Full filename (e.g., "Llama-3.2-1B-Instruct-Q4_K_M.gguf")
- :size_bytes - File size in bytes
- :size_gb - File size in gigabytes (rounded to 2 decimal places)
- :quantization - Extracted quantization type (e.g., "Q4_K_M")
- :ollama_tag - The tag to use with Ollama (uppercase quantization)
Examples
{:ok, ggufs} = Ollixir.HuggingFace.list_gguf_files("bartowski/Llama-3.2-1B-Instruct-GGUF")
# => [
# %{filename: "Llama-3.2-1B-Instruct-Q4_K_M.gguf", size_gb: 0.75, quantization: "Q4_K_M", ...},
# %{filename: "Llama-3.2-1B-Instruct-Q8_0.gguf", size_gb: 1.23, quantization: "Q8_0", ...},
# ...
# ]
Gets model information from HuggingFace Hub.
Returns metadata about the model including downloads, tags, and file list.
Parameters
- repo_id - HuggingFace repository ID
- opts - Options passed to HfHub.Api.model_info/2
Examples
{:ok, info} = Ollixir.HuggingFace.model_info("bartowski/Llama-3.2-1B-Instruct-GGUF")
IO.puts("Downloads: #{info.downloads}")
IO.puts("Tags: #{Enum.join(info.tags, ", ")}")
Builds an Ollama model reference from a HuggingFace repository ID.
Ollama natively supports HuggingFace models using the format:
hf.co/{username}/{repository}:{quantization}
Parameters
- repo_id - HuggingFace repository ID (e.g., "bartowski/Llama-3.2-1B-Instruct-GGUF")
- opts - Options:
  - :quantization - Quantization tag (e.g., "Q4_K_M", "IQ3_M")
Examples
iex> Ollixir.HuggingFace.model_ref("bartowski/Llama-3.2-1B-Instruct-GGUF")
"hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF"
iex> Ollixir.HuggingFace.model_ref("bartowski/Llama-3.2-1B-Instruct-GGUF", quantization: "Q8_0")
"hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF:Q8_0"
@spec parse_model_ref(String.t()) :: {:ok, %{repo_id: String.t(), quantization: String.t() | nil}} | {:error, :not_hf_model}
Parses an Ollama HuggingFace model reference into its components.
Examples
iex> Ollixir.HuggingFace.parse_model_ref("hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF:Q4_K_M")
{:ok, %{repo_id: "bartowski/Llama-3.2-1B-Instruct-GGUF", quantization: "Q4_K_M"}}
iex> Ollixir.HuggingFace.parse_model_ref("hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF")
{:ok, %{repo_id: "bartowski/Llama-3.2-1B-Instruct-GGUF", quantization: nil}}
iex> Ollixir.HuggingFace.parse_model_ref("llama3.2")
{:error, :not_hf_model}
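Conceptually, the parse strips the hf.co/ prefix and splits off an optional trailing :quantization tag. A standalone sketch (not the library source):

```elixir
# Sketch of parsing an hf.co model reference into repo_id and quantization.
parse = fn
  "hf.co/" <> rest ->
    # Split on the first ":" only; repository paths contain "/" but not ":".
    case String.split(rest, ":", parts: 2) do
      [repo, quant] -> {:ok, %{repo_id: repo, quantization: quant}}
      [repo] -> {:ok, %{repo_id: repo, quantization: nil}}
    end

  _other ->
    {:error, :not_hf_model}
end

parse.("hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF:Q4_K_M")
# => {:ok, %{repo_id: "bartowski/Llama-3.2-1B-Instruct-GGUF", quantization: "Q4_K_M"}}
```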
@spec pull(Ollixir.client(), String.t(), keyword()) :: {:ok, term()} | {:error, term()}
Pulls a HuggingFace model through Ollama.
This is a convenience wrapper around Ollixir.pull_model/2 that builds
the correct model reference format.
Parameters
- client - Ollixir client from Ollixir.init/1
- repo_id - HuggingFace repository ID
- opts - Options:
  - :quantization - Quantization tag (recommended)
  - :stream - Stream progress updates (default: false)
  - Other options are passed to Ollixir.pull_model/2
Examples
client = Ollixir.init()
# Pull specific quantization
{:ok, response} = Ollixir.HuggingFace.pull(client, "bartowski/Llama-3.2-1B-Instruct-GGUF",
quantization: "Q4_K_M"
)
# Pull with streaming progress
{:ok, stream} = Ollixir.HuggingFace.pull(client, "bartowski/Llama-3.2-1B-Instruct-GGUF",
quantization: "Q4_K_M",
stream: true
)
Enum.each(stream, &IO.inspect/1)
@spec quant_preference() :: [String.t()]
Returns the default quantization preference order.
This is the order used by best_quantization/1 and auto_select/2 when
choosing the optimal quantization for a model.
Examples
Ollixir.HuggingFace.quant_preference()
# => ["Q4_K_M", "Q4_K_S", "Q4_K", "Q4_K_L", "Q5_K_M", ...]