Run any of the 45,000+ GGUF models on the HuggingFace Hub directly through Ollama. The `Ollixir.HuggingFace` module provides discovery, selection, and convenience functions for working with HuggingFace models.
## Installation
The HuggingFace integration requires the `hf_hub` package. It is an optional dependency: you only need it if you want the `Ollixir.HuggingFace` module's model discovery and auto-selection features.
Add `hf_hub` to your dependencies in `mix.exs`:
```elixir
defp deps do
  [
    {:ollixir, "~> 0.1.1"},
    {:hf_hub, "~> 0.1.3"}  # Required for Ollixir.HuggingFace module
  ]
end
```

Then run:

```bash
mix deps.get
```
> **Note:** If you don't need the discovery features and already know the repository and quantization you want, you can skip `hf_hub` entirely and pass the `hf.co/{repo}:{quantization}` model format straight to Ollixir. See Direct Ollixir Usage below.
## Overview
Ollama natively supports HuggingFace models using the `hf.co/{repo}:{quantization}` format. This module adds:

| Feature | Function | Description |
|---|---|---|
| Discovery | `list_gguf_files/2` | Find available GGUF files and quantizations |
| Selection | `auto_select/2` | Auto-pick the best quantization for your hardware |
| Convenience | `chat/4`, `pull/3` | Wrappers that build correct model references |
| Metadata | `model_info/2` | Get downloads, tags, and other model info |
## Quick Start
```elixir
client = Ollixir.init()

# Option 1: Direct (if you know the repo and quantization)
{:ok, response} = Ollixir.chat(client,
  model: "hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF:Q4_K_M",
  messages: [%{role: "user", content: "Hello!"}]
)

# Option 2: With discovery and auto-selection
alias Ollixir.HuggingFace

{:ok, model_ref, info} = HuggingFace.auto_select("bartowski/Llama-3.2-1B-Instruct-GGUF")
# => {:ok, "hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF:Q4_K_M", %{size_gb: 0.75, ...}}

{:ok, response} = HuggingFace.chat(client, "bartowski/Llama-3.2-1B-Instruct-GGUF",
  [%{role: "user", content: "Hello!"}],
  quantization: "Q4_K_M"
)
```

## Discovering Available Models

### Listing GGUF Files

Find all GGUF files in a HuggingFace repository:

```elixir
{:ok, ggufs} = Ollixir.HuggingFace.list_gguf_files("bartowski/Llama-3.2-1B-Instruct-GGUF")

# Returns a list sorted by size:
# [
#   %{filename: "Llama-3.2-1B-Instruct-IQ3_M.gguf", size_gb: 0.61, quantization: "IQ3_M", ...},
#   %{filename: "Llama-3.2-1B-Instruct-Q4_K_M.gguf", size_gb: 0.75, quantization: "Q4_K_M", ...},
#   %{filename: "Llama-3.2-1B-Instruct-Q8_0.gguf", size_gb: 1.23, quantization: "Q8_0", ...},
#   ...
# ]
```

Each entry contains:

- `filename` - Full filename in the repository
- `size_bytes` - File size in bytes
- `size_gb` - File size in gigabytes
- `quantization` - Extracted quantization type (e.g., `"Q4_K_M"`)
- `ollama_tag` - Tag to use with Ollama (uppercase quantization)
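
Because the entries are plain maps, you can slice the list yourself. For instance, a minimal sketch (using only the fields listed above) that keeps files under 1 GB and takes the largest one:

```elixir
{:ok, ggufs} = Ollixir.HuggingFace.list_gguf_files("bartowski/Llama-3.2-1B-Instruct-GGUF")

# Keep quantizations under 1 GB, then take the largest remaining file.
best_under_1gb =
  ggufs
  |> Enum.filter(fn gguf -> gguf.size_gb <= 1.0 end)
  |> Enum.max_by(& &1.size_gb, fn -> nil end)

# Build an Ollama model reference from the selection.
if best_under_1gb do
  IO.puts("hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF:#{best_under_1gb.ollama_tag}")
end
```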

### Getting Model Metadata

```elixir
{:ok, info} = Ollixir.HuggingFace.model_info("bartowski/Llama-3.2-1B-Instruct-GGUF")

IO.puts("Downloads: #{info.downloads}")
IO.puts("Tags: #{Enum.join(info.tags, ", ")}")
```

## Quantization Selection

### Understanding Quantization Types

| Type | Bits per weight | Quality | Use Case |
|---|---|---|---|
| `Q4_K_M` | ~4 | Good | Best balance for most users |
| `Q4_K_S` | ~4 | Good | Slightly smaller than Q4_K_M |
| `Q5_K_M` | ~5 | Better | Higher quality, moderate size |
| `Q6_K` | ~6 | High | Near-original quality |
| `Q8_0` | 8 | Very high | Minimal quality loss |
| `IQ3_M` | ~3 | Lower | For constrained environments |
| `IQ4_XS` | ~4 | Good | Smallest with decent quality |
| `F16` | 16 | Original | Full precision, largest |
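
The bit widths translate roughly into file size: a GGUF file weighs approximately parameters × bits-per-weight / 8 bytes. A back-of-the-envelope sketch (the bit widths are nominal, and real files differ somewhat because some tensors are stored at other precisions and the file carries metadata):

```elixir
# Rough estimate: size_bytes ≈ parameters * bits_per_weight / 8.
# Bit widths are nominal; actual GGUF sizes deviate somewhat because
# some tensors (e.g. embeddings) are kept at higher precision.
params = 1_240_000_000  # Llama 3.2 1B has roughly 1.24B parameters

for {quant, bits} <- [{"IQ3_M", 3}, {"Q4_K_M", 4}, {"Q8_0", 8}, {"F16", 16}] do
  gb = params * bits / 8 / 1.0e9
  IO.puts("#{quant}: ~#{Float.round(gb, 2)} GB")
end
```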

### Auto-Selection
Let the library pick the best available quantization:

```elixir
# Default: picks Q4_K_M if available, then the next best
{:ok, model_ref, info} = Ollixir.HuggingFace.auto_select("bartowski/Llama-3.2-1B-Instruct-GGUF")

# With a size constraint (e.g., only 1 GB available)
{:ok, model_ref, info} = Ollixir.HuggingFace.auto_select("bartowski/Llama-3.2-1B-Instruct-GGUF",
  max_size_gb: 1.0
)

# Force a specific quantization
{:ok, model_ref, info} = Ollixir.HuggingFace.auto_select("bartowski/Llama-3.2-1B-Instruct-GGUF",
  quantization: "Q8_0"
)
```

### Manual Selection

```elixir
{:ok, ggufs} = Ollixir.HuggingFace.list_gguf_files("bartowski/Llama-3.2-1B-Instruct-GGUF")

# Find the best quantization from the available options
best = Ollixir.HuggingFace.best_quantization(ggufs)
# => "Q4_K_M"

# With a size constraint
best = Ollixir.HuggingFace.best_quantization(ggufs, max_size_gb: 0.7)
# => "IQ3_M" (if Q4_K_M is too large)

# Custom preference order
best = Ollixir.HuggingFace.best_quantization(ggufs, preference: ["Q8_0", "Q6_K", "Q5_K_M"])
```
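
These functions compose: list the files once, pick a quantization, then feed the result to `model_ref/2` (covered in the next section). A small sketch using only the functions shown in this guide:

```elixir
repo = "bartowski/Llama-3.2-1B-Instruct-GGUF"

{:ok, ggufs} = Ollixir.HuggingFace.list_gguf_files(repo)

# Pick the best quantization that fits in 1 GB, then build the reference.
best = Ollixir.HuggingFace.best_quantization(ggufs, max_size_gb: 1.0)
ref = Ollixir.HuggingFace.model_ref(repo, quantization: best)
# => e.g. "hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF:Q4_K_M"
```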

## Building Model References

### Programmatic Reference Building

```elixir
# Without quantization (Ollama picks the default, usually Q4_K_M)
ref = Ollixir.HuggingFace.model_ref("bartowski/Llama-3.2-1B-Instruct-GGUF")
# => "hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF"

# With a specific quantization
ref = Ollixir.HuggingFace.model_ref("bartowski/Llama-3.2-1B-Instruct-GGUF", quantization: "Q8_0")
# => "hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF:Q8_0"
```

### Parsing Model References

```elixir
{:ok, parsed} = Ollixir.HuggingFace.parse_model_ref("hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF:Q4_K_M")
# => %{repo_id: "bartowski/Llama-3.2-1B-Instruct-GGUF", quantization: "Q4_K_M"}

# Check whether a model reference points at HuggingFace
Ollixir.HuggingFace.hf_model?("hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF:Q4_K_M")
# => true

Ollixir.HuggingFace.hf_model?("llama3.2")
# => false
```

## Using HuggingFace Models

### Pulling Models

```elixir
client = Ollixir.init()

# Pull with a specific quantization
{:ok, _} = Ollixir.HuggingFace.pull(client, "bartowski/Llama-3.2-1B-Instruct-GGUF",
  quantization: "Q4_K_M"
)

# Pull with streaming progress
{:ok, stream} = Ollixir.HuggingFace.pull(client, "bartowski/Llama-3.2-1B-Instruct-GGUF",
  quantization: "Q4_K_M",
  stream: true
)

stream
|> Stream.each(fn chunk ->
  case chunk do
    %{"completed" => completed, "total" => total} when total > 0 ->
      IO.write("\rProgress: #{Float.round(completed / total * 100, 1)}%")

    %{"status" => status} ->
      IO.puts(status)

    _ ->
      :ok
  end
end)
|> Stream.run()
```

### Chat

```elixir
{:ok, response} = Ollixir.HuggingFace.chat(client, "bartowski/Llama-3.2-1B-Instruct-GGUF",
  [%{role: "user", content: "What is the capital of France?"}],
  quantization: "Q4_K_M"
)

IO.puts(response["message"]["content"])

# With streaming
{:ok, stream} = Ollixir.HuggingFace.chat(client, "bartowski/Llama-3.2-1B-Instruct-GGUF",
  [%{role: "user", content: "Tell me a story"}],
  quantization: "Q4_K_M",
  stream: true
)

Enum.each(stream, fn chunk ->
  if content = get_in(chunk, ["message", "content"]), do: IO.write(content)
end)
```

### Generate (Completion)

```elixir
{:ok, response} = Ollixir.HuggingFace.generate(client, "bartowski/Llama-3.2-1B-Instruct-GGUF",
  "The quick brown fox",
  quantization: "Q4_K_M"
)

IO.puts(response["response"])
```

### Embeddings

```elixir
# Use an embedding model from HuggingFace
{:ok, response} = Ollixir.HuggingFace.embed(client, "nomic-ai/nomic-embed-text-v1.5-GGUF",
  "Hello world",
  quantization: "Q4_K_M"
)

embeddings = response["embeddings"]
```
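
Embedding vectors are typically compared with cosine similarity. A minimal sketch, assuming each `embed/4` call returns `%{"embeddings" => [vector]}` as shown above (`MyApp.Similarity` is a hypothetical helper, not part of Ollixir):

```elixir
defmodule MyApp.Similarity do
  # Cosine similarity between two equal-length vectors.
  def cosine(a, b) do
    dot = Enum.zip(a, b) |> Enum.reduce(0.0, fn {x, y}, acc -> acc + x * y end)
    norm = fn v -> :math.sqrt(Enum.reduce(v, 0.0, fn x, acc -> acc + x * x end)) end
    dot / (norm.(a) * norm.(b))
  end
end

{:ok, a} = Ollixir.HuggingFace.embed(client, "nomic-ai/nomic-embed-text-v1.5-GGUF", "cat", quantization: "Q4_K_M")
{:ok, b} = Ollixir.HuggingFace.embed(client, "nomic-ai/nomic-embed-text-v1.5-GGUF", "kitten", quantization: "Q4_K_M")

MyApp.Similarity.cosine(hd(a["embeddings"]), hd(b["embeddings"]))
# => closer to 1.0 for semantically similar inputs
```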

## Private Repositories

To access private HuggingFace repositories, set your HuggingFace token:

```bash
export HF_TOKEN="hf_..."
```

Or configure it in your application:

```elixir
# config/config.exs
config :hf_hub,
  token: System.get_env("HF_TOKEN")
```

Then use the module as normal:

```elixir
{:ok, ggufs} = Ollixir.HuggingFace.list_gguf_files("your-org/private-model")
```

## Popular GGUF Repositories

Community members maintain high-quality GGUF quantizations:

| Maintainer | Repository Pattern | Known For |
|---|---|---|
| bartowski | `bartowski/{model}-GGUF` | Comprehensive quant coverage |
| TheBloke | `TheBloke/{model}-GGUF` | Wide model variety |
| MaziyarPanahi | `MaziyarPanahi/{model}-GGUF` | Latest models |

Browse all GGUF models: https://huggingface.co/models?library=gguf
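
The discovery functions work the same across maintainers. For instance, a quick sketch against a TheBloke repository (the repo name is illustrative of the pattern above):

```elixir
# Same call as before, just a different maintainer's repository.
{:ok, ggufs} = Ollixir.HuggingFace.list_gguf_files("TheBloke/Llama-2-7B-Chat-GGUF")

for %{quantization: q, size_gb: gb} <- Enum.take(ggufs, 3) do
  IO.puts("#{q}: #{gb} GB")
end
```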

## Complete Example

```elixir
defmodule MyApp.HuggingFaceChat do
  alias Ollixir.HuggingFace

  def run do
    client = Ollixir.init()
    repo = "bartowski/Llama-3.2-1B-Instruct-GGUF"

    # 1. Discover available quantizations
    IO.puts("Discovering available models...")
    {:ok, ggufs} = HuggingFace.list_gguf_files(repo)

    IO.puts("\nAvailable quantizations:")

    for gguf <- ggufs do
      IO.puts("  #{gguf.quantization}: #{gguf.size_gb} GB")
    end

    # 2. Auto-select the best option
    {:ok, _model_ref, selected} = HuggingFace.auto_select(repo, max_size_gb: 1.0)
    IO.puts("\nSelected: #{selected.quantization} (#{selected.size_gb} GB)")

    # 3. Pull the model
    IO.puts("\nPulling model...")
    {:ok, _} = HuggingFace.pull(client, repo, quantization: selected.quantization)

    # 4. Chat
    IO.puts("\nChatting...")

    {:ok, response} = HuggingFace.chat(client, repo,
      [%{role: "user", content: "What is 2 + 2?"}],
      quantization: selected.quantization
    )

    IO.puts("Response: #{response["message"]["content"]}")
  end
end
```

## Direct Ollixir Usage

If you already know the repository and quantization, you can skip this module entirely and use Ollixir directly:

```elixir
client = Ollixir.init()

# Pull
Ollixir.pull_model(client, name: "hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF:Q4_K_M")

# Chat
Ollixir.chat(client,
  model: "hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF:Q4_K_M",
  messages: [%{role: "user", content: "Hello!"}]
)

# Generate
Ollixir.generate(client,
  model: "hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF:Q4_K_M",
  prompt: "Hello"
)
```

The `Ollixir.HuggingFace` module is most useful when you need to:

- Discover what's available in a repository
- Auto-select based on hardware constraints
- Get model metadata
- Access private repositories with authentication
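
The two styles also combine well. A hedged sketch of a fallback pattern, assuming `auto_select/2` returns an `{:error, reason}` tuple when discovery fails (e.g., offline or rate-limited):

```elixir
client = Ollixir.init()

# Prefer discovery; fall back to a hard-coded reference if it fails.
# Assumes auto_select/2 returns {:error, reason} on failure.
model_ref =
  case Ollixir.HuggingFace.auto_select("bartowski/Llama-3.2-1B-Instruct-GGUF") do
    {:ok, ref, _info} -> ref
    {:error, _reason} -> "hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF:Q4_K_M"
  end

{:ok, response} = Ollixir.chat(client,
  model: model_ref,
  messages: [%{role: "user", content: "Hello!"}]
)
```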