# `Ollixir.HuggingFace`
[🔗](https://github.com/nshkrdotcom/ollixir/blob/main/lib/ollixir/huggingface.ex#L4)

HuggingFace Hub integration for Ollixir.

> #### Optional Dependency {: .info}
>
> This module requires the `hf_hub` package. Add it to your dependencies:
>
>     {:hf_hub, "~> 0.1.3"}
>
> The module will not be available if `hf_hub` is not installed.

This module provides integration with HuggingFace Hub, enabling you to:

- Discover GGUF model files in HuggingFace repositories
- Auto-select optimal quantization based on preferences
- Build Ollama-compatible model references
- Pull and run HuggingFace models directly through Ollama

## Overview

Ollama natively supports running GGUF models from HuggingFace Hub using the
`hf.co/{username}/{repository}:{quantization}` model reference format. This module
adds discovery and convenience features on top of that capability.

## Quick Start

    # Initialize Ollixir client
    client = Ollixir.init()

    # Discover available GGUF files
    {:ok, ggufs} = Ollixir.HuggingFace.list_gguf_files("bartowski/Llama-3.2-1B-Instruct-GGUF")

    # Auto-select best quantization
    {:ok, model_ref, info} = Ollixir.HuggingFace.auto_select("bartowski/Llama-3.2-1B-Instruct-GGUF")

    # Pull and chat
    {:ok, _} = Ollixir.HuggingFace.pull(client, "bartowski/Llama-3.2-1B-Instruct-GGUF", quantization: "Q4_K_M")
    {:ok, response} = Ollixir.HuggingFace.chat(client, "bartowski/Llama-3.2-1B-Instruct-GGUF",
      [%{role: "user", content: "Hello!"}],
      quantization: "Q4_K_M"
    )

## Direct Usage (No Discovery)

If you already know the repository and quantization you want, you can skip
this module entirely and use Ollixir directly:

    Ollixir.chat(client,
      model: "hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF:Q4_K_M",
      messages: [%{role: "user", content: "Hello!"}]
    )

## Quantization Selection

The module uses a preference order optimized for quality/size balance:

1. Q4_K_M, Q4_K_S (best balance for most users)
2. Q5_K_M, Q5_K_S (higher quality, larger size)
3. Q6_K, Q8_0 (even higher quality)
4. IQ4_XS, IQ3_M (smaller, for constrained environments)
5. F16, BF16 (full precision, largest)

You can also specify your own preference or filter by maximum size.
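
The selection rule can be sketched as a pure function: take the first quantization in the preference order that appears among the discovered files, optionally filtering out files over a size limit. This is an illustrative sketch with an abbreviated preference list, not the library's internal implementation (`QuantSelect` is a hypothetical module name):

```elixir
defmodule QuantSelect do
  # Abbreviated version of the preference order described above.
  @preference ~w(Q4_K_M Q4_K_S Q5_K_M Q5_K_S Q6_K Q8_0 IQ4_XS IQ3_M F16 BF16)

  # Returns the first preferred quantization present in `files`
  # (maps with :quantization and :size_gb), skipping files larger
  # than :max_size_gb when given. Returns nil if nothing qualifies.
  def best(files, opts \\ []) do
    max = Keyword.get(opts, :max_size_gb)

    available =
      files
      |> Enum.filter(fn f -> is_nil(max) or f.size_gb <= max end)
      |> MapSet.new(& &1.quantization)

    Enum.find(@preference, &MapSet.member?(available, &1))
  end
end

files = [
  %{quantization: "Q8_0", size_gb: 1.23},
  %{quantization: "Q4_K_M", size_gb: 0.75}
]

QuantSelect.best(files)                   # => "Q4_K_M"
QuantSelect.best(files, max_size_gb: 0.5) # => nil
```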

# `gguf_info`
[🔗](https://github.com/nshkrdotcom/ollixir/blob/main/lib/ollixir/huggingface.ex#L73)

```elixir
@type gguf_info() :: %{
  filename: String.t(),
  size_bytes: non_neg_integer(),
  size_gb: float(),
  quantization: String.t(),
  ollama_tag: String.t()
}
```

Information about a GGUF file in a HuggingFace repository.

# `hf_opts`
[🔗](https://github.com/nshkrdotcom/ollixir/blob/main/lib/ollixir/huggingface.ex#L84)

```elixir
@type hf_opts() :: [quantization: String.t(), revision: String.t(), token: String.t()]
```

Options for HuggingFace operations.

# `auto_select`
[🔗](https://github.com/nshkrdotcom/ollixir/blob/main/lib/ollixir/huggingface.ex#L406)

```elixir
@spec auto_select(
  String.t(),
  keyword()
) :: {:ok, String.t(), gguf_info()} | {:error, term()}
```

Auto-selects the best model from a HuggingFace repository.

Discovers available GGUF files and selects the optimal quantization
based on the preference order.

## Parameters

  - `repo_id` - HuggingFace repository ID
  - `opts` - Options:
    - `:quantization` - Force a specific quantization instead of auto-selecting
    - `:max_size_gb` - Maximum file size in GB
    - `:revision` - Git revision (default: "main")
    - `:token` - HuggingFace API token

## Returns

A tuple of `{:ok, model_ref, gguf_info}` where:
  - `model_ref` - The full Ollama model reference (e.g., "hf.co/repo:Q4_K_M")
  - `gguf_info` - The selected GGUF file info map

## Examples

    {:ok, model_ref, info} = Ollixir.HuggingFace.auto_select("bartowski/Llama-3.2-1B-Instruct-GGUF")
    # => {:ok, "hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF:Q4_K_M", %{quantization: "Q4_K_M", ...}}

    # With size constraint
    {:ok, model_ref, info} = Ollixir.HuggingFace.auto_select("bartowski/Llama-3.2-1B-Instruct-GGUF",
      max_size_gb: 0.7
    )

    # Force specific quantization
    {:ok, model_ref, info} = Ollixir.HuggingFace.auto_select("bartowski/Llama-3.2-1B-Instruct-GGUF",
      quantization: "Q8_0"
    )

# `best_quantization`
[🔗](https://github.com/nshkrdotcom/ollixir/blob/main/lib/ollixir/huggingface.ex#L348)

```elixir
@spec best_quantization(
  [gguf_info()],
  keyword()
) :: String.t() | nil
```

Finds the best available quantization from a list of GGUF files.

Uses the default preference order to select the highest-priority
quantization that is available in the given list.

## Parameters

  - `gguf_files` - List of GGUF info maps from `list_gguf_files/2`
  - `opts` - Options:
    - `:preference` - Custom preference list (default: `quant_preference/0`)
    - `:max_size_gb` - Maximum file size in GB (filters out larger files)

## Examples

    {:ok, ggufs} = Ollixir.HuggingFace.list_gguf_files("bartowski/Llama-3.2-1B-Instruct-GGUF")

    Ollixir.HuggingFace.best_quantization(ggufs)
    # => "Q4_K_M"

    Ollixir.HuggingFace.best_quantization(ggufs, max_size_gb: 1.0)
    # => "Q4_K_M" if it fits under 1 GB, otherwise the next preferred quantization that does

# `chat`
[🔗](https://github.com/nshkrdotcom/ollixir/blob/main/lib/ollixir/huggingface.ex#L523)

```elixir
@spec chat(Ollixir.client(), String.t(), [map()], keyword()) ::
  {:ok, term()} | {:error, term()}
```

Chats with a HuggingFace model through Ollama.

This is a convenience wrapper around `Ollixir.chat/2` that builds
the correct model reference format.

## Parameters

  - `client` - Ollixir client from `Ollixir.init/1`
  - `repo_id` - HuggingFace repository ID
  - `messages` - List of message maps with `:role` and `:content`
  - `opts` - Options:
    - `:quantization` - Quantization tag (recommended)
    - `:stream` - Stream responses (default: false)
    - Other options passed to `Ollixir.chat/2`

## Examples

    client = Ollixir.init()

    {:ok, response} = Ollixir.HuggingFace.chat(client, "bartowski/Llama-3.2-1B-Instruct-GGUF",
      [%{role: "user", content: "Hello!"}],
      quantization: "Q4_K_M"
    )

    IO.puts(response["message"]["content"])

    # With streaming
    {:ok, stream} = Ollixir.HuggingFace.chat(client, "bartowski/Llama-3.2-1B-Instruct-GGUF",
      [%{role: "user", content: "Tell me a story"}],
      quantization: "Q4_K_M",
      stream: true
    )
    Enum.each(stream, fn chunk ->
      if content = get_in(chunk, ["message", "content"]), do: IO.write(content)
    end)

# `embed`
[🔗](https://github.com/nshkrdotcom/ollixir/blob/main/lib/ollixir/huggingface.ex#L596)

```elixir
@spec embed(Ollixir.client(), String.t(), String.t() | [String.t()], keyword()) ::
  {:ok, term()} | {:error, term()}
```

Generates embeddings from a HuggingFace model through Ollama.

This is a convenience wrapper around `Ollixir.embed/2` that builds
the correct model reference format.

## Parameters

  - `client` - Ollixir client from `Ollixir.init/1`
  - `repo_id` - HuggingFace repository ID (must be an embedding model)
  - `input` - Text or list of texts to embed
  - `opts` - Options:
    - `:quantization` - Quantization tag (recommended)
    - Other options passed to `Ollixir.embed/2`

## Examples

    client = Ollixir.init()

    {:ok, response} = Ollixir.HuggingFace.embed(client, "nomic-ai/nomic-embed-text-v1.5-GGUF",
      "Hello world",
      quantization: "Q4_K_M"
    )

    embeddings = response["embeddings"]

# `extract_quantization`
[🔗](https://github.com/nshkrdotcom/ollixir/blob/main/lib/ollixir/huggingface.ex#L278)

```elixir
@spec extract_quantization(String.t()) :: String.t()
```

Extracts the quantization type from a GGUF filename.

Parses common quantization patterns from filenames like:
- `Llama-3.2-1B-Instruct-Q4_K_M.gguf` -> "Q4_K_M"
- `model-IQ3_M.gguf` -> "IQ3_M"
- `model-Q6_K.gguf` -> "Q6_K"
- `model-f16.gguf` -> "F16"

## Examples

    iex> Ollixir.HuggingFace.extract_quantization("Llama-3.2-1B-Instruct-Q4_K_M.gguf")
    "Q4_K_M"

    iex> Ollixir.HuggingFace.extract_quantization("model-IQ3_M.gguf")
    "IQ3_M"

    iex> Ollixir.HuggingFace.extract_quantization("model-Q6_K.gguf")
    "Q6_K"

    iex> Ollixir.HuggingFace.extract_quantization("unknown-format.gguf")
    "unknown"

# `generate`
[🔗](https://github.com/nshkrdotcom/ollixir/blob/main/lib/ollixir/huggingface.ex#L560)

```elixir
@spec generate(Ollixir.client(), String.t(), String.t(), keyword()) ::
  {:ok, term()} | {:error, term()}
```

Generates a completion from a HuggingFace model through Ollama.

This is a convenience wrapper around `Ollixir.generate/2` (or `Ollixir.completion/2`)
that builds the correct model reference format.

## Parameters

  - `client` - Ollixir client from `Ollixir.init/1`
  - `repo_id` - HuggingFace repository ID
  - `prompt` - The prompt string
  - `opts` - Options:
    - `:quantization` - Quantization tag (recommended)
    - `:stream` - Stream responses (default: false)
    - Other options passed to `Ollixir.generate/2`

## Examples

    client = Ollixir.init()

    {:ok, response} = Ollixir.HuggingFace.generate(client, "bartowski/Llama-3.2-1B-Instruct-GGUF",
      "Once upon a time",
      quantization: "Q4_K_M"
    )

    IO.puts(response["response"])

# `hf_model?`
[🔗](https://github.com/nshkrdotcom/ollixir/blob/main/lib/ollixir/huggingface.ex#L180)

```elixir
@spec hf_model?(String.t()) :: boolean()
```

Checks if a model reference is a HuggingFace model.

## Examples

    iex> Ollixir.HuggingFace.hf_model?("hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF:Q4_K_M")
    true

    iex> Ollixir.HuggingFace.hf_model?("llama3.2")
    false

# `list_gguf_files`
[🔗](https://github.com/nshkrdotcom/ollixir/blob/main/lib/ollixir/huggingface.ex#L222)

```elixir
@spec list_gguf_files(
  String.t(),
  keyword()
) :: {:ok, [gguf_info()]} | {:error, term()}
```

Lists all GGUF files in a HuggingFace repository.

Uses the HuggingFace Hub API to discover available GGUF model files,
extracting quantization type and file size for each.

## Parameters

  - `repo_id` - HuggingFace repository ID
  - `opts` - Options passed to `HfHub.Api.list_repo_tree/2`:
    - `:revision` - Git revision (default: "main")
    - `:token` - HuggingFace API token for private repos

## Returns

A list of maps containing:
  - `:filename` - Full filename (e.g., "Llama-3.2-1B-Instruct-Q4_K_M.gguf")
  - `:size_bytes` - File size in bytes
  - `:size_gb` - File size in gigabytes (rounded to 2 decimal places)
  - `:quantization` - Extracted quantization type (e.g., "Q4_K_M")
  - `:ollama_tag` - The tag to use with Ollama (uppercase quantization)

## Examples

    {:ok, ggufs} = Ollixir.HuggingFace.list_gguf_files("bartowski/Llama-3.2-1B-Instruct-GGUF")
    # => [
    #   %{filename: "Llama-3.2-1B-Instruct-Q4_K_M.gguf", size_gb: 0.75, quantization: "Q4_K_M", ...},
    #   %{filename: "Llama-3.2-1B-Instruct-Q8_0.gguf", size_gb: 1.23, quantization: "Q8_0", ...},
    #   ...
    # ]
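
The shape of each entry can be illustrated by mapping a raw (filename, byte size) pair into the `gguf_info` map. A sketch under two stated assumptions: `size_gb` uses decimal gigabytes (10^9 bytes), and the quantization extractor is passed in as a stand-in (`GgufInfo` and `extract_quant` are hypothetical names):

```elixir
defmodule GgufInfo do
  # Builds a gguf_info-shaped map from a filename and byte size.
  # `extract_quant` stands in for the module's real extraction logic.
  def build(filename, size_bytes, extract_quant) do
    quant = extract_quant.(filename)

    %{
      filename: filename,
      size_bytes: size_bytes,
      # Assumes decimal GB; rounded to 2 decimal places as documented.
      size_gb: Float.round(size_bytes / 1_000_000_000, 2),
      quantization: quant,
      ollama_tag: String.upcase(quant)
    }
  end
end

GgufInfo.build("Llama-3.2-1B-Instruct-Q4_K_M.gguf", 750_000_000, fn _ -> "Q4_K_M" end)
# => %{filename: "Llama-3.2-1B-Instruct-Q4_K_M.gguf", size_bytes: 750_000_000,
#      size_gb: 0.75, quantization: "Q4_K_M", ollama_tag: "Q4_K_M"}
```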

# `model_info`
[🔗](https://github.com/nshkrdotcom/ollixir/blob/main/lib/ollixir/huggingface.ex#L625)

```elixir
@spec model_info(
  String.t(),
  keyword()
) :: {:ok, map()} | {:error, term()}
```

Gets model information from HuggingFace Hub.

Returns metadata about the model including downloads, tags, and file list.

## Parameters

  - `repo_id` - HuggingFace repository ID
  - `opts` - Options passed to `HfHub.Api.model_info/2`

## Examples

    {:ok, info} = Ollixir.HuggingFace.model_info("bartowski/Llama-3.2-1B-Instruct-GGUF")
    IO.puts("Downloads: #{info.downloads}")
    IO.puts("Tags: #{Enum.join(info.tags, ", ")}")

# `model_ref`
[🔗](https://github.com/nshkrdotcom/ollixir/blob/main/lib/ollixir/huggingface.ex#L130)

```elixir
@spec model_ref(
  String.t(),
  keyword()
) :: String.t()
```

Builds an Ollama model reference from a HuggingFace repository ID.

Ollama natively supports HuggingFace models using the format:
`hf.co/{username}/{repository}:{quantization}`

## Parameters

  - `repo_id` - HuggingFace repository ID (e.g., "bartowski/Llama-3.2-1B-Instruct-GGUF")
  - `opts` - Options:
    - `:quantization` - Quantization tag (e.g., "Q4_K_M", "IQ3_M")

## Examples

    iex> Ollixir.HuggingFace.model_ref("bartowski/Llama-3.2-1B-Instruct-GGUF")
    "hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF"

    iex> Ollixir.HuggingFace.model_ref("bartowski/Llama-3.2-1B-Instruct-GGUF", quantization: "Q8_0")
    "hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF:Q8_0"

# `parse_model_ref`
[🔗](https://github.com/nshkrdotcom/ollixir/blob/main/lib/ollixir/huggingface.ex#L159)

```elixir
@spec parse_model_ref(String.t()) ::
  {:ok, %{repo_id: String.t(), quantization: String.t() | nil}}
  | {:error, :not_hf_model}
```

Parses an Ollama HuggingFace model reference into its components.

## Examples

    iex> Ollixir.HuggingFace.parse_model_ref("hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF:Q4_K_M")
    {:ok, %{repo_id: "bartowski/Llama-3.2-1B-Instruct-GGUF", quantization: "Q4_K_M"}}

    iex> Ollixir.HuggingFace.parse_model_ref("hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF")
    {:ok, %{repo_id: "bartowski/Llama-3.2-1B-Instruct-GGUF", quantization: nil}}

    iex> Ollixir.HuggingFace.parse_model_ref("llama3.2")
    {:error, :not_hf_model}
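
The behavior documented above can be sketched as a small pattern match on the `hf.co/{user}/{repo}[:{quantization}]` format. This is an illustrative parser, not the library source (`HfRef` is a hypothetical module name):

```elixir
defmodule HfRef do
  # References starting with "hf.co/" are HuggingFace models;
  # an optional ":{quantization}" suffix follows the repo ID.
  def parse("hf.co/" <> rest) do
    case String.split(rest, ":", parts: 2) do
      [repo_id, quant] -> {:ok, %{repo_id: repo_id, quantization: quant}}
      [repo_id] -> {:ok, %{repo_id: repo_id, quantization: nil}}
    end
  end

  # Anything else is not a HuggingFace reference.
  def parse(_other), do: {:error, :not_hf_model}
end

HfRef.parse("hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF:Q4_K_M")
# => {:ok, %{repo_id: "bartowski/Llama-3.2-1B-Instruct-GGUF", quantization: "Q4_K_M"}}
HfRef.parse("llama3.2")
# => {:error, :not_hf_model}
```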

# `pull`
[🔗](https://github.com/nshkrdotcom/ollixir/blob/main/lib/ollixir/huggingface.ex#L476)

```elixir
@spec pull(Ollixir.client(), String.t(), keyword()) ::
  {:ok, term()} | {:error, term()}
```

Pulls a HuggingFace model through Ollama.

This is a convenience wrapper around `Ollixir.pull_model/2` that builds
the correct model reference format.

## Parameters

  - `client` - Ollixir client from `Ollixir.init/1`
  - `repo_id` - HuggingFace repository ID
  - `opts` - Options:
    - `:quantization` - Quantization tag (recommended)
    - `:stream` - Stream progress updates (default: false)
    - Other options passed to `Ollixir.pull_model/2`

## Examples

    client = Ollixir.init()

    # Pull specific quantization
    {:ok, response} = Ollixir.HuggingFace.pull(client, "bartowski/Llama-3.2-1B-Instruct-GGUF",
      quantization: "Q4_K_M"
    )

    # Pull with streaming progress
    {:ok, stream} = Ollixir.HuggingFace.pull(client, "bartowski/Llama-3.2-1B-Instruct-GGUF",
      quantization: "Q4_K_M",
      stream: true
    )
    Enum.each(stream, &IO.inspect/1)

# `quant_preference`
[🔗](https://github.com/nshkrdotcom/ollixir/blob/main/lib/ollixir/huggingface.ex#L321)

```elixir
@spec quant_preference() :: [String.t()]
```

Returns the default quantization preference order.

This is the order used by `best_quantization/2` and `auto_select/2` when
choosing the optimal quantization for a model.

## Examples

    Ollixir.HuggingFace.quant_preference()
    # => ["Q4_K_M", "Q4_K_S", "Q4_K", "Q4_K_L", "Q5_K_M", ...]

---

*Consult [api-reference.md](api-reference.md) for the complete listing*
