Elixir client for the Hugging Face Inference API.
Mirrors the full feature set of the @huggingface/inference npm package:
- 24+ inference providers (Groq, Together, Replicate, Fal.ai, Nebius, …)
- 30+ ML tasks (chat completion, image generation, ASR, embeddings, …)
- Streaming via Server-Sent Events
- Automatic provider routing via the HF Hub API
Quick start
# Create a client
client = HuggingfaceClient.client("hf_your_token")
# Chat completion
{:ok, resp} = HuggingfaceClient.chat_completion(client, %{
model: "meta-llama/Llama-3.1-8B-Instruct",
messages: [%{role: "user", content: "Hello!"}]
})
IO.puts(resp["choices"] |> hd() |> get_in(["message", "content"]))
# Streaming chat completion
{:ok, stream} = HuggingfaceClient.chat_completion_stream(client, %{
model: "meta-llama/Llama-3.1-8B-Instruct",
messages: [%{role: "user", content: "Tell me a story"}]
})
Enum.each(stream, fn chunk ->
IO.write(get_in(chunk, ["choices", Access.at(0), "delta", "content"]) || "")
end)
# Use a different provider
{:ok, resp} = HuggingfaceClient.chat_completion(client, %{
model: "meta-llama/Llama-3.1-8B-Instruct",
provider: "groq",
messages: [%{role: "user", content: "Hi from Groq!"}]
})
# Text-to-image
{:ok, image_bytes} = HuggingfaceClient.text_to_image(client, %{
model: "stabilityai/stable-diffusion-2",
inputs: "a scenic mountain lake at sunset"
})
File.write!("output.png", image_bytes)
# Embeddings
{:ok, embeddings} = HuggingfaceClient.feature_extraction(client, %{
model: "sentence-transformers/all-MiniLM-L6-v2",
inputs: ["Hello world", "Bonjour le monde"]
})

Using a dedicated endpoint
endpoint_client = HuggingfaceClient.endpoint_client(
"hf_token",
"https://xyz.eu-west-1.aws.endpoints.huggingface.cloud/my-model"
)
{:ok, resp} = HuggingfaceClient.text_generation(endpoint_client, %{
inputs: "The answer is"
})

Configuration
config :huggingface_client,
hub_url: "https://huggingface.co", # default
router_url: "https://router.huggingface.co", # default
finch_opts: [
pools: %{
"https://api.groq.com" => [size: 25]
}
]
Summary
Functions
Applies a Jinja2 chat template to a list of messages.
Audio classification. Returns label + score pairs.
Audio-to-audio transformation (source separation, enhancement).
Automatic speech recognition / transcription.
Returns the list of providers that currently support a model.
Chat completion (OpenAI-compatible /v1/chat/completions).
Streaming chat completion. Returns {:ok, stream} where each element is a delta chunk.
Creates a new inference client with the given access token.
Collects all content tokens from a streaming chat completion into a single string.
Returns a lazy stream of plain content token strings from a chat completion stream.
Estimates monocular depth from an image. Returns a depth map useful for 3D reconstruction, AR, robotics.
Document question answering from scanned documents.
Creates a client tied to a specific inference endpoint.
Dense embedding / feature extraction.
Fill-mask (masked language modelling).
Image classification. Returns [%{"label" => ..., "score" => ...}].
Image segmentation. Returns list of %{label, mask, score} maps.
Image + text → image (multimodal). Takes an image input and text prompt, returns a generated image.
Multimodal vision-language: image + text prompt → text response (GPT-4V style).
Used for visual QA, chart understanding, document analysis, multi-turn vision conversations.
Different from image_to_text/2 which captions without a text prompt.
Image + text → video (multimodal). Takes an image input and text prompt, returns a generated video.
Image-to-image transformation (style transfer, inpainting, super-resolution).
Image captioning / image-to-text.
Animate a still image into a short video clip.
Generates segmentation masks (SAM / segment-anything style). Returns masks for all objects detected in an image.
Fetches model metadata and available inference providers from the Hub.
Object detection with bounding boxes.
Extractive question answering.
Renders a Jinja2 template string with the given variables.
Raw inference request — sends inputs directly to a provider with no task-level
validation or response transformation.
Sentence similarity scoring.
Abstractive summarisation.
Table question answering (TAPAS / TaBERT).
Tabular data classification. Returns predicted class indices.
Tabular data regression. Returns predicted float values.
Text classification (sentiment analysis, topic, etc.).
Text generation (completion, non-chat).
Streaming text generation. Returns {:ok, stream} of delta chunks.
Text-to-audio generation (music, effects). Returns audio bytes.
Text-to-image generation.
Text-to-speech synthesis. Returns audio bytes.
Text-to-video generation. Returns video bytes.
Token / entity classification (NER).
Neural machine translation.
Classifies a video clip into predefined categories.
Visual question answering — answer questions about an image.
Zero-shot text classification.
Zero-shot image classification with candidate labels.
Functions
@spec apply_chat_template(String.t(), [map()], map()) :: {:ok, String.t()} | {:error, Exception.t()}
Applies a Jinja2 chat template to a list of messages.
Delegates to HuggingfaceClient.Jinja.apply_chat_template/3.
Most HuggingFace models include a chat template in their
tokenizer_config.json under the "chat_template" key.
Example
template = ~s({% for m in messages %}<{{ m["role"] }}>{{ m["content"] }}</{{ m["role"] }}>{% endfor %})
{:ok, text} = HuggingfaceClient.apply_chat_template(template, [
%{"role" => "user", "content" => "Hello!"}
])
@spec audio_classification(HuggingfaceClient.Client.t(), map()) :: {:ok, list()} | {:error, Exception.t()}
Audio classification. Returns label + score pairs.
@spec audio_to_audio(HuggingfaceClient.Client.t(), map()) :: {:ok, list()} | {:error, Exception.t()}
Audio-to-audio transformation (source separation, enhancement).
@spec automatic_speech_recognition(HuggingfaceClient.Client.t(), map()) :: {:ok, map()} | {:error, Exception.t()}
Automatic speech recognition / transcription.
@spec available_providers(String.t(), keyword()) :: {:ok, [String.t()]} | {:error, Exception.t()}
Returns the list of providers that currently support a model.
Delegates to HuggingfaceClient.Inference.ModelInfo.available_providers/2.
Example
{:ok, providers} = HuggingfaceClient.available_providers("meta-llama/Llama-3.1-8B-Instruct")
# => ["groq", "together", "nebius", ...]
@spec chat_completion(HuggingfaceClient.Client.t(), map()) :: {:ok, map()} | {:error, Exception.t()}
Chat completion (OpenAI-compatible /v1/chat/completions).
Supports all providers that expose the chat completions API.
With provider: "auto" (the default), requests are routed to the first available provider
for the model, ordered by the user's preferences at https://hf.co/settings/inference-providers.
Arguments
- :model - HuggingFace model ID (e.g. "meta-llama/Llama-3.1-8B-Instruct")
- :messages - list of %{role: string, content: string} maps
- :provider - override the provider (e.g. "groq", "together")
- :max_tokens - maximum output tokens
- :temperature - sampling temperature
- :tools - list of function tool definitions
- any other OpenAI chat-completion parameters
Returns
{:ok, %{"choices" => [...], "model" => ..., ...}} or {:error, exception}.
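The :tools argument follows the OpenAI function-tool schema. A minimal sketch (the get_weather tool and its parameters are hypothetical, and whether the model emits a tool call depends on the model and provider):

```elixir
{:ok, resp} = HuggingfaceClient.chat_completion(client, %{
  model: "meta-llama/Llama-3.1-8B-Instruct",
  messages: [%{role: "user", content: "What is the weather in Paris?"}],
  tools: [
    %{
      type: "function",
      function: %{
        name: "get_weather",
        description: "Look up the current weather for a city",
        parameters: %{
          type: "object",
          properties: %{city: %{type: "string"}},
          required: ["city"]
        }
      }
    }
  ]
})

# If the model decides to call a tool, the call appears on the first
# choice's message under "tool_calls"; otherwise this is nil.
tool_calls = resp["choices"] |> hd() |> get_in(["message", "tool_calls"])
```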
@spec chat_completion_stream(HuggingfaceClient.Client.t(), map()) :: {:ok, Enumerable.t()} | {:error, Exception.t()}
Streaming chat completion. Returns {:ok, stream} where each element is a delta chunk.
Example
{:ok, stream} = HuggingfaceClient.chat_completion_stream(client, %{
model: "meta-llama/Llama-3.1-8B-Instruct",
messages: [%{role: "user", content: "Write a haiku"}]
})
Enum.each(stream, fn chunk ->
IO.write(get_in(chunk, ["choices", Access.at(0), "delta", "content"]) || "")
end)
@spec client(String.t() | nil, keyword()) :: HuggingfaceClient.Client.t()
Creates a new inference client with the given access token.
Options
- :provider - Default inference provider. The default value is nil.
- :bill_to - HF organisation to bill requests to. The default value is nil.
- :endpoint_url - Custom endpoint URL. Overrides provider-based routing. The default value is nil.
- :retry_on_503 (boolean/0) - Automatically retry once on HTTP 503 responses. The default value is true.
- :req_opts (keyword/0) - Extra options forwarded to Req. The default value is [].
Examples
client = HuggingfaceClient.client("hf_your_token")
client = HuggingfaceClient.client("hf_token", provider: "groq", bill_to: "my-org")
@spec collect_content(Enumerable.t()) :: String.t()
Collects all content tokens from a streaming chat completion into a single string.
Delegates to HuggingfaceClient.Inference.StreamHelpers.collect_content/1.
Example
{:ok, stream} = HuggingfaceClient.chat_completion_stream(client, args)
text = HuggingfaceClient.collect_content(stream)
@spec content_stream(Enumerable.t()) :: Enumerable.t()
Returns a lazy stream of plain content token strings from a chat completion stream.
Delegates to HuggingfaceClient.Inference.StreamHelpers.content_stream/1.
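A short sketch of piping a chat completion stream through content_stream/1 so each element is a plain string rather than a delta chunk map:

```elixir
{:ok, stream} = HuggingfaceClient.chat_completion_stream(client, %{
  model: "meta-llama/Llama-3.1-8B-Instruct",
  messages: [%{role: "user", content: "Hello!"}]
})

# content_stream/1 lazily extracts the content token from each chunk,
# so downstream code only sees binaries.
stream
|> HuggingfaceClient.content_stream()
|> Enum.each(&IO.write/1)
```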
@spec depth_estimation(HuggingfaceClient.Client.t(), map()) :: {:ok, term()} | {:error, Exception.t()}
Estimates monocular depth from an image. Returns a depth map useful for 3D reconstruction, AR, robotics.
Example
{:ok, result} = HuggingfaceClient.depth_estimation(client, %{
image: "https://example.com/scene.jpg"
})
@spec document_question_answering(HuggingfaceClient.Client.t(), map()) :: {:ok, map()} | {:error, Exception.t()}
Document question answering from scanned documents.
@spec endpoint_client(String.t() | nil, String.t(), keyword()) :: HuggingfaceClient.Client.t()
Creates a client tied to a specific inference endpoint.
Examples
client = HuggingfaceClient.endpoint_client(
"hf_token",
"https://xyz.eu-west-1.aws.endpoints.huggingface.cloud/gpt2"
)
@spec feature_extraction(HuggingfaceClient.Client.t(), map()) :: {:ok, list()} | {:error, Exception.t()}
Dense embedding / feature extraction.
Returns {:ok, [[float], ...]} — a list of embedding vectors.
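Since the result is a plain list of float vectors, similarity can be computed directly. A sketch using cosine similarity over two embeddings (the arithmetic below is ordinary Elixir, not part of this library's API):

```elixir
{:ok, [a, b]} = HuggingfaceClient.feature_extraction(client, %{
  model: "sentence-transformers/all-MiniLM-L6-v2",
  inputs: ["Hello world", "Bonjour le monde"]
})

# Cosine similarity: dot product divided by the product of the norms.
dot = Enum.zip(a, b) |> Enum.reduce(0.0, fn {x, y}, acc -> acc + x * y end)
norm = fn v -> :math.sqrt(Enum.reduce(v, 0.0, fn x, acc -> acc + x * x end)) end
cosine = dot / (norm.(a) * norm.(b))
```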
@spec fill_mask(HuggingfaceClient.Client.t(), map()) :: {:ok, list()} | {:error, Exception.t()}
Fill-mask (masked language modelling).
@spec image_classification(HuggingfaceClient.Client.t(), map()) :: {:ok, list()} | {:error, Exception.t()}
Image classification. Returns [%{"label" => ..., "score" => ...}].
@spec image_segmentation(HuggingfaceClient.Client.t(), map()) :: {:ok, list()} | {:error, Exception.t()}
Image segmentation. Returns list of %{label, mask, score} maps.
@spec image_text_to_image(HuggingfaceClient.Client.t(), map()) :: {:ok, binary()} | {:error, Exception.t()}
Image + text → image (multimodal). Takes an image input and text prompt, returns a generated image.
Recommended model: black-forest-labs/FLUX.1-dev
@spec image_text_to_text(HuggingfaceClient.Client.t(), map()) :: {:ok, term()} | {:error, Exception.t()}
Multimodal vision-language: image + text prompt → text response (GPT-4V style).
Used for visual QA, chart understanding, document analysis, multi-turn vision conversations.
Different from image_to_text/2 which captions without a text prompt.
Example
client = HuggingfaceClient.client(token, model: "llava-hf/llava-1.5-7b-hf")
{:ok, resp} = HuggingfaceClient.image_text_to_text(client, %{
image: "https://example.com/chart.png",
prompt: "What does this chart show?"
})
@spec image_text_to_video(HuggingfaceClient.Client.t(), map()) :: {:ok, binary()} | {:error, Exception.t()}
Image + text → video (multimodal). Takes an image input and text prompt, returns a generated video.
Recommended model: Lightricks/LTX-Video
@spec image_to_image(HuggingfaceClient.Client.t(), map()) :: {:ok, binary()} | {:error, Exception.t()}
Image-to-image transformation (style transfer, inpainting, super-resolution).
@spec image_to_text(HuggingfaceClient.Client.t(), map()) :: {:ok, map()} | {:error, Exception.t()}
Image captioning / image-to-text.
@spec image_to_video(HuggingfaceClient.Client.t(), map()) :: {:ok, binary()} | {:error, Exception.t()}
Animate a still image into a short video clip.
@spec mask_generation(HuggingfaceClient.Client.t(), map()) :: {:ok, term()} | {:error, Exception.t()}
Generates segmentation masks (SAM / segment-anything style). Returns masks for all objects detected in an image.
Example
{:ok, masks} = HuggingfaceClient.mask_generation(client, %{
image: "https://example.com/photo.jpg"
})
@spec model_info(String.t(), keyword()) :: {:ok, HuggingfaceClient.Inference.ModelInfo.t()} | {:error, Exception.t()}
Fetches model metadata and available inference providers from the Hub.
Delegates to HuggingfaceClient.Inference.ModelInfo.fetch/2.
Example
{:ok, info} = HuggingfaceClient.model_info("meta-llama/Llama-3.1-8B-Instruct",
access_token: "hf_..."
)
IO.inspect(info.providers)
@spec object_detection(HuggingfaceClient.Client.t(), map()) :: {:ok, list()} | {:error, Exception.t()}
Object detection with bounding boxes.
@spec question_answering(HuggingfaceClient.Client.t(), map()) :: {:ok, map()} | {:error, Exception.t()}
Extractive question answering.
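A sketch assuming the standard Hugging Face question-answering input shape (a question plus a context passage; the model ID is illustrative):

```elixir
{:ok, answer} = HuggingfaceClient.question_answering(client, %{
  model: "deepset/roberta-base-squad2",
  inputs: %{
    question: "Where does Sarah live?",
    context: "My name is Sarah and I live in London."
  }
})

# The answer span is extracted verbatim from the context.
IO.puts(answer["answer"])
```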
@spec render_template(String.t(), map()) :: {:ok, String.t()} | {:error, Exception.t()}
Renders a Jinja2 template string with the given variables.
Delegates to HuggingfaceClient.Jinja.render/2.
Example
{:ok, text} = HuggingfaceClient.render_template(
"Hello, {{ name }}!",
%{"name" => "World"}
)
@spec request(HuggingfaceClient.Client.t(), map()) :: {:ok, term()} | {:error, Exception.t()}
Raw inference request — sends inputs directly to a provider with no task-level
validation or response transformation.
Useful for:
- Custom fine-tuned models with non-standard I/O
- Providers not yet covered by a dedicated task
- Debugging raw provider responses
Example
{:ok, result} = HuggingfaceClient.request(client, %{
model: "my-user/my-custom-model",
inputs: "some raw text",
parameters: %{custom_param: 42}
})
@spec sentence_similarity(HuggingfaceClient.Client.t(), map()) :: {:ok, list()} | {:error, Exception.t()}
Sentence similarity scoring.
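A sketch assuming the standard Hugging Face sentence-similarity input shape (a source_sentence compared against a list of candidate sentences, yielding one score per candidate):

```elixir
{:ok, scores} = HuggingfaceClient.sentence_similarity(client, %{
  model: "sentence-transformers/all-MiniLM-L6-v2",
  inputs: %{
    source_sentence: "That is a happy person",
    sentences: ["That is a happy dog", "Today is a sunny day"]
  }
})

# scores is a list of floats, one per candidate sentence.
```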
@spec summarization(HuggingfaceClient.Client.t(), map()) :: {:ok, map()} | {:error, Exception.t()}
Abstractive summarisation.
@spec table_question_answering(HuggingfaceClient.Client.t(), map()) :: {:ok, map()} | {:error, Exception.t()}
Table question answering (TAPAS / TaBERT).
@spec tabular_classification(HuggingfaceClient.Client.t(), map()) :: {:ok, list()} | {:error, Exception.t()}
Tabular data classification. Returns predicted class indices.
@spec tabular_regression(HuggingfaceClient.Client.t(), map()) :: {:ok, list()} | {:error, Exception.t()}
Tabular data regression. Returns predicted float values.
@spec text_classification(HuggingfaceClient.Client.t(), map()) :: {:ok, list()} | {:error, Exception.t()}
Text classification (sentiment analysis, topic, etc.).
Returns {:ok, [%{"label" => ..., "score" => ...}]}.
@spec text_generation(HuggingfaceClient.Client.t(), map()) :: {:ok, map()} | {:error, Exception.t()}
Text generation (completion, non-chat).
Returns {:ok, %{"generated_text" => string}}.
@spec text_generation_stream(HuggingfaceClient.Client.t(), map()) :: {:ok, Enumerable.t()} | {:error, Exception.t()}
Streaming text generation. Returns {:ok, stream} of delta chunks.
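A sketch of consuming the stream. The chunk shape shown mirrors the common text-generation-inference streaming format ("token" => %{"text" => ...}); the exact keys may vary by provider:

```elixir
{:ok, stream} = HuggingfaceClient.text_generation_stream(client, %{
  model: "meta-llama/Llama-3.1-8B-Instruct",
  inputs: "Once upon a time"
})

Enum.each(stream, fn chunk ->
  IO.write(get_in(chunk, ["token", "text"]) || "")
end)
```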
@spec text_to_audio(HuggingfaceClient.Client.t(), map()) :: {:ok, binary()} | {:error, Exception.t()}
Text-to-audio generation (music, effects). Returns audio bytes.
@spec text_to_image(HuggingfaceClient.Client.t(), map()) :: {:ok, binary()} | {:error, Exception.t()}
Text-to-image generation.
Returns {:ok, binary} (raw image bytes) by default.
Pass output_type: :url for a URL string (where provider supports it).
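Both output modes side by side (raw bytes by default, URL where the provider supports it):

```elixir
# Default: raw image bytes, written straight to disk.
{:ok, png} = HuggingfaceClient.text_to_image(client, %{
  model: "stabilityai/stable-diffusion-2",
  inputs: "a watercolor fox"
})
File.write!("fox.png", png)

# URL output instead of bytes, where the provider supports it.
{:ok, url} = HuggingfaceClient.text_to_image(client, %{
  model: "stabilityai/stable-diffusion-2",
  inputs: "a watercolor fox",
  output_type: :url
})
```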
@spec text_to_speech(HuggingfaceClient.Client.t(), map()) :: {:ok, binary()} | {:error, Exception.t()}
Text-to-speech synthesis. Returns audio bytes.
@spec text_to_video(HuggingfaceClient.Client.t(), map()) :: {:ok, binary()} | {:error, Exception.t()}
Text-to-video generation. Returns video bytes.
@spec token_classification(HuggingfaceClient.Client.t(), map()) :: {:ok, list()} | {:error, Exception.t()}
Token / entity classification (NER).
@spec translation(HuggingfaceClient.Client.t(), map()) :: {:ok, map()} | {:error, Exception.t()}
Neural machine translation.
@spec video_classification(HuggingfaceClient.Client.t(), map()) :: {:ok, term()} | {:error, Exception.t()}
Classifies a video clip into predefined categories.
Example
{:ok, results} = HuggingfaceClient.video_classification(client, %{
video: File.read!("action.mp4"),
top_k: 5
})
@spec visual_question_answering(HuggingfaceClient.Client.t(), map()) :: {:ok, map()} | {:error, Exception.t()}
Visual question answering — answer questions about an image.
@spec zero_shot_classification(HuggingfaceClient.Client.t(), map()) :: {:ok, list()} | {:error, Exception.t()}
Zero-shot text classification.
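A sketch assuming the standard Hugging Face zero-shot parameters (candidate_labels passed under parameters; the model ID is illustrative):

```elixir
{:ok, result} = HuggingfaceClient.zero_shot_classification(client, %{
  model: "facebook/bart-large-mnli",
  inputs: "The new GPU doubles training throughput",
  parameters: %{candidate_labels: ["hardware", "sports", "politics"]}
})

# result pairs each candidate label with a score.
```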
@spec zero_shot_image_classification(HuggingfaceClient.Client.t(), map()) :: {:ok, list()} | {:error, Exception.t()}
Zero-shot image classification with candidate labels.