HuggingfaceClient.Libraries (huggingface_client v0.1.0)

HuggingFace Library Integration Helpers.

Configuration builders and integration utilities for the major HuggingFace ecosystem libraries, mirroring the "4. Libraries" section of the docs:

  • 4.1 Core Libraries: Transformers, Datasets, Tokenizers, Accelerate, Evaluate
  • 4.2 Generative AI: Diffusers
  • 4.3 Optimization: Optimum, PEFT
  • 4.4 Other Tools: Safetensors, TRL, Bitsandbytes

These helpers generate configuration maps for use with:

  • HuggingfaceClient.run_job/1 — to run library-specific training on HuggingFace infrastructure
  • HuggingfaceClient.autotrain_create/1 — for AutoTrain fine-tuning
  • Local training scripts

See: https://huggingface.co/docs

Example

# Bitsandbytes 4-bit quantization config
bnb_config = HuggingfaceClient.Libraries.bnb_config(
  load_in_4bit: true,
  bnb_4bit_quant_type: "nf4",
  bnb_4bit_compute_dtype: "bfloat16"
)

# Merge into LoRA config
training_config = Map.merge(
  HuggingfaceClient.lora_config(base_model: "meta-llama/Llama-3.1-8B"),
  %{"quantization_config" => bnb_config}
)

Summary

Functions

bnb_config(opts \\ [])
Builds a bitsandbytes quantization config.

diffusers_config(opts \\ [])
Returns configuration for a Diffusers pipeline.

gradio_api_schema(space_id, opts \\ [])
Returns the API schema for a Gradio Space.

gradio_api_url(space_id, opts \\ [])
Returns the URL for a Gradio Space's API endpoint.

optimum_config(opts \\ [])
Returns an Optimum export/optimization configuration.

qlora_config(opts \\ [])
Returns a QLoRA configuration combining 4-bit bitsandbytes + LoRA.

reranker_config(opts \\ [])
Returns configuration for a reranking model.

safetensors_metadata(repo_id, filename \\ "model.safetensors", opts \\ [])
Returns metadata about a Safetensors file from a Hub repository.

sentence_transformers_config(opts \\ [])
Returns configuration for a Sentence Transformers embedding model.

tokenizer_config(opts \\ [])
Returns configuration for tokenizer settings.

transformers_config(opts \\ [])
Returns configuration for loading a Transformers model.

trl_config(opts \\ [])
Returns a TRL (Transformer Reinforcement Learning) configuration.

Functions

bnb_config(opts \\ [])

@spec bnb_config(keyword()) :: map()

Builds a bitsandbytes quantization config.

Enables 4-bit or 8-bit quantization for memory-efficient inference and training.

Options (4-bit)

  • :load_in_4bit — enable 4-bit loading (default: true)
  • :bnb_4bit_quant_type — "nf4" (NormalFloat4, better quality) or "fp4" (default: "nf4")
  • :bnb_4bit_compute_dtype — compute dtype: "bfloat16", "float16", "float32" (default: "bfloat16")
  • :bnb_4bit_use_double_quant — double quantization for extra savings (default: true)

Options (8-bit)

  • :load_in_8bit — enable 8-bit loading
  • :llm_int8_threshold — outlier threshold for mixed-precision int8 matmul (default: 6.0)
  • :llm_int8_skip_modules — list of module names to skip when quantizing

Example

# QLoRA-style 4-bit config
config = HuggingfaceClient.Libraries.bnb_config(
  load_in_4bit: true,
  bnb_4bit_quant_type: "nf4",
  bnb_4bit_compute_dtype: "bfloat16",
  bnb_4bit_use_double_quant: true
)
# Use in LoRA training:
training_config = Map.merge(
  HuggingfaceClient.lora_config(base_model: "meta-llama/Llama-3.1-8B"),
  %{"bnb_config" => config}
)

# 8-bit for inference only
config_8bit = HuggingfaceClient.Libraries.bnb_config(
  load_in_8bit: true,
  llm_int8_threshold: 6.0
)

diffusers_config(opts \\ [])

@spec diffusers_config(keyword()) :: map()

Returns configuration for a Diffusers pipeline.

Used to generate images, videos, or audio with diffusion models.

Options

  • :model_id — HF model ID (required)
  • :task — "text-to-image", "image-to-image", "inpainting", "text-to-video" (default: "text-to-image")
  • :scheduler — diffusion scheduler: "DDPM", "DDIM", "DPM++", "Euler", "EulerA" (default: "EulerA")
  • :dtype — "float16", "bfloat16", "float32" (default: "float16")
  • :device — "cuda", "mps", "cpu" (default: "cuda")
  • :enable_xformers — memory-efficient attention (default: true)
  • :safety_checker — enable safety checker (default: false)

Example

config = HuggingfaceClient.Libraries.diffusers_config(
  model_id: "black-forest-labs/FLUX.1-dev",
  task: "text-to-image",
  dtype: "bfloat16",
  device: "cuda"
)

gradio_api_schema(space_id, opts \\ [])

@spec gradio_api_schema(
  String.t(),
  keyword()
) :: {:ok, map()} | {:error, Exception.t()}

Returns the API schema for a Gradio Space.

Example

{:ok, schema} = HuggingfaceClient.Libraries.gradio_api_schema("gradio/hello_world")
IO.inspect(schema["endpoints"])

gradio_api_url(space_id, opts \\ [])

@spec gradio_api_url(
  String.t(),
  keyword()
) :: String.t()

Returns the URL for a Gradio Space's API endpoint.

Example

api_url = HuggingfaceClient.Libraries.gradio_api_url("stabilityai/stable-diffusion")
# "https://stabilityai-stable-diffusion.hf.space/run/predict"

optimum_config(opts \\ [])

@spec optimum_config(keyword()) :: map()

Returns an Optimum export/optimization configuration.

Optimum is HuggingFace's toolkit for optimizing models for specific hardware.

Options

  • :model_id — HF model ID (required)
  • :backend — "onnx", "openvino", "tflite", "coreml", "neuronx" (default: "onnx")
  • :task — task type for export (e.g. "text-classification")
  • :fp16 — export in FP16 (default: false)
  • :optimize_for — "performance", "size", "latency" (default: "performance")

Example

config = HuggingfaceClient.Libraries.optimum_config(
  model_id: "bert-base-uncased",
  backend: "onnx",
  task: "text-classification",
  fp16: true,
  optimize_for: "latency"
)
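
One way to actually run such an export is to drive optimum-cli inside a job. A sketch; the image name, flavor, and output directory are illustrative, not fixed values:

# Sketch: run an ONNX export with optimum-cli as a job.
# Image name, flavor, and output directory are illustrative.
{:ok, job} = HuggingfaceClient.run_job(
  image: "huggingface/optimum:latest",
  command: [
    "optimum-cli", "export", "onnx",
    "--model", "bert-base-uncased",
    "--task", "text-classification",
    "bert_onnx/"
  ],
  flavor: "cpu-basic",
  access_token: token
)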

qlora_config(opts \\ [])

@spec qlora_config(keyword()) :: map()

Returns a QLoRA configuration combining 4-bit bitsandbytes + LoRA.

QLoRA is among the most memory-efficient fine-tuning approaches for large models, typically cutting GPU memory usage by roughly 75% compared to full fine-tuning.

Options

  • :base_model — model to fine-tune (required)
  • :rank — LoRA rank (default: 16)
  • :alpha — LoRA alpha (default: 32)
  • :quant_type — "nf4" or "fp4" (default: "nf4")
  • :compute_dtype — "bfloat16" or "float16" (default: "bfloat16")

Example

# Fine-tune a 70B model on a single A100
config = HuggingfaceClient.Libraries.qlora_config(
  base_model: "meta-llama/Llama-3.1-70B-Instruct",
  rank: 64, alpha: 128
)
{:ok, job} = HuggingfaceClient.run_job(
  image: "huggingface/trl-latest-gpu:latest",
  command: ["python", "sft.py"] ++ HuggingfaceClient.training_to_args(config),
  flavor: "a100-large",
  access_token: token
)

reranker_config(opts \\ [])

@spec reranker_config(keyword()) :: map()

Returns configuration for a reranking model.

Reranking improves RAG pipelines by scoring document relevance. Popular models: "cross-encoder/ms-marco-MiniLM-L-6-v2", "BAAI/bge-reranker-large".

Options

  • :model_id — cross-encoder model ID (required)
  • :max_length — max input length (default: 512)
  • :batch_size — scoring batch size (default: 32)

Example

config = HuggingfaceClient.Libraries.reranker_config(
  model_id: "BAAI/bge-reranker-large",
  max_length: 1024
)

# Use with TEI rerank endpoint
{:ok, results} = HuggingfaceClient.tei_rerank(tei,
  query: "What is deep learning?",
  texts: ["Deep learning is...", "Python is..."]
)
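
The scores can then reorder the candidate documents. A sketch, assuming TEI's usual rerank response shape (a list of %{"index" => i, "score" => s} maps, where index points back into the scored texts):

# Reorder the original texts by rerank score, best match first.
texts = ["Deep learning is...", "Python is..."]

ranked =
  results
  |> Enum.sort_by(& &1["score"], :desc)
  |> Enum.map(fn %{"index" => i, "score" => s} -> {Enum.at(texts, i), s} end)

for {text, score} <- ranked, do: IO.puts("#{Float.round(score, 3)}  #{text}")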

safetensors_metadata(repo_id, filename \\ "model.safetensors", opts \\ [])

@spec safetensors_metadata(String.t(), String.t(), keyword()) ::
  {:ok, map()} | {:error, Exception.t()}

Returns metadata about a Safetensors file from a Hub repository.

Safetensors is a safe, fast format for storing model weights.

Example

{:ok, meta} = HuggingfaceClient.Libraries.safetensors_metadata(
  "gpt2", "model.safetensors"
)
IO.puts("Tensors: #{map_size(meta["tensors"])}")

sentence_transformers_config(opts \\ [])

@spec sentence_transformers_config(keyword()) :: map()

Returns configuration for a Sentence Transformers embedding model.

Sentence Transformers provides state-of-the-art embeddings for semantic search, RAG, and document similarity.

Options

  • :model_id — embedding model ID (required). Examples: "sentence-transformers/all-MiniLM-L6-v2", "BAAI/bge-large-en-v1.5", "intfloat/e5-large-v2"
  • :normalize — normalize embeddings (default: true)
  • :prompt — prompt prefix for asymmetric models
  • :batch_size — encoding batch size (default: 32)
  • :max_seq_length — max token length (default: 512)

Example

config = HuggingfaceClient.Libraries.sentence_transformers_config(
  model_id: "BAAI/bge-large-en-v1.5",
  normalize: true,
  prompt: "Represent this sentence for searching relevant passages: "
)

# Use with TEI client for production serving
tei = HuggingfaceClient.tei("http://localhost:8080")
{:ok, embedding} = HuggingfaceClient.tei_embed(tei,
  "What is machine learning?",
  prompt_name: "query"
)
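
Once two embeddings are in hand, similarity is a plain comparison of vectors. A self-contained sketch, assuming tei_embed returns {:ok, embedding} with the embedding as a flat list of floats:

# Cosine similarity between two embedding vectors.
# With normalize: true the vectors are unit-length, so the dot product
# alone already equals the cosine similarity.
cosine = fn a, b ->
  dot = Enum.zip(a, b) |> Enum.map(fn {x, y} -> x * y end) |> Enum.sum()
  norm = fn v -> :math.sqrt(Enum.sum(Enum.map(v, &(&1 * &1)))) end
  dot / (norm.(a) * norm.(b))
end

{:ok, e1} = HuggingfaceClient.tei_embed(tei, "What is machine learning?")
{:ok, e2} = HuggingfaceClient.tei_embed(tei, "An introduction to ML concepts")
IO.puts("similarity: #{cosine.(e1, e2)}")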

tokenizer_config(opts \\ [])

@spec tokenizer_config(keyword()) :: map()

Returns configuration for tokenizer settings.

Options

  • :model_id — HF model ID (required)
  • :max_length — max token length (default: 512)
  • :padding — "max_length", "longest", "do_not_pad" (default: "longest")
  • :truncation — truncation strategy (default: true)
  • :add_special_tokens — add BOS/EOS tokens (default: true)
  • :return_tensors — "pt", "tf", "np" (default: "pt")

Example

tokenizer_cfg = HuggingfaceClient.Libraries.tokenizer_config(
  model_id: "bert-base-uncased",
  max_length: 512,
  padding: "max_length",
  truncation: true
)

transformers_config(opts \\ [])

@spec transformers_config(keyword()) :: map()

Returns configuration for loading a Transformers model.

Generates a map of from_pretrained-compatible options.

Options

  • :model_id — HF model ID (required)
  • :revision — branch/commit (default: "main")
  • :dtype — "float16", "bfloat16", "float32", "auto" (default: "auto")
  • :device_map — "auto", "cuda", "cpu", or explicit device map
  • :trust_remote_code — allow custom model code (default: false)
  • :load_in_4bit / :load_in_8bit — quantize with bitsandbytes
  • :attn_implementation — "flash_attention_2", "sdpa", "eager"
  • :use_cache — enable KV cache (default: true)

Example

config = HuggingfaceClient.Libraries.transformers_config(
  model_id: "meta-llama/Llama-3.1-8B-Instruct",
  dtype: "bfloat16",
  device_map: "auto",
  attn_implementation: "flash_attention_2"
)
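
Since the map mirrors from_pretrained keyword arguments, one simple hand-off to a Python training script is a JSON dump. A sketch, assuming the Jason library is available:

# Sketch: write the config to disk so a Python script can read it and
# splat it into from_pretrained(**kwargs). Assumes Jason for JSON encoding.
File.write!("model_config.json", Jason.encode!(config))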

trl_config(opts \\ [])

@spec trl_config(keyword()) :: map()

Returns a TRL (Transformer Reinforcement Learning) configuration.

TRL supports SFT, DPO, ORPO, GRPO, PPO, and reward modeling.

Options

  • :trainer — "sft", "dpo", "orpo", "grpo", "ppo", "reward" (required)
  • :base_model — model ID to train (required)
  • :dataset — dataset ID
  • :max_seq_length — max sequence length (default: 2048)
  • :packing — pack sequences for efficiency (default: true for SFT)
  • :use_peft — use PEFT/LoRA (default: false)
  • :lora_r — LoRA rank (if use_peft is true)
  • :learning_rate — learning rate

Example

# SFT training config
config = HuggingfaceClient.Libraries.trl_config(
  trainer: "sft",
  base_model: "Qwen/Qwen2.5-7B",
  dataset: "my-org/chat-dataset",
  max_seq_length: 2048,
  use_peft: true,
  lora_r: 16
)

# DPO alignment
config = HuggingfaceClient.Libraries.trl_config(
  trainer: "dpo",
  base_model: "my-org/sft-model",
  dataset: "my-org/preference-data",
  beta: 0.1  # DPO beta: controls how far the policy may drift from the reference model
)
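
Either config can then be launched with the same run_job pattern used in the qlora_config example. A sketch; the image and script name are illustrative:

# Sketch: launch the DPO config as a training job. The image and script
# name mirror the qlora_config example above and are illustrative.
{:ok, job} = HuggingfaceClient.run_job(
  image: "huggingface/trl-latest-gpu:latest",
  command: ["python", "dpo.py"] ++ HuggingfaceClient.training_to_args(config),
  flavor: "a100-large",
  access_token: token
)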