# `HuggingfaceClient.Libraries`
[🔗](https://github.com/huggingface/huggingface_client/blob/v0.1.0/lib/huggingface_client/hub/libraries.ex#L1)

HuggingFace Library Integration Helpers.

Configuration builders and integration utilities for the major HuggingFace ecosystem
libraries, mirroring the "4. Libraries" section of the docs:

- **4.1 Core Libraries**: Transformers, Datasets, Tokenizers, Accelerate, Evaluate
- **4.2 Generative AI**: Diffusers
- **4.3 Optimization**: Optimum, PEFT
- **4.4 Other Tools**: Safetensors, TRL, Bitsandbytes

These helpers generate configuration maps for use with:
- `HuggingfaceClient.run_job/1` — to run library-specific training on HF infra
- `HuggingfaceClient.autotrain_create/1` — for AutoTrain fine-tuning
- Local training scripts

See: https://huggingface.co/docs

## Example

    # Bitsandbytes 4-bit quantization config
    bnb_config = HuggingfaceClient.Libraries.bnb_config(
      load_in_4bit: true,
      bnb_4bit_quant_type: "nf4",
      bnb_4bit_compute_dtype: "bfloat16"
    )

    # Merge into LoRA config
    training_config = Map.merge(
      HuggingfaceClient.lora_config(base_model: "meta-llama/Llama-3.1-8B"),
      %{"quantization_config" => bnb_config}
    )

# `bnb_config`

```elixir
@spec bnb_config(keyword()) :: map()
```

Builds a bitsandbytes quantization config.

Enables 4-bit or 8-bit quantization for memory-efficient inference and training.

## Options (4-bit)
- `:load_in_4bit` — enable 4-bit loading (default: `true`)
- `:bnb_4bit_quant_type` — `"nf4"` (NormalFloat4, better quality) or `"fp4"` (default: `"nf4"`)
- `:bnb_4bit_compute_dtype` — compute dtype: `"bfloat16"`, `"float16"`, `"float32"` (default: `"bfloat16"`)
- `:bnb_4bit_use_double_quant` — double quantization for extra savings (default: `true`)

## Options (8-bit)
- `:load_in_8bit` — enable 8-bit loading
- `:llm_int8_threshold` — outlier threshold for mixed-precision int8 decomposition (default: `6.0`)
- `:llm_int8_skip_modules` — list of module names to exclude from quantization

## Example

    # QLoRA-style 4-bit config
    config = HuggingfaceClient.Libraries.bnb_config(
      load_in_4bit: true,
      bnb_4bit_quant_type: "nf4",
      bnb_4bit_compute_dtype: "bfloat16",
      bnb_4bit_use_double_quant: true
    )
    # Use in LoRA training:
    training_config = Map.merge(
      HuggingfaceClient.lora_config(base_model: "meta-llama/Llama-3.1-8B"),
      %{"bnb_config" => config}
    )

    # 8-bit for inference only
    config_8bit = HuggingfaceClient.Libraries.bnb_config(
      load_in_8bit: true,
      llm_int8_threshold: 6.0
    )

# `diffusers_config`

```elixir
@spec diffusers_config(keyword()) :: map()
```

Returns configuration for a Diffusers pipeline.

Used to generate images, videos, or audio with diffusion models.

## Options
- `:model_id` — HF model ID (required)
- `:task` — `"text-to-image"`, `"image-to-image"`, `"inpainting"`, `"text-to-video"` (default: `"text-to-image"`)
- `:scheduler` — diffusion scheduler: `"DDPM"`, `"DDIM"`, `"DPM++"`, `"Euler"`, `"EulerA"` (default: `"EulerA"`)
- `:dtype` — `"float16"`, `"bfloat16"`, `"float32"` (default: `"float16"`)
- `:device` — `"cuda"`, `"mps"`, `"cpu"` (default: `"cuda"`)
- `:enable_xformers` — memory-efficient attention (default: `true`)
- `:safety_checker` — enable safety checker (default: `false`)

## Example
    config = HuggingfaceClient.Libraries.diffusers_config(
      model_id: "black-forest-labs/FLUX.1-dev",
      task: "text-to-image",
      dtype: "bfloat16",
      device: "cuda"
    )
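
The same builder covers the other documented tasks. A hedged sketch of an inpainting configuration, using only options from the list above (the model ID is illustrative):

    inpaint_config = HuggingfaceClient.Libraries.diffusers_config(
      model_id: "stabilityai/stable-diffusion-2-inpainting",
      task: "inpainting",
      scheduler: "DDIM",
      safety_checker: true
    )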

# `gradio_api_schema`

```elixir
@spec gradio_api_schema(
  String.t(),
  keyword()
) :: {:ok, map()} | {:error, Exception.t()}
```

Returns the API schema for a Gradio Space.

## Example
    {:ok, schema} = HuggingfaceClient.Libraries.gradio_api_schema("gradio/hello_world")
    IO.inspect(schema["endpoints"])

# `gradio_api_url`

```elixir
@spec gradio_api_url(
  String.t(),
  keyword()
) :: String.t()
```

Returns the URL for a Gradio Space's API endpoint.

## Example
    api_url = HuggingfaceClient.Libraries.gradio_api_url("stabilityai/stable-diffusion")
    # "https://stabilityai-stable-diffusion.hf.space/run/predict"

# `optimum_config`

```elixir
@spec optimum_config(keyword()) :: map()
```

Returns an Optimum export/optimization configuration.

Optimum is HuggingFace's toolkit for optimizing models for specific hardware.

## Options
- `:model_id` — HF model ID (required)
- `:backend` — `"onnx"`, `"openvino"`, `"tflite"`, `"coreml"`, `"neuronx"` (default: `"onnx"`)
- `:task` — task type for export (e.g. `"text-classification"`)
- `:fp16` — export in FP16 (default: `false`)
- `:optimize_for` — `"performance"`, `"size"`, `"latency"` (default: `"performance"`)

## Example
    config = HuggingfaceClient.Libraries.optimum_config(
      model_id: "bert-base-uncased",
      backend: "onnx",
      task: "text-classification",
      fp16: true,
      optimize_for: "latency"
    )

# `qlora_config`

```elixir
@spec qlora_config(keyword()) :: map()
```

Returns a QLoRA configuration combining 4-bit bitsandbytes + LoRA.

QLoRA is one of the most memory-efficient fine-tuning approaches for large models,
reducing GPU memory by roughly 75% compared to full fine-tuning.
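
As a rough sanity check: an 8B-parameter model occupies about 16 GB of weights in
bf16 (2 bytes per parameter) but only around 4 GB in 4-bit NF4 (0.5 bytes per
parameter). For the 70B model in the example below, the weights drop from roughly
140 GB to about 35 GB plus a small LoRA adapter, which is what makes single-GPU
fine-tuning feasible.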

## Options
- `:base_model` — model to fine-tune (required)
- `:rank` — LoRA rank (default: 16)
- `:alpha` — LoRA alpha (default: 32)
- `:quant_type` — `"nf4"` or `"fp4"` (default: `"nf4"`)
- `:compute_dtype` — `"bfloat16"` or `"float16"` (default: `"bfloat16"`)

## Example
    # Fine-tune a 70B model on a single A100
    config = HuggingfaceClient.Libraries.qlora_config(
      base_model: "meta-llama/Llama-3.1-70B-Instruct",
      rank: 64, alpha: 128
    )
    {:ok, job} = HuggingfaceClient.run_job(
      image: "huggingface/trl-latest-gpu:latest",
      command: ["python", "sft.py"] ++ HuggingfaceClient.training_to_args(config),
      flavor: "a100-large",
      access_token: token
    )

# `reranker_config`

```elixir
@spec reranker_config(keyword()) :: map()
```

Returns configuration for a reranking model.

Reranking improves RAG pipelines by re-scoring retrieved documents for relevance to the query.
Popular models: `"cross-encoder/ms-marco-MiniLM-L-6-v2"`, `"BAAI/bge-reranker-large"`.

## Options
- `:model_id` — cross-encoder model ID (required)
- `:max_length` — max input length (default: 512)
- `:batch_size` — scoring batch size (default: 32)

## Example
    config = HuggingfaceClient.Libraries.reranker_config(
      model_id: "BAAI/bge-reranker-large",
      max_length: 1024
    )

    # Use with a TEI rerank endpoint (client setup as in the
    # sentence_transformers_config/1 example below)
    tei = HuggingfaceClient.tei("http://localhost:8080")
    {:ok, results} = HuggingfaceClient.tei_rerank(tei,
      query: "What is deep learning?",
      texts: ["Deep learning is...", "Python is..."]
    )

# `safetensors_metadata`

```elixir
@spec safetensors_metadata(String.t(), String.t(), keyword()) ::
  {:ok, map()} | {:error, Exception.t()}
```

Returns metadata about a Safetensors file from a Hub repository.

Safetensors is a safe, fast format for storing model weights.

## Example
    {:ok, meta} = HuggingfaceClient.Libraries.safetensors_metadata(
      "gpt2", "model.safetensors"
    )
    IO.puts("Tensors: #{map_size(meta["tensors"])}")

# `sentence_transformers_config`

```elixir
@spec sentence_transformers_config(keyword()) :: map()
```

Returns configuration for a Sentence Transformers embedding model.

Sentence Transformers provides state-of-the-art embeddings for semantic search,
RAG, and document similarity.

## Options
- `:model_id` — embedding model ID (required), e.g. `"sentence-transformers/all-MiniLM-L6-v2"`, `"BAAI/bge-large-en-v1.5"`, `"intfloat/e5-large-v2"`
- `:normalize` — normalize embeddings (default: `true`)
- `:prompt` — prompt prefix for asymmetric models
- `:batch_size` — encoding batch size (default: 32)
- `:max_seq_length` — max token length (default: 512)

## Example
    config = HuggingfaceClient.Libraries.sentence_transformers_config(
      model_id: "BAAI/bge-large-en-v1.5",
      normalize: true,
      prompt: "Represent this sentence for searching relevant passages: "
    )

    # Use with TEI client for production serving
    tei = HuggingfaceClient.tei("http://localhost:8080")
    {:ok, embedding} = HuggingfaceClient.tei_embed(tei,
      "What is machine learning?",
      prompt_name: "query"
    )
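
Since the config above sets `normalize: true`, similarity between two returned
embeddings reduces to a dot product. A sketch under the assumption that
`tei_embed` with default options yields a flat list of floats:

    {:ok, query_emb} = HuggingfaceClient.tei_embed(tei, "What is machine learning?")
    {:ok, doc_emb} = HuggingfaceClient.tei_embed(tei, "Machine learning is a branch of AI.")

    # With normalized embeddings, the dot product equals cosine similarity
    similarity =
      Enum.zip(query_emb, doc_emb)
      |> Enum.map(fn {q, d} -> q * d end)
      |> Enum.sum()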

# `tokenizer_config`

```elixir
@spec tokenizer_config(keyword()) :: map()
```

Returns configuration for tokenizer settings.

## Options
- `:model_id` — HF model ID (required)
- `:max_length` — max token length (default: 512)
- `:padding` — `"max_length"`, `"longest"`, `"do_not_pad"` (default: `"longest"`)
- `:truncation` — truncation strategy (default: `true`)
- `:add_special_tokens` — add BOS/EOS tokens (default: `true`)
- `:return_tensors` — `"pt"`, `"tf"`, `"np"` (default: `"pt"`)

## Example
    tokenizer_cfg = HuggingfaceClient.Libraries.tokenizer_config(
      model_id: "bert-base-uncased",
      max_length: 512,
      padding: "max_length",
      truncation: true
    )

# `transformers_config`

```elixir
@spec transformers_config(keyword()) :: map()
```

Returns configuration for loading a Transformers model.

Generates a `from_pretrained`-compatible options map.

## Options
- `:model_id` — HF model ID (required)
- `:revision` — branch/commit (default: `"main"`)
- `:dtype` — `"float16"`, `"bfloat16"`, `"float32"`, `"auto"` (default: `"auto"`)
- `:device_map` — `"auto"`, `"cuda"`, `"cpu"`, or explicit device map
- `:trust_remote_code` — allow custom model code (default: `false`)
- `:load_in_4bit` / `:load_in_8bit` — quantize with bitsandbytes
- `:attn_implementation` — `"flash_attention_2"`, `"sdpa"`, `"eager"`
- `:use_cache` — enable KV cache (default: `true`)

## Example
    config = HuggingfaceClient.Libraries.transformers_config(
      model_id: "meta-llama/Llama-3.1-8B-Instruct",
      dtype: "bfloat16",
      device_map: "auto",
      attn_implementation: "flash_attention_2"
    )
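
Quantized loading uses the same builder. A sketch relying only on the
`:load_in_4bit` option documented above (the model ID is illustrative):

    low_mem_config = HuggingfaceClient.Libraries.transformers_config(
      model_id: "meta-llama/Llama-3.1-70B-Instruct",
      device_map: "auto",
      load_in_4bit: true
    )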

# `trl_config`

```elixir
@spec trl_config(keyword()) :: map()
```

Returns a TRL (Transformer Reinforcement Learning) configuration.

TRL supports SFT, DPO, ORPO, GRPO, PPO, and reward modeling.

## Options
- `:trainer` — `"sft"`, `"dpo"`, `"orpo"`, `"grpo"`, `"ppo"`, `"reward"` (required)
- `:base_model` — model ID to train (required)
- `:dataset` — dataset ID
- `:max_seq_length` — max sequence length (default: 2048)
- `:packing` — pack sequences for efficiency (default: `true` for SFT)
- `:use_peft` — use PEFT/LoRA (default: `false`)
- `:lora_r` — LoRA rank (if use_peft is true)
- `:learning_rate` — learning rate
- `:beta` — preference-loss beta for DPO-style trainers (see the DPO example below)

## Example

    # SFT training config
    config = HuggingfaceClient.Libraries.trl_config(
      trainer: "sft",
      base_model: "Qwen/Qwen2.5-7B",
      dataset: "my-org/chat-dataset",
      max_seq_length: 2048,
      use_peft: true,
      lora_r: 16
    )

    # DPO alignment
    config = HuggingfaceClient.Libraries.trl_config(
      trainer: "dpo",
      base_model: "my-org/sft-model",
      dataset: "my-org/preference-data",
      beta: 0.1
    )
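
As with `qlora_config/1` above, the resulting map can be converted to CLI
arguments and launched on HF infrastructure. A sketch reusing the pattern from
that example (the script name and flavor are illustrative):

    {:ok, job} = HuggingfaceClient.run_job(
      image: "huggingface/trl-latest-gpu:latest",
      command: ["python", "sft.py"] ++ HuggingfaceClient.training_to_args(config),
      flavor: "a10g-large",
      access_token: token
    )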

---

*Consult [api-reference.md](api-reference.md) for the complete listing.*
