# HfHub
Elixir client for HuggingFace Hub — dataset/model metadata, file downloads, caching, and authentication. An Elixir port of Python's huggingface_hub.
`hf_hub_ex` provides a robust, production-ready interface to the HuggingFace Hub API, enabling Elixir applications to seamlessly access models, datasets, and spaces. This library is designed to be the foundational layer for porting Python HuggingFace libraries (`datasets`, `evaluate`, `transformers`) to the BEAM ecosystem.
## Features
- Hub API Client — Fetch metadata for models, datasets, and spaces
- Bumblebee Compatible — Drop-in integration with Elixir ML pipelines via tuple-based repository API
- Repo Tree Listing — Recursive tree listing with pagination
- File Downloads — Stream files from HuggingFace repositories with resume support
- Archive Extraction — Optional extraction for zip/tar.gz/tgz/tar.xz/gz files
- Smart Caching — Local file caching with LRU eviction and ETag-based validation
- Filesystem Utilities — Manage local HuggingFace cache directory structure
- Authentication — Token-based authentication for private repositories
- Structured Errors — 30+ exception types matching Python's `huggingface_hub`
- BEAM-native — Leverages OTP, GenServers, and supervision trees for reliability
- Type-safe — Comprehensive typespecs and pattern matching
## Installation
Add `hf_hub` to your dependencies in `mix.exs`:
```elixir
def deps do
  [
    {:hf_hub, "~> 0.2.0"}
  ]
end
```

Then run:

```shell
mix deps.get
```
## Quick Start
### Authentication
Set your HuggingFace token as an environment variable or in config:
```shell
export HF_TOKEN="hf_..."
```
Or in `config/config.exs`:
```elixir
config :hf_hub,
  token: System.get_env("HF_TOKEN"),
  cache_dir: Path.expand("~/.cache/huggingface")
```

### Fetching Model Metadata
```elixir
# Get model information
{:ok, model_info} = HfHub.Api.model_info("bert-base-uncased")

IO.inspect(model_info.id)        # "bert-base-uncased"
IO.inspect(model_info.downloads) # 123456789
IO.inspect(model_info.tags)      # ["pytorch", "bert", "fill-mask"]
```

### Downloading Files
```elixir
# Download a model file
{:ok, path} = HfHub.Download.hf_hub_download(
  repo_id: "bert-base-uncased",
  filename: "config.json",
  repo_type: :model
)

# Read the downloaded file
{:ok, config} = File.read(path)

# Download and extract an archive (returns extracted path)
{:ok, extracted_path} = HfHub.Download.hf_hub_download(
  repo_id: "albertvillanova/tmp-tests-zip",
  filename: "ds.zip",
  repo_type: :dataset,
  extract: true
)

# Download with progress tracking
{:ok, path} = HfHub.Download.hf_hub_download(
  repo_id: "some/model",
  filename: "model.bin",
  progress_callback: fn downloaded, total ->
    if total, do: IO.puts("#{round(downloaded / total * 100)}%")
  end
)

# Download with SHA256 verification
{:ok, path} = HfHub.Download.hf_hub_download(
  repo_id: "some/model",
  filename: "model.bin",
  verify_checksum: true,
  expected_sha256: "abc123..." # Optional: fails if hash doesn't match
)
```

### Offline Mode
```elixir
# Check if offline mode is enabled (via HF_HUB_OFFLINE=1 or config)
if HfHub.offline_mode?() do
  IO.puts("Running in offline mode - only cached files available")
end

# Try to load a file from cache without network requests
case HfHub.try_to_load_from_cache("bert-base-uncased", "config.json") do
  {:ok, path} ->
    # File is cached, use it directly
    File.read!(path)

  {:error, :not_cached} ->
    # File not cached, decide whether to download
    {:ok, path} =
      HfHub.Download.hf_hub_download(
        repo_id: "bert-base-uncased",
        filename: "config.json"
      )

    File.read!(path)
end
```

### Accessing Datasets
```elixir
# Get dataset information
{:ok, dataset_info} = HfHub.Api.dataset_info("squad")

# Download dataset files
{:ok, path} = HfHub.Download.hf_hub_download(
  repo_id: "squad",
  filename: "train-v1.1.json",
  repo_type: :dataset
)

# Discover configs and splits
{:ok, configs} = HfHub.Api.dataset_configs("dpdl-benchmark/caltech101")
{:ok, splits} = HfHub.Api.dataset_splits("dpdl-benchmark/caltech101", config: "default")

# Resolve file paths for a config + split
{:ok, files} =
  HfHub.DatasetFiles.resolve("dpdl-benchmark/caltech101", "default", "train")
```

### Bumblebee-Compatible API
Use the tuple-based repository API for seamless integration with Elixir ML pipelines:
```elixir
# Repository reference types
repo = {:hf, "bert-base-uncased"}
repo_with_opts = {:hf, "bert-base-uncased", revision: "v1.0", auth_token: "hf_xxx"}
local_repo = {:local, "/path/to/model"}

# List files with ETags for cache validation
{:ok, files} = HfHub.get_repo_files({:hf, "bert-base-uncased"})
# => %{"config.json" => "\"abc123\"", "pytorch_model.bin" => "\"def456\"", ...}

# ETag-based cached download
{:ok, path} = HfHub.cached_download(
  "https://huggingface.co/bert-base-uncased/resolve/main/config.json"
)

# Build file URLs
url = HfHub.file_url("bert-base-uncased", "config.json", "main")
```

### Repository Management
```elixir
# Create a new repository
{:ok, url} = HfHub.Repo.create("my-org/my-model", private: true)

# Create a Space with Gradio
{:ok, url} = HfHub.Repo.create("my-space", repo_type: :space, space_sdk: "gradio")

# Delete a repository
:ok = HfHub.Repo.delete("my-org/old-model")

# Update settings
:ok = HfHub.Repo.update_settings("my-model", private: true, gated: :auto)

# Move/rename
{:ok, url} = HfHub.Repo.move("old-name", "new-org/new-name")

# Check existence
true = HfHub.Repo.exists?("bert-base-uncased")
```

### File Upload
```elixir
# Upload a small file (< 10MB uses base64, >= 10MB uses LFS automatically)
{:ok, info} = HfHub.Commit.upload_file(
  "/path/to/model.bin",
  "model.bin",
  "my-org/my-model",
  token: token,
  commit_message: "Add model weights"
)

# Upload from binary content
{:ok, info} = HfHub.Commit.upload_file(
  Jason.encode!(%{hidden_size: 768}),
  "config.json",
  "my-model",
  token: token
)

# Delete a file
{:ok, info} = HfHub.Commit.delete_file("old_model.bin", "my-model", token: token)

# Multiple operations in one commit
alias HfHub.Commit.Operation

{:ok, info} = HfHub.Commit.create("my-model", [
  Operation.add("config.json", config_content),
  Operation.add("model.bin", "/path/to/model.bin"),
  Operation.delete("old_config.json")
], token: token, commit_message: "Update model")
```

### Folder Upload
```elixir
# Upload entire folder
{:ok, info} = HfHub.Commit.upload_folder(
  "/path/to/model_dir",
  "my-org/my-model",
  token: token,
  commit_message: "Upload model"
)

# With pattern filtering
{:ok, info} = HfHub.Commit.upload_folder(
  "/path/to/model_dir",
  "my-model",
  token: token,
  ignore_patterns: ["*.pyc", "__pycache__/**"],
  allow_patterns: ["*.safetensors", "*.json"]
)

# Large folder with automatic batching
{:ok, infos} = HfHub.Commit.upload_large_folder(
  "/path/to/huge_model",
  "my-model",
  token: token,
  multi_commits: true
)
```

### Git Operations
```elixir
# Create a branch
{:ok, info} = HfHub.Git.create_branch("my-org/my-model", "feature-branch", token: token)

# Create branch from specific revision
{:ok, info} = HfHub.Git.create_branch("my-model", "hotfix", revision: "v1.0", token: token)

# Delete a branch
:ok = HfHub.Git.delete_branch("my-model", "old-branch", token: token)

# Create a tag
{:ok, info} = HfHub.Git.create_tag("my-model", "v1.0", token: token)

# Create annotated tag with message
{:ok, info} = HfHub.Git.create_tag("my-model", "v2.0",
  revision: "abc123",
  message: "Release v2.0",
  token: token
)

# List all refs (branches, tags)
{:ok, refs} = HfHub.Git.list_refs("bert-base-uncased")
refs.branches # [%BranchInfo{name: "main", ...}]
refs.tags     # [%TagInfo{name: "v1.0", ...}]

# List commits
{:ok, commits} = HfHub.Git.list_commits("bert-base-uncased", revision: "main")

# Super squash (destructive - squashes all commits)
:ok = HfHub.Git.super_squash("my-model", message: "Squashed history", token: token)
```

### User & Organization Profiles
```elixir
# Get user profile
{:ok, user} = HfHub.Users.get("username")
IO.inspect(user.num_followers)

# List followers/following
{:ok, followers} = HfHub.Users.list_followers("username")
{:ok, following} = HfHub.Users.list_following("username")

# Like/unlike repos
:ok = HfHub.Users.like("bert-base-uncased")
:ok = HfHub.Users.unlike("bert-base-uncased")

# Organization info
{:ok, org} = HfHub.Organizations.get("huggingface")
{:ok, members} = HfHub.Organizations.list_members("huggingface")
```

### Model & Dataset Cards
```elixir
# Load and parse cards
{:ok, card} = HfHub.Cards.load_model_card("bert-base-uncased")
card.data.license # "apache-2.0"
card.data.tags    # ["pytorch", "bert", "fill-mask"]

{:ok, card} = HfHub.Cards.load_dataset_card("squad")
card.data.task_categories # ["question-answering"]

# Parse from content
{:ok, card} = HfHub.Cards.parse_model_card(readme_content)

# Create and render cards
card = HfHub.Cards.create_model_card(%{
  language: "en",
  license: "mit",
  tags: ["text-classification"]
})

markdown = HfHub.Cards.render(card)
```

### Cache Management
```elixir
# Check if a file is cached
cached? = HfHub.Cache.cached?(
  repo_id: "bert-base-uncased",
  filename: "pytorch_model.bin"
)

# Clear cache for a specific repo
:ok = HfHub.Cache.clear_cache(repo_id: "bert-base-uncased")

# Get cache statistics
{:ok, stats} = HfHub.Cache.cache_stats()
IO.inspect(stats.total_size) # Total bytes in cache
IO.inspect(stats.file_count) # Number of cached files
```

## Examples
The `examples/` directory contains runnable scripts demonstrating common use cases:
```shell
# Run all examples at once
./examples/run_all.sh

# Or run individual examples:
mix run examples/list_datasets.exs          # List top datasets
mix run examples/list_models.exs            # List popular models
mix run examples/dataset_info.exs           # Get dataset metadata
mix run examples/list_repo_tree.exs         # List repo tree entries
mix run examples/dataset_configs_splits.exs # Dataset configs + splits
mix run examples/dataset_files_resolver.exs # Resolve dataset files by config + split
mix run examples/download_file.exs          # Download a single file
mix run examples/download_with_extract.exs  # Download + extract archives
mix run examples/cache_demo.exs             # Cache management demo
mix run examples/stream_download.exs        # Stream large files
mix run examples/snapshot_download.exs      # Download entire repo
mix run examples/auth_demo.exs              # Authentication flow
```
See the examples README for detailed documentation.
## API Overview
### `HfHub.Api`
Interact with the HuggingFace Hub API:
- `model_info/2` — Fetch model metadata
- `dataset_info/2` — Fetch dataset metadata
- `space_info/2` — Fetch space metadata
- `list_models/1` — List models with filters
- `list_datasets/1` — List datasets with filters
- `list_repo_tree/2` — List repo tree entries (files + folders)
- `list_files/2` — List files in a repository
- `dataset_configs/2` — Get dataset configuration/subset names
- `dataset_splits/2` — Get dataset split names for a config
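The Quick Start covers `model_info/2` and the dataset helpers; here is a minimal sketch of the listing functions. The option names (`:search`, `:limit`, `:recursive`) and the entry fields are assumptions, not confirmed API, so check the module docs before relying on them:

```elixir
# List models matching a search term (option names assumed).
{:ok, models} = HfHub.Api.list_models(search: "bert", limit: 10)

# Recursively list a repository's tree (pagination handled internally).
{:ok, entries} = HfHub.Api.list_repo_tree("bert-base-uncased", recursive: true)

# Each entry is assumed to carry at least a path.
Enum.each(entries, fn entry -> IO.puts(entry.path) end)
```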
### `HfHub.Download`
Download files from HuggingFace repositories:
- `hf_hub_download/1` — Download a single file (with caching, optional extraction)
- `snapshot_download/1` — Download entire repository snapshot
- `download_stream/1` — Stream download for large files
- `resume_download/1` — Resume interrupted downloads
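The snapshot and streaming variants are not shown in Quick Start, so here is a minimal sketch. Option names beyond those used above, and the exact shape of the returned stream, are assumptions:

```elixir
# Download every file in a repository into the local cache.
{:ok, snapshot_dir} = HfHub.Download.snapshot_download(
  repo_id: "bert-base-uncased",
  repo_type: :model
)

# Stream a large file to disk without holding it in memory.
# Assumes download_stream/1 returns {:ok, enumerable} of binary chunks.
{:ok, stream} = HfHub.Download.download_stream(
  repo_id: "bert-base-uncased",
  filename: "pytorch_model.bin"
)

stream
|> Stream.into(File.stream!("/tmp/pytorch_model.bin"))
|> Stream.run()
```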
### `HfHub.DatasetFiles`
Resolve dataset files by config and split:
- `resolve/4` — Resolve file paths by config + split
- `resolve_from_tree/3` — Resolve file paths from a repo tree
### `HfHub.Cache`
Manage local file cache:
- `cached?/1` — Check if file exists in cache
- `cache_path/1` — Get local path for cached file
- `clear_cache/1` — Remove cached files
- `cache_stats/0` — Get cache usage statistics
- `evict_lru/1` — Evict least recently used files
- `validate_integrity/0` — Validate checksums of cached files
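Beyond the Quick Start cache checks, eviction and integrity can be driven manually. A sketch, assuming `evict_lru/1` takes a target size in bytes and both functions return `:ok` on success:

```elixir
# Shrink the cache to roughly 5 GB by evicting least-recently-used files.
:ok = HfHub.Cache.evict_lru(5 * 1024 * 1024 * 1024)

# Re-verify checksums of everything currently on disk.
:ok = HfHub.Cache.validate_integrity()
```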
### `HfHub.FS`
Filesystem utilities for HuggingFace cache:
- `ensure_cache_dir/0` — Create cache directory structure
- `repo_path/2` — Get local path for repository
- `file_path/4` — Get local path for file in repository
- `lock_file/2` — Acquire file lock for concurrent downloads
- `unlock_file/1` — Release a file lock
- `cache_dir/0` — Get configured cache directory
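The lock helpers exist so that concurrent downloads of the same file do not clobber each other. A sketch, assuming `lock_file/2` takes a path plus options and returns a lock handle:

```elixir
# Take a lock scoped to the file being written (shapes assumed).
{:ok, lock} = HfHub.FS.lock_file("bert-base-uncased/config.json", timeout: 5_000)

try do
  # ... download and write the file ...
  :ok
after
  # Always release the lock, even if the download fails.
  HfHub.FS.unlock_file(lock)
end
```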
### `HfHub.Config`
Configuration utilities:
- `endpoint/0` — Get HuggingFace Hub endpoint URL
- `cache_dir/0` — Get cache directory path
- `http_opts/0` — Get HTTP client options
- `cache_opts/0` — Get cache options
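These return the merged result of application config, environment variables, and defaults; for example (return values illustrative):

```elixir
HfHub.Config.endpoint()  # e.g. "https://huggingface.co"
HfHub.Config.cache_dir() # e.g. "/home/me/.cache/huggingface"
HfHub.Config.http_opts() # e.g. [receive_timeout: 30_000, pool_timeout: 5_000]
```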
### `HfHub.Auth`
Authentication and authorization:
- `get_token/0` — Retrieve HuggingFace token
- `set_token/1` — Set authentication token
- `login/1` — Interactive login flow
- `logout/0` — Remove stored credentials
- `whoami/0` — Get current user information
- `validate_token/1` — Validate token format
- `auth_headers/1` — Build HTTP authorization headers
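A sketch of a non-interactive token flow, useful in releases where `login/1` is impractical. The return shapes (`{:ok, ...}` tuples and the fields on the `whoami/0` result) are assumptions:

```elixir
# Install a token for this runtime instead of relying on HF_TOKEN.
HfHub.Auth.set_token("hf_...")

# Confirm the token works and see which account it belongs to.
{:ok, me} = HfHub.Auth.whoami()
IO.inspect(me)
```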
### `HfHub.Hub`
Bumblebee-compatible ETag-based caching:
- `cached_download/2` — Download with ETag-based cache validation
- `file_url/3` — Build file URL for repository
- `file_listing_url/3` — Build tree listing URL
### `HfHub.Repository`
Repository reference types and helpers:
- `normalize!/1` — Normalize repository tuples
- `file_url/2` — Build file URL from repository reference
- `file_listing_url/1` — Build listing URL from repository reference
- `cache_scope/1` — Convert repo ID to cache scope string
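A sketch of how the tuple forms from the Bumblebee-compatible API normalize; the normalized shape and the exact cache-scope string are assumptions:

```elixir
# Expand a shorthand tuple into a fully-specified reference.
repo = HfHub.Repository.normalize!({:hf, "bert-base-uncased"})

# Derive a filesystem-safe scope string for cache directories.
scope = HfHub.Repository.cache_scope("bert-base-uncased")
```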
### `HfHub.RepoFiles`
Repository file listing with ETags:
- `get_repo_files/1` — Get map of files to ETags for cache validation
### `HfHub.Constants`
Constants matching Python's `huggingface_hub.constants`:
- File names: `config_name/0`, `pytorch_weights_name/0`, `safetensors_single_file/0`
- Timeouts: `default_etag_timeout/0`, `default_download_timeout/0`
- Repository types: `repo_types/0`, `repo_type_url_prefix/1`
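Using the constants keeps file names in sync with the Python library. The concrete value shown in the comment mirrors Python's `huggingface_hub.constants` and is an assumption here:

```elixir
# "config.json" in the Python library; assumed identical here.
filename = HfHub.Constants.config_name()

{:ok, path} = HfHub.Download.hf_hub_download(
  repo_id: "bert-base-uncased",
  filename: filename
)
```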
### `HfHub.Errors`
Structured exceptions for error handling:
- Repository: `RepositoryNotFound`, `RevisionNotFound`, `EntryNotFound`, `GatedRepo`
- HTTP: `HTTPError`, `BadRequest`, `OfflineMode`
- Cache: `CacheNotFound`, `CorruptedCache`, `LocalEntryNotFound`
- Inference: `InferenceTimeout`, `InferenceEndpointError`
- Storage: `XetError`, `DDUFError`, `SafetensorsParsing`
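A sketch of matching on these, assuming failures come back as exception structs inside `{:error, _}` tuples (which fits the tuple-based error handling described under Key Differences below):

```elixir
case HfHub.Api.model_info("no-such-org/no-such-model") do
  {:ok, info} ->
    info

  {:error, %HfHub.Errors.RepositoryNotFound{} = error} ->
    IO.puts("Repository missing: #{Exception.message(error)}")

  {:error, %HfHub.Errors.GatedRepo{}} ->
    IO.puts("Repository is gated; request access on the Hub first")
end
```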
### `HfHub.LFS`
LFS (Large File Storage) utilities:
- `UploadInfo.from_path/1` — Create upload info from file
- `UploadInfo.from_binary/1` — Create upload info from binary
- `sha256_hex/1` — Get hex-encoded SHA256 hash
- `oid/1` — Get LFS object identifier
- `lfs_headers/0` — Get standard LFS headers
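A sketch of computing upload metadata by hand, as `HfHub.Commit` does internally. The struct fields and the argument types (binary input for the hash helpers) are assumptions:

```elixir
# Size/hash metadata for a file on disk (fields assumed).
info = HfHub.LFS.UploadInfo.from_path("/path/to/model.bin")
IO.inspect(info)

# Hex SHA256 and LFS object id for an in-memory binary.
hex = HfHub.LFS.sha256_hex("hello")
oid = HfHub.LFS.oid("hello")
```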
### `HfHub.Commit`
Commit operations for file uploads:
- `create/3` — Create commit with multiple operations
- `upload_file/4` — Upload single file (regular or LFS)
- `upload_folder/3` — Upload entire directory with pattern filtering
- `upload_large_folder/3` — Upload large directories with automatic batching
- `delete_file/3` — Delete file from repository
- `delete_folder/3` — Delete folder from repository
- `matches_pattern?/2` — Check if path matches gitignore-style pattern
- `needs_lfs?/1` — Check if file needs LFS upload
- `lfs_threshold/0` — Get LFS size threshold (10MB)
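The pattern and threshold helpers are handy for predicting what `upload_folder/3` will do before running it. A sketch, with the argument order for `matches_pattern?/2` (path first) assumed:

```elixir
# gitignore-style matching, as used by ignore_patterns/allow_patterns.
HfHub.Commit.matches_pattern?("weights/model.safetensors", "*.safetensors")

# Files at or above the threshold take the LFS upload path.
HfHub.Commit.needs_lfs?("/path/to/model.bin")
HfHub.Commit.lfs_threshold() # 10 MB, per the docs above
```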
### `HfHub.Git`
Git operations for branch, tag, and commit management:
- `create_branch/3` — Create a new branch from a revision
- `delete_branch/3` — Delete a branch
- `create_tag/3` — Create a tag (lightweight or annotated)
- `delete_tag/3` — Delete a tag
- `list_refs/2` — List all refs (branches, tags, converts, pull requests)
- `list_commits/2` — List commit history for a revision
- `super_squash/2` — Squash all commits (destructive)
### `HfHub.Users`
User profile and activity API:
- `get/2` — Get user profile by username
- `list_followers/2` — List users who follow a user
- `list_following/2` — List users a user is following
- `list_liked_repos/2` — List repositories liked by a user
- `like/2`, `unlike/2` — Like/unlike repositories
- `list_likers/2` — List users who liked a repository
### `HfHub.Organizations`
Organization profile API:
- `get/2` — Get organization profile by name
- `list_members/2` — List organization members
### `HfHub.Cards`
Model and Dataset card parsing and creation:
- `load_model_card/2` — Load and parse model card from repository
- `load_dataset_card/2` — Load and parse dataset card from repository
- `parse_model_card/1` — Parse model card from markdown content
- `parse_dataset_card/1` — Parse dataset card from markdown content
- `create_model_card/1` — Create model card from data
- `create_dataset_card/1` — Create dataset card from data
- `render/1` — Render card to markdown with YAML frontmatter
## Configuration
Configure `hf_hub` in your `config/config.exs`:
```elixir
config :hf_hub,
  # Authentication token (defaults to HF_TOKEN env var)
  token: System.get_env("HF_TOKEN"),

  # Cache directory (defaults to ~/.cache/huggingface)
  cache_dir: Path.expand("~/.cache/huggingface"),

  # Hub endpoint (defaults to https://huggingface.co)
  endpoint: "https://huggingface.co",

  # HTTP client options
  http_opts: [
    receive_timeout: 30_000,
    pool_timeout: 5_000
  ],

  # Cache options
  cache_opts: [
    max_size: 10 * 1024 * 1024 * 1024, # 10 GB
    eviction_policy: :lru
  ]
```

## Comparison to Python's huggingface_hub
`hf_hub_ex` aims for feature parity with the Python library while embracing Elixir idioms:
| Feature | Python huggingface_hub | Elixir hf_hub_ex |
|---|---|---|
| API Client | ✅ | ✅ |
| File Downloads | ✅ | ✅ |
| Caching | ✅ | ✅ (OTP-based) |
| Authentication | ✅ | ✅ |
| Repository Management | ✅ | ✅ |
| Upload Files | ✅ | ✅ |
| Inference API | ✅ | 🚧 (Planned) |
### Key Differences
- Concurrency — Leverages OTP for parallel downloads and supervision
- Caching — GenServer-based cache with configurable eviction policies
- Error Handling — Pattern matching with `{:ok, result}`/`{:error, reason}` tuples
- Type Safety — Comprehensive typespecs and Dialyzer integration
## Roadmap
- [x] Core API client (models, datasets, spaces)
- [x] File download with caching
- [x] Authentication support
- [x] Repository management (create, delete, update)
- [x] File uploads (single file, LFS support)
- [x] Folder uploads (with pattern filtering and batching)
- [ ] Inference API client
- [ ] WebSocket support for real-time inference
- [ ] Integration with `crucible_datasets` for dataset loading
See `docs/ROADMAP.md` for detailed feature parity status with Python `huggingface_hub`.
## Contributing
Contributions are welcome! Please follow these guidelines:
- Fork the repository
- Create a feature branch (`git checkout -b feature/my-new-feature`)
- Write tests for your changes
- Ensure all tests pass (`mix test`)
- Run code quality checks (`mix format && mix credo && mix dialyzer`)
- Commit your changes (`git commit -am 'Add new feature'`)
- Push to the branch (`git push origin feature/my-new-feature`)
- Create a Pull Request
## Testing
```shell
# Run all tests
mix test

# Run with coverage
mix test --cover

# Run specific test file
mix test test/hf_hub/api_test.exs
```
## License
MIT License - See LICENSE for details.
## Acknowledgments
- Inspired by huggingface_hub (Python)
- Part of the North-Shore-AI research ecosystem
- Built with Req for HTTP client functionality
## Links
- Hex Package: https://hex.pm/packages/hf_hub
- Documentation: https://hexdocs.pm/hf_hub
- GitHub: https://github.com/North-Shore-AI/hf_hub_ex
- Issues: https://github.com/North-Shore-AI/hf_hub_ex/issues
- HuggingFace Hub: https://huggingface.co
Built with ❤️ by the North-Shore-AI team