ElixirDatasets.Repository (ElixirDatasets v0.1.0)
Functions for managing dataset repositories (local and Hugging Face).
Summary
Types
t_repository(): A location to fetch dataset files from. Can be either a Hugging Face repository or a local resource.
Functions
download/3: Downloads a file from a repository.
get_files/1: Gets the list of files in a repository.
normalize!/1: Normalizes a repository specification to a consistent format.
repository_id_to_cache_scope/1: Converts a repository ID to a cache scope string.
Types
A location to fetch dataset files from. Can be either a Hugging Face repository or a local resource:
{:hf, repository_id} - the Hugging Face repository ID
{:hf, repository_id, options} - the Hugging Face repository ID with additional options
{:local, path} - a local directory or file path containing the datasets
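The three tuple shapes above lend themselves to pattern matching in function heads. The following is a minimal sketch; `RepoShapeSketch` and `describe/1` are hypothetical names used for illustration and are not part of ElixirDatasets.

```elixir
defmodule RepoShapeSketch do
  # Hypothetical helper (not part of ElixirDatasets): returns a short
  # human-readable label for each documented repository shape.

  # {:hf, repository_id}
  def describe({:hf, repository_id}) when is_binary(repository_id) do
    "Hugging Face repository #{repository_id}"
  end

  # {:hf, repository_id, options}; options is assumed to be a list.
  def describe({:hf, repository_id, options})
      when is_binary(repository_id) and is_list(options) do
    "Hugging Face repository #{repository_id} with #{length(options)} option(s)"
  end

  # {:local, path}
  def describe({:local, path}) when is_binary(path) do
    "local path #{path}"
  end
end
```

Calling `RepoShapeSketch.describe({:local, "/path/to/data"})` would match the last clause; a tuple of any other shape raises a `FunctionClauseError`.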
Functions
@spec download(t_repository(), String.t(), String.t() | nil) :: {:ok, String.t()} | {:error, String.t()}
Downloads a file from a repository.
For local repositories, verifies the file exists. For Hugging Face repositories, downloads the file using the Hub API.
Returns
{:ok, path} where path is the local file path,
or {:error, reason} if the download fails.
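To illustrate the local branch and the return shape described above, here is a minimal sketch. It assumes the local path is a directory and that "verifying" means a plain existence check; `LocalDownloadSketch` is a hypothetical module, not the library's implementation, and it ignores the third argument from the spec because its meaning is not documented here.

```elixir
defmodule LocalDownloadSketch do
  # Hypothetical stand-in for the local branch of download.
  # Assumption: the repository path is a directory and the file is
  # looked up directly inside it.
  def download({:local, dir}, filename)
      when is_binary(dir) and is_binary(filename) do
    path = Path.join(dir, filename)

    if File.exists?(path) do
      # Nothing is copied; the existing path is returned as-is.
      {:ok, path}
    else
      {:error, "file not found: #{path}"}
    end
  end
end
```

A caller would typically branch on the result with `case`, handling `{:ok, path}` and `{:error, reason}` separately.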
@spec get_files(t_repository()) :: {:ok, map()} | {:error, String.t()}
Gets the list of files in a repository.
For local repositories, lists files in the directory. For Hugging Face repositories, fetches the file listing from the API.
Returns
{:ok, repo_files} where repo_files is a map of %{filename => etag},
or {:error, reason} if the operation fails.
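The `%{filename => etag}` shape for a local directory could be built roughly as follows. This is a sketch under assumptions: `LocalListingSketch` is hypothetical, and using `nil` as the etag for local entries is a guess, since the documentation does not say what value local files carry.

```elixir
defmodule LocalListingSketch do
  # Hypothetical stand-in for the local branch of get_files.
  # Assumption: local entries have no etag, so nil is used as the value.
  def get_files({:local, dir}) when is_binary(dir) do
    case File.ls(dir) do
      {:ok, names} ->
        # File.ls/1 returns every directory entry, subdirectories included.
        {:ok, Map.new(names, fn name -> {name, nil} end)}

      {:error, reason} ->
        {:error, "could not list #{dir}: #{inspect(reason)}"}
    end
  end
end
```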
@spec normalize!(t_repository()) :: t_repository()
Normalizes repository specification to a consistent format.
Examples
iex> ElixirDatasets.Repository.normalize!({:hf, "repo/name"})
{:hf, "repo/name", []}
iex> ElixirDatasets.Repository.normalize!({:local, "/path/to/data"})
{:local, "/path/to/data"}
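The behaviour shown in these examples can be reproduced with a few function clauses. The sketch below is hypothetical (`NormalizeSketch` is not the library module); raising `ArgumentError` on unrecognised input is an assumption suggested by the `!` suffix, not something the documentation states.

```elixir
defmodule NormalizeSketch do
  # A two-element :hf tuple gains an empty options list.
  def normalize!({:hf, repository_id}) when is_binary(repository_id) do
    {:hf, repository_id, []}
  end

  # A three-element :hf tuple is already in the normalized form.
  def normalize!({:hf, repository_id, options})
      when is_binary(repository_id) and is_list(options) do
    {:hf, repository_id, options}
  end

  # Local repositories pass through unchanged.
  def normalize!({:local, path}) when is_binary(path) do
    {:local, path}
  end

  # Assumption: anything else is rejected with an ArgumentError.
  def normalize!(other) do
    raise ArgumentError, "unexpected repository: #{inspect(other)}"
  end
end
```

Normalizing once at the boundary means downstream code only has to match `{:hf, id, options}` and `{:local, path}`.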
Converts a repository ID to a cache scope string.
Replaces slashes with double dashes and removes other non-word characters. Dashes are kept, as the example shows.
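One way to express this transformation is two `String.replace/3` calls. This is a sketch, not the library's code: `CacheScopeSketch` is a hypothetical module, and the character class `[^\w-]` is an assumption chosen so that dashes survive.

```elixir
defmodule CacheScopeSketch do
  # Hypothetical reimplementation for illustration only.
  def to_cache_scope(repository_id) when is_binary(repository_id) do
    repository_id
    # "user/repo" becomes "user--repo"
    |> String.replace("/", "--")
    # Drop everything that is not a word character or a dash (assumed rule).
    |> String.replace(~r/[^\w-]/, "")
  end
end
```

Under these assumptions, `CacheScopeSketch.to_cache_scope("user/repo-name")` returns `"user--repo-name"`.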
Examples
iex> ElixirDatasets.Repository.repository_id_to_cache_scope("user/repo-name")
"user--repo-name"