ElixirDatasets.Info (ElixirDatasets v0.1.0)
Functions for fetching and parsing dataset metadata from Hugging Face Hub.
Summary
Functions
Gets the configuration names available for a dataset.
Fetches dataset information from the Hugging Face API.
Fetches dataset information from the Hugging Face API and returns a list of DatasetInfo structs.
Gets the split names (e.g., 'train', 'test', 'validation') for a dataset.
Parses raw dataset info map into a list of DatasetInfo structs.
Functions
@spec get_dataset_config_names(String.t(), keyword()) :: {:ok, [String.t()]} | {:error, String.t()}
Gets the configuration names available for a dataset.
Parameters
repository_id - the Hugging Face dataset repository ID (e.g., "glue")
opts - optional keyword list with the following options:
  :auth_token - the token to use as HTTP bearer authorization
Returns
Returns {:ok, config_names} where config_names is a list of configuration names,
or {:error, reason} if the request fails.
Examples
iex> {:ok, configs} = ElixirDatasets.Info.get_dataset_config_names("glue")
iex> Enum.member?(configs, "cola")
true
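Both return shapes described above can be handled with a case expression. The following is a minimal sketch, not part of ElixirDatasets: the ConfigLookup module, its function names, and the HF_TOKEN environment variable are all hypothetical. The fetch function is injectable so the sketch can be tried without network access, and it defaults to the documented get_dataset_config_names/2.

```elixir
defmodule ConfigLookup do
  # Hypothetical helper, not part of ElixirDatasets.
  # `fetch` defaults to the documented function and can be replaced by a stub.
  def describe(
        repository_id,
        opts \\ [],
        fetch \\ &ElixirDatasets.Info.get_dataset_config_names/2
      ) do
    case fetch.(repository_id, opts) do
      {:ok, configs} ->
        "#{repository_id}: " <> Enum.join(configs, ", ")

      {:error, reason} ->
        # Per the @spec, reason is a string, so it can be interpolated directly.
        "could not list configurations for #{repository_id}: #{reason}"
    end
  end

  # Builds the opts keyword list from a map of environment variables.
  # HF_TOKEN is an assumed variable name; the library does not mandate it.
  def auth_opts(env \\ System.get_env()) do
    case Map.get(env, "HF_TOKEN") do
      nil -> []
      token -> [auth_token: token]
    end
  end
end
```

A call such as ConfigLookup.describe("glue", ConfigLookup.auth_opts()) would then pass the token only when one is set.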
Fetches dataset information from the Hugging Face API.
Parameters
repository_id - the Hugging Face dataset repository ID (e.g., "aaaaa32r/elixirDatasets")
opts - optional keyword list with the following options:
  :auth_token - the token to use as HTTP bearer authorization
Returns
Returns {:ok, dataset_info} where dataset_info is a map containing the dataset metadata,
or {:error, reason} if the request fails.
@spec get_dataset_infos(String.t(), keyword()) :: {:ok, [ElixirDatasets.DatasetInfo.t()]} | {:error, String.t()}
Fetches dataset information from the Hugging Face API and returns a list of DatasetInfo structs.
This function retrieves all available dataset configurations for a given repository.
Parameters
repository_id - the Hugging Face dataset repository ID (e.g., "aaaaa32r/elixirDatasets")
opts - optional keyword list with the following options:
  :auth_token - the token to use as HTTP bearer authorization
Returns
Returns {:ok, dataset_infos} where dataset_infos is a list of DatasetInfo structs,
or {:error, reason} if the request fails.
Examples
iex> {:ok, infos} = ElixirDatasets.Info.get_dataset_infos("aaaaa32r/elixirDatasets")
iex> Enum.map(infos, & &1.config_name)
["csv", "default"]
@spec get_dataset_split_names(String.t(), keyword()) :: {:ok, [String.t()]} | {:error, String.t()}
Gets the split names (e.g., 'train', 'test', 'validation') for a dataset.
Parameters
repository_id - the Hugging Face dataset repository ID (e.g., "cornell-movie-review-data/rotten_tomatoes")
opts - optional keyword list with the following options:
  :auth_token - the token to use as HTTP bearer authorization
Returns
Returns {:ok, split_names} where split_names is a list of strings representing
the available splits, or {:error, reason} if the request fails.
Examples
iex> {:ok, splits} = ElixirDatasets.Info.get_dataset_split_names("cornell-movie-review-data/rotten_tomatoes")
iex> splits
["train", "validation", "test"]
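Because the function returns a tagged tuple, a split check composes naturally with `with`. The sketch below is illustrative only: the SplitCheck module and validate_split function are hypothetical names, and the fetch argument exists so the logic can be run against a stub instead of the network.

```elixir
defmodule SplitCheck do
  # Hypothetical helper, not part of ElixirDatasets.
  # Returns {:ok, wanted} if the split exists, otherwise {:error, message}.
  def validate_split(
        repository_id,
        wanted,
        opts \\ [],
        fetch \\ &ElixirDatasets.Info.get_dataset_split_names/2
      ) do
    # An {:error, reason} from fetch falls through `with` unchanged.
    with {:ok, splits} <- fetch.(repository_id, opts) do
      if wanted in splits do
        {:ok, wanted}
      else
        {:error, "split #{inspect(wanted)} not found; available: #{Enum.join(splits, ", ")}"}
      end
    end
  end
end
```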
@spec parse_dataset_infos(map()) :: [ElixirDatasets.DatasetInfo.t()]
Parses raw dataset info map into a list of DatasetInfo structs.
Extracts the dataset_info array from the cardData field of the Hugging Face API response and converts each entry into a DatasetInfo struct.
Parameters
data - the raw response map from the Hugging Face API
Returns
A list of DatasetInfo structs.
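The extraction step described above can be approximated in plain Elixir. This is a sketch of the general idea, not the library's implementation. It assumes a JSON-decoded response with string keys ("cardData", "dataset_info", "config_name"). ParseSketch and its Info struct are hypothetical stand-ins, since this section names no fields of the real DatasetInfo struct other than config_name.

```elixir
defmodule ParseSketch do
  # Hypothetical stand-in for ElixirDatasets.DatasetInfo.
  defmodule Info do
    defstruct [:config_name, :raw]
  end

  # Assumes string keys, as produced by typical JSON decoding.
  def parse(data) when is_map(data) do
    data
    |> get_in(["cardData", "dataset_info"])
    # Tolerates a missing field (nil) or a single map instead of a list.
    |> List.wrap()
    |> Enum.map(fn entry ->
      %Info{config_name: Map.get(entry, "config_name"), raw: entry}
    end)
  end
end
```

With an input shaped like %{"cardData" => %{"dataset_info" => [%{"config_name" => "csv"}]}}, the sketch yields one struct per entry, and an input without cardData yields an empty list.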