ElixirDatasets.DatasetInfo (ElixirDatasets v0.1.0)


Represents dataset information from the HuggingFace API.

This struct holds metadata about a single dataset configuration, including its features and splits.

Summary

Types

feature()

Feature information in a dataset. Contains the feature name and its data type.

split()

Split information for a dataset configuration. Contains the split name and the number of examples.

t()

DatasetInfo struct represents metadata about a dataset configuration.

Functions

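from_directory(directory, filename \\ "dataset_info.json")

Reads DatasetInfo from a JSON file in a directory.

from_map(data)

Creates a DatasetInfo struct from a map (typically a decoded JSON response).

to_map(dataset_info)

Converts a DatasetInfo struct to a map.

write_to_directory(dataset_info, directory)

Writes DatasetInfo to a directory as a JSON file.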

Types

feature()

@type feature() :: map()

Feature information in a dataset. Contains the feature name and its data type.
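Because feature() is an untyped map(), callers usually pattern-match on its keys. The sketch below assumes the string keys "name" and "dtype" used in this page's examples (real feature maps may carry other or additional keys) and builds a name-to-dtype lookup. FeatureSketch.dtypes/1 is a hypothetical helper, not part of ElixirDatasets.

```elixir
defmodule FeatureSketch do
  # Hypothetical helper: assumes each feature map has the string keys
  # "name" and "dtype", as in the examples on this page.
  @spec dtypes([map()] | nil) :: %{optional(String.t()) => String.t()}
  def dtypes(nil), do: %{}

  def dtypes(features) when is_list(features) do
    Map.new(features, fn %{"name" => name, "dtype" => dtype} -> {name, dtype} end)
  end
end

FeatureSketch.dtypes([
  %{"name" => "id", "dtype" => "int64"},
  %{"name" => "text", "dtype" => "string"}
])
# => %{"id" => "int64", "text" => "string"}
```

The nil clause exists because the features field of t() may be nil.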

split()

@type split() :: map()

Split information for a dataset configuration. Contains the split name and the number of examples.
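split() is likewise a plain map. Assuming the "name" and "num_examples" string keys from this page's examples, the hypothetical helpers below (not part of ElixirDatasets) total the examples across all splits and look up the size of one split.

```elixir
defmodule SplitSketch do
  # Hypothetical helpers: assume each split map carries the string keys
  # "name" and "num_examples", as in the examples on this page.
  @spec total_examples([map()] | nil) :: non_neg_integer()
  def total_examples(nil), do: 0

  def total_examples(splits) do
    Enum.reduce(splits, 0, fn %{"num_examples" => n}, acc -> acc + n end)
  end

  # Returns {:ok, count} for a named split, or :error if it is absent.
  @spec num_examples([map()] | nil, String.t()) :: {:ok, non_neg_integer()} | :error
  def num_examples(splits, name) do
    case Enum.find(splits || [], &(&1["name"] == name)) do
      %{"num_examples" => n} -> {:ok, n}
      nil -> :error
    end
  end
end

splits = [
  %{"name" => "train", "num_examples" => 10},
  %{"name" => "test", "num_examples" => 4}
]

SplitSketch.total_examples(splits)        # => 14
SplitSketch.num_examples(splits, "test")  # => {:ok, 4}
SplitSketch.num_examples(splits, "valid") # => :error
```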

t()

@type t() :: %ElixirDatasets.DatasetInfo{
  citation: String.t() | nil,
  config_name: String.t(),
  description: String.t() | nil,
  features: [map()] | nil,
  homepage: String.t() | nil,
  license: String.t() | nil,
  splits: [map()] | nil
}

DatasetInfo struct represents metadata about a dataset configuration.

Fields:

  • config_name - The configuration name for the dataset (e.g., "csv", "default")
  • features - List of feature definitions in the dataset
  • splits - List of data splits (train, test, validation, etc.) with example counts
  • description - Optional description of the dataset configuration
  • homepage - Optional homepage URL for the dataset
  • license - Optional license information
  • citation - Optional citation information

Functions

from_directory(directory, filename \\ "dataset_info.json")

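Reads DatasetInfo from a JSON file in a directory.

Reads the JSON file named by filename (by default 'dataset_info.json') from the specified directory and parses it into a DatasetInfo struct.

Parameters

  • directory - The directory path containing the JSON file
  • filename - The name of the JSON file to read (defaults to "dataset_info.json")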

Returns

  • {:ok, dataset_info} - Success with the parsed DatasetInfo struct
  • {:error, reason} - If the file doesn't exist or parsing fails

Examples

iex> dataset_info = %ElixirDatasets.DatasetInfo{
...>   config_name: "csv",
...>   features: [%{"name" => "id", "dtype" => "int64"}],
...>   splits: [%{"name" => "train", "num_examples" => 10}]
...> }
iex> ElixirDatasets.DatasetInfo.write_to_directory(dataset_info, "/tmp/my_dataset")
{:ok, "/tmp/my_dataset/dataset_info.json"}
iex> ElixirDatasets.DatasetInfo.from_directory("/tmp/my_dataset")
{:ok, %ElixirDatasets.DatasetInfo{
  config_name: "csv",
  features: [%{"name" => "id", "dtype" => "int64"}],
  splits: [%{"name" => "train", "num_examples" => 10}],
  description: nil,
  homepage: nil,
  license: nil,
  citation: nil
}}
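The two failure cases listed under Returns (a missing file and a parse failure) can be sketched with a single with clause. This is a hypothetical illustration, not the library's code: ReadSketch.read/3 is an invented name, and it takes the decoder as a function argument because this page does not say which JSON library ElixirDatasets uses.

```elixir
defmodule ReadSketch do
  # Hypothetical: `decode` stands in for whatever JSON decoding and struct
  # conversion the real from_directory/2 performs.
  @spec read(Path.t(), String.t(), (binary() -> {:ok, term()} | {:error, term()})) ::
          {:ok, term()} | {:error, term()}
  def read(directory, filename, decode) do
    path = Path.join(directory, filename)

    # A missing file yields {:error, :enoent} from File.read/1; otherwise
    # the decoder's own {:ok, _} or {:error, _} result is returned.
    with {:ok, body} <- File.read(path), do: decode.(body)
  end
end

ReadSketch.read("/nonexistent_dir", "dataset_info.json", &{:ok, &1})
# => {:error, :enoent}
```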

from_map(data)

@spec from_map(map()) :: t()

Creates a DatasetInfo struct from a map (typically a decoded JSON response).

Examples

iex> map = %{
...>   "config_name" => "csv",
...>   "features" => [%{"name" => "id", "dtype" => "int64"}],
...>   "splits" => [%{"name" => "train", "num_examples" => 10}]
...> }
iex> ElixirDatasets.DatasetInfo.from_map(map)
%ElixirDatasets.DatasetInfo{
  config_name: "csv",
  features: [%{"name" => "id", "dtype" => "int64"}],
  splits: [%{"name" => "train", "num_examples" => 10}],
  description: nil,
  homepage: nil,
  license: nil,
  citation: nil
}

to_map(dataset_info)

@spec to_map(t()) :: map()

Converts a DatasetInfo struct to a map.
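This page does not document the exact shape of the map that to_map/1 returns. The sketch below uses a hypothetical stand-in module, DatasetInfoSketch (not part of ElixirDatasets), to show one way such a conversion pair could work: to_map/1 produces a string-keyed map suitable for JSON encoding, and from_map/1 reads the same keys back, so the two round-trip. The real functions may differ, for example by dropping nil fields.

```elixir
defmodule DatasetInfoSketch do
  # Hypothetical mirror of the documented fields; not the library's struct.
  @fields [:config_name, :features, :splits, :description, :homepage, :license, :citation]
  defstruct @fields

  # Build the struct from a string-keyed map; missing keys become nil.
  def from_map(data) when is_map(data) do
    attrs = for field <- @fields, do: {field, Map.get(data, Atom.to_string(field))}
    struct(__MODULE__, attrs)
  end

  # Convert the struct back to a string-keyed map, keeping nil fields.
  def to_map(%__MODULE__{} = info) do
    for field <- @fields, into: %{}, do: {Atom.to_string(field), Map.fetch!(info, field)}
  end
end

info = DatasetInfoSketch.from_map(%{"config_name" => "csv"})
info.description                                                   # => nil
DatasetInfoSketch.from_map(DatasetInfoSketch.to_map(info)) == info # => true
```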

write_to_directory(dataset_info, directory)

@spec write_to_directory(t(), String.t()) :: {:ok, String.t()} | {:error, any()}

Writes DatasetInfo to a directory as a JSON file.

Creates the directory if it doesn't exist and saves the dataset information as 'dataset_info.json' inside it.

Parameters

  • dataset_info - The DatasetInfo struct to write
  • directory - The directory path where the file will be saved

Returns

  • {:ok, filepath} - Success with the path to the saved file
  • {:error, reason} - If directory creation or file writing fails

Examples

iex> dataset_info = %ElixirDatasets.DatasetInfo{
...>   config_name: "csv",
...>   features: [%{"name" => "id", "dtype" => "int64"}],
...>   splits: [%{"name" => "train", "num_examples" => 10}]
...> }
iex> ElixirDatasets.DatasetInfo.write_to_directory(dataset_info, "/tmp/my_dataset")
{:ok, "/tmp/my_dataset/dataset_info.json"}
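The directory-creation and file-writing steps described above correspond to File.mkdir_p/1 and File.write/2. The sketch below is hypothetical (WriteSketch.write_json/3 is not part of ElixirDatasets): it accepts already-encoded JSON text because this page does not say which encoder the library uses, and it returns the same {:ok, filepath} and {:error, reason} shapes documented for write_to_directory/2.

```elixir
defmodule WriteSketch do
  # Hypothetical: takes pre-encoded JSON text rather than a DatasetInfo struct.
  @spec write_json(iodata(), Path.t(), String.t()) :: {:ok, String.t()} | {:error, File.posix()}
  def write_json(json, directory, filename \\ "dataset_info.json") do
    path = Path.join(directory, filename)

    # `with` passes through the first {:error, reason} from either step.
    with :ok <- File.mkdir_p(directory),
         :ok <- File.write(path, json) do
      {:ok, path}
    end
  end
end

dir = Path.join(System.tmp_dir!(), "my_dataset_sketch")
{:ok, path} = WriteSketch.write_json(~s({"config_name": "csv"}), dir)
File.read!(path)
```

Using File.mkdir_p/1 makes repeated writes to the same directory succeed, which matches the "creates the directory if it doesn't exist" behavior described above.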