ElixirDatasets.DatasetInfo (ElixirDatasets v0.1.0)
Represents dataset information from the HuggingFace API.
This struct encapsulates the metadata about a dataset configuration, including features and split information.
Types
@type feature() :: map()
Feature information in a dataset. Contains the feature name and its data type.
@type split() :: map()
Split information for a dataset configuration. Contains the split name and the number of examples.
@type t() :: %ElixirDatasets.DatasetInfo{
  citation: String.t() | nil,
  config_name: String.t(),
  description: String.t() | nil,
  features: [map()] | nil,
  homepage: String.t() | nil,
  license: String.t() | nil,
  splits: [map()] | nil
}
DatasetInfo struct represents metadata about a dataset configuration.
Fields:
config_name - The configuration name for the dataset (e.g., "csv", "default")
features - List of feature definitions in the dataset
splits - List of data splits (train, test, validation, etc.) with example counts
description - Optional description of the dataset configuration
homepage - Optional homepage URL for the dataset
license - Optional license information
citation - Optional citation information
Functions
Reads a DatasetInfo struct from a JSON file in a directory.
Reads the 'dataset_info.json' file from the specified directory and returns a DatasetInfo struct.
Parameters
directory - The directory path containing the 'dataset_info.json' file
Returns
{:ok, dataset_info} - Success with the parsed DatasetInfo struct
{:error, reason} - If the file doesn't exist or parsing fails
Examples
iex> dataset_info = %ElixirDatasets.DatasetInfo{
...> config_name: "csv",
...> features: [%{"name" => "id", "dtype" => "int64"}],
...> splits: [%{"name" => "train", "num_examples" => 10}]
...> }
iex> ElixirDatasets.DatasetInfo.write_to_directory(dataset_info, "/tmp/my_dataset")
{:ok, "/tmp/my_dataset/dataset_info.json"}
iex> ElixirDatasets.DatasetInfo.from_directory("/tmp/my_dataset")
{:ok, %ElixirDatasets.DatasetInfo{
config_name: "csv",
features: [%{"name" => "id", "dtype" => "int64"}],
splits: [%{"name" => "train", "num_examples" => 10}],
description: nil,
homepage: nil,
license: nil,
citation: nil
}}
Creates a DatasetInfo struct from a map (typically a decoded JSON response).
Examples
iex> map = %{
...> "config_name" => "csv",
...> "features" => [%{"name" => "id", "dtype" => "int64"}],
...> "splits" => [%{"name" => "train", "num_examples" => 10}]
...> }
iex> ElixirDatasets.DatasetInfo.from_map(map)
%ElixirDatasets.DatasetInfo{
config_name: "csv",
features: [%{"name" => "id", "dtype" => "int64"}],
splits: [%{"name" => "train", "num_examples" => 10}],
description: nil,
homepage: nil,
license: nil,
citation: nil
}
Converts a DatasetInfo struct to a map.
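The page gives no example for this conversion, and it does not state whether the resulting map uses atom or string keys. As a general illustration only, the sketch below shows how a struct can be turned into a map with the standard library's `Map.from_struct/1`, and how the keys can then be stringified to match the string-keyed maps used elsewhere on this page. `ConfigSketch` is a hypothetical stand-in struct, not part of ElixirDatasets, and the library's own conversion may behave differently.

```elixir
# Hypothetical stand-in struct for illustration; not the library's module.
defmodule ConfigSketch do
  defstruct [:config_name, :license]
end

info = struct!(ConfigSketch, config_name: "csv")

# Map.from_struct/1 drops the :__struct__ key and keeps atom keys.
map = Map.from_struct(info)

# Stringify the keys to get the shape a JSON document would have.
string_keyed = Map.new(map, fn {key, value} -> {Atom.to_string(key), value} end)

IO.inspect(map)
IO.inspect(string_keyed)
```

Fields that were never set stay in the map with a `nil` value, so a caller that wants a compact map has to filter them out explicitly.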
Writes DatasetInfo to a directory as a JSON file.
Creates the directory if it doesn't exist and saves the dataset information as 'dataset_info.json' in that directory.
Parameters
dataset_info - The DatasetInfo struct to write
directory - The directory path where the file will be saved
Returns
{:ok, filepath} - Success with the path to the saved file
{:error, reason} - If directory creation or file writing fails
Examples
iex> dataset_info = %ElixirDatasets.DatasetInfo{
...> config_name: "csv",
...> features: [%{"name" => "id", "dtype" => "int64"}],
...> splits: [%{"name" => "train", "num_examples" => 10}]
...> }
iex> ElixirDatasets.DatasetInfo.write_to_directory(dataset_info, "/tmp/my_dataset")
{:ok, "/tmp/my_dataset/dataset_info.json"}
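The write and read functions described on this page form a JSON round trip through 'dataset_info.json'. The sketch below shows one way such a round trip could be built, to make the `{:ok, ...}` / `{:error, reason}` contract concrete. It is not the library's implementation: `InfoSketch` is a hypothetical module, and the code assumes Elixir 1.18 or later for the built-in `JSON` module.

```elixir
# Hypothetical sketch of the documented behaviour; not ElixirDatasets code.
# Assumes Elixir 1.18+ (built-in JSON module).
defmodule InfoSketch do
  @fields [:config_name, :features, :splits, :description, :homepage, :license, :citation]
  @filename "dataset_info.json"

  defstruct @fields

  # Creates the directory if needed, then writes the struct as JSON.
  # Returns {:ok, filepath} or {:error, reason}.
  def write_to_directory(%__MODULE__{} = info, directory) do
    path = Path.join(directory, @filename)
    json = info |> Map.from_struct() |> JSON.encode!()

    with :ok <- File.mkdir_p(directory),
         :ok <- File.write(path, json) do
      {:ok, path}
    end
  end

  # Reads and decodes the JSON file.
  # Returns {:ok, struct} or {:error, reason} (missing file or invalid JSON).
  def from_directory(directory) do
    path = Path.join(directory, @filename)

    with {:ok, body} <- File.read(path),
         {:ok, map} <- JSON.decode(body) do
      {:ok, from_map(map)}
    end
  end

  # Builds the struct from a string-keyed map; absent keys become nil.
  def from_map(map) when is_map(map) do
    values = for field <- @fields, do: {field, Map.get(map, Atom.to_string(field))}
    struct(__MODULE__, values)
  end
end

dir = Path.join(System.tmp_dir!(), "info_sketch_#{System.unique_integer([:positive])}")

info =
  InfoSketch.from_map(%{
    "config_name" => "csv",
    "features" => [%{"name" => "id", "dtype" => "int64"}],
    "splits" => [%{"name" => "train", "num_examples" => 10}]
  })

case InfoSketch.write_to_directory(info, dir) do
  {:ok, path} -> IO.puts("wrote #{path}")
  {:error, reason} -> IO.puts("write failed: #{inspect(reason)}")
end

case InfoSketch.from_directory(dir) do
  {:ok, loaded} -> IO.inspect(loaded.config_name)
  {:error, reason} -> IO.puts("read failed: #{inspect(reason)}")
end
```

Using `with` lets both functions pass through the first failing step's `{:error, reason}` unchanged, so the caller sees a single error shape whether the directory, the file, or the JSON was the problem.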