CrucibleIR.DatasetRef (CrucibleIR v0.2.1)

Reference to a dataset to be used in an experiment.

A DatasetRef identifies a dataset by its provider (such as :crucible_datasets), selects a split of that dataset (such as :train or :test), and optionally carries extra configuration.

Fields

  • :provider - The dataset provider, an atom (default: :crucible_datasets)
  • :name - The dataset name, an atom (required)
  • :split - The dataset split to use, an atom (default: :train)
  • :options - Additional dataset-specific options, a map or nil
  • :version - Dataset version, a string or nil
  • :format - Data format (:parquet, :csv, :jsonl, :arrow, or another atom), or nil
  • :schema - Expected schema, a map or nil
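To make the defaults and the required key concrete, here is a minimal sketch of how a struct with these fields could be declared. DatasetRefSketch is a hypothetical stand-in, not the CrucibleIR source: the use of @enforce_keys for :name and the nil defaults for the optional fields are assumptions inferred from the field list, and the version string "1.0" is an illustrative value.

```elixir
defmodule DatasetRefSketch do
  # Stand-in for illustration only; the real CrucibleIR.DatasetRef
  # may be declared differently.

  # Assumption: "required" is enforced at construction time.
  @enforce_keys [:name]

  # Fields without an explicit default start out as nil.
  defstruct [
    :name,
    :options,
    :version,
    :format,
    :schema,
    provider: :crucible_datasets,
    split: :train
  ]
end

# struct!/2 builds the struct at runtime and raises if :name is missing.
ref = struct!(DatasetRefSketch, name: :mmlu, format: :parquet, version: "1.0")

IO.inspect(ref.provider) # :crucible_datasets
IO.inspect(ref.split)    # :train
IO.inspect(ref.schema)   # nil
```

With this declaration, calling struct!(DatasetRefSketch, split: :test) raises an ArgumentError because :name was not given.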

Examples

iex> ref = %CrucibleIR.DatasetRef{name: :mmlu}
iex> ref.provider
:crucible_datasets

iex> ref = %CrucibleIR.DatasetRef{name: :mmlu, split: :test}
iex> ref.split
:test

iex> ref = %CrucibleIR.DatasetRef{name: :custom, provider: :huggingface, options: %{limit: 100}}
iex> ref.options
%{limit: 100}

Types

format()

@type format() :: :parquet | :csv | :jsonl | :arrow | atom()

provider()

@type provider() :: :crucible_datasets | :huggingface | atom()

split()

@type split() :: :train | :test | :validation | atom()

t()

@type t() :: %CrucibleIR.DatasetRef{
  format: format() | nil,
  name: atom(),
  options: map() | nil,
  provider: provider(),
  schema: map() | nil,
  split: split(),
  version: String.t() | nil
}
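The t() type above can also be read as a runtime check. The sketch below is a hypothetical helper (DatasetRefCheck is not part of CrucibleIR) that tests whether a map or struct has the documented shape. It treats String.t() as any binary, which is an approximation, and it additionally rejects nil as a name, which is stricter than atom().

```elixir
defmodule DatasetRefCheck do
  # Hypothetical helper for illustration; not provided by CrucibleIR.
  # Accepts any map (structs are maps) with the documented keys.
  def valid?(%{name: name, provider: provider, split: split} = ref)
      when is_atom(name) and not is_nil(name) and is_atom(provider) and is_atom(split) do
    optional_map?(Map.get(ref, :options)) and
      optional_map?(Map.get(ref, :schema)) and
      # format() is an atom or nil; nil is itself an atom in Elixir.
      is_atom(Map.get(ref, :format)) and
      optional_string?(Map.get(ref, :version))
  end

  # Anything else (missing keys, wrong types, non-maps) is rejected.
  def valid?(_other), do: false

  defp optional_map?(value), do: is_nil(value) or is_map(value)

  # Approximation: String.t() is checked as "any binary".
  defp optional_string?(value), do: is_nil(value) or is_binary(value)
end

IO.inspect(DatasetRefCheck.valid?(%{name: :mmlu, provider: :crucible_datasets, split: :train}))
IO.inspect(DatasetRefCheck.valid?(%{name: :mmlu, provider: :crucible_datasets, split: "train"}))
```

The first call prints true and the second prints false, because a string split does not satisfy split().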