ElixirDatasets.Filter (ElixirDatasets v0.1.0)

Functions for filtering dataset files by configuration and split.

Summary

Functions

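by_config_and_split(repo_files, name, split)

Filters repository files by configuration name and split.

by_config_name(repo_files, config_name)

Filters files by configuration name.

by_split(repo_files, split)

Filters files by split name.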

Functions

by_config_and_split(repo_files, name, split)

@spec by_config_and_split(map(), String.t() | nil, String.t() | nil) :: {:ok, map()}

Filters repository files by configuration name and split.

Parameters

  • repo_files - map of repository files, keyed by filename (%{filename => etag})
  • name - optional configuration name to filter by
  • split - optional split name to filter by (e.g., "train", "test")

Returns

{:ok, filtered_files} where filtered_files is a map of matching files.

Examples

iex> files = %{"train.csv" => nil, "test.csv" => nil}
iex> ElixirDatasets.Filter.by_config_and_split(files, nil, "train")
{:ok, %{"train.csv" => nil}}
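
The following is a minimal sketch of how the two filters might combine when both name and split are given. It is not the library's implementation: the module FilterPipelineSketch, the helper keep_if/3, and the config names "sst2" and "cola" are hypothetical. It assumes the config filter matches on the full path and the split filter matches on the basename without extension, as described for by_config_name/2 and by_split/2.

```elixir
defmodule FilterPipelineSketch do
  # Hypothetical module: mirrors the documented behavior, not the library source.

  # A nil filter value leaves the files unchanged.
  defp keep_if(files, nil, _matcher), do: files

  # Otherwise keep only the entries whose path satisfies the matcher.
  defp keep_if(files, value, matcher) do
    files
    |> Enum.filter(fn {path, _etag} -> matcher.(path, value) end)
    |> Map.new()
  end

  def by_config_and_split(repo_files, name, split) do
    filtered =
      repo_files
      # Config name: substring match on the whole path (assumption).
      |> keep_if(name, &String.contains?/2)
      # Split: substring match on the basename without its extension (assumption).
      |> keep_if(split, fn path, s ->
        path |> Path.basename() |> Path.rootname() |> String.contains?(s)
      end)

    {:ok, filtered}
  end
end

files = %{"sst2/train.csv" => nil, "sst2/test.csv" => nil, "cola/train.csv" => nil}
{:ok, result} = FilterPipelineSketch.by_config_and_split(files, "sst2", "train")
IO.inspect(result)
# prints %{"sst2/train.csv" => nil}
```

Passing nil for both name and split returns {:ok, files} with the map unchanged in this sketch.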

by_config_name(repo_files, config_name)

@spec by_config_name(map() | list(), String.t() | nil) :: map() | list()

Filters files by configuration name.

If config_name is nil, returns all files unchanged. Otherwise, returns only files whose path contains the config name.

Parameters

  • repo_files - map or list of files
  • config_name - optional configuration name to filter by

Returns

Filtered files in the same format as input (map or list).
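
As an illustration of the "same format as input" contract, here is a small sketch that reimplements the described behavior in a hypothetical module, ConfigFilterSketch. It is not the library source. It assumes that list input is a list of path strings and that the config directories "sst2" and "cola" exist only for the example.

```elixir
defmodule ConfigFilterSketch do
  # Hypothetical stand-in for the documented behavior; not the library source.

  # nil means "no filter": return the input untouched.
  def by_config_name(files, nil), do: files

  # Map input (%{path => etag}): keep matching entries and return a map.
  def by_config_name(files, config_name) when is_map(files) do
    files
    |> Enum.filter(fn {path, _etag} -> String.contains?(path, config_name) end)
    |> Map.new()
  end

  # List input: assumed here to be a list of path strings.
  def by_config_name(files, config_name) when is_list(files) do
    Enum.filter(files, &String.contains?(&1, config_name))
  end
end

as_map = %{"sst2/train.csv" => nil, "cola/train.csv" => nil}
as_list = ["sst2/train.csv", "cola/train.csv"]

map_result = ConfigFilterSketch.by_config_name(as_map, "sst2")
list_result = ConfigFilterSketch.by_config_name(as_list, "sst2")
unfiltered = ConfigFilterSketch.by_config_name(as_list, nil)

IO.inspect(map_result)   # %{"sst2/train.csv" => nil}
IO.inspect(list_result)  # ["sst2/train.csv"]
IO.inspect(unfiltered)   # ["sst2/train.csv", "cola/train.csv"]
```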

by_split(repo_files, split)

@spec by_split(map() | list(), String.t() | nil) :: map() | list()

Filters files by split name.

If split is nil, returns all files unchanged. Otherwise, returns only files whose basename (without extension) contains the split name.

Parameters

  • repo_files - map or list of files
  • split - optional split name to filter by (e.g., "train", "test", "validation")

Returns

Filtered files in the same format as input (map or list).

Examples

iex> files = %{"train.csv" => nil, "test.csv" => nil, "validation.csv" => nil}
iex> ElixirDatasets.Filter.by_split(files, "train")
%{"train.csv" => nil}
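
Because the match is made against the basename without its extension, directory names should not count toward a match, while any basename that merely contains the split name does. The sketch below illustrates that reading with the standard Path functions. The split_key helper and the sample paths are hypothetical, and the library's actual matching may differ in detail.

```elixir
# Assumption: the comparison key is the basename with its extension removed.
split_key = fn path -> path |> Path.basename() |> Path.rootname() end

paths = ["data/train-00000.parquet", "train/metadata.json", "data/pretrain.csv"]

# Keep paths whose key contains the split name as a substring.
matches = Enum.filter(paths, fn path -> String.contains?(split_key.(path), "train") end)

IO.inspect(matches)
# prints ["data/train-00000.parquet", "data/pretrain.csv"]
```

Under this reading, "train/metadata.json" is excluded because its key is "metadata", and "data/pretrain.csv" is included because "pretrain" contains "train".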