ElixirDatasets.Streaming (ElixirDatasets v0.1.0)

Functions for streaming datasets progressively without loading everything into memory.

Summary

Functions

build(repository, filtered_files, opts)

Builds a streaming dataset that yields rows progressively.

build_urls(arg, filtered_files, load_opts)

Builds URLs for streaming from repository files.

Functions

build(repository, filtered_files, opts)

@spec build(tuple(), map(), keyword()) :: Enumerable.t()

Builds a streaming dataset that yields rows progressively.

Parameters

  • repository - normalized repository tuple
  • filtered_files - map of files to stream from
  • opts - options including:
    • :batch_size - number of rows to fetch per batch (default: 1000)
    • :auth_token - authentication token for Hugging Face

Returns

A Stream that yields rows as maps.
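
The batching behaviour described above can be illustrated with a minimal sketch built on Stream.resource/3 from the Elixir standard library. This is not the implementation of build/3: the module BatchedStreamSketch, the helper fetch_batch/3, and the in-memory row list are hypothetical stand-ins for a remote source. Only the :batch_size option and its default of 1000 come from the documentation above.

```elixir
defmodule BatchedStreamSketch do
  # Hypothetical stand-in for a remote fetch: returns up to `limit` rows
  # starting at `offset` from an in-memory list.
  def fetch_batch(rows, offset, limit), do: Enum.slice(rows, offset, limit)

  # Builds a lazy stream that pulls `batch_size` rows at a time and
  # emits them one by one as maps.
  def stream(rows, opts \\ []) do
    batch_size = Keyword.get(opts, :batch_size, 1000)

    Stream.resource(
      # Start state: the offset of the next row to fetch.
      fn -> 0 end,
      # Fetch the next batch; halt once the source is exhausted.
      fn offset ->
        case fetch_batch(rows, offset, batch_size) do
          [] -> {:halt, offset}
          batch -> {batch, offset + length(batch)}
        end
      end,
      # Nothing to clean up for an in-memory source.
      fn _offset -> :ok end
    )
  end
end

rows = for i <- 1..5, do: %{"id" => i}

# Only the batches needed to produce three rows are fetched.
first_three =
  rows
  |> BatchedStreamSketch.stream(batch_size: 2)
  |> Enum.take(3)
```

Because the stream is lazy, a consumer such as Enum.take/2 stops requesting batches as soon as it has enough rows, so memory use is bounded by the batch size rather than the dataset size.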

build_urls(arg, filtered_files, load_opts)

@spec build_urls(tuple(), map(), keyword()) :: list()

Builds URLs for streaming from repository files.

For Hugging Face repositories, creates HTTP URLs. For local repositories, uses file paths.
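
The distinction between remote and local repositories can be sketched with pattern matching on the repository tuple. The tuple shapes {:hf, repo_id, revision} and {:local, directory}, the module UrlSketch, and the shape of the files map are all assumptions made for illustration. The documentation above does not specify them, and the URL layout shown is assumed to follow the Hugging Face Hub's public "resolve" download path rather than being confirmed from the library.

```elixir
defmodule UrlSketch do
  # Hypothetical repository shapes, not taken from ElixirDatasets:
  #   {:hf, repo_id, revision} for a Hugging Face dataset repository
  #   {:local, directory}      for files on disk

  # Remote case: one HTTP URL per file (URL layout is an assumption).
  def build_urls({:hf, repo_id, revision}, files) do
    for {path, _info} <- files do
      "https://huggingface.co/datasets/#{repo_id}/resolve/#{revision}/#{path}"
    end
  end

  # Local case: one file-system path per file.
  def build_urls({:local, dir}, files) do
    for {path, _info} <- files, do: Path.join(dir, path)
  end
end

# Hypothetical map of relative file paths to per-file metadata.
files = %{"data/train.csv" => %{}}

hf_urls = UrlSketch.build_urls({:hf, "user/dataset", "main"}, files)
local_paths = UrlSketch.build_urls({:local, "/tmp/dataset"}, files)
```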