ElixirDatasets.Utils.Uploader (ElixirDatasets v0.1.0)

Utility functions for uploading datasets to huggingface.co.

Summary

Functions

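delete_file_from_dataset(repository, filename, options \\ [])

Deletes a file from a specified Hugging Face dataset repository.

upload_dataset(df, repository, options)

Uploads a dataset to a specified Hugging Face repository.

upload_file_via_lfs(file_path, repository, options \\ [])

Uploads a large file to Hugging Face using Git LFS.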

Functions

delete_file_from_dataset(repository, filename, options \\ [])

@spec delete_file_from_dataset(String.t(), String.t(), keyword()) ::
  {:ok, String.t()} | {:error, String.t() | Exception.t()}

Deletes a file from a specified Hugging Face dataset repository.

Parameters

  • repository: The Hugging Face repository path (e.g., "username/dataset-name")
  • filename: The path of the file to delete in the repository
  • options: A keyword list with optional parameters:
    • :commit_message: Custom commit message (default: "Delete file from ElixirDatasets")
    • :description: Optional description for the commit

Examples

iex> delete_file_from_dataset("username/dataset", "old_file.csv", [])
{:ok, response_body}

iex> delete_file_from_dataset("username/dataset", "data.csv",
...>   commit_message: "Removing outdated data",
...>   description: "Data no longer needed"
...> )
{:ok, response_body}
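
Per the spec above, the error term can be either a string or an exception struct, so callers may want to normalize it before logging or displaying it. A minimal sketch of one way to do that, assuming a hypothetical helper module ResultFormatter that is not part of ElixirDatasets:

```elixir
defmodule ResultFormatter do
  # Hypothetical helper for illustration only; not part of ElixirDatasets.
  # Converts the result tuple of delete_file_from_dataset/3 into a message.

  @spec describe({:ok, String.t()} | {:error, String.t() | Exception.t()}) :: String.t()
  def describe({:ok, _response_body}), do: "file deleted"

  # String errors are passed through unchanged.
  def describe({:error, reason}) when is_binary(reason) do
    "delete failed: " <> reason
  end

  # Exception structs are converted with Exception.message/1.
  def describe({:error, exception}) when is_exception(exception) do
    "delete failed: " <> Exception.message(exception)
  end
end
```

The result of a call such as delete_file_from_dataset("username/dataset", "old_file.csv") could then be piped into ResultFormatter.describe/1.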

upload_dataset(df, repository, options)

@spec upload_dataset(Explorer.DataFrame.t(), String.t(), keyword()) ::
  {:ok, String.t()} | {:error, Exception.t()}

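Uploads a dataset, given as an Explorer.DataFrame, to a specified Hugging Face repository. Returns {:ok, string} on success or {:error, exception} on failure.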

upload_file_via_lfs(file_path, repository, options \\ [])

@spec upload_file_via_lfs(String.t(), String.t(), keyword()) ::
  {:ok, String.t()} | {:error, String.t()}

Uploads a large file to Hugging Face using Git LFS.

This function handles the complete LFS upload workflow:

  1. Calculates the file's SHA-256 hash and size
  2. Initiates an LFS batch request
  3. Uploads the file to S3
  4. Verifies the upload
  5. Creates a commit with the LFS reference
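
The first two steps can be sketched as follows. This is an illustration under assumptions, not the library's actual implementation: the module LfsSketch and its function names are hypothetical, and the request body follows the general shape of the Git LFS batch API, which may differ from what ElixirDatasets sends.

```elixir
defmodule LfsSketch do
  # Hypothetical module for illustration only; not part of ElixirDatasets.

  @chunk_size 65_536

  # Step 1: hash the file in chunks so a large file is never held in
  # memory all at once, and read its size from the file system.
  def hash_and_size(path) do
    oid =
      File.open!(path, [:read, :binary], fn device ->
        device
        |> hash_chunks(:crypto.hash_init(:sha256))
        |> :crypto.hash_final()
        |> Base.encode16(case: :lower)
      end)

    {oid, File.stat!(path).size}
  end

  defp hash_chunks(device, state) do
    case IO.binread(device, @chunk_size) do
      :eof -> state
      {:error, reason} -> raise "read failed: #{inspect(reason)}"
      chunk -> hash_chunks(device, :crypto.hash_update(state, chunk))
    end
  end

  # Step 2: build the body of an LFS batch request (assumed shape).
  # It would still need to be JSON-encoded and sent over HTTP.
  def batch_request(oid, size) do
    %{
      "operation" => "upload",
      "transfers" => ["basic"],
      "hash_algo" => "sha256",
      "objects" => [%{"oid" => oid, "size" => size}]
    }
  end
end
```

Git LFS identifies each object by the pair of hash and size, which is why both values are computed before any network request is made.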

Parameters

  • file_path: The local path to the file to upload
  • repository: The Hugging Face repository path (e.g., "username/dataset-name")
  • options: A keyword list with optional parameters:
    • :commit_message: Custom commit message (default: "Upload file via LFS from ElixirDatasets")
    • :description: Optional description for the commit
    • :repo_filename: The path in the repository (default: basename of file_path)
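
The :repo_filename default described above can be pictured with a small sketch. The module RepoPath is hypothetical and only mirrors the documented behavior; how the library resolves the option internally is an assumption.

```elixir
defmodule RepoPath do
  # Hypothetical helper for illustration only; not part of ElixirDatasets.
  # Uses :repo_filename when given, otherwise the basename of file_path.
  def resolve(file_path, options \\ []) do
    Keyword.get(options, :repo_filename, Path.basename(file_path))
  end
end

RepoPath.resolve("/path/to/large_file.csv")
# "large_file.csv"

RepoPath.resolve("/path/to/data.parquet", repo_filename: "datasets/v1/data.parquet")
# "datasets/v1/data.parquet"
```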

Examples

iex> upload_file_via_lfs("/path/to/large_file.csv", "username/dataset", [])
{:ok, response_body}

iex> upload_file_via_lfs("/path/to/data.parquet", "username/dataset",
...>   commit_message: "Upload large dataset",
...>   repo_filename: "datasets/v1/data.parquet"
...> )
{:ok, response_body}