ElixirDatasets.Utils.Uploader (ElixirDatasets v0.1.0)
Utility functions for uploading datasets to huggingface.co.
Summary
Functions
delete_file_from_dataset/3: Deletes a file from a specified Hugging Face dataset repository.
upload_dataset/3: Uploads a dataset to a specified Hugging Face repository.
upload_file_via_lfs/3: Uploads a large file to Hugging Face using Git LFS.
Functions
@spec delete_file_from_dataset(String.t(), String.t(), keyword()) :: {:ok, String.t()} | {:error, String.t() | Exception.t()}
Deletes a file from a specified Hugging Face dataset repository.
Parameters
- repository: The Hugging Face repository path (e.g., "username/dataset-name")
- filename: The path of the file to delete in the repository
- options: A keyword list with optional parameters:
  - :commit_message: Custom commit message (default: "Delete file from ElixirDatasets")
  - :description: Optional description for the commit
Examples
iex> delete_file_from_dataset("username/dataset", "old_file.csv", [])
{:ok, response_body}
iex> delete_file_from_dataset("username/dataset", "data.csv",
...> commit_message: "Removing outdated data",
...> description: "Data no longer needed"
...> )
{:ok, response_body}
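According to the spec, the error branch can carry either a string or an exception struct, so callers may want to normalize both shapes before logging. The following is a minimal sketch of one way to do that; `UploaderResultSketch` and `describe/1` are hypothetical names for illustration and are not part of ElixirDatasets.

```elixir
defmodule UploaderResultSketch do
  # Hypothetical helper (not part of ElixirDatasets): turns the result
  # tuple described by the @spec into a single printable string.

  @spec describe({:ok, String.t()} | {:error, String.t() | Exception.t()}) :: String.t()
  def describe({:ok, body}) when is_binary(body), do: "ok: " <> body

  # The error reason may already be a plain string...
  def describe({:error, reason}) when is_binary(reason), do: "error: " <> reason

  # ...or an exception struct, whose message is extracted with Exception.message/1.
  def describe({:error, reason}) when is_exception(reason),
    do: "error: " <> Exception.message(reason)
end
```

A caller could then pipe the result of `delete_file_from_dataset/3` into `UploaderResultSketch.describe/1` without caring which error shape was returned.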
@spec upload_dataset(Explorer.DataFrame.t(), String.t(), keyword()) :: {:ok, String.t()} | {:error, Exception.t()}
Uploads a dataset to a specified Hugging Face repository.
@spec upload_file_via_lfs(String.t(), String.t(), keyword()) :: {:ok, String.t()} | {:error, String.t()}
Uploads a large file to Hugging Face using Git LFS.
This function handles the complete LFS upload workflow:
- Calculates SHA256 hash and file size
- Initiates LFS batch request
- Uploads file to S3
- Verifies upload
- Creates commit with LFS reference
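The first two steps above can be sketched with the Erlang/Elixir standard library alone. This is an illustration under stated assumptions, not the library's implementation: `LfsSketch`, `oid_and_size/1`, and `batch_request/2` are hypothetical names, and the request body follows the public Git LFS batch API format, which may differ from the exact payload ElixirDatasets sends.

```elixir
defmodule LfsSketch do
  # Hypothetical module for illustration; not part of ElixirDatasets.

  # Read the file in 2 MiB chunks so large files are never fully in memory.
  @chunk_size 2 * 1024 * 1024

  # Returns {sha256_hex, size_in_bytes} for the file at `path`.
  def oid_and_size(path) do
    digest =
      File.open!(path, [:read, :binary], fn io ->
        hash_chunks(io, :crypto.hash_init(:sha256))
      end)

    %File.Stat{size: size} = File.stat!(path)
    {Base.encode16(digest, case: :lower), size}
  end

  # Feeds each chunk into the incremental hash until end of file.
  defp hash_chunks(io, state) do
    case IO.binread(io, @chunk_size) do
      :eof -> :crypto.hash_final(state)
      {:error, reason} -> raise "read failed: #{inspect(reason)}"
      data -> hash_chunks(io, :crypto.hash_update(state, data))
    end
  end

  # Body of an upload batch request as described by the Git LFS batch API.
  # Assumption: the "basic" transfer adapter; the library may use another one.
  def batch_request(oid, size) do
    %{
      "operation" => "upload",
      "transfers" => ["basic"],
      "hash_algo" => "sha256",
      "objects" => [%{"oid" => oid, "size" => size}]
    }
  end
end
```

The resulting map would still need to be JSON-encoded and sent to the repository's LFS batch endpoint; those steps are omitted here because they depend on the HTTP client and authentication the library uses.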
Parameters
- file_path: The local path to the file to upload
- repository: The Hugging Face repository path (e.g., "username/dataset-name")
- options: A keyword list with optional parameters:
  - :commit_message: Custom commit message (default: "Upload file via LFS from ElixirDatasets")
  - :description: Optional description for the commit
  - :repo_filename: The path in the repository (default: basename of file_path)
Examples
iex> upload_file_via_lfs("/path/to/large_file.csv", "username/dataset", [])
{:ok, response_body}
iex> upload_file_via_lfs("/path/to/data.parquet", "username/dataset",
...> commit_message: "Upload large dataset",
...> repo_filename: "datasets/v1/data.parquet"
...> )
{:ok, response_body}
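The documented defaults for the options can be pictured as follows. This is a sketch that assumes the defaults are resolved with `Keyword.get/3` and `Path.basename/1`; `LfsOptionsSketch` and `resolve/2` are hypothetical names, and the library's internals may differ.

```elixir
defmodule LfsOptionsSketch do
  # Hypothetical module for illustration; not part of ElixirDatasets.

  # Default commit message as stated in the Parameters list.
  @default_message "Upload file via LFS from ElixirDatasets"

  # Returns the effective options for a given local file path.
  def resolve(file_path, options) do
    %{
      commit_message: Keyword.get(options, :commit_message, @default_message),
      # No documented default for the description, so it stays nil when absent.
      description: Keyword.get(options, :description),
      # Falls back to the basename of the local path, as documented.
      repo_filename: Keyword.get(options, :repo_filename, Path.basename(file_path))
    }
  end
end
```

Under this reading, the first example above would store the file as "large_file.csv" at the repository root, while the second would store it at "datasets/v1/data.parquet".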