Anvil.Export (Anvil v0.1.1)
Export labeled data in various formats with deterministic lineage tracking.
This module provides two interfaces:
- New ADR-005 interface: to_format/3 with streaming, deterministic ordering, and manifests
- Legacy interface: export/2 for backward compatibility
ADR-005 Interface
The new interface requires explicit schema version specification and produces:
- Deterministically ordered exports
- Export manifests with SHA256 hashes
- Streaming for large datasets
Examples
# New interface (recommended)
{:ok, result} = Anvil.Export.to_format(:csv, queue_id, %{
  schema_version_id: schema_version_id,
  output_path: "/tmp/export.csv"
})

# Legacy interface (for backward compatibility)
Anvil.Export.export(queue, format: :csv, path: "/tmp/export.csv")
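The manifest hashes and streaming behaviour described above can be illustrated with a minimal sketch. It is not Anvil's implementation: the module name ExportHashSketch and the function sha256_file/1 are hypothetical, and it only assumes that a manifest hash is a SHA256 digest over the bytes of the exported file. The file is read line by line, so memory use stays flat regardless of export size.

```elixir
defmodule ExportHashSketch do
  # Hypothetical helper, not part of Anvil.
  # Streams the file line by line and feeds each line into an incremental
  # SHA256 state, so the whole export is never held in memory at once.
  def sha256_file(path) do
    path
    |> File.stream!()
    |> Enum.reduce(:crypto.hash_init(:sha256), fn line, state ->
      :crypto.hash_update(state, line)
    end)
    |> :crypto.hash_final()
    |> Base.encode16(case: :lower)
  end
end
```

Calling ExportHashSketch.sha256_file("/tmp/export.csv") returns a lowercase hex digest that can be compared against a recorded value.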
Summary
Functions
- export/2: Legacy export function for backward compatibility.
- to_format/3: Exports labels to the specified format following ADR-005 specification.
- verify_reproducibility/1: Verifies export reproducibility by re-exporting and comparing hashes.
Types
@type export_result() :: %{ manifest: Anvil.Export.Manifest.t(), output_path: String.t() }
@type format() :: :csv | :jsonl
Functions
Legacy export function for backward compatibility.
This function is deprecated in favor of to_format/3, which provides
better reproducibility guarantees through deterministic ordering and
export manifests.
Options
- :format - Export format (:csv or :jsonl)
- :path - Output file path
- :filter - Filter function to select labels
- :include_metadata - Include labeling metadata (default: true)
Examples
iex> Anvil.Export.export(queue, format: :csv, path: "/tmp/labels.csv")
:ok
@spec to_format(format(), binary(), map()) :: {:ok, export_result()} | {:error, term()}
Exports labels to the specified format following ADR-005 specification.
This is the recommended interface for exports, providing:
- Deterministic ordering for reproducibility
- Export manifests with cryptographic hashes
- Streaming for memory safety
- Version pinning
Parameters
- format - Export format (:csv or :jsonl)
- queue_id - UUID of the queue to export
- opts - Export options (map)
Options
- :schema_version_id - (required) UUID of the schema version
- :output_path - (required) File path for export
- :sample_version - (optional) Forge version tag
- :limit - (optional) Maximum number of rows
- :offset - (optional) Number of rows to skip
- :filter - (optional) Filter criteria
Returns
- {:ok, %{manifest: manifest, output_path: path}} on success
- {:error, reason} on failure
Examples
iex> Anvil.Export.to_format(:csv, queue_id, %{
...>   schema_version_id: schema_v2_id,
...>   output_path: "/tmp/labels.csv"
...> })
{:ok, %{manifest: %Manifest{...}, output_path: "/tmp/labels.csv"}}

iex> Anvil.Export.to_format(:jsonl, queue_id, %{
...>   schema_version_id: schema_v2_id,
...>   output_path: "/tmp/labels.jsonl",
...>   limit: 1000,
...>   offset: 0
...> })
{:ok, %{manifest: %Manifest{...}, output_path: "/tmp/labels.jsonl"}}
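To see why deterministic ordering matters when :limit and :offset are used, consider the following sketch. It is an illustration only: the module DeterministicPageSketch is hypothetical, and sorting by an id field is an assumption, since the sort key Anvil actually uses is not stated here. The point is that rows are put into a stable order before any are skipped or taken, so the same options always select the same rows.

```elixir
defmodule DeterministicPageSketch do
  # Hypothetical helper, not part of Anvil.
  # Sorts rows by a stable key (assumed here to be :id) before applying
  # :offset and :limit, so repeated calls return the same page.
  def page(rows, opts \\ %{}) do
    offset = Map.get(opts, :offset, 0)
    limit = Map.get(opts, :limit)

    paged =
      rows
      |> Enum.sort_by(& &1.id)
      |> Enum.drop(offset)

    # A nil limit means "no limit": return every remaining row.
    if limit, do: Enum.take(paged, limit), else: paged
  end
end
```

With this approach, the input order of rows has no effect on which rows land in a given page.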
@spec verify_reproducibility(Anvil.Export.Manifest.t()) :: {:ok, :reproducible} | {:error, term()}
Verifies export reproducibility by re-exporting and comparing hashes.
Examples
iex> manifest = Anvil.Export.Manifest.load("/tmp/export.csv.manifest.json")
iex> Anvil.Export.verify_reproducibility(manifest)
{:ok, :reproducible}
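The hash comparison behind a reproducibility check can be sketched as follows. This is a simplified stand-in, not the library's code: the module ReproducibilitySketch, the function compare/2, and the {:hash_mismatch, expected, actual} error shape are all hypothetical. It assumes the recorded hash is a hex-encoded SHA256 digest and takes the re-exported content as a binary.

```elixir
defmodule ReproducibilitySketch do
  # Hypothetical helper, not part of Anvil.
  # Recomputes the SHA256 of re-exported content and compares it with the
  # hex digest recorded at the time of the original export.
  def compare(expected_sha256, reexported_bytes) do
    actual =
      :crypto.hash(:sha256, reexported_bytes)
      |> Base.encode16(case: :lower)

    if actual == String.downcase(expected_sha256) do
      {:ok, :reproducible}
    else
      # The error tuple shape is an assumption made for this sketch.
      {:error, {:hash_mismatch, expected_sha256, actual}}
    end
  end
end
```

A mismatch indicates that either the underlying data or the export ordering changed between the two runs.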