Anvil.Export (Anvil v0.1.1)

Export labeled data in various formats with deterministic lineage tracking.

This module provides two interfaces:

  1. New ADR-005 interface: to_format/3 with streaming, deterministic ordering, and manifests
  2. Legacy interface: export/2 for backward compatibility

ADR-005 Interface

The new interface requires an explicit schema version and produces:

  • Deterministically ordered exports
  • Export manifests with SHA256 hashes
  • Streaming for large datasets
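
To make the manifest hash concrete, here is a minimal sketch of how a SHA256 digest can be computed over an export file without loading it into memory. It assumes the manifest hash covers the raw bytes of the output file, which this page does not confirm; the module name ManifestHashSketch and the chunk size are hypothetical and not part of Anvil.

```elixir
defmodule ManifestHashSketch do
  @moduledoc "Illustrative only; not part of Anvil."

  # Reads the file in fixed-size binary chunks and feeds each chunk into an
  # incremental SHA-256 state, so memory use does not grow with file size.
  def sha256_file(path, chunk_bytes \\ 65_536) do
    File.open!(path, [:read, :binary], fn device ->
      device
      |> hash_chunks(:crypto.hash_init(:sha256), chunk_bytes)
      |> :crypto.hash_final()
      |> Base.encode16(case: :lower)
    end)
  end

  # Recurses until the device reports :eof, threading the hash state through.
  defp hash_chunks(device, state, chunk_bytes) do
    case IO.binread(device, chunk_bytes) do
      :eof -> state
      {:error, reason} -> raise "read failed: #{inspect(reason)}"
      data -> hash_chunks(device, :crypto.hash_update(state, data), chunk_bytes)
    end
  end
end
```

Reading fixed-size chunks rather than lines keeps the digest tied to the exact bytes on disk, including any carriage returns.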

Examples

# New interface (recommended)
{:ok, result} = Anvil.Export.to_format(:csv, queue_id, %{
  schema_version_id: schema_version_id,
  output_path: "/tmp/export.csv"
})

# Legacy interface (for backward compatibility)
Anvil.Export.export(queue, format: :csv, path: "/tmp/export.csv")

Summary

Functions

export(queue, opts)

Legacy export function for backward compatibility.

to_format(format, queue_id, opts)

Exports labels to the specified format following the ADR-005 specification.

verify_reproducibility(manifest)

Verifies export reproducibility by re-exporting and comparing hashes.
Types

export_result()

@type export_result() :: %{
  manifest: Anvil.Export.Manifest.t(),
  output_path: String.t()
}

format()

@type format() :: :csv | :jsonl

Functions

export(queue, opts)

@spec export(
  pid() | atom(),
  keyword()
) :: :ok | {:error, term()}

Legacy export function for backward compatibility.

This function is deprecated in favor of to_format/3, which provides better reproducibility guarantees through deterministic ordering and export manifests.

Options

  • :format - Export format (:csv or :jsonl)
  • :path - Output file path
  • :filter - Filter function to select labels
  • :include_metadata - Include labeling metadata (default: true)

Examples

iex> Anvil.Export.export(queue, format: :csv, path: "/tmp/labels.csv")
:ok
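
When migrating a legacy call, the :format and :path options map naturally onto the format argument and the :output_path option of to_format/3. The sketch below shows one way to do that translation. LegacyOptsSketch is a hypothetical helper, not part of Anvil. It deliberately drops :filter and :include_metadata, because this page does not say whether to_format/3 accepts them in the same shape. Note also that to_format/3 takes a queue UUID rather than a pid or registered name, so the queue argument has to be resolved separately.

```elixir
defmodule LegacyOptsSketch do
  @moduledoc "Illustrative only; not part of Anvil."

  # Converts legacy keyword options into the {format, opts_map} pair that
  # to_format/3 expects. The schema version has no legacy counterpart, so
  # the caller must supply it explicitly.
  def to_adr005_args(legacy_opts, schema_version_id) do
    format = Keyword.fetch!(legacy_opts, :format)
    path = Keyword.fetch!(legacy_opts, :path)

    {format, %{schema_version_id: schema_version_id, output_path: path}}
  end
end

# Possible usage, with queue_id and schema_version_id obtained elsewhere:
#
#   {format, opts} = LegacyOptsSketch.to_adr005_args(legacy_opts, schema_version_id)
#   Anvil.Export.to_format(format, queue_id, opts)
```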

to_format(format, queue_id, opts)

@spec to_format(format(), binary(), map()) ::
  {:ok, export_result()} | {:error, term()}

Exports labels to the specified format following the ADR-005 specification.

This is the recommended interface for exports, providing:

  • Deterministic ordering for reproducibility
  • Export manifests with cryptographic hashes
  • Streaming to keep memory use bounded on large exports
  • Version pinning
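
The link between deterministic ordering and reproducible hashes can be illustrated with a small sketch: if rows are sorted by a stable key before serialization, the output bytes, and therefore the digest, do not depend on the order in which the rows were fetched. The row shape (id and label fields), the sort key, and the CSV-like serialization are assumptions for illustration; this page does not state which key Anvil orders by.

```elixir
defmodule DeterministicOrderSketch do
  @moduledoc "Illustrative only; not part of Anvil."

  # Sorts rows by a stable key, serializes them, and hashes the result.
  # Two calls with the same rows in different order yield the same digest.
  def export_hash(rows) do
    serialized =
      rows
      |> Enum.sort_by(fn row -> row.id end)
      |> Enum.map(fn row -> "#{row.id},#{row.label}\n" end)
      |> IO.iodata_to_binary()

    Base.encode16(:crypto.hash(:sha256, serialized), case: :lower)
  end
end
```

Without the sort step, the same set of rows could serialize to different bytes from one run to the next, and a hash comparison would report a mismatch even though no data changed.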

Parameters

  • format - Export format (:csv or :jsonl)
  • queue_id - UUID of the queue to export
  • opts - Export options (map)

Options

  • :schema_version_id - (required) UUID of the schema version
  • :output_path - (required) File path for export
  • :sample_version - (optional) Forge version tag
  • :limit - (optional) Maximum number of rows
  • :offset - (optional) Number of rows to skip
  • :filter - (optional) Filter criteria
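
The :limit and :offset options suggest that a large queue can be exported in pages. The following sketch builds one options map per page. It assumes the total row count is known in advance and that each page goes to its own file; the ".partN" suffix and the ExportPagesSketch module are hypothetical and not part of Anvil.

```elixir
defmodule ExportPagesSketch do
  @moduledoc "Illustrative only; not part of Anvil."

  # Returns a list of option maps, one per page, each carrying its own
  # :limit, :offset, and a per-page :output_path derived from the base path.
  def page_opts(base_opts, total_rows, page_size)
      when is_integer(page_size) and page_size > 0 do
    0
    |> Stream.iterate(fn offset -> offset + page_size end)
    |> Enum.take_while(fn offset -> offset < total_rows end)
    |> Enum.with_index(1)
    |> Enum.map(fn {offset, page} ->
      base_opts
      |> Map.put(:limit, page_size)
      |> Map.put(:offset, offset)
      |> Map.put(:output_path, "#{base_opts.output_path}.part#{page}")
    end)
  end
end

# Possible usage, with queue_id obtained elsewhere:
#
#   for opts <- ExportPagesSketch.page_opts(base_opts, 2500, 1000) do
#     Anvil.Export.to_format(:jsonl, queue_id, opts)
#   end
```

Paging this way only yields non-overlapping pages if the underlying row order is stable between calls, which is what the deterministic ordering guarantee is meant to provide.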

Returns

  • {:ok, %{manifest: manifest, output_path: path}} on success
  • {:error, reason} on failure
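
Both return shapes can be handled with pattern matching in function heads. The sketch below is a minimal example of that; ExportResultSketch is a hypothetical helper, not part of Anvil, and the message wording is arbitrary.

```elixir
defmodule ExportResultSketch do
  @moduledoc "Illustrative only; not part of Anvil."

  # Success: the result map carries the manifest and the path written to.
  def summarize({:ok, %{manifest: _manifest, output_path: path}}) do
    "export written to #{path}"
  end

  # Failure: the reason can be any term, so it is rendered with inspect/1.
  def summarize({:error, reason}) do
    "export failed: #{inspect(reason)}"
  end
end

# Possible usage:
#
#   Anvil.Export.to_format(:csv, queue_id, opts)
#   |> ExportResultSketch.summarize()
#   |> IO.puts()
```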

Examples

iex> Anvil.Export.to_format(:csv, queue_id, %{
...>   schema_version_id: schema_v2_id,
...>   output_path: "/tmp/labels.csv"
...> })
{:ok, %{manifest: %Manifest{...}, output_path: "/tmp/labels.csv"}}

iex> Anvil.Export.to_format(:jsonl, queue_id, %{
...>   schema_version_id: schema_v2_id,
...>   output_path: "/tmp/labels.jsonl",
...>   limit: 1000,
...>   offset: 0
...> })
{:ok, %{manifest: %Manifest{...}, output_path: "/tmp/labels.jsonl"}}

verify_reproducibility(manifest)

@spec verify_reproducibility(Anvil.Export.Manifest.t()) ::
  {:ok, :reproducible} | {:error, term()}

Verifies export reproducibility by re-exporting and comparing hashes.

Examples

iex> manifest = Anvil.Export.Manifest.load("/tmp/export.csv.manifest.json")
iex> Anvil.Export.verify_reproducibility(manifest)
{:ok, :reproducible}
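
Conceptually, the check amounts to re-running the export and comparing the fresh digest with the one recorded in the manifest. The sketch below illustrates that idea only and does not reflect Anvil's internals. The recorded_hash argument stands in for the manifest's stored hash, reexport_fun is a hypothetical zero-arity function returning the re-exported bytes, and the {:hash_mismatch, ...} error shape is an assumption.

```elixir
defmodule ReproducibilitySketch do
  @moduledoc "Illustrative only; not part of Anvil."

  # Re-runs the export via `reexport_fun`, hashes the bytes it returns, and
  # compares the digest with the previously recorded one.
  def verify(recorded_hash, reexport_fun) when is_function(reexport_fun, 0) do
    fresh_hash = Base.encode16(:crypto.hash(:sha256, reexport_fun.()), case: :lower)

    if fresh_hash == recorded_hash do
      {:ok, :reproducible}
    else
      # Hypothetical error shape; the real function only promises {:error, term()}.
      {:error, {:hash_mismatch, %{expected: recorded_hash, actual: fresh_hash}}}
    end
  end
end
```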