FormatParser.Data (format_parser v2.14.0)


A Data struct and functions for parsing data file formats.

The Data struct contains the fields format, nature, and intrinsics.

Supported Formats

Format     Extension       Description
:pqt       .parquet        Apache Parquet columnar format
:sqlite3   .db, .sqlite    SQLite 3 database
:duckdb    .duckdb         DuckDB database
:arrow     .arrow          Apache Arrow IPC file format
:feather   .feather        Feather V1 format
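The table can also be expressed as data, for example to show users which extensions to expect or to cross-check a filename against the detected format. The module below is a minimal sketch; `FormatTableSketch`, `extensions/1`, and `format_for_path/1` are hypothetical names, not part of format_parser. Keep in mind that `parse/1` detects formats from file content (magic bytes), so an extension is at best a hint.

```elixir
defmodule FormatTableSketch do
  # Hypothetical helper, NOT part of format_parser: the supported-formats
  # table encoded as a map from format atom to conventional extensions.
  @extensions %{
    pqt: [".parquet"],
    sqlite3: [".db", ".sqlite"],
    duckdb: [".duckdb"],
    arrow: [".arrow"],
    feather: [".feather"]
  }

  # Returns the conventional extensions for a format atom, or [] if unknown.
  def extensions(format), do: Map.get(@extensions, format, [])

  # Reverse lookup: guesses a format atom from a path's extension
  # (case-insensitive). Returns nil when no extension matches.
  def format_for_path(path) do
    ext = path |> Path.extname() |> String.downcase()

    Enum.find_value(@extensions, fn {format, exts} ->
      if ext in exts, do: format
    end)
  end
end
```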

Examples

iex> {:ok, file} = File.read("data.parquet")
iex> result = FormatParser.Data.parse(file)
%FormatParser.Data{format: :pqt, nature: :data, intrinsics: %{}}

Summary

Types

t()

A struct representing a parsed data file.

Functions

parse(file)

Parses binary data to detect data file formats.

Types

t()

@type t() :: %FormatParser.Data{
  format: atom() | nil,
  intrinsics: map(),
  nature: :data
}

A struct representing a parsed data file.

Fields

  • :format - The detected data format: one of :pqt, :sqlite3, :duckdb, :arrow, or :feather (the type also permits nil)
  • :nature - Always :data for data files
  • :intrinsics - A map containing format-specific metadata
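To illustrate how these fields are typically consumed, the sketch below defines a hypothetical `DataSketch` struct with the same three fields and pattern-matches on `:format`. The defaults (`format: nil`, `nature: :data`, `intrinsics: %{}`) are assumptions inferred from the type above, not copied from the library source; with the real library you would match on `%FormatParser.Data{}` instead.

```elixir
defmodule DataSketch do
  # Hypothetical mirror of FormatParser.Data's documented fields.
  # The defaults are assumptions, not taken from format_parser's source.
  defstruct format: nil, nature: :data, intrinsics: %{}

  @type t :: %__MODULE__{format: atom() | nil, nature: :data, intrinsics: map()}

  # Branch on the detected format by pattern matching on the struct.
  @spec kind(t()) :: String.t()
  def kind(%__MODULE__{format: format}) when format in [:sqlite3, :duckdb],
    do: "database"

  def kind(%__MODULE__{format: format}) when format in [:pqt, :arrow, :feather],
    do: "columnar file"

  # nil or any format atom not listed above.
  def kind(%__MODULE__{}), do: "unknown"
end
```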

Functions

parse(file)

@spec parse({:error, binary()} | binary() | any()) :: any()

Parses binary data to detect data file formats.

This function attempts to identify data formats by examining magic bytes at the beginning of the binary content.
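Magic-byte detection maps naturally onto Elixir binary pattern matching. The sketch below is a hypothetical `MagicSketch.detect/1`, not format_parser's implementation. Only the Parquet signature (`"PAR1"`) is confirmed by the examples on this page; the other signatures come from the respective file format specifications, and the DuckDB check (the ASCII string `"DUCK"` after an 8-byte checksum) is an assumption.

```elixir
defmodule MagicSketch do
  # Hypothetical detector, NOT format_parser's code. Signatures other than
  # Parquet's are assumptions based on the public file format specs.
  @spec detect(binary()) :: atom() | nil
  # Apache Parquet files start (and end) with "PAR1".
  def detect(<<"PAR1", _rest::binary>>), do: :pqt
  # SQLite 3 databases start with the 16-byte string "SQLite format 3\0".
  def detect(<<"SQLite format 3", 0, _rest::binary>>), do: :sqlite3
  # Arrow IPC files start with "ARROW1".
  def detect(<<"ARROW1", _rest::binary>>), do: :arrow
  # Feather V1 files start with "FEA1".
  def detect(<<"FEA1", _rest::binary>>), do: :feather
  # DuckDB (assumed layout): 8-byte checksum, then "DUCK" at offset 8.
  def detect(<<_checksum::binary-size(8), "DUCK", _rest::binary>>), do: :duckdb
  # Anything else, including input too short for any signature.
  def detect(bin) when is_binary(bin), do: nil
end
```

The first matching clause wins, so the offset-8 DuckDB check only runs after every prefix check has failed, and input shorter than a signature simply falls through to `nil`.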

Arguments

  • input - Can be one of:
    • {:error, binary} - An error tuple wrapping raw file content, as passed along the parser chain by a parser that did not recognize it
    • binary - Raw binary file content
    • any - Any other value is returned as-is (pass-through for parser chain)
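The three accepted argument shapes can be sketched as separate function clauses. `ParseContractSketch` is a hypothetical stand-in that mirrors the documented contract (an error tuple is retried on its payload, a raw binary is inspected, anything else passes through); it detects only Parquet to stay short and is not the library's code.

```elixir
defmodule ParseContractSketch do
  # Hypothetical stand-in for FormatParser.Data; only Parquet is detected.
  defstruct format: nil, nature: :data, intrinsics: %{}

  # Error tuple from an earlier parser in the chain: retry on the payload.
  def parse({:error, file}) when is_binary(file), do: parse(file)

  # Raw binary: a struct on a match, otherwise an error tuple for the next parser.
  def parse(file) when is_binary(file) do
    case file do
      <<"PAR1", _::binary>> -> %__MODULE__{format: :pqt}
      _ -> {:error, file}
    end
  end

  # Anything else (e.g. a struct produced by an earlier parser) passes through.
  def parse(other), do: other
end
```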

Returns

  • %FormatParser.Data{} - When a supported data format is detected
  • {:error, binary} - When the format is not recognized (for parser chain)
  • The input unchanged - When input is neither a binary nor an error tuple
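Because each parser returns a struct, `{:error, binary}`, or its input unchanged, parsers can be composed with the pipe operator: the first parser that recognizes the content produces a struct, and later parsers hand it through untouched. The sketch below assumes this composition style; `ChainSketch`, its nested `Image` and `Data` stubs, and `detect/1` are hypothetical and only recognize PNG and Parquet signatures.

```elixir
defmodule ChainSketch do
  # Hypothetical parser stubs following the documented contract:
  # struct on match, {:error, bin} on miss, anything else passed through.
  defmodule Image do
    defstruct format: nil

    def parse({:error, bin}) when is_binary(bin), do: parse(bin)
    # PNG files start with the 8-byte signature 137 "PNG" \r \n 26 \n.
    def parse(<<137, "PNG", 13, 10, 26, 10, _::binary>>), do: %__MODULE__{format: :png}
    def parse(bin) when is_binary(bin), do: {:error, bin}
    def parse(other), do: other
  end

  defmodule Data do
    defstruct format: nil, nature: :data, intrinsics: %{}

    def parse({:error, bin}) when is_binary(bin), do: parse(bin)
    def parse(<<"PAR1", _::binary>>), do: %__MODULE__{format: :pqt}
    def parse(bin) when is_binary(bin), do: {:error, bin}
    def parse(other), do: other
  end

  # Runs the content through each parser in turn; the first match wins
  # and later parsers return that struct unchanged.
  def detect(bin) when is_binary(bin) do
    {:error, bin}
    |> Image.parse()
    |> Data.parse()
  end
end
```

This mirrors the `parse(%FormatParser.Image{})` example, which returns its input unchanged.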

Examples

iex> {:ok, file} = File.read("priv/test.parquet")
iex> FormatParser.Data.parse(file)
%FormatParser.Data{format: :pqt, nature: :data, intrinsics: %{}}

iex> FormatParser.Data.parse({:error, <<80, 65, 82, 49, 0>>})
%FormatParser.Data{format: :pqt, nature: :data, intrinsics: %{}}

iex> FormatParser.Data.parse(%FormatParser.Image{})
%FormatParser.Image{}