RustyCSV.Spreadsheet (RustyCSV v0.3.9)

A spreadsheet-compatible parser using UTF-16 Little Endian encoding.

This module uses tab (\t) as the field separator and double-quote (") as the escape character. It handles UTF-16 LE encoding with BOM, which is the format commonly used by spreadsheet applications like Microsoft Excel.

This is a drop-in replacement for NimbleCSV.Spreadsheet.

Quick Start

alias RustyCSV.Spreadsheet

# Parse UTF-16 LE data (with BOM)
Spreadsheet.parse_string(utf16_data, skip_headers: false)
#=> [["name", "age"], ["john", "27"]]

# Dump to UTF-16 LE format (includes BOM)
Spreadsheet.dump_to_iodata([["name", "age"], ["john", "27"]])
|> IO.iodata_to_binary()
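
The Quick Start assumes a `utf16_data` binary already exists. As a minimal sketch, one way to build such a binary for experimentation uses only Erlang's standard `:unicode` module; the sample text is illustrative and not part of RustyCSV.

```elixir
# Build a small tab-separated payload as ordinary UTF-8 text first.
utf8_text = "name\tage\njohn\t27\n"

# Byte order mark for UTF-16 LE: the two bytes 0xFF 0xFE.
bom = :unicode.encoding_to_bom({:utf16, :little})

# Re-encode the text as UTF-16 LE and prepend the BOM.
utf16_data = bom <> :unicode.characters_to_binary(utf8_text, :utf8, {:utf16, :little})
```

The resulting binary has the shape the module expects: a BOM followed by two bytes per ASCII character.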

Configuration

This module was defined with:

RustyCSV.define(RustyCSV.Spreadsheet,
  separator: "\t",
  escape: "\"",
  encoding: {:utf16, :little},
  trim_bom: true,
  dump_bom: true
)
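
The same call can presumably define other variants. The following sketch assumes `RustyCSV.define/2` accepts a different separator alongside the options listed above, and that RustyCSV is available as a dependency; `MyApp.SemicolonSheet` is a hypothetical module name chosen for illustration.

```elixir
# Hypothetical module: same options as RustyCSV.Spreadsheet,
# but with ";" as the field separator instead of a tab.
RustyCSV.define(MyApp.SemicolonSheet,
  separator: ";",
  escape: "\"",
  encoding: {:utf16, :little},
  trim_bom: true,
  dump_bom: true
)

# options/0 returns the options the module was defined with,
# so the separator can be read back for inspection.
MyApp.SemicolonSheet.options() |> Keyword.get(:separator)
```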

Summary

Functions

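dump_to_iodata(enumerable, opts \\ [])
Converts an enumerable of rows to iodata in CSV format.

dump_to_stream(enumerable)
Lazily converts an enumerable of rows to a stream of iodata.

options()
Returns the options used to define this CSV module.

parse_enumerable(enumerable, opts \\ [])
Eagerly parses an enumerable of CSV data into a list of rows.

parse_stream(stream, opts \\ [])
Lazily parses a stream of CSV data into a stream of rows.

parse_string(string, opts \\ [])
Parses a CSV string into a list of rows.

to_line_stream(stream)
Converts a stream of arbitrary binary chunks into a line-oriented stream.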

Functions

dump_to_iodata(enumerable, opts \\ [])

@spec dump_to_iodata(Enumerable.t(), RustyCSV.dump_options()) :: iodata()

Converts an enumerable of rows to iodata in CSV format.

Returns a single flat binary (valid iodata/0). Unlike NimbleCSV, which returns an iodata list, RustyCSV writes all CSV bytes into one contiguous binary in the NIF for better performance and lower memory use.

Options

  • :strategy - Encoding strategy. The default is a single-threaded, SIMD-accelerated encoder. Pass strategy: :parallel for multi-threaded encoding via rayon, which is faster for quoting-heavy data.

Examples

# Default encoder (best for most data)
RustyCSV.Spreadsheet.dump_to_iodata(rows)

# Parallel encoder (best for quoting-heavy data)
RustyCSV.Spreadsheet.dump_to_iodata(rows, strategy: :parallel)

dump_to_stream(enumerable)

@spec dump_to_stream(Enumerable.t()) :: Enumerable.t()

Lazily converts an enumerable of rows to a stream of iodata.
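
A lazy dump pairs naturally with a file sink. The sketch below assumes RustyCSV is available as a dependency and that each emitted element is iodata, as described above; the output path and sample rows are illustrative.

```elixir
alias RustyCSV.Spreadsheet

rows = [["name", "age"], ["john", "27"]]

# Illustrative output location inside the system temp directory.
path = Path.join(System.tmp_dir!(), "people.txt")

# Stream.into/2 writes each emitted element to the file as it is produced,
# so the full output never has to be held in memory at once.
rows
|> Spreadsheet.dump_to_stream()
|> Stream.into(File.stream!(path))
|> Stream.run()
```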

options()

@spec options() :: keyword()

Returns the options used to define this CSV module.

parse_enumerable(enumerable, opts \\ [])

@spec parse_enumerable(Enumerable.t(), RustyCSV.parse_options()) :: RustyCSV.rows()

Eagerly parses an enumerable of CSV data into a list of rows.

parse_stream(stream, opts \\ [])

@spec parse_stream(Enumerable.t(), RustyCSV.parse_options()) :: Enumerable.t()

Lazily parses a stream of CSV data into a stream of rows.

Options

  • :skip_headers - When true, skips the first row. Defaults to true.
  • :headers - Controls header handling. Defaults to false.
    • false - Return rows as lists (default behavior)
    • true - Use first row as string keys, return maps. :skip_headers is ignored.
    • [atom | string, ...] - Use explicit keys, return maps. The first row is skipped by default; pass skip_headers: false if the input has no header row.

  • :chunk_size - Bytes per IO read. Defaults to 65536.
  • :batch_size - Rows per batch. Defaults to 1000.
  • :max_buffer_size - Maximum streaming buffer size in bytes. Defaults to 268_435_456 (256 MB). Raises if exceeded during parsing.

parse_string(string, opts \\ [])

@spec parse_string(binary(), RustyCSV.parse_options()) :: RustyCSV.rows() | [map()]

Parses a CSV string into a list of rows.

Options

  • :skip_headers - When true, skips the first row. Defaults to true.
  • :strategy - The parsing strategy. Defaults to :simd.
  • :headers - Controls header handling. Defaults to false.
    • false - Return rows as lists (default behavior)
    • true - Use first row as string keys, return maps. :skip_headers is ignored.
    • [atom | string, ...] - Use explicit keys, return maps. The first row is skipped by default; pass skip_headers: false if the input has no header row.

Input is expected in {:utf16, :little} encoding and will be converted to UTF-8 for parsing.
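
The header options above can be compared side by side. This is a minimal sketch, assuming RustyCSV is available as a dependency; the sample data is illustrative and is built with Erlang's standard `:unicode` module.

```elixir
alias RustyCSV.Spreadsheet

# Build UTF-16 LE input with a BOM using Erlang's :unicode module.
utf16_data =
  :unicode.encoding_to_bom({:utf16, :little}) <>
    :unicode.characters_to_binary("name\tage\njohn\t27\n", :utf8, {:utf16, :little})

# headers: false (the default) returns rows as lists; keep the header row
# by turning off :skip_headers.
rows_as_lists = Spreadsheet.parse_string(utf16_data, skip_headers: false)

# headers: true uses the first row as string keys and returns maps.
rows_as_maps = Spreadsheet.parse_string(utf16_data, headers: true)

# An explicit key list also returns maps; the first row is skipped by default.
rows_with_atoms = Spreadsheet.parse_string(utf16_data, headers: [:name, :age])
```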

to_line_stream(stream)

@spec to_line_stream(Enumerable.t()) :: Enumerable.t()

Converts a stream of arbitrary binary chunks into a line-oriented stream.
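
A typical use is to feed the result into parse_stream/2. The sketch below assumes RustyCSV is available as a dependency and that parse_stream/2 accepts the line-oriented stream produced here, as in NimbleCSV; the 8-byte chunking only simulates reads that do not line up with row boundaries.

```elixir
alias RustyCSV.Spreadsheet

# Build UTF-16 LE input with a BOM using Erlang's :unicode module.
utf16_data =
  :unicode.encoding_to_bom({:utf16, :little}) <>
    :unicode.characters_to_binary("name\tage\njohn\t27\n", :utf8, {:utf16, :little})

# Cut the binary into 8-byte chunks; the last chunk holds the remainder.
chunks =
  Stream.unfold(utf16_data, fn
    <<>> -> nil
    <<chunk::binary-size(8), rest::binary>> -> {chunk, rest}
    rest -> {rest, <<>>}
  end)

# Regroup the chunks into lines, then parse them lazily into rows.
rows =
  chunks
  |> Spreadsheet.to_line_stream()
  |> Spreadsheet.parse_stream(skip_headers: false)
  |> Enum.to_list()
```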