Low-level NIF bindings for CSV parsing and encoding.
This module provides direct access to the Rust NIF functions. For normal use,
prefer the higher-level RustyCSV.RFC4180 or custom parsers defined with
RustyCSV.define/2.
Separator Format
The _with_config functions accept the separator in three forms:
- Integer — a single-byte separator: 44 (comma), 9 (tab)
- Binary — a single separator, possibly multi-byte: <<44>> (comma), "::" (double colon)
- List of binaries — multiple separators: [<<44>>, <<59>>] (comma or semicolon), [",", "::"] (comma or double colon)
Escape Format
The escape (quote character) accepts:
- Integer — a single-byte escape: 34 (double quote)
- Binary — possibly multi-byte: <<34>> (double quote), "$$" (dollar-dollar)
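The two formats combine freely. A minimal sketch, reusing values from the function examples further below:

# Integer byte forms: tab separator, double-quote escape
RustyCSV.Native.parse_string_with_config("a\tb\n1\t2\n", 9, 34)

# Binary forms: the same separator and escape spelled as binaries
RustyCSV.Native.parse_string_with_config("a\tb\n1\t2\n", <<9>>, <<34>>)

# Multiple separators: comma or semicolon
RustyCSV.Native.parse_string_with_config("a,b;c\n", [<<44>>, <<59>>], 34)

Each call returns a list of rows such as [["a", "b"], ["1", "2"]].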
Strategies
The module exposes six parsing strategies:
- parse_string/1 - Basic byte-by-byte parsing (Strategy A)
- parse_string_fast/1 - SIMD-accelerated via memchr (Strategy B)
- parse_string_indexed/1 - Two-phase index-then-extract (Strategy C)
- parse_string_parallel/1 - Multi-threaded via rayon (Strategy E)
- parse_string_zero_copy/1 - Sub-binary references (Strategy F)
- streaming_* functions - Stateful streaming parser (Strategy D)
Strategy Selection
| Strategy | Use Case | Memory Model |
|---|---|---|
| parse_string_fast/1 | Default, most files | Copy (frees input) |
| parse_string_parallel/1 | Large files (500 MB+) | Copy (frees input) |
| parse_string_zero_copy/1 | Maximum speed | Sub-binary (keeps input) |
| parse_string_indexed/1 | Row range extraction | Copy (frees input) |
| streaming_* | Unbounded files | Copy (per chunk) |
| parse_string/1 | Debugging | Copy (frees input) |
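As a starting point, the table can be folded into a simple dispatch helper. A minimal sketch; the module name and size threshold are illustrative, not part of this library:

defmodule MyApp.CSV do
  @parallel_threshold 500 * 1024 * 1024

  # Pick a strategy from input size; tune the threshold for your workload.
  def parse_auto(csv) when is_binary(csv) do
    if byte_size(csv) >= @parallel_threshold do
      RustyCSV.Native.parse_string_parallel(csv)
    else
      RustyCSV.Native.parse_string_fast(csv)
    end
  end
end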
Scheduling
All parsing NIFs run on BEAM dirty CPU schedulers to avoid blocking
normal schedulers. This includes all parse_string* functions,
streaming_feed/2, streaming_next_rows/2, and streaming_finalize/1.
Parallel parsing runs on a dedicated named rustycsv-* rayon thread pool
(capped at 8 threads) rather than the global rayon pool.
Concurrency
Streaming parser references (parser_ref/0) are safe to share across
BEAM processes — the underlying Rust state is protected by a mutex. If a
NIF panics while holding the lock, subsequent calls return :mutex_poisoned
instead of crashing the VM.
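A minimal sketch of sharing one parser reference between processes; the producer/consumer split here is illustrative:

# One task feeds chunks while the calling process drains rows.
# The mutex inside the NIF resource serializes concurrent access.
parser = RustyCSV.Native.streaming_new()

feeder =
  Task.async(fn ->
    RustyCSV.Native.streaming_feed(parser, "a,b\n1,2\n")
    RustyCSV.Native.streaming_feed(parser, "3,4\n")
  end)

Task.await(feeder)
RustyCSV.Native.streaming_next_rows(parser, 100)
# expected: [["a", "b"], ["1", "2"], ["3", "4"]]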
Memory Tracking (Optional)
Memory tracking functions are available but require the memory_tracking
Cargo feature to be enabled. Without the feature, they return 0 with
no runtime overhead.
See get_rust_memory/0, get_rust_memory_peak/0, reset_rust_memory_stats/0.
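With the feature compiled in, a typical measurement looks like the following sketch; without it, every call returns zero:

# Reset the counters, do some work, then read back the numbers.
{_current, _previous_peak} = RustyCSV.Native.reset_rust_memory_stats()
_rows = RustyCSV.Native.parse_string_fast("a,b\n1,2\n")
peak_bytes = RustyCSV.Native.get_rust_memory_peak()
current_bytes = RustyCSV.Native.get_rust_memory()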
Summary
Types
Quote/escape sequence. Accepts an integer byte or a binary.
Opaque reference to a streaming parser
A parsed row (list of field binaries)
Multiple parsed rows
Field separator(s). Accepts an integer byte, a binary, or a list of binaries.
Functions
Encode rows to CSV using SIMD-accelerated scanning.
Encode rows to CSV in parallel using rayon, returning iodata (list of binaries).
Get current Rust heap allocation in bytes.
Get peak Rust heap allocation since last reset, in bytes.
Parse CSV using basic byte-by-byte scanning. Runs on a dirty CPU scheduler.
Parse CSV using SIMD-accelerated delimiter scanning. Runs on a dirty CPU scheduler.
Parse CSV using SIMD with configurable separator(s) and escape.
Parse CSV using two-phase index-then-extract approach. Runs on a dirty CPU scheduler.
Parse CSV using two-phase approach with configurable separator(s) and escape.
Parse CSV in parallel using multiple threads.
Parse CSV in parallel with configurable separator(s) and escape.
Parse CSV with configurable separator(s) and escape characters.
Parse CSV using zero-copy sub-binary references where possible. Runs on a dirty CPU scheduler.
Parse CSV using zero-copy with configurable separator(s) and escape.
Parse CSV and return list of maps, dispatching to the specified strategy.
Parse CSV in parallel and return list of maps.
Reset memory tracking statistics.
Feed a chunk of CSV data to the streaming parser. Runs on a dirty CPU scheduler.
Finalize the streaming parser and get any remaining rows. Runs on a dirty CPU scheduler.
Create a new streaming parser instance.
Create a new streaming parser with configurable separator(s) and escape.
Take up to max complete rows from the streaming parser. Runs on a dirty CPU scheduler.
Set the maximum buffer size (in bytes) for the streaming parser.
Default is 256 MB. Raises on overflow during streaming_feed/2.
Get the current status of the streaming parser.
Types
@type escape() :: binary() | non_neg_integer()
Quote/escape sequence. Accepts:
- Integer byte (e.g., 34 for double-quote)
- Binary (e.g., <<34>> or <<36, 36>> for "$$")
@opaque parser_ref()
Opaque reference to a streaming parser
@type row() :: [binary()]
A parsed row (list of field binaries)
@type rows() :: [row()]
Multiple parsed rows
@type separator() :: binary() | non_neg_integer() | [binary()]
Field separator(s). Accepts:
- Integer byte (e.g., 44 for comma) — single separator
- Binary (e.g., <<44>> or <<58, 58>>) — single separator (possibly multi-byte)
- List of binaries (e.g., [<<44>>, <<59>>]) — multiple separators
Functions
@spec encode_string( [[binary()]], separator(), escape(), binary() | atom(), term(), term(), [binary()] ) :: iodata()
Encode rows to CSV using SIMD-accelerated scanning.
Uses portable SIMD to scan 16-32 bytes at a time for characters that need escaping. On platforms without SIMD hardware, portable_simd automatically degrades to scalar operations. Falls back to a general encoder for multi-byte separator/escape sequences.
Accepts a list of rows, where each row is a list of binary fields. Returns iodata (nested lists) — clean fields are passed through as zero-copy references, only dirty fields requiring quoting are allocated.
Parameters
- rows - List of rows (list of lists of binaries)
- separator - Field separator (see "Separator Format" above)
- escape - Escape character (see "Escape Format" above)
- line_separator - Line separator binary or :default for "\n"
Examples
iex> RustyCSV.Native.encode_string([["a", "b"], ["1", "2"]], 44, 34, :default)
"a,b\n1,2\n"
@spec encode_string_parallel( [[binary()]], separator(), escape(), binary() | atom(), term(), term(), [ binary() ] ) :: iodata()
Encode rows to CSV in parallel using rayon, returning iodata (list of binaries).
Uses multiple threads to encode chunks of rows simultaneously. Copies all field data into Rust-owned memory before dispatching to worker threads.
Best for quoting-heavy data — fields that frequently contain commas, quotes, or newlines (e.g., user-generated content, free-text descriptions). The per-field quoting work parallelizes well and outweighs the copy overhead.
For typical/clean data where most fields pass through unquoted, prefer
encode_string/4 which avoids the copy via zero-copy term references.
Only supports single-byte separator/escape. Raises ArgumentError for
multi-byte configurations — use encode_string/4 instead.
Parameters
- rows - List of rows (list of lists of binaries)
- separator - Field separator (see "Separator Format" above)
- escape - Escape character (see "Escape Format" above)
- line_separator - Line separator binary or :default for "\n"
Examples
iex> RustyCSV.Native.encode_string_parallel([["a", "b"], ["1", "2"]], [","], "\"", "\r\n")
...> |> IO.iodata_to_binary()
"a,b\r\n1,2\r\n"
@spec get_rust_memory() :: non_neg_integer()
Get current Rust heap allocation in bytes.
Note: Requires the memory_tracking Cargo feature to be enabled.
Without the feature, this returns 0 with no overhead.
To enable memory tracking, set the feature in native/rustycsv/Cargo.toml:
[features]
default = ["mimalloc", "memory_tracking"]

Examples
bytes = RustyCSV.Native.get_rust_memory()
@spec get_rust_memory_peak() :: non_neg_integer()
Get peak Rust heap allocation since last reset, in bytes.
Note: Requires the memory_tracking Cargo feature. Returns 0 otherwise.
Examples
peak_bytes = RustyCSV.Native.get_rust_memory_peak()
Parse CSV using basic byte-by-byte scanning. Runs on a dirty CPU scheduler.
This is the simplest implementation, processing one byte at a time.
Use parse_string_fast/1 for better performance in most cases.
Examples
iex> RustyCSV.Native.parse_string("a,b\n1,2\n")
[["a", "b"], ["1", "2"]]
Parse CSV using SIMD-accelerated delimiter scanning. Runs on a dirty CPU scheduler.
Uses the memchr crate for fast delimiter detection on supported
architectures. This is the recommended strategy for most use cases.
Examples
iex> RustyCSV.Native.parse_string_fast("a,b\n1,2\n")
[["a", "b"], ["1", "2"]]
Parse CSV using SIMD with configurable separator(s) and escape.
Examples
# TSV parsing with integer separator
iex> RustyCSV.Native.parse_string_fast_with_config("a\tb\n1\t2\n", 9, 34)
[["a", "b"], ["1", "2"]]
# TSV parsing with binary separator
iex> RustyCSV.Native.parse_string_fast_with_config("a\tb\n1\t2\n", <<9>>, 34)
[["a", "b"], ["1", "2"]]
Parse CSV using two-phase index-then-extract approach. Runs on a dirty CPU scheduler.
First builds an index of row/field boundaries, then extracts fields. This approach has better cache utilization and enables advanced use cases like extracting specific row ranges.
Examples
iex> RustyCSV.Native.parse_string_indexed("a,b\n1,2\n")
[["a", "b"], ["1", "2"]]
Parse CSV using two-phase approach with configurable separator(s) and escape.
Examples
# TSV parsing with integer separator
iex> RustyCSV.Native.parse_string_indexed_with_config("a\tb\n1\t2\n", 9, 34)
[["a", "b"], ["1", "2"]]
# TSV parsing with binary separator
iex> RustyCSV.Native.parse_string_indexed_with_config("a\tb\n1\t2\n", <<9>>, 34)
[["a", "b"], ["1", "2"]]
Parse CSV in parallel using multiple threads.
Uses the rayon thread pool to parse rows in parallel. This is most beneficial for very large files (100MB+) where the parallelization overhead is outweighed by the parsing speedup.
This function runs on a dirty CPU scheduler to avoid blocking the normal BEAM schedulers.
Examples
iex> RustyCSV.Native.parse_string_parallel("a,b\n1,2\n")
[["a", "b"], ["1", "2"]]
Parse CSV in parallel with configurable separator(s) and escape.
Examples
# TSV parallel parsing with integer separator
iex> RustyCSV.Native.parse_string_parallel_with_config("a\tb\n1\t2\n", 9, 34)
[["a", "b"], ["1", "2"]]
# TSV parallel parsing with binary separator
iex> RustyCSV.Native.parse_string_parallel_with_config("a\tb\n1\t2\n", <<9>>, 34)
[["a", "b"], ["1", "2"]]
Parse CSV with configurable separator(s) and escape characters.
Parameters
- csv - The CSV binary to parse
- separator - Integer byte, binary, or list of binaries (see "Separator Format" above)
- escape - Integer byte or binary (see "Escape Format" above)
Examples
# TSV parsing with integer separator
iex> RustyCSV.Native.parse_string_with_config("a\tb\n1\t2\n", 9, 34)
[["a", "b"], ["1", "2"]]
# TSV parsing with binary separator
iex> RustyCSV.Native.parse_string_with_config("a\tb\n1\t2\n", <<9>>, 34)
[["a", "b"], ["1", "2"]]
# Multi-separator parsing (comma or semicolon)
iex> RustyCSV.Native.parse_string_with_config("a,b;c\n1;2,3\n", [<<44>>, <<59>>], 34)
[["a", "b", "c"], ["1", "2", "3"]]
# Multi-byte separator
iex> RustyCSV.Native.parse_string_with_config("a::b::c\n", "::", 34)
[["a", "b", "c"]]
# Multi-byte escape
iex> RustyCSV.Native.parse_string_with_config("$$hello$$,world\n", 44, "$$")
[["hello", "world"]]
Parse CSV using zero-copy sub-binary references where possible. Runs on a dirty CPU scheduler.
Uses BEAM sub-binary references for unquoted and simply-quoted fields, only copying when quote unescaping is needed (hybrid Cow approach).
Trade-off: Sub-binaries keep the parent binary alive until all references are garbage collected. Use this when you want maximum speed and control memory lifetime yourself.
Examples
iex> RustyCSV.Native.parse_string_zero_copy("a,b\n1,2\n")
[["a", "b"], ["1", "2"]]
Parse CSV using zero-copy with configurable separator(s) and escape.
Examples
# TSV zero-copy parsing with integer separator
iex> RustyCSV.Native.parse_string_zero_copy_with_config("a\tb\n1\t2\n", 9, 34)
[["a", "b"], ["1", "2"]]
# TSV zero-copy parsing with binary separator
iex> RustyCSV.Native.parse_string_zero_copy_with_config("a\tb\n1\t2\n", <<9>>, 34)
[["a", "b"], ["1", "2"]]
@spec parse_to_maps( binary(), separator(), escape(), term(), atom(), atom() | list(), boolean() ) :: [ map() ]
Parse CSV and return list of maps, dispatching to the specified strategy.
Parameters
- input - The CSV binary to parse
- separator - Separator(s) (see "Separator Format" above)
- escape - Escape sequence (see "Escape Format" above)
- strategy - Atom: :basic, :simd, :indexed, or :zero_copy
- header_mode - Atom true (first row = keys) or a list of key terms
- skip_first - Whether to skip the first row when using explicit keys
@spec parse_to_maps_parallel( binary(), separator(), escape(), term(), atom() | list(), boolean() ) :: [map()]
Parse CSV in parallel and return list of maps.
Uses the rayon thread pool on a dirty CPU scheduler.
Parameters
- input - The CSV binary to parse
- separator - Separator(s) (see "Separator Format" above)
- escape - Escape sequence (see "Escape Format" above)
- header_mode - Atom true (first row = keys) or a list of key terms
- skip_first - Whether to skip the first row when using explicit keys
@spec reset_rust_memory_stats() :: {non_neg_integer(), non_neg_integer()}
Reset memory tracking statistics.
Note: Requires the memory_tracking Cargo feature. Returns {0, 0} otherwise.
Returns {current_bytes, previous_peak_bytes}.
Examples
{current, peak} = RustyCSV.Native.reset_rust_memory_stats()
@spec streaming_feed(parser_ref(), binary()) :: {non_neg_integer(), non_neg_integer()}
Feed a chunk of CSV data to the streaming parser. Runs on a dirty CPU scheduler.
Returns {available_rows, buffer_size} indicating the number of complete
rows ready to be taken and the current buffer size.
Raises
- :buffer_overflow — the chunk would push the internal buffer past the maximum size (default 256 MB). Use streaming_set_max_buffer/2 to adjust the limit.
- :mutex_poisoned — a previous NIF call panicked while holding the parser lock. The parser should be discarded.
Examples
{available, buffer_size} = RustyCSV.Native.streaming_feed(parser, chunk)
@spec streaming_finalize(parser_ref()) :: rows()
Finalize the streaming parser and get any remaining rows. Runs on a dirty CPU scheduler.
This should be called after all data has been fed to get any partial row that was waiting for a terminating newline.
Examples
final_rows = RustyCSV.Native.streaming_finalize(parser)
@spec streaming_new() :: parser_ref()
Create a new streaming parser instance.
The streaming parser maintains internal state and can process CSV data in chunks, making it suitable for large files with bounded memory usage.
The returned reference is safe to share across BEAM processes — the underlying Rust state is protected by a mutex.
Examples
parser = RustyCSV.Native.streaming_new()
RustyCSV.Native.streaming_feed(parser, "a,b\n")
RustyCSV.Native.streaming_feed(parser, "1,2\n")
rows = RustyCSV.Native.streaming_next_rows(parser, 100)
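A fuller sketch that drives the parser from a file stream; the path and chunk size are illustrative:

parser = RustyCSV.Native.streaming_new()

rows =
  "data.csv"
  |> File.stream!([], 64 * 1024)
  |> Enum.flat_map(fn chunk ->
    {_available, _buffer} = RustyCSV.Native.streaming_feed(parser, chunk)
    RustyCSV.Native.streaming_next_rows(parser, 1_000)
  end)

rows = rows ++ RustyCSV.Native.streaming_finalize(parser)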
@spec streaming_new_with_config(separator(), escape(), term()) :: parser_ref()
Create a new streaming parser with configurable separator(s) and escape.
Parameters
- separator - Integer byte, binary, or list of binaries (see "Separator Format" above)
- escape - Integer byte or binary (see "Escape Format" above)
Examples
# TSV streaming parser with integer separator
parser = RustyCSV.Native.streaming_new_with_config(9, 34)
# TSV streaming parser with binary separator
parser = RustyCSV.Native.streaming_new_with_config(<<9>>, 34)
# Multi-separator streaming parser
parser = RustyCSV.Native.streaming_new_with_config([<<44>>, <<59>>], 34)
# Multi-byte separator streaming parser
parser = RustyCSV.Native.streaming_new_with_config("::", 34)
@spec streaming_next_rows(parser_ref(), non_neg_integer()) :: rows()
Take up to max complete rows from the streaming parser. Runs on a dirty CPU scheduler.
Returns the rows as a list of lists of binaries.
Examples
rows = RustyCSV.Native.streaming_next_rows(parser, 100)
@spec streaming_set_max_buffer(parser_ref(), non_neg_integer()) :: :ok
Set the maximum buffer size (in bytes) for the streaming parser.
Default is 256 MB. Raises on overflow during streaming_feed/2.
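For example, to allow up to 1 GB of buffered data before streaming_feed/2 raises (a minimal usage sketch):

:ok = RustyCSV.Native.streaming_set_max_buffer(parser, 1024 * 1024 * 1024)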
@spec streaming_status(parser_ref()) :: {non_neg_integer(), non_neg_integer(), boolean()}
Get the current status of the streaming parser.
Returns {available_rows, buffer_size, has_partial}:
- available_rows - Number of complete rows ready to be taken
- buffer_size - Current size of the internal buffer in bytes
- has_partial - Whether there's an incomplete row in the buffer
Examples
{available, buffer, has_partial} = RustyCSV.Native.streaming_status(parser)