Low-level NIF bindings for CSV parsing and encoding.
This module provides direct access to the Rust NIF functions. For normal use,
prefer the higher-level RustyCSV.RFC4180 or custom parsers defined with
RustyCSV.define/2.
Separator Format
The _with_config functions accept the separator in three forms:
- Integer — a single-byte separator: 44 (comma), 9 (tab)
- Binary — a single separator, possibly multi-byte: <<44>> (comma), "::" (double colon)
- List of binaries — multiple separators: [<<44>>, <<59>>] (comma or semicolon), [",", "::"] (comma or double colon)
Escape Format
The escape (quote character) accepts:
- Integer — a single-byte escape: 34 (double quote)
- Binary — possibly multi-byte: <<34>> (double quote), "$$" (dollar-dollar)
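The two formats combine freely. A minimal sketch, reusing values from the function examples further below:

# Integer byte forms: tab separator, double-quote escape
RustyCSV.Native.parse_string_with_config("a\tb\n1\t2\n", 9, 34)

# Binary forms: the same separator and escape spelled as binaries
RustyCSV.Native.parse_string_with_config("a\tb\n1\t2\n", <<9>>, <<34>>)

# Multiple separators: comma or semicolon
RustyCSV.Native.parse_string_with_config("a,b;c\n", [<<44>>, <<59>>], 34)

Each call returns a list of rows such as [["a", "b"], ["1", "2"]].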
Strategies
The module exposes six parsing strategies:
- parse_string/1 - Basic byte-by-byte parsing (Strategy A)
- parse_string_fast/1 - SIMD-accelerated via memchr (Strategy B)
- parse_string_indexed/1 - Two-phase index-then-extract (Strategy C)
- parse_string_parallel/1 - Multi-threaded via rayon (Strategy E)
- parse_string_zero_copy/1 - Sub-binary references (Strategy F)
- streaming_* functions - Stateful streaming parser (Strategy D)
Strategy Selection
| Strategy | Use Case | Memory Model |
|---|---|---|
| parse_string_fast/1 | Default, most files | Copy (frees input) |
| parse_string_parallel/1 | Large files (500 MB+) | Copy (frees input) |
| parse_string_zero_copy/1 | Maximum speed | Sub-binary (keeps input) |
| parse_string_indexed/1 | Row range extraction | Copy (frees input) |
| streaming_* | Unbounded files | Copy (per chunk) |
| parse_string/1 | Debugging | Copy (frees input) |
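As a starting point, the table can be folded into a simple dispatch helper. A minimal sketch; the module name and size threshold are illustrative, not part of this library:

defmodule MyApp.CSV do
  @parallel_threshold 500 * 1024 * 1024

  # Pick a strategy from input size; tune the threshold for your workload.
  def parse_auto(csv) when is_binary(csv) do
    if byte_size(csv) >= @parallel_threshold do
      RustyCSV.Native.parse_string_parallel(csv)
    else
      RustyCSV.Native.parse_string_fast(csv)
    end
  end
end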
Scheduling
All parsing NIFs run on BEAM dirty CPU schedulers to avoid blocking
normal schedulers. This includes all parse_string* functions,
streaming_feed/2, streaming_next_rows/2, and streaming_finalize/1.
Parallel parsing runs on a dedicated named rustycsv-* rayon thread pool
(capped at 8 threads) rather than the global rayon pool.
Concurrency
Streaming parser references (parser_ref/0) are safe to share across
BEAM processes — the underlying Rust state is protected by a mutex. If a
NIF panics while holding the lock, subsequent calls return :mutex_poisoned
instead of crashing the VM.
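A minimal sketch of sharing one parser reference between processes; the producer/consumer split here is illustrative:

# One task feeds chunks while the calling process drains rows.
# The mutex inside the NIF resource serializes concurrent access.
parser = RustyCSV.Native.streaming_new()

feeder =
  Task.async(fn ->
    RustyCSV.Native.streaming_feed(parser, "a,b\n1,2\n")
    RustyCSV.Native.streaming_feed(parser, "3,4\n")
  end)

Task.await(feeder)
RustyCSV.Native.streaming_next_rows(parser, 100)
# expected: [["a", "b"], ["1", "2"], ["3", "4"]]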
Memory Tracking (Optional)
Memory tracking functions are available but require the memory_tracking
Cargo feature to be enabled. Without the feature, they return 0 with
no runtime overhead.
See get_rust_memory/0, get_rust_memory_peak/0, reset_rust_memory_stats/0.
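With the feature compiled in, a typical measurement looks like the following sketch; without it, every call returns zero:

# Reset the counters, do some work, then read back the numbers.
{_current, _previous_peak} = RustyCSV.Native.reset_rust_memory_stats()
_rows = RustyCSV.Native.parse_string_fast("a,b\n1,2\n")
peak_bytes = RustyCSV.Native.get_rust_memory_peak()
current_bytes = RustyCSV.Native.get_rust_memory()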
Summary
Types
Quote/escape sequence. Accepts an integer byte or a binary.
Opaque reference to a streaming parser
A parsed row (list of field binaries)
Multiple parsed rows
Field separator(s). Accepts an integer byte, a binary, or a list of binaries.
Functions
Encode rows to CSV using SIMD-accelerated scanning.
Encode rows to CSV in parallel using rayon, returning iodata (list of binaries).
Get current Rust heap allocation in bytes.
Get peak Rust heap allocation since last reset, in bytes.
Parse CSV using basic byte-by-byte scanning. Runs on a dirty CPU scheduler.
Parse CSV using SIMD-accelerated delimiter scanning. Runs on a dirty CPU scheduler.
Parse CSV using SIMD with configurable separator(s) and escape.
Parse CSV using two-phase index-then-extract approach. Runs on a dirty CPU scheduler.
Parse CSV using two-phase approach with configurable separator(s) and escape.
Parse CSV in parallel using multiple threads.
Parse CSV in parallel with configurable separator(s) and escape.
Parse CSV with configurable separator(s) and escape characters.
Parse CSV using zero-copy sub-binary references where possible. Runs on a dirty CPU scheduler.
Parse CSV using zero-copy with configurable separator(s) and escape.
Parse CSV and return list of maps, dispatching to the specified strategy.
Parse CSV in parallel and return list of maps.
Reset memory tracking statistics.
Feed a chunk of CSV data to the streaming parser. Runs on a dirty CPU scheduler.
Finalize the streaming parser and get any remaining rows. Runs on a dirty CPU scheduler.
Create a new streaming parser instance.
Create a new streaming parser with configurable separator(s) and escape.
Take up to max complete rows from the streaming parser. Runs on a dirty CPU scheduler.
Set the maximum buffer size (in bytes) for the streaming parser.
Default is 256 MB. Raises on overflow during streaming_feed/2.
Get the current status of the streaming parser.
Types
@type escape() :: binary() | non_neg_integer()
Quote/escape sequence. Accepts:
- Integer byte (e.g., 34 for double-quote)
- Binary (e.g., <<34>> or <<36, 36>> for "$$")
@opaque parser_ref()
Opaque reference to a streaming parser
@type row() :: [binary()]
A parsed row (list of field binaries)
@type rows() :: [row()]
Multiple parsed rows
@type separator() :: binary() | non_neg_integer() | [binary()]
Field separator(s). Accepts:
- Integer byte (e.g., 44 for comma) — single separator
- Binary (e.g., <<44>> or <<58, 58>>) — single separator (possibly multi-byte)
- List of binaries (e.g., [<<44>>, <<59>>]) — multiple separators
Functions
@spec encode_string( [[binary()]], separator(), escape(), binary() | atom(), term(), term(), [binary()] ) :: iodata()
Encode rows to CSV using SIMD-accelerated scanning.
Uses portable SIMD to scan 16-32 bytes at a time for characters that need escaping. On platforms without SIMD hardware, portable_simd automatically degrades to scalar operations. Falls back to a general encoder for multi-byte separator/escape sequences.
Accepts a list of rows, where each row is a list of binary fields. Returns iodata (nested lists) — clean fields are passed through as zero-copy references, only dirty fields requiring quoting are allocated.
Parameters
- rows - List of rows (list of lists of binaries)
- separator - Field separator (see "Separator Format" above)
- escape - Escape character (see "Escape Format" above)
- line_separator - Line separator binary or :default for "\n"
Examples
iex> RustyCSV.Native.encode_string([["a", "b"], ["1", "2"]], 44, 34, :default)
"a,b\n1,2\n"
@spec encode_string_parallel( [[binary()]], separator(), escape(), binary() | atom(), term(), term(), [ binary() ] ) :: iodata()
Encode rows to CSV in parallel using rayon, returning iodata (list of binaries).
Uses multiple threads to encode chunks of rows simultaneously. Copies all field data into Rust-owned memory before dispatching to worker threads.
Best for quoting-heavy data — fields that frequently contain commas, quotes, or newlines (e.g., user-generated content, free-text descriptions). The per-field quoting work parallelizes well and outweighs the copy overhead.
For typical/clean data where most fields pass through unquoted, prefer
encode_string/4 which avoids the copy via zero-copy term references.
Only supports single-byte separator/escape. Raises ArgumentError for
multi-byte configurations — use encode_string/4 instead.
Parameters
- rows - List of rows (list of lists of binaries)
- separator - Field separator (see "Separator Format" above)
- escape - Escape character (see "Escape Format" above)
- line_separator - Line separator binary or :default for "\n"
Examples
iex> RustyCSV.Native.encode_string_parallel([["a", "b"], ["1", "2"]], [","], "\"", "\r\n")
...> |> IO.iodata_to_binary()
"a,b\r\n1,2\r\n"
@spec get_rust_memory() :: non_neg_integer()
Get current Rust heap allocation in bytes.
Note: Requires the memory_tracking Cargo feature to be enabled.
Without the feature, this returns 0 with no overhead.
To enable memory tracking, set the feature in native/rustycsv/Cargo.toml:
[features]
default = ["mimalloc", "memory_tracking"]

Examples
bytes = RustyCSV.Native.get_rust_memory()
@spec get_rust_memory_peak() :: non_neg_integer()
Get peak Rust heap allocation since last reset, in bytes.
Note: Requires the memory_tracking Cargo feature. Returns 0 otherwise.
Examples
peak_bytes = RustyCSV.Native.get_rust_memory_peak()
Parse CSV using basic byte-by-byte scanning. Runs on a dirty CPU scheduler.
This is the simplest implementation, processing one byte at a time.
Use parse_string_fast/1 for better performance in most cases.
Examples
iex> RustyCSV.Native.parse_string("a,b\n1,2\n")
[["a", "b"], ["1", "2"]]
Parse CSV using SIMD-accelerated delimiter scanning. Runs on a dirty CPU scheduler.
Uses the memchr crate for fast delimiter detection on supported
architectures. This is the recommended strategy for most use cases.
Examples
iex> RustyCSV.Native.parse_string_fast("a,b\n1,2\n")
[["a", "b"], ["1", "2"]]
Parse CSV using SIMD with configurable separator(s) and escape.
Examples
# TSV parsing with integer separator
iex> RustyCSV.Native.parse_string_fast_with_config("a\tb\n1\t2\n", 9, 34)
[["a", "b"], ["1", "2"]]
# TSV parsing with binary separator
iex> RustyCSV.Native.parse_string_fast_with_config("a\tb\n1\t2\n", <<9>>, 34)
[["a", "b"], ["1", "2"]]
Parse CSV using two-phase index-then-extract approach. Runs on a dirty CPU scheduler.
First builds an index of row/field boundaries, then extracts fields. This approach has better cache utilization and enables advanced use cases like extracting specific row ranges.
Examples
iex> RustyCSV.Native.parse_string_indexed("a,b\n1,2\n")
[["a", "b"], ["1", "2"]]
Parse CSV using two-phase approach with configurable separator(s) and escape.
Examples
# TSV parsing with integer separator
iex> RustyCSV.Native.parse_string_indexed_with_config("a\tb\n1\t2\n", 9, 34)
[["a", "b"], ["1", "2"]]
# TSV parsing with binary separator
iex> RustyCSV.Native.parse_string_indexed_with_config("a\tb\n1\t2\n", <<9>>, 34)
[["a", "b"], ["1", "2"]]
Parse CSV in parallel using multiple threads.
Uses the rayon thread pool to parse rows in parallel. This is most beneficial for very large files (100MB+) where the parallelization overhead is outweighed by the parsing speedup.
This function runs on a dirty CPU scheduler to avoid blocking the normal BEAM schedulers.
Examples
iex> RustyCSV.Native.parse_string_parallel("a,b\n1,2\n")
[["a", "b"], ["1", "2"]]
Parse CSV in parallel with configurable separator(s) and escape.
Examples
# TSV parallel parsing with integer separator
iex> RustyCSV.Native.parse_string_parallel_with_config("a\tb\n1\t2\n", 9, 34)
[["a", "b"], ["1", "2"]]
# TSV parallel parsing with binary separator
iex> RustyCSV.Native.parse_string_parallel_with_config("a\tb\n1\t2\n", <<9>>, 34)
[["a", "b"], ["1", "2"]]
Parse CSV with configurable separator(s) and escape characters.
Parameters
- csv - The CSV binary to parse
- separator - Integer byte, binary, or list of binaries (see "Separator Format" above)
- escape - Integer byte or binary (see "Escape Format" above)
Examples
# TSV parsing with integer separator
iex> RustyCSV.Native.parse_string_with_config("a\tb\n1\t2\n", 9, 34)
[["a", "b"], ["1", "2"]]
# TSV parsing with binary separator
iex> RustyCSV.Native.parse_string_with_config("a\tb\n1\t2\n", <<9>>, 34)
[["a", "b"], ["1", "2"]]
# Multi-separator parsing (comma or semicolon)
iex> RustyCSV.Native.parse_string_with_config("a,b;c\n1;2,3\n", [<<44>>, <<59>>], 34)
[["a", "b", "c"], ["1", "2", "3"]]
# Multi-byte separator
iex> RustyCSV.Native.parse_string_with_config("a::b::c\n", "::", 34)
[["a", "b", "c"]]
# Multi-byte escape
iex> RustyCSV.Native.parse_string_with_config("$$hello$$,world\n", 44, "$$")
[["hello", "world"]]
Parse CSV using zero-copy sub-binary references where possible. Runs on a dirty CPU scheduler.
Uses BEAM sub-binary references for unquoted and simply-quoted fields, only copying when quote unescaping is needed (hybrid Cow approach).
Trade-off: Sub-binaries keep the parent binary alive until all references are garbage collected. Use this when you want maximum speed and control memory lifetime yourself.
Examples
iex> RustyCSV.Native.parse_string_zero_copy("a,b\n1,2\n")
[["a", "b"], ["1", "2"]]
Parse CSV using zero-copy with configurable separator(s) and escape.
Examples
# TSV zero-copy parsing with integer separator
iex> RustyCSV.Native.parse_string_zero_copy_with_config("a\tb\n1\t2\n", 9, 34)
[["a", "b"], ["1", "2"]]
# TSV zero-copy parsing with binary separator
iex> RustyCSV.Native.parse_string_zero_copy_with_config("a\tb\n1\t2\n", <<9>>, 34)
[["a", "b"], ["1", "2"]]
@spec parse_to_maps( binary(), separator(), escape(), term(), atom(), atom() | list(), boolean() ) :: [ map() ]
Parse CSV and return list of maps, dispatching to the specified strategy.
Parameters
- input - The CSV binary to parse
- separator - Separator(s) (see "Separator Format" above)
- escape - Escape sequence (see "Escape Format" above)
- strategy - Atom: :basic, :simd, :indexed, or :zero_copy
- header_mode - Atom true (first row = keys) or a list of key terms
- skip_first - Whether to skip the first row when using explicit keys
@spec parse_to_maps_parallel( binary(), separator(), escape(), term(), atom() | list(), boolean() ) :: [map()]
Parse CSV in parallel and return list of maps.
Uses the rayon thread pool on a dirty CPU scheduler.
Parameters
- input - The CSV binary to parse
- separator - Separator(s) (see "Separator Format" above)
- escape - Escape sequence (see "Escape Format" above)
- header_mode - Atom true (first row = keys) or a list of key terms
- skip_first - Whether to skip the first row when using explicit keys
@spec reset_rust_memory_stats() :: {non_neg_integer(), non_neg_integer()}
Reset memory tracking statistics.
Note: Requires the memory_tracking Cargo feature. Returns {0, 0} otherwise.
Returns {current_bytes, previous_peak_bytes}.
Examples
{current, peak} = RustyCSV.Native.reset_rust_memory_stats()
@spec streaming_feed(parser_ref(), binary()) :: {non_neg_integer(), non_neg_integer()}
Feed a chunk of CSV data to the streaming parser. Runs on a dirty CPU scheduler.
Returns {available_rows, buffer_size} indicating the number of complete
rows ready to be taken and the current buffer size.
Raises
- :buffer_overflow — the chunk would push the internal buffer past the maximum size (default 256 MB). Use streaming_set_max_buffer/2 to adjust the limit.
- :mutex_poisoned — a previous NIF call panicked while holding the parser lock. The parser should be discarded.
Examples
{available, buffer_size} = RustyCSV.Native.streaming_feed(parser, chunk)
@spec streaming_finalize(parser_ref()) :: rows()
Finalize the streaming parser and get any remaining rows. Runs on a dirty CPU scheduler.
This should be called after all data has been fed to get any partial row that was waiting for a terminating newline.
Examples
final_rows = RustyCSV.Native.streaming_finalize(parser)
@spec streaming_new() :: parser_ref()
Create a new streaming parser instance.
The streaming parser maintains internal state and can process CSV data in chunks, making it suitable for large files with bounded memory usage.
The returned reference is safe to share across BEAM processes — the underlying Rust state is protected by a mutex.
Examples
parser = RustyCSV.Native.streaming_new()
RustyCSV.Native.streaming_feed(parser, "a,b\n")
RustyCSV.Native.streaming_feed(parser, "1,2\n")
rows = RustyCSV.Native.streaming_next_rows(parser, 100)
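A fuller sketch that drives the parser from a file stream; the path and chunk size are illustrative:

parser = RustyCSV.Native.streaming_new()

rows =
  "data.csv"
  |> File.stream!([], 64 * 1024)
  |> Enum.flat_map(fn chunk ->
    {_available, _buffer} = RustyCSV.Native.streaming_feed(parser, chunk)
    RustyCSV.Native.streaming_next_rows(parser, 1_000)
  end)

rows = rows ++ RustyCSV.Native.streaming_finalize(parser)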
@spec streaming_new_with_config(separator(), escape(), term()) :: parser_ref()
Create a new streaming parser with configurable separator(s) and escape.
Parameters
- separator - Integer byte, binary, or list of binaries (see "Separator Format" above)
- escape - Integer byte or binary (see "Escape Format" above)
Examples
# TSV streaming parser with integer separator
parser = RustyCSV.Native.streaming_new_with_config(9, 34)
# TSV streaming parser with binary separator
parser = RustyCSV.Native.streaming_new_with_config(<<9>>, 34)
# Multi-separator streaming parser
parser = RustyCSV.Native.streaming_new_with_config([<<44>>, <<59>>], 34)
# Multi-byte separator streaming parser
parser = RustyCSV.Native.streaming_new_with_config("::", 34)
@spec streaming_next_rows(parser_ref(), non_neg_integer()) :: rows()
Take up to max complete rows from the streaming parser. Runs on a dirty CPU scheduler.
Returns the rows as a list of lists of binaries.
Examples
rows = RustyCSV.Native.streaming_next_rows(parser, 100)
@spec streaming_set_max_buffer(parser_ref(), non_neg_integer()) :: :ok
Set the maximum buffer size (in bytes) for the streaming parser.
Default is 256 MB. Raises on overflow during streaming_feed/2.
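For example, to allow up to 1 GB of buffered data before streaming_feed/2 raises (a minimal usage sketch):

:ok = RustyCSV.Native.streaming_set_max_buffer(parser, 1024 * 1024 * 1024)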
@spec streaming_status(parser_ref()) :: {non_neg_integer(), non_neg_integer(), boolean()}
Get the current status of the streaming parser.
Returns {available_rows, buffer_size, has_partial}:
- available_rows - Number of complete rows ready to be taken
- buffer_size - Current size of the internal buffer in bytes
- has_partial - Whether there's an incomplete row in the buffer
Examples
{available, buffer, has_partial} = RustyCSV.Native.streaming_status(parser)