RustyCSV is an ultra-fast CSV parsing and dumping library powered by purpose-built Rust NIFs.
It provides a drop-in replacement for NimbleCSV with the same API, while offering multiple parsing strategies optimized for different use cases.
Quick Start
Use the pre-defined RustyCSV.RFC4180 parser:
alias RustyCSV.RFC4180, as: CSV
CSV.parse_string("name,age\njohn,27\n")
#=> [["john", "27"]]
CSV.parse_string("name,age\njohn,27\n", skip_headers: false)
#=> [["name", "age"], ["john", "27"]]Defining Custom Parsers
You can define custom CSV parsers with define/2:
RustyCSV.define(MyParser,
separator: ",",
escape: "\"",
line_separator: "\n"
)
MyParser.parse_string("a,b\n1,2\n")
#=> [["1", "2"]]Parsing Strategies
RustyCSV supports multiple parsing strategies via the :strategy option:
- :simd - SIMD-accelerated scanning via memchr (default, fastest for most files)
- :basic - Simple byte-by-byte parsing (good for debugging)
- :indexed - Two-phase index-then-extract (good for re-extracting rows)
- :parallel - Multi-threaded via rayon (best for very large files, 500 MB+, with complex quoting)
- :zero_copy - Sub-binary references (NimbleCSV-like memory profile, maximum speed)
Example:
CSV.parse_string(large_csv, strategy: :parallel)

Scheduling
All parsing NIFs run on BEAM dirty CPU schedulers, so they never block
normal schedulers. Parallel parsing (:parallel strategy) additionally
runs on a dedicated rustycsv-* rayon thread pool to avoid contention
with other Rayon users in the same VM.
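For instance, a sketch of fanning out many independent parses (csv_binaries is an assumed list of CSV binaries):

# Each parse does its NIF work on a dirty CPU scheduler, so even many
# concurrent parses leave the normal schedulers free for other work.
results =
  csv_binaries
  |> Task.async_stream(&CSV.parse_string/1, timeout: :infinity)
  |> Enum.map(fn {:ok, rows} -> rows end)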
Streaming
For large files, use parse_stream/2 which uses a bounded-memory streaming parser:
"huge.csv"
|> File.stream!()
|> CSV.parse_stream()
|> Stream.each(&process_row/1)
|> Stream.run()

Streaming parsers are safe to share across processes: the underlying Rust resource is protected by a mutex. However, concurrent access is serialized, so for maximum throughput use one parser per process.
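For example, a per-process sketch (paths and process_row/1 are assumptions):

# One stream, and thus one underlying Rust parser resource, per task,
# so no two processes serialize on the same mutex.
paths
|> Task.async_stream(
  fn path ->
    path
    |> File.stream!()
    |> CSV.parse_stream()
    |> Enum.each(&process_row/1)
  end,
  timeout: :infinity
)
|> Stream.run()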
Encoding (Dumping)
Convert rows back to CSV format:
CSV.dump_to_iodata([["name", "age"], ["john", "27"]])
#=> "name,age\njohn,27\n"Encoding uses a SIMD-accelerated Rust NIF that writes all CSV bytes into a single flat binary. The NIF handles four modes: plain UTF-8, UTF-8 with formula escaping, non-UTF-8 encoding, and both combined.
Difference from NimbleCSV: NimbleCSV's dump_to_iodata/1 returns an iodata list (a nested list of small binaries) that callers typically flatten back into a single binary via IO.iodata_to_binary/1 before writing to a file, sending as a download, or passing to an API. RustyCSV skips that roundtrip: it returns the final binary directly, ready for use with IO.binwrite/2, Conn.send_resp/3, :gen_tcp.send/2, File.write/2, etc. The output bytes are identical; there is nothing to traverse or flatten.

Code that pattern-matches on the return value expecting a list will need adjustment. This is a deliberate trade-off: building an iodata list across the NIF boundary requires allocating one Erlang term per field, separator, and newline, which is 18-63% slower and uses 3-6x more NIF memory than returning the bytes directly.
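A minimal sketch of the adjustment (the list pattern is illustrative):

# Before, with NimbleCSV: dump_to_iodata/1 returned a nested list.
# [_ | _] = NimbleCSV.RFC4180.dump_to_iodata(rows)

# With RustyCSV, the result is already a flat binary:
csv = CSV.dump_to_iodata(rows)
true = is_binary(csv)
File.write!("out.csv", csv)  # a binary is valid iodata, so IO calls work unchanged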
Encoding Strategies
dump_to_iodata/2 accepts a :strategy option:
default (no option) — Single-threaded SIMD-accelerated encoder. Writes all CSV bytes into a single flat binary. Best for most workloads.
- :parallel - Multi-threaded encoding via rayon. Copies all field data into Rust-owned memory, splits rows into chunks, and encodes each chunk on a separate thread. Returns a short list of large binaries. Best for quoting-heavy data (user-generated content with embedded commas/quotes/newlines).
Example:
# Default (recommended for most cases)
CSV.dump_to_iodata(rows)
# Parallel (opt in for quoting-heavy data)
CSV.dump_to_iodata(rows, strategy: :parallel)

High-Throughput Concurrent Exports
The encoding NIF runs on dirty CPU schedulers with per-thread mimalloc arenas, making it suitable for concurrent export workloads — e.g., thousands of users downloading CSV reports simultaneously:
# Phoenix controller — each request encodes independently
rows = MyApp.Reports.fetch_rows(user_id)
csv = MyCSV.dump_to_iodata(rows)
send_download(conn, {:binary, csv}, filename: "report.csv")

For very large exports, use chunked NIF encoding for bounded memory:
MyApp.Reports.stream_rows(user_id)
|> Stream.chunk_every(5_000)
|> Stream.map(&MyCSV.dump_to_iodata/1)
|> Enum.each(&Conn.chunk(conn, &1))

NimbleCSV Compatibility
RustyCSV is designed as a drop-in replacement for NimbleCSV. The API is identical:
- parse_string/2 - Parse a CSV string into a list of rows
- parse_stream/2 - Lazily parse a stream
- parse_enumerable/2 - Parse any enumerable
- dump_to_iodata/2 - Convert rows to iodata (returns a flat binary, not an iodata list; see the "Encoding (Dumping)" section)
- dump_to_stream/1 - Lazily convert rows to a stream of iodata
- to_line_stream/1 - Convert arbitrary binary chunks into lines
- options/0 - Return the module configuration
RustyCSV extends NimbleCSV with additional options:
- :strategy on parse_string/2 - Select the parsing approach (:simd, :basic, :indexed, :parallel, :zero_copy)
- :strategy on dump_to_iodata/2 - Select the encoding approach (default or :parallel)
- :headers - Return rows as maps instead of lists
Headers-to-Maps
Use the :headers option to get maps instead of lists:
CSV.parse_string("name,age\njohn,27\n", headers: true)
#=> [%{"name" => "john", "age" => "27"}]
CSV.parse_string("name,age\njohn,27\n", headers: [:name, :age])
#=> [%{name: "john", age: "27"}]
CSV.parse_string("name,age\njohn,27\n", headers: ["n", "a"])
#=> [%{"n" => "john", "a" => "27"}]Streaming also supports headers:
"huge.csv"
|> File.stream!()
|> CSV.parse_stream(headers: true)
|> Stream.each(&process_map/1)
|> Stream.run()

How :headers interacts with :skip_headers
With headers: true, the first row is always consumed as keys — :skip_headers
has no effect.
With headers: [keys], the :skip_headers option controls whether the first
row is skipped (default: true). Most CSV files have a header row, so skipping
it avoids mapping the header row itself into a map. If your file has no header
row, pass skip_headers: false:
# File with header row (typical) — first row skipped by default
CSV.parse_string("name,age\njohn,27\n", headers: [:n, :a])
#=> [%{n: "john", a: "27"}]
# File without header row — include all rows
CSV.parse_string("john,27\njane,30\n", headers: [:n, :a], skip_headers: false)
#=> [%{n: "john", a: "27"}, %{n: "jane", a: "30"}]Edge cases
- Fewer columns than keys: missing values are nil
- More columns than keys: extra columns are ignored
- Duplicate headers: the last column wins
- Empty header field: the key is ""
Multi-Separator Support
Like NimbleCSV, RustyCSV supports multiple separator characters. Separators can be single-byte or multi-byte:
RustyCSV.define(MyParser,
separator: [",", ";"],
escape: "\""
)
# Any separator in the list is recognized when parsing
MyParser.parse_string("a,b;c\\n1;2,3\\n", skip_headers: false)
#=> [["a", "b", "c"], ["1", "2", "3"]]
# Only the FIRST separator is used when dumping
MyParser.dump_to_iodata([["a", "b", "c"]]) |> IO.iodata_to_binary()
#=> "a,b,c\\n"Multi-byte separators are supported:
RustyCSV.define(MyParser,
separator: "::",
escape: "\""
)
MyParser.parse_string("a::b::c\\n", skip_headers: false)
#=> [["a", "b", "c"]]You can also mix single-byte and multi-byte separators:
RustyCSV.define(MyParser,
separator: [",", "::"],
escape: "\""
)

Multi-Byte Escape Support
Escape sequences can also be multi-byte:
RustyCSV.define(MyParser,
separator: ",",
escape: "$$"
)
MyParser.parse_string("$$hello$$,world\\n", skip_headers: false)
#=> [["hello", "world"]]Encoding Support
RustyCSV supports character encoding conversion via the :encoding option.
This is useful when exporting CSVs with non-ASCII characters (accents, CJK,
emoji) that need to open correctly in spreadsheet applications:
alias RustyCSV.Spreadsheet
# Export data with international characters for Excel/Google Sheets/Numbers
rows = [["名前", "年齢"], ["田中", "27"], ["Müller", "35"]]
csv = Spreadsheet.dump_to_iodata(rows) |> IO.iodata_to_binary()
File.write!("export.csv", csv)The pre-defined RustyCSV.Spreadsheet module outputs UTF-16 LE with BOM,
which spreadsheet applications auto-detect correctly. You can also define
custom encodings:
RustyCSV.define(MySpreadsheet,
separator: "\t",
encoding: {:utf16, :little},
trim_bom: true,
dump_bom: true
)

Supported encodings:
- :utf8 - UTF-8 (default, no conversion overhead)
- :latin1 - ISO-8859-1 / Latin-1
- {:utf16, :little} - UTF-16 Little Endian
- {:utf16, :big} - UTF-16 Big Endian
- {:utf32, :little} - UTF-32 Little Endian
- {:utf32, :big} - UTF-32 Big Endian
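For instance, a sketch of a Latin-1 parser for legacy exports (the module name is an assumption; per the conversion rule described under define/2, parsed fields come back as UTF-8):

RustyCSV.define(MyApp.LegacyCSV,
  separator: ";",
  encoding: :latin1
)

# "Müller;27\n" encoded as Latin-1: ü is the single byte 0xFC
latin1 = <<"M", 0xFC, "ller;27\n">>
MyApp.LegacyCSV.parse_string(latin1, skip_headers: false)
#=> [["Müller", "27"]]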
Summary
Types
Options for define/2.
Options for dump_to_iodata/2.
Encoding for CSV data.
Options for parsing functions.
A single row of CSV data, represented as a list of field binaries.
Multiple rows of CSV data.
Parsing strategy to use.
Callbacks
Converts rows to iodata in CSV format.
Lazily converts rows to a stream of iodata in CSV format.
Returns the options used to define this CSV module.
Eagerly parses an enumerable of CSV data into a list of rows.
Eagerly parses an enumerable of CSV data into a list of rows with options.
Lazily parses a stream of CSV data into a stream of rows.
Lazily parses a stream of CSV data into a stream of rows with options.
Parses a CSV string into a list of rows.
Parses a CSV string into a list of rows with options.
Converts a stream of arbitrary binary chunks into a line-oriented stream.
Functions
Defines a new CSV parser/dumper module.
Types
@type define_options() :: [
  separator: String.t() | [String.t()],
  escape: String.t(),
  newlines: [String.t()],
  line_separator: String.t(),
  trim_bom: boolean(),
  dump_bom: boolean(),
  reserved: [String.t()],
  escape_formula: map() | nil,
  encoding: encoding(),
  strategy: strategy(),
  moduledoc: String.t() | false | nil
]
Options for define/2.
Parsing Options
- :separator - Field separator character(s). Can be a single string (e.g., ",") or a list of strings for multi-separator support (e.g., [",", ";"]). When parsing, any separator in the list is recognized as a field delimiter. When dumping, only the first separator is used for output. Defaults to ",".
- :escape - Escape/quote character. Defaults to "\"".
- :newlines - List of recognized line endings. Defaults to ["\r\n", "\n"].
- :trim_bom - Remove BOM when parsing strings. Defaults to false.
- :encoding - Character encoding. Defaults to :utf8. See encoding/0.
Dumping Options
- :line_separator - Line separator for output. Defaults to "\n".
- :dump_bom - Include BOM in output. Defaults to false.
- :reserved - Additional characters requiring escaping.
- :escape_formula - Map for formula injection prevention. Defaults to nil. When set, fields starting with trigger characters are prefixed with a replacement string inside quotes. Handled natively in the Rust NIF.
Other Options
- :strategy - Default parsing strategy. Defaults to :simd.
- :moduledoc - Documentation for the generated module.
@type dump_options() :: [{:strategy, :parallel}]
Options for dump_to_iodata/2.
Options
- :strategy - Encoding strategy to use. Defaults to the single-threaded SIMD-accelerated encoder (no option needed). Pass :parallel for multi-threaded encoding via rayon, which is faster for quoting-heavy data.
@type encoding() ::
:utf8 | :latin1 | {:utf16, :little | :big} | {:utf32, :little | :big}
Encoding for CSV data.
Supported encodings:
- :utf8 - UTF-8 (default, no conversion)
- :latin1 - ISO-8859-1 / Latin-1
- {:utf16, :little} - UTF-16 Little Endian
- {:utf16, :big} - UTF-16 Big Endian
- {:utf32, :little} - UTF-32 Little Endian
- {:utf32, :big} - UTF-32 Big Endian
@type parse_options() :: [
  skip_headers: boolean(),
  strategy: strategy(),
  headers: boolean() | [atom() | String.t()],
  chunk_size: pos_integer(),
  batch_size: pos_integer(),
  max_buffer_size: pos_integer()
]
Options for parsing functions.
Common Options
- :skip_headers - When true, skips the first row. Defaults to true.
- :strategy - The parsing strategy to use. One of:
  - :simd - SIMD-accelerated (default)
  - :basic - Simple byte-by-byte
  - :indexed - Two-phase index-then-extract
  - :parallel - Multi-threaded via rayon
  - :zero_copy - Sub-binary references (keeps parent binary alive)
- :headers - Controls header handling. Defaults to false.
  - false - Return rows as lists (default behavior)
  - true - Use the first row as string keys and return a list of maps. :skip_headers is ignored (the first row is always consumed as keys).
  - A list of atoms or strings - Use them as explicit keys and return a list of maps. The first row is skipped by default (:skip_headers applies). Pass skip_headers: false if the file has no header row.
Streaming Options
- :chunk_size - Bytes per IO read for streaming. Defaults to 65536.
- :batch_size - Rows per batch for streaming. Defaults to 1000.
- :max_buffer_size - Maximum streaming buffer size in bytes. Defaults to 268_435_456 (256 MB). If the internal buffer exceeds this limit during streaming_feed/2, a :buffer_overflow exception is raised. Increase this if your data contains rows longer than 256 MB. Decrease it to fail faster on malformed input that lacks newlines.
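For example, a sketch tuning these knobs for a file with very long rows (the values and process_row/1 are illustrative):

"wide_rows.csv"
|> File.stream!()
|> CSV.parse_stream(
  batch_size: 500,                     # emit rows in smaller batches
  max_buffer_size: 512 * 1024 * 1024   # tolerate rows beyond the 256 MB default
)
|> Stream.each(&process_row/1)
|> Stream.run()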
@type row() :: [binary()]
A single row of CSV data, represented as a list of field binaries.
@type rows() :: [row()]
Multiple rows of CSV data.
@type strategy() :: :simd | :basic | :indexed | :parallel | :zero_copy
Parsing strategy to use.
These strategies apply to parse_string/2 and other parsing functions.
For encoding strategies, see dump_options/0.
Available Strategies
- :simd - SIMD-accelerated scanning via memchr (default, fastest for most files)
- :basic - Simple byte-by-byte parsing (useful for debugging)
- :indexed - Two-phase index-then-extract (good for re-extracting rows)
- :parallel - Multi-threaded via rayon (best for very large files, 500 MB+, with complex quoting)
- :zero_copy - Sub-binary references (maximum speed, keeps parent binary alive)
Memory Model Comparison
All strategies use boundary-based parsing: the NIF scans the input to find field boundaries, then returns sub-binary references for clean fields (zero copy) and only allocates new binaries for fields that require unescaping. The input binary is kept alive while any sub-binary references it.
| Strategy | Best When |
|---|---|
| :simd | Default, fastest for most files |
| :basic | Debugging, baseline |
| :indexed | Row range extraction |
| :parallel | Large files 500 MB+, complex quoting |
| :zero_copy | Speed-critical, short-lived results |
Examples
# Default SIMD strategy
CSV.parse_string(data)
# Parallel for large files
CSV.parse_string(large_data, strategy: :parallel)
# Zero-copy for maximum speed
CSV.parse_string(data, strategy: :zero_copy)
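One rough way to observe the sub-binary behavior described above (a sketch; :binary.referenced_byte_size/1 reports the size of the binary a sub-binary points into):

data = File.read!("big.csv")
[[field | _] | _] = CSV.parse_string(data, strategy: :zero_copy, skip_headers: false)

byte_size(field)                     # size of the field itself
:binary.referenced_byte_size(field)  # near byte_size(data) when field is a sub-binary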
Callbacks
@callback dump_to_iodata(Enumerable.t()) :: iodata()
Converts rows to iodata in CSV format.
Returns a single flat binary (not an iodata list). A binary is valid
iodata/0, so it works with IO.binwrite/2, IO.iodata_to_binary/1,
etc. See "Encoding (Dumping)" in the module doc for details on how this
differs from NimbleCSV.
Options
- :strategy - Encoding strategy. Defaults to the single-threaded SIMD-accelerated encoder. Pass :parallel for multi-threaded encoding via rayon, which is faster for quoting-heavy data.
@callback dump_to_iodata(Enumerable.t(), dump_options()) :: iodata()
@callback dump_to_stream(Enumerable.t()) :: Enumerable.t()
Lazily converts rows to a stream of iodata in CSV format.
@callback options() :: keyword()
Returns the options used to define this CSV module.
@callback parse_enumerable(Enumerable.t()) :: rows()
Eagerly parses an enumerable of CSV data into a list of rows.
@callback parse_enumerable(Enumerable.t(), parse_options()) :: rows()
Eagerly parses an enumerable of CSV data into a list of rows with options.
@callback parse_stream(Enumerable.t()) :: Enumerable.t()
Lazily parses a stream of CSV data into a stream of rows.
@callback parse_stream(Enumerable.t(), parse_options()) :: Enumerable.t()
Lazily parses a stream of CSV data into a stream of rows with options.
@callback parse_string(binary()) :: rows()
Parses a CSV string into a list of rows.
@callback parse_string(binary(), parse_options()) :: rows()
Parses a CSV string into a list of rows with options.
@callback to_line_stream(Enumerable.t()) :: Enumerable.t()
Converts a stream of arbitrary binary chunks into a line-oriented stream.
Functions
@spec define(module(), define_options()) :: :ok
Defines a new CSV parser/dumper module.
Options
Parsing Options
- :separator - The field separator(s). Can be a single string (e.g., ",", "::") or a list of strings for multi-separator support (e.g., [",", ";"], [",", "::"]). Separators can be multi-byte. Defaults to ",".
  When multiple separators are specified:
  - Parsing: any separator in the list is recognized as a field delimiter
  - Dumping: only the first separator is used for output
  This is useful for parsing files with inconsistent delimiters or mixed comma/semicolon separators (common in European locales).
- :escape - The escape/quote sequence. Can be multi-byte (e.g., "$$"). Defaults to "\"".
- :newlines - List of recognized line endings for parsing. Defaults to ["\r\n", "\n"]. Both CRLF and LF are always recognized.
- :trim_bom - When true, removes the BOM (byte order mark) from the beginning of strings before parsing. Defaults to false.
- :encoding - Character encoding for input/output. Defaults to :utf8. Supported encodings:
  - :utf8 - UTF-8 (default, no conversion overhead)
  - :latin1 - ISO-8859-1 / Latin-1
  - {:utf16, :little} - UTF-16 Little Endian
  - {:utf16, :big} - UTF-16 Big Endian
  - {:utf32, :little} - UTF-32 Little Endian
  - {:utf32, :big} - UTF-32 Big Endian
  When the encoding is not :utf8, input data is converted to UTF-8 for parsing, and output is converted back to the target encoding.
Dumping Options
- :line_separator - The line separator for dumped output. Defaults to "\n".
- :dump_bom - When true, includes the appropriate BOM at the start of dumped output. Defaults to false.
- :reserved - Additional characters that should trigger field escaping when dumping. By default, fields containing the separator, escape character, or newlines are escaped.
- :escape_formula - A map of characters to their escaped versions for preventing CSV formula injection. When set, fields starting with these characters are prefixed with a tab. Defaults to nil. Example:
  %{"=" => true, "+" => true, "-" => true, "@" => true}
Strategy Options
- :strategy - The default parsing strategy. One of:
  - :simd - SIMD-accelerated via memchr (default, fastest)
  - :basic - Simple byte-by-byte parsing
  - :indexed - Two-phase index-then-extract
  - :parallel - Multi-threaded via rayon
  - :zero_copy - Sub-binary references (NimbleCSV-like memory, max speed)
Documentation
- :moduledoc - The @moduledoc for the generated module. Set to false to disable documentation.
Examples
# Define a standard CSV parser
RustyCSV.define(MyApp.CSV,
separator: ",",
escape: "\"",
line_separator: "\n"
)
# Use it
MyApp.CSV.parse_string("a,b\n1,2\n")
#=> [["1", "2"]]
# Define a UTF-16 spreadsheet parser
RustyCSV.define(MyApp.Spreadsheet,
separator: "\t",
encoding: {:utf16, :little},
trim_bom: true,
dump_bom: true
)
# Define a multi-separator parser (comma or semicolon)
RustyCSV.define(MyApp.FlexibleCSV,
separator: [",", ";"],
escape: "\""
)
# Parse files with mixed delimiters
MyApp.FlexibleCSV.parse_string("a,b;c\n1;2,3\n", skip_headers: false)
#=> [["a", "b", "c"], ["1", "2", "3"]]
# Dumping uses the first separator (comma)
MyApp.FlexibleCSV.dump_to_iodata([["x", "y"]]) |> IO.iodata_to_binary()
#=> "x,y\n"
# Get the configuration
MyApp.CSV.options()
#=> [separator: ",", escape: "\"", ...]