All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
## [0.3.9] - 2026-02-16

Improved memory efficiency and correctness for long-running streaming workloads.

### Fixed

- Memory reclamation — streaming parsers now actively release unused memory during parsing and finalization, preventing gradual memory growth in long-lived streams
- Reduced baseline memory — the thread pool now uses ~48 MiB less memory

### Improved

- Internal iterator correctness and buffer-management hardening
## [0.3.8] - 2026-02-11

### Added

- musl/Alpine compatibility — added a target-specific dependency on `mimalloc` for musl targets with the `local_dynamic_tls` feature enabled. This ensures stable support and prevents potential thread-local storage issues when running in Alpine Linux containers (e.g., the standard Elixir Docker images).
## [0.3.7] - 2026-02-03

### Fixed

- Disabled `memory_tracking` by default — the `memory_tracking` Cargo feature was accidentally left enabled in the 0.3.6 release. This feature wraps every allocation/deallocation with atomic counter updates, adding measurable overhead. It is now disabled by default as intended. To enable it explicitly for profiling, set `default = ["mimalloc", "memory_tracking"]` in `native/rustycsv/Cargo.toml`.
## [0.3.6] - 2026-02-02

Decoding and encoding overhaul. All batch decode strategies now use boundary-based sub-binaries (zero-copy for most fields). Encoding writes a single flat binary instead of an iodata list. 3.5–19x faster decoding and 2.5–31x faster encoding vs pure Elixir, with 5–14x less memory for decoding.

### Added

- Parallel encoding option — `dump_to_iodata(rows, strategy: :parallel)` for quoting-heavy workloads
- Encoding benchmarks (`bench/encode_bench.exs`)

### Changed

- Boundary-based sub-binary decoding — all batch strategies (`:simd`, `:basic`, `:indexed`, `:zero_copy`, `:parallel`) now parse field boundaries as `(start, end)` offset pairs, then create BEAM sub-binary references into the original input. Only fields requiring quote unescaping (`""` → `"`) are copied. Previously, `:simd`/`:basic`/`:indexed` used `Cow<[u8]>` (copying into `NewBinary` for every field) and `:parallel` double-copied (rayon workers via `to_vec()` plus the main thread via `NewBinary`).
- Parallel strategy: boundary extraction — rayon workers now compute boundary pairs (pure index arithmetic) instead of copying field data; the main thread builds the sub-binary terms. This eliminates the double-copy bottleneck that made `:parallel` slower than NimbleCSV on small and medium files.
- Flat binary encoding — the encoding NIF now writes raw CSV bytes into a single binary instead of constructing an iodata list, reducing NIF peak memory 3–6x and BEAM-side allocation to 80 bytes
## [0.3.5] - 2026-02-02

Zero unsafe in application code. No user-facing API changes.

### Changed

- Zero `unsafe` in application code — all parsing, scanning, and term-building code is now fully safe Rust. The only remaining `unsafe` is the `GlobalAlloc` trait impl behind the opt-in `memory_tracking` feature flag (required by the trait).
  - Sub-binary creation (`term.rs`): replaced the hand-rolled `enif_make_sub_binary` FFI call with rustler's safe `Binary::make_subbinary().into()` API, enabled by upstream PR #719 (`#[inline]` on `make_subbinary_unchecked` + `From<Binary> for Term`)
  - SIMD quote detection (`simd_scanner.rs`): removed `unsafe` CLMUL (x86_64) and PMULL (aarch64) `std::arch` intrinsics for prefix-XOR. All targets now use the portable shift-and-xor cascade — benchmarked with no measurable difference on 15 MB/100K-row workloads
- rustler dependency pinned to git master pending the 0.37.3 hex release
## [0.3.4] - 2026-02-01

Major internal refactor replacing all per-strategy byte-by-byte parsers with a shared single-pass SIMD structural scanner. No user-facing API changes.

### Changed

- SIMD structural scanner — all six parsing strategies now share a single `scan_structural` pass that finds every unquoted separator and row ending in one sweep. Uses `std::simd` portable SIMD (128-bit on all targets, 256-bit on AVX2). Requires Rust nightly (`#![feature(portable_simd)]`), but only uses the stabilization-safe API subset — no swizzle, scatter/gather, or lane-count generics.
- `:parallel` strategy overhauled — phase 1 now uses the shared SIMD scan instead of a separate sequential row-boundary pass

### Performance

- `:zero_copy` — up to 15% faster on small payloads, up to 31% on large
- `:simd`/`:basic` — 25–35% faster across mixed and large workloads
- `:parallel` — 2.4–3.7x faster, now competitive at all file sizes (previously only beneficial at 500 MB+)
- Streaming — 2.2x faster than NimbleCSV (was roughly even)
- vs NimbleCSV: 3.7x (simple) to 17.9x (quoted) to 12.5x (108 MB)
## [0.3.3] - 2026-01-29

Internal safety hardening and scheduler improvements. No new user-facing features — all changes are on by default with zero configuration required.

### Changed

- NIF panic safety — all Rust NIF code paths now use explicit error handling instead of panics, eliminating the possibility of panic-induced lock poisoning or inconsistent state under any input
## [0.3.2] - 2026-01-29

⚠️ Note: Streaming parsers now enforce a 256 MB buffer cap. If your workload streams chunks larger than 256 MB without any newline characters, `streaming_feed/2` will raise `:buffer_overflow`. This is unlikely to affect real-world CSV data, but if needed you can raise the limit with the `:max_buffer_size` option: `CSV.parse_stream(stream, max_buffer_size: 512 * 1024 * 1024)`

### Added

- Bounded streaming buffer — streaming parsers now enforce a maximum buffer size (default 256 MB) to prevent unbounded memory growth when no newlines are encountered
  - `streaming_feed/2` raises `:buffer_overflow` if the buffer would exceed the limit
  - `streaming_set_max_buffer/2` — new NIF to configure the limit per parser instance
  - Configurable via the `:max_buffer_size` option on `parse_stream/2`, `stream_file/2`, `stream_enumerable/2`, and `stream_device/2`
- Dedicated rayon thread pool — parallel parsing (`parse_string_parallel`, `parse_to_maps_parallel`, and the general multi-byte parallel path) now runs on a named `rustycsv-*` thread pool instead of the global rayon pool, avoiding contention with other rayon users in the same VM
- Atoms module — internal `mod atoms` block for DRY atom definitions (`ok`, `error`, `mutex_poisoned`, `buffer_overflow`)

### Changed

- Dirty CPU scheduling — 12 NIFs that process unbounded input now run on dirty CPU schedulers to avoid blocking normal BEAM schedulers: `parse_string`, `parse_string_with_config`, `parse_string_fast`, `parse_string_fast_with_config`, `parse_string_indexed`, `parse_string_indexed_with_config`, `parse_string_zero_copy`, `parse_string_zero_copy_with_config`, `parse_to_maps`, `streaming_feed`, `streaming_next_rows`, `streaming_finalize`

### Fixed

- Mutex poisoning recovery — streaming parser NIFs now return a `:mutex_poisoned` exception instead of panicking if a previous call panicked while holding the lock
- Sub-binary bounds check — `make_subbinary` now validates `start + len <= input_len` with a `debug_assert!` in dev/test builds and a release-mode safety net that returns an empty binary instead of undefined behavior
## [0.3.1] - 2026-01-28

### Added

- Custom newline support — the `newlines` option is now passed through to the Rust parser, so custom line terminators work for parsing, not just dumping
  - `newlines: ["|"]` — single-byte custom newline
  - `newlines: ["<br>"]` — multi-byte custom newline
  - `newlines: ["<br>", "|"]` — multiple custom newlines
  - The default `["\r\n", "\n"]` routes through the existing SIMD-optimized paths — zero performance impact
  - Custom newlines route through the general byte-by-byte parser
  - Works with all strategies: `:basic`, `:simd`, `:indexed`, `:parallel`, `:zero_copy`
  - Works with streaming (`parse_stream/2`)
  - Works with headers-to-maps (`headers: true`)

### Fixed

- `escape_formula` uses the configured replacement — no longer hardcodes a `\t` prefix; respects the map's replacement value (e.g. `%{["@", "+"] => "'"}` now prepends `'` instead of `\t`)
- `escape_chars` uses the configured newlines — custom newlines and `line_separator` now trigger quoting during dump instead of hardcoded `\n`/`\r`
- `options/0` normalizes the separator to a list — always returns the separator as a list (e.g. `[","]`) to match NimbleCSV behavior
- `parse_enumerable` avoids eager concatenation — delegates to `parse_stream` instead of `Enum.join`, keeping peak memory proportional to the result plus one chunk
- Integer codepoints accepted for `:separator` and `:escape` — e.g. `separator: ?,` and `escape: ?"` now work for NimbleCSV compatibility
## [0.3.0] - 2026-01-28

### Added

- Headers-to-maps — return rows as Elixir maps instead of lists
  - `headers: true` — first row becomes string keys
  - `headers: [:name, :age]` — explicit atom keys
  - `headers: ["n", "a"]` — explicit string keys
  - Works with `parse_string/2` (Rust-side map construction) and `parse_stream/2` (Elixir-side `Stream.transform`)
  - Rust-side key interning: header terms are allocated once and reused across all rows
  - Edge cases: fewer columns → `nil`, extra columns → ignored, duplicate headers → last wins
  - All 5 batch strategies and streaming supported
  - 97 new tests, including cross-strategy consistency and parse_string/parse_stream agreement
- Multi-separator support — multiple separator characters for NimbleCSV compatibility
  - `separator: [",", ";"]` — accepts a list of separator strings
  - Parsing: any separator in the list is recognized as a field delimiter
  - Dumping: only the first separator is used for output (deterministic)
  - Uses SIMD-optimized `memchr2`/`memchr3` for 2–3 single-byte separators, with a fallback for 4+
  - Works with all parsing strategies and streaming
  - Backward compatible: a single separator string still works as before

### Fixed

- Multi-byte separator and escape support — separators and escape sequences are no longer restricted to single bytes, completing NimbleCSV parity
  - `separator: "::"` or `separator: "||"` — multi-byte separators now work
  - `separator: [",", "::"]` — lists can mix single-byte and multi-byte separators
  - `escape: "$$"` — multi-byte escape sequences now work
  - Single-byte cases are unchanged — the existing SIMD-optimized code paths are used when all separators and the escape are single bytes (zero performance regression)
  - Multi-byte cases use a new general-purpose byte-by-byte parser
  - All 6 strategies and streaming support multi-byte separators and escapes
## [0.2.0] - 2026-01-25

### Added

- `:zero_copy` strategy — new parsing strategy using BEAM sub-binary references
  - Zero-copy for unquoted and simply-quoted fields
  - Hybrid approach: only copies when quote unescaping is needed (`""` → `"`)
  - Matches NimbleCSV's memory model while keeping SIMD scanning speed
  - Trade-off: sub-binaries keep the parent binary alive until GC
- SIMD-accelerated row boundary scanning — `memchr3` for the parallel strategy
  - Replaces byte-by-byte scanning with hardware-accelerated jumps
  - Only examines positions where quotes or newlines appear
  - Properly handles RFC 4180 escaped quotes
- mimalloc allocator — high-performance memory allocator (enabled by default)
  - 10–20% faster allocation for many small objects
  - Reduced memory fragmentation
  - Zero tracking overhead in the default configuration
- Optional memory tracking — opt-in profiling via the `memory_tracking` Cargo feature
  - When disabled (default): `get_rust_memory/0` etc. return `0` with zero overhead
  - When enabled: full allocation tracking for profiling
  - Enable with `default = ["mimalloc", "memory_tracking"]` in Cargo.toml

### Changed

- Memory tracking is now opt-in instead of always-on (removes ~5–10% overhead)
- Pre-allocated vectors throughout parsing paths for reduced reallocation
- Updated ARCHITECTURE.md with comprehensive strategy documentation
- Six parsing strategies now available (was five)

### Performance

- `:parallel` strategy benefits from SIMD row boundary scanning
- `:zero_copy` strategy eliminates copy overhead for clean CSV data
- All strategies benefit from mimalloc and pre-allocation improvements

### Fixed

- Benchmark methodology — corrected an unfair streaming comparison (NimbleCSV now uses line-based streams)
- Memory claims — honest metrics showing both BEAM and Rust allocations
- `:parallel` threshold — updated from 100 MB+ to 500 MB+ based on actual crossover testing
- Documentation now accurately reflects 3.5x–9x speedups (up to 18x for quoted data)
## [0.1.0] - 2025-01-25

### Added

- Initial release
- Five parsing strategies: `:simd`, `:parallel`, `:streaming`, `:indexed`, `:basic`
- Full NimbleCSV API compatibility
- RFC 4180 compliance with 147 tests
- Configurable separators (CSV, TSV, PSV, etc.)
- Bounded-memory streaming for large files
- Character encoding support: UTF-8, UTF-16 (LE/BE), UTF-32 (LE/BE), Latin-1
- Pre-defined `RustyCSV.Spreadsheet` parser for Excel-compatible UTF-16 LE TSV
- Rust memory tracking for profiling (now opt-in, see Unreleased)
- Comprehensive documentation

### Parsing Strategies

- `:simd` — SIMD-accelerated delimiter scanning via `memchr` (default)
- `:parallel` — multi-threaded parsing via `rayon` for 500 MB+ files with complex quoting
- `:streaming` — stateful chunked parser for unbounded files
- `:indexed` — two-phase index-then-extract for row range access
- `:basic` — simple byte-by-byte parsing for debugging

### Encoding Support

- `:utf8` — UTF-8 (default, zero overhead)
- `:latin1` — ISO-8859-1 / Latin-1
- `{:utf16, :little}` — UTF-16 Little Endian (Excel/Windows)
- `{:utf16, :big}` — UTF-16 Big Endian
- `{:utf32, :little}` — UTF-32 Little Endian
- `{:utf32, :big}` — UTF-32 Big Endian

### Validation

- csv-spectrum acid test suite (12 tests)
- csv-test-data RFC 4180 suite (17 tests)
- PapaParse-inspired edge cases (53 tests)
- Encoding conversion tests (20 tests)
- Cross-strategy consistency validation
- NimbleCSV output compatibility verification