Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[0.3.9] - 2026-02-16

Improved memory efficiency and correctness for long-running streaming workloads.

Fixed

Memory reclamation — streaming parsers now actively release unused memory during parsing and finalization, preventing gradual memory growth in long-lived streams
Reduced baseline memory — thread pool uses ~48 MiB less memory

Improved

Internal iterator correctness and buffer management hardening

[0.3.8] - 2026-02-11

Added

musl/alpine compatibility — added target-specific dependency for mimalloc on musl targets with the local_dynamic_tls feature enabled. This ensures stable support and prevents potential issues with thread-local storage when running in Alpine Linux containers (e.g., standard Elixir docker images).

[0.3.7] - 2026-02-03

Fixed

Disabled memory_tracking by default — the memory_tracking Cargo feature was accidentally left enabled in the 0.3.6 release. This feature wraps every allocation/deallocation with atomic counter updates, adding measurable overhead. It is now disabled by default as intended. Enable explicitly for profiling: default = ["mimalloc", "memory_tracking"] in native/rustycsv/Cargo.toml.

[0.3.6] - 2026-02-02

Decoding and encoding overhaul. All batch decode strategies now use boundary-based sub-binaries (zero-copy for most fields). Encoding writes a single flat binary instead of an iodata list. 3.5–19x faster decoding, 2.5–31x faster encoding vs pure Elixir, with 5–14x less memory for decoding.

Added

Parallel encoding option — dump_to_iodata(rows, strategy: :parallel) for quoting-heavy workloads
Encoding benchmarks (bench/encode_bench.exs)

Changed

Boundary-based sub-binary decoding — all batch strategies (:simd, :basic, :indexed, :zero_copy, :parallel) now parse field boundaries as (start, end) offset pairs, then create BEAM sub-binary references into the original input. Only fields requiring quote unescaping ("" → ") are copied. Previously, :simd/:basic/:indexed used Cow<[u8]> (copying into NewBinary for every field) and :parallel double-copied (rayon workers via to_vec() + main thread via NewBinary).
Parallel strategy: boundary extraction — rayon workers now compute boundary pairs (pure index arithmetic) instead of copying field data. The main thread builds sub-binary terms. Eliminates the double-copy bottleneck that made :parallel slower than NimbleCSV on small/medium files.
Flat binary encoding — the encoding NIF now writes raw CSV bytes into a single binary instead of constructing an iodata list, reducing NIF peak memory 3–6x and BEAM-side allocation to 80 bytes

[0.3.5] - 2026-02-02

Zero unsafe in application code. No user-facing API changes.

Changed

Zero unsafe in application code — all parsing, scanning, and term-building code is now fully safe Rust. The only remaining unsafe is the GlobalAlloc trait impl behind the opt-in memory_tracking feature flag (required by the trait).
- Sub-binary creation (term.rs): Replaced hand-rolled enif_make_sub_binary FFI call with rustler's safe Binary::make_subbinary().into() API, enabled by upstream PR #719 (#[inline] on make_subbinary_unchecked + From<Binary> for Term)
- SIMD quote detection (simd_scanner.rs): Removed unsafe CLMUL (x86_64) and PMULL (aarch64) std::arch intrinsics for prefix-XOR. All targets now use the portable shift-and-xor cascade — benchmarked with no measurable difference on 15MB/100K-row workloads
rustler dependency pinned to git master pending 0.37.3 hex release

[0.3.4] - 2026-02-01

Major internal refactor replacing all per-strategy byte-by-byte parsers with a shared single-pass SIMD structural scanner. No user-facing API changes.

Changed

SIMD structural scanner — all six parsing strategies now share a single scan_structural pass that finds every unquoted separator and row ending in one sweep. Uses std::simd portable SIMD (128-bit on all targets, 256-bit on AVX2). Requires Rust nightly (#![feature(portable_simd)]), but only uses the stabilization-safe API subset — no swizzle, scatter/gather, or lane-count generics.
:parallel strategy overhauled — phase 1 now uses the shared SIMD scan instead of a separate sequential row-boundary pass

Performance

:zero_copy — up to 15% faster on small payloads, up to 31% on large
:simd / :basic — 25-35% faster across mixed and large workloads
:parallel — 2.4-3.7x faster, now competitive at all file sizes (previously only beneficial at 500MB+)
Streaming — 2.2x faster than NimbleCSV (was roughly even)
vs NimbleCSV: 3.7x (simple) to 17.9x (quoted) to 12.5x (108MB)

[0.3.3] - 2026-01-29

Internal safety hardening and scheduler improvements. No new user-facing features — all changes are on by default with zero configuration required.

Changed

NIF panic safety — all Rust NIF code paths now use explicit error handling instead of panics, eliminating the possibility of panic-induced lock poisoning or inconsistent state under any input

[0.3.2] - 2026-01-29

⚠️ Note: Streaming parsers now enforce a 256 MB buffer cap. If your workload streams chunks larger than 256 MB without any newline characters, streaming_feed/2 will raise :buffer_overflow. This is unlikely to affect real-world CSV data, but if needed you can raise the limit with the :max_buffer_size option:
CSV.parse_stream(stream, max_buffer_size: 512 * 1024 * 1024)

Added

Bounded streaming buffer — streaming parsers now enforce a maximum buffer size (default 256 MB) to prevent unbounded memory growth when no newlines are encountered
- streaming_feed/2 raises :buffer_overflow if the buffer would exceed the limit
- streaming_set_max_buffer/2 — new NIF to configure the limit per parser instance
- Configurable via :max_buffer_size option on parse_stream/2, stream_file/2, stream_enumerable/2, and stream_device/2
Dedicated rayon thread pool — parallel parsing (parse_string_parallel, parse_to_maps_parallel, and general multi-byte parallel) now runs on a named rustycsv-* thread pool instead of the global rayon pool, avoiding contention with other Rayon users in the same VM
Atoms module — internal mod atoms block for DRY atom definitions (ok, error, mutex_poisoned, buffer_overflow)

Changed

Dirty CPU scheduling — 12 NIFs that process unbounded input now run on dirty CPU schedulers to avoid blocking normal BEAM schedulers: parse_string, parse_string_with_config, parse_string_fast, parse_string_fast_with_config, parse_string_indexed, parse_string_indexed_with_config, parse_string_zero_copy, parse_string_zero_copy_with_config, parse_to_maps, streaming_feed, streaming_next_rows, streaming_finalize

Fixed

Mutex poisoning recovery — streaming parser NIFs now return a :mutex_poisoned exception instead of panicking if a previous call panicked while holding the lock
Sub-binary bounds check — make_subbinary now validates start + len <= input_len with a debug_assert! in dev/test builds and a release-mode safety net that returns an empty binary instead of undefined behavior

[0.3.1] - 2026-01-28

Added

Custom newline support — pass newlines option through to the Rust parser so custom line terminators work for parsing, not just dumping
- newlines: ["|"] — single-byte custom newline
- newlines: ["<br>"] — multi-byte custom newline
- newlines: ["<br>", "|"] — multiple custom newlines
- Default ["\r\n", "\n"] routes through existing SIMD-optimized paths — zero performance impact
- Custom newlines route through the general byte-by-byte parser
- Works with all strategies: :basic, :simd, :indexed, :parallel, :zero_copy
- Works with streaming (parse_stream/2)
- Works with headers-to-maps (headers: true)

Fixed

escape_formula uses configured replacement — no longer hardcodes \t prefix; respects the map's replacement value (e.g. %{["@", "+"] => "'"} now prepends ' instead of \t)
escape_chars uses configured newlines — custom newlines and line_separator now trigger quoting during dump instead of hardcoded \n/\r
options/0 normalizes separator to a list — always returns separator as a list (e.g. [","]) to match NimbleCSV behavior
parse_enumerable avoids eager concatenation — delegates to parse_stream instead of Enum.join, keeping peak memory proportional to result + one chunk
Integer codepoints accepted for :separator and :escape — e.g. separator: ?,, escape: ?" now works for NimbleCSV compatibility

[0.3.0] - 2026-01-28

Added

Headers-to-maps — return rows as Elixir maps instead of lists
- headers: true — first row becomes string keys
- headers: [:name, :age] — explicit atom keys
- headers: ["n", "a"] — explicit string keys
- Works with parse_string/2 (Rust-side map construction) and parse_stream/2 (Elixir-side Stream.transform)
- Rust-side key interning: header terms allocated once and reused across all rows
- Edge cases: fewer columns → nil, extra columns → ignored, duplicate headers → last wins
- All 5 batch strategies and streaming supported
- 97 new tests including cross-strategy consistency and parse_string/parse_stream agreement
Multi-separator support — multiple separator characters for NimbleCSV compatibility
- separator: [",", ";"] — accepts a list of separator strings
- Parsing: Any separator in the list is recognized as a field delimiter
- Dumping: Only the first separator is used for output (deterministic)
- Uses SIMD-optimized memchr2/memchr3 for 2-3 single-byte separators, with fallback for 4+
- Works with all parsing strategies and streaming
- Backward compatible: single separator string still works as before

Fixed

Multi-byte separator and escape support - Separators and escape sequences are no longer restricted to single bytes, completing NimbleCSV parity
- separator: "::" or separator: "||" — multi-byte separators now work
- separator: [",", "::"] — lists can mix single-byte and multi-byte separators
- escape: "$$" — multi-byte escape sequences now work
- Single-byte cases are unchanged — the existing SIMD-optimized code paths are used when all separators and the escape are single bytes (zero performance regression)
- Multi-byte cases use a new general-purpose byte-by-byte parser
- All 6 strategies and streaming support multi-byte separators and escapes

[0.2.0] - 2026-01-25

Added

:zero_copy strategy - New parsing strategy using BEAM sub-binary references
- Zero-copy for unquoted and simply-quoted fields
- Hybrid approach: only copies when quote unescaping is needed ("" → ")
- Matches NimbleCSV's memory model while keeping SIMD scanning speed
- Trade-off: sub-binaries keep parent binary alive until GC
SIMD-accelerated row boundary scanning - memchr3 for parallel strategy
- Replaces byte-by-byte scanning with hardware-accelerated jumps
- Only examines positions where quotes or newlines appear
- Properly handles RFC 4180 escaped quotes
mimalloc allocator - High-performance memory allocator (enabled by default)
- 10-20% faster allocation for many small objects
- Reduced memory fragmentation
- Zero tracking overhead in default configuration
Optional memory tracking - Opt-in profiling via memory_tracking Cargo feature
- When disabled (default): get_rust_memory/0 etc. return 0 with zero overhead
- When enabled: full allocation tracking for profiling
- Enable with default = ["mimalloc", "memory_tracking"] in Cargo.toml

Changed

Memory tracking is now opt-in instead of always-on (removes ~5-10% overhead)
Pre-allocated vectors throughout parsing paths for reduced reallocation
Updated ARCHITECTURE.md with comprehensive strategy documentation
Six parsing strategies now available (was five)

Performance

:parallel strategy benefits from SIMD row boundary scanning
:zero_copy strategy eliminates copy overhead for clean CSV data
All strategies benefit from mimalloc and pre-allocation improvements

Fixed

Benchmark methodology - Corrected unfair streaming comparison (NimbleCSV now uses line-based streams)
Memory claims - Honest metrics showing both BEAM and Rust allocations
:parallel threshold - Updated from 100MB+ to 500MB+ based on actual crossover testing
Documentation now accurately reflects 3.5x-9x speedups (up to 18x for quoted data)

[0.1.0] - 2025-01-25

Added

Initial release
Five parsing strategies: :simd, :parallel, :streaming, :indexed, :basic
Full NimbleCSV API compatibility
RFC 4180 compliance with 147 tests
Configurable separators (CSV, TSV, PSV, etc.)
Bounded-memory streaming for large files
Character encoding support: UTF-8, UTF-16 (LE/BE), UTF-32 (LE/BE), Latin-1
Pre-defined RustyCSV.Spreadsheet parser for Excel-compatible UTF-16 LE TSV
Rust memory tracking for profiling (now opt-in, see Unreleased)
Comprehensive documentation

Parsing Strategies

:simd - SIMD-accelerated delimiter scanning via memchr (default)
:parallel - Multi-threaded parsing via rayon for 500MB+ files with complex quoting
:streaming - Stateful chunked parser for unbounded files
:indexed - Two-phase index-then-extract for row range access
:basic - Simple byte-by-byte parsing for debugging

Encoding Support

:utf8 - UTF-8 (default, zero overhead)
:latin1 - ISO-8859-1 / Latin-1
{:utf16, :little} - UTF-16 Little Endian (Excel/Windows)
{:utf16, :big} - UTF-16 Big Endian
{:utf32, :little} - UTF-32 Little Endian
{:utf32, :big} - UTF-32 Big Endian

Validation

csv-spectrum acid test suite (12 tests)
csv-test-data RFC 4180 suite (17 tests)
PapaParse-inspired edge cases (53 tests)
Encoding conversion tests (20 tests)
Cross-strategy consistency validation
NimbleCSV output compatibility verification

← Previous Page Compliance & Validation

Next Page → License