RustyCSV Benchmarks


This document presents benchmark results comparing RustyCSV's parsing and encoding performance against pure Elixir (NimbleCSV 1.3.0).

Test Environment

  • Elixir: 1.19.4
  • OTP: 28
  • Hardware: Apple Silicon M1 Pro (10 cores, 16 GB RAM)
  • RustyCSV: 0.3.6
  • Pure Elixir baseline: NimbleCSV 1.3.0
  • Test date: February 2, 2026

Note: All results below were collected on this specific hardware. Absolute throughput numbers will vary on different machines, but relative speedups should be broadly representative.

Strategies Compared

| Strategy | Description | Best For |
| --- | --- | --- |
| :simd | SIMD scan + boundary-based sub-binary fields (default) | General use |
| :parallel | SIMD scan + rayon parallel boundary extraction + sub-binaries | Large files |
| :streaming | Bounded-memory chunks | Unbounded files |

Both batch strategies share a single-pass SIMD structural scanner that finds every unquoted separator and row ending; they then create BEAM sub-binary references into the original input. :simd extracts boundaries single-threaded; :parallel uses rayon across multiple threads, then builds sub-binary terms on the main thread.
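
The structural scan can be sketched in scalar Rust as follows. `scan_boundaries` is an illustrative name, not RustyCSV's actual code: the real scanner uses SIMD, and quote trimming/unescaping and \r\n handling happen in a later step.

```rust
/// Scalar sketch of the structural scan: walk the input once, track quote
/// state, and record each field as a (start, end) byte-offset pair.
/// Boundaries of quoted fields still include the surrounding quotes here;
/// unescaping is a separate pass in a real parser.
fn scan_boundaries(input: &[u8], sep: u8) -> Vec<Vec<(usize, usize)>> {
    let mut rows = Vec::new();
    let mut row = Vec::new();
    let mut field_start = 0usize;
    let mut in_quotes = false;
    for (i, &b) in input.iter().enumerate() {
        match b {
            b'"' => in_quotes = !in_quotes, // toggles on every quote; "" cancels out
            b'\n' if !in_quotes => {
                // unquoted newline ends the current field and row
                row.push((field_start, i));
                rows.push(std::mem::take(&mut row));
                field_start = i + 1;
            }
            _ if b == sep && !in_quotes => {
                // unquoted separator ends the current field
                row.push((field_start, i));
                field_start = i + 1;
            }
            _ => {}
        }
    }
    if field_start < input.len() {
        // trailing row without a final newline
        row.push((field_start, input.len()));
        rows.push(row);
    }
    rows
}
```

Each pair later becomes a BEAM sub-binary into the original input, which is why no per-field copies are needed.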

Throughput Benchmark Results

Simple CSV (334 KB, 10K rows, no quotes)

| Strategy | Throughput | vs pure Elixir |
| --- | --- | --- |
| RustyCSV (simd) | 772 ips | 3.5x faster |
| RustyCSV (parallel) | 487 ips | 2.1x faster |
| Pure Elixir | 233 ips | baseline |

Quoted CSV (947 KB, 10K rows, all fields quoted with escapes)

| Strategy | Throughput | vs pure Elixir |
| --- | --- | --- |
| RustyCSV (simd) | 449 ips | 18.6x faster |
| RustyCSV (parallel) | 326 ips | 13.3x faster |
| Pure Elixir | 25 ips | baseline |

Mixed/Realistic CSV (652 KB, 10K rows)

| Strategy | Throughput | vs pure Elixir |
| --- | --- | --- |
| RustyCSV (simd) | 497 ips | 5.2x faster |
| RustyCSV (parallel) | 351 ips | 3.5x faster |
| Pure Elixir | 101 ips | baseline |

Large CSV (6.82 MB, 100K rows)

| Strategy | Throughput | vs pure Elixir |
| --- | --- | --- |
| RustyCSV (simd) | 48.9 ips | 11.4x faster |
| RustyCSV (parallel) | 39.3 ips | 9.1x faster |
| Pure Elixir | 4.3 ips | baseline |

Very Large CSV (108 MB, 1.5M rows)

| Strategy | Throughput | vs pure Elixir |
| --- | --- | --- |
| RustyCSV (simd) | 2.5 ips | 12.7x faster |
| RustyCSV (parallel) | 2.07 ips | 8.6x faster |
| Pure Elixir | 0.24 ips | baseline |

Memory Comparison

RustyCSV allocates on the Rust side (boundary vectors during parsing) while pure Elixir allocates entirely on the BEAM. With the memory_tracking feature enabled, we measure Rust NIF peak allocation alongside Benchee's BEAM-side measurement.

Methodology

We measure two metrics:

  1. NIF Peak: Peak allocation on the Rust side during parsing (requires memory_tracking feature)
  2. BEAM Allocation: Memory allocated on the BEAM during parsing (what Benchee measures)
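
A peak-tracking NIF allocator can be sketched as below. This is an illustrative wrapper around the system allocator, not RustyCSV's actual memory_tracking code (which ships with mimalloc by default); the names are assumptions.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Running byte count and its high-water mark, updated on every (de)allocation.
static CURRENT: AtomicUsize = AtomicUsize::new(0);
static PEAK: AtomicUsize = AtomicUsize::new(0);

/// Illustrative peak-tracking allocator: delegates to the system allocator
/// while recording the peak number of live bytes.
struct PeakTracker;

unsafe impl GlobalAlloc for PeakTracker {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let now = CURRENT.fetch_add(layout.size(), Ordering::Relaxed) + layout.size();
        PEAK.fetch_max(now, Ordering::Relaxed); // record high-water mark
        unsafe { System.alloc(layout) }
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        CURRENT.fetch_sub(layout.size(), Ordering::Relaxed);
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static ALLOCATOR: PeakTracker = PeakTracker;

/// Peak bytes allocated so far — the "NIF Peak" style metric.
fn peak_bytes() -> usize {
    PEAK.load(Ordering::Relaxed)
}
```

The NIF would read the peak after a parse and reset it between runs; Benchee's memory measurement never sees these Rust-side bytes, which is why the two metrics are reported separately.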

RustyCSV's BEAM-side allocation is ~1.6 KB across all strategies — just list/tuple scaffolding. The parsed field data lives as sub-binary references into the original input binary (no per-field copy).

Decode Memory by File Type

| Scenario | Strategy | NIF Peak (RustyCSV) | BEAM (Pure Elixir) | Ratio |
| --- | --- | --- | --- | --- |
| Simple CSV (334 KB) | simd | 1.41 MB | 6.04 MB | 0.23x |
| Simple CSV (334 KB) | parallel | 1.48 MB | 6.04 MB | 0.25x |
| Quoted CSV (947 KB) | simd | 1.64 MB | 23.89 MB | 0.07x |
| Quoted CSV (947 KB) | parallel | 1.72 MB | 23.89 MB | 0.07x |
| Mixed CSV (652 KB) | simd | 1.63 MB | 9.64 MB | 0.17x |
| Large File (6.82 MB) | simd | 15.88 MB | 97.02 MB | 0.16x |
| Large File (6.82 MB) | parallel | 16.38 MB | 97.02 MB | 0.17x |
| Very Large (108 MB) | simd | 240.8 MB | 1407.75 MB | 0.17x |

Key insight: RustyCSV uses 4-14x less memory than pure Elixir. All batch strategies (including parallel) use boundary-based sub-binaries — on the Rust side, just Vec<Vec<(usize, usize)>> boundary indices (16 bytes per field). Parallel has a slightly higher NIF peak due to rayon thread-pool overhead.
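
The 16-bytes-per-field figure follows directly from the size of a (usize, usize) pair on 64-bit targets. The helper below and the ~10-fields-per-row figure are illustrative arithmetic, not the fixtures' actual column count:

```rust
/// Bytes of boundary index data for one parse: one (start, end)
/// offset pair per field, 2 x 8 bytes each on a 64-bit target.
fn boundary_bytes(rows: usize, fields_per_row: usize) -> usize {
    rows * fields_per_row * std::mem::size_of::<(usize, usize)>()
}
```

At 10K rows of ~10 fields this is about 1.6 MB of index data, which is in the same range as the ~1.4-1.7 MB NIF peaks for the 10K-row fixtures above.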

BEAM Reductions (Scheduler Work)

| Strategy | Reductions | vs pure Elixir |
| --- | --- | --- |
| RustyCSV (simd) | 10,500 | 24x fewer |
| RustyCSV (parallel) | 15,100 | 17x fewer |
| Pure Elixir | 254,500 | baseline |

What this means:

  • Low reductions = less scheduler overhead
  • NIFs run outside BEAM's reduction counting
  • Trade-off: NIFs can't be preempted mid-execution

Streaming Comparison

File: 6.8 MB (100K rows)

File.stream! Input (parse_stream/2)

| Parser | Mode | Time | Speedup |
| --- | --- | --- | --- |
| RustyCSV | line-based | 54ms | 2.2x faster |
| Pure Elixir | line-based | 117ms | baseline |
| RustyCSV | 64KB binary chunks | 244ms | unique capability |

Result: RustyCSV is 2.2x faster for line-based streaming.

RustyCSV automatically detects File.Stream in line mode and switches to 64KB binary chunk reads, reducing stream iterations from ~100K (one per line) to ~100. The Rust NIF handles arbitrary chunk boundaries internally, so it can operate on raw binary chunks rather than pre-split lines.
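
Chunk-boundary handling can be sketched as a parser that buffers the incomplete tail of each chunk. ChunkParser is a hypothetical type, and this simplification ignores quoted newlines, which the real parser must also carry across chunk boundaries:

```rust
/// Minimal sketch of chunk-tolerant streaming: buffer the bytes after the
/// last complete row and prepend them to the next chunk, so callers can
/// feed arbitrarily split binary chunks.
struct ChunkParser {
    carry: Vec<u8>, // bytes after the last complete row seen so far
}

impl ChunkParser {
    fn new() -> Self {
        ChunkParser { carry: Vec::new() }
    }

    /// Feed one chunk; returns every complete row it now contains.
    fn push(&mut self, chunk: &[u8]) -> Vec<Vec<String>> {
        self.carry.extend_from_slice(chunk);
        let mut rows = Vec::new();
        // Split off complete lines; whatever follows the last '\n' stays buffered.
        while let Some(nl) = self.carry.iter().position(|&b| b == b'\n') {
            let line: Vec<u8> = self.carry.drain(..=nl).collect();
            let line = &line[..line.len() - 1]; // drop the '\n'
            rows.push(
                line.split(|&b| b == b',')
                    .map(|f| String::from_utf8_lossy(f).into_owned())
                    .collect(),
            );
        }
        rows
    }
}
```

A final flush would emit any buffered trailing row once the stream ends.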

Arbitrary Binary Chunks

RustyCSV can also process arbitrary binary chunks directly (useful for network streams, compressed data, etc.). Pure Elixir parse_stream operates on line-delimited input, which is the standard approach when using File.stream!/1.

Real-World Benchmark: Amazon Settlement Reports

This section presents results from parsing Amazon SP-API settlement reports in TSV format.

Test Data

  • Data source: Amazon Seller Central settlement reports (TSV format)
  • Report sizes: 1KB to 2.6MB (20 to 15,820 rows)

Small Files (<200 rows)

| Rows | RustyCSV | Pure Elixir | String.split |
| --- | --- | --- | --- |
| 20 | 2ms | 2ms | 2ms |
| 24 | 2ms | 2ms | 2ms |
| 36 | 2ms | 2ms | 2ms |
| 93 | 2ms | 2ms | 2ms |
| 100 | 2ms | 2ms | 2ms |
| 141 | 2ms | 3ms | 3ms |

Conclusion: For small files, all approaches perform equivalently (~2ms).

Large Files (10K+ rows)

| Rows | RustyCSV | Pure Elixir | vs pure Elixir |
| --- | --- | --- | --- |
| 9,985 | 46ms | 64ms | 28% faster |
| 10,961 | 54ms | 68ms | 21% faster |
| 11,246 | 60ms | 69ms | 13% faster |
| 11,754 | 56ms | 78ms | 28% faster |
| 13,073 | 84ms | 96ms | 13% faster |

Conclusion: RustyCSV is consistently 13-28% faster than pure Elixir for large real-world files.

Summary

Speed Rankings by File Type

| File Type | Best Strategy | Speedup vs pure Elixir |
| --- | --- | --- |
| Simple CSV | :simd | 3.5x |
| Quoted CSV | :simd | 18.6x |
| Mixed CSV | :simd | 5.2x |
| Large CSV (7MB) | :simd | 11.4x |
| Very Large CSV (108MB) | :simd | 12.7x |
| Streaming (6.8MB) | parse_stream/2 | 2.2x |
| Real-world TSV | :simd | 1.1-1.3x |

Strategy Selection Guide

| Use Case | Recommended Strategy |
| --- | --- |
| Default / General use | :simd |
| Large files with many cores | :parallel |
| Streaming / Unbounded | parse_stream/2 |

Key Findings

  1. Quoted fields show largest gains — 18.6x faster due to SIMD prefix-XOR quote detection handling all escapes in a single pass

  2. 4-14x less memory than pure Elixir — Boundary-based parsing uses only 16 bytes per field (offset pairs) on the Rust side, then near-free sub-binary references on the BEAM side. Pure Elixir allocates full copies of every field.

  3. BEAM reductions are minimal — 17-24x fewer reductions than pure Elixir, reducing scheduler load (but NIFs can't be preempted)

  4. Streaming is 2.2x faster — RustyCSV auto-optimizes File.Stream to binary chunk mode. Also supports arbitrary binary chunks for non-file streams.

  5. Real-world vs synthetic — Synthetic benchmarks show 3.5-19x gains; real-world TSV shows 13-28% gains due to simpler data patterns.
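
The prefix-XOR quote detection from finding 1 can be sketched portably with a shift-XOR ladder; real SIMD parsers typically compute the same result with a carry-less multiply (CLMUL) by all-ones. The function name and carry convention here are illustrative:

```rust
/// Given a 64-bit mask of quote positions in a 64-byte block, compute a
/// mask whose bit i is 1 iff byte i lies in a quoted region (opening quote
/// included, closing quote excluded). Returns the updated carry: whether
/// the block ends inside an open quote, to seed the next block.
fn inside_quotes(quote_mask: u64, carry_in: bool) -> (u64, bool) {
    let mut m = quote_mask;
    // Shift-XOR ladder: after log2(64) doubling steps,
    // bit i = XOR of quote bits 0..=i (the prefix parity).
    m ^= m << 1;
    m ^= m << 2;
    m ^= m << 4;
    m ^= m << 8;
    m ^= m << 16;
    m ^= m << 32;
    if carry_in {
        m = !m; // a quote left open in the previous block flips everything
    }
    let carry_out = (m >> 63) & 1 == 1;
    (m, carry_out)
}
```

Separators and newlines found by the SIMD scan are then kept only where this mask is 0, which is how all escaped quotes are resolved in a single pass.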

Encoding Benchmark Results

dump_to_iodata returns a single flat binary. See the README for usage details and how this differs from pure Elixir.
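
The quoting rule an RFC 4180-style encoder applies can be sketched as below: quote a field only when it contains a separator, quote, or line break, and escape embedded quotes by doubling them. This is an illustrative sketch appended to one flat buffer, not RustyCSV's actual code:

```rust
/// Append one row to a flat output buffer, quoting fields only when needed.
fn encode_row(out: &mut Vec<u8>, row: &[&str], sep: u8) {
    for (i, field) in row.iter().enumerate() {
        if i > 0 {
            out.push(sep);
        }
        let needs_quoting = field
            .bytes()
            .any(|b| b == sep || b == b'"' || b == b'\n' || b == b'\r');
        if needs_quoting {
            out.push(b'"');
            for b in field.bytes() {
                if b == b'"' {
                    out.push(b'"'); // escape embedded quote by doubling it
                }
                out.push(b);
            }
            out.push(b'"');
        } else {
            out.extend_from_slice(field.as_bytes());
        }
    }
    out.push(b'\n');
}
```

Writing every row into one preallocated buffer is what yields a single flat binary rather than a deep iodata tree.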

Throughput

| Scenario | Output Size | RustyCSV ips | Pure Elixir ips | Speedup |
| --- | --- | --- | --- | --- |
| Plain UTF-8 — DB export (10K rows × 8 cols) | 709 KB | 638.9 | 253.3 | 2.5x |
| Plain UTF-8 — DB export (100K rows × 8 cols) | 7.1 MB | 65.9 | 18.3 | 3.6x |
| Plain UTF-8 — User content (10K rows, heavy quoting) | 955 KB | 717.4 | 140.7 | 5.1x |
| Plain UTF-8 — Wide table (10K rows × 50 cols) | 2.9 MB | 141.7 | 32.4 | 4.4x |
| Formula UTF-8 — DB export (10K rows) | 709 KB | 582.7 | 181.1 | 3.2x |
| Formula UTF-8 — Formula-heavy (10K rows, ~40% trigger) | 484 KB | 964.7 | 285.3 | 3.4x |
| UTF-16 LE — DB export (10K rows) | 1.4 MB | 379.9 | 12.1 | 31.5x |
| Formula + UTF-16 LE — Formula-heavy (10K rows) | 964 KB | 565.3 | 19.5 | 28.9x |

Memory

| Scenario | NIF Peak (RustyCSV) | BEAM (Pure Elixir) | Ratio |
| --- | --- | --- | --- |
| Plain UTF-8 — DB export (10K rows) | 1.5 MB | 5.1 MB | 0.3x |
| Plain UTF-8 — DB export (100K rows) | 12.0 MB | 51.4 MB | 0.2x |
| Plain UTF-8 — User content (heavy quoting) | 1.5 MB | 5.9 MB | 0.3x |
| Plain UTF-8 — Wide table (50 cols) | 6.0 MB | 30.5 MB | 0.2x |
| Formula UTF-8 — DB export (10K rows) | 1.5 MB | 8.2 MB | 0.2x |
| Formula UTF-8 — Formula-heavy | 769 KB | 5.4 MB | 0.1x |
| UTF-16 LE — DB export (10K rows) | 3.0 MB | 52.8 MB | 0.1x |
| Formula + UTF-16 LE — Formula-heavy | 1.5 MB | 37.4 MB | 0.04x |

RustyCSV's BEAM-side allocation is 80 bytes across all scenarios. NIF peak memory is proportional to the output size.

Encoding Summary

| Encoding Path | Speedup vs pure Elixir | Memory Ratio |
| --- | --- | --- |
| Plain UTF-8 | 2.5–5.1x faster | 0.2–0.3x |
| Formula UTF-8 | 3.2–3.4x faster | 0.1–0.2x |
| UTF-16 LE | 31.5x faster | 0.1x |
| Formula + UTF-16 LE | 28.9x faster | 0.04x |

Non-UTF-8 encoding shows the largest gains due to single-pass encoding of the entire output buffer.
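
That single pass amounts to one linear transcoding of the finished output buffer. A minimal sketch, with an illustrative function name:

```rust
/// Transcode a UTF-8 string to UTF-16 LE bytes in one linear pass.
fn to_utf16_le(s: &str) -> Vec<u8> {
    // 2 bytes per code unit; exact for BMP-only text, a lower bound otherwise.
    let mut out = Vec::with_capacity(s.len() * 2);
    for unit in s.encode_utf16() {
        out.extend_from_slice(&unit.to_le_bytes()); // little-endian code units
    }
    out
}
```

Doing this once over the whole buffer avoids a per-field conversion round-trip, which is where the 28-31x gap against the pure Elixir path comes from.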

Running the Benchmarks

```shell
# Decode benchmark (all strategies)
mix run bench/decode_bench.exs

# Encode benchmark (all encoding paths)
mix run bench/encode_bench.exs
```

For memory tracking details, enable the memory_tracking feature:

```toml
# In native/rustycsv/Cargo.toml
[features]
default = ["mimalloc", "memory_tracking"]
```

Then rebuild with `FORCE_RUSTYCSV_BUILD=true mix compile --force`.