RustyCSV takes correctness seriously. With 464 tests across multiple suites, including industry-standard validation suites shared by CSV parsers in many languages, RustyCSV is one of the most thoroughly tested CSV libraries available for Elixir.
This document describes RFC 4180 compliance and the validation methodology.
## RFC 4180 Compliance

`RustyCSV.RFC4180` is fully compliant with RFC 4180 (Common Format and MIME Type for Comma-Separated Values).
### RFC 4180 Requirements
| Section | Requirement | Status |
|---|---|---|
| 2.1 | Records separated by line breaks (CRLF) | ✅ Accepts CRLF and LF; outputs CRLF |
| 2.2 | Last record may or may not have trailing line break | ✅ |
| 2.3 | Optional header line | ✅ Via `skip_headers:` and `headers:` options |
| 2.4 | Each record should have same number of fields | ✅ Parses variable-width rows |
| 2.5 | Spaces are part of the field | ✅ Preserved exactly |
| 2.6 | Fields may be enclosed in double quotes | ✅ |
| 2.6 | Fields containing CRLF must be quoted | ✅ |
| 2.6 | Fields containing double quotes must be quoted | ✅ |
| 2.6 | Fields containing commas must be quoted | ✅ |
| 2.7 | Double quotes escaped by doubling ("") | ✅ |
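The quoting rules in sections 2.5 through 2.7 can be seen together in a single row. This is a sketch, assuming the NimbleCSV-style API shown later in this document:

```elixir
# Sketch of requirements 2.5–2.7: leading/trailing spaces preserved,
# commas kept inside quoted fields, doubled quotes unescaped to one quote.
alias RustyCSV.RFC4180, as: CSV

CSV.parse_string(~s(" a ","he said ""hi""","1,2"\r\n), skip_headers: false)
#=> [[" a ", "he said \"hi\"", "1,2"]]
```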
### Line Ending Behavior

Parsing:

- Accepts both CRLF (`\r\n`) and LF (`\n`) as record separators
- Preserves embedded CRLF/LF inside quoted fields exactly as-is

Dumping:

- Uses CRLF (`\r\n`) as the record separator (RFC 4180 compliant)
- Matches NimbleCSV.RFC4180 output exactly
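A short sketch of both rules, assuming the NimbleCSV-style API documented later:

```elixir
# LF and CRLF inputs parse to the same rows; dumping always emits CRLF.
alias RustyCSV.RFC4180, as: CSV

rows = CSV.parse_string("a,b\nc,d\n", skip_headers: false)
^rows = CSV.parse_string("a,b\r\nc,d\r\n", skip_headers: false)
#=> [["a", "b"], ["c", "d"]]

CSV.dump_to_iodata(rows) |> IO.iodata_to_binary()
#=> "a,b\r\nc,d\r\n"
```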
### Differences from Strict RFC 4180
RustyCSV makes one practical concession shared by most CSV implementations:
- Accepts LF line endings - RFC 4180 specifies CRLF, but LF-only files are common on Unix systems. RustyCSV parses both.
### Bare Carriage Return (`\r`)

A bare `\r` not followed by `\n` is treated as field data, not a line ending. This matches:

- The RFC 4180 ABNF grammar (bare `\r` is not in `TEXTDATA`; it is only valid inside quoted fields)
- NimbleCSV (only `\r\n` and `\n` are line endings)
- Go `encoding/csv`, Ruby CSV, PostgreSQL COPY

Python's `csv` module differs: it treats a bare `\r` as a line ending via universal newline handling.
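Concretely, this is the behavior described above, sketched against the assumed RustyCSV API:

```elixir
# A bare \r (not followed by \n) stays inside the field rather than
# terminating the record.
RustyCSV.RFC4180.parse_string("a\rb,c\n", skip_headers: false)
#=> [["a\rb", "c"]]
```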
## Industry Test Suites
RustyCSV validates correctness against two industry-standard CSV test suites.
### csv-spectrum (Acid Test)
Source: https://github.com/max-mapper/csv-spectrum
The csv-spectrum suite is a widely-used "acid test" for CSV parsers, providing CSV files with JSON expected outputs for verification.
Note: The csv-spectrum repository's raw files have LF line endings due to git normalization. Our test fixtures match the actual content served by GitHub, and we verify that both RustyCSV and NimbleCSV produce identical output for these files.
| Test File | Edge Case | Status |
|---|---|---|
| simple.csv | Basic parsing | ✅ |
| simple_crlf.csv | CRLF line endings | ✅ |
| comma_in_quotes.csv | Commas inside quoted fields | ✅ |
| escaped_quotes.csv | Doubled quotes ("" → ") | ✅ |
| newlines.csv | LF inside quoted fields | ✅ |
| newlines_crlf.csv | CRLF inside quoted fields | ✅ |
| quotes_and_newlines.csv | Combined edge cases | ✅ |
| empty.csv | Headers only (LF) | ✅ |
| empty_crlf.csv | Headers only (CRLF) | ✅ |
| utf8.csv | Unicode content | ✅ |
| json.csv | JSON-like content in fields | ✅ |
| location_coordinates.csv | Numeric/coordinate data | ✅ |
Test file: `test/csv_spectrum_test.exs`
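The shape of one such check can be sketched as follows. The fixture paths follow the layout in the Test Fixtures section; the JSON decoder (Jason) and the `headers: true` option value are assumptions:

```elixir
# Sketch of a single csv-spectrum verification: parse the CSV fixture and
# compare it against the suite's JSON expectation (a list of maps keyed by
# the header row).
name = "comma_in_quotes"
csv = File.read!("test/fixtures/csv-spectrum/#{name}.csv")
expected = Jason.decode!(File.read!("test/fixtures/csv-spectrum/#{name}.json"))

^expected = RustyCSV.RFC4180.parse_string(csv, headers: true)
```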
### csv-test-data (RFC 4180 Focused)
Source: https://github.com/sineemore/csv-test-data
A comprehensive RFC 4180-focused test suite with both valid and invalid CSV cases.
#### Valid Cases
| Test File | Edge Case | Status |
|---|---|---|
| simple-lf.csv | Basic with LF endings | ✅ |
| simple-crlf.csv | Basic with CRLF endings | ✅ |
| quotes-with-comma.csv | Commas in quoted fields | ✅ |
| quotes-with-escaped-quote.csv | Escaped quotes | ✅ |
| quotes-with-newline.csv | Newlines in quoted fields | ✅ |
| quotes-with-space.csv | Spaces in quoted fields | ✅ |
| quotes-empty.csv | Empty quoted fields | ✅ |
| empty-field.csv | Empty unquoted fields | ✅ |
| one-column.csv | Single column | ✅ |
| empty-one-column.csv | Single empty column | ✅ |
| leading-space.csv | Leading spaces preserved | ✅ |
| trailing-space.csv | Trailing spaces preserved | ✅ |
| trailing-newline.csv | File ends with newline | ✅ |
| utf8.csv | UTF-8 encoded content | ✅ |
| header-simple.csv | Basic with header row | ✅ |
| header-no-rows.csv | Headers only, no data | ✅ |
| all-empty.csv | All empty fields | ✅ |
Test file: `test/rfc4180_test_data_test.exs`
### Edge Case Tests (PapaParse-inspired)
Source: https://github.com/mholt/PapaParse/blob/master/tests/test-cases.js
A comprehensive edge case test suite inspired by PapaParse, covering malformed input, unusual delimiters, and stress testing.
| Category | Test Cases |
|---|---|
| Basic parsing | Empty input, single field, delimiter-only |
| Whitespace | Edges, tabs, quoted whitespace |
| Quoted fields | Delimiters, newlines, escaped quotes |
| Empty fields | Leading, trailing, consecutive |
| Line endings | LF, CRLF, mixed, no trailing |
| Field counts | Ragged rows, single/many columns |
| Unicode | UTF-8, emoji, mixed scripts, BOM |
| Special chars | Null bytes, control chars, backslash |
| Large data | 100K char fields, 1000 rows, 500 columns |
| Strategy consistency | All strategies produce identical output |
Test file: `test/edge_cases_test.exs`
## Cross-Strategy Validation
All parsing strategies must produce identical output for the same input. This is verified by running every test file through all six strategies:
| Strategy | Description | Validates Against |
|---|---|---|
| `:basic` | SIMD scan + basic field extraction | All test suites |
| `:simd` | SIMD structural scanner (default) | All test suites |
| `:indexed` | SIMD scan + two-phase index-then-extract | All test suites |
| `:parallel` | SIMD scan + multi-threaded via rayon | All test suites |
| `:zero_copy` | SIMD scan + sub-binary references | All test suites |
| `:streaming` | Stateful chunked parser | All test suites |
```elixir
# From test/csv_spectrum_test.exs
for strategy <- [:basic, :simd, :indexed, :parallel] do
  test "all tests pass with #{strategy} strategy" do
    for name <- test_files do
      result = CSV.parse_string(csv, strategy: strategy)
      assert result == expected
    end
  end
end
```

## NimbleCSV Compatibility
RustyCSV is designed as a drop-in replacement for NimbleCSV. Compatibility is verified by:
- API compatibility tests - All NimbleCSV API functions work identically
- Output matching - `dump_to_iodata/1` produces identical output to NimbleCSV
- Round-trip tests - Parse → dump → parse produces identical data
- Full-file validation - A 100K-row CSV parsed through both libraries produces identical row-by-row output
Test file: `test/nimble_csv_compat_test.exs`
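The round-trip property can be sketched as follows, assuming the NimbleCSV-style API:

```elixir
# Round-trip sketch: parse → dump → parse returns the original rows,
# including fields that require quoting (embedded commas and quotes).
alias RustyCSV.RFC4180, as: CSV

rows = [["a", "b,c"], ["d \"q\"", "e"]]
binary = rows |> CSV.dump_to_iodata() |> IO.iodata_to_binary()
^rows = CSV.parse_string(binary, skip_headers: false)
```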
```elixir
# Verify dump output matches NimbleCSV exactly
test "dump output matches NimbleCSV" do
  data = [["a", "b"], ["1", "2"]]

  assert RustyCSV.RFC4180.dump_to_iodata(data) ==
           NimbleCSV.RFC4180.dump_to_iodata(data)
end
```

### Known Behavioral Differences
Any input NimbleCSV accepts, RustyCSV also accepts with identical output. Two edge cases differ:
**1. Unquoted fields containing double-quote characters**

Input: ` "a",b\n` (note the space before the opening quote)

| Parser | Behavior |
|---|---|
| NimbleCSV | Raises `ParseError` |
| RustyCSV | Parses as `[" \"a\"", "b"]` |
Both behaviors are defensible for malformed input. NimbleCSV treats any `"` as significant. RustyCSV treats a field as quoted only if it starts with the escape character, matching Python's `csv` module and Go's `encoding/csv` (with `LazyQuotes`).
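A sketch of the lazy-quoting rule, assuming the RustyCSV API described above:

```elixir
# A field is treated as quoted only when it *starts* with the escape
# character; otherwise any quotes inside it are kept as literal data.
alias RustyCSV.RFC4180, as: CSV

CSV.parse_string(~s("a",b\n), skip_headers: false)
#=> [["a", "b"]]

CSV.parse_string(~s( "a",b\n), skip_headers: false)
#=> [[" \"a\"", "b"]]
```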
**2. `parse_stream/2` with non-line-delimited chunks**

The two libraries use different streaming architectures. NimbleCSV's `parse_stream` expects each element of the input enumerable to be a complete line. RustyCSV's streaming parser accepts arbitrary chunk boundaries because the Rust NIF maintains parse state across `feed()` calls.

This difference is invisible for the standard use case (`File.stream! |> parse_stream`), where both produce identical output. It only surfaces when manually constructing a stream of non-line-delimited binary chunks.
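The chunk-boundary behavior can be sketched as follows. That RustyCSV accepts arbitrary boundaries is the documented behavior; the exact call shape here is an assumption:

```elixir
# Chunks split mid-record: the streaming parser buffers partial records
# across chunk boundaries and still yields complete rows.
chunks = ["a,b\r\nc,", "d\r\ne,f", "\r\n"]

chunks
|> RustyCSV.RFC4180.parse_stream(skip_headers: false)
|> Enum.to_list()
#=> [["a", "b"], ["c", "d"], ["e", "f"]]
```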
## Running Compliance Tests

```bash
# Run all tests including compliance suites
mix test

# Run only compliance tests
mix test test/csv_spectrum_test.exs test/rfc4180_test_data_test.exs

# Run with specific strategy
mix test --only strategy:parallel
```
## Test Fixtures

Test fixtures are stored in `test/fixtures/`:

```
test/fixtures/
├── csv-spectrum/       # csv-spectrum acid test suite
│   ├── *.csv           # CSV test files
│   └── *.json          # Expected JSON outputs
└── csv-test-data/      # RFC 4180 test suite
    ├── *.csv           # Valid/invalid CSV files
    └── *.json          # Expected outputs
```

## Test Summary
| Suite | Tests | Purpose |
|---|---|---|
| Core tests | 36 | Basic functionality and NimbleCSV compatibility |
| csv-spectrum | 17 | Industry acid test |
| csv-test-data | 23 | RFC 4180 compliance |
| Edge cases | 53 | Stress testing and malformed input |
| Encoding | 20 | UTF-16, UTF-32, Latin-1 conversion |
| Multi-separator | 19 | Multiple single-byte separator support |
| Multi-byte separator | 13 | Multi-byte separator support (::, ||, mixed) |
| Multi-byte escape | 12 | Multi-byte escape support ($$) |
| Native API | 40 | NIF-level separator/escape encoding |
| Headers-to-maps | 97 | headers: option, cross-strategy consistency, stream parity |
| Custom newlines | 18 | Custom newline parsing and streaming |
| Streaming safety | 12 | Buffer overflow, mutex poisoning, concurrent access |
| Concurrent access | 7 | Multi-process streaming safety |
| Total | 464 | |
## Additional Test Resources
The following resources provide additional CSV test cases that may be valuable for future validation:
### W3C CSVW Test Suite
The W3C CSV on the Web (CSVW) test suite contains 550+ tests for CSV validation and conversion to JSON/RDF. While focused on metadata and semantic representation, the parsing tests are valuable.
### csv-fuzz (Fuzzing)
Fuzzing-based testing using Jazzer to find crashes, exceptions, and memory issues.
## References
- RFC 4180 - Common Format and MIME Type for CSV
- csv-spectrum - CSV acid test suite
- csv-test-data - RFC 4180 test data
- PapaParse - JavaScript CSV parser with comprehensive test suite
- W3C CSVW Tests - W3C CSV on the Web test suite
- NimbleCSV - Elixir CSV library (compatibility target)