# Library Comparison
A comparison of Elixir character encoding libraries: encoding_rs, codepagex, and iconv.
## Feature Comparison
| Feature | encoding_rs | codepagex | iconv |
|---|---|---|---|
| Implementation | Rust NIF | Pure Elixir | Erlang NIF (C) |
| Encoding Support | 40 encodings, 200+ aliases (WHATWG) | ~50 | System-dependent |
| Streaming API | ✅ Yes | ❌ No | ❌ No |
| Batch Operations | ✅ Yes | ❌ No | ❌ No |
| BOM Detection | ✅ Yes | ❌ No | ❌ No |
| Precompiled Binaries | ✅ Yes | N/A | ❌ No |
| Native Dependencies | Optional (Rust) | None | Required (libiconv) |
| WHATWG Compliant | ✅ Yes | ❌ No | ❌ No |
| Dirty Scheduler Support | ✅ Yes | N/A | ❌ No |
## Benchmark Results
Run the benchmarks yourself by temporarily adding these dev dependencies to `mix.exs`:
```elixir
# In deps(), add:
{:benchee, "~> 1.0", only: :dev},
{:benchee_html, "~> 1.0", only: :dev},
{:codepagex, "~> 0.1", only: :dev},
{:iconv, "~> 1.0", only: :dev}
```

Then run:

```shell
mix deps.get
mix run bench/comparison_bench.exs
open bench/output/*.html  # View interactive HTML reports
```
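For orientation, the comparison boils down to a Benchee run that times each library converting the same input. The sketch below is illustrative only and is not the contents of `bench/comparison_bench.exs`; the sample string, warmup/time values, and formatter options are placeholder assumptions.

```elixir
# Illustrative sketch only -- the sample data, run times, and output file
# are placeholders, not the real benchmark configuration.
latin1_bytes = :binary.copy("Caf\xE9 cr\xE8me br\xFBl\xE9e ", 500)

Benchee.run(
  %{
    "encoding_rs" => fn -> EncodingRs.decode(latin1_bytes, "iso-8859-1") end,
    "codepagex" => fn -> Codepagex.to_string!(latin1_bytes, :iso_8859_1) end,
    "iconv" => fn -> :iconv.convert("ISO-8859-1", "UTF-8", latin1_bytes) end
  },
  warmup: 1,
  time: 3,
  formatters: [
    Benchee.Formatters.Console,
    {Benchee.Formatters.HTML, file: "bench/output/comparison.html"}
  ]
)
```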
### Methodology
Library versions tested: encoding_rs 0.2.0, codepagex 0.1.13, iconv 1.0.14
The benchmarks use encoding-specific character sets to ensure fair comparison:
- iso-8859-1: 60% ASCII + 40% Latin-1 supplement (accented chars)
- shift_jis: 40% ASCII + 30% Hiragana + 30% Katakana
- utf-16le: 40% ASCII + 20% Latin-1 + 20% Hiragana + 20% CJK
This ensures all characters can be encoded without replacement, exercising realistic code paths.
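As an illustration of what such a mix looks like in practice, the snippet below builds a UTF-8 string with roughly the 60/40 ISO-8859-1 split described above. It is a sketch for reference, not the generator the benchmark actually uses.

```elixir
# Illustrative sketch: build a UTF-8 string that is ~60% ASCII letters and
# ~40% Latin-1 supplement characters, so every character survives an
# iso-8859-1 round trip without replacement.
ascii = Enum.map(?a..?z, &<<&1::utf8>>)
latin1_supplement = Enum.map(0x00C0..0x00FF, &<<&1::utf8>>)

build_sample = fn char_count ->
  1..char_count
  |> Enum.map(fn _ ->
    if :rand.uniform(10) <= 6,
      do: Enum.random(ascii),
      else: Enum.random(latin1_supplement)
  end)
  |> IO.iodata_to_binary()
end

sample = build_sample.(10_000)
```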
### Expected Performance Characteristics
- encoding_rs: Fastest across all input sizes due to Rust's SIMD optimizations. Uses dirty schedulers for large data to avoid blocking the BEAM.
- codepagex: Competitive for small inputs (~100 bytes), where NIF call overhead is significant. Slower for larger data due to the pure Elixir implementation.
- iconv: Consistently slower than encoding_rs. The C implementation adds more overhead than the Rust NIF approach.
### Benchmark Results (Apple Silicon M1)
ISO-8859-1 (Western European) - All three libraries:
| Operation | Input Size | encoding_rs | codepagex | iconv | encoding_rs speedup (vs codepagex / iconv) |
|---|---|---|---|---|---|
| Encode | 100 B | 426 ns | 531 ns | 2.2 μs | 1.2x / 5.2x faster |
| Encode | 10 KB | 20 μs | 144 μs | 152 μs | 7x faster |
| Encode | 1 MB | 5.6 ms | 15 ms | 15.6 ms | 2.7x faster |
| Decode | 100 B | 347 ns | 487 ns | 2.0 μs | 1.4x / 5.6x faster |
| Decode | 10 KB | 9.2 μs | 118 μs | 130 μs | 13-14x faster |
| Decode | 1 MB | 3.0 ms | 12.6 ms | 13.1 ms | 4.2-4.4x faster |
Shift_JIS (Japanese) - encoding_rs vs iconv:
| Operation | Input Size | encoding_rs | iconv | Speedup |
|---|---|---|---|---|
| Encode | 100 B | 0.50 μs | 3.7 μs | 7.4x |
| Encode | 10 KB | 32 μs | 451 μs | 14x |
| Encode | 1 MB | 6.2 ms | 46 ms | 7.5x |
| Decode | 100 B | 0.35 μs | 2.3 μs | 6.5x |
| Decode | 10 KB | 13 μs | 196 μs | 15x |
| Decode | 1 MB | 3.4 ms | 21 ms | 6.3x |
UTF-16LE - encoding_rs vs iconv:
| Operation | Input Size | encoding_rs | iconv | Speedup |
|---|---|---|---|---|
| Encode | 100 B | 0.31 μs | 1.8 μs | 5.8x |
| Encode | 10 KB | 7.7 μs | 116 μs | 15x |
| Encode | 1 MB | 2.8 ms | 11.9 ms | 4.2x |
| Decode | 100 B | 0.33 μs | 1.7 μs | 5.1x |
| Decode | 10 KB | 8.1 μs | 98 μs | 12x |
| Decode | 1 MB | 0.83 ms | 10.4 ms | 12.5x |
Run `mix run bench/comparison_bench.exs` to generate results for your system.
## Pros and Cons
### encoding_rs
Pros:
- Fastest performance - Rust NIF with SIMD optimizations
- WHATWG compliant - Same behavior as web browsers
- Streaming support - Handle chunked data with stateful decoder
- Batch operations - Process multiple items efficiently
- BOM detection - Automatic byte order mark handling
- Firefox-tested - Battle-tested in Mozilla's browser
- Precompiled binaries - No Rust toolchain needed for most platforms
- Dirty scheduler aware - Won't block the BEAM with large data
Cons:
- Requires precompiled binary or Rust toolchain
- Larger dependency footprint than pure Elixir
- NIF crashes can take down the BEAM VM
### codepagex
Pros:
- Pure Elixir - No native dependencies at all
- Simple installation - Just add to mix.exs
- Predictable behavior - No NIF edge cases
- Safe - Can't crash the BEAM VM
Cons:
- Significantly slower than NIF-based solutions
- Limited encoding support (~50 encodings)
- No streaming API for chunked data
- No batch operations
- Not WHATWG compliant
### iconv
Pros:
- Fast - C-based implementation
- Wide encoding support - Whatever system iconv supports
- Mature - Well-tested libiconv library
Cons:
- System dependency - Requires libiconv installed
- No streaming API - Can't handle chunked data
- Platform variance - Different behavior across systems
- No precompiled binaries - Must compile on install
- No dirty scheduler support - Can block BEAM with large data
## When to Use Each Library
Use encoding_rs when:
- Performance is critical
- Processing large files or high throughput
- Need streaming support for chunked data
- Batch processing multiple encodings
- WHATWG compliance matters (web content)
- Processing CJK encodings (Shift_JIS, GBK, Big5, etc.)
Use codepagex when:
- No native dependencies allowed
- Only need basic Western encodings
- Processing small amounts of data
- Deployment environment is restrictive
- BEAM stability is paramount
Use iconv when:
- Need encodings not in WHATWG standard
- Already have libiconv as a dependency
- System-native behavior is preferred
- Legacy system compatibility
## API Comparison

### Decoding
```elixir
# encoding_rs
{:ok, utf8} = EncodingRs.decode(binary, "windows-1252")

# codepagex
utf8 = Codepagex.to_string!(binary, :iso_8859_1)

# iconv
utf8 = :iconv.convert("WINDOWS-1252", "UTF-8", binary)
```

### Encoding
```elixir
# encoding_rs
{:ok, encoded} = EncodingRs.encode(utf8, "windows-1252")

# codepagex
encoded = Codepagex.from_string!(utf8, :iso_8859_1)

# iconv
encoded = :iconv.convert("UTF-8", "WINDOWS-1252", utf8)
```

### Streaming (encoding_rs only)
```elixir
# Create decoder for chunked data
decoder = EncodingRs.Decoder.new("shift_jis")

# Process chunks (handles split multibyte characters)
{:ok, chunk1, decoder} = EncodingRs.Decoder.decode_chunk(decoder, data1)
{:ok, chunk2, decoder} = EncodingRs.Decoder.decode_chunk(decoder, data2)
{:ok, final} = EncodingRs.Decoder.finish(decoder)
```
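To decode a file that arrives in fixed-size chunks, the same three calls can be folded over a byte stream. This is an illustrative sketch built only on the functions shown above; the file name and chunk size are placeholders.

```elixir
# Illustrative sketch: decode a large Shift_JIS file in 64 KB chunks.
# "input.sjis" and the chunk size are placeholder values.
decoder = EncodingRs.Decoder.new("shift_jis")

{parts, decoder} =
  "input.sjis"
  |> File.stream!([], 65_536)
  |> Enum.reduce({[], decoder}, fn chunk, {acc, dec} ->
    {:ok, utf8, dec} = EncodingRs.Decoder.decode_chunk(dec, chunk)
    {[utf8 | acc], dec}
  end)

# finish/1 flushes any bytes buffered from a multibyte character that was
# split across chunk boundaries.
{:ok, tail} = EncodingRs.Decoder.finish(decoder)
utf8_text = IO.iodata_to_binary(Enum.reverse([tail | parts]))
```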
### Batch Operations (encoding_rs only)

```elixir
# Decode multiple items in one call
items = [
  {"data1", "windows-1252"},
  {"data2", "shift_jis"},
  {"data3", "utf-16le"}
]

results = EncodingRs.decode_batch(items)
```

## Summary
| Priority | Recommended Library |
|---|---|
| Maximum performance | encoding_rs |
| No native dependencies | codepagex |
| System compatibility | iconv |
| Streaming/chunked data | encoding_rs |
| Web content processing | encoding_rs |
| Legacy system support | iconv |