Library Comparison

A comparison of Elixir character encoding libraries: encoding_rs, codepagex, and iconv.

Feature Comparison

| Feature | encoding_rs | codepagex | iconv |
| --- | --- | --- | --- |
| Implementation | Rust NIF | Pure Elixir | Erlang NIF (C) |
| Encoding Support | 40 encodings, 200+ aliases (WHATWG) | ~50 | System-dependent |
| Streaming API | ✅ Yes | ❌ No | ❌ No |
| Batch Operations | ✅ Yes | ❌ No | ❌ No |
| BOM Detection | ✅ Yes | ❌ No | ❌ No |
| Precompiled Binaries | ✅ Yes | N/A | ❌ No |
| Native Dependencies | Optional (Rust) | None | Required (libiconv) |
| WHATWG Compliant | ✅ Yes | ❌ No | ❌ No |
| Dirty Scheduler Support | ✅ Yes | N/A | ❌ No |

Benchmark Results

Run the benchmarks yourself by temporarily adding these dev dependencies to mix.exs:

# In deps(), add:
{:benchee, "~> 1.0", only: :dev},
{:benchee_html, "~> 1.0", only: :dev},
{:codepagex, "~> 0.1", only: :dev},
{:iconv, "~> 1.0", only: :dev}

Then run:

mix deps.get
mix run bench/comparison_bench.exs
open bench/output/*.html  # View interactive HTML reports (`open` is macOS-specific)

Methodology

Library versions tested: encoding_rs 0.2.0, codepagex 0.1.13, iconv 1.0.14

The benchmarks use encoding-specific character sets to ensure fair comparison:

  • iso-8859-1: 60% ASCII + 40% Latin-1 supplement (accented chars)
  • shift_jis: 40% ASCII + 30% Hiragana + 30% Katakana
  • utf-16le: 40% ASCII + 20% Latin-1 + 20% Hiragana + 20% CJK

This ensures all characters can be encoded without replacement, exercising realistic code paths.
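
The 60/40 ISO-8859-1 mix described above could be generated along these lines. This is a minimal sketch, not the code in bench/comparison_bench.exs: the module `BenchInput` and the function `latin1_mix/1` are hypothetical names, and the fixed repeating pattern is an assumption chosen for reproducibility (the real benchmark may sample characters differently).

```elixir
defmodule BenchInput do
  @moduledoc "Hypothetical helper; not taken from bench/comparison_bench.exs."

  # Lowercase ASCII letters, and Latin-1 supplement letters (U+00C0..U+00FF)
  # minus the multiplication and division signs.
  @ascii Enum.to_list(?a..?z)
  @latin1 Enum.to_list(0xC0..0xFF) -- [0xD7, 0xF7]

  @doc "Builds a UTF-8 string of `chars` characters: 60% ASCII, 40% Latin-1 supplement."
  def latin1_mix(chars) when is_integer(chars) and chars > 0 do
    # Every group of 5 characters holds 3 ASCII and 2 Latin-1 code points.
    [:ascii, :latin1, :ascii, :latin1, :ascii]
    |> Stream.cycle()
    |> Stream.with_index()
    |> Stream.map(fn
      {:ascii, i} -> Enum.at(@ascii, rem(i, length(@ascii)))
      {:latin1, i} -> Enum.at(@latin1, rem(i, length(@latin1)))
    end)
    |> Enum.take(chars)
    |> List.to_string()
  end
end

sample = BenchInput.latin1_mix(100)
IO.puts(sample)
```

Every code point in the result is at or below U+00FF, so the string can be encoded to ISO-8859-1 without replacement characters. Note that the length is counted in characters: the UTF-8 input is larger in bytes than the ISO-8859-1 output.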

Expected Performance Characteristics

  • encoding_rs: Fastest across all input sizes due to Rust's SIMD optimizations. Uses dirty schedulers for large data to avoid blocking the BEAM.

  • codepagex: Competitive for small inputs (~100 bytes) where NIF call overhead is significant. Slower for larger data due to pure Elixir implementation.

  • iconv: Slower than encoding_rs at every size tested (roughly 3x to 15x in the results below). The C NIF path adds more overhead than the Rust NIF approach.

Benchmark Results (Apple Silicon M1)

ISO-8859-1 (Western European) - All three libraries:

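| Operation | Input Size | encoding_rs | codepagex | iconv | encoding_rs vs others (codepagex / iconv) |
| --- | --- | --- | --- | --- | --- |
| Encode | 100 B | 426 ns | 531 ns | 2.2 μs | 1.2x / 5.2x faster |
| Encode | 10 KB | 20 μs | 144 μs | 152 μs | 7x faster |
| Encode | 1 MB | 5.6 ms | 15 ms | 15.6 ms | 2.7x faster |
| Decode | 100 B | 347 ns | 487 ns | 2.0 μs | 1.4x / 5.6x faster |
| Decode | 10 KB | 9.2 μs | 118 μs | 130 μs | 13-14x faster |
| Decode | 1 MB | 3.0 ms | 12.6 ms | 13.1 ms | 4.2-4.4x faster |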
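
The speedup column is the other library's time divided by the encoding_rs time, rounded to one decimal place. A small sketch of that arithmetic for the first ISO-8859-1 row follows; `Speedup.ratio/2` is a hypothetical helper, not part of any of the three libraries, and the inputs are the rounded timings from the table.

```elixir
defmodule Speedup do
  @doc "How many times faster `baseline_ns` is than `other_ns`, to one decimal place."
  def ratio(baseline_ns, other_ns) when baseline_ns > 0 do
    Float.round(other_ns / baseline_ns, 1)
  end
end

# Encode, 100 B: encoding_rs 426 ns, codepagex 531 ns, iconv 2.2 μs (2_200 ns)
IO.inspect(Speedup.ratio(426, 531))    # 1.2
IO.inspect(Speedup.ratio(426, 2_200))  # 5.2
```

Because the published timings are themselves rounded, recomputing other rows from the table can differ from the listed speedup in the last digit.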

Shift_JIS (Japanese) - encoding_rs vs iconv:

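| Operation | Input Size | encoding_rs | iconv | Speedup |
| --- | --- | --- | --- | --- |
| Encode | 100 B | 0.50 μs | 3.7 μs | 7.4x |
| Encode | 10 KB | 32 μs | 451 μs | 14x |
| Encode | 1 MB | 6.2 ms | 46 ms | 7.5x |
| Decode | 100 B | 0.35 μs | 2.3 μs | 6.5x |
| Decode | 10 KB | 13 μs | 196 μs | 15x |
| Decode | 1 MB | 3.4 ms | 21 ms | 6.3x |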

UTF-16LE - encoding_rs vs iconv:

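| Operation | Input Size | encoding_rs | iconv | Speedup |
| --- | --- | --- | --- | --- |
| Encode | 100 B | 0.31 μs | 1.8 μs | 5.8x |
| Encode | 10 KB | 7.7 μs | 116 μs | 15x |
| Encode | 1 MB | 2.8 ms | 11.9 ms | 4.2x |
| Decode | 100 B | 0.33 μs | 1.7 μs | 5.1x |
| Decode | 10 KB | 8.1 μs | 98 μs | 12x |
| Decode | 1 MB | 0.83 ms | 10.4 ms | 12.5x |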

Run mix run bench/comparison_bench.exs to generate results for your system.

Pros and Cons

encoding_rs

Pros:

  • Fastest performance - Rust NIF with SIMD optimizations
  • WHATWG compliant - Same behavior as web browsers
  • Streaming support - Handle chunked data with stateful decoder
  • Batch operations - Process multiple items efficiently
  • BOM detection - Automatic byte order mark handling
  • Firefox-tested - Battle-tested in Mozilla's browser
  • Precompiled binaries - No Rust toolchain needed for most platforms
  • Dirty scheduler aware - Won't block the BEAM with large data

Cons:

  • Requires precompiled binary or Rust toolchain
  • Larger dependency footprint than pure Elixir
  • NIF crashes can take down the BEAM VM
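
To make the BOM detection item above concrete, here is a minimal sketch of what byte order mark sniffing involves, written with plain binary pattern matching. `BomSniff` is a hypothetical module and does not show EncodingRs's own BOM API, which this page does not document. The three signatures checked are the UTF-8, UTF-16LE, and UTF-16BE BOMs; the returned label strings are an assumption chosen to match the labels used elsewhere on this page.

```elixir
defmodule BomSniff do
  @moduledoc "Hypothetical illustration; not the EncodingRs implementation."

  @doc "Returns {encoding_label_or_nil, data_without_bom}."
  def detect(<<0xEF, 0xBB, 0xBF, rest::binary>>), do: {"utf-8", rest}
  def detect(<<0xFF, 0xFE, rest::binary>>), do: {"utf-16le", rest}
  def detect(<<0xFE, 0xFF, rest::binary>>), do: {"utf-16be", rest}
  # No BOM: the caller has to fall back to a declared or default encoding.
  def detect(data) when is_binary(data), do: {nil, data}
end

IO.inspect(BomSniff.detect(<<0xEF, 0xBB, 0xBF, "hi">>))
```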

codepagex

Pros:

  • Pure Elixir - No native dependencies at all
  • Simple installation - Just add to mix.exs
  • Predictable behavior - No NIF edge cases
  • Safe - Can't crash the BEAM VM

Cons:

  • Significantly slower than NIF-based solutions
  • Limited encoding support (~50 encodings)
  • No streaming API for chunked data
  • No batch operations
  • Not WHATWG compliant

iconv

Pros:

  • Native speed - C-based implementation (though slower than encoding_rs in the benchmarks above)
  • Wide encoding support - Whatever system iconv supports
  • Mature - Well-tested libiconv library

Cons:

  • System dependency - Requires libiconv installed
  • No streaming API - Can't handle chunked data
  • Platform variance - Different behavior across systems
  • No precompiled binaries - Must compile on install
  • No dirty scheduler support - Can block BEAM with large data

When to Use Each Library

Use encoding_rs when:

  • Performance is critical
  • Processing large files or high throughput
  • Need streaming support for chunked data
  • Batch processing multiple encodings
  • WHATWG compliance matters (web content)
  • Processing CJK encodings (Shift_JIS, GBK, Big5, etc.)

Use codepagex when:

  • No native dependencies allowed
  • Only need basic Western encodings
  • Processing small amounts of data
  • Deployment environment is restrictive
  • BEAM stability is paramount

Use iconv when:

  • Need encodings not in WHATWG standard
  • Already have libiconv as a dependency
  • System-native behavior is preferred
  • Legacy system compatibility

API Comparison

Decoding

# encoding_rs
{:ok, utf8} = EncodingRs.decode(binary, "windows-1252")

# codepagex (shown with ISO-8859-1 rather than windows-1252)
utf8 = Codepagex.to_string!(binary, :iso_8859_1)

# iconv
utf8 = :iconv.convert("WINDOWS-1252", "UTF-8", binary)

Encoding

# encoding_rs
{:ok, encoded} = EncodingRs.encode(utf8, "windows-1252")

# codepagex (shown with ISO-8859-1 rather than windows-1252)
encoded = Codepagex.from_string!(utf8, :iso_8859_1)

# iconv
encoded = :iconv.convert("UTF-8", "WINDOWS-1252", utf8)

Streaming (encoding_rs only)

# Create decoder for chunked data
decoder = EncodingRs.Decoder.new("shift_jis")

# Process chunks (handles split multibyte characters)
{:ok, chunk1, decoder} = EncodingRs.Decoder.decode_chunk(decoder, data1)
{:ok, chunk2, decoder} = EncodingRs.Decoder.decode_chunk(decoder, data2)
{:ok, final} = EncodingRs.Decoder.finish(decoder)
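
Why a stateful decoder matters: a chunk boundary can fall in the middle of a multibyte character, so neither chunk is valid on its own. The sketch below illustrates the carry-over idea using only the standard library. It uses UTF-8 and Erlang's `:unicode` module as a stand-in for Shift_JIS, and `ChunkedUtf8` is a hypothetical module, not how `EncodingRs.Decoder` is implemented.

```elixir
defmodule ChunkedUtf8 do
  @moduledoc """
  Hypothetical stdlib-only helper. It carries an incomplete trailing byte
  sequence from one chunk to the next; it is not EncodingRs.Decoder.
  """

  @doc "Returns {decoded_text, carry} for one chunk, given the previous carry."
  def decode_chunk(carry, chunk) when is_binary(carry) and is_binary(chunk) do
    case :unicode.characters_to_binary(carry <> chunk, :utf8, :utf8) do
      # Whole input decoded; nothing to carry forward.
      decoded when is_binary(decoded) -> {decoded, <<>>}
      # Input ended mid-character; keep the partial bytes for the next chunk.
      {:incomplete, decoded, rest} -> {decoded, rest}
      {:error, _decoded, _rest} -> raise ArgumentError, "invalid UTF-8 in chunk"
    end
  end
end

# "日本" is 6 bytes in UTF-8; split it one byte into the second character.
<<chunk1::binary-size(4), chunk2::binary>> = "日本"

{out1, carry} = ChunkedUtf8.decode_chunk(<<>>, chunk1)
{out2, _carry} = ChunkedUtf8.decode_chunk(carry, chunk2)
IO.puts(out1 <> out2)
```

Decoding each chunk independently would fail here, since `String.valid?/1` is false for both halves. Threading the carry through each call plays the same role as threading `decoder` through the `decode_chunk` calls above.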

Batch Operations (encoding_rs only)

# Decode multiple items in one call
items = [
  {"data1", "windows-1252"},
  {"data2", "shift_jis"},
  {"data3", "utf-16le"}
]
results = EncodingRs.decode_batch(items)

Summary

| Priority | Recommended Library |
| --- | --- |
| Maximum performance | encoding_rs |
| No native dependencies | codepagex |
| System compatibility | iconv |
| Streaming/chunked data | encoding_rs |
| Web content processing | encoding_rs |
| Legacy system support | iconv |