Comprehensive benchmarks comparing RustyJson vs Jason across synthetic and real-world datasets.
Key Findings
- Fast across all workloads — plain data, struct-heavy data, and decoding (including deeply nested and small payloads)
- Encoding plain data shows the largest gains — 3-6x faster, 2-3x less memory
- Struct encoding optimized in v0.3.3 via single-pass iodata pipeline with compile-time codegen (~2x improvement over v0.3.2)
- Deep-nested decode optimized in v0.3.3 via single-entry fast path (~27% faster than v0.3.2 for 100-level nested JSON)
- Larger payloads = bigger advantage — real-world 10 MB files show better results than synthetic benchmarks
- BEAM scheduler load dramatically reduced — 100-28,000x fewer reductions
Test Environment
| Attribute | Value |
|---|---|
| OS | macOS |
| CPU | Apple M1 Pro |
| Cores | 10 |
| Memory | 16 GB |
| Elixir | 1.19.4 |
| Erlang/OTP | 28.2 |
Real-World Benchmarks: Amazon Settlement Reports
These are production JSON files from Amazon SP-API settlement reports, representing real-world API response patterns with nested objects, arrays of transactions, and mixed data types.
Encoding Performance (Elixir → JSON)
| File Size | RustyJson | Jason | Speed | Memory |
|---|---|---|---|---|
| 10.87 MB | 24 ms | 131 ms | 5.5x faster | 2.7x less |
| 9.79 MB | 21 ms | 124 ms | 5.9x faster | 2-3x less |
| 9.38 MB | 21 ms | 104 ms | 5.0x faster | 2-3x less |
Decoding Performance (JSON → Elixir)
| File Size | RustyJson | Jason | Speed | Memory |
|---|---|---|---|---|
| 10.87 MB | 61 ms | 152 ms | 2.5x faster | similar |
| 9.79 MB | 55 ms | 134 ms | 2.4x faster | similar |
| 9.38 MB | 50 ms | 119 ms | 2.4x faster | similar |
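These timings can be sanity-checked on any large JSON file with a small `:timer.tc` harness. A minimal sketch (the file path is a placeholder, not part of the repo):

```elixir
# Minimal sketch: time one encode and one decode of a large payload.
# "path/to/report.json" is a placeholder for any large JSON file you have locally.
json = File.read!("path/to/report.json")
data = RustyJson.decode!(json)

{encode_us, _result} = :timer.tc(fn -> RustyJson.encode!(data) end)
{decode_us, _result} = :timer.tc(fn -> RustyJson.decode!(json) end)

IO.puts("encode: #{div(encode_us, 1000)} ms, decode: #{div(decode_us, 1000)} ms")
```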
BEAM Reductions (Scheduler Load)
| File Size | RustyJson | Jason | Reduction |
|---|---|---|---|
| 10.87 MB encode | 404 | 11,570,847 | 28,641x fewer |
This is the most dramatic difference - RustyJson offloads virtually all work to native code.
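Reductions consumed by a single call can be sampled with `:erlang.process_info/2`. A minimal sketch, not the exact harness behind the table above (the file path is again a placeholder):

```elixir
# Minimal sketch: count reductions burned in the current process by one encode call.
data = RustyJson.decode!(File.read!("path/to/report.json"))

count_reductions = fn fun ->
  {:reductions, before} = :erlang.process_info(self(), :reductions)
  fun.()
  {:reductions, after_reds} = :erlang.process_info(self(), :reductions)
  after_reds - before
end

IO.inspect(count_reductions.(fn -> RustyJson.encode!(data) end), label: "RustyJson")
IO.inspect(count_reductions.(fn -> Jason.encode!(data) end), label: "Jason")
```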
Synthetic Benchmarks: nativejson-benchmark
Using standard datasets from nativejson-benchmark:
| Dataset | Size | Description |
|---|---|---|
| canada.json | 2.1 MB | Geographic coordinates (number-heavy) |
| citm_catalog.json | 1.6 MB | Event catalog (mixed types) |
| twitter.json | 617 KB | Social media with CJK (unicode-heavy) |
Decode Performance (JSON → Elixir)
| Input | RustyJson (ips) | Average time |
|---|---|---|
| canada.json (2.1 MB) | 153 | 6.55 ms |
| citm_catalog.json (1.6 MB) | 323 | 3.09 ms |
| twitter.json (617 KB) | 430 | 2.33 ms |
| large_list (50k items, 2.3 MB) | 62 | 16.0 ms |
| deep_nested (1.1 KB, 100 levels) | 148K | 6.75 µs |
| wide_object (75 KB, 5k keys) | 1,626 | 0.61 ms |
Roundtrip Performance (Decode + Encode)
| Input | RustyJson | Jason | Speedup |
|---|---|---|---|
| canada.json | 14 ms | 48 ms | 3.4x faster |
| citm_catalog.json | 6 ms | 14 ms | 2.5x faster |
| twitter.json | 4 ms | 9 ms | 2.3x faster |
BEAM Reductions by Dataset
| Dataset | RustyJson | Jason | Ratio |
|---|---|---|---|
| canada.json | ~3,500 | ~964,000 | 275x fewer |
| citm_catalog.json | ~300 | ~621,000 | 2,000x fewer |
| twitter.json | ~2,000 | ~511,000 | 260x fewer |
Struct Encoding Benchmarks (v0.3.3+)
Encoding data that contains Elixir structs (e.g., @derive RustyJson.Encoder or custom defimpl) follows a different path than plain maps and lists. Structs require the RustyJson.Encoder protocol to convert them to JSON-serializable forms.
In v0.3.3, the struct encoding pipeline was rewritten from a three-pass approach (protocol dispatch → fragment resolution → NIF serialization) to a single-pass iodata pipeline with compile-time codegen for derived structs. This closed the last remaining performance gap, making RustyJson faster across all encoding workloads.
Struct Encoding Performance
| Workload | Speedup (v0.3.3 vs v0.3.2) |
|---|---|
| Derived struct (5 fields) | ~2x faster |
| Derived struct (10 fields) | ~2x faster |
| Custom encoder (returning Encode.map) | ~2.5x faster |
| List of 1,000 derived structs | ~2x faster |
| Nested structs (3 levels deep) | ~2x faster |
Measured with protocol consolidation enabled (MIX_ENV=prod), which is the default for production builds.
How It Works
RustyJson's struct encoding produces iodata in a single pass:
- Derived encoders (`@derive RustyJson.Encoder`) generate compile-time iodata templates with pre-escaped keys: no runtime `Map.from_struct`, `Map.to_list`, or key escaping.
- Map/List impls detect struct-containing data and route through `Encode.map/2` / `Encode.list/2` to build iodata directly, wrapped in a `Fragment`.
- NIF bypass: when the top-level result is an iodata `Fragment` (no pretty-print or compression), `IO.iodata_to_binary/1` is used directly, avoiding Erlang↔Rust term conversion entirely.
For plain data (no structs), encoding still uses the fast Rust NIF path unchanged.
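For concreteness, a derived encoder and a custom impl might look like the sketch below. The `encode/2` callback shape and the `RustyJson.Encode.map/2` call are assumptions based on the description above; check the RustyJson.Encoder documentation for the exact API.

```elixir
# Derived struct: keys are pre-escaped into a compile-time iodata template.
defmodule Order do
  @derive RustyJson.Encoder
  defstruct [:id, :total, :currency]
end

# Custom encoder sketch (hypothetical callback shape; see the RustyJson.Encoder docs).
defmodule Money do
  defstruct [:amount, :currency]
end

defimpl RustyJson.Encoder, for: Money do
  def encode(%Money{amount: amount, currency: currency}, opts) do
    # Builds iodata directly, as described in the struct-encoding pipeline above.
    RustyJson.Encode.map(%{"amount" => amount, "currency" => currency}, opts)
  end
end

RustyJson.encode!(%Order{id: 1, total: 99, currency: "USD"})
```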
Why Encoding Shows Bigger Gains
iolist Encoding Pattern (Pure Elixir)

```
encode(data)
  → allocate "{" binary
  → allocate "\"key\"" binary
  → allocate ":" binary
  → allocate "\"value\"" binary
  → allocate list cells to link them
  → return iolist (many BEAM allocations)
```

RustyJson's Encoding Pattern (NIF)

```
encode(data)
  → [Rust: walk terms, write to single buffer]
  → copy buffer to BEAM binary
  → return binary (one BEAM allocation)
```

Pure-Elixir encoders create many small BEAM allocations. RustyJson creates one.
Why Decoding Memory is Similar
Both libraries produce identical Elixir data structures when decoding. The resulting maps, lists, and strings take the same space regardless of which library created them.
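One way to see this: with their default string-key decoding, both libraries return the same terms for the same input, so the decoded result occupies the same heap space regardless of which library built it.

```elixir
# Both decoders yield identical maps, lists, and binaries for the same JSON
# (assuming default options on both sides).
json = ~s({"id": 1, "tags": ["a", "b"], "meta": {"ok": true}})
true = RustyJson.decode!(json) == Jason.decode!(json)
```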
Why Benchee Memory Measurements Don't Work for NIFs
Important: Benchee's memory_time option gives misleading results for NIF-based libraries.
What Benchee Reports (Incorrect)
| Library | Memory |
|-----------|-----------|
| RustyJson | 0.00169 MB |
| Jason | 20.27 MB |

This suggests 12,000x less memory - which is wrong.
Why This Happens
Benchee measures memory using :erlang.memory/0, which only tracks BEAM allocations:
- BEAM process heap
- BEAM binary space
- ETS tables
RustyJson allocates memory in Rust via mimalloc, completely invisible to BEAM tracking. The 0.00169 MB is just NIF call overhead.
How We Measure Instead
We use :erlang.memory(:total) delta in isolated spawned processes:
```elixir
spawn(fn ->
  :erlang.garbage_collect()
  before = :erlang.memory(:total)
  # Keep the results bound so their allocations are still live at the second reading
  _results = for _ <- 1..10, do: RustyJson.encode!(data)
  after_mem = :erlang.memory(:total)
  # Report (after_mem - before) / 10
end)
```

This captures BEAM allocations during the operation. For total system memory (including NIF), we verified with RSS measurements that Rust adds only ~1-2 MB temporary overhead.
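The RSS cross-check can be done with an ordinary `ps` call against the running BEAM. A minimal sketch (macOS/Linux; `data` is whichever payload you are encoding):

```elixir
# Minimal sketch: sample OS-level resident set size (RSS), which includes
# Rust/mimalloc allocations that BEAM-side memory reporting cannot see.
rss_kb = fn ->
  {out, 0} = System.cmd("ps", ["-o", "rss=", "-p", System.pid()])
  out |> String.trim() |> String.to_integer()
end

before = rss_kb.()
_encoded = RustyJson.encode!(data)
IO.puts("RSS delta: #{rss_kb.() - before} KB")
```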
Actual Memory Comparison
For a 10 MB settlement report encode:
| Metric | RustyJson | Jason |
|---|---|---|
| BEAM memory | 6.7 MB | 17.9 MB |
| NIF overhead | ~1-2 MB | N/A |
| Total | ~8 MB | ~18 MB |
| Ratio | 2-3x less | — |
Running Benchmarks
```bash
# 1. Download synthetic test data
mkdir -p bench/data && cd bench/data
curl -LO https://raw.githubusercontent.com/miloyip/nativejson-benchmark/master/data/canada.json
curl -LO https://raw.githubusercontent.com/miloyip/nativejson-benchmark/master/data/citm_catalog.json
curl -LO https://raw.githubusercontent.com/miloyip/nativejson-benchmark/master/data/twitter.json
cd ../..

# 2. Run memory benchmarks (no extra deps needed)
mix run bench/memory_bench.exs

# 3. (Optional) Run speed benchmarks with Benchee
# Add to mix.exs: {:benchee, "~> 1.0", only: :dev}
mix deps.get
mix run bench/stress_bench.exs
```
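The project's bench scripts aren't reproduced here, but an ad-hoc Benchee comparison looks roughly like this (a sketch, not the contents of bench/stress_bench.exs):

```elixir
# Ad-hoc comparison sketch using Benchee (requires the optional :benchee dep).
json = File.read!("bench/data/twitter.json")
data = RustyJson.decode!(json)

Benchee.run(
  %{
    "RustyJson encode" => fn -> RustyJson.encode!(data) end,
    "Jason encode" => fn -> Jason.encode!(data) end,
    "RustyJson decode" => fn -> RustyJson.decode!(json) end,
    "Jason decode" => fn -> Jason.decode!(json) end
  },
  warmup: 1,
  time: 5
)
```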
Key Interning Benchmarks
The keys: :intern option provides significant speedups when decoding arrays of objects with repeated keys (common in API responses, database results, etc.).
When Key Interning Helps: Homogeneous Arrays
Arrays where every object has the same keys:
[{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}, ...]| Scenario | Default | keys: :intern | Improvement |
|---|---|---|---|
| 100 objects × 5 keys | 34.2 µs | 23.6 µs | 31% faster |
| 100 objects × 10 keys | 67.5 µs | 44.8 µs | 34% faster |
| 1,000 objects × 5 keys | 335 µs | 237 µs | 29% faster |
| 1,000 objects × 10 keys | 688 µs | 463 µs | 33% faster |
| 10,000 objects × 5 keys | 3.46 ms | 2.45 ms | 29% faster |
| 10,000 objects × 10 keys | 6.92 ms | 4.88 ms | 29% faster |
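A quick way to reproduce this kind of comparison on your own data (a rough sketch; a single `:timer.tc` run is noisy, but it shows the shape of the test):

```elixir
# Sketch: compare default decoding vs keys: :intern on a homogeneous array.
objects = for i <- 1..1_000, do: %{"id" => i, "name" => "user_#{i}"}
json = RustyJson.encode!(objects)

{default_us, _} = :timer.tc(fn -> RustyJson.decode!(json) end)
{intern_us, _} = :timer.tc(fn -> RustyJson.decode!(json, keys: :intern) end)

IO.puts("default: #{default_us} µs, intern: #{intern_us} µs")
```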
When Key Interning Hurts: Unique Keys
Single objects or heterogeneous arrays where keys aren't repeated:
| Scenario | Default | keys: :intern | Penalty |
|---|---|---|---|
| Single object, 100 keys | 5.1 µs | 13.6 µs | 2.6x slower |
| Single object, 1,000 keys | 52 µs | 169 µs | 3.2x slower |
| Single object, 5,000 keys | 260 µs | 831 µs | 3.2x slower |
| Heterogeneous 100 objects | 35 µs | 96 µs | 2.7x slower |
| Heterogeneous 500 objects | 186 µs | 475 µs | 2.5x slower |
Scaling: Benefit Increases with Object Count
With 5 keys per object, the benefit grows as more objects reuse the cached keys:
| Objects | Default | keys: :intern | Improvement |
|---|---|---|---|
| 10 | 3.5 µs | 3.0 µs | 13% faster |
| 50 | 17.1 µs | 12.5 µs | 27% faster |
| 100 | 33.8 µs | 23.8 µs | 30% faster |
| 500 | 170 µs | 119 µs | 30% faster |
| 1,000 | 339 µs | 242 µs | 29% faster |
| 5,000 | 1.81 ms | 1.24 ms | 31% faster |
| 10,000 | 3.47 ms | 2.49 ms | 28% faster |
Usage Recommendation
```elixir
# API responses, database results, bulk data
RustyJson.decode!(json, keys: :intern)

# Config files, single objects, unknown schemas
RustyJson.decode!(json)  # default, no interning
```

Rule of thumb: Use keys: :intern when you know you're decoding arrays of 10+ objects with the same schema.
Note: Keys containing escape sequences (e.g., "field\nname") are not interned because the raw JSON bytes differ from the decoded string. This is rare in practice and has negligible performance impact.
Summary
| Operation | Speed | Memory | Reductions |
|---|---|---|---|
| Encode plain data (large) | 5-6x | 2-3x less | 28,000x fewer |
| Encode plain data (medium) | 2-3x | 2-3x less | 200-2000x fewer |
| Encode structs (v0.3.3+) | ~2x improvement over v0.3.2 | similar | — |
| Decode (large) | 2-4.5x | similar | — |
| Decode (deep nested, v0.3.3+) | ~27% improvement over v0.3.2 | similar | — |
| Decode (keys: :intern) | +30%* | similar | — |
*For arrays of objects with repeated keys (API responses, DB results, etc.)
Bottom line: As of v0.3.3, RustyJson is fast across all encoding and decoding workloads, including deeply nested and small payloads. Plain data encoding shows the largest gains (5-6x, 2-3x less memory, dramatically fewer BEAM reductions). Struct encoding was rewritten in v0.3.3 with a single-pass iodata pipeline. Deep-nested decode was optimized in v0.3.3 with a single-entry fast path that avoids heap allocation for single-element objects and arrays. For decoding bulk data, enable keys: :intern for an additional 30% speedup.