Comprehensive benchmarks comparing RustyJson vs Jason across synthetic and real-world datasets.
Key Findings
- Fast across all workloads — plain data, struct-heavy data, and decoding (including deeply nested and small payloads)
- Encoding plain data shows the largest gains — 3-6x faster, 2-3x less memory
- Struct encoding optimized in v0.3.3 via single-pass iodata pipeline with compile-time codegen (~2x improvement over v0.3.2)
- Deep-nested decode optimized in v0.3.3 via single-entry fast path (~27% faster than v0.3.2 for 100-level nested JSON)
- Larger payloads = bigger advantage — real-world 10 MB files show better results than synthetic benchmarks
- BEAM scheduler load dramatically reduced — 100-28,000x fewer reductions
Test Environment
| Attribute | Value |
|---|---|
| OS | macOS |
| CPU | Apple M1 Pro |
| Cores | 10 |
| Memory | 16 GB |
| Elixir | 1.19.4 |
| Erlang/OTP | 28.2 |
Real-World Benchmarks: Amazon Settlement Reports
These are production JSON files from Amazon SP-API settlement reports, representing real-world API response patterns with nested objects, arrays of transactions, and mixed data types.
Encoding Performance (Elixir → JSON)
| File Size | RustyJson | Jason | Speed | Memory |
|---|---|---|---|---|
| 10.87 MB | 24 ms | 131 ms | 5.5x faster | 2.7x less |
| 9.79 MB | 21 ms | 124 ms | 5.9x faster | 2-3x less |
| 9.38 MB | 21 ms | 104 ms | 5.0x faster | 2-3x less |
Decoding Performance (JSON → Elixir)
| File Size | RustyJson | Jason | Speed | Memory |
|---|---|---|---|---|
| 10.87 MB | 61 ms | 152 ms | 2.5x faster | similar |
| 9.79 MB | 55 ms | 134 ms | 2.4x faster | similar |
| 9.38 MB | 50 ms | 119 ms | 2.4x faster | similar |
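These timings can be sanity-checked on any large JSON file with a small `:timer.tc` harness. A minimal sketch (the file path is a placeholder, not part of the repo):

```elixir
# Minimal sketch: time one encode and one decode of a large payload.
# "path/to/report.json" is a placeholder for any large JSON file you have locally.
json = File.read!("path/to/report.json")
data = RustyJson.decode!(json)

{encode_us, _result} = :timer.tc(fn -> RustyJson.encode!(data) end)
{decode_us, _result} = :timer.tc(fn -> RustyJson.decode!(json) end)

IO.puts("encode: #{div(encode_us, 1000)} ms, decode: #{div(decode_us, 1000)} ms")
```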
BEAM Reductions (Scheduler Load)
| File Size | RustyJson | Jason | Reduction |
|---|---|---|---|
| 10.87 MB encode | 404 | 11,570,847 | 28,641x fewer |
This is the most dramatic difference - RustyJson offloads virtually all work to native code.
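Reductions consumed by a single call can be sampled with `:erlang.process_info/2`. A minimal sketch, not the exact harness behind the table above (the file path is again a placeholder):

```elixir
# Minimal sketch: count reductions burned in the current process by one encode call.
data = RustyJson.decode!(File.read!("path/to/report.json"))

count_reductions = fn fun ->
  {:reductions, before} = :erlang.process_info(self(), :reductions)
  fun.()
  {:reductions, after_reds} = :erlang.process_info(self(), :reductions)
  after_reds - before
end

IO.inspect(count_reductions.(fn -> RustyJson.encode!(data) end), label: "RustyJson")
IO.inspect(count_reductions.(fn -> Jason.encode!(data) end), label: "Jason")
```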
Synthetic Benchmarks: nativejson-benchmark
Using standard datasets from nativejson-benchmark:
| Dataset | Size | Description |
|---|---|---|
| canada.json | 2.1 MB | Geographic coordinates (number-heavy) |
| citm_catalog.json | 1.6 MB | Event catalog (mixed types) |
| twitter.json | 617 KB | Social media with CJK (unicode-heavy) |
Decode Performance (JSON → Elixir)
| Input | RustyJson (ips) | Average time |
|---|---|---|
| canada.json (2.1 MB) | 153 | 6.55 ms |
| citm_catalog.json (1.6 MB) | 323 | 3.09 ms |
| twitter.json (617 KB) | 430 | 2.33 ms |
| large_list (50k items, 2.3 MB) | 62 | 16.0 ms |
| deep_nested (1.1 KB, 100 levels) | 148K | 6.75 µs |
| wide_object (75 KB, 5k keys) | 1,626 | 0.61 ms |
Roundtrip Performance (Decode + Encode)
| Input | RustyJson | Jason | Speedup |
|---|---|---|---|
| canada.json | 14 ms | 48 ms | 3.4x faster |
| citm_catalog.json | 6 ms | 14 ms | 2.5x faster |
| twitter.json | 4 ms | 9 ms | 2.3x faster |
BEAM Reductions by Dataset
| Dataset | RustyJson | Jason | Ratio |
|---|---|---|---|
| canada.json | ~3,500 | ~964,000 | 275x fewer |
| citm_catalog.json | ~300 | ~621,000 | 2,000x fewer |
| twitter.json | ~2,000 | ~511,000 | 260x fewer |
Struct Encoding Benchmarks (v0.3.3+)
Encoding data that contains Elixir structs (e.g., @derive RustyJson.Encoder or custom defimpl) follows a different path than plain maps and lists. Structs require the RustyJson.Encoder protocol to convert them to JSON-serializable forms.
In v0.3.3, the struct encoding pipeline was rewritten from a three-pass approach (protocol dispatch → fragment resolution → NIF serialization) to a single-pass iodata pipeline with compile-time codegen for derived structs. This closed the last remaining performance gap, making RustyJson faster across all encoding workloads.
Struct Encoding Performance
| Workload | Speedup (v0.3.3 vs v0.3.2) |
|---|---|
| Derived struct (5 fields) | ~2x faster |
| Derived struct (10 fields) | ~2x faster |
| Custom encoder (returning Encode.map) | ~2.5x faster |
| List of 1,000 derived structs | ~2x faster |
| Nested structs (3 levels deep) | ~2x faster |
Measured with protocol consolidation enabled (MIX_ENV=prod), which is the default for production builds.
How It Works
RustyJson's struct encoding produces iodata in a single pass:
- Derived encoders (`@derive RustyJson.Encoder`) generate compile-time iodata templates with pre-escaped keys: no runtime `Map.from_struct`, `Map.to_list`, or key escaping.
- Map/List impls detect struct-containing data and route through `Encode.map/2` / `Encode.list/2` to build iodata directly, wrapped in a `Fragment`.
- NIF bypass: when the top-level result is an iodata `Fragment` (no pretty-print or compression), `IO.iodata_to_binary/1` is used directly, avoiding Erlang↔Rust term conversion entirely.
For plain data (no structs), encoding still uses the fast Rust NIF path unchanged.
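For concreteness, a derived encoder and a custom impl might look like the sketch below. The `encode/2` callback shape and the `RustyJson.Encode.map/2` call are assumptions based on the description above; check the RustyJson.Encoder documentation for the exact API.

```elixir
# Derived struct: keys are pre-escaped into a compile-time iodata template.
defmodule Order do
  @derive RustyJson.Encoder
  defstruct [:id, :total, :currency]
end

# Custom encoder sketch (hypothetical callback shape; see the RustyJson.Encoder docs).
defmodule Money do
  defstruct [:amount, :currency]
end

defimpl RustyJson.Encoder, for: Money do
  def encode(%Money{amount: amount, currency: currency}, opts) do
    # Builds iodata directly, as described in the struct-encoding pipeline above.
    RustyJson.Encode.map(%{"amount" => amount, "currency" => currency}, opts)
  end
end

RustyJson.encode!(%Order{id: 1, total: 99, currency: "USD"})
```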
Why Encoding Shows Bigger Gains
iolist Encoding Pattern (Pure Elixir)

```
encode(data)
  → allocate "{" binary
  → allocate "\"key\"" binary
  → allocate ":" binary
  → allocate "\"value\"" binary
  → allocate list cells to link them
  → return iolist (many BEAM allocations)
```

RustyJson's Encoding Pattern (NIF)

```
encode(data)
  → [Rust: walk terms, write to single buffer]
  → copy buffer to BEAM binary
  → return binary (one BEAM allocation)
```

Pure-Elixir encoders create many small BEAM allocations. RustyJson creates one.
Why Decoding Memory is Similar
Both libraries produce identical Elixir data structures when decoding. The resulting maps, lists, and strings take the same space regardless of which library created them.
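One way to see this: with their default string-key decoding, both libraries return the same terms for the same input, so the decoded result occupies the same heap space regardless of which library built it.

```elixir
# Both decoders yield identical maps, lists, and binaries for the same JSON
# (assuming default options on both sides).
json = ~s({"id": 1, "tags": ["a", "b"], "meta": {"ok": true}})
true = RustyJson.decode!(json) == Jason.decode!(json)
```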
Why Benchee Memory Measurements Don't Work for NIFs
Important: Benchee's memory_time option gives misleading results for NIF-based libraries.
What Benchee Reports (Incorrect)
| Library | Memory |
|-----------|-----------|
| RustyJson | 0.00169 MB |
| Jason | 20.27 MB |

This suggests 12,000x less memory - which is wrong.
Why This Happens
Benchee measures memory using :erlang.memory/0, which only tracks BEAM allocations:
- BEAM process heap
- BEAM binary space
- ETS tables
RustyJson allocates memory in Rust via mimalloc, completely invisible to BEAM tracking. The 0.00169 MB is just NIF call overhead.
How We Measure Instead
We use :erlang.memory(:total) delta in isolated spawned processes:
```elixir
spawn(fn ->
  :erlang.garbage_collect()
  before = :erlang.memory(:total)
  # Keep the results bound so their allocations are still live at the second reading
  _results = for _ <- 1..10, do: RustyJson.encode!(data)
  after_mem = :erlang.memory(:total)
  # Report (after_mem - before) / 10
end)
```

This captures BEAM allocations during the operation. For total system memory (including NIF), we verified with RSS measurements that Rust adds only ~1-2 MB temporary overhead.
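The RSS cross-check can be done with an ordinary `ps` call against the running BEAM. A minimal sketch (macOS/Linux; `data` is whichever payload you are encoding):

```elixir
# Minimal sketch: sample OS-level resident set size (RSS), which includes
# Rust/mimalloc allocations that BEAM-side memory reporting cannot see.
rss_kb = fn ->
  {out, 0} = System.cmd("ps", ["-o", "rss=", "-p", System.pid()])
  out |> String.trim() |> String.to_integer()
end

before = rss_kb.()
_encoded = RustyJson.encode!(data)
IO.puts("RSS delta: #{rss_kb.() - before} KB")
```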
Actual Memory Comparison
For a 10 MB settlement report encode:
| Metric | RustyJson | Jason |
|---|---|---|
| BEAM memory | 6.7 MB | 17.9 MB |
| NIF overhead | ~1-2 MB | N/A |
| Total | ~8 MB | ~18 MB |
| Ratio | 2-3x less | — |
Running Benchmarks
```bash
# 1. Download synthetic test data
mkdir -p bench/data && cd bench/data
curl -LO https://raw.githubusercontent.com/miloyip/nativejson-benchmark/master/data/canada.json
curl -LO https://raw.githubusercontent.com/miloyip/nativejson-benchmark/master/data/citm_catalog.json
curl -LO https://raw.githubusercontent.com/miloyip/nativejson-benchmark/master/data/twitter.json
cd ../..

# 2. Run memory benchmarks (no extra deps needed)
mix run bench/memory_bench.exs

# 3. (Optional) Run speed benchmarks with Benchee
# Add to mix.exs: {:benchee, "~> 1.0", only: :dev}
mix deps.get
mix run bench/stress_bench.exs
```
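The project's bench scripts aren't reproduced here, but an ad-hoc Benchee comparison looks roughly like this (a sketch, not the contents of bench/stress_bench.exs):

```elixir
# Ad-hoc comparison sketch using Benchee (requires the optional :benchee dep).
json = File.read!("bench/data/twitter.json")
data = RustyJson.decode!(json)

Benchee.run(
  %{
    "RustyJson encode" => fn -> RustyJson.encode!(data) end,
    "Jason encode" => fn -> Jason.encode!(data) end,
    "RustyJson decode" => fn -> RustyJson.decode!(json) end,
    "Jason decode" => fn -> Jason.decode!(json) end
  },
  warmup: 1,
  time: 5
)
```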
Key Interning Benchmarks
The keys: :intern option provides significant speedups when decoding arrays of objects with repeated keys (common in API responses, database results, etc.).
When Key Interning Helps: Homogeneous Arrays
Arrays where every object has the same keys:
[{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}, ...]| Scenario | Default | keys: :intern | Improvement |
|---|---|---|---|
| 100 objects × 5 keys | 34.2 µs | 23.6 µs | 31% faster |
| 100 objects × 10 keys | 67.5 µs | 44.8 µs | 34% faster |
| 1,000 objects × 5 keys | 335 µs | 237 µs | 29% faster |
| 1,000 objects × 10 keys | 688 µs | 463 µs | 33% faster |
| 10,000 objects × 5 keys | 3.46 ms | 2.45 ms | 29% faster |
| 10,000 objects × 10 keys | 6.92 ms | 4.88 ms | 29% faster |
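A quick way to reproduce this kind of comparison on your own data (a rough sketch; a single `:timer.tc` run is noisy, but it shows the shape of the test):

```elixir
# Sketch: compare default decoding vs keys: :intern on a homogeneous array.
objects = for i <- 1..1_000, do: %{"id" => i, "name" => "user_#{i}"}
json = RustyJson.encode!(objects)

{default_us, _} = :timer.tc(fn -> RustyJson.decode!(json) end)
{intern_us, _} = :timer.tc(fn -> RustyJson.decode!(json, keys: :intern) end)

IO.puts("default: #{default_us} µs, intern: #{intern_us} µs")
```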
When Key Interning Hurts: Unique Keys
Single objects or heterogeneous arrays where keys aren't repeated:
| Scenario | Default | keys: :intern | Penalty |
|---|---|---|---|
| Single object, 100 keys | 5.1 µs | 13.6 µs | 2.6x slower |
| Single object, 1,000 keys | 52 µs | 169 µs | 3.2x slower |
| Single object, 5,000 keys | 260 µs | 831 µs | 3.2x slower |
| Heterogeneous 100 objects | 35 µs | 96 µs | 2.7x slower |
| Heterogeneous 500 objects | 186 µs | 475 µs | 2.5x slower |
Scaling: Benefit Increases with Object Count
With 5 keys per object, the benefit grows as more objects reuse the cached keys:
| Objects | Default | keys: :intern | Improvement |
|---|---|---|---|
| 10 | 3.5 µs | 3.0 µs | 13% faster |
| 50 | 17.1 µs | 12.5 µs | 27% faster |
| 100 | 33.8 µs | 23.8 µs | 30% faster |
| 500 | 170 µs | 119 µs | 30% faster |
| 1,000 | 339 µs | 242 µs | 29% faster |
| 5,000 | 1.81 ms | 1.24 ms | 31% faster |
| 10,000 | 3.47 ms | 2.49 ms | 28% faster |
Usage Recommendation
```elixir
# API responses, database results, bulk data
RustyJson.decode!(json, keys: :intern)

# Config files, single objects, unknown schemas
RustyJson.decode!(json)  # default, no interning
```

Rule of thumb: Use keys: :intern when you know you're decoding arrays of 10+ objects with the same schema.
Note: Keys containing escape sequences (e.g., "field\nname") are not interned because the raw JSON bytes differ from the decoded string. This is rare in practice and has negligible performance impact.
Summary
| Operation | Speed | Memory | Reductions |
|---|---|---|---|
| Encode plain data (large) | 5-6x | 2-3x less | 28,000x fewer |
| Encode plain data (medium) | 2-3x | 2-3x less | 200-2000x fewer |
| Encode structs (v0.3.3+) | ~2x improvement over v0.3.2 | similar | — |
| Decode (large) | 2-4.5x | similar | — |
| Decode (deep nested, v0.3.3+) | ~27% improvement over v0.3.2 | similar | — |
| Decode (keys: :intern) | +30%* | similar | — |
*For arrays of objects with repeated keys (API responses, DB results, etc.)
Bottom line: As of v0.3.3, RustyJson is fast across all encoding and decoding workloads, including deeply nested and small payloads. Plain data encoding shows the largest gains (5-6x, 2-3x less memory, dramatically fewer BEAM reductions). Struct encoding was rewritten in v0.3.3 with a single-pass iodata pipeline. Deep-nested decode was optimized in v0.3.3 with a single-entry fast path that avoids heap allocation for single-element objects and arrays. For decoding bulk data, enable keys: :intern for an additional 30% speedup.