RustyJson Benchmarks

Comprehensive benchmarks comparing RustyJson vs Jason across synthetic and real-world datasets.

Key Findings

  1. Fast across all workloads — plain data, struct-heavy data, and decoding (including deeply nested and small payloads)
  2. Encoding plain data shows the largest gains — 3-6x faster, 2-3x less memory
  3. Struct encoding optimized in v0.3.3 via single-pass iodata pipeline with compile-time codegen (~2x improvement over v0.3.2)
  4. Deep-nested decode optimized in v0.3.3 via single-entry fast path (~27% faster than v0.3.2 for 100-level nested JSON)
  5. Larger payloads = bigger advantage — real-world 10 MB files show better results than synthetic benchmarks
  6. BEAM scheduler load dramatically reduced — 100-28,000x fewer reductions

Test Environment

| Attribute  | Value        |
|------------|--------------|
| OS         | macOS        |
| CPU        | Apple M1 Pro |
| Cores      | 10           |
| Memory     | 16 GB        |
| Elixir     | 1.19.4       |
| Erlang/OTP | 28.2         |

Real-World Benchmarks: Amazon Settlement Reports

These are production JSON files from Amazon SP-API settlement reports, representing real-world API response patterns with nested objects, arrays of transactions, and mixed data types.

Encoding Performance (Elixir → JSON)

| File Size | RustyJson | Jason  | Speed       | Memory    |
|-----------|-----------|--------|-------------|-----------|
| 10.87 MB  | 24 ms     | 131 ms | 5.5x faster | 2.7x less |
| 9.79 MB   | 21 ms     | 124 ms | 5.9x faster | 2-3x less |
| 9.38 MB   | 21 ms     | 104 ms | 5.0x faster | 2-3x less |

Decoding Performance (JSON → Elixir)

| File Size | RustyJson | Jason  | Speed       | Memory  |
|-----------|-----------|--------|-------------|---------|
| 10.87 MB  | 61 ms     | 152 ms | 2.5x faster | similar |
| 9.79 MB   | 55 ms     | 134 ms | 2.4x faster | similar |
| 9.38 MB   | 50 ms     | 119 ms | 2.4x faster | similar |

BEAM Reductions (Scheduler Load)

| File Size       | RustyJson | Jason      | Reduction     |
|-----------------|-----------|------------|---------------|
| 10.87 MB encode | 404       | 11,570,847 | 28,641x fewer |

This is the most dramatic difference - RustyJson offloads virtually all work to native code.
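
You can sanity-check reduction counts yourself with :erlang.process_info/2. The sketch below assumes data is already bound to a decoded settlement report:

# Reductions consumed by a single encode call in the current process.
{:reductions, before} = :erlang.process_info(self(), :reductions)
_json = RustyJson.encode!(data)
{:reductions, after_count} = :erlang.process_info(self(), :reductions)
IO.puts("reductions used: #{after_count - before}")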

Synthetic Benchmarks: nativejson-benchmark

Using standard datasets from nativejson-benchmark:

| Dataset           | Size   | Description                            |
|-------------------|--------|----------------------------------------|
| canada.json       | 2.1 MB | Geographic coordinates (number-heavy)  |
| citm_catalog.json | 1.6 MB | Event catalog (mixed types)            |
| twitter.json      | 617 KB | Social media with CJK (unicode-heavy)  |

Decode Performance (JSON → Elixir)

| Input                            | RustyJson ips | Average |
|----------------------------------|---------------|---------|
| canada.json (2.1 MB)             | 153           | 6.55 ms |
| citm_catalog.json (1.6 MB)       | 323           | 3.09 ms |
| twitter.json (617 KB)            | 430           | 2.33 ms |
| large_list (50k items, 2.3 MB)   | 62            | 16.0 ms |
| deep_nested (1.1 KB, 100 levels) | 148K          | 6.75 µs |
| wide_object (75 KB, 5k keys)     | 1,626         | 0.61 ms |

Roundtrip Performance (Decode + Encode)

| Input             | RustyJson | Jason | Speedup     |
|-------------------|-----------|-------|-------------|
| canada.json       | 14 ms     | 48 ms | 3.4x faster |
| citm_catalog.json | 6 ms      | 14 ms | 2.5x faster |
| twitter.json      | 4 ms      | 9 ms  | 2.3x faster |

BEAM Reductions by Dataset

| Dataset           | RustyJson | Jason    | Ratio         |
|-------------------|-----------|----------|---------------|
| canada.json       | ~3,500    | ~964,000 | 275x fewer    |
| citm_catalog.json | ~300      | ~621,000 | 2,000x fewer  |
| twitter.json      | ~2,000    | ~511,000 | 260x fewer    |

Struct Encoding Benchmarks (v0.3.3+)

Encoding data that contains Elixir structs (e.g., @derive RustyJson.Encoder or custom defimpl) follows a different path than plain maps and lists. Structs require the RustyJson.Encoder protocol to convert them to JSON-serializable forms.

In v0.3.3, the struct encoding pipeline was rewritten from a three-pass approach (protocol dispatch → fragment resolution → NIF serialization) to a single-pass iodata pipeline with compile-time codegen for derived structs. This closed the last remaining performance gap, making RustyJson faster across all encoding workloads.
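
For reference, deriving the encoder looks like the sketch below (the module name and fields are invented for the example):

defmodule MyApp.Order do
  @derive RustyJson.Encoder
  defstruct [:id, :total, :items]
end

# Encodes via the compile-time iodata template generated by @derive
RustyJson.encode!(%MyApp.Order{id: 1, total: 99.5, items: []})
# => a JSON string like {"id":1,"total":99.5,"items":[]}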

Struct Encoding Performance

| Workload                               | Speedup (v0.3.3 vs v0.3.2) |
|----------------------------------------|----------------------------|
| Derived struct (5 fields)              | ~2x faster                 |
| Derived struct (10 fields)             | ~2x faster                 |
| Custom encoder (returning Encode.map)  | ~2.5x faster               |
| List of 1,000 derived structs          | ~2x faster                 |
| Nested structs (3 levels deep)         | ~2x faster                 |

Measured with protocol consolidation enabled (MIX_ENV=prod), which is the default for production builds.

How It Works

RustyJson's struct encoding produces iodata in a single pass:

  1. Derived encoders (@derive RustyJson.Encoder) generate compile-time iodata templates with pre-escaped keys — no runtime Map.from_struct, Map.to_list, or key escaping.
  2. Map/List impls detect struct-containing data and route through Encode.map/2 / Encode.list/2 to build iodata directly, wrapped in a Fragment.
  3. NIF bypass — When the top-level result is an iodata Fragment (no pretty-print or compression), IO.iodata_to_binary/1 is used directly, avoiding Erlang↔Rust term conversion entirely.

For plain data (no structs), encoding still uses the fast Rust NIF path unchanged.
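
To make step 2 concrete, here is a hedged sketch of a custom encoder. The module, fields, and the exact module path and signature of Encode.map are illustrative assumptions; consult the RustyJson.Encoder documentation for the real contract:

# Hypothetical custom impl; assumes Encode.map/2 accepts key-value pairs
# plus the encoder options and returns an iodata Fragment.
defimpl RustyJson.Encoder, for: MyApp.Money do
  def encode(%MyApp.Money{currency: currency, cents: cents}, opts) do
    RustyJson.Encode.map([currency: currency, amount: cents / 100], opts)
  end
end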

Why Encoding Shows Bigger Gains

iolist Encoding Pattern (Pure Elixir)

encode(data)
   allocate "{" binary
   allocate "\"key\"" binary
   allocate ":" binary
   allocate "\"value\"" binary
   allocate list cells to link them
   return iolist (many BEAM allocations)

RustyJson's Encoding Pattern (NIF)

encode(data)
   [Rust: walk terms, write to single buffer]
   copy buffer to BEAM binary
   return binary (one BEAM allocation)

Pure-Elixir encoders create many small BEAM allocations. RustyJson creates one.
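
The difference is easy to see from the return shapes. Jason.encode_to_iodata!/1 exposes the intermediate iodata that a pure-Elixir encoder builds, while RustyJson.encode!/1 returns a single binary (small illustration; both libraries assumed available):

# Pure-Elixir encoders build nested iodata: many small binaries linked by list cells.
iodata = Jason.encode_to_iodata!(%{"key" => "value"})
is_list(iodata)    # typically true

# RustyJson's NIF path returns one flat binary written into a single Rust buffer.
bin = RustyJson.encode!(%{"key" => "value"})
is_binary(bin)     # true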

Why Decoding Memory is Similar

Both libraries produce identical Elixir data structures when decoding. The resulting maps, lists, and strings take the same space regardless of which library created them.
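
For example, decoding the same document with either library yields structurally equal terms:

json = ~s({"id": 1, "tags": ["a", "b"], "meta": {"ok": true}})
RustyJson.decode!(json) == Jason.decode!(json)
# => true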

Why Benchee Memory Measurements Don't Work for NIFs

Important: Benchee's memory_time option gives misleading results for NIF-based libraries.

What Benchee Reports (Incorrect)

| Library   | Memory    |
|-----------|-----------|
| RustyJson | 0.00169 MB |
| Jason     | 20.27 MB   |

This suggests 12,000x less memory - which is wrong.

Why This Happens

Benchee measures memory using :erlang.memory/0, which only tracks BEAM allocations:

  • BEAM process heap
  • BEAM binary space
  • ETS tables

RustyJson allocates memory in Rust via mimalloc, completely invisible to BEAM tracking. The 0.00169 MB is just NIF call overhead.
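
For reference, a run along these lines is what produces the misleading numbers above (memory_time enables Benchee's BEAM-side memory tracking; data is assumed to be in scope):

Benchee.run(
  %{
    "RustyJson" => fn -> RustyJson.encode!(data) end,
    "Jason" => fn -> Jason.encode!(data) end
  },
  memory_time: 2
)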

How We Measure Instead

We use :erlang.memory(:total) delta in isolated spawned processes:

spawn(fn ->
  :erlang.garbage_collect()
  before = :erlang.memory(:total)
  results = for _ <- 1..10, do: RustyJson.encode!(data)
  after_mem = :erlang.memory(:total)
  # Keep results referenced so the encoded binaries stay live during measurement,
  # then report the average BEAM delta per iteration: (after_mem - before) / 10.
  IO.puts("avg BEAM delta: #{div(after_mem - before, length(results))} bytes")
end)

This captures BEAM allocations during the operation. For total system memory (including NIF), we verified with RSS measurements that Rust adds only ~1-2 MB temporary overhead.
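
A rough version of that RSS check looks like this (a sketch; it shells out to ps, available on macOS and Linux, and assumes data is the payload from the snippet above):

# Resident set size of the BEAM OS process, in kilobytes.
rss_kb = fn ->
  {out, 0} = System.cmd("ps", ["-o", "rss=", "-p", System.pid()])
  out |> String.trim() |> String.to_integer()
end

before = rss_kb.()
_ = RustyJson.encode!(data)
IO.puts("RSS delta: ~#{rss_kb.() - before} KB")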

Actual Memory Comparison

For a 10 MB settlement report encode:

| Metric       | RustyJson | Jason   |
|--------------|-----------|---------|
| BEAM memory  | 6.7 MB    | 17.9 MB |
| NIF overhead | ~1-2 MB   | N/A     |
| Total        | ~8 MB     | ~18 MB  |
| Ratio        | 2-3x less |         |

Running Benchmarks

# 1. Download synthetic test data
mkdir -p bench/data && cd bench/data
curl -LO https://raw.githubusercontent.com/miloyip/nativejson-benchmark/master/data/canada.json
curl -LO https://raw.githubusercontent.com/miloyip/nativejson-benchmark/master/data/citm_catalog.json
curl -LO https://raw.githubusercontent.com/miloyip/nativejson-benchmark/master/data/twitter.json
cd ../..

# 2. Run memory benchmarks (no extra deps needed)
mix run bench/memory_bench.exs

# 3. (Optional) Run speed benchmarks with Benchee
# Add to mix.exs: {:benchee, "~> 1.0", only: :dev}
mix deps.get
mix run bench/stress_bench.exs

Key Interning Benchmarks

The keys: :intern option provides significant speedups when decoding arrays of objects with repeated keys (common in API responses, database results, etc.).

When Key Interning Helps: Homogeneous Arrays

Arrays where every object has the same keys:

[{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}, ...]
ScenarioDefaultkeys: :internImprovement
100 objects × 5 keys34.2 µs23.6 µs31% faster
100 objects × 10 keys67.5 µs44.8 µs34% faster
1,000 objects × 5 keys335 µs237 µs29% faster
1,000 objects × 10 keys688 µs463 µs33% faster
10,000 objects × 5 keys3.46 ms2.45 ms29% faster
10,000 objects × 10 keys6.92 ms4.88 ms29% faster

When Key Interning Hurts: Unique Keys

Single objects or heterogeneous arrays where keys aren't repeated:

| Scenario                   | Default | keys: :intern | Penalty     |
|----------------------------|---------|---------------|-------------|
| Single object, 100 keys    | 5.1 µs  | 13.6 µs       | 2.6x slower |
| Single object, 1,000 keys  | 52 µs   | 169 µs        | 3.2x slower |
| Single object, 5,000 keys  | 260 µs  | 831 µs        | 3.2x slower |
| Heterogeneous 100 objects  | 35 µs   | 96 µs         | 2.7x slower |
| Heterogeneous 500 objects  | 186 µs  | 475 µs        | 2.5x slower |

Scaling: Benefit Increases with Object Count

With 5 keys per object, the benefit grows as more objects reuse the cached keys:

| Objects | Default | keys: :intern | Improvement |
|---------|---------|---------------|-------------|
| 10      | 3.5 µs  | 3.0 µs        | 13% faster  |
| 50      | 17.1 µs | 12.5 µs       | 27% faster  |
| 100     | 33.8 µs | 23.8 µs       | 30% faster  |
| 500     | 170 µs  | 119 µs        | 30% faster  |
| 1,000   | 339 µs  | 242 µs        | 29% faster  |
| 5,000   | 1.81 ms | 1.24 ms       | 31% faster  |
| 10,000  | 3.47 ms | 2.49 ms       | 28% faster  |

Usage Recommendation

# API responses, database results, bulk data
RustyJson.decode!(json, keys: :intern)

# Config files, single objects, unknown schemas
RustyJson.decode!(json)  # default, no interning

Rule of thumb: Use keys: :intern when you know you're decoding arrays of 10+ objects with the same schema.

Note: Keys containing escape sequences (e.g., "field\nname") are not interned because the raw JSON bytes differ from the decoded string. This is rare in practice and has negligible performance impact.

Summary

| Operation                      | Speed                          | Memory    | Reductions       |
|--------------------------------|--------------------------------|-----------|------------------|
| Encode plain data (large)      | 5-6x                           | 2-3x less | 28,000x fewer    |
| Encode plain data (medium)     | 2-3x                           | 2-3x less | 200-2,000x fewer |
| Encode structs (v0.3.3+)       | ~2x improvement over v0.3.2    | similar   |                  |
| Decode (large)                 | 2-4.5x                         | similar   |                  |
| Decode (deep nested, v0.3.3+)  | ~27% improvement over v0.3.2   | similar   |                  |
| Decode (keys: :intern)         | +30%*                          | similar   |                  |

*For arrays of objects with repeated keys (API responses, DB results, etc.)

Bottom line: As of v0.3.3, RustyJson is fast across all encoding and decoding workloads, including deeply nested and small payloads. Plain data encoding shows the largest gains (5-6x, 2-3x less memory, dramatically fewer BEAM reductions). Struct encoding was rewritten in v0.3.3 with a single-pass iodata pipeline. Deep-nested decode was optimized in v0.3.3 with a single-entry fast path that avoids heap allocation for single-element objects and arrays. For decoding bulk data, enable keys: :intern for an additional 30% speedup.