Resource consumption results

Captured by scripts/bench.escript --with-resources on 2026-05-06 at f43fe1b.

Hardware / runtime

  • CPU: 12th Gen Intel Core i9-12900HX (24 threads)
  • Kernel: Linux 6.19.6 (Arch)
  • OTP: 29 (erts 17.0, JIT)
  • Loadgen: 50 clients, 2 s warmup + 5 s measure, loopback only
  • Sampling: every 100 ms during the measure window; reports the PEAK observed RSS / BEAM memory(total) and the AVG CPU% over the window

cpu% is normalized to wall-clock — 1100 % means the equivalent of 11 cores fully busy on average. On this 24-thread machine the ceiling is ~2400 %.
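As a rough illustration of how the sampled values turn into the reported columns, the sketch below takes the peak of the memory samples and the average of the CPU samples, then converts cpu% into "cores busy" and a fraction of the 24-thread ceiling. The function name, the tuple layout and the sample values are made up for the example; this is not the code in scripts/bench.escript.

```python
from statistics import mean

THREADS = 24  # hardware threads on the benchmark host (see the list above)

def summarize(samples):
    """samples: (rss_mb, beam_mb, cpu_pct) tuples, one per 100 ms tick.

    Hypothetical helper: peak for the memory columns, average for cpu%.
    """
    avg_cpu = mean(s[2] for s in samples)
    return {
        "rss_mb": max(s[0] for s in samples),    # peak observed RSS
        "beam_mb": max(s[1] for s in samples),   # peak BEAM memory(total)
        "cpu_pct": avg_cpu,                      # average over the window
        "cores_busy": avg_cpu / 100,             # 100 % == one core fully busy
        "ceiling_fraction": avg_cpu / (THREADS * 100),
    }

# Made-up samples, for illustration only
samples = [(100, 60, 1000.0), (104, 67, 1200.0), (102, 65, 1100.0)]
summary = summarize(samples)
print(summary)
```

With these made-up samples the average is 1100 %, which works out to 11 cores busy, a little under half of the ~2400 % ceiling.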

This is a single-run snapshot to compare resource shape across servers; numbers will shift run-to-run by ~5–10 % on a loaded host. Re-run the relevant scenarios when chasing a specific regression. The scenario set mirrors the README's trimmed comparison tables; for the full per-scenario throughput grid see docs/bench_results.md.

HTTP/1.1

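| scenario                 | server     | req/s | rss    | beam  | cpu%            |
|--------------------------|------------|-------|--------|-------|-----------------|
| hello                    | roadrunner | 296 k | 104 MB | 67 MB | 1069 %          |
| hello                    | elli       | 288 k | 110 MB | 51 MB | 1072 %          |
| hello                    | cowboy     | 188 k | 116 MB | 58 MB | 1263 %          |
| json                     | elli       | 280 k | 109 MB | 53 MB | 1099 %          |
| json                     | roadrunner | 274 k | 115 MB | 64 MB | 1115 %          |
| json                     | cowboy     | 170 k | 107 MB | 55 MB | 1276 %          |
| echo                     | elli       | 241 k | 118 MB | 52 MB | 1085 %          |
| echo                     | roadrunner | 231 k | 134 MB | 66 MB | 1154 %          |
| echo                     | cowboy     | 130 k | 118 MB | 54 MB | 1320 %          |
| large_response           | elli       | 105 k | 96 MB  | 50 MB | 850 %           |
| large_response           | roadrunner | 104 k | 98 MB  | 62 MB | 842 %           |
| large_response           | cowboy     | 81 k  | 91 MB  | 50 MB | 986 %           |
| headers_heavy            | roadrunner | 221 k | 148 MB | 77 MB | 1187 %          |
| headers_heavy            | elli       | 221 k | 145 MB | 52 MB | 1184 %          |
| headers_heavy            | cowboy     | 118 k | 131 MB | 61 MB | 1458 %          |
| cookies_heavy            | roadrunner | 245 k | 129 MB | 68 MB | 1140 %          |
| cookies_heavy            | cowboy     | 149 k | 112 MB | 56 MB | 1360 %          |
| cookies_heavy            | elli       | n/a (no native cookie parser) |  |  |        |
| pipelined_h1             | roadrunner | 482 k | 118 MB | 78 MB | 1121 %          |
| pipelined_h1             | cowboy     | 322 k | 126 MB | 63 MB | 1363 %          |
| pipelined_h1             | elli       | 4.9 k | 79 MB  | 46 MB | 154 % (broken)  |
| varied_paths_router      | roadrunner | 242 k | 110 MB | 69 MB | 1132 %          |
| varied_paths_router      | cowboy     | 150 k | 153 MB | 68 MB | 1306 %          |
| gzip_response            | roadrunner | 117 k | 213 MB | 90 MB | 1588 %          |
| gzip_response            | cowboy     | 93 k  | 188 MB | 63 MB | 1601 %          |
| websocket_msg_throughput | roadrunner | 198 k | 153 MB | 71 MB | 919 %           |
| websocket_msg_throughput | cowboy     | 156 k | 103 MB | 52 MB | 1090 %          |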

HTTP/2

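| scenario                 | server     | req/s | rss    | beam   | cpu%   |
|--------------------------|------------|-------|--------|--------|--------|
| hello                    | roadrunner | 159 k | 170 MB | 86 MB  | 1248 % |
| hello                    | cowboy     | 149 k | 183 MB | 89 MB  | 1270 % |
| json                     | roadrunner | 152 k | 163 MB | 82 MB  | 1250 % |
| json                     | cowboy     | 137 k | 191 MB | 84 MB  | 1362 % |
| echo                     | roadrunner | 146 k | 180 MB | 98 MB  | 1262 % |
| echo                     | cowboy     | 99 k  | 167 MB | 70 MB  | 1349 % |
| headers_heavy            | roadrunner | 142 k | 183 MB | 95 MB  | 1288 % |
| headers_heavy            | cowboy     | 81 k  | 163 MB | 80 MB  | 1647 % |
| multi_stream_h2          | roadrunner | 325 k | 251 MB | 131 MB | 1382 % |
| multi_stream_h2          | cowboy     | 297 k | 270 MB | 113 MB | 1439 % |
| tls_handshake_throughput | cowboy     | 2.9 k | 198 MB | 84 MB  | 1214 % |
| tls_handshake_throughput | roadrunner | 2.4 k | 145 MB | 63 MB  | 927 %  |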

Patterns observed

Where roadrunner WINS on resources

  • pipelined_h1: roadrunner uses 6 % less RSS and 18 % less CPU than cowboy while beating it 1.5× on throughput.
  • varied_paths_router: 28 % less RSS and 13 % less CPU than cowboy, at 1.6× the throughput.
  • websocket_msg_throughput: 16 % less CPU than cowboy at 1.27× the throughput. It does pay 49 % more RSS for it (the per-frame buffer-and-validate machinery), but the CPU win is the headline.
  • multi_stream_h2 (h2): 7 % less RSS than cowboy with 9 % more throughput. The h2 hot path is more efficient end-to-end.
  • tls_handshake_throughput (h2): roadrunner uses 27 % less RSS and 24 % less CPU than cowboy on this scenario, but cowboy still wins 22 % on throughput — the per-handshake serialization is the bottleneck, not resources.
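The percentages in this list are plain ratios of table cells. The sketch below recomputes a few of them from the rounded values in the HTTP/1.1 table; the helper names are illustrative, and because the table cells are already rounded, results can differ by a point from figures computed on the raw measurements.

```python
# (req/s in thousands, RSS in MB, cpu%) copied from the HTTP/1.1 table
ROWS = {
    ("pipelined_h1", "roadrunner"): (482, 118, 1121),
    ("pipelined_h1", "cowboy"): (322, 126, 1363),
    ("varied_paths_router", "roadrunner"): (242, 110, 1132),
    ("varied_paths_router", "cowboy"): (150, 153, 1306),
    ("websocket_msg_throughput", "roadrunner"): (198, 153, 919),
    ("websocket_msg_throughput", "cowboy"): (156, 103, 1090),
}

def pct_change(value, baseline):
    """Signed percent difference of value relative to baseline."""
    return (value / baseline - 1) * 100

def compare(scenario, server="roadrunner", baseline="cowboy"):
    """Throughput ratio plus RSS / CPU deltas of server vs baseline."""
    req, rss, cpu = ROWS[(scenario, server)]
    b_req, b_rss, b_cpu = ROWS[(scenario, baseline)]
    return {
        "throughput_ratio": req / b_req,
        "rss_pct": pct_change(rss, b_rss),
        "cpu_pct": pct_change(cpu, b_cpu),
    }

for scenario in ("pipelined_h1", "varied_paths_router", "websocket_msg_throughput"):
    c = compare(scenario)
    print(f"{scenario}: {c['throughput_ratio']:.2f}x throughput, "
          f"{c['rss_pct']:+.0f} % RSS, {c['cpu_pct']:+.0f} % CPU")
```

A negative delta means roadrunner used less of that resource than cowboy; a positive one means it used more.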

Where roadrunner pays a tax

  • BEAM memory(total) is 21–48 % higher than elli across the h1 scenarios where both servers complete (hello, json, echo, large_response, headers_heavy). This is the cost of the feature surface (telemetry ETS tables, drain group, request-id batching, persistent_term'd compiled patterns, slot atomics). The biggest gap is headers_heavy at +48 % BEAM (77 MB vs 52 MB), where the per-request decision cache and telemetry metadata accumulate more state per conn. Consistent across runs; not a regression target.
  • gzip_response RSS: roadrunner at 213 MB vs cowboy 188 MB (+13 %) for the +27 % throughput it delivers. zlib z-streams per conn aren't free; this is the price of native gzip middleware vs cowboy's "you wire your own" approach.
  • echo: ~14 % more RSS than elli/cowboy on h1 (134 MB vs 118 MB) and ~8 % more than cowboy on h2 (180 MB vs 167 MB). The body-state machinery for read-body is allocated even when the handler reads in one shot, which is known to be sub-optimal for small-body POSTs (tracked in docs/roadmap.md).

Resource ties

  • hello, large_response, headers_heavy vs elli: all three sit within about 6 % on RSS (hello is the widest, at 104 MB vs 110 MB) and within 1 % on CPU. The per-scenario CPU efficiency (req/s ÷ cpu%) is within about 3 % of elli's.
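To make the req/s ÷ cpu% comparison concrete, here is a small sketch that expresses efficiency as requests per second per fully busy core, using the rounded HTTP/1.1 table values for roadrunner and elli. The function names are illustrative only.

```python
# (req/s, cpu%) for roadrunner and elli, copied from the HTTP/1.1 table
ROWS = {
    "hello": {"roadrunner": (296_000, 1069), "elli": (288_000, 1072)},
    "large_response": {"roadrunner": (104_000, 842), "elli": (105_000, 850)},
    "headers_heavy": {"roadrunner": (221_000, 1187), "elli": (221_000, 1184)},
}

def req_per_core(req_s, cpu_pct):
    """Requests per second per fully busy core (100 % cpu% == one core)."""
    return req_s / (cpu_pct / 100)

def efficiency_ratio(scenario):
    """roadrunner efficiency divided by elli efficiency for one scenario."""
    rr = req_per_core(*ROWS[scenario]["roadrunner"])
    el = req_per_core(*ROWS[scenario]["elli"])
    return rr / el

for scenario in ROWS:
    print(f"{scenario}: roadrunner/elli efficiency = {efficiency_ratio(scenario):.3f}")
```

A ratio close to 1.0 means both servers get about the same throughput out of each busy core in that scenario.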

Reading the numbers honestly

  • Single-run snapshot. Run-to-run variance on RSS / CPU is in the 5–10 % range on a loaded host; treat anything inside that band as noise.
  • Loopback only — real-network deployments push more time into the kernel TCP path, which doesn't show up here.
  • Max-throughput shape: every server is pegged to its own CPU ceiling. The CPU% comparison answers "how much CPU does each server burn to do its best?" — not "how much does each need to serve N req/s?". For a fixed-rate comparison the bench would need an open-loop driver (out of scope today).
  • BEAM memory(total) excludes some allocator carrier waste; the OS-level RSS captures it. rss > beam is normal.
  • The doc is checked into the repo as a point-in-time snapshot; re-run when investigating a specific regression rather than treating these numbers as the ongoing baseline.
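One way to apply the noise-band caveat when reading the tables is to classify a difference as noise unless it clears the band. The sketch below uses 10 %, the upper end of the range quoted above, as the threshold; that cut-off and the function names are assumptions for illustration, not something the bench script enforces.

```python
NOISE_BAND_PCT = 10.0  # assumption: upper end of the 5-10 % run-to-run band

def relative_diff_pct(value, baseline):
    """Unsigned percent difference of value relative to baseline."""
    return abs(value - baseline) / baseline * 100

def classify(value, baseline, band_pct=NOISE_BAND_PCT):
    """Return 'noise' if the difference is inside the band, else 'signal'."""
    return "noise" if relative_diff_pct(value, baseline) <= band_pct else "signal"

# RSS in MB from the HTTP/1.1 table
print(classify(104, 110))  # hello: roadrunner vs elli
print(classify(110, 153))  # varied_paths_router: roadrunner vs cowboy
```

On a quiet host a tighter band may be reasonable; the threshold is a parameter for that reason.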

Reproducing

Single scenario:

mise exec -- ./scripts/bench.escript --servers roadrunner,elli,cowboy \
  --scenarios hello --clients 50 --duration 5 --warmup 2 --with-resources

To regenerate this whole doc with fresh numbers, loop the kept scenarios above with --with-resources and update the table. Automating it via scripts/bench_matrix.sh --with-resources is on the roadmap (docs/roadmap.md) but not yet implemented.
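Until that automation exists, the manual loop could be scripted along these lines. This is a dry-run sketch that only prints the command lines: it reuses the flags from the single-scenario command above, assumes the scenario names in the tables are accepted by --scenarios as written (only hello is confirmed here), and does not cover however the bench selects HTTP/2, since that is not shown in this doc.

```python
import shlex

# Scenario names as they appear in the tables; whether each is a valid
# --scenarios value is an assumption (only "hello" appears in the command above).
SCENARIOS = [
    "hello", "json", "echo", "large_response", "headers_heavy",
    "cookies_heavy", "pipelined_h1", "varied_paths_router",
    "gzip_response", "websocket_msg_throughput",
    "multi_stream_h2", "tls_handshake_throughput",
]

def bench_command(scenario, servers=("roadrunner", "elli", "cowboy"),
                  clients=50, duration=5, warmup=2):
    """Build the argv for one --with-resources run of a single scenario."""
    return [
        "mise", "exec", "--", "./scripts/bench.escript",
        "--servers", ",".join(servers),
        "--scenarios", scenario,
        "--clients", str(clients),
        "--duration", str(duration),
        "--warmup", str(warmup),
        "--with-resources",
    ]

commands = [bench_command(s) for s in SCENARIOS]
for cmd in commands:
    print(shlex.join(cmd))  # dry run: print instead of executing
```

Passing each list to subprocess.run would execute the runs for real; the tables still need to be updated by hand from the output.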

Cross-references