Performance

Benchmark comparing Skuld against pure baselines and minimal effect implementations. Run with mix run bench/skuld_benchmark.exs.

What's being measured: A loop that increments a counter from 0 to N using State.get() / State.put(n + 1) operations. This exercises the core effect invocation path repeatedly, measuring per-operation overhead.

Core Benchmark

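| Target | Pure/Rec | Monad  | Evf    | Evf/CPS | Skuld/Nest | Skuld/FxFL |
|--------|----------|--------|--------|---------|------------|------------|
| 500    | 4 us     | 10 us  | 17 us  | 17 us   | 141 us     | 54 us      |
| 1000   | 28 us    | 55 us  | 56 us  | 58 us   | 255 us     | 166 us     |
| 2000   | 34 us    | 78 us  | 91 us  | 97 us   | 558 us     | 325 us     |
| 5000   | 82 us    | 189 us | 244 us | 258 us  | 1.42 ms    | 836 us     |
| 10000  | 145 us   | 157 us | 298 us | 325 us  | 2.3 ms     | 960 us     |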

Implementations compared:

  • Pure/Rec - Non-effectful baseline using tail recursion with map state
  • Monad - Simple state monad (fn state -> {val, state} end) with no effect system
  • Evf - Flat evidence-passing, direct-style (no CPS) - can't support control effects
  • Evf/CPS - Flat evidence-passing with CPS - isolates CPS overhead (~1.1x vs Evf)
  • Skuld/Nest - Skuld with nested Comp.bind calls (typical usage pattern)
  • Skuld/FxFL - Skuld with FxFasterList iteration (optimized for collections)
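To make the two non-Skuld baselines concrete, here is a minimal sketch of what a Pure/Rec-style loop and a simple state-monad loop could look like. The module and function names (BaselineSketch, pure_rec, monad_loop, and so on) are hypothetical and are not taken from bench/skuld_benchmark.exs; the actual benchmark code may differ.

```elixir
defmodule BaselineSketch do
  # Pure/Rec-style baseline: tail recursion threading a map as state.
  def pure_rec(target), do: pure_loop(%{counter: 0}, target)

  defp pure_loop(%{counter: n} = state, target) when n >= target, do: state

  defp pure_loop(%{counter: n} = state, target),
    do: pure_loop(%{state | counter: n + 1}, target)

  # Monad-style baseline: a computation is fn state -> {val, state} end.
  def pure(val), do: fn state -> {val, state} end
  def get, do: fn state -> {state, state} end
  def put(new_state), do: fn _state -> {:ok, new_state} end

  # Run the first computation, then feed its value to f and run the result.
  def bind(comp, f) do
    fn state ->
      {val, state2} = comp.(state)
      f.(val).(state2)
    end
  end

  # The counter loop: get, then either stop or put(n + 1) and repeat.
  def monad_loop(target) do
    bind(get(), fn n ->
      if n >= target do
        pure(n)
      else
        bind(put(n + 1), fn _ -> monad_loop(target) end)
      end
    end)
  end

  def run(comp, initial), do: comp.(initial)
end

IO.inspect(BaselineSketch.pure_rec(1000))
IO.inspect(BaselineSketch.run(BaselineSketch.monad_loop(1000), 0))
```

Both loops do the same get/put-style work per iteration; the monad version pays for allocating and calling closures on every step, which is consistent with the gap between the Pure/Rec and Monad columns.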

Iteration Strategies

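| Target | FxFasterList             | FxList               | Yield                |
|--------|--------------------------|----------------------|----------------------|
| 1000   | 97 us (0.10 us/op)       | 200 us (0.20 us/op)  | 147 us (0.15 us/op)  |
| 5000   | 492 us (0.10 us/op)      | 959 us (0.19 us/op)  | 762 us (0.15 us/op)  |
| 10000  | 1.02 ms (0.10 us/op)     | 2.71 ms (0.27 us/op) | 1.52 ms (0.15 us/op) |
| 50000  | 5.1 ms (0.10 us/op)      | -                    | 7.58 ms (0.15 us/op) |
| 100000 | 10.02 ms (0.10 us/op)    | -                    | 14.9 ms (0.15 us/op) |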

Iteration options:

  • FxFasterList - Uses Enum.reduce_while, fastest option (~2x faster than FxList)
  • FxList - Uses Comp.bind chains, supports full Yield/Suspend resume semantics
  • Yield - Coroutine-style suspend/resume, use when you need interruptible iteration

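FxFasterList and Yield hold a constant per-operation cost as N grows (0.10 and 0.15 us/op across the whole range). FxList stays between 0.19 and 0.27 us/op up to N = 10000 and was not measured beyond that.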
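The per-operation figures in the table are simply total time divided by the target count. A small sketch of that arithmetic, using the totals from the table above converted to microseconds (the PerOp module is a hypothetical helper, not part of Skuld):

```elixir
defmodule PerOp do
  # Total run time in microseconds divided by the iteration count,
  # rounded to two decimal places.
  def us_per_op(total_us, n), do: Float.round(total_us / n, 2)
end

# {target, total time in us}, copied from the Iteration Strategies table.
timings = %{
  fx_faster_list: [{1_000, 97}, {5_000, 492}, {10_000, 1_020}, {50_000, 5_100}, {100_000, 10_020}],
  fx_list: [{1_000, 200}, {5_000, 959}, {10_000, 2_710}],
  yield: [{1_000, 147}, {5_000, 762}, {10_000, 1_520}, {50_000, 7_580}, {100_000, 14_900}]
}

for {strategy, rows} <- timings, {n, total_us} <- rows do
  IO.puts("#{strategy} N=#{n}: #{PerOp.us_per_op(total_us, n)} us/op")
end
```

A flat us/op column is what linear scaling looks like; a quadratic blowup would show the per-operation figure growing in proportion to N.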

Key Takeaways

  1. CPS overhead is minimal - Evf/CPS is only ~1.1x slower than direct-style Evf
  2. Skuld overhead (~7x vs Evf/CPS at N = 10000, roughly 4-8x across the Core Benchmark rows, comparing Skuld/Nest) comes from scoped handlers, exception handling, and auto-lifting
  3. FxFasterList is the fastest iteration strategy when you don't need Yield semantics
  4. Per-op cost is constant - no quadratic blowup at scale
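The ratios quoted in the takeaways can be recomputed from the Core Benchmark table. The sketch below does this for the Evf, Evf/CPS, and Skuld/Nest columns; the variable names are illustrative only, and the millisecond entries are converted to microseconds.

```elixir
# {target, Evf us, Evf/CPS us, Skuld/Nest us}, copied from the Core Benchmark table.
rows = [
  {500, 17, 17, 141},
  {1_000, 56, 58, 255},
  {2_000, 91, 97, 558},
  {5_000, 244, 258, 1_420},
  {10_000, 298, 325, 2_300}
]

# For each row: CPS overhead relative to Evf, and Skuld/Nest relative to Evf/CPS.
ratios =
  for {n, evf, cps, nest} <- rows do
    {n, Float.round(cps / evf, 2), Float.round(nest / cps, 1)}
  end

for {n, cps_ratio, skuld_ratio} <- ratios do
  IO.puts("N=#{n}: Evf/CPS is #{cps_ratio}x Evf, Skuld/Nest is #{skuld_ratio}x Evf/CPS")
end
```

On these numbers the CPS ratio stays between 1.0x and 1.1x, while the Skuld/Nest ratio varies between roughly 4x and 8x depending on the row, so the single-figure summaries are best read as approximations.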

Real-World Perspective

These benchmarks represent a worst-case scenario where computations do almost nothing except exercise the effects machinery. In practice, algebraic effects compose real work - serialization, domain calculations, transcoding - where actual computation dominates execution time.

For example, JSON encoding a moderate payload takes 10-100us, and domain validation or business logic involves similar compute. At Skuld's ~0.1us per operation (the FxFasterList figure above), a few dozen effect operations add only a few microseconds, which is small next to real workloads of that size. The architectural benefits - testability, composability, separation of concerns - far outweigh the microsecond-level cost.
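As a rough back-of-the-envelope check, the sketch below estimates effect overhead as a percentage of useful work. It assumes the ~0.1us per-operation figure quoted above and treats the operation count and work time as inputs you supply; OverheadBudget is a hypothetical helper, not part of Skuld.

```elixir
defmodule OverheadBudget do
  # Approximate per-operation cost in microseconds, taken from the text above.
  @effect_cost_us 0.1

  # Effect overhead as a percentage of the time spent on real work.
  def overhead_pct(effect_ops, work_us) do
    Float.round(effect_ops * @effect_cost_us / work_us * 100, 1)
  end
end

# 30 effect operations wrapped around 100us of real work (e.g. JSON encoding).
IO.puts("#{OverheadBudget.overhead_pct(30, 100)}% overhead")
```

The same function makes it easy to see where the estimate stops being negligible: the percentage grows as the real work per effect operation shrinks toward the microsecond range.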