murmur_nif

Erlang NIF wrapper around MurmurHash3 (x64_128) with a Cassandra-compatible signed-byte variant for token-aware routing against Cassandra and Scylla.

Why

Replaces git-ref dependencies on hand-rolled Murmur3 NIF forks. It provides a modern build (erl options ordered so the -eval probe works on OTP 27+, -undefined dynamic_lookup on macOS) and scheduler-friendly execution (dirty-scheduler dispatch plus enif_consume_timeslice accounting on the inline path). It is tested against OTP 25-28 in CI and published to hex.pm.

Install

{deps, [{murmur_nif, "0.1.0"}]}.

Requires a C compiler (cc) on the build host, which is usually already present on hosts that build Erlang projects.

API

-spec murmur_nif:murmur3_x64_128(binary())           -> binary().
-spec murmur_nif:murmur3_cassandra_x64_128(binary()) -> binary().

Both functions use seed 0 and return a 16-byte binary holding the 128-bit hash.

1> murmur_nif:murmur3_x64_128(<<"hello">>).
<<2,155,189,65,179,167,216,203,25,29,174,72,106,144,30,91>>
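The byte order of the returned binary is not stated here. The sketch below assumes each 64-bit half is stored little-endian, low word (h1) first, and decodes the 16 bytes from the shell session above under that assumption. The helper names read_le64, hash_h1 and hash_h2 are hypothetical and not part of murmur_nif.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The 16 bytes printed for <<"hello">> in the shell session above. */
const unsigned char hello_hash[16] = {
    2, 155, 189, 65, 179, 167, 216, 203,
    25, 29, 174, 72, 106, 144, 30, 91
};

/* Read 8 bytes as an unsigned little-endian 64-bit word. */
uint64_t read_le64(const unsigned char *p) {
    assert(p != NULL);
    uint64_t v = 0;
    for (size_t i = 0; i < 8; i++) {
        v |= (uint64_t)p[i] << (8 * i);
    }
    return v;
}

/* ASSUMPTION: the first 8 bytes hold h1 and the last 8 hold h2,
   each in little-endian order. */
uint64_t hash_h1(const unsigned char hash[16]) { return read_le64(hash); }
uint64_t hash_h2(const unsigned char hash[16]) { return read_le64(hash + 8); }
```

Under this assumption the example decodes to h1 = 0xcbd8a7b341bd9b02 and h2 = 0x5b1e906a48ae1d19. Cassandra's Murmur3Partitioner derives its token from the first 64-bit word as a signed long, so with this layout a token would be h1 reinterpreted as a signed 64-bit integer.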

Which variant to use

  • murmur3_x64_128/1 -- Austin Appleby's standard MurmurHash3 x64_128. Use for general-purpose hashing.
  • murmur3_cassandra_x64_128/1 -- Cassandra/Scylla-compatible variant. The input bytes are interpreted as signed (matching Java's signed byte type), which changes the sign-extension of the tail-block accumulator and produces hashes that match Cassandra's partitioner. Use to compute partition tokens for token-aware routing.

For pure-ASCII inputs (all bytes < 128) the two variants produce identical output. They can diverge only when the input contains a byte with its high bit set (>= 128).
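The signed-byte difference can be illustrated with a minimal sketch of how one 64-bit tail word is accumulated. This is not the code in c_src/murmur3/; the function names are hypothetical, and the sketch handles a single tail word of up to 8 bytes rather than the full 15-byte tail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard MurmurHash3 style: each tail byte is zero-extended before shifting. */
uint64_t tail_word_unsigned(const uint8_t *tail, size_t len) {
    assert(len <= 8);
    uint64_t k = 0;
    for (size_t i = 0; i < len; i++) {
        k ^= (uint64_t)tail[i] << (8 * i);
    }
    return k;
}

/* Cassandra style: each tail byte is treated like Java's signed byte and
   sign-extended to 64 bits before shifting. The uint8_t -> int8_t cast relies
   on two's-complement wraparound, which mainstream compilers provide. */
uint64_t tail_word_signed(const uint8_t *tail, size_t len) {
    assert(len <= 8);
    uint64_t k = 0;
    for (size_t i = 0; i < len; i++) {
        k ^= (uint64_t)(int64_t)(int8_t)tail[i] << (8 * i);
    }
    return k;
}
```

For a tail of "abc" both functions return the same word. For a single byte 0x80 the unsigned version yields 0x80 while the signed version yields 0xFFFFFFFFFFFFFF80, which is why the two hashes differ for such inputs.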

Behaviour notes

  • Dirty CPU scheduler for inputs above 20 KB. In practice hash inputs are small (partition keys are typically tens to hundreds of bytes), but the threshold protects against scheduler hogs on large inputs.
  • Inline path reduction accounting via enif_consume_timeslice, proportional to bytes processed. Cost model: ~500 bytes/reduction (calibrated for ~5 GB/s hash throughput), 4000-reduction timeslice.
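The arithmetic behind these two notes can be sketched as follows. This is an illustration built only from the figures quoted above, not the NIF's actual source: the function names are hypothetical, "20 KB" is assumed to mean 20 * 1024 bytes, and the 1..100 clamp reflects the percentage range that enif_consume_timeslice accepts.

```c
#include <assert.h>
#include <stddef.h>

/* Figures quoted in the behaviour notes. */
#define DIRTY_THRESHOLD_BYTES ((size_t)20 * 1024) /* ASSUMPTION: KB = 1024 bytes */
#define BYTES_PER_REDUCTION   ((size_t)500)
#define REDUCTIONS_PER_SLICE  ((size_t)4000)

/* Returns 1 when the input is large enough to go to a dirty CPU scheduler. */
int needs_dirty_scheduler(size_t input_bytes) {
    return input_bytes > DIRTY_THRESHOLD_BYTES;
}

/* Percentage of a timeslice to report for an inline call, clamped to 1..100. */
int timeslice_percent(size_t input_bytes) {
    size_t reductions = input_bytes / BYTES_PER_REDUCTION;
    size_t percent = (reductions * 100) / REDUCTIONS_PER_SLICE;
    if (percent < 1) percent = 1;
    if (percent > 100) percent = 100;
    assert(percent >= 1 && percent <= 100);
    return (int)percent;
}
```

With these numbers a full timeslice corresponds to 500 * 4000 = 2,000,000 bytes, or roughly 0.4 ms of hashing at 5 GB/s. An input right at the 20 KB threshold costs about 40 reductions, so the inline path would report the minimum of 1%.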

Build

rebar3 compile runs c_src/build.sh, which:

  • Resolves ERTS_INCLUDE_DIR via erl -noshell -eval ... -s init stop (option order is correct for OTP 27+).
  • Compiles c_src/murmur_nif.c + c_src/murmur3/murmur3.c with -O3 -march=native.
  • Outputs priv/murmur_nif.so.

Env vars honored:

Var                   Effect
ERTS_INCLUDE_DIR      Skip the erl probe; use this path for erl_nif.h.
CC                    Compiler (default cc).
CFLAGS                Extra flags appended after defaults.
MURMUR_NIF_NO_NATIVE  If set, omit -march=native/-mtune=native (use for portable cross-platform builds).

License

The Erlang wrapper code (src/, c_src/murmur_nif.c) is MIT.

The MurmurHash3 algorithm in c_src/murmur3/ was written by Austin Appleby and placed in the public domain. The Cassandra-compatible variant uses signed integer arithmetic to match Java's reference implementation; the algorithmic modification is trivial enough to remain in the public domain alongside the upstream code.