Elixir Mutation Testing Libraries: Muex vs Darwin vs Exavier

View Source

A comprehensive comparison of the three Elixir mutation testing libraries.

Executive Summary

Muex, Darwin, and Exavier are all mutation testing tools for the BEAM ecosystem. They share the same core idea -- introduce deliberate bugs into code and verify that tests catch them -- but differ significantly in architecture, maturity, feature breadth, and maintenance status. Darwin and Exavier were both created in 2019 and have been unmaintained since late 2020. Muex is actively developed (2026) and represents a generational leap in features and design.

Project Vitals

Muex

  • Version: 0.5.0 (March 2026, 8 published releases)
  • Elixir requirement: ~> 1.14
  • License: MIT
  • Hex downloads: 304 all-time
  • GitHub stars: New project
  • Maintainer: Aleksei Matiushkin (mudasobwa)
  • Last activity: March 2026 (actively maintained)
  • Dependencies: jason ~> 1.4 (runtime), plus dev/test tooling (credo, dialyxir, excoveralls, ex_doc)
  • LOC (lib): ~3,900
  • LOC (tests): ~2,300 (204 passing tests)
  • CI: GitHub Actions with matrix (Elixir 1.14-1.16, OTP 25-26)

Darwin

  • Version: 0.1.0 (only version ever published)
  • Elixir requirement: ~> 1.7
  • License: Not specified
  • Hex downloads: Listed on Hex but minimal
  • GitHub stars: 12
  • Maintainer: tmbb (sole contributor)
  • Last activity: December 2020 (abandoned)
  • Dependencies: parse_trans ~> 3.3, makeup_elixir ~> 0.14, plus benchee, stream_data, ex_doc (dev)
  • LOC (lib): ~15,000+ (estimated from file count and sizes)
  • CI: None

Exavier

  • Version: 0.3.0 (November 2020, 8 releases)
  • Elixir requirement: ~> 1.7
  • License: MIT
  • Hex downloads: 2,243 all-time (most popular of the three)
  • GitHub stars: 101
  • Maintainers: 4 contributors (dnlserrano, Cantido, KingOfRostov, tank-bohr)
  • Last activity: November 2020 (abandoned)
  • Dependencies: None (runtime), ex_doc ~> 0.23 (dev only)
  • LOC (lib): ~1,500 (estimated)
  • CI: Travis CI

Architecture

Muex: Plugin-Based, Language-Agnostic

Muex uses a behaviour-based plugin architecture with clear separation:

  1. Muex.Language behaviour -- Defines interface for language adapters (parse/1, unparse/1, compile/2, file_extensions/0, test_file_pattern/0)
  2. Muex.Mutator behaviour -- Defines interface for mutation strategies (mutate/2, name/0, description/0)
  3. Muex.Loader -- File discovery with glob patterns
  4. Muex.Compiler -- AST mutation application and hot-swapping
  5. Muex.Runner -- Test execution via port-based isolation
  6. Muex.WorkerPool -- GenServer-based parallel execution
  7. Muex.FileAnalyzer -- Intelligent file filtering via code analysis
  8. Muex.MutantOptimizer -- 7-strategy mutation reduction heuristics
  9. Muex.DependencyAnalyzer -- Test dependency graph for targeted execution
  10. Muex.Reporter / Muex.Reporter.Html / Muex.Reporter.Json -- Multi-format reporting
  11. Muex.Config -- Centralized configuration with CLI parsing
  12. Muex.CLI -- Escript entry point

Key design: mutations happen at the Elixir AST level, language adapters provide parse/unparse/compile, and test execution runs in isolated port processes (separate BEAM VM per mutation).

Darwin: Erlang Abstract Code Approach

Darwin takes a fundamentally different approach -- it works at the Erlang abstract code level:

  1. Converts Elixir source to Erlang abstract forms via Elixir-to-Erlang transpilation
  2. Applies mutations at the Erlang abstract code level
  3. Uses a "codon" model inspired by genetics -- mutation points are named codons
  4. Uses process dictionary for thread-local mutation activation (Darwin.ActiveMutation)
  5. Injects runtime dispatch: mutated code contains darwin_was_here/N calls that branch on the active mutation at runtime
  6. Re-runs ExUnit for each mutation using Darwin.TestCase to short-circuit after first failure

This approach means Darwin does not re-compile per mutation -- it compiles once with all mutation points embedded, then activates them one at a time via the process dictionary. This is architecturally clever but comes with its own trade-offs (code bloat from injected dispatchers, mandatory test modification).

Exavier: Direct AST Rewriting

Exavier uses the simplest approach:

  1. Runs code coverage analysis to determine which lines to mutate
  2. For each module (in parallel), for each mutator (sequentially):
  3. Uses Code.compile_quoted/2 with ignore_module_conflict: true for hot-swap
  4. GenServer-based reporter tracks results

The architecture is straightforward but tightly coupled to ExUnit internals and lacks isolation between mutations.

Language Support

Muex

  • Elixir: Full support via Muex.Language.Elixir adapter
  • Erlang: Full support via Muex.Language.Erlang adapter (uses :erl_scan, :erl_parse, :erl_prettypr, :compile)
  • Extensible: Any BEAM language can be supported by implementing the Muex.Language behaviour (5 callbacks)
  • Custom adapters: Registerable via config :muex, languages: %{"lua" => MyApp.Language.Lua} at compile time

Darwin

  • Elixir only (despite working through Erlang abstract code internally)
  • No formal language adapter system
  • Erlang abstract code is an internal implementation detail, not an extensibility point

Exavier

  • Elixir only
  • No language adapter system
  • Tightly coupled to Elixir AST and ExUnit

Mutation Strategies

Muex (6 strategies, 30+ individual mutations)

MutatorMutations
Arithmetic+ <-> -, * <-> /, + -> 0, - -> 0, * -> 1, / -> 1
Comparison== <-> !=, > <-> <, > <-> >=, < <-> <=, >= <-> <=, === <-> !==
Booleanand <-> or, && <-> ||, true <-> false, not x -> x
Literalnumbers +/-1, strings empty/append, empty list -> [:mutated], atoms -> :mutated_atom
FunctionCallRemove calls (replace with nil), swap first two arguments
ConditionalInvert if condition, always-true/always-false branches, unless -> if, remove if

All mutators are selectable via CLI (--mutators arithmetic,comparison). Custom mutators registerable via compile-time config.

Darwin (AOR/ROR + operator mutations)

Darwin uses pitest-inspired naming (AOR = Arithmetic Operator Replacement, ROR = Relational Operator Replacement) and implements mutations via Erlang abstract code transformations. The mutation approach embeds all possible mutations into the code at compile time and selects them at runtime. Specific mutators include:

  • Arithmetic operator replacement (+, -, *, /)
  • Relational operator replacement (==, /=, <, >, >=, =<)
  • Guard rewriting for mutated guards

Darwin's mutator system is based on a @callback mutate/2 that transforms Erlang abstract code forms. Mutators are registered in a default list. The codon-based system means each mutation point gets a unique index.

Exavier (13 mutators)

MutatorBased On
AOR1-AOR4Arithmetic operator replacement (pitest-style)
ROR1-ROR5Relational operator replacement (pitest-style)
IfTrueReplace if condition with true
NegateConditionalsNegate conditional operators
ConditionalsBoundaryChange boundary conditions (> to >=, etc.)
InvertNegativesRemove unary minus

Custom mutators supported via .exavier.exs config file. No boolean mutators, no literal mutators, no function call mutators.

Summary

  • Muex: Broadest mutation coverage. Unique strategies: literal mutation, function call removal/arg swapping, conditional branch removal. Named descriptively (not pitest codes).
  • Darwin: Erlang-level mutations with codon-indexed dispatch. Architecturally novel but harder to reason about.
  • Exavier: pitest-faithful naming, decent arithmetic/relational coverage, but missing boolean, literal, function call, and advanced conditional mutations.

Test Execution Model

Muex: Port-Based Isolation

  • Each mutation is tested in a separate BEAM VM spawned via Erlang ports
  • The worker pool writes the mutated source to the original file, deletes the .beam cache, runs mix test in a subprocess, then restores the original
  • Complete process isolation prevents any mutation side-effects from leaking
  • Supports incremental compilation (only the mutated module is recompiled)
  • Configurable concurrency via --concurrency flag and Muex.WorkerPool GenServer
  • Configurable timeout per mutation (--timeout)
  • Test dependency analysis selects only relevant test files per mutation

Darwin: In-Process Runtime Dispatch

  • Code is compiled once with all mutations embedded as runtime branches
  • Active mutation is selected via process dictionary (Darwin.ActiveMutation)
  • ExUnit runs within the same BEAM VM
  • Darwin.TestCase overrides test/2 and test/3 macros to short-circuit after first failure
  • Pro: No re-compilation overhead per mutation (fastest possible switching)
  • Con: Requires modifying test files (use Darwin.TestCase), modifying test_helper.exs, and listing modules in mix.exs

Exavier: In-Process Hot-Swap

  • Uses Code.compile_quoted/2 with ignore_module_conflict: true
  • Re-requires test files and calls ExUnit.run() per mutation
  • Parallel per module via Task.async_stream
  • Sequential per mutator within each module
  • No process isolation -- mutations share the same VM
  • Coverage analysis as a pre-processing step to identify lines to mutate

Configuration and CLI

Muex

Muex offers the richest CLI and configuration:

  • Entry points: mix muex (Mix task), muex (escript binary), mix archive.install hex muex (global archive)
  • File selection: --files with directory, file, or glob patterns (**/*.ex, {a,b}/**/*.ex)
  • Umbrella support: --app my_app automatically sets --files and --test-paths
  • Test paths: --test-paths "test/unit,test/integration" with glob expansion
  • Output formats: --format terminal|json|html
  • Score threshold: --fail-at 80 (CI/CD integration)
  • Filtering: --no-filter, --min-score, --max-mutations
  • Optimization: --optimize, --optimize-level conservative|balanced|aggressive, --min-complexity, --max-per-function
  • Compile-time config: Register custom language adapters and mutators in config/config.exs
  • Centralized config struct: %Muex.Config{} with typed fields and validation

Darwin

  • Modules to mutate listed in mix.exs under the :darwin key
  • No CLI flags documented
  • Requires modifying test_helper.exs and all test modules
  • No filtering, no optimization, no output format selection
  • HTML reporter outputs to darwin/reports/html/

Exavier

  • Run via mix exavier.test
  • Configuration via .exavier.exs dotfile:
    • :threshold -- mutation coverage threshold (default: 67%)
    • :test_files_to_modules -- custom test-to-module mapping
    • :custom_mutators -- additional mutator modules
  • No CLI flags for file selection, mutator selection, or concurrency

Intelligent Features

Muex (Unique)

File Analyzer (Muex.FileAnalyzer):

  • Scores files 0-100 based on code characteristics
  • Automatically excludes: Mix tasks, supervisors, application modules, behaviour definitions, protocols, reporters, dependency code
  • Scores based on: function count, conditionals, arithmetic, comparisons, pattern matching, cyclomatic complexity
  • Configurable minimum score threshold (--min-score)

Mutation Optimizer (Muex.MutantOptimizer):

  • 7 optimization strategies: equivalent mutant detection, impact scoring, complexity filtering, mutation clustering, per-function limits, boundary prioritization, pattern-based filtering
  • 3 presets: conservative (50-65% reduction, <1% score impact), balanced (70-85% reduction), aggressive (85-95% reduction)
  • Benchmark on Calculator project (76 LOC, 20 tests): 85 mutations reduced to 31 (63.5% reduction), score preserved at 100%

Test Dependency Analysis (Muex.DependencyAnalyzer):

  • Parses test files to extract module references (aliases, imports, function calls, describe/test strings)
  • Runs only tests that depend on the mutated module
  • Falls back to full test suite when no dependencies found

Darwin

  • Fast-fail: stops test suite after first failure per mutation (via Darwin.TestCase)
  • Debug output: writes mutated Erlang and Elixir source to _darwin_debug/

Exavier

  • Code coverage pre-processing: only mutates lines that are covered by tests
  • Threshold-based pass/fail (default 67%)

Reporting

Muex

  • Terminal: Color-coded output (ANSI), progress dots, categorized summary (killed/survived/invalid/timeout), survived mutation details with file:line
  • JSON: Structured JSON with full mutation details, CI/CD friendly (muex-report.json)
  • HTML: Interactive report with filter buttons, summary cards, color-coded mutations, responsive design (muex-report.html)

Darwin

  • Console logging: green (killed), red (survived)
  • HTML reporter: outputs to darwin/reports/html/ (described as "under heavy development")

Exavier

  • Console only: dots for progress (green = killed, red = survived)
  • Diff-style output showing original vs. mutated code for survived mutations
  • Summary line with percentages

Benchmark: Calculator Project

Test subject: Calculator module (76 LOC, 4 functions: add, subtract, multiply, divide with guards and error handling). 20 ExUnit tests.

Muex Results

Without optimization (--no-optimize --no-filter):

  • Mutations generated: 85
  • Killed: 85 (100%)
  • Survived: 0
  • Invalid: 0
  • Wall time: ~10.8s

With conservative optimization (--optimize --optimize-level conservative):

  • Mutations generated: 85 -> 31 after optimization (63.5% reduction)
  • Killed: 31 (100%)
  • Survived: 0
  • Wall time: ~4.2s (61% faster)

Darwin / Exavier

Both Darwin and Exavier have been abandoned since 2020 and require Elixir ~> 1.7. They cannot be installed in a modern Elixir 1.17+ project without dependency conflicts. Darwin's parse_trans dependency and Exavier's compilation model are incompatible with current Elixir/OTP releases. Therefore, direct benchmark comparison on the same codebase is not feasible.

Historical context: Exavier's README shows an example of 22 tests producing 27.27% mutation coverage on a simple hello-world module, with no timing data provided. Darwin provides no benchmark data.

Code Quality and Testing

Muex

  • 204 passing tests covering all major components
  • Test coverage for: Config, DependencyAnalyzer, Loader, all 6 mutators, Reporter, JSON Reporter, TestRunner.Port, WorkerPool, Language.Elixir, integration tests
  • Quality pipeline: mix quality runs formatter + credo --strict + dialyzer
  • CI matrix: Elixir 1.14-1.16, OTP 25-26
  • Typespecs on all public functions
  • @moduledoc and @doc on all modules and public functions
  • Documentation published on HexDocs with guides (Installation, Usage, Mutation Optimization)

Darwin

  • Tests present but minimal (uses stream_data for property-based testing of mutators)
  • No CI pipeline
  • No typespecs visible in main modules
  • Documentation incomplete (many TODO: document this comments)

Exavier

  • Self-described as "proof-of-concept" with a lengthy "To be done" list
  • Has tests but author admits they need "way more tests (OMG the irony)"
  • Travis CI (now defunct service)
  • Minimal typespecs
  • Published on HexDocs

Extensibility

Muex

  • Language adapters: Implement Muex.Language behaviour (5 callbacks), register via config
  • Mutators: Implement Muex.Mutator behaviour (3 callbacks), register via config
  • Compile-time registration: config :muex, languages: %{...}, mutators: %{...}
  • Programmatic API: Muex.run(%Muex.Config{}) returns {:ok, %{results: [...], score: float}}
  • Three installation modes: Mix dependency, hex archive (global), escript binary
  • Umbrella support: --app flag for targeting specific apps

Darwin

  • Mutators implement Darwin.Mutator callback, registered in default list
  • No compile-time or runtime registration mechanism for end users
  • No programmatic API
  • Mix dependency only

Exavier

  • Custom mutators via Exavier.Mutators.Mutator behaviour (2 callbacks)
  • Registration via .exavier.exs config file
  • No programmatic API
  • Mix dependency only

Integration and Deployment

Muex

  • CI/CD: JSON output format for machine consumption, --fail-at for threshold gates
  • Three distribution modes: dependency, archive, escript
  • Umbrella-aware with --app flag
  • Custom test path selection for monorepo/polyrepo setups

Darwin

  • Requires invasive changes to project (modify mix.exs, test_helper.exs, all test files)
  • HTML report generation
  • No CI/CD integration features

Exavier

  • Threshold-based exit code (pass/fail)
  • No JSON/HTML output
  • No CI/CD-specific features

Known Limitations

Muex

  • Port-based execution adds overhead per mutation (~100-200ms per test run vs in-process)
  • File-system based mutation (writes to source files temporarily, though with backup/restore)
  • Young project with growing download base

Darwin

  • Requires modifying all test files with use Darwin.TestCase
  • Requires modifying test_helper.exs
  • Requires listing all modules to mutate in mix.exs
  • Single contributor, abandoned since December 2020
  • Erlang abstract code approach means mutations may not perfectly map to Elixir source
  • No license specified

Exavier

  • Self-described proof-of-concept
  • Cannot tune which mutators are used (planned but unimplemented)
  • No parallel mutation within a module
  • Relies on ignore_module_conflict: true with no isolation
  • Hardcoded test file discovery (test/**/*_test.exs)
  • Abandoned since November 2020
  • No fast-fail mechanism (planned but unimplemented)

Feature Matrix

FeatureMuexDarwinExavier
StatusActive (2026)Abandoned (2020)Abandoned (2020)
Elixir supportYesYesYes
Erlang supportYesNoNo
Custom language adaptersYes (behaviour)NoNo
Arithmetic mutationsYes (6)YesYes (AOR1-4)
Comparison mutationsYes (12)YesYes (ROR1-5)
Boolean mutationsYes (7)NoNo
Literal mutationsYes (8)NoNo
Function call mutationsYes (2)NoNo
Conditional mutationsYes (8)NoYes (3)
Custom mutatorsYes (compile-time config)Yes (code-level)Yes (.exavier.exs)
Mutator selectionCLI flagNoNo (planned)
Process isolationPort (separate VM)In-process (pdict)In-process (hot-swap)
Parallel executionGenServer worker poolNoPer-module
Configurable concurrencyYes (--concurrency)NoNo
Intelligent file filteringYes (FileAnalyzer)NoNo
Mutation optimizationYes (7 strategies, 3 presets)NoNo
Test dependency analysisYesNoNo
Coverage-based filteringVia FileAnalyzerNoYes (pre-processing)
Umbrella supportYes (--app flag)NoNo
Custom test pathsYes (--test-paths)NoNo
Terminal outputColor-coded, progress dotsColor-codedColor-coded, diffs
JSON outputYesNoNo
HTML outputYes (interactive)Yes (basic)No
Score thresholdYes (--fail-at)NoYes (threshold in config)
Mix taskYes (mix muex)No (programmatic)Yes (mix exavier.test)
EscriptYesNoNo
Hex archive installYesNoNo
Programmatic APIYes (Muex.run/1)Yes (Mutator.mutate_compile_and_load_module/1)No
DocumentationHexDocs + guidesHexDocs (incomplete)HexDocs
TypespecsComprehensivePartialMinimal
Test suite204 testsMinimalMinimal
CI/CDGitHub Actions matrixNoneTravis CI (defunct)

Verdict

Muex is the only actively maintained option and offers by far the most complete feature set. Its language-agnostic architecture, intelligent file filtering, mutation optimization, test dependency analysis, multiple output formats, and umbrella support make it suitable for production CI/CD pipelines. The trade-off is port-based execution overhead, which the optimizer mitigates effectively (63.5% mutation reduction with no score impact on the calculator benchmark).

Darwin introduced an innovative compile-once-dispatch-at-runtime approach that eliminates per-mutation compilation overhead. This is architecturally interesting but comes at the cost of invasive project modifications and an Erlang-abstract-code complexity that makes the codebase harder to extend. It has been abandoned for over 5 years.

Exavier is the most well-known of the three (101 stars, 2,243 downloads) and pioneered Elixir mutation testing with a clean, simple design. However, the author explicitly described it as a proof-of-concept, and several planned features were never implemented. It has been abandoned for over 5 years.

For any new project considering mutation testing in 2026, Muex is the clear choice -- it is the only option that works with modern Elixir/OTP, is actively maintained, and provides the tooling depth needed for real-world adoption.