Elixir Mutation Testing Libraries: Muex vs Darwin vs Exavier

A comprehensive comparison of the three Elixir mutation testing libraries.

Executive Summary

Muex, Darwin, and Exavier are all mutation testing tools for the BEAM ecosystem. They share the same core idea -- introduce deliberate bugs into code and verify that tests catch them -- but differ significantly in architecture, maturity, feature breadth, and maintenance status. Darwin and Exavier were both created in 2019 and have been unmaintained since late 2020. Muex is actively developed (2026) and represents a generational leap in features and design.

Project Vitals

Muex

Version: 0.5.0 (March 2026, 8 published releases)
Elixir requirement: ~> 1.14
License: MIT
Hex downloads: 304 all-time
GitHub stars: New project
Maintainer: Aleksei Matiushkin (mudasobwa)
Last activity: March 2026 (actively maintained)
Dependencies: jason ~> 1.4 (runtime), plus dev/test tooling (credo, dialyxir, excoveralls, ex_doc)
LOC (lib): ~3,900
LOC (tests): ~2,300 (204 passing tests)
CI: GitHub Actions with matrix (Elixir 1.14-1.16, OTP 25-26)

Darwin

Version: 0.1.0 (only version ever published)
Elixir requirement: ~> 1.7
License: Not specified
Hex downloads: Listed on Hex but minimal
GitHub stars: 12
Maintainer: tmbb (sole contributor)
Last activity: December 2020 (abandoned)
Dependencies: parse_trans ~> 3.3, makeup_elixir ~> 0.14, plus benchee, stream_data, ex_doc (dev)
LOC (lib): ~15,000+ (estimated from file count and sizes)
CI: None

Exavier

Version: 0.3.0 (November 2020, 8 releases)
Elixir requirement: ~> 1.7
License: MIT
Hex downloads: 2,243 all-time (most popular of the three)
GitHub stars: 101
Maintainers: 4 contributors (dnlserrano, Cantido, KingOfRostov, tank-bohr)
Last activity: November 2020 (abandoned)
Dependencies: None (runtime), ex_doc ~> 0.23 (dev only)
LOC (lib): ~1,500 (estimated)
CI: Travis CI

Architecture

Muex: Plugin-Based, Language-Agnostic

Muex uses a behaviour-based plugin architecture with clear separation:

Muex.Language behaviour -- Defines interface for language adapters (parse/1, unparse/1, compile/2, file_extensions/0, test_file_pattern/0)
Muex.Mutator behaviour -- Defines interface for mutation strategies (mutate/2, name/0, description/0)
Muex.Loader -- File discovery with glob patterns
Muex.Compiler -- AST mutation application and hot-swapping
Muex.Runner -- Test execution via port-based isolation
Muex.WorkerPool -- GenServer-based parallel execution
Muex.FileAnalyzer -- Intelligent file filtering via code analysis
Muex.MutantOptimizer -- 7-strategy mutation reduction heuristics
Muex.DependencyAnalyzer -- Test dependency graph for targeted execution
Muex.Reporter / Muex.Reporter.Html / Muex.Reporter.Json -- Multi-format reporting
Muex.Config -- Centralized configuration with CLI parsing
Muex.CLI -- Escript entry point

Key design: mutations happen at the Elixir AST level, language adapters provide parse/unparse/compile, and test execution runs in isolated port processes (separate BEAM VM per mutation).
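
To make the AST-level idea concrete, here is a minimal sketch that uses only the Elixir standard library. AstMutationSketch is a hypothetical module, not part of Muex, and it flips every `+`/`-` site at once, whereas a mutation tool would normally produce one mutant per site.

```elixir
defmodule AstMutationSketch do
  @moduledoc "Illustrative only: swaps binary + and - in quoted Elixir code."

  # Parse source text into Elixir's quoted form (AST).
  def parse(source), do: Code.string_to_quoted!(source)

  # Walk the AST and flip every binary + into - and vice versa.
  # Macro.prewalk/2 descends into the children of the returned node,
  # so a flipped node is not visited (and flipped back) a second time.
  def swap_plus_minus(ast) do
    Macro.prewalk(ast, fn
      {:+, meta, [left, right]} -> {:-, meta, [left, right]}
      {:-, meta, [left, right]} -> {:+, meta, [left, right]}
      other -> other
    end)
  end

  # Turn the (mutated) AST back into source text.
  def unparse(ast), do: Macro.to_string(ast)
end

mutant = "a + b" |> AstMutationSketch.parse() |> AstMutationSketch.swap_plus_minus()
IO.puts(AstMutationSketch.unparse(mutant))
# prints: a - b
```

The parse, mutate, unparse shape mirrors the callback names listed for Muex.Language, but the real adapter's return types may differ.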

Darwin: Erlang Abstract Code Approach

Darwin takes a fundamentally different approach -- it works at the Erlang abstract code level:

Converts Elixir source to Erlang abstract forms via Elixir-to-Erlang transpilation
Applies mutations at the Erlang abstract code level
Uses a "codon" model inspired by genetics -- mutation points are named codons
Uses the process dictionary for process-local mutation activation (Darwin.ActiveMutation)
Injects runtime dispatch: mutated code contains darwin_was_here/N calls that branch on the active mutation at runtime
Re-runs ExUnit for each mutation using Darwin.TestCase to short-circuit after first failure

This approach means Darwin does not re-compile per mutation -- it compiles once with all mutation points embedded, then activates them one at a time via the process dictionary. This is architecturally clever but comes with its own trade-offs (code bloat from injected dispatchers, mandatory test modification).
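
The compile-once, select-at-runtime idea can be sketched in a few lines. This is not Darwin's generated code: RuntimeDispatchSketch, its :active_mutation key, and the numbering of mutations are all hypothetical, and the branches are written by hand rather than injected into Erlang abstract code.

```elixir
defmodule RuntimeDispatchSketch do
  # Hypothetical process-dictionary key; Darwin keeps its own bookkeeping
  # in Darwin.ActiveMutation.
  @key :active_mutation

  # Select which embedded mutation is live for the calling process.
  def activate(index), do: Process.put(@key, index)
  def deactivate, do: Process.delete(@key)

  # Stand-in for a function compiled once with two mutation points embedded.
  # Mutation 0 replaces + with -, mutation 1 replaces + with *.
  def add(a, b) do
    case Process.get(@key) do
      0 -> a - b
      1 -> a * b
      _ -> a + b
    end
  end
end

RuntimeDispatchSketch.activate(0)
IO.inspect(RuntimeDispatchSketch.add(2, 3))
# prints: -1
RuntimeDispatchSketch.deactivate()
```

Because the process dictionary is local to each process, a mutation activated in one process is not seen by code running in another, which is one reason this style of dispatch needs care when tests spawn processes.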

Exavier: Direct AST Rewriting

Exavier uses the simplest approach:

Runs code coverage analysis to determine which lines to mutate
For each module (in parallel), for each mutator (sequentially):
- Rewrites the quoted AST and compiles the result with Code.compile_quoted/2
- Re-requires the test file
- Runs ExUnit
Hot-swaps modules by calling Code.compile_quoted/2 with the ignore_module_conflict: true compiler option set
GenServer-based reporter tracks results

The architecture is straightforward but tightly coupled to ExUnit internals and lacks isolation between mutations.
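
A rough sketch of in-VM hot-swapping with the standard library follows. HotSwapSketch and HotSwapDemo are hypothetical names and this is not Exavier's code; it only shows how a module can be redefined in the running VM when the ignore_module_conflict compiler option is enabled.

```elixir
defmodule HotSwapSketch do
  # Build the quoted form of a tiny module whose add/2 uses the given operator.
  def definition(operator) when operator in ["+", "-"] do
    Code.string_to_quoted!("""
    defmodule HotSwapDemo do
      def add(a, b), do: a #{operator} b
    end
    """)
  end

  # Compile the quoted module into the running VM, replacing any loaded
  # version. The compiler option silences the module redefinition warning,
  # and the previous setting is restored afterwards.
  def load(quoted) do
    previous = Code.get_compiler_option(:ignore_module_conflict)
    Code.put_compiler_option(:ignore_module_conflict, true)

    try do
      Code.compile_quoted(quoted)
    after
      Code.put_compiler_option(:ignore_module_conflict, previous)
    end
  end
end

[{module, _bytecode}] = HotSwapSketch.load(HotSwapSketch.definition("+"))
IO.inspect(apply(module, :add, [2, 3]))
# prints: 5
HotSwapSketch.load(HotSwapSketch.definition("-"))
IO.inspect(apply(module, :add, [2, 3]))
# prints: -1
```

Every process in the VM sees the swapped module, which illustrates why this model offers no isolation between mutations.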

Language Support

Muex

Elixir: Full support via Muex.Language.Elixir adapter
Erlang: Full support via Muex.Language.Erlang adapter (uses :erl_scan, :erl_parse, :erl_prettypr, :compile)
Extensible: Any BEAM language can be supported by implementing the Muex.Language behaviour (5 callbacks)
Custom adapters: Registerable via config :muex, languages: %{"lua" => MyApp.Language.Lua} at compile time

Darwin

Elixir only (despite working through Erlang abstract code internally)
No formal language adapter system
Erlang abstract code is an internal implementation detail, not an extensibility point

Exavier

Elixir only
No language adapter system
Tightly coupled to Elixir AST and ExUnit

Mutation Strategies

Muex (6 strategies, 30+ individual mutations)

Mutator	Mutations
Arithmetic	`+` <-> `-`, `*` <-> `/`, `+` -> `0`, `-` -> `0`, `*` -> `1`, `/` -> `1`
Comparison	`==` <-> `!=`, `>` <-> `<`, `>` <-> `>=`, `<` <-> `<=`, `>=` <-> `<=`, `===` <-> `!==`
Boolean	`and` <-> `or`, `&&` <-> `||`, `true` <-> `false`, `not x` -> `x`
Literal	numbers +/-1, strings empty/append, empty list -> `[:mutated]`, atoms -> `:mutated_atom`
FunctionCall	Remove calls (replace with `nil`), swap first two arguments
Conditional	Invert `if` condition, always-true/always-false branches, `unless` -> `if`, remove `if`

All mutators are selectable via CLI (--mutators arithmetic,comparison). Custom mutators registerable via compile-time config.
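
As an illustration of what a mutator in this style might look like, the sketch below defines a stand-in behaviour and one implementation. MutatorSketch and ComparisonSketch are hypothetical: the callback names and arities follow the Muex.Mutator description earlier in this document, but the argument and return types are assumptions, not Muex's actual typespecs. The replacement pairs come from the Comparison row of the table above.

```elixir
defmodule MutatorSketch do
  # Hypothetical stand-in behaviour; types are assumed for illustration.
  @callback name() :: String.t()
  @callback description() :: String.t()
  # Assumed shape: one AST node in, a list of replacement nodes out.
  @callback mutate(ast :: Macro.t(), context :: map()) :: [Macro.t()]
end

defmodule ComparisonSketch do
  @behaviour MutatorSketch

  # Operator -> replacement operators, per the Comparison row of the table.
  @swaps Map.new([
           {:==, [:!=]},
           {:!=, [:==]},
           {:>, [:<, :>=]},
           {:<, [:>, :<=]},
           {:>=, [:>, :<=]},
           {:<=, [:<, :>=]},
           {:===, [:!==]},
           {:!==, [:===]}
         ])
  @operators Map.keys(@swaps)

  @impl true
  def name, do: "comparison"

  @impl true
  def description, do: "Swaps comparison operators"

  # Return one mutated node per replacement operator; anything else yields [].
  @impl true
  def mutate({op, meta, [left, right]}, _context) when op in @operators do
    for replacement <- Map.fetch!(@swaps, op), do: {replacement, meta, [left, right]}
  end

  def mutate(_node, _context), do: []
end

quote(do: x > 10)
|> ComparisonSketch.mutate(%{})
|> Enum.each(&IO.puts(Macro.to_string(&1)))
# prints: x < 10
# prints: x >= 10
```

Summing the replacement lists gives 12 directed mutations, which matches the count shown for comparison mutations in the feature matrix.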

Darwin (AOR/ROR + operator mutations)

Darwin uses pitest-inspired naming (AOR = Arithmetic Operator Replacement, ROR = Relational Operator Replacement) and implements mutations via Erlang abstract code transformations. The mutation approach embeds all possible mutations into the code at compile time and selects them at runtime. Specific mutators include:

Arithmetic operator replacement (+, -, *, /)
Relational operator replacement (==, /=, <, >, >=, =<)
Guard rewriting for mutated guards

Darwin's mutator system is based on a @callback mutate/2 that transforms Erlang abstract code forms. Mutators are registered in a default list. The codon-based system means each mutation point gets a unique index.

Exavier (13 mutators)

Mutator	Based On
AOR1-AOR4	Arithmetic operator replacement (pitest-style)
ROR1-ROR5	Relational operator replacement (pitest-style)
IfTrue	Replace `if` condition with `true`
NegateConditionals	Negate conditional operators
ConditionalsBoundary	Change boundary conditions (> to >=, etc.)
InvertNegatives	Remove unary minus

Custom mutators supported via .exavier.exs config file. No boolean mutators, no literal mutators, no function call mutators.

Summary

Muex: Broadest mutation coverage. Unique strategies: literal mutation, function call removal/arg swapping, conditional branch removal. Named descriptively (not pitest codes).
Darwin: Erlang-level mutations with codon-indexed dispatch. Architecturally novel but harder to reason about.
Exavier: pitest-faithful naming, decent arithmetic/relational coverage, but missing boolean, literal, function call, and advanced conditional mutations.

Test Execution Model

Muex: Port-Based Isolation

Each mutation is tested in a separate BEAM VM spawned via Erlang ports
The worker pool writes the mutated source to the original file, deletes the .beam cache, runs mix test in a subprocess, then restores the original
Complete process isolation prevents any mutation side-effects from leaking
Supports incremental compilation (only the mutated module is recompiled)
Configurable concurrency via --concurrency flag and Muex.WorkerPool GenServer
Configurable timeout per mutation (--timeout)
Test dependency analysis selects only relevant test files per mutation
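
The write, test, restore cycle described in this list can be sketched as follows. FileSwapSketch is a hypothetical module, not Muex's worker; it uses System.cmd/3 for brevity where Muex is described as port-based, and it omits the .beam cleanup and timeout handling.

```elixir
defmodule FileSwapSketch do
  # Write `mutated_source` to `path`, run `test_fun`, and always restore the
  # original contents, even if `test_fun` raises.
  def with_mutant(path, mutated_source, test_fun) when is_function(test_fun, 0) do
    original = File.read!(path)
    File.write!(path, mutated_source)

    try do
      test_fun.()
    after
      File.write!(path, original)
    end
  end

  # Run the given test files in a separate OS process. A non-zero exit status
  # means at least one test failed, i.e. the mutant was killed.
  def run_mix_test(test_files) do
    {_output, exit_status} =
      System.cmd("mix", ["test" | test_files], stderr_to_stdout: true)

    if exit_status == 0, do: :survived, else: :killed
  end
end
```

A call such as `FileSwapSketch.with_mutant("lib/calculator.ex", mutated, fn -> FileSwapSketch.run_mix_test(["test/calculator_test.exs"]) end)` would tie the two together; the paths here are placeholders.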

Darwin: In-Process Runtime Dispatch

Code is compiled once with all mutations embedded as runtime branches
Active mutation is selected via process dictionary (Darwin.ActiveMutation)
ExUnit runs within the same BEAM VM
Darwin.TestCase overrides test/2 and test/3 macros to short-circuit after first failure
Pro: No re-compilation overhead per mutation (fastest possible switching)
Con: Requires modifying test files (use Darwin.TestCase), modifying test_helper.exs, and listing modules in mix.exs

Exavier: In-Process Hot-Swap

Uses Code.compile_quoted/2 with the ignore_module_conflict: true compiler option set
Re-requires test files and calls ExUnit.run() per mutation
Parallel per module via Task.async_stream
Sequential per mutator within each module
No process isolation -- mutations share the same VM
Coverage analysis as a pre-processing step to identify lines to mutate
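
The scheduling shape in this list (modules in parallel, mutators in sequence within each module) can be sketched with Task.async_stream/3. ScheduleSketch and its `run_one` callback are hypothetical, not Exavier's code. One plausible reason for the sequential inner loop is that a hot-swapped module name can hold only one mutant at a time in a shared VM.

```elixir
defmodule ScheduleSketch do
  # `run_one` is a hypothetical 2-arity function that applies one mutator to
  # one module and returns a result.
  def run(modules, mutators, run_one) when is_function(run_one, 2) do
    modules
    # One task per module, so different modules are processed concurrently.
    |> Task.async_stream(
      fn module ->
        # Within a module, mutators run one after another.
        {module, Enum.map(mutators, fn mutator -> run_one.(module, mutator) end)}
      end,
      timeout: :infinity
    )
    |> Enum.map(fn {:ok, result} -> result end)
  end
end
```

Results come back in the same order as the input modules, since Task.async_stream/3 is ordered by default.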

Configuration and CLI

Muex

Muex offers the richest CLI and configuration:

Entry points: mix muex (Mix task), muex (escript binary), mix archive.install hex muex (global archive)
File selection: --files with directory, file, or glob patterns (**/*.ex, {a,b}/**/*.ex)
Umbrella support: --app my_app automatically sets --files and --test-paths
Test paths: --test-paths "test/unit,test/integration" with glob expansion
Output formats: --format terminal|json|html
Score threshold: --fail-at 80 (CI/CD integration)
Filtering: --no-filter, --min-score, --max-mutations
Optimization: --optimize, --optimize-level conservative|balanced|aggressive, --min-complexity, --max-per-function
Compile-time config: Register custom language adapters and mutators in config/config.exs
Centralized config struct: %Muex.Config{} with typed fields and validation

Darwin

Modules to mutate listed in mix.exs under the :darwin key
No CLI flags documented
Requires modifying test_helper.exs and all test modules
No filtering, no optimization, no output format selection
HTML reporter outputs to darwin/reports/html/

Exavier

Run via mix exavier.test
Configuration via .exavier.exs dotfile:
- :threshold -- mutation coverage threshold (default: 67%)
- :test_files_to_modules -- custom test-to-module mapping
- :custom_mutators -- additional mutator modules
No CLI flags for file selection, mutator selection, or concurrency

Intelligent Features

Muex (Unique)

File Analyzer (Muex.FileAnalyzer):

Scores files 0-100 based on code characteristics
Automatically excludes: Mix tasks, supervisors, application modules, behaviour definitions, protocols, reporters, dependency code
Scores based on: function count, conditionals, arithmetic, comparisons, pattern matching, cyclomatic complexity
Configurable minimum score threshold (--min-score)
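
To give a feel for scoring a file by its code characteristics, here is a toy heuristic. FileScoreSketch is hypothetical, and its node categories and weights are invented for illustration; it is not Muex.FileAnalyzer's formula.

```elixir
defmodule FileScoreSketch do
  @conditionals [:if, :unless, :case, :cond]
  @arithmetic [:+, :-, :*, :/]
  @comparisons [:==, :!=, :<, :>, :<=, :>=]
  @empty %{functions: 0, conditionals: 0, arithmetic: 0, comparisons: 0}

  # Count a few kinds of AST node in one pass over the parsed source.
  def counts(source) do
    {_ast, acc} =
      source
      |> Code.string_to_quoted!()
      |> Macro.prewalk(@empty, fn
        {kind, _, args} = node, acc when kind in [:def, :defp] and is_list(args) ->
          {node, bump(acc, :functions)}

        {op, _, args} = node, acc when op in @conditionals and is_list(args) ->
          {node, bump(acc, :conditionals)}

        {op, _, [_, _]} = node, acc when op in @arithmetic ->
          {node, bump(acc, :arithmetic)}

        {op, _, [_, _]} = node, acc when op in @comparisons ->
          {node, bump(acc, :comparisons)}

        node, acc ->
          {node, acc}
      end)

    acc
  end

  # Invented weights, capped at 100 to stay on a 0-100 scale.
  def score(source) do
    c = counts(source)
    min(100, 10 * c.functions + 15 * c.conditionals + 5 * c.arithmetic + 5 * c.comparisons)
  end

  defp bump(acc, key), do: Map.update!(acc, key, &(&1 + 1))
end
```

A file scoring below a chosen minimum would then be skipped, which is the role the text assigns to --min-score.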

Mutation Optimizer (Muex.MutantOptimizer):

7 optimization strategies: equivalent mutant detection, impact scoring, complexity filtering, mutation clustering, per-function limits, boundary prioritization, pattern-based filtering
3 presets: conservative (50-65% reduction, <1% score impact), balanced (70-85% reduction), aggressive (85-95% reduction)
Benchmark on Calculator project (76 LOC, 20 tests): 85 mutations reduced to 31 (63.5% reduction), score preserved at 100%

Test Dependency Analysis (Muex.DependencyAnalyzer):

Parses test files to extract module references (aliases, imports, function calls, describe/test strings)
Runs only tests that depend on the mutated module
Falls back to full test suite when no dependencies found
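
A simplified version of this selection logic is sketched below. TestDepsSketch is hypothetical and much cruder than what is described for Muex.DependencyAnalyzer: it only collects module aliases that appear literally in a test file and does not resolve `alias` directives, imports, or strings.

```elixir
defmodule TestDepsSketch do
  # Collect every module alias written out in a test file's source.
  def referenced_modules(test_source) do
    {_ast, modules} =
      test_source
      |> Code.string_to_quoted!()
      |> Macro.prewalk(MapSet.new(), fn
        {:__aliases__, _meta, parts} = node, acc ->
          # Skip aliases built on __MODULE__ or other non-atom segments.
          if Enum.all?(parts, &is_atom/1) do
            {node, MapSet.put(acc, Module.concat(parts))}
          else
            {node, acc}
          end

        node, acc ->
          {node, acc}
      end)

    modules
  end

  # `tests` is a list of {path, source} pairs. Keep the paths whose source
  # mentions `module`; fall back to every path when none does.
  def select(tests, module) do
    hits = for {path, source} <- tests, module in referenced_modules(source), do: path

    case hits do
      [] -> Enum.map(tests, fn {path, _source} -> path end)
      _ -> hits
    end
  end
end
```

The fallback branch corresponds to the last bullet above: when no test file can be linked to the mutated module, the whole suite runs.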

Darwin

Fast-fail: stops test suite after first failure per mutation (via Darwin.TestCase)
Debug output: writes mutated Erlang and Elixir source to _darwin_debug/

Exavier

Code coverage pre-processing: only mutates lines that are covered by tests
Threshold-based pass/fail (default 67%)
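
Threshold gating reduces to a small calculation. The sketch below assumes the score is killed mutants divided by killed plus survived; how each tool treats invalid or timed-out mutants is not covered here, and ScoreSketch is a hypothetical module.

```elixir
defmodule ScoreSketch do
  # Mutation score as a percentage, rounded to two decimals.
  # Assumption: only killed and survived mutants count toward the score.
  def score(%{killed: killed, survived: survived}) when killed + survived > 0 do
    Float.round(100 * killed / (killed + survived), 2)
  end

  # Pass when the score meets or exceeds the threshold (a percentage).
  def gate(counts, threshold) do
    if score(counts) >= threshold, do: :pass, else: :fail
  end
end

IO.inspect(ScoreSketch.gate(%{killed: 2, survived: 1}, 67))
# prints: :fail  (score is 66.67, just under the 67% default)
```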

Reporting

Muex

Terminal: Color-coded output (ANSI), progress dots, categorized summary (killed/survived/invalid/timeout), survived mutation details with file:line
JSON: Structured JSON with full mutation details, CI/CD friendly (muex-report.json)
HTML: Interactive report with filter buttons, summary cards, color-coded mutations, responsive design (muex-report.html)

Darwin

Console logging: green (killed), red (survived)
HTML reporter: outputs to darwin/reports/html/ (described as "under heavy development")

Exavier

Console only: dots for progress (green = killed, red = survived)
Diff-style output showing original vs. mutated code for survived mutations
Summary line with percentages

Benchmark: Calculator Project

Test subject: Calculator module (76 LOC, 4 functions: add, subtract, multiply, divide with guards and error handling). 20 ExUnit tests.

Muex Results

Without optimization (--no-optimize --no-filter):

Mutations generated: 85
Killed: 85 (100%)
Survived: 0
Invalid: 0
Wall time: ~10.8s

With conservative optimization (--optimize --optimize-level conservative):

Mutations generated: 85 -> 31 after optimization (63.5% reduction)
Killed: 31 (100%)
Survived: 0
Wall time: ~4.2s (about 61% less wall time than the unoptimized run)
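
The percentages quoted for this run follow directly from the raw numbers. ReductionSketch is a hypothetical helper used only to show the arithmetic.

```elixir
defmodule ReductionSketch do
  # Percentage by which `after_value` is smaller than `before_value`,
  # rounded to one decimal place.
  def percent_reduction(before_value, after_value) do
    Float.round((before_value - after_value) / before_value * 100, 1)
  end
end

IO.inspect(ReductionSketch.percent_reduction(85, 31))
# prints: 63.5  (mutants: 85 -> 31)
IO.inspect(ReductionSketch.percent_reduction(10.8, 4.2))
# prints: 61.1  (wall time in seconds: 10.8 -> 4.2)
```

Expressed as a speedup, 10.8s / 4.2s is roughly 2.6x.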

Darwin / Exavier

Both Darwin and Exavier have been abandoned since 2020. Each declares an Elixir requirement of ~> 1.7, which nominally admits any 1.x release, but in practice neither can be installed in a modern Elixir 1.17+ project without dependency conflicts: Darwin's parse_trans dependency and Exavier's compilation model are incompatible with current Elixir/OTP releases. A direct benchmark comparison on the same codebase is therefore not feasible.
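
The declared version constraints can be checked with Elixir's Version module. The sketch shows that ~> 1.7 accepts any 1.x release from 1.7.0 onward, so the incompatibility described above comes from dependencies and compilation behaviour rather than from the declared requirement itself. RequirementSketch is a hypothetical helper.

```elixir
defmodule RequirementSketch do
  # Pair each version string with whether it satisfies the requirement.
  def check(versions, requirement) do
    Enum.map(versions, fn version -> {version, Version.match?(version, requirement)} end)
  end
end

IO.inspect(RequirementSketch.check(["1.6.6", "1.7.0", "1.17.0", "2.0.0"], "~> 1.7"))
# prints: [{"1.6.6", false}, {"1.7.0", true}, {"1.17.0", true}, {"2.0.0", false}]
```

By the same rule, Muex's ~> 1.14 requirement excludes Elixir 1.13 and earlier.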

Historical context: Exavier's README shows an example of 22 tests producing 27.27% mutation coverage on a simple hello-world module, with no timing data provided. Darwin provides no benchmark data.

Code Quality and Testing

Muex

204 passing tests covering all major components
Test coverage for: Config, DependencyAnalyzer, Loader, all 6 mutators, Reporter, JSON Reporter, TestRunner.Port, WorkerPool, Language.Elixir, integration tests
Quality pipeline: mix quality runs formatter + credo --strict + dialyzer
CI matrix: Elixir 1.14-1.16, OTP 25-26
Typespecs on all public functions
@moduledoc and @doc on all modules and public functions
Documentation published on HexDocs with guides (Installation, Usage, Mutation Optimization)

Darwin

Tests present but minimal (uses stream_data for property-based testing of mutators)
No CI pipeline
No typespecs visible in main modules
Documentation incomplete (many TODO: document this comments)

Exavier

Self-described as "proof-of-concept" with a lengthy "To be done" list
Has tests but author admits they need "way more tests (OMG the irony)"
Travis CI (now defunct service)
Minimal typespecs
Published on HexDocs

Extensibility

Muex

Language adapters: Implement Muex.Language behaviour (5 callbacks), register via config
Mutators: Implement Muex.Mutator behaviour (3 callbacks), register via config
Compile-time registration: config :muex, languages: %{...}, mutators: %{...}
Programmatic API: Muex.run(%Muex.Config{}) returns {:ok, %{results: [...], score: float}}
Three installation modes: Mix dependency, hex archive (global), escript binary
Umbrella support: --app flag for targeting specific apps

Darwin

Mutators implement Darwin.Mutator callback, registered in default list
No compile-time or runtime registration mechanism for end users
Programmatic entry point only (Mutator.mutate_compile_and_load_module/1, as listed in the feature matrix); no Mix task
Mix dependency only

Exavier

Custom mutators via Exavier.Mutators.Mutator behaviour (2 callbacks)
Registration via .exavier.exs config file
No programmatic API
Mix dependency only

Integration and Deployment

Muex

CI/CD: JSON output format for machine consumption, --fail-at for threshold gates
Three distribution modes: dependency, archive, escript
Umbrella-aware with --app flag
Custom test path selection for monorepo/polyrepo setups

Darwin

Requires invasive changes to project (modify mix.exs, test_helper.exs, all test files)
HTML report generation
No CI/CD integration features

Exavier

Threshold-based exit code (pass/fail)
No JSON/HTML output
No CI/CD-specific features

Known Limitations

Muex

Port-based execution adds overhead per mutation (~100-200ms per test run vs in-process)
File-system based mutation (writes to source files temporarily, though with backup/restore)
Young project with a small user base so far (304 all-time Hex downloads)

Darwin

Requires modifying all test files with use Darwin.TestCase
Requires modifying test_helper.exs
Requires listing all modules to mutate in mix.exs
Single contributor, abandoned since December 2020
Erlang abstract code approach means mutations may not perfectly map to Elixir source
No license specified

Exavier

Self-described proof-of-concept
Cannot tune which mutators are used (planned but unimplemented)
No parallel mutation within a module
Relies on ignore_module_conflict: true with no isolation
Hardcoded test file discovery (test/**/*_test.exs)
Abandoned since November 2020
No fast-fail mechanism (planned but unimplemented)

Feature Matrix

Feature	Muex	Darwin	Exavier
Status	Active (2026)	Abandoned (2020)	Abandoned (2020)
Elixir support	Yes	Yes	Yes
Erlang support	Yes	No	No
Custom language adapters	Yes (behaviour)	No	No
Arithmetic mutations	Yes (6)	Yes	Yes (AOR1-4)
Comparison mutations	Yes (12)	Yes	Yes (ROR1-5)
Boolean mutations	Yes (7)	No	No
Literal mutations	Yes (8)	No	No
Function call mutations	Yes (2)	No	No
Conditional mutations	Yes (8)	No	Yes (3)
Custom mutators	Yes (compile-time config)	Yes (code-level)	Yes (.exavier.exs)
Mutator selection	CLI flag	No	No (planned)
Process isolation	Port (separate VM)	In-process (pdict)	In-process (hot-swap)
Parallel execution	GenServer worker pool	No	Per-module
Configurable concurrency	Yes (--concurrency)	No	No
Intelligent file filtering	Yes (FileAnalyzer)	No	No
Mutation optimization	Yes (7 strategies, 3 presets)	No	No
Test dependency analysis	Yes	No	No
Coverage-based filtering	No (static scoring via FileAnalyzer instead)	No	Yes (pre-processing)
Umbrella support	Yes (--app flag)	No	No
Custom test paths	Yes (--test-paths)	No	No
Terminal output	Color-coded, progress dots	Color-coded	Color-coded, diffs
JSON output	Yes	No	No
HTML output	Yes (interactive)	Yes (basic)	No
Score threshold	Yes (--fail-at)	No	Yes (threshold in config)
Mix task	Yes (mix muex)	No (programmatic)	Yes (mix exavier.test)
Escript	Yes	No	No
Hex archive install	Yes	No	No
Programmatic API	Yes (Muex.run/1)	Yes (Mutator.mutate_compile_and_load_module/1)	No
Documentation	HexDocs + guides	HexDocs (incomplete)	HexDocs
Typespecs	Comprehensive	Partial	Minimal
Test suite	204 tests	Minimal	Minimal
CI/CD	GitHub Actions matrix	None	Travis CI (defunct)

Verdict

Muex is the only actively maintained option and offers by far the most complete feature set. Its language-agnostic architecture, intelligent file filtering, mutation optimization, test dependency analysis, multiple output formats, and umbrella support make it suitable for production CI/CD pipelines. The trade-off is port-based execution overhead, which the optimizer mitigates effectively (63.5% mutation reduction with no score impact on the calculator benchmark).

Darwin introduced an innovative compile-once-dispatch-at-runtime approach that eliminates per-mutation compilation overhead. This is architecturally interesting but comes at the cost of invasive project modifications, and working at the Erlang abstract code level makes the codebase harder to extend. It has been abandoned for over 5 years.

Exavier is the best known of the three (101 stars, 2,243 downloads) and pioneered Elixir mutation testing with a clean, simple design. However, the author explicitly described it as a proof-of-concept, and several planned features were never implemented. It has been abandoned for over 5 years.

For any new project considering mutation testing in 2026, Muex is the clear choice -- it is the only option that works with modern Elixir/OTP, is actively maintained, and provides the tooling depth needed for real-world adoption.
