Elixir Mutation Testing Libraries: Muex vs Darwin vs Exavier
A comprehensive comparison of the three Elixir mutation testing libraries.
Executive Summary
Muex, Darwin, and Exavier are all mutation testing tools for the BEAM ecosystem. They share the same core idea -- introduce deliberate bugs into code and verify that tests catch them -- but differ significantly in architecture, maturity, feature breadth, and maintenance status. Darwin and Exavier were both created in 2019 and have been unmaintained since late 2020. Muex is actively developed (2026) and represents a generational leap in features and design.
Project Vitals
Muex
- Version: 0.5.0 (March 2026, 8 published releases)
- Elixir requirement: ~> 1.14
- License: MIT
- Hex downloads: 304 all-time
- GitHub stars: New project
- Maintainer: Aleksei Matiushkin (mudasobwa)
- Last activity: March 2026 (actively maintained)
- Dependencies: jason ~> 1.4 (runtime), plus dev/test tooling (credo, dialyxir, excoveralls, ex_doc)
- LOC (lib): ~3,900
- LOC (tests): ~2,300 (204 passing tests)
- CI: GitHub Actions with matrix (Elixir 1.14-1.16, OTP 25-26)
Darwin
- Version: 0.1.0 (only version ever published)
- Elixir requirement: ~> 1.7
- License: Not specified
- Hex downloads: Listed on Hex but minimal
- GitHub stars: 12
- Maintainer: tmbb (sole contributor)
- Last activity: December 2020 (abandoned)
- Dependencies: parse_trans ~> 3.3, makeup_elixir ~> 0.14, plus benchee, stream_data, ex_doc (dev)
- LOC (lib): ~15,000+ (estimated from file count and sizes)
- CI: None
Exavier
- Version: 0.3.0 (November 2020, 8 releases)
- Elixir requirement: ~> 1.7
- License: MIT
- Hex downloads: 2,243 all-time (most popular of the three)
- GitHub stars: 101
- Maintainers: 4 contributors (dnlserrano, Cantido, KingOfRostov, tank-bohr)
- Last activity: November 2020 (abandoned)
- Dependencies: None (runtime), ex_doc ~> 0.23 (dev only)
- LOC (lib): ~1,500 (estimated)
- CI: Travis CI
Architecture
Muex: Plugin-Based, Language-Agnostic
Muex uses a behaviour-based plugin architecture with clear separation:
- Muex.Language behaviour -- Defines interface for language adapters (parse/1, unparse/1, compile/2, file_extensions/0, test_file_pattern/0)
- Muex.Mutator behaviour -- Defines interface for mutation strategies (mutate/2, name/0, description/0)
- Muex.Loader -- File discovery with glob patterns
- Muex.Compiler -- AST mutation application and hot-swapping
- Muex.Runner -- Test execution via port-based isolation
- Muex.WorkerPool -- GenServer-based parallel execution
- Muex.FileAnalyzer -- Intelligent file filtering via code analysis
- Muex.MutantOptimizer -- 7-strategy mutation reduction heuristics
- Muex.DependencyAnalyzer -- Test dependency graph for targeted execution
- Muex.Reporter / Muex.Reporter.Html / Muex.Reporter.Json -- Multi-format reporting
- Muex.Config -- Centralized configuration with CLI parsing
- Muex.CLI -- Escript entry point
Key design: mutations happen at the Elixir AST level, language adapters provide parse/unparse/compile, and test execution runs in isolated port processes (separate BEAM VM per mutation).
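To make the behaviour-based design concrete, here is a minimal sketch of a mutator written against a behaviour. The MutatorSketch modules are hypothetical: the callback names (mutate/2, name/0, description/0) come from the list above, but the argument and return types are assumptions, not Muex's actual contract.

```elixir
defmodule MutatorSketch do
  @moduledoc "Hypothetical behaviour, modelled on the callback names listed above."

  @callback name() :: String.t()
  @callback description() :: String.t()
  # Assumed shape: returns every mutated variant of one AST node ([] if none apply).
  @callback mutate(node :: Macro.t(), context :: map()) :: [Macro.t()]
end

defmodule MutatorSketch.Arithmetic do
  @behaviour MutatorSketch

  @impl true
  def name, do: "arithmetic"

  @impl true
  def description, do: "Swaps + with - and * with /"

  @impl true
  def mutate({:+, meta, [l, r]}, _ctx), do: [{:-, meta, [l, r]}]
  def mutate({:-, meta, [l, r]}, _ctx), do: [{:+, meta, [l, r]}]
  def mutate({:*, meta, [l, r]}, _ctx), do: [{:/, meta, [l, r]}]
  def mutate({:/, meta, [l, r]}, _ctx), do: [{:*, meta, [l, r]}]
  def mutate(_node, _ctx), do: []
end

# Usage: mutate a single quoted expression and print the result.
[mutant] = MutatorSketch.Arithmetic.mutate(quote(do: a + b), %{})
IO.puts(Macro.to_string(mutant))
# prints "a - b"
```

Keeping each mutator a pure function from AST node to AST nodes is what lets the rest of the pipeline (compilation, test execution, reporting) stay independent of any particular mutation strategy.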
Darwin: Erlang Abstract Code Approach
Darwin takes a fundamentally different approach -- it works at the Erlang abstract code level:
- Converts Elixir source to Erlang abstract forms via Elixir-to-Erlang transpilation
- Applies mutations at the Erlang abstract code level
- Uses a "codon" model inspired by genetics -- mutation points are named codons
- Uses process dictionary for thread-local mutation activation (Darwin.ActiveMutation)
- Injects runtime dispatch: mutated code contains darwin_was_here/N calls that branch on the active mutation at runtime
- Re-runs ExUnit for each mutation using Darwin.TestCase to short-circuit after first failure
This approach means Darwin does not re-compile per mutation -- it compiles once with all mutation points embedded, then activates them one at a time via the process dictionary. This is architecturally clever but comes with its own trade-offs (code bloat from injected dispatchers, mandatory test modification).
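The compile-once, dispatch-at-runtime idea can be sketched in a few lines. This is not Darwin's code: the DispatchSketch module, the :active_mutation key, and the {codon, index} tuple are all assumptions made for illustration. Only the general mechanism (a process-dictionary lookup that selects a branch) follows the description above.

```elixir
defmodule DispatchSketch do
  # Hypothetical key; Darwin's real storage layout is not shown in the text.
  @key :active_mutation

  def activate(id), do: Process.put(@key, id)
  def deactivate, do: Process.delete(@key)

  # Stand-in for an injected dispatcher. `codon` identifies the mutation point;
  # the second tuple element selects one mutation at that point.
  def dispatch_add(codon, a, b) do
    case Process.get(@key) do
      # Mutation 0 at this codon: + replaced by -
      {^codon, 0} -> a - b
      # Mutation 1 at this codon: + replaced by *
      {^codon, 1} -> a * b
      # No mutation active for this codon: original behaviour
      _ -> a + b
    end
  end

  # What `def add(a, b), do: a + b` might look like after instrumentation.
  def add(a, b), do: dispatch_add(0, a, b)
end

IO.inspect(DispatchSketch.add(2, 3))
# prints 5
DispatchSketch.activate({0, 0})
IO.inspect(DispatchSketch.add(2, 3))
# prints -1
DispatchSketch.deactivate()
```

Because the process dictionary is local to one process, a mutation activated this way is visible only to code running in the same process, which is one reason this style needs cooperation from the test harness.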
Exavier: Direct AST Rewriting
Exavier uses the simplest approach:
- Runs code coverage analysis to determine which lines to mutate
- For each module (in parallel), for each mutator (sequentially):
  - Rewrites the quoted AST using Code.compile_quoted/2
  - Re-requires the test file
  - Runs ExUnit
- Uses Code.compile_quoted/2 with ignore_module_conflict: true for hot-swap
- GenServer-based reporter tracks results
The architecture is straightforward but tightly coupled to ExUnit internals and lacks isolation between mutations.
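The hot-swap technique itself uses only standard library calls and can be sketched as follows. The HotSwapSketch module and its Calc example are hypothetical and do not reproduce Exavier's code; Code.string_to_quoted!/1, Macro.prewalk/2, Code.compile_quoted/1 and the ignore_module_conflict compiler option are standard Elixir.

```elixir
defmodule HotSwapSketch do
  # Replace every binary + with - in a quoted module definition.
  def mutate(quoted) do
    Macro.prewalk(quoted, fn
      {:+, meta, [l, r]} -> {:-, meta, [l, r]}
      node -> node
    end)
  end

  # Recompile the mutated module over the loaded one, silencing the
  # "redefining module" warning only for the duration of the swap.
  def swap(quoted) do
    previous = Code.get_compiler_option(:ignore_module_conflict)
    Code.put_compiler_option(:ignore_module_conflict, true)

    try do
      Code.compile_quoted(mutate(quoted))
    after
      Code.put_compiler_option(:ignore_module_conflict, previous)
    end
  end
end

quoted =
  Code.string_to_quoted!("""
  defmodule HotSwapSketch.Calc do
    def add(a, b), do: a + b
  end
  """)

[{calc, _bytecode}] = Code.compile_quoted(quoted)
IO.inspect(calc.add(2, 3))
# prints 5

HotSwapSketch.swap(quoted)
IO.inspect(calc.add(2, 3))
# prints -1
```

The swap replaces the module for every process in the VM at once, which is why this model offers no isolation between mutations.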
Language Support
Muex
- Elixir: Full support via Muex.Language.Elixir adapter
- Erlang: Full support via Muex.Language.Erlang adapter (uses :erl_scan, :erl_parse, :erl_prettypr, :compile)
- Extensible: Any BEAM language can be supported by implementing the Muex.Language behaviour (5 callbacks)
- Custom adapters: Registerable via config :muex, languages: %{"lua" => MyApp.Language.Lua} at compile time
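As an illustration of what a parse/mutate/unparse round trip for Erlang can look like with the OTP modules named above, here is a small sketch. The ErlangRoundTripSketch module is hypothetical and is not Muex's adapter; it only shows that :erl_scan, :erl_parse and :erl_prettypr are sufficient to go from source text to abstract form and back.

```elixir
defmodule ErlangRoundTripSketch do
  # Parse one Erlang form (for example a function definition) from source text.
  def parse(source) do
    {:ok, tokens, _end_location} = :erl_scan.string(String.to_charlist(source))
    {:ok, form} = :erl_parse.parse_form(tokens)
    form
  end

  # Swap binary + for - anywhere in the abstract form.
  def mutate({:op, loc, :+, left, right}), do: {:op, loc, :-, mutate(left), mutate(right)}

  def mutate(term) when is_tuple(term),
    do: term |> Tuple.to_list() |> Enum.map(&mutate/1) |> List.to_tuple()

  def mutate(term) when is_list(term), do: Enum.map(term, &mutate/1)
  def mutate(term), do: term

  # Render the abstract form back to Erlang source text.
  def unparse(form), do: form |> :erl_prettypr.format() |> IO.chardata_to_string()
end

"add(A, B) -> A + B."
|> ErlangRoundTripSketch.parse()
|> ErlangRoundTripSketch.mutate()
|> ErlangRoundTripSketch.unparse()
|> IO.puts()
```

The printed function body should now subtract instead of add; exact whitespace depends on the pretty-printer.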
Darwin
- Elixir only (despite working through Erlang abstract code internally)
- No formal language adapter system
- Erlang abstract code is an internal implementation detail, not an extensibility point
Exavier
- Elixir only
- No language adapter system
- Tightly coupled to Elixir AST and ExUnit
Mutation Strategies
Muex (6 strategies, 30+ individual mutations)
| Mutator | Mutations |
|---|---|
| Arithmetic | + <-> -, * <-> /, + -> 0, - -> 0, * -> 1, / -> 1 |
| Comparison | == <-> !=, > <-> <, > <-> >=, < <-> <=, >= <-> <=, === <-> !== |
| Boolean | and <-> or, && <-> ||, true <-> false, not x -> x |
| Literal | numbers +/-1, strings empty/append, empty list -> [:mutated], atoms -> :mutated_atom |
| FunctionCall | Remove calls (replace with nil), swap first two arguments |
| Conditional | Invert if condition, always-true/always-false branches, unless -> if, remove if |
All mutators are selectable via CLI (--mutators arithmetic,comparison). Custom mutators registerable via compile-time config.
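Each row of the table describes rewrites that are applied one at a time, so a single expression with several operators yields several mutants. The sketch below enumerates such first-order mutants for a subset of the comparison swaps. MutantEnumSketch is a hypothetical module, not part of Muex, and the two-pass traversal is just one simple way to do it.

```elixir
defmodule MutantEnumSketch do
  # Subset of the comparison swaps from the table above.
  @swaps %{:== => :!=, :!= => :==, :> => :<, :< => :>}
  @ops Map.keys(@swaps)

  # Returns one mutated AST per matching operator; each mutant differs
  # from the original at exactly one site.
  def mutants(ast) do
    {_ast, sites} = Macro.prewalk(ast, 0, &visit(&1, &2, nil))

    for target <- 0..(sites - 1)//1 do
      {mutant, _count} = Macro.prewalk(ast, 0, &visit(&1, &2, target))
      mutant
    end
  end

  # Counts matching sites in traversal order and rewrites only the targeted one.
  defp visit({op, meta, [l, r]}, index, target) when op in @ops do
    new_op = if index == target, do: Map.fetch!(@swaps, op), else: op
    {{new_op, meta, [l, r]}, index + 1}
  end

  defp visit(node, index, _target), do: {node, index}
end

"a > b and a == c"
|> Code.string_to_quoted!()
|> MutantEnumSketch.mutants()
|> Enum.each(&IO.puts(Macro.to_string(&1)))
# prints "a < b and a == c" then "a > b and a != c"
```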
Darwin (AOR/ROR + operator mutations)
Darwin uses pitest-inspired naming (AOR = Arithmetic Operator Replacement, ROR = Relational Operator Replacement) and implements mutations via Erlang abstract code transformations. The mutation approach embeds all possible mutations into the code at compile time and selects them at runtime. Specific mutators include:
- Arithmetic operator replacement (+, -, *, /)
- Relational operator replacement (==, /=, <, >, >=, =<)
- Guard rewriting for mutated guards
Darwin's mutator system is based on a @callback mutate/2 that transforms Erlang abstract code forms. Mutators are registered in a default list. The codon-based system means each mutation point gets a unique index.
Exavier (13 mutators)
| Mutator | Based On |
|---|---|
| AOR1-AOR4 | Arithmetic operator replacement (pitest-style) |
| ROR1-ROR5 | Relational operator replacement (pitest-style) |
| IfTrue | Replace if condition with true |
| NegateConditionals | Negate conditional operators |
| ConditionalsBoundary | Change boundary conditions (> to >=, etc.) |
| InvertNegatives | Remove unary minus |
Custom mutators supported via .exavier.exs config file. No boolean mutators, no literal mutators, no function call mutators.
Summary
- Muex: Broadest mutation coverage. Unique strategies: literal mutation, function call removal/arg swapping, conditional branch removal. Named descriptively (not pitest codes).
- Darwin: Erlang-level mutations with codon-indexed dispatch. Architecturally novel but harder to reason about.
- Exavier: pitest-faithful naming, decent arithmetic/relational coverage, but missing boolean, literal, function call, and advanced conditional mutations.
Test Execution Model
Muex: Port-Based Isolation
- Each mutation is tested in a separate BEAM VM spawned via Erlang ports
- The worker pool writes the mutated source to the original file, deletes the .beam cache, runs mix test in a subprocess, then restores the original
- Complete process isolation prevents any mutation side-effects from leaking
- Supports incremental compilation (only the mutated module is recompiled)
- Configurable concurrency via --concurrency flag and Muex.WorkerPool GenServer
- Configurable timeout per mutation (--timeout)
- Test dependency analysis selects only relevant test files per mutation
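The write, run, restore cycle described above can be sketched with System.cmd/3, which itself runs the command through an Erlang port. IsolatedRunSketch and its cmd/args parameters are hypothetical; .beam cache deletion, timeouts and concurrency are deliberately left out, so this is not a substitute for what Muex actually does.

```elixir
defmodule IsolatedRunSketch do
  # Writes `mutated_source` over `path`, runs `cmd` in a separate OS process,
  # and always restores the original file afterwards. A non-zero exit status
  # is read as "tests failed", so the mutant counts as killed.
  def run(path, mutated_source, cmd \\ "mix", args \\ ["test"]) do
    original = File.read!(path)
    File.write!(path, mutated_source)

    try do
      {_output, status} = System.cmd(cmd, args, stderr_to_stdout: true)
      if status == 0, do: :survived, else: :killed
    after
      # Runs even if the command raises, so the source tree is not left mutated.
      File.write!(path, original)
    end
  end
end
```

Called as IsolatedRunSketch.run("lib/calculator.ex", mutated_source) from a project root, it would run the project's test suite against the mutated file. Restoring in an after block matters because a crash between write and restore would otherwise leave the mutation in the working tree.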
Darwin: In-Process Runtime Dispatch
- Code is compiled once with all mutations embedded as runtime branches
- Active mutation is selected via process dictionary (Darwin.ActiveMutation)
- ExUnit runs within the same BEAM VM
- Darwin.TestCase overrides test/2 and test/3 macros to short-circuit after first failure
- Pro: No re-compilation overhead per mutation (fastest possible switching)
- Con: Requires modifying test files (use Darwin.TestCase), modifying test_helper.exs, and listing modules in mix.exs
Exavier: In-Process Hot-Swap
- Uses Code.compile_quoted/2 with ignore_module_conflict: true
- Re-requires test files and calls ExUnit.run() per mutation
- Parallel per module via Task.async_stream
- Sequential per mutator within each module
- No process isolation -- mutations share the same VM
- Coverage analysis as a pre-processing step to identify lines to mutate
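The scheduling shape described above (modules in parallel, mutators in sequence within each module) can be sketched with Task.async_stream/3. ScheduleSketch and the run_one callback are hypothetical; the callback stands in for "apply this mutator to this module and run the tests".

```elixir
defmodule ScheduleSketch do
  # `run_one` is a caller-supplied function: (module, mutator) -> result.
  def run(modules, mutators, run_one) do
    modules
    |> Task.async_stream(
      fn module ->
        # Mutators for one module run one after another inside a single task.
        results = Enum.map(mutators, fn mutator -> {mutator, run_one.(module, mutator)} end)
        {module, results}
      end,
      timeout: :infinity
    )
    |> Enum.map(fn {:ok, result} -> result end)
  end
end

ScheduleSketch.run([:calc, :parser], [:aor1, :ror1], fn _module, _mutator -> :killed end)
|> IO.inspect()
```

Task.async_stream/3 preserves input order by default, so results line up with the module list even though the tasks run concurrently.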
Configuration and CLI
Muex
Muex offers the richest CLI and configuration:
- Entry points: mix muex (Mix task), muex (escript binary), mix archive.install hex muex (global archive)
- File selection: --files with directory, file, or glob patterns (**/*.ex, {a,b}/**/*.ex)
- Umbrella support: --app my_app automatically sets --files and --test-paths
- Test paths: --test-paths "test/unit,test/integration" with glob expansion
- Output formats: --format terminal|json|html
- Score threshold: --fail-at 80 (CI/CD integration)
- Filtering: --no-filter, --min-score, --max-mutations
- Optimization: --optimize, --optimize-level conservative|balanced|aggressive, --min-complexity, --max-per-function
- Compile-time config: Register custom language adapters and mutators in config/config.exs
- Centralized config struct: %Muex.Config{} with typed fields and validation
Darwin
- Modules to mutate listed in mix.exs under the :darwin key
- No CLI flags documented
- Requires modifying test_helper.exs and all test modules
- No filtering, no optimization, no output format selection
- HTML reporter outputs to darwin/reports/html/
Exavier
- Run via mix exavier.test
- Configuration via .exavier.exs dotfile:
  - :threshold -- mutation coverage threshold (default: 67%)
  - :test_files_to_modules -- custom test-to-module mapping
  - :custom_mutators -- additional mutator modules
- No CLI flags for file selection, mutator selection, or concurrency
Intelligent Features
Muex (Unique)
File Analyzer (Muex.FileAnalyzer):
- Scores files 0-100 based on code characteristics
- Automatically excludes: Mix tasks, supervisors, application modules, behaviour definitions, protocols, reporters, dependency code
- Scores based on: function count, conditionals, arithmetic, comparisons, pattern matching, cyclomatic complexity
- Configurable minimum score threshold (--min-score)
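A toy version of this kind of static scoring is sketched below. The FileScoreSketch module, the four feature categories it counts, and especially the weights are invented for illustration; the text above does not give Muex's actual formula.

```elixir
defmodule FileScoreSketch do
  # Hypothetical weights, chosen only for illustration.
  @weights %{function: 5, conditional: 8, arithmetic: 4, comparison: 4}

  # Parses the source, counts interesting AST nodes, and returns a 0-100 score.
  def score(source) do
    {_ast, counts} =
      source
      |> Code.string_to_quoted!()
      |> Macro.prewalk(%{}, fn node, acc ->
        case classify(node) do
          nil -> {node, acc}
          kind -> {node, Map.update(acc, kind, 1, &(&1 + 1))}
        end
      end)

    counts
    |> Enum.map(fn {kind, n} -> Map.fetch!(@weights, kind) * n end)
    |> Enum.sum()
    |> min(100)
  end

  defp classify({kind, _, [_ | _]}) when kind in [:def, :defp], do: :function
  defp classify({kind, _, [_ | _]}) when kind in [:if, :unless, :case, :cond], do: :conditional
  # Note: this also matches the / in `&fun/1` captures; a real analyzer would exclude those.
  defp classify({op, _, [_, _]}) when op in [:+, :-, :*, :/], do: :arithmetic
  defp classify({op, _, [_, _]}) when op in [:==, :!=, :<, :>, :<=, :>=], do: :comparison
  defp classify(_node), do: nil
end

source = """
defmodule M do
  def f(a, b) do
    if a > b, do: a + b, else: a - b
  end
end
"""

IO.inspect(FileScoreSketch.score(source))
# prints 25
```

A file that scores below the chosen minimum (a supervisor with no branching logic, for example) would then be skipped before any mutants are generated.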
Mutation Optimizer (Muex.MutantOptimizer):
- 7 optimization strategies: equivalent mutant detection, impact scoring, complexity filtering, mutation clustering, per-function limits, boundary prioritization, pattern-based filtering
- 3 presets: conservative (50-65% reduction, <1% score impact), balanced (70-85% reduction), aggressive (85-95% reduction)
- Benchmark on Calculator project (76 LOC, 20 tests): 85 mutations reduced to 31 (63.5% reduction), score preserved at 100%
Test Dependency Analysis (Muex.DependencyAnalyzer):
- Parses test files to extract module references (aliases, imports, function calls, describe/test strings)
- Runs only tests that depend on the mutated module
- Falls back to full test suite when no dependencies found
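A reduced form of this analysis can be sketched by collecting the module aliases that appear in each test file. DependencySketch is hypothetical: it looks only at alias nodes in the AST, whereas the description above also mentions imports and describe/test strings.

```elixir
defmodule DependencySketch do
  # Collects every module alias that appears in a test file's source.
  def referenced_modules(test_source) do
    {_ast, refs} =
      test_source
      |> Code.string_to_quoted!()
      |> Macro.prewalk(MapSet.new(), fn
        {:__aliases__, _meta, parts} = node, acc ->
          if Enum.all?(parts, &is_atom/1) do
            {node, MapSet.put(acc, Module.concat(parts))}
          else
            {node, acc}
          end

        node, acc ->
          {node, acc}
      end)

    refs
  end

  # `tests` maps a test file path to its source text. Falls back to every
  # test file when nothing references the mutated module.
  def select(tests, mutated_module) do
    hits =
      for {path, source} <- tests,
          mutated_module in referenced_modules(source),
          do: path

    if hits == [], do: Map.keys(tests), else: hits
  end
end
```

Given a map of test paths to sources, DependencySketch.select(tests, Calculator) returns only the files that mention Calculator, or all of them if none do. The fallback keeps the analysis safe: a missed reference costs time, not correctness.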
Darwin
- Fast-fail: stops test suite after first failure per mutation (via Darwin.TestCase)
- Debug output: writes mutated Erlang and Elixir source to _darwin_debug/
Exavier
- Code coverage pre-processing: only mutates lines that are covered by tests
- Threshold-based pass/fail (default 67%)
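A threshold gate of this kind reduces to a percentage and a comparison. The sketch below assumes the score is simply killed mutants divided by total mutants; ThresholdSketch is hypothetical and the exact formula Exavier uses is not given in the text.

```elixir
defmodule ThresholdSketch do
  # Assumed definition: percentage of generated mutants that were killed.
  def score(killed, total) when total > 0, do: Float.round(killed / total * 100, 2)

  # Default threshold mirrors the 67% default mentioned above.
  def verdict(killed, total, threshold \\ 67) do
    if score(killed, total) >= threshold, do: :pass, else: :fail
  end
end

IO.inspect(ThresholdSketch.verdict(20, 30))
# prints :fail (66.67 is below 67)
IO.inspect(ThresholdSketch.verdict(21, 30))
# prints :pass (70.0)
```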
Reporting
Muex
- Terminal: Color-coded output (ANSI), progress dots, categorized summary (killed/survived/invalid/timeout), survived mutation details with file:line
- JSON: Structured JSON with full mutation details, CI/CD friendly (muex-report.json)
- HTML: Interactive report with filter buttons, summary cards, color-coded mutations, responsive design (muex-report.html)
Darwin
- Console logging: green (killed), red (survived)
- HTML reporter: outputs to darwin/reports/html/ (described as "under heavy development")
Exavier
- Console only: dots for progress (green = killed, red = survived)
- Diff-style output showing original vs. mutated code for survived mutations
- Summary line with percentages
Benchmark: Calculator Project
Test subject: Calculator module (76 LOC, 4 functions: add, subtract, multiply, divide with guards and error handling). 20 ExUnit tests.
Muex Results
Without optimization (--no-optimize --no-filter):
- Mutations generated: 85
- Killed: 85 (100%)
- Survived: 0
- Invalid: 0
- Wall time: ~10.8s
With conservative optimization (--optimize --optimize-level conservative):
- Mutations generated: 85 -> 31 after optimization (63.5% reduction)
- Killed: 31 (100%)
- Survived: 0
- Wall time: ~4.2s (about 61% less wall time than the unoptimized run)
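The two percentages follow directly from the figures reported above, as this short check shows (the variable names are only for illustration):

```elixir
# Mutations removed by the optimizer, as a share of the 85 generated.
reduction = Float.round((85 - 31) / 85 * 100, 1)

# Wall time saved, as a share of the 10.8s unoptimized run.
time_saved = Float.round((10.8 - 4.2) / 10.8 * 100, 1)

IO.puts("mutation reduction: #{reduction}%")
# prints "mutation reduction: 63.5%"
IO.puts("wall time saved: #{time_saved}%")
# prints "wall time saved: 61.1%"
```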
Darwin / Exavier
Both Darwin and Exavier have been abandoned since 2020 and target Elixir ~> 1.7. That constraint nominally admits any later 1.x release, but in practice neither library can be installed in a modern Elixir 1.17+ project without dependency conflicts: Darwin's parse_trans dependency and Exavier's compilation model are incompatible with current Elixir/OTP releases. A direct benchmark comparison on the same codebase is therefore not feasible.
Historical context: Exavier's README shows an example of 22 tests producing 27.27% mutation coverage on a simple hello-world module, with no timing data provided. Darwin provides no benchmark data.
Code Quality and Testing
Muex
- 204 passing tests covering all major components
- Test coverage for: Config, DependencyAnalyzer, Loader, all 6 mutators, Reporter, JSON Reporter, TestRunner.Port, WorkerPool, Language.Elixir, integration tests
- Quality pipeline: mix quality runs formatter + credo --strict + dialyzer
- CI matrix: Elixir 1.14-1.16, OTP 25-26
- Typespecs on all public functions
- @moduledoc and @doc on all modules and public functions
- Documentation published on HexDocs with guides (Installation, Usage, Mutation Optimization)
Darwin
- Tests present but minimal (uses stream_data for property-based testing of mutators)
- No CI pipeline
- No typespecs visible in main modules
- Documentation incomplete (many "TODO: document this" comments)
Exavier
- Self-described as "proof-of-concept" with a lengthy "To be done" list
- Has tests but author admits they need "way more tests (OMG the irony)"
- Travis CI (now defunct service)
- Minimal typespecs
- Published on HexDocs
Extensibility
Muex
- Language adapters: Implement Muex.Language behaviour (5 callbacks), register via config
- Mutators: Implement Muex.Mutator behaviour (3 callbacks), register via config
- Compile-time registration: config :muex, languages: %{...}, mutators: %{...}
- Programmatic API: Muex.run(%Muex.Config{}) returns {:ok, %{results: [...], score: float}}
- Three installation modes: Mix dependency, hex archive (global), escript binary
- Umbrella support: --app flag for targeting specific apps
Darwin
- Mutators implement Darwin.Mutator callback, registered in default list
- No compile-time or runtime registration mechanism for end users
- Limited programmatic API (Mutator.mutate_compile_and_load_module/1, as listed in the feature matrix)
- Mix dependency only
Exavier
- Custom mutators via Exavier.Mutators.Mutator behaviour (2 callbacks)
- Registration via .exavier.exs config file
- No programmatic API
- Mix dependency only
Integration and Deployment
Muex
- CI/CD: JSON output format for machine consumption, --fail-at for threshold gates
- Three distribution modes: dependency, archive, escript
- Umbrella-aware with --app flag
- Custom test path selection for monorepo/polyrepo setups
Darwin
- Requires invasive changes to project (modify mix.exs, test_helper.exs, all test files)
- HTML report generation
- No CI/CD integration features
Exavier
- Threshold-based exit code (pass/fail)
- No JSON/HTML output
- No CI/CD-specific features
Known Limitations
Muex
- Port-based execution adds overhead per mutation (~100-200ms per test run vs in-process)
- File-system based mutation (writes to source files temporarily, though with backup/restore)
- Young project with a small user base so far (304 Hex downloads all-time)
Darwin
- Requires modifying all test files with use Darwin.TestCase
- Requires modifying test_helper.exs
- Requires listing all modules to mutate in mix.exs
- Single contributor, abandoned since December 2020
- Erlang abstract code approach means mutations may not perfectly map to Elixir source
- No license specified
Exavier
- Self-described proof-of-concept
- Cannot tune which mutators are used (planned but unimplemented)
- No parallel mutation within a module
- Relies on ignore_module_conflict: true with no isolation
- Hardcoded test file discovery (test/**/*_test.exs)
- Abandoned since November 2020
- No fast-fail mechanism (planned but unimplemented)
Feature Matrix
| Feature | Muex | Darwin | Exavier |
|---|---|---|---|
| Status | Active (2026) | Abandoned (2020) | Abandoned (2020) |
| Elixir support | Yes | Yes | Yes |
| Erlang support | Yes | No | No |
| Custom language adapters | Yes (behaviour) | No | No |
| Arithmetic mutations | Yes (6) | Yes | Yes (AOR1-4) |
| Comparison mutations | Yes (12) | Yes | Yes (ROR1-5) |
| Boolean mutations | Yes (7) | No | No |
| Literal mutations | Yes (8) | No | No |
| Function call mutations | Yes (2) | No | No |
| Conditional mutations | Yes (8) | No | Yes (3) |
| Custom mutators | Yes (compile-time config) | Yes (code-level) | Yes (.exavier.exs) |
| Mutator selection | CLI flag | No | No (planned) |
| Process isolation | Port (separate VM) | In-process (pdict) | In-process (hot-swap) |
| Parallel execution | GenServer worker pool | No | Per-module |
| Configurable concurrency | Yes (--concurrency) | No | No |
| Intelligent file filtering | Yes (FileAnalyzer) | No | No |
| Mutation optimization | Yes (7 strategies, 3 presets) | No | No |
| Test dependency analysis | Yes | No | No |
| Coverage-based filtering | No (static analysis via FileAnalyzer instead) | No | Yes (pre-processing) |
| Umbrella support | Yes (--app flag) | No | No |
| Custom test paths | Yes (--test-paths) | No | No |
| Terminal output | Color-coded, progress dots | Color-coded | Color-coded, diffs |
| JSON output | Yes | No | No |
| HTML output | Yes (interactive) | Yes (basic) | No |
| Score threshold | Yes (--fail-at) | No | Yes (threshold in config) |
| Mix task | Yes (mix muex) | No (programmatic) | Yes (mix exavier.test) |
| Escript | Yes | No | No |
| Hex archive install | Yes | No | No |
| Programmatic API | Yes (Muex.run/1) | Yes (Mutator.mutate_compile_and_load_module/1) | No |
| Documentation | HexDocs + guides | HexDocs (incomplete) | HexDocs |
| Typespecs | Comprehensive | Partial | Minimal |
| Test suite | 204 tests | Minimal | Minimal |
| CI/CD | GitHub Actions matrix | None | Travis CI (defunct) |
Verdict
Muex is the only actively maintained option and offers by far the most complete feature set. Its language-agnostic architecture, intelligent file filtering, mutation optimization, test dependency analysis, multiple output formats, and umbrella support make it suitable for production CI/CD pipelines. The trade-off is port-based execution overhead, which the optimizer mitigates effectively (63.5% mutation reduction with no score impact on the calculator benchmark).
Darwin introduced an innovative compile-once-dispatch-at-runtime approach that eliminates per-mutation compilation overhead. This is architecturally interesting but comes at the cost of invasive project modifications and an Erlang-abstract-code complexity that makes the codebase harder to extend. It has been abandoned for over 5 years.
Exavier is the most well-known of the three (101 stars, 2,243 downloads) and pioneered Elixir mutation testing with a clean, simple design. However, the author explicitly described it as a proof-of-concept, and several planned features were never implemented. It has been abandoned for over 5 years.
For any new project considering mutation testing in 2026, Muex is the clear choice -- it is the only option that works with modern Elixir/OTP, is actively maintained, and provides the tooling depth needed for real-world adoption.