erlperf (erlperf v2.3.0)

Convenience APIs for benchmarking.

This module implements the following benchmarking modes:
  • Continuous mode
  • Timed (low overhead) mode
  • Concurrency estimation (squeeze) mode

Continuous mode

This is the default mode. A separate erlperf_job is started for each benchmark, iterating the supplied runner in a tight loop and bumping a counter on each iteration of each worker. erlperf reads this counter every second (or every sample_duration), calculating the difference between the current and the previous value. This difference is called a sample.

By default, erlperf collects 3 samples and stops, reporting the average. To give an example, if your function runs for 20 milliseconds, erlperf may capture samples with 48, 52 and 50 iterations. The average would be 50.

This approach works well for CPU-bound calculations, but may produce unexpected results for slow functions taking longer than the sample duration. For example, timer:sleep(2000) with default settings yields zero throughput. You can change the sample duration and the number of samples taken to avoid that, as sketched below.
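
A quick sketch of both cases in an Erlang shell (throughput numbers below are illustrative, not measured):

  %% CPU-bound runner: the default three 1-second samples work well
  1> erlperf:run({rand, uniform, []}).
  14823854
  %% slow runner: the counter never moves within a 1-second sample
  2> erlperf:run(fun() -> timer:sleep(2000) end).
  0
  %% stretching the sample to 10 seconds captures ~5 iterations per sample
  3> erlperf:run(fun() -> timer:sleep(2000) end, #{sample_duration => 10000}).
  5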

Timed mode

In this mode erlperf loops your code a specified number of times, measuring how long it took to complete. It is essentially what timer:tc/3 does. This mode has slightly less overhead compared to continuous mode, a difference that may be significant when profiling low-level ERTS primitives.

This mode does not support the concurrency setting (concurrency is locked to 1).
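
For example, timing one million iterations of rand:uniform/0 (elapsed time below is illustrative):

  1> erlperf:time({rand, uniform, []}, 1000000).
  127732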

Concurrency estimation mode

In this mode erlperf attempts to estimate how concurrent the supplied runner code is. The run consists of multiple passes, increasing concurrency with each pass, and stopping when total throughput no longer grows. This mode is useful for finding concurrency bottlenecks. For example, some functions may have limited throughput because they execute remote calls served by a single process. See benchmark/3 for a detailed description.
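
A sketch of an estimation run: code:is_loaded/1 is served by a single code server process, so adding workers quickly stops improving throughput (numbers below are illustrative):

  1> erlperf:run(fun() -> code:is_loaded(local_udp) end, #{}, #{}).
  {984989, 2}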

Summary

Types

code/0
Convenience type used in run/1,2,3 and compare/2.

concurrency_result/0
Basic concurrency estimation report

concurrency_test/0
Concurrency estimation mode options.

concurrency_test_result/0
Concurrency estimation mode result

isolation/0
Node isolation settings.

report/0
Full benchmark report, containing all collected samples and statistics

run_options/0
Benchmarking mode selection and parameters of the benchmark run.

run_result/0
Benchmark results.

run_statistics/0
Results reported by a single benchmark run.

Functions

benchmark/3
Generic benchmarking suite, accepting multiple code maps, modes and options.

compare/2
Comparison run: benchmark multiple jobs at the same time.

run/1
Runs a single benchmark for 3 seconds, returns average number of iterations per second.

run/2
Runs a single benchmark job, returns average number of iterations per second, or a full report.

run/3
Concurrency estimation run, or an alias for quick benchmarking of an MFA tuple.

start/2
Starts a new supervised job with the specified concurrency.

time/2
Timed benchmarking mode. Iterates the runner code Count times and returns elapsed time in microseconds.

Types

code/0

-type code() :: erlperf_job:code_map() | erlperf_job:callable().

Convenience type used in run/1,2,3 and compare/2.

concurrency_result/0

-type concurrency_result() :: {QPS :: non_neg_integer(), Concurrency :: non_neg_integer()}.

Basic concurrency estimation report

Only the highest throughput run is reported. Concurrency contains the number of concurrently running workers when the best result is achieved.

concurrency_test/0

-type concurrency_test() ::
    #{threshold => pos_integer(),
      min => pos_integer(),
      step => pos_integer(),
      max => pos_integer()}.

Concurrency estimation mode options.

  • min: initial number of workers; default is 1
  • step: increase the number of workers by this value on each iteration; default is 1
  • max: maximum number of workers; defaults to erlang:system_info(process_limit) - 1000
  • threshold: stop the concurrency run when adding this number of workers does not result in further total throughput increase; default is 3 (see the sketch below)
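
For example, a run probing only even worker counts up to 16 (result below is illustrative):

  1> erlperf:run({rand, uniform, []}, #{}, #{min => 2, step => 2, max => 16}).
  {38174342, 8}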

concurrency_test_result/0

-type concurrency_test_result() ::
    concurrency_result() | {Max :: concurrency_result(), [concurrency_result()]}.

Concurrency estimation mode result

Extended report contains results for all runs, starting from the minimum number of workers, to the highest throughput detected, plus up to threshold more.

isolation/0

-type isolation() :: #{host => string()}.

Node isolation settings.

Currently, host selection is not supported.

report/0

-type report() ::
    #{mode := timed | continuous | concurrency,
      result := run_statistics(),
      history => [{Concurrency :: pos_integer(), Result :: run_statistics()}],
      code := erlperf_job:code_map(),
      run_options := run_options(),
      concurrency_options => concurrency_test(),
      system => system_information(),
      sleep => sleep | busy_wait}.

Full benchmark report, containing all collected samples and statistics

  • mode: benchmark run mode
  • result: benchmark result. In concurrency estimation mode, this contains the best result (the one with the highest average throughput recorded)
  • code: original code
  • run_options: full set of options, with all defaults filled in
  • system: information about the system the benchmark is running on
  • history: returned only for concurrency estimation mode; contains a list of all runs with their results
  • concurrency_options: returned only for concurrency estimation mode, with all defaults filled in
  • sleep: method used for waiting for a specified amount of time. Normally set to sleep, but may be reported as busy_wait if erlperf scheduling is impacted by lock contention or another problem preventing it from using precise timing
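
A report with these fields can be pattern-matched directly; a short sketch (shell output of the first expression omitted, average value illustrative):

  1> #{mode := continuous, result := #{average := Avg}} = erlperf:run({rand, uniform, []}, #{report => full}).
  2> Avg.
  14823854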

run_options/0

-type run_options() ::
    #{concurrency => pos_integer(),
      sample_duration => pos_integer() | undefined | {timed, pos_integer()},
      warmup => non_neg_integer(),
      samples => pos_integer(),
      cv => float() | undefined,
      priority => erlang:priority_level(),
      report => basic | extended | full,
      isolation => isolation()}.

Benchmarking mode selection and parameters of the benchmark run.

  • concurrency: number of workers to run; applies only to the continuous benchmarking mode
  • cv: coefficient of variation. Acceptable relative standard deviation for the test to conclude. Not applicable to timed mode. When set, the benchmark keeps running until the standard deviation of the last collected samples, divided by the average (arithmetic mean), is smaller than the specified cv (see the sketch after this list).
  • isolation: requests a separate Erlang VM instance for each job. Some benchmarks may corrupt internal VM structures, or change global state in ways that affect other benchmarks running in the same VM. The host sub-option is currently ignored.
  • priority: sets the job controller process priority (defaults to high). Running at normal or lower priority may prevent the controller from starting or stopping workers in a timely manner.
  • report: applies only to continuous mode. A basic report contains only the average value. Specify extended to get the list of actual samples, useful for calculating exotic statistics. Pass full to receive a full report, including benchmark settings and extra statistics for continuous mode: minimum, maximum, average, median and 99th percentile (more metrics may be added in future releases)
  • samples: number of measurements to take. Default is 3. For continuous mode this results in a 3-second run when sample_duration is set to the default 1000 ms.
  • sample_duration: time, in milliseconds, between taking iteration counter samples. Multiplied by samples, this parameter defines the total benchmark run duration. Default is 1000 ms. Passing {timed, Counter} engages timed mode, iterating the runner Counter times for each of the samples taken
  • warmup: number of extra samples collected and discarded at the beginning of a continuous run
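
For example, a run that discards one warmup sample and keeps collecting until the last samples are stable within 2% (result below is illustrative):

  1> erlperf:run({rand, uniform, []}, #{warmup => 1, samples => 10, cv => 0.02}).
  14823854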

run_result/0

-type run_result() :: non_neg_integer() | [non_neg_integer()].

Benchmark results.

For continuous mode, an average (arithmetic mean) of the collected samples, or a list of all samples collected. Timed mode returns elapsed time (microseconds).
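
Requesting an extended report returns the raw samples instead of the average (values below are illustrative):

  1> erlperf:run({rand, uniform, []}, #{report => extended}).
  [14823854,14895645,14901244]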

run_statistics/0

-type run_statistics() ::
    #{average => non_neg_integer(),
      variance => float(),
      stddev => float(),
      median => non_neg_integer(),
      p99 => non_neg_integer(),
      best => non_neg_integer(),
      worst => non_neg_integer(),
      samples => [non_neg_integer()],
      time => non_neg_integer(),
      iteration_time => non_neg_integer()}.

Results reported by a single benchmark run.

  • best: highest throughput for continuous mode, or lowest time for timed
  • worst: lowest throughput, or highest time
  • average: arithmetic mean; iterations for continuous mode and microseconds for timed
  • stddev: standard deviation
  • median: median (50th percentile)
  • p99: 99th percentile
  • samples: raw samples from the run; monotonic counter values for continuous mode, and measured times for timed runs
  • time: total benchmark duration (microseconds); may exceed samples * sample_duration when cv is specified and results are not immediately stable
  • iteration_time: approximate time of a single iteration (of one runner)

system_information/0

-type system_information() ::
    #{os := {unix | win32, atom()},
      system_version := string(),
      debug => boolean(),
      emu_type => atom(),
      emu_flavor => atom(),
      dynamic_trace => atom(),
      cpu => string()}.

Functions

benchmark(Codes, RunOptions, ConcurrencyTestOpts)
-spec benchmark([erlperf_job:code_map()], RunOptions :: run_options(), undefined) ->
             run_result() | [run_result()] | [report()];
         ([erlperf_job:code_map()], RunOptions :: run_options(), concurrency_test()) ->
             concurrency_test_result() | [report()].

Generic benchmarking suite, accepting multiple code maps, modes and options.

Codes contains a list of code versions. Every element is a separate job that runs in parallel with all other jobs. The same RunOptions are applied to all jobs.

ConcurrencyTestOpts specifies options for concurrency estimation mode. Passing undefined results in a continuous or a timed run. Running multiple jobs simultaneously is not supported in concurrency estimation mode.

A concurrency estimation run consists of multiple passes. The first pass is done with min workers, and subsequent passes increase concurrency by step, until either max concurrency is reached, or total job iterations stop growing for threshold consecutive passes. To give an example, if your code is not concurrent at all, and you benchmark it with threshold set to 3, there will be 4 passes in total: first with a single worker, then 3 more, demonstrating no throughput growth.

In this mode, the job is started once, before the first pass. Subsequent passes only change the concurrency. All other options passed in RunOptions are honoured, so if you set samples to 30, keeping the default sample duration of one second, every single pass will last for 30 seconds.
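
A sketch of both invocations, with illustrative results: a continuous run of two jobs benchmarked at the same time, and a concurrency estimation run of a single job:

  %% two jobs, no concurrency estimation
  1> erlperf:benchmark([#{runner => {rand, uniform, []}},
         #{runner => {rand, uniform_real, []}}], #{}, undefined).
  [14823854,4678382]
  %% one job, concurrency estimation with default options
  2> erlperf:benchmark([#{runner => {rand, uniform, []}}], #{}, #{}).
  {57303052, 8}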

compare(Codes, RunOptions)

-spec compare(Codes :: [code()], RunOptions :: run_options()) -> [run_result()] | [report()].

Comparison run: benchmark multiple jobs at the same time.

A job is defined by either erlperf_job:code_map(), or just the runner callable. Example comparing rand:uniform/0 performance to rand:mwc59/1:
  (erlperf@ubuntu22)7> erlperf:compare([
      {rand, uniform, []},
      #{runner => "run(X) -> rand:mwc59(X).", init_runner => {rand, mwc59_seed, []}}
  ], #{}).
  [14823854,134121999]
See benchmark/3 for RunOptions definition and return values.

run(Code)

-spec run(code()) -> non_neg_integer().

Runs a single benchmark for 3 seconds, returns average number of iterations per second.

Accepts either a full erlperf_job:code_map(), or just the runner callable.
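
For example (result below is illustrative):

  1> erlperf:run(fun() -> lists:seq(1, 100) end).
  970732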

run(Code, RunOptions)

-spec run(Code :: code(), RunOptions :: run_options()) -> run_result() | report().

Runs a single benchmark job, returns average number of iterations per second, or a full report.

Accepts either a full erlperf_job:code_map(), or just the runner callable. Equivalent to taking the first result of benchmark([Code], RunOptions, undefined).
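
For example, collecting ten samples instead of the default three (result below is illustrative):

  1> erlperf:run({lists, seq, [1, 100]}, #{samples => 10}).
  968254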

run(Module, Function, Args)

-spec run(code(), run_options(), concurrency_test()) -> concurrency_test_result() | report();
   (module(), atom(), [term()]) -> QPS :: non_neg_integer().

Concurrency estimation run, or an alias for quick benchmarking of an MFA tuple.

Attempts to find concurrency characteristics of the runner code; see benchmark/3 for a detailed description. Accepts either a full erlperf_job:code_map(), or just the runner callable.

When Module and Function are atoms, and Args is a list, this call is equivalent to run({Module, Function, Args}).
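
For example, the following two calls are equivalent (results below are illustrative):

  1> erlperf:run(rand, uniform, []).
  14823854
  2> erlperf:run({rand, uniform, []}).
  14823854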

start(Code, Concurrency)

-spec start(code(), Concurrency :: non_neg_integer()) -> pid().

Starts a new supervised job with the specified concurrency.

Requires the erlperf application to be running. Returns the job controller process identifier. This function is designed for distributed benchmarking, when jobs are started on different nodes and monitored via erlperf_cluster_monitor.
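
A minimal sketch, assuming erlperf is already in the code path (returned pid is illustrative):

  1> application:ensure_all_started(erlperf).
  {ok, [erlperf]}
  2> erlperf:start({rand, uniform, []}, 2).
  <0.142.0>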

time(Code, Count)

-spec time(code(), Count :: non_neg_integer()) -> TimeUs :: non_neg_integer().

Timed benchmarking mode. Iterates the runner code Count times and returns elapsed time in microseconds.

This method has lower overhead compared to continuous benchmarking. Running multiple workers is not supported in this mode.
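
For example, timing one hundred thousand sort calls (elapsed time below is illustrative):

  1> erlperf:time(fun() -> lists:sort([3, 1, 2]) end, 100000).
  42817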