erlperf (erlperf v2.2.0)
Convenience APIs for benchmarking.
This module implements the following benchmarking modes:
- Continuous mode
- Timed (low overhead) mode
- Concurrency estimation (squeeze) mode
Continuous mode
This is the default mode. A separate erlperf_job is started for each benchmark, iterating the supplied runner in a tight loop and bumping a counter for each iteration of each worker. erlperf reads this counter every second (or sample_duration), calculating the difference between the current and previous value. This difference is called a sample.
By default, erlperf collects 3 samples and stops, reporting the average. To give an example, if your function runs for 20 milliseconds, erlperf may capture samples with 48, 52 and 50 iterations. The average would be 50.
This approach works well for CPU-bound calculations, but may produce unexpected results for slow functions taking longer than the sample duration. For example, timer:sleep(2000) with default settings yields zero throughput. You can change the sample duration and the number of samples to take to avoid that.
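To illustrate, here is a minimal sketch of working around the timer:sleep(2000) case (the runner fun is arbitrary, and actual numbers depend on the host):

%% With defaults (three 1-second samples), a 2-second sleep never
%% completes an iteration within a sample, so throughput reads 0:
erlperf:run(fun() -> timer:sleep(2000) end).
%% Stretching each sample to 5 seconds lets every sample observe
%% at least two completed iterations:
erlperf:run(fun() -> timer:sleep(2000) end, #{sample_duration => 5000}).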
Timed mode
In this mode erlperf loops your code a specified number of times, measuring how long it takes to complete. It is essentially what timer:tc/3 does. This mode has slightly less overhead compared to continuous mode. This difference may be significant if you're profiling low-level ERTS primitives.
This mode does not support the concurrency setting (concurrency is locked to 1).
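A minimal sketch of a timed run using time/2 (documented below):

%% Run rand:uniform/0 one million times, returning elapsed microseconds
Elapsed = erlperf:time({rand, uniform, []}, 1000000).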
Concurrency estimation mode
In this mode erlperf attempts to estimate how concurrent the supplied runner code is. The run consists of multiple passes, increasing concurrency with each pass, and stopping when total throughput no longer grows. This mode proves useful for finding concurrency bottlenecks. For example, some functions may have limited throughput because they execute remote calls served by a single process. See benchmark/3 for the detailed description.
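A minimal sketch of a squeeze run via run/3, using default estimation settings (with the default basic report, the result is the best throughput and the worker count that achieved it):

%% Estimate how rand:uniform/0 throughput scales with added workers
{BestQPS, BestConcurrency} = erlperf:run({rand, uniform, []}, #{}, #{}).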
Summary
Types
Basic concurrency estimation report
Concurrency estimation mode options.
Concurrency estimation mode result
Node isolation settings.
Full benchmark report, containing all collected samples and statistics
Benchmarking mode selection and parameters of the benchmark run.
Benchmark results.
Results reported by a single benchmark run.
System information, as returned by erlang:system_info/1. May also contain the CPU model name on supported operating systems.
Functions
Generic benchmarking suite, accepting multiple code maps, modes and options.
Comparison run: benchmark multiple jobs at the same time.
Runs a single benchmark for 3 seconds, returns the average number of iterations per second.
Runs a single benchmark job, returns the average number of iterations per second, or a full report.
Concurrency estimation run, or an alias for quick benchmarking of an MFA tuple.
Starts a new supervised job with the specified concurrency.
Timed benchmarking mode. Iterates the runner code Count times and returns the elapsed time in microseconds.
Types
-type code() :: erlperf_job:code_map() | erlperf_job:callable().
Convenience type: code accepted by run/1,2,3 and compare/2.
-type concurrency_result() :: {QPS :: non_neg_integer(), Concurrency :: non_neg_integer()}.
Basic concurrency estimation report.
Only the highest throughput run is reported. Concurrency contains the number of concurrently running workers when the best result is achieved.
-type concurrency_test() :: #{threshold => pos_integer(), min => pos_integer(), max => pos_integer()}.
Concurrency estimation mode options.
- min: initial number of workers, default is 1
- max: maximum number of workers, defaults to erlang:system_info(process_limit) - 1000
- threshold: stop the concurrency run when adding this number of workers does not result in a further total throughput increase. Default is 3
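All three keys are optional. A minimal sketch passing explicit estimation settings to benchmark/3 (the runner is an arbitrary example):

%% Start at 2 workers, cap at 16, and stop after 5 consecutive
%% passes without total throughput growth
erlperf:benchmark([#{runner => {rand, uniform, []}}],
                  #{}, #{min => 2, max => 16, threshold => 5}).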
-type concurrency_test_result() :: concurrency_result() | {Max :: concurrency_result(), [concurrency_result()]}.
Concurrency estimation mode result.
The extended report contains results for all runs, starting from the minimum number of workers, up to the highest throughput detected, plus up to threshold more.
-type isolation() :: #{host => string()}.
Node isolation settings.
Currently, host selection is not supported.
-type report() :: #{mode := timed | continuous | concurrency, result := run_statistics(), history => [{Concurrency :: pos_integer(), Result :: run_statistics()}], code := erlperf_job:code_map(), run_options := run_options(), concurrency_options => concurrency_test(), system => system_information(), sleep => sleep | busy_wait}.
Full benchmark report, containing all collected samples and statistics.
- mode: benchmark run mode
- result: benchmark result. Concurrency estimation mode contains the best result (with the highest average throughput recorded)
- code: original code
- run_options: full set of options, with all defaults filled in
- system: information about the system the benchmark is running on
- history: returned only for concurrency estimation mode; contains a list of all runs with their results
- concurrency_options: returned only for concurrency estimation mode, with all defaults filled in
- sleep: method used for waiting a specified amount of time. Normally set to sleep, but may be reported as busy_wait if erlperf scheduling is impacted by lock contention or another problem preventing it from using precise timing
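A minimal sketch of requesting and pattern-matching a full report:

%% report => full returns the report() map instead of a plain number
Report = erlperf:run({rand, uniform, []}, #{report => full}),
#{mode := continuous, result := Result, run_options := _AllOpts} = Report.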
-type run_options() :: #{concurrency => pos_integer(), sample_duration => pos_integer() | undefined | {timed, pos_integer()}, warmup => non_neg_integer(), samples => pos_integer(), cv => float() | undefined, priority => erlang:priority_level(), report => basic | extended | full, isolation => isolation()}.
Benchmarking mode selection and parameters of the benchmark run.
- concurrency: number of workers to run; applies only to the continuous benchmarking mode
- cv: coefficient of variation. Acceptable standard deviation for the test to conclude. Not applicable to timed mode. When the value is set, the benchmark continues running until the standard deviation of the last collected samples, divided by the average value (arithmetic mean), is smaller than the cv specified.
- isolation: request a separate Erlang VM instance for each job. Some benchmarks may corrupt internal VM structures, or change global structures affecting other benchmarks running in the same VM. The host sub-option is currently ignored.
- priority: sets the job controller process priority (defaults to high). Running with normal or lower priority may prevent the controller from timely starting or stopping workers.
- report: applies only to continuous mode. A basic report contains only the average value. Specify extended to get the list of actual samples, for calculating exotic statistics. Pass full to receive the full report, including benchmark settings and extra statistics for continuous mode: minimum, maximum, average, median and 99th percentile (more metrics may be added in future releases)
- samples: number of measurements to take. Default is 3. For continuous mode this results in a 3 second run when sample_duration is set to the default 1000 ms.
- sample_duration: time, in milliseconds, between taking iteration counter samples. Multiplied by samples, this parameter defines the total benchmark run duration. Default is 1000 ms. Passing {timed, Counter} engages timed mode, with Counter iterations taken samples times
- warmup: how many extra samples are collected and discarded at the beginning of the continuous run
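A minimal sketch combining several of these options (the runner is an arbitrary example):

%% 4 concurrent workers, 5 samples of 2 seconds each, 1 discarded
%% warmup sample; report => extended returns the raw sample list
Samples = erlperf:run(fun() -> lists:seq(1, 100) end,
                      #{concurrency => 4, samples => 5,
                        sample_duration => 2000, warmup => 1,
                        report => extended}).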
-type run_result() :: non_neg_integer() | [non_neg_integer()].
Benchmark results.
For continuous mode, an average (arithmetic mean) of the collected samples, or a list of all samples collected. Timed mode returns the elapsed time (microseconds).
-type run_statistics() ::
#{average => non_neg_integer(),
variance => float(),
stddev => float(),
median => non_neg_integer(),
p99 => non_neg_integer(),
best => non_neg_integer(),
worst => non_neg_integer(),
samples => [non_neg_integer()],
time => non_neg_integer(),
iteration_time => non_neg_integer()}.
Results reported by a single benchmark run.
- best: highest throughput for continuous mode, or lowest time for timed
- worst: lowest throughput, or highest time
- average: arithmetic mean; iterations for continuous mode and microseconds for timed
- stddev: standard deviation
- median: median (50th percentile)
- p99: 99th percentile
- samples: raw samples from the run; monotonic counter values for continuous mode, and times measured for a timed run
- time: total benchmark duration (us); may exceed samples * sample_duration when cv is specified and results are not immediately stable
- iteration_time: approximate time of a single iteration (of one runner)
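A minimal sketch of extracting these statistics from a full report (see report() above):

#{result := #{average := Avg, stddev := StdDev, p99 := P99}} =
    erlperf:run({rand, uniform, []}, #{report => full}),
io:format("avg ~b, stddev ~.2f, p99 ~b~n", [Avg, StdDev, P99]).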
-type system_information() ::
    #{os := {unix | win32, atom()}, system_version := string(), cpu => string()}.
System information, as returned by erlang:system_info/1. May also contain the CPU model name on supported operating systems.
Functions
-spec benchmark([erlperf_job:code_map()], RunOptions :: run_options(), undefined) -> run_result() | [run_result()] | [report()]; ([erlperf_job:code_map()], RunOptions :: run_options(), concurrency_test()) -> concurrency_test_result() | [report()].
Generic benchmarking suite, accepting multiple code maps, modes and options.
Codes contains a list of code versions. Every element is a separate job that runs in parallel with all other jobs. The same RunOptions are applied to all jobs.
ConcurrencyTestOpts specifies options for concurrency estimation mode. Passing undefined results in a continuous or a timed run. Running multiple jobs during a concurrency estimation run is not supported.
A concurrency estimation run consists of multiple passes. The first pass is done with min workers, and subsequent passes increase concurrency by 1, until max concurrency is reached, or total job iterations stop growing for threshold consecutive passes. To give an example, if your code is not concurrent at all, and you benchmark it with threshold set to 3, there will be 4 passes in total: first with a single worker, then 3 more, demonstrating no throughput growth.
RunOptions are honoured, so if you set samples to 30, keeping the default sample duration of one second, every single pass will last for 30 seconds.
-spec compare(Codes :: [code()], RunOptions :: run_options()) -> [run_result()] | [report()].
Comparison run: benchmark multiple jobs at the same time.
A job is defined by either erlperf_job:code_map(), or just the runner callable. Example comparing rand:uniform/0 performance to rand:mwc59/1:
(erlperf@ubuntu22)7> erlperf:compare([
    {rand, uniform, []},
    #{runner => "run(X) -> rand:mwc59(X).", init_runner => {rand, mwc59_seed, []}}
], #{}).
[14823854,134121999]
See benchmark/3 for the RunOptions definition and return values.
-spec run(code()) -> non_neg_integer().
Runs a single benchmark for 3 seconds, returns the average number of iterations per second.
Accepts either a full erlperf_job:code_map(), or just the runner callable.
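A minimal sketch (the returned figure depends entirely on the host):

%% Average iterations per second over three 1-second samples
QPS = erlperf:run(fun() -> lists:sort([3, 1, 2]) end).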
-spec run(Code :: code(), RunOptions :: run_options()) -> run_result() | report().
Runs a single benchmark job, returns the average number of iterations per second, or a full report.
Accepts either a full erlperf_job:code_map(), or just the runner callable. Equivalent to returning the first result of benchmark([Code], RunOptions, undefined).
-spec run(code(), run_options(), concurrency_test()) -> concurrency_test_result() | report(); (module(), atom(), [term()]) -> QPS :: non_neg_integer().
Concurrency estimation run, or an alias for quick benchmarking of an MFA tuple.
Attempts to find the concurrency characteristics of the runner code; see benchmark/3 for a detailed description. Accepts either a full erlperf_job:code_map(), or just the runner callable.
When Module and Function are atoms, and Args is a list, this call is equivalent to run({Module, Function, Args}).
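A minimal sketch of the MFA alias form:

%% Equivalent to erlperf:run({rand, uniform, []})
QPS = erlperf:run(rand, uniform, []).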
-spec start(code(), Concurrency :: non_neg_integer()) -> pid().
Starts a new supervised job with the specified concurrency.
Requires the erlperf application to be running. Returns the job controller process identifier. This function is designed for distributed benchmarking, when jobs are started on different nodes and monitored via erlperf_cluster_monitor.
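A minimal sketch of starting a supervised job locally:

%% Make sure the erlperf application is running, then start a job
%% with 2 concurrently running workers
{ok, _Started} = application:ensure_all_started(erlperf),
JobPid = erlperf:start({rand, uniform, []}, 2).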
-spec time(code(), Count :: non_neg_integer()) -> TimeUs :: non_neg_integer().
Timed benchmarking mode. Iterates the runner code Count times and returns the elapsed time in microseconds.