Chi2fit.Utilities (Chi-SquaredFit v1.3.0)

Provides various utilities:

  • Bootstrapping
  • Derivatives
  • Creating Cumulative Distribution Functions / Histograms from sample data
  • Solving linear, quadratic, and cubic equations
  • Autocorrelation coefficients

Summary

Types

Algorithm used to assign errors to frequency data: Wald score or Wilson score.

Average and standard deviation (error)

Cumulative Distribution Function

Binned data with error bounds specified through low and high values

Supported numerical integration methods

Functions

Adjusts the times to working hours and/or work days.

Walks a map structure while applying the function fun.

Pretty-prints a nested array-like structure (list or tuple) as a table.

Calculates the autocorrelation coefficient of a list of observations.

Calculates the systematic errors for bins due to uncertainty in assigning data to bins.

Implements bootstrapping procedure as resampling with replacement.

Converts a CDF function to a list of data points.

Reads CSV data, extracts one column, and returns it as a list of NaiveDateTime.

Generates a Cullen & Frey plot for the sample data.

Extracts data point with standard deviation from Cullen & Frey plot data.

Calculates the partial derivative of a function and returns the value.

Generates an empirical Cumulative Distribution Function from sample data.

Calculates and returns the error associated with a list of observables.

Forecasts how many time periods are needed to complete size items

Returns a function for forecasting the duration to complete a number of items.

Returns a function for forecasting the number of completed items in a number of periods.

Numerical integration providing Gauss and Romberg types.

Returns a Stream that generates a stream of dates.

Calculates the jacobian of the function at the point x.

Converts a list of numbers to frequency data.

Maps the date to weekdays such that weekends are eliminated; it does so with respect to a given Saturday

Maps the time of a day into the working hour period

Basic Monte Carlo simulation to repeatedly run a simulation multiple times.

Calculates the nth moment of the sample.

Calculates the nth centralized moment of the sample.

Calculates the nth centralized moment of the sample.

Calculates the nth normalized moment of the sample.

Calculates the nth normalized moment of the sample.

Calculates the nth normalized moment of the sample.

Newton-Fourier method for locating roots and returning the interval where the root is located.

Converts the input so that the result is a Puiseux diagram, that is, a strictly convex shape.

Outputs and formats the errors that result from a call to Chi2fit.Fit.chi2/4

Reads data from a file specified by filename and returns a stream with the data parsed as floats.

Resamples the subsequences of numbers contained in the list as determined by analyze/2

Calculates the test statistic for subexponentiality of a sample.

Counts the number of dates (datelist) that fall between consecutive dates in intervals and returns the result as a list of numbers.

Returns a list of time differences (assumes an ordered list as input)

Converts raw data to binned data with (asymmetrical) errors.

Unzips lists of 1-, 2-, 3-, 4-, 5-, 6-, 7-, and 8-tuples.

Types

Specs

algorithm() :: :wilson | :wald

Algorithm used to assign errors to frequency data: Wald score or Wilson score.

Specs

avgsd() :: {avg :: float(), sd :: float()}

Average and standard deviation (error)

Specs

cdf() :: (number() -> {number(), number(), number()})

Cumulative Distribution Function

Specs

cullenfrey() :: [{squared_skewness :: float(), kurtosis :: float()} | nil]

Specs

ecdf() :: [{float(), float(), float(), float()}]

Binned data with error bounds specified through low and high values

Specs

method() :: :gauss | :gauss2 | :gauss3 | :romberg | :romberg2 | :romberg3

Supported numerical integration methods

Specs

range() :: {float(), float()} | [float(), ...]

Functions

adjust_times(data, options)

Specs

adjust_times(Enumerable.t(), options :: Keyword.t()) :: Enumerable.t()

Adjusts the times to working hours and/or work days.

Options

`workhours` - a 2-tuple containing the starting and ending hours of the work day (defaults to {8.0, 18.0})
`epoch` - the epoch to which all data elements are relative (defaults to 1970-01-01)
`saturday` - number of days since the epoch that corresponds to a Saturday (defaults to 9)
`correct` - whether to correct the times for working hours and weekdays; possible values are `:worktime`, `:weekday`, and `:"weekday+worktime"` (defaults to `false`)

analyze(map, fun, options)

Specs

analyze(
  map :: %{},
  fun :: ([number()], Keyword.t() -> Keyword.t()),
  options :: Keyword.t()
) :: Keyword.t()

Walks a map structure while applying the function fun.

Specs

as_table(rows :: [any()], header :: tuple()) :: list()

Pretty-prints a nested array-like structure (list or tuple) as a table.

auto(list, opts \\ [nproc: 1])

Specs

auto([number()], Keyword.t()) :: [number()]

Calculates the autocorrelation coefficient of a list of observations.

The implementation uses the discrete Fast Fourier Transform to calculate the autocorrelation. For available options see Chi2fit.FFT.fft/2. Returns a list of the autocorrelation coefficients.

Example

iex> auto [1,2,3]
[14.0, 7.999999999999999, 2.999999999999997]
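
The values in the example are consistent with unnormalized lagged products (14 = 1·1 + 2·2 + 3·3, 8 = 1·2 + 2·3, 3 = 1·3). As a cross-check, a direct O(n²) sum can be sketched as below. This is not the FFT route the library takes, and the module and function names are hypothetical.

```elixir
defmodule AutoSketch do
  # Direct (non-FFT) autocorrelation: for each lag k, sum x[i] * x[i + k].
  # Illustrative only; Chi2fit itself computes this via an FFT.
  def direct(list) do
    list
    |> Enum.with_index()
    |> Enum.map(fn {_value, lag} ->
      list
      |> Enum.zip(Enum.drop(list, lag))
      |> Enum.map(fn {a, b} -> a * b end)
      |> Enum.sum()
    end)
  end
end

IO.inspect(AutoSketch.direct([1, 2, 3]))
# prints [14, 8, 3]
```

The integer results agree with the floating-point values above up to FFT rounding error.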

binerror(data, noise_fun, options \\ [])

Specs

binerror(
  data :: [number()],
  noise_fun :: (Enumerable.t() -> Enumerable.t()),
  options :: Keyword.t()
) :: [{bin :: number(), avg :: number(), error :: number()}]

Calculates the systematic errors for bins due to uncertainty in assigning data to bins.

Options

`bin` - the size of bins to use (defaults to 1)
`iterations` - the number of iterations to use to estimate the error due to noise (defaults to 100)

bootstrap(total, data, fun, options \\ [])

Specs

bootstrap(
  total :: integer(),
  data :: [number()],
  fun :: ([number()], integer() -> number()),
  options :: Keyword.t()
) :: [any()]

Implements bootstrapping procedure as resampling with replacement.

It supports saving intermediate results to a file using :dets. Use the options :safe and :filename (see below).

Arguments:

`total` - Total number of resamplings to perform
`data` - The sample data
`fun` - The function to evaluate
`options` - A keyword list of options, see below.

Options

`:safe` - Whether to save intermediate results to a file, so as to support continuation when it is interrupted. Valid values are `:safe` and `:cont`.
`:filename` - The filename to use for storing intermediate results
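
Resampling with replacement can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: it leaves out the :dets persistence, the module name is hypothetical, and passing the resample number as the second argument of `fun` is a guess based on the spec above.

```elixir
defmodule BootstrapSketch do
  # Draws `total` resamples of the same size as `data`, with replacement,
  # and evaluates `fun` on each one. Passing the resample number as the
  # second argument is an assumption based on the documented spec.
  def run(total, data, fun) when total > 0 and data != [] do
    size = length(data)

    for k <- 1..total do
      resample = for _ <- 1..size, do: Enum.random(data)
      fun.(resample, k)
    end
  end
end

# Bootstrap the sample mean 200 times and show the spread of the results.
mean = fn sample, _k -> Enum.sum(sample) / length(sample) end
means = BootstrapSketch.run(200, [1, 2, 3, 4, 5, 6], mean)
IO.inspect(Enum.min_max(means))
```

The spread of the returned values gives an impression of the uncertainty in the statistic computed by `fun`.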

Specs

convert_cdf({cdf(), range()}) :: [{float(), float(), float(), float()}]

Converts a CDF function to a list of data points.

Example

iex> convert_cdf {fn x->{:math.exp(-x),:math.exp(-x)/16,:math.exp(-x)/4} end, {1,4}}
[{1, 0.36787944117144233, 0.022992465073215146, 0.09196986029286058},
 {2, 0.1353352832366127, 0.008458455202288294, 0.033833820809153176},
 {3, 0.049787068367863944, 0.0031116917729914965, 0.012446767091965986},
 {4, 0.01831563888873418, 0.0011447274305458862, 0.004578909722183545}]

csv_to_list(csvdata, key, options \\ [])

Specs

csv_to_list(
  csvdata :: Enumerable.t(),
  key :: String.t(),
  options :: Keyword.t()
) :: [NaiveDateTime.t()]

Reads CSV data, extracts one column, and returns it as a list of NaiveDateTime.

Examples

iex> csv = ["Done","2019/05/01","2019/06/01"] |> Stream.map(& &1)
...> csv_to_list csv, "Done", header?: true
[~N[2019-06-01 00:00:00], ~N[2019-05-01 00:00:00]]

iex> csv = ["Done","2019/May/01","2019/Jun/01"] |> Stream.map(& &1)
...> csv_to_list csv, "Done", header?: true, format: "{YYYY}/{Mshort}/{0D}"
[~N[2019-06-01 00:00:00], ~N[2019-05-01 00:00:00]]

iex> csv = ["Done","2019/May/01","2019/06/01"] |> Stream.map(& &1)
...> csv_to_list csv, "Done", header?: true, format: "{YYYY}/{Mshort}/{0D}"
[~N[2019-05-01 00:00:00]]

iex> csv = ["Done","2019/May/01","2019/06/01"] |> Stream.map(& &1)
...> csv_to_list csv, "Done", header?: true, format: ["{YYYY}/{Mshort}/{0D}","{YYYY}/{0M}/{0D}"]
[~N[2019-06-01 00:00:00], ~N[2019-05-01 00:00:00]]

iex> csv = ["Done","2019/May/01","2019/Jun/01"] |> Stream.map(& &1)
...> csv_to_list csv, "Done", header?: true, format: ["%Y/%b/%d"], parser: :strftime
[~N[2019-06-01 00:00:00], ~N[2019-05-01 00:00:00]]

cullen_frey(sample, n \\ 100)

Specs

cullen_frey(sample :: [number()], n :: integer()) :: cullenfrey()

Generates a Cullen & Frey plot for the sample data.

The kurtosis returned is the 'excess kurtosis'.

Specs

cullen_frey_point(data :: cullenfrey()) ::
  {{x :: float(), dx :: float()}, {y :: float(), dy :: float()}}

Extracts data point with standard deviation from Cullen & Frey plot data.

der(parameters, fun, options \\ [])

Specs

der([float() | {float(), integer()}], ([float()] -> float()), Keyword.t()) ::
  float()

Calculates the partial derivative of a function and returns the value.

Examples

The function value at a point:
iex> der([3.0], fn [x]-> x*x end) |> Float.round(3)
9.0

The first derivative of a function at a point:
iex> der([{3.0,1}], fn [x]-> x*x end) |> Float.round(3)
6.0

The second derivative of a function at a point:
iex> der([{3.0,2}], fn [x]-> x*x end) |> Float.round(3)
2.0

Partial derivatives with respect to two variables:
iex> der([{2.0,1},{3.0,1}], fn [x,y] -> 3*x*x*y end) |> Float.round(3)
12.0
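
For intuition, first and second derivatives like the ones above can be approximated with central differences. The sketch below is a simplified stand-in, not the method `der/3` actually uses; the module name and the step sizes are illustrative assumptions.

```elixir
defmodule DerSketch do
  # Central-difference estimate of the first derivative of a
  # one-argument function. The step size `h` is an illustrative choice.
  def first(fun, x, h \\ 1.0e-5) do
    (fun.(x + h) - fun.(x - h)) / (2 * h)
  end

  # Central-difference estimate of the second derivative.
  def second(fun, x, h \\ 1.0e-4) do
    (fun.(x + h) - 2 * fun.(x) + fun.(x - h)) / (h * h)
  end
end

square = fn x -> x * x end
IO.inspect(Float.round(DerSketch.first(square, 3.0), 3))   # prints 6.0
IO.inspect(Float.round(DerSketch.second(square, 3.0), 3))  # prints 2.0
```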

display(device \\ :stdio, results)

Specs

display(device :: IO.device(), Chi2fit.Fit.chi2probe() | avgsd()) :: none()

Displays results of the function Chi2fit.Fit.chi2probe/4

display(device \\ :stdio, hdata, model, arg, options)

Specs

display(
  device :: IO.device(),
  hdata :: ecdf(),
  model :: Chi2fit.Distribution.model(),
  Chi2fit.Fit.chi2fit(),
  options :: Keyword.t()
) :: none()

Displays results of the function Chi2fit.Fit.chi2fit/4

display_subsequences(device \\ :stdio, trends, intervals)

Specs

display_subsequences(
  device :: IO.device(),
  trends :: list(),
  intervals :: [NaiveDateTime.t()]
) :: none()

Pretty prints subsequences.

empirical_cdf(data, bin \\ {1.0, 0.5}, algorithm \\ :wilson, correction \\ 0)

Specs

empirical_cdf(
  [{float(), number()}],
  {number(), number()},
  algorithm(),
  integer()
) :: {cdf(), bins :: [float()], numbins :: pos_integer(), sum :: float()}

Generates an empirical Cumulative Distribution Function from sample data.

Three parameters determine the resulting empirical distribution:

  1. algorithm for assigning errors,

  2. the size of the bins,

  3. a correction for limiting the bounds on the 'y' values

When, for example, task effort or duration is modeled, some measured tasks have 0 time. In practice, what is actually meant is that the task effort is between 0 and 1 hour. This is where binning of the data happens. Specify the size of the bins to control how this is done. A bin size of 1 means that 0 effort will be mapped to 1/2 effort (the middle of the bin). This also prevents problems when the fitted distribution cannot cope with an effort of zero.

Supports two ways of assigning errors: Wald score or Wilson score. See [1]. Valid values for the algorithm argument are :wald and :wilson.

In the handbook of MCMC [1] a cumulative distribution is constructed. For the largest 'x' value in the sample, the 'y' value is exactly one (1). In combination with the Wald score this gives zero errors on the value '1'. If the resulting distribution is used to fit a curve, this may give an infinite contribution to the maximum likelihood function. Use the correction number to have a 'y' value of slightly less than 1 to prevent this from happening. In particular, the combination of 0 correction, algorithm :wald, and the 'linear' model for handling asymmetric errors gives problems.

The algorithm parameter determines how the errors on the 'y' value are determined. Currently supported values are :wald and :wilson.
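
The zero-error problem at 'y' = 1 can be seen from the textbook formulas for the two intervals. The sketch below uses the standard Wald and Wilson expressions for a proportion k/n; the module name and the default z = 1.0 are assumptions for illustration, and the library's own error assignment may differ in detail.

```elixir
defmodule ScoreSketch do
  # Wald interval for a proportion k/n: p ± z * sqrt(p * (1 - p) / n).
  def wald(k, n, z \\ 1.0) do
    p = k / n
    half = z * :math.sqrt(p * (1 - p) / n)
    {p - half, p + half}
  end

  # Wilson score interval for the same proportion.
  def wilson(k, n, z \\ 1.0) do
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z / denom * :math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    {center - half, center + half}
  end
end

# At the largest 'x' value the empirical CDF equals 1 (k == n):
IO.inspect(ScoreSketch.wald(10, 10))    # zero-width interval {1.0, 1.0}
IO.inspect(ScoreSketch.wilson(10, 10))  # lower bound stays below 1
```

With k equal to n the Wald interval collapses to a single point, while the Wilson interval keeps a finite width.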

References

[1] "Handbook of Monte Carlo Methods" by Kroese, Taimre, and Botev, section 8.4
[2] https://en.wikipedia.org/wiki/Cumulative_frequency_analysis
[3] https://arxiv.org/pdf/1112.2593v3.pdf
[4] https://en.wikipedia.org/wiki/Student%27s_t-distribution
    90% confidence ==> t = 1.645 for many data points (> 120)
    70% confidence ==> t = 1.000

Specs

error([{gamma :: number(), k :: pos_integer()}], :initial_sequence_method) ::
  {var :: number(), lag :: number()}

Calculates and returns the error associated with a list of observables.

Usually these are the result of a Markov Chain Monte Carlo simulation run.

The only supported method is the so-called Initial Sequence Method. See section 1.10.2 (Initial sequence method) of [1].

Input is a list of autocorrelation coefficients. This may be the output of auto/2.

References

[1] 'Handbook of Markov Chain Monte Carlo'

forecast(fun, size, tries \\ 0, update \\ fn -> 1 end)

Specs

forecast(
  fun :: (() -> non_neg_integer()),
  size :: pos_integer(),
  tries :: pos_integer(),
  update :: (() -> number())
) :: number()

Forecasts how many time periods are needed to complete size items.

Related functions: forecast_duration/2 and forecast_items/2.
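
The idea of such a forecast can be sketched as a simple loop: draw the number of items finished in one period, subtract it from the remaining work, and count periods until nothing is left. The code below is a hypothetical illustration of that idea and does not reproduce the `tries` and `update` arguments of `forecast/4`.

```elixir
defmodule ForecastSketch do
  # Counts how many periods pass before `size` items are completed.
  # `draw` returns the number of items finished in one period; it must
  # return a positive number often enough for the loop to terminate.
  def periods_needed(draw, size, periods \\ 0)
  def periods_needed(_draw, size, periods) when size <= 0, do: periods

  def periods_needed(draw, size, periods) do
    periods_needed(draw, size - draw.(), periods + 1)
  end
end

# Deterministic throughput of 3 items per period, 10 items to finish:
IO.inspect(ForecastSketch.periods_needed(fn -> 3 end, 10))
# prints 4

# Random throughput sampled from historical data (hypothetical numbers):
history = [2, 3, 1, 4, 3]
IO.inspect(ForecastSketch.periods_needed(fn -> Enum.random(history) end, 10))
```

Running the random variant many times gives a distribution of completion times rather than a single number.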

forecast_duration(data, size)

Specs

forecast_duration(data :: [number()] | (() -> number()), size :: pos_integer()) ::
  (() -> number())

Returns a function for forecasting the duration to complete a number of items.

This function is a wrapper for forecast/4.

Arguments

`data` - either a data set to base the forecasting on, or a function that returns (random) numbers
`size` - the number of items to complete

forecast_items(data, periods)

Specs

forecast_items(data :: [number()] | (() -> number()), periods :: pos_integer()) ::
  (() -> number())

Returns a function for forecasting the number of completed items in a number of periods.

This function is a wrapper for forecast/4.

Arguments

`data` - either a data set to base the forecasting on, or a function that returns (random) numbers
`periods` - the number of periods to forecast the number of completed items for

get_cdf(data, binsize \\ {1.0, 0.5}, algorithm \\ :wilson, correction \\ 0)

Specs

get_cdf([number()], number() | {number(), number()}, algorithm(), integer()) ::
  {cdf(), bins :: [float()], numbins :: pos_integer(), sum :: float()}

Calculates the empirical CDF from a sample.

Convenience function that chains make_histogram/2 and empirical_cdf/3.

integrate(method, func, a, b, options \\ [])

Specs

integrate(
  method(),
  (float() -> float()),
  a :: float(),
  b :: float(),
  options :: Keyword.t()
) :: float()

Numerical integration providing Gauss and Romberg types.

intervals(options \\ [])

Specs

intervals(options :: Keyword.t()) :: Stream.t()

Returns a Stream that generates a stream of dates.

Examples

iex> intervals(end: ~D[2019-06-01]) |> Enum.take(4)
[~D[2019-06-01], ~D[2019-05-16], ~D[2019-05-01], ~D[2019-04-16]]

iex> intervals(end: ~D[2019-06-01], type: :weekly) |> Enum.take(4)
[~D[2019-06-01], ~D[2019-05-18], ~D[2019-05-04], ~D[2019-04-20]]

iex> intervals(end: ~D[2019-06-01], type: :weekly, weeks: 1) |> Enum.take(4)
[~D[2019-06-01], ~D[2019-05-25], ~D[2019-05-18], ~D[2019-05-11]]

iex> intervals(end: ~D[2019-06-01], type: :weekly, weeks: [3,2]) |> Enum.take(4)
[~D[2019-06-01], ~D[2019-05-11], ~D[2019-04-27], ~D[2019-04-13]]

jacobian(x, fun, options \\ [])

Calculates the jacobian of the function at the point x.

Examples

iex> jacobian([2.0,3.0], fn [x,y] -> x*y end) |> Enum.map(&Float.round(&1))
[3.0, 2.0]
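
For a scalar function of several variables, the result above is the list of partial derivatives. A central-difference approximation of it can be sketched as follows; the module name and step size are illustrative assumptions, and the library may use a different numerical scheme.

```elixir
defmodule JacobianSketch do
  # Gradient of a scalar function of a list of variables, estimated
  # with central differences. The step size `h` is an illustrative choice.
  def gradient(x, fun, h \\ 1.0e-5) do
    x
    |> Enum.with_index()
    |> Enum.map(fn {_xi, i} ->
      up = List.update_at(x, i, &(&1 + h))
      down = List.update_at(x, i, &(&1 - h))
      (fun.(up) - fun.(down)) / (2 * h)
    end)
  end
end

[2.0, 3.0]
|> JacobianSketch.gradient(fn [x, y] -> x * y end)
|> Enum.map(&Float.round(&1, 3))
|> IO.inspect()
# prints [3.0, 2.0]
```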

make_histogram(list, binsize \\ 1.0, offset \\ 0.0)

Specs

make_histogram([number()], number(), number()) :: [
  {non_neg_integer(), pos_integer()}
]

Converts a list of numbers to frequency data.

The data is divided into bins of size binsize and the number of data points inside each bin is counted.

The function returns a list of 2-tuples. Each tuple contains the index of the bin and the number of items in that bin. Bin indices start at 1 in the following way:

  • [0..1) has index 1 (includes 0 and excludes 1),
  • [1..2) has index 2,
  • etc.

When an offset is used, the bin starting at the offset, i.e. [offset..offset+binsize), gets index 1. Values less than the offset are gathered in a bin with index 0.

Examples

iex> make_histogram [1,2,3]
[{2, 1}, {3, 1}, {4, 1}]

iex> make_histogram [1,2,3], 1.0, 0
[{2, 1}, {3, 1}, {4, 1}]

iex> make_histogram [1,2,3,4,5,6,5,4,3,4,5,6,7,8,9]
[{2, 1}, {3, 1}, {4, 2}, {5, 3}, {6, 3}, {7, 2}, {8, 1}, {9, 1}, {10, 1}]

iex> make_histogram [1,2,3,4,5,6,5,4,3,4,5,6,7,8,9], 3, 1.5
[{0, 1}, {1, 6}, {2, 6}, {3, 2}]

iex> make_histogram [0,0,0,1,3,4,3,2,6,7],1
[{1,3},{2,1},{3,1},{4,2},{5,1},{7,1},{8,1}]

iex> make_histogram [0,0,0,1,3,4,3,2,6,7],1,0.5
[{0,3},{1,1},{2,1},{3,2},{4,1},{6,1},{7,1}]
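
The indexing rule described above can be sketched in a few lines. The code assumes the bin index is floor((x - offset) / binsize) + 1, clamped at 0 for values below the offset; the module name is hypothetical and this is not the library's source, though it reproduces the documented examples.

```elixir
defmodule HistogramSketch do
  # Assigns each value to a bin index and counts the values per bin.
  # Values below `offset` end up in bin 0.
  def histogram(list, binsize \\ 1.0, offset \\ 0.0) do
    list
    |> Enum.map(fn x -> max(0, floor((x - offset) / binsize) + 1) end)
    |> Enum.frequencies()
    |> Enum.sort()
  end
end

IO.inspect(HistogramSketch.histogram([1, 2, 3]))
# prints [{2, 1}, {3, 1}, {4, 1}]

IO.inspect(HistogramSketch.histogram([1, 2, 3, 4, 5, 6, 5, 4, 3, 4, 5, 6, 7, 8, 9], 3, 1.5))
# prints [{0, 1}, {1, 6}, {2, 6}, {3, 2}]
```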

Specs

map2weekdays(t :: number(), sat :: pos_integer()) :: number()

Maps the date to weekdays such that weekends are eliminated; it does so with respect to a given Saturday

Example

iex> map2weekdays(43568.123,43566)
43566.123

iex> map2weekdays(43574.123,43566)
43571.123

map2workhours(t, startofday, endofday)

Specs

map2workhours(t :: number(), startofday :: number(), endofday :: number()) ::
  number()

Maps the time of a day into the working hour period

Scales the resulting part of the day between 0..1.

Arguments

`t` - date and time of day as a float; the integer part specifies the day and the fractional part the hour of the day
`startofday` - start of the work day in hours
`endofday` - end of the working day in hours

Example

iex> map2workhours(43568.1, 8, 18)
43568.0

iex> map2workhours(43568.5, 8, 18)
43568.4
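
The examples are consistent with a linear rescaling of the hour of the day onto the working period. A minimal sketch of that mapping is given below; clamping times after the end of the work day to 1.0 is an assumption, as is the module name.

```elixir
defmodule WorkHoursSketch do
  # Splits `t` into a day number and a fraction of the day, converts the
  # fraction to hours, and rescales [startofday, endofday] onto [0, 1].
  # Times outside the working period are clamped (an assumption).
  def to_workhours(t, startofday, endofday) do
    day = trunc(t)
    hour = (t - day) * 24
    fraction = (hour - startofday) / (endofday - startofday)
    day + min(1.0, max(0.0, fraction))
  end
end

IO.inspect(WorkHoursSketch.to_workhours(43568.1, 8, 18))
# prints 43568.0 (02:24 is before the start of the work day)

IO.inspect(Float.round(WorkHoursSketch.to_workhours(43568.5, 8, 18), 3))
# prints 43568.4 (12:00 is 4 hours into a 10 hour work day)
```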

mc(iterations, fun, options \\ [])

Specs

mc(
  iterations :: pos_integer(),
  fun :: (pos_integer() -> float()),
  options :: Keyword.t()
) ::
  {avg :: float(), sd :: float(), tries :: [float()]}
  | {avg :: float(), sd :: float()}

Basic Monte Carlo simulation to repeatedly run a simulation multiple times.

Options

`:collect_all?` - If true, collects data from each individual simulation run and returns this as the third element of the result tuple

Specs

moment(sample :: [number()], n :: pos_integer()) :: float()

Calculates the nth moment of the sample.

Example

iex> moment [1,2,3,4,5,6], 1
3.5

Specs

momentc(sample :: [number()], n :: pos_integer()) :: float()

Calculates the nth centralized moment of the sample.

Example

iex> momentc [1,2,3,4,5,6], 1
0.0

iex> momentc [1,2,3,4,5,6], 2
2.9166666666666665

Specs

momentc(sample :: [number()], n :: pos_integer(), mu :: float()) :: float()

Calculates the nth centralized moment of the sample.

Example

iex> momentc [1,2,3,4,5,6], 2, 3.5
2.9166666666666665

Specs

momentn(sample :: [number()], n :: pos_integer()) :: float()

Calculates the nth normalized moment of the sample.

Example

iex> momentn [1,2,3,4,5,6], 1
0.0

iex> momentn [1,2,3,4,5,6], 2
1.0

iex> momentn [1,2,3,4,5,6], 4
1.7314285714285718

Specs

momentn(sample :: [number()], n :: pos_integer(), mu :: float()) :: float()

Calculates the nth normalized moment of the sample.

Example

iex> momentn [1,2,3,4,5,6], 4, 3.5
1.7314285714285718

momentn(sample, n, mu, sigma)

Specs

momentn(
  sample :: [number()],
  n :: pos_integer(),
  mu :: float(),
  sigma :: float()
) :: float()

Calculates the nth normalized moment of the sample.
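
The three kinds of moments relate to each other as sketched below. The definitions assume population moments (division by the sample size) and normalization by the n-th power of the standard deviation, which agrees with the example values above; the module and function names are hypothetical.

```elixir
defmodule MomentSketch do
  # n-th raw moment: the mean of x^n.
  def moment(sample, n), do: mean(Enum.map(sample, &:math.pow(&1, n)))

  # n-th central moment: the mean of (x - mu)^n, with mu the sample mean.
  def central(sample, n) do
    mu = moment(sample, 1)
    mean(Enum.map(sample, &:math.pow(&1 - mu, n)))
  end

  # n-th normalized moment: the central moment divided by sigma^n.
  def normalized(sample, n) do
    sigma = :math.sqrt(central(sample, 2))
    central(sample, n) / :math.pow(sigma, n)
  end

  defp mean(list), do: Enum.sum(list) / length(list)
end

sample = [1, 2, 3, 4, 5, 6]
IO.inspect(MomentSketch.moment(sample, 1))      # the sample mean, 3.5
IO.inspect(MomentSketch.central(sample, 2))     # the population variance
IO.inspect(MomentSketch.normalized(sample, 4))  # the (non-excess) kurtosis
```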

newton(a, b, func, maxiter \\ 10, options)

Specs

newton(
  a :: float(),
  b :: float(),
  func :: (x :: float() -> float()),
  maxiter :: non_neg_integer(),
  options :: Keyword.t()
) :: {float(), {float(), float()}, {float(), float()}}

Newton-Fourier method for locating roots and returning the interval where the root is located.

See [https://en.wikipedia.org/wiki/Newton's_method#Newton.E2.80.93Fourier_method]

puiseaux(list, result \\ [], flag \\ false)

Specs

puiseaux([number()], [number()], boolean()) :: [number()]

Converts the input so that the result is a Puiseux diagram, that is, a strictly convex shape.

Examples

iex> puiseaux [1]
[1]

iex> puiseaux [5,3,3,2]
[5, 3, 2.5, 2]

puts_errors(device \\ :stdio, errors)

Specs

puts_errors(device :: IO.device(), errors :: tuple()) :: none()

Outputs and formats the errors that result from a call to Chi2fit.Fit.chi2/4

Errors are tuples of length 2 and larger: {[min1,max1], [min2,max2], ...}.

Specs

read_data(filename :: String.t()) :: Stream.t()

Reads data from a file specified by filename and returns a stream with the data parsed as floats.

It expects one data point per line and removes entries that:

  • are not floats, and
  • are smaller than zero (0)

Specs

resample(data :: [number()], options :: Keyword.t()) :: [number()]

Resamples the subsequences of numbers contained in the list as determined by analyze/2

richardson(func, init, factor, results \\ [], options)

Specs

richardson(
  func :: (term() -> {float(), term()}),
  init :: term(),
  factor :: float(),
  results :: [float()],
  options :: Keyword.t()
) :: float()

Richardson extrapolation.

subexponential_stat(data, test \\ :sum, n \\ 2, binsize \\ {1, 0})

Calculates the test statistic for subexponentiality of a sample.

A value close to 0 is a strong indication that the sample shows subexponential behaviour (extremistan), i.e. is fat-tailed.

Specs

subsequences(Enumerable.t()) :: Enumerable.t()

Examples

iex> subsequences []
[]

iex> subsequences [:a, :b]
[[:a], [:a, :b]]

iex> Stream.cycle([1,2,3]) |> subsequences |> Enum.take(4)
[[1], [1, 2], [1, 2, 3], [1, 2, 3, 1]]
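
Judging from the examples, the function yields the successive prefixes of its input and works lazily on infinite streams. Behaviour of that kind can be sketched with `Stream.scan/3`; the module name is hypothetical and this is not necessarily how the library implements it.

```elixir
defmodule SubsequencesSketch do
  # Lazily emits every non-empty prefix of the input, shortest first.
  def prefixes(enumerable) do
    Stream.scan(enumerable, [], fn item, acc -> acc ++ [item] end)
  end
end

[:a, :b] |> SubsequencesSketch.prefixes() |> Enum.to_list() |> IO.inspect()
# prints [[:a], [:a, :b]]

Stream.cycle([1, 2, 3]) |> SubsequencesSketch.prefixes() |> Enum.take(4) |> IO.inspect()
# prints [[1], [1, 2], [1, 2, 3], [1, 2, 3, 1]]
```

Unlike the documented function, the sketch always returns a stream, so the result has to be forced with `Enum.to_list/1` or `Enum.take/2`.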

throughput(intervals, datelist)

Specs

throughput(intervals :: Enumerable.t(), datelist :: [NaiveDateTime.t()]) :: [
  number()
]

Counts the number of dates (datelist) that fall between consecutive dates in intervals and returns the result as a list of numbers.

time_diff(data, options)

Specs

time_diff(data :: Enumerable.t(), options :: Keyword.t()) :: Enumerable.t()

Returns a list of time differences (assumes an ordered list as input).

Options

`cutoff` - time differences below the cutoff are changed to the cutoff value (defaults to `0.01`)
`drop?` - whether to drop time differences below the cutoff (defaults to `false`)

to_bins(data, binsize \\ {1.0, 0.5})

Specs

to_bins(data :: [number()], binsize :: {number(), number()}) :: ecdf()

Converts raw data to binned data with (asymmetrical) errors.

total_mc(result, fun, mode \\ :use_bounds, iterations \\ 1000)

Specs

unzip(list :: [tuple()]) :: tuple()

Unzips lists of 1-, 2-, 3-, 4-, 5-, 6-, 7-, and 8-tuples.