Chi2fit.Cli (Chi-SquaredFit v2.0.2)

Provides a command line interface for fitting data against a known cumulative distribution function.

Tool for fitting particular probability distributions to empirical cumulative distribution functions. Distributions supported are Weibull, Wald (Inverse Gauss), Normal, Exponential, Erlang, and Skewed Exponential.

It uses Pearson's chi-squared statistic as the likelihood function for fitting. This statistic applies to empirical data that is categorical in nature.

It provides various options for controlling the fitting procedure and assignment of errors. It supports asymmetrical errors in fitting the data.
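
To make the role of asymmetrical errors concrete, here is a minimal Python sketch of a chi-squared sum over points of an empirical cumulative distribution. It is an illustration only, not the tool's implementation: the function names, the sample points, and the rule for choosing between the lower and the upper error are all assumptions.

```python
import math

def weibull_cdf(x, shape, scale):
    """Weibull cumulative distribution function; zero for x <= 0."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-((x / scale) ** shape))

def chi_squared(points, cdf):
    """Sum of squared residuals, each divided by an error estimate.

    points: list of (x, y, err_low, err_high), with y the empirical CDF at x.
    Assumption: the upper error is used when the model lies above the
    observation, the lower error otherwise.
    """
    total = 0.0
    for x, y, err_low, err_high in points:
        residual = cdf(x) - y
        sigma = err_high if residual > 0 else err_low
        total += (residual / sigma) ** 2
    return total

# Hypothetical data points, for illustration only.
points = [(1.0, 0.30, 0.05, 0.10), (2.0, 0.60, 0.05, 0.10), (4.0, 0.90, 0.05, 0.05)]
chi2 = chi_squared(points, lambda x: weibull_cdf(x, 1.0, 2.0))
print(chi2)
```

With symmetrical errors (err_low equal to err_high) this reduces to the usual sum of squared, error-normalized residuals.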

Basic usage: scanning the surface

As described above, fitting the parameters is done by minimizing the chi-squared statistic. Usually this is a function of the distribution parameters.

Scanning the surface is a simple way to obtain an initial guess of the parameters. The following command does a simple scan of the chi-squared surface against data:

$ chi2fit data.csv --ranges '[{0.8,1.2},{0.6,1.2}]' --cdf weibull

Initial guess:
    chi2:		1399.3190035059733
    pars:		[0.800467783803376, 29.98940654419653]
    errors:		{[0.800467783803376, 0.800467783803376], [29.98940654419653, 29.98940654419653]}

Here the file data.csv is formatted as:

Lead Time
26
0
105
69
3
36
...

In this form the command will scan or probe the Chi-squared surface for the parameters within the provided range. It returns the found minimum Chi-squared and the parameter values at this minimum. The reported error ranges correspond to a change of Chi-squared of +1.
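
The idea of probing can be sketched in a few lines of Python. This is a hedged illustration, not the tool's algorithm: it assumes uniform random probes inside the given ranges, uses a made-up toy surface, and takes the error range of each parameter as the interval spanned by all probes whose chi-squared lies within +1 of the minimum found. The names scan_surface and toy_chi2 are hypothetical.

```python
import random

def scan_surface(chi2, ranges, probes=10_000, seed=0):
    """Probe chi2 at random points inside ranges and keep the minimum."""
    rng = random.Random(seed)
    samples = []
    for _ in range(probes):
        pars = [rng.uniform(low, high) for low, high in ranges]
        samples.append((chi2(pars), pars))
    best_chi2, best_pars = min(samples, key=lambda sample: sample[0])
    # Interval per parameter covered by probes within +1 of the minimum.
    near = [pars for value, pars in samples if value <= best_chi2 + 1.0]
    errors = [
        (min(p[i] for p in near), max(p[i] for p in near))
        for i in range(len(ranges))
    ]
    return best_chi2, best_pars, errors

def toy_chi2(pars):
    """Toy surface (an assumption) with its minimum at (1.0, 0.9)."""
    a, b = pars
    return ((a - 1.0) / 0.05) ** 2 + ((b - 0.9) / 0.1) ** 2

best_chi2, best_pars, errors = scan_surface(toy_chi2, [(0.8, 1.2), (0.6, 1.2)])
print(best_chi2, best_pars, errors)
```

More probes give a denser sampling of the surface and therefore a better initial guess, at the cost of more function evaluations.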

Options available:

  • probes - The number of probes to use for guessing parameter values at initialization
  • progress - Shows progress during 'probing' (shows progress every 1000 probes)
    • c - Mark progress every 100th probe
    • x - Mark progress every 10th probe

More options are described below and are available using the option --help.

Input data options

Several options control how the input data is interpreted. These are:

  • model - determines how errors are assigned to the data points. Possible values include simple|asimple|linear
  • data - instead of using the file for data, use this option to pass a list of data points
  • correction - Estimate of number of events missed in the right tail of the sample

An example of specifying data on the command line is:

$ chi2fit --ranges '[{0.8,1.2},{0.6,1.2}]' --cdf weibull --data '[2,3,4,5,5,4,4,7]'
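
As an illustration of what fitting against an empirical cumulative distribution means for such a list, the following Python sketch turns the data points into cumulative fractions. It is not the tool's code: the function name is hypothetical, and treating the correction as extra events added to the sample size is an assumption made for this example.

```python
from collections import Counter

def empirical_cdf(data, correction=0):
    """Return (value, cumulative fraction) pairs for the sorted distinct values.

    Assumption: correction counts events missed in the right tail and is
    added to the sample size, so the last fraction stays below 1.
    """
    total = len(data) + correction
    counts = Counter(data)
    cumulative = 0
    result = []
    for value in sorted(counts):
        cumulative += counts[value]
        result.append((value, cumulative / total))
    return result

ecdf = empirical_cdf([2, 3, 4, 5, 5, 4, 4, 7])
print(ecdf)
# [(2, 0.125), (3, 0.25), (4, 0.625), (5, 0.875), (7, 1.0)]
```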

Distribution options

Distributions supported are: Wald, Weibull, Normal, Erlang, Exponential, and SEP (Skewed Exponential: 3 and 4 parameters).

For the SEP (4 parameters) and SEP0 (3 parameters) distributions, the following option exists:

  • method - Supported values are 'gauss|gauss2|gauss3|romberg|romberg2|romberg3'

Romberg integration supports the options:

  • tolerance - The target precision for Romberg integration
  • itermax - The maximum number of iterations to use in Romberg integration

Gauss integration supports the option:

  • npoints - The number of points to use in Gauss integration (4, 8, 16, and 32)
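
For readers unfamiliar with Romberg integration, the following Python sketch shows how a target precision and a maximum number of iterations typically enter the method. It is a textbook version given under assumptions, not the tool's implementation; the parameter names only mirror the options tolerance and itermax.

```python
import math

def romberg(f, a, b, tolerance=1e-8, itermax=20):
    """Integrate f over [a, b] with Romberg's method."""
    h = b - a
    previous = [0.5 * h * (f(a) + f(b))]  # trapezoid rule on one interval
    for k in range(1, itermax + 1):
        h /= 2.0
        # Refine the trapezoid rule by evaluating only the new midpoints.
        midpoints = sum(f(a + (2 * i - 1) * h) for i in range(1, 2 ** (k - 1) + 1))
        current = [0.5 * previous[0] + h * midpoints]
        # Richardson extrapolation removes successive even powers of h.
        for j in range(1, k + 1):
            factor = 4 ** j
            current.append((factor * current[j - 1] - previous[j - 1]) / (factor - 1))
        if abs(current[-1] - previous[-1]) < tolerance:
            return current[-1]
        previous = current
    return previous[-1]  # best estimate if the tolerance was not reached

area = romberg(lambda x: math.exp(-x), 0.0, 1.0)
print(area)
```

The iteration stops as soon as two successive extrapolated estimates differ by less than the tolerance, or when itermax refinements have been performed.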

Fitting options

After probing the surface for an initial guess of the parameters, a fine-grained search for the optimum can be done by enabling the fit procedure. The algorithm implemented assumes that the initial guess is close enough to the minimum and uses a combination of parameter estimation and Monte Carlo methods.

An additional strategy is a so-called grid search, which changes only one parameter at a time and selects the parameters in a round-robin fashion. Using Romberg iteration and Newton's root-finding algorithm, the parameter value that minimizes chi-squared is determined while keeping the other parameters constant. Then the next parameter is varied in the same way. Fitting distributions with 3 or more parameters in particular may benefit from this strategy.

Options controlling these are:

  • fit - Enables the fine-grained fitting of parameters
  • iterations - Number of iterations to use in optimizing the likelihood function
  • grid - Uses a grid search to fit one parameter at a time in a round robin fashion
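
The round-robin idea behind the grid search can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the tool's implementation: plain central differences stand in for Romberg differentiation, the toy surface is made up, and all names are hypothetical.

```python
def newton_minimize_1d(f, x, h=1e-4, steps=20, tol=1e-10):
    """Newton iteration on the derivative of f to locate a local minimum."""
    for _ in range(steps):
        d1 = (f(x + h) - f(x - h)) / (2 * h)
        d2 = (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)
        if d2 <= 0:  # not locally convex: stop rather than step uphill
            break
        step = d1 / d2
        x -= step
        if abs(step) < tol:
            break
    return x

def round_robin_fit(chi2, pars, sweeps=50):
    """Minimize chi2 one parameter at a time, cycling through all of them."""
    pars = list(pars)
    for _ in range(sweeps):
        for i in range(len(pars)):
            def along_axis(value, i=i):
                # Vary parameter i only; the others keep their current values.
                return chi2(pars[:i] + [value] + pars[i + 1:])
            pars[i] = newton_minimize_1d(along_axis, pars[i])
    return pars

def toy_chi2(pars):
    """Toy surface (an assumption) with correlated parameters, minimum at (1, 2)."""
    a, b = pars
    return (a - 1.0) ** 2 + (b - 2.0) ** 2 + 0.5 * (a - 1.0) * (b - 2.0)

fitted = round_robin_fit(toy_chi2, [0.0, 0.0])
print(fitted)
```

Because each step changes a single parameter, correlated parameters need several sweeps before the estimate settles.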

Sometimes the chi-squared surface is not smooth, which makes the fit numerically unstable. In this case smoothing the surface may help. The next option enables this feature:

  • smoothing - Smoothing of the likelihood function with a Gaussian kernel

The fitting procedure uses derivatives (first and second order) to estimate changes in the parameters that will result in a better fit. Derivatives are calculated using Romberg differentiation. The accuracy and maximum number of iterations are controlled by the options:

  • tolerance - The target precision for Romberg differentiation
  • itermax - The maximum number of iterations to use in Romberg differentiation
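
A common form of Romberg differentiation applies Richardson extrapolation to central differences with a halved step size. The Python sketch below shows a first derivative computed this way; it is a textbook version under assumptions, not the tool's code, and the initial step size h is a hypothetical parameter.

```python
import math

def romberg_derivative(f, x, h=0.1, tolerance=1e-10, itermax=10):
    """First derivative of f at x by extrapolating central differences."""
    previous = [(f(x + h) - f(x - h)) / (2 * h)]
    for k in range(1, itermax + 1):
        h /= 2.0
        current = [(f(x + h) - f(x - h)) / (2 * h)]
        # The central-difference error has only even powers of h,
        # so each extrapolation column uses a factor of 4**j.
        for j in range(1, k + 1):
            factor = 4 ** j
            current.append((factor * current[j - 1] - previous[j - 1]) / (factor - 1))
        if abs(current[-1] - previous[-1]) < tolerance:
            return current[-1]
        previous = current
    return previous[-1]  # best estimate if the tolerance was not reached

slope = romberg_derivative(math.sin, 0.5)
print(slope, math.cos(0.5))
```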

Bootstrapping

Bootstrapping can be enabled to estimate the errors in the parameters. The supported options are:

  • bootstrap - Enables bootstrapping and specifies the number of iterations to perform
  • sample - The sample size to use from the empirical distribution
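
The principle of bootstrapping can be sketched in Python: resample the data with replacement, refit each resample, and take the spread of the fitted values as the error estimate. This is a generic illustration, not the tool's procedure. The fit function below is a stand-in that estimates the mean of an exponential distribution, and all names are hypothetical.

```python
import random
import statistics

def bootstrap_errors(data, fit, iterations=200, sample_size=None, seed=0):
    """Return (mean, standard deviation) per parameter over the resamples."""
    rng = random.Random(seed)
    size = sample_size or len(data)
    # Each iteration draws a resample with replacement and fits it.
    estimates = [fit(rng.choices(data, k=size)) for _ in range(iterations)]
    columns = list(zip(*estimates))  # one column per parameter
    return [(statistics.mean(col), statistics.stdev(col)) for col in columns]

def fit_exponential_mean(sample):
    """Stand-in fit (an assumption): the sample mean as the only parameter."""
    return [statistics.mean(sample)]

lead_times = [26, 0, 105, 69, 3, 36]
result = bootstrap_errors(lead_times, fit_exponential_mean)
print(result)
```

In this sketch, iterations plays the role of the bootstrap option and sample_size that of the sample option.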

Output options

These options are useful for printing data for generating charts of the data:

  • print - Outputs the empirical input data with errors included
  • output - Outputs the fitted distribution function values at the data points
  • surface - Outputs the Chi-squared surface to a file
  • smoothing - Smoothing of the likelihood function with a Gaussian kernel

General options

Options available for scanning, fitting, and bootstrapping:

  • debug - Outputs additional data for debugging purposes

