Chi2fit.Cli (Chi-SquaredFit v2.0.2)
Provides a command line interface for fitting data against a known cumulative distribution function.
Tool for fitting particular probability distributions to empirical cumulative distribution functions. Distributions supported are Weibull, Wald (Inverse Gauss), Normal, Exponential, Erlang, and Skewed Exponential.
It uses Pearson's chi-squared statistic as the likelihood function for fitting. This statistic applies to empirical data that is categorical in nature.
It provides various options for controlling the fitting procedure and assignment of errors. It supports asymmetrical errors in fitting the data.
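As a rough illustration of the idea, the sketch below computes an error-weighted chi-squared between a few empirical CDF points and a Weibull CDF. It is not the tool's implementation: the function names, the sample points, and the symmetric error of 0.05 per point are assumptions made for this example only.

```python
import math

def weibull_cdf(x, shape, scale):
    """CDF of the Weibull distribution: 1 - exp(-(x/scale)^shape)."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-((x / scale) ** shape))

def chi_squared(points, cdf):
    """Sum of squared, error-weighted residuals.

    points: list of (x, y, sigma), with y the empirical CDF value at x
    and sigma the (assumed symmetric) error assigned to y.
    """
    return sum(((y - cdf(x)) / sigma) ** 2 for x, y, sigma in points)

# Hypothetical empirical CDF points, each with an assumed error of 0.05.
points = [(1.0, 0.10, 0.05), (5.0, 0.40, 0.05), (20.0, 0.85, 0.05)]
chi2 = chi_squared(points, lambda x: weibull_cdf(x, 0.8, 10.0))
print(chi2)
```

A smaller value means the candidate parameters describe the empirical points better, which is what the scanning and fitting steps try to achieve.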
Basic usage: scanning the surface
As described above, the parameters are fitted by minimizing the chi-squared statistic. Usually this is a function of the distribution parameters.
Scanning the surface is a simple way to obtain an initial guess of the parameters. The following command does a simple scan of the chi-squared surface against data:
$ chi2fit data.csv --ranges '[{0.8,1.2},{0.6,1.2}]' --cdf weibull
Initial guess:
chi2: 1399.3190035059733
pars: [0.800467783803376, 29.98940654419653]
errors: {[0.800467783803376, 0.800467783803376], [29.98940654419653, 29.98940654419653]}
The file data.csv is formatted as:
Lead Time
26
0
105
69
3
36
...
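For readers who want to inspect such a file outside the tool, here is a minimal sketch of reading a one-column file with a header line and turning it into an empirical CDF. The helper names are hypothetical, only the six values shown above are used, and the way the tool itself builds its empirical CDF may differ.

```python
import io

def read_lead_times(stream):
    """Read one value per line, skipping the header line."""
    lines = [line.strip() for line in stream if line.strip()]
    return [float(v) for v in lines[1:]]

def empirical_cdf(values):
    """Return sorted (x, fraction of values <= x) pairs."""
    n = len(values)
    cdf = {}
    # For repeated x values the later (larger) count overwrites the earlier one.
    for i, x in enumerate(sorted(values), start=1):
        cdf[x] = i / n
    return sorted(cdf.items())

# In-memory stand-in for data.csv, limited to the values listed above.
sample = io.StringIO("Lead Time\n26\n0\n105\n69\n3\n36\n")
values = read_lead_times(sample)
print(empirical_cdf(values))
```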
In this form the command will scan or probe the Chi-squared surface for the parameters within the provided range. It returns the found minimum Chi-squared and the parameter values at this minimum. The reported error ranges correspond to a change of Chi-squared of +1.
Options available:
probes - The number of probes to use for guessing parameter values at initialization
progress - Shows progress during 'probing' (shows progress every 1000 probes)
c - Mark progress every 100th probe
x - Mark progress every 10th probe
More options are described below and are available using the option --help.
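The probing idea can be sketched as follows: evaluate the surface at a number of points inside the given ranges and keep the best one. Uniform random sampling, the function name `probe_surface`, and the toy surface are assumptions for illustration; how the tool actually places its probes is not specified here.

```python
import random

def probe_surface(chi2, ranges, probes, seed=0):
    """Evaluate chi2 at `probes` random points inside `ranges`; return the best."""
    rng = random.Random(seed)
    best_value, best_pars = float("inf"), None
    for _ in range(probes):
        pars = [rng.uniform(lo, hi) for lo, hi in ranges]
        value = chi2(pars)
        if value < best_value:
            best_value, best_pars = value, pars
    return best_value, best_pars

# Toy surface with a known minimum at (1.0, 0.9); it stands in for a real chi-squared.
toy = lambda p: (p[0] - 1.0) ** 2 + (p[1] - 0.9) ** 2
best_value, best_pars = probe_surface(toy, [(0.8, 1.2), (0.6, 1.2)], probes=1000)
print(best_value, best_pars)
```

More probes give a better starting point at the cost of more evaluations of the surface.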
Input data options
Several options control how the input data is interpreted. These are:
model - Determines how errors are assigned to the data points. Possible values include simple|asimple|linear
data - Instead of using the file for data, use this option to pass a list of data points
correction - Estimate of the number of events missed in the right tail of the sample
An example of specifying data on the command line is:
$ chi2fit --ranges '[{0.8,1.2},{0.6,1.2}]' --cdf weibull --data '[2,3,4,5,5,4,4,7]'
Distribution options
Distributions supported are: Wald, Weibull, Normal, Erlang, Exponential, and SEP (Skewed Exponential: 3 and 4 parameters).
For the SEP (4 parameters) and SEP0 (3 parameters) distributions, the following options exist:
method - Supported values are 'gauss|gauss2|gauss3|romberg|romberg2|romberg3'
Romberg integration supports the options:
tolerance - The target precision for Romberg integration
itermax - The maximum number of iterations to use in Romberg integration
Gauss integration supports the option:
npoints - The number of points to use in Gauss integration (4, 8, 16, and 32)
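To show what a target precision and an iteration limit mean for Romberg integration, here is a textbook-style sketch. The function `romberg` and its defaults are assumptions for this example and are not taken from the tool; only the parameter names mirror the options listed above.

```python
import math

def romberg(f, a, b, tolerance=1e-8, itermax=20):
    """Romberg integration of f over [a, b].

    Stops when two successive diagonal estimates differ by less than
    `tolerance`, or after `itermax` refinement rows.
    """
    h = b - a
    prev = [0.5 * h * (f(a) + f(b))]  # trapezoid rule with one interval
    for i in range(1, itermax + 1):
        h /= 2.0
        # Refine the trapezoid estimate using only the new midpoints.
        mids = sum(f(a + (2 * k - 1) * h) for k in range(1, 2 ** (i - 1) + 1))
        row = [0.5 * prev[0] + h * mids]
        # Richardson extrapolation along the row.
        for j in range(1, i + 1):
            row.append(row[j - 1] + (row[j - 1] - prev[j - 1]) / (4 ** j - 1))
        if abs(row[-1] - prev[-1]) < tolerance:
            return row[-1]
        prev = row
    return prev[-1]

print(romberg(math.sin, 0.0, math.pi))
```

A tighter tolerance generally needs more rows, so the iteration limit acts as a safety cap when the integrand converges slowly.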
Fitting options
After probing the surface for an initial guess of the parameters, a fine-grained search for the optimum can be done by enabling the fit procedure. The algorithm implemented assumes that the initial guess is close enough to the minimum and uses a combination of parameter estimation and Monte Carlo methods.
An additional strategy is a so-called grid search, which changes only one parameter at a time. It selects the parameters in a round-robin fashion. Using Romberg iteration and Newton's root-finding algorithm, the parameter value minimizing chi-squared is determined while keeping the other parameters constant. Then the other parameters are varied in turn. Fitting distributions with 3 or more parameters in particular may benefit from this strategy.
Options controlling these are:
fit - Enables the fine-grained fitting of parameters
iterations - Number of iterations to use in optimizing the likelihood function
grid - Uses a grid search to fit one parameter at a time in a round-robin fashion
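The round-robin strategy can be sketched as below. This is a simplified stand-in, not the tool's algorithm: it takes a single Newton step per parameter using plain central finite differences instead of Romberg iteration, and the function name, step size, and toy surface are assumptions.

```python
def grid_search(chi2, pars, sweeps=50, h=1e-4):
    """Minimize chi2 by updating one parameter at a time, round robin.

    Each update is one Newton step on the first derivative, with both
    derivatives estimated by central finite differences.
    """
    pars = list(pars)
    for _ in range(sweeps):
        for i in range(len(pars)):
            def along(value):
                # chi2 as a function of parameter i, all others held fixed
                trial = pars[:i] + [value] + pars[i + 1:]
                return chi2(trial)
            x = pars[i]
            first = (along(x + h) - along(x - h)) / (2 * h)
            second = (along(x + h) - 2 * along(x) + along(x - h)) / (h * h)
            if second > 0:  # only step where the local curvature is convex
                pars[i] = x - first / second
    return pars

# Toy quadratic surface with correlated parameters; its minimum is at (1, 2).
toy = lambda p: (p[0] - 1) ** 2 + (p[1] - 2) ** 2 + 0.5 * (p[0] - 1) * (p[1] - 2)
result = grid_search(toy, [0.0, 0.0])
print(result)
```

Because the parameters are correlated in the toy surface, a single pass is not enough; repeated sweeps are what bring all parameters to the joint minimum.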
Sometimes the chi-squared surface is not smooth, which makes the fit numerically unstable. In this case smoothing the surface may help. The next option enables this feature:
smoothing - Smoothing of the likelihood function with a Gaussian kernel
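As a one-dimensional illustration of Gaussian-kernel smoothing, the sketch below averages each sample with its neighbours using normalized Gaussian weights. The function name, kernel width, truncation radius, and the noisy samples are all assumptions; how the tool applies its kernel in parameter space is not described here.

```python
import math

def gaussian_smooth(values, sigma=1.0, radius=3):
    """Smooth a 1D list of samples with a normalized, truncated Gaussian kernel."""
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    smoothed = []
    for i in range(len(values)):
        total, norm = 0.0, 0.0
        for k, w in zip(offsets, weights):
            j = i + k
            if 0 <= j < len(values):  # renormalize near the edges
                total += w * values[j]
                norm += w
        smoothed.append(total / norm)
    return smoothed

# Hypothetical noisy samples of a surface along one parameter.
noisy = [4.0, 1.0, 3.9, 0.2, 1.1, 0.0, 1.3, 0.1, 4.2, 1.0]
print(gaussian_smooth(noisy))
```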
The fitting procedure uses derivatives (first and second order) to estimate changes in the parameters that will result in a better fit. Derivatives are calculated using Romberg differentiation. The accuracy and maximum number of iterations are controlled by the options:
tolerance - The target precision for Romberg integration
itermax - The maximum number of iterations to use in Romberg integration
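A common way to realize Romberg-style differentiation is Richardson extrapolation of central differences with a halving step size. The sketch below follows that textbook scheme; the function name, starting step, and defaults are assumptions and may not match the tool's internals.

```python
import math

def romberg_derivative(f, x, h=0.5, tolerance=1e-10, itermax=10):
    """First derivative of f at x via Richardson-extrapolated central differences."""
    prev = [(f(x + h) - f(x - h)) / (2 * h)]
    for i in range(1, itermax + 1):
        h /= 2.0
        row = [(f(x + h) - f(x - h)) / (2 * h)]
        # Each extrapolation step removes the next even power of h from the error.
        for j in range(1, i + 1):
            row.append(row[j - 1] + (row[j - 1] - prev[j - 1]) / (4 ** j - 1))
        if abs(row[-1] - prev[-1]) < tolerance:
            return row[-1]
        prev = row
    return prev[-1]

print(romberg_derivative(math.sin, 1.0), math.cos(1.0))
```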
Bootstrapping
Bootstrapping can be enabled to estimate the errors in the parameters. The supported options are:
bootstrap - Enables bootstrapping. Specifies the number of iterations to perform
sample - The sample size to use from the empirical distribution
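The general bootstrap idea is to resample the data with replacement, repeat the estimate on each resample, and read the error from the spread of the results. In the sketch below the estimator is simply the sample mean, standing in for a full distribution fit; the function name and defaults are assumptions.

```python
import random
import statistics

def bootstrap_errors(data, estimator, iterations=200, sample=None, seed=0):
    """Estimate the spread of `estimator` by resampling `data` with replacement."""
    rng = random.Random(seed)
    size = sample if sample is not None else len(data)
    estimates = []
    for _ in range(iterations):
        resampled = [rng.choice(data) for _ in range(size)]
        estimates.append(estimator(resampled))
    return statistics.mean(estimates), statistics.stdev(estimates)

# Data points from the --data example; the mean is a stand-in for a fitted parameter.
data = [2, 3, 4, 5, 5, 4, 4, 7]
mean_estimate, error = bootstrap_errors(data, statistics.mean)
print(mean_estimate, error)
```

Here `iterations` and `sample` play the roles described for the bootstrap and sample options: more iterations stabilize the error estimate, while the sample size sets how many points each resample draws.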
Output options
These options print data that can be used to generate charts:
print - Outputs the empirical input data with errors included
output - Outputs the fitted distribution function values at the data points
surface - Outputs the Chi-squared surface to a file
smoothing - Smoothing of the likelihood function with a Gaussian kernel
General options
Options available for scanning, fitting, and bootstrapping:
debug - Outputs additional data for debugging purposes