Numy v0.1.5 Numy.Fit.SimpleLinear
Simple Linear Regression.
For the model y = α + βx, one approach to estimating the unknowns α and β is to minimize the sum of squared residuals (SSR):
SSR = ∑rᵢ² = ∑(yᵢ - α - βxᵢ)²
Among all possible α and β, the following values minimize the SSR:
- β = cov(x,y) / var(x)
- α = ȳ - βx̄
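The two formulas above follow from setting the partial derivatives of the SSR to zero. The derivation below is a standard sketch, included only to show where the estimates come from; it assumes var(x) ≠ 0, i.e. the xᵢ are not all equal.

```latex
\begin{aligned}
\frac{\partial\,\mathrm{SSR}}{\partial \alpha}
  &= -2\sum_i \left(y_i - \alpha - \beta x_i\right) = 0
  &&\Longrightarrow\quad \alpha = \bar{y} - \beta\bar{x}, \\
\frac{\partial\,\mathrm{SSR}}{\partial \beta}
  &= -2\sum_i x_i\left(y_i - \alpha - \beta x_i\right) = 0
  &&\Longrightarrow\quad
  \beta = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}
        = \frac{\operatorname{cov}(x, y)}{\operatorname{var}(x)}.
\end{aligned}
```

The second line uses the first result to substitute α, together with the identities ∑(xᵢ - x̄) = 0 and ∑(yᵢ - ȳ) = 0.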
Example
iex> # Vc and Vcm are assumed to be aliases for Numy.Vc and Numy.Vcm, defined earlier in the session
iex> x = Numy.Lapack.Vector.new(0..9)
#Vector<size=10, [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]>
iex> y = Vc.scale(x,2) |> Vcm.offset!(-3.0) # ideal line with slope=2 and intercept=-3
#Vector<size=10, [-3.0, -1.0, 1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0]>
iex> err = Numy.Lapack.Vector.new(10) |> Vc.assign_random |> Vcm.offset!(-0.5) |> Vcm.scale!(0.1) # small random errors
iex> Vcm.add!(y,err) # add the errors to the ideal line
iex> line = Numy.Fit.SimpleLinear.fit(x,y)
{-2.9939933270609496, 1.9966330251198818} # {intercept, slope}: approximately -3 and 2, as expected
iex> Numy.Fit.SimpleLinear.predict(0,line)
-2.9939933270609496
iex> Numy.Fit.SimpleLinear.predict(10,line)
16.97233692413787
iex> predicted = Numy.Lapack.Vector.new(Numy.Fit.SimpleLinear.predict(Enum.to_list(0..9),line))
iex> Numy.Fit.SimpleLinear.pearson_correlation(predicted, y)
0.9999884096405113
Functions
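In statistics, covariance is a measure of the joint variability of two random variables.
It is computed with a numerically stable two-pass algorithm: the first pass computes the means, and the second pass sums the products of the deviations from those means.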
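A two-pass covariance can be sketched in plain Elixir as follows. This is a minimal illustration, not Numy's implementation: the module name CovarianceSketch is hypothetical, it works on ordinary lists rather than Numy vectors, and it assumes the population normalization (division by n) used for the variance on this page.

```elixir
defmodule CovarianceSketch do
  # Hypothetical helper for illustration only; not part of Numy.
  # Population covariance of two equal-length, non-empty lists of numbers.
  def covariance(xs, ys) when length(xs) == length(ys) and xs != [] do
    n = length(xs)

    # Pass 1: compute both means.
    mean_x = Enum.sum(xs) / n
    mean_y = Enum.sum(ys) / n

    # Pass 2: accumulate the products of the deviations from the means.
    sum =
      Enum.zip(xs, ys)
      |> Enum.reduce(0.0, fn {x, y}, acc -> acc + (x - mean_x) * (y - mean_y) end)

    # Assumption: divide by n (population covariance), not n - 1.
    sum / n
  end
end
```

Subtracting the means before multiplying avoids the cancellation errors of the one-pass formula E[xy] - E[x]E[y], which is why the two-pass form is considered the more stable choice.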
Calculate cov(x,y) / var(x) in one step. This ratio is the slope β of the fitted line.
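One reason the ratio can be computed in a single step is that the normalization factors cancel. The identity below assumes that covariance and variance are normalized by the same constant (1/n here, matching the variance formula on this page); how Numy organizes the computation internally is not stated in this documentation.

```latex
\frac{\operatorname{cov}(x, y)}{\operatorname{var}(x)}
  = \frac{\frac{1}{n}\sum_i (x_i - \bar{x})(y_i - \bar{y})}
         {\frac{1}{n}\sum_i (x_i - \bar{x})^2}
  = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}
         {\sum_i (x_i - \bar{x})^2}
```

Both sums share the deviations xᵢ - x̄, so they can be accumulated in the same loop over the data.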
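Find the slope and intercept of the line that best fits the input data in the least-squares sense. As the example above shows, the result is a tuple of the form {intercept, slope}.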
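The fit can be sketched in plain Elixir using the formulas β = cov(x,y) / var(x) and α = ȳ - βx̄. This is a minimal illustration under stated assumptions, not Numy's implementation: the module name FitSketch is hypothetical, and it takes ordinary lists instead of Numy vectors. The {intercept, slope} return order mirrors the tuple shown in the example on this page.

```elixir
defmodule FitSketch do
  # Hypothetical helper for illustration only; not part of Numy.
  # Returns {intercept, slope} for equal-length lists with at least two points.
  # Raises ArithmeticError if all x values are equal (var(x) == 0).
  def fit(xs, ys) when length(xs) == length(ys) and length(xs) > 1 do
    n = length(xs)
    mean_x = Enum.sum(xs) / n
    mean_y = Enum.sum(ys) / n

    # Accumulate sum((x - mean_x) * (y - mean_y)) and sum((x - mean_x)^2) together.
    {sxy, sxx} =
      Enum.zip(xs, ys)
      |> Enum.reduce({0.0, 0.0}, fn {x, y}, {sxy, sxx} ->
        dx = x - mean_x
        {sxy + dx * (y - mean_y), sxx + dx * dx}
      end)

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    {intercept, slope}
  end
end
```

For noise-free data on the line y = 2x - 3, such as xs = Enum.to_list(0..9) and ys = Enum.map(xs, &(2 * &1 - 3)), the sketch recovers an intercept of -3 and a slope of 2 up to floating-point rounding.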
Pearson's correlation coefficient is the covariance of the two variables divided by the product of their standard deviations. It ranges from -1 to 1.
A value of 1 implies that a linear equation describes the relationship between X and Y perfectly, with all data points lying on a line for which Y increases as X increases. A value of -1 implies that all data points lie on a line for which Y decreases as X increases. A value of 0 implies that there is no linear correlation between the variables.
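The definition above can be sketched in plain Elixir. This is a minimal illustration, not Numy's implementation: the module name PearsonSketch is hypothetical, and it operates on ordinary lists. Because the 1/n factors in the covariance and in the two standard deviations cancel, the sketch works directly with the raw sums of deviations.

```elixir
defmodule PearsonSketch do
  # Hypothetical helper for illustration only; not part of Numy.
  # Pearson correlation of two equal-length lists with at least two points.
  # Raises ArithmeticError if either list is constant (zero standard deviation).
  def pearson_correlation(xs, ys) when length(xs) == length(ys) and length(xs) > 1 do
    n = length(xs)
    mean_x = Enum.sum(xs) / n
    mean_y = Enum.sum(ys) / n

    # Accumulate the cross term and both squared-deviation sums in one pass.
    {sxy, sxx, syy} =
      Enum.zip(xs, ys)
      |> Enum.reduce({0.0, 0.0, 0.0}, fn {x, y}, {sxy, sxx, syy} ->
        dx = x - mean_x
        dy = y - mean_y
        {sxy + dx * dy, sxx + dx * dx, syy + dy * dy}
      end)

    # cov / (std_x * std_y), with the common 1/n factors cancelled.
    sxy / :math.sqrt(sxx * syy)
  end
end
```

Points lying exactly on an increasing line give a result of 1 (up to rounding), and points on a decreasing line give -1.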
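Calculate y for a given x using the fitted intercept and slope: y = α + βx. As the example above shows, x can be a single number or a list of numbers.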
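A prediction of this kind can be sketched in plain Elixir. This is a minimal illustration, not Numy's implementation: the module name PredictSketch is hypothetical, and the {intercept, slope} tuple shape and the number-or-list argument are inferred from the example on this page.

```elixir
defmodule PredictSketch do
  # Hypothetical helper for illustration only; not part of Numy.
  # `line` is assumed to be an {intercept, slope} tuple.

  # A list of x values maps to a list of predicted y values.
  def predict(xs, line) when is_list(xs), do: Enum.map(xs, &predict(&1, line))

  # A single x value: y = intercept + slope * x.
  def predict(x, {intercept, slope}) when is_number(x), do: intercept + slope * x
end
```

With line = {-3.0, 2.0}, PredictSketch.predict(0, line) evaluates the intercept alone, and passing a list applies the same formula to each element.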
In statistics, variance is the expectation of the squared deviation of a random variable from its mean.
var(x) = E[ (x - x̄)² ] = (∑(xᵢ - x̄)²) / n
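The formula above can be sketched in plain Elixir. This is a minimal illustration, not Numy's implementation: the module name VarianceSketch is hypothetical, it takes an ordinary list, and it divides by n exactly as the formula does (population variance).

```elixir
defmodule VarianceSketch do
  # Hypothetical helper for illustration only; not part of Numy.
  # Population variance of a non-empty list of numbers.
  def variance(xs) when xs != [] do
    n = length(xs)
    mean = Enum.sum(xs) / n

    # Sum of squared deviations from the mean.
    sum_sq = Enum.reduce(xs, 0.0, fn x, acc -> acc + (x - mean) * (x - mean) end)

    sum_sq / n
  end
end
```

For example, the list [2, 4, 4, 4, 5, 5, 7, 9] has mean 5 and squared deviations summing to 32, so its population variance is 32 / 8 = 4.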