Numy v0.1.5 Numy.Fit.SimpleLinear

Simple Linear Regression.

Given the model y = α + βx, one approach to estimating the unknown parameters α and β is to minimize the sum of squared residuals (SSR):

∑rᵢ² = ∑(yᵢ - α - βxᵢ)²

Among all possible α and β, the following values minimize the SSR:

  1. β = cov(x,y) / var(x)
  2. α = ȳ - βx̄
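The two formulas above can be sketched in plain Elixir on ordinary lists. This is a minimal illustration, not Numy's implementation: the module name `FitSketch` and its `fit/2` function are hypothetical, and the result is returned as `{intercept, slope}` only to mirror the tuple shown in the example below. Because β is a ratio, the 1/n factors of cov(x,y) and var(x) cancel, so the sketch divides the raw sums directly.

```elixir
defmodule FitSketch do
  # Hypothetical helper, not part of Numy: least-squares fit on plain lists.
  # Raises ArithmeticError if all xs are equal (var(x) = 0).
  def fit(xs, ys) when length(xs) == length(ys) and length(xs) > 1 do
    n = length(xs)
    mean_x = Enum.sum(xs) / n
    mean_y = Enum.sum(ys) / n

    # Sum of (x - mean_x) * (y - mean_y): numerator of cov(x,y) without 1/n.
    sxy =
      Enum.zip(xs, ys)
      |> Enum.map(fn {x, y} -> (x - mean_x) * (y - mean_y) end)
      |> Enum.sum()

    # Sum of (x - mean_x)^2: numerator of var(x) without 1/n.
    sxx =
      xs
      |> Enum.map(fn x -> (x - mean_x) * (x - mean_x) end)
      |> Enum.sum()

    slope = sxy / sxx                    # beta = cov(x,y) / var(x)
    intercept = mean_y - slope * mean_x  # alpha = mean(y) - beta * mean(x)
    {intercept, slope}
  end
end

# Noise-free points on the line y = 2x - 3.
xs = Enum.to_list(0..9)
ys = Enum.map(xs, fn x -> 2 * x - 3 end)
IO.inspect(FitSketch.fit(xs, ys))
```

With noise-free input the fitted intercept and slope should match the generating line; with noisy data, as in the example below, they are only close to it.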

Example

# Vc and Vcm are assumed to be aliased beforehand, e.g. to Numy.Vc and Numy.Vcm
iex> x = Numy.Lapack.Vector.new(0..9)
#Vector<size=10, [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]>
iex> y = Vc.scale(x,2) |> Vcm.offset!(-3.0) # make slope=2 and intercept=-3
#Vector<size=10, [-3.0, -1.0, 1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0]>
iex> err = Numy.Lapack.Vector.new(10) |> Vc.assign_random |> Vcm.offset!(-0.5) |> Vcm.scale!(0.1) # small random errors
iex> Vcm.add!(y,err) # add errors to the ideal line
iex> line = Numy.Fit.SimpleLinear.fit(x,y)
{-2.9939933270609496, 1.9966330251198818} # {intercept, slope}: close to -3 and 2, as expected
iex> Numy.Fit.SimpleLinear.predict(0,line)
-2.9939933270609496
iex> Numy.Fit.SimpleLinear.predict(10,line)
16.97233692413787
iex> predicted = Numy.Lapack.Vector.new(Numy.Fit.SimpleLinear.predict(Enum.to_list(0..9),line))
iex> Numy.Fit.SimpleLinear.pearson_correlation(predicted, y)
0.9999884096405113

Summary

Functions

In statistics, covariance is a measure of the joint variability of two random variables.

Calculate cov/var in one step.

Find the slope and intercept of the line that fits the input data.

Pearson's correlation coefficient is the covariance of the two variables divided by the product of their standard deviations.

Calculate y for a given x using slope and intercept.

In statistics, variance is the expectation of the squared deviation of a random variable from its mean.

Functions

In statistics, covariance is a measure of the joint variability of two random variables.

A numerically stable two-pass algorithm is used.
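A two-pass covariance computation first finds the means, then sums the products of deviations from those means. The sketch below illustrates the idea on plain lists; the module `CovSketch` is hypothetical and is not Numy's code. It divides by n (population covariance), which is an assumption chosen to match the variance formula given at the end of this page.

```elixir
defmodule CovSketch do
  # Hypothetical helper, not part of Numy: two-pass covariance on plain lists.
  def covariance(xs, ys) when length(xs) == length(ys) and length(xs) > 0 do
    n = length(xs)

    # Pass 1: compute the means.
    mean_x = Enum.sum(xs) / n
    mean_y = Enum.sum(ys) / n

    # Pass 2: accumulate products of deviations from the means.
    sum =
      Enum.zip(xs, ys)
      |> Enum.reduce(0.0, fn {x, y}, acc ->
        acc + (x - mean_x) * (y - mean_y)
      end)

    # Assumption: divide by n (population covariance), not n - 1.
    sum / n
  end
end

IO.inspect(CovSketch.covariance([1, 2, 3], [2, 4, 6]))
```

Subtracting the means before multiplying avoids the cancellation that the one-pass formula E[xy] - E[x]E[y] can suffer when the means are large relative to the spread of the data.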

covariance_over_variance(x, y)

Calculate cov/var in one step.

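Find the slope and intercept of the line that fits the input data. As the example above shows, the result is a tuple with the intercept first and the slope second.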

pearson_correlation(x, y)

Pearson's correlation coefficient is the covariance of the two variables divided by the product of their standard deviations.

A value of 1 implies that a linear equation describes the relationship between X and Y perfectly, with all data points lying on a line for which Y increases as X increases. A value of 0 implies that there is no linear correlation between the variables.
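The definition above can be sketched as follows. `PearsonSketch` is a hypothetical module working on plain lists, not Numy's implementation. Since the coefficient is a ratio, the 1/n normalization of the covariance and of both standard deviations cancels, so only the raw sums are needed.

```elixir
defmodule PearsonSketch do
  # Hypothetical helper, not part of Numy: Pearson's r on plain lists.
  # Raises ArithmeticError if either input has zero variance.
  def pearson(xs, ys) when length(xs) == length(ys) and length(xs) > 1 do
    n = length(xs)
    mean_x = Enum.sum(xs) / n
    mean_y = Enum.sum(ys) / n

    # Accumulate the three sums of deviations in a single traversal.
    {sxy, sxx, syy} =
      Enum.zip(xs, ys)
      |> Enum.reduce({0.0, 0.0, 0.0}, fn {x, y}, {sxy, sxx, syy} ->
        dx = x - mean_x
        dy = y - mean_y
        {sxy + dx * dy, sxx + dx * dx, syy + dy * dy}
      end)

    # r = cov(x,y) / (std(x) * std(y)); the 1/n factors cancel.
    sxy / :math.sqrt(sxx * syy)
  end
end

IO.inspect(PearsonSketch.pearson([1, 2, 3], [2, 4, 6]))
IO.inspect(PearsonSketch.pearson([1, 2, 3], [6, 4, 2]))
```

The first call uses points on an increasing line and the second uses points on a decreasing line, so the results should sit at the two ends of the coefficient's range.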

Calculate y for a given x using slope and intercept, that is, y = intercept + slope·x.
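The example near the top of this page calls predict with both a single number and a list. A minimal sketch of that behavior is shown below; `PredictSketch` is a hypothetical module, and the `{intercept, slope}` tuple layout is assumed from the output of fit in that example.

```elixir
defmodule PredictSketch do
  # Hypothetical helper, not part of Numy.
  # A list of x values maps to a list of predicted y values.
  def predict(xs, line) when is_list(xs) do
    Enum.map(xs, fn x -> predict(x, line) end)
  end

  # A single number maps to a single predicted y.
  def predict(x, {intercept, slope}) when is_number(x) do
    intercept + slope * x
  end
end

line = {-3.0, 2.0} # assumed layout: {intercept, slope}
IO.inspect(PredictSketch.predict(10, line))
IO.inspect(PredictSketch.predict([0, 1, 2], line))
```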

In statistics, variance is the expectation of the squared deviation of a random variable from its mean.

var(x) = E[ (x - x̄)² ] = (∑(xᵢ - x̄)²) / n
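The formula above translates directly into a short sketch on a plain list. `VarSketch` is a hypothetical module, not Numy's code; it divides by n exactly as the formula does, giving the population variance rather than the sample variance.

```elixir
defmodule VarSketch do
  # Hypothetical helper, not part of Numy: population variance of a list.
  def variance(xs) when is_list(xs) and xs != [] do
    n = length(xs)
    mean = Enum.sum(xs) / n

    # Sum of squared deviations from the mean.
    ss =
      xs
      |> Enum.map(fn x -> (x - mean) * (x - mean) end)
      |> Enum.sum()

    ss / n
  end
end

# For 0..9 the mean is 4.5 and the squared deviations sum to 82.5.
IO.inspect(VarSketch.variance(Enum.to_list(0..9)))
```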