Scholar.Linear.BayesianRidgeRegression (Scholar v0.3.0)
Bayesian ridge regression: A fully probabilistic linear model with parameter regularization.
In order to obtain a fully probabilistic linear model, we introduce the precision parameter $\alpha$ into the model. This parameter describes the dispersion of the data around the mean.
$$ p(y | X, w, \alpha) = \mathcal{N}(y | Xw, \alpha^{-1}) $$
Where:
- $X$ is the input data,
- $y$ is the input target,
- $w$ is the model weights matrix,
- $\alpha$ is the precision parameter of the target, with $\alpha^{-1} = \sigma^{2}$, the variance.
In order to obtain a fully probabilistic regularized linear model, we place a prior distribution on the model weights matrix with its corresponding precision parameter:
$$ p(w | \lambda) = \mathcal{N}(w | 0, \lambda^{-1}) $$
Where $\lambda$ is the precision parameter of the weights matrix.
Both $\alpha$ and $\lambda$ are chosen to have gamma prior distributions, controlled through the hyperparameters $\alpha_1$, $\alpha_2$, $\lambda_1$, $\lambda_2$. By default these are set to the non-informative values $\alpha_1 = \alpha_2 = \lambda_1 = \lambda_2 = 10^{-6}$.
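Written out under the parameterization used by the options below (where $\alpha_1$ and $\lambda_1$ are shape parameters and $\alpha_2$ and $\lambda_2$ are rate, i.e. inverse scale, parameters), these hyperpriors are:
$$ p(\alpha) = \mathrm{Gamma}(\alpha \mid \alpha_1, \alpha_2), \qquad p(\lambda) = \mathrm{Gamma}(\lambda \mid \lambda_1, \lambda_2) $$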
This model is similar to classical ridge regression. Confusingly, classical ridge regression's $\alpha$ parameter corresponds to the Bayesian ridge's $\lambda$ parameter.
Beyond that, the differences between the two algorithms are:
- The regularization parameter of the weights is estimated from the data,
- The precision of the target is estimated.
As such, Bayesian ridge adapts more flexibly to the data at hand. This flexibility comes at a higher computational cost.
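For reference, classical ridge regression obtains its weights by minimizing a penalized least-squares objective with a fixed penalty strength; that penalty (the classical $\alpha$) plays the role of this model's $\lambda$ and is not learned from the data:
$$ \min_{w} \; \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2 $$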
This implementation is ported from Python's scikit-learn. It uses the algorithm described in (Tipping, 2001), with the regularization parameters updated as in (MacKay, 1992).
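As a rough sketch (not the exact implementation), the MacKay-style re-estimation updates both precisions at each iteration from the current weights $w$, the eigenvalues $\lambda_j$ of $X^\top X$, and the number of samples $N$:
$$ \gamma = \sum_j \frac{\alpha \lambda_j}{\lambda + \alpha \lambda_j}, \qquad \lambda \leftarrow \frac{\gamma + 2\lambda_1}{\lVert w \rVert_2^2 + 2\lambda_2}, \qquad \alpha \leftarrow \frac{N - \gamma + 2\alpha_1}{\lVert y - Xw \rVert_2^2 + 2\alpha_2} $$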
References:
D. J. C. MacKay, Bayesian Interpolation, Computation and Neural Systems, Vol. 4, No. 3, 1992.
M. E. Tipping, Sparse Bayesian Learning and the Relevance Vector Machine, Journal of Machine Learning Research, Vol. 1, 2001.
Pedregosa et al., Scikit-learn: Machine Learning in Python, JMLR 12, pp. 2825-2830, 2011.
Summary

Functions

Fits a Bayesian ridge model for sample inputs x and sample targets y.

Makes predictions with the given model on input x.
Functions
Fits a Bayesian ridge model for sample inputs x and sample targets y.
Options
- :iterations (pos_integer/0) - Maximum number of iterations before stopping the fitting algorithm. The number of iterations may be lower if the parameters converge earlier. The default value is 300.
- :sample_weights - The weights for each observation. If not provided, all observations are assigned equal weight.
- :fit_intercept? (boolean/0) - If set to true, the model will fit the intercept. Otherwise, the intercept is set to 0.0. The intercept is an independent term in a linear model. Specifically, it is the expected mean value of targets for a zero-vector on input. The default value is true.
- :compute_scores? (boolean/0) - If set to true, the log marginal likelihood will be computed at each iteration of the algorithm. The default value is false.
- :alpha_init - The initial value for alpha. This parameter influences the precision of the noise. :alpha must be a non-negative float, i.e. in [0, inf). Defaults to 1/Var(y).
- :lambda_init - The initial value for lambda. This parameter influences the precision of the weights. :lambda must be a non-negative float, i.e. in [0, inf). The default value is 1.0.
- :alpha_1 - Hyperparameter: shape parameter for the Gamma distribution prior over the alpha parameter. The default value is 1.0e-6.
- :alpha_2 - Hyperparameter: inverse scale (rate) parameter for the Gamma distribution prior over the alpha parameter. The default value is 1.0e-6.
- :lambda_1 - Hyperparameter: shape parameter for the Gamma distribution prior over the lambda parameter. The default value is 1.0e-6.
- :lambda_2 - Hyperparameter: inverse scale (rate) parameter for the Gamma distribution prior over the lambda parameter. The default value is 1.0e-6.
- :eps (float/0) - The convergence tolerance. When Nx.sum(Nx.abs(coef - coef_new)) < :eps, the algorithm is considered to have converged. The default value is 1.0e-8.
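For illustration only, a call that overrides a few of these options might look like the following (the option values here are arbitrary, chosen just to show the syntax):

x = Nx.tensor([[1], [2], [6], [8], [10]])
y = Nx.tensor([1, 2, 6, 8, 10])

# Override a few defaults; every option shown is documented in the list above.
model =
  Scholar.Linear.BayesianRidgeRegression.fit(x, y,
    iterations: 500,
    compute_scores?: true,
    eps: 1.0e-10
  )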
Return Values
The function returns a struct with the following parameters:

- :coefficients - Estimated coefficients of the linear regression problem.
- :intercept - Independent term in the linear model.
- :alpha - Estimated precision of the noise.
- :lambda - Estimated precision of the weights.
- :sigma - Estimated variance-covariance matrix of the weights, with shape (n_features, n_features).
- :iterations - Number of iterations the optimization algorithm performed.
- :has_converged - Whether the coefficients converged during optimization.
- :scores - Value of the log marginal likelihood at each iteration of the optimization.
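Since the result is a struct, the fields above can also be pattern matched directly; a minimal sketch, assuming the struct is named after this module:

x = Nx.tensor([[1], [2], [6], [8], [10]])
y = Nx.tensor([1, 2, 6, 8, 10])

# Extract the estimated noise and weight precisions alongside the coefficients.
%Scholar.Linear.BayesianRidgeRegression{
  coefficients: coefficients,
  intercept: intercept,
  alpha: alpha,
  lambda: lambda
} = Scholar.Linear.BayesianRidgeRegression.fit(x, y)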
Examples
iex> x = Nx.tensor([[1], [2], [6], [8], [10]])
iex> y = Nx.tensor([1, 2, 6, 8, 10])
iex> model = Scholar.Linear.BayesianRidgeRegression.fit(x, y)
iex> model.coefficients
#Nx.Tensor<
f32[1]
[0.9932512044906616]
>
iex> model.intercept
#Nx.Tensor<
f32
0.03644371032714844
>
Makes predictions with the given model on input x.
Examples
iex> x = Nx.tensor([[1], [2], [6], [8], [10]])
iex> y = Nx.tensor([1, 2, 6, 8, 10])
iex> model = Scholar.Linear.BayesianRidgeRegression.fit(x, y)
iex> Scholar.Linear.BayesianRidgeRegression.predict(model, Nx.tensor([[1], [3], [4]]))
Nx.tensor(
[1.02969491481781, 3.0161972045898438, 4.009448528289795]
)
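Predictions can be checked against known targets with plain Nx operations; the snippet below computes a mean squared error by hand (a minimal sketch, not a dedicated Scholar metrics API):

x = Nx.tensor([[1], [2], [6], [8], [10]])
y = Nx.tensor([1, 2, 6, 8, 10])
model = Scholar.Linear.BayesianRidgeRegression.fit(x, y)

x_new = Nx.tensor([[1], [3], [4]])
y_true = Nx.tensor([1, 3, 4])
y_pred = Scholar.Linear.BayesianRidgeRegression.predict(model, x_new)

# Mean squared error of the predictions, computed directly with Nx.
mse = Nx.mean(Nx.pow(Nx.subtract(y_pred, y_true), 2))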