emel/ml/k_nearest_neighbors

k-nearest neighbors (k-NN) is a non-parametric method used for classification and regression. In both cases, the output for a point is computed from the k training examples closest to it in the feature space.

Functions

pub fn classifier(data: List(#(List(Float), String)), k: Int) -> fn(List(Float)) -> String

Returns a function that classifies a point by finding its k nearest neighbors in the training data.

Data = [
  {[7.0, 7.0], "bad"},
  {[7.0, 4.0], "bad"},
  {[3.0, 4.0], "good"},
  {[1.0, 4.0], "good"}
],
K = 3,
F = emel@ml@k_nearest_neighbors:classifier(Data, K),
F([3.0, 7.0]).
% "good"
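
The following Python sketch shows how such a classifier can work. It is for illustration only and does not reproduce emel's code. It assumes Euclidean distance and a plain majority vote among the k neighbors. The documentation above names neither, so emel may differ, for example in how it breaks ties. The helper names euclidean and classify are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, List, Tuple


def euclidean(a: List[float], b: List[float]) -> float:
    # Assumption: Euclidean distance; the emel docs do not name the metric.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classifier(data: List[Tuple[List[float], str]], k: int) -> Callable[[List[float]], str]:
    def classify(point: List[float]) -> str:
        # Sort every labelled example by distance to the query and keep the k closest.
        nearest = sorted(data, key=lambda item: euclidean(item[0], point))[:k]
        # Majority vote. For equal counts, Counter.most_common keeps first-encountered
        # order, so the label of the nearer neighbor wins (a choice of this sketch).
        # Raises IndexError if data is empty or k <= 0.
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]

    return classify


data = [([7.0, 7.0], "bad"), ([7.0, 4.0], "bad"), ([3.0, 4.0], "good"), ([1.0, 4.0], "good")]
f = classifier(data, 3)
print(f([3.0, 7.0]))  # good
```

Sorting all of the data costs O(n log n) per query. That cost is acceptable for small data sets like this one.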

pub fn k_nearest_neighbors(data: List(#(List(Float), a)), point: List(Float), k: Int) -> List(#(List(Float), a))

Searches the data and returns the k items closest to the point, ordered from nearest to farthest.

Data = [
  {[7.0, 7.0], "bad"},
  {[7.0, 4.0], "bad"},
  {[3.0, 4.0], "good"},
  {[1.0, 4.0], "good"}
],
Point = [3.0, 7.0],
K = 3,
emel@ml@k_nearest_neighbors:k_nearest_neighbors(Data, Point, K).
% [
%   {[3.0, 4.0], "good"},
%   {[1.0, 4.0], "good"},
%   {[7.0, 7.0], "bad"}
% ]
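
The following Python sketch illustrates the search itself. It again assumes Euclidean distance, which the documentation does not state. The sketch uses heapq.nsmallest, which keeps only the k best candidates instead of sorting the whole list. Python documents heapq.nsmallest as equivalent to sorted(data, key=...)[:k], so equal distances keep their input order. This is not emel's implementation.

```python
import heapq
import math
from typing import List, Tuple, TypeVar

A = TypeVar("A")  # label type, like the type variable `a` in the Gleam signature


def k_nearest_neighbors(
    data: List[Tuple[List[float], A]], point: List[float], k: int
) -> List[Tuple[List[float], A]]:
    # Assumption: Euclidean distance (math.dist, Python 3.8+); the emel docs do not name it.
    # Returns at most k items, nearest first.
    return heapq.nsmallest(k, data, key=lambda item: math.dist(item[0], point))


data = [([7.0, 7.0], "bad"), ([7.0, 4.0], "bad"), ([3.0, 4.0], "good"), ([1.0, 4.0], "good")]
print(k_nearest_neighbors(data, [3.0, 7.0], 3))
# [([3.0, 4.0], 'good'), ([1.0, 4.0], 'good'), ([7.0, 7.0], 'bad')]
```

With these data the distances to [3.0, 7.0] are 3.0, about 3.61, 4.0 and 5.0. This is why the three nearest items come back in the order shown in the example above.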

pub fn predictor(data: List(#(List(Float), Float)), k: Int) -> fn(List(Float)) -> Float

Returns a function that predicts a point's value as the average of the dependent variable over its k nearest neighbors.

Data = [
  {[0.0, 0.0, 0.0], 0.0},
  {[0.5, 0.5, 0.5], 1.5},
  {[1.0, 1.0, 1.0], 3.0},
  {[1.5, 1.5, 1.5], 4.5},
  {[2.0, 2.0, 2.0], 6.0},
  {[2.5, 2.5, 2.5], 7.5},
  {[3.0, 3.3, 3.0], 9.0}
],
K = 2,
F = emel@ml@k_nearest_neighbors:predictor(Data, K),
F([1.725, 1.725, 1.725]).
% 5.25
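
The following Python sketch covers the regression case. It uses the same Euclidean-distance assumption, which the documentation does not state, and takes an unweighted mean of the k neighbors' values. It is an illustration, not emel's code. With the data above, the two points nearest to [1.725, 1.725, 1.725] are [1.5, 1.5, 1.5] and [2.0, 2.0, 2.0], so the result is (4.5 + 6.0) / 2 = 5.25.

```python
import heapq
import math
from typing import Callable, List, Tuple


def predictor(data: List[Tuple[List[float], float]], k: int) -> Callable[[List[float]], float]:
    def predict(point: List[float]) -> float:
        # Assumption: Euclidean distance and an unweighted mean of the k nearest values.
        nearest = heapq.nsmallest(k, data, key=lambda item: math.dist(item[0], point))
        # Raises ZeroDivisionError if data is empty or k <= 0.
        return sum(value for _, value in nearest) / len(nearest)

    return predict


data = [
    ([0.0, 0.0, 0.0], 0.0),
    ([0.5, 0.5, 0.5], 1.5),
    ([1.0, 1.0, 1.0], 3.0),
    ([1.5, 1.5, 1.5], 4.5),
    ([2.0, 2.0, 2.0], 6.0),
    ([2.5, 2.5, 2.5], 7.5),
    ([3.0, 3.3, 3.0], 9.0),
]
f = predictor(data, 2)
print(f([1.725, 1.725, 1.725]))  # 5.25
```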