Scholar.Cluster.DBSCAN (Scholar v0.3.0)

Performs DBSCAN clustering from a vector array or distance matrix.

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) finds core samples of high density and expands clusters from them. It works well for data that contains clusters of similar density.

The time complexity is $O(N^2)$ for $N$ samples. The space complexity is $O(N^2)$.
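As a rough illustration of where the quadratic cost comes from, the sketch below builds the full $N \times N$ Euclidean distance matrix with plain Nx calls and marks core samples from it. It is not Scholar's implementation: the variable names are made up for this example, and counting a neighbor when its distance is at most eps is an assumption about the boundary rule (no distance in this data equals eps, so the result does not depend on it).

```elixir
# Assumes Nx is available, for example as a dependency of Scholar.
x = Nx.tensor([[1, 2], [2, 2], [2, 3], [8, 7], [8, 8], [25, 80]], type: :f32)
eps = 3
min_samples = 2

# Broadcast {N, 1, D} against {1, N, D} to get all pairwise differences, shape {N, N, D}.
diff = Nx.subtract(Nx.new_axis(x, 1), Nx.new_axis(x, 0))

# Euclidean distance matrix of shape {N, N}; this is the O(N^2) memory term.
dist = diff |> Nx.multiply(diff) |> Nx.sum(axes: [2]) |> Nx.sqrt()

# Number of samples within eps of each sample, the sample itself included.
neighbor_counts = dist |> Nx.less_equal(eps) |> Nx.sum(axes: [1])

# A sample is a core sample when its neighborhood holds at least min_samples points.
core_mask = Nx.greater_equal(neighbor_counts, min_samples)

IO.inspect(Nx.to_flat_list(core_mask))
# prints [1, 1, 1, 1, 1, 0]
```

The mask agrees with the `core_sample_indices` shown in the Examples section for the same data, eps, and min_samples.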

Functions

fit(x, opts \\ [])

Performs DBSCAN clustering from a vector array or distance matrix.

Options

  • :eps - The maximum distance between two samples for one to be considered in the neighborhood of the other. The default value is 0.5.

  • :min_samples (integer) - The number of samples (or total weight) in a neighborhood for a point to be considered a core point. This includes the point itself. The default value is 5.

  • :metric - The function that measures the pairwise distance between two points. Possible values:

    • {:minkowski, p} - Minkowski metric. The value of p (a positive number or :infinity) selects the Manhattan (1), Euclidean (2), Chebyshev (:infinity), or any other $L_p$ metric.

    • :cosine - Cosine metric.

    • An anonymous function of arity 2 that takes two rank-2 tensors.

    The default value is &Scholar.Metrics.Distance.pairwise_minkowski/2.

  • :weights - The weights for each observation in x. If set to nil, all observations are assigned equal weight.
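The options above can be combined in a single call. The following is a minimal sketch that reuses the data from the Examples section and switches to the Manhattan metric through the documented {:minkowski, p} form; the variable name `model` is only illustrative, and no output is shown because it has not been verified here.

```elixir
# Assumes the :scholar package (v0.3.0) and its Nx dependency are available.
x = Nx.tensor([[1, 2], [2, 2], [2, 3], [8, 7], [8, 8], [25, 80]])

# p = 1 selects the Manhattan metric; eps and min_samples override the defaults (0.5 and 5).
model = Scholar.Cluster.DBSCAN.fit(x, eps: 3, min_samples: 2, metric: {:minkowski, 1})

# The result is a struct with the fields described under Return Values.
IO.inspect(model.labels)
IO.inspect(model.core_sample_indices)
```

On this data every pair of points that lies within eps under the Euclidean metric also lies within eps under the Manhattan metric, and no other pair does, so the clustering would be expected to match the documented example.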

Return Values

The function returns a struct with the following fields:

  • :core_sample_indices - Core samples represented as a mask. The mask is a tensor of shape {num_samples} in which 1 indicates that the corresponding sample is a core sample and 0 indicates that it is not.

  • :labels - Cluster labels for each point in the dataset given to fit(). Noisy samples are given the label -1.
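To show how these two fields might be consumed, the sketch below starts from the tensors shown in the Examples section and derives the number of clusters, the number of noisy samples, and the positions of the core samples. The variable names are illustrative, and the cluster count assumes labels are consecutive integers starting at 0.

```elixir
# Assumes Nx is available. These tensors mirror the documented example output.
labels = Nx.tensor([0, 0, 0, 1, 1, -1])
core_mask = Nx.tensor([1, 1, 1, 1, 1, 0], type: :u8)

# Highest label plus one; this is 0 when every sample is noise (all labels are -1).
num_clusters = labels |> Nx.reduce_max() |> Nx.add(1) |> Nx.to_number()

# Noisy samples carry the label -1.
num_noise = labels |> Nx.equal(-1) |> Nx.sum() |> Nx.to_number()

# Turn the 0/1 mask into a list of positions; entries that do not match {1, i} are skipped.
core_indices = for {1, i} <- Enum.with_index(Nx.to_flat_list(core_mask)), do: i

IO.inspect({num_clusters, num_noise, core_indices})
# prints {2, 1, [0, 1, 2, 3, 4]}
```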

Examples

iex> x = Nx.tensor([[1, 2], [2, 2], [2, 3], [8, 7], [8, 8], [25, 80]])
iex> Scholar.Cluster.DBSCAN.fit(x, eps: 3, min_samples: 2)
%Scholar.Cluster.DBSCAN{
  core_sample_indices: Nx.tensor(
    [1, 1, 1, 1, 1, 0], type: :u8
  ),
  labels: Nx.tensor(
    [0, 0, 0, 1, 1, -1]
  )
}