K-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean.
npm i ml-kmeans
import { kmeans } from 'ml-kmeans';
const data = [
[1, 1, 1],
[1, 2, 1],
[-1, -1, -1],
[-1, -1, -1.5],
];
const centers = [
[1, 2, 1],
[-1, -1, -1],
];
const ans = kmeans(data, 2, { initialization: centers });
console.log(ans);
/*
KMeansResult {
clusters: [ 0, 0, 1, 1 ],
centroids: [ [ 1, 1.5, 1 ], [ -1, -1, -1.25 ] ],
converged: true,
iterations: 2,
distance: [Function: squaredEuclidean]
}
*/
// Compute the mean error and size of each cluster.
console.log(ans.computeInformation(data));
/*
[
{ centroid: [ 1, 1.5, 1 ], error: 0.25, size: 2 },
{ centroid: [ -1, -1, -1.25 ], error: 0.0625, size: 2 }
]
*/
// Assign new points to the clusters found above.
console.log(ans.nearest([[1, 2, 1]]));
// [ 0 ]Runs the K-means algorithm and returns a KMeansResult.
data: array of points to cluster, each in the format[x, y, z, ...].k: number of clusters.options: an optional object with the following properties.
| Option | Type | Default | Description |
|---|---|---|---|
initialization |
string or number[][] |
'kmeans++' |
Either custom start centroids ([x, y, z, ...]), or one of the methods 'kmeans++', 'random', or 'mostDistant'. |
maxIterations |
number |
100 |
Maximum number of iterations allowed. Set to 0 to iterate until convergence. |
tolerance |
number |
1e-6 |
Error tolerance used as the convergence criterion. |
distanceFunction |
(p, q) => number |
squaredEuclidean |
Distance function to use between two points. |
seed |
number |
(none) | Seed for the random number generator, used by the 'random' and 'mostDistant' initialization methods to make results reproducible. |
'kmeans++': uses the kmeans++ seeding method.'random': chooseskrandom distinct points.'mostDistant': chooses the most distant points starting from a first random pick.
The object returned by kmeans.
clusters: array with the cluster index of each input point.centroids: array with the resulting centroids.converged: whether the convergence criterion was satisfied.iterations: number of iterations performed.result.nearest(points): returns the cluster index of each of the given newpoints.result.computeInformation(data): returns thecentroid, meanerror, andsizeof each cluster.
Generator variant of kmeans that yields the KMeansResult of each iteration, which is useful to observe the algorithm step by step.
D. Arthur, S. Vassilvitskii, k-means++: The Advantages of Careful Seeding, in: Proc. of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, 2007, pp. 1027–1035. Link to article