A basic JavaScript implementation of the k-means cluster analysis algorithm.
npm i @jbeuckm/k-means-js --save
- Optionally, normalize the data.
The normalizer scales numerical data to the range [0, 1] and generates n outputs, each either zero or one, for a discrete field such as category.
// Tell the normalizer about the category field.
const params = {
category: "discrete",
};
// Category is a discrete field with two possible values.
// Value is a linear field with continuous possible values.
const data = [
{
category: "a",
value: 25,
},
{
category: "b",
value: 7.6,
},
{
category: "a",
value: 28,
},
];
import { dataset } from "@jbeuckm/k-means-js";
// Get ranges for normalizing and denormalizing the data
const ranges = dataset.findRanges(params, data);
// Optionally, set the relative importance of one or more fields
// *The default weight for any field is one.*
const weights = { category: 2 };
const normalized = dataset.normalize(data, ranges, weights);
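To make the normalization step concrete, here is a minimal sketch of min-max scaling plus zero/one encoding of a discrete field. It is not the package's implementation: `sketchFindRanges`, `sketchNormalize`, the shape of the `ranges` object, and the choice to apply a weight by multiplication are all assumptions made for illustration.

```javascript
// Hypothetical helpers for illustration only; the package's internals may differ.

// Scan the data once: collect distinct values for fields marked "discrete"
// in params, and min/max for every other (linear) field.
function sketchFindRanges(params, data) {
  const ranges = {};
  for (const datum of data) {
    for (const [field, value] of Object.entries(datum)) {
      if (params[field] === "discrete") {
        ranges[field] = ranges[field] || { type: "discrete", values: [] };
        if (!ranges[field].values.includes(value)) {
          ranges[field].values.push(value);
        }
      } else {
        const r = ranges[field] || { type: "linear", min: value, max: value };
        r.min = Math.min(r.min, value);
        r.max = Math.max(r.max, value);
        ranges[field] = r;
      }
    }
  }
  return ranges;
}

// Turn each datum into a flat numeric vector.
function sketchNormalize(data, ranges, weights = {}) {
  return data.map((datum) => {
    const vector = [];
    for (const [field, range] of Object.entries(ranges)) {
      const weight = weights[field] ?? 1; // default weight is one
      if (range.type === "discrete") {
        // One output per known value: 1 for the match, 0 otherwise.
        for (const v of range.values) {
          vector.push((datum[field] === v ? 1 : 0) * weight);
        }
      } else {
        const span = range.max - range.min;
        // Guard against a constant field, where the span is zero.
        const scaled = span === 0 ? 0 : (datum[field] - range.min) / span;
        vector.push(scaled * weight);
      }
    }
    return vector;
  });
}

const sampleParams = { category: "discrete" };
const sampleData = [
  { category: "a", value: 25 },
  { category: "b", value: 7.6 },
  { category: "a", value: 28 },
];
const sampleRanges = sketchFindRanges(sampleParams, sampleData);
const sampleVectors = sketchNormalize(sampleData, sampleRanges, { category: 2 });
console.log(sampleVectors);
```

In this sketch a larger weight stretches a field's axis, so differences in that field count for more when distances between points are computed.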
- Run the algorithm.
import kmeans from "@jbeuckm/k-means-js";
// This non-normalized sample data with n=k is a pretty awful example.
const points = [
[0.1, 0.2, 0.3],
[0.4, 0.5, 0.6],
[0.7, 0.8, 0.9],
];
const k = 3;
const means = kmeans.cluster(points, k, console.log);
The call to cluster() will find the data's range in each dimension, generate k=3 random points, and iterate until the means are static.
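The three steps just described (find the range, seed random means, iterate until static) can be sketched in plain JavaScript. This is an illustrative version of the standard k-means loop, not the package's source: `sketchCluster`, `sqDist`, the injectable `rng` parameter, the iteration cap, and the handling of empty clusters are assumptions, and the progress callback from the real call is omitted.

```javascript
// Squared Euclidean distance between two equal-length vectors.
function sqDist(a, b) {
  let sum = 0;
  for (let d = 0; d < a.length; d++) sum += (a[d] - b[d]) ** 2;
  return sum;
}

// Hypothetical k-means sketch; rng is injectable so runs can be reproduced.
function sketchCluster(points, k, rng = Math.random) {
  const dims = points[0].length;

  // 1. Find the data's range in each dimension.
  const mins = Array(dims).fill(Infinity);
  const maxs = Array(dims).fill(-Infinity);
  for (const p of points) {
    for (let d = 0; d < dims; d++) {
      mins[d] = Math.min(mins[d], p[d]);
      maxs[d] = Math.max(maxs[d], p[d]);
    }
  }

  // 2. Generate k random starting means inside that range.
  let means = Array.from({ length: k }, () =>
    mins.map((min, d) => min + rng() * (maxs[d] - min))
  );

  // 3. Reassign points and recompute means until no mean moves.
  const maxIterations = 100; // safety cap (an assumption of this sketch)
  for (let iter = 0; iter < maxIterations; iter++) {
    const sums = Array.from({ length: k }, () => Array(dims).fill(0));
    const counts = Array(k).fill(0);
    for (const p of points) {
      let best = 0;
      let bestDist = Infinity;
      for (let i = 0; i < k; i++) {
        const dist = sqDist(p, means[i]);
        if (dist < bestDist) {
          best = i;
          bestDist = dist;
        }
      }
      counts[best]++;
      for (let d = 0; d < dims; d++) sums[best][d] += p[d];
    }
    // A mean that attracted no points keeps its previous position.
    const next = means.map((mean, i) =>
      counts[i] === 0 ? mean : sums[i].map((s) => s / counts[i])
    );
    const moved = next.some((m, i) => m.some((v, d) => v !== means[i][d]));
    means = next;
    if (!moved) break;
  }
  return means;
}

const demoPoints = [
  [0.1, 0.2, 0.3],
  [0.4, 0.5, 0.6],
  [0.7, 0.8, 0.9],
];
console.log(sketchCluster(demoPoints, 3));
```

Because the starting means are random, repeated runs on the same data can settle on different means; passing a seeded `rng` makes a run repeatable.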
- Find the best K
The method described by Pham et al. is implemented. The algorithm runs K-means repeatedly for different values of K, and returns its best-guess value for K as well as the set of means found during evaluation.
import { phamBestK } from "@jbeuckm/k-means-js";
const maxKToTest = 10;
const result = phamBestK.findBestK(points, maxKToTest);
console.log("this data has " + result.K + " clusters");
console.log("cluster centroids = " + result.means);
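For intuition, the evaluation function f(K) from Pham et al. can be sketched as below, following the formula as it is commonly stated: f(K) compares the total distortion S_K (the sum of squared distances from each point to its cluster mean) with a weighted S_{K-1}, and the K with the smallest f(K) is taken as the best guess. The helper names `sketchPhamScores` and `sketchBestK` are hypothetical, the distortion values are made-up numbers for illustration, and the package's `findBestK` may differ in detail.

```javascript
// distortions[K - 1] holds S_K, the total distortion for a K-cluster run.
// dims is the number of dimensions of the data (N_d in the paper; the
// weight is defined there for N_d > 1).
function sketchPhamScores(distortions, dims) {
  const scores = [];
  let alpha = 0;
  for (let K = 1; K <= distortions.length; K++) {
    // Weight factor: alpha_2 = 1 - 3 / (4 * N_d),
    // alpha_K = alpha_{K-1} + (1 - alpha_{K-1}) / 6 for K > 2.
    if (K === 2) alpha = 1 - 3 / (4 * dims);
    else if (K > 2) alpha = alpha + (1 - alpha) / 6;

    const prev = distortions[K - 2]; // S_{K-1}; unused when K is 1
    // f(K) = 1 for K = 1 or when S_{K-1} is zero,
    // otherwise S_K / (alpha_K * S_{K-1}).
    scores.push(K === 1 || prev === 0 ? 1 : distortions[K - 1] / (alpha * prev));
  }
  return scores;
}

// The best guess is the K whose score is smallest.
function sketchBestK(scores) {
  let best = 0;
  for (let i = 1; i < scores.length; i++) {
    if (scores[i] < scores[best]) best = i;
  }
  return best + 1; // scores are indexed from K = 1
}

// Made-up distortions for K = 1, 2, 3 on two-dimensional data.
const exampleScores = sketchPhamScores([100, 10, 8], 2);
console.log(exampleScores, "best K =", sketchBestK(exampleScores));
```

A sharp drop in distortion from K-1 to K gives a small f(K), which is why the made-up data above favors K = 2.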
- Denormalize data
Denormalization can be used to show the means discovered:
for (let i = 0, l = result.means.length; i < l; i++) {
console.log(dataset.denormalizeDatum(result.means[i], ranges));
}
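As a rough picture of what denormalizing a mean involves, the sketch below inverts min-max scaling for a linear field and picks the largest zero/one output for a discrete field, since a mean usually has fractional values there. The `sketchDenormalizeDatum` helper and the shape of `exampleRanges` are assumptions for illustration, not the package's actual data structures, and field weights are ignored.

```javascript
// Hypothetical inverse of a min-max / zero-one normalization.
function sketchDenormalizeDatum(vector, ranges) {
  const datum = {};
  let offset = 0; // position of the current field within the flat vector
  for (const [field, range] of Object.entries(ranges)) {
    if (range.type === "discrete") {
      // A mean has fractional entries here, so choose the largest one.
      const slice = vector.slice(offset, offset + range.values.length);
      datum[field] = range.values[slice.indexOf(Math.max(...slice))];
      offset += range.values.length;
    } else {
      // Undo the scaling to [0, 1].
      datum[field] = range.min + vector[offset] * (range.max - range.min);
      offset += 1;
    }
  }
  return datum;
}

// Assumed ranges layout: two outputs for category, one for value.
const exampleRanges = {
  category: { type: "discrete", values: ["a", "b"] },
  value: { type: "linear", min: 7.6, max: 28 },
};
const exampleMean = [0.75, 0.25, 0.5];
const exampleDatum = sketchDenormalizeDatum(exampleMean, exampleRanges);
console.log(exampleDatum);
```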
- provide ability to label data points, dimensions, and means
- build an asynchronous version of the algorithm
- TypeScript