@jbeuckm/k-means-js

0.6.1 • Public • Published

K-Means Clustering


A basic JavaScript implementation of the k-means cluster analysis algorithm.

Install

npm i @jbeuckm/k-means-js --save

Usage

  • Optionally, normalize the data.

The normalizer scales numerical data to the range [0, 1]. For a discrete field with n possible values, such as category, it generates n outputs, each either zero or one.

import { dataset } from "@jbeuckm/k-means-js";

// Tell the normalizer about the category field.
const params = {
  category: "discrete",
};

// Category is a discrete field with two possible values.
// Value is a linear field with continuous possible values.
const data = [
  {
    category: "a",
    value: 25,
  },
  {
    category: "b",
    value: 7.6,
  },
  {
    category: "a",
    value: 28,
  },
];

// Get ranges for normalizing and denormalizing the data
const ranges = dataset.findRanges(params, data);

// Optionally, set the relative importance of one or more fields.
// The default weight for any field is 1.
const weights = { category: 2 };

const normalized = dataset.normalize(data, ranges, weights);
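The normalization described above amounts to min-max scaling for linear fields and one-hot encoding for discrete fields. The following is a minimal plain-JavaScript sketch of that idea, not the package's implementation: the helper names `findRangesSketch` and `normalizeSketch`, the shape of the ranges object, and the choice to multiply each output by its field's weight are all assumptions.

```javascript
// Hypothetical sketch of min-max scaling plus one-hot encoding.
// Not the @jbeuckm/k-means-js implementation; names and shapes are assumptions.

// Record { min, max } for linear fields and the list of observed
// values for fields marked "discrete" in params.
function findRangesSketch(params, data) {
  const ranges = {};
  for (const field of Object.keys(data[0])) {
    if (params[field] === "discrete") {
      ranges[field] = { values: [...new Set(data.map((d) => d[field]))] };
    } else {
      const column = data.map((d) => d[field]);
      ranges[field] = { min: Math.min(...column), max: Math.max(...column) };
    }
  }
  return ranges;
}

// Turn each record into a flat numeric vector.
// Assumption: a field's weight multiplies its normalized outputs (default 1).
function normalizeSketch(data, ranges, weights = {}) {
  return data.map((datum) =>
    Object.keys(ranges).flatMap((field) => {
      const range = ranges[field];
      const w = weights[field] ?? 1;
      if (range.values) {
        // One-hot: one output per possible value, w for the matching value.
        return range.values.map((v) => (datum[field] === v ? w : 0));
      }
      const span = range.max - range.min;
      // Map [min, max] onto [0, 1]; a constant column maps to 0.
      return [span === 0 ? 0 : ((datum[field] - range.min) / span) * w];
    })
  );
}
```

With the sample `data`, `params` and `weights` above, the record `{ category: "b", value: 7.6 }` becomes `[0, 2, 0]`: two one-hot outputs for `category` (scaled by its weight of 2) followed by `value` scaled to 0, since 7.6 is the minimum.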
  • Run the algorithm.
import kmeans from "@jbeuckm/k-means-js";

// This sample data has not been normalized, and with n = k (three points,
// three clusters) it is a trivial example.
const points = [
  [0.1, 0.2, 0.3],
  [0.4, 0.5, 0.6],
  [0.7, 0.8, 0.9],
];

const k = 3;

const means = kmeans.cluster(points, k, console.log);

The call to cluster() finds the range of the data in each dimension, generates k = 3 random starting means, and iterates until the means stop changing.
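The loop described above is the classic Lloyd iteration: assign each point to its nearest mean, move each mean to the centroid of its points, and stop when no mean moves. The sketch below is self-contained and hypothetical (`clusterSketch` is not part of the package). It assumes squared Euclidean distance, lets an empty cluster keep its previous mean, and adds a `maxIterations` safety cap and an injectable `random` function so runs can be reproduced; the package's own `cluster()` may differ in any of these details.

```javascript
// Hypothetical sketch of k-means (Lloyd's algorithm); not the package's code.
function clusterSketch(points, k, random = Math.random, maxIterations = 100) {
  const dims = points[0].length;
  // Find the range of the data in each dimension.
  const mins = [];
  const maxs = [];
  for (let d = 0; d < dims; d++) {
    const column = points.map((p) => p[d]);
    mins.push(Math.min(...column));
    maxs.push(Math.max(...column));
  }
  // Generate k random starting means inside those ranges.
  let means = Array.from({ length: k }, () =>
    mins.map((min, d) => min + random() * (maxs[d] - min))
  );
  const sqDist = (a, b) => a.reduce((s, x, d) => s + (x - b[d]) ** 2, 0);
  let assignments = [];
  for (let iter = 0; iter < maxIterations; iter++) {
    // Assignment step: index of the nearest mean for each point.
    assignments = points.map((p) => {
      let best = 0;
      for (let j = 1; j < k; j++) {
        if (sqDist(p, means[j]) < sqDist(p, means[best])) best = j;
      }
      return best;
    });
    // Update step: move each mean to the centroid of its assigned points.
    const next = means.map((mean, j) => {
      const members = points.filter((_, i) => assignments[i] === j);
      if (members.length === 0) return mean; // empty cluster keeps its mean
      return mean.map((_, d) => members.reduce((s, p) => s + p[d], 0) / members.length);
    });
    // Stop once the means are static.
    const moved = next.some((m, j) => sqDist(m, means[j]) > 0);
    means = next;
    if (!moved) break;
  }
  return { means, assignments };
}
```

With the default `Math.random`, different runs can converge to different local optima, so repeating the clustering and comparing results is common practice.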

  • Find the best K

This package implements the method for choosing K described by Pham et al. It runs k-means repeatedly for different values of K and returns its best estimate of K, together with the set of means found during the evaluation.

import { phamBestK } from "@jbeuckm/k-means-js";

const maxKToTest = 10;
const result = phamBestK.findBestK(points, maxKToTest);

console.log(`this data has ${result.K} clusters`);
console.log("cluster centroids =", result.means);
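For reference, Pham, Dimov and Nguyen score each K with f(K) = S_K / (α_K · S_(K-1)), where S_K is the total distortion (sum of squared distances from points to their means) with K clusters, N_d is the number of dimensions, and α_K is a weight that grows with K; the smallest f(K) suggests the best K. The sketch below computes f(K) from a list of distortions. It illustrates the paper's formula and is not the package's `findBestK`; the function names and the `distortions` input format are assumptions.

```javascript
// Hypothetical sketch of the f(K) score from Pham et al.; not the package's findBestK.
// distortions[K - 1] holds S_K, the sum of squared distances from each point
// to its nearest mean after running k-means with K clusters.
function phamScores(distortions, dims) {
  const scores = [];
  let alpha = 0;
  for (let K = 1; K <= distortions.length; K++) {
    if (K === 1) {
      scores.push(1); // f(1) is defined as 1
      continue;
    }
    // Weight alpha_K as given in the paper (stated there for N_d > 1).
    alpha = K === 2 ? 1 - 3 / (4 * dims) : alpha + (1 - alpha) / 6;
    const prev = distortions[K - 2];
    // f(K) is defined as 1 when the previous distortion is zero.
    scores.push(prev === 0 ? 1 : distortions[K - 1] / (alpha * prev));
  }
  return scores;
}

// Pick the K with the smallest f(K).
function bestK(distortions, dims) {
  const scores = phamScores(distortions, dims);
  return scores.indexOf(Math.min(...scores)) + 1;
}
```

For example, with distortions `[100, 10, 9]` in two dimensions, α_2 = 0.625 and f(2) = 10 / 62.5 = 0.16, the smallest score, so K = 2 is chosen.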
  • Denormalize data

Denormalization maps the discovered means back to the original units of the data, which makes them easier to interpret:

// Meaningful only if the means were computed from data normalized with these ranges.
for (const mean of result.means) {
  console.log(dataset.denormalizeDatum(mean, ranges));
}
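Denormalization inverts the normalization: a scaled value x in [0, 1] maps back to min + x · (max − min), and a group of one-hot outputs maps back to the category with the largest output. For a converged mean, each one-hot output is the fraction of the cluster's points in that category, so the largest one picks the cluster's most common category. The sketch below is hypothetical, not the package's `denormalizeDatum`: it assumes a ranges shape of `{ min, max }` for linear fields and `{ values }` for discrete ones, laid out in the same field order as the vector, and it ignores field weights.

```javascript
// Hypothetical sketch; not the package's denormalizeDatum.
// ranges: { field: { min, max } } for linear fields or { field: { values: [...] } }
// for discrete fields, in the same order used to build the normalized vector.
function denormalizeSketch(vector, ranges) {
  const datum = {};
  let offset = 0;
  for (const [field, range] of Object.entries(ranges)) {
    if (range.values) {
      // Take the category whose one-hot output is largest.
      const slice = vector.slice(offset, offset + range.values.length);
      datum[field] = range.values[slice.indexOf(Math.max(...slice))];
      offset += range.values.length;
    } else {
      // Undo min-max scaling.
      datum[field] = range.min + vector[offset] * (range.max - range.min);
      offset += 1;
    }
  }
  return datum;
}
```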

Todo

  • denormalize data
  • provide ability to label data points, dimensions and means
  • build an asynchronous version of the algorithm
  • TypeScript

Versions

Current Tags

Version  Downloads (Last 7 Days)  Tag
0.6.1    2                        latest

Version History

Version  Downloads (Last 7 Days)  Published
0.6.1    2
0.6.0    0
0.5.0    1
0.2.1    1
0.2.0    0

Package Sidebar

Weekly Downloads: 4
Version: 0.6.1
License: none
Unpacked Size: 38.7 kB
Total Files: 17

Last publish

Collaborators

  • jbeuckm