k-means-plus

1.0.0 • Public • Published

major-colors

K-means++ clustering partitions data so that points assigned to the same cluster are similar (in some sense). It is identical to the standard k-means algorithm except for its careful selection of the initial centroids, which avoids the sometimes poor clusterings that standard k-means finds. This seeding addresses a major theoretical shortcoming of k-means and guarantees an approximation ratio of O(log k) in expectation (over the randomness of the algorithm), where k is the number of clusters used.
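To make the "careful selection of initial conditions" concrete, here is a minimal sketch of the usual k-means++ seeding rule: the first centroid is drawn uniformly, and each later one is drawn with probability proportional to its squared distance from the nearest centroid chosen so far. The names `squaredDistance` and `seedCentroids`, and the injectable `random` parameter, are assumptions made for illustration; this is not necessarily how the package implements seeding internally.

```javascript
// Squared Euclidean distance between two equal-length vectors.
function squaredDistance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Pick k initial centroids using k-means++ (squared-distance weighted) seeding.
// `random` is a hypothetical injectable source of numbers in [0, 1).
function seedCentroids(vectors, k, random = Math.random) {
  if (k < 1 || k > vectors.length) {
    throw new RangeError('k must be between 1 and vectors.length');
  }
  // First centroid: chosen uniformly at random.
  const centroids = [vectors[Math.floor(random() * vectors.length)]];
  // nearest[i] = squared distance from vectors[i] to its closest chosen centroid.
  const nearest = vectors.map((v) => squaredDistance(v, centroids[0]));

  while (centroids.length < k) {
    const total = nearest.reduce((acc, d) => acc + d, 0);
    let index = -1;
    if (total === 0) {
      // Every point coincides with a chosen centroid: fall back to a uniform pick.
      index = Math.floor(random() * vectors.length);
    } else {
      // Walk the positive weights until the random target is used up.
      let target = random() * total;
      for (let i = 0; i < vectors.length; i++) {
        if (nearest[i] === 0) continue; // already a centroid (or a duplicate of one)
        index = i; // also serves as the fallback if rounding leaves target >= 0
        target -= nearest[i];
        if (target < 0) break;
      }
    }
    const chosen = vectors[index];
    centroids.push(chosen);
    // Update each point's distance to its nearest centroid.
    for (let i = 0; i < vectors.length; i++) {
      nearest[i] = Math.min(nearest[i], squaredDistance(vectors[i], chosen));
    }
  }
  return centroids;
}
```

Because far-away points carry more weight, the initial centroids tend to be spread across the data, which is what the O(log k) guarantee relies on.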

Install

npm i k-means-plus
yarn add k-means-plus

Usage

import KMeansCluster from 'k-means-plus';

// Every constructor option is optional; see Constructor for the defaults.
const cluster = new KMeansCluster({ distance, maximumIterations, convergedFn });
const results = cluster.cluster(vectors, k);

Example

import KMeansCluster from 'k-means-plus';

const data = [
  {'company': 'Microsoft' , 'size': 91259, 'revenue': 60420},
  {'company': 'IBM' , 'size': 400000, 'revenue': 98787},
  {'company': 'Skype' , 'size': 700, 'revenue': 716},
  {'company': 'SAP' , 'size': 48000, 'revenue': 11567},
  {'company': 'Yahoo!' , 'size': 14000 , 'revenue': 6426 },
  {'company': 'eBay' , 'size': 15000, 'revenue': 8700},
];
 
// Build the 2D array of vectors ([size, revenue]) to cluster
const vectors = data.map(({ size, revenue }) => [size, revenue]);
 
const k = 4;
 
const cluster = new KMeansCluster();
 
const {
  model: { observations, centroids, assignments },
  iterations,
  durationMs
} = cluster.cluster(vectors, k);

Constructor

  • distance - (vector1: [number], vector2: [number]) => number. Distance function used to compare two vectors during clustering. Default: Euclidean distance.
  • maximumIterations - number. Maximum number of clustering iterations. Default: 200.
  • convergedFn - (centroids1: [[number]], centroids2: [[number]]) => boolean. Determines whether two consecutive sets of centroids have converged. Default: the clusters are converged if the cluster assignments are the same.
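As an illustration of the option signatures above, the following sketch defines a Manhattan distance and a tolerance-based convergence check. The names `manhattanDistance` and `makeConvergedFn`, the tolerance values, and the choice of 50 iterations are hypothetical. The convergence check also assumes the two centroid arrays are passed in the same order, which the README does not state.

```javascript
// Manhattan (L1) distance, matching (vector1, vector2) => number.
function manhattanDistance(vector1, vector2) {
  return vector1.reduce((sum, value, i) => sum + Math.abs(value - vector2[i]), 0);
}

// Build a function matching (centroids1, centroids2) => boolean that reports
// convergence once every centroid has moved by at most `tolerance`.
// Assumes both arrays list the centroids in the same order.
function makeConvergedFn(tolerance = 1e-6) {
  return (centroids1, centroids2) =>
    centroids1.length === centroids2.length &&
    centroids1.every((c, i) => manhattanDistance(c, centroids2[i]) <= tolerance);
}

const options = {
  distance: manhattanDistance,
  maximumIterations: 50,
  convergedFn: makeConvergedFn(1e-3),
};

// With the package installed, the options would be passed as in the Usage section:
// const cluster = new KMeansCluster(options);
```

A movement-based check like this can stop earlier than the default assignment-based one on large data sets, at the cost of a slightly less settled result.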

Inputs

  • vectors - [[number]]. Point vectors to cluster.
  • k - number. Number of final clusters.

Outputs

type result = {
  model: {
    observations: [[number]], // the original vectors to cluster
    centroids: [[number]], // vectors of the cluster centers
    assignments: [number] // maps the index of each original vector to the index of the cluster center it belongs to
  },
  iterations: number, // number of iterations run before converging
  durationMs: number // duration of the algorithm in milliseconds
}
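
Since `assignments` is index-aligned with `observations`, the two can be combined to list the members of each cluster. The sketch below assumes only the result shape documented above. The helper `groupByCluster` is hypothetical, and `sampleResult` is hand-written for illustration, not actual output from the library.

```javascript
// Group observations by the index of the centroid they are assigned to.
function groupByCluster({ observations, centroids, assignments }) {
  const groups = centroids.map(() => []);
  assignments.forEach((clusterIndex, i) => {
    groups[clusterIndex].push(observations[i]);
  });
  return groups;
}

// Hand-written stand-in for a value returned by cluster.cluster(vectors, k).
const sampleResult = {
  model: {
    observations: [[700, 716], [14000, 6426], [15000, 8700], [400000, 98787]],
    centroids: [[700, 716], [14500, 7563], [400000, 98787]],
    assignments: [0, 1, 1, 2],
  },
  iterations: 3,
  durationMs: 1,
};

const groups = groupByCluster(sampleResult.model);
console.log(groups);
```

Each entry of `groups` holds the vectors belonging to the centroid at the same index, so `groups[1]` here contains the two mid-sized companies.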

License

MIT

Unpacked Size

157 kB

Total Files

7

Collaborators

  • goldensunliu