dbscan_gps
A node.js packpage for clustering GPS coordinates by using DBSCAN algorithm.
Background
DBSCAN (https://en.wikipedia.org/wiki/DBSCAN), per its name suggested, is a density-based clustering algorithm. This module is developped specifically to cluster points with GPS coordinates (in decimal degree format for now) by using the algorithm. You can also modify the module to leverage your own distance function.
Through work, I have also used K-mean, G-means(described in https://github.com/haifengl/smile/), Hierarchical-clustering algorithms. I found that for small dataset of GPS coordinates, they do not work very well.
Pros and Cons
Pros
DBSCAN is a very simple yet effective algorithm to cluster data. You can see in the source code the implementation is less than 250 lines. It is also very versatile in clustering different types of data (assuming you can provide custom distance function).
Cons
The algorithm has a worst complexity of O(n^2) and an average of O(nlogn) complexity. This means if you have a large set of data, the algorithm could become slow. Another con is the fact that you have to specify the minimal number of points in a cluster as well as the epsilon (radius).
Example
var dataset = 3738049441 -12204046815 3738049441 -12204046815 3738049441 -12204046815 3779859258 -12242328555 3780167405 -12243965453 3778532559 -12240361793 3778229027 -12240579998 377982797 -12243723815 3742121791 -1222151668 3742121791 -1222151668 3732949423 -12190143483 3780125018 -12242455065 3733392574 -12200670057 3950234395 -11986786681 3778155398 -12239547014 3778155398 -12239547014 3119726586 3536302658 31477502 35386898 377840328 -12243300302 3779598335 -12240626251 3779827759 -12242747273 3779828489 -12244717598 3778393 -122433003 3778393 -122433003 3780009745 -122439658 3779647682 -12242203823 3782472266 -12245186823 3777660911 -12238905006 3761576117 -12238790989 3611603686 -11517426968 36118691 -11517585 36118691 -11517585 360836502 -11514985085 3761576117 -12238790989 37800006 -1224378754 37778665 -12241774077 3778789485 -12240747929 37792861 -122392363 3778393 -122433003 3775029014 -12220294476 377978607 -1224286596 3779379 -122393 37826832 -122491052 4064902844 -7378194377 3780023 -1224395 3777059339 -12243182761 3777059339 -12243182761 3776025 -12241215 3776023 -12241199 3776614415 -12242939294 3776617776 -12242937449 377661208 -12242946594 3799517 -12246531 3778919 -12240089 3776613125 -12242941954 3780867078 -12243037428 3952104 -1198131 3776493 -12242233 3779402 -12239907 3779399 -12239909 3775108 -12220049 3775635584 -12241927966 3776632708 -12242945496 3776613789 -12242936044 3777492013 -12243762416 377801 -12239021 3394437 -11840094 3406674903 -11844984965 3406664 -11844845 3406681667 -11844984147 3406819306 -11844247907 3406820579 -11844248629 3406817425 -11844253283 3780568636 -12245048073 3950265928 -11986663568 3777438431 -12242749737 3776612013 -12242920292 3407283094 -1184683974 3895660367 -774482698 3892988722 -7720645691 3780898321 -12243169989 3780859133 -12243139564 3780855077 -12243133356 3026290915 -9773613041 3752729797 -12225178528 3776648093 -12242950282 3771226741 -12221261144 377796211 -1223912277 3761865062 -12238100052 3950759943 -11986416337 3780066578 -12245820697 377976913 -1224329605 3761865062 -12238100052 199258027 -15588752502 3780782478 -12243127054 3778448656 -12240315955 340221 -118481 3864914252 -12151816104 407142 -740064 37793869 -1223948593 3950253933 -11986668932 3799457903 -12246467818 3787281873 -12245648861 372567848 -12203411448 3779917 -12244133 3802983 -784855 3798267 -7845517 388895 -7701067 3778492 -12241883 3777986 -12248336 3778 -12248333 3779633 -12239383 3952524 -11981432 3951017 -119805 3780053 -12243791 3780064372 -12243795387 3778097 -12240264 37775 -122418 var Clustering = ; var clustering = dataset 100 "mi" 2; clustering
The points used in the example displayed on a map:
After clustering
In total, there are 4 clusters predicted
About the example
These points are extracted from an individual's Facebook check-ins. Some of the points are discarded as noise points. For example, the one in Hawaii or the few ones in Africa.
var clustering = dataset 100 "mi" 2;
In this particular example, we are interested in limiting the radius to 100 Miles radius (you can also use "km" to specify that you are interested in kilometers) and 2 minimal points to form a cluster.
API
var clustering = new Clustering(dataset, eps, eps_unit, minpts);
@param {array} dataset: an array of points, each point is shaped as [lat, lon]; The lat and lon must be in Decimal Degrees (DD) format
@param {integer} eps: epsilon, the radius of the cluster
@param {string} eps_unit: in either kilometers "km" or miles "mi"
@param {integer} minPts: the minimal number of points in a cluster
clustering.fit(function(e, clusters){ }
@callback {function} callback: (error, clusters):
error is an error description in string type,
clusters is an array of clusters; each cluster is another array of points with index coorresponding to the index in the input dataset
Future Improvement
- Add parameter to allow custom distance function
- Add support for other formats of GPS coordinates such as Degrees and decimal minutes (DMM) and Degrees, minutes, and seconds (DMS)