The TK Toolkit
A work-in-progress collection of utilities to help with reading, transforming and writing data. It's a collection of the following libraries:
- Indian Ocean for reading and writing data files (json, csv or a variety of delimited-value files) — converts spreadsheet-like data to json.
- Tablespoon for creating sqlite and postgresql databases from json objects.
- Joiner for doing left joins on two json or geojson objects.
- Simple statistics for performing simple stats operations like average, mean and some less-simple things like variance, jenks clustering, t-tests and bayesian classification.
Installation
npm install tktk
Functions
See the indian-ocean documentation for the most up-to-date list.
- Reading data
- .readData(filepath, [delimiter], callback)
- .readDataSync(filepath, [delimiter])
- .readJson(filepath, callback)
- .readJsonSync(filepath)
- .readCsv(filepath, callback)
- .readCsvSync(filepath)
- .readTsv(filepath, callback)
- .readTsvSync(filepath)
- .readPsv(filepath, callback)
- .readPsvSync(filepath)
- .readDbf(filepath, callback)
- Writing data
- Joining data
- .join.left(leftData, leftDataKey, rightData, rightDataKey, [nestedKeyName])
- .join.geoJson(leftData, leftDataKey, rightData, rightDataKey)
- Creating a database
- .db.sqlite()
- .db.pgsql(dbConnectionString)
- .db.createTable(dataobject, [tablename], [tableschema], [permanent])
- .db.createTableCommands(dataobject, [tablename], [tableschema], [permanent], [skipinsert])
- .db.createEmptyTable(dataobject, [tablename], [tableschema], [permanent])
- .db.insert(dataobject, [tablename])
- Querying a database
- .db.query(queryString, function)
- .db.query.each(queryString, function)
- .db.queries(list, function)
- .db.queries.each(list, function)
- Statistics
- .stats
- Helpers
Reading data
Uses the indian-ocean module. Reads a variety of data file formats in as json.
.readData(filepath, [delimiter], callback)
Reads in a data file given a path ending in the file format. Callback structure is function(err, data).
Supported formats:
- .json: Array of objects
- .csv: Comma-separated
- .tsv: Tab-separated
- .psv: Pipe-separated
Pass in a delimiter as the second argument to read in another format.
Note: Does not currently support .dbf files.
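To make the extension-to-delimiter mapping concrete, here is a minimal sketch in plain Node.js. It is not indian-ocean's implementation: parseDelimited, parseByPath and the DELIMITERS table are hypothetical names, and the parser is deliberately naive (no quoted fields, no type conversion).

```javascript
var path = require('path');

// Hypothetical lookup table: file extension -> field delimiter.
var DELIMITERS = { '.csv': ',', '.tsv': '\t', '.psv': '|' };

// Naive delimited-text parser: the first row is the header; quoted fields are not supported.
function parseDelimited(text, delimiter) {
  var lines = text.split(/\r?\n/).filter(function (line) { return line.length > 0; });
  if (lines.length === 0) return [];
  var header = lines[0].split(delimiter);
  return lines.slice(1).map(function (line) {
    var cells = line.split(delimiter);
    var row = {};
    header.forEach(function (name, i) { row[name] = cells[i]; });
    return row;
  });
}

// Parse `text` as if it had been read from `filepath`; an explicit delimiter wins.
function parseByPath(filepath, text, delimiter) {
  var ext = path.extname(filepath).toLowerCase();
  if (!delimiter && ext === '.json') return JSON.parse(text);
  var chosen = delimiter || DELIMITERS[ext];
  if (!chosen) throw new Error('No delimiter known for ' + filepath);
  return parseDelimited(text, chosen);
}

var rows = parseByPath('data.psv', 'name|city\nAda|London\n');
console.log(rows); // [ { name: 'Ada', city: 'London' } ]
```

Passing a third argument, as in parseByPath('data.txt', 'a;b\n1;2', ';'), mirrors the optional delimiter described above. All values come back as strings in this sketch.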
.readDataSync(filepath, [delimiter])
Synchronous version of .readData().
.readJson(filepath, callback)
Read in a json file. Callback structure is function(err, data).
.readJsonSync(filepath)
Read json synchronously.
.readCsv(filepath, callback)
Read in a comma-separated value file. Callback structure is function(err, data).
.readCsvSync(filepath)
Read csv synchronously.
.readTsv(filepath, callback)
Read in a tab-separated value file. Callback structure is function(err, data).
.readTsvSync(filepath)
Read tsv synchronously.
.readPsv(filepath, callback)
Read in a pipe-separated value file. Callback structure is function(err, data).
.readPsvSync(filepath)
Read psv synchronously.
.readDbf(filepath, callback)
Read in a .dbf file. Callback structure is function(err, data).
Writing data
Uses the indian-ocean module. Writes json objects to the specified format.
.writeData(filepath, data, callback)
Write out the data object, inferring the file format from the file ending specified in filepath. Callback structure is function(err, data).
Supported formats:
- .json: Array of objects
- .csv: Comma-separated
- .tsv: Tab-separated
- .psv: Pipe-separated
Note: Does not currently support .dbf files.
.writeDataSync(filepath, data)
Synchronous version of .writeData.
.writeDbfToData(inFilepath, outFilepath, callback)
Reads in a dbf file with .readDbf and writes it to a file using .writeData. Callback structure is function(err).
Joining data
Uses the joiner module. All methods return an object with the following structure:
data: data object
report:
  diff:
    a: data in A
    b: data in B
    a_and_b: data in A and B
    a_not_in_b: data in A not in B
    b_not_in_a: data in B not in A
  prose:
    summary: summary description of join result (number of matches in A and B, A not in B, B not in A)
    full: full list of which rows were joined in each of the above categories
.left(leftData, leftDataKey, rightData, rightDataKey, [nestedKeyName])
Perform a left join on the two array-of-object json datasets. Optionally, you can pass in a key name in case the left data's attribute dictionary is nested, such as in GeoJSON, where the attributes are under a properties object.
.geoJson(leftData, leftDataKey, rightData, rightDataKey)
Does the same thing as .left but navigates to the features array and passes in properties as the nested key name.
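The two join methods can be illustrated with a small standalone sketch. This is not joiner's code: leftJoin, geoJsonJoin and getAttrs are hypothetical names, the report is reduced to two lists of key values, and copying every field of the matching right row onto the left row is an assumption about the behavior.

```javascript
// Return the attribute dictionary of a row, optionally nested under `nestedKeyName`.
function getAttrs(row, nestedKeyName) {
  return nestedKeyName ? row[nestedKeyName] : row;
}

// Left join: every left row is kept; fields of a matching right row are copied onto it.
function leftJoin(leftData, leftKey, rightData, rightKey, nestedKeyName) {
  var rightByKey = new Map();
  rightData.forEach(function (row) { rightByKey.set(row[rightKey], row); });

  var matched = [];
  var unmatched = [];
  var data = leftData.map(function (row) {
    var attrs = getAttrs(row, nestedKeyName);
    var keyValue = attrs[leftKey];
    var match = rightByKey.get(keyValue);
    (match ? matched : unmatched).push(keyValue);
    var joinedAttrs = Object.assign({}, attrs, match || {});
    if (!nestedKeyName) return joinedAttrs;
    // Copy the row so the input is not mutated, then swap in the joined attributes.
    var copy = Object.assign({}, row);
    copy[nestedKeyName] = joinedAttrs;
    return copy;
  });
  return { data: data, report: { diff: { a_and_b: matched, a_not_in_b: unmatched } } };
}

// GeoJSON variant: join on each feature's `properties` object.
function geoJsonJoin(leftGeo, leftKey, rightData, rightKey) {
  var result = leftJoin(leftGeo.features, leftKey, rightData, rightKey, 'properties');
  return {
    data: Object.assign({}, leftGeo, { features: result.data }),
    report: result.report
  };
}

// Dummy data for illustration only.
var left = [{ id: 1, name: 'a' }, { id: 2, name: 'b' }];
var right = [{ ref: 1, score: 10 }];
var joined = leftJoin(left, 'id', right, 'ref');
console.log(joined.data);
console.log(joined.report.diff);
```

Keys are compared without type coercion here, so an id read from a csv file as the string '1' would not match the number 1.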
Database operations
Uses the tablespoon module. Check out the wiki for the full documentation. All tablespoon methods are accessible under the tk.db namespace, e.g.
tk.db.sqlite();
Statistics
Uses the simple-statistics module. All simple-statistics methods are accessible under the tk.stats namespace, e.g.
var mean = tk.stats.mean(values);
Helpers
.discernFormat(filepath)
Given a filepath, return its file extension. Used internally by .discernParser and .discernFileFormatter.
E.g. tk.discernFormat('path/to/data.csv') returns 'csv'.
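A helper like this can be approximated with Node's built-in path module. The sketch below is an assumption about the behavior, not the library's source, and discernFormatSketch is a hypothetical name.

```javascript
var path = require('path');

// Return the extension of `filepath` without the leading dot.
function discernFormatSketch(filepath) {
  return path.extname(filepath).slice(1);
}

console.log(discernFormatSketch('path/to/data.csv')); // prints: csv
```

Only the last extension is considered, so 'archive.tar.gz' yields 'gz' and a path with no extension yields an empty string.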
.discernParser(filepath, [delimiter])
Given a filepath and, optionally, a delimiter, return a parser that can read that file as json. Used internally by .readData and .readDataSync.
E.g.
var csvParser = tk.discernParser('path/to/data.csv');
var json = csvParser(csvString);
.discernFileFormatter(filepath)
Returns a formatter that will format json data to the file type specified by the extension in filepath. Used internally by .writeData and .writeDataSync.
E.g.
var formatter = tk.discernFileFormatter('path/to/data.csv');
var csv = formatter(json);
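As a rough picture of what choosing a formatter by extension can look like, here is a standalone sketch. The names discernFileFormatterSketch, delimitedFormatter and FORMATTERS are hypothetical, and the output is naive: values are not quoted or escaped, and the header is taken from the first row only.

```javascript
var path = require('path');

// Build a formatter that turns an array of flat objects into delimited text.
function delimitedFormatter(delimiter) {
  return function (rows) {
    if (rows.length === 0) return '';
    var header = Object.keys(rows[0]);
    var lines = rows.map(function (row) {
      return header.map(function (name) { return row[name]; }).join(delimiter);
    });
    return [header.join(delimiter)].concat(lines).join('\n');
  };
}

// Hypothetical lookup table: file extension -> formatter function.
var FORMATTERS = {
  '.json': function (rows) { return JSON.stringify(rows); },
  '.csv': delimitedFormatter(','),
  '.tsv': delimitedFormatter('\t'),
  '.psv': delimitedFormatter('|')
};

// Pick a formatter from the extension in `filepath`.
function discernFileFormatterSketch(filepath) {
  var formatter = FORMATTERS[path.extname(filepath).toLowerCase()];
  if (!formatter) throw new Error('No formatter known for ' + filepath);
  return formatter;
}

var formatter = discernFileFormatterSketch('path/to/data.csv');
var csv = formatter([{ name: 'Ada', city: 'London' }]);
console.log(csv);
```

Returning a function rather than a string keeps the format decision separate from the actual write, so the same formatter could be reused for several datasets.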
fs
Exposes the native File System module for convenience.
What's the name mean?
In news writing, TK is used as a placeholder for facts or sections you don't have yet. For example:
Mrs. Williamson arrived at the office at TK EXACT TIME to speak with the board members.
Depending on whom you ask, it stands for either TO COME, if you like your acronyms phonetic, or TO KNOW, if you don't mind the silent 'K'.
What's that have to do with this?
This library is a work in progress, so it's largely TO COME. You could also say you can use it TO KNOW things, since it's a collection of data utilities. Or you could say it's a (T)ool(K)it of toolkits: a TK TK.