timothy: Node.js library for writing Hadoop MapReduce jobs in JS
Timothy's primary goal is to make Hadoop's Yellow Elephant rich and famous.
Installation
npm install timothy
Basic Example
// require timothy
// basic configuration for the job: hadoop conf, input, output, name, etc.
// map function: one (line) or two (key, value) arguments
// reduce function: two arguments (key, value)
// run function: creates the job, uploads it and blocks until
// the execution has finished
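The map comment above says the function may take either one argument (the whole line) or two (a key and a value). The sketch below shows one way such a dispatch could work. It assumes Hadoop streaming's default convention: the key and value on an input line are separated by the first tab, and a line without a tab is treated as a key with an empty value. The helper `invokeMap` and the `emit` method on the context object are hypothetical and are not taken from timothy's API.

```javascript
// Hypothetical helper: calls a user map function with either the raw line
// (one-argument form) or a key/value pair (two-argument form).
// Assumes Hadoop streaming's default tab separator between key and value.
function invokeMap(mapFn, line, context) {
  if (mapFn.length >= 2) {
    const tab = line.indexOf('\t');
    // A line without a tab is treated as a key with an empty value.
    const key = tab === -1 ? line : line.slice(0, tab);
    const value = tab === -1 ? '' : line.slice(tab + 1);
    return mapFn.call(context, key, value);
  }
  return mapFn.call(context, line);
}

// Usage: record what each form of map function receives.
const seen = [];
const context = { emit: function (k, v) { seen.push([k, v]); } };

invokeMap(function (line) { this.emit('line', line); }, 'a\tb', context);
invokeMap(function (key, value) { this.emit(key, value); }, 'a\tb', context);
```

The choice is based on `Function.prototype.length`, the number of declared parameters. This is a simple way to support both signatures without an extra configuration flag.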
Testing on the local machine
// runLocal can be used instead of run to simulate the job execution
// from the command line
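A local simulation must reproduce the step Hadoop performs between the two phases. The mapper output is sorted by key, and each reduce call sees all the values emitted for one key. The sketch below approximates that flow for one-argument map functions. It is not how runLocal is implemented. The `emit` context method and the assumption that reduce receives an array of values are illustrative choices.

```javascript
// Hypothetical local simulation of a MapReduce job: map, sort by key, group, reduce.
function simulateJob(lines, mapFn, reduceFn) {
  const pairs = [];
  const mapContext = { emit: function (key, value) { pairs.push([String(key), value]); } };
  lines.forEach(function (line) { mapFn.call(mapContext, line); });

  // Hadoop sorts the map output by key before the reduce phase.
  pairs.sort(function (a, b) { return a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0; });

  const output = [];
  const reduceContext = { emit: function (key, value) { output.push(key + '\t' + value); } };
  let i = 0;
  while (i < pairs.length) {
    const key = pairs[i][0];
    const values = [];
    // Collect the run of consecutive pairs that share this key.
    while (i < pairs.length && pairs[i][0] === key) {
      values.push(pairs[i][1]);
      i++;
    }
    reduceFn.call(reduceContext, key, values);
  }
  return output;
}

// Usage: a word count over two lines.
const result = simulateJob(
  ['the cat', 'the dog'],
  function (line) {
    const words = line.split(' ');
    for (let i = 0; i < words.length; i++) this.emit(words[i], 1);
  },
  function (word, counts) { this.emit(word, counts.length); }
);
```

The output uses tab-separated `key\tvalue` lines, which matches the text format Hadoop streaming produces.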
Initialising a job
// global variables and functions defined in the setup function will be
// available in the map and reduce functions
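The caveats in this README say the setup, map and reduce functions are used as templates. One way a library could make declarations from setup visible to map and reduce is to paste the function bodies into a single generated script. The sketch below assumes that approach and runs the script with Node's built-in `vm` module. `functionBody`, `buildScript` and `runGenerated` are hypothetical names, and this is not necessarily how timothy builds its job.

```javascript
const vm = require('vm');

// Hypothetical helper: returns the text between the outermost braces of a
// function's source, i.e. its body.
function functionBody(fn) {
  const src = fn.toString();
  return src.slice(src.indexOf('{') + 1, src.lastIndexOf('}'));
}

// Builds one script in which the setup body runs at top level, so its
// `var` and function declarations become globals of the script. The map
// function is then re-created from its source text and applied to each line.
function buildScript(setupFn, mapFn) {
  return functionBody(setupFn) + '\n' +
    'var map = ' + mapFn.toString() + ';\n' +
    'var out = []; lines.forEach(function (l) { out.push(map(l)); });';
}

function runGenerated(setupFn, mapFn, lines) {
  const sandbox = { lines: lines };
  vm.runInNewContext(buildScript(setupFn, mapFn), sandbox);
  // Top-level `var out` becomes a property of the sandbox; copy it into this realm.
  return Array.from(sandbox.out);
}

// Usage: map relies on names that exist only once the setup body has run.
const setup = function () {
  var prefix = 'word:';
  function normalise(s) { return s.trim().toLowerCase(); }
};
const map = function (line) { return prefix + normalise(line); };

const generated = runGenerated(setup, map, [' Hello ', 'WORLD']);
```

Calling `map` directly in this file would throw a `ReferenceError`. It only works inside the generated script, where the setup declarations are globals.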
Passing Environment Variables
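Environment variables are passed to the job through the 'cmdenv' configuration option. Inside the map and reduce functions they can be read from process.env.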
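Hadoop streaming's `-cmdenv` option takes a `name=value` pair and sets it as an environment variable for the task processes. The sketch below shows both sides: turning a set of values into streaming arguments, and reading one back in a map-style function. It assumes the values are given as a plain object, which may differ from the format timothy's 'cmdenv' option actually expects. `toCmdenvArgs`, `keepLongLines` and `THRESHOLD` are hypothetical.

```javascript
// Hypothetical helper: converts { NAME: value } into Hadoop streaming
// arguments of the form -cmdenv NAME=value.
function toCmdenvArgs(vars) {
  const args = [];
  Object.keys(vars).forEach(function (name) {
    args.push('-cmdenv', name + '=' + vars[name]);
  });
  return args;
}

// Inside the job, the value arrives as a string in the process environment,
// so it has to be parsed before use.
function keepLongLines(line) {
  const threshold = parseInt(process.env.THRESHOLD || '0', 10);
  return line.length > threshold;
}

const args = toCmdenvArgs({ THRESHOLD: 3, MODE: 'strict' });
// args: ['-cmdenv', 'THRESHOLD=3', '-cmdenv', 'MODE=strict']
```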
Using node libraries
// Libraries can be added using the same syntax as
// in an NPM package.json file
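In a package.json file, `dependencies` is an object that maps each package name to a version range. The sketch below assumes the same object describes the job's libraries. It shows how that object could be written into a minimal package.json for the bundle shipped to the cluster. `buildJobManifest`, the job name and the version ranges are illustrative only.

```javascript
// Hypothetical helper: wraps a dependency map in a minimal package.json
// so that `npm install` could fetch the libraries for the job bundle.
function buildJobManifest(jobName, dependencies) {
  return JSON.stringify({
    name: jobName,
    version: '0.0.0',
    private: true,
    dependencies: dependencies
  }, null, 2);
}

const manifest = buildJobManifest('word-count-job', {
  underscore: '1.3.x', // version ranges use the usual npm semver syntax
  async: '>=0.1.0'
});
```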
Status and counters
The job's status and counters can be updated with the this.updateStatus and this.updateCounter functions.
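Hadoop streaming lets a task update its status and counters by writing specially formatted lines to stderr: `reporter:status:<message>` and `reporter:counter:<group>,<counter>,<amount>`. The sketch below shows how functions like updateStatus and updateCounter could be built on that protocol. The argument order `(group, counter, amount)` and the default increment of 1 are assumptions, not timothy's documented signatures, and `makeReporter` is a hypothetical name.

```javascript
// Hypothetical reporter built on Hadoop streaming's stderr protocol.
// `write` defaults to writing to process.stderr; it can be replaced for testing.
function makeReporter(write) {
  write = write || function (s) { process.stderr.write(s); };
  return {
    updateStatus: function (message) {
      write('reporter:status:' + message + '\n');
    },
    // Assumed signature: group, counter name, amount (defaults to 1).
    updateCounter: function (group, counter, amount) {
      const inc = amount === undefined ? 1 : amount;
      write('reporter:counter:' + group + ',' + counter + ',' + inc + '\n');
    }
  };
}

// Usage: capture the lines instead of writing them to stderr.
const lines = [];
const reporter = makeReporter(function (s) { lines.push(s); });
reporter.updateStatus('processing chunk 1');
reporter.updateCounter('WordCount', 'words', 5);
reporter.updateCounter('WordCount', 'lines');
```

These lines go to stderr rather than stdout because stdout carries the job's key/value output.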
Caveats
The map, reduce and setup functions are used as templates for the job functions: only their code is reused, not the function objects and the scopes they close over. Values captured from those scopes (their closures) will therefore not be available when the actual job runs. Use the 'cmdenv' configuration option to pass values to the job instead.
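The closure caveat can be reproduced with Node's built-in `vm` module. The sketch below assumes, as the word "template" suggests, that only a function's source text reaches the job. Recreating the function from that text in a fresh context drops any variable it had captured. `makeMapper` and `recreateFromSource` are hypothetical names.

```javascript
const vm = require('vm');

// A map function that captures `minLength` from its defining scope.
function makeMapper() {
  const minLength = 3;
  return function (line) { return line.length >= minLength; };
}
const mapper = makeMapper();

// Hypothetical stand-in for shipping a template: only the source text is kept
// and the function is re-created in a fresh context, so the closure is lost.
function recreateFromSource(fn) {
  return vm.runInNewContext('(' + fn.toString() + ')');
}

const direct = mapper('hello'); // works here: the closure is still available
let shippedError = null;
try {
  recreateFromSource(mapper)('hello');
} catch (err) {
  // The error comes from another realm, so compare by name, not instanceof.
  shippedError = err.name;
}
```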
At the moment, the setup function does not wait for asynchronous operations to complete. If setup starts one, the script will go on to execute the map/reduce function before the asynchronous callback has run.
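The ordering problem can be reproduced in plain Node.js without Hadoop. In the sketch below, `setImmediate` stands in for any asynchronous operation, such as reading a lookup file. The names `setup`, `map` and `lookupTable` are illustrative, and the sketch does not call timothy.

```javascript
// Illustration of the caveat: setup starts an asynchronous load, but map is
// invoked synchronously right afterwards, before the callback has run.
const events = [];
let lookupTable = null; // meant to be filled in by setup

function setup() {
  // setImmediate defers the callback until the current synchronous code finishes.
  setImmediate(function () {
    lookupTable = { cat: 'animal' };
    events.push('setup callback ran');
  });
}

function map(line) {
  events.push('map(' + line + ') saw ' + JSON.stringify(lookupTable));
}

setup();
map('cat');
// events is now ['map(cat) saw null']; the callback only runs later.
```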
About
Forward Internet Group (c) 2012. Available under the LGPL V3 license.
forward-timothy@googlegroups.com, abhinay.mehta@forward.co.uk, antonio.garrote@forward.co.uk