ETL-JS CLI
Extract, Transform, and Load: sharable and repeatable, from the command line.
mkdir my-etl && cd my-etl
# Initialize
etl-js init
# Use local executor
sed -i "s/executor: remote1/executor: local1/" settings.yml
# Create a simple template downloading Orion nebula from NASA site
echo -e "etlSets:\n  default:\n    - step1\nstep1:\n  files:\n    /tmp/orion-nebula.jpg:\n      source: https://www.nasa.gov/sites/default/files/thumbnails/image/orion-nebula-xlarge_web.jpg" > download_orion_nebula.yml
# Run template
etl-js run download_orion_nebula.yml
The image is downloaded locally to /tmp/orion-nebula.jpg.
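The `echo -e` one-liner above is compact but hard to read, and YAML is sensitive to indentation. As a minimal sketch, the same template can be written with a here-document. The two-space indentation and the scratch directory created with `mktemp` are assumptions made for illustration; the template content itself is taken from the command above.

```shell
# Work in a scratch directory so nothing in the current project is touched.
workdir="$(mktemp -d)"
template="$workdir/download_orion_nebula.yml"

# The quoted 'EOF' delimiter prevents the shell from expanding anything inside.
cat > "$template" <<'EOF'
etlSets:
  default:
    - step1
step1:
  files:
    /tmp/orion-nebula.jpg:
      source: https://www.nasa.gov/sites/default/files/thumbnails/image/orion-nebula-xlarge_web.jpg
EOF

# Print the file to check it; it can then be passed to: etl-js run "$template"
cat "$template"
```

A here-document keeps the nesting visible, which makes indentation mistakes easier to spot than in an escaped one-line string.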
Installation
npm install --global @lpezet/etl-js-cli
Features
- Template-based process using YML to express steps and activities as part of ETL
- Built-in modules to leverage already installed software (e.g. mysql, mysqlimport, etc.)
- Dynamic behavior through the use of tags in activities.
Concept
This command line tool lets you tap into the power of ETL-JS. The idea is to be able to share and easily repeat activities, and leverage existing tools as much as possible.
Steps and activities are basically specified in YML like so:
etlSets:
  default:
    - activity1
    - activity2
  somethingelse:
    - activity3
activity1:
  commands:
    my_command:
      command: echo "Hello..."
activity2:
  commands:
    something:
      command: echo "World!"
activity3:
  commands:
    bye_bye:
      command: echo "Bye bye!"
For more details, have a look at ETL-JS.
Getting started
You can get started right away and figure things out along the way. If anything is unclear or confusing, it is best to take a look at ETL-JS. The sample template only uses the Commands Mod.
Once etl-js-cli has been installed, run the following:
mkdir etl-js-test
cd etl-js-test
etl-js init
At this point, two new files have been created in your current directory:
- etl.yml: a sample ETL template showing some of the ETL-JS features.
- settings.yml: the settings etl-js-cli loads to run any ETL template.
You can open etl.yml in your favorite editor and see its content.
It should have something similar to this:
etlSets:
  default: ["hello", "world"]
  envTest: ["envTest"]
  varTest: ["varTest"]
hello:
  commands:
    say_hello:
      command: echo "Hello..."
world:
  commands:
    say_world:
      command: echo "...world!"
envTest:
  commands:
    with_env:
      command: 'echo "The value for env variable ''TESTENV'': {{env.TESTENV}}"'
varTest:
  commands:
    001_create_var:
      command: printf "hello"
      var: TESTVAR
    002_use_var:
      command: 'echo "The value for var TESTVAR: {{vars.TESTVAR}}"'
It provides three different ETL Sets:
- default: if no ETL Set name is passed to etl-js-cli, this is the one it runs.
- envTest: this ETL Set demonstrates how environment variables can be used in the template.
- varTest: this ETL Set demonstrates how variables generated by other steps can be used within the template.
First run
To execute your first ETL template simply run the following:
etl-js run etl.yml
This will execute the default ETL Set. By default, the complete result of all activities executed by this ETL Set is output, along with other log messages:
{ exit: false,
activities:
[ { activity: 'hello',
steps:
{ commands:
{ exit: false,
skip: false,
results:
[ { command: 'say_hello',
results:
{ exit: false,
pass: true,
skip: false,
_stdout: 'Hello...\n',
_stderr: '',
result: 'Hello...\n' },
exit: false,
skip: false } ] } },
exit: false,
skip: false
},
{ activity: 'world',
steps:
{ commands:
{ exit: false,
skip: false,
results:
[ { command: 'say_world',
results:
{ exit: false,
pass: true,
skip: false,
_stdout: '...world!\n',
_stderr: '',
result: '...world!\n' },
exit: false,
skip: false } ] } },
exit: false,
skip: false } ] }
This ETL Set consists of two simple commands that echo "Hello..." and "...world!".
Environment variables
To see how environment variables work, first run the following:
etl-js run etl.yml envTest
By specifying envTest, we are asking etl-js-cli to only run the ETL Set envTest.
The final output should look like the following:
{ exit: false,
activities:
[ { activity: 'envTest',
steps:
{ commands:
{ exit: false,
skip: false,
results:
[ { command: 'with_env',
results:
{ exit: false,
pass: true,
skip: false,
_stdout: 'The value for env variable \'TESTENV\': \n',
_stderr: '',
result: 'The value for env variable \'TESTENV\': \n' },
exit: false,
skip: false } ] } },
exit: false,
skip: false } ] }
Notice that the standard output did not include a value for the environment variable TESTENV.
This is because the TESTENV environment variable was not set when we ran the previous command.
We can set it using the env command on Linux like so:
env TESTENV="Hello world!" etl-js run etl.yml envTest
The output should then look like the following:
{ exit: false,
activities:
[ { activity: 'envTest',
steps:
{ commands:
{ exit: false,
skip: false,
results:
[ { command: 'with_env',
results:
{ exit: false,
pass: true,
skip: false,
_stdout: 'The value for env variable \'TESTENV\': Hello world!\n',
_stderr: '',
result: 'The value for env variable \'TESTENV\': Hello world!\n' },
exit: false,
skip: false } ] } },
exit: false,
skip: false } ] }
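The behavior of `env` itself can be checked without etl-js. The following is a minimal sketch in plain shell: `sh -c` stands in for any child process (such as `etl-js run`), and the helper variable names `unset_value`, `set_value`, and `after_value` are made up for this illustration. It shows that the variable is empty when unset, visible to the command launched by `env`, and not left behind in the calling shell.

```shell
# Start from a clean state so the demonstration is predictable.
unset TESTENV

# Without the variable, the child process sees an empty value.
unset_value="$(sh -c 'printf "%s" "$TESTENV"')"

# env sets TESTENV only in the environment of the command it launches.
set_value="$(env TESTENV="Hello world!" sh -c 'printf "%s" "$TESTENV"')"

# Back in the calling shell, TESTENV is still unset.
after_value="${TESTENV:-}"

printf 'unset: [%s]\nset: [%s]\nafter: [%s]\n' "$unset_value" "$set_value" "$after_value"
```

Because the variable only lives for the duration of that one command, each run can be given a different value without touching the shell profile.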
Variables
The varTest ETL Set in etl.yml is as follows:
varTest:
  commands:
    001_create_var:
      command: printf "hello"
      var: TESTVAR
    002_use_var:
      command: 'echo "The value for var TESTVAR: {{vars.TESTVAR}}"'
It consists of two commands:
- 001_create_var: this command prints a piece of text (using printf) and saves its output into a variable named TESTVAR.
- 002_use_var: this command echoes another piece of text, which includes a tag for the variable TESTVAR.
You can run this ETL set and see its output:
etl-js run etl.yml varTest
The final output should look like this:
{ exit: false,
activities:
[ { activity: 'varTest',
steps:
{ commands:
{ exit: false,
skip: false,
results:
[ { command: '001_create_var',
results:
{ exit: false,
pass: true,
skip: false,
_stdout: 'hello',
_stderr: '',
result: 'hello' },
exit: false,
skip: false },
{ command: '002_use_var',
results:
{ exit: false,
pass: true,
skip: false,
_stdout: 'The value for var TESTVAR: hello\n',
_stderr: '',
result: 'The value for var TESTVAR: hello\n' },
exit: false,
skip: false } ] } },
exit: false,
skip: false } ] }
You can see how the variable has been resolved using the output of the first command.
NB: Something worth noticing here is that the echo command always adds a newline at the end of the text by default. Simply calling echo hello doesn't output just hello but hello\n. Here we are using printf instead, which does not behave like echo and does not generate this newline.
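The difference is easy to verify in a terminal by counting the bytes each command writes. This is a small sketch in plain shell, independent of etl-js; the variable names `echo_bytes` and `printf_bytes` are only for illustration.

```shell
# Count the bytes each command writes; tr strips the padding some wc versions add.
echo_bytes="$(echo hello | wc -c | tr -d ' ')"
printf_bytes="$(printf hello | wc -c | tr -d ' ')"

echo "echo wrote $echo_bytes bytes"      # 6: the five letters plus the newline
echo "printf wrote $printf_bytes bytes"  # 5: no trailing newline
```

This matters when the output is saved into a variable, as with TESTVAR above: a trailing newline would become part of the value.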
Examples/Tutorials
Examples and tutorials can be found here.
License
Publishing
To publish the next version of etl-js-cli, run the following:
npm version patch
git push --tags origin master
npm run dist
npm publish dist/ --access public