Metadata Editor Import (r-factory)
R integration module of the Metadata Editor application. It contains R scripts, Node-based R utility methods, and test cases. The module passes data from Node.js to R using PanApps' customized version of r-script. It provides data import and export features as well as data analysis features such as resequencing variables and spreading metadata. The features are:
- import datasets from SPSS, STATA, and CSV files
- export dataset to different file formats
- destring string variables
- resequence variables
- spread metadata
- export to dictionary
- update variable status
- calculate variable statistics
Prerequisites
Install Node and R if they are not already installed. On Windows, also set the corresponding environment variables.
- Node 10.15.1
- R version 3.3.3
Check that the R packages listed below are installed, and check their versions. If a package is missing, install it with the command install.packages("package_name").
R packages
- jsonlite (version: 1.3)
- haven (version: 1.1.0)
- plyr (version: 1.8.4)
- stringr (version: 1.2.0)
- labelled (version: 1.0.0)
- readr (version: 1.1.1)
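The version check can be partly automated. The following is a minimal sketch, not part of the module: it assumes you have already collected the installed package versions into a plain object (for example from the output of R's installed.packages()), and it compares them with the versions listed above. The names requiredPackages, compareVersions and findPackageProblems are hypothetical. This README does not say whether a newer version is acceptable, so the sketch only reports differences.

```javascript
// Required versions, copied from the list above.
const requiredPackages = {
  jsonlite: "1.3",
  haven: "1.1.0",
  plyr: "1.8.4",
  stringr: "1.2.0",
  labelled: "1.0.0",
  readr: "1.1.1",
};

// Compare two version strings part by part.
// Parts are split on "." or "-"; missing parts count as 0, so "1.3" equals "1.3.0".
function compareVersions(a, b) {
  const pa = a.split(/[.-]/).map(Number);
  const pb = b.split(/[.-]/).map(Number);
  const len = Math.max(pa.length, pb.length);
  for (let i = 0; i < len; i++) {
    const diff = (pa[i] || 0) - (pb[i] || 0);
    if (diff !== 0) return diff < 0 ? -1 : 1;
  }
  return 0;
}

// `installed` is a hypothetical map of package name -> installed version string.
// Returns one entry per package that is missing or has a different version.
function findPackageProblems(required, installed) {
  const problems = [];
  for (const [name, wanted] of Object.entries(required)) {
    const found = installed[name];
    if (found === undefined) {
      problems.push({ name, wanted, found: null, reason: "missing" });
    } else if (compareVersions(found, wanted) !== 0) {
      problems.push({ name, wanted, found, reason: "version differs" });
    }
  }
  return problems;
}
```

An empty result means every listed package was found at the listed version.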
Installation
Install the dependencies and devDependencies.
npm install
Build the application
npm run build
Test the application
npm run test
Publish the application to npm
npm publish --access public
Running the tests
Unit tests are written for each feature. Copy input files to the test-data/input directory. The commands for running the unit tests are listed below.
Note: Start the editor before running the tests. The editor starts the OpenCPU API server, which the unit tests use.
npm run test
Unit test for the dataset import/export functionality. Keep only the dataset files to be tested in the test-data/input/datasets folder and remove all other files.
Known issue: some datasets (e.g. cs1_pupil.dta) may fail the unit tests because of labelled-integer validation when exporting to the STATA dataset format.
Flow of test execution:
- import datasets from the test-data/input/datasets directory
- export the imported files to test-data/output/datasets
- import the exported datasets
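The import, export, re-import flow can be sketched as follows. This is an illustration under stated assumptions, not the module's test code: importDataset and exportDataset are hypothetical functions supplied by the caller, and they are synchronous here for brevity, whereas the real calls go through R and are likely asynchronous. The sketch records which step failed for each file, which helps when a dataset fails only at the export step.

```javascript
const INPUT_DIR = "test-data/input/datasets";
const OUTPUT_DIR = "test-data/output/datasets";

// `io` is a hypothetical object with two caller-supplied functions:
//   io.importDataset(path)        -> dataset (throws on failure)
//   io.exportDataset(dataset, path)          (throws on failure)
function runRoundTrip(fileNames, io) {
  return fileNames.map((name) => {
    const inputPath = `${INPUT_DIR}/${name}`;
    const outputPath = `${OUTPUT_DIR}/${name}`;
    let step = "import";
    try {
      const dataset = io.importDataset(inputPath);
      step = "export";
      io.exportDataset(dataset, outputPath);
      step = "re-import";
      io.importDataset(outputPath);
      return { name, ok: true, failedStep: null, error: null };
    } catch (err) {
      // Remember the step that was running when the error was thrown.
      return { name, ok: false, failedStep: step, error: err.message };
    }
  });
}
```

Each result names the file, whether all three steps passed, and the step at which it stopped if not.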
npm run test:resequence
Unit test that imports datasets and resequences them. Place the dataset files to be tested in the test-data/input/datasets folder and run the command.
Flow of test execution:
- import datasets from the test-data/input/datasets directory
- perform resequence and write the updated variable JSON file to the test-data/output/json directory
npm run test:destring
Unit test for the destring functionality on an imported file. Because the variables to destring must be named explicitly, the test is limited to one dataset, "ghs_2015_person_v1.1_20160608.dta". Keep this file in the input folder and remove all others before running the test.
Flow of test execution:
- import datasets from the test-data/input/datasets directory
- perform destring on the selected variables and write the updated CSV file to the test_data/output/csv/ directory
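To illustrate what destringing the selected variables means, here is a minimal sketch. It is an assumption-laden illustration, not the module's implementation (which runs in R): the function name destring, the row-object data layout, and the rule "convert a variable only if every non-blank value is a plain decimal number" are all hypothetical choices made for the example.

```javascript
// Plain decimal numbers only: optional sign, digits, optional fraction.
const NUMERIC = /^[+-]?(\d+(\.\d*)?|\.\d+)$/;

function isBlank(value) {
  return value === null || value === undefined || String(value).trim() === "";
}

function isBlankOrNumeric(value) {
  return isBlank(value) || NUMERIC.test(String(value).trim());
}

// rows: array of objects, one per record; variables: names selected for destring.
// Returns new rows plus the lists of converted and skipped variables.
function destring(rows, variables) {
  const converted = [];
  const skipped = [];
  for (const name of variables) {
    const allNumeric = rows.every((row) => isBlankOrNumeric(row[name]));
    (allNumeric ? converted : skipped).push(name);
  }
  const out = rows.map((row) => {
    const copy = Object.assign({}, row); // leave the input rows untouched
    for (const name of converted) {
      // Blank values become null (missing); everything else becomes a number.
      copy[name] = isBlank(row[name]) ? null : Number(String(row[name]).trim());
    }
    return copy;
  });
  return { rows: out, converted, skipped };
}
```

A variable that contains any non-numeric text is left unchanged and reported in skipped, so no value is silently lost.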
npm run test:dictionary
npm run test:dictionary:stata
npm run test:dictionary:spss
Unit tests for exporting to the dictionary format.
Flow of test execution:
- import datasets from the test-data/input/datasets directory
- export the dataset to the test_data/data-dictionary/ directory
npm run test:validateKey
Unit test that checks the unique key constraint for the given key variables of a dataset.
Steps:
- Copy the data file to the test-data/input/datasets directory.
- Set datasetname and keyVariables in the constructor method of dist/test/validation.unit.test.js.
- Run the command.
Flow of test execution:
- import the dataset from the test-data/input/datasets directory
- validate the key variables
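The idea behind the unique key check can be sketched as follows. This is a minimal illustration, not the module's validation code: the function name findDuplicateKeys and the row-object layout are hypothetical. A set of key variables satisfies the constraint when no two rows share the same combination of values.

```javascript
// rows: array of objects, one per record; keyVariables: names that should jointly be unique.
// Returns one entry per row whose key combination was already seen in an earlier row.
function findDuplicateKeys(rows, keyVariables) {
  const seen = new Map(); // serialized key -> index of the first row with that key
  const duplicates = [];
  rows.forEach((row, index) => {
    const values = keyVariables.map((name) =>
      row[name] === undefined ? null : row[name]
    );
    // JSON.stringify keeps types apart, so the number 1 and the string "1" differ.
    const key = JSON.stringify(values);
    if (seen.has(key)) {
      duplicates.push({ index, firstIndex: seen.get(key), values });
    } else {
      seen.set(key, index);
    }
  });
  return duplicates;
}
```

An empty result means the key variables uniquely identify every row; otherwise each entry points at the duplicate row and at the first row it collides with.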
Contributors
- Navin VI (navin.v.i@panapps.co)
- Anoop Xaviour (anoopx@panapps.co)
- Libin Thomas (libint@panapps.co)
License
MIT