`csv-data-cleaner` is a CLI tool for cleaning and preprocessing CSV data files. It supports data cleaning tasks including handling missing values, removing duplicates, standardizing text, normalizing numeric data, and detecting outliers.
To install `csv-data-cleaner`, you need to have Node.js installed. You can then install the package globally using npm:
npm install -g csv-data-cleaner
To use `csv-data-cleaner`, run the following command in your terminal:
csv-data-cleaner -i input_file.csv -o output_file.csv -c column_names [-f functionality] [options]
- `-i, --input <path>`: Path to the input CSV file (required).
- `-o, --output <path>`: Path to the output CSV file (required).
- `-c, --columns <columns>`: Comma-separated list of column names to clean (required).
- `-f, --functions <funcs>`: Comma-separated list of cleaning functions to apply. Options include:
  - `removeMissingValues`
  - `removeDuplicates`
  - `standardizeText`
  - `normalizeNumericData`
  - `detectOutliers`
- `-h, --help`: Show help message.
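To illustrate how flags like these map onto values, here is a minimal sketch that parses them with Node's built-in `util.parseArgs`. It is an assumption for illustration only: the tool itself may use a different argument parser, and the helper names `splitList` and `parseCliOptions` are hypothetical.

```javascript
const { parseArgs } = require("node:util");

// Split "a, b,c" into ["a", "b", "c"]; an absent value yields [].
function splitList(value) {
  if (!value) return [];
  return value
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s !== "");
}

// Hypothetical helper: maps the documented flags onto a plain options object.
function parseCliOptions(argv) {
  const { values } = parseArgs({
    args: argv,
    options: {
      input: { type: "string", short: "i" },
      output: { type: "string", short: "o" },
      columns: { type: "string", short: "c" },
      functions: { type: "string", short: "f" },
      help: { type: "boolean", short: "h" },
    },
  });
  return {
    input: values.input,
    output: values.output,
    columns: splitList(values.columns),
    functions: splitList(values.functions),
    help: values.help === true,
  };
}

// Example call with an explicit argument array instead of process.argv.
const example = parseCliOptions([
  "-i", "in.csv", "-o", "out.csv", "-c", "name, age", "-f", "removeDuplicates",
]);
console.log(example.columns, example.functions);
```

Splitting on commas and trimming each entry means `-c "name, age"` and `-c name,age` would be treated the same way in this sketch.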
Here's an example of how to use the tool to clean a CSV file:
csv-data-cleaner -i uncleaned_data.csv -o cleaned_data.csv -c name,age,email,numericColumn \
-f removeMissingValues,removeDuplicates,standardizeText,normalizeNumericData,detectOutliers
This command reads `uncleaned_data.csv`, applies the selected cleaning functions, and writes the cleaned data to `cleaned_data.csv`.
- `removeMissingValues`: Removes rows with missing values in the specified columns.
- `removeDuplicates`: Removes duplicate rows based on the specified columns.
- `standardizeText`: Converts text to lowercase and trims whitespace for the specified columns.
- `normalizeNumericData`: Normalizes numeric data to a range between 0 and 1.
- `detectOutliers`: Removes outliers from numeric data using the Z-score method.
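The behaviors described above can be sketched in plain JavaScript as follows. This is not the package's actual source. It assumes rows are plain objects keyed by column name, that normalization is min-max scaling, that the Z-score uses the population standard deviation, and that rows with `|z|` above a threshold (3 by default here) are dropped. The helper `isMissing` and the function signatures are hypothetical.

```javascript
// Illustrative sketch only; rows look like { name: " Ann ", age: "31" }.

// True when a cell is null, undefined, or blank after trimming.
function isMissing(value) {
  return value === null || value === undefined || String(value).trim() === "";
}

// Keep only rows where every listed column has a value.
function removeMissingValues(rows, columns) {
  return rows.filter((row) => columns.every((col) => !isMissing(row[col])));
}

// Keep the first row seen for each combination of values in the listed columns.
function removeDuplicates(rows, columns) {
  const seen = new Set();
  return rows.filter((row) => {
    const key = JSON.stringify(columns.map((col) => row[col]));
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

// Trim and lowercase string cells in the listed columns.
function standardizeText(rows, columns) {
  return rows.map((row) => {
    const copy = { ...row };
    for (const col of columns) {
      if (typeof copy[col] === "string") copy[col] = copy[col].trim().toLowerCase();
    }
    return copy;
  });
}

// Min-max scaling: (x - min) / (max - min), or 0 when all values are equal.
function normalizeNumericData(rows, column) {
  const values = rows.map((row) => Number(row[column]));
  const min = Math.min(...values);
  const range = Math.max(...values) - min;
  return rows.map((row, i) => ({
    ...row,
    [column]: range === 0 ? 0 : (values[i] - min) / range,
  }));
}

// Z-score filter: drop rows whose value lies more than `threshold`
// population standard deviations from the mean.
function removeOutliers(rows, column, threshold = 3) {
  const values = rows.map((row) => Number(row[column]));
  const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
  const variance =
    values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / values.length;
  const std = Math.sqrt(variance);
  if (std === 0) return rows.slice(); // no spread, so nothing is an outlier
  return rows.filter((_, i) => Math.abs((values[i] - mean) / std) <= threshold);
}
```

One consequence of the Z-score approach worth knowing: with the population standard deviation, no value in a sample of n rows can have a Z-score above (n - 1) / sqrt(n), so a threshold of 3 cannot flag anything in samples of ten rows or fewer.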
To contribute to the development of `csv-data-cleaner`, clone the repository and install dependencies:
git clone https://github.com/shashwatmishraog/csv-data-cleaner
cd csv-data-cleaner
npm install