cleanifix
1.1.0 • Public • Published

Cleanifix

A CLI tool that automatically cleans your data files through natural language commands. Like having a data analyst in your terminal.

🚀 Quick Start

    # Install
    npm install -g cleanifix

    # Basic usage
    cleanifix @sales.csv "remove duplicates"
    cleanifix @users.csv "fill missing emails with 'unknown@example.com'"
    cleanifix @data.json "standardize all dates to ISO format"

    # Interactive mode
    cleanifix interactive @messy_data.csv

🎯 Features

Core Capabilities (MVP)

  • Missing Value Detection & Handling - Find and fix missing data automatically
  • Data Standardization - Normalize dates, phone numbers, addresses, and more
  • Deduplication - Remove duplicate rows with smart matching
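To make the three capabilities above concrete, here is a minimal sketch in plain Python (standard library only). It is not Cleanifix's implementation: the sample data, the column names, the helper functions, and the list of recognized date formats are all assumptions made for illustration. In particular, `01/07/2024` is assumed to be month-first.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Hypothetical sample data; column names are made up for illustration.
SAMPLE_CSV = """name,email,signup_date
Ada,ada@example.com,2024-01-05
Bob,,01/07/2024
Ada,ada@example.com,2024-01-05
Cy,cy@example.com,March 3 2024
"""

# Date formats this sketch recognizes (an assumption, not Cleanifix's list).
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%B %d %Y")


def count_missing(rows):
    """Count empty or whitespace-only cells per column."""
    missing = Counter()
    for row in rows:
        for column, value in row.items():
            if value is None or not value.strip():
                missing[column] += 1
    return dict(missing)


def to_iso_date(text):
    """Return the date as YYYY-MM-DD, or the input unchanged if unparseable."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return text


def drop_exact_duplicates(rows):
    """Remove rows identical in every column, keeping the first occurrence."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row.items())
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
missing = count_missing(rows)
for row in rows:
    row["signup_date"] = to_iso_date(row["signup_date"])
cleaned = drop_exact_duplicates(rows)
```

Dates are standardized before deduplication so that rows differing only in date format can still be recognized as exact duplicates.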

Natural Language Interface

    # Just describe what you want
    cleanifix @customers.csv "find missing phone numbers and fill with 'N/A'"
    cleanifix @inventory.csv "standardize product names to title case"
    cleanifix @transactions.csv "remove duplicate entries keeping the most recent"

Smart Suggestions

    $ cleanifix @data.csv "analyze"

    📊 Data Quality Report:
    ✗ 156 missing values in 'email' column
    ✗ 89 inconsistent date formats
    ✗ 34 potential duplicates

    Suggested fixes:
      1. Fill missing emails with domain-based patterns
      2. Standardize dates to YYYY-MM-DD
      3. Remove exact duplicates keeping first occurrence

    Apply all fixes? [Y/n]

📦 Installation

Prerequisites

  • Node.js 18+
  • Python 3.8+
  • 4 GB RAM recommended for large files

Install from npm

    npm install -g cleanifix

Install from source

    git clone https://github.com/rickyjs1955/cleanifix.git
    cd cleanifix
    ./scripts/setup-dev.sh

🛠️ Usage Examples

Basic Cleaning

    # Find issues
    cleanifix @data.csv "show me data quality issues"

    # Fix missing values
    cleanifix @sales.csv "fill missing prices with median"

    # Standardize formats
    cleanifix @contacts.csv "standardize all phone numbers to international format"

    # Remove duplicates
    cleanifix @emails.csv "remove duplicate emails keeping the latest entry"

Batch Processing

    # Create a config file
    cat > cleaning-rules.yaml << EOF
    rules:
      - type: missing_values
        columns: [price, quantity]
        strategy: median
      - type: standardize
        column: phone
        format: E164
      - type: deduplicate
        keys: [email]
        keep: last
    EOF

    # Run batch cleaning
    cleanifix batch @data.csv --rules cleaning-rules.yaml

Interactive Mode

    cleanifix interactive @messy_data.csv

    🧹 Cleanifix Interactive Mode

    analyze my data
    fill missing ages with average by city
    standardize all names to proper case
    save as cleaned_data.csv
    exit

🏗️ Architecture

Cleanifix uses a hybrid architecture:

  • CLI Interface (Node.js) - Fast, responsive user interaction
  • Processing Engine (Python) - Powerful data manipulation with pandas
  • Communication - JSON-based message passing between components
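The JSON message passing between the CLI and the engine could look roughly like the sketch below. This is an illustration, not the actual protocol. The message shape (`rows`, `rules`, `status`, `detail`) and the function names are hypothetical. The rule fields (`type`, `columns`, `strategy`, `keys`, `keep`) mirror the batch config shown under Usage Examples. The real engine uses pandas, while this sketch sticks to the standard library so it runs without dependencies, and it passes rows inline where a real request would more likely reference a file.

```python
import json
import statistics


def fill_missing_with_median(rows, columns):
    """Fill empty cells in the given numeric columns with the column median."""
    for column in columns:
        present = [float(r[column]) for r in rows if r.get(column) not in (None, "")]
        if not present:
            continue  # nothing to compute a median from
        median = statistics.median(present)
        for r in rows:
            if r.get(column) in (None, ""):
                r[column] = median
    return rows


def deduplicate(rows, keys, keep="first"):
    """Drop rows sharing the same values in `keys`, keeping first or last."""
    ordered = rows if keep == "first" else list(reversed(rows))
    seen, kept = set(), []
    for r in ordered:
        key = tuple(r.get(k) for k in keys)
        if key not in seen:
            seen.add(key)
            kept.append(r)
    return kept if keep == "first" else list(reversed(kept))


def handle_request(message):
    """Decode a JSON request, apply its rules in order, return a JSON reply."""
    request = json.loads(message)
    rows = request["rows"]
    for rule in request["rules"]:
        if rule["type"] == "missing_values" and rule.get("strategy") == "median":
            rows = fill_missing_with_median(rows, rule["columns"])
        elif rule["type"] == "deduplicate":
            rows = deduplicate(rows, rule["keys"], rule.get("keep", "first"))
        else:
            detail = f"unsupported rule: {rule['type']}"
            return json.dumps({"status": "error", "detail": detail})
    return json.dumps({"status": "ok", "rows": rows})


# What the CLI side might send (hypothetical payload).
request = json.dumps({
    "rows": [
        {"email": "a@example.com", "price": 10},
        {"email": "b@example.com", "price": None},
        {"email": "a@example.com", "price": 30},
    ],
    "rules": [
        {"type": "missing_values", "columns": ["price"], "strategy": "median"},
        {"type": "deduplicate", "keys": ["email"], "keep": "last"},
    ],
})
reply = json.loads(handle_request(request))
```

Keeping every message a self-describing JSON object means the Node.js side only needs to serialize a request and parse a reply, regardless of how the Python side performs the cleaning.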

🤝 Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

Development Setup

    # Clone the repo
    git clone https://github.com/rickyjs1955/cleanifix.git
    cd cleanifix

    # Set up the development environment
    ./scripts/setup-dev.sh

    # Run tests
    npm test              # CLI tests
    python -m pytest      # Engine tests

    # Run in development mode
    npm run dev

📋 Roadmap

Phase 1 (Current) - MVP

  • Basic CLI interface
  • Missing value handling
  • Simple standardization
  • Exact deduplication
  • CSV support
  • JSON support

Phase 2 - Enhanced Rules

  • Fuzzy deduplication
  • Custom regex patterns
  • Outlier detection
  • Data type inference
  • Excel support

Phase 3 - ML Integration

  • Smart imputation
  • Anomaly detection
  • Pattern learning
  • Confidence scoring
  • Auto-cleaning mode

📄 License

MIT License - see the LICENSE file for details

🙏 Acknowledgments

Built with:

  • Commander.js - CLI framework
  • Pandas - Data manipulation
  • Chalk - Terminal styling

💬 Support

  • Documentation: docs.cleanifix.dev
  • Issues: GitHub Issues
  • Discussions: GitHub Discussions

Made with ❤️ by data people, for data people
