Cleanifix
A CLI tool that automatically cleans your data files through natural language commands. Like having a data analyst in your terminal.
🚀 Quick Start

    # Install
    npm install -g cleanifix

    cleanifix @sales.csv "remove duplicates"
    cleanifix @users.csv "fill missing emails with 'unknown@example.com'"
    cleanifix @data.json "standardize all dates to ISO format"

    cleanifix interactive @messy_data.csv

🎯 Features

Core Capabilities (MVP)
- Missing Value Detection & Handling - Find and fix missing data automatically
- Data Standardization - Normalize dates, phone numbers, addresses, and more
- Deduplication - Remove duplicate rows with smart matching
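
As a rough illustration of what the three capabilities amount to, here is a minimal plain-Python sketch. It is not Cleanifix's implementation (the engine is described as pandas-based); the function names, the sample rows, and the two accepted input date formats are all assumptions made for this example.

```python
from datetime import datetime
from statistics import median

# Hypothetical sample rows; None marks a missing value.
rows = [
    {"email": "a@example.com", "price": 10.0, "date": "03/01/2024"},
    {"email": "b@example.com", "price": None, "date": "2024-01-05"},
    {"email": "a@example.com", "price": 30.0, "date": "2024-01-09"},
]

def fill_missing_with_median(rows, column):
    """Replace None in `column` with the median of the values that are present."""
    present = [r[column] for r in rows if r[column] is not None]
    fill = median(present)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

def standardize_date(value, formats=("%Y-%m-%d", "%m/%d/%Y")):
    """Return the date as YYYY-MM-DD, trying each assumed input format in turn."""
    for fmt in formats:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    return value  # leave unrecognized values untouched

def deduplicate(rows, key, keep="last"):
    """Keep one row per `key` value: its first or its last occurrence.

    Output order follows the first appearance of each key value.
    """
    chosen = {}
    for r in rows:
        if keep == "last" or r[key] not in chosen:
            chosen[r[key]] = r
    return list(chosen.values())

# Chain the three steps on the sample rows.
cleaned = fill_missing_with_median(rows, "price")
cleaned = [{**r, "date": standardize_date(r["date"])} for r in cleaned]
cleaned = deduplicate(cleaned, "email", keep="last")
```

Each step returns new rows instead of mutating its input, so the steps can be reordered or skipped independently.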

Natural Language Interface

    # Just describe what you want
    cleanifix @customers.csv "find missing phone numbers and fill with 'N/A'"
    cleanifix @inventory.csv "standardize product names to title case"
    cleanifix @transactions.csv "remove duplicate entries keeping the most recent"

Smart Suggestions

    $ cleanifix @data.csv "analyze"

    📊 Data Quality Report:
    ✗ 156 missing values in 'email' column
    ✗ 89 inconsistent date formats
    ✗ 34 potential duplicates

    Suggested fixes:
    - Fill missing emails with domain-based patterns
    - Standardize dates to YYYY-MM-DD
    - Remove exact duplicates keeping first occurrence

    Apply all fixes? [Y/n]

📦 Installation

Prerequisites
- Node.js 18+
- Python 3.8+
- 4GB RAM recommended for large files

Install from npm

    npm install -g cleanifix

Install from source

    git clone https://github.com/rickyjs1955/cleanifix.git
    cd cleanifix
    ./scripts/setup-dev.sh

🛠️ Usage Examples

Basic Cleaning

    # Find issues
    cleanifix @data.csv "show me data quality issues"

    cleanifix @sales.csv "fill missing prices with median"

    cleanifix @contacts.csv "standardize all phone numbers to international format"

    cleanifix @emails.csv "remove duplicate emails keeping the latest entry"

Batch Processing

    # Create a config file
    cat > cleaning-rules.yaml << EOF
    rules:
      - type: missing_values
        columns: [price, quantity]
        strategy: median
      - type: standardize
        column: phone
        format: E164
      - type: deduplicate
        keys: [email]
        keep: last
    EOF

    cleanifix batch @data.csv --rules cleaning-rules.yaml

Interactive Mode

    cleanifix interactive @messy_data.csv

    🧹 Cleanifix Interactive Mode

    analyze my data
    fill missing ages with average by city
    standardize all names to proper case
    save as cleaned_data.csv
    exit

🏗️ Architecture

Cleanifix uses a hybrid architecture:
- CLI Interface (Node.js) - Fast, responsive user interaction
- Processing Engine (Python) - Powerful data manipulation with pandas
- Communication - JSON-based message passing between components
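
To make the JSON-based message passing concrete, here is a minimal sketch of what the Python side of such an exchange could look like: one JSON request per line in, one JSON response per line out. The message fields (`id`, `action`, `rows`, `params`, `ok`, `error`), the `deduplicate` action name, and the line-delimited framing are assumptions for illustration only; this README does not specify the actual protocol.

```python
import io
import json

def dedupe_exact(rows, params):
    """Drop rows that repeat an earlier row exactly, keeping the first occurrence."""
    seen, unique = set(), []
    for row in rows:
        fingerprint = json.dumps(row, sort_keys=True)  # stable key for comparison
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(row)
    return unique

# Hypothetical action table; the real engine's actions are not documented here.
HANDLERS = {"deduplicate": dedupe_exact}

def handle_request(line):
    """Decode one JSON request line and return one JSON response line."""
    try:
        request = json.loads(line)
    except json.JSONDecodeError as exc:
        return json.dumps({"id": None, "ok": False, "error": f"invalid JSON: {exc.msg}"})
    if not isinstance(request, dict):
        return json.dumps({"id": None, "ok": False, "error": "request must be an object"})

    handler = HANDLERS.get(request.get("action"))
    if handler is None:
        return json.dumps({"id": request.get("id"), "ok": False, "error": "unknown action"})

    result = handler(request.get("rows", []), request.get("params", {}))
    return json.dumps({"id": request.get("id"), "ok": True, "rows": result})

def serve(stdin, stdout):
    """Answer each non-empty request line with exactly one response line."""
    for line in stdin:
        if line.strip():
            stdout.write(handle_request(line) + "\n")
            stdout.flush()

# Simulate the CLI sending one request through in-memory streams.
request = json.dumps({
    "id": 1,
    "action": "deduplicate",
    "rows": [{"email": "a@example.com"}, {"email": "a@example.com"}],
})
out = io.StringIO()
serve(io.StringIO(request + "\n"), out)
response = json.loads(out.getvalue())
```

In a real deployment the CLI would presumably pass the engine's standard input and output to `serve` in place of the in-memory streams; echoing the request `id` back lets the caller match responses to requests.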

🤝 Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

Development Setup

    # Clone the repo
    git clone https://github.com/rickyjs1955/cleanifix.git
    cd cleanifix

    ./scripts/setup-dev.sh

    npm test            # CLI tests
    python -m pytest    # Engine tests

    npm run dev

📋 Roadmap

Phase 1 (Current) - MVP
- Basic CLI interface
- Missing value handling
- Simple standardization
- Exact deduplication
- CSV support
- JSON support
Phase 2 - Enhanced Rules
- Fuzzy deduplication
- Custom regex patterns
- Outlier detection
- Data type inference
- Excel support
Phase 3 - ML Integration
- Smart imputation
- Anomaly detection
- Pattern learning
- Confidence scoring
- Auto-cleaning mode

📄 License

MIT License - see LICENSE file for details

🙏 Acknowledgments

Built with:
- Commander.js - CLI framework
- Pandas - Data manipulation
- Chalk - Terminal styling
💬 Support
- Documentation: docs.cleanifix.dev
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Made with ❤️ by data people, for data people