Data Cleaner

Name: Data Cleaner
Author: Community

Profiles, cleans, and standardizes messy datasets by detecting and fixing inconsistencies, outliers, duplicates, and formatting issues.

By Community · 6,400 stars · v1.1.0 · Updated 2026-03-10

Copy the SKILL.md file to your project's .claude/skills/ directory

Copy the skill prompt to .cursor/rules/ as a .mdc file

Add to AGENTS.md

About This Skill

Data Cleaner automates the most tedious part of any data project: getting raw data into a usable state. It goes beyond find-and-replace by inferring the semantics of each column and applying a cleaning strategy suited to each issue.

Data Profiling

Before cleaning, the skill produces a profile report:
Column-level statistics (type, cardinality, null rate, min/max)
Distribution shapes for numeric columns
Pattern frequency for text columns (email, phone, date formats present)
Correlation matrix highlighting redundant features
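
As a rough illustration of the column-level statistics listed above, here is a minimal sketch using only the Python standard library. The function name `profile_column`, the sample rows, and the rule that empty strings count as nulls are assumptions made for this example, not the skill's actual implementation.

```python
def profile_column(values):
    """Return basic statistics for one column given as a list of raw values."""
    # Assumption: None and empty strings both count as missing.
    non_null = [v for v in values if v is not None and v != ""]
    numeric = []
    for v in non_null:
        try:
            numeric.append(float(v))
        except (TypeError, ValueError):
            pass
    # A column is treated as numeric only if every non-null value parses.
    is_numeric = bool(non_null) and len(numeric) == len(non_null)
    profile = {
        "type": "numeric" if is_numeric else "text",
        "cardinality": len(set(non_null)),
        "null_rate": 1 - len(non_null) / len(values) if values else 0.0,
    }
    if is_numeric:
        profile["min"] = min(numeric)
        profile["max"] = max(numeric)
    return profile


# Hypothetical sample data for illustration.
rows = [
    {"age": "34", "city": "Lyon"},
    {"age": "", "city": "lyon"},
    {"age": "51", "city": "Oslo"},
    {"age": "34", "city": None},
]
report = {col: profile_column([r[col] for r in rows]) for col in rows[0]}
```

Note that "Lyon" and "lyon" count as two distinct values here, which is the kind of inconsistency a profile is meant to surface before any cleaning runs.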

Cleaning Operations

Type Standardization
Date parsing across 30+ formats → ISO 8601
Currency strings ("$1,234.56", "€1.234,56") → numeric
Boolean variants ("Yes/No", "1/0", "TRUE/FALSE") → consistent
Phone numbers → E.164 format (+1XXXXXXXXXX)
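
The type conversions above could be approximated as follows. This is a sketch under stated assumptions: the helper names are hypothetical, `DATE_FORMATS` is a small illustrative subset rather than the 30+ formats the skill claims, and the decimal-separator heuristic in `to_decimal` is one possible rule, not necessarily the one the skill uses.

```python
from datetime import datetime
from decimal import Decimal

# Illustrative subset; day-first is assumed for slash-separated dates.
DATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y", "%Y%m%d"]
TRUE_VALUES = {"yes", "y", "1", "true", "t"}
FALSE_VALUES = {"no", "n", "0", "false", "f"}


def to_iso_date(text):
    """Try each known format and return an ISO 8601 date string, or None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def to_decimal(text):
    """Parse a currency string written with either US or European separators."""
    kept = "".join(ch for ch in text if ch in "0123456789,.-")
    last_sep = max(kept.rfind(","), kept.rfind("."))
    has_both = "," in kept and "." in kept
    trailing = len(kept) - last_sep - 1
    # Heuristic (assumption): the last separator is the decimal mark when both
    # kinds appear, or when it is not followed by exactly three digits.
    if last_sep != -1 and (has_both or trailing != 3):
        whole, frac = kept[:last_sep], kept[last_sep + 1:]
    else:
        whole, frac = kept, ""
    whole = whole.replace(",", "").replace(".", "")
    return Decimal(f"{whole}.{frac}" if frac else whole)


def to_bool(text):
    """Map common boolean spellings to True/False; None if unrecognized."""
    key = text.strip().lower()
    if key in TRUE_VALUES:
        return True
    if key in FALSE_VALUES:
        return False
    return None
```

A value such as "1.234" is inherently ambiguous (1234 or 1.234); this sketch reads it as 1234, and a real pipeline would likely need a per-column locale hint to decide.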

String Normalization
Case standardization (Title Case for names, uppercase for codes)
Whitespace trimming and internal whitespace collapse
Unicode normalization (NFC) and encoding repair (mojibake detection)
Consistent abbreviation expansion ("St." → "Street", "Dr" → "Doctor")
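
Two of the string operations above, whitespace collapse with NFC normalization and a simple mojibake repair, can be sketched with the standard library. The function names are hypothetical, and the repair handles only one common case (UTF-8 bytes wrongly decoded as Latin-1); the skill's own detection logic is not documented here.

```python
import re
import unicodedata


def normalize_text(text):
    """Apply NFC normalization, collapse whitespace runs, and trim the ends."""
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip()


def repair_mojibake(text):
    """Undo UTF-8 bytes that were decoded as Latin-1.

    If the round trip fails, the text was probably not damaged in this way,
    so it is returned unchanged.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# "e" followed by a combining acute accent becomes the single character "é".
cleaned = normalize_text("  Jose\u0301   Garcia\t")
# "cafÃ©" is what "café" looks like after a wrong Latin-1 decode.
repaired = repair_mojibake("caf\u00c3\u00a9")
```

Returning the input unchanged on failure keeps the repair safe to run over columns that mix damaged and healthy values.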

Deduplication
Exact duplicate removal
Fuzzy deduplication using Jaro-Winkler similarity for names and addresses
Blocking strategies for large datasets to make fuzzy matching tractable
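
To make the fuzzy matching and blocking ideas concrete, the following sketch implements Jaro-Winkler similarity from its standard definition and compares names only within blocks. The blocking key (first character of the lowercased name), the 0.9 threshold, and the function names are assumptions chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations


def jaro(s1, s2):
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, prefix_scale=0.1):
    """Boost the Jaro score for strings sharing a prefix of up to 4 characters."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return score + prefix * prefix_scale * (1 - score)


def find_fuzzy_duplicates(names, threshold=0.9):
    """Return candidate duplicate pairs, comparing only within each block."""
    blocks = defaultdict(list)
    for name in names:
        key = name.strip().lower()
        blocks[key[:1]].append(key)  # assumed blocking key: first character
    pairs = []
    for block in blocks.values():
        for a, b in combinations(block, 2):
            if jaro_winkler(a, b) >= threshold:
                pairs.append((a, b))
    return pairs


pairs = find_fuzzy_duplicates(["Jon Smith", "John Smith", "Jane Doe", "Mary Major"])
```

Blocking cuts the number of comparisons from all pairs to pairs within each block, at the cost of missing duplicates whose blocking keys differ (for example, a typo in the first letter).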

Missing Value Handling
Mean/median/mode imputation for numeric columns
Forward-fill or backward-fill for time series
Indicator variable creation for informative missingness
Row removal when missing rate exceeds configurable threshold
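
The missing-value strategies above might look like this in plain Python. The function names, the use of `None` as the missing marker, and the default threshold of 0.5 are assumptions for the sketch.

```python
from statistics import median


def impute_median(values):
    """Fill missing entries with the median and return a missingness indicator.

    Raises statistics.StatisticsError if every value is missing.
    """
    observed = [v for v in values if v is not None]
    fill = median(observed)
    indicator = [v is None for v in values]
    filled = [fill if v is None else v for v in values]
    return filled, indicator


def forward_fill(values):
    """Carry the last observed value forward; leading gaps stay None."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def drop_sparse_rows(rows, max_missing_rate=0.5):
    """Keep rows whose share of missing fields does not exceed the threshold."""
    kept = []
    for row in rows:
        missing = sum(v is None for v in row.values())
        if missing / (len(row) or 1) <= max_missing_rate:
            kept.append(row)
    return kept
```

Keeping the indicator alongside the imputed column preserves the information that a value was originally missing, which matters when missingness itself is informative.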

Audit Trail

Every transformation is logged to `cleaning_log.json` with the column affected, the operation, the number of rows changed, and before/after samples.
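
A log entry carrying those four pieces of information could be built as in the sketch below. The JSON key names (`column`, `operation`, `rows_changed`, `samples`) and the helper `log_transformation` are hypothetical; the actual schema of `cleaning_log.json` is not specified on this page.

```python
import json


def log_transformation(log, column, operation, before, after, sample_size=3):
    """Append one audit entry describing how a column changed."""
    changed = [(b, a) for b, a in zip(before, after) if b != a]
    log.append({
        "column": column,
        "operation": operation,
        "rows_changed": len(changed),
        "samples": [{"before": b, "after": a} for b, a in changed[:sample_size]],
    })


log = []
raw = [" Alice ", "Bob", "  Carol"]
clean = [s.strip() for s in raw]
log_transformation(log, "name", "trim_whitespace", raw, clean)

# Serialized form; writing this string to cleaning_log.json is left out here.
log_json = json.dumps(log, indent=2)
```

Capping the samples keeps the log small even when an operation touches millions of rows.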

Use Cases

Standardizing address and phone number formats across CRM exports
Deduplicating customer records with fuzzy name matching
Fixing encoding issues in international datasets
Imputing missing values using appropriate statistical strategies

Pros & Cons

Pros

+ Never overwrites original data; always writes to a new output file
+ Comprehensive data profiling before any changes are made
+ Fuzzy deduplication for name and address matching
+ Full audit trail in cleaning_log.json for data governance

Cons

- Fuzzy matching on very large datasets (1M+ rows) requires chunking and may be slow
- Domain-specific cleaning rules (e.g., medical codes) may need custom extensions

FAQ

What platforms support Data Cleaner?

Data Cleaner is available on Claude Code, Cursor, and OpenAI Codex CLI.

What tools work with Data Cleaner?

Data Cleaner works well with Claude Code, Cursor, and GitHub Copilot.