Data Pipeline
Caution
Designs and implements ETL/ELT data pipelines using Python, SQL, and orchestration tools like Airflow, dbt, and Prefect for batch and streaming workflows.
Install
Claude Code
Copy the SKILL.md file to your project's .claude/skills/ directory.
About This Skill
Data Pipeline is a skill for designing and implementing robust data pipelines. It covers the full spectrum from simple Python ETL scripts to complex orchestrated workflows with Airflow, dbt, and Prefect. The skill emphasizes reliability patterns: idempotency, incremental loads, data quality checks, and proper error handling.
How It Works
- Source analysis — Understands your data sources (APIs, databases, files, streams) and their characteristics
- Architecture design — Selects between ETL, ELT, or streaming patterns based on volume and latency needs
- Pipeline generation — Produces code for extraction, transformation, and loading with proper dependency ordering
- Orchestration setup — Creates Airflow DAGs, dbt project structure, or Prefect flows with scheduling and retries
- Quality checks — Adds data validation, row count assertions, and freshness monitoring
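The quality-check step above can be sketched in plain Python. This is only an illustration, not the code the skill generates: the `DataQualityError` class, the `order_id` and `updated_at` column names, and the 24-hour freshness limit are all assumptions chosen for the example.

```python
from datetime import datetime, timedelta, timezone


class DataQualityError(Exception):
    """Raised when a batch fails a validation check (hypothetical name)."""


def check_row_count(rows, min_rows=1):
    """Row count assertion: fail if the batch is smaller than expected."""
    if len(rows) < min_rows:
        raise DataQualityError(f"expected at least {min_rows} rows, got {len(rows)}")


def check_not_null(rows, column):
    """Validation: fail if any row has no value in the given column."""
    bad = [i for i, row in enumerate(rows) if row.get(column) is None]
    if bad:
        raise DataQualityError(f"column {column!r} is null in rows {bad}")


def check_freshness(rows, column, max_age, now=None):
    """Freshness: fail if the newest timestamp is older than max_age."""
    now = now or datetime.now(timezone.utc)
    newest = max(row[column] for row in rows)
    if now - newest > max_age:
        raise DataQualityError(
            f"newest {column!r} is {now - newest} old, limit is {max_age}"
        )


def run_checks(rows, now=None):
    """Run all checks in order; the first failure raises and stops the run."""
    check_row_count(rows, min_rows=1)
    check_not_null(rows, "order_id")  # assumed key column
    check_freshness(rows, "updated_at", timedelta(hours=24), now=now)
    return True
```

Each check raises instead of returning a flag, so a bad batch stops the pipeline before the load step rather than passing through silently.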
Best For
- Building data warehouses with proper dimensional modeling
- Migrating legacy ETL jobs to modern orchestration tools
- Creating reliable data feeds for analytics and ML pipelines
- Implementing CDC (Change Data Capture) for real-time sync
Design Principles
All generated pipelines follow these principles: idempotent operations (safe to re-run), incremental by default (process only new/changed data), fail loudly with clear error messages, and produce lineage metadata for debugging.
Use Cases
- Design Airflow DAGs for multi-stage data processing
- Create dbt models with proper staging, intermediate, and mart layers
- Build Python ETL scripts with error handling and idempotency
- Implement incremental load patterns for large datasets
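To show what multi-stage processing with error handling involves, here is a rough plain-Python sketch. It is not Airflow, dbt, or Prefect code: it uses the standard library's `graphlib` to order stages, and the `run_pipeline` function, stage names, and retry count are hypothetical.

```python
from graphlib import TopologicalSorter


def run_pipeline(stages, dependencies, max_retries=2):
    """Run stages in dependency order, retrying each before failing loudly.

    stages:       dict mapping stage name to a zero-argument callable
    dependencies: dict mapping stage name to the set of stages it depends on
    """
    # static_order() yields every stage after all of its dependencies.
    order = list(TopologicalSorter(dependencies).static_order())
    completed = []
    for name in order:
        for attempt in range(1, max_retries + 2):
            try:
                stages[name]()
                completed.append(name)
                break
            except Exception as exc:
                if attempt > max_retries:
                    # Out of retries: stop the run and keep the original cause.
                    raise RuntimeError(
                        f"stage {name!r} failed after {attempt} attempts"
                    ) from exc
    return completed


if __name__ == "__main__":
    demo_stages = {
        "extract": lambda: print("extracting"),
        "transform": lambda: print("transforming"),
        "load": lambda: print("loading"),
    }
    demo_deps = {"transform": {"extract"}, "load": {"transform"}}
    run_pipeline(demo_stages, demo_deps)
```

An orchestrator adds scheduling, backoff, and monitoring on top of this idea; the sketch only covers ordering and bounded retries.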
Pros & Cons
Pros
- Covers ETL, ELT, and streaming patterns comprehensively
- Emphasizes reliability with idempotency and incremental loads
- Multi-tool support: Airflow, dbt, Prefect, plain Python
Cons
- Cannot test pipelines against live data sources
- Complex streaming architectures (Kafka/Flink) need additional expertise
Related Skills
SQL Optimizer
Caution
Analyzes SQL queries for performance issues, rewrites slow queries, recommends index strategies, and explains execution plans across PostgreSQL, MySQL, and SQLite.
Schema Designer
Verified
Designs relational and NoSQL database schemas with proper normalization, indexing strategies, migration scripts, and entity-relationship diagrams.
CSV Transformer
Caution
Transforms, cleans, and converts data between CSV, JSON, Excel, and other tabular formats with column mapping, type casting, and validation.