Data Pipeline
Designs and implements ETL/ELT data pipelines using Python, SQL, and orchestration tools such as Airflow, dbt, and Prefect for batch and streaming workflows.
$ Copy the SKILL.md file to your project's .claude/skills/ directory
About This Skill
Data Pipeline is a skill for designing and implementing robust data pipelines. It covers the full spectrum from simple Python ETL scripts to complex orchestrated workflows with Airflow, dbt, and Prefect. The skill emphasizes reliability patterns: idempotency, incremental loads, data quality checks, and proper error handling.
How It Works
- Source analysis — Understands your data sources (APIs, databases, files, streams) and their characteristics
- Architecture design — Selects between ETL, ELT, or streaming patterns based on volume and latency needs
- Pipeline generation — Produces code for extraction, transformation, and loading with proper dependency ordering
- Orchestration setup — Creates Airflow DAGs, dbt project structure, or Prefect flows with scheduling and retries
- Quality checks — Adds data validation, row count assertions, and freshness monitoring
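As a concrete illustration of the quality-checks step, a minimal post-load gate might assert a non-zero row count and data freshness. This is a hedged sketch, not the skill's actual output; the function name and the `updated_at` field are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

def check_load_quality(rows, max_age_hours=24):
    """Fail loudly if the load produced zero rows or only stale data.

    Assumes each row carries a timezone-aware `updated_at` timestamp.
    """
    if not rows:
        raise ValueError("quality check failed: zero rows loaded")
    newest = max(row["updated_at"] for row in rows)
    age = datetime.now(timezone.utc) - newest
    if age > timedelta(hours=max_age_hours):
        raise ValueError(f"quality check failed: newest row is {age} old")
    return len(rows)
```

Raising rather than logging keeps the pipeline honest: a failed check halts downstream tasks instead of silently propagating bad data.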
Best For
- Building data warehouses with proper dimensional modeling
- Migrating legacy ETL jobs to modern orchestration tools
- Creating reliable data feeds for analytics and ML pipelines
- Implementing CDC (Change Data Capture) for real-time sync
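To make the CDC use case concrete, the changes a CDC feed emits can be sketched as a diff between two keyed snapshots of a table. Production CDC is log-based (e.g. reading the database's write-ahead log) and avoids full-table scans; this snapshot diff is only a conceptual illustration.

```python
def capture_changes(prev_snapshot, curr_snapshot):
    """Classify rows as inserted, updated, or deleted by diffing two
    snapshots keyed on the primary key."""
    inserts = {k: v for k, v in curr_snapshot.items() if k not in prev_snapshot}
    deletes = {k: v for k, v in prev_snapshot.items() if k not in curr_snapshot}
    updates = {
        k: v
        for k, v in curr_snapshot.items()
        if k in prev_snapshot and prev_snapshot[k] != v
    }
    return inserts, updates, deletes
```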
Design Principles
All generated pipelines follow these principles: idempotent operations (safe to re-run), incremental by default (process only new/changed data), fail loudly with clear error messages, and produce lineage metadata for debugging.
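The first two principles can be sketched together: an upsert keyed on the primary key makes re-runs safe (idempotent), and a watermark restricts processing to new or changed rows (incremental). The `id` and `updated_at` fields below are assumptions for illustration, not part of any generated pipeline.

```python
def incremental_upsert(source_rows, target, watermark):
    """Process only rows newer than the watermark, upserting by primary
    key so that re-running with the same input leaves `target` unchanged.
    Returns the new watermark to persist for the next run."""
    new_rows = [r for r in source_rows if r["updated_at"] > watermark]
    for row in new_rows:
        target[row["id"]] = row  # insert or overwrite; safe to re-run
    # Advance the watermark only if we actually saw newer rows.
    return max((r["updated_at"] for r in new_rows), default=watermark)
```

Persisting the returned watermark (e.g. in a state table) is what lets the next run skip everything already loaded.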
Use Cases
- Design Airflow DAGs for multi-stage data processing
- Create dbt models with proper staging, intermediate, and mart layers
- Build Python ETL scripts with error handling and idempotency
- Implement incremental load patterns for large datasets
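Orchestrators like Airflow and Prefect apply retries declaratively per task; the policy they implement is roughly the following, shown here as a plain-Python sketch with illustrative parameter names.

```python
import time

def run_with_retries(task, retries=3, delay_s=1.0, backoff=2.0):
    """Re-run a failing task with exponential backoff, then fail loudly
    with the original exception chained for debugging."""
    attempt = 0
    while True:
        try:
            return task()
        except Exception as exc:
            attempt += 1
            if attempt > retries:
                raise RuntimeError(
                    f"task failed after {retries} retries"
                ) from exc
            time.sleep(delay_s)
            delay_s *= backoff
```

In real deployments the orchestrator also records each attempt, so transient failures stay visible rather than disappearing into a silent retry loop.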
Pros & Cons
Pros
- Covers ETL, ELT, and streaming patterns comprehensively
- Emphasizes reliability with idempotency and incremental loads
- Multi-tool support: Airflow, dbt, Prefect, and plain Python
Cons
- Cannot test pipelines against live data sources
- Complex streaming architectures (Kafka/Flink) require additional expertise
Related Skills
SQL Optimizer
Analyzes SQL queries for performance issues, rewrites slow queries, recommends index strategies, and explains execution plans across PostgreSQL, MySQL, and SQLite.
Schema Designer
Designs relational and NoSQL database schemas with proper normalization, indexing strategies, migration scripts, and entity-relationship diagrams.
CSV Transformer
Transforms, cleans, and converts data between CSV, JSON, Excel, and other tabular formats with column mapping, type casting, and validation.
Next Step
Evaluate whether the skill fits your pipeline work, then install it by copying SKILL.md into your project's .claude/skills/ directory.