Data Pipeline

Caution

Designs and implements ETL/ELT data pipelines using Python, SQL, and orchestration tools like Airflow, dbt, and Prefect for batch and streaming workflows.

By Data Skills Lab | 3,740 | v1.9.0 | Updated 2026-03-10

Install

Claude Code

Copy the SKILL.md file to your project's .claude/skills/ directory

About This Skill

Data Pipeline is a skill for designing and implementing robust data pipelines. It covers the full spectrum from simple Python ETL scripts to complex orchestrated workflows with Airflow, dbt, and Prefect. The skill emphasizes reliability patterns: idempotency, incremental loads, data quality checks, and proper error handling.

How It Works

  1. Source analysis — Understands your data sources (APIs, databases, files, streams) and their characteristics
  2. Architecture design — Selects between ETL, ELT, or streaming patterns based on volume and latency needs
  3. Pipeline generation — Produces code for extraction, transformation, and loading with proper dependency ordering
  4. Orchestration setup — Creates Airflow DAGs, dbt project structure, or Prefect flows with scheduling and retries
  5. Quality checks — Adds data validation, row count assertions, and freshness monitoring
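The quality checks in step 5 can be pictured with a short sketch. It is illustrative only, not the skill's actual output: the function name `check_batch`, the `DataQualityError` exception, and the row fields `id` and `updated_at` are all assumptions made for this example.

```python
from datetime import datetime, timedelta, timezone


class DataQualityError(Exception):
    """Raised when a loaded batch fails a validation check."""


def check_batch(rows, expected_min_rows, max_age, now=None):
    """Validate a batch of dict rows; raise DataQualityError on the first failure.

    Assumes each row has an 'id' and a timezone-aware 'updated_at' datetime
    (hypothetical field names chosen for this sketch).
    """
    now = now or datetime.now(timezone.utc)

    # Row count assertion: an unexpectedly small batch often means a broken extract.
    if len(rows) < expected_min_rows:
        raise DataQualityError(
            f"row count {len(rows)} is below the expected minimum {expected_min_rows}"
        )
    if not rows:
        return True  # an empty batch was allowed; nothing further to validate

    # Data validation: every row needs a non-null primary key.
    missing = [i for i, row in enumerate(rows) if row.get("id") is None]
    if missing:
        raise DataQualityError(f"rows at positions {missing} have no id")

    # Freshness: the newest record must be no older than max_age.
    newest = max(row["updated_at"] for row in rows)
    if now - newest > max_age:
        raise DataQualityError(
            f"newest record {newest.isoformat()} is older than {max_age}"
        )
    return True
```

Raising an exception rather than returning `False` lets an orchestrator mark the task as failed and stop downstream steps from consuming a bad batch.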

Best For

  • Building data warehouses with proper dimensional modeling
  • Migrating legacy ETL jobs to modern orchestration tools
  • Creating reliable data feeds for analytics and ML pipelines
  • Implementing CDC (Change Data Capture) for near-real-time sync
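To make the CDC item concrete, here is a minimal sketch of applying change events to a target. The event shape (`seq`, `op`, `key`, `row`) and the function name `apply_changes` are hypothetical; real CDC tools emit their own formats, and the target would normally be a table rather than a dict.

```python
def apply_changes(target, events, last_applied=0):
    """Apply change events to a dict keyed by primary key.

    Returns the sequence number of the last event applied, which the caller
    would persist and pass back in on the next run.
    """
    for event in sorted(events, key=lambda e: e["seq"]):
        if event["seq"] <= last_applied:
            continue  # already applied earlier, so replaying a batch is safe
        op = event["op"]
        if op in ("insert", "update"):
            target[event["key"]] = event["row"]
        elif op == "delete":
            target.pop(event["key"], None)
        else:
            raise ValueError(f"unknown op {op!r} at seq {event['seq']}")
        last_applied = event["seq"]
    return last_applied
```

Tracking the last applied sequence number is what keeps a replayed or overlapping batch from being applied twice.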

Design Principles

All generated pipelines follow these principles: idempotent operations (safe to re-run), incremental by default (process only new/changed data), fail loudly with clear error messages, and produce lineage metadata for debugging.
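As a rough illustration of the first three principles, the sketch below loads rows into SQLite using a watermark and an upsert. The `orders` table, its columns, and the function name `incremental_load` are assumptions for this example, and timestamps are assumed to be ISO-8601 strings so that string comparison matches time order. Lineage metadata is left out to keep the sketch short.

```python
import sqlite3


def incremental_load(conn, source_rows):
    """Upsert rows newer than the stored watermark; return how many were written."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)"
    )
    # Watermark: the newest updated_at already loaded ('' when the table is empty).
    (watermark,) = conn.execute(
        "SELECT COALESCE(MAX(updated_at), '') FROM orders"
    ).fetchone()

    # Incremental: keep only rows changed since the watermark. Strict '>' assumes
    # no late-arriving row shares the watermark timestamp; use '>=' if it can.
    new_rows = [r for r in source_rows if r["updated_at"] > watermark]

    # Fail loudly: reject bad data with a clear message before writing anything.
    for r in new_rows:
        if r["amount"] is None:
            raise ValueError(f"order {r['id']} has no amount; aborting load")

    # Idempotent: an upsert keyed on id means re-running never duplicates rows.
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO orders (id, amount, updated_at) "
            "VALUES (:id, :amount, :updated_at) "
            "ON CONFLICT(id) DO UPDATE SET "
            "amount = excluded.amount, updated_at = excluded.updated_at",
            new_rows,
        )
    return len(new_rows)
```

Calling `incremental_load` twice with the same input writes nothing the second time, and a later change to an existing order updates its row in place instead of adding a duplicate.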

Use Cases

  • Design Airflow DAGs for multi-stage data processing
  • Create dbt models with proper staging, intermediate, and mart layers
  • Build Python ETL scripts with error handling and idempotency
  • Implement incremental load patterns for large datasets
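The idea behind multi-stage processing with dependency ordering and retries can be sketched without any orchestrator, using only the Python standard library. This is not Airflow, dbt, or Prefect code; the function name `run_pipeline` and the task names in the usage note are hypothetical, and a real orchestrator would add scheduling, logging, and parallelism on top.

```python
import time
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_pipeline(tasks, dependencies, retries=2, delay_seconds=0.0):
    """Run callables in dependency order, retrying failures; return the run order.

    tasks:        maps task name -> zero-argument callable
    dependencies: maps task name -> set of task names that must run first
    """
    order = list(TopologicalSorter(dependencies).static_order())
    completed = []
    for name in order:
        for attempt in range(retries + 1):
            try:
                tasks[name]()
                break
            except Exception as exc:
                if attempt == retries:
                    # Out of retries: stop the run so downstream tasks never start.
                    raise RuntimeError(
                        f"task {name!r} failed after {attempt + 1} attempts"
                    ) from exc
                time.sleep(delay_seconds)
        completed.append(name)
    return completed
```

With `dependencies = {"int_orders": {"stg_orders"}, "fct_orders": {"int_orders"}}`, the staging task runs first, then the intermediate one, then the mart, which mirrors the staging, intermediate, and mart layering mentioned above.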

Pros & Cons

Pros

  • Covers ETL, ELT, and streaming patterns comprehensively
  • Emphasizes reliability with idempotency and incremental loads
  • Multi-tool support: Airflow, dbt, Prefect, plain Python

Cons

  • Cannot test pipelines against live data sources
  • Complex streaming architectures (Kafka/Flink) need additional expertise
