Data Pipeline

Name: Data Pipeline
Author: Data Skills Lab

Designs and implements ETL/ELT data pipelines using Python, SQL, and orchestration tools like Airflow, dbt, and Prefect for batch and streaming workflows.

By Data Skills Lab | 3,740 stars | v1.9.0 | Updated 2026-03-10

Claude Code: Copy the SKILL.md file to your project's .claude/skills/ directory

Cursor: Copy the skill prompt to .cursor/rules/ as a .mdc file

OpenAI Codex CLI: Add the skill prompt to your Codex configuration

Windsurf: Add the skill prompt to your Windsurf rules

About This Skill

Data Pipeline is a skill for designing and implementing robust data pipelines. It covers the full spectrum from simple Python ETL scripts to complex orchestrated workflows with Airflow, dbt, and Prefect. The skill emphasizes reliability patterns: idempotency, incremental loads, data quality checks, and proper error handling.
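As an illustration of the idempotency pattern mentioned above, here is a minimal sketch using only Python's standard library. The `orders` table, its columns, and the `load_orders` helper are hypothetical names chosen for the example, not output of the skill; the upsert syntax assumes SQLite 3.24 or newer.

```python
import sqlite3

def load_orders(conn, rows):
    """Idempotent load: re-running with the same rows leaves the table unchanged."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, amount REAL NOT NULL)"
    )
    # Upsert keyed on the primary key, so a retry overwrites instead of duplicating.
    conn.executemany(
        "INSERT INTO orders (order_id, amount) VALUES (?, ?) "
        "ON CONFLICT(order_id) DO UPDATE SET amount = excluded.amount",
        rows,
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
batch = [(1, 10.0), (2, 25.5)]
load_orders(conn, batch)
load_orders(conn, batch)  # simulated retry of the same batch
row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

Because the write is keyed on `order_id`, running the same batch twice still leaves two rows, which is what makes the step safe to re-run after a failure.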

How It Works

Source analysis — Understands your data sources (APIs, databases, files, streams) and their characteristics
Architecture design — Selects between ETL, ELT, or streaming patterns based on volume and latency needs
Pipeline generation — Produces code for extraction, transformation, and loading with proper dependency ordering
Orchestration setup — Creates Airflow DAGs, dbt project structure, or Prefect flows with scheduling and retries
Quality checks — Adds data validation, row count assertions, and freshness monitoring
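The quality-check step could look roughly like the sketch below. It assumes each row is a dict with hypothetical `id` and `updated_at` fields; `check_batch` and `DataQualityError` are illustrative names, not part of any orchestration tool.

```python
from datetime import datetime, timedelta, timezone

class DataQualityError(Exception):
    """Raised when a batch fails a validation check."""

def check_batch(rows, min_rows, max_age, now=None):
    """Validate a batch of dict rows; raise DataQualityError on the first failure."""
    now = now or datetime.now(timezone.utc)
    # Row count assertion: catch empty or truncated extracts.
    if len(rows) < min_rows:
        raise DataQualityError(f"expected at least {min_rows} rows, got {len(rows)}")
    # Basic validation: every row needs a key.
    missing = [i for i, r in enumerate(rows) if r.get("id") is None]
    if missing:
        raise DataQualityError(f"null id at row indexes {missing}")
    # Freshness: the newest row must be recent enough.
    if rows:
        age = now - max(r["updated_at"] for r in rows)
        if age > max_age:
            raise DataQualityError(f"stale data: newest row is {age} old")
    return True

now = datetime(2024, 1, 2, tzinfo=timezone.utc)
rows = [
    {"id": 1, "updated_at": now - timedelta(hours=1)},
    {"id": 2, "updated_at": now - timedelta(hours=3)},
]
ok = check_batch(rows, min_rows=1, max_age=timedelta(hours=2), now=now)
```

Raising an exception rather than logging a warning lets an orchestrator mark the task as failed and stop downstream steps from consuming bad data.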

Best For

Building data warehouses with proper dimensional modeling
Migrating legacy ETL jobs to modern orchestration tools
Creating reliable data feeds for analytics and ML pipelines
Implementing CDC (Change Data Capture) for real-time sync

Design Principles

All generated pipelines follow these principles: idempotent operations (safe to re-run), incremental by default (process only new/changed data), fail loudly with clear error messages, and produce lineage metadata for debugging.
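One common way to get "incremental by default" is a stored high-water mark. The sketch below assumes ISO-8601 timestamp strings (which sort lexicographically) and uses hypothetical `events` and `watermark` tables; it is an illustration of the idea, not the skill's generated code.

```python
import sqlite3

def incremental_load(conn, source_rows):
    """Load only rows newer than the stored watermark; return how many were loaded."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, updated_at TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS watermark (name TEXT PRIMARY KEY, value TEXT NOT NULL)"
    )
    row = conn.execute("SELECT value FROM watermark WHERE name = 'events'").fetchone()
    last_seen = row[0] if row else ""
    # Keep only rows strictly newer than the last processed timestamp.
    new_rows = [r for r in source_rows if r[1] > last_seen]
    if not new_rows:
        return 0
    with conn:  # one transaction: data and watermark commit or roll back together
        conn.executemany(
            "INSERT OR REPLACE INTO events (id, updated_at) VALUES (?, ?)", new_rows
        )
        conn.execute(
            "INSERT OR REPLACE INTO watermark (name, value) VALUES ('events', ?)",
            (max(r[1] for r in new_rows),),
        )
    return len(new_rows)

conn = sqlite3.connect(":memory:")
source = [(1, "2024-01-01T00:00:00"), (2, "2024-01-02T00:00:00")]
first = incremental_load(conn, source)   # loads both rows
source.append((3, "2024-01-03T00:00:00"))
second = incremental_load(conn, source)  # loads only the new row
third = incremental_load(conn, source)   # nothing new, loads nothing
```

A strict "greater than" comparison can miss late-arriving rows that share the watermark timestamp, so real pipelines often add an overlap window or a tie-breaking key.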

Use Cases

Design Airflow DAGs for multi-stage data processing
Create dbt models with proper staging, intermediate, and mart layers
Build Python ETL scripts with error handling and idempotency
Implement incremental load patterns for large datasets
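To make the multi-stage idea concrete without depending on a specific orchestrator, the sketch below orders tasks by their dependencies and retries failures in plain Python. This is not Airflow's API; `run_pipeline` and the task names are hypothetical, and `graphlib` requires Python 3.9 or newer.

```python
import time
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def run_pipeline(tasks, dependencies, retries=2, delay=0.0):
    """Run tasks in dependency order; retry each up to `retries` extra times."""
    order = list(TopologicalSorter(dependencies).static_order())
    completed = []
    for name in order:
        for attempt in range(retries + 1):
            try:
                tasks[name]()
                completed.append(name)
                break
            except Exception as exc:
                if attempt == retries:
                    # Fail loudly once retries are exhausted.
                    raise RuntimeError(
                        f"task {name!r} failed after {attempt + 1} attempts"
                    ) from exc
                time.sleep(delay)
    return completed

attempts = {"extract": 0}

def extract():
    attempts["extract"] += 1
    if attempts["extract"] == 1:
        raise ConnectionError("transient source failure")  # fails once, then succeeds

tasks = {"extract": extract, "transform": lambda: None, "load": lambda: None}
# Each key maps to the set of tasks that must finish before it.
dependencies = {"transform": {"extract"}, "load": {"transform"}}
completed = run_pipeline(tasks, dependencies)
```

An orchestrator such as Airflow or Prefect adds scheduling, state tracking, and parallelism on top of this same ordering-plus-retry idea.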

Pros & Cons

Pros

+ Covers ETL, ELT, and streaming patterns comprehensively
+ Emphasizes reliability with idempotency and incremental loads
+ Multi-tool support: Airflow, dbt, Prefect, plain Python

Cons

- Cannot test pipelines against live data sources
- Complex streaming architectures (Kafka/Flink) need additional expertise

Related Skills

SQL Optimizer

Analyzes SQL queries for performance issues, rewrites slow queries, recommends index strategies, and explains execution plans across PostgreSQL, MySQL, and SQLite.

Schema Designer

Designs relational and NoSQL database schemas with proper normalization, indexing strategies, migration scripts, and entity-relationship diagrams.

CSV Transformer

Transforms, cleans, and converts data between CSV, JSON, Excel, and other tabular formats with column mapping, type casting, and validation.

FAQ

What platforms support Data Pipeline?

Data Pipeline is available on Claude Code, Cursor, OpenAI Codex CLI, and Windsurf.
