Data Scraper

Web page data collection and structured text extraction

# data-scraper

Web Data Scraper extracts structured data from web pages using curl plus HTML parsing. It is lightweight and requires no browser, and it supports HTML-to-text conversion, table extraction, price monitoring, and batch scraping.

When to Use

  • Extract text content from web pages (articles, blogs, docs)
  • Scrape product prices, reviews, or listings
  • Monitor pages for changes (price drops, new content)
  • Batch-collect data from multiple URLs
  • Convert HTML tables to structured formats (JSON/CSV)

Quick Start

```bash
# Extract readable text from a URL
data-scraper fetch "https://example.com/article"

# Extract specific elements
data-scraper extract "https://example.com" --selector "h2, .price"

# Monitor for changes
data-scraper watch "https://example.com/product" --interval 3600
```

Extraction Modes

Text Mode (default)

Fetches the page and extracts readable content, stripping HTML tags, scripts, and styles, similar to reader mode.

```bash
data-scraper fetch URL
# Output: clean markdown text
```
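
The listing does not show how text mode is implemented. As a minimal sketch, assuming only Python's standard-library `html.parser`, the idea of dropping `<script>` and `<style>` contents while keeping visible text can look like the following. `html_to_text` is a hypothetical name, and this version emits plain text with one line per block element rather than markdown.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping everything inside <script> and <style>."""

    SKIP = {"script", "style"}
    # Tags that should start a new line in the output.
    BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    raw = "".join(parser.parts)
    # Collapse runs of whitespace inside each line and drop blank lines.
    lines = (" ".join(line.split()) for line in raw.splitlines())
    return "\n".join(line for line in lines if line)
```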

Selector Mode

Targets specific CSS selectors for precise extraction.

```bash
data-scraper extract URL --selector ".product-title, .price, .rating"
# Output: matched elements as structured data
```
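
Matching arbitrary CSS selectors needs a full selector engine. The hypothetical `select_by_class` below is a deliberately narrow Python sketch: it accepts only comma-separated class selectors such as `.product-title, .price, .rating` (element selectors like the `h2` in Quick Start are rejected), and its list-of-dicts output is an assumed shape, not the skill's documented format.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they are never pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class _ClassSelector(HTMLParser):
    def __init__(self, classes):
        super().__init__(convert_charrefs=True)
        self.classes = classes
        self.results = []  # one dict per matched element, in document order
        self._stack = []   # open elements as (tag, index into results or None)

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        present = (dict(attrs).get("class") or "").split()
        hit = next((c for c in self.classes if c in present), None)
        index = None
        if hit is not None:
            self.results.append({"selector": "." + hit, "text": ""})
            index = len(self.results) - 1
        self._stack.append((tag, index))

    def handle_endtag(self, tag):
        # Pop back to the nearest open element with this tag, which also
        # discards any children that were never explicitly closed.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                del self._stack[i:]
                return

    def handle_data(self, data):
        # Text belongs to every matched element that is currently open.
        for _, index in self._stack:
            if index is not None:
                self.results[index]["text"] += data


def select_by_class(html, selector):
    """Return [{"selector": ".x", "text": ...}, ...] for simple class selectors."""
    classes = []
    for part in selector.split(","):
        part = part.strip()
        name = part[1:]
        if not part.startswith(".") or not name or any(ch in name for ch in " .#>[:"):
            raise ValueError(f"only simple class selectors are supported: {part!r}")
        classes.append(name)
    parser = _ClassSelector(classes)
    parser.feed(html)
    parser.close()
    for result in parser.results:
        result["text"] = " ".join(result["text"].split())
    return parser.results
```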

Table Mode

Extracts HTML tables into structured formats.

```bash
data-scraper table URL --index 0
# Output: JSON array of row objects (header → value mapping)
```
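
A minimal Python sketch of the header → value mapping, assuming well-formed, non-nested tables whose first row is the header and whose `<tr>`, `<th>`, and `<td>` tags are all closed explicitly. `table_to_records` is a hypothetical name, and its `index` argument only mirrors `--index` by analogy.

```python
from html.parser import HTMLParser


class _TableParser(HTMLParser):
    """Collects every <table> as a list of rows of cell text.

    Assumes tables are not nested and that <tr>, <th>, and <td> are closed explicitly.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tables = []  # each table: list of rows; each row: list of cell strings
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None
            self._cell = None  # drop any cell left open by malformed markup

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_to_records(html, index=0):
    """Map table `index` to a list of {header: value} dicts (first row = header)."""
    parser = _TableParser()
    parser.feed(html)
    parser.close()
    header, *body = parser.tables[index]
    return [dict(zip(header, row)) for row in body]
```

Passing the returned list to `json.dumps` produces the JSON array of row objects described above.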

Link Mode

Extracts all links from a page, with optional filtering.

```bash
data-scraper links URL --filter "*.pdf"
# Output: filtered list of absolute URLs
```
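
One plausible way to produce absolute, filtered links, sketched in Python with the standard library: resolve each `href` against the page URL with `urllib.parse.urljoin`, keep only http(s) links, and match the glob against the URL path. Matching the path rather than the full URL (so query strings do not defeat `*.pdf`) and ignoring case are assumptions about how `--filter` behaves; `extract_links` is hypothetical.

```python
from fnmatch import fnmatchcase
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class _LinkParser(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def extract_links(html, base_url, pattern=None):
    """Return unique absolute http(s) links, optionally filtered by a glob on the URL path."""
    parser = _LinkParser()
    parser.feed(html)
    parser.close()
    seen, links = set(), []
    for href in parser.hrefs:
        absolute = urljoin(base_url, href.strip())
        parts = urlsplit(absolute)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar links
        # Match against the path only, case-insensitively, so "*.pdf" also
        # matches "/a.PDF?download=1".
        if pattern and not fnmatchcase(parts.path.lower(), pattern.lower()):
            continue
        if absolute not in seen:
            seen.add(absolute)
            links.append(absolute)
    return links
```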

Batch Scraping

```bash
# Scrape multiple URLs
data-scraper batch urls.txt --output results/

# With rate limiting
data-scraper batch urls.txt --delay 2000 --output results/
```

`urls.txt` format:

```
https://site1.com/page
https://site2.com/page
https://site3.com/page
```
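
The batch loop itself is simple. This Python sketch assumes `--delay` is in milliseconds (the listing gives `2000` without a unit) and takes the HTTP call and the storage step as caller-supplied functions, so it runs without network access. `parse_url_list`, `output_name`, and `batch_scrape` are hypothetical names, and the output file naming is invented for illustration.

```python
import re
import time
from urllib.parse import urlsplit


def parse_url_list(text):
    """urls.txt holds one URL per line; blank lines are ignored."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def output_name(url):
    """Derive a filesystem-safe name, e.g. https://site1.com/page -> site1.com_page.txt."""
    parts = urlsplit(url)
    slug = re.sub(r"[^A-Za-z0-9._-]+", "_", parts.netloc + parts.path).strip("_")
    return (slug or "index") + ".txt"


def batch_scrape(urls, fetch, save, delay_ms=1000):
    """Fetch URLs in order, pausing delay_ms between consecutive requests.

    fetch(url) -> str and save(name, body) are supplied by the caller, so the
    same loop works with any HTTP client or storage backend.
    """
    names = []
    for i, url in enumerate(urls):
        if i:
            time.sleep(delay_ms / 1000)  # assumed: --delay is in milliseconds
        name = output_name(url)
        save(name, fetch(url))
        names.append(name)
    return names
```

To write into `results/` as in the example above, `save` could be `lambda name, body: (Path("results") / name).write_text(body, encoding="utf-8")`, with `Path` from `pathlib` and the directory created beforehand.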

Change Monitoring

```bash
# Watch for changes, alert on diff
data-scraper watch URL --selector ".price" --interval 3600

# Compare with previous snapshot
data-scraper diff URL
```

Snapshots are stored with timestamps in `data-scraper/snapshots/`, and an alert is sent via notification-hub when a change is detected.
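
Detecting a change comes down to comparing the newest snapshot with the previous one. A minimal Python sketch using `difflib.unified_diff`: an empty diff means nothing changed, and a non-empty one is the point at which an alert would be sent. The snapshot file-naming scheme (a URL hash plus a UTC timestamp) is an assumption, since the listing only says snapshots are timestamped, and the notification-hub call is not shown.

```python
import difflib
import hashlib
from datetime import datetime, timezone


def snapshot_name(url, now=None):
    """Hypothetical naming scheme: <12-char URL hash>-<UTC timestamp>.txt."""
    now = now or datetime.now(timezone.utc)
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:12]
    return f"{digest}-{now.strftime('%Y%m%dT%H%M%SZ')}.txt"


def diff_snapshots(previous, current):
    """Unified diff between two snapshots; an empty list means no change."""
    return list(difflib.unified_diff(
        previous.splitlines(), current.splitlines(),
        fromfile="previous", tofile="current", lineterm="",
    ))
```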

Output Formats

| Format | Flag | Use Case |
|--------|------|----------|
| Text | `--format text` | Reading, summarization |
| JSON | `--format json` | Data processing |
| CSV | `--format csv` | Spreadsheets |
| Markdown | `--format md` | Documentation |
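
As an illustration of what the four formats could look like for the same extracted rows, here is a Python sketch built on the standard `json` and `csv` modules. The function name `render` and the exact text and markdown layouts are assumptions, not the tool's documented output.

```python
import csv
import io
import json


def render(records, fmt):
    """Render a list of dicts sharing the same keys as text, json, csv, or md."""
    if fmt == "json":
        return json.dumps(records, indent=2)
    columns = list(records[0]) if records else []
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=columns, lineterminator="\n")
        writer.writeheader()
        writer.writerows(records)
        return buf.getvalue()
    if fmt == "md":
        # Pipes inside values are not escaped in this sketch.
        lines = ["| " + " | ".join(columns) + " |",
                 "|" + "|".join("---" for _ in columns) + "|"]
        lines += ["| " + " | ".join(str(r.get(c, "")) for c in columns) + " |"
                  for r in records]
        return "\n".join(lines)
    if fmt == "text":
        return "\n".join(", ".join(f"{c}: {r.get(c, '')}" for c in columns)
                         for r in records)
    raise ValueError(f"unknown format: {fmt}")
```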

Headers & Auth

```bash
# Custom headers
data-scraper fetch URL --header "Authorization: Bearer TOKEN"

# Cookie-based auth
data-scraper fetch URL --cookie "session=abc123"

# User-Agent override
data-scraper fetch URL --ua "Mozilla/5.0..."
```
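
For reference, curl's own `-H`/`--header`, `-b`/`--cookie`, and `-A`/`--user-agent` options cover the same three cases. The Python sketch below builds an equivalent `urllib.request.Request` without sending it; `build_request` and the mapping from the skill's flags to headers are assumptions.

```python
from urllib.request import Request


def build_request(url, headers=(), cookie=None, user_agent=None):
    """Assemble a GET request from --header/--cookie/--ua style inputs (assumed mapping)."""
    req = Request(url)
    for raw in headers:
        name, sep, value = raw.partition(":")
        if not sep:
            raise ValueError(f"header must look like 'Name: value': {raw!r}")
        req.add_header(name.strip(), value.strip())
    if cookie:
        req.add_header("Cookie", cookie)
    if user_agent:
        req.add_header("User-Agent", user_agent)
    return req
```

`urllib` normalizes header names with `str.capitalize()`, so the user agent is stored as `User-agent`. Sending the request would be `urllib.request.urlopen(req, timeout=10)`.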

Rate Limiting & Ethics

  • Default: 1 request per second per domain
  • Respects `robots.txt` when the `--polite` flag is set
  • Configurable delay between requests
  • Stops on 429 (Too Many Requests) and backs off
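
The first two rules can be sketched in Python with a per-domain throttle and the standard library's `urllib.robotparser`. `DomainThrottle`, `robots_allows`, and the `data-scraper` user-agent string are hypothetical; the clock and sleep functions are injectable so the behavior can be checked without waiting. Fetching `robots.txt` itself is left out.

```python
import time
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


class DomainThrottle:
    """Allow at most one request per `interval` seconds for each domain."""

    def __init__(self, interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self.clock = clock
        self.sleep = sleep
        self._last = {}  # domain -> clock value of its most recent request

    def wait(self, url):
        domain = urlsplit(url).netloc.lower()
        now = self.clock()
        last = self._last.get(domain)
        if last is not None and now - last < self.interval:
            self.sleep(self.interval - (now - last))
            now = self.clock()
        self._last[domain] = now


def robots_allows(robots_txt, url, user_agent="data-scraper"):
    """Check a URL against robots.txt rules that were already downloaded."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules.can_fetch(user_agent, url)
```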

Error Handling

| Error | Behavior |
|-------|----------|
| 404 | Log and skip |
| 403/401 | Warn about auth requirement |
| 429 | Exponential backoff (max 3 retries) |
| Timeout | Retry once with a longer timeout |
| SSL error | Warn, option to proceed with `--insecure` |
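
A Python sketch of the retry rules in this table, assuming a caller-supplied `fetch(url, timeout)` that returns `(status, body)` or raises a placeholder `FetchTimeout`. The 1-second base delay and the doubling of the timeout on retry are assumptions (the table says only "exponential backoff" and "longer timeout"), `fetch_with_policy` is a hypothetical name, and SSL errors are not handled here.

```python
import logging
import time

log = logging.getLogger("data-scraper")


class FetchTimeout(Exception):
    """Placeholder for whatever timeout error the real HTTP client raises."""


def fetch_with_policy(fetch, url, timeout=10.0, base_delay=1.0,
                      max_retries=3, sleep=time.sleep):
    """Apply the error-handling table to a single URL.

    fetch(url, timeout) must return (status, body) or raise FetchTimeout.
    Returns the body, or None when the URL is skipped.
    """
    retries_429 = 0
    retried_timeout = False
    while True:
        try:
            status, body = fetch(url, timeout)
        except FetchTimeout:
            if retried_timeout:
                raise  # the single retry also timed out
            retried_timeout = True
            timeout *= 2  # "longer timeout": doubling is an assumption
            continue
        if status == 429:
            if retries_429 == max_retries:
                raise RuntimeError(f"{url}: still 429 after {max_retries} retries")
            sleep(base_delay * 2 ** retries_429)  # 1s, 2s, 4s with the defaults
            retries_429 += 1
            continue
        if status == 404:
            log.info("404 for %s, skipping", url)
            return None
        if status in (401, 403):
            log.warning("%s for %s: authentication may be required", status, url)
            return None
        return body  # other statuses are passed through unchanged
```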

Integration

  • web-claude: Use as a fallback when web_fetch isn't enough
  • competitor-watch: Feed scraped data into competitor analysis
  • seo-audit: Scrape competitor pages for SEO comparison
  • performance-tracker: Collect social metrics from public profiles

Use Cases

  • Scrape structured data from websites and export to JSON, CSV, or database formats
  • Extract product listings, pricing data, or directory information from web pages
  • Set up recurring scraping jobs for monitoring price changes or content updates
  • Handle pagination, dynamic loading, and login-protected content during scraping
  • Clean and validate scraped data before loading into analytics pipelines

Pros & Cons

Pros

  • Handles common scraping challenges like pagination and dynamic content loading
  • Multiple export formats support different downstream data processing needs
  • Configurable extraction rules adapt to different website structures

Cons

  • Web scraping may violate target website terms of service
  • Only available on claude-code and openclaw platforms
  • Scrapers break when target websites change their HTML structure

FAQ

What does Data Scraper do?
Web page data collection and structured text extraction
What platforms support Data Scraper?
Data Scraper is available on Claude Code and OpenClaw.
What are the use cases for Data Scraper?
Scrape structured data from websites and export to JSON, CSV, or database formats. Extract product listings, pricing data, or directory information from web pages. Set up recurring scraping jobs for monitoring price changes or content updates.
