Data Scraper

Verified

Web page data collection and structured text extraction

355 downloads

$ Add to .claude/skills/

$ openclaw install

About This Skill

# data-scraper

Web Data Scraper — Extract structured data from web pages using curl + parsing. Lightweight, no browser required. Supports HTML-to-text, table extraction, price monitoring, and batch scraping.

When to Use

Extract text content from web pages (articles, blogs, docs)
Scrape product prices, reviews, or listings
Monitor pages for changes (price drops, new content)
Batch-collect data from multiple URLs
Convert HTML tables to structured formats (JSON/CSV)

Quick Start

```bash # Extract readable text from URL data-scraper fetch "https://example.com/article"

# Extract specific elements data-scraper extract "https://example.com" --selector "h2, .price"

# Monitor for changes data-scraper watch "https://example.com/product" --interval 3600 ```

Extraction Modes

Text Mode (default) Fetches page and extracts readable content, stripping HTML tags, scripts, and styles. Similar to reader mode.

```bash data-scraper fetch URL # Output: clean markdown text ```

Selector Mode Target specific CSS selectors for precise extraction.

```bash data-scraper extract URL --selector ".product-title, .price, .rating" # Output: matched elements as structured data ```

Table Mode Extract HTML tables into structured formats.

```bash data-scraper table URL --index 0 # Output: JSON array of row objects (header → value mapping) ```

Link Mode Extract all links from a page with optional filtering.

```bash data-scraper links URL --filter "*.pdf" # Output: filtered list of absolute URLs ```

Batch Scraping

```bash # Scrape multiple URLs data-scraper batch urls.txt --output results/

# With rate limiting data-scraper batch urls.txt --delay 2000 --output results/ ```

`urls.txt` format: ``` https://site1.com/page https://site2.com/page https://site3.com/page ```

Change Monitoring

```bash # Watch for changes, alert on diff data-scraper watch URL --selector ".price" --interval 3600

# Compare with previous snapshot data-scraper diff URL ```

Stores snapshots in `data-scraper/snapshots/` with timestamps. Alerts via notification-hub when changes detected.

Output Formats

| Format | Flag | Use Case | |--------|------|----------| | Text | `--format text` | Reading, summarization | | JSON | `--format json` | Data processing | | CSV | `--format csv` | Spreadsheets | | Markdown | `--format md` | Documentation |

Headers & Auth

```bash # Custom headers data-scraper fetch URL --header "Authorization: Bearer TOKEN"

# Cookie-based auth data-scraper fetch URL --cookie "session=abc123"

# User-Agent override data-scraper fetch URL --ua "Mozilla/5.0..." ```

Rate Limiting & Ethics

Default: 1 request per second per domain
Respects `robots.txt` when `--polite` flag is set
Configurable delay between requests
Stops on 429 (Too Many Requests) and backs off

Error Handling

| Error | Behavior | |-------|----------| | 404 | Log and skip | | 403/401 | Warn about auth requirement | | 429 | Exponential backoff (max 3 retries) | | Timeout | Retry once with longer timeout | | SSL error | Warn, option to proceed with `--insecure` |

Integration

web-claude: Use as fallback when web_fetch isn't enough
competitor-watch: Feed scraped data into competitor analysis
seo-audit: Scrape competitor pages for SEO comparison
performance-tracker: Collect social metrics from public profiles

Use Cases

Scrape structured data from websites and export to JSON, CSV, or database formats
Extract product listings, pricing data, or directory information from web pages
Set up recurring scraping jobs for monitoring price changes or content updates
Handle pagination, dynamic loading, and login-protected content during scraping
Clean and validate scraped data before loading into analytics pipelines

Pros & Cons

Pros

+Handles common scraping challenges like pagination and dynamic content loading
+Multiple export formats support different downstream data processing needs
+Configurable extraction rules adapt to different website structures

Cons

-Web scraping may violate target website terms of service
-Only available on claude-code and openclaw platforms
-Scrapers break when target websites change their HTML structure

FAQ

What does Data Scraper do?

Web page data collection and structured text extraction

What platforms support Data Scraper?

Data Scraper is available on Claude Code, OpenClaw.

What are the use cases for Data Scraper?

Scrape structured data from websites and export to JSON, CSV, or database formats. Extract product listings, pricing data, or directory information from web pages. Set up recurring scraping jobs for monitoring price changes or content updates.

100+ free AI tools

Writing, PDF, image, and developer tools — all in your browser.

AI Humanizer

Make AI text undetectable

AI Detector

Free, unlimited

PDF Tools

Merge, split, compress

Next Step

Use the skill detail page to evaluate fit and install steps. For a direct browser workflow, move into a focused tool route instead of staying in broader support surfaces.

Open Free Tools Try AI Detector