
Playwright Scraper Skill

Verified

A Playwright-based web-scraping skill for OpenClaw with anti-bot protection, successfully tested on complex sites such as Discuss.com.hk.

8,769 downloads

About This Skill

# Playwright Scraper Skill

A Playwright-based web-scraping skill for OpenClaw with anti-bot protection. Choose an approach based on how strongly the target website defends against bots.

---

🎯 Use Case Matrix

| Target Website | Anti-Bot Level | Recommended Method | Script |
|---------------|----------------|-------------------|--------|
| Regular Sites | Low | `web_fetch` tool | N/A (built-in) |
| Dynamic Sites | Medium | Playwright Simple | `scripts/playwright-simple.js` |
| Cloudflare Protected | High | Playwright Stealth ⭐ | `scripts/playwright-stealth.js` |
| YouTube | Special | deep-scraper | Install separately |
| Reddit | Special | reddit-scraper | Install separately |
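The matrix can be read as a simple lookup. The sketch below encodes it. `recommendMethod`, the `low`/`medium`/`high` level names and the hostname patterns are hypothetical. The caller still has to judge the anti-bot level, usually by escalating after a failed attempt.

```javascript
// Hypothetical helper that encodes the use-case matrix as a lookup table.
const METHODS = {
  low: { method: "web_fetch", script: null },
  medium: { method: "Playwright Simple", script: "scripts/playwright-simple.js" },
  high: { method: "Playwright Stealth", script: "scripts/playwright-stealth.js" },
};

// Sites the matrix routes to separately installed skills, matched by hostname.
const SPECIAL_SITES = [
  { pattern: /(^|\.)youtube\.com$|(^|\.)youtu\.be$/, method: "deep-scraper" },
  { pattern: /(^|\.)reddit\.com$/, method: "reddit-scraper" },
];

function recommendMethod(url, antiBotLevel) {
  const host = new URL(url).hostname;
  // Special sites win regardless of the anti-bot level.
  const special = SPECIAL_SITES.find((s) => s.pattern.test(host));
  if (special) return { method: special.method, script: null };
  const entry = METHODS[antiBotLevel];
  if (!entry) throw new Error(`Unknown anti-bot level: ${antiBotLevel}`);
  return entry;
}
```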

---

📦 Installation

```bash
cd playwright-scraper-skill
npm install
npx playwright install chromium
```

---

🚀 Quick Start

1️⃣ Simple Sites (No Anti-Bot)

Use OpenClaw's built-in `web_fetch` tool by asking for it directly in OpenClaw:

```text
Hey, fetch me the content from https://example.com
```

---

2️⃣ Dynamic Sites (Requires JavaScript)

Use Playwright Simple:

```bash
node scripts/playwright-simple.js "https://example.com"
```

Example output:

```json
{
  "url": "https://example.com",
  "title": "Example Domain",
  "content": "...",
  "elapsedSeconds": "3.45"
}
```
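A "Playwright Simple" script that produces this output could look roughly like the following. This is a minimal sketch, not the contents of `scripts/playwright-simple.js`. The `buildResult`/`scrapeSimple` names, the 3-second default wait and the `domcontentloaded` strategy are assumptions. Playwright is loaded lazily, so the result-building helper works without it.

```javascript
// Minimal sketch (not the skill's actual script) of a JS-rendering scrape.

// Build the JSON shape shown in the example output.
function buildResult(url, title, content, startedAtMs, finishedAtMs) {
  return {
    url,
    title,
    content,
    // toFixed returns a string, matching "elapsedSeconds": "3.45".
    elapsedSeconds: ((finishedAtMs - startedAtMs) / 1000).toFixed(2),
  };
}

async function scrapeSimple(url, waitTimeMs = 3000) {
  // Required lazily so buildResult can be used without Playwright installed.
  const { chromium } = require("playwright");
  const startedAt = Date.now();
  const browser = await chromium.launch({ headless: true });
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: "domcontentloaded", timeout: 30000 });
    // Give client-side JavaScript time to render the content.
    await page.waitForTimeout(waitTimeMs);
    const title = await page.title();
    const content = await page.evaluate(() => document.body.innerText);
    return buildResult(url, title, content, startedAt, Date.now());
  } finally {
    await browser.close();
  }
}

// Only scrape when a URL is passed on the command line.
const targetUrl = process.argv[2];
if (targetUrl) {
  scrapeSimple(targetUrl, Number(process.env.WAIT_TIME) || 3000)
    .then((result) => console.log(JSON.stringify(result, null, 2)))
    .catch((err) => {
      console.error(err);
      process.exitCode = 1;
    });
}
```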

---

3️⃣ Anti-Bot Protected Sites (Cloudflare etc.)

Use Playwright Stealth:

```bash
node scripts/playwright-stealth.js "https://m.discuss.com.hk/#hot"
```

Features:
  • Hides automation markers (makes `navigator.webdriver` report `false`)
  • Realistic User-Agent (iPhone, Android)
  • Random delays to mimic human behavior
  • Screenshot and HTML saving support
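These features can be combined in a script along the following lines. This is a minimal sketch, not the contents of `scripts/playwright-stealth.js`. The iPhone User-Agent string, viewport, `zh-HK` locale, 2 to 5 s delay window, output file names and the helper names are all assumptions.

```javascript
// Minimal stealth-scrape sketch; UA, viewport, locale and delays are placeholders.
const IPHONE_UA =
  "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) AppleWebKit/605.1.15 " +
  "(KHTML, like Gecko) Version/17.0 Mobile/15E148 Safari/604.1";

// Standard browser.newContext() options that make the session look like a phone.
function stealthContextOptions(userAgent = IPHONE_UA) {
  return {
    userAgent,
    viewport: { width: 390, height: 844 },
    isMobile: true,
    hasTouch: true,
    locale: "zh-HK",
  };
}

// Random integer in [min, max] milliseconds, used as a human-like pause.
function randomDelayMs(min, max) {
  return min + Math.floor(Math.random() * (max - min + 1));
}

async function scrapeStealth(url, { headless = true, userAgent, screenshotPath = "screenshot.png", htmlPath } = {}) {
  // Required lazily so the helpers above work without Playwright installed.
  const { chromium } = require("playwright");
  const fs = require("node:fs");
  const browser = await chromium.launch({ headless });
  try {
    const context = await browser.newContext(stealthContextOptions(userAgent));
    // addInitScript runs in every new document before the site's own scripts,
    // so detection code reads the patched value.
    await context.addInitScript(() => {
      Object.defineProperty(navigator, "webdriver", { get: () => false });
    });
    const page = await context.newPage();
    const response = await page.goto(url, { waitUntil: "domcontentloaded", timeout: 60000 });
    await page.waitForTimeout(randomDelayMs(2000, 5000));
    await page.screenshot({ path: screenshotPath, fullPage: true });
    if (htmlPath) fs.writeFileSync(htmlPath, await page.content());
    return {
      url,
      status: response ? response.status() : null,
      title: await page.title(),
      content: await page.evaluate(() => document.body.innerText),
    };
  } finally {
    await browser.close();
  }
}

// Only scrape when a URL is passed on the command line.
const targetUrl = process.argv[2];
if (targetUrl) {
  scrapeStealth(targetUrl, { htmlPath: "page.html" })
    .then((r) => console.log(JSON.stringify(r, null, 2)))
    .catch((err) => {
      console.error(err);
      process.exitCode = 1;
    });
}
```

Patching `navigator.webdriver` in `addInitScript`, rather than after `page.goto`, matters because detection scripts typically run as soon as the page loads.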

---

4️⃣ YouTube Video Transcripts

Use deep-scraper (install separately):

```bash
# Install deep-scraper skill
npx clawhub install deep-scraper

# Use it
cd skills/deep-scraper
node assets/youtube_handler.js "https://www.youtube.com/watch?v=VIDEO_ID"
```

---

📖 Script Descriptions

`scripts/playwright-simple.js`
- **Use Case:** Regular dynamic websites
- **Speed:** Fast (3-5 seconds)
- **Anti-Bot:** None
- **Output:** JSON (title, content, URL)

`scripts/playwright-stealth.js` ⭐
- **Use Case:** Sites with Cloudflare or anti-bot protection
- **Speed:** Medium (5-20 seconds)
- **Anti-Bot:** Medium-High (hides automation, realistic UA)
- **Output:** JSON + Screenshot + HTML file
- **Verified:** 100% success on Discuss.com.hk

---

🎓 Best Practices

1. **Try `web_fetch` first.** If the site doesn't load content dynamically, use OpenClaw's `web_fetch` tool; it's the fastest option.

2. **Need JavaScript? Use Playwright Simple.** If you need to wait for JavaScript rendering, use `playwright-simple.js`.

3. **Getting blocked? Use Stealth.** If you encounter 403 errors or Cloudflare challenges, use `playwright-stealth.js`.

4. **Special sites need specialized skills.**
   - YouTube → deep-scraper
   - Reddit → reddit-scraper
   - Twitter → bird skill
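The escalation order (`web_fetch`, then Playwright Simple, then Stealth) can be sketched as a loop that stops at the first result that doesn't look blocked. The `looksBlocked` heuristic (HTTP 403/503 or an empty body) and the `{ name, run }` method objects are assumptions. Wiring each `run` to the real `web_fetch` tool or to the scripts is left to the caller.

```javascript
// Hypothetical escalation helper. Each method is { name, run }, where run is an
// async function url -> { status, content } supplied by the caller.

// Assumed heuristic: 403/503 or an empty body means the method was blocked.
function looksBlocked(result) {
  if (!result) return true;
  if (result.status === 403 || result.status === 503) return true;
  return !result.content || result.content.trim() === "";
}

async function scrapeWithEscalation(url, methods) {
  const attempts = [];
  for (const { name, run } of methods) {
    try {
      const result = await run(url);
      attempts.push({ name, status: result ? result.status : null });
      if (!looksBlocked(result)) return { method: name, result, attempts };
    } catch (err) {
      // A thrown error (timeout, crash) also counts as a failed attempt.
      attempts.push({ name, error: String(err) });
    }
  }
  throw new Error(`All methods failed for ${url}: ${JSON.stringify(attempts)}`);
}
```

Recording every attempt makes it easy to log which tier a site needed, which is how results like the Discuss.com.hk comparison in this document can be collected.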

---

🔧 Customization

All scripts support environment variables:

```bash
# Set screenshot path
SCREENSHOT_PATH=/path/to/screenshot.png node scripts/playwright-stealth.js URL

# Set wait time (milliseconds)
WAIT_TIME=10000 node scripts/playwright-simple.js URL

# Enable headful mode (show browser)
HEADLESS=false node scripts/playwright-stealth.js URL

# Save HTML
SAVE_HTML=true node scripts/playwright-stealth.js URL

# Custom User-Agent
USER_AGENT="Mozilla/5.0 ..." node scripts/playwright-stealth.js URL
```
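A script might read these variables into a single options object as sketched below. `readConfig` and its defaults (`screenshot.png`, 5000 ms) are hypothetical, not taken from the skill's scripts. The string comparisons mirror the examples: only the literal `false` enables headful mode and only `true` enables HTML saving.

```javascript
// Hypothetical config reader for the environment variables listed above.
// The default values are illustrative, not the skill's actual defaults.
function readConfig(env = process.env) {
  const waitTime = Number.parseInt(env.WAIT_TIME ?? "", 10);
  return {
    screenshotPath: env.SCREENSHOT_PATH || "screenshot.png",
    // Fall back to a default when WAIT_TIME is missing, negative or not a number.
    waitTimeMs: Number.isFinite(waitTime) && waitTime >= 0 ? waitTime : 5000,
    // Only HEADLESS=false turns on headful mode.
    headless: env.HEADLESS !== "false",
    // Only SAVE_HTML=true turns on HTML saving.
    saveHtml: env.SAVE_HTML === "true",
    // undefined means "use the script's built-in User-Agent".
    userAgent: env.USER_AGENT || undefined,
  };
}
```

The result can be passed straight to Playwright, for example `chromium.launch({ headless: config.headless })`.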

---

📊 Performance Comparison

| Method | Speed | Anti-Bot | Success Rate (Discuss.com.hk) |
|--------|-------|----------|-------------------------------|
| web_fetch | ⚡ Fastest | ❌ None | 0% |
| Playwright Simple | 🚀 Fast | ⚠️ Low | 20% |
| Playwright Stealth | ⏱️ Medium | ✅ Medium | 100% ✅ |
| Puppeteer Stealth | ⏱️ Medium | ✅ Medium-High | ~80% |
| Crawlee (deep-scraper) | 🐢 Slow | ❌ Detected | 0% |
| Chaser (Rust) | ⏱️ Medium | ❌ Detected | 0% |

---

🛡️ Anti-Bot Techniques Summary

Lessons learned from our testing:

✅ Effective Anti-Bot Measures
1. **Hide `navigator.webdriver`:** essential
2. **Realistic User-Agent:** use real device strings (iPhone, Android)
3. **Mimic human behavior:** random delays, scrolling
4. **Avoid framework signatures:** Crawlee and Selenium are easily detected
5. **Use `addInitScript` (Playwright):** inject before the page loads

❌ Ineffective Anti-Bot Measures
1. **Only changing the User-Agent:** not enough on its own
2. **Using high-level frameworks (Crawlee):** more easily detected
3. **Docker isolation:** doesn't help with Cloudflare
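"Random delays, scrolling" can be made concrete by planning a scroll as uneven steps with random pauses, then replaying it with Playwright's `page.mouse.wheel`. A minimal sketch follows. The step sizes (150-450 px), pause range (300-1200 ms) and the `planScroll`/`humanScroll` names are assumptions, not tuned values from the tests above.

```javascript
// Hypothetical human-like scrolling helpers; ranges are illustrative.

// Random integer in [min, max]; rng is injectable so plans can be reproduced.
function randomInt(min, max, rng = Math.random) {
  return min + Math.floor(rng() * (max - min + 1));
}

// Split a scroll of totalPx pixels into uneven steps, each followed by a pause.
function planScroll(totalPx, rng = Math.random) {
  const steps = [];
  let scrolled = 0;
  while (scrolled < totalPx) {
    const dy = Math.min(randomInt(150, 450, rng), totalPx - scrolled);
    steps.push({ dy, pauseMs: randomInt(300, 1200, rng) });
    scrolled += dy;
  }
  return steps;
}

// Replay the plan on a Playwright page.
async function humanScroll(page, totalPx = 3000) {
  for (const { dy, pauseMs } of planScroll(totalPx)) {
    await page.mouse.wheel(0, dy); // vertical scroll by dy pixels
    await page.waitForTimeout(pauseMs);
  }
}
```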

---

🔍 Troubleshooting

Issue: 403 Forbidden
**Solution:** Use `playwright-stealth.js`.

Issue: Cloudflare Challenge Page
**Solution:**
1. Increase the wait time (10-15 seconds).
2. Try `headless: false` (headful mode sometimes has a higher success rate).
3. Consider using proxy IPs.
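The first suggestion can be made adaptive by polling until the challenge page is gone rather than sleeping a fixed time. The other two map to standard Playwright launch options (`headless`, `proxy.server`). In the sketch below, recognising the challenge by a "Just a moment" page title is an assumed heuristic that may not match every challenge variant. The helper names and the proxy URL are hypothetical.

```javascript
// Hypothetical helpers for the Cloudflare-challenge suggestions above.
// Assumed heuristic: the interstitial's title contains "Just a moment".
const CHALLENGE_TITLE = /just a moment/i;

// Poll the page title until it no longer looks like a challenge, up to timeoutMs.
async function waitForChallengeToClear(page, { timeoutMs = 15000, pollMs = 1000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const title = await page.title();
    if (!CHALLENGE_TITLE.test(title)) return true;
    await page.waitForTimeout(pollMs);
  }
  return false;
}

// Launch options for headful mode plus an optional proxy (placeholder URL).
function challengeLaunchOptions(proxyServer) {
  const options = { headless: false };
  if (proxyServer) options.proxy = { server: proxyServer };
  return options;
}
```

Usage might be `chromium.launch(challengeLaunchOptions("http://proxy.example:3128"))`, then `waitForChallengeToClear(page)` after `page.goto`.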

Issue: Blank Page
**Solution:**
1. Increase `waitForTimeout`.
2. Use `waitUntil: 'networkidle'` or `'domcontentloaded'`.
3. Check whether login is required.
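As an alternative to a longer fixed `waitForTimeout`, a script can navigate with `domcontentloaded`, give `networkidle` a bounded chance, and then wait until the body actually contains text. A minimal sketch follows. The helper name and timeout values are assumptions. The `networkidle` wait is allowed to time out because pages that poll continuously may never become idle.

```javascript
// Hypothetical blank-page helper using standard Playwright Page methods.
async function gotoAndWaitForContent(page, url, { idleTimeoutMs = 10000, contentTimeoutMs = 15000 } = {}) {
  await page.goto(url, { waitUntil: "domcontentloaded" });
  // Give the network a bounded chance to settle; ignore the timeout if it never does.
  await page.waitForLoadState("networkidle", { timeout: idleTimeoutMs }).catch(() => {});
  // Wait until the rendered body has visible text (throws if it never appears).
  await page.waitForFunction(
    () => document.body && document.body.innerText.trim().length > 0,
    undefined,
    { timeout: contentTimeoutMs },
  );
  return page.evaluate(() => document.body.innerText);
}
```

If `waitForFunction` still times out, the page may require a login (the third suggestion) rather than more waiting.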

---

📝 Memory & Experience

2026-02-07 Discuss.com.hk Test Conclusions
- ✅ **Pure Playwright + Stealth** succeeded (5s, 200 OK)
- ❌ Crawlee (deep-scraper) failed (403)
- ❌ Chaser (Rust) failed (Cloudflare)
- ❌ Puppeteer standard failed (403)

Best Solution: Pure Playwright + anti-bot techniques (framework-independent)

---

🚧 Future Improvements

  • [ ] Add proxy IP rotation
  • [ ] Implement cookie management (maintain login state)
  • [ ] Add CAPTCHA handling (2captcha / Anti-Captcha)
  • [ ] Batch scraping (parallel URLs)
  • [ ] Integration with OpenClaw's `browser` tool

---


Use Cases

  • Automate browser interactions for web scraping and testing
  • Extract structured data from websites using headless browser automation
  • Navigate websites, fill forms, and capture screenshots programmatically
  • Scrape dynamic JavaScript-rendered content that simple HTTP requests cannot access

Pros & Cons

Pros

  • Extremely popular with 17,537+ downloads indicating strong community validation
  • Community-endorsed with 38 stars on ClawHub
  • Well-structured approach ensures consistent and reliable results
  • Integrates smoothly into existing workflows

Cons

  • Requires installing external dependencies before use
  • Focused scope means it may not cover edge cases outside its primary use case
  • May require adaptation for non-standard project configurations

FAQ

What does Playwright Scraper Skill do?
It is a Playwright-based web-scraping skill for OpenClaw with anti-bot protection, successfully tested on complex sites such as Discuss.com.hk.
What platforms support Playwright Scraper Skill?
Playwright Scraper Skill is available on Claude Code and OpenClaw.
What are the use cases for Playwright Scraper Skill?
Automate browser interactions for web scraping and testing. Extract structured data from websites using headless browser automation. Navigate websites, fill forms, and capture screenshots programmatically.

