Browserbase Scraper
Scrape Cloudflare-protected websites using Stagehand + Browserbase cloud browsers. Use when the user needs to extract data from websites with bot protection,...
# Browserbase Scraper
Bypass Cloudflare and bot protection using Stagehand + Browserbase cloud browsers with AI-powered extraction.
When to Use
- Website blocks curl/fetch with Cloudflare "Just a moment..." page
- Playwright headless gets detected and blocked
- Need structured data extraction from dynamic content
- Scraping auction sites, marketplaces, or other protected pages
Prerequisites
```bash
npm install @browserbasehq/stagehand zod
```
Required environment variables:
- `BROWSERBASE_API_KEY`: from browserbase.com dashboard
- `BROWSERBASE_PROJECT_ID`: from browserbase.com
- `GOOGLE_GENERATIVE_AI_API_KEY`: for Gemini extraction (or use OpenAI)
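Since a missing variable only shows up once a session is requested, it can help to check the environment first. The following is a minimal sketch, not part of Stagehand or Browserbase: `REQUIRED_ENV` and `missingEnvVars` are hypothetical names, and the list assumes the Gemini key from above.

```javascript
// Names taken from the list above; swap the last one if you use OpenAI instead.
const REQUIRED_ENV = [
  'BROWSERBASE_API_KEY',
  'BROWSERBASE_PROJECT_ID',
  'GOOGLE_GENERATIVE_AI_API_KEY',
];

// Return the names of required variables that are unset or blank.
// `env` is any plain object, typically process.env.
function missingEnvVars(env, required = REQUIRED_ENV) {
  return required.filter((name) => {
    const value = env[name];
    return typeof value !== 'string' || value.trim() === '';
  });
}

// Usage: fail fast before opening a cloud browser session.
// const missing = missingEnvVars(process.env);
// if (missing.length > 0) throw new Error(`Missing env vars: ${missing.join(', ')}`);
```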
Quick Start
```javascript
import { Stagehand } from '@browserbasehq/stagehand';

const stagehand = new Stagehand({
  env: 'BROWSERBASE',
  apiKey: process.env.BROWSERBASE_API_KEY,
  projectId: process.env.BROWSERBASE_PROJECT_ID,
  model: {
    modelName: 'google/gemini-3-flash-preview',
    apiKey: process.env.GOOGLE_GENERATIVE_AI_API_KEY,
  },
});

await stagehand.init();
const page = stagehand.context.pages()[0];

// Navigate (Cloudflare bypass is automatic)
await page.goto('https://protected-site.com/search?q=term');
await page.waitForTimeout(5000); // Let page fully load

// AI-powered extraction (instruction-only works best)
const data = await stagehand.extract(`
  Extract all product listings as JSON array:
  [{ "title": "...", "price": 123, "url": "..." }]
  Return ONLY the JSON array.
`);

await stagehand.close();
```
Key Patterns
1. Instruction-Only Extraction (Recommended)

Schema-based extraction often returns empty results. Use natural language instructions instead:

```javascript
const extraction = await stagehand.extract(`
  Look at this page and extract:
  - All item titles
  - Prices as numbers
  - URLs
  Return as JSON array.
`);
```
2. Handle Cloudflare Delays

Sometimes the challenge takes longer:

```javascript
const title = await page.title();
if (title.toLowerCase().includes('moment')) {
  await page.waitForTimeout(10000); // Wait for challenge
}
```
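A single fixed wait can be too short or needlessly long. One alternative is to poll the title until the challenge text disappears or a deadline passes. This is a sketch under assumptions: `waitForChallenge` and `sleep` are hypothetical helpers, and `page` only needs the async `title()` method used above.

```javascript
// Plain sleep helper so the sketch does not depend on a page-specific wait API.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Poll the page title until it no longer contains `marker`.
// Resolves to true once the challenge clears, false if the deadline passes first.
async function waitForChallenge(
  page,
  { timeoutMs = 30000, intervalMs = 1000, marker = 'moment' } = {}
) {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const title = (await page.title()).toLowerCase();
    if (!title.includes(marker)) return true; // challenge cleared
    if (Date.now() >= deadline) return false; // still blocked, give up
    await sleep(intervalMs);
  }
}

// Usage (inside the Quick Start flow, after page.goto):
// const cleared = await waitForChallenge(page, { timeoutMs: 15000 });
// if (!cleared) console.warn('Still on the challenge page');
```

Matching on the word "moment" follows the check above; a legitimate page whose title contains that word would be misread as a challenge, so a more specific marker may be safer.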
3. Scroll to Load More

Many sites lazy-load content:

```javascript
for (let i = 0; i < 5; i++) {
  await page.evaluate(() => window.scrollBy(0, window.innerHeight));
  await page.waitForTimeout(800);
}
```
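A fixed count of five scrolls may stop early on long pages or waste time on short ones. As a rough sketch, the loop can instead jump to the bottom and stop once the document height stops growing. `scrollUntilStable` is a hypothetical helper; it assumes `page.evaluate` runs a function in the browser and returns its result, as in the loop above.

```javascript
// Scroll to the bottom repeatedly until the document height stops growing
// or `maxScrolls` is reached. Returns the number of scrolls performed.
async function scrollUntilStable(page, { maxScrolls = 20, pauseMs = 800 } = {}) {
  let lastHeight = await page.evaluate(() => document.body.scrollHeight);
  let scrolls = 0;
  while (scrolls < maxScrolls) {
    await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
    // Give lazy-loaded content time to arrive before measuring again.
    await new Promise((resolve) => setTimeout(resolve, pauseMs));
    scrolls++;
    const height = await page.evaluate(() => document.body.scrollHeight);
    if (height === lastHeight) break; // nothing new was loaded
    lastHeight = height;
  }
  return scrolls;
}

// Usage: const scrolls = await scrollUntilStable(page, { maxScrolls: 10 });
```

The `maxScrolls` cap matters on infinite feeds, where the height never stabilizes.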
4. Parse Extraction Results

The extraction result carries its text in an `extraction` string field, which needs parsing:

```javascript
let listings = [];
try {
  // Grab everything from the first "[" to the last "]" in the returned text
  const jsonMatch = extraction?.extraction?.match(/\[[\s\S]*\]/);
  if (jsonMatch) listings = JSON.parse(jsonMatch[0]);
} catch (e) {
  console.log('Parse error:', e.message);
}
```
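Because instruction-only output is free-form, parsed items may have missing fields or prices returned as strings. The sketch below wraps the same regex-and-parse step in a function and normalizes each item. `parseListings` and `toNumber` are hypothetical helpers, and the `{ title, price, url }` shape is assumed from the Quick Start instruction.

```javascript
// Turn raw extraction text into a clean array of listings.
// Returns [] when no JSON array can be recovered from the text.
function parseListings(text) {
  if (typeof text !== 'string') return [];
  const match = text.match(/\[[\s\S]*\]/); // first "[" through last "]"
  if (!match) return [];
  let raw;
  try {
    raw = JSON.parse(match[0]);
  } catch {
    return [];
  }
  if (!Array.isArray(raw)) return [];
  return raw
    .filter((item) => item && typeof item.title === 'string' && item.title.trim() !== '')
    .map((item) => ({
      title: item.title.trim(),
      price: toNumber(item.price),
      url: typeof item.url === 'string' ? item.url : null,
    }));
}

// Accept numbers or strings such as "$1,234.50"; return null when unparseable.
function toNumber(value) {
  if (typeof value === 'number') return Number.isFinite(value) ? value : null;
  if (typeof value !== 'string') return null;
  const cleaned = value.replace(/[^0-9.-]/g, '');
  if (cleaned === '') return null;
  const n = Number(cleaned);
  return Number.isFinite(n) ? n : null;
}

// Usage: const listings = parseListings(extraction?.extraction);
```

Items without a title are dropped rather than repaired. The price cleanup assumes a dot as the decimal separator, so prices formatted like "1.234,50" would need different handling.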
Browserbase Free Tier Limits
- 1 concurrent session — cron jobs can conflict with interactive use
- Sessions auto-close after inactivity
- Use `stagehand.close()` to release session immediately
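With only one concurrent session, a script that throws before reaching `stagehand.close()` can hold the session until it times out. A minimal sketch of one way to guard against that is a `try`/`finally` wrapper. `withSession` is a hypothetical helper; it assumes only the `init()` and `close()` methods shown in Quick Start.

```javascript
// Run `task` with an initialized Stagehand-like object and always close it,
// even when the task throws, so the session is released promptly.
async function withSession(stagehand, task) {
  await stagehand.init();
  try {
    return await task(stagehand);
  } finally {
    await stagehand.close();
  }
}

// Usage (with the `stagehand` instance from Quick Start):
// const data = await withSession(stagehand, async (sh) => {
//   const page = sh.context.pages()[0];
//   await page.goto('https://protected-site.com/search?q=term');
//   return sh.extract('Extract all product listings as a JSON array.');
// });
```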
Cron Integration
For scheduled scraping, use OpenClaw cron with isolated sessions:
```bash
openclaw cron add \
  --name "Daily Scrape" \
  --cron "0 6 * * *" \
  --session isolated \
  --message "Run: node ~/scripts/scraper.js"
```
Troubleshooting
| Issue | Solution |
|-------|----------|
| Empty extraction | Use instruction-only (no schema), increase wait time |
| Cloudflare loop | Wait 10-15s, check if title contains "moment" |
| Session limit | Close other Browserbase sessions, check dashboard |
| 429 errors | Wait for session to complete, don't retry immediately |
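For the 429 case, "don't retry immediately" can be expressed as exponential backoff. The sketch below is an illustration, not Browserbase or Stagehand API: `retryWithBackoff` and `backoffDelay` are hypothetical helpers, the delays are arbitrary defaults, and the `err.status === 429` check is a guess at the error shape that should be adjusted to the errors actually thrown.

```javascript
// Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs.
function backoffDelay(attempt, baseMs = 5000, maxMs = 60000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Call `fn`; on a retryable error, wait and try again, up to `retries` extra attempts.
// ASSUMPTION: the default predicate expects the error to expose `status === 429`.
async function retryWithBackoff(
  fn,
  {
    retries = 3,
    baseMs = 5000,
    maxMs = 60000,
    isRetryable = (err) => Boolean(err) && err.status === 429,
  } = {}
) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries || !isRetryable(err)) throw err;
      const delay = backoffDelay(attempt, baseMs, maxMs);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}

// Usage: const result = await retryWithBackoff(() => runScrape());
// (`runScrape` stands in for your own scraping function.)
```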
Example: Full Scraper
See `scripts/example_scraper.js` for a complete working example.
Use Cases
- Scrape product listings from Cloudflare-protected marketplaces and auction sites
- Extract structured JSON data from dynamic pages using AI-powered natural language instructions
- Bypass 'Just a moment...' Cloudflare challenges that block headless Playwright
- Schedule daily scraping jobs via OpenClaw cron with isolated Browserbase sessions
- Collect pricing data from e-commerce sites that actively detect and block bot traffic
Pros & Cons
Pros
- Automatic Cloudflare bypass: no manual challenge solving or cookie manipulation needed
- AI-powered extraction via Stagehand understands page context better than CSS selector-based scraping
- Cloud-hosted browsers eliminate local Chrome dependency and work in headless CI environments
Cons
- Free tier limited to 1 concurrent session, so cron jobs can conflict with interactive use
- Schema-based extraction often returns empty results; natural language instructions work but are less predictable
- Requires three credentials (Browserbase API key, Browserbase project ID, and a Gemini or OpenAI API key) to get started