Scrape Cloudflare-protected websites using Stagehand + Browserbase cloud browsers. Use when the user needs to extract data from websites with bot protection,...

# Browserbase Scraper

Bypass Cloudflare and bot protection using Stagehand + Browserbase cloud browsers with AI-powered extraction.

When to Use

  • Website blocks curl/fetch with Cloudflare "Just a moment..." page
  • Playwright headless gets detected and blocked
  • Need structured data extraction from dynamic content
  • Scraping auction sites, marketplaces, or other protected pages

Prerequisites

```bash
npm install @browserbasehq/stagehand zod
```

Required environment variables:

  • `BROWSERBASE_API_KEY`: from the browserbase.com dashboard
  • `BROWSERBASE_PROJECT_ID`: from the browserbase.com dashboard
  • `GOOGLE_GENERATIVE_AI_API_KEY`: for Gemini extraction (or use an OpenAI API key with an OpenAI model instead)
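A missing key may only show up later as a less obvious failure during `init()` or extraction, so it can help to check the variables before starting a session. This is a minimal sketch; `missingEnv` and `REQUIRED_ENV` are hypothetical names, not part of Stagehand, and the list assumes the Gemini setup above.

```javascript
// Hypothetical helper (not part of Stagehand): report which required
// environment variables are unset or blank before starting a session.
const REQUIRED_ENV = [
  'BROWSERBASE_API_KEY',
  'BROWSERBASE_PROJECT_ID',
  'GOOGLE_GENERATIVE_AI_API_KEY', // assumption: Gemini; swap in your OpenAI key variable if needed
];

function missingEnv(env, names = REQUIRED_ENV) {
  return names.filter((name) => !env[name] || env[name].trim() === '');
}

// Usage: fail fast with a readable message.
// const missing = missingEnv(process.env);
// if (missing.length > 0) {
//   throw new Error(`Missing environment variables: ${missing.join(', ')}`);
// }
```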

Quick Start

```javascript
import { Stagehand } from '@browserbasehq/stagehand';

const stagehand = new Stagehand({
  env: 'BROWSERBASE',
  apiKey: process.env.BROWSERBASE_API_KEY,
  projectId: process.env.BROWSERBASE_PROJECT_ID,
  model: {
    modelName: 'google/gemini-3-flash-preview',
    apiKey: process.env.GOOGLE_GENERATIVE_AI_API_KEY,
  },
});

// Top-level await requires an ES module (.mjs file or "type": "module").
await stagehand.init();
const page = stagehand.context.pages()[0];

// Navigate (Cloudflare bypass is automatic)
await page.goto('https://protected-site.com/search?q=term');
await page.waitForTimeout(5000); // Let page fully load

// AI-powered extraction (instruction-only works best).
// Returns an object whose `extraction` field holds the model's text.
const data = await stagehand.extract(`
  Extract all product listings as JSON array:
  [{ "title": "...", "price": 123, "url": "..." }]
  Return ONLY the JSON array.
`);

await stagehand.close();
```

Key Patterns

1. Instruction-Only Extraction (Recommended)

Schema-based extraction often returns empty results. Use natural-language instructions instead:

```javascript
const extraction = await stagehand.extract(`
  Look at this page and extract:
  - All item titles
  - Prices as numbers
  - URLs
  Return as JSON array.
`);
```
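When several scrapers share this pattern, a small helper can assemble the instruction from field descriptions so the closing "JSON array only" wording stays consistent. This is a sketch; `buildExtractInstruction` and its `{ key, description }` field format are hypothetical, and the resulting string is simply what you would pass to `stagehand.extract(...)`.

```javascript
// Hypothetical helper: compose an instruction-only extraction prompt from
// field descriptions, ending with an explicit request for a bare JSON array.
function buildExtractInstruction(itemName, fields) {
  const fieldLines = fields
    .map(({ key, description }) => `- "${key}": ${description}`)
    .join('\n');
  const shape = '[{ ' + fields.map(({ key }) => `"${key}": ...`).join(', ') + ' }]';
  return [
    `Look at this page and extract every ${itemName} with these fields:`,
    fieldLines,
    `Return ONLY a JSON array shaped like ${shape}, with no other text.`,
  ].join('\n');
}

const instruction = buildExtractInstruction('product listing', [
  { key: 'title', description: 'the item title' },
  { key: 'price', description: 'the price as a number, without currency symbols' },
  { key: 'url', description: 'the absolute link to the item' },
]);
// Then: const extraction = await stagehand.extract(instruction);
```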

2. Handle Cloudflare Delays

Sometimes the challenge takes longer to clear than the initial 5-second wait:

```javascript
const title = await page.title();
if (title.toLowerCase().includes('moment')) {
  await page.waitForTimeout(10000); // Wait for challenge
}
```
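A single fixed wait can be too short on a slow challenge and needlessly long on a fast one. An alternative is to poll the title until it no longer looks like the challenge page, up to a deadline. This is a sketch under assumptions: `waitForChallenge` and its options are hypothetical, `getTitle` is expected to wrap `page.title()` and `sleep` to wrap `page.waitForTimeout`, and it reuses the same "moment" title heuristic as the check above.

```javascript
// Hypothetical helper: poll until the page title no longer looks like
// Cloudflare's "Just a moment..." interstitial, or give up after timeoutMs.
// Returns true if the challenge appears cleared, false on timeout.
async function waitForChallenge(getTitle, sleep, { timeoutMs = 15000, intervalMs = 1000 } = {}) {
  for (let waited = 0; ; waited += intervalMs) {
    const title = (await getTitle()).toLowerCase();
    if (!title.includes('moment')) return true; // challenge appears cleared
    if (waited >= timeoutMs) return false; // still on the challenge page
    await sleep(intervalMs);
  }
}

// Usage with a Stagehand page:
// const cleared = await waitForChallenge(() => page.title(), (ms) => page.waitForTimeout(ms));
// if (!cleared) throw new Error('Cloudflare challenge did not clear');
```

The 15-second default mirrors the "wait 10-15s" advice in Troubleshooting; tune it per site.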

3. Scroll to Load More

Many sites lazy-load content, so scroll several times before extracting:

```javascript
for (let i = 0; i < 5; i++) {
  await page.evaluate(() => window.scrollBy(0, window.innerHeight));
  await page.waitForTimeout(800);
}
```
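A fixed count of five scrolls may stop before everything has loaded, or keep scrolling after the feed is exhausted. A variant stops once the document height stops growing. This is a sketch: `scrollUntilStable` is hypothetical, and the page operations are passed in as functions (for example `() => page.evaluate(() => document.body.scrollHeight)`) so the loop logic does not depend on a particular browser API.

```javascript
// Hypothetical helper: scroll until the page height has not grown for
// `stableRounds` consecutive rounds, or until `maxRounds` is reached.
// Returns the number of scroll rounds performed.
async function scrollUntilStable({ getHeight, scrollOnce, pause, maxRounds = 20, stableRounds = 2, pauseMs = 800 }) {
  let lastHeight = await getHeight();
  let unchanged = 0;
  let rounds = 0;
  while (rounds < maxRounds && unchanged < stableRounds) {
    await scrollOnce();
    await pause(pauseMs); // give lazy-loaded content time to render
    rounds += 1;
    const height = await getHeight();
    unchanged = height > lastHeight ? 0 : unchanged + 1;
    lastHeight = Math.max(lastHeight, height);
  }
  return rounds;
}

// Usage with a Stagehand page:
// await scrollUntilStable({
//   getHeight: () => page.evaluate(() => document.body.scrollHeight),
//   scrollOnce: () => page.evaluate(() => window.scrollBy(0, window.innerHeight)),
//   pause: (ms) => page.waitForTimeout(ms),
// });
```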

4. Parse Extraction Results

Instruction-only extraction returns an object whose `extraction` field is a string, so the JSON array has to be pulled out of it and parsed:

```javascript
let listings = [];
try {
  const jsonMatch = extraction?.extraction?.match(/\[[\s\S]*\]/);
  if (jsonMatch) listings = JSON.parse(jsonMatch[0]);
} catch (e) {
  console.log('Parse error:', e.message);
}
```
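The regex above takes everything between the first `[` and the last `]` and leaves prices in whatever form the model returned. A somewhat more defensive sketch, assuming the `{ extraction: string }` result shape used above: it also accepts a plain string, strips Markdown code fences that models sometimes wrap around JSON, tries the whole text before falling back to the bracket span, and converts US-style price strings such as "$1,299.00" to numbers. `parseListings` and `normalizePrice` are hypothetical names; adjust the normalization to your own fields.

```javascript
// Hypothetical helper: turn an instruction-only extraction result into an
// array of listings, tolerating code fences and stray text around the JSON.
function parseListings(result) {
  const text = typeof result === 'string' ? result : result?.extraction;
  if (typeof text !== 'string') return [];

  // Drop Markdown code fences (three backticks, optionally followed by "json").
  const cleaned = text.replace(/`{3}(?:json)?/gi, '').trim();

  // Try the whole string first, then the outermost [...] span as a fallback.
  const candidates = [cleaned];
  const start = cleaned.indexOf('[');
  const end = cleaned.lastIndexOf(']');
  if (start !== -1 && end > start) candidates.push(cleaned.slice(start, end + 1));

  for (const candidate of candidates) {
    try {
      const parsed = JSON.parse(candidate);
      if (Array.isArray(parsed)) return parsed.map(normalizePrice);
    } catch {
      // Not valid JSON; try the next candidate.
    }
  }
  return [];
}

// Convert US-style price strings like "$1,299.00" to numbers; set
// unparseable ones (e.g. "N/A") to null. Other values pass through unchanged.
function normalizePrice(item) {
  if (typeof item?.price !== 'string') return item;
  const digits = item.price.replace(/[^0-9.-]/g, '');
  const n = digits === '' ? NaN : Number(digits);
  return { ...item, price: Number.isFinite(n) ? n : null };
}
```

Usage: `const listings = parseListings(extraction);` returns `[]` rather than throwing when nothing parseable comes back.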

Browserbase Free Tier Limits

  • 1 concurrent session — cron jobs can conflict with interactive use
  • Sessions auto-close after inactivity
  • Call `stagehand.close()` to release the session immediately instead of waiting for the inactivity timeout
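With a single concurrent session, a script that throws before reaching `stagehand.close()` can leave the session occupied until it times out. Wrapping the work in `try`/`finally` releases it on both success and failure. The `withSession` helper is a hypothetical sketch that takes the open and close steps as functions, for example wrapping `stagehand.init()` and `stagehand.close()`.

```javascript
// Hypothetical helper: run `work` inside a session and always release it,
// even if `work` throws, so the single free-tier slot is not left occupied.
async function withSession({ open, close }, work) {
  const session = await open();
  try {
    return await work(session);
  } finally {
    await close(session);
  }
}

// Usage with Stagehand:
// const data = await withSession(
//   {
//     open: async () => { await stagehand.init(); return stagehand; },
//     close: () => stagehand.close(),
//   },
//   async (sh) => {
//     const page = sh.context.pages()[0];
//     await page.goto('https://protected-site.com/search?q=term');
//     return sh.extract('Extract all product listings as a JSON array.');
//   },
// );
```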

Cron Integration

For scheduled scraping, use OpenClaw cron with isolated sessions:

```bash
openclaw cron add \
  --name "Daily Scrape" \
  --cron "0 6 * * *" \
  --session isolated \
  --message "Run: node ~/scripts/scraper.js"
```

Troubleshooting

| Issue | Solution |
|-------|----------|
| Empty extraction | Use instruction-only extraction (no schema) and increase the wait time |
| Cloudflare loop | Wait 10-15 s and check whether the page title still contains "moment" |
| Session limit | Close other Browserbase sessions and check the dashboard |
| 429 errors | Wait for the running session to complete; don't retry immediately |
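For the 429 case, "don't retry immediately" can be made concrete with exponential backoff: wait, then double the delay after each further 429. This is a sketch under assumptions: `retryOn429` is hypothetical, the 30-second base delay is an arbitrary starting point, and it assumes the thrown error exposes the HTTP status as `err.status` or mentions "429" in its message; check what your client actually throws.

```javascript
// Hypothetical helper: retry `task` when it fails with a 429, waiting
// baseMs, 2*baseMs, 4*baseMs, ... between attempts. Other errors, and the
// last 429 once retries are exhausted, are rethrown unchanged.
async function retryOn429(task, {
  retries = 3,
  baseMs = 30000,
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
} = {}) {
  for (let attempt = 0; ; attempt += 1) {
    try {
      return await task();
    } catch (err) {
      const is429 = err?.status === 429 || /\b429\b/.test(String(err?.message));
      if (!is429 || attempt >= retries) throw err;
      await sleep(baseMs * 2 ** attempt);
    }
  }
}

// Usage: const data = await retryOn429(() => runScrape()); // runScrape is hypothetical
```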

Example: Full Scraper

See `scripts/example_scraper.js` for a complete working example.

Use Cases

  • Scrape product listings from Cloudflare-protected marketplaces and auction sites
  • Extract structured JSON data from dynamic pages using AI-powered natural language instructions
  • Bypass 'Just a moment...' Cloudflare challenges that block headless Playwright
  • Schedule daily scraping jobs via OpenClaw cron with isolated Browserbase sessions
  • Collect pricing data from e-commerce sites that actively detect and block bot traffic

Pros & Cons

Pros

  • Automatic Cloudflare bypass: no manual challenge solving or cookie manipulation needed
  • AI-powered extraction via Stagehand understands page context better than CSS-selector-based scraping
  • Cloud-hosted browsers eliminate the local Chrome dependency and work in headless CI environments

Cons

  • Free tier limited to 1 concurrent session, so cron jobs and interactive use can conflict
  • Schema-based extraction often returns empty results; natural-language instructions work but are less predictable
  • Requires three separate credentials (Browserbase API key, Browserbase project ID, and a Gemini or OpenAI API key) to get started

FAQ

What platforms support Browserbase Scraper?
Browserbase Scraper is available on Claude Code and OpenClaw.