Scrape URL

Request Body

url

string

required

The URL to scrape. Must be a valid HTTP or HTTPS URL. Private IPs and cloud metadata endpoints are blocked for security.
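A client-side pre-check can catch obviously invalid targets before a request is sent. The sketch below is only an approximation of the rule described above: the function name `is_probably_scrapable` and the `METADATA_HOSTS` set are hypothetical, DNS names are not resolved, and the API's own validation remains authoritative.

```python
import ipaddress
from urllib.parse import urlsplit

# Hostnames commonly used by cloud metadata services.
# Illustrative assumption, not a list published by the API.
METADATA_HOSTS = {"metadata.google.internal"}


def is_probably_scrapable(url: str) -> bool:
    """Return False for URLs the API would likely reject (non-HTTP, private IPs)."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False  # malformed URL, e.g. a broken IPv6 literal
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    host = parts.hostname.lower()
    if host == "localhost" or host in METADATA_HOSTS:
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return True  # a DNS name; this sketch does not resolve it
    # 169.254.169.254 (a common metadata address) is link-local.
    return not (ip.is_private or ip.is_loopback or ip.is_link_local)
```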

engine

string

default:"auto"

Scraping engine tier to use. ScrapeBadger automatically selects the best approach.

Value	Description	Cost
`auto`	Automatically picks the best engine for the target site (recommended)	From 1 credit
`browser`	Force headless browser with full JavaScript rendering	5 credits

In auto mode, simple pages use fast HTTP (1 credit) and JavaScript-heavy pages use a browser (5 credits). When escalate is enabled, the system can further escalate to premium browsers (10 credits) for heavily protected sites.

format

string

default:"html"

Output format for the scraped content.

html — Raw HTML of the page
markdown — Converted to clean Markdown
text — Plain text with HTML tags stripped

render_js

boolean

default:false

Force JavaScript rendering before extracting content. Automatically switches to the browser engine. Use this for single-page applications or pages that load content dynamically.

wait_for

string

CSS selector or XPath expression to wait for before extracting content. Only works with browser engines (browser, patchright, windows_chrome). If render_js is false and this is set, JS rendering is forced automatically.

{ "wait_for": "#main-content" }

{ "wait_for": "//div[@class='results']" }

wait_timeout

integer

default:30000

Maximum time in milliseconds to wait for the wait_for selector to appear. Range: 1000 – 120000.

wait_after_load

integer

Additional milliseconds to wait after the page has finished loading, before extracting content. Useful for pages with animations or delayed rendering. Only works with browser engines. Range: 0 – 30000.

js_scenario

array

A list of browser actions to perform before extracting content. Forces the browser engine. Each action is an object with a type and action-specific parameters.

Supported actions:

Type	Parameters	Description
`click`	`selector`	Click an element
`fill`	`selector`, `value`	Fill an input field
`scroll`	`direction`, `amount`	Scroll the page
`wait`	`milliseconds`	Wait for a duration

{
  "js_scenario": [
    { "type": "click", "selector": "#load-more" },
    { "type": "wait", "milliseconds": 2000 },
    { "type": "scroll", "direction": "down", "amount": 500 }
  ]
}
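Scenarios with repeated steps are easier to generate than to write by hand. The helpers below are a minimal sketch that builds the action objects from the table above; the function names (`click`, `fill`, `scroll`, `wait`, `infinite_scroll`) are hypothetical conveniences and not part of any official SDK.

```python
import json


def click(selector):
    return {"type": "click", "selector": selector}


def fill(selector, value):
    return {"type": "fill", "selector": selector, "value": value}


def scroll(direction, amount):
    return {"type": "scroll", "direction": direction, "amount": amount}


def wait(milliseconds):
    return {"type": "wait", "milliseconds": milliseconds}


def infinite_scroll(rounds, pixels=1000, pause_ms=2000):
    """Alternate scroll and wait actions `rounds` times."""
    steps = []
    for _ in range(rounds):
        steps.append(scroll("down", pixels))
        steps.append(wait(pause_ms))
    return steps


if __name__ == "__main__":
    # Print a request body fragment that can be merged into a scrape request.
    print(json.dumps({"js_scenario": infinite_scroll(2)}, indent=2))
```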

session_id

string

A unique identifier to persist cookies, fingerprint, and browser storage across multiple requests. Use the same session_id to maintain login state or continue a browsing session.

{ "session_id": "my-shopping-session" }

retry_count

integer

default:3

Maximum number of retry attempts when a blocking page is detected. Retries are free — you only pay for the final successful engine. Range: 0 – 10.

retry_on_block

boolean

default:true

Whether to automatically retry when a blocking page is detected. Set to false to get the blocked response immediately.

country

string

ISO 3166-1 alpha-2 country code for proxy geo-targeting. Routes the request through a proxy in the specified country.

{ "country": "US" }

Common values: US, GB, DE, FR, JP, BR, AU.

custom_headers

object

Additional HTTP headers to include in the request to the target URL. Accepts a key-value object.

{
  "custom_headers": {
    "Accept-Language": "en-US",
    "Referer": "https://google.com"
  }
}

screenshot

boolean

default:false

Capture a full-page screenshot (PNG). Forces the browser engine. Returned as base64 in the screenshot_url response field.

video

boolean

default:false

Record a video of the browser session (animated GIF). Forces the browser engine. Returned as base64 in the video_url response field. Adds +3 credits. Useful for debugging, visual verification, or monitoring how a page loads.

anti_bot

boolean

default:false

Attempt to bypass detected anti-bot protection using registered solvers. Adds +5 credits to the request cost when a solver is invoked. Only triggered when blocking is actually detected.

escalate

boolean

default:false

Allow automatic escalation to more powerful engines when the initial engine is blocked.

Escalation path: curl_cffi → browser → windows_chrome

You only pay for the engine that succeeds; costs are not cumulative. Without this flag, only the selected engine is tried.

max_cost

integer

Maximum credits to spend on this request. The request fails with a 400 error if the estimated cost would exceed this budget. Useful for controlling costs when using escalate or anti_bot. Minimum: 1.
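To pick a sensible max_cost, it can help to work out the best and worst case from the figures quoted in this reference (HTTP 1, browser 5, premium browser 10, video +3, anti_bot +5, ai_extract +2). The sketch below assumes the surcharges are simply added on top of the engine cost, which the credits_used description suggests but does not state outright; `credit_range` and its `forces_browser` parameter are hypothetical.

```python
from typing import Tuple


def credit_range(*, engine="auto", forces_browser=False, escalate=False,
                 anti_bot=False, video=False, ai_extract=False) -> Tuple[int, int]:
    """Return (lowest, highest) plausible credit cost for one request.

    `forces_browser` stands for any option documented as forcing the browser
    engine (render_js, wait_for, js_scenario, screenshot).
    """
    browser_required = engine == "browser" or forces_browser or video
    low = 5 if browser_required else 1   # auto mode may settle for fast HTTP
    high = 10 if escalate else 5         # escalation can reach premium browsers
    fixed = (3 if video else 0) + (2 if ai_extract else 0)
    solver = 5 if anti_bot else 0        # charged only if a solver is invoked
    return low + fixed, high + fixed + solver


if __name__ == "__main__":
    # escalate + anti_bot: between 1 and 15 credits under these assumptions
    print(credit_range(escalate=True, anti_bot=True))
```

Under these assumptions, a request with escalate and anti_bot enabled costs at most 15 credits, so a max_cost of 20 leaves some headroom.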

ai_extract

boolean

default:false

Run AI-powered extraction on the scraped content using the instruction in ai_prompt. Adds +2 credits to the request cost. The scrape result is still returned even if AI extraction fails.

ai_prompt

string

Natural language instruction for AI data extraction. Required when ai_extract is true. Maximum 2000 characters.

{
  "ai_extract": true,
  "ai_prompt": "Extract all product names and prices as a JSON array"
}
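The ranges and dependencies documented above can be checked locally before a request is sent. This is a minimal sketch, not the server's validation logic: `validate_payload` is a hypothetical helper, and it covers only the constraints stated in this section.

```python
ALLOWED_FORMATS = {"html", "markdown", "text"}
ALLOWED_ENGINES = {"auto", "browser"}
# (minimum, maximum) per documented range; None means no documented upper bound.
INT_RANGES = {
    "wait_timeout": (1000, 120000),
    "wait_after_load": (0, 30000),
    "retry_count": (0, 10),
    "max_cost": (1, None),
}
MAX_PROMPT_CHARS = 2000


def validate_payload(payload):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    if not payload.get("url"):
        problems.append("url is required")
    if payload.get("format", "html") not in ALLOWED_FORMATS:
        problems.append("format must be html, markdown or text")
    if payload.get("engine", "auto") not in ALLOWED_ENGINES:
        problems.append("engine must be auto or browser")
    for name, (low, high) in INT_RANGES.items():
        if name not in payload:
            continue
        value = payload[name]
        if value < low or (high is not None and value > high):
            problems.append(f"{name}={value} is outside the documented range")
    prompt = payload.get("ai_prompt")
    if payload.get("ai_extract") and not prompt:
        problems.append("ai_prompt is required when ai_extract is true")
    if prompt and len(prompt) > MAX_PROMPT_CHARS:
        problems.append("ai_prompt exceeds 2000 characters")
    return problems
```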

Response

success

boolean

Whether the scrape completed successfully. false when all retries are exhausted and the page is still blocked.

url

string

The final URL after any redirects.

status_code

integer

HTTP status code from the target URL.

content

string

The scraped content in the requested format. null when success is false.

format

string

The output format used: html, markdown, or text.

engine_used

string

The engine tier that produced the final result.

credits_used

integer

Total credits charged for this request, including engine cost, solver, and AI extraction.

duration_ms

integer

Total request processing time in milliseconds.

retries_used

integer

Number of retry attempts performed. 0 if the first attempt succeeded.

content_length

integer

Size of the returned content in bytes.

screenshot_url

string

Base64-encoded PNG screenshot of the page. Only present when screenshot: true was requested.
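Since the field carries base64 data rather than a link, it has to be decoded before use. The sketch below shows one way to do that; `decode_screenshot` is a hypothetical helper, and the handling of a `data:` prefix is a defensive assumption, since this reference does not say whether one is included.

```python
import base64
from typing import Optional

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_screenshot(response: dict) -> Optional[bytes]:
    """Return the PNG bytes from a scrape response, or None if no screenshot."""
    encoded = response.get("screenshot_url")
    if not encoded:
        return None
    # Assumption: tolerate an optional "data:image/png;base64," prefix.
    if encoded.startswith("data:"):
        encoded = encoded.split(",", 1)[1]
    data = base64.b64decode(encoded)
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("screenshot_url did not decode to PNG data")
    return data
```

The returned bytes can be written to disk with `pathlib.Path("page.png").write_bytes(data)`.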

video_url

string

Base64-encoded animated GIF of the browser session. Only present when video: true was requested.

headers

object

HTTP response headers from the target URL.

blocking_detected

boolean

Whether a blocking page was detected during scraping.

blocking_details

object

Details about the detected blocking page. Only present when blocking_detected is true.

Child attributes

is_blocked

boolean

Whether the page is confirmed as a blocking page.

block_type

string

Type of block detected (e.g., cloudflare, datadome, akamai, kasada).

confidence

number

Confidence score from 0.0 to 1.0.

details

string

Human-readable description of the block.

antibot_systems

array

List of anti-bot systems detected on the page.

Child attributes

system

string

System name (e.g., cloudflare_turnstile, datadome, akamai, kasada, amazon_waf).

confidence

number

Confidence score from 0.0 to 1.0.

details

string

Additional detection details.

captcha_systems

array

List of CAPTCHA systems detected on the page.

Child attributes

system

string

System name (e.g., recaptcha_v2, recaptcha_v3, hcaptcha, geetest).

confidence

number

Confidence score from 0.0 to 1.0.

details

string

Additional detection details.
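The two detection arrays share the same entry shape (system, confidence, details), so they can be summarized together. The sketch below collects the system names at or above a confidence threshold; `detected_systems` and the 0.5 default are hypothetical choices, not values recommended by the API.

```python
def detected_systems(response: dict, min_confidence: float = 0.5) -> list:
    """List anti-bot and CAPTCHA system names at or above `min_confidence`."""
    names = []
    for key in ("antibot_systems", "captcha_systems"):
        # Treat a missing or null array as empty.
        for entry in response.get(key) or []:
            if entry.get("confidence", 0.0) >= min_confidence:
                names.append(entry["system"])
    return names
```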

anti_bot_solved

boolean

Whether the anti-bot solver successfully bypassed the protection.

solver_used

string

Name of the solver that successfully bypassed the block. null if no solver was used.

ai_extraction

object | string | array

Structured data extracted by the LLM based on ai_prompt. The shape depends on your prompt. null when ai_extract is false or extraction failed.

ai_model

string

The LLM model used for extraction (e.g., gpt-4o-mini). null when AI extraction was not used.

ai_error

string

Error message if AI extraction failed. The scrape result is still returned. null on success.
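Because ai_extraction can be an object, an array, or a string, and extraction can fail independently of the scrape, callers may want one place that normalizes the result. The sketch below is one possible approach; `read_extraction` is a hypothetical helper, and parsing string results as JSON is an assumption about how prompts that ask for JSON might be answered.

```python
import json


def read_extraction(response: dict):
    """Return (data, error) from the AI fields of a scrape response."""
    error = response.get("ai_error")
    data = response.get("ai_extraction")
    if isinstance(data, str):
        try:
            data = json.loads(data)  # the string may itself hold JSON
        except json.JSONDecodeError:
            pass  # keep the plain string as returned
    return data, error
```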

Examples

Basic scrape

curl -X POST "https://scrapebadger.com/v1/web/scrape" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://scrapebadger.com", "format": "markdown"}'
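The same call can be made from Python with only the standard library. This is a minimal sketch mirroring the curl command above; the function names `build_scrape_request` and `scrape` and the 60-second timeout are assumptions, not part of an official client.

```python
import json
import urllib.request

API_URL = "https://scrapebadger.com/v1/web/scrape"


def build_scrape_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build the POST request without sending it."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def scrape(api_key: str, payload: dict, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_scrape_request(api_key, payload)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Note that `urllib.request.urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses, so error statuses need to be handled with a try/except around `scrape`.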

JavaScript rendering with wait

{
  "url": "https://scrapebadger.com/spa-page",
  "format": "html",
  "render_js": true,
  "wait_for": "#dynamic-content",
  "wait_timeout": 10000
}

AI extraction

{
  "url": "https://scrapebadger.com/products",
  "format": "markdown",
  "ai_extract": true,
  "ai_prompt": "Extract all product names, prices, and ratings as a JSON array of objects with keys: name, price, rating"
}

Full anti-bot bypass with budget

{
  "url": "https://heavily-protected-site.com",
  "format": "markdown",
  "escalate": true,
  "anti_bot": true,
  "max_cost": 20,
  "country": "US"
}

Browser automation scenario

{
  "url": "https://scrapebadger.com/infinite-scroll",
  "format": "text",
  "js_scenario": [
    { "type": "scroll", "direction": "down", "amount": 1000 },
    { "type": "wait", "milliseconds": 2000 },
    { "type": "scroll", "direction": "down", "amount": 1000 },
    { "type": "wait", "milliseconds": 2000 }
  ]
}

Error Responses

Status	Description
`400`	Invalid URL, cost exceeds `max_cost`, or requested engine not available
`402`	Insufficient credits
`422`	Blocking detected after all retries exhausted (`success: false`, `blocking_details` populated)
`429`	Rate limit exceeded
`500`	Unexpected server error
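A client typically branches on these status codes. The sketch below maps each documented status to a next step and generates exponential backoff delays for retryable ones; treating 429 and 500 as retryable, the action labels, and the backoff parameters are all assumptions rather than documented guidance.

```python
# Hypothetical action labels for each documented status code.
ACTIONS = {
    200: "ok",
    400: "fix_request",   # invalid URL, max_cost exceeded, engine unavailable
    402: "add_credits",
    422: "blocked",       # inspect blocking_details in the response body
    429: "retry",
    500: "retry",
}


def next_action(status: int) -> str:
    """Return the suggested next step for an HTTP status code."""
    return ACTIONS.get(status, "unknown")


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0) -> list:
    """Exponential backoff delays in seconds: base, 2*base, 4*base, capped."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]
```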

Example successful response (200):

{
  "success": true,
  "url": "https://scrapebadger.com",
  "status_code": 200,
  "content": "# Example Domain\n\nThis domain is for use in illustrative examples...",
  "format": "markdown",
  "engine_used": "http",
  "credits_used": 1,
  "duration_ms": 342,
  "retries_used": 0,
  "content_length": 1256,
  "screenshot_url": null,
  "video_url": null,
  "headers": {
    "content-type": "text/html; charset=UTF-8"
  },
  "blocking_detected": false,
  "blocking_details": null,
  "antibot_systems": [],
  "captcha_systems": [],
  "anti_bot_solved": false,
  "solver_used": null,
  "ai_extraction": null,
  "ai_model": null,
  "ai_error": null
}

Authorizations

x-api-key

string

header

required

Body

application/json

url

string

required

The URL to scrape. Must be a valid HTTP or HTTPS URL.

engine

enum<string>

default:auto

Scraping engine tier to use.

Available options: auto, browser

format

enum<string>

default:html

Output format for the scraped content.

Available options: html, markdown, text

render_js

boolean

default:false

Force JavaScript rendering.

wait_for

string

CSS selector or XPath to wait for before extracting.

wait_timeout

integer

default:30000

Max wait time in ms for wait_for selector.

wait_after_load

integer

Additional ms to wait after page load.

js_scenario

object[]

Browser actions to perform before extracting.

session_id

string

Persist cookies and state across requests.

retry_count

integer

default:3

Max retry attempts on blocking detection.

retry_on_block

boolean

default:true

Auto-retry on blocking page detection.

country

string

ISO 3166-1 alpha-2 country code for proxy geo-targeting.

custom_headers

object

Additional HTTP headers for the target request.

screenshot

boolean

default:false

Capture a full-page PNG screenshot.

video

boolean

default:false

Record browser session as animated GIF (+3 credits).

anti_bot

boolean

default:false

Attempt anti-bot bypass when blocking detected.

escalate

boolean

default:false

Allow auto-escalation to stronger engines.

max_cost

integer

Maximum credits budget for this request.

ai_extract

boolean

default:false

Run AI extraction on scraped content.

ai_prompt

string

Natural language instruction for AI extraction.

Response

200 - application/json

Successful scrape

success

boolean

url

string

status_code

integer

content

string

format

string

engine_used

string

credits_used

integer

duration_ms

integer

retries_used

integer

content_length

integer

screenshot_url

string | null

video_url

string | null

headers

object

blocking_detected

boolean

blocking_details

object

antibot_systems

array

captcha_systems

array

anti_bot_solved

boolean

solver_used

string | null

ai_extraction

object | string | array | null

ai_model

string | null

ai_error

string | null
