Skip to main content
POST
/
v1
/
web
/
scrape
Scrape URL
curl --request POST \
  --url https://api.example.com/v1/web/scrape \
  --header 'Content-Type: application/json' \
  --data '
{
  "url": "<string>",
  "engine": "<string>",
  "format": "<string>",
  "render_js": true,
  "wait_for": "<string>",
  "wait_timeout": 123,
  "wait_after_load": 123,
  "js_scenario": [
    {}
  ],
  "session_id": "<string>",
  "retry_count": 123,
  "retry_on_block": true,
  "country": "<string>",
  "custom_headers": {},
  "screenshot": true,
  "anti_bot": true,
  "escalate": true,
  "max_cost": 123,
  "ai_extract": true,
  "ai_prompt": "<string>"
}
'
{
  "success": true,
  "url": "https://example.com",
  "status_code": 200,
  "content": "# Example Domain\n\nThis domain is for use in illustrative examples...",
  "format": "markdown",
  "engine_used": "curl_cffi",
  "credits_used": 1,
  "duration_ms": 342,
  "retries_used": 0,
  "content_length": 1256,
  "screenshot_url": null,
  "headers": {
    "content-type": "text/html; charset=UTF-8"
  },
  "blocking_detected": false,
  "blocking_details": null,
  "antibot_systems": [],
  "captcha_systems": [],
  "anti_bot_solved": false,
  "solver_used": null,
  "ai_extraction": null,
  "ai_model": null,
  "ai_error": null
}

Request Body

url
string
required
The URL to scrape. Must be a valid HTTP or HTTPS URL. Private IPs and cloud metadata endpoints are blocked for security.
engine
string
default:"auto"
Scraping engine to use.
ValueDescriptionCost
autoSelects curl_cffi by default, escalates to browser if JS rendering is neededVaries
curl_cffiHTTP client with Chrome TLS fingerprint impersonation1 credit
rnetBackward-compatible alias for curl_cffi1 credit
browserHeadless browser (Patchright/Camoufox) with full JS rendering5 credits
patchrightBackward-compatible alias for browser5 credits
windows_chromeReal Chrome on a remote Windows VPS via CDP10 credits
format
string
default:"html"
Output format for the scraped content.
  • html — Raw HTML of the page
  • markdown — Converted to clean Markdown
  • text — Plain text with HTML tags stripped
render_js
boolean
default:false
Force JavaScript rendering before extracting content. Automatically switches to the browser engine. Use this for single-page applications or pages that load content dynamically.
wait_for
string
CSS selector or XPath expression to wait for before extracting content. Only works with browser engines (browser, patchright, windows_chrome). If render_js is false and this is set, JS rendering is forced automatically.
{ "wait_for": "#main-content" }
{ "wait_for": "//div[@class='results']" }
wait_timeout
integer
default:30000
Maximum time in milliseconds to wait for the wait_for selector to appear. Range: 1000120000.
wait_after_load
integer
Additional milliseconds to wait after the page has finished loading, before extracting content. Useful for pages with animations or delayed rendering. Only works with browser engines. Range: 030000.
js_scenario
array
A list of browser actions to perform before extracting content. Forces the browser engine. Each action is an object with a type and action-specific parameters.Supported actions:
TypeParametersDescription
clickselectorClick an element
fillselector, valueFill an input field
scrolldirection, amountScroll the page
waitmillisecondsWait for a duration
{
  "js_scenario": [
    { "type": "click", "selector": "#load-more" },
    { "type": "wait", "milliseconds": 2000 },
    { "type": "scroll", "direction": "down", "amount": 500 }
  ]
}
session_id
string
A unique identifier to persist cookies, fingerprint, and browser storage across multiple requests. Use the same session_id to maintain login state or continue a browsing session.
{ "session_id": "my-shopping-session" }
retry_count
integer
default:3
Maximum number of retry attempts when a blocking page is detected. Retries are free — you only pay for the final successful engine. Range: 010.
retry_on_block
boolean
default:true
Whether to automatically retry when a blocking page is detected. Set to false to get the blocked response immediately.
country
string
ISO 3166-1 alpha-2 country code for proxy geo-targeting. Routes the request through a proxy in the specified country.
{ "country": "US" }
Common values: US, GB, DE, FR, JP, BR, AU.
custom_headers
object
Additional HTTP headers to include in the request to the target URL. Accepts a key-value object.
{
  "custom_headers": {
    "Accept-Language": "en-US",
    "Referer": "https://google.com"
  }
}
screenshot
boolean
default:false
Capture a screenshot of the page. Forces the browser engine. The screenshot URL is returned in the screenshot_url field of the response.
anti_bot
boolean
default:false
Attempt to bypass detected anti-bot protection using registered solvers. Adds +5 credits to the request cost when a solver is invoked. Only triggered when blocking is actually detected.
escalate
boolean
default:false
Allow automatic escalation to more powerful engines when the initial engine is blocked.Escalation path: curl_cffibrowserwindows_chromeYou only pay for the engine that succeeds — costs are not cumulative. Without this flag, only the selected engine is tried.
max_cost
integer
Maximum credits to spend on this request. The request fails with a 400 error if the estimated cost would exceed this budget. Useful for controlling costs when using escalate or anti_bot. Minimum: 1.
ai_extract
boolean
default:false
Run AI-powered extraction on the scraped content using the instruction in ai_prompt. Adds +2 credits to the request cost. The scrape result is still returned even if AI extraction fails.
ai_prompt
string
Natural language instruction for AI data extraction. Required when ai_extract is true. Maximum 2000 characters.
{
  "ai_extract": true,
  "ai_prompt": "Extract all product names and prices as a JSON array"
}

Response

success
boolean
Whether the scrape completed successfully. false when all retries are exhausted and the page is still blocked.
url
string
The final URL after any redirects.
status_code
integer
HTTP status code from the target URL.
content
string
The scraped content in the requested format. null when success is false.
format
string
The output format used: html, markdown, or text.
engine_used
string
The engine that produced the final result (e.g., curl_cffi, browser, windows_chrome).
credits_used
integer
Total credits charged for this request, including engine cost, solver, and AI extraction.
duration_ms
integer
Total request processing time in milliseconds.
retries_used
integer
Number of retry attempts performed. 0 if the first attempt succeeded.
content_length
integer
Size of the returned content in bytes.
screenshot_url
string
URL to the captured screenshot image. Only present when screenshot: true was requested.
headers
object
HTTP response headers from the target URL.
blocking_detected
boolean
Whether a blocking page was detected during scraping.
blocking_details
object
Details about the detected blocking page. Only present when blocking_detected is true.
antibot_systems
array
List of anti-bot systems detected on the page.
captcha_systems
array
List of CAPTCHA systems detected on the page.
anti_bot_solved
boolean
Whether the anti-bot solver successfully bypassed the protection.
solver_used
string
Name of the solver that successfully bypassed the block. null if no solver was used.
ai_extraction
object | string | array
Structured data extracted by the LLM based on ai_prompt. The shape depends on your prompt. null when ai_extract is false or extraction failed.
ai_model
string
The LLM model used for extraction (e.g., gpt-4o-mini). null when AI extraction was not used.
ai_error
string
Error message if AI extraction failed. The scrape result is still returned. null on success.

Examples

Basic scrape

curl -X POST "https://scrapebadger.com/v1/web/scrape" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "format": "markdown"}'

JavaScript rendering with wait

{
  "url": "https://example.com/spa-page",
  "format": "html",
  "render_js": true,
  "wait_for": "#dynamic-content",
  "wait_timeout": 10000
}

AI extraction

{
  "url": "https://example.com/products",
  "format": "markdown",
  "ai_extract": true,
  "ai_prompt": "Extract all product names, prices, and ratings as a JSON array of objects with keys: name, price, rating"
}

Full anti-bot bypass with budget

{
  "url": "https://heavily-protected-site.com",
  "format": "markdown",
  "escalate": true,
  "anti_bot": true,
  "max_cost": 20,
  "country": "US"
}

Browser automation scenario

{
  "url": "https://example.com/infinite-scroll",
  "format": "text",
  "js_scenario": [
    { "type": "scroll", "direction": "down", "amount": 1000 },
    { "type": "wait", "milliseconds": 2000 },
    { "type": "scroll", "direction": "down", "amount": 1000 },
    { "type": "wait", "milliseconds": 2000 }
  ]
}

Error Responses

StatusDescription
400Invalid URL, cost exceeds max_cost, or requested engine not available
402Insufficient credits
422Blocking detected after all retries exhausted (success: false, blocking_details populated)
429Rate limit exceeded
500Unexpected server error
{
  "success": true,
  "url": "https://example.com",
  "status_code": 200,
  "content": "# Example Domain\n\nThis domain is for use in illustrative examples...",
  "format": "markdown",
  "engine_used": "curl_cffi",
  "credits_used": 1,
  "duration_ms": 342,
  "retries_used": 0,
  "content_length": 1256,
  "screenshot_url": null,
  "headers": {
    "content-type": "text/html; charset=UTF-8"
  },
  "blocking_detected": false,
  "blocking_details": null,
  "antibot_systems": [],
  "captcha_systems": [],
  "anti_bot_solved": false,
  "solver_used": null,
  "ai_extraction": null,
  "ai_model": null,
  "ai_error": null
}