POST /v1/web/scrape
Scrape URL
curl --request POST \
  --url https://scrapebadger.com/v1/web/scrape \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api-key>' \
  --data '
{
  "url": "<string>",
  "engine": "auto",
  "format": "html",
  "render_js": false,
  "wait_for": "<string>",
  "wait_timeout": 30000,
  "wait_after_load": 123,
  "js_scenario": [
    {}
  ],
  "session_id": "<string>",
  "retry_count": 3,
  "retry_on_block": true,
  "country": "<string>",
  "custom_headers": {},
  "screenshot": false,
  "video": false,
  "anti_bot": false,
  "escalate": false,
  "max_cost": 123,
  "ai_extract": false,
  "ai_prompt": "<string>"
}
'
{
  "success": true,
  "url": "https://scrapebadger.com",
  "status_code": 200,
  "content": "# Example Domain\n\nThis domain is for use in illustrative examples...",
  "format": "markdown",
  "engine_used": "http",
  "credits_used": 1,
  "duration_ms": 342,
  "retries_used": 0,
  "content_length": 1256,
  "screenshot_url": null,
  "video_url": null,
  "headers": {
    "content-type": "text/html; charset=UTF-8"
  },
  "blocking_detected": false,
  "blocking_details": null,
  "antibot_systems": [],
  "captcha_systems": [],
  "anti_bot_solved": false,
  "solver_used": null,
  "ai_extraction": null,
  "ai_model": null,
  "ai_error": null
}

Request Body

url
string
required
The URL to scrape. Must be a valid HTTP or HTTPS URL. Private IPs and cloud metadata endpoints are blocked for security.
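A client can catch obviously invalid targets before spending a request. The sketch below is an assumption about what a reasonable pre-check looks like, not the service's actual validation logic: precheck_url and METADATA_HOSTS are hypothetical names, and DNS names are passed through for the API to judge.

```python
import ipaddress
from typing import Tuple
from urllib.parse import urlparse

# Illustrative only: one well-known metadata hostname, not an exhaustive list.
METADATA_HOSTS = {"metadata.google.internal"}


def precheck_url(url: str) -> Tuple[bool, str]:
    """Return (ok, reason) for a client-side sanity check of a target URL."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False, "scheme must be http or https"
    host = parsed.hostname
    if not host:
        return False, "missing host"
    if host.lower() in METADATA_HOSTS:
        return False, "cloud metadata host"
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal; leave hostname resolution to the API.
        return True, "ok"
    if ip.is_private or ip.is_loopback or ip.is_link_local:
        return False, "private, loopback, or link-local address"
    return True, "ok"
```

The check only rejects IP literals and one named host; a hostname that resolves to a private address would still pass here and be rejected server-side.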
engine
string
default:"auto"
Scraping engine tier to use. ScrapeBadger automatically selects the best approach.
| Value | Description | Cost |
| --- | --- | --- |
| auto | Automatically picks the best engine for the target site (recommended) | From 1 credit |
| browser | Force headless browser with full JavaScript rendering | 5 credits |
In auto mode, simple pages use fast HTTP (1 credit) and JavaScript-heavy pages use a browser (5 credits). When escalate is enabled, the system can further escalate to premium browsers (10 credits) for heavily protected sites.
format
string
default:"html"
Output format for the scraped content.
  • html — Raw HTML of the page
  • markdown — Converted to clean Markdown
  • text — Plain text with HTML tags stripped
render_js
boolean
default:false
Force JavaScript rendering before extracting content. Automatically switches to the browser engine. Use this for single-page applications or pages that load content dynamically.
wait_for
string
CSS selector or XPath expression to wait for before extracting content. Only works with browser engines (browser, patchright, windows_chrome). If render_js is false and this is set, JS rendering is forced automatically.
{ "wait_for": "#main-content" }
{ "wait_for": "//div[@class='results']" }
wait_timeout
integer
default:30000
Maximum time in milliseconds to wait for the wait_for selector to appear. Range: 1000 to 120000.
wait_after_load
integer
Additional milliseconds to wait after the page has finished loading, before extracting content. Useful for pages with animations or delayed rendering. Only works with browser engines. Range: 0 to 30000.
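The three wait parameters can be assembled and range-checked on the client before sending. This is a minimal sketch: wait_options is a hypothetical helper, and the bounds it enforces assume the ranges are 1000 to 120000 ms for wait_timeout and 0 to 30000 ms for wait_after_load.

```python
from typing import Optional


def wait_options(wait_for: Optional[str] = None,
                 wait_timeout: int = 30000,
                 wait_after_load: Optional[int] = None) -> dict:
    """Build the wait-related fields of a request body, checking ranges first."""
    if not 1000 <= wait_timeout <= 120000:
        raise ValueError("wait_timeout must be between 1000 and 120000 ms")
    if wait_after_load is not None and not 0 <= wait_after_load <= 30000:
        raise ValueError("wait_after_load must be between 0 and 30000 ms")

    options: dict = {}
    if wait_for is not None:
        # wait_timeout only matters when there is a selector to wait for.
        options["wait_for"] = wait_for
        options["wait_timeout"] = wait_timeout
    if wait_after_load is not None:
        options["wait_after_load"] = wait_after_load
    return options
```

The returned dict can be merged into a larger request body, for example {"url": target, **wait_options("#main-content", 10000)}.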
js_scenario
array
A list of browser actions to perform before extracting content. Forces the browser engine. Each action is an object with a type and action-specific parameters.
Supported actions:
| Type | Parameters | Description |
| --- | --- | --- |
| click | selector | Click an element |
| fill | selector, value | Fill an input field |
| scroll | direction, amount | Scroll the page |
| wait | milliseconds | Wait for a duration |
{
  "js_scenario": [
    { "type": "click", "selector": "#load-more" },
    { "type": "wait", "milliseconds": 2000 },
    { "type": "scroll", "direction": "down", "amount": 500 }
  ]
}
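Longer scenarios are easier to build programmatically than to write by hand. The helpers below are hypothetical conveniences (none of these function names come from the API); they only emit action objects with the type and parameters listed in the table above.

```python
def click(selector: str) -> dict:
    return {"type": "click", "selector": selector}


def fill(selector: str, value: str) -> dict:
    return {"type": "fill", "selector": selector, "value": value}


def scroll(direction: str, amount: int) -> dict:
    return {"type": "scroll", "direction": direction, "amount": amount}


def wait(milliseconds: int) -> dict:
    return {"type": "wait", "milliseconds": milliseconds}


def scroll_and_wait(times: int, amount: int = 1000, pause_ms: int = 2000) -> list:
    """Repeat a scroll-then-pause pair, e.g. for infinite-scroll pages."""
    steps = []
    for _ in range(times):
        steps.append(scroll("down", amount))
        steps.append(wait(pause_ms))
    return steps


# A request body that clicks a button once, then scrolls twice.
payload = {
    "url": "https://example.com/listing",  # placeholder URL
    "js_scenario": [click("#load-more"), wait(2000)] + scroll_and_wait(2),
}
```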
session_id
string
A unique identifier to persist cookies, fingerprint, and browser storage across multiple requests. Use the same session_id to maintain login state or continue a browsing session.
{ "session_id": "my-shopping-session" }
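A typical pattern is to generate one identifier per logical browsing session and attach it to every request in that session. The sketch below assumes any unique string is acceptable as a session_id; new_session_id, with_session, and the example.com URLs are illustrative only.

```python
import uuid


def new_session_id(prefix: str = "session") -> str:
    """Generate an identifier that is unlikely to collide with other sessions."""
    return f"{prefix}-{uuid.uuid4().hex}"


def with_session(payload: dict, session_id: str) -> dict:
    """Return a copy of the request body with session_id attached."""
    return {**payload, "session_id": session_id}


session_id = new_session_id("shop")

# First request: submit a form so the session picks up cookies.
login = with_session(
    {
        "url": "https://example.com/login",
        "js_scenario": [
            {"type": "fill", "selector": "#email", "value": "user@example.com"},
            {"type": "click", "selector": "#submit"},
        ],
    },
    session_id,
)

# Second request: reuse the same session to read a page behind the login.
orders = with_session(
    {"url": "https://example.com/orders", "format": "markdown"}, session_id
)
```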
retry_count
integer
default:3
Maximum number of retry attempts when a blocking page is detected. Retries are free: you only pay for the final successful engine. Range: 0 to 10.
retry_on_block
boolean
default:true
Whether to automatically retry when a blocking page is detected. Set to false to get the blocked response immediately.
country
string
ISO 3166-1 alpha-2 country code for proxy geo-targeting. Routes the request through a proxy in the specified country.
{ "country": "US" }
Common values: US, GB, DE, FR, JP, BR, AU.
custom_headers
object
Additional HTTP headers to include in the request to the target URL. Accepts a key-value object.
{
  "custom_headers": {
    "Accept-Language": "en-US",
    "Referer": "https://google.com"
  }
}
screenshot
boolean
default:false
Capture a full-page screenshot (PNG). Forces the browser engine. Returned as base64 in the screenshot_url response field.
video
boolean
default:false
Record a video of the browser session (animated GIF). Forces the browser engine. Returned as base64 in the video_url response field. Adds +3 credits. Useful for debugging, visual verification, or monitoring how a page loads.
anti_bot
boolean
default:false
Attempt to bypass detected anti-bot protection using registered solvers. Adds +5 credits to the request cost when a solver is invoked. Only triggered when blocking is actually detected.
escalate
boolean
default:false
Allow automatic escalation to more powerful engines when the initial engine is blocked.
Escalation path: curl_cffi → browser → windows_chrome.
You only pay for the engine that succeeds; costs are not cumulative. Without this flag, only the selected engine is tried.
max_cost
integer
Maximum credits to spend on this request. The request fails with a 400 error if the estimated cost would exceed this budget. Useful for controlling costs when using escalate or anti_bot. Minimum: 1.
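To choose a sensible max_cost, it helps to bound what a request could cost. The sketch below combines the credit figures quoted in this reference (1 for HTTP, 5 for browser, 10 for premium browsers, +3 for video, +5 for a solver, +2 for AI extraction). How the service computes its own estimate is not documented here, so credit_range and fits_budget are hypothetical helpers and the result is only a rough guide.

```python
from typing import Tuple

# Credit figures quoted in this reference.
HTTP_CREDITS = 1
BROWSER_CREDITS = 5
PREMIUM_CREDITS = 10
VIDEO_CREDITS = 3
ANTI_BOT_CREDITS = 5
AI_EXTRACT_CREDITS = 2

# Options this reference describes as forcing the browser engine.
BROWSER_FORCING = ("render_js", "wait_for", "js_scenario", "screenshot", "video")


def credit_range(payload: dict) -> Tuple[int, int]:
    """Return a rough (lowest, highest) credit estimate for a request body."""
    needs_browser = payload.get("engine") == "browser" or any(
        payload.get(key) for key in BROWSER_FORCING
    )
    lowest = BROWSER_CREDITS if needs_browser else HTTP_CREDITS
    # In auto mode the browser may be chosen anyway; escalation can go higher.
    highest = PREMIUM_CREDITS if payload.get("escalate") else BROWSER_CREDITS

    extras = 0
    if payload.get("video"):
        extras += VIDEO_CREDITS
    if payload.get("ai_extract"):
        extras += AI_EXTRACT_CREDITS
    lowest += extras
    highest += extras
    # The solver surcharge applies only if a solver is actually invoked.
    if payload.get("anti_bot"):
        highest += ANTI_BOT_CREDITS
    return lowest, highest


def fits_budget(payload: dict) -> bool:
    """True when the highest estimate stays within max_cost, if one is set."""
    max_cost = payload.get("max_cost")
    return max_cost is None or credit_range(payload)[1] <= max_cost
```

Under these assumptions, a request with escalate and anti_bot enabled ranges from 1 to 15 credits, so a max_cost of 20 leaves headroom while a max_cost of 12 might be rejected.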
ai_extract
boolean
default:false
Run AI-powered extraction on the scraped content using the instruction in ai_prompt. Adds +2 credits to the request cost. The scrape result is still returned even if AI extraction fails.
ai_prompt
string
Natural language instruction for AI data extraction. Required when ai_extract is true. Maximum 2000 characters.
{
  "ai_extract": true,
  "ai_prompt": "Extract all product names and prices as a JSON array"
}
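The two constraints on ai_prompt (required when ai_extract is true, at most 2000 characters) are easy to check before sending. validate_ai_options is a hypothetical client-side helper; the API's own error messages may differ.

```python
MAX_PROMPT_CHARS = 2000


def validate_ai_options(payload: dict) -> list:
    """Return a list of problems with the AI extraction fields (empty if none)."""
    problems = []
    if not payload.get("ai_extract"):
        return problems
    prompt = payload.get("ai_prompt")
    if not prompt:
        problems.append("ai_prompt is required when ai_extract is true")
    elif len(prompt) > MAX_PROMPT_CHARS:
        problems.append(
            f"ai_prompt is {len(prompt)} characters; the maximum is {MAX_PROMPT_CHARS}"
        )
    return problems
```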

Response

success
boolean
Whether the scrape completed successfully. false when all retries are exhausted and the page is still blocked.
url
string
The final URL after any redirects.
status_code
integer
HTTP status code from the target URL.
content
string
The scraped content in the requested format. null when success is false.
format
string
The output format used: html, markdown, or text.
engine_used
string
The engine tier that produced the final result.
credits_used
integer
Total credits charged for this request, including engine cost, solver, and AI extraction.
duration_ms
integer
Total request processing time in milliseconds.
retries_used
integer
Number of retry attempts performed. 0 if the first attempt succeeded.
content_length
integer
Size of the returned content in bytes.
screenshot_url
string
Base64-encoded PNG screenshot of the page. null unless screenshot: true was requested.
video_url
string
Base64-encoded animated GIF of the browser session. null unless video: true was requested.
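Because both media fields carry base64 text, they need decoding before they can be written to disk. A minimal sketch follows; decode_media is a hypothetical helper, and the handling of a "data:" URI prefix is a defensive assumption, since this reference does not say whether one is included.

```python
import base64


def decode_media(result: dict) -> dict:
    """Decode screenshot_url and video_url into raw bytes, skipping null fields."""
    media = {}
    for field in ("screenshot_url", "video_url"):
        value = result.get(field)
        if not value:
            continue
        # Assumption: tolerate an optional "data:image/png;base64," style prefix.
        if value.startswith("data:") and "," in value:
            value = value.split(",", 1)[1]
        media[field] = base64.b64decode(value)
    return media
```

The bytes can then be saved with, for example, pathlib.Path("page.png").write_bytes(media["screenshot_url"]).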
headers
object
HTTP response headers from the target URL.
blocking_detected
boolean
Whether a blocking page was detected during scraping.
blocking_details
object
Details about the detected blocking page. null unless blocking_detected is true.
antibot_systems
array
List of anti-bot systems detected on the page.
captcha_systems
array
List of CAPTCHA systems detected on the page.
anti_bot_solved
boolean
Whether the anti-bot solver successfully bypassed the protection.
solver_used
string
Name of the solver that successfully bypassed the block. null if no solver was used.
ai_extraction
object | string | array
Structured data extracted by the LLM based on ai_prompt. The shape depends on your prompt. null when ai_extract is false or extraction failed.
ai_model
string
The LLM used for extraction (e.g., gpt-4o-mini). null when AI extraction was not used.
ai_error
string
Error message if AI extraction failed. The scrape result is still returned. null on success.
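Several of these fields interact: a scrape can succeed while AI extraction fails, and a failed scrape may or may not involve blocking. The sketch below shows one way to collapse them into a single outcome; classify_result and its labels are hypothetical, chosen only for illustration.

```python
def classify_result(result: dict) -> str:
    """Map a response body to a coarse outcome label."""
    if not result.get("success"):
        return "blocked" if result.get("blocking_detected") else "failed"
    if result.get("ai_error"):
        # Content is still returned; only the extraction step failed.
        return "scraped_ai_failed"
    if result.get("ai_extraction") is not None:
        return "scraped_with_ai"
    return "scraped"


# Abbreviated version of the example response shown at the top of this page.
example = {
    "success": True,
    "status_code": 200,
    "blocking_detected": False,
    "ai_extraction": None,
    "ai_error": None,
}
print(classify_result(example))
```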

Examples

Basic scrape

curl -X POST "https://scrapebadger.com/v1/web/scrape" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://scrapebadger.com", "format": "markdown"}'
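The same call can be made from Python using only the standard library. This is a sketch rather than an official client: build_request and scrape are hypothetical names, and YOUR_API_KEY is a placeholder. Note that urllib raises urllib.error.HTTPError for non-2xx statuses, so callers that want to inspect error bodies need to catch it.

```python
import json
import urllib.request

API_URL = "https://scrapebadger.com/v1/web/scrape"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Prepare the POST request without sending it."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


def scrape(api_key: str, payload: dict, timeout: float = 60.0) -> dict:
    """Send the request and return the parsed JSON response body."""
    request = build_request(api_key, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (performs a real network call, so it is left commented out):
# result = scrape("YOUR_API_KEY", {"url": "https://scrapebadger.com", "format": "markdown"})
# print(result["content"])
```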

JavaScript rendering with wait

{
  "url": "https://scrapebadger.com/spa-page",
  "format": "html",
  "render_js": true,
  "wait_for": "#dynamic-content",
  "wait_timeout": 10000
}

AI extraction

{
  "url": "https://scrapebadger.com/products",
  "format": "markdown",
  "ai_extract": true,
  "ai_prompt": "Extract all product names, prices, and ratings as a JSON array of objects with keys: name, price, rating"
}

Full anti-bot bypass with budget

{
  "url": "https://heavily-protected-site.com",
  "format": "markdown",
  "escalate": true,
  "anti_bot": true,
  "max_cost": 20,
  "country": "US"
}

Browser automation scenario

{
  "url": "https://scrapebadger.com/infinite-scroll",
  "format": "text",
  "js_scenario": [
    { "type": "scroll", "direction": "down", "amount": 1000 },
    { "type": "wait", "milliseconds": 2000 },
    { "type": "scroll", "direction": "down", "amount": 1000 },
    { "type": "wait", "milliseconds": 2000 }
  ]
}

Error Responses

| Status | Description |
| --- | --- |
| 400 | Invalid URL, cost exceeds max_cost, or requested engine not available |
| 402 | Insufficient credits |
| 422 | Blocking detected after all retries exhausted (success: false, blocking_details populated) |
| 429 | Rate limit exceeded |
| 500 | Unexpected server error |
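A client usually treats these statuses differently: some call for a changed request, others for a delayed retry. The sketch below encodes one possible policy. Retrying 429 and 500 with exponential backoff is an assumption, not documented behavior, and next_action, HINTS, and RETRYABLE are hypothetical names.

```python
from typing import Tuple

# Assumption: only rate limiting and server errors are worth retrying unchanged.
RETRYABLE = {429, 500}

HINTS = {
    400: "fix the request: invalid URL, cost above max_cost, or unavailable engine",
    402: "add credits before retrying",
    422: "target is still blocking; consider escalate or anti_bot",
    429: "rate limited; slow down",
    500: "server error; retry later",
}


def next_action(status: int, attempt: int, max_attempts: int = 4,
                base_delay: float = 1.0) -> Tuple[str, float]:
    """Decide what to do after a failed call.

    attempt is zero-based. Returns ("retry", delay_seconds) or ("stop", 0.0).
    """
    if status in RETRYABLE and attempt + 1 < max_attempts:
        # Exponential backoff: 1s, 2s, 4s, ... with the default base_delay.
        return "retry", base_delay * (2 ** attempt)
    return "stop", 0.0
```

When the action is "stop", HINTS.get(status) gives a short human-readable reason to log or show to the caller.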

Authorizations

x-api-key
string
header
required

Body

application/json
url
string
required

The URL to scrape. Must be a valid HTTP or HTTPS URL.

engine
enum<string>
default:auto

Scraping engine tier to use.

Available options:
auto,
browser
format
enum<string>
default:html

Output format for the scraped content.

Available options:
html,
markdown,
text
render_js
boolean
default:false

Force JavaScript rendering.

wait_for
string

CSS selector or XPath to wait for before extracting.

wait_timeout
integer
default:30000

Max wait time in ms for wait_for selector.

wait_after_load
integer

Additional ms to wait after page load.

js_scenario
object[]

Browser actions to perform before extracting.

session_id
string

Persist cookies and state across requests.

retry_count
integer
default:3

Max retry attempts on blocking detection.

retry_on_block
boolean
default:true

Auto-retry on blocking page detection.

country
string

ISO 3166-1 alpha-2 country code for proxy geo-targeting.

custom_headers
object

Additional HTTP headers for the target request.

screenshot
boolean
default:false

Capture a full-page PNG screenshot.

video
boolean
default:false

Record browser session as animated GIF (+3 credits).

anti_bot
boolean
default:false

Attempt anti-bot bypass when blocking detected.

escalate
boolean
default:false

Allow auto-escalation to stronger engines.

max_cost
integer

Maximum credits budget for this request.

ai_extract
boolean
default:false

Run AI extraction on scraped content.

ai_prompt
string

Natural language instruction for AI extraction.

Response

200 - application/json

Successful scrape

success
boolean
url
string
status_code
integer
content
string
format
string
engine_used
string
credits_used
integer
duration_ms
integer
retries_used
integer
content_length
integer
screenshot_url
string | null
video_url
string | null
headers
object
blocking_detected
boolean
blocking_details
object
antibot_systems
array
captcha_systems
array
anti_bot_solved
boolean
solver_used
string | null
ai_extraction
unknown
ai_model
string | null
ai_error
string | null