# Scrape URL

> Scrape a webpage and return its content as HTML, Markdown, or plain text.

## Request Body

`url` (string, required): The URL to scrape. Must be a valid HTTP or HTTPS URL. Private IPs and cloud metadata endpoints are blocked for security.

`engine` (string, default `auto`): Scraping engine tier to use. ScrapeBadger automatically selects the best approach.

| Value     | Description                                                           | Cost          |
| --------- | --------------------------------------------------------------------- | ------------- |
| `auto`    | Automatically picks the best engine for the target site (recommended) | From 1 credit |
| `browser` | Force headless browser with full JavaScript rendering                 | 5 credits     |

In `auto` mode, simple pages use fast HTTP (1 credit) and JavaScript-heavy pages use a browser (5 credits). When `escalate` is enabled, the system can further escalate to premium browsers (10 credits) for heavily protected sites.

`format` (string, default `html`): Output format for the scraped content.

* `html`: Raw HTML of the page
* `markdown`: Converted to clean Markdown
* `text`: Plain text with HTML tags stripped

`render_js` (boolean, default `false`): Force JavaScript rendering before extracting content. Automatically switches to the `browser` engine. Use this for single-page applications or pages that load content dynamically.

`wait_for` (string): CSS selector or XPath expression to wait for before extracting content. Only works with browser engines (`browser`, `patchright`, `windows_chrome`). If this is set while `render_js` is `false`, JS rendering is forced automatically.

```json theme={null}
{
  "wait_for": "#main-content"
}
```

```json theme={null}
{
  "wait_for": "//div[@class='results']"
}
```

`wait_timeout` (integer, default `30000`): Maximum time in milliseconds to wait for the `wait_for` selector to appear. Range: `1000` to `120000`.

`wait_after_load` (integer): Additional milliseconds to wait after the page has finished loading, before extracting content. Useful for pages with animations or delayed rendering. Only works with browser engines. Range: `0` to `30000`.

`js_scenario` (array of objects): A list of browser actions to perform before extracting content. Forces the `browser` engine. Each action is an object with a `type` and action-specific parameters.

**Supported actions:**

| Type     | Parameters            | Description         |
| -------- | --------------------- | ------------------- |
| `click`  | `selector`            | Click an element    |
| `fill`   | `selector`, `value`   | Fill an input field |
| `scroll` | `direction`, `amount` | Scroll the page     |
| `wait`   | `milliseconds`        | Wait for a duration |

```json theme={null}
{
  "js_scenario": [
    { "type": "click", "selector": "#load-more" },
    { "type": "wait", "milliseconds": 2000 },
    { "type": "scroll", "direction": "down", "amount": 500 }
  ]
}
```

`session_id` (string): A unique identifier to persist cookies, fingerprint, and browser storage across multiple requests. Use the same `session_id` to maintain login state or continue a browsing session.

```json theme={null}
{
  "session_id": "my-shopping-session"
}
```

`retry_count` (integer, default `3`): Maximum number of retry attempts when a blocking page is detected. Retries are free; you only pay for the final successful engine. Range: `0` to `10`.

`retry_on_block` (boolean, default `true`): Whether to automatically retry when a blocking page is detected. Set to `false` to get the blocked response immediately.

`country` (string): ISO 3166-1 alpha-2 country code for proxy geo-targeting. Routes the request through a proxy in the specified country.

```json theme={null}
{
  "country": "US"
}
```

Common values: `US`, `GB`, `DE`, `FR`, `JP`, `BR`, `AU`.

`custom_headers` (object): Additional HTTP headers to include in the request to the target URL. Accepts a key-value object.

```json theme={null}
{
  "custom_headers": {
    "Accept-Language": "en-US",
    "Referer": "https://google.com"
  }
}
```

`screenshot` (boolean, default `false`): Capture a full-page screenshot (PNG). Forces the browser engine. Returned as base64 in the `screenshot_url` response field.

`video` (boolean, default `false`): Record a video of the browser session (animated GIF). Forces the browser engine. Returned as base64 in the `video_url` response field. Adds **+3 credits**. Useful for debugging, visual verification, or monitoring how a page loads.

`anti_bot` (boolean, default `false`): Attempt to bypass detected anti-bot protection using registered solvers. Adds **+5 credits** to the request cost when a solver is invoked. Only triggered when blocking is actually detected.

`escalate` (boolean, default `false`): Allow automatic escalation to more powerful engines when the initial engine is blocked. Escalation path: `curl_cffi` → `browser` → `windows_chrome`. You only pay for the engine that succeeds; costs are **not cumulative**. Without this flag, only the selected engine is tried.

`max_cost` (integer): Maximum credits to spend on this request. The request fails with a `400` error if the estimated cost would exceed this budget. Useful for controlling costs when using `escalate` or `anti_bot`. Minimum: `1`.

`ai_extract` (boolean, default `false`): Run AI-powered extraction on the scraped content using the instruction in `ai_prompt`. Adds **+2 credits** to the request cost. The scrape result is still returned even if AI extraction fails.

`ai_prompt` (string): Natural language instruction for AI data extraction. Required when `ai_extract` is `true`. Maximum 2000 characters.

```json theme={null}
{
  "ai_extract": true,
  "ai_prompt": "Extract all product names and prices as a JSON array"
}
```

## Response

`success` (boolean): Whether the scrape completed successfully. `false` when all retries are exhausted and the page is still blocked.

`url` (string): The final URL after any redirects.

`status_code` (integer): HTTP status code from the target URL.

`content` (string): The scraped content in the requested format. `null` when `success` is `false`.

`format` (string): The output format used: `html`, `markdown`, or `text`.

`engine_used` (string): The engine tier that produced the final result.

`credits_used` (integer): Total credits charged for this request, including engine cost, solver, and AI extraction.

`duration_ms` (integer): Total request processing time in milliseconds.

`retries_used` (integer): Number of retry attempts performed. `0` if the first attempt succeeded.

`content_length` (integer): Size of the returned content in bytes.

`screenshot_url` (string, nullable): Base64-encoded PNG screenshot of the page. Only present when `screenshot: true` was requested.

`video_url` (string, nullable): Base64-encoded animated GIF of the browser session. Only present when `video: true` was requested.

`headers` (object): HTTP response headers from the target URL.

`blocking_detected` (boolean): Whether a blocking page was detected during scraping.

`blocking_details` (object, nullable): Details about the detected blocking page. Only present when `blocking_detected` is `true`.

* `is_blocked`: Whether the page is confirmed as a blocking page.
* `block_type`: Type of block detected (e.g., `cloudflare`, `datadome`, `akamai`, `kasada`).
* `confidence`: Confidence score from `0.0` to `1.0`.
* `details`: Human-readable description of the block.

`antibot_systems` (array): List of anti-bot systems detected on the page.

* `system`: System name (e.g., `cloudflare_turnstile`, `datadome`, `akamai`, `kasada`, `amazon_waf`).
* `confidence`: Confidence score from `0.0` to `1.0`.
* `details`: Additional detection details.

`captcha_systems` (array): List of CAPTCHA systems detected on the page.

* `system`: System name (e.g., `recaptcha_v2`, `recaptcha_v3`, `hcaptcha`, `geetest`).
* `confidence`: Confidence score from `0.0` to `1.0`.
* `details`: Additional detection details.

`anti_bot_solved` (boolean): Whether the anti-bot solver successfully bypassed the protection.

`solver_used` (string, nullable): Name of the solver that successfully bypassed the block. `null` if no solver was used.

`ai_extraction` (nullable): Structured data extracted by the LLM based on `ai_prompt`. The shape depends on your prompt. `null` when `ai_extract` is `false` or extraction failed.

`ai_model` (string, nullable): The LLM model used for extraction (e.g., `gpt-4o-mini`). `null` when AI extraction was not used.

`ai_error` (string, nullable): Error message if AI extraction failed. The scrape result is still returned. `null` on success.

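
The response fields described above can be checked in a fixed order: `success` first, then the blocking fields, then the AI extraction fields. The following is a minimal sketch of that order, not part of the ScrapeBadger API; `summarize_result` is a hypothetical helper name, and the non-blocking failure branch is an assumption, since the reference only documents `success: false` for blocked pages.

```python
from typing import Any


def summarize_result(result: dict[str, Any]) -> str:
    """Summarize a scrape response dict in one line (hypothetical helper)."""
    if not result.get("success"):
        if result.get("blocking_detected"):
            # blocking_details is only present when blocking_detected is true.
            details = result.get("blocking_details") or {}
            block_type = details.get("block_type", "unknown")
            retries = result.get("retries_used", 0)
            return f"blocked: {block_type} after {retries} retries"
        # Assumption: a failure without blocking is reported by status code.
        return f"failed: target returned status {result.get('status_code')}"

    summary = (
        f"ok: {result.get('content_length')} bytes of {result.get('format')} "
        f"via {result.get('engine_used')}, credits_used={result.get('credits_used')}"
    )
    # The scrape result is still returned when AI extraction fails,
    # so ai_error is checked separately from success.
    if result.get("ai_error"):
        summary += f", AI extraction failed: {result['ai_error']}"
    elif result.get("ai_extraction") is not None:
        summary += f", AI extraction by {result.get('ai_model')}"
    return summary
```

Checking `ai_error` independently of `success` matters because a request can succeed as a scrape while the extraction step fails.
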
## Examples

### Basic scrape

```bash cURL theme={null}
curl -X POST "https://scrapebadger.com/v1/web/scrape" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://scrapebadger.com", "format": "markdown"}'
```

```python Python theme={null}
import requests

response = requests.post(
    "https://scrapebadger.com/v1/web/scrape",
    headers={"x-api-key": "YOUR_API_KEY"},
    json={"url": "https://scrapebadger.com", "format": "markdown"}
)
print(response.json()["content"])
```

```javascript JavaScript theme={null}
const res = await fetch("https://scrapebadger.com/v1/web/scrape", {
  method: "POST",
  headers: {
    "x-api-key": "YOUR_API_KEY",
    "Content-Type": "application/json"
  },
  body: JSON.stringify({ url: "https://scrapebadger.com", format: "markdown" })
});
const data = await res.json();
console.log(data.content);
```

### JavaScript rendering with wait

```json theme={null}
{
  "url": "https://scrapebadger.com/spa-page",
  "format": "html",
  "render_js": true,
  "wait_for": "#dynamic-content",
  "wait_timeout": 10000
}
```

### AI extraction

```json theme={null}
{
  "url": "https://scrapebadger.com/products",
  "format": "markdown",
  "ai_extract": true,
  "ai_prompt": "Extract all product names, prices, and ratings as a JSON array of objects with keys: name, price, rating"
}
```

### Full anti-bot bypass with budget

```json theme={null}
{
  "url": "https://heavily-protected-site.com",
  "format": "markdown",
  "escalate": true,
  "anti_bot": true,
  "max_cost": 20,
  "country": "US"
}
```

### Browser automation scenario

```json theme={null}
{
  "url": "https://scrapebadger.com/infinite-scroll",
  "format": "text",
  "js_scenario": [
    { "type": "scroll", "direction": "down", "amount": 1000 },
    { "type": "wait", "milliseconds": 2000 },
    { "type": "scroll", "direction": "down", "amount": 1000 },
    { "type": "wait", "milliseconds": 2000 }
  ]
}
```

## Error Responses

| Status | Description                                                                                    |
| ------ | ---------------------------------------------------------------------------------------------- |
| `400`  | Invalid URL, cost exceeds `max_cost`, or requested engine not available                        |
| `402`  | Insufficient credits                                                                           |
| `422`  | Blocking detected after all retries exhausted (`success: false`, `blocking_details` populated) |
| `429`  | Rate limit exceeded                                                                            |
| `500`  | Unexpected server error                                                                        |

```json 200 - Success theme={null}
{
  "success": true,
  "url": "https://scrapebadger.com",
  "status_code": 200,
  "content": "# Example Domain\n\nThis domain is for use in illustrative examples...",
  "format": "markdown",
  "engine_used": "http",
  "credits_used": 1,
  "duration_ms": 342,
  "retries_used": 0,
  "content_length": 1256,
  "screenshot_url": null,
  "video_url": null,
  "headers": {
    "content-type": "text/html; charset=UTF-8"
  },
  "blocking_detected": false,
  "blocking_details": null,
  "antibot_systems": [],
  "captcha_systems": [],
  "anti_bot_solved": false,
  "solver_used": null,
  "ai_extraction": null,
  "ai_model": null,
  "ai_error": null
}
```

```json 200 - With AI Extraction theme={null}
{
  "success": true,
  "url": "https://scrapebadger.com/products",
  "status_code": 200,
  "content": "# Products\n\n...",
  "format": "markdown",
  "engine_used": "http",
  "credits_used": 3,
  "duration_ms": 2150,
  "retries_used": 0,
  "content_length": 8432,
  "screenshot_url": null,
  "video_url": null,
  "headers": {},
  "blocking_detected": false,
  "blocking_details": null,
  "antibot_systems": [],
  "captcha_systems": [],
  "anti_bot_solved": false,
  "solver_used": null,
  "ai_extraction": [
    { "name": "Widget Pro", "price": "$29.99", "rating": 4.5 },
    { "name": "Widget Basic", "price": "$9.99", "rating": 4.2 }
  ],
  "ai_model": "gpt-4o-mini",
  "ai_error": null
}
```

```json 422 - Blocked theme={null}
{
  "success": false,
  "url": "https://protected-site.com",
  "status_code": 403,
  "content": null,
  "format": "markdown",
  "engine_used": "http",
  "credits_used": 0,
  "duration_ms": 5230,
  "retries_used": 3,
  "content_length": 0,
  "screenshot_url": null,
  "video_url": null,
  "headers": {},
  "blocking_detected": true,
  "blocking_details": {
    "is_blocked": true,
    "block_type": "cloudflare",
    "confidence": 0.95,
    "details": "Cloudflare challenge page detected"
  },
  "antibot_systems": [
    { "system": "cloudflare_turnstile", "confidence": 0.95, "details": null }
  ],
  "captcha_systems": [],
  "anti_bot_solved": false,
  "solver_used": null,
  "ai_extraction": null,
  "ai_model": null,
  "ai_error": null
}
```

## OpenAPI

````yaml POST /v1/web/scrape
openapi: 3.1.0
info:
  title: ScrapeBadger Web Scraping API
  version: 1.0.0
  description: Web scraping API with anti-bot bypass, JS rendering, and AI extraction.
servers:
  - url: https://scrapebadger.com
    description: Production
security:
  - apiKeyAuth: []
paths:
  /v1/web/scrape:
    post:
      summary: Scrape URL
      description: >-
        Scrape a webpage and return its content as HTML, Markdown, or plain
        text.
      operationId: scrapeUrl
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required:
                - url
              properties:
                url:
                  type: string
                  description: The URL to scrape. Must be a valid HTTP or HTTPS URL.
                engine:
                  type: string
                  default: auto
                  enum:
                    - auto
                    - browser
                  description: Scraping engine tier to use.
                format:
                  type: string
                  default: html
                  enum:
                    - html
                    - markdown
                    - text
                  description: Output format for the scraped content.
                render_js:
                  type: boolean
                  default: false
                  description: Force JavaScript rendering.
                wait_for:
                  type: string
                  description: CSS selector or XPath to wait for before extracting.
                wait_timeout:
                  type: integer
                  default: 30000
                  description: Max wait time in ms for wait_for selector.
                wait_after_load:
                  type: integer
                  description: Additional ms to wait after page load.
                js_scenario:
                  type: array
                  items:
                    type: object
                  description: Browser actions to perform before extracting.
                session_id:
                  type: string
                  description: Persist cookies and state across requests.
                retry_count:
                  type: integer
                  default: 3
                  description: Max retry attempts on blocking detection.
                retry_on_block:
                  type: boolean
                  default: true
                  description: Auto-retry on blocking page detection.
                country:
                  type: string
                  description: ISO 3166-1 alpha-2 country code for proxy geo-targeting.
                custom_headers:
                  type: object
                  description: Additional HTTP headers for the target request.
                screenshot:
                  type: boolean
                  default: false
                  description: Capture a full-page PNG screenshot.
                video:
                  type: boolean
                  default: false
                  description: Record browser session as animated GIF (+3 credits).
                anti_bot:
                  type: boolean
                  default: false
                  description: Attempt anti-bot bypass when blocking detected.
                escalate:
                  type: boolean
                  default: false
                  description: Allow auto-escalation to stronger engines.
                max_cost:
                  type: integer
                  description: Maximum credits budget for this request.
                ai_extract:
                  type: boolean
                  default: false
                  description: Run AI extraction on scraped content.
                ai_prompt:
                  type: string
                  description: Natural language instruction for AI extraction.
      responses:
        '200':
          description: Successful scrape
          content:
            application/json:
              schema:
                type: object
                properties:
                  success:
                    type: boolean
                  url:
                    type: string
                  status_code:
                    type: integer
                  content:
                    type: string
                  format:
                    type: string
                  engine_used:
                    type: string
                  credits_used:
                    type: integer
                  duration_ms:
                    type: integer
                  retries_used:
                    type: integer
                  content_length:
                    type: integer
                  screenshot_url:
                    type: string
                    nullable: true
                  video_url:
                    type: string
                    nullable: true
                  headers:
                    type: object
                  blocking_detected:
                    type: boolean
                  blocking_details:
                    type: object
                    nullable: true
                  antibot_systems:
                    type: array
                  captcha_systems:
                    type: array
                  anti_bot_solved:
                    type: boolean
                  solver_used:
                    type: string
                    nullable: true
                  ai_extraction:
                    nullable: true
                  ai_model:
                    type: string
                    nullable: true
                  ai_error:
                    type: string
                    nullable: true
components:
  securitySchemes:
    apiKeyAuth:
      type: apiKey
      in: header
      name: x-api-key
````
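
The credit rules described under Request Body (1 credit for HTTP, 5 for a browser, 10 for premium browsers, plus surcharges for `video`, `anti_bot`, and `ai_extract`) can be combined into a rough client-side estimate before choosing a `max_cost`. The sketch below is an illustration only: `estimate_credits` is a hypothetical helper, the server's own cost estimate may be computed differently, and it assumes the `ai_extract` surcharge always applies while the `anti_bot` surcharge applies only in the worst case.

```python
# Engine and add-on prices as stated in the reference text.
ENGINE_COST = {"http": 1, "browser": 5, "premium": 10}
ADDON_COST = {"video": 3, "anti_bot": 5, "ai_extract": 2}
# Options the reference says force or switch to the browser engine.
FORCES_BROWSER = ("render_js", "wait_for", "js_scenario", "screenshot", "video")


def estimate_credits(payload: dict) -> tuple[int, int]:
    """Return a (lowest, highest) credit estimate for a request payload."""
    needs_browser = payload.get("engine", "auto") == "browser" or any(
        payload.get(key) for key in FORCES_BROWSER
    )
    lowest = ENGINE_COST["browser"] if needs_browser else ENGINE_COST["http"]
    # In auto mode a JavaScript-heavy page may still need the browser engine;
    # with escalate, the premium tier is the ceiling. Costs are not cumulative.
    highest = ENGINE_COST["premium"] if payload.get("escalate") else ENGINE_COST["browser"]

    # Assumption: these surcharges are charged whenever the flag is set.
    for flag in ("video", "ai_extract"):
        if payload.get(flag):
            lowest += ADDON_COST[flag]
            highest += ADDON_COST[flag]
    # The solver surcharge applies only when a solver is actually invoked.
    if payload.get("anti_bot"):
        highest += ADDON_COST["anti_bot"]
    return lowest, highest
```

For the "Full anti-bot bypass with budget" payload this yields an upper bound of 15 credits, which fits within the `max_cost` of 20 used in that example.
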