Turn any URL into clean markdown.

One async Python interface over 15 scraping engines — with automatic anti-bot escalation and LLM-ready output. Open source, MIT.

$ pip install scrapefold

15 engines · 4 anti-bot stacks handled (Cloudflare · Datadome · PerimeterX · Akamai) · 631 tests · MIT

Demo: scrapefold scrape https://example.com produces clean markdown and auto-escalates past Cloudflare.

Why Scrapefold?

Every scraping vendor has trade-offs. Scrapefold lets you switch between them with one line — and escalates from free local engines to paid APIs only as far as a site forces it.

Try a new vendor

Before: rewrite your pipeline
After: change one string — engines=("firecrawl",)

Cascade on block pages

Before: hand-roll try/except chains
After: built-in is_suspicious + ladder escalation

Whole-site crawl

Before: build sitemap parser + BFS + dedup
After: await crawl_site(root, opts)

LLM-ready output

Before: strip HTML by hand
After: result.markdown always populated
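
To make the "strip HTML by hand" step concrete, here is a minimal sketch of what an HTML-to-markdown fallback could look like using only the standard library. It is not Scrapefold's converter: the class name MarkdownExtractor and the function html_to_markdown are hypothetical, and it only handles headings, paragraphs, and list items.

```python
from html.parser import HTMLParser


class MarkdownExtractor(HTMLParser):
    """Hypothetical helper: keeps headings, paragraphs, and list items as markdown."""

    SKIP = {"script", "style", "nav", "footer"}      # content we never want
    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}

    def __init__(self):
        super().__init__()
        self.lines = []       # finished markdown blocks
        self.buf = []         # raw text of the block being built
        self.prefix = ""      # markdown prefix for the current block
        self.skip_depth = 0   # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.HEADINGS:
            self._flush()
            self.prefix = self.HEADINGS[tag]
        elif tag == "li":
            self._flush()
            self.prefix = "- "
        elif tag == "p":
            self._flush()

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in self.HEADINGS or tag in ("p", "li"):
            self._flush()

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.buf.append(data)

    def _flush(self):
        # Collapse whitespace so inline tags such as <b> do not leave gaps.
        text = " ".join("".join(self.buf).split())
        if text:
            self.lines.append(self.prefix + text)
        self.buf, self.prefix = [], ""


def html_to_markdown(html: str) -> str:
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    parser._flush()  # emit any trailing text outside a block element
    return "\n\n".join(parser.lines)


SAMPLE = (
    "<html><head><style>p{}</style></head><body><h1>Title</h1>"
    "<p>Hello <b>world</b>!</p><ul><li>one</li><li>two</li></ul>"
    "<script>x()</script></body></html>"
)

if __name__ == "__main__":
    print(html_to_markdown(SAMPLE))
```

A production converter also has to deal with links, tables, code blocks, and malformed markup, which is the work a populated result.markdown saves you.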

15 engines, one interface

Local engines are free and fast; SaaS engines add premium proxies and stealth. The router picks the cheapest tier that works. Ratings: ★★★ excellent · ★★☆ good · ★☆☆ basic.

  • requests (local): Static HTML · ultra-fast · ★★★
  • scrapling_fast (local): TLS-impersonation HTTP · ★★★
  • scrapling_stealth (local): JS render + stealth · ★★★
  • crawl4ai (local): JS render · native markdown · ★★★
  • cloakbrowser (local): Anti-fingerprint browser · ★★★
  • selenium (local): Classic JS rendering · ★★☆
  • Jina Reader (saas · free tier): Direct URL → markdown · ★★★
  • Firecrawl (saas): LLM-ready markdown + stealth · ★★★
  • ScrapingBee (saas): Premium proxy + JS · ★★★
  • Scrapingdog (saas): Affordable proxy + browser · ★★★
  • Cloudflare BR (saas): Browser rendering at the edge · ★★★
  • Oxylabs (saas): Web Scraper API · residential geo · ★★★
  • Anysite (saas): General-purpose · native markdown · ★★★
  • Apify (LinkedIn) (saas · site): Vendor-managed actor runs · ★★☆
  • Outscraper (saas · site): Niche aggregator scrapes · ★★☆
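
The "cheapest tier that works" routing can be illustrated with a small escalation ladder. This is a sketch under stated assumptions, not Scrapefold's router: is_suspicious is named on this page, but the heuristic below (status codes, body length, marker strings) is invented for illustration, and Page, scrape_with_ladder, and the two stub engines are hypothetical.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Tuple

# Assumed heuristics; real block pages differ per anti-bot vendor.
BLOCK_MARKERS = ("just a moment", "attention required", "access denied", "captcha")
BLOCK_STATUSES = {403, 429, 503}


@dataclass
class Page:
    engine: str
    status: int
    html: str


def is_suspicious(page: Page) -> bool:
    """Treat blocking statuses, near-empty bodies, or challenge text as a block page."""
    body = page.html.strip().lower()
    return (
        page.status in BLOCK_STATUSES
        or len(body) < 50
        or any(marker in body for marker in BLOCK_MARKERS)
    )


Fetcher = Callable[[str], Awaitable[Page]]


async def scrape_with_ladder(url: str, ladder: List[Tuple[str, Fetcher]]) -> Page:
    """Try engines in order (cheapest first) and stop at the first clean page."""
    tried = []
    for name, fetch in ladder:
        tried.append(name)
        try:
            page = await fetch(url)
        except Exception:
            continue  # engine failed outright: climb to the next rung
        if not is_suspicious(page):
            return page
    raise RuntimeError(f"all engines blocked or failed for {url}: {tried}")


# Stub engines standing in for a free local fetcher and a paid API.
async def local_http(url: str) -> Page:
    return Page("local_http", 403, "<title>Just a moment...</title>")


async def paid_api(url: str) -> Page:
    return Page("paid_api", 200, "<article>" + "Real page content. " * 10 + "</article>")


LADDER = [("local_http", local_http), ("paid_api", paid_api)]

if __name__ == "__main__":
    page = asyncio.run(scrape_with_ladder("https://example.com", LADDER))
    print(page.engine)  # paid_api
```

Ordering the ladder from free to paid means a paid call only happens after every cheaper rung has returned something that looks like a block page.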

How to choose

Or skip the decision entirely — call scrape(url) and let the router pick.

  • Static blog or documentation site: requests (zero deps, sub-second)
  • JS-rendered SPA, no anti-bot: scrapling_fast (free) or Jina Reader (free tier)
  • Cloudflare / Datadome / PerimeterX: scrapling_stealth (free) → Firecrawl / ScrapingBee (paid)
  • Site that emits clean markdown via API: Jina Reader (direct markdown, no parsing)
  • LinkedIn / niche social: Apify (LinkedIn), vendor-managed actors
  • IP-geofenced targets: Oxylabs (residential pool + geo_location)
  • Need an MCP server for AI agents: scrapefold-mcp (built-in)
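
If you want to encode this guide in your own code, one option is a small lookup like the sketch below. SiteTraits and suggest_engines are hypothetical names, the precedence between rules is an assumption, and the returned strings are the display names from the list above, which may not match the identifiers that the engines= option expects.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SiteTraits:
    """Hypothetical description of a target site."""
    needs_js: bool = False
    anti_bot: bool = False
    geofenced: bool = False
    linkedin: bool = False


def suggest_engines(traits: SiteTraits) -> Tuple[str, ...]:
    """Return engines to try in order, cheapest first where the guide gives an order."""
    if traits.linkedin:
        return ("Apify (LinkedIn)",)
    if traits.geofenced:
        return ("Oxylabs",)
    if traits.anti_bot:
        # Free stealth engine first, paid APIs only if it is blocked.
        return ("scrapling_stealth", "Firecrawl", "ScrapingBee")
    if traits.needs_js:
        return ("scrapling_fast", "Jina Reader")
    return ("requests",)


if __name__ == "__main__":
    print(suggest_engines(SiteTraits(anti_bot=True)))
```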

Quickstart

Install one extra per vendor, or scrapefold[all] for everything.

import asyncio
from scrapefold import scrape, crawl_site, ScrapeOptions

async def main():
    # Single URL, auto-engine — router picks the cheapest tier that works
    result = await scrape("https://example.com")
    print(result.markdown)        # always populated
    print(result.engine)          # which engine actually fetched it

    # Cloudflare-protected site — same call, router auto-escalates
    result = await scrape(
        "https://protected.example.com",
        opts=ScrapeOptions(render_js=True, stealth=True),
    )

    # Whole-site crawl with disk cache
    crawl = await crawl_site(
        "https://docs.example.com",
        opts=ScrapeOptions(max_pages=50, max_depth=3),
        output="site.md",
    )

asyncio.run(main())

# CLI
$ scrapefold scrape https://example.com
$ scrapefold crawl https://docs.example.com --max-pages 50 --output site.md
$ scrapefold list-engines
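
For a sense of what a whole-site crawl replaces, here is a minimal sketch of the BFS-plus-dedup part, run against an in-memory link graph so it needs no network. It is not the crawl_site implementation: crawl_bfs, normalize, and the SITE fixture are hypothetical, sitemap parsing is left out, and max_pages and max_depth are only assumed to behave like the options of the same name in the quickstart.

```python
from collections import deque
from typing import Callable, Iterable, List
from urllib.parse import urldefrag, urljoin, urlparse


def normalize(base: str, href: str) -> str:
    """Resolve href against base, drop the #fragment, and trim trailing slashes."""
    url, _fragment = urldefrag(urljoin(base, href))
    return url.rstrip("/")


def crawl_bfs(
    root: str,
    get_links: Callable[[str], Iterable[str]],
    max_pages: int = 50,
    max_depth: int = 3,
) -> List[str]:
    """Breadth-first, same-host crawl that visits each normalized URL at most once."""
    root = normalize(root, "")
    host = urlparse(root).netloc
    seen = {root}
    order: List[str] = []
    queue = deque([(root, 0)])
    while queue and len(order) < max_pages:
        url, depth = queue.popleft()
        order.append(url)
        if depth >= max_depth:
            continue  # visit this page but do not follow its links
        for href in get_links(url):
            nxt = normalize(url, href)
            if urlparse(nxt).netloc != host or nxt in seen:
                continue  # skip off-site links and duplicates
            seen.add(nxt)
            queue.append((nxt, depth + 1))
    return order


# Hypothetical site: page URL -> links found on that page.
SITE = {
    "https://docs.example.com": ["/intro", "/api", "https://other.example.org/"],
    "https://docs.example.com/intro": ["/api", "/intro#install", "/"],
    "https://docs.example.com/api": ["/api/scrape"],
    "https://docs.example.com/api/scrape": [],
}


def fake_links(url: str) -> List[str]:
    return SITE.get(url, [])


if __name__ == "__main__":
    for page in crawl_bfs("https://docs.example.com/", fake_links):
        print(page)
```

In a real crawler, get_links would fetch each page and extract its anchors, and robots.txt and rate limits would need handling as well.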