Why Scrapefold?
Every scraping vendor has trade-offs. Scrapefold lets you switch between them with one line — and escalates from free local engines to paid APIs only as far as a site forces it.
Try a new vendor
Before: rewrite your pipeline
After: change one string, engines=("firecrawl",)
Cascade on block pages
Before: hand-roll try/except chains
After: built-in is_suspicious + ladder escalation
Whole-site crawl
Before: build sitemap parser + BFS + dedup
After: await crawl_site(root, opts)
LLM-ready output
Before: strip HTML by hand
After: result.markdown always populated
15 engines, one interface
Local engines are free and fast; SaaS engines add premium proxies and stealth. The router picks the cheapest tier that works. Ratings: ★★★ excellent · ★★☆ good · ★☆☆ basic.
- requests (local): Static HTML · ultra-fast · ★★★
- scrapling_fast (local): TLS-impersonation HTTP · ★★★
- scrapling_stealth (local): JS render + stealth · ★★★
- crawl4ai (local): JS render · native markdown · ★★★
- cloakbrowser (local): Anti-fingerprint browser · ★★★
- selenium (local): Classic JS rendering · ★★☆
- Jina Reader (saas · free tier): Direct URL → markdown · ★★★
- Firecrawl (saas): LLM-ready markdown + stealth · ★★★
- ScrapingBee (saas): Premium proxy + JS · ★★★
- Scrapingdog (saas): Affordable proxy + browser · ★★★
- Cloudflare BR (saas): Browser rendering at the edge · ★★★
- Oxylabs (saas): Web Scraper API · residential geo · ★★★
- Anysite (saas): General-purpose · native markdown · ★★★
- Apify (LinkedIn) (saas · site): Vendor-managed actor runs · ★★☆
- Outscraper (saas · site): Niche aggregator scrapes · ★★☆
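To make "the cheapest tier that works" concrete, here is a minimal sketch of ladder escalation. It is not Scrapefold's router: `Page`, `looks_blocked`, `fetch_with_ladder`, and the three fake fetchers are hypothetical names invented for illustration, and the block-page heuristic is only a stand-in for whatever the built-in `is_suspicious` actually checks.

```python
import re
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Page:
    engine: str   # which rung produced this page
    status: int   # HTTP status code
    html: str     # response body


# Hypothetical markers; a real detector would look at far more signals.
_BLOCK_MARKERS = re.compile(
    r"just a moment|access denied|verify you are human|captcha", re.I
)


def looks_blocked(page: Page) -> bool:
    """Rough block-page heuristic (an assumption, not Scrapefold's is_suspicious)."""
    if page.status in (403, 429, 503):
        return True
    if not page.html.strip():
        return True
    return bool(_BLOCK_MARKERS.search(page.html))


Fetcher = Callable[[str], Page]


def fetch_with_ladder(url: str, ladder: Sequence[tuple[str, Fetcher]]) -> Page:
    """Try each rung in order, cheapest first; return the first clean page."""
    for _name, fetch in ladder:
        try:
            page = fetch(url)
        except Exception:
            continue  # a transport error counts as a failed rung: climb
        if not looks_blocked(page):
            return page
    raise RuntimeError(f"every engine was blocked or failed for {url}")


# Fake fetchers standing in for real engines, so the sketch runs offline.
def fake_requests(url: str) -> Page:
    return Page("requests", 403, "Access denied")


def fake_stealth(url: str) -> Page:
    return Page("scrapling_stealth", 200, "<html><body>Just a moment...</body></html>")


def fake_paid(url: str) -> Page:
    return Page("firecrawl", 200, "<html><body><h1>Pricing</h1></body></html>")


LADDER = [
    ("requests", fake_requests),           # free, local
    ("scrapling_stealth", fake_stealth),   # free, local, stealth
    ("firecrawl", fake_paid),              # paid, only reached when needed
]

page = fetch_with_ladder("https://protected.example.com", LADDER)
print(page.engine)  # firecrawl
```

The ordering of the ladder is the whole design: a paid rung is reached only after every cheaper rung has returned something that looks like a block page.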
How to choose
Or skip the decision entirely — call scrape(url) and let the router pick.
- Static blog or documentation site: requests (zero deps, sub-second)
- JS-rendered SPA, no anti-bot: scrapling_fast (free) or Jina Reader (free tier)
- Cloudflare / Datadome / PerimeterX: scrapling_stealth (free) → Firecrawl / ScrapingBee (paid)
- Site that emits clean markdown via API: Jina Reader (direct markdown, no parsing)
- LinkedIn / niche social: Apify (LinkedIn) (vendor-managed actors)
- IP-geofenced targets: Oxylabs (residential pool + geo_location)
- Need an MCP server for AI agents: scrapefold-mcp (built-in)
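If you would rather encode the guide above than remember it, a small lookup is enough. This is only a sketch: `SiteTraits` and `suggest_engines` are hypothetical helpers, not part of Scrapefold, and the strings are the display names from the list above, which may differ from the identifiers the library accepts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteTraits:
    needs_js: bool = False     # JS-rendered SPA
    anti_bot: bool = False     # Cloudflare / Datadome / PerimeterX
    geo_fenced: bool = False   # IP-geofenced target
    linkedin: bool = False     # LinkedIn / niche social


def suggest_engines(traits: SiteTraits) -> tuple[str, ...]:
    """Return candidate engines, cheapest first, following the guide above."""
    if traits.linkedin:
        return ("Apify (LinkedIn)",)
    if traits.geo_fenced:
        return ("Oxylabs",)
    if traits.anti_bot:
        # Free stealth first, paid vendors only as a fallback.
        return ("scrapling_stealth", "Firecrawl", "ScrapingBee")
    if traits.needs_js:
        return ("scrapling_fast", "Jina Reader")
    return ("requests",)  # static blog or documentation site


print(suggest_engines(SiteTraits(anti_bot=True)))
```

The order of the checks is a judgment call: the most specific constraints (a single site, a geofence) are tested first because they narrow the choice the most.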
Quickstart
Install one extra per vendor, or scrapefold[all] for everything.
import asyncio
from scrapefold import scrape, crawl_site, ScrapeOptions

async def main():
    # Single URL, auto-engine: router picks the cheapest tier that works
    result = await scrape("https://example.com")
    print(result.markdown)  # always populated
    print(result.engine)    # which engine actually fetched it

    # Cloudflare-protected site: same call, router auto-escalates
    result = await scrape(
        "https://protected.example.com",
        opts=ScrapeOptions(render_js=True, stealth=True),
    )

    # Whole-site crawl with disk cache
    crawl = await crawl_site(
        "https://docs.example.com",
        opts=ScrapeOptions(max_pages=50, max_depth=3),
        output="site.md",
    )

asyncio.run(main())
# CLI
$ scrapefold scrape https://example.com
$ scrapefold crawl https://docs.example.com --max-pages 50 --output site.md
$ scrapefold list-engines
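For a sense of what a whole-site crawl takes off your hands, here is roughly what the hand-rolled "BFS + dedup" version looks like. It is a sketch, not Scrapefold's crawler: `normalize`, `crawl_bfs`, and the in-memory `SITE` link graph are hypothetical, the sitemap step is left out, and the exact meaning of `max_pages` and `max_depth` in the library is assumed rather than known.

```python
from collections import deque
from typing import Callable, Iterable
from urllib.parse import urldefrag, urljoin, urlparse


def normalize(base: str, href: str) -> str:
    """Resolve href against base, drop the #fragment and any trailing slash."""
    url, _fragment = urldefrag(urljoin(base, href))
    return url.rstrip("/")


def crawl_bfs(
    root: str,
    get_links: Callable[[str], Iterable[str]],
    max_pages: int = 50,
    max_depth: int = 3,
) -> list[str]:
    """Breadth-first crawl of one host, visiting each normalized URL once."""
    root = normalize(root, "")
    host = urlparse(root).netloc
    seen = {root}                 # dedup set: URLs already queued
    queue = deque([(root, 0)])    # (url, depth) pairs in visit order
    visited: list[str] = []
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue              # assumed semantics: do not expand past max_depth
        for href in get_links(url):
            nxt = normalize(url, href)
            if urlparse(nxt).netloc == host and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return visited


# A tiny fake site so the sketch runs without network access.
SITE = {
    "https://docs.example.com": ["/guide", "/api", "https://other.example.org/x"],
    "https://docs.example.com/guide": ["/api", "/guide#install", "/guide/advanced"],
    "https://docs.example.com/api": ["/"],
    "https://docs.example.com/guide/advanced": [],
}


def get_links(url: str) -> list[str]:
    return SITE.get(url, [])


for url in crawl_bfs("https://docs.example.com/", get_links):
    print(url)
```

Off-host links, fragment-only variants, and the link back to the root are all filtered by the same `seen` check, which is why normalizing before deduplicating matters.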