- Stars: 68,941 · License: AGPL-3.0 · Last commit: 3 days ago
Best Web Scraping & Crawling Tools
Explore leading tools in the Web Scraping & Crawling category, including open-source options and SaaS products. Compare features, use cases, and find the best fit for your workflow.
10+ open-source projects · 6 SaaS products
Top open-source Web Scraping & Crawling tools
These projects are active, self-hostable choices for teams evaluating alternatives to SaaS scraping tools.
- Stars: 59,114 · License: BSD-3-Clause · Last commit: 5 days ago
- Stars: 24,863 · License: Apache-2.0 · Last commit: 11 days ago
- Stars: 21,921 · License: MIT · Last commit: 5 days ago
- Stars: 20,704 · License: Apache-2.0 · Last commit: 3 days ago
- Stars: 14,885 · License: MIT · Last commit: 4 days ago
Crawlee provides a unified API for HTTP and headless-browser crawling, automatic proxy rotation, persistent queues, and flexible storage, enabling reliable, scalable scrapers in Node.js.
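The request queue is what keeps a crawler from fetching the same page twice. The sketch below illustrates the idea in plain TypeScript: each URL is reduced to a unique key (here, simply the URL with its fragment dropped), duplicates are rejected on enqueue, and a breadth-first loop processes each page once. It is not Crawlee's API. The `RequestQueue` class, the `crawl` function, and the toy link graph that stands in for real HTTP fetches are all hypothetical, and unlike a persistent queue this one lives only in memory.

```typescript
// Hypothetical in-memory sketch of a deduplicating crawl queue.
// NOT Crawlee's API; names are illustrative only.

interface CrawlRequest {
  url: string;
  uniqueKey: string; // key used for deduplication
}

// Assumed normalization: drop the #fragment. The URL parser already
// lowercases the scheme and host.
function toUniqueKey(rawUrl: string): string {
  const u = new URL(rawUrl);
  u.hash = "";
  return u.toString();
}

class RequestQueue {
  private pending: CrawlRequest[] = [];
  private seen = new Set<string>();

  // Returns true if the URL was new and enqueued, false if it was a duplicate.
  add(url: string): boolean {
    const uniqueKey = toUniqueKey(url);
    if (this.seen.has(uniqueKey)) return false;
    this.seen.add(uniqueKey);
    this.pending.push({ url, uniqueKey });
    return true;
  }

  // Take the oldest pending request (FIFO gives breadth-first order).
  next(): CrawlRequest | undefined {
    return this.pending.shift();
  }
}

// Toy link graph standing in for real HTTP fetches and link extraction.
const fakeWeb: Record<string, string[]> = {
  "https://example.com/": ["https://example.com/a", "https://example.com/b#top"],
  "https://example.com/a": ["https://example.com/", "https://example.com/b"],
  "https://example.com/b": ["https://example.com/a"],
};

// Process each unique page once and enqueue every link it "contains".
function crawl(startUrl: string, maxPages = 100): string[] {
  const queue = new RequestQueue();
  queue.add(startUrl);
  const visited: string[] = [];
  while (visited.length < maxPages) {
    const req = queue.next();
    if (req === undefined) break; // queue drained
    visited.push(req.uniqueKey);
    for (const link of fakeWeb[req.uniqueKey] ?? []) queue.add(link);
  }
  return visited;
}

console.log(crawl("https://example.com/"));
```

Even though the toy pages link back to each other and `/b#top` differs from `/b` only by its fragment, each page is visited exactly once. A production crawler would also store the queue durably so an interrupted run can resume, which this sketch deliberately omits.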
Expect a strong TypeScript presence among maintained projects.
Popular SaaS Platforms to Replace
Understand the commercial incumbents teams migrate from and how many open-source alternatives exist for each product.
- Apify: Web automation & scraping platform powered by serverless Actors
- Browserbase: Cloud platform for running and scaling headless web browsers, enabling reliable browser automation and scraping
- Browserless: Headless browser platform & APIs for Puppeteer/Playwright with autoscaling
- Crawlbase: Web scraping & crawling platform with smart proxy and anti-bot bypass
- ScrapingBee: Web scraping API that handles headless browsers and rotating proxies
- Zyte: Data extraction platform with Zyte API, Smart Proxy Manager, and Scrapy Cloud
Apify lets you build and run ‘Actors’ to scrape websites, automate workflows, and send results to APIs and databases, running them locally or scaling them in the cloud.
Frequently replaced when teams want private deployments and a lower total cost of ownership (TCO).
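Several of the platforms listed above (ScrapingBee, Crawlbase, Zyte) advertise rotating or managed proxies, and a self-hosted replacement has to provide something similar. A minimal sketch of the core idea, assuming round-robin rotation and treating an HTTP 403 as "blocked": each request takes the next healthy proxy, failures are counted, and a proxy is skipped once it reaches a failure limit. This is a generic illustration, not how any of these services work internally. The `ProxyRotator` class, the `fetchWithRotation` helper, the simulated transport, and the placeholder proxy URLs are all hypothetical.

```typescript
// Hypothetical round-robin proxy rotator with simple failure tracking.
// Proxy URLs are placeholders; fakeSend simulates the network.

class ProxyRotator {
  private readonly proxies: string[];
  private readonly maxFailures: number;
  private index = 0;
  private failures = new Map<string, number>();

  constructor(proxies: string[], maxFailures = 3) {
    if (proxies.length === 0) throw new Error("at least one proxy is required");
    this.proxies = proxies;
    this.maxFailures = maxFailures;
  }

  // Next proxy under the failure limit, in round-robin order;
  // undefined if every proxy has been retired.
  next(): string | undefined {
    for (let tried = 0; tried < this.proxies.length; tried++) {
      const proxy = this.proxies[this.index];
      this.index = (this.index + 1) % this.proxies.length;
      if ((this.failures.get(proxy) ?? 0) < this.maxFailures) return proxy;
    }
    return undefined;
  }

  reportFailure(proxy: string): void {
    this.failures.set(proxy, (this.failures.get(proxy) ?? 0) + 1);
  }

  reportSuccess(proxy: string): void {
    this.failures.set(proxy, 0); // a success resets the proxy's failure count
  }
}

// Simulated transport (assumption): proxy-b is always blocked with HTTP 403.
function fakeSend(proxy: string): number {
  return proxy.includes("proxy-b") ? 403 : 200;
}

// Retry through different proxies until one succeeds; returns the proxy used.
function fetchWithRotation(rotator: ProxyRotator, maxAttempts = 5): string | undefined {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const proxy = rotator.next();
    if (proxy === undefined) return undefined; // all proxies retired
    if (fakeSend(proxy) === 200) {
      rotator.reportSuccess(proxy);
      return proxy;
    }
    rotator.reportFailure(proxy);
  }
  return undefined;
}

const rotator = new ProxyRotator(
  ["http://proxy-a.example:8000", "http://proxy-b.example:8000", "http://proxy-c.example:8000"],
  1,
);
for (let i = 0; i < 4; i++) console.log(fetchWithRotation(rotator));
```

With a failure limit of 1, the blocked proxy is retired after its first 403, so later requests alternate between the two working proxies. Real deployments usually add cooldowns instead of permanent retirement, plus per-domain session handling.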
Explore related categories
Browse neighbouring categories in Data Engineering to widen your evaluation.
- Data Catalogs & Governance: Metadata catalogs with governance, discovery, and lineage across data assets.
- ETL & Data Integration: Extract-transform-load (ETL) and data integration platforms for moving and transforming data.
- Stream Processing Engines: Frameworks for real-time processing of streaming data and events.
- Workflow Orchestration Tools: Workflow managers for scheduling and orchestrating data pipelines.