Best Open-Source Web Scraping & Crawling Tools

Explore curated open-source tools in the Web Scraping & Crawling category. Compare technologies, see alternatives, and find the right solution for your workflow.

10+ projects

Firecrawl

Turn any website into clean, LLM‑ready data instantly

Stars: 129,494
License: AGPL-3.0
Last commit: 3 hours ago
TypeScript · Active

Crawlee

Build fast, human-like web scrapers with a single library

Stars: 23,687
License: Apache-2.0
Last commit: 7 hours ago
TypeScript · Active

Scrapy

Fast, high-level Python framework for web crawling and scraping

Stars: 62,123
License: BSD-3-Clause
Last commit: 1 day ago
Python · Active

ScrapeGraphAI

LLM‑powered web scraping pipelines in just five lines of code

Stars: 26,771
License: MIT
Last commit: 1 day ago
Python · Active

Maxun

Train a web‑scraping robot in minutes, no code required

Stars: 15,758
License: AGPL-3.0
Last commit: 1 day ago
TypeScript · Active

Scrapling

Adaptive web scraping that survives site changes effortlessly

Stars: 61,565
License: BSD-3-Clause
Last commit: 2 days ago
Python · Active

Katana

Fast, configurable web crawler with headless and JavaScript support

Stars: 16,969
License: MIT
Last commit: 2 days ago
Go · Active

Crawl4AI

Turn the web into clean, LLM-ready Markdown instantly

Stars: 67,916
License: Apache-2.0
Last commit: 2 days ago
Python · Active

ChangeDetection.io

Real-time website change monitoring with instant multi-channel alerts

Stars: 31,901
License: Apache-2.0
Last commit: 3 days ago
Python · Active

LLM Scraper

Extract structured data from any webpage using LLMs

Stars: 6,777
License: MIT
Last commit: 4 days ago
TypeScript · Active

Apache Nutch

Scalable, extensible Java web crawler for large‑scale data collection

Stars: 3,158
License: Apache-2.0
Last commit: 6 days ago
Java · Active

Colly

Fast, elegant web scraping framework for Go developers

Stars: 25,313
License: Apache-2.0
Last commit: 12 days ago
Go · Active

AnyCrawl

High-performance web, site, and SERP crawler with AI extraction

Stars: 3,182
License: MIT
Last commit: 21 days ago
MDX · Active

WebMagic

Scalable Java crawler framework with flexible API and annotations

Stars: 11,679
License: Apache-2.0
Last commit: 5 months ago
Java · Stable

AutoScraper

Automatic, fast, lightweight web scraper that learns from examples

Stars: 7,178
License: MIT
Last commit: 1 year ago
Python · Stable