
Apify
Web automation & scraping platform powered by serverless Actors

Build fast, human-like web scrapers with a single library
Crawlee provides a unified API for HTTP and headless-browser crawling, automatic proxy rotation, persistent queues, and flexible storage, enabling reliable, scalable scrapers in Node.js.

Crawlee is a TypeScript‑first library that lets developers build reliable web scrapers and browser‑automation pipelines in Node.js. It targets data engineers, product teams, and researchers who need to collect structured data from the open web, APIs, or rendered pages.
The library provides a single interface for both raw HTTP requests and headless‑browser crawling (Playwright or Puppeteer) with human‑like fingerprinting, automatic TLS and header generation, and built‑in proxy rotation. Persistent queues support breadth‑first or depth‑first strategies, while pluggable storage adapters let you save tabular results or files locally or to cloud buckets. Hooks and configurable retries give fine‑grained control over request lifecycles, and the CLI can bootstrap projects with ready‑to‑run examples and Dockerfiles for container deployment.
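As a concrete illustration, here is a minimal sketch of a CheerioCrawler that combines proxy rotation, retries, link enqueueing, and dataset storage. The proxy URLs and the start URL are placeholders you would replace with your own.

```typescript
import { CheerioCrawler, ProxyConfiguration, Dataset } from 'crawlee';

// Rotate requests across a pool of proxies (placeholder URLs).
const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: [
        'http://proxy-1.example.com:8000',
        'http://proxy-2.example.com:8000',
    ],
});

const crawler = new CheerioCrawler({
    proxyConfiguration,
    maxRequestRetries: 3, // retry failed requests before giving up
    async requestHandler({ request, $, enqueueLinks }) {
        // Extract structured data from the parsed HTML and persist it to the default dataset.
        await Dataset.pushData({
            url: request.loadedUrl,
            title: $('title').text(),
        });
        // Discover links on the page and add them to the persistent request queue.
        await enqueueLinks();
    },
});

await crawler.run(['https://crawlee.dev']);
```

Swapping CheerioCrawler for PlaywrightCrawler or PuppeteerCrawler keeps the same handler structure while switching from plain HTTP requests to a headless browser.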
Crawlee runs anywhere Node.js 16+ is available – locally, in CI pipelines, or on the Apify platform. Docker images are supplied for easy scaling, and the library integrates smoothly with existing TypeScript or JavaScript codebases.
E‑commerce price monitoring
Continuously extract product listings, prices, and availability, storing results in a dataset for price‑trend analysis.
Content archiving for research
Download HTML, PDFs, and images from scholarly sites, preserving original files in cloud storage.
LLM training data collection
Scrape large corpora of web text, JSON APIs, and screenshots to feed retrieval‑augmented generation pipelines.
Automated UI testing
Use PlaywrightCrawler to render pages, capture screenshots, and verify element presence across browsers.
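For the automated UI testing use case above, a PlaywrightCrawler handler can render each page, check that an element is present, and store a screenshot. The sketch below uses https://crawlee.dev purely as an example start URL and checks for an h1 element as an illustrative assertion.

```typescript
import { PlaywrightCrawler, KeyValueStore } from 'crawlee';

const crawler = new PlaywrightCrawler({
    headless: true,
    async requestHandler({ page, request, log }) {
        // Verify that a key element rendered on the page.
        const heading = page.locator('h1');
        const visible = await heading.isVisible();
        log.info(`h1 visible on ${request.url}: ${visible}`);

        // Capture a full-page screenshot and save it to the key-value store.
        const screenshot = await page.screenshot({ fullPage: true });
        const store = await KeyValueStore.open();
        await store.setValue(`screenshot-${request.id}`, screenshot, {
            contentType: 'image/png',
        });
    },
});

await crawler.run(['https://crawlee.dev']);
```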
Crawlee requires Node.js 16 or higher.
For browser-based crawling it uses Playwright or Puppeteer, which must be installed separately; HTTP-only crawling needs neither.
Yes, Dockerfiles are provided for containerized deployment.
Crawlee integrates proxy rotation and session management, configurable via its API (see the sketch below).
A separate Python implementation is available under the same project name.
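To illustrate the proxy rotation and session management mentioned above, the following sketch enables Crawlee's session pool and retires a session when a page looks blocked. The proxy URL and the "access denied" title check are illustrative assumptions, not part of the library.

```typescript
import { PlaywrightCrawler, ProxyConfiguration } from 'crawlee';

const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: ['http://proxy-1.example.com:8000'], // placeholder proxy URL
});

const crawler = new PlaywrightCrawler({
    proxyConfiguration,
    useSessionPool: true,           // rotate sessions across requests
    persistCookiesPerSession: true, // keep cookies tied to the session that created them
    sessionPoolOptions: {
        maxPoolSize: 20,            // cap the number of live sessions
    },
    async requestHandler({ page, session }) {
        // If the page shows signs of blocking, retire the session so its
        // proxy/cookie combination is not reused for further requests.
        const title = await page.title();
        if (title.toLowerCase().includes('access denied')) {
            session?.retire();
        }
    },
});

await crawler.run(['https://crawlee.dev']);
```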
Project at a glance
Active · Last synced 4 days ago