Crawlee

Build fast, human-like web scrapers with a single library

Crawlee provides a unified API for HTTP and headless-browser crawling, automatic proxy rotation, persistent queues, and flexible storage, enabling reliable, scalable scrapers in Node.js.

Overview

Crawlee is a TypeScript‑first library that lets developers build reliable web scrapers and browser‑automation pipelines in Node.js. It targets data engineers, product teams, and researchers who need to collect structured data from the open web, APIs, or rendered pages.

Core capabilities

The library provides a single interface for both raw HTTP requests and headless‑browser crawling (Playwright or Puppeteer), with human‑like fingerprinting, automatic generation of browser‑like headers and TLS fingerprints, and built‑in proxy rotation. Persistent queues support breadth‑first or depth‑first crawling, and pluggable storage adapters let you save tabular results or files locally or to cloud buckets. Hooks and configurable retries give fine‑grained control over the request lifecycle, and the CLI bootstraps projects with ready‑to‑run examples and Dockerfiles for container deployment.
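
To make the difference between breadth‑first and depth‑first queueing concrete, here is a minimal conceptual sketch in plain TypeScript. It does not use Crawlee's own queue API; the `crawlOrder` function, the `LinkGraph` type, and the `site` data are hypothetical names invented for illustration, and the in-memory graph stands in for real pages.

```typescript
// Conceptual sketch only: this is not Crawlee's request queue implementation.
type Strategy = 'breadth-first' | 'depth-first';

// Hypothetical in-memory link graph: each URL maps to the links found on it.
type LinkGraph = Record<string, string[]>;

function crawlOrder(graph: LinkGraph, start: string, strategy: Strategy): string[] {
    const frontier: string[] = [start];
    const seen = new Set<string>([start]);
    const visited: string[] = [];

    while (frontier.length > 0) {
        // Breadth-first takes the oldest pending URL; depth-first takes the newest.
        const url = strategy === 'breadth-first' ? frontier.shift()! : frontier.pop()!;
        visited.push(url);

        for (const link of graph[url] ?? []) {
            // Deduplicate so each URL is enqueued at most once.
            if (!seen.has(link)) {
                seen.add(link);
                frontier.push(link);
            }
        }
    }
    return visited;
}

const site: LinkGraph = {
    '/': ['/a', '/b'],
    '/a': ['/a/1', '/a/2'],
    '/b': ['/b/1'],
};

console.log(crawlOrder(site, '/', 'breadth-first'));
// [ '/', '/a', '/b', '/a/1', '/a/2', '/b/1' ]
console.log(crawlOrder(site, '/', 'depth-first'));
// [ '/', '/b', '/b/1', '/a', '/a/2', '/a/1' ]
```

Breadth‑first order tends to suit broad discovery, such as collecting every category page before any product page, while depth‑first order follows one branch to the end before starting the next.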

Deployment

Crawlee runs anywhere Node.js 16+ is available – locally, in CI pipelines, or on the Apify platform. Docker images are supplied for easy scaling, and the library integrates smoothly with existing TypeScript or JavaScript codebases.

Highlights

Single interface for HTTP and headless‑browser crawling
Persistent URL queue with breadth‑first and depth‑first options
Pluggable storage for datasets and file assets
Integrated proxy rotation and session management

Pros

  • Human‑like fingerprinting works out‑of‑the‑box
  • Supports Playwright and Puppeteer via a unified API
  • Scales automatically with available system resources
  • TypeScript‑first with strong typings

Considerations

  • Requires Node.js 16+; adding Playwright increases install size
  • Hook‑based lifecycle can add a learning curve
  • Primarily a JavaScript/TypeScript ecosystem (the Python version lives in a separate repository)
  • High‑volume proxy rotation may need external services

Managed products teams compare it with

When teams consider Crawlee, these hosted platforms usually appear on the same shortlist.

Apify

Web automation & scraping platform powered by serverless Actors

Browserbase

Cloud platform for running and scaling headless web browsers, enabling reliable browser automation and scraping at scale

Browserless

Headless browser platform & APIs for Puppeteer/Playwright with autoscaling

Looking for a hosted option? These are the services engineering teams benchmark against before choosing open source.

Fit guide

Great for

  • Developers building reliable web scrapers in Node.js
  • Teams collecting data for AI or LLM training pipelines
  • Projects needing both HTTP and rendered‑page crawling
  • Users who want CLI bootstrap and Docker deployment

Not ideal when

  • Environments limited to Python without Node.js
  • One‑off scripts where a heavyweight browser is unnecessary
  • Legacy systems that cannot upgrade to Node.js 16+
  • Use cases demanding ultra‑low latency without browser overhead

How teams use it

E‑commerce price monitoring

Continuously extract product listings, prices, and availability, storing results in a dataset for price‑trend analysis.

Content archiving for research

Download HTML, PDFs, and images from scholarly sites, preserving original files in cloud storage.

LLM training data collection

Scrape large corpora of web text, JSON APIs, and screenshots to feed retrieval‑augmented generation pipelines.

Automated UI testing

Use PlaywrightCrawler to render pages, capture screenshots, and verify element presence across browsers.

Tech snapshot

TypeScript 61%
MDX 29%
JavaScript 7%
CSS 1%
Dockerfile 1%
Python 1%

Tags

automation, headless, puppeteer, web-crawler, playwright, headless-chrome, nodejs, npm, web-crawling, scraper, apify, crawling, scraping, web-scraping, typescript, javascript, crawler

Frequently asked questions

What Node.js version is required?

Crawlee requires Node.js 16 or higher.

Does Crawlee include a browser engine?

No. Browser crawling relies on Playwright or Puppeteer, which must be installed separately.

Can I run Crawlee in Docker?

Yes, Dockerfiles are provided for containerized deployment.

How does proxy rotation work?

Crawlee integrates proxy rotation and session management, configurable via its API.
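
As a rough illustration of how rotation and sessions can interact, the sketch below cycles through a proxy list and pins a proxy to a session once one has been assigned. It is a conceptual model in plain TypeScript, not Crawlee's implementation or API; the `RoundRobinProxies` class, its `pick` method, and the example.com proxy URLs are all hypothetical.

```typescript
// Conceptual sketch only: not Crawlee's proxy or session classes.
class RoundRobinProxies {
    private readonly proxyUrls: string[];
    private readonly bySession = new Map<string, string>();
    private next = 0;

    constructor(proxyUrls: string[]) {
        if (proxyUrls.length === 0) {
            throw new Error('At least one proxy URL is required');
        }
        this.proxyUrls = proxyUrls;
    }

    // Without a session ID, cycle through the list in order.
    // With a session ID, keep returning the proxy first assigned to that session.
    pick(sessionId?: string): string {
        if (sessionId !== undefined) {
            const pinned = this.bySession.get(sessionId);
            if (pinned !== undefined) return pinned;
        }
        const url = this.proxyUrls[this.next % this.proxyUrls.length];
        this.next += 1;
        if (sessionId !== undefined) this.bySession.set(sessionId, url);
        return url;
    }
}

// Hypothetical proxy endpoints used purely for illustration.
const proxies = new RoundRobinProxies([
    'http://proxy-1.example.com:8000',
    'http://proxy-2.example.com:8000',
]);

console.log(proxies.pick());            // http://proxy-1.example.com:8000
console.log(proxies.pick('session-a')); // http://proxy-2.example.com:8000
console.log(proxies.pick());            // http://proxy-1.example.com:8000
console.log(proxies.pick('session-a')); // http://proxy-2.example.com:8000
```

Pinning a proxy to a session keeps cookies and IP address consistent across related requests, which makes traffic look more like a single returning visitor.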

Is there a Python version?

Yes. A separate Python implementation is available under the same project name.

Project at a glance

Active
Stars: 21,214
Watchers: 21,214
Forks: 1,158
License: Apache-2.0
Repo age: 9 years old
Last commit: 3 hours ago
Primary language: TypeScript

Last synced 3 hours ago