

LLM‑powered web scraping pipelines in just five lines of code
Prompt‑driven Python library that turns websites and documents into structured data using LLMs, with multi‑page, parallel, and low‑code integrations.

ScrapeGraphAI lets developers, data scientists, and low‑code users extract structured information from web pages or local documents with a single prompt. By combining large language models with graph‑based logic, it abstracts away HTML parsing and navigation, delivering clean JSON output in minutes.
The library ships with several ready‑made pipelines: SmartScraperGraph for single pages, SearchGraph for top‑N search results, SpeechGraph for audio summaries, and ScriptCreatorGraph for auto‑generated Python scripts. Multi‑graph variants run their LLM calls in parallel to increase throughput. It supports OpenAI, Groq, Azure, Gemini, and local Ollama models, and integrates with LangChain, LlamaIndex, and popular no‑code platforms like Zapier and n8n.
Install with `pip install scrapegraphai`, then run `playwright install` so Playwright can render pages. Choose the Python or Node SDK, configure your LLM credentials, and start scraping in five lines of code. Telemetry is optional and can be disabled with an environment variable.
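As a rough illustration of the setup described above, the sketch below builds a configuration and wraps a single-page scrape. It assumes the `SmartScraperGraph` class is importable from `scrapegraphai.graphs` and accepts `prompt`, `source`, and `config` arguments. The config keys, the model name `openai/gpt-4o-mini`, and the helper names `build_config` and `scrape` are illustrative assumptions, so check them against the version you install.

```python
def build_config(api_key: str, model: str = "openai/gpt-4o-mini") -> dict:
    """Build the config dict handed to a graph (key names are assumptions)."""
    return {
        "llm": {"api_key": api_key, "model": model},
        "verbose": False,
        "headless": True,  # run the Playwright browser without a window
    }


def scrape(prompt: str, source: str, config: dict):
    """Run a single-page scrape.

    Requires `pip install scrapegraphai` and `playwright install`.
    """
    # Imported lazily so build_config works even without the package installed.
    from scrapegraphai.graphs import SmartScraperGraph

    graph = SmartScraperGraph(prompt=prompt, source=source, config=config)
    return graph.run()
```

A call would then look like `scrape("List the founders", "https://example.com", build_config(api_key))`, where the URL and prompt are placeholders.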
Extract company profiles from competitor websites: structured JSON containing description, founders, and social media links.
Generate Python scripts that scrape product listings: a ready‑to‑run script saved to file for repeated execution.
Create audio summaries of news articles: an MP3 file with a spoken summary generated from a single page.
Automate multi‑page research across search results: a consolidated dataset compiled from the top N search results.
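For the company-profile use case, downstream code usually benefits from normalizing the returned JSON before using it. The sketch below uses only the standard library. The field names (`description`, `founders`, `social_media_links`) and the sample dictionary are made up for illustration and are not actual library output.

```python
from dataclasses import dataclass, field


def _as_list(value) -> list:
    """Coerce a missing, scalar, or sequence value into a list."""
    if value is None:
        return []
    return list(value) if isinstance(value, (list, tuple)) else [value]


@dataclass
class CompanyProfile:
    description: str = ""
    founders: list = field(default_factory=list)
    social_media_links: list = field(default_factory=list)

    @classmethod
    def from_result(cls, result: dict) -> "CompanyProfile":
        """Build a profile from a result dict, tolerating missing keys."""
        return cls(
            description=str(result.get("description") or ""),
            founders=_as_list(result.get("founders")),
            social_media_links=_as_list(result.get("social_media_links")),
        )


# Hypothetical result: an LLM may return a bare string where a list is expected.
sample = {"description": "Example Corp builds widgets.", "founders": "A. Founder"}
profile = CompanyProfile.from_result(sample)
print(profile)
```

Coercing scalars to lists keeps later code from branching on shape, which matters because LLM output is not guaranteed to follow the same structure on every run.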
No single provider is required. The library works with any supported LLM, including OpenAI, Groq, Azure, Gemini, or local Ollama models.
To run against a local model, install Ollama, pull a model (e.g., llama3.2), and configure the `llm` section with the local model name.
Page rendering relies on Playwright; after adding the package, run `playwright install` to download the browsers it needs.
Telemetry can be turned off. Set the environment variable `SCRAPEGRAPHAI_TELEMETRY_ENABLED=false` before running the library.
A Node SDK (scrapegraph-js) is available for integration in JavaScript projects.
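The local-model and telemetry settings mentioned above could be combined roughly as follows. The environment variable name comes from the text. The `ollama/` model prefix, the `format` key, and the helper name `build_ollama_config` are assumptions for illustration and may differ between library versions.

```python
import os

# Disable telemetry before the library runs (variable name from the text above).
os.environ["SCRAPEGRAPHAI_TELEMETRY_ENABLED"] = "false"


def build_ollama_config(model: str = "llama3.2") -> dict:
    """Return a graph config that points the `llm` section at a local Ollama model.

    Assumes the model was already pulled, e.g. with `ollama pull llama3.2`.
    """
    return {
        "llm": {
            "model": f"ollama/{model}",  # assumed provider/model naming scheme
            "format": "json",            # assumed key asking Ollama for JSON output
        },
        "verbose": False,
        "headless": True,
    }


config = build_ollama_config()
```

No API key appears in this config because the model is served locally.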
Project at a glance
Active. Last synced 4 days ago.