

Extract structured data from any webpage using LLMs
LLM Scraper is a TypeScript library that turns any webpage into structured data using LLM function calling, offering full type safety, Playwright integration, and support for major AI model providers.
LLM Scraper targets developers who need reliable, typed extraction of information from dynamic web pages. By leveraging LLM function calling, it converts arbitrary page content into JSON that matches a developer‑defined schema.
The library works with Playwright to load pages in several modes—HTML, raw HTML, Markdown, plain text, or even screenshots for multimodal models. Schemas can be expressed with Zod or JSON Schema, giving you compile‑time type safety. It supports a wide range of model families (OpenAI, Anthropic, Google, Groq, Ollama, etc.) and offers streaming responses and code‑generation to produce reusable Playwright scripts.
Install the package and your chosen AI SDK, launch a Playwright browser, create an LLM instance, define a Zod schema, and call `scraper.run(page, schema)`. The optional `stream` and `generate` methods let you receive partial results or auto-create scraper code. The MIT-licensed library works in any Node.js/TypeScript project.
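The workflow above can be sketched roughly as follows. To keep the sketch runnable without a browser or an API key, `PageLike`, `ScraperLike`, `Story`, `isStory`, and `scrapeTopStories` are hypothetical names introduced here, not part of LLM Scraper. Only the call shape `scraper.run(page, schema)` comes from the description above; the `{ data }` result wrapper and the example JSON Schema are assumptions.

```typescript
// Hypothetical structural types: just enough surface to type-check this sketch
// without installing Playwright or LLM Scraper.
interface PageLike {
  goto(url: string): Promise<unknown>;
}

interface ScraperLike {
  // Mirrors the call shape described above: run(page, schema).
  // The `{ data }` result wrapper is an assumption of this sketch.
  run(page: PageLike, schema: object): Promise<{ data: unknown }>;
}

// Shape we expect back; with Zod this type would be inferred from the schema.
interface Story {
  title: string;
  points: number;
  by: string;
  commentsURL: string;
}

// JSON Schema describing the extraction target (top stories on a news page).
const topStoriesSchema = {
  type: "object",
  properties: {
    top: {
      type: "array",
      items: {
        type: "object",
        properties: {
          title: { type: "string" },
          points: { type: "number" },
          by: { type: "string" },
          commentsURL: { type: "string" },
        },
        required: ["title", "points", "by", "commentsURL"],
      },
    },
  },
  required: ["top"],
} as const;

// Runtime guard: model output is untrusted, so check it before using it.
function isStory(value: unknown): value is Story {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.title === "string" &&
    typeof v.points === "number" &&
    typeof v.by === "string" &&
    typeof v.commentsURL === "string"
  );
}

// Navigate, extract, and validate. Throws if the result does not match.
async function scrapeTopStories(
  scraper: ScraperLike,
  page: PageLike,
  url: string,
): Promise<Story[]> {
  await page.goto(url);
  const { data } = await scraper.run(page, topStoriesSchema);
  const top = (data as { top?: unknown } | null)?.top;
  if (!Array.isArray(top) || !top.every(isStory)) {
    throw new Error("Scraper result does not match the expected schema");
  }
  return top;
}
```

In a real project, `page` would be a Playwright page (for example from `chromium.launch()` followed by `browser.newPage()`), and `scraper` would be the library's scraper instance created with your chosen model. Passing both in as parameters also makes the extraction step easy to test with stubs.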
News aggregation
Extract top stories, scores, authors, and comment links from news sites into a structured JSON feed.
E‑commerce price monitoring
Collect product name, price, availability, and SKU from retailer pages for price‑tracking dashboards.
Research data collection
Gather article titles, authors, abstracts, and publication dates from academic journal websites.
CMS content generation
Convert marketing page sections into structured components (headings, copy, images) for automated CMS population.
LLM Scraper works with OpenAI, Anthropic, Google, Groq, Ollama, and any provider compatible with the Vercel AI SDK.
LLM Scraper relies on Playwright to load and interact with pages, so a headless browser instance is required.
Schemas can be written using Zod objects or JSON Schema files, and the library parses results against them.
Streaming is supported: the `stream` method returns a partial object stream (available with the Vercel AI SDK).
LLM Scraper is released under the MIT license, which permits commercial use, modification, and distribution.
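To illustrate what consuming a partial object stream looks like, here is a minimal sketch. It does not call LLM Scraper: `Headline`, `PartialHeadlines`, `fakePartialStream`, and `collectLatest` are hypothetical names, and the generator only simulates snapshots that grow as a model produces more fields. It assumes the real stream can be read with `for await`, as async iterables can.

```typescript
// Hypothetical result type for this sketch.
interface Headline {
  title: string;
  points: number;
}

// Each snapshot may be missing fields that have not been produced yet.
type PartialHeadlines = { top?: Partial<Headline>[] };

// Stand-in for a partial object stream: yields progressively fuller snapshots.
async function* fakePartialStream(): AsyncGenerator<PartialHeadlines> {
  yield {};
  yield { top: [{ title: "First story" }] };
  yield { top: [{ title: "First story", points: 120 }] };
}

// Consume any async iterable, keeping the latest snapshot and counting updates.
async function collectLatest<T>(
  stream: AsyncIterable<T>,
): Promise<{ latest: T | undefined; updates: number }> {
  let latest: T | undefined;
  let updates = 0;
  for await (const snapshot of stream) {
    latest = snapshot; // a UI could re-render here with whatever has arrived
    updates += 1;
  }
  return { latest, updates };
}
```

Because every field in a snapshot is optional, code that reads intermediate results should use optional chaining (for example `latest?.top?.[0]?.points`) and treat only the final snapshot as complete.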