

Automatic, fast, lightweight web scraper that learns from examples
AutoScraper lets you build a scraper by providing a URL and a few example values. It automatically learns extraction rules, enabling fast, repeatable data collection without writing XPath or CSS selectors.
AutoScraper is a Python library that builds a scraper from a URL (or raw HTML) and a short list of values you expect to find on the page. It infers the underlying HTML patterns and produces a model that can later retrieve the same type of data from other pages with the same structure.
The API provides two retrieval modes: get_result_similar returns elements that match the pattern (useful for lists such as article titles), while get_result_exact returns values in the exact order you supplied (ideal for single‑field data like a stock price). Trained models can be saved to disk and re‑loaded, allowing you to reuse extraction logic across projects. Custom request arguments let you add proxies, headers, or other options without modifying the core code.
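A minimal sketch of that workflow; the URL and example value below are placeholders for a real page and data that actually appears on it.

```python
from autoscraper import AutoScraper

url = 'https://example.com/articles'            # placeholder URL
wanted_list = ['An article title on the page']  # illustrative example value

scraper = AutoScraper()
scraper.build(url, wanted_list)  # learns extraction rules from the example values

# Pattern-based retrieval: every element matching the learned rules (e.g. all titles).
titles = scraper.get_result_similar('https://example.com/articles?page=2')

# Exact-order retrieval: values returned in the same order as wanted_list.
fields = scraper.get_result_exact('https://example.com/articles?page=2')
```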
AutoScraper targets developers, data scientists, and small teams that need quick, reliable scraping without the overhead of writing XPath or CSS selectors, and it works in any Python 3 environment.
When teams consider AutoScraper, hosted platforms such as Apify (a web automation and scraping platform powered by serverless Actors) usually appear on the same shortlist as the services engineering teams benchmark against before choosing open source.
Gather related StackOverflow question titles
Generate a list of similar question titles from any StackOverflow page with a single function call.
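A sketch of this use case; the question URL and sample title are illustrative, and the title must appear on the page when the model is built.

```python
from autoscraper import AutoScraper

url = 'https://stackoverflow.com/questions/2081586/web-scraping-with-python'
# One related-question title visible on that page (illustrative value).
wanted_list = ['What are metaclasses in Python?']

scraper = AutoScraper()
scraper.build(url, wanted_list)

# Returns related question titles from any other StackOverflow question page.
related = scraper.get_result_similar(
    'https://stackoverflow.com/questions/606191/convert-bytes-to-a-string')
print(related)
```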
Fetch live stock price and market cap
Retrieve current price, market capitalization, or other ticker data from Yahoo Finance by providing a sample value.
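A sketch using exact retrieval; the sample price is illustrative and must match what the quote page shows at build time.

```python
from autoscraper import AutoScraper

url = 'https://finance.yahoo.com/quote/AAPL/'
wanted_list = ['124.81']  # the AAPL price displayed at build time (illustrative)

scraper = AutoScraper()
scraper.build(url, wanted_list)

# Exact-order retrieval suits single-field data such as another ticker's price.
print(scraper.get_result_exact('https://finance.yahoo.com/quote/MSFT/'))
```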
Extract GitHub repository metadata
Collect repository description, star count, and issues link for any GitHub repo without writing custom parsers.
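A sketch for repository metadata; the description, star count, and issues link below are illustrative and must match the page at build time.

```python
from autoscraper import AutoScraper

url = 'https://github.com/alirezamika/autoscraper'
wanted_list = [
    'A Smart, Automatic, Fast and Lightweight Web Scraper for Python',  # description (illustrative)
    '6.2k',                                                             # star count (illustrative)
    'https://github.com/alirezamika/autoscraper/issues',                # issues link
]

scraper = AutoScraper()
scraper.build(url, wanted_list)

# The same model then applies to other repositories with the same page structure.
print(scraper.get_result_exact('https://github.com/psf/requests'))
```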
Create a lightweight web API
Wrap AutoScraper in Flask to expose an endpoint that returns scraped data on demand, enabling rapid API development.
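A minimal sketch of such an endpoint, assuming a model was previously trained and saved as 'so-titles' (a hypothetical file name); error handling is omitted.

```python
from flask import Flask, jsonify, request
from autoscraper import AutoScraper

app = Flask(__name__)

scraper = AutoScraper()
scraper.load('so-titles')  # hypothetical model file created earlier with scraper.save()

@app.route('/scrape')
def scrape():
    # Example: GET /scrape?url=https://stackoverflow.com/questions/606191/convert-bytes-to-a-string
    target = request.args.get('url')
    return jsonify(scraper.get_result_similar(target))

if __name__ == '__main__':
    app.run(port=8080)
```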
How does AutoScraper learn what to extract?
AutoScraper parses the HTML of the supplied page, locates the elements containing the example values, and extracts the surrounding tag patterns and attributes to build a reusable model.
Does AutoScraper handle JavaScript-rendered content?
AutoScraper works on the static HTML returned by the request. For JavaScript‑rendered content you need to fetch the rendered HTML yourself (e.g., with Selenium) before passing it to the library.
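A sketch of that workflow, assuming Selenium with headless Chrome; the URL and expected value are placeholders, and the rendered markup is passed to build as raw HTML rather than a URL.

```python
from selenium import webdriver
from autoscraper import AutoScraper

# Render the page in a real browser so JavaScript-generated content is present.
options = webdriver.ChromeOptions()
options.add_argument('--headless')
driver = webdriver.Chrome(options=options)
driver.get('https://example.com/js-heavy-page')  # placeholder URL
rendered_html = driver.page_source
driver.quit()

# Pass the rendered markup to AutoScraper instead of a URL.
scraper = AutoScraper()
scraper.build(html=rendered_html, wanted_list=['Example value'])  # illustrative value
```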
How do I save a trained model and reuse it later?
Use the `save(filepath)` method to write the model to disk and `load(filepath)` to restore it in a new session.
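For example (the file name is arbitrary):

```python
from autoscraper import AutoScraper

scraper = AutoScraper()
# ... scraper.build(...) has already learned the extraction rules ...
scraper.save('my-model')  # write the model to disk

restored = AutoScraper()
restored.load('my-model')  # ready to call get_result_similar / get_result_exact
```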
Which Python versions are supported?
The library is compatible with Python 3.x (tested on 3.7 and newer).
Can I use proxies or custom headers?
Yes, you can pass a `request_args` dictionary to `build`, containing any arguments accepted by the `requests` library, such as `headers`, `proxies`, or `auth`.
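For instance, routing the build request through a proxy with a custom User-Agent (the proxy address, URL, and example value are placeholders):

```python
from autoscraper import AutoScraper

proxies = {
    'http': 'http://127.0.0.1:8001',   # placeholder proxy address
    'https': 'http://127.0.0.1:8001',
}
headers = {'User-Agent': 'Mozilla/5.0'}  # example custom header

scraper = AutoScraper()
scraper.build(
    'https://example.com/page',         # placeholder URL
    wanted_list=['Example value'],      # illustrative value
    request_args={'proxies': proxies, 'headers': headers},
)
```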
Project at a glance
Stable
Last synced 4 days ago