Scrapinghub is a company focused on retrieving information and processing it afterwards.
It is deeply involved in developing and contributing to open source projects for
web crawling and data processing.
Scrapy is a very popular web crawling and
scraping framework for Python (ranked 10th among GitHub's trending Python projects),
used to write spiders that crawl websites and extract data from them.
Check Scrapy ideas
Portia is a tool that allows you
to scrape websites visually, without any programming knowledge.
Users can annotate web pages to identify the data they wish to extract,
and Portia will understand based on these annotations how to scrape data
from similar pages.
Check Portia ideas
Splash is a lightweight web browser with an HTTP API.
It can be used to get detailed information and take screenshots of crawled websites
as they are seen in a browser.
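
As a rough illustration of what "a browser with an HTTP API" means in practice, the sketch below builds request URLs for Splash's `render.html` and `render.png` endpoints. It assumes a Splash instance is listening locally on port 8050; the helper name `splash_render_url` and the target page are made up for the example, and no request is actually sent.

```python
from urllib.parse import urlencode

# Assumption: a Splash instance is running locally on port 8050.
SPLASH_BASE = "http://localhost:8050"


def splash_render_url(endpoint, page_url, base=SPLASH_BASE, **params):
    """Build the URL for a Splash rendering endpoint (hypothetical helper).

    `endpoint` is e.g. "render.html" or "render.png"; extra keyword
    arguments are passed through as query parameters.
    """
    query = urlencode({"url": page_url, **params})
    return f"{base}/{endpoint}?{query}"


# Rendered HTML of the page, waiting half a second after load.
html_request = splash_render_url("render.html", "https://example.com", wait=0.5)

# A 320x240 PNG screenshot of the same page.
png_request = splash_render_url("render.png", "https://example.com", width=320, height=240)

print(html_request)
print(png_request)
```

With Splash running, fetching either URL with any HTTP client (for example `urllib.request.urlopen`) should return the rendered page or the screenshot bytes.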
Check Splash ideas
Frontera is a web crawling framework
consisting of a crawl frontier
and distribution/scaling primitives, which make it possible to build a large-scale
online web crawler.
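
For readers unfamiliar with the term, a crawl frontier is the component that decides which URLs to fetch next. The toy sketch below illustrates the idea with a deduplicated priority queue. It is not Frontera's API: the class `SimpleFrontier` and its methods are hypothetical names chosen for this example, and it ignores distribution and scaling entirely.

```python
import heapq
from itertools import count


class SimpleFrontier:
    """Toy crawl frontier: a deduplicated priority queue of URLs.

    Illustrative only; this does not mirror Frontera's actual interfaces.
    """

    def __init__(self):
        self._heap = []        # entries are (-priority, insertion order, url)
        self._seen = set()     # every URL ever scheduled
        self._order = count()  # tie-breaker: first scheduled, first served

    def add(self, url, priority=0):
        """Schedule a URL; return False if it was already scheduled."""
        if url in self._seen:
            return False
        self._seen.add(url)
        heapq.heappush(self._heap, (-priority, next(self._order), url))
        return True

    def next_batch(self, n):
        """Pop up to n URLs, highest priority first."""
        batch = []
        while self._heap and len(batch) < n:
            batch.append(heapq.heappop(self._heap)[2])
        return batch


frontier = SimpleFrontier()
frontier.add("https://example.com/", priority=1)
frontier.add("https://example.com/news", priority=5)
frontier.add("https://example.com/", priority=9)  # duplicate, ignored
print(frontier.next_batch(2))
```

A production frontier would also persist its state and partition URLs across workers, which is where the distribution/scaling primitives come in.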
Check Frontera ideas