“Web scraping (also called Web harvesting or Web data extraction) is a computer software technique of extracting information from websites. … Web scraping focuses more on the transformation of unstructured Web content, typically in HTML format, into structured data that can be stored and analyzed in a central local database or spreadsheet. Web scraping is also related to Web automation, which simulates human Web browsing using computer software. Uses of Web scraping include online price comparison, weather data monitoring, website change detection, Web research, Web content mashup and Web data integration.”
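To make "unstructured HTML into structured data" concrete, here is a minimal sketch that uses only Python's standard library (`html.parser` and `csv`). It assumes the data you want sits in an HTML `<table>`; the class `TableRowParser`, the function `html_table_to_csv` and the sample page are all hypothetical, made up for illustration. Real pages are messier, and dedicated parsing libraries are usually more convenient, but the principle is the same: walk the markup, keep the cell text, and write rows a spreadsheet can open.

```python
import csv
import io
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Hypothetical helper: collect the text of <td>/<th> cells, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being built, or None outside a <tr>
        self._cell = None   # text fragments of the current cell, or None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join fragments so nested tags like <b> don't split a cell.
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)


def html_table_to_csv(html: str) -> str:
    """Turn every table row in `html` into a CSV line."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out).writerows(parser.rows)
    return out.getvalue()


# Hypothetical sample page: a small price table, as in the price-comparison use case.
SAMPLE_HTML = """
<html><body>
  <h1>Prices</h1>
  <table>
    <tr><th>Store</th><th>Price</th></tr>
    <tr><td>Shop A</td><td>$19.99</td></tr>
    <tr><td>Shop <b>B</b></td><td>$17.50</td></tr>
  </table>
</body></html>
"""

if __name__ == "__main__":
    print(html_table_to_csv(SAMPLE_HTML))
```

The resulting CSV text can be saved to a file and opened in any spreadsheet program; loading the same rows into a local database instead would be a small variation on this sketch.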
Pair scraping with Versionista, which can create an RSS feed of changes to a website so you can keep tabs on what’s changing. ProPublica’s team used this to great effect in late 2009, especially Scott Klein and then-intern Brian Boyer, now at the Chicago Tribune.
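We don't know how Versionista works internally, so the following is only a generic sketch of the underlying idea, website change detection: keep a snapshot of a page, compare it with a later one, and report what differs. It uses Python's standard `hashlib` and `difflib` modules; the function names and the two sample snapshots are hypothetical. Turning the diff into an RSS feed, as Versionista does, is not shown.

```python
import difflib
import hashlib


def fingerprint(text: str) -> str:
    """Return a SHA-256 hex digest so identical snapshots can be skipped cheaply."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def describe_changes(old: str, new: str) -> list[str]:
    """Return unified-diff lines between two snapshots, or [] if they are identical."""
    if fingerprint(old) == fingerprint(new):
        return []
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile="previous",
            tofile="current",
            lineterm="",  # lines from splitlines() have no newline to add back
        )
    )


# Two hypothetical snapshots of the same page, e.g. fetched a day apart.
before = "Agency budget\nTotal: $1.2M\nContact: press office"
after = "Agency budget\nTotal: $1.5M\nContact: press office"

if __name__ == "__main__":
    for line in describe_changes(before, after):
        print(line)
```

Storing only the fingerprint is enough to answer "did anything change?"; keeping the full previous snapshot is what lets you say *what* changed, which is the part a reporter usually cares about.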
Non-programming scrapers can’t do everything, but they can get you started. Some say “Program or be programmed”; these tools are a compromise between the two.
Legal permissions still apply, so don’t use scraped information you don’t have the right to use.
Something to consider: how does this apply to what you do every day, and how could scraping contribute to your job?
“The businesses that win will be those that understand how to build value from data from wherever it comes. Information isn’t power. The right information is.” – media consultant Neil Perkin, writing in Marketing Week