
Crawling the web for fun and profit

Crawling technologies are the basis for search engines, but they also have many applications in business and for fun.

Published in: Technology

  1. Crawling the Web (for fun and profit) Federico Feroldi
  2. “A Web crawler is a computer program that browses the World Wide Web in a methodical, automated manner.” Wikipedia. Picture greetings to photoholic1 --LennyB
  3. Search engines only show you what their crawlers can catch. Picture greetings to jimbrickett
  4. The deep web contains a lot of valuable information: e-commerce, finance, transportation, yellow pages, medicine, government, opinions, real estate, personal, intranets, social. Picture greetings to tricky ™
  5. Dig deeper with your own crawler. Picture greetings to Super*Junk
  6. Information = Competitive Advantage. Picture greetings to mastrobiggo
  7. Backup historical data: web sites, blogs
  8. Social network analysis: find influencers and interests based on “social circles”
  9. Find what people like
  10. Sentiment analysis: find what people say about your brand or product
  11. Trending topics and products
  12. Competitor price tracking
  13. Real estate
  14. Personal data and online reputation
  15. Do It Yourself. Picture greetings to vic_206
  16. Anybody can build a search engine
  17. Scrapy architecture (diagram): Scrapy Engine, Scheduler, Downloader, Spider, Item pipeline, Internet; flows labeled Requests, Responses, Items, Data
  18. Twitter social graph crawler with Scrapy in 150 LOC
  19. The Web is much bigger than what you can search with Google
  20. Thank you. federico@cloudify.me twitter.com/cloudify
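
The component roles named on the Scrapy architecture slide (scheduler, downloader, spider, item pipeline) can be illustrated with a minimal sketch in plain Python. This is not Scrapy's API and not the 150-line Twitter crawler from the talk; it is a standard-library approximation under stated assumptions. The names `LinkParser`, `crawl`, `FAKE_WEB`, and the `example.test` URLs are hypothetical, and the "web" is an in-memory dictionary so the example runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects the page title and every href found in <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl; `fetch(url)` returns HTML text or None."""
    scheduler = deque([start_url])  # pending requests (the "scheduler")
    seen = {start_url}              # never schedule the same URL twice
    items = []                      # stand-in for an item pipeline
    pages = 0
    while scheduler and pages < max_pages:
        url = scheduler.popleft()
        html = fetch(url)           # the "downloader" step
        if html is None:
            continue
        pages += 1
        parser = LinkParser()       # the "spider" step: parse the response
        parser.feed(html)
        items.append({"url": url, "title": parser.title.strip()})
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                scheduler.append(absolute)
    return items


# Hypothetical in-memory "web" (assumption: stands in for real HTTP fetches).
FAKE_WEB = {
    "http://example.test/": '<title>Home</title><a href="/a">A</a><a href="/b">B</a>',
    "http://example.test/a": '<title>Page A</title><a href="/">home</a>',
    "http://example.test/b": '<title>Page B</title><a href="/a">A</a>',
}

items = crawl("http://example.test/", FAKE_WEB.get)
for item in items:
    print(item["url"], item["title"])
```

To point the sketch at real pages, `fetch` could be replaced with a function built on `urllib.request.urlopen`. A real crawler would also need to respect `robots.txt`, rate limits, and error handling, all of which are omitted here.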
