Crawling the web for fun and profit

Crawling technology is the foundation of search engines, but it also has many other applications, both for business and for fun.

  1. Crawling the Web (for fun and profit), Federico Feroldi
  2. “A Web crawler is a computer program that browses the World Wide Web in a methodical, automated manner.” (Wikipedia) [photo: photoholic1 --LennyB]
  3. Search engines only show you what their crawlers can catch [photo: jimbrickett]
  4. The deep web contains a lot of valuable information: e-commerce, finance, transportation, yellow pages, medicine, government, opinions, real estate, personal, intranets, social [photo: tricky ™]
  5. Dig deeper with your own crawler [photo: Super*Junk]
  6. Information = Competitive Advantage [photo: mastrobiggo]
  7. Back up historical data: websites, blogs
  8. Social network analysis: find influencers and interests based on “social circles”
  9. Find what people like
  10. Sentiment analysis: find what people say about your brand or product
  11. Trending topics and products
  12. Competitor price tracking
  13. Real estate
  14. Personal data and online reputation
  15. Do It Yourself [photo: vic_206]
  16. Anybody can build a search engine
  17. Scrapy architecture [diagram: Scrapy Engine, Scheduler, Downloader, Spider and Item Pipeline, with Requests, Responses and Items flowing between them, from the Internet to your Data]
  18. Twitter social graph crawler with Scrapy in 150 LOC
  19. The Web is much bigger than what you can search with Google
  20. Thank you! federico@cloudify.me, twitter.com/cloudify
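The definition on slide 2 and the Scrapy architecture on slide 17 come down to one loop: a scheduler hands out URLs, a downloader fetches pages, a spider extracts data and new links, and a pipeline keeps the results. Below is a minimal sketch of that loop using only the Python standard library, not Scrapy itself. The names `crawl`, `LinkAndTitleParser` and the `fetch` callback are hypothetical, chosen so the example runs against an in-memory site instead of the network. A real crawler should also honor robots.txt (for example with `urllib.robotparser`) and throttle its requests.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional
from urllib.parse import urldefrag, urljoin, urlparse


class LinkAndTitleParser(HTMLParser):
    """Spider role (hypothetical helper): collect <a href> targets and the <title> text."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self.title += data


def crawl(start_url: str, fetch: Callable[[str], Optional[str]],
          max_pages: int = 100) -> Dict[str, str]:
    """Breadth-first crawl restricted to the start URL's host; returns {url: title}."""
    site = urlparse(start_url).netloc
    frontier = deque([start_url])  # scheduler: URLs waiting to be fetched
    seen = {start_url}             # never schedule the same URL twice
    items: Dict[str, str] = {}     # "item pipeline": here, just a dict of results
    while frontier and len(items) < max_pages:
        url = frontier.popleft()
        html = fetch(url)          # downloader: hypothetical callback, None on failure
        if html is None:
            continue
        parser = LinkAndTitleParser()
        parser.feed(html)
        items[url] = parser.title.strip()
        for href in parser.links:
            # Resolve relative links and drop #fragments before deduplicating.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == site and absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return items


if __name__ == "__main__":
    # A tiny fake site; dict.get plays the role of the downloader.
    pages = {
        "http://example.com/": "<title>Home</title><a href='/a'>A</a><a href='http://other.org/'>x</a>",
        "http://example.com/a": "<title>A</title><a href='/'>home</a><a href='b#top'>B</a>",
        "http://example.com/b": "<title>B</title>",
    }
    print(crawl("http://example.com/", pages.get))
    # → {'http://example.com/': 'Home', 'http://example.com/a': 'A', 'http://example.com/b': 'B'}
```

Breadth-first order visits the pages closest to the start URL first, and the `seen` set stops the crawler from looping on cycles such as a link back to the home page. Swapping `pages.get` for a function built on `urllib.request.urlopen` would point the same loop at a live site.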

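Slide 18 mentions a Twitter social-graph crawler written with Scrapy in about 150 lines, but the deck does not include its code. As a rough illustration of the idea behind slides 8 and 18, the sketch below crawls a "who follows whom" graph breadth-first from a seed account, then ranks accounts by how many crawled users follow them (in-degree). In-degree is a deliberately simple stand-in for "influence", not necessarily what the author used. `crawl_social_graph`, `top_influencers` and the `get_following` callback are hypothetical; `get_following` stands in for whatever API or scraped page lists the accounts a user follows, and no real Twitter API call is shown or implied.

```python
from collections import Counter, deque
from typing import Callable, Dict, Iterable, List, Set, Tuple


def crawl_social_graph(seed: str, get_following: Callable[[str], Iterable[str]],
                       max_depth: int = 2) -> Dict[str, Set[str]]:
    """Breadth-first crawl of follow edges, up to max_depth hops from the seed."""
    graph: Dict[str, Set[str]] = {}
    frontier = deque([(seed, 0)])  # (account, distance from seed)
    seen = {seed}
    while frontier:
        user, depth = frontier.popleft()
        following = set(get_following(user))  # hypothetical data source
        graph[user] = following
        if depth < max_depth:  # expand the "social circle" one more hop
            for other in following:
                if other not in seen:
                    seen.add(other)
                    frontier.append((other, depth + 1))
    return graph


def top_influencers(graph: Dict[str, Set[str]], k: int = 3) -> List[Tuple[str, int]]:
    """Rank accounts by in-degree: how many crawled users follow them."""
    followers = Counter(f for following in graph.values() for f in following)
    return followers.most_common(k)


if __name__ == "__main__":
    follows = {
        "alice": ["bob", "carol"],
        "bob": ["carol", "dave"],
        "carol": ["dave"],
        "dave": ["carol"],
    }
    graph = crawl_social_graph("alice", lambda u: follows.get(u, []), max_depth=2)
    print(top_influencers(graph, k=2))  # → [('carol', 3), ('dave', 2)]
```

Limiting the crawl by depth keeps it inside one "social circle" around the seed; a real crawler against a rate-limited service would also need to cache results and pace its requests.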