Crawler, Robots

This is an overview of crawlers. It focuses on crawlers written in Ruby, but the concept is the same in any language.

Published in: Technology

  1. Crawlers in Ruby (Sapna Solutions, Akshay Gupta)
  2. What is a crawler? When you are hungry, you go to a restaurant that serves delicious food matching your interest. In technical terms:
     ● Restaurant: Site/Base_url
     ● Food: Data/Information
     ● Interest: Relevant knowledge
     Synonyms:
     ● Spider
     ● Robot
     ● Bot
  3. How to make a robot? You need:
     ● Website
     ● DOM (Document Object Model)
     ● Library (depending on the language)
  4. How to make one in Ruby? Libraries:
     ● Rubyfulsoup
     ● Hpricot
     ● WWW::Mechanize
     ● ScRUBYt
     ● Watir
  5. Hpricot
     ● Best suited to simple text extraction
     ● Clear API
     ● Faster and better than Rubyfulsoup
     ● Traversal methods such as parent, child, and sibling, as in JS, make life easier
  6. Is something missing? What do you think? Is it really that easy, and does it make scraping fast and efficient?
  7. Firebug :-) Firebug integrates with Firefox to put a wealth of web development tools at your fingertips while you browse. You can edit, debug, and monitor CSS, HTML, and JavaScript live in any web page.
     ● Firebug (http://www.getfirebug.com/)
     ● This makes life easier. Do learn to use it.
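  8. Enough... where is the code?
     ● Build the document: doc = Hpricot(open(url)) (needs require 'hpricot' and require 'open-uri'; url is the address of the page)
     ● To walk through the DOM: (doc/"#header")
     ● More: (doc/".love_class"), (doc/"a/ul/li[4]")
     ● Value of the first href attribute found: doc.search("[@href]").first[:href]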
  9. References
     ● http://www.rubyrailways.com/data-extraction-for-web-20-
     ● http://www.google.com
     ● http://wiki.github.com/why/hpricot
  10. Thanks :-) Questions?
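
The crawl loop behind slides 2 and 3 (start from a base URL, fetch a page, pull links out of its markup, follow the ones that stay on the site) can be sketched roughly as below. This is a minimal illustration, not the deck's own code: it uses only the Ruby standard library, the PAGES hash is a made-up in-memory site standing in for real HTTP responses, and the names extract_links and crawl are hypothetical. The regex link extraction is a shortcut taken for brevity; a parser such as Hpricot from slide 5 would be the more robust choice on real pages.

```ruby
require "set"
require "uri"

# Hypothetical in-memory "site" (URL => HTML). It stands in for real HTTP
# responses so the sketch runs without network access.
PAGES = {
  "http://example.test/" =>
    '<a href="/menu">Menu</a> <a href="http://other.test/">Elsewhere</a>',
  "http://example.test/menu" =>
    '<h1>Food</h1> <a href="/">Home</a> <a href="/menu/pizza">Pizza</a>',
  "http://example.test/menu/pizza" =>
    '<h1>Pizza</h1> <a href="/menu">Back</a>'
}.freeze

# Pull href values out of the markup and resolve them against the page URL.
# The regex is a deliberate simplification and only handles double-quoted hrefs.
def extract_links(html, page_url)
  html.scan(/href="([^"]+)"/).flatten.map { |href| URI.join(page_url, href).to_s }
end

# Breadth-first crawl that stays on the host of base_url.
# `fetch` is any callable that maps a URL to its HTML, or nil when missing.
def crawl(base_url, fetch)
  host = URI(base_url).host
  queue = [base_url]
  seen = Set.new(queue)
  visited = []

  until queue.empty?
    url = queue.shift
    html = fetch.call(url)
    next if html.nil?

    visited << url
    extract_links(html, url).each do |link|
      next unless URI(link).host == host # skip links that leave the site
      queue << link if seen.add?(link)   # Set#add? returns nil for duplicates
    end
  end

  visited
end

puts crawl("http://example.test/", ->(url) { PAGES[url] })
```

Run as is, the script prints the three on-site URLs in the order they were visited and never follows the link to other.test. Passing the fetcher in as a lambda keeps the loop independent of how pages are retrieved, so the in-memory hash could be swapped for a real HTTP call without touching crawl.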
