Crawler, Robots

This is an overview of crawlers, aimed at crawlers written in Ruby; the language can differ, but the concept is the same.

Published in: Technology

Transcript

  • 1. Crawlers in Ruby. Sapna Solutions, Akshay Gupta
  • 2. What is a Crawler? When you are hungry, you go to a restaurant that serves delicious food you are interested in. Technically: Restaurant: Site/Base_url; Food: Data/Information; Interest: Relevant knowledge. Synonyms: ● Spider ● Robot ● Bot
  • 3. How to make a Robot? ● Website ● DOM (Document Object Model) ● Library (depending on the language)
  • 4. How to make one in Ruby? Libraries: ● Rubyfulsoup ● Hpricot ● WWW::Mechanize ● scRUBYt! ● Watir
  • 5. Hpricot ● Best for simple text extraction ● Clear API ● Fast, and better than Rubyfulsoup ● Methods like parent, child, and sibling, as in JS, make life easier
  • 6. Is something missing? What do you think? Is it really that easy, and does it make scraping fast and efficient?
  • 7. Firebug :-) Firebug integrates with Firefox to put a wealth of web development tools at your fingertips while you browse. You can edit, debug, and monitor CSS, HTML, and JavaScript live in any web page. ● Firebug (http://www.getfirebug.com/) ● This makes life easier. Do learn to use it
  • 8. Enough... where is the code? ● Build the document (with hpricot and open-uri required): doc = Hpricot(open(url)) ● To walk through the DOM: (doc/"#header") ● More: (doc/".love_class"), (doc/"a/ul/li[4]") ● doc.search("[@href]").first[:href]
  • 9. References ● http://www.rubyrailways.com/data-extraction-for-web-20- ● http://www.google.com ● http://wiki.github.com/why/hpricot
  • 10. Thanks :-) Questions???
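
The DOM lookups on slide 8 can be tried without a network connection. The sketch below is not Hpricot: REXML, which ships with the standard Ruby distribution, stands in for it so that the example runs without extra gems, and XPath expressions play the role of the Hpricot selectors. The sample page, its ids and class names, and every variable name are assumptions made up for illustration. REXML is an XML parser, so it only accepts well-formed markup, which real pages often are not.

```ruby
require 'rexml/document'

# Hypothetical sample page standing in for a fetched site.
# It is deliberately well-formed, because REXML rejects sloppy HTML.
PAGE = <<~HTML
  <html>
    <body>
      <div id="header"><h1>Menu</h1></div>
      <ul>
        <li class="love_class"><a href="/pizza">Pizza</a></li>
        <li><a href="/pasta">Pasta</a></li>
        <li class="love_class"><a href="/salad">Salad</a></li>
      </ul>
    </body>
  </html>
HTML

doc = REXML::Document.new(PAGE)

# Rough counterpart of (doc/"#header"): the element whose id is "header".
header = REXML::XPath.first(doc, "//*[@id='header']")
header_text = header.elements['h1'].text

# Rough counterpart of (doc/".love_class"): elements with that class attribute.
loved = REXML::XPath.match(doc, "//*[@class='love_class']")
                    .map { |li| li.elements['a'].text }

# Rough counterpart of a positional path such as "a/ul/li[4]":
# here, the link inside the third list item.
third_item = REXML::XPath.first(doc, "//ul/li[3]/a").text

# Rough counterpart of doc.search("[@href]").first[:href]:
# every element that has an href attribute, then the first href value.
hrefs = REXML::XPath.match(doc, "//*[@href]").map { |a| a.attributes['href'] }
first_href = hrefs.first

puts header_text
puts loved.inspect
puts third_item
puts first_href
```

In a real crawler the hrefs collected this way would be the next pages to visit, and the extracted text would be the "food" from slide 2.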
