HTML Parsing With Hpricot

Hpricot can parse virtually any website. These slides show some techniques for parsing a site by tag, element ID, and XPath.

Published in: Technology, Education

Transcript

  • 1. Linux Creative Group Hpricot – Dig The Impossible With Ruby By: Subhransu Behera arya.subhransu@gmail.com
  • 2. Ruby !!! What’s Special?
  • 3. So … Let’s See !
     •  Dynamic
     •  Easy to learn
     •  Easy to maintain and grow
     •  Convenient shortcuts
        Ex: Str = "Linux Creative Group"
            Str_join = Str.split(" ").join("+")
     •  Transparent, code faster
     •  Few syntax errors, fewer bugs
     •  It’s fun
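
The shortcut on this slide can be tried directly in Ruby. The following is a minimal sketch, assuming a helper named `join_words` (a hypothetical name, not from the slides). It uses lowercase variable names, since Ruby treats a capitalized name such as `Str` as a constant.

```ruby
# Split a phrase on spaces and glue the words back together with a separator.
# `join_words` is an illustrative helper name, not part of the original slides.
def join_words(text, separator = "+")
  text.split(" ").join(separator)
end

str = "Linux Creative Group"
str_join = join_words(str)
puts str_join   # prints Linux+Creative+Group
```

A joined string of this kind is handy when building a search query or URL fragment from free text.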

  • 4. Ruby Gems
     •  Package management system for Ruby applications and libraries
     •  Resolves dependencies
     •  Provides a central repository of software
     •  One command rules:
        - gem install <gem_name>
     •  Can have your own local gem server
        - gem install <gem_name> --source <gem_server_ip_and_port>
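
To make "resolves dependencies" a little more concrete, the sketch below uses the `Gem::Requirement` and `Gem::Version` classes that ship with RubyGems to check whether a version satisfies a requirement string. The helper name `satisfies?` and the version numbers are illustrative assumptions, not taken from the slides.

```ruby
require 'rubygems'

# Returns true when the given version string satisfies the requirement string.
# `satisfies?` is a hypothetical helper; the version numbers are made up.
def satisfies?(requirement, version)
  Gem::Requirement.new(requirement).satisfied_by?(Gem::Version.new(version))
end

puts satisfies?("~> 0.8", "0.8.6")   # true:  "~> 0.8" allows >= 0.8 and < 1.0
puts satisfies?("~> 0.8", "1.0.0")   # false: 1.0.0 is outside that range
puts satisfies?(">= 0.6", "0.5")     # false: 0.5 is older than 0.6
```

Checks like this are roughly what happens when `gem install` decides which versions of a dependency are acceptable.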

  • 5. Hpricot makes it easy to Parse
  • 6. Hpricot
     •  Pull information from virtually any website.
     •  Search by element ID, tags, CSS selectors.
     •  Parse HTML, including broken HTML.
     •  Update HTML.
     •  Use this data anywhere and any way you want!
     •  Parse by XPath to reach an element directly.
     •  Let’s see … how it works.


  • 7. Let’s Parse A Badly Designed Site !!
     •  http://www.worldweather.org
     •  It’s a site that provides weather information for different locations across the globe.
     •  In the main page they have a badly nested table structure !!
     •  An ideal Web-Developer could have put them nicely in divs with meaningful IDs.
     •  But let’s face the truth and parse the Country Names and their URLs.

  • 8. Easy Steps – 1. Open The Site
  • 9. Easy Steps – 2. Inspect With Firebug
  • 10. Easy Steps – 3. Copy X-Path of the Element
  • 11. Easy Steps – 4. Parse By X-Path Using Hpricot
  • 12. Use some Logic & You’ll Get
  • 13. Just Try it Out Questions?
  • 14. References

     •  Ruby Programming Language: http://www.ruby-lang.org/en/
     •  Hpricot: http://code.whytheluckystiff.net/hpricot/
     •  XPath: http://en.wikipedia.org/wiki/XPath
     •  Firebug: http://getfirebug.com/

  • 15. Thanks 
