
data == code | LRUG April 2008

From delineator, 3 months ago

Morph and Pottery: RubyGem utilities for screen scrapers.


Slideshow transcript

Slide 1: code == data

Slide 2: data == code

Slide 3: Open Struct Photo: Salt Fired http://www.flickr.com/photos/saltfired/201994906/

Slide 4: require 'ostruct'

Slide 5: o = OpenStruct.new

Slide 6: o.name = 'el rug'

Slide 7: o.name

Slide 8: => "el rug"

Slide 9: o.inspect

Slide 10: => #<OpenStruct name="el rug">

Slide 11: # not very classy

Slide 12: o.class

Slide 13: => OpenStruct

Slide 14:
class Fund < OpenStruct
  def your_logic
  end
end
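Slide 14 suggests subclassing OpenStruct so the object gets a real class while keeping its schema-free attributes. A minimal sketch, assuming a hypothetical `summary` method as the stand-in for `your_logic` and a made-up `price` attribute:

```ruby
require 'ostruct'

# Subclassing OpenStruct gives a named class that still accepts
# attributes nobody declared in advance.
class Fund < OpenStruct
  # Hypothetical stand-in for "your_logic": behaviour layered on top
  # of whatever attributes were assigned at runtime.
  def summary
    "#{name}: #{price}"
  end
end

fund = Fund.new(name: 'el rug')
fund.price = '183.90'   # new attribute, no declaration needed
puts fund.class         # prints Fund
puts fund.summary       # prints el rug: 183.90
```

The class now says something about the data, which is the point the "not very classy" slides make about a bare OpenStruct.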

Slide 16: public class Fund extends HashMap { }

Slide 17: public class Fund extends HashMap { /* bad code smell */ }

Slide 18: public class Fund extends HashMap<String, Object> { }

Slide 19: public class Fund extends HashMap<String, Object> { /* this stinks! */ }

Slide 22: Morph Photo: Salt Fired http://www.flickr.com/photos/saltfired/201998836/

Slide 23: gem install morph

Slide 24: require 'morph'

Slide 25: require 'hpricot' require 'open-uri'

Slide 29: class Hubbit include Morph

Slide 30:
def initialize name
  doc = Hpricot open "http://github.com/#{name}"

Slide 31: (doc/'label').collect do |l|

Slide 32: label = l.inner_text

Slide 33: value = l.next_sibling.inner_text.strip

Slide 34: morph(label, value)

Slide 35:
class Hubbit
  include Morph
  def initialize name
    begin
      doc = Hpricot open("http://github.com/#{name}")
      (doc/'label').collect do |node|
        label = node.inner_text
        value = node.next_sibling.inner_text.strip
        morph(label, value)
      end
    rescue
      raise "Couldn't find hubbit with name: #{name}"
    end
  end
end
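The Hubbit example needs network access and Hpricot, so here is an offline sketch of the same idea. `MiniMorph` and `OfflineHubbit` are hypothetical stand-ins, not the Morph gem's implementation: each scraped label becomes a method name, and an accessor is defined on the class the first time that label is seen. The label-to-name rules (lowercase, runs of non-word characters become underscores, a leading digit gets an underscore prefix) are an assumption modelled on method names such as `_52w_high` in slide 99.

```ruby
# Hypothetical, simplified stand-in for the Morph mixin.
module MiniMorph
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Accessor names added so far, sorted like Hubbit.morph_methods.
    def morph_methods
      (@morph_methods ||= []).sort
    end

    # Define reader and writer once per attribute, on the whole class.
    def add_morph_attribute(attribute)
      @morph_methods ||= []
      return if @morph_methods.include?(attribute)
      attr_accessor attribute
      @morph_methods.push(attribute, "#{attribute}=")
    end
  end

  # Assumed normalisation: "52w High" -> "_52w_high", "Name" -> "name".
  def self.method_name(label)
    name = label.strip.downcase.gsub(/[^[:word:]]+/, '_').gsub(/\A_+|_+\z/, '')
    name =~ /\A\d/ ? "_#{name}" : name
  end

  def morph(label, value)
    attribute = MiniMorph.method_name(label)
    self.class.add_morph_attribute(attribute)
    send("#{attribute}=", value)
  end
end

class OfflineHubbit
  include MiniMorph

  # pairs stands in for the label/value nodes Hpricot would scrape.
  def initialize(pairs)
    pairs.each { |label, value| morph(label, value) }
  end
end

why = OfflineHubbit.new('Name' => 'why the lucky stiff')
p OfflineHubbit.morph_methods       # prints ["name", "name="]
p why.name                          # prints "why the lucky stiff"
p MiniMorph.method_name('52w High') # prints "_52w_high"
```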

Slide 36: Hubbit.morph_methods

Slide 37: => []

Slide 38: why = Hubbit.new 'why'

Slide 39: => #<Hubbit @name="why the lucky stiff", @email="why@why...">

Slide 40: Hubbit.morph_methods

Slide 41: => ["email", "email=", "name", "name="]

Slide 42: why.name

Slide 43: => "why the lucky stiff"

Slide 44: why.年龄 = 21  # 年龄 is Chinese for "age"

Slide 45: why.年龄

Slide 46: => 21

Slide 47: why.company

Slide 48: NoMethodError: undefined method 'company'

Slide 49: # maybe should have

Slide 50: why.company?

Slide 51: # but that's not there yet
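Slides 49 to 51 wish for a `company?` query that does not exist yet. One possible sketch, purely hypothetical and not part of Morph, is a `method_missing` that answers any `attribute?` call with whether that attribute holds a non-nil value instead of raising NoMethodError. `PredicateQueries` and `Person` are made-up names:

```ruby
# Hypothetical mixin: answers "attr?" queries, even for unknown attributes.
module PredicateQueries
  def method_missing(sym, *args)
    name = sym.to_s
    if name.end_with?('?') && args.empty?
      ivar = "@#{name.chomp('?')}"
      # true only if the attribute was set to something non-nil
      instance_variable_defined?(ivar) && !instance_variable_get(ivar).nil?
    else
      super
    end
  end

  def respond_to_missing?(sym, include_private = false)
    sym.to_s.end_with?('?') || super
  end
end

class Person
  include PredicateQueries
  attr_accessor :name
end

why = Person.new
why.name = 'why the lucky stiff'
p why.name?     # prints true
p why.company?  # prints false instead of raising
```

Plain readers such as `why.company` still raise NoMethodError, so the predicate form only softens queries, not ordinary access.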

Slide 53: dhh = Hubbit.new 'dhh'

Slide 54: Hubbit.morph_methods

Slide 55: => ["blog", "blog=", "company", "company=", "email", "email=", "location", "location=", "name", "name=", "年龄", "年龄="]

Slide 56: dhh.company

Slide 57: => "37signals"

Slide 58: why.company

Slide 59: => nil
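Slides 53 to 59 show that an attribute discovered on one instance becomes an accessor on the whole class, which is why `why.company` now returns nil rather than raising. A minimal sketch of that effect with plain `attr_accessor`; the `learn` helper is hypothetical, not Morph's API:

```ruby
class Hubbit
  # Hypothetical helper: define an accessor class-wide, then set it.
  def learn(attribute, value)
    self.class.send(:attr_accessor, attribute)
    send("#{attribute}=", value)
  end
end

why = Hubbit.new
dhh = Hubbit.new
dhh.learn(:company, '37signals')

p dhh.company  # prints "37signals"
p why.company  # prints nil: the reader exists, but why never set it
```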

Slide 60: implementation

Slide 61:
def method_missing sym, *args
  is_writer = sym.to_s =~ /=$/
  is_writer ? morph_method_missing(sym, *args) : super
end

Slide 62:
def morph_method_missing symbol, *args
  attribute = symbol.to_s.chomp '='
  # ...
  if block_given?
    yield self.class, attribute
  else
    self.class.class_eval "attr_accessor :#{attribute}"
    send(symbol, *args)
  end
  # ...
end
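Slides 61 and 62 together form the core trick: an unknown writer such as `why.name = ...` lands in `method_missing`, which defines an `attr_accessor` on the class and then re-sends the original call. A self-contained sketch of that flow, assuming a hypothetical `AutoAccessors` module; it drops the block/`yield` branch and the elided `# ...` parts, and uses the block form of `class_eval` instead of a string so odd attribute names cannot break the generated code:

```ruby
module AutoAccessors
  def method_missing(sym, *args)
    is_writer = sym.to_s.end_with?('=')
    is_writer ? auto_accessor_missing(sym, *args) : super
  end

  def respond_to_missing?(sym, include_private = false)
    sym.to_s.end_with?('=') || super
  end

  private

  def auto_accessor_missing(symbol, *args)
    attribute = symbol.to_s.chomp('=')
    # Define reader and writer on the class, then retry the original call.
    self.class.class_eval { attr_accessor attribute }
    send(symbol, *args)
  end
end

class Hubbit
  include AutoAccessors
end

why = Hubbit.new
why.name = 'why the lucky stiff'         # goes through method_missing once
p why.name                               # prints "why the lucky stiff"
p Hubbit.public_method_defined?(:name=)  # prints true
```

After the first assignment the accessor is a real method, so later calls no longer pay the `method_missing` cost.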

Slide 63: Soup Photo: Chrissy Wainwright http://www.flickr.com/photos/wainwright/380578681/

Slide 64: gem install soup

Slide 65: require 'soup'

Slide 66: Soup.prepare

Slide 67: s = Snip.new

Slide 68: s.name = 'el rug'

Slide 69: s.inspect

Slide 70: => "<Snip id:unset name:el rug>"

Slide 71: s.save

Slide 72: => "<Snip id:1 name:el rug>"

Slide 73: s = Snip['el rug']

Slide 74: => "<Snip id:1 name:el rug>"
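Slides 66 to 74 save a schema-less Snip and fetch it back by name. The sketch below mimics only that observable behaviour with an in-memory hash; `MemorySnip` and its storage are hypothetical and say nothing about how Soup actually persists snips:

```ruby
require 'ostruct'

# Hypothetical in-memory imitation of Snip's save/lookup behaviour.
class MemorySnip < OpenStruct
  @store = {}
  @next_id = 1

  class << self
    attr_reader :store

    # Look up a saved snip by name, like Snip['el rug'].
    def [](name)
      store[name]
    end

    def assign_id
      id = @next_id
      @next_id += 1
      id
    end
  end

  # Give the snip an id on first save and index it by name.
  def save
    self.id ||= self.class.assign_id
    self.class.store[name] = self
    self
  end

  def inspect
    "<Snip id:#{id || 'unset'} name:#{name}>"
  end
end

s = MemorySnip.new
s.name = 'el rug'
p s                      # prints <Snip id:unset name:el rug>
s.save
p MemorySnip['el rug']   # prints <Snip id:1 name:el rug>
```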

Slide 75: # has no class

Slide 76: s.class

Slide 77: => nil

Slide 78: BlankSlate

Slide 79:
class EmptyClass
  instance_methods.each { |m| undef_method(m) unless m =~ /^(__|instance_eval|respond_to?)/ }
end
class Snip < EmptyClass; end
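On Ruby 1.9 and later, `BasicObject` provides the same blank slate without undefining methods by hand. A sketch, with the hypothetical name `BlankSnip`, of why `s.class` can come back as nil: `class` is not defined on a BasicObject, so the call falls through to `method_missing`, which treats it as an attribute that was never set:

```ruby
# BasicObject has almost no methods, so nearly every call reaches method_missing.
class BlankSnip < BasicObject
  def initialize
    @attributes = {}
  end

  def method_missing(sym, *args)
    name = sym.to_s
    if name.end_with?('=')
      @attributes[name.chomp('=')] = args.first   # writer: store the value
    else
      @attributes[name]                           # reader: nil if never set
    end
  end
end

s = BlankSnip.new
s.name = 'el rug'
puts s.name           # prints el rug
puts s.class.inspect  # prints nil
```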

Slide 80: Pottery Photo: zhaoshouren http://www.flickr.com/photos/ajanhelendam/2326369128/

Slide 81: gem install pottery

Slide 85:
def get_price_rows doc
  rows = rows_starting 'Bid(GBX)', doc
  @bid_offer = rows.size > 0

Slide 88:
  rows = rows_starting 'Nav(GBX)', doc unless @bid_offer
  rows
end

Slide 89:
def rows_starting label, doc
  (doc/"table/tr/td/[text()='#{label}']/../../../tr")
end
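The Hpricot expression in slide 89 finds the cell whose text equals the label, climbs to the enclosing table, and returns all of its rows; slides 85 to 88 try `Bid(GBX)` first and fall back to `Nav(GBX)`. A rough standard-library equivalent using REXML XPath against a tiny made-up table. The HTML, the helper signatures, and the use of REXML instead of Hpricot are all assumptions for illustration:

```ruby
require 'rexml/document'

# Made-up, well-formed stand-in for the scraped page.
HTML = <<~XML
  <html><body>
    <table>
      <tr><td>Nav(GBX)</td><td>183.90</td></tr>
      <tr><td>Change</td><td>+1.74</td></tr>
    </table>
  </body></html>
XML

# Rows of whichever table contains a cell whose text is exactly `label`.
def rows_starting(label, doc)
  REXML::XPath.match(doc, "//table/tr/td[text()='#{label}']/../../tr")
end

# Prefer the Bid(GBX) table and fall back to Nav(GBX).
def get_price_rows(doc)
  rows = rows_starting('Bid(GBX)', doc)
  bid_offer = !rows.empty?
  rows = rows_starting('Nav(GBX)', doc) unless bid_offer
  [rows, bid_offer]
end

doc = REXML::Document.new(HTML)
rows, bid_offer = get_price_rows(doc)
cells = rows.map { |tr| tr.elements.to_a('td').map { |td| td.text.to_s.strip } }
p bid_offer  # prints false
p cells      # prints [["Nav(GBX)", "183.90"], ["Change", "+1.74"]]
```

Returning `bid_offer` instead of setting an instance variable keeps the sketch free of object state; in Pottery the flag lives on the Fund so `bid_price` can use it later.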

Slide 90:
def each_entry doc
  get_price_rows(doc).each do |row|
    # blank? and in_groups_of are ActiveSupport extensions
    cells = (row/'td').collect(&:inner_text).collect(&:strip).delete_if(&:blank?)
    cells.in_groups_of(2) do |entry|
      yield entry[0], entry[1]
    end
  end
end
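Slide 90 relies on ActiveSupport for `blank?` and `in_groups_of`. The pairing step itself can be sketched in plain Ruby: strip the cell texts, drop empty ones, and read them two at a time as label and value. Here each row is a plain array of strings, an assumption standing in for the parsed `td` cells:

```ruby
# Yields label/value pairs from rows of cell texts, skipping blank cells.
def each_entry(rows)
  rows.each do |cells|
    values = cells.map(&:strip).reject(&:empty?)
    # each_slice(2) plays the role of in_groups_of(2); it does not pad
    # an odd final group, so a lone trailing label gets a nil value.
    values.each_slice(2) do |label, value|
      yield label, value
    end
  end
end

rows = [[' Nav(GBX) ', '183.90', ' ', 'Change', '+1.74']]
entries = []
each_entry(rows) { |label, value| entries << [label, value] }
p entries  # prints [["Nav(GBX)", "183.90"], ["Change", "+1.74"]]
```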

Slide 91:
doc = open_doc url
each_entry doc do |label, value|
  morph(label, value)
end
time = Time.now.utc.to_s
self.time = time.match(/\d\d:\d\d:\d\d/)[0]
self.name = doc.at('.FundNameHeader').inner_text
self.url = url
self.date = Date.today.to_s
self.id_name = "#{url}##{date}"
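Slide 91 pulls the HH:MM:SS part out of the current UTC time with a regular expression and builds `id_name` from the URL and the date. A small sketch using a fixed time instead of `Time.now` and `Date.today` so the result is predictable; `strftime` is shown as an equivalent without a regex, and the fund URL is assembled the way slide 94 does it:

```ruby
require 'date'

now = Time.utc(2008, 4, 14, 23, 0, 14)

# Regex approach from the slide: first HH:MM:SS run in the time string.
time = now.to_s.match(/\d\d:\d\d:\d\d/)[0]
# Equivalent result without a regex.
same = now.strftime('%H:%M:%S')

url = 'http://funds.ft.com/funds/rufferllp/ruffer/RZBST'
date = now.to_date.to_s
id_name = "#{url}##{date}"   # URL and date joined with a literal '#'

p time     # prints "23:00:14"
p same     # prints "23:00:14"
p id_name  # prints "http://funds.ft.com/funds/rufferllp/ruffer/RZBST#2008-04-14"
```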

Slide 92: require 'pottery'

Slide 93:
class Fund
  include Pottery

Slide 94:
  def initialize fund=nil
    if fund
      url = "http://funds.ft.com/funds/#{fund}"
      doc = open_doc url
      each_entry doc do |label, value|
        morph(label, value)
      end
      time = Time.now.utc.to_s
      self.time = time.match(/\d\d:\d\d:\d\d/)[0]
      self.name = doc.at('.FundNameHeader').inner_text
      self.url = url
      self.date = Date.today.to_s
      self.id_name = "#{url}##{date}"
    end
  end
  def bid_price
    @bid_offer ? bid_gbx : nav_gbx
  end
  def offer_price
    @bid_offer ? offer_gbx : ''
  end
  private
  def each_entry doc
    get_price_rows(doc).each do |row|
      cells = (row/'td').collect(&:inner_text).collect(&:strip).delete_if(&:blank?)
      cells.in_groups_of(2) do |entry|
        yield entry[0], entry[1]
      end
    end
  end
  def get_price_rows doc
    rows = rows_starting 'Bid(GBX)', doc
    @bid_offer = rows.size > 0
    rows = rows_starting 'Nav(GBX)', doc unless @bid_offer
    rows
  end
  def rows_starting label, doc
    (doc/"table/tr/td/[text()='#{label}']/../../../tr")
  end

Slide 95: end # of Fund

Slide 97: fund = Fund.new 'rufferllp/ruffer/RZBST'

Slide 98: Fund.morph_methods

Slide 99: ["_52w_high", "_52w_high=", "_52w_low", "_52w_low=", "change", "change=", "date", "date=", "gross_yield", "gross_yield=", "id_name", "id_name=", "listed_yield", "listed_yield=", "name", "name=", "nav_gbx", "nav_gbx=", "net_yield", "net_yield=", "percentage_change", "percentage_change=", "time", "time=", "url", "url="]

Slide 100: fund.save

Slide 101: Fund.restore 'rufferllp/ruffer/RZBST#2008-04-14'

Slide 102: #<Fund:0x1857414 @percentage_change="+0.96", @gross_yield="-", @id_name="rufferllp/ruffer/RZBST#2008-04-14", @net_yield="-", @bid_offer=false, @date="2008-04-14", @_52w_low="142.38", @listed_yield="-", @time="23:00:14", @name="Ruffer CF Baker Steel Gold O Acc NAV", @nav_gbx="183.90", @url="rufferllp/ruffer/RZBST", @change="+1.74", @_52w_high="209.88">

Slide 104: Future features?

Slide 105: identify data types, e.g. integer, date, string

Slide 106: generate a Rails generator line, e.g. script/generate model x:string y:integer

Slide 107: generate doodle definition!
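The first two future features could look roughly like this: guess a column type from each scraped string, then emit a Rails `script/generate model` line. This is a hypothetical sketch, not part of Pottery or Morph; the regex rules and the choice of `float` for decimal values are assumptions, and the sample values are taken from the restored Fund in slide 102:

```ruby
# Guess a Rails column type from a scraped string value (assumed rules).
def guess_type(value)
  case value.to_s.strip
  when /\A[+-]?\d+\z/          then 'integer'
  when /\A[+-]?\d+\.\d+\z/     then 'float'
  when /\A\d{4}-\d{2}-\d{2}\z/ then 'date'
  else 'string'
  end
end

# Build a generator line such as: script/generate model fund nav_gbx:float
def generator_line(model, attributes)
  columns = attributes.map { |name, value| "#{name}:#{guess_type(value)}" }
  "script/generate model #{model} #{columns.join(' ')}"
end

fund = { 'name' => 'Ruffer CF Baker Steel Gold O Acc NAV',
         'nav_gbx' => '183.90', 'change' => '+1.74', 'date' => '2008-04-14' }
puts generator_line('fund', fund)
# prints: script/generate model fund name:string nav_gbx:float change:float date:date
```

Values like `"-"` (the empty yields in slide 102) fall through to `string`, so a real version would probably need to look at several scraped records before settling on a type.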

Slide 108: data == code
http://code.whytheluckystiff.net/hpricot
http://github.com/lazyatom/soup
http://github.com/robmckinnon/morph
http://github.com/robmckinnon/pottery