Open Hack 2008 Searchmonkey


Open Hack 2008 Searchmonkey - Presentation Transcript

  1.  
  2. Feeding the Monkey: the SearchMonkey data layer, presentation applications, and you
  3. SearchMonkey Presentation by:
    • Evan Goer, SearchMonkey Community Manager (goer@yahoo-inc.com)
    • Paul Tarjan, SearchMonkey Devtool Developer (ptarjan@yahoo-inc.com)
    • http://www.slideshare.net/searchmonkey/open-hack-2008-searchmonkey-presentation
  4. What is SearchMonkey? An open platform for using structured data to build more useful and relevant search results (before and after screenshots)
  5. Enhanced Result
    • Image
    • Key/value pairs or abstract
    • Links
  6. Infobar
    • Summary
    • Blob
  7. How to get data to SearchMonkey?
    • Humans see:
      • name
      • picture of a person
      • current job
      • industry, …
    • Computers see:
      • an undifferentiated blob of HTML
    • Can we make computers smarter?
  8. Artificial intelligence is hard. Plus…
  9. How does it work?
    • Step 1: site owners/publishers share structured data with Yahoo!
    • Step 2: site owners & third-party developers build SearchMonkey apps
    • Step 3: consumers customize their search experience with Enhanced Results or Infobars
    • Diagram labels: Acme.com’s Site, Acme.com’s DB, Index, RDF/Microformat Markup, DataRSS feed, Web Services, Page Extraction
  10. Data Sources: RDF and Microformats

      Name           Cached   Open   Mode      Notes
      Yahoo! Index   yes      yes    Passive   Old-School Y! Index data
      RDFa, eRDF     yes      yes    Passive   Vocab + markup decoupled
      Microformats   yes      yes    Passive   Vocab + markup coupled
      DataRSS feed   yes      no     Active    Atom + metadata
      XSLT           no       no     Active    Good for prototyping
      Web Service    no       no     Active    Brings in remote data

  11. Approach #1: Embedded RDF

      <?xml version="1.0" encoding="UTF-8"?>
      <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
          "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
      <html xmlns="http://www.w3.org/1999/xhtml"
            xmlns:dc="http://purl.org/dc/elements/1.1/"
            xmlns:foaf="http://xmlns.com/foaf/0.1/"
            lang="en" xml:lang="en">
      <head>
        <title>The Amazing Home Page of Joe Smith</title>
      </head>
      <body>
        <h1 property="dc:title">Joe's Home Page</h1>
        <div rel="foaf:maker">
          <h2 property="foaf:name">Joe Smith</h2>
          <div rel="foaf:depiction" resource="http://joesmith.org/images/jsmith.png">
            <img src="/images/jsmith.png" alt="Smiling headshot of Joe" />
            <p property="dc:rights">Creative Commons Attribution 3.0 Unported</p>
          </div>
        </div>
        …

    • Cached data
      • allows Enhanced Results
      • but not for dynamic data
    • Reuse existing markup
      • but requires site redesign
    • Open approach
      • everyone can use
    • Passive, crawled by Y!
      • less bureaucracy to set up
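
To illustrate why markup like the slide 11 example is easy for a machine to consume, here is a minimal Python sketch that walks a trimmed copy of that page and collects each `property` attribute with its text. It is not a real RDFa parser (it ignores `rel`, `resource`, and prefix resolution), and the function name `extract_properties` is made up for this example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the slide 11 markup (well-formed XHTML).
XHTML = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <body>
    <h1 property="dc:title">Joe's Home Page</h1>
    <div rel="foaf:maker">
      <h2 property="foaf:name">Joe Smith</h2>
    </div>
  </body>
</html>"""


def extract_properties(xhtml):
    """Return {property name: element text} for every element with a property attribute."""
    root = ET.fromstring(xhtml)
    found = {}
    for element in root.iter():
        name = element.get("property")
        if name is not None:
            found[name] = (element.text or "").strip()
    return found


properties = extract_properties(XHTML)
print(properties)
```

The point of the sketch is that the vocabulary (`dc:title`, `foaf:name`) travels with the page, so no site-specific scraping rules are needed.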
  12. Approach #2: Embedded Microformats

      <div id="hcard-Joe-Smith" class="vcard">
        <span class="fn">Joe Smith</span>
        <div class="adr">
          <div class="street-address">123 Murphy Avenue</div>
          <span class="locality">Sunnyvale</span>,
          <span class="region">California</span>
          <span class="postal-code">94086</span>
        </div>
        <div class="tel">(408) 555-1234</div>
      </div>
      …

    • Cached data
      • allows Enhanced Results
      • but not for dynamic data
    • Reuse existing markup
      • but requires site redesign
    • Open approach
      • everyone can use
    • Passive, crawled by Y!
      • less bureaucracy to set up
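
As a rough illustration of how a consumer might read the hCard from slide 12, the following Python sketch pulls the text of a few known class names out of the markup using only the standard library. The class `HCardParser` and the `WANTED` set are made up for this example, and the approach only handles leaf elements; a real microformats parser covers far more of the hCard rules.

```python
from html.parser import HTMLParser

# The hCard markup from slide 12.
HCARD = """<div id="hcard-Joe-Smith" class="vcard">
  <span class="fn">Joe Smith</span>
  <div class="adr">
    <div class="street-address">123 Murphy Avenue</div>
    <span class="locality">Sunnyvale</span>,
    <span class="region">California</span>
    <span class="postal-code">94086</span>
  </div>
  <div class="tel">(408) 555-1234</div>
</div>"""

# hCard class names this sketch looks for (an illustrative subset).
WANTED = {"fn", "street-address", "locality", "region", "postal-code", "tel"}


class HCardParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None  # class name whose text is being collected

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        matches = [c for c in classes if c in WANTED]
        self._current = matches[0] if matches else None

    def handle_endtag(self, tag):
        self._current = None

    def handle_data(self, data):
        if self._current and data.strip():
            self.fields[self._current] = data.strip()


parser = HCardParser()
parser.feed(HCARD)
hcard = parser.fields
print(hcard)
```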
  13. Data Sources: DataRSS Feed
  14. Approach #3: DataRSS Feed

      <?profile http://search.yahoo.com/searchmonkey-profile ?>
      <feed xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="http://www.w3.org/2005/Atom ../xsd/datarss.xsd"
            xmlns:dc="http://purl.org/dc/terms/"
            xmlns="http://www.w3.org/2005/Atom"
            xmlns:commerce="http://search.yahoo.com/searchmonkey/commerce/"
            xmlns:y="http://search.yahoo.com/datarss/">
        <id>http://local.yahoo.com/datarss/</id>
        <author><name>Peter Mika (pmika@yahoo-inc.com)</name></author>
        <title>Example data feed for Local</title>
        <updated>2008-07-16T04:05:06+07:00</updated>
        <entry>
          <title>Parcel 104</title>
          <id>http://local.yahoo.com/info-21583016-parcel-104-santa-clara</id>
          <updated>2008-07-16T04:05:06+07:00</updated>
          <content type="application/xml">
            <y:adjunct version="1.0" name="com.yahoo.local">
              <y:item rel="dc:subject">
                <y:type typeof="vcard:VCard commerce:Restaurant">
                  <y:meta property="commerce:hoursOfOperation">
                    Breakfast daily, Lunch Mon.-Fri., Dinner Mon.-Sat.

    • Cached data
      • allows Enhanced Results
      • but not for dynamic data
    • Generate feed from DB
      • and maintain afterwards
    • Closed approach
      • only Yahoo! gets data
    • Actively provide a feed
      • coord w/Yahoo! to set up
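
The "generate feed from DB" step could look roughly like the Python sketch below, which turns a list of rows into an Atom feed with a `y:adjunct` block per entry. The `ROWS` data shape and the `build_feed` function are assumptions for illustration. The element nesting mirrors what is visible in the truncated slide 14 example and has not been validated against the DataRSS schema; a real feed would also declare the `dc`, `commerce`, and `vcard` prefixes used in attribute values.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
Y = "http://search.yahoo.com/datarss/"
ET.register_namespace("", ATOM)   # serialize Atom as the default namespace
ET.register_namespace("y", Y)

# Stand-in for rows read from a site's database (hypothetical data shape).
ROWS = [
    {
        "id": "http://local.yahoo.com/info-21583016-parcel-104-santa-clara",
        "title": "Parcel 104",
        "updated": "2008-07-16T04:05:06+07:00",
        "hours": "Breakfast daily, Lunch Mon.-Fri., Dinner Mon.-Sat.",
    },
]


def build_feed(rows):
    """Build an Atom feed element with one entry (and one adjunct) per row."""
    feed = ET.Element(f"{{{ATOM}}}feed")
    ET.SubElement(feed, f"{{{ATOM}}}title").text = "Example data feed for Local"
    for row in rows:
        entry = ET.SubElement(feed, f"{{{ATOM}}}entry")
        ET.SubElement(entry, f"{{{ATOM}}}title").text = row["title"]
        ET.SubElement(entry, f"{{{ATOM}}}id").text = row["id"]
        ET.SubElement(entry, f"{{{ATOM}}}updated").text = row["updated"]
        content = ET.SubElement(entry, f"{{{ATOM}}}content", {"type": "application/xml"})
        adjunct = ET.SubElement(
            content, f"{{{Y}}}adjunct", {"version": "1.0", "name": "com.yahoo.local"}
        )
        item = ET.SubElement(adjunct, f"{{{Y}}}item", {"rel": "dc:subject"})
        typed = ET.SubElement(item, f"{{{Y}}}type", {"typeof": "vcard:VCard commerce:Restaurant"})
        meta = ET.SubElement(typed, f"{{{Y}}}meta", {"property": "commerce:hoursOfOperation"})
        meta.text = row["hours"]
    return feed


feed_xml = ET.tostring(build_feed(ROWS), encoding="unicode")
print(feed_xml)
```

Because the feed is regenerated from the database, "maintain afterwards" mostly means rerunning a job like this whenever the underlying rows change.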
  15. Building with Structured Data
    • Structured data -> easy app building
      • Relies on RDF, microformats, DataRSS
        • That was the hard part – whew, you’re done 
    • PHP in a typical app
      • Mostly simple assignments, Data::get()
      • Possibly strings, XML, math
      • Use if statements to check whether fields exist
      • Need to punt? Just return an empty array
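
The app logic described on slide 15 (simple assignments, existence checks, and an empty result when there is nothing to show) can be sketched as below. This is a Python analogue for illustration only, not the SearchMonkey PHP API: the slide's `Data::get()` accessor is replaced by a plain dictionary lookup, and the function name `build_result` and the output keys `title`, `summary`, and `image` are hypothetical.

```python
def build_result(data):
    """Map extracted fields to display fields; return {} to punt."""
    name = data.get("vcard:fn")
    if not name:
        # Nothing useful to show, so return an empty result.
        return {}

    result = {"title": name}

    # Only add optional fields that actually exist in the data.
    job = data.get("vcard:title")
    if job:
        result["summary"] = f"Current job: {job}"

    photo = data.get("rel:Photo")
    if photo:
        result["image"] = photo

    return result


print(build_result({"vcard:fn": "Joe Smith", "vcard:title": "Community Manager"}))
print(build_result({}))
```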
  16. Data Sources: XSLT Extractors
  17. Approach #4: Extract with XSLT

      <?xml version="1.0"?>
      <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
        <xsl:template match="/">
          <adjunctcontainer>
            <adjunct id="smid:{$smid}" version="1.0">
              <item rel="rel:Photo"
                    resource="{//div[@class='hresume']//div[@class='image']/img/@src}"/>
              <item rel="rel:Card">
                <meta property="vcard:fn">
                  <xsl:value-of select="//div[@class='hresume']//span[contains(@class,'fn')]"/>
                </meta>
                <meta property="vcard:title">
                  <xsl:value-of select="//div[@class='hresume']//ul[@class='current']/li"/>
                </meta>
              </item>
            </adjunct>
          </adjunctcontainer>
        </xsl:template>
      </xsl:stylesheet>

    • Generally not cached
      • too slow, infobar only
      • but good for dynamic data
    • Scrape page with XSLT
      • operates on cleaned up version of the DOM
      • watch out for template changes
    • Easy to prototype
  18. Prototyping with XSLT
    • What if I don’t have structured data?
      • I don’t own the site
      • I do own the site, but I want to prototype first
    • Build an XSLT custom data service first
      • Write some XSLT to extract the data and transform it into DataRSS
      • Mostly about finding the right XPath (use Firebug or XPather)
      • Quick to implement, but brittle
      • Can’t do a good Enhanced Result
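
The path-finding work described on slide 18 can be tried out quickly before writing any XSLT. The Python sketch below applies paths similar to those in the slide 17 stylesheet to a small, made-up page (`PAGE` and its contents are hypothetical). The standard library's ElementTree supports only a subset of XPath and has no `contains()`, so the `fn` match is done with a class-token check, which is slightly stricter than the slide's substring test.

```python
import xml.etree.ElementTree as ET

# Made-up, already cleaned-up page that follows the hResume-style classes from slide 17.
PAGE = """<html><body>
  <div class="hresume">
    <div class="image"><img src="/images/jsmith.png" /></div>
    <span class="fn n">Joe Smith</span>
    <ul class="current"><li>Community Manager</li></ul>
  </div>
</body></html>"""


def extract(page):
    """Pull photo, name, and current job out of the page; return {} if the template is missing."""
    root = ET.fromstring(page)
    resume = root.find(".//div[@class='hresume']")
    if resume is None:
        return {}

    data = {}
    img = resume.find(".//div[@class='image']/img")
    if img is not None:
        data["rel:Photo"] = img.get("src")

    for span in resume.iter("span"):
        if "fn" in (span.get("class") or "").split():
            data["vcard:fn"] = (span.text or "").strip()
            break

    job = resume.find(".//ul[@class='current']/li")
    if job is not None:
        data["vcard:title"] = (job.text or "").strip()
    return data


print(extract(PAGE))
# A template change (renamed class) silently yields no data, which is the brittleness the slide warns about.
print(extract(PAGE.replace("hresume", "resume-v2")))
```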
  19. Creating an Infobar
    • Infobar advantages
      • Annotate someone else’s site
      • Use links and images from other domains
        • Mash up info from multiple sites
        • Affiliate / coupon links? Hmmm…
      • Can act on *, all websites
        • But these apps can be annoying if poorly designed
    • Key design principles
      • Put something useful in the summary
      • Be creative with the HTML
  20. Data Sources: Web Services
  21. Approach #5: Call a Web Service

      <?xml version="1.0"?>
      <xsl:stylesheet xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                      xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
                      xmlns:h="http://www.w3.org/1999/xhtml"
                      xmlns:y="urn:yahoo:srch"
                      xsi:schemaLocation="urn:yahoo:srch http://api.search.yahoo.com/SiteExplorerService/V1/PageDataResponse.xsd">
        <xsl:template match="/">
          <adjunctcontainer xmlns:my="http://example.com/ns/1.0">
            <adjunct id="smid:{$smid}" version="1.0">
              <meta property="my:link1">
                <xsl:value-of select="//y:Result[1]/y:Url"/>
              </meta>
              <meta property="my:result1">
                <xsl:value-of select="//y:Result[1]/y:Title"/>
              </meta>
            </adjunct>
          </adjunctcontainer>
        </xsl:template>
      </xsl:stylesheet>

    • Generally not cached
      • too slow, infobar only
      • but good for dynamic data
    • Call a Remote Web Service
      • allows SearchMonkey apps to glue together
      • can handle OpenSearch XML natively
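
To make the slide 21 extraction concrete, here is a small Python sketch that does the same thing as the stylesheet's two `xsl:value-of` selections: it takes the first `Result` in the `urn:yahoo:srch` namespace and reads its `Url` and `Title`. The `RESPONSE` document is a canned, made-up stand-in (including the `ResultSet` root name); a real app would receive the XML from the remote service.

```python
import xml.etree.ElementTree as ET

NS = {"y": "urn:yahoo:srch"}

# Canned stand-in for a web service response (hypothetical content).
RESPONSE = """<ResultSet xmlns="urn:yahoo:srch">
  <Result>
    <Title>Example page one</Title>
    <Url>http://example.com/one</Url>
  </Result>
  <Result>
    <Title>Example page two</Title>
    <Url>http://example.com/two</Url>
  </Result>
</ResultSet>"""


def first_result(xml_text):
    """Return the first result's URL and title, keyed like the slide's meta properties."""
    root = ET.fromstring(xml_text)
    result = root.find(".//y:Result", NS)
    if result is None:
        return {}
    return {
        "my:link1": result.findtext("y:Url", default="", namespaces=NS),
        "my:result1": result.findtext("y:Title", default="", namespaces=NS),
    }


print(first_result(RESPONSE))
```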
  22. Resources
    • Main:
      • http://developer.yahoo.com/searchmonkey
    • Lists and Forums:
      • [email_address]
      • http://suggestions.yahoo.com/searchmonkey
    • RDF and Microformats:
      • http://microformats.org
      • http://www.w3.org/TR/xhtml-rdfa-primer
  23. Next Steps
    • Identify content to use in SearchMonkey
    • Weigh the strengths and drawbacks of each method for providing data:
      • RDF
      • Microformats
      • DataRSS feed
      • Custom data services
    • Go build your data layer and app!
    • Come talk to us for help. 
  24. Win Fabulous Prizes!
  25. FINIS. Questions?

Evan Goer, 2 years ago


SearchMonkey presentation for the Yahoo! Open Hack
