
SearchMonkey


This is the basis for our "Intro to SearchMonkey" talks

Published in: Technology, Design

SearchMonkey

  1. Monkey with Yahoo! Search
  2. SearchMonkey Presentation by: Paul Tarjan, Chief Technical Monkey (ptarjan@yahoo-inc.com) Online at: http://www.slideshare.net/ptarjan/searchmonkey-presentation | http://developer.yahoo.com/searchmonkey
  3. What is SearchMonkey? An open platform for using structured data to build more useful and relevant search results. [Screenshots: Before, After]
  4. Enhanced Result: Zagat [Screenshot callouts: Image, Links, Key/Value Pairs or Abstract]
  5. Infobar: Wikipedia [Screenshot callouts: Preview, Summary, Blob]
  6. Part of the puzzle
  7. Vocabularies • Need to speak the same language • I like to see girls of that... caliber. • English, French, Spanish, Esperanto? • URLs to the rescue – Dublin Core (http://purl.org/dc/elements/1.1/) – Friend of a Friend (http://xmlns.com/foaf/0.1/) – X-Friend Network (http://gmpg.org/xfn/11/) – … (many more)
  8. Syntax • Nouns, Verbs, and Adjectives, oh my! • All phrases become lots of triples • (Subject, Verb / Adj. / Prep. / etc, Object) • Key / Value pairs ++ – Everything is a URL or String – Subject doesn’t have to be the document
  9. Syntax 2 • Key / Value pair – Title = Awesome SearchMonkey Presentation – Homepage = http://search.yahoo.com/searchmonkey • Triples – (self, http://purl.org/dc#title, “Awesome SearchMonkey Presentation”) – (self, http://vcard#url, http://search.yahoo.com/searchmonkey)
  10. Decompose to triples • I like to eat red candy – (self, http://example.com/likeEating, http://example.org/temp/redcandy) – (http://example.org/temp/redcandy, http://example.com/isColored, http://example.org/colors/red) – (http://example.org/temp/redcandy, http://example.com/isInstanceOf, http://example.org/food/candy) • Unnamed nodes are O.K.
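The triple model on slides 8 to 10 can be sketched in a few lines of Python. This is only an illustration, not SearchMonkey code: the helper names `pairs_to_triples` and `objects_of` are hypothetical, and the URLs are the ones shown on the slides.

```python
# A triple is just (subject, predicate, object); everything is a URL or a string.
SELF = "self"  # stands for the document itself, as on the slides

# "I like to eat red candy", decomposed as on slide 10.
CANDY = "http://example.org/temp/redcandy"
sentence_triples = [
    (SELF, "http://example.com/likeEating", CANDY),
    (CANDY, "http://example.com/isColored", "http://example.org/colors/red"),
    (CANDY, "http://example.com/isInstanceOf", "http://example.org/food/candy"),
]


def pairs_to_triples(subject, pairs, vocabulary):
    """Turn key/value pairs into triples by mapping each key to a vocabulary URL."""
    return [(subject, vocabulary[key], value) for key, value in pairs.items()]


def objects_of(triples, subject, predicate):
    """Return every object attached to `subject` through `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# Slide 9: the same data as key/value pairs, then as triples.
vocabulary = {
    "Title": "http://purl.org/dc#title",
    "Homepage": "http://vcard#url",
}
page_triples = pairs_to_triples(
    SELF,
    {
        "Title": "Awesome SearchMonkey Presentation",
        "Homepage": "http://search.yahoo.com/searchmonkey",
    },
    vocabulary,
)
```

Because the subject is an ordinary value, a triple can describe something other than the page, which is what makes this "key/value pairs ++".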
  11. How to get data to SearchMonkey? Humans see: • name • picture of a person • current job • industry, … Computers see: an undifferentiated blob of HTML. Can we make computers smarter?
  12. Artificial intelligence is hard. Plus …
  13. How does it work? (1) Site owners/publishers share structured data with Yahoo!. (2) Site owners & third-party developers build SearchMonkey apps. (3) Consumers customize their search experience with Enhanced Results or Infobars. [Diagram labels: Page Extraction; RDF/Microformat Markup; Acme.com’s Web Pages; Index; DataRSS feed; Web Services; Acme.com’s database]
  14. Innards of SearchMonkey • You build a web-service inside our framework • When a search page renders – We check which SM apps are enabled – We call them • 50ms for in-page • Long time for AJAX – They return data in our template – We render them (and cache)
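The render loop described on slide 14 can be pictured with a toy model. This is a sketch under stated assumptions, not Yahoo!'s implementation: `render_apps`, the cache layout, and the idea of deferring slow apps to a second list are all hypothetical; only the 50 ms in-page figure comes from the slide.

```python
import time

IN_PAGE_BUDGET_S = 0.050  # the 50 ms in-page figure from the slide


def render_apps(url, apps, enabled, cache, clock=time.monotonic):
    """Return (in_page, deferred) outputs for the enabled apps on one result URL.

    `apps` maps an app name to a callable taking the URL; `enabled` lists the
    names the user has switched on. Outputs are cached per (app, url), and an
    app that overruns the budget is set aside (standing in for the AJAX path).
    """
    in_page, deferred = [], []
    for name in enabled:
        key = (name, url)
        if key in cache:
            # Cached output costs nothing, so it can always go in-page.
            in_page.append(cache[key])
            continue
        start = clock()
        output = apps[name](url)
        elapsed = clock() - start
        cache[key] = output
        if elapsed <= IN_PAGE_BUDGET_S:
            in_page.append(output)
        else:
            deferred.append(output)
    return in_page, deferred
```

The `clock` parameter exists only so the timing can be replaced with a fake clock when experimenting; a real system would enforce the budget with a timeout rather than measure after the fact.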
  15. Inside SM [Diagram labels: Developer, Developer, Publisher]
  16. Data Sources: RDF and Microformats
      Name          Cached  Open  Mode     Notes
      Yahoo! Index  yes     yes   Passive  Old-School Y! Index data
      RDFa, eRDF    yes     yes   Passive  Vocab + markup decoupled
      Microformats  yes     yes   Passive  Vocab + markup coupled
      DataRSS feed  yes     no    Active   Atom + metadata
      XSLT          no      no    Active   Good for prototyping
      Web Service   no      no    Active   Brings in remote data
  17. Approach #1: Embedded RDF
      • Cached data
        – allows Enhanced Results
        – but not for dynamic data
      • Reuse existing markup
        – but requires site redesign
      • Open approach
        – everyone can use
      • Passive, crawled by Y!
        – less bureaucracy to set up

          <?xml version="1.0" encoding="UTF-8"?>
          <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
              "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
          <html xmlns="http://www.w3.org/1999/xhtml"
                xmlns:dc="http://purl.org/dc/elements/1.1/"
                xmlns:foaf="http://xmlns.com/foaf/0.1/"
                lang="en" xml:lang="en">
          <head>
            <title>The Amazing Home Page of Joe Smith</title>
          </head>
          <body>
            <h1 property="dc:title">Joe's Home Page</h1>
            <div rel="foaf:maker">
              <h2 property="foaf:name">Joe Smith</h2>
              <div rel="foaf:depiction"
                   resource="http://joesmith.org/images/jsmith.png">
                <img src="/images/jsmith.png"
                     alt="Smiling headshot of Joe" />
                <p property="dc:rights">Creative Commons Attribution 3.0 Unported</p>
              </div>
            </div>
            …
  18. Approach #2: Embedded Microformats
      • Cached data
        – allows Enhanced Results
        – but not for dynamic data
      • Reuse existing markup
        – but requires site redesign
      • Open approach
        – everyone can use
      • Passive, crawled by Y!
        – less bureaucracy to set up

          <div id="hcard-Joe-Smith" class="vcard">
            <span class="fn">Joe Smith</span>
            <div class="adr">
              <div class="street-address">123 Murphy Avenue</div>
              <span class="locality">Sunnyvale</span>,
              <span class="region">California</span>
              <span class="postal-code">94086</span>
            </div>
            <div class="tel">(408) 555-1234</div>
          </div>…
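To see why class-based microformat markup is machine-readable, here is a small sketch that pulls the hCard fields out of the slide 18 snippet using only Python's standard library. It is not how Yahoo!'s crawler works; `HCardParser`, `parse_hcard`, and the field list are illustrative assumptions, and the parser expects every element to have a closing tag (unclosed void tags such as `<br>` would confuse it).

```python
from html.parser import HTMLParser

# The hCard class names used on the slide; a real parser would know many more.
HCARD_FIELDS = {"fn", "street-address", "locality", "region", "postal-code", "tel"}

SAMPLE_HCARD = """
<div id="hcard-Joe-Smith" class="vcard">
  <span class="fn">Joe Smith</span>
  <div class="adr">
    <div class="street-address">123 Murphy Avenue</div>
    <span class="locality">Sunnyvale</span>,
    <span class="region">California</span>
    <span class="postal-code">94086</span>
  </div>
  <div class="tel">(408) 555-1234</div>
</div>
"""


class HCardParser(HTMLParser):
    """Collect the text of elements whose class names an hCard field."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._stack = []  # one entry per open element: a field name or None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        field = next((c for c in classes if c in HCARD_FIELDS), None)
        self._stack.append(field)

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        # Only keep text whose innermost enclosing element is a known field.
        if text and self._stack and self._stack[-1]:
            self.fields.setdefault(self._stack[-1], text)


def parse_hcard(html):
    parser = HCardParser()
    parser.feed(html)
    return parser.fields


card = parse_hcard(SAMPLE_HCARD)
```

The vocabulary and the markup are coupled here: the class names are the property names, which is the contrast with RDFa drawn in the data sources table.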
  19. Approach #3: DataRSS Feed
      • Cached data
        – allows Enhanced Results
        – but not for dynamic data
      • Generate feed from DB
        – and maintain afterwards
      • Closed approach
        – only Yahoo! gets data
      • Actively provide a feed
        – coord w/Yahoo! to set up

          <?profile http://search.yahoo.com/searchmonkey-profile ?>
          <feed xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                xsi:schemaLocation="http://www.w3.org/2005/Atom ../xsd/datarss.xsd"
                xmlns:dc="http://purl.org/dc/terms/"
                xmlns="http://www.w3.org/2005/Atom"
                xmlns:commerce="http://search.yahoo.com/searchmonkey/commerce/"
                xmlns:y="http://search.yahoo.com/datarss/">
            <id>http://local.yahoo.com/datarss/</id>
            <author><name>Peter Mika (pmika@yahoo-inc.com)</name></author>
            <title>Example data feed for Local</title>
            <updated>2008-07-16T04:05:06+07:00</updated>
            <entry>
              <title>Parcel 104</title>
              <id>http://local.yahoo.com/info-21583016-parcel-104-santa-clara</id>
              <updated>2008-07-16T04:05:06+07:00</updated>
              <content type="application/xml">
                <y:adjunct version="1.0" name="com.yahoo.local">
                  <y:item rel="dc:subject">
                    <y:type typeof="vcard:VCard commerce:Restaurant">
                      <y:meta property="commerce:hoursOfOperation">
                        Breakfast daily, Lunch Mon.-Fri., Dinner Mon.-Sat.
                        …
  20. Approach #4: Extract with XSLT
      • Generally not cached
        – too slow, infobar only
        – but good for dynamic data
      • Scrape page with XSLT
        – operates on cleaned up version of the DOM
        – watch out for template changes
      • Easy to prototype

          <?xml version="1.0"?>
          <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
            <xsl:template match="/">
              <adjunctcontainer>
                <adjunct id="smid:{$smid}" version="1.0">
                  <item rel="rel:Photo"
                        resource="{//div[@class='hresume']//div[@class='image']/img/@src}"/>
                  <item rel="rel:Card">
                    <meta property="vcard:fn">
                      <xsl:value-of select="//div[@class='hresume']//span[contains(@class,'fn')]"/>
                    </meta>
                    <meta property="vcard:title">
                      <xsl:value-of select="//div[@class='hresume']//ul[@class='current']/li"/>
                    </meta>
                  </item>
                </adjunct>
              </adjunctcontainer>
            </xsl:template>
          </xsl:stylesheet>
  21. Prototyping with XSLT • What if I don’t have structured data? – I don’t own the site – I do own the site, but I want to prototype first • Build an XSLT custom data service first – Write some XSLT to extract the data and transform it into DataRSS – Mostly about finding the right XPath (use Firebug or XPather) – Quick to implement, but brittle – Can’t do a good Enhanced Result
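Since the extractor is "mostly about finding the right XPath", it can help to try the paths outside the platform first. The sketch below does that with Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath (no `contains()`), so the `fn` lookup uses an exact class match instead of the slide's `contains(@class,'fn')`. The sample page and the function name `extract_card` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A made-up, well-formed page shaped like the hResume markup the slide targets.
PAGE = """<html><body>
<div class="hresume">
  <div class="image"><img src="/images/jsmith.png" /></div>
  <span class="fn">Joe Smith</span>
  <ul class="current"><li>Chief Technical Monkey</li></ul>
</div>
</body></html>"""


def extract_card(xhtml):
    """Apply the slide's three XPath lookups and return the values by property."""
    root = ET.fromstring(xhtml)
    resume = ".//div[@class='hresume']"
    img = root.find(resume + "//div[@class='image']/img")
    return {
        "rel:Photo": img.get("src") if img is not None else "",
        "vcard:fn": root.findtext(resume + "//span[@class='fn']", default=""),
        "vcard:title": root.findtext(resume + "//ul[@class='current']/li", default=""),
    }


card = extract_card(PAGE)
empty = extract_card("<html><body><p>redesigned page</p></body></html>")
```

The second call shows the brittleness the slide warns about: when the site's template changes and the paths stop matching, every field silently comes back empty.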
  22. Approach #5: Call a Web Service
      • Generally not cached
        – too slow, infobar only
        – but good for dynamic data
      • Call a Remote Web Service
        – allows SearchMonkey apps to glue together
        – can handle OpenSearch XML natively

          <?xml version="1.0"?>
          <xsl:stylesheet xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
              xmlns:h="http://www.w3.org/1999/xhtml"
              xmlns:y="urn:yahoo:srch"
              xsi:schemaLocation="urn:yahoo:srch http://api.search.yahoo.com/SiteExplorerService/V1/PageDataResponse.xsd">
            <xsl:template match="/">
              <adjunctcontainer xmlns:my="http://example.com/ns/1.0">
                <adjunct id="smid:{$smid}" version="1.0">
                  <meta property="my:link1">
                    <xsl:value-of select="//y:Result[1]/y:Url"/>
                  </meta>
                  <meta property="my:result1">
                    <xsl:value-of select="//y:Result[1]/y:Title"/>
                  </meta>
                </adjunct>
              </adjunctcontainer>
            </xsl:template>
          </xsl:stylesheet>
  23. Creating an Infobar • Infobar advantages – Annotate someone else’s site – Use links and images from other domains • Mash up info from multiple sites • Affiliate / coupon links? Hmmm… – Can act on *, all websites • But these apps can be annoying if poorly designed • Key design principles – Put something useful in the summary – Be creative with the HTML
  24. Resources • Main: – http://developer.yahoo.com/searchmonkey • Lists and forums: – searchmonkey-developers@yahoogroups.com – http://suggestions.yahoo.com/searchmonkey • RDF and Microformats: – http://microformats.org – http://www.w3.org/TR/xhtml-rdfa-primer/
  25. Do it for real • Demo
  26. Ninja Coding Techniques: Enter the Monkey
  27. Typical SearchMonkey PHP code

          $ret['title'] = Data::get('com.yahoo.uf.hresume/dc:subject/resume:contact/vcard:title');
          // Image
          $ret['image']['src'] = Data::get('com.yahoo.uf.hcard/rel:Card/vcard:photo/@resource');
          $ret['image']['alt'] = SMDEFAULT;
          $ret['image']['title'] = SMDEFAULT;
          $ret['image']['allowResize'] = true;
          // Key Value pairs - up to 4
          $ret['dict'][0]['key'] = "Affiliation";
          $ret['dict'][0]['value'] = Data::get('com.yahoo.uf.hresume/resume:affiliation/vcard:org/vcard:organization-name');
          $ret['dict'][1]['key'] = "Contact";
          $ret['dict'][1]['value'] = Data::get('com.yahoo.uf.hresume/dc:subject/resume:contact/@resource');
  28. Your first mistake may be your last!
  29. True ninjas leave no room for error

          // Get the list of businesses. If we
          // get at least one, extract the
          // address and telephone number
          $appNodeList = Data::xpath("/*/adjunct/item[@rel='rel:Listing']");
          $yd = $appNodeList->item(0);
          $adr = $tel = "";
          $nodeList = Data::xpath("item[@rel='rel:Business']", $yd);
          if ($nodeList->length != 0) {
              $nd = $nodeList->item(0);
              $adr = Data::xpathString("meta[@property='vcard:adr']", $nd);
              $tel = Data::xpathString("meta[@property='vcard:tel']", $nd);
          }
          if ($r_rating != "") {
              $ratingstr = Data::getStarsFromNum($r_rating);
              if ($r_summary != "") {
                  $ratingstr = $ratingstr . " " . $r_summary;
          …
  30. Useful conditional tricks • Check for empty data like this: – if ('' == trim($var)) • Watch out for $a.'-'.$b.'-'.$c – What happens if these variables are empty? • You can create helper functions! – getOutput() must return an array, but there’s no reason not to create other functions – Call using self::function() instead of just function()
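The "what happens if these variables are empty?" trap on slide 30 is language-independent, so it can be shown in Python as well. In this sketch `join_nonempty` is a hypothetical helper, not part of the SearchMonkey API: it drops empty or whitespace-only parts before joining, so no stray separators appear.

```python
def join_nonempty(parts, sep="-"):
    """Join only the parts that contain something other than whitespace."""
    cleaned = [p.strip() for p in parts if p is not None and p.strip()]
    return sep.join(cleaned)


# Naive concatenation keeps the separators even when the middle value is missing.
naive = "-".join(["408", "", "1234"])

# The helper skips the empty part instead.
safe = join_nonempty(["408", "", "1234"])
```

In a SearchMonkey app the same idea would live in a small helper method next to `getOutput()`, which the slide notes is allowed.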
  31. Development (test, debug, collaborate) • Your two best friends: input and output • Collaborative development – Create a shared Y!ID for your organization – Export and import apps from the dashboard • Bellwethers – Start with just one or two, for simplicity – Once app is working, hit “autofind” and look at all ten, see what breaks – Always set the #1 bellwether to something that’s high-ranking; that’s your Gallery preview
  32. Image Helper Functions • Data::getStars(string $data_get_path) – e.g. Data::getStars("smid:Jk8/review:rating") • Data::getStarsFromNum(float $rating) – Must scale $rating to fall between 0-5 inclusive • Data::getImage(string $name) – Adds icons to your app • Data::getImage("information") • Data::getImage("email") • Data::getImage("edit") • …
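Slide 32 says the rating passed to `Data::getStarsFromNum()` must fall between 0 and 5 inclusive. A minimal sketch of that scaling step follows, assuming the source site rates on a different scale (say out of 10 or 100); `scale_rating` is a hypothetical helper, not a SearchMonkey function.

```python
def scale_rating(value, source_max, target_max=5.0):
    """Rescale a rating from 0..source_max onto 0..target_max, clamping overshoot."""
    if source_max <= 0:
        raise ValueError("source_max must be positive")
    scaled = value * target_max / source_max
    # Clamp so that bad upstream data can never leave the accepted range.
    return max(0.0, min(target_max, scaled))
```

For example, a score of 8 out of 10 becomes 4.0 stars, and an out-of-range value such as 120 on a 100-point scale is clamped to 5.0 rather than passed through.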
  33. XML functions • NodeList Data::xpath(string $query [, DOMNode $contextnode]) – More complicated than Data::get() – Can count, iterate, find children – Can fetch all vcard:fn, regardless where they are – Can find a node and grab 1st four children • string Data::xpathString(string $query [, DOMNode $contextnode]) – Convenience function if you don’t need to do further DOM manipulation
  34. Infobar Design: Party like it’s 1999 • Sadly, can’t use CSS – and the default stylesheet strips off most style – thus lists won’t even display bullets or numbers, you have to fake this • Layout: use tables (remember tables?) • Fonts: can use <font color>, <font face>, <big>, <small> • Make good use of images and links • PRO TIP: Use PHP HEREDOC (<<<)
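One way to "fake" the bullets that slide 34 says the default stylesheet strips is to lay the list out as a two-column table. The sketch below builds such markup in Python purely for illustration (a real app would build the string in PHP, for example with HEREDOC). `fake_bullet_list` is a hypothetical helper, and whether attributes such as `valign` survive the Infobar stylesheet is an assumption to check in the developer tool.

```python
from html import escape


def fake_bullet_list(items):
    """Render items as table rows with a literal bullet in the first column."""
    rows = "".join(
        f'<tr><td valign="top">&bull;</td><td>{escape(item)}</td></tr>'
        for item in items
    )
    return f"<table>{rows}</table>"


markup = fake_bullet_list(["Open daily", "Cash & cards accepted"])
```

Escaping each item matters here because Infobar text often comes from scraped pages or web services rather than from the app author.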
  35. Let Infobars be Infobars • Make use of the real estate
  36. Let Infobars be Infobars • Or be minimal • But don’t do an Infobar that’s really just an Enhanced Result in disguise – Use the blob and summary – Don’t use the thumbnail, key/value pairs, …
  37. Triggering on * • This can be annoying for general audiences – but it’s hard to abort an infobar before 50ms – and you can’t do this in the PHP layer if you depend on an extractor or web service – Data has to be provided by a feed or by structured markup • For specialized audiences a “*” infobar might be ok
  38. Triggering on *
  39. Triggering on * • Trigger on structured markup – Ex: Creative Commons Infobar • Use feeds to annotate the URLs you want • Instead of *, do a comma-separated list of sites: – www.uiuc.edu/*, www.stanford.edu/*, www.berkeley.edu/*, www.cmu.edu/*, …
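The difference between triggering on `*` and on a comma-separated list of sites can be tried out with shell-style wildcard matching. This is a sketch only: SearchMonkey's real matching rules are not described on the slides, so `parse_triggers` and `is_triggered` are hypothetical, and the choice to ignore the URL scheme is an assumption.

```python
from fnmatch import fnmatchcase


def parse_triggers(spec):
    """Split a comma-separated trigger list into individual patterns."""
    return [p.strip() for p in spec.split(",") if p.strip()]


def is_triggered(url, patterns):
    """Return True if the URL (without its scheme) matches any pattern."""
    bare = url.split("://", 1)[-1]
    return any(fnmatchcase(bare, pattern) for pattern in patterns)


campus = parse_triggers(
    "www.uiuc.edu/*, www.stanford.edu/*, www.berkeley.edu/*, www.cmu.edu/*"
)
```

With `campus`, a Stanford page matches and an unrelated site does not, whereas the single pattern `*` matches every result, which is exactly why the slides call it annoying for general audiences.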
  40. XSLT Extractors • Use the Firebug extension for Firefox – And XPather, an extension for Firefox • Typical pattern: a skeleton of DataRSS, into which you plug some XPath – For more complex XSL: • Use <xsl:template> • <xsl:for-each> is clumsier • Find a good ID to cling to – Compare arxiv.org (easy) to acm.org (harder)
  41. Examples • Rubik’s Cube • VTA Bus • API Monkey • BugMeNot • RetailMeNot • Amazon
  42. Questions?
