Monkey with Yahoo! Search
SearchMonkey



        Presentation by:
        Paul Tarjan, Chief Technical Monkey
        (ptarjan@yahoo-inc.com)

        Online at:
        http://www.slideshare.net/ptarjan/searchmonkey-presentation




2 | http://developer.yahoo.com/searchmonkey
What is SearchMonkey?

                                an open platform for using structured data to build more
                                useful and relevant search results



  Before                                                    After




Enhanced Result: Zagat




        [Screenshot callouts: Image, Links, Key/Value Pairs or Abstract]




Infobar: Wikipedia Preview




        [Screenshot callouts: Summary, Blob]




Part of the puzzle




Vocabularies

        • Need to speak the same language
        • I like to see girls of that... caliber.
        • English, French, Spanish, Esperanto?
        • URLs to the rescue
               – Dublin Core (http://purl.org/dc/elements/1.1/)
               – Friend of a Friend (http://xmlns.com/foaf/0.1/)
               – XHTML Friends Network (http://gmpg.org/xfn/11/)
               – … (many more)
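
Because each vocabulary lives at a URL, a short prefixed name such as dc:title can be expanded into a globally unique identifier. A minimal Python sketch of that idea follows. The namespace URLs are the ones listed above; the prefixes (dc, foaf, xfn), the table name, and the expand helper are assumptions made for illustration and are not part of SearchMonkey.

```python
# Namespace URLs come from the slide; the short prefixes and the
# function name are illustrative assumptions.
VOCABULARIES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "xfn": "http://gmpg.org/xfn/11/",
}


def expand(prefixed_name: str) -> str:
    """Turn 'prefix:term' into a full URL using the table above."""
    prefix, _, term = prefixed_name.partition(":")
    if prefix not in VOCABULARIES or not term:
        raise ValueError(f"unknown vocabulary in {prefixed_name!r}")
    return VOCABULARIES[prefix] + term


if __name__ == "__main__":
    print(expand("dc:title"))
    print(expand("foaf:name"))
```

Two sites that both write dc:title then mean the same property, because both names expand to the same URL.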



Syntax

        • Nouns, Verbs, and Adjectives, oh my!
        • All phrases become lots of triples
        • (Subject, Verb / Adj. / Prep. / etc, Object)
        • Key / Value pairs ++
               – Everything is a URL or String
               – Subject doesn’t have to be the document




Syntax 2

        • Key / Value pair
               – Title = Awesome SearchMonkey Presentation
               – Homepage =
                 http://search.yahoo.com/searchmonkey
        • Triples
               – (self, http://purl.org/dc#title, “Awesome
                 SearchMonkey Presentation”)
               – (self, http://vcard#url,
                 http://search.yahoo.com/searchmonkey)
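
The step from key/value pairs to triples amounts to attaching a subject to every pair. A small Python sketch, assuming a Triple type and a pairs_to_triples helper that are invented here for illustration; the predicate URLs are copied from the slide, where they are shorthand rather than real vocabulary URLs.

```python
from typing import Dict, List, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def pairs_to_triples(pairs: Dict[str, str], subject: str = "self") -> List[Triple]:
    """Attach one subject to every key/value pair, yielding triples."""
    return [Triple(subject, key, value) for key, value in pairs.items()]


# Keys are the slide's shorthand predicate URLs.
page = {
    "http://purl.org/dc#title": "Awesome SearchMonkey Presentation",
    "http://vcard#url": "http://search.yahoo.com/searchmonkey",
}
triples = pairs_to_triples(page)

if __name__ == "__main__":
    for triple in triples:
        print(triple)
```

Passing a different subject argument covers the case where the statement is about something other than the current document.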



Decompose to triples

        • I like to eat red candy
               – (self, http://example.com/likeEating,
                 http://example.org/temp/redcandy)
               – (http://example.org/temp/redcandy,
                 http://example.com/isColored,
                 http://example.org/colors/red)
               – (http://example.org/temp/redcandy,
                 http://example.com/isInstanceOf,
                 http://example.org/food/candy)
        • Unnamed nodes are O.K.
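
Once a sentence is stored as triples, answering a question about it becomes a matter of following links. The sketch below holds the three triples from this slide in a plain Python list and walks two hops (self, then the candy, then its color). The helper names objects and liked_colors are assumptions for illustration only.

```python
# The three triples from the slide; "self" stands for the current document.
CANDY = "http://example.org/temp/redcandy"
TRIPLES = [
    ("self", "http://example.com/likeEating", CANDY),
    (CANDY, "http://example.com/isColored", "http://example.org/colors/red"),
    (CANDY, "http://example.com/isInstanceOf", "http://example.org/food/candy"),
]


def objects(subject, predicate, triples=TRIPLES):
    """Return every object stored for a (subject, predicate) pair."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def liked_colors(triples=TRIPLES):
    """Follow two hops: self -> thing eaten -> its color."""
    colors = []
    for thing in objects("self", "http://example.com/likeEating", triples):
        colors.extend(objects(thing, "http://example.com/isColored", triples))
    return colors


if __name__ == "__main__":
    print(liked_colors())
```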


How to get data to SearchMonkey?


                                               Humans see:
                                               • name
                                               • picture of a person
                                               • current job
                                               • industry, …

                                               Computers see:
                                               an undifferentiated
                                               blob of HTML

                                               Can we make
                                               computers smarter?

Artificial intelligence is hard. Plus …




How does it work?

    1     site owners/publishers share structured data with Yahoo!.
    2     site owners & third-party developers build SearchMonkey apps.
    3     consumers customize their search experience with Enhanced Results or Infobars.

    [Diagram labels: Acme.com’s Web Pages, RDF/Microformat Markup, Page Extraction, Index, Acme.com’s database, DataRSS feed, Web Services]




Innards of SearchMonkey

        • You build a web-service inside our
          framework
        • When a search page renders
               – We check which SM apps are enabled
               – We call them
                       • 50ms for in-page
                       • Long time for AJAX
               – They return data in our template
               – We render them (and cache)


Inside SM


        [Diagram labels: Developer, Developer, Publisher]




Data Sources: RDF and Microformats

     Name                         Cached       Open   Mode      Notes
     Yahoo! Index                 yes          yes    Passive   Old-School Y! Index data
     RDFa, eRDF                   yes          yes    Passive   Vocab + markup decoupled
     Microformats                 yes          yes    Passive   Vocab + markup coupled
     DataRSS feed                 yes          no     Active    Atom + metadata
     XSLT                         no           no     Active    Good for prototyping
     Web Service                  no           no     Active    Brings in remote data




Approach #1: Embedded RDF

 <?xml version="1.0" encoding="UTF-8"?>
 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
        "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
 <html xmlns="http://www.w3.org/1999/xhtml"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:foaf="http://xmlns.com/foaf/0.1/"
     lang="en" xml:lang="en">
 <head>
  <title>The Amazing Home Page of Joe Smith</title>
 </head>
 <body>
  <h1 property="dc:title">Joe's Home Page</h1>
  <div rel="foaf:maker">
   <h2 property="foaf:name">Joe Smith</h2>
   <div rel="foaf:depiction"
       resource="http://joesmith.org/images/jsmith.png">
     <img src="/images/jsmith.png"
            alt="Smiling headshot of Joe" />
     <p property="dc:rights">Creative Commons
        Attribution 3.0 Unported</p>
   </div>
  </div>
 …

        • Cached data
               – allows Enhanced Results
               – but not for dynamic data
        • Reuse existing markup
               – but requires site redesign
        • Open approach
               – everyone can use
        • Passive, crawled by Y!
               – less bureaucracy to set up
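
To make "passive, crawled" concrete, here is a rough Python sketch of how a crawler could pull property values out of markup like the sample above. It uses only the standard library's html.parser and is a deliberate simplification: it reads the property attribute and element text, and ignores rel, resource, nesting, and prefix resolution, all of which a real RDFa processor handles. The class name PropertyCollector and the shortened sample are assumptions for illustration.

```python
from html.parser import HTMLParser


class PropertyCollector(HTMLParser):
    """Collect (property, text) pairs from elements with a property attribute."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._open = None  # (tag, property name) of the element being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        prop = dict(attrs).get("property")
        if prop and self._open is None:
            self._open = (tag, prop)
            self._text = []

    def handle_data(self, data):
        if self._open is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open[0]:
            # Collapse line breaks and runs of spaces inside the value.
            text = " ".join("".join(self._text).split())
            self.pairs.append((self._open[1], text))
            self._open = None


# Shortened version of the markup on this slide.
SAMPLE = """
<h1 property="dc:title">Joe's Home Page</h1>
<div rel="foaf:maker">
  <h2 property="foaf:name">Joe Smith</h2>
  <p property="dc:rights">Creative Commons
     Attribution 3.0 Unported</p>
</div>
"""

collector = PropertyCollector()
collector.feed(SAMPLE)

if __name__ == "__main__":
    for pair in collector.pairs:
        print(pair)
```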

Approach #2: Embedded Microformats

<div id="hcard-Joe-Smith" class="vcard">
  <span class="fn">Joe Smith</span>
  <div class="adr">
     <div class="street-address">123 Murphy Avenue</div>
     <span class="locality">Sunnyvale</span>,
     <span class="region">California</span>
     <span class="postal-code">94086</span>
  </div>
  <div class="tel">(408) 555-1234</div>
</div>…

        • Cached data
               – allows Enhanced Results
               – but not for dynamic data
        • Reuse existing markup
               – but requires site redesign
        • Open approach
               – everyone can use
        • Passive, crawled by Y!
               – less bureaucracy to set up

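
With microformats the vocabulary is carried in class names, so reading an hCard reduces to matching classes. The Python sketch below does that with the standard library's html.parser. The field list, the HCardParser name, and the decision to read only the innermost element's text are simplifying assumptions; it also assumes every element is explicitly closed, as in the sample.

```python
from html.parser import HTMLParser

# hCard class names used in the sample on this slide.
FIELDS = {"fn", "street-address", "locality", "region", "postal-code", "tel"}


class HCardParser(HTMLParser):
    """Map hCard class names to the text of the element that carries them."""

    def __init__(self):
        super().__init__()
        self.card = {}
        self._stack = []  # class lists of the currently open elements

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        self._stack.append(classes)

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text or not self._stack:
            return
        # Only the innermost open element claims the text.
        for name in self._stack[-1]:
            if name in FIELDS:
                self.card[name] = text


HCARD = """
<div id="hcard-Joe-Smith" class="vcard">
  <span class="fn">Joe Smith</span>
  <div class="adr">
     <div class="street-address">123 Murphy Avenue</div>
     <span class="locality">Sunnyvale</span>,
     <span class="region">California</span>
     <span class="postal-code">94086</span>
  </div>
  <div class="tel">(408) 555-1234</div>
</div>
"""

parser = HCardParser()
parser.feed(HCARD)

if __name__ == "__main__":
    print(parser.card)
```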
Approach #3: DataRSS Feed

<?profile http://search.yahoo.com/searchmonkey-profile ?>
<feed xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.w3.org/2005/Atom ../xsd/datarss.xsd"
  xmlns:dc="http://purl.org/dc/terms/" xmlns="http://www.w3.org/2005/Atom"
  xmlns:commerce="http://search.yahoo.com/searchmonkey/commerce/"
  xmlns:y="http://search.yahoo.com/datarss/">
<id>http://local.yahoo.com/datarss/</id>
<author><name>Peter Mika (pmika@yahoo-inc.com)</name></author>
<title>Example data feed for Local</title>
<updated>2008-07-16T04:05:06+07:00</updated>
<entry>
 <title>Parcel 104</title>
 <id>http://local.yahoo.com/info-21583016-parcel-104-santa-clara</id>
 <updated>2008-07-16T04:05:06+07:00</updated>
 <content type="application/xml">
 <y:adjunct version="1.0" name="com.yahoo.local">
   <y:item rel="dc:subject">
     <y:type typeof="vcard:VCard commerce:Restaurant">
      <y:meta property="commerce:hoursOfOperation">
         Breakfast daily, Lunch Mon.-Fri., Dinner Mon.-Sat.

        • Cached data
               – allows Enhanced Results
               – but not for dynamic data
        • Generate feed from DB
               – and maintain afterwards
        • Closed approach
               – only Yahoo! gets data
        • Actively provide a feed
               – coord w/Yahoo! to set up
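
"Generate feed from DB" can be pictured as a loop over rows that emits one Atom entry per row. The Python sketch below builds a reduced feed with xml.etree.ElementTree. It is not a complete or validated DataRSS document: it keeps only title, id, content, y:adjunct, y:item, and one y:meta, and it leaves out the y:type element and the dc and commerce prefix declarations shown on the slide. The row keys (name, url, hours) and the build_feed function are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
Y = "http://search.yahoo.com/datarss/"

# Serialize Atom as the default namespace and DataRSS with the y: prefix.
ET.register_namespace("", ATOM)
ET.register_namespace("y", Y)


def build_feed(rows):
    """Build a reduced Atom feed with one entry and one y:adjunct per row."""
    feed = ET.Element(f"{{{ATOM}}}feed")
    for row in rows:
        entry = ET.SubElement(feed, f"{{{ATOM}}}entry")
        ET.SubElement(entry, f"{{{ATOM}}}title").text = row["name"]
        ET.SubElement(entry, f"{{{ATOM}}}id").text = row["url"]
        content = ET.SubElement(entry, f"{{{ATOM}}}content", type="application/xml")
        adjunct = ET.SubElement(
            content, f"{{{Y}}}adjunct", version="1.0", name="com.yahoo.local"
        )
        item = ET.SubElement(adjunct, f"{{{Y}}}item", rel="dc:subject")
        meta = ET.SubElement(
            item, f"{{{Y}}}meta", property="commerce:hoursOfOperation"
        )
        meta.text = row["hours"]
    return ET.tostring(feed, encoding="unicode")


# Stand-in for rows read from a database; values are taken from the slide.
ROWS = [
    {
        "name": "Parcel 104",
        "url": "http://local.yahoo.com/info-21583016-parcel-104-santa-clara",
        "hours": "Breakfast daily, Lunch Mon.-Fri., Dinner Mon.-Sat.",
    }
]

if __name__ == "__main__":
    print(build_feed(ROWS))
```

Because the feed is generated, it has to be regenerated whenever the underlying rows change, which is the maintenance cost the slide points out.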

Approach #4: Extract with XSLT

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="/">
  <adjunctcontainer>
    <adjunct id="smid:{$smid}" version="1.0">
      <item rel="rel:Photo"
          resource="{//div[@class='hresume']//div[@class='image']/img/@src}"/>
      <item rel="rel:Card">
        <meta property="vcard:fn">
           <xsl:value-of select="//div[@class='hresume']//span[contains(@class,'fn')]"/>
        </meta>
        <meta property="vcard:title">
           <xsl:value-of select="//div[@class='hresume']//ul[@class='current']/li"/>
        </meta>
      </item>
    </adjunct>
  </adjunctcontainer>
</xsl:template>
</xsl:stylesheet>

        • Generally not cached
               – too slow, infobar only
               – but good for dynamic data
        • Scrape page with XSLT
               – operates on cleaned up version of the DOM
               – watch out for template changes
        • Easy to prototype

Prototyping with XSLT

        • What if I don’t have structured data?
               – I don’t own the site
               – I do own the site, but I want to prototype first
        • Build an XSLT custom data service first
               – Write some XSLT to extract the data and
                 transform it into DataRSS
               – Mostly about finding the right XPath (use
                 Firebug or XPather)
               – Quick to implement, but brittle
               – Can’t do a good Enhanced Result
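
Since most of the work is finding the right XPath, it can help to try expressions against a saved, cleaned-up copy of the page before writing any XSLT. The Python sketch below does this with xml.etree.ElementTree, which supports only a subset of XPath: attribute-equality predicates work, but contains() does not, so the fn lookup uses an exact class match instead. The sample page, the output keys, and the extract function are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Invented, well-formed stand-in for a cleaned-up resume page.
PAGE = """
<html><body>
  <div class="hresume">
    <div class="image"><img src="/photos/joe.png"/></div>
    <span class="fn">Joe Smith</span>
    <ul class="current"><li>Chief Technical Monkey</li></ul>
  </div>
</body></html>
"""


def text_of(element):
    """Return stripped element text, or an empty string if it is missing."""
    if element is None or element.text is None:
        return ""
    return element.text.strip()


def extract(page_xml):
    """Pull photo, name, and current title out of an hResume-like page."""
    root = ET.fromstring(page_xml.strip())
    resume = root.find(".//div[@class='hresume']")
    if resume is None:
        return {}
    photo = resume.find(".//div[@class='image']/img")
    return {
        "rel:Photo": photo.get("src", "") if photo is not None else "",
        "vcard:fn": text_of(resume.find(".//span[@class='fn']")),
        "vcard:title": text_of(resume.find(".//ul[@class='current']/li")),
    }


if __name__ == "__main__":
    print(extract(PAGE))
```

The brittleness mentioned above shows up directly: renaming the hresume class or moving the list breaks every path at once.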

Approach #5: Call a Web Service

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
           xmlns:h="http://www.w3.org/1999/xhtml"
           xmlns:y="urn:yahoo:srch"
           xsi:schemaLocation="urn:yahoo:srch
           http://api.search.yahoo.com/SiteExplorerService/V1/PageDataResponse.xsd">
<xsl:template match="/">
  <adjunctcontainer xmlns:my="http://example.com/ns/1.0">
    <adjunct id="smid:{$smid}" version="1.0">
        <meta property="my:link1">
           <xsl:value-of select="//y:Result[1]/y:Url"/>
        </meta>
        <meta property="my:result1">
          <xsl:value-of select="//y:Result[1]/y:Title"/>
        </meta>
    </adjunct>
  </adjunctcontainer>
</xsl:template>
</xsl:stylesheet>

        • Generally not cached
               – too slow, infobar only
               – but good for dynamic data
        • Call a Remote Web Service
               – allows SearchMonkey apps to glue together
               – can handle OpenSearch XML natively
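
The two XPath selections in the stylesheet (first Result's Url and Title) can be checked offline against a captured response. The Python sketch below mirrors them with xml.etree.ElementTree. The response body is made up for illustration: only the urn:yahoo:srch namespace and the Result, Url, and Title element names come from the slide, and the ResultSet root and first_result function are assumptions.

```python
import xml.etree.ElementTree as ET

NS = {"y": "urn:yahoo:srch"}

# Invented response; real service output may differ in structure.
RESPONSE = """<ResultSet xmlns="urn:yahoo:srch">
  <Result><Title>Example page</Title><Url>http://example.com/a</Url></Result>
  <Result><Title>Other page</Title><Url>http://example.com/b</Url></Result>
</ResultSet>"""


def first_result(xml_text):
    """Return the Url and Title of the first Result, keyed like the slide's meta properties."""
    root = ET.fromstring(xml_text)
    result = root.find(".//y:Result", NS)
    if result is None:
        return {}
    return {
        "my:link1": result.findtext("y:Url", default="", namespaces=NS),
        "my:result1": result.findtext("y:Title", default="", namespaces=NS),
    }


if __name__ == "__main__":
    print(first_result(RESPONSE))
```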

Creating an Infobar

        • Infobar advantages
               – Annotate someone else’s site
               – Use links and images from other domains
                       • Mash up info from multiple sites
                       • Affiliate / coupon links? Hmmm…
               – Can act on *, all websites
                       • But these apps can be annoying if poorly designed

        • Key design principles
               – Put something useful in the summary
               – Be creative with the HTML

Resources

        • Main:
               – http://developer.yahoo.com/searchmonkey
        • Lists and forums:
               – searchmonkey-developers@yahoogroups.com
               – http://suggestions.yahoo.com/searchmonkey
        • RDF and Microformats:
               – http://microformats.org
               – http://www.w3.org/TR/xhtml-rdfa-primer/



Do it for real

        • Demo




Ninja Coding Techniques: Enter the Monkey

Typical SearchMonkey PHP code

        $ret['title'] = Data::get('com.yahoo.uf.hresume/dc:subject/resume:contact/vcard:title');


        // Image
        $ret['image']['src'] = Data::get('com.yahoo.uf.hcard/rel:Card/vcard:photo/@resource');
        $ret['image']['alt'] = SMDEFAULT;
        $ret['image']['title'] = SMDEFAULT;
        $ret['image']['allowResize'] = true;


        // Key Value pairs - up to 4
        $ret['dict'][0]['key'] = "Affiliation";
        $ret['dict'][0]['value'] =
            Data::get('com.yahoo.uf.hresume/resume:affiliation/vcard:org/vcard:organization-name');
        $ret['dict'][1]['key'] = "Contact";
        $ret['dict'][1]['value'] = Data::get('com.yahoo.uf.hresume/dc:subject/resume:contact/@resource');




Your first mistake may be your last!




True ninjas leave no room for error

// Get the list of businesses. If we
// get at least one, extract the
// address and telephone number
$appNodeList = Data::xpath("/*/adjunct/item[@rel='rel:Listing']");
$yd = $appNodeList->item(0);
$adr = $tel = "";
$nodeList = Data::xpath("item[@rel='rel:Business']", $yd);
if ($nodeList->length != 0) {
    $nd = $nodeList->item(0);

    $adr = Data::xpathString("meta[@property='vcard:adr']", $nd);
    $tel = Data::xpathString("meta[@property='vcard:tel']", $nd);
}
if ($r_rating != "") {
    $ratingstr = Data::getStarsFromNum($r_rating);
    if ($r_summary != "") {
        $ratingstr = $ratingstr . " " . $r_summary;


Useful conditional tricks

        • Check for empty data like this:
               – if ('' == trim($var))
        • Watch out for $a.'-'.$b.'-'.$c
               – What happens if these variables are empty?
        • You can create helper functions!
               – getOutput() must return an array, but there’s no
                 reason not to create other functions
               – Call using self::function() instead of just
                 function()
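
The concatenation problem is language-independent: if any piece is empty, naive joining leaves stray separators such as "408--1234" or a lone "-". One possible helper, sketched here in Python rather than the PHP the apps are written in, keeps only the pieces that survive the trim check. The function names is_empty and join_nonempty are assumptions for illustration.

```python
def is_empty(value):
    """Mirror the slide's check: empty after trimming whitespace."""
    return value is None or str(value).strip() == ""


def join_nonempty(parts, separator="-"):
    """Join only the parts that carry data, so no stray separators appear."""
    return separator.join(str(p).strip() for p in parts if not is_empty(p))


if __name__ == "__main__":
    print(join_nonempty(["408", "555", "1234"]))
    print(join_nonempty(["408", "", "1234"]))
    print(repr(join_nonempty(["", "  ", None])))
```

The same idea ports to a PHP helper method called through self::, as the last bullet suggests.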


Development (test, debug, collaborate)

        • Your two best friends: input and output
        • Collaborative development
               – Create a shared Y!ID for your organization
               – Export and import apps from the dashboard
        • Bellwethers
               – Start with just one or two, for simplicity
               – Once app is working, hit “autofind” and look at
                 all ten, see what breaks
               – Always set the #1 bellwether to something that’s
                 high-ranking; that’s your Gallery preview

Image Helper Functions

        • Data::getStars(string $data_get_path)
               – e.g. Data::getStars("smid:Jk8/review:rating")
        • Data::getStarsFromNum(float $rating)
               – Must scale $rating to fall between 0-5 inclusive
        • Data::getImage(string $name)
               – Adds icons to your app
                       • Data::getImage("information")
                       • Data::getImage("email")
                       • Data::getImage("edit")
                       •…
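
Ratings from other sites often use a different scale (out of 10, out of 100), so they need rescaling before being passed to a 0-5 star helper. A small Python sketch of that arithmetic, assuming a linear mapping and clamping out-of-range input; the scale_rating name and its parameters are illustrative and not part of the SearchMonkey API.

```python
def scale_rating(value, source_max, target_max=5.0):
    """Map a rating on 0..source_max onto 0..target_max, clamped to that range."""
    if source_max <= 0:
        raise ValueError("source_max must be positive")
    scaled = float(value) * target_max / source_max
    return max(0.0, min(target_max, scaled))


if __name__ == "__main__":
    print(scale_rating(8, 10))     # a score out of 10
    print(scale_rating(50, 100))   # a percentage
    print(scale_rating(120, 100))  # out-of-range input is clamped
```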

XML functions

        • NodeList Data::xpath(string $query [,
          DOMNode $contextnode])
               – More complicated than Data::get()
               – Can count, iterate, find children
               – Can fetch all vcard:fn, regardless where they are
               – Can find a node and grab 1st four children
        • string Data::xpathString(string $query [,
          DOMNode $contextnode])
               – Convenience function if you don’t need to do
                 further DOM manipulation

Infobar Design: Party like it’s 1999

        • Sadly, can’t use CSS
               – and the default stylesheet strips off most style
               – thus lists won’t even display bullets or numbers,
                 you have to fake this
        • Layout: use tables (remember tables?)
        • Fonts: can use <font color>, <font face>,
          <big>, <small>
        • Make good use of images and links
        • PRO TIP: Use PHP HEREDOC (<<<)

Let Infobars be Infobars

        • Make use of the real estate




Let Infobars be Infobars

        • Or be minimal




        • But don’t do an Infobar that’s really just an
          Enhanced Result in disguise
               – Use the blob and summary
               – Don’t use the thumbnail, key/value pairs, …



Triggering on *

        • This can be annoying for general audiences
               – but it’s hard to abort an infobar before 50ms
               – and you can’t do this in the PHP layer if you
                 depend on an extractor or web service
               – Data has to be provided by a feed or by
                 structured markup
        • For specialized audiences a “*” infobar might
          be ok




Triggering on *




Triggering on *

        • Trigger on structured markup
               – Ex: Creative Commons Infobar
        • Use feeds to annotate the URLs you want
        • Instead of *, do a comma-separated list of
          sites:
               – www.uiuc.edu/*, www.stanford.edu/*,
                 www.berkeley.edu/*, www.cmu.edu/*, …
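
The effect of replacing * with a site list can be previewed with simple wildcard matching. The Python sketch below checks a URL against a comma-separated pattern list using fnmatch from the standard library. How SearchMonkey itself evaluates trigger patterns is not described here, so the matching rules (scheme ignored, host lowercased, shell-style wildcards) and the function names are assumptions.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# The first four patterns from the slide.
TRIGGERS = "www.uiuc.edu/*, www.stanford.edu/*, www.berkeley.edu/*, www.cmu.edu/*"


def parse_patterns(trigger_list):
    """Split a comma-separated trigger string into clean patterns."""
    return [p.strip() for p in trigger_list.split(",") if p.strip()]


def should_trigger(url, patterns):
    """Return True if host + path of the URL matches any pattern."""
    parts = urlsplit(url)
    target = parts.netloc.lower() + (parts.path or "/")
    return any(fnmatchcase(target, pattern) for pattern in patterns)


if __name__ == "__main__":
    patterns = parse_patterns(TRIGGERS)
    print(should_trigger("http://www.stanford.edu/dept/cs", patterns))
    print(should_trigger("http://www.example.com/", patterns))
```

A bare "*" pattern matches every URL in this sketch, which is exactly the behavior the earlier slides warn can annoy general audiences.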




XSLT Extractors

        • Use the Firebug extension for Firefox
               – And XPather, an extension for Firefox
        • Typical pattern: a skeleton of DataRSS, into
          which you plug some XPath
               – For more complex XSL:
                       • Use <xsl:template>
                       • <xsl:for-each> is clumsier

        • Find a good ID to cling to
               – Compare arxiv.org (easy) to acm.org (harder)


Examples



       • Rubik’s Cube
       • VTA Bus
       • API Monkey
       • BugMeNot
       • RetailMeNot
       • Amazon


questions?





More Related Content

What's hot

Best practices in museum search
 Best practices in museum search Best practices in museum search
Best practices in museum search
Nate Solas
 
How I learned to stop worrying and love the .htaccess file
How I learned to stop worrying and love the .htaccess fileHow I learned to stop worrying and love the .htaccess file
How I learned to stop worrying and love the .htaccess file
Roxana Stingu
 
YQL talk at OHD Jakarta
YQL talk at OHD JakartaYQL talk at OHD Jakarta
YQL talk at OHD Jakarta
Michael Smith Jr.
 
Moving from Web 1.0 to Web 2.0
Moving from Web 1.0 to Web 2.0Moving from Web 1.0 to Web 2.0
Moving from Web 1.0 to Web 2.0
Estelle Weyl
 
Accelerated Mobile - Beyond AMP
Accelerated Mobile - Beyond AMPAccelerated Mobile - Beyond AMP
Accelerated Mobile - Beyond AMP
Jono Alderson
 
Challenges of building a search engine like web rendering service
Challenges of building a search engine like web rendering serviceChallenges of building a search engine like web rendering service
Challenges of building a search engine like web rendering service
Giacomo Zecchini
 
Introducing YUI
Introducing YUIIntroducing YUI
Introducing YUI
Christian Heilmann
 
Finding things on the web with BOSS
Finding things on the web with BOSSFinding things on the web with BOSS
Finding things on the web with BOSS
Christian Heilmann
 
Html 5 in a big nutshell
Html 5 in a big nutshellHtml 5 in a big nutshell
Html 5 in a big nutshell
Lennart Schoors
 
Getting More Traffic From Search Advanced Seo For Developers Presentation
Getting More Traffic From Search  Advanced Seo For Developers PresentationGetting More Traffic From Search  Advanced Seo For Developers Presentation
Getting More Traffic From Search Advanced Seo For Developers Presentation
Seo Indonesia
 
PrettyFaces: SEO, Dynamic, Parameters, Bookmarks, Navigation for JSF / JSF2 (...
PrettyFaces: SEO, Dynamic, Parameters, Bookmarks, Navigation for JSF / JSF2 (...PrettyFaces: SEO, Dynamic, Parameters, Bookmarks, Navigation for JSF / JSF2 (...
PrettyFaces: SEO, Dynamic, Parameters, Bookmarks, Navigation for JSF / JSF2 (...
Lincoln III
 
Security panel-western-mass-drupal-camp
Security panel-western-mass-drupal-campSecurity panel-western-mass-drupal-camp
Security panel-western-mass-drupal-campcwworks
 
Extreme APIs for a better tomorrow
Extreme APIs for a better tomorrowExtreme APIs for a better tomorrow
Extreme APIs for a better tomorrow
Aaron Maturen
 
Fast by Default
Fast by DefaultFast by Default
Fast by Default
Abhay Kumar
 
Search engine optimization
Search engine optimizationSearch engine optimization
Search engine optimization
Tommi Forsström
 
Browser Changes That Will Impact SEO From 2019-2020
Browser Changes That Will Impact SEO From 2019-2020Browser Changes That Will Impact SEO From 2019-2020
Browser Changes That Will Impact SEO From 2019-2020
Tom Anthony
 
Httpsmaindroneyuk.blogspot.com
Httpsmaindroneyuk.blogspot.comHttpsmaindroneyuk.blogspot.com
Httpsmaindroneyuk.blogspot.com
jangunglahokey
 

What's hot (20)

Best practices in museum search
 Best practices in museum search Best practices in museum search
Best practices in museum search
 
How I learned to stop worrying and love the .htaccess file
How I learned to stop worrying and love the .htaccess fileHow I learned to stop worrying and love the .htaccess file
How I learned to stop worrying and love the .htaccess file
 
YQL talk at OHD Jakarta
YQL talk at OHD JakartaYQL talk at OHD Jakarta
YQL talk at OHD Jakarta
 
Html by tanbircox
Html by tanbircoxHtml by tanbircox
Html by tanbircox
 
Moving from Web 1.0 to Web 2.0
Moving from Web 1.0 to Web 2.0Moving from Web 1.0 to Web 2.0
Moving from Web 1.0 to Web 2.0
 
Accelerated Mobile - Beyond AMP
Accelerated Mobile - Beyond AMPAccelerated Mobile - Beyond AMP
Accelerated Mobile - Beyond AMP
 
Css by tanbircox
Css by tanbircoxCss by tanbircox
Css by tanbircox
 
Challenges of building a search engine like web rendering service
Challenges of building a search engine like web rendering serviceChallenges of building a search engine like web rendering service
Challenges of building a search engine like web rendering service
 
Web programming & design using internet by tanbircox
Web programming & design using internet by tanbircoxWeb programming & design using internet by tanbircox
Web programming & design using internet by tanbircox
 
Introducing YUI
Introducing YUIIntroducing YUI
Introducing YUI
 
Finding things on the web with BOSS
Finding things on the web with BOSSFinding things on the web with BOSS
Finding things on the web with BOSS
 
Html 5 in a big nutshell
Html 5 in a big nutshellHtml 5 in a big nutshell
Html 5 in a big nutshell
 
Getting More Traffic From Search Advanced Seo For Developers Presentation
Getting More Traffic From Search  Advanced Seo For Developers PresentationGetting More Traffic From Search  Advanced Seo For Developers Presentation
Getting More Traffic From Search Advanced Seo For Developers Presentation
 
PrettyFaces: SEO, Dynamic, Parameters, Bookmarks, Navigation for JSF / JSF2 (...
PrettyFaces: SEO, Dynamic, Parameters, Bookmarks, Navigation for JSF / JSF2 (...PrettyFaces: SEO, Dynamic, Parameters, Bookmarks, Navigation for JSF / JSF2 (...
PrettyFaces: SEO, Dynamic, Parameters, Bookmarks, Navigation for JSF / JSF2 (...
 
Security panel-western-mass-drupal-camp
Security panel-western-mass-drupal-campSecurity panel-western-mass-drupal-camp
Security panel-western-mass-drupal-camp
 
Extreme APIs for a better tomorrow
Extreme APIs for a better tomorrowExtreme APIs for a better tomorrow
Extreme APIs for a better tomorrow
 
Fast by Default
Fast by DefaultFast by Default
Fast by Default
 
Search engine optimization
Search engine optimizationSearch engine optimization
Search engine optimization
 
Browser Changes That Will Impact SEO From 2019-2020
Browser Changes That Will Impact SEO From 2019-2020Browser Changes That Will Impact SEO From 2019-2020
Browser Changes That Will Impact SEO From 2019-2020
 
Httpsmaindroneyuk.blogspot.com
Httpsmaindroneyuk.blogspot.comHttpsmaindroneyuk.blogspot.com
Httpsmaindroneyuk.blogspot.com
 

Viewers also liked

Soleus Audio Manager Help
Soleus Audio Manager HelpSoleus Audio Manager Help
Soleus Audio Manager HelpChris CHOU
 
Yahoo Developer Network overview
Yahoo Developer Network overviewYahoo Developer Network overview
Yahoo Developer Network overview
Christian Heilmann
 
Hadoop Jute Record Python
Hadoop Jute Record PythonHadoop Jute Record Python
Hadoop Jute Record Python
Paul Tarjan
 
Semantic Searchmonkey
Semantic SearchmonkeySemantic Searchmonkey
Semantic Searchmonkey
Paul Tarjan
 
Hands on Hadoop
Hands on HadoopHands on Hadoop
Hands on Hadoop
Paul Tarjan
 
Promoting Excellence Network - Graduate Attributes at CQUniversity Australia
Promoting Excellence Network - Graduate Attributes at CQUniversity AustraliaPromoting Excellence Network - Graduate Attributes at CQUniversity Australia
Promoting Excellence Network - Graduate Attributes at CQUniversity Australia
Damien Clark
 

Viewers also liked (6)

Soleus Audio Manager Help
Soleus Audio Manager HelpSoleus Audio Manager Help
Soleus Audio Manager Help
 
Yahoo Developer Network overview
Yahoo Developer Network overviewYahoo Developer Network overview
Yahoo Developer Network overview
 
Hadoop Jute Record Python
Hadoop Jute Record PythonHadoop Jute Record Python
Hadoop Jute Record Python
 
Semantic Searchmonkey
Semantic SearchmonkeySemantic Searchmonkey
Semantic Searchmonkey
 
Hands on Hadoop
Hands on HadoopHands on Hadoop
Hands on Hadoop
 
Promoting Excellence Network - Graduate Attributes at CQUniversity Australia
Promoting Excellence Network - Graduate Attributes at CQUniversity AustraliaPromoting Excellence Network - Graduate Attributes at CQUniversity Australia
Promoting Excellence Network - Graduate Attributes at CQUniversity Australia
 

Similar to SearchMonkey

Google Devfest Singapore - OpenSocial
Google Devfest Singapore - OpenSocialGoogle Devfest Singapore - OpenSocial
Google Devfest Singapore - OpenSocial
Patrick Chanezon
 
Intro To Django
Intro To DjangoIntro To Django
Intro To Django
Udi Bauman
 
More Secrets of JavaScript Libraries
More Secrets of JavaScript LibrariesMore Secrets of JavaScript Libraries
More Secrets of JavaScript Libraries
jeresig
 
Ajax to the Moon
Ajax to the MoonAjax to the Moon
Ajax to the Moon
davejohnson
 
Lessons Learned - Building YDN
Lessons Learned - Building YDNLessons Learned - Building YDN
SearchMonkey

  • 2. SearchMonkey Presentation by: Paul Tarjan, Chief Technical Monkey (ptarjan@yahoo-inc.com) Online at: http://www.slideshare.net/ptarjan/searchmonkey-presentation
  • 3. What is SearchMonkey? An open platform for using structured data to build more useful and relevant search results. [Screenshots: Before, After]
  • 4. Enhanced Result: Zagat [Screenshot labels: Image, Links, Key/Value Pairs or Abstract]
  • 5. Infobar: Wikipedia Preview [Screenshot labels: Summary, Blob]
  • 6. Part of the puzzle
  • 7. Vocabularies • Need to speak the same language • I like to see girls of that... caliber. • English, French, Spanish, Esperanto? • URLs to the rescue – Dublin Core (http://purl.org/dc/elements/1.1/) – Friend of a Friend (http://xmlns.com/foaf/0.1/) – XHTML Friends Network (http://gmpg.org/xfn/11/) – … (many more)
  • 8. Syntax • Nouns, Verbs, and Adjectives, oh my! • All phrases become lots of triples • (Subject, Verb / Adj. / Prep. / etc., Object) • Key / Value pairs ++ – Everything is a URL or String – Subject doesn’t have to be the document
  • 9. Syntax 2 • Key / Value pair – Title = Awesome SearchMonkey Presentation – Homepage = http://search.yahoo.com/searchmonkey • Triples – (self, http://purl.org/dc#title, “Awesome SearchMonkey Presentation”) – (self, http://vcard#url, http://search.yahoo.com/searchmonkey)
  • 10. Decompose to triples • I like to eat red candy – (self, http://example.com/likeEating, http://example.org/temp/redcandy) – (http://example.org/temp/redcandy, http://example.com/isColored, http://example.org/colors/red) – (http://example.org/temp/redcandy, http://example.com/isInstanceOf, http://example.org/food/candy) • Unnamed nodes are O.K.
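The decomposition of "I like to eat red candy" can be sketched in a few lines of Python. This is only an illustration under stated assumptions: triples are modeled as plain tuples, `SELF` is a hypothetical stand-in URL for "self", and the helpers `objects_of` and `describe` are made up for this example. None of it is part of SearchMonkey.

```python
# Triples as (subject, predicate, object) tuples; the URLs come from the slide.
SELF = "http://example.org/self"  # hypothetical stand-in for "self"
CANDY = "http://example.org/temp/redcandy"

TRIPLES = [
    (SELF, "http://example.com/likeEating", CANDY),
    (CANDY, "http://example.com/isColored", "http://example.org/colors/red"),
    (CANDY, "http://example.com/isInstanceOf", "http://example.org/food/candy"),
]


def objects_of(triples, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def describe(triples, subject):
    """Collect the predicate -> object pairs known about one subject."""
    return {p: o for s, p, o in triples if s == subject}
```

Note that the subject of the last two triples is the candy node rather than the document, which is the sense in which the subject does not have to be the page itself.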
  • 11. How to get data to SearchMonkey? Humans see: • name • picture of a person • current job • industry, … Computers see: an undifferentiated blob of HTML. Can we make computers smarter?
  • 12. Artificial intelligence is hard. Plus …
  • 13. How does it work? 1. Site owners/publishers share structured data with Yahoo!. 2. Site owners & third-party developers build SearchMonkey apps. 3. Consumers customize their search experience with Enhanced Results or Infobars. [Diagram labels: Page Extraction, RDF/Microformat Markup, Acme.com’s Web Pages, Index, DataRSS feed, Web Services, Acme.com’s database]
  • 14. Innards of SearchMonkey • You build a web-service inside our framework • When a search page renders – We check which SM apps are enabled – We call them • 50ms for in-page • Long time for AJAX – They return data in our template – We render them (and cache)
  • 15. Inside SM [Diagram labels: Developer, Developer, Publisher]
  • 16. Data Sources: RDF and Microformats
      Name          Cached  Open  Mode     Notes
      Yahoo! Index  yes     yes   Passive  Old-School Y! Index data
      RDFa, eRDF    yes     yes   Passive  Vocab + markup decoupled
      Microformats  yes     yes   Passive  Vocab + markup coupled
      DataRSS feed  yes     no    Active   Atom + metadata
      XSLT          no      no    Active   Good for prototyping
      Web Service   no      no    Active   Brings in remote data
  • 17. Approach #1: Embedded RDF
      • Cached data
        • allows Enhanced Results
        • but not for dynamic data
      • Reuse existing markup
        • but requires site redesign
      • Open approach
        • everyone can use
      • Passive, crawled by Y!
        • less bureaucracy to set up

      <?xml version="1.0" encoding="UTF-8"?>
      <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
          "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
      <html xmlns="http://www.w3.org/1999/xhtml"
            xmlns:dc="http://purl.org/dc/elements/1.1/"
            xmlns:foaf="http://xmlns.com/foaf/0.1/"
            lang="en" xml:lang="en">
      <head>
        <title>The Amazing Home Page of Joe Smith</title>
      </head>
      <body>
        <h1 property="dc:title">Joe's Home Page</h1>
        <div rel="foaf:maker">
          <h2 property="foaf:name">Joe Smith</h2>
          <div rel="foaf:depiction"
               resource="http://joesmith.org/images/jsmith.png">
            <img src="/images/jsmith.png"
                 alt="Smiling headshot of Joe" />
            <p property="dc:rights">Creative Commons
               Attribution 3.0 Unported</p>
          </div>
        </div>
        …
  • 18. Approach #2: Embedded Microformats
      • Cached data
        • allows Enhanced Results
        • but not for dynamic data
      • Reuse existing markup
        • but requires site redesign
      • Open approach
        • everyone can use
      • Passive, crawled by Y!
        • less bureaucracy to set up

      <div id="hcard-Joe-Smith" class="vcard">
        <span class="fn">Joe Smith</span>
        <div class="adr">
          <div class="street-address">123 Murphy Avenue</div>
          <span class="locality">Sunnyvale</span>,
          <span class="region">California</span>
          <span class="postal-code">94086</span>
        </div>
        <div class="tel">(408) 555-1234</div>
      </div>…
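To make "vocab + markup coupled" concrete, here is a rough Python sketch of how a consumer might read the hCard fields straight from the class names. It is an illustration only: `HCardParser` and `parse_hcard` are hypothetical names, it uses just the standard-library `html.parser`, and it ignores most real microformat parsing rules (nested cards, void elements such as `<img>`, value excerpting). It is not how Yahoo!'s crawler worked.

```python
from html.parser import HTMLParser

# Class names from the slide's hCard example that we want to capture.
HCARD_FIELDS = {"fn", "street-address", "locality", "region", "postal-code", "tel"}

HCARD_HTML = """
<div id="hcard-Joe-Smith" class="vcard">
  <span class="fn">Joe Smith</span>
  <div class="adr">
    <div class="street-address">123 Murphy Avenue</div>
    <span class="locality">Sunnyvale</span>,
    <span class="region">California</span>
    <span class="postal-code">94086</span>
  </div>
  <div class="tel">(408) 555-1234</div>
</div>
"""


class HCardParser(HTMLParser):
    """Collect the text of elements whose class is a known hCard field."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._stack = []  # one entry per open tag: a field name or None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        field = next((c for c in classes if c in HCARD_FIELDS), None)
        self._stack.append(field)

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        field = self._stack[-1] if self._stack else None
        text = data.strip()
        if field and text:
            self.fields[field] = (self.fields.get(field, "") + " " + text).strip()


def parse_hcard(html):
    """Return a dict of hCard field name -> text for one simple card."""
    parser = HCardParser()
    parser.feed(html)
    return parser.fields
```

Because the class names themselves carry the meaning, the parser needs a hard-coded list of fields, whereas the RDF approach names its vocabulary by URL.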
  • 19. Approach #3: DataRSS Feed
      • Cached data
        • allows Enhanced Results
        • but not for dynamic data
      • Generate feed from DB
        • and maintain afterwards
      • Closed approach
        • only Yahoo! gets data
      • Actively provide a feed
        • coord w/Yahoo! to set up

      <?profile http://search.yahoo.com/searchmonkey-profile ?>
      <feed xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="http://www.w3.org/2005/Atom ../xsd/datarss.xsd"
            xmlns:dc="http://purl.org/dc/terms/"
            xmlns="http://www.w3.org/2005/Atom"
            xmlns:commerce="http://search.yahoo.com/searchmonkey/commerce/"
            xmlns:y="http://search.yahoo.com/datarss/">
        <id>http://local.yahoo.com/datarss/</id>
        <author><name>Peter Mika (pmika@yahoo-inc.com)</name></author>
        <title>Example data feed for Local</title>
        <updated>2008-07-16T04:05:06+07:00</updated>
        <entry>
          <title>Parcel 104</title>
          <id>http://local.yahoo.com/info-21583016-parcel-104-santa-clara</id>
          <updated>2008-07-16T04:05:06+07:00</updated>
          <content type="application/xml">
            <y:adjunct version="1.0" name="com.yahoo.local">
              <y:item rel="dc:subject">
                <y:type typeof="vcard:VCard commerce:Restaurant">
                  <y:meta property="commerce:hoursOfOperation">
                    Breakfast daily, Lunch Mon.-Fri., Dinner Mon.-Sat.
                    …
  • 20. Approach #4: Extract with XSLT
      • Generally not cached
        • too slow, infobar only
        • but good for dynamic data
      • Scrape page with XSLT
        • operates on cleaned up version of the DOM
        • watch out for template changes
      • Easy to prototype

      <?xml version="1.0"?>
      <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
        <xsl:template match="/">
          <adjunctcontainer>
            <adjunct id="smid:{$smid}" version="1.0">
              <item rel="rel:Photo"
                    resource="{//div[@class='hresume']//div[@class='image']/img/@src}"/>
              <item rel="rel:Card">
                <meta property="vcard:fn">
                  <xsl:value-of select="//div[@class='hresume']//span[contains(@class,'fn')]"/>
                </meta>
                <meta property="vcard:title">
                  <xsl:value-of select="//div[@class='hresume']//ul[@class='current']/li"/>
                </meta>
              </item>
            </adjunct>
          </adjunctcontainer>
        </xsl:template>
      </xsl:stylesheet>
  • 21. Prototyping with XSLT • What if I don’t have structured data? – I don’t own the site – I do own the site, but I want to prototype first • Build an XSLT custom data service first – Write some XSLT to extract the data and transform it into DataRSS – Mostly about finding the right XPath (use Firebug or XPather) – Quick to implement, but brittle – Can’t do a good Enhanced Result
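Since prototyping is "mostly about finding the right XPath", it can help to try expressions outside the devtool first. The following Python sketch does that with the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath (no `contains()`, so the class is matched exactly here). The sample page, its placeholder job title, and the `extract_card` helper are assumptions made for illustration; real pages would first need cleaning into well-formed XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical, already well-formed page that mimics the hResume structure
# the slide's XSLT extractor expects.
PAGE = """
<html>
  <body>
    <div class="hresume">
      <div class="image"><img src="/images/jsmith.png" alt="Joe" /></div>
      <span class="fn">Joe Smith</span>
      <ul class="current"><li>Example Job Title</li></ul>
    </div>
  </body>
</html>
"""


def extract_card(xhtml):
    """Pull photo, name and current title from one hResume block."""
    root = ET.fromstring(xhtml)
    resume = root.find(".//div[@class='hresume']")
    if resume is None:
        # The brittle part: a template change makes the anchor disappear.
        return {}
    img = resume.find(".//div[@class='image']/img")
    name = resume.find(".//span[@class='fn']")
    title = resume.find(".//ul[@class='current']/li")
    return {
        "photo": img.get("src") if img is not None else "",
        "fn": (name.text or "").strip() if name is not None else "",
        "title": (title.text or "").strip() if title is not None else "",
    }
```

Every lookup is guarded against a missing node, which mirrors the "quick to implement, but brittle" warning: the extractor should degrade to empty values rather than fail when the page layout shifts.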
  • 22. Approach #5: Call a Web Service
      • Generally not cached
        • too slow, infobar only
        • but good for dynamic data
      • Call a Remote Web Service
        • allows SearchMonkey apps to glue together
        • can handle OpenSearch XML natively

      <?xml version="1.0"?>
      <xsl:stylesheet xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                      xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
                      xmlns:h="http://www.w3.org/1999/xhtml"
                      xmlns:y="urn:yahoo:srch"
                      xsi:schemaLocation="urn:yahoo:srch http://api.search.yahoo.com/SiteExplorerService/V1/PageDataResponse.xsd">
        <xsl:template match="/">
          <adjunctcontainer xmlns:my="http://example.com/ns/1.0">
            <adjunct id="smid:{$smid}" version="1.0">
              <meta property="my:link1">
                <xsl:value-of select="//y:Result[1]/y:Url"/>
              </meta>
              <meta property="my:result1">
                <xsl:value-of select="//y:Result[1]/y:Title"/>
              </meta>
            </adjunct>
          </adjunctcontainer>
        </xsl:template>
      </xsl:stylesheet>
  • 23. Creating an Infobar • Infobar advantages – Annotate someone else’s site – Use links and images from other domains • Mash up info from multiple sites • Affiliate / coupon links? Hmmm… – Can act on *, all websites • But these apps can be annoying if poorly designed • Key design principles – Put something useful in the summary – Be creative with the HTML
  • 24. Resources • Main: – http://developer.yahoo.com/searchmonkey • Lists and forums: – searchmonkey-developers@yahoogroups.com – http://suggestions.yahoo.com/searchmonkey • RDF and Microformats: – http://microformats.org – http://www.w3.org/TR/xhtml-rdfa-primer/
  • 25. Do it for real • Demo
  • 26. Ninja Coding Techniques: Enter the Monkey
  • 27. Typical SearchMonkey PHP code

      $ret['title'] = Data::get('com.yahoo.uf.hresume/dc:subject/resume:contact/vcard:title');

      // Image
      $ret['image']['src'] = Data::get('com.yahoo.uf.hcard/rel:Card/vcard:photo/@resource');
      $ret['image']['alt'] = SMDEFAULT;
      $ret['image']['title'] = SMDEFAULT;
      $ret['image']['allowResize'] = true;

      // Key Value pairs - up to 4
      $ret['dict'][0]['key'] = "Affiliation";
      $ret['dict'][0]['value'] = Data::get('com.yahoo.uf.hresume/resume:affiliation/vcard:org/vcard:organization-name');
      $ret['dict'][1]['key'] = "Contact";
      $ret['dict'][1]['value'] = Data::get('com.yahoo.uf.hresume/dc:subject/resume:contact/@resource');
  • 28. Your first mistake may be your last!
  • 29. True ninjas leave no room for error

      // Get the list of businesses. If we
      // get at least one, extract the
      // address and telephone number
      $appNodeList = Data::xpath("/*/adjunct/item[@rel='rel:Listing']");
      $yd = $appNodeList->item(0);
      $adr = $tel = "";
      $nodeList = Data::xpath("item[@rel='rel:Business']", $yd);
      if ($nodeList->length != 0) {
          $nd = $nodeList->item(0);
          $adr = Data::xpathString("meta[@property='vcard:adr']", $nd);
          $tel = Data::xpathString("meta[@property='vcard:tel']", $nd);
      }
      if ($r_rating != "") {
          $ratingstr = Data::getStarsFromNum($r_rating);
          if ($r_summary != "") {
              $ratingstr = $ratingstr . " " . $r_summary;
      …
  • 30. Useful conditional tricks • Check for empty data like this: – if ('' == trim($var)) • Watch out for $a.'-'.$b.'-'.$c – What happens if these variables are empty? • You can create helper functions! – getOutput() must return an array, but there’s no reason not to create other functions – Call using self::function() instead of just function()
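The empty-variable pitfall is easy to reproduce. The sketch below shows it in Python rather than in the SearchMonkey PHP sandbox, purely to illustrate the idea; `join_nonempty` is a hypothetical helper, not a SearchMonkey function.

```python
def join_nonempty(parts, sep="-"):
    """Join only the parts that contain something besides whitespace."""
    cleaned = [p.strip() for p in parts if p is not None and p.strip() != ""]
    return sep.join(cleaned)


# Naive concatenation keeps the separators even when a value is missing.
naive = "-".join(["408", "", "1234"])

# Filtering first avoids the doubled separator.
safe = join_nonempty(["408", "", "1234"])
```

With a missing middle value, the naive join yields a doubled separator, while the filtered version drops it. The same filter-then-join pattern is a natural candidate for one of the helper functions the slide suggests writing.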
  • 31. Development (test, debug, collaborate) • Your two best friends: input and output • Collaborative development – Create a shared Y!ID for your organization – Export and import apps from the dashboard • Bellwethers – Start with just one or two, for simplicity – Once app is working, hit “autofind” and look at all ten, see what breaks – Always set the #1 bellwether to something that’s high-ranking; that’s your Gallery preview
  • 32. Image Helper Functions • Data::getStars(string $data_get_path) – e.g. Data::getStars("smid:Jk8/review:rating") • Data::getStarsFromNum(float $rating) – Must scale $rating to fall between 0-5 inclusive • Data::getImage(string $name) – Adds icons to your app • Data::getImage("information") • Data::getImage("email") • Data::getImage("edit") • …
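The requirement that a rating fall between 0 and 5 inclusive means ratings on other scales need converting first. A minimal sketch of that arithmetic, assuming a linear rescale plus clamping is acceptable (`scale_rating` is a hypothetical helper written in Python for illustration; in an app the same logic would sit in PHP before the call to `Data::getStarsFromNum`):

```python
def scale_rating(value, source_max, target_max=5.0):
    """Linearly rescale `value` from 0..source_max to 0..target_max, clamped."""
    if source_max <= 0:
        raise ValueError("source_max must be positive")
    scaled = value * target_max / source_max
    # Clamp so out-of-range input can never leave the 0..target_max window.
    return max(0.0, min(target_max, scaled))
```

For example, a score of 8 on a 10-point scale maps to 4 stars, and anything above the source maximum is capped at 5.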
  • 33. XML functions • NodeList Data::xpath(string $query [, DOMNode $contextnode]) – More complicated than Data::get() – Can count, iterate, find children – Can fetch all vcard:fn, regardless where they are – Can find a node and grab 1st four children • string Data::xpathString(string $query [, DOMNode $contextnode]) – Convenience function if you don’t need to do further DOM manipulation
  • 34. Infobar Design: Party like it’s 1999 • Sadly, can’t use CSS – and the default stylesheet strips off most style – thus lists won’t even display bullets or numbers, you have to fake this • Layout: use tables (remember tables?) • Fonts: can use <font color>, <font face>, <big>, <small> • Make good use of images and links • PRO TIP: Use PHP HEREDOC (<<<)
  • 35. Let Infobars be Infobars • Make use of the real estate
  • 36. Let Infobars be Infobars • Or be minimal • But don’t do an Infobar that’s really just an Enhanced Result in disguise – Use the blob and summary – Don’t use the thumbnail, key/value pairs, …
  • 37. Triggering on * • This can be annoying for general audiences – but it’s hard to abort an infobar before 50ms – and you can’t do this in the PHP layer if you depend on an extractor or web service – Data has to be provided by a feed or by structured markup • For specialized audiences a “*” infobar might be ok
  • 38. Triggering on *
  • 39. Triggering on * • Trigger on structured markup – Ex: Creative Commons Infobar • Use feeds to annotate the URLs you want • Instead of *, do a comma-separated list of sites: – www.uiuc.edu/*, www.stanford.edu/*, www.berkeley.edu/*, www.cmu.edu/*, …
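To see what a comma-separated trigger list buys over a bare `*`, the matching can be approximated with shell-style wildcards. This Python sketch is an assumption about the semantics, not a description of how SearchMonkey actually evaluated triggers; `parse_triggers` and `matches_trigger` are hypothetical helpers built on the standard library's `fnmatch` and `urllib.parse`.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Trigger list from the slide (the trailing "…" is left out).
TRIGGERS = "www.uiuc.edu/*, www.stanford.edu/*, www.berkeley.edu/*, www.cmu.edu/*"


def parse_triggers(spec):
    """Split a comma-separated trigger string into clean patterns."""
    return [p.strip() for p in spec.split(",") if p.strip()]


def matches_trigger(url, patterns):
    """Return True if host + path of `url` matches any wildcard pattern."""
    parts = urlsplit(url)
    # Compare against "host/path"; an empty path is treated as "/".
    target = parts.netloc.lower() + (parts.path or "/")
    return any(fnmatchcase(target, p) for p in patterns)
```

Under this reading, a university page matches one of the four patterns while an unrelated site does not, whereas the single pattern `*` matches every URL, which is exactly why it can annoy general audiences.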
  • 40. XSLT Extractors • Use the Firebug extension for Firefox – And XPather, an extension for Firefox • Typical pattern: a skeleton of DataRSS, into which you plug some XPath – For more complex XSL: • Use <xsl:template> • <xsl:for-each> is clumsier • Find a good ID to cling to – Compare arxiv.org (easy) to acm.org (harder)
  • 41. Examples • Rubik’s Cube • VTA Bus • API Monkey • BugMeNot • RetailMeNot • Amazon

Editor's Notes

  2. A SearchMonkey Enhanced Result contains a great deal of structured data. It could have a picture, key/value pairs, deep links… This kind of information goes far beyond what normal search results give you: a title and an autoextracted summary. Where does this information come from?
  3. Likewise, an Infobar has a summary (what the user sees before the pane is expanded) and a “blob”, an area of free-form HTML.
  4. Here’s a profile page for a colleague of mine on LinkedIn. When you and I glance at the page, we see all sorts of structured information. We see pictures, contact info, names, … all sorts of items that have actual meaning. But spiders just see a blob of markup. The spider can extract some basic info, like a title (probably correct), a summary (could be good or not), and some other metadata. But for pulling structured information out of web pages, human beings beat computers hands down. So how to harvest structured data? One approach would be to make computers SMARTER, by improving their ability to do pattern recognition and natural language processing. DRAWBACKS: First, these sorts of AI-type features have proven to be pretty expensive and difficult to develop. I’m not smart enough to do this, so I want you to do it for me. YOU know a lot more about YOUR site than we do. Second, even with a “dumb” approach, indexing all these billions of webpages already takes many thousands of CPU cores, crunching away. Again, very expensive. Finally, we all know what happens here. The computer begins scouring information from the entire world wide web, starts learning at a geometric rate, becomes self-aware, …
  5. Computers become intelligent, begin to learn at a geometric rate, form SkyNet, and scour the Earth with nuclear fire. Shareholder value decreases. So we decided to keep our spider fairly dumb and figure out different ways for people to provide us with structured data.
  6. In this scenario, we see all the different ways that you can feed SearchMonkey with data. A real SearchMonkey app probably wouldn’t use ALL these methods. From your database / CMS, you generate web pages with HTML markup. Those web pages can contain microformats or RDF, special markup that provides semantic meaning about the data on your pages. Our crawler can extract this information, just as it does the title, the page content, the mime-type, and so on. Alternatively, from your database you can also provide us with a DataRSS feed (more on that later) that we consume and place into our index. SearchMonkey also has two ways to actively retrieve information. You can create a Page Extractor, which scrapes information from a web page. You can also call a web service to retrieve more information about a page. We’ll talk more about all these methods in the subsequent slides.
  7. RDF is a W3C standard for providing generalized data about semantic relationships. The way to provide RDF data to SearchMonkey is to salt your pages with special markup, extra attributes that signify the meaning of that content. For example, we can mark up an image as the DEPICTION of the PERSON who made the page. Something a human being can infer instantly, but that a computer has to be told. Data is CACHED, meaning that you can create Enhanced Result type apps (as well as infobars). This is very good. The only downside is that it depends on the page being crawled, which means it’s not good for rapidly changing data. You wouldn’t want to use this approach for sports scores in an ongoing game, for example. RDF is also an OPEN approach – just like HTML allows anyone who builds a browser to view your pages, RDF enables anyone who can build an RDF extractor to benefit from this additional semantic information. RDF is also a PASSIVE approach – unlike feeds, which we’ll talk about later, you just have to sit back and wait for Yahoo to crawl your site. No back and forth or bureaucracy required. The really nice thing about using RDF is that you get to reuse content already available on your site.
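To make the "extra attributes" idea concrete, here is a minimal Python sketch of how an extractor might turn marked-up HTML into triples. It is an illustration only, not SearchMonkey's crawler: the function name `extract_triples`, the sample markup, and the simplified handling of `about`, `rel`, and `property` are assumptions, and real RDFa processing also resolves subjects from parent elements and expands vocabulary prefixes.

```python
from html.parser import HTMLParser


class TripleExtractor(HTMLParser):
    """Collect (subject, predicate, object) triples from RDFa-style attributes.

    Simplified on purpose: the subject comes from an `about` attribute on the
    same element, falling back to the page URL.
    """

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.triples = []
        self._pending = None  # (subject, predicate) waiting for element text

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        subject = a.get("about", self.page_url)
        target = a.get("src") or a.get("href")
        if "rel" in a and target:
            # e.g. an image marked as the depiction of a person
            self.triples.append((subject, a["rel"], target))
        if "property" in a:
            if "content" in a:
                self.triples.append((subject, a["property"], a["content"]))
            else:
                self._pending = (subject, a["property"])

    def handle_data(self, data):
        if self._pending and data.strip():
            subject, predicate = self._pending
            self.triples.append((subject, predicate, data.strip()))
            self._pending = None


def extract_triples(html, page_url):
    parser = TripleExtractor(page_url)
    parser.feed(html)
    parser.close()
    return parser.triples


# Hypothetical page fragment: a name and a photo, both about "#me".
SAMPLE = (
    '<span about="#me" property="foaf:name">Alice Example</span>'
    '<img about="#me" rel="foaf:depiction" src="alice.jpg" />'
)

for triple in extract_triples(SAMPLE, "http://example.org/"):
    print(triple)
```

Each printed tuple is one (subject, predicate, object) statement, which is the same shape as the triples described on the Syntax slides.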
  8. Microformats are very similar to embedded RDF, just a slightly different approach. There are a wide variety of microformats, for events, for addresses, for social relationships, and so on. For each type of microformat, we have to implement support in SearchMonkey separately. SearchMonkey supports a number of microformats, all listed in the SearchMonkey documentation. By contrast, if you use RDF, you can use any vocabulary you like.
  9. DataRSS is the last way to provide cached data, suitable for Enhanced Results. The difference is that DataRSS is CLOSED: the data is only available to Yahoo!, via SearchMonkey. DataRSS requires you to actively provide and maintain a feed. The feed format is Atom (a common, standard syndication format) with additional Y! metadata attached. Setting up a feed requires coordination with us, and maintenance of the feed going forward. Just like our previous microformat example, once a feed is up and running, it appears in the devtool just like any other cached data.
  10. For more rapidly changing data, you can create a Custom Data Service that extracts data from a web page using XSLT. This data generally isn’t cached, so it’s really only appropriate for infobars. It’s also EXCELLENT for testing and prototyping, before your feed or data is ready. [Show demo]
  11. XSLT custom data services are excellent when there is no good structured data available, either because you don’t own the site in question, or because you just want to get a prototype out quickly without having to change your site’s template markup. You can use these data services to mock up what is possible with SearchMonkey. As with the PHP, the XSLT is fairly simple. The “hard” part of writing the stylesheet is really just finding the right XPath expression for extracting the information you want. The other thing you need to do is pick a good vocabulary for describing the extracted data. For example, a description is a dc:description (Dublin Core description) and so on. If the page is not well-formed XHTML, have no fear, we tidy up the page ahead of time and run the XSLT on that. The tidying can fail, but only if the markup is really pathologically bad. As we mentioned before, XSLT custom data services are good for mocking up Enhanced Results, but they’re too slow in practice. For a production-quality app, you’ll need to use them in infobars. [Show demo]
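The two tasks described above, finding the right XPath expression and choosing a vocabulary term for the result, can be sketched in a few lines. This example uses Python's standard library instead of XSLT, purely to illustrate the mapping. The element names, class names, and the `extract_fields` helper are hypothetical, and the input is assumed to be already-tidied, well-formed XHTML without a default namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: vocabulary term -> XPath expression on the page.
FIELD_XPATHS = {
    "dc:title": ".//h1[@class='title']",
    "dc:description": ".//p[@class='summary']",
}


def extract_fields(xhtml, field_xpaths=FIELD_XPATHS):
    """Return {vocabulary term: text} for every XPath that matches."""
    root = ET.fromstring(xhtml)
    fields = {}
    for term, xpath in field_xpaths.items():
        node = root.find(xpath)
        if node is None:
            continue  # the page has no such element; skip the field
        text = "".join(node.itertext()).strip()
        if text:
            fields[term] = text
    return fields


PAGE = (
    "<html><body>"
    '<h1 class="title">Cafe Example</h1>'
    '<p class="summary">A <b>cozy</b> spot.</p>'
    "</body></html>"
)

print(extract_fields(PAGE))
```

A field whose XPath finds nothing is simply left out, so the presentation layer can decide how to handle the hole.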
  12. Enhanced Results are designed according to a rigid visual template, with image, links, and key/value pairs all carefully controlled. This is because we want to ensure that the search result still resembles a search result. Users scan the page, and will skip right over “wild” designs. Users literally will not consciously perceive weird results – they’ll think it’s an ad and screen it out. Infobars are the opposite. When a user opens an Infobar, they are “on task” and consciously engaged with the app. This means that for Infobars, you can and should be creative with the HTML and inline CSS. You’ve got a pretty decent canvas, so use it. The other main design principle for Infobars is that the summary must have useful text or a useful link in it. If the summary is generic, the user will not even see your infobar at all. Find one good link or one good key/value pair and put it in the summary to attract the user’s attention.
  13. Wiring up a SearchMonkey presentation app is easy. A few clicks and you have a working app.
  14. But there’s a world of difference between a working SearchMonkey app and real, production code.
  15. Everyone’s data will have holes in it. Use conditionals to check for whether fields are empty, and either swap in a different field or don’t show the field in the first place. If you’re missing critical data, you can abort by returning an empty array().
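As a rough illustration of this pattern, the Python sketch below swaps in a fallback field, drops empty fields, and aborts when critical data is missing. SearchMonkey apps themselves are written in PHP and abort by returning an empty array(); here an empty dict stands in for that. The field names and the `build_result` helper are hypothetical.

```python
def first_present(data, *keys):
    """Return the first non-empty string value among the given keys, else None."""
    for key in keys:
        value = data.get(key)
        if isinstance(value, str) and value.strip():
            return value.strip()
    return None


def build_result(data):
    """Build display fields, or return {} to abort when the title is missing."""
    title = first_present(data, "title", "name")  # fall back to "name"
    if title is None:
        return {}  # critical data missing: show nothing rather than a broken result

    result = {"title": title, "pairs": []}
    for label, key in (("Rating", "rating"), ("Phone", "phone")):
        value = first_present(data, key)
        if value is not None:  # skip empty fields instead of showing blank rows
            result["pairs"].append((label, value))
    return result


print(build_result({"name": "Cafe Example", "rating": "", "phone": "555-0100"}))
print(build_result({"rating": "4"}))
```

The first call falls back to the name and omits the empty rating; the second has no usable title and aborts.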
  16. The most important SearchMonkey buttons are the input and output buttons. If your app isn’t displaying properly in the preview pane, the input and output buttons will tell you why. A best practice is to create a shared Y! ID for development. This Y! user will appear in the Gallery, so you should set the name to something official looking, rather than just your name. You can also export SearchMonkey code to a file and share it with other users. Bellwethers serve two purposes. First, you need them to build your app – they determine what sort of data is on screen #3 and they serve as your live preview. Second, they’re good for QA. You only need one or two to start with, especially since it might take a while to load ten URLs at once. After your app looks good on your first bellwethers, you should expand to 10.
  17. Make use of the image helper functions. You can use these icons in both Infobars and Enhanced Results.
  18. Most apps only require simple Data::get() calls, but if you need to do more complicated XML manipulation, use Data::xpath() or Data::xpathString().
  19. Either show a lot of data with an infobar (use that entire canvas)…
  20. Or find one good link or one good key/value pair and put it in the summary to attract the user’s attention. Either way, there’s little point in creating an Infobar that follows the strict template of the Enhanced Result.
  21. Infobars that trigger on * can be neat, but often they can be annoying. Unless the infobar really does have something useful to do on every single URL on the search results page, you should try to narrow your scope.
  22. Stumbleupon acts on every URL – it might be useful for people who are very gung-ho about social networking / Web 2.0 sites, but it’s less appealing for the general public.
  23. Screen #3 provides a clever way to abort your infobar, even if you’re triggering on *. If you can make your app depend on some structured markup (whether it’s embedded hcard or some piece of data provided by a feed), the infobar will only appear on results that actually have that data. Failing that, you can go to Screen #2 and apply your app to just a limited list of sites. Your app for college sites doesn’t have to trigger on * -- a finite list of sites might work.
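The "finite list of sites" idea amounts to a simple host check. The Python sketch below is only an illustration of that narrowing: the pattern syntax, the `SITE_PATTERNS` list, and the `should_trigger` helper are assumptions, not the URL pattern format that Screen #2 actually accepts.

```python
from fnmatch import fnmatch
from urllib.parse import urlparse

# Hypothetical list of sites the app applies to, instead of "*".
SITE_PATTERNS = ["*.example.edu", "college.example.org"]


def should_trigger(url, patterns=SITE_PATTERNS):
    """Return True if the URL's host matches one of the allowed site patterns."""
    host = urlparse(url).hostname or ""
    return any(fnmatch(host, pattern) for pattern in patterns)


print(should_trigger("http://www.example.edu/admissions"))
print(should_trigger("http://news.example.com/"))
```

Only the first URL matches the list, so an app scoped this way would stay silent on the second.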