Semantic Searchmonkey

Semantic Search + SearchMonkey talk given at a Yahoo! Hacku event.

http://developer.yahoo.com/hacku
http://developer.yahoo.com/searchmonkey

Published in: Technology, Education

  1. Monkey with the Semantic Web
  2. SearchMonkey Presentation by: Paul Tarjan, Chief Technical Monkey (ptarjan@yahoo-inc.com) Online at: http://www.slideshare.net/ptarjan/semantic-searchmonkey
  3. The web was / is fragmented: Funny pictures, Super secret military site, Friend’s website, University, Cool event page, bookmarks
  4. So we added search to find stuff: Google, Yahoo; Super secret military site, Funny pictures, Friend’s website, University, Cool event page, bookmarks
  5. But there are many similar sites: Facebook Events, Evite Events, Upcoming Events; YouTube, Metacafe, Vimeo; Digg, Reddit, Technorati. Let’s treat these as “views” onto “objects”
  6. Wouldn’t it be cool if you could do: •  object:video creator:"Paul Tarjan" length<=60s
  7. Wouldn’t it be cool if you could do: •  object:video creator:http://paulisageek.com/ length<=60s
  8. Wouldn’t it be cool if you could do: •  object:game name:"Desktop Tower Defense" version:1.5 publishdate:"May 2 2005"
  9. Wouldn’t it be cool if you could do: •  object:video author:"The Escapist" game:"Left 4 Dead"
  10. It gets even cooler
  11. Aggregation: •  object:review type:camera make:canon model:D40
  12. Aggregation: •  object:event date:"May 16, 2008" type:party price<$5
  13. Aggregation: •  object:photo person:"Paul Tarjan"
  14. Aggregation: •  object:photo person:http://paulisageek.com
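
The queries on slides 6 to 14 can be read as filters over typed records. Below is a minimal sketch of that reading, assuming each page has already been reduced to a dictionary of fields. The records, the `search` helper and the tuple form of the query are hypothetical illustrations, not part of SearchMonkey or any Yahoo! API.

```python
import operator

# Comparison operators used in the slide queries (":" is read as equality).
OPS = {":": operator.eq, "<=": operator.le, "<": operator.lt}


def matches(record, conditions):
    """True if the record has every queried field and satisfies each condition."""
    return all(
        field in record and OPS[op](record[field], value)
        for field, op, value in conditions
    )


def search(records, conditions):
    """Return the records that satisfy all conditions, in their original order."""
    return [record for record in records if matches(record, conditions)]


# Hypothetical records, as if extracted from several video and photo sites.
RECORDS = [
    {"object": "video", "creator": "Paul Tarjan", "length": 45},
    {"object": "video", "creator": "Paul Tarjan", "length": 300},
    {"object": "photo", "person": "Paul Tarjan"},
]

# object:video creator:"Paul Tarjan" length<=60s
QUERY = [
    ("object", ":", "video"),
    ("creator", ":", "Paul Tarjan"),
    ("length", "<=", 60),
]

short_videos = search(RECORDS, QUERY)
```

The point of the sketch is only that such a query becomes trivial once the data is structured; the hard part the talk addresses is getting sites to publish those fields.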
  15. The Semantic What? •  Web pages are views of data for people to read •  Search Engines are a hack •  They treat pages as a bucket of words •  Let’s turn the web into a database •  APIs are good, but there is no “web” of APIs •  If you figure out a good way of doing that, let me know
  16. Ok, I want to do it. Now what?
  17. Recommendation: µF •  If there is a microformat for your data, use it –  hcard –  hreview –  hresume –  hcalendar –  rel-tag –  rel-license –  xfn –  hatom –  geo
  18. µF in a nutshell •  Change your @class to something that is known •  <div> –  <span class="name">Paul Tarjan</span> –  <span class="email">spam@paulisageek.com</span> •  </div> •  BECOMES •  <div class="vcard"> –  <span class="fn">Paul Tarjan</span> –  <span class="email">spam@paulisageek.com</span> •  </div>
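
To show why the known class names matter, here is a minimal sketch of a consumer that reads the hCard markup from slide 18 using only Python's standard `html.parser`. The `HCardParser` class and `parse_hcards` helper are hypothetical names, and the sketch handles only the `fn` and `email` properties; it is not a full hCard parser.

```python
from html.parser import HTMLParser

# The "after" markup from slide 18.
HCARD_HTML = """
<div class="vcard">
  <span class="fn">Paul Tarjan</span>
  <span class="email">spam@paulisageek.com</span>
</div>
"""


class HCardParser(HTMLParser):
    """Collects the text of known hCard properties inside class="vcard" elements."""

    KNOWN_PROPERTIES = {"fn", "email"}  # tiny subset, for illustration only

    def __init__(self):
        super().__init__()
        self.cards = []       # one dict per vcard element found
        self._depth = 0       # nesting depth inside the current vcard (0 = outside)
        self._current = None  # property whose text is being collected

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self._depth:
            self._depth += 1
            hits = self.KNOWN_PROPERTIES.intersection(classes)
            if hits:
                self._current = sorted(hits)[0]
        elif "vcard" in classes:
            self._depth = 1
            self.cards.append({})

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1
            self._current = None

    def handle_data(self, data):
        if self._depth and self._current and data.strip():
            self.cards[-1][self._current] = data.strip()


def parse_hcards(html):
    """Return a list of dicts, one per vcard in the given HTML."""
    parser = HCardParser()
    parser.feed(html)
    return parser.cards


cards = parse_hcards(HCARD_HTML)
```

Feeding it the "before" markup (`class="name"` with no `vcard` wrapper) yields an empty list, which is the difference the slide is pointing at. The sketch assumes every start tag has a matching end tag, so void elements such as `<br>` would throw off the depth count.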
  19. Recommendation: RDFa •  If you have data that doesn’t really fit in a µF •  Examples: –  Markup APIs (YUI, javadoc, etc) –  Media (Audios, Videos, Games, Presentations) –  Job Postings
  20. RDFa in a nutshell •  Make a namespace •  Use @property, @rel and @resource •  For DATA: @property makes the node contents into the value •  For URLs: @rel makes the @resource into the value
  21. Normal HTML •  <html> … <div class="private"> private static String <strong>_createCookieHash </strong> (hash) …
  22. RDFa: example •  <html xmlns:yui="http://yuilibrary.com/rdf/1.0/yui.rdf#"> … <div class="private" rel="yui:method" resource="#method__createCookieHash"> private static String <strong property="yui:name"> _createCookieHash </strong> (hash) …
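
The two rules from slide 20 can be sketched as a toy extractor run over the `div` from slide 22. This is a simplified reading, not the full RDFa processing rules: it assumes well-nested markup, treats `@rel` plus `@resource` as also setting the subject for nested elements, leaves prefixes such as `yui:` unexpanded, and uses the placeholder `self` for the document. `TinyRDFaParser` and `extract_triples` are hypothetical names.

```python
from html.parser import HTMLParser

# The annotated element from slide 22 (the elided parts of the page are omitted).
SNIPPET = """
<div class="private" rel="yui:method" resource="#method__createCookieHash">
  private static String
  <strong property="yui:name"> _createCookieHash </strong> (hash)
</div>
"""


class TinyRDFaParser(HTMLParser):
    """Emits (subject, predicate, object) tuples for @rel/@resource and @property."""

    def __init__(self, base="self"):
        super().__init__()
        self.triples = []
        self._subjects = [base]     # subject in effect for each open element
        self._open_property = None  # (stack depth, subject, predicate)
        self._text = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        subject = self._subjects[-1]
        if a.get("rel") and a.get("resource"):
            # For URLs: @rel makes the @resource into the value.
            self.triples.append((subject, a["rel"], a["resource"]))
            subject = a["resource"]
        self._subjects.append(subject)
        if a.get("property") and self._open_property is None:
            # For data: @property makes the node contents into the value.
            self._open_property = (len(self._subjects), subject, a["property"])
            self._text = []

    def handle_data(self, data):
        if self._open_property is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._open_property and self._open_property[0] == len(self._subjects):
            _, subject, predicate = self._open_property
            self.triples.append((subject, predicate, "".join(self._text).strip()))
            self._open_property = None
        if len(self._subjects) > 1:
            self._subjects.pop()


def extract_triples(html, base="self"):
    parser = TinyRDFaParser(base)
    parser.feed(html)
    return parser.triples


triples = extract_triples(SNIPPET)
```

On this input the sketch yields one triple linking the document to the method and one giving the method its name. For real pages a conforming RDFa processor is the safer choice.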
  23. That’s it! •  Automatically picked up by semantic parsers / crawlers •  Can build a SearchMonkey app on it •  Can make a mashup way easier than screen scraping •  Can get the data from Yahoo! BOSS
  24. What is SearchMonkey? an open platform for using structured data to build more useful and relevant search results Before After
  25. Enhanced Result: Zagat Image Links Key/Value Pairs or Abstract
  26. Infobar: Wikipedia Preview Summary Blob
  27. Part of the puzzle Semantic vocabularies Semantic markup on web pages SearchMonkey
  28. Vocabularies •  Need to speak the same language •  I like to see girls of that... caliber. •  English, French, Spanish, Esperanto? •  URLs to the rescue –  Dublin Core (http://purl.org/dc/elements/1.1/) –  Friend of a Friend (http://xmlns.com/foaf/0.1/) –  XHTML Friends Network (http://gmpg.org/xfn/11/) –  … (many more)
  29. Syntax •  Nouns, Verbs, and Adjectives, oh my! •  All phrases become lots of triples •  (Subject, Verb / Adj. / Prep. / etc, Object) •  Key / Value pairs ++ –  Everything is a URL or String –  Subject doesn’t have to be the document
  30. Syntax 2 •  Key / Value pair –  Title = Awesome SearchMonkey Presentation –  Homepage = http://search.yahoo.com/searchmonkey •  Triples –  (self, http://purl.org/dc#title, “Awesome SearchMonkey Presentation”) –  (self, http://vcard#url, http://search.yahoo.com/searchmonkey)
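
A minimal sketch of the step from key/value pairs to triples on slide 30. The `Triple` type and `to_triples` helper are hypothetical names, and the predicate URLs are copied exactly as abbreviated on the slide, so they are placeholders rather than real vocabulary terms.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Key -> predicate URL, as written on the slide (abbreviated there).
PREDICATES = {
    "Title": "http://purl.org/dc#title",
    "Homepage": "http://vcard#url",
}


def to_triples(subject, pairs):
    """Turn each key/value pair into a (subject, predicate, object) triple."""
    return [Triple(subject, PREDICATES[key], value) for key, value in pairs.items()]


pairs = {
    "Title": "Awesome SearchMonkey Presentation",
    "Homepage": "http://search.yahoo.com/searchmonkey",
}
triples = to_triples("self", pairs)
```

The only thing a triple adds over a key/value pair is an explicit subject, which is what lets statements be about something other than the current document.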
  31. Decompose to triples •  My friend “Bob” is an idiot. –  (self, http://xmlns.com/foaf/0.1/knows, genid:Ui__152310312_366) –  (genid:Ui__152310312_366, http://www.w3.org/2001/vcard-rdf/3.0#fn, “Bob”) –  (genid:Ui__152310312_366, http://example.org/ptarjan/isInstanceOf, http://example.org/ptarjan/idiot) •  Unnamed nodes are O.K.
  32. Writing URLs takes a lot of work! •  xmlns:foaf=http://xmlns.com/foaf/0.1/ •  xmlns:vcard=http://www.w3.org/2001/vcard-rdf/3.0# •  xmlns:junk=http://example.org/ptarjan/ •  My friend “Bob” is an idiot. –  (self, foaf:knows, genid:Ui__152310312_366) –  (genid:Ui__152310312_366, vcard:fn, “Bob”) –  (genid:Ui__152310312_366, junk:isInstanceOf, junk:idiot) •  Unnamed nodes are O.K.
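
The prefixes on slide 32 are shorthand: each prefixed name expands back to the full URL from slide 31. A minimal sketch of that expansion, with `expand` as a hypothetical helper name:

```python
# Prefix declarations from slide 32.
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "vcard": "http://www.w3.org/2001/vcard-rdf/3.0#",
    "junk": "http://example.org/ptarjan/",
}


def expand(term, prefixes=PREFIXES):
    """Expand prefix:local to a full URL; leave anything else untouched."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return term


SHORT_TRIPLES = [
    ("self", "foaf:knows", "genid:Ui__152310312_366"),
    ("genid:Ui__152310312_366", "vcard:fn", "Bob"),
    ("genid:Ui__152310312_366", "junk:isInstanceOf", "junk:idiot"),
]

full_triples = [tuple(expand(term) for term in triple) for triple in SHORT_TRIPLES]
```

Terms with an undeclared prefix, such as the `genid:` node identifiers, pass through unchanged. A real implementation would also need to keep literal values apart from names so that a string that happens to contain a colon is never expanded.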
  33. RDFa •  <html xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:vcard="http://www.w3.org/2001/vcard-rdf/3.0#" xmlns:junk="http://example.org/ptarjan/"> <div rel="foaf:knows"> <span property="vcard:fn">Bob</span> <span rel="junk:isInstanceOf" resource="junk:idiot" /> </div> </html>
  34. •  </SemanticWeb> •  Questions?
  35. Innards of SearchMonkey •  You build a web-service inside our framework •  When a search page renders –  We check which SM apps are enabled –  We call them • 50ms for in-page • Long time for AJAX –  They return data in our template –  We render them (and cache)
  36. Prototyping with XSLT •  What if I don’t have structured data? –  I don’t own the site –  I do own the site, but I want to prototype first •  Build an XSLT custom data service first –  Write some XSLT to extract the data and transform it into DataRSS –  Mostly about finding the right XPath (use Firebug or XPather) –  Quick to implement, but brittle –  Can’t do a good Enhanced Result
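
The "finding the right XPath" step can be sketched without XSLT. The example below uses Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath, on a small hypothetical page. The page markup, the field names and the `extract` helper are assumptions for illustration, and the output is a plain dictionary rather than DataRSS.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page for a site that has no structured markup.
PAGE = """
<html>
  <body>
    <div id="listing">
      <h1 class="title">Desktop Tower Defense</h1>
      <span class="version">1.5</span>
    </div>
  </body>
</html>
"""

# One XPath expression per field to pull out of the page.
XPATHS = {
    "name": ".//div[@id='listing']/h1[@class='title']",
    "version": ".//div[@id='listing']/span[@class='version']",
}


def extract(page, xpaths):
    """Return {field: text} for every XPath that matches an element with text."""
    root = ET.fromstring(page.strip())
    result = {}
    for field, path in xpaths.items():
        node = root.find(path)
        if node is not None and node.text:
            result[field] = node.text.strip()
    return result


data = extract(PAGE, XPATHS)
```

The brittleness the slide mentions shows up immediately: if the site renames `class="title"`, the `name` field silently disappears from the result.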
  37. Do it for real •  Demo
  38. Examples •  Rubik’s Cube •  VTA Bus •  API Monkey •  BugMeNot •  RetailMeNot •  Amazon
  39. questions?
