Semantic Search on the Public Web with Creative Commons

Published in: Technology, Education
Transcript of "Semantic Search on the Public Web with Creative Commons"

  1. 1. Semantic Search on the Public Web with Creative Commons. 2006.03.07. Mike Linksvayer
  2. 2. Billion$ (0)
      • Let's get the hype out of the way...
  3. 3. Billion$ (1)
      • Let's get the hype out of the way...
  4. 4. Billion$ (2)
      • Let's get the hype out of the way...
  5. 5. Billion$ (3)
      • This calls for a mashup...
  6. 6. Billion$ (4)
  7. 7. Billion$ (5)
      • Fortunately CC's founders thought of that from the beginning...
  8. 8. Billion$ (6)
  9. 9. Billion$ (7)
  10. 10. About Creative Commons
  11. 12. Core Licensing Suite: Creator/Licensor chooses license options. Every Creative Commons license allows the world to copy and distribute a work provided that the licensee credits the creator/licensor. In addition, the creator/licensor may apply the following conditions: NonCommercial, No Derivatives, ShareAlike.
  12. 14. Simple License Generator
  13. 15. Internet Archive Free Hosting for CC works http://www.archive.org/
  14. 16. Creative Commons Metadata
  15. 17. Creative Commons Metadata Example
      <rdf:RDF xmlns="http://web.resource.org/cc/"
          xmlns:dc="http://purl.org/dc/elements/1.1/"
          xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
        <Work rdf:about="http://example.com/article.html">
          <dc:title>An Example Article</dc:title>
          <dc:date>2003-10-01</dc:date>
          <dc:type rdf:resource="http://purl.org/dc/dcmitype/Text" />
          <license rdf:resource="http://creativecommons.org/licenses/by-nc-sa/2.5/" />
        </Work>
        <License rdf:about="http://creativecommons.org/licenses/by-nc-sa/2.5/">
          <permits rdf:resource="http://web.resource.org/cc/Reproduction" />
          <permits rdf:resource="http://web.resource.org/cc/Distribution" />
          <requires rdf:resource="http://web.resource.org/cc/Notice" />
          <requires rdf:resource="http://web.resource.org/cc/Attribution" />
          <prohibits rdf:resource="http://web.resource.org/cc/CommercialUse" />
          <permits rdf:resource="http://web.resource.org/cc/DerivativeWorks" />
          <requires rdf:resource="http://web.resource.org/cc/ShareAlike" />
        </License>
      </rdf:RDF>
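A minimal sketch of how a consumer might read the license terms out of the metadata example on the slide above, assuming only Python's standard `xml.etree.ElementTree`. The function name `license_terms` and the `EXAMPLE` constant are illustrative and not part of any Creative Commons tooling; a real crawler would more likely use a full RDF parser.

```python
import xml.etree.ElementTree as ET

# Namespaces taken from the slide's example.
CC = "http://web.resource.org/cc/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

EXAMPLE = """<rdf:RDF xmlns="http://web.resource.org/cc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <Work rdf:about="http://example.com/article.html">
    <dc:title>An Example Article</dc:title>
    <license rdf:resource="http://creativecommons.org/licenses/by-nc-sa/2.5/" />
  </Work>
  <License rdf:about="http://creativecommons.org/licenses/by-nc-sa/2.5/">
    <permits rdf:resource="http://web.resource.org/cc/Reproduction" />
    <permits rdf:resource="http://web.resource.org/cc/Distribution" />
    <requires rdf:resource="http://web.resource.org/cc/Notice" />
    <requires rdf:resource="http://web.resource.org/cc/Attribution" />
    <prohibits rdf:resource="http://web.resource.org/cc/CommercialUse" />
    <permits rdf:resource="http://web.resource.org/cc/DerivativeWorks" />
    <requires rdf:resource="http://web.resource.org/cc/ShareAlike" />
  </License>
</rdf:RDF>"""


def license_terms(rdf_xml):
    """Map each License URL to its permits/requires/prohibits terms (hypothetical helper)."""
    root = ET.fromstring(rdf_xml)
    terms = {}
    for lic in root.findall(f"{{{CC}}}License"):
        url = lic.get(f"{{{RDF}}}about")
        entry = {"permits": [], "requires": [], "prohibits": []}
        for kind in entry:
            for el in lic.findall(f"{{{CC}}}{kind}"):
                # Keep only the last path segment, e.g. "CommercialUse".
                entry[kind].append(el.get(f"{{{RDF}}}resource").rsplit("/", 1)[-1])
        terms[url] = entry
    return terms


if __name__ == "__main__":
    for url, entry in license_terms(EXAMPLE).items():
        print(url, entry)
```

Because the terms are plain RDF properties on the license, a search engine can filter on them (for example, "no CommercialUse prohibition") without knowing each license by name.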
  16. 18. Rights Description Use Cases: Discovery, Expression, Commerce, Management(1)
  17. 19. Rights Description vs. Rights Management(2)
      • Copy/Use promotion vs. Copy/Use protection
      • Encourage fans vs. Discourage casual pirates
      • Resource management vs. Customer management
      • Web content model vs. 20th century content model
      • Not mutually exclusive in theory.
  18. 20. Why Semantic Web?
      • Small organization, no central registration for every license
      • Decentralization: Let a thousand search engines bloom; web as API
      • Existing RDF tools could take advantage of CC RDF
  19. 21. Why RDF-in-HTML comments? (yuck)
      • Considered:
          • Robots.txt-like
          • HTML meta tags
          • LINK to external RDF file
      • RDF-in-HTML comments wins because:
          • Metadata colocated with human-visible HTML, only a single copy & paste for licensors
          • Full power of RDF
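To make the RDF-in-HTML-comments approach concrete, here is a rough sketch of how a crawler could pull license URLs out of a page. It assumes the RDF block sits whole inside a single HTML comment and uses only Python's standard library; `embedded_licenses` and the sample `PAGE` are hypothetical names invented for this illustration, and the regular expressions are a simplification rather than how CC-Nutch or any other engine actually does it.

```python
import re
import xml.etree.ElementTree as ET

CC = "http://web.resource.org/cc/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Simplifying assumption: comments are well-formed and not nested.
COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)
RDF_RE = re.compile(r"<rdf:RDF\b.*?</rdf:RDF>", re.DOTALL)

PAGE = """<html><body>
<p>Some licensed text.</p>
<!--
<rdf:RDF xmlns="http://web.resource.org/cc/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <Work rdf:about="">
    <license rdf:resource="http://creativecommons.org/licenses/by/2.5/" />
  </Work>
</rdf:RDF>
-->
<!-- an ordinary comment -->
</body></html>"""


def embedded_licenses(html):
    """Return license URLs declared in RDF blocks hidden in HTML comments."""
    urls = []
    for comment in COMMENT_RE.findall(html):
        for block in RDF_RE.findall(comment):
            try:
                root = ET.fromstring(block)
            except ET.ParseError:
                continue  # skip malformed metadata instead of failing the page
            for el in root.iter(f"{{{CC}}}license"):
                urls.append(el.get(f"{{{RDF}}}resource"))
    return urls


if __name__ == "__main__":
    print(embedded_licenses(PAGE))
```

Skipping unparseable blocks reflects the metadata-quality concern raised later in the talk: pasted metadata is often damaged, and a crawler has to tolerate that.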
  20. 22. CC Search History I
      • PostgreSQL/tsearch2/Python prototype (early 2004)
          • Sloooowwwww, but did what a prototype should do
  21. 23. CC Search History II
      • CC-Nutch (late 2004)
          • Nutch aims to be an open source search engine comparable to commercial web-scale search engines
          • Built on top of Lucene full text index
          • CC plugin only ~500 lines of code (not counting UI, CC-required additions to Nutch core)
          • http://search.creativecommons.org uses Nutch, >1m CC-licensed pages indexed
  22. 25. CC Search History III
      • Yahoo! Search for Creative Commons (early 2005)
          • Search CC-licensed subset of Yahoo!’s index (~15m* pages)
          • *very rough guesstimate
  23. 29. CC Search History IV
      • Google CC search (November 2005)
          • Search CC-licensed subset of Google’s index (~45m* pages)
          • *very rough guesstimate
  24. 33. CC Search History V (the future)
      • Better metadata formats
      • Image and Video search
      • Derivatives search
      • Content commerce search
      • “Live” web search
      • “Management” (desktop, workgroup)
      • Semantic mashups
  25. 34. Future CC metadata formats
      • “Semantic XHTML” AKA “lowercase semantic web” AKA “microformats” (now)
          • <a rel="license" href="http://creativecommons.org/licenses/by/2.5/">
      • RDF/A AKA XHTML2 metadata (in working group)
      • GRDDL (gleaning resource descriptions from dialects of languages)
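The `rel="license"` form on the slide above is much easier to consume than RDF in comments. A minimal sketch, assuming Python's standard `html.parser`; the class name `RelLicenseParser`, the helper `rel_licenses`, and the sample markup are hypothetical and only illustrate the idea.

```python
from html.parser import HTMLParser


class RelLicenseParser(HTMLParser):
    """Collect href values of <a> and <link> elements marked rel="license"."""

    def __init__(self):
        super().__init__()
        self.licenses = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attrs = dict(attrs)
        # rel may hold several space-separated values, or be absent.
        rels = (attrs.get("rel") or "").lower().split()
        if "license" in rels and attrs.get("href"):
            self.licenses.append(attrs["href"].strip())


def rel_licenses(html):
    parser = RelLicenseParser()
    parser.feed(html)
    parser.close()
    return parser.licenses


SAMPLE_HTML = (
    '<p>Licensed under <a rel="license" '
    'href="http://creativecommons.org/licenses/by/2.5/">CC BY 2.5</a>. '
    '<a href="http://example.com/">Unrelated link</a></p>'
)

if __name__ == "__main__":
    print(rel_licenses(SAMPLE_HTML))
```

The trade-off is that a bare link carries only the license URL; the permits/requires/prohibits detail has to be looked up from the license itself.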
  26. 39. Image and Video search
      • Better metadata formats
      • Image and Video search
      • Derivatives search
      • Content commerce search
      • “Live” web search
      • “Management” (desktop, workgroup)
      • Semantic mashups
  27. 40. Searching for Derivative Works
  28. 41. Creative Commons (0)
  29. 42. Creative Commons (0)
  30. 43. Creative Commons (0)
  31. 44. Creative Commons (0)
  32. 45. Derivatives search
      • RDF/XML snippet: <dc:source rdf:resource="http://ccmixter.org/media/files/victor/3385"/>
      • Query like Yahoo! link: search or Technorati Cosmos search: source:http://ccmixter.org/media/files/victor/3385
      • “Who sampled this” as the new “who linked to this”
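The "who sampled this" query amounts to a reverse index over `dc:source` statements. The sketch below shows one way that could work, assuming each crawled work exposes CC RDF with a `dc:source` pointing at the work it derives from. The remix URLs under example.org, the helper `remix_rdf`, and the functions `index_sources` and `who_sampled` are all invented for illustration; only the ccMixter URL and the `source:` query syntax come from the slide.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

CC = "http://web.resource.org/cc/"
DC = "http://purl.org/dc/elements/1.1/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def remix_rdf(work_url, source_url):
    """Build a tiny RDF document for a hypothetical derivative work."""
    return f"""<rdf:RDF xmlns="{CC}" xmlns:dc="{DC}" xmlns:rdf="{RDF}">
  <Work rdf:about="{work_url}">
    <dc:source rdf:resource="{source_url}"/>
  </Work>
</rdf:RDF>"""


def index_sources(rdf_documents):
    """Map each source URL to the set of works that declare it as dc:source."""
    index = defaultdict(set)
    for doc in rdf_documents:
        root = ET.fromstring(doc)
        for work in root.iter(f"{{{CC}}}Work"):
            about = work.get(f"{{{RDF}}}about")
            for src in work.findall(f"{{{DC}}}source"):
                index[src.get(f"{{{RDF}}}resource")].add(about)
    return index


def who_sampled(index, query):
    """Answer a 'source:<url>' query, analogous to a 'link:<url>' search."""
    prefix = "source:"
    if not query.startswith(prefix):
        raise ValueError("expected a query of the form source:<url>")
    return sorted(index.get(query[len(prefix):], ()))


ORIGINAL = "http://ccmixter.org/media/files/victor/3385"
DOCS = [
    remix_rdf("http://example.org/remixes/1", ORIGINAL),
    remix_rdf("http://example.org/remixes/2", ORIGINAL),
    remix_rdf("http://example.org/remixes/3", "http://example.org/other"),
]
INDEX = index_sources(DOCS)

if __name__ == "__main__":
    print(who_sampled(INDEX, "source:" + ORIGINAL))
```

This is the same structure a link: search uses, with `dc:source` statements standing in for hyperlinks.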
  33. 46. Content commerce search
      • Transaction costs should be low even if rights are reserved
      • Commercial terms and other commerce described by metadata associated with a work
      • Find me work I can use at a price I can pay for usage rights warranty/paper trail (even if rights not reserved)
      • Reintermediate consumer and creator
  34. 47. “Live” web search (feeds)
      • Feeds are explicitly metadata-rich (unlike typical web page)
      • Existing blog search ignores metadata
      • Web search will become more like blog search, vice versa?
  35. 48. “Management” (desktop, workgroup)
      • Desktop search (OS-level)
      • Content creation and media player integration
      • XMP
      • Semantic Wikis
  36. 49. Semantic mashups
  37. 50. Issues for Semantic Search on the Public Web
      • Metadata quality
      • Trust
      • Scalability
      • Usability
      • Compatibility
      • Critical mass
      • State of the art IR works very well – high expectations!
  38. 51. Semantic Search on the Public Web with Creative Commons. 2006.03.07. Mike Linksvayer
      • Questions, feedback, flames: [email_address]
      • http://developer.creativecommons.org