Semantic Search on the Public Web with Creative Commons

Published in: Technology, Education

Transcript

  • 1.
      • Semantic Search on the Public Web with Creative Commons
      • 2006.03.07
      • Mike Linksvayer
  • 2. Billion$ (0)
    • Let's get the hype out of the way....
  • 3. Billion$ (1)
    • Let's get the hype out of the way....
  • 4. Billion$ (2)
    • Let's get the hype out of the way....
  • 5. Billion$ (3)
    • This calls for a mashup...
  • 6. Billion$ (4)
  • 7. Billion$ (5)
    • Fortunately CC's founders thought of that from the beginning...
  • 8. Billion$ (6)
  • 9. Billion$ (7)
  • 10.
    • About Creative Commons
  • 11.  
  • 12. Core Licensing Suite: Creator/Licensor chooses license options
    • Every Creative Commons license allows the world to copy and distribute a work, provided that the licensee credits the creator/licensor
    • In addition, the creator/licensor may apply the following conditions:
      • NonCommercial
      • No Derivatives
      • ShareAlike
  • 13.  
  • 14. Simple License Generator
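
The option-to-license mapping behind a license chooser can be sketched in a few lines of Python. This is a minimal illustration, not the actual generator: the function name `license_url` and its parameters are hypothetical, and the URL pattern is inferred from the license URLs that appear elsewhere in these slides (for example `by-nc-sa/2.5`).

```python
def license_url(non_commercial=False, no_derivatives=False,
                share_alike=False, version="2.5"):
    """Build a CC license URL from the licensor's chosen options (sketch)."""
    if no_derivatives and share_alike:
        # ShareAlike is a condition on derivative works, so it cannot be
        # combined with a ban on derivatives.
        raise ValueError("No Derivatives and ShareAlike cannot be combined")

    parts = ["by"]  # every license in the core suite requires attribution
    if non_commercial:
        parts.append("nc")
    if no_derivatives:
        parts.append("nd")
    if share_alike:
        parts.append("sa")
    return "http://creativecommons.org/licenses/%s/%s/" % ("-".join(parts), version)


if __name__ == "__main__":
    print(license_url(non_commercial=True, share_alike=True))
```

The three optional conditions yield six valid combinations, because No Derivatives and ShareAlike exclude each other.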
  • 15. Internet Archive
    • Free hosting for CC works
    • http://www.archive.org/
  • 16.
    • Creative Commons Metadata
  • 17. Creative Commons Metadata Example
    • <rdf:RDF xmlns="http://web.resource.org/cc/"
    • xmlns:dc="http://purl.org/dc/elements/1.1/"
    • xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    • <Work rdf:about="http://example.com/article.html">
    • <dc:title>An Example Article</dc:title>
    • <dc:date>2003-10-01</dc:date>
    • <dc:type rdf:resource="http://purl.org/dc/dcmitype/Text" />
    • <license rdf:resource="http://creativecommons.org/licenses/by-nc-sa/2.5/" />
    • </Work>
    • <License rdf:about="http://creativecommons.org/licenses/by-nc-sa/2.5/">
    • <permits rdf:resource="http://web.resource.org/cc/Reproduction" />
    • <permits rdf:resource="http://web.resource.org/cc/Distribution" />
    • <requires rdf:resource="http://web.resource.org/cc/Notice" />
    • <requires rdf:resource="http://web.resource.org/cc/Attribution" />
    • <prohibits rdf:resource="http://web.resource.org/cc/CommercialUse" />
    • <permits rdf:resource="http://web.resource.org/cc/DerivativeWorks" />
    • <requires rdf:resource="http://web.resource.org/cc/ShareAlike" />
    • </License>
    • </rdf:RDF>
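
To illustrate how a tool might consume metadata like the example on slide 17, here is a minimal Python sketch that reads the permits/requires/prohibits terms of each License element. It is an assumption-laden shortcut: it treats the RDF/XML as ordinary XML with the standard library, which only works for this particular serialization (a full RDF parser would be more robust). The function name `describe_licenses` and the shortened `SAMPLE` document are made up for the example.

```python
import xml.etree.ElementTree as ET

CC = "http://web.resource.org/cc/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Shortened, hypothetical sample in the same shape as the slide's example.
SAMPLE = """<rdf:RDF xmlns="http://web.resource.org/cc/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <License rdf:about="http://creativecommons.org/licenses/by-nc-sa/2.5/">
    <permits rdf:resource="http://web.resource.org/cc/Reproduction" />
    <requires rdf:resource="http://web.resource.org/cc/Attribution" />
    <prohibits rdf:resource="http://web.resource.org/cc/CommercialUse" />
  </License>
</rdf:RDF>"""


def describe_licenses(rdf_xml):
    """Map each License URI to its permits/requires/prohibits terms."""
    root = ET.fromstring(rdf_xml)
    result = {}
    for lic in root.findall("{%s}License" % CC):
        uri = lic.get("{%s}about" % RDF)
        terms = {"permits": [], "requires": [], "prohibits": []}
        for child in lic:
            kind = child.tag.replace("{%s}" % CC, "")
            target = child.get("{%s}resource" % RDF)
            if kind in terms and target:
                # Keep only the short term name, e.g. "Attribution".
                terms[kind].append(target.replace(CC, ""))
        result[uri] = terms
    return result


if __name__ == "__main__":
    for uri, terms in describe_licenses(SAMPLE).items():
        print(uri, terms)
```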
  • 18. Rights Description Use Cases
    • Discovery
    • Expression
    • Commerce
    • Management(1)
  • 19. Rights Description vs. Rights Management(2)
    • Copy/Use promotion vs. Copy/Use protection
    • Encourage fans vs. Discourage casual pirates
    • Resource management vs. Customer management
    • Web content model vs. 20th century content model
    • Not mutually exclusive in theory.
  • 20. Why Semantic Web?
    • Small organization, no central registration for every license
    • Decentralization: Let a thousand search engines bloom; web as API
    • Existing RDF tools could take advantage of CC RDF
  • 21. Why RDF-in-HTML comments? (yuck)
    • Considered:
      • Robots.txt-like
      • HTML meta tags
      • LINK to external RDF file
    • RDF-in-HTML comments wins because
      • Metadata colocated with human-visible HTML, only single copy & paste for licensors
      • Full power of RDF
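
The RDF-in-HTML-comments approach means a crawler has to pull the metadata back out of the page. A rough Python sketch of that step, assuming the RDF block sits alone inside an HTML comment and uses the namespaces from slide 17; the names `licenses_in_html` and `PAGE` are hypothetical, and this is not the code used by any of the search engines mentioned in these slides.

```python
import re
import xml.etree.ElementTree as ET

CC = "http://web.resource.org/cc/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)

# Hypothetical page with license metadata pasted next to the visible HTML.
PAGE = """<html><body>
<p>An Example Article</p>
<!--
<rdf:RDF xmlns="http://web.resource.org/cc/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <Work rdf:about="http://example.com/article.html">
    <license rdf:resource="http://creativecommons.org/licenses/by-nc-sa/2.5/" />
  </Work>
</rdf:RDF>
-->
</body></html>"""


def licenses_in_html(html):
    """Return (work URI, license URI) pairs from RDF embedded in comments."""
    pairs = []
    for body in COMMENT_RE.findall(html):
        if "<rdf:RDF" not in body:
            continue  # an ordinary comment, not metadata
        try:
            root = ET.fromstring(body.strip())
        except ET.ParseError:
            continue  # malformed metadata is skipped rather than guessed at
        for work in root.findall("{%s}Work" % CC):
            lic = work.find("{%s}license" % CC)
            if lic is not None:
                pairs.append((work.get("{%s}about" % RDF),
                              lic.get("{%s}resource" % RDF)))
    return pairs


if __name__ == "__main__":
    print(licenses_in_html(PAGE))
```

Because the comment is invisible to browsers, the licensor pastes one block of HTML and the page renders unchanged, which is the "single copy & paste" advantage listed on the slide.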
  • 22. CC Search History I
    • PostgreSQL/tsearch2/Python prototype (early 2004)
      • Sloooowwwww, but did what a prototype should do
  • 23. CC Search History II
    • CC-Nutch (late 2004)
      • Nutch aims to be an open source search engine comparable to commercial web-scale search engines
      • Built on top of Lucene full text index
      • CC plugin only ~500 lines of code (not counting UI, CC-required additions to Nutch core)
      • http://search.creativecommons.org uses Nutch, >1m CC-licensed pages indexed
  • 24.  
  • 25. CC Search History III
    • Yahoo! Search for Creative Commons (early 2005)
      • Search CC-licensed subset of Yahoo!’s index (~15m* pages)
      • *very rough guesstimate
  • 26.  
  • 27.  
  • 28.  
  • 29. CC Search History IV
    • Google CC search (November 2005)
      • Search CC-licensed subset of Google’s index (~45m* pages)
      • *very rough guesstimate
  • 30.  
  • 31.  
  • 32.  
  • 33. CC Search History V (the future)
    • Better metadata formats
    • Image and Video search
    • Derivatives search
    • Content commerce search
    • “Live” web search
    • “Management” (desktop, workgroup)
    • Semantic mashups
  • 34. Future CC metadata formats
    • “Semantic XHTML” AKA “lowercase semantic web” AKA “microformats” (now)
    • <a rel="license" href="http://creativecommons.org/licenses/by/2.5/">
    • RDF/A AKA XHTML2 metadata (in working group)
    • GRDDL (gleaning resource descriptions from dialects of languages)
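
Of the formats on slide 34, the rel="license" microformat is the simplest to consume. A minimal Python sketch using only the standard library's `html.parser`; the class and function names are hypothetical, and accepting `link` elements in addition to `a` is an assumption made for the example.

```python
from html.parser import HTMLParser


class LicenseLinkParser(HTMLParser):
    """Collect the href of every link whose rel attribute includes 'license'."""

    def __init__(self):
        super().__init__()
        self.licenses = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr = dict(attrs)
        # rel is a space-separated list of tokens; its value may be missing.
        rels = (attr.get("rel") or "").lower().split()
        if "license" in rels and attr.get("href"):
            self.licenses.append(attr["href"].strip())


def find_license_links(html):
    parser = LicenseLinkParser()
    parser.feed(html)
    parser.close()
    return parser.licenses


if __name__ == "__main__":
    page = ('<p><a rel="license" '
            'href="http://creativecommons.org/licenses/by/2.5/">CC BY</a> '
            '<a href="http://example.com/">unrelated</a></p>')
    print(find_license_links(page))
```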
  • 35.  
  • 36.  
  • 37.  
  • 38.  
  • 39. Image and Video search
    • Better metadata formats
    • Image and Video search
    • Derivatives search
    • Content commerce search
    • “Live” web search
    • “Management” (desktop, workgroup)
    • Semantic mashups
  • 40. Searching for Derivative Works
  • 41. Creative Commons (0)
  • 42. Creative Commons (0)
  • 43. Creative Commons (0)
  • 44. Creative Commons (0)
  • 45. Derivatives search
    • RDF/XML snippet:
    • <dc:source rdf:resource="http://ccmixter.org/media/files/victor/3385"/>
    • Query like Yahoo! link: search or Technorati Cosmos search
    • source:http://ccmixter.org/media/files/victor/3385
    • “Who sampled this” as the new “who linked to this”
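
The "who sampled this" query amounts to a reverse index over dc:source values. A small Python sketch of the idea, assuming the crawler has already extracted each work's dc:source URIs; the function names and the example.com remix URIs are hypothetical, and only the ccMixter URI comes from the slide.

```python
from collections import defaultdict


def build_source_index(works):
    """works: iterable of (work URI, [dc:source URIs]) pairs.

    Returns a mapping from each source URI to the works derived from it.
    """
    index = defaultdict(set)
    for work_uri, sources in works:
        for source_uri in sources:
            index[source_uri].add(work_uri)
    return index


def who_sampled(index, query):
    """Answer a 'source:<uri>' query with a sorted list of derivative works."""
    prefix = "source:"
    if not query.startswith(prefix):
        raise ValueError("expected a query of the form source:<uri>")
    return sorted(index.get(query[len(prefix):], ()))


if __name__ == "__main__":
    original = "http://ccmixter.org/media/files/victor/3385"
    crawled = [
        ("http://example.com/remix-a", [original]),        # hypothetical remix
        ("http://example.com/remix-b", [original]),        # hypothetical remix
        ("http://example.com/unrelated", ["http://example.com/other"]),
    ]
    index = build_source_index(crawled)
    print(who_sampled(index, "source:" + original))
```

This mirrors how link: search works, with dc:source statements playing the role of hyperlinks.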
  • 46. Content commerce search
    • Transaction costs should be low even if rights are reserved
    • Commercial terms and other commerce described by metadata associated with a work
    • Find me work I can use at a price I can pay for
      • usage rights
      • warranty/paper trail (even if rights not reserved)
    • Reintermediate consumer and creator
  • 47. “Live” web search (feeds)
    • Feeds are explicitly metadata-rich (unlike typical web page)
    • Existing blog search ignores metadata
    • Web search will become more like blog search, vice versa?
  • 48. “Management” (desktop, workgroup)
    • Desktop search (OS-level)
    • Content creation and media player integration
    • XMP
    • Semantic Wikis
  • 49. Semantic mashups
  • 50. Issues for Semantic Search on the Public Web
    • Metadata quality
    • Trust
    • Scalability
    • Usability
    • Compatibility
    • Critical mass
    • State of the art IR works very well – high expectations!
  • 51.
      • Semantic Search on the Public Web with Creative Commons
      • 2006.03.07
      • Mike Linksvayer
      • Questions, feedback, flames:
      • [email_address]
      • http://developer.creativecommons.org