Commodity Semantic Search: A Case Study of DiscoverEd
Published in: Technology, Education
  • Good afternoon. My name is Nathan Yergler, and I'm Chief Technology Officer at Creative Commons. This afternoon I'm going to talk about DiscoverEd, a semantically enhanced search engine for education that we've been working on. It's built on commodity hardware and open source tools, and the software can be used for other domains. I'm going to talk about some approaches we tried and rejected, and give you some information on tools you can use to build your own semantic search without investing in your own server farm.
  • Transcript

    1. Commodity Semantic Search: A Case Study of DiscoverEd. Nathan R. Yergler, Creative Commons. Semantic Technology Conference, 24 June 2010
    2. share, reuse, and remix— legally
    3. Creative Commons provides legal and technical tools that make sharing easy, legal, and scalable.
    4.  
    5.  
    6. <a href="http://creativecommons.org/licenses/by/3.0/" rel="license">Attribution 3.0 Unported</a>
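As a rough illustration of how a crawler might pick up the license link shown on this slide, here is a minimal Python sketch using only the standard library. The class and function names (`LicenseLinkParser`, `find_license_links`) are hypothetical and are not taken from DiscoverEd. It handles only plain `rel="license"` links, not RDFa in general.

```python
from html.parser import HTMLParser


class LicenseLinkParser(HTMLParser):
    """Collect href values of <a> and <link> tags marked rel="license"."""

    def __init__(self):
        super().__init__()
        self.licenses = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated tokens, e.g. "license nofollow".
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        href = attr_map.get("href")
        if "license" in rel_tokens and href:
            self.licenses.append(href.strip())


def find_license_links(html):
    """Return the license URLs asserted by a page, in document order."""
    parser = LicenseLinkParser()
    parser.feed(html)
    return parser.licenses


# The markup from the slide, used as sample input.
SAMPLE = (
    '<a href="http://creativecommons.org/licenses/by/3.0/" rel="license">'
    "Attribution 3.0 Unported</a>"
)

if __name__ == "__main__":
    print(find_license_links(SAMPLE))
```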
    7. <rdf:RDF xmlns:cc='http://creativecommons.org/ns#'
                xmlns:foaf='http://xmlns.com/foaf/0.1/'
                xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
                xmlns:dc='http://purl.org/dc/elements/1.1/'
                xmlns:dcq='http://purl.org/dc/terms/'>
         <cc:License rdf:about="http://creativecommons.org/licenses/by/3.0/">
           <cc:permits rdf:resource="http://creativecommons.org/ns#DerivativeWorks"/>
           <cc:permits rdf:resource="http://creativecommons.org/ns#Distribution"/>
           <cc:permits rdf:resource="http://creativecommons.org/ns#Reproduction"/>
           <cc:requires rdf:resource="http://creativecommons.org/ns#Notice"/>
           <cc:requires rdf:resource="http://creativecommons.org/ns#Attribution"/>
           <cc:legalcode rdf:resource="http://creativecommons.org/licenses/by/3.0/legalcode"/>
           <dcq:hasVersion>3.0</dcq:hasVersion>
           <foaf:logo rdf:resource="http://i.creativecommons.org/l/by/3.0/80x15.png"/>
           <foaf:logo rdf:resource="http://i.creativecommons.org/l/by/3.0/88x31.png"/>
           <cc:licenseClass rdf:resource="http://creativecommons.org/license/"/>
           <dc:creator rdf:resource="http://creativecommons.org"/>
         </cc:License>
       </rdf:RDF>
    8. CC Rights Expression Language
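To make the license description above more concrete, the following Python sketch reads the `cc:permits` and `cc:requires` statements out of an abbreviated copy of that RDF/XML. It is a simplified walk over the XML with the standard library, not a full RDF parser, and the function name `describe_license` is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

CC = "http://creativecommons.org/ns#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Abbreviated copy of the license description shown on the slide.
LICENSE_RDF = """
<rdf:RDF xmlns:cc="http://creativecommons.org/ns#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <cc:License rdf:about="http://creativecommons.org/licenses/by/3.0/">
    <cc:permits rdf:resource="http://creativecommons.org/ns#DerivativeWorks"/>
    <cc:permits rdf:resource="http://creativecommons.org/ns#Distribution"/>
    <cc:permits rdf:resource="http://creativecommons.org/ns#Reproduction"/>
    <cc:requires rdf:resource="http://creativecommons.org/ns#Notice"/>
    <cc:requires rdf:resource="http://creativecommons.org/ns#Attribution"/>
  </cc:License>
</rdf:RDF>
"""


def describe_license(rdf_xml):
    """Map each license URI to the short names of what it permits and requires."""
    root = ET.fromstring(rdf_xml.strip())
    result = {}
    for lic in root.iter(f"{{{CC}}}License"):
        uri = lic.get(f"{{{RDF}}}about")

        def terms(prop):
            # Keep only the fragment after '#', e.g. "Attribution".
            return sorted(
                el.get(f"{{{RDF}}}resource").rsplit("#", 1)[-1]
                for el in lic.findall(f"{{{CC}}}{prop}")
            )

        result[uri] = {"permits": terms("permits"), "requires": terms("requires")}
    return result


if __name__ == "__main__":
    print(describe_license(LICENSE_RDF))
```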
    9. CC licenses are based on international copyright law
    10. There are hundreds of millions of pieces of CC-licensed content on the web
    11. OER
        • “Open Educational Resources”
        • Learning materials that are freely available to use, remix, and redistribute.
        • Wide variety of formats, content types, and audiences
        • CC licenses make this content interoperable
    15. But how do you find OER you’re looking for?
    16.  
    17. OER Search == CC Search++
        • Similarities
            • No central registry or repository
            • It's up to publishers to label their works
        • And additional interesting problems
            • Differing views on what makes it “Educational”
            • Additional facets – subject, language, etc
    20. A Model for OER Search
        • Curators identify educational resources
            • Curators optionally add metadata
            • A Curator may also be the Publisher
            • Or a Curator may add metadata to someone else’s resources
    23.  
    24.  
    25. A Model for OER Search (2)
        • Ingest resource lists, metadata via RSS/Atom feeds, OAI-PMH
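The feed-ingest step on this slide could look something like the Python sketch below, which pulls resource URLs out of an Atom or RSS 2.0 document with the standard library. The function name `resource_urls` and the example.org sample feeds are hypothetical. OAI-PMH harvesting and metadata extraction are left out.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical curator feeds, used only as sample input.
ATOM_FEED = """
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example curator feed</title>
  <entry>
    <title>Intro to Biology</title>
    <link href="http://example.org/courses/biology"/>
  </entry>
  <entry>
    <title>Algebra I</title>
    <link rel="alternate" href="http://example.org/courses/algebra"/>
    <link rel="edit" href="http://example.org/admin/algebra"/>
  </entry>
</feed>
"""

RSS_FEED = """
<rss version="2.0">
  <channel>
    <title>Example curator feed</title>
    <item><title>Chemistry</title><link>http://example.org/courses/chemistry</link></item>
  </channel>
</rss>
"""


def resource_urls(feed_xml):
    """Return the resource URLs listed in an Atom or RSS 2.0 feed document."""
    root = ET.fromstring(feed_xml.strip())
    urls = []
    if root.tag == ATOM + "feed":
        for entry in root.iter(ATOM + "entry"):
            for link in entry.findall(ATOM + "link"):
                # An Atom link with no rel attribute defaults to "alternate".
                if link.get("rel", "alternate") == "alternate" and link.get("href"):
                    urls.append(link.get("href"))
    else:
        # RSS 2.0 layout: rss > channel > item > link
        for item in root.iter("item"):
            link = item.findtext("link")
            if link:
                urls.append(link.strip())
    return urls


if __name__ == "__main__":
    print(resource_urls(ATOM_FEED) + resource_urls(RSS_FEED))
```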
    26. Curators & Feeds
    27. Two Prototypes
        • Google CSE
        • Nutch
    29. Initial effort: Google CSE
        • Google Custom Search Engine allows you to “create a search engine for a website or a collection of interesting websites.”
            • Define resource patterns for inclusion
            • Optionally include annotations – facets and labels
        • Python scripts to consume resource lists
        • Output XML suitable for Google CSE
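The original scripts are not shown in the talk, so the following is only a guess at their general shape: a Python sketch that turns a list of resource URL patterns into an annotations document. It assumes the `Annotations` / `Annotation about="..."` / `Label name="..."` layout of Google CSE annotation files. The label name and URLs are made up for the example.

```python
import xml.etree.ElementTree as ET


def build_annotations(url_patterns, label):
    """Build an annotations XML string that attaches one label to each pattern."""
    root = ET.Element("Annotations")
    for pattern in url_patterns:
        annotation = ET.SubElement(root, "Annotation", {"about": pattern})
        ET.SubElement(annotation, "Label", {"name": label})
    return ET.tostring(root, encoding="unicode")


# Hypothetical resource patterns and label name.
PATTERNS = ["http://example.org/courses/*", "http://example.edu/oer/biology.html"]

if __name__ == "__main__":
    print(build_annotations(PATTERNS, "oer_example"))
```

Emitting one `Annotation` per individual resource is the approach that the next slide reports as scaling poorly.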
    32. Scaling with CSE
        • Lists of individual resources did not scale well
        • Labels and Facets worked best with fixed, limited vocabulary
        • License-filtered search unavailable
    35. Nutch-based Prototype
        • Nutch + Jena triple store
        • Simple scripts for generating seeds from the store
        • IndexingFilter plugin for injecting metadata into Nutch index
        • QueryFilter plugins for field-specific searches
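The seed-generation step can be sketched as follows. This is an illustration only: plain Python tuples stand in for the Jena triple store, the function names are hypothetical, and the sample triples are invented. It relies on the fact that a Nutch seed list is a plain text file with one URL per line.

```python
# (subject, predicate, object) tuples standing in for statements in the store.
TRIPLES = [
    ("http://example.org/courses/biology", "http://purl.org/dc/terms/subject", "biology"),
    ("http://example.org/courses/biology", "http://purl.org/dc/terms/language", "en"),
    ("http://example.org/courses/algebra", "http://purl.org/dc/terms/subject", "math"),
    ("_:blank1", "http://purl.org/dc/terms/subject", "ignored"),
]


def seed_urls(triples):
    """Return each distinct http(s) subject URI once, in first-seen order."""
    seen = set()
    seeds = []
    for subject, _predicate, _obj in triples:
        if subject.startswith(("http://", "https://")) and subject not in seen:
            seen.add(subject)
            seeds.append(subject)
    return seeds


def seed_file_text(triples):
    """Format the seeds one URL per line, as a plain-text seed list."""
    return "".join(url + "\n" for url in seed_urls(triples))


if __name__ == "__main__":
    print(seed_file_text(TRIPLES), end="")
```

Because only resources named by curators become seeds, the crawl stays narrowly directed.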
    39. DiscoverEd (Nutch)
    40.  
    41. Prototype Results
        • Curator model allows for very directed crawl
            • Low cost, not very resource intensive
        • Scale
            • Flexibly filter on predicate values
        • Limitations
            • Provenance for curator metadata
            • Predicate filters had to be “hand-crafted”
    43. Current DiscoverEd Work
    44. “We're Open”
        • Education is our test domain, but the tool can be generally useful
        • Other organizations have expressed interest in using the DiscoverEd software
        • Making code available on Gitorious, http://gitorious.org/discovered
    47. Provenance
        • Initial work complete on storing provenance for curator metadata
        • Working on integrating this with the query front end now
        • When complete, will allow users to
            • Limit their query to specific curators
            • Exclude curators from their query
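One way to picture the two planned query options is the Python sketch below, where every metadata statement carries the curator that asserted it. The `Statement` type, the `filter_by_curator` function, and the sample data are assumptions for illustration. The talk does not describe how DiscoverEd actually stores or queries provenance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    resource: str
    predicate: str
    value: str
    curator: str  # who asserted this metadata, i.e. its provenance


def filter_by_curator(statements, include=None, exclude=None):
    """Keep statements from `include` curators (all if None), minus `exclude`."""
    included = set(include) if include is not None else None
    excluded = set(exclude or ())
    return [
        s
        for s in statements
        if (included is None or s.curator in included) and s.curator not in excluded
    ]


# Hypothetical statements from two curators about the same resource.
SUBJECT = "http://purl.org/dc/terms/subject"
STATEMENTS = [
    Statement("http://example.org/courses/biology", SUBJECT, "biology", "curator-a"),
    Statement("http://example.org/courses/biology", SUBJECT, "life science", "curator-b"),
]

if __name__ == "__main__":
    print(filter_by_curator(STATEMENTS, include=["curator-a"]))
    print(filter_by_curator(STATEMENTS, exclude=["curator-a"]))
```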
    51. Field Queries
        • Need to map predicate URIs to “human” names
        • Currently map at query time
            • Index using the URI
            • Map specific terms to URIs
            • For example, “tag” to “http://purl.org/dc/terms/subject”
            • Requires a Filter for every predicate
        • Landing work now to map at index time
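The query-time mapping described on this slide might be sketched in Python as follows. Only the “tag” mapping comes from the slide. The “language” entry and the `parse_query` function are assumptions added to show more than one field. The real implementation lives in Nutch QueryFilter plugins, which this sketch does not reproduce.

```python
# Human-friendly field names mapped to the predicate URIs used in the index.
FIELD_TO_PREDICATE = {
    "tag": "http://purl.org/dc/terms/subject",  # mapping given on the slide
    "language": "http://purl.org/dc/terms/language",  # assumed, for illustration
}


def parse_query(query):
    """Split a query into free-text terms and (predicate URI, value) filters."""
    terms = []
    filters = []
    for token in query.split():
        field, sep, value = token.partition(":")
        if sep and value and field in FIELD_TO_PREDICATE:
            filters.append((FIELD_TO_PREDICATE[field], value))
        else:
            # Unknown prefixes (including URLs) are treated as ordinary terms.
            terms.append(token)
    return terms, filters


if __name__ == "__main__":
    print(parse_query("photosynthesis tag:biology language:en"))
```

A single lookup table like this avoids hand-writing one filter per predicate, which is the limitation the slide points out.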
    56. Information for Curators
        • We want publishers/curators to publish more linked data
        • Need a feedback loop to help drive this
        • Working on “dashboard” to see what's indexed, how, etc.
        • Second phase: documentation, tools to help improve their RDFa
    60. DiscoverEd Team & Supporters
        • Ahrash Bissell
        • Asheesh Laroia
        • Raphael Krut-Landau
        • Alex Kozak
        • Christine Geith
        • Karen Vignare

        • Hewlett Foundation
        • Bill & Melinda Gates Foundation
        • OSI
        • Michigan State University
        • AgShare
    70. Cloud Tools for Semantic Search
        • Yahoo! BOSS
            • Retrieve RDF extracted from pages
        • Google CSE
            • Filter using structured data (Page Maps, RDFa)
            • Customize display using structured data
    72. Conclusion
        • Cloud tools can help build simple semantic search quickly
        • Nutch provides a powerful, extensible platform for prototyping search tools
        • DiscoverEd software demonstrates semantic search without large hardware investment
    75. http://wiki.creativecommons.org/DiscoverEd [email_address] @nyergler (identi.ca, twitter)
