Publishing Technology Online Forum - Engineering the semantic web


Priya Parvatikar from Publishing Technology demonstrates how the company is implementing semantic web technologies on the online publishing website of new publisher GSE Research.

Published in: Education, Technology

  1. Engineering technology to deliver the revolution
     Presentation to Online Publishers’ forum
       • November 29, 2011
       • Priya Parvatikar, Technical Architect
  2. About this talk
       • Features of the GSE Research website
       • Overview of how the features have been achieved
       • ‘Under the hood’ look at the technology
  3. Improved search - Enhancing auto-suggest
  4. Using taxonomy information for “did you mean”
  5. Boosting relevant results
  6. Guiding the user through facets
  7. Guiding the user through suggestions
  8. Concept homepages
  9. Showing concepts on item homepages
  10. Suggest related items
  11. GSE Research – How?
       • Built using the pub2web platform
       • MetaStore used for metadata storage
       • Apache Solr used for search indexing
       • Semantic enrichment of content
       • Apache UIMA used for entity extraction
  12. MetaStore
       • RDF triplestore for storing metadata
       • Agnostic to the type of data being stored
       • Able to store rich and very granular data
       • Flexible to cater for future data enhancements
       • For the GSE Research site:
           • Content
           • Authors
           • Taxonomy concepts and relations
           • Federation of data from external datasets
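
As a rough illustration of why a triplestore is agnostic to the type of data it holds, the sketch below keeps every fact as a (subject, predicate, object) tuple and queries by pattern. It is not MetaStore's API, which the slides do not show. The `TripleStore` class, the `hasAuthor` predicate, and the `item-1` and `author-1` identifiers are hypothetical, and the `narrower` links assume that the two pollution concepts sit under the broader climate change concept.

```python
class TripleStore:
    """Toy in-memory triple store (illustrative only, not MetaStore)."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        # Content, authors and taxonomy relations all share this one shape.
        self._triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return sorted triples matching the pattern; None is a wildcard."""
        pattern = (subject, predicate, obj)
        return sorted(
            t for t in self._triples
            if all(want is None or want == got for want, got in zip(pattern, t))
        )


store = TripleStore()
ROOT = "Climate change, pollution & environmental impacts"
# Assumed hierarchy: both pollution concepts are narrower than ROOT.
store.add(ROOT, "narrower", "Water pollution")
store.add(ROOT, "narrower", "Air pollution")
# Hypothetical content-to-author link, stored the same way as taxonomy data.
store.add("item-1", "hasAuthor", "author-1")

narrower_concepts = [o for _, _, o in store.match(subject=ROOT, predicate="narrower")]
print(narrower_concepts)
```

Because new kinds of facts only need a new predicate, no schema change is required when the data model grows, which is one way to read the "flexible to cater for future data enhancements" point.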
  13. Search
       • Uses enterprise-grade Apache Solr
       • Inbuilt support for rich features
           • Faceted searching
           • Synonyms
           • Stemming
           • Boosting
           • ‘More like this’
           • ‘Did you mean’
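
To make a few of the listed Solr features concrete, here is a minimal sketch of how a search request combining boosting, facets and spelling suggestions might be assembled. `defType`, `qf`, `facet`, `facet.field` and `spellcheck` are standard Solr request parameters; the field names (`title`, `abstract`, `concept`, `author`) and boost values are assumptions, not the GSE Research schema, and `spellcheck=true` only has an effect if a spellcheck component is configured on the request handler.

```python
from urllib.parse import urlencode, parse_qs


def build_solr_query(user_query, facet_fields, field_boosts):
    """Build a query string for a Solr select handler (field names are hypothetical)."""
    params = [
        ("q", user_query),
        ("defType", "edismax"),  # query parser that understands per-field boosts
        ("qf", " ".join(f"{field}^{boost}" for field, boost in field_boosts.items())),
        ("facet", "true"),       # ask for facet counts alongside the results
        ("spellcheck", "true"),  # request "did you mean" style suggestions
    ]
    # facet.field may be repeated, once per field to facet on.
    params += [("facet.field", field) for field in facet_fields]
    return urlencode(params)


query_string = build_solr_query(
    "water pollution",
    facet_fields=["concept", "author"],
    field_boosts={"title": 3, "abstract": 1},
)
parsed = parse_qs(query_string)
print(parsed["qf"], parsed["facet.field"])
```

The resulting string would be appended to the select URL of whichever Solr core holds the index.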
  14. Content for GSE Research website
       • Provided by GSE
           • Content XML
           • Taxonomy prepared by GSE
       • Taxonomy enhancement
           • Concepts mapped to Library of Congress classifications
           • Taxonomy automatically enhanced with terms from this classification
  15. GSE Research taxonomy - example
       • For example, the GSE taxonomy contains
           • Climate change, pollution & environmental impacts
           • Water pollution
           • Air pollution
       • After enhancing with Library of Congress classification
           • Climate change, pollution & environmental impacts
           • Water pollution – variants: aquatic pollution, water contamination
           • Marine pollution – variants: ocean pollution, sea pollution
           • Oil pollution of water – variants: petroleum pollution of water
           • Estuarine pollution – variants: estuary pollution
           • Air pollution
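
One way the variant terms from this example could feed search is a lookup from any variant to its preferred concept label, so that a query for "ocean pollution" can be steered to "Marine pollution". The sketch below uses only the labels shown on the slide; the function names are hypothetical, and how the site actually wires variants into auto-suggest or "did you mean" is not stated in the deck.

```python
# Preferred labels and their variants, copied from the slide example.
VARIANTS = {
    "Water pollution": ["aquatic pollution", "water contamination"],
    "Marine pollution": ["ocean pollution", "sea pollution"],
    "Oil pollution of water": ["petroleum pollution of water"],
    "Estuarine pollution": ["estuary pollution"],
    "Air pollution": [],
}


def build_variant_index(variants):
    """Map every lowercased label or variant to its preferred label."""
    index = {}
    for preferred, alternatives in variants.items():
        index[preferred.lower()] = preferred
        for alternative in alternatives:
            index[alternative.lower()] = preferred
    return index


def preferred_label(term, index):
    """Return the preferred concept label for a term, or None if unknown."""
    return index.get(term.strip().lower())


variant_index = build_variant_index(VARIANTS)
print(preferred_label("Ocean pollution", variant_index))
```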
  16. Content workflow in GSE Research
       [Diagram labels: MetaStore; Search Index; MetaStore Loader; Text mining pipelines; Content; Images; Tables; Authors; Additional concepts; Concepts; External datasets]
  17. Entity extraction for GSE Research content
       • Apache UIMA
           • Architectural framework to manage unstructured data
           • Apache license open-source project
           • OASIS standard
       • Provides
           • Framework
           • Annotators – multiple annotators can be applied in a pipeline
           • Ability to plug in external text-mining services as annotators
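
The idea of applying multiple annotators in a pipeline can be sketched without UIMA itself. The example below is plain Python and does not use or imitate the UIMA API (which is Java-based); the `Annotation` class, `make_dictionary_annotator` and `run_pipeline` are hypothetical names, and simple dictionary matching is an assumed stand-in for whatever text mining the real pipeline performs.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    begin: int   # start offset in the text
    end: int     # end offset (exclusive)
    label: str   # the dictionary term that matched
    kind: str    # which annotator produced it


def make_dictionary_annotator(terms, kind):
    """Return an annotator that marks whole-word, case-insensitive term matches."""
    patterns = [
        (term, re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE))
        for term in terms
    ]

    def annotate(text, annotations):
        for term, pattern in patterns:
            for match in pattern.finditer(text):
                annotations.append(Annotation(match.start(), match.end(), term, kind))

    return annotate


def run_pipeline(text, annotators):
    """Run each annotator in order over a shared annotation list."""
    annotations = []
    for annotate in annotators:
        annotate(text, annotations)
    return sorted(annotations, key=lambda a: (a.begin, a.end))


text = "The Kyoto protocol addresses air pollution."
annotations = run_pipeline(text, [
    make_dictionary_annotator(["Air pollution"], kind="concept"),
    make_dictionary_annotator(["Kyoto protocol"], kind="treaty"),
])
for a in annotations:
    print(a.kind, text[a.begin:a.end])
```

An external text-mining service could be wrapped as one more `annotate` function with the same signature, which mirrors the plug-in point the slide mentions.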
  18. Example of entity extraction
  19. Editorial curation
  20. Future possibilities for GSE Research
       • Extraction of geographical concepts
       • Federation of data from other external datasets, e.g. government datasets
       • Semantic analysis of search queries to deliver better results
  21. Summary
       • Tagging drives discovery
       • Provide multiple routes to content
       • Provide external context to content
       • Start simple and experiment
       • Flexibility of underlying systems is key
  22. Thank you!
  23. Entity extraction for GSE Research content
       [Diagram labels: Climate change, pollution & env impacts; Environmental Treaties; Air pollution; Environmental Treaties; Kyoto protocol; Montreal protocol; Bonn convention; relations: narrower, narrower, sameAs]
