Engineering technology to deliver the revolution
Presentation to Online Publishers’ forum, November 29, 2011
Priya Parvatikar, Technical Architect

About this talk
  • Features of the GSE Research website
  • Overview of how the features have been achieved
  • ‘Under the hood’ look at the technology
Improved search: enhancing auto-suggest
Using taxonomy information for “did you mean”
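One way to picture a taxonomy-driven “did you mean” is to compare the user's query against the taxonomy labels and suggest the closest one. The sketch below is only an illustration under stated assumptions: it uses Python's standard `difflib` as a stand-in for the site's actual suggestion mechanism, and the term list, the `did_you_mean` helper and the 0.8 cutoff are all hypothetical.

```python
import difflib

# Hypothetical sample of taxonomy labels (lowercased); the real taxonomy is larger.
TAXONOMY_TERMS = [
    "water pollution",
    "air pollution",
    "marine pollution",
    "estuarine pollution",
]


def did_you_mean(query, terms=TAXONOMY_TERMS, cutoff=0.8):
    """Return the taxonomy label closest to the query, or None.

    None is returned both for exact matches (nothing to correct) and for
    queries that are not similar enough to any label.
    """
    normalised = query.strip().lower()
    if normalised in terms:
        return None
    matches = difflib.get_close_matches(normalised, terms, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

For instance, `did_you_mean("water polution")` would suggest "water pollution", while a query that already matches a label produces no suggestion.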
Boosting relevant results
Guiding the user through facets
Guiding the user through suggestions
Concept homepages
Showing concepts on item homepages
Suggest related items

GSE Research – How?
  • Built using the pub2web platform
      • MetaStore used for metadata storage
      • Apache Solr used for search indexing
  • Semantic enrichment of content
      • Apache UIMA used for entity extraction

MetaStore
  • RDF triplestore for storing metadata
  • Agnostic to the type of data being stored
  • Able to store rich and very granular data
  • Flexible to cater for future data enhancements
  • For the GSE Research site:
      • Content
      • Authors
      • Taxonomy concepts and relations
      • Federation of data from external datasets
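To make the triplestore idea concrete, here is a minimal in-memory sketch of storing metadata as subject/predicate/object triples and querying them by pattern. It is not MetaStore's API: the `TinyTripleStore` class, the identifiers such as `item:1`, and the predicate names are all hypothetical, chosen only to mirror the kinds of data listed above (content, authors, taxonomy relations).

```python
from collections import namedtuple

Triple = namedtuple("Triple", ["subject", "predicate", "obj"])


class TinyTripleStore:
    """Toy triple store: a set of triples plus wildcard pattern matching."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add(Triple(subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return sorted triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self._triples
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (obj is None or t.obj == obj)
        )


def build_example_store():
    """Populate a store with hypothetical content, author and taxonomy data."""
    store = TinyTripleStore()
    store.add("item:1", "hasAuthor", "author:a-smith")
    store.add("item:1", "hasConcept", "concept:water-pollution")
    store.add("concept:marine-pollution", "broader", "concept:water-pollution")
    return store
```

Because every fact is just another triple, a new kind of data (for example a link to an external dataset) can be added with a new predicate and no schema change, which is the flexibility the slide points to.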

Search
  • Uses enterprise-grade Apache Solr
  • Inbuilt support for rich features
      • Faceted searching
      • Synonyms
      • Stemming
      • Boosting
      • ‘More like this’
      • ‘Did you mean’
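As a rough illustration of how faceting and boosting are requested from Solr, the sketch below builds a query string using standard Solr request parameters (`q`, `defType=edismax`, `qf`, `facet`, `facet.field`). The field names (`title`, `abstract`, `concept`, `author`), the boost weights and the `build_solr_query` helper are assumptions for the example, not the GSE Research configuration.

```python
from urllib.parse import urlencode


def build_solr_query(user_query, facet_fields, boosts):
    """Build a Solr query string with field boosts and facet counts.

    boosts maps a field name to its weight, e.g. {"title": 3} gives title^3.
    """
    params = [
        ("q", user_query),
        ("defType", "edismax"),  # query parser that accepts per-field boosts in qf
        ("qf", " ".join(f"{name}^{weight}" for name, weight in boosts.items())),
        ("facet", "true"),
    ]
    # Solr accepts facet.field repeated once per field to facet on.
    params.extend(("facet.field", name) for name in facet_fields)
    return urlencode(params)


# Hypothetical usage: boost title matches, facet on concept and author.
example_query = build_solr_query(
    "water pollution",
    facet_fields=["concept", "author"],
    boosts={"title": 3, "abstract": 1},
)
```

The resulting string would be appended to a core's select handler URL; the facet counts returned by Solr are what a site can use to drive "guide the user through facets" style navigation.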

Content for GSE Research website
  • Provided by GSE
      • Content XML
      • Taxonomy prepared by GSE
  • Taxonomy enhancement
      • Concepts mapped to Library of Congress classifications
      • Taxonomy automatically enhanced with terms from this classification

GSE Research taxonomy: example
  • For example, the GSE taxonomy contains:
      Climate change, pollution & environmental impacts
          Water pollution
          Air pollution
  • After enhancing with Library of Congress classification:
      Climate change, pollution & environmental impacts
          Water pollution (variants: aquatic pollution, water contamination)
              Marine pollution (variants: ocean pollution, sea pollution)
              Oil pollution of water (variants: petroleum pollution of water)
              Estuarine pollution (variants: estuary pollution)
          Air pollution
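The variants in the example above can be put to work by mapping every variant back to its preferred taxonomy label, so that a search for "sea pollution" finds content tagged "Marine pollution". The sketch below is one possible way to do this; the labels and variants come from the slide, while the `build_variant_index` and `to_preferred` helpers are hypothetical.

```python
# Preferred labels and their variants, taken from the example slide.
VARIANTS = {
    "water pollution": ["aquatic pollution", "water contamination"],
    "marine pollution": ["ocean pollution", "sea pollution"],
    "oil pollution of water": ["petroleum pollution of water"],
    "estuarine pollution": ["estuary pollution"],
    "air pollution": [],
}


def build_variant_index(variants):
    """Map each preferred label and each variant to the preferred label."""
    index = {}
    for preferred, alternatives in variants.items():
        index[preferred] = preferred
        for alternative in alternatives:
            index[alternative] = preferred
    return index


VARIANT_INDEX = build_variant_index(VARIANTS)


def to_preferred(term, index=VARIANT_INDEX):
    """Return the preferred taxonomy label for a term, or None if unknown."""
    return index.get(term.strip().lower())
```

The same mapping could equally be exported as a synonyms list for the search index rather than applied at query time; which side does the expansion is a design choice the slides do not specify.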

Content workflow in GSE Research
  [Diagram. Labels: Content, Images, Tables, Authors, Concepts, Additional concepts, External datasets, Text mining pipelines, MetaStore Loader, MetaStore, Search Index]

Entity extraction for GSE Research content
  • Apache UIMA
      • Architectural framework to manage unstructured data
      • Open-source project under the Apache license
      • OASIS standard
  • Provides
      • Framework
      • Annotators: multiple annotators can be applied in a pipeline
      • Ability to plug in external text-mining services as annotators
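The annotator-pipeline idea can be sketched in a few lines: each annotator reads the document text and appends typed annotations, and annotators are applied one after another. This is a conceptual illustration in Python, not the UIMA API (UIMA itself is a Java framework); the `Document`, `Annotation` and `DictionaryAnnotator` names and the term lists are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    kind: str   # e.g. "concept" or "place"
    begin: int  # start offset in the document text
    end: int    # end offset (exclusive)
    text: str   # the covered text


@dataclass
class Document:
    text: str
    annotations: list = field(default_factory=list)


class DictionaryAnnotator:
    """Marks every case-insensitive occurrence of a known term."""

    def __init__(self, kind, terms):
        self.kind = kind
        self.terms = [term.lower() for term in terms]

    def process(self, doc):
        # Assumes lowercasing preserves string length (true for ASCII text).
        lowered = doc.text.lower()
        for term in self.terms:
            start = lowered.find(term)
            while start != -1:
                end = start + len(term)
                doc.annotations.append(
                    Annotation(self.kind, start, end, doc.text[start:end])
                )
                start = lowered.find(term, end)


def run_pipeline(doc, annotators):
    """Apply each annotator in order; later ones see earlier annotations."""
    for annotator in annotators:
        annotator.process(doc)
    return doc
```

Any object with a `process(doc)` method fits into `run_pipeline`, so a wrapper around an external text-mining service could sit in the same list as the dictionary annotators.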
Example of entity extraction
Editorial curation

Future possibilities for GSE Research
  • Extraction of geographical concepts
  • Federation of data from other external datasets, e.g. government datasets
  • Semantic analysis of search queries to deliver better results

Summary
  • Tagging drives discovery
  • Provide multiple routes to content
  • Provide external context to content
  • Start simple and experiment
  • Flexibility of underlying systems is key

Thank you!

Navigating the semantic web for publishers
