Drupal and Apache Stanbol. What if you could reliably do autotagging?



My presentation on Drupal and Apache Stanbol integration at DrupalCamp Arad 2012 - Romania. Want to talk about this? Find me at http://webikon.com, Twitter: @gabidrg.



  1. Gabriel Dragomir: Drupal and Apache Stanbol. What if you could reliably do autotagging? (Wednesday, January 23, 13)
  2. Semantic content is the key!
     • Most organizations need to organize, analyze and relate huge amounts of unstructured, scattered textual data
     • E.g. universities check theses for plagiarism
     • SNSPA: we adapted the WebFerret plagiarism checker for Romanian (http://homepages.stca.herts.ac.uk/~pdgroup/)
  3. Semantic content is the key!
     • Web Ferret identifies potential sources from the Internet and from an institutional repository
     • CONS: desktop based, no REST web services
     • Cannot detect plagiarism by translation
  4. Semantic content is the key!
     • Here comes Apache Stanbol
     • A new approach: semantic analysis of documents
     • Extract citations in proximity
     • Search the web for documents with a similar citation structure
  5. From IKS to Apache Stanbol
     • IKS: Interactive Knowledge Stack for small to medium CMS providers (EU funding)
     • An open source software stack written in Java
     • Goal: extract and process semantic data from documents
     • Project undergoing incubation at Apache Foundation
     • http://stanbol.apache.org
  6. Service oriented architecture
     • Stanbol is designed to offer service oriented integration
     • RESTful web service API returning RDF or JSON/JSON-LD
     • Each component exposes an endpoint independently
     • Open Services Gateway initiative (OSGi) compliant via Apache Felix and Apache Sling
     • Remote component management
  7. Implementation
     • OSGi layer: Apache Felix and Apache Sling
     • Build environment: Apache Maven
     • RDF framework: Apache Clerezza
     • Triples store, reasoning engine: Apache Jena
     • Indexing and semantic search: Apache Solr
     • Content analysis/metadata extraction: Apache Tika
     • Natural language processing: Apache OpenNLP
  8. Architecture
  9. Components
     • Semantic layer: Enhancer, EntityHub, ContentHub
     • Enhancement engines: internal, 3rd party
     • User interfaces
     • Knowledge integration
     • Storage integration
  10. Content enhancement. Examples:
      • Retrieve additional metadata for a piece of content
      • Identify the language of a text
      • Extract entities (persons, places, organizations)
      • Create annotations to external sources
      • Use 3rd party services for named entity recognition
  11. Drupal meets Stanbol
      • Drupal supports RDFa, allowing semantic annotations
      • Taxonomy system allows for complex annotation
      • Fieldable taxonomy terms allow for storage of complex semantic data
  12. User scenarios
      • Assisted semantic tagging: autotagging
      • Content enrichment with semantically related information (documents, factual data, images etc.)
      • Tag as you type: dynamic annotation of text in editors
      • Autocomplete indexes: FAST with Apache Solr
  13. Autotagging with Stanbol
      • Given a piece of content, extract mentions of places, persons, organizations or other entities
      • Named entity recognition (NER)
      • OpenCalais and Zemanta provide similar functionality, but with limited free requests and limited languages
      • Stanbol does it for free
      • Multilingual: may be trained for any language
  14. How it works
      • REST service: Apache Stanbol Enhancer
      • Returns JSON-LD, RDF/XML, RDF/JSON etc.
      • Example request:
          curl -X POST -H "Accept: text/turtle" -H "Content-type: text/plain" \
               --data "The Stanbol enhancer can detect famous cities such as Paris and people such as Barack Obama." \
               http://dev.iks-project.eu:8081/enhancer
      • JSON-LD (JavaScript Object Notation for Linked Data): a human readable and simple linked data transport format
  15. How it works
      • JSON-LD is included in Drupal 8 core
      • Creates a description of the data as a “context” data structure
      • Context: links object properties to concepts in an ontology
      • Allows for values to be coerced to a certain set or language
  16. How it works
      {
        "@context": {
          "name": "http://xmlns.com/foaf/0.1/name",
          "homepage": {
            "@id": "http://xmlns.com/foaf/0.1/workplaceHomepage",
            "@type": "@id"
          },
          "person": "http://xmlns.com/foaf/0.1/Person"
        },
        "@id": "http://www.barackobama.com",
        "@type": "person",
        "name": "Barack Obama",
        "homepage": "http://www.whitehouse.gov/"
      }
  17. How it works (the same JSON-LD document as on slide 16, with a note)
      • FOAF: “Friend of a friend”, an RDF ontology describing people, their relations and activities
  18. {
        "@context": {
          (...)
          "foaf": "http://xmlns.com/foaf/0.1/",
          (...)
        "@subject": [
          {
            "@subject": "http://dbpedia.org/resource/Barack_Obama",
            "@type": [ "dbp-ont:OfficeHolder", "dbp-ont:Person", "foaf:Person", "owl:Thing" ],
            (...)
            "foaf:depiction": [
              "http://upload.wikimedia.org/wikipedia/en/e/e9/Official_portrait_of_Barack_Obama.jpg",
              "http://upload.wikimedia.org/wikipedia/en/thumb/e/e9/Official_portrait_of_Barack_Obama.jpg/200px-Official_portrait_of_Barack_Obama.jpg"
            ],
            "foaf:homepage": [ "http://www.whitehouse.gov/", "http://www.barackobama.com/" ],
  19. How it works. Source: blog.iks-project.eu
  20. How it works
      • On the Drupal side we only have to parse the response
      • Map JSON-LD properties to entity fields
      • Use Drupal’s native RDFa capability to render semantic markup
      • Use your imagination and build semantic content
  21. Quick demo
      • Semantic CMS: Evo42 communications, early adopter integration of Drupal with Stanbol
      • Rene Kapusta: https://github.com/evo42/Semantic-CMS
      • Drupal contributor, Aloha Editor core developer
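
The curl call on slide 14 can also be issued from a script. The following is a minimal sketch in Python using only the standard library. The endpoint is the one quoted on the slide and may no longer be reachable; the function names `build_enhancer_request` and `enhance` are made up for this illustration and are not part of Stanbol.

```python
import urllib.request

# Endpoint quoted on slide 14; it may no longer be online.
ENHANCER_URL = "http://dev.iks-project.eu:8081/enhancer"


def build_enhancer_request(text, endpoint=ENHANCER_URL, accept="text/turtle"):
    """Build (but do not send) the same POST request as the curl example."""
    return urllib.request.Request(
        endpoint,
        data=text.encode("utf-8"),  # plain text body, as with curl --data
        headers={"Accept": accept, "Content-type": "text/plain"},
        method="POST",
    )


def enhance(text, **kwargs):
    """Send the request and return the response body (needs a running Stanbol)."""
    request = build_enhancer_request(text, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only inspects the request; no network traffic happens here.
    req = build_enhancer_request("The Stanbol enhancer can detect famous cities such as Paris.")
    print(req.get_method(), req.full_url)
    print(req.header_items())
```

Building the request separately from sending it makes it possible to inspect headers and body without a live server. To try it against a local Stanbol instance, pass its URL as `endpoint`.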
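
Slides 15 to 17 describe the JSON-LD "context" as a link between short property names and concepts in an ontology. The sketch below shows that idea on the document from slide 16. It is a deliberately simplified lookup, not the full JSON-LD expansion algorithm, and `expand_terms` is a hypothetical helper name.

```python
import json

# The JSON-LD document shown on slide 16.
DOC = json.loads("""
{
  "@context": {
    "name": "http://xmlns.com/foaf/0.1/name",
    "homepage": {
      "@id": "http://xmlns.com/foaf/0.1/workplaceHomepage",
      "@type": "@id"
    },
    "person": "http://xmlns.com/foaf/0.1/Person"
  },
  "@id": "http://www.barackobama.com",
  "@type": "person",
  "name": "Barack Obama",
  "homepage": "http://www.whitehouse.gov/"
}
""")


def expand_terms(doc):
    """Replace short terms with the IRIs their @context assigns to them.

    Simplified: handles only string definitions and {"@id": ...} definitions
    in a flat, single-level document.
    """
    context = doc.get("@context", {})

    def iri(term):
        definition = context.get(term, term)
        if isinstance(definition, dict):
            return definition.get("@id", term)
        return definition

    expanded = {}
    for key, value in doc.items():
        if key == "@context":
            continue  # the context is metadata, not data
        if key == "@type":
            # Type values are terms too, so resolve them through the context.
            types = value if isinstance(value, list) else [value]
            expanded[key] = [iri(t) for t in types]
        elif key.startswith("@"):
            expanded[key] = value  # other keywords such as @id stay as they are
        else:
            expanded[iri(key)] = value
    return expanded


if __name__ == "__main__":
    print(json.dumps(expand_terms(DOC), indent=2))
```

After expansion, "name" and "homepage" become FOAF property IRIs, which is what lets different systems agree on what the fields mean.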
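
Slide 20 says that on the Drupal side the work comes down to parsing the response and mapping JSON-LD properties to entity fields. A real integration would do this in a Drupal module written in PHP; the Python sketch below only illustrates the mapping step. The sample data is a trimmed version of the fragment on slide 18, closed off so that it parses. The field names in `FIELD_MAP`, and the idea of deriving a tag name from the DBpedia URI, are assumptions made for this example.

```python
# JSON-LD property -> hypothetical Drupal field name on a taxonomy term.
FIELD_MAP = {
    "foaf:homepage": "field_homepage",
    "foaf:depiction": "field_depiction",
}

# Trimmed sample shaped like the fragment on slide 18.
SAMPLE = {
    "@subject": [
        {
            "@subject": "http://dbpedia.org/resource/Barack_Obama",
            "@type": ["dbp-ont:OfficeHolder", "dbp-ont:Person", "foaf:Person", "owl:Thing"],
            "foaf:homepage": ["http://www.whitehouse.gov/", "http://www.barackobama.com/"],
        }
    ]
}


def as_list(value):
    """JSON-LD allows a single value or a list; normalise to a list."""
    return value if isinstance(value, list) else [value]


def entities_to_terms(response, field_map=FIELD_MAP):
    """Turn each entity in the response into a dict of term fields."""
    terms = []
    for entity in as_list(response.get("@subject", [])):
        uri = entity.get("@subject")
        if not uri:
            continue  # skip nodes that carry no entity URI
        term = {
            # Assumption: use the last URI segment as a readable tag name.
            "name": uri.rstrip("/").rsplit("/", 1)[-1].replace("_", " "),
            "uri": uri,
            "types": as_list(entity.get("@type", [])),
        }
        for prop, field in field_map.items():
            if prop in entity:
                term[field] = as_list(entity[prop])
        terms.append(term)
    return terms


if __name__ == "__main__":
    for term in entities_to_terms(SAMPLE):
        print(term)
```

Keeping the property-to-field mapping in one table makes it easy to adjust when the target vocabulary has different fields.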