Drupal and Apache Stanbol. What if you could reliably do autotagging?

My presentation on integrating Drupal with Apache Stanbol at DrupalCamp Arad 2012, Romania. Want to talk about this? Find me at http://webikon.com or on Twitter: @gabidrg.

  1. Gabriel Dragomir: Drupal and Apache Stanbol. What if you could reliably do autotagging? Wednesday, January 23, 13
  2. Semantic content is the key! Most organizations need to organize, analyze, and relate huge amounts of textual, unstructured, scattered data. For example, universities check theses for plagiarism. SNSPA: we adapted the WebFerret plagiarism checker for Romanian (http://homepages.stca.herts.ac.uk/~pdgroup/).
  3. Semantic content is the key! Web Ferret identifies potential sources on the Internet and in an institutional repository. Cons: it is desktop-based, offers no REST web services, and cannot detect plagiarism by translation.
  4. Semantic content is the key! Here comes Apache Stanbol, with a new approach: semantic analysis of documents; extract citations in proximity; search the web for documents with a similar citation structure.
  5. From IKS to Apache Stanbol. IKS (Interactive Knowledge Stack) for small to medium CMS providers, funded by the EU. An open source software stack written in Java. Goal: extract and process semantic data from documents. The project is undergoing incubation at the Apache Software Foundation: http://stanbol.apache.org
  6. Service-oriented architecture. Stanbol is designed to offer service-oriented integration: a RESTful web service API returning RDF or JSON/JSON-LD; each component exposes its own endpoint independently; compliant with OSGi (Open Services Gateway initiative) via Apache Felix and Apache Sling; remote component management.
  7. Implementation. OSGi layer: Apache Felix and Apache Sling. Build environment: Apache Maven. RDF framework: Apache Clerezza. Triple store, reasoning engine: Apache Jena. Indexing and semantic search: Apache Solr. Content analysis/metadata extraction: Apache Tika. Natural language processing: Apache OpenNLP.
  8. Architecture
  9. Components. Semantic layer: Enhancer, EntityHub, ContentHub. Enhancement engines: internal and third-party. User interfaces. Knowledge integration. Storage integration.
  10. Content enhancement. Examples: retrieve additional metadata for a piece of content; identify the language of a text; extract entities (persons, places, organizations); create annotations linking to external sources; use third-party services for named entity recognition.
  11. Drupal meets Stanbol. Drupal supports RDFa, allowing semantic annotations. The taxonomy system allows for complex annotation. Fieldable taxonomy terms allow for storage of complex semantic data.
  12. User scenarios. Assisted semantic tagging: autotagging. Content enrichment with semantically related information (documents, factual data, images, etc.). Tag as you type: dynamic annotation of text in editors. Autocomplete indexes: fast, with Apache Solr.
  13. Autotagging with Stanbol. Given a piece of content, extract mentions of places, persons, organizations, or other entities: named entity recognition (NER). OpenCalais and Zemanta provide similar functionality, but with a limited number of free requests and limited language support. Stanbol does it for free. Multilingual: it may be trained for any language.
  14. How it works. REST service: the Apache Stanbol Enhancer returns JSON-LD, RDF/XML, RDF/JSON, etc.
      curl -X POST -H "Accept: text/turtle" -H "Content-type: text/plain" --data "The Stanbol enhancer can detect famous cities such as Paris and people such as Barack Obama." http://dev.iks-project.eu:8081/enhancer
      JSON-LD (JavaScript Object Notation for Linked Data) is a human-readable, simple linked data transport format.
  15. How it works. JSON-LD is included in Drupal 8 core. It describes the data through a “context” data structure: the context links object properties to concepts in an ontology and allows values to be coerced to a specific type or language.
  16. How it works
      {
        "@context": {
          "name": "http://xmlns.com/foaf/0.1/name",
          "homepage": {
            "@id": "http://xmlns.com/foaf/0.1/workplaceHomepage",
            "@type": "@id"
          },
          "person": "http://xmlns.com/foaf/0.1/Person"
        },
        "@id": "http://www.barackobama.com",
        "@type": "person",
        "name": "Barack Obama",
        "homepage": "http://www.whitehouse.gov/"
      }
  17. How it works. The same JSON-LD example as slide 16, annotated: FOAF (“Friend of a Friend”) is an RDF ontology describing people, their relations, and their activities.
  18. {
        "@context": {
          (...)
          "foaf": "http://xmlns.com/foaf/0.1/",
          (...)
        "@subject": [
          {
            "@subject": "http://dbpedia.org/resource/Barack_Obama",
            "@type": [ "dbp-ont:OfficeHolder", "dbp-ont:Person", "foaf:Person", "owl:Thing" ],
            (...)
            "foaf:depiction": [
              "http://upload.wikimedia.org/wikipedia/en/e/e9/Official_portrait_of_Barack_Obama.jpg",
              "http://upload.wikimedia.org/wikipedia/en/thumb/e/e9/Official_portrait_of_Barack_Obama.jpg/200px-Official_portrait_of_Barack_Obama.jpg"
            ],
            "foaf:homepage": [ "http://www.whitehouse.gov/", "http://www.barackobama.com/" ],
  19. How it works. Source: blog.iks-project.eu
  20. How it works. On the Drupal side, we only have to parse the response and map JSON-LD properties to entity fields. Use Drupal’s native RDFa capability to render semantic markup. Use your imagination and build semantic content.
  21. Quick demo. Semantic CMS by Evo42 communications, an early-adopter integration of Drupal with Stanbol. Rene Kapusta (https://github.com/evo42/Semantic-CMS): Drupal contributor, Aloha Editor core developer.
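Slide 4 only names the idea of comparing documents by their citation structure. As a rough illustration, here is one simple way such a comparison could work: treat each pair of adjacent citations as a unit and score two documents by the Jaccard similarity of those pairs. This measure is an assumption chosen for the sketch, not something the slides or Stanbol specify, and the citation keys are made up.

```python
def citation_pairs(citations: list[str]) -> set[tuple[str, str]]:
    """Return the ordered pairs of citations that occur next to each other."""
    return set(zip(citations, citations[1:]))


def citation_similarity(doc_a: list[str], doc_b: list[str]) -> float:
    """Jaccard similarity of adjacent-citation pairs (0.0 = nothing shared, 1.0 = same pairs)."""
    pairs_a, pairs_b = citation_pairs(doc_a), citation_pairs(doc_b)
    union = pairs_a | pairs_b
    if not union:
        return 0.0  # neither document has two citations in a row: nothing to compare
    return len(pairs_a & pairs_b) / len(union)


# Hypothetical citation keys, in the order they appear in each document.
thesis = ["Smith2001", "Jones2005", "Lee2010", "Kim2012"]
suspect = ["Smith2001", "Jones2005", "Lee2010", "Park2011"]
print(citation_similarity(thesis, suspect))  # 0.5
```

Citation keys stay the same whatever language the surrounding text is written in, which is one plausible reason a structural comparison like this could complement plain text matching; in practice the citations would first have to be extracted and normalized.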
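Slide 13 describes autotagging as named entity recognition followed by turning the recognized entities into tags. Once the Enhancer's response has been reduced to simple (label, URI, types, confidence) records, picking tags can be as small as the sketch below. The record shape, the 0.5 threshold, the vocabulary names, and the confidence values are all assumptions for illustration; they are not Stanbol's actual output format.

```python
from dataclasses import dataclass

DBPEDIA = "http://dbpedia.org/ontology/"

# Hypothetical mapping from DBpedia ontology classes to Drupal vocabulary names.
VOCABULARY_BY_TYPE = {
    DBPEDIA + "Person": "persons",
    DBPEDIA + "Place": "places",
    DBPEDIA + "Organisation": "organizations",
}


@dataclass
class EntitySuggestion:
    """Simplified view of one entity suggestion (shape assumed for this sketch)."""
    label: str
    uri: str
    types: list[str]
    confidence: float


def suggest_tags(suggestions: list[EntitySuggestion],
                 min_confidence: float = 0.5) -> dict[str, list[str]]:
    """Group confident suggestions by vocabulary, best match first, one tag per URI."""
    tags: dict[str, list[str]] = {}
    seen: set[str] = set()
    for s in sorted(suggestions, key=lambda x: x.confidence, reverse=True):
        if s.confidence < min_confidence or s.uri in seen:
            continue
        vocab = next((VOCABULARY_BY_TYPE[t] for t in s.types if t in VOCABULARY_BY_TYPE), None)
        if vocab is None:
            continue  # an entity type we do not tag
        seen.add(s.uri)
        tags.setdefault(vocab, []).append(s.label)
    return tags


# Illustrative input, not real Enhancer output.
suggestions = [
    EntitySuggestion("Paris", "http://dbpedia.org/resource/Paris", [DBPEDIA + "Place"], 0.9),
    EntitySuggestion("Barack Obama", "http://dbpedia.org/resource/Barack_Obama", [DBPEDIA + "Person"], 0.95),
    EntitySuggestion("Paris Hilton", "http://dbpedia.org/resource/Paris_Hilton", [DBPEDIA + "Person"], 0.2),
]
print(suggest_tags(suggestions))  # {'persons': ['Barack Obama'], 'places': ['Paris']}
```

Raising `min_confidence` trades recall for precision; showing the filtered suggestions to the editor instead of saving them directly fits the "assisted semantic tagging" scenario on slide 12.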
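The curl call on slide 14 can be reproduced from Python with only the standard library. This is a minimal sketch: the endpoint is the demo URL from the slide, which may no longer be online, so `build_enhancer_request` only constructs the request, and `enhance` sends it when you pass the URL of a reachable Enhancer (for example, your own Stanbol instance). Both function names are illustrative.

```python
import urllib.request

# Demo endpoint from slide 14; it may no longer be reachable.
DEFAULT_ENHANCER_URL = "http://dev.iks-project.eu:8081/enhancer"


def build_enhancer_request(text: str, accept: str = "text/turtle",
                           url: str = DEFAULT_ENHANCER_URL) -> urllib.request.Request:
    """Build (but do not send) the same POST request as the curl example."""
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={"Accept": accept, "Content-Type": "text/plain"},
        method="POST",
    )


def enhance(text: str, accept: str = "text/turtle",
            url: str = DEFAULT_ENHANCER_URL, timeout: float = 30.0) -> str:
    """Send the text to the Enhancer and return the serialized RDF response body."""
    request = build_enhancer_request(text, accept, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Passing a different `accept` value asks for another serialization, such as RDF/XML (`application/rdf+xml`), provided the server supports it.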
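Slide 15's point that the context links properties to ontology concepts is easiest to see by expanding the slide 16 document by hand. The sketch below handles only what that example uses (plain term definitions and `"@type": "@id"` coercion); it is not a conforming JSON-LD processor, for which a library such as PyLD (`jsonld.expand`) would be the usual choice. The function names are illustrative.

```python
from typing import Any

# The JSON-LD document from slide 16.
PERSON = {
    "@context": {
        "name": "http://xmlns.com/foaf/0.1/name",
        "homepage": {"@id": "http://xmlns.com/foaf/0.1/workplaceHomepage", "@type": "@id"},
        "person": "http://xmlns.com/foaf/0.1/Person",
    },
    "@id": "http://www.barackobama.com",
    "@type": "person",
    "name": "Barack Obama",
    "homepage": "http://www.whitehouse.gov/",
}


def term_to_iri(term: str, context: dict[str, Any]) -> str:
    """Look a term up in the context; terms without a definition are returned unchanged."""
    definition = context.get(term, term)
    return definition["@id"] if isinstance(definition, dict) else definition


def expand_flat(doc: dict[str, Any]) -> dict[str, Any]:
    """Expand one flat node using simple term definitions and "@type": "@id" coercion."""
    context = doc.get("@context", {})
    expanded: dict[str, Any] = {}
    for key, value in doc.items():
        if key == "@context":
            continue
        if key == "@id":
            expanded["@id"] = value
        elif key == "@type":
            types = value if isinstance(value, list) else [value]
            expanded["@type"] = [term_to_iri(t, context) for t in types]
        elif key in context:
            definition = context[key]
            is_iri = isinstance(definition, dict) and definition.get("@type") == "@id"
            # Coerced values become node references; everything else stays a literal.
            expanded[term_to_iri(key, context)] = [{"@id": value} if is_iri else {"@value": value}]
        # Keys without a context definition are skipped here; a real processor
        # would still keep absolute IRIs and compact IRIs such as foaf:name.
    return expanded


print(expand_flat(PERSON))
```

Expanded this way, `name` becomes `http://xmlns.com/foaf/0.1/name` with the literal value "Barack Obama", `homepage` becomes `foaf:workplaceHomepage` pointing at the White House URL as a resource rather than a string, and the type `person` becomes `foaf:Person`.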
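Slide 20 says the Drupal side only has to parse the response and map JSON-LD properties to entity fields. The mapping step could look roughly like the sketch below, written in Python for brevity rather than as Drupal PHP. The field names `field_homepage` and `field_image` are hypothetical, and the input is a trimmed copy of the slide 18 excerpt, whose `@subject` keyword comes from an early JSON-LD draft (JSON-LD 1.0 uses `@id`, and `@graph` for a list of nodes).

```python
from typing import Any

# Hypothetical mapping from prefixed JSON-LD properties to Drupal field machine names.
FIELD_MAP = {
    "foaf:homepage": "field_homepage",
    "foaf:depiction": "field_image",
}


def response_nodes(doc: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the node list: slide 18 uses an '@subject' array, JSON-LD 1.0 uses '@graph'."""
    nodes = doc.get("@graph", doc.get("@subject", []))
    return nodes if isinstance(nodes, list) else [nodes]


def node_to_fields(node: dict[str, Any], field_map: dict[str, str] = FIELD_MAP) -> dict[str, Any]:
    """Copy mapped properties into a field-name -> list-of-values dict, plus the entity URI."""
    fields: dict[str, Any] = {"uri": node.get("@id", node.get("@subject"))}
    for prop, field in field_map.items():
        if prop in node:
            value = node[prop]
            fields[field] = value if isinstance(value, list) else [value]
    return fields


# A trimmed copy of the slide 18 excerpt.
RESPONSE = {
    "@subject": [
        {
            "@subject": "http://dbpedia.org/resource/Barack_Obama",
            "@type": ["dbp-ont:OfficeHolder", "dbp-ont:Person", "foaf:Person", "owl:Thing"],
            "foaf:homepage": ["http://www.whitehouse.gov/", "http://www.barackobama.com/"],
        }
    ]
}

for node in response_nodes(RESPONSE):
    print(node_to_fields(node))
```

Keeping the property-to-field table in one place means supporting a new property is a one-line change; the resulting dict can then be handed to the code that actually saves the taxonomy term or node.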
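For the last step on slide 20, rendering semantic markup, the sketch below shows the kind of RDFa a tagged entity could end up as in the page. In Drupal this markup would come from the core RDF module's mappings rather than hand-written code; the Python function, its defaults, and the use of the DBpedia URI as the subject are illustrative assumptions.

```python
from html import escape

FOAF = "http://xmlns.com/foaf/0.1/"


def rdfa_entity(label: str, uri: str, rdf_type: str = "foaf:Person",
                name_property: str = "foaf:name") -> str:
    """Render an entity as an RDFa 1.1 <span>.

    prefix   declares the foaf: prefix,
    about    names the resource being described,
    typeof   gives its class,
    property makes the element text the value of that predicate.
    """
    return (
        f'<span prefix="foaf: {FOAF}" about="{escape(uri)}" '
        f'typeof="{escape(rdf_type)}" property="{escape(name_property)}">'
        f"{escape(label)}</span>"
    )


print(rdfa_entity("Barack Obama", "http://dbpedia.org/resource/Barack_Obama"))
```

An RDFa parser reading this span would find two statements about the DBpedia resource: that it is a `foaf:Person` and that its `foaf:name` is "Barack Obama".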
