Drupal and Apache Stanbol


  1. Gabriel Dragomir: Drupal and Apache Stanbol. Semantic annotation with custom vocabularies
  2. About me • Drupal developer, trainer and consultant • Founding member of Drupal Romania Association
  3. The Semantic Web • Tim Berners-Lee: “The first step is putting data on the Web in a form that machines can naturally understand, or converting it to that form. This creates what I call a Semantic Web – a Web of data that can be processed directly or indirectly by machines.”
  4. What’s the hype? • Most organizations need to organize, analyze and relate huge amounts of scattered, unstructured textual data • Examples: • keyword extraction from content: annotating abstracts • text categorization: organizing large volumes of text based on a thesaurus • media monitoring of tags: tracking occurrences of a specific keyword on social media channels
  5. Linked data http://lod-cloud.net/
  6. Linked data • Project started in 2007 • Aimed at building the Web of Data by: • identifying open access data sets • converting them to RDF • publishing them as open access data sets
  7. Linked data ecosystem • Linked Open Vocabularies (LOV): http://lov.okfn.org/dataset/lov/ • Provides a conceptual map of the vocabularies • Various providers: libraries, governmental actors, NGOs
  8. Linked data ecosystem • Where to find other data sets? • http://www.w3.org/2001/sw/wiki/SKOS/Datasets • Swoogle: http://swoogle.umbc.edu/ • PoolParty: http://vocabulary.semantic-web.at
  9. Linked data at work!
  10. Semantic annotation • Creates specific metadata that enables new ways to retrieve and aggregate information • Annotations are based on a conceptual scheme, an ontology (e.g. FOAF, Dublin Core) • For more on ontologies see: http://www.w3.org/wiki/Good_Ontologies • The annotations build semantic
  11. Semantic annotation • Most common uses: • Named Entity Linking: limited to recognizing entities of type person, organization and place (e.g. OpenCalais) • Entityhub Linking: annotation based on vocabularies, with no limitation on entity types. Requires more natural language processing prior to annotation.
  12. Apache Stanbol on the fly • Here comes Apache Stanbol • A new approach: • modular semantic analysis of documents • processing components can be built for virtually any language • flexible workflows via semantic annotation chains • any vocabulary (Linked Data, custom) can be used
  13. Service oriented architecture • Stanbol is designed to offer service oriented integration • RESTful web services API returning RDF or JSON/JSON-LD • Each component exposes its own endpoint • OSGi (Open Services Gateway initiative) compliant via Apache Felix and Apache Sling • Remote component management
  14. Implementation • OSGi layer: Apache Felix and Apache Sling • Build environment: Apache Maven • RDF framework: Apache Clerezza • Triple store, reasoning engine: Apache Jena • Indexing and semantic search: Apache Solr • Content analysis/metadata extraction: Apache Tika • Natural language processing: Apache OpenNLP
  15. Architecture
  16. Components • Semantic layer: • Enhancer, EntityHub, ContentHub • Enhancement engines: internal, 3rd party • User interfaces • Knowledge integration (rule sets, reasoners) • Storage integration
  17. Content enhancement • Examples: • retrieve additional metadata for a piece of content • identify the language of a text • extract entities (persons, places, organizations) • create annotations to external sources • use 3rd party services for named entity recognition
  18. Drupal meets Stanbol • Several modules implement RDF support, allowing Drupal data to be sent to Stanbol for semantic annotation • The taxonomy system allows for complex annotation • Fieldable taxonomy terms allow storage of complex semantic data
  19. User scenarios • Semantic indexing via Stanbol (SOLR yard) • Content enrichment with semantically related information (documents, factual data, images etc.) • Tag as you type: dynamic annotation of text in editors
  20. How it works • a POST request sends content via the REST API • the content is processed by an enhancement chain • the result is returned as JSON-LD, RDF/XML, RDF/JSON etc. • JSON-LD (JavaScript Object Notation for Linked Data) is a human-readable and simple linked data transport format • for best results, an enhancement chain should do language detection, tokenization and POS tagging prior to performing semantic annotation • http://stanbol-yle.jelastic.planeetta.net/demo/enhancer
  21. Drupal integration Source: blog.iks-project.eu
  22. Drupal distribution: IKS CE • IKS CE distribution - Wolfgang Ziegler (fago), Stéphane Corlosquet (scor) • Components: • Search API Stanbol • VIE.js - semantic annotation UI • https://drupal.org/project/iksce • http://drupal.org/project/vie • http://drupal.org/project/search_api_stanbol • https://github.com/fago/stanbol-for-drupal
  23. Search API Stanbol • enables the indexing of Drupal entities such as nodes, users, taxonomy terms, files, etc. in Stanbol EntityHub • data sent as RDF • data can be mashed up with data from other sources (Managed Sites, Remote Sites)
  24. VIE.js • “Vienna IKS Editables” • JavaScript library for implementing decoupled Content Management Systems and semantic interaction in web applications.
  25. Monolithic vs decoupled Content Management Systems • Source: Henri Bergius, http://bergie.iki.fi
  26. Demo setup • we store Drupal entities in a SOLR index • annotations are made based on: • DBpedia (bundled with Apache Stanbol) • a custom vocabulary of terms related to the semantic web: the Social Semantic Web Thesaurus • SemWeb is imported as a SOLR index into Apache Stanbol
  27. Custom vocabularies • PoolParty Semantic Web • 224 concepts related to the semantic web • Author: Andreas Blumauer • http://vocabulary.semantic-web.at/PoolPartySemanticWeb.html • http://vocabulary.semantic-web.at/PoolPartySemanticWeb/Drupal.html
  28. Demo • index Drupal entities in Apache Stanbol • retrieve annotated entities via the REST API • annotate entities using the DBpedia and SemWeb indexes • edit Drupal entities and annotate on the fly • retrieve linked data tag recommendations
  29. Questions?
  30. Contact me • gabriel.dragomir@webikon.com • twitter: gabidrg
  31. Thank you!
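Slide 6 describes converting open access data sets to RDF. A minimal sketch of that step, assuming a data set arrives as flat records with an `id` column: each mapped column becomes one N-Triples statement. The base URI `http://example.org/dataset/`, the column names and the use of Dublin Core terms are illustrative assumptions, not taken from the talk.

```python
from __future__ import annotations

# Hypothetical example: base URI, column names and the choice of Dublin Core
# terms are illustrative assumptions, not part of the talk.
DCTERMS = "http://purl.org/dc/terms/"


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def record_to_ntriples(record: dict, base_uri: str, predicates: dict) -> list[str]:
    """Turn one tabular record into N-Triples lines.

    `predicates` maps a column name to a full predicate URI; unmapped or
    empty (None) columns are skipped. Every value becomes a plain literal.
    """
    subject = f"<{base_uri}{record['id']}>"
    lines = []
    for column, predicate in predicates.items():
        value = record.get(column)
        if value is None:
            continue
        lines.append(f'{subject} <{predicate}> "{escape_literal(str(value))}" .')
    return lines


if __name__ == "__main__":
    row = {"id": "42", "title": "City budget", "description": "Spending per district"}
    mapping = {"title": DCTERMS + "title", "description": DCTERMS + "description"}
    for line in record_to_ntriples(row, "http://example.org/dataset/", mapping):
        print(line)
```

Real conversions usually rely on an RDF library and typed literals; plain string literals keep this sketch dependency-free.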
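Slide 11 contrasts Named Entity Linking with vocabulary-based (Entityhub) linking. The toy linker below illustrates only the vocabulary-based idea: any label found in a vocabulary can be linked, whatever its entity type. It is a naive case-insensitive dictionary match, not Stanbol's algorithm, which also relies on the NLP steps listed on slide 20; the URIs under `http://example.org/voc/` are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Annotation:
    start: int          # character offset where the match begins
    end: int            # character offset just past the match
    selected_text: str  # text as it appears in the document
    entity: str         # URI of the linked vocabulary entry


def link_entities(text: str, vocabulary: dict[str, str]) -> list[Annotation]:
    """Toy dictionary-based linker (illustrative, not Stanbol's algorithm).

    `vocabulary` maps a label to an entity URI. Labels are matched
    case-insensitively on word boundaries, longest label first, and a
    character already covered by a longer match is never reused.
    """
    covered = [False] * len(text)
    annotations = []
    for label in sorted(vocabulary, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(label) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            start, end = match.span()
            if any(covered[start:end]):
                continue  # overlaps a longer label that was already linked
            covered[start:end] = [True] * (end - start)
            annotations.append(Annotation(start, end, match.group(0), vocabulary[label]))
    return sorted(annotations, key=lambda a: a.start)
```

Matching longest labels first means "semantic web" links to a single concept instead of also producing a separate annotation for "web".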
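Slide 20 summarizes the enhancer workflow: POST the content, let an enhancement chain process it, read back RDF. A sketch of the request side using only the Python standard library, assuming a Stanbol launcher at `http://localhost:8080`, a chain named `default` and the `/enhancer/chain/{name}` path; check these against your Stanbol installation. RDF/JSON (`application/rdf+json`) is requested here, but any serialization listed on the slide could be asked for through the `Accept` header.

```python
import urllib.parse
import urllib.request

STANBOL_BASE = "http://localhost:8080"  # assumption: local Stanbol launcher


def build_enhancer_request(text: str, chain: str = "default",
                           accept: str = "application/rdf+json") -> urllib.request.Request:
    """Prepare (without sending) a POST of plain text to an enhancement chain.

    The chain name "default" and the URL layout are assumptions.
    """
    url = f"{STANBOL_BASE}/enhancer/chain/{urllib.parse.quote(chain)}"
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "text/plain", "Accept": accept},
    )


def enhance(text: str, chain: str = "default", timeout: float = 30.0) -> bytes:
    """Send the text to Stanbol and return the raw RDF payload (needs network)."""
    with urllib.request.urlopen(build_enhancer_request(text, chain), timeout=timeout) as response:
        return response.read()
```

Keeping request construction separate from sending makes it easy to inspect what will go over the wire before a Stanbol instance is available.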
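Slide 23 says Search API Stanbol sends Drupal entities to the EntityHub as RDF. The slides do not show the actual payload, so the following is only a sketch of what describing a node as JSON-LD can look like. The node is a plain dict, and the site URL, the SIOC type and the Dublin Core properties are assumptions chosen for illustration.

```python
import json

# Assumed vocabularies: Dublin Core terms and SIOC. The mapping that
# Search API Stanbol really uses is not shown in the slides.
CONTEXT = {
    "dcterms": "http://purl.org/dc/terms/",
    "sioc": "http://rdfs.org/sioc/ns#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def node_to_jsonld(node: dict, site: str) -> dict:
    """Describe a Drupal-like node (a hypothetical plain dict) as JSON-LD."""
    doc = {
        "@context": CONTEXT,
        "@id": f"{site.rstrip('/')}/node/{node['nid']}",
        "@type": "sioc:Item",
        "dcterms:title": node["title"],
        # Taxonomy tags that already carry a linked data URI.
        "dcterms:subject": [{"@id": uri} for uri in node.get("tags", [])],
    }
    if "created" in node:
        doc["dcterms:created"] = {"@value": node["created"], "@type": "xsd:dateTime"}
    return doc


if __name__ == "__main__":
    node = {"nid": 7, "title": "Stanbol demo", "tags": ["http://dbpedia.org/resource/Drupal"]}
    print(json.dumps(node_to_jsonld(node, "http://example.org"), indent=2))
```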
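Once DBpedia and the SemWeb thesaurus are available as EntityHub sites (slide 26), each can be searched by label. A sketch assuming the EntityHub `find` resource at `/entityhub/site/{site}/find` with `name`, `lang` and `limit` query parameters and a JSON answer; `semweb` is a hypothetical site id, since the talk does not name the one used in the demo.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

STANBOL_BASE = "http://localhost:8080"  # assumption: local Stanbol launcher


def entityhub_find_url(site: str, name: str, lang: str | None = None, limit: int = 10) -> str:
    """Build a GET URL for a label search on one EntityHub referenced site.

    `site` is the referenced site id, e.g. "dbpedia"; "semweb" is a
    hypothetical id for the imported custom vocabulary.
    """
    params = {"name": name, "limit": str(limit)}
    if lang:
        params["lang"] = lang
    query = urllib.parse.urlencode(params)
    return f"{STANBOL_BASE}/entityhub/site/{urllib.parse.quote(site)}/find?{query}"


def entityhub_find(site: str, name: str, **kwargs) -> dict:
    """Run the search and decode the (assumed) JSON answer; needs network."""
    request = urllib.request.Request(
        entityhub_find_url(site, name, **kwargs),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```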
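The PoolParty thesaurus on slide 27 is a SKOS vocabulary. Its labels can be read into a label → concept URI dictionary, which is what a dictionary-based tagger needs. A sketch assuming the thesaurus is exported as RDF/XML with typed `skos:Concept` elements; any concept URIs used with it here are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # how ElementTree names xml:lang


def skos_labels(rdf_xml: str, lang: str = "en") -> dict[str, str]:
    """Map each skos:prefLabel/altLabel in `lang` to its concept URI.

    Labels without xml:lang are kept. Only concepts written as typed
    <skos:Concept rdf:about="..."> elements are recognised; other valid
    RDF/XML shapes (e.g. rdf:Description plus rdf:type) are ignored.
    """
    root = ET.fromstring(rdf_xml)
    labels: dict[str, str] = {}
    for concept in root.iter(f"{{{SKOS}}}Concept"):
        uri = concept.get(f"{{{RDF}}}about")
        if uri is None:
            continue
        for tag in ("prefLabel", "altLabel"):
            for label in concept.findall(f"{{{SKOS}}}{tag}"):
                if label.text and label.get(XML_LANG, lang) == lang:
                    labels[label.text.strip()] = uri
    return labels
```

A full RDF parser would handle every RDF/XML variant; ElementTree keeps the sketch within the standard library at the cost of the limitation noted in the docstring.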
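For the "retrieve linked data tag recommendations" step on slide 28, the enhancer's answer has to be reduced to a ranked tag list. A sketch assuming the response was requested as RDF/JSON (`application/rdf+json`) and follows Stanbol's enhancement structure, where `fise:EntityAnnotation` resources carry `fise:entity-reference`, `fise:entity-label` and `fise:confidence`; the `min_confidence` threshold is an arbitrary illustrative parameter.

```python
from __future__ import annotations

import json

# Namespaces of RDF and of the Stanbol enhancement structure (fise).
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FISE = "http://fise.iks-project.eu/ontology/"


def _values(properties: dict, predicate: str) -> list[str]:
    """Return the 'value' of every object of `predicate` (RDF/JSON layout)."""
    return [obj["value"] for obj in properties.get(predicate, [])]


def tag_recommendations(rdf_json: str, min_confidence: float = 0.0) -> list[tuple[str, str, float]]:
    """Collect (entity URI, label, confidence) from fise:EntityAnnotations.

    Only the highest confidence seen for each entity is kept, and the
    result is sorted from most to least confident.
    """
    graph = json.loads(rdf_json)  # {subject: {predicate: [object, ...]}}
    best: dict[str, tuple[str, float]] = {}
    for properties in graph.values():
        if FISE + "EntityAnnotation" not in _values(properties, RDF_TYPE):
            continue
        references = _values(properties, FISE + "entity-reference")
        if not references:
            continue
        labels = _values(properties, FISE + "entity-label")
        confidences = _values(properties, FISE + "confidence")
        confidence = float(confidences[0]) if confidences else 0.0
        if confidence < min_confidence:
            continue
        uri = references[0]
        label = labels[0] if labels else uri
        if uri not in best or confidence > best[uri][1]:
            best[uri] = (label, confidence)
    ranked = [(uri, label, conf) for uri, (label, conf) in best.items()]
    return sorted(ranked, key=lambda item: item[2], reverse=True)
```

Text annotations (the matched spans) are skipped here; they would be needed to highlight tags inside the edited text, as in the "tag as you type" scenario.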