Drupal and Apache Stanbol

Published in: Technology, Education

  1. Gabriel Dragomir. Drupal and Apache Stanbol: Semantic Annotation with Custom Vocabularies
  2. About me
     • Drupal developer, trainer and consultant
     • Founding member of the Drupal Romania Association
  3. The Semantic Web
     • Tim Berners-Lee: “The first step is putting data on the Web in a form that machines can naturally understand, or converting it to that form. This creates what I call a Semantic Web – a Web of data that can be processed directly or indirectly by machines.”
  4. What’s the hype?
     • Most organizations need to organize, analyze and relate huge amounts of unstructured, dispersed textual data
     • Examples:
       • keyword extraction from content: annotate abstracts
       • text categorization: organize big volumes of text based on a thesaurus
       • media monitoring of tags: occurrences of a specific keyword on social media channels
  5. Linked data
     • http://lod-cloud.net/
  6. Linked data
     • Project started in 2007
     • Aimed at building the Web of Data by:
       • identifying open access data sets
       • converting them into RDF vocabularies
       • publishing them as open access data sets
  7. Linked data ecosystem
     • Linked Open Vocabularies (LOV): http://lov.okfn.org/dataset/lov/
     • Provides a conceptual map of the vocabularies
     • Various providers: libraries, governmental actors, NGOs
  8. Linked data ecosystem
     • Where to find other data sets?
       • http://www.w3.org/2001/sw/wiki/SKOS/Datasets
       • Swoogle: http://swoogle.umbc.edu/
       • PoolParty: http://vocabulary.semantic-web.at
  9. Linked data at work!
  10. Semantic annotation
      • Creates specific metadata that enable new ways to retrieve and aggregate information
      • Annotations are made against a conceptual scheme, an ontology (e.g. FOAF, Dublin Core)
      • For more on ontologies see: http://www.w3.org/wiki/Good_Ontologies
      • The annotations build semantic
  11. Semantic annotation
      • Most common uses:
        • Named Entity Linking: limited to recognizing entities of type person, organization, place (e.g. OpenCalais)
        • Entityhub Linking: annotation based on vocabularies, with no limitation on entity types. Requires more natural language processing prior to annotation.
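
To make vocabulary-based annotation concrete, here is a minimal sketch of dictionary lookup against a custom vocabulary. It is not how Stanbol's Entityhub linking works internally (which, as the slide notes, relies on more natural language processing); the vocabulary, the example.org URIs and the `annotate` function are all hypothetical and exist only for illustration.

```python
import re
from typing import Dict, List, NamedTuple


class Annotation(NamedTuple):
    start: int  # character offset where the match begins
    end: int    # character offset just past the match
    text: str   # the matched surface form as it appears in the input
    uri: str    # identifier of the linked vocabulary concept


# Hypothetical vocabulary: lowercase label -> concept URI (placeholder URIs).
VOCABULARY: Dict[str, str] = {
    "linked data": "http://example.org/vocab/LinkedData",
    "semantic web": "http://example.org/vocab/SemanticWeb",
    "drupal": "http://example.org/vocab/Drupal",
}


def annotate(text: str, vocabulary: Dict[str, str]) -> List[Annotation]:
    """Return case-insensitive, whole-word matches of vocabulary labels."""
    annotations = []
    for label, uri in vocabulary.items():
        pattern = r"\b" + re.escape(label) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            annotations.append(
                Annotation(match.start(), match.end(), match.group(0), uri)
            )
    # Tuples sort by their first field, so results come back in text order.
    return sorted(annotations)
```

Calling `annotate("Drupal publishes Linked Data for the Semantic Web.", VOCABULARY)` yields one annotation per label, each carrying the character span and the concept URI. Unlike a fixed person/organization/place recognizer, the set of entity types here is whatever the vocabulary contains.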
  12. Apache Stanbol on the fly
      • Here comes Apache Stanbol
      • A new approach:
        • modular semantic analysis of documents
        • processing components can be built for virtually any language
        • flexible workflows via semantic annotation chains
        • any vocabulary (Linked Data, custom) can be used
  13. Service oriented architecture
      • Stanbol is designed to offer service oriented integration
      • RESTful web services API returning RDF or JSON/JSON-LD
      • Each component exposes an endpoint independently
      • Open Services Gateway initiative (OSGi) compliant via Apache Felix and Apache Sling
      • Remote component management
  14. Implementation
      • OSGi layer: Apache Felix and Apache Sling
      • Build environment: Apache Maven
      • RDF framework: Apache Clerezza
      • Triple store, reasoning engine: Apache Jena
      • Indexing and semantic search: Apache Solr
      • Content analysis/metadata extraction: Apache Tika
      • Natural language processing: Apache OpenNLP
  15. Architecture
  16. Components
      • Semantic layer:
        • Enhancer, EntityHub, ContentHub
        • Enhancement engines: internal, 3rd party
      • User interfaces
      • Knowledge integration (rule sets, reasoners)
      • Storage integration
  17. Content enhancement
      • Examples:
        • retrieve additional metadata for a piece of content
        • identify the language of a text
        • extract entities (persons, places, organizations)
        • create annotations to external sources
        • use 3rd party services for named entity recognition
  18. Drupal meets Stanbol
      • Several modules implement RDF support, allowing data to be sent to Stanbol for semantic annotation
      • The taxonomy system allows for complex annotation
      • Fieldable taxonomy terms allow for storage of complex semantic data
  19. User scenarios
      • Semantic indexing via Stanbol (Solr yard)
      • Content enrichment with semantically related information (documents, factual data, images etc.)
      • Tag as you type: dynamic annotation of text in editors
  20. How it works
      • A POST request sends content via the REST API
      • Content is processed by an enhancement chain
      • Returns JSON-LD, RDF/XML, RDF/JSON etc.
        • JSON-LD (JavaScript Object Notation for Linked Data): a simple, human-readable linked data transport format
      • For best results, an enhancement chain should do language detection, tokenization and POS tagging prior to performing semantic annotation
      • http://stanbol-yle.jelastic.planeetta.net/demo/enhancer
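
The request described above can be sketched in Python with only the standard library. This is a hedged illustration, not code from the talk: it assumes a Stanbol instance at `http://localhost:8080` exposing the enhancer at `/enhancer` and named chains at `/enhancer/chain/<name>`; the chain name used in the usage note is hypothetical. The function only builds the request, so it can be inspected without a running server.

```python
from typing import Optional
from urllib.parse import quote
from urllib.request import Request

# Assumption: a local Stanbol launcher listening on port 8080.
STANBOL_BASE_URL = "http://localhost:8080"


def build_enhancer_request(
    text: str,
    chain: Optional[str] = None,
    accept: str = "application/rdf+xml",
    base_url: str = STANBOL_BASE_URL,
) -> Request:
    """Build (but do not send) a POST request carrying plain text to enhance."""
    if chain is None:
        path = "/enhancer"  # use the server's default enhancement chain
    else:
        path = "/enhancer/chain/" + quote(chain, safe="")
    return Request(
        url=base_url + path,
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=utf-8",
            # The Accept header selects the serialization of the result.
            "Accept": accept,
        },
        method="POST",
    )
```

To actually send it, pass the result to `urllib.request.urlopen`, for example `urlopen(build_enhancer_request("Drupal meets Stanbol", chain="my-chain"))`, and read the response body. Which serializations a given server accepts in `Accept` depends on its configuration.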
  21. Drupal integration
      • Source: blog.iks-project.eu
  22. Drupal distribution: IKS CE
      • IKS CE distribution - Wolfgang Ziegler (fago), Stéphane Corlosquet (scor)
      • Components:
        • Search API Stanbol
        • VIE.js - semantic annotation UI
      • https://drupal.org/project/iksce
      • http://drupal.org/project/vie
      • http://drupal.org/project/search_api_stanbol
      • https://github.com/fago/stanbol-for-drupal
  23. Search API Stanbol
      • Enables the indexing of Drupal entities such as nodes, users, taxonomy terms, files, etc. in the Stanbol EntityHub
      • Data is sent as RDF
      • Data can be mashed up with data from other sources (Managed Sites, Remote Sites)
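
As a rough illustration of "data sent as RDF", the sketch below serializes a Drupal-like entity to N-Triples by hand. The entity dictionary, the choice of `sioc:Item` as the type and `dcterms:title` for the title are assumptions made for this example; the slides do not show the mapping that Search API Stanbol actually uses.

```python
from typing import Dict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCTERMS_TITLE = "http://purl.org/dc/terms/title"
SIOC_ITEM = "http://rdfs.org/sioc/ns#Item"


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def entity_to_ntriples(entity: Dict[str, str]) -> str:
    """Serialize a hypothetical entity dict with 'uri' and 'title' keys."""
    subject = "<" + entity["uri"] + ">"
    title = escape_literal(entity["title"])
    lines = [
        # One triple stating the entity's type (assumed mapping).
        f"{subject} <{RDF_TYPE}> <{SIOC_ITEM}> .",
        # One triple carrying the title as a plain literal.
        f'{subject} <{DCTERMS_TITLE}> "{title}" .',
    ]
    return "\n".join(lines) + "\n"
```

For instance, `entity_to_ntriples({"uri": "http://example.org/node/1", "title": "Hello"})` produces two triples about the same subject. A real integration would more likely use an RDF library and the site's configured RDF mappings rather than string formatting.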
  24. VIE.js
      • “Vienna IKS Editables”
      • JavaScript library for implementing decoupled Content Management Systems and semantic interaction in web applications
  25. Monolithic vs Decoupled Content Management Systems
      • Source: Henri Bergius - http://bergie.iki.fi
  26. Demo setup
      • We store Drupal entities in a Solr index
      • Annotations are to be made based on:
        • DBpedia - bundled with Apache Stanbol
        • a custom vocabulary of terms related to the semantic web - Social Semantic Web Thesaurus
      • SemWeb is imported as a Solr index into Apache Stanbol
  27. Custom vocabularies
      • PoolParty Semantic Web
      • 224 concepts related to the semantic web
      • Author: Andreas Blumauer
      • http://vocabulary.semantic-web.at/PoolPartySemanticWeb.html
      • http://vocabulary.semantic-web.at/PoolPartySemanticWeb/Drupal.html
  28. Demo
      • index Drupal entities in Apache Stanbol
      • retrieve annotated entities via REST API
      • annotate entities using DBpedia and SemWeb indexes
      • edit Drupal entities and annotate on the fly
      • retrieve linked data tag recommendations
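
The last demo step, turning annotations into tag recommendations, could look roughly like the sketch below. The input is a hand-written, simplified stand-in for entity annotations (an entity URI, a label and a confidence score); real Stanbol responses are RDF graphs with a richer structure, and the confidence values and the 0.5 threshold here are made up for illustration.

```python
from typing import Dict, List, Union

AnnotationDict = Dict[str, Union[str, float]]

# Hypothetical, simplified annotations; not actual Stanbol output.
SAMPLE_ANNOTATIONS: List[AnnotationDict] = [
    {"entity": "http://dbpedia.org/resource/Drupal",
     "label": "Drupal", "confidence": 0.92},
    {"entity": "http://dbpedia.org/resource/Semantic_Web",
     "label": "Semantic Web", "confidence": 0.81},
    {"entity": "http://dbpedia.org/resource/Drupal",
     "label": "Drupal", "confidence": 0.40},
    {"entity": "http://dbpedia.org/resource/World_Wide_Web",
     "label": "World Wide Web", "confidence": 0.15},
]


def recommend_tags(annotations: List[AnnotationDict],
                   min_confidence: float = 0.5) -> List[str]:
    """Return labels of sufficiently confident entities, best first."""
    best: Dict[str, AnnotationDict] = {}
    for annotation in annotations:
        if annotation["confidence"] < min_confidence:
            continue  # drop weak suggestions
        current = best.get(annotation["entity"])
        # Keep only the highest-scoring annotation per entity.
        if current is None or annotation["confidence"] > current["confidence"]:
            best[annotation["entity"]] = annotation
    ranked = sorted(best.values(), key=lambda a: a["confidence"], reverse=True)
    return [a["label"] for a in ranked]
```

With the sample data, `recommend_tags(SAMPLE_ANNOTATIONS)` keeps the two entities scoring above the threshold and lists each label once. In an editor integration, these labels would be offered to the author as candidate taxonomy terms.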
  29. Questions?
  30. Contact me
      • gabriel.dragomir@webikon.com
      • twitter: gabidrg
  31. Thank you!
