Drupal and Apache Stanbol
LINKED DATA BASED SEMANTIC ANNOTATION
Gabriel Dragomir
Sunday, August 18, 13
The Semantic Web
Tim Berners-Lee:
“The first step is putting data on the Web in a form that machines can naturally understand, or converting it to that form. This creates what I call a Semantic Web – a Web of data that can be processed directly or indirectly by machines.”
What’s the hype?
Most organizations need to organize, analyze and relate huge amounts of unstructured, scattered textual data
Examples:
keyword extraction from content: annotate abstracts
text categorization: organize large volumes of text based on a thesaurus
media monitoring of tags: track occurrences of a specific keyword on social media channels
Linked data
http://lod-cloud.net/
Linked data
The Linking Open Data project started in 2007
Aimed at building the Web of Data by:
identifying open access data sets
converting them into RDF
publishing them as open access data sets
Linked data ecosystem
Linked Open Vocabularies (LOV): http://lov.okfn.org/dataset/lov/
Provides a conceptual map of the vocabularies
Various providers: libraries, governmental actors, NGOs
Linked data ecosystem
Where to find other data sets?
http://www.w3.org/2001/sw/wiki/SKOS/Datasets
Swoogle: http://swoogle.umbc.edu/
PoolParty: http://vocabulary.semantic-web.at
Linked data at work!
Semantic annotation
Creates specific metadata that enables new ways to retrieve and aggregate information
Annotations are based on a conceptual scheme such as an ontology (e.g. FOAF, Dublin Core)
For more on ontologies see: http://www.w3.org/wiki/Good_Ontologies
The annotations establish semantic relationships, e.g. rdf:type, owl:sameAs
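To make the rdf:type and owl:sameAs relationships concrete, here is a minimal Python sketch that keeps a few annotation triples in memory and follows owl:sameAs links (treated as symmetric and transitive) to aggregate what is known about an entity. The example.org document URIs are hypothetical; the RDF, OWL and FOAF namespaces are the standard ones.

```python
from collections import defaultdict, deque

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"
TBL = "http://dbpedia.org/resource/Tim_Berners-Lee"

# Hypothetical annotations of two local documents (example.org URIs).
triples = {
    ("http://example.org/node/1#tbl", RDF_TYPE, FOAF_PERSON),
    ("http://example.org/node/1#tbl", OWL_SAME_AS, TBL),
    ("http://example.org/node/7#timbl", OWL_SAME_AS, TBL),
}


def same_as_closure(triples, start):
    """Return every URI reachable from `start` via owl:sameAs links.

    owl:sameAs is treated as symmetric and transitive, so links are
    followed in both directions with a breadth-first search.
    """
    graph = defaultdict(set)
    for s, p, o in triples:
        if p == OWL_SAME_AS:
            graph[s].add(o)
            graph[o].add(s)
    seen, queue = {start}, deque([start])
    while queue:
        for neighbour in graph[queue.popleft()] - seen:
            seen.add(neighbour)
            queue.append(neighbour)
    return seen


def types_of(triples, uris):
    """Collect the rdf:type values asserted for any URI in `uris`."""
    return {o for s, p, o in triples if p == RDF_TYPE and s in uris}


aliases = same_as_closure(triples, "http://example.org/node/7#timbl")
print(types_of(triples, aliases))  # aggregated types for the node/7 entity
```

The node/7 entity is only linked to DBpedia, yet it picks up the foaf:Person type asserted for node/1: shared identifiers are what make this kind of aggregation possible.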
Semantic annotation
Most common uses:
Named Entity Linking: limited to recognizing entities of type person, organization and place (e.g. OpenCalais)
Entityhub Linking: annotation based on vocabularies, with no limitation on entity types; requires more natural language processing before annotation
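Vocabulary-based linking can be illustrated with a deliberately simple technique: greedy longest-match lookup of token sequences in a label dictionary. This is a sketch of the idea, not how Stanbol's linking engines work internally (they also rely on POS tags, fuzzy matching and ranking); the labels and DBpedia URIs in LABELS are assumed examples.

```python
import re

# Hypothetical label dictionary: lower-cased labels mapped to entity URIs.
LABELS = {
    "linked data": "http://dbpedia.org/resource/Linked_data",
    "semantic web": "http://dbpedia.org/resource/Semantic_Web",
    "drupal": "http://dbpedia.org/resource/Drupal",
}


def link_entities(text, labels, max_len=4):
    """Greedy longest-match linking of token sequences against `labels`.

    Returns (surface form, start offset, end offset, URI) tuples.
    """
    tokens = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]
    found, i = [], 0
    while i < len(tokens):
        # Try the longest candidate starting at token i, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = " ".join(tok.lower() for tok, _, _ in tokens[i:i + n])
            if key in labels:
                start, end = tokens[i][1], tokens[i + n - 1][2]
                found.append((text[start:end], start, end, labels[key]))
                i += n  # skip the tokens covered by the match
                break
        else:
            i += 1  # no label starts at this token
    return found


hits = link_entities("Drupal speaks Linked Data and the Semantic Web.", LABELS)
for hit in hits:
    print(hit)
```

Because the dictionary can hold any vocabulary, nothing restricts the entity types that can be found, which is the key difference from fixed-type named entity recognition.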
Apache Stanbol on the fly
Here comes Apache Stanbol
A new approach:
modular semantic analysis of documents
processing components can be built for virtually any language
flexible workflows via semantic annotation chains
any vocabulary (Linked Data, custom) can be used
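An enhancement chain can be pictured as an ordered list of independent engines, each adding metadata to a shared content item. The Python sketch below only mimics that idea: Stanbol engines are Java OSGi components, and the three toy engines here (stop-word language guessing, whitespace tokenization, capitalization as a stand-in for POS tagging) are hypothetical placeholders.

```python
from typing import Any, Callable, Dict, List

ContentItem = Dict[str, Any]
Engine = Callable[[ContentItem], None]

# Toy stop-word lists standing in for a real language identification engine.
STOPWORDS = {"en": {"the", "and", "of", "is"}, "de": {"der", "und", "die", "ist"}}


def detect_language(ci: ContentItem) -> None:
    """Guess the language by counting stop words (crude placeholder)."""
    words = ci["text"].lower().split()
    scores = {lang: sum(w in sw for w in words) for lang, sw in STOPWORDS.items()}
    ci["language"] = max(scores, key=scores.get)


def tokenize(ci: ContentItem) -> None:
    """Whitespace tokenization."""
    ci["tokens"] = ci["text"].split()


def tag_proper_nouns(ci: ContentItem) -> None:
    """Placeholder for POS tagging: treat capitalized tokens as proper nouns."""
    ci["candidates"] = [t for t in ci["tokens"] if t[:1].isupper()]


def run_chain(text: str, chain: List[Engine]) -> ContentItem:
    """Run engines in order; each one enriches the shared content item."""
    ci: ContentItem = {"text": text}
    for engine in chain:
        engine(ci)
    return ci


default_chain = [detect_language, tokenize, tag_proper_nouns]
result = run_chain("Stanbol is the enhancer of Drupal content", default_chain)
print(result["language"], result["candidates"])
```

Adding, removing or reordering engines only changes the list passed to run_chain, which is what makes chain-based workflows flexible.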
From IKS to Apache Stanbol
IKS (Interactive Knowledge Stack for small to medium CMS providers): an EU-funded consortium
An open source software stack written in Java
Goal: extract and process semantic data from documents
Project developed through the Apache Incubator at the Apache Software Foundation
http://stanbol.apache.org
Service oriented architecture
Stanbol is designed to offer service-oriented integration
RESTful web services API returning RDF or JSON/JSON-LD
Each component exposes an endpoint independently
OSGi (Open Services Gateway initiative) compliant, via Apache Felix and Apache Sling
Remote component management
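Because each component exposes its own endpoint, a client can query the EntityHub directly without going through the Enhancer. The sketch builds, but does not send, a form-encoded label search. The base URL, the /entityhub/site/{site}/find path and the name/limit parameters reflect our understanding of Stanbol's EntityHub REST API and are assumptions to verify against the version you run.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed: a local Stanbol instance on its default launcher port.
STANBOL = "http://localhost:8080"


def entityhub_find(site: str, name: str, limit: int = 5) -> Request:
    """Build (without sending) a label search against one EntityHub site.

    The /entityhub/site/{site}/find path and the form-encoded `name` and
    `limit` parameters are assumptions to verify against your Stanbol version.
    """
    body = urlencode({"name": name, "limit": limit}).encode("utf-8")
    return Request(
        f"{STANBOL}/entityhub/site/{site}/find",
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/json",
        },
        method="POST",
    )


req = entityhub_find("dbpedia", "Paris")
print(req.get_method(), req.full_url, req.data)
# Sending it would be: urllib.request.urlopen(req).read()
```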
Implementation
OSGi layer: Apache Felix and Apache Sling
Build environment: Apache Maven
RDF framework: Apache Clerezza
Triple store, reasoning engine: Apache Jena
Indexing and semantic search: Apache Solr
Content analysis/metadata extraction: Apache Tika
Natural language processing: Apache OpenNLP
Architecture
Components
Semantic layer:
Enhancer, EntityHub, ContentHub
Enhancement engines: internal, 3rd party
User interfaces
Knowledge integration (rule sets, reasoners)
Storage integration
Content enhancement
Examples:
retrieve additional metadata for a piece of content
identify the language of a text
extract entities (persons, places, organizations)
create annotations to external sources
use 3rd party services for named entity recognition
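Entity extraction results come back as RDF. The sketch below pulls (label, entity URI, confidence) tuples out of a JSON-LD @graph, assuming Stanbol's fise enhancement vocabulary (fise:EntityAnnotation, fise:entity-reference, fise:entity-label, fise:confidence). The SAMPLE document is hypothetical and simplified; real output may use prefixes, a @context or value objects, so expanding it with a JSON-LD processor first is the more robust route.

```python
import json

# Assumed namespace of Stanbol's enhancement structure (fise).
FISE = "http://fise.iks-project.eu/ontology/"

# Simplified, hypothetical enhancement result: a JSON-LD @graph with full IRIs.
SAMPLE = json.dumps({"@graph": [
    {"@id": "urn:enhancement-1",
     "@type": [FISE + "Enhancement", FISE + "EntityAnnotation"],
     FISE + "entity-reference": {"@id": "http://dbpedia.org/resource/Paris"},
     FISE + "entity-label": "Paris",
     FISE + "confidence": 0.92},
    {"@id": "urn:enhancement-2",
     "@type": [FISE + "Enhancement", FISE + "TextAnnotation"],
     FISE + "selected-text": "Paris",
     FISE + "start": 10, FISE + "end": 15},
]})


def entity_annotations(jsonld: str):
    """Return (label, entity URI, confidence) for each fise:EntityAnnotation."""
    results = []
    for node in json.loads(jsonld).get("@graph", []):
        types = node.get("@type", [])
        if isinstance(types, str):  # a single type may be given as a string
            types = [types]
        if FISE + "EntityAnnotation" not in types:
            continue  # skip text annotations and other enhancements
        ref = node[FISE + "entity-reference"]
        uri = ref["@id"] if isinstance(ref, dict) else ref
        results.append((node.get(FISE + "entity-label"), uri,
                        float(node.get(FISE + "confidence", 0.0))))
    return results


print(entity_annotations(SAMPLE))
```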
Drupal meets Stanbol
Several modules implement RDF support, allowing data to be sent to Stanbol for semantic annotation
The taxonomy system allows for complex annotation
Fieldable taxonomy terms allow storage of complex semantic data
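One way to use fieldable terms is to create one taxonomy term per linked entity and keep the entity URI and types in extra fields. This Python sketch only models the data; field_same_as and field_entity_type are hypothetical field names, not fields defined by Drupal or the Stanbol modules.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TaxonomyTerm:
    """Hypothetical mirror of a Drupal taxonomy term with two extra fields."""
    name: str
    vocabulary: str
    field_same_as: str  # linked data URI of the entity (hypothetical field)
    field_entity_type: List[str] = field(default_factory=list)  # hypothetical


def terms_from_annotations(
    annotations: List[Tuple[str, str, List[str]]], vocabulary: str = "tags"
) -> List[TaxonomyTerm]:
    """Create one term per entity URI, merging types from repeated annotations."""
    terms: Dict[str, TaxonomyTerm] = {}
    for label, uri, types in annotations:
        if uri not in terms:
            terms[uri] = TaxonomyTerm(label, vocabulary, uri)
        for t in types:
            if t not in terms[uri].field_entity_type:
                terms[uri].field_entity_type.append(t)
    return list(terms.values())


annotations = [
    ("Paris", "http://dbpedia.org/resource/Paris", ["dbo:Place"]),
    ("paris", "http://dbpedia.org/resource/Paris", ["dbo:City"]),
    ("Drupal", "http://dbpedia.org/resource/Drupal", ["dbo:Software"]),
]
for term in terms_from_annotations(annotations):
    print(term)
```

Keying terms by URI rather than by label means "Paris" and "paris" become one term, while two different entities that happen to share a label stay separate.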
User scenarios
Semantic indexing via Stanbol (Solr Yard)
Content enrichment with semantically related information (documents, factual data, images etc.)
Tag as you type: dynamic annotation of text in editors
How it works
a POST request sends the content to the REST API
the content is processed by an enhancement chain
results are returned as JSON-LD, RDF/XML, RDF/JSON, etc.
JSON-LD (JavaScript Object Notation for Linked Data): a simple, human-readable transport format for linked data
for best results, an enhancement chain should perform language detection, tokenization and POS tagging before semantic annotation
http://drupalaton.jelastic.dogado.eu/stanbol/enhancer
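A request to the enhancer is an ordinary HTTP POST with the text as the body and an Accept header selecting the output format. The sketch builds such a request with Python's standard library. The MIME types in FORMATS and the /enhancer/chain/{name} path for running a named chain are assumptions based on common Stanbol setups, and any chain name you pass is your own.

```python
from urllib.request import Request

# Accept values for the serializations listed above. The mapping (notably
# JSON-LD as application/json) is an assumption to check for your version.
FORMATS = {
    "json-ld": "application/json",
    "rdf/xml": "application/rdf+xml",
    "rdf/json": "application/rdf+json",
    "turtle": "text/turtle",
}


def enhance_request(base_url: str, text: str, fmt: str = "json-ld", chain: str = "") -> Request:
    """Build (without sending) a POST request submitting plain text for enhancement."""
    url = base_url.rstrip("/") + "/enhancer"
    if chain:
        url += "/chain/" + chain  # assumed path for a named enhancement chain
    return Request(
        url,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=UTF-8", "Accept": FORMATS[fmt]},
        method="POST",
    )


req = enhance_request("http://localhost:8080/", "Drupal runs on PHP.", fmt="turtle")
print(req.full_url, req.get_header("Accept"))
```

For the demo endpoint above, base_url would be http://drupalaton.jelastic.dogado.eu/stanbol; the request is sent with urllib.request.urlopen(req).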
Drupal integration
Source: blog.iks-project.eu
Drupal distribution: IKS CE
IKS CE distribution by Wolfgang Ziegler (fago) and Stéphane Corlosquet (scor)
Components:
Search API Stanbol
VIE.js: semantic annotation UI
https://drupal.org/project/iksce
http://drupal.org/project/vie
http://drupal.org/project/search_api_stanbol
Search API Stanbol
enables the indexing of Drupal entities such as nodes, users, taxonomy terms, files, etc. in the Stanbol EntityHub
data is sent as RDF
data can be mashed up with data from other sources (Managed Sites, Remote Sites)
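Sending an entity to the index as RDF boils down to turning its fields into triples. The sketch serializes a Drupal node, given as a plain dict, to N-Triples. PROPERTY_MAP is hypothetical, loosely modeled on the SIOC and Dublin Core terms used by Drupal 7's default RDF mappings; Search API Stanbol's own mapping may differ.

```python
# Hypothetical property mapping, loosely modeled on Drupal 7's default RDF
# mappings (sioc:Item, dc:title, dc:created); not Search API Stanbol's own.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SIOC_ITEM = "http://rdfs.org/sioc/ns#Item"
PROPERTY_MAP = {
    "title": "http://purl.org/dc/terms/title",
    "created": "http://purl.org/dc/terms/created",
    "body": "http://rdfs.org/sioc/ns#content",
}


def literal(value: str) -> str:
    """Quote a string as an N-Triples literal, escaping special characters."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def node_to_ntriples(base_url: str, node: dict) -> str:
    """Serialize a node (given as a plain dict) to N-Triples lines."""
    subject = f"<{base_url}/node/{node['nid']}>"
    lines = [f"{subject} <{RDF_TYPE}> <{SIOC_ITEM}> ."]
    for key, prop in PROPERTY_MAP.items():
        if key in node:  # fields missing from the node are simply skipped
            lines.append(f"{subject} <{prop}> {literal(str(node[key]))} .")
    return "\n".join(lines)


node = {"nid": 42, "title": 'Stanbol "on the fly"', "body": "Line one\nLine two"}
print(node_to_ntriples("http://example.org", node))
```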
VIE.js
“Vienna IKS Editables”
JavaScript library for implementing decoupled Content Management Systems and semantic interaction in web applications.
Monolithic vs Decoupled Content Management Systems
Source: Henri Bergius, http://bergie.iki.fi
Demo setup
Drupal entities are stored in a Solr index
annotations are based on:
DBpedia (bundled with Apache Stanbol)
a custom vocabulary of Semantic Web terms: the Social Semantic Web Thesaurus
SemWeb is imported into Apache Stanbol as a Solr index
Custom vocabularies
Social Semantic Web Thesaurus
1,959 concepts related to the Semantic Web
Author: Andreas Blumauer
http://vocabulary.semantic-web.at/semweb.html
http://vocabulary.semantic-web.at/semweb/8.visual
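Thesauri like this one are typically published as SKOS, where each concept has a preferred label and optional alternative labels per language. The sketch builds a case-insensitive label-to-URI index from such concepts, letting preferred labels win on clashes; an index of this kind is what a vocabulary-based linker matches text against. The two concepts and their example.org URIs are hypothetical, not taken from the Social Semantic Web Thesaurus.

```python
from typing import Dict, List, TypedDict


class Concept(TypedDict):
    uri: str
    prefLabel: Dict[str, str]       # language tag -> preferred label
    altLabel: Dict[str, List[str]]  # language tag -> alternative labels


# Hypothetical concepts in the style of a SKOS thesaurus (example.org URIs).
CONCEPTS: List[Concept] = [
    {"uri": "http://example.org/semweb/rdf",
     "prefLabel": {"en": "Resource Description Framework"},
     "altLabel": {"en": ["RDF"]}},
    {"uri": "http://example.org/semweb/lod",
     "prefLabel": {"en": "Linked Open Data"},
     "altLabel": {"en": ["LOD", "Linking Open Data"]}},
]


def label_index(concepts: List[Concept], lang: str = "en") -> Dict[str, str]:
    """Map lower-cased labels to concept URIs; preferred labels win conflicts."""
    index: Dict[str, str] = {}
    for c in concepts:
        for alt in c["altLabel"].get(lang, []):
            index.setdefault(alt.lower(), c["uri"])
    for c in concepts:
        pref = c["prefLabel"].get(lang)
        if pref:
            index[pref.lower()] = c["uri"]  # overrides any alt-label clash
    return index


print(label_index(CONCEPTS))
```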
Demo
index Drupal entities in Apache Stanbol
retrieve annotated entities via the REST API
annotate entities using the DBpedia and SemWeb indexes
edit Drupal entities and annotate on the fly
retrieve linked data tag recommendations
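Tag recommendations can be derived from entity annotations by keeping the best confidence per entity, discarding weak matches and ranking the rest. The threshold and top_n defaults below are hypothetical tuning parameters, and the input tuples stand in for (label, URI, confidence) values extracted from an enhancement result.

```python
from typing import Dict, List, Tuple

Annotation = Tuple[str, str, float]  # (label, entity URI, confidence)


def recommend_tags(
    annotations: List[Annotation], threshold: float = 0.5, top_n: int = 5
) -> List[Tuple[str, str]]:
    """Keep the best-scoring annotation per entity, drop weak ones, rank the rest."""
    best: Dict[str, Annotation] = {}
    for label, uri, conf in annotations:
        if conf >= threshold and (uri not in best or conf > best[uri][2]):
            best[uri] = (label, uri, conf)
    ranked = sorted(best.values(), key=lambda a: a[2], reverse=True)
    return [(label, uri) for label, uri, _ in ranked[:top_n]]


suggestions = recommend_tags([
    ("Paris", "http://dbpedia.org/resource/Paris", 0.92),
    ("Paris", "http://dbpedia.org/resource/Paris_Hilton", 0.31),
    ("Drupal", "http://dbpedia.org/resource/Drupal", 0.88),
    ("Paris", "http://dbpedia.org/resource/Paris", 0.75),
])
print(suggestions)
```

Ranking per entity URI rather than per label keeps ambiguous surface forms such as "Paris" from producing duplicate or low-confidence suggestions.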
Questions?
Contact me
gabriel.dragomir@webikon.com
twitter: gabidrg
Thank you!
http://mures2013.drupalcamp.ro