This slide deck was prepared for a workshop on Linked Data Publishing and Semantic Processing using the Redlink platform (http://redlink.co). The workshop, delivered at the Department of Information Engineering, Computer Science and Mathematics at Università degli Studi dell'Aquila, aimed to provide a general understanding of Semantic Web technologies and how they can be used in real-world use cases such as Salzburgerland Tourismus.
A brief introduction to MICO (Media in Context) is also included. MICO is a research project, part-funded by the European Union, that provides cross-media analysis solutions for online multimedia producers.
A framework for knowledge extraction, linked data and semantic search.
What do we want computers to do for us?
We have data.
• From 2005 to 2020, the digital universe will grow in
size by a factor of 300, from 130 exabytes to 40 trillion
gigabytes (40 ZB).
• From now until 2020, the digital universe will about
double every two years.
• Volumes of data are projected to reach 5,247 GB per
person with emerging economies playing an
increasingly important role (producing two thirds of
the world data by the end of this decade).
• Only 0.5% of this data is used today for analysis.
• The amount of information individuals create
themselves - writing documents, taking pictures,
recording audio - is far less than the information
being created about them in the digital universe.
[IDC iView, 2012]
What do we want computers to do for us?
[Andrew Ng, 2011]
We want computers to process data.
We use it everyday.
[Jurafsky & Martin, 2008]
a theoretically motivated range of
computational techniques for
analysing naturally occurring
text/speech for the purpose of
achieving human-like language
Feature extraction in text/speech.
Levels of knowledge encoding in language data.
What do we want computers to do with a text?
We want computers to make sense of unstructured data.
A practical example.
Combining Semantic Web technologies with NLP technologies.
About a 20-minute
car drive from L’Aquila.
How we started.
Building an open platform for
knowledge extraction, linked data
and semantic search.
Delivering the world’s most
advanced open source
content analysis and making
linked data publishing and
information discovery accessible
• Incorporating requirements from industry partners:
• CMS companies
• System integrators
• Tool providers
• Inheriting 6 years of IP with R&D on:
• Semantic Information Management and
Publishing (RDF and Semantic Web Technology)
• Semantic Processing
• Conceptual Search
LINKED DATA PUBLISHING
Linked Data Cloud
• Semantic Search
• Semantic enhancement process chaining
• Multiple NLP feature-extraction facilities
• Multiple language support
• Content classification and sentiment analysis
• Graduated as a Top-Level Project of the Apache
Software Foundation in September 2012
A Toolbox for Semantic Processing.
The Highly Scalable Search Server.
• Based on Apache Lucene
• Various language-specific processing procedures
• Highly scalable (Solr cloud) and highly configurable
• Ultra fast indexing/searching, indexes can be merged/
• Semantic Search available with an easy-to-install
Adding Semantic Search to Apache Solr.
• Boost your existing Apache Solr installation with
semantic enhancements via Redlink Content Analysis
• Watch the screencast
• Learn more
• Customising the semantic enhancements
with user-created vocabularies and Redlink NLP extraction
• Build your first app
• Learn more
• Redlink allows users to create their own Linked Data server for
managing vocabularies or publishing datasets for Linked (Open)
• Datasets managed with Redlink can
be made available for content
analysis and linking
• Datasets can be either private (Linked
Enterprise Data) or public (Linked
• Public Datasets such as DBpedia, Freebase and
GeoNames are available for de-referencing and interlinking
• Read-Write Linked Data
• Triple store with transactions, versioning
and rule-based reasoning
• SPARQL and LDPath query languages
• Transparent Linked Data Caching
• Graduated as a Top-Level Project of the Apache
Software Foundation in November 2013
The Open Platform for Linked Data.
An Open Linked Data Project
for Tourism in Salzburg
• Cross-platform publishing, as more and more travellers turn to
mobile devices
• Multiple Web CMSs (both proprietary and open source) to be
• Costly manual curation and interlinking
• Increasing demand for content syndication (from big players like
foursquare as well as from local application developers)
• Need for better SEO especially for events and sites (too regional to
be understood by commercial search engines)
Remixing existing content and creating new value.
running on WordPress
freshly updated content
on locations and events
a database containing:
events, facilities, accommodations, …
Everything we know already
the World’s largest
Using Linked Data to make sense of the information
Linked Data Publishing
• Data from the online booking system (Feratel) is enriched and transformed
into triples using identified vocabularies and ontologies
• Triples are stored in the Redlink triple store in a dedicated context
• RDF data and SPARQL end-points are published to the data website
(data.salzburgerland.com) running CKAN as Linked Open Data
• CKAN makes the data accessible to third parties in various formats by
Transforming Feratel Data
in Semantic Knowledge
from SOAP to Linked Data
Ontologies provide a means
to hold everything together
Data Modelling with LODE
Using LODE: An ontology for
Linking Open Descriptions of
Adding the relationships
With RDF, different data sources are integrated to provide
robot-friendly information that describes real-world things
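As a minimal sketch of what turning a record into triples can look like, the snippet below serialises one event as N-Triples. The record layout, the `example.org` base URI and the helper names are hypothetical; the actual Feratel mapping is not shown in these slides. Only the RDF, RDFS and LODE namespaces (with `Event` and `atPlace` from LODE) are real vocabularies.

```python
# Hypothetical base URI and record layout, for illustration only.
BASE = "http://example.org/salzburgerland/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
LODE = "http://linkedevents.org/ontology/"


def uri(term):
    """Wrap an absolute IRI in angle brackets, as N-Triples requires."""
    return "<" + term + ">"


def literal(text, lang):
    """Build a language-tagged literal, escaping backslashes, quotes and newlines."""
    escaped = (text.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n"))
    return '"' + escaped + '"@' + lang


def event_to_ntriples(record):
    """Describe one event and its place as a list of N-Triples lines."""
    event = BASE + "event/" + record["id"]
    place = BASE + "place/" + record["place_id"]
    triples = [
        (uri(event), uri(RDF_TYPE), uri(LODE + "Event")),
        (uri(event), uri(RDFS_LABEL), literal(record["name"], "de")),
        (uri(event), uri(LODE + "atPlace"), uri(place)),
        (uri(place), uri(RDFS_LABEL), literal(record["place_name"], "de")),
    ]
    return [" ".join(triple) + " ." for triple in triples]


record = {"id": "florianifeier", "name": "Florianifeier",
          "place_id": "unternberg", "place_name": "Unternberg"}
lines = event_to_ntriples(record)
print("\n".join(lines))
```

Because the event and the place each get their own URI, a second data source can add statements about the same place without touching the event record.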
Semantic Lifting and
Linked Data Principles
• A “word” or “phrase” becomes an
identifier used to denote
“things” (named entities) existing in
the real world
1. Real-world things are
unambiguously represented with
web addresses (URI)
2. By accessing these web addresses
(HTTP-URI) usable data is sent in
return using standard formats (RDF,
3. This data includes links to other
data so that people can discover
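The second principle is usually realised through HTTP content negotiation: the client asks for an RDF serialisation instead of an HTML page. A minimal sketch, assuming a DBpedia resource URI as the example thing and a Turtle-first `Accept` header; the request is only prepared here, not sent, and `build_rdf_request` is a hypothetical helper name.

```python
from urllib.request import Request

# Prefer Turtle, fall back to RDF/XML (an illustrative preference order).
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9"


def build_rdf_request(thing_uri):
    """Prepare (but do not send) an HTTP GET that asks for RDF rather than HTML."""
    return Request(thing_uri, headers={"Accept": RDF_ACCEPT})


request = build_rdf_request("http://dbpedia.org/resource/Salzburg")
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
# To actually dereference the URI:
#   from urllib.request import urlopen
#   data = urlopen(request).read()
```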
LANGUAGE EVENT THING LOCATION
ENGLISH FLORIANIFEIER MAY UNTERNBERG
[Très Riches Heures du duc de Berry, Raymond Cazelles et Johannes Rathofe]
“This May don't miss the
Florianifeier, we'll have fun
as usual in Unternberg”
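The lifting of this sentence can be sketched with a plain gazetteer lookup. This is a deliberately naive illustration, not the Redlink or Stanbol enhancement chain: the gazetteer entries and the `example.org` URIs are hypothetical, and a real system would also disambiguate (the lookup below would wrongly tag the modal verb "may").

```python
import re

# Hypothetical gazetteer: lower-cased surface form -> (entity type, URI).
GAZETTEER = {
    "florianifeier": ("EVENT", "http://example.org/event/florianifeier"),
    "may": ("THING", "http://example.org/month/may"),
    "unternberg": ("LOCATION", "http://example.org/place/unternberg"),
}


def lift(text):
    """Return (surface form, type, URI) for every token found in the gazetteer."""
    annotations = []
    for match in re.finditer(r"\w+", text):
        entry = GAZETTEER.get(match.group().lower())
        if entry is not None:
            annotations.append((match.group(), entry[0], entry[1]))
    return annotations


sentence = ("This May don't miss the Florianifeier, "
            "we'll have fun as usual in Unternberg")
for surface, kind, entity_uri in lift(sentence):
    print(kind, surface, entity_uri)
```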
Dynamic Semantic Publishing with WordLift
• Data from the Redlink triple store is made available for content enrichment
and can be edited using WordLift, a semantic plugin for WordPress.
• Using Linked Data the Web
becomes my new CMS
• information is automatically
imported in WordPress
• posts are connected with
• properties for each entity can
be edited using WordPress
• any change is automatically
reflected in the triple-store and
re-published as Open Data
Using Linked Data and WordLift the Web becomes your new CMS.
editing a blog post
editing an entity
Touristic applications attempting to discover events in Salzburgerland.
“Which events occur in May in Lungau?”
Linked Open Data
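A question like this can only be answered if the data says that an event's village belongs to the region asked about. The sketch below shows the idea on a tiny in-memory graph; every identifier and predicate (`ex:atPlace`, `ex:inMonth`, `ex:containedIn`) is hypothetical, and a real application would run a SPARQL query against the published endpoint instead.

```python
# Hypothetical mini-graph of (subject, predicate, object) triples.
TRIPLES = {
    ("ex:florianifeier", "rdf:type", "ex:Event"),
    ("ex:florianifeier", "ex:atPlace", "ex:unternberg"),
    ("ex:florianifeier", "ex:inMonth", "May"),
    ("ex:unternberg", "ex:containedIn", "ex:lungau"),
    ("ex:other_event", "rdf:type", "ex:Event"),
    ("ex:other_event", "ex:atPlace", "ex:elsewhere"),
    ("ex:other_event", "ex:inMonth", "May"),
}


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def events_in(region, month):
    """Events in the given month whose place is the region or lies inside it."""
    found = []
    for s, p, o in TRIPLES:
        if p != "rdf:type" or o != "ex:Event":
            continue
        if month not in objects(s, "ex:inMonth"):
            continue
        for place in objects(s, "ex:atPlace"):
            if place == region or region in objects(place, "ex:containedIn"):
                found.append(s)
                break
    return sorted(found)


print(events_in("ex:lungau", "May"))  # ['ex:florianifeier']
```

Without the containment triple linking the village to the region, the same query would return nothing, which is the gap that interlinked data closes.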
Unternberg is a village in the area of Lungau
on google.at!!
Better SEO using
• Using schema.org the data
from the triple-store is added
to the pages as semantic
• Search engines can finally
“recognise” entities that were
previously unknown (i.e.
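One possible shape for such markup is a schema.org `Event` embedded as JSON-LD. This is a sketch under assumptions: the slides do not say which syntax (Microdata, RDFa or JSON-LD) the site uses, and the helper names are hypothetical. `Event`, `Place`, `name` and `location` are standard schema.org terms.

```python
import json


def event_jsonld(name, place_name):
    """Describe an event and its location with schema.org terms."""
    return {
        "@context": "https://schema.org",
        "@type": "Event",
        "name": name,
        "location": {"@type": "Place", "name": place_name},
    }


def script_tag(data):
    """Wrap the JSON-LD in the script element that goes into the HTML page."""
    body = json.dumps(data, indent=2, ensure_ascii=False)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


markup = script_tag(event_jsonld("Florianifeier", "Unternberg"))
print(markup)
```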
• Media in cross-media context, making it possible to
analyse media resources as well as
connected content, including video, images,
audio, text, link structure and metadata;
• Investigate cross-media analysis along the
complete, distributed analysis chain, namely
extraction, metadata publishing, querying
• Contribute its main software development
results as Open Source components to two
established Apache projects, Apache
Marmotta and Apache Stanbol, simplifying
the use of the technology in industrial
What do we want computers to do with Media?
“Show me the tempo-regional fragments where
Lewis Jones is right beside Connor Macfarlane?”
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX mm: <http://linkedmultimedia.org/sparql-
PREFIX ma: <http://www.w3.org/ns/ma-ont#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT (mm:boundingBox(?l1,?l2) AS ?left_right)
WHERE {
  ?f1 ma:locator ?l1; dct:subject ?p1.
  ?p1 foaf:name "Lewis Jones".
  ?f2 ma:locator ?l2; dct:subject ?p2.
  ?p2 foaf:name "Connor Macfarlane".
}
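To make the spatial part of such a query concrete, here is a small Python sketch of the two operations involved: reading a region from a media-fragment locator (`#xywh=x,y,w,h`) and combining two regions into one bounding box. The locators are hypothetical, and `right_beside` encodes an assumed meaning ("entirely to the right of") that may differ from the definition used by the `mm:` function library.

```python
from urllib.parse import parse_qs, urlparse


def parse_xywh(locator):
    """Extract (x, y, width, height) from a locator ending in #xywh=x,y,w,h."""
    fragment = urlparse(locator).fragment
    value = parse_qs(fragment)["xywh"][0]
    if ":" in value:  # drop an optional unit prefix such as "pixel:"
        value = value.split(":", 1)[1]
    x, y, w, h = (int(part) for part in value.split(","))
    return x, y, w, h


def bounding_box(a, b):
    """Smallest rectangle that contains both regions."""
    left = min(a[0], b[0])
    top = min(a[1], b[1])
    right = max(a[0] + a[2], b[0] + b[2])
    bottom = max(a[1] + a[3], b[1] + b[3])
    return left, top, right - left, bottom - top


def right_beside(a, b):
    """Assumed semantics: region a starts at or after the right edge of b."""
    return a[0] >= b[0] + b[2]


# Hypothetical locators for the two face regions in the same frame.
lewis = parse_xywh("http://example.org/video.mp4#xywh=300,80,120,200")
connor = parse_xywh("http://example.org/video.mp4#xywh=100,90,110,190")

if right_beside(lewis, connor):
    print(bounding_box(lewis, connor))  # (100, 80, 320, 200)
```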
We want computers to process media.
soon in the
This presentation is the result of many inspiring ideas and amazing work from
other people, and here is the list:
Andrew Ng, 2011
Jurafsky & Martin, 2008
Webscale IA using Linked Open Data, on SlideShare by reduxd
LODE: Linking Open Descriptions of Events (ASWC 2009), on
SlideShare by Raphael Troncy
Semantic SEO in the post-Hummingbird era, on SlideShare by Kim
Renberg and Andrea Volpini
Querying of metadata, media content and context in MICO, a
demo by Thomas Kurz
Any idea, graphic or meme belonging to us is available
for sharing, copying and re-mixing under a
Creative Commons 3.0 license