 Copyright 2009 Digital Enterprise Research Institute. All rights reserved.
   The Semantic Web vision & Linked Data
   Multi-disciplinary perspective
       Linked Data, IR, NLP
   Case study: Treo
       Talking to the Linked Data Web
   Semantic application patterns
   Take-away message
2001:

   Software which is able to understand meaning (intelligent, flexible)
   Leveraging the Web for information scale

   What was the plan to achieve it?
   Build a Semantic Web Stack
   Which covers both representation and reasoning
   Adoption:
       No significant data growth
   Ontologies are not straightforward to build:
       People are not familiarized with the tools and principles
       Difficult to keep consistency at Web scale
   Scalability

   Problems:
       Consistency
       Scalability


[Diagram: Logic World vs. Web World]
2006:

   The Web as a Huge Database
   Fundamental step for data creation

   Where is the intelligence and flexibility?
   We will be back to this point in a minute

   Data Model Features:
       Graph-based data model
       Extensible schema
       Entity-centric data integration


   Specific Features:
       Designed over open Web standards
       Based on the Web infrastructure (HTTP, URIs)

   Positives:
       Solid adoption in the Open Data context (eGovernment, eScience, etc.)
       Existing data is relevant (you can build real applications)

   Negatives:
       Data consumption is a problem
       Data generation beyond database mapping/triplification is also a problem
       Still far from the Semantic Web vision
   How to address the previous challenges?

   Linked Data:
       Web-scale structured data representation
   Information Retrieval:
       Search, approximation, ranking strategies
       Scalability
   Natural Language Processing (NLP):
       Analysing natural language
       Semantic approximation (distributional semantics)
   IBM Watson approach
   With Linked Data we are still in the DB world




From which university did the wife of
Barack Obama graduate?
 With Linked Data we are still in the DB world
 (but slightly worse)




Demonstration
   Transform natural language queries into triple patterns
   Steps:
       Entity Recognition
        “From which university did the wife of Barack Obama graduate?”
       Dependency parsing
       Query Pattern detection
       Query Planning

   Dependency parse and POS tags:
       prep(graduate-10, From-1)     From/IN
       det(university-3, which-2)    which/WDT
       pobj(From-1, university-3)    university/NN
       aux(graduate-10, did-4)       did/VBD
       det(wife-6, the-5)            the/DT
       nsubj(graduate-10, wife-6)    wife/NN
       prep(wife-6, of-7)            of/IN
       nn(Obama-9, Barack-8)         Barack/NNP
       pobj(of-7, Obama-9)           Obama/NNP
       root(ROOT-0, graduate-10)     graduate/VB
                                     ?/.
Using NLP
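The query-pattern-detection step can be sketched as a small rule over the dependency relations above (a toy Python illustration, not Treo's actual implementation; the relation list is hardcoded from the parse shown):

```python
# Toy sketch: derive a triple pattern from the dependency parse above.
# Relations are hardcoded; a real system would call a dependency parser.
deps = [
    ("prep", "graduate-10", "From-1"),
    ("det", "university-3", "which-2"),
    ("pobj", "From-1", "university-3"),
    ("aux", "graduate-10", "did-4"),
    ("det", "wife-6", "the-5"),
    ("nsubj", "graduate-10", "wife-6"),
    ("prep", "wife-6", "of-7"),
    ("pobj", "of-7", "Obama-9"),
    ("root", "ROOT-0", "graduate-10"),
]

def word(node):
    # Strip the token index: "wife-6" -> "wife"
    return node.rsplit("-", 1)[0]

# Rule: X --prep--> "of" --pobj--> Y means "the X of Y" ("wife of Obama"),
# which maps to the triple pattern (Y, X, ?x).
def possessive_patterns(deps):
    patterns = []
    for rel, head, dep in deps:
        if rel == "prep" and word(dep) == "of":
            for rel2, head2, dep2 in deps:
                if rel2 == "pobj" and head2 == dep:
                    patterns.append((word(dep2), word(head), "?x"))
    return patterns

print(possessive_patterns(deps))  # [('Obama', 'wife', '?x')]
```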
Query:
[Diagram: the natural language query translated into triple patterns]
Using NLP
   Entity Search:
       Build an entity index (instances)
       Extract terms from URIs and index the terms using your
        favourite IR framework
       Search instances by keywords
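A minimal sketch of such an entity index, assuming a plain in-memory inverted index in place of a real IR framework (the URIs and helper names are illustrative):

```python
# Toy inverted index over entity URIs (illustrative; a real system
# would use Lucene/Solr or another IR framework).
from collections import defaultdict

uris = [
    "http://dbpedia.org/resource/Barack_Obama",
    "http://dbpedia.org/resource/Michelle_Obama",
    "http://dbpedia.org/resource/Princeton_University",
]

def terms(uri):
    # Extract and normalise terms from the URI's local name.
    local = uri.rsplit("/", 1)[-1]
    return [t.lower() for t in local.split("_")]

index = defaultdict(set)
for uri in uris:
    for t in terms(uri):
        index[t].add(uri)

def search(keywords):
    # Return the entities matching all keywords.
    sets = [index.get(k.lower(), set()) for k in keywords]
    return set.intersection(*sets) if sets else set()

print(search(["michelle", "obama"]))
# {'http://dbpedia.org/resource/Michelle_Obama'}
```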




Using IR
Query
[Diagram: keyword search for instances over the Linked Data Web]
Using IR
   Use distributional semantics to semantically match query terms to predicates and classes
   Distributional principle: words that co-occur together tend to have related meaning
       Allows the creation of a comprehensive semantic model from unstructured text
       Based on statistical patterns over large amounts of text
       No human annotations
   Distributional semantics can be used to compute a semantic relatedness measure between two words
Using NLP and IR
   Computation of a measure of “semantic proximity” between two terms
   Allows approximate semantic matching between query terms and dataset predicates and classes
   It supports a reasoning-like behavior based on the knowledge embedded in the corpus
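A minimal sketch of the idea, assuming simple sentence-level co-occurrence vectors and cosine similarity over a toy corpus (real distributional models, such as ESA, are built from corpora the size of Wikipedia):

```python
# Toy distributional-relatedness sketch: co-occurrence vectors from a
# tiny corpus, compared with cosine similarity.
import math
from collections import defaultdict

corpus = [
    "the wife married her spouse",
    "his spouse is his wife",
    "the university awarded the degree",
    "she graduated from the university with a degree",
]

def cooc_vectors(sentences):
    # Count, for each word, the words it co-occurs with per sentence.
    vecs = defaultdict(lambda: defaultdict(int))
    for s in sentences:
        words = s.split()
        for w in words:
            for c in words:
                if c != w:
                    vecs[w][c] += 1
    return vecs

def cosine(u, v):
    dot = sum(u[k] * v.get(k, 0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

vecs = cooc_vectors(corpus)
# 'wife' and 'spouse' share contexts, so they score higher than
# 'wife' and 'degree'.
print(cosine(vecs["wife"], vecs["spouse"]) >
      cosine(vecs["wife"], vecs["degree"]))  # True
```

The same relatedness score is what lets a query term like "wife" match a dataset predicate like "spouse" without any manual mapping.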




Query
[Diagram: matching over the Linked Data Web. Which properties are semantically related to ‘wife’?]
Using NLP and IR


   Semantic approximation in databases (as in any IR system): semantic best-effort
   Needs some level of user disambiguation, refinement and feedback
   As we move in the direction of semantic systems we should expect the need for principled dialog mechanisms (as in human communication)
   Pull the user interaction back into the system



                                           Using NLP
                                           and IR
   Derived from the experience developing Treo

   Not restricted to queries over Linked Data

   The following list is not intended to be complete
   Pattern #1: Maximize the amount of knowledge in
    your semantic application

   Meaning interpretation depends on knowledge

   Using LOD: DBpedia, Freebase and YAGO can give you a very comprehensive set of instances and their types

   Wikipedia can provide you a comprehensive distributional semantic model
   Pattern #2: Allow your database to grow

   Dynamic schema

   Entity-centric data integration
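A sketch of what a dynamic, entity-centric schema can look like in code (illustrative names; not a real triple store):

```python
# Entity-centric graph store with a dynamic schema: new predicates can
# be added at any time, with no migration step (illustrative sketch).
from collections import defaultdict

graph = defaultdict(list)  # entity -> list of (predicate, value)

def add_fact(entity, predicate, value):
    graph[entity].append((predicate, value))

add_fact("Barack_Obama", "spouse", "Michelle_Obama")
# Later, a new predicate appears in the data; the schema simply grows.
add_fact("Barack_Obama", "almaMater", "Harvard_University")

print(graph["Barack_Obama"])
# [('spouse', 'Michelle_Obama'), ('almaMater', 'Harvard_University')]
```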
   Pattern #3: Once the database grows in complexity
    use semantic search instead of structured queries

   Instances can be used as pivot entities to reduce
    the search space
       They are easier to search
       Higher specificity and lower vocabulary variation
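A sketch of the pivot-entity idea, assuming a toy in-memory graph (the entity and predicate names are illustrative):

```python
# Once the recognised instance is used as a pivot, the search space
# shrinks to the predicates attached to that one entity (toy data;
# a real system would rank them by semantic relatedness).
graph = {
    "Michelle_Obama": [
        ("spouse", "Barack_Obama"),
        ("almaMater", "Princeton_University"),
        ("birthPlace", "Chicago"),
    ],
}

def candidate_predicates(pivot):
    # Only the pivot's own predicates need to be matched.
    return [p for p, _ in graph.get(pivot, [])]

print(candidate_predicates("Michelle_Obama"))
# ['spouse', 'almaMater', 'birthPlace']
```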
   Pattern #4: Use distributional semantics and
    semantic relatedness for a robust semantic
    matching

   Distributional semantics allows your application to
    digest (and make use of) large amounts of
    unstructured information

   Multilingual solution

   Can be complemented with WordNet
   Pattern #5: POS tags, syntactic parsing and rules will go a long way toward interpreting natural language queries and sentences
   Use them to explore the regularities in natural
    language

   Define a scope for natural language processing in
    your application (restrict by domain, syntactic
    complexity)

   These tools are easy to use and quite robust (at
    least for English)
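A toy illustration of such a POS-tag pattern rule, assuming a hand-tagged sentence (a real application would use a tagger such as Stanford's or NLTK's; the question and rule here are made up for the example):

```python
# Toy rule over Penn Treebank POS tags for questions of the form
# "Who VERB ENTITY?" (hand-tagged input; illustrative only).
tagged = [("Who", "WP"), ("directed", "VBD"), ("Avatar", "NNP"), ("?", ".")]

def match_who_verb_entity(tagged):
    # Rule: WP + past-tense verb + proper noun -> (?x, verb, entity)
    tags = [t for _, t in tagged]
    if tags[:3] == ["WP", "VBD", "NNP"]:
        return ("?x", tagged[1][0], tagged[2][0])
    return None

print(match_who_verb_entity(tagged))  # ('?x', 'directed', 'Avatar')
```

Such rules exploit the regularities of question syntax; restricting the domain and syntactic complexity keeps the rule set small.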
   Pattern #6: Provide a user dialog mechanism in the
    application

   Improve the semantic model with user feedback
   Part of the Semantic Web vision can be addressed
    today with a multi-disciplinary perspective
       Linked Data, IR and NLP
 You can build your own IBM Watson-like application
 Both data and tools are available and ready to use:
  the barrier is the mindset
 Large opportunity for new solutions
   NLP                         Datasets
        WordNet                    DBpedia
        VerbNet                    Freebase
        Stanford parser            YAGO
        C&C parser/Boxer
        NLTK
                                Tools that will be
        DBpedia Spotlight       available soon:
        Gate                       Treo
        UIMA                       Treo-ESA
   IR                              Graphia
        Lucene/Solr
        Terrier
André Freitas, Edward Curry, João Gabriel Oliveira, Sean O'Riain, . IEEE Internet Computing, Special Issue on Internet-Scale Data, 2012.

André Freitas, Edward Curry, João Gabriel Oliveira, Sean O'Riain, . International Journal of Semantic Computing (IJSC), 2012.

André Freitas, Sean O'Riain, Edward Curry, . 27th ACM Applied Computing Symposium, Semantic Web and Its Applications Track, 2012.

André Freitas, João Gabriel Oliveira, Sean O'Riain, Edward Curry, João Carlos Pereira da Silva, . In Proceedings of the 16th International Conference on Applications of Natural Language to Information Systems (NLDB), 2011.

André Freitas, Danilo S. Carvalho, João Carlos Pereira da Silva, Sean O'Riain, Edward Curry, A Semantic Best-Effort Approach for Extracting Structured Discourse Graphs from Wikipedia. In Proceedings of the 1st Workshop on the Web of Linked Entities (WoLE 2012) at the 11th International Semantic Web Conference (ISWC), 2012.
andrefreitas.org

andre (dot) freitas – at – deri (dot) org

From Linked Data to Semantic Applications
