1. Wissenstechnologie WS 08/09
Michael Granitzer
IWM TU Graz & Know-Center
Lecture 6: Triple Stores, SPARQL, Semantic Retrieval
http://kmi.tugraz.at
http://www.know-center.at
This work is licensed under the Creative Commons Attribution 2.0 Austria License.
To view a copy of this license, visit http://creativecommons.org/licenses/by/2.0/at/.
2. Today
Ontology Modelling & SW Frameworks
Triple Stores
• Basic RDBMS scheme
• Property tables & vertical Partitioning
• Performance Comparisons
SPARQL
• Definition
• Simple & Complex Queries
• Some examples on Endpoints
Information Retrieval vs. Semantic Retrieval
• Basics of IR
• „Semantic“ Retrieval
• Practical Examples (Freebase, Cuil etc.)
http://kmi.tugraz.at
WS 08/09 Wissenstechnologie @ kmi.tugraz.at
3. Ontology Modelling vs. OOP
Similar to design in Object Oriented Programming
Classes, objects and members
OOP: Capture the operational properties
public interface Course {
    public void enroll();
}
Ontology Modelling: Capture the structural properties
owl:Student --owl:participates--> owl:Course
4. Ontology Modelling vs. RDBMS
Similar to designing a database system
Higher expressiveness in OWL
Agreement on the domain, not only referential integrity
Not focused on special indexing structures or on querying only
Ontologies should be application independent
Consistency checks
Semantic Integration via Ontologies
(Figure: semantic integration via ontologies across heterogeneous sources: product database, file system, employee database, text database, ...)
5. Ontology Modelling: Goals
Share common understanding among people or software
Enable reuse of knowledge
Make domain assumptions explicit
Separate domain knowledge from operational knowledge
Analyze domain knowledge
Main Application Areas
Semantic harmonization of heterogeneous data sources
Structuring the content of a portal
Enhance search and retrieval
…
6. Ontology Modelling: Aspects to model
Defining classes in the ontology
Arranging classes in a taxonomy
Defining slots/properties for classes and their values
Define logical constraints on classes/properties
Assign instances
7. Ontology Modelling: Three simple rules
1. “There is no one correct way to model a domain; there are always viable alternatives. The best solution almost always depends on the application that you have in mind and the extensions that you anticipate.”
2. “Concepts in the ontology should be close to objects (physical or logical) and relationships in your domain of interest. These are most likely to be nouns (objects) or verbs (relationships) in sentences that describe your domain.”
3. “Ontology development is necessarily an iterative process.”
8. Ontology Modelling: Noy and McGuinness's 7 Steps
1. Determine the domain and scope of the ontology
2. Consider reusing existing ontologies
3. Enumerate important terms in the ontology
4. Define the classes and the class hierarchy
5. Define the properties (slots) of classes
6. Define the facets of the slots
7. Create/Import instances
9. Semantic Web Frameworks: Motivation
Protege as modelling GUI
For „Semantic Web Applications“ we also want to
Automatically import/map instances
Manage large number of triples
Combine different schemas
Query for specific triples
Harmonize different metadata schemas
Database requirements for graphs
Reasoning
10. Semantic Web Frameworks: Overview
Three „major“ Java-based Open Source frameworks
Jena
Sesame
Protege Java API
Functionality
Java API for managing OWL, RDF and RDFS (optional DAML+OIL)
Import/Export of different formats
Persistence via own data store, different database and file system backends
Querying, Graph manipulation and restricted reasoning capabilities
Web API
11. Semantic Web Frameworks: Jena Architecture
(Figure: Jena architecture; recoverable labels: SPARQL, RDF/XML)
Source: Jena: Implementing the Semantic Web Recommendations, 2003, http://www.hpl.hp.com/techreports/2003/HPL-2003-146.html
12. Semantic Web Frameworks: Main Differences
Jena
Reference implementation
Not directly focused towards web access and scalability
Protege
Modelling GUI
Sesame
Focused towards remote access and scalability
Flexible Layer architecture for different storage backends
Others: Virtuoso, 3Store, Kowari, OpenAnzo
13. Triple Stores: Overview
Basic data model is RDF (i.e. OWL, RDFS)
RDF forms a directed graph
How do we manage large graphs?
In Memory Adjacency Matrix
On secondary storage
– Special Indices
– Use relational database management systems
14. Triple Stores: „Normalized“ Table Model of RDF
Subject | Predicate | Object
http://book.at/isbn123 | author | http://fussball.de/G. Müller
http://book.at/isbn123 | price | €15
http://book.at/isbn123 | Title | Ein Leben für die Tore
http://fussball.de/G. Müller | Name | Gerd Müller
(Figure: the same triples as a directed graph. http://book.at/isbn123 has an author edge to http://fussball.de/G. Müller, a price edge to €15 and a title edge to "Ein Leben für die Tore"; http://fussball.de/G. Müller has a name edge to "Gerd Müller".)
15. Triple Stores: Query in an unoptimized RDBMS
Query: Titles of books from the person with name Gerd Müller?
SELECT r3.o AS Title
FROM rdf r1, rdf r2, rdf r3
WHERE r1.s = r2.o AND
      r2.s = r3.s AND
      r1.o = 'Gerd Müller' AND
      r1.p = 'Name' AND
      r2.p = 'author' AND
      r3.p = 'Title'
Subject (s) | Predicate (p) | Object (o)
http://book.at/isbn123 | author | http://fussball.de/G. Müller
http://book.at/isbn123 | price | €15
http://book.at/isbn123 | Title | Ein Leben für die Tore
http://fussball.de/G. Müller | Name | Gerd Müller
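The three-way self join on this slide can be tried against an in-memory SQLite database. This is a minimal sketch, assuming Python's standard sqlite3 module; the table name rdf and the columns s, p, o follow the slide, while the function name titles_by_author_name is made up for illustration.

```python
import sqlite3

# The four example triples from the slide.
TRIPLES = [
    ("http://book.at/isbn123", "author", "http://fussball.de/G. Müller"),
    ("http://book.at/isbn123", "price", "€15"),
    ("http://book.at/isbn123", "Title", "Ein Leben für die Tore"),
    ("http://fussball.de/G. Müller", "Name", "Gerd Müller"),
]


def titles_by_author_name(name):
    """Return the titles of all books whose author has the given name."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE rdf (s TEXT, p TEXT, o TEXT)")
    con.executemany("INSERT INTO rdf VALUES (?, ?, ?)", TRIPLES)
    # Three aliases of the same table: every additional triple pattern
    # in the question costs one more self join.
    rows = con.execute(
        """
        SELECT r3.o AS Title
        FROM rdf r1, rdf r2, rdf r3
        WHERE r1.s = r2.o AND r2.s = r3.s
          AND r1.p = 'Name' AND r1.o = ?
          AND r2.p = 'author'
          AND r3.p = 'Title'
        """,
        (name,),
    ).fetchall()
    con.close()
    return [title for (title,) in rows]


print(titles_by_author_name("Gerd Müller"))
```

Without an index on any column, the database has to scan the whole triple table for each alias, which is the cost the following slides try to reduce.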
16. Triple Stores: The Sesame Mapping as example
See Hak Soo Kim, Hyun Seok Cha, Jungsun Kim, Jin Hyun Son, Development of the Efficient OWL Document Management System for the Embedded Applications, Springer 2005, http://www.springerlink.com/content/8mfxeh0glq5xj00m/
17. Triple Stores: Indexing Techniques
Use specialised indices for graphs
Bitmap indices in Virtuoso
http://virtuoso.openlinksw.com/wiki/main/Main/VOSBitmapIndexing
Index different combinations of the S,P,O Table
P,S,O
O,P,S
O,S,P
S,O,P
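Why several orderings of the S,P,O table help can be shown with nested dictionaries. This is only a sketch of the idea, not how Virtuoso or any other store implements its indices; the names build_index, INDEXES and subjects_with are hypothetical.

```python
from collections import defaultdict

# The four example triples from the normalized table model.
TRIPLES = [
    ("http://book.at/isbn123", "author", "http://fussball.de/G. Müller"),
    ("http://book.at/isbn123", "price", "€15"),
    ("http://book.at/isbn123", "Title", "Ein Leben für die Tore"),
    ("http://fussball.de/G. Müller", "Name", "Gerd Müller"),
]

POSITION = {"s": 0, "p": 1, "o": 2}


def build_index(triples, order):
    """Nest triples as index[first][second] -> set(third) for an ordering like 'ops'."""
    index = defaultdict(lambda: defaultdict(set))
    for triple in triples:
        first, second, third = (triple[POSITION[k]] for k in order)
        index[first][second].add(third)
    return index


# One index per ordering listed on the slide.
INDEXES = {order: build_index(TRIPLES, order) for order in ("pso", "ops", "osp", "sop")}


def subjects_with(predicate, obj):
    """Answer the pattern (?s, predicate, obj) with two lookups in the O,P,S index."""
    by_predicate = INDEXES["ops"].get(obj, {})
    return sorted(by_predicate.get(predicate, ()))


print(subjects_with("Name", "Gerd Müller"))
```

Each ordering answers a different access pattern without a scan; the price is that every triple is stored once per index.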
18. Triple Stores: A first Analysis
Normalised view on a graph: one large table
Generic and flexible, but
Large self joins for rather simple queries. RDBMS are usually not optimized for this
Large memory overhead in query processing due to self joins
Requires a lot of index lookups and/or full table scans
Large storage overhead
In general: flexibility vs. performance
How to improve?
19. Triple Stores: Further improvements
Property tables: flattened representation by finding sets of properties which are used together
Subject-Property Matrix Materialized Join Views (SPMJVs) from Oracle
Chong, E. I., Das, S., Eadon, G., and Srinivasan, J. 2005. An efficient SQL-based RDF querying scheme. In Proceedings of the 31st international Conference on Very Large Data Bases, ACM
Abadi, D. J., Marcus, A., Madden, S. R., and Hollenbach, K. 2007. Scalable semantic web data management using vertical partitioning. In Proceedings of the 33rd international Conference on Very Large Data Bases (Vienna, Austria, September 23 - 27, 2007). Very Large Data Bases. VLDB Endowment, 411-422.
20. Triple Stores: Property Tables
++: Faster querying within a property table because subject-subject self joins are reduced
--: Requires intelligent selection of the properties in the table
More property columns lead to more null values in the table and therefore to larger space overhead
Fewer property columns lead to more property tables and therefore to more joins across them
--: Multi-valued properties are hard to manage (e.g. a book has several authors)
Subject | Title | Author | Year
ID1 | “Intro to RDF” | Granitzer | 2006
ID1 | “Intro to RDF” | Tochtermann | 2006
21. Triple Stores: Vertical Partitioning
Partition database according to properties – one table per
property
Abadi, D. J., Marcus, A., Madden, S. R., and Hollenbach, K. 2007. Scalable semantic web data management using vertical partitioning. In Proceedings of the
33rd international Conference on Very Large Data Bases (Vienna, Austria, September 23 - 27, 2007). Very Large Data Bases. VLDB Endowment, 411-422.
Tables are sorted by subject, which allows fast merge joins
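A small sketch of the idea: one two-column table per property, each sorted by subject, joined in a single forward pass. The book ID1 and its authors come from the property-table slide; ID2 and "Another Book" are invented test data, and the function names are hypothetical. Abadi et al. describe a column-store implementation, which this plain-Python version does not attempt to reproduce.

```python
def partition_by_property(triples):
    """Build one (subject, object) table per property, sorted by subject."""
    tables = {}
    for s, p, o in triples:
        tables.setdefault(p, []).append((s, o))
    for rows in tables.values():
        rows.sort()
    return tables


def merge_join(left, right):
    """Join two subject-sorted tables on subject in one forward pass."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i][0] < right[j][0]:
            i += 1
        elif left[i][0] > right[j][0]:
            j += 1
        else:
            subject = left[i][0]
            # Find the run of rows with this subject on both sides.
            i_end = i
            while i_end < len(left) and left[i_end][0] == subject:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == subject:
                j_end += 1
            for _, left_obj in left[i:i_end]:
                for _, right_obj in right[j:j_end]:
                    out.append((subject, left_obj, right_obj))
            i, j = i_end, j_end
    return out


TRIPLES = [
    ("ID1", "title", "Intro to RDF"),
    ("ID1", "author", "Granitzer"),
    ("ID1", "author", "Tochtermann"),
    ("ID2", "title", "Another Book"),  # invented: a book without an author
]

tables = partition_by_property(TRIPLES)
joined = merge_join(tables["title"], tables["author"])
print(joined)
```

The multi-valued author property needs no special handling here: it is simply two rows in the author table, and no null values are stored for ID2.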
22. Triple Stores: Vertical Partitioning (cont.)
++: Use of simple, fast merge joins
++: Multi-valued attributes are supported
++: No a-priori clustering decision is necessary
++: Smaller tables. Only those properties accessed have to be
read from disk
--: Insert may be slower due to access to multiple tables
--: Queries over multiple properties span over multiple tables
23. Triple Stores: Performance of Open Source Solutions
Portwin & Parvatikar (2006) Scaling Jena in a Commercial Environment:
The Ingenta MetaStore Project
LEHIGH dataset (university domain)
~200 million triples, 11 million OWL statements, 4.3 million documents
Kowari: 1 billion triples, loads 20k triples/s for the Wikipedia data set
Unoptimized
Simple queries take milliseconds
With inference, queries take several seconds to minutes depending on the complexity
Optimization for inference: for RDFS entailment, expand the graph by making implicit edges explicit
More storage but faster access
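Making implicit edges explicit can be sketched as forward chaining over two RDFS rules (subclass transitivity and type inheritance). This is a toy illustration only; real stores implement the full RDFS rule set far more efficiently. The resources ex:Software and ex:myServer are invented for the example.

```python
def materialize_rdfs(triples):
    """Apply two RDFS entailment rules until no new triple is derived."""
    closed = set(triples)
    while True:
        subclass = [(s, o) for s, p, o in closed if p == "rdfs:subClassOf"]
        new = set()
        for s, p, o in closed:
            for sub, sup in subclass:
                if sub != o:
                    continue
                if p == "rdfs:subClassOf":
                    # rdfs11: subClassOf is transitive
                    new.add((s, "rdfs:subClassOf", sup))
                elif p == "rdf:type":
                    # rdfs9: an instance of a class is an instance of its superclass
                    new.add((s, "rdf:type", sup))
        if new <= closed:
            return closed
        closed |= new


GRAPH = {
    ("ex:ApacheServer", "rdfs:subClassOf", "ex:WebServer"),
    ("ex:WebServer", "rdfs:subClassOf", "ex:Software"),
    ("ex:myServer", "rdf:type", "ex:ApacheServer"),
}

closed = materialize_rdfs(GRAPH)
print(len(GRAPH), "explicit triples ->", len(closed), "after materialization")
```

The graph doubles in size here, but a query for all instances of ex:Software becomes a plain lookup with no reasoning at query time.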
24. Triple Stores: Performance of Oracle
BioMed literature database (UniProt data set)
80 million triples
~5 GB RDF/XML data (~2.5 GB triples; 1.7 GB mapping; 4.8 GB indices)
Queries take milliseconds to seconds
Subject-property matrix materialized views provide optimization potential of roughly 30%
Chong, E. I., Das, S., Eadon, G., and Srinivasan, J. 2005. An efficient SQL-based RDF querying scheme. In
Proceedings of the 31st international Conference on Very Large Data Bases, ACM
25. Triple Stores: Performance Summary
http://esw.w3.org/topic/LargeTripleStores
Problem: the available performance numbers are hard to compare
Trade-off: generic vs. performance
Optimization potential is available
Currently not as fast as specialised RDBMS, but more flexible
26. SPARQL: SPARQL Protocol and RDF Query Language
Different languages similar to SQL in RDBMS
SeRQL, RDQL, SPARQL
SPARQL currently proposed recommendation of the W3C
But what does querying a graph mean?
Basically
Specify a sub-graph with variable nodes
Find all patterns in the graph matching the sub-graph
(Figure: query sub-graph with variable nodes: ?x has an author edge to "Gerd Müller" and a title edge to ?y)
SELECT ?x ?y
WHERE { ?x <author> "Gerd Müller" .
        ?x <title> ?y . }
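What "find all patterns in the graph matching the sub-graph" means can be sketched with a naive matcher that binds variables (strings starting with ?) triple by triple. This is an illustration of the principle, not how a SPARQL engine evaluates queries; the graph is a simplified version of the book example in which the author is a plain literal.

```python
def match_pattern(graph, patterns, binding=None):
    """Yield every variable binding under which all patterns occur in the graph."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # A variable must keep the value it was bound to earlier.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield from match_pattern(graph, rest, candidate)


GRAPH = [
    ("http://book.at/isbn123", "author", "Gerd Müller"),
    ("http://book.at/isbn123", "price", "€15"),
    ("http://book.at/isbn123", "title", "Ein Leben für die Tore"),
]

QUERY = [
    ("?x", "author", "Gerd Müller"),
    ("?x", "title", "?y"),
]

results = list(match_pattern(GRAPH, QUERY))
print(results)
```

The shared variable ?x plays the role of the join condition that had to be spelled out as r2.s = r3.s in the SQL version.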
27. SPARQL: Example
Data:
<http://example.org/book/book1> <http://purl.org/dc/elements/1.1/title> "SPARQL Tutorial" .
Query:
SELECT ?title
WHERE {
<http://example.org/book/book1>
<http://purl.org/dc/elements/1.1/title>
?title .
}
Result:
title
"SPARQL Tutorial"
28. SPARQL: Example
Data:
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
_:a foaf:name "Johnny Lee Outlaw" .
_:a foaf:mbox <mailto:jlow@example.com> .
_:b foaf:name "Peter Goodguy" .
_:b foaf:mbox <mailto:peter@example.org> .
Query:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?mbox
WHERE {
?x foaf:name ?name .
?x foaf:mbox ?mbox
}
Result:
name | mbox
"Johnny Lee Outlaw" | <mailto:jlow@example.com>
"Peter Goodguy" | <mailto:peter@example.org>
29. SPARQL: Simple Query Elements
Determine the Namespace: PREFIX
Determine the return format
SELECT: Table output format similar to SQL Results
CONSTRUCT: Allows to construct a graph as return value
ASK: Returns only true/false depending on whether a result exists
DESCRIBE: Returns possible properties/resources for a particular query. Used for browsing.
Specify the selection criteria with the WHERE clause
Specify a non-recursive sub-pattern with triples and placeholders (? or $)
Perform Grouping and Filter Operations
Modifiers: ORDER BY, LIMIT, OFFSET, DISTINCT
30. SPARQL: Blank Nodes
The ID of a blank node is unique within one query and indicates only the existence of a blank node, not its absolute value
Blank nodes are identified by an automatically generated identifier (label)
Consider the results of a query
Subject  Value            Subject  Value            Subject  Value
_:a      “zum”        ≡   _:x      “zum”        ≠   _:z      “zum”
_:b      “Beispiel”       _:y      “Beispiel”       _:z      “Beispiel”
Blank nodes may be renamed and are structural elements only
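The equivalence in the table above can be checked mechanically: two result tables are the same if the blank node labels can be renamed one-to-one. The sketch below assumes both tables list their rows in the same order, which sidesteps the general (harder) graph matching problem; the function name is hypothetical.

```python
def equal_up_to_renaming(rows_a, rows_b):
    """Compare two (blank node, value) tables, allowing a one-to-one relabelling."""
    if len(rows_a) != len(rows_b):
        return False
    forward, backward = {}, {}
    for (node_a, value_a), (node_b, value_b) in zip(rows_a, rows_b):
        if value_a != value_b:
            return False
        # Each label must map to exactly one label in the other table, and back.
        if forward.setdefault(node_a, node_b) != node_b:
            return False
        if backward.setdefault(node_b, node_a) != node_a:
            return False
    return True


first = [("_:a", "zum"), ("_:b", "Beispiel")]
second = [("_:x", "zum"), ("_:y", "Beispiel")]
third = [("_:z", "zum"), ("_:z", "Beispiel")]

print(equal_up_to_renaming(first, second))
print(equal_up_to_renaming(first, third))
```

The third table differs because it states that both values belong to the same node, which is structural information and not just a different label.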
31. SPARQL: Complex Queries
Combination of groups of simple graph expressions in the WHERE
clause
OPTIONAL clause: Subgraph pattern may not exist
Example for querying book titles from Springer
If an author exists, it is listed; if not, the title is returned without an author
SELECT ?title ?author
WHERE
{ ?buch ex:pulishedFrom <http://springer.com/Verlag> .
  ?buch ex:Title ?title .
  OPTIONAL { ?buch ex:Autor ?author } .
}
32. SPARQL: Complex Queries (cont.)
Specifying alternative sub graph patterns: UNION
Logical OR or union of two separate queries
SELECT ?title ?author
WHERE
{ ?buch ex:pulishedFrom <http://springer.com/Verlag> .
  ?buch ex:Title ?title .
  { ?buch ex:Autor ?author . } UNION
  { ?buch ex:Creator ?author . }
}
„Select all books with a title published by Springer which have an author or a creator assigned“
Note: ?author in the different groups are independent of each other
33. SPARQL: Complex Queries (cont.)
Considering special datatypes: FILTER and XML Datatypes
Specify the data type of a literal
SELECT ?title ?author
WHERE
{ ?buch ex:pulishedFrom <http://springer.com/Verlag> .
  ?buch ex:Title ?title .
  ?buch ex:publishedIn "1998"^^xsd:integer
}
FILTER specifies boolean expressions for filtering results
E.g. specify the data type range using FILTER
(see Chapter 7 in Semantic Web Grundlagen)
SELECT ?title ?author
WHERE
{ ?buch ex:pulishedFrom <http://springer.com/Verlag> .
  ?buch ex:publishedIn ?year .
  FILTER(?year > 2000)
}
35. SPARQL: Summary
Similar to SQL
Allows easier expression of joins without knowing the
underlying database schema
Can return not only tables but also more complex output formats such as graphs
Datatypes of a variable are not always clear
http://www.w3.org/TR/rdf-sparql-query/
http://thefigtrees.net/lee/sw/sparql-faq
Hitzler, Krötzsch, Rudolph, Sure, Semantic Web – Grundlagen, Chapter 7
36. Semantic vs. Information Retrieval: Overview
Central Question: What is semantic retrieval?
Define information retrieval
Where are semantics missing?
How can we use Semantic Web technologies to add semantics?
37. Semantic vs. Information Retrieval: Definition of IR
Salton (1968): „Information retrieval is a field concerned with the structure, analysis, organization, storage, searching, and retrieval of information.“
“Information retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers).”
Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.
Main focus of IR is how to deal with uncertainty and incomplete
information
Representation of documents is ambiguous
Query formulation is ambiguous and usually incomplete
“Unstructured” information
Usually the perfect answer, insofar as one exists, cannot be delivered
38. Semantic vs. Information Retrieval: IR vs. Data Retrieval, from van Rijsbergen 1979
Aspect | Data retrieval | Information retrieval
Matching | Exact match | Partial (best) match
Inference | Deduction | Induction
Model | Deterministic | Probabilistic
Classification | Monothetic | Polythetic
Query language | Artificial | Natural
Query specification | Complete | Incomplete
Items wanted | Matching | Relevant
Error response | Sensitive | Insensitive
39. Semantic vs. Information Retrieval: IR vs. Data Retrieval, from van Rijsbergen 1979
“What is the gross domestic product of Austria?”
40. Semantic vs. Information Retrieval: IR vs. Data Retrieval, from van Rijsbergen 1979
“What is the gross domestic product of Austria?”
SELECT GDP FROM GDP_table WHERE country_name = 'Austria'
€ 270.8 bn
However,
not all information is available in databases
Queries are hard to formulate for average users as well as for non-domain experts
The more complex the domain and the information need, the harder it is to formulate a correct query
41. Semantic vs. Information Retrieval: Basic Retrieval Workflow
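(Figure: basic retrieval workflow. Documents D are mapped to a document representation Dr; an information need IN is expressed as a query Q; the retrieval model M and the ranking function R match Dr against Q.)
See also Baeza-Yates & Ribeiro-Neto (1999), “Modern Information Retrieval”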
42. Semantic vs. Information Retrieval: The Vector Space Model
Document Representation Dr: Documents are represented as
bag-of-words (i.e. a set of words)
Query Q: Query is a set of keywords
Retrieval Model M:
Sets of words are converted to vectors d and q
Use different heuristics to calculate the importance of a word
Ranking Function R:
Cosine Similarity: Calculate the angle between d and q
d1:= “Boy plays chess”
d2:= “Boy plays bridge”
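The ranking step can be worked through for the two example documents. The sketch below assumes the simplest weighting heuristic, raw term counts; real systems typically use tf-idf or similar. The query "chess" and the helper names to_vector and cosine are made up for illustration.

```python
import math
from collections import Counter


def to_vector(text):
    """Bag-of-words vector with raw term counts as weights (an assumption)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


d1 = to_vector("Boy plays chess")
d2 = to_vector("Boy plays bridge")
q = to_vector("chess")

print(round(cosine(d1, d2), 3))  # two of three terms shared
print(round(cosine(q, d1), 3))
print(round(cosine(q, d2), 3))   # no shared term
```

With these weights, d1 and d2 share two of three terms, so their cosine is 2/3; the query matches d1 with 1/sqrt(3) and d2 not at all, because matching is purely on identical words.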
43. Semantic vs. Information Retrieval: An analysis of the vector space model
Query and documents are represented in terms of their words
Importance of words depends on their occurrence
Syntactic matching between documents and queries
No synonyms are considered (e.g. Money == Cash)
No homonyms are considered (e.g. Apache Web Server)
No meronyms are considered (e.g. tire is part of a car)
No relationships between terms are considered
44. Semantic vs. Information Retrieval: So where can we include semantics?
Increase the semantics of the document representation Dr and the query Q
Add metadata (e.g. tags, Dublin Core etc.)
Use more sophisticated preprocessing (e.g. language models, word sense disambiguation)
Allow users to express information needs in more detail or estimate the context of a user (e.g. specify metadata, profiling)
Formal representation of Dr and Q using semantic web languages like OWL
see Tran, Bloehdorn, Cimiano, Haase (2007), „Expressive Resource Description for Ontology-Based Information Retrieval“
However, if we have a perfect formal representation we still need to
transform natural language queries to this model for the average user
Requires a special user interface – not possible for the generic case
Natural Language Understanding – currently unsolved
45. Semantic vs. Information Retrieval: So where can we include semantics?
Document representation
C:myDocument.doc ex:containsConcept http://en.wikipedia.org/wiki/ApacheWebServer
http://en.wikipedia.org/wiki/ApacheWebServer rdf:type ex:Concept
http://en.wikipedia.org/wiki/ApacheWebServer ex:term „Apache“, „Apache Server“, „Apache Web Server“
Query
SELECT * WHERE { ?x ex:containsConcept <http://en.wikipedia.org/wiki/ApacheWebServer> }
46. Semantic vs. Information Retrieval: So where can we include semantics?
Iterative refinement of the information need:
Keyword query: „Apache“
Candidate concepts:
http://en.wikipedia.org/wiki/ApacheWebServer
http://en.wikipedia.org/wiki/ApacheHelicopter
http://en.wikipedia.org/wiki/AmericanNatives
Refined query:
SELECT * WHERE { ?x ex:containsConcept <http://en.wikipedia.org/wiki/ApacheWebServer> }
Document representation:
C:myDocument.doc ex:containsConcept http://en.wikipedia.org/wiki/ApacheWebServer
C:myDocument2.doc ex:containsConcept http://en.wikipedia.org/wiki/ApacheTribes
http://en.wikipedia.org/wiki/ApacheWebServer ex:term „Apache“, „Apache Server“, „Apache Web Server“
47. Semantic vs. Information Retrieval: So where can we include semantics?
Increase the capability of the retrieval model M and ranking function R
Latent Semantic Indexing/Concept Indexing
Automatically determine the concepts contained in a document set
Include a-priori knowledge (e.g. Thesaurus, WordNet)
Learn ranking functions based on a user's feedback (e.g. via machine learning)
Use formal knowledge in form of ontologies and reasoning
capabilities
48. Semantic vs. Information Retrieval: So where can we include semantics?
Example
Vector Space Model
– D={Apache=0.8, http=0.5, server=0.3}
– Q={Jetty=0.8, java=0.7, web=0.4}
– Ranking Value=0.0
Introduce a „better“ retrieval model by using a domain ontology:
ex:ApacheServer ex:isA ex:WebServer
ex:JettyServer ex:isA ex:WebServer
ex:ApacheServer ex:term „Apache Web Server“, „Apache“
ex:JettyServer ex:term „Jetty Server“, „Jetty“
„Apache“ and „Jetty“ can be related to each other using the domain ontology: Ranking Value > 0
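One simple way to let the ontology influence the ranking is to add the parent concept of every recognised term to both vectors before computing the cosine. This is a sketch of one possible scheme, not the method of any particular system; the damping factor 0.5 and the helper names are assumptions, while the weights and the ex: concepts come from the slide.

```python
import math

# Term-to-concept and isA links as drawn in the slide's ontology fragment.
TERM_TO_CONCEPT = {"apache": "ex:ApacheServer", "jetty": "ex:JettyServer"}
IS_A = {"ex:ApacheServer": "ex:WebServer", "ex:JettyServer": "ex:WebServer"}


def cosine(a, b):
    """Cosine similarity of two sparse weight vectors given as dicts."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def expand(vector, inherit=0.5):
    """Add the parent concept of each known term with a damped weight."""
    expanded = dict(vector)
    for term, weight in vector.items():
        concept = TERM_TO_CONCEPT.get(term.lower())
        parent = IS_A.get(concept)
        if parent:
            expanded[parent] = max(expanded.get(parent, 0.0), inherit * weight)
    return expanded


D = {"Apache": 0.8, "http": 0.5, "server": 0.3}
Q = {"Jetty": 0.8, "java": 0.7, "web": 0.4}

plain = cosine(D, Q)
semantic = cosine(expand(D), expand(Q))
print(plain, semantic > 0)
```

Both expanded vectors now contain ex:WebServer, so the document about Apache gets a non-zero score for the Jetty query even though they share no word.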
49. Semantic vs. Information Retrieval: So where can we include semantics?
Improve the presentation of results
Clustering of search results
Display different facets of the result set
Different representation of results
Display facts instead of documents
Supports users in refining their information need
Search as an iterative approach
50. Semantic vs. Information Retrieval: So where can we include semantics?
51. Semantic vs. Information Retrieval: Dimensions of Semantics
Semantically structured vs. unstructured document and query
representation
Semantic expressiveness increases with increased structure
Queries are hard to formalize. Support for the average user
is required
Labour intensive creation of the document representation
Extension of the retrieval model
Runtime complexity of reasoning in case of semantically structured Dr and Q
Scalability (also an issue for more complex statistical
methods)
52. Semantic vs. Information Retrieval: Dimensions of Semantics
Semantic enhancement of result presentation
„Low hanging fruit“
Does not require a formalized knowledge base
Formalized knowledge vs. statistical approaches
Bottom Up vs. Top Down
Statistical approaches can provide sophisticated retrieval models, which do not require formalized, modelled knowledge
Statistical approaches depend on the fact that all required information is within the data set and can be extracted
Statistical models are usually not shareable between systems
Knowledge base may not model all facts contained in the data set (e.g. WordNet knows nothing about the Apache Web Server)
53. Semantic vs. Information Retrieval: Example Freebase, www.freebase.com
Open database of the world's information
Contribution by the community
Linked with other free resources like Wikipedia
Web API
Own Query Language: Metaweb Query Language
Regarding semantic search
Structural document representation
Keyword queries in combination with intelligent interfaces for information need refinement
Fact based representation
54. Semantic vs. Information Retrieval: Example Freebase
55. Semantic vs. Information Retrieval: Example Cuil, www.cuil.com
Internet search engine with 120 billion pages
Not based on popularity of sites, just on content and topics of content
Document representation and query is unstructured
Extended retrieval model in the background based on categorical knowledge and statistical methods (Clustering)
http://www.news.com.au/technology/story/0,25642,24089734-5014239,00.html
Enhanced interface
56. Semantic vs. Information Retrieval: Example Yahoo! Search Monkey
Use structured data to improve presentation of search results
http://developer.yahoo.com/searchmonkey/#
57. Summary
Triple Store
Generic /Flexibility vs. Performance/Space
Usually RDBMS with a large triple table
Alternative: special graph indexing structures
SPARQL
Simple query for RDF by providing a graph pattern
Recommended by W3C
Semantic Search
Document & query representation
Retrieval model & Interfaces
Semantic vs. Statistic
It is the next step, but not such a big one
58. Next Week
Guest Lectures (attendance mandatory) from experts
Werner Klieber: Semantic Web Services (30‘)
Aatif Latif: Open Linked Data (30‘)
Fleur Jeanquartier: Bringing the Semantic Web closer to the User (30‘)
59. That‘s it for today…
Thanks for your attention
Questions/comments?
mgranitzer@tugraz.at
60. License
This work is licensed under the Creative Commons Attribution 2.0 Austria License.
To view a copy of this license, visit http://creativecommons.org/licenses/by/2.0/at/.
Contributors:
Mathias Lux
Peter Scheir
Klaus Tochtermann
Michael Granitzer