Wissenstechnologie Vi 08 09
    Wissenstechnologie Vi 08 09 Presentation Transcript

    • Wissenstechnologie WS 08/09. Michael Granitzer, IWM TU Graz & Know-Center. Lecture 6: Triple Stores, SPARQL, Semantic Retrieval. http://kmi.tugraz.at http://www.know-center.at This work is licensed under the Creative Commons Attribution 2.0 Austria License. To view a copy of this license, visit http://creativecommons.org/licenses/by/2.0/at/.
    • Today
      • Ontology Modelling & SW Frameworks
      • Triple Stores: basic RDBMS scheme; property tables & vertical partitioning; performance comparisons
      • SPARQL: definition; simple & complex queries; some examples on endpoints
      • Information Retrieval vs. Semantic Retrieval: basics of IR; "semantic" retrieval; practical examples (Freebase, Cuil etc.)
      http://kmi.tugraz.at WS 08/09 Wissenstechnologie @ kmi.tugraz.at
    • Ontology Modelling vs. OOP
      • Similar to design in object-oriented programming: classes, objects and members
      • OOP captures the operational properties: public interface Course { public void enroll(); }
      • Ontology modelling captures the structural properties (diagram: owl:Student, owl:participates, owl:Course)
    • Ontology Modelling vs. RDBMS
      • Similar to designing a database system
      • Higher expressiveness in OWL
      • Agreement on the domain, not only referential integrity
      • Not focused on special indexing structures or on querying only
      • Ontologies should be application independent
      • Consistency checks
      • Semantic integration via ontologies (diagram: Product Database, Employee Database, ... Database, File System, Text)
    • Ontology Modelling: Goals
      • Goals
        • Share common understanding among people or software
        • Enable reuse of knowledge
        • Make domain assumptions explicit
        • Separate domain knowledge from operational knowledge
        • Analyze domain knowledge
      • Main application areas
        • Semantic harmonization of heterogeneous data sources
        • Structuring the content of a portal
        • Enhance search and retrieval
        • …
    • Ontology Modelling: Aspects to model
      • Defining classes in the ontology
      • Arranging classes in a taxonomy
      • Defining slots/properties for classes and their values
      • Defining logical constraints on classes/properties
      • Assigning instances
    • Ontology Modelling: Three simple rules
      1. "There is no one correct way to model a domain; there are always viable alternatives. The best solution almost always depends on the application that you have in mind and the extensions that you anticipate."
      2. "Concepts in the ontology should be close to objects (physical or logical) and relationships in your domain of interest. These are most likely to be nouns (objects) or verbs (relationships) in sentences that describe your domain."
      3. "Ontology development is necessarily an iterative process."
    • Ontology Modelling: Noy and McGuinness' 7 steps
      1. Determine the domain and scope of the ontology
      2. Consider reusing existing ontologies
      3. Enumerate important terms in the ontology
      4. Define the classes and the class hierarchy
      5. Define the properties (slots) of classes
      6. Define the facets of the slots
      7. Create/import instances
    • Semantic Web Frameworks: Motivation
      • Protege as modelling GUI
      • For "Semantic Web applications" we also want to
        • Automatically import/map instances
        • Manage a large number of triples
        • Combine different schemas
        • Query for specific triples
        • Harmonize different metadata schemas
        • Meet database requirements for graphs
        • Perform reasoning
    • Semantic Web Frameworks: Overview
      • Three "major" Java-based open source frameworks: Jena, Sesame, Protege Java API
      • Functionality
        • Java API for managing OWL, RDF and RDFS (optional DAML+OIL)
        • Import/export of different formats
        • Persistence via own data store, different database and file system backends
        • Querying, graph manipulation and restricted reasoning capabilities
        • Web API
    • Semantic Web Frameworks: Jena Architecture (figure; surviving labels: SPARQL, RDF/XML). Source: Jena: Implementing the Semantic Web Recommendations, 2003, http://www.hpl.hp.com/techreports/2003/HPL-2003-146.html
    • Semantic Web Frameworks: Main differences
      • Jena: reference implementation; not directly focused on web access and scalability
      • Protege: modelling GUI
      • Sesame: focused on remote access and scalability; flexible layer architecture for different storage backends
      • Others: Virtuoso, 3Store, Kowari, OpenAnzo
    • Triple Stores: Overview
      • Basic data model is RDF (i.e. OWL, RDFS)
      • RDF forms a directed graph
      • How do we manage large graphs?
        • In memory: adjacency matrix
        • On secondary storage: special indices, or relational database management systems
    • Triple Stores: "Normalized" table model of RDF
      Subject | Predicate | Object
      http://book.at/isbn123 | author | http://fussball.de/G. Müller
      http://book.at/isbn123 | price | €15
      http://book.at/isbn123 | Title | Ein Leben für die Tore
      http://fussball.de/G. Müller | Name | Gerd Müller
      (diagram: the same four triples drawn as a graph)
    • Triple Stores: Query in an unoptimized RDBMS
      • Query: titles of books by the person with the name Gerd Müller?
      • Triple table rdf with columns Subject (s), Predicate (p), Object (o), holding the four example triples about http://book.at/isbn123 and http://fussball.de/G. Müller
      • SELECT r3.o AS Title FROM rdf r1, rdf r2, rdf r3 WHERE r1.s = r2.o AND r2.s = r3.s AND r1.o = 'Gerd Müller' AND r1.p = 'Name' AND r2.p = 'author' AND r3.p = 'Title'
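The three-way self join from this slide can be tried out with a small sketch. It assumes Python's built-in sqlite3 module as a stand-in for "an unoptimized RDBMS"; the table name rdf and the columns s, p, o follow the slide, while the function name titles_by_author_name is made up for illustration.

```python
import sqlite3

# The four example triples from the slide.
TRIPLES = [
    ("http://book.at/isbn123", "author", "http://fussball.de/G. Müller"),
    ("http://book.at/isbn123", "price", "€15"),
    ("http://book.at/isbn123", "Title", "Ein Leben für die Tore"),
    ("http://fussball.de/G. Müller", "Name", "Gerd Müller"),
]


def titles_by_author_name(name):
    """Return book titles whose author has the given name (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE rdf (s TEXT, p TEXT, o TEXT)")
    conn.executemany("INSERT INTO rdf VALUES (?, ?, ?)", TRIPLES)
    # Each triple pattern needs its own alias of the same table,
    # so three patterns turn into a three-way self join.
    rows = conn.execute(
        """
        SELECT r3.o AS Title
        FROM rdf r1, rdf r2, rdf r3
        WHERE r1.s = r2.o
          AND r2.s = r3.s
          AND r1.o = ?
          AND r1.p = 'Name'
          AND r2.p = 'author'
          AND r3.p = 'Title'
        """,
        (name,),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


print(titles_by_author_name("Gerd Müller"))
```

Every additional triple pattern in a query adds one more alias of the same table, which is why even simple questions become join-heavy in this layout.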
    • Triple Stores: The Sesame mapping as example (figure). See Hak Soo Kim, Hyun Seok Cha, Jungsun Kim, Jin Hyun Son, Development of the Efficient OWL Document Management System for the Embedded Applications, Springer 2005, http://www.springerlink.com/content/8mfxeh0glq5xj00m/
    • Triple Stores: Indexing techniques
      • Use specialised indices for graphs
      • Bitmap indices in Virtuoso: http://virtuoso.openlinksw.com/wiki/main/Main/VOSBitmapIndexing
      • Index different combinations of the S,P,O table: P,S,O; O,P,S; O,S,P; S,O,P
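A minimal in-memory sketch of the idea of indexing several orderings of S, P and O. This is not how Virtuoso or any particular store implements it; the class name PermutationIndex and its lookup method are hypothetical, and the sketch simply builds all six orderings as nested dictionaries.

```python
from collections import defaultdict
from itertools import permutations


class PermutationIndex:
    """Hypothetical stand-in for a store that indexes every S,P,O ordering."""

    def __init__(self, triples):
        # One nested index per ordering, e.g. "pos" maps p -> o -> {s}.
        self.indexes = {}
        for order in permutations("spo"):
            index = defaultdict(lambda: defaultdict(set))
            for triple in triples:
                t = dict(zip("spo", triple))
                index[t[order[0]]][t[order[1]]].add(t[order[2]])
            self.indexes["".join(order)] = index

    def lookup(self, order, first, second):
        """Return the third component for two bound components."""
        index = self.indexes[order]
        if first not in index or second not in index[first]:
            return set()
        return set(index[first][second])


triples = [
    ("http://book.at/isbn123", "author", "http://fussball.de/G. Müller"),
    ("http://book.at/isbn123", "price", "€15"),
]
idx = PermutationIndex(triples)
# Which subjects have this author? Answered from the P,O,S ordering.
books = idx.lookup("pos", "author", "http://fussball.de/G. Müller")
```

The trade-off is visible even here: each triple is stored once per ordering, so lookups with any two bound components are cheap at the cost of several times the storage.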
    • Triple Stores: A first analysis
      • Normalised view on a graph: one large table
      • Generic and flexible, but
        • Large self joins for rather simple queries; RDBMS are usually not optimized for this
        • Large memory overhead in query processing due to self joins
        • Requires a lot of index lookups and/or full table scans
        • Large storage overhead
      • In general: flexibility vs. performance
      • How to improve?
    • Triple Stores: Further improvements
      • Property tables: flattened representation obtained by finding sets of properties which are used together
      • Subject-Property Matrix Materialized Join Views (SPMJVs) from Oracle
      • Chong, E. I., Das, S., Eadon, G., and Srinivasan, J. 2005. An efficient SQL-based RDF querying scheme. In Proceedings of the 31st International Conference on Very Large Data Bases, ACM.
      • Abadi, D. J., Marcus, A., Madden, S. R., and Hollenbach, K. 2007. Scalable semantic web data management using vertical partitioning. In Proceedings of the 33rd International Conference on Very Large Data Bases (Vienna, Austria, September 23 - 27, 2007). VLDB Endowment, 411-422.
    • Triple Stores: Property tables
      • ++: Faster querying within a property table because subject-subject self joins are reduced
      • --: Requires intelligent selection of the properties in the table
        • More property columns lead to more null values in the table and therefore to larger space overhead
        • Fewer property columns lead to more property tables and therefore to more joins across them
      • --: Multi-valued properties are hard to manage (e.g. a book has several authors)
      Subject | Title | Author | Year
      ID1 | "Intro to RDF" | Granitzer | 2006
      ID1 | "Intro to RDF" | Tochtermann | 2006
    • Triple Stores: Vertical partitioning
      • Partition the database according to properties: one table per property
      • Tables are sorted by subject, which allows fast merge joins
      • Abadi, D. J., Marcus, A., Madden, S. R., and Hollenbach, K. 2007. Scalable semantic web data management using vertical partitioning. In Proceedings of the 33rd International Conference on Very Large Data Bases (Vienna, Austria, September 23 - 27, 2007). VLDB Endowment, 411-422.
    • Triple Stores: Vertical partitioning
      • ++: Use of simple, fast merge joins
      • ++: Multi-valued attributes are supported
      • ++: No a-priori clustering decision is necessary
      • ++: Smaller tables; only those properties accessed have to be read from disk
      • --: Inserts may be slower due to access to multiple tables
      • --: Queries over multiple properties span multiple tables
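The following sketch illustrates the two ingredients named on the vertical partitioning slides: one two-column table per property, sorted by subject, and a merge join between two such tables. It is an in-memory toy, not the column-store implementation from Abadi et al.; the function names and the second book (ID2) are made up for illustration.

```python
from collections import defaultdict


def partition_by_property(triples):
    """Build one (subject, object) table per property, sorted by subject."""
    tables = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))
    return {p: sorted(rows) for p, rows in tables.items()}


def merge_join(left, right):
    """Join two subject-sorted tables on subject in a single forward pass."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        ls, rs = left[i][0], right[j][0]
        if ls < rs:
            i += 1
        elif ls > rs:
            j += 1
        else:
            # Find the run of rows sharing this subject on both sides.
            i_end = i
            while i_end < len(left) and left[i_end][0] == ls:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == ls:
                j_end += 1
            for _, left_obj in left[i:i_end]:
                for _, right_obj in right[j:j_end]:
                    result.append((ls, left_obj, right_obj))
            i, j = i_end, j_end
    return result


triples = [
    ("ID1", "title", "Intro to RDF"),
    ("ID1", "author", "Granitzer"),
    ("ID1", "author", "Tochtermann"),
    ("ID1", "year", "2006"),
    ("ID2", "title", "Another Book"),  # hypothetical book without an author
]
tables = partition_by_property(triples)
title_author = merge_join(tables["title"], tables["author"])
```

Note how the two authors of ID1 are simply two rows in the author table, which is the multi-valued case that property tables handle poorly.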
    • Triple Stores: Performance of open source solutions
      • Portwin & Parvatikar (2006), Scaling Jena in a Commercial Environment: The Ingenta MetaStore Project
      • Lehigh dataset with the domain universities
      • ~200 million triples, 11 million OWL statements, 4.3 million documents
      • Kowari: 1 billion triples, load rate of 20k triples/s for the Wikipedia data set
      • Unoptimized
        • Simple queries take milliseconds
        • With inference, queries take several seconds to minutes depending on the complexity
      • Optimization for inference: for RDFS entailment, expand the graph by making implicit edges explicit; this costs more storage but gives faster access
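Making implicit edges explicit can be sketched as a fixed-point computation over the triple set. The example below covers only two RDFS rules (transitivity of rdfs:subClassOf and type propagation along it), so it is a small subset of full RDFS entailment; the function name and the ex: resources are hypothetical.

```python
def materialize_rdfs(triples):
    """Add triples implied by two RDFS rules until nothing new appears.

    Rules covered (a subset of RDFS entailment):
      (A subClassOf B), (B subClassOf C)  =>  (A subClassOf C)
      (x type A), (A subClassOf B)        =>  (x type B)
    """
    graph = set(triples)
    changed = True
    while changed:
        changed = False
        sub = [(s, o) for s, p, o in graph if p == "rdfs:subClassOf"]
        new = set()
        for a, b in sub:
            for b2, c in sub:
                if b == b2:
                    new.add((a, "rdfs:subClassOf", c))
            for x, p, cls in graph:
                if p == "rdf:type" and cls == a:
                    new.add((x, "rdf:type", b))
        if not new <= graph:
            graph |= new
            changed = True
    return graph


explicit = [
    ("ex:ApacheServer", "rdfs:subClassOf", "ex:WebServer"),
    ("ex:WebServer", "rdfs:subClassOf", "ex:Software"),
    ("ex:myServer", "rdf:type", "ex:ApacheServer"),
]
expanded = materialize_rdfs(explicit)
```

After materialization, a query for all instances of ex:Software is a plain lookup with no reasoning at query time, at the price of storing three extra triples in this tiny example.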
    • Triple Stores: Performance of Oracle
      • BioMed literature database (UniProt data set)
      • 80 million triples
      • ~5 GB RDF/XML data (~2.5 GB triples; 1.7 GB mapping; 4.8 GB indices)
      • Queries take milliseconds to seconds
      • Subject-property matrix materialized views provide an optimization potential of roughly 30%
      • Chong, E. I., Das, S., Eadon, G., and Srinivasan, J. 2005. An efficient SQL-based RDF querying scheme. In Proceedings of the 31st International Conference on Very Large Data Bases, ACM.
    • Triple Stores: Performance summary
      • http://esw.w3.org/topic/LargeTripleStores
      • Problem: comparing the available performance numbers
      • Trade-off: generic vs. performance
      • Optimization potential is available
      • Currently not as fast as specialised RDBMS, but more flexible
    • SPARQL: SPARQL Protocol and RDF Query Language
      • Different languages play a role similar to SQL in RDBMS: SerQL, RDQL, SPARQL
      • SPARQL is currently a proposed recommendation of the W3C
      • But what does querying a graph mean? Basically:
        • Specify a sub-graph with variable nodes
        • Find all patterns in the graph matching the sub-graph
      • SELECT ?x ?y WHERE { ?x <author> "Gerd Müller" . ?x <title> ?y . }
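"Specify a sub-graph with variable nodes and find all matches" can be sketched as a naive backtracking matcher over a list of triples. This is only an illustration of basic graph pattern matching, not how a SPARQL engine is implemented; match_pattern is a hypothetical name, and the second book in the data is made up so that the pattern has something to filter out.

```python
def match_pattern(graph, patterns, binding=None):
    """Yield variable bindings under which all triple patterns occur in graph.

    Terms starting with "?" are variables; everything else is a constant.
    """
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match_pattern(graph, rest, candidate)


graph = [
    ("http://book.at/isbn123", "author", "Gerd Müller"),
    ("http://book.at/isbn123", "title", "Ein Leben für die Tore"),
    ("http://book.at/isbn456", "author", "Someone Else"),  # hypothetical
    ("http://book.at/isbn456", "title", "Another Title"),  # hypothetical
]
query = [("?x", "author", "Gerd Müller"), ("?x", "title", "?y")]
results = list(match_pattern(graph, query))
```

The shared variable ?x is what expresses the join: the second pattern only matches triples whose subject equals the value already bound by the first.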
    • SPARQL: Example
      • Data: <http://example.org/book/book1> <http://purl.org/dc/elements/1.1/title> "SPARQL Tutorial" .
      • Query: SELECT ?title WHERE { <http://example.org/book/book1> <http://purl.org/dc/elements/1.1/title> ?title . }
      • Result: title = "SPARQL Tutorial"
    • SPARQL: Example
      • Data:
        @prefix foaf: <http://xmlns.com/foaf/0.1/> .
        _:a foaf:name "Johnny Lee Outlaw" .
        _:a foaf:mbox <mailto:jlow@example.com> .
        _:b foaf:name "Peter Goodguy" .
        _:b foaf:mbox <mailto:peter@example.org> .
      • Query:
        PREFIX foaf: <http://xmlns.com/foaf/0.1/>
        SELECT ?name ?mbox WHERE { ?x foaf:name ?name . ?x foaf:mbox ?mbox }
      • Result:
        name | mbox
        "Johnny Lee Outlaw" | <mailto:jlow@example.com>
        "Peter Goodguy" | <mailto:peter@example.org>
    • SPARQL: Simple query elements
      • Determine the namespace: PREFIX
      • Determine the return format
        • SELECT: table output format similar to SQL results
        • CONSTRUCT: allows constructing a graph as return value
        • ASK: returns only true/false depending on whether a result exists
        • DESCRIBE: returns possible properties/resources for a particular query; used for browsing
      • Specify the selection criteria with the WHERE clause
        • Specify a non-recursive sub-pattern with triples and placeholders (? or $)
        • Perform grouping and filter operations
      • Modifiers: ORDER BY, LIMIT, OFFSET, DISTINCT
    • SPARQL: Blank nodes
      • The ID of a blank node is unique within one query and indicates only the existence of a blank node, not its absolute value
      • Blank nodes are identified by an automatically generated local identifier
      • Consider the results of a query: result 1 ≡ result 2 ≠ result 3
        Result 1: Subject | Value: _:a | "zum"; _:b | "Beispiel"
        Result 2: Subject | Value: _:x | "zum"; _:y | "Beispiel"
        Result 3: Subject | Value: _:z | "zum"; _:z | "Beispiel"
      • Blank nodes may be renamed and are structural elements only
    • SPARQL: Complex queries
      • Combination of groups of simple graph expressions in the WHERE clause
      • OPTIONAL clause: the sub-graph pattern may not exist
      • Example: querying book titles from Springer. If an author exists, it is listed; if not, the title is returned without an author
        SELECT ?title ?author WHERE { ?buch ex:pulishedFrom <http://springer.com/Verlag> . ?buch ex:Title ?title . OPTIONAL { ?buch ex:Autor ?author } }
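The effect of OPTIONAL resembles a left outer join: rows that match the required patterns are kept even if the optional pattern finds nothing. The sketch below hard-codes the query from this slide over a list of triples; the function name and the two example books are hypothetical, and the predicate spellings follow the slide.

```python
def books_with_optional_author(triples, publisher):
    """Mimic the slide's query: required publisher and title, OPTIONAL author."""
    results = []
    for book, p, o in triples:
        if p != "ex:pulishedFrom" or o != publisher:
            continue
        titles = [o2 for s2, p2, o2 in triples
                  if s2 == book and p2 == "ex:Title"]
        authors = [o2 for s2, p2, o2 in triples
                   if s2 == book and p2 == "ex:Autor"]
        for title in titles:
            # OPTIONAL behaves like a left join: keep the row even when
            # no author matches, leaving ?author unbound (None here).
            for author in authors or [None]:
                results.append({"title": title, "author": author})
    return results


# Hypothetical data: one book with an author, one without.
data = [
    ("ex:book1", "ex:pulishedFrom", "http://springer.com/Verlag"),
    ("ex:book1", "ex:Title", "Semantic Web"),
    ("ex:book1", "ex:Autor", "ex:Hitzler"),
    ("ex:book2", "ex:pulishedFrom", "http://springer.com/Verlag"),
    ("ex:book2", "ex:Title", "Untitled Proceedings"),
]
rows = books_with_optional_author(data, "http://springer.com/Verlag")
```

Without OPTIONAL, the second book would drop out of the result because the author pattern would be required to match.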
    • SPARQL: Complex queries
      • Specifying alternative sub-graph patterns: UNION
      • Logical OR, i.e. the union of two separate queries
        SELECT ?title ?author WHERE { ?buch ex:pulishedFrom <http://springer.com/Verlag> . ?buch ex:Title ?title . { ?buch ex:Autor ?author . } UNION { ?buch ex:Creator ?author . } }
      • "Select all books with a title published by Springer which have an author or a creator assigned"
      • Note: ?author in the different groups are independent of each other
    • SPARQL: Complex queries
      • Considering special datatypes: FILTER and XML datatypes
      • Specify the data type of a literal
        SELECT ?title ?author WHERE { ?buch ex:pulishedFrom <http://springer.com/Verlag> . ?buch ex:Title ?title . ?buch ex:publishedIn "1998"^^xsd:integer }
      • FILTER specifies boolean expressions for filtering results, e.g. a value range (see Chapter 7 in Semantic Web Grundlagen)
        SELECT ?title ?author WHERE { ?buch ex:pulishedFrom <http://springer.com/Verlag> . ?buch ex:publishedIn ?year . FILTER(?year > 2000) }
    • SPARQL: Real world examples in DBPedia
      • SELECT query:
        PREFIX p: <http://dbpedia.org/property/>
        PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
        SELECT * WHERE {
          ?album p:artist ?band.
          ?album rdf:type <http://dbpedia.org/class/yago/Album106591815>.
          OPTIONAL {?album p:cover ?cover}.
          OPTIONAL {?album p:name ?name}.
          OPTIONAL {?album p:released ?dateofrelease}.
        } ORDER BY DESC(?name) LIMIT 20 OFFSET 19
      • CONSTRUCT query:
        PREFIX p: <http://dbpedia.org/property/>
        PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
        CONSTRUCT {
          ?album p:itIsDone ?dateofrelease .
          ?band p:isBand "true" .
        } WHERE {
          ?album p:artist ?band.
          ?album rdf:type <http://dbpedia.org/class/yago/Album106591815>.
          OPTIONAL {?album p:cover ?cover}.
          OPTIONAL {?album p:name ?name}.
          OPTIONAL {?album p:released ?dateofrelease}.
        } ORDER BY ?name LIMIT 20 OFFSET 19
      • DESCRIBE query:
        PREFIX p: <http://dbpedia.org/property/>
        PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
        DESCRIBE ?album WHERE {
          ?album p:artist <http://dbpedia.org/resource/The_Allman_Brothers_Band>.
          ?album rdf:type <http://dbpedia.org/class/yago/Album106591815>.
        }
    • SPARQL: Summary
      • Similar to SQL
      • Allows easier expression of joins without knowing the underlying database schema
      • Can return not only tables, but also more complex output formats like graphs
      • Datatype of a variable is not always clear
      • http://www.w3.org/TR/rdf-sparql-query/
      • http://thefigtrees.net/lee/sw/sparql-faq
      • Hitzler, Krötsch, Rudolph, Sure, Semantic Web: Grundlagen, Chapter 7
    • Semantic vs. Information Retrieval: Overview
      • Central question: what is semantic retrieval?
      • Define information retrieval
      • Where is semantics missing?
      • How can we use Semantic Web technologies to increase semantics?
    • Semantic vs. Information Retrieval: Definition of IR
      • Salton (1968): "Information retrieval is a field concerned with the structure, analysis, organization, storage, searching, and retrieval of information."
      • "Information retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers)." Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.
      • Main focus of IR is how to deal with uncertainty and incomplete information
        • Representation of documents is ambiguous
        • Query formulation is ambiguous and usually incomplete
        • "Unstructured" information
        • Usually the perfect answer, insofar as a perfect answer exists, cannot be delivered
    • Semantic vs. Information Retrieval: IR vs. data retrieval (from van Rijsbergen 1979)
      Aspect | Data retrieval | Information retrieval
      Matching | Exact match | Partial (best) match
      Inference | Deduction | Induction
      Model | Deterministic | Probabilistic
      Classification | Monothetic | Polythetic
      Query language | Artificial | Natural
      Query specification | Complete | Incomplete
      Items wanted | Matching | Relevant
      Error response | Sensitive | Insensitive
    • Semantic vs. Information Retrieval: IR vs. data retrieval (from van Rijsbergen 1979). "What is the gross domestic product of Austria?"
    • Semantic vs. Information Retrieval: IR vs. data retrieval (from van Rijsbergen 1979)
      • "What is the gross domestic product of Austria?"
      • SELECT GDP FROM GDP_table WHERE country_name = 'Austria' returns € 270.8 bn
      • However, not all information is available in databases
      • Queries are hard to formulate for average users as well as for non-domain experts
      • The more complex the domain and the information need, the harder it is to formulate a correct query
    • Semantic vs. Information Retrieval: Basic retrieval workflow (diagram). Components: documents D, document representation Dr, information need IN, query Q, retrieval model M, ranking function R. See also Baeza-Yates & Ribeiro-Neto (1999), "Modern Information Retrieval".
    • Semantic vs. Information Retrieval: The vector space model
      • Document representation Dr: documents are represented as bag-of-words (i.e. a set of words)
      • Query Q: the query is a set of keywords
      • Retrieval model M: sets of words are converted to vectors d and q; different heuristics are used to calculate the importance of a word
      • Ranking function R: cosine similarity, i.e. calculate the angle between d and q
      • d1 := "Boy plays chess"; d2 := "Boy plays bridge"
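A small sketch of the vector space model for the two example documents. It assumes raw term counts as word importance (the slide leaves the weighting heuristic open; TF-IDF would be a common alternative), and the query "chess" is a made-up example.

```python
import math
from collections import Counter


def to_vector(text):
    """Bag-of-words vector with raw term counts as weights (an assumption)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


d1 = to_vector("Boy plays chess")
d2 = to_vector("Boy plays bridge")
q = to_vector("chess")  # hypothetical keyword query

ranking = sorted(
    [("d1", cosine(q, d1)), ("d2", cosine(q, d2))],
    key=lambda pair: pair[1],
    reverse=True,
)
```

Here d1 and d2 share two of their three words, so their mutual similarity is 2/3, while the query matches only d1 and scores 0 against d2.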
    • Semantic vs. Information Retrieval: An analysis of the vector space model
      • Query and documents are represented in terms of their words
      • Importance of words depends on their occurrence
      • Syntactic matching between documents and queries
        • No synonyms are considered (e.g. Money == Cash)
        • No homonyms are considered (e.g. Apache Web Server)
        • No meronyms are considered (e.g. a tire is part of a car)
        • No relationships between terms are considered
    • Semantic vs. Information Retrieval: So where can we include semantics?
      • Increase the semantics of the document representation Dr and the query Q
        • Add metadata (e.g. tags, Dublin Core etc.)
        • Use more sophisticated preprocessing (e.g. language models, word sense disambiguation)
        • Allow users to express information needs in more detail or estimate the context of a user (e.g. specify metadata, profiling)
        • Formal representation of Dr and Q using Semantic Web languages like OWL; see Tran, Bloehdorn, Cimiano, Haase (2007), "Expressive Resource Description for Ontology-Based Information Retrieval"
      • However, even with a perfect formal representation we still need to transform natural language queries to this model for the average user
        • Requires a special user interface, which is not possible for the generic case
        • Natural language understanding is currently unsolved
    • Semantic vs. Information Retrieval: So where can we include semantics?
      • Document representation (diagram): C:myDocument.doc ex:containsConcept http://en.wikipedia.org/wiki/ApacheWebServer, which has rdf:type ex:Concept and ex:term "Apache", "Apache Server", "Apache Web Server"
      • Query: SELECT * WHERE { ?x ex:containsConcept <http://en.wikipedia.org/wiki/ApacheWebServer> }
    • Semantic vs. Information Retrieval: So where can we include semantics?
      • Iterative refinement of the information need
      • Keyword query "Apache" leads to candidate concepts:
        • http://en.wikipedia.org/wiki/ApacheWebServer
        • http://en.wikipedia.org/wiki/ApacheHelicopter
        • http://en.wikipedia.org/wiki/AmericanNatives
      • Refined query: SELECT * WHERE { ?x ex:containsConcept <http://en.wikipedia.org/wiki/ApacheWebServer> }
      • Diagram: C:myDocument.doc ex:containsConcept http://en.wikipedia.org/wiki/ApacheWebServer (ex:term "Apache", "Apache Server", "Apache Web Server"); C:myDocument2.doc ex:containsConcept http://en.wikipedia.org/wiki/ApacheTribes
    • Semantic vs. Information Retrieval: So where can we include semantics?
      • Increase the capability of the retrieval model M and ranking function R
        • Latent Semantic Indexing/Concept Indexing: automatically determine the concepts contained in a document set
        • Include a-priori knowledge (e.g. thesaurus, WordNet)
        • Learn ranking functions based on a user's feedback (e.g. via machine learning)
        • Use formal knowledge in the form of ontologies and reasoning capabilities
    • Semantic vs. Information Retrieval: So where can we include semantics?
      • Example: vector space model
        • D = {Apache=0.8, http=0.5, server=0.3}
        • Q = {Jetty=0.8, java=0.7, web=0.4}
        • Ranking value = 0.0
      • Introduce a "better" retrieval model by using a domain ontology: "Apache" and "Jetty" can be related to each other using the domain ontology, so the ranking value is > 0
      • Diagram: ex:ApacheServer ex:isA ex:WebServer; ex:JettyServer ex:isA ex:WebServer; ex:ApacheServer ex:term "Apache Web Server", "Apache"; ex:JettyServer ex:term "Jetty Server", "Jetty"
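One way to obtain a ranking value above zero is to expand both vectors with the ontology concepts their terms point to. The sketch below uses the slide's D and Q and its ex:isA hierarchy; the expansion rule (add the parent class as an extra dimension carrying the term's weight) is an assumption chosen for illustration, not a method stated on the slide.

```python
import math

# Term-to-class and class hierarchy taken from the slide's diagram.
TERM_TO_CLASS = {"apache": "ex:ApacheServer", "jetty": "ex:JettyServer"}
IS_A = {"ex:ApacheServer": "ex:WebServer", "ex:JettyServer": "ex:WebServer"}


def cosine(a, b):
    """Cosine similarity of two sparse weight dictionaries."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def expand(vector):
    """Add each term's parent class as a dimension (assumed weighting)."""
    expanded = dict(vector)
    for term, weight in vector.items():
        parent = IS_A.get(TERM_TO_CLASS.get(term))
        if parent is not None:
            expanded[parent] = max(expanded.get(parent, 0.0), weight)
    return expanded


D = {"apache": 0.8, "http": 0.5, "server": 0.3}
Q = {"jetty": 0.8, "java": 0.7, "web": 0.4}

plain_score = cosine(D, Q)
ontology_score = cosine(expand(D), expand(Q))
```

With plain term matching the two vectors share no dimension, so the score is 0.0; after expansion both contain ex:WebServer and the score becomes positive.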
    • Semantic vs. Information Retrieval: So where can we include semantics?
      • Improve the presentation of results
        • Clustering of search results
        • Display different facets of the result set
        • Different representation of results
        • Display facts instead of documents
      • Supports users in refining and defining their information need
      • Search as an iterative approach
    • Semantic vs. Information Retrieval: Dimensions of semantics
      • Semantically structured vs. unstructured document and query representation
        • Semantic expressiveness increases with increased structure
        • Queries are hard to formalize; support for the average user is required
        • Labour-intensive creation of the document representation
      • Extension of the retrieval model
        • Runtime complexity of reasoning in the case of semantically structured Dr and Q
        • Scalability (also an issue for more complex statistical methods)
    • Semantic vs. Information Retrieval: Dimensions of semantics
      • Semantic enhancement of result presentation
        • "Low hanging fruit"
        • Does not require a formalized knowledge base
      • Formalized knowledge vs. statistical approaches
        • Bottom up vs. top down
        • Statistical approaches can provide sophisticated retrieval models which do not require formalized, modelled knowledge
        • Statistical approaches depend on the fact that all required information is within the data set and can be extracted
        • Statistical models are usually not shareable between systems
        • A knowledge base may not model all facts contained in the data set (e.g. WordNet knows nothing about the Apache Web Server)
    • Semantic vs. Information Retrieval: Example Freebase, www.freebase.com
      • Open database of the world's information
      • Contribution by the community
      • Linked with other free resources like Wikipedia
      • Web API
      • Own query language: Metaweb Query Language
      • Regarding semantic search
        • Structural document representation
        • Keyword queries in combination with intelligent interfaces for information need refinement
        • Fact-based representation
    • Semantic vs. Information Retrieval: Example Freebase (screenshot)
    • Semantic vs. Information Retrieval: Example Cuil, www.cuil.com
      • Internet search engine with 120 billion pages
      • Not based on popularity of sites, just on content and topics of content
      • Document representation and query are unstructured
      • Extended retrieval model in the background based on categorical knowledge and statistical methods (clustering)
      • Enhanced interface
      • http://www.news.com.au/technology/story/0,25642,24089734-5014239,00.html
    • Semantic vs. Information Retrieval: Example Yahoo! Search Monkey. Use structured data to improve the presentation of search results. http://developer.yahoo.com/searchmonkey/#
    • Summary
      • Triple Store
        • Generic/flexibility vs. performance/space
        • Usually an RDBMS with a large triple table
        • Alternative: special graph indexing structures
      • SPARQL
        • Simple querying of RDF by providing a graph pattern
        • Recommended by the W3C
      • Semantic Search
        • Document & query representation
        • Retrieval model & interfaces
        • Semantic vs. statistic
        • It is the next step, but not such a big one
    • Next Week: guest lectures from experts (attendance mandatory)
      • Werner Klieber: Semantic Web Services (30')
      • Aatif Latif: Open Linked Data (30')
      • Fleur Jeanquartier: Bringing the Semantic Web closer to the User (30')
    • That's it for today. Thanks for your attention. Questions/comments? mgranitzer@tugraz.at
    • License: This work is licensed under the Creative Commons Attribution 2.0 Austria License. To view a copy of this license, visit http://creativecommons.org/licenses/by/2.0/at/. Contributors: Mathias Lux, Peter Scheir, Klaus Tochtermann, Michael Granitzer