Web3.0 seminar wipro-session2-logicalontological


Published in: Education

  1. 24-06-2010 Web 3.0, Semantics & Enterprise Computing. Session II: Logical Ontological
     Satish, Sukumar, Feroz Sheikh & Venkatesh S, www.canopusconsulting.com, June 2010. © Canopus Consulting
     Objective
     • Information interchange and modelling
     • An introduction to semantic web technologies
       • RDF, RDFS, OWL2
     • Semantic modelling
     • Querying
     • The web 3.0 technology stack
  2. The following sessions will address:
     • How do we build an application?
     • How do we build the ontology?
     • What are the key architecture components?
     • What are the tools & technologies to use? How do I choose which technology to use?
     Semantic Web Application Lifecycle (diagram)
     • Stages: Build Information Model; Create Assimilation Models & Aggregate Knowledge; Retrieve and Use Semantic Data; Refine/Evolve Information Model
     • Ontology editors: Protégé, TopBraid Composer
     • Technologies: GRDDL, RDFizers, OWLs, Automatic Annotation
     • RDF stores: Mulgara, Sesame
     • Programming: Jena
     • Semantic Query Server
  3. Semantic Web Application Lifecycle
     • Information Modelling
       • Build ontology (model-level representation)
     • Information Assimilation
       • Populate the knowledge base from various sources
         • Including current applications
       • Automatic semantic annotation of existing data
         • Any type of document, multiple sources of documents
     • Information Retrieval
       • Applications: search, integrate/portal, summarize/explain, analyse, decision support
       • Reasoning techniques: graph analysis, inferencing
     Architecture Stack of Semantic Technologies (diagram)
     • Application
     • HTTP, SOAP, Programming API
     • Semantic Middleware, e.g. Semantic SOA
     • SPARQL Processor, Inference Engine
     • RDF-SQL Adaptor, RDF Store
     • Relational Store
  4. Semantic Web Technologies (diagram; source: W3C)
     The Perceptron.Net use case
     • A rich Cultural Informatics environment designed to
       • Create, collect and categorize any type of cultural artifact: music, literature, travel, leisure, entertainment...
       • Let communities form around content
       • Make use of existing information on the network and existing community infrastructure
     • An example:
       • Indian music cannot be categorized along the same lines as Western music
       • Genre, album, artist is just not sufficient…
  5. The Perceptron.Net use case…
     • Typical queries we want to support:
       • Thematic album creation:
         • Give me all songs that are directed by “X”, with music composed by “Y”, where the hero was “Z”
         • Give me all songs in Raga Kalyani (must include film, folk and classical songs)
         • Give me all songs on Lord Rama in Sanskrit which are “stotras”…
         • Give me all the recordings of live performances at Sri Krishna Gana Sabha, Chennai
     The Perceptron.Net use case…
     • Provide an exploratory interface:
       • Specify a generic criterion and successively filter until you find what you need. E.g. specify a “mood” or a song you like and ask for “similar” songs or songs that match that mood.
     • Allow the community to add content and metadata and find new connections in the content.
     • Content can be anywhere on the Internet
       • Raaga.com, HamaraCd.com, MusicToday.Com, Orkut groups, blogs, websites
       • Not only music, but also content “about” music (articles, essays, ratings, discussions), which should be used in connecting, searching and enriching the content
     • Provide feeds so that Facebook-type plug-ins can be developed easily, so that content and queries can be shared/updated from anywhere.
  6. Practical Problem
     • Let us create a compilation of Amitabh Bachchan’s film songs
  7. We start with a music site (screenshot)
     • Or HamaraCD or India Times Shopping or …
     • And we go through a fairly exhaustive categorization …
       • By actor
       • By album name
       • By release year
     Or go to a site with a filmography (screenshot)
     • And drill down to some of our favorite ones
  8. But what if we wanted to
     • Get all songs for Amitabh Bachchan that
       • Are from movies directed by Yash Chopra?
     Or what if we wanted to
     • Get all songs of Amitabh Bachchan in which the playback singer won an award?
  9. We could do this
     • By hand
       • Make a list of movies from the filmography site
       • Go to Raaga and get songs for those movies
     • But what if there are multiple sources?
     Across Multiple Sources (diagram)
  10. The problem
      • Implicit to Explicit
        • Logic & domain knowledge is “implicit” in an application
          • In code, DB schema, documentation etc.
        • Humans can interpret it, but automated agents can’t

              List<SongList>: getSongsInAlbum();

              SongID              Sequence  Duration
              Yaara Seeli Seeli   1         4:50
              Dekha Ek Khwab toh  2         2:55
              …                   …         …

        • For agents, this logic needs to be “explicitly” stated
      To enable agents,
      • Extract the Logic as Metadata
        • The logic & domain knowledge is embedded in the application
          • In object models, class hierarchies, instance names
        • This needs to be extracted for others to link, access and use
        • Deliver both data and metadata in a uniform manner
  11. We need
      • Common Language
        • Agents need a common language to interchange information
      • Derive Conclusions
        • Logic that is now explicitly stated can be used to draw conclusions
      • Allow Independent Evolution
        • Every application should be able to evolve independently
        • Both at the data level (new data) and at the schema level (new knowledge)
      • Support Incremental Assimilation
        • Different sources can contain/provide different aspects of knowledge
        • Community participation to evolve the knowledge
        • Manifests itself in the principle of AAA (Anyone can say Anything about Any topic)
      Steps In Information Interchange
      1. Map the various data onto an abstract data representation
         • Make the data independent of its internal representation…
         • Expose it in a standard format
      2. Merge the resulting representations to create a single model
      3. Make queries on the whole
         • Queries that could not have been done on the individual data sets
  12. Steps In Information Interchange (diagram)
      Enabling Technologies
      • Semantic technologies (specifically RDF) treat metadata as data and exchange both in exactly the same way.
      • They provide a way for anyone to make a basic statement about anything and a means of layering these statements into a model.
  13. Interchange Without APIs
      • If the information provider can produce an RDF stream of data and metadata, the assimilation agent can combine streams from all sources and treat them as if they are from the same source
      • The only thing the agent needs to understand is the RDF model, not the individual application APIs and models
      • This is the fundamental premise of information interchange on the web 3.0.
      First: Expose the Data
      • As a set of relations
      • Using RDF
      Song data (graph):

          <http://.../#JaiHo>  p:title     "Jai Ho"
          <http://.../#JaiHo>  p:movie     "Slumdog…"
          <http://.../#JaiHo>  p:composer  <http://.../#ARR>
          <http://.../#ARR>    p:name      "A R Rahman"
  14. Second: Merge Data from Different Sources
      • Same URI = same resource
      Song data (graph):

          <http://.../#JaiHo>  p:title     "Jai Ho"
          <http://.../#JaiHo>  p:movie     "Slumdog…"
          <http://.../#JaiHo>  p:composer  <http://.../#ARR>
          <http://.../#ARR>    p:name      "A R Rahman"

      Performances data (graph): nodes <http://.../#JaiHo>, <http://.../#ARR> and <http://.../#JaiHo-ARR>; edges p:sang, p:performed_at and p:music_director
      Merging Identical Resources
      • The two graphs are joined at the shared nodes <http://.../#JaiHo> and <http://.../#ARR> (diagram)
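The merge step can be sketched in plain Python by treating each source as a set of (subject, predicate, object) tuples. This is only an illustration: the `EX` prefix stands in for the URIs the slides elide, and the direction of the edges in `performances` is an assumption, since the diagram itself is not recoverable from the transcript.

```python
# A minimal sketch: each source is a set of (subject, predicate, object) triples.
# EX and the edge directions in `performances` are illustrative assumptions.
EX = "http://example.org/music#"

song_data = {
    (EX + "JaiHo", "p:title", "Jai Ho"),
    (EX + "JaiHo", "p:composer", EX + "ARR"),
    (EX + "ARR", "p:name", "A R Rahman"),
}

performances = {
    (EX + "JaiHo-ARR", "p:sang", EX + "JaiHo"),
    (EX + "JaiHo-ARR", "p:performed_at", EX + "FFM"),
    (EX + "JaiHo-ARR", "p:music_director", EX + "ARR"),
}

# Merging is a set union: identical URIs denote the same resource,
# so statements from both sources end up attached to a single node.
merged = song_data | performances


def describe(graph, resource):
    """Return every triple that mentions `resource` as subject or object."""
    return {t for t in graph if t[0] == resource or t[2] == resource}


about_song = describe(merged, EX + "JaiHo")
```

After the union, `about_song` holds statements that originated in both sources, which is the point of the slide: no API mapping was needed, only agreement on the URI.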
  15. Third: Query the Data
      • As if they are from the same source
      • Answer questions that were not possible from either source alone
      • For example
        • What is the title of the song sung by A R Rahman at the Fireflies Music Festival?
      (Diagram: the merged graph, now including the festival node <http://.../#FFM> reached through p:performed_at)
      Fourth: Adding Additional Knowledge
      • We can assert that composer and music_director are the same
        • p:composer sameAs p:music_director
      • Very handy for model transformations, language translations, de-duplication etc.
      • Now we can ask queries not possible with either of the sources
        • “Composers” of songs played at the Fireflies Music Festival
      (Diagram: the merged graph)
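The third and fourth steps can be sketched the same way. The data below repeats the merged graph with the same assumed URIs and edge directions; `objects`, `subjects` and `apply_equivalence` are hypothetical helper names, not part of any RDF library. The slide writes the assertion as `sameAs`; the sketch simply copies statements between the two properties, which is the effect the slide describes.

```python
# Sketch only: URIs, edge directions and helper names are assumptions.
EX = "http://example.org/music#"
merged = {
    (EX + "JaiHo", "p:title", "Jai Ho"),
    (EX + "JaiHo", "p:composer", EX + "ARR"),
    (EX + "ARR", "p:name", "A R Rahman"),
    (EX + "JaiHo-ARR", "p:sang", EX + "JaiHo"),
    (EX + "JaiHo-ARR", "p:performed_at", EX + "FFM"),
    (EX + "JaiHo-ARR", "p:music_director", EX + "ARR"),
}


def objects(graph, subject, predicate):
    return {o for s, p, o in graph if s == subject and p == predicate}


def subjects(graph, predicate, obj):
    return {s for s, p, o in graph if p == predicate and o == obj}


# Third step: titles of songs performed at the festival (needs both sources).
titles = set()
for perf in subjects(merged, "p:performed_at", EX + "FFM"):
    for song in objects(merged, perf, "p:sang"):
        titles |= objects(merged, song, "p:title")


# Fourth step: treat two properties as equivalent by copying statements across.
def apply_equivalence(graph, prop_a, prop_b):
    extra = set()
    for s, p, o in graph:
        if p == prop_a:
            extra.add((s, prop_b, o))
        elif p == prop_b:
            extra.add((s, prop_a, o))
    return graph | extra


enriched = apply_equivalence(merged, "p:composer", "p:music_director")

# "Composers" of performances at the festival, although the performance
# source only ever said p:music_director.
composers = set()
for perf in subjects(enriched, "p:performed_at", EX + "FFM"):
    composers |= objects(enriched, perf, "p:composer")
```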
  16. Semantic Web Technologies
      • Common Language: RDF
      • Reasoning & Inferences: RDFS, OWL
      • Assimilation & Retrieval: RDF DBs, SPARQL

          RDF     Resource Description Framework: defines the structure of triples
          RDFS    RDF Schema: provides basic elements for the description of RDF vocabularies
          OWL     Web Ontology Language: adds semantics to the schema
          SPARQL  Protocol and RDF query language

      Expressing Knowledge
      • Explicit Knowledge: Semantic Models, Ontology
        • A formal representation of knowledge as a set of concepts within a domain and the relationships between those concepts
        • It is used to reason about the properties of that domain
        • And may be used to describe the domain
  17. Mapping to Information Interchange (diagram; source: W3C)
      Resource Description Framework
      • What is RDF?
  18. Paradigms of Information: Spec for RDF
      • AAA: Anyone can say Anything about Anything
        • Consistency is not a necessary condition
        • For example, source 1: Yaara Sili Sili is a sad song
        • Source 2: Yaara Sili Sili is a famous song
        • The community may provide music theory attributes for the song
      • To each their own
        • No common schema, yet the ability to make globally valid statements
        • For example, information about raagas can be combined with information on film music
      • There is always one more
        • Open world assumption: facts can be, and always are, added incrementally
        • For example, one source may classify Lalbagh as a tourist place
        • Another source may classify Lalbagh as a garden
        • The application understands that gardens are suitable for a “morning walk”
      Sample Data

          Song name             Raaga     Sung by              Duration
          Nanu palimpa          Mohanam   Dr Balamuralikrishna 42 mins
          Varuga varugave       Mohanam   MS Subbulakshmi      8 mins
          Kallalo Kannulalo     Kalyani   Leela                6 mins
          Pranati Pranati       Kalyani   S P Balu             16 mins
          Piluvukara Alugukara  Hindolam  Ghantasala           6.3 mins
  19. Integrating Data from Different Sources

          Source 1:  Pranati Pranati    Kalyani  S P Balu         16 mins
          Source 2:  Kallalo Kannulalo  Kalyani  Leela            6 mins
          Source 3:  Varuga varugave    Mohanam  MS Subbulakshmi  8 mins

      • Distributing data row by row: all participants must agree to the common schema
      Interchange by Column

          Source 1
          id  Song name             Raaga
          1   Nanu palimpa          Mohanam
          2   Varuga varugave       Mohanam
          3   Kallalo Kannulalo     Kalyani
          4   Pranati Pranati       Kalyani
          5   Piluvukara Alugukara  Hindolam

          Source 2
          id  Sung by
          1   Dr Balamuralikrishna
          2   M S Subbulakshmi
          3   Leela
          4   S P Balu
          5   Ghantasala

      • Distributing by column
      • Each participant must agree to the unique identifier
  20. Interchange by Cell

          Source 1:  Row 1  Sung by    Dr Balamuralikrishna
          Source 2:  Row 1  Song name  Nanu palimpa
          Source 3:  Row 1  Raaga      Mohanam

      • Distributing by cell
      • Each participant must agree to the row ID & column name
      The Cell
      • Cell-by-cell division allows us to do just that
      • The row ID and the column need to be identified
      • This is what RDF is: TRIPLES of data
        • Subject -> Predicate -> Object
        • Subject and predicate have unique identifiers; the object can be a literal or an identifier
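The step from table cells to triples can be sketched in a few lines of Python. The `song:` and `p:` prefixes and the row identifiers are made up for illustration; the cell values come from the sample data slide.

```python
# Sketch: turn table cells into (row id, column name, value) triples.
# The "song:" / "p:" prefixes and the row ids are illustrative assumptions.
rows = [
    {"id": 1, "Song name": "Nanu palimpa", "Raaga": "Mohanam",
     "Sung by": "Dr Balamuralikrishna"},
    {"id": 2, "Song name": "Varuga varugave", "Raaga": "Mohanam",
     "Sung by": "M S Subbulakshmi"},
]


def row_to_triples(row):
    """One triple per cell: the row id is the subject, the column the predicate."""
    subject = f"song:{row['id']}"
    return [(subject, f"p:{column}", value)
            for column, value in row.items() if column != "id"]


triples = [t for row in rows for t in row_to_triples(row)]

# Any participant can contribute a single cell later, as long as it uses
# the same row identifier and column name.
triples.append(("song:1", "p:Duration", "42 mins"))
```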
  21. RDF Expressions: Triples
      • Triples: Subject, Predicate, Object
        • Subject
          • Must be a Resource
        • Predicate
          • Must be a Resource
        • Object
          • Can be a Resource or a Literal
      • <subject> has a property <predicate>, whose value is <object>
        • A labelled connection between two resources
        • E.g. Song: Jai Ho has a property composer whose value is A R Rahman
        • E.g. MelakartaRaaga has a property NumberOfSwaras whose value is 7
      (Diagram: Resource, Resource, Literal)
      Rules of RDF
      • Global Uniqueness
        • RDF URIs and names must be globally unique
      • Sentence Form
        • The order of knowledge representation in the sentence should not change
        • So that autonomous agents can consume the metadata
      • Reuse
        • If a document refers to an existing resource, then it is talking about that same global resource
  22. Result

          Paradigm                                  RDF
          Anybody can say Anything about Anything   Every statement is an atomic RDF sentence
          To each his own                           Every subject, object and predicate in RDF is qualified
          There is always one more                  RDF is a graph with no beginning and no end; an additional statement can always be added to the graph

      More RDF
      • RDF blank nodes and their usage
        • Identify an abstract concept: “there exists some”
        • E.g. “my ideal friend”
        • The id is always local to the document and need not be the same across documents
      • Reification
      • RDF collections
        • Bag: unordered collection
        • Seq: ordered collection
        • Alt: unordered set of equivalent alternatives
  23. rdf:type
      • rdf:type is a property that provides an elementary typing system
        • song:yaara-seeli-seeli rdf:type HindiSongs:FilmSong
      • It does not assume that HindiSongs:FilmSong is a class; it is a resource
      • RDF does not have a definition for class
      Bringing in the meaning
      • What is RDFS?
  24. Need for RDF schemas (source: W3C)
      • First step towards the “extra knowledge”:
        • define the terms we can use
        • what restrictions apply
        • what extra relationships are there?
      • Officially: “RDF Vocabulary Description Language”
        • the term “Schema” is retained for historical reasons…
      Classes, Resources
      • RDFS defines resources and classes:
        • everything in RDF is a “resource”
        • “classes” are also resources, but they are also a collection of possible resources (i.e., “individuals”)
          • “composer”, “mood” are classes
          • A R Rahman is an individual
          • Love is an individual
  25. Classes, Resources (contd.)
      • Relationships are defined among classes and resources:
        • “typing”: an individual belongs to a specific class
          • “«A R Rahman» is a composer”
          • to be more precise: “«http://.../#A R Rahman» is a composer”
        • “subclassing”: all instances of one are also the instances of the other (“every novel is a fiction”)
      • RDFS formalizes these notions in RDF
      Classes, resources in RDF(S)

          <http://perceptron.net/indianmusic#A R Rahman>  rdf:type         #composer
          #composer                                       rdfs:subClassOf  #artist

      • RDFS defines the meaning of these terms
      • A resource may belong to several classes
        • rdf:type is just a property…
        • “«A R Rahman» is a composer and «composer» is an «artist»”
      • The type information may be very important for applications
        • e.g., it may be used for a categorization of possible nodes
  26. Inferred Properties

          <http://perceptron.net/indianmusic#A R Rahman>  rdf:type         #composer
          #composer                                       rdfs:subClassOf  #artist

      • The triple «A R Rahman» rdf:type #artist is not in the original RDF data
      • It can be inferred from the RDFS rules
      • RDFS environments return that triple, too
      Classes and Sub-Classes
      • Challenge:
        • We have instance data about SuddaSwaras and VikritSwaras.
        • However, we often want to use Swaras to mean both, for example to say that some Raaga has 7 Swaras
        • How do we state this?

          <owl:Thing rdf:about="#ga">
            <rdf:type rdf:resource="#SuddaSwara"/>
          </owl:Thing>
          <owl:Thing rdf:about="#ra">
            <rdf:type rdf:resource="#SuddaSwara"/>
          </owl:Thing>
          <owl:Thing rdf:about="#ni">
            <rdf:type rdf:resource="#VikritSwara"/>
          </owl:Thing>
  27. Classes and Sub-Classes

          <rdfs:Class rdf:ID="SuddaSwara">
            <rdfs:subClassOf rdf:resource="#Swara"/>
          </rdfs:Class>

      • Class1 rdfs:subClassOf Class2
        • Class1 is a specialization of Class2; membership in Class1 implies membership in Class2; properties of Class2 are inherited by Class1
      • A class can be a subclass of multiple classes
        • Class 1 subClassOf Class 2
        • Class 1 subClassOf Class 3
      • Instances are specified using rdf:type
      Inference: formal rules
      • The RDF Semantics document has a list of (33) entailment rules:
        • “if such and such triples are in the graph, add this and this”
        • do that recursively until the graph does not change
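The “add triples until the graph does not change” procedure can be sketched for two of the RDFS rules. The rule names rdfs9 and rdfs11 are the ones used in the RDF Semantics document; the function name and the in-memory representation are assumptions made for this sketch, and a real reasoner covers many more rules.

```python
# Sketch of two RDFS entailment rules applied until a fixed point is reached.
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"


def rdfs_closure(graph):
    """Return `graph` plus everything entailed by rdfs9 and rdfs11."""
    graph = set(graph)
    while True:
        new = set()
        for s, p, o in graph:
            for s2, p2, o2 in graph:
                # rdfs11: rdfs:subClassOf is transitive
                if p == SUBCLASS and p2 == SUBCLASS and o == s2:
                    new.add((s, SUBCLASS, o2))
                # rdfs9: an instance of a subclass is an instance of the superclass
                if p == TYPE and p2 == SUBCLASS and o == s2:
                    new.add((s, TYPE, o2))
        if new <= graph:  # nothing new was derived: the graph is stable
            return graph
        graph |= new


swara_data = {
    ("#ga", TYPE, "#SuddaSwara"),
    ("#ra", TYPE, "#SuddaSwara"),
    ("#ni", TYPE, "#VikritSwara"),
    ("#SuddaSwara", SUBCLASS, "#Swara"),
    ("#VikritSwara", SUBCLASS, "#Swara"),
}

closed = rdfs_closure(swara_data)
swaras = {s for s, p, o in closed if p == TYPE and o == "#Swara"}
```

Here `swaras` contains all three notes even though none of them was asserted to be a `#Swara` directly.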
  28. Properties and Sub-Properties
      • Properties and classes are defined separately from each other
        • A property is not owned by any class
      • The range and domain of properties can be specified
        • What type of resources serve as object and subject
      • Sub-Property
        • rdfs:subPropertyOf is used to define one property as a sub-property of another
        • The sub-property inherits the domain and range definitions of the property
      What does it look like
      (Diagram: Data, Metadata, Inferred Assertions)
  29. Properties, domains and ranges
      • It is still rdf:Property, not rdfs:Property; however, RDFS adds the notion of a domain and a range to an rdf:Property

          <rdf:Property rdf:ID="directed_by">
            <rdfs:domain rdf:resource="#MusicDirector"/>
            <rdfs:range rdf:resource="#FilmSongs"/>
          </rdf:Property>

      • Domain: what resources does the property apply to
      • Range: what are the possible values
      • If there are 2 domain or 2 range statements, it means both must be true
      • Range can indicate:
        • rdf:resource => either a resource or a literal
        • rdfs:Datatype
      Inference based on Domains and Ranges
      • Challenge: data typing based on use
        • We have statements about songs and who composed them, such as
          • Endaro Mahanubhavulu isComposedBy Thyagaraja
        • But we do not have a direct statement that says Thyagaraja, or anyone else, is an instance of a class called Composer.
        • Suppose we have to list all the composers in our model; what do we do?
        • This is a very common pattern in transformation rules
  30. Inference based on Domains and Ranges
      • Answer:
        • Define the range of isComposedBy to be the class Composer
        • RDFS inference will automatically deduce that anything that appears as the object of isComposedBy is an instance of class Composer
      (Diagram: Data, Metadata, Inferred Assertion)

          <owl:ObjectProperty rdf:about="#isComposedBy">
            <rdfs:range rdf:resource="#Composer"/>
          </owl:ObjectProperty>

      Reinforcing the notion of “Schema”
      • Domains and ranges are not used for validation; instead they are used to derive new information from old information
      • Does this surprise you?
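The range rule can be sketched directly: every object of a property with a declared `rdfs:range` is given that class as its type. The function name `infer_from_ranges` is hypothetical, and the sketch ignores the case where the object is a literal.

```python
# Sketch of range-based typing: the object of a property is typed
# with the property's declared range. Names are illustrative.
def infer_from_ranges(graph):
    ranges = {(s, o) for s, p, o in graph if p == "rdfs:range"}
    inferred = {(o, "rdf:type", cls)
                for s, p, o in graph
                for prop, cls in ranges if p == prop}
    return set(graph) | inferred


data = {
    ("#EndaroMahanubhavulu", "#isComposedBy", "#Thyagaraja"),
    ("#isComposedBy", "rdfs:range", "#Composer"),
}

closed = infer_from_ranges(data)

# List all composers without any of them having been declared as one.
composers = {s for s, p, o in closed if p == "rdf:type" and o == "#Composer"}
```

Note that nothing is rejected here: a song accidentally used as the object of `#isComposedBy` would also be typed as a `#Composer`, which is what the "not used for validation" slide is pointing at.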
  31. What is OWL?
      Ontologies
      • RDFS is useful, but does not solve all possible requirements
      • Complex applications may want more possibilities:
        • characterization of properties
        • identification of objects with different URIs
        • disjointness or equivalence of classes
        • construct classes, not only name them
        • can a program reason about some terms? E.g.:
          • “if «Person» resources «A» and «B» have the same «email» property, then «A» and «B» are identical”
  32. Ontologies (cont.)
      • The term ontology is used in this respect:
        • “defines the concepts and relationships used to describe and represent an area of knowledge”
      • RDFS can be considered a simple ontology language
      • Languages should be a compromise between
        • rich semantics for meaningful applications
        • feasibility, implementability
      OWL: Web Ontology Language
      • OWL is an extra layer, a bit like RDF Schemas
        • own namespace, own terms
        • it relies on RDF Schemas
      • It is a separate recommendation
  33. OWL Overview
      • OWL is a large set of additional terms
      • For classes:
        • owl:equivalentClass: two classes have the same individuals
          • Example: A:MUSIC_DIRECTOR and B:COMPOSER
        • owl:disjointWith: no individuals in common
      • For properties:
        • owl:equivalentProperty
          • Example: A:VOCALS_BY and B:SINGER
        • owl:propertyDisjointWith
      • For individuals:
        • owl:sameAs: two URIs refer to the same concept (“individual”)
        • owl:differentFrom: negation of owl:sameAs
      Classes in OWL
      • In RDFS, you can subclass existing classes… that’s all
      • In OWL, you can construct classes from existing ones:
        • enumerate its content
        • through intersection, union, complement
  34. OWL: Class
      • owl:Class is a subclass of rdfs:Class
        • More expressiveness: restrictions, set operations…
      • owl:Thing
        • Every resource that is an instance of a class is automatically a member of owl:Thing
      • owl:Nothing: the empty class, the most specialized
      • Separation of classes and instances
        • Though the language does not mandate it
      • Class contents can be enumerated
        • The class consists of exactly those individuals
      • Union of classes can be defined
      • Other possibilities: complementOf, intersectionOf
      OWL class definition

          <owl:Class rdf:about="#Composer">
            <rdfs:subClassOf rdf:resource="#Person"/>
          </owl:Class>
          <owl:Class rdf:about="#Composition">
            <rdfs:subClassOf rdf:resource="&owl;Thing"/>
          </owl:Class>
  35. Equivalence in OWL
      • Equivalent classes
        • If Class A owl:equivalentClass Class B
          • => they share the same members
          • If x belongs to Class A then x also belongs to Class B, and vice versa
        • This is equivalent to saying:
          • A rdfs:subClassOf B
          • B rdfs:subClassOf A
        • A (model) design pattern: how to implement equivalence when limited to RDFS vocabulary
      • Equivalent properties
        • If propertyA owl:equivalentProperty propertyB
          • If propertyA applies between resources X and Y, then propertyB also applies
      OWL: exhaustive
      • The combination of class constructions with various restrictions is extremely powerful
      • What we have so far follows the same logic as before
        • extend the basic RDF and RDFS possibilities with new features
        • define their semantics, i.e., what they “mean” in terms of relationships
        • expect to infer new relationships based on those
      • However, a full inference procedure is hard
        • not implementable with simple rule engines, for example
  36. Properties
      • owl:ObjectProperty is used to connect a resource to another resource
      • owl:DatatypeProperty is used to connect a resource to an rdfs:Literal (untyped) or an XML Schema built-in data type (typed) value
      • Both can have sub-properties

          <owl:ObjectProperty rdf:about="#isComposedBy">
            <rdfs:range rdf:resource="#Composer"/>
          </owl:ObjectProperty>
          <owl:DatatypeProperty rdf:about="#hasName">
            <rdfs:range rdf:resource="&xsd;string"/>
          </owl:DatatypeProperty>

      More on properties
      • A property can be the inverse of another (inverse property)
        • hasChild, hasParent
        • Usually seen in “containment” type associations
      • A property can be symmetric: A p B => B p A
        • knows, hasSpouse
        • Usually abstract forms of more specialized properties
        • Often you will find that symmetric properties are super-properties of others
      • A property can be asymmetric
        • hasMother, greaterThan, lesserThan
      • A property can be transitive (a property may also be both transitive and symmetric)
        • isPartOf, contains
        • Generally seen between entities of similar type
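What a reasoner derives from symmetric and transitive properties can be sketched as two small closure functions. The property and resource names in the sample data (`#knows`, `#isPartOf`, `#wheel` and so on) are hypothetical and only serve the illustration.

```python
# Sketch: the extra triples implied by symmetric and transitive properties.
def symmetric_closure(graph, prop):
    """A p B => B p A."""
    return set(graph) | {(o, prop, s) for s, p, o in graph if p == prop}


def transitive_closure(graph, prop):
    """A p B and B p C => A p C, repeated until nothing new appears."""
    graph = set(graph)
    while True:
        new = {(a, prop, d)
               for a, p1, b in graph if p1 == prop
               for c, p2, d in graph if p2 == prop and b == c}
        if new <= graph:
            return graph
        graph |= new


data = {
    ("#A", "#knows", "#B"),
    ("#wheel", "#isPartOf", "#car"),
    ("#car", "#isPartOf", "#fleet"),
}

with_symmetry = symmetric_closure(data, "#knows")
with_transitivity = transitive_closure(data, "#isPartOf")
```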
  37. More on properties
      • Challenge
        • We have the following statements:
          • Raaga A isJanyaRaagaOf Raaga B
          • Raaga C isJanyaRaagaOf Raaga D
          • Raaga E isMelakarthaDerivativeOf Raaga A
        • We get additional information that there is something called JanakaRaaga, such that if A is JanyaRaaga of B then B is JanakaRaaga of A
        • I want to:
          • Get all JanakaRaagas of A
          • Get all Raagas related to A
      More on properties
      • Answer
        • isJanyaRaagaOf owl:inverseOf isJanakaRaagaOf
        • isJanyaRaagaOf, isJanakaRaagaOf and isMelakarthaDerivativeOf are sub-properties of raagaRelations
      • Entailed by OWL itself: inverseOf is the inverse of itself. In other words, inverseOf is symmetric
      • It is not uncommon to find this pattern in model transformations
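The answer can be checked with a small sketch that applies the inverse-property and sub-property rules until the graph is stable. The property names follow the slide (with `raagaRelation` as the common super-property); the `infer` function and the bare raaga names `A` to `E` are assumptions for this illustration.

```python
# Sketch: inverse and sub-property inference for the raaga statements above.
INVERSE, SUBPROP = "owl:inverseOf", "rdfs:subPropertyOf"


def infer(graph):
    graph = set(graph)
    while True:
        new = set()
        for s, p, o in graph:
            for s2, p2, o2 in graph:
                if p2 == INVERSE and p == s2:   # (s p o), p inverseOf q => (o q s)
                    new.add((o, o2, s))
                if p2 == INVERSE and p == o2:   # inverseOf works in both directions
                    new.add((o, s2, s))
                if p2 == SUBPROP and p == s2:   # lift to the super-property
                    new.add((s, o2, o))
        if new <= graph:
            return graph
        graph |= new


data = {
    ("A", "isJanyaRaagaOf", "B"),
    ("C", "isJanyaRaagaOf", "D"),
    ("E", "isMelakarthaDerivativeOf", "A"),
    ("isJanyaRaagaOf", INVERSE, "isJanakaRaagaOf"),
    ("isJanyaRaagaOf", SUBPROP, "raagaRelation"),
    ("isJanakaRaagaOf", SUBPROP, "raagaRelation"),
    ("isMelakarthaDerivativeOf", SUBPROP, "raagaRelation"),
}

closed = infer(data)

# All JanakaRaagas of A: every X with (X isJanakaRaagaOf A).
janaka_of_a = {s for s, p, o in closed if p == "isJanakaRaagaOf" and o == "A"}

# All raagas related to A, in either direction, through the super-property.
related_to_a = ({s for s, p, o in closed if p == "raagaRelation" and o == "A"}
                | {o for s, p, o in closed if p == "raagaRelation" and s == "A"})
```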
  38. Qualifying property membership
      • Film songs are those songs that have appeared in films

          <owl:Class rdf:ID="FilmSongs">
            <rdfs:subClassOf rdf:resource="#Songs"/>
            <rdfs:subClassOf>
              <owl:Restriction>
                <owl:onProperty rdf:resource="#appearedIn"/>
                <owl:someValuesFrom rdf:resource="#Films"/>
              </owl:Restriction>
            </rdfs:subClassOf>
          </owl:Class>

      Qualifying property membership
      • owl:allValuesFrom
        • If the property is used, then all values for the property must belong to the class
      • owl:someValuesFrom
        • If the property is used, at least one of the values must belong to the class
      • owl:hasValue
        • Of all the values a class has for a particular property, at least one must be this specific value
  39. owl:cardinality
      • owl:minCardinality is shown here; owl:cardinality and owl:maxCardinality work likewise

          <owl:Class rdf:ID="FilmSongs">
            <rdfs:subClassOf rdf:resource="#Songs"/>
            <rdfs:subClassOf>
              <owl:Restriction>
                <owl:onProperty rdf:resource="#appearedIn"/>
                <owl:minCardinality rdf:datatype="&xsd;nonNegativeInteger">1</owl:minCardinality>
              </owl:Restriction>
            </rdfs:subClassOf>
          </owl:Class>

      SPARQL Protocol And RDF Query Language
      • Querying RDF
  40. Querying RDF
      • A collection of RDF statements is a graph
        • No root: no start, no finish
      • What is returned to a query is a graph
        • Just as a relational query forms a new table by combining existing tables, an RDF query returns a new graph by combining information from graphs from multiple sources
      Queries and their responses
      • Querying an RDF store such as the one we have just built will return exactly what is asserted
      • Example:
        • Raaga:Sreeragam rdf:type Raaga:Ghana
        • Raaga:Ghana rdfs:subClassOf Raaga:CarnaticRaaga
      • A query such as ?x rdf:type Raaga:CarnaticRaaga will return nothing!
  41. Retrieval models from RDF

          Level      What it is                                  Query style
          Semantics  Meaning of data & relationships             Semantic query (e.g. SPARQL)
          Structure  A graph of triples connected to each other  Graph traversal (e.g. Squish)
          Syntax     Formats such as RDF/XML or Turtle           Syntactic query (e.g. XPath/XQuery)

      • Semantics is internally represented as structure, which is serialized using a syntax
      Syntactic Level Query

          <rdf:RDF ...>
            <Raaga rdf:about="#satmel">
              <rdf:type rdf:resource="&owl;Thing"/>
              <hasSwara rdf:resource="#ni"/>
              <hasSwara rdf:resource="#nu"/>
              <hasSwara rdf:resource="#pa"/>
              <hasSwara rdf:resource="#ra"/>
              <hasSwara rdf:resource="#sa"/>
            </Raaga>
            <owl:Thing rdf:about="#shuddhatodi">
              <rdf:type rdf:resource="#Raaga"/>
              <isMelakartaDerivative rdf:resource="#hanumatodi"/>
            </owl:Thing>
            ...

      • Select a raaga (XPath/XQuery)
        • /RDF/Raaga/@rdf:about="#satmel"
        • /RDF/Thing/@rdf:about="#shuddhatodi"

          for $r in document("music.owl")
          where $r/Raaga/@rdf:about = '#satmel'
          return $r/Raaga/hasSwara

      • Limitations of this approach
        • Can get out of hand very quickly
        • Does not understand the semantics of RDFS & OWL
        • Tied to the serialized structure of RDF (which can be expressed in many ways)
  42. Structural Query

          Subject                     Predicate            Object
          perceptron:satmel           rdf:type             perceptron:Raaga
          perceptron:shuddhatoodi     rdf:type             perceptron:MelakarthaRaaga
          perceptron:satmel           perceptron:hasSwara  perceptron:sa
          perceptron:MelakarthaRaaga  rdfs:subClassOf      perceptron:Raaga
          …

      • Possible queries
        • Select * from Triples where Object = "perceptron:Raaga" and Predicate = "rdf:type"
        • Select Predicate from Triples where Subject = "perceptron:satmel"
      • Limitations of this approach
        • Interprets any RDF model as just a set of triples
        • Does not understand the semantics of RDFS & OWL
        • E.g. looking for all raagas will fail here: shuddhatoodi is asserted to be a MelakarthaRaaga, and the fact that MelakarthaRaaga is a subclass of Raaga lies in the semantics of the last triple
      Querying at the Semantic Level
      • Need a new language that can understand the semantics of RDFS & OWL
      • Sample queries
        • Select ?x from <perceptron.net> where ?x <rdf:type> Raaga
        • All MelakarthaRaagas will also be returned, even though they are not explicitly asserted to be a Raaga
      • Can draw inferences from the rules of RDFS and OWL
        • Either by computing and storing the closure of a given model
        • Or by inferring new statements on the fly, as needed by the query
      • Not tied to how the data is stored or serialized
      • Special-purpose query languages designed to facilitate this
        • RQL, SPARQL, TQL
        • SPARQL has emerged as the industry standard
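The difference between the structural and the semantic query can be sketched on the triples from the table. The two function names are hypothetical; the semantic variant uses the "infer on the fly" option by first collecting the subclasses of the requested class.

```python
# Sketch: the same question asked structurally and semantically.
graph = {
    ("perceptron:satmel", "rdf:type", "perceptron:Raaga"),
    ("perceptron:shuddhatoodi", "rdf:type", "perceptron:MelakarthaRaaga"),
    ("perceptron:satmel", "perceptron:hasSwara", "perceptron:sa"),
    ("perceptron:MelakarthaRaaga", "rdfs:subClassOf", "perceptron:Raaga"),
}


def instances_structural(graph, cls):
    """Plain triple lookup: only explicitly asserted types are found."""
    return {s for s, p, o in graph if p == "rdf:type" and o == cls}


def subclasses(graph, cls):
    """`cls` plus every class that reaches it through rdfs:subClassOf."""
    found, todo = {cls}, [cls]
    while todo:
        current = todo.pop()
        for s, p, o in graph:
            if p == "rdfs:subClassOf" and o == current and s not in found:
                found.add(s)
                todo.append(s)
    return found


def instances_semantic(graph, cls):
    """Query-time inference: instances of the class or of any subclass."""
    classes = subclasses(graph, cls)
    return {s for s, p, o in graph if p == "rdf:type" and o in classes}


structural = instances_structural(graph, "perceptron:Raaga")
semantic = instances_semantic(graph, "perceptron:Raaga")
```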
  43. Introducing SPARQL
      • Designed to query collections of triples…
      • …and to easily traverse relationships
      • SQL-like syntax (SELECT, WHERE)
      • Matches graph patterns
      SPARQL: Key Characteristics
      • Graph pattern matching ability
        • Capability to restrict matches on a queried graph by providing a graph pattern
        • Which consists of one or more RDF triple patterns, to be satisfied in a query
      • Variable binding results
        • Returns zero or more bindings of variables.
        • Each set of bindings is one way that the query can be satisfied by the queried graph
      • Sub-graph results
        • It must be possible for query results to be returned as a sub-graph of the original graph
      • Result limits
        • Possible to specify an upper bound on the number of query results returned
      • Streaming results
        • Possible for the client to request that results be streamed
      • WSDL support
        • The protocol, including its interfaces, their operations, results and types, is described using WSDL
  44. Sample Model Fragment (diagram)

          PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
          PREFIX perceptron: <http://www.perceptronnetwork.com/ontologies/2010/...>
          SELECT ?raaga
          WHERE { ?raaga rdf:type perceptron:Raaga }

      Let us take a closer look (at the query above)
      • PREFIX
        • Defines an alias for the namespace
      • SELECT
        • Select query
      • FROM
        • Optional clause, specifies the URI of the model
      • Variables
        • Marked by either ? or $
      • WHERE
        • Usually the most significant part of a SPARQL query
        • Each triple is one condition (a filter on the graph), expressed using Turtle syntax
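How triple patterns turn into variable bindings can be sketched with a tiny matcher. This is not how a SPARQL engine is implemented; `match` and `select` are hypothetical names, all terms are plain strings, and variables are marked with a leading `?` as in the query above.

```python
# Sketch of graph pattern matching with variable bindings.
def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`; None on failure."""
    binding = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if binding.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return binding


def select(graph, patterns):
    """Return every set of bindings that satisfies all triple patterns."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for solution in solutions:
            for triple in graph:
                extended = match(pattern, triple, solution)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


graph = {
    ("perceptron:satmel", "rdf:type", "perceptron:Raaga"),
    ("perceptron:hanumatodi", "rdf:type", "perceptron:Raaga"),
    ("perceptron:satmel", "perceptron:hasSwara", "perceptron:sa"),
}

raagas = select(graph, [("?raaga", "rdf:type", "perceptron:Raaga")])
with_swara = select(graph, [
    ("?raaga", "rdf:type", "perceptron:Raaga"),
    ("?raaga", "perceptron:hasSwara", "?swara"),
])
```

Each dictionary in the result is one way the pattern can be satisfied by the graph, which mirrors the "variable binding results" characteristic listed earlier.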
  45. SPARQL Select
      • Challenge
        • The number of Raagas is huge; paginate the results

          PREFIX perceptron: <someURI>
          PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
          SELECT DISTINCT ?s
          FROM <rmi://localhost/server1#perceptron>
          WHERE { ?s rdf:type perceptron:Raaga }
          ORDER BY DESC(?s)
          LIMIT 2
          OFFSET 2

      • DISTINCT: remove duplicate results
      • LIMIT n: limit the returned values to n rows
      • OFFSET n: skip the first n results
      • LIMIT and OFFSET together can be used to paginate the results
      • ORDER BY: sort the results, ASC or DESC; it is written before LIMIT and OFFSET
      Example of a Complex Query
      • Challenge
        • Search for the top 10 songs (by popularity) in my favorite Raaga
      • The PREFIX declarations for dc:, prefs:, pop: and perceptron: are omitted on the slide

          SELECT ?rendition
          FROM <rmi://localhost/server1#perceptron>
          FROM NAMED <my-preferences.rdf>
          FROM NAMED <popularity.rdf>
          WHERE {
            GRAPH <my-preferences.rdf> {
              ?me dc:name "My Name" .
              ?me prefs:favouriteRaaga ?fav_raga .
            } .
            ?composition perceptron:hasRaaga ?fav_raga .
            ?rendition perceptron:hasComposition ?composition .
            GRAPH <popularity.rdf> {
              ?rendition pop:popularity ?popularity
            }
          }
          ORDER BY DESC(?popularity)
          LIMIT 10
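The effect of DISTINCT, ORDER BY, OFFSET and LIMIT on a list of results can be sketched with a sort and a slice. The `paginate` helper is a hypothetical name, and the raaga names are taken from the earlier slides only to have something to sort.

```python
# Sketch: the solution modifiers of the paginated query, applied to a list.
def paginate(values, limit, offset, descending=False):
    """Deduplicate, sort, skip `offset` items, then keep `limit` items."""
    ordered = sorted(set(values), reverse=descending)
    return ordered[offset:offset + limit]


raagas = ["Kalyani", "Mohanam", "Hindolam", "Mohanam", "Sreeragam", "Kalyani"]

# Equivalent in spirit to: SELECT DISTINCT ... ORDER BY DESC(?s) LIMIT 2 OFFSET 2
page_two = paginate(raagas, limit=2, offset=2, descending=True)
```

With four distinct names sorted in descending order, the second page of size two is `["Kalyani", "Hindolam"]`; asking for a page past the end simply yields an empty list.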
  46. How this Query Would Work
      [Figure: the query variables bound against three graphs]
      - My-favourites.rdf: ?me is Person:XXX, which has dc:name "My Name" and prefs:favouriteRaaga Mohanam (bound to ?fav_raga)
      - perceptron.owl: ?composition is Chandracharita, which has perceptron:hasRaaga Mohanam; ?rendition is rend-chandracharita-bmk, which has perceptron:isRenditionOf Chandracharita
      - popularity.rdf: ?rendition has pop:popularity, bound to ?popularity

      Additional SPARQL constructs
      - INSERT
        - INSERT DATA { triples block }
      - DELETE
        - DELETE DATA { triples block }
      - OPTIONAL
        - Defines an optional condition; a solution is kept even if the optional part does not match
      - UNION
        - Matches alternative graph patterns
      - FILTER
        - Value-based filters
      - ASK
        - Returns true or false, depending on whether the pattern matches
      - DESCRIBE
        - Returns an RDF graph describing the resources that match the given condition
  47. Technologies & Tools: SEMANTIC WEB INFRASTRUCTURE

      Objective
      - How do we build an application?
      - How do we build the ontology?
      - What are the key architecture components?
      - What are the tools & technologies to use?
      - How do I choose which technology to use?
  48. Semantic Web Application Lifecycle
      - Information Modelling
        - Build Ontology (model level representation)
      - Information Assimilation
        - Populate Knowledgebase from various sources
          - Including current applications
        - Automatic Semantic Annotation of existing data
          - Any type of document, multiple sources of documents
      - Information Retrieval
        - Applications: search, integrate/portal, summarize/explain, analyse, decision support
        - Reasoning techniques: graph analysis, inferencing

      Semantic Web Application Lifecycle
      [Figure: lifecycle diagram]
      - Stages: Build Information Model; Refine/Evolve Information Model; Create Assimilation Models & Aggregate Knowledge; Retrieve and Use Semantic Data
      - Ontology Editors: Protégé, TopBraid Composer
      - Technologies: GRDDL, RDFizers, OWLs, Automatic Annotation
      - RDF Stores: Mulgara, Sesame
      - Programming: Jena
      - Semantic Query Server
  49. Information Modeling
      - An Information Model consists of
        - Description Component – Schema
          - Designed by domain experts and the community
        - Description Base – Assertions, Extensions
          - Populated by automated agents that assimilate the information
      - Modelling is an evolutionary process
        - Start with a concept map or a taxonomy
        - Work up to a formal ontology
        - The model evolves over the lifetime of the application
      - Tools & Technologies
        - Popular modeling tools – Protégé, TopBraid Composer, GrOWL
        - Concept Mapping – CMAP Tools, CMAP Tools COE
        - http://www.xml.com/2002/11/06/Ontology_Editor_Survey.html

      RDF Stores, Triple Stores: PERSISTING RDF DATA
  50. Storage Models for RDF
      - SQL Storage (hand-crafted SQL, ORM)
        - Direct SQL access to RDF data
        - Queries become complex, as discussed earlier
        - Cannot completely deal with the semantics of the data
      - SQL Bridge (tools such as SDB, SquirrelRDF)
        - Access via SPARQL interface
        - Adaptor transforms the SPARQL query into SQL
        - Data stored in a relational schema
      - Native RDF (native non-relational database, such as Mulgara)
        - Access via SPARQL interface
        - Data stored in native RDF databases

      Storage Models for RDF
      [Figure: the three storage models side by side]
      - SQL Storage: Application, SQL Queries, relational storage (Relational Schema / Relational Triples)
      - SQL Bridge: Application, SPARQL Queries, Adaptor, Relational Triples
      - Native RDF: Application, SPARQL Queries, Native RDF Graph
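To make the SQL Bridge idea concrete, here is a minimal sketch of what an adaptor does for a single triple pattern. It is not how SDB or SquirrelRDF work internally; it assumes a hypothetical three-column table `triples(subject, predicate, object)`, and the class `TriplePatternToSql` is invented for this illustration. Variables become projected columns and constants become WHERE conditions.

```java
import java.util.ArrayList;
import java.util.List;

/** Translates one SPARQL-style triple pattern into SQL over a hypothetical "triples" table. */
final class TriplePatternToSql {

    /** Terms starting with '?' are variables; all other terms are treated as constants. */
    static String toSql(String subject, String predicate, String object) {
        List<String> columns = new ArrayList<>();    // variables become projected columns
        List<String> conditions = new ArrayList<>(); // constants become WHERE conditions
        addTerm("subject", subject, columns, conditions);
        addTerm("predicate", predicate, columns, conditions);
        addTerm("object", object, columns, conditions);

        String sql = "SELECT " + (columns.isEmpty() ? "1" : String.join(", ", columns)) + " FROM triples";
        if (!conditions.isEmpty()) {
            sql += " WHERE " + String.join(" AND ", conditions);
        }
        return sql;
    }

    private static void addTerm(String column, String term, List<String> columns, List<String> conditions) {
        if (term.startsWith("?")) {
            columns.add(column + " AS " + term.substring(1));
        } else {
            // Quote doubling keeps the sketch readable; real code should use prepared statements.
            conditions.add(column + " = '" + term.replace("'", "''") + "'");
        }
    }

    public static void main(String[] args) {
        System.out.println(toSql("?raaga", "rdf:type", "perceptron:Raaga"));
    }
}
```

A query with several triple patterns would turn into self-joins on the same table, one per pattern, which is one reason hand-written SQL over triples becomes complex. The sketch also ignores a variable that is repeated within one pattern.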
  51. Storage Models for RDF
      [Figure: the same three storage models, annotated]
      - SQL Storage is labelled as the traditional approach; the diagram also carries the label "RDF Databases"
      - SQL Storage: Application, SQL Queries, relational storage (Relational Schema / Relational Triples)
      - SQL Bridge: Application, SPARQL Queries, Adaptor, Relational Triples
      - Native RDF: Application, SPARQL Queries, Native RDF Graph

      Storing RDF Data – Relational Model
      - Relational Schema
        - A data model consisting of well-defined tables and columns, each with a fixed meaning
        - To provide flexibility, we would have to use techniques that dynamically update the schema
        - Each row in a table still has the same shape, whereas the fundamental premise of RDF is that each row is potentially unique
  52. Storing RDF Data – Triple forms
      - Storing RDF predicates in a relational database
        - Use the attributes as "extended properties"
        - Simple triple forms (subject – predicate – object)
        - Predicate tables (one table per predicate)
      - This gives the requisite flexibility but makes indexing and retrieval difficult

      Storing RDF Data – RDF Stores
      - RDF Stores
        - Provide a mechanism to store RDF data
        - Provide a mechanism to query it using languages such as SPARQL
        - May or may not be based on relational databases internally
        - Also known as Graph Databases or Schema-less Databases
          - Although not all graph databases support RDF
      - Examples
        - Jena SDB, TDB
        - Oracle 11g RDF Database
        - Mulgara Semantic Store
        - OpenLink Virtuoso
        - AllegroGraph
        - OpenRDF Sesame
  53. Programming APIs, Technology Stack: PROGRAMMING FOR THE SEMANTIC WEB

      Let us Revisit SPARQL
      - SPARQL
        - SPARQL Protocol and RDF Query Language
        - It is both a protocol and a query language
      - Protocol Definition
        - Defines a mechanism for invoking a query over the web
        - Equivalent to a web service definition
        - Defines one operation
          - Query, with its input, output and fault definitions
        - Defines the protocol bindings
          - HTTP (GET/POST)
          - SOAP (web service)
        - An implementation may provide either or both of these bindings
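For the HTTP GET binding, the query travels as a URL-encoded `query` parameter on the endpoint URL. The sketch below only builds that URL and does not send a request; the class `SparqlProtocolUrl` and the endpoint address are hypothetical, and it assumes the endpoint URL has no query string of its own.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Builds the request URL for a SPARQL Protocol query over HTTP GET. */
final class SparqlProtocolUrl {

    /** Appends the URL-encoded query text as the "query" parameter. */
    static String queryUrl(String endpoint, String sparql) {
        return endpoint + "?query=" + URLEncoder.encode(sparql, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical endpoint; substitute the address of a real SPARQL service.
        System.out.println(queryUrl("http://example.org/sparql", "ASK { ?s ?p ?o }"));
        // prints http://example.org/sparql?query=ASK+%7B+%3Fs+%3Fp+%3Fo+%7D
    }
}
```

`URLEncoder` uses form encoding, so spaces become `+`. Long queries are usually sent with POST instead, because some servers limit URL length.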
  54. Architecture Stack of Semantic Technologies
      [Figure: Semantic Technology Stack]
      - Application
      - HTTP, SOAP, Programming API
      - Semantic Middleware, e.g. Semantic SOA
      - SPARQL Processor, Inference Engine
      - RDF-SQL Adaptor, RDF Store
      - Relational Store

      RDF Programming Stack – Jena
      - The most popular stack for Java is Jena (http://jena.sourceforge.net)
        - Jena also has an in-memory graph manipulation API
      - Jena's APIs for RDF, RDFS and OWL are among the most popular
        - Jena uses a graph-based model and a pluggable architecture
        - Various storage back ends, reasoners and so on can be plugged into the API
      - Many RDF stores either have or are building support for the Jena API
      - Layering
        - Jena uses a Graph model as its core
        - All reasoners also assert their inferences into a Graph
        - Thus the output of a reasoner can be fed into another layer
        - Such layering is a common pattern with Jena
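The layering pattern described above can be sketched without any library. The code below is not the Jena API: `LayeredGraphSketch`, its `Graph` interface and the `ex:` sample terms are hypothetical, and the only rule applied is a simplified rdfs:subClassOf rule. The point it illustrates is that a reasoning layer consumes a graph and exposes a graph, so layers can be stacked.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Library-free sketch of stacking a reasoning layer on top of a graph. */
final class LayeredGraphSketch {

    /** A graph is anything that can list its triples as [subject, predicate, object]. */
    interface Graph {
        Set<List<String>> triples();
    }

    /** Base layer holding only asserted statements. */
    static Graph base(List<List<String>> statements) {
        Set<List<String>> asserted = new LinkedHashSet<>(statements);
        return () -> asserted;
    }

    /** Adds (x rdf:type D) whenever (x rdf:type C) and (C rdfs:subClassOf D) are visible. */
    static Graph subClassLayer(Graph inner) {
        return () -> {
            Set<List<String>> out = new LinkedHashSet<>(inner.triples());
            boolean changed = true;
            while (changed) { // repeat until no new triple is inferred
                changed = false;
                for (List<String> typed : List.copyOf(out)) {
                    if (!typed.get(1).equals("rdf:type")) {
                        continue;
                    }
                    for (List<String> sub : List.copyOf(out)) {
                        if (sub.get(1).equals("rdfs:subClassOf") && sub.get(0).equals(typed.get(2))) {
                            changed |= out.add(List.of(typed.get(0), "rdf:type", sub.get(2)));
                        }
                    }
                }
            }
            return out;
        };
    }

    /** Hypothetical sample data. */
    static Graph sample() {
        return base(Arrays.asList(
                List.of("ex:Mohanam", "rdf:type", "ex:JanyaRaaga"),
                List.of("ex:JanyaRaaga", "rdfs:subClassOf", "ex:Raaga"),
                List.of("ex:Raaga", "rdfs:subClassOf", "ex:MusicalConcept")));
    }

    public static void main(String[] args) {
        // The layer's output is itself a Graph, so it could be wrapped by a further layer.
        for (List<String> triple : subClassLayer(sample()).triples()) {
            System.out.println(triple);
        }
    }
}
```

Because `subClassLayer` returns a `Graph`, its result can be passed to another layer (for example a second, different rule set) in the same way a base graph is.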
  55. Components of Jena Stack
      [Figure: Jena components mapped onto the Semantic Technology Stack]
      - Joseki: HTTP, SOAP
      - Jena API: Programming API
      - ARQ: SPARQL Processor
      - Inference Engine
      - SDB (Abstraction): RDF-SQL Adaptor, on top of a Relational Store
      - TDB: RDF Store

      Connecting to RDF Database – RDF Programming APIs
      - RDF programming APIs are available in almost all major languages
        - C, C++, C# and .NET, Haskell, Java, JavaScript, Lisp, Obj-C, PHP, Perl, Prolog, Python, Ruby, Tcl/Tk
      - There is no standard client API (yet!), that is, no equivalent of JDBC
        - Each RDF package has its own set of APIs
        - Most often they follow similar paradigms of Connections, Factories and Sessions
      - The query syntax, however, is standardized
        - Thus queries are portable across RDF stores
        - At the same time, many packages have their own "dialects" of RDF query languages
      - Mulgara is one of the most popular open source RDF databases
        - It has its own API to connect to the RDF database
        - In addition to SPARQL, it has its own dialect (called iTQL)
  56. Connecting to Mulgara Database
      Sample Java code to connect to a Mulgara database using the Mulgara Client API:

          import java.net.URI;
          // The Mulgara client classes used below (ConnectionFactory, Connection,
          // SparqlInterpreter, Query, Answer, GraphAnswer, RdfXmlEmitter) must also be
          // imported; the slide does not show their packages.

          URI SERVER_URI = URI.create("rmi://servername/instancename");
          URI GRAPH_URI = URI.create("rmi://servername/instancename#model");
          String queryStr = "SELECT ?x WHERE { ?x <p:isDerivedFrom> <p:Kalyaani> }";

          // Declared outside the try block so that it is visible in the finally block
          Connection connection = null;
          try {
              // Create a new connection from the factory using the server URI
              ConnectionFactory factory = new ConnectionFactory();
              connection = factory.newConnection(SERVER_URI);

              // Initialize the SPARQL interpreter with the graph URI
              SparqlInterpreter interpreter = new SparqlInterpreter();
              interpreter.setDefaultGraphUri(GRAPH_URI);

              // Parse and execute the query on the connection
              Query query = interpreter.parseQuery(queryStr);
              Answer a = connection.execute(query);

              // Use the results: write them out as RDF/XML
              RdfXmlEmitter.writeRdfXml((GraphAnswer) a, System.out);
          } catch (Exception e) {
              e.printStackTrace();
          } finally {
              // Release the connection only if it was actually opened
              if (connection != null) {
                  connection.dispose();
              }
          }

      Adopting Existing SQL Databases
      [Figure not included in the transcript. Source: Tim Berners-Lee]
  57. Semantic Web Technologies
      [Figure not included in the transcript. Source: W3C]

      APPENDIX
  58. RDF Triples
      - Resources can use any URI, e.g.:
        - http://www.example.org/file.xml#element(home)
        - http://www.example.org/file.html#home
        - http://www.example.org/file2.xml#xpath1(//q[@a=b])
      - URIs can also denote non-Web entities:
        - http://www.ivan-herman.net/me is me
        - not my home page, not my publication list, but me
      - RDF triples form a directed, labelled graph

      RDF Primitives
      - Resources
        - Anything that can be uniquely identified
        - Using a URI
          - E.g. http://www.perceptron.net/indianmusic#A R Rahman
          - Namespace: http://www.perceptron.net/indianmusic#
          - Fragment ID: A R Rahman
      - Properties
        - Are resources in themselves
      - Literals
        - Strings
        - With optional data types
      - RDF is a general model for triples
  59. OWL Full
      - No constraints on any of the constructs
        - owl:Class is just syntactic sugar for rdfs:Class
        - owl:Thing is equivalent to rdfs:Resource
        - This means that:
          - A Class can also be an individual, and a URI can denote a property as well as a Class
          - E.g., it is possible to talk about a class of classes and to apply properties to them
      - An extension of RDFS in all respects
      - But: no system may exist that infers everything one might expect

      OWL Full usage
      - Nevertheless, OWL Full is essential
        - It gives a generic framework to express many things
        - Some applications just need to express and interchange terms
      - Applications may control what terms are used and how
        - In fact, they may define their own sub-language via, e.g., a vocabulary
        - Thereby ensuring a manageable inference procedure