Web3.0 seminar wipro-session2-logicalontological
- 1. 24-06-2010
Web 3.0, Semantics & Enterprise Computing
Session II – Logical Ontological
Satish, Sukumar,
Feroz Sheikh & Venkatesh S
www.canopusconsulting.com June 2010
Objective
Information interchange and modelling
An introduction to semantic web technologies
RDF, RDFS, OWL
Semantic modelling
Querying
The web 3.0 technology stack
© Canopus Consulting
- 2. 24-06-2010
The following Sessions will address:
How do we build an application?
How do we build the ontology?
What are the key architecture components?
What are the tools & technologies to use?
How do I choose which technology to use?
3
© Canopus Consulting
Semantic Web Application Lifecycle
Lifecycle stages: Build Information Model; Create Assimilation Models & Aggregate knowledge; Retrieve and Use Semantic Data; Refine/Evolve Information Model
Ontology Editors: Protégé, TopBraid Composer
Technologies: GRDDL, RDFizers, OWLs, Automatic Annotation
RDF Stores: Mulgara, Sesame
Programming: Jena
Semantic Query Server
© Canopus Consulting
2
- 3. 24-06-2010
Semantic Web Application Lifecycle
Information Modelling
Build Ontology (model level representation)
Information Assimilation
Populate Knowledgebase from various sources
Including current applications
Automatic Semantic Annotation of existing data
Any type of document, multiple sources of documents
Information Retrieval
Applications: search, integrate/portal, summarize/
explain, analyse, decisions support
Reasoning techniques: graph analysis, inferencing
© Canopus Consulting
Architecture Stack of Semantic Technologies
Semantic Technology Stack (layers, top to bottom):
Application
HTTP, SOAP
Programming API
Semantic Middleware (e.g. Semantic SOA): SPARQL Processor, Inference Engine
RDF-SQL Adaptor over a Relational Store, alongside an RDF Store
© Canopus Consulting
3
- 4. 24-06-2010
Semantic Web Technologies
7
Source: W3C
© Canopus Consulting
The Perceptron.Net Use case
A rich Cultural Informatics environment
designed to
Create, Collect, Categorize any type of cultural
artifact – Music, Literature, Travel, Leisure,
8
Entertainment..
Communities can be formed around content
Make use of existing information on the network
and existing community infrastructure
An example:
Indian Music cannot be categorized along the
same lines as Western Music
Genre, Album, Artist – is just not sufficient…
© Canopus Consulting
4
- 5. 24-06-2010
The Perceptron.Net use case…
Typical Queries we want to support:
Thematic Album Creation Ability:
Give me all songs from films directed by “X”, with music composed by “Y”, in which the hero was “Z”
Give me all songs in Raga Kalyani (must include film, folk and classical songs)
Give me all songs on Lord Rama in Sanskrit which are “stotras”
Give me all the recordings of live performances at Sri Krishna Gana Sabha, Chennai
© Canopus Consulting
The Perceptron.Net use case…
Provide an exploratory interface:
Specify generic criteria and successively filter until you find what you need. E.g. specify a “mood” or a song you like and ask for “similar” songs or songs that match such a mood.
Allow the community to add content and meta-data, and to find new connections in the content.
Content can be anywhere on the Internet
Raaga.com, HamaraCd.com, MusicToday.Com, Orkut groups, blogs, websites
Not only music, but also content “about” music – articles, essays, ratings, discussions – which should be used in connecting, searching and enriching the content
Provide feeds so that Facebook-type plug-ins can be developed easily, and content and queries can be shared/updated from anywhere.
© Canopus Consulting
5
- 7. 24-06-2010
We start with a music site
Or HamaraCD or India
Times Shopping or …
And we go through a fairly
exhaustive categorization
…
13
By actor
By album name
Release year
© Canopus Consulting
Or go to a site with a filmography
And drill down to some
of our favorite ones
14
© Canopus Consulting
7
- 8. 24-06-2010
But what if we wanted to
Get all songs for Amitabh Bachchan that are from movies directed by Yash Chopra?
Or what if we wanted to
Get all songs of Amitabh Bachchan in which the playback singer won an award?
© Canopus Consulting
8
- 9. 24-06-2010
We could do this
By hand
Make a list of movies from the filmography site
Go to Raaga and get songs for those movies
But what if there are multiple sources?
17
© Canopus Consulting
Across Multiple Sources
18
© Canopus Consulting
9
- 10. 24-06-2010
The problem
Implicit to Explicit
Logic & domain knowledge is “Implicit” in an application
In Code, DB Schema, Documentation etc.
Humans can interpret it, but automated agents can’t
List<SongList>: getSongsInAlbum();
SongID               Sequence   Duration
Yaara Seeli Seeli    1          4:50
Dekha Ek Khwab toh   2          2:55
…                    …          …
For agents, this logic needs to be “Explicitly” stated
© Canopus Consulting
To enable agents,
Extract the Logic as Metadata
The logic & domain knowledge is embedded in the application
In object models, class hierarchies, instance names
This needs to be extracted out, for others to link, access and use
Deliver both data and metadata in a uniform manner
20
© Canopus Consulting
10
- 11. 24-06-2010
We need
Common Language
Agents need a common language to interchange information
Derive Conclusions
Logic, that is now explicitly stated, can be used to draw conclusions
21
Allow Independent Evolution
Every application should be able to evolve independently
Both at data level (new data) or schema level (new knowledge)
Support Incremental Assimilation
Different source can contain/provide different aspects of knowledge
Community participation to evolve the knowledge
Manifests itself in the principle of AAA (Anyone can say anything
about any topic)
© Canopus Consulting
Steps In Information Interchange
1. Map the various data onto an abstract data representation
Make the data independent of its internal representation…
Expose it in a standard format
2. Merge the resulting representations to create a single model
3. Make queries on the whole
Queries that could not have been done on the individual data sets
© Canopus Consulting
11
- 12. 24-06-2010
Steps In Information Interchange
23
© Canopus Consulting
Enabling Technologies
Semantic technologies (specifically RDF) treat metadata as data and exchange both in exactly the same way.
They provide a way for anyone to make a basic statement about anything and a means of layering these statements into a model.
© Canopus Consulting
12
- 13. 24-06-2010
Interchange Without APIs
If the information provider can now produce an RDF stream of data and metadata, the assimilation agent can combine streams from all sources and treat them as if they are from the same source
The only thing the agent needs to understand is the model of RDF, and not the individual application APIs and models
This is the fundamental premise of information interchange on the web 3.0.
© Canopus Consulting
First Expose the Data
As a set of relations
Using RDF
[Diagram: Song Data graph. Nodes http://.../#JaiHo and http://.../#ARR; literals “Jai Ho”, “Slumdog…”, “A R Rahman”; edges p:title, p:movie, p:composer, p:name.]
© Canopus Consulting
13
- 14. 24-06-2010
Second Merge Data from different Sources
Same URI = Same Resource
[Diagram: the Song Data graph (nodes http://.../#JaiHo and http://.../#ARR; literals “Jai Ho”, “Slumdog…”, “A R Rahman”; edges p:title, p:movie, p:composer, p:name) beside the Performances Data graph (nodes http://.../#JaiHo-ARR, http://.../#JaiHo and http://.../#ARR; edges p:sang, p:performed_at, p:music_director). Nodes carrying the same URI in both graphs are marked as the same resource.]
© Canopus Consulting
Merging Identical Resources
[Diagram: the Song Data and Performances Data graphs joined at the shared nodes http://.../#JaiHo and http://.../#ARR. Edges shown: p:title, p:movie, p:composer, p:name, p:sang, p:performed_at, p:music_director; the performance node is http://.../#JaiHo-ARR.]
© Canopus Consulting
14
- 15. 24-06-2010
Third Query the data
As if they are from the same source
Answer questions that were not possible from either source
alone
For example
What is the title of the song sung by A R Rahman at Fireflies Music Festival?
[Diagram: the merged graph. Nodes http://.../#JaiHo, http://.../#JaiHo-ARR, http://.../#ARR and http://.../#FFM; literals “Jai Ho”, “Slumdog…”, “A R Rahman”; edges p:title, p:movie, p:composer, p:name, p:sang, p:performed_at, p:music_director.]
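The merge-then-query idea can be sketched in a few lines of plain Python, treating each source as a set of (subject, predicate, object) tuples. This is only an illustration, not the deck's implementation: the `http://example.org/` URIs are placeholders for the elided ones, and the direction of the `p:sang`, `p:performed_at` and `p:music_director` edges is an assumption read off the diagram.

```python
# Placeholder URIs (assumption): the slides elide the real ones as http://.../
JAI_HO = "http://example.org/#JaiHo"
ARR = "http://example.org/#ARR"
PERFORMANCE = "http://example.org/#JaiHo-ARR"
FESTIVAL = "http://example.org/#FFM"

# Source 1: song data. Source 2: performances data. Edge directions are assumed.
song_data = {
    (JAI_HO, "p:title", "Jai Ho"),
    (JAI_HO, "p:composer", ARR),
    (ARR, "p:name", "A R Rahman"),
}
performance_data = {
    (PERFORMANCE, "p:sang", JAI_HO),
    (PERFORMANCE, "p:performed_at", FESTIVAL),
    (PERFORMANCE, "p:music_director", ARR),
}


def merge(*sources):
    """Merging is a set union: identical URIs collapse into one resource."""
    merged = set()
    for source in sources:
        merged |= source
    return merged


def objects(graph, subject, predicate):
    return {o for (s, p, o) in graph if s == subject and p == predicate}


def subjects(graph, predicate, obj):
    return {s for (s, p, o) in graph if p == predicate and o == obj}


def titles_performed_at(graph, artist, venue):
    """Titles of songs the artist performed at the venue."""
    titles = set()
    performances = subjects(graph, "p:performed_at", venue) & subjects(
        graph, "p:music_director", artist
    )
    for performance in performances:
        for song in objects(graph, performance, "p:sang"):
            titles |= objects(graph, song, "p:title")
    return titles


merged = merge(song_data, performance_data)
print(titles_performed_at(merged, ARR, FESTIVAL))
```

Neither source can answer the question alone: the title lives in the song data and the venue in the performances data, so the query only succeeds on the merged set.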
© Canopus Consulting
Fourth Adding Additional Knowledge
We can assert that the composer & music_director are same
p:composer sameAs p:music_director
Very handy for model transformations, language translations, de-duplication etc.
Now we can ask queries not possible with either of the sources
“Composers” of songs played at Fireflies Music Festival
[Diagram: the merged graph as before, with p:composer and p:music_director both leading to http://.../#ARR. Nodes http://.../#JaiHo, http://.../#JaiHo-ARR, http://.../#ARR; literals “Jai Ho”, “Slumdog…”, “A R Rahman”; edges p:title, p:movie, p:composer, p:name, p:sang, p:performed_at, p:music_director.]
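A minimal sketch of what asserting that two properties are the same buys us, in plain Python over (subject, predicate, object) tuples. The URIs and the helper name `expand_equivalent_properties` are illustrative assumptions; a real store would do this through its reasoner rather than by copying triples.

```python
# Placeholder URIs (assumption): the slides elide the real ones.
ARR = "http://example.org/#ARR"
PERFORMANCE = "http://example.org/#JaiHo-ARR"
FESTIVAL = "http://example.org/#FFM"

graph = {
    (PERFORMANCE, "p:performed_at", FESTIVAL),
    (PERFORMANCE, "p:music_director", ARR),
}


def expand_equivalent_properties(triples, equivalences):
    """For each pair of equivalent properties, restate every triple with the other name."""
    expanded = set(triples)
    for first, second in equivalences:
        for (s, p, o) in triples:
            if p == first:
                expanded.add((s, second, o))
            elif p == second:
                expanded.add((s, first, o))
    return expanded


def composers_at(triples, venue):
    """Everything reachable via p:composer from a performance at the venue."""
    performances = {s for (s, p, o) in triples if p == "p:performed_at" and o == venue}
    return {o for (s, p, o) in triples if p == "p:composer" and s in performances}


expanded = expand_equivalent_properties(graph, [("p:composer", "p:music_director")])
print(composers_at(graph, FESTIVAL))     # nothing is asserted with p:composer
print(composers_at(expanded, FESTIVAL))  # found through the equivalence
```

The performances source never used the word "composer", yet the query phrased in the song source's vocabulary now returns a result.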
© Canopus Consulting
15
- 16. 24-06-2010
Semantic Web Technologies
Common Language: RDF
Reasoning & Inferences: RDFS, OWL
Assimilation & Retrieval: RDF DBs, SPARQL
RDF: Resource Description Framework - defines the structure of triples.
RDFS: RDF Schema - provides basic elements for the description of RDF vocabularies.
OWL: Web Ontology Language - adds semantics to the schema.
SPARQL: Protocol and RDF query language.
© Canopus Consulting
Expressing Knowledge
Explicit Knowledge – Semantic Models - Ontology
Formal representation of the knowledge by a set of concepts
within a domain and the relationships between those concepts
It is used to reason about the properties of that domain
And may be used to describe the domain
32
© Canopus Consulting
16
- 17. 24-06-2010
Mapping to Information Interchange
33
Source: W3C
© Canopus Consulting
Resource Description Framework
WHAT IS RDF?
© Canopus Consulting
17
- 18. 24-06-2010
Paradigms of Information: Spec for RDF
AAA - Anyone can say Anything about Anything
Consistency is not a necessary condition
For example, source 1 – Yaara Sili Sili is a sad song
Source 2 – Yaara Sili Sili is a famous song
Community may provide music theory attributes to the song
To each their own
No common schema, yet the ability to make globally valid statements
For example, information about raagas can be combined with information on film music
There is always one more
Open world assumption - facts can be, and always are, incrementally added
For example, one source may classify Lalbagh as a tourist place
Another source may classify Lalbagh as a garden
Application understands that gardens are suitable for “morning walk”
© Canopus Consulting
Sample Data
Song name              Raaga      Sung by                Duration
Nanu palimpa           Mohanam    Dr Balamuralikrishna   42 mins
Varuga varugave        Mohanam    MS Subbulakshmi        8 mins
Kallalo Kannulalo      Kalyani    Leela                  6 mins
Pranati Pranati        Kalyani    S P Balu               16 mins
Piluvukara Alugukara   Hindolam   Ghantasala             6.3 mins
© Canopus Consulting
18
- 19. 24-06-2010
Integrating Data from different sources
Source 1
Pranati Pranati     Kalyani   S P Balu          16 mins
Source 2
Kallalo Kannulalo   Kalyani   Leela             6 mins
Source 3
Varuga varugave     Mohanam   MS Subbulakshmi   8 mins
Data distribution row by row – all participants must agree to the common schema
© Canopus Consulting
Interchange by Column
Source 1
id   Song name              Raaga
1    Nanu palimpa           Mohanam
2    Varuga varugave        Mohanam
3    Kallalo Kannulalo      Kalyani
4    Pranati Pranati        Kalyani
5    Piluvukara Alugukara   Hindolam
Source 2
id   Sung by
1    Dr Balamuralikrishna
2    M S Subbulakshmi
3    Leela
4    S P Balu
5    Ghantasala
Distributing by column
Each participant must agree to the unique identifier
© Canopus Consulting
19
- 20. 24-06-2010
Interchange by Cell
Source 1: Row 1, Sung by = Dr Balamuralikrishna
Source 2: Row 1, Song name = Nanu palimpa
Source 3: Row 1, Raaga = Mohanam
Distributing by cell
Each participant must agree to row ID & column name
© Canopus Consulting
The Cell
Dividing the data cell by cell lets each source contribute a single value independently
Only the row ID and the column need to be identified
This is what RDF is -> TRIPLES of Data
Subject -> Predicate -> Object
Subject and Predicate have unique identifiers; the object can be a literal or an identifier
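The row/column/cell view maps directly onto triples: the row ID becomes the subject, the column name the predicate and the cell value the object. The sketch below shows this with plain Python; the function name `table_to_triples` and the `row1` identifier are illustrative assumptions, and in real RDF the subject and predicate would be URIs.

```python
def table_to_triples(rows, id_column):
    """Turn each cell of a table into one (subject, predicate, object) triple."""
    triples = []
    for row in rows:
        subject = row[id_column]
        for column, value in row.items():
            if column != id_column:
                triples.append((subject, column, value))
    return triples


# One row of the sample data; "row1" is a made-up row identifier.
rows = [
    {
        "id": "row1",
        "Song name": "Nanu palimpa",
        "Raaga": "Mohanam",
        "Sung by": "Dr Balamuralikrishna",
    }
]

triples = table_to_triples(rows, "id")
for triple in triples:
    print(triple)
```

Because every triple carries its own row ID and column name, the three cells could just as well have come from three different sources.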
© Canopus Consulting
20
- 21. 24-06-2010
RDF Expressions – Triples
Triples - Subject, Predicate, Object
Subject: Must be a Resource
Predicate: Must be a Resource
Object: Can be a Resource or a Literal
Resource
<subject> has a property <predicate>, whose value is <object>
A labelled connection between two resources
E.g. Song: Jai Ho has a property composer whose value is A R Rahman
E.g. MelakartaRaaga has a property NumberOfSwaras whose value is 7
Resource Resource Literal
© Canopus Consulting
Rules of RDF
Global Uniqueness
The RDF URI and names must be unique globally
Sentence Form
The order of knowledge representation in the sentence should not change
So the autonomous agents can consume that metadata
Reuse
If a document refers to an existing resource, then it is talking about that same global resource
© Canopus Consulting
21
- 22. 24-06-2010
Result
Anybody can say Anything about Anything: Every statement is an atomic RDF sentence
To each his own: Every subject, object and predicate in RDF is qualified
There is always one more: RDF is a graph with no beginning and no end; an additional statement can always be added to the graph
© Canopus Consulting
More rdf
RDF blank nodes and their usage
Identify an abstract concept – “there exists some”
My ideal friend
The ID is always local to the document and need not be the same across documents
Reification
RDF containers
Bag – unordered collection
Seq – ordered collection
Alt – unordered set of equivalent alternatives
© Canopus Consulting
22
- 23. 24-06-2010
rdf:type
rdf:type is a property that provides an elementary typing system
song:yaara-seeli-seeli rdf:type HindiSongs:FilmSong
Does not make any assumption that HindiSongs:FilmSong is a class – it is a resource
RDF does not have a definition for class
© Canopus Consulting
Bringing in the meaning
WHAT IS RDFS
© Canopus Consulting
23
- 24. 24-06-2010
Need for RDF schemas
First step towards the “extra knowledge”:
define the terms we can use
what restrictions apply
what extra relationships are there?
47
Officially: “RDF Vocabulary Description Language”
the term “Schema” is retained for historical reasons…
Source: W3C
© Canopus Consulting
Classes, Resources
RDFS defines resources and classes:
everything in RDF is a “resource”
“classes” are also resources, but they are also a collection of possible resources (i.e., “individuals”)
“composer”, “mood” are classes
A R Rahman is an individual
Love is an individual
© Canopus Consulting
24
- 25. 24-06-2010
Classes, Resources (contd.)
Relationships are defined among classes and
resources:
“typing”: an individual belongs to a specific class
“«A R Rahman» is a composer”
to be more precise: “«http://.../#A R Rahman» is a composer”
“subclassing”: all instances of one are also the instances of the other (“every novel is a fiction”)
RDFS formalizes these notions in RDF
© Canopus Consulting
Classes, resources in RDF(S)
[Diagram: http://perceptron.net/indianmusic#A R Rahman rdf:type #composer; #composer rdfs:subClassOf #artist]
RDFS defines the meaning of these terms
A resource may belong to several classes
rdf:type is just a property…
“«A R Rahman» is a composer and «composer» is an «artist»”
The type information may be very important for applications
e.g., it may be used for a categorization of possible nodes
© Canopus Consulting
25
- 26. 24-06-2010
Inferred Properties
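[Diagram: http://perceptron.net/indianmusic#A R Rahman rdf:type #composer; #composer rdfs:subClassOf #artist; inferred edge: http://perceptron.net/indianmusic#A R Rahman rdf:type #artist]
The triple “«A R Rahman» rdf:type #artist”
is not in the original RDF data
Can be inferred from the RDFS rules
RDFS environments return that triple, too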
© Canopus Consulting
Classes and Sub-Classes
Challenge:
We have instance data about SuddaSwaras and VikritSwaras.
We however often want to use Swaras to mean both – for example, to say that some Raaga has 7 Swaras
How do we state this?
<owl:Thing rdf:about="#ga">
  <rdf:type rdf:resource="#SuddaSwara"/>
</owl:Thing>
<owl:Thing rdf:about="#ra">
  <rdf:type rdf:resource="#SuddaSwara"/>
</owl:Thing>
<owl:Thing rdf:about="#ni">
  <rdf:type rdf:resource="#VikritSwara"/>
</owl:Thing>
© Canopus Consulting
26
- 27. 24-06-2010
Classes and Sub classes
<rdfs:Class rdf:ID="SuddaSwara">
  <rdfs:subClassOf rdf:resource="#Swara"/>
</rdfs:Class>
Class1 rdfs:subClassOf Class2
Class1 is a specialization of Class2, membership in Class1 implies membership in Class2, properties of Class2 are inherited by Class1
A class can be subClass of multiple classes
Class 1 subClassOf Class 2
Class 1 subClassOf Class 3
Instances are specified using rdf:type
© Canopus Consulting
Inference – formal rules
The RDF Semantics document has a list of (33) entailment rules:
“if such and such triples are in the graph, add this and this”
do that recursively until the graph does not change
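The "apply the rules until the graph stops changing" procedure can be sketched as a small fixpoint loop. The sketch below covers only two of the RDFS entailment rules, subclass transitivity (rdfs11) and type propagation along rdfs:subClassOf (rdfs9); the `#person` class is an invented addition used to show the recursion, and real reasoners use far more efficient strategies.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def rdfs_closure(triples):
    """Repeatedly apply two RDFS rules until no new triple is produced."""
    graph = set(triples)
    while True:
        new = set()
        for (s, p, o) in graph:
            if p != SUBCLASS_OF:
                continue
            for (s2, p2, o2) in graph:
                # rdfs11: s subClassOf o, o subClassOf o2 => s subClassOf o2
                if p2 == SUBCLASS_OF and s2 == o:
                    new.add((s, SUBCLASS_OF, o2))
                # rdfs9: s2 type s, s subClassOf o => s2 type o
                if p2 == RDF_TYPE and o2 == s:
                    new.add((s2, RDF_TYPE, o))
        if new <= graph:  # fixpoint: nothing was added in this pass
            return graph
        graph |= new


data = {
    ("#ARRahman", RDF_TYPE, "#composer"),
    ("#composer", SUBCLASS_OF, "#artist"),
    ("#artist", SUBCLASS_OF, "#person"),  # hypothetical extra class
}

closure = rdfs_closure(data)
for triple in sorted(closure - data):
    print("inferred:", triple)
```

The inferred triples are exactly the ones an RDFS-aware environment would return alongside the asserted data.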
© Canopus Consulting
27
- 28. 24-06-2010
Properties and Sub Properties
Properties and classes are defined separately from each other
Property is not owned by any class
Range and domain of properties can be specified
What type of resources serve as object and subject
Sub Property
rdfs:subPropertyOf is used to define one property as a sub-property of another
The sub-property inherits the domain and range definitions of the property
© Canopus Consulting
What does it look like
56
Data
Metadata
Inferred Assertions
© Canopus Consulting
28
- 29. 24-06-2010
Properties , domains and ranges
It is still rdf:Property, not rdfs:Property; however, RDFS adds the notion of a domain and range to an rdf:Property
<rdf:Property rdf:ID="directed_by">
  <rdfs:domain rdf:resource="#MusicDirector"/>
  <rdfs:range rdf:resource="#FilmSongs"/>
</rdf:Property>
Domain – what resources does the property apply to
Range – what are the possible values
If there are 2 domain or 2 range statements, it means both must be true
Range can indicate:
Rdf:resource => either a resource or a literal
Rdfs:datatype
© Canopus Consulting
Inference based on Domains and Ranges
Challenge – data typing based on use
We have statements about songs and who composed them such as
Endaro Mahanubhavulu isComposedBy Thyagaraja
But we do not have a direct statement that says Thyagaraja, or anyone else, is an instance of a class called Composer.
Suppose we have to list all the composers in our model, what do we do?
This is a very common pattern in transformation rules
© Canopus Consulting
29
- 30. 24-06-2010
Inference based on Domains and Ranges
Answer:
Define the range for isComposedBy to be of class Composer
The RDFS inference will automatically deduce that anything that is specified as the object of isComposedBy is an instance of class Composer
Metadata
Inferred Assertion
Data
<owl:ObjectProperty rdf:about="#isComposedBy">
<rdfs:range rdf:resource="#Composer"/>
</owl:ObjectProperty>
© Canopus Consulting
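The range-based typing described above amounts to one rule: if a property has a declared rdfs:range, every object of that property gets the range class as an rdf:type. A minimal sketch in plain Python follows; the function name `apply_range_rule` is an illustrative assumption, and the identifiers are shortened forms of those on the slide.

```python
def apply_range_rule(triples):
    """For every (p rdfs:range C) and (s p o), add (o rdf:type C)."""
    graph = set(triples)
    ranges = [(s, o) for (s, p, o) in graph if p == "rdfs:range"]
    inferred = {
        (o, "rdf:type", cls)
        for (s, p, o) in graph
        for (prop, cls) in ranges
        if p == prop
    }
    return graph | inferred


model = {
    # Metadata
    ("isComposedBy", "rdfs:range", "Composer"),
    # Data
    ("EndaroMahanubhavulu", "isComposedBy", "Thyagaraja"),
}

result = apply_range_rule(model)
composers = {s for (s, p, o) in result if p == "rdf:type" and o == "Composer"}
print(composers)
```

Nothing in the data says Thyagaraja is a Composer; the listing works only because the range declaration turns usage into type information.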
Reinforcing the notion of “Schema”
Domains and ranges are not used for
validation - but instead are used to determine
new information based on old information
Does this surprise you?
60
© Canopus Consulting
30
- 31. 24-06-2010
WHAT IS OWL
© Canopus Consulting
Ontologies
RDFS is useful, but does not solve all possible
requirements
Complex applications may want more possibilities:
characterization of properties
identification of objects with different URIs
Disjoint-ness or equivalence of classes
construct classes, not only name them
can a program reason about some terms? E.g.:
“if «Person» resources «A» and «B» have the same «email» property, then «A» and «B» are identical”
© Canopus Consulting
31
- 32. 24-06-2010
Ontologies (Cont.)
The term ontologies is used in this respect:
“defines the concepts and relationships used to describe and represent an area of knowledge”
RDFS can be considered as a simple ontology language
Languages should be a compromise between
rich semantics for meaningful applications
feasibility, implementability
© Canopus Consulting
OWL - Web Ontology Language
OWL is an extra layer, a bit like RDF Schemas
own namespace, own terms
it relies on RDF Schemas
It is a separate recommendation
64
© Canopus Consulting
32
- 33. 24-06-2010
OWL Overview
OWL is a large set of additional terms
For classes:
owl:equivalentClass: two classes have the same individuals
EXAMPLE – A:MUSIC_DIRECTOR and B:COMPOSER
owl:disjointWith: no individuals in common
For properties:
owl:equivalentProperty
EXAMPLE – A:VOCALS_BY and B:SINGER
owl:propertyDisjointWith
For individuals:
owl:sameAs: two URIs refer to the same concept (“individual”)
owl:differentFrom: negation of owl:sameAs
© Canopus Consulting
Classes in OWL
In RDFS, you can subclass existing classes…
that’s all
In OWL, you can construct classes from
existing ones:
66
enumerate its content
through intersection, union, complement
© Canopus Consulting
33
- 34. 24-06-2010
OWL: Class
owl:Class is a subclass of rdfs:Class
More expressiveness – restrictions, set operations …
owl:Thing
Every individual that is an instance of a class is automatically a member of owl:Thing
owl:Nothing -> the empty class, most specialized
Separation of classes and instances
Though the language does not mandate it
Class contents can be enumerated
The class consists of exactly those individuals
Union of classes can be defined
Other possibilities: complementOf, intersectionOf
© Canopus Consulting
OWL class definition
<owl:Class rdf:about="#Composer">
  <rdfs:subClassOf rdf:resource="#Person"/>
</owl:Class>
<owl:Class rdf:about="#Composition">
  <rdfs:subClassOf rdf:resource="&owl;Thing"/>
</owl:Class>
© Canopus Consulting
34
- 35. 24-06-2010
Equivalence in OWL
Equivalent Class
If Class A owl:equivalentClass Class B
=> they share the same members
If x belongs to Class A then x also belongs to Class B and vice versa
Is equivalent to saying:
A rdfs:subClassOf B
B rdfs:subClassOf A
A (Model) Design Pattern: How to implement equivalence, when limited to RDFS vocabulary
Equivalent properties
If propertyA owl:equivalentProperty propertyB
If propertyA applies between resources X and Y, then propertyB also applies
© Canopus Consulting
OWL - exhaustive
The combination of class constructions with various restrictions is extremely powerful
What we have so far follows same logic as before
extend the basic RDF and RDFS possibilities with new features
define their semantics, i.e., what they “mean” in terms of relationships
expect to infer new relationships based on those
However, a full inference procedure is hard
not implementable with simple rule engines, for example
© Canopus Consulting
35
- 36. 24-06-2010
Properties
owl:ObjectProperty is used to connect a resource to another resource
owl:DatatypeProperty is used to connect a resource to an rdfs:Literal (untyped) or an XML Schema built-in data type (typed) value
Both can have sub-properties
<owl:ObjectProperty rdf:about="#isComposedBy">
<rdfs:range rdf:resource="#Composer"/>
</owl:ObjectProperty>
<owl:DatatypeProperty rdf:about="#hasName">
<rdfs:range rdf:resource="&xsd;string"/>
</owl:DatatypeProperty>
© Canopus Consulting
More on properties
Property can be the inverse of another (Inverse Property)
Has child – has parent
Usually seen in “containment” type associations
Property can be symmetric: ApB => BpA
Knows, hasSpouse
Usually abstract forms of more specialized properties
Often you will find that most symmetric properties are super-properties to others
Property can be asymmetric
hasMother, greaterThan, lesserThan
Property can be transitive. Transitive & Symmetric
isPartOf, contains
Generally seen between entities of similar type
© Canopus Consulting
36
- 37. 24-06-2010
More on properties
Challenge
We have the following statements:
Raaga A isJanyaRaagaOf Raaga B
Raaga C isJanyaRaagaOf Raaga D
Raaga E isMelakarthaDerivative of Raaga A
We get additional information that there is something called JanakaRaaga such that if A is JanyaRaaga of B then B is JanakaRaaga of A
I want to:
Get all JanakaRaaga of A
Get all Raagas related to A
© Canopus Consulting
More on properties
Answer
isJanyaRaaga owl:inverseOf isJanakaRaaga
isJanyaRaaga, isJanakaRaaga and isMelakartaDerivative are sub-properties of raagaRelations
Entailed by OWL itself:
inverseOf is the inverse of itself
In other words,
inverseOf is symmetric
It is not uncommon to find this pattern in model transformations
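The answer above can be sketched as two rules applied until nothing changes: one for owl:inverseOf and one for rdfs:subPropertyOf. This is a plain-Python illustration, not an OWL reasoner; the property spellings (for example `isJanyaRaagaOf`) are normalised assumptions, since the slides vary them.

```python
def infer(triples):
    """Apply owl:inverseOf and rdfs:subPropertyOf until a fixpoint is reached."""
    graph = set(triples)
    while True:
        new = set()
        for (s, p, o) in graph:
            for (s2, p2, o2) in graph:
                if p2 == "owl:inverseOf":
                    # inverseOf is symmetric, so apply it in both directions
                    if p == s2:
                        new.add((o, o2, s))
                    if p == o2:
                        new.add((o, s2, s))
                elif p2 == "rdfs:subPropertyOf" and p == s2:
                    new.add((s, o2, o))
        if new <= graph:
            return graph
        graph |= new


model = {
    ("RaagaA", "isJanyaRaagaOf", "RaagaB"),
    ("RaagaC", "isJanyaRaagaOf", "RaagaD"),
    ("RaagaE", "isMelakarthaDerivativeOf", "RaagaA"),
    ("isJanyaRaagaOf", "owl:inverseOf", "isJanakaRaagaOf"),
    ("isJanyaRaagaOf", "rdfs:subPropertyOf", "raagaRelations"),
    ("isJanakaRaagaOf", "rdfs:subPropertyOf", "raagaRelations"),
    ("isMelakarthaDerivativeOf", "rdfs:subPropertyOf", "raagaRelations"),
}

graph = infer(model)

# Get all JanakaRaaga of A: every x such that x isJanakaRaagaOf A
janaka_of_a = {s for (s, p, o) in graph if p == "isJanakaRaagaOf" and o == "RaagaA"}

# Get all Raagas related to A, in either direction of raagaRelations
related_to_a = {o for (s, p, o) in graph if p == "raagaRelations" and s == "RaagaA"}
related_to_a |= {s for (s, p, o) in graph if p == "raagaRelations" and o == "RaagaA"}

print(janaka_of_a, related_to_a)
```

Neither query matches any asserted triple; both answers come entirely from the inverse and sub-property declarations.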
© Canopus Consulting
37
- 38. 24-06-2010
Qualifying property membership
Film songs are those songs that have appeared in films
<owl:Class rdf:ID="FilmSongs">
  <rdfs:subClassOf rdf:resource="#Songs"/>
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#appearedIn"/>
      <owl:someValuesFrom rdf:resource="#Films"/>
    </owl:Restriction>
  </rdfs:subClassOf>
</owl:Class>
© Canopus Consulting
Qualifying property membership
owl:allValuesFrom
If the property is used, then all values for the property must belong to the class
owl:someValuesFrom
If the property is used, at least one of the values must belong to the class
owl:hasValue
Of all the values a class has for a particular property, at least one must be this specific value
© Canopus Consulting
38
- 39. 24-06-2010
Owl:cardinality
Likewise cardinality, and max cardinality
<owl:Class rdf:ID="FilmSongs">
  <rdfs:subClassOf rdf:resource="#Songs"/>
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#appearedIn"/>
      <owl:minCardinality rdf:datatype="&xsd;nonNegativeInteger">1</owl:minCardinality>
    </owl:Restriction>
  </rdfs:subClassOf>
</owl:Class>
© Canopus Consulting
SPARQL Protocol And RDF Query Language
78 QUERYING RDF
© Canopus Consulting
39
- 40. 24-06-2010
Querying RDF
A collection of rdf statements is a graph
No root - no start no finish
What is returned to a query is a graph
A relational query forms a new table by combining existing tables; an RDF query returns a new graph by combining information from graphs from multiple sources
© Canopus Consulting
Queries and their responses
Querying an RDF store such as the one we have just built will return exactly what is asserted, with no inference applied
Example:
Raaga:Sreeragam rdf:type Raaga:Ghana
Raaga:Ghana rdfs:subClassOf Raaga:CarnaticRaaga
A query such as ?x rdf:type Raaga:CarnaticRaaga
Will return nothing !!
© Canopus Consulting
40
- 41. 24-06-2010
Retrieval models from RDF
Semantics: Meaning of data & relationships. Queried with a Semantic Query (e.g. SPARQL)
Internally represented as
Structure: A graph of triples connected to each other. Queried by Graph Traversal (e.g. Squish)
Serialized using
Syntax: Formats such as RDF/XML or Turtle. Queried with a Syntactic Query (e.g. XPath/XQuery)
© Canopus Consulting
Syntactic Level Query
<rdf:RDF …>
  <Raaga rdf:about="#satmel">
    <rdf:type rdf:resource="&owl;Thing"/>
    <hasSwara rdf:resource="#ni"/>
    <hasSwara rdf:resource="#nu"/>
    <hasSwara rdf:resource="#pa"/>
    <hasSwara rdf:resource="#ra"/>
    <hasSwara rdf:resource="#sa"/>
  </Raaga>
  <owl:Thing rdf:about="#shuddhatodi">
    <rdf:type rdf:resource="#Raaga"/>
    <isMelakartaDerivative rdf:resource="#hanumatodi"/>
  </owl:Thing>
</rdf:RDF>
Select a raga – (XPath/XQuery)
for $r in document("music.owl")
where $r/Raaga/@rdf:about = '#satmel'
return $r/Raaga/hasSwara
/RDF/Raaga/@rdf:about="#satmel"
/RDF/Thing/@rdf:about="#shuddhatodi"
Limitations of this approach
Can go out of hand very quickly
Does not understand the semantics of RDFS & OWL
Tied to the structure of RDF (which can be expressed in many ways)
© Canopus Consulting
41
- 42. 24-06-2010
Structural Query
Subject                      Predicate             Object
perceptron:satmel            rdf:type              perceptron:Raaga
perceptron:shuddhatoodi      rdf:type              perceptron:MelakarthaRaaga
perceptron:satmel            perceptron:hasSwara   perceptron:sa
perceptron:MelakarthaRaaga   rdfs:subClassOf       perceptron:Raaga
…
Possible queries
Select * from Triples where Object = “perceptron:Raaga” and Predicate = “rdf:type”
Select Predicate from Triples where Subject = “perceptron:satmel”
Limitations of this approach
Interprets any RDF model as just a set of Triples
Does not understand the semantics of RDFS & OWL
E.g. looking for all raagas will fail here: shuddhatoodi is asserted only to be a MelakarthaRaaga, and the fact that MelakarthaRaaga is a subClassOf Raaga sits in a separate triple whose semantics the query does not interpret
© Canopus Consulting
Querying at the Semantic Level
Need a new language that can understand the semantics of RDFS & OWL
Sample Queries
Select ?x from <perceptron.net> where ?x <rdf:type> Raaga
All MelakarthaRaagas will also be returned even though they are not explicitly asserted to be a Raaga
Can draw inferences from the rules of RDFS and OWL
Either by computing and storing the closure of a given model
Or by inferring new statements on the fly, as needed by the query
Not tied to how the data is stored or serialized
Special purpose query languages designed to facilitate this
RQL, SPARQL, TQL
SPARQL has emerged as the industry standard
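The difference between a structural lookup and a semantics-aware one can be sketched in plain Python. The second function below takes the "infer on the fly" route: it walks rdfs:subClassOf at query time instead of storing the closure. Function names and the shortened identifiers are illustrative assumptions, not part of any query engine.

```python
def asserted_instances(graph, cls):
    """Structural lookup: only triples that literally say (x rdf:type cls)."""
    return {s for (s, p, o) in graph if p == "rdf:type" and o == cls}


def subclasses(graph, cls):
    """cls plus everything reachable downwards through rdfs:subClassOf."""
    found = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for (s, p, o) in graph:
            if p == "rdfs:subClassOf" and o == current and s not in found:
                found.add(s)
                frontier.append(s)
    return found


def inferred_instances(graph, cls):
    """Semantic lookup: instances of cls or of any of its subclasses."""
    result = set()
    for sub in subclasses(graph, cls):
        result |= asserted_instances(graph, sub)
    return result


graph = {
    ("satmel", "rdf:type", "Raaga"),
    ("shuddhatoodi", "rdf:type", "MelakarthaRaaga"),
    ("MelakarthaRaaga", "rdfs:subClassOf", "Raaga"),
}

print(asserted_instances(graph, "Raaga"))
print(inferred_instances(graph, "Raaga"))
```

The on-the-fly approach keeps the store small at the cost of extra work per query; materialising the closure makes the opposite trade.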
© Canopus Consulting
42
- 43. 24-06-2010
Introducing SPARQL
Designed to query collections of triples…
…and to easily traverse relationships
SQL-like syntax (SELECT, WHERE)
Matches graph patterns
85
© Canopus Consulting
SPARQL – Key Characteristics
Graph pattern matching ability
Capability to restrict matches on a queried graph by providing a graph pattern, which consists of one or more RDF triple patterns, to be satisfied in a query
Variable binding results
Returns zero or more bindings of variables.
Each set of bindings is one way that the query can be satisfied by the queried graph
Sub-graph results
It must be possible for query results to be returned as a sub-graph of the original graph
Result limits
Possible to specify an upper bound on the number of query results returned
Streaming results
Possible for the client to request that results be streamed
WSDL support
The protocol, including its interfaces, their operations, results and types, is described using WSDL
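Graph pattern matching with variable bindings can be illustrated with a toy matcher: each triple pattern either extends a binding or rejects it, and the query result is the list of bindings that satisfy every pattern. This is a naive sketch for intuition only, not how a SPARQL engine is implemented; the `comp:` and `rend:` identifiers are invented sample data.

```python
def match(pattern, triple, binding):
    """Try to extend a binding so that the pattern matches the triple."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier
            if result.get(term, value) != value:
                return None
            result[term] = value
        elif term != value:
            return None
    return result


def select(graph, patterns):
    """Return every set of variable bindings that satisfies all patterns."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in graph:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


graph = {
    ("comp:chandracharita", "perceptron:hasRaaga", "Mohanam"),
    ("rend:1", "perceptron:hasComposition", "comp:chandracharita"),
    ("comp:other", "perceptron:hasRaaga", "Kalyani"),
    ("rend:2", "perceptron:hasComposition", "comp:other"),
}

patterns = [
    ("?composition", "perceptron:hasRaaga", "Mohanam"),
    ("?rendition", "perceptron:hasComposition", "?composition"),
]

solutions = select(graph, patterns)
print(solutions)
```

The shared variable `?composition` is what joins the two patterns, which is the traversal of relationships that SPARQL expresses declaratively.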
© Canopus Consulting
43
- 44. 24-06-2010
Sample Model Fragment
PREFIX perceptron: <http://www.perceptronnetwork.com/ontologies/2010/...>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?raaga
WHERE {?raaga rdf:type perceptron:Raaga }
© Canopus Consulting
Let us take a closer look
PREFIX perceptron: <http://www.perceptronnetwork.com/ontologies/2010/...>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?raaga
WHERE {?raaga rdf:type perceptron:Raaga }
PREFIX
Defines an alias for the namespace
SELECT
Select query
FROM
Optional clause, specifies the URI of the model
Variables
Marked by either ? or $
WHERE
Usually the most significant part of a SPARQL query
Each triple pattern is one condition (a filter on the graph), expressed using Turtle syntax
© Canopus Consulting
44
- 45. 24-06-2010
SPARQL Select
Challenge
Number of Raagas is huge, paginate the results
PREFIX perceptron: <someURI>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?s FROM <rmi://localhost/server1#perceptron>
WHERE {?s rdf:type perceptron:Raaga}
ORDER BY DESC(?s)
LIMIT 2 OFFSET 2
DISTINCT – remove duplicate results
ORDER BY – To sort the results, can be ASC, DESC; must appear before LIMIT and OFFSET
LIMIT n – Limit the returned values to n rows
OFFSET n – Skip the first n results
Limit and Offset together can be used to paginate the results
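The effect of DISTINCT, ORDER BY, OFFSET and LIMIT on a result list can be mimicked in a few lines of Python, which may help when reasoning about page boundaries. This only illustrates the order in which the modifiers apply; the raaga names are sample values and `paginate` is a hypothetical helper, not a SPARQL API.

```python
def paginate(results, limit, offset, descending=False):
    """Mimic DISTINCT, then ORDER BY, then OFFSET, then LIMIT on a flat result list."""
    distinct = set(results)                             # DISTINCT
    ordered = sorted(distinct, reverse=descending)      # ORDER BY ASC/DESC
    return ordered[offset:offset + limit]               # OFFSET, then LIMIT


raagas = ["Kalyani", "Mohanam", "Hindolam", "Mohanam", "Todi"]

page_one = paginate(raagas, limit=2, offset=0, descending=True)
page_two = paginate(raagas, limit=2, offset=2, descending=True)
print(page_one, page_two)
```

Sorting before slicing is what makes the pages stable: without an ORDER BY, two requests with different offsets are not guaranteed to see the results in the same order.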
© Canopus Consulting
Example of a Complex Query
Challenge
Search for the top 10 songs (by popularity) in my favorite Raaga
SELECT ?rendition
FROM <rmi://localhost/server1#perceptron>
FROM NAMED <my-preferences.rdf>
FROM NAMED <popularity.rdf>
WHERE {
  GRAPH <my-preferences.rdf> {
    ?me dc:name "My Name" .
    ?me prefs:favouriteRaga ?fav_raga .
  }
  ?composition perceptron:hasRaaga ?fav_raga .
  ?rendition perceptron:hasComposition ?composition .
  GRAPH <popularity.rdf> {
    ?rendition pop:popularity ?popularity
  }
}
ORDER BY DESC(?popularity) LIMIT 10
© Canopus Consulting
45
- 46. 24-06-2010
How this Query Would Work
[Diagram: three graphs joined by the query variables. My-favourites.rdf: ?me (Person:XXX) with dc:name “My Name” and prefs:favouriteRaaga ?fav_raga (Mohanam). perceptron.owl: ?composition (Chandracharita) perceptron:hasRaaga ?fav_raga; ?rendition (rend-chandracharita-bmk) perceptron:isRenditionOf ?composition. popularity.rdf: ?rendition pop:popularity ?popularity (value shown: 1).]
© Canopus Consulting
Additional SPARQL constructs
INSERT
INSERT DATA { triples block }
DELETE
DELETE DATA { triples block }
OPTIONAL
Used to define an optional condition
UNION
For alternate queries
FILTER
Value based filters
ASK
Returns true or false depending upon the condition
DESCRIBE
Returns the node graph for the given condition
© Canopus Consulting
46
- 47. 24-06-2010
Technologies & Tools
93 SEMANTIC WEB INFRASTRUCTURE
© Canopus Consulting
Objective
How do we build an application?
How do we build the ontology?
What are the key architecture components?
What are the tools & technologies to use?
How do I choose which technology to use?
Semantic Web Application Lifecycle
Information Modelling
Build Ontology (model level representation)
Information Assimilation
Populate Knowledgebase from various sources
Including current applications
Automatic Semantic Annotation of existing data
Any type of document, multiple sources of documents
Information Retrieval
Applications: search, integrate/portal, summarize/explain, analyse, decision support
Reasoning techniques: graph analysis, inferencing
Semantic Web Application Lifecycle
[Figure: the lifecycle as a cycle of four stages: Build Information Model; Create Assimilation Models & Aggregate Knowledge; Retrieve and Use Semantic Data; Refine/Evolve Information Model. Labelled tools: Ontology Editors (Protégé, TopBraid Composer); Technologies (GRDDL, RDFizers, OWLs, Automatic Annotation); RDF Stores (Mulgara, Sesame); Programming (Jena); Semantic Query Server.]
Information Modeling
Information Model Consists of
Description Component – Schema
Designed by domain experts, community
Description Base – Assertions, Extensions
Populated by automated agents that assimilate the information
Building the model is an evolutionary process
Start with a concept map or a taxonomy
Leading up to a formal ontology
It evolves over lifetime of the application
Tools & Technologies
Popular modeling tools – Protégé, TopBraid Composer, GrOWL
Concept Mapping – CMAP Tools, CMAP Tools COE
http://www.xml.com/2002/11/06/Ontology_Editor_Survey.html
RDF Stores, Triple Stores
PERSISTING RDF DATA
Storage Models for RDF
SQL Storage (hand-crafted SQL, ORM)
Direct SQL access to RDF data
Queries become complex as discussed earlier
Can’t completely deal with the semantics of data
SQL Bridge (tools like SDB, SquirrelRDF)
Access via SPARQL interface
Adaptor transforms SPARQL query into SQL
Data stored in relational schema
Native RDF (native non-relational DB, such as Mulgara)
Access via SPARQL interface
Data stored in native RDF databases
Storage Models for RDF
[Figure: the three storage models side by side]
SQL Storage: Application -> SQL Queries -> Relational Schema or Relational Triples
SQL Bridge: Application -> SPARQL Queries -> Adaptor -> Relational Triples
Native RDF: Application -> SPARQL Queries -> Native RDF Graph
[A second build of the same figure adds two labels: "Traditional approach" and "RDF Databases".]
Storing RDF Data – Relational Model
Relational Schema
Data model consisting of well defined tables, columns and their meaning
To provide flexibility, we would have to use techniques that dynamically
update the schema
Still makes each row alike – whereas the fundamental premise of RDF is
that each row is potentially unique
Storing RDF Data – Triple forms
Storing RDF predicates in a relational database
Use the attributes as “extended properties”
Simple triple forms (subject – predicate – object)
Predicate tables (one table per predicate)
Gives the requisite flexibility but makes indexing & retrieval difficult
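The retrieval difficulty becomes visible when a graph pattern is translated into SQL over a single triple table: every triple pattern needs its own copy of the table, joined on shared variables. The sketch below is an illustration only, not the algorithm used by any particular adaptor. It assumes a hypothetical table triples(subject, predicate, object), a hypothetical class name, and a convention that terms starting with "?" are variables.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Translates a list of triple patterns into one SQL self-join (illustrative only). */
final class TriplePatternToSql {
    private static final String[] COLUMNS = {"subject", "predicate", "object"};

    /** Each pattern is {s, p, o}; a term starting with '?' is a variable. */
    static String toSql(List<String[]> patterns) {
        // Remembers the first column that binds each variable, in order of appearance.
        Map<String, String> firstColumn = new LinkedHashMap<>();
        List<String> from = new ArrayList<>();
        List<String> where = new ArrayList<>();

        for (int i = 0; i < patterns.size(); i++) {
            String alias = "t" + i;              // one table alias per triple pattern
            from.add("triples " + alias);
            String[] pattern = patterns.get(i);
            for (int c = 0; c < 3; c++) {
                String term = pattern[c];
                String column = alias + "." + COLUMNS[c];
                if (!term.startsWith("?")) {
                    // Constant term: compare against a quoted literal.
                    where.add(column + " = '" + term.replace("'", "''") + "'");
                } else if (firstColumn.containsKey(term)) {
                    // Variable seen before: this is the join condition.
                    where.add(column + " = " + firstColumn.get(term));
                } else {
                    firstColumn.put(term, column);
                }
            }
        }

        List<String> select = new ArrayList<>();
        for (Map.Entry<String, String> e : firstColumn.entrySet()) {
            select.add(e.getValue() + " AS " + e.getKey().substring(1));
        }
        if (select.isEmpty()) {
            select.add("1");                      // no variables: just test for a match
        }
        String sql = "SELECT " + String.join(", ", select) + " FROM " + String.join(", ", from);
        return where.isEmpty() ? sql : sql + " WHERE " + String.join(" AND ", where);
    }

    public static void main(String[] args) {
        List<String[]> patterns = new ArrayList<>();
        patterns.add(new String[] {"?composition", "perceptron:hasRaaga", "?fav_raga"});
        patterns.add(new String[] {"?rendition", "perceptron:hasComposition", "?composition"});
        System.out.println(toSql(patterns));
    }
}
```

Two patterns already require a two-way self-join; a query with n patterns joins the table n times, which is why indexing strategy dominates the performance of triple-table designs.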
Storing RDF Data – RDF Stores
RDF Stores
Provide a mechanism to store RDF data
Provide a mechanism to query it using languages such as SPARQL
Internally may or may not be based on relational databases
Also known as Graph Databases or Schema-less Databases
Although not all Graph databases support RDF
Examples
Jena SDB, TDB
Oracle 11g RDF Database
Mulgara Semantic Store
OpenLink Virtuoso
AllegroGraph
OpenRDF Sesame
Programming APIs, Technology Stack
PROGRAMMING FOR THE SEMANTIC WEB
Let us Revisit SPARQL
SPARQL
SPARQL Protocol and RDF Query Language
It is both a Protocol and a Query Language
Protocol Definition
Defines a mechanism of invoking a query over the web
Equivalent to a web service definition
Defines one operation
Query - with the input, output and fault definitions
Defines the protocol bindings
HTTP (GET/POST)
SOAP (web service)
An implementation may provide either of these two bindings (or both)
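With the HTTP GET binding, the query text travels as the URL-encoded query parameter, optionally accompanied by default-graph-uri. The following is a minimal sketch that only builds the request URI and sends nothing over the network. The class name and the endpoint http://localhost:8080/sparql are assumptions made for illustration.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Builds the URI for a SPARQL Protocol query over the HTTP GET binding. */
final class SparqlProtocolRequest {
    /**
     * @param endpoint        address of the SPARQL endpoint (hypothetical in the demo below)
     * @param sparql          the query text
     * @param defaultGraphUri optional default graph, or null to omit the parameter
     */
    static URI queryUri(String endpoint, String sparql, String defaultGraphUri) {
        StringBuilder url = new StringBuilder(endpoint)
            .append("?query=")
            .append(URLEncoder.encode(sparql, StandardCharsets.UTF_8));
        if (defaultGraphUri != null) {
            url.append("&default-graph-uri=")
               .append(URLEncoder.encode(defaultGraphUri, StandardCharsets.UTF_8));
        }
        return URI.create(url.toString());
    }

    public static void main(String[] args) {
        String query = "SELECT ?x WHERE { ?x <p:isDerivedFrom> <p:Kalyaani> }";
        // Endpoint address is an assumption; substitute the address of a real server.
        System.out.println(queryUri("http://localhost:8080/sparql", query, null));
    }
}
```

Any HTTP client can then issue a GET on the resulting URI. Long queries are usually sent with POST instead, since URLs have practical length limits.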
Architecture Stack of Semantic Technologies
[Figure: Semantic Technology Stack, top to bottom: Application; HTTP and SOAP bindings; Programming API; Semantic Middleware (e.g. Semantic SOA), SPARQL Processor and Inference Engine; RDF-SQL Adaptor; Relational Store and RDF Store.]
RDF Programming Stack - Jena
The most popular stack for Java is Jena (http://jena.sourceforge.net)
Jena also has an in-memory graph manipulation API
Jena's APIs for RDF, RDFS and OWL are among the most popular
It uses a Graph-based model and a pluggable architecture
It allows plugging various storage back ends, reasoners etc. into the API
Many RDF stores either have or are building support for the Jena API
Jena uses a Graph model as the core
All reasoners also assert inferences into a Graph
Thus the output of a reasoner can be fed into another layer
Such layering is a common pattern with Jena
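The layering pattern can be illustrated without Jena itself. The sketch below uses plain Java collections, not the Jena API: a graph is a set of triples, each "reasoner" is a function from graph to graph, and one layer's output becomes the next layer's input. All class, method and resource names (ex:Mohanam, ex:JanyaRaaga and so on) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Plain-Java illustration of stacking inference layers over a triple graph. */
final class LayeredInference {
    static List<String> triple(String s, String p, String o) {
        return Arrays.asList(s, p, o);
    }

    /**
     * One reasoner layer: whenever (x p1 y) and (y p2 z) are both present,
     * add (x pOut z). Repeats until nothing new is inferred and returns a
     * new graph, leaving the input graph untouched.
     */
    static Set<List<String>> chainRule(Set<List<String>> graph, String p1, String p2, String pOut) {
        Set<List<String>> out = new LinkedHashSet<>(graph);
        boolean changed = true;
        while (changed) {
            changed = false;
            List<List<String>> snapshot = new ArrayList<>(out); // iterate a copy while adding
            for (List<String> a : snapshot) {
                if (!a.get(1).equals(p1)) {
                    continue;
                }
                for (List<String> b : snapshot) {
                    if (b.get(1).equals(p2) && a.get(2).equals(b.get(0))) {
                        changed |= out.add(triple(a.get(0), pOut, b.get(2)));
                    }
                }
            }
        }
        return out;
    }

    /** Layer 1: rdfs:subClassOf is transitive. */
    static Set<List<String>> subClassClosure(Set<List<String>> g) {
        return chainRule(g, "rdfs:subClassOf", "rdfs:subClassOf", "rdfs:subClassOf");
    }

    /** Layer 2: an instance of a class is also an instance of its superclasses. */
    static Set<List<String>> typePropagation(Set<List<String>> g) {
        return chainRule(g, "rdf:type", "rdfs:subClassOf", "rdf:type");
    }

    /** Hypothetical sample data. */
    static Set<List<String>> sampleGraph() {
        Set<List<String>> g = new LinkedHashSet<>();
        g.add(triple("ex:Mohanam", "rdf:type", "ex:JanyaRaaga"));
        g.add(triple("ex:JanyaRaaga", "rdfs:subClassOf", "ex:Raaga"));
        g.add(triple("ex:Raaga", "rdfs:subClassOf", "ex:MusicalConcept"));
        return g;
    }

    public static void main(String[] args) {
        // The output graph of the first layer is the input graph of the second.
        Set<List<String>> layered = typePropagation(subClassClosure(sampleGraph()));
        for (List<String> t : layered) {
            System.out.println(String.join(" ", t));
        }
    }
}
```

Because every layer consumes and produces the same graph shape, layers can be reordered or swapped without touching the code that queries the final graph.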
Components of Jena Stack
[Figure: the Semantic Technology Stack with Jena components mapped onto it: Joseki (HTTP, SOAP); Jena API (Programming API); ARQ (SPARQL Processor); Inference Engine; SDB (RDF-SQL Adaptor, an abstraction over a Relational Store); TDB (RDF Store).]
Connecting to RDF Database – RDF Programming APIs
RDF programming APIs are available in almost all major languages
C, C++, C# and .Net, Haskell, Java, JavaScript, Lisp, Obj-C, PHP, Perl, Prolog, Python, Ruby, Tcl/Tk
There is no standard client API (yet!) - equivalent of JDBC
Each RDF package has its own set of APIs
Most often they follow similar paradigms of Connections, Factories, Sessions
The query syntax however is standardized
Thus queries are portable across RDF stores
At the same time, many packages have their own “dialects” of RDF query languages
Mulgara is one of the most popular open source RDF databases
Has its own API to connect to the RDF Database
In addition to SPARQL, has its own dialect (called iTQL)
Connecting to Mulgara Database
Sample Java code to connect to a Mulgara database using the Mulgara Client API
import java.net.URI;
// Imports for the Mulgara client classes used below (Connection, ConnectionFactory,
// SparqlInterpreter, Query, Answer, GraphAnswer, RdfXmlEmitter) are omitted here.
URI SERVER_URI = URI.create("rmi://servername/instancename");
URI GRAPH_URI = URI.create("rmi://servername/instancename#model");
String queryStr = "SELECT ?x WHERE { ?x <p:isDerivedFrom> <p:Kalyaani> }";
// Declared outside the try block so that the finally block can release it
Connection connection = null;
try {
    // Create a new connection from the factory using the server URI
    ConnectionFactory factory = new ConnectionFactory();
    connection = factory.newConnection(SERVER_URI);
    // Initialize the SPARQL interpreter with the graph URI
    SparqlInterpreter interpreter = new SparqlInterpreter();
    interpreter.setDefaultGraphUri(GRAPH_URI);
    // Parse and execute the query on the connection
    Query query = interpreter.parseQuery(queryStr);
    Answer a = connection.execute(query);
    // Use the results
    RdfXmlEmitter.writeRdfXml((GraphAnswer) a, System.out);
}
catch (Exception e) {
    e.printStackTrace();
}
finally {
    // Release the connection only if it was actually opened
    if (connection != null) {
        try {
            connection.dispose();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Adopting Existing SQL Databases
[Figure. Source: Tim Berners-Lee]
Semantic Web Technologies
[Figure. Source: W3C]
APPENDIX
RDF Triples
Resources can use any URI, e.g.:
http://www.example.org/file.xml#element(home)
http://www.example.org/file.html#home
http://www.example.org/file2.xml#xpath1(//q[@a=b])
URIs can also denote non-Web entities:
http://www.ivan-herman.net/me is me
not my home page, not my publication list, but me
RDF triples form a directed, labelled graph
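The graph view can be made concrete with a tiny in-memory structure: each triple is one directed edge from subject to object, labelled with the predicate, and a query pattern is a filter over those edges. This is a minimal sketch with hypothetical class and resource names, not the API of any RDF library.

```java
import java.util.ArrayList;
import java.util.List;

/** A directed, labelled graph stored as a flat list of (subject, predicate, object) edges. */
final class TinyTripleGraph {
    private final List<String[]> triples = new ArrayList<>();

    /** Adds one edge: subject --predicate--> object. */
    void add(String subject, String predicate, String object) {
        triples.add(new String[] {subject, predicate, object});
    }

    /** Returns every triple matching the pattern; a null term acts as a wildcard. */
    List<String[]> match(String s, String p, String o) {
        List<String[]> result = new ArrayList<>();
        for (String[] t : triples) {
            boolean subjectOk = s == null || s.equals(t[0]);
            boolean predicateOk = p == null || p.equals(t[1]);
            boolean objectOk = o == null || o.equals(t[2]);
            if (subjectOk && predicateOk && objectOk) {
                result.add(t);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TinyTripleGraph g = new TinyTripleGraph();
        // Hypothetical resources, written as prefixed names for brevity.
        g.add("ex:Chandracharita", "ex:hasRaaga", "ex:Mohanam");
        g.add("ex:rendition1", "ex:isRenditionOf", "ex:Chandracharita");
        g.add("ex:rendition2", "ex:isRenditionOf", "ex:Chandracharita");

        // All edges pointing at ex:Chandracharita, whatever their label.
        for (String[] t : g.match(null, null, "ex:Chandracharita")) {
            System.out.println(t[0] + " --" + t[1] + "--> " + t[2]);
        }
    }
}
```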
RDF Primitives
Resources
Anything that can be uniquely identified
Using a URI
E.g. http://www.perceptron.net/indianmusic#A R Rahman
(Namespace: http://www.perceptron.net/indianmusic#; Fragment ID: A R Rahman)
Properties
Are resources in themselves
Literals
Strings
With optional data types
RDF is a general model for triples
OWL Full
No constraints on any of the constructs
owl:Class is just syntactic sugar for rdfs:Class
owl:Thing is equivalent to rdfs:Resource
This means that:
A Class can also be an individual, and a URI can denote a property as well as a Class
E.g., it is possible to talk about a class of classes and apply properties to them
Extension of RDFS in all respects
But: no system may exist that infers everything one might expect
OWL Full usage
Nevertheless OWL Full is essential
It gives a generic framework to express many things
Some applications just need to express and interchange terms
Applications may control what terms are used and how
In fact, they may define their own sub-language via, e.g., a vocabulary
Thereby ensuring a manageable inference procedure