The talk covers my PhD research work so far and was given as an introductory presentation at the beginning of my visiting period at the Web&Media Group of the Vrije Universiteit, Amsterdam. First, I introduce a system for automated knowledge management, I.M.P.A.K.T., which embeds a module for Core Competence extraction. The module is described as a use case for the application of non-standard inference services based on the Least Common Subsumer in Description Logics (DLs) to the problem of finding commonalities in knowledge bases modeled in DLs. Moreover, I present the Knowledge Compilation approach adopted for efficiently solving subsumption through standard SQL queries alone.
Then, I focus on my current investigation into the possibility of extending the Common Subsumer (CS) reasoning service to RDF datasets. Here, the formal definition of CS in RDF is given, together with a sketch of possible applications (e.g., clustering of RDF resources).
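To give a concrete flavour of this second part, the sketch below approximates the commonalities of two RDF resources by the set of (predicate, object) pairs they share, and derives from them a simple distance usable for clustering. This is only an intuition-level illustration in Python with rdflib, under assumed example data (the input file name is hypothetical); it is not the formal CS construction discussed in the talk.

from itertools import combinations
from rdflib import Graph, URIRef

g = Graph()
g.parse("dataset.ttl", format="turtle")  # hypothetical input dataset

def description(resource):
    # All (predicate, object) pairs asserted for a resource.
    return set(g.predicate_objects(resource))

def shared_description(r1, r2):
    # Naive commonality: the (predicate, object) pairs shared by r1 and r2.
    return description(r1) & description(r2)

def jaccard_distance(r1, r2):
    # A simple distance over shared descriptions, usable for clustering.
    d1, d2 = description(r1), description(r2)
    return 1.0 - len(d1 & d2) / len(d1 | d2) if (d1 or d2) else 0.0

resources = [s for s in set(g.subjects()) if isinstance(s, URIRef)]
for r1, r2 in combinations(resources, 2):
    print(r1, r2, jaccard_distance(r1, r2))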
This document proposes DearIdea.net, an open source collaboration platform that uses natural language processing and user tags to identify shared needs or resources between community projects. The platform would allow projects to be submitted with tags and analyzed to determine similarities through weighted scoring of user and machine-applied tags. Challenges include inexperience, determining feasibility, and maintaining interest, but presenting the idea at BarCampNYC could help address feasibility while using existing tools could reduce programming requirements. Currently, the domain exists without content and the author seeks advice and criticism to further develop the idea.
This document provides an overview of network refactoring and offloading trends, including fluid network planes. It discusses the evolution of SDN from 2009 to 2019 and concepts like network softwarization. Instances of fluid network planes are described, such as RouteFlow, NFV layers, and VNF offloading to hardware or multi-vendor P4 fabrics. The document also covers slicing for IoT analytics and references recent works on in-network computing, fast connectivity recovery, and scaling distributed machine learning with in-network aggregation.
This lecture was delivered at the Intelligent Systems and Data Mining workshop, held at the Faculty of Computers and Information, Kafer Elshikh University, on Wednesday, 6 December 2017.
Mithileysh Sathiyanarayanan and Mohammad Alsaffar presented Euler-time Diagrams, a novel visual method and software tool to represent set relationships over time. They combined the concepts of Euler diagrams, which show set relationships, and time series, which show the sequence of events over time. The tool was developed using D3 and Google's toolkit and allows users to visualize how disease rates change over multiple years by interacting with the diagrams. The authors aim to further develop the tool by incorporating principles of visual perception and cognition.
In this talk, I summarize the research conducted during my visiting period at the Web&Media Group of the Vrije Universiteit, Amsterdam.
Extracting relevant entities from TV-program descriptions is a challenging problem, due to the broad range of topics they cover and the variety of formats they use. None of the existing tools for automatic Named-Entity Recognition and Classification is trained on this kind of data.
I illustrate the workflow established for extracting relevant entities from a text in the entertainment domain, relying on the adoption of different annotators, as well as the issues arising in the integration of their outputs. In order to increase the coverage of the annotation task, metrics based on majority vote are combined with metrics established for crowd-truth evaluation for gold-standard creation. This approach should be able to capture cases typically cut off by majority-vote integration techniques (i.e., unique information and distributed agreement).
Several features are computed in order to capture as many characteristics as possible that are useful for assessing the relevance of an entity. The results of human annotators, gathered through a crowd-sourcing task, are used to collect positive and negative examples of relevance and, as an ultimate goal, to evaluate the precision and recall of the entire system.
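A toy illustration of the majority-vote integration mentioned above, assuming each annotator simply returns a set of entity surface forms (all names below are hypothetical). Entities proposed by too few annotators are cut off, which is exactly the unique information that the crowd-truth metrics aim to recover.

from collections import Counter

annotations = {  # hypothetical outputs of three annotators
    "annotator_1": {"Amsterdam", "Vrije Universiteit", "BBC One"},
    "annotator_2": {"Amsterdam", "Vrije Universiteit"},
    "annotator_3": {"Amsterdam"},
}

def majority_vote(annotations, threshold=0.5):
    # Keep only entities proposed by more than `threshold` of the annotators.
    votes = Counter(e for ents in annotations.values() for e in ents)
    n = len(annotations)
    return {e for e, c in votes.items() if c / n > threshold}

print(majority_vote(annotations))
# {'Amsterdam', 'Vrije Universiteit'} -- 'BBC One' is discarded even if relevant.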
This patent application describes methods for delivering digital content securely. It involves receiving documents from sources, tagging the documents with metadata about the type and intended recipient, encrypting the documents, publishing them so the intended recipients can access them by verifying their identity, and granting access to the documents based on the verified identity. The digital content delivery system uses storage systems, interfaces, content providers, and lockboxes to classify, distribute and control access to encrypted documents for authorized recipients.
The document provides an overview of Carter Brothers Security Services' (CBSS) proposed revised delivery model for their AT&T account. Some key points include:
- CBSS will implement a hybrid approach including cost containment, creating a KPI dashboard, hiring a QA manager, reducing headcount, adopting a regional management structure, and increasing technician utilization.
- The delivery model is intended to drive increased focus on service quality, align with AT&T's needs, and achieve operational excellence.
- CBSS will transition from dedicated trainers to flex trainers, and move to a regional operating manager model from a market manager model to address headcount reductions and decreased revenue projections.
- Converting technicians
This document discusses clustering of RDF data across the Semantic Web. It begins by describing the Linking Open Data project and the growing amount of RDF data available. It then discusses the motivations for clustering RDF data, such as improving data access and query response times over distributed machines. Current approaches to RDF clustering are also summarized, including extracting instance subgraphs and computing distances between instances. The document outlines different techniques for instance extraction and distance computation in RDF clustering.
Nasim, Trading & Investment in Indian Stock Market (nasimtom)
This document provides an overview of the Indian securities market, including the primary and secondary markets. It discusses the leading stock exchanges in India - NSE and BSE - and lists some of the regional stock exchanges. It describes the trading systems used, including NEAT, and explains concepts like circuits and the transaction cycle. It also summarizes trading in derivatives and futures markets, clearing and settlement processes, and references some important websites for market data and analysis. Towards the end, it presents the author's invented formula for intraday trading based on high, low and average prices and volume.
A Discrete Krill Herd Optimization Algorithm for Community Detection (Aboul Ella Hassanien)
The document proposes a discrete krill herd optimization algorithm for community detection in complex social networks. It introduces the motivation and challenges of community detection. The proposed approach adapts the krill herd algorithm's search domain to represent community structures, using modularity as the objective function. Experimental results on four benchmark networks show the algorithm achieves good accuracy and high modularity, particularly for small to medium networks. Future work aims to improve performance on large networks through hybrid approaches.
Data-centric AI and the convergence of data and model engineering: opportunit... (Paolo Missier)
A keynote talk given at the IDEAL 2023 conference (Evora, Portugal, Nov 23, 2023).
Abstract.
The past few years have seen the emergence of what the AI community calls "Data-centric AI", namely the recognition that some of the limiting factors in AI performance are in fact in the data used for training the models, as much as in the expressiveness and complexity of the models themselves. One analogy is that of a powerful engine that will only run as fast as the quality of the fuel allows. A plethora of recent literature has begun to explore the connection between data and models in depth, along with startups that offer "data engineering for AI" services. Some concepts are well-known to the data engineering community, including incremental data cleaning, multi-source integration, or data bias control; others are more specific to AI applications, for instance the realisation that some samples in the training space are "easier to learn from" than others. In this "position talk" I will suggest that, from an infrastructure perspective, there is an opportunity to efficiently support patterns of complex pipelines where data and model improvements are entangled in a series of iterations. I will focus in particular on end-to-end tracking of data and model versions, as a way to support MLDev and MLOps engineers as they navigate through a complex decision space.
This document provides an overview of using deep learning techniques for recommender systems. It begins with establishing the need for recommender systems due to increasing information overload. It then gives a basic introduction and agenda for the talk, covering motivation, basics, deep learning for vehicle recommendations, and scalability/production. The talk discusses using deep learning approaches like wide and deep learning as well as sequential models to improve recommendation relevance for applications like vehicle recommendations. It provides details on preprocessing, training a classifier, candidate generation and ranking for recommendations. The document concludes with discussing deploying such a system at scale and current trends in recommender system research.
Recommender systems support the decision making processes of customers with personalized suggestions. These widely used systems influence the daily life of almost everyone across domains like ecommerce, social media, and entertainment. However, the efficient generation of relevant recommendations in large-scale systems is a very complex task. In order to provide personalization, engines and algorithms need to capture users’ varying tastes and find mostly nonlinear dependencies between them and a multitude of items. Enormous data sparsity and ambitious real-time requirements further complicate this challenge. At the same time, deep learning has been proven to solve complex tasks like object or speech recognition where traditional machine learning failed or showed mediocre performance.
Join Marcel Kurovski to explore a use case for vehicle recommendations at mobile.de, Germany’s biggest online vehicle market. Marcel shares a novel regularization technique for the optimization criterion and evaluates it against various baselines. To achieve high scalability, he combines this method with strategies for efficient candidate generation based on user and item embeddings—providing a holistic solution for candidate generation and ranking.
The proposed approach outperforms collaborative filtering and hybrid collaborative-content-based filtering by 73% and 143%, respectively, for MAP@5. It also scales well to millions of items and users, returning recommendations in tens of milliseconds.
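For reference, a minimal sketch of how MAP@5, the figure quoted above, is commonly computed: per-user average precision over the top-5 recommendations, averaged over all users (the data below is made up).

def average_precision_at_k(recommended, relevant, k=5):
    # Precision is accumulated at each rank where a relevant item appears.
    hits, score = 0, 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank
    return score / min(len(relevant), k) if relevant else 0.0

def map_at_k(recs, rels, k=5):
    # Mean of the per-user average precisions.
    return sum(average_precision_at_k(recs[u], rels[u], k) for u in recs) / len(recs)

recs = {"u1": ["car_a", "car_b", "car_c"], "u2": ["car_d", "car_e"]}
rels = {"u1": {"car_a", "car_c"}, "u2": {"car_x"}}
print(map_at_k(recs, rels))  # 0.416...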
Event: O'Reilly Artificial Intelligence Conference, New York, 18.04.2019
Speaker: Marcel Kurovski, inovex GmbH
More tech talks: inovex.de/vortraege
More tech articles: inovex.de/blog
This document provides an overview of an introduction to machine learning course, including:
- A description of the course content which covers Python programming, data visualization, supervised learning algorithms, regression, and unsupervised learning.
- An example of predicting bike share usage at different stations and the importance of understanding the problem and data.
- Guidance on exploring and visualizing data in Python to gain insights before applying machine learning algorithms.
This document discusses profiling linked open data. It outlines the research background, plan, and preliminary results of profiling linked open data. The research aims to automatically generate new statistics and knowledge patterns to provide dataset summaries and inspect data quality. Preliminary results include profiling Italian public administration websites for compliance with open data policies and automatically classifying over 1,000 linked data sets into 8 topics with over 80% accuracy. Future work involves enriching the framework with additional statistics and applying it to unstructured microdata.
Knowledge Discovery in Remote Access Databases (Zakaria Zubi)
This document provides an overview of the thesis which investigates knowledge discovery in remote access databases. The thesis contains three parts:
Part 1 introduces knowledge discovery in databases (KDD) and data mining (DM), and defines the goal of the thesis work.
Part 2 discusses remote access KDD models, the logical foundation of data mining, mining discovered association rules, and data mining query languages.
Part 3 proposes the Knowledge Discovery Query Language (KDQL) for mining association rules from databases and visualizing results. It also discusses I-extended databases and the implementation of KDQL.
The thesis aims to develop methods for remote knowledge discovery using query languages and by extending databases to include generalized patterns discovered through
SURFconext: a next generation collaboration infrastructure across institution... (University of Amsterdam)
This document discusses SURFconext, a next generation collaboration infrastructure across Dutch academic institutions developed by SURF foundation. It provides an overview of the Dutch academic landscape including 14 research universities and 39 higher professional education institutions. It then discusses SURFconext from the perspectives of libraries and virtual research environments. It provides examples of SURFconext implementation at the University of Amsterdam and scenarios for collaboration across institutions. It discusses lessons learned and positions SURFconext as a cloud service broker to enable access to commercial and research services.
PhD Defense - A Context Management Framework based on Wisdom of Crowds for So... (Adrien Joly)
This document summarizes a PhD thesis presentation on developing a context management framework to filter social streams and recommend the most relevant updates. It proposes using contextual tag clouds generated from virtual and social sensors to represent users' contexts. An implementation was developed to test the approach. Evaluation results found that recommended social updates were 72% accurate and about half were deemed relevant to the posting context, depending on the type of social update. Future work is proposed to improve the quality of contextual tags and leverage additional sensors.
Jisc is a UK organization that aims to advance digital technology in education and research. Their Learning Analytics project has three core parts: a learning analytics service, toolkit, and community. The service provides dashboards and tools to analyze student data from various sources to identify at-risk students and enable interventions. It follows an open architecture approach. The toolkit includes guidance on best practices like privacy and consent. The community aspect involves events, blogs, and mailing lists to bring people together around learning analytics.
This document discusses interlinking in linked data and the challenges of link discovery. It defines interlinking as the degree to which entities representing the same concept are linked to each other. It describes two categories of link discovery frameworks: ontology matching and instance matching. The key challenges of link discovery are computational complexity and selecting an appropriate link specification. Current approaches include domain-specific and universal frameworks, and active learning techniques can help guide selection of optimal link specifications.
Early Analysis and Debugging of Linked Open Data Cubes (Enrico Daga)
The release of the Data Cube Vocabulary specification introduces a standardised method for publishing statistics following the linked data principles. However, a statistical dataset can be very complex, and so understanding how to get value out of it may be hard. Analysts need the ability to quickly grasp the content of the data to be able to make use of it appropriately. In addition, while remodelling the data, data cube publishers need support to detect bugs and issues in the structure or content of the dataset. There are, however, several aspects of RDF, the Data Cube vocabulary and linked data that can help with these issues, including the fact that they make the data "self-descriptive". Here, we attempt to answer the question "How feasible is it to use this feature to give an overview of the data in a way that would facilitate debugging and exploration of statistical linked open data?" We present a tool that automatically builds interactive facets as diagrams out of a Data Cube representation, without prior knowledge of the data content, to be used for debugging and early analysis. We show how this tool can be used on a large, complex dataset and we discuss the potential of this approach.
Knowledge Sharing over social networking systems (tanguy)
1. The document discusses knowledge sharing over social networking systems and analyzes data from the social networking site Ecademy.
2. The Ecademy data showed a power law distribution structure typical of social networks and small world properties with short paths between users.
3. A survey of Ecademy users found that face-to-face relationships positively influenced relationship strength and knowledge sharing, though the site mainly facilitated weak relationships.
Adolfo Ruiz Calleja, "Using social and semantic tech" (ifi8106tlu)
This document outlines a social-semantic infrastructure called SEEK-AT-WD to help educators discover and select ICT tools. It proposes using semantic technologies to publish descriptions of educational ICT tools as linked open data. This would allow tool descriptions to be obtained from multiple sources on the web of data and enriched by both educators and the web community. An ontology called SEEK Ontology is developed to provide semantic structure to the tool descriptions. The SEEK-AT-WD infrastructure implements approaches like crawling other datasets, automatically mapping data to the ontology, and building a semantic knowledge base to store and publish the linked data.
Afternoon session data dictionary (April 2013) (OpenOrganize)
The document discusses the goal of creating a data dictionary to improve the comparability and automation of comparing greenhouse gas, black carbon, and co-emitted air pollutant inventories. The final products will include a paper and supplementary website containing diagrams and schemas relating core objects in inventories. These include inventory, pollutant, sector, source category, estimation methodology, and emission metric classes. The framework aims to standardize key elements while allowing inventories to maintain individual formats and estimation approaches. It could enable centralized data hosting and decentralized markup to make inventories machine-readable.
Tangible Contextual Tag Clouds towards Controlled and Relevant Social Inter... (Adrien Joly)
Presented by Adrien Joly at Bell Labs France during a "SKP" session, this slideshow includes a motivated introduction to his PhD thesis subject on contextual filtering of social interactions, its technical approach relying on "contextual tag clouds", and its current state of research.
Collaborative Knowledge Management in Organization from SECI model Framework (Natapone Charsombut)
A presentation for the TIIM 2010 conference, Pattaya, Thailand.
ABSTRACT
In the age of social collaboration and sharing enabled by Web 2.0 and Linked Data, many organizations are adapting to take advantage of interaction, sharing, reuse, interoperability and collaboration on the World Wide Web. Organizational learning, a subfield of knowledge management, also benefits greatly from this emerging collaboration culture. It provides the ability to share valuable insights, reduce redundant work, avoid reinventing the wheel, reduce training time for new employees, retain intellectual capital despite employee turnover, and adapt to changing environments and markets.
However, user-created content from Web 2.0, multiplied by the structured data published according to the Linked Data principles, amounts to a massive volume of data, and facing this data overload is inevitable. Traditional knowledge management is not designed to extract knowledge from social collaboration. We need a framework fit for knowledge transfer in a highly interactive environment.
The SECI model, a knowledge management model based on collaborative knowledge transfer in organizations, seems to be the best candidate for navigating knowledge creation in this case. This study attempts to address how to apply the SECI model to knowledge management systems in collaborative organizations.
[ADBIS 2021] - Optimizing Execution Plans in a Multistore (Chiara Forresi)
Multistores are data management systems that enable query processing across different database management systems (DBMSs); besides the distribution of data, complexity factors like schema heterogeneity and data replication must be resolved through integration and data fusion activities. In a recent work [2], we have proposed a multistore solution that relies on a dataspace to provide the user with an integrated view of the available data and enables the formulation and execution of GPSJ (generalized projection, selection and join) queries. In this paper, we propose a technique to optimize the execution of GPSJ queries by finding the most efficient execution plan on the multistore. In particular, we devise three different strategies to carry out joins and data fusion, and we build a cost model to enable the evaluation of different execution plans. Through the experimental evaluation, we are able to profile the suitability of each strategy to different multistore configurations, thus validating our multi-strategy approach and motivating further research on this topic.
A modified k-means algorithm for big data clustering (SK Ahammad Fahad)
The amount of data is getting bigger every moment, and this data comes from everywhere: social media, sensors, search engines, GPS signals, transaction records, satellites, financial markets, ecommerce sites, etc. This large volume of data may be semi-structured, unstructured or even structured, so it is important to derive meaningful information from this huge data set. Clustering is the process of categorizing data such that data are grouped in the same cluster when they are similar according to specific metrics. In this paper, we work on the k-means clustering technique to cluster big data. Several methods have been proposed for improving the performance of the k-means clustering algorithm. We propose a method for making the algorithm less time consuming and more effective and efficient, for better clustering with reduced complexity. According to our observation, the quality of the resulting clusters heavily depends on the selection of the initial centroids and on the changes of data points between clusters in the subsequent iterations. As we know, after a certain number of iterations, only a small part of the data points change their clusters. Therefore, our proposed method first finds the initial centroids and then separates those data elements which will not change their cluster from those which may change their cluster in the subsequent iterations, which reduces the workload significantly for very large data sets. We evaluate our method with different sets of data and compare it with other methods as well.
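As a rough illustration of the skipping idea described above (not the authors' exact procedure), the sketch below caches each point's distance to its assigned centroid; if that distance has not grown after the centroids move, the point is assumed to stay in its cluster and the full distance search is skipped.

import numpy as np

def kmeans_with_skipping(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)].astype(float)
    assign = np.full(len(X), -1)          # current cluster of each point
    cached = np.full(len(X), np.inf)      # last distance to own centroid
    for _ in range(iters):
        moved = False
        for i, x in enumerate(X):
            if assign[i] >= 0:
                d_own = np.linalg.norm(x - centroids[assign[i]])
                if d_own <= cached[i]:    # heuristic: point assumed stable
                    cached[i] = d_own
                    continue
            dists = np.linalg.norm(centroids - x, axis=1)  # full search
            j = int(dists.argmin())
            if j != assign[i]:
                moved = True
            assign[i], cached[i] = j, dists[j]
        for j in range(k):                # recompute centroids
            members = X[assign == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
        if not moved:
            break
    return assign, centroids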
Similar to Finding Commonalities: from Description Logics to the Web of Data (20)
Applications of artificial Intelligence in Mechanical Engineering.pdf (Atif Razi)
Historically, mechanical engineering has relied heavily on human expertise and empirical methods to solve complex problems. With the introduction of computer-aided design (CAD) and finite element analysis (FEA), the field took its first steps towards digitization. These tools allowed engineers to simulate and analyze mechanical systems with greater accuracy and efficiency. However, the sheer volume of data generated by modern engineering systems and the increasing complexity of these systems have necessitated more advanced analytical tools, paving the way for AI.
AI offers the capability to process vast amounts of data, identify patterns, and make predictions with a level of speed and accuracy unattainable by traditional methods. This has profound implications for mechanical engineering, enabling more efficient design processes, predictive maintenance strategies, and optimized manufacturing operations. AI-driven tools can learn from historical data, adapt to new information, and continuously improve their performance, making them invaluable in tackling the multifaceted challenges of modern mechanical engineering.
Use PyCharm for remote debugging of WSL on a Windows machine (shadow0702a)
This document serves as a comprehensive step-by-step guide on how to effectively use PyCharm for remote debugging of the Windows Subsystem for Linux (WSL) on a local Windows machine. It meticulously outlines several critical steps in the process, starting with the crucial task of enabling permissions, followed by the installation and configuration of WSL.
The guide then proceeds to explain how to set up the SSH service within the WSL environment, an integral part of the process. Alongside this, it also provides detailed instructions on how to modify the inbound rules of the Windows firewall to facilitate the process, ensuring that there are no connectivity issues that could potentially hinder the debugging process.
The document further emphasizes on the importance of checking the connection between the Windows and WSL environments, providing instructions on how to ensure that the connection is optimal and ready for remote debugging.
It also offers an in-depth guide on how to configure the WSL interpreter and files within the PyCharm environment. This is essential for ensuring that the debugging process is set up correctly and that the program can be run effectively within the WSL terminal.
Additionally, the document provides guidance on how to set up breakpoints for debugging, a fundamental aspect of the debugging process which allows the developer to stop the execution of their code at certain points and inspect their program at those stages.
Finally, the document concludes by providing a link to a reference blog. This blog offers additional information and guidance on configuring the remote Python interpreter in PyCharm, providing the reader with a well-rounded understanding of the process.
Design and optimization of ion propulsion drone (bjmsejournal)
Electric propulsion technology has been widely used in many kinds of vehicles in recent years, and aircraft are no exception. Technically, UAVs are electrically propelled but tend to produce a significant amount of noise and vibration. Ion propulsion technology for drones is a potential solution to this problem, and it has been proven feasible in the Earth's atmosphere. The study presented in this article covers the design of EHD thrusters and the power supply for ion propulsion drones, along with performance optimization of the high-voltage power supply for endurance in the Earth's atmosphere.
Introduction – e-waste: definition – sources of e-waste – hazardous substances in e-waste – effects of e-waste on environment and human health – need for e-waste management – e-waste handling rules – waste minimization techniques for managing e-waste – recycling of e-waste – disposal treatment methods of e-waste – mechanism of extraction of precious metals from leaching solution – global scenario of e-waste – e-waste in India – case studies.
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024 (Sinan KOZAK)
Sinan from the Delivery Hero mobile infrastructure engineering team shares a deep dive into performance acceleration with Gradle build cache optimizations. Sinan shares their journey into solving complex build-cache problems that affect Gradle builds. By understanding the challenges and solutions found in our journey, we aim to demonstrate the possibilities for faster builds. The case study reveals how overlapping outputs and cache misconfigurations led to significant increases in build times, especially as the project scaled up with numerous modules using Paparazzi tests. The journey from diagnosing to defeating cache issues offers invaluable lessons on maintaining cache integrity without sacrificing functionality.
Batteries: Introduction – types of batteries – discharging and charging of a battery – characteristics of a battery – battery rating – various tests on batteries – primary battery: silver button cell – secondary battery: Ni-Cd battery – modern battery: lithium-ion battery – maintenance of batteries – choice of batteries for electric vehicle applications.
Fuel Cells: Introduction – importance and classification of fuel cells – description, principle, components and applications of fuel cells: H2-O2 fuel cell, alkaline fuel cell, molten carbonate fuel cell and direct methanol fuel cells.
Comparative analysis between traditional aquaponics and reconstructed aquapon... (bijceesjournal)
The aquaponic system of planting is a method that does not require soil usage. It is a method that only needs water, fish, lava rocks (a substitute for soil), and plants. Aquaponic systems are sustainable and environmentally friendly. Its use not only helps to plant in small spaces but also helps reduce artificial chemical use and minimizes excess water use, as aquaponics consumes 90% less water than soil-based gardening. The study applied a descriptive and experimental design to assess and compare conventional and reconstructed aquaponic methods for reproducing tomatoes. The researchers created an observation checklist to determine the significant factors of the study. The study aims to determine the significant difference between traditional aquaponics and reconstructed aquaponics systems propagating tomatoes in terms of height, weight, girth, and number of fruits. The reconstructed aquaponics system’s higher growth yield results in a much more nourished crop than the traditional aquaponics system. It is superior in its number of fruits, height, weight, and girth measurement. Moreover, the reconstructed aquaponics system is proven to eliminate all the hindrances present in the traditional aquaponics system, which are overcrowding of fish, algae growth, pest problems, contaminated water, and dead fish.
An improved modulation technique suitable for a three level flying capacitor ... (IJECEIAES)
This research paper introduces an innovative modulation technique for controlling a 3-level flying capacitor multilevel inverter (FCMLI), aiming to streamline the modulation process in contrast to conventional methods. The proposed simplified modulation technique paves the way for more straightforward and efficient control of multilevel inverters, enabling their widespread adoption and integration into modern power electronic systems. Through the amalgamation of sinusoidal pulse width modulation (SPWM) with a high-frequency square wave pulse, this controlling technique attains energy equilibrium across the coupling capacitor. The modulation scheme incorporates a simplified switching pattern and a decreased count of voltage references, thereby simplifying the control algorithm.
Data Control Language.pptx
Finding Commonalities: from Description Logics to the Web of Data
1. Finding Commonalities in Linked Open Data
Silvia Giannini
PhD Student
(Supervisor: Prof. Eugenio Di Sciascio)
Dipartimento di Ingegneria Elettrica e dell'Informazione (DEI),
Politecnico di Bari, Bari, Italy
in collaboration with
Prof. Francesco M. Donini, Ph.D. Simona Colucci
Web&Media Group Meeting | 31 March, 2014
2. Outline
1 Finding Commonalities: A DLs use case
The I.M.P.A.K.T. system
The Core Competence module
2 Finding Commonalities: the Web of Data
3 Conclusion
3. The I.M.P.A.K.T. system
What is I.M.P.A.K.T.
Information Management and Processing with the Aid of Knowledge-based Technologies
An integrated system managing three enterprise business services based on knowledge management:
1 Skill Matching [1]
2 Team Composition [2]
3 Core Competence Extraction [3]
[1] E. Tinelli, S. Colucci, S. Giannini, E. Di Sciascio, and F.M. Donini, Large scale skill matching through knowledge compilation. In: Proc. of ISMIS 2012, Springer-Verlag (2012) 192–201.
[2] E. Tinelli, S. Colucci, E. Di Sciascio, and F.M. Donini, Knowledge compilation for automated team composition exploiting standard SQL. In: Proc. of SAC 2012, ACM (2012) 1680–1685.
[3] S. Colucci, E. Tinelli, S. Giannini, E. Di Sciascio, and F.M. Donini, Knowledge Compilation for Core Competence Extraction in Organizations. In: Proc. of Business Information Systems 2013, Springer (2013) 163–174.
5. The I.M.P.A.K.T. system
What is I.M.P.A.K.T.
Skill Matching GUI (screenshot)
6. The I.M.P.A.K.T. system
Behind I.M.P.A.K.T.
An ontology for the HR domain (nearly 5000 concepts)
T-Box modules: Employee Profile (M0), Industry (M1), Complementary Skill (M2), Level (M3), Language (M5), Job Title (M6), Knowledge (M4)
Main module M0: it models the properties (entry points) needed to import all the sections describing an employee CV.
7. The I.M.P.A.K.T. system
Behind I.M.P.A.K.T.
An ontology for the HR domain (nearly 5000 concepts)
T-Box modules: Employee Profile (M0), Industry (M1), Complementary Skill (M2), Level (M3), Language (M5), Job Title (M6), Knowledge (M4)
Possible employee skills and the ability to use technical tools, specified through:
type - experience role (e.g., developer, administrator)
year - experience level
lastdate - last temporal update of the work experience
8. The I.M.P.A.K.T. system
Behind I.M.P.A.K.T.
A Curriculum Vitae representation
A-Box
A profile P = ⊓j (∃R0j.C) is a concept in ALE(D), where R0j, 1 ≤ j ≤ 6, is an entry point, and C is a concept in FL0(D) modeled in Mj.
9. The Core Competence module
What is a Core Competence
Core Competence: a Knowledge Management process
Core competencies are a company's collective knowledge about how to coordinate diverse production skills and integrate multiple streams of technologies. Identifying core competencies helps support competitive advantage, articulate a strategic intent, and allocate resources to build cross-unit technological and production links.
(G. Hamel and C.K. Prahalad, The core competence of the corporation, Harvard Business Review, May-June (1990) 79–90)
Examples:
Apple - design
Netflix - content delivery
Google - expertise in algorithms
...
10. The Core Competence module
The reasoning service
Objective: automatically extract the Core Competence, by identifying a common know-how in a significant portion of the personnel (k employees, with k set as a threshold value by the people in charge of the strategic analysis).
Tool:
Logic-based approach
Non-standard inference services (LCS, k-CS, BICS)
Method:
Knowledge-compilation process
It solves subsumption only via SQL queries against a proper R-DB schema, without any exponential-time inference engine
11. The Core Competence module
A logic-based approach
Least Common Subsumer (LCS): let C1, ..., Cn be a collection of n concepts in a DL L. The Least Common Subsumer (LCS) of C1, ..., Cn is a concept D in L such that D is the most specific concept subsuming all the elements of the collection.
k-Common Subsumer (k-CS): let C1, ..., Cn be a collection of n concepts in a DL L and let k < n. A k-Common Subsumer (k-CS) of C1, ..., Cn is a concept D in L such that D is an LCS of k concepts among C1, ..., Cn.
Informative k-Common Subsumer (IkCS): given k < n, an Informative k-Common Subsumer (IkCS) of the concepts C1, ..., Cn in a DL L is a concept D such that D is a k-CS strictly subsumed by LCS(C1, ..., Cn), thus adding informative content to it.
Best Informative Common Subsumer (BICS): given k < n, a Best Informative Common Subsumer (BICS) of the concepts C1, ..., Cn in a DL L is a concept B such that B is an IkCS for C1, ..., Cn and, for every j with k < j ≤ n, every j-CS is not informative.
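In the simplified setting used later in the deck, where each profile is reduced to the set of atomic concept components subsuming it, these definitions become set operations: the LCS collects the components shared by all profiles, and a k-CS the components shared by some k of them. A minimal sketch of that correspondence (the profile data below is hypothetical):

from itertools import combinations

profiles = {  # hypothetical profiles -> atomic subsumer components
    "P1": {"D1", "D2", "D3"},
    "P2": {"D1", "D2"},
    "P3": {"D1", "D6"},
}

def lcs(profiles):
    # Components shared by every profile: the LCS in this setting.
    return set.intersection(*profiles.values())

def k_common_subsumers(profiles, k):
    # For every k-subset of profiles, the components they all share.
    return {group: set.intersection(*(profiles[p] for p in group))
            for group in combinations(sorted(profiles), k)}

def informative_k_cs(profiles, k):
    # k-CSs strictly more specific than the LCS: the IkCSs.
    top = lcs(profiles)
    return {g: cs for g, cs in k_common_subsumers(profiles, k).items()
            if cs > top}  # strict superset = strictly more informative

print(lcs(profiles))                 # {'D1'}
print(informative_k_cs(profiles, 2)) # {('P1', 'P2'): {'D1', 'D2'}}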
12. The Core Competence module
The Knowledge Compilation process
Issues:
Computational difficulties of deduction in knowledge bases expressed through a logical formalism;
Combining the representation power of a logical language with the scalability and efficiency of information processing in a DBMS.
Knowledge Compilation:
1 OFF-LINE REASONING: pre-processing of a company's intellectual capital, described in a Description Logics (DLs) Knowledge Base (KB), into an appropriate relational database schema.
2 ON-LINE REASONING: querying of the data structure resulting from the first phase through standard SQL queries for efficient Core Competence Extraction.
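A toy rendering of this off-line/on-line split, under a drastically simplified schema (a single table of atomic subsumers per profile, rather than the CONCEPT/Rj/PROFILE design described in the next slides; all rows are hypothetical):

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# OFF-LINE: compile the KB once -- store, for every profile, every atomic
# concept component that subsumes it (in I.M.P.A.K.T. this is derived
# from the DL KB by a reasoner; here it is given directly).
cur.execute("CREATE TABLE subsumer (profile_id TEXT, component TEXT)")
cur.executemany("INSERT INTO subsumer VALUES (?, ?)", [
    ("P1", "D1"), ("P1", "D2"), ("P1", "D3"),
    ("P2", "D1"), ("P2", "D2"),
    ("P3", "D1"), ("P3", "D6"),
])

# ON-LINE: commonality extraction reduces to plain SQL -- here, the
# components shared by at least k profiles (candidate Core Competences).
k = 2
cur.execute("""
    SELECT component, COUNT(DISTINCT profile_id) AS n
    FROM subsumer
    GROUP BY component
    HAVING n >= ?
""", (k,))
print(cur.fetchall())  # e.g. [('D1', 3), ('D2', 2)]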
13. The Core Competence module
CV translation (figure)
14. The Core Competence module
OFF-LINE REASONING: Relational schema design rules
T-Box informative content
Table CONCEPT: it stores the CCNF of all the FL0(D) concepts (part (a))
15. The Core Competence module
OFF-LINE REASONING: Relational schema design rules
T-Box informative content
A table is created for each entry point R0j, j > 0 (part (b))
16. The Core Competence module
OFF-LINE REASONING: Relational schema design rules
A-Box informative content
Each atom of the CCNF(C) of a conjunct ∃R0j.C is stored in a different tuple of table Rj with the same groupID (part (b))
17. The Core Competence module
OFF-LINE REASONING: Relational schema design rules
A-Box informative content
Table PROFILE includes the profileID and extra-ontological structured information (e.g., personal data, work-related information) (part (b))
18. The Core Competence module
ON-LINE REASONING: The Core Competence Extraction Algorithm
1 Profiles Subsumers Matrix computation
Idea: extract the common know-how, expressed in the form of atomic information, shared by the same group of employees, with cardinality greater than or equal to k.
Example
Mario Rossi: Cplusplus (5 years), Java (5 years), Visual Basic (5 years)
Daniela Bianchi: Cplusplus (2 years), Java (6 years), Visual Basic (1 year)
Elena Pomarico: CplusPlus, Java, Visual Basic
Carmelo Piccolo: VBScript, Process Performance Monitoring
Lucio Battista: DBMS (2 years)
Mariangela Porro: DBMS (2 years), Internet Technologies (2 years)
Nicola Marco: DBMS (5 years), Internet Technologies (5 years)
Domenico De Palo: OOprogramming (6 years), Artificial Intelligence (4 years), Internet Technologies (4 years)
19. The Core Competence module
The Core Competence Extraction Algorithm
1 Profiles Subsumers Matrix computation
Idea: extract the common know-how, expressed in the form of atomic information, shared by the same group of employees, with cardinality greater than or equal to k.
      D1 D2 D3 D4 D5 D6 D7 D8 D9 D10 D11 ...
P1     1  1  1  1  1  0  1  0  1   1   1 ...
P2     1  1  1  1  1  0  1  0  1   1   1 ...
P3     1  1  0  0  0  1  0  0  0   0   0 ...
P4     1  1  0  0  0  1  0  1  0   0   0 ...
P5     1  1  0  0  1  1  0  1  0   0   0 ...
P6     1  0  1  0  0  0  0  0  0   0   0 ...
P7     1  0  1  1  0  0  0  0  1   1   1 ...
P8     1  1  1  1  1  0  1  1  0   0   0 ...
Table: portion of the Profile Subsumers Matrix for the previous example
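The Profile Subsumers Matrix is simply a boolean matrix with one row per profile and one column per concept component, so shared know-how can be read off its column sums: any column whose sum reaches the threshold k identifies a component shared by at least k employees. A minimal sketch over the rows shown above:

import numpy as np

# Rows: profiles P1..P8; columns: D1..D11 (values from the table above).
psm = np.array([
    [1,1,1,1,1,0,1,0,1,1,1],
    [1,1,1,1,1,0,1,0,1,1,1],
    [1,1,0,0,0,1,0,0,0,0,0],
    [1,1,0,0,0,1,0,1,0,0,0],
    [1,1,0,0,1,1,0,1,0,0,0],
    [1,0,1,0,0,0,0,0,0,0,0],
    [1,0,1,1,0,0,0,0,1,1,1],
    [1,1,1,1,1,0,1,1,0,0,0],
], dtype=bool)

k = 3
support = psm.sum(axis=0)  # number of employees sharing each component
shared = [f"D{j + 1}" for j in np.flatnonzero(support >= k)]
print(shared)              # components shared by at least k employees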
20. The Core Competence module
The Core Competence Extraction Algorithm
1 Profiles Subsumers Matrix computation
Idea: extract the common know-how, expressed in the form of atomic information, shared by the same group of employees, with cardinality greater than or equal to k.
D1: ∃hasKnowledge.ComputerScienceSkill
D2: ∃hasKnowledge.(ComputerScienceSkill ≥2 years)
D3: ∃hasKnowledge.ProgrammingLanguage
D4: ∃hasKnowledge.OOP
D5: ∃hasKnowledge.(ComputerScienceSkill ≥5 years)
D6: ∃hasKnowledge.(DBMS ≥2 years)
D7: ∃hasKnowledge.(OOP ≥5 years)
D8: ∃hasKnowledge.(InternetTechnologies ≥2 years)
D9: ∃hasKnowledge.C++
D10: ∃hasKnowledge.VisualBasic
D11: ∃hasKnowledge.Java
...
Table: description of the components D1, ..., D11 reported in the previous table
21. The Core Competence module
The Core Competence Extraction Algorithm
1 Profiles Subsumers Matrix computation (figure)
22. Finding Commonalities: A DLs use case Finding Commonalities: the Web of Data Conclusion
The Core Competence module
The Core Competence Extraction Algorithm
2 Common Subsumers enumeration
Referring to the PSM of the set P = {P(a1), . . . , P(an)}, and to a concept
component Dk ∈ {D1, . . . , Dm} deriving from P, a Core Competence is the
union of the most specic features (i.e., prole concept components Dj) shared
by the same group of k employees, where k is a predened threshold.
From the example PSM above, the enumeration step extracts, among others:
LCS = ∃hasKnowledge.ComputerScienceSkill (shared by all eight profiles)
BICS = ∃hasKnowledge.(ComputerScienceSkill ⊓ (≥ 5 years)) (shared by P1, P2, P5, P8)
ICS3 = ∃hasKnowledge.(DBMS ⊓ (≥ 2 years)) (shared by P3, P4, P5)
ICS3 = ∃hasKnowledge.(OOP ⊓ (≥ 5 years)) (shared by P1, P2, P8)
ICS3 = ∃hasKnowledge.(InternetTechnologies ⊓ (≥ 2 years)) (shared by P4, P5, P8)
ICS3 = ∃hasKnowledge.C++ ⊓ ∃hasKnowledge.VisualBasic ⊓ ∃hasKnowledge.Java (shared by P1, P2, P7)
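To make the enumeration idea concrete, here is a minimal sketch (hypothetical, not the actual algorithm): reading each PSM column as the set of profiles exhibiting that component, candidate Core Competences for a threshold k are the component sets shared by the same group of at least k profiles, much like closed itemsets in frequent-pattern mining:

# Sketch only: toy PSM columns, component -> profiles exhibiting it.
psm_cols = {
    "D6": {"P3", "P4", "P5"},
    "D8": {"P4", "P5", "P8"},
    "D9": {"P1", "P2", "P7"},
    "D10": {"P1", "P2", "P7"},
    "D11": {"P1", "P2", "P7"},
}

def shared_components(k):
    """Group components by the exact set of profiles sharing them,
    keeping only groups of at least k profiles."""
    groups = {}
    for comp, profs in psm_cols.items():
        if len(profs) >= k:
            groups.setdefault(frozenset(profs), set()).add(comp)
    return groups

# Each entry is a candidate ICS_k: the conjunction of all components
# shared by that group of >= k employees.
for group, comps in shared_components(3).items():
    print(sorted(group), "->", " ⊓ ".join(sorted(comps)))
# ['P1', 'P2', 'P7'] -> D10 ⊓ D11 ⊓ D9  (i.e., C++, Visual Basic and Java)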
The Core Competence module
Core Competence module GUI
[Screenshots of the Core Competence module GUI]
Lessons learned
Proposal: a Knowledge Compilation approach for Core Competence extraction.
+ It improves performance, in terms of execution time, w.r.t. the classical
logic-based approach.
+ It adopts standard SQL queries to compute the same informative content
as advanced inference services (see the sketch below).
+ It makes the computational cost of the process affordable also for large
organizations, while retaining the full expressiveness of the logic-based
approaches.
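As a flavor of such queries (a sketch over a hypothetical compiled schema, not the actual I.M.P.A.K.T. one), counting how many employees share a stored concept component reduces to a plain GROUP BY:

import sqlite3

# Hypothetical compiled schema: one row per (profile, subsuming component).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE psm (profile_id TEXT, component_id TEXT)")
conn.executemany("INSERT INTO psm VALUES (?, ?)",
                 [("P3", "D6"), ("P4", "D6"), ("P5", "D6"), ("P4", "D8")])

# Components shared by at least k employees, via standard SQL only.
k = 3
rows = conn.execute("""
    SELECT component_id, COUNT(*) AS n
    FROM psm
    GROUP BY component_id
    HAVING n >= ?
""", (k,)).fetchall()
print(rows)  # [('D6', 3)]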
Notes on performance:
The number of profiles strongly affects the common subsumers
enumeration process.
The most computationally expensive step is the creation of the profile
subsumers matrix, under a threshold on the number of profile concept components.
Outline
1 Finding Commonalities: A DLs use case
2 Finding Commonalities: the Web of Data
Common Subsumer in RDF
RDF Clustering
3 Conclusion
Common Subsumer in RDF
Motivation
Learning from the Web of Data:
huge amount of interconnected and machine-understandable data
data modeled as RDF resources
datasets referred to as Linked (Open) Data (LOD).
Facts to learn:
identification of subsets of resources related to a common informative
content
- Cluster search (approximate matching)
- Disambiguation
- Personalization
Problem Definition
In analogy with the LCS service, proposed in DLs for learning from examples.
Adaptation to the Web of Data:
giving up the subsumption-minimality requirement: even rough
Common Subsumers are useful for learning in the Web of Data
definition of a Common Subsumer of pairs of RDF resources
Definition (Rooted Graph (r-graph))
Let TWr be the set of all triples with subject r in the Web. A Rooted Graph
(r-graph) is a pair ⟨r, Tr⟩, where
1 r is either the URI of an RDF resource, or a blank node
2 Tr = {t | t = (r p c)} is a subset of relevant triples in TWr
Example: A Possible Representation for resources a and b [figure]
Example: A(nother) Possible Representation for resources a and b [figure]
Common Subsumer
Definition (Common Subsumer)
Let ⟨a, Ta⟩, ⟨b, Tb⟩ be two r-graphs and x, w, y be blank nodes.
If ⟨a, Ta⟩ = ⟨b, Tb⟩, then ⟨a, Ta⟩ is a Common Subsumer of ⟨a, Ta⟩, ⟨b, Tb⟩.
If Ta = ∅ or Tb = ∅, the pair ⟨x, ∅⟩ is a Common Subsumer of ⟨a, Ta⟩, ⟨b, Tb⟩.
Otherwise, a pair ⟨x, T⟩ is a Common Subsumer of ⟨a, Ta⟩, ⟨b, Tb⟩ iff:

  ∃t = (x w y) such that (T entails t)
  ⇒ ∃t1 = (a p c), t2 = (b q d) such that (T entails t1) ∧ (T entails t2)   (1)

where Ta ⊆ T, Tb ⊆ T, ⟨w, T⟩ is a Common Subsumer of ⟨p, Tp⟩ and ⟨q, Tq⟩,
and ⟨y, T⟩ is a Common Subsumer of ⟨c, Tc⟩ and ⟨d, Td⟩.
Note: we consider only simple entailment.
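To fix ideas, a highly simplified sketch of how a Common Subsumer could be computed recursively over r-graphs (hypothetical data structures and a crude depth bound in place of the actual triple-selection and entailment machinery):

import itertools

# Hypothetical r-graphs: resource -> set of (predicate, object) pairs.
graphs = {
    "a": {("ocd:rif_leg", "leg:10"), ("foaf:gender", "female")},
    "b": {("ocd:rif_leg", "leg:10"), ("foaf:gender", "female")},
    "ocd:rif_leg": set(), "foaf:gender": set(),
    "leg:10": set(), "female": set(),
}

fresh = itertools.count()

def common_subsumer(a, b, depth=2):
    """Return (root, triples) generalizing the r-graphs of a and b.
    Blank nodes are plain strings starting with '_:'."""
    if a == b:                      # identical resources: keep as-is
        return a, {(a, p, o) for p, o in graphs.get(a, set())}
    x = f"_:cs{next(fresh)}"
    if depth == 0:
        return x, set()             # the pair (x, empty set) is always a CS
    triples = set()
    for (p, c) in graphs.get(a, set()):
        for (q, d) in graphs.get(b, set()):
            w, tw = common_subsumer(p, q, depth - 1)
            y, ty = common_subsumer(c, d, depth - 1)
            # discard triples with blank nodes in both the predicate
            # and object positions (they carry no information)
            if not (w.startswith("_:") and y.startswith("_:")):
                triples |= {(x, w, y)} | tw | ty
    return x, triples

root, t = common_subsumer("a", "b")
for triple in sorted(t):
    print(triple)    # e.g. ('_:cs0', 'foaf:gender', 'female')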
Example: a Common Subsumer of a and b [figures: step-by-step construction]
Note: triples with a blank node in both the predicate and object positions are discarded.
Example: a(nother) Common Subsumer of a and b [figures: an alternative construction]
Solving Algorithm
Main Features:
anytime: if interrupted, it always returns a Common Subsumer of the
input pair of RDF resources
modular: it takes as input a function computing the sets of triples relevant
for the input RDF resources
Our current criteria for triple selection (see the sketch below):
triples within a given graph distance from the input resource
triples having properties within a selected set of significant properties
for the dataset/application of interest
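One possible realization of such a selection function (a sketch on our part, using rdflib and a hypothetical whitelist of significant properties):

from rdflib import Graph, URIRef

# Hypothetical whitelist of significant properties for the application.
SIGNIFICANT = {URIRef("http://xmlns.com/foaf/0.1/gender"),
               URIRef("http://dati.camera.it/ocd/rif_mandatoCamera")}

def relevant_triples(g: Graph, root, max_depth=2):
    """Select the triples reachable from `root` within `max_depth` hops,
    keeping only those whose property is in the whitelist."""
    selected, frontier = set(), {root}
    for _ in range(max_depth):
        next_frontier = set()
        for s in frontier:
            for p, o in g.predicate_objects(subject=s):
                if p in SIGNIFICANT:
                    selected.add((s, p, o))
                    next_frontier.add(o)
        frontier = next_frontier
    return selected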
Output: a Common Subsumer of two r-graphs ⟨a, Ta⟩ and ⟨b, Tb⟩, i.e.,
a pair made up of a resource (anonymous or not) and a set of triples
stating facts about that resource which are true for both a and b.
Alternative cases:
⟨_:cs, T⟩: a blank node _:cs together with a set of triples T related to _:cs
⟨a, Ta⟩, iff ⟨a, Ta⟩ = ⟨b, Tb⟩
⟨_:cs, ∅⟩, if either Ta = ∅ or Tb = ∅
RDF Clustering
Target Semantic Web Task
Clustering of Web resources with a CS:
retrieving resources conveying the same information
in their different RDF descriptions
CS description → SPARQL query (as sketched below):
WHERE { Tcs [blank nodes → variables] }
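A minimal sketch of this rewriting (assuming the CS triples are plain string tuples, as in the earlier sketches):

def cs_to_sparql(triples):
    """Rewrite a Common Subsumer's triple set into a SPARQL SELECT query:
    every blank node _:x becomes a variable ?x."""
    def term(t):
        if t.startswith("_:"):
            return "?" + t[2:]      # blank node -> variable
        if t.startswith("http"):
            return f"<{t}>"         # URI
        return f'"{t}"'             # plain literal
    patterns = " .\n  ".join(f"{term(s)} {term(p)} {term(o)}"
                             for s, p, o in sorted(triples))
    return f"SELECT DISTINCT ?cs0 WHERE {{\n  {patterns} .\n}}"

cs = {("_:cs0", "http://xmlns.com/foaf/0.1/gender", "female")}
print(cs_to_sparql(cs))
# SELECT DISTINCT ?cs0 WHERE {
#   ?cs0 <http://xmlns.com/foaf/0.1/gender> "female" .
# }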
Clustering with a CS: A use case
The Italian Chamber of Deputies LOD
Public SPARQL endpoint (http://dati.camera.it/sparql)
Running example: Find the commonalities between deputies Nilde Iotti
and Tina Anselmi in the 10th Legislature
A CS of the two deputies' r-graphs, rewritten as a SPARQL query (excerpt):

SELECT DISTINCT ?x0
WHERE {
  ?x0 a <http://dati.camera.it/ocd/deputato> .
  ?x0 <http://xmlns.com/foaf/0.1/gender> "female" .
  ?x0 <http://dati.camera.it/ocd/rif_mandatoCamera> ?x1 .
  . . .
}
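For completeness, a sketch of running such a query against the public endpoint (assuming the SPARQLWrapper package; the query is abbreviated here):

from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("http://dati.camera.it/sparql")
endpoint.setQuery("""
SELECT DISTINCT ?x0 WHERE {
  ?x0 a <http://dati.camera.it/ocd/deputato> .
  ?x0 <http://xmlns.com/foaf/0.1/gender> "female" .
}
""")
endpoint.setReturnFormat(JSON)

# Each binding is a resource matching the Common Subsumer,
# i.e., a member of the induced cluster.
results = endpoint.query().convert()
for row in results["results"]["bindings"]:
    print(row["x0"]["value"])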
1st Legislature clusters [figure]
Outline
1 Finding Commonalities: A DLs use case
2 Finding Commonalities: the Web of Data
3 Conclusion
Conclusion
Motivation: learning shared informative content in collections of RDF
resources
Problem Definition: searching for Common Subsumers that are not
subsumption-minimal, in order to ensure computability in the Web of Data,
which is too large to be fully explored
Results:
An anytime algorithm computing Common Subsumers of pairs of RDF
resources:
allowing the partially learned informative content to be used for further
processing whenever the search for Common Subsumers is interrupted
possibly supporting the clustering of collections of RDF resources, by
exploiting the associativity of Common Subsumers.
Future work:
Extension of the CS definition to other entailment regimes
Investigation of methods for the selection of relevant triples
Automated link-traversal techniques for broader dataset exploration
Application to data-quality problems (e.g., missing values)