Semantic indexing modelling of resources in
personal and collective memories based on a P2P
approach
Cristian LAI

Thesis Director: Claude MOULIN

PhD Defense – July 6, 2011

1 / 51
Outline

G

Motivation

G

Semantic Indexing

G

Indexing Patterns

G

Community of Users

G

Conclusion

G

Future Work

PhD Defense – July 6, 2011

2 / 51
Motivation
Context

G

Loose community of users
H
H

G
G

Private and shared resources.
Experts and general users.

Resources of different types.
Any nature of community
H

Possible focus on community of
teachers and students.

PhD Defense – July 6, 2011

3 / 51
Motivation
Issues

G

How to manage publication and retrieval contexts?
H

G

How to transform a user understandable description to machine
understandable one?
H

G

How to match the description made during the retrieval context with a
description made during the publication context?

How to create a formal description from the user input?

How to make possible the life of a decentralized community?
H
H

How to manage a certain level of communication among members?
How to manage elements allowing the indexing of resources?

PhD Defense – July 6, 2011

4 / 51
Motivation
Contribution

G

How to manage publication and retrieval contexts?
H
H

G

How to transform a user understandable description to machine
understandable one?
H

G

Description extension.
The description is enlarged during publication to foresee different possible
retrieval situations.

Model of Indexing Patterns.

How to make possible the life of a decentralized community?
H

H

A distributed Semantic Wiki of the community and a distributed system of
Notes.
A set of Core resources is managed for allowing the indexing.

PhD Defense – July 6, 2011

5 / 51
Outline

G

Motivation

G

Semantic Indexing

G

Indexing Patterns

G

Community of Users

G

Conclusion

G

Future Work

PhD Defense – July 6, 2011

6 / 51
Semantic Indexing
A P2P system

PhD Defense – July 6, 2011

7 / 51
Semantic Indexing
Indexing

G

P2P networks require the Boolean indexing

G

Indexing is the process of creating or updating an index

H

H

G

Given a list of resources it is necessary to create their proper descriptions
different from the only title.

The only title of a resource does not give a meaning universally known
H

G

We choose to adopt the same Boolean indexing also for the personal memory.

In traditional filesharing systems it is usually used the title or a set of keywords
to identify a resource.

We consider semantic descriptions built manually by users.

PhD Defense – July 6, 2011

8 / 51
Semantic Indexing
Description and query

G
G

A description is supplied during publication
In a Boolean Index, for retrieving a resource it is required the same
description
H

G

Exact matching between descriptions.

A query is supplied during retrieval
H

It is equivalent to a description of a potential set of resources.

PhD Defense – July 6, 2011

9 / 51
Semantic Indexing
Ontologies and Knowledge bases

G

Expert community members are able to
H
H

G
G

Find the proper ontologies.
Build a population of an ontology grouping the most prominent individuals of
the domain: knowledge base.

Ontologies and knowledge bases are available within the community.
Open description
H
H

Add keywords to the description.
Guidelines for preventing typing ambiguities.

PhD Defense – July 6, 2011

10 / 51
Semantic Indexing
Semantic description

PhD Defense – July 6, 2011

11 / 51
Semantic Indexing
Types of Descriptions

G

Resource type
H

H
G

Address resources giving elements of description concerning the resource
itself and not its content.
Document written by Chomsky.

Content type
H
H

Address resources giving elements of description that concern their content.
Document about Chomsky.

PhD Defense – July 6, 2011

12 / 51
Semantic Indexing
Resource Type

G
G

Document written by Chomsky.
An ontology of domain is necessary. It should contain:
H
H
H

G

G

A concept that can represent the resource.
A concept that can represent an author.
A relation that binds the document to the author.

The resource is considered as an instance of the concept that represents
the resource itself.
The system must show the concepts of the ontology that can represent the
resource: Entry Point
H

The ontology provider has to declare what are the Entry Points.

PhD Defense – July 6, 2011

13 / 51
Semantic Indexing
Content Type

G
G
G

Document about Chomsky
An ontology that represents the resource and its content is necessary.
We have defined the System Ontology that contains
H
H

G

The concept system:Document that represents the resource.
The property system:hasInterest that paraphrases about.

It is necessary to have an ontology of domain for extracting the concept
describing the content
H

It is necessary to represent the author Chomsky.

PhD Defense – July 6, 2011

14 / 51
Semantic Indexing
Description Tree: Content Type

G

The whole figure represents the Formal Description: it is an RDF Graph.

G

The bordered part is used for the Final Description.
PhD Defense – July 6, 2011

15 / 51
Semantic Indexing
Description Tree: Resource Type

G

The whole figure represents the Formal Description: it is an RDF Graph.

G

The bordered part is used for the Final Description.
PhD Defense – July 6, 2011

16 / 51
Semantic Indexing
Simple description

Definition
A Simple Description is a description where the root of the Description Tree has
only one child.

_:id_document

_:id_1

p1

_:id_n-1

...

ont:C

pn

G

The general form of the part used for the Final Description.

G

The blank nodes are virtual instances of concepts.

G

The last node is a real individual of a concept defined in the ontology.

PhD Defense – July 6, 2011

17 / 51
Semantic Indexing
Complex description: Description Tree
A Complex Description contains several paths. Each path starts from the root
and relates a Simple Description SDes of the same document.

_:id_document

r1

r2
r3

PhD Defense – July 6, 2011

18 / 51
Semantic Indexing
Complex description: definition

Definition
A Complex Description is a Description Tree where the root has more than one
child. The tree is the merging of n simple descriptions. A Complex Description
CDes is defined by the union of simple descriptions:
CDes = SDes1 ∨ SDes2 ∨ ... ∨ SDesn

PhD Defense – July 6, 2011

19 / 51
Semantic Indexing
Complex description: publication and retrieval

G
G

G

A resource R with n SDes is published n times, once with each SDes.
A query with one of the n descriptions must answer positively with the
resource R .
A query requesting for resources having a complex description is
considered as a set of elementary queries (corresponding at a simple
description). The result of the query is the intersection of the elementary
query results.
Result (QCDes ) =
Result (QSDesi )
i =1,p

PhD Defense – July 6, 2011

20 / 51
Semantic Indexing
Creation of keys

G

G

A key used in the index is a representation of the semantic description of a
resource and is written in a language based on RDF.
The semantic description is an RDF graph (the Description Tree)
H

That contains blank nodes useless for indexing because they do not contain
semantic information necessary for describing a resource.

PhD Defense – July 6, 2011

21 / 51
Semantic Indexing
An example of Description Tree

Very difficult documents.
lom:Difficulty

lom:LomEducationalCategory

_:lo

pe
ty
f:
rd

pe

ty

rdf:

f:

rd

type

lom:LearningObject

lom:very_difficult

_:lec

lom:has_difficulty

rd f

lom:has_lomEducational

s:l
l

abe
"Very Difficult"

PhD Defense – July 6, 2011

22 / 51
Semantic Indexing
A small knowledge base

The description contains the following triples:
_:lo rdf:type lom:LearningObject .
_:lo lom:has_lomEducational :_lec .
courier
_:lec rdf:type lom:LomEducationalCategory .
_:lec lom:has_difficulty lom:very_difficult .
The N3 notation synthesizes the description as follows:
[ a lom:LearningObject ] lom:has_lomEducational
courier [a lom:LomEducationalCategory ;
lom:has_difficulty lom:very_difficult .]

PhD Defense – July 6, 2011

23 / 51
Semantic Indexing
Format of the key

Key:
{rdf:type,lom:LearningObject}
courier
{lom:has_lomEducational}
{lom:has_difficulty,lom:very_difficult}
PhD Defense – July 6, 2011

24 / 51
Semantic Indexing
Keys extension

G

Users should be able to find a resource with other characteristics than those exactly
used for publishing
H

G

The System must also publish a resource with descriptions corresponding to
these expected characteristics.

The extension of keys produces a Complex Description
H

H

The Simple Description supplied by the resource provider is combined with
others generated by the system.
The resource is published with each of them.

PhD Defense – July 6, 2011

25 / 51
Semantic Indexing
Keys extension: subsumption

G
G

G

Documents about Stack.
The ontology Theory of Languages contains the concept lt:Stack and the
super-concept: lt:Data_Structure.
Any request of resources concerning Data Structure should also return
resources concerning Stack.
courier

Key_initial:
{rdf:type,system:Document}
{system:hasInterest, lt:Stack}

courier

PhD Defense – July 6, 2011

Key_extended:
{rdf:type,system:Document}
{system:hasInterest, lt:Data_Structure}

26 / 51
Semantic Indexing
Keys extension: facet

G
G

G

G

Very difficult documents.
A resource may be published with a specific difficulty level
(lom:very_difficult).
We consider also interesting to look for resources where the difficulty level
has been defined.
A request of resources where the difficulty level has been defined, should
also return resources published with a specific difficulty level (instances of
the concept lom:Difficulty).
courier

Key_initial:
{rdf:type,lom:LearningObject}
{lom:has_lomEducational}
{lom:has_difficulty,lom:very_difficult}

courier

PhD Defense – July 6, 2011

Key_extended:
{rdf:type,lom:LearningObject}
{lom:has_lomEducational}
{lom:has_difficulty}

27 / 51
Semantic Indexing
Keys extension: category

G
G

Documents about Chomsky.
We consider that if the content of a resource is about a particular author, it
is also about the concept of Author.
courier

Key_initial:
{rdf:type,system:Document}
{system:hasInterest, lt:chomsky}

courier

PhD Defense – July 6, 2011

Key_extended:
{rdf:type,system:Document}
{system:hasInterest, lt:Author}

28 / 51
Semantic Indexing
Keys extension: keyword (I)

G
G

Documents about "Jeffrey D. Ullman".
The ontology Theory of Languages does not contain any individual of the
concept lt:Author referring to the author "Jeffrey D. Ullman"
H

H

courier

We considered the possibility for the System to create a virtual individual of
the concept lt:Author
And let the user enter the string "Jeffrey D. Ullman" as value of its property
lt:hasName

Key_initial:
{rdf:type,system:Document}
{system:hasInterest,lt:Author}
{lt:hasName,"Jeffrey D. Ullman"ˆˆxsd:string}

PhD Defense – July 6, 2011

29 / 51
Semantic Indexing
Keys extension: keyword (II)

G

A resource whose content is about a particular author, is also about the
concept of Author (Category extension).
courier

G

Key_extended:
{rdf:type,system:Document}
{system:hasInterest, lt:Author}

A resource whose content is associated to a string, is also about a keyword
(Keyword extension).
courier

Key_extended:
{rdf:type,system:Document}
{system:hasKeyword,
"Jeffrey D. Ullman"ˆˆxsd:string}

PhD Defense – July 6, 2011

30 / 51
Outline

G

Motivation

G

Semantic Indexing

G

Indexing Patterns

G

Community of Users

G

Conclusion

G

Future Work

PhD Defense – July 6, 2011

31 / 51
Indexing Patterns
Cases of indexing

V.

C.T.

R.T.
K.T.
Legenda:
C.T.: Content Type
R.T.: Resource Type
K.T.: Keyword Type

1 Step
concept
property
individual
individual

N.V.
V.

≥1 Step
individual
individual

N.V.

2 Steps

property
>1 Step
property

1 Step
string

Example
treating of Family.
about the semantic density of a LO.
treating of Chomsky.
treating of Ullman.

Extension
Subsumption
Subsumption
Category
Keyword

Example
having a known contributor.
having an unknown contributor.

Extension
Facet
Facet + Keyword

Example
about Medieval Italy.

Extension

V.: Virtual
N.V.: Not Virtual

PhD Defense – July 6, 2011

32 / 51
Indexing Patterns
Objective

G
G

An Indexing Pattern is a model of a case of indexing.
An Indexing Pattern allows to follow a path within an ontology, defining a
sequence of steps
H

G

At each step, the user interacts only with the necessary part of the ontology.
The unnecessary parts are hidden.

An Indexing Pattern is used
H

For presenting the ontologies to users in a friendly and easy-to-use way
G

H

Developers can provide a User Interface able to guide the user through the
ontologies.

For creating the keys of indexing.

PhD Defense – July 6, 2011

33 / 51
Indexing Patterns
Definition

We call Indexing Pattern a triple
I P = (D, P , A ) where:
G

G

G

D is a Description Template, the generalized description of a resource. It
contains some variables that are fixed during the steps followed by users
for creating the description.
P is a User Process, the sequence of steps necessary for determining the
values of the variables. It is composed of a sequence of assignments
involving either SPARQL queries, or other types of user inputs.
A is an Algorithm, the sequence of computations used for creating the
keys of indexing.

PhD Defense – July 6, 2011

34 / 51
Indexing Patterns
Description Template: Iterative Pattern
〈T0 〉 ← lom:LearningObject

D

Individual: _:d

pe
:ty

rdf

_:d

Types: <T0 >
loop (k=1,n)

〈T1 〉 ←lom:LomEducationalCategory
〈P1 〉 ←lom:has_lomEducational

Facts: <Pk > _:ik
Individual: _:ik

e

p
:ty
rdf

_:i1

Types: <Tk >
end loop

〈T2 〉 ←lom:Difficulty

〈P2 〉 ←lom:has_difficulty

<i_v> ← _:in
<V> ← <Tn >

pe

ty

rd

〈 i_v 〉 ←lom:very_difficult

rdf
s

"Very Difficult"

:la

PhD Defense – July 6, 2011

f:

bel

35 / 51
Indexing Patterns
User process: Iterative Pattern
P
O ← userOntologyChoice()

T0 ← user(entry_point(O ))
k←0
i_v ← null
repeat
k ← k++
Sk <Tk −1 >←(γ<Tk −1 >,{O }, select ?p ?r)
with γ<Tk −1 >= {
?p rdf:type owl:ObjectProperty .
?p rdfs:domain <Tk −1 > .
?p rdfs:range ?r . }
<pk , Tk >← user(res(Sk <Tk −1 >))
SF <Tk >←(γ<Tk >, {O }, select ?i)
with γ<Tk >= { ?i rdf:type <Tk > . }
if (res(SF <Tk >))= ∅
<i_v> = user(res(SF <Tk >))
until <i_v> =null

PhD Defense – July 6, 2011

36 / 51
Indexing Patterns
User process: Iterative Pattern
〈T0 〉 ← lom:LearningObject

P
O ← userOntologyChoice()

T0 ← user(entry_point(O ))
k←0
i_v ← null

_:d

e
typ

:
rdf

repeat
k ← k++
Sk <Tk −1 >←(γ<Tk −1 >,{O }, select ?p ?r)
with γ<Tk −1 >= {
?p rdf:type owl:ObjectProperty .
?p rdfs:domain <Tk −1 > .
?p rdfs:range ?r . }
<pk , Tk >← user(res(Sk <Tk −1 >))
SF <Tk >←(γ<Tk >, {O }, select ?i)
with γ<Tk >= { ?i rdf:type <Tk > . }
if (res(SF <Tk >))= ∅
<i_v> = user(res(SF <Tk >))
until <i_v> =null

PhD Defense – July 6, 2011

36 / 51
Indexing Patterns
User process: Iterative Pattern
〈T0 〉 ← lom:LearningObject

P
O ← userOntologyChoice()

T0 ← user(entry_point(O ))
k←0
i_v ← null

_:d

repeat
k ← k++
Sk <Tk −1 >←(γ<Tk −1 >,{O }, select ?p ?r)

e
typ

:
rdf

〈T1 〉 ←lom:LomEducationalCategory
〈P1 〉 ←lom:has_lomEducational

with γ<Tk −1 >= {
?p rdf:type owl:ObjectProperty .
?p rdfs:domain <Tk −1 > .

pe

_:i1

:ty
rdf

?p rdfs:range ?r . }
<pk , Tk >← user(res(Sk <Tk −1 >))
SF <Tk >←(γ<Tk >, {O }, select ?i)
with γ<Tk >= { ?i rdf:type <Tk > . }
if (res(SF <Tk >))= ∅
<i_v> = user(res(SF <Tk >))
until <i_v> =null

PhD Defense – July 6, 2011

36 / 51
Indexing Patterns
User process: Iterative Pattern
〈T0 〉 ← lom:LearningObject

P
O ← userOntologyChoice()

T0 ← user(entry_point(O ))
k←0
i_v ← null

e
typ

:
rdf

_:d

repeat
k ← k++
Sk <Tk −1 >←(γ<Tk −1 >,{O }, select ?p ?r)

〈T1 〉 ←lom:LomEducationalCategory
〈P1 〉 ←lom:has_lomEducational

with γ<Tk −1 >= {
?p rdf:type owl:ObjectProperty .
?p rdfs:domain <Tk −1 > .

pe

:ty
rdf

_:i1

〈T2 〉 ←lom:Difficulty

?p rdfs:range ?r . }
<pk , Tk >← user(res(Sk <Tk −1 >))

〈P2 〉 ←lom:has_difficulty

SF <Tk >←(γ<Tk >, {O }, select ?i)

pe

with γ<Tk >= { ?i rdf:type <Tk > . }
if (res(SF <Tk >))= ∅

ty

〈 i_v 〉 ←lom:very_difficult

rd

f:

<i_v> = user(res(SF <Tk >))
until <i_v> =null

rdf
s:l
abe
l

PhD Defense – July 6, 2011

"Very Difficult"

36 / 51
Outline

G

Motivation

G

Semantic Indexing

G

Indexing Patterns

G

Community of Users

G

Conclusion

G

Future Work

PhD Defense – July 6, 2011

37 / 51
Community of Users
Community and users

A Community is composed of users interested in collaborative activities
G

G

G

Expert users: experts in the domain of interest of the community. They are
in charge of the activities of providing the ontologies, their description and
their publication.
Provider users: they don’t have an high level role. They usually publish and
retrieve resources.
Consumer users: they have a passive participation because they don’t
provide any contribution to the community. They just retrieve resources.

PhD Defense – July 6, 2011

38 / 51
Community of Users
Community and resources

G

Community resources
H

Documents

H

Core resources

G

G
G
G
G

The resources shared by users through the Shared Memory.
Ontologies: used for creating the keys of indexing.
Notes: free text provided by a user to include additional information in the System.
Wiki: a unique space of the System shared by all users.
Are published in the network with specific keys using the System Ontology.

PhD Defense – July 6, 2011

39 / 51
Community of Users
Core resources: Ontologies

G

Are published in the network by expert users with a small additional
description:
H
H

G

G

a textual description concerning domain of the ontology;
the set of Entry Points.

The publication is made thanks a key of indexing assigned automatically by
the System.
Are retrieved from the network when the user starts the system.

PhD Defense – July 6, 2011

40 / 51
Community of Users
Core resources: Notes

G

The use of Notes is considered of general purpose
H

H
G

The content of the Note may be any topic of interest for the user: messages
for other users, memos, comments on certain resources, etc.
Notes are published using a keyword.

Notes are published with a key assigned by the System.

PhD Defense – July 6, 2011

41 / 51
Community of Users
Core resources: Wiki

G

The Wiki of the Community is composed of only one physical document
containing several parts that may link to other resources, distributed in the
network
H

G

Links are Semantic, refer to keys of indexing, are embedded in the HTML link
tag.

When a new community is created the System publishes the Wiki in the
network from a template containing the skeleton with only the essential
structure.

PhD Defense – July 6, 2011

42 / 51
Community of Users
Community and tools

A Community is supported by a Web platform equipped with a set of tools
G

G

G

G

Indexing Tool: is used for choosing the ontologies retrieved from the
network and for creating the keys of indexing.
Indexing Pool: is a temporary container of (key, resource) pairs. It allows
users to select the resource they want to index and to associate the key of
indexing built with the Indexing Tool.
Notes Editor : is a tool that enables users to create personal notes that are
associated to keys of indexing and published.
Retrieval Tool: allows users to submit queries to the system. It retrieves
results and displays them.

PhD Defense – July 6, 2011

43 / 51
Community of Users
Web Platform

PhD Defense – July 6, 2011

44 / 51
Community of Users
Architecture of a Peer

PhD Defense – July 6, 2011

45 / 51
Outline

G

Motivation

G

Semantic Indexing

G

Indexing Patterns

G

Community of Users

G

Conclusion

G

Future Work

PhD Defense – July 6, 2011

46 / 51
Conclusion

G

A model of Indexing Patterns
H

G

Description extension mechanism
H
H

G

For transforming a user understandable description to a machine
understandable one.
Form managing publication and retrieval contexts.
The description is enlarged during publication to foresee different possible
retrieval situations.

A Web platform
H
H
H

That makes feasible the life of a decentralized community.
Contains a set of Core resources for allowing the indexing.
Contains a distributed Semantic Wiki and a distributed system of Notes.

PhD Defense – July 6, 2011

47 / 51
Outline

G

Motivation

G

Semantic Indexing

G

Indexing Patterns

G

Community of Users

G

Conclusion

G

Future Work

PhD Defense – July 6, 2011

48 / 51
Future Work

G

Advanced navigation system for ontologies
H

G

Exchange with an external system.
H

G

It may query our system by creatin a semantic description of potential
resources based on RDF

Multilingual issues.
H

G

A richer navigation system for ontologies for better organize the visual
composition of represented data.

It concerns resources indexed on keywords or indexed on virtual individuals
because the user has to add at least one string in order to describe this
individual.

Evaluation.
H

Experiments should prove that the system can really support a real
community.

PhD Defense – July 6, 2011

49 / 51
Publications (I)

G

In proceedings
H

H
H

H

Moulin, C., Lai, C. Ontologies based approach for semantic indexing in distributed environments. KEOD
2009, International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge
Management, Madeira, 6 - 8 October 2009, Portugal. pp. 420-423.
Moulin, C., Lai, C. Semantic indexing within a Semantic Desktop. WWW/Internet 2009, Rome, 19 - 22
November 2009, Italy, pp. 149-152.
Moulin, C., Lai, C. Indexing Patterns within a Distributed System. In S. D. Fatos Xhafa, Santi Caballe,
Ajith Abraham, ed., Second International Conference on Intelligent Networking and Collaborative
Systems (INCOS 2010), Thessaloniki, 24 - 26 November 2010, Greece, pp. 206-213.
Moulin, C., Lai, C. Reasoning in a distributed semantic indexing system. 4th International Workshop on
Distributed Agent-based Retrieval Tools, DART 2010. 18 - 19 June, Geneva, pp. 88-98.

PhD Defense – July 6, 2011

50 / 51
Publications (II)

G

Journal
H

Moulin, C., Lai, C. Harmonization between personal and shared memories. International Journal of
Software Engineering and Knowledge Engineering, 20(4), pp.521-531, 2010.

G

Book chapters
H

H

Moulin, C., Lai, C. Semantic desktop: a common gate on local and distributed indexed resources. In A.
Soro, E. Vargiu, G. Armano and G. Paddeu, eds., Information Retrieval and Mining in Distributed
Environments, Springer, 2010. pp.61-76.
Moulin, C., Lai, C. Query Building in a Distributed Semantic Indexing System. Advances in Distributed
Agent-based Retrieval Tools, post-proceedings of DART 2010. (in press).

PhD Defense – July 6, 2011

51 / 51

Phd_cristian_lai presentation

  • 1.
    Semantic indexing modellingof resources in personal and collective memories based on a P2P approach Cristian LAI Thesis Director: Claude MOULIN PhD Defense – July 6, 2011 1 / 51
  • 2.
    Outline G Motivation G Semantic Indexing G Indexing Patterns G Communityof Users G Conclusion G Future Work PhD Defense – July 6, 2011 2 / 51
  • 3.
    Motivation Context G Loose community ofusers H H G G Private and shared resources. Experts and general users. Resources of different types. Any nature of community H Possible focus on community of teachers and students. PhD Defense – July 6, 2011 3 / 51
  • 4.
    Motivation Issues G How to managepublication and retrieval contexts? H G How to transform a user understandable description to machine understandable one? H G How to match the description made during the retrieval context with a description made during the publication context? How to create a formal description from the user input? How to make possible the life of a decentralized community? H H How to manage a certain level of communication among members? How to manage elements allowing the indexing of resources? PhD Defense – July 6, 2011 4 / 51
  • 5.
    Motivation Contribution G How to managepublication and retrieval contexts? H H G How to transform a user understandable description to machine understandable one? H G Description extension. The description is enlarged during publication to foresee different possible retrieval situations. Model of Indexing Patterns. How to make possible the life of a decentralized community? H H A distributed Semantic Wiki of the community and a distributed system of Notes. A set of Core resources is managed for allowing the indexing. PhD Defense – July 6, 2011 5 / 51
  • 6.
    Outline G Motivation G Semantic Indexing G Indexing Patterns G Communityof Users G Conclusion G Future Work PhD Defense – July 6, 2011 6 / 51
  • 7.
    Semantic Indexing A P2Psystem PhD Defense – July 6, 2011 7 / 51
  • 8.
    Semantic Indexing Indexing G P2P networksrequire the Boolean indexing G Indexing is the process of creating or updating an index H H G Given a list of resources it is necessary to create their proper descriptions different from the only title. The only title of a resource does not give a meaning universally known H G We choose to adopt the same Boolean indexing also for the personal memory. In traditional filesharing systems it is usually used the title or a set of keywords to identify a resource. We consider semantic descriptions built manually by users. PhD Defense – July 6, 2011 8 / 51
  • 9.
    Semantic Indexing Description andquery G G A description is supplied during publication In a Boolean Index, for retrieving a resource it is required the same description H G Exact matching between descriptions. A query is supplied during retrieval H It is equivalent to a description of a potential set of resources. PhD Defense – July 6, 2011 9 / 51
  • 10.
    Semantic Indexing Ontologies andKnowledge bases G Expert community members are able to H H G G Find the proper ontologies. Build a population of an ontology grouping the most prominent individuals of the domain: knowledge base. Ontologies and knowledge bases are available within the community. Open description H H Add keywords to the description. Guidelines for preventing typing ambiguities. PhD Defense – July 6, 2011 10 / 51
  • 11.
    Semantic Indexing Semantic description PhDDefense – July 6, 2011 11 / 51
  • 12.
    Semantic Indexing Types ofDescriptions G Resource type H H G Address resources giving elements of description concerning the resource itself and not its content. Document written by Chomsky. Content type H H Address resources giving elements of description that concern their content. Document about Chomsky. PhD Defense – July 6, 2011 12 / 51
  • 13.
    Semantic Indexing Resource Type G G Documentwritten by Chomsky. An ontology of domain is necessary. It should contain: H H H G G A concept that can represent the resource. A concept that can represent an author. A relation that binds the document to the author. The resource is considered as an instance of the concept that represents the resource itself. The system must show the concepts of the ontology that can represent the resource: Entry Point H The ontology provider has to declare what are the Entry Points. PhD Defense – July 6, 2011 13 / 51
  • 14.
    Semantic Indexing Content Type G G G Documentabout Chomsky An ontology that represents the resource and its content is necessary. We have defined the System Ontology that contains H H G The concept system:Document that represents the resource. The property system:hasInterest that paraphrases about. It is necessary to have an ontology of domain for extracting the concept describing the content H It is necessary to represent the author Chomsky. PhD Defense – July 6, 2011 14 / 51
  • 15.
    Semantic Indexing Description Tree:Content Type G The whole figure represents the Formal Description: it is an RDF Graph. G The bordered part is used for the Final Description. PhD Defense – July 6, 2011 15 / 51
  • 16.
    Semantic Indexing Description Tree:Resource Type G The whole figure represents the Formal Description: it is an RDF Graph. G The bordered part is used for the Final Description. PhD Defense – July 6, 2011 16 / 51
  • 17.
    Semantic Indexing Simple description Definition ASimple Description is a description where the root of the Description Tree has only one child. _:id_document _:id_1 p1 _:id_n-1 ... ont:C pn G The general form of the part used for the Final Description. G The blank nodes are virtual instances of concepts. G The last node is a real individual of a concept defined in the ontology. PhD Defense – July 6, 2011 17 / 51
  • 18.
    Semantic Indexing Complex description:Description Tree A Complex Description contains several paths. Each path starts from the root and relates a Simple Description SDes of the same document. _:id_document r1 r2 r3 PhD Defense – July 6, 2011 18 / 51
  • 19.
    Semantic Indexing Complex description:definition Definition A Complex Description is a Description Tree where the root has more than one child. The tree is the merging of n simple descriptions. A Complex Description CDes is defined by the union of simple descriptions: CDes = SDes1 ∨ SDes2 ∨ ... ∨ SDesn PhD Defense – July 6, 2011 19 / 51
  • 20.
    Semantic Indexing Complex description:publication and retrieval G G G A resource R with n SDes is published n times, once with each SDes. A query with one of the n descriptions must answer positively with the resource R . A query requesting for resources having a complex description is considered as a set of elementary queries (corresponding at a simple description). The result of the query is the intersection of the elementary query results. Result (QCDes ) = Result (QSDesi ) i =1,p PhD Defense – July 6, 2011 20 / 51
  • 21.
    Semantic Indexing Creation ofkeys G G A key used in the index is a representation of the semantic description of a resource and is written in a language based on RDF. The semantic description is an RDF graph (the Description Tree) H That contains blank nodes useless for indexing because they do not contain semantic information necessary for describing a resource. PhD Defense – July 6, 2011 21 / 51
  • 22.
    Semantic Indexing An exampleof Description Tree Very difficult documents. lom:Difficulty lom:LomEducationalCategory _:lo pe ty f: rd pe ty rdf: f: rd type lom:LearningObject lom:very_difficult _:lec lom:has_difficulty rd f lom:has_lomEducational s:l l abe "Very Difficult" PhD Defense – July 6, 2011 22 / 51
  • 23.
    Semantic Indexing A smallknowledge base The description contains the following triples: _:lo rdf:type lom:LearningObject . _:lo lom:has_lomEducational :_lec . courier _:lec rdf:type lom:LomEducationalCategory . _:lec lom:has_difficulty lom:very_difficult . The N3 notation synthesizes the description as follows: [ a lom:LearningObject ] lom:has_lomEducational courier [a lom:LomEducationalCategory ; lom:has_difficulty lom:very_difficult .] PhD Defense – July 6, 2011 23 / 51
  • 24.
    Semantic Indexing Format ofthe key Key: {rdf:type,lom:LearningObject} courier {lom:has_lomEducational} {lom:has_difficulty,lom:very_difficult} PhD Defense – July 6, 2011 24 / 51
  • 25.
    Semantic Indexing Keys extension G Usersshould be able to find a resource with other characteristics than those exactly used for publishing H G The System must also publish a resource with descriptions corresponding to these expected characteristics. The extension of keys produces a Complex Description H H The Simple Description supplied by the resource provider is combined with others generated by the system. The resource is published with each of them. PhD Defense – July 6, 2011 25 / 51
  • 26.
    Semantic Indexing Keys extension:subsumption G G G Documents about Stack. The ontology Theory of Languages contains the concept lt:Stack and the super-concept: lt:Data_Structure. Any request of resources concerning Data Structure should also return resources concerning Stack. courier Key_initial: {rdf:type,system:Document} {system:hasInterest, lt:Stack} courier PhD Defense – July 6, 2011 Key_extended: {rdf:type,system:Document} {system:hasInterest, lt:Data_Structure} 26 / 51
  • 27.
    Semantic Indexing Keys extension:facet G G G G Very difficult documents. A resource may be published with a specific difficulty level (lom:very_difficult). We consider also interesting to look for resources where the difficulty level has been defined. A request of resources where the difficulty level has been defined, should also return resources published with a specific difficulty level (instances of the concept lom:Difficulty). courier Key_initial: {rdf:type,lom:LearningObject} {lom:has_lomEducational} {lom:has_difficulty,lom:very_difficult} courier PhD Defense – July 6, 2011 Key_extended: {rdf:type,lom:LearningObject} {lom:has_lomEducational} {lom:has_difficulty} 27 / 51
  • 28.
    Semantic Indexing Keys extension:category G G Documents about Chomsky. We consider that if the content of a resource is about a particular author, it is also about the concept of Author. courier Key_initial: {rdf:type,system:Document} {system:hasInterest, lt:chomsky} courier PhD Defense – July 6, 2011 Key_extended: {rdf:type,system:Document} {system:hasInterest, lt:Author} 28 / 51
  • 29.
    Semantic Indexing Keys extension:keyword (I) G G Documents about "Jeffrey D. Ullman". The ontology Theory of Languages does not contain any individual of the concept lt:Author referring to the author "Jeffrey D. Ullman" H H courier We considered the possibility for the System to create a virtual individual of the concept lt:Author And let the user enter the string "Jeffrey D. Ullman" as value of its property lt:hasName Key_initial: {rdf:type,system:Document} {system:hasInterest,lt:Author} {lt:hasName,"Jeffrey D. Ullman"ˆˆxsd:string} PhD Defense – July 6, 2011 29 / 51
  • 30.
    Semantic Indexing Keys extension:keyword (II) G A resource whose content is about a particular author, is also about the concept of Author (Category extension). courier G Key_extended: {rdf:type,system:Document} {system:hasInterest, lt:Author} A resource whose content is associated to a string, is also about a keyword (Keyword extension). courier Key_extended: {rdf:type,system:Document} {system:hasKeyword, "Jeffrey D. Ullman"ˆˆxsd:string} PhD Defense – July 6, 2011 30 / 51
  • 31.
    Outline G Motivation G Semantic Indexing G Indexing Patterns G Communityof Users G Conclusion G Future Work PhD Defense – July 6, 2011 31 / 51
  • 32.
    Indexing Patterns Cases ofindexing V. C.T. R.T. K.T. Legenda: C.T.: Content Type R.T.: Resource Type K.T.: Keyword Type 1 Step concept property individual individual N.V. V. ≥1 Step individual individual N.V. 2 Steps property >1 Step property 1 Step string Example treating of Family. about the semantic density of a LO. treating of Chomsky. treating of Ullman. Extension Subsumption Subsumption Category Keyword Example having a known contributor. having an unknown contributor. Extension Facet Facet + Keyword Example about Medieval Italy. Extension V.: Virtual N.V.: Not Virtual PhD Defense – July 6, 2011 32 / 51
  • 33.
    Indexing Patterns Objective G G An IndexingPattern is a model of a case of indexing. An Indexing Pattern allows to follow a path within an ontology, defining a sequence of steps H G At each step, the user interacts only with the necessary part of the ontology. The unnecessary parts are hidden. An Indexing Pattern is used H For presenting the ontologies to users in a friendly and easy-to-use way G H Developers can provide a User Interface able to guide the user through the ontologies. For creating the keys of indexing. PhD Defense – July 6, 2011 33 / 51
  • 34.
    Indexing Patterns Definition We callIndexing Pattern a triple I P = (D, P , A ) where: G G G D is a Description Template, the generalized description of a resource. It contains some variables that are fixed during the steps followed by users for creating the description. P is a User Process, the sequence of steps necessary for determining the values of the variables. It is composed of a sequence of assignments involving either SPARQL queries, or other types of user inputs. A is an Algorithm, the sequence of computations used for creating the keys of indexing. PhD Defense – July 6, 2011 34 / 51
  • 35.
    Indexing Patterns Description Template:Iterative Pattern 〈T0 〉 ← lom:LearningObject D Individual: _:d pe :ty rdf _:d Types: <T0 > loop (k=1,n) 〈T1 〉 ←lom:LomEducationalCategory 〈P1 〉 ←lom:has_lomEducational Facts: <Pk > _:ik Individual: _:ik e p :ty rdf _:i1 Types: <Tk > end loop 〈T2 〉 ←lom:Difficulty 〈P2 〉 ←lom:has_difficulty <i_v> ← _:in <V> ← <Tn > pe ty rd 〈 i_v 〉 ←lom:very_difficult rdf s "Very Difficult" :la PhD Defense – July 6, 2011 f: bel 35 / 51
  • 36.
    Indexing Patterns User process:Iterative Pattern P O ← userOntologyChoice() T0 ← user(entry_point(O )) k←0 i_v ← null repeat k ← k++ Sk <Tk −1 >←(γ<Tk −1 >,{O }, select ?p ?r) with γ<Tk −1 >= { ?p rdf:type owl:ObjectProperty . ?p rdfs:domain <Tk −1 > . ?p rdfs:range ?r . } <pk , Tk >← user(res(Sk <Tk −1 >)) SF <Tk >←(γ<Tk >, {O }, select ?i) with γ<Tk >= { ?i rdf:type <Tk > . } if (res(SF <Tk >))= ∅ <i_v> = user(res(SF <Tk >)) until <i_v> =null PhD Defense – July 6, 2011 36 / 51
  • 37.
    Indexing Patterns User process:Iterative Pattern 〈T0 〉 ← lom:LearningObject P O ← userOntologyChoice() T0 ← user(entry_point(O )) k←0 i_v ← null _:d e typ : rdf repeat k ← k++ Sk <Tk −1 >←(γ<Tk −1 >,{O }, select ?p ?r) with γ<Tk −1 >= { ?p rdf:type owl:ObjectProperty . ?p rdfs:domain <Tk −1 > . ?p rdfs:range ?r . } <pk , Tk >← user(res(Sk <Tk −1 >)) SF <Tk >←(γ<Tk >, {O }, select ?i) with γ<Tk >= { ?i rdf:type <Tk > . } if (res(SF <Tk >))= ∅ <i_v> = user(res(SF <Tk >)) until <i_v> =null PhD Defense – July 6, 2011 36 / 51
  • 38.
    Indexing Patterns User process:Iterative Pattern 〈T0 〉 ← lom:LearningObject P O ← userOntologyChoice() T0 ← user(entry_point(O )) k←0 i_v ← null _:d repeat k ← k++ Sk <Tk −1 >←(γ<Tk −1 >,{O }, select ?p ?r) e typ : rdf 〈T1 〉 ←lom:LomEducationalCategory 〈P1 〉 ←lom:has_lomEducational with γ<Tk −1 >= { ?p rdf:type owl:ObjectProperty . ?p rdfs:domain <Tk −1 > . pe _:i1 :ty rdf ?p rdfs:range ?r . } <pk , Tk >← user(res(Sk <Tk −1 >)) SF <Tk >←(γ<Tk >, {O }, select ?i) with γ<Tk >= { ?i rdf:type <Tk > . } if (res(SF <Tk >))= ∅ <i_v> = user(res(SF <Tk >)) until <i_v> =null PhD Defense – July 6, 2011 36 / 51
  • 39.
    Indexing Patterns User process:Iterative Pattern 〈T0 〉 ← lom:LearningObject P O ← userOntologyChoice() T0 ← user(entry_point(O )) k←0 i_v ← null e typ : rdf _:d repeat k ← k++ Sk <Tk −1 >←(γ<Tk −1 >,{O }, select ?p ?r) 〈T1 〉 ←lom:LomEducationalCategory 〈P1 〉 ←lom:has_lomEducational with γ<Tk −1 >= { ?p rdf:type owl:ObjectProperty . ?p rdfs:domain <Tk −1 > . pe :ty rdf _:i1 〈T2 〉 ←lom:Difficulty ?p rdfs:range ?r . } <pk , Tk >← user(res(Sk <Tk −1 >)) 〈P2 〉 ←lom:has_difficulty SF <Tk >←(γ<Tk >, {O }, select ?i) pe with γ<Tk >= { ?i rdf:type <Tk > . } if (res(SF <Tk >))= ∅ ty 〈 i_v 〉 ←lom:very_difficult rd f: <i_v> = user(res(SF <Tk >)) until <i_v> =null rdf s:l abe l PhD Defense – July 6, 2011 "Very Difficult" 36 / 51
  • 40.
    Outline G Motivation G Semantic Indexing G Indexing Patterns G Communityof Users G Conclusion G Future Work PhD Defense – July 6, 2011 37 / 51
  • 41.
    Community of Users Communityand users A Community is composed of users interested in collaborative activities G G G Expert users: experts in the domain of interest of the community. They are in charge of the activities of providing the ontologies, their description and their publication. Provider users: they don’t have an high level role. They usually publish and retrieve resources. Consumer users: they have a passive participation because they don’t provide any contribution to the community. They just retrieve resources. PhD Defense – July 6, 2011 38 / 51
  • 42.
    Community of Users Communityand resources G Community resources H Documents H Core resources G G G G G The resources shared by users through the Shared Memory. Ontologies: used for creating the keys of indexing. Notes: free text provided by a user to include additional information in the System. Wiki: a unique space of the System shared by all users. Are published in the network with specific keys using the System Ontology. PhD Defense – July 6, 2011 39 / 51
  • 43.
    Community of Users Coreresources: Ontologies G Are published in the network by expert users with a small additional description: H H G G a textual description concerning domain of the ontology; the set of Entry Points. The publication is made thanks a key of indexing assigned automatically by the System. Are retrieved from the network when the user starts the system. PhD Defense – July 6, 2011 40 / 51
  • 44.
    Community of Users Coreresources: Notes G The use of Notes is considered of general purpose H H G The content of the Note may be any topic of interest for the user: messages for other users, memos, comments on certain resources, etc. Notes are published using a keyword. Notes are published with a key assigned by the System. PhD Defense – July 6, 2011 41 / 51
  • 45.
    Community of Users Coreresources: Wiki G The Wiki of the Community is composed of only one physical document containing several parts that may link to other resources, distributed in the network H G Links are Semantic, refer to keys of indexing, are embedded in the HTML link tag. When a new community is created the System publishes the Wiki in the network from a template containing the skeleton with only the essential structure. PhD Defense – July 6, 2011 42 / 51
  • 46.
    Community of Users Communityand tools A Community is supported by a Web platform equipped with a set of tools G G G G Indexing Tool: is used for choosing the ontologies retrieved from the network and for creating the keys of indexing. Indexing Pool: is a temporary container of (key, resource) pairs. It allows users to select the resource they want to index and to associate the key of indexing built with the Indexing Tool. Notes Editor : is a tool that enables users to create personal notes that are associated to keys of indexing and published. Retrieval Tool: allows users to submit queries to the system. It retrieves results and displays them. PhD Defense – July 6, 2011 43 / 51
  • 47.
    Community of Users WebPlatform PhD Defense – July 6, 2011 44 / 51
  • 48.
    Community of Users Architectureof a Peer PhD Defense – July 6, 2011 45 / 51
  • 49.
    Outline G Motivation G Semantic Indexing G Indexing Patterns G Communityof Users G Conclusion G Future Work PhD Defense – July 6, 2011 46 / 51
  • 50.
    Conclusion G A model ofIndexing Patterns H G Description extension mechanism H H G For transforming a user understandable description to a machine understandable one. Form managing publication and retrieval contexts. The description is enlarged during publication to foresee different possible retrieval situations. A Web platform H H H That makes feasible the life of a decentralized community. Contains a set of Core resources for allowing the indexing. Contains a distributed Semantic Wiki and a distributed system of Notes. PhD Defense – July 6, 2011 47 / 51
  • 51.
    Outline G Motivation G Semantic Indexing G Indexing Patterns G Communityof Users G Conclusion G Future Work PhD Defense – July 6, 2011 48 / 51
  • 52.
    Future Work G Advanced navigationsystem for ontologies H G Exchange with an external system. H G It may query our system by creatin a semantic description of potential resources based on RDF Multilingual issues. H G A richer navigation system for ontologies for better organize the visual composition of represented data. It concerns resources indexed on keywords or indexed on virtual individuals because the user has to add at least one string in order to describe this individual. Evaluation. H Experiments should prove that the system can really support a real community. PhD Defense – July 6, 2011 49 / 51
  • 53.
    Publications (I) G In proceedings H H H H Moulin,C., Lai, C. Ontologies based approach for semantic indexing in distributed environments. KEOD 2009, International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, Madeira, 6 - 8 October 2009, Portugal. pp. 420-423. Moulin, C., Lai, C. Semantic indexing within a Semantic Desktop. WWW/Internet 2009, Rome, 19 - 22 November 2009, Italy, pp. 149-152. Moulin, C., Lai, C. Indexing Patterns within a Distributed System. In S. D. Fatos Xhafa, Santi Caballe, Ajith Abraham, ed., Second International Conference on Intelligent Networking and Collaborative Systems (INCOS 2010), Thessaloniki, 24 - 26 November 2010, Greece, pp. 206-213. Moulin, C., Lai, C. Reasoning in a distributed semantic indexing system. 4th International Workshop on Distributed Agent-based Retrieval Tools, DART 2010. 18 - 19 June, Geneva, pp. 88-98. PhD Defense – July 6, 2011 50 / 51
  • 54.
    Publications (II) G Journal H Moulin, C.,Lai, C. Harmonization between personal and shared memories. International Journal of Software Engineering and Knowledge Engineering, 20(4), pp.521-531, 2010. G Book chapters H H Moulin, C., Lai, C. Semantic desktop: a common gate on local and distributed indexed resources. In A. Soro, E. Vargiu, G. Armano and G. Paddeu, eds., Information Retrieval and Mining in Distributed Environments, Springer, 2010. pp.61-76. Moulin, C., Lai, C. Query Building in a Distributed Semantic Indexing System. Advances in Distributed Agent-based Retrieval Tools, post-proceedings of DART 2010. (in press). PhD Defense – July 6, 2011 51 / 51