Presentation at MTSR 2012


  1. SSONDE: Semantic Similarity On liNked Data Entities
       Riccardo Albertoni (ralbertoni@delicias.dia.fi.upm.es), Ontology Engineering Group, Departamento de Inteligencia Artificial, Facultad de Informática, Universidad Politécnica de Madrid
       Joint work with Monica De Martino (CNR-IMATI-GE)
       MTSR 2012, 6th Metadata and Semantics Research Conference, 28-30 November 2012, Cádiz (Spain)
       Date: 30/11/2012
  2. Presentation Outline
       1. How SSONDE fits with other linked data technologies
          • What is it for? What is it not for?
       2. Characteristics of instance similarity in SSONDE
          • The theory behind SSONDE's similarity is detailed in Albertoni R. and De Martino M.; Asymmetric and context dependent semantic similarity among ontology instances, Journal of Data Semantics, LNCS, 2008.
       3. SSONDE Architecture and Examples on Linked Data
  3. Linked data Crawling architectural pattern
       Components placed on the pattern: SSONDE, LDSpider/Fuseki, LDIF.
       Uses at the application layer: cluster analysis; explorative search on resources; building analysis services.
       Source: Tom Heath and Christian Bizer (2011) Linked Data: Evolving the Web into a Global Data Space (1st edition). 1-136. Morgan & Claypool
  4. SSONDE Instance similarity
       It is not meant
       • to align ontologies/schemas;
       • to interlink/consolidate entities.
       It aims at
       • providing a method for comparing entities represented as instances in an ontology-driven repository or as entities exposed as linked data;
       • supporting explorative searches.
       It assumes that all the integration steps have already been done: it works at the Application Layer of the Linked Data Crawling Architectural Pattern.
       Main characteristics (which make SSONDE unique of its kind):
       • Context, to represent similarity criteria (algorithm parameters);
       • Asymmetry, to emphasize containment between instances.
       Example: comparing researchers.
  6. Example: Researchers' comparison
       Researchers are compared through their publications, their research topics, and their projects.
  7. Different Contexts
       The researchers, publications, … are instances.
       Contexts: Researchers' Experience; Researchers' Scientific Interest.
       Researchers' features (data/object properties) considered in the similarity:
       • Age
       • Number of publications
       • Number of projects
       • Common publications
       • Common research projects
       • Similar research interests
       Slide callouts on the features: "It is used only in this context!" and "They are used in both the contexts!"
  8. Formalization of Application Context
       A context is a function that, for each recursion path, specifies which data/object properties to consider and which operations to apply to them.
       Example: Researchers' Scientific Interest
       • Common publications
       • Common research projects
       • Similar research interests
       [ResearchStaff] {{Φ}, {{Publication, Inter} {WorkAtProject, Inter} {interest, Simil}}}
       [ResearchStaff, Interest]{{{TopicName,Inter}},{{RelatedTopic, Inter} }}
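
The context on this slide can be read as a lookup table keyed by recursion path. The following Python sketch is only an illustration of that reading, not SSONDE's own encoding: the dictionary layout, the key names `data_properties` / `object_properties`, and the helper `properties_for` are all hypothetical. The property names and the operations `Inter` and `Simil` are copied from the slide as written; the slide does not spell out what the operations compute.

```python
from typing import Dict, List, Tuple

# Operation names exactly as they appear on the slide ("Inter", "Simil").
Operation = str
# (property name, operation) pairs for one kind of property.
PropertyOps = List[Tuple[str, Operation]]
# A recursion path is the sequence of names followed so far.
RecursionPath = Tuple[str, ...]

# Hypothetical in-memory encoding of the "Researchers' Scientific Interest" context.
CONTEXT: Dict[RecursionPath, Dict[str, PropertyOps]] = {
    ("ResearchStaff",): {
        "data_properties": [],  # the empty set (Φ) on the slide
        "object_properties": [
            ("Publication", "Inter"),
            ("WorkAtProject", "Inter"),
            ("interest", "Simil"),
        ],
    },
    ("ResearchStaff", "Interest"): {
        "data_properties": [("TopicName", "Inter")],
        "object_properties": [("RelatedTopic", "Inter")],
    },
}


def properties_for(path: RecursionPath) -> Dict[str, PropertyOps]:
    """Return the properties and operations selected for a recursion path.

    Assumption of this sketch: a path the context does not mention
    selects no properties at all.
    """
    return CONTEXT.get(path, {"data_properties": [], "object_properties": []})


for name, op in properties_for(("ResearchStaff",))["object_properties"]:
    print(name, op)
```

Keying on the whole path rather than on a single class lets the same class be compared differently depending on how the recursion reached it, which is what the two lines of the slide's example express.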
  9. Why an Asymmetric Similarity?
       Sim(a,b) might differ from Sim(b,a).
       • Sim is not the inverse of a metric distance, so metric properties cannot be exploited to prune comparisons.
       Here asymmetry is adopted to highlight the containment between instances A and B.
       Example of containment (comparing with respect to publications only):
       • A is a Ph.D. student who has always published with his tutor B (diagram: A, B, pub 1, pub 2, pub 3).
       • A is contained in B (A<<B): A can be replaced by B.
       • B is not contained in A: if you replace B with A, some experience gets lost.
  10. What SSONDE's Asymmetric Similarity returns
       Sim(A,B) ranges in [0,1]. It is proportional to the number of data and object property values that A shares with B.
       • If A is contained in B, Sim(A,B) = 1
       • If A is not contained in B, Sim(A,B) < 1
       • If A and B don't share any "features", Sim(A,B) = 0
       • If A has exactly the same characteristics as B (A<<B, B<<A), Sim(A,B) = Sim(B,A) = 1
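
The four properties listed above can be reproduced with a deliberately simplified measure: the fraction of A's feature values that B also has. This is a sketch under that assumption, not SSONDE's actual formula, which is context dependent and recursive. The function name `containment_similarity`, the flat sets of publication identifiers, and the assignment of publications to the student and the tutor are hypothetical.

```python
from typing import Set


def containment_similarity(a: Set[str], b: Set[str]) -> float:
    """Fraction of A's feature values that B shares (simplified stand-in)."""
    if not a:
        # Assumption of this sketch: an instance without features shares nothing.
        return 0.0
    return len(a & b) / len(a)


# Hypothetical data: the student has only published together with the tutor.
student = {"pub1", "pub2"}
tutor = {"pub1", "pub2", "pub3"}

print(containment_similarity(student, tutor))  # 1.0: the student is contained in the tutor
print(containment_similarity(tutor, student))  # 2/3: below 1, the tutor is not contained
print(containment_similarity({"pubX"}, tutor))  # 0.0: no shared features
```

Dividing by the size of A rather than by the size of the union is what makes the measure asymmetric: swapping the arguments changes the denominator.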
  11. Results comparing young and senior researchers of IMATI
       Similarity matrices for two contexts: Research Experience and Research Interest.
       The darker the matrix value, the higher the similarity.
  13. SSONDE Architecture
       • Web of Data: RDF dumps, HTTP dereferenceable URIs, SPARQL endpoints (linked datasets served by third parties).
       • Linked data consumption (crawling architectural pattern): LDIF, LDSpider + Fuseki, feeding a local data store/cache (TDB repository, SDB repository, RDF dumps).
       • SSONDE layers: Context Layer, Ontology Layer, Data Layer.
       • Data wrappers: JENA TDB, JENA SDB, JENA MEM, Virtuoso wrapper, ...
       • Configuration: kind of store; list of instances (Java class to generate the list); reference to the context; reference to rules (e.g., JENA rules).
       • Output: similarity matrix in CSV; n-most similar entities in JSON.
  14. SSONDE: a building block for new analysis services
       SSONDE applied on "real linked data":
       • Analysing Habitat and Species
         - data published in NatureSDIplus (ECP-2007-GEO-317007), a European project developing a Spatial Data Infrastructure for Nature Conservation;
         - goal: to rank habitats according to the species they host, which gives an insight into inter-dependencies between habitats and species.
       • Analysing overlaps among scientific interests
         - data: a subset of the linked dataset provided by data.cnr.it as part of the SemanticScout framework by third parties (Gangemi et al.);
         - goal: to compare IMATI-CNR researchers according to their research interests.
  15. Applying SSONDE on data.cnr.it
  16. Applying SSONDE on data.cnr.it
       http://code.google.com/p/ssonde/wiki/RDF_statements_download
  17. Configuration file

       {
         "StoreConfiguration": {
           "KindOfStore": "JENATDB",
           "RDFDocumentURIs": [ ],
           "TDBDirectory": "data/CNRIT/TDB-0.8.9/CNRR/"
         },
         "InstanceConfiguration": {
           "InstanceURIsClass": "application.dataCNRIt.GetResearcherIMATIplusCoauthor"
         },
         "OutputConfiguration": {
           "KindOfOutput": "JSONOrderedResult",
           "NumberOfOrderedResult": "20",
           "FilePath": "conf/dataCNRIt/ComplexContextResearchInterest/CRRIIntPub.res.json"
         },
         "ContextConfiguration": {
           "ContextFilePath": "conf/dataCNRIt/ComplexContextResearchInterest/CCRIIntPub.ctx"
         }
       }

       • InstanceConfiguration: list of LOD entity URIs, produced by a Java class implementing ListOfInputInstances.
       • OutputConfiguration: similarity matrix in CSV, or JSON encoding of the top n-most similar entities.
       • ContextConfiguration: the context is encoded in an in-house text format (hopefully soon in JSON).
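
Because the configuration is plain JSON, it can be inspected with any JSON parser before handing it to SSONDE. The Python sketch below only parses the file shown on the slide and checks that its four top-level sections are present. It is not part of SSONDE: the helper `load_configuration` is hypothetical, and treating all four sections as mandatory is an assumption. The value of `NumberOfOrderedResult` is written with straight quotes here so that the text parses as JSON.

```python
import json

# The four top-level sections that appear in the slide's configuration file.
# Assumption of this sketch: all four are required.
REQUIRED_SECTIONS = (
    "StoreConfiguration",
    "InstanceConfiguration",
    "OutputConfiguration",
    "ContextConfiguration",
)

CONFIG_TEXT = """
{
  "StoreConfiguration": {
    "KindOfStore": "JENATDB",
    "RDFDocumentURIs": [],
    "TDBDirectory": "data/CNRIT/TDB-0.8.9/CNRR/"
  },
  "InstanceConfiguration": {
    "InstanceURIsClass": "application.dataCNRIt.GetResearcherIMATIplusCoauthor"
  },
  "OutputConfiguration": {
    "KindOfOutput": "JSONOrderedResult",
    "NumberOfOrderedResult": "20",
    "FilePath": "conf/dataCNRIt/ComplexContextResearchInterest/CRRIIntPub.res.json"
  },
  "ContextConfiguration": {
    "ContextFilePath": "conf/dataCNRIt/ComplexContextResearchInterest/CCRIIntPub.ctx"
  }
}
"""


def load_configuration(text: str) -> dict:
    """Parse a configuration and fail early if a section is missing."""
    config = json.loads(text)
    missing = [name for name in REQUIRED_SECTIONS if name not in config]
    if missing:
        raise ValueError(f"missing configuration sections: {missing}")
    return config


config = load_configuration(CONFIG_TEXT)
# The slide stores the number of results as a string, so convert it explicitly.
top_n = int(config["OutputConfiguration"]["NumberOfOrderedResult"])
print(config["StoreConfiguration"]["KindOfStore"], top_n)  # JENATDB 20
```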
  18. Data.cnr.it: defining a context
       Graph on the slide: researchers (Res 225, Res 226), publications (pub: 22, pub: 26, pub: 29) and topics (Topic:2, Topic:23, Topic:25, Topic:26, Topic:27), linked by pub:autoreCNRdi, dc:subject and skos:broader. Regions are labelled "Crawled by Data.CNR.it" and "Crawled by DBPEDIA"; layers are labelled Publications, Interests, Interest Hierarchy.

       PREFIX dc: <http://purl.org/dc/terms/>
       PREFIX pub: <http://www.cnr.it/ontology/cnr/pubblicazioni.owl#>
       [owl:Thing, dc:subject]-> {{},{(skos:broader, Inter)}}
       [owl:Thing]-> {{}, { (pub:autoreCNRDi, Inter),(dc:subject, Simil)}}

       No data properties are considered in this context.
  19. Similarity Matrix
       The data is more recent but less accurate. However, more researchers are represented, and containment is still highlighted.
  20. Hierarchical clustering: scientific clusters are discovered
       Tool: Hierarchical Clustering Explorer 3.0, Human-Computer Interaction Lab, University of Maryland. http://www.cs.umd.edu/hcil/multi-cluster/
  21. What next?
       (i) Semantic similarity optimization:
           (a) caching of intermediate similarity results;
           (b) adoption of the MapReduce paradigm to speed up the assessment of semantic similarity.
       (ii) Domain-driven extensions at the data layer:
           (a) defining new data layer measures suited for geo-referenced entities;
           (b) multilingual similarity.
       (iii) Definition of interfaces that sift entities according to their similarity, exploiting visualization frameworks such as Exhibit, Google Visualization and the JavaScript InfoVis Toolkit.
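
Point (i)(a) can be illustrated with plain memoization. The Python sketch below caches a toy topic-to-topic similarity so that researcher comparisons sharing the same topic pairs do not recompute it. It is an assumption-laden illustration of the caching idea only, not SSONDE's planned implementation: the data, the function names, and the "best match, then average" aggregation are all hypothetical.

```python
from functools import lru_cache
from typing import Dict, FrozenSet, Tuple

# Hypothetical toy data: related topics per topic, topics per researcher.
RELATED: Dict[str, FrozenSet[str]] = {
    "t1": frozenset({"a", "b"}),
    "t2": frozenset({"b", "c"}),
    "t3": frozenset({"c"}),
}
RESEARCHERS: Dict[str, Tuple[str, ...]] = {
    "r1": ("t1", "t2"),
    "r2": ("t1", "t3"),
    "r3": ("t2", "t3"),
}


@lru_cache(maxsize=None)
def topic_similarity(a: str, b: str) -> float:
    """Intermediate result: share of a's related topics that b also has."""
    related_a, related_b = RELATED[a], RELATED[b]
    return len(related_a & related_b) / len(related_a) if related_a else 0.0


def researcher_similarity(x: str, y: str) -> float:
    """Average, over x's topics, of the best match among y's topics."""
    topics_x, topics_y = RESEARCHERS[x], RESEARCHERS[y]
    best = [max(topic_similarity(t, u) for u in topics_y) for t in topics_x]
    return sum(best) / len(best)


# Compare every ordered pair of distinct researchers.
matrix = {
    (x, y): researcher_similarity(x, y)
    for x in RESEARCHERS
    for y in RESEARCHERS
    if x != y
}

info = topic_similarity.cache_info()
print(f"topic comparisons computed: {info.misses}, served from cache: {info.hits}")
```

The cache key is the ordered pair `(a, b)`, which matters for an asymmetric measure: `topic_similarity("t1", "t2")` and `topic_similarity("t2", "t1")` are cached separately.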
  22. THANKS for your kind attention!
       Questions / Discussion / Suggestions
       Do not hesitate to contact us if
       • SSONDE can be deployed in some of your future projects (proposals);
       • you are interested in contributing to the SSONDE open framework.
       The SSONDE framework
       • pushes our instance similarity as a ready-to-go tool for the analysis of linked data;
       • has its Java code available on Google Code: http://purl.oclc.org/NET/SSONDE
       • is licensed as open source code (GNU GPL v3).
  23. References
       SSONDE Framework
       • Albertoni R., De Martino M.; SSONDE: Semantic Similarity On liNked Data Entities, 6th Metadata and Semantics Research Conference, 28-30 November 2012, Cádiz (Spain) [to appear]
       • Framework installation & use: http://code.google.com/p/ssonde/wiki/GettingStarted
       Semantic Similarity Theoretical Framework
       • Albertoni R., De Martino M.; Asymmetric and context dependent semantic similarity among ontology instances, Journal of Data Semantics, LNCS, 2008.
       • Albertoni R., De Martino M.; Semantic similarity of ontology instances tailored on the application context. Full paper at On the Move to Meaningful Internet Systems 2006: CoopIS, DOA, GADA, and ODBASE, volume 4275 of LNCS, pages 1020–1038. Springer, 2006.
       Issues adapting the theoretical framework to Linked Data
       • Albertoni R., De Martino M.; Semantic Similarity and Selection of Resources Published According to Linked Data Best Practice, OnToContent 2010, part of the OTM (OTM'10)
       Further Applications
       Comparing EUNIS habitats with respect to their species
       • Albertoni R., De Martino M.; Semantic Technology to Exploit Digital Content Exposed as Linked Data, eChallenges e-2011, 26-28 October 2011, Florence, Italy
       Comparing shapes metadata (not Linked Data)
       • Albertoni R., De Martino M.; Using Context Dependent Semantic Similarity to Browse Information Resources: an Application for the Industrial Design, First workshop on multimedia Annotation and Retrieval enabled by Shared Ontologies, Genoa, Italy, 2007
       A complete list of references on SSONDE and its Instance Similarity
