An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems

ws.nju.edu.cn

Gong Cheng, Saisai Gong, Yuzhong Qu
State Key Laboratory for Novel Software Technology, Nanjing University, China
gcheng@nju.edu.cn

Presented at ISWC 2011

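Measuring term similarity

[Figure: vocabulary matching between the terms of two vocabularies, with similarity scores on the matched pairs: FacultyMember and Faculty (0.9), FullProfessor and Professor (0.8), AssistantProfessor and AssistantProfessor (1.0).]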


Measuring vocabulary similarity

[Figure: similarity scores between vocabularies, obtained via vocabulary matching and vocabulary distance. Vocabularies shown: Semantic Web for Research Communities (SWRC), Foundational Model of Anatomy (FMA), GALEN, eBiquity Person, and NCBI organismal classification (NCBITaxon). Scores shown: 0.8, 0.5, 0.6, 0.02, 0.5.]


Measuring vocabulary relatedness

[Figure: a vocabulary containing FacultyMember, FullProfessor, and AssistantProfessor next to one containing Postgraduate-Research-Degree, PhD, and EngD. The figure contrasts vocabulary matching, vocabulary distance, and vocabulary relatedness.]
Not that similar, but somewhat related


Contributions

How to measure vocabulary relatedness?
6 measures, from 4 aspects

How about vocabulary relatedness in real-life cases?
Empirical analysis of 2,996 vocabularies and 4 billion other RDF triples

Where to apply vocabulary relatedness?
Post-selection vocabulary recommendation in vocabulary search


Outline

Data set
Post-selection vocabulary recommendation
Conclusions


Data set statistics

Crawled from February 2010 to May 2011 by


Data set distributions

RDF documents over pay-level domains


Data set distributions

Vocabularies over top-level domains


Outline

Data set
Conclusions



6 numerical measures, from 4 aspects
Semantic relatedness
Explicit
Implicit
Hybrid
Content similarity
Expressivity closeness
Distributional relatedness
Comparison


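Measure 1: explicit semantic relatedness

$$R_S^{E}(v_i, v_j) = \frac{1}{\text{weight of a shortest path between } v_i \text{ and } v_j \text{ in } G_E}$$

[Figure: example graph G_E over vocabularies v1, v2, v3 with edge weights 1 and 2, built from the explicit relations owl:imports, owl:priorVersion, and rdfs:seeAlso between the vocabularies.]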
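The inverse-shortest-path definition on this slide can be illustrated with a small sketch. It assumes an undirected graph stored as nested dictionaries with positive edge weights; the function names, the toy graph (loosely modeled on the slide's v1, v2, v3 example), and the choice of 0 for disconnected pairs are illustrative assumptions, not details taken from the slides.

```python
import heapq

def shortest_path_weight(graph, source, target):
    """Dijkstra's algorithm; returns None when target is unreachable."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return None

def explicit_relatedness(graph, vi, vj):
    """1 / (weight of a shortest path between vi and vj in the graph)."""
    if vi == vj:
        raise ValueError("self-relatedness is not defined in this sketch")
    weight = shortest_path_weight(graph, vi, vj)
    # Assumption: vocabularies with no connecting path get relatedness 0.
    return 0.0 if weight is None else 1.0 / weight

# Hypothetical toy graph: v1 -(1)- v2 -(2)- v3, stored in both directions.
G_E = {
    "v1": {"v2": 1.0},
    "v2": {"v1": 1.0, "v3": 2.0},
    "v3": {"v2": 2.0},
}

print(explicit_relatedness(G_E, "v1", "v2"))
print(explicit_relatedness(G_E, "v1", "v3"))
```

The same function would apply unchanged to G_I or G_E+I, since the three semantic measures differ only in the graph they are computed on.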


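Measure 2: implicit semantic relatedness

$$R_S^{I}(v_i, v_j) = \frac{1}{\text{weight of a shortest path between } v_i \text{ and } v_j \text{ in } G_I}$$

[Figure: example graph G_I over vocabularies v2, v3, v4 with edge weights 1 and 2, built from the relations owl:inverseOf, rdfs:subClassOf, and owl:inverseOf among terms t2, t3, t4 of those vocabularies.]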

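Measure 3: hybrid semantic relatedness

$$R_S^{E+I}(v_i, v_j) = \frac{1}{\text{weight of a shortest path between } v_i \text{ and } v_j \text{ in } G_{E+I}}$$

[Figure: example graph G_{E+I} over vocabularies v1, v2, v3, v4, with edge weights 1, 1, and 2.]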


Empirical analysis (1)

Statistical properties of GE, GI and GE+I



Explicit relations between vocabularies


Measure 4: content similarity

Harmonic mean

Maximum similarity between their labels
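The slide names two ingredients, a harmonic mean and the maximum similarity between two terms' labels, but the formula itself did not survive. The sketch below is one plausible reading, not the authors' definition: term similarity is the best label-pair score, each vocabulary is scored against the other by averaging its terms' best matches, and the two directional scores are combined by a harmonic mean. The string similarity (`difflib.SequenceMatcher`), the averaging step, and all names are assumptions.

```python
from difflib import SequenceMatcher

def label_similarity(a, b):
    # Stand-in string similarity (assumption); the slides do not name one.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def term_similarity(labels_a, labels_b):
    """Maximum similarity between any pair of the two terms' labels."""
    return max(
        (label_similarity(a, b) for a in labels_a for b in labels_b),
        default=0.0,
    )

def directional_score(source_vocab, target_vocab):
    """Average best-match similarity of source terms (assumed aggregation)."""
    if not source_vocab or not target_vocab:
        return 0.0
    best = [
        max(term_similarity(ls, lt) for lt in target_vocab.values())
        for ls in source_vocab.values()
    ]
    return sum(best) / len(best)

def content_similarity(vocab_a, vocab_b):
    """Harmonic mean of the two directional scores."""
    forward = directional_score(vocab_a, vocab_b)
    backward = directional_score(vocab_b, vocab_a)
    if forward + backward == 0:
        return 0.0
    return 2 * forward * backward / (forward + backward)

# Hypothetical vocabularies: term name -> list of labels.
vocab_x = {"FullProfessor": ["Full Professor"],
           "AssistantProfessor": ["Assistant Professor"]}
vocab_y = {"Professor": ["Professor"]}

print(content_similarity(vocab_x, vocab_x))
print(content_similarity(vocab_x, vocab_y))
```

A harmonic mean penalizes lopsided matches: a small vocabulary whose terms all match a fraction of a large one scores lower than it would under an arithmetic mean.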


86 label-like properties
rdfs:label, dc:title, and their subproperties (e.g. skos:prefLabel)
and local name

[Figure: two pie charts showing shares with (w/) and without (w/o) labels. "Terms and their labels": 36.33% and 63.67%. "Vocabulary distribution": 36.21% and 63.79%.]


Measure 5: expressivity closeness

[Figure: terms tp, tq, tr of a vocabulary, described using rdf:type, owl:TransitiveProperty, rdfs:domain, and owl:inverseOf.]
MetaTerms: rdfs:domain, owl:TransitiveProperty, owl:inverseOf, rdf:type

Jaccard
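As a minimal sketch of the Jaccard comparison named on this slide, the snippet below treats each vocabulary's meta-level terms as a set and returns the ratio of shared to total meta-terms. The function name, the second example set, and the value returned for two empty sets are assumptions.

```python
def expressivity_closeness(meta_terms_a, meta_terms_b):
    """Jaccard similarity between two sets of meta-level terms."""
    a, b = set(meta_terms_a), set(meta_terms_b)
    union = a | b
    if not union:
        return 0.0  # assumption: two empty sets are treated as unrelated
    return len(a & b) / len(union)

# First set mirrors the MetaTerms on the slide; the second is hypothetical.
meta_a = {"rdf:type", "rdfs:domain", "owl:TransitiveProperty", "owl:inverseOf"}
meta_b = {"rdf:type", "rdfs:domain", "rdfs:range"}

print(expressivity_closeness(meta_a, meta_b))  # 2 shared / 5 total = 0.4
```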



4,978 meta-level terms, 469 (9.42%) in >1 vocabulary
Most popular meta-level terms
1. rdf:type
2. rdfs:domain
3. rdfs:range
4. …
and after excluding language constructs

10.13 meta-level terms per vocabulary
≤20 meta-level terms in 92.96% of vocabularies
but hundreds in Cyc


Measure 6: distributional relatedness

Distributional profile

$$DP(v) = \bigl(p(v_1 \mid v),\; p(v_2 \mid v),\; \ldots,\; p(v_n \mid v)\bigr)^{\top}$$

$$R_D(v_i, v_j) = \cos\bigl(DP(v_i), DP(v_j)\bigr)$$
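The cosine between distributional profiles can be sketched as follows. The slide does not say how p(v_k | v) is estimated, so this example assumes it is a normalized co-instantiation count; the count table, the names, and the zero result for uninstantiated vocabularies are all hypothetical.

```python
import math

def distributional_profile(counts, vocab, all_vocabs):
    """Vector of p(v_k | vocab), estimated from co-instantiation counts."""
    row = counts.get(vocab, {})
    total = sum(row.values())
    if total == 0:
        return [0.0] * len(all_vocabs)  # uninstantiated vocabulary
    return [row.get(other, 0) / total for other in all_vocabs]

def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # assumption: an empty profile is related to nothing
    return dot / (norm_x * norm_y)

def distributional_relatedness(counts, vi, vj, all_vocabs):
    return cosine(
        distributional_profile(counts, vi, all_vocabs),
        distributional_profile(counts, vj, all_vocabs),
    )

# Hypothetical counts: how often each vocabulary is co-instantiated with others.
all_vocabs = ["x", "y", "z"]
counts = {
    "a": {"x": 1, "y": 1},
    "b": {"x": 2, "y": 2},
    "c": {"z": 3},
}

print(distributional_relatedness(counts, "a", "b", all_vocabs))
print(distributional_relatedness(counts, "a", "c", all_vocabs))
```

Because the profiles are normalized, "a" and "b" come out as maximally related even though their raw counts differ, while "a" and "c" share no context and score 0.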



Instantiation found for 1,874 (62.55%) vocabularies

Most popular vocabularies (excluding languages)



Co-instantiation found for 9,763 pairs of vocabularies

Most popular vocabulary co-instantiation (excluding languages)



6 numerical measures, from 4 aspects
Semantic relatedness
Explicit
Implicit
Hybrid
Content similarity
Expressivity closeness
Distributional relatedness
Comparison


Agreement between measures

Spearman’s rank correlation coefficient (ρ∈[-1,1])

Single-link hierarchical clustering
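To make the agreement statistic concrete, here is a small standard-library sketch of Spearman's rank correlation between two measures scored over the same vocabulary pairs. It computes Pearson's correlation on average ranks, which handles ties; the sample scores and function names are hypothetical, and the clustering step is not covered.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # assumption: correlation is undefined for constant input
    return cov / (sd_x * sd_y)

def spearman_rho(scores_a, scores_b):
    """Spearman's rho in [-1, 1] for two equally long score lists."""
    return pearson(average_ranks(scores_a), average_ranks(scores_b))

# Hypothetical scores from two measures over the same three vocabulary pairs.
measure_1 = [0.1, 0.4, 0.9]
measure_2 = [0.2, 0.3, 0.8]

print(spearman_rho(measure_1, measure_2))
```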


Outline

Data set
Conclusions


Relatedness-based ranking

Ranking by single measure:

Ranking by multiple measures:


Popularity-based re-ranking

Degree of influence of popularity

Number of pay-level domains instantiating vi


Evaluation settings

20 “selections” randomly selected from 1,302 moderate-sized vocabularies
Depth-10 pooling with

2 experts
Ratings
Closely related: 2
Somewhat related: 1
Unrelated: 0

Metric: NDCG
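For readers unfamiliar with the metric, the sketch below computes NDCG@k from the graded ratings listed on this slide (2, 1, 0). The slides do not state which NDCG variant was used; this version assumes linear gain with a log2 position discount and an ideal ranking built from all pooled ratings for the selection. The names and sample ratings are hypothetical.

```python
import math

def dcg(ratings):
    """Discounted cumulative gain with linear gain and log2 discount."""
    return sum(r / math.log2(pos + 1) for pos, r in enumerate(ratings, start=1))

def ndcg_at_k(ranked_ratings, pooled_ratings, k):
    """DCG of the top-k ranking divided by the DCG of the ideal top-k."""
    ideal = sorted(pooled_ratings, reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    if ideal_dcg == 0:
        return 0.0  # assumption: no related vocabulary exists in the pool
    return dcg(ranked_ratings[:k]) / ideal_dcg

# Hypothetical expert ratings of one recommendation list, in ranked order.
ranking = [1, 2, 0]
pool = [2, 1, 0, 0]

print(ndcg_at_k(ranking, pool, 1))  # 0.5: a "somewhat related" item ranked first
print(ndcg_at_k(ranking, pool, 3))
```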


Gold standard

739 assessments
Closely related: 7.85%
Somewhat related: 10.55%
Unrelated: 81.60%

Agreement between experts
80%
or 91% when “closely related = somewhat related = related”


Evaluation results --- individual measures

56.88% isolated vocabularies in GE
37.45% uninstantiated vocabularies


Evaluation results --- combinations of measures


Relatedness vs. popularity

NDCG@1 vs. number of pay-level domains instantiating it


Outline

Data set
Conclusions


Conclusions

Vocabulary-level relatedness
4 aspects, 6 measures
Empirical analysis
Statistical findings
Comparison
Relatedness-based ranking
Popularity-based re-ranking
Evaluation

Falcons Ontology Search
http://ws.nju.edu.cn/falcons/ontologysearch/


Take away

Vocabulary meta-descriptions are incomplete.
Terms lack labels.
Co-instantiated ∝ explicitly related

http://ws.nju.edu.cn/falcons/ontologysearch/

