Within content-based recommendation systems, Linked Data has already been proposed as a valuable source of information to enhance the predictive power of recommender systems, not only in terms of accuracy but also in terms of diversity and novelty of results. In this direction, one of the main open issues in using Linked Data to feed a recommendation engine is feature selection: how to select only the most relevant subset of the original Linked Data, thus avoiding both useless processing of data and the so-called "curse of dimensionality" problem. In this paper, we show how ABSTAT, an ontology-based (linked) data summarization framework, can drive the selection of properties/features useful to a recommender system. In particular, we compare a fully automated feature selection method based on ontology-based data summaries with more classical ones, and we evaluate the performance of these methods in terms of accuracy and aggregate diversity of a recommender system exploiting the top-k selected features. We set up an experimental testbed relying on datasets related to different knowledge domains. Results show the feasibility of a feature selection process driven by ontology-based data summaries for Linked Data-enabled recommender systems.
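To make the idea of top-k feature selection concrete, the minimal sketch below ranks candidate Linked Data properties by a relevance score and keeps only the k best before feeding them to a recommender. The matrix, the plain frequency score, and all names are illustrative assumptions; this is not ABSTAT's or the paper's actual ranking method.

```python
import numpy as np

# Toy item x property matrix for a LOD-enabled recommender: rows are items,
# columns are candidate properties extracted from Linked Data (hypothetical data).
rng = np.random.default_rng(42)
X = (rng.random((500, 40)) < 0.15).astype(int)
k = 10

# Score each property; plain frequency stands in here for the summary-based
# or information-gain scores compared in the paper.
scores = X.sum(axis=0)
top_k = np.argsort(scores)[::-1][:k]   # indices of the k highest-scoring properties
X_reduced = X[:, top_k]                # reduced feature space fed to the recommender
print(sorted(top_k.tolist()))
```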
This is joint work between the SisInf Lab at the Polytechnic University of Bari and the INSID&S Lab at the University of Milano-Bicocca, presented in the ESWC 2018 Research Track.
The paper can be found at https://link.springer.com/chapter/10.1007%2F978-3-319-93417-4_9
CSMR: A Scalable Algorithm for Text Clustering with Cosine Similarity and Map... (Victor Giannakouris)
This document proposes CSMR, a scalable algorithm for text clustering that uses cosine similarity and MapReduce. CSMR performs pairwise text similarity by representing text documents as vectors in a vector space model and measuring similarity in parallel using MapReduce. It is a 4-phase algorithm that includes word counting, text vectorization using term frequencies, applying TF-IDF to document vectors, and measuring cosine similarity. The algorithm is designed to cluster large text corpora in a scalable manner on distributed systems like Hadoop. Future work includes implementing and testing CSMR on real data and publishing results.
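As a rough single-machine illustration of the same pipeline (term counting, TF-IDF weighting, pairwise cosine similarity), the snippet below uses scikit-learn on made-up documents; CSMR's contribution is distributing these steps as MapReduce phases, which this toy example does not do.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Three toy documents; CSMR would run these same steps as MapReduce jobs.
docs = [
    "mapreduce scales text clustering",
    "cosine similarity compares document vectors",
    "text clustering with cosine similarity",
]
tfidf = TfidfVectorizer().fit_transform(docs)   # documents as TF-IDF vectors
print(cosine_similarity(tfidf).round(2))        # pairwise similarity matrix
```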
Recommender Systems with Apache Spark's ALS Function (Will Johnson)
A quick visual guide to recommender systems (user-based, item-based, and matrix factorization) and the code behind building an Apache Spark MatrixFactorizationModel with the ALS function.
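The toy NumPy sketch below illustrates what alternating least squares (ALS) computes: it alternately solves small ridge-regression problems for the user and item factor matrices so that their product approximates the observed ratings. It is an illustration of the idea only, with made-up data; it is not Spark code and does not use the MatrixFactorizationModel API.

```python
import numpy as np

R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 0, 5, 4]], dtype=float)   # toy user x item ratings, 0 = unrated
mask = R > 0
k, lam, n_iter = 2, 0.1, 20                 # latent factors, regularization, sweeps
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(R.shape[0], k))
V = rng.normal(scale=0.1, size=(R.shape[1], k))

for _ in range(n_iter):
    # Fix V, solve a ridge regression for each user's factor vector
    for u in range(R.shape[0]):
        idx = mask[u]
        A = V[idx].T @ V[idx] + lam * np.eye(k)
        U[u] = np.linalg.solve(A, V[idx].T @ R[u, idx])
    # Fix U, solve for each item's factor vector
    for i in range(R.shape[1]):
        idx = mask[:, i]
        A = U[idx].T @ U[idx] + lam * np.eye(k)
        V[i] = np.linalg.solve(A, U[idx].T @ R[idx, i])

print(np.round(U @ V.T, 2))   # predicted ratings, including the unrated cells
```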
This document proposes a peer-to-peer approach for personalized top-k processing in collaborative tagging systems. It describes a two-layer gossip protocol to discover each user's personal network and distributed inverted lists to process queries locally. An evaluation on real tagging data shows the approach converges quickly, provides high recall with limited storage per user and processing time that increases slowly with the network size. The peer-to-peer solution enables scalable personalized search compared to centralized and cluster-based alternatives.
The World Wide Web is moving from a Web of hyper-linked documents to a Web of linked data. Thanks to the Semantic Web technological stack and to the more recent Linked Open Data (LOD) initiative, a vast amount of RDF data have been published in freely accessible datasets connected with each other to form the so-called LOD cloud. As of today, we have tons of RDF data available in the Web of Data, but only a few applications really exploit their potential power. The availability of such data is certainly an opportunity to feed personalized information access tools such as recommender systems. We will show how to plug Linked Open Data into a recommendation engine in order to build a new generation of LOD-enabled applications.
(Lecture given @ the 11th Reasoning Web Summer School - Berlin - August 1, 2015)
This document discusses research into automatically discovering strong relationships between entities in Linked Data using genetic programming. The researchers aim to learn a cost function that can guide uninformed searches over Linked Data to find the most promising relationship paths. They experiment with different topological and semantic features as inputs to genetic programming to learn cost functions. The best-performing cost functions incorporate features like namespace variety, conditional node degree, and topics. This suggests specific, well-described paths through entities of different types are indicators of strong relationships in Linked Data.
Maximizing the Diversity of Exposure in a Social Network (Cigdem Aslay)
Social-media platforms have created new ways for citizens to stay informed and participate in public debates. However, to enable a healthy environment for information sharing, social deliberation, and opinion formation, citizens need to be exposed to sufficiently diverse viewpoints that challenge their assumptions, instead of being trapped inside filter bubbles.
In this paper, we take a step in this direction and propose a novel approach to maximize the diversity of exposure in a social network. We formulate the problem in the context of information propagation, as a task of recommending a small number of news articles to selected users.
We propose a realistic setting where we take into account content and user leanings, and the probability of further sharing an article. This setting allows us to capture the balance between maximizing the spread of information and ensuring the exposure of users to diverse viewpoints.
The resulting problem can be cast as maximizing a monotone and submodular function subject to a matroid constraint on the allocation of articles to users. It is a challenging generalization of the influence maximization problem. Yet, we are able to devise scalable approximation algorithms by introducing a novel extension to the notion of random reverse-reachable sets. We experimentally demonstrate the efficiency and scalability of our algorithm on several real-world datasets.
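A standard way to exploit monotonicity and submodularity under a (partition) matroid constraint is the greedy algorithm, sketched below on a toy objective: the number of distinct article leanings each user is exposed to, with a cap on how many articles each user may receive. Both the objective and the data are invented for illustration; the paper's actual algorithm relies on random reverse-reachable sets for scalability, which this sketch does not implement.

```python
from itertools import product

users = ["u1", "u2", "u3"]
articles = {"a1": "left", "a2": "right", "a3": "center"}
cap = 1   # partition-matroid constraint: at most `cap` articles per user

def diversity(assignment):
    # assignment: set of (user, article) pairs; count distinct leanings per user
    seen = {}
    for u, a in assignment:
        seen.setdefault(u, set()).add(articles[a])
    return sum(len(leanings) for leanings in seen.values())

chosen = set()
while True:
    feasible = [(u, a) for u, a in product(users, articles)
                if (u, a) not in chosen
                and sum(1 for x in chosen if x[0] == u) < cap]
    if not feasible:
        break
    gains = {p: diversity(chosen | {p}) - diversity(chosen) for p in feasible}
    best = max(gains, key=gains.get)
    if gains[best] <= 0:
        break
    chosen.add(best)   # greedy: add the feasible pair with the largest marginal gain

print(sorted(chosen))
```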
This document discusses various techniques for data preprocessing, including data integration, transformation, reduction, and discretization. It covers topics such as schema integration, handling redundant data, data normalization, dimensionality reduction, data cube aggregation, sampling, and entropy-based discretization. The goal of these techniques is to prepare raw data for knowledge discovery and data mining tasks by cleaning, transforming, and reducing the data into a suitable structure.
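Two of the preprocessing steps listed above, normalization and discretization, can be illustrated in a few lines of NumPy. The data is made up, and equal-width binning stands in for the entropy-based discretization the document discusses.

```python
import numpy as np

ages = np.array([18, 22, 25, 30, 41, 47, 52, 65], dtype=float)

# Min-max normalization: rescale values to the [0, 1] range
normalized = (ages - ages.min()) / (ages.max() - ages.min())

# Equal-width discretization into 3 bins (a simple stand-in for entropy-based binning)
edges = np.linspace(ages.min(), ages.max(), 4)[1:-1]
bins = np.digitize(ages, bins=edges)

print(normalized.round(2))
print(bins)   # discretized labels 0..2
```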
This document summarizes research on discovering spatial co-location patterns from geospatial data. It discusses how spatial data mining differs from classical data mining by considering attribute relationships between neighboring spatial objects. The paper focuses on extracting frequent co-occurrence rules between boolean spatial features from ecological datasets. It presents three approaches for modeling co-location rules problems - reference feature centric, window centric, and event centric. The Co-location Miner algorithm is introduced for mining co-location rules that satisfy minimum prevalence and conditional probability thresholds from the data.
A Visual Exploration of Distance, Documents, and Distributions (Rebecca Bilbro)
The document discusses various distance metrics that can be used to quantify similarity between text documents for machine learning applications. It explains challenges in modeling text data due to its high dimensionality and sparse distributions. It then summarizes distance metrics available in Scikit-Learn and SciPy that can be used, including Euclidean, Manhattan, Chebyshev, Minkowski, Mahalanobis, Cosine, Canberra, Jaccard, and Hamming distances. It provides examples applying t-SNE visualization to embed documents from three text corpora using different distance metrics to understand how the choice of distance metric impacts the resulting visualizations.
Machine learning often requires us to think spatially and make choices about what it means for two instances to be close or far apart. So which is best - Euclidean? Manhattan? Cosine? It all depends! In this talk, we'll explore open source tools and visual diagnostic strategies for picking good distance metrics when doing machine learning on text.
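The following toy comparison (made-up term-count vectors) shows why the choice matters: Euclidean and Manhattan distances grow with document length, while cosine distance depends only on the direction of the vectors, which is one reason it is often preferred for sparse, high-dimensional text features.

```python
import numpy as np
from scipy.spatial import distance

# Two documents with identical term proportions, one ten times longer.
short_doc = np.array([1, 2, 0, 1], dtype=float)
long_doc = np.array([10, 20, 0, 10], dtype=float)

print("euclidean:", distance.euclidean(short_doc, long_doc))   # inflated by length
print("manhattan:", distance.cityblock(short_doc, long_doc))   # inflated by length
print("cosine:   ", distance.cosine(short_doc, long_doc))      # ~0: same direction
```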
This document provides an overview of an algorithmic techniques for big data analysis course. It discusses various data and computational models for analyzing large datasets, including streaming, external memory, distributed (MapReduce), and crowdsourcing models. Specific techniques covered include algorithms for counting distinct elements in different models, metric embedding for efficient pattern matching, and property testing for approximating properties of large datasets in sub-linear time. The document outlines the course topics, assignments, timeline and provides a high-level syllabus.
This document provides an overview of an algorithmic techniques for big data analysis course. It discusses the challenges of big data including volume, velocity, variety, and veracity. The course will develop algorithms to deal with big data, with an emphasis on different data processing models and common techniques. Students will complete projects, participate in discussions, and are encouraged to apply what they learn. Grading will be based on scribing lecture notes, participation, and a survey and project. The tentative syllabus and course plan cover topics like streaming algorithms, dimensionality reduction, and crowdsourcing.
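As one concrete example of the streaming model mentioned in the syllabus, the sketch below estimates the number of distinct elements in a single pass with O(k) memory using a k-minimum-values estimator. This is a generic textbook technique, not necessarily the course's exact algorithm; production systems typically use HyperLogLog instead.

```python
import hashlib

def kmv_distinct(stream, k=64):
    """K-minimum-values sketch: estimate the number of distinct items in one pass."""
    def h(x):  # hash each item to a pseudo-uniform float in (0, 1)
        return int(hashlib.sha1(str(x).encode()).hexdigest(), 16) / 2**160
    smallest = set()   # the k smallest distinct hash values seen so far
    for item in stream:
        v = h(item)
        if len(smallest) < k:
            smallest.add(v)
        elif v < max(smallest) and v not in smallest:
            smallest.discard(max(smallest))
            smallest.add(v)
    if len(smallest) < k:          # fewer than k distinct items: count is exact
        return len(smallest)
    return int((k - 1) / max(smallest))

print(kmv_distinct((i % 1000 for i in range(100_000))))   # estimate close to 1000
```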
An Answer Set Programming based framework for High-Utility Pattern Mining ext... (Francesco Cauteruccio)
This document summarizes a framework for high-utility pattern mining that extends previous work by introducing facets, a multi-layer database representation, and advanced utility functions. The framework allows mining patterns based on multiple perspectives using facets and considers database structure through layers of containers, objects, and transactions. An Answer Set Programming approach is used to flexibly encode the problem and compute utilities. The framework is demonstrated on a scientific paper review dataset, showing it can provide insights not possible with previous high-utility pattern mining systems.
Machine Learning Comparative Analysis - Part 1 (Kaniska Mandal)
This document provides an overview of machine learning concepts and algorithms. It discusses supervised and unsupervised classification as well as reinforcement learning. Important concepts covered include concepts, instances, target concepts, hypotheses, inductive bias, Occam's razor, and restriction bias. Machine learning algorithms discussed include Bayesian classification, decision trees, linear regression, multi-layer perceptrons, K-nearest neighbors, boosting, and ensemble learning. The document compares the preferences, learning functions, performance, enhancements, and typical usages of these different machine learning approaches.
Course: Intro to Computer Science (Malmö Högskola)
A palette of applications showing abstraction, databases, simulation, artificial intelligence and numerical applications
This document provides a summary of MapReduce algorithms. It begins with background on the author's experience blogging about MapReduce algorithms in academic papers. It then provides an overview of MapReduce concepts including the mapper and reducer functions. Several examples of recently published MapReduce algorithms are described for tasks like machine learning, finance, and software engineering. One algorithm is examined in depth for building a low-latency key-value store. Finally, recommendations are provided for designing MapReduce algorithms including patterns, performance, and cost/maintainability considerations. An appendix lists additional MapReduce algorithms from academic papers in areas such as AI, biology, machine learning, and mathematics.
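For readers unfamiliar with the mapper and reducer functions mentioned above, here is the canonical word-count example written as plain Python functions. It mimics the programming model only and is not Hadoop or Spark code.

```python
from collections import defaultdict
from itertools import chain

def mapper(doc):
    # Emit a (word, 1) pair for every word in the document
    for word in doc.lower().split():
        yield word, 1

def shuffle(pairs):
    # Group intermediate values by key, as the framework does between map and reduce
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups.items()

def reducer(key, values):
    # Sum the counts for each word
    return key, sum(values)

docs = ["the quick brown fox", "the lazy dog", "the quick dog"]
mapped = chain.from_iterable(mapper(d) for d in docs)
print(dict(reducer(k, vs) for k, vs in shuffle(mapped)))
```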
Machine learning and linear regression programming (Soumya Mukherjee)
Overview of AI and ML
Terminology awareness
Applications in real world
Use cases within Nokia
Types of Learning
Regression
Classification
Clustering
Linear Regression Single Variable with python
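A single-variable linear regression like the one listed above reduces to two closed-form formulas. The snippet below, with made-up data, fits the slope and intercept by ordinary least squares.

```python
import numpy as np

# Illustrative data: x is the input variable, y the target
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 7.9, 10.1])

# Ordinary least squares for a single variable
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

print(f"y ~ {slope:.2f} * x + {intercept:.2f}")
print("prediction for x = 6:", slope * 6 + intercept)
```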
ChatGPT
Data analysis is the process of inspecting, cleaning, transforming, and modeling data to discover useful information, draw conclusions, and support decision-making. It involves applying various techniques and methods to extract insights from data sets, often with the goal of uncovering patterns, trends, relationships, or making predictions.
Here's an overview of the key steps and techniques involved in data analysis:
Data Collection: The first step in data analysis is gathering relevant data from various sources. This can include structured data from databases, spreadsheets, or surveys, as well as unstructured data such as text documents, social media posts, or sensor readings.
Data Cleaning and Preprocessing: Once the data is collected, it often needs to be cleaned and preprocessed to ensure its quality and suitability for analysis. This involves handling missing values, removing duplicates, addressing inconsistencies, and transforming data into a suitable format for analysis.
Exploratory Data Analysis (EDA): EDA involves examining and understanding the data through summary statistics, visualizations, and statistical techniques. It helps identify patterns, distributions, outliers, and potential relationships between variables. EDA also helps in formulating hypotheses and guiding further analysis.
Data Modeling and Statistical Analysis: In this step, various statistical techniques and models are applied to the data to gain deeper insights. This can include descriptive statistics, inferential statistics, hypothesis testing, regression analysis, time series analysis, clustering, classification, and more. The choice of techniques depends on the nature of the data and the research questions being addressed.
Data Visualization: Data visualization plays a crucial role in data analysis. It involves creating meaningful and visually appealing representations of data through charts, graphs, plots, and interactive dashboards. Visualizations help in communicating insights effectively and spotting trends or patterns that may be difficult to identify in raw data.
Interpretation and Conclusion: Once the analysis is performed, the findings need to be interpreted in the context of the problem or research objectives. Conclusions are drawn based on the results, and recommendations or insights are provided to stakeholders or decision-makers.
Reporting and Communication: The final step is to present the results and findings of the data analysis in a clear and concise manner. This can be in the form of reports, presentations, or interactive visualizations. Effective communication of the analysis results is crucial for stakeholders to understand and make informed decisions based on the insights gained.
Data analysis is widely used in various fields, including business, finance, marketing, healthcare, social sciences, and more. It plays a crucial role in extracting value from data, supporting evidence-based decision-making, and driving actionable insights.
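A few of the steps above (duplicate removal, imputation of missing values, summary statistics, and simple relationship checks) fit into a short pandas sketch. The data frame here is invented purely for illustration.

```python
import pandas as pd

# Toy dataset with a duplicate row and missing values
df = pd.DataFrame({"age": [25, 32, None, 41, 32],
                   "income": [40, 52, 48, None, 52]})

df = df.drop_duplicates()                      # data cleaning: remove duplicates
df = df.fillna(df.median(numeric_only=True))   # impute missing values with medians

print(df.describe())   # exploratory summary statistics
print(df.corr())       # relationships between variables
```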
The document outlines a framework for mining spatial co-location patterns from transaction-type data. It begins by discussing related work on existing approaches, including transaction-free and transaction-based methods. It then defines a new type of transaction-type data called a Spatial Co-location Transaction (SCT) to address limitations in previous approaches. The proposed framework first generates all SCTs from a spatial dataset, then applies association analysis methods to the SCTs to extract spatial co-location patterns. Both binary and quantitative analysis techniques are described to analyze SCTs and identify meaningful co-location patterns.
This course provides a detailed executive-level review of contemporary topics in graph modeling theory with specific focus on Deep Learning theoretical concepts and practical applications. The ideal student is a technology professional with a basic working knowledge of statistical methods.
Upon completion of this review, the student should acquire improved ability to discriminate, differentiate and conceptualize appropriate implementations of application-specific (‘traditional’ or ‘rule-based’) methods versus deep learning methods of statistical analyses and data modeling. Additionally, the student should acquire improved general understanding of graph models as deep learning concepts with specific focus on state-of-the-art awareness of deep learning applications within the fields of character recognition, natural language processing and computer vision. Optionally, the provided code base will inform the interested student regarding basic implementation of these models in Keras using Python (targeting TensorFlow, Theano or Microsoft Cognitive Toolkit).
Link to course:
https://www.experfy.com/training/courses/graph-models-for-deep-learning
University of Manchester Symposium 2012: Extraction and Representation of in ... (geraintduck)
This document describes research extracting and analyzing biological methods mentioned in the scientific literature. It developed bioNerDS, a tool to automatically extract mentions of computational resources from papers. bioNerDS was used to analyze over 1.8 million mentions from 230,000 open access articles, finding patterns in resource usage over time and between journals. Challenges included ambiguity, variability in names, and extracting methods from ordered resource mentions. The goal is to provide a way to extract "best practices" for any resource-based domain by mining the literature.
Foundations of Machine Learning - StampedeCon AI Summit 2017 (StampedeCon)
This presentation will cover all aspects of modeling, from preparing data to training and evaluating the results. There will be descriptions of the mainline ML methods including neural nets, SVM, boosting, bagging, trees, forests, and deep learning. Common problems of overfitting and dimensionality will be covered with a discussion of modeling best practices. Other topics will include field standardization, encoding categorical variables, and feature creation and selection. It will be a soup-to-nuts overview of all the necessary procedures for building state-of-the-art predictive models.
The document discusses decision tree learning, a machine learning approach for classification that builds models in the form of a decision tree. It describes the ID3 algorithm, a popular method for generating a decision tree from a set of training data. The ID3 algorithm uses information gain as the splitting criterion to recursively split the training data into purer subsets based on the values of the attributes. It selects the attribute with the highest information gain to make decisions at each node in the tree. Entropy from information theory is used to measure the information gain, with the goal being to build a tree that best classifies the training instances into target classes. An example applying the ID3 algorithm to a tennis-playing dataset is provided to illustrate the approach.
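The entropy and information-gain computations at the heart of ID3 are easy to state in code. The sketch below scores one attribute on a tiny made-up "play tennis" sample; the values are illustrative and not the classic dataset verbatim.

```python
import math
from collections import Counter

def entropy(labels):
    # Shannon entropy of a list of class labels
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    # Entropy of the target minus the weighted entropy after splitting on `attribute`
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

data = [
    {"outlook": "sunny", "play": "no"}, {"outlook": "sunny", "play": "no"},
    {"outlook": "overcast", "play": "yes"}, {"outlook": "rain", "play": "yes"},
    {"outlook": "rain", "play": "no"}, {"outlook": "overcast", "play": "yes"},
]
# ID3 splits on the attribute with the highest information gain
print(round(information_gain(data, "outlook", "play"), 3))
```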
Data preprocessing techniques
See my Paris applied psychology conference paper here
https://www.slideshare.net/jasonrodrigues/paris-conference-on-applied-psychology
or
https://prezi.com/view/KBP8JnekVH9LkLOiKY3w/
University Course "Micro and nano systems" for Master Degree in Biomedical Engineering at University of Pisa. Topic: Software for additive manufacturing (part1)
How to Effectively Combine Numerical Features and Categorical Features (Domino Data Lab)
by Liangjie Hong
Head of Data Science, Etsy
Latent factor models and decision tree based models are widely used in tasks of prediction, ranking and recommendation. Latent factor models have the advantage of interpreting categorical features by a low-dimensional representation, while such an interpretation does not naturally fit numerical features. In contrast, decision tree based models enjoy the advantage of capturing the nonlinear interactions of numerical features, while their capability of handling categorical features is limited by the cardinality of those features. Since in real-world applications we usually have both abundant numerical features and categorical features with large cardinality (e.g., geolocations, IDs, tags, etc.), we design a new model, called GB-CENT, which leverages latent factor embedding and tree components to achieve the merits of both while avoiding their demerits. With two real-world data sets, we demonstrate that GB-CENT can effectively (i.e., fast and accurately) achieve better accuracy than state-of-the-art matrix factorization, decision tree based models, and their ensembles.
Data enrichment is vital for leveraging heterogeneous data sources in various business analyses, AI applications, and data-driven services. Knowledge Graphs (KGs) support the enrichment of heterogeneous data sources by making entities first-class citizens: links to entities help interconnect heterogeneous data pieces or even ease access to external data sources to eventually augment the original data. Data annotation algorithms to find and link entities in reference KGs, as well as to identify out-of-KG entities have been proposed and applied to different types of data, such as tables, and texts. However, despite recent progress in annotation algorithms, the output of these algorithms does not always meet the quality requirements that make the enriched data valuable in downstream applications. As a result, semantic data enrichment remains an effort-consuming and error-prone task. In this seminar, we discuss the relationships between annotation algorithms, data enrichment, and KG construction, highlighting challenges and open problems. In addition, we advocate for a native human-in-the-loop perspective that enables users to control the outcome of the enrichment and, eventually, improve the quality of the enriched data. We focus in particular on the annotation and enrichment of tabular data and briefly discuss the application of a similar paradigm to the enrichment of textual data in the legal domain, e.g., on court decisions and criminal investigation documents.
Description of the DaCENA approach to the contextual exploration of knowledge graphs. We use machine learning to learn user preferences using a limited number of user inputs. Through these inputs, we learn a personalized ranking function over semantic associations (semi-paths in a knowledge graph) that best fit users' interests. References for the presentation are:
Bianchi et al.: Actively Learning to Rank Semantic Associations for Personalized Contextual Exploration of Knowledge Graphs. ESWC (1) 2017: 120-135.
Palmonari et al.: DaCENA: Serendipitous News Reading with Data Contexts. ESWC (Satellite Events) 2015: 133-137.
Interoperability challenges & solutions in the EW-Shopp H2020 innovation action: tool-supported interoperability; exchange of event data and custom event ontology for data analytics; reconciliation across systems of spatial identifiers.
5-min presentation of EW-Shopp. EW-Shopp is an industry-driven H2020 project where AI is used to make data enrichment easier and predict the effect of weather and events in different business domains such as eCommerce, Retail, CRM, IoT, Digital Marketing
1) Research challenges in developing human-level AI include tackling the complexity of human intelligence such as faster learning without limited data, connections between cognitive skills, and reasoning about imaginary concepts.
2) Modern machine learning has achieved human-level performance in games and applications like translation but lacks human abilities such as commonsense reasoning and using background knowledge.
3) Promising research trends include techniques that combine symbolic and machine learning approaches, multi-modal learning, and generative models to support creativity.
Presentation of "Facet Annotation Using Reference Knowledge Bases" at the WWW2018 Research Track, i.e., The Web Conference 2018, April 26th, Lyon, France.
ABSTRACT: Faceted interfaces are omnipresent on the web to support data exploration and filtering. A facet is a triple: a domain (e.g., Book), a property (e.g., author, language), and a set of property values (e.g., { Austen, Beauvoir, Coelho, Dostoevsky, Eco, Kerouac, Suskind, ... }, { French, English, German, Italian, Portuguese, Russian, ... }). Given a property (e.g., language), selecting one or more of its values (English and Italian) returns the domain entities (of type Book) that match the given values (the books that are written in English or Italian). To implement faceted interfaces in a way that is scalable to very large datasets, it is necessary to automate facet extraction. Prior work associates a facet domain with a set of homogeneous values, but does not annotate the facet property. In this paper, we annotate the facet property with a predicate from a reference Knowledge Base (KB) so as to maximize the semantic similarity between the property and the predicate. We define semantic similarity in terms of three new metrics: specificity, coverage, and frequency. Our experimental evaluation uses the DBpedia and YAGO KBs and shows that for the facet annotation problem, we obtain better results than a state-of-the-art approach for the annotation of web tables as modified to annotate a set of values.
For more info about our work you can check out the websites of our labs:
INSID&S Lab (UNIMIB): http://inside.disco.unimib.it/
ADVIS Lab (UIC): https://www.cs.uic.edu/bin/view/Advis/WebHome
Using our multi-user model, a community of users provides feedback in a pay-as-you-go fashion to the ontology matching process by validating the mappings found by automatic methods, with the following advantages over having a single user: the effort required from each user is reduced, user errors are corrected, and consensus is reached. We propose strategies that dynamically determine the order in which the candidate mappings are presented to the users for validation. These strategies are based on mapping quality measures that we define. Further, we use a propagation method to leverage the validation of one mapping to other mappings. We use an extension of the AgreementMaker ontology matching system and the Ontology Alignment Evaluation Initiative (OAEI) Benchmarks track to evaluate our approach. Our results show how F-measure and robustness vary as a function of the number of user validations. We consider different user error and revalidation rates (the latter measures the number of times that the same mapping is validated). Our results highlight complex trade-offs and point to the benefits of dynamically adjusting the revalidation rate.
This tutorial was presented at CAiSE 2010. It discusses the state of the art in research addressing the quality of data at the conceptual level (conceptual schemas) and of ontologies.
(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu... (Scintica Instrumentation)
Targeting Hsp90 and its pathogen Orthologs with Tethered Inhibitors as a Diagnostic and Therapeutic Strategy for cancer and infectious diseases with Dr. Timothy Haystead.
The binding of cosmological structures by massless topological defects (Sérgio Sacani)
Assuming spherical symmetry and weak field, it is shown that if one solves the Poisson equation or the Einstein field equations sourced by a topological defect, i.e. a singularity of a very specific form, the result is a localized gravitational field capable of driving flat rotation (i.e. Keplerian circular orbits at a constant speed for all radii) of test masses on a thin spherical shell without any underlying mass. Moreover, a large-scale structure which exploits this solution by assembling concentrically a number of such topological defects can establish a flat stellar or galactic rotation curve, and can also deflect light in the same manner as an equipotential (isothermal) sphere. Thus, the need for dark matter or modified gravity theory is mitigated, at least in part.
PPT on Alternate Wetting and Drying presented at the three-day 'Training and Validation Workshop on Modules of Climate Smart Agriculture (CSA) Technologies in South Asia' workshop on April 22, 2024.
Mending Clothing to Support Sustainable Fashion_CIMaR 2024.pdf (Selcen Ozturkcan)
Ozturkcan, S., Berndt, A., & Angelakis, A. (2024). Mending clothing to support sustainable fashion. Presented at the 31st Annual Conference by the Consortium for International Marketing Research (CIMaR), 10-13 Jun 2024, University of Gävle, Sweden.
Immersive Learning That Works: Research Grounding and Paths Forward (Leonel Morgado)
We will metaverse into the essence of immersive learning, into its three dimensions and conceptual models. This approach encompasses elements from teaching methodologies to social involvement, through organizational concerns and technologies. Challenging the perception of learning as knowledge transfer, we introduce a 'Uses, Practices & Strategies' model operationalized by the 'Immersive Learning Brain' and ‘Immersion Cube’ frameworks. This approach offers a comprehensive guide through the intricacies of immersive educational experiences and spotlighting research frontiers, along the immersion dimensions of system, narrative, and agency. Our discourse extends to stakeholders beyond the academic sphere, addressing the interests of technologists, instructional designers, and policymakers. We span various contexts, from formal education to organizational transformation to the new horizon of an AI-pervasive society. This keynote aims to unite the iLRN community in a collaborative journey towards a future where immersive learning research and practice coalesce, paving the way for innovative educational research and practice landscapes.
Anti-Universe And Emergent Gravity and the Dark Universe (Sérgio Sacani)
Recent theoretical progress indicates that spacetime and gravity emerge together from the entanglement structure of an underlying microscopic theory. These ideas are best understood in Anti-de Sitter space, where they rely on the area law for entanglement entropy. The extension to de Sitter space requires taking into account the entropy and temperature associated with the cosmological horizon. Using insights from string theory, black hole physics and quantum information theory we argue that the positive dark energy leads to a thermal volume law contribution to the entropy that overtakes the area law precisely at the cosmological horizon. Due to the competition between area and volume law entanglement the microscopic de Sitter states do not thermalise at sub-Hubble scales: they exhibit memory effects in the form of an entropy displacement caused by matter. The emergent laws of gravity contain an additional ‘dark’ gravitational force describing the ‘elastic’ response due to the entropy displacement. We derive an estimate of the strength of this extra force in terms of the baryonic mass, Newton’s constant and the Hubble acceleration scale a0 = cH0, and provide evidence for the fact that this additional ‘dark gravity force’ explains the observed phenomena in galaxies and clusters currently attributed to dark matter.
The debris of the ‘last major merger’ is dynamically young (Sérgio Sacani)
The Milky Way’s (MW) inner stellar halo contains an [Fe/H]-rich component with highly eccentric orbits, often referred to as the ‘last major merger.’ Hypotheses for the origin of this component include Gaia-Sausage/Enceladus (GSE), where the progenitor collided with the MW proto-disc 8–11 Gyr ago, and the Virgo Radial Merger (VRM), where the progenitor collided with the MW disc within the last 3 Gyr. These two scenarios make different predictions about observable structure in local phase space, because the morphology of debris depends on how long it has had to phase mix. The recently identified phase-space folds in Gaia DR3 have positive caustic velocities, making them fundamentally different than the phase-mixed chevrons found in simulations at late times. Roughly 20 per cent of the stars in the prograde local stellar halo are associated with the observed caustics. Based on a simple phase-mixing model, the observed number of caustics are consistent with a merger that occurred 1–2 Gyr ago. We also compare the observed phase-space distribution to FIRE-2 Latte simulations of GSE-like mergers, using a quantitative measurement of phase mixing (2D causticality). The observed local phase-space distribution best matches the simulated data 1–2 Gyr after collision, and certainly not later than 3 Gyr. This is further evidence that the progenitor of the ‘last major merger’ did not collide with the MW proto-disc at early times, as is thought for the GSE, but instead collided with the MW disc within the last few Gyr, consistent with the body of work surrounding the VRM.
The cost of acquiring information by natural selection (Carl Bergstrom)
This is a short talk that I gave at the Banff International Research Station workshop on Modeling and Theory in Population Biology. The idea is to try to understand how the burden of natural selection relates to the amount of information that selection puts into the genome.
It's based on the first part of this research paper:
The cost of information acquisition by natural selection
Ryan Seamus McGee, Olivia Kosterlitz, Artem Kaznatcheev, Benjamin Kerr, Carl T. Bergstrom
bioRxiv 2022.07.02.498577; doi: https://doi.org/10.1101/2022.07.02.498577
Authoring a personal GPT for your research and practice: How we created the Q... (Leonel Morgado)
Thematic analysis in qualitative research is a time-consuming and systematic task, typically done using teams. Team members must ground their activities on common understandings of the major concepts underlying the thematic analysis, and define criteria for its development. However, conceptual misunderstandings, equivocations, and lack of adherence to criteria are challenges to the quality and speed of this process. Given the distributed and uncertain nature of this process, we wondered if the tasks in thematic analysis could be supported by readily available artificial intelligence chatbots. Our early efforts point to potential benefits: not just saving time in the coding process but better adherence to criteria and grounding, by increasing triangulation between humans and artificial intelligence. This tutorial will provide a description and demonstration of the process we followed, as two academic researchers, to develop a custom ChatGPT to assist with qualitative coding in the thematic data analysis process of immersive learning accounts in a survey of the academic literature: QUAL-E Immersive Learning Thematic Analysis Helper. In the hands-on time, participants will try out QUAL-E and develop their ideas for their own qualitative coding ChatGPT. Participants that have the paid ChatGPT Plus subscription can create a draft of their assistants. The organizers will provide course materials and slide deck that participants will be able to utilize to continue development of their custom GPT. The paid subscription to ChatGPT Plus is not required to participate in this workshop, just for trying out personal GPTs during it.
Describing and Interpreting an Immersive Learning Case with the Immersion Cub... (Leonel Morgado)
Current descriptions of immersive learning cases are often difficult or impossible to compare. This is due to a myriad of different options on what details to include, which aspects are relevant, and on the descriptive approaches employed. Also, these aspects often combine very specific details with more general guidelines or indicate intents and rationales without clarifying their implementation. In this paper we provide a method to describe immersive learning cases that is structured to enable comparisons, yet flexible enough to allow researchers and practitioners to decide which aspects to include. This method leverages a taxonomy that classifies educational aspects at three levels (uses, practices, and strategies) and then utilizes two frameworks, the Immersive Learning Brain and the Immersion Cube, to enable a structured description and interpretation of immersive learning cases. The method is then demonstrated on a published immersive learning case on training for wind turbine maintenance using virtual reality. Applying the method results in a structured artifact, the Immersive Learning Case Sheet, that tags the case with its proximal uses, practices, and strategies, and refines the free text case description to ensure that matching details are included. This contribution is thus a case description method in support of future comparative research of immersive learning cases. We then discuss how the resulting description and interpretation can be leveraged to change immersion learning cases, by enriching them (considering low-effort changes or additions) or innovating (exploring more challenging avenues of transformation). The method holds significant promise to support better-grounded research in immersive learning.
Using Ontology-based Data Summarization to Develop Semantics-aware Recommender Systems. ESWC 2018
1. Using Ontology-based Data Summarization to
Develop Semantics-aware Recommender Systems
Tommaso Di Noia*, Corrado Magarelli*, Andrea Maurino°, Matteo Palmonari°, Anisa Rula°**
*Polytechnic University of Bari
°University of Milano-Bicocca
**SDA, University of Bonn
This project has received funding from the European Union’s
Horizon 2020 research and innovation program under grant
agreements n. 732003 and n. 732590
2. Outline
• Feature Selection for Semantics-aware Recommender Systems
• Ontology-based Data Summarization with ABSTAT
• Feature Selection (ABSTAT vs. Information Gain)
• Experiments
• Conclusions and Future Work
2
3. Outline
• Feature Selection for Semantics-aware Recommender Systems
• Ontology-based Data Summarization with ABSTAT
• Feature Selection (ABSTAT vs. Information Gain)
• Experiments
• Conclusions and Future Work
3
4. Recommender Systems
• Help users in dealing with information/choice overload
• Help to match users with items
4
5. Several Recommender Systems work perfectly well without using any content! (e.g., Amazon)
Collaborative Filtering and Matrix Factorization are state-of-the-art techniques for implementing Recommender Systems (ACM RecSys 2009, by the Netflix Challenge winners)
Why do we need content?
Content can tackle some issues of collaborative filtering
5
7. Why do we need content?
Collaborative Filtering issues: the new item problem
7
8. Why do we need content?
Who knows the «customers who bought…»?
Collaborative Filtering issues: poor explanations!
8
9. Content-based Semantic Recommendations
• Basic item-based KNN recommender system
• Given a user u and a non-rated item i, the rating of i is predicted by combining the ratings the user gave to the neighbors of i (a formula is sketched below), where:
• N(i) = neighbors of the non-rated item i
• r(u) = the items rated by the user u
• r(u,j) = the rating value given to the item j by the user u
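A standard similarity-weighted item-based KNN prediction consistent with these definitions is the following; this is a sketch of the usual formulation, and the exact weighting used in the paper may differ.

\hat{r}(u,i) = \frac{\sum_{j \in N(i) \cap r(u)} \mathrm{sim}(i,j)\, r(u,j)}{\sum_{j \in N(i) \cap r(u)} \mathrm{sim}(i,j)}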
Similarity functions:
• Jaccard
• Graph kernels
• Cosine similarity in a vector space
• … several variants
… all based on subgraphs built using certain properties
9
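As a minimal illustration of the simplest of these similarity functions, the sketch below computes Jaccard similarity over the sets of (property, value) pairs extracted from each item's subgraph; the item names and features are illustrative and not taken from the paper.

def jaccard(features_a, features_b):
    # features_*: sets of (property, value) pairs describing an item
    if not features_a and not features_b:
        return 0.0
    return len(features_a & features_b) / len(features_a | features_b)

# Illustrative DBpedia-style features
pulp_fiction = {("dbo:director", "dbr:Quentin_Tarantino"), ("dbo:starring", "dbr:Uma_Thurman")}
kill_bill = {("dbo:director", "dbr:Quentin_Tarantino"), ("dbo:starring", "dbr:Uma_Thurman"),
             ("dbo:starring", "dbr:David_Carradine")}
print(jaccard(pulp_fiction, kill_bill))  # 2 shared pairs out of 3 distinct pairs -> 0.666...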
11. The Feature Selection Problem
• Features = properties for
similarity evaluation
• Ontological properties?
• Categorical properties?
• Frequent properties?
• Feature selection
• Usually performed manually (ex-post)
• With statistical measures [Musto&al.UMAP2016]
• With ontology-based data summaries (this paper)
• Fully automatic feature selection with ABSTAT profiles
• (Manual pre-processing + frequency-based ranking with ABSTAT profiles + graph
kernel similarity [Ragone&al.SAC2017] )
The curse of dimensionality
11
12. Ontology-based Data Summarization
vs. Statistical Techniques
• Statistical measures
• Download the full dataset
• Compute statistical measures over the full dataset
• Keep only the data of interest
• Run the algorithm
• Profiles (efficiently accessible via web)
• Ask for top-k most useful properties
• e.g., via API
• Download only the relevant data
• Run the algorithm
12
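To make the profile-driven workflow concrete, the rough sketch below asks a summary service for the top-k properties of a type before downloading any data; the endpoint URL, parameters, and response shape are hypothetical placeholders, not the actual ABSTAT API.

import requests

# Hypothetical request: top-k properties for a given type, ranked by frequency
resp = requests.get(
    "https://abstat.example.org/api/properties",  # placeholder URL, not the real ABSTAT endpoint
    params={"dataset": "dbpedia-2015-10", "type": "dbo:Film", "rank": "frequency", "k": 5},
)
top_properties = [entry["property"] for entry in resp.json()]  # assumed response shape

# Only the triples that use these properties are then downloaded and fed to the recommender.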
13. Outline
• Feature Selection for Semantics-aware Recommender Systems
• Ontology-based Data Summarization with ABSTAT
• Feature Selection (ABSTAT vs. Information Gain)
• Experiments
• Conclusions and Future Work
13
14. Ontology-driven Knowledge Graph
Summarization Profiling with ABSTAT
Minimal Type Patterns: there exist entities that
have Company as minimal type, which are
linked to literals that have gYear as minimal type
by the property foundingYear
Occurrence of types and properties
Frequency and instances: how many times this
pattern occurs as minimal type pattern and as a
pattern. Instances count considers pattern
inference
Cardinality descriptors: max/avg/min number of
different subjects associated with a same object
(and vice versa)
For more details: abstat.disco.unimib.it and [ESWC2016-demo, SUMPRE2016, ESWC2018-demo]
14
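For instance, a single triple yields one minimal type pattern; the entity and literal below are illustrative.

# Triple: dbr:Microsoft  dbo:foundingYear  "1975"^^xsd:gYear
# Minimal type of the subject: dbo:Company; type of the literal object: xsd:gYear
pattern = ("dbo:Company", "dbo:foundingYear", "xsd:gYear")
# A profile records, for each such pattern, its frequency as a minimal type pattern
# and its number of instances when pattern inference is taken into account.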
16. ABSTAT: Cardinality Descriptors
Example values for a pattern π: MinS(π) = 1, MaxS(π) = 3, AvgS(π) = 1.4 ≈ 1; MinO(π) = 1, MaxO(π) = 2, AvgO(π) = 1.7 ≈ 2
For each pattern π
• MinS, AvgS, MaxS
• Min/Avg/Max number of distinct subjects associated with a unique object in the triples represented by π
• MinO, AvgO, MaxO
• Min/Avg/Max number of distinct objects associated with a unique subject in the triples represented by π
• Local vs. global
• Local: computed for patterns
• Global: computed for properties, i.e., over all triples with a property
16
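The sketch below shows how these descriptors could be computed for one pattern from its (subject, object) pairs; the pairs are made up so that the results roughly match the example values above.

from collections import defaultdict

# Illustrative (subject, object) pairs for one pattern π (not real DBpedia data)
pairs = [("s1", "o1"), ("s2", "o1"), ("s3", "o1"),
         ("s1", "o2"), ("s2", "o3"), ("s3", "o4"), ("s4", "o5")]

subjects_per_object, objects_per_subject = defaultdict(set), defaultdict(set)
for s, o in pairs:
    subjects_per_object[o].add(s)
    objects_per_subject[s].add(o)

subj_counts = [len(v) for v in subjects_per_object.values()]  # distinct subjects per object
obj_counts = [len(v) for v in objects_per_subject.values()]   # distinct objects per subject

print(min(subj_counts), sum(subj_counts) / len(subj_counts), max(subj_counts))  # MinS, AvgS, MaxS -> 1, 1.4, 3
print(min(obj_counts), sum(obj_counts) / len(obj_counts), max(obj_counts))      # MinO, AvgO, MaxO -> 1, 1.75, 2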
23. ABSTAT: Cardinality Descriptors
Local vs. global cardinality descriptors for the property cinematography (subject descriptors written as [minS, avgS, maxS], object descriptors as [minO, avgO, maxO]):
• Global cardinality descriptors, over all triples with the property: Thing -> Thing via cinematography, subjects [1, 5, 249], objects [1, 1, 13]
• Local cardinality descriptors, for the pattern: Film -> Person via cinematography, subjects [1, 14, 249], objects [1, 1, 7]
23
25. Cardinality Descriptors for Feature Selection
The cardinality descriptors defined above (MinS/AvgS/MaxS and MinO/AvgO/MaxO, local or global), combined with pattern frequency, are the signals used to select features.
25
26. Outline
• Feature Selection for Semantics-aware Recommender Systems
• Ontology-based Data Summarization with ABSTAT
• Feature Selection (ABSTAT vs. Information Gain)
• Experiments
• Conclusions and Future Work
26
27. Feature Selection with ABSTAT
Pipeline from PATTERNS to Properties:
• FILTERING (by a local cardinality descriptor), e.g., avgS > 1
• PROJECTION (property, MAX(value*)), e.g., P, MAX(frequency)
• RANKING (by value*), e.g., DESC(frequency)
• SELECTION (k properties), e.g., k = 2
*value = pattern frequency, a local cardinality descriptor, or a combination of the first two.
27
28. Feature Selection with ABSTAT
A second configuration of the same pipeline:
• FILTERING: none
• PROJECTION: P, MAX(frequency*maxS)
• RANKING: DESC(frequency*maxS)
• SELECTION: k = 5
*value = pattern frequency, a local cardinality descriptor, or a combination of the first two.
28
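A compact sketch of the filter / project / rank / select pipeline described above, run over a toy list of patterns; the field names and numbers are illustrative, not taken from an actual ABSTAT profile.

# Each entry summarises one pattern: the property it uses, its frequency, and local cardinality stats
patterns = [
    {"property": "dbo:director", "frequency": 900, "avgS": 3.0, "maxS": 40},
    {"property": "dbo:starring", "frequency": 2500, "avgS": 5.0, "maxS": 200},
    {"property": "dbo:runtime", "frequency": 3000, "avgS": 1.0, "maxS": 1},
]

def select_features(patterns, k, value=lambda p: p["frequency"], keep=lambda p: True):
    filtered = [p for p in patterns if keep(p)]            # FILTERING (e.g., avgS > 1)
    best = {}                                              # PROJECTION: property -> MAX(value)
    for p in filtered:
        best[p["property"]] = max(best.get(p["property"], 0), value(p))
    ranked = sorted(best, key=best.get, reverse=True)      # RANKING, descending by value
    return ranked[:k]                                      # SELECTION: top-k properties

# AbsFreqAvgS-style configuration: filter avgS > 1, rank by frequency
print(select_features(patterns, k=2, keep=lambda p: p["avgS"] > 1))
# AbsFreq*MaxS-style configuration: no filter, rank by frequency * maxS
print(select_features(patterns, k=2, value=lambda p: p["frequency"] * p["maxS"]))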
29. Feature Selection with IG
• Different statistical measures tested: Information Gain, Information Gain Ratio, Chi-squared test
• Information Gain: expected reduction in entropy obtained when the data are split according to the values of a feature.
• For a feature fi, IG is defined as shown in the formula below,
• where E(I) is the entropy of the data, Iv is the set of items in which the feature fi (e.g., director for movies) has a value equal to v (e.g., F. F. Coppola in the movie domain), and E(Iv) is the entropy computed on the data where the feature fi assumes value v.
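The usual Information Gain formula consistent with these definitions (reconstructed here, since the formula itself did not survive the slide-text extraction):

IG(f_i) = E(I) - \sum_{v \in \mathrm{values}(f_i)} \frac{|I_v|}{|I|}\, E(I_v)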
29
30. Feature Selection with IG: preprocessing
• Manual pre-processing is required
• Reduce redundant or irrelevant features that are expected to bring little value
to the recommendation task, but, at the same time, pose scalability issues
Dataset      | # features before pre-processing | # features after pre-processing
MovieLens    | 148                              | 34
LastFM       | 271                              | 25
LibraryThing | 201                              | 22
30
31. Outline
• Feature Selection for Semantics-aware Recommender Systems
• Ontology-based Data Summarization with ABSTAT
• Feature Selection (ABSTAT vs. Information Gain)
• Experiments
• Conclusions and Future Work
31
32. Experimental Settings:
Recommendation Method
• Content-based, using an item-based nearest neighbors algorithm [Di Noia & al. TIST2016]
• Given the set of entities rated by the user (= user profile),
• predict the rating only for the k nearest neighbors of the rated items
• Jaccard item similarity
Rating prediction on the k items most similar to the items rated by the user
32
34. Experimental Setting: Datasets & Measures
Datasets
• One-to-one mapping between RecSys benchmarks and DBpedia [Di Noia &
al.TIST2016]:
• MovieLens DBpedia (3883 Movies)
• Last.fm DBpedia (17632 Artists)
• LibraryThing DBpedia (37231 Books)
• DBpedia-2015-10, including infoboxes (392M triples)
Metrics
• Accuracy:
• Precision@N: fraction of relevant items in the Top-N recommendations
• MRR@N: average reciprocal rank of the first relevant recommended item
• Diversity:
• catalog coverage: percentage of items in the catalog recommended at least once
• aggregate diversity: aggregate entropy
34
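A small sketch of the two accuracy metrics and of catalog coverage, computed per user and then averaged in the usual way; this is illustrative code, not the paper's evaluation scripts.

def precision_at_n(recommended, relevant, n):
    # Fraction of relevant items among the top-N recommendations
    return sum(1 for item in recommended[:n] if item in relevant) / n

def mrr_at_n(recommended, relevant, n):
    # Reciprocal rank of the first relevant item in the top-N list (0 if none appears)
    for rank, item in enumerate(recommended[:n], start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0

def catalog_coverage(all_recommendation_lists, catalog):
    # Percentage of catalog items recommended at least once across all users
    recommended = set(item for lst in all_recommendation_lists for item in lst)
    return 100.0 * len(recommended & set(catalog)) / len(catalog)

print(precision_at_n(["a", "b", "c", "d"], {"b", "d"}, n=4))  # 0.5
print(mrr_at_n(["a", "b", "c", "d"], {"b", "d"}, n=4))        # 0.5 (first hit at rank 2)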
35. Experimental Setting: Datasets & Measures
• Novelty
• Recommend items in the long tail
• Diversity
• Avoid recommending only items from a small subset of the catalog
• Suggest diverse items in the
recommendation list
• Serendipity
• Suggest unexpected but interesting
items
Is it all about precision?
35
36. Experimental Settings: dbo vs. dbp properties
• Number of features/properties: 5, 20
• Which DBpedia? dbo (DBpedia Ontology) vs. dbp (infobox) properties
• noRep: keep only the best-ranked property between dbo and dbp
• withRep: keep duplicates
• Onlydbp: in case of duplicates, keep only the dbp property
• Onlydbo: in case of duplicates, keep only the dbo property
• IG vs. ABSTAT configurations:
Name: AbsFreqAvgS | Filter by: AvgS > 1 | Ranking: Frequency | Intuition: only properties that map at least two distinct subjects to one object (on average), ranked by frequency
Name: AbsFreq*MaxS | Filter by: no filter | Ranking: Frequency*maxS | Intuition: favors properties that are more frequent and map a higher number of distinct subjects to one object
Name: AbsMaxS | Filter by: no filter | Ranking: MaxS | Intuition: favors properties that map a higher number of distinct subjects to one object
Name: Tf-idf (baseline) | Filter by: no filter | Ranking: Tf-idf over patterns | Intuition: favors properties that are more peculiar to the domain type
36
40. Outline
• Feature Selection for Semantics-aware Recommender Systems
• Ontology-based Data Summarization with ABSTAT
• Feature Selection (ABSTAT vs. Information Gain)
• Experiments
• Conclusions and Future Work
40
41. Conclusions & Future Work
• Conclusions
• Fully automatic feature selection method with ontology-based knowledge
graph summaries (ABSTAT)
• Better or, in some cases, comparable to statistical measures, but without
requiring computation over the full dataset
• Additional evidence of informativeness of ABSTAT-based summaries
• Future work
• Add Tf-idf to the ABSTAT statistics
• Experiments with additional measures (e.g., graph-based measures with paths longer than 1)
• API-based suggestion of most salient properties for an input entity type
41
42. Contacts: palmonari@disco.unimib.it - tommaso.dinoia@poliba.it
This project has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreements n. 732003 and n. 732590
Supporting Event and Weather-based Data Analytics and Marketing along the Shopper Journey: www.ew-shopp.eu
Enabling the European Business Graph for Innovative Data Products and Services: www.eubusinessgraph.eu/
Experiments & code: https://zenodo.org/record/1205712#.WrRCypPwa3U and http://ow.ly/zAA530d0wu0
ABSTAT (open source) code: https://bitbucket.org/disco_unimib/abstat-core
ABSTAT home: abstat.disco.unimib.it
42
43. Appendix: Explanations for Better/Worse Performance
Domain | Type       | # Minimal Patterns | Avg # Triples | Variance
Movies | dbo:Film   | 57757              | 74.02         | 549.31
Books  | dbo:Book   | 41684              | 44.97         | 169.48
Music  | dbo:Artist | 40491              | 80.50         | 981.51
43
44. Appendix: Explanations for Better/Worse Performance
Top 20 selected features for the MovieLens dataset by using the different configurations of IG and AbsFreqAvgS.
44
Editor's Notes
We did not consider only accuracy because in RecSys it is also important to go beyond the popularity bias and show diverse elements across the catalog, as well as items in the long tail
There does not exist one approach that performs better than
DBO: In the first row. Perhaps there is no difference between the
DBP: