Using Ontology-based Data Summarization to
Develop Semantics-aware Recommender Systems
Tommaso Di Noia*, Corrado Magarelli*, Andrea Maurino°, Matteo Palmonari°, Anisa Rula°**
*Polytechnic University of Bari
°University of Milano-Bicocca
**SDA, University of Bonn
This project has received funding from the European Union’s
Horizon 2020 research and innovation program under grant
agreements n. 732003 and n. 732590
Outline
• Feature Selection for Semantics-aware Recommender Systems
• Ontology-based Data Summarization with ABSTAT
• Feature Selection (ABSTAT vs. Information Gain)
• Experiments
• Conclusions and Future Work
Recommender Systems
• Help users deal with information/choice overload
• Help match users with items
Several recommender systems work perfectly well without using any content! (e.g., Amazon)
Collaborative Filtering and Matrix Factorization are state-of-the-art techniques for implementing Recommender Systems
(ACM RecSys 2009, by Netflix Challenge winners)
Why do we need content?
Content can tackle some issues of collaborative filtering:
• Collaborative Filtering issues: sparsity
• Collaborative Filtering issues: new item problem
• Collaborative Filtering issues: poor explanations! Who knows the «customers who bought…»?
Content-based Semantic
Recommendations
• Basic item kNN recommender system
• Given a user u and an item i not rated by u, the rating of i is predicted by:
where:
• N(i) = neighbors of the unrated item i
• r(u) = the items rated by the user u
• r(u,j) = the rating value given to the item j by the user u
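The prediction formula itself did not survive extraction. A common item-kNN form consistent with the definitions above is a similarity-weighted average of the user's ratings over those neighbors of i that the user has rated. Here sim(i, j) stands for whichever similarity function is used, and the normalization by the sum of similarities is an assumption, not the authors' confirmed formula.

```latex
\hat{r}(u,i) = \frac{\sum_{j \in N(i) \cap r(u)} \mathrm{sim}(i,j) \cdot r(u,j)}{\sum_{j \in N(i) \cap r(u)} \mathrm{sim}(i,j)}
```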
Similarity functions:
• Jaccard
• Graph kernels
• Cosine similarity in a vector
space
• … several variants
… all based on subgraphs built
using certain properties
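The similarity functions above compare items only through the values of a chosen set of properties. A minimal sketch, assuming each item's features are the (property, value) pairs of its outgoing triples restricted to the selected properties; the triples, property names and the helper `feature_set` are hypothetical, and binary cosine stands in for the vector-space variant.

```python
import math

def feature_set(triples, item, selected_props):
    """(property, value) pairs of `item`, restricted to the selected properties."""
    return {(p, o) for s, p, o in triples if s == item and p in selected_props}

def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def binary_cosine(a, b):
    """Cosine similarity between the binary (one-hot) vectors of two feature sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

# Hypothetical triples about two films.
triples = [
    ("Film_A", "director", "Coppola"),
    ("Film_A", "starring", "Pacino"),
    ("Film_A", "runtime", "175"),
    ("Film_B", "director", "Coppola"),
    ("Film_B", "starring", "Pacino"),
    ("Film_B", "starring", "Keaton"),
]
selected = {"director", "starring"}  # runtime is not a selected feature
fa = feature_set(triples, "Film_A", selected)
fb = feature_set(triples, "Film_B", selected)
print(jaccard(fa, fb))        # 2 shared values / 3 distinct values
print(binary_cosine(fa, fb))  # 2 / sqrt(2 * 3)
```

Changing `selected` changes every similarity value, which is why the choice of properties (feature selection) matters for the recommender.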
The Feature Selection Problem
• Features = properties for
similarity evaluation
• Ontological properties?
• Categorical properties?
• Frequent properties?
• Feature selection
• Usually performed manually (ex-post)
• With statistical measures [Musto&al.UMAP2016]
• With ontology-based data summaries (this paper)
• Fully automatic feature selection with ABSTAT profiles
• (Manual pre-processing + frequency-based ranking with ABSTAT profiles + graph kernel similarity [Ragone&al.SAC2017])
The curse of dimensionality
Ontology-based Data Summarization
vs. Statistical Techniques
• Statistical measures
• Download the full dataset
• Compute statistical measures over the full dataset
• Keep only the data of interest
• Run the algorithm
• Profiles (efficiently accessible via web)
• Ask for top-k most useful properties
• e.g., via API
• Download only the relevant data
• Run the algorithm
Ontology-driven Knowledge Graph
Summarization Profiling with ABSTAT
Minimal Type Patterns: there exist entities that have Company as minimal type, which are linked to literals that have gYear as minimal type by the property foundingYear
Occurrence of types and properties
Frequency and instances: how many times the pattern occurs as a minimal type pattern (frequency) and as a pattern (instances); the instance count takes pattern inference into account
Cardinality descriptors: max/avg/min number of different subjects associated with the same object (and vice versa)
For more details: abstat.disco.unimib.it and [ESWC2016-demo, SUMPRE2016, ESWC2018-demo]
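To make minimal type patterns and their frequency concrete, here is a toy sketch. It assumes a hand-written subclass hierarchy (`SUBCLASS_OF`) and a dictionary of asserted types, both hypothetical. ABSTAT itself computes minimal types against the full ontology and also reports instance counts with pattern inference, which this sketch omits.

```python
from collections import Counter

# Hypothetical toy fragment of a type hierarchy: child -> parent.
SUBCLASS_OF = {"Company": "Organisation", "Organisation": "Agent", "Agent": "Thing"}

def ancestors(t):
    """All (transitive) superclasses of type t in the toy hierarchy."""
    found = set()
    while t in SUBCLASS_OF:
        t = SUBCLASS_OF[t]
        found.add(t)
    return found

def minimal_types(types):
    """Asserted types that are not a superclass of another asserted type."""
    types = set(types)
    covered = set()
    for t in types:
        covered |= ancestors(t)
    return types - covered

def pattern_frequency(triples, types_of):
    """Count minimal type patterns (minType(s), p, minType(o)).

    types_of maps a subject/object to its asserted types (a literal maps to
    its datatype, e.g. {"gYear"}); untyped nodes default to {"Thing"}.
    Nodes with several minimal types contribute one count per combination.
    """
    counts = Counter()
    for s, p, o in triples:
        for ts in minimal_types(types_of.get(s, {"Thing"})):
            for to in minimal_types(types_of.get(o, {"Thing"})):
                counts[(ts, p, to)] += 1
    return counts

types_of = {
    "Acme": {"Company", "Organisation", "Agent"},
    "Globex": {"Company"},
    '"1990"': {"gYear"},
    '"1985"': {"gYear"},
}
triples = [("Acme", "foundingYear", '"1990"'), ("Globex", "foundingYear", '"1985"')]
print(pattern_frequency(triples, types_of))
```

Both triples collapse onto the single minimal type pattern (Company, foundingYear, gYear), so that pattern's frequency is 2 even though Acme also carries the more general types Organisation and Agent.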
ABSTAT: Cardinality Descriptors
For each pattern π
• MinS, AvgS, MaxS
• Min/Avg/Max number of distinct subjects associated with each object in the triples represented by π
• MinO, AvgO, MaxO
• Min/Avg/Max number of distinct objects associated with each subject in the triples represented by π
• Local vs. global
• Local: for patterns
• Global: for properties, i.e., all triples with a property
Example:
MinS(π) = 1, MaxS(π) = 3, AvgS(π) = 1.4 ≈ 1
MinO(π) = 1, MaxO(π) = 2, AvgO(π) = 1.7 ≈ 2
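A minimal sketch of the six descriptors for one set of triples. The ten toy triples below are made up, but chosen so that the results match the example values above (MinS = 1, MaxS = 3, AvgS ≈ 1.4; MinO = 1, MaxO = 2, AvgO ≈ 1.7). Passing only the triples of one pattern gives the local descriptors; passing all triples with the property gives the global ones.

```python
from collections import defaultdict

def cardinality_descriptors(triples):
    """Min/Avg/Max distinct subjects per object (S) and distinct objects per subject (O).

    Assumes `triples` is non-empty.
    """
    subjects_of = defaultdict(set)  # object  -> distinct subjects linked to it
    objects_of = defaultdict(set)   # subject -> distinct objects linked to it
    for s, _p, o in triples:
        subjects_of[o].add(s)
        objects_of[s].add(o)
    s_counts = [len(v) for v in subjects_of.values()]
    o_counts = [len(v) for v in objects_of.values()]
    return {
        "minS": min(s_counts), "avgS": sum(s_counts) / len(s_counts), "maxS": max(s_counts),
        "minO": min(o_counts), "avgO": sum(o_counts) / len(o_counts), "maxO": max(o_counts),
    }

# Hypothetical triples of one pattern π: 6 subjects, 7 objects, 10 triples.
pattern_triples = [
    ("s1", "p", "o1"), ("s1", "p", "o6"),
    ("s2", "p", "o5"),
    ("s3", "p", "o2"), ("s3", "p", "o6"),
    ("s4", "p", "o3"), ("s4", "p", "o6"),
    ("s5", "p", "o7"),
    ("s6", "p", "o1"), ("s6", "p", "o4"),
]
d = cardinality_descriptors(pattern_triples)
print({k: round(v, 1) for k, v in d.items()})
# {'minS': 1, 'avgS': 1.4, 'maxS': 3, 'minO': 1, 'avgO': 1.7, 'maxO': 2}
```

AvgS is 10 triples over 7 distinct objects and AvgO is 10 triples over 6 distinct subjects, which is where 1.4 and 1.7 come from.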
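Local vs. global cardinality descriptors for the property cinematography:
• Global cardinality descriptors, pattern (Thing, cinematography, Thing): [minS,avgS,maxS] = [1,5,249], [minO,avgO,maxO] = [1,1,13]
• Local cardinality descriptors, pattern (Film, cinematography, Person): [minS,avgS,maxS] = [1,14,249], [minO,avgO,maxO] = [1,1,7]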
Cardinality Descriptors for Feature Selection
• For each pattern π: MinS, AvgS, MaxS and MinO, AvgO, MaxO (local for patterns, global for properties)
• + frequency!
Feature Selection with ABSTAT
• PATTERNS
• FILTERING (local cardinality descriptor): avgS > 1
• RANKING (value*): DESC(frequency)
• PROJECTION (property, MAX(value*)): P, MAX(frequency)
• SELECTION (k properties): k = 2
• Properties
*value: pattern frequency, a local cardinality descriptor, or a combination of the two.
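The pipeline stages can be sketched as follows. This is a minimal illustration, not ABSTAT's implementation: the `Pattern` record, its field names and all numbers are hypothetical, and `score` plays the role of value* (frequency, a local cardinality descriptor, or a combination).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Pattern:
    """Hypothetical record for one pattern and its local statistics."""
    subj_type: str
    prop: str
    obj_type: str
    frequency: int
    avg_s: float
    max_s: int

def select_features(patterns, score, k, keep=lambda p: True):
    """PATTERNS -> FILTERING -> RANKING -> PROJECTION -> SELECTION."""
    kept = [p for p in patterns if keep(p)]          # FILTERING
    ranked = sorted(kept, key=score, reverse=True)   # RANKING: DESC(value)
    best = {}                                        # PROJECTION: (property, MAX(value))
    for p in ranked:
        # Patterns arrive in descending order, so the first one seen for a
        # property carries its maximum value and properties stay sorted.
        best.setdefault(p.prop, score(p))
    return list(best)[:k]                            # SELECTION: top-k properties

patterns = [
    Pattern("Film", "starring", "Person", 1500, 4.1, 120),
    Pattern("Film", "director", "Person", 900, 3.2, 40),
    Pattern("Film", "director", "Agent", 100, 1.5, 10),
    Pattern("Film", "writer", "Person", 700, 2.5, 35),
    Pattern("Film", "title", "langString", 2000, 1.0, 1),
]

# AbsFreqAvgS-style: filter avgS > 1, rank by frequency, k = 2
print(select_features(patterns, score=lambda p: p.frequency, k=2, keep=lambda p: p.avg_s > 1))
# AbsFreq*MaxS-style: no filter, rank by frequency * maxS, k = 3
print(select_features(patterns, score=lambda p: p.frequency * p.max_s, k=3))
```

With the avgS > 1 filter the sketch returns `['starring', 'director']`; without any filter and ranking by frequency alone, the `title` pattern (avgS = 1.0 in this toy data) would rank first.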
Feature Selection with ABSTAT
• PATTERNS
• FILTERING (local cardinality descriptor)
• RANKING (value*): DESC(frequency*maxS)
• PROJECTION (property, MAX(value*)): P, MAX(frequency*maxS)
• SELECTION (k properties): k = 5
• Properties
Feature Selection with IG
• Different statistical measures tested: Information Gain, Information Gain Ratio, Chi-squared test
• Information Gain: expected reduction in entropy occurring when a feature is present versus when it is absent.
• For a feature fi, IG is defined as
IG(fi) = E(I) − Σv (|Iv| / |I|) · E(Iv)
• where E(I) is the entropy of the data, Iv is the set of items in which the feature fi (e.g., director for movies) has a value equal to v (e.g., F.F. Coppola in the movie domain), and E(Iv) is the entropy computed on the data where the feature fi assumes value v.
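A minimal sketch of the Information Gain computation. The slide does not say which class labels the entropy is computed over, so the sketch assumes each item carries a hypothetical label (here "liked" / "not_liked"); it also assumes single-valued features, whereas DBpedia properties are often multi-valued.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(items, feature):
    """IG(f) = E(I) - sum_v |I_v|/|I| * E(I_v) for a single-valued feature.

    items: list of (features, label) pairs, where features maps a property
    to one value; items missing the feature fall into a None bucket.
    """
    labels = [label for _, label in items]
    partitions = defaultdict(list)  # value v -> labels of the items in I_v
    for feats, label in items:
        partitions[feats.get(feature)].append(label)
    n = len(items)
    remainder = sum(len(part) / n * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder

# Hypothetical labelled items.
items = [
    ({"director": "Coppola", "country": "US"}, "liked"),
    ({"director": "Coppola", "country": "US"}, "liked"),
    ({"director": "Scorsese", "country": "US"}, "not_liked"),
    ({"director": "Scorsese", "country": "US"}, "not_liked"),
]
print(information_gain(items, "director"))  # 1.0: director separates the labels
print(information_gain(items, "country"))   # 0.0: country carries no information
```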
Feature Selection with IG: preprocessing
• Manual pre-processing is required
• Remove redundant or irrelevant features that are expected to bring little value to the recommendation task but, at the same time, pose scalability issues
Dataset # of features before pre-processing # of features after pre-processing
MovieLens 148 34
LastFM 271 25
LibraryThing 201 22
Experimental Settings:
Recommendation Method
• Content-based using an item-based nearest neighbors
algorithm [Di Noia & al.TIST2016]
• Given a set of entities rated by the user (= user profile),
• predict the rating only for the k nearest neighbors of the rated items
• Jaccard item similarity
Rating prediction on the k items most similar to the items rated by the user
Experimental Settings:
Recommendation Method
• Jaccard similarity
• Measures how many values of the selected features two items share, relative to all the distinct values of those features in the two items
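The recommendation method above can be sketched as follows: for each item in the user profile, take its k most similar unrated items by Jaccard similarity over the selected features, then score each candidate with a similarity-weighted average of the profile ratings. The weighted-average form, the function name `predict` and the toy data are assumptions for illustration, not the exact implementation of [Di Noia & al.TIST2016].

```python
def jaccard(a, b):
    """Shared feature values divided by all distinct feature values."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def predict(user_ratings, features, k):
    """Predict ratings for the k nearest neighbours of each rated item.

    user_ratings: item -> rating given by the user (the user profile)
    features:     item -> set of selected (property, value) pairs
    Returns item -> predicted rating (similarity-weighted average).
    """
    num, den = {}, {}
    for j, r_uj in user_ratings.items():
        sims = {i: jaccard(features[i], features[j])
                for i in features if i != j and i not in user_ratings}
        neighbours = sorted(sims, key=sims.get, reverse=True)[:k]
        for i in neighbours:
            if sims[i] > 0:
                num[i] = num.get(i, 0.0) + sims[i] * r_uj
                den[i] = den.get(i, 0.0) + sims[i]
    return {i: num[i] / den[i] for i in num}

# Hypothetical items described by their selected features.
features = {
    "A": {("director", "Coppola"), ("starring", "Pacino")},
    "B": {("director", "Coppola"), ("starring", "Keaton")},
    "C": {("director", "Scorsese"), ("starring", "De Niro")},
    "D": {("director", "Scorsese"), ("starring", "Keitel")},
}
print(predict({"A": 5, "C": 2}, features, k=1))
```

B inherits the rating of A (shared director) and D that of C, since each is the single nearest unrated neighbour of one profile item.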
Experimental Setting: Datasets & Measures
Datasets
• One-to-one mapping between RecSys benchmarks and DBpedia [Di Noia & al.TIST2016]:
• MovieLens ↔ DBpedia (3883 movies)
• Last.fm ↔ DBpedia (17632 artists)
• The Library Thing ↔ DBpedia (37231 books)
• DBpedia-2015-10, including infoboxes (392M triples)
Metrics
• Accuracy:
• Precision@N: fraction of relevant items in the Top-N recommendations
• MRR@N: average reciprocal rank of the first relevant recommended item
• Diversity:
• catalog coverage: percentage of items in the catalog recommended at least once
• aggregate diversity: aggregate entropy
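The four metrics can be sketched as below. The sketch assumes each user's recommendations are a best-first list and that relevant items are given as a set per user (how relevance is derived from test ratings is outside the sketch); base-2 logarithms for aggregate entropy are also an assumption. The data are hypothetical.

```python
import math
from collections import Counter

def precision_at_n(recs, relevant, n):
    """Mean over users of the fraction of relevant items in the top-n list."""
    return sum(len(set(r[:n]) & relevant.get(u, set())) / n
               for u, r in recs.items()) / len(recs)

def mrr_at_n(recs, relevant, n):
    """Mean over users of 1/rank of the first relevant item in the top-n (0 if none)."""
    total = 0.0
    for u, r in recs.items():
        for rank, item in enumerate(r[:n], start=1):
            if item in relevant.get(u, set()):
                total += 1.0 / rank
                break
    return total / len(recs)

def catalog_coverage(recs, catalog_size, n):
    """Fraction of the catalog recommended at least once in some top-n list."""
    return len({item for r in recs.values() for item in r[:n]}) / catalog_size

def aggregate_entropy(recs, n):
    """Shannon entropy (base 2) of how often each item appears across all top-n lists."""
    counts = Counter(item for r in recs.values() for item in r[:n])
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

recs = {"u1": ["a", "b", "c"], "u2": ["b", "d", "e"]}
relevant = {"u1": {"b"}, "u2": {"b", "e"}}
print(precision_at_n(recs, relevant, 3), mrr_at_n(recs, relevant, 3))
print(catalog_coverage(recs, 10, 3), aggregate_entropy(recs, 3))
```

Higher aggregate entropy means recommendations are spread more evenly over the catalog instead of concentrating on a few popular items.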
Experimental Setting: Datasets & Measures
• Novelty
• Recommend items in the long tail
• Diversity
• Avoid recommending only items in a small subset of the catalog
• Suggest diverse items in the recommendation list
• Serendipity
• Suggest unexpected but interesting items
Is it all about precision?
Experimental Settings: dbo vs. dbp properties
• Number of features/properties: 5, 20
• Which DBpedia properties? dbo (DBpedia Ontology) vs. dbp (infobox) properties
• Handling of duplicates (the same property available as both dbo and dbp):
• noRep: keep only the best-ranked property between dbo and dbp
• withRep: keep duplicates
• onlydbp: in case of duplicates, keep only the dbp property
• onlydbo: in case of duplicates, keep only the dbo property
• IG vs. ABSTAT configurations:
• AbsFreqAvgS: filter AvgS > 1, ranking by frequency. Only properties that map more than one distinct subject to one object (on average), ranked by frequency
• AbsFreq*MaxS: no filter, ranking by frequency*maxS. Favors properties that are more frequent and map a higher number of distinct subjects to one object
• AbsMaxS: no filter, ranking by MaxS. Favors properties that map a higher number of distinct subjects to one object
• Tf-idf (baseline): no filter, ranking by tf-idf over patterns. Favors properties that are more peculiar to the domain type
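The four duplicate-handling strategies can be sketched as a filter over a best-first ranked list of prefixed property names. The sketch assumes that two properties are duplicates when they share the local name after the prefix (e.g., dbo:director and dbp:director); that matching rule, the function name and the example ranking are hypothetical.

```python
from collections import Counter

def resolve_duplicates(ranked, strategy):
    """Apply withRep / noRep / onlydbo / onlydbp to a best-first list like ['dbo:director', ...]."""
    local = [p.split(":", 1)[1] for p in ranked]
    occurrences = Counter(local)
    kept, seen = [], set()
    for prop, name in zip(ranked, local):
        prefix = prop.split(":", 1)[0]
        if strategy == "withRep" or occurrences[name] == 1:
            kept.append(prop)          # no duplicate, or duplicates allowed
        elif strategy == "noRep" and name not in seen:
            kept.append(prop)          # the best-ranked of the pair wins
        elif strategy == "onlydbo" and prefix == "dbo":
            kept.append(prop)
        elif strategy == "onlydbp" and prefix == "dbp":
            kept.append(prop)
        seen.add(name)
    return kept

ranked = ["dbp:starring", "dbo:director", "dbo:starring", "dbp:director", "dbo:genre"]
for strategy in ("withRep", "noRep", "onlydbo", "onlydbp"):
    print(strategy, resolve_duplicates(ranked, strategy))
```

Taking the top 5 or top 20 of the resolved list would then give the feature sets compared in the result tables.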
MovieLens
• ABSTAT-based FS almost always “statistically better” than statistical measures
• Few “local” exceptions
• Different configurations optimize different measures
• AbsOccAvgS vs. AbsMaxS
• Tfidf ABSTAT baseline good for aggregate entropy
Precision@10 MRR@10 catalogCoverage@10 aggrEntropy@10
Top K features 5 20 5 20 5 20 5 20
withrep.IG .0658 .1078 .2192 .3417 .3829 .5280 7.56 8.50
withrep.AbsFreqAvgS .1059 .1081 .3380 .3477 .5398 .5253 8.70 8.53
withrep.AbsFreq*MaxS .0967 .1074 .3274 .3541 .5962 .5247 8.87 8.54
withrep.AbsMaxS .0919 .1030 .3065 .3400 .6016 .5698 8.96 8.66
withrep.TfIdf .0565 .0851 .2267 .3326 .4347 .3360 8.36 7.80
norep.IG .0841 .1076 .2961 .3390 .3372 .5226 7.94 8.44
norep.AbsFreqAvgS .1066 .1076 .3388 .3400 .5344 .5208 8.68 8.45
norep.AbsMaxS .0885 .1063 .3075 .3467 .6234 .5550 8.99 8.60
norep.TfIdf .0823 .0856 .2994 .3123 .3520 .3908 7.83 7.99
dbo.IG .0841 .1076 .2961 .3390 .3372 .5226 7.94 8.44
dbo.AbsFreqAvgS .1066 .1067 .3388 .3402 .5344 .5208 8.68 8.51
dbo.AbsMaxS .0885 .1059 .3075 .3464 .6234 .5535 8.99 8.60
dbo.TfIdf .0823 .0856 .2994 .3123 .3520 .3908 7.83 7.99
dbp.IG .0688 .1046 .2134 .3336 .2799 .5065 6.54 8.31
dbp.AbsFreqAvgS .1065 .1059 .3408 .3360 .5426 .5105 8.64 8.31
dbp.AbsMaxS .0908 .1030 .3124 .3396 .6219 .5395 8.98 8.52
dbp.TfIdf .0549 .0745 .1924 .2687 .2530 .3575 6.33 7.41
value = “local” best value = “global” best = highlights on global best results
LastFM
• ABSTAT-based FS better or comparable
• In most cases, not “statistically better”
• Reminder: ABSTAT-based FS still has the advantage of
• not running statistical measures on the full dataset
• no manual preprocessing
• Tfidf ABSTAT baseline good for aggregate entropy
Precision@10 MRR@10 catalogCoverage@10 aggrEntropy@10
Top K features 5 20 5 20 5 20 5 20
withrep.IG .0501 .1325 .2283 .4102 0.4290 0.5051 11 11.18
withrep.AbsFreqAvgS .1330 .1320 .4047 .4105 0.4812 0.5036 11.1 11.18
withrep.AbsFreq*MaxS .1102 .1227 .3649 .3749 0.5500 0.5332 11.4 11.36
withrep.AbsMaxS .0371 .1156 .1249 .3691 0.1680 0.5440 9.79 11.39
withrep.TfIdf .1017 .1158 .2960 .3584 0.4210 0.4602 10.86 10.97
norep.IG .0501 .1311 .2283 .4040 0.429 0.5018 11 11.17
norep.AbsFreqAvgS .1305 .1307 .3994 .4074 0.489 0.5019 11.11 11.15
norep.AbsFreq*MaxS .1062 .1228 .3546 .3708 0.5362 0.5161 11.4 11.29
norep.AbsMaxS .0392 .1227 .1952 .3715 0.452 0.5344 11.09 11.34
norep.TfIdf .1024 .1132 .3064 .3554 0.4026 0.4508 10.76 10.96
dbo.IG .0411 .1319 .1989 .4083 0.4425 0.5053 11.06 11.2
dbo.AbsFreqAvgS .1283 .1292 .3986 .4063 0.4915 0.4949 11.14 11.14
dbo.AbsFreq*MaxS .1062 .1214 .3546 .3710 0.5362 0.5109 11.4 11.27
dbo.AbsMaxS .0381 .1211 .1927 .3727 0.4291 0.5222 10.97 11.31
dbo.TfIdf .1024 .1132 .3064 .3554 0.4026 0.4508 10.76 10.96
dbp.IG .0678 .1319 .2553 .4083 0.4364 0.5053 10.83 11.2
dbp.AbsFreqAvgS .1319 .1316 .4026 .4113 0.4926 0.5055 11.14 11.2
dbp.AbsFreq*MaxS .1065 .1239 .3580 .3773 0.5444 0.527 11.42 11.36
dbp.AbsMaxS .0401 .1105 .1969 .3553 0.4528 0.5447 11.08 11.42
dbp.TfIdf .079 .1170 .2371 .3572 0.3894 0.4698 10.69 11.04
value = “local” best value = “global” best = highlights on global best results
The Library Thing
• ABSTAT-based FS obtains the global best results, but
• IG is better for catalog coverage with 20 features
• IG is better for some duplicate-property management strategies
• Differences are statistically significant only in some cases
• Tfidf ABSTAT baseline good for aggregate entropy
Precision@10 MRR@10 catalogCoverage@10 aggrEntropy@10
Top K features 5 20 5 20 5 20 5 20
withrep.IG .0576 .0588 .2348 .2273 .3983 .4034 10.47 10.44
withrep.AbsOccAvgS .0458 .0568 .2003 .2343 .3670 .4014 10.28 10.50
withrep.AbsOcc*MaxS .0457 .0560 .2116 .2355 .3854 .3826 10.54 10.21
withrep.AbsMaxS .0571 .0567 .2319 .2360 .3689 .4011 10.24 10.29
withrep.TfIdf .0215 .0145 .1607 .1202 .1314 .2349 8.81 9.75
norep.IG .0571 .0579 .2346 .2274 .3988 .4037 10.47 10.44
norep.AbsOccAvgS .0561 .0593 .2328 .2329 .3982 .4030 10.54 10.48
norep.AbsOcc*MaxS .0459 .0570 .2119 .2372 .3852 .3809 10.54 10.15
norep.AbsMaxS .0541 .0567 .2301 .2365 .3653 .4008 10.24 10.29
norep.TfIdf .0215 .0138 .1608 .1211 .1314 .2877 8.81 10.26
dbo.IG .0571 .0579 .2346 .2274 .3988 .4037 10.47 10.44
dbo.AbsOccAvgS .0561 .0593 .2328 .2329 .3982 .4030 10.54 10.48
dbo.AbsOcc*MaxS .0459 .0570 .2119 .2372 .3852 .3809 10.54 10.15
dbo.AbsMaxS .0541 .0567 .2301 .2365 .3653 .4008 10.24 10.29
dbo.TfIdf .0579 .0605 .2374 .2477 .4086 .3991 10.55 10.20
dbp.IG .0586 .0586 .2350 .2299 .4027 .4043 10.49 10.4
dbp.AbsOccAvgS .0623 .0612 .2467 .2342 .3943 .4043 10.42 10.45
dbp.AbsOcc*MaxS .0464 .0606 .2126 .2504 .3862 .3797 10.53 10.07
dbp.AbsMaxS .0571 .0592 .2318 .2398 .3689 .4002 10.24 10.22
dbp.TfIdf .0215 .0132 .1608 .1218 .1314 .2696 8.81 9.96
value = “local” best value = “global” best = highlights on global best results
Conclusions & Future Work
• Conclusions
• Fully automatic feature selection method with ontology-based knowledge
graph summaries (ABSTAT)
• Better or, in some cases, comparable to statistical measures, but without
requiring computation over the full dataset
• Additional evidence of informativeness of ABSTAT-based summaries
• Future work
• Add Tfidf in ABSTAT stats
• Experiments with additional measures (e.g., graph-based measures with paths longer than 1)
• API-based suggestion of the most salient properties for an input entity type
Contacts: palmonari@disco.unimib.it - tommaso.dinoia@poliba.it
Supporting Event and Weather-based Data Analytics and Marketing along the Shopper Journey
www.ew-shopp.eu
Enabling the European Business Graph for
Innovative Data Products and Services
www.eubusinessgraph.eu/
Experiments & code:
https://zenodo.org/record/1205712#.WrRCypPwa3U
http://ow.ly/zAA530d0wu0
ABSTAT (open source) code:
https://bitbucket.org/disco_unimib/abstat-core
ABSTAT home:
abstat.disco.unimib.it
Appendix: Explanations for Better/Worse Performance
Domain Type # Minimal Patterns Avg # Triples Variance
Movies dbo:Film 57757 74.02 549.31
Books dbo:Book 41684 44.97 169.48
Music dbo:Artist 40491 80.50 981.51
Appendix: Explanations for Better/Worse Performance
Top 20 selected features for the MovieLens dataset by using the different configurations of IG and AbsFreqAvgS.

More Related Content

Similar to Using Ontology-based Data Summarization to Develop Semantics-aware Recommender Systems. ESWC 2018

Data structures and algorithms
Data structures and algorithmsData structures and algorithms
Data structures and algorithms
Julie Iskander
 
Spatial Data Mining : Seminar
Spatial Data Mining : SeminarSpatial Data Mining : Seminar
Spatial Data Mining : Seminar
Ipsit Dash
 
A Visual Exploration of Distance, Documents, and Distributions
A Visual Exploration of Distance, Documents, and DistributionsA Visual Exploration of Distance, Documents, and Distributions
A Visual Exploration of Distance, Documents, and Distributions
Rebecca Bilbro
 
Words in Space - Rebecca Bilbro
Words in Space - Rebecca BilbroWords in Space - Rebecca Bilbro
Words in Space - Rebecca Bilbro
PyData
 
Algorithmic techniques-for-big-data-analysis
Algorithmic techniques-for-big-data-analysisAlgorithmic techniques-for-big-data-analysis
Algorithmic techniques-for-big-data-analysis
Atner Yegorov
 
Algorithmic techniques-for-big-data-analysis
Algorithmic techniques-for-big-data-analysisAlgorithmic techniques-for-big-data-analysis
Algorithmic techniques-for-big-data-analysis
Hiye Biniam
 
An Answer Set Programming based framework for High-Utility Pattern Mining ext...
An Answer Set Programming based framework for High-Utility Pattern Mining ext...An Answer Set Programming based framework for High-Utility Pattern Mining ext...
An Answer Set Programming based framework for High-Utility Pattern Mining ext...
Francesco Cauteruccio
 
Machine Learning Comparative Analysis - Part 1
Machine Learning Comparative Analysis - Part 1Machine Learning Comparative Analysis - Part 1
Machine Learning Comparative Analysis - Part 1
Kaniska Mandal
 
Applications
ApplicationsApplications
Applications
Edward Blurock
 
Mapreduce Algorithms
Mapreduce AlgorithmsMapreduce Algorithms
Mapreduce Algorithms
Amund Tveit
 
Machine learning and linear regression programming
Machine learning and linear regression programmingMachine learning and linear regression programming
Machine learning and linear regression programming
Soumya Mukherjee
 
04-Data-Analysis-Overview.pptx
04-Data-Analysis-Overview.pptx04-Data-Analysis-Overview.pptx
04-Data-Analysis-Overview.pptx
Shree Shree
 
Spatial co location pattern mining
Spatial co location pattern miningSpatial co location pattern mining
Spatial co location pattern mining
Seung Kwan Kim
 
Graph Models for Deep Learning
Graph Models for Deep LearningGraph Models for Deep Learning
Graph Models for Deep Learning
Experfy
 
University of Manchester Symposium 2012: Extraction and Representation of in ...
University of Manchester Symposium 2012: Extraction and Representation of in ...University of Manchester Symposium 2012: Extraction and Representation of in ...
University of Manchester Symposium 2012: Extraction and Representation of in ...
geraintduck
 
Foundations of Machine Learning - StampedeCon AI Summit 2017
Foundations of Machine Learning - StampedeCon AI Summit 2017Foundations of Machine Learning - StampedeCon AI Summit 2017
Foundations of Machine Learning - StampedeCon AI Summit 2017
StampedeCon
 
002.decision trees
002.decision trees002.decision trees
002.decision trees
hoangminhdong
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
Jason Rodrigues
 
2015 10-08 - additive manufacturing software 1
2015 10-08 - additive manufacturing software  12015 10-08 - additive manufacturing software  1
2015 10-08 - additive manufacturing software 1
Biofabrication Group at University of Pisa
 
How to Effectively Combine Numerical Features and Categorical Features
How to Effectively Combine Numerical Features and Categorical FeaturesHow to Effectively Combine Numerical Features and Categorical Features
How to Effectively Combine Numerical Features and Categorical Features
Domino Data Lab
 

Similar to Using Ontology-based Data Summarization to Develop Semantics-aware Recommender Systems. ESWC 2018 (20)

Data structures and algorithms
Data structures and algorithmsData structures and algorithms
Data structures and algorithms
 
Spatial Data Mining : Seminar
Spatial Data Mining : SeminarSpatial Data Mining : Seminar
Spatial Data Mining : Seminar
 
A Visual Exploration of Distance, Documents, and Distributions
A Visual Exploration of Distance, Documents, and DistributionsA Visual Exploration of Distance, Documents, and Distributions
A Visual Exploration of Distance, Documents, and Distributions
 
Words in Space - Rebecca Bilbro
Words in Space - Rebecca BilbroWords in Space - Rebecca Bilbro
Words in Space - Rebecca Bilbro
 
Algorithmic techniques-for-big-data-analysis
Algorithmic techniques-for-big-data-analysisAlgorithmic techniques-for-big-data-analysis
Algorithmic techniques-for-big-data-analysis
 
Algorithmic techniques-for-big-data-analysis
Algorithmic techniques-for-big-data-analysisAlgorithmic techniques-for-big-data-analysis
Algorithmic techniques-for-big-data-analysis
 
An Answer Set Programming based framework for High-Utility Pattern Mining ext...
An Answer Set Programming based framework for High-Utility Pattern Mining ext...An Answer Set Programming based framework for High-Utility Pattern Mining ext...
An Answer Set Programming based framework for High-Utility Pattern Mining ext...
 
Machine Learning Comparative Analysis - Part 1
Machine Learning Comparative Analysis - Part 1Machine Learning Comparative Analysis - Part 1
Machine Learning Comparative Analysis - Part 1
 
Applications
ApplicationsApplications
Applications
 
Mapreduce Algorithms
Mapreduce AlgorithmsMapreduce Algorithms
Mapreduce Algorithms
 
Machine learning and linear regression programming
Machine learning and linear regression programmingMachine learning and linear regression programming
Machine learning and linear regression programming
 
04-Data-Analysis-Overview.pptx
04-Data-Analysis-Overview.pptx04-Data-Analysis-Overview.pptx
04-Data-Analysis-Overview.pptx
 
Spatial co location pattern mining
Spatial co location pattern miningSpatial co location pattern mining
Spatial co location pattern mining
 
Graph Models for Deep Learning
Graph Models for Deep LearningGraph Models for Deep Learning
Graph Models for Deep Learning
 
University of Manchester Symposium 2012: Extraction and Representation of in ...
University of Manchester Symposium 2012: Extraction and Representation of in ...University of Manchester Symposium 2012: Extraction and Representation of in ...
University of Manchester Symposium 2012: Extraction and Representation of in ...
 
Foundations of Machine Learning - StampedeCon AI Summit 2017
Foundations of Machine Learning - StampedeCon AI Summit 2017Foundations of Machine Learning - StampedeCon AI Summit 2017
Foundations of Machine Learning - StampedeCon AI Summit 2017
 
002.decision trees
002.decision trees002.decision trees
002.decision trees
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
2015 10-08 - additive manufacturing software 1
2015 10-08 - additive manufacturing software  12015 10-08 - additive manufacturing software  1
2015 10-08 - additive manufacturing software 1
 
How to Effectively Combine Numerical Features and Categorical Features
How to Effectively Combine Numerical Features and Categorical FeaturesHow to Effectively Combine Numerical Features and Categorical Features
How to Effectively Combine Numerical Features and Categorical Features
 

More from Università degli Studi di Milano-Bicocca

Semantic Data Enrichment: a Human-in-the-Loop Perspective
Semantic Data Enrichment: a Human-in-the-Loop PerspectiveSemantic Data Enrichment: a Human-in-the-Loop Perspective
Semantic Data Enrichment: a Human-in-the-Loop Perspective
Università degli Studi di Milano-Bicocca
 
DaCENA Personalized Exploration of Knowledge Graphs Within a Context. Seminar...
DaCENA Personalized Exploration of Knowledge Graphs Within a Context. Seminar...DaCENA Personalized Exploration of Knowledge Graphs Within a Context. Seminar...
DaCENA Personalized Exploration of Knowledge Graphs Within a Context. Seminar...
Università degli Studi di Milano-Bicocca
 
EW-Shopp: Interoperability Challenges and Solutions
EW-Shopp: Interoperability Challenges and SolutionsEW-Shopp: Interoperability Challenges and Solutions
EW-Shopp: Interoperability Challenges and Solutions
Università degli Studi di Milano-Bicocca
 
EW-Shopp: Supporting Event and Weather-based Data Analytics and Marketing alo...
EW-Shopp: Supporting Event and Weather-basedData Analytics and Marketing alo...EW-Shopp: Supporting Event and Weather-basedData Analytics and Marketing alo...
EW-Shopp: Supporting Event and Weather-based Data Analytics and Marketing alo...
Università degli Studi di Milano-Bicocca
 
Research Challenges in Artificial Intelligence: Tackling the Complexity of H...
Research Challenges in Artificial Intelligence: Tackling the Complexity of H...Research Challenges in Artificial Intelligence: Tackling the Complexity of H...
Research Challenges in Artificial Intelligence: Tackling the Complexity of H...
Università degli Studi di Milano-Bicocca
 
Facet Annotation Using Reference Knowledge Bases - The Web Conference 2018 (R...
Facet Annotation Using Reference Knowledge Bases - The Web Conference 2018 (R...Facet Annotation Using Reference Knowledge Bases - The Web Conference 2018 (R...
Facet Annotation Using Reference Knowledge Bases - The Web Conference 2018 (R...
Università degli Studi di Milano-Bicocca
 
Pay-as-you-go Multi-User Feedback Model for Ontology Matching - EKAW2014
Pay-as-you-go Multi-User Feedback Model for Ontology Matching - EKAW2014Pay-as-you-go Multi-User Feedback Model for Ontology Matching - EKAW2014
Pay-as-you-go Multi-User Feedback Model for Ontology Matching - EKAW2014
Università degli Studi di Milano-Bicocca
 
Information Quality in the Web Era
Information Quality in the Web EraInformation Quality in the Web Era
Information Quality in the Web Era
Università degli Studi di Milano-Bicocca
 



Using Ontology-based Data Summarization to Develop Semantics-aware Recommender Systems. ESWC 2018

  • 1. Using Ontology-based Data Summarization to Develop Semantics-aware Recommender Systems Tommaso Di Noia*, Corrado Magarelli*, Andrea Maurino°, Matteo Palmonari°, Anisa Rula°** *Polytechnic University of Bari °University of Milano-Bicocca **SDA, University of Bonn This project has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreements n. 732003 and n. 732590
  • 2. Outline • Feature Selection for Semantics-aware Recommender Systems • Ontology-based Data Summarization with ABSTAT • Feature Selection (ABSTAT vs. Information Gain) • Experiments • Conclusions and Future Work 2
  • 3. Outline • Feature Selection for Semantics-aware Recommender Systems • Ontology-based Data Summarization with ABSTAT • Feature Selection (ABSTAT vs. Information Gain) • Experiments • Conclusions and Future Work 3
  • 4. Recommender Systems • Help users in dealing with information/choice overload • Help to match users with items 4
  • 5. Several Recommender Systems work perfectly without using any content! (e.g., Amazon) Collaborative Filtering and Matrix Factorization are state-of-the-art techniques for implementing Recommender Systems (ACM RecSys 2009, by the Netflix Challenge winners) Why do we need content? Content can tackle some issues of collaborative filtering 5
  • 6. Collaborative Filtering issues: sparsity Why do we need content? 6
  • 7. Why do we need content? Collaborative Filtering issues: new item problem 7
  • 8. Why do we need content? Who knows the «customers who bought…»? Collaborative Filtering issues: poor explanations! 8
  • 9. Content-based Semantic Recommendations • Basic item KNN recommender system • Given a user u and an item i not yet rated by u, the rating of i is predicted by: $\hat{r}(u,i) = \frac{\sum_{j \in N(i) \cap r(u)} sim(i,j) \cdot r(u,j)}{\sum_{j \in N(i) \cap r(u)} sim(i,j)}$ where: • N(i) = neighbors of the non-rated item i • r(u) = the items rated by the user u • r(u,j) = the rating value given to the item j by the user u Similarity functions: • Jaccard • Graph kernels • Cosine similarity in a vector space • … several variants … all based on subgraphs built using certain properties 9
  • 10. Content-based Semantic Recommendations Similarity functions: • Jaccard • Graph kernels • Cosine similarity in a vector space • … several variants … all based on subgraphs built using certain properties 10
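The prediction rule can be sketched in Python as follows. This is a minimal illustration that assumes the standard item-KNN weighted average. The prediction is the sum of sim(i, j)·r(u, j) over j in N(i) ∩ r(u), divided by the sum of sim(i, j) over the same items. Ratings are assumed to sit in a plain dict, and the similarity is passed in as a callable. The function name and the toy data are hypothetical and are not taken from the authors' code.

```python
from typing import Callable, Dict, Hashable, Iterable

Item = Hashable


def predict_rating(
    item: Item,
    neighbors: Iterable[Item],
    user_ratings: Dict[Item, float],
    sim: Callable[[Item, Item], float],
) -> float:
    """Item-KNN prediction of r(u, i) as a similarity-weighted average.

    item         -- the non-rated item i
    neighbors    -- N(i), the k nearest neighbours of i
    user_ratings -- r(u): maps each item j rated by u to r(u, j)
    sim          -- item-item similarity (e.g. Jaccard on selected features)
    """
    # Only neighbours of i that the user has actually rated contribute.
    rated_neighbors = [j for j in neighbors if j in user_ratings]
    numerator = sum(sim(item, j) * user_ratings[j] for j in rated_neighbors)
    denominator = sum(sim(item, j) for j in rated_neighbors)
    # No rated neighbour (or zero total similarity): no evidence for a score.
    return numerator / denominator if denominator > 0 else 0.0


# Hypothetical toy data: similarities of item "i" to three other items.
similarities = {("i", "a"): 0.8, ("i", "b"): 0.2, ("i", "d"): 0.9}
score = predict_rating(
    "i",
    neighbors=["a", "b", "d"],
    user_ratings={"a": 5.0, "b": 1.0, "c": 3.0},
    sim=lambda x, y: similarities.get((x, y), 0.0),
)
print(round(score, 2))  # 4.2 = (0.8*5 + 0.2*1) / (0.8 + 0.2); "d" is not rated by u
```

Returning 0.0 when no neighbour has been rated is one possible convention. A real system might instead fall back to the user's mean rating.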
  • 11. The Feature Selection Problem • Features = properties used for similarity evaluation • Ontological properties? • Categorical properties? • Frequent properties? • Feature selection • Usually performed manually (ex-post) • With statistical measures [Musto et al., UMAP 2016] • With ontology-based data summaries (this paper) • Fully automatic feature selection with ABSTAT profiles • (Manual pre-processing + frequency-based ranking with ABSTAT profiles + graph kernel similarity [Ragone et al., SAC 2017]) The curse of dimensionality 11
  • 12. Ontology-based Data Summarization vs. Statistical Techniques • Statistical measures • Download the full dataset • Compute statistical measures over the full dataset • Keep only the data of interest • Run the algorithm • Profiles (efficiently accessible via web) • Ask for top-k most useful properties • e.g., via API • Download only the relevant data • Run the algorithm 12
  • 13. Outline • Feature Selection for Semantics-aware Recommender Systems • Ontology-based Data Summarization with ABSTAT • Feature Selection (ABSTAT vs. Information Gain) • Experiments • Conclusions and Future Work 13
  • 14. Ontology-driven Knowledge Graph Summarization Profiling with ABSTAT Minimal Type Patterns: there exist entities that have Company as minimal type, which are linked by the property foundingYear to literals that have gYear as minimal type Occurrence of types and properties Frequency and instances: how many times this pattern occurs as a minimal type pattern and as a pattern; the instance count takes pattern inference into account Cardinality descriptors: max/avg/min number of different subjects associated with the same object (and vice versa) For more details: abstat.disco.unimib.it and [ESWC2016-demo, SUMPRE2016, ESWC2018-demo] 14
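The sketch below makes the notion of a minimal type pattern concrete. It is not ABSTAT's implementation. It computes minimal types from a subclass map that is assumed to be already transitively closed. It then counts (subject type, property, object type) patterns over a few triples. Literals are represented as hypothetical (value, datatype) tuples. Pattern inference is ignored, so the counts only approximate the frequency statistic. The small DBpedia-style hierarchy is illustrative.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple, Union

Literal = Tuple[str, str]            # hypothetical: (lexical value, datatype)
Term = Union[str, Literal]


def minimal_types(types: Set[str], ancestors: Dict[str, Set[str]]) -> Set[str]:
    """Drop every asserted type that is an ancestor of another asserted type."""
    return {
        t for t in types
        if not any(t in ancestors.get(other, set()) for other in types if other != t)
    }


def pattern_frequencies(
    triples: Iterable[Tuple[str, str, Term]],
    entity_types: Dict[str, Set[str]],
    ancestors: Dict[str, Set[str]],
) -> Counter:
    """Count minimal type patterns (subject type, property, object type)."""
    counts: Counter = Counter()
    for s, p, o in triples:
        s_types = minimal_types(entity_types.get(s, {"owl:Thing"}), ancestors)
        if isinstance(o, tuple):                 # literal: its datatype is its type
            o_types = {o[1]}
        else:
            o_types = minimal_types(entity_types.get(o, {"owl:Thing"}), ancestors)
        for st in s_types:
            for ot in o_types:
                counts[(st, p, ot)] += 1
    return counts


# Illustrative hierarchy, closed under transitivity: Company ⊑ Organisation ⊑ Agent.
ancestors = {
    "dbo:Company": {"dbo:Organisation", "dbo:Agent", "owl:Thing"},
    "dbo:Organisation": {"dbo:Agent", "owl:Thing"},
    "dbo:Agent": {"owl:Thing"},
}
entity_types = {"dbr:Apple_Inc.": {"dbo:Company", "dbo:Organisation", "dbo:Agent"}}
triples = [("dbr:Apple_Inc.", "dbo:foundingYear", ("1976", "xsd:gYear"))]

print(pattern_frequencies(triples, entity_types, ancestors))
# Counter({('dbo:Company', 'dbo:foundingYear', 'xsd:gYear'): 1})
```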
  • 16. ABSTAT: Cardinality Descriptors Example (figure of subjects and objects): MinS(π) = 1, MaxS(π) = 3, AvgS(π) = 1.4 ≈ 1; MinO(π) = 1, MaxO(π) = 2, AvgO(π) = 1.7 ≈ 2 For each pattern π • MinS, AvgS, MaxS • Min/Avg/Max number of distinct subjects associated with the same object in the triples represented by π • MinO, AvgO, MaxO • Min/Avg/Max number of distinct objects associated with the same subject in the triples represented by π • Local vs. global • Local: for patterns • Global: for properties, i.e., all triples with a property 16
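The following is a minimal sketch of how the six descriptors can be computed. Passing the triples of one pattern gives local descriptors, and passing all triples of one property gives global ones. The ten triples are hypothetical and are not the data from the original figure. They were chosen only so that the results match the numbers on the slide: AvgS = 10/7 ≈ 1.4 and AvgO = 10/6 ≈ 1.7.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def cardinality_descriptors(triples: Iterable[Tuple[str, str, str]]) -> Dict[str, float]:
    """Min/Avg/Max distinct subjects per object (S) and distinct objects per subject (O).

    Pass the triples of a single pattern for local descriptors, or all triples
    with a given property for global descriptors.
    """
    subjects_per_object = defaultdict(set)
    objects_per_subject = defaultdict(set)
    for s, _p, o in triples:
        subjects_per_object[o].add(s)
        objects_per_subject[s].add(o)
    s_counts = [len(v) for v in subjects_per_object.values()]
    o_counts = [len(v) for v in objects_per_subject.values()]
    return {
        "MinS": min(s_counts), "AvgS": sum(s_counts) / len(s_counts), "MaxS": max(s_counts),
        "MinO": min(o_counts), "AvgO": sum(o_counts) / len(o_counts), "MaxO": max(o_counts),
    }


# Hypothetical pattern with 6 subjects, 7 objects and 10 triples.
edges = [("s1", "o1"), ("s1", "o6"), ("s2", "o5"), ("s3", "o2"), ("s3", "o6"),
         ("s4", "o3"), ("s4", "o6"), ("s5", "o7"), ("s6", "o1"), ("s6", "o4")]
triples = [(s, "p", o) for s, o in edges]

for name, value in cardinality_descriptors(triples).items():
    print(name, round(value, 2))
# prints one per line: MinS 1 / AvgS 1.43 / MaxS 3 / MinO 1 / AvgO 1.67 / MaxO 2
```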
  • 19. ABSTAT: Cardinality Descriptors Example (figure of subjects and objects): MinS(π) = 1, MaxS(π) = 3, AvgS(π) = 1.4 ≈ 1; MinO(π) = 1, MaxO(π) = 2, AvgO(π) = 1.7 ≈ 2 [Figure labels, distinct subjects per object: 2, 1, 1, 1, 1, 3, 1] For each pattern π • MinS, AvgS, MaxS • Min/Avg/Max number of distinct subjects associated with the same object in the triples represented by π • MinO, AvgO, MaxO • Min/Avg/Max number of distinct objects associated with the same subject in the triples represented by π • Local vs. global • Local: for patterns • Global: for properties, i.e., all triples with a property 19
  • 22. ABSTAT: Cardinality Descriptors For each pattern π • MinS, AvgS, MaxS • Min/Avg/Max number of distinct subjects associated with the same object in the triples represented by π • MinO, AvgO, MaxO • Min/Avg/Max number of distinct objects associated with the same subject in the triples represented by π • Local vs. global • Local: for patterns • Global: for properties, i.e., all triples with a property Example (figure of subjects and objects): MinS(π) = 1, MaxS(π) = 3, AvgS(π) = 1.4 ≈ 1; MinO(π) = 1, MaxO(π) = 2, AvgO(π) = 1.7 ≈ 2 [Figure labels, distinct objects per subject: 2, 1, 2, 2, 1, 2] 22
  • 23. ABSTAT: Cardinality Descriptors For each pattern π • MinS, AvgS, MaxS • Min/Avg/Max number of distinct subjects associated with the same object in the triples represented by π • MinO, AvgO, MaxO • Min/Avg/Max number of distinct objects associated with the same subject in the triples represented by π • Local vs. global • Local: for patterns • Global: for properties, i.e., all triples with a property Example (figure of subjects and objects): MinS(π) = 1, MaxS(π) = 3, AvgS(π) = 1.4 ≈ 1; MinO(π) = 1, MaxO(π) = 2, AvgO(π) = 1.7 ≈ 2 • Example for the property cinematography: global cardinality descriptors (Thing, cinematography, Thing): [minS,avgS,maxS] = [1,5,249], [minO,avgO,maxO] = [1,1,13]; local cardinality descriptors (Film, cinematography, Person): [minS,avgS,maxS] = [1,14,249], [minO,avgO,maxO] = [1,1,7] 23
  • 25. Cardinality Descriptors for Feature Selection Example (figure of subjects and objects): MinS(π) = 1, MaxS(π) = 3, AvgS(π) = 1.4 ≈ 1; MinO(π) = 1, MaxO(π) = 2, AvgO(π) = 1.7 ≈ 2 [Figure labels, distinct subjects per object: 2, 1, 1, 1, 1, 3, 1] For each pattern π • MinS, AvgS, MaxS • Min/Avg/Max number of distinct subjects associated with the same object in the triples represented by π • MinO, AvgO, MaxO • Min/Avg/Max number of distinct objects associated with the same subject in the triples represented by π • Local vs. global • Local: for patterns • Global: for properties, i.e., all triples with a property + frequency! 25
  • 26. Outline • Feature Selection for Semantics-aware Recommender Systems • Ontology-based Data Summarization with ABSTAT • Feature Selection (ABSTAT vs. Information Gain) • Experiments • Conclusions and Future Work 26
  • 27. Feature Selection with ABSTAT PATTERNS → FILTERING (local cardinality descriptor; here avgS > 1) → RANKING (value*; here DESC(frequency)) → PROJECTION (property, MAX(value*); here P, MAX(frequency)) → SELECTION (k properties; here k = 2) → Properties *Values of pattern frequency, local cardinality descriptor, or a combination of the two. 27
  • 28. Feature Selection with ABSTAT (example without filtering) PATTERNS → RANKING (value*; here DESC(frequency*maxS)) → PROJECTION (property, MAX(value*); here P, MAX(frequency*maxS)) → SELECTION (k properties; here k = 5) → Properties *Values of pattern frequency, local cardinality descriptor, or a combination of the two. 28
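The filter, project, rank and select steps of ABSTAT-based feature selection can be sketched as below. This is an illustration under stated assumptions, not the authors' code. Each pattern is a hypothetical record carrying its frequency and local cardinality descriptors, and all pattern numbers are invented. The scoring and filtering rules are passed in as functions. Configurations such as AbsFreqAvgS (filter AvgS > 1, rank by frequency) and AbsFreq*MaxS (no filter, rank by frequency × MaxS) then differ only in their arguments.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Pattern:
    subject_type: str
    prop: str
    object_type: str
    frequency: int      # occurrences as a minimal type pattern
    avg_s: float        # local AvgS
    max_s: int          # local MaxS


def select_features(
    patterns: List[Pattern],
    score: Callable[[Pattern], float],
    k: int,
    keep: Optional[Callable[[Pattern], bool]] = None,
) -> List[str]:
    # FILTERING on a local cardinality descriptor (optional).
    kept = [p for p in patterns if keep is None or keep(p)]
    # PROJECTION: each property takes the MAX score over its patterns.
    best: Dict[str, float] = {}
    for p in kept:
        best[p.prop] = max(best.get(p.prop, float("-inf")), score(p))
    # RANKING in descending order, then SELECTION of the top-k properties.
    return sorted(best, key=best.get, reverse=True)[:k]


# Hypothetical patterns for the domain type dbo:Film (numbers are invented).
film_patterns = [
    Pattern("dbo:Film", "dbo:starring", "dbo:Actor", 300_000, 3.2, 120),
    Pattern("dbo:Film", "dbo:director", "dbo:Person", 90_000, 1.9, 60),
    Pattern("dbo:Film", "dbo:director", "dbo:Agent", 5_000, 1.2, 10),
    Pattern("dbo:Film", "rdfs:label", "rdf:langString", 500_000, 1.0, 2),
]

abs_freq_avg_s = select_features(film_patterns, lambda p: p.frequency, k=3,
                                 keep=lambda p: p.avg_s > 1)
abs_freq_max_s = select_features(film_patterns, lambda p: p.frequency * p.max_s, k=3)
print(abs_freq_avg_s)  # ['dbo:starring', 'dbo:director']
print(abs_freq_max_s)  # ['dbo:starring', 'dbo:director', 'rdfs:label']
```

The AvgS > 1 filter discards rdfs:label in this toy data, because its values map, on average, to only one subject. The unfiltered frequency × MaxS ranking keeps it.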
  • 29. Feature Selection with IG • Different statistical measures tested: Information Gain, Information Gain Ratio, Chi-squared test • Information Gain: expected reduction in entropy occurring when a feature is present versus when it is absent • For a feature f_i, IG is defined as $IG(I, f_i) = E(I) - \sum_{v \in dom(f_i)} \frac{|I_v|}{|I|} E(I_v)$ • where E(I) is the entropy of the data, I_v is the set of items in which the feature f_i (e.g., director for movies) has value v (e.g., F.F. Coppola in the movie domain), and E(I_v) is the entropy computed on the items where f_i assumes value v. 29
  • 30. Feature Selection with IG: preprocessing • Manual pre-processing is required • Remove redundant or irrelevant features that are expected to bring little value to the recommendation task but, at the same time, pose scalability issues • Number of features before → after pre-processing: MovieLens 148 → 34; LastFM 271 → 25; LibraryThing 201 → 22 30
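The IG definition can be sketched in a few lines of Python. The sketch makes two simplifying assumptions that the slide does not state. First, each item carries a class label over which entropy is computed, for instance a hypothetical liked/not-liked flag. Second, each item has exactly one value for the feature, whereas DBpedia properties are often multi-valued. Entropy uses log base 2.

```python
from collections import Counter, defaultdict
from math import log2
from typing import Hashable, List, Sequence, Tuple


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of the label distribution."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(items: List[Tuple[Hashable, Hashable]]) -> float:
    """IG of a feature, given one (feature value v, class label) pair per item."""
    groups = defaultdict(list)                 # v -> labels of the items in I_v
    for value, label in items:
        groups[value].append(label)
    n = len(items)
    expected = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy([label for _, label in items]) - expected


# Hypothetical movies: (director, liked?) pairs.
informative = [("F.F. Coppola", "liked"), ("F.F. Coppola", "liked"),
               ("Director B", "not liked"), ("Director B", "not liked")]
uninformative = [("Director A", "liked"), ("Director A", "not liked"),
                 ("Director B", "liked"), ("Director B", "not liked")]
print(information_gain(informative))    # 1.0
print(information_gain(uninformative))  # 0.0
```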
  • 31. Outline • Feature Selection for Semantics-aware Recommender Systems • Ontology-based Data Summarization with ABSTAT • Feature Selection (ABSTAT vs. Information Gain) • Experiments • Conclusions and Future Work 31
  • 32. Experimental Settings: Recommendation Method • Content-based, using an item-based nearest-neighbors algorithm [Di Noia et al., TIST 2016] • Given the set of entities rated by the user (= user profile), • predict the rating only for the k nearest neighbors of the rated items • Jaccard item similarity Rating prediction on the k most similar items to the items rated by the user 32
  • 33. Experimental Settings: Recommendation Method • Jaccard similarity • Measures the overlap between the values of the selected features of two items: the number of shared values divided by the number of distinct values of either item 33
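The following is a minimal sketch of Jaccard similarity between two items. It assumes each item is described by hypothetical (property, value) pairs, and only the pairs whose property was selected as a feature are compared. The similarity is the number of shared pairs divided by the number of distinct pairs of either item. The resource names are placeholders.

```python
from typing import Iterable, Set, Tuple

Feature = Tuple[str, str]  # (property, value), e.g. ("dbo:director", "dbr:Director_A")


def feature_set(description: Iterable[Feature], selected: Set[str]) -> Set[Feature]:
    """Keep only the (property, value) pairs whose property is a selected feature."""
    return {(p, v) for p, v in description if p in selected}


def jaccard(a: Set[Feature], b: Set[Feature]) -> float:
    """|a ∩ b| / |a ∪ b|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Two hypothetical film descriptions.
film_x = [("dbo:director", "dbr:Director_A"),
          ("dbo:starring", "dbr:Actor_A"), ("dbo:starring", "dbr:Actor_B")]
film_y = [("dbo:director", "dbr:Director_A"),
          ("dbo:starring", "dbr:Actor_A"), ("dbo:starring", "dbr:Actor_C")]

both = {"dbo:director", "dbo:starring"}
print(jaccard(feature_set(film_x, both), feature_set(film_y, both)))  # 0.5: 2 shared / 4 distinct
print(jaccard(feature_set(film_x, {"dbo:director"}),
              feature_set(film_y, {"dbo:director"})))                 # 1.0
```

The example shows why feature selection matters: the same two items look identical or only half similar depending on which properties are kept.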
  • 34. Experimental Setting: Datasets & Measures Datasets • One-to-one mapping between RecSys benchmarks and DBpedia [Di Noia et al., TIST 2016]: • MovieLens ↔ DBpedia (3,883 movies) • Last.fm ↔ DBpedia (17,632 artists) • The Library Thing ↔ DBpedia (37,231 books) • DBpedia-2015-10, including infoboxes (392M triples) Metrics • Accuracy: • Precision@N: fraction of relevant items in the Top-N recommendations • MRR@N: average reciprocal rank of the first relevant recommended item • Diversity: • catalog coverage: percentage of items in the catalog recommended at least once • aggregate diversity: aggregate entropy 34
  • 35. Experimental Setting: Datasets & Measures Is it all about precision? • Novelty • Recommend items in the long tail • Diversity • Avoid recommending only items from a small subset of the catalog • Suggest diverse items in the recommendation list • Serendipity • Suggest unexpected but interesting items 35
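The four measures can be sketched as follows. These are common textbook formulations, assumed here rather than taken from the authors' evaluation code:

- Precision@N divides by N.
- MRR@N averages per-user reciprocal ranks, counting 0 when no relevant item appears in the top N.
- Catalog coverage is the fraction of catalog items that appear in at least one top-N list.
- Aggregate entropy is the Shannon entropy of how often each item is recommended across all lists. Log base 2 is an assumption.

```python
from collections import Counter
from math import log2
from typing import Dict, Hashable, List, Set

Item = Hashable
Recs = Dict[str, List[Item]]          # user -> ranked recommendation list


def precision_at(recs: List[Item], relevant: Set[Item], n: int) -> float:
    """Fraction of the top-n recommended items that are relevant."""
    return sum(1 for i in recs[:n] if i in relevant) / n


def mrr_at(recs_by_user: Recs, relevant_by_user: Dict[str, Set[Item]], n: int) -> float:
    """Mean over users of 1/rank of the first relevant item in the top n (0 if none)."""
    def reciprocal_rank(recs: List[Item], relevant: Set[Item]) -> float:
        for rank, item in enumerate(recs[:n], start=1):
            if item in relevant:
                return 1.0 / rank
        return 0.0
    total = sum(reciprocal_rank(recs, relevant_by_user.get(u, set()))
                for u, recs in recs_by_user.items())
    return total / len(recs_by_user)


def catalog_coverage(recs_by_user: Recs, catalog_size: int, n: int) -> float:
    """Share of catalog items recommended at least once in some top-n list."""
    return len({i for recs in recs_by_user.values() for i in recs[:n]}) / catalog_size


def aggregate_entropy(recs_by_user: Recs, n: int) -> float:
    """Shannon entropy (bits) of how often each item appears across all top-n lists."""
    counts = Counter(i for recs in recs_by_user.values() for i in recs[:n])
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum(c / total * log2(c / total) for c in counts.values())


# Hypothetical toy run with two users.
recs = {"u1": ["a", "b", "c"], "u2": ["c", "d", "e"]}
relevant = {"u1": {"b"}, "u2": {"e", "x"}}
print(precision_at(recs["u1"], relevant["u1"], 2))     # 0.5
print(round(mrr_at(recs, relevant, 3), 4))              # 0.4167 = (1/2 + 1/3) / 2
print(catalog_coverage(recs, catalog_size=10, n=3))     # 0.5: 5 of 10 items
print(aggregate_entropy(recs, 2))                       # 2.0: 4 items, once each
```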
  • 36. Experimental Settings: dbo vs. dbp properties • Number of features/properties: 5, 20 • Which DBpedia properties? dbo (DBpedia Ontology) vs. dbp (infobox) properties • noRep: keep only the best-ranked property between dbo and dbp • withRep: keep duplicates • Onlydbp: in case of duplicates, keep only the dbp property • Onlydbo: in case of duplicates, keep only the dbo property • IG vs. ABSTAT configurations (name: filter; ranking; intuition): • AbsFreqAvgS: AvgS > 1; frequency; only properties that map, on average, more than one distinct subject to each object, ranked by frequency • AbsFreq*MaxS: no filter; frequency*maxS; favors properties that are more frequent and map a higher number of distinct subjects to one object • AbsMaxS: no filter; maxS; favors properties that map a higher number of distinct subjects to one object • Tf-idf (baseline): no filter; tf-idf over patterns; favors properties that are more peculiar to the domain type 36
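The four duplicate-handling strategies can be read as post-processing of a ranked property list. The sketch below rests on two hypothetical assumptions. A dbo property and a dbp property count as duplicates when they share the same local name (e.g. dbo:director and dbp:director). Duplicates are detected across the whole ranking before the top-k cut. The authors' exact rule may differ.

```python
from typing import List, Set

STRATEGIES = {"withRep", "noRep", "Onlydbo", "Onlydbp"}


def apply_strategy(ranked: List[str], strategy: str, k: int) -> List[str]:
    """Top-k properties from a ranked list of prefixed names (dbo:x / dbp:x)."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")
    present = set(ranked)
    seen_names: Set[str] = set()   # local names already kept from dbo:/dbp:
    selected: List[str] = []
    for prop in ranked:
        prefix, _, name = prop.partition(":")
        if prefix in ("dbo", "dbp"):
            if strategy == "noRep" and name in seen_names:
                continue   # a better-ranked twin is already selected
            if strategy == "Onlydbo" and prefix == "dbp" and f"dbo:{name}" in present:
                continue   # the dbo twin wins
            if strategy == "Onlydbp" and prefix == "dbo" and f"dbp:{name}" in present:
                continue   # the dbp twin wins
            seen_names.add(name)
        selected.append(prop)
        if len(selected) == k:
            break
    return selected


# Hypothetical ranking produced by some feature-selection configuration.
ranked = ["dbp:director", "dbo:starring", "dbo:director", "dbp:starring", "dbo:genre"]
for s in ("withRep", "noRep", "Onlydbo", "Onlydbp"):
    print(s, apply_strategy(ranked, s, 3))
# withRep ['dbp:director', 'dbo:starring', 'dbo:director']
# noRep ['dbp:director', 'dbo:starring', 'dbo:genre']
# Onlydbo ['dbo:starring', 'dbo:director', 'dbo:genre']
# Onlydbp ['dbp:director', 'dbp:starring', 'dbo:genre']
```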
  • 37. MovieLens • ABSTAT-based FS is almost always “statistically better” than statistical measures • Few “local” exceptions • Different configurations optimize different measures • AbsOccAvgS vs. AbsMaxS • The Tf-idf ABSTAT baseline is good for aggregate entropy
    Results for top-K features, shown as K = 5 / K = 20:
    Configuration          Precision@10    MRR@10          catalogCoverage@10  aggrEntropy@10
    withrep.IG             .0658 / .1078   .2192 / .3417   .3829 / .5280       7.56 / 8.50
    withrep.AbsFreqAvgS    .1059 / .1081   .3380 / .3477   .5398 / .5253       8.70 / 8.53
    withrep.AbsFreq*MaxS   .0967 / .1074   .3274 / .3541   .5962 / .5247       8.87 / 8.54
    withrep.AbsMaxS        .0919 / .1030   .3065 / .3400   .6016 / .5698       8.96 / 8.66
    withrep.TfIdf          .0565 / .0851   .2267 / .3326   .4347 / .3360       8.36 / 7.80
    norep.IG               .0841 / .1076   .2961 / .3390   .3372 / .5226       7.94 / 8.44
    norep.AbsFreqAvgS      .1066 / .1076   .3388 / .3400   .5344 / .5208       8.68 / 8.45
    norep.AbsMaxS          .0885 / .1063   .3075 / .3467   .6234 / .5550       8.99 / 8.60
    norep.TfIdf            .0823 / .0856   .2994 / .3123   .3520 / .3908       7.83 / 7.99
    dbo.IG                 .0841 / .1076   .2961 / .3390   .3372 / .5226       7.94 / 8.44
    dbo.AbsFreqAvgS        .1066 / .1067   .3388 / .3402   .5344 / .5208       8.68 / 8.51
    dbo.AbsMaxS            .0885 / .1059   .3075 / .3464   .6234 / .5535       8.99 / 8.60
    dbo.TfIdf              .0823 / .0856   .2994 / .3123   .3520 / .3908       7.83 / 7.99
    dbp.IG                 .0688 / .1046   .2134 / .3336   .2799 / .5065       6.54 / 8.31
    dbp.AbsFreqAvgS        .1065 / .1059   .3408 / .3360   .5426 / .5105       8.64 / 8.31
    dbp.AbsMaxS            .0908 / .1030   .3124 / .3396   .6219 / .5395       8.98 / 8.52
    dbp.TfIdf              .0549 / .0745   .1924 / .2687   .2530 / .3575       6.33 / 7.41
  • 38. LastFM • ABSTAT-based FS is better or comparable • In most cases, not “statistically better” • Reminder: ABSTAT still has the advantage of • not running statistical measures on the full dataset • requiring no manual pre-processing • The Tf-idf ABSTAT baseline is good for aggregate entropy
    Results for top-K features, shown as K = 5 / K = 20:
    Configuration          Precision@10    MRR@10          catalogCoverage@10  aggrEntropy@10
    withrep.IG             .0501 / .1325   .2283 / .4102   .4290 / .5051       11 / 11.18
    withrep.AbsFreqAvgS    .1330 / .1320   .4047 / .4105   .4812 / .5036       11.1 / 11.18
    withrep.AbsFreq*MaxS   .1102 / .1227   .3649 / .3749   .5500 / .5332       11.4 / 11.36
    withrep.AbsMaxS        .0371 / .1156   .1249 / .3691   .1680 / .5440       9.79 / 11.39
    withrep.TfIdf          .1017 / .1158   .2960 / .3584   .4210 / .4602       10.86 / 10.97
    norep.IG               .0501 / .1311   .2283 / .4040   .429 / .5018        11 / 11.17
    norep.AbsFreqAvgS      .1305 / .1307   .3994 / .4074   .489 / .5019        11.11 / 11.15
    norep.AbsFreq*MaxS     .1062 / .1228   .3546 / .3708   .5362 / .5161       11.4 / 11.29
    norep.AbsMaxS          .0392 / .1227   .1952 / .3715   .452 / .5344        11.09 / 11.34
    norep.TfIdf            .1024 / .1132   .3064 / .3554   .4026 / .4508       10.76 / 10.96
    dbo.IG                 .0411 / .1319   .1989 / .4083   .4425 / .5053       11.06 / 11.2
    dbo.AbsFreqAvgS        .1283 / .1292   .3986 / .4063   .4915 / .4949       11.14 / 11.14
    dbo.AbsFreq*MaxS       .1062 / .1214   .3546 / .3710   .5362 / .5109       11.4 / 11.27
    dbo.AbsMaxS            .0381 / .1211   .1927 / .3727   .4291 / .5222       10.97 / 11.31
    dbo.TfIdf              .1024 / .1132   .3064 / .3554   .4026 / .4508       10.76 / 10.96
    dbp.IG                 .0678 / .1319   .2553 / .4083   .4364 / .5053       10.83 / 11.2
    dbp.AbsFreqAvgS        .1319 / .1316   .4026 / .4113   .4926 / .5055       11.14 / 11.2
    dbp.AbsFreq*MaxS       .1065 / .1239   .3580 / .3773   .5444 / .527        11.42 / 11.36
    dbp.AbsMaxS            .0401 / .1105   .1969 / .3553   .4528 / .5447       11.08 / 11.42
    dbp.TfIdf              .079 / .1170    .2371 / .3572   .3894 / .4698       10.69 / 11.04
  • 39. The Library Thing • ABSTAT-based FS obtains the globally best results, but • IG is better for catalog coverage with 20 features • IG is better for some duplicate-property management strategies • Differences are statistically significant only in some cases • The Tf-idf ABSTAT baseline is good for aggregate entropy
    Results for top-K features, shown as K = 5 / K = 20:
    Configuration          Precision@10    MRR@10          catalogCoverage@10  aggrEntropy@10
    withrep.IG             .0576 / .0588   .2348 / .2273   .3983 / .4034       10.47 / 10.44
    withrep.AbsOccAvgS     .0458 / .0568   .2003 / .2343   .3670 / .4014       10.28 / 10.50
    withrep.AbsOcc*MaxS    .0457 / .0560   .2116 / .2355   .3854 / .3826       10.54 / 10.21
    withrep.AbsMaxS        .0571 / .0567   .2319 / .2360   .3689 / .4011       10.24 / 10.29
    withrep.TfIdf          .0215 / .0145   .1607 / .1202   .1314 / .2349       8.81 / 9.75
    norep.IG               .0571 / .0579   .2346 / .2274   .3988 / .4037       10.47 / 10.44
    norep.AbsOccAvgS       .0561 / .0593   .2328 / .2329   .3982 / .4030       10.54 / 10.48
    norep.AbsOcc*MaxS      .0459 / .0570   .2119 / .2372   .3852 / .3809       10.54 / 10.15
    norep.AbsMaxS          .0541 / .0567   .2301 / .2365   .3653 / .4008       10.24 / 10.29
    norep.TfIdf            .0215 / .0138   .1608 / .1211   .1314 / .2877       8.81 / 10.26
    dbo.IG                 .0571 / .0579   .2346 / .2274   .3988 / .4037       10.47 / 10.44
    dbo.AbsOccAvgS         .0561 / .0593   .2328 / .2329   .3982 / .4030       10.54 / 10.48
    dbo.AbsOcc*MaxS        .0459 / .0570   .2119 / .2372   .3852 / .3809       10.54 / 10.15
    dbo.AbsMaxS            .0541 / .0567   .2301 / .2365   .3653 / .4008       10.24 / 10.29
    dbo.TfIdf              .0579 / .0605   .2374 / .2477   .4086 / .3991       10.55 / 10.20
    dbp.IG                 .0586 / .0586   .2350 / .2299   .4027 / .4043       10.49 / 10.4
    dbp.AbsOccAvgS         .0623 / .0612   .2467 / .2342   .3943 / .4043       10.42 / 10.45
    dbp.AbsOcc*MaxS        .0464 / .0606   .2126 / .2504   .3862 / .3797       10.53 / 10.07
    dbp.AbsMaxS            .0571 / .0592   .2318 / .2398   .3689 / .4002       10.24 / 10.22
    dbp.TfIdf              .0215 / .0132   .1608 / .1218   .1314 / .2696       8.81 / 9.96
  • 40. Outline • Feature Selection for Semantics-aware Recommender Systems • Ontology-based Data Summarization with ABSTAT • Feature Selection (ABSTAT vs. Information Gain) • Experiments • Conclusions and Future Work 40
  • 41. Conclusions & Future Work • Conclusions • Fully automatic feature selection method with ontology-based knowledge graph summaries (ABSTAT) • Better than or, in some cases, comparable to statistical measures, but without requiring computation over the full dataset • Additional evidence of the informativeness of ABSTAT-based summaries • Future work • Add tf-idf to the ABSTAT statistics • Experiments with additional measures (e.g., graph-based measures with paths longer than 1) • API-based suggestion of the most salient properties for an input entity type 41
  • 42. Contacts: palmonari@disco.unimib.it - tommaso.dinoia@poliba.it This project has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreements n. 732003 and n. 732590 • Supporting Event and Weather-based Data Analytics and Marketing along the Shopper Journey: www.ew-shopp.eu • Enabling the European Business Graph for Innovative Data Products and Services: www.eubusinessgraph.eu/ • Experiments & code: https://zenodo.org/record/1205712#.WrRCypPwa3U http://ow.ly/zAA530d0wu0 • ABSTAT (open source) code: https://bitbucket.org/disco_unimib/abstat-core • ABSTAT home: abstat.disco.unimib.it 42
  • 43. Appendix: Explanations for Better/Worse Performance
    Domain   Type         # Minimal Patterns   Avg # Triples   Variance
    Movies   dbo:Film     57757                74.02           549.31
    Books    dbo:Book     41684                44.97           169.48
    Music    dbo:Artist   40491                80.50           981.51
  • 44. Appendix: Explanations for Better/Worse Performance: Top 20 selected features for the MovieLens dataset using the different configurations of IG and AbsFreqAvgS.

Editor's Notes

  1. We did not consider only accuracy because in RecSys it is also important to go beyond the popularity bias and show diverse items across the catalog, as well as items in the long tail
  2. There does not exist one approach that performs better than
  3. DBO: In the first row. Perhaps there is no difference between the DBP: