Factorization Machines Leveraging Lightweight
Linked Open Data-enabled Features for Top-N
Recommendations
Guangyuan Piao, John G. Breslin
Insight Centre for Data Analytics, National University of Ireland Galway
The 18th International Conference on Web Information Systems Engineering
Moscow, Russia, 7-10 October 2017
Background
Linked Open Data (LOD) provides domain knowledge and rich information about items for content-based recommender systems
[source]: http://lod-cloud.net
DBpedia knowledge base
•  1st class citizen in the LOD cloud
•  Structured information from Wikipedia
•  4.58 million things
•  1,445,000 persons, 87,000 films, etc.
Background Knowledge from DBpedia
[Figure: DBpedia graph excerpt showing Chase_films, Auto_racing_films…]
•  Knowledge is represented as SPO triples
•  SPO: Subject → Property → Object
•  Knowledge is freely accessible via a public SPARQL Endpoint
[Figure: example triple with the property musicComposer, labelled (Subject), (Property), (Object)]
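As a rough illustration of how an SPO triple pattern maps onto a SPARQL query, the sketch below builds a query string in which any unbound position becomes a variable. The helper name `triple_pattern_query` and the choice of `The_Godfather` as subject are assumptions made for illustration; only the property `musicComposer` appears on the slide. The query is not sent anywhere here.

```python
# Minimal sketch (assumed helper, not from the slides): turn an SPO pattern
# into a SPARQL SELECT query string. Unbound positions become variables.
DBR = "http://dbpedia.org/resource/"  # DBpedia resource namespace
DBO = "http://dbpedia.org/ontology/"  # DBpedia ontology namespace


def triple_pattern_query(subject=None, prop=None, obj=None, limit=100):
    """Build a SELECT query for one triple pattern; None means 'unbound'."""
    bound = {"s": subject, "p": prop, "o": obj}
    free = [name for name, iri in bound.items() if iri is None]
    if not free:
        raise ValueError("at least one position must be left unbound")
    terms = " ".join(
        f"<{iri}>" if iri is not None else f"?{name}"
        for name, iri in bound.items()
    )
    select = " ".join(f"?{name}" for name in free)
    return f"SELECT {select} WHERE {{ {terms} }} LIMIT {limit}"


# Hypothetical example: who composed the music for The Godfather?
query = triple_pattern_query(
    subject=DBR + "The_Godfather",
    prop=DBO + "musicComposer",
)
print(query)
```

The resulting string could be submitted to a public SPARQL Endpoint with any HTTP or SPARQL client; that step is left out to keep the sketch free of network access.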
(Some) Related Work
•  Semantic Similarity/Distance Measures
    •  [Passant et al. ISWC’10, AAAI’10]
    •  [Piao et al. SAC’16]
•  Graph-based algorithms such as PageRank
    •  [Musto et al. UMAP’16]
    •  [Nguyen et al. WWW’15]
•  Machine learning approaches
    •  [Noia et al. RecSys’12], VSM + SVM classifier
    •  [Noia et al. TIST’16], semantic paths + learning-to-rank (SPRank)
Pipeline in related work: user-item interactions + item background knowledge (SPARQL Endpoint) → build a graph → extract features → feed to algorithms
Combined Graph
[Figure: combined graph of user-item interactions and item background knowledge, e.g. Chase_films …]
Proposed Approach: Features
•  Using lightweight LOD features from DBpedia
    •  lightweight: directly obtained via SPARQL Endpoint
•  LOD features
    •  Property-Object list (PO)
    •  Subject-Property list (SP)
    •  PageRank score (PR)
[Figure: user-item interactions and item background knowledge from the SPARQL Endpoint are fed to algorithms. Example around dbr:The_Godfather with resources dbr:Carlo_Savina, dbr:Francis_Ford_Coppola, dbr:The_Godfather_Returns, dbc:Gangster_films and properties dbo:knownFor, dbo:series, dbo:director, dc:subject]
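The PO and SP lists can be sketched with a small in-memory set of triples. The pairing of subjects, properties and objects below is an assumption reconstructed from the figure labels, and the helper names `po_list` and `sp_list` are hypothetical. In practice the triples would come from the SPARQL Endpoint; the PageRank score is not covered here.

```python
# Assumed example triples: which resource is linked by which property is a
# guess based on the figure labels, not something the slides state.
TRIPLES = [
    ("dbr:The_Godfather", "dbo:director", "dbr:Francis_Ford_Coppola"),
    ("dbr:The_Godfather", "dc:subject", "dbc:Gangster_films"),
    ("dbr:The_Godfather_Returns", "dbo:series", "dbr:The_Godfather"),
    ("dbr:Carlo_Savina", "dbo:knownFor", "dbr:The_Godfather"),
]


def po_list(item, triples):
    """Property-Object pairs: triples in which the item is the subject."""
    return sorted(f"{p}|{o}" for s, p, o in triples if s == item)


def sp_list(item, triples):
    """Subject-Property pairs: triples in which the item is the object."""
    return sorted(f"{s}|{p}" for s, p, o in triples if o == item)


po = po_list("dbr:The_Godfather", TRIPLES)
sp = sp_list("dbr:The_Godfather", TRIPLES)
print(po)  # ['dbo:director|dbr:Francis_Ford_Coppola', 'dc:subject|dbc:Gangster_films']
print(sp)  # ['dbr:Carlo_Savina|dbo:knownFor', 'dbr:The_Godfather_Returns|dbo:series']
```

Each pair is treated as one categorical feature, so two items sharing a director or a category share a feature without any graph having to be built or maintained.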
Proposed Approach: Algorithms
•  Factorization Machines (FMs)
•  Optimization: Bayesian Personalized Ranking (BPR)
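For readers unfamiliar with the two building blocks, here is a minimal pure-Python sketch of a second-order FM score and the pairwise BPR loss, following the standard textbook formulations rather than the authors' implementation. The parameter names (`w0`, `w`, `V`) and the example numbers are assumptions; each row of `V` holds the latent factors of one feature.

```python
import math
from itertools import combinations


def fm_score(x, w0, w, V):
    """Second-order FM score using the usual linear-time reformulation."""
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for f in range(len(V[0])):
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


def fm_score_naive(x, w0, w, V):
    """Same score written as an explicit sum over all feature pairs."""
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    pairwise = sum(
        sum(a * b for a, b in zip(V[i], V[j])) * x[i] * x[j]
        for i, j in combinations(range(len(x)), 2)
    )
    return linear + pairwise


def bpr_loss(score_pos, score_neg):
    """BPR pairwise loss: -ln(sigmoid(score_pos - score_neg))."""
    return math.log1p(math.exp(-(score_pos - score_neg)))


# Hypothetical parameters for a 4-feature model with 2 latent factors.
x = [1.0, 0.0, 1.0, 0.5]
w0, w = 0.1, [0.2, -0.1, 0.3, 0.05]
V = [[0.1, 0.2], [0.3, -0.1], [-0.2, 0.4], [0.5, 0.1]]
print(fm_score(x, w0, w, V), fm_score_naive(x, w0, w, V))
```

BPR trains on pairs: the loss shrinks as an item the user interacted with is scored higher than one they did not, which matches the top-N ranking goal better than predicting a single target value.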
Proposed Approach
•  Overall features for Factorization Machines

      Feature vector x                                      Target y
      user     item     PO           SP         PR
x1    1 0 …    1 0 …    0.2 0.2 …    0.1 0 …    0.1         1
x2    0 1 …    0 1 …    0.3 0.5 …    0 0.3 …    0.2         0
…     …        …        …            …          …           …
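One way to assemble such a feature vector is sketched below: one-hot blocks for the user and the item, followed by PO and SP blocks and a single PR value. The slides do not state how the PO and SP values are computed, so the 1/count weighting used here is an assumption, as are the function name `build_feature_vector` and all example identifiers.

```python
def build_feature_vector(user, item, users, items, po_vocab, sp_vocab,
                         item_po, item_sp, item_pr):
    """Concatenate user, item, PO, SP and PR blocks into one dense vector."""

    def one_hot(value, vocab):
        return [1.0 if v == value else 0.0 for v in vocab]

    def weighted(active, vocab):
        # Assumption: every active feature gets the same weight 1/count.
        active = set(active)
        weight = 1.0 / len(active) if active else 0.0
        return [weight if v in active else 0.0 for v in vocab]

    return (
        one_hot(user, users)
        + one_hot(item, items)
        + weighted(item_po.get(item, []), po_vocab)
        + weighted(item_sp.get(item, []), sp_vocab)
        + [item_pr.get(item, 0.0)]
    )


# Hypothetical toy data: 2 users, 2 items, 3 PO features, 2 SP features.
users, items = ["u1", "u2"], ["i1", "i2"]
po_vocab, sp_vocab = ["po_a", "po_b", "po_c"], ["sp_a", "sp_b"]
item_po = {"i1": ["po_a", "po_b"]}
item_sp = {"i1": ["sp_a"]}
item_pr = {"i1": 0.1}

x1 = build_feature_vector("u1", "i1", users, items, po_vocab, sp_vocab,
                          item_po, item_sp, item_pr)
print(x1)  # [1.0, 0.0, 1.0, 0.0, 0.5, 0.5, 0.0, 1.0, 0.0, 0.1]
```

A real system would store these vectors sparsely, since the user, item, PO and SP blocks are almost entirely zero.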
Experimental Setup: Dataset
•  MovieLens dataset for LOD-enabled recommender systems
•  80% for the training set and 20% for the test set
Experimental Setup: Evaluation Metrics
•  P@N: the precision at rank N
•  R@N: the recall at rank N
•  nDCG@N: normalized Discounted Cumulative Gain
•  MRR: Mean Reciprocal Rank
•  MAP: Mean Average Precision
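The metrics listed above can be sketched per user as follows, assuming binary relevance (an item is either in the user's test set or not). These are common textbook definitions; the exact variants used in the evaluation are not stated on the slides. MRR and MAP are the means of the per-user reciprocal rank and average precision.

```python
import math


def precision_at_n(ranked, relevant, n):
    """Fraction of the top-n recommendations that are relevant."""
    return sum(1 for item in ranked[:n] if item in relevant) / n


def recall_at_n(ranked, relevant, n):
    """Fraction of the relevant items that appear in the top n."""
    if not relevant:
        return 0.0
    return sum(1 for item in ranked[:n] if item in relevant) / len(relevant)


def ndcg_at_n(ranked, relevant, n):
    """Binary-relevance nDCG with a log2 position discount."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(ranked[:n]) if item in relevant)
    ideal = sum(1.0 / math.log2(i + 2)
                for i in range(min(n, len(relevant))))
    return dcg / ideal if ideal else 0.0


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0 if there is none."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked, relevant):
    """Mean of the precision values at each relevant position."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


# Hypothetical ranking for one user with two relevant test items.
ranked, relevant = ["a", "b", "c", "d"], {"a", "c"}
print(precision_at_n(ranked, relevant, 2))  # 0.5
print(recall_at_n(ranked, relevant, 2))     # 0.5
print(reciprocal_rank(ranked, relevant))    # 1.0
```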
Experimental Setup: Compared Methods
•  PopRank: non-personalized baseline that ranks items by popularity
•  kNN-item: item-based k-nearest neighbors algorithm
•  BPRMF: matrix factorization with the BPR optimization
•  SPRank: learning-to-rank using semantic paths based on LOD
•  LODFM: our proposed approach
Results
•  best tuned parameters: m=200, PO+PR
Model Analysis: Features (m=10)
Model Analysis: Dimensionality
Conclusions
•  LODFM provides state-of-the-art performance
•  Using FMs with lightweight LOD-enabled features
    •  directly obtained via a public SPARQL Endpoint of DBpedia
    •  without maintaining a graph and extracting features from it
•  Useful features: Property-Object list & PageRank
•  Future work
    •  investigate other lightweight LOD-enabled features
    •  evaluate on datasets from other domains
Guangyuan Piao
e-mail: guangyuan.piao@insight-centre.org
twitter: https://twitter.com/parklize
slideshare: http://www.slideshare.net/parklize

WISE2017 - Factorization Machines Leveraging Lightweight Linked Open Data-enabled Features for Top-N Recommendations
