Presenter: Yeon-Chang Lee (Ph.D. student, Hanyang University)
Presentation date: February 2018
We investigate how to address the shortcomings of popular One-Class Collaborative Filtering (OCCF) methods in handling challenging “sparse” datasets in the one-class setting (e.g., clicked or bookmarked), and propose a novel graph-theoretic OCCF approach, named gOCCF, that exploits both positive preferences (derived from rated items) and negative preferences (derived from unrated items). To capture both positive and negative preferences as a bipartite graph, we further apply the graph shattering theory to determine the right amount of negative preferences to use. Then, we develop a suite of novel graph-based OCCF methods based on the random walk with restart and belief propagation methods. Through extensive experiments using 3 real-life datasets, we show that gOCCF effectively addresses the sparsity challenge and significantly outperforms all 8 competing methods in accuracy on very sparse datasets while providing accuracy comparable to the best-performing OCCF methods on less sparse datasets.
Recommender Systems with Implicit Feedback Challenges, Techniques, and Applications
1. Recommender Systems with Implicit Feedback:
Challenges, Techniques, and Applications
Yeon-Chang Lee
Advisor: Sang-Wook Kim
Big Data Science Lab.
Hanyang University
2018-02-21
2. Contents
gOCCF: Graph-Theoretic One-Class Collaborative Filtering Based
on Uninteresting Items (AAAI 18)
Background
Motivation
Proposed Approach
Evaluation
Conclusions
Applications of OCCF
Job Recommendation (ACM SAC 16, IEEE SMC 17)
Paper Recommendation (IEEE SMC 16)
TV Show Recommendation (Submitted to IJCAI 18)
4. Background: Recommender Systems
Provide a user with a few items that she would like
Using her history of evaluating, purchasing and browsing
Figure: Overview of recommender systems. An active user interested in shoes purchases several pairs of shoes; a recommender system analyzes the user’s preference, selects a few items that she would like, and provides the selected items as recommended items.
5. Background: Collaborative Filtering
One of the popular recommendation techniques that uses
the similarity between users’ past feedback
Explicit feedback (e.g., 1-5 star ratings)
Implicit feedback (e.g., purchase logs, browsing history)
Figure: Overview of collaborative filtering
6. Motivation
One-Class Collaborative Filtering (OCCF) problem
The problem of handling implicit feedback
One-class setting (clicked or unknown) versus multi-class
setting (1-5 star ratings)
Challenges
Less information for capturing a user’s taste than in the multi-class
setting
Sparser datasets than in the multi-class setting
7. Motivation
Existing OCCF approaches
Common concept
View all unrated items as negative preferences
e.g., WRMF, BPRMF
Challenge
Less effective in dealing with sparse datasets with many
unrated items
Figure: Accuracy of an OCCF method per sparsity
8. Motivation
Zero-injection [Hwang et al., ICDE 16]
Address the sparsity problem successfully in multi-class
setting
Define a novel notion of ‘uninteresting items’ (U-items, in
short) of a user
Items on which the user has “negative” preferences
Figure: Venn diagram for preferences of items
Whole items ($I$)
Interesting items ($I_u^{in}$)
Uninteresting items ($I_u^{un} = I - I_u^{in}$)
Evaluated items ($I_u^{eval}$)
Preferred items ($I_u^{pre}$)
11. Motivation
Zero-injection [Hwang et al., ICDE 16]
A naïve application of zero-injection in one-class setting
Lower accuracy than OCCF approaches
Sensitivity to the number of U-items
Figure: Accuracy in one-class setting: SVD and PMF with varying degrees of zero-injection, and WRMF without zero-injection
12. Overview of Our Approach: gOCCF
Graph-Theoretic One-Class Collaborative Filtering
(S1) Inferring the degree of interestingness
(S2) Determining the number of U-items without 𝜃
(S3) Graph Modeling and Recommendation
Overview of gOCCF
13. Employ a popular OCCF method (WRMF) [Pan et al.,
ICDM 08]
Steps
Treat all unrated items as negative preferences (i.e., value of 0)
Assign different weights to quantify the relative contribution
Predict the value of each user-item pair by matrix factorization
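Figure: WRMF example. The rating matrix is factorized into U (feature of user u) and V (feature of item i).
Unary rating matrix (observed entries only; "-" marks an unrated entry):
1 1 1 -
1 - - -
- - 1 -
Binary rating matrix:
1 1 1 0
1 0 0 0
0 0 1 0
Weight matrix:
1 1 1 0.6
1 0.2 0.2 0.2
0.2 0.2 1 0.2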
(S1) How to infer the degree of interestingness
14. Equations
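Objective function
$\mathcal{L}(X, Y) = \sum_{u} \left[ \sum_{i} w_{ui} \left\{ \left( p_{ui} - X_{u} Y_{i}^{T} \right)^{2} + \lambda \left( \lVert X_{u(\cdot)} \rVert_{F}^{2} + \lVert Y_{i(\cdot)} \rVert_{F}^{2} \right) \right\} \right]$
Updates elements in the matrices X and Y
$X_{u(\cdot)} = p_{u(\cdot)} \tilde{w}_{u(\cdot)} Y \left( Y^{T} \tilde{w}_{u(\cdot)} Y + \lambda \left( \sum_{i} w_{ui} \right) L \right)^{-1}$
$\tilde{w}_{u(\cdot)}$ is a diagonal matrix with the elements of $w_{u(\cdot)}$ on the
diagonal
Matrix $L$ is an identity matrix
$Y_{i(\cdot)} = p_{(\cdot)i}^{T} \tilde{w}_{(\cdot)i} X \left( X^{T} \tilde{w}_{(\cdot)i} X + \lambda \left( \sum_{u} w_{ui} \right) L \right)^{-1}$
Final value
$P \approx \hat{P} = X Y^{T}$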
(S1) How to infer the degree of interestingness
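As a rough illustration of the alternating updates of X and Y, the following NumPy sketch runs weighted alternating least squares on the 3x4 binary rating matrix and weight matrix shown on the WRMF slide. The function name `wrmf`, the rank, the value λ = 0.1, the iteration count, and the random initialization are assumptions made for illustration, not settings reported in the talk.

```python
import numpy as np

def wrmf(P, W, rank=2, lam=0.1, iters=20, seed=0):
    """Weighted ALS sketch: returns the dense prediction matrix X @ Y.T."""
    rng = np.random.default_rng(seed)
    n_users, n_items = P.shape
    X = rng.normal(scale=0.1, size=(n_users, rank))  # user features
    Y = rng.normal(scale=0.1, size=(n_items, rank))  # item features
    L = np.eye(rank)                                 # identity matrix
    for _ in range(iters):
        for u in range(n_users):
            Wu = np.diag(W[u])                       # diagonal weights of user u
            A = Y.T @ Wu @ Y + lam * W[u].sum() * L
            X[u] = np.linalg.solve(A, Y.T @ Wu @ P[u])
        for i in range(n_items):
            Wi = np.diag(W[:, i])                    # diagonal weights of item i
            A = X.T @ Wi @ X + lam * W[:, i].sum() * L
            Y[i] = np.linalg.solve(A, X.T @ Wi @ P[:, i])
    return X @ Y.T

# Binary rating matrix and weight matrix from the slide example.
P = np.array([[1, 1, 1, 0],
              [1, 0, 0, 0],
              [0, 0, 1, 0]], dtype=float)
W = np.array([[1.0, 1.0, 1.0, 0.6],
              [1.0, 0.2, 0.2, 0.2],
              [0.2, 0.2, 1.0, 0.2]])
P_hat = wrmf(P, W)
```

Each row update solves a small linear system instead of forming the inverse explicitly, which is equivalent because the system matrix is symmetric. The entries of `P_hat` for unrated pairs play the role of the inferred degree of interestingness.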
15. Brute-force way
Construct negative graphs 𝐺− by gradually increasing the # of U-items
Measure the accuracy of each negative graph 𝐺−
Choose the 𝐺− that provides the highest accuracy
Challenge: the search space for # of U-items
Large # of U-items
In MovieLens, there are about 1.5M unrated user-item pairs among
which U-items can be chosen
Prohibitively expensive to find the optimal # of U-items
Requires running a recommendation algorithm on every 𝐺− for a
given dataset
Dataset and algorithm dependent
(S2) How to determine the right number of U-items
16. Consider k unrated user-item pairs with the lowest
degree of interestingness as negative links
Starting from 10, k doubles until negative links reach 90% of
unrated pairs
Model them as a single bipartite (negative) graph 𝐺−
Examine each 𝐺− by analyzing graph properties
Topological properties
Information propagation
(S2) How to determine the right number of U-items
Figure: Candidate negative graphs
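The candidate generation described on this slide can be sketched as follows. The function names, the dictionary-based input format, and the choice to stop as soon as the next k would exceed 90% of the unrated pairs are assumptions for illustration; the slide does not say whether the last candidate is capped at exactly 90%.

```python
def candidate_sizes(num_unrated, start=10, max_fraction=0.9):
    """Return k = 10, 20, 40, ... while k stays within max_fraction of unrated pairs."""
    limit = int(max_fraction * num_unrated)
    sizes = []
    k = start
    while k <= limit:
        sizes.append(k)
        k *= 2
    return sizes

def negative_links(scores, rated, k):
    """Pick the k unrated (user, item) pairs with the lowest inferred interestingness.

    scores: dict mapping (user, item) -> inferred degree of interestingness
    rated:  set of (user, item) pairs that already have positive feedback
    """
    unrated = sorted((s, pair) for pair, s in scores.items() if pair not in rated)
    return [pair for _, pair in unrated[:k]]

# Hypothetical toy scores for two users and two items.
scores = {("u1", "a"): 0.9, ("u1", "b"): 0.1, ("u2", "a"): 0.05, ("u2", "b"): 0.7}
rated = {("u1", "a")}
links = negative_links(scores, rated, k=2)
```

Each value returned by `candidate_sizes` yields one candidate negative graph, whose links are the pairs returned by `negative_links` for that k.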
17. Topological property
Graph shattering theory [Appel et al., SDM 09]
Introduces a “shattering point”
Connectivity of a graph becomes seriously collapsed
As links are continuously removed in a random way
ShatterPlot: visualizing the process of generating the
shattering point
Figure: ShatterPlot for positive graph 𝐺+ of MovieLens 100K
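The idea of removing links at random and watching connectivity collapse can be sketched in a few lines. This is only an illustration: the size of the largest connected component is used here as a stand-in connectivity measure, and the function names, the number of steps, and the fixed seed are all assumptions; the ShatterPlot tool itself may track other graph measures.

```python
import random

def largest_component_size(num_nodes, edges):
    """Size of the largest connected component, computed with union-find."""
    parent = list(range(num_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    counts = {}
    for n in range(num_nodes):
        root = find(n)
        counts[root] = counts.get(root, 0) + 1
    return max(counts.values()) if counts else 0

def shatter_curve(num_nodes, edges, steps=5, seed=0):
    """Remove links in a random order and record (links kept, largest component size)."""
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    curve = []
    for s in range(steps + 1):
        keep = len(shuffled) * (steps - s) // steps
        curve.append((keep, largest_component_size(num_nodes, shuffled[:keep])))
    return curve

# Toy path graph 0-1-2-3; real use would pass the links of G+ or G-.
curve = shatter_curve(4, [(0, 1), (1, 2), (2, 3)], steps=4)
```

Plotting the second element of each pair against the first gives a curve on which a sharp drop would correspond to a shattering point under this simplified measure.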
18. Examine the change in topological properties of 𝐺−
ShatterPlot for 𝐺− has the shattering point
𝐺− has the topological properties most similar to those of 𝐺+ when its
number of links equals that of 𝐺+
Topological property
Figure: ShatterPlot for negative graph 𝐺− of MovieLens 100K
• ES-region: extremely sparse region
• S-region: shattered region
• R-region: real graph region
• D-region: dense region
X: real 𝐺+
22. Information propagation
gOCCF is based on graph analysis techniques
To analyze the information propagation in a graph
To make a recommendation list based on the analysis result
Effects of employing 𝐺−
Appear when information propagation of 𝐺+ can be changed
accordingly by that of 𝐺−
Figure: Examples of graph analysis techniques (panels: Random Walk with Restart, Belief Propagation)
23. Information propagation
Examine the PageRank scores of 𝐺−
𝐺− shows a power law-like distribution, similar to real 𝐺+,
when its number of links equals that of 𝐺+
When we employ 𝐺−, the existing propagation of 𝐺+ seems
to have significant changes
Negative scores with various values influence positive
scores of 𝐺+
Figure: PageRank scores for 𝐺+ and the representative graph in R-region of 𝐺− (panels: Real positive graph 𝐺+, R-region)
24. Information propagation
The results with different 𝐺−
It is difficult to change the existing propagation of 𝐺+ using
other 𝐺−
Figure: PageRank scores for the representative graphs in other regions of 𝐺− (panels: ES-region, S-region, R-region, D-region)
26. Information propagation
The results with different datasets
CiteULike
Figure: PageRank scores for 𝐺+ and the representative graphs of 𝐺− (panels: Real positive graph 𝐺+, ES-region, S-region, R-region, D-region)
27. (S3) How to recommend top-N items
Transform a dataset in one-class setting to one in
binary-class setting with both U-items and I-items
Model binary-class information as undirected graphs
Two separate graphs
Employ two variants of random walk with restart and
belief propagation
SeparateRWR
SeparateBP
Figure: Separate graphs 𝐺+ and 𝐺−
28. (S3) How to recommend top-N items
Transform a dataset in one-class setting to one in
binary-class setting with both U-items and I-items
Model binary-class information as undirected graphs
A single signed graph
Employ a variant of belief propagation
SignedBP
Figure: Signed graph 𝐺±
29. SeparateRWR
Perform RWR separately on each graph consisting of
only one type of links
Equations
$\vec{r} = \vec{r}^{(+)} - \vec{r}^{(-)}$
$\vec{r}^{(+)} = \alpha \bar{w}^{(+)} \vec{r}^{(+)} + (1 - \alpha) \vec{t}$
$\vec{r}^{(+)}$: positive ranking vector of all nodes
$\bar{w}^{(+)}$: normalized weight matrix built based on positive links
$\vec{r}^{(-)} = \alpha \bar{w}^{(-)} \vec{r}^{(-)} + (1 - \alpha) \vec{t}$
$\vec{r}^{(-)}$: negative ranking vector of all nodes
$\bar{w}^{(-)}$: normalized weight matrix built based on negative links
$\alpha$: a damping factor
$\vec{t}$: a personalization vector representing the target nodes to be
restarted
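The SeparateRWR equations can be sketched with a simple power iteration in NumPy. The column normalization of the adjacency matrices, the damping factor α = 0.85, the fixed number of iterations, and the function names are assumptions for illustration; the slides do not state these details.

```python
import numpy as np

def column_normalize(adj):
    """Divide each column by its sum; all-zero columns are left as zeros."""
    col_sums = adj.sum(axis=0, keepdims=True)
    col_sums[col_sums == 0] = 1.0
    return adj / col_sums

def rwr(adj, t, alpha=0.85, iters=100):
    """Iterate r = alpha * w_bar @ r + (1 - alpha) * t for a fixed number of steps."""
    w_bar = column_normalize(adj.astype(float))
    r = t.astype(float).copy()
    for _ in range(iters):
        r = alpha * (w_bar @ r) + (1 - alpha) * t
    return r

def separate_rwr(adj_pos, adj_neg, t, alpha=0.85):
    """Final score: positive ranking vector minus negative ranking vector."""
    return rwr(adj_pos, t, alpha) - rwr(adj_neg, t, alpha)

# Toy graph with node 0 = user, node 1 = item with a positive link,
# node 2 = item with a negative link (a U-item).
adj_pos = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]])
adj_neg = np.array([[0, 0, 1], [0, 0, 0], [1, 0, 0]])
t = np.array([1.0, 0.0, 0.0])  # restart at the target user
scores = separate_rwr(adj_pos, adj_neg, t)
```

In this toy case the positively linked item receives a positive score and the negatively linked item a negative one, so ranking by the combined score pushes U-items to the bottom of the list.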
31. SignedBP
Perform BP on a signed graph that contains both
positive and negative links together
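Equations
Message passing
$m_{ij}(\sigma) \leftarrow \begin{cases} \sum_{\sigma'} \phi_{i}(\sigma') \, \varphi_{ij}^{+}(\sigma', \sigma) \prod_{k \in N(i) \setminus j} m_{ki}(\sigma') & \text{if the link } (i, j) \text{ is positive} \\ \sum_{\sigma'} \phi_{i}(\sigma') \, \varphi_{ij}^{-}(\sigma', \sigma) \prod_{k \in N(i) \setminus j} m_{ki}(\sigma') & \text{if the link } (i, j) \text{ is negative} \end{cases}$
Belief score
$b_{i}(\sigma) = k \prod_{j \in N(i)} m_{ji}(\sigma)$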
32. Empirical validation
Datasets: MovieLens 100K, Watcha*, CiteULike
Metrics: Precision, Recall, nDCG, MRR, HLU
Competing Methods
Baseline: MostPopular
Graph-based methods: RWR, BP
Zero-injection methods: SVD_ZI, PMF_ZI
OCCF methods: WRMF, ldNMF, SLIM, BPRMF, GBPRMF
Dataset statistics
* Watcha is the largest Korean movie rating system
33. Questions to Be Answered
Q1: Is the degree of interestingness of the items
inferred by the WRMF useful for recommendation?
Q2: Are the U-items selected by our U-items decision
method useful for recommendation?
Q3: Is it useful to additionally exploit U-items in
existing graph-based recommendation methods?
Q4: Do our proposed OCCF methods provide more
accurate recommendations than existing OCCF
methods with a very sparse dataset?
34. Question 1
Accuracy metric: user u’s error rate
$err_{u}^{\theta} = \frac{\lvert I_{u}^{un}(\theta) \cap I_{u}^{test} \rvert}{\lvert I_{u}^{test} \rvert}$
How many rated items (in the test set) are selected as
uninteresting items (i.e., mis-classified) for u
WRMF and ldNMF are the most effective
Figure: Error rates of methods for finding U-items (MovieLens)
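The per-user error rate is the fraction of a user's test items that were wrongly selected as uninteresting. A minimal sketch, assuming items are represented as hashable identifiers and using a hypothetical function name:

```python
def error_rate(uninteresting_items, test_items):
    """Fraction of the user's test items that were selected as U-items."""
    test = set(test_items)
    if not test:
        raise ValueError("the user has no test items")
    return len(set(uninteresting_items) & test) / len(test)

# Hypothetical example: one of the two test items was mis-classified as a U-item.
rate = error_rate({"a", "b", "c"}, {"b", "d"})
```

A lower value means that fewer items the user actually rated ended up among the selected U-items.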
35. gOCCF provides the best accuracy when having the
same number of negative links as that of positive links
Question 2
Figure: Accuracy according to # of U-items (MovieLens) (panels: SeparateRWR, SeparateBP, SignedBP)
36. Question 3
Utilizing U-items in gOCCF is effective in improving the
accuracy significantly
Figure: Accuracy before and after exploiting U-items (MovieLens)
43. Motivation
E-recruitment sites
Important places for both job seekers and recruiters
Examples: Reed, Indeed, AskStory
Limitation of current AskStory services
Provides only keyword-based search
Finding job openings is very time-consuming for a job seeker
Our proposal
A hybrid approach exploiting content and collaborative data
JobRec (ACM SAC 16, IEEE SMC 17)
Figure: E-recruitment site (labels: job seekers, recruiters, job openings database, resumes database; actions: upload, search, apply, post, check)
44. JobRec (ACM SAC 16, IEEE SMC 17)
Our approach
Overview
Input: a resume of a target job seeker
Output: a list of top-N job openings
Datasets
Resume posted by job seekers
Consists of a series of job records representing
her/his career experiences
A job record consists of a company
name, a job title, a working period, and a
description
Job opening posted by recruiters
Consists of descriptions for a company and a job
title
Figure: Overview of the JobRec process (labels: target job seeker, request, resumes, job openings, resume database, job opening database, preprocessing, data modeling, preference inference, matching, recommendation)
45. JobRec (ACM SAC 16, IEEE SMC 17)
Our approach
Overall process
Data preprocessing and storage
Extracting content and collaborative information
Graph modeling for the information extracted
CF graph
Hybrid graph
Assign content-based weights on the original links of the CF graph
Add new links based on content-based similarities to the CF graph
Analyzing the graph by using graph analysis algorithms
Recommending the most suitable N job openings to a user
Figure: Graph modeling (panels: User-Company graph linking job seekers 1-4 to companies C1-C4; User-JobTitle graph linking job seekers 1-4 to job titles J1-J3; combined graph linking job seekers 1-4 to J1-J3 and C1-C4)
46. JobRec (ACM SAC 16, IEEE SMC 17)
Results
RWR algorithm outperformed other algorithms consistently
Hybrid approach outperformed other approaches consistently
Figure: Results (panels: Different algorithms, Different recommendation approaches)
47. PaperRec (IEEE SMC 16)
Motivation
Digital-bibliography service provider
Finding relevant literature to understand a trend of their research
topic
Examples: Google scholar, DBLP, DBpia
Limitation of current DBpia services
Provides only search functions
Finding relevant papers is very time-consuming for a researcher
Our proposal
A hybrid approach exploiting content and collaborative data
Figure: Digital-bibliography service (labels: researchers, journals, conference proceedings, articles; actions: search, download, read)
48. PaperRec (IEEE SMC 16)
Our approach
Four datasets used for recommendation
Content information of each paper
Consists of the title, the abstract, the body, and
the keywords of the paper
Citation relationships between papers
A citation relationship of a paper represents a
list of other papers cited by the paper
Researcher-to-paper history
For a researcher, it consists of information on
other papers downloaded by the researcher
Paper-to-paper history
For a paper, it consists of information for other
papers downloaded together with the paper by
researchers
49. PaperRec (IEEE SMC 16)
Our approach
Idea
To exploit both content-based similarity and the graph-based network
Proposed Approach
Content-based approach
Based on content-based similarity between papers
Supplemented by information of similar users
Graph-based approach
Graph modeling using citation relationships between papers
To employ belief propagation (BP) algorithm that probabilistically
determines a target user’s preference on an item
Hybrid approach
To combine the results obtained from the content-based and graph-based
approaches
𝑤: weight of the content-based approach
(1 − 𝑤): weight of the graph-based approach
$f_{hybrid} = w \cdot f_{content} + (1 - w) \cdot f_{graph}$
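The weighted combination can be illustrated with a short sketch. The function names, the dictionary inputs, and the example scores are hypothetical; the slides do not say how the two scores are scaled before being combined, so this sketch assumes they are already on a comparable scale.

```python
def hybrid_score(f_content, f_graph, w):
    """Weighted sum of the content-based and graph-based scores, with 0 <= w <= 1."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return w * f_content + (1.0 - w) * f_graph

def rank_papers(content_scores, graph_scores, w):
    """Rank the papers scored by both approaches by their hybrid score, best first."""
    common = content_scores.keys() & graph_scores.keys()
    return sorted(
        common,
        key=lambda p: hybrid_score(content_scores[p], graph_scores[p], w),
        reverse=True,
    )

# Hypothetical scores for three papers.
content = {"p1": 0.8, "p2": 0.2, "p3": 0.5}
graph = {"p1": 0.1, "p2": 0.9, "p3": 0.5}
ranking = rank_papers(content, graph, w=0.75)
```

Setting w = 1 reduces the ranking to the content-based approach alone, and w = 0 to the graph-based approach alone.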
50. PaperRec (IEEE SMC 16)
Results
Utilizing content and graph
information together provides
higher accuracy than using
either information source
alone
DBpia Insight
http://insight.dbpia.co.kr/
Figure: Results for different recommendation approaches
51. Motivation
A large number of channels (>= 100 channels)
Very time-consuming task for a user to find TV shows that they prefer
TVshowRec (Submitted to IJCAI 18)
Figure: Data in a TV show recommendation
52. TVshowRec (Submitted to IJCAI 18)
Our approach
Core concepts
A user 𝑢’s ‘watching interval’
The range between the starting and ending times of 𝑢’s watching TV
shows
A user 𝑢’s ‘watchable interval’ for an episode
A certain timeframe within which 𝑢 can watch the episode
The overlapped interval between 𝑢’s ‘watching interval’ and the
episode’s broadcasting interval
A user 𝑢’s ‘watchable episodes’
The set of episodes, determined by her ‘watching interval’, that 𝑢
can give feedback to
Idea
A user 𝑢’s preference on an episode can be inferred through the
analysis of her feedback given within her watchable interval
for the episode
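The interval definitions above can be made concrete with a small sketch. Representing intervals as (start, end) pairs of numbers (for example, hours of the day), the function names, and the toy schedule are assumptions for illustration only.

```python
def watchable_interval(watching, broadcast):
    """Overlap between a user's watching interval and an episode's broadcasting interval.

    Both arguments are (start, end) pairs; returns None when they do not overlap.
    """
    start = max(watching[0], broadcast[0])
    end = min(watching[1], broadcast[1])
    return (start, end) if start < end else None

def watchable_episodes(watching, schedule):
    """Episodes whose broadcasting interval overlaps the user's watching interval."""
    return [ep for ep, interval in schedule.items()
            if watchable_interval(watching, interval) is not None]

# Hypothetical schedule: episode id -> (start hour, end hour).
schedule = {"news": (18, 19), "drama": (21, 23), "movie": (23, 25)}
episodes = watchable_episodes((19, 22), schedule)
```

Feedback observed inside the returned interval is the evidence from which the user's preference on that episode would be inferred; episodes outside the watching interval carry no such evidence.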
53. TVshowRec (Submitted to IJCAI 18)
Results
Our proposed framework provides the highest accuracy in all test sets and
with all metrics
Figure: Final recommendation accuracy in terms of nDCG
54. References
gOCCF
Yeon-Chang Lee, Sang-Wook Kim, and Dongwon Lee, “gOCCF: Graph-Theoretic One-Class Collaborative Filtering Based on Uninteresting Items,” In AAAI 2018
JobRec
Yujin Lee*, Yeon-Chang Lee*, Jiwon Hong, and Sang-Wook Kim, “Exploiting Job Transition Patterns for Effective Job Recommendation,” In IEEE SMC 2017 (*co-first authors with equal contribution)
Yeon-Chang Lee, Jiwon Hong, and Sang-Wook Kim, “Job Recommendation in AskStory: Experiences, Methods, and Evaluation,” In ACM SAC 2016
Yeon-Chang Lee, Jiwon Hong, Sang-Wook Kim, Sheng Gao, and Ji-Yong Hwang, “On Recommending Job Openings,” In ACM HT 2015
PaperRec
Yeon-Chang Lee, Jungwan Yeom, Kiburm Song, Jiwoon Ha, Kichun Lee, Jangho Yeo, and Sang-Wook Kim, “Recommendation of Research Papers in DBpia: A Hybrid Approach Exploiting Content and Collaborative Data,” In IEEE SMC 2016
TVshowRec
Kyung-Jae Cho, Yeon-Chang Lee, Kyungsik Han, and Sang-Wook Kim, “TV Show Recommendation using Watchable Episodes,” In IJCAI 2018 (Submitted)