Toward Recommendation for Upskilling:
Modeling Skill Improvement and
Item Difficulty in Action Sequences
ICDE 2020 – Research 3 (Recommendation Systems)
Kazutoshi Umemoto
(The University of Tokyo)
Tova Milo
(Tel Aviv University)
Masaru Kitsuregawa
(The University of Tokyo)
“Life is a series of practices”
Cooking dishes · Playing the guitar · Learning a second language
• We don’t know what to do at first
• Through many trials and errors, we gain knowledge, ability, or skill
Motivation
Can information systems help people improve their skills?
• Repeating easy tasks would not improve people’s skills
• Too-challenging tasks would be difficult to complete
Goal: Recommender Systems for Upskilling
[Figure: a user at skill = 3 is shown an item with difficulty = 3.3 ("This item helps you improve your skill") and progresses to skill = 4]
Challenges
1. Understanding the dynamics of user skills
   • How high is the target user’s skill level at a particular time?
   • When does his/her skill improve?
2. Estimating the difficulty of each item
   • Which items are more difficult (and why)?
3. Recommending items suitable for upskilling
   • How to find challenging and interesting items?
   • When and how to present such items?
The scope of the present work: challenges 1 and 2
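Problem Definition
Action sequences $\mathcal{A}$ are given as input,
where each action $a = (t, u, i)$ consists of time $t$, user $u \in \mathcal{U}$, and item $i \in \mathcal{I}$
• Items can be recipes, movies, ...
• We don’t include user feedback (e.g., rating scores) in actions to cover diverse domains
[Figure: users $u_1, u_2, u_3 \in \mathcal{U}$ select items $i_1, i_2, i_3 \in \mathcal{I}$ over time $t$; the actions of each user form a sequence $\mathcal{A}_{u_1}, \mathcal{A}_{u_2}, \mathcal{A}_{u_3} \subset \mathcal{A}$]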
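As a minimal sketch of this input format, the snippet below represents each action as a (time, user, item) triple and groups the actions into one time-ordered sequence per user. The names `Action` and `group_by_user` are hypothetical and do not come from the slides; integer timestamps and string identifiers are assumptions made for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Action(NamedTuple):
    """One action a = (t, u, i); no rating or other feedback is stored."""
    time: int   # assumed to be an integer timestamp
    user: str
    item: str


def group_by_user(actions):
    """Return {user: [actions ordered by time]}."""
    sequences = defaultdict(list)
    # NamedTuples compare field by field, so sorting orders by time first.
    for action in sorted(actions):
        sequences[action.user].append(action)
    return dict(sequences)


actions = [
    Action(3, "u1", "recipe_b"),
    Action(1, "u1", "recipe_a"),
    Action(2, "u2", "recipe_a"),
]
sequences = group_by_user(actions)
```

Here `sequences["u1"]` holds the two actions of user `u1` in chronological order, which is the per-user sequence that the skill model consumes.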
Problem Definition (cont.)
Problem 1 (Skill Improvement)
Determine the skill level $s_{u,t} \in \mathcal{S} = \{1, \dots, S\}$ of each user $u$ at each time $t$
when $u$ takes an action $a = (t, u, i)$
Problem 2 (Item Difficulty)
Determine the difficulty level $d_i \in [1, S]$ of each item $i$
Requirements
• The various improvement patterns should be captured
• The estimation should be robust for rare items
Modeling Skill Improvement
Basic Ideas|Progression Model (Yang et al., 2014)
• Users with different skill levels tend to select different items:
  $P(i \mid s) \neq P(i \mid s')$ for $s \neq s'$
• User skill is monotonically non-decreasing with respect to time:
  $t \le t' \implies s_{u,t} \le s_{u,t'}$
[Figure: selection distributions over items $i_1, \dots, i_4$ at skill levels $s = 1$ (beginner), $s = 2$ (intermediate), and $s = 3$ (expert)]
Inference
1. Initialize skill assignments by dividing each sequence into equal-sized segments
2. Repeat until convergence:
   • Update item selection distributions $P(i \mid s)$ using skill assignments
   • Find the optimal skill improvement path maximizing the joint selection probability
Please refer to our paper for more technical details
[Figure: a sequence of items $i_1, \dots, i_4$ segmented into skill levels $s = 1, 2, 3$]
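The alternating procedure on this slide can be sketched as follows. This is an illustrative reading, not the authors' implementation: it assumes items are integer IDs, uses add-alpha smoothing for the selection distributions, and lets a path start at any level and move to any equal or higher level. The function names and the `alpha` and `max_iter` parameters are hypothetical.

```python
import math
from collections import Counter


def init_levels(n, S):
    """Split a length-n sequence into S near-equal segments labelled 1..S."""
    return [min(S, 1 + (k * S) // n) for k in range(n)]


def fit_selection(seqs, levels, S, n_items, alpha=1.0):
    """Estimate log P(i | s) from current assignments (add-alpha smoothing)."""
    counts = [Counter() for _ in range(S)]
    for seq, lv in zip(seqs, levels):
        for item, s in zip(seq, lv):
            counts[s - 1][item] += 1
    logp = []
    for c in counts:
        total = sum(c.values()) + alpha * n_items
        logp.append([math.log((c[i] + alpha) / total) for i in range(n_items)])
    return logp


def best_path(seq, logp, S):
    """Dynamic program over non-decreasing level paths (max log-likelihood)."""
    if not seq:
        return []
    n = len(seq)
    score = [[-math.inf] * S for _ in range(n)]
    back = [[0] * S for _ in range(n)]
    for s in range(S):
        score[0][s] = logp[s][seq[0]]
    for k in range(1, n):
        best_prev, best_s = -math.inf, 0
        for s in range(S):
            # Running maximum over previous levels <= s enforces monotonicity.
            if score[k - 1][s] > best_prev:
                best_prev, best_s = score[k - 1][s], s
            score[k][s] = best_prev + logp[s][seq[k]]
            back[k][s] = best_s
    s = max(range(S), key=lambda x: score[n - 1][x])
    path = [s]
    for k in range(n - 1, 0, -1):
        s = back[k][s]
        path.append(s)
    return [p + 1 for p in reversed(path)]


def train(seqs, S, n_items, max_iter=50):
    """Alternate between updating distributions and re-assigning levels."""
    levels = [init_levels(len(q), S) for q in seqs]
    logp = fit_selection(seqs, levels, S, n_items)
    for _ in range(max_iter):
        new_levels = [best_path(q, logp, S) for q in seqs]
        if new_levels == levels:  # assignments stopped changing
            break
        levels = new_levels
        logp = fit_selection(seqs, levels, S, n_items)
    return levels, logp


seqs = [[0, 0, 1, 2, 3, 3], [0, 1, 1, 3, 2, 2], [1, 0, 2, 3]]
levels, logp = train(seqs, S=2, n_items=4)
```

In this toy input, items 0 and 1 appear early and items 2 and 3 appear late in every sequence, so the learned paths switch from level 1 to level 2 in the middle of each sequence.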
Rare Item Issue
The base progression model (Yang et al., 2014) uses item IDs alone to model item selection:
$P(i \mid s) = P(\mathrm{ID}(i) \mid s)$
This makes the skill estimation unreliable for rare items
(e.g., an item that appears only once in all sequences)
Multi-faceted features
$P(i \mid s) = P(i_1 = \mathrm{ID}(i), \dots, i_F \mid s) = \prod_{f=1}^{F} P_f(i_f \mid s)$
We use multiple facets $i_1, \dots, i_F$ shared across items
(e.g., ingredients, cooking time, the number of steps, ... for recipes)
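To illustrate why shared facets help with rare items, the sketch below scores an item as a product of per-facet probabilities (a sum in log space). It treats every facet as categorical with add-alpha smoothing, which is a simplifying assumption; numeric facets such as cooking time would likely need a different distribution. The names `fit_facets` and `log_prob` and the toy facet values are hypothetical.

```python
import math
from collections import Counter, defaultdict


def fit_facets(actions, alpha=1.0):
    """actions: iterable of (facets, skill); facets is a tuple of categorical values."""
    counts = defaultdict(Counter)  # (facet index, skill) -> value counts
    vocab = defaultdict(set)       # facet index -> values seen in training
    for facets, s in actions:
        for f, value in enumerate(facets):
            counts[(f, s)][value] += 1
            vocab[f].add(value)
    return counts, vocab, alpha


def log_prob(model, facets, s):
    """log P(i | s) = sum over facets f of log P_f(i_f | s)."""
    counts, vocab, alpha = model
    total = 0.0
    for f, value in enumerate(facets):
        c = counts[(f, s)]
        # The extra +1 reserves probability mass for values never seen in training.
        denom = sum(c.values()) + alpha * (len(vocab[f]) + 1)
        total += math.log((c[value] + alpha) / denom)
    return total


# Facets: (item ID, cooking-time bucket). Skill levels come from the skill model.
train_actions = [
    (("r1", "short"), 1),
    (("r2", "short"), 1),
    (("r3", "long"), 2),
]
model = fit_facets(train_actions)
unseen = ("r4", "long")  # a new item ID that shares the "long" facet with r3
```

Although the ID `r4` never appears in the training actions, the shared `"long"` facet makes `log_prob(model, unseen, 2)` larger than `log_prob(model, unseen, 1)`, so the item still carries evidence about the skill level.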
Modeling Item Difficulty
Basic Idea|Learning from Skill Model
Users usually select items whose difficulty is not greater than their skill
[Figure: a hard item is selected only by experts (E), whereas an easy item is selected by both beginners (B) and experts (E)]
Assignment-based Estimation
The difficulty $d_i$ of an item $i$
= the mean skill level of users who select $i$ in their actions
(skill levels are estimated by our skill model)
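A minimal sketch of the assignment-based estimate: average the estimated skill levels over all actions that select an item. The function name and the toy data are hypothetical.

```python
from collections import defaultdict


def assignment_difficulty(actions):
    """actions: iterable of (item, skill), skill being the level estimated at selection time."""
    total = defaultdict(float)
    count = defaultdict(int)
    for item, skill in actions:
        total[item] += skill
        count[item] += 1
    return {item: total[item] / count[item] for item in total}


observed = [("salad", 1), ("salad", 1), ("salad", 3), ("souffle", 3)]
difficulty = assignment_difficulty(observed)
```

With only one action for `"souffle"`, its estimate equals the skill level of the single user who selected it, which is where this estimator becomes fragile.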
Rare Item Issue
The estimation goes wrong for easy items selected by a single expert:
such an item is estimated as difficult since the expert has a high skill level
Generation-based Estimation
The difficulty $d_i$ of an item $i$
= the expected skill level that is assigned to $i$
(calculated by our skill model)
Two approaches
• uniform distr.
• empirical distr.
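The formula on this slide did not survive extraction, so the following is only a plausible reading: take the expectation of the skill level under a posterior obtained by Bayes' rule, $P(s \mid i) \propto P(i \mid s) P(s)$, where the prior $P(s)$ is either uniform or the empirical frequency of skill levels. Both the use of Bayes' rule and the interpretation of the two approaches as choices of prior are assumptions, and the function name is hypothetical.

```python
def generation_difficulty(p_item_given_skill, prior):
    """Return E[s | i], assuming P(s | i) is proportional to P(i | s) * P(s).

    p_item_given_skill[k] is P(i | s = k + 1) from the skill model;
    prior[k] is P(s = k + 1), uniform or empirical (assumed interpretation).
    """
    joint = [p * q for p, q in zip(p_item_given_skill, prior)]
    normalizer = sum(joint)
    return sum((k + 1) * j for k, j in enumerate(joint)) / normalizer


S = 3
uniform_prior = [1 / S] * S
empirical_prior = [0.5, 0.3, 0.2]   # hypothetical share of actions per skill level

likelihood = [0.1, 0.1, 0.1]        # an item equally likely at every level
d_uniform = generation_difficulty(likelihood, uniform_prior)
d_empirical = generation_difficulty(likelihood, empirical_prior)
```

Because the estimate depends on the model's selection probabilities at every skill level rather than on who happened to select the item, an easy item chosen once by an expert is not automatically pushed to the top level.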
Experiments
Q1 (Interpretability)
Can our skill model capture domain-dependent skills?
Q2 (Accuracy)
How accurate are our skill and difficulty models?
Q3 (Usefulness)
How useful are our models for practical recommendation?
Q4 (Efficiency)
How efficiently can we train our models?
Datasets
Domain     Source          $|\mathcal{U}|$ (# users)  $|\mathcal{I}|$ (# items)  $|\mathcal{A}|$ (# actions)
Language   Lang-8          51,644                     248,009                    248,009
Cooking    Rakuten Recipe  6,012                      37,092                     115,337
Beer       RateBeer        4,540                      8,953                      1,986,231
Film       MovieLens       85,095                     4,589                      8,508,819
Synthetic  (N/A)           10,000                     50,000                     500,491
For domains without prior knowledge,
the number of skill levels was determined by using hold-out data
[Figure: held-out log-likelihood (×10^5) versus the number of skill levels (1 to 7) on the Cooking dataset]
Q1 (Interpretability)|Component Analysis
Language
Error correction count
[Figure: probability density of the mean correction count for skill levels 1, 2, and 3]
Learners with higher skills made fewer errors
Error correction rules (∅ = nothing, i.e., the correction inserts or deletes text)
Low skill
  Before      After
  “i”         “I”
  ∅           “I”
  “english”   “English”
  ∅           “a”
  ∅           “.”
• Capitalization
• Missing punctuation
High skill
  Before      After
  ∅           “the”
  ∅           “(”
  ∅           “)”
  “the”       ∅
  ∅           “of”
• Misuse of articles (Yamada and Matsuura, 1982)
• Comments (not errors)
Our skill model can capture the domain-dependent skill improvement successfully
Q1 (Interpretability)|Component Analysis (cont.)
Film
Frequently watched movies
Low skill
  Title                    Year
  Pulp Fiction             1994
  Star Wars: Episode IV    1977
  Star Wars: Episode VI    1983
  Star Wars: Episode V     1980
  Batman                   1989
• Newly released movies
• Light movies
High skill
  Title                    Year
  Rear Window              1954
  The Sound of Music       1965
  The Graduate             1967
  It’s a Wonderful Life    1946
  The Birds                1963
• Not necessarily widely appealing
• Classic movies
Our skill model can capture the domain-dependent skill improvement successfully
Q2 (Accuracy)|Setting
• Dataset
❯ Synthetic (only the dataset containing ground truth)
• Measures
❯ Correlations: Pearson’s r, Spearman’s ρ, Kendall’s τ
❯ Error: Root Mean Square Error (RMSE)
• Methods
❯ Uniform (same as initialization)
❯ ID (Yang et al., 2014): item $i = (i_1 = \mathrm{ID}(i))$
❯ Multi-faceted (proposed): item $i = (i_1 = \mathrm{ID}(i), \dots, i_F)$
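For reference, the four measures can be computed as in the sketch below, written in plain Python to stay self-contained. It uses average ranks for Spearman's ρ and the simple tau-a variant of Kendall's τ; which tie-handling variant the experiments used is not stated on the slide, so that choice is an assumption. The toy `truth` and `estimate` lists are made up.

```python
import math


def pearson(x, y):
    """Pearson's r between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    k = 0
    while k < len(order):
        j = k
        while j + 1 < len(order) and values[order[j + 1]] == values[order[k]]:
            j += 1
        for m in range(k, j + 1):
            result[order[m]] = (k + j) / 2 + 1
        k = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on ranks."""
    return pearson(ranks(x), ranks(y))


def kendall(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    s = 0
    for a in range(n):
        for b in range(a + 1, n):
            p = (x[a] - x[b]) * (y[a] - y[b])
            s += (p > 0) - (p < 0)
    return s / (n * (n - 1) / 2)


def rmse(x, y):
    """Root mean square error between estimates and ground truth."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


truth = [1, 2, 3, 4]
estimate = [1.5, 2.5, 3.5, 4.5]
```

The correlations reward getting the ordering of users or items right, while RMSE also penalizes a constant offset such as the 0.5 shift in this toy example.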
Q2 (Accuracy)|Result
Skill Estimation
[Figure: bar chart of Pearson’s r, Spearman’s ρ, Kendall’s τ, and RMSE for Uniform, ID, and Multi-faceted]
• Multi-faceted model outperformed baselines (uniform and ID)
Difficulty Estimation
[Figure: bar chart of Pearson’s r, Spearman’s ρ, Kendall’s τ, and RMSE for Uniform + Assignment, ID + Assignment, ID + Uniform, ID + Empirical, Multi-faceted + Assignment, Multi-faceted + Uniform, and Multi-faceted + Empirical]
• Generation-based models (uniform and empirical) performed better than the assignment-based model
Q3 (Usefulness)|Rating Prediction
• Setting
❯ Dataset: Beer (which contains rating data)
❯ Target: Rating score ∈ [0, 5] of the last item in each user sequence
❯ Measure: RMSE
❯ Methods: Field-aware Factorization Machine (FFM) with different features
― Base: user and item (U+I)
― Extended by: skill (U+I+S), difficulty (U+I+D), and both (U+I+S+D)
• Result (RMSE; lower is better)
  U+I     U+I+S   U+I+D   U+I+S+D
  0.571   0.562   0.568   0.561
The estimated skill and difficulty levels both contributed to performance improvement
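One way to feed these feature sets to an FFM is to write each training example in the `label field:feature:value` text format used by libffm-style tools. The sketch below builds such lines for the four configurations. How the experiments actually encoded skill and difficulty is not stated on the slide: treating skill as a categorical feature and difficulty as a single numeric feature is an assumption, and `make_encoder` is a hypothetical helper.

```python
def make_encoder():
    """Return an encode() function that shares one feature-index table."""
    index = {}

    def feature_id(name):
        # Assign the next free integer to a feature the first time it is seen.
        return index.setdefault(name, len(index))

    def encode(rating, user, item, skill=None, difficulty=None):
        # Fields: 0 = user, 1 = item, 2 = skill, 3 = difficulty.
        columns = [
            f"0:{feature_id(('user', user))}:1",
            f"1:{feature_id(('item', item))}:1",
        ]
        if skill is not None:
            columns.append(f"2:{feature_id(('skill', skill))}:1")
        if difficulty is not None:
            columns.append(f"3:{feature_id(('difficulty',))}:{difficulty:g}")
        return f"{rating:g} " + " ".join(columns)

    return encode


encode = make_encoder()
line_ui = encode(4.2, "u1", "beer_a")                               # U+I
line_uisd = encode(3.5, "u1", "beer_b", skill=2, difficulty=3.3)    # U+I+S+D
```

Omitting `skill` or `difficulty` yields the U+I+D or U+I+S variants, so the same encoder covers all four configurations compared in the result table.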
Q4 (Efficiency)|Running Time
• Setting
❯ Parallelize the computation for learning the skill model
(the most time-consuming process)
• Result
[Figure: running time (hours) versus thread count (1 to 5) for the ID and Multi-faceted methods]
❯ The multi-faceted model requires additional computation for skill estimation
❯ The increased running time can be reduced to a large extent by parallelization
Summary
Contributions
1. Introduced skill improvement and item difficulty as core problems to address
2. Proposed skill and difficulty models that utilize multi-faceted item features to
   improve the robustness against sparse data
3. Conducted experiments with five datasets, demonstrating the interpretability,
   accuracy, usefulness, and efficiency of the proposed models
Future work
• Understanding user satisfaction for improving our skill and difficulty models
• Developing a dedicated recommendation algorithm for upskilling
• Exploring suitable timing, interaction, etc. for the recommendation
[Figure: users ranging from beginner to expert matched with items ranging from very easy to very hard]
Toward recommendation for upskilling...
Related Work
Sequential recommendation
• Suggest items related to recently selected ones
• Similar in that both capture the ordering patterns of actions
• We focus on quantifying users’ skill levels
Progression modeling
• Infer the dynamics of invisible states that affect observable outcomes
(e.g., disease stages)
• We use the model of Yang et al. (2014) as the basis for skill estimation
and extend it to improve the robustness against rare items
Knowledge tracing
• Understand the knowledge of students interacting with exercises
• We don’t consider feedback from users to cover more diverse domains
