Recommender system for education

May 22, 2017
Changho Suh
EE, KAIST
Joint work w/ Kangwook Lee & Jichan Chung

Recommendation system
A system that recommends favorite items to users
Examples:

Algorithm for recommendation?
• Matrix completion!
• Goal: Complete missing entries from partially observed entries
• Why related to recommender system?
• Example:

Partially observed ratings (observed entries per user, left to right; missing entries not shown):
User 1: 3 4 5 5 2 2
User 2: 1 2 4 3 4
User 3: 2 4 2 2 4
User 4: 5 4 4 3
User 5: 2 3 1 3 5 1 1

Completed ratings:
         Movie1  Movie2  Movie3  Movie4  Movie5  Movie6  Movie7  Movie8  Movie9  Movie10
User 1     3       4       5       4       3       5       5       2       2       2
User 2     1       2       3       5       2       4       3       3       4       4
User 3     5       2       3       4       2       5       3       2       4       3
User 4     5       5       4       3       4       4       1       3       5       2
User 5     2       3       3       1       3       4       5       2       1       1
recommend!
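Once the matrix is completed, the recommendation step itself is simple. The sketch below uses the completed ratings from the example; the observation mask is hypothetical (the text does not say which entries were originally observed), and `recommend` is an illustrative helper, not part of the talk.

```python
import numpy as np

# Completed rating matrix from the example (rows: User 1-5, columns: Movie1-10).
completed = np.array([
    [3, 4, 5, 4, 3, 5, 5, 2, 2, 2],
    [1, 2, 3, 5, 2, 4, 3, 3, 4, 4],
    [5, 2, 3, 4, 2, 5, 3, 2, 4, 3],
    [5, 5, 4, 3, 4, 4, 1, 3, 5, 2],
    [2, 3, 3, 1, 3, 4, 5, 2, 1, 1],
])

# Hypothetical mask (assumption): every user has already rated Movie1-Movie5.
observed = np.zeros(completed.shape, dtype=bool)
observed[:, :5] = True

def recommend(completed, observed):
    """Per user, index of the unseen movie with the highest predicted rating (-1 if none)."""
    scores = np.where(observed, -np.inf, completed.astype(float))
    best = scores.argmax(axis=1)
    best[observed.all(axis=1)] = -1  # nothing left to recommend
    return best

best_movie = recommend(completed, observed)  # 0-based column indices
```

Ties are broken in favor of the lowest movie index, which is just a property of `argmax`.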

Connection to education
• A common way to educate students is via problem solving!
• How to recommend problem sets via matrix completion?
Problem sets solved by a typical high school student in Korea
> 50m copies

Education via matrix completion

Partially observed responses (O: correct, X: incorrect; observed entries per student, left to right; missing entries not shown):
Student 1: O O X O X O O X
Student 2: O O O
Student 3: O X O O X X
Student 4: O O X O
Student 5: O X O O O O

Completed responses:
            Q1  Q2  Q3  Q4  Q5  Q6  Q7  Q8  Q9  Q10
Student 1   O   O   X   O   X   O   X   O   O   X
Student 2   O   O   O   O   O   O   X   O   O   O
Student 3   O   X   O   X   O   X   O   O   X   X
Student 4   O   X   O   X   O   X   O   O   O   O
Student 5   O   O   X   X   O   O   O   O   X   O
recommend!

Two-step approach for matrix completion
Assume: There exists a latent matrix P.
P_ij: Probability that user i answers problem j correctly
Step 1: Estimate P using test responses (O/X)
Step 2: Flip coins with parameters in P to obtain the predicted binary matrix
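Step 2 can be sketched as follows. The matrix `P_hat` is a made-up stand-in for the output of step 1, and `predict_binary` is a hypothetical helper name.

```python
import numpy as np

def predict_binary(P, rng):
    """Draw each entry as an independent coin flip with success probability P[i, j]."""
    return (rng.random(P.shape) < P).astype(int)

# Hypothetical estimate of P for 2 users and 3 problems (step 1 is not shown here).
P_hat = np.array([[0.9, 0.2, 1.0],
                  [0.5, 0.0, 0.7]])

rng = np.random.default_rng(0)
pred = predict_binary(P_hat, rng)  # 1 = O (correct), 0 = X (incorrect)
```

Entries with probability 1 or 0 are always predicted as O or X respectively; everything in between is random.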

Structure of the latent matrix P
Observation: There are only a few concepts that constitute problem sets.
P_ij: Probability that user i answers problem j correctly
L (user features): L_ij is user i’s understanding level on concept j
R (question features): R_ij is concept j’s contribution level to problem i
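A small numerical sketch of this structure, under two assumptions that are not stated explicitly in the text: that the latent matrix factors as P = L Rᵀ, and that each question's concept weights are nonnegative and sum to 1. The sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_questions, n_concepts = 6, 8, 3  # arbitrary sizes for illustration

# L[i, k]: user i's understanding level on concept k, in [0, 1].
L = rng.random((n_users, n_concepts))

# R[j, k]: concept k's contribution level to problem j.
# Assumption: each row is nonnegative and sums to 1.
R = rng.random((n_questions, n_concepts))
R /= R.sum(axis=1, keepdims=True)

# P[i, j]: probability that user i answers problem j correctly (assumed P = L R^T).
P = L @ R.T
rank = np.linalg.matrix_rank(P)
```

With these assumptions every entry of `P` is a weighted average of understanding levels, so it stays in [0, 1], and the rank of `P` cannot exceed the number of concepts.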

An earlier approach
1. Estimate R (question features) with the help of experts.
2. It then reduces to a linear regression problem.
3. Estimate L (user features) by solving the linear regression problem.
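With R fixed, each user's feature vector can be fitted separately. The sketch below uses ordinary least squares on the observed entries and assumes the model P = L Rᵀ; the function name and the toy numbers are illustrative only, and the talk does not specify the exact regression used.

```python
import numpy as np

def estimate_L(Y, observed, R):
    """Least-squares fit of each user's row of L from that user's observed responses.

    Y: (users x questions) responses, 1 = O, 0 = X (ignored where not observed).
    observed: boolean mask with the same shape as Y.
    R: (questions x concepts) question features, e.g. provided by experts.
    """
    n_users, n_concepts = Y.shape[0], R.shape[1]
    L = np.zeros((n_users, n_concepts))
    for i in range(n_users):
        idx = observed[i]
        if idx.any():
            # Solve min_l || R[idx] @ l - Y[i, idx] ||^2 for user i.
            L[i] = np.linalg.lstsq(R[idx], Y[i, idx], rcond=None)[0]
    return L

# Toy example (made-up numbers): 4 questions, 2 concepts, 2 users.
R = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.2, 0.8]])
Y = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 1.0, 1.0]])
observed = np.array([[True, True, True, False],
                     [False, True, True, True]])
L_hat = estimate_L(Y, observed, R)
```

Plain least squares does not constrain the fitted values to [0, 1]; a practical version would clip the predictions or use a constrained fit.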

How to estimate R?
Question
____________ who want to apply for this position are requested to submit their performance.
(A) You (B) Those (C) Another (D) Some
expert → 0 .5 .5 0 0 (concept weights, e.g., “pronoun”)
Do this for every question!

Issue: Depends on experts
Question
“pronoun”
0 .5 .5 0 0
0 .8 .2 0 0
0 0 .2 0 0 .8

Performance evaluation
(detection prob.: OO)

Predict everything as 0
Predict everything as 1
Predict a randomly chosen half as 1
Correctly guess everything!

Area Under Curve (AUC) = 0.5
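AUC can be computed directly from predicted scores: it equals the probability that a randomly chosen O entry receives a higher score than a randomly chosen X entry, with ties counted as one half. The helper below is a minimal sketch of that definition, not the evaluation code used in the talk.

```python
import numpy as np

def auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("need at least one positive and one negative label")
    diff = pos[:, None] - neg[None, :]  # all positive/negative score pairs
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())

labels = [1, 0, 1, 1, 0]                           # 1 = O, 0 = X
perfect = auc(labels, [0.9, 0.1, 0.8, 0.7, 0.2])   # every O scored above every X
constant = auc(labels, [0.5] * 5)                  # same score for everything
```

A predictor that gives every entry the same score, such as predicting everything as 0 or everything as 1, lands at 0.5, while a perfect ranking gives 1.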

Evaluation (AUC): Baseline / Regression / User-based / Q-based
Baseline: 0.50
Regression: 0.58
User-based: 0.65
Question-based: 0.72

A different approach [Lee, Chung, Cha, Suh, ’16]
Estimate both L (user features) and R (question features) simultaneously!

Algorithm
• Find the user & question features
• using partially observed entries
• that maximize the likelihood (= minimize the negative log likelihood)
• while minimizing the rank of P (rank regularization)
We use stochastic gradient descent (SGD) for scalability
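A minimal sketch of this kind of procedure is shown below. It rests on assumptions that the text does not confirm: a sigmoid link from the product of the user and question features to the probability, and a squared-norm penalty on the features as a common surrogate for rank regularization. The hyperparameters and the hidden entries are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nll(Y, observed, L, R):
    """Average negative log-likelihood over the observed entries."""
    p = sigmoid(L @ R.T)[observed]
    y = Y[observed]
    eps = 1e-12
    return float(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))

def fit_sgd(Y, observed, rank=2, lr=0.1, reg=0.01, epochs=200, seed=0):
    """SGD over observed entries; `reg` weights the squared-norm penalty (assumed surrogate)."""
    rng = np.random.default_rng(seed)
    L = 0.1 * rng.standard_normal((Y.shape[0], rank))
    R = 0.1 * rng.standard_normal((Y.shape[1], rank))
    entries = np.argwhere(observed)
    for _ in range(epochs):
        rng.shuffle(entries)  # visit observed entries in random order
        for i, j in entries:
            err = sigmoid(L[i] @ R[j]) - Y[i, j]  # d(NLL)/d(logit) for this entry
            grad_L = err * R[j] + reg * L[i]
            grad_R = err * L[i] + reg * R[j]
            L[i] -= lr * grad_L
            R[j] -= lr * grad_R
    return L, R

# Student responses from the earlier example (1 = O, 0 = X).
Y = np.array([
    [1, 1, 0, 1, 0, 1, 0, 1, 1, 0],
    [1, 1, 1, 1, 1, 1, 0, 1, 1, 1],
    [1, 0, 1, 0, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 0, 1, 0, 1, 1, 1, 1],
    [1, 1, 0, 0, 1, 1, 1, 1, 0, 1],
], dtype=float)

# Hypothetical mask: hide every entry with (i + j) % 5 == 0.
rows, cols = np.indices(Y.shape)
observed = (rows + cols) % 5 != 0

L, R = fit_sgd(Y, observed)
P_hat = sigmoid(L @ R.T)  # predicted probabilities, including the hidden entries
```

Each update touches only one row of `L` and one row of `R`, which is why this style of training scales to large, sparse response matrices.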

Experiments: Data set
• Mobile applications (iOS/Android) launched in Korea
• Equipped w/ 3,835 TOEIC questions
• Easy to collect data
• Data were collected from 1/1/2016 to 1/15/2017
• As a result,
• 124k students signed up, 8.9m responses collected
• ~1.86% of the (student, question) entries are revealed
• Many outliers
• Our app became so popular that a lot of people signed up just to check it out
• Needed to preprocess the data
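The revealed fraction follows from the counts above. This is a quick check using the rounded figures as quoted, so the result can differ from the quoted ~1.86% in the last digit.

```python
n_students = 124_000      # "124k students signed up"
n_questions = 3_835       # TOEIC questions in the app
n_responses = 8_900_000   # "8.9m responses collected"

n_entries = n_students * n_questions          # size of the student x question matrix
observed_fraction = n_responses / n_entries   # fraction of entries that are revealed
print(f"{observed_fraction:.2%}")
```

Roughly 98% of the matrix is therefore missing and has to be predicted.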

Results (AUC)
Baseline: 0.50
Regression: 0.58
User-based: 0.65
Question-based: 0.72
This approach: AUC = 0.77

A magical advertisement
Diagnostic tests
Email w/ prediction
2nd test
Comparison

Limitations of the model-based approaches
• Requires domain experts for precise modeling
• E.g., Item Response Theory for educational data
• Requires different models for different types of input/outputs
• E.g., multiple choices or numerical responses instead of O/X
• Performance is affected by various factors
• E.g., Model, data, and algorithm

Benefits of deep learning approach
• Does not require domain experts for precise modeling
• E.g., Item Response Theory for educational data
• Does not require different models for different types of input/outputs
• E.g., multiple choices or numerical values instead of O/X
• Performance is affected by a single dominant factor:
• Data

Which deep neural networks?
Task: Complete missing entries
Input (noisy version): O X O ? X ? ? O
Output: O X O O X O O O
CNN: Object detection, image classification
RNN: Speech recognition

Denoising autoencoder
Task: Complete missing entries
Input (noisy version): O X O ? X ? ? O
Output: O X O O X O O O
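The forward pass of such a network can be sketched in a few lines. The numeric encoding of O/X/?, the single hidden layer, its size, and the activation functions are all assumptions made for illustration; the talk does not specify the architecture.

```python
import numpy as np

# Assumed encoding: O = 1, X = -1, and "?" (missing) = 0.
ENCODE = {"O": 1.0, "X": -1.0, "?": 0.0}

def encode(responses):
    return np.array([ENCODE[r] for r in responses])

def init_params(n_in, n_hidden, rng):
    """Small random weights for a one-hidden-layer autoencoder."""
    return {
        "W1": 0.1 * rng.standard_normal((n_hidden, n_in)), "b1": np.zeros(n_hidden),
        "W2": 0.1 * rng.standard_normal((n_in, n_hidden)), "b2": np.zeros(n_in),
    }

def forward(params, x):
    """Encoder then decoder; returns the predicted probability of O for every entry."""
    h = np.tanh(params["W1"] @ x + params["b1"])
    logits = params["W2"] @ h + params["b2"]
    return 1.0 / (1.0 + np.exp(-logits))

x = encode("OXO?X??O")  # the incomplete response vector from the example
params = init_params(n_in=8, n_hidden=4, rng=np.random.default_rng(0))
p = forward(params, x)  # weights are untrained, so these values are not meaningful yet
```

The network outputs a value for every entry, so the missing positions are filled in as a by-product of reconstructing the whole vector.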

Denoising autoencoder
Input: O X O ? X ? ? O
Output: O X O O X O O O
How to train weights?
How to generate training data sets?

Idea for training weights
Observed: O X O ? X ? ? O
Output: O X O O X O O O
Artificially corrupt part of observed entries.
Corrupted: O X O ? ? ? ? O
Use the corrupted version as training data.
Use them to optimize weights.
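Generating one training example along these lines might look as follows. The encoding (O = 1, X = -1, ? = 0), the helper name `corrupt`, and the drop fraction are assumptions; the talk does not say how many entries are hidden or how the loss is defined.

```python
import numpy as np

def corrupt(x, observed, drop_frac, rng):
    """Hide a random subset of the observed entries.

    Returns the corrupted input, the mask of entries that stay visible,
    and the mask of entries that were hidden (usable as training targets).
    """
    obs_idx = np.flatnonzero(observed)
    n_drop = min(len(obs_idx), max(1, int(round(drop_frac * len(obs_idx)))))
    dropped_idx = rng.choice(obs_idx, size=n_drop, replace=False)
    hidden = np.zeros_like(observed)
    hidden[dropped_idx] = True
    x_corrupted = np.where(hidden, 0.0, x)  # hidden entries look like "?"
    return x_corrupted, observed & ~hidden, hidden

# Response vector from the example: O X O ? X ? ? O (assumed encoding).
x = np.array([1.0, -1.0, 1.0, 0.0, -1.0, 0.0, 0.0, 1.0])
observed = x != 0

x_corrupted, visible, hidden = corrupt(x, observed, drop_frac=0.2,
                                       rng=np.random.default_rng(0))
target = x[hidden]  # true values the network should recover from x_corrupted
```

Because the hidden entries were actually observed, their true values are known, which gives a supervised signal without any extra labeling.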

Preliminary results (AUC)
Baseline: 0.50
Regression: 0.58
User-based: 0.65
Question-based: 0.72
Previous approach: 0.77
Denoising autoencoder: AUC = 0.78

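Other applications
E-commerce
- Shopping mall recommendation
- Product recommendation
Media Rec.
- Movie recommendation
- Music recommendation
AI Tutoring
- Problem recommendation
- Learning material recommendation
AI Assistance
- Personalized assistant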
