Recommender system for education
May 22, 2017
Changho Suh
EE, KAIST
Joint work w/ Kangwook Lee & Jichan Chung
Recommendation system
A system that recommends favorite items to users
Examples:
Algorithm for recommendation?
• Matrix completion!
• Goal: Complete missing entries from partially observed entries
• Why related to recommender system?
• Example:
Observed ratings (5 users x 10 movies; the column positions of the missing entries were lost in extraction):
User 1: 3 4 5 5 2 2
User 2: 1 2 4 3 4
User 3: 2 4 2 2 4
User 4: 5 4 4 3
User 5: 2 3 1 3 5 1 1

Completed ratings:
        Movie1  Movie2  Movie3  Movie4  Movie5  Movie6  Movie7  Movie8  Movie9  Movie10
User 1  3       4       5       4       3       5       5       2       2       2
User 2  1       2       3       5       2       4       3       3       4       4
User 3  5       2       3       4       2       5       3       2       4       3
User 4  5       5       4       3       4       4       1       3       5       2
User 5  2       3       3       1       3       4       5       2       1       1

recommend!
Connection to education
• A common way to educate students is via problem solving!
• How to recommend problem sets via matrix completion?
Problem sets solved by a typical high school student in Korea
> 50m copies
Education via matrix completion

Observed responses (O = correct, X = incorrect; 5 students x 10 questions; the column positions of the missing entries were lost in extraction):
Student 1: O O X O X O O X
Student 2: O O O
Student 3: O X O O X X
Student 4: O O X O
Student 5: O X O O O O

Completed responses:
           Q1  Q2  Q3  Q4  Q5  Q6  Q7  Q8  Q9  Q10
Student 1  O   O   X   O   X   O   X   O   O   X
Student 2  O   O   O   O   O   O   X   O   O   O
Student 3  O   X   O   X   O   X   O   O   X   X
Student 4  O   X   O   X   O   X   O   O   O   O
Student 5  O   O   X   X   O   O   O   O   X   O

recommend!
Two-step approach for matrix completion
Assume: There exists a latent matrix P.
• P_ij: probability that user i answers problem j correctly
Step 1: Estimate P using test responses (O/X)
Step 2: Flip coins with parameters in P to obtain the predicted binary matrix
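
Step 2 can be sketched in a few lines. This is only an illustration: the function name `predict_binary` and the example values in `P_hat` are made up here, and Step 1 (estimating P) is assumed to have been done already.

```python
import numpy as np

def predict_binary(P, rng):
    """Step 2: flip one coin per entry; True stands for O (correct)."""
    P = np.asarray(P, dtype=float)
    # rng.random draws uniformly from [0, 1), so entry (i, j) is True with probability P[i, j].
    return rng.random(P.shape) < P

rng = np.random.default_rng(0)
# Hypothetical estimate of P for 2 users and 2 problems (not from the slides).
P_hat = np.array([[0.9, 0.1],
                  [1.0, 0.0]])
pred = predict_binary(P_hat, rng)
```

Entries with probability 1.0 or 0.0 are deterministic; all others vary with the random seed.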
Structure of the latent matrix P
Observation: There are only a few concepts that constitute problem sets.
P is built from user features (L) and question features (R):
• P_ij: probability that user i answers problem j correctly
• L_ij: user i's understanding level on concept j
• R_ij: concept j's contribution level to problem i
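
A small numeric sketch of this structure follows. It assumes the user features form a matrix L (users x concepts) and the question features a matrix R (questions x concepts), and that P is simply the product of L with the transpose of R. The slides do not state the exact formula, so that product and all the numbers below are assumptions for illustration.

```python
import numpy as np

# Hypothetical understanding levels in [0, 1]: 2 users x 2 concepts.
L = np.array([[0.9, 0.2],
              [0.4, 0.8]])
# Hypothetical contribution levels: 3 questions x 2 concepts, each row sums to 1.
R = np.array([[1.0, 0.0],
              [0.5, 0.5],
              [0.0, 1.0]])

# Assumed model: P[i, j] = sum_k L[i, k] * R[j, k].
P = L @ R.T

# With only a few concepts, the rank of P is at most the number of concepts.
rank_P = np.linalg.matrix_rank(P)
```

Because each row of R sums to 1 and the entries of L lie in [0, 1], every entry of P is a weighted average of understanding levels and stays in [0, 1].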
An earlier approach
user features (L), question features (R)
1. Estimate R with the help of experts.
2. It then reduces to a linear regression problem.
3. Estimate L by solving the linear regression problem.
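
A minimal sketch of the regression step, assuming R is fixed and each user's observed responses are fit by ordinary least squares. The function name `estimate_user_features`, the 1.0/0.0 encoding of O/X, and the noiseless example data are assumptions made for illustration; the slides do not specify the regression details.

```python
import numpy as np

def estimate_user_features(Y, observed, R):
    """Least-squares estimate of L, one user (row) at a time.

    Y        : users x questions responses (1.0 for O, 0.0 for X in practice)
    observed : boolean mask of the same shape, True where a response exists
    R        : questions x concepts matrix, held fixed (e.g. tagged by experts)
    """
    L = np.zeros((Y.shape[0], R.shape[1]))
    for i in range(Y.shape[0]):
        idx = observed[i]
        if idx.any():
            # Solve min_l || R[idx] @ l - Y[i, idx] ||^2 for user i.
            L[i] = np.linalg.lstsq(R[idx], Y[i, idx], rcond=None)[0]
    return L

# Noiseless toy data (real-valued instead of 0/1, to make the recovery exact).
R = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
L_true = np.array([[0.8, 0.2], [0.3, 0.9]])
Y = L_true @ R.T
observed = np.array([[True, False, True],
                     [True, True, True]])
L_hat = estimate_user_features(Y, observed, R)
```

Users with no observed responses keep an all-zero feature row in this sketch.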
How to estimate R?
Question:
____________ who want to apply for this position are requested to submit their performance.
(A) You (B) Those (C) Another (D) Some
An expert tags the question with concept weights (concept label shown on the slide: "pronoun"):
0 .5 .5 0 0
Do this for every question!
Issue: Depends on experts
The same question ("____________ who want to apply for this position ...", concept label "pronoun") receives different concept weights:
• 0 .5 .5 0 0
• 0 .8 .2 0 0
• 0 0 .2 0 0 .8
Performance evaluation
Plot axis label recovered from the slide: detection prob. (O->O)
Reference predictors:
• Predict everything as 0
• Predict everything as 1
• Predict a randomly chosen half as 1
• Correctly guess everything!
Area Under Curve (AUC):
• Baseline: 0.50
• Regression: 0.58
• User-based: 0.65
• Question-based: 0.72
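
For reference, AUC can be computed as the fraction of (correct, incorrect) response pairs that the scores rank in the right order, with ties counted as one half. The sketch below uses that standard pairwise definition; the function name `auc` and the toy labels are made up for illustration.

```python
import numpy as np

def auc(labels, scores):
    """Pairwise AUC: P(score of a positive > score of a negative), ties count 0.5.

    Requires at least one positive and one negative label.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]          # every positive vs. every negative
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

labels = np.array([1, 0, 1, 1, 0])              # 1 = O, 0 = X
auc_all_zero = auc(labels, np.zeros(5))         # predict everything as 0
auc_all_one = auc(labels, np.ones(5))           # predict everything as 1
auc_perfect = auc(labels, labels)               # correctly guess everything
```

Both constant predictors give an AUC of 0.5 and the perfect predictor gives 1.0, which matches the 0.50 baseline in the list above.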
A different approach [Lee, Chung, Cha, Suh, '16]
user features
question features
Estimate both L and R simultaneously!
Algorithm
• Find the user & question features
  • using partially observed entries
  • that maximize the likelihood (= minimize the negative log-likelihood)
  • while minimizing the rank of P (rank regularization)
We use stochastic gradient descent (SGD) for scalability
(slide annotation: "approximately equivalent")
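
The bullets above can be sketched as follows. This is not the authors' implementation: the sigmoid link between the features and P, the fixed small number of features with an L2 penalty standing in for rank regularization, and every name and hyperparameter (`sgd_fit`, `rank`, `lr`, `reg`, `epochs`) are assumptions chosen for a minimal runnable example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def nll(L, R, triples):
    """Average negative log-likelihood over observed (user, question, y) triples."""
    total = 0.0
    for i, j, y in triples:
        p = sigmoid(L[i] @ R[j])
        total -= y * np.log(p) + (1 - y) * np.log(1 - p)
    return total / len(triples)

def sgd_fit(triples, n_users, n_questions, rank=2, lr=0.1, reg=0.01,
            epochs=200, seed=0):
    """Jointly fit user features L and question features R by SGD."""
    rng = np.random.default_rng(seed)
    L = 0.1 * rng.standard_normal((n_users, rank))
    R = 0.1 * rng.standard_normal((n_questions, rank))
    L0, R0 = L.copy(), R.copy()                 # kept only to report the initial loss
    for _ in range(epochs):
        for k in rng.permutation(len(triples)):
            i, j, y = triples[k]
            err = sigmoid(L[i] @ R[j]) - y      # d(NLL)/d(logit) for one response
            grad_L = err * R[j] + reg * L[i]    # both gradients use the old values
            grad_R = err * L[i] + reg * R[j]
            L[i] -= lr * grad_L
            R[j] -= lr * grad_R
    return L, R, L0, R0

# Toy observed responses: (user, question, 1 for O / 0 for X).
triples = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0),
           (2, 0, 1), (2, 2, 0), (3, 1, 0), (3, 2, 1)]
L, R, L0, R0 = sgd_fit(triples, n_users=4, n_questions=3)
initial_loss = nll(L0, R0, triples)
final_loss = nll(L, R, triples)
```

Each update touches only one row of L and one row of R, which is why this style of training scales to sparse data with millions of responses.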
Experiments: Data set
• Mobile applications (iOS/Android) launched in Korea
• Equipped w/ 3,835 TOEIC questions
• Easy to collect data
• Data collected from 1/1/2016 to 1/15/2017
• As a result,
• 124k students signed up, 8.9m responses collected
• ~1.86% of the possible responses are revealed
• Many outliers
• Our app became so popular that a lot of people signed up just to check it out
• Needed to preprocess the data
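
The revealed fraction can be sanity-checked from the other figures on this slide, assuming it means observed responses divided by all student-question pairs. The variable names are made up, and the inputs are the rounded values quoted above.

```python
n_students = 124_000      # "124k students" (rounded)
n_questions = 3_835
n_responses = 8_900_000   # "8.9m responses" (rounded)

# Fraction of the students x questions matrix that is observed.
density = n_responses / (n_students * n_questions)
```

With these rounded inputs the fraction comes out just under 1.9%, consistent with the reported ~1.86% up to rounding.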
Results (AUC)
AUC = 0.77
Comparison: 0.50 (baseline), 0.58 (regression), 0.65 (user-based), 0.72 (question-based), 0.77 (this approach)
A magical advertisement
Diagnostic tests -> Email w/ prediction -> 2nd test -> Comparison
Limitations of the model-based approaches
• Requires domain experts for precise modeling
  • E.g., Item Response Theory for educational data
• Requires different models for different types of inputs/outputs
  • E.g., multiple choices or numerical responses instead of O/X
• Performance is affected by various factors
  • E.g., model, data, and algorithm
Benefits of the deep learning approach
• Does not require domain experts for precise modeling
  • E.g., Item Response Theory for educational data
• Does not require different models for different types of inputs/outputs
  • E.g., multiple choices or numerical values instead of O/X
• Performance is affected by a single dominant factor:
  • Data
Which deep neural networks?
Task: Complete missing entries
  noisy version: O X O ? X ? ? O
  completed:     O X O O X O O O
CNN: Object detection, image classification
RNN: Speech recognition
Denoising autoencoder
Task: Complete missing entries
  input (noisy version): O X O ? X ? ? O
  output:                O X O O X O O O
How to train weights?
How to generate training data sets?
Idea for training weights
  observed:  O X O ? X ? ? O
  output:    O X O O X O O O
Artificially corrupt part of observed entries.
  corrupted: O X O ? ? ? ? O
Use the corrupted version as training data.
Use them to optimize weights.
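
The corruption step might look like the sketch below. The encoding (1.0 for O, 0.0 for X, 0.5 for "?"), the function name `make_training_pair`, and the `hide_frac` parameter are assumptions for illustration; the slides do not say how entries are encoded or how many are hidden.

```python
import numpy as np

def make_training_pair(row, observed, hide_frac, rng):
    """Hide a random part of the observed entries of one response row.

    Returns the corrupted input, its mask, and the hidden positions, whose
    true values in `row` serve as training targets.
    Assumes the row has at least one observed entry.
    """
    obs_idx = np.flatnonzero(observed)
    n_hide = max(1, int(round(hide_frac * obs_idx.size)))
    hidden = rng.choice(obs_idx, size=n_hide, replace=False)
    corrupted_mask = observed.copy()
    corrupted_mask[hidden] = False
    # 0.5 marks '?' (unknown) in the network input; this encoding is an assumption.
    corrupted = np.where(corrupted_mask, row, 0.5)
    return corrupted, corrupted_mask, hidden

rng = np.random.default_rng(0)
# The observed row from the slide: O X O ? X ? ? O (values under '?' are placeholders).
row = np.array([1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0])
observed = np.array([True, True, True, False, True, False, False, True])
corrupted, corrupted_mask, hidden = make_training_pair(row, observed, 0.2, rng)
```

The network is then trained to reproduce `row[hidden]` from `corrupted`, so the loss is measured only on entries whose true value is known.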
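Preliminary results (AUC)
AUC = 0.78
Comparison: 0.50 (baseline), 0.58 (regression), 0.65 (user-based), 0.72 (question-based), 0.77 (joint estimation of L and R), 0.78 (denoising autoencoder)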
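Other applications
• E-commerce: shopping mall recommendation, product recommendation
• Media Rec.: movie recommendation, music recommendation
• AI Tutoring: problem recommendation, study material recommendation
• AI Assistance: personalized assistant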
