Reward-Constrained Interactive Recommendation
with Natural Language Feedback
2020. 02. 24.
Jeong-Gwan Lee
"Text-Based Interactive Recommendation via Constraint-Augmented Reinforcement Learning." NeurIPS 2019
(Duke University, Samsung Research America, University at Buffalo)
Table of contents
● Visual Item Interactive Recommendation
● Non-Natural Language Feedback
● Natural Language Feedback
● Dataset and Setup
● MDP & Constrained MDP
● Recommendation as MDP
● Reward Constrained Recommender Model
● Model Detail (Feature Extractor, Discriminator, Recommender)
● Reward function
● Recommendation as Constrained MDP
● Model Training
● Evaluation
● Conclusion
Visual Item Interactive Recommendation
Recommender systems have sought to interact with users in order
to adapt to their preferences over time.
• Non-Natural Language Feedback
• Clicking Data
• Updated Rating
Such feedback carries little information about complex user attitudes.
[Figure: clicking data and updated ratings across Round 1 and Round 2]
Visual Item Interactive Recommendation
Text-based recommendation provides richer user feedback.
• Natural Language Feedback (Not dialogue-based)
This paper targets this setting.
[Figure: the Recommender shows an item and the Seeker replies with natural language feedback]
Visual Item Recommendation
with Natural Language Feedback Setting
UT-Zappos50K
• A shoe dataset consisting of 50,025 shoe images.
• Samples
• Labels
• Rich attribute data
1. Shoe category (4) = {Shoes, Boots, Sandals, Slippers}
2. Shoe subcategory (21) = {Oxfords, MidCalf, Heel, Ankle, …}
3. Heel height (7) = {flat, Under 1 inch, 1-2 inch, 2-3 inch, …}
4. Closure (18) = {leather, padded, removable, …}
5. Gender (8) = {men, women, boys, girls, …}
6. Toe style (17) = {Capped, Round, Square, …}
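The six attribute groups form a multi-discrete label space. The sketch below is a minimal illustration, not the authors' code: it encodes one item's labels as concatenated one-hot vectors (4 + 21 + 7 + 18 + 8 + 17 = 75 dimensions). The dictionary keys and the integer index assigned to each attribute value are assumptions.

```python
from __future__ import annotations

# Attribute groups and their cardinalities, as listed on the slide.
ATTRIBUTE_SIZES = {
    "category": 4,
    "subcategory": 21,
    "heel_height": 7,
    "closure": 18,
    "gender": 8,
    "toe_style": 17,
}


def one_hot(index: int, size: int) -> list[float]:
    """Return a length-`size` vector with 1.0 at `index` and 0.0 elsewhere."""
    if not 0 <= index < size:
        raise ValueError(f"index {index} out of range for size {size}")
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def encode_attributes(labels: dict[str, int]) -> list[float]:
    """Concatenate one-hot encodings of all attribute groups in a fixed order.

    `labels` maps attribute name -> class index (the index mapping is hypothetical).
    """
    encoded: list[float] = []
    for name, size in ATTRIBUTE_SIZES.items():
        encoded.extend(one_hot(labels[name], size))
    return encoded


item = {"category": 0, "subcategory": 2, "heel_height": 3,
        "closure": 5, "gender": 1, "toe_style": 1}
vec = encode_attributes(item)
print(len(vec), sum(vec))  # 75 6.0
```

A fixed group order matters: two items are only comparable (for example, by Euclidean distance) if their vectors lay out the groups identically.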
Dataset and Setup
User simulator
• Unfortunately, UT-Zappos50K does not include user comments about
attributes paired with ground truth.
1. Given 10,000 pairs of a recommended item and a desired item,
real-world sentences are collected from annotators.
2. From these sentences, the authors derive several sentence templates and
synthesize 20,000 labeled sentences by filling the templates
with attribute labels.
3. They train a Seq2seq-based user simulator
(input: the difference in one attribute value between two items,
output: a sentence describing the visual attribute difference).
Template examples (recommended vs. desired):
• "Show me more shoes with round toe."
• Gender: Men vs. Gender: Women → "I prefer shoes for women."
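The paper trains a Seq2seq simulator on the template-filled sentences. As a simpler stand-in, the sketch below fills templates directly for one attribute on which the recommended and desired items differ. The function name `simulate_feedback` and the `heel_height` template are hypothetical; only the gender and toe-style sentences mirror the slide's examples.

```python
from __future__ import annotations

import random

# Hypothetical templates; the first two mirror the slide's example sentences.
TEMPLATES = {
    "gender": "I prefer shoes for {value}.",
    "toe_style": "Show me more shoes with {value} toe.",
    "heel_height": "I want shoes with a {value} heel.",
}


def simulate_feedback(recommended: dict[str, str],
                      desired: dict[str, str],
                      rng: random.Random) -> str | None:
    """Describe one templated attribute where the recommended item differs from the desired one.

    Returns None when the two items agree on every templated attribute.
    """
    differing = [name for name in TEMPLATES
                 if name in desired and recommended.get(name) != desired[name]]
    if not differing:
        return None
    name = rng.choice(differing)  # the simulator comments on a single attribute
    return TEMPLATES[name].format(value=desired[name].lower())


rng = random.Random(0)
print(simulate_feedback({"gender": "Men", "toe_style": "Round"},
                        {"gender": "Women", "toe_style": "Round"}, rng))
# I prefer shoes for women.
```

Commenting on one attribute per turn matches the simulator's input on the slide (the difference on one attribute value between two items).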
Reward Constrained Recommendation
They propose Reward Constrained Recommendation (RCR),
which sequentially incorporates constraints from previous
feedback.
• A constraint-augmented RL problem setting
• A learnable discriminator to detect violations of user
preferences in an adversarial manner
MDP & Constrained MDP
MDP (Markov Decision Process)
Constrained MDP
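The formulas on this slide did not survive extraction. For reference, the standard definitions are sketched below, assuming a discounted return and a constraint cost C bounded by a threshold α (the symbol the appendix uses for the constraint threshold); the exact form in the paper may differ.

```latex
% MDP (S, A, P, r, \gamma): find a policy maximizing the expected discounted return
\max_{\pi}\; J_R(\pi) = \mathbb{E}_{\pi}\Big[\textstyle\sum_{t} \gamma^{t}\, r(\mathbf{s}_t, \mathbf{a}_t)\Big]

% Constrained MDP: same objective, restricted to policies whose expected
% cumulative constraint cost C stays below a threshold \alpha
\max_{\pi}\; J_R(\pi) \quad \text{s.t.} \quad
J_C(\pi) = \mathbb{E}_{\pi}\Big[\textstyle\sum_{t} \gamma^{t}\, C(\mathbf{s}_t, \mathbf{a}_t)\Big] \le \alpha
```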
Recommendation as MDP
Abstractly, the recommendation-feedback loop can be modeled as an MDP.
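[Figure: Recommender-Seeker loop over rounds t = 1 to 4: in state s_t the recommender recommends item a_t, the seeker replies with feedback x_t, and the reward r_t is still undefined (marked "?")]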
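One pass around this loop is an MDP episode. A minimal sketch follows, using a hypothetical callable interface (`policy`, `seeker` and `update_state` are illustrative names, not from the paper) and capped at the 50 interactions mentioned on the reward slide.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Transition:
    state: Any
    action: Any      # recommended item a_t
    feedback: str    # seeker's comment x_t
    reward: float    # r_t


def rollout(initial_state: Any,
            policy: Callable[[Any], Any],
            seeker: Callable[[Any], tuple[str, float, bool]],
            update_state: Callable[[Any, Any, str], Any],
            max_turns: int = 50) -> list[Transition]:
    """Run one recommendation session as an MDP episode.

    policy(state) -> recommended item
    seeker(item) -> (feedback, reward, done)
    update_state(state, item, feedback) -> next state
    """
    state = initial_state
    trajectory: list[Transition] = []
    for _ in range(max_turns):
        action = policy(state)
        feedback, reward, done = seeker(action)
        trajectory.append(Transition(state, action, feedback, reward))
        if done:
            break
        state = update_state(state, action, feedback)
    return trajectory


# Toy usage: items are integers and the seeker wants item 7.
DESIRED = 7


def toy_seeker(item: int) -> tuple[str, float, bool]:
    done = item == DESIRED
    feedback = "found" if done else ("higher" if item < DESIRED else "lower")
    return feedback, -abs(DESIRED - item), done


traj = rollout(4, policy=lambda s: s, seeker=toy_seeker,
               update_state=lambda s, a, x: s + 1 if x == "higher" else s - 1)
print([t.action for t in traj])  # [4, 5, 6, 7]
```

In RCR the state would be built from the feedback history and the policy would be π_θ; the toy integers only exercise the loop structure.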
Recap: Dataset
UT-Zappos50K
• A shoe dataset consisting of 50,025 shoe images.
• Rich attribute data (shoe category (4), shoe subcategory (21), heel
height (7), closure (18), gender (8), and toe style (17))
• Samples
• Labels
Reward Constrained Recommender Model
Feature Extractor (extracts features of the feedback and recommended items)
Recommender (predicts attributes, matches, and recommends)
Discriminator (prevents constraint violation)
Feature Extractor
Visual Encoder = ResNet50[1] + AttrNet (pretrained)
Textual Encoder = Embedding + LSTM + FC
[1] He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
[Figure: Visual Encoder. ResNet50 features feed AttrNet, which predicts each attribute (Category (4), SubCategory (21), Heel Height (7), …); the ResNet50 features and AttrNet outputs are concatenated. At training time AttrNet uses attribute labels, e.g., Cat: Shoes, SubCat: Dress shoes, HeelHei.: X, Closure: …]
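Reading the figure as "ResNet50 features concatenated with AttrNet's per-attribute predictions", a minimal sketch of the visual encoding step follows. The input vectors are placeholders (in practice a pooled ResNet50 feature is 2048-dimensional), and applying a softmax to each AttrNet head is an assumption.

```python
from __future__ import annotations

import math


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def visual_encoding(resnet_features: list[float],
                    attr_logits: dict[str, list[float]]) -> list[float]:
    """Concatenate backbone features with per-attribute AttrNet probabilities.

    `resnet_features` stands in for the ResNet50 feature vector and
    `attr_logits` for the (hypothetical) AttrNet head outputs.
    """
    encoding = list(resnet_features)
    for name in sorted(attr_logits):  # fixed order so encodings are comparable
        encoding.extend(softmax(attr_logits[name]))
    return encoding


enc = visual_encoding([0.5, -1.0, 2.0],
                      {"category": [2.0, 0.0, 0.0, 0.0], "heel_height": [0.0] * 7})
print(len(enc))  # 14
```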
Recommender
Policy π_θ samples a value for each attribute and selects the item closest to the sampled
attribute values under Euclidean distance in the visual attribute space.
[Figure: Visual Encoder (ResNet50 + AttrNet) → feature representation → per-attribute FCs + Softmax heads (Category (4), SubCategory (21), Heel Height (7), …), which form the policy π_θ with a multi-discrete action space → categorical sampling → distance-based matching]
Categorical sampling results (example): Category = shoes [1,0,0,0]; SubCat = heel [0,0,0,1,…]; Heel.H = 3 inch [0,0,1,0,…].
Distance-based matching then recommends the item closest to these sampled values under Euclidean distance.
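The sampling and matching steps can be sketched as follows: each attribute head's logits become a categorical distribution, one value is sampled per head, the samples are one-hot encoded, and the catalogue item nearest in Euclidean distance is returned. Catalogue vectors, head sizes and function names are hypothetical, not the paper's implementation.

```python
from __future__ import annotations

import math
import random


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_attributes(head_logits: list[list[float]], rng: random.Random) -> list[int]:
    """Multi-discrete action: sample one value index per attribute head."""
    return [rng.choices(range(len(logits)), weights=softmax(logits), k=1)[0]
            for logits in head_logits]


def to_vector(indices: list[int], sizes: list[int]) -> list[float]:
    """Concatenate one-hot vectors for the sampled attribute values."""
    vec: list[float] = []
    for idx, size in zip(indices, sizes):
        one_hot = [0.0] * size
        one_hot[idx] = 1.0
        vec.extend(one_hot)
    return vec


def nearest_item(query: list[float], catalogue: dict[str, list[float]]) -> str:
    """Return the catalogue item closest to `query` in Euclidean distance."""
    return min(catalogue, key=lambda item: math.dist(query, catalogue[item]))


# Toy example with two attribute heads of sizes 4 and 3 (hypothetical).
sizes = [4, 3]
rng = random.Random(0)
sampled = sample_attributes([[50.0, 0.0, 0.0, 0.0], [0.0, 0.0, 50.0]], rng)
query = to_vector(sampled, sizes)
catalogue = {
    "boot_a": [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
    "sandal_b": [0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0],
}
print(sampled, nearest_item(query, catalogue))  # [0, 2] boot_a
```

The very peaked logits make the toy sample effectively deterministic; with a trained policy, sampling is what lets the recommender explore.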
Reward function
Reward: the visual and attribute similarity between the
recommended and desired items.
• The recommended item should become more similar to the
desired one as the interaction proceeds.
• We therefore want to minimize the visual and attribute differences.
• The two distances are weighted to ensure their scales are similar.
• If the system cannot find the desired item within 50 interactions,
it receives an extra reward of -3 as a penalty.
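The reward formula itself is missing from the slide. A minimal sketch follows, assuming the reward is the negative of the visual distance plus a weighted attribute distance. `attr_weight` is a hypothetical stand-in for the unspecified scale-balancing factor, and the -3 penalty is applied when the 50th interaction ends without success.

```python
from __future__ import annotations

import math

MAX_TURNS = 50          # from the slide: success must come within 50 interactions
FAILURE_PENALTY = -3.0  # from the slide: extra reward when the session fails


def step_reward(rec_visual: list[float], target_visual: list[float],
                rec_attr: list[float], target_attr: list[float],
                turn: int, found: bool, attr_weight: float = 1.0) -> float:
    """Negative weighted sum of visual and attribute Euclidean distances.

    `attr_weight` is a hypothetical factor that balances the two distance scales.
    `turn` is 1-based; the penalty is added on the last allowed turn without success.
    """
    reward = -(math.dist(rec_visual, target_visual)
               + attr_weight * math.dist(rec_attr, target_attr))
    if turn == MAX_TURNS and not found:
        reward += FAILURE_PENALTY
    return reward


print(step_reward([0.0, 0.0], [3.0, 4.0], [1.0, 0.0], [1.0, 0.0], turn=1, found=False))  # -5.0
```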
Why are explicit constraints needed?
RL algorithms that do not consider constraints easily violate
preferences expressed in past feedback, because they need to explore
new items for further improvement.
• Success case
• Failure case
Discriminator
Discriminator C_φ outputs whether the recommended item
violates the user's comments.
Feedback history:
x_{t-1}: I prefer leather.
x_t: I prefer high heel.
…
Collecting (non-)violation distribution
[Figure: one user session runs until it finishes; its steps are then collected as non-violation pairs and violation pairs]
Discriminator
A discriminator is defined as a constraint function.
• Discriminator training
• C_φ(s, a) is trained toward 1 for a violation pair.
• C_φ(s, a) is trained toward 0 for a non-violation pair.
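$\max_{\phi}\; \mathbb{E}_{(\mathbf{s},\mathbf{a}) \sim \text{violation pairs}}\big[\log C_{\phi}(\mathbf{s},\mathbf{a})\big] + \mathbb{E}_{(\mathbf{s},\mathbf{a}) \sim \text{non-violation pairs}}\big[\log\big(1 - C_{\phi}(\mathbf{s},\mathbf{a})\big)\big]$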
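Training C_φ toward 1 on violation pairs and toward 0 on non-violation pairs is binary classification with cross-entropy. As a stand-in for the paper's neural discriminator, the sketch below uses a logistic model over a hypothetical joint (state, action) feature vector and plain SGD; the class name and feature layout are assumptions.

```python
from __future__ import annotations

import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LogisticDiscriminator:
    """Stand-in for C_phi(s, a): logistic regression on a joint (state, action) feature vector."""

    def __init__(self, dim: int, lr: float = 0.5) -> None:
        self.w = [0.0] * dim
        self.b = 0.0
        self.lr = lr

    def prob_violation(self, x: list[float]) -> float:
        """Estimated probability that the pair is a violation."""
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def update(self, pairs: list[tuple[list[float], int]]) -> None:
        """One SGD pass on binary cross-entropy; label 1 = violation, 0 = non-violation."""
        for x, y in pairs:
            grad = self.prob_violation(x) - y  # gradient of BCE w.r.t. the logit
            self.w = [wi - self.lr * grad * xi for wi, xi in zip(self.w, x)]
            self.b -= self.lr * grad


# Toy pairs collected from finished sessions (features are hypothetical).
pairs = [([1.0, 0.0], 1), ([0.0, 1.0], 0)]
disc = LogisticDiscriminator(dim=2)
for _ in range(200):
    disc.update(pairs)
```

Calling `update` once per finished session, on the pairs that session produced, matches the slide's note that the discriminator is refreshed after every user session.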
Collecting (non-)violation distribution
The discriminator is updated after each user session; it cannot be pretrained.
• Judging whether a recommendation violates earlier preferences requires sequential feedback.
• The dataset has no sequential feedback; it is available only from the user simulator.
Recap: Reward Constrained Recommender Model
Feature Extractor (extracts features of the feedback and recommended items)
Discriminator C_φ(s, a) (prevents constraint violation)
Recommender π_θ(a|s) (predicts attributes, matches, and recommends)
Recommendation as Constrained MDP
Because directly solving the constrained optimization is difficult,
Lagrange relaxation transforms the objective into a dual problem.
• Primal problem
• Dual problem (refer to Appendix: Lagrange Relaxation)
• Lagrangian function, with Lagrange multiplier λ
• Relaxed objective
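Assuming the primal problem is max_θ J_R(π_θ) s.t. J_C(π_θ) ≤ α, with J_R the expected return and J_C the expected cumulative discriminator output C_φ, the standard Lagrangian, dual problem, and relaxed objective are sketched below. This follows the usual reward-constrained formulation and may differ in detail from the paper's.

```latex
% Lagrangian with multiplier \lambda \ge 0
\mathcal{L}(\lambda, \theta) = J_R(\pi_\theta) - \lambda \big( J_C(\pi_\theta) - \alpha \big)

% Dual problem: seek a saddle point
\min_{\lambda \ge 0}\; \max_{\theta}\; \mathcal{L}(\lambda, \theta)

% Relaxed objective for a fixed \lambda (dropping the constant \lambda\alpha),
% assuming J_C(\pi_\theta) = \mathbb{E}_{\pi_\theta}[\sum_t \gamma^t C_\phi(\mathbf{s}_t, \mathbf{a}_t)]
\max_{\theta}\; \mathbb{E}_{\pi_\theta}\Big[\textstyle\sum_{t} \gamma^{t}
  \big( r(\mathbf{s}_t, \mathbf{a}_t) - \lambda\, C_\phi(\mathbf{s}_t, \mathbf{a}_t) \big)\Big]
```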
Recommendation as Constrained MDP
The goal is to find a saddle point, which can be approximately reached
by alternating gradient descent/ascent.
The reward function with constraints penalizes the policy for violations.
λ is also optimized so that the constraints are satisfied.
1) If violations happen, λ increases to penalize the policy.
2) If there is no violation, λ decreases to give the policy more reward.
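The two λ rules above amount to a projected gradient step. A minimal sketch follows, assuming the constraint signal is an average discriminator output over a session and that λ is clipped to [0, λ_max] as described on the training slide. The learning rate and function names are illustrative.

```python
from __future__ import annotations


def penalized_reward(reward: float, violation_prob: float, lam: float) -> float:
    """Reward with constraints: r(s, a) - lambda * C_phi(s, a)."""
    return reward - lam * violation_prob


def update_lambda(lam: float, constraint_value: float, alpha: float,
                  lr: float, lam_max: float) -> float:
    """Projected gradient descent step on lambda.

    With L = J_R - lambda * (J_C - alpha), dL/dlambda = -(J_C - alpha),
    so a descent step is lambda + lr * (J_C - alpha); the result is
    projected onto [0, lam_max].
    """
    lam = lam + lr * (constraint_value - alpha)
    return min(max(lam, 0.0), lam_max)


lam = 0.0
lam = update_lambda(lam, constraint_value=0.9, alpha=0.1, lr=0.1, lam_max=1.0)  # violations: lambda grows
lam = update_lambda(lam, constraint_value=0.0, alpha=0.1, lr=0.1, lam_max=1.0)  # no violations: lambda shrinks
```

The clip at zero keeps the multiplier valid for an inequality constraint, and the upper bound λ_max prevents the penalty from swamping the task reward.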
Model Training
Reward Constrained Recommendation Process
• Alternately training the discriminator C_φ and the recommender π_θ
• A projection operator keeps training stable by updating the
parameters within a trust region [1].
• A projection operator on λ projects λ into the range [0, λ_max].
[1] Schulman, John, et al. "Trust region policy optimization." International Conference on Machine Learning. 2015.
Evaluation
SR@K: Success Rate after K interactions
NI: Number of user Interactions before success
NV: Number of Violated attributes compared with the user's desired
attributes
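A minimal sketch of the three metrics over a batch of evaluation sessions follows. The slide does not say how NV is aggregated or how failed sessions enter NI, so counting a failure as the 50-interaction cap and averaging a per-session violation count are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Session:
    success_turn: int | None  # turn at which the desired item was found; None if never found
    violated: int             # number of violated attributes counted for this session


def success_rate_at_k(sessions: list[Session], k: int) -> float:
    """SR@K: fraction of sessions that succeed within K interactions."""
    hits = sum(1 for s in sessions if s.success_turn is not None and s.success_turn <= k)
    return hits / len(sessions)


def mean_interactions(sessions: list[Session], max_turns: int = 50) -> float:
    """NI: mean interactions before success; a failed session counts as max_turns (assumption)."""
    turns = [s.success_turn if s.success_turn is not None else max_turns for s in sessions]
    return sum(turns) / len(turns)


def mean_violations(sessions: list[Session]) -> float:
    """NV: mean number of violated attributes per session."""
    return sum(s.violated for s in sessions) / len(sessions)


sessions = [Session(3, 0), Session(12, 1), Session(None, 2), Session(8, 0)]
print(success_rate_at_k(sessions, 5), mean_interactions(sessions), mean_violations(sessions))
# 0.25 18.25 0.75
```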
λ increases at an early stage
(since violations increase)
and then stabilizes.
λ ≈ 0.04 is the automatically learned
discriminator weight.
Evaluation
RL baseline: ignores the constraints.
RL + naive constraints: the Lagrange multiplier λ is fixed.
• All models are trained for 100,000 iterations (user sessions).
• Seen: training data
• Unseen: test data
• Results are averaged over 100 sessions and reported with standard errors.
The learned constraint (discriminator) generalizes better.
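The reported numbers are means with standard errors over 100 sessions. A minimal sketch of that computation follows, using the sample standard deviation divided by √n (the usual definition, assumed here).

```python
from __future__ import annotations

import math
import statistics


def mean_and_standard_error(values: list[float]) -> tuple[float, float]:
    """Mean and standard error (sample standard deviation / sqrt(n))."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))


# e.g., per-session success indicators (1.0 = success) from an evaluation run
print(mean_and_standard_error([1.0, 0.0, 1.0, 1.0]))  # (0.75, 0.25)
```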
Conclusion
They propose Reward Constrained Recommendation (RCR), which
sequentially incorporates constraints from previous feedback.
• A constraint-augmented RL problem setting
• A learnable discriminator to detect violations of user preferences in an
adversarial manner
The proposed method can be extended to other applications,
such as:
1. Vision-and-dialogue navigation
2. Interactive recommendation with the user's prior information
3. Dialogue-based recommendation
Appendix: Lagrange Relaxation
Appendix: Generated feedback
The simulator generates only simple comments on the visual
attribute difference between the candidate image and the
desired image.
Appendix: Hyperparameter setting
In reinforcement learning, they use Adam as the optimizer.
They set:
• α: threshold of constraints (refer to page 15)
• λ_max: projection boundary of λ

Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
 
ML for identifying fraud using open blockchain data.pptx
ML for identifying fraud using open blockchain data.pptxML for identifying fraud using open blockchain data.pptx
ML for identifying fraud using open blockchain data.pptx
 
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
 
power quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptxpower quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptx
 
weather web application report.pdf
weather web application report.pdfweather web application report.pdf
weather web application report.pdf
 
MCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdfMCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdf
 
Hierarchical Digital Twin of a Naval Power System
Hierarchical Digital Twin of a Naval Power SystemHierarchical Digital Twin of a Naval Power System
Hierarchical Digital Twin of a Naval Power System
 
HYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generationHYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generation
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
 
Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
 
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdfAKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
 
space technology lecture notes on satellite
space technology lecture notes on satellitespace technology lecture notes on satellite
space technology lecture notes on satellite
 

  • 1. Reward-Constrained Interactive Recommendation with Natural Language Feedback. 2020. 02. 24. Jeong-Gwan Lee. Based on "Text-Based Interactive Recommendation via Constraint-Augmented Reinforcement Learning," NeurIPS 2019 (Duke University, Samsung Research America, University at Buffalo).
  • 2. Table of contents: Visual Item Interactive Recommendation; Non-Natural Language Feedback; Natural Language Feedback; Dataset and Setup; MDP & Constrained MDP; Recommendation as MDP; Reward Constrained Recommender Model; Model Detail (Feature Extractor, Discriminator, Recommender); Reward Function; Recommendation as Constrained MDP; Model Training; Evaluation; Conclusion
  • 3. Visual Item Interactive Recommendation: Recommender systems have sought to interact with users in order to adapt to their preferences over time. Non-natural-language feedback, such as click data or updated ratings, carries too little information to reflect complex user attitudes. [Figure: click and rating feedback over Round 1 and Round 2]
  • 4. Visual Item Interactive Recommendation: Text-based recommendation provides richer user feedback. This paper targets natural language feedback that is not dialogue-based. [Figure: a Recommender and a Seeker exchanging recommendations and text feedback]
  • 5. Visual Item Recommendation with Natural Language Feedback Setting: UT-Zappos50K, a shoe dataset consisting of 50,025 shoe images. [Figure: sample images and their labels]
  • 6. Visual Item Recommendation with Natural Language Feedback Setting: UT-Zappos50K, a shoe dataset consisting of 50,025 shoe images, comes with rich attribute data (number of values in parentheses):
    1. shoe category (4) = {Shoes, Boots, Sandals, Slippers}
    2. shoe subcategory (21) = {Oxfords, MidCalf, Heel, Ankle, …}
    3. heel height (7) = {flat, Under 1 inch, 1~2 inch, 2~3 inch, …}
    4. closure (18) = {leather, padded, removable, …}
    5. gender (8) = {men, women, boys, girls, …}
    6. toe style (17) = {Capped, Round, Square, …}
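A minimal sketch of how the six attribute groups above could be encoded as one concatenated one-hot vector, the kind of vector shown on the "Distance-based Matching" slide (e.g., Category = shoes as [1,0,0,0]). Only the group sizes come from the slide; the dictionary keys and the index assigned to each value are assumptions for illustration.

```python
from typing import Dict, List

# Group sizes from the slide; the key names are hypothetical.
ATTRIBUTE_SIZES: Dict[str, int] = {
    "category": 4,
    "subcategory": 21,
    "heel_height": 7,
    "closure": 18,
    "gender": 8,
    "toe_style": 17,
}

def one_hot_attributes(labels: Dict[str, int]) -> List[int]:
    """Concatenate one one-hot block per attribute group (labels are value indices)."""
    vector: List[int] = []
    for name, size in ATTRIBUTE_SIZES.items():
        index = labels[name]
        if not 0 <= index < size:
            raise ValueError(f"{name} index {index} is outside [0, {size})")
        block = [0] * size
        block[index] = 1
        vector.extend(block)
    return vector

# Hypothetical item: category index 0, subcategory index 3, heel height index 2, ...
example = {"category": 0, "subcategory": 3, "heel_height": 2,
           "closure": 0, "gender": 1, "toe_style": 1}
vec = one_hot_attributes(example)
print(len(vec), sum(vec))  # 75 6
```

The total dimension is 4 + 21 + 7 + 18 + 8 + 17 = 75, with exactly one active entry per group.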
  • 7. Dataset and Setup (User simulator): Unfortunately, UT-Zappos50K does not include user comments on attributes paired with ground truth, so the authors build a user simulator:
    1. Given 10,000 pairs of a recommended item and a desired item, real-world sentences are collected from annotators.
    2. From these, the authors derive several sentence templates and synthesize 20,000 labeled sentences by filling the templates with attribute labels.
    3. They train a Seq2seq-based user simulator (input: the difference in one attribute value between two items; output: a sentence describing that visual attribute difference).
    [Figure: template examples, e.g., "Show me more shoes with round toe."; recommended Gender: Men vs. desired Gender: Women yields "I prefer shoes for women."]
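Step 2 (template filling) can be sketched as below: pick an attribute on which the recommended and desired items differ and fill a template with the desired value. The two templates are modeled on the slide's examples; the authors' full template set is not given, so `TEMPLATES` and `synthesize_feedback` are hypothetical. In the paper, such synthesized sentences are then used to train the Seq2seq simulator (step 3), which this sketch does not cover.

```python
import random
from typing import Dict, List, Optional

# Hypothetical templates modeled on the slide's two examples.
TEMPLATES: Dict[str, List[str]] = {
    "toe_style": ["Show me more shoes with {value} toe."],
    "gender": ["I prefer shoes for {value}."],
}

def synthesize_feedback(recommended: Dict[str, str], desired: Dict[str, str],
                        rng: random.Random) -> Optional[str]:
    """Describe one attribute on which the recommended item differs from the desired one."""
    differing = [name for name in desired
                 if name in TEMPLATES and recommended.get(name) != desired[name]]
    if not differing:
        return None  # no templated attribute differs
    name = rng.choice(differing)
    template = rng.choice(TEMPLATES[name])
    return template.format(value=desired[name].lower())

rec = {"gender": "Men", "toe_style": "Round"}
des = {"gender": "Women", "toe_style": "Round"}
print(synthesize_feedback(rec, des, random.Random(0)))  # I prefer shoes for women.
```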
  • 8. Reward Constrained Recommendation: The authors propose Reward Constrained Recommendation (RCR), which sequentially incorporates constraints from previous feedback through:
    • a constraint-augmented RL problem setting, and
    • a learnable discriminator that detects violations of user preferences in an adversarial manner.
  • 9. MDP & Constrained MDP: MDP (Markov Decision Process) vs. Constrained MDP.
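The formulas on this slide did not survive extraction. For reference, the standard textbook definitions are sketched below; the notation (J_R, J_C, cost function C, threshold α) follows common constrained-MDP usage and is an assumption, not a transcription of the slide.

```latex
% MDP: states S, actions A, transition kernel P, reward r, discount gamma
\mathcal{M} = (\mathcal{S}, \mathcal{A}, P, r, \gamma), \qquad
\max_{\pi} \; J_R(\pi) = \mathbb{E}_{\tau \sim \pi}\Big[\textstyle\sum_{t} \gamma^{t}\, r(s_t, a_t)\Big]

% Constrained MDP: add a constraint cost C and a threshold alpha
\max_{\pi} \; J_R(\pi) \quad \text{s.t.} \quad
J_C(\pi) = \mathbb{E}_{\tau \sim \pi}\Big[\textstyle\sum_{t} \gamma^{t}\, C(s_t, a_t)\Big] \le \alpha
```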
  • 10. Recommendation as MDP: The recommendation-feedback loop can be modeled, abstractly, as an MDP. [Figure: Recommender/Seeker loop over steps t = 1…4: in state s_t the recommender takes action a_t, the seeker replies with feedback x_t, and the reward r_t is unknown (marked "?")]
  • 11. Reminder of the dataset: UT-Zappos50K, a shoe dataset consisting of 50,025 shoe images with rich attribute data (shoe category (4), shoe subcategory (21), heel height (7), closure (18), gender (8), toe style (17)). [Figure: sample images and their labels]
  • 12. Reward Constrained Recommender Model: three components:
    • Feature Extractor: extracts features of the feedback and of the recommended items
    • Recommender: predicts attributes, matches them to items, and recommends
    • Discriminator: prevents constraint violations
  • 13. Feature Extractor: Visual Encoder = ResNet50 [1] + AttrNet (pretrained); Textual Encoder = Embedding + LSTM + FC.
    [1] He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
  • 14. Feature Extractor (continued): [Figure: training-time attribute labels of an item, e.g., Cat: Shoes, SubCat: Dress shoes, HeelHei.: X, Closure: …]
  • 15. Feature Extractor (continued): [Figure: Visual Encoder, with ResNet50 and AttrNet outputs concatenated; attribute labels used at training time]
  • 16. Feature Extractor (continued): [Figure: AttrNet maps ResNet features to per-attribute outputs: Category (4), SubCategory (21), Heel Height (7), …]
  • 17. Recommender: The policy π_θ samples attribute values, and the item closest to the sampled values under Euclidean distance in the visual attribute space is recommended. [Figure: input feature representation]
  • 18. Recommender (continued): [Figure: policy π_θ with a multi-discrete action space; from the feature representation, a separate FC + softmax head per attribute (Category (4), SubCategory (21), Heel Height (7), …) is followed by categorical sampling]
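A minimal sketch of the multi-discrete action described above: each attribute head turns its logits into a softmax distribution and one value index is sampled per head. It assumes the heads are sampled independently given the shared feature representation, so log π_θ(a|s) is the sum of per-head log-probabilities; the FC layers are omitted and the logits are taken as given. Function names are hypothetical.

```python
import math
import random
from typing import Dict, List

def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one attribute head."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_attributes(head_logits: Dict[str, List[float]],
                      rng: random.Random) -> Dict[str, int]:
    """Categorical sampling: one value index per attribute head."""
    action = {}
    for name, logits in head_logits.items():
        probs = softmax(logits)
        action[name] = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return action

def log_prob(head_logits: Dict[str, List[float]], action: Dict[str, int]) -> float:
    """log pi(a|s) under the independence assumption: sum of per-head log-probabilities."""
    return sum(math.log(softmax(head_logits[name])[idx]) for name, idx in action.items())

# Toy logits for two heads (uniform here); a policy network would produce these.
logits = {"category": [0.0] * 4, "subcategory": [0.0] * 21}
action = sample_attributes(logits, random.Random(0))
print(action, log_prob(logits, action))
```

The summed log-probability is what a policy-gradient update would differentiate with respect to θ.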
  • 19. Recommender (continued): [Figure: the same policy, with the Visual Encoder (ResNet50 + AttrNet) added]
  • 20. Recommender (continued): [Figure: categorical sampling results, e.g., Category = shoes as [1,0,0,0], SubCat = heel as [0,0,0,1,…], Heel.H = 3 inch as [0,0,1,0,…], are matched to items by Euclidean distance (distance-based matching)]
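Distance-based matching can be sketched as a nearest-neighbor lookup: the concatenated one-hot sample is compared with each catalog item's attribute vector, and the closest item under Euclidean distance is recommended. The sketch assumes each item is represented in the same attribute space (for instance, by AttrNet's predicted per-attribute probabilities); the catalog below is made up and uses a toy 2 + 2 dimensional space.

```python
import math
from typing import Dict, Sequence

def match_item(sampled: Sequence[float],
               catalog: Dict[str, Sequence[float]]) -> str:
    """Return the id of the catalog item closest (Euclidean) to the sampled attribute vector."""
    return min(catalog, key=lambda item_id: math.dist(sampled, catalog[item_id]))

# Hypothetical catalog: two attribute groups with two values each.
catalog = {
    "boot_01": [0.1, 0.9, 0.8, 0.2],
    "heel_07": [0.9, 0.1, 0.3, 0.7],
}
sampled = [1, 0, 0, 1]  # one-hot sample: value 0 of group 1, value 1 of group 2
print(match_item(sampled, catalog))  # heel_07
```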
  • 21. Reward function: The reward measures the visual and attribute similarity between the recommended and desired items.
    • The recommended item should become more similar to the desired one as interaction continues, so the reward is designed to minimize both the visual and the attribute difference.
    • The two distances are scaled so that their magnitudes are similar.
    • If the system cannot find the desired item within 50 iterations, it receives an extra reward of -3 as a penalty.
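The slide's exact reward formula is not recoverable, so the following is only one plausible form consistent with the bullets: the negative of the visual distance plus a scaled attribute distance, with the -3 penalty added when the 50-turn budget runs out without success. `attr_scale` is a hypothetical stand-in for whatever factor the authors use to balance the two distances.

```python
import math
from typing import Sequence

def reward(rec_visual: Sequence[float], des_visual: Sequence[float],
           rec_attr: Sequence[float], des_attr: Sequence[float],
           turn: int, found: bool,
           attr_scale: float = 1.0,        # hypothetical balancing factor
           max_turns: int = 50,            # budget from the slide
           failure_penalty: float = -3.0   # penalty from the slide
           ) -> float:
    """Assumed form: -(visual distance + scaled attribute distance), plus a failure penalty."""
    r = -(math.dist(rec_visual, des_visual) + attr_scale * math.dist(rec_attr, des_attr))
    if not found and turn >= max_turns:
        r += failure_penalty
    return r

print(reward([0, 0], [3, 4], [1, 0], [1, 0], turn=1, found=False))   # -5.0
print(reward([0, 0], [3, 4], [1, 0], [1, 0], turn=50, found=False))  # -8.0
```

Because distances are non-negative, the reward is maximal (zero) exactly when the recommended item matches the desired one in both spaces.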
  • 22. Why are explicit constraints needed? RL algorithms that do not consider constraints easily violate preferences expressed in past feedback, because they must explore new items to improve further. [Figure: a success case and a failure case of Recommender/Seeker interaction]
  • 23. Discriminator: The discriminator C_φ outputs whether the recommended item violates the user's comments. [Figure: feedback history, e.g., x_{t-1}: "I prefer leather.", x_t: "I prefer high heel.", …]
  • 24. Collecting the (non-)violation distribution: [Figure: one user session runs until it finishes]
  • 25. Collecting the (non-)violation distribution: [Figure: one user session; a non-violation pair is highlighted]
  • 26. Collecting the (non-)violation distribution: [Figure: one user session; a violation pair is highlighted]
  • 27. Collecting the (non-)violation distribution: [Figure: one user session; another non-violation pair is highlighted]
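One way to turn a finished session into (non-)violation pairs is sketched below. It assumes each comment can be parsed into an (attribute, requested value) constraint and that the item recommended at turn t violates the history if it contradicts any comment given before turn t. The parsing step, the `material` attribute, and the session data are hypothetical; the slides do not specify how pairs are extracted.

```python
from typing import Dict, List, Tuple

Constraint = Tuple[str, str]  # hypothetical parsed comment: (attribute, requested value)
Pair = Tuple[Tuple[Constraint, ...], Dict[str, str], int]

def label_session(recommended: List[Dict[str, str]],
                  feedback: List[Constraint]) -> List[Pair]:
    """Pair the feedback history before turn t with the item recommended at turn t.

    feedback[t] is the comment on recommended[t]; label 1 means the item at
    turn t contradicts at least one earlier comment (a violation pair).
    """
    pairs: List[Pair] = []
    for t, item in enumerate(recommended):
        history = tuple(feedback[:t])
        violated = any(item.get(attr) != value for attr, value in history)
        pairs.append((history, item, int(violated)))
    return pairs

session_items = [
    {"material": "fabric", "heel": "flat"},
    {"material": "leather", "heel": "flat"},
    {"material": "fabric", "heel": "high"},   # ignores the earlier "leather" comment
    {"material": "leather", "heel": "high"},  # desired item found; no further comment
]
session_feedback = [("material", "leather"), ("heel", "high"), ("material", "leather")]
pairs = label_session(session_items, session_feedback)
print([label for _, _, label in pairs])  # [0, 0, 1, 0]
```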
  • 28. Discriminator: The discriminator is defined as a constraint function and trained so that:
    • C_φ(s, a) is pushed toward 1 for a violation pair, and
    • C_φ(s, a) is pushed toward 0 for a non-violation pair.
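The training target above is a binary classification objective. The sketch below uses a logistic-regression stand-in trained by SGD on binary cross-entropy, with label 1 for violation and 0 for non-violation; the paper's discriminator is a neural network over encoded feedback history and item features, so the fixed-length feature vectors here are a hypothetical simplification.

```python
import math
from typing import List, Sequence, Tuple

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """C(s, a) as the probability that the pair is a violation."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train_discriminator(features: List[Sequence[float]], labels: List[int],
                        lr: float = 0.5, epochs: int = 200) -> Tuple[List[float], float]:
    """SGD on binary cross-entropy; labels are 1 (violation) or 0 (non-violation)."""
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            grad = predict(w, b, x) - y  # d(BCE)/d(logit)
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b

# Toy features: [1, 0] stands for a violation pair, [0, 1] for a non-violation pair.
w, b = train_discriminator([[1.0, 0.0], [0.0, 1.0]], [1, 0])
print(predict(w, b, [1.0, 0.0]), predict(w, b, [0.0, 1.0]))
```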
  • 29. Collecting the (non-)violation distribution: The discriminator is updated after each user session and cannot be pretrained: judging whether a recommendation violates earlier feedback requires sequential feedback, which the dataset lacks (it is available only from the user simulator). [Figure: one user session runs until it finishes]
  • 30. Reminder: Reward Constrained Recommender Model: Feature Extractor (extracts features of feedback and recommended items), Discriminator C_φ(s, a) (prevents constraint violations), and Recommender π_θ(a|s) (predicts attributes, matches, and recommends).
  • 31. Recommendation as Constrained MDP: Solving the constrained optimization directly is difficult, so Lagrange relaxation transforms the objective into a dual problem. [Equations: primal problem; dual problem (see Appendix: Lagrange Relaxation); Lagrangian function with the Lagrange multiplier; relaxed objective]
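Since the slide's equations are missing, here is a sketch of the standard Lagrange relaxation for this setting, assuming the constraint is the expected discounted output of the discriminator C_φ bounded by the threshold α. The symbols J_R, J_C, and r̂ are notation chosen for illustration, not taken from the slide.

```latex
% Primal: maximize expected return subject to the expected constraint value staying below alpha
\max_{\theta} \; J_R(\pi_\theta) \quad \text{s.t.} \quad J_C(\pi_\theta) \le \alpha, \qquad
J_R(\pi_\theta) = \mathbb{E}_{\pi_\theta}\Big[\textstyle\sum_t \gamma^t\, r(s_t, a_t)\Big], \quad
J_C(\pi_\theta) = \mathbb{E}_{\pi_\theta}\Big[\textstyle\sum_t \gamma^t\, C_\phi(s_t, a_t)\Big]

% Lagrangian with multiplier lambda >= 0
\mathcal{L}(\lambda, \theta) = J_R(\pi_\theta) - \lambda \big( J_C(\pi_\theta) - \alpha \big)

% Dual (saddle-point) problem
\min_{\lambda \ge 0} \; \max_{\theta} \; \mathcal{L}(\lambda, \theta)

% For fixed lambda, the term lambda * alpha is constant in theta, so the policy maximizes
% the return of the penalized reward
\hat{r}(s_t, a_t) = r(s_t, a_t) - \lambda\, C_\phi(s_t, a_t)
```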
  • 32. Recommendation as Constrained MDP: The goal is to find a saddle point, which can be approximated by alternating gradient descent and ascent. The reward function with constraints penalizes the policy for violations, and λ is also optimized to enforce the constraints:
    1. If violations occur, λ increases to penalize the policy.
    2. If there are no violations, λ decreases to give the policy more reward.
  • 33. Model Training (Reward Constrained Recommendation process): The discriminator C_φ and the recommender π_θ are trained alternately. One projection operator keeps training stable by keeping parameter updates within a trust region [1]; another projects λ into the range [0, λ_max]. [Figure: one user session]
    [1] Schulman, John, et al. "Trust region policy optimization." International Conference on Machine Learning. 2015.
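The λ behavior described on the previous two slides follows from a projected descent step on the Lagrangian: since ∂L/∂λ = -(J_C - α), descending in λ adds a step proportional to (mean constraint value - α), and the result is clipped to [0, λ_max]. This is a sketch under that assumption; the per-session averaging of C_φ, the step size, and the function names are hypothetical, and the trust-region update for θ is not shown.

```python
def penalized_reward(r: float, c: float, lam: float) -> float:
    """Per-step reward with constraint penalty: r - lambda * C_phi(s, a)."""
    return r - lam * c

def update_lambda(lam: float, mean_c: float, alpha: float,
                  lr: float, lam_max: float) -> float:
    """Projected dual step: grow lambda when mean_c exceeds alpha, then clip to [0, lam_max]."""
    lam = lam + lr * (mean_c - alpha)
    return min(max(lam, 0.0), lam_max)

# Hypothetical per-session mean discriminator outputs: many violations early, few later.
lam = 0.0
lams = []
for mean_c in [0.8, 0.6, 0.3, 0.05, 0.0]:
    lam = update_lambda(lam, mean_c, alpha=0.1, lr=0.1, lam_max=1.0)
    lams.append(lam)
print(lams)
```

λ rises while the violation rate exceeds α and falls once it drops below, matching points 1 and 2 on slide 32.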
  • 34. Evaluation: Metrics:
    • SR@K: success rate after K interactions
    • NI: number of user interactions before success
    • NV: number of violated attributes compared with the user's desired attributes
    λ increases in the early stage of training (as violations are frequent) and then stabilizes; the automatically learned discriminator weight settles at λ ≈ 0.04.
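A minimal sketch of how the three metrics could be computed from session logs. It assumes each session records the turn at which the desired item was found (or `None` on failure), that failed sessions count as the 50-turn budget for NI, and that NV counts mismatched attributes for one recommendation; these conventions are assumptions, since the slides give only the metric names.

```python
from typing import Dict, List, Optional

def success_rate_at_k(success_turns: List[Optional[int]], k: int) -> float:
    """SR@K: fraction of sessions whose desired item was found within K interactions."""
    return sum(1 for t in success_turns if t is not None and t <= k) / len(success_turns)

def mean_interactions(success_turns: List[Optional[int]], max_turns: int = 50) -> float:
    """NI: mean interactions until success; failed sessions count as max_turns (assumption)."""
    return sum(t if t is not None else max_turns for t in success_turns) / len(success_turns)

def num_violated(recommended: Dict[str, str], desired: Dict[str, str]) -> int:
    """NV for one recommendation: attributes whose value differs from the desired item's."""
    return sum(1 for name, value in desired.items() if recommended.get(name) != value)

turns = [3, 10, None, 1]  # hypothetical logs from four sessions
print(success_rate_at_k(turns, 5), success_rate_at_k(turns, 10), mean_interactions(turns))
# 0.5 0.75 16.0
```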
  • 35. Evaluation: Baselines:
    • RL baseline: ignores the constraints.
    • RL + naive constraints: uses a fixed Lagrange multiplier λ.
    All models are trained for 100,000 iterations (user sessions). "Seen" refers to training data and "Unseen" to test data; results are averaged over 100 sessions with standard error. The learned constraint (discriminator) generalizes better.
  • 36. Conclusion: The authors propose Reward Constrained Recommendation (RCR), which sequentially incorporates constraints from previous feedback through a constraint-augmented RL problem setting and a learnable discriminator that detects violations of user preferences in an adversarial manner. The proposed method can be extended to other applications, such as:
    1. vision-and-dialogue navigation
    2. interactive recommendation with the user's prior information
    3. dialogue-based recommendation
  • 38. Appendix: Generated feedback: The simulator generates only simple comments on the visual attribute difference between the candidate image and the desired image.
  • 39. Appendix: Hyperparameter setting: For reinforcement learning, they use Adam as the optimizer and set:
    • α: threshold of the constraints (refer to page 15)
    • λ_max: projection boundary of λ