SlideShare a Scribd company logo
“Deep Reinforcement Learning
based Recommendation with
Explicit User-Item Interactions
Modeling“
by Feng Liu∗, Ruiming Tang†, Xutao Li∗, Weinan Zhang‡Yunming Ye∗, Haokun Chen‡, Huifeng Guo†and Yuzhou
Zhang
Presented by Kishor Datta Gupta
Problem
• A Recommender System refers to
a system that is capable of
predicting the future preference of
a set of items for a user, and
recommend the top items..
• How to build an effective
recommender system?
Recommender
System
Analyzed
Content-based collaborative filtering
Matrix factorization based methods
Logistic regression
Factorization machines and its variants
Deep learning models
Multi-armed bandits
Problem in
Existing systems
• They consider the
recommendation procedure as
a static process, i.e., they
assume the underlying user’s
preference keeps static and
they aim to learn the user’s
preference as precise as
possible.
• They are learned to maximize
the immediate rewards of
recommendations, but ignore
the long-term benefits that the
recommendations can make
Analysis on sequential patterns on user’s behavior in
MovieLens and Yahoo!Music datasets
Proposed Solution
• A deep reinforcement learning based recommendation
framework DRR. Unlike the conventional studies, DRR
adopts an “Actor-Critic” structure and treats the
recommendation as a sequential decision making
process, which takes both the immediate and long-
term rewards into consideration
Deep RL based Recommendation (DRR) Framework
Deep RL based Recommendation (DRR) Framework
The Actor network:
The user state, denoted by the embeddings of
her n latest positively interacted items, is
regarded as the input. Then the embeddings are
fed into a state representation module (which will
be introduced in detail later) to produce a
summarized representations for the user.
The top ranked item (w.r.t. the ranking scores) is
recommended to the user.
used ε-greedy exploration technique.
Deep RL based Recommendation (DRR) Framework
The Critic network:
According to the Q-value, the
parameters of the Actor
network are updated in the
direction of improving the
performance of action a Based
on the deterministic policy
gradient theorem
Critic network is updated
accordingly by the temporal-
difference learning approach ,
i.e., minimizing the mean
squared error
Deep RL based Recommendation (DRR) Framework
State Representation Module:
Modeling the feature
interactions explicitly can boost
the performance
DRR-p.
DRR-u
DRR-Ave
Deep RL based Recommendation (DRR) Framework
DRR-p.
Product based neural network for the state representation module
utilizes a product operator to capture the pairwise local dependency between
items.
Deep RL based Recommendation (DRR) Framework
DRR-u.
In DRR-u, we can see that the user embedding is also incorporated. In addition to
the local dependency between items, the pairwise interactions of user-item are
also taken into account.
Deep RL based Recommendation (DRR) Framework
DRR-Ave.
the embeddings of items are first transformed by a weighted average pooling
layer. Then, the resulting vector is leveraged to model the interactions with the
input user. Finally, the embedding of the user, the interaction vector, and the
average pooling result of items are concatenate into a vector to denote the state
representation.
Deep RL based Recommendation (DRR) Training
DRR utilizes the users’ interaction history with the recommender agent as training data.
During the procedure, the recommender takes an action at following the current recommendation policy πθ(st) after observing the
user (environment) state st, then it obtains the feedback (reward)rt from the user, and the user state is updated to st+1.
According to the feedback, the recommender updates its recommendation policy.
The training procedure mainly includes two phases, i.e., transition generation and model updating.
• For the first stage, the recommender observes the current statest that is calculated by the proposed state representation module, then generates an
actionat=πθ(st) according to the current policy πθ with ε-greedy exploration, and recommends an itemit according to the action. Subsequently, the reward rt can
be calculated based on the feedback of the user to the recommended item it, and the user state is updated Finally, the recommender agent stores the
transition(st,at,rt,st+1)into the replay bufferD.
• In the second stage, the model updating, the recommender samples a minibatch of N transitions with widely used prioritized experience replay sampling
technique. Then, the recommender updates the parameters of the Actor network and Critic network. Finally, the recommender updates the target networks’
parameters with the soft replace strategy.
Experiments
Dataset
•Movielens 100k
•Movielens 1m
•Yahoo Music
•Jester
Comparison Method
•Popularity recommends the most popular item, i.e., the item with the highest average rating or the items with largest number
of positive ratings from current available items to the users at each timestep.
•PMF makes a matrix decomposition as SVD, while it only takes into account the non zero elements.
•SVD++mixes strengths of the latent model as well as the neighborhood model
•DRR-n simply utilizes the concatenation of the item embeddings to represent user state, which is widelyused in previous studies.
Reward Function
•in timestep t, the recommender agent recommends an item j to user i,(denoted as action a in states), and the rating rate i,j
comes from the interaction logs if user I actually rates item j, or from a predicted value by the simulator otherwise. Therefore,
the reward function can be defined as follows: R(s,a) =rate i,j/10
Results
Normalized Discounted Cumulative Gain (NDCG)
Claims
both the immediate and long-term
rewards into account. Incorporated and
three instantiation structures are
designed, which can explicitly model the
interactions between users and items.
Extensive experiments on four real-world
datasets demonstrate the superiority of
the proposed DRR method over state-of-
the-art competitors
My thaughts
A neural network to convert user-
item relation into a state is not clear
to me.
Authors didn’t consider cold-start
problem.
Questions ?

More Related Content

What's hot

Time, Context and Causality in Recommender Systems
Time, Context and Causality in Recommender SystemsTime, Context and Causality in Recommender Systems
Time, Context and Causality in Recommender Systems
Yves Raimond
 
Recent Trends in Personalization at Netflix
Recent Trends in Personalization at NetflixRecent Trends in Personalization at Netflix
Recent Trends in Personalization at Netflix
Justin Basilico
 
Contextualization at Netflix
Contextualization at NetflixContextualization at Netflix
Contextualization at Netflix
Linas Baltrunas
 
Counterfactual Learning for Recommendation
Counterfactual Learning for RecommendationCounterfactual Learning for Recommendation
Counterfactual Learning for Recommendation
Olivier Jeunen
 
Homepage Personalization at Spotify
Homepage Personalization at SpotifyHomepage Personalization at Spotify
Homepage Personalization at Spotify
Oguz Semerci
 
An introduction to deep reinforcement learning
An introduction to deep reinforcement learningAn introduction to deep reinforcement learning
An introduction to deep reinforcement learning
Big Data Colombia
 
Personalized Page Generation for Browsing Recommendations
Personalized Page Generation for Browsing RecommendationsPersonalized Page Generation for Browsing Recommendations
Personalized Page Generation for Browsing Recommendations
Justin Basilico
 
Activity Ranking in LinkedIn Feed
Activity Ranking in LinkedIn FeedActivity Ranking in LinkedIn Feed
Activity Ranking in LinkedIn Feed
Bodla Kumar
 
Recent Trends in Personalization: A Netflix Perspective
Recent Trends in Personalization: A Netflix PerspectiveRecent Trends in Personalization: A Netflix Perspective
Recent Trends in Personalization: A Netflix Perspective
Justin Basilico
 
Personalized Playlists at Spotify
Personalized Playlists at SpotifyPersonalized Playlists at Spotify
Personalized Playlists at Spotify
Rohan Agrawal
 
Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it! Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it!
Sudeep Das, Ph.D.
 
Netflix talk at ML Platform meetup Sep 2019
Netflix talk at ML Platform meetup Sep 2019Netflix talk at ML Platform meetup Sep 2019
Netflix talk at ML Platform meetup Sep 2019
Faisal Siddiqi
 
Approximate nearest neighbor methods and vector models – NYC ML meetup
Approximate nearest neighbor methods and vector models – NYC ML meetupApproximate nearest neighbor methods and vector models – NYC ML meetup
Approximate nearest neighbor methods and vector models – NYC ML meetup
Erik Bernhardsson
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
Muhammad Iqbal Tawakal
 
Overview of recommender system
Overview of recommender systemOverview of recommender system
Overview of recommender system
Stanley Wang
 
Generative Adversarial Networks (GANs) - Ian Goodfellow, OpenAI
Generative Adversarial Networks (GANs) - Ian Goodfellow, OpenAIGenerative Adversarial Networks (GANs) - Ian Goodfellow, OpenAI
Generative Adversarial Networks (GANs) - Ian Goodfellow, OpenAI
WithTheBest
 
Fair Recommender Systems
Fair Recommender Systems Fair Recommender Systems
Fair Recommender Systems
Sharmistha Chatterjee
 
Interactive Recommender Systems with Netflix and Spotify
Interactive Recommender Systems with Netflix and SpotifyInteractive Recommender Systems with Netflix and Spotify
Interactive Recommender Systems with Netflix and Spotify
Chris Johnson
 
Learning to Personalize
Learning to PersonalizeLearning to Personalize
Learning to Personalize
Justin Basilico
 
Classification Based Machine Learning Algorithms
Classification Based Machine Learning AlgorithmsClassification Based Machine Learning Algorithms
Classification Based Machine Learning Algorithms
Md. Main Uddin Rony
 

What's hot (20)

Time, Context and Causality in Recommender Systems
Time, Context and Causality in Recommender SystemsTime, Context and Causality in Recommender Systems
Time, Context and Causality in Recommender Systems
 
Recent Trends in Personalization at Netflix
Recent Trends in Personalization at NetflixRecent Trends in Personalization at Netflix
Recent Trends in Personalization at Netflix
 
Contextualization at Netflix
Contextualization at NetflixContextualization at Netflix
Contextualization at Netflix
 
Counterfactual Learning for Recommendation
Counterfactual Learning for RecommendationCounterfactual Learning for Recommendation
Counterfactual Learning for Recommendation
 
Homepage Personalization at Spotify
Homepage Personalization at SpotifyHomepage Personalization at Spotify
Homepage Personalization at Spotify
 
An introduction to deep reinforcement learning
An introduction to deep reinforcement learningAn introduction to deep reinforcement learning
An introduction to deep reinforcement learning
 
Personalized Page Generation for Browsing Recommendations
Personalized Page Generation for Browsing RecommendationsPersonalized Page Generation for Browsing Recommendations
Personalized Page Generation for Browsing Recommendations
 
Activity Ranking in LinkedIn Feed
Activity Ranking in LinkedIn FeedActivity Ranking in LinkedIn Feed
Activity Ranking in LinkedIn Feed
 
Recent Trends in Personalization: A Netflix Perspective
Recent Trends in Personalization: A Netflix PerspectiveRecent Trends in Personalization: A Netflix Perspective
Recent Trends in Personalization: A Netflix Perspective
 
Personalized Playlists at Spotify
Personalized Playlists at SpotifyPersonalized Playlists at Spotify
Personalized Playlists at Spotify
 
Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it! Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it!
 
Netflix talk at ML Platform meetup Sep 2019
Netflix talk at ML Platform meetup Sep 2019Netflix talk at ML Platform meetup Sep 2019
Netflix talk at ML Platform meetup Sep 2019
 
Approximate nearest neighbor methods and vector models – NYC ML meetup
Approximate nearest neighbor methods and vector models – NYC ML meetupApproximate nearest neighbor methods and vector models – NYC ML meetup
Approximate nearest neighbor methods and vector models – NYC ML meetup
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
 
Overview of recommender system
Overview of recommender systemOverview of recommender system
Overview of recommender system
 
Generative Adversarial Networks (GANs) - Ian Goodfellow, OpenAI
Generative Adversarial Networks (GANs) - Ian Goodfellow, OpenAIGenerative Adversarial Networks (GANs) - Ian Goodfellow, OpenAI
Generative Adversarial Networks (GANs) - Ian Goodfellow, OpenAI
 
Fair Recommender Systems
Fair Recommender Systems Fair Recommender Systems
Fair Recommender Systems
 
Interactive Recommender Systems with Netflix and Spotify
Interactive Recommender Systems with Netflix and SpotifyInteractive Recommender Systems with Netflix and Spotify
Interactive Recommender Systems with Netflix and Spotify
 
Learning to Personalize
Learning to PersonalizeLearning to Personalize
Learning to Personalize
 
Classification Based Machine Learning Algorithms
Classification Based Machine Learning AlgorithmsClassification Based Machine Learning Algorithms
Classification Based Machine Learning Algorithms
 

Similar to Deep Reinforcement Learning based Recommendation with Explicit User-ItemInteractions Modeling

Collaborative Filtering Recommendation System
Collaborative Filtering Recommendation SystemCollaborative Filtering Recommendation System
Collaborative Filtering Recommendation System
Milind Gokhale
 
Recommendation Systems
Recommendation SystemsRecommendation Systems
Recommendation Systems
Robin Reni
 
Download
DownloadDownload
Download
butest
 
Download
DownloadDownload
Download
butest
 
Movie Recommender System Using Artificial Intelligence
Movie Recommender System Using Artificial Intelligence Movie Recommender System Using Artificial Intelligence
Movie Recommender System Using Artificial Intelligence
Shrutika Oswal
 
Item Based Collaborative Filtering Recommendation Algorithms
Item Based Collaborative Filtering Recommendation AlgorithmsItem Based Collaborative Filtering Recommendation Algorithms
Item Based Collaborative Filtering Recommendation Algorithms
nextlib
 
B1802021823
B1802021823B1802021823
B1802021823
IOSR Journals
 
Summer internship 2014 report by Rishabh Misra, Thapar University
Summer internship 2014 report by Rishabh Misra, Thapar UniversitySummer internship 2014 report by Rishabh Misra, Thapar University
Summer internship 2014 report by Rishabh Misra, Thapar University
Rishabh Misra
 
Major_Project_Presentaion_B14.pptx
Major_Project_Presentaion_B14.pptxMajor_Project_Presentaion_B14.pptx
Major_Project_Presentaion_B14.pptx
LokeshKumarReddy8
 
Recommender Systems from A to Z – The Right Dataset
Recommender Systems from A to Z – The Right DatasetRecommender Systems from A to Z – The Right Dataset
Recommender Systems from A to Z – The Right Dataset
Crossing Minds
 
[Decisions2013@RecSys]The Role of Emotions in Context-aware Recommendation
[Decisions2013@RecSys]The Role of Emotions in Context-aware Recommendation[Decisions2013@RecSys]The Role of Emotions in Context-aware Recommendation
[Decisions2013@RecSys]The Role of Emotions in Context-aware Recommendation
YONG ZHENG
 
Movie recommendation Engine using Artificial Intelligence
Movie recommendation Engine using Artificial IntelligenceMovie recommendation Engine using Artificial Intelligence
Movie recommendation Engine using Artificial Intelligence
Harivamshi D
 
Item basedcollaborativefilteringrecommendationalgorithms
Item basedcollaborativefilteringrecommendationalgorithmsItem basedcollaborativefilteringrecommendationalgorithms
Item basedcollaborativefilteringrecommendationalgorithms
Aravindharamanan S
 
Evaluating and Enhancing Efficiency of Recommendation System using Big Data A...
Evaluating and Enhancing Efficiency of Recommendation System using Big Data A...Evaluating and Enhancing Efficiency of Recommendation System using Big Data A...
Evaluating and Enhancing Efficiency of Recommendation System using Big Data A...
IRJET Journal
 
CSTalks-Quaternary Semantics Recomandation System-24 Aug
CSTalks-Quaternary Semantics Recomandation System-24 AugCSTalks-Quaternary Semantics Recomandation System-24 Aug
CSTalks-Quaternary Semantics Recomandation System-24 Aug
cstalks
 
Recommender system
Recommender systemRecommender system
Recommender system
Saiguru P.v
 
Recommendation system
Recommendation systemRecommendation system
Recommendation system
Ding Li
 
Further enhancements of recommender systems using deep learning
Further enhancements of recommender systems using deep learningFurther enhancements of recommender systems using deep learning
Further enhancements of recommender systems using deep learning
Institute of Contemporary Sciences
 
PhD Consortium ADBIS presetation.
PhD Consortium ADBIS presetation.PhD Consortium ADBIS presetation.
PhD Consortium ADBIS presetation.
Giuseppe Ricci
 
Summary of a Recommender Systems Survey paper
Summary of a Recommender Systems Survey paperSummary of a Recommender Systems Survey paper
Summary of a Recommender Systems Survey paper
Changsung Moon
 

Similar to Deep Reinforcement Learning based Recommendation with Explicit User-ItemInteractions Modeling (20)

Collaborative Filtering Recommendation System
Collaborative Filtering Recommendation SystemCollaborative Filtering Recommendation System
Collaborative Filtering Recommendation System
 
Recommendation Systems
Recommendation SystemsRecommendation Systems
Recommendation Systems
 
Download
DownloadDownload
Download
 
Download
DownloadDownload
Download
 
Movie Recommender System Using Artificial Intelligence
Movie Recommender System Using Artificial Intelligence Movie Recommender System Using Artificial Intelligence
Movie Recommender System Using Artificial Intelligence
 
Item Based Collaborative Filtering Recommendation Algorithms
Item Based Collaborative Filtering Recommendation AlgorithmsItem Based Collaborative Filtering Recommendation Algorithms
Item Based Collaborative Filtering Recommendation Algorithms
 
B1802021823
B1802021823B1802021823
B1802021823
 
Summer internship 2014 report by Rishabh Misra, Thapar University
Summer internship 2014 report by Rishabh Misra, Thapar UniversitySummer internship 2014 report by Rishabh Misra, Thapar University
Summer internship 2014 report by Rishabh Misra, Thapar University
 
Major_Project_Presentaion_B14.pptx
Major_Project_Presentaion_B14.pptxMajor_Project_Presentaion_B14.pptx
Major_Project_Presentaion_B14.pptx
 
Recommender Systems from A to Z – The Right Dataset
Recommender Systems from A to Z – The Right DatasetRecommender Systems from A to Z – The Right Dataset
Recommender Systems from A to Z – The Right Dataset
 
[Decisions2013@RecSys]The Role of Emotions in Context-aware Recommendation
[Decisions2013@RecSys]The Role of Emotions in Context-aware Recommendation[Decisions2013@RecSys]The Role of Emotions in Context-aware Recommendation
[Decisions2013@RecSys]The Role of Emotions in Context-aware Recommendation
 
Movie recommendation Engine using Artificial Intelligence
Movie recommendation Engine using Artificial IntelligenceMovie recommendation Engine using Artificial Intelligence
Movie recommendation Engine using Artificial Intelligence
 
Item basedcollaborativefilteringrecommendationalgorithms
Item basedcollaborativefilteringrecommendationalgorithmsItem basedcollaborativefilteringrecommendationalgorithms
Item basedcollaborativefilteringrecommendationalgorithms
 
Evaluating and Enhancing Efficiency of Recommendation System using Big Data A...
Evaluating and Enhancing Efficiency of Recommendation System using Big Data A...Evaluating and Enhancing Efficiency of Recommendation System using Big Data A...
Evaluating and Enhancing Efficiency of Recommendation System using Big Data A...
 
CSTalks-Quaternary Semantics Recomandation System-24 Aug
CSTalks-Quaternary Semantics Recomandation System-24 AugCSTalks-Quaternary Semantics Recomandation System-24 Aug
CSTalks-Quaternary Semantics Recomandation System-24 Aug
 
Recommender system
Recommender systemRecommender system
Recommender system
 
Recommendation system
Recommendation systemRecommendation system
Recommendation system
 
Further enhancements of recommender systems using deep learning
Further enhancements of recommender systems using deep learningFurther enhancements of recommender systems using deep learning
Further enhancements of recommender systems using deep learning
 
PhD Consortium ADBIS presetation.
PhD Consortium ADBIS presetation.PhD Consortium ADBIS presetation.
PhD Consortium ADBIS presetation.
 
Summary of a Recommender Systems Survey paper
Summary of a Recommender Systems Survey paperSummary of a Recommender Systems Survey paper
Summary of a Recommender Systems Survey paper
 

More from Kishor Datta Gupta

GAN introduction.pptx
GAN introduction.pptxGAN introduction.pptx
GAN introduction.pptx
Kishor Datta Gupta
 
Interpretable Learning Model for Lower Dimensional Feature Space: A Case stud...
Interpretable Learning Model for Lower Dimensional Feature Space: A Case stud...Interpretable Learning Model for Lower Dimensional Feature Space: A Case stud...
Interpretable Learning Model for Lower Dimensional Feature Space: A Case stud...
Kishor Datta Gupta
 
A safer approach to build recommendation systems on unidentifiable data
A safer approach to build recommendation systems on unidentifiable dataA safer approach to build recommendation systems on unidentifiable data
A safer approach to build recommendation systems on unidentifiable data
Kishor Datta Gupta
 
Adversarial Attacks and Defense
Adversarial Attacks and DefenseAdversarial Attacks and Defense
Adversarial Attacks and Defense
Kishor Datta Gupta
 
Who is responsible for adversarial defense
Who is responsible for adversarial defenseWho is responsible for adversarial defense
Who is responsible for adversarial defense
Kishor Datta Gupta
 
Robust Filtering Schemes for Machine Learning Systems to Defend Adversarial A...
Robust Filtering Schemes for Machine Learning Systems to Defend Adversarial A...Robust Filtering Schemes for Machine Learning Systems to Defend Adversarial A...
Robust Filtering Schemes for Machine Learning Systems to Defend Adversarial A...
Kishor Datta Gupta
 
Zero shot learning
Zero shot learning Zero shot learning
Zero shot learning
Kishor Datta Gupta
 
Using Negative Detectors for Identifying Adversarial Data Manipulation in Mac...
Using Negative Detectors for Identifying Adversarial Data Manipulation in Mac...Using Negative Detectors for Identifying Adversarial Data Manipulation in Mac...
Using Negative Detectors for Identifying Adversarial Data Manipulation in Mac...
Kishor Datta Gupta
 
Machine learning in computer security
Machine learning in computer securityMachine learning in computer security
Machine learning in computer security
Kishor Datta Gupta
 
Policy Based reinforcement Learning for time series Anomaly detection
Policy Based reinforcement Learning for time series Anomaly detectionPolicy Based reinforcement Learning for time series Anomaly detection
Policy Based reinforcement Learning for time series Anomaly detection
Kishor Datta Gupta
 
Cyber intrusion
Cyber intrusionCyber intrusion
Cyber intrusion
Kishor Datta Gupta
 
understanding the pandemic through mining covid news using natural language p...
understanding the pandemic through mining covid news using natural language p...understanding the pandemic through mining covid news using natural language p...
understanding the pandemic through mining covid news using natural language p...
Kishor Datta Gupta
 
Different representation space for MNIST digit
Different representation space for MNIST digitDifferent representation space for MNIST digit
Different representation space for MNIST digit
Kishor Datta Gupta
 
"Can NLP techniques be utilized as a reliable tool for medical science?" -Bui...
"Can NLP techniques be utilized as a reliable tool for medical science?" -Bui..."Can NLP techniques be utilized as a reliable tool for medical science?" -Bui...
"Can NLP techniques be utilized as a reliable tool for medical science?" -Bui...
Kishor Datta Gupta
 
Applicability issues of Evasion-Based Adversarial Attacks and Mitigation Tech...
Applicability issues of Evasion-Based Adversarial Attacks and Mitigation Tech...Applicability issues of Evasion-Based Adversarial Attacks and Mitigation Tech...
Applicability issues of Evasion-Based Adversarial Attacks and Mitigation Tech...
Kishor Datta Gupta
 
Adversarial Input Detection Using Image Processing Techniques (IPT)
Adversarial Input Detection Using Image Processing Techniques (IPT)Adversarial Input Detection Using Image Processing Techniques (IPT)
Adversarial Input Detection Using Image Processing Techniques (IPT)
Kishor Datta Gupta
 
Clustering report
Clustering reportClustering report
Clustering report
Kishor Datta Gupta
 
Basic digital image concept
Basic digital image conceptBasic digital image concept
Basic digital image concept
Kishor Datta Gupta
 
An empirical study on algorithmic bias (aiml compsac2020)
An empirical study on algorithmic bias (aiml compsac2020)An empirical study on algorithmic bias (aiml compsac2020)
An empirical study on algorithmic bias (aiml compsac2020)
Kishor Datta Gupta
 
Hybrid pow-pos-based-system against majority attack-in-cryptocurrency system ...
Hybrid pow-pos-based-system against majority attack-in-cryptocurrency system ...Hybrid pow-pos-based-system against majority attack-in-cryptocurrency system ...
Hybrid pow-pos-based-system against majority attack-in-cryptocurrency system ...
Kishor Datta Gupta
 

More from Kishor Datta Gupta (20)

GAN introduction.pptx
GAN introduction.pptxGAN introduction.pptx
GAN introduction.pptx
 
Interpretable Learning Model for Lower Dimensional Feature Space: A Case stud...
Interpretable Learning Model for Lower Dimensional Feature Space: A Case stud...Interpretable Learning Model for Lower Dimensional Feature Space: A Case stud...
Interpretable Learning Model for Lower Dimensional Feature Space: A Case stud...
 
A safer approach to build recommendation systems on unidentifiable data
A safer approach to build recommendation systems on unidentifiable dataA safer approach to build recommendation systems on unidentifiable data
A safer approach to build recommendation systems on unidentifiable data
 
Adversarial Attacks and Defense
Adversarial Attacks and DefenseAdversarial Attacks and Defense
Adversarial Attacks and Defense
 
Who is responsible for adversarial defense
Who is responsible for adversarial defenseWho is responsible for adversarial defense
Who is responsible for adversarial defense
 
Robust Filtering Schemes for Machine Learning Systems to Defend Adversarial A...
Robust Filtering Schemes for Machine Learning Systems to Defend Adversarial A...Robust Filtering Schemes for Machine Learning Systems to Defend Adversarial A...
Robust Filtering Schemes for Machine Learning Systems to Defend Adversarial A...
 
Zero shot learning
Zero shot learning Zero shot learning
Zero shot learning
 
Using Negative Detectors for Identifying Adversarial Data Manipulation in Mac...
Using Negative Detectors for Identifying Adversarial Data Manipulation in Mac...Using Negative Detectors for Identifying Adversarial Data Manipulation in Mac...
Using Negative Detectors for Identifying Adversarial Data Manipulation in Mac...
 
Machine learning in computer security
Machine learning in computer securityMachine learning in computer security
Machine learning in computer security
 
Policy Based reinforcement Learning for time series Anomaly detection
Policy Based reinforcement Learning for time series Anomaly detectionPolicy Based reinforcement Learning for time series Anomaly detection
Policy Based reinforcement Learning for time series Anomaly detection
 
Cyber intrusion
Cyber intrusionCyber intrusion
Cyber intrusion
 
understanding the pandemic through mining covid news using natural language p...
understanding the pandemic through mining covid news using natural language p...understanding the pandemic through mining covid news using natural language p...
understanding the pandemic through mining covid news using natural language p...
 
Different representation space for MNIST digit
Different representation space for MNIST digitDifferent representation space for MNIST digit
Different representation space for MNIST digit
 
"Can NLP techniques be utilized as a reliable tool for medical science?" -Bui...
"Can NLP techniques be utilized as a reliable tool for medical science?" -Bui..."Can NLP techniques be utilized as a reliable tool for medical science?" -Bui...
"Can NLP techniques be utilized as a reliable tool for medical science?" -Bui...
 
Applicability issues of Evasion-Based Adversarial Attacks and Mitigation Tech...
Applicability issues of Evasion-Based Adversarial Attacks and Mitigation Tech...Applicability issues of Evasion-Based Adversarial Attacks and Mitigation Tech...
Applicability issues of Evasion-Based Adversarial Attacks and Mitigation Tech...
 
Adversarial Input Detection Using Image Processing Techniques (IPT)
Adversarial Input Detection Using Image Processing Techniques (IPT)Adversarial Input Detection Using Image Processing Techniques (IPT)
Adversarial Input Detection Using Image Processing Techniques (IPT)
 
Clustering report
Clustering reportClustering report
Clustering report
 
Basic digital image concept
Basic digital image conceptBasic digital image concept
Basic digital image concept
 
An empirical study on algorithmic bias (aiml compsac2020)
An empirical study on algorithmic bias (aiml compsac2020)An empirical study on algorithmic bias (aiml compsac2020)
An empirical study on algorithmic bias (aiml compsac2020)
 
Hybrid pow-pos-based-system against majority attack-in-cryptocurrency system ...
Hybrid pow-pos-based-system against majority attack-in-cryptocurrency system ...Hybrid pow-pos-based-system against majority attack-in-cryptocurrency system ...
Hybrid pow-pos-based-system against majority attack-in-cryptocurrency system ...
 

Recently uploaded

Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
Zilliz
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
Claudio Di Ciccio
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
Neo4j
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
IndexBug
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
SOFTTECHHUB
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Speck&Tech
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 

Recently uploaded (20)

Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 

Deep Reinforcement Learning based Recommendation with Explicit User-ItemInteractions Modeling

  • 1. “Deep Reinforcement Learning based Recommendation with Explicit User-Item Interactions Modeling“ by Feng Liu∗, Ruiming Tang†, Xutao Li∗, Weinan Zhang‡Yunming Ye∗, Haokun Chen‡, Huifeng Guo†and Yuzhou Zhang Presented by Kishor Datta Gupta
  • 2. Problem • A Recommender System refers to a system that is capable of predicting the future preference of a set of items for a user, and recommend the top items.. • How to build an effective recommender system?
  • 3. Recommender System Analyzed Content-based collaborative filtering Matrix factorization based methods Logistic regression Factorization machines and its variants Deep learning models Multi-armed bandits
  • 4. Problem in Existing systems • They consider the recommendation procedure as a static process, i.e., they assume the underlying user’s preference keeps static and they aim to learn the user’s preference as precise as possible. • They are learned to maximize the immediate rewards of recommendations, but ignore the long-term benefits that the recommendations can make Analysis on sequential patterns on user’s behavior in MovieLens and Yahoo!Music datasets
  • 5. Proposed Solution • A deep reinforcement learning based recommendation framework DRR. Unlike the conventional studies, DRR adopts an “Actor-Critic” structure and treats the recommendation as a sequential decision making process, which takes both the immediate and long- term rewards into consideration
  • 6. Deep RL based Recommendation (DRR) Framework
  • 7. Deep RL based Recommendation (DRR) Framework The Actor network: The user state, denoted by the embeddings of her n latest positively interacted items, is regarded as the input. Then the embeddings are fed into a state representation module (which will be introduced in detail later) to produce a summarized representations for the user. The top ranked item (w.r.t. the ranking scores) is recommended to the user. used ε-greedy exploration technique.
  • 8. Deep RL based Recommendation (DRR) Framework The Critic network: According to the Q-value, the parameters of the Actor network are updated in the direction of improving the performance of action a Based on the deterministic policy gradient theorem Critic network is updated accordingly by the temporal- difference learning approach , i.e., minimizing the mean squared error
  • 9. Deep RL based Recommendation (DRR) Framework State Representation Module: Modeling the feature interactions explicitly can boost the performance DRR-p. DRR-u DRR-Ave
  • 10. Deep RL based Recommendation (DRR) Framework DRR-p. Product based neural network for the state representation module utilizes a product operator to capture the pairwise local dependency between items.
  • 11. Deep RL based Recommendation (DRR) Framework DRR-u. In DRR-u, we can see that the user embedding is also incorporated. In addition to the local dependency between items, the pairwise interactions of user-item are also taken into account.
  • 12. Deep RL based Recommendation (DRR) Framework DRR-Ave. the embeddings of items are first transformed by a weighted average pooling layer. Then, the resulting vector is leveraged to model the interactions with the input user. Finally, the embedding of the user, the interaction vector, and the average pooling result of items are concatenate into a vector to denote the state representation.
  • 13. Deep RL based Recommendation (DRR) Training DRR utilizes the users’ interaction history with the recommender agent as training data. During the procedure, the recommender takes an action at following the current recommendation policy πθ(st) after observing the user (environment) state st, then it obtains the feedback (reward)rt from the user, and the user state is updated to st+1. According to the feedback, the recommender updates its recommendation policy. The training procedure mainly includes two phases, i.e., transition generation and model updating. • For the first stage, the recommender observes the current statest that is calculated by the proposed state representation module, then generates an actionat=πθ(st) according to the current policy πθ with ε-greedy exploration, and recommends an itemit according to the action. Subsequently, the reward rt can be calculated based on the feedback of the user to the recommended item it, and the user state is updated Finally, the recommender agent stores the transition(st,at,rt,st+1)into the replay bufferD. • In the second stage, the model updating, the recommender samples a minibatch of N transitions with widely used prioritized experience replay sampling technique. Then, the recommender updates the parameters of the Actor network and Critic network. Finally, the recommender updates the target networks’ parameters with the soft replace strategy.
  • 14. Experiments Dataset •Movielens 100k •Movielens 1m •Yahoo Music •Jester Comparison Method •Popularity recommends the most popular item, i.e., the item with the highest average rating or the items with largest number of positive ratings from current available items to the users at each timestep. •PMF makes a matrix decomposition as SVD, while it only takes into account the non zero elements. •SVD++mixes strengths of the latent model as well as the neighborhood model •DRR-n simply utilizes the concatenation of the item embeddings to represent user state, which is widelyused in previous studies. Reward Function •in timestep t, the recommender agent recommends an item j to user i,(denoted as action a in states), and the rating rate i,j comes from the interaction logs if user I actually rates item j, or from a predicted value by the simulator otherwise. Therefore, the reward function can be defined as follows: R(s,a) =rate i,j/10
  • 16. Claims both the immediate and long-term rewards into account. Incorporated and three instantiation structures are designed, which can explicitly model the interactions between users and items. Extensive experiments on four real-world datasets demonstrate the superiority of the proposed DRR method over state-of- the-art competitors
  • 17. My thaughts A neural network to convert user- item relation into a state is not clear to me. Authors didn’t consider cold-start problem.