Copyright © 2018 by Kyungwoo Song, Dept. of Industrial and Systems Engineering, KAIST
Hierarchical Context enabled Recurrent Neural Network for Recommendation (HCRNN)
Kyungwoo Song, Mingi Ji, Sungrae Park, Il-Chul Moon
Department of Industrial and Systems Engineering, KAIST
kyungwoo.song@gmail.com
Contents
• Motivation
• Related Work
• Methodology
• Experimental Setting
• Results
• Conclusion
• Reference
Motivation
[Figure: an example user history of eight items (t = 1 … 8) with genres Action, Action, Musical, Musical, Action, Action, Action/Romance, Action, segmented into Sub-sequence 1 (Action), Sub-sequence 2 (Musical), and Sub-sequence 3 (Action/Romance). The temporary context covers the current item, the local context covers a sub-sequence, and the global context covers the whole sequence.]
• A long user history contains multiple hierarchical contexts: the global context, the local context, and the temporary context.
• The user's interest drift should be considered within the hierarchical context.
• If we consider the hierarchical context:
• The user's primary interest is action movies.
• We can recommend an action movie at t = 8 rather than a romance movie.
• 1) How can we model the hierarchical context?
• 2) How can we model the interest drift based on the hierarchical context?
• 3) How can we model the long-term and short-term dependency?
Related work
• Session-based Recommendation (GRU4REC)
• A sequential model with GRUs for recommendation. It adopts session-parallel mini-batches and a ranking loss such as Cross-Entropy, TOP1, or BPR.
• Neural Attentive Recommendation Machine (NARM)
• NARM builds on GRU4REC with an attention mechanism to capture the long-term dependency.
• Short-Term Attention/Memory Priority (STAMP)
• STAMP considers both the current interest and the general interest of users. In particular, STAMP uses an additional neural network on the current input only, to model the user's current interest.
NARM can be improved
• if we consider both the long-term and short-term dependency
STAMP can be improved
• if we consider structured interest drift modeling
• if we consider the interest drift with hierarchical context
Methodology (HCRNN overall)
• Global context (for sequence) :
𝜃, Mglobal
• Abstractive context
• 𝜃 : Topic proportion
• Mglobal : Topic memory
• Local context (for subsequence) : 𝑐𝑡
• Relatively abstractive context
• It is generated by global context
adaptively
• Temporary context (for current) : ℎ 𝑡
• Specific context
• It is generated by focusing on the
current input
1) How can we model the hierarchical context ?
• Hierarchical contexts should have a different level of context
• Separate the generation of local context and the temporary context
Methodology (HCRNN overall)
1) How can we model the hierarchical context?
• Hierarchical contexts should have different levels of abstraction
• Separate the generation of the local context from that of the temporary context
< LSTM with peephole >
c_t = f_t ⊙ c_{t−1} + i_t ⊙ σ_c(c̃_t)
h_t = o_t ⊙ σ_h(c_t)

< HCRNN >
c_t = (1 − G_t^c) ⊙ c_{t−1} + G_t^c ⊙ c̃_t
h̃_t = (r_t ⊙ h_{t−1}) W_hh + x_t W_xh + b_h
h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ σ_h(h̃_t)

No direct connection between c_t and h_t in HCRNN
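The HCRNN update rules above can be sketched in NumPy. This is a minimal illustration of the slide's equations, not the paper's full model: the gate parameterizations (G_t^c, r_t, z_t as sigmoid gates over the concatenated input and previous state), σ_h = tanh, and treating the candidate context c̃_t as an input are all assumptions made here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hcrnn_cell(x_t, h_prev, c_prev, c_tilde, params):
    """One HCRNN step following the slide's update rules.

    c_tilde is the candidate local context (on the slides it comes from
    the global topic memory; here it is simply an input). The gate
    parameterizations are assumed to be standard sigmoid gates."""
    W_g, W_r, W_z, W_hh, W_xh, b_h = (params[k] for k in
                                      ("W_g", "W_r", "W_z", "W_hh", "W_xh", "b_h"))
    xh = np.concatenate([x_t, h_prev])
    G_c = sigmoid(W_g @ xh)   # local-context gate
    r_t = sigmoid(W_r @ xh)   # reset gate (interest drift enters here)
    z_t = sigmoid(W_z @ xh)   # update gate

    # Local context: convex mix of previous context and candidate.
    c_t = (1.0 - G_c) * c_prev + G_c * c_tilde
    # Temporary context: GRU-style update; note c_t does not feed h_t directly.
    h_tilde = W_hh @ (r_t * h_prev) + W_xh @ x_t + b_h
    h_t = (1.0 - z_t) * h_prev + z_t * np.tanh(h_tilde)
    return h_t, c_t
```

The separation is visible in the code: c_t never appears in the h_t update, matching "no direct connection between c_t and h_t".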
Methodology (HCRNN-1)
• (11)-(12): Topic proportion for the sequence (variational encoder)
• (13): Attention weight (which global context vector M_global^k should be used for the current local context)
• If θ^k is large, its corresponding global context vector M_global^k is used with high importance
• (14)-(16): Generation of the local context with the local context gate G_t^c
• (17)-(20): Generation of the temporary context h_t (separated from the local context)
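A sketch of how the local-context candidate could be read from the topic memory, in the spirit of (13): a softmax attention over the K topic vectors, biased by the topic proportion θ so that high-proportion topics dominate. The slide does not give the exact scoring function, so the bilinear score and the log-θ bias below are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def local_context_candidate(theta, M_global, h_prev, W_a):
    """Attention read over the topic memory.

    theta:    (K,) topic proportion for the whole sequence.
    M_global: (K, d) topic memory, one vector per topic.
    Hypothetical scoring: bilinear relevance h_prev^T W_a M_global^k,
    shifted by log(theta) so large theta^k raises topic k's weight."""
    scores = M_global @ (W_a @ h_prev)             # (K,) relevance scores
    beta = softmax(scores + np.log(theta + 1e-8))  # theta biases the attention
    return beta @ M_global, beta                   # (d,) weighted read, weights
```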
Methodology (HCRNN-2)
• Interest drift assumption: "If the user's local context (for a sub-sequence) and the current item are very different, the user's temporary interest drift occurs."
• Local context: c_t
• Current item: x_t
• x_t ⊙ c_t ↓ ⟹ r_t ↓ ⟹ h_t focuses on the current input instead of h_{t−1}
2) How can we model the interest drift based on the hierarchical context?
• Interest drift assumption
Methodology (HCRNN-3)
2) How can we model the interest drift based on the hierarchical context?
• Interest drift assumption with an interest drift gate
• The sigmoid function outputs a value between 0 and 1, so the reset gate of HCRNN-2 in Eq. 21 can theoretically take any value between 0 and 1.
• However, the sigmoid function is not sharp ⟹ r_t in Eq. 21: 0.47 (± 0.03)
• ⟹ An additional gate makes h_t focus on the current input
• r_t ⊙ G_t^d in HCRNN-3: 0.29 (± 0.021) (38% smaller than r_t in HCRNN-2)
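The effect of the extra drift gate can be sketched as follows. The parameterization is hypothetical (both gates scoring the elementwise agreement x_t ⊙ c_t); the point it illustrates is only that multiplying two sigmoid gates always yields a smaller effective reset than r_t alone, in line with the 0.47 → 0.29 drop reported above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def drift_gates(x_t, c_t, W_r, W_d):
    """Reset gate driven by item / local-context agreement (hypothetical
    parameterization: both gates score the elementwise product x_t * c_t).

    When x_t disagrees with the local context c_t, the agreement term is
    small, both gates shrink, and r_t * G_d suppresses h_{t-1} more
    strongly than r_t alone -- the sharpening effect of HCRNN-3."""
    agree = x_t * c_t
    r_t = sigmoid(W_r @ agree)   # HCRNN-2 reset gate
    G_d = sigmoid(W_d @ agree)   # interest drift gate
    return r_t, r_t * G_d        # plain reset vs. sharpened reset
```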
Methodology (HCRNN-3+Bi)
bi-channel attention
• α_t^c: attention based on the local context
• Emphasizes items that belong to the same sub-sequence as the current one
• ⟹ Short-term dependency
• α_t^h: attention based on the temporary context
• Finds similar transitions throughout the entire history
• ⟹ Long-term dependency
3) How can we model the long-term and short-term dependency?
• bi-channel attention
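A minimal sketch of the two attention channels, assuming dot-product scoring against the final state of each channel (the paper's actual scoring networks are not reproduced on the slide):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def bi_channel_attention(H, C, h_T, c_T):
    """Two attention read-outs over a user history.

    H: (T, d) temporary contexts per step; C: (T, d) local contexts.
    alpha_c concentrates on steps whose local context matches the
    current sub-sequence (short-term); alpha_h compares temporary
    contexts across the whole history (long-term)."""
    alpha_c = softmax(C @ c_T)   # short-term channel: local contexts
    alpha_h = softmax(H @ h_T)   # long-term channel: temporary contexts
    return alpha_c @ H, alpha_h @ H, alpha_c, alpha_h
```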
Experimental Setting
Data preprocessing
• We aim at modeling long user histories ⇒ we removed sequences whose length is less than 10.
• We removed items that exist only in the test set.
• We removed items that appeared fewer than 50/50/25 times in the three datasets, respectively.
• Cross-validation by assigning 10% of the randomly chosen training set as the validation set.

Baselines
• POP
• SPOP
• Item-KNN (RecSys-10)
• BPR-MF (UAI-09)
• GRU4REC (ICLR-16)
• LSTM4REC
• NARM (CIKM-17)
• STAMP (KDD-18)
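The preprocessing steps can be sketched as below. The order in which the filters are applied is an assumption, and min_count stands for the per-dataset 50/50/25 threshold.

```python
from collections import Counter

def preprocess(train_seqs, test_seqs, min_len=10, min_count=50):
    """Filtering steps from the slide: drop short sequences, drop rare
    items, and drop test items never seen in training."""
    # Keep only sufficiently long user histories.
    train_seqs = [s for s in train_seqs if len(s) >= min_len]
    # Count item occurrences and keep only frequent items.
    counts = Counter(i for s in train_seqs for i in s)
    keep = {i for i, c in counts.items() if c >= min_count}
    train_seqs = [[i for i in s if i in keep] for s in train_seqs]
    train_seqs = [s for s in train_seqs if len(s) >= min_len]
    # Remove items that exist only in the test set.
    test_seqs = [[i for i in s if i in keep] for s in test_seqs]
    test_seqs = [s for s in test_seqs if len(s) >= 2]  # need input + target
    return train_seqs, test_seqs
```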
Results (Quantitative Evaluation)
• HCRNN achieves significant performance improvements on all datasets and metrics
• HCRNN-1 > baselines (NARM, STAMP)
• ⇒ Shows the need for hierarchical context modeling in recommendation
• HCRNN-3 > HCRNN-2, HCRNN-1
• ⇒ The interest drift assumption may be experimentally justifiable
• HCRNN-3+Bi > HCRNN-3
• ⇒ Bi-channel attention with hierarchical contexts may further improve the performance
Results (Embedding and Context)
• The local context is generated by the global context memory (M_global), and the temporary context is generated by the previous temporary context and the current item embedding (x_t)
• The item embeddings are coherently organized into cohesive clusters of the same genre
• The global context memory covers most of the area over which the item embeddings are dispersed
< Visualization of M_global and x_t >
< Interpretation of M_global >
Results (Interest drift assumption)
• If the genre of the current input differs from that of the previous items, r_t ⊙ G_t^d has a smaller value than in the opposite situation.
[Figure (left): gate heatmaps of r_t^NARM, r_t^HCRNN, and G_t^d over time for two example user histories (case 1, case 2); a check mark is placed where the genre of the item changes.]
[Figure (right): average value of the r_t^HCRNN ⊙ G_t^d gate after items with a similar genre appear consecutively.]
Results (bi-channel attention)
• The NARM attention weight, α_t^NARM, cannot differentiate the attentions on the local and the temporary contexts.
• The bi-channel attentions distinguish the two:
• α_t^(c) focuses on neighboring items (short-term)
• α_t^(h) reads out through the whole sequence (long-term)
[Figure (left): attention heatmaps over time for two example user histories (case 1, case 2).]
[Figure (right): averaged attention weight over the time difference; Δt is the time difference between the prediction time step and a time step of the previous user history.]
Results (Case study)
• Attention weight
• α_t^c focuses on the recent history
• α_t^h considers the relatively distant history
• Gate
• G_{t=17}^(d) has a relatively small value
• This small value is caused by the selection, at t = 16, of an item misaligned with the previous sub-sequence
[Figure: attention and gate values in NARM and HCRNN, and the change of the context values in HCRNN over time.]
Conclusion
• We propose HCRNN to model hierarchical contexts and the interest drift assumption for sequential recommendation
• 1) How can we model the hierarchical context?
• Hierarchical contexts should have different levels of abstraction
• Separate the generation of the local context from that of the temporary context
• 2) How can we model the interest drift based on the
hierarchical context?
• Interest drift assumption with interest drift gate
• 3) How can we model the long-term and short-term
dependency?
• bi-channel attention with hierarchical context
Reference
• Liu, Q.; Zeng, Y.; Mokhosi, R.; and Zhang, H. 2018. STAMP: Short-Term Attention/Memory Priority Model
for Session-based Recommendation. In Proceedings of the 24th ACM SIGKDD International Conference
on Knowledge Discovery & Data Mining, 1831–1839. ACM.
• Li, J.; Ren, P.; Chen, Z.; Ren, Z.; Lian, T.; and Ma, J. 2017. Neural attentive session-based
recommendation. In Proceedings of the 2017 ACM on Conference on Information and Knowledge
Management, 1419–1428. ACM.
• Hidasi, B.; Karatzoglou, A.; Baltrunas, L.; and Tikk, D. 2016. Session-based recommendations with
recurrent neural networks. International Conference on Learning Representations.
• Cho, K.; van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; and Bengio, Y.
2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation.
arXiv preprint arXiv:1406.1078.
• Bahdanau, D.; Cho, K.; and Bengio, Y. 2015. Neural machine translation by jointly learning to align and
translate. International Conference on Learning Representations.
• Rakhlin, A.; Shamir, O.; and Sridharan, K. 2012. Making gradient descent optimal for strongly convex
stochastic optimization. International Conference on Machine Learning.
• Rendle, S.; Freudenthaler, C.; Gantner, Z.; and Schmidt-Thieme, L. 2009. BPR: Bayesian personalized
ranking from implicit feedback. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial
Intelligence, 452–461. AUAI Press.
• Sukhbaatar, S.; Weston, J.; Fergus, R.; et al. 2015. End-to-end memory networks. In Advances in Neural
Information Processing Systems, 2440–2448.
• van der Maaten, L., and Hinton, G. 2008. Visualizing data using t-SNE. Journal of machine learning
research 9(Nov):2579–2605.
• Kingma, D. P., and Welling, M. 2014. Auto-encoding variational bayes. International Conference on
Learning Representations.