NS-CUK Seminar: S.T.Nguyen, Review on "Continuous-Time Sequential Recommendation with Temporal Graph Collaborative Transformer", CIKM 2021
LAB SEMINAR
Nguyen Thanh Sang
Network Science Lab
Dept. of Artificial Intelligence
The Catholic University of Korea
E-mail: sang.ngt99@gmail.com
Continuous-Time Sequential Recommendation with
Temporal Graph Collaborative Transformer
--- Fan, Z., Liu, Z., Zhang, J., Xiong, Y., Zheng, L., & Yu, P. S. ---
2023-06-15
Introduction
+ Recommender systems have become essential for providing personalized information-filtering services in a variety of applications.
+ They learn user and item embeddings from historical records of user-item interactions.
+ Current research leverages historical, time-ordered item-purchasing sequences to predict future items for users (the sequential recommendation (SR) problem).
Problems
+ Existing works leverage only the sequential patterns to model item transitions within sequences, which:
- is still insufficient to yield satisfactory results;
- ignores the crucial temporal collaborative signals.
+ Incorporating temporal collaborative signals into SR is rather challenging.
+ Current models capture sequential patterns based on item transitions within sequences, lacking a mechanism to model collaborative signals across sequences.
+ It is hard to express the temporal effects of collaborative signals.
Contributions
+ Propose a new model, the Temporal Graph Sequential Recommender (TGSRec), with two key components:
(1) the Temporal Collaborative Transformer (TCT) layer, which explicitly models collaborative signals in sequences and expresses the temporal correlations of items within sequences;
(2) graph information propagation, devised upon a Continuous-Time Bipartite Graph (CTBG), which propagates the temporal collaborative information learned around each node to its surrounding neighbors over the CTBG.
+ Use temporal embeddings of nodes for recommendation: they are dynamic and can be inferred at any specified timestamp.
Embedding Layer
+ Long-Term User/Item Embeddings: necessary for representing long-term collaborative signals.
+ Continuous-Time Embedding: an encoding function that maps scalar timestamps into vectors.
- The kernel value between the time embeddings of $t_1$ and $t_2$ (the temporal kernel):
$\mathcal{K}(t_1, t_2) = \Phi(t_1) \cdot \Phi(t_2)$
- Based on Bochner's Theorem, the temporal embedding is
$\Phi(t) = \sqrt{\tfrac{1}{d_T}}\,\big[\cos(\omega_1 t), \sin(\omega_1 t), \ldots, \cos(\omega_{d_T} t), \sin(\omega_{d_T} t)\big]^{\top}$
where $\boldsymbol{\omega} = [\omega_1, \ldots, \omega_{d_T}]$ are learnable parameters and $d_T$ is the dimension.
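As a concrete illustration, the following is a minimal PyTorch sketch of such a Bochner-style time encoder (the class name, the random initialization of the frequencies, and the cos/sin concatenation layout are illustrative assumptions, not taken from the paper's code):

import torch
import torch.nn as nn

class TimeEncoder(nn.Module):
    """Maps scalar timestamps to vectors using learnable frequencies (Bochner-style)."""
    def __init__(self, d_T: int):
        super().__init__()
        self.d_T = d_T
        # omega = [omega_1, ..., omega_{d_T}], learnable; random init is an assumption
        self.omega = nn.Parameter(torch.randn(d_T))

    def forward(self, t: torch.Tensor) -> torch.Tensor:
        # t: (batch,) timestamps -> (batch, 2 * d_T) time embeddings
        phase = t.unsqueeze(-1) * self.omega              # (batch, d_T)
        scale = (1.0 / self.d_T) ** 0.5                   # sqrt(1 / d_T) factor
        return scale * torch.cat([torch.cos(phase), torch.sin(phase)], dim=-1)

Because cos(a)cos(b) + sin(a)sin(b) = cos(a - b), the inner product Φ(𝑡1) · Φ(𝑡2) depends only on 𝑡1 - 𝑡2, which is exactly the translation-invariant temporal kernel that Bochner's Theorem guarantees.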
Temporal Collaborative Transformer
1. Information Construction
+ Each TCT layer combines long-term node embeddings with continuous-time embeddings. The query input at the 𝑙-th layer for user 𝑢 at time 𝑡 combines 𝑢's embedding from the previous layer with the time embedding of 𝑡.
+ Randomly sample 𝑆 different interactions of 𝑢 before time 𝑡 as the temporal neighborhood N𝑢(𝑡).
+ The input at the 𝑙-th layer for each sampled (𝑖, 𝑡𝑠) pair likewise combines the item embedding with the time embedding of 𝑡𝑠 (see the sketch below).
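A minimal sketch of this construction step, assuming (as is common in this model family, though the paper's exact operator may differ) that the combination is a concatenation of the node embedding with its time embedding; all names are illustrative:

import random
import torch

def sample_neighbors(history, t, S):
    """Randomly sample S interactions (item_id, timestamp) of user u made before time t."""
    earlier = [(i, ts) for (i, ts) in history if ts < t]
    return random.sample(earlier, min(S, len(earlier)))

def construct_input(node_emb: torch.Tensor, time_emb: torch.Tensor) -> torch.Tensor:
    """Layer input for a (node, time) pair: node embedding combined with its time embedding."""
    return torch.cat([node_emb, time_emb], dim=-1)  # concatenation is an assumed choice

The query input is then construct_input(e_u, Φ(𝑡)), and each sampled (𝑖, 𝑡𝑠) pair yields a neighbor input construct_input(e_i, Φ(𝑡𝑠)).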
Temporal Collaborative Transformer
2. Information Propagation
+ After constructing the information, we propagate the information of the sampled neighbors N𝑢(𝑡) to infer the temporal embeddings.
- This unifies the sequential patterns with the temporal collaborative signals.
+ The attention weight π_u^t(i, t_s) represents the importance of a historical interaction (𝑢, 𝑖, 𝑡𝑠) to the temporal inference of 𝑢 at time 𝑡 (a sketch of the weighted aggregation follows).
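A minimal sketch of the propagation step: a weighted sum over the sampled neighbor inputs. The weights π come from the attention step described next; the projection matrix W is an illustrative stand-in for the paper's linear map:

import torch

def propagate(neighbor_inputs: torch.Tensor, pi: torch.Tensor, W: torch.Tensor) -> torch.Tensor:
    """Aggregate the S sampled neighbors of u at time t into a single message.

    neighbor_inputs: (S, d_in)     inputs for each (i, t_s) in N_u(t)
    pi:              (S,)          normalized weights pi_u^t(i, t_s)
    W:               (d_out, d_in) projection (assumed)
    """
    messages = neighbor_inputs @ W.T                  # (S, d_out) projected neighbor info
    return (pi.unsqueeze(-1) * messages).sum(dim=0)   # (d_out,) weighted aggregation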
Temporal Collaborative Transformer
3. Temporal Collaborative Attention
+ Measure the weights π_u^t(i, t_s), which consider both the neighboring interactions and the temporal information on the edges.
- This is a better mechanism for capturing temporal collaborative signals than a self-attention mechanism that only models item-item correlations.
+ Normalize the attention weights (softmax over the sampled interactions); for simplicity and without ambiguity, the normalized weights are still denoted π_u^t(i, t_s). A sketch follows.
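A sketch of the attention computation over the 𝑆 sampled interactions, assuming scaled dot-product scores with softmax normalization (the 1/sqrt(d) scaling is borrowed from standard Transformer attention and is an assumption here):

import torch
import torch.nn.functional as F

def temporal_attention(query: torch.Tensor, keys: torch.Tensor) -> torch.Tensor:
    """Attention weights pi_u^t(i, t_s) over the S sampled interactions.

    query: (d,)   query input of user u at time t
    keys:  (S, d) inputs of the sampled (i, t_s) pairs
    """
    # Both query and keys carry time embeddings, so these scores reflect
    # temporal collaborative signals rather than item-item correlations alone.
    scores = keys @ query / query.shape[-1] ** 0.5    # (S,) dot-product scores
    return F.softmax(scores, dim=0)                   # normalize over neighbors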
Temporal Collaborative Transformer
4. Information Aggregation
+ The final step of a TCT layer aggregates the query information with the propagated neighbor information, yielding the temporal embedding of 𝑢 at 𝑡 on the 𝑙-th layer (sketched below).
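A sketch of the aggregation, assuming the common pattern of concatenating the query information with the propagated neighbor message and passing the result through a small feed-forward network (the exact FFN architecture is an assumption):

import torch
import torch.nn as nn

class Aggregator(nn.Module):
    """Combines the query information with the propagated neighbor message."""
    def __init__(self, d_query: int, d_msg: int, d_out: int):
        super().__init__()
        # two-layer FFN over the concatenation; shape and activation are assumptions
        self.ffn = nn.Sequential(
            nn.Linear(d_query + d_msg, d_out),
            nn.ReLU(),
            nn.Linear(d_out, d_out),
        )

    def forward(self, query_info: torch.Tensor, neighbor_msg: torch.Tensor) -> torch.Tensor:
        # returns the temporal embedding e_u^{(l)}(t), which feeds the next TCT layer
        return self.ffn(torch.cat([query_info, neighbor_msg], dim=-1))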
Temporal Collaborative Transformer
5. Generalization to Items
+ Though we only present the TCT layer from the user-query perspective, the computation is analogous when the query is an item at a specific time.
+ We only need to swap the user query information for the item query information, and change the neighbor information in Eq. (4) and Eq. (5) accordingly to user-time pairs.
+ We can then infer the temporal embedding of item 𝑖 at time 𝑡, $e_i^{(l)}(t)$, which is sent to the next layer.
Model Prediction
+ The TGSRec model consists of 𝐿 TCT layers.
+ For each test triplet (𝑢, 𝑖, 𝑡), it yields temporal embeddings for both 𝑢 and 𝑖 at time 𝑡 from the last TCT layer; their affinity gives the score for recommending 𝑖 to 𝑢 at time 𝑡 (see the sketch below).
=> TGSRec can generalize and infer user/item embeddings at any timestamp, making multi-step recommendation feasible, whereas existing work only predicts the next item.
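A sketch of the scoring step, assuming an inner-product affinity between the last-layer temporal embeddings (the paper defines a scalar score for (𝑢, 𝑖, 𝑡); the inner product is an assumed instantiation):

import torch

def score(e_u: torch.Tensor, e_i: torch.Tensor) -> torch.Tensor:
    """Affinity score for recommending item i to user u at time t.

    e_u, e_i: last-layer temporal embeddings of u and i inferred at time t.
    """
    return (e_u * e_i).sum(dim=-1)

Since the embeddings can be inferred at any future timestamp 𝑡, the same scoring function supports multi-step recommendation rather than only next-item prediction.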
Experiments
Research questions
+ RQ1: Does TGSRec yield better recommendations than existing methods?
+ RQ2: How do different hyper-parameters (e.g., the number of sampled neighbors 𝑆) affect the performance of TGSRec?
+ RQ3: How do different modules (e.g., temporal collaborative attention) affect the performance of TGSRec?
+ RQ4: Can TGSRec effectively unify sequential patterns and temporal collaborative signals (i.e., reveal temporal correlations)?
Experiments
Datasets
+ The Amazon datasets are collected from different product domains on the Amazon website (May 1996 - July 2014).
+ The MovieLens dataset covers September 19, 1997 through April 22, 1998.
Experiment results
Baselines
+ TGSRec consistently and significantly outperforms all baselines on all datasets.
+ The transformer-based SR methods consistently outperform all other types of baselines, which demonstrates the effectiveness of using the transformer structure to encode sequences.
Experiment results
Parameter Sensitivity
+ Number of layers:
- 𝐿 = 0: unable to infer temporal embeddings.
- 𝐿 = 1: makes temporal inference, but without propagation to the next layer => worse.
- 𝐿 = 2: makes temporal inference and captures high-order signals => alleviates the data-sparsity problem.
+ Embedding size:
- Performance increases as the embedding size grows.
- If the embedding size is too large, performance drops.
+ Number of neighbors:
- TGSRec gains performance on most datasets as the number of sampled neighbors grows.
Experiment results
Ablation Study
+ Temporal collaborative attention:
- Substituting the collaborative attention with a mean-pooling layer severely degrades performance.
- Encoding sequential patterns by considering item transitions is important.
- This shows the advantage of temporal collaborative attention in encoding sequences.
+ Continuous-time embedding:
- TGSRec has the ability to encode sequential patterns.
- Even a fixed ω for learning the time embedding can significantly outperform the position embedding.
+ Loss function:
- BCE loss performs worse than BPR loss, except on the ML100K dataset,
- because BPR loss is optimized for ranking while BCE loss is designed for binary classification.
Experiment results
Temporal Correlations
+ The attention weights for items are dynamic at different timestamps, which indicates the temporal-inference characteristics of TGSRec.
+ The time increments can be arbitrary values, which verifies the model's continuity in time.
+ The top items predicted by SASRec are also recommended by TGSRec, though at lower ranks, showing that TGSRec can unify sequential patterns and temporal collaborative signals.
Conclusions
• A new SR model, TGSRec, to unify sequential patterns and temporal collaborative signals.
• The TCT layer infers temporal embeddings of nodes. It samples neighbors and learns attention
weights to aggregate both node embeddings and time vectors.
• In this way, a TCT layer is able to encode both sequential patterns and collaborative signals, as
well as reveal temporal effects.
• TGSRec significantly outperforms existing transformer-based sequential recommendation
models.
• Overall, TGSRec is a better framework for solving the SR problem with temporal information.