Review: [CIKM'21]UltraGCN.pptx

UltraGCN: Ultra Simplification of
Graph Convolutional Networks for
Recommendation
CIKM’21, Kelong Mao (Huawei Noah’s Ark Lab) et al.
POSTECH DI Lab
Presenter: Changsoo Kwak
2021.11.23
1

Preliminary: Previous GNN model
2
Message passing in GCN [1]:
$$E^{(l+1)} = \sigma\left(D^{-1/2} A D^{-1/2} E^{(l)} W^{(l)}\right)$$
Message passing in LightGCN [2]:
$$E^{(l+1)} = \left(D^{-1/2} A D^{-1/2}\right) E^{(l)}$$
[1] Semi-Supervised Classification with Graph Convolutional Networks (ICLR’17)
[2] LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation (SIGIR’20)
LightGCN removes feature transformations and non-linear activations
Prediction: dot product of the user and item representations from the last layer
$N(i)$: set of neighbors of node $i$
$e_i^{(l)}$: embedding of node $i$ at the $l$-th layer
$E^{(l)}$: embedding matrix at the $l$-th layer
$W^{(l)}$: trainable weight matrix at the $l$-th layer
$A$: adjacency matrix with self-connections
$D$: diagonal node-degree matrix (degrees counting the self-connection)
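To make the LightGCN rule concrete, here is a minimal pure-Python sketch of one propagation step followed by the dot-product prediction. It assumes a tiny toy graph stored as a dense 0/1 adjacency list of lists. The helper names `normalized_adjacency` and `lightgcn_layer` are hypothetical, and a real implementation would use sparse matrices.

```python
import math

# Hypothetical helper names; a toy sketch, not the LightGCN authors' code.


def normalized_adjacency(adj):
    """Return D^{-1/2} A D^{-1/2}, where A = adj + I (self-connections added)."""
    n = len(adj)
    a = [[adj[r][c] + (1 if r == c else 0) for c in range(n)] for r in range(n)]
    deg = [sum(row) for row in a]  # d_i + 1: degree counting the self-connection
    return [[a[r][c] / math.sqrt(deg[r] * deg[c]) for c in range(n)] for r in range(n)]


def lightgcn_layer(norm_adj, emb):
    """E^(l+1) = (D^{-1/2} A D^{-1/2}) E^(l): no weight matrix W, no activation."""
    dim = len(emb[0])
    return [[sum(norm_adj[r][k] * emb[k][c] for k in range(len(emb))) for c in range(dim)]
            for r in range(len(norm_adj))]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


# Toy graph (hypothetical): node 0 is a user who interacted with items 1 and 2.
adj = [[0, 1, 1],
       [1, 0, 0],
       [1, 0, 0]]
emb0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
emb1 = lightgcn_layer(normalized_adjacency(adj), emb0)
score = dot(emb1[0], emb1[1])  # predicted preference of user 0 for item 1
```

Compared with the GCN rule, the only difference is the missing `W` and `sigma`. Each layer is a pure weighted average of neighbor embeddings.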

Preliminary: Dot product in LightGCN
3
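$$e_u^{(l+1)} \cdot e_i^{(l+1)} = \alpha_{ui}\, e_u^{(l)} \cdot e_i^{(l)} + \sum_{k \in N(u)} \alpha_{ik}\, e_i^{(l)} \cdot e_k^{(l)} + \sum_{v \in N(i)} \alpha_{uv}\, e_u^{(l)} \cdot e_v^{(l)} + \sum_{k \in N(u)} \sum_{v \in N(i)} \alpha_{kv}\, e_k^{(l)} \cdot e_v^{(l)}$$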
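Term 1 ($\alpha_{ui}$): similarity between the target user and the target item
Term 2 ($\alpha_{ik}$): similarity between the target item and the items the target user interacted with
Term 3 ($\alpha_{uv}$): similarity between the target user and the users who interacted with the target item
Term 4 ($\alpha_{kv}$): similarity between those interacted items and those interacted users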
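Representation construction in LightGCN ($d_u$: degree of node $u$ without the self-connection)
$$e_u^{(l+1)} = \frac{1}{d_u+1}\, e_u^{(l)} + \sum_{k \in N(u)} \frac{1}{\sqrt{d_u+1}\sqrt{d_k+1}}\, e_k^{(l)}$$
$$e_i^{(l+1)} = \frac{1}{d_i+1}\, e_i^{(l)} + \sum_{v \in N(i)} \frac{1}{\sqrt{d_v+1}\sqrt{d_i+1}}\, e_v^{(l)}$$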
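$$\alpha_{ui} = \frac{1}{(d_u+1)(d_i+1)}$$
$$\alpha_{ik} = \frac{1}{\sqrt{d_u+1}\sqrt{d_k+1}\,(d_i+1)}$$
$$\alpha_{uv} = \frac{1}{\sqrt{d_v+1}\sqrt{d_i+1}\,(d_u+1)}$$
$$\alpha_{kv} = \frac{1}{\sqrt{d_u+1}\sqrt{d_k+1}\sqrt{d_v+1}\sqrt{d_i+1}}$$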
Each relationship contributes with a different weight
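As a sanity check on the expansion, the sketch below computes the score two ways: as the dot product of the two propagated embeddings, and as the four α-weighted similarity terms. The two results should agree up to floating-point error. The degrees and embeddings are random toy values, and the helper names are hypothetical.

```python
import math
import random

# Hypothetical helpers and toy data; a numeric check of the algebra, not library code.


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def propagate(self_emb, self_deg, nbr_embs, nbr_degs):
    """LightGCN update of one node: e/(d+1) + sum_k e_k / (sqrt(d+1) * sqrt(d_k+1))."""
    out = [x / (self_deg + 1) for x in self_emb]
    for e_k, d_k in zip(nbr_embs, nbr_degs):
        w = 1.0 / (math.sqrt(self_deg + 1) * math.sqrt(d_k + 1))
        out = [o + w * x for o, x in zip(out, e_k)]
    return out


def four_term_score(e_u, d_u, e_i, d_i, items_of_u, users_of_i):
    """Right-hand side of the expansion; items_of_u / users_of_i are lists of (embedding, degree)."""
    s = dot(e_u, e_i) / ((d_u + 1) * (d_i + 1))                                  # alpha_ui
    for e_k, d_k in items_of_u:                                                  # alpha_ik
        s += dot(e_i, e_k) / (math.sqrt(d_u + 1) * math.sqrt(d_k + 1) * (d_i + 1))
    for e_v, d_v in users_of_i:                                                  # alpha_uv
        s += dot(e_u, e_v) / (math.sqrt(d_v + 1) * math.sqrt(d_i + 1) * (d_u + 1))
    for e_k, d_k in items_of_u:                                                  # alpha_kv
        for e_v, d_v in users_of_i:
            s += dot(e_k, e_v) / math.sqrt((d_u + 1) * (d_k + 1) * (d_v + 1) * (d_i + 1))
    return s


def rand_vec(rng, dim=4):
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


rng = random.Random(0)
d_u, d_i = 3, 2
e_u, e_i = rand_vec(rng), rand_vec(rng)
items_of_u = [(rand_vec(rng), d_k) for d_k in (1, 4, 2)]  # |N(u)| = d_u = 3 (toy degrees)
users_of_i = [(rand_vec(rng), d_v) for d_v in (5, 2)]     # |N(i)| = d_i = 2 (toy degrees)

lhs = dot(propagate(e_u, d_u, [e for e, _ in items_of_u], [d for _, d in items_of_u]),
          propagate(e_i, d_i, [e for e, _ in users_of_i], [d for _, d in users_of_i]))
rhs = four_term_score(e_u, d_u, e_i, d_i, items_of_u, users_of_i)
```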

Limitation of LightGCN
4
Limitation 1
The weights $\alpha_{ik}$ and $\alpha_{uv}$ are not reasonable because they treat the two endpoints of a relationship asymmetrically:
$$\alpha_{ik} = \frac{1}{\sqrt{d_u+1}\sqrt{d_k+1}\,(d_i+1)}$$
$\alpha_{ik}$ weights an item-item relationship, yet the target item $i$ enters through $(d_i+1)$ while the interacted item $k$ enters through $\sqrt{d_k+1}$
Limitation 2
Message passing combines the various relationships through stacked layers
Stacking layers also accumulates uninformative relationships, which can hurt the results
→ The weight (importance) of each relationship needs to be adjustable
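The following is a small numeric illustration of Limitation 1, using hypothetical degrees. The same pair of items gets a different item-item weight depending on which item plays the target role, because $d_i+1$ enters linearly while $d_k+1$ enters under a square root. The function name `alpha_ik` is just a label for the formula above.

```python
import math

# Hypothetical degrees chosen for illustration only.


def alpha_ik(d_u, d_i, d_k):
    """Weight of the e_i . e_k term in LightGCN's expanded score (alpha_ik above)."""
    return 1.0 / (math.sqrt(d_u + 1) * math.sqrt(d_k + 1) * (d_i + 1))


# One user with d_u = 3 and two items with degrees 1 and 8; only the roles
# (target item i vs. interacted item k) are swapped.
low_degree_target = alpha_ik(d_u=3, d_i=1, d_k=8)   # 1 / (2 * 3 * 2) = 1/12
high_degree_target = alpha_ik(d_u=3, d_i=8, d_k=1)  # 1 / (2 * sqrt(2) * 9)
ratio = low_degree_target / high_degree_target      # same item pair, roughly 2.1x apart
```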

Limitation of LightGCN
5
Limitation 3
Stacking more layers → Capture higher-order collaborative signals
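LightGCN performs best with 2 to 3 layers; with more layers, over-smoothing may occur
By Theorem 1 in GCNII [1], the limit of infinitely many message-passing steps on a connected graph can be derived in closed form:
$$\lim_{l \to \infty} \left( \left(D^{-1/2} A D^{-1/2}\right)^{l} \right)_{i,j} = \frac{\sqrt{(d_i+1)(d_j+1)}}{2m+n}$$
where $m$ is the number of edges and $n$ the number of nodes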
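The closed-form limit can be checked numerically. The sketch below raises the normalized adjacency matrix of a small connected toy graph (an assumption, since the theorem needs connectivity) to a large power and compares it with $\sqrt{(d_i+1)(d_j+1)}/(2m+n)$. The graph and helper names are hypothetical.

```python
import math

# Hypothetical toy graph and helpers; a numeric check of the GCNII limit.


def matmul(x, y):
    return [[sum(x[r][k] * y[k][c] for k in range(len(y))) for c in range(len(y[0]))]
            for r in range(len(x))]


def normalized_adjacency_with_self_loops(edges, n):
    """Return (D^{-1/2} A D^{-1/2}, [d_i + 1]) for an undirected edge list, A including self-loops."""
    a = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    for s, t in edges:
        a[s][t] = a[t][s] = 1.0
    deg = [sum(row) for row in a]  # d_i + 1
    norm = [[a[r][c] / math.sqrt(deg[r] * deg[c]) for c in range(n)] for r in range(n)]
    return norm, deg


# Connected toy graph: users 0, 1 and items 2, 3; edges u0-i2, u0-i3, u1-i3.
edges = [(0, 2), (0, 3), (1, 3)]
n, m = 4, len(edges)
p, deg_plus_one = normalized_adjacency_with_self_loops(edges, n)

power = p
for _ in range(500):  # a large power stands in for l -> infinity
    power = matmul(power, p)

limit = [[math.sqrt(deg_plus_one[i] * deg_plus_one[j]) / (2 * m + n) for j in range(n)]
         for i in range(n)]
max_err = max(abs(power[i][j] - limit[i][j]) for i in range(n) for j in range(n))
```

Every row of `limit` is proportional to the same vector $(\sqrt{d_j+1})_j$. After enough layers the result depends only on node degrees and no longer on the graph structure around each node, which is the over-smoothing effect.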
[1] Simple and Deep Graph Convolutional Networks (ICML’20)
Motivation: Removing explicit message passing!

Proposed method: UltraGCN
6
Figure 1: UltraGCN: Ultra Simplification of Graph Convolutional Networks for Recommendation (CIKM’21)
Remove explicit message passing
Directly approximate the convergence state (infinitely many layers)
$$e_i = \lim_{l \to \infty} e_i^{(l+1)} = \lim_{l \to \infty} e_i^{(l)}$$
Link prediction in graph

Learning on user-item graph
7
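Idea: $e_u = \lim_{l \to \infty} e_u^{(l+1)} = \lim_{l \to \infty} e_u^{(l)}$
Propagation in LightGCN at convergence:
$$e_u = \frac{1}{d_u+1}\, e_u + \sum_{i \in N(u)} \frac{1}{\sqrt{d_u+1}\sqrt{d_i+1}}\, e_i$$
Rearranging (moving the $e_u$ term to the left-hand side):
$$e_u = \sum_{i \in N(u)} \beta_{u,i}\, e_i, \qquad \beta_{u,i} = \frac{1}{d_u}\sqrt{\frac{d_u+1}{d_i+1}}$$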
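The rearrangement step, written out in full. It assumes $d_u \ge 1$, meaning the user has at least one interaction, so dividing by $d_u$ is valid:

```latex
\begin{aligned}
e_u &= \frac{1}{d_u+1}\, e_u + \sum_{i \in N(u)} \frac{1}{\sqrt{d_u+1}\sqrt{d_i+1}}\, e_i \\
\Big(1 - \frac{1}{d_u+1}\Big) e_u = \frac{d_u}{d_u+1}\, e_u &= \sum_{i \in N(u)} \frac{1}{\sqrt{d_u+1}\sqrt{d_i+1}}\, e_i \\
e_u &= \sum_{i \in N(u)} \frac{d_u+1}{d_u} \cdot \frac{1}{\sqrt{d_u+1}\sqrt{d_i+1}}\, e_i
     = \sum_{i \in N(u)} \underbrace{\frac{1}{d_u}\sqrt{\frac{d_u+1}{d_i+1}}}_{\beta_{u,i}}\, e_i
\end{aligned}
```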
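Objective: minimize the difference between the two sides, via a $\beta$-weighted constraint loss
$$L_C = -\sum_{(u,i) \in N^+} \beta_{u,i} \log \sigma\left(e_u^\top e_i\right) - \sum_{(u,j) \in N^-} \beta_{u,j} \log \sigma\left(-e_u^\top e_j\right)$$
$N^+$: set of positive pairs
$N^-$: set of randomly sampled negative pairs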
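A minimal pure-Python sketch of $\beta_{u,i}$ and the constraint loss $L_C$, on hypothetical toy data. Degrees are counted from the positive pairs, and the negative pairs simply stand in for sampled negatives. The helper names are illustrative and not taken from the authors' code.

```python
import math
from collections import Counter

# Hypothetical helper names and toy data; a sketch of L_C, not the official implementation.


def beta(d_u, d_i):
    """beta_{u,i} = (1 / d_u) * sqrt((d_u + 1) / (d_i + 1)); degrees exclude self-loops, d_u >= 1."""
    return (1.0 / d_u) * math.sqrt((d_u + 1) / (d_i + 1))


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def constraint_loss(user_emb, item_emb, pos_pairs, neg_pairs, user_deg, item_deg):
    """L_C = -sum_{N+} beta_ui log s(e_u.e_i) - sum_{N-} beta_uj log s(-e_u.e_j)."""
    loss = 0.0
    for u, i in pos_pairs:
        loss -= beta(user_deg[u], item_deg[i]) * log_sigmoid(dot(user_emb[u], item_emb[i]))
    for u, j in neg_pairs:
        loss -= beta(user_deg[u], item_deg[j]) * log_sigmoid(-dot(user_emb[u], item_emb[j]))
    return loss


# N+ are observed interactions; N- stands in for sampled negatives.
pos_pairs = [(0, 0), (0, 1), (1, 1)]
neg_pairs = [(1, 0), (0, 2)]
user_deg = Counter(u for u, _ in pos_pairs)  # d_u = number of interacted items
item_deg = Counter(i for _, i in pos_pairs)  # d_i = number of interacting users (0 if unseen)
user_emb = {0: [0.5, 0.1], 1: [-0.2, 0.4]}
item_emb = {0: [0.3, 0.0], 1: [0.1, 0.2], 2: [-0.4, 0.5]}
l_c = constraint_loss(user_emb, item_emb, pos_pairs, neg_pairs, user_deg, item_deg)
```

Compared with a plain BCE loss, the only change is the per-pair factor $\beta$. Low-degree users paired with low-degree items get the largest weight.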

Learning on user-item graph
8
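Recommendation is typically posed as link prediction on the user-item graph
Candidate losses: pairwise BPR vs. pointwise BCE (the main loss $L_O$ below is pointwise BCE)
$$L_O = -\sum_{(u,i) \in N^+} \log \sigma\left(e_u^\top e_i\right) - \sum_{(u,j) \in N^-} \log \sigma\left(-e_u^\top e_j\right)$$
$$L = L_O + \lambda L_C$$
This loss relies only on the user-item graph (UltraGCNBase)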
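The sketch below shows the main BCE loss $L_O$ together with uniform negative sampling and the UltraGCNBase combination $L = L_O + \lambda L_C$. Uniform sampling of non-interacted items is an assumed strategy, and all helper names and toy data are hypothetical. In practice `l_c` would come from the $\beta$-weighted loss $L_C$ defined above.

```python
import math
import random

# Hypothetical helper names; uniform negative sampling is an assumption of this sketch.


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def sample_negatives(pos_pairs, num_items, ratio, rng):
    """Draw `ratio` uniformly random non-interacted items per positive pair.
    Loops forever if a user has interacted with every item, so use on sparse data only."""
    seen = set(pos_pairs)
    negs = []
    for u, _pos_item in pos_pairs:
        for _ in range(ratio):
            j = rng.randrange(num_items)
            while (u, j) in seen:
                j = rng.randrange(num_items)
            negs.append((u, j))
    return negs


def main_loss(user_emb, item_emb, pos_pairs, neg_pairs):
    """L_O: unweighted BCE, i.e. L_C with every beta set to 1."""
    loss = 0.0
    for u, i in pos_pairs:
        loss -= log_sigmoid(dot(user_emb[u], item_emb[i]))
    for u, j in neg_pairs:
        loss -= log_sigmoid(-dot(user_emb[u], item_emb[j]))
    return loss


def base_loss(l_o, l_c, lam):
    """UltraGCNBase objective: L = L_O + lambda * L_C (lam because lambda is reserved)."""
    return l_o + lam * l_c


# Toy data: 2 users, 4 items, 2-dimensional embeddings.
rng = random.Random(42)
pos_pairs = [(0, 0), (0, 1), (1, 1)]
neg_pairs = sample_negatives(pos_pairs, num_items=4, ratio=2, rng=rng)
user_emb = {0: [0.5, 0.1], 1: [-0.2, 0.4]}
item_emb = {0: [0.3, 0.0], 1: [0.1, 0.2], 2: [-0.4, 0.5], 3: [0.0, -0.3]}
l_o = main_loss(user_emb, item_emb, pos_pairs, neg_pairs)
```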

Learning on item-item graph
9
Limitation 2: the weight (importance) of each relationship needs to be adjustable
UltraGCN does not use explicit message passing → it can weight the different relationships flexibly
Item-item co-occurrence graphs are useful for recommendation [1]
1. Build the item-item co-occurrence graph $G = A^\top A \in \mathbb{R}^{|I| \times |I|}$, where $A$ here is the $|U| \times |I|$ user-item interaction matrix
2. Approximate the infinite-layer state on $G$ in the same way $\beta_{u,i}$ was derived:
$$e_i = \sum_{j \in N_G(i)} \omega_{i,j}\, e_j, \qquad \omega_{i,j} = \frac{G_{i,j}}{g_i - G_{i,i}} \sqrt{\frac{g_i}{g_j}}, \qquad g_i = \sum_k G_{i,k}$$
[1] M2GRL: A Multi-task Multi-view Graph Representation Learning Framework for Web-scale Recommender Systems (KDD’20, ads track, oral)
Rather than using all $j \in N_G(i)$, select the top-K most similar items $S(i)$ according to $\omega_{i,j}$ for training
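The three steps above (build $G$, compute $\omega_{i,j}$, keep the top-K neighbors $S(i)$) can be sketched in pure Python on hypothetical toy data. Skipping self-pairs ($j = i$) is an assumption that mirrors how the self term was moved to the left-hand side when deriving $\beta$. The function names are hypothetical.

```python
import math

# Hypothetical helper names and toy data; dense lists are fine only for tiny examples.


def cooccurrence(interactions, num_users, num_items):
    """G = A^T A, where A is the binary |U| x |I| user-item interaction matrix."""
    a = [[0] * num_items for _ in range(num_users)]
    for u, i in interactions:
        a[u][i] = 1
    return [[sum(a[u][i] * a[u][j] for u in range(num_users)) for j in range(num_items)]
            for i in range(num_items)]


def omega(g):
    """omega_ij = G_ij / (g_i - G_ii) * sqrt(g_i / g_j), with g_i = sum_k G_ik.
    Self-pairs (j == i) are skipped; that is an assumption of this sketch."""
    n = len(g)
    g_sum = [sum(row) for row in g]
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if j != i and g[i][j] > 0:  # j in N_G(i); then g_sum[i] > G_ii, so no division by 0
                w[i][j] = g[i][j] / (g_sum[i] - g[i][i]) * math.sqrt(g_sum[i] / g_sum[j])
    return w


def top_k_neighbors(w, k):
    """S(i): the k items j with the largest omega_ij (only omega_ij > 0 considered)."""
    return [sorted((j for j in range(len(row)) if row[j] > 0), key=lambda j: -row[j])[:k]
            for row in w]


# Toy data: 3 users, 3 items.
interactions = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 0), (2, 1)]
g = cooccurrence(interactions, num_users=3, num_items=3)
w = omega(g)
s = top_k_neighbors(w, k=1)
```

Keeping only the top-K entries per row keeps the item-item loss cheap, because each positive pair then touches at most K neighbors.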

Learning on item-item graph & Final loss
10
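On the item-item graph, the converged representation of item $i$ is approximated by
$$e_i = \sum_{j \in S(i)} \omega_{i,j}\, e_j$$
For each positive pair $(u, i) \in N^+$, using this converged form of $e_i$ in a BCE loss gives
$$L_I = -\sum_{(u,i) \in N^+} \sum_{j \in S(i)} \omega_{i,j} \log \sigma\left(e_u^\top e_j\right) + \text{(negative-sampling term)}$$
Final loss
$$L = L_O + \lambda L_C + \gamma L_I$$
$L_C$, $L_I$: constraint losses; $L_O$: main link-prediction loss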
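A minimal sketch of the positive part of $L_I$ and of the final weighted sum. The top-K neighbors $S(i)$ and weights $\omega_{i,j}$ are passed in as hypothetical precomputed dictionaries. The slide's negative-sampling term is left out here for brevity.

```python
import math

# Hypothetical helper names and toy values; not the authors' implementation.


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def item_item_loss(user_emb, item_emb, pos_pairs, omega, neighbors):
    """Positive part of L_I: -sum_{(u,i) in N+} sum_{j in S(i)} omega_ij * log s(e_u . e_j).
    The negative-sampling term is omitted in this sketch."""
    loss = 0.0
    for u, i in pos_pairs:
        for j in neighbors[i]:
            loss -= omega[i][j] * log_sigmoid(dot(user_emb[u], item_emb[j]))
    return loss


def final_loss(l_o, l_c, l_i, lam, gamma):
    """L = L_O + lambda * L_C + gamma * L_I."""
    return l_o + lam * l_c + gamma * l_i


# Precomputed top-K neighbours S(i) and weights omega_ij (K = 1 here), toy values.
neighbors = {0: [1], 1: [0]}
omega = {0: {1: 0.8}, 1: {0: 0.8}}
user_emb = {0: [0.5, 0.1]}
item_emb = {0: [0.3, 0.0], 1: [0.1, 0.2]}
l_i = item_item_loss(user_emb, item_emb, [(0, 0), (0, 1)], omega, neighbors)
```

The user embedding is pulled toward the item's graph neighbors $j \in S(i)$, not toward $e_i$ itself. That is how the item-item graph enters training without any message passing.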

Experiments
12
Epochs needed to achieve the best performance

Experiments – Ablation study
13
Checklists
1. Is each part of UltraGCN effective?
2. Is training user-item pairs on the item-item co-occurrence graph ($L_I$) better than training item-item pairs ($L_I'$)?
3. Why not use a user-user co-occurrence graph?
$$L_I' = -\sum_{(u,i) \in N^+} \sum_{j \in S(i)} \omega_{i,j} \log \sigma\left(e_i^\top e_j\right)$$
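For checklist item 2, the only difference between $L_I$ and the ablated $L_I'$ is which embedding is paired with each neighbor $e_j$. The sketch below makes that switch explicit with a hypothetical `anchor` flag on toy values. As before, the negative-sampling term is omitted.

```python
import math

# Hypothetical helper name, flag, and toy values; illustrates the ablation, not the paper's code.


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def item_graph_loss(user_emb, item_emb, pos_pairs, omega, neighbors, anchor="user"):
    """anchor="user": L_I, pairs e_u with each e_j, j in S(i).
    anchor="item": the ablated L_I', pairs e_i with each e_j instead."""
    loss = 0.0
    for u, i in pos_pairs:
        e_anchor = user_emb[u] if anchor == "user" else item_emb[i]
        for j in neighbors[i]:
            loss -= omega[i][j] * log_sigmoid(dot(e_anchor, item_emb[j]))
    return loss


# One positive pair (u=0, i=0) with S(0) = [1].
user_emb = {0: [1.0, 0.0]}
item_emb = {0: [0.0, 1.0], 1: [0.0, 2.0]}
omega = {0: {1: 0.5}}
neighbors = {0: [1]}
l_i = item_graph_loss(user_emb, item_emb, [(0, 0)], omega, neighbors, anchor="user")
l_i_prime = item_graph_loss(user_emb, item_emb, [(0, 0)], omega, neighbors, anchor="item")
```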

