Graph Neural Network
<Part 2-2> Recommendation - Heterogeneous
Heterogeneous Graph Neural Network Research
This line of work emerged around 2018, when GCN had achieved state-of-the-art results on various tasks and Graph Neural Networks were drawing attention. Using a Graph Neural Network in practice requires interpreting combinations of different node and link types, but earlier work was limited to graphs composed of a single node and link type; the studies below were carried out to overcome this limitation.
SEMI-SUPERVISED CLASSIFICATION WITH GRAPH CONVOLUTIONAL NETWORKS ('17) [Link]
- Builds a graph network by adapting CNN ideas (many similar studies followed, e.g., Graph Attention Network)
GraphSage: Representation Learning on Large Graphs ('18) [Link]
- Applies sampling to overcome GCN's limits on large graphs
Heterogeneous Graph Neural Network ('19) [Link] / Heterogeneous Graph Attention Network ('19) [Link]
- Interpret graphs whose nodes and links differ in type; meta-path-centric studies that predefine node paths in advance
Heterogeneous Graph Transformer ('20) [Link, GitHub]
- Interprets heterogeneous graphs without meta-paths (applies the Transformer)
https://arxiv.org/pdf/2003.01332.pdf
HGT(Heterogeneous Graph Transformer)
Recent years have witnessed the emerging success of graph neural networks (GNNs) for modeling
structured data. However, most GNNs are designed for homogeneous graphs, in which all nodes and edges
belong to the same types, making them infeasible to represent heterogeneous structures.
Earlier studies were limited to homogeneous graphs, in which nodes and links share a single type. In real-world settings, however, heterogeneous graphs whose nodes and links differ in type are the norm, and this work addresses that problem.
[heterogeneous]
[homogeneous]
Node and link types differ!
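To make the distinction concrete, a heterogeneous graph can be stored with typed nodes and edges keyed by their meta relation. The following is a minimal sketch; the node and edge type names are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of a heterogeneous graph: every node has a type and every
# edge is keyed by the meta relation <source type, edge type, target type>.
# The type names below are illustrative only.
nodes = {
    "author": ["a1", "a2"],
    "paper":  ["p1", "p2", "p3"],
    "venue":  ["v1"],
}

edges = {
    ("author", "writes",       "paper"): [("a1", "p1"), ("a2", "p1"), ("a2", "p2")],
    ("paper",  "published_at", "venue"): [("p1", "v1"), ("p3", "v1")],
    ("paper",  "cites",        "paper"): [("p2", "p1")],
}

# A homogeneous graph is the special case with a single node type and a single edge type.
```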
Problems of Past Research
(1) Because they depend on meta-paths, an understanding of domain knowledge is essential, and a specialized design is needed for each domain
https://www.youtube.com/watch?v=lPP6LRqejA4
[Metapath-guided Heterogeneous Graph Neural Network for Intent Recommendation]
The order of heterogeneous node combinations is defined in advance, and a separate network is designed accordingly (a minimal example of such a predefined path is sketched below).
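As a rough illustration of "predefining node paths", a meta-path is just an ordered sequence of node types fixed in advance by domain experts. The names below are hypothetical examples in the spirit of the intent-recommendation setting, not the exact paths used in that paper.

```python
# Hypothetical meta-path definitions: each is a fixed, hand-designed sequence
# of node types that the model is allowed to walk along.
META_PATHS = [
    ["User", "Item", "User"],    # users connected through a shared item
    ["User", "Query", "Item"],   # user -> issued query -> clicked item
    ["Item", "Shop", "Item"],    # items connected through a shared shop
]

# A separate network (separate weights) is then built per meta-path,
# which is exactly the domain-knowledge dependence criticized above.
for path in META_PATHS:
    print(" -> ".join(path))
```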
Problems of Past Research
(2) Because each meta-path definition has its own weights and network, heterogeneous information is not learned sufficiently
https://www.youtube.com/watch?v=lPP6LRqejA4
[Metapath-guided Heterogeneous Graph Neural Network for Intent Recommendation]
Far more combinations of nodes and links exist, but if the combinations are diversified too much, enough training data may not be available for each one, so the set of combinations is restricted => heterogeneous information is not learned sufficiently
Problems of Past Research
(3) Heterogeneous graphs are highly dynamic (they can change over time), but these architectures are structurally limited in reflecting such changes
https://www.youtube.com/watch?v=lPP6LRqejA4
[Metapath-guided Heterogeneous Graph Neural Network for Intent Recommendation]
To handle elements that change dynamically, such as sale start and end times, a separate node is created per time slot
=> this causes various problems (H/W resources, etc.)
Heterogeneous Graph Transformer Approach
[Expressing diverse relations even between the same node types]
(1) Instead of attending on node or edge type alone, we use the meta relation ⟨τ (s),ϕ(e), τ (t)⟩ to decompose the
interaction and transform matrices, enabling HGT to capture both the common and specific patterns of different
relationships using equal or even fewer parameters.
Between an author and a paper, for example, multiple relations such as 1st author, 2nd author, and 3rd author can be expressed even for the same pair of node types (a small sketch of meta-relation-keyed parameters follows).
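One way to picture "decomposing the interaction by the meta relation ⟨τ(s), φ(e), τ(t)⟩" is to key the learnable matrices by node type and edge type separately, so the same node-type pair can still carry different relations. This is a minimal assumed sketch, not the paper's implementation; the type names and hidden size are illustrative.

```python
import torch
import torch.nn as nn

node_types = ["author", "paper"]
edge_types = ["first_author_of", "second_author_of", "cites"]  # illustrative names
d = 64  # hidden size (assumed)

# One projection per node type, one relation matrix per edge type:
# the meta relation <tau(s), phi(e), tau(t)> is covered by composing them,
# so <author, first_author_of, paper> and <author, second_author_of, paper>
# share the node-type projections but differ through the edge-type matrix.
node_proj = nn.ModuleDict({t: nn.Linear(d, d) for t in node_types})
edge_mat  = nn.ParameterDict({e: nn.Parameter(torch.eye(d)) for e in edge_types})
```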
Heterogeneous Graph Transformer Approach
[Implementing soft meta-paths through the architecture itself]
(2) Different from most of the existing works that are based on customized meta paths, we rely on the nature
of the neural architecture to incorporate high-order heterogeneous neighbor information, which automatically
learns the importance of implicit meta paths.
Due to the nature of its architecture, HGT can incorporate information from high-order neighbors of different
types through message passing across layers, which can be regarded as “soft” meta paths. That said, even if
HGT take only its one-hop edges as input without manually designing meta paths, the proposed attention
mechanism can automatically and implicitly learn and extract “meta paths” that are important for different
downstream tasks.
Rather than defining meta-paths in advance from prior knowledge, the architecture itself finds and learns the important paths through attention => explained in detail later.
Heterogeneous Graph Transformer Approach
[Applying the time gap via positional encoding]
(3) Most previous works don’t take the dynamic nature of (heterogeneous) graphs into consideration, while we
propose the relative temporal encoding technique to incorporate temporal information by using limited
computational resources.
(4) None of the existing heterogeneous GNNs are designed for and experimented with Web-scale graphs, we
therefore propose the heterogeneous Mini-Batch graph sampling algorithm designed for Web-scale graph training,
enabling experiments on the billion-scale Open Academic Graph.
[A mini-batch technique that keeps the heterogeneous graph balanced]
Overall Architecture
Overall Architecture - Graph Structure Example
Target Node
(to be updated)
Source Node
(Type-1)
Source Node
(Type-2)
Overall Architecture - Message Passing Structure
Attention
Message
Aggregate
Overall Architecture - Applying the Transformer
[Transformer Multi-head Attention]
Q
K
V
Attention(Q,E,K)
Target Node (t-1)
[HGT Overall Architecture]
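For reference, the vanilla Transformer attention that HGT modifies is the usual scaled dot product. A minimal single-function sketch (shapes and function name are assumptions for illustration):

```python
import torch

def scaled_dot_product_attention(Q, K, V):
    # Q: [num_queries, d], K and V: [num_keys, d]
    d = Q.shape[-1]
    scores = Q @ K.transpose(-2, -1) / d ** 0.5   # similarity between queries and keys
    weights = torch.softmax(scores, dim=-1)        # attention distribution over keys
    return weights @ V                             # weighted sum of values
```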
Overall Architecture - Heterogeneous Mutual Attention Details (1)
Heterogeneous Mutual Attention
S : Source
T : Target
E : Edge
Multi Head Attention
Overall Architecture - Heterogeneous Mutual Attention Details (2)
we add a prior tensor µ ∈ ℝ^{|A|×|R|×|A|} to denote the general significance of each meta relation triplet.

Therefore, unlike the vanilla Transformer that directly calculates the dot product between the Query and Key vectors, we keep a distinct edge-based matrix W^ATT_φ(e) ∈ ℝ^{(d/h)×(d/h)} for each edge type φ(e). In doing so, the model can capture different semantic relations even between the same node type pairs.

Linear projection (or a fully connected layer) is perhaps one of the most common operations in deep learning models. When doing linear projection, we can project a vector x of dimension n to a vector y of dimension m by multiplying by a projection matrix W of shape [n, m].
Heterogeneous Mutual Attention
Overall Architecture - Heterogeneous Mutual Attention Details (3)
The prior µ reflects the importance of the triplet (T - E - S).
Fully connected (linear) layers map vectors of different sizes onto a common size.
Attention in the original Transformer: Q · K (dot product of Query and Key).
Modified attention: not a simple similarity between T and S, but a similarity that takes T - E - S into account, i.e., T · E · S (target · edge · source).
Heterogeneous Mutual Attention
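Putting the pieces above together, a minimal single-head sketch of the heterogeneous mutual attention score for one (source, edge, target) triple might look as follows. The class and parameter names are mine; the per-type linear layers, the per-edge-type matrix W_ATT, and the meta-relation prior µ follow the description above, but this is a simplified sketch, not the authors' code.

```python
import torch
import torch.nn as nn

class HeteroMutualAttention(nn.Module):
    """Single-head sketch: score(s,e,t) = (K_s · W_ATT[phi(e)] · Q_t) · mu[<tau(s),phi(e),tau(t)>] / sqrt(d)."""

    def __init__(self, node_types, edge_types, meta_relations, d):
        super().__init__()
        self.d = d
        self.k_lin = nn.ModuleDict({t: nn.Linear(d, d) for t in node_types})   # K-Linear per source type
        self.q_lin = nn.ModuleDict({t: nn.Linear(d, d) for t in node_types})   # Q-Linear per target type
        self.w_att = nn.ParameterDict({e: nn.Parameter(torch.eye(d)) for e in edge_types})
        # one scalar prior per meta relation <tau(s), phi(e), tau(t)>
        self.mu = nn.ParameterDict({"|".join(m): nn.Parameter(torch.ones(1)) for m in meta_relations})

    def score(self, h_s, src_type, edge_type, h_t, tgt_type):
        k = self.k_lin[src_type](h_s)                 # source projected into Key space
        q = self.q_lin[tgt_type](h_t)                 # target projected into Query space
        raw = k @ self.w_att[edge_type] @ q           # T-E-S similarity instead of plain Q·K
        mu = self.mu["|".join((src_type, edge_type, tgt_type))]
        return raw * mu / self.d ** 0.5               # scaled by the meta-relation prior

# Attention weights over all sources of one target are a softmax over these scores.
att = HeteroMutualAttention(["author", "paper"], ["writes"],
                            [("author", "writes", "paper")], d=8)
s = att.score(torch.randn(8), "author", "writes", torch.randn(8), "paper")
```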
Overall Architecture - Heterogeneous Message Passing Details (1)
Heterogeneous Message Passing
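The message step mirrors the attention step: each source node is projected by a per-node-type linear layer and then transformed by a per-edge-type matrix. A minimal sketch with assumed names:

```python
import torch
import torch.nn as nn

class HeteroMessage(nn.Module):
    """Sketch: Message(s, e, t) = M_Linear[tau(s)](h_s) @ W_MSG[phi(e)]."""

    def __init__(self, node_types, edge_types, d):
        super().__init__()
        self.m_lin = nn.ModuleDict({t: nn.Linear(d, d) for t in node_types})   # M-Linear per source type
        self.w_msg = nn.ParameterDict({e: nn.Parameter(torch.eye(d)) for e in edge_types})

    def forward(self, h_s, src_type, edge_type):
        return self.m_lin[src_type](h_s) @ self.w_msg[edge_type]
```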
Overall Architecture - Target Specific Aggregation Details (1)
Target Specific Aggregation
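Aggregation then sums the attention-weighted messages over the target's neighbors and maps the result back with a per-target-type linear layer plus a residual connection. Again only a hedged sketch with assumed module names; the choice of activation is my assumption.

```python
import torch
import torch.nn as nn

class TargetSpecificAggregate(nn.Module):
    """Sketch: h_t' = A_Linear[tau(t)]( act( sum_s att(s,e,t) * msg(s,e,t) ) ) + h_t (residual)."""

    def __init__(self, node_types, d):
        super().__init__()
        self.a_lin = nn.ModuleDict({t: nn.Linear(d, d) for t in node_types})  # A-Linear per target type

    def forward(self, h_t, tgt_type, att_weights, messages):
        # att_weights: [num_neighbors], already softmax-normalized
        # messages:    [num_neighbors, d]
        aggregated = (att_weights.unsqueeze(-1) * messages).sum(dim=0)
        return self.a_lin[tgt_type](torch.relu(aggregated)) + h_t
```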
Overall Architecture-Relative Temporal Encoding
The traditional way to incorporate temporal information is to construct a separate graph for each time slot. However, such a procedure may lose a large portion of structural dependencies across different time slots. We propose the Relative Temporal Encoding (RTE) mechanism to model the dynamic dependencies in heterogeneous graphs. RTE is inspired by Transformer's positional encoding method. Specifically, given a source node s and a target node t, along with their corresponding timestamps T(s) and T(t), we denote the relative time gap ΔT(t,s) = T(t) − T(s) as an index to get a relative temporal encoding RTE(ΔT(t,s)).
[even / odd dimensions, as in the sinusoidal positional encoding]
The time gap between the target node and the source node is embedded with a positional-encoding scheme, and this value is added (+) to the source representation to express the relative time difference (a minimal sketch follows below).
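A minimal sketch of the relative temporal encoding described above: the time gap ΔT(t, s) indexes a sinusoidal table (even dimensions use sine, odd dimensions cosine), and the resulting vector is added to the source representation. The function name and the gap cap are assumptions; the paper additionally passes this sinusoidal base through a learnable linear layer, which is omitted here.

```python
import torch

def relative_temporal_encoding(delta_t, d, max_gap=1000):
    """Sinusoidal encoding of the time gap, in the style of Transformer positional encoding."""
    delta_t = min(int(delta_t), max_gap)                 # clamp the gap (assumed cap)
    i = torch.arange(d, dtype=torch.float32)
    angle = delta_t / (10000 ** (2 * (i // 2) / d))
    # even dims -> sin, odd dims -> cos
    return torch.where(i % 2 == 0, torch.sin(angle), torch.cos(angle))   # shape [d]

# The encoding is then added to the source node's representation:
# h_s = h_s + relative_temporal_encoding(T_t - T_s, d)
```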
Reference - Layer Dependent Importance Sampling
Layer-Dependent Importance Sampling for Training Deep and Large Graph Convolutional Networks
https://arxiv.org/pdf/1911.07323.pdf
(a) The same node is sampled repeatedly.
(b) Nodes that are not adjacent are sampled.
(Proposed) Solves problems (a) and (b) and samples neighboring nodes.
- Black: target node
- Blue: neighbor node
- Red border: sampled node
Reference - Layer Dependent Importance Sampling
Layer-Dependent Importance Sampling for Training Deep and Large Graph Convolutional Networks
https://arxiv.org/pdf/1911.07323.pdf
In practice the normalized Laplacian is used (the example shows D − A)
i-th
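A rough NumPy sketch of the layer-dependent sampling idea: restrict the normalized Laplacian to the rows of the already-sampled (upper-layer) nodes and sample lower-layer nodes with probability proportional to the squared column norms. This is a simplification of the LADIES procedure, with assumed variable names.

```python
import numpy as np

def layer_dependent_probs(lap_norm, sampled_upper):
    """lap_norm: [N, N] normalized Laplacian; sampled_upper: indices of already-sampled nodes."""
    sub = lap_norm[sampled_upper, :]            # keep only rows of the upper-layer samples
    col_norm_sq = (sub ** 2).sum(axis=0)        # importance of each candidate neighbor
    return col_norm_sq / col_norm_sq.sum()      # sampling probability per node

# Tiny example with a symmetric-normalized adjacency:
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
D_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
lap_norm = D_inv_sqrt @ A @ D_inv_sqrt
probs = layer_dependent_probs(lap_norm, sampled_upper=[0])
```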
Overall Architecture - HGSampling
[Problems when applying existing sampling methods] directly using them for
heterogeneous graphs is prone to get sub-graphs that are
extremely imbalanced regarding different node types, due to
that the degree distribution and the total number of nodes for
each type can vary dramatically
=> The sub-graph can become highly imbalanced with respect to node types and the distribution of the data.
[Solution]
1) keep a similar number of nodes and edges for each type
=> Maintain a per-type Budget so that nodes and edges of each type stay balanced.
2) keep the sampled sub-graph dense to minimize the information loss and reduce the sample variance
=> When choosing which of several adjacent nodes to pick, compute each adjacent node's sampling probability and sample accordingly (cf. Layer-Dependent Importance Sampling; a minimal sketch of the budget-based probability follows Algorithm 2 below).
Algorithm 2
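A minimal sketch of the budget idea in Algorithm 2: one budget dictionary per node type accumulates a normalized degree for each candidate node, and the sampling probability within a type is proportional to the squared budget value, which keeps the sub-graph dense and reduces variance. Function names and simplifications are mine.

```python
import numpy as np
from collections import defaultdict

# budget[node_type][node_id] = accumulated normalized degree
budget = defaultdict(dict)

def add_in_budget(budget, node_id, node_type, normalized_degree):
    budget[node_type][node_id] = budget[node_type].get(node_id, 0.0) + normalized_degree

def sample_from_budget(budget, node_type, n, rng=None):
    rng = rng or np.random.default_rng(0)
    ids = list(budget[node_type].keys())
    scores = np.array([budget[node_type][i] for i in ids]) ** 2   # squared budget value
    probs = scores / scores.sum()
    chosen = rng.choice(ids, size=min(n, len(ids)), replace=False, p=probs)
    for node_id in chosen:
        budget[node_type].pop(node_id)                            # pop-out after selection
    return list(chosen)

# Example: two candidate papers in the budget, pick one per step.
add_in_budget(budget, "p2", "paper", 0.5)
add_in_budget(budget, "p3", "paper", 0.25)
picked = sample_from_budget(budget, "paper", n=1)
```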
Overall Architecture - Inductive Timestamp Assignment
Till now we have assumed that each node t is assigned with a timestamp T (t). However, in real-world
heterogeneous graphs, many nodes are not associated with a fixed time. Therefore, we need to assign
different timestamps to them. We denote these nodes as plain nodes.
Details of the Add-In-Budget method in Algorithm 1
(when a node is added to the Budget, if it has no timestamp, it inherits the target's timestamp)
When a timestamp exists, D_t is added.
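A small sketch of the inductive timestamp assignment inside Add-In-Budget: when a candidate source node carries no timestamp of its own (a "plain node"), it simply inherits the timestamp of the target node that reached it. Variable names and the budget layout are assumptions.

```python
from collections import defaultdict

budget = defaultdict(dict)   # budget[node_type][node_id] = {"degree": ..., "time": ...}

def add_in_budget_with_time(budget, node_id, node_type, normalized_degree,
                            node_timestamp, target_timestamp):
    # Plain node (no fixed timestamp): inherit the target node's timestamp.
    timestamp = node_timestamp if node_timestamp is not None else target_timestamp
    entry = budget[node_type].setdefault(node_id, {"degree": 0.0, "time": timestamp})
    entry["degree"] += normalized_degree   # accumulate the normalized degree (D_t)

# Example: a venue node without its own timestamp inherits the target paper's year.
add_in_budget_with_time(budget, "v1", "venue", 0.5, node_timestamp=None, target_timestamp=2018)
```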
Overall Architecture - HGSampling with Inductive Timestamp Assignment
A Budget is maintained per node type, in order to sample evenly across types.
Initialize with p1 as the starting node.
Add every node connected to p1 to the Budget.
For each type, pick n nodes, add them to the Output, and pop them out of the Budget (here n = 1, and one source node of the circle type remains in the Budget); importance sampling is used for this selection.
From the n nodes chosen per type, find their adjacent nodes again and add them to the Budget.
From the nodes now in the Budget, select n per type via importance sampling again, add them to the Output, and pop them out.