Graph Neural Network
<Part 2-2> Recommendation - Heterogeneous
Heterogeneous Graph Neural Network Researches
This line of work began around 2018, when GCNs achieved state-of-the-art results on various tasks and graph neural networks started drawing attention. Using GNNs in practice requires interpreting combinations of different node and link types, but earlier research was limited to graphs composed of a single node type and a single link type; the studies below were conducted to overcome this limitation.
The research timeline and the focus of each work:
- SEMI-SUPERVISED CLASSIFICATION WITH GRAPH CONVOLUTIONAL NETWORKS ('17): builds a graph network with CNN-style convolutions (many similar studies followed, e.g. Graph Attention Network).
- GraphSage: Representation Learning on Large Graphs ('18): applies sampling to overcome GCN's limitations on large graphs.
- Heterogeneous Graph Neural Network ('19), Heterogeneous Graph Attention Network ('19): meta-path-centered work that predefines node paths in order to interpret graphs with different node and link types.
- Heterogeneous Graph Transformer ('20): interprets heterogeneous graphs without meta-paths (by applying the Transformer).
https://arxiv.org/pdf/2003.01332.pdf
HGT(Heterogeneous Graph Transformer)
Recent years have witnessed the emerging success of graph neural networks (GNNs) for modeling
structured data. However, most GNNs are designed for homogeneous graphs, in which all nodes and edges
belong to the same types, making them infeasible to represent heterogeneous structures.
Earlier studies were limited to homogeneous graphs, in which node and link types are uniform. In real-world settings, however, heterogeneous graphs with different node and link types are the norm, and this work addresses that problem.
[heterogeneous] vs. [homogeneous]: the node and link types differ!
Problem of Past Researches
(1) Because the approach depends on meta-paths, domain knowledge is essential and a specialized design is required for each domain.
https://www.youtube.com/watch?v=lPP6LRqejA4
[Metapath-guided Heterogeneous Graph Neural Network for Intent Recommendation]
The order in which heterogeneous nodes are combined is defined in advance, and a separate network is designed accordingly.
Problem of Past Researches
(2) Because each meta-path definition has its own weights and its own network, heterogeneous information is not learned sufficiently.
https://www.youtube.com/watch?v=lPP6LRqejA4
[Metapath-guided Heterogeneous Graph Neural Network for Intent Recommendation]
Considering all possible node and link combinations, far more combinations exist. But if the combinations are diversified too much, there may not be enough training data per combination, so the set of combinations is restricted => heterogeneous information is not learned sufficiently.
Problem of Past Researches
(3) Heterogeneous graphs are highly dynamic (they can change over time), but these architectures are structurally limited in reflecting such changes.
https://www.youtube.com/watch?v=lPP6LRqejA4
[Metapath-guided Heterogeneous Graph Neural Network for Intent Recommendation]
Dynamically changing factors such as sale start and end times are handled by creating a separate node per time slot
=> this causes various problems (hardware resources, etc.).
Heterogeneous Graph Transformer Approach
[Expressing multiple relations even between the same node types]
(1) Instead of attending on node or edge type alone, we use the meta relation ⟨τ (s),ϕ(e), τ (t)⟩ to decompose the
interaction and transform matrices, enabling HGT to capture both the common and specific patterns of different
relationships using equal or even fewer parameters.
This lets the model express multiple relations even between the same pair of node types, e.g. first author, second author, third author between an author node and a paper node.
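As a rough illustration of point (1), the sketch below keeps one projection per node type and one matrix per edge type, so multiple relations between the same node types (first/second/third author) get distinct scores while the parameter count stays small; the type names and shapes are illustrative assumptions, not the paper's code.

import torch
import torch.nn as nn

# Minimal sketch: weights are decomposed along the meta relation <tau(s), phi(e), tau(t)>.
# Node-type projections are shared across relations and edge-type matrices specialize them,
# so "author -first_author-> paper" and "author -third_author-> paper" reuse the same node
# projections but differ through the edge-type matrix.
# Parameter count grows with |A| + |R|, not with |A| x |R| x |A|.
node_types = ["author", "paper"]
edge_types = ["first_author", "second_author", "third_author"]
d = 64

node_proj = nn.ModuleDict({t: nn.Linear(d, d) for t in node_types})              # per tau(.)
edge_mat = nn.ParameterDict({e: nn.Parameter(torch.eye(d)) for e in edge_types}) # per phi(e)

def relation_score(h_src, src_type, edge_type, h_tgt, tgt_type):
    """Unnormalized compatibility of one <source, edge, target> triple."""
    k = node_proj[src_type](h_src)   # source projected by its node-type matrix
    q = node_proj[tgt_type](h_tgt)   # target projected by its node-type matrix
    return (k @ edge_mat[edge_type] @ q) / d ** 0.5

score = relation_score(torch.randn(d), "author", "first_author", torch.randn(d), "paper")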
Heterogeneous Graph Transformer Approach
[The architecture itself implements soft meta-paths]
(2) Different from most of the existing works that are based on customized meta paths, we rely on the nature
of the neural architecture to incorporate high-order heterogeneous neighbor information, which automatically
learns the importance of implicit meta paths.
Due to the nature of its architecture, HGT can incorporate information from high-order neighbors of different
types through message passing across layers, which can be regarded as “soft” meta paths. That said, even if
HGT take only its one-hop edges as input without manually designing meta paths, the proposed attention
mechanism can automatically and implicitly learn and extract “meta paths” that are important for different
downstream tasks.
Instead of defining meta-paths in advance using prior knowledge, the authors built a neural architecture that, through attention, automatically finds and learns the important paths => explained in detail later.
Heterogeneous Graph Transformer Approach
[Applying the time gap via positional encoding]
(3) Most previous works don’t take the dynamic nature of (heterogeneous) graphs into consideration, while we
propose the relative temporal encoding technique to incorporate temporal information by using limited
computational resources.
(4) None of the existing heterogeneous GNNs are designed for and experimented with Web-scale graphs, we
therefore propose the heterogeneous Mini-Batch graph sampling algorithm designed for Web-scale graph training,
enabling experiments on the billion-scale Open Academic Graph.
[A mini-batch sampling technique that accounts for the balance of the heterogeneous graph]
Overall Architecture
Overall Architecture - Example Graph Structure
Target Node (the node being updated)
Source Node (Type-1)
Source Node (Type-2)
Overall Architecture - Message Passing Structure
A single layer updates the target node in three stages: Attention, Message, Aggregate.
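In the paper's notation these three stages compose roughly as the following layer update (superscript $l$ is the layer index, $N(t)$ the sampled neighbors of target $t$):

$$H^{(l)}[t] \;\leftarrow\; \underset{\forall s \in N(t)}{\mathrm{Aggregate}}\Big(\mathrm{Attention}(s,e,t)\cdot\mathrm{Message}(s,e,t)\Big)$$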
Overall Architecture - Applying the Transformer
[Figure: Transformer multi-head attention, with Query (Q), Key (K), and Value (V)]
[Figure: HGT overall architecture, where attention is computed over (Q, E, K) and the target node representation is taken from the previous layer]
Overall Architecture - Heterogeneous Mutual Attention, Detail (1)
Heterogeneous Mutual Attention
S : Source, T : Target, E : Edge
Multi-Head Attention
Overall Architecture - Heterogeneous Mutual Attention, Detail (2)
We add a prior tensor µ ∈ R^{|A|×|R|×|A|} to denote the general significance of each meta relation triplet.
Therefore, unlike the vanilla Transformer that directly calculates the dot product between the Query and Key vectors, we keep a distinct edge-based matrix W^{ATT}_{ϕ(e)} ∈ R^{(d/h)×(d/h)} for each edge type ϕ(e). In doing so, the model can capture different semantic relations even between the same node type pairs.
Linear projection (or a fully connected layer) is perhaps one of the most common operations in deep learning models: a vector x of dimension n is projected to a vector y of dimension m by multiplying it with a projection matrix W of shape [n, m].
Heterogeneous Mutual Attention
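Putting the quoted pieces together, a minimal single-head sketch of the heterogeneous mutual attention score could look like the following; the per-edge-type matrix w_att and the meta-relation prior mu follow the quoted description, while the concrete names, shapes, and single-head simplification are assumptions for illustration.

import math
import torch
import torch.nn as nn

d = 64
node_types, edge_types = ["author", "paper"], ["writes"]

k_lin = nn.ModuleDict({t: nn.Linear(d, d) for t in node_types})                 # K-Linear per source type
q_lin = nn.ModuleDict({t: nn.Linear(d, d) for t in node_types})                 # Q-Linear per target type
w_att = nn.ParameterDict({e: nn.Parameter(torch.eye(d)) for e in edge_types})   # per edge type phi(e)
mu = {("author", "writes", "paper"): torch.tensor(1.0)}                         # prior per meta relation

def att_score(h_s, tau_s, phi_e, h_t, tau_t):
    k = k_lin[tau_s](h_s)                        # project the source into Key space
    q = q_lin[tau_t](h_t)                        # project the target into Query space
    sim = k @ w_att[phi_e] @ q                   # edge-type-specific similarity K * W_att * Q
    return sim * mu[(tau_s, phi_e, tau_t)] / math.sqrt(d)

# Attention weights = softmax of the scores over all source neighbors of one target node.
scores = torch.stack([att_score(torch.randn(d), "author", "writes", torch.randn(d), "paper")
                      for _ in range(3)])
weights = torch.softmax(scores, dim=0)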
Overall Architecture - Heterogeneous Mutual Attention, Detail (3)
- The prior tensor reflects the importance of each (T - E - S) triplet.
- Fully connected layers bring the differently sized vectors to a common size.
- Attention in the original Transformer: similarity of Q · K.
- Modified attention: the similarity is computed not simply between T and S but over the (T - E - S) triplet, i.e. target · edge · source.
Heterogeneous Mutual Attention
Overall Architecture - Heterogeneous Message Passing, Detail (1)
Heterogeneous Message Passing
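The slide keeps only the heading; based on the paper, the per-head message is, roughly, the source representation passed through a source-type-specific linear layer and then through an edge-type-specific matrix, with the heads concatenated:

$$\mathrm{Message}(s,e,t) \;=\; \big\Vert_{i\in[1,h]}\ \text{M-Linear}^{\,i}_{\tau(s)}\!\big(H^{(l-1)}[s]\big)\, W^{\mathrm{MSG}}_{\phi(e)}$$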
Overall Architecture - Target-Specific Aggregation, Detail (1)
Target-Specific Aggregation
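Here too the slide gives only the heading; in the paper, the attention-weighted messages are combined over the neighbors and mapped back into the target node's type-specific space with a residual connection, roughly:

$$\widetilde{H}^{(l)}[t] \;=\; \bigoplus_{\forall s\in N(t)} \big(\mathrm{Attention}(s,e,t)\cdot\mathrm{Message}(s,e,t)\big),\qquad H^{(l)}[t] \;=\; \text{A-Linear}_{\tau(t)}\!\big(\sigma(\widetilde{H}^{(l)}[t])\big) + H^{(l-1)}[t]$$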
Overall Architecture - Relative Temporal Encoding
The traditional way to incorporate temporal information is to construct a separate graph for each time slot. However, such a procedure may lose a large portion of structural dependencies across different time slots. We propose the Relative Temporal Encoding (RTE) mechanism to model the dynamic dependencies in heterogeneous graphs. RTE is inspired by the Transformer's positional encoding method. Specifically, given a source node s and a target node t, along with their corresponding timestamps T(s) and T(t), we denote the relative time gap ∆T(t,s) = T(t) − T(s) as an index to get a relative temporal encoding RTE(∆T(t,s)).
(even dimensions: sine, odd dimensions: cosine)
The time gap between the target node and the source node is embedded with the positional-encoding technique, and this value is added (+) to the source representation to express the relative time difference.
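A minimal sketch of such a relative temporal encoding is shown below, assuming the standard sinusoidal base followed by a learnable linear map (the paper's exact parameterization may differ); the encoded time gap is added to the source node representation.

import torch
import torch.nn as nn

d = 64  # hidden dimension

class RelativeTemporalEncoding(nn.Module):
    """Sketch of RTE: sinusoidal encoding of the time gap, then a learnable projection."""
    def __init__(self, dim, max_gap=240):
        super().__init__()
        pos = torch.arange(max_gap).unsqueeze(1).float()
        div = torch.exp(torch.arange(0, dim, 2).float() * (-torch.log(torch.tensor(10000.0)) / dim))
        table = torch.zeros(max_gap, dim)
        table[:, 0::2] = torch.sin(pos * div)   # even dimensions: sine
        table[:, 1::2] = torch.cos(pos * div)   # odd dimensions: cosine
        self.register_buffer("table", table)
        self.proj = nn.Linear(dim, dim)

    def forward(self, delta_t):                  # delta_t = T(t) - T(s), used as an index
        return self.proj(self.table[delta_t])

rte = RelativeTemporalEncoding(d)
h_source = torch.randn(d)
h_source = h_source + rte(torch.tensor(12))      # add the encoded time gap to the source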
Reference - Layer-Dependent Importance Sampling
Layer-Dependent Importance Sampling for Training Deep and Large Graph Convolutional Networks
https://arxiv.org/pdf/1911.07323.pdf
(a) The same nodes are sampled repeatedly.
(b) Nodes that are not adjacent get sampled.
The method resolves problems (a) and (b) and samples among the neighbor nodes.
- black: target nodes
- blue: neighbor nodes
- red outline: sampled nodes
Reference - Layer-Dependent Importance Sampling
Layer-Dependent Importance Sampling for Training Deep and Large Graph Convolutional Networks
https://arxiv.org/pdf/1911.07323.pdf
In practice the normalized Laplacian is used (the example shows D − A); i denotes the i-th node/column in the sampling-probability computation.
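As a rough sketch of the idea (following the LADIES-style formulation, which is an assumption here rather than something the slide states explicitly): given the nodes already chosen in the upper layer, each candidate node i is sampled with probability proportional to the squared norm of the corresponding column of the normalized adjacency restricted to those rows.

import numpy as np

# Sketch: layer-dependent importance sampling probabilities (LADIES-style, illustrative).
# P is a row-normalized (or Laplacian-normalized) adjacency matrix of the full graph.
def layer_sampling_probs(P, upper_layer_nodes):
    sub = P[upper_layer_nodes, :]              # rows of nodes already chosen in the layer above
    col_norms = (sub ** 2).sum(axis=0)         # squared column norms = per-node importance
    return col_norms / col_norms.sum()         # normalize into a probability distribution

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
P = A / A.sum(axis=1, keepdims=True)           # simple row normalization for the example
probs = layer_sampling_probs(P, upper_layer_nodes=[0])
sampled = np.random.choice(len(probs), size=2, replace=False, p=probs)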
Overall Architecture - HGSampling
[Problem with applying existing sampling methods] directly using them for heterogeneous graphs is prone to get sub-graphs that are extremely imbalanced regarding different node types, due to that the degree distribution and the total number of nodes for each type can vary dramatically
=> The sub-graph can become highly imbalanced in terms of node types and the distribution of the data.
[Solution]
1) keep a similar number of nodes and edges for each type
=> Maintain a budget, managed per type, so that nodes and edges stay balanced across types.
2) keep the sampled sub-graph dense to minimize the information loss and reduce the sample variance
=> When choosing among several adjacent nodes, compute a sampling probability per adjacent node and sample accordingly (see the Layer-Dependent Importance Sampling reference above).
(Algorithm 2 in the paper)
Overall Architecture - Inductive Timestamp Assignment
Till now we have assumed that each node t is assigned with a timestamp T (t). However, in real-world
heterogeneous graphs, many nodes are not associated with a fixed time. Therefore, we need to assign
different timestamps to it. We denote these nodes as plain nodes.
Detail of the Add-In-Budget method from Algorithm 1
(when a node is added to the budget and has no timestamp, it inherits the target's timestamp)
If the node does have a timestamp, D_t is added.
Overall Architecture - HGSampling with Inductive Timestamp Assignment
A budget is maintained per node type, so that sampling stays balanced across types.
Initialization starts from p1.
All nodes connected to p1 are added to the budget.
For each type, n nodes are chosen, added to the output, and popped out of the budget (here n = 1, and one source node of the circle type remains in its budget); importance sampling is used for this choice.
From the n nodes chosen per type, adjacent nodes are found again and added to the budget.
The nodes placed in the budget are again sampled via importance sampling (n per type), added to the output, and popped out.
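A compact sketch of this budget-based procedure is below. The type-keyed budget, the inheritance of timestamps, and the use of the squared budget value as a sampling probability follow the description above, but the data structures and function signature are illustrative assumptions rather than the paper's implementation.

import random
from collections import defaultdict

# Sketch of HGSampling with inductive timestamp assignment (illustrative data structures).
# graph[node] = list of (neighbor, neighbor_type); node_types[node] = type; timestamps may be None.
def hg_sample(graph, node_types, timestamps, seed_nodes, n_per_type, depth):
    output = set(seed_nodes)
    budget = defaultdict(dict)                       # budget[type][node] = accumulated weight

    def add_to_budget(node, source):
        if timestamps.get(node) is None:             # inductive timestamp assignment:
            timestamps[node] = timestamps[source]    # inherit the source/target node's time
        deg = max(len(graph[source]), 1)
        budget[node_types[node]][node] = budget[node_types[node]].get(node, 0.0) + 1.0 / deg

    for s in seed_nodes:                             # initialize the budget from the seeds
        for nbr, _ in graph[s]:
            if nbr not in output:
                add_to_budget(nbr, s)

    for _ in range(depth):
        picked = []
        for ntype, cand in list(budget.items()):     # sample n nodes per type (balanced)
            if not cand:
                continue
            nodes = list(cand)
            weights = [cand[v] ** 2 for v in nodes]  # importance ~ squared budget value
            chosen = random.choices(nodes, weights=weights, k=min(n_per_type, len(nodes)))
            for v in set(chosen):
                picked.append(v)
                output.add(v)
                cand.pop(v)                          # pop sampled nodes out of the budget
        for v in picked:                             # expand the budget from the new nodes
            for nbr, _ in graph[v]:
                if nbr not in output:
                    add_to_budget(nbr, v)
    return output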