Graph Neural Network
<Part 2-2> Recommendation - Heterogeneous
Heterogeneous Graph Neural Network Research
This line of work emerged around 2018, when GCN had achieved state-of-the-art results on various tasks and Graph Neural Networks were drawing attention. Using a Graph Neural Network in practice requires interpreting combinations of different node and link types, but earlier work was limited to graphs composed of a single node and link type; the studies below were carried out to overcome this limitation.
SEMI-SUPERVISED CLASSIFICATION WITH GRAPH CONVOLUTIONAL NETWORKS ('17) [Link]
- Builds a graph network by adapting CNN ideas (many similar studies followed, e.g., Graph Attention Network)
GraphSage: Representation Learning on Large Graphs ('18) [Link]
- Applies sampling to overcome GCN's limits on large graphs
Heterogeneous Graph Neural Network ('19) [Link] / Heterogeneous Graph Attention Network ('19) [Link]
- Interpret graphs whose nodes and links differ in type; meta-path-centric studies that predefine node paths in advance
Heterogeneous Graph Transformer ('20) [Link, GitHub]
- Interprets heterogeneous graphs without meta-paths (applies the Transformer)
https://arxiv.org/pdf/2003.01332.pdf
HGT(Heterogeneous Graph Transformer)
Recent years have witnessed the emerging success of graph neural networks (GNNs) for modeling
structured data. However, most GNNs are designed for homogeneous graphs, in which all nodes and edges
belong to the same types, making them infeasible to represent heterogeneous structures.
Earlier studies were limited to homogeneous graphs, in which nodes and links share a single type. In real-world settings, however, heterogeneous graphs whose nodes and links differ in type are the norm, and this work addresses that problem.
[heterogeneous]
[homogeneous]
Node and link types differ!
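To make the distinction concrete, a heterogeneous graph can be stored with typed nodes and edges keyed by their meta relation. The following is a minimal sketch; the node and edge type names are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of a heterogeneous graph: every node has a type and every
# edge is keyed by the meta relation <source type, edge type, target type>.
# The type names below are illustrative only.
nodes = {
    "author": ["a1", "a2"],
    "paper":  ["p1", "p2", "p3"],
    "venue":  ["v1"],
}

edges = {
    ("author", "writes",       "paper"): [("a1", "p1"), ("a2", "p1"), ("a2", "p2")],
    ("paper",  "published_at", "venue"): [("p1", "v1"), ("p3", "v1")],
    ("paper",  "cites",        "paper"): [("p2", "p1")],
}

# A homogeneous graph is the special case with a single node type and a single edge type.
```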
Problems of Past Research
(1) Because they depend on meta-paths, an understanding of domain knowledge is essential, and a specialized design is needed for each domain
https://www.youtube.com/watch?v=lPP6LRqejA4
[Metapath-guided Heterogeneous Graph Neural Network for Intent Recommendation]
The order of heterogeneous node combinations is defined in advance, and a separate network is designed accordingly (a minimal example of such a predefined path is sketched below).
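As a rough illustration of "predefining node paths", a meta-path is just an ordered sequence of node types fixed in advance by domain experts. The names below are hypothetical examples in the spirit of the intent-recommendation setting, not the exact paths used in that paper.

```python
# Hypothetical meta-path definitions: each is a fixed, hand-designed sequence
# of node types that the model is allowed to walk along.
META_PATHS = [
    ["User", "Item", "User"],    # users connected through a shared item
    ["User", "Query", "Item"],   # user -> issued query -> clicked item
    ["Item", "Shop", "Item"],    # items connected through a shared shop
]

# A separate network (separate weights) is then built per meta-path,
# which is exactly the domain-knowledge dependence criticized above.
for path in META_PATHS:
    print(" -> ".join(path))
```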
Problems of Past Research
(2) Because each meta-path definition has its own weights and network, heterogeneous information is not learned sufficiently
https://www.youtube.com/watch?v=lPP6LRqejA4
[Metapath-guided Heterogeneous Graph Neural Network for Intent Recommendation]
Far more combinations of nodes and links exist, but if the combinations are diversified too much, enough training data may not be available for each one, so the set of combinations is restricted => heterogeneous information is not learned sufficiently
Problems of Past Research
(3) Heterogeneous graphs are highly dynamic (they can change over time), but these architectures are structurally limited in reflecting such changes
https://www.youtube.com/watch?v=lPP6LRqejA4
[Metapath-guided Heterogeneous Graph Neural Network for Intent Recommendation]
To handle elements that change dynamically, such as sale start and end times, a separate node is created per time slot
=> this causes various problems (H/W resources, etc.)
Heterogeneous Graph Transformer Approach
[Expressing diverse relations even between the same node types]
(1) Instead of attending on node or edge type alone, we use the meta relation ⟨τ (s),ϕ(e), τ (t)⟩ to decompose the
interaction and transform matrices, enabling HGT to capture both the common and specific patterns of different
relationships using equal or even fewer parameters.
Between an author and a paper, for example, multiple relations such as 1st author, 2nd author, and 3rd author can be expressed even for the same pair of node types (a small sketch of meta-relation-keyed parameters follows).
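One way to picture "decomposing the interaction by the meta relation ⟨τ(s), φ(e), τ(t)⟩" is to key the learnable matrices by node type and edge type separately, so the same node-type pair can still carry different relations. This is a minimal assumed sketch, not the paper's implementation; the type names and hidden size are illustrative.

```python
import torch
import torch.nn as nn

node_types = ["author", "paper"]
edge_types = ["first_author_of", "second_author_of", "cites"]  # illustrative names
d = 64  # hidden size (assumed)

# One projection per node type, one relation matrix per edge type:
# the meta relation <tau(s), phi(e), tau(t)> is covered by composing them,
# so <author, first_author_of, paper> and <author, second_author_of, paper>
# share the node-type projections but differ through the edge-type matrix.
node_proj = nn.ModuleDict({t: nn.Linear(d, d) for t in node_types})
edge_mat  = nn.ParameterDict({e: nn.Parameter(torch.eye(d)) for e in edge_types})
```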
Heterogeneous Graph Transformer Approach
[Implementing soft meta-paths through the architecture itself]
(2) Different from most of the existing works that are based on customized meta paths, we rely on the nature
of the neural architecture to incorporate high-order heterogeneous neighbor information, which automatically
learns the importance of implicit meta paths.
Due to the nature of its architecture, HGT can incorporate information from high-order neighbors of different
types through message passing across layers, which can be regarded as “soft” meta paths. That said, even if
HGT take only its one-hop edges as input without manually designing meta paths, the proposed attention
mechanism can automatically and implicitly learn and extract “meta paths” that are important for different
downstream tasks.
Rather than defining meta-paths in advance from prior knowledge, the architecture itself finds and learns the important paths through attention => explained in detail later.
Heterogeneous Graph Transformer Approach
[Applying the time gap via positional encoding]
(3) Most previous works don’t take the dynamic nature of (heterogeneous) graphs into consideration, while we
propose the relative temporal encoding technique to incorporate temporal information by using limited
computational resources.
(4) None of the existing heterogeneous GNNs are designed for and experimented with Web-scale graphs, we
therefore propose the heterogeneous Mini-Batch graph sampling algorithm designed for Web-scale graph training,
enabling experiments on the billion-scale Open Academic Graph.
[A mini-batch technique that keeps the heterogeneous graph balanced]
Overall Architecture
Overall Architecture - Graph Structure Example
Target Node
(to be updated)
Source Node
(Type-1)
Source Node
(Type-2)
Overall Architecture - Message Passing Structure
Attention
Message
Aggregate
Overall Architecture - Applying the Transformer
[Transformer Multi-head Attention]
Q
K
V
Attention(Q,E,K)
Target Node (t-1)
[HGT Overall Architecture]
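For reference, the vanilla Transformer attention that HGT modifies is the usual scaled dot product. A minimal single-function sketch (shapes and function name are assumptions for illustration):

```python
import torch

def scaled_dot_product_attention(Q, K, V):
    # Q: [num_queries, d], K and V: [num_keys, d]
    d = Q.shape[-1]
    scores = Q @ K.transpose(-2, -1) / d ** 0.5   # similarity between queries and keys
    weights = torch.softmax(scores, dim=-1)        # attention distribution over keys
    return weights @ V                             # weighted sum of values
```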
Overall Architecture - Heterogeneous Mutual Attention Details (1)
Heterogeneous Mutual Attention
S : Source
T : Target
E : Edge
Multi Head Attention
Overall Architecture - Heterogeneous Mutual Attention Details (2)
we add a prior tensor µ ∈ ℝ^{|A|×|R|×|A|} to denote the general significance of each meta relation triplet.

Therefore, unlike the vanilla Transformer that directly calculates the dot product between the Query and Key vectors, we keep a distinct edge-based matrix W^ATT_φ(e) ∈ ℝ^{(d/h)×(d/h)} for each edge type φ(e). In doing so, the model can capture different semantic relations even between the same node type pairs.

Linear projection (or a fully connected layer) is perhaps one of the most common operations in deep learning models. When doing linear projection, we can project a vector x of dimension n to a vector y of dimension m by multiplying by a projection matrix W of shape [n, m].
Heterogeneous Mutual Attention
Overall Architecture - Heterogeneous Mutual Attention Details (3)
The prior µ reflects the importance of the triplet (T - E - S).
Fully connected (linear) layers map vectors of different sizes onto a common size.
Attention in the original Transformer: Q · K (dot product of Query and Key).
Modified attention: not a simple similarity between T and S, but a similarity that takes T - E - S into account, i.e., T · E · S (target · edge · source).
Heterogeneous Mutual Attention
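Putting the pieces above together, a minimal single-head sketch of the heterogeneous mutual attention score for one (source, edge, target) triple might look as follows. The class and parameter names are mine; the per-type linear layers, the per-edge-type matrix W_ATT, and the meta-relation prior µ follow the description above, but this is a simplified sketch, not the authors' code.

```python
import torch
import torch.nn as nn

class HeteroMutualAttention(nn.Module):
    """Single-head sketch: score(s,e,t) = (K_s · W_ATT[phi(e)] · Q_t) · mu[<tau(s),phi(e),tau(t)>] / sqrt(d)."""

    def __init__(self, node_types, edge_types, meta_relations, d):
        super().__init__()
        self.d = d
        self.k_lin = nn.ModuleDict({t: nn.Linear(d, d) for t in node_types})   # K-Linear per source type
        self.q_lin = nn.ModuleDict({t: nn.Linear(d, d) for t in node_types})   # Q-Linear per target type
        self.w_att = nn.ParameterDict({e: nn.Parameter(torch.eye(d)) for e in edge_types})
        # one scalar prior per meta relation <tau(s), phi(e), tau(t)>
        self.mu = nn.ParameterDict({"|".join(m): nn.Parameter(torch.ones(1)) for m in meta_relations})

    def score(self, h_s, src_type, edge_type, h_t, tgt_type):
        k = self.k_lin[src_type](h_s)                 # source projected into Key space
        q = self.q_lin[tgt_type](h_t)                 # target projected into Query space
        raw = k @ self.w_att[edge_type] @ q           # T-E-S similarity instead of plain Q·K
        mu = self.mu["|".join((src_type, edge_type, tgt_type))]
        return raw * mu / self.d ** 0.5               # scaled by the meta-relation prior

# Attention weights over all sources of one target are a softmax over these scores.
att = HeteroMutualAttention(["author", "paper"], ["writes"],
                            [("author", "writes", "paper")], d=8)
s = att.score(torch.randn(8), "author", "writes", torch.randn(8), "paper")
```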
Overall Architecture - Heterogeneous Message Passing Details (1)
Heterogeneous Message Passing
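The message step mirrors the attention step: each source node is projected by a per-node-type linear layer and then transformed by a per-edge-type matrix. A minimal sketch with assumed names:

```python
import torch
import torch.nn as nn

class HeteroMessage(nn.Module):
    """Sketch: Message(s, e, t) = M_Linear[tau(s)](h_s) @ W_MSG[phi(e)]."""

    def __init__(self, node_types, edge_types, d):
        super().__init__()
        self.m_lin = nn.ModuleDict({t: nn.Linear(d, d) for t in node_types})   # M-Linear per source type
        self.w_msg = nn.ParameterDict({e: nn.Parameter(torch.eye(d)) for e in edge_types})

    def forward(self, h_s, src_type, edge_type):
        return self.m_lin[src_type](h_s) @ self.w_msg[edge_type]
```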
Overall Architecture - Target Specific Aggregation Details (1)
Target Specific Aggregation
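Aggregation then sums the attention-weighted messages over the target's neighbors and maps the result back with a per-target-type linear layer plus a residual connection. Again only a hedged sketch with assumed module names; the choice of activation is my assumption.

```python
import torch
import torch.nn as nn

class TargetSpecificAggregate(nn.Module):
    """Sketch: h_t' = A_Linear[tau(t)]( act( sum_s att(s,e,t) * msg(s,e,t) ) ) + h_t (residual)."""

    def __init__(self, node_types, d):
        super().__init__()
        self.a_lin = nn.ModuleDict({t: nn.Linear(d, d) for t in node_types})  # A-Linear per target type

    def forward(self, h_t, tgt_type, att_weights, messages):
        # att_weights: [num_neighbors], already softmax-normalized
        # messages:    [num_neighbors, d]
        aggregated = (att_weights.unsqueeze(-1) * messages).sum(dim=0)
        return self.a_lin[tgt_type](torch.relu(aggregated)) + h_t
```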
Overall Architecture-Relative Temporal Encoding
The traditional way to incorporate temporal information is to construct a separate graph for each time slot. However, such a procedure may lose a large portion of structural dependencies across different time slots. We propose the Relative Temporal Encoding (RTE) mechanism to model the dynamic dependencies in heterogeneous graphs. RTE is inspired by Transformer's positional encoding method. Specifically, given a source node s and a target node t, along with their corresponding timestamps T(s) and T(t), we denote the relative time gap ΔT(t,s) = T(t) − T(s) as an index to get a relative temporal encoding RTE(ΔT(t,s)).
[even / odd dimensions, as in the sinusoidal positional encoding]
The time gap between the target node and the source node is embedded with a positional-encoding scheme, and this value is added (+) to the source representation to express the relative time difference (a minimal sketch follows below).
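A minimal sketch of the relative temporal encoding described above: the time gap ΔT(t, s) indexes a sinusoidal table (even dimensions use sine, odd dimensions cosine), and the resulting vector is added to the source representation. The function name and the gap cap are assumptions; the paper additionally passes this sinusoidal base through a learnable linear layer, which is omitted here.

```python
import torch

def relative_temporal_encoding(delta_t, d, max_gap=1000):
    """Sinusoidal encoding of the time gap, in the style of Transformer positional encoding."""
    delta_t = min(int(delta_t), max_gap)                 # clamp the gap (assumed cap)
    i = torch.arange(d, dtype=torch.float32)
    angle = delta_t / (10000 ** (2 * (i // 2) / d))
    # even dims -> sin, odd dims -> cos
    return torch.where(i % 2 == 0, torch.sin(angle), torch.cos(angle))   # shape [d]

# The encoding is then added to the source node's representation:
# h_s = h_s + relative_temporal_encoding(T_t - T_s, d)
```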
Reference - Layer Dependent Importance Sampling
Layer-Dependent Importance Sampling for Training Deep and Large Graph Convolutional Networks
https://arxiv.org/pdf/1911.07323.pdf
(a) The same node is sampled repeatedly.
(b) Nodes that are not adjacent are sampled.
(Proposed) Solves problems (a) and (b) and samples neighboring nodes.
- Black: target node
- Blue: neighbor node
- Red border: sampled node
Reference - Layer Dependent Importance Sampling
Layer-Dependent Importance Sampling for Training Deep and Large Graph Convolutional Networks
https://arxiv.org/pdf/1911.07323.pdf
In practice the normalized Laplacian is used (the example shows D − A)
i-th
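A rough NumPy sketch of the layer-dependent sampling idea: restrict the normalized Laplacian to the rows of the already-sampled (upper-layer) nodes and sample lower-layer nodes with probability proportional to the squared column norms. This is a simplification of the LADIES procedure, with assumed variable names.

```python
import numpy as np

def layer_dependent_probs(lap_norm, sampled_upper):
    """lap_norm: [N, N] normalized Laplacian; sampled_upper: indices of already-sampled nodes."""
    sub = lap_norm[sampled_upper, :]            # keep only rows of the upper-layer samples
    col_norm_sq = (sub ** 2).sum(axis=0)        # importance of each candidate neighbor
    return col_norm_sq / col_norm_sq.sum()      # sampling probability per node

# Tiny example with a symmetric-normalized adjacency:
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
D_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
lap_norm = D_inv_sqrt @ A @ D_inv_sqrt
probs = layer_dependent_probs(lap_norm, sampled_upper=[0])
```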
Overall Architecture - HGSampling
[Problems when applying existing sampling methods] directly using them for
heterogeneous graphs is prone to get sub-graphs that are
extremely imbalanced regarding different node types, due to
that the degree distribution and the total number of nodes for
each type can vary dramatically
=> The sub-graph can become highly imbalanced with respect to node types and the distribution of the data.
[Solution]
1) keep a similar number of nodes and edges for each type
=> Maintain a per-type Budget so that nodes and edges of each type stay balanced.
2) keep the sampled sub-graph dense to minimize the information loss and reduce the sample variance
=> When choosing which of several adjacent nodes to pick, compute each adjacent node's sampling probability and sample accordingly (cf. Layer-Dependent Importance Sampling; a minimal sketch of the budget-based probability follows Algorithm 2 below).
Algorithm 2
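A minimal sketch of the budget idea in Algorithm 2: one budget dictionary per node type accumulates a normalized degree for each candidate node, and the sampling probability within a type is proportional to the squared budget value, which keeps the sub-graph dense and reduces variance. Function names and simplifications are mine.

```python
import numpy as np
from collections import defaultdict

# budget[node_type][node_id] = accumulated normalized degree
budget = defaultdict(dict)

def add_in_budget(budget, node_id, node_type, normalized_degree):
    budget[node_type][node_id] = budget[node_type].get(node_id, 0.0) + normalized_degree

def sample_from_budget(budget, node_type, n, rng=None):
    rng = rng or np.random.default_rng(0)
    ids = list(budget[node_type].keys())
    scores = np.array([budget[node_type][i] for i in ids]) ** 2   # squared budget value
    probs = scores / scores.sum()
    chosen = rng.choice(ids, size=min(n, len(ids)), replace=False, p=probs)
    for node_id in chosen:
        budget[node_type].pop(node_id)                            # pop-out after selection
    return list(chosen)

# Example: two candidate papers in the budget, pick one per step.
add_in_budget(budget, "p2", "paper", 0.5)
add_in_budget(budget, "p3", "paper", 0.25)
picked = sample_from_budget(budget, "paper", n=1)
```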
Overall Architecture - Inductive Timestamp Assignment
Till now we have assumed that each node t is assigned with a timestamp T (t). However, in real-world
heterogeneous graphs, many nodes are not associated with a fixed time. Therefore, we need to assign
different timestamps to them. We denote these nodes as plain nodes.
Details of the Add-In-Budget method in Algorithm 1
(when a node is added to the Budget, if it has no timestamp, it inherits the target's timestamp)
When a timestamp exists, D_t is added.
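A small sketch of the inductive timestamp assignment inside Add-In-Budget: when a candidate source node carries no timestamp of its own (a "plain node"), it simply inherits the timestamp of the target node that reached it. Variable names and the budget layout are assumptions.

```python
from collections import defaultdict

budget = defaultdict(dict)   # budget[node_type][node_id] = {"degree": ..., "time": ...}

def add_in_budget_with_time(budget, node_id, node_type, normalized_degree,
                            node_timestamp, target_timestamp):
    # Plain node (no fixed timestamp): inherit the target node's timestamp.
    timestamp = node_timestamp if node_timestamp is not None else target_timestamp
    entry = budget[node_type].setdefault(node_id, {"degree": 0.0, "time": timestamp})
    entry["degree"] += normalized_degree   # accumulate the normalized degree (D_t)

# Example: a venue node without its own timestamp inherits the target paper's year.
add_in_budget_with_time(budget, "v1", "venue", 0.5, node_timestamp=None, target_timestamp=2018)
```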
Overall Architecture - HGSampling with Inductive Timestamp Assignment
A Budget is maintained per node type, in order to sample evenly across types.
Initialize with p1 as the starting node.
Add every node connected to p1 to the Budget.
For each type, pick n nodes, add them to the Output, and pop them out of the Budget (here n = 1, and one source node of the circle type remains in the Budget); importance sampling is used for this selection.
From the n nodes chosen per type, find their adjacent nodes again and add them to the Budget.
From the nodes now in the Budget, select n per type via importance sampling again, add them to the Output, and pop them out.