Pure Transformers are Powerful Graph Learners
Tien-Bach-Thanh Do
Network Science Lab
Dept. of Artificial Intelligence
The Catholic University of Korea
E-mail: osfa19730@catholic.ac.kr
2024/03/04
Jinwoo Kim et al., Advances in Neural Information Processing Systems (NeurIPS), 2022
Introduction
• Standard Transformers without any graph-specific modifications can achieve promising results on graph learning
• Treat all nodes and edges as independent tokens, augment them with token-wise embeddings, and feed them to a standard Transformer (a toy tokenization example follows this list)
• This approach is at least as expressive as a second-order invariant graph network (2-IGN) composed of equivariant linear layers, which is already more expressive than all message-passing GNNs
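As a toy illustration of the tokenization (the list layout here is an assumption for exposition, not the paper's exact notation): for a triangle graph, the Transformer input is simply a sequence of node tokens followed by edge tokens.

```python
# Toy illustration: tokenizing a triangle graph for a pure Transformer.
# Each node and each edge becomes one token; no adjacency matrix, no message passing.
nodes = [1, 2, 3]
edges = [(1, 2), (2, 3), (1, 3)]

tokens = [("node", v) for v in nodes] + [("edge", e) for e in edges]
print(tokens)
# [('node', 1), ('node', 2), ('node', 3),
#  ('edge', (1, 2)), ('edge', (2, 3)), ('edge', (1, 3))]
# Each token is then augmented with node/type identifier embeddings (next slides),
# and the resulting (n + m) = 6 tokens are fed to a standard Transformer encoder.
```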
Related Works
• Multiple works have combined self-attention with GNN architectures in which message passing was previously dominant [50]
• Global self-attention across nodes cannot by itself reflect the graph structure, so prior work typically:
○ Restricts self-attention to local neighborhoods [69, 51, 19]
○ Uses global self-attention in conjunction with message-passing GNNs [58, 43, 34]
○ Injects edge information into global self-attention via attention bias [72, 78, 29, 54]
• Such graph-specific designs can inherit issues of message passing, such as oversmoothing [40, 8, 52]
• They are also often incompatible with useful engineering techniques such as linear attention [65]
• [50] E. Min, R. Chen, Y. Bian, T. Xu, K. Zhao, W. Huang, P. Zhao, J. Huang, S. Ananiadou, and Y. Rong. Transformer for graphs: An overview from architecture perspective. arXiv, 2022
• [69] P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio. Graph attention networks. In ICLR, 2018
• [51] D. Q. Nguyen, T. D. Nguyen, and D. Phung. Universal graph transformer self-attention networks. In WWW, 2022
• [19] V. P. Dwivedi and X. Bresson. A generalization of transformer networks to graphs. arXiv, 2020
• [58] Y. Rong, Y. Bian, T. Xu, W. Xie, Y. Wei, W. Huang, and J. Huang. Self-supervised graph transformer on large-scale molecular data. In NeurIPS, 2020
• [43] K. Lin, L. Wang, and Z. Liu. Mesh graphormer. In ICCV, 2021
• [34] J. Kim, S. Oh, and S. Hong. Transformers generalize deepsets and can be extended to graphs and hypergraphs. In NeurIPS, 2021
• [72] X. Wang, Z. Tu, L. Wang, and S. Shi. Self-attention with structural position representations. In EMNLP-IJCNLP, 2019
• [78] C. Ying, T. Cai, S. Luo, S. Zheng, G. Ke, D. He, Y. Shen, and T. Liu. Do transformers really perform bad for graph representation? In NeurIPS, 2021
• [29] M. S. Hussain, M. J. Zaki, and D. Subramanian. Edge-augmented graph transformers: Global self-attention is enough for graphs. arXiv, 2021
• [54] W. Park, W. Chang, D. Lee, J. Kim, and S.-W. Hwang. GRPE: Relative positional encoding for graph transformer. arXiv, 2022
• [40] Q. Li, Z. Han, and X. Wu. Deeper insights into graph convolutional networks for semi-supervised learning. In AAAI, 2018
• [8] C. Cai and Y. Wang. A note on over-smoothing for graph neural networks. arXiv, 2020
• [52] K. Oono and T. Suzuki. Graph neural networks exponentially lose expressive power for node classification. In ICLR, 2020
• [65] Y. Tay, M. Dehghani, D. Bahri, and D. Metzler. Efficient transformers: A survey. arXiv, 2020
Pure Transformers for Graph Learning
Tokenized Graph Transformer (TokenGT)
• Takes the opposite direction: apply a standard Transformer directly to graphs, without graph-specific modifications
• A pure Transformer architecture for graphs, with token-wise embeddings composed of node identifiers and type identifiers
• Node Identifiers: the first component of the token-wise embedding is an orthonormal node identifier that represents the connectivity structure of the input graph
○ Given an input graph G = (V, E) with n nodes, use n node-wise orthonormal vectors P
• Type Identifiers: the second component is a trainable type identifier that encodes whether a token is a node or an edge
Pure Transformers for Graph Learning
Node identifiers
• The node identifiers are only required to be orthonormal; two practical choices (a minimal sketch follows):
○ Orthogonal random features (ORF), obtained by QR-decomposing a random Gaussian matrix
○ Laplacian eigenvectors (Lap), obtained by eigendecomposing the graph Laplacian matrix
• Laplacian eigenvectors are already widely used as graph positional encodings (PE)
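A minimal NumPy sketch of the two choices. Function names are illustrative, and the unnormalized Laplacian L = D - A is used here for simplicity; these are assumptions, not the paper's exact implementation.

```python
import numpy as np

def orf_node_identifiers(n, d, rng=np.random.default_rng(0)):
    """Orthogonal random features: QR-decompose a d x n Gaussian matrix.

    Returns (n, d) identifiers with orthonormal rows (requires d >= n here)."""
    g = rng.standard_normal((d, n))
    q, _ = np.linalg.qr(g)        # q: (d, n) with orthonormal columns
    return q.T                    # (n, d): one d-dimensional identifier per node

def lap_node_identifiers(adj):
    """Laplacian eigenvectors: eigendecompose L = D - A (unnormalized, for simplicity).

    The eigenvector matrix is orthogonal, so its rows are orthonormal node identifiers."""
    deg = np.diag(adj.sum(axis=1))
    lap = deg - adj
    _, eigvecs = np.linalg.eigh(lap)  # columns are orthonormal eigenvectors
    return eigvecs                    # (n, n): row i identifies node i
```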
Pure Transformers for Graph Learning
Type identifiers
• The type identifiers are only required to be trainable
• Use two trainable vectors, one shared by all node tokens and one shared by all edge tokens (a minimal sketch follows)
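A minimal PyTorch sketch of the two trainable vectors; the dimension is an illustrative assumption.

```python
import torch
import torch.nn as nn

d_t = 8  # type-identifier dimension (an illustrative choice)

# Only trainability is required: one vector is shared by every node token,
# and one by every edge token.
node_type_id = nn.Parameter(torch.randn(d_t))
edge_type_id = nn.Parameter(torch.randn(d_t))
```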
Pure Transformers for Graph Learning
Tokenized Graph Transformer (TokenGT)
• Treat the n nodes and m edges as (n+m) independent tokens
• Concatenate simple token-wise embeddings to each token
○ Trainable type identifiers (node or edge) + orthonormal node identifiers
• Feed the (n+m) tokens to a standard Transformer encoder (a minimal PyTorch sketch follows)
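A minimal PyTorch sketch of the full pipeline, assuming the token layout described above: a node token v carries [P_v, P_v], an edge token (u, v) carries [P_u, P_v], plus features and a type identifier. All dimensions, module choices, and the helper name are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

def tokengt_forward(x_node, x_edge, edges, node_id, encoder, type_id, proj):
    """x_node: (n, d_f) node features, x_edge: (m, d_f) edge features,
    edges: list of (u, v) index pairs, node_id: (n, d_id) orthonormal identifiers."""
    n, m = x_node.size(0), x_edge.size(0)
    u = torch.tensor([e[0] for e in edges])
    v = torch.tensor([e[1] for e in edges])

    # Node token: [features, P_v, P_v, node-type identifier]
    node_tok = torch.cat([x_node, node_id, node_id,
                          type_id(torch.zeros(n, dtype=torch.long))], dim=-1)
    # Edge token: [features, P_u, P_v, edge-type identifier]
    edge_tok = torch.cat([x_edge, node_id[u], node_id[v],
                          type_id(torch.ones(m, dtype=torch.long))], dim=-1)

    tokens = torch.cat([node_tok, edge_tok], dim=0)   # (n + m, d_f + 2*d_id + d_t)
    h = proj(tokens).unsqueeze(0)                     # project to model width, add batch dim
    return encoder(h)                                 # standard Transformer encoder, no graph-specific ops

# Illustrative setup (all sizes are assumptions)
d_f, d_id, d_t, d_model = 16, 8, 8, 64
layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)
type_id = nn.Embedding(2, d_t)
proj = nn.Linear(d_f + 2 * d_id + d_t, d_model)

# Tiny example: a triangle graph with random features and QR-based identifiers.
edges = [(0, 1), (1, 2), (0, 2)]
x_node, x_edge = torch.randn(3, d_f), torch.randn(3, d_f)
node_id = torch.linalg.qr(torch.randn(d_id, 3)).Q.T   # 3 orthonormal node identifiers
out = tokengt_forward(x_node, x_edge, edges, node_id, encoder, type_id, proj)
print(out.shape)   # torch.Size([1, 6, 64]): one output per node/edge token
```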
Pure Transformers for Graph Learning
How does this work?
• Comparing the node identifiers of a pair of tokens reveals their incidence information
• This allows self-attention to identify and exploit the graph structure (a small check is sketched below)
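A small NumPy check of this claim, reusing the token layout assumed above (variable names are illustrative): because the identifiers P are orthonormal, the dot product between the identifier parts of two tokens is nonzero exactly when they share a node, so pre-softmax attention scores can recover incidence.

```python
import numpy as np

# 4 orthonormal 8-dimensional node identifiers (rows of P)
P = np.linalg.qr(np.random.default_rng(0).standard_normal((8, 4)))[0].T

def token_ids(token):
    """Node token v carries [P_v, P_v]; edge token (u, v) carries [P_u, P_v]."""
    if isinstance(token, int):
        return np.concatenate([P[token], P[token]])
    u, v = token
    return np.concatenate([P[u], P[v]])

# Dot products between identifier parts reveal incidence:
edge_02, node_0, node_3 = token_ids((0, 2)), token_ids(0), token_ids(3)
print(round(edge_02 @ node_0, 3))  # ~1.0: node 0 is an endpoint of edge (0, 2)
print(round(edge_02 @ node_3, 3))  # ~0.0: node 3 is not incident to edge (0, 2)
```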
Experiments
Self-attention distance visualization
• Lower layers learn to attend locally, while deeper layers learn to attend globally
• TokenGT adaptively learns graph operations, unlike GNNs with hard-coded local aggregation
Conclusion
Pure Transformers (TokenGT) are powerful graph learners
• Minimal modification to the Transformer architecture, theory, and codebase
• Theoretically more expressive than all MPNNs
• Empirically learns well from large-scale data
• Can adopt Transformer-specific techniques such as kernelized (linear) attention (a minimal sketch follows)
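Because TokenGT treats the graph as a plain token sequence, drop-in attention replacements apply directly. The sketch below shows one common kernelization, elu-based linear attention (Katharopoulos et al., 2020); it illustrates the idea but is not necessarily the exact kernel used in the paper.

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    """Kernelized (linear) attention: softmax(QK^T)V is approximated by
    phi(Q)(phi(K)^T V), which costs O(N) in the token count N = n + m.
    phi(x) = elu(x) + 1 is one common feature map."""
    q, k = F.elu(q) + 1, F.elu(k) + 1            # (B, N, d)
    kv = torch.einsum("bnd,bne->bde", k, v)      # key-value summary, size independent of N
    z = 1.0 / (torch.einsum("bnd,bd->bn", q, k.sum(dim=1)) + eps)
    return torch.einsum("bnd,bde,bn->bne", q, kv, z)

out = linear_attention(torch.randn(1, 10, 8), torch.randn(1, 10, 8), torch.randn(1, 10, 8))
print(out.shape)   # torch.Size([1, 10, 8])
```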
