240429_Thuy_Labseminar[Simplifying and Empowering Transformers for Large-Graph Representations].pptx
Van Thuy Hoang
Network Science Lab
Dept. of Artificial Intelligence
The Catholic University of Korea
E-mail: hoangvanthuy90@gmail.com
2024-04-08
BACKGROUND: Message Passing GNNs vs Graph Transformers
• Message passing GNNs generate node embeddings based on local network neighborhoods.
• Each node has an embedding at every layer, repeatedly combining messages from its neighbors using neural networks (a minimal sketch follows this list).
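A minimal sketch of one such message passing layer, assuming mean aggregation over neighbors and a dense adjacency matrix; the class name `MessagePassingLayer` and its structure are illustrative, not taken from the slides:

```python
import torch
import torch.nn as nn

class MessagePassingLayer(nn.Module):
    """One round of neighborhood aggregation: each node combines messages
    from its neighbors through learnable transformations (illustrative sketch)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin_self = nn.Linear(in_dim, out_dim)
        self.lin_neigh = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # x: [N, in_dim] node embeddings; adj: [N, N] dense adjacency matrix
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)  # node degrees
        neigh_mean = (adj @ x) / deg                     # mean over each node's neighbors
        return torch.relu(self.lin_self(x) + self.lin_neigh(neigh_mean))
```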
Message Passing GNNs vs Graph Transformers
• In message passing GNNs, a node’s update is a function of its neighbors; in GTs, a node’s update is a function of all nodes in the graph (thanks to the self-attention mechanism in the Transformer layer), as sketched below.
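For contrast, a hedged sketch of a Graph Transformer-style update, where every node attends to every other node via standard softmax self-attention (note the N-by-N score matrix; all names here are illustrative):

```python
import torch
import torch.nn as nn

class GlobalSelfAttention(nn.Module):
    """Each node's update depends on ALL nodes in the graph; the O(N^2)
    score matrix is the scalability bottleneck discussed later."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, x):
        # x: [N, dim] embeddings of every node in the graph
        Q, K, V = self.q(x), self.k(x), self.v(x)
        scores = Q @ K.T / (x.shape[-1] ** 0.5)  # [N, N] pairwise attention scores
        attn = torch.softmax(scores, dim=-1)     # each node attends to every node
        return attn @ V
```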
Graph Transformers: Challenges
• How to build GTs for large-graph representations:
• The quadratic cost of global attention hinders scalability on large graphs
• The over-fitting problem
Deep attention layers
• Do we need many attention layers?
• Transformers in other domains often require multiple attention layers to reach the desired capacity
The power of 1-layer attention
• Mini-batch sampling randomly partitions the input graph into mini-batches of smaller size (a sketch follows this list).
• Each mini-batch is fed into the SGFormer model, which is implemented with a one-layer global attention and a GNN network.
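A minimal sketch of this random mini-batch partitioning, assuming nodes are simply shuffled and split by index; `random_node_batches` is an illustrative helper, not an SGFormer API:

```python
import torch

def random_node_batches(num_nodes, batch_size):
    """Randomly partition node indices into mini-batches of at most batch_size nodes."""
    perm = torch.randperm(num_nodes)
    return [perm[i:i + batch_size] for i in range(0, num_nodes, batch_size)]

# Each batch of node indices (together with its features and induced subgraph)
# would then be fed to the one-layer global attention plus GNN network.
batches = random_node_batches(num_nodes=100_000, batch_size=10_000)
```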
Simple Global Attention
• A single layer of global attention is sufficient.
• This is because, through one-layer propagation over a densely connected attention graph, the information of each node can be adaptively propagated to arbitrary nodes within the batch.
• The computation of Eq. (3) can be achieved in O(N) time complexity, which is much more efficient than the softmax attention in the original Transformer (a generic linearization sketch follows this list).
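A hedged sketch of the standard linearization trick behind O(N) attention: dropping the softmax (replaced here with a positive feature map) lets the matrix products be reordered so that the d-by-d matrix K^T V is formed first and the N-by-N score matrix is never materialized. This is a generic linear attention example, not the exact Eq. (3) of SGFormer:

```python
import torch
import torch.nn.functional as F

def linear_attention(Q, K, V, eps=1e-6):
    """O(N) attention via reordered matrix products: K^T V (a d-by-d matrix)
    is computed first, so the N-by-N score matrix never appears."""
    # Q, K, V: [N, d]
    Q, K = F.elu(Q) + 1, F.elu(K) + 1                # positive feature map
    kv = K.T @ V                                     # [d, d], cost O(N d^2)
    normalizer = Q @ K.sum(dim=0, keepdim=True).T    # [N, 1] per-node normalization
    return (Q @ kv) / (normalizer + eps)             # [N, d]
```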
Incorporation of Structural Information
• A simple yet effective scheme combines Z with the embeddings propagated by GNNs at the output layer, as sketched below.
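An illustrative sketch of one way such a combination could look: a weighted sum of the attention output and the GNN output, followed by a classifier head. The mixing weight alpha and the class `OutputCombination` are assumptions for illustration, not the paper's exact formula:

```python
import torch
import torch.nn as nn

class OutputCombination(nn.Module):
    """Fuse the global-attention output Z with GNN-propagated embeddings at the
    output layer via a weighted sum; alpha is an assumed mixing hyperparameter."""
    def __init__(self, dim, num_classes, alpha=0.5):
        super().__init__()
        self.alpha = alpha
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, z_attn, z_gnn):
        # z_attn: [N, dim] from the one-layer global attention
        # z_gnn:  [N, dim] from the GNN branch (e.g., a vanilla GCN)
        z = (1 - self.alpha) * z_attn + self.alpha * z_gnn
        return self.classifier(z)
```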
Empirical Evaluation
• Scalability test of training time per epoch
• The Amazon2M dataset is used, randomly sampling subsets of nodes with node counts ranging from 10K to 100K.
SUMMARY
• The potential of simple Transformer-style architectures for learning large-graph representations, where scalability is the bottleneck
• A one-layer attention model combined with a vanilla GCN can surprisingly produce highly competitive performance.
• A remaining challenge: out-of-distribution learning