2023. 03. 31
Van Thuy Hoang
Network Science Lab
Dept. of Artificial Intelligence
The Catholic University of Korea
E-mail: hoangvanthuy90@gmail.com
Contributions
 Main contribution: a new design of the self-attention formula for molecular graphs that carefully handles various input features to obtain increased accuracy and robustness in numerous chemical domains.
 The paper tackles these issues with the Relative Molecule Attention Transformer (R-MAT), a pre-trained transformer-based model, shown in Figure 1.
Relative Molecule Self-Attention layer
 (a) Neighbourhood embedding: one-hot encodes graph distances (neighbourhood order) from the source node marked with an arrow.
 (b) Bond embedding: one-hot encodes the bond order (numbers next to the graph edges) and other bond features for neighbouring nodes.
 (c) Distance embedding: uses radial basis functions to encode pairwise distances in 3D space.
RELATIVE MOLECULE SELF-ATTENTION
 Molecule Attention Transformer: this work is closely related to the Molecule Attention Transformer (MAT), a transformer-based model with self-attention tailored to processing molecular data.
 In contrast to the aforementioned approaches, MAT incorporates distance information directly in its self-attention module.
 MAT stacks N Molecule Self-Attention blocks followed by mean pooling and a prediction layer. For a D-dimensional state x ∈ R^D, the standard (vanilla) self-attention operation is defined as Attention(Q, K, V) = softmax(QK^T / √d_k) V.
 Molecule Self-Attention mixes this vanilla attention with an inter-atomic distance term and the molecular adjacency matrix, with λ weights given to the individual parts of the attention module (see the sketch below).
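A minimal NumPy sketch of a single head of MAT-style Molecule Self-Attention as described above. The λ weights, the transform applied to the distance matrix, and all dimensions are illustrative assumptions rather than the paper's exact configuration.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mat_self_attention(X, W_q, W_k, W_v, A, D, lambdas=(0.33, 0.33, 0.34)):
    """Sketch of MAT-style molecule self-attention for one head.

    X: (n_atoms, d_model) atom states; A: (n, n) adjacency matrix;
    D: (n, n) transformed inter-atomic distance matrix; lambdas: weights
    for the attention / distance / adjacency terms (hyperparameters in MAT).
    """
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = Q.shape[-1]
    attn = softmax(Q @ K.T / np.sqrt(d_k))         # vanilla self-attention weights
    lam_a, lam_d, lam_g = lambdas
    mixed = lam_a * attn + lam_d * D + lam_g * A    # mix three sources of structure
    return mixed @ V
```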
Neighbourhood embeddings
 Encode the neighbourhood order between two atoms as a 6-dimensional one-hot encoding.
 The neighbourhood order counts how many other vertices lie between nodes i and j in the original molecular graph (see the sketch below).
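A small sketch of how the neighbourhood-order one-hot could be computed from the molecular graph. The exact bucketing into six categories (here: self, 1st to 4th neighbour, farther or disconnected) is an assumption for illustration.

```python
import numpy as np
from collections import deque

def graph_distances(adj):
    """All-pairs hop distances via BFS; adj is an adjacency list (list of lists)."""
    n = len(adj)
    dist = np.full((n, n), np.inf)
    for src in range(n):
        dist[src, src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[src, v] == np.inf:
                    dist[src, v] = dist[src, u] + 1
                    queue.append(v)
    return dist

def neighbourhood_one_hot(adj, num_classes=6):
    """6-dim one-hot of neighbourhood order: 0 (self), 1, 2, 3, 4, and '5+ / disconnected'."""
    dist = graph_distances(adj)
    idx = np.where(np.isfinite(dist), dist, num_classes - 1).astype(int)
    idx = np.minimum(idx, num_classes - 1)
    return np.eye(num_classes)[idx]                 # shape: (n, n, 6)
```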
Distance embeddings
 Hypothesis: MAT would benefit from a much more flexible representation of the distance information.
 To achieve this, a radial basis distance encoding proposed in earlier work is adopted, where d is the distance between two atoms (a sketch follows below).
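A sketch of a radial basis distance encoding. The number of basis functions, their centres, and their widths are illustrative assumptions, not the paper's exact choices.

```python
import numpy as np

def rbf_distance_encoding(d, centers=np.linspace(0.0, 8.0, 64), gamma=10.0):
    """Encode a pairwise distance d (in angstroms) with Gaussian radial basis functions.

    Illustrative choice: 64 centres on [0, 8] A with a fixed width parameter gamma.
    """
    d = np.asarray(d)[..., None]                    # broadcast over a matrix of distances
    return np.exp(-gamma * (d - centers) ** 2)      # shape: (..., len(centers))
```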
RELATIVE MOLECULE SELF-ATTENTION
 A hidden layer shared between all attention heads, followed by an output layer, creates a separate relative embedding for each attention head.
 In Relative Molecule Self-Attention, the relative embedding e_{ij} combines the neighbourhood, distance, and bond features of the atom pair; the bond features contribute only if there exists a bond between atoms i and j (see the sketch below).
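A hedged sketch of how such per-head relative embeddings e_{ij} could be produced: the pairwise features (neighbourhood one-hot, RBF-encoded distance, bond features) are concatenated and passed through a hidden layer shared across heads, followed by a per-head output projection. Class and dimension names are illustrative, not the paper's exact settings.

```python
import torch
import torch.nn as nn

class RelativeEmbedding(nn.Module):
    """Map concatenated pairwise features to one relative embedding per attention head:
    a single hidden layer shared by all heads, then a per-head output projection."""

    def __init__(self, feat_dim, hidden_dim, head_dim, num_heads):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(feat_dim, hidden_dim), nn.ReLU())
        self.per_head = nn.Linear(hidden_dim, num_heads * head_dim)
        self.num_heads, self.head_dim = num_heads, head_dim

    def forward(self, pair_feats):                  # (n, n, feat_dim)
        h = self.shared(pair_feats)                 # shared hidden layer
        e = self.per_head(h)                        # separate embedding per head
        return e.view(*pair_feats.shape[:2], self.num_heads, self.head_dim)
```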
RELATIVE MOLECULE ATTENTION TRANSFORMER
 Replaces simple mean pooling with an attention-based pooling layer.
 After applying N self-attention layers, the following self-attention pooling is used to obtain the graph-level embedding g of the molecule (see the sketch below):
 Where:
 H is the hidden state obtained from the self-attention layers,
 P is the pooling hidden dimension and S is the number of pooling attention heads.
 g is then passed to a two-layer MLP with leaky-ReLU activation in order to make the prediction.
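A sketch of the attention-based pooling readout, under the assumption that it follows structured self-attentive pooling (Lin et al., 2017-style); the parameter names P and S match the bullets above, everything else is illustrative.

```python
import torch
import torch.nn as nn

class SelfAttentionPooling(nn.Module):
    """Attention-based graph readout: S pooling heads attend over the atom states H
    and the resulting per-head summaries are flattened into the graph embedding g."""

    def __init__(self, d_model, P, S):
        super().__init__()
        self.W1 = nn.Linear(d_model, P, bias=False)
        self.W2 = nn.Linear(P, S, bias=False)

    def forward(self, H):                           # H: (n_atoms, d_model)
        scores = self.W2(torch.tanh(self.W1(H)))    # (n_atoms, S)
        A = torch.softmax(scores, dim=0)            # attention over atoms, per head
        g = (A.T @ H).flatten()                     # (S * d_model,) graph embedding
        return g
```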
Pretraining
 A two-step pretraining procedure is used.
 First, the network is trained on a contextual property prediction task in which not only selected atoms but also their neighbours are masked (a rough sketch follows below).
 The goal of the task is to predict the whole atom context. This task is much more demanding for the network than the classical masking approach from earlier work.
 The second task is graph-level prediction of a set of real-valued molecular property descriptors.
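A rough sketch (using RDKit) of how a contextual atom-property label could be constructed for the first pretraining task: the target describes the masked atom together with its neighbouring atoms and bond types. The label format and the helper name atom_context_label are illustrative assumptions, not necessarily the paper's exact scheme.

```python
from collections import Counter
from rdkit import Chem

def atom_context_label(mol, idx):
    """Contextual label for atom idx: its element plus the multiset of
    (neighbour element, bond type) pairs around it."""
    atom = mol.GetAtomWithIdx(idx)
    ctx = Counter()
    for nbr in atom.GetNeighbors():
        bond = mol.GetBondBetweenAtoms(idx, nbr.GetIdx())
        ctx[f"{nbr.GetSymbol()}-{bond.GetBondType()}"] += 1
    parts = sorted(f"{k}{v}" for k, v in ctx.items())
    return atom.GetSymbol() + "_" + "_".join(parts)

mol = Chem.MolFromSmiles("CC(=O)O")                 # acetic acid
print([atom_context_label(mol, i) for i in range(mol.GetNumAtoms())])
```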
Results on the molecule property prediction benchmark
 The first two datasets are regression tasks (lower is better);
 the other datasets are classification tasks (higher is better).
LARGE-SCALE EXPERIMENTS
 Models are fine-tuned under a large hyperparameter budget.
 Models fine-tuned by tuning only the learning rate are presented in the last group.
 The last dataset is a classification task (higher is better); the remaining datasets are regression tasks (lower is better).
Conclusion
 Relative Molecule Attention Transformer, a model based on Relative Molecule Self-Attention, achieves state-of-the-art or very competitive results across a wide range of molecular property prediction tasks.
 R-MAT is a highly versatile model, showing state-of-the-art results in quantum property prediction as well as other molecular property prediction tasks.
NS-CUK Seminar: V.T.Hoang, Review on "Relative Molecule Self-Attention Transformer", ICLR 2022