NS - CUK Seminar: V.T. Hoang, Review on "Long Range Graph Benchmark", NeurIPS 2022
Van Thuy Hoang
Dept. of Artificial Intelligence,
The Catholic University of Korea
hoangvanthuy90@gmail.com
Vijay Prakash Dwivedi et al., NeurIPS 2022
Long Range Information Bottleneck
Existing Graph Benchmarks
Characterizing Long Range Graph Datasets
Proposed Datasets and Tasks
Experiments and Questions
Long Range Information Bottleneck
An MP-GNN layer aggregates information from a node's 1-hop neighbors to update
that node's feature representation.
MP-GNNs have several limitations; one of them, the so-called information
bottleneck, particularly impacts long-range interactions.
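The 1-hop aggregation step can be sketched as follows; this is a minimal illustration using mean aggregation and NumPy, not the exact update rule of any particular MP-GNN, and the toy graph, feature, and weight shapes are assumptions for the example.

```python
import numpy as np

def mp_gnn_layer(adj, feats, weight):
    """One message-passing layer: each node averages its 1-hop neighbors'
    features (plus its own, via a self-loop), then applies a linear map + ReLU."""
    n = adj.shape[0]
    adj_hat = adj + np.eye(n)                 # add self-loops
    deg = adj_hat.sum(axis=1, keepdims=True)  # per-node neighborhood size
    agg = (adj_hat / deg) @ feats             # mean over the 1-hop neighborhood
    return np.maximum(agg @ weight, 0.0)      # linear transform + ReLU

# toy graph: a path 0-1-2-3, with one-hot node features
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
feats = np.eye(4)
weight = np.random.default_rng(0).normal(size=(4, 8))
h = mp_gnn_layer(adj, feats, weight)
print(h.shape)  # (4, 8)
```

Note that after a single layer, node 0's representation depends only on nodes 0 and 1; information from node 3 can only arrive after stacking more layers.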
If long-range information from an L-hop neighbor is needed for a task, ideally
L layers need to be stacked.
With increasing L, the L-hop neighborhood grows exponentially, and so does the
amount of information that needs to be encoded into one fixed-size vector (see
the pictorial representation on the slide).
This creates a serious bottleneck in MP-GNNs for tasks that require long-range
information propagation.
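The exponential growth of the L-hop receptive field is easy to see numerically; a small sketch (the complete binary tree is an illustrative choice, not from the paper) counts how many nodes fall within L hops of a source as L increases:

```python
from collections import deque

def hop_neighborhood_sizes(adj_list, source, max_hops):
    """BFS from `source`; return the cumulative number of nodes reachable
    within 0, 1, ..., max_hops hops (the receptive field of L stacked layers)."""
    dist = {source: 0}
    q = deque([source])
    while q:
        u = q.popleft()
        for v in adj_list[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return [sum(1 for d in dist.values() if d <= L) for L in range(max_hops + 1)]

# complete binary tree of depth 6: node i has children 2i+1 and 2i+2
n = 2**7 - 1
adj_list = {i: [] for i in range(n)}
for i in range(n):
    for c in (2 * i + 1, 2 * i + 2):
        if c < n:
            adj_list[i].append(c)
            adj_list[c].append(i)

print(hop_neighborhood_sizes(adj_list, 0, 6))
# → [1, 3, 7, 15, 31, 63, 127]
```

The receptive field roughly doubles with every extra hop, yet all of it must be squashed into one fixed-size node vector.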
Existing Graph Benchmarks
Many of the existing graph learning benchmarks consist of prediction tasks
that primarily rely on local structural information rather than distant information
propagation to compute a target label or metric.
This can be observed in datasets such as ZINC, ogbg-molhiv, and ogbg-molpcba,
where models that rely significantly on encoding local (or near-local)
structural information continue to be among the leaderboard toppers.
E.g.,
ZINC dataset: ~23 nodes on average
ogbg-molhiv: ~25 nodes on average
…
Diameter of graphs: ~30 in the ENZYMES and PROTEINS datasets
Characterizing Long Range Graph Datasets
Graph size:
The number of nodes in a graph is an important characteristic for determining
whether a dataset is a long-range graph dataset.
The term problem radius refers to the required range of interaction between
nodes for a particular problem.
The problem radius must be sufficiently large for a graph dataset to serve as
a long-range benchmark.
Although the problem radius may not be exactly quantifiable for real-world
graph datasets, this hypothetical metric would effectively be smaller if a
graph has a smaller number of nodes.
Therefore, the (average) graph size of a dataset is a key property for
determining whether it could be a potential long-range graph dataset.
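Since the problem radius itself is hard to quantify, average graph size and graph diameter are easy proxies to compute; a minimal sketch on a toy "dataset" of two hand-built graphs (the specific graphs are illustrative assumptions, not from the benchmark):

```python
from collections import deque

def diameter(adj_list):
    """Longest shortest-path distance in a connected graph (BFS from every node)."""
    best = 0
    for s in adj_list:
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj_list[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        best = max(best, max(dist.values()))
    return best

# toy "dataset" of two graphs: a 6-cycle and a 5-node path
cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
path5 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
graphs = [cycle6, path5]

avg_size = sum(len(g) for g in graphs) / len(graphs)
print(avg_size, [diameter(g) for g in graphs])  # 5.5 [3, 4]
```

A dataset whose graphs are this small cannot pose a long-range task, whatever the labels are; larger average size and diameter leave room for a larger problem radius.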
Nature of task
The nature of the task can be understood to be directly related to the problem
radius.
In a broad sense, a task is either short-range, i.e., requiring information
exchange among nodes in a local or near-local neighborhood, or long-range,
where interactions are required far beyond the near-local neighborhood.
For example,
in the ZINC molecular regression dataset, the task is associated with counting
local structures; experiments with a substructure-counting-based model
(Bouritsas et al., 2020) have shown that ZINC's task would optimally
require counts of 7-length substructures.
ZINC's regression task may thus be interpreted as a short-range task.
Contribution of global graph structure to task
A dataset where the learning task benefits from global structural information
can be a potential long-range graph dataset.
Sample representation of 3D atomic contact between distant nodes.
Proposed LRGB Datasets and Tasks
Consider the characteristics described above to propose a collection of 5
graph learning datasets that can be used to prototype GNNs or Transformers
with long range modeling capabilities.
The table below for an overview of the datasets’ statistics:
PascalVOC-SP and COCO-SP:
These are superpixel graphs based on the Pascal VOC 2011 and MS COCO image
datasets, respectively.
The learning task in both datasets is node classification, where each node
corresponds to a region of the image belonging to a particular class, with
respect to the original semantic segmentation labels in the respective
image datasets.
PCQM-Contact:
This dataset is based on a subset of the PCQM4M dataset from OGB-LSC, where
each graph is a molecular graph with explicit hydrogens. The task, link
prediction, is to predict pairs of distant nodes that will be in contact
with each other in 3D space within a pre-defined distance threshold.
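The contact-labeling idea can be sketched as follows: pairs of nodes that are far apart in the molecular graph but close in 3D space become positive links. This is an illustrative reconstruction, not the dataset's actual pipeline; the toy hairpin chain, its coordinates, and the exact `min_hops` cutoff are assumptions for the example.

```python
import math
from collections import deque

def graph_distance(adj, source):
    """Hop distances from `source` via BFS."""
    dist = {source: 0}
    q = deque([source])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def contact_pairs(adj, coords, radius=3.5, min_hops=5):
    """Node pairs at least `min_hops` apart in the graph but within
    `radius` of each other in 3D space (the positive links to predict)."""
    pairs = []
    for u in sorted(adj):
        du = graph_distance(adj, u)
        for v in sorted(adj):
            if v > u and du.get(v, math.inf) >= min_hops:
                if math.dist(coords[u], coords[v]) <= radius:
                    pairs.append((u, v))
    return pairs

# toy "hairpin" chain of 8 atoms: the 3D fold brings the two ends close
adj = {i: [j for j in (i - 1, i + 1) if 0 <= j < 8] for i in range(8)}
coords = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0),
          (3, 1, 0), (2, 1, 0), (1, 1, 0), (0, 1, 0)]
print(contact_pairs(adj, coords))
# → [(0, 5), (0, 6), (0, 7), (1, 6), (1, 7), (2, 7)]
```

Predicting such pairs forces a model to relate nodes that are many hops apart, which is exactly what makes the task long-range.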
Sample visualization of a peptide (Left) and its molecular graph (Right).
Experiments
The experiments use two major architecture classes from the graph learning
literature:
local MP-GNNs
fully connected Graph Transformers
to establish benchmarking trends, understand more about the datasets, and
chart out some probable challenges that require further research.
Experiments and Questions
Q1: Is local feature aggregation, modeled using MP-GNNs with fewer layers,
enough for the proposed tasks in LRGB?
Simple local MP-GNN instances perform poorly due to the increased effect of
over-squashing.
Q2: Do we observe a visible separation in performance of models with
enhanced capability to capture long range interactions when compared
against local MP-GNNs on the proposed benchmark?
The baseline Transformers appeared slower to fit on COCO-SP, on which the
recent GraphGPS architecture, which can model long-range dependencies,
significantly outperforms MP-GNNs.
Q3: What challenges and future discoveries can the new benchmark facilitate?
The benchmarking experiments reveal several challenges that can be pursued
for further investigation using the proposed datasets.