240415_Thuy_Labseminar[Simple and Asymmetric Graph Contrastive Learning without Augmentations].pptx
1. Van Thuy Hoang
Network Science Lab
Dept. of Artificial Intelligence
The Catholic University of Korea
E-mail: hoangvanthuy90@gmail.com
2024-04-15
2. 2
BACKGROUND: Graph Convolutional Networks (GCNs)
• Generate node embeddings based on local network neighborhoods
• Nodes have embeddings at each layer; each layer repeatedly combines messages
from their neighbors using neural networks
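• A minimal sketch of one such message-passing layer (illustrative PyTorch code, not taken from the slides; the name GCNLayer and the dense normalized-adjacency format are assumptions):

import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One round of message passing: aggregate neighbor messages, then transform."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj_norm):
        # x: [N, in_dim] node embeddings from the previous layer
        # adj_norm: [N, N] normalized adjacency matrix (with self-loops)
        h = adj_norm @ x                    # combine messages from local neighbors
        return torch.relu(self.linear(h))   # transform with a neural network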
3. 3
BACKGROUND: Representation Learning on Graphs
• Goal: efficient feature learning for machine learning on graphs
• Low-dimensional node embeddings encode both structural and attributive information.
4. 4
BACKGROUND: Self-supervised learning comes to the rescue
• Most GNN models are established in a supervised manner.
• It is often expensive to obtain high-quality labels at scale in real world.
• Supervised models learn the inductive bias encoded in labels, instead of reusable,
task-invariant knowledge.
• Self-supervised methods employ proxy tasks to guide learning the representations.
• The proxy task is designed to predict any part of the input from any other observed
part.
• Typical proxy tasks for visual data include corrupted image restoration, rotation
angle prediction, reorganization of shuffled patches, etc.
5. 5
BACKGROUND: Taxonomy of Self-Supervised Learning
• Generative/predictive: loss measured in the output space
• Contrastive: loss measured in the latent space
6. 6
BACKGROUND: The Contrastive Learning Paradigm
• Contrastive learning aims to maximize the agreement of latent representations under
stochastic data augmentation.
• Three main components:
• Data augmentation pipeline
• Encoder and representation extractor
• Contrastive objective
7. 7
BACKGROUND: Contrastive Learning Objectives
• Usually implemented with an n-way softmax function, commonly referred to as the InfoNCE loss:
ℒ_NCE = −log( exp(f(u, v⁺)/τ) / Σ_{i=1}^{n} exp(f(u, v_i)/τ) )
• The critic function f can simply be implemented as the cosine similarity f(u, v) = uᵀv / (‖u‖ ‖v‖).
• It distinguishes a pair of representations from two augmentations of the same sample
(positives) from (n − 1) pairs of representations from different samples (negatives).
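• A minimal sketch of this n-way softmax with a cosine-similarity critic (illustrative PyTorch code; the function name and temperature value are assumptions):

import torch
import torch.nn.functional as F

def info_nce(z1, z2, tau=0.5):
    # z1, z2: [n, d] representations of two augmented views of the same n samples
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / tau              # critic: temperature-scaled cosine similarity
    labels = torch.arange(z1.size(0))       # row i's positive is column i
    return F.cross_entropy(logits, labels)  # 1 positive vs. (n - 1) negatives per sample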
8. 8
Problems
• The key motivation behind existing graph contrastive methods is the explicit homophily
assumption: connected nodes belong to the same class and should therefore be treated as
positive pairs in contrastive learning.
• (a) A heterophilic graph, where the color denotes a node’s semantic class.
• (b) Contrastive objectives with the homophily assumption encourage one-hop
neighbors to have similar representations.
• GraphACL simply encourages each node to predict its neighbors, which can implicitly
capture the one-hop neighborhood context (c) and two-hop monophily (d).
9. 9
Simple Asymmetric Contrastive Learning of Graphs
• The key idea behind GraphACL is encouraging the encoder to learn representations that
simultaneously capture the one-hop neighborhood context and two-hop monophily,
which generalizes the homophily assumption to model both homophilic and
heterophilic graphs.
• GraphACL introduces an additional predictor 𝑔𝜙 on top of the encoder, making the
architecture asymmetric.
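• A sketch of this asymmetric design (illustrative PyTorch code; the encoder interface and the two-layer MLP predictor standing in for 𝑔𝜙 are assumptions):

import torch.nn as nn

class GraphACLModel(nn.Module):
    """Asymmetric architecture: a shared GNN encoder plus a predictor g_phi."""
    def __init__(self, encoder, dim):
        super().__init__()
        self.encoder = encoder                               # e.g., a stack of GCN layers
        self.predictor = nn.Sequential(                      # g_phi: predicts a neighbor's context
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x, adj_norm):
        z = self.encoder(x, adj_norm)   # node representations z_v
        p = self.predictor(z)           # asymmetric predictions g_phi(z_v)
        return z, p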
10. 10
Simple Asymmetric Contrastive Learning of Graphs
• A natural idea for capturing the neighborhood signal is to learn representations of v
that can well predict the original features of v’s neighbors.
• In this case, each node is treated as a specific neighbor “context”, and nodes with
similar distributions over the neighbor “contexts” are assumed to be similar.
• This can capture the one-hop neighborhood context without relying on the homophily
assumption or requiring graph augmentation.
• Intuitively, by enforcing the representations of two-hop neighbors to reconstruct the
same context representation of their shared central node, GraphACL implicitly makes the
representations of two-hop neighbors similar and captures the one-hop neighborhood
context.
11. 11
Simple Asymmetric Contrastive Learning of Graphs
• Although this simple neighborhood prediction objective can capture both the one-hop
neighborhood pattern and two-hop monophily, it may result in a collapsed and trivial
encoder: all node representations degenerate to the same single vector on the
hypersphere.
• The main reason is that the prediction loss operates in a fully flexible latent space, and
it can be minimized when the encoder produces a constant representation for all
nodes.
• To avoid this, an additional term is needed to push all node representations away from
each other and alleviate the representation collapse issue.
12. 12
Graph Asymmetric Contrastive Loss
• a total loss function:
• To address this issue, we instead minimize an upper bound of LCOM, which results in
the following simple objective of GraphACL
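• A minimal sketch of this neighbor-prediction objective (my reading of the final objective; illustrative PyTorch code, and scoring against all nodes as in-batch negatives is an assumption):

import torch
import torch.nn.functional as F

def graphacl_loss(z, p, edge_index, tau=0.5):
    # z: [N, d] node representations, p: [N, d] predictor outputs g_phi(z)
    # edge_index: [2, E] edges (u, v); node u is trained to predict its neighbor v
    z = F.normalize(z, dim=-1)
    p = F.normalize(p, dim=-1)
    u, v = edge_index
    logits = p[u] @ z.t() / tau         # score each prediction against every node
    return F.cross_entropy(logits, v)   # positive: the true neighbor v; negatives: all other nodes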
13. 13
Experimental Setting
• Heterophilic graphs:
• Wisconsin, Cornell, Texas [30], Actor, Squirrel, Crocodile, Chameleon
• Two recently proposed large heterophilic graphs: Roman-empire (Roman) and arXiv-year
• Homophilic graphs:
• Cora, Citeseer, Pubmed, Computer, Photo, and Ogbn-Arxiv (Arxiv)
15. 15
Ablation Study
• The effect of the representation dimension, and the pair-wise similarities of randomly
sampled node pairs, one-hop neighbors, and two-hop neighbors.
16. 16
Conclusions
• GraphACL: a simple contrastive learning framework for both homophilic and
heterophilic graphs.
• The key idea of GraphACL is to capture both the local one-hop neighborhood context
and the two-hop monophily similarity in one single objective.
• The paper also provides a theoretical understanding of GraphACL.
• GraphACL implicitly aligns the two-hop neighbors and enjoys good downstream
performance.