Graph Representation Learning:
Theories and Applications
Louis Wang
Apr 22nd 2020
Agenda
- Graph Representation Learning
- Shallow Methods
  - DeepWalk [Perozzi et al., KDD 2014]
  - Node2Vec [Grover & Leskovec, KDD 2016]
- Deep Methods: GNNs
  - Graph Convolutional Networks (GCN) [Kipf & Welling, ICLR 2017]
  - GraphSAGE [Hamilton et al., NIPS 2017]
  - Graph Attention Networks (GAT) [Veličković et al., ICLR 2018]
- Applications
  - Pin recommendation with PinSage at Pinterest
  - Dish recommendation on Uber Eats with GCN
Graph Data are everywhere
Graph Representation Learning
Two flavors: node embedding and graph embedding.
Representation learning is learning representations of input data, typically by transforming it or extracting features from it (by some means), that make it easier to perform a task like classification or prediction. [Yoshua Bengio, 2014]
Embedding is ALL you need:
word2vec, doc2vec, node2vec, item2vec, struc2vec…
Tasks on Graphs
Node Classification
- Predict the type of a given node.
Edge Classification / Link Prediction
- Predict whether two nodes are linked, or the type of the link.
Graph Classification
- Classify whole graphs, e.g., identify densely linked clusters of nodes.
Network Similarity
- How similar are two (sub)networks?
Node Embedding
Goal: Encode nodes (via a function f) so that similarity in the embedding space (e.g., dot product) approximates similarity in the original network.
1. Define an encoder.
2. Define a similarity function.
3. Optimization.
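In symbols (a standard formulation): the encoder maps each node to a vector, and we want the dot product in embedding space to track similarity in the original network.

```latex
\mathrm{ENC}(u) = \mathbf{z}_u,
\qquad
\mathrm{similarity}(u, v) \;\approx\; \mathbf{z}_v^{\top} \mathbf{z}_u
```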
Shallow Encoding: an embedding-lookup table
ENC(v) = Zv, where Z is a d × |V| matrix with one learnable column per node, and v is the one-hot indicator vector of node v.
Methods: DeepWalk [Perozzi et al., KDD 2014], Node2vec [Grover & Leskovec, KDD 2016], etc.
Shallow Methods Framework
Generate 'sentences' of nodes via random walks, with different walk strategies:
- unbiased walk: DeepWalk
- biased walk: Node2vec
Idea: Optimize the node embeddings so that nodes have similar embeddings if they tend to co-occur on short random walks over the graph.
DeepWalk
1. Run short fixed-length random walks starting from each node on the graph using some strategy R.
2. For each node u, collect N_R(u), the multiset of nodes visited on random walks starting from u.
3. Optimize the embeddings according to: given node u, predict its neighbors N_R(u).
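A minimal sketch of this pipeline, assuming the graph is a dict mapping each node to its neighbor list and using gensim (≥ 4) Word2Vec as the skip-gram trainer:

```python
import random
from gensim.models import Word2Vec  # skip-gram trainer

def random_walk(graph, start, walk_length):
    """Unbiased fixed-length random walk; `graph` maps node -> list of neighbors."""
    walk = [start]
    for _ in range(walk_length - 1):
        neighbors = graph[walk[-1]]
        if not neighbors:
            break
        walk.append(random.choice(neighbors))
    return walk

def deepwalk_embeddings(graph, walks_per_node=10, walk_length=40, dim=128):
    # Treat each walk as a 'sentence' of node ids and feed it to skip-gram
    # with negative sampling (sg=1, negative=5).
    corpus = [
        [str(n) for n in random_walk(graph, node, walk_length)]
        for node in graph
        for _ in range(walks_per_node)
    ]
    model = Word2Vec(corpus, vector_size=dim, window=5, sg=1, negative=5, min_count=0)
    return {node: model.wv[str(node)] for node in graph}
```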
DeepWalk Optimization
The naive loss function is slow to evaluate because:
1. the nested sum over nodes gives O(|V|²) complexity;
2. the normalization term of the softmax sums over all nodes.
Solution: negative sampling
- Use k negative nodes, sampled proportionally to degree, instead of all nodes.
- k balances predictive accuracy against computational efficiency.
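Written out, the skip-gram objective over random-walk co-occurrences and its negative-sampling approximation (the latter following word2vec; z_u denotes node u's embedding) are:

```latex
\max_{Z}\ \sum_{u \in V} \sum_{v \in N_R(u)} \log P(v \mid \mathbf{z}_u),
\qquad
P(v \mid \mathbf{z}_u) = \frac{\exp(\mathbf{z}_u^{\top}\mathbf{z}_v)}{\sum_{n \in V} \exp(\mathbf{z}_u^{\top}\mathbf{z}_n)}
```

```latex
\log P(v \mid \mathbf{z}_u) \;\approx\;
\log \sigma(\mathbf{z}_u^{\top}\mathbf{z}_v)
+ \sum_{i=1}^{k} \log \sigma(-\mathbf{z}_u^{\top}\mathbf{z}_{n_i}),
\qquad n_i \sim P_V \ (\text{degree-biased})
```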
Node2Vec: let's generate biased walks
Idea: A flexible notion of a node's network neighborhood leads to richer node embeddings.
Node2Vec: explore neighborhoods in a BFS as well as a DFS fashion
Two parameters:
- return parameter p: controls returning back to the previous node;
- 'walk away' parameter q: moving outwards (DFS) vs. inwards (BFS); intuitively, q is the ratio of BFS vs. DFS.
The walker has just traversed edge (s1, w) and is now at w. From w it can only move to:
- s1: back to s1 (unnormalized transition weight 1/p);
- s2: same distance to s1 as w (weight 1);
- s3/s4: farther from s1 (weight 1/q).
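These transition weights are easy to sample from directly. A minimal sketch of one biased step, assuming the graph is a dict of neighbor sets (the very first step of a walk, which has no previous node, is not handled here):

```python
import random

def node2vec_step(graph, prev, cur, p=1.0, q=1.0):
    """Sample the next node of a biased walk that just traversed (prev, cur).

    Unnormalized weights follow the node2vec search bias:
    1/p to return to `prev`, 1 for neighbors at distance 1 from `prev`,
    and 1/q for neighbors farther away.
    """
    neighbors = list(graph[cur])
    weights = []
    for nxt in neighbors:
        if nxt == prev:              # return to the previous node
            weights.append(1.0 / p)
        elif nxt in graph[prev]:     # same distance to prev as cur
            weights.append(1.0)
        else:                        # moving outwards (DFS-like)
            weights.append(1.0 / q)
    return random.choices(neighbors, weights=weights, k=1)[0]
```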
Limitations of Shallow Encoders
- O(|V|) parameters are needed:
  - each node has its own unique embedding;
  - no parameters are shared between nodes.
- Inherently "transductive":
  - generating embeddings for nodes not seen during training is either impossible or very time-consuming.
- Node features are not incorporated:
  - many graphs have features that we can and should leverage.
Graph Convolutional Networks
Idea:
A node's neighborhood defines a computation graph.
To obtain node representations, use a neural network to aggregate information from neighbors recursively, via a depth-limited BFS.
Graph Convolutional Networks
- Each layer is one level of depth in the BFS.
- Nodes have embeddings at each layer.
- The layer-0 embedding of node u is its input feature.
- The layer-K embedding, after information has propagated through the neural networks, is the final embedding.
So we need…
1. AGG: an aggregator for collecting information from a node's neighborhood.
2. NNs: neural networks for the neighborhood representation (e.g., W1) and the node's self-embedding (e.g., B1).
3. A loss function for optimization.
Mathematically…
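In the standard form (W_k and B_k generalize the per-layer W1 and B1 above; x_v is node v's input feature, σ a nonlinearity):

```latex
h_v^{0} = x_v,
\qquad
h_v^{k} = \sigma\!\left( W_k \sum_{u \in N(v)} \frac{h_u^{k-1}}{|N(v)|} \;+\; B_k\, h_v^{k-1} \right),
\quad k = 1, \dots, K,
\qquad
z_v = h_v^{K}
```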
Supervised Training vs. Unsupervised Training
For the shallow methods, we train the models in an unsupervised manner:
- use only the graph structure;
- similar nodes get similar embeddings;
- feed the 'sentences' into a skip-gram model.
For GCN, we directly train the model for a supervised task, like node classification.
We can feed the embeddings into any loss function and run stochastic gradient descent to train the parameters.
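For instance, "any loss function" can be as simple as cross-entropy on the final embeddings of the labeled nodes; a minimal PyTorch sketch with hypothetical sizes:

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes: z is (num_nodes, d) final-layer embeddings, y is
# (num_nodes,) integer labels, train_idx indexes the labeled training nodes.
classifier = torch.nn.Linear(64, 7)  # d=64 embedding -> 7 classes (example sizes)

def supervised_loss(z, y, train_idx):
    logits = classifier(z[train_idx])  # project embeddings to class scores
    return F.cross_entropy(logits, y[train_idx])
```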
Inductive Capability
1. In many real applications, new nodes are often added to the graph. We need to generate embeddings for new nodes without retraining; this is hard to do with shallow methods.
2. The same aggregation parameters are shared by all nodes, so the number of model parameters is sublinear in |V| and the model generalizes to unseen nodes.
GraphSAGE: Graph SAmple and aggreGatE
GCN aggregates the neighbor messages simply by taking a weighted average. How can we do better?
Idea: Generalize the aggregation over a node's neighbors, and concatenate the aggregated neighborhood features with the node's own embedding.
Neighborhood Aggregators
- Mean: take a weighted average of the neighbors.
- Pooling: transform the neighbor vectors and apply element-wise mean or max pooling.
- LSTM: apply an LSTM to a random permutation of the neighbors.
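Putting aggregation and concatenation together, the standard GraphSAGE update is (AGG is any aggregator above; [· ; ·] denotes concatenation):

```latex
h_v^{k} = \sigma\!\left( \left[\, W_k \cdot \mathrm{AGG}\big(\{ h_u^{k-1} : u \in N(v) \}\big) \;;\; B_k\, h_v^{k-1} \,\right] \right)
```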
Recap: GCN and GraphSAGE
Key idea: Generate node embeddings based on local neighborhoods, using neural networks.
Graph Convolutional Network:
- average neighborhood information and stack neural network layers.
GraphSAGE:
- generalized neighborhood aggregation (mean, pooling, LSTM, etc.).
Graph Attention Network: a learnable aggregator for GCN
Idea: Borrow the idea of attention mechanisms and learn to assign different weights to different neighbors in the aggregation process.
Attention Is All You Need [Vaswani et al., NIPS 2017]
Graph Attention Network: a learnable aggregator for GCN
- a is the attention mechanism function.
- e_uv indicates the importance of node u's message to node v.
- α_uv are the coefficients normalized with a softmax function.
Compute the embedding of each node in the graph following an attention strategy:
- nodes attend over their neighborhoods' messages;
- implicitly specifying different weights to different nodes in a neighborhood.
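In equations (following the GAT paper, with a shared weight matrix W applied to every node):

```latex
e_{uv} = a\big(W h_u,\, W h_v\big),
\qquad
\alpha_{uv} = \mathrm{softmax}_u(e_{uv}) = \frac{\exp(e_{uv})}{\sum_{n \in N(v)} \exp(e_{nv})},
\qquad
h_v' = \sigma\!\Big( \sum_{u \in N(v)} \alpha_{uv}\, W h_u \Big)
```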
Attention Mechanism
The approach is agnostic to the choice of the attention mechanism a:
- the original paper uses a simple single-layer neural network;
- multi-head attention can stabilize the learning process;
- a can have parameters, which need to be estimated.
The parameters of a are trained jointly:
- learned together with the weight matrices in an end-to-end fashion.
Benefits:
- Computationally efficient:
  attention coefficients can be computed in parallel across all edges of the graph;
  aggregation can be parallelized across all nodes.
- Storage efficient:
  sparse matrix operations require no more than O(V+E) entries to be stored;
  the number of parameters is fixed, irrespective of graph size.
- Trivially localized:
  only attends over local network neighborhoods (masked attention).
- Inductive capability:
  the mechanism is shared edge-wise and does not depend on the global graph structure.
Applications: PinSage
Challenge for Pinterest:
Scaling up GCN-based node embedding for training and inference is difficult: 300M+ users, 4B+ pins, and 2B+ boards.
Innovations:
- Importance-based neighborhood sampling: simulate random walks and select the neighbors with the highest visit counts (importance pooling).
- Aggregating from a fixed number of sampled nodes keeps the memory footprint of the algorithm under control during training.
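A minimal sketch of the importance-based sampling idea (a hypothetical helper; the graph is assumed to be a dict of neighbor lists, whereas the production system runs weighted walks over the full pin-board graph):

```python
import random
from collections import Counter

def importance_neighbors(graph, node, n_neighbors=10, n_walks=200, walk_length=3):
    """Simulate short random walks from `node` and keep the most-visited
    nodes; the normalized visit counts can serve as importance-pooling weights."""
    visits = Counter()
    for _ in range(n_walks):
        cur = node
        for _ in range(walk_length):
            neighbors = graph[cur]
            if not neighbors:
                break
            cur = random.choice(neighbors)
            visits[cur] += 1
    visits.pop(node, None)                       # exclude the node itself
    top = visits.most_common(n_neighbors)        # fixed-size neighborhood
    total = sum(c for _, c in top) or 1
    return [(nbr, c / total) for nbr, c in top]  # (neighbor, importance weight)
```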
Applications: Uber Eats
Innovations:
- max-margin loss: a customized loss function used when training GraphSAGE, well suited to weighted graphs.
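A common form of such a max-margin (hinge) ranking loss, over observed positive pairs (u, v) and sampled negatives n with margin Δ, is:

```latex
\mathcal{L} = \sum_{(u,v)} \mathbb{E}_{n \sim P_n(u)}\, \max\!\big( 0,\; \mathbf{z}_u^{\top}\mathbf{z}_n - \mathbf{z}_u^{\top}\mathbf{z}_v + \Delta \big)
```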