DeepWalk: Online Learning of
Social Representations
Bryan Perozzi, Rami Al-Rfou, Steven Skiena
Stony Brook University
ACM SIG-KDD
August 26, 2014
Outline
Bryan Perozzi DeepWalk: Online Learning of Social Representations
● Introduction: Graphs as Features
● Language Modeling
● DeepWalk
● Evaluation: Network Classification
● Conclusions & Future Work
Features From Graphs
● Anomaly Detection
● Attribute Prediction
● Clustering
● Link Prediction
● ...
[Figure: Adjacency Matrix (|V|) feeding the tasks listed above]
A first step in machine learning for graphs is to
extract graph features:
● node: degree
● pairs: # of common neighbors
● groups: cluster assignments
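A minimal sketch of the three kinds of hand-crafted features listed above. The toy graph, the function names, and the use of connected components as a stand-in for "cluster assignments" are illustrative assumptions, not part of the talk.

```python
# Hypothetical undirected toy graph stored as an adjacency dict.
graph = {
    1: {2, 3},
    2: {1, 3},
    3: {1, 2, 4},
    4: {3},
    5: {6},
    6: {5},
}

def degree(g, v):
    """Node feature: number of neighbors of v."""
    return len(g[v])

def common_neighbors(g, u, v):
    """Pair feature: number of neighbors shared by u and v."""
    return len(g[u] & g[v])

def component_labels(g):
    """Group feature (stand-in for clustering): label of each vertex's
    connected component, found with a depth-first search."""
    labels = {}
    for start in g:
        if start in labels:
            continue
        labels[start] = start
        stack = [start]
        while stack:
            u = stack.pop()
            for nb in g[u]:
                if nb not in labels:
                    labels[nb] = start
                    stack.append(nb)
    return labels
```

Each feature here has to be designed by hand; the latent representations discussed next are learned instead.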
What is a Graph Representation?
● Anomaly Detection
● Attribute Prediction
● Clustering
● Link Prediction
● ...
[Figure: Adjacency Matrix (|V|) reduced to Latent Dimensions (d << |V|), feeding the tasks listed above]
We can also create features by transforming the
graph into a lower dimensional latent
representation.
DeepWalk
● Anomaly Detection
● Attribute Prediction
● Clustering
● Link Prediction
● ...
[Figure: Adjacency Matrix (|V|) mapped by DeepWalk to Latent Dimensions (d << |V|), feeding the tasks listed above]
DeepWalk learns a latent representation of
adjacency matrices using deep learning
techniques developed for language modeling.
Visual Example
On Zachary’s Karate Graph:
[Figure: Input (graph) and Output (learned representation)]
Advantages of DeepWalk
● Anomaly Detection
● Attribute Prediction
● Clustering
● Link Prediction
● ...
[Figure: Adjacency Matrix (|V|) mapped by DeepWalk to Latent Dimensions (d << |V|), feeding the tasks listed above]
● Scalable: an online algorithm that does not use the entire graph at once
● Walks as sentences metaphor
● Works well, especially when labels are sparse
● Implementation available: bit.ly/deepwalk
Outline
● Introduction: Graphs as Features
● Language Modeling
● DeepWalk
● Evaluation: Network Classification
● Conclusions & Future Work
Language Modeling
Learning a representation means learning a mapping function from word co-occurrence. [Rumelhart+, 2003]
We hope that the learned representations capture inherent structure. [Baroni et al, 2009]
World of Word Embeddings
This is a very active research topic in NLP.
• Importance sampling and hierarchical classification were proposed to speed up training.
[F. Morin and Y. Bengio, AISTATS 2005] [Y. Bengio and J.-S. Senécal, IEEE TNN 2008] [A. Mnih and G. Hinton, NIPS 2008]
• NLP applications based on learned representations.
[Collobert et al., NLP (Almost) from Scratch, JMLR 2011]
• Recurrent networks were proposed to learn sequential representations.
[Tomas Mikolov et al., ICASSP 2011]
• Composed representations learned through recursive networks were used for parsing, paraphrase detection, and sentiment analysis.
[R. Socher, C. Manning, A. Ng, EMNLP (2011, 2012, 2013), NIPS (2011, 2012), ACL (2012, 2013)]
• Vector spaces of representations were developed to simplify compositionality.
[T. Mikolov, G. Corrado, K. Chen and J. Dean, ICLR 2013, NIPS 2013]
Word Frequency in Natural Language
[Figure: Co-Occurrence Matrix]
■ Word frequency in a natural language corpus follows a power law.
Connection: Power Laws
Vertex frequency in random walks on scale-free graphs also follows a power law.
Vertex Frequency in SFG
■ Short truncated random walks are sentences in an artificial language!
■ Random walk distance is known to be a good feature for many problems.
[Figure: Scale Free Graph]
The Cool Idea
Short random walks =
sentences
Outline
● Introduction: Graphs as Features
● Language Modeling
● DeepWalk
● Evaluation: Network Classification
● Conclusions & Future Work
Deep Learning for Networks
[Figure: pipeline stages, shown on an example graph with vertices 1 to 5]
Input: Graph
Random Walks
Representation Mapping
Hierarchical Softmax
Output: Representation
Random Walks
■ We generate random walks for each vertex
in the graph.
■ Each short random walk has length t.
■ Pick the next step uniformly from the vertex
neighbors.
■ Example:
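The walk generation described above can be sketched as follows. This is a minimal illustration under stated assumptions: the toy graph, the names `random_walk`, `build_corpus`, and `walks_per_vertex`, and the convention that a walk's length counts its vertices are all hypothetical choices, not details from the talk.

```python
import random

def random_walk(graph, start, length, rng):
    """Truncated random walk: the next vertex is chosen uniformly
    from the neighbors of the current vertex."""
    walk = [start]
    while len(walk) < length:
        neighbors = graph[walk[-1]]
        if not neighbors:  # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbors))
    return walk

def build_corpus(graph, walks_per_vertex, length, seed=0):
    """Start walks_per_vertex walks from every vertex in the graph."""
    rng = random.Random(seed)
    corpus = []
    for _ in range(walks_per_vertex):
        for v in graph:
            corpus.append(random_walk(graph, v, length, rng))
    return corpus

# Hypothetical five-vertex graph as adjacency lists.
graph = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4, 5], 4: [3, 5], 5: [3, 4]}
corpus = build_corpus(graph, walks_per_vertex=2, length=4)
```

Each entry of `corpus` is a list of vertex ids that plays the role of a sentence for the language-modeling step.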
Representation Mapping
■ Map the vertex under focus ($v_i$) to its representation $\Phi(v_i)$.
■ Define a window of size $w$.
■ If $w = 1$ and the vertex under focus is $v_i$, with walk neighbors $v_{i-1}$ and $v_{i+1}$:
Maximize: $\Pr(v_{i-1} \mid \Phi(v_i)) \cdot \Pr(v_{i+1} \mid \Phi(v_i))$
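A minimal sketch of the window step, assuming a skip-gram-style objective in which each vertex of a walk predicts the vertices within its window. The helper names, the dict-based storage of vectors, and the explicit full softmax are illustrative assumptions.

```python
import math

def window_pairs(walk, w):
    """All (focus, context) pairs whose positions in the walk
    differ by at most w."""
    pairs = []
    for i, focus in enumerate(walk):
        lo, hi = max(0, i - w), min(len(walk), i + w + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((focus, walk[j]))
    return pairs

def softmax_log_prob(context, focus, phi, out):
    """log Pr(context | Phi(focus)) under a full softmax.

    phi: dict vertex -> representation (list of floats)
    out: dict vertex -> output vector (hypothetical parameter name)
    The normalizer sums over every vertex, so one evaluation costs
    O(|V|) dot products.
    """
    scores = {u: sum(a * b for a, b in zip(out[u], phi[focus])) for u in out}
    log_z = math.log(sum(math.exp(s) for s in scores.values()))
    return scores[context] - log_z
```

For a hypothetical walk `[4, 3, 1, 5]` with `w = 1`, `window_pairs` yields the pairs `(4, 3), (3, 4), (3, 1), (1, 3), (1, 5), (5, 1)`; the objective is the sum of `softmax_log_prob` over such pairs.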
Hierarchical Softmax
Calculating $\Pr(v_j \mid \Phi(v_i))$ involves $O(|V|)$ operations for each update! Instead:
● Consider the graph vertices as leaves of a balanced binary tree.
● Maximizing $\Pr(v_j \mid \Phi(v_i))$ is equivalent to maximizing the probability of the path from the root to the node, specifically, maximizing the outputs of the classifiers C1, C2, C3 on that path.
Each of {C1, C2, C3} is a logistic binary classifier.
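A minimal sketch of the path probability, assuming a complete binary tree stored in heap order with a power-of-two number of leaves. The layout, the function names, and the "sigmoid gives the probability of going right" convention are illustrative assumptions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def path_to_leaf(leaf_index, num_leaves):
    """Return [(internal_node, go_right), ...] from the root to a leaf.

    Heap layout (assumed): the root is node 1, the children of node n
    are 2n and 2n + 1, and leaves occupy nodes num_leaves .. 2*num_leaves - 1.
    num_leaves must be a power of two.
    """
    node = num_leaves + leaf_index
    path = []
    while node > 1:
        parent = node // 2
        path.append((parent, node % 2 == 1))
        node = parent
    path.reverse()
    return path

def hs_prob(leaf_index, phi_v, classifiers, num_leaves):
    """Pr(leaf | phi_v) as a product of logistic classifier outputs
    along the root-to-leaf path: O(log |V|) work instead of O(|V|)."""
    prob = 1.0
    for node, go_right in path_to_leaf(leaf_index, num_leaves):
        score = sum(a * b for a, b in zip(classifiers[node], phi_v))
        p_right = sigmoid(score)
        prob *= p_right if go_right else 1.0 - p_right
    return prob
```

With eight leaves every path passes through three internal nodes, so each probability is a product of three classifier outputs, and the probabilities of all leaves sum to one without computing a normalizer.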
Learning
■ Learned parameters:
■ Vertex representations
■ Weights of the tree's binary classifiers
■ Randomly initialize the representations.
■ For each of {C1, C2, C3}, calculate the loss function.
■ Use Stochastic Gradient Descent to update both the classifier weights and the vertex representation simultaneously.
[Mikolov+, 2013]
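A minimal sketch of one such update for a single (focus, context) pair, assuming the logistic loss on each classifier along the context vertex's tree path. The function name, the `(node, go_right)` path encoding, and the fixed learning rate are illustrative assumptions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sgd_step(phi_v, classifiers, path, lr=0.1):
    """One SGD update for a single (focus, context) pair.

    phi_v: representation of the focus vertex (list, updated in place)
    classifiers: dict node -> weight list (updated in place)
    path: [(node, go_right), ...] from the root to the context leaf
    Returns -log Pr(context | phi_v), measured before the update.
    """
    loss = 0.0
    grad_phi = [0.0] * len(phi_v)
    for node, go_right in path:
        weights = classifiers[node]
        p_right = sigmoid(sum(a * b for a, b in zip(weights, phi_v)))
        target = 1.0 if go_right else 0.0
        loss -= math.log(p_right if go_right else 1.0 - p_right)
        err = p_right - target  # gradient of the logistic loss w.r.t. the score
        for k in range(len(phi_v)):
            grad_phi[k] += err * weights[k]    # uses the weight before its update
            weights[k] -= lr * err * phi_v[k]  # classifier weight update
    for k in range(len(phi_v)):
        phi_v[k] -= lr * grad_phi[k]           # representation update
    return loss
```

Both parameter groups are updated from the same gradients, which is why the random initialization matters: if every vector started at zero, all gradients would vanish and nothing would be learned.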
Outline
● Introduction: Graphs as Features
● Language Modeling
● DeepWalk
● Evaluation: Network Classification
● Conclusions & Future Work
Attribute Prediction
The Semi-Supervised Network Classification problem:
[Figure: example graph with six nodes and the labels "Stony Brook students" and "Googlers"]
INPUT: A partially labelled graph with node attributes.
OUTPUT: Attributes for nodes which do not have them.
Baselines
■ Approximate Inference Techniques:
❑ weighted vote Relational Neighbor (wvRN) [Macskassy+, ‘03]
■ Latent Dimensions
❑ Spectral Methods
■ SpectralClustering [Tang+, ‘11]
■ MaxModularity [Tang+, ‘09]
❑ k-means
■ EdgeCluster [Tang+, ‘09]
Results: BlogCatalog
DeepWalk performs well, especially when labels are
sparse.
Results: Flickr
Results: YouTube
Spectral Methods do not scale to large graphs.
Parallelization
● Parallelization doesn’t affect representation quality.
● The sparser the graph, the easier it is to achieve linear scalability. (Feng+, NIPS ‘11)
Outline
● Introduction: Graphs as Features
● Language Modeling
● DeepWalk
● Evaluation: Network Classification
● Conclusions & Future Work
Variants / Future Work
■ Streaming
❑ No need to ever store the entire graph
❑ Can build & update the representation as new data comes in.
■ “Non-Random” Walks
❑ Many graphs occur as a by-product of interactions
❑ One could use outside processes (users, etc.) to feed the modeling phase
❑ [This is what language modeling is doing]
Take-away
Language Modeling techniques can be used for online learning of network representations.
Thanks!
Bryan Perozzi
@phanein
bperozzi@cs.stonybrook.edu
DeepWalk available at:
http://bit.ly/deepwalk
