[20240304_LabSeminar_Huy]DeepWalk: Online Learning of Social Representations.pptx
1. DeepWalk: Online Learning of
Social Representations
Quang-Huy Tran
Network Science Lab
Dept. of Artificial Intelligence
The Catholic University of Korea
E-mail: xxx@catholic.ac.kr
2024 / 03 / 04
Bryan Perozzi, Rami Al-Rfou, Steven Skiena.
KDD '14: Proceedings of the 20th ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining
3. 3
Motivation
● Learning meaningful representations of nodes in a network is crucial for various applications,
including:
○ Network classification (e.g., identifying fraudulent activity or community membership)
○ Link prediction (e.g., suggesting potential connections between users)
○ Anomaly detection (e.g., identifying unusual activity patterns)
● Traditional methods often require the entire network structure or face limitations in
scalability.
4. 4
Motivation
What is DeepWalk?
● DeepWalk is a novel approach for learning latent representations of vertices in networks.
● Motivated by advancements in NLP using neural embedding techniques like word2vec.
5. 5
Key Concept
Main DeepWalk Point
● Random walks: a stochastic process that starts at a node and repeatedly moves to a randomly
chosen neighbor; the visited sequence reflects the probable neighborhood of the start node.
● Context nodes: treat nodes in the walk as context nodes for a target node.
● Skip-gram model: train the model to predict context nodes given a target node.
● Key idea: treat random walks in a graph like sentences and learn embeddings using Skip-
Gram.
○ Estimate likelihood of a specific sequence of words appearing in a corpus.
6. 6
Key Concept
Frequency of Word and Power Law Distribution
● Word frequency in a natural language corpus follows a power law
○ Natural language behaves like a scale-free graph.
● Vertex frequency in random walks on scale-free graphs also follows a power law.
● Short truncated random walks can therefore be treated as sentences in an artificial language.
○ Short random walk = sentence.
Fig. 1. The distribution of vertices appearing in short random walks (1a) follows a power law, much like
the distribution of words in natural language (1b).
7. 7
DeepWalk Algorithm
Key Steps in DeepWalk
1. Input Graph
2. Random Walk
3. Representation Mapping
4. Hierarchical Softmax
5. Output Representation
8. 8
DeepWalk Algorithm
Random Walk
● Start at a random vertex.
● Generate γ random walks starting from each vertex in the graph.
● Each truncated random walk has length t.
● At each step, pick the next vertex uniformly at random from the current vertex's neighbors.
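The walk-generation step above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation; the function names and the adjacency-list graph representation are assumptions.

```python
import random

def random_walk(graph, start, walk_length):
    """One truncated random walk of length t from `start`.

    graph: dict mapping each vertex to a list of its neighbors.
    """
    walk = [start]
    for _ in range(walk_length - 1):
        neighbors = graph[walk[-1]]
        if not neighbors:  # dead end: stop the walk early
            break
        # Uniform choice among neighbors, as in the slide above.
        walk.append(random.choice(neighbors))
    return walk

def build_corpus(graph, num_walks, walk_length, seed=0):
    """Run gamma (num_walks) walks from every vertex.

    The vertex order is shuffled on each pass, as the paper does,
    to avoid ordering bias during training.
    """
    random.seed(seed)
    corpus = []
    vertices = list(graph)
    for _ in range(num_walks):
        random.shuffle(vertices)
        for v in vertices:
            corpus.append(random_walk(graph, v, walk_length))
    return corpus
```

For a graph with |V| vertices, this produces γ·|V| walks, each a node sequence of at most t vertices.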
9. 9
DeepWalk Algorithm
Representation Mapping - SkipGram
● Map the set of random walks W into a corpus of node sequences.
● Each node in a walk is treated as a word in a sentence.
● This corpus is the input to the Skip-gram model.
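The mapping from walks to Skip-gram training examples can be sketched as below: each node within a window of size w around a target node becomes a (target, context) pair. This is a hedged illustration; the function name is hypothetical, and in practice a library such as gensim's `Word2Vec` (with `sg=1` for Skip-gram) would consume the walk corpus directly.

```python
def skipgram_pairs(walk, window):
    """Extract (target, context) pairs from one walk, Skip-gram style.

    Every node within `window` positions of the target (on either side)
    is treated as one of its context nodes.
    """
    pairs = []
    for i, target in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a node is not its own context
                pairs.append((target, walk[j]))
    return pairs
```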
10. 10
DeepWalk Algorithm
Representation Mapping - SkipGram
● Objective: maximize the probability of a node's neighbors (its context words) in the walk (sentence).
● The learned node representations capture structural information.
● However, computing the probability in line 3 is expensive: the softmax normalizes over all
vertices, an O(|V|) computation.
11. 11
DeepWalk Algorithm
Hierarchical Softmax
● Consider the graph vertices as leaves of a balanced binary tree.
● Assign vertices to the leaves of the tree.
○ Predicting a vertex then reduces to maximizing the probability of the path from the root to its
leaf, which requires only O(log |V|) computations instead of O(|V|).
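The path-probability idea can be sketched as follows: each internal node on the root-to-leaf path contributes a binary (sigmoid) decision, and the leaf's probability is the product of those decisions. This is a simplified illustration under assumed inputs, not the paper's implementation; in practice the dot products come from the target node's embedding and learned vectors for the tree's internal nodes.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def path_probability(dots, directions):
    """Probability of reaching one leaf under hierarchical softmax.

    dots: dot products between the target embedding and each internal
          node's vector along the root-to-leaf path (assumed inputs).
    directions: the 0/1 branch taken at each internal node.

    The product has only O(log |V|) factors, which is why the tree
    replaces the O(|V|) flat softmax.
    """
    p = 1.0
    for d, b in zip(dots, directions):
        # Branch "1" has probability sigma(d); branch "0" the complement.
        p *= sigmoid(d) if b == 1 else 1.0 - sigmoid(d)
    return p
```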
12. 12
Advantage and Limitation
Advantage
● Unsupervised learning: no labeled data is required for training.
● Efficiency and scalability: representations are learned from short truncated random walks.
○ Handles large networks efficiently.
● Online learning: embeddings can be updated incrementally as new nodes or edges are added to the
network.
13. 13
Advantage and Limitation
Limitation
● Sensitivity to hyperparameters: walk length, number of walks, and window size can all affect performance.
● Uniform random walks: may not explore the whole network evenly,
○ leading to biased samples of the graph structure.
● Limited context: short local walks may not capture global network structure.
14. 14
Conclusion
Future Direction
● Hybrid model (streaming):
○ Does not require holding the entire graph in memory.
○ Combine with other embedding techniques to achieve better performance.
● Improved latent representations:
○ Consider graphs created as a by-product of agents interacting with a sequence of elements
(e.g., users' navigation of pages on a web app such as Facebook or Coupang).
○ Research on enhancing the quality of node features.
15. 15
Conclusion
Recap
● DeepWalk provides a novel approach for learning latent representations of vertices in a
network
○ Encode social relations in a continuous vector space.
● Many applications in social network analysis:
○ Node classification: label nodes based on their learned representations.
○ Link prediction: predict missing edges between nodes.
○ Community detection: discover cohesive groups of nodes in the network.
● Ongoing research: exciting possibilities for future developments.