NS-CUK Seminar: H.E.Lee, Review on "PTE: Predictive Text Embedding through Large-scale Heterogeneous Text Networks", KDD 2015
Hyo Eun Lee
Network Science Lab
Dept. of Biotechnology
The Catholic University of Korea
E-mail: gydnsml@gmail.com
2023.08.14
1. Problem definition
2. Predictive text embedding
• Bipartite Network Embedding
• Heterogeneous Text Network Embedding
• Text Embedding
3. Experiments
4. Discussion and conclusion
1. Problem definition
Definition
• Definition 1. (Word-Word Network) $G_{ww} = (V, E_{ww})$
• Captures word co-occurrence information in local contexts of the unlabeled data
• This is the information exploited by traditional word embedding approaches such as the skip-gram
• Definition 2. (Word-Document Network) $G_{wd} = (V \cup D, E_{wd})$
• Captures the connections between words and the documents of a corpus; the weight of the edge between word $v_i$ and document $d_j$ is the term frequency of $v_i$ in $d_j$
• Definition 3. (Word-Label Network) $G_{wl} = (V \cup L, E_{wl})$
• Captures category-level word co-occurrences; the weight of the edge between word $v_i$ and label $c_j$ is $w_{ij} = \sum_{d:\, l_d = j} n_{di}$, where $n_{di}$ is the term frequency of $v_i$ in document $d$ and $l_d$ is the label of $d$ (a construction sketch of all three networks follows below)
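To make these definitions concrete, here is a minimal sketch (not the paper's code; the corpus, window size, and variable names are illustrative) that accumulates the edge weights of the three networks from a small labeled corpus:

from collections import Counter

# Toy labeled corpus: (tokens, label); None marks an unlabeled document.
docs = [
    (["deep", "learning", "for", "text"], "cs"),
    (["stock", "market", "text", "mining"], "finance"),
    (["deep", "market", "models"], None),
]

window = 2  # local context window for word-word co-occurrences
G_ww, G_wd, G_wl = Counter(), Counter(), Counter()

for d, (tokens, label) in enumerate(docs):
    for i, w in enumerate(tokens):
        # E_ww: co-occurrence counts of word pairs within the context window
        for c in tokens[max(0, i - window):i]:
            G_ww[(w, c)] += 1
            G_ww[(c, w)] += 1
        # E_wd: term frequency n_di of word w in document d
        G_wd[(w, d)] += 1
        # E_wl: term frequency aggregated over all documents carrying the label
        if label is not None:
            G_wl[(w, label)] += 1

print(G_wl[("text", "cs")])  # -> 1

Note that only G_wl requires labels; G_ww and G_wd can be built from the unlabeled portion of the corpus as well.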
1. Problem definition
Definition
• Definition 4. (Heterogeneous Text Network) The combination of the word-word, word-document, and word-label networks defined above
• Captures word co-occurrences at multiple levels and integrates both labeled and unlabeled data
• Definition 5. (Predictive Text Embedding) Learn low-dimensional word representations by embedding the heterogeneous text network into a low-dimensional vector space
• The resulting embeddings are optimized for a specific prediction task (here, text classification) rather than being generic
2. Predictive text embedding
Bipartite Network Embedding
• The LINE model was introduced for embedding large-scale information networks, but it targets homogeneous networks; in a heterogeneous network, the weights of different types of edges are not directly comparable
• The authors therefore adapt LINE's second-order proximity between vertices to each bipartite network
• $G = (V_A \cup V_B, E)$, where $V_A$ and $V_B$ are two disjoint vertex sets and $E$ is the set of edges between them
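For such a bipartite network, the paper defines the conditional probability of vertex $v_i \in V_A$ given $v_j \in V_B$ as a softmax over embedding inner products, and minimizes the weighted negative log-likelihood:

p(v_i \mid v_j) = \frac{\exp(\vec{u}_i^{\top} \vec{u}_j)}{\sum_{i' \in A} \exp(\vec{u}_{i'}^{\top} \vec{u}_j)}, \qquad O = -\sum_{(i,j) \in E} w_{ij} \log p(v_i \mid v_j)

Minimizing $O$ places vertices with similar neighborhoods (second-order proximity) close together in the embedding space.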
2. Predictive text embedding
Bipartite Network Embedding
• The objective function is optimized with stochastic gradient descent (SGD)
• Edge sampling and negative sampling techniques make the optimization efficient
• At each step, edge sampling draws an edge as a binary edge with probability proportional to its weight, and negative sampling draws negative vertices from a noise distribution $P_n$
• With the objective defined for a single bipartite network, the objective for the whole heterogeneous text network can then be defined by combining the bipartite objectives (a sketch of one update step follows below)
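A minimal sketch of one such update, assuming numpy embedding matrices; the function and argument names are mine, and the paper's implementation uses the alias method for O(1) weighted sampling rather than the direct draw shown here:

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgd_step(U, C, edges, edge_probs, noise_probs, k=5, lr=0.025):
    # U: embeddings of the word side; C: embeddings of the other vertex set
    # edges: list of (i, j) index pairs; edge_probs: edge weights normalized to sum to 1
    # noise_probs: noise distribution over C's vertices (e.g. degree ** 0.75, normalized)
    i, j = edges[rng.choice(len(edges), p=edge_probs)]
    # one positive target plus k negative samples drawn from the noise distribution
    targets = [(j, 1.0)] + [(rng.choice(len(C), p=noise_probs), 0.0) for _ in range(k)]
    for c, label in targets:
        score = sigmoid(U[i] @ C[c])
        g = lr * (label - score)
        u_old = U[i].copy()
        U[i] += g * C[c]   # update the word vertex
        C[c] += g * u_old  # update the (positive or negative) context vertex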
2. Predictive text embedding
Heterogeneous Text Network Embedding
• The heterogeneous text network consists of three bipartite networks (word-word, word-document, word-label) that share the same set of word vertices
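Concretely, the joint objective sums the three bipartite objectives of the form above, coupled through the shared word embeddings:

O_{pte} = O_{ww} + O_{wd} + O_{wl}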
2. Predictive text embedding
Heterogeneous Text Network Embedding
• Two training strategies: train jointly on the unlabeled and labeled networks, or pre-train on the unlabeled data and then fine-tune with the labeled word-label network (a joint-training sketch follows below)
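Under the same assumptions as the sketch above, the joint scheme alternately samples edges from the three bipartite networks in each iteration, so the shared word embeddings U receive updates from all of them (reusing the illustrative sgd_step defined earlier):

def joint_train(U, networks, steps=1_000_000):
    # networks: list of (C, edges, edge_probs, noise_probs) tuples for
    # G_ww, G_wd, and G_wl; U holds the shared word embeddings
    for _ in range(steps):
        for C, edges, edge_probs, noise_probs in networks:
            sgd_step(U, C, edges, edge_probs, noise_probs)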
2. Predictive text embedding
Text Embedding
• Once the word vectors are trained, the representation of an arbitrary piece of text is obtained by averaging the embeddings of the words it contains
• Equivalently, the text embedding is learned by minimizing a loss function, specified as the Euclidean distance between the text embedding and its word embeddings, with a gradient descent algorithm
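In symbols, for a piece of text $d = w_1 w_2 \cdots w_n$, the average of the word embeddings is exactly the minimizer once the loss is taken as the squared Euclidean distance (the squaring is my reading; with it the mean is the closed-form solution, so no iterative optimization is needed when all words are known):

\vec{d} = \frac{1}{n} \sum_{i=1}^{n} \vec{u}_{w_i} = \arg\min_{\vec{z}} \sum_{i=1}^{n} \left\| \vec{z} - \vec{u}_{w_i} \right\|^{2}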
4. Discussion and conclusion
Discussion and conclusion
• Unsupervised learning uses either local context-level or document-level word co-occurrences, with
document-level co-occurrences being more useful for long documents and local context-level
being more useful for short documents.
• PTE trains jointly on both labeled and unlabeled data, and outperforms CNNs when more labeled data is available
• PTE leaves room for improvement, e.g., taking the order of words into account