1. Copyright Notice
These slides are distributed under the Creative Commons License.
DeepLearning.AI makes these slides available for educational purposes. You may not use or distribute
these slides for commercial purposes. You may make copies of these slides and use or distribute them for
educational purposes as long as you cite DeepLearning.AI as the source of the slides.
For the rest of the details of the license, see https://creativecommons.org/licenses/by-sa/2.0/legalcode
4. Why learn vector space models?
What is your age?
How old are you?
Where are you heading?
Where are you from?
Different meaning Same Meaning
5. Vector space models applications
Machine Translation
• You eat cereal from a bowl
• You buy something and someone else sells it
Information Extraction Chatbots
10. Word by Word Design
Number of times they occur together within a certain distance k
I like simple data
I prefer simple raw data
data
simple
k=2
2
n
raw like I
1 1 0
11. Word by Document Design
Number of times a word occurs within a certain category
Entertainment
data
Entertainment Economy
Machine
Learning
Economy
Machine
Learning
500 6620 9320
Corpus
film 7000 4000 1000
12. Vector Space
Measures of “similarity:”
Angle
Distance
ML
Economy
Entertainment
data
film
1000 5000 10000
1000
5000
10000
data
Entertainment Economy ML
6620 9320
film 4000 1000
500
7000
13. Summary
● W/W and W/D, counts of occurrence
● Vector Spaces Similarity between words/documents
16. Euclidean distance
Corpus B: (9320,1000)
Corpus A: (500,7000)
ML
Entertainment
data
film
1000 5000 10000
1000
5000
10000
17. Euclidean distance
Corpus B: (9320,1000)
Corpus A: (500,7000)
data
film
1000 5000 10000
1000
5000
10000
ML
Entertainment
18. Euclidean distance for n-dimensional vectors
data boba ice-cream
AI 6 0 1
drinks 0 4 6
food 0 6 8
Norm of ( -
)
19. Euclidean distance in Python
# Create numpy vectors v and w
v = np.array([1, 6, 8])
w = np.array([0, 4, 6])
# Calculate the Euclidean distance d
d = np.linalg.norm(v-w)
# Print the result
print("The Euclidean distance between v and w is: ", d)
The Euclidean distance between v and w is: 3
20. ● Straight line between points
Summary
● Norm of the difference between vectors
23. d1
Euclidean distance vs Cosine similarity
Agriculture corpus (20,40)
The cosine of the angle
between the vectors
d2
Euclidean distance: d2 < d1
α β
Angles comparison: β >
α
Food corpus (5,15) History corpus (30,20)
eggs
disease
10 20 30
10
20
30
40
34. Manipulating word vectors
[Mikolov et al, 2013, Distributed Representations of Words and Phrases and their Compositionality]
Washington (10,5)
Moscow (9,3)
Tokyo(8.5,
2)
Ankara(8.5, 0.9)
Turkey (3,1)
USA (5,6)
Russia (5,5)
Japan (4,3)
(10,4)
Washington - USA =
Russia + =
Moscow
38. Visualization of word vectors
How can you visualize if your representation
captures these relationships?
oil & gas
town & city
oil 0.20 … 0.10
gas 2.10 … 3.40
city 9.30 … 52.1
town 6.20 … 34.3
d > 2
39. Visualization of word vectors
oil 0.20 … 0.10
gas 2.10 … 3.40
city 9.30 … 52.1
town 6.20 … 34.3
oil 2.30 21.2
gas 1.56 19.3
city 13.4 34.1
town 15.6 29.8
d = 2
d > 2
PCA
49. Summary
• Eigenvectors give the direction of uncorrelated features
• Eigenvalues are the variance of the new features
• Dot product gives the projection on uncorrelated features