Riemannian Optimization for Skip-Gram Negative Sampling
ALEXANDER FONAREV¹,²,⁴, OLEKSII HRINCHUK¹,²,³, GLEB GUSEV²,³, PAVEL SERDYUKOV², IVAN OSELEDETS¹,⁵
¹Skolkovo Institute of Science and Technology, ²Yandex LLC, ³Moscow Institute of Physics and Technology, ⁴SBDA Group, ⁵Institute of Numerical Mathematics RAS
2. METHODOLOGY
We apply Riemannian optimization to train the SGNS word embedding model.
Main contributions
• We train the SGNS word embedding model as a two-step procedure with clear objectives;
• We use a Riemannian approach to optimize the SGNS objective over low-rank matrices X in Step 1;
• We outperform the state of the art in both the SGNS objective value and semantic similarity metrics.
1. PROBLEM
Training the Skip-Gram Negative Sampling (SGNS) word embedding model ("word2vec"), which measures the semantic similarity between words, can be reformulated as a two-step procedure:
Step 1. Search for a low-rank matrix X that provides a good SGNS objective value;
Step 2. Search for a good low-rank representation X = WCᵀ in terms of linguistic metrics, where W is a matrix of word embeddings and C is a matrix of context embeddings (a minimal factorization sketch follows).
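As a minimal illustration of Step 2, the sketch below splits a given low-rank matrix X into embeddings via its SVD. Taking W = U√Σ matches line 10 of the algorithm in Section 4; the symmetric choice C = V√Σ is an illustrative convention, not necessarily the authors' exact one.

import numpy as np

def factor_embeddings(X, d):
    # Truncated SVD of the low-rank matrix X found in Step 1.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    W = U[:, :d] * np.sqrt(s[:d])   # word embeddings, n x d (W = U * sqrt(Sigma))
    C = Vt[:d].T * np.sqrt(s[:d])   # context embeddings, m x d (symmetric split)
    return W, C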
5. RESULTS
• Three methods are compared: RO-SGNS [1], SVD-SPPMI [2], and SGD-SGNS (the original "word2vec").
• Three popular benchmarks for semantic similarity evaluation ("wordsim-353", "simlex", "men").
• Each dataset contains word pairs together with assessor-assigned similarity scores for each pair.
• The original "wordsim-353" is a mixture of word pairs from both the word similarity and word relatedness tasks; we also use these two subsets in our experiments ("ws-sim" and "ws-rel"). An evaluation sketch follows Table 1.
Dim. d    Algorithm   ws-sim   ws-rel   ws-full   simlex   men
d = 100   SGD-SGNS    0.719    0.570    0.662     0.288    0.645
          SVD-SPPMI   0.722    0.585    0.669     0.317    0.686
          RO-SGNS     0.729    0.597    0.677     0.322    0.683
d = 200   SGD-SGNS    0.733    0.584    0.677     0.317    0.664
          SVD-SPPMI   0.747    0.625    0.694     0.347    0.710
          RO-SGNS     0.757    0.647    0.708     0.353    0.701
d = 500   SGD-SGNS    0.738    0.600    0.688     0.350    0.712
          SVD-SPPMI   0.765    0.639    0.707     0.380    0.737
          RO-SGNS     0.767    0.654    0.715     0.383    0.732
Table 1: Spearman's correlation between predicted similarities and the manually assessed ones.
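The numbers in Table 1 follow the standard protocol: predict the similarity of each benchmark pair as the cosine between word vectors and report Spearman's correlation with the human scores. A hedged sketch with illustrative names (the real datasets ship with the repository listed under SOURCE CODE):

import numpy as np
from scipy.stats import spearmanr

def evaluate(W, vocab, pairs):
    # W: n x d embedding matrix; vocab: word -> row index;
    # pairs: iterable of (word1, word2, human_score).
    predicted, human = [], []
    for w1, w2, score in pairs:
        if w1 not in vocab or w2 not in vocab:
            continue  # skip out-of-vocabulary pairs
        v1, v2 = W[vocab[w1]], W[vocab[w2]]
        predicted.append(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))
        human.append(score)
    return spearmanr(predicted, human).correlation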
3. BACKGROUND
SGNS as Implicit Matrix Factorization [2]
X = WC^\top = (x_{wc}), \qquad x_{wc} = \langle w, c \rangle

\mathcal{M}_d = \left\{ X \in \mathbb{R}^{n \times m} : \operatorname{rank}(X) = d \right\}

F(X) = \sum_{w \in V_W} \sum_{c \in V_C} \left( \#(w, c) \log \sigma(x_{wc}) + k \, \frac{\#(w) \, \#(c)}{|D|} \log \sigma(-x_{wc}) \right) \to \max_{X \in \mathcal{M}_d}
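For concreteness, a dense NumPy sketch of evaluating F(X) from corpus statistics. The names cooc (matrix of pair counts #(w, c)), wc and cc (unigram count vectors #(w), #(c)), D (= |D|, the total number of observed pairs), and k (number of negative samples) are illustrative; real vocabularies call for sparse matrices.

import numpy as np

def sgns_objective(X, cooc, wc, cc, D, k):
    log_sig = lambda t: -np.logaddexp(0.0, -t)           # log sigma(t), numerically stable
    positive = cooc * log_sig(X)                         # #(w,c) * log sigma(x_wc)
    negative = (k / D) * np.outer(wc, cc) * log_sig(-X)  # k * #(w)#(c)/|D| * log sigma(-x_wc)
    return float((positive + negative).sum())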
Riemannian Optimization
1. Projection of the gradient ascent step onto the tangent space:
\hat{X}_{i+1} = X_i + P_{T_{X_i} \mathcal{M}_d} \nabla F(X_i)
2. Retraction back to the manifold:
X_{i+1} = R(\hat{X}_{i+1}) \in \mathcal{M}_d
Figure 1: Geometric interpretation of one step of RO: the ambient gradient step X_i + \nabla F(X_i) is projected onto the tangent space T_{X_i} \mathcal{M}_d, then retracted back to the manifold \mathcal{M}_d.
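A sketch of both ingredients, assuming the current point is stored in factored form X = U S V^\top with orthonormal U and V. The projector below is the standard one for the fixed-rank manifold; the retraction shown is plain SVD truncation, whereas the algorithm in Section 4 uses the cheaper projector-splitting approximation of [3] instead.

import numpy as np

def project_tangent(U, V, G):
    # P_{T_X M_d}(G) = U U^T G + (I - U U^T) G V V^T for X = U S V^T.
    UtG = U.T @ G
    return U @ UtG + (G - U @ UtG) @ (V @ V.T)

def retract_svd(X_hat, d):
    # Map an off-manifold point back to M_d by truncating its SVD to rank d.
    U, s, Vt = np.linalg.svd(X_hat, full_matrices=False)
    return U[:, :d] @ np.diag(s[:d]) @ Vt[:d]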
REFERENCES
[1] Alexander Fonarev, Oleksii Hrinchuk et al. Riemannian Optimization for Skip-Gram Negative Sampling. In ACL 2017.
[2] Omer Levy and Yoav Goldberg. Neural Word Embedding as Implicit Matrix Factorization. In NIPS 2014.
[3] Christian Lubich and Ivan Oseledets. A Projector-Splitting Integrator for Dynamical Low-Rank Approximation. BIT Numerical Mathematics, 2014.
SOURCE CODE
• Python implementation of RO-SGNS
• templates of basic experiments
• semantic similarity datasets
https://github.com/AlexGrinch/ro_sgns
FUTURE WORK
• Apply more advanced optimization techniques to Step 1 of the proposed scheme.
• Explore Step 2, i.e., obtaining embeddings from a given low-rank matrix.
6. EXAMPLES
• We examine words whose nearest neighbors in terms of cosine distance change significantly across methods.
• Table 2 contains the neighbors of the word "usa", with names of US states marked in bold.
• Table 2 reports the top 11-20 nearest neighbors, as the top 10 words turned out to be exactly names of states for all three methods. A retrieval sketch follows the table.
usa
SGD-SGNS               SVD-SPPMI              RO-SGNS
Neighbors       Dist.  Neighbors       Dist.  Neighbors       Dist.
akron           0.536  wisconsin       0.700  georgia         0.707
midwest         0.535  delaware        0.693  delaware        0.706
burbank         0.534  ohio            0.691  maryland        0.705
nevada          0.534  northeast       0.690  illinois        0.704
arizona         0.533  cities          0.688  madison         0.703
uk              0.532  southwest       0.684  arkansas        0.699
youngstown      0.532  places          0.684  dakota          0.690
utah            0.530  counties        0.681  tennessee       0.689
milwaukee       0.530  maryland        0.680  northeast       0.687
headquartered   0.527  dakota          0.674  nebraska        0.686
Table 2: Examples of the semantic neighbors for "usa".
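The neighbor lists above can be reproduced with a few lines of NumPy: rank the vocabulary by cosine similarity to the query and slice off positions 11-20 (position 0 is the query itself). The names vocab and inv_vocab are illustrative.

import numpy as np

def neighbors(W, vocab, inv_vocab, query, lo=10, hi=20):
    Wn = W / np.linalg.norm(W, axis=1, keepdims=True)  # row-normalize once
    sims = Wn @ Wn[vocab[query]]                       # cosine similarity to the query
    order = np.argsort(-sims)                          # descending similarity
    return [(inv_vocab[i], float(sims[i])) for i in order[1 + lo : 1 + hi]]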
Figure 2: Spearman’s correlation on each iteration.
• The best results were obtained when SVD-SPPMI was used as the initialization.
• The optimization procedure has to be stopped at some iteration to obtain a better model.
• The optimal value of K turned out to be the same on the test set and under 10-fold cross-validation (a selection sketch follows).
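A hedged sketch of one way to pick the stopping iteration K: evaluate each iterate on held-out folds (using the evaluate sketch after Table 1) and keep the best; the exact protocol in the paper may differ.

def select_K(iterates, folds, vocab):
    # iterates: list of W matrices, one per RO iteration;
    # folds: list of held-out (word1, word2, score) lists.
    mean_corr = [sum(evaluate(W, vocab, f) for f in folds) / len(folds)
                 for W in iterates]
    return max(range(len(mean_corr)), key=mean_corr.__getitem__)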
4. ALGORITHM
RO-SGNS
Require: gradient function ∇F : ℝ^{n×m} → ℝ^{n×m}, initialization (W_0, C_0), step size λ, maximum number of iterations K
Ensure: word embedding matrix W ∈ ℝ^{n×d}
 1: X_0 ← W_0 C_0^T
 2: U_0, S_0, V_0^T ← SVD(X_0)
 3: for i ← 1, ..., K do
 4:     X̂_i ← X_{i−1} + λ ∇F(X_{i−1})        // gradient ascent step (λ is the step size)
 5:     U_i, S_i ← QR(X̂_i V_{i−1})           // retract the point
 6:     V_i, S_i^T ← QR(X̂_i^T U_i)           // back to the manifold
 7:     X_i ← U_i S_i V_i^T
 8: end for
 9: U, Σ, V^T ← SVD(X_K)
10: W ← U √Σ                                  // compute word embeddings
11: return W
Note. The highlighted retraction formulas (lines 5-7) approximate [3]:
X_{i+1} = R(X_i + P_{T_{X_i} \mathcal{M}_d} \nabla F(X_i)).
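A minimal NumPy sketch of the whole loop, with the projector-splitting retraction of [3] exactly as in lines 5-7 above. Here grad_F stands for ∇F of the SGNS objective and is a placeholder; the authors' actual implementation is linked under SOURCE CODE.

import numpy as np

def ro_sgns(W0, C0, grad_F, step, K, d):
    X = W0 @ C0.T                                    # line 1: X_0 = W_0 C_0^T
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt[:d].T                                     # line 2: keep V_0 from the SVD
    for _ in range(K):
        X_hat = X + step * grad_F(X)                 # line 4: gradient ascent step
        U, _ = np.linalg.qr(X_hat @ V)               # line 5: U_i from QR
        V, St = np.linalg.qr(X_hat.T @ U)            # line 6: V_i, S_i^T from QR
        X = U @ St.T @ V.T                           # line 7: X_i = U_i S_i V_i^T
    U, s, _ = np.linalg.svd(X, full_matrices=False)  # line 9: final SVD
    return U[:, :d] * np.sqrt(s[:d])                 # line 10: W = U sqrt(Sigma)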