2. • fastText : Facebook AI Research's model, introduced in the papers Enriching Word Vectors with Subword Information and
Advances in Pre-Training Distributed Word Representations.
• Word2Vec : building on Y. Bengio's NNLM, T. Mikolov proposed Word2Vec (CBOW, Skip-gram),
which fastText extends.
3. Introduction
• Classical models assign a distinct vector to each word form, ignoring the internal structure of words.
e.g.) eat and eating are treated as unrelated words.
• The morphological features (morphological feature) of words are therefore lost.
- This is a serious limitation for morphologically rich languages:
1) French and Spanish verbs have more than 40 inflected forms.
2) Finnish nouns have 15 cases.
• Proposal: extend the continuous skip-gram model with subword information!
# 01.
# 02. FastText
4. Model
• The general model is Word2Vec's Skip-gram model.
• Each word is additionally represented as a bag of subword units (character 𝑛-grams).
• Boundary symbols <, > are added to the word to distinguish prefixes and suffixes.
• The word itself is also included as a special unit.
• Bag-of-Characters : a word is treated as the set of its character 𝑛-grams.
• The word's embedding is the sum of its 𝑛-gram vectors.
e.g.) the 𝑛-grams of "where" with 2 ≤ 𝑛 ≤ 5:
# 01. General Model
# 02. Subword Model
where
<w, wh, he, er, re, e> 2-gram
<wh, whe, her, ere, re> 3-gram
<whe, wher, here, ere> 4-gram
<wher, where, here> 5-gram
<where> special unit
v(where) =
v(<w) + v(wh) + ... + v(here>) + v(<where>)
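The decomposition above can be sketched in plain Python. A minimal sketch: the function name is made up, and the 2 ≤ n ≤ 5 range follows the slide's example (the paper itself uses 3 ≤ n ≤ 6):

```python
def char_ngrams(word, n_min=2, n_max=5):
    """Character n-grams of `word`, padded with the boundary symbols
    < and >, plus the special whole-word unit (e.g. <where>)."""
    padded = "<" + word + ">"
    grams = [padded[i:i + n]
             for n in range(n_min, n_max + 1)
             for i in range(len(padded) - n + 1)]
    grams.append(padded)  # special unit for the full word
    return grams

grams = char_ngrams("where")
print(grams[:6])   # the 2-grams: ['<w', 'wh', 'he', 'er', 're', 'e>']
print(grams[-1])   # the special unit: '<where>'
```

v(where) is then the sum of the vectors attached to these units.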
5. Model
• Scoring Function : the score of a (word, context) pair is the sum of dot products between the word's 𝑛-gram vectors and the context vector.
• OOV (out of vocabulary) words can still be represented:
e.g.) even if birthplace never appeared in training, its n-grams overlap with those of
birth and place, so an embedding vector can still be built.
• Robust to misspellings, which share most n-grams with the correct form:
ex) character 3-grams of where and the misspelling wherre:
- where : <wh, whe, her, ere, re>
- wherre : <wh, whe, her, err, rre, re>
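The overlap between a word and its misspelling can be checked directly. A minimal sketch, with `trigrams` a hypothetical helper mirroring the slide's 3-gram example:

```python
def trigrams(word):
    """Set of character 3-grams of `word`, padded with < and >."""
    padded = "<" + word + ">"
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

# Most 3-grams survive the typo, so the two words get similar vectors.
shared = trigrams("where") & trigrams("wherre")
print(sorted(shared))  # ['<wh', 'her', 're>', 'whe']
```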
# 02. Subword Model
𝐺 : 𝑛 -gram dictionary size
𝐺𝑤 ⊂ {1,…, 𝐺} : set of 𝑛-grams appearing in word 𝑤
𝑧𝑔 : vector representation of each 𝑛 -gram 𝑔
Scoring function of the subword model: s(w, c) = Σ_{g∈𝐺𝑤} 𝑧𝑔⊤ 𝑣𝑐
# 03. This simple model allows ...
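Using these definitions, the scoring function can be sketched numerically. A toy illustration only: the lookup tables and vectors are invented, not trained parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4

# Hypothetical lookup tables: G_w (n-grams of each word) and z_g (n-gram vectors).
G_w = {"where": ["<wh", "whe", "her", "ere", "re>", "<where>"]}
z = {g: rng.normal(size=dim) for g in G_w["where"]}

def score(word, v_c):
    """s(w, c) = sum over g in G_w of z_g . v_c."""
    return sum(z[g] @ v_c for g in G_w[word])

v_c = rng.normal(size=dim)
# Equivalent view: sum the n-gram vectors first, then take one dot product.
v_w = np.sum([z[g] for g in G_w["where"]], axis=0)
assert np.isclose(score("where", v_c), v_w @ v_c)
```

The assert shows why v(where) can be precomputed as a single vector: the score is linear in the n-gram vectors.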
6. Experimental setup
• Baselines : Word2Vec skipgram and cbow
• Optimization : stochastic gradient descent (SGD)
• training size = T, passes over the data = P, time = t
• with a fixed parameter 𝛾0, the step size at time t is 𝛾0(1 − t / (TP))
• word vectors dimension = 300
• negative samples = 5
• context window size = c, uniformly sample the size c between 1 and 5
• in order to subsample the most frequent words, a rejection threshold of 10−4 is used
• keep the words that appear at least 5 times in training set
• 𝛾0 = 0.025 for skip-gram(baseline), 0.05 for both cbow(baseline) and our model
• Training data : Wikipedia dumps in 9 languages
# 01. Baseline
# 02. Optimization
# 03. Implementation details (word2vec parameters reused)
# 04. Datasets
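The linear step-size decay above can be written out as follows. A minimal sketch: the function name is made up, and T, P follow the slide's notation:

```python
def step_size(t, T, P, gamma0=0.05):
    """SGD step size gamma0 * (1 - t / (T * P)): decays linearly from
    gamma0 at t = 0 down to 0 after P passes over T training words."""
    return gamma0 * (1.0 - t / (T * P))

print(step_size(0, T=1000, P=5))     # 0.05  (start of training)
print(step_size(2500, T=1000, P=5))  # 0.025 (halfway through)
print(step_size(5000, T=1000, P=5))  # 0.0   (end of training)
```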
7. Experiment
• Evaluation : correlation between human similarity judgements and the cosine similarity of the word vectors
• Models compared : sisg-, sisg (with n-gram vectors) vs. skipgram and cbow
• sisg (Subword Information Skip Gram) : OOV words are represented by summing their n-gram vectors
• sisg- : words absent from the training data are assigned a null vector
• cbow, skipgram : the Word2Vec baselines
• cbow, skipgram and sisg- all assign a null vector to words absent from the training data.
# 01. Human similarity judgement
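The evaluation pairs each human rating with the model's cosine similarity and reports the rank correlation between the two lists. A minimal sketch: the scores are invented, and this Spearman implementation ignores ties:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def spearman(a, b):
    """Spearman rank correlation for tie-free lists:
    the Pearson correlation of the ranks."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    ra -= ra.mean(); rb -= rb.mean()
    return float(ra @ rb / np.sqrt((ra @ ra) * (rb @ rb)))

human = [9.0, 7.5, 3.0, 1.0]  # invented human ratings for 4 word pairs
model = [0.9, 0.6, 0.4, 0.1]  # invented model cosine similarities for the same pairs
print(spearman(human, model))  # 1.0 — the model ranks the pairs exactly like humans
```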
8. Experiment
• On English WS353, sisg is on par with the baselines, since the dataset
consists of common words that are well covered by the training data;
on the RareWord dataset, sisg performs better.
• On every dataset, sisg outperforms sisg-, showing that
subword information helps for OOV words.
• The gains are largest on the Arabic / German / Russian data,
which are morphologically rich languages.
# 01. Human similarity judgement
9. Experiment
• Task : "A is to B as C is to D" — the
model must predict D.
• sisg beats the baselines on syntactic (grammatical) tasks.
• On semantic (meaning-based) tasks it does not help, and can even hurt
for German and Italian, which is attributed to the choice of
n-gram sizes.
• The largest improvements are for Cs (Czech) and De (German),
i.e. morphologically rich languages (morphologically rich language).
# 02. Word analogy tasks
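The analogy task above is usually answered by nearest-neighbor search around v_B − v_A + v_C. A toy sketch with invented embeddings (real evaluation uses the trained vectors):

```python
import numpy as np

# Invented toy embeddings for illustration only.
E = {
    "man":    np.array([1.0, 0.0, 0.1]),
    "woman":  np.array([1.0, 1.0, 0.1]),
    "king":   np.array([0.2, 0.0, 1.0]),
    "queen":  np.array([0.2, 1.0, 1.0]),
    "prince": np.array([0.2, 0.1, 1.0]),
}

def analogy(a, b, c):
    """Answer 'a is to b as c is to ?' with the word whose vector is
    closest (by cosine) to v_b - v_a + v_c."""
    target = E[b] - E[a] + E[c]
    def cos(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    # Exclude the query words themselves, as is standard for this task.
    return max((w for w in E if w not in {a, b, c}), key=lambda w: cos(E[w], target))

print(analogy("man", "woman", "king"))  # queen
```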
10. Experiment
• sisg outperforms the baselines for all sizes of the training data.
• With small portions of the data (1~20%), sisg- & cbow
suffer because many words fall out of vocabulary.
• cbow keeps improving as more data is added, while
more data does not necessarily improve sisg.
• sisg is robust to the size of the train dataset.
# 03. Effect of the size of the training data
11. Experiment
• Semantic task : larger n helps, Syntactic task : smaller n is sufficient.
• The best results appear in the table columns with n up to 5 or 6.
• 2-grams alone are not satisfactory: with a boundary symbol,
a 2-gram contains only one proper character.
# 04. Effect of the size of n-grams
13. Qualitative analysis
• For each selected word, the 3 most important character n-grams
often correspond to morphemes.
e.g.) for autofahrer (car driver), a top n-gram matches
Fahrer (driver).
• The important n-grams align with meaningful subunits:
• (EN) they correspond to affixes and the parts of compound words.
• (FR) they correspond to verb inflections (inflections).
# 01. Character n-grams and morphemes
14. Qualitative analysis
• Even for OOV words, similarity with in-vocabulary words can be computed from their n-grams.
• The figure matches the n-grams of each word pair by cosine similarity.
e.g.) "pav-" matches "-sphal-", and "young" matches "adoles".
# 02. Word similarity for OOV words
• X axis : n-grams of the OOV (unseen) word
• Y axis : n-grams of the in-vocabulary reference word
15. Conclusion
• FastText extends the skip-gram model with character n-gram subword information,
improving on Word2Vec.
• It trains fast & does not require any preprocessing or supervision.
• It can compute word representations for words that did not appear in the
training data.
• State-of-the-art performance!
# Because of its simplicity...