NLP2
FastText
Enriching Word Vectors with Subword Information
• fastText is a word-representation library from Facebook AI Research, based on the papers
Enriching Word Vectors with Subword Information and
Advances in Pre-Training Distributed Word Representations.
• Word2vec was proposed by T. Mikolov:
following Y. Bengio's NNLM, T. Mikolov introduced the Word2Vec models (CBOW and Skip-gram).
Introduction
• Most word-representation models assign a distinct vector to each word and ignore word-internal structure
e.g.) eat and eating are treated as unrelated words
• This is a limitation: morphological features are lost
- especially for morphologically rich languages
1) French and Spanish verbs have more than 40 inflected forms
2) Finnish nouns have 15 cases
• Proposal: extend the continuous skip-gram model with subword information!
# 01. Motivation
# 02. FastText
Model
• Starts from Word2Vec's Skip-gram model
• Each word is additionally represented as a bag of subword units
• boundary symbols <, > are added at the beginning and end of each word
• the word itself (with boundaries) is kept as a special unit
• Bag-of-Characters representation: character n-grams
• the word's Embedding is the sum of its n-gram vectors (see the sketch after the example below)
e.g.) the character n-grams of "where" with 2 ≤ n ≤ 5
# 01. General Model
# 02. Subword Model
where
2-grams : <w, wh, he, er, re, e>
3-grams : <wh, whe, her, ere, re>
4-grams : <whe, wher, here, ere>
5-grams : <wher, where, here>
special unit : <where>
v(where) =
v(<w) + v(wh) + ... + v(here>) + v(<where>)
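As an illustration only (this is not the fastText source code), a minimal Python sketch of the decomposition above; the random table z stands in for the learned n-gram vectors:

```python
import numpy as np

def char_ngrams(word, n_min=2, n_max=5):
    """Character n-grams of a word with boundary symbols, plus the word itself as a special unit."""
    wrapped = "<" + word + ">"                      # add boundary symbols < and >
    grams = []
    for n in range(n_min, n_max + 1):
        grams += [wrapped[i:i + n] for i in range(len(wrapped) - n + 1)]
    grams.append(wrapped)                           # special unit for the whole word
    return grams

print(char_ngrams("where"))
# ['<w', 'wh', 'he', 'er', 're', 'e>', '<wh', 'whe', 'her', 'ere', 're>', ..., '<where>']

# toy n-gram embedding table (in the real model these vectors are learned)
rng = np.random.default_rng(0)
z = {g: rng.normal(size=4) for g in char_ngrams("where")}

# the word vector is the sum of its n-gram vectors:
# v(where) = v(<w) + v(wh) + ... + v(here>) + v(<where>)
v_where = np.sum([z[g] for g in char_ngrams("where")], axis=0)
print(v_where)
```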
Model
• Scoring Function: a (word, context) pair is scored by summing dot products between the word's n-gram vectors and the context vector (see the formula below)
• OOV (out of vocabulary) words can still be represented
e.g.) even if birthplace never appears in training, its n-grams overlap with
birth and place, so an embedding vector can still be built for birthplace
• The representation is also robust to misspellings (shared n-grams)
ex) character 3-grams of where vs. the misspelling wherre
- where <wh, whe, her, ere, re>
- wherre <wh, whe, her, err, rre, re>
# 02. Subword Model
𝐺 : size of the 𝑛-gram dictionary
𝐺𝑤 ⊂ {1, …, 𝐺} : set of 𝑛-grams appearing in word 𝑤
𝑧𝑔 : vector representation of each 𝑛-gram 𝑔
• two words that share subwords share part of their representation, which helps learn reliable vectors for rare words
# 03. This simple model allows sharing representations across words ...
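Putting the notation above together, the scoring function and the negative-sampling skip-gram objective (to be minimized) from the paper can be written roughly as follows; v_c denotes the context vector and ℓ the logistic loss:

```latex
% Scoring of a (word, context) pair: sum of dot products between the
% word's n-gram vectors z_g and the context vector v_c
s(w, c) \;=\; \sum_{g \in \mathcal{G}_w} z_g^{\top} v_c

% Negative-sampling skip-gram objective with this subword score
% (\mathcal{C}_t: context words at position t, \mathcal{N}_{t,c}: sampled negatives)
\sum_{t=1}^{T} \sum_{c \in \mathcal{C}_t}
  \Big[ \ell\big(s(w_t, w_c)\big)
      + \sum_{n \in \mathcal{N}_{t,c}} \ell\big(-s(w_t, n)\big) \Big],
\qquad \ell(x) = \log\!\big(1 + e^{-x}\big)
```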
Experimental setup
• Baselines: the skipgram and cbow models of Word2Vec
• Optimization: stochastic gradient descent (SGD) with a linearly decaying step size
• training size = T, passes over the data = P, time = t
• given a fixed parameter 𝛾0, the step size at time t is 𝛾0(1 − t / (TP)) (see the sketch at the end of this slide)
• word vectors dimension = 300
• negative samples = 5
• context window size = c, uniformly sample the size c between 1 and 5
• in order to subsample the most frequent words, a rejection threshold of 10⁻⁴ is used
• keep the words that appear at least 5 times in training set
• 𝛾0 = 0.025 for skip-gram(baseline), 0.05 for both cbow(baseline) and our model
• trained on Wikipedia dumps in 9 languages
# 01. Baseline
# 02. Optimization
# 03. Implementation details (same hyper-parameter settings as word2vec)
# 04. Datasets
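A minimal sketch of the step-size schedule above (plain Python; the values in the example call are illustrative only):

```python
def step_size(gamma_0: float, t: float, T: float, P: float) -> float:
    """Linearly decaying SGD step size: gamma_0 * (1 - t / (T * P)).

    T = training size, P = passes over the data, t = progress so far.
    """
    return gamma_0 * (1.0 - t / (T * P))

# e.g. the cbow / sisg setting gamma_0 = 0.05, halfway through a single pass
print(step_size(0.05, t=500_000, T=1_000_000, P=1))   # 0.025
```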
Experiment
• Evaluation: Spearman's rank correlation between human judgement and the cosine similarity of the word vectors (see the sketch below)
• the models with n-gram vectors (sisg-, sisg) are compared against skipgram and cbow
• sisg (Subword Information Skip-Gram) : OOV words are represented by summing their n-gram vectors
• sisg- : words not in the training data are assigned a null vector
• cbow, skipgram : the plain Word2Vec baselines
• for cbow, skipgram, and sisg-, words absent from the training data get a null vector; only sisg can build vectors for them.
# 01. Human similarity judgement
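A minimal sketch of this evaluation protocol, assuming toy word pairs, human scores, and vectors (none of these numbers come from the paper):

```python
import numpy as np
from scipy.stats import spearmanr

def cosine(u, v):
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0   # null vector (sisg- / OOV) scores 0

# toy word vectors and human similarity judgements (scale 0-10)
vectors = {
    "car":   np.array([0.9, 0.1, 0.3]),
    "truck": np.array([0.8, 0.2, 0.4]),
    "apple": np.array([0.1, 0.9, 0.2]),
}
pairs = [("car", "truck", 8.5), ("car", "apple", 2.0), ("truck", "apple", 1.5)]

human = [score for _, _, score in pairs]
model = [cosine(vectors[a], vectors[b]) for a, b, _ in pairs]
rho, _ = spearmanr(human, model)
print(f"Spearman rank correlation: {rho:.2f}")
```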
Experiment
• sisg does not clearly outperform the baselines on the English WS353 dataset,
which consists of common words,
but it does outperform them on the RareWord (RW) dataset,
which consists of rare words.
• on every dataset, sisg performs at least as well as sisg-,
showing the benefit of subword information for out-of-vocabulary words
• the gains are largest on the Arabic / German / Russian data,
i.e. on morphologically rich languages.
# 01. Human similarity judgement
Experiment
• Task: given "A is to B as C is to D",
the model must predict D (see the sketch below)
• on syntactic (grammatical) tasks, the model outperforms the baselines
• on semantic (meaning) tasks, it does not improve
• and it even degrades for German and Italian;
this can be reduced by tuning the n-gram size
• the improvement over the baselines is largest for Cs (Czech) and De (German),
i.e. for morphologically rich languages
# 02. Word analogy tasks
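A minimal sketch of this prediction rule: D is the word whose vector is closest (by cosine) to vec(B) - vec(A) + vec(C), excluding the query words; the toy vectors are illustrative, not trained:

```python
import numpy as np

def predict_d(vectors, a, b, c):
    """Answer 'a is to b as c is to ?' by nearest-neighbour search over cosine similarity."""
    query = vectors[b] - vectors[a] + vectors[c]
    best, best_sim = None, -np.inf
    for w, v in vectors.items():
        if w in (a, b, c):                      # exclude the query words themselves
            continue
        sim = np.dot(query, v) / (np.linalg.norm(query) * np.linalg.norm(v))
        if sim > best_sim:
            best, best_sim = w, sim
    return best

vectors = {
    "king":  np.array([0.8, 0.7]),
    "queen": np.array([0.8, 0.1]),
    "man":   np.array([0.3, 0.8]),
    "woman": np.array([0.3, 0.2]),
}
print(predict_d(vectors, "man", "king", "woman"))   # expected: "queen"
```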
• sisg outperforms the baselines for every training-set size
• with little data (1~20% of the dump), the out-of-vocabulary rate grows, so sisg- & cbow degrade quickly
• as more data is added, cbow keeps improving, while
sisg saturates quickly and does not necessarily improve further
• sisg therefore gives good word vectors even from a small train dataset
Experiment
# 03. Effect of the size of the training data
• Semantic tasks tend to benefit from larger n, while syntactic tasks already work well with smaller n
• see columns 5 and 6 of the results table
• 2-grams alone are not informative enough: a 2-gram that touches a word boundary contains only one character of the word itself
Experiment
# 04. Effect of the size of n-grams
Experiment
• an LSTM language model whose lookup table is initialized with sisg vectors outperforms one initialized with sg vectors,
showing that subword information also helps language modeling
• the perplexity reduction is largest for morphologically rich languages with large vocabularies,
such as Cs (Czech) and Ru (Russian)
# 05. Language modeling
Qualitative analysis
• for selected words, the three most important character n-grams are examined
e.g.) for the German compound autofahrer (car driver),
the top n-grams correspond to Fahrer (driver)
• the most important n-grams tend to correspond to morphemes
• for English (EN), they often correspond to affixes (prefixes and suffixes)
• for French (FR), they capture verb inflections
# 01. Character n-grams and morphemes
Qualitative analysis
• To show how OOV words are handled, the character n-grams of an OOV word are compared against those of an in-vocabulary word (see the sketch below)
• cells with high cosine similarity correspond to matching subwords / morphemes.
e.g.) "pav-" and "-sphal-" (paved surfaces / asphalt), "young" and "adoles" (young / adolescent)
# 02. Word similarity for OOV words
• X axis : character n-grams of the OOV (query) word
• Y axis : character n-grams of the in-vocabulary word
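A minimal sketch of how such an n-gram similarity matrix can be computed; the word pair and the random n-gram vectors are stand-ins for what a trained model would supply:

```python
import numpy as np

def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word with boundary symbols."""
    wrapped = "<" + word + ">"
    return [wrapped[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(wrapped) - n + 1)]

oov_word, known_word = "pavement", "asphalt"        # illustrative word pair
grams_x, grams_y = char_ngrams(oov_word), char_ngrams(known_word)

# stand-in n-gram vectors (a trained model would supply the learned z_g)
rng = np.random.default_rng(0)
z = {g: rng.normal(size=8) for g in set(grams_x + grams_y)}

X = np.array([z[g] for g in grams_x])               # rows    : n-grams of the OOV word
Y = np.array([z[g] for g in grams_y])               # columns : n-grams of the known word
sim = (X @ Y.T) / np.outer(np.linalg.norm(X, axis=1), np.linalg.norm(Y, axis=1))
print(sim.shape)                                    # (len(grams_x), len(grams_y))
```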
Conclusion
• FastText extends the skip-gram model with character n-gram subword information, improving on Word2Vec
• trains fast & does not require any preprocessing or supervision
• can compute word representations for words that did not appear in the training data
• State-of-the-art performance !
# Because of its simplicity...
Any Questions?