[DL Reading Group] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
DEEP LEARNING JP
[DLPapers]
http://deeplearning.jp/
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Makoto Kawano, Keio University
Task 1: Masked Language Model
● A mismatch arises between pre-training and fine-tuning
‣ The [MASK] token never appears during fine-tuning
‣ Instead of always replacing with [MASK], 15% of the tokens in each sequence are selected for prediction (a short sketch of the procedure follows this list)
• Example: my dog is hairy -> hairy is selected
• 80%: replace with the [MASK] token
• my dog is hairy -> my dog is [MASK]
• 10%: replace with a random word
• my dog is hairy -> my dog is apple
• 10%: keep the word unchanged (biases the representation toward the actually observed word)
• my dog is hairy -> my dog is hairy
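A minimal Python sketch of this 80/10/10 corruption, assuming whitespace-split string tokens; mask_tokens, vocab, and mask_prob are illustrative names, not the paper's implementation.

import random

# Sketch of the masked-LM corruption described above (assumption:
# tokens are plain strings; names here are illustrative only).
def mask_tokens(tokens, vocab, mask_prob=0.15):
    inputs, labels = list(tokens), [None] * len(tokens)
    # Select roughly 15% of the positions as prediction targets.
    num_targets = max(1, round(len(tokens) * mask_prob))
    for i in random.sample(range(len(tokens)), num_targets):
        labels[i] = tokens[i]                 # the model must recover the original token
        r = random.random()
        if r < 0.8:
            inputs[i] = "[MASK]"              # 80%: replace with [MASK]
        elif r < 0.9:
            inputs[i] = random.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: leave the token unchanged
    return inputs, labels

# Example: a selected token ("hairy" or another) is masked, randomized, or kept.
print(mask_tokens("my dog is hairy".split(), ["apple", "dog", "cat", "is"]))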