The Illustrated BERT, ELMo, and co.
http://jalammar.github.io/illustrated-bert/
전상현
What to take away today
NLP’s Imagenet Moment
Transfer Learning
BERT
Transformer?
Imagenet Moment
Transfer Learning
http://ruder.io/nlp-imagenet/
https://www.slideshare.net/iljakuzovkin/paper-overview-deep-residual-learning-for-image-recognition
Transfer Learning
• In practice, very few people train an entire Convolutional Network from scratch (with random initialization), because it is relatively rare to have a dataset of sufficient size. Instead, it is common to pretrain a ConvNet on a very large dataset (e.g. ImageNet, which contains 1.2 million images with 1000 categories), and then use the ConvNet either as an initialization or a fixed feature extractor for the task of interest.
https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html
https://towardsdatascience.com/a-comprehensive-hands-on-guide-to-transfer-learning-with-real-world-applications-in-deep-learning-212bf3b2f27a
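A minimal PyTorch sketch of the two options described in the quote above, loosely following the linked tutorial; the ResNet-18 backbone and the 10-class target task are illustrative assumptions, not part of the slides.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a ResNet-18 pretrained on ImageNet (1.2 million images, 1000 categories).
model = models.resnet18(pretrained=True)

# Option A: fixed feature extractor -- freeze all pretrained weights.
for param in model.parameters():
    param.requires_grad = False

# Replace the 1000-way ImageNet classifier with a new head for the target task
# (a hypothetical 10-class problem here); only this layer is trained.
num_features = model.fc.in_features
model.fc = nn.Linear(num_features, 10)

optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-3, momentum=0.9)

# Option B: initialization -- skip the freezing loop above and instead
# fine-tune every parameter with a small learning rate.
```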
• Pretrained ImageNet models have been used to achieve state-of-the-art results in tasks such as object detection, semantic segmentation, human pose estimation, and video recognition.
https://www.slideshare.net/iljakuzovkin/paper-overview-deep-residual-learning-for-image-recognition
NLP’s Imagenet Moment
http://ruder.io/nlp-imagenet/
From Shallow to Deep Pre-Training
• At the core of the recent advances of ULMFiT, ELMo, and the OpenAI transformer is one key paradigm shift: going from just initializing the first layer of our models to pretraining the entire model with hierarchical representations.

• It is very likely that in a year’s time NLP practitioners will download pretrained language models rather than pretrained word embeddings.
http://ruder.io/nlp-imagenet/
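To make the older "first layer only" setup concrete, here is a minimal sketch; the toy vocabulary, the 300-dimensional vectors, and the LSTM encoder are assumptions for illustration, not anything from the post. Pretrained word vectors are copied into the embedding layer, while everything above it still starts from random initialization.

```python
import numpy as np
import torch
import torch.nn as nn

vocab = ["the", "bank", "river", "money"]   # toy vocabulary
dim = 300                                   # typical word2vec/GloVe size

# Pretend these rows were read from a pretrained word-vector file.
pretrained_vectors = np.random.randn(len(vocab), dim).astype("float32")

embedding = nn.Embedding(len(vocab), dim)
embedding.weight.data.copy_(torch.from_numpy(pretrained_vectors))

# Only this first layer carries pretrained knowledge; the encoder on top is
# still trained from scratch -- the situation the paradigm shift moves away from.
encoder = nn.LSTM(dim, 256, batch_first=True)
```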
Imagenet Moment
Download a pretrained model, then train it further for your own purpose
https://medium.com/@RaghavPrabhu/understanding-of-convolutional-neural-network-cnn-deep-learning-99760835f148
NLP’s Imagenet Moment
Download a pretrained model, then train it further for your own purpose
https://github.com/google-research/bert
https://medium.com/@RaghavPrabhu/understanding-of-convolutional-neural-network-cnn-deep-learning-99760835f148
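As a rough illustration of the download-then-fine-tune workflow. This sketch uses the Hugging Face transformers library rather than the google-research/bert scripts linked above, so the class names and the toy sentiment example are assumptions, not that repository's API.

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Download the published checkpoint; every downstream task starts from here.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

# One fine-tuning step on a toy labeled example.
inputs = tokenizer("this movie was great", return_tensors="pt")
labels = torch.tensor([1])

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss = model(**inputs, labels=labels).loss
loss.backward()
optimizer.step()
```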
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Bidirectional Encoder Representations from Transformers
BERT / GPT / ELMo
Input Representation
• WordPiece embeddings

• Positional embeddings with supported sequence lengths up to 512 tokens

• Segment embeddings marking sentence A vs. sentence B

WordPieceModel
https://github.com/google/sentencepiece
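A sketch of how the pieces above combine, assuming the BERT-Base sizes (30,522 WordPiece vocabulary, 768-dimensional hidden states, 512 positions): the model input is the element-wise sum of token, segment, and position embeddings. The code is illustrative, not taken from the repository.

```python
import torch
import torch.nn as nn

vocab_size, hidden, max_len = 30522, 768, 512    # BERT-Base vocabulary/width/length

token_emb = nn.Embedding(vocab_size, hidden)      # WordPiece token ids
segment_emb = nn.Embedding(2, hidden)             # sentence A vs. sentence B
position_emb = nn.Embedding(max_len, hidden)      # learned positions, up to 512

def bert_input(token_ids, segment_ids):
    # token_ids, segment_ids: (batch, seq_len) integer tensors
    positions = torch.arange(token_ids.size(1)).unsqueeze(0)
    return token_emb(token_ids) + segment_emb(segment_ids) + position_emb(positions)

x = bert_input(torch.tensor([[101, 7592, 102]]),  # [CLS] hello [SEP]
               torch.tensor([[0, 0, 0]]))
print(x.shape)  # torch.Size([1, 3, 768])
```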
we do not use traditional left-to-right or right-to-left language models to pre-train BERT.
Task #1: Masked LM
Task #2: Next Sentence Prediction
Unsupervised Learning
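A toy sketch of the Task #1 masking procedure, following the ratios reported in the paper (15% of tokens selected; of those, 80% become [MASK], 10% a random token, 10% left unchanged); the tiny vocabulary and sentence are placeholders.

```python
import random

MASK = "[MASK]"
VOCAB = ["the", "man", "went", "to", "store", "river"]  # toy vocabulary

def mask_tokens(tokens, mask_prob=0.15):
    """Return (masked tokens, labels); label is None where nothing is predicted."""
    masked, labels = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            labels.append(tok)                        # model must predict the original
            r = random.random()
            if r < 0.8:
                masked.append(MASK)                   # 80%: replace with [MASK]
            elif r < 0.9:
                masked.append(random.choice(VOCAB))   # 10%: random token
            else:
                masked.append(tok)                    # 10%: keep unchanged
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels

print(mask_tokens("the man went to the store".split()))
```

Task #2 then adds one binary label per sentence pair saying whether sentence B really follows sentence A in the corpus, so both objectives come from unlabeled text.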
Training of BERT-Base was performed on 4 Cloud TPUs in Pod configuration (16 TPU chips total). Training of BERT-Large was performed on 16 Cloud TPUs (64 TPU chips total). Each pre-training took 4 days to complete.
GLUE
SQuAD 1.1
Differences between RNNs and Transformers
1. An RNN has to compute step by step in sequence, so it is hard to parallelize
2. With a Transformer (attention), the computation can be parallelized
3. Because it parallelizes, it is faster, and deeper and wider networks become feasible to train
http://jalammar.github.io/visualizing-neural-machine-translation-mechanics-of-seq2seq-models-with-attention/
http://jalammar.github.io/illustrated-transformer/
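A small sketch of that contrast (the shapes and the GRU cell are arbitrary choices): the recurrent loop has to visit positions one at a time, while scaled dot-product attention relates every pair of positions in one batched matrix multiplication.

```python
import math
import torch
import torch.nn as nn

x = torch.randn(8, 100, 64)          # (batch, seq_len, features)

# RNN: the hidden state at step t depends on step t-1, so the time loop
# cannot be parallelized across positions.
cell, h = nn.GRUCell(64, 64), torch.zeros(8, 64)
for t in range(x.size(1)):
    h = cell(x[:, t], h)

# Self-attention: one matrix multiplication relates all positions at once,
# which is what makes deeper and wider networks practical to train.
q = k = v = x
scores = q @ k.transpose(1, 2) / math.sqrt(q.size(-1))   # (8, 100, 100)
out = torch.softmax(scores, dim=-1) @ v                   # (8, 100, 64)
```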
Exploring Randomly Wired Neural Networks for Image Recognition
In the end, what matters is not exactly how each individual connection is wired, but building a structure in which a large (wide and deep) network can be trained well, and then training it well on plenty of data.
Now, on to the main topic…
http://jalammar.github.io/illustrated-bert/
To look at later (as a follow-up?):
http://jalammar.github.io/illustrated-transformer/
http://jalammar.github.io/visualizing-neural-machine-translation-mechanics-of-seq2seq-models-with-attention/
