Two sentences are tokenized and encoded by a BERT model. The first sentence describes two kids playing with a green crocodile float in a swimming pool. The second sentence describes two kids pushing an inflatable crocodile around in a pool. The tokenized sentences are passed through the BERT model, which outputs the encoded representations of the token sequences.
These slides were used by Umemoto at an internal technical seminar at our company. They explain the Transformer, an architecture that has attracted a great deal of attention in recent years.
The "Arithmer Seminar" is held weekly; professionals from inside and outside the company lecture on their respective areas of expertise. The slides were prepared by the lecturer and are shared here with their permission.
Arithmer Inc. is a mathematics company that grew out of the Graduate School of Mathematical Sciences at the University of Tokyo. We apply modern mathematics to bring advanced AI systems into solutions across a wide range of fields. Our job is to work out how to use AI well to make work more efficient and to produce results that are useful to people.
9. GLUE benchmark tasks
  The Corpus of Linguistic Acceptability (CoLA): Matthews corr.
  The Stanford Sentiment Treebank (SST-2): accuracy
  Microsoft Research Paraphrase Corpus (MRPC): F1 / accuracy
  Semantic Textual Similarity Benchmark (STS-B): Pearson / Spearman corr.
  Quora Question Pairs (QQP): F1 / accuracy
  MultiNLI Matched (MNLI-m): accuracy (sentence pairs labeled contradiction, entailment or neutral; same genres as the training data)
  MultiNLI Mismatched (MNLI-mm): accuracy (genres not seen in the training data)
  Question NLI (QNLI): accuracy
  Recognizing Textual Entailment (RTE): accuracy
  Winograd NLI (WNLI): accuracy
  Diagnostics Main: Matthews corr.
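For reference, here is a minimal sketch of loading one of these tasks, MNLI, with the Hugging Face `datasets` package (an assumption; the package is not used elsewhere in these slides). The label ids follow the dataset's ClassLabel ordering.

from datasets import load_dataset

# Assumed helper library (not part of the slides): load the GLUE MultiNLI task.
mnli = load_dataset("glue", "mnli")
example = mnli["train"][0]
print(example["premise"])
print(example["hypothesis"])
print(example["label"])   # ClassLabel id: 0 = entailment, 1 = neutral, 2 = contradiction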
14. 14
Text 1:
position_id: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
text:        two kids are playing in a swimming pool with a green colored crocodile float .
ids:         2048 4268 2024 2652 1999 1037 5742 4770 2007 1037 2665 6910 21843 14257 1012

Text 2:
position_id: 0 1 2 3 4 5 6 7 8 9 10 11 12
text:        two kids push an in ##fl ##atable crocodile around in a pool .
ids:         2048 4268 5245 2019 1999 10258 27892 21843 2105 1999 1037 4770 1012
15. 15
['[CLS]', 'two', 'kids', 'are', 'playing', 'in', 'a', 'swimming', 'pool', 'with', 'a',
'green', 'colored', 'crocodile', 'float', '.', '[SEP]', 'two', 'kids', 'push', 'an',
'in', '##fl', '##atable', 'crocodile', 'around', 'in', 'a', 'pool', '.', '[SEP]']
Text 1:
position_id: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
text:        two kids are playing in a swimming pool with a green colored crocodile float .
ids:         2048 4268 2024 2652 1999 1037 5742 4770 2007 1037 2665 6910 21843 14257 1012

Text 2:
position_id: 0 1 2 3 4 5 6 7 8 9 10 11 12
text:        two kids push an in ##fl ##atable crocodile around in a pool .
ids:         2048 4268 5245 2019 1999 10258 27892 21843 2105 1999 1037 4770 1012
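The subword split above can be reproduced directly with the tokenizer that slide 21 uses; a minimal sketch ("inflatable" is not in the WordPiece vocabulary, so it is broken into 'in', '##fl', '##atable'):

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
tokens = tokenizer.tokenize("Two kids push an inflatable crocodile around in a pool.")
print(tokens)                                   # ['two', 'kids', 'push', 'an', 'in', '##fl', '##atable', ...]
print(tokenizer.convert_tokens_to_ids(tokens))  # the ids shown in the table above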
16. Pre-training: Masked Language Model
[Figure: in the pair "[CLS] two kids are playing ... [SEP] two kids ... [SEP]", some tokens are replaced with [MASK] or with a random word (e.g. "dog"); the model predicts the original tokens at those positions (pred1 "kids", pred2 "playing", pred3 "two", pred4 "two"), and each prediction is trained with a cross-entropy loss.]
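A minimal sketch of the masked-LM objective in this figure, with placeholder logits instead of real BERT outputs (an illustration, not the actual pre-training code): each masked or replaced position contributes one cross-entropy term between the predicted vocabulary distribution and the original token id.

import tensorflow as tf

# Placeholder logits stand in for BERT's predictions at the 4 corrupted positions;
# the targets are the ids of the original tokens (kids, playing, two, two).
target_ids = tf.constant([4268, 2652, 2048, 2048])
logits = tf.random.normal((4, 30522))             # (n_masked, vocab_size)
loss = tf.keras.losses.sparse_categorical_crossentropy(target_ids, logits, from_logits=True)
print(tf.reduce_mean(loss))                       # mean cross-entropy over the masked tokens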
17. Pre-training: Next Sentence Prediction
[Figure: for the same token pair "[CLS] two kids are playing ... [SEP] two kids ... [SEP]", the output at the [CLS] position predicts whether the second sentence actually follows the first (IsNext or NotNext, pred1), trained with a cross-entropy loss.]
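A minimal sketch of the next-sentence-prediction head (an illustration with a random vector standing in for the pooled [CLS] output): a 2-way classifier over IsNext / NotNext trained with cross-entropy.

import tensorflow as tf

cls_vector = tf.random.normal((1, 768))       # placeholder for the pooled [CLS] representation
nsp_head = tf.keras.layers.Dense(2)           # logits for the two classes
logits = nsp_head(cls_vector)
label = tf.constant([0])                      # assumed convention: 0 = IsNext, 1 = NotNext
loss = tf.keras.losses.sparse_categorical_crossentropy(label, logits, from_logits=True)
print(loss)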
21. 21
import tensorflow as tf
from transformers import BertTokenizer, TFBertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

text1 = "[CLS] Two kids are playing in a swimming pool with a green colored crocodile float. [SEP]"
text2 = "Two kids push an inflatable crocodile around in a pool. [SEP]"

tokenized_text = tokenizer.tokenize(text1 + " " + text2)
print(tokenized_text)

indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)
pos_sep = tokenized_text.index("[SEP]") + 1  # position just after the [SEP] closing the 1st sentence
segments_ids = [0]*pos_sep + [1]*(len(indexed_tokens)-pos_sep)  # token_type_ids: 0 = 1st sentence, 1 = 2nd

tokens_tensor = tf.Variable([indexed_tokens])
segments_tensors = tf.Variable([segments_ids])

model = TFBertForSequenceClassification.from_pretrained('bert-base-uncased')
model.summary()

outputs = model(tokens_tensor, token_type_ids=segments_tensors)
print(outputs)
27. 27
TFBertForSequenceClassification
  TFBertMainLayer
    TFBertEmbeddings
    TFBertEncoder
      TFBertLayer (x12 in bert-base), each containing:
        TFBertAttention (TFBertSelfAttention, TFBertSelfOutput)
        TFBertIntermediate
        TFBertOutput
    TFBertPooler
Inputs to TFBertEmbeddings: input_ids, position_ids, token_type_ids, inputs_embeds
TFBertEmbeddings flow:
  input_ids, position_ids and token_type_ids each have shape (n_seq,).
  word_embeddings: a Weight matrix of shape (vocab_size, dim), gathered with input_ids -> (n_seq, dim)
  position_embeddings: an Embedding indexed by position_ids -> (n_seq, dim)
  token_type_embeddings: an Embedding of shape (type_vocab_size, dim) -> (n_seq, dim)
  The three embeddings are summed to (n_seq, dim), then LayerNormalization and Dropout produce hidden_states of shape (n_seq, dim).
Example input sequence, padded to max_position_embeddings=512:
  [CLS] two kids are playing in a swimming pool ... [SEP] [PAD] ... [PAD]
hidden_size=768
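A minimal sketch of this embedding stage with freshly initialized weights (an illustration of the flow above, not the actual TFBertEmbeddings code):

import tensorflow as tf

vocab_size, type_vocab_size, max_pos, dim = 30522, 2, 512, 768

word_emb = tf.keras.layers.Embedding(vocab_size, dim)       # gathered by input_ids
pos_emb = tf.keras.layers.Embedding(max_pos, dim)           # indexed by position_ids
type_emb = tf.keras.layers.Embedding(type_vocab_size, dim)  # indexed by token_type_ids
norm = tf.keras.layers.LayerNormalization(epsilon=1e-12)
dropout = tf.keras.layers.Dropout(0.1)

def embed(input_ids, token_type_ids, training=False):
    position_ids = tf.range(tf.shape(input_ids)[-1])        # 0, 1, ..., n_seq-1
    x = word_emb(input_ids) + pos_emb(position_ids) + type_emb(token_type_ids)
    return dropout(norm(x), training=training)              # hidden_states, shape (n_seq, dim)

hidden_states = embed(tf.constant([2048, 4268, 2024]), tf.constant([0, 0, 0]))
print(hidden_states.shape)                                  # (3, 768)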
30. Multi-head self-attention (TFBertSelfAttention)
(Same module hierarchy as on slide 27; this slide looks inside TFBertAttention at TFBertSelfAttention.)
Inputs: hidden_states (n_seq, dim) and attention_mask (n_seq,).
  Three Dense layers project the input into query (Q), key (K) and value (V), each of shape (n_seq, dim).
  Q, K and V are split into n_head heads: (n_head, n_seq, dim/n_head).
  The attention scores Q K^T have shape (n_head, n_seq, n_seq); the attention_mask is added to them so that padded positions are suppressed.
  softmax followed by Dropout gives the attention weights (n_head, n_seq, n_seq).
  Multiplying the weights with V gives (n_head, n_seq, dim/n_head), which is Reshaped back to (n_seq, dim) and returned as hidden_states.
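A minimal sketch of this flow with the shapes from the diagram (an illustration, not the actual TFBertSelfAttention implementation; the 1/sqrt(dim/n_head) scaling is the standard scaled dot-product used by BERT):

import tensorflow as tf

n_seq, dim, n_head = 32, 768, 12
d_head = dim // n_head

hidden_states = tf.random.normal((n_seq, dim))
attention_mask = tf.zeros((n_seq,))            # additive mask: 0 for real tokens, large negative for [PAD]

dense_q = tf.keras.layers.Dense(dim)           # query (Q)
dense_k = tf.keras.layers.Dense(dim)           # key (K)
dense_v = tf.keras.layers.Dense(dim)           # value (V)

def split_heads(x):                            # (n_seq, dim) -> (n_head, n_seq, dim/n_head)
    return tf.transpose(tf.reshape(x, (n_seq, n_head, d_head)), (1, 0, 2))

q = split_heads(dense_q(hidden_states))
k = split_heads(dense_k(hidden_states))
v = split_heads(dense_v(hidden_states))

scores = tf.matmul(q, k, transpose_b=True) / tf.sqrt(float(d_head))   # (n_head, n_seq, n_seq)
scores = scores + attention_mask                                      # broadcast over heads and query positions
probs = tf.nn.softmax(scores, axis=-1)                                # attention weights
probs = tf.keras.layers.Dropout(0.1)(probs)                           # no-op outside training
context = tf.matmul(probs, v)                                         # (n_head, n_seq, dim/n_head)
context = tf.reshape(tf.transpose(context, (1, 0, 2)), (n_seq, dim))  # back to (n_seq, dim)
print(context.shape)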
35. 35
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
https://arxiv.org/abs/1810.04805
Transformers
https://huggingface.co/transformers
3rd-party pre-trained models
https://huggingface.co/models