PART 3: BERT
3. BERT
The BERT model is built upon the Transformer architecture, which consists of a stack of identical encoder layers.
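A minimal sketch of such a stack (not the authors' code), using PyTorch's built-in encoder modules with the BERT-Base sizes reported in the paper (12 layers, hidden size 768, 12 attention heads, feed-forward size 3072):

```python
# Sketch only: a stack of identical Transformer encoder layers.
import torch
import torch.nn as nn

encoder_layer = nn.TransformerEncoderLayer(
    d_model=768,           # hidden size H
    nhead=12,              # attention heads A
    dim_feedforward=3072,  # inner feed-forward size
    activation="gelu",     # BERT uses GELU activations
    batch_first=True,
)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=12)  # L identical layers

x = torch.randn(2, 128, 768)   # dummy batch: 2 sequences of 128 token embeddings
out = encoder(x)               # contextualized representations, same shape
print(out.shape)               # torch.Size([2, 128, 768])
```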
3. BERT
BERT’s pre-training pipeline consists of two main stages:
● pre-processing
● pre-training
3. BERT
In the pre-processing stage, the text data is tokenized
into sentences and then further divided into smaller
segments called "tokens." These tokens are then
encoded into numerical representations using a
vocabulary mapping.
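A toy sketch of this step (illustrative only; real BERT uses WordPiece subword tokenization with a 30k-token vocabulary):

```python
# Toy pre-processing: split a sentence into tokens and map them to IDs
# through a vocabulary. The whitespace split and tiny vocabulary are
# simplifications for illustration.
vocab = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
         "the": 4, "cat": 5, "sat": 6, "on": 7, "mat": 8}

def encode(sentence):
    tokens = ["[CLS]"] + sentence.lower().split() + ["[SEP]"]
    return [vocab.get(tok, vocab["[UNK]"]) for tok in tokens]

print(encode("The cat sat on the mat"))
# [2, 4, 5, 6, 7, 4, 8, 3]
```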
3. BERT
In the pre-training stage, BERT is trained on the pre-processed data with two objectives:
● MLM task
● NSP task
3. BERT
Masked Language Modeling (MLM)
BERT randomly masks a certain percentage of the input tokens and predicts the masked tokens based on the surrounding context.
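The masking scheme in the paper selects 15% of the tokens; of those, 80% become [MASK], 10% are replaced by a random token, and 10% stay unchanged. A small sketch (token and vocabulary IDs are illustrative placeholders):

```python
# Sketch of MLM input corruption; the model is trained to recover the
# original token at every masked position.
import random

MASK_ID, VOCAB_SIZE = 103, 30522   # placeholder IDs in the style of bert-base-uncased

def mask_tokens(token_ids, mask_prob=0.15):
    inputs, labels = list(token_ids), [-100] * len(token_ids)  # -100 = ignored by the loss
    for i, tok in enumerate(token_ids):
        if random.random() < mask_prob:
            labels[i] = tok                                # target: the original token
            r = random.random()
            if r < 0.8:
                inputs[i] = MASK_ID                        # 80%: replace with [MASK]
            elif r < 0.9:
                inputs[i] = random.randrange(VOCAB_SIZE)   # 10%: random token
            # remaining 10%: keep the original token unchanged
    return inputs, labels

print(mask_tokens([2, 4, 5, 6, 7, 4, 8, 3]))
```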
3. BERT
Next Sentence Prediction (NSP): BERT also
learns to predict whether two sentences appear
consecutively in the original text or not
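In the paper, training pairs are built so that half the time sentence B actually follows sentence A (label IsNext) and half the time B is a random sentence from the corpus (label NotNext). A toy sketch with an illustrative corpus:

```python
# Sketch of NSP pair construction from an ordered list of sentences.
import random

corpus = ["the cat sat on the mat", "it fell asleep in the sun",
          "stock prices rose sharply", "the meeting was postponed"]

def make_nsp_pair(idx):
    sent_a = corpus[idx]
    if random.random() < 0.5 and idx + 1 < len(corpus):
        return sent_a, corpus[idx + 1], "IsNext"          # true next sentence
    sent_b = random.choice([s for j, s in enumerate(corpus) if j != idx + 1])
    return sent_a, sent_b, "NotNext"                      # random sentence

print(make_nsp_pair(0))
```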
3. BERT
BERT is pre-trained on a large corpus of unlabeled text, such as Wikipedia and BooksCorpus, amounting to billions of words and millions of sentences. This diverse and extensive dataset helps BERT learn rich, contextual language representations.
3. BERT
Although BERT learns contextual representations during pre-training, it needs to be further fine-tuned on task-specific data to achieve optimal performance.
Illustration of the pre-training / fine-tuning approach: three different downstream NLP tasks (MNLI, NER, and SQuAD) are all solved with the same pre-trained language model by fine-tuning on the specific task. Image credit: Devlin et al., 2019.
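A minimal fine-tuning sketch using the Hugging Face `transformers` library (an implementation choice not made in the slides): a task-specific classification head is placed on top of the pre-trained encoder and all parameters are updated on labeled task data.

```python
import torch
from transformers import BertTokenizerFast, BertForSequenceClassification

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased",
                                                      num_labels=2)

texts = ["great movie", "terrible plot"]          # toy labeled task data
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # lr in the paper's fine-tuning range
model.train()
outputs = model(**batch, labels=labels)           # forward pass through encoder + task head
outputs.loss.backward()                           # fine-tune all parameters end-to-end
optimizer.step()
print(float(outputs.loss))
```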
PART 4: EXPERIMENTS
4. EXPERIMENTS
4.1 GLUE
The General Language Understanding Evaluation
(GLUE) benchmark is a collection of resources for
training, evaluating, and analyzing natural language
understanding systems.
4. EXPERIMENTS
Table 1: GLUE test results
4. EXPERIMENTS
4.2 SQuAD 1.1
The Stanford Question Answering Dataset (SQuAD v1.1) is a collection of 100k crowd-sourced question/answer pairs.
4. EXPERIMENTS
Table 2: SQuAD 1.1 results
4. EXPERIMENTS
4.3 SQuAD 2.0
The SQuAD 2.0 task extends the SQuAD 1.1
problem definition by allowing for the
possibility that no short answer exists in the
provided paragraph, making the problem
more realistic.
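For SQuAD 2.0, the paper compares the score of the null span (predicted at the [CLS] token) with the best non-null span and only answers when the span beats the null score by a threshold tuned on the dev set. A sketch of that decision rule (scores and answer text below are illustrative):

```python
# Sketch of the no-answer decision for SQuAD 2.0-style QA.
def predict_answer(best_span_score, null_score, best_span_text, tau=0.0):
    # Answer only if the best non-null span beats the null span by at least tau.
    if best_span_score > null_score + tau:
        return best_span_text
    return ""   # empty string means "no answer in the paragraph"

print(predict_answer(7.2, 5.1, "Denver Broncos", tau=1.0))  # -> "Denver Broncos"
print(predict_answer(4.0, 5.1, "Denver Broncos", tau=1.0))  # -> "" (abstain)
```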
4. EXPERIMENTS
Table 3: SQuAD 2.0 results
THANK YOU!
