The document discusses BERT, a language model that uses transformers to represent text. BERT performs very well on natural language understanding tasks, and there are several methodologies for adapting it to specific tasks, including further pre-training, fine-tuning strategies, and multi-task learning. The basic BERT model has 12 transformer blocks, accepts inputs of up to 512 tokens, and outputs vector representations. Research is ongoing to improve subject-specific text classification using BERT.
3. Basic Concepts of the BERT Language Model
● Today, most state-of-the-art text models use transformers to learn how to represent text.
● Ease of use: adding a single output layer to the existing pre-trained architecture is enough to obtain state-of-the-art accuracy on several NLP tasks (see the sketch after this list).
● 2 categories of NLP tasks:
○ Holistic (sentence-level) tasks
○ Tokenized (token-level) tasks
● Masked language modeling (the pre-training objective)
● 2 stages of BERT model training: pre-training and fine-tuning
● Performed very well on GLUE, SQuAD, and SWAG (natural language
understanding tasks)
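The "single output layer" idea above can be made concrete with a short example. This is a minimal sketch assuming the Hugging Face transformers and torch packages (neither is named in the notes); the model name, label count, and sample sentence are illustrative only.

```python
# Minimal sketch: adapting pre-trained BERT to a sentence-level task by adding
# a single classification layer on top of the pre-trained encoder.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # pre-trained encoder + new output layer
)

inputs = tokenizer("BERT adapts to new tasks easily.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, num_labels)
print(logits)
```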
4. BERT Retraining Methodology for Text Processing Problems
2 groups of methodologies:
● Use of pre-trained models (transfer learning)
● Multi-task learning
When adapting BERT to specific text processing tasks, a special retraining (fine-tuning) technique is required. 3 types of techniques:
1. Further pre-training on domain-specific data
2. Fine-tuning strategies
3. Multi-task learning
"Catastrophic forgetting": knowledge acquired during pre-training can be erased during fine-tuning (for example, when the learning rate is too high), so the fine-tuning strategy must be chosen carefully (see the sketch below).
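The notes mention fine-tuning strategies and catastrophic forgetting without detail. Below is a minimal sketch, assuming the Hugging Face transformers and torch packages, of one common precaution: a small learning rate for the pre-trained encoder and a larger one for the newly added output layer. The specific values (2e-5, 1e-4) are illustrative assumptions, not prescriptions from the notes.

```python
# Sketch of a cautious fine-tuning setup intended to reduce catastrophic
# forgetting: the pre-trained encoder is updated with a small learning rate,
# while the freshly initialized classifier head learns faster.
import torch
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Separate the pre-trained encoder parameters from the new output layer.
encoder_params = [p for n, p in model.named_parameters() if n.startswith("bert.")]
head_params = [p for n, p in model.named_parameters() if not n.startswith("bert.")]

optimizer = torch.optim.AdamW(
    [
        {"params": encoder_params, "lr": 2e-5},  # small LR preserves pre-trained knowledge
        {"params": head_params, "lr": 1e-4},     # larger LR for the new output layer
    ]
)
```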
Basic BERT model:
● An encoder with 12 transformer blocks, 12 attention heads, and a text representation dimension of 768.
● Accepts an input sequence of up to 512 tokens and outputs a vector representation for each token (see the sketch after this list).
● Special [CLS] and [SEP] tokens mark the aggregate sequence representation and sentence boundaries.
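The figures above (12 blocks, 12 heads, dimension 768, 512-token limit, [CLS]/[SEP]) can be checked directly on a pre-trained checkpoint. A minimal sketch, again assuming the Hugging Face transformers and torch packages; the example sentence is arbitrary.

```python
# Sketch: inspecting the base BERT configuration and its per-token outputs.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

print(model.config.num_hidden_layers)        # 12 transformer blocks
print(model.config.num_attention_heads)      # 12 attention heads
print(model.config.hidden_size)              # 768-dimensional representations
print(model.config.max_position_embeddings)  # 512-token input limit

enc = tokenizer("BERT outputs one vector per token.", return_tensors="pt")
print(tokenizer.convert_ids_to_tokens(enc["input_ids"][0]))  # [CLS] ... [SEP]

with torch.no_grad():
    out = model(**enc)
print(out.last_hidden_state.shape)  # (1, sequence_length, 768)
```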
5. Improving the Subject-Specific Classification of Texts Using BERT
● Traditional text embedding models represent each token as a single embedding vector.
● Problems: ambiguity and subject-specificity (see the sketch after this list).
● A general universal text model, pre-trained on a large corpus of general-purpose texts.
● Research is ongoing; its full potential as a universal text model has not yet been revealed.
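The ambiguity problem noted above is what contextual models like BERT address: the same word receives different vectors in different contexts, whereas a traditional lookup-table embedding assigns it one fixed vector. A minimal sketch, assuming the Hugging Face transformers and torch packages; the sentences and the cosine-similarity check are illustrative.

```python
# Sketch: the word "bank" gets context-dependent vectors from BERT,
# unlike a static (lookup-table) embedding that gives it one fixed vector.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

def token_vector(sentence, word):
    enc = tokenizer(sentence, return_tensors="pt")
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]
    return hidden[tokens.index(word)]

v1 = token_vector("She sat on the bank of the river.", "bank")
v2 = token_vector("He deposited cash at the bank.", "bank")
print(torch.cosine_similarity(v1, v2, dim=0))  # noticeably below 1.0
```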