The document discusses BERT, a language model that uses transformer encoders to represent text. BERT performs very well on natural language understanding tasks, and there are two main approaches to adapting it to a specific task: transfer learning from the pretrained model and multi-task learning. When fine-tuning BERT for a specific task, techniques such as further pre-training on in-domain data, careful fine-tuning strategies, and multi-task learning help prevent catastrophic forgetting. The basic BERT model has 12 transformer blocks and accepts input sequences of up to 512 tokens. Research is ongoing to improve subject-specific text classification using BERT.
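As a concrete illustration of the fine-tuning approach described above, the sketch below loads a pretrained BERT-base model and runs a single training step on a toy text-classification batch. It is a minimal sketch, assuming the Hugging Face transformers library; the model checkpoint, example texts, labels, and hyperparameters are illustrative assumptions, not prescriptions from the source.

```python
# Minimal sketch: fine-tuning pretrained BERT-base for text classification
# using the Hugging Face transformers library (texts/labels are hypothetical).
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# BERT-base: 12 transformer blocks; inputs are limited to 512 tokens.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

texts = ["An example sentence about the target subject.",
         "Another, unrelated sentence."]
labels = torch.tensor([1, 0])

# Tokenize, pad, and truncate to the 512-token limit of the base model.
inputs = tokenizer(texts, padding=True, truncation=True,
                   max_length=512, return_tensors="pt")

# A small learning rate is commonly used so that fine-tuning does not
# overwrite the pretrained weights too aggressively (one way to limit
# catastrophic forgetting).
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
outputs = model(**inputs, labels=labels)  # forward pass with a classification loss
outputs.loss.backward()                   # backpropagate
optimizer.step()                          # one optimization step
optimizer.zero_grad()
```

In practice this step would be repeated over a task-specific dataset for a few epochs; the same pattern also extends to multi-task setups by sharing the BERT encoder across several classification heads.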