This note summarizes the Transformer and BERT models. RNNs process a sequence one step at a time, so each hidden state depends on the previous one; the Transformer's self-attention has no such dependency between time steps, letting all positions be computed in parallel and making training substantially faster. BERT is a Transformer-based model pre-trained on two tasks: masked language modeling (predicting randomly masked words from their context) and next sentence prediction (classifying whether one sentence actually follows another). During fine-tuning, BERT is adapted to downstream tasks such as sentiment analysis and document retrieval by adding a classification layer on top and training the entire model end-to-end with a cross-entropy loss. Minimal sketches of each of these points follow below.
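To illustrate the parallelism point, here is a minimal sketch of scaled dot-product attention in PyTorch: the whole sequence is processed in one batched matrix product, with no loop over time steps. The learned query/key/value projections, multi-head split, and masking of a full Transformer layer are omitted.

```python
import torch

def scaled_dot_product_attention(q, k, v):
    # Every position attends to every other position in a single matrix
    # product, so there is no sequential dependency between time steps.
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    weights = torch.softmax(scores, dim=-1)
    return weights @ v

seq_len, d_model = 8, 16
x = torch.randn(seq_len, d_model)
# In a real Transformer, q, k, and v come from learned linear projections of x.
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # torch.Size([8, 16])
```

An RNN computing the same sequence would need a Python-level (or kernel-level) loop of length `seq_len`, since step t cannot start until step t-1 finishes.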
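For the masked language modeling task, the sketch below shows one plausible way to build training inputs: select positions at random, replace them with the mask token, and keep the original ids as labels. The function name and signature are illustrative, not BERT's actual preprocessing code; BERT additionally replaces only 80% of selected tokens with [MASK] (10% random token, 10% unchanged), which this sketch skips.

```python
import torch

def mask_tokens(input_ids, mask_token_id, mlm_prob=0.15):
    # Labels hold the original ids; unselected positions are set to -100,
    # the default ignore_index of PyTorch's cross-entropy loss.
    labels = input_ids.clone()
    selected = torch.rand(input_ids.shape) < mlm_prob
    labels[~selected] = -100
    masked = input_ids.clone()
    masked[selected] = mask_token_id  # simplified: always use [MASK]
    return masked, labels
```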
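Finally, a sketch of the fine-tuning setup using the Hugging Face transformers library: `BertForSequenceClassification` adds a randomly initialized classification layer on top of the pre-trained encoder, and passing labels makes the forward pass return a cross-entropy loss, so a backward step updates the whole model end-to-end. The two-sentence sentiment batch is a toy example.

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# num_labels=2 attaches a new two-class classification head on top of BERT.
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

batch = tokenizer(["great movie", "terrible plot"],
                  padding=True, return_tensors="pt")
labels = torch.tensor([1, 0])  # toy labels: 1 = positive, 0 = negative

# With labels provided, the model returns a cross-entropy loss; backprop
# flows through both the new head and the pre-trained encoder.
outputs = model(**batch, labels=labels)
outputs.loss.backward()
```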