This document compares two Transformer-based language models, BERT and ERNIE, for sentiment analysis. BERT learns contextual relations between words through bidirectional training of language representations. ERNIE extends BERT by integrating lexical, syntactic, and semantic knowledge during pre-training. The document analyzes how ERNIE's masking strategy differs from BERT's: rather than only masking individual tokens at random, ERNIE also masks whole phrases and named entities as units, which helps it model semantic relationships between words and entities. Experimental results on product-review datasets show that ERNIE outperforms BERT on sentiment classification.
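
To make the masking difference concrete, here is a minimal sketch contrasting BERT-style token-level masking with ERNIE-style entity-level masking. It assumes pre-tokenized input and pre-identified entity spans; the function names, the example sentence, and the 15% masking probability are illustrative, not taken from either model's actual training code.

```python
import random

def token_level_mask(tokens, mask_prob=0.15, mask_token="[MASK]"):
    """BERT-style masking: each token is masked independently at random."""
    return [mask_token if random.random() < mask_prob else t for t in tokens]

def entity_level_mask(tokens, entity_spans, mask_prob=0.15, mask_token="[MASK]"):
    """ERNIE-style masking: a whole entity/phrase span is masked as one unit,
    so the model must predict the entire entity from surrounding context."""
    out = list(tokens)
    for start, end in entity_spans:  # spans are (start, end) token indices
        if random.random() < mask_prob:
            for i in range(start, end):
                out[i] = mask_token
    return out

# Hypothetical product-review snippet; "Pixel 8" marked as one entity span.
tokens = ["the", "camera", "on", "the", "Pixel", "8", "is", "excellent"]
entity_spans = [(4, 6)]

print(token_level_mask(tokens))          # may mask any single token
print(entity_level_mask(tokens, entity_spans))  # masks "Pixel 8" together
```

Under token-level masking, the model can often recover a masked token from the other half of the entity (e.g. predicting "8" given "Pixel"); masking the span as a unit removes that shortcut and pushes the model to use wider semantic context.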