Convolutional Neural Networks
for Sentiment Classification
何云超
yunchaohe@gmail.com
Word Vectors
• Three ways to use word vectors in a CNN
• Randomly initialized, treated as network parameters, and learned during model training
• Pre-trained with an embedding model (word2vec, GloVe, etc.) and kept fixed during model training
• Pre-trained with an embedding model (word2vec, GloVe, etc.), used to initialize the network, and fine-tuned during model training
Sentence Matrix
• Each row (or column) of the matrix is one word vector
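Building the sentence matrix can be sketched in plain Python (a toy example; the 3-dimensional embeddings are made up for illustration):

```python
# Toy embedding table: word -> 3-dimensional vector (values are arbitrary).
embeddings = {
    "the":   [0.1, 0.2, 0.3],
    "movie": [0.4, 0.5, 0.6],
    "rocks": [0.7, 0.8, 0.9],
}

sentence = ["the", "movie", "rocks"]
# Stack one word vector per row: shape (s words) x (d dimensions).
sentence_matrix = [embeddings[w] for w in sentence]
print(len(sentence_matrix), len(sentence_matrix[0]))  # 3 3
```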
Convolutional Layer
• Wide Convolution
• Narrow Convolution
The red connections all
have the same weight.
Narrow: s-m+1 = 7-5+1 = 3    Wide: s+m-1 = 7+5-1 = 11
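The two output lengths can be checked with a minimal 1-D convolution in plain Python (an illustrative sketch, not the papers' implementation):

```python
# Narrow convolution slides the filter only over full overlaps; wide
# convolution zero-pads the sequence so every partial overlap counts.
# Output lengths: narrow s-m+1, wide s+m-1.
def conv1d(seq, filt, wide=False):
    m = len(filt)
    if wide:
        seq = [0.0] * (m - 1) + list(seq) + [0.0] * (m - 1)
    return [sum(seq[i + j] * filt[j] for j in range(m))
            for i in range(len(seq) - m + 1)]

s, m = 7, 5
narrow = conv1d([1.0] * s, [1.0] * m)
wide = conv1d([1.0] * s, [1.0] * m, wide=True)
print(len(narrow), len(wide))  # 3 11
```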
Pooling Layer
• Max pooling: The idea is to capture the most important feature—one
with the highest value—for each feature map.
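Max pooling over time reduces each feature map to a single number, as a short sketch shows (toy values):

```python
# Each feature map is the output of one filter slid over the sentence;
# max pooling keeps only its largest value.
feature_maps = [
    [0.1, 0.9, 0.3],  # filter 1
    [0.5, 0.2, 0.8],  # filter 2
]
pooled = [max(fm) for fm in feature_maps]
print(pooled)  # [0.9, 0.8]
```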
Dropout: A Simple Way to Prevent Neural
Networks from Overfitting
• Consider a neural net with one hidden layer.
• Each time we present a training example, we
randomly omit each hidden unit with probability
0.5.
• So we are randomly sampling from 2^H different
architectures.
• All architectures share weights.
• Dropout prevents units from co-adapting too much.
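A minimal sketch of dropout in plain Python (illustrative, not the paper's implementation): drop each unit with probability p at training time, and scale by the retention probability (1 - p) at test time so expected activations match.

```python
import random

def dropout(activations, p=0.5, train=True, rng=random):
    if train:
        # Zero each unit independently with probability p.
        return [0.0 if rng.random() < p else a for a in activations]
    # Test time: keep all units, scaled by the retention probability.
    return [a * (1.0 - p) for a in activations]

random.seed(0)
hidden = [1.0, 2.0, 3.0, 4.0]
print(dropout(hidden, p=0.5))               # some units zeroed at random
print(dropout(hidden, p=0.5, train=False))  # [0.5, 1.0, 1.5, 2.0]
```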
Dropout: A Simple Way to Prevent Neural
Networks from Overfitting
CNN for Sentence Classification [1]
• Two channels
• CNN-rand
• CNN-non-static
• CNN-static
• CNN-multichannel
DCNN Overview [2]
• Convolutional Neural Networks with Dynamic k-Max Pooling
• Wide Convolution
• Dynamic 𝑘-Max Pooling
• l: index of the current convolutional layer
• L: total number of convolutional layers
• s: sentence length
• k_top: fixed pooling parameter of the topmost convolutional layer
• Dynamic k-Max Pooling
• l: index of the current convolutional layer
• L: total number of convolutional layers
• s: sentence length
• k_top: fixed pooling parameter of the topmost convolutional layer
• k_l = max(k_top, ⌈((L-l)/L)·s⌉)
• Example
• If L = 3, s = 18, k_top = 3
• Then,
k_1 = max(3, ⌈(2/3)·18⌉) = max(3, 12) = 12
k_2 = max(3, ⌈(1/3)·18⌉) = max(3, 6) = 6
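The computation above can be sketched in plain Python (an illustrative sketch of dynamic k-max pooling; k-max pooling keeps the k largest values of a feature map in their original order):

```python
import math

def dynamic_k(l, L, s, k_top):
    # k for layer l out of L, given sentence length s and top-layer k_top.
    return max(k_top, math.ceil((L - l) / L * s))

def k_max_pool(values, k):
    # Indices of the k largest values, kept in sequence order.
    top = sorted(sorted(range(len(values)), key=lambda i: values[i])[-k:])
    return [values[i] for i in top]

print(dynamic_k(1, L=3, s=18, k_top=3))  # 12
print(dynamic_k(2, L=3, s=18, k_top=3))  # 6
print(k_max_pool([3.0, 1.0, 5.0, 2.0, 4.0], 3))  # [3.0, 5.0, 4.0]
```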
• Folding
• Problem:
• The convolution operates on each row independently
• Complex dependencies are built within a row
• Before the fully connected layer, different rows remain independent of each other
• Therefore:
• The folding operation sums every two adjacent rows
• d rows are reduced to d/2
• Each row then depends on two rows of the layer below
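Folding amounts to pairwise row sums, as a short sketch shows (toy matrix; assumes d is even):

```python
# Sum every pair of adjacent rows, halving d rows to d/2.
def fold(matrix):
    return [[a + b for a, b in zip(matrix[i], matrix[i + 1])]
            for i in range(0, len(matrix), 2)]

m = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9],
     [1, 1, 1]]
print(fold(m))  # [[5, 7, 9], [8, 9, 10]]
```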
Semantic Clustering [3]
• Pipeline (figure): Sentence Matrix → Semantic Candidate Units → Semantic Units → Semantic Cliques
• m = 2, 3, …, sentence length / 2
seq-CNN [4]
• Inspired by the multi-channel idea in images (RGB, CMYK), the sentence is treated as an image and its words as pixels, so a d-dimensional word vector can be seen as a pixel with d channels
• Example (figure): vocabulary → sentence → sentence vector → multi-channel, with one-hot codes such as [0 0 0] [0 0 0] [1 0 0] [0 0 1] [0 1 0]
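The one-hot, multi-channel view can be sketched in plain Python (a toy vocabulary for illustration):

```python
# Each word becomes a |V|-dimensional one-hot vector, so a sentence is a
# sequence of "pixels" with |V| channels.
vocab = ["good", "bad", "movie"]

def one_hot(word):
    return [1 if w == word else 0 for w in vocab]

sentence = ["good", "movie"]
channels = [one_hot(w) for w in sentence]
print(channels)  # [[1, 0, 0], [0, 0, 1]]
```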
Enrich word vectors
• Use character-level embeddings and concatenate them with the word vectors to form the word representation. [5]
• Extend word vectors with traditional text features, mainly: number of all-caps words, emoticons, elongated units, number of sentiment words, negation words, punctuation, clusters, and n-grams. [6]
MVCNN: Multichannel
Variable-Size Convolution [7]
• Different word embeddings cover different vocabularies
• HLBL
• Huang
• GloVe
• SENNA
• Word2vec
• Handling of unknown words
• Randomly initialized
• Projection (mutual learning):
argmin ‖ŵ_j − w_j‖², where ŵ_j is the vector projected from another embedding version
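The projection idea can be sketched as a least-squares fit with NumPy (a toy sketch: the dimensions, variable names, and noiseless linear setup are illustrative, not from the paper):

```python
import numpy as np

# Fit a linear map M on words shared by two embedding versions,
# minimizing ||M w_j - w'_j||^2, then project an unknown word.
rng = np.random.default_rng(0)
M_true = rng.normal(size=(4, 4))        # hidden "true" relation (toy)
shared_src = rng.normal(size=(50, 4))   # shared words in embedding set 1
shared_tgt = shared_src @ M_true.T      # same words in embedding set 2

# Least squares: find M with shared_src @ M ≈ shared_tgt.
M, *_ = np.linalg.lstsq(shared_src, shared_tgt, rcond=None)

unknown = rng.normal(size=4)            # word missing from set 2
projected = unknown @ M                 # its projected embedding
print(np.allclose(projected, M_true @ unknown))  # True
```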
MVCNN: Training
• Pretraining
• Unsupervised training
• Average of context word vectors as a
predicted representation of the
middle word
• To produce good initial values
• Training
• Logistic regression
References
[1] Kim, Y. (2014). Convolutional Neural Networks for Sentence Classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP).
[2] Kalchbrenner, N., Grefenstette, E., & Blunsom, P. (2014). A Convolutional Neural Network for Modelling Sentences. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).
[3] Wang, P., Xu, J., Xu, B., Liu, C. L., Zhang, H., Wang, F., & Hao, H. (2015). Semantic Clustering and Convolutional Neural Network for Short Text Categorization. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Vol. 2, pp. 352–357).
[4] Johnson, R., & Zhang, T. (2015). Effective Use of Word Order for Text Categorization with Convolutional Neural Networks. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
[5] dos Santos, C. N., & Gatti, M. (2014). Deep Convolutional Neural Networks for Sentiment Analysis of Short Texts. In Proceedings of the 25th International Conference on Computational Linguistics (COLING), Dublin, Ireland.
[6] Tang, D., Wei, F., Qin, B., Liu, T., & Zhou, M. (2014). Coooolll: A Deep Learning System for Twitter Sentiment Classification. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014) (pp. 208–212).
[7] Yin, W., & Schütze, H. (2015). Multichannel Variable-Size Convolution for Sentence Classification. In Proceedings of the 19th Conference on Computational Natural Language Learning (CoNLL 2015), Beijing, China.
Thank you for listening
Q&A
何云超 yunchaohe@gmail.com
