Multi-layers Convolutional Neural Network for Twitter
Sentiment Ordinal Scale Classification
AlAli Muath, Mohd Sharef Nurfadhlina, Hamdan Hazlina,
Azmi Murad Masrah Azrifah, and Husin Nor Azura
Intelligent Computing Research Group, Faculty of Computer Science and
Information Technology, UPM
nurfadhlina@upm.edu.my
Sentiment Analysis
• The opinion of the masses is important
  – A political party may want to know whether people support its program.
  – Before investing in a company, one can gauge public sentiment toward it to find out where it stands.
  – A company might want to find out how its products are being reviewed.
Twitter Sentiment Analysis Task
Credit: SemEval2015, Task 10
Need for Deep Learning
• Pros
  • Automatic feature selection.
  • Basic building blocks can be re-used to compose models tailored to different applications.
• Cons
  • Tendency to overfit.
  • Requires lots of data.
• A powerful apparatus for learning complex functions in ML
• Better at certain NLP tasks than previous methods
• Pre-trained distributed representation vectors
  • Word2vec, GloVe, GenSim, doc2vec.
  • Vector-space properties: similarity.
• Less feature engineering needed
  • The network learns abstract representations
• Joint learning/execution of NLP steps is possible
Convolutional Neural Networks
• CNNs (ConvNets) are widely used in image processing
  • Location invariance
  • Compositionality
  • Fast
• Convolution layers
  • A “sliding window” over the input representation: filter/kernel/feature generator
  • Local connectivity
  • Shared weights
• Hyperparameters
  • Padding
  • Filter size
  • Number of filters
  • Stride size
  • Channels (R, G, B)
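The “sliding window” and its hyperparameters (filter size, stride) can be sketched in a few lines of plain Python; this toy example is illustrative only, not the models discussed later:

```python
def conv1d(seq, filt, stride=1):
    """Slide one filter (k x d weights) over a sequence of d-dim word vectors.

    Returns one feature value per window position (no padding, zero bias).
    """
    k = len(filt)                     # filter size: words covered per window
    out = []
    for start in range(0, len(seq) - k + 1, stride):
        window = seq[start:start + k]
        # feature value = dot product of the window with the filter weights
        out.append(sum(w * f
                       for row_w, row_f in zip(window, filt)
                       for w, f in zip(row_w, row_f)))
    return out

# Toy sequence of five 2-dimensional "word vectors" and one size-2 filter.
seq = [[1, 0], [0, 1], [1, 1], [0, 0], [1, 0]]
filt = [[1, 1], [1, 1]]               # filter size k=2, dimension d=2
print(conv1d(seq, filt))              # → [2, 3, 2, 1], one value per window
```

With stride 2 the window skips every other position, halving the number of feature values; padding (not shown) would pad the sequence so border words get as many windows as central ones.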
CNNs for Sentence Classification
Zhang, Y., & Wallace, B. (2015). A Sensitivity Analysis of (and Practitioners’ Guide to)
Convolutional Neural Networks for Sentence Classification
Problems
• CNN approaches applied to TSA for five-point classification (SemEval-2016):
  – relied on distant supervision to enhance performance (He et al. 2016; Ruder, Ghaffari, and Breslin 2016)
    • However, distant supervision did not contribute to performance.
  – mainly used a simple CNN structure (one convolutional layer with max pooling)
    • based on methods designed for two- and three-point scales, adapted to fit the ordinal classification problem.
  – For an ordinal scale, this CNN structure is not enough to capture the problem (He et al. 2016; Ruder, Ghaffari, and Breslin 2016).
| Module | Conv. layers | Filter size | Number of filters | MAE^M |
|--------|--------------|-------------|-------------------|-------|
| INSIGHT-1 (Ruder et al., 2016) | 1 | 3,4,5 | 100 | 0.939 |
| YZU-NLP (He et al., 2016) | 1 | 3 | 64 | 1.111 |
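MAE^M in the table is macro-averaged mean absolute error, the measure used in SemEval-2016 Task 4 for the five-point scale: MAE is computed separately for each true class and then averaged over the classes, so rare classes such as Highly Negative count as much as frequent ones. A minimal sketch:

```python
def macro_mae(y_true, y_pred):
    """Macro-averaged MAE over the classes present in y_true."""
    classes = sorted(set(y_true))
    per_class = []
    for c in classes:
        # absolute errors for tweets whose true label is c
        errs = [abs(p - t) for t, p in zip(y_true, y_pred) if t == c]
        per_class.append(sum(errs) / len(errs))
    # average the per-class MAEs, weighting every class equally
    return sum(per_class) / len(classes)

# Labels on the five-point scale -2..2 (Highly Negative .. Highly Positive).
y_true = [2, 1, 0, -1, -2, 1]
y_pred = [1, 1, 0, -2, -2, 0]
print(macro_mae(y_true, y_pred))      # → 0.5
```

Lower is better; because errors are absolute label distances, predicting Negative for a Highly Positive tweet is penalized more than predicting Positive, which is what makes the task ordinal rather than plain classification.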
The Proposed MLCNN
• Dataset: provided by SemEval-2016 [8]; divided into training, development, development-test and test sets.
  – Five-point scale (Highly Positive, Positive, Neutral, Negative, Highly Negative)
• MLCNN Model
  – Trained on word embeddings: we used the publicly available GloVe embeddings [10], pre-trained on 2B tweets, to initialize our word embedding with 200-dimensional GloVe vectors.
  – Maximum tweet length of 50; embedding size = 200.
  – Applied each filter size (2, 3, 4) separately, and combinations of them, with different pooling techniques (max and average).
  – Trained with the Adam optimizer for 10 epochs with batch size 100. Implemented using the Keras library on a Theano backend.
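The MLCNN forward pass can be sketched with NumPy (random stand-in weights; the actual implementation uses Keras as stated above, and `branch`, `params`, `w_out` are illustrative names, not the authors' code): three parallel convolutional branches with filter sizes 2, 3 and 4 (100 filters each) are average-pooled, concatenated, and fed to a softmax over the five classes.

```python
import numpy as np

rng = np.random.default_rng(0)
max_len, d, n_filters, n_classes = 50, 200, 100, 5   # values from the slide

def branch(x, size, w, b):
    """One conv branch: size-n filters over the tweet, then average pooling."""
    windows = np.stack([x[i:i + size].ravel()        # (positions, size*d)
                        for i in range(max_len - size + 1)])
    feats = np.maximum(windows @ w + b, 0.0)         # ReLU, (positions, n_filters)
    return feats.mean(axis=0)                        # average pooling -> (n_filters,)

# Random stand-ins for the learned parameters (illustration only).
params = {s: (rng.normal(0, 0.01, (s * d, n_filters)), np.zeros(n_filters))
          for s in (2, 3, 4)}
w_out = rng.normal(0, 0.01, (3 * n_filters, n_classes))

x = rng.normal(size=(max_len, d))                    # one embedded tweet (50 x 200)
h = np.concatenate([branch(x, s, *params[s]) for s in (2, 3, 4)])  # (300,)
logits = h @ w_out
probs = np.exp(logits - logits.max())
probs /= probs.sum()                                 # softmax over the 5 classes
print(probs.shape)                                   # → (5,)
```

Average pooling over the full feature map (rather than taking the max) is the design choice the experiments below compare against max pooling.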
The Proposed MLCNN
[Architecture diagram: input tweet words W1 … Wn, embedded with d = 200; three parallel convolutional branches (100 filters each, of sizes 2, 3 and 4), each followed by average pooling; the pooled features feed a fully connected layer over the five outputs H+, +, N, -, H-.]
SemEval-2016 dataset:

| Split   | 2   | 1    | 0     | -1   | -2  | Total |
|---------|-----|------|-------|------|-----|-------|
| Train   | 437 | 3154 | 1654  | 668  | 87  | 6000  |
| DEVTEST | 148 | 1005 | 583   | 233  | 31  | 2000  |
| Test    | 382 | 7830 | 10081 | 2201 | 138 | 20632 |
Results

| Model     | Number of filters | Filter size | MAE^M (max pooling) | MAE^M (average pooling) |
|-----------|-------------------|-------------|---------------------|-------------------------|
| MLCNN     | 100               | 2           | 0.768               | 0.763                   |
| MLCNN     | 100               | 3           | 0.782               | 0.780                   |
| MLCNN     | 100               | 4           | 0.803               | 0.772                   |
| MLCNN     | 100               | 2,3         | 0.735               | 0.676                   |
| MLCNN     | 100               | 3,4         | 0.766               | 0.715                   |
| MLCNN     | 100               | 2,3,4       | 0.628               | 0.617                   |
| INSIGHT-1 | 100               | 3,4,5       | 0.939               | N/A                     |
| YZU-NLP   | 64                | 3           | 1.111               | N/A                     |

TwiSE (logistic regression) baseline: MAE^M = 0.719.
Results
[Chart: the impact of pooling strategies — MAE^M for max pooling vs. average pooling across the filter-size configurations 2, 3, 4, {2,3}, {3,4}, {2,3,4}; average pooling is lower (better) in every configuration.]
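The gap the chart summarizes comes down to what each pooling operator keeps from a filter's feature map: max pooling retains only the single strongest activation, while average pooling retains the overall level of evidence across the tweet. A toy comparison (made-up activations, for illustration only):

```python
# One filter's activations across the positions of a tweet.
feature_map = [0.1, 0.9, 0.2, 0.0, 0.3]

max_pool = max(feature_map)                      # keeps only the strongest cue
avg_pool = sum(feature_map) / len(feature_map)   # keeps the overall evidence level

print(max_pool, avg_pool)                        # → 0.9 0.3
```

A single spike dominates the max-pooled feature, whereas the averaged feature reflects how widespread the signal is, which is plausibly why averaging preserved the graded distinctions of the ordinal scale better here.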
Conclusion
• A multi-layer CNN model is developed to extract a comprehensive representation for five-point TSA
  – It captures the essential syntactic, semantic and sentiment information in a tweet and enhances learning capacity.
• The performance of max and average pooling is investigated, to retrieve and preserve the most significant features for the ordinal scale.
  – Average pooling outperforms max pooling.
• MLCNN outperforms the state of the art (TwiSE) with a 14.1% improvement, and the best previous similar approach (INSIGHT-1) with a 34.3% improvement.
References
1. Balikas, G., & Amini, M. (2016). TwiSE at SemEval-2016 Task 4: Twitter Sentiment Classification. Proceedings of the International Workshop on Semantic Evaluation (SemEval-2016), 85–91.
2. He, Y., Yu, L., Yang, C., Lai, K. R., & Liu, W. (2016). YZU-NLP Team at SemEval-2016 Task 4: Ordinal Sentiment Classification Using a Recurrent Convolutional Network. Proceedings of SemEval-2016, 256–260.
3. Ruder, S., Ghaffari, P., & Breslin, J. G. (2016). INSIGHT-1 at SemEval-2016 Task 4: Convolutional Neural Networks for Sentiment Classification and Quantification. Proceedings of SemEval-2016, 178–182.
4. Er, M. J., Zhang, Y., Wang, N., & Pratama, M. (2016). Attention pooling-based convolutional neural network for sentence modelling. Information Sciences, 373, 1339–1351. https://doi.org/10.1016/j.ins.2016.08.084
5. Esuli, A. (2016). ISTI-CNR at SemEval-2016 Task 4: Quantification on an Ordinal Scale. Proceedings of SemEval-2016, 92–95.
6. Florean, C., Bejenaru, O., Apostol, E., Ciobanu, O., Iftene, A., & Trandabăț, D. (2016). SentimentalITsts at SemEval-2016 Task 4: Building a Twitter Sentiment Analyzer in Your Backyard. Proceedings of SemEval-2016, 248–251.
7. Giachanou, A., & Crestani, F. (2016). Like It or Not: A Survey of Twitter Sentiment Analysis Methods. ACM Computing Surveys, 49(2), Article 28, 1–41. https://doi.org/10.1145/2938640
8. Nakov, P., Ritter, A., Rosenthal, S., & Sebastiani, F. (2016). SemEval-2016 Task 4: Sentiment Analysis in Twitter. Proceedings of SemEval-2016.
9. Go, A., Bhayani, R., & Huang, L. (2009). Twitter Sentiment Classification using Distant Supervision. CS224N Project Report, Stanford University, 1–6.
10. Pennington, J., Socher, R., & Manning, C. (2014). GloVe: Global Vectors for Word Representation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 1532–1543.
11. Johnson, R., & Zhang, T. (2015). Effective Use of Word Order for Text Categorization with Convolutional Neural Networks. Proceedings of NAACL-HLT 2015, 103–112.
12. Kalchbrenner, N., Grefenstette, E., & Blunsom, P. (2014). A Convolutional Neural Network for Modelling Sentences. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 655–665.
Thank You
nurfadhlina@upm.edu.my
