International Journal of Modern Research in Engineering & Management (IJMREM)
||Volume|| 2||Issue|| 2 ||Pages|| 01-05 || February 2019|| ISSN: 2581-4540
www.ijmrem.com IJMREM Page 1
MIM (Mobile Instant Messaging) Classification using Term
Frequency-Inverse Document Frequency (TF-IDF) and Bayesian
Algorithm
Kashaf-u-Duja (1), Muhammad Bux Alvi (2), Tariq Jameel Saifullah Khanzada (3), Nisha Kumari (4)
(1, 4) Institute of Information and Communication Technology, Mehran University of Engineering and Technology, Jamshoro
(2) Department of Computer Systems Engineering, The Islamia University of Bahawalpur
(3) Department of Computer Systems Engineering, Mehran University of Engineering and Technology, Jamshoro
---------------------------------------------------ABSTRACT------------------------------------------------------
This study performs binary sentiment classification at the aspect level to develop a hybrid sentiment classification framework for WhatsApp MIMs (Mobile Instant Messages). The work is carried out in two phases: a training phase and a testing phase. In the training phase, 75% of the data is used as the training dataset. Preprocessing techniques (tokenization, stop-word removal, case normalization, punctuation removal and stemming) are applied to obtain a cleaner dataset to be used as input. The output is passed to the classifier after TF-IDF feature weighting is applied. In the second phase, the classifier is tested on the remaining 25% of the data. A Bernoulli Naïve Bayesian classifier, a modified form of the traditional Naïve Bayesian classifier, is used to classify sentiments. The dataset contains 417 messages in total, of which 244 are labeled positive and 173 negative. The proposed model achieves an accuracy of 81.73%, about 12 percentage points higher than the 69.23% accuracy of the baseline classification model.
KEYWORDS: Mobile Instant Messages (MIMs), Naïve Bayesian, Sentiment classification, TF-IDF,
WhatsApp
-------------------------------------------------------------------------------------------------------------------------------------------
Date of Submission: 30 January 2019 Date of Acceptance: 03 February 2019
-------------------------------------------------------------------------------------------------------------------------------------------
I. INTRODUCTION
The growth of the web has substantially changed human interaction and communication and has prompted rapid, large-scale growth in user-generated data [4]. It is estimated that 95% of available data is unstructured. To extract information and create knowledge from such raw resources, the data needs to be processed and analyzed properly, because the knowledge present in text data is not directly accessible to computers [1]. With the striking growth of social media platforms such as Facebook, Twitter, WhatsApp and WeChat, more and more people post texts online to express their opinions on social issues and share their reviews [5]. Considerable attention has focused on examining this data in terms of the sentiment it conveys, which has resulted in the emergence of the sentiment analysis research field. Sentiment analysis involves the computational analysis of user-generated data, such as reviews, to determine its orientation (positive, negative or neutral). There are two main reasons to automate sentiment analysis: first, the abundance of online data is beyond human analysis; and second, public opinion is a significant consideration when governments, institutions, and individuals are making decisions [4]. WhatsApp text data introduces additional problems such as word shortening, neologisms, and spelling variations. Traditional machine learning methods have proved inadequate for this task.
To address this problem, we propose a methodology based on binary sentiment classification at the aspect level. This work focuses on developing a hybrid sentiment classification framework for WhatsApp MIMs that combines recursive preprocessing with machine learning to achieve higher accuracy on a closed-domain dataset of 417 messages obtained from a WhatsApp group. The dataset is labeled manually and consists of 244 positive and 173 negative opinions. Preprocessing is applied to obtain cleaner data for better accuracy, and a Naïve Bayesian machine learning algorithm is used to develop the model and test its suitability.
II. LITERATURE REVIEW
Reference [1] proposes a novel hybrid method with a recursive preprocessing approach for sentiment analysis of Twitter data consisting of 6090 tweets. The dataset is labeled manually with 3111 positive, 1114 negative and 1865 neutral tweets. Multinomial Naïve Bayesian, linear SVM and neural network algorithms are used to develop different hybrid models and test their suitability. Bag-of-words, TF-IDF and N-gram are used as feature engineering models. Hold-out splitting is used to evaluate accuracy, with 80% of the data used for training and 20% for testing. The model attains 86.18% overall accuracy against a baseline accuracy of 82%.
Reference [2] compares six commonly used preprocessing techniques on two Twitter datasets for sentiment analysis. The recommended preprocessing techniques are lemmatization, replacing repetitions of punctuation, replacing contractions, and removing numbers. For classic machine learning sentiment analysis, the combination of five techniques (replacing URLs and user mentions, replacing contractions, removing numbers, replacing repetitions of punctuation, and lemmatization) is reported as the winning combination.
Reference [3] uses preprocessing techniques and merges 10 existing sentiment lexicons into a high-coverage lexical resource (HCLr). Seven classifiers are evaluated, of which SVM performs best with 34.16%. The second-best classifier is boosted Naïve Bayesian, with an overall accuracy of 30.61%.
Reference [4] proposes a two-phase hybrid method. The first phase, contextual analysis, consists of preprocessing techniques, while the second phase, ensemble clustering, consists of feature extraction and unsupervised machine learning algorithms. The sentiment lexicon SentiWordNet 3.0 is used to measure the strength of each term’s polarity. Applying the contextual analysis procedures increases the accuracy rate by an average of 3.0%. Feature weighting schemes, including TF-IDF, improve performance by 5-20%.
III. METHODOLOGY
Fig. 1 shows the methodology used in this paper, which comprises 12 steps that are explained below.
Figure 1 MIMs classification model
Data Collection: We created a WhatsApp group named “Internet; Good or Bad” consisting of 15 members. A total of 417 messages were collected and manually labeled as 244 “Favor” and 173 “Against”. A copy of the group chat history was extracted in “.txt” format using the email chat feature and then converted into a “.csv” file for use [8].
Tokenization: The process of breaking the corpus down into individual elements [6]. It is also termed word segmentation [1].
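The conversion from the exported “.txt” chat history to “.csv” can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the line layout `date, time - sender: message` is an assumption (WhatsApp export formats vary by platform and locale), and the names `parse_chat` and `to_csv` are hypothetical.

```python
import csv
import io
import re

# Assumed export layout: "12/01/2019, 10:15 - Sender: message text".
PREFIX_RE = re.compile(r"^\d{1,2}/\d{1,2}/\d{2,4}, \d{1,2}:\d{2} - ")
LINE_RE = re.compile(r"^(\d{1,2}/\d{1,2}/\d{2,4}), (\d{1,2}:\d{2}) - ([^:]+): (.*)$")


def parse_chat(text):
    """Return [date, time, sender, message] rows from an exported chat."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            rows.append(list(match.groups()))
        elif PREFIX_RE.match(line):
            continue  # timestamped line without "sender:", e.g. a system notice
        elif rows:
            rows[-1][3] += "\n" + line  # continuation of a multi-line message
    return rows


def to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["date", "time", "sender", "message"])
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample lines in the assumed layout.
sample = (
    "12/01/2019, 10:15 - Member A: Internet helps students\n"
    "12/01/2019, 10:16 - Member C joined\n"
    "12/01/2019, 10:17 - Member B: It wastes time\n"
    "and money"
)
print(to_csv(parse_chat(sample)))
```

The manual “Favor”/“Against” label would then be added as a further column of the resulting file.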
Figure 2 MIM after tokenization
Removing Stop Words: Stop words are uninformative words that commonly appear in text, such as so, and, or, the … [2]. There are 153 English stop words that need to be removed because they carry little significance in most datasets [1].
Figure 3 MIM after removing stop words
Case Normalization: An irreversible process that converts the terms into lower case [1].
Figure 4 MIM after case normalization
Removing Punctuation: A classic technique in information retrieval and data mining that removes punctuation
marks from the text [2].
Figure 5 MIM after removing punctuation
Stemming: Converts words into their root forms; it is effective for polarity detection [1] and generally yields good results [2].
Figure 6 MIM after stemming
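The five preprocessing steps above can be sketched in order as follows. This is a minimal standard-library illustration under stated assumptions, not the NLTK-based implementation used in the study: `STOP_WORDS` is a tiny hypothetical subset of the 153-word list, and `stem` is a toy suffix stripper standing in for a real stemmer such as Porter's. Because stop words are removed before case normalization, the stop-word comparison is made case-insensitive here.

```python
import re
import string

# Tiny illustrative subset; the paper refers to a 153-word English list.
STOP_WORDS = {"so", "and", "or", "the", "is", "a", "it", "of", "to"}
# Toy suffix list; a real stemmer (e.g. Porter) applies many more rules.
SUFFIXES = ("ing", "ed", "s")


def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def remove_stop_words(tokens):
    """Drop stop words, comparing case-insensitively."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]


def normalize_case(tokens):
    """Convert every token to lower case."""
    return [t.lower() for t in tokens]


def remove_punctuation(tokens):
    """Drop tokens that consist only of punctuation characters."""
    return [t for t in tokens if not all(ch in string.punctuation for ch in t)]


def stem(token):
    """Strip the first matching suffix if at least 3 characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Apply the steps in the order given in the methodology."""
    tokens = tokenize(text)
    tokens = remove_stop_words(tokens)
    tokens = normalize_case(tokens)
    tokens = remove_punctuation(tokens)
    return [stem(t) for t in tokens]


print(preprocess("The Internet is helping students, and it connects people!"))
# ['internet', 'help', 'student', 'connect', 'people']
```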
Term Frequency-Inverse Document Frequency (TF-IDF): A commonly used scoring scheme that evaluates the importance of a token in a document and, ultimately, in the given dataset. It can be used to remove stop words, punctuation, and the most and least frequent tokens [1]. Term frequency measures how frequently a term occurs in a document. The inverse document frequency factor decreases the weight of terms that occur very frequently in the document set and increases the weight of terms that occur rarely [7].
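Mathematically [4],
\[ w_{i,j} = tf_{i,j} \times idf_i \qquad (1) \]
Where,
▪ $w_{i,j}$ is the TF-IDF weight of term i in document j
▪ $tf_{i,j}$ is the normalized term frequency of term i in document j
▪ $idf_i$ is the inverse document frequency of term i.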
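As a worked illustration of the weighting described above, the sketch below multiplies a term frequency by an inverse document frequency for each token. The exact variant is an assumption, since the text does not spell it out: term frequency is taken as the count divided by the document length, and inverse document frequency as the natural logarithm of N/df without smoothing. The function name `tf_idf` and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each term once per document

    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Toy documents, already preprocessed into tokens.
docs = [
    ["internet", "good", "help"],
    ["internet", "bad"],
    ["internet", "help", "help"],
]
for row in tf_idf(docs):
    print(row)
```

In this toy corpus "internet" occurs in every document, so its weight is 0, while "bad" occurs in only one document and receives the largest weight in that document.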
Bernoulli’s Naïve Bayesian: Naïve Bayesian is a probabilistic classifier, widely used in sentiment classification, that applies Bayes’ theorem to calculate the probability of a data sample belonging to a specific class. It is called naïve because it assumes that all features are independent of each other given the class [3]. The probability of a sample belonging to a class can be computed using the following formula.
\[ P(c \mid x) = \frac{P(x \mid c)\, P(c)}{P(x)} \qquad (2) \]
Where,
▪ P (c|x) is the posterior probability of class (c, target) given predictor (x, attributes).
▪ P (c) is the prior probability of class.
▪ P (x|c) is the likelihood which is the probability of predictor given class.
▪ P (x) is the prior probability of predictor.
The Bernoulli Naïve Bayesian algorithm is a modified form of the traditional Naïve Bayesian algorithm in which the weight of each term is 1 if it occurs in the sentence and 0 if it does not [2].
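The Bernoulli variant can be sketched from scratch as follows. This is a minimal illustration rather than the scikit-learn model used in the study: each vocabulary term contributes P(term | class) when present in a message and 1 - P(term | class) when absent, and the class with the highest log-posterior wins. Add-one (Laplace) smoothing is an assumption, as are the function names and the toy training messages.

```python
import math


def train_bernoulli_nb(docs, labels):
    """Estimate class priors and per-term presence probabilities."""
    vocab = sorted({term for doc in docs for term in doc})
    model = {"vocab": vocab, "log_prior": {}, "p": {}}
    for c in sorted(set(labels)):
        class_docs = [set(d) for d, y in zip(docs, labels) if y == c]
        n_c = len(class_docs)
        model["log_prior"][c] = math.log(n_c / len(docs))
        # Add-one smoothing over the two outcomes (present / absent).
        model["p"][c] = {
            t: (sum(t in d for d in class_docs) + 1) / (n_c + 2)
            for t in vocab
        }
    return model


def predict(model, doc):
    """Return the class with the highest log-posterior for a tokenized doc."""
    present = set(doc)
    scores = {}
    for c, log_prior in model["log_prior"].items():
        score = log_prior
        for t in model["vocab"]:
            p = model["p"][c][t]
            score += math.log(p) if t in present else math.log(1 - p)
        scores[c] = score
    return max(scores, key=scores.get)


# Toy, already-preprocessed messages with hypothetical labels.
train_docs = [
    ["internet", "help", "student"],
    ["internet", "good", "learn"],
    ["internet", "waste", "time"],
    ["internet", "bad", "addict"],
]
train_labels = ["favor", "favor", "against", "against"]

model = train_bernoulli_nb(train_docs, train_labels)
print(predict(model, ["good", "help"]))   # favor
print(predict(model, ["waste", "time"]))  # against
```

Because only presence or absence enters the likelihood, the denominator P(x) of equation (2) can be ignored: it is the same for every class and does not change which class scores highest.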
IV. TOOLS AND TECHNOLOGIES
Python 3.0 (Anaconda Python Distribution) is used to obtain the results of the model. The Python libraries NumPy (Numerical Python), NLTK (Natural Language Toolkit), scikit-learn and Matplotlib are used for scientific computing (arrays, mathematical calculations), preprocessing, machine learning and plotting (graphs etc.) respectively.
Figure 7 Tools and technologies used
V. RESULTS AND DISCUSSION
Hold-out splitting is used to evaluate the accuracy of the proposed model, with 75% of the data used for training and 25% for testing the classifier. The model attains an accuracy of 81.73% against a baseline accuracy of 69.23%. The results show that the proposed hybrid binary sentiment classification model with preprocessing techniques achieves satisfactory results, with an accuracy about 12 percentage points higher than the baseline.
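The hold-out evaluation can be sketched as below. The helper names, the shuffling seed and the rounding of the test-set size are assumptions made for illustration. The arithmetic at the end is an observation, not a figure from the paper: a test set of 104 messages (25% of 417, rounded) with 85 and 72 correct predictions reproduces the reported 81.73% and 69.23%; these counts are back-calculated from the percentages.

```python
import random


def hold_out_split(items, test_fraction=0.25, seed=0):
    """Shuffle the items and split them into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


train, test = hold_out_split(range(417))
print(len(train), len(test))  # 313 104

# Back-calculated counts (an inference, not reported in the paper).
print(round(100 * 85 / 104, 2))  # 81.73
print(round(100 * 72 / 104, 2))  # 69.23
```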
In Fig. 8, Graph 1(a) shows the message count: there were 417 messages in total, of which 244 and 173 are labeled as favor and against respectively. Graph 1(b) shows the counts after the elimination of 4 repeated messages, which left 240 favor messages.
Figure 8 Impact of preprocessing at message level
In Fig. 9, Graph 2(a) shows the results before preprocessing, while Graph 2(b) shows the results after preprocessing. The preprocessing techniques trim lengthy and verbose messages down to the important, useful tokens, yielding a cleaner dataset and better results.
Figure 9 Impact of preprocessing at token level
VI. CONCLUSION AND FUTURE WORK
The proposed model performs binary sentiment classification at the aspect level, using a hybrid sentiment classification framework with preprocessing techniques to process a WhatsApp MIM dataset. A machine learning technique is used to develop the sentiment classification model with a TF-IDF feature weighting scheme. The model attains satisfactory results compared to the baseline accuracy. For future work, it is suggested to enlarge the dataset, as more data generally supports better accuracy. Furthermore, more preprocessing techniques could be applied, together with the well-ordered winning combination, to extract significant features for sentiment classification.
REFERENCES
[1] Alvi, M.B., Mahoto, A.N., Alvi, M., Unar, M.A., Shaikh, M.A, Hybrid Classification Model for Twitter Data- A Recursive Preprocessing Approach, 5th International Multi-topic ICT Conference (IMTIC), 2018, 1-6
[2] Symeonidis, S., Effrosynidis, D., & Arampatzis, A, A comparative evaluation of pre-processing techniques and their interactions for twitter sentiment analysis, Expert Systems with Applications, 110, 2018, 298-310
[3] Abdi, A., Shamsuddin, S. M., Hasan, S., & MD, J. P, Machine learning-based multi-documents sentiment-oriented summarization using linguistic treatment, Expert Systems with Applications, 109, 2018, 66-85
[4] Al-Sharuee, M. T., Liu, F., & Pratama, M, Sentiment analysis: An automatic contextual analysis and ensemble clustering approach and comparison, Data and Knowledge Engineering, 115, 2018, 194-213
[5] Liu, Y., Bi, J.W., & Fan, Z.P, Multi-class sentiment classification: The experimental comparisons of feature selection and machine learning algorithms, Expert Systems with Applications, 80, 2017, 323-339
[6] A. Faraz, An elaboration of text categorization and automatic text classification through mathematical
and graphical modeling, An International Journal (CSEIJ), 5(2), 2015, 239-248.
[7] Ahmed, I., Guan, D., & Chung, C.T, SMS Classification Based on Naïve Bayes Classifier and Apriori
Algorithm Frequent Itemset, International Journal of Machine Learning and Computing, 4(2), 2014
[8] Patil, S, WhatsApp Group Data Analysis with R, International Journal of Computer Applications, 154
(4), 2016, 31-36
[9] Tang, Y., Hew, K.F, Is mobile instant messaging (MIM) useful in education? Examining its
technological, pedagogical, and social affordances, Educational Research Review, 21, 2017, 85-104
[10] Appel, O., Chiclana, F., Carter, J., & Fujita, H., A hybrid approach to the sentiment analysis problem at the sentence level, Knowledge-Based Systems, 108, 2016, 110-124
[11] Katz, G., Ofek, N., & Shapira, B, ConSent: Context-based sentiment analysis, Knowledge-Based
Systems, 84, 2015, 162-178
[12] Fersini, E., Messina, E., & Pozzi, F. A, Sentiment analysis: Bayesian Ensemble Learning, Decision
Support Systems, 68, 2014, 26-38

 
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
AldoGarca30
 

Recently uploaded (20)

Linux Systems Programming: Inter Process Communication (IPC) using Pipes
Linux Systems Programming: Inter Process Communication (IPC) using PipesLinux Systems Programming: Inter Process Communication (IPC) using Pipes
Linux Systems Programming: Inter Process Communication (IPC) using Pipes
 
DeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesDeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakes
 
DC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationDC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equation
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
 
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
 
Theory of Time 2024 (Universal Theory for Everything)
Theory of Time 2024 (Universal Theory for Everything)Theory of Time 2024 (Universal Theory for Everything)
Theory of Time 2024 (Universal Theory for Everything)
 
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
 
Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptx
Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptxOrlando’s Arnold Palmer Hospital Layout Strategy-1.pptx
Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptx
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
 
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best ServiceTamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
 
Online electricity billing project report..pdf
Online electricity billing project report..pdfOnline electricity billing project report..pdf
Online electricity billing project report..pdf
 
Max. shear stress theory-Maximum Shear Stress Theory ​ Maximum Distortional ...
Max. shear stress theory-Maximum Shear Stress Theory ​  Maximum Distortional ...Max. shear stress theory-Maximum Shear Stress Theory ​  Maximum Distortional ...
Max. shear stress theory-Maximum Shear Stress Theory ​ Maximum Distortional ...
 
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxS1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
 

KEYWORDS: Mobile Instant Messages (MIMs), Naïve Bayesian, Sentiment classification, TF-IDF, WhatsApp

Date of Submission: 30 January 2019
Date of Acceptance: 03 February 2019

I. INTRODUCTION

Web development has changed human interaction and communication substantially and has prompted huge and rapid growth in user-generated data [4]. It is estimated that 95% of available data is unstructured. To extract information and create knowledge from raw resources, the data needs to be processed properly and analyzed correctly, because the knowledge present in text data is not directly accessible to computers [1]. With the striking growth of social media platforms such as Facebook, Twitter, WhatsApp and WeChat, more and more people post texts online to express their opinions on social issues and share their reviews [5]. Considerable attention has focused on examining this data in terms of the sentiment it conveys, which has resulted in the emergence of sentiment analysis as a research field. It involves the computational analysis of user-generated data, such as reviews, to determine its orientation (positive, negative or neutral). There are two main reasons to automate sentiment analysis: first, the abundance of online data is beyond human analysis; and second, public opinion is a significant consideration when governments, institutions, and individuals are making decisions [4]. WhatsApp text data introduces additional problems such as word shortening, neologisms, and spelling variations. Traditional machine learning methods have proved inadequate to accomplish the task.
To address this problem, we propose a methodology based on binary sentiment classification at the aspect level. This work focuses on developing a hybrid sentiment classification framework for WhatsApp MIMs, using a combined recursive preprocessing and machine learning approach to achieve higher accuracy on a closed-domain dataset of 417 messages obtained from a WhatsApp group. The dataset is labeled manually and consists of 244 positive and 173 negative opinions. It is cleaned through preprocessing for better accuracy, and a naïve Bayesian machine learning algorithm is used to develop the model and test its suitability.

II. LITERATURE REVIEW

Reference [1] proposed a novel hybrid method with a recursive preprocessing approach for sentiment analysis on online Twitter data consisting of 6090 tweets. The dataset is labeled manually with 3111 positive, 1114 negative and 1865 neutral tweets. Multinomial Naïve Bayesian, Linear SVM and Neural Network algorithms are used to develop different hybrid models to test their suitability. Bag-of-words, TF-IDF and N-gram are used as feature engineering models. The hold-out splitting method is used to evaluate the accuracy, with 80% of the data used for training and 20% for testing. The model achieves 86.18% overall accuracy against a baseline accuracy of 82%.

Reference [2] compares six commonly used preprocessing techniques on two Twitter datasets for sentiment analysis. The recommended preprocessing techniques are lemmatization, replacing repetitions of punctuation, replacing contractions, and removing numbers. For classic machine learning sentiment analysis, a combination of five preprocessing techniques (replacing URLs and user mentions, replacing contractions, removing numbers, replacing repetitions of punctuation, and lemmatization) is reported as a winning combination.

Reference [3] uses preprocessing techniques and merges 10 existing sentiment lexicons to make a high-coverage lexical resource (HCLr). Seven classifiers are evaluated for efficiency, of which SVM, with 34.16%, outperforms all the others. The second-best classifier was found to be boosted Naïve Bayesian, with an overall accuracy of 30.61%.

Reference [4] proposes a two-phase hybrid method. The first phase, contextual analysis, consists of preprocessing techniques, while the second phase, ensemble clustering, consists of feature extraction and unsupervised machine learning algorithms. The sentiment lexicon SentiWordNet 3.0 is used to measure the strength of each term's polarity.
The proposed method increased the accuracy rate by an average of 3.0% when applying contextual analysis procedures. Feature weighting schemes including TF-IDF enhance the performance by 5-20%.

III. METHODOLOGY

Fig. 1 shows the methodology of this paper, which comprises 12 steps, explained below.

Figure 1 MIMs classification model

Data Collection: We created a WhatsApp group named "Internet; Good or Bad" consisting of 15 members. A total of 417 messages were collected and manually labeled as 244 "Favor" and 173 "Against". A copy of the group chat history was extracted using the email chat feature in ".txt" format and then converted into a ".csv" file for use [8].

Tokenization: A process of breaking down the corpus into individual elements [6]. It is also termed word segmentation [1].
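
As an illustration of the tokenization step, the following is a minimal sketch that uses only the Python standard library. The paper reports using NLTK for preprocessing; the regular expression, the function name `tokenize` and the sample message here are assumptions made for illustration and are not taken from the authors' code or dataset.

```python
import re

def tokenize(message):
    """Split a message into word tokens.

    A token is taken to be a run of letters, digits or apostrophes;
    this rule is an assumption, not the paper's tokenizer.
    """
    return re.findall(r"[A-Za-z0-9']+", message)

# Hypothetical message, not from the WhatsApp dataset
tokens = tokenize("Internet is good, but it wastes time!")
print(tokens)
```

A regex-based splitter like this also discards punctuation as a side effect, whereas the paper treats punctuation removal as a separate later step.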

Figure 2 MIM after tokenization

Removing Stop Words: Stop words are unnecessary words that commonly appear in text, such as so, and, or, the [2]. There are 153 English-language stop words that need to be removed because they carry little significance in most datasets [1].

Figure 3 MIM after removing stop words

Case Normalization: An irreversible process that converts the terms into lower case [1].

Figure 4 MIM after case normalization

Removing Punctuation: A classic technique in information retrieval and data mining that removes punctuation marks from the text [2].

Figure 5 MIM after removing punctuation

Stemming: Converts words into their root forms; it is effective for polarity detection [1] and generally yields good results [2].

Figure 6 MIM after stemming

Term Frequency-Inverse Document Frequency (TF-IDF): A commonly used scoring scheme that evaluates the importance of a token in a document and ultimately in the given dataset. It can be used to remove stop words, punctuation, and the most frequent and least frequent tokens successfully [1]. Term frequency measures how frequently a term occurs in a document. The inverse document frequency factor decreases the weight of terms that occur very frequently in the document set and increases the weight of terms that occur rarely [7]. Mathematically [4], the TF-IDF weight $w_{i,j}$ of term $i$ in document $j$ is

$$ w_{i,j} = tf_{i,j} \times idf_i \qquad (1) $$

where
▪ $tf_{i,j}$ is the normalized term frequency of term $i$ in document $j$
▪ $idf_i$ is the inverse document frequency of term $i$.

Bernoulli's Naïve Bayesian: Naïve Bayesian is a probabilistic classifier, widely used in sentiment classification, that applies Bayes' theorem to calculate the probability of a data sample belonging to a specific class. The classifier assumes that all features are completely independent of each other [3]. The probability of a sample belonging to a class can be computed using the following formula.

$$ P(c \mid x) = \frac{P(x \mid c)\, P(c)}{P(x)} \qquad (2) $$

where
▪ $P(c \mid x)$ is the posterior probability of the class (c, target) given the predictor (x, attributes).
▪ $P(c)$ is the prior probability of the class.
▪ $P(x \mid c)$ is the likelihood, i.e. the probability of the predictor given the class.
▪ $P(x)$ is the prior probability of the predictor.

The Bernoulli Naïve Bayesian algorithm is a modified form of the traditional Naïve Bayesian algorithm, in which the weight of each term is 1 if it occurs in the sentence and 0 if not [2].

IV. TOOLS AND TECHNOLOGIES

Python 3.0 (Anaconda Python Distribution) is used to obtain the results of the model. Python libraries such as NumPy (Numerical Python), NLTK (Natural Language Toolkit), scikit-learn and Matplotlib are used for scientific computing (arrays, mathematical calculations), preprocessing, machine learning and plotting (graphs etc.) respectively.

Figure 7 Tools and technologies used

V. RESULTS AND DISCUSSION

Hold-out splitting is used to evaluate the accuracy of the proposed model, with 75% of the data used for training and 25% for testing the classifier. The model attained an accuracy of 81.73%, against a baseline accuracy of 69.23%. The results show that the proposed hybrid binary sentiment classification model with preprocessing techniques has achieved satisfactory results, with about 12 percentage points higher accuracy.

In Fig. 8, Graph 1(a) shows the message count: there were 417 messages in total, of which 244 and 173 are labeled as favor and against respectively. Graph 1(b) shows the results after elimination of 4 repeated messages, which left 240 favor messages.

Figure 8 Impact of preprocessing at message level
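
To make the Bernoulli Naïve Bayesian classifier described in Section III concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation (the paper reports using scikit-learn). The function names, the Laplace smoothing constant `alpha` and the toy stemmed messages are assumptions made for illustration. Because the Bernoulli variant uses only the presence or absence of a term, this sketch ignores term weights.

```python
import math
from collections import defaultdict

def train_bernoulli_nb(docs, labels, alpha=1.0):
    """Fit a Bernoulli Naive Bayes model on tokenized documents.

    docs: list of token lists; labels: list of class names.
    alpha: Laplace smoothing constant (an assumption, not from the paper).
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    classes = sorted(set(labels))
    n_docs = {c: 0 for c in classes}
    doc_freq = {c: defaultdict(int) for c in classes}
    for doc, label in zip(docs, labels):
        n_docs[label] += 1
        for tok in set(doc):  # presence only: a term counts once per message
            doc_freq[label][tok] += 1
    # P(c): prior probability of each class
    prior = {c: n_docs[c] / len(docs) for c in classes}
    # P(term present | c), smoothed so that no probability is 0 or 1
    cond = {
        c: {t: (doc_freq[c][t] + alpha) / (n_docs[c] + 2 * alpha) for t in vocab}
        for c in classes
    }
    return vocab, classes, prior, cond

def predict(model, doc):
    """Return the class with the highest log posterior (up to the constant P(x))."""
    vocab, classes, prior, cond = model
    present = set(doc)
    best, best_score = None, -math.inf
    for c in classes:
        score = math.log(prior[c])
        for t in vocab:
            p = cond[c][t]
            # Absent terms contribute too: this is what makes the model Bernoulli
            score += math.log(p) if t in present else math.log(1.0 - p)
        if score > best_score:
            best, best_score = c, score
    return best

# Hypothetical stemmed messages, not taken from the WhatsApp dataset
train_docs = [
    ["internet", "help", "learn"],
    ["internet", "good", "inform"],
    ["internet", "wast", "time"],
    ["internet", "bad", "addict"],
]
train_labels = ["favor", "favor", "against", "against"]
model = train_bernoulli_nb(train_docs, train_labels)
print(predict(model, ["internet", "good", "learn"]))
print(predict(model, ["wast", "time"]))
```

The denominator P(x) of Eq. (2) is the same for every class, so the sketch compares only log P(c) plus the summed log-likelihoods. Working in log space avoids numerical underflow when the vocabulary is large.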

In Fig. 9, Graph 2(a) shows the results before preprocessing, while Graph 2(b) shows the results after preprocessing. It can be clearly concluded that the preprocessing techniques trim the lengthy and verbose messages into important, useful tokens, yielding a cleaner dataset and better results.

Figure 9 Impact of preprocessing at token level

VI. CONCLUSION AND FUTURE WORK

The proposed model is based on binary sentiment classification at the aspect level and develops a hybrid sentiment classification framework with preprocessing techniques to process a WhatsApp MIM dataset. A machine learning technique is used to develop a sentiment classification model with the TF-IDF feature weighting scheme. The model attains satisfactory results compared to the baseline accuracy. For future work, it is suggested to enlarge the dataset, as more data is expected to yield better accuracy. Furthermore, more preprocessing techniques could be applied in a well-ordered winning combination to extract significant features for sentiment classification.

REFERENCES

[1] Alvi, M.B., Mahoto, A.N., Alvi, M., Unar, M.A., Shaikh, M.A., Hybrid Classification Model for Twitter Data- A Recursive Preprocessing Approach, 5th International Multi-topic ICT Conference (IMTIC), 2018, 1-6
[2] Symeonidis, S., Effrosynidis, D., & Arampatzis, A., A comparative evaluation of pre-processing techniques and their interactions for twitter sentiment analysis, Expert Systems with Applications, 110, 2018, 298-310
[3] Abdi, A., Shamsuddin, S. M., Hasan, S., & MD, J. P., Machine learning-based multi-documents sentiment-oriented summarization using linguistic treatment, Expert Systems with Applications, 109, 2018, 66-85
[4] Al-Sharuee, M. T., Liu, F., & Pratama, M., Sentiment analysis: An automatic contextual analysis and ensemble clustering approach and comparison, Data and Knowledge Engineering, 115, 2018, 194-213
[5] Liu, Y., Bi, J.W., & Fan, Z.P., Multi-class sentiment classification: The experimental comparisons of feature selection and machine learning algorithms, Expert Systems with Applications, 80, 2017, 323-339
[6] A. Faraz, An elaboration of text categorization and automatic text classification through mathematical and graphical modeling, An International Journal (CSEIJ), 5(2), 2015, 239-248
[7] Ahmed, I., Guan, D., & Chung, C.T., SMS Classification Based on Naïve Bayes Classifier and Apriori Algorithm Frequent Itemset, International Journal of Machine Learning and Computing, 4(2), 2014
[8] Patil, S., WhatsApp Group Data Analysis with R, International Journal of Computer Applications, 154(4), 2016, 31-36
[9] Tang, Y., Hew, K.F., Is mobile instant messaging (MIM) useful in education? Examining its technological, pedagogical, and social affordances, Educational Research Review, 21, 2017, 85-104
[10] Appel, O., Chiclana, F., Carter, J., & Fujita, H., A hybrid approach to the sentiment analysis problem at the sentence level, Knowledge-Based Systems, 108, 2016, 110-124
[11] Katz, G., Ofek, N., & Shapira, B., ConSent: Context-based sentiment analysis, Knowledge-Based Systems, 84, 2015, 162-178
[12] Fersini, E., Messina, E., & Pozzi, F. A., Sentiment analysis: Bayesian Ensemble Learning, Decision Support Systems, 68, 2014, 26-38