POLITICAL PREDICTION ANALYSIS USING TEXT MINING AND DEEP LEARNING
PROJECT GROUP DETAILS :
GROUP NO : 34
DESHPANDE VISHWAMBHAR
GAIKWAD DINESH
KOLHE SNEHAL
INDROL PRATIKSHA
Introduction :
• Previous implementations of prediction analysis on Twitter data did not fully analyse live Twitter data.
Thus, we propose a system that predicts results from live Twitter data and presents them in statistical
form. With the help of graphs, reports, trends and tweets, one can predict the future results of a
political party, and the output can also be used to create campaigns.
• The proposed system determines current sentiment on Twitter using the open-access Twitter API, covering
opinions from different content structures such as latest news, audits, articles and social media posts.
It also uses a deep learning method to study historic data for predicting future results.
Literature Overview:
Name | Year | Author | Objective | Contribution | Methodology | Conclusion
Computing For Sustainable Global Development | 2016 | Jinyu Chen & Ao Yang | Development of the dataset and classification | Fetching and loading data into the training model | Lexical analyzer | Classified dataset for preprocessing
"Microblog Sentiment Analysis Algorithm Research and Implementation Based on Classification" | 2015 | Zeng Hu, Lo Ngqin | Sentiment analysis on the data by classification | Classification algorithm implementation | Classification algorithm | Prediction based on the classification algorithm
Weblogs and Social Media | 2014 | Shubo Liu & Xiao Ya | Analysis based on social media | Real-time data sentiment prediction | Raw dataset, sentiment analysis | Prediction based on real-time data in graphical representation
"Sentiment analysis in twitter using machine learning techniques." | 2013 | Saurav Sarkar | Implementation on Twitter dataset using ML | Unsupervised data classification and implementation | ML, NLP | Prediction on current sentiment based on live Twitter data
Existing System :
• Existing systems such as Support Vector Machines (SVM) work on labelled data
and produce errors when given unlabelled data.
• Prediction using the Electoral College model does not study historic data for
prediction; instead it accepts manuscripts for analysis.
• Lexicon-based sentiment analysis does not handle different kinds of data, and it
does not support multilingual sentiment.
• LEXIPERS does not predict well because of the cost of annotated data, and it
depends on the structure of the language.
• Prediction using aggregation risks loss of data during preprocessing, so the
result may differ from the actual result.
Proposed System :
• A live Twitter dataset and a Google Trends dataset, both obtained through APIs, are
used as input.
• Data preprocessing and data classification are used to train the data model for prediction.
• Sentiment analysis is performed on the live dataset, with a deep learning algorithm for
classification of data.
• The prediction is represented in the form of graphs, scatter plots, trends and tweet
popularity to support a strong conclusion.
• The system is divided into two parts for processing, i.e.
1. Machine learning algorithm for classification of data.
2. Sentiment analysis for prediction on the live Twitter dataset and Google Trends dataset.
1. Local Control Environment
• The system will interact with the user through a GUI built using
Angular 1.0, HTML, CSS and JavaScript, with Visual Studio as the IDE.
• Backend operations will be handled by Django, with MongoDB as the
database.
2. Central Control Environment
• Input Dataset:
The dataset can be structured or unstructured. The data in our system is in text format. It is imported
from the Twitter API for sentiment analysis and from the Google Trends API for deep learning on
historic data.
• NLTK:
NLTK (Natural Language Toolkit) is a platform for building Python programs that work with human
language data for natural language processing. It includes text processing libraries.
• Scikit-learn:
A machine learning library for Python that includes algorithms such as SVM, random forests and
k-nearest neighbours. It is built on NumPy and SciPy.
• SciPy:
An open source library for scientific computing. The wider SciPy ecosystem includes tools such as pandas and SymPy.
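As an illustration of the text preprocessing that such a toolkit supports, here is a minimal sketch using only the Python standard library as a stand-in for NLTK. The function name `preprocess`, the stop-word list and the sample tweet are hypothetical and are not taken from the project.

```python
import re

# Hypothetical stop-word list for illustration; NLTK ships a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "for", "in", "on"}

def preprocess(tweet):
    """Lowercase a tweet, strip URLs, mentions and the '#' sign, then tokenize."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop links
    text = re.sub(r"@\w+", " ", text)          # drop user mentions
    text = text.replace("#", " ")              # keep the hashtag word, drop '#'
    tokens = re.findall(r"[a-z']+", text)      # crude word tokenizer (letters only)
    return [t for t in tokens if t not in STOP_WORDS]

# Made-up sample tweet, not real data.
sample = "The new policy is GREAT for farmers! #Budget2016 @someone https://t.co/xyz"
print(preprocess(sample))
```

The resulting token list is the kind of input a classifier can consume; a real system would likely rely on NLTK's tokenizers and stop-word corpora instead of these hand-written rules.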
System Architecture:
Level 0 Dataflow Diagram:
Level 1 Dataflow Diagram:
Class Diagram:
Use Case Diagram:
Activity Diagram (Sentiment Analysis):
State Diagram (Sentiment Analysis):
Algorithm:
Algorithm for Prediction Based on Sentiment Analysis
Step 1. Collect data via the Twitter API and Google Trends API.
Step 2. Store the collected data in an SQLite database.
Step 3. Preprocess the raw data into key-value pairs.
Step 4. Perform sentiment analysis using a Naive Bayes classifier.
Step 5. Generate the prediction result in statistical form.
Step 6. Output the prediction analysis result as statistical data
such as graphs, reports, trends and tweets.
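The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the tweets are a hard-coded hypothetical sample instead of live API data, the table layout and function names are invented, and Step 4 is replaced by a simple word-list polarity check so that the sketch stays self-contained.

```python
import sqlite3
from collections import Counter

# Hypothetical sample standing in for tweets fetched from the Twitter API (Step 1).
SAMPLE_TWEETS = [
    ("Party A", "great development work by the party"),
    ("Party A", "bad roads and poor planning"),
    ("Party B", "good schools and great hospitals"),
]

# Tiny illustrative word lists; the slides name a Naive Bayes classifier for Step 4.
POSITIVE = {"great", "good"}
NEGATIVE = {"bad", "poor"}

def store(conn, tweets):
    """Step 2: persist raw tweets in SQLite."""
    conn.execute("CREATE TABLE IF NOT EXISTS tweets (party TEXT, text TEXT)")
    conn.executemany("INSERT INTO tweets VALUES (?, ?)", tweets)
    conn.commit()

def to_key_value(row):
    """Step 3: turn a raw row into a key-value record."""
    party, text = row
    return {"party": party, "tokens": text.lower().split()}

def label(tokens):
    """Step 4 placeholder: word-count polarity instead of a trained classifier."""
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

def summarize(conn):
    """Step 5: count sentiment labels per party."""
    summary = {}
    for row in conn.execute("SELECT party, text FROM tweets"):
        record = to_key_value(row)
        summary.setdefault(record["party"], Counter())[label(record["tokens"])] += 1
    return summary

conn = sqlite3.connect(":memory:")  # in-memory database for the sketch
store(conn, SAMPLE_TWEETS)
summary = summarize(conn)
for party, counts in summary.items():  # Step 6: report the statistics
    print(party, dict(counts))
```

The per-party counts produced in Step 5 are the kind of statistical data that could then be plotted as graphs or trends.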
Algorithm (additional)
Other Algorithms Used:
• Naive Bayes Classifier:
It is a simple probabilistic classifier based on Bayes' theorem. It assumes
that every feature is independent of the others. The label for each input
feature vector is assigned using the formula below.
P(label | features) = P(label) * P(features | label) / P(features)
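To make the formula concrete, here is a small from-scratch sketch of a Naive Bayes text classifier. The training sentences are hypothetical, and the add-one (Laplace) smoothing is an assumption added so that unseen words do not produce zero probabilities; it is not something the slides specify.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """Count labels and per-label word frequencies from (tokens, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab

def posterior(tokens, model):
    """Return P(label | features) for each label, normalised by P(features)."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    log_joint = {}
    for label, count in label_counts.items():
        score = math.log(count / total)  # log P(label)
        denom = sum(word_counts[label].values()) + len(vocab)
        for t in tokens:
            # log P(feature | label) with add-one smoothing (an assumption here)
            score += math.log((word_counts[label][t] + 1) / denom)
        log_joint[label] = score
    # P(features) is the sum of the joint probabilities over all labels
    evidence = sum(math.exp(s) for s in log_joint.values())
    return {label: math.exp(s) / evidence for label, s in log_joint.items()}

# Hypothetical labelled examples, not project data.
TRAINING = [
    (["great", "work"], "positive"),
    (["good", "great", "policy"], "positive"),
    (["bad", "policy"], "negative"),
    (["poor", "work"], "negative"),
]
model = train(TRAINING)
probs = posterior(["great", "policy"], model)
print(max(probs, key=probs.get), probs)
```

Working in log space avoids numerical underflow when many features are multiplied together, and dividing by the summed joint probabilities plays the role of P(features) in the formula.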
Application Area :
• Understanding public opinion about political leaders and their activities in terms
of development.
• Business and government intelligence, for understanding customer
attitudes and market trends.
• Detection of insensitive content on social media platforms such as Facebook,
Twitter and Instagram.
• Improving customer experience to grow sales and profit.
• Analysing the return on investment of social media
marketing.
Future Scope :
• In future, multilingual sentiment analysis can be added to analyse
tweets in different languages for more accurate prediction.
• The dataset can be enlarged by including other large social media platforms such as
Facebook and LinkedIn for sentiment analysis.
• A strong prediction analysis system can be built to analyse and improve
GDP and growth in different sectors such as education, defence, culture and
manufacturing.
Conclusion :
Our system generates location-based predictions in the form of statistical data.
The system uses a Naive Bayes classifier, natural language processing and
sentiment analysis on live data, which improves accuracy in prediction analysis and
reduces loss of data.
The use of an unsupervised learning algorithm also reduces model training effort.
References :
1. Alexander Pak and Patrick Paroubek, "Twitter as a corpus for sentiment analysis and opinion mining," in
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), May
2010.
2. Y. Yang and F. Zhou, "Microblog Sentiment Analysis Algorithm Research and Implementation Based on
Classification," 2015 14th International Symposium on Distributed Computing and Applications for Business
Engineering and Science (DCABES), 2015.
3. P. D. Turney, "Thumbs up or thumbs down?: semantic orientation applied to unsupervised classification of
reviews," presented at the Proceedings of the 40th Annual Meeting on Association for Computational Linguistics,
Philadelphia, Pennsylvania, 2002.
4. M. Taboada, J. Brooke, M. Tofiloski, K. Voll, and M. Stede, "Lexicon-based methods for sentiment analysis,"
Comput. Linguist., vol. 37, pp. 267-307, 2011.
5. "The Streaming APIs | Twitter Developers," dev.twitter.com, 2016. [Online]. Available:
https://dev.twitter.com/streaming/overview. [Accessed: 25-Apr-2016].
6. M. S. Neethu and R. Rajasree, "Sentiment analysis in twitter using machine learning techniques,"
Computing, Communications and Networking Technologies (ICCCNT), 2013 Fourth International Conference
on. IEEE, 2013.