The document proposes a multi-tier sentiment analysis system called MSABDP to analyze large-scale social media data more efficiently. MSABDP uses Hadoop for its distributed processing and storage capabilities. It collects Twitter data using Apache Flume and stores it in HDFS. It then applies a multi-tier classification approach combining lexicon-based and machine learning techniques to classify tweets into multiple sentiment classes, reducing complexity compared to single-tier architectures. Evaluation on real Twitter data showed MSABDP improved classification accuracy over single-tier approaches by 7%.
An adaptive clustering and classification algorithm for Twitter data streamin... (TELKOMNIKA JOURNAL)
Ongoing big data from social network sites like Twitter or Facebook has been a fascinating source of investigation for researchers in recent decades because of aspects including timeliness, accessibility and popularity; however, there may be a trade-off in accuracy. Moreover, clustering of Twitter data has caught the attention of researchers. As such, an algorithm that can cluster data in less computational time, especially for data streaming, is needed. The presented adaptive clustering and classification algorithm for data streaming in Apache Spark overcomes the existing problems and is processed in two phases. In the first phase, the pre-processed input Twitter data is clustered using Improved Fuzzy C-means clustering, and the clustering is further improved by an Adaptive Particle Swarm Optimization (PSO) algorithm; the clustered data stream is then evaluated using the Spark engine. In the second phase, the pre-processed input Higgs data is classified using a modified support vector machine (MSVM) classifier with grid search optimization. Finally, the optimized information is evaluated in the Spark engine, and the evaluated value is used to derive the resulting confusion matrix. The proposed work uses the Twitter dataset and the Higgs dataset for data streaming in Apache Spark. The computational experiments demonstrate the superiority of the presented approach over existing methods in terms of precision, recall, F-score, convergence, ROC curve and accuracy.
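As a rough illustration of the first-phase clustering step, here is a minimal fuzzy c-means sketch in plain Python (1-D points, deterministic evenly spaced initialization); the paper's improved FCM variant and the adaptive PSO refinement are not reproduced here:

```python
def fuzzy_c_means(points, c=2, m=2.0, iters=50):
    """Minimal fuzzy c-means on 1-D points: memberships u[i][j] say how
    strongly point i belongs to cluster j; centers are membership-
    weighted means."""
    lo, hi = min(points), max(points)
    centers = [lo + (hi - lo) * j / (c - 1) for j in range(c)]
    u = []
    for _ in range(iters):
        # Membership update (standard FCM formula, exponent 2/(m-1)).
        u = []
        for x in points:
            d = [abs(x - v) + 1e-12 for v in centers]
            u.append([1.0 / sum((d[j] / d[k]) ** (2.0 / (m - 1.0))
                                for k in range(c)) for j in range(c)])
        # Center update: membership-weighted mean of the points.
        centers = [sum(u[i][j] ** m * points[i] for i in range(len(points)))
                   / sum(u[i][j] ** m for i in range(len(points)))
                   for j in range(c)]
    return centers, u

centers, memberships = fuzzy_c_means([1.0, 1.1, 0.9, 8.0, 8.2, 7.9])
```

A PSO layer, as in the paper, would treat candidate center positions as particles and move them toward the best-scoring configurations instead of relying on the fixed-point update alone.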
Evaluating and Enhancing Efficiency of Recommendation System using Big Data A... (IRJET Journal)
This document discusses evaluating and enhancing the efficiency of recommendation systems using big data analytics. It begins with an abstract that outlines recommendation systems, collaborative filtering, and the need for big data analytics due to large datasets. It then discusses specific collaborative filtering techniques like user-based, item-based, and matrix factorization. It describes challenges like scalability that big data analytics can help address. The document evaluates recommendation algorithms using metrics like MAE, RMSE, precision and time taken on movie recommendation datasets. It aims to design an efficient recommendation system using the best techniques.
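The MAE and RMSE metrics used in that evaluation are simple to compute directly; a minimal sketch with hypothetical rating values:

```python
import math

def mae(actual, predicted):
    """Mean absolute error of predicted vs. actual ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error; penalizes large errors more than MAE."""
    return math.sqrt(sum((a - p) ** 2
                         for a, p in zip(actual, predicted)) / len(actual))

actual, predicted = [4.0, 3.0, 5.0, 2.0], [3.5, 3.0, 4.0, 3.0]
print(mae(actual, predicted), rmse(actual, predicted))  # 0.625 0.75
```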
An Efficient Approach for Clustering High Dimensional Data (IJSTA)
The document discusses clustering high dimensional data using an efficient approach called "Big Data Clustering using k-Medoids BAT Algorithm" (KMBAT). KMBAT simultaneously considers all data points as potential exemplars and exchanges real-valued messages between data points until a high-quality set of exemplars and corresponding clusters emerges. It is demonstrated on Facebook user profile data stored in an HDInsight Hadoop cluster. KMBAT finds better clustering solutions than other methods in less time for high dimensional big data.
This document summarizes a research paper on analyzing sentiments from Twitter data using data mining techniques. The paper presents an approach for analyzing user sentiments using data mining classifiers and compares the performance of single classifiers versus an ensemble of classifiers for sentiment analysis. Experimental results show that the k-nearest neighbor classifier achieved very high predictive accuracy, and single classifiers outperformed the ensemble approach.
Are Positive or Negative Tweets More "Retweetable" in Brazilian Politics? (Molly Gibbons)
This document summarizes an analysis of tweets containing the terms "Brazil" and "Michel Temer" to understand the political and economic scenario in Brazil. RapidMiner was used to collect tweets over 17 days and the Rosette Text Toolkit categorized tweets and analyzed sentiment. For "Michel Temer", there was a weak to moderate negative correlation between sentiment and retweets, and 75% of tweets were negative. For tweets about "Brazil" categorized as law/politics, 62% were negative and the most mentioned entities were the Senate, President, and Supreme Court. The analysis demonstrates how RapidMiner and Rosette can be used together to understand sentiment in social media posts about political topics.
Unification Algorithm in Hefty Iterative Multi-tier Classifiers for Gigantic ... (IJAIEM)
Dr.G.Anandharaj1, Dr.P.Srimanchari2
1Associate Professor and Head, Department of Computer Science
Adhiparasakthi College of Arts and Science (Autonomous), Kalavai, Vellore (Dt) -632506
2 Assistant Professor and Head, Department of Computer Applications
Erode Arts and Science College (Autonomous), Erode (Dt) - 638001
ABSTRACT
With the unpredictable increase in mobile apps, more and more threats have migrated from the traditional PC client to the mobile device. In contrast to the traditional Windows/Intel alliance on the PC, the Android ecosystem dominates the mobile Internet, and apps have replaced PC client software as the foremost target of malicious use. In this paper, to improve the security status of recent mobile apps, we propose a methodology to evaluate mobile apps based on a cloud computing platform and data mining. Compared with traditional methods, such as permission-pattern-based methods, it combines dynamic and static analysis to comprehensively evaluate an Android application. The Internet of Things (IoT) denotes a worldwide network of interconnected, uniquely addressable objects communicating via standard protocols. Accordingly, to prepare us for the forthcoming invasion of things, data fusion can be used to manipulate and manage such data in order to improve processing efficiency and provide advanced intelligence. In this paper, we propose an efficient multidimensional fusion algorithm for IoT data based on partitioning. Finally, attribute reduction and rule extraction methods are used to obtain the synthesis results. The correctness and effectiveness of this algorithm are illustrated by proving a few theorems and by simulation. This paper introduces and investigates large iterative multi-tier ensemble (LIME) classifiers specifically tailored for big data. These classifiers are very large, but quite easy to generate and use; they can be so large that it makes sense to use them only for big data. Our experiments compare LIME classifiers with various base classifiers and standard ensemble meta-classifiers. The results demonstrate that LIME classifiers can significantly increase classification accuracy, outperforming both the base classifiers and standard ensemble meta-classifiers.
Keywords: LIME classifiers, ensemble Meta classifiers, Internet of Things, Big data
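The multi-tier ensemble idea (base classifiers combined by a vote, then ensembles of ensembles) can be sketched as follows; this is a toy illustration with hypothetical threshold classifiers, not the authors' LIME implementation:

```python
from collections import Counter

def majority_vote(classifiers):
    """Combine classifiers into one by majority vote (one ensemble tier)."""
    def ensemble(x):
        votes = [clf(x) for clf in classifiers]
        return Counter(votes).most_common(1)[0][0]
    return ensemble

# Toy base classifiers: threshold rules on a single feature.
base = [lambda x, t=t: int(x > t) for t in (0.2, 0.5, 0.8)]

# Tier 1: three small ensembles; tier 2: an ensemble of ensembles.
tier1 = [majority_vote(base) for _ in range(3)]
tier2 = majority_vote(tier1)

print(tier2(0.6))  # two of three thresholds fire -> 1
```

In a real LIME classifier each tier would use different, trained base learners, and the tiers can be iterated many times, which is what makes the resulting classifier so large.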
1. The document proposes techniques to improve search performance by matching schemas between structured and unstructured data sources.
2. It involves constructing schema mappings using named entities and schema structures. It also uses strategies to narrow the search space to relevant documents.
3. The techniques were shown to improve search accuracy and reduce time/space complexity compared to existing methods.
This document discusses social data mining. It begins by defining data, information, and knowledge. It then defines data mining as extracting useful unknown information from large datasets. Social data mining is defined as systematically analyzing valuable information from social media, which is vast, noisy, distributed, unstructured, and dynamic. Common social media platforms are described. Graph mining and text mining are discussed as important techniques for social data mining. The generic social data mining process of data collection, modeling, and various mining methods is outlined. OAuth 2.0 authorization is also summarized as an important process for applications to access each other's data.
1) The document discusses using k-means clustering to analyze big data. K-means is an algorithm that partitions data into k clusters based on similarity.
2) It provides background on big data characteristics like volume, variety, and velocity. It also discusses challenges of heterogeneous, decentralized, and evolving data.
3) The document proposes applying k-means clustering to big data to map data into clusters according to its properties in a fast and efficient manner. This allows statistical analysis and knowledge extraction from large, complex datasets.
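A minimal 1-D k-means sketch illustrates the partitioning step described above (toy data; a big-data deployment would distribute the assignment step, e.g. over MapReduce):

```python
def kmeans(points, k=2, iters=20):
    """Minimal 1-D k-means: assign each point to its nearest center,
    then recompute each center as the mean of its cluster."""
    centers = [min(points) + (max(points) - min(points)) * j / (k - 1)
               for j in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in points:
            j = min(range(k), key=lambda j: abs(x - centers[j]))
            clusters[j].append(x)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return centers

print(kmeans([1.0, 1.2, 0.8, 9.0, 9.2, 8.8], k=2))  # centers near 1.0 and 9.0
```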
Scalable recommendation with social contextual information (eSAT Journals)
Abstract: Recommender systems are used to achieve effective and useful results in social networks. Social recommendation builds on the social network structure, but it is challenging to fuse social contextual factors, derived from users' motivations for social behavior, into social recommendation. Here, we introduce two contextual factors in recommender systems used to adapt the results: a) individual preference and b) interpersonal influence. Individual preference analyzes the social interest in an item's content against a user's interests and adopts only that user's recommended results. Interpersonal influence analyzes user-user interaction and specific social relations. Beyond this, we propose a novel probabilistic matrix factorization method to fuse them in a latent space. The scalable algorithm provides useful results by analyzing the ranking probability of each user's social contextual information and also incrementally processes the contextual data in large datasets. Keywords: social recommendation, individual preference, interpersonal influence, matrix factorization
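The matrix factorization core can be sketched with plain SGD on observed ratings; the social contextual terms the paper fuses in (individual preference, interpersonal influence) are omitted here, and all data values are hypothetical:

```python
import random

def matrix_factorization(ratings, n_users, n_items, k=2, lr=0.01,
                         reg=0.02, epochs=500, seed=0):
    """Learn latent user/item vectors by SGD on observed
    (user, item, rating) triples, with L2 regularization."""
    rng = random.Random(seed)
    U = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(U[u][f] * V[i][f] for f in range(k))
            err = r - pred
            for f in range(k):
                U[u][f] += lr * (err * V[i][f] - reg * U[u][f])
                V[i][f] += lr * (err * U[u][f] - reg * V[i][f])
    return U, V

ratings = [(0, 0, 5.0), (0, 1, 1.0), (1, 0, 4.0), (1, 1, 1.0)]
U, V = matrix_factorization(ratings, n_users=2, n_items=2)
```

A social-contextual variant would add extra terms to the loss that pull a user's latent vector toward those of trusted friends.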
An Improvised Fuzzy Preference Tree Of CRS For E-Services Using Incremental A... (IJTET Journal)
This document describes a proposed algorithm for improving recommendation systems for e-services. It involves the following key steps:
1. Clustering customer transaction histories to group similar purchase patterns and derive customer-based recommendations.
2. Using incremental association rule mining on the transaction data to detect frequently purchased item sets and relationships between items.
3. Developing a fuzzy model to classify customers and provide dynamic recommendations tailored to different customer types. The recommendations will be based on matching customer preferences and purchase histories to specific product sets.
4. The algorithm clusters transactions, mines association rules incrementally as new data is added, and generates recommendations by classifying customers and matching them to relevant product clusters. This provides a personalized and adaptive recommendation service.
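The incremental association-rule step above can be sketched as running itemset counts that are updated per transaction rather than recomputed from scratch (a toy sketch; rule confidence and the fuzzy classification layer are omitted, and the grocery items are hypothetical):

```python
from itertools import combinations
from collections import Counter

class IncrementalFrequentItemsets:
    """Keep running counts of small itemsets and update them as each
    transaction arrives, instead of re-scanning the whole history."""
    def __init__(self, min_support=2, max_size=2):
        self.counts = Counter()
        self.min_support = min_support
        self.max_size = max_size

    def add_transaction(self, items):
        # Count every subset up to max_size once per transaction.
        items = sorted(set(items))
        for size in range(1, self.max_size + 1):
            for combo in combinations(items, size):
                self.counts[combo] += 1

    def frequent(self):
        return {s: c for s, c in self.counts.items()
                if c >= self.min_support}

miner = IncrementalFrequentItemsets(min_support=2)
for t in [["bread", "milk"], ["bread", "milk", "eggs"], ["eggs"]]:
    miner.add_transaction(t)
```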
Detailed Investigation of Text Classification and Clustering of Twitter Data ... (IJTSRD)
Recently there has been a rapid growth in data. This paper presents a methodology to investigate the text classification of data gathered from Twitter. In this study, sentiment analysis has been performed on online comment data, giving a picture of how to discover people's demands. Ziya Fatima | Er. Vandana, "Detailed Investigation of Text Classification and Clustering of Twitter Data for Business Analytics", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-5, Issue-2, February 2021. URL: https://www.ijtsrd.com/papers/ijtsrd38527.pdf Paper URL: https://www.ijtsrd.com/engineering/computer-engineering/38527/detailed-investigation-of-text-classification-and-clustering-of-twitter-data-for-business-analytics/ziya-fatima
Time-Ordered Collaborative Filtering for News Recommendation (IRJET Journal)
This document presents a novel news recommendation system, Time-Ordered Collaborative Filtering for News Recommendation. The system filters and recommends news to users based on their choices and preferences using collaborative filtering and content-based filtering. It extracts news data from popular online newspapers and stores it in a database. The recommendation framework then preprocesses the news data and applies techniques such as sparse-matrix computation to rank news and recommend the top items to each user based on their interests.
Structural Balance Theory Based Recommendation for Social Service Portal (IJTSRD)
There is enormous data present in our world; therefore, accessing the most accurate information is becoming more difficult and complicated. As a result, much relevant information gets missed, which leads to much duplication of work and effort. Due to the huge volume of search results, the user will generally have difficulty identifying the relevant ones. To solve this problem, a recommendation system is used. A recommendation system is an information filtering system, used to predict the relevance of retrieved information according to the user's needs for some criteria. Hence, it can provide the user with the results that best fit their needs. The services provided through the web normally return huge numbers of records for any requested item or service, and a proper recommendation system is used to filter these results. A recommendation system can be improved further if supported with trust information; that is, recommendations are prioritized according to their level of trust. Recommending appropriate social service needs to the target volunteers will become key to ensuring the continuous success of social service. Today, many social service systems do not adopt any recommendation techniques; they provide advertisements or highlight requests for a small commission. G. Banupriya | M. Anand, "Structural Balance Theory-Based Recommendation for Social Service Portal", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-5, Issue-4, June 2021. URL: https://www.ijtsrd.com/papers/ijtsrd41216.pdf Paper URL: https://www.ijtsrd.com/engineering/software-engineering/41216/structural-balance-theorybased-recommendation-for-social-service-portal/g-banupriya
A real-time big data sentiment analysis for Iraqi tweets using Spark streaming (journal BEEI)
The scale of data streaming in social networks, such as Twitter, is increasing exponentially. Twitter is one of the most important and suitable big data sources for machine learning research in terms of analysis, prediction, knowledge extraction, and opinions. People use the Twitter platform daily to express their opinions, a fundamental fact that influences their behavior. In recent years, the flow of the Iraqi dialect has increased, especially on the Twitter platform, and sentiment analysis and opinion mining for different dialects have become hot topics in data science research. In this paper, we attempt to develop a real-time analytic model for sentiment analysis and opinion mining of Iraqi tweets using Spark streaming, and also to create a dataset for researchers in this field. The Twitter handle Bassam AlRawi is the case study here. The new method is better suited to current machine learning applications and fast online prediction.
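A minimal dictionary-based sentiment scorer of the kind such pipelines start from might look like this; the English mini-lexicon is a hypothetical stand-in for an Iraqi-dialect lexicon, and the Spark streaming plumbing is omitted:

```python
# Hypothetical mini-lexicon mapping words to polarities; a real system
# would use a full dialect lexicon or a trained classifier.
LEXICON = {"good": 1, "great": 1, "love": 1,
           "bad": -1, "terrible": -1, "hate": -1}

def score(tweet):
    """Sum word polarities: >0 positive, <0 negative, 0 neutral."""
    return sum(LEXICON.get(w, 0) for w in tweet.lower().split())

def label(tweet):
    s = score(tweet)
    return "positive" if s > 0 else "negative" if s < 0 else "neutral"

stream = ["I love this", "terrible service", "hello world"]
print([label(t) for t in stream])  # ['positive', 'negative', 'neutral']
```

In a streaming deployment, `label` would be applied per micro-batch of incoming tweets rather than to a static list.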
Distributed Link Prediction in Large Scale Graphs using Apache Spark (Anastasios Theodosiou)
This document summarizes an approach to distributed link prediction in large graphs using Apache Spark. It discusses using machine learning techniques like locality sensitive hashing to predict links between nodes in a graph based on document similarity metrics and other structural features. The approach is tested on a graph of 27,770 academic papers linked by 352,857 citations. Both supervised and unsupervised machine learning methods are explored, including treating it as a binary classification problem and using locality sensitive hashing and MinHashLSH through Apache Spark to efficiently handle the large data volumes. The results suggest this distributed approach can accurately predict new links in large graphs.
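The MinHash idea behind MinHashLSH can be sketched in a few lines: similar token sets agree on many per-seed minimum hashes, so signature agreement estimates Jaccard similarity. This is a toy single-machine version, not Spark's implementation:

```python
import hashlib

def minhash_signature(tokens, num_hashes=128):
    """For each of num_hashes seeded hash functions, keep the minimum
    hash over the token set; that vector is the MinHash signature."""
    sig = []
    for seed in range(num_hashes):
        sig.append(min(
            int(hashlib.md5(f"{seed}:{t}".encode()).hexdigest(), 16)
            for t in tokens))
    return sig

def estimated_jaccard(sig_a, sig_b):
    """Fraction of agreeing positions approximates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

a = minhash_signature({"graph", "link", "prediction", "spark"})
b = minhash_signature({"graph", "link", "prediction", "hadoop"})
print(estimated_jaccard(a, b))  # near the true Jaccard of 3/5
```

The LSH step then buckets documents by bands of their signatures so that only likely-similar pairs are compared, which is what makes the approach scale to large graphs.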
Temporal Exploration in 2D Visualization of Emotions on Twitter Stream (TELKOMNIKA JOURNAL)
This document presents a system for visualizing emotions expressed on Twitter streams over time from different geographic locations. The system collects tweets from the US, Japan, Indonesia, and Taiwan related to iPhones and performs sentiment analysis using naive Bayes classification. It then visualizes the results using two-dimensional heat maps, interactive stream graphs, and context focus brushing. The visualizations allow exploration of temporal patterns in customer emotional behavior across locations expressed in Twitter data.
Predicting the Brand Popularity from the Brand Metadata (IJECEIAES)
The document presents a framework for predicting brand popularity from brand metadata on social networks. It identifies thoughtful comments from brand posts using natural language processing and classifies them as favorable or unfavorable. Brand metadata like numbers of likes, shares, and identified thoughtful comments are then combined to forecast future brand popularity. The performance of the proposed framework is evaluated against recent works in terms of thoughtful comment identification accuracy, execution time, popularity prediction accuracy, and prediction time. Results show improvements over existing approaches.
The growth of social media over the last decade has revolutionized the way individuals interact and industries conduct business. Individuals produce data at an unprecedented rate by interacting, sharing, and consuming content through social media. Understanding and processing this new type of data to glean actionable patterns presents challenges and opportunities for interdisciplinary research, novel algorithms, and tool development. Social Media Mining integrates social media, social network analysis, and data mining to provide a convenient and coherent platform for students, practitioners, researchers, and project managers to understand the basics and potentials of social media mining. It introduces the unique problems arising from social media data and presents fundamental concepts, emerging issues, and effective algorithms for network analysis and data mining. Suitable for use in advanced undergraduate and beginning graduate courses as well as professional short courses, the text contains exercises of different degrees of difficulty that improve understanding and help apply concepts, principles, and methods in various scenarios of social media mining.
Details at: http://dmml.asu.edu/smm/
Big Data Driven Information Diffusion Analytics and Control on Social ... (IRJET Journal)
This document discusses controlling the spread of fake or misleading information on social media. It proposes a system to analyze information diffusion on social networks, identify diffused data, and control the spread of fake diffused data. The system would extract data from social media, perform sentiment analysis to determine the veracity of information, and discard fake or untrustworthy information from the database to prevent further propagation. A variety of machine learning techniques could be used for the sentiment analysis, including naive Bayes classification, linear regression, and gradient boosted trees. The goal is to curb the spread of misinformation while still allowing the diffusion of real or truthful information.
Frequent Itemset Mining of Big Data for Social Media (IJERA)
Big data is a term for massive data sets having a large, varied and complex structure, with difficulties in storing, analyzing and visualizing them for further processing or results. Big data includes data from email, documents, pictures, audio and video files, and other sources that do not fit into a relational database; this unstructured data brings enormous challenges. The process of researching massive amounts of data to reveal hidden patterns and secret correlations is named big data analytics. Therefore, big data implementations need to be analyzed and executed as accurately as possible. The proposed model structures the unstructured data from social media into a structured form so that the data can be queried efficiently using the Hadoop MapReduce framework. Big data mining is essential in order to extract value from massive amounts of data, and MapReduce is a more efficient method for dealing with big data than traditional techniques. The proposed linguistic string matching with the Knuth-Morris-Pratt algorithm and the K-Means clustering algorithm provides a proper platform to extract value from massive amounts of data and produce recommendations for the user. Linguistic matching techniques such as the Knuth-Morris-Pratt string matching algorithm are very useful in giving properly matching output for a user query. The K-Means algorithm clusters data using the vector space model and can be an appropriate method to produce recommendations for the user.
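The Knuth-Morris-Pratt matching mentioned above precomputes a failure function so that the text is scanned without backtracking; a compact sketch:

```python
def kmp_search(text, pattern):
    """Knuth-Morris-Pratt: return the start indices of all occurrences
    of pattern in text, scanning the text exactly once."""
    # Failure function: length of the longest proper prefix of
    # pattern[:i+1] that is also a suffix of it.
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    # Scan the text, falling back via the failure function on mismatch.
    matches, k = [], 0
    for i, ch in enumerate(text):
        while k and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = fail[k - 1]
    return matches

print(kmp_search("abababca", "abab"))  # [0, 2]
```

This is what makes the matching linear in the text length, which matters when queries run against social media data at MapReduce scale.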
This document summarizes a study analyzing social media influence and credibility. A team of students and professors extracted different types of data from Twitter, including mentions of users, tweets by authorities, and keywords. They developed a semantic parser to analyze tweet content using ontological semantic technology. An initial linear score was formulated to measure user influence, and network analysis identified pivotal users between communities. Validation will compare content analysis to real-world events to supplement credibility assessment. The study has potential applications in public policy, business, psychology, and traffic monitoring.
This document summarizes research posters being presented at a computer science and electrical engineering department research review. It describes 8 posters presented by BS, MS, and PhD students. The posters cover topics such as identifying political affiliations in blogs, statistically weighted visualization hierarchies, voter verifiable optical-scan voting, predictive caching in mobile networks, generating statistical volume models, predicting appropriate semantic web terms, approximating online social network community structure, and utilizing semantic policies for managing BGP route dissemination.
This document presents a novel approach to anomaly detection in link mining based on applying mutual information. It adapts the CRISP-DM methodology for link mining and applies it to a case study using co-citation data. The methodology includes data description, preprocessing, transformation, exploration, modeling through graph mapping and hierarchical clustering, and evaluation. Mutual information is used to interpret the semantics of anomalies identified in clusters. The case study identifies collective and community anomalies and confirms mutual information can validate clustering results by showing strong links within clusters but independence between objects in one cluster.
Analysis on Recommended System for Web Information Retrieval Using HMM (IJERA)
The web is a rich source of data and knowledge, spread across the world in unstructured form. The
number of users accessing information over the internet grows continuously. Web mining is an application of data
mining in which web-related data is extracted and manipulated to obtain knowledge. Data mining applied
to web information is referred to as web mining, which is further divided into three major
domains: web usage mining, web content mining, and web structure mining. The proposed work is concerned
with web usage mining. The goal is to improve user-feedback handling and user navigation
pattern discovery for a CRM system. Finally, a hidden Markov model (HMM) algorithm is used to find patterns in the data,
a method that promises more accurate recommendations.
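The HMM pattern-finding step the abstract mentions can be illustrated with a minimal Viterbi decoder. The hidden states ("browse"/"buy"), the page-type observations, and all probabilities below are invented for illustration; they are not taken from the paper.

```python
# Minimal Viterbi decoder for a hypothetical two-state HMM over page
# categories, sketching how an HMM could recover a latent navigation
# pattern ("browse" vs "buy") from a clickstream.

states = ["browse", "buy"]
start_p = {"browse": 0.8, "buy": 0.2}
trans_p = {
    "browse": {"browse": 0.7, "buy": 0.3},
    "buy":    {"browse": 0.4, "buy": 0.6},
}
emit_p = {
    "browse": {"home": 0.5, "product": 0.4, "checkout": 0.1},
    "buy":    {"home": 0.1, "product": 0.4, "checkout": 0.5},
}

def viterbi(obs):
    """Return the most likely hidden-state sequence for the observations."""
    # V[t][s] = (best probability of reaching state s at step t, best path)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}]
    for o in obs[1:]:
        layer = {}
        for s in states:
            prob, path = max(
                ((V[-1][p][0] * trans_p[p][s] * emit_p[s][o], V[-1][p][1])
                 for p in states),
                key=lambda t: t[0],
            )
            layer[s] = (prob, path + [s])
        V.append(layer)
    return max(V[-1].values(), key=lambda t: t[0])[1]

print(viterbi(["home", "product", "checkout", "checkout"]))
# → ['browse', 'browse', 'buy', 'buy']
```

A real system would estimate these probabilities from clickstream logs (e.g. with Baum-Welch) rather than hard-coding them.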
1) The document discusses a review of semantic approaches for nearest neighbor search. It describes using an ontology to add a semantic layer to an information retrieval system to relate concepts using query words.
2) A technique called spatial inverted index is proposed to locate multidimensional information and handle nearest neighbor queries by finding the hospitals closest to a given address.
3) Several semantic approaches are described including using clustering measures, specificity measures, link analysis, and relation-based page ranking to improve search and interpret hidden concepts behind keywords.
Political prediction analysis using text mining and deep learningVishwambhar Deshpande
We have proposed a system to determine current sentiment on Twitter using the Twitter
API for open access, which includes opinions from different content structures such as
latest news, audits, articles, and social media posts, together with a deep learning
method that studies historic data to predict future results. We utilized Naive Bayes
and dictionary-based algorithms to predict sentiment on live Twitter data.
The advent of social networks has changed research in computer science. Massive volumes of data are now present in the form of Twitter, Facebook, emails, and IoT (Internet of Things) streams, so the storage and analysis of these data have become a great challenge for researchers. Traditional frameworks have failed to process such large data. R is an open-source programming framework developed for the analysis of large data, yielding better accuracy, and it offers the opportunity to implement the analysis in the R language. This paper presents a study on the use of R for the classification of large social network data; the Naïve Bayes algorithm is used to classify large volumes of Twitter data. The experiment has shown that enormous amounts of data can be classified efficiently using the R framework, with promising results.
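As a rough illustration of the classification step, here is a toy multinomial Naïve Bayes with Laplace smoothing. The paper works in R on large Twitter data; this Python sketch with made-up training tweets only shows the underlying model.

```python
from collections import Counter
import math

# Toy multinomial Naive Bayes for tweet polarity. The training "tweets"
# below are invented; a real run would use a large labeled Twitter corpus.

train = [
    ("good great love", "pos"),
    ("happy love nice", "pos"),
    ("bad awful hate", "neg"),
    ("sad hate terrible", "neg"),
]

def fit(docs):
    """Count words per class and documents per class."""
    word_counts = {"pos": Counter(), "neg": Counter()}
    class_counts = Counter()
    for text, label in docs:
        class_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = {w for c in word_counts.values() for w in c}
    return word_counts, class_counts, vocab

def predict(text, word_counts, class_counts, vocab):
    """Pick the class with the highest log posterior (Laplace-smoothed)."""
    best_label, best_score = None, -math.inf
    total_docs = sum(class_counts.values())
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in text.split():
            score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = fit(train)
print(predict("love this great day", *model))  # → pos
```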
Methods for Sentiment Analysis: A Literature Studyvivatechijri
Sentiment analysis is a trending topic, as everyone has an opinion on everything. The systematic
study of these opinions can yield information that proves valuable for many companies and
industries. A huge number of users are online and share their opinions and comments regularly;
this information can be mined and used efficiently. Companies can review their own products using
sentiment analysis and make the necessary changes. The data is huge and thus requires efficient
processing to collect and analyze it and produce the required results.
In this paper we discuss the various methods used for sentiment analysis. It also covers techniques
such as the lexicon-based approach, SVM [10], convolutional neural networks,
the morphological sentence pattern model [1], and the IML algorithm. The paper surveys studies on various data sets,
such as the Twitter API, Weibo, movie reviews, IMDb, a Chinese micro-blog database [9], and more, and reports
the accuracy results obtained by each system.
The document presents a proposed approach for sentiment analysis on big social data using Spark. It discusses collecting opinions from social media to analyze large events by tracking public behavior in real time. The proposed system provides an adaptable sentiment analysis approach using Spark that analyzes social media posts and classifies them by subject in real time. It also discusses using sentiment data from social media to inform decisions.
1) The document discusses using k-means clustering to analyze big data. K-means is an algorithm that partitions data into k clusters based on similarity.
2) It provides background on big data characteristics like volume, variety, and velocity. It also discusses challenges of heterogeneous, decentralized, and evolving data.
3) The document proposes applying k-means clustering to big data to map data into clusters according to its properties in a fast and efficient manner. This allows statistical analysis and knowledge extraction from large, complex datasets.
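The partitioning idea in point 3 can be sketched with a bare-bones Lloyd's k-means on 2-D points; the data and the deterministic seeding below are made up for the demo.

```python
# Bare-bones Lloyd's k-means: assign each point to its nearest center,
# then recompute each center as the mean of its cluster, and repeat.

def kmeans(points, k, iters=20):
    centers = list(points[:k])  # simple deterministic seeding for the demo
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Nearest center by squared Euclidean distance.
            i = min(range(k),
                    key=lambda i: (p[0] - centers[i][0]) ** 2
                                  + (p[1] - centers[i][1]) ** 2)
            clusters[i].append(p)
        # Recompute centers; keep the old center if a cluster is empty.
        centers = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            if c else centers[j]
            for j, c in enumerate(clusters)
        ]
    return centers, clusters

pts = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centers, clusters = kmeans(pts, 2)
print(sorted(len(c) for c in clusters))  # → [3, 3]
```

Big-data variants distribute exactly this assign/recompute loop (e.g. as MapReduce or Spark stages) rather than changing the algorithm itself.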
Scalable recommendation with social contextual informationeSAT Journals
Abstract: Recommender systems are used to achieve effective and useful results in social networks. Social recommendation relies on the social network structure, but it is challenging to fuse into it the social contextual factors derived from users' motivations for social behavior. Here we introduce two contextual factors into recommender systems to obtain useful results: a) individual preference and b) interpersonal influence. Individual preference matches an item's content against a user's social interests and keeps only results the user would endorse; interpersonal influence analyzes user-user interactions and their specific social relations. Beyond this, we propose a novel probabilistic matrix factorization method to fuse the two factors in a latent space. The scalable algorithm yields useful results by analyzing the ranking probability of each user's social contextual information and incrementally processes contextual data in large datasets. Keywords: social recommendation, individual preference, interpersonal influence, matrix factorization
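The core latent-factor step behind this kind of method can be sketched with plain SGD matrix factorization on a toy rating matrix. The paper's actual contribution, fusing individual preference and interpersonal influence in the latent space, is not modeled here; this shows only the baseline factorization, and the ratings are invented.

```python
import random

# SGD matrix factorization on a toy 3-user x 3-item rating set:
# learn user factors U and item factors V so that U[u] . V[i] ~ rating.

ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
n_users, n_items, k = 3, 3, 2

rng = random.Random(0)
U = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
V = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]

lr, reg = 0.05, 0.02
for epoch in range(200):
    for u, i, r in ratings:
        pred = sum(U[u][f] * V[i][f] for f in range(k))
        err = r - pred
        # Gradient step with L2 regularization on both factor vectors.
        for f in range(k):
            U[u][f], V[i][f] = (
                U[u][f] + lr * (err * V[i][f] - reg * U[u][f]),
                V[i][f] + lr * (err * U[u][f] - reg * V[i][f]),
            )

# Reconstruction error on the observed entries should now be small.
rmse = (sum((r - sum(U[u][f] * V[i][f] for f in range(k))) ** 2
            for u, i, r in ratings) / len(ratings)) ** 0.5
print(round(rmse, 2))
```

The probabilistic formulation adds Gaussian priors over U and V; the update rule above is what that MAP objective reduces to.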
An Improvised Fuzzy Preference Tree Of CRS For E-Services Using Incremental A...IJTET Journal
This document describes a proposed algorithm for improving recommendation systems for e-services. It involves the following key steps:
1. Clustering customer transaction histories to group similar purchase patterns and derive customer-based recommendations.
2. Using incremental association rule mining on the transaction data to detect frequently purchased item sets and relationships between items.
3. Developing a fuzzy model to classify customers and provide dynamic recommendations tailored to different customer types. The recommendations will be based on matching customer preferences and purchase histories to specific product sets.
4. The algorithm clusters transactions, mines association rules incrementally as new data is added, and generates recommendations by classifying customers and matching them to relevant product clusters, providing personalized and dynamic recommendations.
Detailed Investigation of Text Classification and Clustering of Twitter Data ...ijtsrd
Of late there has been a growth in data. This paper presents a methodology to investigate the text classification of data gathered from Twitter. In this study, sentiment analysis has been performed on online comment data, giving a picture of how to discover people's demands. Ziya Fatima | Er. Vandana "Detailed Investigation of Text Classification and Clustering of Twitter Data for Business Analytics" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-5 | Issue-2, February 2021, URL: https://www.ijtsrd.com/papers/ijtsrd38527.pdf Paper Url: https://www.ijtsrd.com/engineering/computer-engineering/38527/detailed-investigation-of-text-classification-and-clustering-of-twitter-data-for-business-analytics/ziya-fatima
Time-Ordered Collaborative Filtering for News RecommendationIRJET Journal
This document presents a novel news recommendation system called Time-Ordered Collaborative Filtering for News Recommendation. The system filters and recommends news to users based on their choices and preferences using collaborative filtering and content-based filtering. It extracts news data from popular online newspapers and stores it in a database. The recommendation framework then preprocesses the news data and applies algorithms such as sparse-matrix computations to rank news items and recommend the top stories to each user based on their interests.
Structural Balance Theory Based Recommendation for Social Service PortalYogeshIJTSRD
There is enormous data present in our world, and accessing the most accurate information is becoming more difficult and complicated. As a result, much relevant information gets missed, which leads to duplication of work and effort. Given the huge volume of search results, the user will generally have difficulty identifying the relevant ones. To solve this problem, a recommendation system is used. A recommendation system is an information-filtering system used to predict the relevance of retrieved information according to the user's needs and criteria; hence, it can provide the user with the results that best fit their needs. Services provided through the web normally return huge numbers of records for any requested item or service, and a proper recommendation system is used to filter these results. A recommendation system can be improved further if supported with trust information, that is, if recommendations are prioritized according to their level of trust. Recommending appropriate social services to target volunteers is key to ensuring the continued success of social service. Today, many social service systems do not adopt any recommendation techniques; they provide advertisements or highlight requests for a small commission. G. Banupriya | M. Anand "Structural Balance Theory-Based Recommendation for Social Service Portal" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-5 | Issue-4, June 2021, URL: https://www.ijtsrd.com/papers/ijtsrd41216.pdf Paper URL: https://www.ijtsrd.com/engineering/software-engineering/41216/structural-balance-theorybased-recommendation-for-social-service-portal/g-banupriya
A real-time big data sentiment analysis for iraqi tweets using spark streamingjournalBEEI
The scale of data streaming in social networks such as Twitter is increasing exponentially. Twitter is one of the most important and suitable big data sources for machine learning research in terms of analysis, prediction, knowledge extraction, and opinion mining. People use the Twitter platform daily to express opinions, a fundamental fact that influences their behavior. In recent years the flow of Iraqi-dialect content has increased, especially on Twitter, and sentiment analysis and opinion mining for different dialects have become hot topics in data science research. In this paper we develop a real-time analytic model for sentiment analysis and opinion mining of Iraqi tweets using Spark Streaming, and also create a dataset for researchers in this field. The Twitter handle Bassam AlRawi is the case study here. The new method is well suited to current machine learning applications and fast online prediction.
Distributed Link Prediction in Large Scale Graphs using Apache SparkAnastasios Theodosiou
This document summarizes an approach to distributed link prediction in large graphs using Apache Spark. It discusses using machine learning techniques like locality sensitive hashing to predict links between nodes in a graph based on document similarity metrics and other structural features. The approach is tested on a graph of 27,770 academic papers linked by 352,857 citations. Both supervised and unsupervised machine learning methods are explored, including treating it as a binary classification problem and using locality sensitive hashing and MinHashLSH through Apache Spark to efficiently handle the large data volumes. The results suggest this distributed approach can accurately predict new links in large graphs.
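The MinHash estimate that MinHashLSH scales up in Spark can be shown in a few lines of plain Python; the two word sets below are hypothetical stand-ins for two papers' token sets.

```python
import hashlib

# Minimal MinHash signatures for estimating Jaccard similarity between
# two token sets -- the estimate that MinHashLSH makes scalable in Spark.

def minhash_signature(tokens, num_hashes=64):
    """One min-hash per salted hash function; the signature approximates
    the set for similarity purposes."""
    return [
        min(int(hashlib.md5(f"{i}:{t}".encode()).hexdigest(), 16)
            for t in tokens)
        for i in range(num_hashes)
    ]

def estimated_jaccard(sig_a, sig_b):
    """Fraction of agreeing signature slots estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

doc_a = {"graph", "link", "prediction", "spark", "citation"}
doc_b = {"graph", "link", "prediction", "hashing", "lsh"}
sig_a = minhash_signature(doc_a)
sig_b = minhash_signature(doc_b)

true_j = len(doc_a & doc_b) / len(doc_a | doc_b)  # 3/7
print(round(true_j, 2), round(estimated_jaccard(sig_a, sig_b), 2))
```

LSH then banks these signatures so that only candidate pairs with matching bands are compared, which is what makes link prediction feasible on hundreds of thousands of edges.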
Temporal Exploration in 2D Visualization of Emotions on Twitter StreamTELKOMNIKA JOURNAL
This document presents a system for visualizing emotions expressed on Twitter streams over time from different geographic locations. The system collects tweets from the US, Japan, Indonesia, and Taiwan related to iPhones and performs sentiment analysis using naive Bayes classification. It then visualizes the results using two-dimensional heat maps, interactive stream graphs, and context focus brushing. The visualizations allow exploration of temporal patterns in customer emotional behavior across locations expressed in Twitter data.
Predicting the Brand Popularity from the Brand MetadataIJECEIAES
The document presents a framework for predicting brand popularity from brand metadata on social networks. It identifies thoughtful comments from brand posts using natural language processing and classifies them as favorable or unfavorable. Brand metadata like numbers of likes, shares, and identified thoughtful comments are then combined to forecast future brand popularity. The performance of the proposed framework is evaluated against recent works in terms of thoughtful comment identification accuracy, execution time, popularity prediction accuracy, and prediction time. Results show improvements over existing approaches.
The growth of social media over the last decade has revolutionized the way individuals interact and industries conduct business. Individuals produce data at an unprecedented rate by interacting, sharing, and consuming content through social media. Understanding and processing this new type of data to glean actionable patterns presents challenges and opportunities for interdisciplinary research, novel algorithms, and tool development. Social Media Mining integrates social media, social network analysis, and data mining to provide a convenient and coherent platform for students, practitioners, researchers, and project managers to understand the basics and potentials of social media mining. It introduces the unique problems arising from social media data and presents fundamental concepts, emerging issues, and effective algorithms for network analysis and data mining. Suitable for use in advanced undergraduate and beginning graduate courses as well as professional short courses, the text contains exercises of different degrees of difficulty that improve understanding and help apply concepts, principles, and methods in various scenarios of social media mining.
Details at: http://dmml.asu.edu/smm/
IRJET- Big Data Driven Information Diffusion Analytics and Control on Social ...IRJET Journal
This document discusses controlling the spread of fake or misleading information on social media. It proposes a system to analyze information diffusion on social networks, identify diffused data, and control the spread of fake diffused data. The system would extract data from social media, perform sentiment analysis to determine the veracity of information, and discard fake or untrustworthy information from the database to prevent further propagation. A variety of machine learning techniques could be used for the sentiment analysis, including naive Bayes classification, linear regression, and gradient boosted trees. The goal is to curb the spread of misinformation while still allowing the diffusion of real or truthful information.
Frequent Item set Mining of Big Data for Social MediaIJERA Editor
Big data is a term for massive data sets having a large, varied, and complex structure, with difficulties in storing, analyzing, and visualizing them for further processing or results. Big data includes data from email, documents, pictures, audio and video files, and other sources that do not fit into a relational database, and this unstructured data brings enormous challenges. The process of examining massive amounts of data to reveal hidden patterns and secret correlations is named big data analytics; therefore, big data implementations need to be analyzed and executed as accurately as possible. The proposed model structures the unstructured data from social media into a structured form so that the data can be queried efficiently using the Hadoop MapReduce framework. Big data mining is essential in order to extract value from massive amounts of data, and MapReduce is a more efficient method for dealing with big data than traditional techniques. The proposed combination of the Knuth-Morris-Pratt linguistic string matching algorithm and the K-Means clustering algorithm gives a proper platform to extract value from massive amounts of data and produce recommendations for the user. Linguistic matching techniques such as the Knuth-Morris-Pratt algorithm are very useful in giving properly matched output for a user query, while K-Means clusters data using a vector space model and is an appropriate method for producing user recommendations.
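The Knuth-Morris-Pratt matching step can be sketched directly; the linear-time search below returns every start index of the pattern in the text.

```python
# Knuth-Morris-Pratt string search: precompute a failure function so the
# scan never re-reads text characters, giving O(len(text) + len(pattern)).

def kmp_search(text, pattern):
    # fail[i] = length of the longest proper prefix of pattern[:i+1]
    # that is also a suffix of it.
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k

    matches, k = [], 0
    for i, ch in enumerate(text):
        while k and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = fail[k - 1]  # allow overlapping matches
    return matches

print(kmp_search("abababca", "abab"))  # → [0, 2]
```

In the proposed pipeline this matcher would score query terms against clustered social media text; here it is shown standalone.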
This document summarizes a study analyzing social media influence and credibility. A team of students and professors extracted different types of data from Twitter, including mentions of users, tweets by authorities, and keywords. They developed a semantic parser to analyze tweet content using ontological semantic technology. An initial linear score was formulated to measure user influence, and network analysis identified pivotal users between communities. Validation will compare content analysis to real-world events to supplement credibility assessment. The study has potential applications in public policy, business, psychology, and traffic monitoring.
Political Prediction Analysis using text mining and deep learning.pptxDineshGaikwad36
Social media platforms have vast connected user bases: Twitter has 330 million users,
Facebook 2.2 billion, Google+ 111 million, and LinkedIn 467 million, which is
enough to create an impression through social media. We have proposed a system to
determine current sentiment on Twitter using the Twitter API for open access, which includes
opinions from different content structures such as latest news, audits, articles, and social media
posts, together with a deep learning method that studies historic data to predict future results.
Previous implementations of prediction analysis on Twitter data did not succeed in analyzing
live Twitter data. Thus, we have proposed a system that predicts results on live Twitter data
and also generates a statistical graph classifying the polarity of positive and negative tweets.
With the help of graphs, reports, trends, and tweets, one can predict the future results of a
political party and also create campaigns.
Sentiment Analysis in Social Media and Its OperationsIRJET Journal
This document summarizes a literature review on sentiment analysis in social media. It explores the styles, platforms, and applications of sentiment analysis. Most papers used either a dictionary-based approach or machine learning approach to analyze sentiment in social media text, with some combining both. Twitter was the most common social media platform used to collect data due to its large volume of public posts. Sentiment analysis has been applied in various domains including business, politics, health, and tracking world events. It can provide valuable insights for organizations and help improve products, services, and decision making.
Recommender System in light of Big DataKhadija Atiya
This document summarizes a research paper that investigates using singular value decomposition (SVD) to address the challenges recommender systems face in light of big data. It discusses how collaborative-filtering recommender systems are impacted by issues like scalability, sparsity, and cold starts on large datasets. The document then provides background on SVD and how it can be applied to collaborative filtering as a model-based approach. It describes an implementation of an SVD-based recommender system using Apache Hadoop and Spark on a large dataset to validate its applicability to big data and to evaluate the tools. The results showed the SVD approach provides performance comparable to previous experiments on smaller datasets.
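The model-based idea can be illustrated with a rank-1 SVD computed by power iteration on a tiny rating matrix. This is pure Python for readability; the paper runs library implementations on Hadoop/Spark, and the matrix below is made up.

```python
# Rank-1 SVD via power iteration on a small user-item rating matrix.
# Users 0-1 like items 0-1; user 2 likes item 2. The top singular pair
# captures the dominant taste group.

R = [
    [5.0, 4.0, 1.0],
    [4.0, 5.0, 1.0],
    [1.0, 1.0, 5.0],
]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def norm(v):
    return sum(x * x for x in v) ** 0.5

# Power iteration on R^T R converges to the top right singular vector v1.
v = [1.0, 1.0, 1.0]
Rt = transpose(R)
for _ in range(100):
    w = matvec(Rt, matvec(R, v))
    v = [x / norm(w) for x in w]

# sigma1 and u1 follow from R v1.
Rv = matvec(R, v)
sigma = norm(Rv)
u = [x / sigma for x in Rv]

# Rank-1 reconstruction: user 0's predicted score for item 0 should far
# exceed their score for item 2.
approx = [[sigma * u[i] * v[j] for j in range(3)] for i in range(3)]
print(round(approx[0][0], 1), round(approx[0][2], 1))
```

Keeping the top k singular triples instead of just one gives the usual truncated-SVD predictor; the distributed versions parallelize these matrix-vector products.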
IRJET- A Review on: Sentiment Polarity Analysis on Twitter Data from Diff...IRJET Journal
This document summarizes research on sentiment polarity analysis of Twitter data from different events. It discusses how Twitter data can be used for opinion mining and sentiment analysis. Several papers that used techniques like naive Bayes classifier, support vector machines, and dual sentiment analysis on Twitter data are summarized. The document also provides an overview of the key steps involved in a Twitter sentiment analysis system, including data collection, preprocessing, feature extraction, training a classification model, and evaluating accuracy. The goal of analyzing sentiments on Twitter is to understand public opinions on different topics and events.
IRJET- Strength and Workability of High Volume Fly Ash Self-Compacting Concre...IRJET Journal
The document discusses implementing a social customer relationship management (CRM) system for an online grocery shopping platform using customer reviews. It proposes collecting customer reviews from social media and other sources, refining the data, analyzing it using natural language processing and machine learning techniques, and storing the results in a database. This would allow the platform to better understand customer sentiment and needs to improve products, services and the customer experience.
IRJET- Implementing Social CRM System for an Online Grocery Shopping Platform...IRJET Journal
This document presents a proposed system architecture for implementing a social customer relationship management (CRM) system for an online grocery shopping platform using customer reviews and sentiment analysis. The proposed architecture involves collecting customer reviews from social media, preprocessing and analyzing the data using natural language processing techniques like stemming, and storing the results in a database. Sentiment analysis is performed to categorize reviews by aspects and sentiment. The analyzed data is then presented to users through an interface to help the online grocery shopping platform better understand customer needs and improve products/services based on feedback.
A Hybrid Approach for Supervised Twitter Sentiment Classification
K. Revathy and Dr. B. Sathiyabhama
A Survey of Dynamic Duty Cycle Scheduling Scheme at Media Access Control Layer for Energy Conservation
Prof. M. V. Nimbalkar and Sampada Khandare
A Survey on Privacy Preserving Data Mining Techniques
A. K. Ilavarasi, B. Sathiyabhama and S. Poorani
An Ontology Based System for Predicting Disease using SWRL Rules
Mythili Thirugnanam, Tamizharasi Thirugnanam and R. Mangayarkarasi
Performance Evaluation of Web Services in C#, JAVA, and PHP
Dr. S. Sagayaraj and M. Santhosh Kumar
Semi-Automated Polyhouse Cultivation Using LabVIEW
Prathiba Jonnala and Sivaji Satrasupalli
Performance of Biometric Palm Print Personal Identification Security System Using Ordinal Measures
V. K. Narendira Kumar and Dr. B. Srinivasan
MIMO System for Next Generation Wireless Communication
Sharif, Mohammad Emdadul Haq and Md. Arif Rana
Avoiding Anonymous Users in Multiple Social Media Networks (SMN)paperpublications3
Abstract: The main aim of this project is to secure user login and data sharing on social networks such as Gmail and Facebook, and to find anonymous users on these networks. If the original user is not available on the network, but friends or an anonymous user knows their login details, their chats can be misused. In this project we aim to detect anonymous users who use the network without the original user's knowledge; an unauthorized user may log in to chat or share images and videos, and this is the problem to be overcome. Users first register their details with a secured question and answer. Because an anonymous user can delete chats or data, the secured questions let us recover the unauthorized user's chat history or sharing details along with their IP address or MAC address. In this way the project provides a way to prevent anonymous users from misusing the original user's login details.
Analytics Ecosystem
Lisa Garay
Rasmussen College
Author's Note
This paper is being submitted for Anastasia Rashtchian’s B288 Business Analytics Course.
This paper looks at the nine clusters of the analytics ecosystem. Clustering refers to a system of grouping similar functions so as to set them apart from others. It begins by listing the clusters, then defines them, and finally identifies which clusters represent technology developers and which represent technology users. Peer-reviewed materials are used in this endeavor.
The clusters include the executive sponsor cluster, which contains information concerning the administrators who direct the system; the end-user tools and dashboards cluster, made up of functions that let people engage with the system; the data owners cluster, made up of programs related to the people who own data in the system; the business users' cluster, made up of functions related to the system's clients; the business applications and systems cluster, made up of programs related to the features of a given system; the developers cluster, made up of programs related to developing programs for the system; the analyst cluster, made up of materials related to analyzing data in the system; the SME cluster, made up of switches that run SME applications in the system; and, lastly, the operational data stores, made up of programs concerned with storing data in the system (Pitelis, 2012).
While the developers cluster comprises the technology developers in the system, the business users' cluster comprises the technology users. In conclusion, clustering serves to bring related roles together and to separate unrelated roles in a system (Cameron, Gelbach & Miller, 2012).
They can be represented as follows:
References
Cameron, A. C., Gelbach, J. B., & Miller, D. L. (2012). Robust inference with multiway clustering. Journal of Business & Economic Statistics.
Pitelis, C. (2012). Clusters, entrepreneurial ecosystem co-creation, and appropriability: a conceptual framework. Industrial and Corporate Change, dts008.
Infrastructure
Executive Sponsor Cluster
End-user tools and dashboards cluster
operational data stores
Data Owners Cluster
Business users' cluster
Business systems and applications cluster
Developers Cluster
Analysts Cluster
SME cluster
4
Running head: Sentiment analysis
Sentiment Analysis
Lisa Garay
Rasmussen College
Authors Note
This paper is being submitted for Anastashia Rashtcian’s B288 Business Analytics course.
Sentiment analysis has played a significant role in the concurrent marketing field, specifically in product marketing. According to Somasundaran, Swapna, (2010), the process’ operational module is structured on a data mining sequence, whereby the end users of given particulars the feedback pertaining a used.
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGdannyijwest
Social Networks has become one of the most popular platforms to allow users to communicate, and share their interests without being at the same geographical location. With the great and rapid growth of Social Media sites such as Facebook, LinkedIn, Twitter…etc. causes huge amount of user-generated content. Thus, the improvement in the information quality and integrity becomes a great challenge to all social media sites, which allows users to get the desired content or be linked to the best link relation using improved search / link technique. So introducing semantics to social networks will widen up the representation of the social networks. In this paper, a new model of social networks based on semantic tag ranking is introduced. This model is based on the concept of multi-agent systems. In this proposed model the representation of social links will be extended by the semantic relationships found in the vocabularies which are known as (tags) in most of social networks.The proposed model for the social media engine is based on enhanced Latent Dirichlet Allocation(E-LDA) as a semantic indexing algorithm, combined with Tag Rank as social network ranking algorithm. The improvements on (E-LDA) phase is done by optimizing (LDA) algorithm using the optimal parameters. Then a filter is introduced to enhance the final indexing output. In ranking phase, using Tag Rank based on the indexing phase has improved the output of the ranking. Simulation results of the proposed model have shown improvements in indexing and ranking output.
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGIJwest
The document presents a new model for intelligent social networks based on semantic tag ranking. It uses a multi-agent system approach with agents performing indexing and ranking. For indexing, it uses an enhanced Latent Dirichlet Allocation (E-LDA) model that optimizes LDA parameters. Tags above a threshold from E-LDA output are ranked using Tag Rank. Simulation results showed improvements in indexing and ranking over conventional methods. The model introduces semantics to social networks to improve search and link recommendation.
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGdannyijwest
Social Networks has become one of the most popular platforms to allow users to communicate, and share
their interests without being at the same geographical location. With the great and rapid growth of Social
Media sites such as Facebook, LinkedIn, Twitter...etc. causes huge amount of user-generated content.
Thus, the improvement in the information quality and integrity becomes a great challenge to all social
media sites, which allows users to get the desired content or be linked to the best link relation using
improved search / link technique. So introducing semantics to social networks will widen up the
representation of the social networks.
There are numerous ways to analyse the web information, generally web substance are housed in
large information sets and basic inquiries are utilized to parse such information sets. As the requests
expanded with time, mining web information amended to meet challenging task in a web analysis.
Machine learning methodologies are the most up to date one to go into these analysis forms. Different
approaches like decision trees, association rules, Meta heuristic and basic learning methods are embraced
for making web data appraisal and mining data from various web instances. This study will highlight these
approaches in perspective of web investigation. One of the prime goals of this exploration is to investigate
more data mining approaches alongside machine learning systems, and to express emerging collaboration
of web analytics with artificial intelligence.
IRJET- Twitter Sentimental Analysis for Predicting Election Result using ...IRJET Journal
This document proposes using Twitter sentiment analysis and an LSTM neural network to predict election results. It involves collecting tweets related to political parties and candidates in India, cleaning the data, training an LSTM classifier on labeled tweets, and using the trained model to classify tweets as positive, negative or neutral sentiment and compare sentiment levels for each candidate. The goal is to analyze public sentiment expressed on Twitter and how it correlates with actual election outcomes.
Framework for Product Recommandation for Review Datasetrahulmonikasharma
In the social networking era, product reviews have a significant influence on the purchase decisions of customers so the market has recognized this problem The problem with this is that the customers do not know how these systems work which results in trust issues. Therefore a different system is needed that helps customers with their need to process the information in product reviews. There are different approaches and algorithms of data filtering and recommendation .Most existing recommender systems were developed for commercial domains with millions of users. In this paper we have discussed the recommendation system and its related research and implemented different techniques of the recommender system .
A large-scale sentiment analysis using political tweetsIJECEIAES
Twitter has become a key element of political discourse in candidates’ campaigns. The political polarization on Twitter is vital to politicians as it is a popular public medium to analyze and predict public opinion concerning political events. The analysis of the sentiment of political tweet contents mainly depends on the quality of sentiment lexicons. Therefore, it is crucial to create sentiment lexicons of the highest quality. In the proposed system, the domain-specific of the political lexicon is constructed by using the supervised approach to extract extreme political opinions words, and features in tweets. Political multi-class sentiment analysis (PMSA) system on the big data platform is developed to predict the inclination of tweets to infer the results of the elections by conducting the analysis on different political datasets: including the Trump election dataset and the BBC News politics. The comparative analysis is the experimental results which are better political text classification by using the three different models (multinomial naïve Bayes (MNB), decision tree (DT), linear support vector classification (SVC)). In the comparison of three different models, linear SVC has the better performance than the other two techniques. The analytical evaluation results show that the proposed system can be performed with 98% accuracy in linear SVC.
Multi-Tier Sentiment Analysis System in Big Data Environment
1. Multi-Tier Sentiment Analysis System in Big
Data Environment
Wint Nyein Chan*, Thandar Thein**
* University of Computer Studies, Yangon, **University of Computer Studies, Maubin
Email: wintnyeinchan2012@gmail.com, thandartheinn@gmail.com
Abstract- In recent years, big data, a huge amount of structured and unstructured data, has been generated from social networks. Valuable information needs to be extracted from this social big data, and the traditional analytics platform must be scaled up to analyze it in an efficient and timely manner. Sentiment analysis of social big data helps organizations by providing business insights into public opinion. Sentiment analysis based on a multi-class classification scheme classifies text into more detailed sentiment labels. Multi-class classification with a single-tier architecture, where a single model is developed and trained on the entire labeled data, can increase classification complexity. In this paper, a multi-tier sentiment analysis system on a big data analytics platform (MSABDP) is proposed to reduce multi-class classification complexity and efficiently analyze large-scale data sets. Hadoop is built for big data analytics and is a good platform for managing data at scale; it can improve scalability and efficiency by adopting a distributed processing environment, since it is implemented using the MapReduce framework and the Hadoop Distributed File System (HDFS). The MSABDP is implemented by combining the SentiStrength lexicon with a learning-based classification scheme in a multi-tier architecture, and it runs on the big data analytics platform in order to manage large data at scale. The proposed system collects a large amount of real Twitter data using Apache Flume, and this data was used for evaluation. The evaluation results show that the proposed multi-class classification system with a multi-tier architecture improves classification accuracy over multi-class classification based on a single-tier architecture by 7%.
Keywords: big data analytics, Hadoop, multi-class classification, sentiment analysis, social big data
1. INTRODUCTION
With the rapid growth of the Internet and online activity, many services such as blogging, podcasting, social networking and bookmarking have become popular. These services allow users to create and share information within open and closed communities, and they contribute to the growing volume of data. Moreover, in social networks, data is created and delivered from various systems operating in real time, aggregating constant information about user activities and interactions. Twitter has 320 million monthly active users who post 500 million tweets every day; Facebook had 936 million daily and 1,440 million monthly active users as of December 2015. These factors drive the rise of big data [5]. Big data is characterized by the volume, velocity, veracity, variety, value and volatility of data. Nevertheless, appropriate tools are needed to acquire, organize and derive value from such data.
With the advent of social media, data is captured from different sources, such as mobile devices and web browsers, and is stored in various data formats. The traditional storage and analytics platform cannot handle the different sources and formats of structured and unstructured data. Big data analytics has become popular for analyzing and managing large volumes of structured and unstructured data [12]. Hadoop is a good platform for big data analytics as it provides scalability, cost-effectiveness, flexibility, speed, security and authentication, parallel processing, availability and resilience. It is an open-source software framework that comprises two
International Journal of Computer Science and Information Security (IJCSIS),
Vol. 15, No. 9, September 2017
204 https://sites.google.com/site/ijcsis/
ISSN 1947-5500
2. parts: a storage part and a processing part. The storage part is called the Hadoop Distributed File System (HDFS) and the processing part is called MapReduce.
Analysing social data through sentiment analysis helps the organization determine marketing strategy and improve customer services. Sentiment analysis is one of the main agenda items in big data analytics, which focuses on various ways to analyse big data to identify patterns and relationships, make informed predictions, deliver actionable intelligence and gain business insight from this steady influx of information [13]. Twitter is one of the most popular social media platforms, combining features of blogs and social network services. Twitter was established in 2006 and experienced rapid user growth in its first years [15]. Twitter is a good source of information in the sense of snapshots of moods and feelings as well as up-to-date events and commentary on current situations. In practice, achieving a high level of performance in sentiment analysis of Twitter data is very difficult. First, the large volume and high velocity of the Twitter stream data must be stored and processed efficiently. Second, Twitter allows users to post no more than 140 characters per post, so Twitter sentiment classification can perform worse than mining longer texts. In addition, classification into multiple classes increases classification complexity, which tends to decrease classification accuracy.
In this paper, a sentiment analysis system is proposed that is implemented by combining lexicon-based and learning-based approaches, and the proposed system is shown to achieve good performance in terms of classification accuracy. The main contributions of this paper are as follows:
1. A big data analytics platform is developed to scale up the traditional analytics platform for analyzing social big data.
2. A multi-tier sentiment analysis system on the big data analytics platform (MSABDP) is proposed to reduce multi-class classification complexity.
The remainder of this paper is arranged as follows. Section 2 reviews related work on large-scale sentiment analysis in distributed environments and on multi-class classification. Section 3 illustrates the system design for analyzing social big data sets and presents the implementation of the proposed system. Section 4 shows the experimental setup and results. Section 5 concludes the paper and describes possible directions for future work.
2. RELATED WORK
In past years, much work has been published on sentiment analysis, but there is little work on sentiment analysis for big data. Many possible variants exist; some of the papers related to the proposed system are reviewed in this section.
The paper [11] presented a multi-tier classification architecture for sentiment analysis. The architecture consists of three models: the first model classifies three classes, and the remaining two models act as binary classifiers. To achieve high performance, features were selected by applying various feature selection techniques. 150,000 movie reviews posted on social media were used to train and test the system. Four classifiers (Naïve Bayes, SVM, Random Forest, and SGD) were used in the experiments. The results showed that the proposed architecture significantly improved prediction accuracy over the simple single-tier model by more than 10%. Their system runs on a traditional analytics platform and classifies movie reviews into multiple classes by applying only supervised machine learning classifiers.
3. The authors of [10] presented an approach that relies on writing patterns and special unigrams for multiclass sentiment analysis. They extracted patterns based on the PoS tags of words and then calculated the resemblance degree res(p, t) of each pattern p in the training set to the tweet t. For each tweet, they extracted a set of four features, referred to the training set, and used the Random Forest machine learning algorithm to perform the classification. The training data set contains 21,000 tweets which had been manually labeled. The system classified texts collected from Twitter into seven classes: happiness, sadness, anger, love, hate, sarcasm and neutral. In the experiment, the accuracy of the multiclass sentiment analysis was 56.9%. Since the training data were manually labeled, no additional methods were required to develop the training data sets.
In [17], as the total volume of tweets is extremely high and takes a long time to process, the authors proposed a distributed system for Twitter sentiment analysis with two components: a lexicon builder and a sentiment classifier. The lexicon builder was implemented on Hadoop and HBase, and they demonstrated that it scales with an increasing number of machines and amount of training data. In particular, for 100,000 tweets, the training time decreased by 35% when moving from 2 to 3 machines, 40% from 2 to 4 machines, and 47% from 2 to 5 machines. For 300,000 tweets, the running time decreased by 23% when moving from a cluster of 4 machines to 5 machines. To achieve higher accuracy than the lexicon-based approach alone, the lexicon-based and machine-learning-based approaches were combined; online logistic regression from the Mahout machine learning library was used for the learning-based approach. The experimental results show that their system can obtain higher accuracy by combining lexicon-based and learning-based approaches. Their system used an existing class-labeled dataset and classified into binary classes.
B. Liu et al. [4] presented a simple and complete system for sentiment mining on large data sets using a Naïve Bayes classifier (NBC). To achieve fine-grained control of the analysis procedure, they implemented the NBC on top of the Hadoop framework with four additional modules: the workflow controller (WFC), the data parser, the user terminal and the result collector. They evaluated the scalability of the NBC on large data sets, and the result is encouraging in that the accuracy of the NBC improves and approaches 82% as the dataset size increases. They demonstrated that the NBC is able to scale up to analyze the sentiment of millions of movie reviews with increasing throughput. The two datasets in their experiments had already been class-labeled. Instead of applying the Mahout machine learning library, the NBC was implemented on top of the Hadoop framework with additional modules.
In the proposed system, a multi-tier sentiment analysis system is developed on a big data analytics platform to reduce multi-class classification complexity. Multi-tier sentiment classification is implemented by combining lexicon-based and supervised learning-based approaches. In this system, SentiStrength is used to label the classes in order to develop the training data sets, and Mahout Naïve Bayes with a multi-tier architecture is used for multi-class sentiment classification. The proposed system improves scalability and efficiency by adopting a distributed processing environment, since it is implemented using HDFS and the MapReduce framework.
3. BIG DATA ANALYTICS PLATFORM DEVELOPMENT
The multi-tier sentiment analysis system is implemented on the big data analytics platform with Apache Flume [14], HDFS, MapReduce [6] and the Mahout machine learning library [3]. The high-level architecture of the MSABDP is
4. illustrated in Figure 1. It consists of four layers: the Data Ingestion Layer, Storage Layer, Processing Layer, and Analytics Layer.
In the Data Ingestion Layer, data are collected from the Twitter Streaming API with Apache Flume. The collected data is stored in the Hadoop Distributed File System (HDFS), the distributed storage located in the Storage Layer. One name node and three data nodes are deployed for distributed storage. Data cleaning and preprocessing, class labeling and multi-tier classification are developed in the Analytics Layer. All of the processes in the Analytics Layer are implemented with the MapReduce paradigm, which is used for distributed processing and located in the Processing Layer. The detailed process of each layer is described in the following subsections. The system provides capabilities such as scalability, cost effectiveness and parallel processing by using the Hadoop platform.
Figure 1. High Level Architecture of MSABDP
[Diagram: the Data Ingestion Layer (Apache Flume data collection); the Storage Layer (HDFS master node with NameNode and secondary NameNode; DataNodes on the slave nodes, coordinated via heartbeats, balancing and replication); the Processing Layer (YARN ResourceManager with Scheduler and Application Manager on the master node; NodeManagers, an Application Master, and map/reduce task containers on slave nodes 1..n, exchanging container launch requests, node status and resource requests); and the Analytics Layer (data cleaning and preprocessing, class labeling, and Mahout machine learning for multi-tier classification).]
3.1 Data Ingestion Layer
In this layer, Apache Flume is used to collect the tweet stream data, and the data is ingested into HDFS through a memory channel. Apache Flume is deployed with a Twitter agent that has three main components: a TwitterSource, a Memory Channel and an HDFS Sink. Incoming tweets are limited by keywords and language in the TwitterSource. The source is event-driven and receives events through mechanisms like callbacks or RPC calls. In this TwitterSource, the twitter4j library is used to access the Twitter Streaming API. Application secrets such as the Consumer Key, Consumer Secret, Access Token and Access Secret are used to connect to the Twitter APIs. The memory channel acts as a pathway between the TwitterSource and the HDFS Sink: an in-memory queue that stores events until they are ready to be written to a sink. As the channel
5. holds all events in memory, the channel's capacity and transaction capacity are limited by the capacity and transaction capacity parameters in the configuration file. Events are added to the channel by the TwitterSource and later removed from the channel by the HDFS Sink, which writes events to a configured location in HDFS. In the HDFS Sink configuration, the size of files is defined with the roll count parameter, set to 10,000, so each file ends up containing 10,000 tweets. The original data format is retained by setting the file type to DataStream. The file path is defined so that the files end up in a series of directories for the year, month, day and hour during which the event occurs. The timestamp setting is enabled in the configuration; Flume uses it to determine the timestamp of the event and to resolve the full path where the event should end up.
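The agent described above might be configured roughly as follows. This is an illustrative sketch, not the paper's actual configuration file: the agent and component names (TwitterAgent, Twitter, MemChannel, HDFS), the TwitterSource class name, the channel capacities, and the placeholder credentials and keyword list are all assumptions.

```properties
# Illustrative Flume agent configuration (names and values are assumptions)
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

# Custom TwitterSource built on twitter4j; secrets are placeholders
TwitterAgent.sources.Twitter.type = com.example.flume.TwitterSource
TwitterAgent.sources.Twitter.consumerKey = <consumer-key>
TwitterAgent.sources.Twitter.consumerSecret = <consumer-secret>
TwitterAgent.sources.Twitter.accessToken = <access-token>
TwitterAgent.sources.Twitter.accessTokenSecret = <access-secret>
TwitterAgent.sources.Twitter.keywords = <keyword-list>
TwitterAgent.sources.Twitter.channels = MemChannel

# In-memory channel; capacity and transactionCapacity bound the event queue
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 1000

# HDFS sink: roll after 10,000 events, keep raw format, time-bucketed path
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://namenode:8020/tweets/%Y/%m/%d/%H
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.useLocalTimeStamp = true
```

Setting rollSize to 0 disables size-based rolling so that only the 10,000-event roll count decides file boundaries, matching the per-file tweet count described in the text.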
3.2 Storage Layer
In the Storage Layer, HDFS is used to provide scalable and reliable data storage. HDFS follows a master/slave architecture, and a single NameNode serves as the master server. The NameNode executes file system namespace operations such as opening, closing and renaming files and directories; it also determines the mapping of blocks to DataNodes. DataNodes store the actual data in HDFS. An input file is split into one or more blocks, and these blocks are stored in a set of DataNodes. Each block size is 64 MB. DataNodes are responsible for serving read and write requests from the clients; they also perform block creation, deletion and replication upon instruction from the NameNode.
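As a concrete illustration of the block splitting described above, the number of HDFS blocks a file occupies is its size divided by the 64 MB block size, rounded up; the file sizes below are invented for the example.

```python
import math

def hdfs_block_count(file_size_bytes: int,
                     block_size_bytes: int = 64 * 1024 * 1024) -> int:
    """Number of HDFS blocks needed to store a file (the last block may be partial)."""
    if file_size_bytes == 0:
        return 0
    return math.ceil(file_size_bytes / block_size_bytes)

# A 200 MB tweet file occupies 4 blocks: three full 64 MB blocks plus one 8 MB block.
print(hdfs_block_count(200 * 1024 * 1024))  # -> 4
```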
3.3 Processing Layer
YARN and MapReduce 2 are located in the Processing Layer to process vast amounts of data in parallel on clusters of commodity hardware in a reliable, fault-tolerant manner. A MapReduce job splits the input data set into independent chunks which are processed by the map tasks in a completely parallel manner. The framework sorts the outputs of the maps, which are then input to the reduce tasks. Both the input and the output of the job are stored in HDFS. The framework takes care of scheduling tasks, monitoring them and re-executing failed tasks. The MapReduce framework consists of a single master ResourceManager, one slave NodeManager per cluster node, and an MRAppMaster per application. The application specifies the input/output locations and supplies map and reduce functions via implementations of interfaces and abstract classes. These, and other job parameters, comprise the job configuration. The Hadoop job client then submits the job and configuration to the ResourceManager, which assumes responsibility for distributing the software and configuration to the slaves, scheduling and monitoring tasks, and providing status and diagnostic information to the job client.
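The map/sort/reduce split described above can be sketched as a word count in Python; this is a generic illustration of the paradigm, not the paper's classification job, and the two sample lines are invented.

```python
from itertools import groupby

def map_line(line: str):
    """Map phase: emit a (word, 1) pair for every whitespace token in one input line."""
    for word in line.strip().split():
        yield word.lower(), 1

def reduce_pairs(pairs):
    """Reduce phase: pairs arrive sorted by key; sum the counts for each word."""
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

# Emulate what the framework does between the phases: map, shuffle/sort, reduce.
lines = ["big data analytics", "big data platform"]
mapped = [kv for line in lines for kv in map_line(line)]
counts = dict(reduce_pairs(sorted(mapped)))
print(counts)  # {'analytics': 1, 'big': 2, 'data': 2, 'platform': 1}
```

In a real Hadoop job the sort between the two functions is performed by the framework's shuffle phase across the cluster rather than by an in-memory `sorted` call.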
3.4 Analytics Layer
In this layer, sentiment analysis is performed by combining lexicon-based and machine-learning-based approaches. The SentiStrength lexicon-based scheme is applied for class labeling, and the class-labeled data is used as the training data to build the multi-tier classification model. Data cleaning and preprocessing are performed to develop an effective classification model. The Mahout scalable machine learning library, specifically designed to use Hadoop to enable scalable processing of huge data sets, is used to develop the classification model.
6. 4. SENTIMENT ANALYSIS SYSTEM FOR SOCIAL BIG DATA
4.1 Sentiment Analysis
Sentiment analysis [7] is an increasingly popular text mining method to determine the opinion of a text. It is often referred to as opinion mining. Sentiment analysis uses individual elements of a text at different text levels, such as the whole document, a paragraph, a sentence or only a text window, and tries to correctly determine and assign particular moods and emotions to the respective entities, also called objects. The challenge of this analysis [9] is to replicate the human process of understanding text information: the assignment of the individual polarized information and the interpretation performed by the human mind. The simplest way to do sentiment classification is the lexicon-based approach, also known as the bag-of-words or dictionary-based approach. This allows a more accurate assessment of the individual components of the text. Machine learning techniques have been widely applied in the text classification area, and most of them are supervised learning classification methods. Acquiring effective training data is a challenge, although learning-based approaches outperform lexicon-based ones (Pang et al.). Manual labeling of training data is time- and labor-consuming.
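A minimal sketch of the lexicon-based (bag-of-words) approach described above: score each word against a dictionary and map the aggregate score to a label. The tiny lexicon and its polarity values are invented for illustration and are not SentiStrength's.

```python
# Toy sentiment lexicon: word -> polarity score (values invented for illustration).
LEXICON = {"good": 2, "great": 3, "love": 3, "bad": -2, "terrible": -3, "hate": -3}

def lexicon_score(text: str) -> int:
    """Sum the polarity of every lexicon word found in the (pre-tokenized) text."""
    return sum(LEXICON.get(word, 0) for word in text.lower().split())

def classify(text: str) -> str:
    """Map the aggregate score to a coarse sentiment label."""
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(classify("love this great phone"))  # -> positive
```

Real lexicons such as SentiStrength also handle negation, boosters and emoticons, which is why the preprocessing steps described later (negation handling, emoticon replacement) matter for accuracy.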
4.2 Sentiment Analysis of Social Big Data
The data from social media [13] is multi-structured (variety), very large scale and distributed (volume), and sometimes needs to be analyzed in real time (velocity). In addition, a fourth V, value, is needed for sentiment analysis on social data. This specification of the data source is known as the 4 Vs aspect of the big data domain. Sentiment analysis on social media data helps the organization determine marketing strategy by providing public opinion. Efficient techniques for collecting social data and extracting valuable information from the collected data are an essential demand. Traditional sentiment classification techniques do not perform well on social data. In this work, a multi-tier sentiment analysis system on a big data analytics platform is proposed for analyzing social big data.
4.3 Multi-tier Sentiment Analysis System on Big Data Analytics Platform
The MSABDP is developed with four processes: data collection, data cleaning and preprocessing, class labeling and multi-tier sentiment classification. Apache Flume is used to collect real Twitter data. As the collected tweets include noise and irrelevant data, data cleaning and preprocessing are performed in order to produce normalized data for labeling the classes. The sentiment classification is developed by combining a lexicon-based classifier and a supervised machine learning approach. The classification process is implemented on the MapReduce software framework to support distributed computing, which allows parallel programming using the functional concept of map functions. Instead of labeling the classes manually, the SentiStrength lexicon-based classifier is used for the task of annotating the training data for the learning-based classifier. Concurrently, the tweet text data is preprocessed to improve the accuracy of classification, and the preprocessed data with class labels is combined as the input of the learning-based classification. In the reduce stage, the result data are combined to produce a new set of output, and the classification model is developed using the Mahout machine learning library. Newly arriving test data is
7. classified using the model in order to produce the classified tweets. The process flow of the proposed system is
described in Figure 2.
Figure 2. Process Flow Diagram of MSABDP
[Diagram: data collection with Flume (Twitter Service via the Twitter Stream API, TwitterSource, memory channel, HDFS sink); data cleaning and preprocessing (selecting the tweet text feature, removing noisy data, duplicate tweets, stopwords and character repetitions, negation handling, replacing emoticons with their sentiment values and abbreviations with their expansions); class labeling with a lexicon (sentiment score calculation) to produce the class-labeled, training and testing data sets; and multi-tier classification of new instances by the machine learning classifier: Model I (SP/P (Pos), Neu, SN/N (Neg)), Model II (SP, P) and Model III (SN, N). SP: strongly_positive; P: positive; Neu: neutral; SN: strongly_negative; N: negative.]
5. Data Collection
Real tweet data are collected using Apache Flume [9], and the collected real data are used to evaluate the
system. Apache Flume deploys the Twitter Agent in order to collect data generated from the TwitterSource
(Twitter Stream API) and transfer it to HDFS through a MemoryChannel.
5.1. Twitter Source
The Twitter source processes events and moves them along by sending the stream data into a memory channel
[10]. Sources operate by gathering discrete pieces of data, translating the data into individual events, and then
using the channel to process events one at a time or as a batch. In this system, a custom Twitter source is used as the
data source, and the collected data are limited by keywords and language. In the TwitterSource, the twitter4j library is used to
access the Twitter Streaming API. In order to connect to the Twitter APIs, application-specific secrets,
namely the Consumer Key, Consumer Secret, Access Token, and Access Token Secret, are needed. To get the
application-specific secrets, a Twitter application must be created via the
link https://apps.twitter.com/. It generates the Consumer Key, Consumer Secret, Access Token, and Access
Token Secret.
5.2. Memory Channel
The channel acts as a pathway between the TwitterSource and the HDFS Sink. Events are added to the channel by
the TwitterSource and later removed from the channel by the HDFS Sink. It is used as an in-memory queue to store events
until they are ready to be written to a sink. As the channel holds all events in memory, the channel’s capacity and
transaction capacity are limited by the “capacity” and “transactionCapacity” parameters in the configuration file. In
this work, the channel’s capacity is set to 10,000 and the transaction capacity is set to 100.
5.3. HDFS Sink
The HDFS Sink writes events to a configured location in HDFS. In the HDFS Sink configuration, the rollCount
parameter defines the size of the files and is set to 10,000, so each file will end up containing 10,000
tweets. The sink also retains the original data format by setting the fileType to DataStream and the writeFormat to
Text.
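For concreteness, a Flume agent configuration along the lines below would tie Sections 5.1–5.3 together. This is a sketch, not the authors' actual file: the custom source class name, HDFS path, and keyword list are placeholder assumptions, while the channel and sink parameters use the values stated in the text.

```properties
# Hypothetical Flume agent wiring a custom Twitter source to HDFS.
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

# Custom Twitter source (class name and keywords are placeholders).
TwitterAgent.sources.Twitter.type = com.example.flume.TwitterSource
TwitterAgent.sources.Twitter.consumerKey = <consumer key>
TwitterAgent.sources.Twitter.consumerSecret = <consumer secret>
TwitterAgent.sources.Twitter.accessToken = <access token>
TwitterAgent.sources.Twitter.accessTokenSecret = <access token secret>
TwitterAgent.sources.Twitter.keywords = iphone, samsung
TwitterAgent.sources.Twitter.channels = MemChannel

# Memory channel with the capacities given in the text.
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100

# HDFS sink: roll every 10,000 events, keep the raw text format.
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://namenode:8020/user/flume/tweets/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
```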
6. Data Cleaning and Preprocessing
As Flume ingests the data in a nested JSON format that may contain irrelevant and duplicated data, the data has
to be cleaned and preprocessed for effective analysis. Selecting the tweet text feature, removing duplicate tweets,
removing noisy data, removing character repetitions, removing stopwords, and negation handling are performed during
this process. The process not only simplifies the classification task, but also serves to greatly decrease the processing
cost in the training phase.
6.1. Selecting Tweet Text Feature
Many tweet data features are included in one record of tweet stream data, as shown in Figure 3. The sentiment
classification system focuses on the tweet text feature in the English language. The tweet text feature is selected
among the other features because it expresses twitter users’ feelings and opinions. The tweet id feature is also selected and
assigned as the key, while the tweet text feature is assigned as the value in the MapReduce process. To select the tweet id and tweet text
features, the collected tweet data in JSON format is fetched as a JSON object using a JSON parser, and “tweets_id” and
“tweets_text” are extracted as strings from the JSON object.
6.2. Removing Duplicate Tweets
Twitter stream data may include many duplicate retweets. To remove duplicate retweets, each extracted
“tweets_id” and “tweets_text” pair is checked against the list of tweet data already stored. If the extracted
features have not been stored yet, they are added to the list of tweet data; otherwise, they are discarded. Table 1
shows the selected sample tweets_text and tweets_id features.
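As an illustration (not the authors' Java implementation), duplicate retweets can be filtered by remembering the (tweet id, tweet text) pairs already seen:

```python
def remove_duplicates(tweets):
    """Keep only the first occurrence of each (tweet_id, tweet_text) pair."""
    seen = set()
    unique = []
    for tweet_id, text in tweets:
        key = (tweet_id, text)
        if key not in seen:  # not stored yet: add to the list of tweet data
            seen.add(key)
            unique.append((tweet_id, text))
    return unique

stream = [("1", "great phone"), ("2", "bad battery"), ("1", "great phone")]
print(remove_duplicates(stream))  # [('1', 'great phone'), ('2', 'bad battery')]
```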
6.3. Removing Noisy Data
The term noisy data describes any piece of information within a tweet that will not be useful for the
machine learning algorithm when assigning a class to that tweet. Noisy data includes character
repetitions, website links, @username mentions, punctuation, and additional white space. Hashtags are replaced with the
same word without the hash sign; for example, #fun is replaced with fun. Website links that start
with www.* or http are replaced with URL. Each @username instance found in a tweet is converted to
“usermentionsymbol” so that the classifier can easily identify that a user is being
referenced. Non-alphabetic characters are replaced with spaces.
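The cleaning rules above can be sketched with a few regular expressions; this is an illustrative Python version, not the system's actual code, and the replacement order matters (links first, then mentions and hashtags, then everything non-alphabetic):

```python
import re

def remove_noise(text):
    """Illustrative cleaning rules mirroring the steps described in Section 6.3."""
    text = re.sub(r"(www\.[^\s]+|https?://[^\s]+)", "URL", text)  # links -> URL
    text = re.sub(r"@\w+", "usermentionsymbol", text)             # @user -> marker
    text = re.sub(r"#(\w+)", r"\1", text)                         # drop the hash sign
    text = re.sub(r"[^A-Za-z ]", " ", text)                       # non-alphabets -> space
    text = re.sub(r"\s+", " ", text).strip()                      # collapse whitespace
    return text

print(remove_noise("@john #fun check http://t.co/x !!"))  # usermentionsymbol fun check URL
```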
6.4. Removing Character Repetition
To remove character repetition, the input data is checked for repetitive characters: the current and previous
characters are compared while indexing through each word in a tweet record. If the previous and current characters
are the same, the character is counted. If the count is two or more, the procedure returns true; otherwise, it returns
false. If repetitive characters are found and the character count is more than 2, the repetitions are deleted with a
substring function so that only the character itself remains.
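A compact regex-based sketch of this step (an illustration, not the substring-based Java routine described above) collapses any letter repeated more than twice to a single letter:

```python
import re

def collapse_repetitions(word):
    """Reduce any character repeated more than twice down to a single character."""
    return re.sub(r"(.)\1{2,}", r"\1", word)

print(collapse_repetitions("haaaappy"))  # happy
```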
6.5. Removing Stopwords
When working with text classification methods, removal of stopwords is a common approach to reduce noise in
the data. In this work, not only common stopwords but also stopwords based on the classification domain are considered,
identified by manually examining the data. For example, the domain stopwords include iphone, apple, mobile, etc.
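Combining common and domain stopword lists can be sketched as follows (the stopword sets here are small invented samples; the domain words are those named in the text):

```python
COMMON_STOPWORDS = {"the", "is", "a", "and", "to"}   # sample common stopwords
DOMAIN_STOPWORDS = {"iphone", "apple", "mobile"}     # domain stopwords from the text

def remove_stopwords(tokens):
    stop = COMMON_STOPWORDS | DOMAIN_STOPWORDS
    return [t for t in tokens if t.lower() not in stop]

print(remove_stopwords(["the", "iphone", "camera", "is", "awesome"]))  # ['camera', 'awesome']
```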
6.6. Negation Handling
Negation handling is one of the factors that significantly affect the accuracy of a learning-based classifier. For
example, the word “good” in the phrase “not good” will contribute to positive rather than negative
sentiment if the presence of “not” before it is not taken into account. Negation handling transforms a word preceded by not or n’t
into “not_” + word.
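A minimal sketch of this transformation (illustrative only; a production version would also preserve the stem of n’t contractions) marks the word that follows a negation token:

```python
def handle_negation(text):
    """Prefix the word following 'not' or an n't contraction with 'not_'."""
    out = []
    negate_next = False
    for tok in text.split():
        if negate_next:
            out.append("not_" + tok)
            negate_next = False
        elif tok == "not" or tok.endswith("n't"):
            negate_next = True  # absorb the negation into the next word
        else:
            out.append(tok)
    return " ".join(out)

print(handle_negation("this is not good"))  # this is not_good
```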
6.7. Replacing Emoticons with Sentiment Values
As the maximum length of a tweet is 140 characters, an emoticon often represents the sentiment of the
tweet [7]. Therefore, emoticons are replaced with their sentiment values (such as “feeling sad” and “feeling
happy”) using happy and sad emoticon dictionaries. The happy emoticon dictionary labels 90
emoticons and the sad emoticon dictionary labels 107 emoticons. For example, “:)” is labeled as “feeling
happy” whereas “:=(” is labeled as “feeling sad”. Table 1 shows sample emoticons and their sentiment values.
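Dictionary-based replacement of emoticons can be sketched as follows; the tiny sets below stand in for the 90-entry happy and 107-entry sad dictionaries mentioned above (the same token-lookup pattern also covers abbreviation expansion):

```python
# Tiny stand-ins for the happy/sad emoticon dictionaries (90 and 107 entries in the paper).
HAPPY = {":-)", ":)", ":o)", ":]", ":3", ":c)"}
SAD = {":-(", ":(", ":c", ":["}

def replace_emoticons(text):
    tokens = []
    for tok in text.split():
        if tok in HAPPY:
            tokens.append("feeling happy")
        elif tok in SAD:
            tokens.append("feeling sad")
        else:
            tokens.append(tok)
    return " ".join(tokens)

print(replace_emoticons("new phone :)"))  # new phone feeling happy
```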
Table 1: Sample Emoticons and their Sentiment Values

Emoticons            | Sentiment Values
:-) :) :o) :] :3 :c) | Feeling happy
:-( :( :c :[         | Feeling sad

6.8. Replacing Abbreviations with their Expansions
As abbreviations in tweets can carry sentiment, they are replaced with their expansions using an
abbreviation dictionary. The dictionary has translations for 330 abbreviations [2]. The most frequently used
abbreviations can be seen in Table 2.

Table 2: Sample Abbreviations and their Expansions

Abbreviations | English expansion
gr8, gr8t     | great
lol           | laughing out loud
rotf          | rolling on the floor
bff           | best friend forever

7. Class Labeling
Instead of manually labeling the class, the SentiStrength lexicon-based approach is used to annotate the training data
of the learning-based classifier. SentiStrength [16], a lexicon-based classifier, uses additional (non-lexical) linguistic
information and rules to detect sentiment strength in short informal English text. The contextual valence shifters,
negation and intensification, are used to evaluate the context sentiment and to solve the context-dependent problem by
applying the NegationWordList and BoosterWordList. SentiStrength consists of eight dictionaries: BoosterWordList,
EmoticonLookupTable, EmotionLookupTable, EnglishWordList, IdiomLookupTable, NegationWordList,
QuestionWords, and SlangLookupTable.
The EmotionLookupTable consists of 2,546 sentiment words with their strengths. Some words include Kleene star
stemming (e.g., ador*). The word “miss” is a special case with both a positive and a negative strength of 2, since it is frequently
used to express sadness and love simultaneously. A spelling correction algorithm deletes repeated letters in a word
when the letters are repeated more often than is normal for English or, if a word is not found in an English
dictionary, when deleting repeated letters creates a dictionary word. The booster word list is used to strengthen or
weaken the emotion of sentiment words; it consists of 28 words with their sentiment strengths.
Some booster words are “totally”, “completely”, and “might”. An idiom list is used to identify the sentiment of a few
common phrases, and it overrides individual sentiment word strengths. Negations reverse the semantic
polarity of a particular term and skip any intervening booster words. The default multiplier for negated words is 0.5; in
this work, the multiplier is set to 1 instead. At least two repeated letters added to a word boost the strength of the
sentiment word by 1; for instance, haaaappy is more positive than happy. An
emoticon list with polarities is used to identify additional sentiment. The emoticon list consists of 116 common
emoticons and their polarity strengths (-1 or 1). Sentences with exclamation marks have a minimum positive
strength of 2, unless negative. Repeated punctuation including one or more exclamation marks boosts the strength of the
immediately preceding sentiment word by 1. Negative sentiment is ignored in questions. For each text,
SentiStrength outputs a positive sentiment score from 1 to 5 and a negative score from -1 to -5.
Figure 3 shows the procedure for calculating the sentiment score by applying SentiStrength. The procedure is
performed with a MapReduce function. At the Mapper stage, the collected raw data is parsed with the JsonParser in
order to select the tweet text and tweets_id. After data cleaning and preprocessing, the sentiment score is
calculated with SentiStrength. If the sentiment score is greater than 1, the output label is “strongly_positive”. If the
sentiment score is greater than 0 and less than or equal to 1, the output is “positive”. For “strongly_negative”, the
score is less than -1. If the score is less than 0 and greater than or equal to -1, the output is “negative”. Otherwise, the output is
“neutral”. The Reduce stage is not strictly necessary because no result combination is needed; it simply
outputs the results obtained by the Mappers.
Figure3. Class Labeling Procedure
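The score-to-label mapping used in the Mapper can be expressed compactly; the following is an illustrative Python version of the thresholds above, not the authors' MapReduce code:

```python
def label_from_score(score):
    """Map a total SentiStrength polarity score to one of the five classes."""
    if score > 1:
        return "strongly_positive"
    if 0 < score <= 1:
        return "positive"
    if -1 <= score < 0:
        return "negative"
    if score < -1:
        return "strongly_negative"
    return "neutral"  # score == 0

print([label_from_score(s) for s in (3, 1, 0, -1, -3)])
```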
8. Multi-tier Classification
In order to implement the multi-tier classification, there are two main parts: multi-tier classification model
development and classification by the developed model. The classification model is developed with a multi-tier
architecture, and the new (unknown) data are classified by the developed models.
8.1. Multi-tier Classification Model Development
Developing the classification model is a vital part of the proposed sentiment analysis system. The Mahout naive Bayes
classifier, a scalable machine learning algorithm, is used to develop the classification model. Naïve Bayes is a
learning algorithm that is frequently employed to tackle text classification problems; it is computationally very
efficient and easy to implement. The preprocessed class-labeled data set is split into training and testing datasets in order
Procedure: Class Labeling Job
1. Input: raw tweets (twitter text data) // k: key, v: value
2. Output: k: preprocessed tweet & tweets_id, v: class label
3. Function MAPPER(key, values)
4. while (value ∈ values):
5. Parse the data (tweet, tweet_id) with JsonParser
6. Perform data cleaning and preprocessing
7. Calculate the total polarity strength for each sentence by applying SentiStrength_Data
8. If (score > 1) then print “strongly_positive”
9. Else if (score > 0 && score <= 1) then print “positive”
10. Else if (score < 0 && score >= -1) then print “negative”
11. Else if (score < -1) then print “strongly_negative”
12. Else print “neutral”
13. emit(tweet & tweet_id, class label)
14. end while
15. End Function
16. Function REDUCER(key, values)
17. emit(tweet & tweet_id, class label)
to build the classification model. The input data is transformed into a sequence file. As this sequence file consists
of key-value pairs, the class category and tweets_id are set as the key and the tweet texts are set as the value. Feature generation is
performed by creating sparse vectors; TF-IDF feature vectors are used to improve the
performance of the classification models. For the multi-tier architecture, three classification models are developed, and each
model inherits the same configuration as the first model. Figure 4 illustrates the procedure for developing
classification Model I. To develop the first model (Model I), the class-labeled datasets whose class category is “P” or
“SP” are relabeled as “Pos”, and those whose class category is “N” or “SN” are relabeled as “Neg”. In
Model I, all of the labeled datasets (class categories Pos, Neu, and Neg) are used to train the classifier, and a new test
instance (unlabeled data) is classified into one of three classes (Pos, Neu, Neg). Neutral sentiment data is still labeled
“Neu” in Model I, and the neutral class is appended directly to the final classification result. The new instances
classified as “Pos” go to Model II, and the instances classified as “Neg” go to Model III.
Figure4. Procedure for Developing Classification Model I
In Model II, the labeled datasets whose class category is “P” or “SP” are used to train the classifier, and the test
instances whose class category is “Pos” are classified into two classes: “positive” and “strongly_positive”. The
procedure for developing Model II is illustrated in Figure 5.
Figure5. Procedure for Developing Classification Model II
Figure 6 shows the procedure for developing classification Model III. In Model III, the class-labeled datasets
whose class category is “N” or “SN” are used to train the classifier, and the new test instances whose class category is
“Neg” are classified into two classes: “negative” and “strongly_negative”.
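The three-model hierarchy can be sketched in pure Python; the minimal multinomial naive Bayes below stands in for the Mahout classifier, and the toy documents and labels are invented for illustration:

```python
from collections import Counter
import math

class TinyNB:
    """Minimal multinomial naive Bayes with Laplace smoothing (alpha = 1),
    standing in for the Mahout classifier used in the paper."""
    def fit(self, docs, labels):
        self.classes = set(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        self.class_counts = Counter(labels)
        self.vocab = set()
        for doc, c in zip(docs, labels):
            for w in doc.split():
                self.word_counts[c][w] += 1
                self.vocab.add(w)
        return self

    def predict(self, doc):
        best, best_score = None, -float("inf")
        for c in self.classes:
            total = sum(self.word_counts[c].values())
            score = math.log(self.class_counts[c])
            for w in doc.split():
                score += math.log((self.word_counts[c][w] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best, best_score = c, score
        return best

# Tier-1 relabeling: SP/P collapse to "Pos", SN/N collapse to "Neg".
TIER1 = {"SP": "Pos", "P": "Pos", "Neu": "Neu", "SN": "Neg", "N": "Neg"}
docs = ["love it great", "good phone", "it is ok", "bad battery", "hate it awful"]
labels = ["SP", "P", "Neu", "N", "SN"]

model1 = TinyNB().fit(docs, [TIER1[l] for l in labels])
model2 = TinyNB().fit([d for d, l in zip(docs, labels) if l in ("P", "SP")],
                      [l for l in labels if l in ("P", "SP")])
model3 = TinyNB().fit([d for d, l in zip(docs, labels) if l in ("N", "SN")],
                      [l for l in labels if l in ("N", "SN")])
print(model1.predict("love it great"))  # Pos
```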
Procedure : Developing Classification Model I
Input: Class Labeled data (Pos, Neu, Neg)
Output: Classification model I
1. Begin
2. Transform input data to Sequence File
3. Generate TFIDF feature vector to create the model
4. Develop classification model I
5. End
Procedure : Developing Classification Model II
Input: Class Labeled data (P,SP)
Output: Classification model II
1. Begin
2. Transform input data to Sequence File
3. Generate TFIDF feature vector to create the model
4. Develop classification model II
5. End
Figure6. Procedure for Developing Classification Model III
8.2. Classification by Developed model
Figure7. Distributed Classification Procedure
The three trained models are used to classify the new instances. For classification, the naïve Bayes classifier uses
probabilities to decide which class best matches a given input text. The word id and TF-IDF weight are used to create a
vector for the new tweet, and the naïve Bayes classifier classifies this vector. For Model I, the scores of the three
class labels are calculated. The “bestscore” is initialized to -Double.MAX_VALUE, the “bestcategoryId” is initialized to -1, and
the “categoryId” is set to the index of the classification result. If the indexed score is greater than “bestscore”, “bestcategoryId” is
replaced with “categoryId”. If “bestcategoryId” equals 1, the classifier outputs “Pos”; if it
equals 0, the classifier outputs “Neu”; otherwise, it outputs “Neg”. For Model II,
Procedure: Developing Classification Model III
Input: Class Labeled data (N, SN)
Output: Classification model III
1. Begin
2. Transform input data to Sequence File
3. Generate TFIDF feature vector to create the model
4. Develop classification model III
5. End
Procedure: Classification Job
Input: New Collected Tweet Stream Data
Output: k: tweet & tweet_id, v: class category
1. Function MAPPER(k1, v1)
2. while (value ∈ values):
3. Perform data cleaning and preprocessing
4. Create feature vector by using word id and TF-IDF value
5. Classify the feature vector of the new instance by applying Model I
6. Print classified result (“Pos”, “Neu”, “Neg”)
7. If (result == “Pos”)
8. Classify the feature vector of the tweet by applying Model II
9. Print classified result (“positive”, “strongly_positive”)
10. Else if (result == “Neg”)
11. Classify the feature vector of the tweet by applying Model III
12. Print classified result (“negative”, “strongly_negative”)
13. Else print classified result (“neutral”)
14. End if
15. Emit(tweet & tweet_id, class category)
16. End while
17. End Function
18. Function REDUCER(key, values)
19. Emit(tweet & tweet_id, class category)
the scores of the two class labels are calculated. If “bestcategoryId” equals 1, the classifier outputs
“strongly_positive”; otherwise, it outputs “positive”. For Model III, the scores of the two class labels are likewise
calculated: if “bestcategoryId” equals 1, the classifier outputs “strongly_negative”; otherwise, it outputs
“negative”. The algorithm is considered naive because it assumes that the value of a particular
feature is independent of the value of any other feature, given the class variable. Laplace smoothing is performed
with the value of α set to 1.
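The routing logic of Section 8.2 can be sketched with stub models; the keyword rules below are invented stand-ins for the three trained naive Bayes models, used only to show how an instance flows through the tiers:

```python
def model1(text):  # stub tier-1 classifier: Pos / Neu / Neg
    if "good" in text or "love" in text:
        return "Pos"
    if "bad" in text or "hate" in text:
        return "Neg"
    return "Neu"

def model2(text):  # stub: positive vs strongly_positive
    return "strongly_positive" if "love" in text else "positive"

def model3(text):  # stub: negative vs strongly_negative
    return "strongly_negative" if "hate" in text else "negative"

def classify(text):
    """Route an instance through the multi-tier hierarchy."""
    tier1 = model1(text)
    if tier1 == "Pos":
        return model2(text)
    if tier1 == "Neg":
        return model3(text)
    return "neutral"  # neutral results bypass the second tier

print(classify("love this phone"))  # strongly_positive
print(classify("battery is bad"))   # negative
print(classify("just a phone"))     # neutral
```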
9. Result Analysis
In this work, the experimental parameters for implementing the proposed system, the dataset, and the
evaluation results are described.
9.1 Experimental Environment
The specifications of the devices and the necessary software components of the proposed system are described in Table 3.
Table 3. Experiment Parameters

Parameters          | Specification
Server/Client OS    | Ubuntu 14.04 LTS
Host Specification  | Intel® Core i7-3770 CPU @ 3.40 GHz, 8 GB memory, 1 TB hard disk
VMs Specification   | 4 GB RAM, 100 GB hard disk
Software Components | Hadoop 2.7.1, Flume 1.6, SentiStrength 2, Mahout 0.10.0
9.2 Datasets
In order to test the functionality of the proposed system and demonstrate the achieved results with promising accuracy,
tweet stream data related to the iphone and samsung mobile products is examined. 200,000 tweets are used as
the training dataset, and a new batch of 1,000 tweets is applied as the test set.
9.3 Evaluation Results
In the experiment, the cluster is composed of four computing nodes (VMs) with one name node and three data
nodes. The modules from data collection to classification model development run as an offline process. On the other
side, new batches of tweets are collected and classified by applying the multi-tier classification
model.
The evaluation of the system begins with the performance of the lexicon-based classifier. To establish the ground
truth, the result of the SentiStrength lexicon-based classifier is compared with manual classification. In this work,
5,000 tweets are randomly selected from the training data of the two mobile phone brands, and the results of the
SentiStrength lexicon-based classification are evaluated.
Figure 8 illustrates the comparative results of lexicon-based classification and manual classification for iphone.
The percentages of tweets classified with the SentiStrength-based approach for the positive,
strongly_positive, negative, strongly_negative, and neutral class labels are 17, 7, 14, 3, and 59, while the manually
classified percentages for positive, strongly_positive, negative, strongly_negative, and neutral are 22, 10, 12, 4, and 52.
Therefore, the error rate of the lexicon-based classification is 5% in positive, 3% in strongly_positive, 2% in negative,
1% in strongly_negative, and 7% in neutral. Thus, the overall accuracy rate is 82% and the error rate is 18% for the
iphone data.
Figure8. Comparison of SentiStrength lexicon based classification and Manual Classification for iphone
Figure9. Comparison of SentiStrength Lexicon based Classification and Manual Classification for samsung
The comparative results of SentiStrength lexicon-based classification and manual classification for samsung are
illustrated in Figure 9. The percentages of tweets classified with the SentiStrength-based approach
for the positive, strongly_positive, negative, strongly_negative, and neutral class labels are 20, 5, 16, 6, and 53, while the manually
classified percentages for positive, strongly_positive, negative, strongly_negative, and neutral are 24, 8, 11, 8,
and 49. Therefore, the error rate of the lexicon-based classification is 5% in positive, 3% in strongly_positive, 2% in
negative, 1% in strongly_negative, and 7% in neutral. Thus, the overall accuracy rate is 84% and the error rate is
16% for samsung.
To evaluate the performance of the multi-tier classification, two sets of experiments are conducted. In the first case,
reference data, i.e., instances correctly classified by the preceding models, are used; in this case, only
local mistakes are identified, since misclassified instances from the previous model are not considered. In the
second case, accuracy is computed using global data, which is based on the outputs of the preceding
models; the global data may contain misclassified instances from all of the hierarchical classification models.
Table 4. Comparison of single-tier and multi-tier results for iphone

Architecture                     | Classified as                 | Accuracy (%) | Overall Accuracy (%)
Multi-tier (with reference data) | Model I: Pos                  | 83.4         | 82.37
                                 | Model I: Neg                  | 59.7         |
                                 | Model I: Neu                  | 69.4         |
                                 | Model II: positive            | 71.7         |
                                 | Model II: strongly_positive   | 77.4         |
                                 | Model III: negative           | 68.2         |
                                 | Model III: strongly_negative  | 44.8         |
Multi-tier (with global data)    | Model I: Pos                  | 83.4         | 80.03
                                 | Model I: Neg                  | 59.7         |
                                 | Model I: Neu                  | 69.4         |
                                 | Model II: positive            | 42.8         |
                                 | Model II: strongly_positive   | 59.6         |
                                 | Model III: negative           | 66.7         |
                                 | Model III: strongly_negative  | 61.4         |
Single-tier                      | neutral                       | 82.1         | 75.24
                                 | positive                      | 41.3         |
                                 | strongly_positive             | 67.9         |
                                 | negative                      | 46.2         |
                                 | strongly_negative             | 48.4         |

Table 5. Comparison of single-tier and multi-tier results for samsung

Architecture                     | Classified as                 | Accuracy (%) | Overall Accuracy (%)
Multi-tier (with reference data) | Model I: Pos                  | 81.4         | 80.14
                                 | Model I: Neg                  | 71.7         |
                                 | Model I: Neu                  | 59.4         |
                                 | Model II: positive            | 71.7         |
                                 | Model II: strongly_positive   | 77.4         |
                                 | Model III: negative           | 48.2         |
                                 | Model III: strongly_negative  | 59.8         |
Multi-tier (with global data)    | Model I: Pos                  | 80.4         | 78.13
                                 | Model I: Neg                  | 49.3         |
                                 | Model I: Neu                  | 53.4         |
                                 | Model II: positive            | 32.9         |
                                 | Model II: strongly_positive   | 54.9         |
                                 | Model III: negative           | 67.1         |
                                 | Model III: strongly_negative  | 68.2         |
Single-tier                      | neutral                       | 82.7         | 74.53
                                 | positive                      | 42.1         |
                                 | strongly_positive             | 66.5         |
                                 | negative                      | 39.9         |
                                 | strongly_negative             | 43.7         |

Table 4 shows the comparative results of single-tier and multi-tier classification for iphone. The multi-tier
classification shows higher accuracy on reference data, while it is slightly lower on global data. The multi-tier
classification (with reference data) is 7% higher than the single-tier approach, and the multi-tier classification (with global
data) is 5% higher than single-tier. The comparative results of single-tier and multi-tier classification for
samsung are illustrated in Table 5. The results show that the multi-tier classification (with reference data) achieves
higher accuracy than the multi-tier classification (with global data) and single-tier, by 4% and 6% respectively.
Figure 10. Comparison Result for Performance (with Running Time)
In order to test the scalability of the system, MSABDP runs the job with different numbers of tweets and
different numbers of nodes, with each node deployed on a separate machine. Figure 10 shows the scalability of MSABDP
from a single-node cluster to a four-node cluster. According to the results, the running time of MSABDP for each data
volume decreases as more nodes are added to the cluster.
10. CONCLUSION
In this paper, a Big Data Analytics Platform (Hadoop) is developed to scale up the traditional analytics platform for
analyzing social big data. Multi-tier sentiment analysis on the Big Data Analytics platform (MSABDP) is proposed to
reduce the complexity of multi-class classification. Apache Flume is used to collect a huge amount of real twitter data,
and MSABDP classifies the real twitter data into five classes: strongly_positive, positive, neutral, negative, and
strongly_negative. The sentiment classification is implemented by combining lexicon-based and supervised machine
learning approaches. The system enables the high-level performance of learning-based classification while taking
advantage of the lexicon-based classifier’s effortless setup process. To reduce the complexity of multi-class
classification, a learning-based approach with a multi-tier architecture is applied. The proposed
system is evaluated on real twitter data about the iphone and samsung mobile phone products. The evaluation shows
the reliability of the SentiStrength lexicon-based classification by comparison with manual classification
results, with accuracy rates of 82% for iphone and 84% for samsung. To evaluate the multi-class
classification scheme with the multi-tier architecture, two sets of experiments are implemented, with reference data and
with global data. For the iphone product, the multi-tier classification (with reference data) is 7% higher than single-tier
and the multi-tier classification (with global data) is 5% higher than single-tier. For samsung, the multi-tier
classification (with reference data) achieves higher accuracy than the multi-tier classification (with global data) and
single-tier, by 4% and 6% respectively. The running time of MSABDP with different volumes of data decreases
when more nodes are added to the cluster. In future work, the proposed sentiment analysis system will be
implemented with Spark and Mahout Samsara, and more experiments with additional classifiers and
data from other domains will be conducted.
REFERENCES
[1] A. Tsakalidis, S. Papadopoulos, I. Kompatsiaris, “An Ensemble Model for Cross-Domain Polarity Classification on Twitter”, 15th Web
Information System Engineering, Thessaloniki, Greece, 14 October 2014.
[2] Abbreviation List. Available at “http://www.englishclub.com/eslchat/abbreviations.html.” [Online; accessed 10-June-2016].
[3] Andrew C. Oliver. “Machine-learning-with-mahout”. Available: http://www.infoworld.com/article/2608418/application-
development/enjoy-machine-learning-with-mahout-on-hadoop.html” , May 29, 2014, [Online; accessed 3-December-2016].
[4] B.Liu, E.Blasch, Y.Chen, D.Shen, G.Chen, “Scalable Sentiment Classification for Big Data Analysis Using Naive Bayes Classifier”,
IEEE International Conference on Big Data, IEEE, (2013), pp. 99-104.
[5] G. Vaitheeswaran, Dr. L. Arockiam,”Combining Lexicon and Machine Learning Method to Enhance the Accuracy of Sentiment
Analysis on Big Data”, International Journal of Computer Science and Information Technologies, Vol.7(1) , 2016, 306-311.
[6] Hadoop Yarn. Available at “https://hadoop.apache.org/docs/r2.7.1/hadoop-yarn/hadoop-yarn-site/YARN.html.” [Online; accessed 20-
August-2016]
[7] Harsh Thakkar and Dhiren Patel, “Approaches for Sentiment Analysis on Twitter: A State of Art study”, CoRR abs/1512.01043
(2015). 2014.
[8] Johan Bollen, Huina Mao, and Alberto Pepe. Modeling public mood and emotion:Twitter sentiment and socio-economic phenomena.
In ICWSM, 2011.
[9] L. Sigler, “Text Analytics and Sentiment Analysis”. Available: http://www.clarabridge.com/text-analytics-vs-sentiment-analysis/, July
27, 2015, [Online; accessed 12-October-2016].
[10] M. Bouazizi, T. Ohtsuki, “Sentiment Analysis: from Binary to Multi-Class Classification”, IEEE ICC 2016 SAC Social Networking,
2016.
[11] M.Moh, A.Gajjala, S.C.R.Gangireddy, T.S.Moh, “On Multi-Tier Sentiment Analysis using Supervised Machine Learning”,
IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, 2015.
[12] M.Skuza, A.Romanowski, “ Sentiment Analysis of Twitter Data within Big Data Distributed Environment for Stock Prediction”,
Computer Science and Information Systems (FedCSIS), 13-16 Sept. 2015.
[13] N.M.Sharef, H.M.Zin and S.Nadali, “Overview and Future Opportunities of Sentiment Analysis”, Journal of Computer Sciences ,
January 2016.
[14] S.Shaikh. “Flume Installation and Streaming twitter data using flume”, Available: https://www.eduonix.com/blog/bigdata-and-
hadoop/flume-installation-and-streaming-twitter-data-using-flume/, June 30, 2015, [Online; accessed 20-Oct-2016]
[15] “Twitter Statistics”, Available at: http://www.statisticbrain.com/twitter-statistics/, 2013, [Online; accessed 2-January-2016].
[16] V.K. Singh, R. Piryani, A. Uddin, P.Waila, “Sentiment Analysis of Movie Reviews. A new Feature-based Heuristic for Aspect-level
Sentiment Classification.”, International Multi-Conference on Automation, Computing, Communication, Control and Compressed
Sensing (iMac4s), 22-23 March 2013.
[17] V.N.Khuc, C.Shivade, R.Ramnath, J.Ramanathan, “Towards Building Large-Scale Distributed Systems for Twitter Sentiment
Analysis”, 12 Proceedings of the 27th Annual ACM Symposium on Applied Computing, Pages 459-464, Trento, Italy, March 26-30,
2012
Wint Nyein Chan is a tutor of Computer University (Hpa-an). She received her M.C.Sc degree in computer
science from Computer University (Dawei), Myanmar, in 2011. She is a Ph.D candidate of UCSY. Her
research interests include big data, cloud computing, and machine learning.
Dr. Thandar Thein received her M.Sc. (computer science) and Ph.D. degrees in 1996 and 2004, respectively
from University of Computer Studies, Yangon (UCSY), Myanmar. She did post doctorate research in Korea
Aerospace University. She is currently a Pro-Rector of University of Computer Studies, Maubin. Her
research interests include cloud computing, mobile cloud computing, big data, digital forensic, security
engineering, and network security.