Sentiment analysis (SA) is an enduring area for research especially in the field of text analysis. Text pre-processing is an important aspect to perform SA accurately. This paper presents a text processing model for SA, using natural language processing techniques for twitter data. The basic phases for machine learning are text collection, text cleaning, pre-processing, feature extractions in a text and then categorize the data according to the SA techniques. Keeping the focus on twitter data, the data is extracted in domain specific manner. In data cleaning phase, noisy data, missing data, punctuation, tags and emoticons have been considered. For pre-processing, tokenization is performed which is followed by stop word removal (SWR). The proposed article provides an insight of the techniques, that are used for text pre-processing, the impact of their presence on the dataset. The accuracy of classification techniques has been improved after applying text preprocessing and dimensionality has been reduced. The proposed corpus can be utilized in the area of market analysis, customer behaviour, polling analysis, and brand monitoring. The text pre-processing process can serve as the baseline to apply predictive analysis, machine learning and deep learning algorithms which can be extended according to problem definition.
Top cited articles 2020 - Advanced Computational Intelligence: An Internation...aciijournal
Advanced Computational Intelligence: An International Journal (ACII) is a quarterly open access peer-reviewed journal that publishes articles which contribute new results in all areas of computational intelligence. The goal of this journal is to bring together researchers and practitioners from academia and industry to focus on advanced computational intelligence concepts and establishing new collaborations in these areas.
Evidence Data Preprocessing for Forensic and Legal AnalyticsCSCJournals
Electronic evidential data pertaining to a legal case, or a digital forensic investigation can be enormous given the extensive electronic data generation mechanisms of companies and users coupled with cheap storage alternatives. Working with such volumes of data can be tasking, sometimes requiring matured analytical processes and a degree of automation. Once electronic data is collected post eDiscovery hold or post forensic acquisition, it can be framed into datasets for analytical research. This paper focuses on data preprocessing of such evidentiary datasets outlining best practices and potential pitfalls prior to undertaking analytical experiments.
2. an efficient approach for web query preprocessing edit satIAESIJEECS
The emergence of the Web technology generated a massive amount of raw data by enabling Internet users to post their opinions, comments, and reviews on the web. To extract useful information from this raw data can be a very challenging task. Search engines play a critical role in these circumstances. User queries are becoming main issues for the search engines. Therefore a preprocessing operation is essential. In this paper, we present a framework for natural language preprocessing for efficient data retrieval and some of the required processing for effective retrieval such as elongated word handling, stop word removal, stemming, etc. This manuscript starts by building a manually annotated dataset and then takes the reader through the detailed steps of process. Experiments are conducted for special stages of this process to examine the accuracy of the system.
Top cited articles 2020 - Advanced Computational Intelligence: An Internation...aciijournal
Advanced Computational Intelligence: An International Journal (ACII) is a quarterly open access peer-reviewed journal that publishes articles which contribute new results in all areas of computational intelligence. The goal of this journal is to bring together researchers and practitioners from academia and industry to focus on advanced computational intelligence concepts and establishing new collaborations in these areas.
Evidence Data Preprocessing for Forensic and Legal AnalyticsCSCJournals
Electronic evidential data pertaining to a legal case, or a digital forensic investigation can be enormous given the extensive electronic data generation mechanisms of companies and users coupled with cheap storage alternatives. Working with such volumes of data can be tasking, sometimes requiring matured analytical processes and a degree of automation. Once electronic data is collected post eDiscovery hold or post forensic acquisition, it can be framed into datasets for analytical research. This paper focuses on data preprocessing of such evidentiary datasets outlining best practices and potential pitfalls prior to undertaking analytical experiments.
2. an efficient approach for web query preprocessing edit satIAESIJEECS
The emergence of the Web technology generated a massive amount of raw data by enabling Internet users to post their opinions, comments, and reviews on the web. To extract useful information from this raw data can be a very challenging task. Search engines play a critical role in these circumstances. User queries are becoming main issues for the search engines. Therefore a preprocessing operation is essential. In this paper, we present a framework for natural language preprocessing for efficient data retrieval and some of the required processing for effective retrieval such as elongated word handling, stop word removal, stemming, etc. This manuscript starts by building a manually annotated dataset and then takes the reader through the detailed steps of process. Experiments are conducted for special stages of this process to examine the accuracy of the system.
Survey on Existing Text Mining Frameworks and A Proposed Idealistic Framework...ijceronline
International Journal of Computational Engineering Research (IJCER) is dedicated to protecting personal information and will make every reasonable effort to handle collected information appropriately. All information collected, as well as related requests, will be handled as carefully and efficiently as possible in accordance with IJCER standards for integrity and objectivity.
Performance evaluation of decision tree classification algorithms using fraud...journalBEEI
Text mining is one way of extracting knowledge and finding out hidden relationships among data using artificial intelligence methods. Surely, taking advantage of different techniques has been highlighted in previous researches however, the lack of literature focusing on cybercrimes implies the lack of utilization of data mining in facilitating cybercrime investigations in the Philippines. This study therefore classifies computer fraud or online scam data coming from Police incident reports as well as narratives of scam victims as a continuation of a prior study. The dataset consists mainly of unstructured data of 49,822 mainly Filipino words. Further, 5 decision tree algorithms namely, J48, Hoeffding Tree, Decision Stump, REPTree, and Random Forest were employed and compared in terms of their performance and prediction accuracy. The results show that J48 achieves the highest accuracy and the lowest error rate among other classifiers. Results were validated by Police investigators where J48 was likewise preferred as a potential tool to apply in cybercrime investigations. This indicates the importance of text mining in the field of cybercrime investigation domains in the country. Further work can be carried out in the future using different and more inclusive cybercrime datasets and other classification techniques in Weka or any other data mining tool.
Incentive Compatible Privacy Preserving Data Analysisrupasri mupparthi
Now a days, data management applications have evolved from pure storage and retrieval of information to finding interesting patterns and associations from large amounts of data. With the advancement of Internet and networking technologies, more and more computing applications, including data mining programs, are required to be conducted among multiple data sources that scattered around different spots, and to jointly conduct the computation to reach a common result. However, due to legal constraints and competition edges, privacy issues arise in the area of distributed data mining, thus leading to the interests from research community of both data mining.
In this project each party participates in a protocol to learn the output of some function f over the joint inputs of the parties. We mainly focus on the DNCC model instead of considering a probabilistic extension. Deterministic Non Cooperative Computation needs to be extended to include the possibility of collusion.
June 2020: Top Read Articles in Advanced Computational Intelligenceaciijournal
Advanced Computational Intelligence: An International Journal (ACII) is a quarterly open access peer-reviewed journal that publishes articles which contribute new results in all areas of computational intelligence. The goal of this journal is to bring together researchers and practitioners from academia and industry to focus on advanced computational intelligence concepts and establishing new collaborations in these areas.
Identification of important features and data mining classification technique...IJECEIAES
Employees absenteeism at the work costs organizations billions a year. Prediction of employees’ absenteeism and the reasons behind their absence help organizations in reducing expenses and increasing productivity. Data mining turns the vast volume of human resources data into information that can help in decision-making and prediction. Although the selection of features is a critical step in data mining to enhance the efficiency of the final prediction, it is not yet known which method of feature selection is better. Therefore, this paper aims to compare the performance of three well-known feature selection methods in absenteeism prediction, which are relief-based feature selection, correlation-based feature selection and information-gain feature selection. In addition, this paper aims to find the best combination of feature selection method and data mining technique in enhancing the absenteeism prediction accuracy. Seven classification techniques were used as the prediction model. Additionally, cross-validation approach was utilized to assess the applied prediction models to have more realistic and reliable results. The used dataset was built at a courier company in Brazil with records of absenteeism at work. Regarding experimental results, correlationbased feature selection surpasses the other methods through the performance measurements. Furthermore, bagging classifier was the best-performing data mining technique when features were selected using correlation-based feature selection with an accuracy rate of (92%).
Cluster Based Access Privilege Management Scheme for DatabasesEditor IJMTER
Knowledge discovery is carried out using the data mining techniques. Association rule mining,
classification and clustering operations are carried out under data mining. Clustering method is used to group up the
records based on the relevancy. Distance or similarity measures are used to estimate the transaction relationship.
Census data and medical data are referred as micro data. Data publish schemes are used to provide private data for
analysis. Privacy preservation is used to protect private data values. Anonymity is considered in the privacy
preservation process.
Data values are allowed to authorized users using the access control models. Privacy Protection Mechanism
(PPM) uses suppression and generalization of relational data to anonymize and satisfy privacy needs. Accuracyconstrained privacy-preserving access control framework is used to manage access control in relational database. The
access control policies define selection predicates available to roles while the privacy requirement is to satisfy the kanonymity or l-diversity. Imprecision bound constraint is assigned for each selection predicate. k-anonymous
Partitioning with Imprecision Bounds (k-PIB) is used to estimate accuracy and privacy constraints. Role-based Access
Control (RBAC) allows defining permissions on objects based on roles in an organization. Top Down Selection
Mondrian (TDSM) algorithm is used for query workload-based anonymization. The Top Down Selection Mondrian
(TDSM) algorithm is constructed using greedy heuristics and kd-tree model. Query cuts are selected with minimum
bounds in Top-Down Heuristic 1 algorithm (TDH1). The query bounds are updated as the partitions are added to the
output in Top-Down Heuristic 2 algorithm (TDH2). The cost of reduced precision in the query results is used in TopDown Heuristic 3 algorithm (TDH3). Repartitioning algorithm is used to reduce the total imprecision for the queries.
The privacy preserved access privilege management scheme is enhanced to provide incremental mining
features. Data insert, delete and update operations are connected with the partition management mechanism. Cell level
access control is provided with differential privacy method. Dynamic role management model is integrated with the
access control policy mechanism for query predicates.
Multi-objective NSGA-II based community detection using dynamical evolution s...IJECEIAES
Community detection is becoming a highly demanded topic in social networking-based applications. It involves finding the maximum intraconnected and minimum inter-connected sub-graphs in given social networks. Many approaches have been developed for community’s detection and less of them have focused on the dynamical aspect of the social network. The decision of the community has to consider the pattern of changes in the social network and to be smooth enough. This is to enable smooth operation for other community detection dependent application. Unlike dynamical community detection Algorithms, this article presents a non-dominated aware searching Algorithm designated as non-dominated sorting based community detection with dynamical awareness (NDS-CD-DA). The Algorithm uses a non-dominated sorting genetic algorithm NSGA-II with two objectives: modularity and normalized mutual information (NMI). Experimental results on synthetic networks and real-world social network datasets have been compared with classical genetic with a single objective and has been shown to provide superiority in terms of the domination as well as the convergence. NDS-CD-DA has accomplished a domination percentage of 100% over dynamic evolutionary community searching DECS for almost all iterations.
Biometric Identification and Authentication Providence using Fingerprint for ...IJECEIAES
The raise in the recent security incidents of cloud computing and its challenges is to secure the data. To solve this problem, the integration of mobile with cloud computing, Mobile biometric authentication in cloud computing is presented in this paper. To enhance the security, the biometric authentication is being used, since the Mobile cloud computing is popular among the mobile user. This paper examines how the mobile cloud computing (MCC) is used in security issue with finger biometric authentication model. Through this fingerprint biometric, the secret code is generated by entropy value. This enables the person to request for accessing the data in the desk computer. When the person requests the access to the authorized user through Bluetooth in mobile, the Authorized user sends the permit access through fingerprint secret code. Finally this fingerprint is verified with the database in the Desk computer. If it is matched, then the computer can be accessed by the requested person.
Methods for Sentiment Analysis: A Literature Studyvivatechijri
Sentiment analysis is a trending topic, as everyone has an opinion on everything. The systematic
study of these opinions can lead to information which can prove to be valuable for many companies and
industries in future. A huge number of users are online, and they share their opinions and comments regularly,
this information can be mined and used efficiently. Various companies can review their own product using
sentiment analysis and make the necessary changes in future. The data is huge and thus it requires efficient
processing to collect this data and analyze it to produce required result.
In this paper, we will discuss the various methods used for sentiment analysis. It also covers various techniques
used for sentiment analysis such as lexicon based approach, SVM [10], Convolution neural network,
morphological sentence pattern model [1] and IML algorithm. This paper shows studies on various data sets
such as Twitter API, Weibo, movie review, IMDb, Chinese micro-blog database [9] and more. The paper shows
various accuracy results obtained by all the systems.
Survey on Existing Text Mining Frameworks and A Proposed Idealistic Framework...ijceronline
International Journal of Computational Engineering Research (IJCER) is dedicated to protecting personal information and will make every reasonable effort to handle collected information appropriately. All information collected, as well as related requests, will be handled as carefully and efficiently as possible in accordance with IJCER standards for integrity and objectivity.
Performance evaluation of decision tree classification algorithms using fraud...journalBEEI
Text mining is one way of extracting knowledge and finding out hidden relationships among data using artificial intelligence methods. Surely, taking advantage of different techniques has been highlighted in previous researches however, the lack of literature focusing on cybercrimes implies the lack of utilization of data mining in facilitating cybercrime investigations in the Philippines. This study therefore classifies computer fraud or online scam data coming from Police incident reports as well as narratives of scam victims as a continuation of a prior study. The dataset consists mainly of unstructured data of 49,822 mainly Filipino words. Further, 5 decision tree algorithms namely, J48, Hoeffding Tree, Decision Stump, REPTree, and Random Forest were employed and compared in terms of their performance and prediction accuracy. The results show that J48 achieves the highest accuracy and the lowest error rate among other classifiers. Results were validated by Police investigators where J48 was likewise preferred as a potential tool to apply in cybercrime investigations. This indicates the importance of text mining in the field of cybercrime investigation domains in the country. Further work can be carried out in the future using different and more inclusive cybercrime datasets and other classification techniques in Weka or any other data mining tool.
Incentive Compatible Privacy Preserving Data Analysisrupasri mupparthi
Now a days, data management applications have evolved from pure storage and retrieval of information to finding interesting patterns and associations from large amounts of data. With the advancement of Internet and networking technologies, more and more computing applications, including data mining programs, are required to be conducted among multiple data sources that scattered around different spots, and to jointly conduct the computation to reach a common result. However, due to legal constraints and competition edges, privacy issues arise in the area of distributed data mining, thus leading to the interests from research community of both data mining.
In this project each party participates in a protocol to learn the output of some function f over the joint inputs of the parties. We mainly focus on the DNCC model instead of considering a probabilistic extension. Deterministic Non Cooperative Computation needs to be extended to include the possibility of collusion.
June 2020: Top Read Articles in Advanced Computational Intelligenceaciijournal
Advanced Computational Intelligence: An International Journal (ACII) is a quarterly open access peer-reviewed journal that publishes articles which contribute new results in all areas of computational intelligence. The goal of this journal is to bring together researchers and practitioners from academia and industry to focus on advanced computational intelligence concepts and establishing new collaborations in these areas.
Identification of important features and data mining classification technique...IJECEIAES
Employees absenteeism at the work costs organizations billions a year. Prediction of employees’ absenteeism and the reasons behind their absence help organizations in reducing expenses and increasing productivity. Data mining turns the vast volume of human resources data into information that can help in decision-making and prediction. Although the selection of features is a critical step in data mining to enhance the efficiency of the final prediction, it is not yet known which method of feature selection is better. Therefore, this paper aims to compare the performance of three well-known feature selection methods in absenteeism prediction, which are relief-based feature selection, correlation-based feature selection and information-gain feature selection. In addition, this paper aims to find the best combination of feature selection method and data mining technique in enhancing the absenteeism prediction accuracy. Seven classification techniques were used as the prediction model. Additionally, cross-validation approach was utilized to assess the applied prediction models to have more realistic and reliable results. The used dataset was built at a courier company in Brazil with records of absenteeism at work. Regarding experimental results, correlationbased feature selection surpasses the other methods through the performance measurements. Furthermore, bagging classifier was the best-performing data mining technique when features were selected using correlation-based feature selection with an accuracy rate of (92%).
Cluster Based Access Privilege Management Scheme for DatabasesEditor IJMTER
Knowledge discovery is carried out using the data mining techniques. Association rule mining,
classification and clustering operations are carried out under data mining. Clustering method is used to group up the
records based on the relevancy. Distance or similarity measures are used to estimate the transaction relationship.
Census data and medical data are referred as micro data. Data publish schemes are used to provide private data for
analysis. Privacy preservation is used to protect private data values. Anonymity is considered in the privacy
preservation process.
Data values are allowed to authorized users using the access control models. Privacy Protection Mechanism
(PPM) uses suppression and generalization of relational data to anonymize and satisfy privacy needs. Accuracyconstrained privacy-preserving access control framework is used to manage access control in relational database. The
access control policies define selection predicates available to roles while the privacy requirement is to satisfy the kanonymity or l-diversity. Imprecision bound constraint is assigned for each selection predicate. k-anonymous
Partitioning with Imprecision Bounds (k-PIB) is used to estimate accuracy and privacy constraints. Role-based Access
Control (RBAC) allows defining permissions on objects based on roles in an organization. Top Down Selection
Mondrian (TDSM) algorithm is used for query workload-based anonymization. The Top Down Selection Mondrian
(TDSM) algorithm is constructed using greedy heuristics and kd-tree model. Query cuts are selected with minimum
bounds in Top-Down Heuristic 1 algorithm (TDH1). The query bounds are updated as the partitions are added to the
output in Top-Down Heuristic 2 algorithm (TDH2). The cost of reduced precision in the query results is used in TopDown Heuristic 3 algorithm (TDH3). Repartitioning algorithm is used to reduce the total imprecision for the queries.
The privacy preserved access privilege management scheme is enhanced to provide incremental mining
features. Data insert, delete and update operations are connected with the partition management mechanism. Cell level
access control is provided with differential privacy method. Dynamic role management model is integrated with the
access control policy mechanism for query predicates.
Multi-objective NSGA-II based community detection using dynamical evolution s...IJECEIAES
Community detection is becoming a highly demanded topic in social networking-based applications. It involves finding the maximum intraconnected and minimum inter-connected sub-graphs in given social networks. Many approaches have been developed for community’s detection and less of them have focused on the dynamical aspect of the social network. The decision of the community has to consider the pattern of changes in the social network and to be smooth enough. This is to enable smooth operation for other community detection dependent application. Unlike dynamical community detection Algorithms, this article presents a non-dominated aware searching Algorithm designated as non-dominated sorting based community detection with dynamical awareness (NDS-CD-DA). The Algorithm uses a non-dominated sorting genetic algorithm NSGA-II with two objectives: modularity and normalized mutual information (NMI). Experimental results on synthetic networks and real-world social network datasets have been compared with classical genetic with a single objective and has been shown to provide superiority in terms of the domination as well as the convergence. NDS-CD-DA has accomplished a domination percentage of 100% over dynamic evolutionary community searching DECS for almost all iterations.
Biometric Identification and Authentication Providence using Fingerprint for ...IJECEIAES
The raise in the recent security incidents of cloud computing and its challenges is to secure the data. To solve this problem, the integration of mobile with cloud computing, Mobile biometric authentication in cloud computing is presented in this paper. To enhance the security, the biometric authentication is being used, since the Mobile cloud computing is popular among the mobile user. This paper examines how the mobile cloud computing (MCC) is used in security issue with finger biometric authentication model. Through this fingerprint biometric, the secret code is generated by entropy value. This enables the person to request for accessing the data in the desk computer. When the person requests the access to the authorized user through Bluetooth in mobile, the Authorized user sends the permit access through fingerprint secret code. Finally this fingerprint is verified with the database in the Desk computer. If it is matched, then the computer can be accessed by the requested person.
Methods for Sentiment Analysis: A Literature Studyvivatechijri
Sentiment analysis is a trending topic, as everyone has an opinion on everything. The systematic
study of these opinions can lead to information which can prove to be valuable for many companies and
industries in future. A huge number of users are online, and they share their opinions and comments regularly,
this information can be mined and used efficiently. Various companies can review their own product using
sentiment analysis and make the necessary changes in future. The data is huge and thus it requires efficient
processing to collect this data and analyze it to produce required result.
In this paper, we will discuss the various methods used for sentiment analysis. It also covers various techniques
used for sentiment analysis such as lexicon based approach, SVM [10], Convolution neural network,
morphological sentence pattern model [1] and IML algorithm. This paper shows studies on various data sets
such as Twitter API, Weibo, movie review, IMDb, Chinese micro-blog database [9] and more. The paper shows
various accuracy results obtained by all the systems.
The sarcasm detection with the method of logistic regressionEditorIJAERD
The prediction analysis is approach which may predict future possibilities. This research work is based on the
sarcasm detection from the text data. In the previous time SVM classification is applied for the sarcasm detection. The SVM
classifier classifies data based on the hyper plane which give low accuracy. To improve accuracy for sarcasm detection
logistic regression is applied during this work. The existing and proposed techniques are implemented in python and results
are analysed in terms of accuracy, execution time. The proposed approach has high accuracy and low execution time as
compared to SVM classifier for sarcasm detection.
Neural Network Based Context Sensitive Sentiment AnalysisEditor IJCATR
Social media communication is evolving more in these days. Social networking site is being rapidly increased in recent years, which provides platform to connect people all over the world and share their interests. The conversation and the posts available in social media are unstructured in nature. So sentiment analysis will be a challenging work in this platform. These analyses are mostly performed in machine learning techniques which are less accurate than neural network methodologies. This paper is based on sentiment classification using Competitive layer neural networks and classifies the polarity of a given text whether the expressed opinion in the text is positive or negative or neutral. It determines the overall topic of the given text. Context independent sentences and implicit meaning in the text are also considered in polarity classification.
SUPPORT VECTOR MACHINE CLASSIFIER FOR SENTIMENT ANALYSIS OF FEEDBACK MARKETPL...AM Publications
Sentiment analysis is an interdisciplinary field between natural language processing, artificial intelligence and text mining. The main key of the sentiment analysis is the polarity that is meant by the sentiment is positive or negative (Chen, 2012). In this study using the method of classification support vector machine with the amount of data consumer reviews amounted to 648 data. The data obtained from consumer reviews from the marketplace with products sold is hand phone. The results of this study get 3 aspects that indicate sentiment analysis on the marketplace aspects of service, delivery and products. The slang dictionary used for the normation process is 552 words slang. This study compares the characteristic analysis to obtain the best classification result, because classification accuracy is influenced by characteristic analysis process. The result of comparison value from characteristic analysis between n-gram and TF-IDF by using Support Vector Machine method found that Unigram has the highest accuracy value, with accuracy value 80,87%. The results of this study explain that in the case of analysis sentiment at the aspect level with the comparison of characteristics with the classification model of support vector machine found that the analysis model of unigram character and classification of support vector machine is the best model
Graph embedding approach to analyze sentiments on cryptocurrencyIJECEIAES
This paper presents a comprehensive exploration of graph embedding techniques for sentiment analysis. The objective of this study is to enhance the accuracy of sentiment analysis models by leveraging the rich contextual relationships between words in text data. We investigate the application of graph embedding in the context of sentiment analysis, focusing on it is effectiveness in capturing the semantic and syntactic information of text. By representing text as a graph and employing graph embedding techniques, we aim to extract meaningful insights and improve the performance of sentiment analysis models. To achieve our goal, we conduct a thorough comparison of graph embedding with traditional word embedding and simple embedding layers. Our experiments demonstrate that the graph embedding model outperforms these conventional models in terms of accuracy, highlighting it is potential for sentiment analysis tasks. Furthermore, we address two limitations of graph embedding techniques: handling out-of-vocabulary words and incorporating sentiment shift over time. The findings of this study emphasize the significance of graph embedding techniques in sentiment analysis, offering valuable insights into sentiment analysis within various domains. The results suggest that graph embedding can capture intricate relationships between words, enabling a more nuanced understanding of the sentiment expressed in text data.
Over recent years, big data, a huge amount of structured and unstructured data is generated from social Network. There needs to extract the valulable information from the social big data. The traditional analytic platform needs to be scaled up for analyzing social big data in an efficient and timely manner. Sentiment Analysis of social big data helps the organizations by providing business insights with public opinion. Sentiment analysis based on multi-class classification scheme is oriented towards classification of text into more detailed sentiment labels. Multi-class classification with single tier architecture where single model is developed and entire labeled data is trained may increase the classification complexity. In this paper, multi-tier sentiment analysis system on big data analytics platform (MSABDP) is proposed to reduce the multi class classification complexity and efficiently analyze large scale data set. Hadoop is built for big data analytics and it is a good platform for being able to manage large data at scale and which can improve scalability and efficiency by adopting distributed processing environment since they have been implemented using a MapReduce framework and a Hadoop distributed storage (HDFS). The MSABDP is implemented by combining SentiStrength lexicon and learning based classification scheme with multi-tier architecture and run on big data analytics platform for being able to manage large data at scale. The proposed system collects a large amount of real Twitter data by using Apache Flume and the data was used for evaluation. The evaluation results have shown that the proposed multi class classification system with multi-tier architecture is able to significantly improve the classification accuracy over multi class classification based on single-tier architecture by 7%.
Complaint Analysis in Indonesian Language Using WPKE and RAKE Algorithm IJECEIAES
Social media provides convenience in communicating and can present twoway communication that allows companies to interact with their customer. Companies can use information obtained from social media to analyze how the communities respond to their services or products. The biggest challenge in processing information in social media like Twitter, is the unstructured sentences which could lead to incorrect text processing. However, this information is very important for companies’ survival. In this research, we proposed a method to extract keywords from tweets in Indonesian language, WPKE. We compared it with RAKE, an algorithm that is language independent and usually used for keyword extraction. Finally, we develop a method to do clustering to groups the topics of complaints with data set obtained from Twitter using the “komplain” hashtag. Our method can obtain the accuracy of 72.92% while RAKE can only obtain 35.42%.
Bibliometric analysis highlighting the role of women in addressing climate ch...IJECEIAES
Fossil fuel consumption increased quickly, contributing to climate change
that is evident in unusual flooding and draughts, and global warming. Over
the past ten years, women's involvement in society has grown dramatically,
and they succeeded in playing a noticeable role in reducing climate change.
A bibliometric analysis of data from the last ten years has been carried out to
examine the role of women in addressing the climate change. The analysis's
findings discussed the relevant to the sustainable development goals (SDGs),
particularly SDG 7 and SDG 13. The results considered contributions made
by women in the various sectors while taking geographic dispersion into
account. The bibliometric analysis delves into topics including women's
leadership in environmental groups, their involvement in policymaking, their
contributions to sustainable development projects, and the influence of
gender diversity on attempts to mitigate climate change. This study's results
highlight how women have influenced policies and actions related to climate
change, point out areas of research deficiency and recommendations on how
to increase role of the women in addressing the climate change and
achieving sustainability. To achieve more successful results, this initiative
aims to highlight the significance of gender equality and encourage
inclusivity in climate change decision-making processes.
Voltage and frequency control of microgrid in presence of micro-turbine inter...IJECEIAES
The active and reactive load changes have a significant impact on voltage
and frequency. In this paper, in order to stabilize the microgrid (MG) against
load variations in islanding mode, the active and reactive power of all
distributed generators (DGs), including energy storage (battery), diesel
generator, and micro-turbine, are controlled. The micro-turbine generator is
connected to MG through a three-phase to three-phase matrix converter, and
the droop control method is applied for controlling the voltage and
frequency of MG. In addition, a method is introduced for voltage and
frequency control of micro-turbines in the transition state from gridconnected mode to islanding mode. A novel switching strategy of the matrix
converter is used for converting the high-frequency output voltage of the
micro-turbine to the grid-side frequency of the utility system. Moreover,
using the switching strategy, the low-order harmonics in the output current
and voltage are not produced, and consequently, the size of the output filter
would be reduced. In fact, the suggested control strategy is load-independent
and has no frequency conversion restrictions. The proposed approach for
voltage and frequency regulation demonstrates exceptional performance and
favorable response across various load alteration scenarios. The suggested
strategy is examined in several scenarios in the MG test systems, and the
simulation results are addressed.
Enhancing battery system identification: nonlinear autoregressive modeling fo...IJECEIAES
Precisely characterizing Li-ion batteries is essential for optimizing their
performance, enhancing safety, and prolonging their lifespan across various
applications, such as electric vehicles and renewable energy systems. This
article introduces an innovative nonlinear methodology for system
identification of a Li-ion battery, employing a nonlinear autoregressive with
exogenous inputs (NARX) model. The proposed approach integrates the
benefits of nonlinear modeling with the adaptability of the NARX structure,
facilitating a more comprehensive representation of the intricate
electrochemical processes within the battery. Experimental data collected
from a Li-ion battery operating under diverse scenarios are employed to
validate the effectiveness of the proposed methodology. The identified
NARX model exhibits superior accuracy in predicting the battery's behavior
compared to traditional linear models. This study underscores the
importance of accounting for nonlinearities in battery modeling, providing
insights into the intricate relationships between state-of-charge, voltage, and
current under dynamic conditions.
Smart grid deployment: from a bibliometric analysis to a surveyIJECEIAES
Smart grids are one of the last decades' innovations in electrical energy.
They bring relevant advantages compared to the traditional grid and
significant interest from the research community. Assessing the field's
evolution is essential to propose guidelines for facing new and future smart
grid challenges. In addition, knowing the main technologies involved in the
deployment of smart grids (SGs) is important to highlight possible
shortcomings that can be mitigated by developing new tools. This paper
contributes to the research trends mentioned above by focusing on two
objectives. First, a bibliometric analysis is presented to give an overview of
the current research level about smart grid deployment. Second, a survey of
the main technological approaches used for smart grid implementation and
their contributions are highlighted. To that effect, we searched the Web of
Science (WoS), and the Scopus databases. We obtained 5,663 documents
from WoS and 7,215 from Scopus on smart grid implementation or
deployment. With the extraction limitation in the Scopus database, 5,872 of
the 7,215 documents were extracted using a multi-step process. These two
datasets have been analyzed using a bibliometric tool called bibliometrix.
The main outputs are presented with some recommendations for future
research.
Use of analytical hierarchy process for selecting and prioritizing islanding ...IJECEIAES
One of the problems that are associated to power systems is islanding
condition, which must be rapidly and properly detected to prevent any
negative consequences on the system's protection, stability, and security.
This paper offers a thorough overview of several islanding detection
strategies, which are divided into two categories: classic approaches,
including local and remote approaches, and modern techniques, including
techniques based on signal processing and computational intelligence.
Additionally, each approach is compared and assessed based on several
factors, including implementation costs, non-detected zones, declining
power quality, and response times using the analytical hierarchy process
(AHP). The multi-criteria decision-making analysis shows that the overall
weight of passive methods (24.7%), active methods (7.8%), hybrid methods
(5.6%), remote methods (14.5%), signal processing-based methods (26.6%),
and computational intelligent-based methods (20.8%) based on the
comparison of all criteria together. Thus, it can be seen from the total weight
that hybrid approaches are the least suitable to be chosen, while signal
processing-based methods are the most appropriate islanding detection
method to be selected and implemented in power system with respect to the
aforementioned factors. Using Expert Choice software, the proposed
hierarchy model is studied and examined.
Enhancing of single-stage grid-connected photovoltaic system using fuzzy logi...IJECEIAES
The power generated by photovoltaic (PV) systems is influenced by
environmental factors. This variability hampers the control and utilization of
solar cells' peak output. In this study, a single-stage grid-connected PV
system is designed to enhance power quality. Our approach employs fuzzy
logic in the direct power control (DPC) of a three-phase voltage source
inverter (VSI), enabling seamless integration of the PV connected to the
grid. Additionally, a fuzzy logic-based maximum power point tracking
(MPPT) controller is adopted, which outperforms traditional methods like
incremental conductance (INC) in enhancing solar cell efficiency and
minimizing the response time. Moreover, the inverter's real-time active and
reactive power is directly managed to achieve a unity power factor (UPF).
The system's performance is assessed through MATLAB/Simulink
implementation, showing marked improvement over conventional methods,
particularly in steady-state and varying weather conditions. For solar
irradiances of 500 and 1,000 W/m2
, the results show that the proposed
method reduces the total harmonic distortion (THD) of the injected current
to the grid by approximately 46% and 38% compared to conventional
methods, respectively. Furthermore, we compare the simulation results with
IEEE standards to evaluate the system's grid compatibility.
Enhancing photovoltaic system maximum power point tracking with fuzzy logic-b...IJECEIAES
Photovoltaic systems have emerged as a promising energy resource that
caters to the future needs of society, owing to their renewable, inexhaustible,
and cost-free nature. The power output of these systems relies on solar cell
radiation and temperature. In order to mitigate the dependence on
atmospheric conditions and enhance power tracking, a conventional
approach has been improved by integrating various methods. To optimize
the generation of electricity from solar systems, the maximum power point
tracking (MPPT) technique is employed. To overcome limitations such as
steady-state voltage oscillations and improve transient response, two
traditional MPPT methods, namely fuzzy logic controller (FLC) and perturb
and observe (P&O), have been modified. This research paper aims to
simulate and validate the step size of the proposed modified P&O and FLC
techniques within the MPPT algorithm using MATLAB/Simulink for
efficient power tracking in photovoltaic systems.
Adaptive synchronous sliding control for a robot manipulator based on neural ...IJECEIAES
Robot manipulators have become important equipment in production lines, medical fields, and transportation. Improving the quality of trajectory tracking for
robot hands is always an attractive topic in the research community. This is a
challenging problem because robot manipulators are complex nonlinear systems
and are often subject to fluctuations in loads and external disturbances. This
article proposes an adaptive synchronous sliding control scheme to improve trajectory tracking performance for a robot manipulator. The proposed controller
ensures that the positions of the joints track the desired trajectory, synchronize
the errors, and significantly reduces chattering. First, the synchronous tracking
errors and synchronous sliding surfaces are presented. Second, the synchronous
tracking error dynamics are determined. Third, a robust adaptive control law is
designed,the unknown components of the model are estimated online by the neural network, and the parameters of the switching elements are selected by fuzzy
logic. The built algorithm ensures that the tracking and approximation errors
are ultimately uniformly bounded (UUB). Finally, the effectiveness of the constructed algorithm is demonstrated through simulation and experimental results.
Simulation and experimental results show that the proposed controller is effective with small synchronous tracking errors, and the chattering phenomenon is
significantly reduced.
Remote field-programmable gate array laboratory for signal acquisition and de...IJECEIAES
A remote laboratory utilizing field-programmable gate array (FPGA) technologies enhances students’ learning experience anywhere and anytime in embedded system design. Existing remote laboratories prioritize hardware access and visual feedback for observing board behavior after programming, neglecting comprehensive debugging tools to resolve errors that require internal signal acquisition. This paper proposes a novel remote embeddedsystem design approach targeting FPGA technologies that are fully interactive via a web-based platform. Our solution provides FPGA board access and debugging capabilities beyond the visual feedback provided by existing remote laboratories. We implemented a lab module that allows users to seamlessly incorporate into their FPGA design. The module minimizes hardware resource utilization while enabling the acquisition of a large number of data samples from the signal during the experiments by adaptively compressing the signal prior to data transmission. The results demonstrate an average compression ratio of 2.90 across three benchmark signals, indicating efficient signal acquisition and effective debugging and analysis. This method allows users to acquire more data samples than conventional methods. The proposed lab allows students to remotely test and debug their designs, bridging the gap between theory and practice in embedded system design.
Detecting and resolving feature envy through automated machine learning and m...IJECEIAES
Efficiently identifying and resolving code smells enhances software project quality. This paper presents a novel solution, utilizing automated machine learning (AutoML) techniques, to detect code smells and apply move method refactoring. By evaluating code metrics before and after refactoring, we assessed its impact on coupling, complexity, and cohesion. Key contributions of this research include a unique dataset for code smell classification and the development of models using AutoGluon for optimal performance. Furthermore, the study identifies the top 20 influential features in classifying feature envy, a well-known code smell, stemming from excessive reliance on external classes. We also explored how move method refactoring addresses feature envy, revealing reduced coupling and complexity, and improved cohesion, ultimately enhancing code quality. In summary, this research offers an empirical, data-driven approach, integrating AutoML and move method refactoring to optimize software project quality. Insights gained shed light on the benefits of refactoring on code quality and the significance of specific features in detecting feature envy. Future research can expand to explore additional refactoring techniques and a broader range of code metrics, advancing software engineering practices and standards.
Smart monitoring technique for solar cell systems using internet of things ba...IJECEIAES
Rapidly and remotely monitoring and receiving the solar cell systems status parameters, solar irradiance, temperature, and humidity, are critical issues in enhancement their efficiency. Hence, in the present article an improved smart prototype of internet of things (IoT) technique based on embedded system through NodeMCU ESP8266 (ESP-12E) was carried out experimentally. Three different regions at Egypt; Luxor, Cairo, and El-Beheira cities were chosen to study their solar irradiance profile, temperature, and humidity by the proposed IoT system. The monitoring data of solar irradiance, temperature, and humidity were live visualized directly by Ubidots through hypertext transfer protocol (HTTP) protocol. The measured solar power radiation in Luxor, Cairo, and El-Beheira ranged between 216-1000, 245-958, and 187-692 W/m 2 respectively during the solar day. The accuracy and rapidity of obtaining monitoring results using the proposed IoT system made it a strong candidate for application in monitoring solar cell systems. On the other hand, the obtained solar power radiation results of the three considered regions strongly candidate Luxor and Cairo as suitable places to build up a solar cells system station rather than El-Beheira.
An efficient security framework for intrusion detection and prevention in int...IJECEIAES
Over the past few years, the internet of things (IoT) has advanced to connect billions of smart devices to improve quality of life. However, anomalies or malicious intrusions pose several security loopholes, leading to performance degradation and threat to data security in IoT operations. Thereby, IoT security systems must keep an eye on and restrict unwanted events from occurring in the IoT network. Recently, various technical solutions based on machine learning (ML) models have been derived towards identifying and restricting unwanted events in IoT. However, most ML-based approaches are prone to miss-classification due to inappropriate feature selection. Additionally, most ML approaches applied to intrusion detection and prevention consider supervised learning, which requires a large amount of labeled data to be trained. Consequently, such complex datasets are impossible to source in a large network like IoT. To address this problem, this proposed study introduces an efficient learning mechanism to strengthen the IoT security aspects. The proposed algorithm incorporates supervised and unsupervised approaches to improve the learning models for intrusion detection and mitigation. Compared with the related works, the experimental outcome shows that the model performs well in a benchmark dataset. It accomplishes an improved detection accuracy of approximately 99.21%.
Developing a smart system for infant incubators using the internet of things ...IJECEIAES
This research is developing an incubator system that integrates the internet of things and artificial intelligence to improve care for premature babies. The system workflow starts with sensors that collect data from the incubator. Then, the data is sent in real-time to the internet of things (IoT) broker eclipse mosquito using the message queue telemetry transport (MQTT) protocol version 5.0. After that, the data is stored in a database for analysis using the long short-term memory network (LSTM) method and displayed in a web application using an application programming interface (API) service. Furthermore, the experimental results produce as many as 2,880 rows of data stored in the database. The correlation coefficient between the target attribute and other attributes ranges from 0.23 to 0.48. Next, several experiments were conducted to evaluate the model-predicted value on the test data. The best results are obtained using a two-layer LSTM configuration model, each with 60 neurons and a lookback setting 6. This model produces an R 2 value of 0.934, with a root mean square error (RMSE) value of 0.015 and a mean absolute error (MAE) of 0.008. In addition, the R 2 value was also evaluated for each attribute used as input, with a result of values between 0.590 and 0.845.
A review on internet of things-based stingless bee's honey production with im...IJECEIAES
Honey is produced exclusively by honeybees and stingless bees which both are well adapted to tropical and subtropical regions such as Malaysia. Stingless bees are known for producing small amounts of honey and are known for having a unique flavor profile. Problem identified that many stingless bees collapsed due to weather, temperature and environment. It is critical to understand the relationship between the production of stingless bee honey and environmental conditions to improve honey production. Thus, this paper presents a review on stingless bee's honey production and prediction modeling. About 54 previous research has been analyzed and compared in identifying the research gaps. A framework on modeling the prediction of stingless bee honey is derived. The result presents the comparison and analysis on the internet of things (IoT) monitoring systems, honey production estimation, convolution neural networks (CNNs), and automatic identification methods on bee species. It is identified based on image detection method the top best three efficiency presents CNN is at 98.67%, densely connected convolutional networks with YOLO v3 is 97.7%, and DenseNet201 convolutional networks 99.81%. This study is significant to assist the researcher in developing a model for predicting stingless honey produced by bee's output, which is important for a stable economy and food security.
A trust based secure access control using authentication mechanism for intero...IJECEIAES
The internet of things (IoT) is a revolutionary innovation in many aspects of our society including interactions, financial activity, and global security such as the military and battlefield internet. Due to the limited energy and processing capacity of network devices, security, energy consumption, compatibility, and device heterogeneity are the long-term IoT problems. As a result, energy and security are critical for data transmission across edge and IoT networks. Existing IoT interoperability techniques need more computation time, have unreliable authentication mechanisms that break easily, lose data easily, and have low confidentiality. In this paper, a key agreement protocol-based authentication mechanism for IoT devices is offered as a solution to this issue. This system makes use of information exchange, which must be secured to prevent access by unauthorized users. Using a compact contiki/cooja simulator, the performance and design of the suggested framework are validated. The simulation findings are evaluated based on detection of malicious nodes after 60 minutes of simulation. The suggested trust method, which is based on privacy access control, reduced packet loss ratio to 0.32%, consumed 0.39% power, and had the greatest average residual energy of 0.99 mJoules at 10 nodes.
Fuzzy linear programming with the intuitionistic polygonal fuzzy numbersIJECEIAES
In real world applications, data are subject to ambiguity due to several factors; fuzzy sets and fuzzy numbers propose a great tool to model such ambiguity. In case of hesitation, the complement of a membership value in fuzzy numbers can be different from the non-membership value, in which case we can model using intuitionistic fuzzy numbers as they provide flexibility by defining both a membership and a non-membership functions. In this article, we consider the intuitionistic fuzzy linear programming problem with intuitionistic polygonal fuzzy numbers, which is a generalization of the previous polygonal fuzzy numbers found in the literature. We present a modification of the simplex method that can be used to solve any general intuitionistic fuzzy linear programming problem after approximating the problem by an intuitionistic polygonal fuzzy number with n edges. This method is given in a simple tableau formulation, and then applied on numerical examples for clarity.
The performance of artificial intelligence in prostate magnetic resonance im...IJECEIAES
Prostate cancer is the predominant form of cancer observed in men worldwide. The application of magnetic resonance imaging (MRI) as a guidance tool for conducting biopsies has been established as a reliable and well-established approach in the diagnosis of prostate cancer. The diagnostic performance of MRI-guided prostate cancer diagnosis exhibits significant heterogeneity due to the intricate and multi-step nature of the diagnostic pathway. The development of artificial intelligence (AI) models, specifically through the utilization of machine learning techniques such as deep learning, is assuming an increasingly significant role in the field of radiology. In the realm of prostate MRI, a considerable body of literature has been dedicated to the development of various AI algorithms. These algorithms have been specifically designed for tasks such as prostate segmentation, lesion identification, and classification. The overarching objective of these endeavors is to enhance diagnostic performance and foster greater agreement among different observers within MRI scans for the prostate. This review article aims to provide a concise overview of the application of AI in the field of radiology, with a specific focus on its utilization in prostate MRI.
Seizure stage detection of epileptic seizure using convolutional neural networksIJECEIAES
According to the World Health Organization (WHO), seventy million individuals worldwide suffer from epilepsy, a neurological disorder. While electroencephalography (EEG) is crucial for diagnosing epilepsy and monitoring the brain activity of epilepsy patients, it requires a specialist to examine all EEG recordings to find epileptic behavior. This procedure needs an experienced doctor, and a precise epilepsy diagnosis is crucial for appropriate treatment. To identify epileptic seizures, this study employed a convolutional neural network (CNN) based on raw scalp EEG signals to discriminate between preictal, ictal, postictal, and interictal segments. The possibility of these characteristics is explored by examining how well timedomain signals work in the detection of epileptic signals using intracranial Freiburg Hospital (FH), scalp Children's Hospital Boston-Massachusetts Institute of Technology (CHB-MIT) databases, and Temple University Hospital (TUH) EEG. To test the viability of this approach, two types of experiments were carried out. Firstly, binary class classification (preictal, ictal, postictal each versus interictal) and four-class classification (interictal versus preictal versus ictal versus postictal). The average accuracy for stage detection using CHB-MIT database was 84.4%, while the Freiburg database's time-domain signals had an accuracy of 79.7% and the highest accuracy of 94.02% for classification in the TUH EEG database when comparing interictal stage to preictal stage.
Analysis of driving style using self-organizing maps to analyze driver behaviorIJECEIAES
Modern life is strongly associated with the use of cars, but the increase in acceleration speeds and their maneuverability leads to a dangerous driving style for some drivers. In these conditions, the development of a method that allows you to track the behavior of the driver is relevant. The article provides an overview of existing methods and models for assessing the functioning of motor vehicles and driver behavior. Based on this, a combined algorithm for recognizing driving style is proposed. To do this, a set of input data was formed, including 20 descriptive features: About the environment, the driver's behavior and the characteristics of the functioning of the car, collected using OBD II. The generated data set is sent to the Kohonen network, where clustering is performed according to driving style and degree of danger. Getting the driving characteristics into a particular cluster allows you to switch to the private indicators of an individual driver and considering individual driving characteristics. The application of the method allows you to identify potentially dangerous driving styles that can prevent accidents.
Hyperspectral object classification using hybrid spectral-spatial fusion and ...IJECEIAES
Because of its spectral-spatial and temporal resolution of greater areas, hyperspectral imaging (HSI) has found widespread application in the field of object classification. The HSI is typically used to accurately determine an object's physical characteristics as well as to locate related objects with appropriate spectral fingerprints. As a result, the HSI has been extensively applied to object identification in several fields, including surveillance, agricultural monitoring, environmental research, and precision agriculture. However, because of their enormous size, objects require a lot of time to classify; for this reason, both spectral and spatial feature fusion have been completed. The existing classification strategy leads to increased misclassification, and the feature fusion method is unable to preserve semantic object inherent features; This study addresses the research difficulties by introducing a hybrid spectral-spatial fusion (HSSF) technique to minimize feature size while maintaining object intrinsic qualities; Lastly, a soft-margins kernel is proposed for multi-layer deep support vector machine (MLDSVM) to reduce misclassification. The standard Indian pines dataset is used for the experiment, and the outcome demonstrates that the HSSF-MLDSVM model performs substantially better in terms of accuracy and Kappa coefficient.
Saudi Arabia stands as a titan in the global energy landscape, renowned for its abundant oil and gas resources. It's the largest exporter of petroleum and holds some of the world's most significant reserves. Let's delve into the top 10 oil and gas projects shaping Saudi Arabia's energy future in 2024.
Final project report on grocery store management system..pdfKamal Acharya
In today’s fast-changing business environment, it’s extremely important to be able to respond to client needs in the most effective and timely manner. If your customers wish to see your business online and have instant access to your products or services.
Online Grocery Store is an e-commerce website, which retails various grocery products. This project allows viewing various products available enables registered users to purchase desired products instantly using Paytm, UPI payment processor (Instant Pay) and also can place order by using Cash on Delivery (Pay Later) option. This project provides an easy access to Administrators and Managers to view orders placed using Pay Later and Instant Pay options.
In order to develop an e-commerce website, a number of Technologies must be studied and understood. These include multi-tiered architecture, server and client-side scripting techniques, implementation technologies, programming language (such as PHP, HTML, CSS, JavaScript) and MySQL relational databases. This is a project with the objective to develop a basic website where a consumer is provided with a shopping cart website and also to know about the technologies used to develop such a website.
This document will discuss each of the underlying technologies to create and implement an e- commerce website.
Automobile Management System Project Report.pdfKamal Acharya
The proposed project is developed to manage the automobile in the automobile dealer company. The main module in this project is login, automobile management, customer management, sales, complaints and reports. The first module is the login. The automobile showroom owner should login to the project for usage. The username and password are verified and if it is correct, next form opens. If the username and password are not correct, it shows the error message.
When a customer search for a automobile, if the automobile is available, they will be taken to a page that shows the details of the automobile including automobile name, automobile ID, quantity, price etc. “Automobile Management System” is useful for maintaining automobiles, customers effectively and hence helps for establishing good relation between customer and automobile organization. It contains various customized modules for effectively maintaining automobiles and stock information accurately and safely.
When the automobile is sold to the customer, stock will be reduced automatically. When a new purchase is made, stock will be increased automatically. While selecting automobiles for sale, the proposed software will automatically check for total number of available stock of that particular item, if the total stock of that particular item is less than 5, software will notify the user to purchase the particular item.
Also when the user tries to sale items which are not in stock, the system will prompt the user that the stock is not enough. Customers of this system can search for a automobile; can purchase a automobile easily by selecting fast. On the other hand the stock of automobiles can be maintained perfectly by the automobile shop manager overcoming the drawbacks of existing system.
Quality defects in TMT Bars, Possible causes and Potential Solutions.PrashantGoswami42
Maintaining high-quality standards in the production of TMT bars is crucial for ensuring structural integrity in construction. Addressing common defects through careful monitoring, standardized processes, and advanced technology can significantly improve the quality of TMT bars. Continuous training and adherence to quality control measures will also play a pivotal role in minimizing these defects.
Event Management System Vb Net Project Report.pdfKamal Acharya
In present era, the scopes of information technology growing with a very fast .We do not see any are untouched from this industry. The scope of information technology has become wider includes: Business and industry. Household Business, Communication, Education, Entertainment, Science, Medicine, Engineering, Distance Learning, Weather Forecasting. Carrier Searching and so on.
My project named “Event Management System” is software that store and maintained all events coordinated in college. It also helpful to print related reports. My project will help to record the events coordinated by faculties with their Name, Event subject, date & details in an efficient & effective ways.
In my system we have to make a system by which a user can record all events coordinated by a particular faculty. In our proposed system some more featured are added which differs it from the existing system such as security.
Democratizing Fuzzing at Scale by Abhishek Aryaabh.arya
Presented at NUS: Fuzzing and Software Security Summer School 2024
This keynote talks about the democratization of fuzzing at scale, highlighting the collaboration between open source communities, academia, and industry to advance the field of fuzzing. It delves into the history of fuzzing, the development of scalable fuzzing platforms, and the empowerment of community-driven research. The talk will further discuss recent advancements leveraging AI/ML and offer insights into the future evolution of the fuzzing landscape.
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)MdTanvirMahtab2
This presentation is about the working procedure of Shahjalal Fertilizer Company Limited (SFCL). A Govt. owned Company of Bangladesh Chemical Industries Corporation under Ministry of Industries.
Halogenation process of chemical process industries
Text pre-processing of multilingual for sentiment analysis based on social network data
1. International Journal of Electrical and Computer Engineering (IJECE)
Vol. 12, No. 1, February 2022, pp. 776~784
ISSN: 2088-8708, DOI: 10.11591/ijece.v12i1.pp776-784 776
Journal homepage: http://ijece.iaescore.com
Text pre-processing of multilingual for sentiment analysis based
on social network data
Neha Garg, Kamlesh Sharma
Department of Computer Science and Engineering, Manav Rachna International Institute of Research and Studies, Faridabad, India
Article Info ABSTRACT
Article history:
Received Mar 25, 2021
Revised Jul 15, 2021
Accepted Aug 1, 2022
Sentiment analysis (SA) is an enduring area for research especially in the
field of text analysis. Text pre-processing is an important aspect to perform
SA accurately. This paper presents a text processing model for SA, using
natural language processing techniques for twitter data. The basic phases for
machine learning are text collection, text cleaning, pre-processing, feature
extractions in a text and then categorize the data according to the SA
techniques. Keeping the focus on twitter data, the data is extracted in domain
specific manner. In data cleaning phase, noisy data, missing data,
punctuation, tags and emoticons have been considered. For pre-processing,
tokenization is performed which is followed by stop word removal (SWR).
The proposed article provides an insight of the techniques, that are used for
text pre-processing, the impact of their presence on the dataset. The accuracy
of classification techniques has been improved after applying text pre-
processing and dimensionality has been reduced. The proposed corpus can
be utilized in the area of market analysis, customer behaviour, polling
analysis, and brand monitoring. The text pre-processing process can serve as
the baseline to apply predictive analysis, machine learning and deep learning
algorithms which can be extended according to problem definition.
Keywords:
Code-switch
Linguistic-switching
Machine learning
Multilingual
Pre-processing
Sentiment analysis
This is an open access article under the CC BY-SA license.
Corresponding Author:
Neha Garg
Department of Computer Science and Engineering, Manav Rachna International Institute of Research and
Studies
Sector-43, Delhi Surajkund Road, Faridabad, Haryana, 121005, India
Email: gargsneha99@gmail.com
1. INTRODUCTION
Sentiment analysis (SA) is a field of natural language processing (NLP) for analyzing the opinion,
expression and attitude of the user towards an entity which can be an individual, a place, an event, product,
and issue or a discussion. The markets, firms and organizations utilize sentiment analysis to attract and
satisfy more customers. Political parties of different countries use the SA data to achieve the satisfaction of
the citizens or to know the feedback of its people about its administration and policies. SA offers more
challenging opportunities to develop new applications where sentiments of customers, viewers or users play a
vital role. The high level of precision of any decision can bring an organization at the top position among its
competitors and can make any organization fall from the roof [1] with the advanced technology and increasing
the use of gadgets, sentiments are not only obtained from orthodox feedbacks only but may also be in the form of
audios, videos, images, texts, micro-blogs, tweets, posts and comments on social media or emoticons.
Analyzing such heterogeneous data in real- time with appreciable level of precision has always been
a challenge. Most of the organizations use some third-party tool such as WorkForce, Glassdoor or sometimes
make a separate cell to analyze this data which provide them genuine sentiment analysis followed by an
accurate decision making. Inaccurate SA or poor level of accuracy in SA may prove to be disastrous for any
2. Int J Elec & Comp Eng ISSN: 2088-8708
Text pre-processing of multilingual for sentiment analysis based on social network data (Neha Garg)
777
organization in present time of cut throat market competition. A lot of improvement in SA and its
applications has been made in recent past.
2. LITERATURE REVIEW
More often the information available on social media sites is not well formed in structure like it has
abbreviations, miss-spelling, emoticons, slang language, and html tag which turn into a difficult task for
sentiment analysis [2], [3]. This type of data increases the dimensionality which may leads to bad
performance of sentiment analysis (SA) techniques [4]. To reduce the dimensionality, before applying SA
technique, the data has been passed through two phases i.e., data cleaning and pre-processing. The accuracy
of SA techniques majorly depends on these phases.
Many techniques have been used for data classification and data clustering. It is found that in most
of the cases, hybrid methods are used for classification and clustering based on the problem at hand [5].
Classification techniques are the sub-category of supervised techniques where some kind of labelled data is
used to train the dataset based on which outcome is provided. The degree of accuracy is highly dependent on
the accuracy of training data [6]. A semi-supervised technique, regularized least squares (RLS), was
introduced to represent unlabeled and labelled data on bipartite graph representation to analyze the sentiment
of documents as well as of words [7]. The blogs considering enterprise software products, politics and movie
reviews were considered and the technique produced 90% accuracy. Using supervised techniques such as support
vector machine (SVM), multinomial Naïve Bayes (MNB) and maximum entropy (Max Ent), Erik and Francine
[8] worked on multilingual unformatted dataset of English, Dutch and French languages and achieved 58%
accuracy. In this technique, the implementation of automated analysis provided impending benefits such as
word-of-mouth marketing, real-time response and neutral fetching of information. The smallness of dataset
did not help much in managing noise. Term-document matrix was employed for tri-factorization by making
some simple updates in rules where sentiment lexicons were used as the first set of constraints. Second set of
constraints retained domain specific supervisions [9].
Blogs having four different dimensions of discussion i.e., software product, politics, amazon product
reviews and movie reviews were worked upon. In a new approach rule base classification and machine
learning approaches were coupled together and 50% accuracy was achieved [2]. A compact semi-supervised
classifier was introduced in which classifiers were assigned according to the type of text. In this pipeline
approach, 10-fold cross validation was performed which resulted in higher efficiency but consumed more time.
Li et al. [10] had utilized SVM and NB techniques for reviews of Books, DVD, kitchen appliances
and electronic items. The data was split into two categories, personal and impersonal text as co-training data.
This, along with improving baseline accuracy [11], reduced classification noises and needed no proper
syntactical rules. Lack of labelled data was treated well by dividing imbalance population into multiple sets
of balanced population for sentiment classification and multiple iteration improved performance [12]. A new
approach was provided for active-learning in multi-domain framework. The term frequency method was used
to weigh the features along with LIBLINEAR SVM. This method was compared with spam mail filtering,
newsgroup classification and sentiment classification where human efforts were reduced by 33.2%, 42.9%
and 68.7% respectively.
To classify sentiments of micro blogs, a method was proposed in which machine learning was
combined with domain specific techniques and a system called opinion miner was introduced [13]. The
precision of opinion miner stood at 96%. Lack of stop-criteria to control iteration created unnecessary data
sampling. Numerical matrix representation was used for movie reviews and positive or negative reviews
were obtained with 89.5% accuracy with NB. SVM enhanced accuracy to 94% [1]. Sentiment analysis was
performed on reviews [14]. The performance of unigram with stop word settled at 82.9% and that of without
stop words came 83% with positive class. The same was higher for negative class.
Based on deep learning parameters, a model was proposed to address implicit and explicit sentiment
factors on text data and used word embedded representation in Vietnamese and English language [15]. The
proposed model proved to be better than traditional machine learning methods and provided results up to
87% of sentiment analysis in all available corpora.
Clustering techniques are used to cluster the data based on different parameters and to form groups
as per the requirements for business analytics for better decision making [16]. A techniques using k-means
clustering was proposed to cluster the document in combination with scoring technique [17]. The movie
review dataset was examined with 77.17% accuracy. Spam filtering technique was developed which was
based on the vector space model by using text clustering k-means and balanced iterative reducing and clustering
using hierarchies (BIRCH) technique [18]. K-means clustering provided better results for smaller data. k nearest
neighbours (k-NN) and BIRCH were shown to be good for larger datasets.
Venkatasubramanian et al. [19] proposed that without employing syntactic processing, stop words
could be used for classification. A semantic clustering algorithm, latent dirichlet allocation (LDA), was
3. ISSN: 2088-8708
Int J Elec & Comp Eng, Vol. 12, No. 1, February 2022: 776-784
778
applied and 66% for the polarity of reviews was observed. This method proved to be complementary for the
syntactic approach for sentiment analysis. For product reviews, a semi-supervised technique was given where
words and phrases belonging to similar domain were grouped under same feature set [20]. The expectation
maximization (EM) algorithm based on naïve Bayes was applied on five datasets and results were found to
be superior to the 13 baselines which presented current state of art solution. The authors have presented a
contextual multimodal method to analyze visual, textual and audio cues of approx 800 utterances for Persian
language and achieve a performance of around 91% [21].
An approach was suggested which was based on clustering with term frequency – inverse document
frequency (TF-IDF) weighing technique, voting mechanism and important term score [22]. This approach
was shown to be efficient, automated, accurate and faster than other supervised learning techniques. Further
modifications were introduced in clustering technique which worked without prior knowledge of training
dataset, human intervention and linguistic knowledge [23]. This automated method increased the accuracy of
baseline up to 76%.
Suresh and Raj [24] presented an aspect level method to find the sentiment of a particular brand
using twitter feed with the help of novel fizzy clustering and obtained accuracy of 76.4% with faster
execution. Combinational effects of clustering were shown along with sentiment analysis on review datasets
[25]. It was found that the K-means clustering algorithm provided better results than the balanced review
datasets. The newly designed weighing system was shown to be better than traditional ones. Sentiments of
movie reviews were analyzed by applying Word2Vec algorithm and K-means++ algorithm [26]. It was
argued that this approach could be used for sarcasm and question detection with additional modifications.
A comparative study for sentiment analysis approach adopted by researchers has been shown in
Table 1 by utilizing domain information and the languages for which the proposed work has been done is
discussed. To extend the understanding, the stages of pre-processing has been taken into account followed by
the machine learning techniques for analysis purpose and the accuracy that has been achieved so far.
Table 1. Comparative analysis of approaches adopted by researchers
Ref Domain Language Preprocessing Classification and
clustering Algorithm
Limitations Result
[27] Labeled product
review of four
domains: books,
DVDs, kitchen
appliances,
Electronics
English 1 Gram, 2
Gram, 1+2
Gram and 1
Gram +2 Gram
approach
Classifier level fusion
and feature approach
The classifier level fusion
faced unbalanced
performance for multi-
domain data.
80%
[8] Blogs, forum text,
review related to
products
English,
Dutch,
French
Unigram, Bi
Subjectivity,
Bigram
Cascade learner
classifier
Lack of training data,
Conflicts sentiments,
lack of pattern detections
Approx
70%
[28] Tweets which
contain some
noise, associated to
the mobile
operators
English Normalized
words
Annotated ensembles Word with different
spelling and
representation cannot be
mapped
47% F-
score
[26] Movie review English Word2Vec K-means/K-means++ Couldn’t enhance
accuracy of baseline
-
[29] Tweets Spanish,
English
Unigram,
Bigram
Cascade Classifiers Normalization techniques
and Slang dictionary can
be include for Spanish
69%
approx
[30] Tweets on Indian
political parties
Hindi Hashtag, URL,
Stopwords
removal,
negation
handling
Dictionary based,
naive Bayes and SVM
algorithm
Limited data size,
Emoticons are not include
in data set
Only text data included
78%
[31] Travel destination
reviews
Hindi,
Marathi
POS SVM missing concepts for Marathi
language was there, by
considering these the
accuracy can be enhanced
72%
[32] Social media text Hinglish Lowering Case,
Lemmatization,
Multiword
Grouping
Convolution neural
network (CNN),
long short term
frequency (LSTM),
convolution neural
network- bidirectional
LSTM (CNN-
BiLSTM)
Bilingual model required 83%
4. Int J Elec & Comp Eng ISSN: 2088-8708
Text pre-processing of multilingual for sentiment analysis based on social network data (Neha Garg)
779
Researchers have worked on sentiment analysis in English as well as in native languages like
Chinese, Spanish, Marathi, and Tamil. The work has been done in bilingual as well as multilingual like Hindi
and Hindi-English combination, Tamilish, and English+French. There are many such combinations used in
other pair of languages. However, combination of more than two languages has not been worked upon
extensively. Due to the difficulties encountered for various reasons for example ambiguous words,
inconsistent spelling, and part of speech and bag of words, sentiment analysis for the combination of Hindi-
Hinglish-English languages have been missing from.
The researcher has mainly extended their hands on social network data, blogs, product reviews, and
political reviews and subjective tweets. After the data collected by user from multiple sources, the pre-
processing of data is done with the help of commonly used pre-processing techniques. As far as pre-
processing steps are concern the most commonly used techniques are stop-word removal, lemmatization,
stemming, lowering cases, part of speech, and normalization, which has shown the great effect on the
accuracy of machine learning models.
The processed data is inputted to machine learning models for sentiment analysis purpose. For the
classification and clustering methods, researchers have utilized from the simplest algorithms like naïve
Bayes, SVM, K-means to the cascade classifiers and neural network techniques too. These techniques are
mostly restricted themselves to the monolingual to bilingual concept, has taken into account the text data
only and are restricted themselves with a small training dataset. Since the accuracy of these techniques is
restricted to the range of 70-80%. So, the more chances are lies to enhance the accuracy level. Present paper
is an attempt to highlight dataset creation from the tweets fetched from the twitter based on the hashtags. The
pre-processing techniques utilized by researchers, the impact of these techniques and the comparative
analysis of them are discussed in Table 1 (see in Appendix).
3. METHOD
To predict the human behaviour against an organization or entity or product, the sentiment analysis
techniques play a vital role. In today’s world where word-of-mouth, customer feedback, reviews and
opinions have become major issues, sentiment analysis (SA) and opinion mining are the two techniques
being used invariably [33], [34]. Subjective extraction of opinion related to an entity falls under opinion
mining whereas is SA complete text analysis is performed [35]. SA represents sentiment identification in a
text then followed by its analysis. The accuracy of decision making lies in the accuracy of sentiment analysis.
The complete procedure of SA of multiple events, parallel running sub-event on social media (SM) and their
influence on behaviour, reaction and even on thoughts of people have been discussed in [36]. The generalized
process of performing SA from social networking data is as follows:
3.1. The opinion of users
The first step of SA is to retrieve the information (opinionative words, and phrases) from the huge
amount of data available and to store this information in the required format [37]. Keeping main concern on
finding influence of events and sub-events, data has been picked from twitter using the hashtag information
to find events, user mentions and retweets to find the sub-event count. The information in the similar way is
extracted and represented [38]. Figure 1 representing the word cloud for the extracted hashtags.
Figure 1. Wordcloud for extracted
5. ISSN: 2088-8708
Int J Elec & Comp Eng, Vol. 12, No. 1, February 2022: 776-784
780
3.2. Data cleaning
Data cleaning process is to make the text noise-free. The noise in the data is in the form of missing
words or unwanted words. These missing or unwanted words are present in the disguise of some symbols
which are not handled by the code. Data cleaning comprises information filtering, removal of stop-
words/punctuation and tokenization process [36]. The Figure 2 given below provides the glimpse of the
format of data, which is collected from twitter, is being cleaned after applying the clean text function. After
the cleaning process data is converted into lowercase, tokenized and stopword have been removed and Figure 3
represents the process.
Figure 2. Clean text hashtags
3.3. Data pre-processing
Data pre-processing consists of tokenization, part of speech, normalization, lemmatization and
stemming of words where network among words is established [39]. Through this, all the relevant events and
sub-events are mapped onto connecting words for example: -thanking you, thank you can be mapped onto
thank. Here the stemming algorithm is applied on clean text. Figure 3 shows the stemmed dataset. The
Algorithm 1 has been introduced to represent the overall procedure used for text processing.
Figure 3. Tokenized and stemmed data
6. Int J Elec & Comp Eng ISSN: 2088-8708
Text pre-processing of multilingual for sentiment analysis based on social network data (Neha Garg)
781
Algorithm 1. Text preprocessing algorithm
for each tweet in Document do
perform tokenization by splitting the text
ignoring ‘.‘ & ‘;’
end for
for each token in Document do
tweet.remove_stopwords(english)
tweet.remove_punctuations
tweet.remove_ colon-symbol
tweet.replace_non_ASCII char with space
tweet.remove_emoticons.
end for
for each remaining word in dataset do
perform stemming using stemmer and store in Vector (Word List)
end for
3.4. Feature extraction
The feature extraction involves vectorization, bag of words, TF-IDF, N-Gram and word embedment
techniques. The feature extraction maps data in vector space. In this phase, different hashtags which have
similar meaning or which signify similar events are mapped together. For example, hashtags #chandrayan,
#Chandrayan1, #ISRO, #IndiaFails signifies the same event ‘chandrayan’ and is clubbed together. Similarly,
the tweets which have user-mentions similar to the events have also been clubbed together and considered as
sub events.
3.5. Choosing machine learning techniques
The machine learning approaches based on problem definition are applied to find more accurate
results. As per the discussion above, SVM and naïve Bayes classification techniques are mainly used when
users have some predefined rules which are needed to be followed. For clustering techniques, K-means and
Fuzzy logic algorithms are mainly employed due to their simplicity and accuracy [40]. However, some
people have also used semi-supervised and hybrid techniques as well [20], [26].
3.6. Output for predictive level of sentiment analysis
Base on the predictive polarity levels, the polarity of the text is calculated [41]. Thereafter, on this
calculated level of the polarity, sentiment is fixed as negative sentiment, positive sentiment or a neutral one.
The process of performing sentiment analysis and then finding the polarity can be summarized as shown in
Figure 4.
Figure 4. Method for sentiment analysis
4. RESULT
The results after applying each of pre-processing techniques namely tokenization, stopword removal
and stemming have been shown in the Table 2 and plotted against the dataset size in Figure 5. Different
values of dataset have been taken for testing the behaviour based on pre-processing stages. It has been
7. ISSN: 2088-8708
Int J Elec & Comp Eng, Vol. 12, No. 1, February 2022: 776-784
782
demonstrated that stopword removal and stemming are the compulsory parts for pre-processing. It has also
been shown that the stopword removal has reduced the dimensionality of the text handsomely. From the
Figure 5, it has been concluded that the application of pre-processing techniques has a positive impact on the
number of terms selected. The results represent that negligible difference is shown in terms of numbers
selected by stemming.
Table 2. Statistics for preprocessing
Dataset (Tweets) Tokenization Stop-word removal Stemming
2,000 12,000 10,000 9,000
5,000 30,000 22,000 20,000
9,000 53,000 45,000 42,000
2,000 70,000 57,000 55,000
Figure 5. Effect of preprocessing
5. CONCLUSION
The present article discusses the supervised and unsupervised SA techniques. In this paper, the basic
techniques of data extraction followed by the data cleaning and data pre-processing techniques have been
presented. Three basic techniques for pre-processing i.e. tokenization, stopword removal and stemming have
been introduced on twitter dataset. From the results, it can be concluded that pre-processing bears a huge
impact to reduce the dimensionality of data which in-turn results in a high performing and more accurate SA
techniques. The results prove that the stopword removal technique removes unnecessary words from the
dataset and thereby improving accuracy. The same technique may be applied to the different dataset
belonging to different domain. One can improve upon the list of stopword as per the domain and achieve
better accuracy.
REFERENCES
[1] A. Tripathy, A. Agrawal, and S. K. Rath, “Classification of sentimental reviews using machine learning techniques,” Procedia
Computer Science, vol. 57, pp. 821-829, 2015, doi: 10.1016/j.procs.2015.07.523.
[2] T. Baldwin, P. Cook, M. Lui, A. Mackinlay, and L. Wang, “How noisy social media text, how diffrnt social media sources ?,”
Proceedings of the Sixth International Joint Conference on Natural Language Processing, 2013, pp. 356-364.
[3] A. Sarker and G. Gonzalez, “Data, tools and resources for mining social media drug chatter,” Proceedings of the Fifth Workshop
on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM2016), 2016, pp. 99-107.
[4] C. S. P. Kumar and L. D. D. Babu, “Novel text preprocessing framework for sentiment analysis,” Smart Innovation, Systems and
Technologies, vol. 105, 2019, pp. 309-317, doi: 10.1007/978-981-13-1927-3_33.
[5] G. Neha, “A review on the study of big data and big data analytics,” 3rd International Conference on Computers and
Management (ICCM 2017), 2017, pp. 276-285.
[6] F. Hemmatian and M. K. Sohrabi, “A survey on classification techniques for opinion mining and sentiment analysis,” Artificial
Intelligence Review, vol. 52, pp. 1495-1545, 2019, doi: 10.1007/s10462-017-9599-6.
[7] V. Sindhwani and P. Melville, “Document-word co-regularization for semi-supervised sentiment analysis,” 2008 Eighth IEEE
International Conference on Data Mining, 2008, pp. 1025-1030, doi: 10.1109/ICDM.2008.113.
8. Int J Elec & Comp Eng ISSN: 2088-8708
Text pre-processing of multilingual for sentiment analysis based on social network data (Neha Garg)
783
[8] B. Erik and M. M. Francine, “A machine learning approach to sentiment analysis in multilingual web text,” Information Retrieval,
vol. 12, no. 5, pp. 526-558, 2008, doi: 10.1007/s10791-008-9070-z.
[9] J. Blitzer, “Biographies, bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification,” Proceedings of
the 45th Annual Meeting of the Association of Computational Linguistics, 2006, pp. 440-447.
[10] S. Li, C. R. Huang, G. Zhou, and S. Y. M. Lee, “Employing personal/impersonal views in supervised and semi-supervised
sentiment classification,” Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, 2010,
pp. 414-423.
[11] X. Zhu, “Semi-supervised learning literature survey contents,” Sci. York, vol. 10, no. 1530, pp. 10, 2008, doi: 10.1.1.146.2352.
[12] S. Li, Z. Wang, G. Zhou, and S. Y. M. Lee, “Semi-supervised learning for imbalanced sentiment classification,” IJCAI
International Joint Conference on Artificial Intelligence, 2011, pp. 1826-1831, doi: 10.5591/978-1-57735-516-8/IJCAI11-306.
[13] D. B. Liang PW, “Opinion mining on social media data,” 2013 IEEE 14th International Conference on Mobile Data
Management, 2013, pp. 91-96, doi: 10.1109/MDM.2013.73.
[14] P. H. Shahana and B. Omman, “Evaluation of features on sentimental analysis,” Procedia Computer Science, vol. 46,
pp. 1585-1592, 2015, doi: 10.1016/j.procs.2015.02.088.
[15] T. K. Tran and T. T. Phan, “Deep learning application to ensemble learning-the simple, but effective, approach to sentiment
classifying,” Applied Sciences, vol. 9, no. 13, 2019, doi: 10.3390/app9132760.
[16] F. Musumeci et al., “An overview on application of machine learning techniques in optical networks,” IEEE Communications
Surveys & Tutorials, vol. 21, no. 2, pp. 1383-1408, 2018, doi: 10.1109/COMST.2018.2880039.
[17] G. Li and F. Liu, “A clustering-based approach on sentiment analysis,” 2010 IEEE International Conference on Intelligent
Systems and Knowledge Engineering, 2010, pp. 331-337, doi: 10.1109/ISKE.2010.5680859.
[18] M. Basavaraju and D. R. Prabhakar, “A novel method of spam mail detection using text based clustering approach,” International
Journal of Computer Applications, vol. 5, no. 4, pp. 15-25, 2010, doi: 10.5120/906-1283.
[19] S. Venkatasubramanian, A. Veilumuthu, A. Krishnamurthy, C. E. V. Madhavan, K. Nath, and S. Arvindam, “A non-syntactic
approach for text sentiment classification with stopwords,” Proceedings of the 20th International Conference Companion on
World Wide Web, 2011, pp. 137-138, doi: 10.1145/1963192.1963262.
[20] Z. Zhai, B. Liu, H. Xu, and P. Jia, “Clustering product features for opinion mining,” WSDM '11: Proceedings of the fourth ACM
international conference on Web search and data mining, 2011, pp. 347-354, doi: 10.1145/1935826.1935884.
[21] K. Dashtipour, M. Gogate, E. Cambria, and A. Hussain, “A novel context-aware multimodal framework for persian sentiment
analysis,” Neurocomputing, vol. 457, pp. 377-388, 2021, doi: 10.1016/j.neucom.2021.02.020.
[22] G. Li and F. Liu, “Application of a clustering method on sentiment analysis,” Journal of Information Science, vol. 38, no. 2,
pp. 127-139, 2012, doi: 10.1177/0165551511432670.
[23] G. Li and F. Liu, “Sentiment analysis based on clustering: A framework in improving accuracy and recognizing neutral opinions,”
Applied Intelligence volume, vol. 40, no. 3, pp. 441-452, 2014, doi: 10.1007/s10489-013-0463-3.
[24] H. Suresh and S. G. Raj, “An unsupervised fuzzy clustering method for twitter sentiment analysis,” 2016 International
Conference on Computation System and Information Technology for Sustainable Solutions (CSITSS), 2016, pp. 80-85, doi:
10.1109/CSITSS.2016.7779444.
[25] B. Ma, H. Yuan, and Y. Wu, “Exploring performance of clustering methods on document sentiment analysis,”Journal of
Information Science, vol. 43, no. 1, pp. 54-74, 2017, doi: 10.1177/0165551515617374.
[26] K. Chakraborty, S. Bhattacharyya, R. Bag, and A. E. Hassanien, “Comparative sentiment analysis on a set of movie reviews using
deep learning approach,” International Conference on Advanced Machine Learning Technologies and Applications, vol. 723,
2018, pp. 311-318, doi: 10.1007/978-3-319-74690-6_31.
[27] S. Li and C. Zong, “Multi-domain sentiment classification,” Proceedings of ACL-08: HLT, Short Papers, 2008, pp. 257-260,
doi: 10.3115/1557690.1557765.
[28] A. Celikyilmaz, D. Hakkani-Tür, and J. Feng, “Probabilistic model-based sentiment analysis of twitter messages,” 2010 IEEE
Spoken Language Technology Workshop, 2010, pp. 79-84, doi: 10.1109/SLT.2010.5700826.
[29] A. Balahur and J. M. P. Ortega, “Sentiment analysis system adaptation for multilingual processing: The case of tweets,”
Information Processing & Management, vol. 51, no. 4, pp. 547-556, 2015, doi: 10.1016/j.ipm.2014.10.004.
[30] P. Sharma and T. S. Moh, “Prediction of Indian election using sentiment analysis on Hindi Twitter,” 2016 IEEE International
Conference on Big Data (Big Data), 2016, pp. 1966-1971, doi: 10.1109/BigData.2016.7840818.
[31] Balamurali AR, A. Joshi, and P. Bhattacharyya, “Cross-lingual sentiment analysis for indian languages using linked WordNets,”
Proceedings of COLING 2012, vol. 1, 2012, pp. 73-82.
[32] T. T. Sasidhar, B. Premjith, and K. P. Soman, “Emotion detection in hinglish(hindi+english) code-mixed social media text,”
Procedia Computer Science, vol. 171, pp. 1346-1352, 2020, doi: 10.1016/j.procs.2020.04.144.
[33] N. Rizk, A. Ebada, and E. Nasr, “Investigating mobile application‘Requirements evolution through sentiment analysis of users’
reviews,” 2015 11th International Computer Engineering Conference (ICENCO), 2015, pp. 123-130, doi:
10.1109/ICENCO.2015.7416336.
[34] B. Liu, "Sentiment analysis and opinion mining," Morgan & amp, Claypool Publishers, 2012.
[35] M. Tsytsarau and T. Palpanas, “Survey on mining subjective data on the web,” Data Mining and Knowledge Discovery, vol. 24,
pp. 478-514, 2011, doi: 10.1007/s10618-011-0238-6.
[36] N. Garg and K. Sharma, “Machine learning in text analysis, in handbook of research on emerging trends and applications of
machine learning,” IGI Global, 2020, pp. 383-402, doi: 10.4018/978-1-5225-9643-1.
[37] C. L. Philip Chen and C.-Y. Zhang, “Data-intensive applications, challenges, techniques and technologies: A survey on Big
Data,” Information Sciences, vol. 275, pp. 314-347, 2014, doi: 10.1016/j.ins.2014.01.015.
[38] N. Garg and K. Sharma, “Sentiment analysis of events on social web,” International Journal of Innovative Technology and
Exploring Engineering (IJITEE), vol. 9, no. 4, pp. 1232-1238, 2020, doi: 10.35940/ijitee.f3946.049620.
[39] M. J. Denny and A. Spirling, “Text preprocessing for unsupervised learning: Why it matters, when it misleads, and what to do
about it,” Political Analysis, vol. 26, no. 2, pp. 168-189, 2018, doi: 10.1017/pan.2017.44.
[40] S. Yogesh, P. Bhatia, and O. Sangwan, “A review of studies on machine learning techniques,” International Journal of Computer
Science and Security, vol. 1, no. 1, pp. 70-84, 2007.
[41] B. Bringmann, M. Berlingerio, F. Bonchi, and A. Gionis, “Learning and predicting the evolution of social networks,” IEEE
Intelligent Systems, vol. 25, no. 4, pp. 26-35, 2010, doi: 10.1109/MIS.2010.91.
9. ISSN: 2088-8708
Int J Elec & Comp Eng, Vol. 12, No. 1, February 2022: 776-784
784
BIOGRAPHIES OF AUTHORS
Neha Garg is currently working as an Assisstant Professor, MRIIRS, Faridabad,
India (around 10 years teaching experience), M.Tech from Banasthali Vidyapith, Near
Niwai, Rajasthan and Ph.D. pursuing in computer science and engineering from MRIIRS,
Faridabad, India. She has recently published a book “Analysis and Design of Algorithms- a
Beginner Hope” with BPB Publication House. She has supervised and Guided research
projects of B.Tech She have published research papers and book chapter in field of big data,
and data mining. Her research interests are in the area of “Big Data Analytics” and “Machine
Learning”. She can be contacted at email: gargsneha99@gmail.com.
Kamlesh Sharma is currently working as an Associate Professor, MRIIRS,
Faridabad, India (more than 14 years teaching experience). MCA, M.Tech from MDU
University and Ph.D. in Computer Science and Engineering from Lingaya`s Vidyapeeth,
India. Is currently Supervising five Ph.D. scholars. She has also supervised and guided
research projects of M.Tech, B.Tech and application based projects for different
competitions. She is also associated with four Govt. research projects in field of health
recommender system, IOT, machine learning, AI and NLP. She has published more than 45
research papers in field of NLP, IOT, big data, green computing and data mining in reputed
Journal (Web of Science, Scopus, UGC, Elsevier) and Conferences (ACM, IEEE). Her
research area “Natural Language Processing” is based on innovative idea of reducing the
mechanized efforts and adapting the software to Hindi dialect. She can be contacted at email:
kamlesh.fet@mriu.edu.in.