In recent times, the trend of online shopping through e-commerce stores and websites has grown to a huge extent. Whenever a product is purchased on an e-commerce platform, people leave their reviews about the product. These reviews are very helpful for the store owners and the product’s manufacturers for the betterment of their work process as well as product quality. An automated system is proposed in this work that operates on two datasets D1 and D2 obtained from Amazon. After certain preprocessing steps, N-gram and word embedding-based features are extracted using term frequency-inverse document frequency (TF-IDF), bag of words (BoW) and global vectors (GloVe), and Word2vec, respectively. Four machine learning (ML) models support vector machines (SVM), logistic regression (RF), logistic regression (LR), multinomial Naïve Bayes (MNB), two deep learning (DL) models convolutional neural network (CNN), long-short term memory (LSTM), and standalone bidirectional encoder representations (BERT) are used to classify reviews as either positive or negative. The results obtained by the standard ML, DL models and BERT are evaluated using certain performance evaluation measures. BERT turns out to be the best-performing model in the case of D1 with an accuracy of 90% on features derived by word embedding models while the CNN provides the best accuracy of 97% upon word embedding features in the case of D2. The proposed model shows better overall performance on D2 as compared to D1.
Grow Your Telecom Business with AI-backed Knowledge Base SoftwareRounakpreetSingh
Telecoms are struggling to grow with rising competition. Thus, they need to standardize brand experience. Check out how a knowledge base software can help.
IMAGE CONTENT IN LOCATION-BASED SHOPPING RECOMMENDER SYSTEMS FOR MOBILE USERSacijjournal
This paper shows how image content can be used to realize a shopping recommender system for intuitively supporting mobile users in decision making. A mobile user equipped with a camera enabled smart phone combined with Global Positioning System (GPS) capabilities would benefit in using a recommender system for mobile users. This recommender system is queried by image sent by a smart phone together with the smart phone’s GPS coordinates then the system returns a recommended retail shop together with its GPS coordinates, the image similar to the query image and other items on special
offer. This recommender system shows a drastic reduction if not elimination of usage of text by mobile users using mobile devices when accessing the system. This paper presents the proposed recommender system and the simulated results of the recommender system. In summary the main contribution of this
paper is to show how image retrieval, image content and camera enabled smart mobile device with GPS capabilities can be used to realize a location-based shopping recommender system for mobile users.
Grow Your Telecom Business with AI-backed Knowledge Base SoftwareRounakpreetSingh
Telecoms are struggling to grow with rising competition. Thus, they need to standardize brand experience. Check out how a knowledge base software can help.
IMAGE CONTENT IN LOCATION-BASED SHOPPING RECOMMENDER SYSTEMS FOR MOBILE USERSacijjournal
This paper shows how image content can be used to realize a shopping recommender system for intuitively supporting mobile users in decision making. A mobile user equipped with a camera enabled smart phone combined with Global Positioning System (GPS) capabilities would benefit in using a recommender system for mobile users. This recommender system is queried by image sent by a smart phone together with the smart phone’s GPS coordinates then the system returns a recommended retail shop together with its GPS coordinates, the image similar to the query image and other items on special
offer. This recommender system shows a drastic reduction if not elimination of usage of text by mobile users using mobile devices when accessing the system. This paper presents the proposed recommender system and the simulated results of the recommender system. In summary the main contribution of this
paper is to show how image retrieval, image content and camera enabled smart mobile device with GPS capabilities can be used to realize a location-based shopping recommender system for mobile users.
Guardians of E-Commerce: Harnessing NLP and Machine Learning Approaches for A...IJAEMSJORNAL
In today’s e-commerce in Nigeria, customers access online stores to browse through and place orders for products or services via the internet on their devices while some are skeptical due to the experiences from what I ordered versus what I got syndrome. Though this method of business has flourished to an extent, it greatly faces a crucial challenge in unravelling consumer’s sentiments particularly in the realm of product reviews. This deficiency inhibits most e-commerce platforms in Nigeria from gaining effective sensitivity into users’ preferences, thus, limiting their ability to boost their product recommendations and, understand and improve customers’ experiences. This research aims to bridge this gap by developing a sentiment analyzer of product in the e-commerce domain using Natural language processing and machine learning approach. The model will analyze the customers’ reviews based on positive or negative. The experimental data was collected from kaggle.com. Stemming and lemmatization were approaches used for cleaning the collected data. Features were extracted and transformed using CountVectorizer. Gaussian Naïve Bayes classifier was used as the machine learning technique. The model’s performance was evaluated and it returned 90% of accuracy, hence, an efficient and reliable model for product review sentiment analysis is developed.
As consumers’ service expectations climb in the Digital Age, providers rely on mobile technology to work faster and smarter, by Don Talend, brand storytelling, content strategy and demand generation expert, field sales and aftermarket service industries
Adequate Solution for Business to Customer (B2C) by an ongoing Mobile SystemAM Publications
Today’s the era of mobile science and mobile secure technology in which business needs a solution that
impact on current scenario to their customers for making the easy purchase of the goods and items. Mobile system
based application have the design formation in the different developing languages which developed the systematic
arrangement of selling goods and particular as orders have been made by their premium customer, those could
consider as the adequate management system for business to customer. This solution would resolve by the design
views of application over mobile technology those helps for placing orders of goods as customer opted on the
application system.
The Digital Transformation Lifecycle contains a set of phases in which each phase is interconnected with the before phase. It gives a highly useful sequence of phases and instructions Digital Transformation performer, executant.
The era of the Mobile Enterprise is here to stay. Mobile penetration is unprecedented: subscribers are growing four times faster than the world’s population. To remain successful, CIOs must continuously investigate, prioritize, fund, adopt, and integrate multiple new technologies to support vital organizational objectives.
21st century has been defined by application of and advancement in information technology. Information technology has become an integral part of our daily life. According to Information Technology Association of America, information technology is defined as “the study, design, development, application, implementation, support or management of computer-based information systems.”
Information technology has served as a big change agent in different aspect of business and society. It has proven game changer in resolving economic and social issues.
Advancement and application of information technology are ever changing.
Increasing effectiveness, efficiency & mobility of field employees with wirel...JTOX
The development of mobile wireless devices provide great potential to support work processes of field employees and has an enormous impact on the development of other strategic applications for businesses. This presentation will investigate how both the mobile worker and stationary offices can benefit from using wireless applications to increase effectiveness and efficiency of the overall organization. Simultaneously contemporary conceptual frameworks will be presented
Design, simulation, and analysis of microstrip patch antenna for wireless app...TELKOMNIKA JOURNAL
In this study, a microstrip patch antenna that works at 3.6 GHz was built and tested to see how well it works. In this work, Rogers RT/Duroid 5880 has been used as the substrate material, with a dielectric permittivity of 2.2 and a thickness of 0.3451 mm; it serves as the base for the examined antenna. The computer simulation technology (CST) studio suite is utilized to show the recommended antenna design. The goal of this study was to get a more extensive transmission capacity, a lower voltage standing wave ratio (VSWR), and a lower return loss, but the main goal was to get a higher gain, directivity, and efficiency. After simulation, the return loss, gain, directivity, bandwidth, and efficiency of the supplied antenna are found to be -17.626 dB, 9.671 dBi, 9.924 dBi, 0.2 GHz, and 97.45%, respectively. Besides, the recreation uncovered that the transfer speed side-lobe level at phi was much better than those of the earlier works, at -28.8 dB, respectively. Thus, it makes a solid contender for remote innovation and more robust communication.
Design and simulation an optimal enhanced PI controller for congestion avoida...TELKOMNIKA JOURNAL
In this paper, snake optimization algorithm (SOA) is used to find the optimal gains of an enhanced controller for controlling congestion problem in computer networks. M-file and Simulink platform is adopted to evaluate the response of the active queue management (AQM) system, a comparison with two classical controllers is done, all tuned gains of controllers are obtained using SOA method and the fitness function chose to monitor the system performance is the integral time absolute error (ITAE). Transient analysis and robust analysis is used to show the proposed controller performance, two robustness tests are applied to the AQM system, one is done by varying the size of queue value in different period and the other test is done by changing the number of transmission control protocol (TCP) sessions with a value of ± 20% from its original value. The simulation results reflect a stable and robust behavior and best performance is appeared clearly to achieve the desired queue size without any noise or any transmission problems.
More Related Content
Similar to Amazon products reviews classification based on machine learning, deep learning methods and BERT
Guardians of E-Commerce: Harnessing NLP and Machine Learning Approaches for A...IJAEMSJORNAL
In today’s e-commerce in Nigeria, customers access online stores to browse through and place orders for products or services via the internet on their devices while some are skeptical due to the experiences from what I ordered versus what I got syndrome. Though this method of business has flourished to an extent, it greatly faces a crucial challenge in unravelling consumer’s sentiments particularly in the realm of product reviews. This deficiency inhibits most e-commerce platforms in Nigeria from gaining effective sensitivity into users’ preferences, thus, limiting their ability to boost their product recommendations and, understand and improve customers’ experiences. This research aims to bridge this gap by developing a sentiment analyzer of product in the e-commerce domain using Natural language processing and machine learning approach. The model will analyze the customers’ reviews based on positive or negative. The experimental data was collected from kaggle.com. Stemming and lemmatization were approaches used for cleaning the collected data. Features were extracted and transformed using CountVectorizer. Gaussian Naïve Bayes classifier was used as the machine learning technique. The model’s performance was evaluated and it returned 90% of accuracy, hence, an efficient and reliable model for product review sentiment analysis is developed.
As consumers’ service expectations climb in the Digital Age, providers rely on mobile technology to work faster and smarter, by Don Talend, brand storytelling, content strategy and demand generation expert, field sales and aftermarket service industries
Adequate Solution for Business to Customer (B2C) by an ongoing Mobile SystemAM Publications
Today’s the era of mobile science and mobile secure technology in which business needs a solution that
impact on current scenario to their customers for making the easy purchase of the goods and items. Mobile system
based application have the design formation in the different developing languages which developed the systematic
arrangement of selling goods and particular as orders have been made by their premium customer, those could
consider as the adequate management system for business to customer. This solution would resolve by the design
views of application over mobile technology those helps for placing orders of goods as customer opted on the
application system.
The Digital Transformation Lifecycle contains a set of phases in which each phase is interconnected with the before phase. It gives a highly useful sequence of phases and instructions Digital Transformation performer, executant.
The era of the Mobile Enterprise is here to stay. Mobile penetration is unprecedented: subscribers are growing four times faster than the world’s population. To remain successful, CIOs must continuously investigate, prioritize, fund, adopt, and integrate multiple new technologies to support vital organizational objectives.
21st century has been defined by application of and advancement in information technology. Information technology has become an integral part of our daily life. According to Information Technology Association of America, information technology is defined as “the study, design, development, application, implementation, support or management of computer-based information systems.”
Information technology has served as a big change agent in different aspect of business and society. It has proven game changer in resolving economic and social issues.
Advancement and application of information technology are ever changing.
Increasing effectiveness, efficiency & mobility of field employees with wirel...JTOX
The development of mobile wireless devices provide great potential to support work processes of field employees and has an enormous impact on the development of other strategic applications for businesses. This presentation will investigate how both the mobile worker and stationary offices can benefit from using wireless applications to increase effectiveness and efficiency of the overall organization. Simultaneously contemporary conceptual frameworks will be presented
Design, simulation, and analysis of microstrip patch antenna for wireless app...TELKOMNIKA JOURNAL
In this study, a microstrip patch antenna that works at 3.6 GHz was built and tested to see how well it works. In this work, Rogers RT/Duroid 5880 has been used as the substrate material, with a dielectric permittivity of 2.2 and a thickness of 0.3451 mm; it serves as the base for the examined antenna. The computer simulation technology (CST) studio suite is utilized to show the recommended antenna design. The goal of this study was to get a more extensive transmission capacity, a lower voltage standing wave ratio (VSWR), and a lower return loss, but the main goal was to get a higher gain, directivity, and efficiency. After simulation, the return loss, gain, directivity, bandwidth, and efficiency of the supplied antenna are found to be -17.626 dB, 9.671 dBi, 9.924 dBi, 0.2 GHz, and 97.45%, respectively. Besides, the recreation uncovered that the transfer speed side-lobe level at phi was much better than those of the earlier works, at -28.8 dB, respectively. Thus, it makes a solid contender for remote innovation and more robust communication.
Design and simulation an optimal enhanced PI controller for congestion avoida...TELKOMNIKA JOURNAL
In this paper, snake optimization algorithm (SOA) is used to find the optimal gains of an enhanced controller for controlling congestion problem in computer networks. M-file and Simulink platform is adopted to evaluate the response of the active queue management (AQM) system, a comparison with two classical controllers is done, all tuned gains of controllers are obtained using SOA method and the fitness function chose to monitor the system performance is the integral time absolute error (ITAE). Transient analysis and robust analysis is used to show the proposed controller performance, two robustness tests are applied to the AQM system, one is done by varying the size of queue value in different period and the other test is done by changing the number of transmission control protocol (TCP) sessions with a value of ± 20% from its original value. The simulation results reflect a stable and robust behavior and best performance is appeared clearly to achieve the desired queue size without any noise or any transmission problems.
Improving the detection of intrusion in vehicular ad-hoc networks with modifi...TELKOMNIKA JOURNAL
Vehicular ad-hoc networks (VANETs) are wireless-equipped vehicles that form networks along the road. The security of this network has been a major challenge. The identity-based cryptosystem (IBC) previously used to secure the networks suffers from membership authentication security features. This paper focuses on improving the detection of intruders in VANETs with a modified identity-based cryptosystem (MIBC). The MIBC is developed using a non-singular elliptic curve with Lagrange interpolation. The public key of vehicles and roadside units on the network are derived from number plates and location identification numbers, respectively. Pseudo-identities are used to mask the real identity of users to preserve their privacy. The membership authentication mechanism ensures that only valid and authenticated members of the network are allowed to join the network. The performance of the MIBC is evaluated using intrusion detection ratio (IDR) and computation time (CT) and then validated with the existing IBC. The result obtained shows that the MIBC recorded an IDR of 99.3% against 94.3% obtained for the existing identity-based cryptosystem (EIBC) for 140 unregistered vehicles attempting to intrude on the network. The MIBC shows lower CT values of 1.17 ms against 1.70 ms for EIBC. The MIBC can be used to improve the security of VANETs.
Conceptual model of internet banking adoption with perceived risk and trust f...TELKOMNIKA JOURNAL
Understanding the primary factors of internet banking (IB) acceptance is critical for both banks and users; nevertheless, our knowledge of the role of users’ perceived risk and trust in IB adoption is limited. As a result, we develop a conceptual model by incorporating perceived risk and trust into the technology acceptance model (TAM) theory toward the IB. The proper research emphasized that the most essential component in explaining IB adoption behavior is behavioral intention to use IB adoption. TAM is helpful for figuring out how elements that affect IB adoption are connected to one another. According to previous literature on IB and the use of such technology in Iraq, one has to choose a theoretical foundation that may justify the acceptance of IB from the customer’s perspective. The conceptual model was therefore constructed using the TAM as a foundation. Furthermore, perceived risk and trust were added to the TAM dimensions as external factors. The key objective of this work was to extend the TAM to construct a conceptual model for IB adoption and to get sufficient theoretical support from the existing literature for the essential elements and their relationships in order to unearth new insights about factors responsible for IB adoption.
Efficient combined fuzzy logic and LMS algorithm for smart antennaTELKOMNIKA JOURNAL
The smart antennas are broadly used in wireless communication. The least mean square (LMS) algorithm is a procedure that is concerned in controlling the smart antenna pattern to accommodate specified requirements such as steering the beam toward the desired signal, in addition to placing the deep nulls in the direction of unwanted signals. The conventional LMS (C-LMS) has some drawbacks like slow convergence speed besides high steady state fluctuation error. To overcome these shortcomings, the present paper adopts an adaptive fuzzy control step size least mean square (FC-LMS) algorithm to adjust its step size. Computer simulation outcomes illustrate that the given model has fast convergence rate as well as low mean square error steady state.
Design and implementation of a LoRa-based system for warning of forest fireTELKOMNIKA JOURNAL
This paper presents the design and implementation of a forest fire monitoring and warning system based on long range (LoRa) technology, a novel ultra-low power consumption and long-range wireless communication technology for remote sensing applications. The proposed system includes a wireless sensor network that records environmental parameters such as temperature, humidity, wind speed, and carbon dioxide (CO2) concentration in the air, as well as taking infrared photos.The data collected at each sensor node will be transmitted to the gateway via LoRa wireless transmission. Data will be collected, processed, and uploaded to a cloud database at the gateway. An Android smartphone application that allows anyone to easily view the recorded data has been developed. When a fire is detected, the system will sound a siren and send a warning message to the responsible personnel, instructing them to take appropriate action. Experiments in Tram Chim Park, Vietnam, have been conducted to verify and evaluate the operation of the system.
Wavelet-based sensing technique in cognitive radio networkTELKOMNIKA JOURNAL
Cognitive radio is a smart radio that can change its transmitter parameter based on interaction with the environment in which it operates. The demand for frequency spectrum is growing due to a big data issue as many Internet of Things (IoT) devices are in the network. Based on previous research, most frequency spectrum was used, but some spectrums were not used, called spectrum hole. Energy detection is one of the spectrum sensing methods that has been frequently used since it is easy to use and does not require license users to have any prior signal understanding. But this technique is incapable of detecting at low signal-to-noise ratio (SNR) levels. Therefore, the wavelet-based sensing is proposed to overcome this issue and detect spectrum holes. The main objective of this work is to evaluate the performance of wavelet-based sensing and compare it with the energy detection technique. The findings show that the percentage of detection in wavelet-based sensing is 83% higher than energy detection performance. This result indicates that the wavelet-based sensing has higher precision in detection and the interference towards primary user can be decreased.
A novel compact dual-band bandstop filter with enhanced rejection bandsTELKOMNIKA JOURNAL
In this paper, we present the design of a new wide dual-band bandstop filter (DBBSF) using nonuniform transmission lines. The method used to design this filter is to replace conventional uniform transmission lines with nonuniform lines governed by a truncated Fourier series. Based on how impedances are profiled in the proposed DBBSF structure, the fractional bandwidths of the two 10 dB-down rejection bands are widened to 39.72% and 52.63%, respectively, and the physical size has been reduced compared to that of the filter with the uniform transmission lines. The results of the electromagnetic (EM) simulation support the obtained analytical response and show an improved frequency behavior.
Deep learning approach to DDoS attack with imbalanced data at the application...TELKOMNIKA JOURNAL
A distributed denial of service (DDoS) attack is where one or more computers attack or target a server computer, by flooding internet traffic to the server. As a result, the server cannot be accessed by legitimate users. A result of this attack causes enormous losses for a company because it can reduce the level of user trust, and reduce the company’s reputation to lose customers due to downtime. One of the services at the application layer that can be accessed by users is a web-based lightweight directory access protocol (LDAP) service that can provide safe and easy services to access directory applications. We used a deep learning approach to detect DDoS attacks on the CICDDoS 2019 dataset on a complex computer network at the application layer to get fast and accurate results for dealing with unbalanced data. Based on the results obtained, it is observed that DDoS attack detection using a deep learning approach on imbalanced data performs better when implemented using synthetic minority oversampling technique (SMOTE) method for binary classes. On the other hand, the proposed deep learning approach performs better for detecting DDoS attacks in multiclass when implemented using the adaptive synthetic (ADASYN) method.
The appearance of uncertainties and disturbances often effects the characteristics of either linear or nonlinear systems. Plus, the stabilization process may be deteriorated thus incurring a catastrophic effect to the system performance. As such, this manuscript addresses the concept of matching condition for the systems that are suffering from miss-match uncertainties and exogeneous disturbances. The perturbation towards the system at hand is assumed to be known and unbounded. To reach this outcome, uncertainties and their classifications are reviewed thoroughly. The structural matching condition is proposed and tabulated in the proposition 1. Two types of mathematical expressions are presented to distinguish the system with matched uncertainty and the system with miss-matched uncertainty. Lastly, two-dimensional numerical expressions are provided to practice the proposed proposition. The outcome shows that matching condition has the ability to change the system to a design-friendly model for asymptotic stabilization.
Implementation of FinFET technology based low power 4×4 Wallace tree multipli...TELKOMNIKA JOURNAL
Many systems, including digital signal processors, finite impulse response (FIR) filters, application-specific integrated circuits, and microprocessors, use multipliers. The demand for low power multipliers is gradually rising day by day in the current technological trend. In this study, we describe a 4×4 Wallace multiplier based on a carry select adder (CSA) that uses less power and has a better power delay product than existing multipliers. HSPICE tool at 16 nm technology is used to simulate the results. In comparison to the traditional CSA-based multiplier, which has a power consumption of 1.7 µW and power delay product (PDP) of 57.3 fJ, the results demonstrate that the Wallace multiplier design employing CSA with first zero finding logic (FZF) logic has the lowest power consumption of 1.4 µW and PDP of 27.5 fJ.
Evaluation of the weighted-overlap add model with massive MIMO in a 5G systemTELKOMNIKA JOURNAL
The flaw in 5G orthogonal frequency division multiplexing (OFDM) becomes apparent in high-speed situations. Because the doppler effect causes frequency shifts, the orthogonality of OFDM subcarriers is broken, lowering both their bit error rate (BER) and throughput output. As part of this research, we use a novel design that combines massive multiple input multiple output (MIMO) and weighted overlap and add (WOLA) to improve the performance of 5G systems. To determine which design is superior, throughput and BER are calculated for both the proposed design and OFDM. The results of the improved system show a massive improvement in performance ver the conventional system and significant improvements with massive MIMO, including the best throughput and BER. When compared to conventional systems, the improved system has a throughput that is around 22% higher and the best performance in terms of BER, but it still has around 25% less error than OFDM.
Reflector antenna design in different frequencies using frequency selective s...TELKOMNIKA JOURNAL
In this study, it is aimed to obtain two different asymmetric radiation patterns obtained from antennas in the shape of the cross-section of a parabolic reflector (fan blade type antennas) and antennas with cosecant-square radiation characteristics at two different frequencies from a single antenna. For this purpose, firstly, a fan blade type antenna design will be made, and then the reflective surface of this antenna will be completed to the shape of the reflective surface of the antenna with the cosecant-square radiation characteristic with the frequency selective surface designed to provide the characteristics suitable for the purpose. The frequency selective surface designed and it provides the perfect transmission as possible at 4 GHz operating frequency, while it will act as a band-quenching filter for electromagnetic waves at 5 GHz operating frequency and will be a reflective surface. Thanks to this frequency selective surface to be used as a reflective surface in the antenna, a fan blade type radiation characteristic at 4 GHz operating frequency will be obtained, while a cosecant-square radiation characteristic at 5 GHz operating frequency will be obtained.
Reagentless iron detection in water based on unclad fiber optical sensorTELKOMNIKA JOURNAL
A simple and low-cost fiber based optical sensor for iron detection is demonstrated in this paper. The sensor head consist of an unclad optical fiber with the unclad length of 1 cm and it has a straight structure. Results obtained shows a linear relationship between the output light intensity and iron concentration, illustrating the functionality of this iron optical sensor. Based on the experimental results, the sensitivity and linearity are achieved at 0.0328/ppm and 0.9824 respectively at the wavelength of 690 nm. With the same wavelength, other performance parameters are also studied. Resolution and limit of detection (LOD) are found to be 0.3049 ppm and 0.0755 ppm correspondingly. This iron sensor is advantageous in that it does not require any reagent for detection, enabling it to be simpler and cost-effective in the implementation of the iron sensing.
Impact of CuS counter electrode calcination temperature on quantum dot sensit...TELKOMNIKA JOURNAL
In place of the commercial Pt electrode used in quantum sensitized solar cells, the low-cost CuS cathode is created using electrophoresis. High resolution scanning electron microscopy and X-ray diffraction were used to analyze the structure and morphology of structural cubic samples with diameters ranging from 40 nm to 200 nm. The conversion efficiency of solar cells is significantly impacted by the calcination temperatures of cathodes at 100 °C, 120 °C, 150 °C, and 180 °C under vacuum. The fluorine doped tin oxide (FTO)/CuS cathode electrode reached a maximum efficiency of 3.89% when it was calcined at 120 °C. Compared to other temperature combinations, CuS nanoparticles crystallize at 120 °C, which lowers resistance while increasing electron lifetime.
In place of the commercial Pt electrode used in quantum sensitized solar cells, the low-cost CuS cathode is created using electrophoresis. High resolution scanning electron microscopy and X-ray diffraction were used to analyze the structure and morphology of structural cubic samples with diameters ranging from 40 nm to 200 nm. The conversion efficiency of solar cells is significantly impacted by the calcination temperatures of cathodes at 100 °C, 120 °C, 150 °C, and 180 °C under vacuum. The fluorine doped tin oxide (FTO)/CuS cathode electrode reached a maximum efficiency of 3.89% when it was calcined at 120 °C. Compared to other temperature combinations, CuS nanoparticles crystallize at 120 °C, which lowers resistance while increasing electron lifetime.
A progressive learning for structural tolerance online sequential extreme lea...TELKOMNIKA JOURNAL
This article discusses the progressive learning for structural tolerance online sequential extreme learning machine (PSTOS-ELM). PSTOS-ELM can save robust accuracy while updating the new data and the new class data on the online training situation. The robustness accuracy arises from using the householder block exact QR decomposition recursive least squares (HBQRD-RLS) of the PSTOS-ELM. This method is suitable for applications that have data streaming and often have new class data. Our experiment compares the PSTOS-ELM accuracy and accuracy robustness while data is updating with the batch-extreme learning machine (ELM) and structural tolerance online sequential extreme learning machine (STOS-ELM) that both must retrain the data in a new class data case. The experimental results show that PSTOS-ELM has accuracy and robustness comparable to ELM and STOS-ELM while also can update new class data immediately.
Electroencephalography-based brain-computer interface using neural networksTELKOMNIKA JOURNAL
This study aimed to develop a brain-computer interface that can control an electric wheelchair using electroencephalography (EEG) signals. First, we used the Mind Wave Mobile 2 device to capture raw EEG signals from the surface of the scalp. The signals were transformed into the frequency domain using fast Fourier transform (FFT) and filtered to monitor changes in attention and relaxation. Next, we performed time and frequency domain analyses to identify features for five eye gestures: opened, closed, blink per second, double blink, and lookup. The base state was the opened-eyes gesture, and we compared the features of the remaining four action gestures to the base state to identify potential gestures. We then built a multilayer neural network to classify these features into five signals that control the wheelchair’s movement. Finally, we designed an experimental wheelchair system to test the effectiveness of the proposed approach. The results demonstrate that the EEG classification was highly accurate and computationally efficient. Moreover, the average performance of the brain-controlled wheelchair system was over 75% across different individuals, which suggests the feasibility of this approach.
Adaptive segmentation algorithm based on level set model in medical imagingTELKOMNIKA JOURNAL
For image segmentation, level set models are frequently employed. It offer best solution to overcome the main limitations of deformable parametric models. However, the challenge when applying those models in medical images stills deal with removing blurs in image edges which directly affects the edge indicator function, leads to not adaptively segmenting images and causes a wrong analysis of pathologies wich prevents to conclude a correct diagnosis. To overcome such issues, an effective process is suggested by simultaneously modelling and solving systems’ two-dimensional partial differential equations (PDE). The first PDE equation allows restoration using Euler’s equation similar to an anisotropic smoothing based on a regularized Perona and Malik filter that eliminates noise while preserving edge information in accordance with detected contours in the second equation that segments the image based on the first equation solutions. This approach allows developing a new algorithm which overcome the studied model drawbacks. Results of the proposed method give clear segments that can be applied to any application. Experiments on many medical images in particular blurry images with high information losses, demonstrate that the developed approach produces superior segmentation results in terms of quantity and quality compared to other models already presented in previeous works.
Automatic channel selection using shuffled frog leaping algorithm for EEG bas...TELKOMNIKA JOURNAL
Drug addiction is a complex neurobiological disorder that necessitates comprehensive treatment of both the body and mind. It is categorized as a brain disorder due to its impact on the brain. Various methods such as electroencephalography (EEG), functional magnetic resonance imaging (FMRI), and magnetoencephalography (MEG) can capture brain activities and structures. EEG signals provide valuable insights into neurological disorders, including drug addiction. Accurate classification of drug addiction from EEG signals relies on appropriate features and channel selection. Choosing the right EEG channels is essential to reduce computational costs and mitigate the risk of overfitting associated with using all available channels. To address the challenge of optimal channel selection in addiction detection from EEG signals, this work employs the shuffled frog leaping algorithm (SFLA). SFLA facilitates the selection of appropriate channels, leading to improved accuracy. Wavelet features extracted from the selected input channel signals are then analyzed using various machine learning classifiers to detect addiction. Experimental results indicate that after selecting features from the appropriate channels, classification accuracy significantly increased across all classifiers. Particularly, the multi-layer perceptron (MLP) classifier combined with SFLA demonstrated a remarkable accuracy improvement of 15.78% while reducing time complexity.
ResNet-n/DR: Automated diagnosis of diabetic retinopathy using a residual neu...TELKOMNIKA JOURNAL
Diabetic retinopathy (DR) is a progressive eye disease associated with diabetes, resulting in blindness or blurred vision. The risk of vision loss was dramatically decreased with early diagnosis and treatment. Doctors diagnose DR by examining the fundus retinal images to develop lesions associated with the disease. However, this diagnosis is a tedious and challenging task due to growing undiagnosed and untreated DR cases and the variability of retinal changes across disease stages. Manually analyzing the images has become an expensive and time-consuming task, not to mention that training new specialists takes time and requires daily practice. Our work investigates deep learning methods, particularly convolutional neural network (CNN), for DR diagnosis in the disease’s five stages. A pre-trained residual neural network (ResNet-34) was trained and tested for DR. Then, we develop computationally efficient and scalable methods after modifying a ResNet-34 with three additional residual units as a novel ResNet-n/DR. The Asia Pacific Tele-Ophthalmology Society (APTOS) 2019 dataset was used to evaluate the performance of models after applying multiple pre-processing steps to eliminate image noise and improve color contrast, thereby increasing efficiency. Our findings achieved state-of-the-art results compared to previous studies that used the same dataset. It had 90.7% sensitivity, 93.5% accuracy, 98.2% specificity, 89.5% precision, and 90.1% F1 score.
Courier management system project report.pdfKamal Acharya
It is now-a-days very important for the people to send or receive articles like imported furniture, electronic items, gifts, business goods and the like. People depend vastly on different transport systems which mostly use the manual way of receiving and delivering the articles. There is no way to track the articles till they are received and there is no way to let the customer know what happened in transit, once he booked some articles. In such a situation, we need a system which completely computerizes the cargo activities including time to time tracking of the articles sent. This need is fulfilled by Courier Management System software which is online software for the cargo management people that enables them to receive the goods from a source and send them to a required destination and track their status from time to time.
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdffxintegritypublishin
Advancements in technology unveil a myriad of electrical and electronic breakthroughs geared towards efficiently harnessing limited resources to meet human energy demands. The optimization of hybrid solar PV panels and pumped hydro energy supply systems plays a pivotal role in utilizing natural resources effectively. This initiative not only benefits humanity but also fosters environmental sustainability. The study investigated the design optimization of these hybrid systems, focusing on understanding solar radiation patterns, identifying geographical influences on solar radiation, formulating a mathematical model for system optimization, and determining the optimal configuration of PV panels and pumped hydro storage. Through a comparative analysis approach and eight weeks of data collection, the study addressed key research questions related to solar radiation patterns and optimal system design. The findings highlighted regions with heightened solar radiation levels, showcasing substantial potential for power generation and emphasizing the system's efficiency. Optimizing system design significantly boosted power generation, promoted renewable energy utilization, and enhanced energy storage capacity. The study underscored the benefits of optimizing hybrid solar PV panels and pumped hydro energy supply systems for sustainable energy usage. Optimizing the design of solar PV panels and pumped hydro energy supply systems as examined across diverse climatic conditions in a developing country, not only enhances power generation but also improves the integration of renewable energy sources and boosts energy storage capacities, particularly beneficial for less economically prosperous regions. Additionally, the study provides valuable insights for advancing energy research in economically viable areas. Recommendations included conducting site-specific assessments, utilizing advanced modeling tools, implementing regular maintenance protocols, and enhancing communication among system components.
Explore the innovative world of trenchless pipe repair with our comprehensive guide, "The Benefits and Techniques of Trenchless Pipe Repair." This document delves into the modern methods of repairing underground pipes without the need for extensive excavation, highlighting the numerous advantages and the latest techniques used in the industry.
Learn about the cost savings, reduced environmental impact, and minimal disruption associated with trenchless technology. Discover detailed explanations of popular techniques such as pipe bursting, cured-in-place pipe (CIPP) lining, and directional drilling. Understand how these methods can be applied to various types of infrastructure, from residential plumbing to large-scale municipal systems.
Ideal for homeowners, contractors, engineers, and anyone interested in modern plumbing solutions, this guide provides valuable insights into why trenchless pipe repair is becoming the preferred choice for pipe rehabilitation. Stay informed about the latest advancements and best practices in the field.
Immunizing Image Classifiers Against Localized Adversary Attacksgerogepatton
This paper addresses the vulnerability of deep learning models, particularly convolutional neural networks
(CNN)s, to adversarial attacks and presents a proactive training technique designed to counter them. We
introduce a novel volumization algorithm, which transforms 2D images into 3D volumetric representations.
When combined with 3D convolution and deep curriculum learning optimization (CLO), itsignificantly improves
the immunity of models against localized universal attacks by up to 40%. We evaluate our proposed approach
using contemporary CNN architectures and the modified Canadian Institute for Advanced Research (CIFAR-10
and CIFAR-100) and ImageNet Large Scale Visual Recognition Challenge (ILSVRC12) datasets, showcasing
accuracy improvements over previous techniques. The results indicate that the combination of the volumetric
input and curriculum learning holds significant promise for mitigating adversarial attacks without necessitating
adversary training.
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)MdTanvirMahtab2
This presentation is about the working procedure of Shahjalal Fertilizer Company Limited (SFCL). A Govt. owned Company of Bangladesh Chemical Industries Corporation under Ministry of Industries.
Democratizing Fuzzing at Scale by Abhishek Aryaabh.arya
Presented at NUS: Fuzzing and Software Security Summer School 2024
This keynote talks about the democratization of fuzzing at scale, highlighting the collaboration between open source communities, academia, and industry to advance the field of fuzzing. It delves into the history of fuzzing, the development of scalable fuzzing platforms, and the empowerment of community-driven research. The talk will further discuss recent advancements leveraging AI/ML and offer insights into the future evolution of the fuzzing landscape.
Cosmetic shop management system project report.pdfKamal Acharya
Buying new cosmetic products is difficult. It can even be scary for those who have sensitive skin and are prone to skin trouble. The information needed to alleviate this problem is on the back of each product, but it's thought to interpret those ingredient lists unless you have a background in chemistry.
Instead of buying and hoping for the best, we can use data science to help us predict which products may be good fits for us. It includes various function programs to do the above mentioned tasks.
Data file handling has been effectively used in the program.
The automated cosmetic shop management system should deal with the automation of general workflow and administration process of the shop. The main processes of the system focus on customer's request where the system is able to search the most appropriate products and deliver it to the customers. It should help the employees to quickly identify the list of cosmetic product that have reached the minimum quantity and also keep a track of expired date for each cosmetic product. It should help the employees to find the rack number in which the product is placed.It is also Faster and more efficient way.
Overview of the fundamental roles in Hydropower generation and the components involved in wider Electrical Engineering.
This paper presents the design and construction of hydroelectric dams from the hydrologist’s survey of the valley before construction, all aspects and involved disciplines, fluid dynamics, structural engineering, generation and mains frequency regulation to the very transmission of power through the network in the United Kingdom.
Author: Robbie Edward Sayers
Collaborators and co editors: Charlie Sims and Connor Healey.
(C) 2024 Robbie E. Sayers
Student information management system project report ii.pdfKamal Acharya
Our project explains about the student management. This project mainly explains the various actions related to student details. This project shows some ease in adding, editing and deleting the student details. It also provides a less time consuming process for viewing, adding, editing and deleting the marks of the students.
Final project report on grocery store management system..pdfKamal Acharya
In today’s fast-changing business environment, it’s extremely important to be able to respond to client needs in the most effective and timely manner. If your customers wish to see your business online and have instant access to your products or services.
Online Grocery Store is an e-commerce website, which retails various grocery products. This project allows viewing various products available enables registered users to purchase desired products instantly using Paytm, UPI payment processor (Instant Pay) and also can place order by using Cash on Delivery (Pay Later) option. This project provides an easy access to Administrators and Managers to view orders placed using Pay Later and Instant Pay options.
In order to develop an e-commerce website, a number of Technologies must be studied and understood. These include multi-tiered architecture, server and client-side scripting techniques, implementation technologies, programming language (such as PHP, HTML, CSS, JavaScript) and MySQL relational databases. This is a project with the objective to develop a basic website where a consumer is provided with a shopping cart website and also to know about the technologies used to develop such a website.
This document will discuss each of the underlying technologies to create and implement an e- commerce website.
Amazon products reviews classification based on machine learning, deep learning methods and BERT
1. TELKOMNIKA Telecommunication Computing Electronics and Control
Vol. 21, No. 5, October 2023, pp. 1084~1101
ISSN: 1693-6930, DOI: 10.12928/TELKOMNIKA.v21i5.24046 1084
Journal homepage: http://telkomnika.uad.ac.id
Amazon products reviews classification based on machine
learning, deep learning methods and BERT
Saman Iftikhar1
, Bandar Alluhaybi1
, Mohammed Suliman1
, Ammar Saeed2
, Kiran Fatima3
1
Faculty of Computer Studies, Arab Open University, Riyadh, Saudi Arabia
2
Department of Computer Science, Commission on Science and Technology for Sustainable Development in the South (COMSATS),
Islamabad, Wah Campus, Wah Cantt, Pakistan
3
Technical and Further Education (TAFE), New South Wales, Australia
Article Info ABSTRACT
Article history:
Received May 23, 2022
Revised Dec 31, 2022
Accepted Feb 16, 2023
In recent times, the trend of online shopping through e-commerce stores and
websites has grown to a huge extent. Whenever a product is purchased on an
e-commerce platform, people leave their reviews about the product. These
reviews are very helpful for the store owners and the product’s manufacturers
for the betterment of their work process as well as product quality. An
automated system is proposed in this work that operates on two datasets D1
and D2 obtained from Amazon. After certain preprocessing steps, N-gram and
word embedding-based features are extracted using term frequency-inverse
document frequency (TF-IDF), bag of words (BoW) and global vectors
(GloVe), and Word2vec, respectively. Four machine learning (ML) models
support vector machines (SVM), logistic regression (RF), logistic regression
(LR), multinomial Naïve Bayes (MNB), two deep learning (DL) models
convolutional neural network (CNN), long-short term memory (LSTM), and
standalone bidirectional encoder representations (BERT) are used to classify
reviews as either positive or negative. The results obtained by the standard
ML, DL models and BERT are evaluated using certain performance
evaluation measures. BERT turns out to be the best-performing model in the
case of D1 with an accuracy of 90% on features derived by word embedding
models while the CNN provides the best accuracy of 97% upon word
embedding features in the case of D2. The proposed model shows better
overall performance on D2 as compared to D1.
Keywords:
Deep learning
Feature extraction
Machine learning
Sentiment analysis
Transformer technique
This is an open access article under the CC BY-SA license.
Corresponding Author:
Saman Iftikhar
Faculty of Computer Studies, Arab Open University
Riyadh, Saudi Arabia
Email: s.iftikhar@aou.edu.sa
1. INTRODUCTION
In the twenty-first century, the revolutionary growth in technology has changed the perspective by
which things are done around the world. The comfortable availability of the internet and access to almost
anywhere in the world has encouraged people to rely more on online services to fulfill their daily demands
such as shopping, hiring, selling, staying updated with news, and using social media platforms. Online
shopping is one of the most prominent domains that has seen tremendous uprising trends with this internet
revolution. Several well-established e-commerce websites are already ruling the internet and more are being
launched on daily basis [1]. People around the world have started relying more on those e-commerce sites to
purchase groceries, technical gadgets, and almost anything they can demand. The reason for using online
platforms to purchase goods is the ease of access as people do not have to go through the tiresome way of
exploring markets to find what they are looking for, rather they can sit in the comfort of their homes and easily
2. TELKOMNIKA Telecommun Comput El Control
Amazon products reviews classification based on machine learning, … (Saman Iftikhar)
1085
explore, compare, browse and purchase the items of their choice and can get them at their doorstep [2].
Amazon, the US-based leading e-commerce platform with a brand value of 684 billion USD has registered
sales of 386 billion USD in 2020 which mostly included shopping for electronic and technological products,
and the net sales revenue generated by it until now is 469 billion USD [3]. Another online business-to-business
(B2B) marketplace Alibaba processed 538,000 transactions each second during 2020 and generated sales
revenue of 717 billion Yuan by March 2020. Alibaba is also expected to achieve net sales of 3.62 trillion USD
in China by 2025 [4]. Flipkart made a revenue of 433 billion INR in a year which accounted for an increase of
25% as compared to 2020 [5]. The sales revenue generated by the same platforms during the year 2015 shows
that Amazon, Alibaba, and Flipkart sold products worth 75.6 USD, 76.2 million Yuan, and 12.5 billion USD.
When the sale trends of these giant online stores are compared from 2015 to 2020, it demonstrates a tremendous
increase in online shopping trends, therefore, leading to massive revenue production.
The statistics indicate the increasing trends of online shopping which are overtaking conventional
means. As far as e-commerce platforms and other online stores are concerned, it becomes necessary for them
to meet customer demands, correctly interpret them, listen to their complaints, and maintain quality to cement
their place and establish themselves. All authoritative e-commerce platforms use a review and rating system
through which customers and buyers can leave a review post product purchase and retrieval. This provides a
real-time evaluation of several aspects of online shopping including product quality feedback, customer
satisfaction with delivery service and procedure as well as suggestions for improvement [6]. The successful
standing giants use these evaluations and feedback for their constant improvement thus enhancing their
customer’s experience which in turn increases their sales and leads to augmented revenue origination. Since the
number of reviews and feedback are real-time and are performed by millions of users depending upon the diversity
of the concerned platform, it becomes very difficult to go through all of them. Most of the time, this phase is
performed manually, and it involves an inspection or quality assurance team that goes through some of the
feedback to formalize a report stating complaints, improvements, and recommendations. Due to the involvement
of physical personnel, it becomes a time-consuming, less accurate, and costly task. Even after investing time and
money, all reviews cannot be covered because they are in bulk and add up quickly in real-time [7]. This creates
an urge for the development of an automated system that can automatically take those reviews as input, perform
text-based analysis also known as sentiment analysis (SA) over them, deduct the meanings inducted inside
them, and classify them as good, and bad, or neutral. This can help e-commerce platforms to be more productive
in terms of customer feedback understanding and can timely adopt the measures and methods they are currently
lacking. An automated system can also rectify the problem of not being able to cover all the reviews as it can perform
sentiment analysis and generate results within seconds therefore it can go through the whole corpus within no time.
Evolutionary research in the field of artificial intelligence has made it very easy to develop such an
automated system. The approaches of machine learning (ML) and deep learning (DL) are being widely used to
formulate self-working systems that are implemented in mainstream real-world businesses and companies to
increase productivity, reduce manual workforce, increase accuracy and maintain brisk speed [8]. Moreover, the
use of word embedding methods such as term frequency-inverse document frequency (TF-IDF), N-gram-based
feature extraction techniques, transformer-based methods, and topic modeling approaches are being extensively
used in the SA tasks in the domain of data analytics and engineering [9]. Keeping in view the aspects, the
proposed work starts with the acquisition of two datasets from Amazon. The acquired datasets contain user
reviews for cell phones and other products and are randomly chosen. Certain preprocessing steps are employed
on the data to cleanse it and make it ready for the upcoming phases. The reviews are labeled in positive and
negative classes based on their star rating out of 5 stars. For feature extraction, N-gram methods TF-IDF, bag
of words (BoW), and word embedding techniques global vectors (GloVe), and Word2vec are implemented.
Bidirectional encoder representations (BERT) is also adopted to derive deep insights from data based on its
transformation layers. Certain ML classifiers including support vector machine (SVM), Naïve Bayes (NB),
random forest (RF), logistic regression (LR), and DL models including a custom convolutional neural network
(CNN) and long-short term memory (LSTM) are employed for classification. The results are evaluated based
on certain evaluation measures. The main contributions of proposed work are as:
a) Two datasets D1 and D2 have been acquired from Amazon. D1 is a collection of cell phone reviews,
while D2 is an amalgamation of D1 and random product reviews. Datasets are prepared and engineered
using various SA techniques.
b) The proposed work used both textual features extracted by TF-IDF, BoW and deep features extracted by
Word2Vec, Glove, and classified them using ML and DL models, respectively.
c) Transformer-based model BERT is also implemented to classify features and later on, its results are
compared with those of ML and DL models.
d) ML classifiers are used to classify textual features, DLs are used to classify deep features, and BERTs are
used to classify overall standalone features. BERT performs better on D1 while DL-based CNN proves
to be better in case of D2.
3. ISSN: 1693-6930
TELKOMNIKA Telecommun Comput El Control, Vol. 21, No. 5, October 2023: 1084-1101
1086
The rest of the paper is formatted as: section 2 gives an overview of the strategies employed in prior
publications for the analysis and classification of Amazon product reviews. The proposed methodology of this
research work is discussed in section 3. Section 4 lists all the experiments done, together with their findings
and performance evaluations. Section 5 discusses the findings, and section 6 concludes with the conclusion of
the proposed work.
2. RELATED WORK
Several studies have been performed for SA of reviews posted on e-commerce platforms and their
classification. We discussed some of them in the following subsections. We looked at their opted methodologies
and the results they achieved.
2.1. ML-based techniques for reviews classification
Daniel and Meena [10] proposed a hybrid framework composed of lexicon methods and ML techniques
for the SA of Amazon data that contains 48,500 reviews performed on various product categories such as
electronics, furniture, kid’s toys, and camera accessories. Each review contains information regarding the
reviewer, date, time, location, and review itself. The data is cleaned through several steps including URL removal,
conversion into a single case, tokenization, and stop word rectification. The data is then manually labeled into
positive and negative reviews based on their rating out of 5 using VADER. Feature extraction is performed using
fastText and GloVe attribute embedding methods. To minimize the attribute dimensionality and maintain the
model’s speed, a tunicate swarm intelligence (TSA) based feature selection approach is introduced over the
extracted features. Finally, the result is given to several ML classifiers localized support vector machine (LSVM),
DT, NB, and K-nearest neighbor (KNN). The proposed model achieves a maximum accuracy of 93% over the
furniture dataset with LSVM classifier without TSA and 91% with TSA whereas the execution time is also
reduced to 43% when TSA is implemented. Li et al. [11] obtained a dataset based on 10,261 reviews performed
on musical instruments from Amazon. Analphabetic sign removal, lowercase conversion, and removal of
unnecessary words are among the preprocessing steps performed for data cleansing. The frequency of all the
words is calculated using TF-IDF and a term set is formulated based on 160 terms with the highest frequency.
WordNet and SentiWordNet are utilized as lexicons to map each word with a corresponding weight in the feature
space. To generate word embeddings, BERT is implemented where the included words are vectorized.
Bi-directional LSTM along with an attention scheme is used for feature classification where the model achieves
an accuracy of 96% after the adjustment of the loss rate. The proposed model performs better when compared
with other baseline methods using several evaluation metrics. Shrestha and Nasoz [12] retrieved 3.5 million
reviews conducted on random product categories from Amazon. The dataset contains all information about the
review, user, time, and date and is compared with the 5 stars baseline. Preprocessing steps such as hyperlinks,
unwanted space, stop words, informal words, and punctuation removal are performed on the corpus obtained.
For data vectorization and semantic information deduction, the paragraph vectors (PV) are utilized that perform
next-word prediction and provide context to the sample paragraphs. Both memory distribution (PV-DM) and bag
of words distribution (PV-DBOW) versions are implemented in the proposed work. After converting text-based
reviews to dimensional vectors, the input is given to gated recurrent unit (GRU) for the derivation of embedding
information. Finally, the output is fitted to the SVM classifier that achieves a maximum accuracy of 81.29% and
81.82% on embedding derived for reviews and products, respectively.
Elmurngi and Gherbi [13] obtained the reviews performed on clothing, shoes, and jewelry categories
on the Amazon platform. The dataset is divided into 5 classes based on its star rating while the empty rows are
rectified. String to word vector (STWV) is used as the filtration method that performs stop words removal and
tokenization. For feature selection, a combination of best first, subset evaluation, and genetic search is used.
The final stage of classification is performed with the help of NB, DT, LR, and SVM classifiers upon the
selected features. The LR classifier maintains accuracies of 81.61%, 80.09%, and 60.72% on clothing, shoes,
and jewelry datasets, respectively, and stands out among other classifiers. Rao and Sindhu [14] performed
sarcasm analysis on product reviews from Amazon. A random dataset comprising product reviews is retrieved
from Amazon that contains information about the brand, product identity, and some useful information about
the reviewer. Each review is treated as a separate document and its labeling is performed based on rating
polarity. The preprocessing includes tokenization, stemming, lemmatization, and labeling based on polarity.
Feature extraction is performed using TF-IDF and N-gram methods (uni, bi, and trigrams). The extracted
features are finally provided to certain ML classifiers SVM, KNN, and RF where the SVM classifier achieves
a higher accuracy rate of 67.58% as compared to RF (62.34%) and KNN (61.08%). Wassan et al. [15]
formulated a data collection comprising reviews from 28,000 customers for 60 products on Amazon. The dataset
is categorized into positive, negative, and neutral reviews based on 5-star rating polarity. Stop word removal and
other data cleansing steps are performed using the natural language Toolkit (NLTK) and text blob libraries.
4. TELKOMNIKA Telecommun Comput El Control
Amazon products reviews classification based on machine learning, … (Saman Iftikhar)
1087
Features are extracted using the bag of words method and computation matrices such as recall, and f-measurement
are adopted along with cross-validation to map the reviews in their respective categories.
2.2. DL-based techniques for reviews classification
Hawlader et al. [16] collected a pre-labeled dataset from Amazon containing product reviews.
Preprocessing steps including tokenization, stop words removal, stemming and tagging are performed on the
data. The rating parameter is binarized to construct a premise for the classification. A new feature is introduced
into the dataset that categorizes the review into positive or negative classes. Search the classification is binary
so the neutral reviews are discarded given the dataset with 24000 negative and 70000 positive assessments.
Feature extraction is performed using BoW, TF-IDF, and Word2vec techniques all implemented individually
on the dataset. Several ML classifiers such as SVM, NB, LR, DT, RF as well as DL-based MLP are utilized to
map data into their concerning categories. The proposed model achieves the maximum accuracy of 91% via MLP
for TF-IDF features, 92% via MLP for BoW features, and 71% via MLP for Word2vec features. This indicates
that MLP outperforms ML models in terms of performance for the currently used dataset. Alharbi et al. [17]
obtained a publicly available dataset containing 400,000 reviews written on sold mobile phones on Amazon and
passed through several preprocessing steps which include spelling correction, tokenization, stop words removal,
punctuation removal, and lemmatization. The factorization of data is performed by using fastText, GloVe, and
Word2vec methods. Data are classified using a custom tuned recurrent neural network (RNN) that takes as
input the combination of embeddings extracted in the previous phase of feature engineering, passes it through
several normalization layers, and performs the prediction using the Softmax layer. Two variants of RNN are
utilized in this work which includes LSTM-RNN and general regression neural network (GRNN).
The proposed model achieves maximum accuracies of 88.38% and 87.25% on GLSTM and update gate RNN
(UGRNN) based frameworks, respectively. Dadhich and Thankachan [18] formulated a system that takes the
user reviews and comments obtained from Flipkart and Amazon as input, applies certain data cleansing steps,
performs feature extraction using the SentiWordNet algorithm, and classifies the data with the help of ML
classifiers NB, LR, RF, and KNN. The system achieves an overall accuracy of 91% with the Flipkart dataset.
Norinder and Norinder [19] collected an imbalanced reviews dataset from Amazon for five product and their
12 categories along with their star ratings. Non-alpha numeric values are removed from the data together with
stop words removal and tokenization using the NLTK toolkit. A deep architecture DNN is used for data
classification that comprises embedding, LSTM, and output layers. Conformal and mondrian predictions are
used for model calibration using the original data. The model shows accuracy rates ranging from 89.8% to
92.2% with an error rate of 12.5%.
Bhuvaneshwari et al. [20] employed data from Amazon containing 19,988 reviews on various products
where 15,000 reviews appear to have less than 100 words. 15,990 reviews are maintained for training and 3997
for testing. Some mandatory preprocessing steps such as URL, stop words, punctuation, and connecting words
removal are performed along with tokenization, stemming, and spelling correction. The skip-gram-based
Word2vec model is utilized to device word vectors from the prepared corpus. A Bi-LSTM model is formulated
that contains 100 parameters united and self-attention functionality. It is based on ReLU, CNN, pool, ully
connected, and classification layers. It used RMSProp as an optimizer and entropy as a loss function. The model
achieves an accuracy of over 85% when compared with standard CNN, Bi-directional gated recurrent unit
(BGRU), BCNN, and NB models along with decent training time. Nandal et al. [21] utilized a web crawler to
retrieve data on user reviews on Amazon products. The data contains information about the user, rating, review,
and URL. Vectorization, stop word removal, parts of speech (POS) tagging, stemming and lemmatization are
among the preprocessing steps employed. Aspect aggregation is employed to derive key aspects from the data.
The derived aspects and their polarities are given to the SVM classifier. Mean square error (MSE) is used as a
loss rate evaluator where the bipolar inputs show the minimum loss rate of 0.4% and the model’s validation
accuracy reaches 86%. Dey et al. [22] utilized a dataset from Amazon based on 147,000 reviews of various books.
Tokenization, stop word removal, and re-filling the missing values are some of the key preprocessing steps
employed. The feature extraction is performed using TF-IDF and classification is performed using the SVM and
NB classifiers. The LSVM classifier shows better accuracy (84%) as compared to NB (82%) Zhao et al. [23]
proposed a model for the SA of reviews given on e-commerce platforms including Amazon, eBay, and Taobao.
Data is passed through tokenization, lemmatization, and snowball stemming phases followed by a term weighting
phase based on least term frequency. The earthworm and bat feature selection algorithms are used to reduce feature
dimensionality and increase model briskness. The feature extraction is performed using Word2vec and TF methods.
The LSIBA-ENN model achieves better recall and precision rates when compared with standalone NB, SVM, and
ENN classifiers for both TF and Word2vec features. Mukherjee et al. [24] obtained 82,000 reviews posted on sold
cellphones on Amazon and performed standard preprocessing steps on it such as stop words removal,
tokenization, URL, and punctuation removal. TF-IDF is used for feature extraction whereas ML classifiers NB,
SVM, and DL models RNN and artificial neural networks (ANN) are used to classify data in their respective classes.
The ANN model along with negation yields the best accuracy rate of 95.67% in competition with other employed
5. ISSN: 1693-6930
TELKOMNIKA Telecommun Comput El Control, Vol. 21, No. 5, October 2023: 1084-1101
1088
models. ANN also performs better in terms of other performance metrics. Sivakumar and Uyyala [25]
implemented fine-tuned LSTM based on fuzzy logic over the amazon reviews for cell phones (ACPR), Amazon
reviews for video games (AVGR), and consumer random reviews for Amazon products (CRAP) datasets. They
performed tokenization, spelling correction, normalization, lemmatization, and long-sentence splitting over the
datasets and passed it through BoW-based word embedding. The proposed LSTM is tested on 100−500 epoch
settings, batch sizes of 4−32, Adam optimizer, and dropout layer where it achieves an overall accuracy rate of
96.93%, 83.82 and 90.92% for ACPR, AVGR, and CRAP datasets, respectively.
2.3. Opinion-based reviews classification using ML and DL models
Huang [26] collected the data from e-commerce, social, and comment websites and applied certain
preprocessing steps including frequent term filtration, rules mining and dictionary formation. After the mining
phase, the feature vectors are generated against each acquired comment sentiment by identifying the comment’s
polarity, considering of formulated dictionary, and computing the scores of effectiveness. Finally, the data is
analyzed for any trace of risk or fraud by merging all the computed indices with feature vectors and mining the
anomalies. The analysis phase used methods of syntactic analysis, true comments are detected based on their
unanimity, and false comments are detected based on the deviation of their indices from those of the original
comment’s indices. The proposed schema reached a maximum credibility (accuracy) score of 83.14% on the
utilized e-commerce product datasets. Vanaja and Belwal [27] performed SA on customer opinions and
feedback against Amazon products after identifying and tagging parts of speech from the formulated dataset.
The Apriori algorithm is used to perform feature derivation. It is applied to the dataset to extract commonly
used elements and is based on association rules, which are employed in databases to establish relationships
between various features. Feature simplification is followed by the extraction of opinion words. Opinion words
are a group of adjectives that describe the features of the product. Finally, the phase of classification is performed
to categorize the opinions into positive, negative, and neutral classes. SVM and NB are amongst the ML-based
classifiers used in this case that are integrated with SentiWordNet for polarity score computation. Results indicate
that NB achieves the highest accuracy score of 90.423% as compared to SVM’s score of 83.423%.
Alzahrani et al. [28] utilized the Amazon technology products reviews dataset and implemented
mandatory preprocessing steps such as stop words removal, tokenization, and speech tagging on it. The opinion
lexicon is utilized to compute sentiment scores. The integration of CNN and LSTM is formulated to classify
reviews into positive or negative classes. The standalone LSTM achieved an accuracy of 91.03% while the
integrated CNN-LSTM framework achieved an accuracy of 94%. Dadhich and Thankachan [18] performed
the SA and classification of product reviews provided on Amazon and Flipkart. They implemented certain data
preparation steps and generated a knowledge tree. Natural language toolkit is employed to generate a word
dictionary, compute word information, and extract textual features. Five ML classifiers are used to categorize
comments and reviews into their respective polarity classes including NB, LR, SentiWordNet, KNN, and RF.
The proposed model achieved an accuracy of 91.13% on a total of 79655 reviews. They integrated several DL
models including robustly optimized BERT (RoBERT), LSTM, GRU, and bidirectional LSTM (BiLSTM) to
perform SA of internet movie database (IMDb), American airline and Sentiment140 data corpora. Case
conversion and punctuation corrections are amongst some of the preprocessing steps performed on the utilized
datasets. Glove-based word embedding is applied to the data to perform data augmentation and feature
extraction. Experiments are conducted with various utilized DL-models’ combinations where RoBERT-LSTM
models yielded 91.37% accuracy, RoBERT-BiLSTM model yielded 91.21% accuracy, RoBERT-LSTM model
yielded 91.37% accuracy and RoBERT-GRU achieved 91.52% accuracy. Tan et al. [29] derived a large amount
of textual data from various web sources, blogs, networks, and search mediums during the COVID tenure.
Language, geographical, time, and creator filtering are performed for opinion mining. Noise filtering,
tokenization, and normalization are performed for aspect mining. Opinion classification is then performed
using supervised, semi and non-supervised ML algorithms. The proposed work shows that ML models show
promising accuracy in the classification of opinions within sentiments. Qureshi et al. [30] applied SA to Roman
Urdu upon reviews dataset collected from YouTube. The reviews were based on comments given against
different Pakistani and Indian songs. After applying mandatory preprocessing steps, the data from different
files are integrated into a single comprehensive file and annotated as either positive or negative by the language
experts. Finally, the classification is carried out using several ML algorithms such as NB, SVM, LR, DT, KNN,
and CNN. Among all the applied models, LR turned out to be the best-performing model with an accuracy of
92.25% while the CNN performed worst with an accuracy of just 66.54% when applied to a total of 24,000
reviews containing half divisions of positive and negative ones.
Taking the advancements in AI to the next level, Cambria et al. [31] suggested a commonsense-based
neuromyotonic framework named SenticNet 7, that seeks to address these problems. They developed reliable
symbolic representations that transform natural language into a kind of protolanguage and, as a result, extract
polarity from text in a fully interpretable and comprehensible way using unsupervised and reproducible sub
6. TELKOMNIKA Telecommun Comput El Control
Amazon products reviews classification based on machine learning, … (Saman Iftikhar)
1089
symbolic techniques like auto-regressive language models and kernel methods. The formulated model utilizes
techniques of primitive discovery, affective similarity, primitive pairing and setting sentiment pavements for
the learned predicates. The proposed model achieves an average accuracy of over 80% when tested against
10 benchmark SA datasets. For aspect-based sentiment analysis, Zhao and Yu [32] suggested the BERT model
of knowledge-enhanced language representation. By incorporating sentiment domain knowledge into the
language representation model, which extricates the vectorization formats of mappings included in the knowledge
graph and predicates in the text in a consistent vector space, their proposal makes use of the additional information
from a sentiment knowledge graph. Additionally, by introducing outside information into the language
representation model to make up for the sparse training data, the model can perform better with less training data.
The model may therefore deliver comprehensive and understandable findings for aspect-based sentiment analysis.
After going through some of the mentioned literature work, it is noticed that most works have either used
standalone N-gram methods or deep word embedding techniques in their works but very few have used a
combination of both for the classification of Amazon product reviews. Also, there has not been any elaborative
comparison of core natural language processing (NLP) transformer-based methods with the standard ML and DL
models. Taking the lead from all these aspects, the presented study aims to provide an elaborative comparison of
transformer-based methods with standard ML and DL models. Also, the proposed work looks to implement multiple
feature extraction methodologies including both N-gram and word embedding models for better conceptualization.
3. METHOD
A framework for the classification of user-posted reviews on Amazon products is proposed in this
work. Two publicly available Amazon product datasets are obtained where one contains reviews for cell phones
and the other contains consumer reviews for random products. The datasets are cleaned and preprocessed with
steps such as stop words and null values removal, data balancing, tokenization, and lemmatization. The dataset
reviews are labeled as positive and negative based on their star ratings out of 5 stars. Features are extracted
using N-gram methods TF-IDF, BoW, and word embedding models GloVe, and Word2vec. The extricated
features are classified using several ML algorithms including SVM, NB, RF, LR, DL-based CNN, and LSTM.
For performance comparison of model performance with current data, a transformer-based BERT model is also
formulated in this work that operates directly on preprocessed data and performs classification in parallel with
the rest of the procedure. The results of standard ML and DL approaches with BERT are compared and
analyzed with the help of certain evaluation metrics. Figure 1 shows the architectural framework for the
proposed model. All mentioned steps are discussed in their specific sections.
3.1. Data acquisition and pre-processing
Two publicly available datasets from Kaggle are obtained for this work. Both datasets are based on
reviews posted on Amazon products. The first dataset (D1) contains reviews posted for locked and unlocked cell
phones belonging to ten brands such as ASUS, Google, Xiaomi, Sony, Samsung, and others. It contains important
information such as title, brand, URL, reviews, ratings, and price of cell phones. The second dataset (D2) is the
combination of the cell phone reviews dataset (D1) with another dataset comprising reviews posted by 34,000
customers against random Amazon products including electronic appliances, gadgets, and products mostly.
It contains main data attributes such as brand, category, manufacturer, reviews, date of review posting, and date
of it being seen. Table 1 provides an overview of data statistics after exploratory data analysis (EDA). EDA on
both datasets is performed after using preprocessing techniques including tokenization and lemmatization.
Table 1. Dataset statistics
Attribute Value
Dataset 1 (D1) 67986
D1-positive sentiments 51328
D1-negative sentiments 16658
D1-word count 6239978
D1-character count 24727439
D1-sentence count 102656
Dataset 2 (D2) 130978
D2-positive sentiments 109189
D2-negative sentiments 21789
D2-word count 11079466
D2-character count 43865611
D2-sentence count 218378
7. ISSN: 1693-6930
TELKOMNIKA Telecommun Comput El Control, Vol. 21, No. 5, October 2023: 1084-1101
1090
Both the D1 and D2 are highly unbalanced in their natural state which can cause the model to be
biased in its predictions. To make sure both datasets are equally balanced, D1 is balanced using the approach
of under-sampling while D2 is balanced using over-sampling. Since the datasets are based on real-world
reviews of people around the world and they belong to various product categories, they contain noise and other
artifacts that may cause performance deification. Therefore, these problems are addressed through the
implication of several preprocessing steps including stop words removal, balancing the data classes,
tokenization, and lemmatization. Apart from applying the mentioned preprocessing techniques, data labeling
is also performed as per the review score out of 5. Reviews with a rating greater than 3 are considered positive
while those with a rating below it, are considered negative and are labeled accordingly.
Figure 1. Architecture for the proposed Amazon product reviews analysis and classification
3.2. Feature extraction
The model cannot work on or categorize the data in its regular textual form after data preparation and
balancing, which is why it must be translated into mathematical and vector format so that the ML and DL
algorithms can interpret it. The vectorial data retrieved from the text is then fed into the ML and DL models as
features. A complete representation of the words in the corpus must be extracted, and there are several methods
for doing so. Deep and textual feature extrication approaches such as word embedding, and N-gram methods are
used to extract the features. GloVe from Sandford NLP and Word2vec from Google news vectors are utilized as
pre-trained word embeddings in the proposed study. BoW and TF-IDF are used to extract N-gram-based features.
These approaches are addressed below with the planned study.
3.2.1. Textual features
Any series of word tokens in each data is referred to as an N-gram, with 𝑛 = 1 denoting a unigram,
𝑛 = 2 denoting a bigram, and so on. An N-gram model can calculate and forecast the likelihood of N-grams
in a data corpus. Such models are effective in text classification problems where the number of particular terms
contained in the vocabulary from the corpus must be counted [26]. The TF-IDF is a metric that assesses how
closely a word in a catalog corresponds to its meaning or mood. It works by taking the frequency of terms in a
document and multiplying it by the inverse frequency of words that appear in several texts regularly [27].
The frequency of documents in a corpus is calculated using TF-capabilities IDF [29], which is represented in (1).
8. TELKOMNIKA Telecommun Comput El Control
Amazon products reviews classification based on machine learning, … (Saman Iftikhar)
1091
𝑤𝑚,𝑛 = 𝑓𝑚,𝑛
𝑡
𝑥 log(
𝐾
𝑓𝑚
) (1)
Where, 𝑤𝑚,𝑛 indicates the weight of data points 𝑚 and 𝑛, 𝑓𝑚,𝑛
𝑡
is used to compute the occurrence
frequency of target point 𝑚 within reference point 𝑛, 𝐾 shows the total number of included documents within
the compilation, 𝑙𝑜𝑔(
𝐾
𝑓𝑚
) is used for 𝑙𝑜𝑔 computation of target data point 𝑚 in all the dataset documents. BoW
may also be used to extract valuable qualities from textual material that has to be categorized. It operates based
on a predetermined vocabulary and searches for the frequency of certain terms in the document in question
using that vocabulary. The model simply cares if known terms appear in the document, not where they appear,
and it generates a histogram of such words within the data that can be readily fed to classifiers [30]. The (2) is
used by BoW to build bags containing words.
𝑑𝑚 = ∑ 𝑤𝑚
𝑛
𝐾
𝑚=1 𝑥 𝑤𝑚 (2)
Where 𝑑𝑏 indicates the document in which target data point 𝑚 is present. 𝑤𝑚
𝑛
assigns the weights to the
target point 𝑚 concerning reference point 𝑛. 𝑤𝑚 is the weight of target point 𝑚 which in this scenario is our point
of concern. Both TF-IDF and BoW are employed to derive features from the preprocessed dataset in the proposed
study. A collection of four ML classifiers is used to evaluate and classify the retrieved features from both models.
3.2.2. Word embeddings
Word embedding [29] is a technique for converting and representing textual data made up of words
into a vector and mathematical form. There are several models available for this purpose, however, we used
the pre-trained GloVe from Stanford NLP [32] and Wor2vec [33] from Google news vectors in this study.
GloVe is an unsupervised learning technique that uses the global word co-occurrence matrix to extract word
embeddings from an input data corpus. When applied to any data, it directly obtains information about the words
occurring frequently in that data and maps the words into vector spaces [30]. It is trained on global statistics of
words included in a large corpus compiled from online sources and when applied to any data, it obtains
information about the words occurring frequently in that data and maps the words into vector spaces. It has been
frequently used to derive features and pass them on to classification models in text classification challenges.
As (3) shows, it is based on the bilinear (LBL) model, which operates on the idea of weighted least squares [31].
𝑤𝑚 . 𝑤𝑛 = log 𝑝𝑟𝑜𝑏(𝑚|𝑛) (3)
Here, 𝑤𝑚, 𝑤𝑛 is the weightage of target and reference data points and 𝑝𝑟𝑜𝑏(𝑚|𝑛) is their probability
of occurrence. The working logic behind GloVe is represented in (4).
𝑙𝑜𝑠𝑠 = ∑ 𝑓(𝑋𝑚,𝑛)(𝑤𝑚
𝑡
𝑤𝑛 − log 𝑋𝑚,𝑛)2
𝐾
𝑚,𝑛=1 (4)
Where, 𝑓(𝑋𝑚,𝑛) is the least-squares mapping function between the data points and 𝑤𝑚
𝑡
𝑤𝑛 shows the
weights for points concerning time 𝑡. Word2vec is a word embedding approach that uses the skip-gram method
to provide this capability and is based on shallow deep networks. Based on the frequency of documents and
their co-occurrence matrix, it builds vectors of textual data included in the corpus. The skip-gram approach is
used by Word2vec to execute computations, as shown in (5).
1
𝑇
∑ ∑ log 𝑝𝑟𝑜𝑏(𝑤𝑑𝑝𝑜𝑠+1|𝑤𝑑𝑚)
−1≤𝑚≤1,𝑚≠0
𝐾
𝑝𝑜𝑠=1 (5)
Where 𝐾 is the size of the corpus, pos is the position of a word 𝑤𝑑𝑚 in data 𝐾, log 𝑝𝑟𝑜𝑏(𝑤𝑑𝑚+1|𝑤𝑑𝑚)
is the log of 𝑤𝑑𝑚 as it keeps on updating its positions and locales within the document [34]. The preprocessed
data is also sent to both the GloVe and Word2vec models in the proposed study, and the features created by them
are then given a customized CNN as well as LSTM where the results are assessed.
3.3. Transformer-based mode
Deep models based on transformers are currently commonly utilized in NLP. For user-based
e-commerce product reviews classification in the proposed study, BERT is implemented. The encoder and
decoder are the two major components of a transformer. The encoder takes words as input and generates
embedding that encapsulates the meaning of the word, while the decoder uses the encoder’s embedding to
construct the next word until the sentence is completed. To effectively extract a contextual representation of
provided phrases, we used BERT as a sentence encoder. BERT overcomes the unidirectional constraint by
employing mask language modeling (MLM) [35]. It masks multiple tokens from the input at random and uses
just the input to predict the original vocabulary id of the masked word. MLM has increased BERT’s ability to
outperform when compared to previous embedding methodologies. It is a deeply bidirectional system that can
9. ISSN: 1693-6930
TELKOMNIKA Telecommun Comput El Control, Vol. 21, No. 5, October 2023: 1084-1101
1092
analyze unlabeled text at all levels by conditioning on both left and right contexts using a transformer backend
and the attention mechanism. When the attention mechanism receives the input data, it maps it to a
multidimensional space and calculates the significance of each data point. The inputs are then contained in output
transformations, and output solutions are generated by the layer stacks in both the encoder and decoder [36].
3.4. Classification
After all the above-mentioned steps, the feature sets from the N-gram methods are given as input to
the four ML classifiers, and those from the word embedding methods to DL-based CNN and LSTM are
classified in their respective classes. The proposed work uses four ML-classifiers random forest (RF), support
vector machine (SVM), logistic regression (LR), and multinomial Naïve Bayes (MNB)as well as DL-based
CNN comprising embedding, convolutional, max-pooling, and SoftMax layer and LSTM for the classification
of textual and word embedding feature sets respectively. In parallel, BERT also performs classification by taking
in the preprocessed dataset and providing their class prediction. All the results are discussed in detail in section 4.
4. EXPERIMENTATION AND RESULTS
The proposed framework takes raw input data containing Amazon product reviews and applies certain
preprocessing and data balancing steps to it. N-gram methods TF-IDF [37], BoW, and word embedding
methods GloVe and Word2vec are then implemented for feature extraction. The extricated future sets are
finally classified using certain ML and DL algorithms. BERT is also implemented to derive transformer-based
features from the preprocessed data and the predictions generated by it are compared with ML and DL models
for a comparative study with the help of performance evaluation metrics. The experimental work is carried out
on both datasets individually where all the steps are applied on D1 followed by the implementation of the same
steps on D2. The sections below will cover the experiments conducted on D1 followed by D2 and then provide
an elaborative comparison of both.
In the first experiment conducted on D1, textual features are given to four ML classifiers SVM, RF,
LR, and MNB. The results evaluated by performance evaluation measures (PEM) such as accuracy, precision,
recall, and f1-score, are shown in Table 2. The experiments are carried out in Python, and the package used to
integrate the model into the experimental space is called the “sklearn” ensemble. All the models are trained
and evaluated using 90% and 10% of the dataset respectively.
Table 2. Classification results of ML models with textual features of D1
PEM SVM-TF-
IDF (%)
SVM-BoW
(%)
RF-TF-IDF
(%)
RF-BoW
(%)
LR-TFIDF
(%)
LR-BoW
(%)
MNB-TF-
IDF (%)
MNB-BoW
(%)
Accuracy 86.16 86.67 82.53 82.32 88.47 86.52 86.88 87.18
Precision 86 87 83 82 89 87 87 87
F1 score 86 87 83 82 89 87 87 87
Recall 86 87 83 82 89 87 87 87
Figure 2. Results comparison of ML models with textual features of D1
10. TELKOMNIKA Telecommun Comput El Control
Amazon products reviews classification based on machine learning, … (Saman Iftikhar)
1093
Figure 2 shows a graphical performance comparison of all four ML models with textual features derived
from D1. The LR and MNB classifiers provide the highest accuracies of 88.47% and 87.18% with TF-IDF and
BoW features respectively. Figure 3 shows the performance comparison for both LR and MNB. The RF classifier
yields the lowest results while only achieving accuracies of 82.53% and 82.32% on TF-IDF and BoW features
respectively. RF also has the lowest precision, f1-score and recall as compared to the rest. Apart from RF, other
classifiers perform almost alike with a small difference with not a huge accuracy gap between TF-IDF and BoW
derived features.
Figure 3. Performance comparison of best performing LR and MNB models on D1
In the second experiment conducted on D1, the two DL models including CNN and LSTM are provided
with the word embedding features extracted by both GloVe and Word2vec models. CNN, which is selected for
speed and accuracy is trained and tested on the same data split utilized in ML algorithms. The number of epochs
varied from 5 to 100 and a batch size of 32 is maintained for CNN training. In the case of LSTM, the batch size
is kept at 32, epochs are set to 5, while the main layers that constitute LSTM are embedding, dense and SoftMax
layers. Apart from providing the word embeddings to these ML and DL models, the preprocessed dataset is given
as an input to BERT, which derives its encodings from the preprocessed D1, takes as input the preprocessed D1,
extracts embedding representations from it, and maps transformations on it. Finally, it decodes the representations
back into vocabulary-based representations and uses its deep layers to perform classification. The same data split
is maintained for BERT as well while the number of epochs is kept at 5 epochs and a batch size of 16 is maintained.
Table 3 shows the results of CNN, LSTM, and BERT when word embeddings and prepared datasets are given to
them respectively.
As evident from Table 3 that CNN performs better as compared to LSTM in terms of all PEMs when
applied to word embeddings derived from D1. CNN excels in terms of accuracy and other PEMs for both GloVe
and Word2vec features. On the other hand, BERT outperforms both CNN and LSTM and achieves the highest
performance rates with an accuracy of 90%. BERT is the best-performing model on D1 as its performance
dominates that of ML and DL model’s performance concerning accuracy and other PEMs. Figure 4 shows the
graphical visualization of the accuracy and loss of CNN in Figure 4(a) and Figure 4(b) respectively.
Table 3. Classification results of DL algorithms with word embedding features of D1
PEM CNN-GloVe (%) CNN-Word2vec (%) LSTM-GloVe (%) LSTM-Word2vec (%) BERT
Accuracy 87.75 86.46 86.37 86.28 90
Precision 88 87 86 86 90
F1 score 88 87 86 86 89
Recall 88 86 86 86 89
As shown in Figure 5 CNN initiates with less accuracy and a higher loss rate while training on both
GloVe and Word2vec features but then goes on to achieve a considerably higher accuracy rate. The reason for
that is that CNN gradually trains on the input data, starts learning deep features from the data using its deep layers.
As time progresses and layers get more and more trained, the predictions start becoming better and loss rate
significantly falls. DL models perform better on larger datasets as they have much input to learn their features from
so the more data is given to them, better prediction starts showing up. Figure 5 shows the accuracy and loss ratio for
the LSTM model when trained and evaluated on word embeddings in Figure 5(a) and Figure 5(b) respectively.
11. ISSN: 1693-6930
TELKOMNIKA Telecommun Comput El Control, Vol. 21, No. 5, October 2023: 1084-1101
1094
LSTM also shows an uprising curve with the time when the accuracy rate increases, and the loss rate
decreases. Figure 6 shows the accuracy graph for BERT for epochs when trained and tested on D1 in Figure 6(a)
and Figure 6(b) respectively. The reason for that is that LSTM gradually trains on the input data, starts learning deep
features from the data using its deep layers. As time progresses and layers get more and more trained, the predictions
start becoming better and loss rate significantly falls. DL models perform better on larger datasets as they have much
input to learn their features from so the more data is given to them, better prediction starts showing up.
(a) (b)
Figure 4. Accuracy and loss ratio visualization of CNN with (a) GloVe features of D1 and (b) Word2vec
embedding features of D1
After the implementation of all steps included in the proposed methodology on D1, the same steps are
repeated for D2. In the first experiment conducted on D2, textual features are given to four ML classifiers SVM,
RF, LR, and MNB. The results evaluated by performance evaluation metrics accuracy, precision, recall, and
f1-score, are shown in Table 4. These experiments are also carried out in Python with the “sklearn” ensemble
integration package. All the models are trained and evaluated using 90% and 10% of the dataset, respectively.
Figure 7 shows a graphical performance comparison of all four ML models with textual features derived
from D2. The SVM and LR classifiers provide the highest accuracy rates of 94.02% and 93.99% with Figure 8
shows the performance comparison of both LR and SVM when applied to textual features of D2. TF-IDF and
BoW features, respectively and hence are the best-performing models. All the ML models perform a lot better in
general in the case of D2 as compared to D1 as can be observed by comparing Table 2 and Table 3. The reason
could be that D2 is better prepared and engineered as compared to D1.
Same to the experiments conducted on D1, the two DL models including CNN and LSTM are
provided with the word embedding features extracted by both GloVe and Word2vec models from D2. In the
case of CNN, the number of epochs is increased from 5 to 10 while a batch size of 32 is maintained. In the case
of LSTM, the batch size is kept at 32, epochs are set to 5, and the same deep layers are maintained. In parallel
to that, the preprocessed D2 is fed into BERT for the derivation of transformer-based representations while the
same number of epochs and batch size are maintained. Table 5 shows the results of CNN, LSTM, and BERT
when word embedding features and processed D2 are given to them, respectively.
As evident from Table 5 CNN outperforms both LSTM and BERT for both GloVe and Word2vec
features. It achieves an accuracy of 97.12% for GloVe and 96.66% for Word2vec features which is
considerably superior to its counterparts. The reason for such an increment in performance over D2 could be
the preparation and feature engineering of D2 as compared to D1. This better preparation and engineering led
to an increment in the performance of all ML models, DL-based CNN, LSTM as well as BERT in the case of
D2 whereas all the models were limited to a maximum of 90% accuracy in case of D1. Figure 9 shows the
graphical visualization of the accuracy and loss of CNN in Figure 9(a) and Figure 9(b). CNN started with a
high loss rate, eventually learns deep features and improves its performance over time and epochs.
Table 4. Classification results of ML models with textual features of D2
PEM
SVM-TF-
IDF (%)
SVM-BoW
(%)
RF-TF-IDF
(%)
RF-BoW
(%)
LR-TFIDF
(%)
LR-BoW
(%)
MNB-TF-
IDF (%)
MNB-BoW
(%)
Accuracy 94.02 93.07 92.98 93.47 93.73 93.99 93.53 92.93
Precision 93 92 86 94 93 93 89 92
F1 score 94 93 93 93 94 94 94 93
Recall 93 93 90 90 91 93 91 92
12. TELKOMNIKA Telecommun Comput El Control
Amazon products reviews classification based on machine learning, … (Saman Iftikhar)
1095
(a) (b)
Figure 5. Accuracy and loss ratio visualization of LSTM with (a) GloVe features of D1 and (b) Word2vec
embedding features of D1
(a) (b)
Figure 6. BERT model with (a) training accuracy on D1 and (b) loss graph on D1
Figure 7. Results comparison of ML models with textual features of D2
13. ISSN: 1693-6930
TELKOMNIKA Telecommun Comput El Control, Vol. 21, No. 5, October 2023: 1084-1101
1096
Figure 8. Performance comparison of best performing LR and SVM models on D2
Table 5. Classification results of DL models and BERT with word embedding features of D2
PEM CNN-GloVe (%) CNN-Word2vec (%) LSTM-GloVe (%) LSTM-Word2vec (%) BERT
Accuracy 97.12 96.66 93.91 92.26 95.17
Precision 97 97 94 93 95
F1 score 97 97 94 92 85
Recall 97 97 94 92 85
Figure 10 shows the accuracy and loss ratio for the LSTM model in Figure10(a) and Figure 10(b)
respectively, when trained and evaluated on word embedding features derived from D2 while maintaining the
same settings as in the case of D1. Here again LSTM starts with a high loss rate and minimum accuracy and
eventually excels in terms of performance. The reason for that is that LSTM gradually trains on the input data,
starts learning deep features from the data using its deep layers. As time progresses and layers get more and
more trained, the predictions start becoming better and loss rate significantly falls. DL models perform better
on larger datasets as they have much input to learn their features from so the more data is given to them, better
prediction starts showing up.
(a) (b)
Figure 9. Accuracy and loss ratio visualization of CNN with (a) GloVe features of D2 and (b) Word2vec
embedding features of D2
After the implementation of CNN and LSTM upon D2’s word embedding features. The preprocessed
D2 is given to BERT to compare the performance of the ML and DL models performance with it. BERT takes in
the data and performs self-driven classification results which are discussed in Table 5. Figure 11 shows the accuracy
to lose ratio graph for BERT for epochs when trained and tested on D2 in Figure 11(a) and Figure 11(b) respectively.
14. TELKOMNIKA Telecommun Comput El Control
Amazon products reviews classification based on machine learning, … (Saman Iftikhar)
1097
(a) (b)
Figure 10. Accuracy and loss ratio visualization of LSTM model with (a) GloVe features of D2 and
(b) Word2vec embedding features of D2
(a) (b)
Figure 11. BERT model for (a) training accuracy on D2 and (b) loss graph on D2
5. DISCUSSION
All the experiments performed for the proposed work are discussed in detail in the preceding section
along with the results. It is quite evident from experiments conducted on D1 that ML models perform better in
general as compared to DL models in terms of accuracy and other PEMs. Although BERT outperforms both
ML and DL models regarding the accuracy of 90% as well as all other PEMs. The accuracy comparison of
ML, DL models, and BERT when applied to D1, is visualized in Figure 12.
In the case of D2, DL-based CNN outperforms all the ML models along with BERT. It achieves the
highest accuracy rates of 97.12% for GloVe and 96.66% for Word2vec features which is considerably superior
to its counterparts. The accuracy comparison of ML, DL models, and BERT when applied on D2, is visualized
in Figure 13.
15. ISSN: 1693-6930
TELKOMNIKA Telecommun Comput El Control, Vol. 21, No. 5, October 2023: 1084-1101
1098
The fact to be noticed here is that experiments conducted on D1 show considerably fewer performance
rates in general as the highest accuracy, in this case, turn out to be 90% by the BERT. After merging D1 with
Amazon’s general product reviews dataset to form D2 and performing over-sampling on it, the performance
of the model is increased to a huge extent. This phenomenon is demonstrated in Figure 14 where ML, DL, and
BERT models are compared based on accuracy upon textual features, word embedding features, and datasets
themselves, respectively, in the case of both D1 and D2. The merged dataset D2 produces better results in
general, as compared to D1. The proposed model excels in terms of accuracy and other PEMs as compared to
D1. It is evident from Figure 14, Figure 15, and Figure 16 that the performance of ML models, DL models and
BERT is better on D2 as compared to D1.
Figure 12. Accuracy comparison of ML, DL
models, and BERT on D1
Figure 13. Accuracy Comparison of ML, DL
models, and BERT on D2
Figure 14. Accuracy comparison of ML models
over D1 and D2
Figure 15. Accuracy comparison of DL models over
D1 and D2
Figure 16. Accuracy comparison of BERT over D1 and D2
16. TELKOMNIKA Telecommun Comput El Control
Amazon products reviews classification based on machine learning, … (Saman Iftikhar)
1099
6. CONCLUSION
The ever-growing trend of online shopping through e-commerce platforms has resulted in a massive
surge of reviews posted regarding various product categories on them. Analyzing these reviews helps product
and platform owners to improve their services and maintain higher standards. Since the reviews are present in
bulk, contain opinions of people around the globe, and are based on positive and negative sentiments, manually
sorting, and analyzing them is not possible. Therefore, an automated system is presented in this work that takes
input from the two product review datasets gathered from Amazon. The implication of several preprocessing
steps ensures that the data is well-balanced and ready for the feature extraction phase. The feature extraction is
carried out with textual feature derivation techniques, such as TF-IDF, BoW, and word embedding feature
extractors GloVe, and Word2vec. Multiple ML models including SVM, RF, LR, MNB, and a couple of DL
models CNN, and LSTM are applied to textual and word embedding features, respectively. BERT is also applied
to both datasets to compare its performance with ML and DL models. BERT turns out to be the best-performing
model in the case of D1 with an accuracy of 90% on features derived by word embedding models while RF
provides the minimum accuracy rate of 86% on textual features. In the case of D2, CNN provides the best
accuracy of 97% upon word embedding features while RF and MNB show the minimum accuracy rates of
92%. The proposed model shows better overall performance on D2 as compared to D1. The model shows better
overall results on D2 with the increase in data when compared with several performance metrics. The proposed
model uses two datasets to perform preprocessing, feature engineering, textual feature extraction, deep feature
extraction, classification based on ML as well as DL models and BERT-based classification. When textual
features are classified using ML models, The LR and MNB classifiers provide the highest accuracies of 88.47%
and 87.18% with TF-IDF and BoW features, respectively whereas in case of deep features, BERT has the
highest accuracy of 90% as compared to both ML and DL models. This causes the limitation in case of D1 as
the overall accuracy is not exceeding 90%. In future we might have to apply much better methodologies for
data cleaning, pruning and feature engineering and also optimize ML, DL models and BERT according to them
to increase our overall accuracy.
In case of D2, the DL-based CNN achieves an accuracy of 97.12% on deep features extracted by
GloVE and Word2vec, outperforming ML models by a large margin and BERT by a significant margin. This
proves that D1 and D2 have provided different results indicating that their preparation or feature extraction has
a lot of difference. We need to study the difference closely and apply the best result deriving methodology in
future works. To derive much better results, we can further enhance our CNN model, BERT models and look
to apply generative pre-trained transformer (GPT) for much better results.
ACKNOWLEDGEMENTS
The authors would like to thank Arab Open University research group number: “AOURG-2023-020”,
Saudi Arabia for supporting this study.
REFERENCES
[1] R. Liang and J. -Q. Wang, “A linguistic intuitionistic cloud decision support model with sentiment analysis for product selection in
E-commerce,” International Journal of Fuzzy Systems, vol. 21, pp. 963–977, 2019, doi: 10.1007/s40815-019-00606-0.
[2] Y. Basani, H. V. Sibuea, S. I. P. Sianipar, and J. P. Samosir, “Application of sentiment analysis on product review e-commerce,”
Journal of Physics: Conference Series, 2019, doi: 10.1088/1742-6596/1175/1/012103.
[3] D. Coppola, Amazon - Statistics & Facts, Statista, Jun. 2023. [Online]. Available:
https://www.statista.com/topics/846/amazon/#topicOverview
[4] Y. Ma, Alibaba Group - Statistics & Facts, Statista Key Figures of E-Commerce, Nov. 2022. [Online]. Available:
https://www.statista.com/topics/2187/alibaba-group/
[5] Revenue of Flipkart Private Limited between financial year 2014 and 2021, Statista Key Figures of E-Commerce, 2022. [Online].
Available: statista.com/statistics/1053314/india-flipkart-revenue/
[6] R. Ireland and A. Liu, “Application of data analytics for product design: Sentiment analysis of online product reviews,” CIRP
Journal of Manufacturing Science and Technology, vol. 23, pp. 128–144, 2018, doi: 10.1016/j.cirpj.2018.06.003.
[7] B. S. Rintyarna, R. Sarno, and C. Fatichah, “Evaluating the performance of sentence level features and domain sensitive features of product
reviews on supervised sentiment analysis tasks,” Journal of Big Data, vol. 6, no. 84, 2019, doi: 10.1186/s40537-019-0246-8.
[8] C. Janiesch, P. Zschech, and K. Heinrich, “Machine learning and deep learning,” Electronic Markets, vol. 31, pp. 685–695, 2021,
doi: 10.1007/s12525-021-00475-2.
[9] A. M. Hoyle, P. Goel, and P. Resnik, “Improving neural topic models using knowledge distillation,” in Proceedings of the 2020 Conference
on Empirical Methods in Natural Language Processing (EMNLP), 2020, pp. 1752–1771, doi: 10.18653/v1/2020.emnlp-main.137.
[10] D. A. J. Daniel and M. J. Meena, “A Novel Sentiment Analysis for Amazon Data with TSA based Feature Selection,” Scalable
Computing: Practice and Experience, vol. 22, no. 1, 2021, doi: 10.12694/scpe.v22i1.1839.
[11] X. Li, X. Sun, Z. Xu and Y. Zhou, “Explainable Sentence-Level Sentiment Analysis for Amazon Product Reviews,” 2021 5th International
Conference on Imaging, Signal Processing and Communications (ICISPC), 2021, pp. 88-94, doi: 10.1109/ICISPC53419.2021.00024.
[12] N. Shrestha and F. Nasoz, “Deep learning sentiment analysis of amazon. com reviews and ratings,” International Journal on Soft
Computing, Artificial Intelligence and Applications (IJSCAI), vol. 8, no. 1, 2019, doi: 10.48550/arXiv.1904.04096.
[13] E. I. Elmurngi and A. Gherbi, “Unfair reviews detection on amazon reviews using sentiment analysis with supervised learning
techniques,” Journal of Computer Science, vol. 14, no. 5, pp. 714–726, 2018, doi: 10.3844/jcssp.2018.714.726.
17. ISSN: 1693-6930
TELKOMNIKA Telecommun Comput El Control, Vol. 21, No. 5, October 2023: 1084-1101
1100
[14] M. V. Rao and Sindhu C., “Detection of Sarcasm on Amazon Product Reviews using Machine Learning Algorithms under Sentiment
Analysis,” 2021 Sixth International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET),
2021, pp. 196-199, doi: 10.1109/WiSPNET51692.2021.9419432.
[15] S. Wassan, X. Chen, T. Shen, M. Waqar, and N. Jhanjhi, “Amazon product sentiment analysis using machine learning techniques,”
Revista Argentina de Clínica Psicológica, vol. 30, no. 1, pp. 695-703, 2021. [Online]. Available:
https://www.researchgate.net/publication/349772322_Amazon_Product_Sentiment_Analysis_using_Machine_Learning_Techniques
[16] M. Hawlader, A. Ghosh, Z. K. Raad, W. A. Chowdhury, M. S. H. Shehan, and F. B. Ashraf, “Amazon Product Reviews: Sentiment
Analysis Using Supervised Learning Algorithms,” 2021 International Conference on Electronics, Communications and Information
Technology (ICECIT), 2021, pp. 1-6, doi: 10.1109/ICECIT54077.2021.9641243.
[17] N. M. Alharbi, N. S. Alghamdi, E. H. Alkhammash, and J. F. Al Amri, “Evaluation of sentiment analysis via word embedding and
RNN variants for Amazon online reviews,” Mathematical Problems in Engineering, vol. 2021, 2021, doi: 10.1155/2021/5536560.
[18] A. Dadhich and B. Thankachan, “Sentiment Analysis of Amazon Product Reviews Using Hybrid Rule-based Approach,” International
Journal of Engineering and Manufacturing (IJEM), vol. 11, no. 2, 2021, doi: 10.5815/ijem.2021.02.04.
[19] U. Norinder and P. Norinder, “Predicting Amazon customer reviews with deep confidence using deep learning and conformal prediction,”
Journal of Management Analytics, vol. 9, no. 1, pp. 1–16, 2022, doi: 10.1080/23270012.2022.2031324.
[20] P. Bhuvaneshwari, A. N. Rao, Y. H. Robinson, and M. Thippeswamy, “Sentiment analysis for user reviews using Bi-LSTM self-attention
based CNN model,” Multimedia Tools and Applications, vol. 81, pp. 12405–12419, 2022, doi: 10.1007/s11042-022-12410-4.
[21] N. Nandal, R. Tanwar, and J. Pruthi, “Machine learning based aspect level sentiment analysis for Amazon products,” Spatial
Information Research, vol. 28, pp. 601–607, 2020, doi: 10.1007/s41324-020-00320-2.
[22] S. Dey, S. Wasif, D. S. Tonmoy, S. Sultana, J. Sarkar, and M. Dey, “A Comparative Study of Support Vector Machine and Naive
Bayes Classifier for Sentiment Analysis on Amazon Product Reviews,” 2020 International Conference on Contemporary
Computing and Applications (IC3A), 2020, pp. 217-220, doi: 10.1109/IC3A48958.2020.233300.
[23] H. Zhao, Z. Liu, X. Yao, and Q. Yang, “A machine learning-based sentiment analysis of online product reviews with a novel term weighting
and feature selection approach,” Information Processing & Management, vol. 58, no. 5, 2021, doi: 10.1016/j.ipm.2021.102656.
[24] P. Mukherjee, Y. Badr, S. Doppalapudi, S. M. Srinivasan, R. S. Sangwan, and R. Sharma, “Effect of negation in sentences on sentiment
analysis and polarity detection,” Procedia Computer Science, vol. 185, pp. 370–379, 2021, doi: 10.1016/j.procs.2021.05.038.
[25] M. Sivakumar and S. R. Uyyala, “Aspect-based sentiment analysis of mobile phone reviews using LSTM and fuzzy logic,”
International Journal of Data Science and Analytics, vol. 12, pp. 355–367, 2021, doi: 10.1007/s41060-021-00277-x.
[26] A. Huang, “A risk detection system of e-commerce: researches based on soft information extracted by affective computing web
texts,” Electronic Commerce Research, vol. 18, pp.143-157, 2018, doi: 10.1007/s10660-017-9262-y.
[27] S. Vanaja and M. Belwal, “Aspect-Level Sentiment Analysis on E-Commerce Data,” 2018 International Conference on Inventive
Research in Computing Applications (ICIRCA), 2018, pp. 1275-1279, doi: 10.1109/ICIRCA.2018.8597286.
[28] M. E. Alzahrani, T. H. H. Aldhyani, S. N. Alsubari, M. M. Althobaiti, and A. Fahad, “Developing an Intelligent System with Deep
Learning Algorithms for Sentiment Analysis of E-Commerce Product Reviews,” Computational Intelligence and Neuroscience,
vol. 2022, 2022, doi: 10.1155/2022/3840071.
[29] K. L. Tan, C. P. Lee, K. M. Lim, and K. S. M. Anbananthen, “Sentiment Analysis With Ensemble Hybrid Deep Learning Model,”
in IEEE Access, vol. 10, pp. 103694-103704, 2022, doi: 10.1109/ACCESS.2022.3210182.
[30] M. A. Qureshi et al., “Sentiment Analysis of Reviews in Natural Language: Roman Urdu as a Case Study,” in IEEE Access, vol. 10,
pp. 24945-24954, 2022, doi: 10.1109/ACCESS.2022.3150172.
[31] E. Cambria, Q. Liu, S. Decherchi, F. Xing, and K. Kwok, “SenticNet 7: A Commonsense-based NeurosymbolicAI Framework for
Explainable Sentiment Analysis,” Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022), 2022,
pp. 3829–3839. [Online]. Available: https://sentic.net/senticnet-7.pdf
[32] A. Zhao and Y. Yu, “Knowledge-enabled BERT for aspect-based sentiment analysis,” Knowledge-Based Systems, vol. 227, 2021,
doi: 10.1016/j.knosys.2021.107220.
[33] Google news vector, kaggle, Apr. 2022. [Online]. Available: https://www.kaggle.com/datasets/adarshsng/googlenewsvectors
[34] L. Khreisat, “Arabic Text Classification Using N-Gram Frequency Statistics A Comparative Study,” International Conference on
Data Mining, 2006, pp. 78–82. [Online]. Available: https://dblp.uni-trier.de/rec/conf/dmin/Khreisat06.html
[35] P. Wang, J. Hu, H. -J. Zeng, and Z. Chen, “Using Wikipedia knowledge to improve text classification,” Knowledge and Information
Systems, vol. 19, pp. 265–281, 2009, doi: 10.1007/s10115-008-0152-4.
[36] W. Zhang, T. Yoshida, and X. Tang, “A comparative study of TF* IDF, LSI and multi-words for text classification,” Expert Systems
with Applications, vol. 38, no. 3, pp. 2758–2765, 2011, doi: 10.1016/j.eswa.2010.08.066.
[37] Z. Y. -Tao, G. Ling, and W. Y. -Cheng, “An improved TF-IDF approach for text classification,” Journal of Zhejiang University-
Science A, vol. 6, pp. 49–55, 2005, doi: 10.1007/bf02842477.
BIOGRAPHIES OF AUTHORS
Saman Iftikhar received her M.S and Ph.D. degrees in Information Technology
in 2008 and 2014, respectively, from National University of Sciences and Technology
(NUST), Islamabad, Pakistan. Currently she is serving Arab Open University, Saudi Arabia
as an Assistant Professor. Her research interests include information security, cyber security,
machine learning, data mining, distributed computing, and semantic web. On her credit,
several research papers have been published in various reputed journals and in prestigious
conferences. She can be contacted at email: s.iftikhar@arabou.edu.sa.
18. TELKOMNIKA Telecommun Comput El Control
Amazon products reviews classification based on machine learning, … (Saman Iftikhar)
1101
Bandar Alluhaybi received the B.S. degree in computer science from King
Abdulaziz University, Saudi Arabia, in 2009, the M.S. degree in engineering system
management from St. Mary’s University, San Antonio, TX, USA, in 2012, and the Ph.D.
degree in computer science from King Abdulaziz University. He is a Lecturer at FCS, Arab
Open University, KSA. His current research interests include information security, computer
networks, networks security, big data, and high-performance computing. He can be contacted
at email: b.alluhaybi@arabou.edu.sa.
Mohammed Suliman received the BCA and MCA degree in computer
applications from Bangalore University, India, in 2004, 2007, respectively. He is a Lecturer
at FCS, Arab Open University, KSA. His current research interests include information
security, network security, cloud computing, big data, and software engineering. He can be
contacted at email: msuliman@arabou.edu.sa.
Ammar Saeed did his Bachelor in Computer science from COMSATS University
Islamabad, Pakistan in 2019. Currently, he is pursuing Masters in Computer Science from
COMSATS University Islamabad, Pakistan. His major areas of research interest are Machine
Learning, Natural language processing and Data Analytics. He can be contacted at email:
ammarsaeed1997@gmail.com.
Kiran Fatima Completed PHD (CS) in 2018 from National University of
Computer and Emerging Sciences Islamabad Pakistan. Majors are Artificial intelligence,
machine learning and image processing. Currently working in TAFE NSW Australia as
Network and Web Programming trainer. She can be contacted at email:
kiran.fatima4@tafensw.edu.au.