Insurance fraud claims have become a major problem in the insurance industry. Several investigations have been carried out to eliminate negative impacts on the insurance industry as this immoral act has caused the loss of billions of dollars. In this paper, a comparative study was carried out to assess the performance of various classification models, namely logistic regression, neural network (NN), support vector machine (SVM), tree augmented naïve Bayes (NB), decision tree (DT), random forest (RF) and AdaBoost with different model settings for predicting automobile insurance fraud claims. Results reveal that the tree augmented NB outperformed other models based on several performance metrics with accuracy (79.35%), sensitivity (44.70%), misclassification rate (20.65%), area under curve (0.81) and Gini (0.62). In addition, the result shows that the AdaBoost algorithm can improve the classification performance of the decision tree. These findings are useful for insurance professionals to identify potential insurance fraud claim cases.
MACHINE LEARNING ALGORITHMS FOR CREDIT CARD FRAUD DETECTIONmlaij
Fraud is a critical issue in our society today. Losses due to payment fraud are on the increase as ecommerce keeps evolving. Organizations, governments, and individuals have experienced huge losses due
to payment. Merchant Savvy projects that global losses due to payment fraud will increase to about $40.62
billion in 2027 . Among all payment fraud, credit card fraud results in a higher loss. Therefore, we intend
to leverage the potential of machine learning to deal with the problem of fraud in credit cards which can
be generalized to other fraud types. This paper compares the performance of logistic regression, decision
trees, random forest classifier, isolation forest, local outlier factor, and one-class support vector machines
(SVM) based on their AUC and F1-score. We applied a smote technique to handle the imbalanced nature
of the data and compared the performance of the supervised models on the oversampled data to the raw
data. From the results, the Random Forest classifier outperformed the other models with a higher AUC
score and better f1-score on both the actual and oversampled data. Oversampling the data didn't change
the result of the decision trees. One-class SVM performs better than isolation forest in terms of AUC score
but has a very low f1-score compared to isolation forest. The local outlier factor had the poorest
performance.
PREDICTING ACCIDENT SEVERITY: AN ANALYSIS OF FACTORS AFFECTING ACCIDENT SEVER...IJCI JOURNAL
Road accidents have significant economic and societal costs, with a small number of severe accidents
accounting for a large portion of these costs. Predicting accident severity can help in the proactive
approach to road safety by identifying potential unsafe road conditions and taking well-informed
actions to reduce the number of severe accidents. This study investigates the effectiveness of the
Random Forest machine learning algorithm for predicting the severity of an accident. The model is
trained on a dataset of accident records from a large metropolitan area and evaluated using various
metrics. Hyperparameters and feature selection are optimized to improve the model's performance.
The results show that the Random Forest model is an effective tool for predicting accident severity with
an accuracy of over 80%. The study also identifies the top six most important variables in the model,
which include wind speed, pressure, humidity, visibility, clear conditions, and cloud cover. The fitted
model has an Area Under the Curve of 80%, a recall of 79.2%, a precision of 97.1%, and an F1 score
of 87.3%. These results suggest that the proposed model has higher performance in explaining the
target variable, which is the accident severity class. Overall, the study provides evidence that the
Random Forest model is a viable and reliable tool for predicting accident severity and can be used to
help reduce the number of fatalities and injuries due to road accidents in the United States.
Progress of Machine Learning in the Field of Intrusion Detection Systemsijcisjournal
With the growth in the use of the Internet and local area networks, malicious attacks and intrusions into
computer systems are increasing. Implementing intrusion detection systems have become extremely
important to help maintain good network security. Support vector machines (SVMs), a classic pattern
recognition tool, have been widely used in intrusion detection. They can handle very large data with high
efficiency, are easy to use, and exhibit good prediction behavior. This paper presents a new SVM model
enriched with a Gaussian kernel function based on the features of the training data for intrusion detection.
The new model is tested with the CICIDS2017 dataset. The test proves better results in terms of detection
efficiency and false alarm rate, which can give better coverage and make detection more efficient.
11421ijcPROGRESS OF MACHINE LEARNING IN THE FIELD OF INTRUSION DETECTION SYST...ijcisjournal
With the growth in the use of the Internet and local area networks, malicious attacks and intrusions into computer systems are increasing. Implementing intrusion detection systems have become extremely important to help maintain good network security. Support vector machines (SVMs), a classic pattern recognition tool, have been widely used in intrusion detection. They can handle very large data with high efficiency, are easy to use, and exhibit good prediction behavior. This paper presents a new SVM model enriched with a Gaussian kernel function based on the features of the training data for intrusion detection. The new model is tested with the CICIDS2017 dataset. The test proves better results in terms of detection efficiency and false alarm rate, which can give better coverage and make detection more efficient.
Predicting reaction based on customer's transaction using machine learning a...IJECEIAES
Banking advertisements are important because they help target specific customers on subscribing to their packages or other deals by giving their current customers more fixed-term deposit offers. This is done through promotional advertisements on the Internet or media pages, and this task is the responsibility of the shopping department. In order to build a relationship with them, offer them the best deals, and be appropriate for the client with the company's assurance to recover these deposits, many banks or telecommunications firms store the data of their customers. The Portuguese bank increases its sales by establishing a relationship with its customers. This study proposes creating a prediction model using machine learning algorithms, to see how the customer reacts to subscribe to those fixed-term deposits or offers made with the aid of their past record. This classification is binary, i.e., the prediction of whether or not a customer will embrace these offers. Four classifiers that include k-nearest neighbor (k-NN) algorithm, decision tree, naive Bayes, and support vector machines (SVM) were used, and the best result was obtained from the classifier decision tree with an accuracy of 91% and the other classifier SVM with an accuracy of 89%.
Concept drift and machine learning model for detecting fraudulent transaction...IJECEIAES
In a streaming environment, data is continuously generated and processed in an ongoing manner, and it is necessary to detect fraudulent transactions quickly to prevent significant financial losses. Hence, this paper proposes a machine learning-based approach for detecting fraudulent transactions in a streaming environment, with a focus on addressing concept drift. The approach utilizes the extreme gradient boosting (XGBoost) algorithm. Additionally, the approach employs four algorithms for detecting continuous stream drift. To evaluate the effectiveness of the approach, two datasets are used: a credit card dataset and a Twitter dataset containing financial fraudrelated social media data. The approach is evaluated using cross-validation and the results demonstrate that it outperforms traditional machine learning models in terms of accuracy, precision, and recall, and is more robust to concept drift. The proposed approach can be utilized as a real-time fraud detection system in various industries, including finance, insurance, and e-commerce.
Welcome to International Journal of Engineering Research and Development (IJERD)IJERD Editor
call for paper 2012, hard copy of journal, research paper publishing, where to publish research paper,
journal publishing, how to publish research paper, Call For research paper, international journal, publishing a paper, IJERD, journal of science and technology, how to get a research paper published, publishing a paper, publishing of journal, publishing of research paper, reserach and review articles, IJERD Journal, How to publish your research paper, publish research paper, open access engineering journal, Engineering journal, Mathemetics journal, Physics journal, Chemistry journal, Computer Engineering, Computer Science journal, how to submit your paper, peer reviw journal, indexed journal, reserach and review articles, engineering journal, www.ijerd.com, research journals,
yahoo journals, bing journals, International Journal of Engineering Research and Development, google journals, hard copy of journal,
MACHINE LEARNING ALGORITHMS FOR CREDIT CARD FRAUD DETECTIONmlaij
Fraud is a critical issue in our society today. Losses due to payment fraud are on the increase as ecommerce keeps evolving. Organizations, governments, and individuals have experienced huge losses due
to payment. Merchant Savvy projects that global losses due to payment fraud will increase to about $40.62
billion in 2027 . Among all payment fraud, credit card fraud results in a higher loss. Therefore, we intend
to leverage the potential of machine learning to deal with the problem of fraud in credit cards which can
be generalized to other fraud types. This paper compares the performance of logistic regression, decision
trees, random forest classifier, isolation forest, local outlier factor, and one-class support vector machines
(SVM) based on their AUC and F1-score. We applied a smote technique to handle the imbalanced nature
of the data and compared the performance of the supervised models on the oversampled data to the raw
data. From the results, the Random Forest classifier outperformed the other models with a higher AUC
score and better f1-score on both the actual and oversampled data. Oversampling the data didn't change
the result of the decision trees. One-class SVM performs better than isolation forest in terms of AUC score
but has a very low f1-score compared to isolation forest. The local outlier factor had the poorest
performance.
PREDICTING ACCIDENT SEVERITY: AN ANALYSIS OF FACTORS AFFECTING ACCIDENT SEVER...IJCI JOURNAL
Road accidents have significant economic and societal costs, with a small number of severe accidents
accounting for a large portion of these costs. Predicting accident severity can help in the proactive
approach to road safety by identifying potential unsafe road conditions and taking well-informed
actions to reduce the number of severe accidents. This study investigates the effectiveness of the
Random Forest machine learning algorithm for predicting the severity of an accident. The model is
trained on a dataset of accident records from a large metropolitan area and evaluated using various
metrics. Hyperparameters and feature selection are optimized to improve the model's performance.
The results show that the Random Forest model is an effective tool for predicting accident severity with
an accuracy of over 80%. The study also identifies the top six most important variables in the model,
which include wind speed, pressure, humidity, visibility, clear conditions, and cloud cover. The fitted
model has an Area Under the Curve of 80%, a recall of 79.2%, a precision of 97.1%, and an F1 score
of 87.3%. These results suggest that the proposed model has higher performance in explaining the
target variable, which is the accident severity class. Overall, the study provides evidence that the
Random Forest model is a viable and reliable tool for predicting accident severity and can be used to
help reduce the number of fatalities and injuries due to road accidents in the United States.
Progress of Machine Learning in the Field of Intrusion Detection Systemsijcisjournal
With the growth in the use of the Internet and local area networks, malicious attacks and intrusions into
computer systems are increasing. Implementing intrusion detection systems have become extremely
important to help maintain good network security. Support vector machines (SVMs), a classic pattern
recognition tool, have been widely used in intrusion detection. They can handle very large data with high
efficiency, are easy to use, and exhibit good prediction behavior. This paper presents a new SVM model
enriched with a Gaussian kernel function based on the features of the training data for intrusion detection.
The new model is tested with the CICIDS2017 dataset. The test proves better results in terms of detection
efficiency and false alarm rate, which can give better coverage and make detection more efficient.
11421ijcPROGRESS OF MACHINE LEARNING IN THE FIELD OF INTRUSION DETECTION SYST...ijcisjournal
With the growth in the use of the Internet and local area networks, malicious attacks and intrusions into computer systems are increasing. Implementing intrusion detection systems have become extremely important to help maintain good network security. Support vector machines (SVMs), a classic pattern recognition tool, have been widely used in intrusion detection. They can handle very large data with high efficiency, are easy to use, and exhibit good prediction behavior. This paper presents a new SVM model enriched with a Gaussian kernel function based on the features of the training data for intrusion detection. The new model is tested with the CICIDS2017 dataset. The test proves better results in terms of detection efficiency and false alarm rate, which can give better coverage and make detection more efficient.
Predicting reaction based on customer's transaction using machine learning a...IJECEIAES
Banking advertisements are important because they help target specific customers on subscribing to their packages or other deals by giving their current customers more fixed-term deposit offers. This is done through promotional advertisements on the Internet or media pages, and this task is the responsibility of the shopping department. In order to build a relationship with them, offer them the best deals, and be appropriate for the client with the company's assurance to recover these deposits, many banks or telecommunications firms store the data of their customers. The Portuguese bank increases its sales by establishing a relationship with its customers. This study proposes creating a prediction model using machine learning algorithms, to see how the customer reacts to subscribe to those fixed-term deposits or offers made with the aid of their past record. This classification is binary, i.e., the prediction of whether or not a customer will embrace these offers. Four classifiers that include k-nearest neighbor (k-NN) algorithm, decision tree, naive Bayes, and support vector machines (SVM) were used, and the best result was obtained from the classifier decision tree with an accuracy of 91% and the other classifier SVM with an accuracy of 89%.
Concept drift and machine learning model for detecting fraudulent transaction...IJECEIAES
In a streaming environment, data is continuously generated and processed in an ongoing manner, and it is necessary to detect fraudulent transactions quickly to prevent significant financial losses. Hence, this paper proposes a machine learning-based approach for detecting fraudulent transactions in a streaming environment, with a focus on addressing concept drift. The approach utilizes the extreme gradient boosting (XGBoost) algorithm. Additionally, the approach employs four algorithms for detecting continuous stream drift. To evaluate the effectiveness of the approach, two datasets are used: a credit card dataset and a Twitter dataset containing financial fraudrelated social media data. The approach is evaluated using cross-validation and the results demonstrate that it outperforms traditional machine learning models in terms of accuracy, precision, and recall, and is more robust to concept drift. The proposed approach can be utilized as a real-time fraud detection system in various industries, including finance, insurance, and e-commerce.
Welcome to International Journal of Engineering Research and Development (IJERD)IJERD Editor
call for paper 2012, hard copy of journal, research paper publishing, where to publish research paper,
journal publishing, how to publish research paper, Call For research paper, international journal, publishing a paper, IJERD, journal of science and technology, how to get a research paper published, publishing a paper, publishing of journal, publishing of research paper, reserach and review articles, IJERD Journal, How to publish your research paper, publish research paper, open access engineering journal, Engineering journal, Mathemetics journal, Physics journal, Chemistry journal, Computer Engineering, Computer Science journal, how to submit your paper, peer reviw journal, indexed journal, reserach and review articles, engineering journal, www.ijerd.com, research journals,
yahoo journals, bing journals, International Journal of Engineering Research and Development, google journals, hard copy of journal,
UNCERTAINTY ESTIMATION IN NEURAL NETWORKS THROUGH MULTI-TASK LEARNINGgerogepatton
The impressive predictive performance of deep learning techniques on a wide range of tasks has led to its
widespread use. Estimating the confidence of these predictions is paramount for improving the safety and
reliability of such systems. However, the uncertainty estimates provided by neural networks (NNs) tend to
be overconfident and unreasonable. Previous studies have found out that ensemble of NNs typically
produce good predictions and uncertainty estimates. Inspired by these, this paper presents a new
framework that can quantitatively estimate the uncertainties by leveraging the advances in multi-task
learning through slight modification to the existing training pipelines. This promising algorithm is
developed with an intention of deployment in real world problems which already boast a good predictive
performance by reusing those pretrained models. The idea is to capture the behavior of the trained NNs for
the base task by augmenting it with the uncertainty estimates from a supplementary network. A series of
experiments show that the proposed approach produces well calibrated uncertainty estimates with high
quality predictions.
UNCERTAINTY ESTIMATION IN NEURAL NETWORKS THROUGH MULTI-TASK LEARNINGijaia
The impressive predictive performance of deep learning techniques on a wide range of tasks has led to its
widespread use. Estimating the confidence of these predictions is paramount for improving the safety and
reliability of such systems. However, the uncertainty estimates provided by neural networks (NNs) tend to
be overconfident and unreasonable. Previous studies have found out that ensemble of NNs typically
produce good predictions and uncertainty estimates. Inspired by these, this paper presents a new
framework that can quantitatively estimate the uncertainties by leveraging the advances in multi-task
learning through slight modification to the existing training pipelines. This promising algorithm is
developed with an intention of deployment in real world problems which already boast a good predictive
performance by reusing those pretrained models. The idea is to capture the behavior of the trained NNs for
the base task by augmenting it with the uncertainty estimates from a supplementary network. A series of
experiments show that the proposed approach produces well calibrated uncertainty estimates with high
quality predictions.
A novel ensemble model for detecting fake newsIAESIJAI
Due the growing proliferation of fake news over the past couple of years, our objective in this paper is to propose an ensemble model for the automatic classification of article news as being either real or fake. For this purpose, we opt for a blending technique that combines three models, namely bidirectional long short-term memory (Bi-LSTM), stochastic gradient descent classifier and ridge classifier. The implementation of the proposed model (i.e. BI-LSR) on real world datasets, has shown outstanding results. In fact, it achieved an accuracy score of 99.16%. Accordingly, this ensemble learning has proven to do perform better than individual conventional machine learning and deep learning models as well as many ensemble learning approaches cited in the literature.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
CHURN ANALYSIS AND PLAN RECOMMENDATION FOR TELECOM OPERATORSJournal For Research
With increasing number of mobile operators, user is entitled with unlimited freedom to switch from one mobile operator to another if he is not satisfied with service or pricing. This trend is not good for operators as they lose their revenue because of customer switch. To solve it, operators are looking for machine learning tools which can predict well in advance which customer may churn, so that they can predict any alternative plans to satisfy and retain them. In this paper, we design a hybrid machine learning classifier to predict if the customer will churn based on the CDR parameters and we also propose a rule engine to suggest best plans.
An efficient convolutional neural network-based classifier for an imbalanced ...IAESIJAI
Imbalanced datasets pose a major challenge for the researchers while addressing machine learning tasks. In these types of datasets, samples of different classes are not in equal proportion rather the gap between the numbers of individual class samples is significantly large. Classification models perform better for datasets having equal proportion of data tuples in both the classes. But, in reality, the medical image datasets are skewed and hence are not always suitable for a model to achieve improved classification performance. Therefore, various techniques have been suggested in the literature to overcome this challenge. This paper applies oversampling technique on an imbalanced dataset and focuses on a customized convolutional neural network model that classifies the images into two categories: diseased and non-diseased. Outcome of the proposed model can assist the health experts in the detection of oral cancer. The proposed model exhibits 99% accuracy after data augmentation. Performance metrics such as precision, recall and F1-score values are very close to 1. In addition, statistical test is performed to validate the statistical significance of the model. It has been found that the proposed model is an optimised classifier in terms of number of network layers and number of neurons.
An intelligent auto-response short message service categorization model using...IJECEIAES
Short message service (SMS) is one of the quickest and easiest ways used for communication, used by businesses, government organizations, and banks to send short messages to large groups of people. Categorization of SMS under different message types in their inboxes will provide a concise view for receivers. Former studies on the said problem are at the binary level as ham or spam which triggered the masking of specific messages that were useful to the end user but were treated as spam. Further, it is extended with multi labels such as ham, spam, and others which is not sufficient to meet all the necessities of end users. Hence, a multi-class SMS categorization is needed based on the semantics (information) embedded in it. This paper introduces an intelligent auto-response model using a semantic index for categorizing SMS messages into 5 categories: ham, spam, info, transactions, and one time password’s, using the multi-layer perceptron (MLP) algorithm. In this approach, each SMS is classified into one of the predefined categories. This experiment was conducted on the “multi-class SMS dataset” with 7,398 messages, which are differentiated into 5 classes. The accuracy obtained from the experiment was 97%.
K-Medoids Clustering Using Partitioning Around Medoids for Performing Face Re...ijscmcj
Face recognition is one of the most unobtrusive biometric techniques that can be used for access control as well as surveillance purposes. Various methods for implementing face recognition have been proposed with varying degrees of performance in different scenarios. The most common issue with effective facial biometric systems is high susceptibility of variations in the face owing to different factors like changes in pose, varying illumination, different expression, presence of outliers, noise etc. This paper explores a novel technique for face recognition by performing classification of the face images using unsupervised learning approach through K-Medoids clustering. Partitioning Around Medoids algorithm (PAM) has been used for performing K-Medoids clustering of the data. The results are suggestive of increased robustness to noise and outliers in comparison to other clustering methods. Therefore the technique can also be used to increase the overall robustness of a face recognition system and thereby increase its invariance and make it a reliably usable biometric modality.
A rule-based machine learning model for financial fraud detectionIJECEIAES
Financial fraud is a growing problem that poses a significant threat to the banking industry, the government sector, and the public. In response, financial institutions must continuously improve their fraud detection systems. Although preventative and security precautions are implemented to reduce financial fraud, criminals are constantly adapting and devising new ways to evade fraud prevention systems. The classification of transactions as legitimate or fraudulent poses a significant challenge for existing classification models due to highly imbalanced datasets. This research aims to develop rules to detect fraud transactions that do not involve any resampling technique. The effectiveness of the rule-based model (RBM) is assessed using a variety of metrics such as accuracy, specificity, precision, recall, confusion matrix, Matthew’s correlation coefficient (MCC), and receiver operating characteristic (ROC) values. The proposed rule-based model is compared to several existing machine learning models such as random forest (RF), decision tree (DT), multi-layer perceptron (MLP), k-nearest neighbor (KNN), naive Bayes (NB), and logistic regression (LR) using two benchmark datasets. The results of the experiment show that the proposed rule-based model beat the other methods, reaching accuracy and precision of 0.99 and 0.99, respectively.
A MODEL-BASED APPROACH MACHINE LEARNING TO SCALABLE PORTFOLIO SELECTIONIJCI JOURNAL
This study proposes a scalable asset selection and allocation approach using machine learning that
integrates clustering methods into portfolio optimization models. The methodology applies the Uniform
Manifold Approximation and Projection method and ensemble clustering techniques to preselect assets
from the Ibovespa and S&P 500 indices. The research compares three allocation models and finds that the
Hierarchical Risk Parity model outperformed the others, with a Sharpe ratio of 1.11. Despite the
pandemic's impact on the portfolios, with drawdowns close to 30%, they recovered in 111 to 149 trading
days. The portfolios outperformed the indices in cumulative returns, with similar annual volatilities of
20%. Preprocessing with UMAP allowed for finding clusters with higher discriminatory power, evaluated
through internal cluster validation metrics, helping to reduce the problem's size during optimal portfolio
allocation. Overall, this study highlights the potential of machine learning in portfolio optimization,
providing a useful framework for investment practitioners.
An efficient enhanced k-means clustering algorithm for best offer prediction...IJECEIAES
Telecom companies usually offer several rate plans or bundles to satisfy the customers’ different needs. Finding and recommending the best offer that perfectly matches the customer’s needs is crucial in maintaining customer loyalty and the company’s revenue in the long run. This paper presents an effective method of detecting a group of customers who have the potential to upgrade their telecom package. The used data is an actual dataset extracted from call detail records (CDRs) of a telecom operator. The method utilizes an enhanced k-means clustering model based on customer profiling. The results show that the proposed k-means-based clustering algorithm more effectively identifies potential customers willing to upgrade to a higher tier package compared to the traditional k-means algorithm. Our results showed that our proposed clustering model accuracy was over 90%, while the traditional k-means accuracy was under 70%.
Bibliometric analysis highlighting the role of women in addressing climate ch...IJECEIAES
Fossil fuel consumption increased quickly, contributing to climate change
that is evident in unusual flooding and draughts, and global warming. Over
the past ten years, women's involvement in society has grown dramatically,
and they succeeded in playing a noticeable role in reducing climate change.
A bibliometric analysis of data from the last ten years has been carried out to
examine the role of women in addressing the climate change. The analysis's
findings discussed the relevant to the sustainable development goals (SDGs),
particularly SDG 7 and SDG 13. The results considered contributions made
by women in the various sectors while taking geographic dispersion into
account. The bibliometric analysis delves into topics including women's
leadership in environmental groups, their involvement in policymaking, their
contributions to sustainable development projects, and the influence of
gender diversity on attempts to mitigate climate change. This study's results
highlight how women have influenced policies and actions related to climate
change, point out areas of research deficiency and recommendations on how
to increase role of the women in addressing the climate change and
achieving sustainability. To achieve more successful results, this initiative
aims to highlight the significance of gender equality and encourage
inclusivity in climate change decision-making processes.
Voltage and frequency control of microgrid in presence of micro-turbine inter...IJECEIAES
The active and reactive load changes have a significant impact on voltage
and frequency. In this paper, in order to stabilize the microgrid (MG) against
load variations in islanding mode, the active and reactive power of all
distributed generators (DGs), including energy storage (battery), diesel
generator, and micro-turbine, are controlled. The micro-turbine generator is
connected to MG through a three-phase to three-phase matrix converter, and
the droop control method is applied for controlling the voltage and
frequency of MG. In addition, a method is introduced for voltage and
frequency control of micro-turbines in the transition state from gridconnected mode to islanding mode. A novel switching strategy of the matrix
converter is used for converting the high-frequency output voltage of the
micro-turbine to the grid-side frequency of the utility system. Moreover,
using the switching strategy, the low-order harmonics in the output current
and voltage are not produced, and consequently, the size of the output filter
would be reduced. In fact, the suggested control strategy is load-independent
and has no frequency conversion restrictions. The proposed approach for
voltage and frequency regulation demonstrates exceptional performance and
favorable response across various load alteration scenarios. The suggested
strategy is examined in several scenarios in the MG test systems, and the
simulation results are addressed.
More Related Content
Similar to Predicting automobile insurance fraud using classical and machine learning models
UNCERTAINTY ESTIMATION IN NEURAL NETWORKS THROUGH MULTI-TASK LEARNINGgerogepatton
The impressive predictive performance of deep learning techniques on a wide range of tasks has led to its
widespread use. Estimating the confidence of these predictions is paramount for improving the safety and
reliability of such systems. However, the uncertainty estimates provided by neural networks (NNs) tend to
be overconfident and unreasonable. Previous studies have found out that ensemble of NNs typically
produce good predictions and uncertainty estimates. Inspired by these, this paper presents a new
framework that can quantitatively estimate the uncertainties by leveraging the advances in multi-task
learning through slight modification to the existing training pipelines. This promising algorithm is
developed with an intention of deployment in real world problems which already boast a good predictive
performance by reusing those pretrained models. The idea is to capture the behavior of the trained NNs for
the base task by augmenting it with the uncertainty estimates from a supplementary network. A series of
experiments show that the proposed approach produces well calibrated uncertainty estimates with high
quality predictions.
UNCERTAINTY ESTIMATION IN NEURAL NETWORKS THROUGH MULTI-TASK LEARNINGijaia
The impressive predictive performance of deep learning techniques on a wide range of tasks has led to its
widespread use. Estimating the confidence of these predictions is paramount for improving the safety and
reliability of such systems. However, the uncertainty estimates provided by neural networks (NNs) tend to
be overconfident and unreasonable. Previous studies have found out that ensemble of NNs typically
produce good predictions and uncertainty estimates. Inspired by these, this paper presents a new
framework that can quantitatively estimate the uncertainties by leveraging the advances in multi-task
learning through slight modification to the existing training pipelines. This promising algorithm is
developed with an intention of deployment in real world problems which already boast a good predictive
performance by reusing those pretrained models. The idea is to capture the behavior of the trained NNs for
the base task by augmenting it with the uncertainty estimates from a supplementary network. A series of
experiments show that the proposed approach produces well calibrated uncertainty estimates with high
quality predictions.
A novel ensemble model for detecting fake newsIAESIJAI
Due the growing proliferation of fake news over the past couple of years, our objective in this paper is to propose an ensemble model for the automatic classification of article news as being either real or fake. For this purpose, we opt for a blending technique that combines three models, namely bidirectional long short-term memory (Bi-LSTM), stochastic gradient descent classifier and ridge classifier. The implementation of the proposed model (i.e. BI-LSR) on real world datasets, has shown outstanding results. In fact, it achieved an accuracy score of 99.16%. Accordingly, this ensemble learning has proven to do perform better than individual conventional machine learning and deep learning models as well as many ensemble learning approaches cited in the literature.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
CHURN ANALYSIS AND PLAN RECOMMENDATION FOR TELECOM OPERATORSJournal For Research
With increasing number of mobile operators, user is entitled with unlimited freedom to switch from one mobile operator to another if he is not satisfied with service or pricing. This trend is not good for operators as they lose their revenue because of customer switch. To solve it, operators are looking for machine learning tools which can predict well in advance which customer may churn, so that they can predict any alternative plans to satisfy and retain them. In this paper, we design a hybrid machine learning classifier to predict if the customer will churn based on the CDR parameters and we also propose a rule engine to suggest best plans.
An efficient convolutional neural network-based classifier for an imbalanced ...IAESIJAI
Imbalanced datasets pose a major challenge for the researchers while addressing machine learning tasks. In these types of datasets, samples of different classes are not in equal proportion rather the gap between the numbers of individual class samples is significantly large. Classification models perform better for datasets having equal proportion of data tuples in both the classes. But, in reality, the medical image datasets are skewed and hence are not always suitable for a model to achieve improved classification performance. Therefore, various techniques have been suggested in the literature to overcome this challenge. This paper applies oversampling technique on an imbalanced dataset and focuses on a customized convolutional neural network model that classifies the images into two categories: diseased and non-diseased. Outcome of the proposed model can assist the health experts in the detection of oral cancer. The proposed model exhibits 99% accuracy after data augmentation. Performance metrics such as precision, recall and F1-score values are very close to 1. In addition, statistical test is performed to validate the statistical significance of the model. It has been found that the proposed model is an optimised classifier in terms of number of network layers and number of neurons.
An intelligent auto-response short message service categorization model using...IJECEIAES
Short message service (SMS) is one of the quickest and easiest ways used for communication, used by businesses, government organizations, and banks to send short messages to large groups of people. Categorization of SMS under different message types in their inboxes will provide a concise view for receivers. Former studies on the said problem are at the binary level as ham or spam which triggered the masking of specific messages that were useful to the end user but were treated as spam. Further, it is extended with multi labels such as ham, spam, and others which is not sufficient to meet all the necessities of end users. Hence, a multi-class SMS categorization is needed based on the semantics (information) embedded in it. This paper introduces an intelligent auto-response model using a semantic index for categorizing SMS messages into 5 categories: ham, spam, info, transactions, and one time password’s, using the multi-layer perceptron (MLP) algorithm. In this approach, each SMS is classified into one of the predefined categories. This experiment was conducted on the “multi-class SMS dataset” with 7,398 messages, which are differentiated into 5 classes. The accuracy obtained from the experiment was 97%.
K-Medoids Clustering Using Partitioning Around Medoids for Performing Face Re...ijscmcj
Face recognition is one of the most unobtrusive biometric techniques that can be used for access control as well as surveillance purposes. Various methods for implementing face recognition have been proposed with varying degrees of performance in different scenarios. The most common issue with effective facial biometric systems is high susceptibility of variations in the face owing to different factors like changes in pose, varying illumination, different expression, presence of outliers, noise etc. This paper explores a novel technique for face recognition by performing classification of the face images using unsupervised learning approach through K-Medoids clustering. Partitioning Around Medoids algorithm (PAM) has been used for performing K-Medoids clustering of the data. The results are suggestive of increased robustness to noise and outliers in comparison to other clustering methods. Therefore the technique can also be used to increase the overall robustness of a face recognition system and thereby increase its invariance and make it a reliably usable biometric modality.
A rule-based machine learning model for financial fraud detectionIJECEIAES
Financial fraud is a growing problem that poses a significant threat to the banking industry, the government sector, and the public. In response, financial institutions must continuously improve their fraud detection systems. Although preventative and security precautions are implemented to reduce financial fraud, criminals are constantly adapting and devising new ways to evade fraud prevention systems. The classification of transactions as legitimate or fraudulent poses a significant challenge for existing classification models due to highly imbalanced datasets. This research aims to develop rules to detect fraud transactions that do not involve any resampling technique. The effectiveness of the rule-based model (RBM) is assessed using a variety of metrics such as accuracy, specificity, precision, recall, confusion matrix, Matthew’s correlation coefficient (MCC), and receiver operating characteristic (ROC) values. The proposed rule-based model is compared to several existing machine learning models such as random forest (RF), decision tree (DT), multi-layer perceptron (MLP), k-nearest neighbor (KNN), naive Bayes (NB), and logistic regression (LR) using two benchmark datasets. The results of the experiment show that the proposed rule-based model beat the other methods, reaching accuracy and precision of 0.99 and 0.99, respectively.
A MODEL-BASED APPROACH MACHINE LEARNING TO SCALABLE PORTFOLIO SELECTIONIJCI JOURNAL
This study proposes a scalable asset selection and allocation approach using machine learning that
integrates clustering methods into portfolio optimization models. The methodology applies the Uniform
Manifold Approximation and Projection method and ensemble clustering techniques to preselect assets
from the Ibovespa and S&P 500 indices. The research compares three allocation models and finds that the
Hierarchical Risk Parity model outperformed the others, with a Sharpe ratio of 1.11. Despite the
pandemic's impact on the portfolios, with drawdowns close to 30%, they recovered in 111 to 149 trading
days. The portfolios outperformed the indices in cumulative returns, with similar annual volatilities of
20%. Preprocessing with UMAP allowed for finding clusters with higher discriminatory power, evaluated
through internal cluster validation metrics, helping to reduce the problem's size during optimal portfolio
allocation. Overall, this study highlights the potential of machine learning in portfolio optimization,
providing a useful framework for investment practitioners.
An efficient enhanced k-means clustering algorithm for best offer prediction...IJECEIAES
Telecom companies usually offer several rate plans or bundles to satisfy the customers’ different needs. Finding and recommending the best offer that perfectly matches the customer’s needs is crucial in maintaining customer loyalty and the company’s revenue in the long run. This paper presents an effective method of detecting a group of customers who have the potential to upgrade their telecom package. The used data is an actual dataset extracted from call detail records (CDRs) of a telecom operator. The method utilizes an enhanced k-means clustering model based on customer profiling. The results show that the proposed k-means-based clustering algorithm more effectively identifies potential customers willing to upgrade to a higher tier package compared to the traditional k-means algorithm. Our results showed that our proposed clustering model accuracy was over 90%, while the traditional k-means accuracy was under 70%.
Similar to Predicting automobile insurance fraud using classical and machine learning models (20)
Bibliometric analysis highlighting the role of women in addressing climate ch...IJECEIAES
Fossil fuel consumption increased quickly, contributing to climate change
that is evident in unusual flooding and draughts, and global warming. Over
the past ten years, women's involvement in society has grown dramatically,
and they succeeded in playing a noticeable role in reducing climate change.
A bibliometric analysis of data from the last ten years has been carried out to
examine the role of women in addressing the climate change. The analysis's
findings discussed the relevant to the sustainable development goals (SDGs),
particularly SDG 7 and SDG 13. The results considered contributions made
by women in the various sectors while taking geographic dispersion into
account. The bibliometric analysis delves into topics including women's
leadership in environmental groups, their involvement in policymaking, their
contributions to sustainable development projects, and the influence of
gender diversity on attempts to mitigate climate change. This study's results
highlight how women have influenced policies and actions related to climate
change, point out areas of research deficiency and recommendations on how
to increase role of the women in addressing the climate change and
achieving sustainability. To achieve more successful results, this initiative
aims to highlight the significance of gender equality and encourage
inclusivity in climate change decision-making processes.
Voltage and frequency control of microgrid in presence of micro-turbine inter...IJECEIAES
The active and reactive load changes have a significant impact on voltage
and frequency. In this paper, in order to stabilize the microgrid (MG) against
load variations in islanding mode, the active and reactive power of all
distributed generators (DGs), including energy storage (battery), diesel
generator, and micro-turbine, are controlled. The micro-turbine generator is
connected to MG through a three-phase to three-phase matrix converter, and
the droop control method is applied for controlling the voltage and
frequency of MG. In addition, a method is introduced for voltage and
frequency control of micro-turbines in the transition state from gridconnected mode to islanding mode. A novel switching strategy of the matrix
converter is used for converting the high-frequency output voltage of the
micro-turbine to the grid-side frequency of the utility system. Moreover,
using the switching strategy, the low-order harmonics in the output current
and voltage are not produced, and consequently, the size of the output filter
would be reduced. In fact, the suggested control strategy is load-independent
and has no frequency conversion restrictions. The proposed approach for
voltage and frequency regulation demonstrates exceptional performance and
favorable response across various load alteration scenarios. The suggested
strategy is examined in several scenarios in the MG test systems, and the
simulation results are addressed.
Enhancing battery system identification: nonlinear autoregressive modeling fo...IJECEIAES
Precisely characterizing Li-ion batteries is essential for optimizing their
performance, enhancing safety, and prolonging their lifespan across various
applications, such as electric vehicles and renewable energy systems. This
article introduces an innovative nonlinear methodology for system
identification of a Li-ion battery, employing a nonlinear autoregressive with
exogenous inputs (NARX) model. The proposed approach integrates the
benefits of nonlinear modeling with the adaptability of the NARX structure,
facilitating a more comprehensive representation of the intricate
electrochemical processes within the battery. Experimental data collected
from a Li-ion battery operating under diverse scenarios are employed to
validate the effectiveness of the proposed methodology. The identified
NARX model exhibits superior accuracy in predicting the battery's behavior
compared to traditional linear models. This study underscores the
importance of accounting for nonlinearities in battery modeling, providing
insights into the intricate relationships between state-of-charge, voltage, and
current under dynamic conditions.
Smart grid deployment: from a bibliometric analysis to a surveyIJECEIAES
Smart grids are one of the last decades' innovations in electrical energy.
They bring relevant advantages compared to the traditional grid and
significant interest from the research community. Assessing the field's
evolution is essential to propose guidelines for facing new and future smart
grid challenges. In addition, knowing the main technologies involved in the
deployment of smart grids (SGs) is important to highlight possible
shortcomings that can be mitigated by developing new tools. This paper
contributes to the research trends mentioned above by focusing on two
objectives. First, a bibliometric analysis is presented to give an overview of
the current research level about smart grid deployment. Second, a survey of
the main technological approaches used for smart grid implementation and
their contributions are highlighted. To that effect, we searched the Web of
Science (WoS), and the Scopus databases. We obtained 5,663 documents
from WoS and 7,215 from Scopus on smart grid implementation or
deployment. With the extraction limitation in the Scopus database, 5,872 of
the 7,215 documents were extracted using a multi-step process. These two
datasets have been analyzed using a bibliometric tool called bibliometrix.
The main outputs are presented with some recommendations for future
research.
Use of analytical hierarchy process for selecting and prioritizing islanding ...IJECEIAES
One of the problems that are associated to power systems is islanding
condition, which must be rapidly and properly detected to prevent any
negative consequences on the system's protection, stability, and security.
This paper offers a thorough overview of several islanding detection
strategies, which are divided into two categories: classic approaches,
including local and remote approaches, and modern techniques, including
techniques based on signal processing and computational intelligence.
Additionally, each approach is compared and assessed based on several
factors, including implementation costs, non-detected zones, declining
power quality, and response times using the analytical hierarchy process
(AHP). The multi-criteria decision-making analysis shows that the overall
weight of passive methods (24.7%), active methods (7.8%), hybrid methods
(5.6%), remote methods (14.5%), signal processing-based methods (26.6%),
and computational intelligent-based methods (20.8%) based on the
comparison of all criteria together. Thus, it can be seen from the total weight
that hybrid approaches are the least suitable to be chosen, while signal
processing-based methods are the most appropriate islanding detection
method to be selected and implemented in power system with respect to the
aforementioned factors. Using Expert Choice software, the proposed
hierarchy model is studied and examined.
Enhancing of single-stage grid-connected photovoltaic system using fuzzy logi...IJECEIAES
The power generated by photovoltaic (PV) systems is influenced by
environmental factors. This variability hampers the control and utilization of
solar cells' peak output. In this study, a single-stage grid-connected PV
system is designed to enhance power quality. Our approach employs fuzzy
logic in the direct power control (DPC) of a three-phase voltage source
inverter (VSI), enabling seamless integration of the PV connected to the
grid. Additionally, a fuzzy logic-based maximum power point tracking
(MPPT) controller is adopted, which outperforms traditional methods like
incremental conductance (INC) in enhancing solar cell efficiency and
minimizing the response time. Moreover, the inverter's real-time active and
reactive power is directly managed to achieve a unity power factor (UPF).
The system's performance is assessed through MATLAB/Simulink
implementation, showing marked improvement over conventional methods,
particularly in steady-state and varying weather conditions. For solar
irradiances of 500 and 1,000 W/m2
, the results show that the proposed
method reduces the total harmonic distortion (THD) of the injected current
to the grid by approximately 46% and 38% compared to conventional
methods, respectively. Furthermore, we compare the simulation results with
IEEE standards to evaluate the system's grid compatibility.
Enhancing photovoltaic system maximum power point tracking with fuzzy logic-b...IJECEIAES
Photovoltaic systems have emerged as a promising energy resource that
caters to the future needs of society, owing to their renewable, inexhaustible,
and cost-free nature. The power output of these systems relies on solar cell
radiation and temperature. In order to mitigate the dependence on
atmospheric conditions and enhance power tracking, a conventional
approach has been improved by integrating various methods. To optimize
the generation of electricity from solar systems, the maximum power point
tracking (MPPT) technique is employed. To overcome limitations such as
steady-state voltage oscillations and improve transient response, two
traditional MPPT methods, namely fuzzy logic controller (FLC) and perturb
and observe (P&O), have been modified. This research paper aims to
simulate and validate the step size of the proposed modified P&O and FLC
techniques within the MPPT algorithm using MATLAB/Simulink for
efficient power tracking in photovoltaic systems.
Adaptive synchronous sliding control for a robot manipulator based on neural ...IJECEIAES
Robot manipulators have become important equipment in production lines, medical fields, and transportation. Improving the quality of trajectory tracking for
robot hands is always an attractive topic in the research community. This is a
challenging problem because robot manipulators are complex nonlinear systems
and are often subject to fluctuations in loads and external disturbances. This
article proposes an adaptive synchronous sliding control scheme to improve trajectory tracking performance for a robot manipulator. The proposed controller
ensures that the positions of the joints track the desired trajectory, synchronize
the errors, and significantly reduces chattering. First, the synchronous tracking
errors and synchronous sliding surfaces are presented. Second, the synchronous
tracking error dynamics are determined. Third, a robust adaptive control law is
designed,the unknown components of the model are estimated online by the neural network, and the parameters of the switching elements are selected by fuzzy
logic. The built algorithm ensures that the tracking and approximation errors
are ultimately uniformly bounded (UUB). Finally, the effectiveness of the constructed algorithm is demonstrated through simulation and experimental results.
Simulation and experimental results show that the proposed controller is effective with small synchronous tracking errors, and the chattering phenomenon is
significantly reduced.
Remote field-programmable gate array laboratory for signal acquisition and de...IJECEIAES
A remote laboratory utilizing field-programmable gate array (FPGA) technologies enhances students’ learning experience anywhere and anytime in embedded system design. Existing remote laboratories prioritize hardware access and visual feedback for observing board behavior after programming, neglecting comprehensive debugging tools to resolve errors that require internal signal acquisition. This paper proposes a novel remote embeddedsystem design approach targeting FPGA technologies that are fully interactive via a web-based platform. Our solution provides FPGA board access and debugging capabilities beyond the visual feedback provided by existing remote laboratories. We implemented a lab module that allows users to seamlessly incorporate into their FPGA design. The module minimizes hardware resource utilization while enabling the acquisition of a large number of data samples from the signal during the experiments by adaptively compressing the signal prior to data transmission. The results demonstrate an average compression ratio of 2.90 across three benchmark signals, indicating efficient signal acquisition and effective debugging and analysis. This method allows users to acquire more data samples than conventional methods. The proposed lab allows students to remotely test and debug their designs, bridging the gap between theory and practice in embedded system design.
Detecting and resolving feature envy through automated machine learning and m...IJECEIAES
Efficiently identifying and resolving code smells enhances software project quality. This paper presents a novel solution, utilizing automated machine learning (AutoML) techniques, to detect code smells and apply move method refactoring. By evaluating code metrics before and after refactoring, we assessed its impact on coupling, complexity, and cohesion. Key contributions of this research include a unique dataset for code smell classification and the development of models using AutoGluon for optimal performance. Furthermore, the study identifies the top 20 influential features in classifying feature envy, a well-known code smell, stemming from excessive reliance on external classes. We also explored how move method refactoring addresses feature envy, revealing reduced coupling and complexity, and improved cohesion, ultimately enhancing code quality. In summary, this research offers an empirical, data-driven approach, integrating AutoML and move method refactoring to optimize software project quality. Insights gained shed light on the benefits of refactoring on code quality and the significance of specific features in detecting feature envy. Future research can expand to explore additional refactoring techniques and a broader range of code metrics, advancing software engineering practices and standards.
Smart monitoring technique for solar cell systems using internet of things ba...IJECEIAES
Rapidly and remotely monitoring and receiving the solar cell systems status parameters, solar irradiance, temperature, and humidity, are critical issues in enhancement their efficiency. Hence, in the present article an improved smart prototype of internet of things (IoT) technique based on embedded system through NodeMCU ESP8266 (ESP-12E) was carried out experimentally. Three different regions at Egypt; Luxor, Cairo, and El-Beheira cities were chosen to study their solar irradiance profile, temperature, and humidity by the proposed IoT system. The monitoring data of solar irradiance, temperature, and humidity were live visualized directly by Ubidots through hypertext transfer protocol (HTTP) protocol. The measured solar power radiation in Luxor, Cairo, and El-Beheira ranged between 216-1000, 245-958, and 187-692 W/m 2 respectively during the solar day. The accuracy and rapidity of obtaining monitoring results using the proposed IoT system made it a strong candidate for application in monitoring solar cell systems. On the other hand, the obtained solar power radiation results of the three considered regions strongly candidate Luxor and Cairo as suitable places to build up a solar cells system station rather than El-Beheira.
An efficient security framework for intrusion detection and prevention in int...IJECEIAES
Over the past few years, the internet of things (IoT) has advanced to connect billions of smart devices to improve quality of life. However, anomalies or malicious intrusions pose several security loopholes, leading to performance degradation and threat to data security in IoT operations. Thereby, IoT security systems must keep an eye on and restrict unwanted events from occurring in the IoT network. Recently, various technical solutions based on machine learning (ML) models have been derived towards identifying and restricting unwanted events in IoT. However, most ML-based approaches are prone to miss-classification due to inappropriate feature selection. Additionally, most ML approaches applied to intrusion detection and prevention consider supervised learning, which requires a large amount of labeled data to be trained. Consequently, such complex datasets are impossible to source in a large network like IoT. To address this problem, this proposed study introduces an efficient learning mechanism to strengthen the IoT security aspects. The proposed algorithm incorporates supervised and unsupervised approaches to improve the learning models for intrusion detection and mitigation. Compared with the related works, the experimental outcome shows that the model performs well in a benchmark dataset. It accomplishes an improved detection accuracy of approximately 99.21%.
Developing a smart system for infant incubators using the internet of things ...IJECEIAES
This research is developing an incubator system that integrates the internet of things and artificial intelligence to improve care for premature babies. The system workflow starts with sensors that collect data from the incubator. Then, the data is sent in real-time to the internet of things (IoT) broker eclipse mosquito using the message queue telemetry transport (MQTT) protocol version 5.0. After that, the data is stored in a database for analysis using the long short-term memory network (LSTM) method and displayed in a web application using an application programming interface (API) service. Furthermore, the experimental results produce as many as 2,880 rows of data stored in the database. The correlation coefficient between the target attribute and other attributes ranges from 0.23 to 0.48. Next, several experiments were conducted to evaluate the model-predicted value on the test data. The best results are obtained using a two-layer LSTM configuration model, each with 60 neurons and a lookback setting 6. This model produces an R 2 value of 0.934, with a root mean square error (RMSE) value of 0.015 and a mean absolute error (MAE) of 0.008. In addition, the R 2 value was also evaluated for each attribute used as input, with a result of values between 0.590 and 0.845.
A review on internet of things-based stingless bee's honey production with im...IJECEIAES
Honey is produced exclusively by honeybees and stingless bees which both are well adapted to tropical and subtropical regions such as Malaysia. Stingless bees are known for producing small amounts of honey and are known for having a unique flavor profile. Problem identified that many stingless bees collapsed due to weather, temperature and environment. It is critical to understand the relationship between the production of stingless bee honey and environmental conditions to improve honey production. Thus, this paper presents a review on stingless bee's honey production and prediction modeling. About 54 previous research has been analyzed and compared in identifying the research gaps. A framework on modeling the prediction of stingless bee honey is derived. The result presents the comparison and analysis on the internet of things (IoT) monitoring systems, honey production estimation, convolution neural networks (CNNs), and automatic identification methods on bee species. It is identified based on image detection method the top best three efficiency presents CNN is at 98.67%, densely connected convolutional networks with YOLO v3 is 97.7%, and DenseNet201 convolutional networks 99.81%. This study is significant to assist the researcher in developing a model for predicting stingless honey produced by bee's output, which is important for a stable economy and food security.
A trust based secure access control using authentication mechanism for intero...IJECEIAES
The internet of things (IoT) is a revolutionary innovation in many aspects of our society including interactions, financial activity, and global security such as the military and battlefield internet. Due to the limited energy and processing capacity of network devices, security, energy consumption, compatibility, and device heterogeneity are the long-term IoT problems. As a result, energy and security are critical for data transmission across edge and IoT networks. Existing IoT interoperability techniques need more computation time, have unreliable authentication mechanisms that break easily, lose data easily, and have low confidentiality. In this paper, a key agreement protocol-based authentication mechanism for IoT devices is offered as a solution to this issue. This system makes use of information exchange, which must be secured to prevent access by unauthorized users. Using a compact contiki/cooja simulator, the performance and design of the suggested framework are validated. The simulation findings are evaluated based on detection of malicious nodes after 60 minutes of simulation. The suggested trust method, which is based on privacy access control, reduced packet loss ratio to 0.32%, consumed 0.39% power, and had the greatest average residual energy of 0.99 mJoules at 10 nodes.
Fuzzy linear programming with the intuitionistic polygonal fuzzy numbersIJECEIAES
In real world applications, data are subject to ambiguity due to several factors; fuzzy sets and fuzzy numbers propose a great tool to model such ambiguity. In case of hesitation, the complement of a membership value in fuzzy numbers can be different from the non-membership value, in which case we can model using intuitionistic fuzzy numbers as they provide flexibility by defining both a membership and a non-membership functions. In this article, we consider the intuitionistic fuzzy linear programming problem with intuitionistic polygonal fuzzy numbers, which is a generalization of the previous polygonal fuzzy numbers found in the literature. We present a modification of the simplex method that can be used to solve any general intuitionistic fuzzy linear programming problem after approximating the problem by an intuitionistic polygonal fuzzy number with n edges. This method is given in a simple tableau formulation, and then applied on numerical examples for clarity.
The performance of artificial intelligence in prostate magnetic resonance im...IJECEIAES
Prostate cancer is the predominant form of cancer observed in men worldwide. The application of magnetic resonance imaging (MRI) as a guidance tool for conducting biopsies has been established as a reliable and well-established approach in the diagnosis of prostate cancer. The diagnostic performance of MRI-guided prostate cancer diagnosis exhibits significant heterogeneity due to the intricate and multi-step nature of the diagnostic pathway. The development of artificial intelligence (AI) models, specifically through the utilization of machine learning techniques such as deep learning, is assuming an increasingly significant role in the field of radiology. In the realm of prostate MRI, a considerable body of literature has been dedicated to the development of various AI algorithms. These algorithms have been specifically designed for tasks such as prostate segmentation, lesion identification, and classification. The overarching objective of these endeavors is to enhance diagnostic performance and foster greater agreement among different observers within MRI scans for the prostate. This review article aims to provide a concise overview of the application of AI in the field of radiology, with a specific focus on its utilization in prostate MRI.
Seizure stage detection of epileptic seizure using convolutional neural networksIJECEIAES
According to the World Health Organization (WHO), seventy million individuals worldwide suffer from epilepsy, a neurological disorder. While electroencephalography (EEG) is crucial for diagnosing epilepsy and monitoring the brain activity of epilepsy patients, it requires a specialist to examine all EEG recordings to find epileptic behavior. This procedure needs an experienced doctor, and a precise epilepsy diagnosis is crucial for appropriate treatment. To identify epileptic seizures, this study employed a convolutional neural network (CNN) based on raw scalp EEG signals to discriminate between preictal, ictal, postictal, and interictal segments. The possibility of these characteristics is explored by examining how well timedomain signals work in the detection of epileptic signals using intracranial Freiburg Hospital (FH), scalp Children's Hospital Boston-Massachusetts Institute of Technology (CHB-MIT) databases, and Temple University Hospital (TUH) EEG. To test the viability of this approach, two types of experiments were carried out. Firstly, binary class classification (preictal, ictal, postictal each versus interictal) and four-class classification (interictal versus preictal versus ictal versus postictal). The average accuracy for stage detection using CHB-MIT database was 84.4%, while the Freiburg database's time-domain signals had an accuracy of 79.7% and the highest accuracy of 94.02% for classification in the TUH EEG database when comparing interictal stage to preictal stage.
Analysis of driving style using self-organizing maps to analyze driver behaviorIJECEIAES
Modern life is strongly associated with the use of cars, but the increase in acceleration speeds and their maneuverability leads to a dangerous driving style for some drivers. In these conditions, the development of a method that allows you to track the behavior of the driver is relevant. The article provides an overview of existing methods and models for assessing the functioning of motor vehicles and driver behavior. Based on this, a combined algorithm for recognizing driving style is proposed. To do this, a set of input data was formed, including 20 descriptive features: About the environment, the driver's behavior and the characteristics of the functioning of the car, collected using OBD II. The generated data set is sent to the Kohonen network, where clustering is performed according to driving style and degree of danger. Getting the driving characteristics into a particular cluster allows you to switch to the private indicators of an individual driver and considering individual driving characteristics. The application of the method allows you to identify potentially dangerous driving styles that can prevent accidents.
Hyperspectral object classification using hybrid spectral-spatial fusion and ...IJECEIAES
Because of its spectral-spatial and temporal resolution of greater areas, hyperspectral imaging (HSI) has found widespread application in the field of object classification. The HSI is typically used to accurately determine an object's physical characteristics as well as to locate related objects with appropriate spectral fingerprints. As a result, the HSI has been extensively applied to object identification in several fields, including surveillance, agricultural monitoring, environmental research, and precision agriculture. However, because of their enormous size, objects require a lot of time to classify; for this reason, both spectral and spatial feature fusion have been completed. The existing classification strategy leads to increased misclassification, and the feature fusion method is unable to preserve semantic object inherent features; This study addresses the research difficulties by introducing a hybrid spectral-spatial fusion (HSSF) technique to minimize feature size while maintaining object intrinsic qualities; Lastly, a soft-margins kernel is proposed for multi-layer deep support vector machine (MLDSVM) to reduce misclassification. The standard Indian pines dataset is used for the experiment, and the outcome demonstrates that the HSSF-MLDSVM model performs substantially better in terms of accuracy and Kappa coefficient.
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)MdTanvirMahtab2
This presentation is about the working procedure of Shahjalal Fertilizer Company Limited (SFCL). A Govt. owned Company of Bangladesh Chemical Industries Corporation under Ministry of Industries.
Immunizing Image Classifiers Against Localized Adversary Attacksgerogepatton
This paper addresses the vulnerability of deep learning models, particularly convolutional neural networks
(CNN)s, to adversarial attacks and presents a proactive training technique designed to counter them. We
introduce a novel volumization algorithm, which transforms 2D images into 3D volumetric representations.
When combined with 3D convolution and deep curriculum learning optimization (CLO), itsignificantly improves
the immunity of models against localized universal attacks by up to 40%. We evaluate our proposed approach
using contemporary CNN architectures and the modified Canadian Institute for Advanced Research (CIFAR-10
and CIFAR-100) and ImageNet Large Scale Visual Recognition Challenge (ILSVRC12) datasets, showcasing
accuracy improvements over previous techniques. The results indicate that the combination of the volumetric
input and curriculum learning holds significant promise for mitigating adversarial attacks without necessitating
adversary training.
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdffxintegritypublishin
Advancements in technology unveil a myriad of electrical and electronic breakthroughs geared towards efficiently harnessing limited resources to meet human energy demands. The optimization of hybrid solar PV panels and pumped hydro energy supply systems plays a pivotal role in utilizing natural resources effectively. This initiative not only benefits humanity but also fosters environmental sustainability. The study investigated the design optimization of these hybrid systems, focusing on understanding solar radiation patterns, identifying geographical influences on solar radiation, formulating a mathematical model for system optimization, and determining the optimal configuration of PV panels and pumped hydro storage. Through a comparative analysis approach and eight weeks of data collection, the study addressed key research questions related to solar radiation patterns and optimal system design. The findings highlighted regions with heightened solar radiation levels, showcasing substantial potential for power generation and emphasizing the system's efficiency. Optimizing system design significantly boosted power generation, promoted renewable energy utilization, and enhanced energy storage capacity. The study underscored the benefits of optimizing hybrid solar PV panels and pumped hydro energy supply systems for sustainable energy usage. Optimizing the design of solar PV panels and pumped hydro energy supply systems as examined across diverse climatic conditions in a developing country, not only enhances power generation but also improves the integration of renewable energy sources and boosts energy storage capacities, particularly beneficial for less economically prosperous regions. Additionally, the study provides valuable insights for advancing energy research in economically viable areas. Recommendations included conducting site-specific assessments, utilizing advanced modeling tools, implementing regular maintenance protocols, and enhancing communication among system components.
Cosmetic shop management system project report.pdfKamal Acharya
Buying new cosmetic products is difficult. It can even be scary for those who have sensitive skin and are prone to skin trouble. The information needed to alleviate this problem is on the back of each product, but it's thought to interpret those ingredient lists unless you have a background in chemistry.
Instead of buying and hoping for the best, we can use data science to help us predict which products may be good fits for us. It includes various function programs to do the above mentioned tasks.
Data file handling has been effectively used in the program.
The automated cosmetic shop management system should deal with the automation of general workflow and administration process of the shop. The main processes of the system focus on customer's request where the system is able to search the most appropriate products and deliver it to the customers. It should help the employees to quickly identify the list of cosmetic product that have reached the minimum quantity and also keep a track of expired date for each cosmetic product. It should help the employees to find the rack number in which the product is placed.It is also Faster and more efficient way.
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Dr.Costas Sachpazis
Terzaghi's soil bearing capacity theory, developed by Karl Terzaghi, is a fundamental principle in geotechnical engineering used to determine the bearing capacity of shallow foundations. This theory provides a method to calculate the ultimate bearing capacity of soil, which is the maximum load per unit area that the soil can support without undergoing shear failure. The Calculation HTML Code included.
Welcome to WIPAC Monthly the magazine brought to you by the LinkedIn Group Water Industry Process Automation & Control.
In this month's edition, along with this month's industry news to celebrate the 13 years since the group was created we have articles including
A case study of the used of Advanced Process Control at the Wastewater Treatment works at Lleida in Spain
A look back on an article on smart wastewater networks in order to see how the industry has measured up in the interim around the adoption of Digital Transformation in the Water Industry.
Predicting automobile insurance fraud using classical and machine learning models
1. International Journal of Electrical and Computer Engineering (IJECE)
Vol. 14, No. 1, February 2024, pp. 911~921
ISSN: 2088-8708, DOI: 10.11591/ijece.v14i1.pp911-921 911
Journal homepage: http://ijece.iaescore.com
Predicting automobile insurance fraud using classical and
machine learning models
Shareh-Zulhelmi Shareh Nordin1
, Yap Bee Wah1,2
, Ng Kok Haur3
, Asmawi Hashim4
,
Norimah Rambeli4
, Norasibah Abdul Jalil4
1
School of Computing Sciences, College of Computing, Informatics and Mathematics, Universiti Teknologi MARA, Selangor, Malaysia
2
UNITAR Graduate School, UNITAR International University, Selangor, Malaysia
3
Institute of Mathematical Sciences, Faculty of Science, Universiti Malaya, Kuala Lumpur, Malaysia
4
Department of Economics, Universiti Pendidikan Sultan Idris, Perak, Malaysia
Article Info ABSTRACT
Article history:
Received Mar 26, 2023
Revised Jul 3, 2023
Accepted Jul 17, 2023
Insurance fraud claims have become a major problem in the insurance
industry. Several investigations have been carried out to eliminate negative
impacts on the insurance industry as this immoral act has caused the loss of
billions of dollars. In this paper, a comparative study was carried out to
assess the performance of various classification models, namely logistic
regression, neural network (NN), support vector machine (SVM), tree
augmented naïve Bayes (NB), decision tree (DT), random forest (RF) and
AdaBoost with different model settings for predicting automobile insurance
fraud claims. Results reveal that the tree augmented NB outperformed other
models based on several performance metrics with accuracy (79.35%),
sensitivity (44.70%), misclassification rate (20.65%), area under curve
(0.81) and Gini (0.62). In addition, the result shows that the AdaBoost
algorithm can improve the classification performance of the decision tree.
These findings are useful for insurance professionals to identify potential
insurance fraud claim cases.
Keywords:
AdaBoost
Data science
Fraud detection
Insurance fraud
Machine learning
Tree augmented naïve Bayes
This is an open access article under the CC BY-SA license.
Corresponding Author:
Yap Bee Wah
UNITAR Graduate School, UNITAR International University
Jalan SS6/3, 47301 Petaling Jaya, Selangor, Malaysia
Email: bee.wah@unitar.my
1. INTRODUCTION
The insurance industry has become a fundamental pillar of our modern world for many years. This
industry is relevant to society as it can offer financial security in the event of an accident or illness. However,
some irresponsible individuals have taken advantage of making false insurance claims to obtain compensation
or benefits which is called insurance fraud. Fraudulent claims are a serious offense as they not only threaten
the insurer’s or policyholder’s profitability but also harmfully affect the insurance industry and the existing
social and economic systems.
For the past few decades, the insurance fraud claims problem has been unresolved from the beginning
of the insurance industry due to a lack of resources, research, documentation, and technology. However, in
the late nineteenth century, the growing systematic data collection allowed the use of pattern recognition
techniques such as regression, neural network (NN), and fuzzy clustering [1]. In the late twentieth century,
data played a central role in the insurance industry, and today, most insurance companies have obtained more
access than before. When the size of databases grows, the traditional approach may overlook a significant
portion of fraud as it is difficult to manually review large databases. With emerging technology, this issue
can be solved using machine learning models to identify and predict fraud claims [2]–[4].
2. ISSN: 2088-8708
Int J Elec & Comp Eng, Vol. 14, No. 1, February 2024: 911-921
912
Several machine learning models have been proposed to accurately predict fraudulent automobile
insurance claims. A comparative study by Viaene et al. [5] considered various machine learning models
which include logistic regression, decision tree (C4.5), naïve Bayes (NB), k-nearest neighbor (KNN),
Bayesian neural network (BNN), support vector machine (SVM) and tree augmented naïve Bayes (TAN) for
predicting fraudulent claims in automobile insurance in the state of Massachusetts in 1993. The performances
were measured with the mean percentage correctly classified and the area under the receiver operating
characteristic (ROC). Experimental results showed that the logistic regression and SVM with linear kernel
performed excellent predictive capabilities while the naive Bayes also performed well, and decision tree
model results are rather disappointing. However, there is no discussion on the distribution of fraudulent data
and whether it is balanced in nature. The random forest model was used by Li et al. [6] to predict automobile
insurance fraud claims using an empirical example. The empirical results show that the random forest model
has better accuracy and robustness. In addition, this model is suitable for large data sets and unbalanced data.
In [7], several machine learning models (random forest, decision tree (J48), and naïve Bayes) were used to
predict fraudulent automobile insurance claims. The random forest model was found to outperform the other
models in terms of accuracy, precision, and recall. Similarly, Pranavi [8] compared the accuracy of detecting
automobile insurance fraud claims using the random forest and KNN models and found that the random
forest model outperforms the KNN model. They also noted the bad performance of the KNN model is due to
unevenly distributed data. The findings by Prasasti et al. [9] also indicated that random forest outperformed
other classifiers namely multilayer perceptron (MLP) and decision tree C4.5, with 98.5% accuracy. Recently,
Aslam et al. [4] compared the predictive performance of three models (logistic regression, SVM, and naïve
Bayes). Their results showed SVM has high accuracy but very low sensitivity. This is due to the problem of
imbalanced data. Overall, the logistic regression model performed better than SVM and naïve Bayes, and
logistic regression achieved the highest F-measure score.
Recently, some hybrid models have been developed as an extension of the existing machine learning
models to improve prediction accuracy. A Bayesian learning neural network was used for automobile insurance
fraud detection [10] while Tao et al. [11] proposed a dual membership fuzzy SVM (DFSVM) with radial basis
function as kernel function for detecting automobile insurance fraud claims to overcome the overfitting problems
of SVM models. However, the choice of kernel function used was not discussed. The result based on the cases in
Beijing showed that the DFSVM outperforms the regular SVM models evidenced by their F-score, recall, and
precision. Sundarkumar and Ravi [12] introduced a novel hybrid undersampling method based on the one-class
SVM (OCSVM) and k-reverse nearest neighbor (kRNN) to improve the performance of machine learning
models for detecting automobile insurance fraud claims. Their results showed that all models under the hybrid
approach improved the overall performance when compared with the same models trained under the original
unbalanced distributed data. Decision tree and SVM models performed better than logistic regression and MLP
models. In addition, the decision tree model is computationally faster and the “if-then” decision rules are easy to
understand. Wang and Xu [2] proposed a latent Dirichlet allocation (LDA)-based text analytics and deep learning
model for automobile insurance fraud claims detection. Experimental results show that the combined deep
learning NNs and LDA framework outperform those widely used machine learning models, such as random
forest and SVM combined with LDA. Other hybrid models include a principal component analysis based random
forest with nearest neighbor method [13], an artificial bee colony-based kernel ridge regression [14], back
propagation neural network with an improved adaptive genetic algorithm [15], unsupervised deep learning
variable importance method [16] and eRFSVM which entails random forest and SVM [17].
Various new complex machine learning algorithms such as AdaBoost [18] and XGBoost [19] have
been reported to show potential in better prediction performances, especially for imbalanced and large
datasets [20]. Itri et al. [21] compared the performance of ten machine learning models including the
AdaBoost algorithm and found that the random forest model performs better among all the models compared.
Meanwhile, Rukhsar et al. [22] conducted a comparative analysis of various machine learning models
including the AdaBoost algorithm on insurance fraud prediction. The decision tree was reported to have the
highest accuracy of 79% compared to the other models. In addition, the AdaBoost algorithm shows an
accuracy of 78% which is comparable to the decision tree model. In the study on the predictive performance
of the logistic regression and XGBoost models for the frequency of motor insurance claims, the XGBoost
model achieved slightly better than logistic regression. Although the XGBoost model is better in terms of
performance, it can easily overfit the data and thus, requires numerous model-tuning procedures to prevent
overfitting [20]. Similar findings by Abdelhadi et al. [23] also showed that the XGBoost algorithm
outperformed NN, J48 and naïve Bayes. However, in [3], the random forest was found to perform better than
XGBoost in predicting automobile insurance claims. Useful and informative reviews on machine learning
models with applications to automobile insurance fraud claims can be found in [24]–[26].
Literature studies have shown that there is no conclusive evidence that any specific machine
learning model outperforms the others for predicting automobile insurance claims. The performance of
3. Int J Elec & Comp Eng ISSN: 2088-8708
Predicting automobile insurance fraud using classical and machine … (Shareh-Zulhelmi Shareh Nordin)
913
machine learning models depends on many factors, such as data quality and model settings. The novelty of
this paper is the evaluation of machine learning classification techniques for automobile insurance fraudulent
prediction. We also emphasized the importance of the data preparation stage to ensure data quality and
statistical tests for identifying the relevant independent variables. The aim of this paper is to analyze the
performance of the classical statistical model (logistic regression) with machine learning models (NN, SVM,
decision tree, random forest, AdaBoost algorithm) and a semi-naïve Bayesian learning model (TAN). All the
models were evaluated using various performance metrics.
The remainder of the paper is organized as follows: section 2 describes the empirical data and data
preparation phase. Section 3 discusses the classical statistical model and machine learning models as well as
the evaluation criteria to assess the model performance. Discussions of the results are provided in section 4.
Section 5 concludes the paper.
2. DATA DESCRIPTION AND DATA PREPARATION PHASE
The car insurance claim data which consists of a dependent variable, fraudulent insurance claim
(denoted as claim_flag), and 23 independent variables with a total of 10,299 cases were retrieved from
Kaggle website [27]. These independent variables can be further categorized into four major groups
including drivers’ demographic factor: age of the driver (denoted as age), gender (gender), maximum
education level (education), job category (occupation), income (income), marital status (mstatus), single
parent (parent1), number of children (homekids), year on the job (yoj), home value (home_val), number of
driving children (kidsdriv), distance to work (travtime) and home/work area (urbancity), vehicle factor:
vehicle age (car_age), value of the vehicle (bluebook), type of car (car_type), vehicle use (car_use) and red
car (red_car), insurance factor: number of a claim for the past 5 years (claim_freq), the total claim for the
past 5 years (oldclaims) and time in force (tif) and license/charges factor: license revoked for the past 7 years
(revoked) and motor vehicle record points (mvr_points).
As data preparation is one of the important stages in the data mining process, data cleaning and
transformation such as removing illogical data, data reclassification, data encoding, data imputation and binning
of continuous variables were carried out via the IBM SPSS Modeler 18. After removing illogical data for age, a
total of 10,191 cases are used for the analysis. We observed that the distribution of the dependent variable
claim_flag with no fraud claim (73.59%) is much higher than with fraud claims (26.41%) which indicates that
the claim_flag data is imbalanced. Twenty-three independent variables with ten continuous variables and
thirteen categorical variables. After performing data cleaning and data transformation, we reclassify three
categorical variables (homekids, kidsdriv, and clam_freq), eight variables are encoded (gender, education,
mstatus, parent1, urbancity, car_type, car_use, and revoked), six variables with missing values are imputed
(age, occupation, income, yoj, home_val, and car_age), binning was carried out for continuous variables
(income, home_val, bluebook, and oldclaims). We first carried out chi-squared tests to determine the
significance of the relationship between the dependent variable (claim_flag) with each of the independent
variables. Results show that the value of the vehicle (bluebook) and red car (red_car) were not significant at the
5% level. Therefore, these two variables were omitted, and the remaining twenty-one independent variables
were used for developing the machine learning model. A complete “clean” dataset is available upon request.
3. RESEARCH METHOD
Many machine learning models are used in insurance fraud detection. Among them, the logistic
regression model ranked the top with the most common model, followed by NNs, Bayesian belief network,
decision trees, and naïve Bayes [24]. As the logistic regression model is easy to implement, this classification
model is still widely used in many recent applications [28].
Due to the complexity of the data, machine learning models have shown their potential in
classification problems [29]. In this paper, we evaluate the empirical classification performance of seven
predictive models, namely: logistic regression, NN, SVM, TAN, decision tree, random forest, and AdaBoost.
a. Logistic regression is a common model for the prediction of a dichotomous dependent variable based on
one or more independent variables. Different model selection settings including enter, forward, and
backward stepwise are considered.
b. NN is a useful machine learning model for categorical and continuous dependent variables. NN has three
or more layers that are interconnected. Each hidden layer consists of a few neurons. These neurons send
data to the deeper layers, which in turn will send the final output data to the output layer. Through the
backpropagation method, each time the output is labeled as an error during the supervised training phase,
the information is sent backward to update the weights until the error is minimized [30].
c. SVM [31] is a classification technique that obtains the optimum separation hyperplane between the target
(fraud or not fraud) in a multidimensional space that maximizes the separation between the two classes.
4. ISSN: 2088-8708
Int J Elec & Comp Eng, Vol. 14, No. 1, February 2024: 911-921
914
The performance of SVM greatly depends on the choice of the kernel function. Among the kernel
functions considered include sigmoid, radial basis function (RBF), linear and polynomial [31], [32].
d. Naïve Bayes model is based on the Bayes theorem with an assumption of independence among
independent variables, X. It involves the calculation of the posterior probability (Y|X), from 𝑃(Y), 𝑃(X|Y)
and 𝑃(X) where Y is the dependent variable. In this model, continuous features are converted to
categorical types [32]. Meanwhile, TAN is a semi-naive Bayesian learning model. It relaxes the naïve
Bayes attribute independence assumption by employing a tree structure, in which each attribute only
depends on the class and one other attribute [33].
e. Decision tree is a non-parametric supervised machine learning algorithm which predicts the target
variable by building a tree based on some splitting criteria. If the dependent variable is of continuous
type, the model is called a regression tree, while a classification tree involves a categorical dependent
variable. The tree structure begins with the parent (or root) node and splits into child nodes and ends with
the leaf nodes. Each node of a classification tree is split based on a splitting criterion which is either the
Gini index, entropy, or chi-square. In IBM SPSS Modeler, the classification and regression tree (CART)
algorithm uses the Gini splitting criteria while C5 and chi-squared automatic interaction detection
(CHAID) use the entropy and chi-square, respectively. In the classification tree, each leaf node represents
a decision rule to predict the class of the dependent variable [32].
f. Random forest is an extension of the decision tree model. The model built the forest in a random manner
and the massive number of trees in the forest provides an ensemble model for prediction. It increases
model predictive accuracy by using bootstrap samples of the training data and random feature selection in
tree induction. Each time a split in a tree is considered, a random sample of m predictors is chosen as split
candidates from the full set of p predictors where 𝑚 ≈ √𝑝. Prediction is made by aggregating (majority
vote or averaging) the predictions of the ensemble [34].
g. The adaptive boosting algorithm, named AdaBoost [18] is a machine learning model for regression and
classification problems that produces a predictive model in the form of an ensemble of weak predictive
models. It constructs the model in a stage-wise manner as the other methods of boosting. The outputs of
weak learner models are combined into a weighted sum that will represent the final output of the
weighted classifier. AdaBoost is adapting in the sense that subsequent weak learners are adjusted to
support those samples that were misclassified by the previous classifier. The steps involved in the
boosting process can be found in [18], [35].
In order to assess the performance of these predictive models, accuracy is the most commonly used
performance metric [36]. However, this metric does not really capture the effectiveness of a classifier for
imbalanced data. Therefore, many other performance metrics have been proposed in terms of error and
fitness [37]. We evaluate the predictive performance using the following seven metrics.
a. 𝐴𝑐𝑐𝑢𝑟𝑎𝑐𝑦 =
TP+TN
TP+TN+FP+FN
where TP is true positive (fraud predicted as fraud), TN is true negative (non-
fraud predicted as non-fraud), FP is false positive (non-fraud predicted as fraud) and FN is false negative
(fraud predicted as non-fraud).
b. 𝑆𝑒𝑛𝑠𝑖𝑡𝑖𝑣𝑖𝑡𝑦/𝑅𝑒𝑐𝑎𝑙𝑙 =
TP
TP+FN
c. 𝑆𝑝𝑒𝑐𝑖𝑓𝑖𝑐𝑖𝑡𝑦 =
TN
TN+FP
d. 𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 =
TP
TP+FP
e. 𝑀𝑖𝑠𝑐𝑙𝑎𝑠𝑠𝑖𝑓𝑖𝑐𝑎𝑡𝑖𝑜𝑛 𝑟𝑎𝑡𝑒 (𝐸𝑟𝑟𝑜𝑟) =
FN+FP
TP+TN+FN+FP
f. Area under the ROC curve (AUC) was calculated for the ROC curve. The closer the AUC value is to one,
the better the model.
g. Gini coefficient is between 0 and 1. The higher the Gini coefficient, the better the model.
4. RESULTS AND DISCUSSION
A total of 10,191 cases was divided into two groups in which 90% (9,171 cases) of the dataset is
used in the training and testing stages for the models and the remaining 10% (1,020 cases) from the dataset
for the deployment stage. We then partitioned 9,171 cases into 6,389 cases (70% of data) for training the
model. The remaining 2,782 cases (30% of data) were used for testing the performance of the models.
Six different machine learning models, namely logistic regression (enter, forward and backward),
NN (MLP and RBF), SVM (sigmoid, RBF, linear, and polynomial), TAN, random forest (number of trees:
50 and 100) and decision tree (CHAID, CART, and C5) and a boosting algorithm (AdaBoost with CHAID,
CART and C5) were evaluated using several performance metrics. Each model was evaluated for different
settings and the performance results are shown in Table 1.
5. Int J Elec & Comp Eng ISSN: 2088-8708
Predicting automobile insurance fraud using classical and machine … (Shareh-Zulhelmi Shareh Nordin)
915
Table 1. Performance metrics of various models for the training and testing stages
Performance metric
Model Setting Accuracy Sensitivity Specificity Precision Error AUC Gini
Logistic regression Enter 78.21
(79.22)
40.38
(41.76)
91.91
(92.50)
63.92
(66.38)
21.79
(20.18)
0.82
(0.81)
0.63
(0.62)
Forward 78.34
(79.19)
41.03
(41.62)
91.74
(92.50)
64.11
(66.30)
21.66
(20.81)
0.82
(0.81)
0.63
(0.62)
Backward 78.34
(79.19)
41.03
(41.62)
91.74
(92.50)
64.11
(66.30)
21.66
(20.81)
0.82
(0.81)
0.63
(0.62)
NN MLP 78.38
(78.07)
41.39
(41.21)
91.68
(91.14)
64.13
(62.24)
21.62
(21.93)
0.80
(0.80)
0.61
(0.59)
RBF 74.38
(75.74)
21.49
(22.80)
93.38
(94.50)
53.86
(59.50)
25.62
(24.26)
0.74
(0.74)
0.47
(0.48)
SVM Sigmoid 73.57
(73.81)
0.12
(0.00)
100
(99.95)
100
(0.00)
26.43
(26.19)
0.59
(0.62)
0.17
(0.25)
RBF 86.41
(76.89)
63.17
(45.74)
94.77
(87.93)
81.26
(57.31)
13.69
(23.11)
0.91
(0.77)
0.82
(0.55)
Linear 78.42
(79.26)
39.55
(40.93)
92.38
(92.84)
65.11
(66.97)
21.58
(20.74)
0.81
(0.81)
0.61
(0.62)
Polynomial 98.69
(67.90)
96.27
(46.70)
99.55
(75.41)
98.72
(40.24)
1.31
(32.10)
0.99
(0.66)
0.99
(0.31)
TAN 79.09
(79.35)
44.05
(44.70)
91.68
(91.62)
65.55
(65.39)
20.91
(20.65)
0.82
(0.81)
0.64
(0.62)
Random forest Tree = 50 100
(78.32)
100
(35.16)
100
(93.62)
100
(66.15)
0.00
(21.68)
1.00
(0.80)
1.00
(0.60)
Tree = 100 100
(78.36)
100
(33.79)
100
(94.16)
100
(67.21)
0.00
(21.64)
1.00
(0.80)
1.00
(0.59)
Decision tree CHAID 86.18
(75.84)
35.46
(37.23)
90.81
(89.53)
58.10
(55.76)
23.82
(24.16)
0.78
(0.76)
0.56
(0.53)
CART 75.88
(76.74)
28.24
(30.36)
93.00
(93.18)
59.18
(61.22)
24.12
(23.26)
0.75
(0.77)
0.50
(0.53)
C5 84.38
(75.77)
50.74
(35.99)
96.47
(89.87)
83.77
(55.74)
15.62
(24.23)
0.83
(0.74)
0.66
(0.49)
AdaBoost CHAID_boost 79.84
(75.41)
50.56
(43.82)
90.36
(86.61)
65.34
(53.70)
20.16
(24.59)
0.83
(0.77)
0.65
(0.54)
CART_boost 78.21
(77.75)
46.18
(44.23)
89.72
(89.63)
61.76
(60.19)
21.79
(22.25)
0.78
(0.78)
0.56
(0.56)
C5_boost 91.72
(77.71)
73.83
(46.15)
98.15
(88.90)
93.48
(59.57)
8.28
(22.29)
0.94
(0.80)
0.89
(0.59)
Note: i) The values of accuracy, sensitivity, specificity, precision, and error are given in percentage; ii) Values in parentheses
are the results for the testing stage.
Table 1 shows the seven performance metrics results of various models at the training and testing
stages. For the logistic regression model, the result shows that all variables are significant at the 5% level
except age, income, parent1, home_val and car_age which are useful to form the logistic regression model
regardless of the selection methods. Thus, the performance metrics also show merely identical results for the
training and testing stages regardless of the selection methods. Figure 1 shows that the ROC curves of enter,
forward and backward stepwise selection are identical for the training and testing stages. As all enter,
forward and backward also provide identical AUC and Gini values in both stages, a logistic regression model
with enter setting was selected for comparison with other models.
Note:
$L-CLAIM_FLAG=enter;
$L1-CLAIM_FLAG=forward;
$L2-CLAIM_FLAG=backward
Figure 1. ROC curves of the logistic regression models
6. ISSN: 2088-8708
Int J Elec & Comp Eng, Vol. 14, No. 1, February 2024: 911-921
916
For the neural network model, urbancity, mvr_points, and travtime are ranked the top three important
variables based on MLP, while mvr_points, age, and travtime are ranked the top three based on RBF. The NN
model based on MLP performed better than RBF in terms of accuracy, sensitivity, precision, misclassification
rate, AUC, and Gini for both stages. Although the model based on RBF has a slightly higher specificity rate
than the MLP, the overall performance of MLP is still better than that based on RBF. In addition, the ROC
curves given in Figure 2 show that the MLP (light blue line) model has a higher curve than RBF (maroon line)
model. As a result, the NN model based on MLP was selected to be compared with other models.
Note:
$N-CLAIM_FLAG=MLP;
$N1-CLAIM_FLAG=RBF
Figure 2. ROC curves of the NN models based on MLP and RBF
The performance metrics for the SVM model using four different kernels were then evaluated. All
variables are identically important for SVM (sigmoid). The top three important variables are education,
occupation, and homekids for SVM (RBF), while homekids, car_type, and mstatus are important variables for
SVM (polynomial) and education, mstatus, and occupation are the important variables for SVM (linear).
Among these kernels, the linear kernel, in general, outperforms other kernels in terms of their overall metrics
for the testing stage. The sigmoid kernel performs the worst in terms of its sensitivity and precision (0.00%).
Among these kernels, the linear kernel outperformed the sigmoid, RBF, and polynomial kernels in terms of
testing accuracy (79.26%), precision (66.97%), misclassification rate (20.74%), AUC (0.81), and Gini
(0.616). The sigmoid kernel has higher specificity (99.95%) but the sensitivity and precision are very low
indicating that the SVM (sigmoid) is unable to predict the true positive effectively. The result shows an
overfitting issue for SVM (polynomial) as the performance metrics are very much lower for the testing stage.
In Figure 3, the ROC curves show that the linear (maroon line) kernel has a consistent curve in both stages
compared to RBF (light blue line), sigmoid (blue line), and polynomial (green line) models. Hence, SVM
with the linear kernel is selected to be compared with other models.
Note:
$S-CLAIM_FLAG=RBF;
$S1-CLAIM_FLAG=linear;
$S2-CLAIM_FLAG=sigmoid;
$S3-CLAIM_FLAG=polynomial
Figure 3. ROC curves of the SVM models
All variables are equally important for the TAN model. The performance metrics are given by accuracy
(79.35%), sensitivity (44.70%), specificity (91.62%), precision (65.39%), misclassification rate (20.65%), AUC
7. Int J Elec & Comp Eng ISSN: 2088-8708
Predicting automobile insurance fraud using classical and machine … (Shareh-Zulhelmi Shareh Nordin)
917
(0.809) and Gini (0.617) during the testing stage. Figure 4 shows the ROC curves have a similar pattern curve in
the training and testing stages which indicates that the model has a consistent performance.
Figure 4. ROC curves of the TAN model
The random forest model with the number of trees 50 and 100 shows that the top three important
variables are age, travtime, and yoj. All three models provide perfect accuracy, sensitivity, specificity, and
precision at the training stage. However, the accuracy, sensitivity, specificity, and precision decrease at the
testing stage regardless of the number of trees used which indicates that the model is overfitting. The ROC
curves in Figure 5 show that the curve of each model is not consistent at both stages. Moreover, the AUC values
for both stages are inconsistent. As a result, the model is not an ideal model for this insurance fraud data.
Note:
$R-CLAIM_FLAG=50;
$R1-CLAIM_FLAG=100
Figure 5. ROC curves of the random forest models
For the decision tree model using CHAID, CART and C5, the splitting criteria based on the Gini
index show that C5 produces the highest number (170) of decision rules. The most important variables are
occupation, claim_freq, and urbancity for the CHAID, claim_freq, revoked, and urbancity for the CART,
while urbancity, gender, and mstatus for the C5. From Table 1, the seven metrics performances of CHAID
and CART are consistent for both stages, while the performance of C5 is inconsistent for both stages. In the
testing stage, the accuracy (76.74%), specificity (93.18%), precision (61.22%), and misclassification rate
(23.26%) for the CART are slightly better than the CHAID. Furthermore, the AUC and Gini values for the
CART are slightly better compared to the other model settings in the testing stage. The ROC curves in
Figure 6 show that CHAID (maroon line) has slightly better curves than the CART (light blue line) and C5
(blue line) models. As a result, the decision tree with CHAID is selected for comparison with other models.
The AdaBoost algorithm in CHAID, CART, and C5 is denoted as CHAID_boost, CART_boost, and
C5_boost. The results show all three model settings gave different important variables. The top three
important variables for CHAID_Boost are claim_freq, mvr_points, and urbancity. For CART_boost, the top
three important variables include claim_freq, occupation, and mvr_points, while almost all variables are
equally important for C5_boost. For the testing stage, the accuracy (77.75%), specificity (89.63%), and
precision (60.19%) of CART_boost with the highest values. Also, the AUC and Gini values for CART_boost
8. ISSN: 2088-8708
Int J Elec & Comp Eng, Vol. 14, No. 1, February 2024: 911-921
918
are more consistent for both stages. The ROC curves from Figure 7 show that CART_boost (blue line) curve
is more consistent than CHAID_boost (maroon line) and C5_boost (light blue line) curves. Thus,
CART_boost is selected for comparison with other models. The result shows the AdaBoost algorithm
improves the performance of the decision tree.
Note:
$R-CLAIM_FLAG=C5;
$R1-CLAIM_FLAG=CHAID,
$C-CLAIM_FLAG=CART
Figure 6. ROC curves for the decision tree models
Note:
$C-CLAIM_FLAG=C5;
$R-CLAIM_FLAG=CHAID,
$R1-CLAIM_FLAG=CART
Figure 7. ROC curves for the AdaBoost models
Finally, we compare the performances among all selected models which are the logistic regression
(backward), NN (MLP), SVM (linear), TAN, random forest (100), and CART_boost. The logistic regression
(backward), SVM (linear), and TAN shared almost similar and higher accuracies which are 79.19%, 79.26%,
and 79.35% respectively compared to the CART_boost, NN (MLP), and random forest which are 77.75%,
78.07%, and 78.15% respectively. Overall, the TAN model has the highest testing accuracy (79.35%) and
sensitivity (44.70%). Thus, the TAN model was selected for deployment using 10% data (1,020 cases).
Results in Table 2 show that the performance of the model at the deployment stage is quite similar to the
testing stage.
Table 2. Performance metrics of TAN model for the deployment stage
Model Performance metric
Accuracy Sensitivity Specificity Precision Error
TAN 78.92 44.16 91.69 66.12 21.98
5. CONCLUSION
Machine learning models are useful tools for classification and prediction problems. The
performance of seven machine learning models, namely the logistic regression, decision tree, NN, TAN,
SVM, random forest, and AdaBoost for decision tree with different model settings were compared using a
publicly available automobile insurance fraudulent claims dataset. This study found that the TAN model has
better classification performance than the other models. The result of this study concurs with other studies
9. Int J Elec & Comp Eng ISSN: 2088-8708
Predicting automobile insurance fraud using classical and machine … (Shareh-Zulhelmi Shareh Nordin)
919
that random forests are prone to overfitting issues. In addition, the result shows that the AdaBoost algorithm
can improve the classification performance of the decision tree. In this study, the sensitivity of all machine
learning models was less than 50%. This could be due to the fact that the dataset is slightly imbalanced with
only 26.41% fraudulent cases. Future works should consider sampling techniques such as synthetic minority
oversampling techniques to balance the data before applying machine learning models.
ACKNOWLEDGEMENTS
The authors thank the management of UNITAR International University for funding the publication
of this paper.
REFERENCES
[1] S. Tennyson and P. Salsas-Forn, “Claims auditing in automobile insurance: fraud detection and deterrence objectives,” Journal of
Risk & Insurance, vol. 69, no. 3, pp. 289–308, Sep. 2002, doi: 10.1111/1539-6975.00024.
[2] Y. Wang and W. Xu, “Leveraging deep learning with LDA-based text analytics to detect automobile insurance fraud,” Decision
Support Systems, vol. 105, pp. 87–95, Jan. 2018, doi: 10.1016/j.dss.2017.11.001.
[3] M. Hanafy and R. Ming, “Machine learning approaches for auto insurance big data,” Risks, vol. 9, no. 2, Feb. 2021, doi:
10.3390/risks9020042.
[4] F. Aslam, A. I. Hunjra, Z. Ftiti, W. Louhichi, and T. Shams, “Insurance fraud detection: Evidence from artificial intelligence and
machine learning,” Research in International Business and Finance, vol. 62, Dec. 2022, doi: 10.1016/j.ribaf.2022.101744.
[5] S. Viaene, R. A. Derrig, B. Baesens, and G. Dedene, “A comparison of state-of-the-art classification techniques for expert
automobile insurance claim fraud detection,” Journal of Risk & Insurance, vol. 69, no. 3, pp. 373–421, Sep. 2002, doi:
10.1111/1539-6975.00023.
[6] Y. Li, C. Yan, W. Liu, and M. Li, “Research and application of random forest model in mining automobile insurance fraud,” in
2016 12th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD), Aug.
2016, pp. 1756–1761, doi: 10.1109/FSKD.2016.7603443.
[7] G. Kowshalya and M. Nandhini, “Predicting fraudulent claims in automobile insurance,” in 2018 Second International
Conference on Inventive Communication and Computational Technologies (ICICCT), Apr. 2018, pp. 1338–1343, doi:
10.1109/ICICCT.2018.8473034.
[8] P. S. Pranavi, “Analysis of vehicle insurance data to detect fraud using machine learning,” International Journal for Research in
Applied Science and Engineering Technology, vol. 8, no. 7, pp. 2033–2038, Jul. 2020, doi: 10.22214/ijraset.2020.30734.
[9] I. M. N. Prasasti, A. Dhini, and E. Laoh, “Automobile insurance fraud detection using supervised classifiers,” in 2020
International Workshop on Big Data and Information Security (IWBIS), Oct. 2020, pp. 47–52, doi:
10.1109/IWBIS50925.2020.9255426.
[10] S. Viaene, G. Dedene, and R. Derrig, “Auto claim fraud detection using Bayesian learning neural networks,” Expert Systems with
Applications, vol. 29, no. 3, pp. 653–666, Oct. 2005, doi: 10.1016/j.eswa.2005.04.030.
[11] H. Tao, L. Zhixin, and S. Xiaodong, “Insurance fraud identification research based on fuzzy support vector machine with dual
membership,” in 2012 International Conference on Information Management, Innovation Management and Industrial
Engineering, Oct. 2012, pp. 457–460, doi: 10.1109/ICIII.2012.6340016.
[12] G. G. Sundarkumar and V. Ravi, “A novel hybrid undersampling method for mining unbalanced datasets in banking and
insurance,” Engineering Applications of Artificial Intelligence, vol. 37, pp. 368–377, Jan. 2015, doi:
10.1016/j.engappai.2014.09.019.
[13] Y. Li, C. Yan, W. Liu, and M. Li, “A principle component analysis-based random forest with the potential nearest neighbor
method for automobile insurance fraud identification,” Applied Soft Computing, vol. 70, pp. 1000–1009, Sep. 2018, doi:
10.1016/j.asoc.2017.07.027.
[14] C. Yan, Y. Li, W. Liu, M. Li, J. Chen, and L. Wang, “An artificial bee colony-based kernel ridge regression for automobile
insurance fraud identification,” Neurocomputing, vol. 393, pp. 115–125, Jun. 2020, doi: 10.1016/j.neucom.2017.12.072.
[15] C. Yan, M. Li, W. Liu, and M. Qi, “Improved adaptive genetic algorithm for the vehicle insurance fraud identification model
based on a BP neural network,” Theoretical Computer Science, vol. 817, pp. 12–23, May 2020, doi: 10.1016/j.tcs.2019.06.025.
[16] C. Gomes, Z. Jin, and H. Yang, “Insurance fraud detection with unsupervised deep learning,” Journal of Risk and Insurance,
vol. 88, no. 3, pp. 591–624, Sep. 2021, doi: 10.1111/jori.12359.
[17] M. Sathya and B. Balakumar, “Insurance fraud detection using novel machine learning technique,” International Journal of
Intelligent Systems and Applications in Engineering, vol. 10, no. 3, pp. 374–381, 2022.
[18] Y. Freund and R. E. Schapire, “A decision-theoretic generalization of on-line learning and an application to boosting,” Journal of
Computer and System Sciences, vol. 55, no. 1, pp. 119–139, Aug. 1997, doi: 10.1006/jcss.1997.1504.
[19] T. Chen and C. Guestrin, “XGBoost,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining, Aug. 2016, pp. 785–794, doi: 10.1145/2939672.2939785.
[20] J. Pesantez-Narvaez, M. Guillen, and M. Alcañiz, “Predicting motor insurance claims using telematics data-XGBoost versus
logistic regression,” Risks, vol. 7, no. 2, Jun. 2019, doi: 10.3390/risks7020070.
[21] B. Itri, Y. Mohamed, Q. Mohammed, and B. Omar, “Performance comparative study of machine learning algorithms for
automobile insurance fraud detection,” in 2019 Third International Conference on Intelligent Computing in Data Sciences
(ICDS), Oct. 2019, pp. 1–4, doi: 10.1109/ICDS47004.2019.8942277.
[22] L. Rukhsar, W. H. Bangyal, K. Nisar, and S. Nisar, “Prediction of insurance fraud detection using machine learning algorithms,”
Mehran University Research Journal of Engineering and Technology, vol. 41, no. 1, pp. 33–40, Jan. 2022, doi:
10.22581/muet1982.2201.04.
[23] S. Abdelhadi, K. Elbahnasy, and M. Abdelsalam, “A proposed model to predict auto insurance claims using machine learning
techniques,” Journal of Theoretical and Applied Information Technology, vol. 98, no. 22, 2020.
[24] E. W. T. Ngai, Y. Hu, Y. H. Wong, Y. Chen, and X. Sun, “The application of data mining techniques in financial fraud detection:
A classification framework and an academic review of literature,” Decision Support Systems, vol. 50, no. 3, pp. 559–569, Feb.
2011, doi: 10.1016/j.dss.2010.08.006.
10. ISSN: 2088-8708
Int J Elec & Comp Eng, Vol. 14, No. 1, February 2024: 911-921
920
[25] A. Abdallah, M. A. Maarof, and A. Zainal, “Fraud detection system: A survey,” Journal of Network and Computer Applications,
vol. 68, pp. 90–113, Jun. 2016, doi: 10.1016/j.jnca.2016.04.007.
[26] W. Hilal, S. A. Gadsden, and J. Yawney, “Financial fraud: a review of anomaly detection techniques and recent advances,” Expert
Systems with Applications, vol. 193, May 2022, doi: 10.1016/j.eswa.2021.116429.
[27] X. Mengsun, “Car insurance claim data,” Kaggle. https://www.kaggle.com/datasets/xiaomengsun/car-insurance-claim-data
(accessed Jan. 12, 2021).
[28] M. K. Severino and Y. Peng, “Machine learning algorithms for fraud prediction in property insurance: Empirical evidence using
real-world microdata,” Machine Learning with Applications, vol. 5, Sep. 2021, doi: 10.1016/j.mlwa.2021.100074.
[29] M.-W. Hsu, S. Lessmann, M.-C. Sung, T. Ma, and J. E. V. Johnson, “Bridging the divide in financial market forecasting: machine
learners vs. financial economists,” Expert Systems with Applications, vol. 61, pp. 215–234, Nov. 2016, doi:
10.1016/j.eswa.2016.05.033.
[30] N. A. M. Salim et al., “Prediction of dengue outbreak in Selangor Malaysia using machine learning techniques,” Scientific
Reports, vol. 11, no. 1, Jan. 2021, doi: 10.1038/s41598-020-79193-2.
[31] C. Cortes and V. Vapnik, “Support-vector networks,” Machine Learning, vol. 20, no. 3, pp. 273–297, Sep. 1995, doi:
10.1007/BF00994018.
[32] P.-N. Tan, M. Steinbach, A. Karpatne, and V. Kumar, “Introduction to data mining,” Pearson Education India, 2016.
[33] N. Friedman, D. Geiger, and M. Goldszmidt, “Bayesian network classifiers,” Machine Learning, vol. 29, pp. 131–163, 1997.
[34] V. Svetnik, A. Liaw, C. Tong, J. C. Culberson, R. P. Sheridan, and B. P. Feuston, “Random forest: a classification and regression
tool for compound classification and QSAR modeling,” Journal of Chemical Information and Computer Sciences, vol. 43, no. 6,
pp. 1947–1958, Nov. 2003, doi: 10.1021/ci034160g.
[35] B. W. Yap, K. A. Rani, H. A. A. Rahman, S. Fong, Z. Khairudin, and N. N. Abdullah, “An application of oversampling,
undersampling, bagging and boosting in handling imbalanced datasets,” in Lecture Notes in Electrical Engineering, Springer
Singapore, 2014, pp. 13–22.
[36] B. M. Henrique, V. A. Sobreiro, and H. Kimura, “Literature review: Machine learning techniques applied to financial market
prediction,” Expert Systems with Applications, vol. 124, pp. 226–251, Jun. 2019, doi: 10.1016/j.eswa.2019.01.012.
[37] M. Naser and A. Alavi, “Insights into performance fitness and error metrics for machine learning,” arXiv preprint
arXiv:2006.00887, 2021.
BIOGRAPHIES OF AUTHORS
Shareh-Zulhelmi Shareh Nordin received a B.Sc. degree in science majoring in
physics from Universiti Putra Malaysia (UPM), Malaysia, in 2018 and an M.S. degree in data
science from Universiti Teknologi MARA (UiTM), Malaysia, in 2021. Currently, he is a data
scientist at the Department of Data Science, Petronas Digital Sdn Bhd. His research interests
include supervised machine learning, classification, boosting algorithms, and statistical
analysis. He can be contacted at sharehzulhelmi@gmail.com.
Yap Bee Wah holds a Ph.D. in statistics from University of Malaya, Malaysia.
She has more than 30 years of service at Universiti Teknologi MARA, Malaysia, and recently
joined UNITAR International University as Director of the Research and Consultancy Centre.
Her research interests are in big data analytics and data science, computational statistics, and
multivariate data analysis. She was the conference chair for The International Conference on
Soft Computing in Data Science (SCDS) which was held from 2015-2019 and 2021. She was
also the conference chair for DaSET2022: International Conference on Data Science and
Emerging Technologies. She has served as Guest Editor for Applied Soft Computing and
Pertanika Journal of Science and Technology. Her research works are in the healthcare,
education, environment, and business domains. She has published more than 100 papers in
indexed journals and proceedings. She can be contacted at bee.wah@unitar.my.
Ng Kok Haur received a Bachelor of Science in Mathematics (2000) and a
Master of Science in Statistics (2002) from Universiti Putra Malaysia and a Ph.D. in Statistics
(2006) from Universiti Malaya. He is currently an associate professor at the Institute of
Mathematical Sciences, Faculty of Science, Universiti Malaya. His research interests include
volatility modeling and applications, modeling and analysis of high-frequency data arising in
financial economics, and statistical process control. He has published research papers in
various journals including The North American Journal of Economics and Finance, Studies in
Nonlinear Dynamics and Econometrics, International Review of Economics and Finance,
International Journal of Industrial Engineering: Theory, Applications, and Practice, Expert
Systems with Applications, and Communications in Statistics: Theory and Methods. Apart
from that, he also serves as an associate editor and reviewer for journals, guest editor for a
special issue of the Malaysian Journal of Science (2019), thesis examiner, and a committee
member of local and international conferences. He can be contacted at kokhaur@um.edu.my.
11. Int J Elec & Comp Eng ISSN: 2088-8708
Predicting automobile insurance fraud using classical and machine … (Shareh-Zulhelmi Shareh Nordin)
921
Asmawi Hashim received degrees (1999) and master's (2002) in economics from
Universiti Utara Malaysia and a Ph.D. in economics (Finance) from Sultan Idris Education
University (2021). He is currently a Senior Lecturer at Sultan Idris Education University,
Perak. He has authored or coauthored multiple number of articles published in
refereed/indexed journals and conference papers, book chapters, and books. He has published
more than 90+ papers in indexed journals and proceedings. His areas of expertise are financial
economics, managerial economics, economic development, and macroeconomics. He can be
contacted at asmawi@fpe.upsi.edu.my.
Norimah Rambeli received degrees (2002) and master’s degrees (2004) in
economics from University Kebangsaan Malaysia (UKM) and her Ph.D. in economics from
University of Southampton, United Kingdom (2012). She is currently an associate professor at
Universiti Pendidikan Sultan Idris (UPSI), Perak under the Economic Department, Faculty of
Management and Economics (FPE). She is currently an editor team for Management Research
Journal (MRJ), UPSI, and has actively published more than 174 articles in refereed/indexed
journals and conference papers, book chapters, and books. Her areas of expertise are
environmental economics, financial economics, applied econometrics, and time series data
studies. She can be contacted at norimah@gmail.com.
Norasibah Abdul Jalil received degrees in economics from Texas Tech
University, USA (1989), Master's (2000), and a Ph.D. in economics from International Islamic
University (2010). She is currently an associate professor at Universiti Pendidikan Sultan Idris,
Perak. She has authored or coauthored multiple number of articles published in
refereed/indexed journals and conference papers, book chapters, and books. Her areas of
expertise are macroeconomics, financial economics, and economic development. She can be
contacted at norasibah.abduljalil@gmail.com.