Machine learning models are widely used in numerous classification problems, and performance measures play a critical role in machine learning model development, selection, and evaluation. This paper provides a comprehensive overview of performance measures in machine learning classification. In addition, we propose a framework to construct a novel evaluation metric based on the voting results of three performance measures, each of which has strengths and limitations. The new metric can be proven better than accuracy in terms of consistency and discriminancy.
PREDICTING CLASS-IMBALANCED BUSINESS RISK USING RESAMPLING, REGULARIZATION, A... - IJMIT JOURNAL
We aim to develop and improve imbalanced business risk modeling by jointly using proper
evaluation criteria, resampling, cross-validation, classifier regularization, and ensembling techniques.
Area Under the Receiver Operating Characteristic Curve (AUC of ROC) is used for model comparison
based on 10-fold cross validation. Two undersampling strategies including random undersampling (RUS)
and cluster centroid undersampling (CCUS), as well as two oversampling methods including random
oversampling (ROS) and Synthetic Minority Oversampling Technique (SMOTE), are applied. Three highly
interpretable classifiers, including logistic regression without regularization (LR), L1-regularized LR
(L1LR), and decision tree (DT) are implemented. Two ensembling techniques, including Bagging and
Boosting, are applied on the DT classifier for further model improvement. The results show that Boosting
on DT using data oversampled to 50% positives via SMOTE is the optimal model, achieving AUC, recall,
and F1 scores of 0.8633, 0.9260, and 0.8907, respectively.
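As a rough illustration of the resampling step, here is a minimal pure-Python sketch of random oversampling (ROS) to a 50% positive ratio. The paper's best-performing SMOTE additionally interpolates synthetic minority points rather than duplicating existing ones; the function name and toy data here are purely illustrative.

```python
import random

def random_oversample(X, y, target_pos_ratio=0.5, seed=0):
    """Randomly duplicate minority (positive) samples until they make
    up target_pos_ratio of the resampled data (plain ROS, not SMOTE)."""
    rng = random.Random(seed)
    pos = [(x, label) for x, label in zip(X, y) if label == 1]
    neg = [(x, label) for x, label in zip(X, y) if label == 0]
    # Solve n_pos / (n_pos + n_neg) = ratio for the target positive count.
    n_pos_target = int(target_pos_ratio * len(neg) / (1 - target_pos_ratio))
    extra = [rng.choice(pos) for _ in range(n_pos_target - len(pos))]
    data = pos + extra + neg
    rng.shuffle(data)
    X_rs, y_rs = zip(*data)
    return list(X_rs), list(y_rs)

# Tiny imbalanced toy set: 2 positives, 8 negatives.
X = [[i] for i in range(10)]
y = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
X_rs, y_rs = random_oversample(X, y)
# Positives now make up half of the resampled data.
```

After resampling, each candidate classifier would be compared via AUC under 10-fold cross-validation as the abstract describes.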
Sensitivity analysis is the study of how uncertainty in the inputs of a mathematical model propagates to uncertainty in the model's outputs. It is useful for understanding relationships between inputs and outputs, identifying important inputs, and reducing uncertainty. Sensitivity analysis typically involves running the model many times while varying inputs, and calculating sensitivity measures from the resulting outputs to determine which inputs most influence uncertainty in the outputs. Common methods include variance-based approaches and screening methods.
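The run-the-model-many-times idea can be sketched with a simple one-at-a-time (screening-style) perturbation; the model function and step sizes below are hypothetical.

```python
def one_at_a_time_sensitivity(model, baseline, deltas):
    """Perturb each input in turn by +/- delta around the baseline and
    record the output swing: a crude screening-style sensitivity measure."""
    swings = {}
    for name, delta in deltas.items():
        hi = dict(baseline, **{name: baseline[name] + delta})
        lo = dict(baseline, **{name: baseline[name] - delta})
        swings[name] = abs(model(**hi) - model(**lo))
    return swings

# Hypothetical model whose output depends strongly on a, weakly on b.
model = lambda a, b: 10 * a + 0.1 * b
swings = one_at_a_time_sensitivity(model, {"a": 1.0, "b": 1.0},
                                   {"a": 0.1, "b": 0.1})
# swings["a"] is 100x larger than swings["b"]: a dominates the output.
```

Variance-based methods generalize this by sampling all inputs jointly and attributing output variance to each input, but the screening loop above conveys the core mechanism.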
This document presents information on sensitivity analysis techniques. It discusses how sensitivity analysis is used to determine how changes to independent variables impact dependent variables given certain assumptions. It also describes how sensitivity analysis can predict outcomes if a situation differs from key predictions. Various sensitivity analysis methods are outlined, including correlation and screening techniques, regression analysis, and analyzing oscillations through measuring behavior patterns like period and amplitude. An example of applying sensitivity analysis to a simple supply chain model is also provided.
Data analysis is a process that involves gathering, modeling, and transforming data with the goal of highlighting useful information, suggesting conclusions, and supporting decision making. The document describes several major techniques for data analysis, including correlation analysis, regression analysis, factor analysis, cluster analysis, correspondence analysis, conjoint analysis, CHAID analysis, discriminant/logistic regression analysis, multidimensional scaling, and structural equation modeling.
Factor analysis is a statistical technique used to simplify large sets of variables into underlying factors or subsets. It analyzes all variables together without distinguishing between dependent and independent variables. Factor analysis can help understand media habits, identify price-sensitive customer segments, determine important brand attributes, and understand distribution channel selection criteria. The technique benefits research by providing a more concise representation of the situation, potentially reducing future survey questions, and enabling perceptual maps.
Evaluation measures for models assessment over imbalanced data sets - Alexander Decker
This document discusses various evaluation measures that can be used to assess models trained on imbalanced data sets, where one class is significantly underrepresented compared to the others. It begins by introducing common measures like accuracy, precision, recall, and F1 score, which can be misleading for imbalanced data. It then describes alternative combined measures, including G-mean, likelihood ratios, discriminant power, F-measure, balanced accuracy, the Youden index, and the Matthews correlation coefficient, that provide a more credible evaluation for imbalanced data. Finally, it discusses graphical measures like ROC curves, AUC, lift charts, and cumulative gain curves that can be used for model comparison and evaluation. The document aims to provide a comprehensive overview of appropriate evaluation measures for assessing models trained on imbalanced data sets.
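Several of these combined measures follow directly from the four confusion-matrix counts. The sketch below contrasts them with accuracy for a degenerate classifier that always predicts the majority class; the counts are made up for illustration.

```python
import math

def imbalance_metrics(tp, fp, fn, tn):
    """Combined measures for imbalanced data, from confusion-matrix counts."""
    total = tp + fp + fn + tn
    sens = tp / (tp + fn)        # sensitivity: recall on the positive class
    spec = tn / (tn + fp)        # specificity: recall on the negative class
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "g_mean": math.sqrt(sens * spec),
        "balanced_accuracy": (sens + spec) / 2,
        "youden_index": sens + spec - 1,
        # MCC is conventionally 0 when its denominator vanishes.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }

# A classifier that always predicts the 90%-majority negative class:
m = imbalance_metrics(tp=0, fp=0, fn=10, tn=90)
# Accuracy looks strong (0.9) while every imbalance-aware measure flags failure.
```

This is exactly the pathology the document describes: accuracy rewards ignoring the minority class, while G-mean, Youden index, and MCC all collapse to their worst values.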
Descriptive statistics are methods of describing the characteristics of a data set. They include calculating quantities such as the average of the data, its spread, and the shape of its distribution.
The document is an introduction to sensitivity analysis that contains three exploratory exercises demonstrating how changes to parameter values affect system behavior in system dynamics models. The first exercise explores a lemonade stand model and finds that while parameter changes alter the appearance of behavior, they do not change the overall behavior mode. The second exercise on an epidemics model shows that different parameter changes can create different types of behavior changes. The exercises are intended to help readers understand how to identify important parameters for sensitivity testing and how parameter values can influence system dynamics.
Data analysis involves editing, coding, sorting, and entering data to discover useful information. The process includes gathering data from various sources, reviewing it, and analyzing it to form conclusions. Specifically, editing ensures quality by reviewing collected data, coding categorizes data for computer analysis, sorting arranges data in a meaningful order for comprehension, and data entry inputs collected information into a computer. Finally, a master chart compiles all essential data in one format.
The document compares different statistical significance tests for evaluating information retrieval systems:
1) Randomization, bootstrap, and Student's t-test produced similar significance values and are recommended.
2) The Wilcoxon and sign tests produced different p-values and can incorrectly predict or fail to detect significant differences between systems.
3) The randomization test is recommended as it can use any evaluation metric and does not assume a specific distribution of test statistics.
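A minimal sketch of such a paired randomization test on per-query effectiveness scores; the scores below are invented for illustration.

```python
import random

def randomization_test(scores_a, scores_b, n_perm=10000, seed=0):
    """Paired randomization (permutation) test: under the null hypothesis
    the system labels are exchangeable per query, so randomly flip each
    pair's sign and count how often the absolute mean difference meets
    or exceeds the observed one."""
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    observed = abs(sum(diffs)) / n
    hits = 0
    for _ in range(n_perm):
        perm = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(perm) / n >= observed:
            hits += 1
    return hits / n_perm  # two-sided p-value estimate

# Hypothetical per-query average-precision scores for two retrieval systems:
a = [0.61, 0.58, 0.72, 0.65, 0.70, 0.55, 0.68, 0.63]
b = [0.52, 0.50, 0.60, 0.57, 0.62, 0.49, 0.58, 0.54]
p = randomization_test(a, b)
```

Because the test permutes the observed scores themselves, it works with any evaluation metric and makes no distributional assumption, which is exactly why the document recommends it.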
Evaluation metrics play a critical role in achieving the optimal classifier during classification training.
Thus, selecting a suitable evaluation metric is an important key to discriminating and obtaining the
optimal classifier. This paper systematically reviews the evaluation metrics that are specifically
designed as discriminators for optimizing generative classifiers. Generally, many generative classifiers
employ accuracy as the measure for discriminating the optimal solution during classification training.
However, accuracy has several weaknesses: it is less distinctive, less discriminable, less
informative, and biased toward majority-class data. This paper also briefly discusses other metrics that are
specifically designed for discriminating the optimal solution, together with their shortcomings.
Finally, this paper suggests five important aspects that must be taken into consideration
in constructing a new discriminator metric.
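The "less discriminable" weakness can be shown in a few lines: two classifiers with completely different minority-class behavior can receive identical accuracy. The data below are synthetic.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    return tp / sum(t == 1 for t in y_true)

# 90 majority (0) and 10 minority (1) examples.
y_true = [0] * 90 + [1] * 10

clf_majority = [0] * 100                        # ignores the minority class
clf_informed = [0] * 80 + [1] * 10 + [1] * 10   # 10 false alarms, all positives found

# Accuracy cannot tell the two apart (0.9 for both)...
acc_a, acc_b = accuracy(y_true, clf_majority), accuracy(y_true, clf_informed)
# ...but minority-class recall separates them completely (0.0 vs 1.0).
rec_a, rec_b = recall(y_true, clf_majority), recall(y_true, clf_informed)
```

A training procedure that optimizes accuracy alone has no gradient between these two solutions, which is the discriminability failure the paper describes.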
What-if Analysis, Goal Seek Analysis, Sensitivity Analysis, Optimization Analysi... - Sourav Das
The document discusses various analytical techniques used in data analysis:
1) What-if analysis allows making hypothetical changes to variables to observe impact on results, including scenario analysis, data tables analysis, and goal seek analysis.
2) Sensitivity analysis determines the effect of changes to independent variables on dependent variables under certain assumptions, including partial sensitivity analysis and best/worst case scenarios.
3) Optimization analysis finds optimal values for target variables given constraints by changing other variables repeatedly.
4) Data mining discovers interesting patterns from large datasets and involves exploration, model building/validation, and deployment stages.
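Goal seek, for instance, amounts to inverting a model: searching for the input that produces a target output. A minimal bisection sketch, assuming the model is monotone on the search interval; the revenue function is a made-up example.

```python
def goal_seek(f, target, lo, hi, tol=1e-9):
    """Find x in [lo, hi] with f(x) == target by bisection,
    assuming f is monotone on the interval."""
    increasing = f(hi) > f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical what-if model: revenue(p) = p * (100 - 2p), demand falls with
# price. Goal seek: what unit price on [0, 25] yields revenue of 1200?
x = goal_seek(lambda p: p * (100 - 2 * p), 1200, 0, 25)
```

Spreadsheet goal-seek tools implement essentially this search (typically with faster root-finding methods) behind the dialog box.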
There are two main statistical techniques for comparing systems: independent sampling and correlated sampling. When comparing two systems, confidence intervals should be used. There are three possible scenarios when computing confidence intervals, depending on whether the sampling is independent or correlated. When comparing several designs, the goals may be to estimate each performance measure, compare against a present system, or select the best design. The Bonferroni approach can be used to make statements about multiple alternatives while controlling the overall confidence level. Design-of-experiments tools like factorial designs, screening, and response surface methods can help in understanding the effect of design alternatives on performance measures.
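The correlated-sampling case can be sketched as a paired-t confidence interval on per-replication differences (the appropriate analysis when both designs are run with common random numbers). The run data and critical value below are illustrative.

```python
import math

def paired_ci(sample_a, sample_b, t_crit):
    """Confidence interval for the mean difference under correlated
    sampling: analyze the per-replication differences directly.
    t_crit is the Student-t critical value for n-1 degrees of freedom."""
    diffs = [a - b for a, b in zip(sample_a, sample_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    half = t_crit * math.sqrt(var / n)
    return mean - half, mean + half

# Hypothetical waiting times from two queueing designs over 5 paired runs;
# the two-sided 95% t critical value for 4 df is about 2.776.
a = [10.2, 11.1, 9.8, 10.5, 10.9]
b = [9.1, 10.0, 9.0, 9.6, 9.8]
lo, hi = paired_ci(a, b, t_crit=2.776)
# If the interval excludes 0, the designs differ significantly.
```

Pairing removes the run-to-run noise shared by both designs, which is why correlated sampling usually yields a much tighter interval than independent sampling on the same budget.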
Sensitivity analysis is the technique used to determine how changes in independent variable values impact a particular dependent variable under a given set of assumptions. It works by varying one or more input variables within specific boundaries, such as examining the effect that changes in interest rates have on a bond's price.
-What is Sensitivity Analysis in Project Risk Management?
-Example on Sensitivity Analysis
-Types of Sensitivity Analysis
-Advantages & Disadvantages
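The bond example above can be made concrete in a few lines: price a plain annual-coupon bond and watch the price respond as the yield moves, holding everything else fixed. The face value, coupon, and maturity figures are illustrative.

```python
def bond_price(face, coupon_rate, yield_rate, years):
    """Present value of a plain annual-coupon bond."""
    coupon = face * coupon_rate
    pv_coupons = sum(coupon / (1 + yield_rate) ** t
                     for t in range(1, years + 1))
    pv_face = face / (1 + yield_rate) ** years
    return pv_coupons + pv_face

# One-way sensitivity: vary only the yield around the 5% coupon rate.
prices = {r: bond_price(1000, 0.05, r, 10) for r in (0.03, 0.05, 0.07)}
# Price equals par at r == coupon rate, rises below it, falls above it.
```

Tabulating the output against a swept input like this is the simplest form of partial (one-way) sensitivity analysis.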
The document discusses feature selection for machine learning models. It considers whether feature selection improves model accuracy or interpretability for random forests. The author tests different feature selection methods on a dataset about telemarketing campaigns. The results show feature selection did not improve accuracy and reduced interpretability by changing variable importance measures in unclear ways. In conclusion, feature selection may not improve accuracy or help interpret models to make causal inferences, though it can reduce features and decorrelate them. Regularization alone may suffice if the goal is just prediction.
pratik meshram - Unit 3 contemporary marketing research full notes pune univer... - Pratik Meshram
This document provides an overview of experimental designs and sampling techniques used in marketing research. It discusses different types of experimental designs including classical designs like before-after and before-after with control group designs as well as statistical designs like randomized block, Latin square, and factorial designs. It also covers topics like test marketing, which involves conducting controlled experiments in select marketplaces to test new products or marketing mix variables. The document outlines techniques for studying the effectiveness of advertising and sales promotion campaigns through analyzing past experiences, consumer surveys, pre-testing ads, and post-test research.
This document discusses preliminary data analysis techniques. It begins by explaining that data analysis is done to make sense of collected data. The basic steps of preliminary analysis are editing, coding, and tabulating data. Editing involves checking for errors and inconsistencies. Coding transforms raw data into numerical codes for analysis. Tabulation involves counting how many cases fall into each coded category. Examples of tabulations like simple counts and cross-tabulations are provided to show relationships between variables. Preliminary analysis helps detect errors and develop hypotheses for further statistical testing.
This study examines customer switching behavior for bank services. It identifies four dimensions of customer satisfaction: personal, financial, environmental, and convenience factors. The study develops and tests hypotheses about the relationship between customer satisfaction and switching likelihood. It finds that satisfaction is negatively related to switching, and that importance of a service mediates this relationship. A survey was administered and factor analysis was conducted. The results show the effect of importance varies across service categories, and that competition levels also influence switching decisions. The findings suggest banks should focus more on personal, atmospheric, and convenience factors to reduce switching.
Models of Operations Research is addressed - Sundar B N
Introduction, Meaning and Characteristics of Operations Research is addressed.
MODELS IN OPERATIONS RESEARCH: Classification of Models, degree of abstraction, Purpose Models, Predictive models, Descriptive models, Prescriptive models, Mathematical / Symbolic models, Models by nature of an environment, Models by the extent of generality, Models by Behaviour, Models by Method of Solution, Static and dynamic models, Iconic models, Analogue models.
This chapter discusses choosing appropriate statistical techniques for analyzing numerical and categorical data. For numerical variables, it identifies questions about describing characteristics, drawing conclusions about the mean/standard deviation, determining differences between groups, identifying influencing factors, predicting values, and determining stability over time. For each, it lists relevant techniques. For categorical variables, it addresses similar questions and outlines techniques like hypothesis testing, regression, and control charts. The goal is to match the right analysis to the data type and research purpose.
This document provides information about data interpretation and different ways to present data. It discusses numerical data tables including time series tables, spatial tables, frequency distribution tables, and cumulative frequency tables. Examples are given to show how to calculate capacity utilization, sales growth percentages, and solve other problems using the data in tables. Cartesian graphs are also introduced as a way to show the variation of a quantity with respect to two parameters on the X and Y axes.
Factor analysis is a statistical technique used to identify underlying factors that explain the pattern of correlations within a set of observed variables. It groups variables that are highly correlated with each other into factors to reduce data dimensionality. The key steps are extracting factors with eigenvalues greater than 1, evaluating factor loadings to interpret the grouping of variables, and rotating factors to maximize interpretability of the results. SPSS output includes correlation coefficients, KMO/Bartlett's tests of sampling adequacy, eigenvalues, communalities, scree plots, and rotated component matrices.
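The eigenvalue-extraction step has a closed form for two variables, where the correlation matrix [[1, r], [r, 1]] has eigenvalues 1 ± r. The sketch below applies the Kaiser criterion (retain factors with eigenvalue greater than 1) to a hypothetical pair of highly correlated survey items with r = 0.8.

```python
import math

def eigvals_2x2(m):
    """Eigenvalues of a symmetric 2x2 matrix, in closed form."""
    a, b, d = m[0][0], m[0][1], m[1][1]
    tr, det = a + d, a * d - b * b
    disc = math.sqrt(tr * tr / 4 - det)
    return tr / 2 + disc, tr / 2 - disc

# Correlation matrix of two highly correlated survey items (r = 0.8).
R = [[1.0, 0.8], [0.8, 1.0]]
lam1, lam2 = eigvals_2x2(R)  # 1.8 and 0.2

# Kaiser criterion: keep only factors with eigenvalue > 1.
retained = [lam for lam in (lam1, lam2) if lam > 1]
```

With r = 0.8 the two items collapse onto one retained factor (eigenvalue 1.8), which is the dimensionality reduction factor analysis is after; real analyses use many variables and iterative eigendecomposition, as in the SPSS output described above.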
3rd alex marketing club (pharmaceutical forecasting) dr. ahmed sham'a - Mahmoud Bahgat
This document summarizes a master's thesis project that investigates using break-point analysis to predict bankruptcy risk. The project was motivated by weaknesses in previous bankruptcy prediction studies, namely that they did not consider the timing of financial data or rates of change. The project applies Bayesian change-point (break-point) analysis to financial ratios of firms, to identify significant changes that may indicate distress. If break-points are found closer to a firm's bankruptcy, it could improve prediction compared to only considering financial ratio levels. The document describes the data, analytical methods, and preliminary exploration and application of break-point analysis to the ratios.
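As a rough stand-in for the break-point idea (least-squares rather than the Bayesian analysis the thesis uses), one can scan every split of a ratio series and pick the one that best explains the data as two level shifts. The quarterly current-ratio series below is invented.

```python
def best_break_point(series):
    """Pick the split that most reduces squared error versus a single
    overall mean: a simple least-squares change-point detector."""
    def sse(xs):
        if not xs:
            return 0.0
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs)

    total = sse(series)
    best_k, best_gain = None, 0.0
    for k in range(1, len(series)):
        gain = total - (sse(series[:k]) + sse(series[k:]))
        if gain > best_gain:
            best_k, best_gain = k, gain
    return best_k, best_gain

# Hypothetical quarterly current ratio with a sharp mid-series drop:
ratio = [2.1, 2.0, 2.2, 2.1, 1.1, 1.0, 1.2, 0.9]
k, gain = best_break_point(ratio)  # split detected at the drop (k = 4)
```

The thesis's hypothesis is that the *location* of such a break relative to the bankruptcy date carries predictive signal that the ratio levels alone miss.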
This document proposes a methodology for evaluating statistical classification models for churn prediction using a composite indicator. It considers factors beyond just accuracy, like robustness, speed, interpretability and ease of use. The methodology will be tested on classification models applied to real customer data from a Spanish retail company. It also analyzes the impact of different variable selection methods on model performance.
This document discusses using machine learning algorithms to predict household poverty levels. The goals are to build classification models to predict a household's poverty level as either "poor" or "non-poor" based on household attributes. Linear regression is proposed as the modeling algorithm. The document outlines collecting and preprocessing a household dataset, feature selection, model training and evaluation using metrics like MSE, RMSE and R-squared. References are provided on related work applying machine learning to poverty prediction using household surveys and satellite imagery.
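The evaluation metrics named there (MSE, RMSE, R-squared) are a few lines of arithmetic; the true and predicted values below are made up.

```python
import math

def regression_metrics(y_true, y_pred):
    """MSE, RMSE, and R-squared for a set of predictions."""
    n = len(y_true)
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    mean = sum(y_true) / n
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    r2 = 1 - (mse * n) / ss_tot   # 1 - SS_res / SS_tot
    return {"mse": mse, "rmse": math.sqrt(mse), "r2": r2}

# Hypothetical predicted vs. actual poverty-index scores:
m = regression_metrics([3.0, 1.0, 2.0, 4.0], [2.5, 1.5, 2.0, 3.5])
```

Note these are regression metrics; for the stated poor/non-poor classification goal, classification measures (accuracy on balanced data, or the imbalance-aware measures discussed elsewhere in this listing) would be the more natural fit.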
This document summarizes an article that proposes a novel cost-free learning (CFL) approach called ABC-SVM to address the class imbalance problem. The approach aims to maximize the normalized mutual information of the predicted and actual classes to balance errors and rejects without requiring cost information. It optimizes misclassification costs, SVM parameters, and feature selection simultaneously using an artificial bee colony algorithm. Experimental results on several datasets show the method performs effectively compared to sampling techniques for class imbalance.
PREDICTING CLASS-IMBALANCED BUSINESS RISK USING RESAMPLING, REGULARIZATION, A... - IJMIT JOURNAL
This document summarizes research on predicting class-imbalanced business risk using resampling, regularization, and model ensembling algorithms. It reviews relevant techniques for handling class imbalance, including using the AUC evaluation metric, resampling methods like undersampling and oversampling, tuning the positive ratio, cross-validation, regularization for logistic regression, decision trees, and ensemble methods. The study aims to develop an optimal risk prediction model by jointly applying these techniques, with results showing that boosting on decision trees using oversampled data achieves the best performance.
The document is an introduction to sensitivity analysis that contains three exploratory exercises demonstrating how changes to parameter values affect system behavior in system dynamics models. The first exercise explores a lemonade stand model and finds that while parameter changes alter the appearance of behavior, they do not change the overall behavior mode. The second exercise on an epidemics model shows that different parameter changes can create different types of behavior changes. The exercises are intended to help readers understand how to identify important parameters for sensitivity testing and how parameter values can influence system dynamics.
Data analysis involves editing, coding, sorting, and entering data to discover useful information. The process includes gathering data from various sources, reviewing it, and analyzing it to form conclusions. Specifically, editing ensures quality by reviewing collected data, coding categorizes data for computer analysis, sorting arranges data in a meaningful order for comprehension, and data entry inputs collected information into a computer. Finally, a master chart compiles all essential data in one format.
The document compares different statistical significance tests for evaluating information retrieval systems:
1) Randomization, bootstrap, and Student's t-test produced similar significance values and are recommended.
2) The Wilcoxon and sign tests produced different p-values and can incorrectly predict or fail to detect significant differences between systems.
3) The randomization test is recommended as it can use any evaluation metric and does not assume a specific distribution of test statistics.
Evaluation metric plays a critical role in achieving the optimal classifier during the classification training.
Thus, a selection of suitable evaluation metric is an important key for discriminating and obtaining the
optimal classifier. This paper systematically reviewed the related evaluation metrics that are specifically
designed as a discriminator for optimizing generative classifier. Generally, many generative classifiers
employ accuracy as a measure to discriminate the optimal solution during the classification training.
However, the accuracy has several weaknesses which are less distinctiveness, less discriminability, less
informativeness and bias to majority class data. This paper also briefly discusses other metrics that are
specifically designed for discriminating the optimal solution. The shortcomings of these alternative metrics
are also discussed. Finally, this paper suggests five important aspects that must be taken into consideration
in constructing a new discriminator metric.
What if Analysis,Goal Seek Analysis,Sensitivity Analysis,Optimization Analysi...Sourav Das
The document discusses various analytical techniques used in data analysis:
1) What-if analysis allows making hypothetical changes to variables to observe impact on results, including scenario analysis, data tables analysis, and goal seek analysis.
2) Sensitivity analysis determines the effect of changes to independent variables on dependent variables under certain assumptions, including partial sensitivity analysis and best/worst case scenarios.
3) Optimization analysis finds optimal values for target variables given constraints by changing other variables repeatedly.
4) Data mining discovers interesting patterns from large datasets and involves exploration, model building/validation, and deployment stages.
There are two main statistical techniques for comparing systems: independent sampling and correlated sampling. When comparing two systems, it is necessary to use confidence intervals. There are three possible scenarios when computing confidence intervals depending on if the sampling is independent or correlated. When comparing several designs, the goals may be to estimate each performance measure, compare to a present system, or select the best. The Bonferroni approach can be used to make statements about multiple alternatives while controlling the overall confidence level. Design of experiments tools like factorial designs, screening, and response surface methods can help understand the effect of design alternatives on performance measures.
The technique used to determine how independent variable values will impact a particular dependent variable under a given set of assumptions is defined as sensitive analysis. It’s usage will depend on one or more input variables within the specific boundaries, such as the effect that changes in interest rates will have on a bond’s price.
-What is Sensitivity Analysis in Project Risk Management?
-Example on Sensitivity Analysis….
-Types of Sensitivity Analysis……
-Advantages & Disadvantages
The document discusses feature selection for machine learning models. It considers whether feature selection improves model accuracy or interpretability for random forests. The author tests different feature selection methods on a dataset about telemarketing campaigns. The results show feature selection did not improve accuracy and reduced interpretability by changing variable importance measures in unclear ways. In conclusion, feature selection may not improve accuracy or help interpret models to make causal inferences, though it can reduce features and decorrelate them. Regularization alone may suffice if the goal is just prediction.
pratik meshram -Unit 3 contemporary marketing research full notes pune univer...Pratik Meshram
This document provides an overview of experimental designs and sampling techniques used in marketing research. It discusses different types of experimental designs including classical designs like before-after and before-after with control group designs as well as statistical designs like randomized block, Latin square, and factorial designs. It also covers topics like test marketing, which involves conducting controlled experiments in select marketplaces to test new products or marketing mix variables. The document outlines techniques for studying the effectiveness of advertising and sales promotion campaigns through analyzing past experiences, consumer surveys, pre-testing ads, and post-test research.
This document discusses preliminary data analysis techniques. It begins by explaining that data analysis is done to make sense of collected data. The basic steps of preliminary analysis are editing, coding, and tabulating data. Editing involves checking for errors and inconsistencies. Coding transforms raw data into numerical codes for analysis. Tabulation involves counting how many cases fall into each coded category. Examples of tabulations like simple counts and cross-tabulations are provided to show relationships between variables. Preliminary analysis helps detect errors and develop hypotheses for further statistical testing.
This study examines customer switching behavior for bank services. It identifies four dimensions of customer satisfaction: personal, financial, environmental, and convenience factors. The study develops and tests hypotheses about the relationship between customer satisfaction and switching likelihood. It finds that satisfaction is negatively related to switching, and that importance of a service mediates this relationship. A survey was administered and factor analysis was conducted. The results show the effect of importance varies across service categories, and that competition levels also influence switching decisions. The findings suggest banks should focus more on personal, atmospheric, and convenience factors to reduce switching.
Models of Operations Research – Sundar B N
Introduction, Meaning and Characteristics of Operations Research are addressed.
MODELS IN OPERATIONS RESEARCH: classification of models, degree of abstraction, purpose models, predictive models, descriptive models, prescriptive models, mathematical/symbolic models, models by nature of the environment, models by extent of generality, models by behaviour, models by method of solution, static and dynamic models, iconic models, analogue models.
This chapter discusses choosing appropriate statistical techniques for analyzing numerical and categorical data. For numerical variables, it identifies questions about describing characteristics, drawing conclusions about the mean/standard deviation, determining differences between groups, identifying influencing factors, predicting values, and determining stability over time. For each, it lists relevant techniques. For categorical variables, it addresses similar questions and outlines techniques like hypothesis testing, regression, and control charts. The goal is to match the right analysis to the data type and research purpose.
This document provides information about data interpretation and different ways to present data. It discusses numerical data tables including time series tables, spatial tables, frequency distribution tables, and cumulative frequency tables. Examples are given to show how to calculate capacity utilization, sales growth percentages, and solve other problems using the data in tables. Cartesian graphs are also introduced as a way to show the variation of a quantity with respect to two parameters on the X and Y axes.
Factor analysis is a statistical technique used to identify underlying factors that explain the pattern of correlations within a set of observed variables. It groups variables that are highly correlated with each other into factors to reduce data dimensionality. The key steps are extracting factors with eigenvalues greater than 1, evaluating factor loadings to interpret the grouping of variables, and rotating factors to maximize interpretability of the results. SPSS output includes correlation coefficients, KMO/Bartlett's tests of sampling adequacy, eigenvalues, communalities, scree plots, and rotated component matrices.
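The eigenvalue-greater-than-1 (Kaiser) retention step described above can be illustrated directly; the sketch below builds a correlation matrix with NumPy and counts the factors to retain. The simulated one-factor dataset is invented for the example:

```python
import numpy as np

def kaiser_factors(X):
    """Return eigenvalues of the correlation matrix (descending) and the
    number of factors retained under the Kaiser criterion (eigenvalue > 1)."""
    corr = np.corrcoef(X, rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)[::-1]  # eigvalsh is ascending; reverse
    return eigvals, int(np.sum(eigvals > 1.0))

rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
# Three observed variables all loading on one latent factor, plus noise
X = np.hstack([base + 0.3 * rng.normal(size=(200, 1)) for _ in range(3)])
eigvals, n_factors = kaiser_factors(X)  # one dominant eigenvalue survives
```

Because the three variables share a single latent source, the first eigenvalue absorbs most of the common variance and only one factor is retained, mirroring what a scree plot of the same eigenvalues would show.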
3rd Alex Marketing Club (pharmaceutical forecasting), Dr. Ahmed Sham'a – Mahmoud Bahgat
This document summarizes a master's thesis project that investigates using break-point analysis to predict bankruptcy risk. The project was motivated by weaknesses in previous bankruptcy prediction studies, namely that they did not consider the timing of financial data or rates of change. The project applies Bayesian change-point (break-point) analysis to financial ratios of firms, to identify significant changes that may indicate distress. If break-points are found closer to a firm's bankruptcy, it could improve prediction compared to only considering financial ratio levels. The document describes the data, analytical methods, and preliminary exploration and application of break-point analysis to the ratios.
This document proposes a methodology for evaluating statistical classification models for churn prediction using a composite indicator. It considers factors beyond just accuracy, like robustness, speed, interpretability and ease of use. The methodology will be tested on classification models applied to real customer data from a Spanish retail company. It also analyzes the impact of different variable selection methods on model performance.
This document discusses using machine learning algorithms to predict household poverty levels. The goals are to build classification models to predict a household's poverty level as either "poor" or "non-poor" based on household attributes. Linear regression is proposed as the modeling algorithm. The document outlines collecting and preprocessing a household dataset, feature selection, model training and evaluation using metrics like MSE, RMSE and R-squared. References are provided on related work applying machine learning to poverty prediction using household surveys and satellite imagery.
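The regression metrics named in that evaluation plan (MSE, RMSE, R-squared) can be computed directly from predictions; this is a minimal sketch with made-up numbers, not the document's actual pipeline:

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE, and the R-squared coefficient of determination."""
    n = len(y_true)
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    rmse = math.sqrt(mse)
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - (mse * n) / ss_tot
    return mse, rmse, r2

# Toy targets and predictions for illustration
mse, rmse, r2 = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```

R-squared compares the model's squared error to that of a constant mean predictor, which is why a value near 1 indicates the model explains most of the variance.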
This document summarizes an article that proposes a novel cost-free learning (CFL) approach called ABC-SVM to address the class imbalance problem. The approach aims to maximize the normalized mutual information of the predicted and actual classes to balance errors and rejects without requiring cost information. It optimizes misclassification costs, SVM parameters, and feature selection simultaneously using an artificial bee colony algorithm. Experimental results on several datasets show the method performs effectively compared to sampling techniques for class imbalance.
PREDICTING CLASS-IMBALANCED BUSINESS RISK USING RESAMPLING, REGULARIZATION, A... – IJMIT JOURNAL
This document summarizes research on predicting class-imbalanced business risk using resampling, regularization, and model ensembling algorithms. It reviews relevant techniques for handling class imbalance, including using the AUC evaluation metric, resampling methods like undersampling and oversampling, tuning the positive ratio, cross-validation, regularization for logistic regression, decision trees, and ensemble methods. The study aims to develop an optimal risk prediction model by jointly applying these techniques, with results showing that boosting on decision trees using oversampled data achieves the best performance.
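The random-oversampling step reviewed here can be sketched as follows: minority-class rows are duplicated at random until positives reach a target share of the training set. This is a simplified stand-in for the paper's resampling pipeline (which also covers SMOTE and undersampling), and the toy data are hypothetical:

```python
import random

def random_oversample(X, y, target_pos_ratio=0.5, seed=42):
    """Duplicate minority-class (label 1) rows with replacement until
    positives make up target_pos_ratio of the resampled training set."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    # Solve n_pos / (n_pos + n_neg) = target for the required positive count
    needed = int(target_pos_ratio * len(neg) / (1 - target_pos_ratio))
    extra = [rng.choice(pos) for _ in range(max(0, needed - len(pos)))]
    idx = pos + neg + extra
    rng.shuffle(idx)
    return [X[i] for i in idx], [y[i] for i in idx]

X = [[i] for i in range(10)]
y = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]  # 2 positives, 8 negatives (imbalanced)
Xr, yr = random_oversample(X, y)    # resampled set is 50% positive
```

Oversampling must be applied only to the training folds during cross-validation; duplicating rows before splitting would leak copies of the same example into the hold-out set.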
On Confidence Intervals Construction for Measurement System Capability Indica... – IRJESJOURNAL
Abstract: There are many criteria that have been proposed to determine the capability of a measurement system, all based on estimates of variance components. Some of them are the Precision to Tolerance Ratio, the Signal to Noise Ratio, and the probabilities of misclassification. For most of these indicators there are no exact confidence intervals, since the exact distributions of the point estimators are not known. In such situations, two approaches are widely used to obtain approximate confidence intervals: the Modified Large Samples (MLS) methods initially proposed by Graybill and Wang, and the construction of Generalized Confidence Intervals (GCI) introduced by Weerahandi. In this work we focus on the construction of confidence intervals by the generalized approach in the context of Gauge repeatability and reproducibility studies. Since GCI are obtained by simulation procedures, we analyze the effect of the number of simulations on the variability of the confidence limits, as well as the effect of the size of the experiment designed to collect data on the precision of the estimates. Both studies allowed deriving some practical implementation guidelines in the use of the GCI approach. We finally present a real case study in which this technique was applied to evaluate the capability of a destructive measurement system.
A Classification And Comparison Of Credit Assignment Strategies In Multiobjec... – Dustin Pytko
The document discusses adaptive operator selection (AOS) strategies for multiobjective optimization problems. It introduces a classification of credit assignment strategies for evaluating operator performance in AOS. The classification groups strategies based on the sets of solutions used to assess impact and the fitness function used to compare solutions. The paper proposes five new credit assignment strategies to fill gaps in the classification. It then experimentally compares the performance of nine total credit assignment strategies on benchmark problems. The results show that most strategies improve the generality of multiobjective evolutionary algorithms compared to a random operator selector.
PROVIDING A METHOD FOR DETERMINING THE INDEX OF CUSTOMER CHURN IN INDUSTRY – IJITCA Journal
Customer churn is one of the most important issues in customer relationship management and marketing, especially in industries such as telecommunications, finance, and insurance, and much research has been done in this area in recent decades. In this research, the set of indicators describing why customers churn is of particular importance. The study provides a formula for a customer churn index so that the reasons customers churn can be better understood. To evaluate the proposed formula, six classification methods (decision tree QUEST, decision tree C5.0, decision tree CHAID, decision tree CART, Bayesian network, neural network) are compared against the individual indicators.
This document discusses performance metrics for evaluating machine learning models. It explains that metrics are used to understand how well a model performs on both the training data and new, unseen data. For classification models, common metrics include accuracy, confusion matrix, precision, recall, F1 score, and AUC. For regression models, common metrics are mean absolute error, mean squared error, R2 score, and adjusted R2. The document provides formulas and explanations for calculating and interpreting each of these important performance metrics.
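A minimal sketch of how the classification metrics listed above fall out of the 2x2 confusion matrix; the example labels are invented:

```python
def classification_metrics(y_true, y_pred):
    """Derive accuracy, precision, recall, and F1 from the confusion
    matrix of a binary classifier (positive class = 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

m = classification_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
```

Precision and recall deliberately ignore the true negatives, which is why they remain informative on imbalanced data where accuracy alone can be misleading.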
The document compares techniques for handling incomplete data when using decision trees. It investigates the robustness and accuracy of seven popular techniques when applied to different proportions, patterns and mechanisms of missing data in 21 datasets. The techniques include listwise deletion, decision tree single imputation, expectation maximization single imputation, mean/mode single imputation, and multiple imputation. The results suggest important differences between the techniques, with multiple imputation and decision tree single imputation generally performing better than the others. The choice of technique depends on factors like the amount and nature of the missing data.
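Of the single-imputation baselines compared above, mean imputation is the simplest to state; a sketch, with None marking the missing entries in a toy numeric column:

```python
def mean_impute(column):
    """Replace missing entries (None) with the mean of the observed
    values: the mean single-imputation baseline from the comparison."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]

filled = mean_impute([2.0, None, 4.0, None, 6.0])  # missing values become 4.0
```

The known drawback, which motivates multiple imputation in the study, is that filling every gap with one value shrinks the column's variance and distorts relationships with other variables.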
Software Cost Estimation Using Clustering and Ranking Scheme – Editor IJMTER
Software cost estimation is an important task in the software design and development process. Planning and budgeting are carried out with reference to the estimated cost values. A variety of software properties feed the estimation process, including hardware, product, technology, and methodology factors, and estimation quality is measured by accuracy.
Software cost estimation is carried out using three categories of techniques: regression-based models, analogy-based models, and machine learning models, each with its own set of methods. Eleven cost estimation techniques under these three categories are used in the system. Software product property values are maintained in the Attribute Relational File Format (ARFF), and the ARFF file is the main input to the system.
The proposed system performs clustering and ranking of software cost estimation methods. A non-overlapping clustering technique is enhanced with an optimal centroid estimation mechanism, which improves the accuracy of the clustering and ranking process and produces efficient rankings of software cost estimation methods.
This document summarizes a knowledge engineering approach using analytic hierarchy process (AHP) to resolve conflicts between experts in risk-related decision making. It proposes using a modified version of AHP to increase transparency in the analysis procedure. This allows identification of major causes of inter-expert discrepancy, which are differences in unstated assumptions and subjective weightings of risk factors. The document demonstrates how AHP can systematically decompose complex decision problems, evaluate alternatives based on multiple criteria, and aggregate results to provide an overall evaluation that incorporates differing expert opinions in a consistent manner.
COMPARISON OF BANKRUPTCY PREDICTION MODELS WITH PUBLIC RECORDS AND FIRMOGRAPHICS – cscpconf
Many business operations and strategies rely on bankruptcy prediction. In this paper, we aim to study the impacts of public records and firmographics and to predict bankruptcy over a 12-month-ahead period using different classification models, adding value to the traditionally used financial ratios. Univariate analysis shows the statistical association and significance of public records and firmographics indicators with bankruptcy. Further, seven statistical models and machine learning methods were developed, including Logistic Regression, Decision Tree, Random Forest, Gradient Boosting, Support Vector Machine, Bayesian Network, and Neural Network. The performance of the models was evaluated and compared based on classification accuracy, Type I error, Type II error, and ROC curves on the hold-out dataset. Moreover, an experiment was set up to show the importance of oversampling for rare-event prediction. The results also show that the Bayesian Network is comparatively more robust than the other models without oversampling.
Instance Selection and Optimization of Neural Networks – ITIIIndustries
Credit scoring is an important tool in financial institutions, used in credit-granting decisions. Credit applications are scored by credit scoring models, and those with high scores are treated as "good", while those with low scores are regarded as "bad". As data mining techniques develop, automatic credit scoring systems are warmly welcomed for their high efficiency and objective judgments. Many machine learning algorithms have been applied to training credit scoring models, and the ANN is one of them with good performance. This paper presents a higher-accuracy credit scoring model based on MLP neural networks trained with the backpropagation algorithm. Our work focuses on enhancing credit scoring models in three aspects: optimizing data distribution in datasets using a new method called Average Random Choosing; comparing the effects of training-validation-test instance counts; and finding the most suitable number of hidden units. Another contribution of this paper is summarizing the tendency of model scoring accuracy as the number of hidden units increases. The experimental results show that our methods can achieve high credit scoring accuracy on imbalanced datasets. Thus, credit-granting decisions can be made by data mining methods using MLP neural networks.
This paper provides a structured way of thinking the design and construction of a composite indicator whose purpose is to facilitate ranking EU countries by Health System Performance and/or assess their progress over time on complex and multi-dimensional health issues.
Constructing a MUSA model to determine priority factor of a SERVPERF model – Alexander Decker
This document summarizes a study that aimed to determine the priority factor among factors in a SERVPERF model for measuring student satisfaction with hostel management services. The researchers attempted to build a Multi-criteria Satisfaction Analysis (MUSA) model based on ordinal regression and linear programming to identify the priority factor. However, the MUSA model built was found to be unstable and unable to interpret the dataset. The researchers were unable to construct a stable and interpretable MUSA model for this data, even when categorizing the data in various ways.
Aon FI Risk Advisory - CCAR Variable Selection – Evan Sekeris
Operational risk projections for CCAR pose challenges as the relationships between losses and economic factors are unclear. While established models exist for market and credit risk, the best approach for operational risk is less certain. Banks must use economic scenarios but many operational risk drivers are independent of economic changes. The document discusses whether variable selection should be objective via algorithms or subjective through expert judgment based on economic theory. It acknowledges limitations of both approaches. Regulators are looking for banks to consider both objective and subjective aspects in their models. Hybrid approaches that include multiple techniques like regression analysis, scenario analysis and historical averages may be most appropriate.
Grant Selection Process Using Simple Additive Weighting Approach – ijtsrd
Selection of an educational grant is a key success factor for student learning and academic performance. This paper addresses a real problem of selecting educational grants, using data from grant application forms and one of the popular multi-criteria decision-making models, the SAW method. It introduces nine qualitative, positive criteria for selecting grants among fifteen application forms and ranking them. Finally, the proposed method is demonstrated in a case study on selecting educational grants for students. Kyi Kyi Myint, Tin Tin Soe, and Myint Myint Toe, "Grant Selection Process Using Simple Additive Weighting Approach", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN 2456-6470, Volume 3, Issue 4, June 2019. URL: https://www.ijtsrd.com/papers/ijtsrd25169.pdf
Paper URL: https://www.ijtsrd.com/computer-science/simulation/25169/grant-selection-process-using-simple-additive-weighting-approach/kyi-kyi-myint
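The SAW procedure the paper applies (normalize each benefit criterion by its column maximum, take the weighted sum, rank the alternatives) can be sketched as follows; the applicant scores and weights are invented for illustration:

```python
def saw_rank(scores, weights):
    """Simple Additive Weighting: normalize each (benefit) criterion by
    its column maximum, then rank alternatives by the weighted sum."""
    n_crit = len(weights)
    col_max = [max(row[j] for row in scores) for j in range(n_crit)]
    totals = [
        sum(w * row[j] / col_max[j] for j, w in enumerate(weights))
        for row in scores
    ]
    # Return alternative indices ordered best-first
    return sorted(range(len(scores)), key=lambda i: totals[i], reverse=True)

# Hypothetical applicants scored on three benefit criteria
scores = [[80, 70, 90], [90, 80, 60], [70, 90, 80]]
weights = [0.5, 0.3, 0.2]
ranking = saw_rank(scores, weights)  # best applicant first
```

Cost-type criteria, if present, would instead be normalized as column-minimum over value so that smaller raw scores earn higher normalized scores.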
Forecasting operational risk losses for CCAR poses a number of challenges, not least of which is uncertainty about the best methodology for modeling the relationship between a bank’s future losses and the macroeconomic environment.
Similar to A NOVEL PERFORMANCE MEASURE FOR MACHINE LEARNING CLASSIFICATION
MULTIMODAL COURSE DESIGN AND IMPLEMENTATION USING LEML AND LMS FOR INSTRUCTIO... – IJMIT JOURNAL
Traditionally, teaching has been centered on classroom delivery. However, the onslaught of the COVID-19 pandemic has cultivated the use of technology and new teaching and learning methodologies for course delivery. We investigate and describe different modes of course delivery that maintain the integrity of teaching and learning. This paper answers three research questions: 1) Which course delivery methods do our academic institutions use, and why? 2) How can instructors validate the guidelines of the institutions? 3) How should courses be taught to deliver student learning outcomes? Using the Learning Environment Modeling Language (LEML), we investigate the design and implementation of courses for delivery in the following environments: face-to-face, online synchronous, asynchronous, hybrid, and hyflex. Good course design and implementation are key components of instructional alignment. Furthermore, we demonstrate how to design, implement, and deliver courses in synchronous, asynchronous, and hybrid modes and describe our proposed enhancements to LEML.
Novel R&D Capabilities as a Response to ESG Risks - Lessons From Amazon's Fusio... – IJMIT JOURNAL
Environmental, Social, and Governance (ESG) management is essential for transforming corporate financial-performance-oriented business strategies into Finance (F) + ESG optimization strategies to achieve the Sustainable Development Goals (SDGs). In this trend, the rise of ESG risks has divided firms into two categories. The former incorporates a growth mindset that creates a passion for learning and urges the firm to improve itself through Research and Development (R&D)-driven challenges, while the latter, characterized by risk aversion, avoids highly uncertain R&D activities and seeks more manageable endeavors. This duality underscores the complexity of corporate R&D strategies in addressing ESG risks and necessitates the development of novel R&D capabilities for corporate R&D transformation strategies towards F + ESG optimization.
International Journal of Managing Information Technology (IJMIT) ** WJCI Indexed – IJMIT JOURNAL
The International Journal of Managing Information Technology (IJMIT) is a quarterly open access peer-reviewed journal that publishes articles that contribute new results in all areas of the strategic application of information technology (IT) in organizations. The journal focuses on innovative ideas and best practices in using IT to advance organizations – for-profit, non-profit, and governmental. The goal of this journal is to bring together researchers and practitioners from academia, government, and industry to focus on understanding both how to use IT to support the strategy and goals of the organization and to employ IT in new ways to foster greater collaboration, communication, and information sharing both within the organization and with its stakeholders. The International Journal of Managing Information Technology seeks to establish new collaborations, new best practices, and new theories in these areas.
NOVEL R & D CAPABILITIES AS A RESPONSE TO ESG RISKS - LESSONS FROM AMAZON'S FU... – IJMIT JOURNAL
Building on this premise, this paper conducts an empirical analysis, utilizing reliable firm data on ESG risk and brand value, with a focus on 100 global R&D leader firms. It analyzes R&D and actions for ESG risk mitigation, and assesses the development of new functions that fulfill F + ESG optimization through R&D. The analysis also highlights the significance of network externality effects, with a specific focus on Amazon, a leading R&D company, providing insights into the direction for transforming R&D strategies towards F + ESG optimization. The dynamics of stakeholder engagement in F + ESG optimization are illustrated through Amazon's activities. The analysis makes evident that Amazon's capacity for growth and scalability is accelerating high-level research and development by gaining the trust of stakeholders in the "synergy through R&D-driven ESG risk mitigation." Finally, as examples of these initiatives, the paper discusses the Climate Pledge led by Amazon and the transformation of Japan's management system.
A REVIEW OF STOCK TREND PREDICTION WITH COMBINATION OF EFFECTIVE MULTI TECHNI... – IJMIT JOURNAL
It is important for investors to understand stock trends and market conditions before trading stocks. Both capabilities are essential for an investor to maximize profit and minimize losses; without them, investors suffer losses through ignorance of stock trends and market conditions. Technical analysis helps in understanding stock price behavior with regard to past trends, the signals given by indicators, and the major turning points of the market price. This paper reviews stock trend prediction with a combination of effective multi-technical-indicator strategies to increase investment performance, taking into account global performance and the proposed combined multi-technical-indicator strategy model.
INTRUSION DETECTION SYSTEM USING CUSTOMIZED RULES FOR SNORT – IJMIT JOURNAL
This document proposes an intrusion detection system using customized rules for the Snort tool to improve security. The system uses Wireshark to scan network traffic for anomalies, Snort to detect attacks using customized rulesets for faster response times, and Wazuh and Splunk to analyze log files. Rules are created using the Snorpy tool and added to Snort to monitor for specific attacks like ICMP ping impersonation and authentication attempts. When attacks are attempted, the system successfully detects them and logs the alerts. The integration of these tools provides low-cost intrusion detection capabilities with automated threat identification and faster response compared to existing Snort configurations.
Artificial Intelligence (AI) has rapidly become a critical technology for businesses seeking to improve efficiency and profitability. One area where AI is proving particularly impactful is service operations management, where it is used to create AI-powered service operations (AIServiceOps) that deliver high-value services to customers. AIServiceOps involves using AI to automate and optimize business processes such as customer service, sales, marketing, and supply chain management. The rapid development of AI has prompted many changes in the field of Information Technology (IT) service operations, which are now driven by AI, i.e., AIServiceOps. AI has brought new vitality and addressed many challenges in IT service operations. However, there is a literature gap on the business value impact of AI-powered IT service operations. AIServiceOps can help IT build optimized business resilience by creating value in complex and ever-changing environments as product organizations move faster than IT can handle. This research paper therefore examines how AIServiceOps creates business value and sustainability, that is, how AIServiceOps liberates IT staff from low-level, repetitive work and traditional IT practices toward a continuously optimized process. One of the research objectives is to compare traditional IT service operations with AIServiceOps. The paper provides a basis for how enterprises can evaluate AIServiceOps and consider it a digital transformation tool. It presents a case study of a company that implemented AIServiceOps and analyzes the resulting business outcomes. The study shows that AIServiceOps can significantly improve service delivery, reduce response times, and increase customer satisfaction. Furthermore, it demonstrates that AIServiceOps can deliver substantial cost savings, such as reducing labor costs and minimizing downtime.
MEDIATING AND MODERATING FACTORS AFFECTING READINESS TO IOT APPLICATIONS: THE... – IJMIT JOURNAL
Although IoT seems to be the upcoming trend, it is still in its infancy, especially in the banking industry. There is a clear gap in the literature, as only a few studies identify factors affecting readiness for IoT applications in banks in general, and investigations of mediating and moderating factors are almost negligible. Accordingly, this research aims to investigate the main factors that affect employees' readiness for IoT applications, while highlighting the mediating and moderating factors in the Egyptian banking sector. The importance of Egypt stems from its high population and the steady steps it has taken towards technology adoption. 479 valid questionnaires were distributed to HR employees in banks, and the collected data were statistically analysed using regression and SEM. Results showed a significant impact of 'Security', 'Networking', 'Software Development' and 'Regulations' on readiness for IoT applications; thus, the readiness acceptance level is high. 'Security' and 'User Intention' were proven to mediate the relationship between the research variables and readiness for IoT applications, and only a partial moderation role was proven for 'Efficiency'. The study contributes to the growing literature on IoT applications in general and fills a gap on the Egyptian banking context in particular. Finally, it provides decision makers at banks with useful guidelines on how to optimally promote IoT applications among employees.
EFFECTIVELY CONNECT ACQUIRED TECHNOLOGY TO INNOVATION OVER A LONG PERIODIJMIT JOURNAL
IT (Information and Communication Technology) companies are facing the dilemma of decreasing
productivity despite increasing research and development efforts. M&A (Merger and Acquisition) is being
considered as a breakthrough solution. From existing research, it has been pointed out that M&A leads to
the emergence of new innovations. Purpose of this study was to discuss the efficient ways of acquisition and
to resolve the dilemma of productivity decline by clarifying how the technology obtained through M&A
leads to the creation of new innovations. Hypothesis 1 was that the technology acquired through M&A is
utilized for innovation creation, Hypothesis 2 was that the acquired technology is utilized over a long
period of time, and Hypothesis 3 was that a long-term utilization has a positive impact on corporate
performance. The results, using sports prosthetics as a case study and using patents as a proxy variable,
confirmed all the hypotheses set. We have revealed that long-term utilization of technology obtained
through M&A is effective for creating new innovations.
International Journal of Managing Information Technology (IJMIT) ** WJCI IndexedIJMIT JOURNAL
The International Journal of Managing Information Technology (IJMIT) is a quarterly peer-reviewed journal that publishes articles on the strategic application of information technology in organizations from both academic and industry perspectives. The journal focuses on innovative uses of IT to support organizational goals and foster collaboration both within and outside organizations. It covers topics such as education technology, e-government, healthcare IT, mobile systems, and more. Authors are invited to submit original research papers for consideration through the journal's online submission system.
4th International Conference on Cloud, Big Data and IoT (CBIoT 2023)IJMIT JOURNAL
4th International Conference on Cloud, Big Data and IoT (CBIoT 2023) will act as a major forum for the presentation of innovative ideas, approaches, developments, and research projects in the areas of Cloud, Big Data and IoT. It will also serve to facilitate the exchange of information between researchers and industry professionals to discuss the latest issues and advancement in the area of Cloud, Big Data and IoT.
Authors are solicited to contribute to the conference by submitting articles that illustrate research results, projects, surveying works and industrial experiences that describe significant advances in Cloud, Big Data and IoT.
TRANSFORMING SERVICE OPERATIONS WITH AI: A CASE FOR BUSINESS VALUEIJMIT JOURNAL
This document discusses how AI-powered service operations (AIServiceOps) can create business value through digital transformation. It begins with background on digital transformation and how AI is driving changes in IT service operations. It then examines how AIServiceOps can streamline processes, provide insights, and improve customer experience. A case study is presented showing how one company implemented AIServiceOps to significantly reduce response times, increase customer satisfaction, and lower costs. The document argues that AIServiceOps can deliver both quantifiable and flexible benefits while enhancing organizational resilience and sustainability over the long term.
DESIGNING A FRAMEWORK FOR ENHANCING THE ONLINE KNOWLEDGE-SHARING BEHAVIOR OF ...IJMIT JOURNAL
The main objective of this paper is to identify the factors that influence academic staff's digital knowledgesharing behaviors in Ethiopian higher education. A structural equation model was used to validate the
research framework using survey data from 210 respondents. The collected data has been analyzed using
Smart PLS software. The results of the study show that trust, self-motivation, and altruism are positively
related to attitude. Contrary to our expectations, knowledge technology negatively affects attitude.
However, reward systems and empowerment by leaders are significantly associated with knowledgesharing intentions.Knowledge-sharing intention, in turn, was significantly related to digital knowledgesharing behavior. The contributions of this study are twofold. The framework may serve as a roadmap for
future researchers and managers considering their strategy to enhance digital knowledge sharing in HEI.
The findings will benefit academic staff and university administrations.The study will also help academic
staff enhance their knowledge-sharing practices.
BUILDING RELIABLE CLOUD SYSTEMS THROUGH CHAOS ENGINEERINGIJMIT JOURNAL
Cloud computing systems need to be reliable so that they can be accessed and used for computing at any
given point in time. The complex nature of cloud systems is the motivation to conduct research in novel
ways of ensuring that cloud systems are built with reliability in mind. In building cloud systems, it is
expected that the cloud system will be able to deal with high demands and unexpected events that affect the
reliability and performance of the system.
In this paper, chaos engineering is considered a heuristic method that can be used to build reliable cloud
systems. Chaos engineering is aimed at exposing weaknesses in systems that are in production. Chaos
engineering will help identify system weaknesses and strengths when a system is exposed to unexpected
knocks and shocks while it is in production.
Chaos engineering allows system developers and administrators to get insights into how the cloud system
will behave when it is exposed to unexpected occurrences.
A REVIEW OF STOCK TREND PREDICTION WITH COMBINATION OF EFFECTIVE MULTI TECHNI...IJMIT JOURNAL
It is important for investors to understand stock trends and market conditions before trading stocks. Both
these capabilities are very important for an investor in order to obtain maximized profit and minimized
losses. Without this capability, investors will suffer losses due to their ignorance regarding stock trends
and market conditions. Technical analysis helps to understand stock prices behavior with regards to past
trends, the signals given by indicators and the major turning points of the market price. This paper reviews
the stock trend predictions with a combination of the effective multi technical indicator strategy to increase
investment performance by taking into account the global performance and the proposed combination of
effective multi technical indicator strategy model.
NETWORK MEDIA ATTENTION AND GREEN TECHNOLOGY INNOVATIONIJMIT JOURNAL
This paper will provide a novel empirical study for the relationship between network media attention and
green technology innovation and examine how network media attention can ease financing constraints. It
collected data from listed companies in China's heavy pollution industry and performed rigorous
regression analysis, in order to innovatively explore the environmental governance functions of the media.
It found that network media attention significantly promotes green technology innovation. By analyzing the
inner mechanism further, it found that network media attention can promote green innovation by easing
financing constraints. Besides, network media attention has a significant positive impact on green invention
patents while not affecting green utility model patents.
INCLUSIVE ENTREPRENEURSHIP IN HANDLING COMPETING INSTITUTIONAL LOGICS FOR DHI...IJMIT JOURNAL
Information System (IS) research advocates employing collaborative and loose coupling strategies to address contradictory issues to address diversified actors’ interests than the prescriptive and unilateral Information Technology (IT) governance mechanisms’, yet it is rarely depicting how managers employ these strategies in Health Information System (HIS) implementation, particularly in a resource-constrained setting where IS implementation activities have highly relied on multiple international organizations resources. This study explored how managers in resource-constrained settings employ collaborative IT governance mechanisms in the case of District Health Information System 2 (DHIS2) adoption with an interpretative case study approach and the institutional logic concept. The institutional logic concept was used to identify the major actors’ logics underpinning the DHIS2 adoption. The study depicted the importance of high-level officials' distance from the dominant systemic logic to consider new alternative, and to employ inclusive IT governance mechanisms which separated resource from the system that facilitated stakeholders’ collaboration in DHIS2 adoption based on their capacity and interest.
DEEP LEARNING APPROACH FOR EVENT MONITORING SYSTEMIJMIT JOURNAL
With an increasing number of extreme events and complexity, more alarms are being used to monitor
control rooms. Operators in the control rooms need to monitor and analyze these alarms to take suitable
actions to ensure the system’s stability and security. Security is the biggest concern in the modern world. It
is important to have a rigid surveillance that should guarantee protection from any sought of hazard.
Considering security, Closed Circuit TV (CCTV) cameras are being utilized for reconnaissance, but these
CCTV cameras require a person for supervision. As a human being, there can be a possibility to be tired
off in supervision at any point of time. So, we need a system to detect automatically. Thus, we came up with
a solution using YOLO V5. We have taken a data set and used robo-flow framework to enhance the existing
images into numerous variations where it will create a copy of grey scale image, a copy of its rotation and
a copy of its blurred version which will be used to get an enlarged data set. This work mainly focuses on
providing a secure environment using CCTV live footage as a source to detect the weapons. Using YOLO
algorithm, it divides an image from the video into grid system and each grid detects an object within itself
MULTIMODAL COURSE DESIGN AND IMPLEMENTATION USING LEML AND LMS FOR INSTRUCTIO...IJMIT JOURNAL
The document discusses course delivery modalities including face-to-face, online asynchronous, online synchronous, hybrid, and HyFlex. It investigates the design and implementation of courses using the Learning Environment Modeling Language (LEML) for different delivery environments. The authors describe their experience delivering courses at Southern University and A&M College and Baton Rouge Community College. They aim to answer questions about the course delivery methods used by their institutions and how to validate guidelines and ensure student learning outcomes.
The CBC machine is a common diagnostic tool used by doctors to measure a patient's red blood cell count, white blood cell count and platelet count. The machine uses a small sample of the patient's blood, which is then placed into special tubes and analyzed. The results of the analysis are then displayed on a screen for the doctor to review. The CBC machine is an important tool for diagnosing various conditions, such as anemia, infection and leukemia. It can also help to monitor a patient's response to treatment.
UNLOCKING HEALTHCARE 4.0: NAVIGATING CRITICAL SUCCESS FACTORS FOR EFFECTIVE I...amsjournal
The Fourth Industrial Revolution is transforming industries, including healthcare, by integrating digital,
physical, and biological technologies. This study examines the integration of 4.0 technologies into
healthcare, identifying success factors and challenges through interviews with 70 stakeholders from 33
countries. Healthcare is evolving significantly, with varied objectives across nations aiming to improve
population health. The study explores stakeholders' perceptions on critical success factors, identifying
challenges such as insufficiently trained personnel, organizational silos, and structural barriers to data
exchange. Facilitators for integration include cost reduction initiatives and interoperability policies.
Technologies like IoT, Big Data, AI, Machine Learning, and robotics enhance diagnostics, treatment
precision, and real-time monitoring, reducing errors and optimizing resource utilization. Automation
improves employee satisfaction and patient care, while Blockchain and telemedicine drive cost reductions.
Successful integration requires skilled professionals and supportive policies, promising efficient resource
use, lower error rates, and accelerated processes, leading to optimized global healthcare outcomes.
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECTjpsjournal1
The rivalry between prominent international actors for dominance over Central Asia's hydrocarbon
reserves and the ancient silk trade route, along with China's diplomatic endeavours in the area, has been
referred to as the "New Great Game." This research centres on the power struggle, considering
geopolitical, geostrategic, and geoeconomic variables. Topics including trade, political hegemony, oil
politics, and conventional and nontraditional security are all explored and explained by the researcher.
Using Mackinder's Heartland, Spykman Rimland, and Hegemonic Stability theories, examines China's role
in Central Asia. This study adheres to the empirical epistemological method and has taken care of
objectivity. This study analyze primary and secondary research documents critically to elaborate role of
china’s geo economic outreach in central Asian countries and its future prospect. China is thriving in trade,
pipeline politics, and winning states, according to this study, thanks to important instruments like the
Shanghai Cooperation Organisation and the Belt and Road Economic Initiative. According to this study,
China is seeing significant success in commerce, pipeline politics, and gaining influence on other
governments. This success may be attributed to the effective utilisation of key tools such as the Shanghai
Cooperation Organisation and the Belt and Road Economic Initiative.
Harnessing WebAssembly for Real-time Stateless Streaming PipelinesChristina Lin
Traditionally, dealing with real-time data pipelines has involved significant overhead, even for straightforward tasks like data transformation or masking. However, in this talk, we’ll venture into the dynamic realm of WebAssembly (WASM) and discover how it can revolutionize the creation of stateless streaming pipelines within a Kafka (Redpanda) broker. These pipelines are adept at managing low-latency, high-data-volume scenarios.
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...shadow0702a
This document serves as a comprehensive step-by-step guide on how to effectively use PyCharm for remote debugging of the Windows Subsystem for Linux (WSL) on a local Windows machine. It meticulously outlines several critical steps in the process, starting with the crucial task of enabling permissions, followed by the installation and configuration of WSL.
The guide then proceeds to explain how to set up the SSH service within the WSL environment, an integral part of the process. Alongside this, it also provides detailed instructions on how to modify the inbound rules of the Windows firewall to facilitate the process, ensuring that there are no connectivity issues that could potentially hinder the debugging process.
The document further emphasizes on the importance of checking the connection between the Windows and WSL environments, providing instructions on how to ensure that the connection is optimal and ready for remote debugging.
It also offers an in-depth guide on how to configure the WSL interpreter and files within the PyCharm environment. This is essential for ensuring that the debugging process is set up correctly and that the program can be run effectively within the WSL terminal.
Additionally, the document provides guidance on how to set up breakpoints for debugging, a fundamental aspect of the debugging process which allows the developer to stop the execution of their code at certain points and inspect their program at those stages.
Finally, the document concludes by providing a link to a reference blog. This blog offers additional information and guidance on configuring the remote Python interpreter in PyCharm, providing the reader with a well-rounded understanding of the process.
ACEP Magazine edition 4th launched on 05.06.2024Rahul
This document provides information about the third edition of the magazine "Sthapatya" published by the Association of Civil Engineers (Practicing) Aurangabad. It includes messages from current and past presidents of ACEP, memories and photos from past ACEP events, information on life time achievement awards given by ACEP, and a technical article on concrete maintenance, repairs and strengthening. The document highlights activities of ACEP and provides a technical educational article for members.
Literature Review Basics and Understanding Reference Management.pptxDr Ramhari Poudyal
Three-day training on academic research focuses on analytical tools at United Technical College, supported by the University Grant Commission, Nepal. 24-26 May 2024
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
A NOVEL PERFORMANCE MEASURE FOR MACHINE LEARNING CLASSIFICATION
International Journal of Managing Information Technology (IJMIT) Vol.13, No.1, February 2021
DOI: 10.5121/ijmit.2021.13101
A NOVEL PERFORMANCE MEASURE FOR MACHINE
LEARNING CLASSIFICATION
Mingxing Gong
Financial Services Analytics, University of Delaware, Newark, DE 19716
ABSTRACT
Machine learning models have been widely used in numerous classification problems, and performance
measures play a critical role in machine learning model development, selection, and evaluation. This
paper provides a comprehensive overview of performance measures in machine learning classification.
In addition, we propose a framework to construct a novel evaluation metric based on the voting
results of three performance measures, each of which has strengths and limitations. The new metric can be
shown to be better than accuracy in terms of consistency and discriminancy.
1. INTRODUCTION
Performance measures play an essential role in machine learning model development, selection,
and evaluation. There is a plethora of metrics for classification, such as accuracy, recall,
F-measure, Area Under the ROC Curve (AUC), mean squared error (also known as the Brier score),
LogLoss/cross-entropy, Cohen's kappa, the Matthews correlation coefficient (MCC), etc. Generally,
performance measures can be broadly classified into three categories [1]:
- Metrics based on a confusion matrix: accuracy, macro-/micro-averaged accuracy (arithmetic and geometric), precision, recall, F-measure, Matthews correlation coefficient, Cohen's kappa, etc.
- Metrics based on a probabilistic understanding of error, i.e., measuring the deviation from the actual probability: mean absolute error, accuracy ratio, mean squared error (Brier score), LogLoss (cross-entropy), etc.
- Metrics based on discriminatory power: ROC and its variants, AUC, Somers' D, Youden Index, precision-recall curve, Kolmogorov-Smirnov, lift chart, etc.
Confusion-matrix-based metrics are the most commonly used, especially accuracy, since it is
intuitive and straightforward. Most classifiers aim to optimize accuracy. However, the drawbacks
of accuracy are also evident. A classifier with accuracy as the objective function is biased
toward the majority class. For example, in credit card fraud detection, where the data is highly
imbalanced, 99.9% of transactions are legitimate and only 0.1% are fraudulent. A model merely
claiming that all transactions are legitimate will reach 99.9% accuracy, which almost no other
classifier can beat. However, the model is useless in practice, given that no fraudulent transactions
would be captured by it. This drawback of accuracy highlights the importance of recall
(also called the true positive rate), which measures the effectiveness of the model in detecting
fraudulent transactions. Nevertheless, recall is still a biased metric: an extreme classifier
that labels every transaction as fraudulent would achieve the highest recall. To reduce false
alarms, precision needs to be taken into account to measure the reliability of the predictions. In
other words, each of these metrics alone captures only part of the information about model
performance. That is why accuracy, recall, precision, or their combination (F-measures) are
considered together in a balanced manner when dealing with imbalanced data classification.
Besides, both recall and precision focus on positive examples and predictions; neither of
them captures how the classifier handles negative examples. It is challenging to describe all
the information of the confusion matrix using one single metric. The Matthews correlation
coefficient (MCC) is one of the measures regarded as balanced, as it takes into
account both true and false positives and negatives. MCC is commonly used as a reference
performance measure on imbalanced data sets in many fields, including bioinformatics,
astronomy, etc. [2], where both negatives and positives are very important. A recent study also
indicated that MCC is favourable over the F1 score and accuracy in binary classification evaluation
on imbalanced datasets [3]. It is worth mentioning that all confusion-matrix-based metrics are
subject to threshold selection, which is difficult, especially when the misclassification cost and
prior probability are unknown.
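The imbalance pitfall described above can be made concrete with a small numeric sketch. The counts below are hypothetical, chosen only to mirror the 0.1% fraud example:

```python
# A hypothetical data set with 0.1% fraud: the trivial "everything is
# legitimate" classifier reaches 99.9% accuracy but captures no fraud at all.
def confusion(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

y_true = [1] * 10 + [0] * 9990       # 10 fraudulent out of 10,000 transactions
y_all_legit = [0] * 10000            # predict "legitimate" for everything

tp, tn, fp, fn = confusion(y_true, y_all_legit)
accuracy = (tp + tn) / (tp + tn + fp + fn)    # 0.999 -- looks excellent
recall = tp / (tp + fn) if tp + fn else 0.0   # 0.0 -- no fraud captured
```

The model maximizes accuracy while being useless for the actual detection task, which is exactly why recall and precision must be examined alongside it.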
In contrast, probability metrics do not rely on a threshold. They measure the deviation
or entropy between the predicted probability p(i, j) and the true
probability f(i, j) (0 or 1). The three most popular probability metrics are the Brier score,
cross-entropy, and the calibration score. The first two are minimized when the predicted
probability is close to the true probability. The calibration score assesses score reliability: a well-
calibrated score indicates that if the classifier's predicted probability of positives for a sample
is p, the proportion of positives in that sample is also p. A well-calibrated score does not
imply a high-quality classifier.
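As a minimal sketch of the first two probability metrics, the Brier score and cross-entropy can be computed directly from predicted probabilities and 0/1 labels (the toy labels and probabilities below are illustrative, not from the paper):

```python
import math

# p_pred holds the predicted probability of the positive class for each
# example; y_true holds the actual 0/1 labels.
def brier_score(y_true, p_pred):
    # Mean squared deviation from the true 0/1 outcome.
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)

def log_loss(y_true, p_pred, eps=1e-15):
    # Cross-entropy; probabilities are clipped to avoid log(0).
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

y = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]   # close to the true outcomes
hedged = [0.6, 0.4, 0.6, 0.4]      # further from the true outcomes
# Both metrics are minimized when predictions approach the true probabilities:
assert brier_score(y, confident) < brier_score(y, hedged)
assert log_loss(y, confident) < log_loss(y, hedged)
```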
For metrics based on discriminatory power, ROC is the most popular [4, 5, 6]. Compared
with confusion-matrix-based metrics, ROC has several advantages. Firstly, ROC does not rely on
threshold settings [5]. Secondly, a change in the prior class distribution does not affect the ROC
curve, since ROC depends on the true positive rate and false positive rate, which are calculated
against actual positives and actual negatives independently. Thirdly, ROC analysis does not
need to produce exact probabilities; all it has to do is discriminate positive
instances from negative instances [7]. There are several ROC extensions and variants. To
integrate the decision goal with cost information and the prior class distribution, the ROC convex hull was
developed. When only early-retrieval information is of interest, the
Concentrated ROC (CROC) was proposed to magnify the area of interest using an appropriate
continuous and monotone function. Besides, the ROC curve is closely related to, or can be
transformed into, the precision-recall curve, cost curve, lift chart, Kolmogorov–Smirnov (K-S)
curves, etc. Furthermore, ROC can be extended to multi-class problems via one-vs-one (OVO) or one-vs-
all (OVA) methodologies. A single summary metric, the Area Under the ROC Curve (AUC), is
derived from ROC. The statistical meaning of AUC is the following: the AUC of a classifier
is the probability that the classifier's output for a randomly chosen
positive instance is higher than that for a randomly chosen negative instance. A similar measure for multi-
class problems is the volume under the ROC space (VUS). There are many discussions of multiclass ROC
and VUS [8, 9, 10, 11]. In addition to the three categories of performance measures mentioned
above, practitioners often assess the stability of classifiers' outputs based on distribution
and characteristic analysis. Popular metrics include the Population Stability Index (PSI) [12],
the Characteristic Stability Index (CSI), etc. In summary, there is no guarantee that one measure
will definitely outperform another in all situations.
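The probabilistic interpretation of AUC stated above can be computed directly as the fraction of (positive, negative) pairs in which the positive instance scores higher, counting ties as one half. This is a brute-force sketch with made-up scores, not an optimized implementation:

```python
# AUC as the probability that a random positive outscores a random negative.
def auc_by_pairs(scores_pos, scores_neg):
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5          # ties count half
    return wins / (len(scores_pos) * len(scores_neg))

pos = [0.9, 0.8, 0.4]    # classifier scores for actual positives (toy values)
neg = [0.7, 0.3, 0.2]    # classifier scores for actual negatives (toy values)
auc = auc_by_pairs(pos, neg)   # 8 of 9 pairs are ranked correctly
```

Because only the ranking of scores matters, this value is unchanged by any monotone rescaling of the scores, which is why AUC is threshold-free.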
The motivation of this study is that although there is a large body of literature discussing the evaluation
of classifier performance in an isolated manner (e.g., [13] wrote an overview of the ROC curve;
[14] discussed the impact of class imbalance on classification performance metrics based on the
binary confusion matrix, etc.), there exist very few comprehensive reviews in this area. [15]
wrote a very general review of evaluation metrics for data classification; however,
the review lacks detail and omits many popular measures. Santafe et al.
review the important aspects of the evaluation process in supervised classification in [16], which
focuses more on statistical significance tests of differences between measures. There are also
many papers discussing machine learning performance measures in specific
fields. For example, [17] studied machine learning performance metrics and diagnostic context in
radiology; [18] performed a
comparative study of performance metrics of data mining algorithms on medical data; [19] discussed
performance measures in text classification. Motivated by the lack of a comprehensive generic
review of performance measures, our paper covers all three above-mentioned
categories of measures. In addition, we propose a framework to construct a new and better
evaluation metric.
The main contribution of this paper lies in three aspects: it provides a comprehensive and
detailed review of performance measures; it highlights the limitations and common pitfalls of
these performance measures from a practitioner's perspective; and it proposes a framework to construct
a better evaluation metric. The rest of the paper is structured as follows: Section 2 provides an
overview of performance measures as well as their limitations and misuses in practice; Section 3
introduces a method to compare performance measures; Section 4 proposes a novel performance
measure based on a voting method; Section 5 discusses the experimental results; and Section 6
concludes the paper and discusses potential future research opportunities.
2. AN OVERVIEW OF PERFORMANCE MEASURES
2.1 Metrics based on a confusion matrix
The confusion matrix is commonly accepted for evaluating classification tasks. For binary classification,
it is a two-by-two matrix of actual class versus predicted class, illustrated below. For short, TP
stands for true positive, FN for false negative, FP for false positive, and TN for true negative. In
the fraud detection problem setting, we assume positive means fraud and negative means non-fraud.
The metrics derived from the confusion matrix are described as follows:
                         Actual class
                      p               n            total
Predicted   p′   True Positive   False Positive     P′
class       n′   False Negative  True Negative      N′
          total       P               N
Accuracy (Acc) is the simplest and most common measure derived from the confusion matrix.

ACC = (TP + TN) / (P + N) = (TP + TN) / (TP + TN + FP + FN)    (1)
True positive rate (TPR), or recall (also known as sensitivity), is defined as true positives (TP) divided
by the total number of actual positives (TP + FN).
TPR = TP / P = TP / (TP + FN)    (2)
Specificity, or the true negative rate (TNR), is calculated as

Specificity = TN / N = TN / (FP + TN)    (3)
1 − Specificity is called the false positive rate (FPR). Sensitivity can be interpreted as accuracy on
the actual positive instances, while specificity is accuracy on the actual negative
instances. Similarly, there are also metrics describing performance on the predicted positive/negative
instances, such as precision and negative predictive value (NPV).
Precision = TP / P′ = TP / (TP + FP)    (4)

NPV = TN / N′ = TN / (TN + FN)    (5)
It is worth noting that no single one of the above-mentioned metrics is sufficient to describe
model performance in fraud detection. For example, similar to accuracy, if we use recall as the
performance metric, a model that predicts all transactions as fraudulent can achieve a 100% recall
rate. However, the model is useless.
The F-measure, a balanced performance metric, considers both recall and precision. The
commonly used F1 measure is the harmonic mean of precision and recall:
F1 = 2 · (Precision · Recall) / (Precision + Recall) = 2TP / (2TP + FP + FN)    (6)
In the F1 measure, precision is weighted as equally important as recall. The general formula for
Fβ is:
Fβ = (1 + β²) · (Precision · Recall) / (β² · Precision + Recall)    (7)
which weights recall β times as much as precision. However, as Equation 7 shows, true
negatives are neglected in the F-measure, which focuses on positive instances and predictions.
In addition, we should be very cautious when using the F-measure to compare model performance
in fraud detection, especially because resampling methods (undersampling legitimate cases or
oversampling fraud cases) are often adopted in fraud detection to address imbalanced data,
which may result in different data distributions. Unlike performance measures (e.g., recall,
false positive rate, ROC) that are independent of the data set distribution, the F-measure can differ
significantly across data sets, as illustrated in Table 1. A model with the same fraud capture
rate (recall) and false positive rate exhibits different F1 scores. Relying only on the F1 score may
lead to the misleading conclusion that the model overfits or deteriorates.
Table 1: Different F1 scores for the same model on different data sets

(a) Recall = 0.5; FPR = 0.091; F1 = 0.645

                      Actual class
                    p            n
Predicted   p′   TP = 500     FP = 50
class       n′   FN = 500     TN = 500

(b) Recall = 0.5; FPR = 0.091; F1 = 0.50

                      Actual class
                    p            n
Predicted   p′   TP = 500     FP = 500
class       n′   FN = 500     TN = 5000
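The two cases in Table 1 can be verified directly from the confusion-matrix counts:

```python
# Reproducing Table 1: identical recall and FPR, but F1 changes with the
# class distribution of the data set.
def metrics(tp, fp, fn, tn):
    recall = tp / (tp + fn)
    fpr = fp / (fp + tn)
    f1 = 2 * tp / (2 * tp + fp + fn)   # Eq. (6)
    return recall, fpr, f1

r_a, fpr_a, f1_a = metrics(tp=500, fp=50, fn=500, tn=500)     # data set (a)
r_b, fpr_b, f1_b = metrics(tp=500, fp=500, fn=500, tn=5000)   # data set (b)
# recall and FPR agree across (a) and (b); F1 drops from ~0.645 to 0.50
```

The model has not changed between (a) and (b); only the ratio of negatives to positives has, which is exactly the caution raised above about comparing F1 across resampled data sets.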
It is challenging to describe all the information of the confusion matrix using one single metric.
The Matthews correlation coefficient (MCC) is one of the best such measures, taking into
account both true and false positives and negatives [20]. The MCC can be calculated from the confusion
matrix:
MCC = (TP · TN − FP · FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN))    (8)
MCC ranges over [−1, 1], with 1 indicating perfect prediction, 0 random prediction, and −1 total disagreement between the actual and predicted classes.
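Equation 8 translates directly into code; a small sketch (the zero-denominator convention below is a common choice, not part of the equation):

```python
from math import sqrt

def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient (Equation 8)."""
    num = tp * tn - fp * fn
    den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0  # common convention when a margin is empty

print(mcc(100, 0, 0, 100))   # perfect prediction -> 1.0
print(mcc(0, 100, 100, 0))   # total disagreement -> -1.0
```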
Cohen’s kappa [21] is another measure that corrects predictive accuracy for chance agreement. It is calculated as:

k = (p0 − pc) / (1 − pc)   (9)

where p0 is the observed accuracy, (TP + TN) / (TP + TN + FP + FN) from the confusion matrix, while pc is the expected accuracy by chance, calculated as:

pc = ((TN + FP) ∗ (TN + FN) + (FN + TP) ∗ (FP + TP)) / (P + N)²   (10)
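Equations 9 and 10 combine into a short function; a sketch (the function name is ours):

```python
def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa (Equations 9-10) from confusion-matrix counts."""
    n = tp + fp + fn + tn
    p0 = (tp + tn) / n  # observed accuracy
    pc = ((tn + fp) * (tn + fn) + (fn + tp) * (fp + tp)) / n ** 2  # chance accuracy
    return (p0 - pc) / (1 - pc)

print(round(cohen_kappa(40, 10, 20, 30), 3))  # 0.4: observed 0.7 vs chance 0.5
```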
It is important to note that the aforementioned confusion-matrix based measures are subject to the threshold and the prior class distribution. A different threshold selection changes the confusion matrix: as long as the predicted probability is greater than the threshold, the example is labeled as positive, and vice versa. It does not matter how close the predicted probability is to the actual probability. For example, consider two classifiers scoring a new instance that actually belongs to the positive class. One predicts the probability of the positive event as 0.51 and the other as 0.99. If the threshold is 0.50, the confusion-matrix based measures are exactly the same for these two classifiers, although intuitively the second classifier is much better.
2.2 Metrics based on probability deviation
Measures based on probability are threshold-free metrics that assess the deviation of predictions from the true probability. They include the MAE (mean absolute error), which shows how much the predictions deviate from the actual outcome; the Brier score, also called mean squared error, a quadratic version of MAE that penalizes larger deviations more heavily; cross entropy, derived from information theory; and the calibration score, also known as Expected vs. Actual (EvA) analysis. Below are the formulas for these common probability-based metrics.
The mean absolute error (MAE) is defined as:

MAE = (1/n) Σ_{i=1}^{n} |y_i − p_i|

The Brier score is defined as:

Brier_score = (1/n) Σ_{i=1}^{n} (y_i − p_i)²

The cross entropy is defined as:

cross_entropy = Σ_{i=1}^{n} −y_i log(p_i)
where p_i is the predicted probability and y_i is the ground truth (0 or 1 for binary classification). The Brier score and cross entropy are the most commonly used of these metrics, and both are widely used as objective functions in machine learning algorithms. Generally, cross entropy is preferred in classification problems and the Brier score in regression models. In practice, the Brier score and cross entropy serve more as objective/loss functions than as performance measures, because most classification models are used for rank-ordering purposes and the absolute accuracy of the predicted probabilities is not of significant concern.
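The three formulas above, in code (list-based for clarity; `y` holds 0/1 labels and `p` predicted probabilities):

```python
import math

def mae(y, p):
    """Mean absolute error between labels and predicted probabilities."""
    return sum(abs(yi - pi) for yi, pi in zip(y, p)) / len(y)

def brier(y, p):
    """Brier score: mean squared deviation of predictions from outcomes."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, p)) / len(y)

def cross_entropy(y, p):
    """Cross entropy as written in the text: only the -y_i * log(p_i) term."""
    return sum(-yi * math.log(pi) for yi, pi in zip(y, p))

y = [1, 0, 1, 1]
p = [0.9, 0.2, 0.6, 0.8]
print(mae(y, p), brier(y, p))  # ≈ 0.225 and 0.0625
```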
2.3 Metrics based on discriminatory power
Two-class ROC
The ROC (Receiver Operating Characteristic) curve is one of the most widely used performance metrics, especially in binary classification problems. An example ROC curve is shown in Figure 1. The horizontal axis represents the false positive rate (FPR), and the vertical axis the true positive rate (TPR). The diagonal connecting (0,0) and (1,1) indicates a model with no power to separate positives from negatives beyond random sampling. Classifiers with ROC curves away from the diagonal, toward the upper-left corner, are considered excellent in separation power. The maximum vertical distance between the ROC curve and the diagonal is known as Youden’s index, which may be used as a criterion for selecting the operating cut-off point.
Figure 1: A ROC curve example. Source: scikit-learn simulation [22]
The ROC curve represents a trade-off between the true positive rate and the false positive rate. One useful feature of the ROC curve is that it remains unchanged when the class distribution is altered [21]. Since ROC is based on TPR and FPR, which are calculated on actual positives and actual negatives respectively, ROC curves are independent of the class distribution. Another useful feature is that the curve exhausts all possible thresholds, and thus contains more information than static, threshold-dependent metrics such as precision, accuracy, and recall. The ROC curve is therefore deemed a useful measure of overall ranking and separation power for model evaluation.
Unfortunately, while a ROC graph is a valuable visualization technique, it may not help select among classifiers in certain situations. Only when one classifier clearly dominates another over the entire operating space can it be declared better. For example, Figure 2 shows that classifier 0 outperforms the other classifiers since its curve is always above the others. However, it is challenging to tell whether classifier 4 is better than classifier 2.
Thus, a single summary metric called the Area Under the ROC Curve (AUC) is derived from ROC. The statistical meaning of AUC is the following: the AUC of a classifier is equivalent to the probability that the classifier assigns a randomly chosen positive instance a higher score than a randomly chosen negative instance. Let {p_1, ..., p_m} be the predicted probabilities of the m positives belonging to the positive class, sorted in descending order, and {n_1, ..., n_k} the predicted probabilities of the k negatives belonging to the positive class, also sorted in descending order. AUC can be defined as:

AUC = (1/(mk)) Σ_{i=1}^{m} Σ_{j=1}^{k} 1_{p_i > n_j}   (11)

where 1_{p_i > n_j} is an indicator function, equal to 1 when p_i > n_j.
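Equation 11 can be computed naively over all positive-negative pairs; a sketch (O(mk), fine for illustration; ties contribute nothing under the strict indicator):

```python
def auc_pairwise(pos, neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly (Equation 11)."""
    wins = sum(1 for p in pos for n in neg if p > n)
    return wins / (len(pos) * len(neg))

# 3 positives, 3 negatives; 8 of the 9 pairs are ordered correctly
print(auc_pairwise([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))  # 8/9 ≈ 0.889
```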
AUC is useful to rank the examples by the estimated probability and compare the performance
of the classifier. Huang and Ling theoretically and empirically show that AUC is preferred over accuracy [23]. However, AUC as a single summary metric has its drawbacks. For example,
the AUC for classifier 1 is 0.55, and the AUC for classifier 2 is 0.65. Classifier 2 performs much
better than classifier 1 in terms of AUC. However, in a situation like a marketing campaign, where people may only be interested in the very top-ranked examples, classifier 1 may be a better choice. Also, AUC is purely a rank-order metric, and it ignores the probability values [23].
Figure 2: ROC comparisons. Source: scikit-learn simulation [22]
There are many variants of ROC and AUC. The Gini coefficient, also called Somers' D, is one of them; it is widely used in the evaluation of credit scoring models in the banking industry. Gini can be converted from AUC using the formula Gini = (AUC − 0.5) ∗ 2. Wu et al. [24] proposed a scored AUC (sAUC) that incorporates both the ranks and the probability estimates of the classifiers. sAUC is calculated as:

sAUC = (1/(mk)) Σ_{i=1}^{m} Σ_{j=1}^{k} 1_{p_i > n_j} ∗ (p_i − n_j)   (12)
The only difference between sAUC and AUC is the additional factor (p_i − n_j), which quantifies the probability deviation when the ranks are correct. pAUC (probabilistic AUC) is another AUC variant (Ferri et al., 2005) [24], which takes both rank performance and the magnitude of the probabilities into account. pAUC is defined as:

pAUC = ( Σ_{i=1}^{m} p_i / m − Σ_{j=1}^{k} n_j / k + 1 ) / 2 = ( Σ_{i=1}^{m} p_i / m + Σ_{j=1}^{k} (1 − n_j) / k ) / 2   (13)
pAUC can be interpreted as the mean of the average probability that positive instances are assigned to the positive class and the average probability that negative instances are assigned to the negative class.
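Equations 12 and 13 differ from plain AUC only in what each pair (or instance) contributes; a sketch with toy scores:

```python
def s_auc(pos, neg):
    """Scored AUC (Equation 12): correctly ranked pairs weighted by p_i - n_j."""
    return sum(p - n for p in pos for n in neg if p > n) / (len(pos) * len(neg))

def p_auc(pos, neg):
    """Probabilistic AUC (Equation 13): mean of the average positive score
    and the average (1 - negative score)."""
    return (sum(pos) / len(pos) + sum(1 - n for n in neg) / len(neg)) / 2

pos, neg = [0.9, 0.8, 0.4], [0.7, 0.3, 0.2]
print(p_auc(pos, neg))  # ≈ 0.65, even though the pairwise AUC is 8/9
```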
ROC Convex Hull
For a set of classifiers with ROC curves in the same graph, the ROC convex hull (ROCCH) is constructed by connecting the corresponding points on the curves such that no point lies above the final curve. The ROC convex hull describes the potential optimal operating characteristics of the classifiers and offers a possible way to form a better classifier by combining different classifiers. Classifiers below the ROCCH are always sub-optimal. As shown in Figure 3, classifiers B and D are sub-optimal, while classifiers A and C are potentially optimal.
Figure 3: The ROC convex hull identifies potentially optimal classifiers. Source: Provost and Fawcett, 2000
Since the ROC curve itself does not incorporate the cost matrix or the class distribution, it is difficult to make operating decisions from the curve alone. By incorporating the cost matrix and class distribution into ROC space, the decision goal can be achieved there. Specifically, the expected cost of applying the classifier represented by a point (FP, TP) in ROC space is [25]:
p(+) ∗ (1 − TP) ∗ C(−|+) + p(−) ∗ FP ∗ C(+|−) (14)
where p(+) and p(−) are the prior probabilities of positives and negatives, C(−|+) is the misclassification cost of a false negative, and C(+|−) is the misclassification cost of a false positive. Therefore, two points, (FP1, TP1) and (FP2, TP2), have the same performance if:

(TP2 − TP1) / (FP2 − FP1) = C(+|−)p(−) / (C(−|+)p(+))   (15)
The above equation defines the slope of an iso-performance line. That is, all classifiers corresponding to points on the line have the same expected cost. Each set of class and cost distributions
defines a family of iso-performance lines. Lines "more northwest" (having a larger TP intercept)
are better because they correspond to classifiers with lower expected cost [25].
Figure 4: Lines α and β show the optimal classifiers under different sets of conditions. Source: Provost and Fawcett, 2000
Assume an imbalanced dataset in which negative observations are 10 times as numerous as positive observations, with equal misclassification costs. In this case, classifier A would be preferred. If a false negative costs 100 times as much as a false positive, classifier C would be the better choice. The ROC convex hull thus provides a visual solution for decision making under different sets of cost and class distributions.
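The comparison above can be checked numerically with Equation 14. The ROC points below for classifiers A and C are hypothetical, chosen only to mimic a conservative and an aggressive classifier:

```python
def expected_cost(fpr, tpr, p_pos, c_fn, c_fp):
    """Expected cost of the classifier at ROC point (FPR, TPR) (Equation 14)."""
    return p_pos * (1 - tpr) * c_fn + (1 - p_pos) * fpr * c_fp

A = (0.10, 0.60)   # hypothetical conservative classifier (low FPR, low TPR)
C = (0.35, 0.95)   # hypothetical aggressive classifier (high FPR, high TPR)
p_pos = 1 / 11     # negatives are 10x the positives

# Equal costs: A has the lower expected cost
print(expected_cost(*A, p_pos, 1, 1) < expected_cost(*C, p_pos, 1, 1))      # True
# False negatives cost 100x: C has the lower expected cost
print(expected_cost(*C, p_pos, 100, 1) < expected_cost(*A, p_pos, 100, 1))  # True
```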
Concentrated ROC
The concentrated ROC (CROC) plot evaluates the early-retrieval performance of a classifier [26]. It is motivated by many practical problems, ranging from marketing campaigns to fraud detection, where only the top-ranked predictions are of interest and ROC/AUC, which measure overall ranking power, are not very useful.
Figure 5 shows two ROC curves and a red rectangle marking the early-retrieval area of interest. The green curve outperforms the red curve in the early-retrieval area. CROC adopts an approach whereby any portion of the ROC curve of interest is magnified smoothly using an appropriate continuous and monotone function f from [0,1] to [0,1] satisfying f(0) = 0 and f(1) = 1. Exponential, power, or logarithmic functions are popularly used to expand the early part of the x-axis and contract the late part. The choice of magnification function f and magnification parameter α depends on the application scenario. The early recognition performance of a
classifier can be measured by the AUC[CROC]. This area depends on the magnification parameter
α and therefore both the area and α must be reported [26].
Figure 5: Early retrieval area in ROC curves. Source: Swamidass et al., 2010
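One common magnification choice is exponential; a sketch, assuming the form f(x) = (1 − e^(−αx)) / (1 − e^(−α)) with an arbitrary α = 7:

```python
import math

def croc_x(x, alpha=7.0):
    """Exponential magnification of the FPR axis: continuous, monotone,
    with f(0) = 0 and f(1) = 1, as required for a CROC transform."""
    return (1 - math.exp(-alpha * x)) / (1 - math.exp(-alpha))

print(croc_x(0.0), croc_x(1.0))  # endpoints fixed: 0.0 1.0
print(croc_x(0.1))               # the early 10% of the axis is stretched past 0.5
```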
Precision-Recall Curve
Like the ROC curve, the Precision-Recall curve (PRC) provides a model-wide evaluation of classifiers across different thresholds. The Precision-Recall curve shows a trade-off between precision and recall, and the area under it is called AUC(PRC). There are several aspects in which the Precision-Recall curve differs from the ROC curve. Firstly, the baseline for the ROC curve is fixed (the diagonal connecting (0,0) and (1,1)), while the baseline for the PRC depends on the class distribution, P/(P + N).
Secondly, the interpolation methods for PRC and ROC curves are different: ROC analysis uses linear interpolation and PRC analysis uses non-linear interpolation. Interpolation between two points A and B in PRC space can be represented by the function

y = (TP_A + x) / (TP_A + x + FP_A + ((FP_B − FP_A) / (TP_B − TP_A)) ∗ x)

where x can be any value between TP_A and TP_B [27]. Thirdly, the ROC curve is monotonic but the PRC may not be, which leads to more crossovers among PRC curves from different classifiers. Since the baseline for the PRC is determined by the class distribution, PRC plots are believed to be more informative for imbalanced data and can explicitly reveal differences in early-retrieval performance [28].
Figure 6 consists of four panels of visualization plots. Each panel contains two plots, with balanced (left) and imbalanced (right) data, for (A) ROC, (B) CROC, (C) Cost Curve, and (D) PRC. Five curves represent five performance levels: Random (Rand; red), Poor early retrieval (ER−; blue), Good early retrieval (ER+; green), Excellent (Excel; purple), and Perfect (Perf; orange) [28]. Figure 6 shows that the PRC changes while the other plots remain the same between the balanced and imbalanced datasets, indicating that only the precision-recall curve is sensitive to changes in the class distribution.
Figure 6: PRC changes but the other plots remain the same between the balanced and imbalanced datasets. Source: Saito & Rehmsmeier, 2015
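The class-distribution sensitivity of the PRC follows from precision alone, since precision is the only ingredient of the PRC (and not of the ROC) that mixes positives and negatives; a small sketch reusing the counts of Table 1:

```python
def precision_at(tp, fp):
    """Precision for a fixed operating point."""
    return tp / (tp + fp)

# Same classifier behavior (recall 0.5, FPR ~0.091) under two class ratios
print(precision_at(500, 50))    # ≈ 0.909 on the more balanced data
print(precision_at(500, 500))   # 0.5 with ten times more negatives

# ROC inputs are unchanged, so the ROC curve cannot move
print(abs(50 / 550 - 500 / 5500) < 1e-12)  # True: identical FPR
```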
The study in [29] shows a deep connection between ROC space and PRC space, stating that a curve dominates in ROC space if and only if it dominates in PRC space. The same paper demonstrates that an analog in PRC space to the convex hull in ROC space is achievable.
Alternative Graphical Performance Measures
There are many other graphical performance measures, such as the cost curve, lift chart, PN-curve, K-S curve, and calibration curves. These visualization tools are discussed below:
• Cost curve. The paper [30] proposed an alternative graphical measure to ROC analysis, which represents cost explicitly. The ROC curve can be transformed into a cost curve. The x-axis of a cost curve is the probability-cost function for positive examples, PCF(+) = p(+)C(−|+) / (p(+)C(−|+) + p(−)C(+|−)). The y-axis is the expected cost normalized with respect to the cost incurred when every example is incorrectly classified, defined as:

NE[C] = ((1 − TP)p(+)C(−|+) + FP p(−)C(+|−)) / (p(+)C(−|+) + p(−)C(+|−))   (16)
where p(+) is the prior probability of a positive instance and C(−|+) is the misclassification cost of a false negative, while p(−) is the prior probability of a negative instance and C(+|−) is the misclassification cost of a false positive.
• Lift Chart. The lift chart is a graphical depiction of how much the model improves the proportion of positive instances when a subset of the data is selected. The x-axis is the percentage of the dataset selected and the y-axis is the number of true positives. Unlike the ROC curve, the diagonal line of a lift chart is not necessarily at 45°. The diagonal estimates the number of true positives obtained if a sample of size x were selected at random. The farther above the diagonal, the greater the lift the model provides. The lift chart is widely used in response models to make decisions on marketing campaigns. The same iso-performance principle used with the ROC convex hull can be applied to the lift chart to decide the cut-off point that maximizes profit [31].
• PN-Curve. The PN-curve is similar to ROC. Instead of using the normalized TPR and FPR, the PN-curve plots TP on the y-axis and FP on the x-axis directly. A PN-curve can be transformed into a ROC curve after normalization by the prior class distribution.
• Kolmogorov–Smirnov alike curve. The Kolmogorov–Smirnov (K-S) graph is widely used in credit risk scoring to measure discrimination. The K-S statistic measures the maximum difference between the cumulative distribution functions of positives and negatives. A similar graph is proposed in [32], where the true positive rate and false positive rate are drawn separately against the probability threshold ([0,1]) in the same graph (Figure 7).
Figure 7: False positive/True positive curve. Source: Desen & Bintong, 2017
In the graph, f1 is the true positive rate, monotonically decreasing as the threshold is raised, and f0 is the true negative rate, so 1 − f0 is the false positive rate. The distance between the two curves is 1 − f1 − f0, which equals fpr − tpr. Recall that the distance between the ROC curve and the diagonal is tpr − fpr and AUC = (tpr − fpr + 1)/2 [20]. Minimizing the distance in Figure 7 is therefore the same as maximizing the distance between the ROC curve and the diagonal, and hence the AUC. The K-S alike curve is another visual representation of the ROC curve.
• Calibration Curves. Calibration curves show whether the predicted probability is well calibrated, i.e., whether events actually happen in proportion p when their predicted probability is p. The x-axis is the true probability, estimated over the subset of examples with the same score, and the y-axis is the predicted probability. Calibration curves focus on classification bias, not classification quality: a perfect calibration does not imply a perfect classifier [31, 33].
2.4 Metrics based on distribution stability
In order to ensure that a model continues to perform as intended, it is important to monitor the
stability of the model output between the development data and more recent data. The model
stability can be assessed either at the overall system level by examining the population distribution
changes across model outcomes or at the individual model variable or characteristic level. The
Population Stability Index (PSI) is a common metric used to measure overall model population
stability while the Characteristic Stability Index (CSI) can be used to assess the stability at the
individual variable level.
PSI and CSI are widely accepted metrics in the banking industry. The formula is displayed below:

PSI = Σ_{i=1}^{n} (Actual%_i − Expected%_i) ∗ ln(Actual%_i / Expected%_i)
where the samples are divided into n bins based on the model output distribution (PSI) or the variable distribution (CSI), and the percentage of actual and expected events in each bin is calculated. PSI and CSI values are sensitive to the number of bins; in general, 8-12 bins are used, which should take the various operating points into account.
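The PSI formula, over pre-binned distributions (bin proportions are assumed to be nonzero):

```python
import math

def psi(actual_pct, expected_pct):
    """Population Stability Index over n bins (each input sums to 1)."""
    return sum((a - e) * math.log(a / e)
               for a, e in zip(actual_pct, expected_pct))

expected = [0.10, 0.20, 0.30, 0.25, 0.15]  # development sample
actual = [0.12, 0.18, 0.28, 0.27, 0.15]    # recent sample
print(round(psi(actual, expected), 4))      # small value -> stable population
```

A common industry rule of thumb (not from this paper) treats PSI below 0.1 as stable and above 0.25 as a significant population shift.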
3 Measure of measures
We can see that no metric absolutely dominates another; all metrics have advantages and disadvantages. As Ferri et al. (2009) [1] point out, several works have shown that, given a dataset, the learning method that obtains the best model according to one measure may not be the best method under a different measure. A handful of examples have been discussed to demonstrate how challenging it is to compare measures. For example, Huang et al. (2003) [34] reveal that Naive Bayes and decision trees perform similarly in predictive accuracy, yet Naive Bayes is significantly better than decision trees in AUC. Even more surprisingly, Rosset (2004) [35] shows that if we use AUC for selecting models on a validation dataset, we obtain better results in accuracy (on a different test dataset) than by using accuracy itself for model selection.
Huang et al. (2005) [23] propose a theoretical framework for comparing two different measures
for learning algorithms. The authors define the criteria, degree of consistency and degree of
discriminancy, as follows:
• Degree of Consistency. For two measures f and g on domain Ψ, let R = {(a, b) | a, b ∈ Ψ, f(a) > f(b), g(a) > g(b)} and S = {(a, b) | a, b ∈ Ψ, f(a) > f(b), g(a) < g(b)}. The degree of consistency of f and g is C (0 ≤ C ≤ 1), where C = |R| / (|R| + |S|).
• Degree of Discriminancy. For two measures f and g on domain Ψ, let P = {(a, b) | a, b ∈ Ψ, f(a) > f(b), g(a) = g(b)} and Q = {(a, b) | a, b ∈ Ψ, g(a) > g(b), f(a) = f(b)}. The degree of discriminancy of f over g is D = |P| / |Q|.
Intuitively, for two classifiers a and b, consistency requires that when one measure indicates one classifier is strictly better than the other (f(a) > f(b)), the other measure does not indicate the opposite. Discriminancy requires that when one measure cannot tell the difference between classifiers, the other measure can differentiate them. To conclude that a measure f is better than a measure g, we require C > 0.5 and D > 1.
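Counting the sets R, S, P, and Q over all classifier pairs gives C and D directly; a sketch, where `f_vals[i]` and `g_vals[i]` are the two measures' scores for classifier i:

```python
from itertools import combinations

def consistency_discriminancy(f_vals, g_vals):
    """Degree of consistency C of f and g, and degree of discriminancy D of f over g."""
    R = S = P = Q = 0
    for a, b in combinations(range(len(f_vals)), 2):
        for i, j in ((a, b), (b, a)):   # both orderings of the pair
            df = f_vals[i] - f_vals[j]
            dg = g_vals[i] - g_vals[j]
            if df > 0 and dg > 0:
                R += 1                   # measures agree
            elif df > 0 and dg < 0:
                S += 1                   # measures disagree
            elif df > 0 and dg == 0:
                P += 1                   # f discriminates, g ties
            elif df == 0 and dg > 0:
                Q += 1                   # g discriminates, f ties
    C = R / (R + S) if R + S else 1.0    # convention when no strict pairs exist
    D = P / Q if Q else float("inf")
    return C, D

print(consistency_discriminancy([0.9, 0.8, 0.7, 0.7], [0.85, 0.80, 0.80, 0.75]))  # (1.0, 1.0)
```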
4 A novel performance measure
When selecting a better classifier, the decision made based on one measure may differ from that based on another. The metrics discussed in Section 2 have different focuses. Confusion-matrix-based metrics (e.g., accuracy, precision, recall, F-measures) focus on errors; probability-based metrics (e.g., Brier score, cross-entropy) focus on probability; and ranking tools (e.g., AUC/VUS, lift chart, PRC, K-S) focus on ranking and visualization. Some measures combine these focuses: for example, sAUC combines probability and ranking, and AUC:acc, proposed by Huang et al. (2007) [36], uses AUC as a dominant measure and accuracy as a tie-breaker when AUC ties.
Here we propose a new measure that combines accuracy, MCC, and AUC. The new measure is based on the voting of the three metrics. Given two classifiers (a, b) on the same dataset, the new measure is defined as:
NewMeasure = 1(AUC(a), AUC(b)) + 1(MCC(a), MCC(b)) + 1(acc(a), acc(b)) (17)
We empirically show that voting over the three metrics is better than accuracy in terms of discriminancy and consistency as defined in [23].
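Equation 17's voting can be sketched as follows; we read 1(x, y) as a signed vote (+1, 0, or −1), which is our interpretation, so that the sign of the sum picks the winner:

```python
def vote(x, y):
    """+1 if x > y, -1 if x < y, 0 on a tie."""
    return (x > y) - (x < y)

def new_measure(a, b):
    """Voting comparison of two classifiers given their (AUC, MCC, accuracy)
    triples (Equation 17): positive -> a wins, negative -> b wins, 0 -> tie."""
    return sum(vote(x, y) for x, y in zip(a, b))

# (AUC, MCC, accuracy) for two hypothetical classifiers
print(new_measure((0.86, 0.55, 0.80), (0.82, 0.57, 0.80)))  # 0: one vote each, one tie
print(new_measure((0.90, 0.60, 0.80), (0.80, 0.50, 0.80)))  # 2: a wins the vote
```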
5 Experimental results and discussion
Following a similar method to [36], we compared the new measure with accuracy on artificial test datasets. The datasets are balanced, with equal numbers of positive and negative instances. We tested datasets with 8, 10, 12, 14, and 16 instances, respectively. We exhausted all ranking orders of the instances and calculated accuracy/MCC/AUC for each ranking order. The two criteria (consistency and discriminancy) defined in Section 3 were computed, and the results are summarized in Tables 2 and 3.

For consistency, we count the number of pairs that satisfy "new(a) > new(b) & Acc(a) > Acc(b)" and "new(a) > new(b) & Acc(a) < Acc(b)". For discriminancy, we count the number of pairs that satisfy "new(a) > new(b) & Acc(a) = Acc(b)" and "new(a) = new(b) & Acc(a) > Acc(b)".
Two measures are consistent if, when comparing two models a and b, whenever one measure claims that model a is better than model b, the other measure also indicates that model a is better. As Table 2 shows, our proposed measure demonstrates excellent consistency with accuracy, with a C value significantly greater than 0.5. Take the 8-instance case as an example: we exhausted all ranking orders of the instances and calculated accuracy/MCC/AUC for each ranking order. There are 184 pairs where our new metric stipulates that a is better than b and accuracy indicates the same, while there are only 35 pairs where the decision from our new metric is
not in line with accuracy. The results demonstrate that in most cases, the decision based on our
new metric is well aligned with the decision from accuracy. Therefore, we claim our new metric is
consistent with accuracy.
Table 2: Experimental results for verifying statistical consistency between new metric and accuracy
for the balanced dataset
The degree of discriminancy is defined as the ratio of cases where our new metric can tell the difference but accuracy cannot, over the cases where accuracy can tell the difference but our new metric cannot. Table 3 indicates that our new measure is much better in terms of discriminancy, since D is significantly higher than 1. Consider the 8-instance case again: there are 78 cases where our new metric can tell the difference between a and b but accuracy cannot, while there are only 3 cases where accuracy can tell the difference but our new metric cannot. Therefore, we claim our new metric is much more discriminant than accuracy.
Table 3: Experimental results for verifying statistical discriminancy between new metric and accuracy
for the balanced dataset
6 Conclusion
Performance measures are essential in model development, selection, and evaluation. There is no gold standard for the best performance measure. Most classifiers are trained to optimize
accuracy, but accuracy is biased towards the majority class. Balanced performance measures such as F1 and MCC are preferred for imbalanced data. However, all confusion-matrix based metrics are subject to threshold selection, which is very difficult to determine, especially when the misclassification costs and prior probabilities are unknown. In contrast, probability metrics do not rely on how the probability falls relative to a threshold; however, they are not intuitive and are more commonly used as objective functions than for measuring the outcome of machine learning models. Visualization tools such as ROC and precision-recall curves complement the confusion matrix and are threshold-free and independent of the prior class distribution. No metric in one category can completely dominate another; they all have their advantages and disadvantages.
In this paper, we provided a detailed review of commonly used performance measures and highlighted their advantages and disadvantages. To address the deficiency of any single measure, we proposed a voting method that combines multiple measures. Our proposed measure, which incorporates several measures, proves to be more effective in terms of consistency and discriminancy.
7 Compliance with Ethical Standards
Funding: Not Applicable.
Conflict of Interest: Author declares that he/she has no conflict of interest.
Ethical approval: This article does not contain any studies with human participants or animals
performed by any of the authors.
References
[1] César Ferri, José Hernández-Orallo, and R Modroiu. An experimental comparison of performance
measures for classification. Pattern Recognition Letters, 30(1):27–38, 2009.
[2] Chang Cao, Davide Chicco, and Michael M Hoffman. The MCC-F1 curve: a performance evaluation technique for binary classification. arXiv preprint arXiv:2006.11278, 2020.
[3] Davide Chicco and Giuseppe Jurman. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics, 21(1):6, 2020.
[4] Yan Wang and Xuelei Sherry Ni. Predicting class-imbalanced business risk using resampling, regularization, and model ensembling algorithms. International Journal of Managing Information Technology (IJMIT), 11, 2019.
[5] A Cecile JW Janssens and Forike K Martens. Reflection on modern methods: revisiting the area under the ROC curve. International Journal of Epidemiology, 2020.
[6] Sarang Narkhede. Understanding AUC-ROC curve. Towards Data Science, 26, 2018.
[7] Matjaž Majnik and Zoran Bosnić. ROC analysis of classifiers in machine learning: A survey. Intelligent Data Analysis, 17(3):531–558, 2013.
[8] Jonathan E Fieldsend and Richard M Everson. Formulation and comparison of multi-class ROC surfaces. 2005.
[9] Thomas Landgrebe and R Duin. A simplified extension of the area under the ROC to the multiclass domain. In Seventeenth Annual Symposium of the Pattern Recognition Association of South Africa, pages 241–245, 2006.
[10] Kun Deng, Chris Bourke, Stephen Scott, and NV Vinodchandran. New algorithms for optimizing
multi-class classifiers via roc surfaces. In Proceedings of the ICML workshop on ROC analysis
in machine learning, pages 17–24, 2006.
[11] Jia Hua and Lili Tian. A comprehensive and comparative review of optimal cut-points selection
methods for diseases with multiple ordinal stages. Journal of Biopharmaceutical Statistics,
30(1):46–68, 2020.
[12] Bilal Yurdakul. Statistical properties of population stability index. 2018.
[13] Luzia Gonçalves, Ana Subtil, M Rosário Oliveira, and P d Bermudez. Roc curve estimation:
An overview. REVSTAT–Statistical Journal, 12(1):1–20, 2014.
[14] Amalia Luque, Alejandro Carrasco, Alejandro Martín, and Ana de las Heras. The impact of
class imbalance in classification performance metrics based on the binary confusion matrix.
Pattern Recognition, 91:216–231, 2019.
[15] M Hossin and MN Sulaiman. A review on evaluation metrics for data classification evaluations.
International Journal of Data Mining & Knowledge Management Process, 5(2):1, 2015.
[16] Guzman Santafe, Iñaki Inza, and Jose A Lozano. Dealing with the evaluation of supervised
classification algorithms. Artificial Intelligence Review, 44(4):467–508, 2015.
[17] Henrik Strøm, Steven Albury, and Lene Tolstrup Sørensen. Machine learning performance
metrics and diagnostic context in radiology. In 2018 11th CMI International Conference:
Prospects and Challenges Towards Developing a Digital Economy within the EU, pages 56–61.
IEEE, 2018.
[18] Ashok Suragala, P Venkateswarlu, and M China Raju. A comparative study of performance
metrics of data mining algorithms on medical data. In ICCCE 2020, pages 1549–1556. Springer,
2021.
[19] Shadi Diab and Badie Sartawi. Classification of questions and learning outcome statements
(los) into blooms taxonomy (bt) by similarity measurements towards extracting of learning
outcome from learning material. arXiv preprint arXiv:1706.03191, 2017.
[20] David Martin Powers. Evaluation: from precision, recall and f-measure to roc, informedness,
markedness and correlation. 2011.
[21] Jacob Cohen. A coefficient of agreement for nominal scales. Educational and psychological
measurement, 20(1):37–46, 1960.
[22] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel,
P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher,
M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine
Learning Research, 12:2825–2830, 2011.
[23] Jin Huang and Charles X Ling. Using AUC and accuracy in evaluating learning algorithms. IEEE Transactions on Knowledge and Data Engineering, 17(3):299–310, 2005.
[24] Cèsar Ferri, Peter Flach, José Hernández-Orallo, and Athmane Senad. Modifying ROC curves to incorporate predicted probabilities. In Proceedings of the Second Workshop on ROC Analysis in Machine Learning, pages 33–40, 2005.
[25] Foster Provost and Tom Fawcett. Robust classification for imprecise environments. Machine
learning, 42(3):203–231, 2001.
[26] S Joshua Swamidass, Chloé-Agathe Azencott, Kenny Daily, and Pierre Baldi. A CROC stronger than ROC: measuring, visualizing and optimizing early retrieval. Bioinformatics, 26(10):1348–1356, 2010.
[27] Jesse Davis and Mark Goadrich. The relationship between precision-recall and ROC curves. In Proceedings of the 23rd International Conference on Machine Learning, pages 233–240. ACM, 2006.
[28] Takaya Saito and Marc Rehmsmeier. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS ONE, 10(3):e0118432, 2015.
[29] Tom Fawcett. An introduction to ROC analysis. Pattern Recognition Letters, 27(8):861–874, 2006.
[30] Chris Drummond and Robert C Holte. Explicitly representing expected cost: An alternative to ROC representation. In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 198–207. ACM, 2000.
[31] Miha Vuk and Tomaz Curk. ROC curve, lift chart and calibration plot. Metodoloski zvezki, 3(1):89, 2006.
[32] Desen Wang and Bintong Chen. Cost-sensitive learning algorithm. In Working paper. University
of Delaware, 2017.
[33] Ira Cohen and Moises Goldszmidt. Properties and benefits of calibrated classifiers. In PKDD,
volume 3202, pages 125–136. Springer, 2004.
[34] Charles X Ling, Jin Huang, and Harry Zhang. AUC: a statistically consistent and more discriminating measure than accuracy. In IJCAI, volume 3, pages 519–524, 2003.
[35] Saharon Rosset. Model selection via the auc. In Proceedings of the twenty-first international
conference on Machine learning, page 89. ACM, 2004.
[36] Jin Huang and Charles X Ling. Constructing new and better evaluation measures for machine
learning. In IJCAI, pages 859–864, 2007.