The problem of imbalanced class distribution or small datasets is quite frequent in certain fields especially in medical domain. However, the classical Naive Bayes approach in dealing with uncertainties within medical datasets face with the difficulties in selecting prior distributions, whereby parameter estimation such as the maximum likelihood estimation (MLE) and maximum a posteriori (MAP) often hurt the accuracy of predictions. This paper presents the full Bayesian approach to assess the predictive distribution of all classes using three classifiers; naïve bayes (NB), bayesian networks (BN), and tree augmented naïve bayes (TAN) with three datasets; Breast cancer, breast cancer wisconsin, and breast tissue dataset. Next, the prediction accuracies of bayesian approaches are also compared with three standard machine learning algorithms from the literature; K-nearest neighbor (K-NN), support vector machine (SVM), and decision tree (DT). The results showed that the best performance was the bayesian networks (BN) algorithm with accuracy of 97.281%. The results are hoped to provide as base comparison for further research on breast cancer detection. All experiments are conducted in WEKA data mining tool.
A Review on Data Mining Techniques for Prediction of Breast Cancer RecurrenceDr. Amarjeet Singh
The most common type of cancer in women
worldwide is the Breast Cancer. Breast cancer may be
detected early using Mammograms, probably before it's
spread. Recurrent breast cancer could occur months or years
after initial treatment. The cancer could return within the
same place because the original cancer (local recurrence), or it
may spread to different areas of your body (distant
recurrence). Early stage treatment is done not only to cure
breast cancer however additionally facilitate in preventing its
repetition/recurrence. Data mining algorithms provide
assistance in predicting the early-stage breast cancer that
continually has been difficult analysis drawback. The
projected analysis can establish the most effective algorithm
that predicts the recurrence of the breast cancer and improve
the accuracy the algorithms. Large information like Clump,
Classification, Association Rules, Prediction and Neural
Networks, Decision Trees can be analyzed using data mining
applications and techniques.
A Comparative Study on the Methods Used for the Detection of Breast Cancerrahulmonikasharma
Among women in the world, the death caused by the Breast cancer has become the leading role. At an initial stage, the tumor in the breast is hard to detect. Manual attempt have proven to be time consuming and inefficient in many cases. Hence there is a need for efficient methods that diagnoses the cancerous cell without human involvement with high accuracy. Mammography is a special case of CT scan which adopts X-ray method with high resolution film. so that it can detect well the tumors in the breast. This paper describes the comparative study of the various data mining methods on the detection of the breast cancer by using image processing techniques.
Breast cancer diagnosis and recurrence prediction using machine learning tech...eSAT Journals
Abstract Breast Cancer has become the common cause of death among women. Due to long hours invested in manual diagnosis and lesser diagnostic system available emphasize the development of automated diagnosis for early diagnosis of the disease. Our aim is to classify whether the breast cancer is benign or malignant and predict the recurrence and non-recurrence of malignant cases after a certain period. To achieve this we have used machine learning techniques such as Support Vector Machine, Logistic Regression, KNN and Naive Bayes. These techniques are coded in MATLAB using UCI machine learning depository. We have compared the accuracies of different techniques and observed the results. We found SVM most suited for predictive analysis and KNN performed best for our overall methodology. Keywords: Breast Cancer, SVM, KNN, Naive Bayes, Logistic Regression, Classification.
USING DATA MINING TECHNIQUES FOR DIAGNOSIS AND PROGNOSIS OF CANCER DISEASEIJCSEIT Journal
Breast cancer is one of the leading cancers for women in developed countries including India. It is the
second most common cause of cancer death in women. The high incidence of breast cancer in women has
increased significantly in the last years. In this paper we have discussed various data mining approaches
that have been utilized for breast cancer diagnosis and prognosis. Breast Cancer Diagnosis is
distinguishing of benign from malignant breast lumps and Breast Cancer Prognosis predicts when Breast
Cancer is to recur in patients that have had their cancers excised. This study paper summarizes various
review and technical articles on breast cancer diagnosis and prognosis also we focus on current research
being carried out using the data mining techniques to enhance the breast cancer diagnosis and prognosis.
A Review on Data Mining Techniques for Prediction of Breast Cancer RecurrenceDr. Amarjeet Singh
The most common type of cancer in women
worldwide is the Breast Cancer. Breast cancer may be
detected early using Mammograms, probably before it's
spread. Recurrent breast cancer could occur months or years
after initial treatment. The cancer could return within the
same place because the original cancer (local recurrence), or it
may spread to different areas of your body (distant
recurrence). Early stage treatment is done not only to cure
breast cancer however additionally facilitate in preventing its
repetition/recurrence. Data mining algorithms provide
assistance in predicting the early-stage breast cancer that
continually has been difficult analysis drawback. The
projected analysis can establish the most effective algorithm
that predicts the recurrence of the breast cancer and improve
the accuracy the algorithms. Large information like Clump,
Classification, Association Rules, Prediction and Neural
Networks, Decision Trees can be analyzed using data mining
applications and techniques.
A Comparative Study on the Methods Used for the Detection of Breast Cancerrahulmonikasharma
Among women in the world, the death caused by the Breast cancer has become the leading role. At an initial stage, the tumor in the breast is hard to detect. Manual attempt have proven to be time consuming and inefficient in many cases. Hence there is a need for efficient methods that diagnoses the cancerous cell without human involvement with high accuracy. Mammography is a special case of CT scan which adopts X-ray method with high resolution film. so that it can detect well the tumors in the breast. This paper describes the comparative study of the various data mining methods on the detection of the breast cancer by using image processing techniques.
Breast cancer diagnosis and recurrence prediction using machine learning tech...eSAT Journals
Abstract Breast Cancer has become the common cause of death among women. Due to long hours invested in manual diagnosis and lesser diagnostic system available emphasize the development of automated diagnosis for early diagnosis of the disease. Our aim is to classify whether the breast cancer is benign or malignant and predict the recurrence and non-recurrence of malignant cases after a certain period. To achieve this we have used machine learning techniques such as Support Vector Machine, Logistic Regression, KNN and Naive Bayes. These techniques are coded in MATLAB using UCI machine learning depository. We have compared the accuracies of different techniques and observed the results. We found SVM most suited for predictive analysis and KNN performed best for our overall methodology. Keywords: Breast Cancer, SVM, KNN, Naive Bayes, Logistic Regression, Classification.
USING DATA MINING TECHNIQUES FOR DIAGNOSIS AND PROGNOSIS OF CANCER DISEASEIJCSEIT Journal
Breast cancer is one of the leading cancers for women in developed countries including India. It is the
second most common cause of cancer death in women. The high incidence of breast cancer in women has
increased significantly in the last years. In this paper we have discussed various data mining approaches
that have been utilized for breast cancer diagnosis and prognosis. Breast Cancer Diagnosis is
distinguishing of benign from malignant breast lumps and Breast Cancer Prognosis predicts when Breast
Cancer is to recur in patients that have had their cancers excised. This study paper summarizes various
review and technical articles on breast cancer diagnosis and prognosis also we focus on current research
being carried out using the data mining techniques to enhance the breast cancer diagnosis and prognosis.
SVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTIONijscai
Mortality leading among women in developed countries is breast cancer. Breast cancer is women's second most prominent cause of cancer mortality worldwide. In recent decades, women's high prevalence of breast cancer has risen dramatically. This paper discussed several data analysis methods used to detect breast cancer early. Breast cancer diagnosis distinguishes benign and malignant breast lumps. Using data processing tools, we tackled this disease analysis. Data mining is an important step of library discovery where intelligent methods are used to detect patterns. Several clinical breast cancer studies were conducted using soft computing and machine learning techniques. Sometimes their algorithms are easier, easier, or more comprehensive than others. This research is focused on genetic programming and machine learning algorithms to reliably identify benign and malignant breast cancer. This study aimed to optimise the testing algorithm. We used genetic programming methods to choose classification machines' best features and parameter values. Data mining is an important step of library discovery where intelligent methods are used to detect patterns. We are analysing data accessible from the U.C.I. deep-learning data set in Wisconsin. In this experiment, we equate four Weka clustering strategies with genetic clustering. A comparison of results reveals that sequential minimal optimization (S.M.O.) is better than I.B.K. and B.F. Tree processes, i.e. 97.71%.
SVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTIONijscai
Mortality leading among women in developed countries is breast cancer. Breast cancer is women's second most prominent cause of cancer mortality worldwide. In recent decades, women's high prevalence of breast cancer has risen dramatically. This paper discussed several data analysis methods used to detect breast cancer early. Breast cancer diagnosis distinguishes benign and malignant breast lumps. Using data processing tools, we tackled this disease analysis. Data mining is an important step of library discovery where intelligent methods are used to detect patterns. Several clinical breast cancer studies were conducted using soft computing and machine learning techniques. Sometimes their algorithms are easier, easier, or more comprehensive than others. This research is focused on genetic programming and machine
learning algorithms to reliably identify benign and malignant breast cancer. This study aimed to optimise the testing algorithm. We used genetic programming methods to choose classification machines' best features and parameter values. Data mining is an important step of library discovery where intelligent methods are used to detect patterns. We are analysing data accessible from the U.C.I. deep-learning data
set in Wisconsin. In this experiment, we equate four Weka clustering strategies with genetic clustering. A comparison of results reveals that sequential minimal optimization (S.M.O.) is better than I.B.K. and B.F. Tree processes, i.e. 97.71%.
Image processing and machine learning techniques used in computer-aided dete...IJECEIAES
This paper aims to review the previously developed Computer-aided detection (CAD) systems for mammogram screening because increasing death rate in women due to breast cancer is a global medical issue and it can be controlled only by early detection with regular screening. Till now mammography is the widely used breast imaging modality. CAD systems have been adopted by the radiologists to increase the accuracy of the breast cancer diagnosis by avoiding human errors and experience related issues. This study reveals that in spite of the higher accuracy obtained by the earlier proposed CAD systems for breast cancer diagnosis, they are not fully automated. Moreover, the false-positive mammogram screening cases are high in number and over-diagnosis of breast cancer exposes a patient towards harmful overtreatment for which a huge amount of money is being wasted. In addition, it is also reported that the mammogram screening result with and without CAD systems does not have noticeable difference, whereas the undetected cancer cases by CAD system are increasing. Thus, future research is required to improve the performance of CAD system for mammogram screening and make it completely automated.
On Predicting and Analyzing Breast Cancer using Data Mining ApproachMasud Rana Basunia
Breast Cancer is one of the crucial and burning diseases that has invaded women. Predicting breast cancer manually takes a lot of time and it is difficult for the physician to classification. So, detecting cancer through various automatic diagnostic techniques is very necessary. Data mining is the process of running powerful classification techniques that extract useful information from data. The uses and potentials of these techniques have found its scope in medical data. Classification techniques tend to simplify the prediction segment.
Breast cancer diagnosis via data mining performance analysis of seven differe...cseij
According to World Health Organization (WHO), breast cancer is the top cancer in women both in the
developed and the developing world. Increased life expectancy, urbanization and adoption of western
lifestyles trigger the occurrence of breast cancer in the developing world. Most cancer events are
diagnosed in the late phases of the illness and so, early detection in order to improve breast cancer
outcome and survival is very crucial.
In this study, it is intended to contribute to the early diagnosis of breast cancer. An analysis on breast
cancer diagnoses for the patients is given. For the purpose, first of all, data about the patients whose
cancers’ have already been diagnosed is gathered and they are arranged, and then whether the other
patients are in trouble with breast cancer is tried to be predicted under cover of those data. Predictions of
the other patients are realized through seven different algorithms and the accuracies of those have been
given. The data about the patients have been taken from UCI Machine Learning Repository thanks to Dr.
William H. Wolberg from the University of Wisconsin Hospitals, Madison. During the prediction process,
RapidMiner 5.0 data mining tool is used to apply data mining with the desired algorithms.
Cancer prognosis prediction using balanced stratified samplingijscai
High accuracy in cancer prediction is important to improve the quality of the treatment and to improve the
rate of survivability of patients. As the data volume is increasing rapidly in the healthcare research, the
analytical challenge exists in double. The use of effective sampling technique in classification algorithms
always yields good prediction accuracy. The SEER public use cancer database provides various prominent
class labels for prognosis prediction. The main objective of this paper is to find the effect of sampling
techniques in classifying the prognosis variable and propose an ideal sampling method based on the
outcome of the experimentation. In the first phase of this work the traditional random sampling and
stratified sampling techniques have been used. At the next level the balanced stratified sampling with
variations as per the choice of the prognosis class labels have been tested. Much of the initial time has been
focused on performing the pre-processing of the SEER data set. The classification model for
experimentation has been built using the breast cancer, respiratory cancer and mixed cancer data sets with
three traditional classifiers namely Decision Tree, Naïve Bayes and K-Nearest Neighbour. The three
prognosis factors survival, stage and metastasis have been used as class labels for experimental
comparisons. The results shows a steady increase in the prediction accuracy of balanced stratified model
as the sample size increases, but the traditional approach fluctuates before the optimum results.
Breast Cancer Diagnosis using a Hybrid Genetic Algorithm for Feature Selection based on Mutual Information (Abeer Alzubaidi, Georgina Cosma, David Brown and Graham Pockley)
Interactive Technologies and Games (ITAG) Conference 2016
Health, Disability and EducationDates: Wednesday 26 October 2016 - Thursday 27 October 2016 Location: The Council House, NG1 2DT
Improving Hierarchical Decision Approach for Single Image Classification of P...IJECEIAES
The single image classification of Pap smears is an important part of the early detection of cervical cancer through Pap smear tests. Unfortunately, most classification processes still require accuracy enhancement, especially to complete the classification in seven classes and to get a qualified classification process. In addition, attempts to improve the single image classification of Pap smears were performed to be able to distinguish normal and abnormal cells. This study proposes a better approach by providing different handling of the initial data preparation process in the form of the distribution for training data and testing data so that it resulted in a new model of Hierarchial Decision Approach (HDA) which has the higher learning rate and momentum values in the proposed new model. This study evaluated 20 different features in hierarchical decision approach model based on Neural Network (NN) and genetic algorithm method for single image classification of Pap smear which resulted in classification experiment using value learning rate of 0.3 and momentum of 0.2 and value of learning rate of 0.5 and momentum of 0.5 by generating classification of 7 classes (Normal Intermediate, Normal Colummar, Mild (Light) Dyplasia, Moderate Dyplasia, Servere Dyplasia and Carcinoma In Situ) better. The accuracy value enhancemenet were also influenced by the application of Genetic Algorithm to feature selection. Thus, from the results of model testing, it can be concluded that the Hierarchical Decision Approach (HDA) method for Pap Smear image classification can be used as a reference for initial screening process to analyze Pap Smear image classification.
MEDICAL IMAGING MUTIFRACTAL ANALYSIS IN PREDICTION OF EFFICIENCY OF CANCER TH...csandit
Based on pressing need for predictive performance improvement, we explored the value of pretherapy
tumour histology image analysis to predict chemotherapy response. It was shown that
multifractal analysis of breast tumour tissue prior to chemotherapy indeed has the capacity to
distinguish between histological images of the different chemotherapy responder groups with
accuracies of 91.4% for pPR, 82.9% for pCR and 82.1% for PD/SD.
Outlier Modification and Gene Selection for Binary Cancer Classification usin...CSCJournals
Gaussian linear Bayes classifier is one of the most popular approaches for classification. However, it is not so popular for cancer classification using gene expression data due to the inverse problem of its covariance matrix in presence of large number of gene variables with small number of cancer patients/samples in the training dataset. To overcome these problems, we propose few top differentially expressed (DE) genes from both upregulated and downregulated groups for binary cancer classification using the Gaussian linear Bayes classifier. Usually top DE genes are selected by ranking the p-values of t-test procedure. However, both t-test statistic and Gaussian linear Bayes classifier are sensitive to outliers. Therefore, we also propose outlier modification for gene expression dataset before applying to the proposed methods, since gene expression datasets are often contaminated by outliers due to several steps involves in the data generating process from hybridization to image analysis. The performance of the proposed method is investigated using both simulated and real gene expression datasets. It is observed that the proposed method improves the performance with outlier modifications for binary cancer classification.
PREDICTION OF BREAST CANCER USING DATA MINING TECHNIQUESIAEME Publication
Women who have improved from breast cancer (BC) constantly panic about setback. The way that they have persevered through the meticulous treatment makes repeat their biggest fear. However, with current spreads in technology, early repeat prediction can enable patients to get treatment prior. The accessibility of broad information and propelled techniques make precise and fast prediction possible. This examination expects to think about the exactness of a couple of existing information mining calculations in predicting BC repeat. It inserts a particle swarm optimization as highlight choice into ANN classifier. An objective of increasing the accuracy level of the prediction model.
Breast Tumor Detection Using Efficient Machine Learning and Deep Learning Tec...mlaij
Breast cancer tissues grow when cells in the breast expand and divide uncontrollably, resulting in a lump of tissue commonly called and named tumor. Breast cancer is the second most prevalent cancer among women, following skin cancer. While it is more commonly diagnosed in women aged 50 and above, it can affect individuals of any age. Although it is rare, men can also develop breast cancer, accounting for less than 1% of all cases, with approximately 2,600 cases reported annually in the United States. Early detection of breast tumors is crucial in reducing the risk of developing breast cancer. A publicly available dataset containing features of breast tumors was utilized to identify breast tumors using machine learning and deep learning techniques. Various prediction models were constructed, including logistic regression (LR), decision tree (DT), random forest (RF), support vector machine (SVM), Gradient Boosting (GB), Extreme Gradient Boosting (XGB), Light GBM, and a recurrent neural network (RNN) model. These models were trained to classify and predict breast tumor cases based on the provided features.
BREAST TUMOR DETECTION USING EFFICIENT MACHINE LEARNING AND DEEP LEARNING TEC...mlaij
Breast cancer tissues grow when cells in the breast expand and divide uncontrollably, resulting in a lump
of tissue commonly called and named tumor. Breast cancer is the second most prevalent cancer among
women, following skin cancer. While it is more commonly diagnosed in women aged 50 and above, it can
affect individuals of any age. Although it is rare, men can also develop breast cancer, accounting for less
than 1% of all cases, with approximately 2,600 cases reported annually in the United States. Early
detection of breast tumors is crucial in reducing the risk of developing breast cancer. A publicly available
dataset containing features of breast tumors was utilized to identify breast tumors using machine learning
and deep learning techniques. Various prediction models were constructed, including logistic regression
(LR), decision tree (DT), random forest (RF), support vector machine (SVM), Gradient Boosting (GB),
Extreme Gradient Boosting (XGB), Light GBM, and a recurrent neural network (RNN) model. These
models were trained to classify and predict breast tumor cases based on the provided features.
Breast Tumor Detection Using Efficient Machine Learning and Deep Learning Tec...mlaij
Machine Learning and Applications: An International Journal (MLAIJ) is a quarterly open access peer-reviewed journal that publishes articles which contribute new results in all areas of the machine learning. The journal is devoted to the publication of high quality papers on theoretical and practical aspects of machine learning and applications.The goal of this journal is to bring together researchers and practitioners from academia and industry to focus on machine learning advancements, and establishing new collaborations in these areas. Original research papers, state-of-the-art reviews are invited for publication in all areas of machine learning.
Authors are solicited to contribute to the journal by submitting articles that illustrate research results, projects, surveying works and industrial experiences that describe significant advances in the areas of machine learning.
SVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTIONijscai
Mortality leading among women in developed countries is breast cancer. Breast cancer is women's second most prominent cause of cancer mortality worldwide. In recent decades, women's high prevalence of breast cancer has risen dramatically. This paper discussed several data analysis methods used to detect breast cancer early. Breast cancer diagnosis distinguishes benign and malignant breast lumps. Using data processing tools, we tackled this disease analysis. Data mining is an important step of library discovery where intelligent methods are used to detect patterns. Several clinical breast cancer studies were conducted using soft computing and machine learning techniques. Sometimes their algorithms are easier, easier, or more comprehensive than others. This research is focused on genetic programming and machine learning algorithms to reliably identify benign and malignant breast cancer. This study aimed to optimise the testing algorithm. We used genetic programming methods to choose classification machines' best features and parameter values. Data mining is an important step of library discovery where intelligent methods are used to detect patterns. We are analysing data accessible from the U.C.I. deep-learning data set in Wisconsin. In this experiment, we equate four Weka clustering strategies with genetic clustering. A comparison of results reveals that sequential minimal optimization (S.M.O.) is better than I.B.K. and B.F. Tree processes, i.e. 97.71%.
SVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTIONijscai
Mortality leading among women in developed countries is breast cancer. Breast cancer is women's second most prominent cause of cancer mortality worldwide. In recent decades, women's high prevalence of breast cancer has risen dramatically. This paper discussed several data analysis methods used to detect breast cancer early. Breast cancer diagnosis distinguishes benign and malignant breast lumps. Using data processing tools, we tackled this disease analysis. Data mining is an important step of library discovery where intelligent methods are used to detect patterns. Several clinical breast cancer studies were conducted using soft computing and machine learning techniques. Sometimes their algorithms are easier, easier, or more comprehensive than others. This research is focused on genetic programming and machine
learning algorithms to reliably identify benign and malignant breast cancer. This study aimed to optimise the testing algorithm. We used genetic programming methods to choose classification machines' best features and parameter values. Data mining is an important step of library discovery where intelligent methods are used to detect patterns. We are analysing data accessible from the U.C.I. deep-learning data
set in Wisconsin. In this experiment, we equate four Weka clustering strategies with genetic clustering. A comparison of results reveals that sequential minimal optimization (S.M.O.) is better than I.B.K. and B.F. Tree processes, i.e. 97.71%.
Image processing and machine learning techniques used in computer-aided dete...IJECEIAES
This paper aims to review the previously developed Computer-aided detection (CAD) systems for mammogram screening because increasing death rate in women due to breast cancer is a global medical issue and it can be controlled only by early detection with regular screening. Till now mammography is the widely used breast imaging modality. CAD systems have been adopted by the radiologists to increase the accuracy of the breast cancer diagnosis by avoiding human errors and experience related issues. This study reveals that in spite of the higher accuracy obtained by the earlier proposed CAD systems for breast cancer diagnosis, they are not fully automated. Moreover, the false-positive mammogram screening cases are high in number and over-diagnosis of breast cancer exposes a patient towards harmful overtreatment for which a huge amount of money is being wasted. In addition, it is also reported that the mammogram screening result with and without CAD systems does not have noticeable difference, whereas the undetected cancer cases by CAD system are increasing. Thus, future research is required to improve the performance of CAD system for mammogram screening and make it completely automated.
On Predicting and Analyzing Breast Cancer using Data Mining ApproachMasud Rana Basunia
Breast Cancer is one of the crucial and burning diseases that has invaded women. Predicting breast cancer manually takes a lot of time and it is difficult for the physician to classification. So, detecting cancer through various automatic diagnostic techniques is very necessary. Data mining is the process of running powerful classification techniques that extract useful information from data. The uses and potentials of these techniques have found its scope in medical data. Classification techniques tend to simplify the prediction segment.
Breast cancer diagnosis via data mining performance analysis of seven differe...cseij
According to World Health Organization (WHO), breast cancer is the top cancer in women both in the
developed and the developing world. Increased life expectancy, urbanization and adoption of western
lifestyles trigger the occurrence of breast cancer in the developing world. Most cancer events are
diagnosed in the late phases of the illness and so, early detection in order to improve breast cancer
outcome and survival is very crucial.
In this study, it is intended to contribute to the early diagnosis of breast cancer. An analysis on breast
cancer diagnoses for the patients is given. For the purpose, first of all, data about the patients whose
cancers’ have already been diagnosed is gathered and they are arranged, and then whether the other
patients are in trouble with breast cancer is tried to be predicted under cover of those data. Predictions of
the other patients are realized through seven different algorithms and the accuracies of those have been
given. The data about the patients have been taken from UCI Machine Learning Repository thanks to Dr.
William H. Wolberg from the University of Wisconsin Hospitals, Madison. During the prediction process,
RapidMiner 5.0 data mining tool is used to apply data mining with the desired algorithms.
Cancer prognosis prediction using balanced stratified samplingijscai
High accuracy in cancer prediction is important to improve the quality of the treatment and to improve the
rate of survivability of patients. As the data volume is increasing rapidly in the healthcare research, the
analytical challenge exists in double. The use of effective sampling technique in classification algorithms
always yields good prediction accuracy. The SEER public use cancer database provides various prominent
class labels for prognosis prediction. The main objective of this paper is to find the effect of sampling
techniques in classifying the prognosis variable and propose an ideal sampling method based on the
outcome of the experimentation. In the first phase of this work the traditional random sampling and
stratified sampling techniques have been used. At the next level the balanced stratified sampling with
variations as per the choice of the prognosis class labels have been tested. Much of the initial time has been
focused on performing the pre-processing of the SEER data set. The classification model for
experimentation has been built using the breast cancer, respiratory cancer and mixed cancer data sets with
three traditional classifiers namely Decision Tree, Naïve Bayes and K-Nearest Neighbour. The three
prognosis factors survival, stage and metastasis have been used as class labels for experimental
comparisons. The results shows a steady increase in the prediction accuracy of balanced stratified model
as the sample size increases, but the traditional approach fluctuates before the optimum results.
Breast Cancer Diagnosis using a Hybrid Genetic Algorithm for Feature Selection based on Mutual Information (Abeer Alzubaidi, Georgina Cosma, David Brown and Graham Pockley)
Interactive Technologies and Games (ITAG) Conference 2016
Health, Disability and EducationDates: Wednesday 26 October 2016 - Thursday 27 October 2016 Location: The Council House, NG1 2DT
Improving Hierarchical Decision Approach for Single Image Classification of P...IJECEIAES
The single image classification of Pap smears is an important part of the early detection of cervical cancer through Pap smear tests. Unfortunately, most classification processes still require accuracy enhancement, especially to complete the classification in seven classes and to get a qualified classification process. In addition, attempts to improve the single image classification of Pap smears were performed to be able to distinguish normal and abnormal cells. This study proposes a better approach by providing different handling of the initial data preparation process in the form of the distribution for training data and testing data so that it resulted in a new model of Hierarchial Decision Approach (HDA) which has the higher learning rate and momentum values in the proposed new model. This study evaluated 20 different features in hierarchical decision approach model based on Neural Network (NN) and genetic algorithm method for single image classification of Pap smear which resulted in classification experiment using value learning rate of 0.3 and momentum of 0.2 and value of learning rate of 0.5 and momentum of 0.5 by generating classification of 7 classes (Normal Intermediate, Normal Colummar, Mild (Light) Dyplasia, Moderate Dyplasia, Servere Dyplasia and Carcinoma In Situ) better. The accuracy value enhancemenet were also influenced by the application of Genetic Algorithm to feature selection. Thus, from the results of model testing, it can be concluded that the Hierarchical Decision Approach (HDA) method for Pap Smear image classification can be used as a reference for initial screening process to analyze Pap Smear image classification.
MEDICAL IMAGING MUTIFRACTAL ANALYSIS IN PREDICTION OF EFFICIENCY OF CANCER TH...csandit
Based on pressing need for predictive performance improvement, we explored the value of pretherapy
tumour histology image analysis to predict chemotherapy response. It was shown that
multifractal analysis of breast tumour tissue prior to chemotherapy indeed has the capacity to
distinguish between histological images of the different chemotherapy responder groups with
accuracies of 91.4% for pPR, 82.9% for pCR and 82.1% for PD/SD.
Outlier Modification and Gene Selection for Binary Cancer Classification usin...CSCJournals
Gaussian linear Bayes classifier is one of the most popular approaches for classification. However, it is not so popular for cancer classification using gene expression data due to the inverse problem of its covariance matrix in presence of large number of gene variables with small number of cancer patients/samples in the training dataset. To overcome these problems, we propose few top differentially expressed (DE) genes from both upregulated and downregulated groups for binary cancer classification using the Gaussian linear Bayes classifier. Usually top DE genes are selected by ranking the p-values of t-test procedure. However, both t-test statistic and Gaussian linear Bayes classifier are sensitive to outliers. Therefore, we also propose outlier modification for gene expression dataset before applying to the proposed methods, since gene expression datasets are often contaminated by outliers due to several steps involves in the data generating process from hybridization to image analysis. The performance of the proposed method is investigated using both simulated and real gene expression datasets. It is observed that the proposed method improves the performance with outlier modifications for binary cancer classification.
PREDICTION OF BREAST CANCER USING DATA MINING TECHNIQUESIAEME Publication
Women who have improved from breast cancer (BC) constantly panic about setback. The way that they have persevered through the meticulous treatment makes repeat their biggest fear. However, with current spreads in technology, early repeat prediction can enable patients to get treatment prior. The accessibility of broad information and propelled techniques make precise and fast prediction possible. This examination expects to think about the exactness of a couple of existing information mining calculations in predicting BC repeat. It inserts a particle swarm optimization as highlight choice into ANN classifier. An objective of increasing the accuracy level of the prediction model.
Breast Tumor Detection Using Efficient Machine Learning and Deep Learning Tec...mlaij
Breast cancer tissues grow when cells in the breast expand and divide uncontrollably, resulting in a lump of tissue commonly called and named tumor. Breast cancer is the second most prevalent cancer among women, following skin cancer. While it is more commonly diagnosed in women aged 50 and above, it can affect individuals of any age. Although it is rare, men can also develop breast cancer, accounting for less than 1% of all cases, with approximately 2,600 cases reported annually in the United States. Early detection of breast tumors is crucial in reducing the risk of developing breast cancer. A publicly available dataset containing features of breast tumors was utilized to identify breast tumors using machine learning and deep learning techniques. Various prediction models were constructed, including logistic regression (LR), decision tree (DT), random forest (RF), support vector machine (SVM), Gradient Boosting (GB), Extreme Gradient Boosting (XGB), Light GBM, and a recurrent neural network (RNN) model. These models were trained to classify and predict breast tumor cases based on the provided features.
BREAST TUMOR DETECTION USING EFFICIENT MACHINE LEARNING AND DEEP LEARNING TEC...mlaij
Breast cancer tissues grow when cells in the breast expand and divide uncontrollably, resulting in a lump
of tissue commonly called and named tumor. Breast cancer is the second most prevalent cancer among
women, following skin cancer. While it is more commonly diagnosed in women aged 50 and above, it can
affect individuals of any age. Although it is rare, men can also develop breast cancer, accounting for less
than 1% of all cases, with approximately 2,600 cases reported annually in the United States. Early
detection of breast tumors is crucial in reducing the risk of developing breast cancer. A publicly available
dataset containing features of breast tumors was utilized to identify breast tumors using machine learning
and deep learning techniques. Various prediction models were constructed, including logistic regression
(LR), decision tree (DT), random forest (RF), support vector machine (SVM), Gradient Boosting (GB),
Extreme Gradient Boosting (XGB), Light GBM, and a recurrent neural network (RNN) model. These
models were trained to classify and predict breast tumor cases based on the provided features.
Breast Tumor Detection Using Efficient Machine Learning and Deep Learning Tec...mlaij
Machine Learning and Applications: An International Journal (MLAIJ) is a quarterly open access peer-reviewed journal that publishes articles which contribute new results in all areas of the machine learning. The journal is devoted to the publication of high quality papers on theoretical and practical aspects of machine learning and applications.The goal of this journal is to bring together researchers and practitioners from academia and industry to focus on machine learning advancements, and establishing new collaborations in these areas. Original research papers, state-of-the-art reviews are invited for publication in all areas of machine learning.
Authors are solicited to contribute to the journal by submitting articles that illustrate research results, projects, surveying works and industrial experiences that describe significant advances in the areas of machine learning.
Predictive modeling for breast cancer based on machine learning algorithms an...IJECEIAES
Breast cancer is one of the leading causes of death among women worldwide. However, early prediction of breast cancer plays a crucial role. Therefore, strong needs exist for automatic accurate early prediction of breast cancer. In this paper, machine learning (ML) classifiers combined with features selection methods are used to build an intelligent tool for breast cancer prediction. The Wisconsin diagnostic breast cancer (WDBC) dataset is used to train and test the model. Classification algorithms, including support vector machine (SVM), light gradient boosting machine (LightGBM), random forest (RF), logistic regression (LR), k-nearest neighbors (k-NN), and naïve Bayes, were employed. Performance measures for each of them were obtained, namely: accuracy, precision, recall, F-score, Kappa, Matthews correlation coefficient (MCC), and time. The results indicate that without feature selection, LightGBM achieves the highest accuracy at 95%. With minimum redundancy maximum relevance (mRMR) feature selection (15 features), LightGBM outperforms other classifiers, achieving an accuracy of 98%. For Pearson correlation coefficient feature selection (15 features), LightGBM also excels with a 95% accuracy rate. Lasso feature selection (5 features) produces varied results across classifiers, with logistic regression achieving the highest accuracy at 96%. These findings underscore the importance of feature selection in refining model performance and in improving detection for breast cancer.
Toward Integrated Clinical and Gene Expression Profiles for Breast Cancer Pro...CSCJournals
Breast cancer patients with the same diagnostic and clinical prognostic profile can have markedly different clinical outcome. This difference is possibly caused by the limitation of current breast cancer prognostic indices, which group molecularly distinct patients into similar clinical classes based mainly on morphological of disease. Traditional clinical based prognosis models were discovered contain some restriction to address the heterogeneity of breast cancer. The invention of microarray technology and its ability to simultaneously interrogate thousands genes has changed the paradigm of molecular classification of human cancers as well as it shifted clinical prognosis model to broader prospect. Numerous studies have revealed the potential value of gene expression signatures in examining the risk of disease recurrence. However, currently most of these studies attempted to implement genetic marker based prognostic models to replace the traditional clinical markers, yet neglecting the rich information contain in clinical information. Therefore, this research took an effort to integrate both clinical and microarray data in order to obtain accurate breast cancer prognosis, by taking into account that these data complements each other. This article presents a review of the development of breast cancer prognosis models, concentrating precisely on clinical and gene expression profiles. The literature is reviewed in an explicit machine learning framework, which include the elements of feature selection and classification techniques.
Breast cancer diagnosis: a survey of pre-processing, segmentation, feature e...IJECEIAES
Machine learning methods have been an interesting method in the field of medical for many years, and they have achieved successful results in various fields of medical science. This paper examines the effects of using machine learning algorithms in the diagnosis and classification of breast cancer from mammography imaging data. Cancer diagnosis is the identification of images as cancer or non-cancer, and this involves image preprocessing, feature extraction, classification, and performance analysis. This article studied 93 different references mentioned in the previous years in the field of processing and tries to find an effective way to diagnose and classify breast cancer. Based on the results of this research, it can be concluded that most of today’s successful methods focus on the use of deep learning methods. Finding a new method requires an overview of existing methods in the field of deep learning methods in order to make a comparison and case study.
Breast cancer is the leading cause of death for women worldwide. Cancer can be discovered early, lowering the rate of death. Machine learning techniques are a hot field of research, and they have been shown to be helpful in cancer prediction and early detection. The primary purpose of this research is to identify which machine learning algorithms are the most successful in predicting and diagnosing breast cancer, according to five criteria: specificity, sensitivity, precision, accuracy, and F1 score. The project is finished in the Anaconda environment, which uses Python's NumPy and SciPy numerical and scientific libraries as well as matplotlib and Pandas. In this study, the Wisconsin diagnostic breast cancer dataset was used to evaluate eleven machine learning classifiers: decision tree, quadratic discriminant analysis, AdaBoost, Bagging meta estimator, Extra randomized trees, Gaussian process classifier, Ridge, Gaussian nave Bayes, k-Nearest neighbors, multilayer perceptron, and support vector classifier. During performance analysis, extremely randomized trees outperformed all other classifiers with an F1-score of 96.77% after data collection and data analysis.
Breast cancer detection using machine learning approaches: a comparative studyIJECEIAES
As the cause of the breast cancer disease has not yet clearly identified and a method to prevent its occurrence has not yet been developed, its early detection has a significant role in enhancing survival rate. In fact, artificial intelligent approaches have been playing an important role to enhance the diagnosis process of breast cancer. This work has selected eight classification models that are mostly used to predict breast cancer to be under investigation. These classifiers include single and ensemble classifiers. A trusted dataset has been enhanced by applying five different feature selection methods to pick up only weighted features and to neglect others. Accordingly, a dataset of only 17 features has been developed. Based on our experimental work, three classifiers, multi-layer perceptron (MLP), support vector machine (SVM) and stack are competing with each other by attaining high classification accuracy compared to others. However, SVM is ranked on the top by obtaining an accuracy of 97.7% with classification errors of 0.029 false negative (FN) and 0.019 false positive (FP). Therefore, it is noteworthy to mention that SVM is the best classifier and it outperforms even the stack classier.
Performance and Evaluation of Data Mining Techniques in Cancer DiagnosisIOSR Journals
Abstract: We analyze the breast Cancer data available from the WBC, WDBC from UCI machine learning with
the aim of developing accurate prediction models for breast cancer using data mining techniques. Data mining
has, for good reason, recently attracted a lot of attention, it is a new Technology, tackling new problem, with
great potential for valuable commercial and scientific discoveries. The experiments are conducted in WEKA.
Several data mining classification techniques were used on the proposed data. There are many classification
techniques in data mining such as Decision Tree, Rules NNge, Tree random forest, Random Tree, lazy IBK. The
aim of this paper is to investigate the performance of different classification techniques. The data breast cancer
data with a total 286 rows and 10 columns will be used to test and justify the different between the classification
methods and algorithm.
Keywords - Machine learning, data mining Weka, classification, breast cancer
Breast Cancer Diagnostics with Bayesian NetworksBayesia USA
The Wisconsin Breast Cancer Database (WBCD) is a widely studied (and publicly available) data set from the field of breast cancer diagnostics. The creators of this database, Wolberg, Street, Heisey and Managasarian, made an important contribution with their research towards automating diagnostics with image processing and machine learning.
Beyond the medical field, many statisticians and computer scientists have proposed a wide range of classification models based on WBCD. Such new methods have continuously raised the benchmark in terms of diagnostic performance.
Our white paper now reevaluates the Wisconsin Breast Cancer Database within the framework of Bayesian networks, which, to our knowledge, has not been done before. We demonstrate how the BayesiaLab software can extremely quickly — and simply — create a Bayesian network model that is on par performance-wise with virtually all existing models that have been developed from WBCD over the last 15 years.
An approach of cervical cancer diagnosis using class weighting and oversampli...TELKOMNIKA JOURNAL
Globally, cervical cancer caused 604,127 new cases and 341,831 deaths in 2020, according to the global cancer observatory. In addition, the number of cervical cancer patients who have no symptoms has grown recently. Therefore, giving patients early notice of the possibility of cervical cancer is a useful task since it would enable them to have a clear understanding of their health state. The use of artificial intelligence (AI), particularly in machine learning, in this work is continually uncovering cervical cancer. With the help of a logit model and a new deep learning technique, we hope to identify cervical cancer using patient-provided data. For better outcomes, we employ Keras deep learning and its technique, which includes class weighting and oversampling. In comparison to the actual diagnostic result, the experimental result with model accuracy is 94.18%, and it also demonstrates a successful logit model cervical cancer prediction.
Square transposition: an approach to the transposition process in block cipherjournalBEEI
The transposition process is needed in cryptography to create a diffusion effect on data encryption standard (DES) and advanced encryption standard (AES) algorithms as standard information security algorithms by the National Institute of Standards and Technology. The problem with DES and AES algorithms is that their transposition index values form patterns and do not form random values. This condition will certainly make it easier for a cryptanalyst to look for a relationship between ciphertexts because some processes are predictable. This research designs a transposition algorithm called square transposition. Each process uses square 8 × 8 as a place to insert and retrieve 64-bits. The determination of the pairing of the input scheme and the retrieval scheme that have unequal flow is an important factor in producing a good transposition. The square transposition can generate random and non-pattern indices so that transposition can be done better than DES and AES.
Hyper-parameter optimization of convolutional neural network based on particl...journalBEEI
Deep neural networks have accomplished enormous progress in tackling many problems. More specifically, convolutional neural network (CNN) is a category of deep networks that have been a dominant technique in computer vision tasks. Despite that these deep neural networks are highly effective; the ideal structure is still an issue that needs a lot of investigation. Deep Convolutional Neural Network model is usually designed manually by trials and repeated tests which enormously constrain its application. Many hyper-parameters of the CNN can affect the model performance. These parameters are depth of the network, numbers of convolutional layers, and numbers of kernels with their sizes. Therefore, it may be a huge challenge to design an appropriate CNN model that uses optimized hyper-parameters and reduces the reliance on manual involvement and domain expertise. In this paper, a design architecture method for CNNs is proposed by utilization of particle swarm optimization (PSO) algorithm to learn the optimal CNN hyper-parameters values. In the experiment, we used Modified National Institute of Standards and Technology (MNIST) database of handwritten digit recognition. The experiments showed that our proposed approach can find an architecture that is competitive to the state-of-the-art models with a testing error of 0.87%.
Supervised machine learning based liver disease prediction approach with LASS...journalBEEI
In this contemporary era, the uses of machine learning techniques are increasing rapidly in the field of medical science for detecting various diseases such as liver disease (LD). Around the globe, a large number of people die because of this deadly disease. By diagnosing the disease in a primary stage, early treatment can be helpful to cure the patient. In this research paper, a method is proposed to diagnose the LD using supervised machine learning classification algorithms, namely logistic regression, decision tree, random forest, AdaBoost, KNN, linear discriminant analysis, gradient boosting and support vector machine (SVM). We also deployed a least absolute shrinkage and selection operator (LASSO) feature selection technique on our taken dataset to suggest the most highly correlated attributes of LD. The predictions with 10 fold cross-validation (CV) made by the algorithms are tested in terms of accuracy, sensitivity, precision and f1-score values to forecast the disease. It is observed that the decision tree algorithm has the best performance score where accuracy, precision, sensitivity and f1-score values are 94.295%, 92%, 99% and 96% respectively with the inclusion of LASSO. Furthermore, a comparison with recent studies is shown to prove the significance of the proposed system.
A secure and energy saving protocol for wireless sensor networksjournalBEEI
The research domain for wireless sensor networks (WSN) has been extensively conducted due to innovative technologies and research directions that have come up addressing the usability of WSN under various schemes. This domain permits dependable tracking of a diversity of environments for both military and civil applications. The key management mechanism is a primary protocol for keeping the privacy and confidentiality of the data transmitted among different sensor nodes in WSNs. Since node's size is small; they are intrinsically limited by inadequate resources such as battery life-time and memory capacity. The proposed secure and energy saving protocol (SESP) for wireless sensor networks) has a significant impact on the overall network life-time and energy dissipation. To encrypt sent messsages, the SESP uses the public-key cryptography’s concept. It depends on sensor nodes' identities (IDs) to prevent the messages repeated; making security goals- authentication, confidentiality, integrity, availability, and freshness to be achieved. Finally, simulation results show that the proposed approach produced better energy consumption and network life-time compared to LEACH protocol; sensors are dead after 900 rounds in the proposed SESP protocol. While, in the low-energy adaptive clustering hierarchy (LEACH) scheme, the sensors are dead after 750 rounds.
Plant leaf identification system using convolutional neural networkjournalBEEI
This paper proposes a leaf identification system using convolutional neural network (CNN). This proposed system can identify five types of local Malaysia leaf which were acacia, papaya, cherry, mango and rambutan. By using CNN from deep learning, the network is trained from the database that acquired from leaf images captured by mobile phone for image classification. ResNet-50 was the architecture has been used for neural networks image classification and training the network for leaf identification. The recognition of photographs leaves requested several numbers of steps, starting with image pre-processing, feature extraction, plant identification, matching and testing, and finally extracting the results achieved in MATLAB. Testing sets of the system consists of 3 types of images which were white background, and noise added and random background images. Finally, interfaces for the leaf identification system have developed as the end software product using MATLAB app designer. As a result, the accuracy achieved for each training sets on five leaf classes are recorded above 98%, thus recognition process was successfully implemented.
Customized moodle-based learning management system for socially disadvantaged...journalBEEI
This study aims to develop Moodle-based LMS with customized learning content and modified user interface to facilitate pedagogical processes during covid-19 pandemic and investigate how teachers of socially disadvantaged schools perceived usability and technology acceptance. Co-design process was conducted with two activities: 1) need assessment phase using an online survey and interview session with the teachers and 2) the development phase of the LMS. The system was evaluated by 30 teachers from socially disadvantaged schools for relevance to their distance learning activities. We employed computer software usability questionnaire (CSUQ) to measure perceived usability and the technology acceptance model (TAM) with insertion of 3 original variables (i.e., perceived usefulness, perceived ease of use, and intention to use) and 5 external variables (i.e., attitude toward the system, perceived interaction, self-efficacy, user interface design, and course design). The average CSUQ rating exceeded 5.0 of 7 point-scale, indicated that teachers agreed that the information quality, interaction quality, and user interface quality were clear and easy to understand. TAM results concluded that the LMS design was judged to be usable, interactive, and well-developed. Teachers reported an effective user interface that allows effective teaching operations and lead to the system adoption in immediate time.
Understanding the role of individual learner in adaptive and personalized e-l...journalBEEI
Dynamic learning environment has emerged as a powerful platform in a modern e-learning system. The learning situation that constantly changing has forced the learning platform to adapt and personalize its learning resources for students. Evidence suggested that adaptation and personalization of e-learning systems (APLS) can be achieved by utilizing learner modeling, domain modeling, and instructional modeling. In the literature of APLS, questions have been raised about the role of individual characteristics that are relevant for adaptation. With several options, a new problem has been raised where the attributes of students in APLS often overlap and are not related between studies. Therefore, this study proposed a list of learner model attributes in dynamic learning to support adaptation and personalization. The study was conducted by exploring concepts from the literature selected based on the best criteria. Then, we described the results of important concepts in student modeling and provided definitions and examples of data values that researchers have used. Besides, we also discussed the implementation of the selected learner model in providing adaptation in dynamic learning.
Prototype mobile contactless transaction system in traditional markets to sup...journalBEEI
One way to prevent and reduce the spread of the covid-19 pandemic is through physical distancing program. This research aims to develop a prototype contactless transaction system using digital payment mechanisms and QR code technology that will be applied in traditional markets. The method used in the development of electronic market systems is a prototype approach. The application of QR code and digital payments are used as a solution to minimize money exchange contacts that are common in traditional markets. The results showed that the system built was able to accelerate and facilitate the buying and selling transaction process in traditional market environment. Alpha testing shows that all functional systems are running well. Meanwhile, beta testing shows that the user can very well accept the system that was built. The results of the study also show acceptance of the usefulness of the system being built, as well as the optimism of its users to be able to take advantage of this system both technologically and functionally, so its can be a part of the digital transformation of the traditional market to the electronic market and has become one of the solutions in reducing the spread of the current covid-19 pandemic.
Wireless HART stack using multiprocessor technique with laxity algorithmjournalBEEI
The use of a real-time operating system is required for the demarcation of industrial wireless sensor network (IWSN) stacks (RTOS). In the industrial world, a vast number of sensors are utilised to gather various types of data. The data gathered by the sensors cannot be prioritised ahead of time. Because all of the information is equally essential. As a result, a protocol stack is employed to guarantee that data is acquired and processed fairly. In IWSN, the protocol stack is implemented using RTOS. The data collected from IWSN sensor nodes is processed using non-preemptive scheduling and the protocol stack, and then sent in parallel to the IWSN's central controller. The real-time operating system (RTOS) is a process that occurs between hardware and software. Packets must be sent at a certain time. It's possible that some packets may collide during transmission. We're going to undertake this project to get around this collision. As a prototype, this project is divided into two parts. The first uses RTOS and the LPC2148 as a master node, while the second serves as a standard data collection node to which sensors are attached. Any controller may be used in the second part, depending on the situation. Wireless HART allows two nodes to communicate with each other.
Implementation of double-layer loaded on octagon microstrip yagi antennajournalBEEI
A double-layer loaded on the octagon microstrip yagi antenna (OMYA) at 5.8 GHz industrial, scientific and medical (ISM) Band is investigated in this paper. The double-layer consist of two double positive (DPS) substrates. The OMYA is overlaid with a double-layer configuration were simulated, fabricated and measured. A good agreement was observed between the computed and measured results of the gain for this antenna. According to comparison results, it shows that 2.5 dB improvement of the OMYA gain can be obtained by applying the double-layer on the top of the OMYA. Meanwhile, the bandwidth of the measured OMYA with the double-layer is 14.6%. It indicates that the double-layer can be used to increase the OMYA performance in term of gain and bandwidth.
The calculation of the field of an antenna located near the human headjournalBEEI
In this work, a numerical calculation was carried out in one of the universal programs for automatic electro-dynamic design. The calculation is aimed at obtaining numerical values for specific absorbed power (SAR). It is the SAR value that can be used to determine the effect of the antenna of a wireless device on biological objects; the dipole parameters will be selected for GSM1800. Investigation of the influence of distance to a cell phone on radiation shows that absorbed in the head of a person the effect of electromagnetic radiation on the brain decreases by three times this is a very important result the SAR value has decreased by almost three times it is acceptable results.
Exact secure outage probability performance of uplinkdownlink multiple access...journalBEEI
In this paper, we study uplink-downlink non-orthogonal multiple access (NOMA) systems by considering the secure performance at the physical layer. In the considered system model, the base station acts a relay to allow two users at the left side communicate with two users at the right side. By considering imperfect channel state information (CSI), the secure performance need be studied since an eavesdropper wants to overhear signals processed at the downlink. To provide secure performance metric, we derive exact expressions of secrecy outage probability (SOP) and and evaluating the impacts of main parameters on SOP metric. The important finding is that we can achieve the higher secrecy performance at high signal to noise ratio (SNR). Moreover, the numerical results demonstrate that the SOP tends to a constant at high SNR. Finally, our results show that the power allocation factors, target rates are main factors affecting to the secrecy performance of considered uplink-downlink NOMA systems.
Design of a dual-band antenna for energy harvesting applicationjournalBEEI
This report presents an investigation on how to improve the current dual-band antenna to enhance the better result of the antenna parameters for energy harvesting application. Besides that, to develop a new design and validate the antenna frequencies that will operate at 2.4 GHz and 5.4 GHz. At 5.4 GHz, more data can be transmitted compare to 2.4 GHz. However, 2.4 GHz has long distance of radiation, so it can be used when far away from the antenna module compare to 5 GHz that has short distance in radiation. The development of this project includes the scope of designing and testing of antenna using computer simulation technology (CST) 2018 software and vector network analyzer (VNA) equipment. In the process of designing, fundamental parameters of antenna are being measured and validated, in purpose to identify the better antenna performance.
Transforming data-centric eXtensible markup language into relational database...journalBEEI
eXtensible markup language (XML) appeared internationally as the format for data representation over the web. Yet, most organizations are still utilising relational databases as their database solutions. As such, it is crucial to provide seamless integration via effective transformation between these database infrastructures. In this paper, we propose XML-REG to bridge these two technologies based on node-based and path-based approaches. The node-based approach is good to annotate each positional node uniquely, while the path-based approach provides summarised path information to join the nodes. On top of that, a new range labelling is also proposed to annotate nodes uniquely by ensuring the structural relationships are maintained between nodes. If a new node is to be added to the document, re-labelling is not required as the new label will be assigned to the node via the new proposed labelling scheme. Experimental evaluations indicated that the performance of XML-REG exceeded XMap, XRecursive, XAncestor and Mini-XML concerning storing time, query retrieval time and scalability. This research produces a core framework for XML to relational databases (RDB) mapping, which could be adopted in various industries.
Key performance requirement of future next wireless networks (6G)journalBEEI
Given the massive potentials of 5G communication networks and their foreseeable evolution, what should there be in 6G that is not in 5G or its long-term evolution? 6G communication networks are estimated to integrate the terrestrial, aerial, and maritime communications into a forceful network which would be faster, more reliable, and can support a massive number of devices with ultra-low latency requirements. This article presents a complete overview of potential 6G communication networks. The major contribution of this study is to present a broad overview of key performance indicators (KPIs) of 6G networks that cover the latest manufacturing progress in the environment of the principal areas of research application, and challenges.
Noise resistance territorial intensity-based optical flow using inverse confi...journalBEEI
This paper presents the use of the inverse confidential technique on bilateral function with the territorial intensity-based optical flow to prove the effectiveness in noise resistance environment. In general, the image’s motion vector is coded by the technique called optical flow where the sequences of the image are used to determine the motion vector. But, the accuracy rate of the motion vector is reduced when the source of image sequences is interfered by noises. This work proved that the inverse confidential technique on bilateral function can increase the percentage of accuracy in the motion vector determination by the territorial intensity-based optical flow under the noisy environment. We performed the testing with several kinds of non-Gaussian noises at several patterns of standard image sequences by analyzing the result of the motion vector in a form of the error vector magnitude (EVM) and compared it with several noise resistance techniques in territorial intensity-based optical flow method.
Modeling climate phenomenon with software grids analysis and display system i...journalBEEI
This study aims to model climate change based on rainfall, air temperature, pressure, humidity and wind with grADS software and create a global warming module. This research uses 3D model, define, design, and develop. The results of the modeling of the five climate elements consist of the annual average temperature in Indonesia in 2009-2015 which is between 29oC to 30.1oC, the horizontal distribution of the annual average pressure in Indonesia in 2009-2018 is between 800 mBar to 1000 mBar, the horizontal distribution the average annual humidity in Indonesia in 2009 and 2011 ranged between 27-57, in 2012-2015, 2017 and 2018 it ranged between 30-60, during the East Monsoon, the wind circulation moved from northern Indonesia to the southern region Indonesia. During the west monsoon, the wind circulation moves from the southern part of Indonesia to the northern part of Indonesia. The global warming module for SMA/MA produced is feasible to use, this is in accordance with the value given by the validate of 69 which is in the appropriate category and the response of teachers and students through a 91% questionnaire.
An approach of re-organizing input dataset to enhance the quality of emotion ...journalBEEI
The purpose of this paper is to propose an approach of re-organizing input data to recognize emotion based on short signal segments and increase the quality of emotional recognition using physiological signals. MIT's long physiological signal set was divided into two new datasets, with shorter and overlapped segments. Three different classification methods (support vector machine, random forest, and multilayer perceptron) were implemented to identify eight emotional states based on statistical features of each segment in these two datasets. By re-organizing the input dataset, the quality of recognition results was enhanced. The random forest shows the best classification result among three implemented classification methods, with an accuracy of 97.72% for eight emotional states, on the overlapped dataset. This approach shows that, by re-organizing the input dataset, the high accuracy of recognition results can be achieved without the use of EEG and ECG signals.
Parking detection system using background subtraction and HSV color segmentationjournalBEEI
Manual system vehicle parking makes finding vacant parking lots difficult, so it has to check directly to the vacant space. If many people do parking, then the time needed for it is very much or requires many people to handle it. This research develops a real-time parking system to detect parking. The system is designed using the HSV color segmentation method in determining the background image. In addition, the detection process uses the background subtraction method. Applying these two methods requires image preprocessing using several methods such as grayscaling, blurring (low-pass filter). In addition, it is followed by a thresholding and filtering process to get the best image in the detection process. In the process, there is a determination of the ROI to determine the focus area of the object identified as empty parking. The parking detection process produces the best average accuracy of 95.76%. The minimum threshold value of 255 pixels is 0.4. This value is the best value from 33 test data in several criteria, such as the time of capture, composition and color of the vehicle, the shape of the shadow of the object’s environment, and the intensity of light. This parking detection system can be implemented in real-time to determine the position of an empty place.
Quality of service performances of video and voice transmission in universal ...journalBEEI
The universal mobile telecommunications system (UMTS) has distinct benefits in that it supports a wide range of quality of service (QoS) criteria that users require in order to fulfill their requirements. The transmission of video and audio in real-time applications places a high demand on the cellular network, therefore QoS is a major problem in these applications. The ability to provide QoS in the UMTS backbone network necessitates an active QoS mechanism in order to maintain the necessary level of convenience on UMTS networks. For UMTS networks, investigation models for end-to-end QoS, total transmitted and received data, packet loss, and throughput providing techniques are run and assessed and the simulation results are examined. According to the results, appropriate QoS adaption allows for specific voice and video transmission. Finally, by analyzing existing QoS parameters, the QoS performance of 4G/UMTS networks may be improved.
Cosmetic shop management system project report.pdfKamal Acharya
Buying new cosmetic products is difficult. It can even be scary for those who have sensitive skin and are prone to skin trouble. The information needed to alleviate this problem is on the back of each product, but it's thought to interpret those ingredient lists unless you have a background in chemistry.
Instead of buying and hoping for the best, we can use data science to help us predict which products may be good fits for us. It includes various function programs to do the above mentioned tasks.
Data file handling has been effectively used in the program.
The automated cosmetic shop management system should deal with the automation of general workflow and administration process of the shop. The main processes of the system focus on customer's request where the system is able to search the most appropriate products and deliver it to the customers. It should help the employees to quickly identify the list of cosmetic product that have reached the minimum quantity and also keep a track of expired date for each cosmetic product. It should help the employees to find the rack number in which the product is placed.It is also Faster and more efficient way.
Student information management system project report ii.pdfKamal Acharya
Our project explains about the student management. This project mainly explains the various actions related to student details. This project shows some ease in adding, editing and deleting the student details. It also provides a less time consuming process for viewing, adding, editing and deleting the marks of the students.
About
Indigenized remote control interface card suitable for MAFI system CCR equipment. Compatible for IDM8000 CCR. Backplane mounted serial and TCP/Ethernet communication module for CCR remote access. IDM 8000 CCR remote control on serial and TCP protocol.
• Remote control: Parallel or serial interface.
• Compatible with MAFI CCR system.
• Compatible with IDM8000 CCR.
• Compatible with Backplane mount serial communication.
• Compatible with commercial and Defence aviation CCR system.
• Remote control system for accessing CCR and allied system over serial or TCP.
• Indigenized local Support/presence in India.
• Easy in configuration using DIP switches.
Technical Specifications
Indigenized remote control interface card suitable for MAFI system CCR equipment. Compatible for IDM8000 CCR. Backplane mounted serial and TCP/Ethernet communication module for CCR remote access. IDM 8000 CCR remote control on serial and TCP protocol.
Key Features
Indigenized remote control interface card suitable for MAFI system CCR equipment. Compatible for IDM8000 CCR. Backplane mounted serial and TCP/Ethernet communication module for CCR remote access. IDM 8000 CCR remote control on serial and TCP protocol.
• Remote control: Parallel or serial interface
• Compatible with MAFI CCR system
• Copatiable with IDM8000 CCR
• Compatible with Backplane mount serial communication.
• Compatible with commercial and Defence aviation CCR system.
• Remote control system for accessing CCR and allied system over serial or TCP.
• Indigenized local Support/presence in India.
Application
• Remote control: Parallel or serial interface.
• Compatible with MAFI CCR system.
• Compatible with IDM8000 CCR.
• Compatible with Backplane mount serial communication.
• Compatible with commercial and Defence aviation CCR system.
• Remote control system for accessing CCR and allied system over serial or TCP.
• Indigenized local Support/presence in India.
• Easy in configuration using DIP switches.
Water scarcity is the lack of fresh water resources to meet the standard water demand. There are two type of water scarcity. One is physical. The other is economic water scarcity.
Overview of the fundamental roles in Hydropower generation and the components involved in wider Electrical Engineering.
This paper presents the design and construction of hydroelectric dams from the hydrologist’s survey of the valley before construction, all aspects and involved disciplines, fluid dynamics, structural engineering, generation and mains frequency regulation to the very transmission of power through the network in the United Kingdom.
Author: Robbie Edward Sayers
Collaborators and co editors: Charlie Sims and Connor Healey.
(C) 2024 Robbie E. Sayers
Immunizing Image Classifiers Against Localized Adversary Attacksgerogepatton
This paper addresses the vulnerability of deep learning models, particularly convolutional neural networks
(CNN)s, to adversarial attacks and presents a proactive training technique designed to counter them. We
introduce a novel volumization algorithm, which transforms 2D images into 3D volumetric representations.
When combined with 3D convolution and deep curriculum learning optimization (CLO), itsignificantly improves
the immunity of models against localized universal attacks by up to 40%. We evaluate our proposed approach
using contemporary CNN architectures and the modified Canadian Institute for Advanced Research (CIFAR-10
and CIFAR-100) and ImageNet Large Scale Visual Recognition Challenge (ILSVRC12) datasets, showcasing
accuracy improvements over previous techniques. The results indicate that the combination of the volumetric
input and curriculum learning holds significant promise for mitigating adversarial attacks without necessitating
adversary training.
Quality defects in TMT Bars, Possible causes and Potential Solutions.PrashantGoswami42
Maintaining high-quality standards in the production of TMT bars is crucial for ensuring structural integrity in construction. Addressing common defects through careful monitoring, standardized processes, and advanced technology can significantly improve the quality of TMT bars. Continuous training and adherence to quality control measures will also play a pivotal role in minimizing these defects.
Courier management system project report.pdfKamal Acharya
It is now-a-days very important for the people to send or receive articles like imported furniture, electronic items, gifts, business goods and the like. People depend vastly on different transport systems which mostly use the manual way of receiving and delivering the articles. There is no way to track the articles till they are received and there is no way to let the customer know what happened in transit, once he booked some articles. In such a situation, we need a system which completely computerizes the cargo activities including time to time tracking of the articles sent. This need is fulfilled by Courier Management System software which is online software for the cargo management people that enables them to receive the goods from a source and send them to a required destination and track their status from time to time.
Explore the innovative world of trenchless pipe repair with our comprehensive guide, "The Benefits and Techniques of Trenchless Pipe Repair." This document delves into the modern methods of repairing underground pipes without the need for extensive excavation, highlighting the numerous advantages and the latest techniques used in the industry.
Learn about the cost savings, reduced environmental impact, and minimal disruption associated with trenchless technology. Discover detailed explanations of popular techniques such as pipe bursting, cured-in-place pipe (CIPP) lining, and directional drilling. Understand how these methods can be applied to various types of infrastructure, from residential plumbing to large-scale municipal systems.
Ideal for homeowners, contractors, engineers, and anyone interested in modern plumbing solutions, this guide provides valuable insights into why trenchless pipe repair is becoming the preferred choice for pipe rehabilitation. Stay informed about the latest advancements and best practices in the field.
Automobile Management System Project Report.pdfKamal Acharya
The proposed project is developed to manage the automobile in the automobile dealer company. The main module in this project is login, automobile management, customer management, sales, complaints and reports. The first module is the login. The automobile showroom owner should login to the project for usage. The username and password are verified and if it is correct, next form opens. If the username and password are not correct, it shows the error message.
When a customer search for a automobile, if the automobile is available, they will be taken to a page that shows the details of the automobile including automobile name, automobile ID, quantity, price etc. “Automobile Management System” is useful for maintaining automobiles, customers effectively and hence helps for establishing good relation between customer and automobile organization. It contains various customized modules for effectively maintaining automobiles and stock information accurately and safely.
When the automobile is sold to the customer, stock will be reduced automatically. When a new purchase is made, stock will be increased automatically. While selecting automobiles for sale, the proposed software will automatically check for total number of available stock of that particular item, if the total stock of that particular item is less than 5, software will notify the user to purchase the particular item.
Also when the user tries to sale items which are not in stock, the system will prompt the user that the stock is not enough. Customers of this system can search for a automobile; can purchase a automobile easily by selecting fast. On the other hand the stock of automobiles can be maintained perfectly by the automobile shop manager overcoming the drawbacks of existing system.
2. ISSN: 2302-9285
Bulletin of Electr Eng and Inf, Vol. 8, No. 4, December 2019 : 1303 – 1311
1304
area. Breast cancer on male is rare but born as a male is not an exception to the risk of having breast cancer.
The risk of cancer is about 1 in 1000.
Breast cancer research falls under the category of medical, which as in other fields, use data mining
to analyze past experiences and identify trends and solutions to the present situations [3]. Data mining is
well-known analytical methodology to extract such invaluable information and is especially efficient to work
with large volume of medical data [4]. The methodology varies from predictive models
that enable classification and prediction as well as clustering models that for discovering groups or patterns
from data [5]. Salama et al. [6] compared five different classifiers using three different breast cancer datasets.
The chosen classifiers were Naïve Bayes (NB), Multi-Layer Perception (MLP), Decision Tree (J48),
Instance-based for K-Nearest Neighbor (IBK), and Sequential Minimal Optimization (SMO). Along the line,
Aruna et al. [7] investigated the performance of different classification algorithms similar to [6], which were
Naïve Bayes and Decision Tree J4 as well as new algorithms such as the Support Vector Machines, Radial
Basis Neural Networks, and and simple Classification and Regression Trees (CART). Meanwhile, Bashir et
al. [8] proposed a feature selection component to classification problem while maintaining conventional
classification algorithms such as the Decision Tree, Bayesian algorithms, Rule-based algorithms, Neural
Networks, Support Vector Machines, Associative classification, Distance-based methods, and
Genetic algorithms.
Following previous research, this work used breast cancer datasets obtained from UCI Machine
Learning Repository from [9], which are Breast Cancer [10], Breast Cancer Wisconsin [11], and Breast
Tissue [12]. This paper presents comparative analysis on the performance of three classification algorithms,
which are the Naïve Bayes (NB) [13], Bayesian Networks (BN), and Tree Augmented Naïve Bayes (TAN)
[14] as well as with algorithms from the existing literature such as K-Nearest Neighbor (KNN), Support
Vector Machine (SVM), and Decision Trees (DT) based on breast cancer dataset. Following [15], this paper
will be using accuracy, precision, recall and F-measure for evaluation [16]. These standard measures have
significantly higher correlation with human judgments and an intuitive interpretation. NB classifiers
improved accuracy with high speed [17-19]. The results showed that the NB classifier provides better
performances [20-21]. The accuracy, sensitivity and specificity up to 90% and above [22]. Dian E. R.,
Nurizal D. P. & Machsus Machsus [23] is promising and able to enhance the prediction on Breast Cancer
Wisconsin data.
The rest of the paper is organized as follows. Section 2 describes the research methodology and the
proposed classification approach on dataset for finding the best performance of algorithm. Section 3 shows
the experimental results and finally Section 4 concludes the work and highlights a direction for
future research.
2. RESEARCH METHOD
This section presents a data mining approach for a classification problem. A data mining process
that describe about data mining approach to tackle the problem. This methodology is implement to get the
best result from the classification experiment. In this, research using Knowledge Discovery in Database
(KDD) from [14] as model methodology. There are important phases that have to get the best result at the
research. There are five phase in KDD which are Selection, Preprocessing, Transformation, Classification
and Evaluation. Figure 1 shows the KDD model. Based on Figure 1, KDD model is an iterative process when
evaluation measures can be enhanced. The detail of each phase is discussed in the next sub-sections.
Figure 1. KDD model [14]
3. Bulletin of Electr Eng and Inf ISSN: 2302-9285
Comparative analysis on bayesian classification for breast cancer… (Wan Nor Liyana Wan Hassan Ibeni)
1305
2.1. Dataset selection
This research used three breast cancer dataset was obtained from UCI machine learning repository
from Brown [9], which are breast cancer (WDBC), wisconsin breast cancer (WBC), and breast tissue. The
Breast Cancer Wisconsin dataset contains 699 instances with two classes that are benign and malignant. The
benign class contains 458 instances while the malignant contains 241 instances. Breast cancer contains 10
attributes; which are sample code number, clump thickness, uniformity of cell size, uniformity of cell shape,
marginal adhesion, single epithelial cell size, bare nuclei, bland chromatin, normal nucleoli, and mitoses. The
excerpt of the dataset is shown in Figure 2.
Figure 2. Breast cancer wisconsin (BCW) dataset
The breast cancer dataset contains 286 instances with two classes that are recurrence-events and
no-recurrence-events. The recurrence-events class contains 85 instances while the no-recurrence-events
contains 201 instances. This dataset contains nine attributes, which include age, menopause, tumor-size,
inv-nodes, node caps, deg-malig, breast, breast-quad, and irradiat. The excerpt of the dataset is shown in
Figure 3.
Figure 3. Breast cancer (BC) dataset
The Breast Tissue dataset contains 106 instances with six classes that are Carcinoma (CAR),
Fibro-adenoma (FAD), Mastopathy (MAS), Glandular (GLA), Connective (CON), and Adipose (ADI). The
class Carcinoma (CAR) contains 21 instances, Fibro-adenoma (FAD) contains 15 instances, Mastopathy
(MAS) contains 18 instances, Glandular (GLA) contains 16 instances, Connective (CON) contains 14
instances, and Adipose (ADI) contains 22 instances. Breast Tissue contains nine attributes that include
4. ISSN: 2302-9285
Bulletin of Electr Eng and Inf, Vol. 8, No. 4, December 2019 : 1303 – 1311
1306
impedivity (ohm) at zero frequency, phase angle at 500 KHz, high-frequency slope of phase angle,
impedance distance between spectral ends, area under spectrum, area normalized by DA, maximum of the
spectrum, distance between 10 and real part of the maximum frequency point, and length of the spectral
curve. The excerpt of the dataset is shown in Figure 4.
Figure 4. Breast tissue (BT) dataset
2.2. Preprocessing
The types of the breast cancer wisconsin dataset are categorized as asymmetric. It means they are
class breast cancer dataset represented using into categories benign or malignant. The attributed value of
breast cancer dataset not completely record. That have only one attribute have missing values which are Bare
Nuclei that have 2% of missing value are denoted by “?”. To handle the missing value in machine learning is
one of the important things to get the best accuracy.
In this paper imputation is a way of handling missing value by replacing them with meaningful
replacement values. Since „Bare Nuclei‟ is a categorical attribute, this experiment has use to handling the
missing values of the attribute. Based on the breast cancer dataset given the mode for the attribute „Bare
Nuclei‟ is‟1-10‟ since it has the maximum value which is 10. Figure 5 and Figure 6 show the before and after
dealing the missing value.
Figure 5. Before dealing with missing values Figure 6. After dealing with missing values
2.3. Transformation
The breast cancer dataset is divided into training and testing sets based on 10-fold validation method
where nine parts is for training the algorithm and the last one part for assessing the algorithm. This breast
cancer dataset will be using accuracy estimation because this is effective measure of the performance of a
classifier. Figure 7 shows the procedure of accuracy rate estimation adopted from Mitani and
Hamamoto [24].
5. Bulletin of Electr Eng and Inf ISSN: 2302-9285
Comparative analysis on bayesian classification for breast cancer… (Wan Nor Liyana Wan Hassan Ibeni)
1307
Figure 7. Accuracy rate estimation [24]
2.4. Classification algorithms
In this work, there Bayesian classification algorithms explored, which are the Naïve Bayes (NB),
Bayesian Networks (BN), Tree-Augmented Naïve Bayes (TAN).
Naive Bayes (NB) algorithm is one type of probabilistic classifier based on the Bayes theorem. NB
produces a probability that a given instance belongs to that class rather than prediction. One main advantage
of NB is that it only requires small amount of training data because it is based on major assumption that an
attribute value on a given feature is always independent from the values of other features. This assumption
forms the basis of NB and is widely known as class conditional independence [25].
Bayesian Networks (BN) is a type of directed acyclic graph where the nodes represent domain
variables and are connected with arcs that represent dependencies between the variables. BN is composed of
the network structure and its conditional probabilities. Meanwhile, the Tree Augmented Naïve Bayes (TAN)
incorporates some dependencies between the attributes by building a directed tree among the attribute
variables. This means the n attributes will form a directed tree that represents the dependency relations
between the attributes. This learning algorithm creates a TAN graph structure whereby a single class variable
has no parents while other variables have the class as a parent or at most one other attribute as a parent.
Next, the results from Bayesian classification algorithms will be compared with K-Nearest Neighbor
(KNN) and Support Vector Machine (SVM) from the literature. K-Nearest Neighbor Algorithm (KNN) is an
algorithm to classify the object based on the learning data that is closest to the object. There has two basic
k-Nearest Neighbor Classification algorithm to be considered, (1) finding the k training instances that are
closest to the unseen instances or (2) taking the most commonly accruing classification for these k instances.
Using k is to reduce the effect of the presence of point noise. A number of neighbor making a decision is
considered better than a single neighbor making a decision.
Support Vector Machine (SVM) is a classification technique used in the case of classification. SVM
scans for the best hyper plane which serves as a separator of two classes in the input space, where the input
data serve as a vector in an n-dimensional space. The hyper plane separation sets as margin to be as big as
possible between both sets of data. The margin is calculated by constructing two parallel hyper-planes that
separate between the two sets of data. In an obvious case, a good separation can be accomplished with a
hyper-plane which has the largest distances to neighboring data points in both classes. This means the larger
the low margin of error, the more generalized a classifier could be.
2.5. Evaluation metrics
The evaluation metrics used in the experiments are accuracy, precision, recall and F-measure. The
equations respectively, where TP is the number of true positive, TN is the number of true negatives, FP is the
number of false positive and FN is the number of false negatives.
a. Accuracy
Accuracy is total number of samples correctly classified to the total number of samples classified.
The formula for calculating accuracy is shown in (1).
6. ISSN: 2302-9285
Bulletin of Electr Eng and Inf, Vol. 8, No. 4, December 2019 : 1303 – 1311
1308
(1)
b. Precision
Precision the number of samples is categorized positively classed correctly divided by total samples
are classified as positive samples. The formula for calculating precision is shown in (2).
(2)
c. Recall
Recall is the number of samples is classified as positive divided by the total sample in the testing set
positive category. The formula for calculating recall is shown in (3).
(3)
d. F-Measure
F-Measure. F-Measure is the weighted average of Precision and Recall. Therefore, this score takes
both false positives and false negatives into account. The formula for calculating F-Measure score is shown
in (4).
(4)
3. RESULTS AND DISCUSSION
The performance result for the breast cancer dataset, which by three algorithms is naïve bayes (NB),
bayesian networks (BN), and tree augmented naïve bayes (TAN) has be implement by using Waikato
Environment for knowledge analysis (WEKA) software. The performance result can trace the breast cancer
benign or malignant is correctly or incorrectly classified. The accuracy of an algorithm can be obtained by
comparing the accuracy of result related work and this using new algorithm in this research. Table 1 shows
the accuracy and precision percentage for breast cancer dataset in different training and testing environment.
Table 1. Accuracy and precision for breast cancer dataset
Algorithm Accuracy
(%)
Precision
(%)
Recall
(%)
F-measure (%)
KNN 94.992 96.943 95.483 96.207
SVM 96.852 97.161 98.017 98.591
DT(J48) 94.992 95.633 96.688 96.157
BAN 97.281 96.506 99.325 97.895
NB 95.994 95.196 98.642 96.888
TAN 96.280 95.851 98.430 97.123
The results showed that BN algorithm has the higher classification accuracy based on breast cancer
dataset. BN has the highest value of accuracy, which is 97.281% while NB is only achieved 95.994% and
TAN achieved 96.280%. The comparison the result of the related paper using other algorithm which is KNN
only achieved 94.992%, SVM 96.852% and DT 94.992% of accuracy. This proven the BN algorithm is the
best classification using breast cancer dataset.
Precision is the predictive value for a class label of whether positive or negative depending on the
class it is calculated for. This is essentially the predictive power of the classification algorithm. Figure 8
shows precision of BN is 96.506%, higher than NB 95.994% and TAN 95.851%.
Next, recall is the fraction of relevant instances that are retrieved. A high recall value indicates that
the classification algorithm is able to return most of the relevant results. Figure 9 shows BAN is higher recall,
which is 99.325% as compared to the other algorithm NB with 98.642% and TAN with 98.430%.
7. Bulletin of Electr Eng and Inf ISSN: 2302-9285
Comparative analysis on bayesian classification for breast cancer… (Wan Nor Liyana Wan Hassan Ibeni)
1309
Figure 8. Comparison of precision
Figure 9. Comparison of recall
Finally, F-measure is a measure of accuracy that considers both the precision and the recall rate of
the test and computes a composite score. A good score favors the algorithms with higher sensitivity and
challenges algorithms with higher specificity. Based on the considerations, we can conclude that BN is
preferable to NB and TAN. The F-measure for BAN is higher with 99.895% while NB 96.888% and TAN
97.123%. The results are shown in Figure 10.
To conclude the experiments, Table 2 shows the comparisons of the proposed Bayesian approach
across all three datasets, which are breast cancer, breast cancer wisconsin, and breast tissue dataset and the
result for breast cancer datasets with three classification algorithms.
Based on the table, BN algorithm has higher results than NB and TAN algorithm with the Breast
Cancer and Breast Cancer Wisconsin dataset while than the second higher in the Breast Tissue dataset it
because the attribute Breast tissue more to specification to tissue and that have many classes depend Breast
Cancer and Breast Cancer Wisconsin have only two classes.
8. ISSN: 2302-9285
Bulletin of Electr Eng and Inf, Vol. 8, No. 4, December 2019 : 1303 – 1311
1310
Figure 10. Comparison of F-measure
Table 2. Comparison across different datasets
Algorithm Naïve Bayes (NB) Bayesian Networks (BN) Tree Augmented Naïve Bayes (TAN)
Breast Cancer Wisconsin 72.028% 72.377% 67.832%
Breast Cancer 95.994% 97.281% 96.280%
Breast Tissue 70.754% 66.037% 62.264%
4. CONCLUSION
Classification is an important technique of the data mining with applications in various fields. This
paper presented a comparative experiment on different techniques evaluated on the breast cancer dataset. Six
classifiers were compared based on accuracy to select the best result to be used in the classification task. The
best result between three algorithm chose in this paper showed Bayesian Networks (BN) classification
because has the highest accuracy which is 97.281%. Next, this paper also compared the performance of
Bayesian algorithms based on different datasets, which are the Breast Cancer Wisconsin and Breast Tissue
dataset to prove that BN algorithm has the best accuracy as compared to NB and TAN algorithms. In the
present study few issue like high dimensionality, scalability and accuracy are to be considered for further
research along with other algorithms not currently available in WEKA environment.
ACKNOWLEDGEMENTS
This research is supported by Universiti Tun Hussein Onn Malaysia.
REFERENCES
[1] Malaysia Study on Cancer Survival (2018), available online
http://www.moh.gov.my/resources/index/Penerbitan/Laporan/Malaysian_Study_on_Cancer_Survival_MySCan_20
18.pdf
[2] Cancer Research Malaysia, 2016, available online http://www.cancerresearch.my/our-work/cancers
[3] Fernández-Llatas, C., & García-Gómez, J. M. (Eds.), Data mining in clinical medicine, 2015, Humana Press.
[4] Chen, J. H., Podchiyska, T., & Altman, R. B, "OrderRex: clinical order decision support and outcome predictions
by data-mining electronic medical records," Journal of the American Medical Informatics Association, vol. 23,
no. 2, pp. 339-348, 2015.
[5] Dua, S., & Du, X., “Data mining and machine learning in cybersecurity”. Auerbach Publications, 2016.
[6] Gouda I. Salama, M. B. Abdelhalim, Magdy Abd-elghany Zeid, "Breast cancer diagnosis on three different datasets
using multi-classifiers," International Journal of Computer and Information Technology, vol. 01, no. 01, pp. 36-43,
2012.
[7] Aruna, S., Rajagopalan, S. P., & Nandakishore, L. V., "An empirical comparison of supervised learning algorithms
in disease detection," International Journal of Information Technology Convergence and Services–IJITCS, vol. 1,
no. 4, 81-92, 2011.
[8] Bashir, S., Qamar, U., Khan, F. H., "IntelliHealth: a medical decision support application using a novel weighted
multi-layer classifier ensemble framework," Journal of biomedical informatics, 59, pp. 185-200, 2016
[9] Brown, G. (2004). Diversity in neural network ensembles (Doctoral dissertation, University of Birmingham).
9. Bulletin of Electr Eng and Inf ISSN: 2302-9285
Comparative analysis on bayesian classification for breast cancer… (Wan Nor Liyana Wan Hassan Ibeni)
1311
[10] Breast Cancer, Symptoms, Diagnosis, Types, and More, 2017, available online
https://www.breastcancer.org/symptoms
[11] William H. Wolberg, W. Nick Street, Olvi L. Mangasarian, "Breast Cancer Wisconsin (Diagnostic) Data Set,"
2017, available online https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)
[12] Breast tumors (2017), available online https://www.nationalbreastcancer.org/breast-tumors
[13] Megha, R., & Arun, K. S., "Breast Cancer Prediction using Naïve Bayes Classifier," International Journal of
Information Technology & Systems, vol. 1; no. 2, July-Dec. 2012.
[14] Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P., "From data mining to knowledge discovery in databases" AI
magazine, vol. 17, no. 3, 37-54, pp. 1996.
[15] Melamed, I. D., Green, R., & Turian, J. P., "Precision and recall of machine translation," In Companion Volume of
the Proceedings of HLT-NAACL 2003-Short Papers, vol. 2, pp. 61-63, 2003.
[16] Bazila, B., & Ponniah, T., "Comparison of Bayes Classifiers for Breast Cancer Classification," Asian Pac J Cancer
Prev., vol. 19, no. 10, pp. 2917–2920, 2018.
[17] Shweta, K., Shika, A., Sunita, S., "Naive Bayes Classifiers: A Probabilistic Detection Model for Breast Cancer,"
International Journal of Computer Applications, vol. 92, no. 10, pp. 26-31, April 2014.
[18] Diana, D., "Prediction of recurrent events in breast cancer using the Naive Bayesian classification," Annals of
University of Craiova, Math. Comp. Sci. Ser, vol. 36, no. 2, pp. 92-96, 2009.
[19] Rodríguez-López V., Cruz-Barbosa R., "On the Breast Mass Diagnosis Using Bayesian Networks. MICAI 2014,
Part II, LNAI 8857," Springer International Publishing Switzerland, vol 8857, pp. 474–485, 2014
[20] Khatija, A. & Shajun, N., "Breast Cancer Data Classification Using SVM and Naïve Bayes Techniques,"
International Journal of Innovative Research in Computer and Communication Engineering, vol. 4, no. 12,
pp. 21167- 21175, 2016.
[21] Souad, D., "Mining Knowledge of the Patient Record: The Bayesian Classification to Predict and Detect Anomalies
in Breast Cancer," The Electronic Journal of Knowledge Management, vol. 14, no. 3 pp128-139, 2016.
[22] Maysanjaya, I. M. D., Pradnyana, I. M. A. & Putrama, I. M., "Classification of breast cancer using Wrapper and
Naïve Bayes algorithms," International Conference on Mathematics and Natural Sciences. Journal of Physics:
Conf. Series 1040, 2017.
[23] Dian E. R., Nurizal D. P. & Machsus Machsus, "A Modified K-Means with Naïve Bayes (KMNB) Algorithm for
Breast Cancer Classification," Journal of Telecommunication, Electronic and Computer Engineering, vol. 10,
no.1-6, pp. 137-140, 2018
[24] Mitani, Y., & Hamamoto, Y., "A local mean-based nonparametric classifier," Pattern Recognition Letters, vol. 27,
no. 10, pp. 1151-1159, 2006.
[25] Han, J., Pei, J., & Kamber, M., "Data mining: concepts and techniques", Elsevier, 2011.