The diagnosed cases of Breast cancer is increasing annually and unfortunately getting converted into a
high mortality rate. Cancer, at the early stages, is hard to detect because the malicious cells show similar
properties (density) as shown by the non-malicious cells. The mortality ratio could have been minimized if
the breast cancer could have been detected in its early stages. But the current systems have not been able
to achieve a fully automatic system which is not just capable of detecting the breast cancer but also can
detect the stage of it. Estimation of malignancy grading is important in diagnosing the degree of growth of
malicious cells as well as in selecting a proper therapy for the patient. Therefore, a complete and efficient
clinical decision support system is proposed which is capable of achieving breast cancer malignancy
grading scheme very efficiently. The system is based on Image processing and machine learning domains.
Classification Imbalance problem, a machine learning problem, occurs when instances of one class is
much higher than the instances of the other class resulting in an inefficient classification of samples and
hence a bad decision support system. Therefore EUSBoost, ensemble based classifier is proposed which is
efficient and is able to outperform other classifiers as it takes the benefits of both-boosting algorithm with
Random Undersampling techniques. Also comparison of EUSBoost with other techniques is shown in the
paper.
IRJET - Classifying Breast Cancer Tumour Type using Convolution Neural Netwo...IRJET Journal
This document presents a study that uses a convolutional neural network (CNN) deep learning model to classify breast cancer tumors as benign or malignant based on ultrasonic images. The researchers trained a CNN model using a dataset of ultrasonic breast images labeled as benign or malignant. The trained model can then analyze new ultrasonic images and determine the tumor type, which could help doctors diagnose and treat breast cancer more accurately. The document provides background on breast cancer and existing diagnosis methods, describes the proposed CNN classification system, and reviews related work applying machine learning to breast cancer analysis.
A Comparative Study on the Methods Used for the Detection of Breast Cancerrahulmonikasharma
Among women in the world, the death caused by the Breast cancer has become the leading role. At an initial stage, the tumor in the breast is hard to detect. Manual attempt have proven to be time consuming and inefficient in many cases. Hence there is a need for efficient methods that diagnoses the cancerous cell without human involvement with high accuracy. Mammography is a special case of CT scan which adopts X-ray method with high resolution film. so that it can detect well the tumors in the breast. This paper describes the comparative study of the various data mining methods on the detection of the breast cancer by using image processing techniques.
Performance and Evaluation of Data Mining Techniques in Cancer DiagnosisIOSR Journals
Abstract: We analyze the breast Cancer data available from the WBC, WDBC from UCI machine learning with
the aim of developing accurate prediction models for breast cancer using data mining techniques. Data mining
has, for good reason, recently attracted a lot of attention, it is a new Technology, tackling new problem, with
great potential for valuable commercial and scientific discoveries. The experiments are conducted in WEKA.
Several data mining classification techniques were used on the proposed data. There are many classification
techniques in data mining such as Decision Tree, Rules NNge, Tree random forest, Random Tree, lazy IBK. The
aim of this paper is to investigate the performance of different classification techniques. The data breast cancer
data with a total 286 rows and 10 columns will be used to test and justify the different between the classification
methods and algorithm.
Keywords - Machine learning, data mining Weka, classification, breast cancer
Machine Learning - Breast Cancer DiagnosisPramod Sharma
Machine learning is helping in making smart decisions faster. In this presentation measurements carried out on FNAC was analysed. The results were validated using 20 percent of the data. The data used for POC is from UCI Repository/
A CLASSIFICATION MODEL ON TUMOR CANCER DISEASE BASED MUTUAL INFORMATION AND F...Kiogyf
A CLASSIFICATION MODEL ON TUMOR CANCER DISEASE BASED MUTUAL INFORMATION AND FIREFLY ALGORITHM
ABSTRACT
Cancer is a globally recognized cause of death. A proper cancer analysis demands the classification of several types of tumor. Investigations into microarray gene expressions seem to be a successful platform for revising genetic diseases. Although the standard machine learning (ML) approaches have been efficient in the realization of significant genes and in the classification of new types of cancer cases, their medical and logical application has faced several drawbacks such as DNA microarray data analysis limitation, which includes an incredible number of features and the relatively small size of an instance. To achieve a reasonable and efficient DNA microarray dataset information, there is a need to extend the level of interpretability and forecast approach while maintaining a great level of precision. In this work, a novel way of cancer classification based on based gene expression profiles is presented. This method is a combination of both Firefly algorithm and Mutual Information Method. First, the features are used to select the features before using the Firefly algorithm for feature reduction. Finally, the Support Vector Machine is used to classify cancer into types. The performance of the proposed system was evaluated by using it to classify datasets from colon cancer; the results of the evaluation were compared with some recent approaches.
Keywords: Feature Selection, Firefly Algorithm, Cancer Disease, Mutual Information
IRJET - Cervical Cancer Prognosis using MARS and ClassificationIRJET Journal
This document describes a study that used data mining techniques like MARS and C5.0 classification to analyze cervical cancer prognosis and recurrence. The study used a dataset of 168 cervical cancer patients with 12 variables from a hospital tumor registry. MARS and C5.0 models were developed using a training set of 118 patients and tested on 50 patients. The C5.0 model had a higher average correct classification rate of 96% compared to 86% for MARS. The C5.0 model also identified the most important prognostic factors as pStage, pT, cell type and RT target Summary. The study demonstrated that data mining can help identify specific risk factors for recurrent cervical cancer.
Breast Cancer Diagnosis using a Hybrid Genetic Algorithm for Feature Selection based on Mutual Information (Abeer Alzubaidi, Georgina Cosma, David Brown and Graham Pockley)
Interactive Technologies and Games (ITAG) Conference 2016
Health, Disability and EducationDates: Wednesday 26 October 2016 - Thursday 27 October 2016 Location: The Council House, NG1 2DT
International Journal of Computational Engineering Research (IJCER) is dedicated to protecting personal information and will make every reasonable effort to handle collected information appropriately. All information collected, as well as related requests, will be handled as carefully and efficiently as possible in accordance with IJCER standards for integrity and objectivity.
IRJET - Classifying Breast Cancer Tumour Type using Convolution Neural Netwo...IRJET Journal
This document presents a study that uses a convolutional neural network (CNN) deep learning model to classify breast cancer tumors as benign or malignant based on ultrasonic images. The researchers trained a CNN model using a dataset of ultrasonic breast images labeled as benign or malignant. The trained model can then analyze new ultrasonic images and determine the tumor type, which could help doctors diagnose and treat breast cancer more accurately. The document provides background on breast cancer and existing diagnosis methods, describes the proposed CNN classification system, and reviews related work applying machine learning to breast cancer analysis.
A Comparative Study on the Methods Used for the Detection of Breast Cancerrahulmonikasharma
Among women in the world, the death caused by the Breast cancer has become the leading role. At an initial stage, the tumor in the breast is hard to detect. Manual attempt have proven to be time consuming and inefficient in many cases. Hence there is a need for efficient methods that diagnoses the cancerous cell without human involvement with high accuracy. Mammography is a special case of CT scan which adopts X-ray method with high resolution film. so that it can detect well the tumors in the breast. This paper describes the comparative study of the various data mining methods on the detection of the breast cancer by using image processing techniques.
Performance and Evaluation of Data Mining Techniques in Cancer DiagnosisIOSR Journals
Abstract: We analyze the breast Cancer data available from the WBC, WDBC from UCI machine learning with
the aim of developing accurate prediction models for breast cancer using data mining techniques. Data mining
has, for good reason, recently attracted a lot of attention, it is a new Technology, tackling new problem, with
great potential for valuable commercial and scientific discoveries. The experiments are conducted in WEKA.
Several data mining classification techniques were used on the proposed data. There are many classification
techniques in data mining such as Decision Tree, Rules NNge, Tree random forest, Random Tree, lazy IBK. The
aim of this paper is to investigate the performance of different classification techniques. The data breast cancer
data with a total 286 rows and 10 columns will be used to test and justify the different between the classification
methods and algorithm.
Keywords - Machine learning, data mining Weka, classification, breast cancer
Machine Learning - Breast Cancer DiagnosisPramod Sharma
Machine learning is helping in making smart decisions faster. In this presentation measurements carried out on FNAC was analysed. The results were validated using 20 percent of the data. The data used for POC is from UCI Repository/
A CLASSIFICATION MODEL ON TUMOR CANCER DISEASE BASED MUTUAL INFORMATION AND F...Kiogyf
A CLASSIFICATION MODEL ON TUMOR CANCER DISEASE BASED MUTUAL INFORMATION AND FIREFLY ALGORITHM
ABSTRACT
Cancer is a globally recognized cause of death. A proper cancer analysis demands the classification of several types of tumor. Investigations into microarray gene expressions seem to be a successful platform for revising genetic diseases. Although the standard machine learning (ML) approaches have been efficient in the realization of significant genes and in the classification of new types of cancer cases, their medical and logical application has faced several drawbacks such as DNA microarray data analysis limitation, which includes an incredible number of features and the relatively small size of an instance. To achieve a reasonable and efficient DNA microarray dataset information, there is a need to extend the level of interpretability and forecast approach while maintaining a great level of precision. In this work, a novel way of cancer classification based on based gene expression profiles is presented. This method is a combination of both Firefly algorithm and Mutual Information Method. First, the features are used to select the features before using the Firefly algorithm for feature reduction. Finally, the Support Vector Machine is used to classify cancer into types. The performance of the proposed system was evaluated by using it to classify datasets from colon cancer; the results of the evaluation were compared with some recent approaches.
Keywords: Feature Selection, Firefly Algorithm, Cancer Disease, Mutual Information
IRJET - Cervical Cancer Prognosis using MARS and ClassificationIRJET Journal
This document describes a study that used data mining techniques like MARS and C5.0 classification to analyze cervical cancer prognosis and recurrence. The study used a dataset of 168 cervical cancer patients with 12 variables from a hospital tumor registry. MARS and C5.0 models were developed using a training set of 118 patients and tested on 50 patients. The C5.0 model had a higher average correct classification rate of 96% compared to 86% for MARS. The C5.0 model also identified the most important prognostic factors as pStage, pT, cell type and RT target Summary. The study demonstrated that data mining can help identify specific risk factors for recurrent cervical cancer.
Breast Cancer Diagnosis using a Hybrid Genetic Algorithm for Feature Selection based on Mutual Information (Abeer Alzubaidi, Georgina Cosma, David Brown and Graham Pockley)
Interactive Technologies and Games (ITAG) Conference 2016
Health, Disability and EducationDates: Wednesday 26 October 2016 - Thursday 27 October 2016 Location: The Council House, NG1 2DT
International Journal of Computational Engineering Research (IJCER) is dedicated to protecting personal information and will make every reasonable effort to handle collected information appropriately. All information collected, as well as related requests, will be handled as carefully and efficiently as possible in accordance with IJCER standards for integrity and objectivity.
A New Approach to the Detection of Mammogram Boundary IJECEIAES
Mammography is a method used for the detection of breast cancer. computer-aided diagnostic (CAD) systems help the radiologist in the detection and interpretation of mass in breast mammography. One of the important information of a mass is its contour and its form because it provides valuable information about the abnormality of a mass. The accuracy in the recognition of the shape of a mass is related to the accuracy of the detected mass contours. In this work we propose a new approach for detecting the boundaries of lesion in mammography images based on region growing algorithm without using the threshold, the proposed method requires an initial rectangle surrounding the lesion selected manually by the radiologist (Region Of Interest), where the region growing algorithm applies on lines segments that attach each pixel of this rectangle with the seed point, such as the ends (seeds) of each line segment grow in a direction towards one another. The proposed approach is evaluated on a set of data with 20 masses of the MIAS base whose contours are annotated manually by expert radiologists. The performance of the method is evaluated in terms of specificity, sensitivity, accuracy and overlap. All the findings and details of approach are presented in detail.
GRADE CATEGORIZATION OF TUMOUR CELLS WITH STANDARD AND REFERENTIAL FRONTIER A...pharmaindexing
This document summarizes a research paper that proposes a new method for classifying brain tumor grades using image processing techniques. The method involves preprocessing MRI images to isolate the tumor region using thresholding and image subtraction. The tumor area is then segmented into four quadrants. Standard points mark the initial tumor location, while growth points registered in later images indicate tumor expansion over time. Comparing growth point changes across patient images at different stages allows calculating the tumor growth rate, aiding pathologists in diagnosis and treatment recommendations.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Week12sampling and feature selection technique to solve imbalanced datasetMusTapha KaMal FaSya
The document describes a study that aimed to improve predictions of breast cancer patient survivability by addressing the issue of imbalanced data. The study used various machine learning techniques including logistic regression, decision trees, SMOTE oversampling, and cost-sensitive learning on a dataset of 215,221 breast cancer patients obtained from the SEER database. Experimental results found that decision tree and logistic regression models combined with SMOTE and cost-sensitive learning had higher predictive performance than the original models. Logistic regression was also found to have better statistical power than decision trees in predicting five-year survivability.
This document discusses a data-driven approach to evaluate guidelines for non-melanoma skin cancers (NMSCs). The approach models clinical guidelines from a hospital using BPMN process models. Patient data from the hospital information system is then analyzed and transformed into event logs to study patient behavior and compliance with the guidelines. When this approach was applied to guidelines for basal cell carcinoma (BCC) and squamous cell carcinoma (SCC), a large percentage of patient events were found to be skipped compared to what is described in the guidelines. This indicates areas where the guidelines could potentially be improved based on actual patient data and experiences.
MEDICAL IMAGING MUTIFRACTAL ANALYSIS IN PREDICTION OF EFFICIENCY OF CANCER TH...csandit
Multifractal analysis of breast tumor tissue prior to chemotherapy can predict chemotherapy response with high accuracy. It was shown to distinguish histological images of different response groups with over 82% accuracy for pathological complete response and progressive/stable disease. The maximum of the multifractal spectrum parameter f(α)max provided the most important predictive value, suggesting it may detect unknown structural clues related to drug resistance. Further investigation of f(α)max could help characterize its predictive potential.
Objective Quality Assessment of Image Enhancement Methods in Digital Mammogra...sipij
Breast cancer is the most common cancer among women worldwide constituting more than 25%
of all cancer incidences occurring in the world [1]. Statistics show that US, India and China
account for more than one third of all breast cancer cases [2]. Also, there has been a steady
increase in the breast cancer incidence among young generation in the world. In India, one out of
two women die after being detected with breast cancer where as in China it is one in four and in
USA it is one in eight [2]. Therefore, the statistics show that cancer mortality is highest in India
among all other nations in the world. In US, though the number of women diagnosed with cancer
is more than that in India, their mortality
Cancer is a disease caused by abnormal cell growth and can affect different cell types. This paper focuses on breast cancer, which is classified based on the type of affected cell. There are several risk factors that can increase the likelihood of developing cancer, including gender, age, genetics, family history, weight, alcohol use, and smoking. The dataset used contains information from 569 breast cancer cases, including demographic and cell feature data. Machine learning algorithms like decision trees can be applied to build models for diagnosing breast cancer based on these attributes with up to 96.46% accuracy.
A novel approach to jointly address localization and classification of breast...IJECEIAES
Localization of the cancerous region as well as classification of the type of the cancer is highly inter-linked with each other. However, investigation towards existing approaches depicts that these problems are always iindividually solved where there is still a big research gap for a generalized solution towards addressing both the problems. Therefore, the proposed manuscript presents a simple, novel, and less-iterative computational model that jointly address the localization-classification problems taking the case study of early diagnosis of breast cancer. The proposed study harnesses the potential of simple bio-inspired optimization technique in order to obtained better local and global best outcome to confirm the accuracy of the outcome. The study outcome of the proposed system exhibits that proposed system offers higher accuracy and lower response time in contrast with other existing classifiers that are freqently witnessed in existing approaches of classification in medical image process.
This document discusses developing predictive models for Acute Myeloid Leukemia (AML) patients using two computational modeling techniques: Artificial Neural Networks (ANN) and Logistic Regression (LR). Protein expression data from 191 AML patients was used to develop ANN and LR models to predict patient remission or resistance. The ANN exhibited high specificity and positive predictive value, indicating potential clinical use. While the LR performed poorly, it provided statistical insights. Integrating ANN and LR may create a more accurate predictive tool for physicians.
Enahncement method to detect breast cancerPunit Karnani
This document summarizes and compares existing methods for enhancing and segmenting mammogram images to aid in the detection of breast abnormalities like masses and calcifications. It categorizes enhancement methods into conventional, region-based, feature-based, and fuzzy techniques. Conventional techniques like histogram equalization are used to enhance masses, while region-based methods enhance features based on surroundings. Feature-based techniques enhance in the wavelet domain and can be used for both masses and calcifications. The document aims to provide a comprehensive review and analysis of mammogram enhancement and segmentation algorithms to highlight the best approaches.
This document provides a comprehensive review and analysis of mammogram enhancement and segmentation techniques. It categorizes mammogram enhancement methods into four groups: conventional, region-based, feature-based, and fuzzy enhancement. Region-based and feature-based techniques can be used to enhance masses, while feature-based and fuzzy methods can also enhance micro-calcifications. The document also categorizes mammogram segmentation into breast region segmentation and region of interest segmentation. Region of interest segmentation using a single view is further divided into unsupervised techniques like region-based, contour-based, and clustering segmentation. The document evaluates and compares various enhancement and segmentation algorithms and highlights the best available approaches.
MINING OF IMPORTANT INFORMATIVE GENES AND CLASSIFIER CONSTRUCTION FOR CANCER ...ijsc
Microarray is a useful technique for measuring expression data of thousands or more of genes
simultaneously. One of challenges in classification of cancer using high-dimensional gene expression data
is to select a minimal number of relevant genes which can maximize classification accuracy. Because of the
distinct characteristics inherent to specific cancerous gene expression profiles, developing flexible and
robust gene identification methods is extremely fundamental. Many gene selection methods as well as their
corresponding classifiers have been proposed. In the proposed method, a single gene with high classdiscrimination
capability is selected and classification rules are generated for cancer based on gene
expression profiles. The method first computes importance factor of each gene of experimental cancer
dataset by counting number of linguistic terms (defined in terms of different discreet quantity) with high
class discrimination capability according to their depended degree of classes. Then initial important genes
are selected according to high importance factor of each gene and form initial reduct. Then traditional kmeans
clustering algorithm is applied on each selected gene of initial reduct and compute missclassification
errors of individual genes. The final reduct is formed by selecting most important genes with
respect to less miss-classification errors. Then a classifier is constructed based on decision rules induced
by selected important genes (single) from training dataset to classify cancerous and non-cancerous samples
of experimental test dataset. The proposed method test on four publicly available cancerous gene
expression test dataset. In most of cases, accurate classifications outcomes are obtained by just using
important (single) genes that are highly correlated with the pathogenesis cancer are identified. Also to
prove the robustness of proposed method compares the outcomes (correctly classified instances) with some
existing well known classifiers.
GRAPHICAL MODEL AND CLUSTERINGREGRESSION BASED METHODS FOR CAUSAL INTERACTION...ijaia
The early detection of Breast Cancer, the deadly disease that mostly affects women is extremely complex because it requires various features of the cell type. Therefore, the efficient approach to diagnosing Breast Cancer at the early stage was to apply artificial intelligence where machines are simulated with intelligence and programmed to think and act like a human. This allows machines to passively learn and find a pattern, which can be used later to detect any new changes that may occur. In general, machine learning is quite useful particularly in the medical field, which depends on complex genomic measurements such as microarray technique and would increase the accuracy and precision of results. With this technology, doctors can easily diagnose patients with cancer quickly and apply the proper treatment in a timely manner. Therefore, the goal of this paper is to address and propose a robust Breast Cancer diagnostic system using complex genomic analysis via microarray technology. The system will combine two machine learning methods, K-means cluster, and linear regression.
Cancer prognosis prediction using balanced stratified samplingijscai
High accuracy in cancer prediction is important to improve the quality of the treatment and to improve the
rate of survivability of patients. As the data volume is increasing rapidly in the healthcare research, the
analytical challenge exists in double. The use of effective sampling technique in classification algorithms
always yields good prediction accuracy. The SEER public use cancer database provides various prominent
class labels for prognosis prediction. The main objective of this paper is to find the effect of sampling
techniques in classifying the prognosis variable and propose an ideal sampling method based on the
outcome of the experimentation. In the first phase of this work the traditional random sampling and
stratified sampling techniques have been used. At the next level the balanced stratified sampling with
variations as per the choice of the prognosis class labels have been tested. Much of the initial time has been
focused on performing the pre-processing of the SEER data set. The classification model for
experimentation has been built using the breast cancer, respiratory cancer and mixed cancer data sets with
three traditional classifiers namely Decision Tree, Naïve Bayes and K-Nearest Neighbour. The three
prognosis factors survival, stage and metastasis have been used as class labels for experimental
comparisons. The results shows a steady increase in the prediction accuracy of balanced stratified model
as the sample size increases, but the traditional approach fluctuates before the optimum results.
IRJET- A Survey on Categorization of Breast Cancer in Histopathological ImagesIRJET Journal
This document summarizes various methods for categorizing breast cancer in histopathological images. It discusses machine learning and image processing techniques that have been used to build computer-aided diagnosis (CAD) systems to help pathologists diagnose breast cancer more objectively and consistently. The document reviews different classification methods that have been proposed, including those using fuzzy logic, level set methods, convolutional neural networks, texture features and ensemble methods. It concludes that accurately classifying histopathological images remains challenging due to limited publicly available datasets and variability in tissue appearance, but that machine learning and advanced image analysis offer promising approaches to improve cancer detection and diagnosis.
Hybrid prediction model with missing value imputation for medical data 2015-g...Jitender Grover
The document presents a novel hybrid prediction model called HPM-MI that uses K-means clustering and multilayer perceptron (MLP) to improve predictive classification for medical data with missing values. The model first analyzes 11 different imputation techniques using K-means clustering to select the best one for filling missing values in the data. It then uses K-means clustering again to validate class labels and remove incorrectly classified instances before applying the MLP classifier. The model is tested on three medical datasets from the UCI repository and shows improved accuracy, sensitivity, specificity and other metrics compared to existing methods, particularly when datasets have large numbers of missing values.
A Review on Data Mining Techniques for Prediction of Breast Cancer RecurrenceDr. Amarjeet Singh
The most common type of cancer in women
worldwide is the Breast Cancer. Breast cancer may be
detected early using Mammograms, probably before it's
spread. Recurrent breast cancer could occur months or years
after initial treatment. The cancer could return within the
same place because the original cancer (local recurrence), or it
may spread to different areas of your body (distant
recurrence). Early stage treatment is done not only to cure
breast cancer however additionally facilitate in preventing its
repetition/recurrence. Data mining algorithms provide
assistance in predicting the early-stage breast cancer that
continually has been difficult analysis drawback. The
projected analysis can establish the most effective algorithm
that predicts the recurrence of the breast cancer and improve
the accuracy the algorithms. Large information like Clump,
Classification, Association Rules, Prediction and Neural
Networks, Decision Trees can be analyzed using data mining
applications and techniques.
Classification AlgorithmBased Analysis of Breast Cancer DataIIRindia
The classification algorithms are very frequently used algorithms for analyzing various kinds of data available in different repositories which have real world applications. The main objective of this research work is to find the performance of classification algorithms in analyzing Breast Cancer data via analyzing the mammogram images based its characteristics.Different attribute values of cancer affected mammogram images are considered for analysis in this work. The Patients food habits, age of the patients, their life styles, occupation, their problem about the diseases and other information are taken into account for classification. Finally, performance of classification algorithms J48, CART and ADTree are given with its accuracy. The accuracy of taken algorithms is measured by various measures like specificity, sensitivity and kappa statistics (Errors).
Breast Tumor Detection Using Efficient Machine Learning and Deep Learning Tec...mlaij
Breast cancer tissues grow when cells in the breast expand and divide uncontrollably, resulting in a lump of tissue commonly called and named tumor. Breast cancer is the second most prevalent cancer among women, following skin cancer. While it is more commonly diagnosed in women aged 50 and above, it can affect individuals of any age. Although it is rare, men can also develop breast cancer, accounting for less than 1% of all cases, with approximately 2,600 cases reported annually in the United States. Early detection of breast tumors is crucial in reducing the risk of developing breast cancer. A publicly available dataset containing features of breast tumors was utilized to identify breast tumors using machine learning and deep learning techniques. Various prediction models were constructed, including logistic regression (LR), decision tree (DT), random forest (RF), support vector machine (SVM), Gradient Boosting (GB), Extreme Gradient Boosting (XGB), Light GBM, and a recurrent neural network (RNN) model. These models were trained to classify and predict breast tumor cases based on the provided features.
BREAST TUMOR DETECTION USING EFFICIENT MACHINE LEARNING AND DEEP LEARNING TEC...mlaij
This document describes a study that uses machine learning and deep learning techniques to detect breast tumors using a publicly available dataset. Various classification models were developed including logistic regression, decision trees, random forests, support vector machines, gradient boosting, extreme gradient boosting, LightGBM, and a recurrent neural network. These models were trained on features from the dataset to classify breast tumor cases as benign or malignant. The models' performance was evaluated using various metrics like accuracy, precision, recall, and F1 score. Ensemble techniques like bagging and boosting were also explored to improve the models' performance at detecting breast cancer.
Breast Tumor Detection Using Efficient Machine Learning and Deep Learning Tec...mlaij
Machine Learning and Applications: An International Journal (MLAIJ) is a quarterly open access peer-reviewed journal that publishes articles which contribute new results in all areas of the machine learning. The journal is devoted to the publication of high quality papers on theoretical and practical aspects of machine learning and applications.The goal of this journal is to bring together researchers and practitioners from academia and industry to focus on machine learning advancements, and establishing new collaborations in these areas. Original research papers, state-of-the-art reviews are invited for publication in all areas of machine learning.
Authors are solicited to contribute to the journal by submitting articles that illustrate research results, projects, surveying works and industrial experiences that describe significant advances in the areas of machine learning.
A New Approach to the Detection of Mammogram Boundary IJECEIAES
Mammography is a method used for the detection of breast cancer. computer-aided diagnostic (CAD) systems help the radiologist in the detection and interpretation of mass in breast mammography. One of the important information of a mass is its contour and its form because it provides valuable information about the abnormality of a mass. The accuracy in the recognition of the shape of a mass is related to the accuracy of the detected mass contours. In this work we propose a new approach for detecting the boundaries of lesion in mammography images based on region growing algorithm without using the threshold, the proposed method requires an initial rectangle surrounding the lesion selected manually by the radiologist (Region Of Interest), where the region growing algorithm applies on lines segments that attach each pixel of this rectangle with the seed point, such as the ends (seeds) of each line segment grow in a direction towards one another. The proposed approach is evaluated on a set of data with 20 masses of the MIAS base whose contours are annotated manually by expert radiologists. The performance of the method is evaluated in terms of specificity, sensitivity, accuracy and overlap. All the findings and details of approach are presented in detail.
GRADE CATEGORIZATION OF TUMOUR CELLS WITH STANDARD AND REFERENTIAL FRONTIER A...pharmaindexing
This document summarizes a research paper that proposes a new method for classifying brain tumor grades using image processing techniques. The method involves preprocessing MRI images to isolate the tumor region using thresholding and image subtraction. The tumor area is then segmented into four quadrants. Standard points mark the initial tumor location, while growth points registered in later images indicate tumor expansion over time. Comparing growth point changes across patient images at different stages allows calculating the tumor growth rate, aiding pathologists in diagnosis and treatment recommendations.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Week12sampling and feature selection technique to solve imbalanced datasetMusTapha KaMal FaSya
The document describes a study that aimed to improve predictions of breast cancer patient survivability by addressing the issue of imbalanced data. The study used various machine learning techniques including logistic regression, decision trees, SMOTE oversampling, and cost-sensitive learning on a dataset of 215,221 breast cancer patients obtained from the SEER database. Experimental results found that decision tree and logistic regression models combined with SMOTE and cost-sensitive learning had higher predictive performance than the original models. Logistic regression was also found to have better statistical power than decision trees in predicting five-year survivability.
This document discusses a data-driven approach to evaluate guidelines for non-melanoma skin cancers (NMSCs). The approach models clinical guidelines from a hospital using BPMN process models. Patient data from the hospital information system is then analyzed and transformed into event logs to study patient behavior and compliance with the guidelines. When this approach was applied to guidelines for basal cell carcinoma (BCC) and squamous cell carcinoma (SCC), a large percentage of patient events were found to be skipped compared to what is described in the guidelines. This indicates areas where the guidelines could potentially be improved based on actual patient data and experiences.
MEDICAL IMAGING MUTIFRACTAL ANALYSIS IN PREDICTION OF EFFICIENCY OF CANCER TH...csandit
Multifractal analysis of breast tumor tissue prior to chemotherapy can predict chemotherapy response with high accuracy. It was shown to distinguish histological images of different response groups with over 82% accuracy for pathological complete response and progressive/stable disease. The maximum of the multifractal spectrum parameter f(α)max provided the most important predictive value, suggesting it may detect unknown structural clues related to drug resistance. Further investigation of f(α)max could help characterize its predictive potential.
Objective Quality Assessment of Image Enhancement Methods in Digital Mammogra...sipij
Breast cancer is the most common cancer among women worldwide constituting more than 25%
of all cancer incidences occurring in the world [1]. Statistics show that US, India and China
account for more than one third of all breast cancer cases [2]. Also, there has been a steady
increase in the breast cancer incidence among young generation in the world. In India, one out of
two women die after being detected with breast cancer where as in China it is one in four and in
USA it is one in eight [2]. Therefore, the statistics show that cancer mortality is highest in India
among all other nations in the world. In US, though the number of women diagnosed with cancer
is more than that in India, their mortality
Cancer is a disease caused by abnormal cell growth and can affect different cell types. This paper focuses on breast cancer, which is classified based on the type of affected cell. There are several risk factors that can increase the likelihood of developing cancer, including gender, age, genetics, family history, weight, alcohol use, and smoking. The dataset used contains information from 569 breast cancer cases, including demographic and cell feature data. Machine learning algorithms like decision trees can be applied to build models for diagnosing breast cancer based on these attributes with up to 96.46% accuracy.
A novel approach to jointly address localization and classification of breast...IJECEIAES
Localization of the cancerous region as well as classification of the type of the cancer is highly inter-linked with each other. However, investigation towards existing approaches depicts that these problems are always iindividually solved where there is still a big research gap for a generalized solution towards addressing both the problems. Therefore, the proposed manuscript presents a simple, novel, and less-iterative computational model that jointly address the localization-classification problems taking the case study of early diagnosis of breast cancer. The proposed study harnesses the potential of simple bio-inspired optimization technique in order to obtained better local and global best outcome to confirm the accuracy of the outcome. The study outcome of the proposed system exhibits that proposed system offers higher accuracy and lower response time in contrast with other existing classifiers that are freqently witnessed in existing approaches of classification in medical image process.
This document discusses developing predictive models for Acute Myeloid Leukemia (AML) patients using two computational modeling techniques: Artificial Neural Networks (ANN) and Logistic Regression (LR). Protein expression data from 191 AML patients was used to develop ANN and LR models to predict patient remission or resistance. The ANN exhibited high specificity and positive predictive value, indicating potential clinical use. While the LR performed poorly, it provided statistical insights. Integrating ANN and LR may create a more accurate predictive tool for physicians.
Enahncement method to detect breast cancerPunit Karnani
This document summarizes and compares existing methods for enhancing and segmenting mammogram images to aid in the detection of breast abnormalities like masses and calcifications. It categorizes enhancement methods into conventional, region-based, feature-based, and fuzzy techniques. Conventional techniques like histogram equalization are used to enhance masses, while region-based methods enhance features based on surroundings. Feature-based techniques enhance in the wavelet domain and can be used for both masses and calcifications. The document aims to provide a comprehensive review and analysis of mammogram enhancement and segmentation algorithms to highlight the best approaches.
This document provides a comprehensive review and analysis of mammogram enhancement and segmentation techniques. It categorizes mammogram enhancement methods into four groups: conventional, region-based, feature-based, and fuzzy enhancement. Region-based and feature-based techniques can be used to enhance masses, while feature-based and fuzzy methods can also enhance micro-calcifications. The document also categorizes mammogram segmentation into breast region segmentation and region of interest segmentation. Region of interest segmentation using a single view is further divided into unsupervised techniques like region-based, contour-based, and clustering segmentation. The document evaluates and compares various enhancement and segmentation algorithms and highlights the best available approaches.
MINING OF IMPORTANT INFORMATIVE GENES AND CLASSIFIER CONSTRUCTION FOR CANCER ...ijsc
Microarray is a useful technique for measuring expression data of thousands or more of genes
simultaneously. One of challenges in classification of cancer using high-dimensional gene expression data
is to select a minimal number of relevant genes which can maximize classification accuracy. Because of the
distinct characteristics inherent to specific cancerous gene expression profiles, developing flexible and
robust gene identification methods is extremely fundamental. Many gene selection methods as well as their
corresponding classifiers have been proposed. In the proposed method, a single gene with high classdiscrimination
capability is selected and classification rules are generated for cancer based on gene
expression profiles. The method first computes importance factor of each gene of experimental cancer
dataset by counting number of linguistic terms (defined in terms of different discreet quantity) with high
class discrimination capability according to their depended degree of classes. Then initial important genes
are selected according to high importance factor of each gene and form initial reduct. Then traditional kmeans
clustering algorithm is applied on each selected gene of initial reduct and compute missclassification
errors of individual genes. The final reduct is formed by selecting most important genes with
respect to less miss-classification errors. Then a classifier is constructed based on decision rules induced
by selected important genes (single) from training dataset to classify cancerous and non-cancerous samples
of experimental test dataset. The proposed method test on four publicly available cancerous gene
expression test dataset. In most of cases, accurate classifications outcomes are obtained by just using
important (single) genes that are highly correlated with the pathogenesis cancer are identified. Also to
prove the robustness of proposed method compares the outcomes (correctly classified instances) with some
existing well known classifiers.
GRAPHICAL MODEL AND CLUSTERINGREGRESSION BASED METHODS FOR CAUSAL INTERACTION...ijaia
The early detection of Breast Cancer, the deadly disease that mostly affects women is extremely complex because it requires various features of the cell type. Therefore, the efficient approach to diagnosing Breast Cancer at the early stage was to apply artificial intelligence where machines are simulated with intelligence and programmed to think and act like a human. This allows machines to passively learn and find a pattern, which can be used later to detect any new changes that may occur. In general, machine learning is quite useful particularly in the medical field, which depends on complex genomic measurements such as microarray technique and would increase the accuracy and precision of results. With this technology, doctors can easily diagnose patients with cancer quickly and apply the proper treatment in a timely manner. Therefore, the goal of this paper is to address and propose a robust Breast Cancer diagnostic system using complex genomic analysis via microarray technology. The system will combine two machine learning methods, K-means cluster, and linear regression.
Cancer prognosis prediction using balanced stratified samplingijscai
High accuracy in cancer prediction is important to improve the quality of the treatment and to improve the
rate of survivability of patients. As the data volume is increasing rapidly in the healthcare research, the
analytical challenge exists in double. The use of effective sampling technique in classification algorithms
always yields good prediction accuracy. The SEER public use cancer database provides various prominent
class labels for prognosis prediction. The main objective of this paper is to find the effect of sampling
techniques in classifying the prognosis variable and propose an ideal sampling method based on the
outcome of the experimentation. In the first phase of this work the traditional random sampling and
stratified sampling techniques have been used. At the next level the balanced stratified sampling with
variations as per the choice of the prognosis class labels have been tested. Much of the initial time has been
focused on performing the pre-processing of the SEER data set. The classification model for
experimentation has been built using the breast cancer, respiratory cancer and mixed cancer data sets with
three traditional classifiers namely Decision Tree, Naïve Bayes and K-Nearest Neighbour. The three
prognosis factors survival, stage and metastasis have been used as class labels for experimental
comparisons. The results shows a steady increase in the prediction accuracy of balanced stratified model
as the sample size increases, but the traditional approach fluctuates before the optimum results.
IRJET- A Survey on Categorization of Breast Cancer in Histopathological ImagesIRJET Journal
This document summarizes various methods for categorizing breast cancer in histopathological images. It discusses machine learning and image processing techniques that have been used to build computer-aided diagnosis (CAD) systems to help pathologists diagnose breast cancer more objectively and consistently. The document reviews different classification methods that have been proposed, including those using fuzzy logic, level set methods, convolutional neural networks, texture features and ensemble methods. It concludes that accurately classifying histopathological images remains challenging due to limited publicly available datasets and variability in tissue appearance, but that machine learning and advanced image analysis offer promising approaches to improve cancer detection and diagnosis.
Hybrid prediction model with missing value imputation for medical data 2015-g...Jitender Grover
The document presents a novel hybrid prediction model called HPM-MI that uses K-means clustering and multilayer perceptron (MLP) to improve predictive classification for medical data with missing values. The model first analyzes 11 different imputation techniques using K-means clustering to select the best one for filling missing values in the data. It then uses K-means clustering again to validate class labels and remove incorrectly classified instances before applying the MLP classifier. The model is tested on three medical datasets from the UCI repository and shows improved accuracy, sensitivity, specificity and other metrics compared to existing methods, particularly when datasets have large numbers of missing values.
A Review on Data Mining Techniques for Prediction of Breast Cancer RecurrenceDr. Amarjeet Singh
The most common type of cancer in women
worldwide is the Breast Cancer. Breast cancer may be
detected early using Mammograms, probably before it's
spread. Recurrent breast cancer could occur months or years
after initial treatment. The cancer could return within the
same place because the original cancer (local recurrence), or it
may spread to different areas of your body (distant
recurrence). Early stage treatment is done not only to cure
breast cancer however additionally facilitate in preventing its
repetition/recurrence. Data mining algorithms provide
assistance in predicting the early-stage breast cancer that
continually has been difficult analysis drawback. The
projected analysis can establish the most effective algorithm
that predicts the recurrence of the breast cancer and improve
the accuracy the algorithms. Large information like Clump,
Classification, Association Rules, Prediction and Neural
Networks, Decision Trees can be analyzed using data mining
applications and techniques.
Classification AlgorithmBased Analysis of Breast Cancer DataIIRindia
The classification algorithms are very frequently used algorithms for analyzing various kinds of data available in different repositories which have real world applications. The main objective of this research work is to find the performance of classification algorithms in analyzing Breast Cancer data via analyzing the mammogram images based its characteristics.Different attribute values of cancer affected mammogram images are considered for analysis in this work. The Patients food habits, age of the patients, their life styles, occupation, their problem about the diseases and other information are taken into account for classification. Finally, performance of classification algorithms J48, CART and ADTree are given with its accuracy. The accuracy of taken algorithms is measured by various measures like specificity, sensitivity and kappa statistics (Errors).
Breast Tumor Detection Using Efficient Machine Learning and Deep Learning Tec...mlaij
Breast cancer tissues grow when cells in the breast expand and divide uncontrollably, resulting in a lump of tissue commonly called and named tumor. Breast cancer is the second most prevalent cancer among women, following skin cancer. While it is more commonly diagnosed in women aged 50 and above, it can affect individuals of any age. Although it is rare, men can also develop breast cancer, accounting for less than 1% of all cases, with approximately 2,600 cases reported annually in the United States. Early detection of breast tumors is crucial in reducing the risk of developing breast cancer. A publicly available dataset containing features of breast tumors was utilized to identify breast tumors using machine learning and deep learning techniques. Various prediction models were constructed, including logistic regression (LR), decision tree (DT), random forest (RF), support vector machine (SVM), Gradient Boosting (GB), Extreme Gradient Boosting (XGB), Light GBM, and a recurrent neural network (RNN) model. These models were trained to classify and predict breast tumor cases based on the provided features.
BREAST TUMOR DETECTION USING EFFICIENT MACHINE LEARNING AND DEEP LEARNING TEC...mlaij
This document describes a study that uses machine learning and deep learning techniques to detect breast tumors using a publicly available dataset. Various classification models were developed including logistic regression, decision trees, random forests, support vector machines, gradient boosting, extreme gradient boosting, LightGBM, and a recurrent neural network. These models were trained on features from the dataset to classify breast tumor cases as benign or malignant. The models' performance was evaluated using various metrics like accuracy, precision, recall, and F1 score. Ensemble techniques like bagging and boosting were also explored to improve the models' performance at detecting breast cancer.
Breast Tumor Detection Using Efficient Machine Learning and Deep Learning Tec...mlaij
Machine Learning and Applications: An International Journal (MLAIJ) is a quarterly open access peer-reviewed journal that publishes articles which contribute new results in all areas of the machine learning. The journal is devoted to the publication of high quality papers on theoretical and practical aspects of machine learning and applications.The goal of this journal is to bring together researchers and practitioners from academia and industry to focus on machine learning advancements, and establishing new collaborations in these areas. Original research papers, state-of-the-art reviews are invited for publication in all areas of machine learning.
Authors are solicited to contribute to the journal by submitting articles that illustrate research results, projects, surveying works and industrial experiences that describe significant advances in the areas of machine learning.
IRJET- Breast Cancer Prediction using Supervised Machine Learning AlgorithmsIRJET Journal
This document describes a study that used machine learning algorithms to predict breast cancer. The researchers trained decision tree, logistic regression, and random forest models on a breast cancer dataset. They preprocessed the data, which included converting categorical variables to numeric. The random forest model achieved the best prediction accuracy of 98.6%. The study aims to help detect breast cancer at earlier stages to improve treatment outcomes.
Applying Deep Learning to Transform Breast Cancer DiagnosisCognizant
Deep convolutional neural networks can assist pathologists in breast cancer diagnosis by automatically filtering benign tissue biopsies, identifying malignant regions and labeling important cellular features like nuclei for further analysis. Automatic detection of diagnostically relevant regions-of-interest and nuclei segmentation reduces the pathologist’s workload, while ensuring that no critical region is overlooked, rendering breast cancer diagnosis more reliable, efficient and cost-effective.
Breast cancer detection using machine learning approaches: a comparative studyIJECEIAES
As the cause of the breast cancer disease has not yet clearly identified and a method to prevent its occurrence has not yet been developed, its early detection has a significant role in enhancing survival rate. In fact, artificial intelligent approaches have been playing an important role to enhance the diagnosis process of breast cancer. This work has selected eight classification models that are mostly used to predict breast cancer to be under investigation. These classifiers include single and ensemble classifiers. A trusted dataset has been enhanced by applying five different feature selection methods to pick up only weighted features and to neglect others. Accordingly, a dataset of only 17 features has been developed. Based on our experimental work, three classifiers, multi-layer perceptron (MLP), support vector machine (SVM) and stack are competing with each other by attaining high classification accuracy compared to others. However, SVM is ranked on the top by obtaining an accuracy of 97.7% with classification errors of 0.029 false negative (FN) and 0.019 false positive (FP). Therefore, it is noteworthy to mention that SVM is the best classifier and it outperforms even the stack classier.
Image processing and machine learning techniques used in computer-aided dete...IJECEIAES
This paper aims to review the previously developed Computer-aided detection (CAD) systems for mammogram screening because increasing death rate in women due to breast cancer is a global medical issue and it can be controlled only by early detection with regular screening. Till now mammography is the widely used breast imaging modality. CAD systems have been adopted by the radiologists to increase the accuracy of the breast cancer diagnosis by avoiding human errors and experience related issues. This study reveals that in spite of the higher accuracy obtained by the earlier proposed CAD systems for breast cancer diagnosis, they are not fully automated. Moreover, the false-positive mammogram screening cases are high in number and over-diagnosis of breast cancer exposes a patient towards harmful overtreatment for which a huge amount of money is being wasted. In addition, it is also reported that the mammogram screening result with and without CAD systems does not have noticeable difference, whereas the undetected cancer cases by CAD system are increasing. Thus, future research is required to improve the performance of CAD system for mammogram screening and make it completely automated.
On Predicting and Analyzing Breast Cancer using Data Mining ApproachMasud Rana Basunia
Breast Cancer is one of the crucial and burning diseases that has invaded women. Predicting breast cancer manually takes a lot of time and it is difficult for the physician to classification. So, detecting cancer through various automatic diagnostic techniques is very necessary. Data mining is the process of running powerful classification techniques that extract useful information from data. The uses and potentials of these techniques have found its scope in medical data. Classification techniques tend to simplify the prediction segment.
An approach of cervical cancer diagnosis using class weighting and oversampli...TELKOMNIKA JOURNAL
Globally, cervical cancer caused 604,127 new cases and 341,831 deaths in 2020, according to the global cancer observatory. In addition, the number of cervical cancer patients who have no symptoms has grown recently. Therefore, giving patients early notice of the possibility of cervical cancer is a useful task since it would enable them to have a clear understanding of their health state. The use of artificial intelligence (AI), particularly in machine learning, in this work is continually uncovering cervical cancer. With the help of a logit model and a new deep learning technique, we hope to identify cervical cancer using patient-provided data. For better outcomes, we employ Keras deep learning and its technique, which includes class weighting and oversampling. In comparison to the actual diagnostic result, the experimental result with model accuracy is 94.18%, and it also demonstrates a successful logit model cervical cancer prediction.
A Classification of Cancer Diagnostics based on Microarray Gene Expression Pr...IJTET Journal
inAbstract— Pattern Recognition (PR) plays an important role in field of Bioinformatics. PR is concerned with processing raw measurement data by a computer to arrive at a prediction that can be used to formulate a decision to be taken. The important problem in which pattern recognition are applied have common that they are too complex to model explicitly. Diverse methods of this PR are used to analyze, segment and manage the high dimensional microarray gene data for classification. PR is concerned with the development of systems that learn to solve a given problem using a set of instances, each instances represented by a number of features. The microarray expression technologies are possible to monitor the expression levels of thousands of genes simultaneously. The microarrays generated large amount of data has stimulate the development of various computational methods to different biological processes by gene expression profiling. Microarray Gene Expression Profiling (MGEP) is important in Bioinformatics, it yield various high dimensional data used in various clinical applications like cancer diagnostics and drug designing. In this work a new schema has developed for classification of unknown malignant tumors into known class. According to this work an new classification scheme includes the transformation of very high dimensional microarray data into mahalanobis space before classification. The eligibility of the proposed classification scheme has proved to 10 commonly available cancer gene datasets, this contains both the binary and multiclass data sets. To improve the performance of the classification gene selection method is applied to the datasets as a preprocessing and data extraction step.
Breast cancer is the leading cause of death for women worldwide. Cancer can be discovered early, lowering the rate of death. Machine learning techniques are a hot field of research, and they have been shown to be helpful in cancer prediction and early detection. The primary purpose of this research is to identify which machine learning algorithms are the most successful in predicting and diagnosing breast cancer, according to five criteria: specificity, sensitivity, precision, accuracy, and F1 score. The project is finished in the Anaconda environment, which uses Python's NumPy and SciPy numerical and scientific libraries as well as matplotlib and Pandas. In this study, the Wisconsin diagnostic breast cancer dataset was used to evaluate eleven machine learning classifiers: decision tree, quadratic discriminant analysis, AdaBoost, Bagging meta estimator, Extra randomized trees, Gaussian process classifier, Ridge, Gaussian nave Bayes, k-Nearest neighbors, multilayer perceptron, and support vector classifier. During performance analysis, extremely randomized trees outperformed all other classifiers with an F1-score of 96.77% after data collection and data analysis.
IRJET- Comparison of Breast Cancer Detection using Probabilistic Neural Netwo...IRJET Journal
1) The document compares two machine learning algorithms, probabilistic neural network (PNN) and support vector machine (SVM), for detecting breast cancer in mammogram images.
2) It evaluates the performance of PNN and SVM on a dataset of 322 mammogram images containing both benign and malignant tumors.
3) The proposed methodology applies techniques like image enhancement, segmentation, and feature extraction before classifying the images using PNN and SVM to detect tumors and determine if they are benign or malignant.
Diagnosis of Cancer using Fuzzy Rough Set TheoryIRJET Journal
This document presents a study that uses fuzzy rough set theory to diagnose cancer using medical data. It has four main modules: 1) feature selection to identify relevant features using fuzzy rough subset evaluation and particle swarm optimization, 2) instance selection to remove missing/noisy data, 3) classification of the data using fuzzy rough nearest neighbor algorithm, and 4) performance analysis using metrics like accuracy, sensitivity and AUC. The study aims to classify different cancer types by reducing noise and selecting optimal features/instances to improve classifier performance. It is found that fuzzy rough set approaches help preserve meaning during reduction and improve classification compared to other methods.
IRJET- Breast Cancer Prediction using Support Vector MachineIRJET Journal
This document discusses using a support vector machine to predict whether breast cancer is benign or malignant. The SVM model is trained on the Wisconsin Diagnostic Breast Cancer dataset containing 569 samples with 32 features each. The SVM achieves 96.09% accuracy at classifying samples into benign versus malignant categories on the test data, demonstrating its effectiveness at predicting breast cancer diagnosis.
A Progressive Review on Early Stage Breast Cancer DetectionIRJET Journal
This document provides a progressive review of techniques for early stage breast cancer detection. It discusses various image segmentation methods used in previous research like threshold-based, region-based, edge-based, and clustering-based segmentation. Feature extraction techniques discussed include gray level co-occurrence matrix, principal component analysis, and linear discriminant analysis. Classifiers reviewed are convolutional neural networks, support vector machines, artificial neural networks, random forests, and decision trees. The paper analyzes the merits and limitations of these techniques and identifies convolutional neural networks and artificial neural networks as providing high accuracy for breast cancer classification from images.
Toward Integrated Clinical and Gene Expression Profiles for Breast Cancer Pro...CSCJournals
Breast cancer patients with the same diagnostic and clinical prognostic profile can have markedly different clinical outcome. This difference is possibly caused by the limitation of current breast cancer prognostic indices, which group molecularly distinct patients into similar clinical classes based mainly on morphological of disease. Traditional clinical based prognosis models were discovered contain some restriction to address the heterogeneity of breast cancer. The invention of microarray technology and its ability to simultaneously interrogate thousands genes has changed the paradigm of molecular classification of human cancers as well as it shifted clinical prognosis model to broader prospect. Numerous studies have revealed the potential value of gene expression signatures in examining the risk of disease recurrence. However, currently most of these studies attempted to implement genetic marker based prognostic models to replace the traditional clinical markers, yet neglecting the rich information contain in clinical information. Therefore, this research took an effort to integrate both clinical and microarray data in order to obtain accurate breast cancer prognosis, by taking into account that these data complements each other. This article presents a review of the development of breast cancer prognosis models, concentrating precisely on clinical and gene expression profiles. The literature is reviewed in an explicit machine learning framework, which include the elements of feature selection and classification techniques.
Similar to Ensemble Classifier Approach in Breast Cancer Detection and Malignancy Grading - A Review (20)
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
Digital Marketing Trends in 2024 | Guide for Staying AheadWask
https://www.wask.co/ebooks/digital-marketing-trends-in-2024
Feeling lost in the digital marketing whirlwind of 2024? Technology is changing, consumer habits are evolving, and staying ahead of the curve feels like a never-ending pursuit. This e-book is your compass. Dive into actionable insights to handle the complexities of modern marketing. From hyper-personalization to the power of user-generated content, learn how to build long-term relationships with your audience and unlock the secrets to success in the ever-shifting digital landscape.
Trusted Execution Environment for Decentralized Process MiningLucaBarbaro3
Presentation of the paper "Trusted Execution Environment for Decentralized Process Mining" given during the CAiSE 2024 Conference in Cyprus on June 7, 2024.
Best 20 SEO Techniques To Improve Website Visibility In SERPPixlogix Infotech
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
Skybuffer SAM4U tool for SAP license adoptionTatiana Kojar
Manage and optimize your license adoption and consumption with SAM4U, an SAP free customer software asset management tool.
SAM4U, an SAP complimentary software asset management tool for customers, delivers a detailed and well-structured overview of license inventory and usage with a user-friendly interface. We offer a hosted, cost-effective, and performance-optimized SAM4U setup in the Skybuffer Cloud environment. You retain ownership of the system and data, while we manage the ABAP 7.58 infrastructure, ensuring fixed Total Cost of Ownership (TCO) and exceptional services through the SAP Fiori interface.
Have you ever been confused by the myriad of choices offered by AWS for hosting a website or an API?
Lambda, Elastic Beanstalk, Lightsail, Amplify, S3 (and more!) can each host websites + APIs. But which one should we choose?
Which one is cheapest? Which one is fastest? Which one will scale to meet our needs?
Join me in this session as we dive into each AWS hosting service to determine which one is best for your scenario and explain why!
Main news related to the CCS TSI 2023 (2023/1695)Jakub Marek
An English 🇬🇧 translation of a presentation to the speech I gave about the main changes brought by CCS TSI 2023 at the biggest Czech conference on Communications and signalling systems on Railways, which was held in Clarion Hotel Olomouc from 7th to 9th November 2023 (konferenceszt.cz). Attended by around 500 participants and 200 on-line followers.
The original Czech 🇨🇿 version of the presentation can be found here: https://www.slideshare.net/slideshow/hlavni-novinky-souvisejici-s-ccs-tsi-2023-2023-1695/269688092 .
The videorecording (in Czech) from the presentation is available here: https://youtu.be/WzjJWm4IyPk?si=SImb06tuXGb30BEH .
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...Tatiana Kojar
Skybuffer AI, built on the robust SAP Business Technology Platform (SAP BTP), is the latest and most advanced version of our AI development, reaffirming our commitment to delivering top-tier AI solutions. Skybuffer AI harnesses all the innovative capabilities of the SAP BTP in the AI domain, from Conversational AI to cutting-edge Generative AI and Retrieval-Augmented Generation (RAG). It also helps SAP customers safeguard their investments into SAP Conversational AI and ensure a seamless, one-click transition to SAP Business AI.
With Skybuffer AI, various AI models can be integrated into a single communication channel such as Microsoft Teams. This integration empowers business users with insights drawn from SAP backend systems, enterprise documents, and the expansive knowledge of Generative AI. And the best part of it is that it is all managed through our intuitive no-code Action Server interface, requiring no extensive coding knowledge and making the advanced AI accessible to more users.
Your One-Stop Shop for Python Success: Top 10 US Python Development Providersakankshawande
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
5th LF Energy Power Grid Model Meet-up SlidesDanBrown980551
5th Power Grid Model Meet-up
It is with great pleasure that we extend to you an invitation to the 5th Power Grid Model Meet-up, scheduled for 6th June 2024. This event will adopt a hybrid format, allowing participants to join us either through an online Mircosoft Teams session or in person at TU/e located at Den Dolech 2, Eindhoven, Netherlands. The meet-up will be hosted by Eindhoven University of Technology (TU/e), a research university specializing in engineering science & technology.
Power Grid Model
The global energy transition is placing new and unprecedented demands on Distribution System Operators (DSOs). Alongside upgrades to grid capacity, processes such as digitization, capacity optimization, and congestion management are becoming vital for delivering reliable services.
Power Grid Model is an open source project from Linux Foundation Energy and provides a calculation engine that is increasingly essential for DSOs. It offers a standards-based foundation enabling real-time power systems analysis, simulations of electrical power grids, and sophisticated what-if analysis. In addition, it enables in-depth studies and analysis of the electrical power grid’s behavior and performance. This comprehensive model incorporates essential factors such as power generation capacity, electrical losses, voltage levels, power flows, and system stability.
Power Grid Model is currently being applied in a wide variety of use cases, including grid planning, expansion, reliability, and congestion studies. It can also help in analyzing the impact of renewable energy integration, assessing the effects of disturbances or faults, and developing strategies for grid control and optimization.
What to expect
For the upcoming meetup we are organizing, we have an exciting lineup of activities planned:
-Insightful presentations covering two practical applications of the Power Grid Model.
-An update on the latest advancements in Power Grid -Model technology during the first and second quarters of 2024.
-An interactive brainstorming session to discuss and propose new feature requests.
-An opportunity to connect with fellow Power Grid Model enthusiasts and users.
Driving Business Innovation: Latest Generative AI Advancements & Success StorySafe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Ensemble Classifier Approach in Breast Cancer Detection and Malignancy Grading - A Review
1. International Journal of Managing Public Sector Information and Communication Technologies (IJMPICT)
Vol. 8, No. 1, March 2017
DOI: 10.5121/ijmpict.2017.8102 17
ENSEMBLE CLASSIFIER APPROACH IN BREAST
CANCER DETECTION AND MALIGNANCY
GRADING- A REVIEW
Deepti Ameta1
1
Department of Computer Engineering, Banasthali University, Banasthali, India
ABSTRACT
The diagnosed cases of Breast cancer is increasing annually and unfortunately getting converted into a
high mortality rate. Cancer, at the early stages, is hard to detect because the malicious cells show similar
properties (density) as shown by the non-malicious cells. The mortality ratio could have been minimized if
the breast cancer could have been detected in its early stages. But the current systems have not been able
to achieve a fully automatic system which is not just capable of detecting the breast cancer but also can
detect the stage of it. Estimation of malignancy grading is important in diagnosing the degree of growth of
malicious cells as well as in selecting a proper therapy for the patient. Therefore, a complete and efficient
clinical decision support system is proposed which is capable of achieving breast cancer malignancy
grading scheme very efficiently. The system is based on Image processing and machine learning domains.
Classification Imbalance problem, a machine learning problem, occurs when instances of one class is
much higher than the instances of the other class resulting in an inefficient classification of samples and
hence a bad decision support system. Therefore EUSBoost, ensemble based classifier is proposed which is
efficient and is able to outperform other classifiers as it takes the benefits of both-boosting algorithm with
Random Undersampling techniques. Also comparison of EUSBoost with other techniques is shown in the
paper.
KEYWORDS
Breast Cancer Malignancy Grading, Breast cancer detection, Imbalanced classification problem,
Ensemble classifier, Evolutionary Undersampling with Boosting
1. INTRODUCTION
The diagnosed cases of cancer is increasing annually and the maximum diagnosed cases are of
the type Breast cancer among the middle aged women. For instance between 2009 and 2013
there was an increase of 2400 new diagnosed cases in India. This statistics is increasing annually
and unfortunately getting converted into a larger death rate. This ratio could have been
minimized to a certain level if the detection of the Breast cancer could have been made in the
early stages. The probability of treating the cancer in its early stage is considered to be higher as
compared to treating it in its more advanced stages. The signs of it may include the formation of
lumps in the breast region, pain, fever, formation of pink or red patches, ejection of infectious
fluid from the nipples and many more symptoms. But the problem with the cancer is that it is
very much hard to detect. The treatment is most expected to be successful in the early stages.
In order to differentiate different stages, a grading system is used during the diagnosis process so
that a proper treatment can be determined. It is not only being used in India but also in many
other countries of the world. At present many techniques are used for detecting the malignant
2. International Journal of Managing Public Sector Information and Communication Technologies (IJMPICT)
Vol. 8, No. 1, March 2017
18
cells.Different types of mammography techniques and laboratory tests are being performed for
the purpose. A Fine Needle Aspiration biopsy test (FNA) is one of the techniques used to detect
the malicious cells in the sample extracted from the questionable area from the breast through a
very thin needle. The results help the doctors to select the appropriate treatment and therapy.
Many techniques provide a grading scheme to the collected samples. Such techniques makes use
of Machine Learning classifiers. The grading scheme is based on the grading scale proposed by
Bloom and Richardson [1] and modified Scarff-Bloom-Richardson system [2] which involves
the following:
1. Degree of structural differentiation (SD): It is a factor which describes the ability of the
cells and tissues to get combined into groups based on some similarities. The cells are
capable of forming the tubules and groups. If a single group is spotted in the stained
slide under the microscope, then a case of Intermediate Malignancy case can be assured
and if many groups are scattered throughout the observation then a High Malignancy
Grade can be assigned to it.
2. Frequency of hyperchromatic and mitotic figures (HMF): In malignancy detection,
spotting the behaviour of the nuclei of the cells is a significant task as the tendency of
splitting the cell’s nuclei can detect the stage of the cancer and how fast it is growing
inside the body. Mitosis is a biological phenomenon in which the mother cell gets
divided into two equal cells and hence fastens the spreading of the cancer.
3. Plemorphism (P): This factor differentiates the cells based on their color, shape and size,
staining of the nuclei and their behaviour when they are stained with a chemical and
observed under the microscope.
This grading scheme can be achieved by the applications of Machine Learning. But classification
of malignancy grading suffers from the Class Imbalance Problem or imbalanced classification
problem, which is a difficulty in Machine Learning process. We need to overcome this problem
by using ensemble classifier approach and hence an automatic clinical decision support system
for malignancy grading can be achieved which can detect the cancer of different stages very
efficiently.
The main contributions of the work is as follows:
• A proposal for a clinical decision support system for malignancy grading (detection of
the stage of the cancer)
• The solution to the imbalanced classification problem is proposed by combining the
random undersampling with boosting and proposing Evolutionary undersampling
boosting technique to make the system more efficient and stable.
2. THE CLASSIFICATION IMBALANCE PROBLEM
The proposed system is not just capable of detecting the cancer but can also provide a
malignancy grading so that the detection of cancer in the early stage becomes easier. For this
purpose there is a need to classify the samples which are taken from the questionable part of the
breast, in different groups or classes so that the samples showing similar properties can be
grouped into the same class. Several approaches of machine learning is used. The classification
imbalance problem for two class create problems in the learning process. According to this
problem, the number of instances of one class is very much greater than the number of instances
of the other class [3]. One of the classes is under represented. Hence the recognition rate of the
minority class gets worsen. Most of the machine learning algorithm works when the number of
instances from both the classes are nearly equal. The present standard classifier learning
algorithm focuses on getting a high accuracy rate and therefore the minority class is considered
as a noise or disturbance and they show a biased nature towards the majority class. When
3. International Journal of Managing Public Sector Information and Communication Technologies (IJMPICT)
Vol. 8, No. 1, March 2017
19
addressing a two class imbalance problem, a confusion matrix can be used to store the correctly
and incorrectly classified sets [4].
Table1: Confusion Matrix for two class problem
Positive Prediction Negative Prediction
Positive class True Positive(TP) False Negative (FN)
Negative class False Positive (FP) True Negative (TN)
Historically, accuracy rate has been considered to evaluate the performance of the classifiers but
this cannot be taken as an only parameter because accuracy rate weights the influence of the
classes depending on the number of instances, and therefore classes with more instances have
more influence on the results. [5]
Accuracy =
So to converge towards the perfection we consider some more parameters
1. Sensitivity /True positive rate (TPR)/ Recall: It is the classification function which
represents how correctly the proportion of positives are identified in the dataset i.e. sick
people correctly identified as sick. It is given by the formula
TPR =
2. Specificity /True negative rate: It is a classification function which represents how
correctly the proportion of negatives are classified in the dataset i.e. the percentage of
healthy people who are correctly identified as not suffering from any diseases. It is
given by the formula
SPC =
3. Geometric Mean (GM): The Geometric Mean considers the balance between the
classification performances on the majority and minority class by considering both the
parameters, sensitivity and specificity. It is given by the formula [4]
Geometric Mean =
4. ROC and AUC: The receiver operating characteristic (ROC) curve and area under ROC
curve (AUC) is an important parameter for accessing the overall classification
performance. The ROC is a curve which shows the relationship between benefits (true
positive rate) and costs (false positive rate) [6]. Over a wide range of operating points
the ROC curve can be used to indicate whether a classifier is superior to another
classifier or not. AUC provides a scalar measure of the performance of a classifier
[7].The larger the AUC, the better is the performance over the data set [4]. It is given by
the formula
5. F-Measure: It is the harmonic mean of recall and precision. The more F-measure value,
the more is the precision and recall. It is given by the formula
3. SOLUTIONS FOR THE CLASS IMBALANCE PROBLEM
To deal with class imbalance problem many techniques has been developed which can be
4. International Journal of Managing Public Sector Information and Communication Technologies (IJMPICT)
Vol. 8, No. 1, March 2017
20
categorized into four main categories [8]
1. Algorithm level approaches: These approaches deal with biasing the learning procedure
towards the minority class [9]. This approach adapts a supervised classifier to strengthen
the accuracy towards the minority class. It creates a new classifier or brings changes in
the existing ones. This approach heavily relies on the nature of the classifier and they
require the knowledge of the classifier used and the application domain as they are
domain specific [10].
2. Data level approaches: These approaches rely on data resampling in order to reduce the
effect of classification imbalance problem. They use a data pre-processing step which
makes them independent of the classifier used [11]. Examples of these approaches
include random undersampling and oversampling, sampling methods and feature
selection strategies.
3. Cost-sensitive learning: The main idea is to assign a higher misclassification cost to the
objects belonging to the minority class as compared to the misclassification costs
assigned to majority class [10]. It requires both, data level transformations (adding costs
to instances) and algorithm level modifications (modifying the learning process to accept
costs) [12]. But it is very difficult for experts to assign costs to every class.
4. Ensemble based approaches: A classifier is introduced for each training set examples by
a chosen machine learning algorithm, therefore, there will be k number of classifiers for
each k variations of the training set and the result is produces by combining the output of
all the classifiers [13]. AdaBoost, Bagging and RandomForest are some of the learning
methods. Many reported works like SMOTEBoost, RUSBoost and DataBoost-IM have
improved the classification performance [13].
5. Hybrid approaches: Most of the learning methods rely on combining many machine
learning algorithms to achieve better results. The hybridization is used with other
learning methods and is designed to alleviate the problem in sampling, feature sub-set
selection and cost matrix optimization [13].
The approaches mentioned above cannot be used independently because of their disadvantages
so hybridization with other approaches provides a better way of achieving good results and
dealing with classification imbalance problem. Hence EUSBoost model is proposed which
combines Evolutionary Undersampling with AdaBoost.M2 algorithm [14].
4. PROPOSED SOLUTION: EVOLUTIONARY UNDERSAMPLING BOOSTING
The combination of ensemble based classifier learning techniques has been proposed to
overcome the classification imbalance problem in breast cancer malignancy grading. The most
common approach is to introduce the data preprocessing step to balance the data distribution.
Some of them are Bagging, Boosting and Hybrid-based ensembles depending on the ensemble
based algorithm [8]. These methods are more versatile as compared to cost-sensitive ensembles
because no setting of cost for different classes is required [4]. EUSBoost is a proposed solution
which is based on Random Undersampling with Boosting. EUSBoost was able to outperform
other methods because it uses the Evolutionary undersampling and provides diversity of the base
classifiers [4].
4.1. Combining Boosting with Evolutionary Undersampling
Boosting algorithm [6] improves the performance of the weak classifiers by forcing the learners
to learn the difficult examples. At each boosting iteration redistribution of the training data set is
achieved by updating their respective weights for each sample [6]. To understand EUSBoost we
5. International Journal of Managing Public Sector Information and Communication Technologies (IJMPICT)
Vol. 8, No. 1, March 2017
21
will understand Boosting algorithm first as it provides a base to it. The boosting steps involved in
AdaBoost can be shown by the following steps: [15]
Steps: Adaptive Boosting algorithm (AdaBoost) [15]
Let N be the total number of samples in the training data-set with two classes represented by y є
{0,1}.
1. Assign initial weights to each of the samples in the training data-set. Initially these
weights should be assigned equally.
2. Now by using weighted resampling, randomly select a data-set with N samples from the
original data-set. The samples with the higher weights have more probability to get
selected.
3. Obtain a learner, f(x), usually a predictive model or a classifier from the resampled data-
set. Apply this newly obtained learner to the original sample data-set. If a wrong
classification is done then the error_flag=1, otherwise error_flag=0 and compute the total
sum of the weighted errors of the samples.
4. Calculate confidence index of the learner f(x) which depends on the weighted error. Now
again update the weights of the training samples. If the samples are correctly identified
then the weights remain unchanged while for misclassified samples, the weight gets
increases.
5. Normalization of the weights is done and collectively their weights are considered. After
T iterations we get T models using voting approach.
In Boosting, classifiers are learned serially using the whole training set in all the base classifiers.
More attention is given to difficult instances so that the samples misclassified in the previous
iterations can be classified correctly in the new iterations [4].
Evolutionary undersampling is an evolutionary prototype selection algorithm which is suitable to
be used in class imbalance domains [4]. In this, a prototype selection is done, which is a sampling
process aiming at reducing the reference set for the nearest neighbour classifier and also
improving efficiency and supporting less storage necessity [16]. Chromosomes are ranked using
a fitness function and in EUSBoost, the fitness function can be considered as a balancing between
both the classes and the expected performance with the selected data subsets [17]. We need to
promote diversity of the base classifiers so that we can correctly and efficiently classify the
samples from the training data-set. Diversity refers to the fact that the base classifier should be
composed in such a way that it gives different outputs hence improving the classification
techniques. For this purpose, the fitness function of EUS is modified. But it is hard to promote
diversity as EUS is a preprocessing algorithm. So diversity among the solutions is considered
which assumes that the base classifiers which have learned from data-sets of different instances
show more diversity.When we analyse the algorithm of EUSBoost, EUS embedded in
AdaBoost.M2, we can find similarities with Boosting algorithm explained above as it provides a
base to the new algorithm [4].
6. International Journal of Managing Public Sector Information and Communication Technologies (IJMPICT)
Vol. 8, No. 1, March 2017
22
4.2. Experimental Analysis
The following figure explains how the experiment was carried out. It includes the description of
the dataset used, the feature set that was used during the experiment, the experimental set up and
the experimental results.
Figure 1: Experimental analysis of EUSBoost Technique
In this study, a dataset of FNA slides has been considered with 100 images and with a resolution
of 96 dpi (dots per inch). The FNA slides were stained with Haematoxylin and Eosin technique.
7. International Journal of Managing Public Sector Information and Communication Technologies (IJMPICT)
Vol. 8, No. 1, March 2017
23
After staining, the nuclei of the cells showed purple stain, cytoplasm showed pink stain and
blood cells showed dark orange and red stains. Now to differentiate the classes these slides were
diagnosed under two types of magnification, with low magnifying power of 100× and some
were diagnosed under high magnifying power of 400× magnification. The main idea behind
diagnosing them under two different magnifying power was to spot different features which
were available only at different magnifying powers. Some features, like structural differentiation
i.e. cells ability of forming groups, were successfully spotted at 100× magnification and some
features, like mitosis and pleomorphism, were spotted under 400× magnification. The extracted
features are summarized below:
Table 2: Extracted Features for Malignancy Grading
The features spotted under Structural differentiation, total number of groups formed, total
100× magnification area under the individual as well as collective groups and their
distribution in the slides, nuclei and its orientation
The features spotted under Mitotic count, Pleomorphism, textural features, momentum and
400× magnification histogram features like skew, energy and entropy.
5. RESULT ANALYSIS AND CONCLUSION
After extracting the features and collecting the feature dataset, the EUSBoost algorithm was
applied on a population size of 50 instances. The Geometric Mean (GM) was taken as the
evaluation measure, majority was taken as the majority type with true balancing and the total
number of iterations carried out were 10.As a result a Boosted classifier was gained which could
classify even the ambiguous instances in the proper classes very efficiently. It helped
maintaining the diversity of the base classifiers and hence provided a stability to the
classification function. It also benefitted the classification scheme by providing a confidence
level of the base classifiers in the weight update. The comparison of EUSBoost with other
techniques is shown below:
Table 3: Comparison of EUSBoost with other reference methods for fuzzy c-means color
segmentation
Hypothesis p-value p-value p-value (AUC)
(SEN) (GM)
EUB vs. BGG + (0.0031) + (0.0042) + (0.0072)
EUB vs. BST + (0.0034) + (0.0052) + (0.0053)
EUB vs. UNB + (0.0321) + (0.0180) + (0.0200)
EUB vs. RBB + (0.0367) + (0.0443) + (0.0400)
EUB vs. OVB + (0.0115) + (0.0128) + (0.0150)
EUB vs. RUB + (0.0350) + (0.0390) + (0.0400)
8. International Journal of Managing Public Sector Information and Communication Technologies (IJMPICT)
Vol. 8, No. 1, March 2017
24
Table 4: Comparison of EUSBoost with other reference methods for grey-level quantization
segmentation
Hypothesis p-value p-value p-value
(SEN) (GM) (AUC)
EUB vs. BGG + (0.0075) + (0.0088) + (0.0076)
EUB vs. BST + (0.0095) + (0.0112) + (0.0110)
EUB vs. UNB + (0.0250) + (0.0232) + (0.0252)
EUB vs. RBB + (0.3484) + (0.3555) + (0.3512)
EUB vs. OVB + (0.0188) + (0.0194) + (0.0187)
EUB vs. RUB + (0.3488) + (0.3575) + (0.3517)
Table 5: Overall Comparison of EUSBoost with other techniques
Segmentation Measure BGG BST UNB RBB OVB RUB EUB
FCM SEN 71.70 72.8 88.4 89.98 84.66 89.8 92.19
GM 76.20 78.9 90.3 90.6 88.40 91.7 92.52
AUC 79 81 90.44 93.01 89 92.38 93.88
LS SEN 75.90 74.99 90.24 92.86 88.05 91.55 94.61
GM 80.31 82.44 92.8 94.37 90.8 93.5 95.71
AUC 81.05 83.2 93 95 90.98 95.09 96.38
GLQ SEN 70.47 72 84.99 90.27 83.92 90.27 90.27
GM 71.6 73.07 87.3 91 85 90.89 91.03
AUC 72.47 73.8 88 90.8 86.03 90.74 90.76
The difficulty is not only connected with the uneven distribution of the instances in the dataset
but also the difficulties present in the nature of the data used. By looking at the above table it
can be concluded that the standard approaches like Bagging and Boosting independently are
insufficient to deliver a satisfactory classification result. Hence the classification imbalance
problem can be solved with EUS method as it provides maximum efficient and satisfactory
result as compared to classical models when different segmentation techniques and measures are
used.
9. International Journal of Managing Public Sector Information and Communication Technologies (IJMPICT)
Vol. 8, No. 1, March 2017
25
Also when we consider the learning paradigm, hybrid based approaches show poor performance
for all of the measured used. Some of the experiments have proved that hybrid based approaches
are competitive to other techniques but in reality, when we consider a decision support system,
these techniques show the worst results. This shows that the dataset used contains some difficult
samples which remains undetected for these learning classifiers. But EUS approach is able to
overcome this problem as it considers the diversity among the base classifiers. Hence. EUSBoost
stands superior to all other methods as it extends RUSBoost, which itself is an efficient method
for solving class imbalance problem. It also applies AdaBoost.M2 algorithm for the creation of
ensembles. It searches the most effective instances from the majority class and at the same time
maintains the diversity among the classifiers.
Hence EUSBoost can handle the imbalanced datasets for Breast cancer Malignancy grading very
efficiently. It is believed that the cancer at its early stage is very much hard to detect. But this
technique is able to assign malignancy grading and helps in detection of the cancer of any stage,
very importantly the cancer at the early stages.
Summarizing, it can be concluded that EUSBoost can be used to create a highly efficient and
highly accurate decision support system for breast cancer malignancy grading based on
ensemble classification.
ACKNOWLEDGEMENTS
I would like to take this opportunity to express my profound gratitude and deep regards to Dr.Iti
Mathur and Dr. Saurabh Mukherjee for their exemplary guidance and encouragement throughout
the work. I would also like to give my sincere gratitude to all my friends and colleagues.
REFERENCES
[1] H.J.G. Bloom, W.W. Richardson (1957), “Histological grading and prognosis in breast cancer”,
Br. J. Cancer, Vol.11, pp 359–377.
[2] R.W. Scarff, H. Torloni (1968), “Histological typing of breast tumors”, International histological
classification of tumours, World Health Organization, Vol. 2 (2), pp 13–20.
[3] Q. Yang, X. Wu (2006), “10 challenging problems in data mining research”, Int. Journal of
Information Technology, Vol. 5(4), pp 597–604.
[4] B. Krawczyk, M. Galar, L.Jelen, F.Herrera (2016), “Evolutionary undersampling boosting for
imbalanced classification of breast cancer malignancy”, Article in Applied soft computing,
Elsevier B.V., pp 1-14.
[5] H.D. Cheng, X. Cai, X. Chen, L. Hu, X. Lou (2003), “Computer-aided detection and
classification of microcalcifications in mammograms: a survey”, Pattern Recognition, Vol.36
(12), pp 2967–2991.
[6] G.H. Nguyen, A. Bouzerdoum, S.L. Phung (2007), “Learning pattern classification tasks with
imbalanced datasets”, Chapter 10, University of Wollongong, Australia, pp 193-210.
[7] J. Huang, C.X. Ling (2005), “Using AUC and accuracy in evaluating learning algorithms”, IEEE
Trans. Knowl. Data Eng., Vol. 17(3), pp 299–310.
[8] M. Galar, A. Fernández, E. Barrenechea, H. Bustince, F. Herrera (2012), “A review on ensembles
for the class imbalance problem: bagging-, boosting-, and hybrid based approaches”, IEEE Trans.
Systems Man Cybern. C: Appl. Rev., Vol. 42 (4), pp 463–484.
[9] Y. Lin, Y. Lee, G. Wahba (2002), “Support vector machines for classification in nonstandard
situations in Machine Learning”, Vol. 46, pp 191–202.
[10] Available at https://www.researchgate.net/post/ class_imbalance_problem
[11] A. Fernández, S. García, M.J. Del Jesus, F. Herrera (2008), “A study of the behaviour of
linguistic fuzzy rule based classification systems in the framework of imbalanced data-sets”,
Fuzzy Sets System, Vol. 159 (18), pp 2378–2398.
10. International Journal of Managing Public Sector Information and Communication Technologies (IJMPICT)
Vol. 8, No. 1, March 2017
26
[12] S. Zhang, L. Liu, X. Zhu, C. Zhang (2008), “A strategy for attributes selection in cost sensitive
decision trees induction”, IEEE 8th International Conference on Computer and Information
Technology Workshops, pp 8–13.
[13] A. Ali, S.M. Shamsuddin, A.L. Ralescu (2015), “Classification with class imbalance problem: A
Review”, International journal of applied soft computing, Vol. 7(3), pp 176-205.
[14] Y. Freund, R.E. Schapire (1997), “A decision-theoretic generalization of on-line learning and an
application to boosting “, J. Comput. Syst. Sci., Vol. 55 (1), pp 119–139.
[15] Cao, D.S., Xu, Q. S., Liang, Y.Z., Zhang, L.X., Li, H.-D. (2010), “The boosting: A new idea of
building models”, Chemometrics and Intelligent Laboratory Systems, Vol.4, pp 1-11.
[16] S. García, J. Derrac, J. Cano, F. Herrera(2012) , “Prototype selection for nearest neighbor
classification: taxonomy and empirical study”, IEEE Trans. Pattern Anal. Mach. Intell., Vol. 34
(3), pp 417–435.
[17] García, F. Herrera (2009), “Evolutionary undersampling for classification with imbalanced
datasets: proposals and taxonomy”, Evol. Computing, Vol.17, pp 275–306.
Author
Deepti Ameta is currently pursuing M.Tech. in Computer Science from Banasthali
University, India. Her areas of interest include Machine Learning, Medical Image
Processing, Soft computing and Natural Language Processing.