This document summarizes a research paper that proposed a hybrid genetic algorithm and support vector machine (SVM) approach for breast cancer detection. The approach uses genetic algorithms to select the optimal features for input into an SVM classifier. It evaluated the approach on a breast cancer dataset containing 569 cases. The results found that the sequential minimal optimization (SMO) SVM algorithm with genetic algorithm feature selection achieved very high accuracy, recall, and F-measure - outperforming other classification algorithms. It detected breast cancer with 97.71% accuracy, demonstrating the robustness of the proposed hybrid method.
SVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTIONijscai
This document summarizes a research paper that proposed a hybrid genetic algorithm and support vector machine (SVM) approach for breast cancer detection and feature selection. The proposed approach uses genetic algorithms to select the optimal features for an SVM classifier. It evaluated this approach on a breast cancer dataset and found that the sequential minimal optimization (SMO) SVM algorithm with genetic feature selection achieved very high accuracy, recall, and F-measure for classifying breast cancer compared to other classification algorithms. The genetic clustering algorithm was also able to accurately cluster the benign and malignant cases in the dataset within 38 seconds. In conclusion, the hybrid genetic algorithm and SVM approach provided an effective and accurate model for breast cancer detection and diagnosis.
Classification AlgorithmBased Analysis of Breast Cancer DataIIRindia
The classification algorithms are very frequently used algorithms for analyzing various kinds of data available in different repositories which have real world applications. The main objective of this research work is to find the performance of classification algorithms in analyzing Breast Cancer data via analyzing the mammogram images based its characteristics.Different attribute values of cancer affected mammogram images are considered for analysis in this work. The Patients food habits, age of the patients, their life styles, occupation, their problem about the diseases and other information are taken into account for classification. Finally, performance of classification algorithms J48, CART and ADTree are given with its accuracy. The accuracy of taken algorithms is measured by various measures like specificity, sensitivity and kappa statistics (Errors).
Breast cancer is the leading cause of death for women worldwide. Cancer can be discovered early, lowering the rate of death. Machine learning techniques are a hot field of research, and they have been shown to be helpful in cancer prediction and early detection. The primary purpose of this research is to identify which machine learning algorithms are the most successful in predicting and diagnosing breast cancer, according to five criteria: specificity, sensitivity, precision, accuracy, and F1 score. The project is finished in the Anaconda environment, which uses Python's NumPy and SciPy numerical and scientific libraries as well as matplotlib and Pandas. In this study, the Wisconsin diagnostic breast cancer dataset was used to evaluate eleven machine learning classifiers: decision tree, quadratic discriminant analysis, AdaBoost, Bagging meta estimator, Extra randomized trees, Gaussian process classifier, Ridge, Gaussian nave Bayes, k-Nearest neighbors, multilayer perceptron, and support vector classifier. During performance analysis, extremely randomized trees outperformed all other classifiers with an F1-score of 96.77% after data collection and data analysis.
A MODIFIED BINARY PSO BASED FEATURE SELECTION FOR AUTOMATIC LESION DETECTION ...ijcsit
This document summarizes a research paper that proposes a modified binary particle swarm optimization (MBPSO) method for feature selection to improve the classification of lesions in mammograms as benign or malignant. The MBPSO method incorporates iteration best into the velocity update calculation and selects solutions with fewer features when fitness values are equal. A total of 117 shape, texture, and histogram features are extracted from mammogram regions of interest. MBPSO is used to select an optimal feature subset, which is then used to train a neural network classifier. Experimental results show MBPSO obtains better classification accuracy than using the full feature set, and performs comparably to other feature selection techniques.
ABSTRACT
This paper presents an effective feature selection method that can be applied to build a computer aided diagnosis system for breast cancer in order to discriminate between healthy, benign and malignant parenchyma. Determining the optimal feature set from a large set of original features is an important preprocessing step which removes irrelevant and redundant features and thus improves computational efficiency, classification accuracy and also simplifies the classifier structure. A modified binary particle swarm optimized feature selection method (MBPSO)has been proposed where k-Nearest Neighbour algorithm with leave-one-out cross validation serves as the fitness function. Digital mammograms obtained from Regional Cancer Centre, Thiruvananthapuram and the mammograms from web accessible mini-MIAS database has been used as the dataset for this experiment. Region of interests from the mammograms are automatically detected and segmented. A total of 117 shape, texture and histogram features are extracted from the ROIs. Significant features are selected using the proposed feature selection method.Classification is performed using feed forward artificial neural networks with back propagation learning. Receiver operating characteristics (ROC) and confusion matrix are used to evaluate the performance. Experimental results show that the modified binary PSO feature selection method not only obtains better classification accuracy but also simplifies the classification process as compared to full set of features. The performance of the modified BPSO is found to be at par with other widely used feature selection techniques.
Logistic Regression Model for Predicting the Malignancy of Breast CancerIRJET Journal
1. The document describes using a logistic regression machine learning model to predict whether breast cancer is benign or malignant based on characteristics of breast cell samples.
2. The model was trained on a dataset of 569 samples described by 33 characteristics and achieved 94.94% accuracy on the training data and 92.10% accuracy on the test data.
3. The model provides a way to efficiently diagnose breast cancer and determine the appropriate treatment needed based on whether the cancer is predicted as benign or malignant.
On Predicting and Analyzing Breast Cancer using Data Mining ApproachMasud Rana Basunia
Breast Cancer is one of the crucial and burning diseases that has invaded women. Predicting breast cancer manually takes a lot of time and it is difficult for the physician to classification. So, detecting cancer through various automatic diagnostic techniques is very necessary. Data mining is the process of running powerful classification techniques that extract useful information from data. The uses and potentials of these techniques have found its scope in medical data. Classification techniques tend to simplify the prediction segment.
IRJET - Breast Cancer Prediction using Supervised Machine Learning Algorithms...IRJET Journal
This document describes a study that uses supervised machine learning algorithms to predict breast cancer. Three algorithms - decision tree, logistic regression, and random forest - are applied to preprocessed breast cancer data. The random forest model achieved the best accuracy at 98.6% for predicting whether a tumor was benign or malignant. The study aims to develop an early prediction system for breast cancer using machine learning techniques.
SVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTIONijscai
This document summarizes a research paper that proposed a hybrid genetic algorithm and support vector machine (SVM) approach for breast cancer detection and feature selection. The proposed approach uses genetic algorithms to select the optimal features for an SVM classifier. It evaluated this approach on a breast cancer dataset and found that the sequential minimal optimization (SMO) SVM algorithm with genetic feature selection achieved very high accuracy, recall, and F-measure for classifying breast cancer compared to other classification algorithms. The genetic clustering algorithm was also able to accurately cluster the benign and malignant cases in the dataset within 38 seconds. In conclusion, the hybrid genetic algorithm and SVM approach provided an effective and accurate model for breast cancer detection and diagnosis.
Classification AlgorithmBased Analysis of Breast Cancer DataIIRindia
The classification algorithms are very frequently used algorithms for analyzing various kinds of data available in different repositories which have real world applications. The main objective of this research work is to find the performance of classification algorithms in analyzing Breast Cancer data via analyzing the mammogram images based its characteristics.Different attribute values of cancer affected mammogram images are considered for analysis in this work. The Patients food habits, age of the patients, their life styles, occupation, their problem about the diseases and other information are taken into account for classification. Finally, performance of classification algorithms J48, CART and ADTree are given with its accuracy. The accuracy of taken algorithms is measured by various measures like specificity, sensitivity and kappa statistics (Errors).
Breast cancer is the leading cause of death for women worldwide. Cancer can be discovered early, lowering the rate of death. Machine learning techniques are a hot field of research, and they have been shown to be helpful in cancer prediction and early detection. The primary purpose of this research is to identify which machine learning algorithms are the most successful in predicting and diagnosing breast cancer, according to five criteria: specificity, sensitivity, precision, accuracy, and F1 score. The project is finished in the Anaconda environment, which uses Python's NumPy and SciPy numerical and scientific libraries as well as matplotlib and Pandas. In this study, the Wisconsin diagnostic breast cancer dataset was used to evaluate eleven machine learning classifiers: decision tree, quadratic discriminant analysis, AdaBoost, Bagging meta estimator, Extra randomized trees, Gaussian process classifier, Ridge, Gaussian nave Bayes, k-Nearest neighbors, multilayer perceptron, and support vector classifier. During performance analysis, extremely randomized trees outperformed all other classifiers with an F1-score of 96.77% after data collection and data analysis.
A MODIFIED BINARY PSO BASED FEATURE SELECTION FOR AUTOMATIC LESION DETECTION ...ijcsit
This document summarizes a research paper that proposes a modified binary particle swarm optimization (MBPSO) method for feature selection to improve the classification of lesions in mammograms as benign or malignant. The MBPSO method incorporates iteration best into the velocity update calculation and selects solutions with fewer features when fitness values are equal. A total of 117 shape, texture, and histogram features are extracted from mammogram regions of interest. MBPSO is used to select an optimal feature subset, which is then used to train a neural network classifier. Experimental results show MBPSO obtains better classification accuracy than using the full feature set, and performs comparably to other feature selection techniques.
ABSTRACT
This paper presents an effective feature selection method that can be applied to build a computer aided diagnosis system for breast cancer in order to discriminate between healthy, benign and malignant parenchyma. Determining the optimal feature set from a large set of original features is an important preprocessing step which removes irrelevant and redundant features and thus improves computational efficiency, classification accuracy and also simplifies the classifier structure. A modified binary particle swarm optimized feature selection method (MBPSO)has been proposed where k-Nearest Neighbour algorithm with leave-one-out cross validation serves as the fitness function. Digital mammograms obtained from Regional Cancer Centre, Thiruvananthapuram and the mammograms from web accessible mini-MIAS database has been used as the dataset for this experiment. Region of interests from the mammograms are automatically detected and segmented. A total of 117 shape, texture and histogram features are extracted from the ROIs. Significant features are selected using the proposed feature selection method.Classification is performed using feed forward artificial neural networks with back propagation learning. Receiver operating characteristics (ROC) and confusion matrix are used to evaluate the performance. Experimental results show that the modified binary PSO feature selection method not only obtains better classification accuracy but also simplifies the classification process as compared to full set of features. The performance of the modified BPSO is found to be at par with other widely used feature selection techniques.
Logistic Regression Model for Predicting the Malignancy of Breast CancerIRJET Journal
1. The document describes using a logistic regression machine learning model to predict whether breast cancer is benign or malignant based on characteristics of breast cell samples.
2. The model was trained on a dataset of 569 samples described by 33 characteristics and achieved 94.94% accuracy on the training data and 92.10% accuracy on the test data.
3. The model provides a way to efficiently diagnose breast cancer and determine the appropriate treatment needed based on whether the cancer is predicted as benign or malignant.
On Predicting and Analyzing Breast Cancer using Data Mining ApproachMasud Rana Basunia
Breast Cancer is one of the crucial and burning diseases that has invaded women. Predicting breast cancer manually takes a lot of time and it is difficult for the physician to classification. So, detecting cancer through various automatic diagnostic techniques is very necessary. Data mining is the process of running powerful classification techniques that extract useful information from data. The uses and potentials of these techniques have found its scope in medical data. Classification techniques tend to simplify the prediction segment.
IRJET - Breast Cancer Prediction using Supervised Machine Learning Algorithms...IRJET Journal
This document describes a study that uses supervised machine learning algorithms to predict breast cancer. Three algorithms - decision tree, logistic regression, and random forest - are applied to preprocessed breast cancer data. The random forest model achieved the best accuracy at 98.6% for predicting whether a tumor was benign or malignant. The study aims to develop an early prediction system for breast cancer using machine learning techniques.
Robust Breast Cancer Diagnosis on Four Different Datasets Using Multi-Classif...ahmad abdelhafeez
Abstract- The goal of this paper is to compare between different classifiers or multi-classifiers fusion with respect to accuracy in discovering breast cancer for four different data sets. We present an implementation among various classification techniques which represent the most known algorithms in this field on four different datasets of breast cancer two for diagnosis and two for prognosis. We present a fusion between classifiers to get the best multi-classifier fusion approach to each data set individually. By using confusion matrix to get classification accuracy which built in 10-fold cross validation technique. Also, using fusion majority voting (the mode of the classifier output). The experimental results show that no classification technique is better than the other if used for all datasets, since the classification task is affected by the type of dataset. By using multi-classifiers fusion the results show that accuracy improved in three datasets out of four.
IRJET- Breast Cancer Prediction using Supervised Machine Learning AlgorithmsIRJET Journal
This document describes a study that used machine learning algorithms to predict breast cancer. The researchers trained decision tree, logistic regression, and random forest models on a breast cancer dataset. They preprocessed the data, which included converting categorical variables to numeric. The random forest model achieved the best prediction accuracy of 98.6%. The study aims to help detect breast cancer at earlier stages to improve treatment outcomes.
The document presents a proposal for developing a hybrid machine learning model for prostate cancer detection. It aims to combine deep convolutional neural networks (DCNN) and fuzzy support vector machines (SVMs) to overcome limitations of individual models. The methodology section outlines the steps as: pre-processing data, extracting features using DCNN, training and testing the fuzzy SVM classifier, and evaluating performance. Key aspects of the DCNN and fuzzy SVM approaches are also summarized, such as convolutional and pooling layers, fully connected layers, and the SVM classification technique. The proposal seeks to improve prostate cancer detection accuracy through this hybrid modeling approach.
IRJET - Survey on Analysis of Breast Cancer PredictionIRJET Journal
This document compares three machine learning techniques - Support Vector Machine (SVM), Random Forest (RF), and Naive Bayes (NB) - for predicting breast cancer using a dataset of 198 patient records. It finds that SVM achieved the highest accuracy of 96.97% for classification, followed by RF at 96.45% and NB at 95.45%. SVM also had the highest recall rate at 0.97, indicating it was best at correctly identifying malignant tumors. While NB had the lowest precision of 0.92, meaning it incorrectly identified some benign cases as malignant, all three techniques showed high performance in predicting breast cancer.
Machine Learning Based Approaches for Cancer Classification Using Gene Expres...mlaij
The classification of different types of tumor is of great importance in cancer diagnosis and drug discovery.
Earlier studies on cancer classification have limited diagnostic ability. The recent development of DNA
microarray technology has made monitoring of thousands of gene expression simultaneously. By using this
abundance of gene expression data researchers are exploring the possibilities of cancer classification.
There are number of methods proposed with good results, but lot of issues still need to be addressed. This
paper present an overview of various cancer classification methods and evaluate these proposed methods
based on their classification accuracy, computational time and ability to reveal gene information. We have
also evaluated and introduced various proposed gene selection method. In this paper, several issues
related to cancer classification have also been discussed.
GRAPHICAL MODEL AND CLUSTERINGREGRESSION BASED METHODS FOR CAUSAL INTERACTION...ijaia
The early detection of Breast Cancer, the deadly disease that mostly affects women is extremely complex because it requires various features of the cell type. Therefore, the efficient approach to diagnosing Breast Cancer at the early stage was to apply artificial intelligence where machines are simulated with intelligence and programmed to think and act like a human. This allows machines to passively learn and find a pattern, which can be used later to detect any new changes that may occur. In general, machine learning is quite useful particularly in the medical field, which depends on complex genomic measurements such as microarray technique and would increase the accuracy and precision of results. With this technology, doctors can easily diagnose patients with cancer quickly and apply the proper treatment in a timely manner. Therefore, the goal of this paper is to address and propose a robust Breast Cancer diagnostic system using complex genomic analysis via microarray technology. The system will combine two machine learning methods, K-means cluster, and linear regression.
Graphical Model and Clustering-Regression based Methods for Causal Interactio...gerogepatton
The early detection of Breast Cancer, the deadly disease that mostly affects women is extremely complex
because it requires various features of the cell type. Therefore, the efficient approach to diagnosing Breast
Cancer at the early stage was to apply artificial intelligence where machines are simulated with
intelligence and programmed to think and act like a human. This allows machines to passively learn and
find a pattern, which can be used later to detect any new changes that may occur. In general, machine
learning is quite useful particularly in the medical field, which depends on complex genomic
measurements such as microarray technique and would increase the accuracy and precision of results.
With this technology, doctors can easily diagnose patients with cancer quickly and apply the proper
treatment in a timely manner. Therefore, the goal of this paper is to address and propose a robust Breast
Cancer diagnostic system using complex genomic analysis via microarray technology. The system will
combine two machine learning methods, K-means cluster, and linear regression.
GRAPHICAL MODEL AND CLUSTERINGREGRESSION BASED METHODS FOR CAUSAL INTERACTION...gerogepatton
The early detection of Breast Cancer, the deadly disease that mostly affects women is extremely complex because it requires various features of the cell type. Therefore, the efficient approach to diagnosing Breast Cancer at the early stage was to apply artificial intelligence where machines are simulated with intelligence and programmed to think and act like a human. This allows machines to passively learn and find a pattern, which can be used later to detect any new changes that may occur. In general, machine learning is quite useful particularly in the medical field, which depends on complex genomic measurements such as microarray technique and would increase the accuracy and precision of results. With this technology, doctors can easily diagnose patients with cancer quickly and apply the proper treatment in a timely manner. Therefore, the goal of this paper is to address and propose a robust Breast Cancer diagnostic system using complex genomic analysis via microarray technology. The system will combine two machine learning methods, K-means cluster, and linear regression.
Predictive modeling for breast cancer based on machine learning algorithms an...IJECEIAES
Breast cancer is one of the leading causes of death among women worldwide. However, early prediction of breast cancer plays a crucial role. Therefore, strong needs exist for automatic accurate early prediction of breast cancer. In this paper, machine learning (ML) classifiers combined with features selection methods are used to build an intelligent tool for breast cancer prediction. The Wisconsin diagnostic breast cancer (WDBC) dataset is used to train and test the model. Classification algorithms, including support vector machine (SVM), light gradient boosting machine (LightGBM), random forest (RF), logistic regression (LR), k-nearest neighbors (k-NN), and naïve Bayes, were employed. Performance measures for each of them were obtained, namely: accuracy, precision, recall, F-score, Kappa, Matthews correlation coefficient (MCC), and time. The results indicate that without feature selection, LightGBM achieves the highest accuracy at 95%. With minimum redundancy maximum relevance (mRMR) feature selection (15 features), LightGBM outperforms other classifiers, achieving an accuracy of 98%. For Pearson correlation coefficient feature selection (15 features), LightGBM also excels with a 95% accuracy rate. Lasso feature selection (5 features) produces varied results across classifiers, with logistic regression achieving the highest accuracy at 96%. These findings underscore the importance of feature selection in refining model performance and in improving detection for breast cancer.
Breast cancer diagnosis via data mining performance analysis of seven differe...cseij
According to World Health Organization (WHO), breast cancer is the top cancer in women both in the
developed and the developing world. Increased life expectancy, urbanization and adoption of western
lifestyles trigger the occurrence of breast cancer in the developing world. Most cancer events are
diagnosed in the late phases of the illness and so, early detection in order to improve breast cancer
outcome and survival is very crucial.
In this study, it is intended to contribute to the early diagnosis of breast cancer. An analysis on breast
cancer diagnoses for the patients is given. For the purpose, first of all, data about the patients whose
cancers’ have already been diagnosed is gathered and they are arranged, and then whether the other
patients are in trouble with breast cancer is tried to be predicted under cover of those data. Predictions of
the other patients are realized through seven different algorithms and the accuracies of those have been
given. The data about the patients have been taken from UCI Machine Learning Repository thanks to Dr.
William H. Wolberg from the University of Wisconsin Hospitals, Madison. During the prediction process,
RapidMiner 5.0 data mining tool is used to apply data mining with the desired algorithms.
IRJET - A Conceptual Method for Breast Tumor Classification using SHAP Values ...IRJET Journal
This document proposes a conceptual method for classifying breast tumors as benign or malignant using SHAP values and Adaboost. It involves scoring ultrasound image features using the BI-RADS lexicon with human input. SHAP value mining is used to discover diagnostic rules from the scored features. Weak classifiers are constructed by combining benign and malignant rules. The Adaboost algorithm then integrates the weak classifiers into a strong classifier for tumor classification. Experimental results on 250 patient records show the proposed method achieves high accuracy, specificity, and sensitivity, indicating potential for clinical use.
The current big challenge facing radiologists in healthcare is the automatic detection and classification of masses in breast mammogram images. In the last few years, many researchers have proposed various solutions to this problem. These solutions are effectively dependent and work on annotated breast image data. But these solutions fail when applied to unlabeled and non-annotated breast image data. Therefore, this paper provides the solution to this problem with the help of a neural network that considers any kind of unlabeled data for its procedure. In this solution, the algorithm automatically extracts tumors in images using a segmentation approach, and after that, the features of the tumor are extracted for further processing. This approach used a double thresholding-based segmentation technique to obtain a perfect location of the tumor region, which was not possible in existing techniques in the literature. The experimental results also show that the proposed algorithm provides better accuracy compared to the accuracy of existing algorithms in the literature.
Cervical Cancer Detection: An Enhanced Approach through Transfer Learning and...IRJET Journal
This document presents research on using the DenseNet169 deep learning model for cervical cancer detection. The researcher trained and tested the model on a large cervical cell image dataset from Kaggle. Through data preprocessing like augmentation and normalization, and transfer learning by fine-tuning a DenseNet pre-trained on ImageNet, the model achieved 95.27% accuracy in classifying five cervical cell types. Evaluation of the model showed high average precision, recall, and F1-score, demonstrating its ability to correctly classify different cervical cell images. The research highlights the potential of deep learning models for automating cervical cancer screening and improving early detection.
This document presents research on using machine learning algorithms to predict breast cancer. Six classification algorithms (logistic regression, decision tree, KNN, random forest, SVM, naive Bayes) are tested on a breast cancer dataset containing 569 cases. The algorithms are trained on 80% of the data and tested on the remaining 20%. Logistic regression and random forest achieved the highest testing accuracy of 96.49%. The results indicate that machine learning algorithms can accurately predict breast cancer and may help pathologists in diagnosis.
Breast Cancer Detection Using Machine LearningIRJET Journal
The document proposes using machine learning classifiers like k-Nearest Neighbors (KNN) and Naive Bayes to classify breast cancer as benign or malignant based on features in the Wisconsin Breast Cancer Dataset, and finds that the KNN algorithm achieved 97.13% accuracy in classification; it also describes developing a web application with doctor and patient login for entering patient details and viewing cancer reports and classification results.
Skin Lesion Classification using Supervised Algorithm in Data Miningijtsrd
Skin cancer is one of the major types of cancers with an increasing incidence over the past decades. Accurately diagnosing skin lesions to discriminate between benign and skin lesions is crucial.J48 Algorithm and SVM SUPPORT VECTOR MACHINE based techniques to estimate effort. In this work proposed system of the project is using data mining techniques for collecting the datasets for skin cancer. So that system can overcome to diagnosing the disease quickly and accuracy. Comparing to other algorithm proposed algorithm has more accuracy. When we have to using two kind of algorithm .They are J48, SVM. J48 Algorithm produced better accuracy more than SVM algorithm. The accuracy of the proposed system is 90.2381 . It means this prediction is very close to the actual values. G. Saranya | Dr. S. M. Uma "Skin Lesion Classification using Supervised Algorithm in Data Mining" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-6 , October 2019, URL: https://www.ijtsrd.com/papers/ijtsrd29346.pdf Paper URL: https://www.ijtsrd.com/computer-science/data-miining/29346/skin-lesion-classification-using-supervised-algorithm-in-data-mining/g-saranya
Breast Cancer Prediction using Machine LearningIRJET Journal
This document discusses using machine learning algorithms to predict breast cancer from patient data and imaging results. It first provides background on breast cancer, noting it is the most commonly diagnosed cancer worldwide. The document then reviews prior works applying machine learning to breast cancer prediction, finding support vector machines achieved the highest accuracy. It describes the dataset used, from the University of Wisconsin, containing patient data and tumor characteristics. Finally, it explores the data and discusses implementing classification algorithms like logistic regression, support vector machines, random forests and neural networks to predict cancer type, finding logistic regression achieved the highest accuracy of 98.24%.
Breast cancer detection using machine learning approaches: a comparative studyIJECEIAES
As the cause of the breast cancer disease has not yet clearly identified and a method to prevent its occurrence has not yet been developed, its early detection has a significant role in enhancing survival rate. In fact, artificial intelligent approaches have been playing an important role to enhance the diagnosis process of breast cancer. This work has selected eight classification models that are mostly used to predict breast cancer to be under investigation. These classifiers include single and ensemble classifiers. A trusted dataset has been enhanced by applying five different feature selection methods to pick up only weighted features and to neglect others. Accordingly, a dataset of only 17 features has been developed. Based on our experimental work, three classifiers, multi-layer perceptron (MLP), support vector machine (SVM) and stack are competing with each other by attaining high classification accuracy compared to others. However, SVM is ranked on the top by obtaining an accuracy of 97.7% with classification errors of 0.029 false negative (FN) and 0.019 false positive (FP). Therefore, it is noteworthy to mention that SVM is the best classifier and it outperforms even the stack classier.
Robust Breast Cancer Diagnosis on Four Different Datasets Using Multi-Classif...ahmad abdelhafeez
The goal of this paper is to compare between different classifiers or multi-classifiers fusion with respect to accuracy in discovering breast cancer for four different data sets. We present an implementation among various classification techniques which represent the most known algorithms in this field on four different datasets of breast cancer two for diagnosis and two for prognosis. We present a fusion between classifiers to get the best multi-classifier fusion approach to each data set individually. By using confusion matrix to get classification accuracy which built in 10-fold cross validation technique. Also, using fusion majority voting (the mode of the classifier output). The experimental results show that no classification technique is better than the other if used for all datasets, since the classification task is affected by the type of dataset. By using multi-classifiers fusion the results show that accuracy improved in three datasets out of four.
ICU Patient Deterioration Prediction : A Data-Mining Approachcsandit
A huge amount of medical data is generated every da
y, which presents a challenge in analysing
these data. The obvious solution to this challenge
is to reduce the amount of data without
information loss. Dimension reduction is considered
the most popular approach for reducing
data size and also to reduce noise and redundancies
in data. In this paper, we investigate the
effect of feature selection in improving the predic
tion of patient deterioration in ICUs. We
consider lab tests as features. Thus, choosing a su
bset of features would mean choosing the
most important lab tests to perform. If the number
of tests can be reduced by identifying the
most important tests, then we could also identify t
he redundant tests. By omitting the redundant
tests, observation time could be reduced and early
treatment could be provided to avoid the risk.
Additionally, unnecessary monetary cost would be av
oided. Our approach uses state-of-the-art
feature selection for predicting ICU patient deteri
oration using the medical lab results. We
apply our technique on the publicly available MIMIC
-II database and show the effectiveness of
the feature selection. We also provide a detailed a
nalysis of the best features identified by our
approach.
Global Situational Awareness of A.I. and where its headedvikram sood
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be un-leashed, and before long, The Project will be on. If we’re lucky, we’ll be in an all-out race with the CCP; if we’re unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
Robust Breast Cancer Diagnosis on Four Different Datasets Using Multi-Classif...ahmad abdelhafeez
Abstract- The goal of this paper is to compare between different classifiers or multi-classifiers fusion with respect to accuracy in discovering breast cancer for four different data sets. We present an implementation among various classification techniques which represent the most known algorithms in this field on four different datasets of breast cancer two for diagnosis and two for prognosis. We present a fusion between classifiers to get the best multi-classifier fusion approach to each data set individually. By using confusion matrix to get classification accuracy which built in 10-fold cross validation technique. Also, using fusion majority voting (the mode of the classifier output). The experimental results show that no classification technique is better than the other if used for all datasets, since the classification task is affected by the type of dataset. By using multi-classifiers fusion the results show that accuracy improved in three datasets out of four.
IRJET- Breast Cancer Prediction using Supervised Machine Learning AlgorithmsIRJET Journal
This document describes a study that used machine learning algorithms to predict breast cancer. The researchers trained decision tree, logistic regression, and random forest models on a breast cancer dataset. They preprocessed the data, which included converting categorical variables to numeric. The random forest model achieved the best prediction accuracy of 98.6%. The study aims to help detect breast cancer at earlier stages to improve treatment outcomes.
The document presents a proposal for developing a hybrid machine learning model for prostate cancer detection. It aims to combine deep convolutional neural networks (DCNN) and fuzzy support vector machines (SVMs) to overcome limitations of individual models. The methodology section outlines the steps as: pre-processing data, extracting features using DCNN, training and testing the fuzzy SVM classifier, and evaluating performance. Key aspects of the DCNN and fuzzy SVM approaches are also summarized, such as convolutional and pooling layers, fully connected layers, and the SVM classification technique. The proposal seeks to improve prostate cancer detection accuracy through this hybrid modeling approach.
IRJET - Survey on Analysis of Breast Cancer PredictionIRJET Journal
This document compares three machine learning techniques - Support Vector Machine (SVM), Random Forest (RF), and Naive Bayes (NB) - for predicting breast cancer using a dataset of 198 patient records. It finds that SVM achieved the highest accuracy of 96.97% for classification, followed by RF at 96.45% and NB at 95.45%. SVM also had the highest recall rate at 0.97, indicating it was best at correctly identifying malignant tumors. While NB had the lowest precision of 0.92, meaning it incorrectly identified some benign cases as malignant, all three techniques showed high performance in predicting breast cancer.
Machine Learning Based Approaches for Cancer Classification Using Gene Expres...mlaij
The classification of different types of tumor is of great importance in cancer diagnosis and drug discovery.
Earlier studies on cancer classification have limited diagnostic ability. The recent development of DNA
microarray technology has made monitoring of thousands of gene expression simultaneously. By using this
abundance of gene expression data researchers are exploring the possibilities of cancer classification.
There are number of methods proposed with good results, but lot of issues still need to be addressed. This
paper present an overview of various cancer classification methods and evaluate these proposed methods
based on their classification accuracy, computational time and ability to reveal gene information. We have
also evaluated and introduced various proposed gene selection method. In this paper, several issues
related to cancer classification have also been discussed.
GRAPHICAL MODEL AND CLUSTERINGREGRESSION BASED METHODS FOR CAUSAL INTERACTION...ijaia
The early detection of Breast Cancer, the deadly disease that mostly affects women is extremely complex because it requires various features of the cell type. Therefore, the efficient approach to diagnosing Breast Cancer at the early stage was to apply artificial intelligence where machines are simulated with intelligence and programmed to think and act like a human. This allows machines to passively learn and find a pattern, which can be used later to detect any new changes that may occur. In general, machine learning is quite useful particularly in the medical field, which depends on complex genomic measurements such as microarray technique and would increase the accuracy and precision of results. With this technology, doctors can easily diagnose patients with cancer quickly and apply the proper treatment in a timely manner. Therefore, the goal of this paper is to address and propose a robust Breast Cancer diagnostic system using complex genomic analysis via microarray technology. The system will combine two machine learning methods, K-means cluster, and linear regression.
Graphical Model and Clustering-Regression based Methods for Causal Interactio...gerogepatton
The early detection of Breast Cancer, the deadly disease that mostly affects women is extremely complex
because it requires various features of the cell type. Therefore, the efficient approach to diagnosing Breast
Cancer at the early stage was to apply artificial intelligence where machines are simulated with
intelligence and programmed to think and act like a human. This allows machines to passively learn and
find a pattern, which can be used later to detect any new changes that may occur. In general, machine
learning is quite useful particularly in the medical field, which depends on complex genomic
measurements such as microarray technique and would increase the accuracy and precision of results.
With this technology, doctors can easily diagnose patients with cancer quickly and apply the proper
treatment in a timely manner. Therefore, the goal of this paper is to address and propose a robust Breast
Cancer diagnostic system using complex genomic analysis via microarray technology. The system will
combine two machine learning methods, K-means cluster, and linear regression.
GRAPHICAL MODEL AND CLUSTERINGREGRESSION BASED METHODS FOR CAUSAL INTERACTION...gerogepatton
The early detection of Breast Cancer, the deadly disease that mostly affects women is extremely complex because it requires various features of the cell type. Therefore, the efficient approach to diagnosing Breast Cancer at the early stage was to apply artificial intelligence where machines are simulated with intelligence and programmed to think and act like a human. This allows machines to passively learn and find a pattern, which can be used later to detect any new changes that may occur. In general, machine learning is quite useful particularly in the medical field, which depends on complex genomic measurements such as microarray technique and would increase the accuracy and precision of results. With this technology, doctors can easily diagnose patients with cancer quickly and apply the proper treatment in a timely manner. Therefore, the goal of this paper is to address and propose a robust Breast Cancer diagnostic system using complex genomic analysis via microarray technology. The system will combine two machine learning methods, K-means cluster, and linear regression.
Predictive modeling for breast cancer based on machine learning algorithms an...IJECEIAES
Breast cancer is one of the leading causes of death among women worldwide. However, early prediction of breast cancer plays a crucial role. Therefore, strong needs exist for automatic accurate early prediction of breast cancer. In this paper, machine learning (ML) classifiers combined with features selection methods are used to build an intelligent tool for breast cancer prediction. The Wisconsin diagnostic breast cancer (WDBC) dataset is used to train and test the model. Classification algorithms, including support vector machine (SVM), light gradient boosting machine (LightGBM), random forest (RF), logistic regression (LR), k-nearest neighbors (k-NN), and naïve Bayes, were employed. Performance measures for each of them were obtained, namely: accuracy, precision, recall, F-score, Kappa, Matthews correlation coefficient (MCC), and time. The results indicate that without feature selection, LightGBM achieves the highest accuracy at 95%. With minimum redundancy maximum relevance (mRMR) feature selection (15 features), LightGBM outperforms other classifiers, achieving an accuracy of 98%. For Pearson correlation coefficient feature selection (15 features), LightGBM also excels with a 95% accuracy rate. Lasso feature selection (5 features) produces varied results across classifiers, with logistic regression achieving the highest accuracy at 96%. These findings underscore the importance of feature selection in refining model performance and in improving detection for breast cancer.
Breast cancer diagnosis via data mining performance analysis of seven differe...cseij
According to World Health Organization (WHO), breast cancer is the top cancer in women both in the
developed and the developing world. Increased life expectancy, urbanization and adoption of western
lifestyles trigger the occurrence of breast cancer in the developing world. Most cancer events are
diagnosed in the late phases of the illness and so, early detection in order to improve breast cancer
outcome and survival is very crucial.
In this study, it is intended to contribute to the early diagnosis of breast cancer. An analysis on breast
cancer diagnoses for the patients is given. For the purpose, first of all, data about the patients whose
cancers’ have already been diagnosed is gathered and they are arranged, and then whether the other
patients are in trouble with breast cancer is tried to be predicted under cover of those data. Predictions of
the other patients are realized through seven different algorithms and the accuracies of those have been
given. The data about the patients have been taken from UCI Machine Learning Repository thanks to Dr.
William H. Wolberg from the University of Wisconsin Hospitals, Madison. During the prediction process,
RapidMiner 5.0 data mining tool is used to apply data mining with the desired algorithms.
IRJET - A Conceptual Method for Breast Tumor Classification using SHAP Values ...IRJET Journal
This document proposes a conceptual method for classifying breast tumors as benign or malignant using SHAP values and Adaboost. It involves scoring ultrasound image features using the BI-RADS lexicon with human input. SHAP value mining is used to discover diagnostic rules from the scored features. Weak classifiers are constructed by combining benign and malignant rules. The Adaboost algorithm then integrates the weak classifiers into a strong classifier for tumor classification. Experimental results on 250 patient records show the proposed method achieves high accuracy, specificity, and sensitivity, indicating potential for clinical use.
The current big challenge facing radiologists in healthcare is the automatic detection and classification of masses in breast mammogram images. In the last few years, many researchers have proposed various solutions to this problem. These solutions are effectively dependent and work on annotated breast image data. But these solutions fail when applied to unlabeled and non-annotated breast image data. Therefore, this paper provides the solution to this problem with the help of a neural network that considers any kind of unlabeled data for its procedure. In this solution, the algorithm automatically extracts tumors in images using a segmentation approach, and after that, the features of the tumor are extracted for further processing. This approach used a double thresholding-based segmentation technique to obtain a perfect location of the tumor region, which was not possible in existing techniques in the literature. The experimental results also show that the proposed algorithm provides better accuracy compared to the accuracy of existing algorithms in the literature.
Cervical Cancer Detection: An Enhanced Approach through Transfer Learning and...IRJET Journal
This document presents research on using the DenseNet169 deep learning model for cervical cancer detection. The researcher trained and tested the model on a large cervical cell image dataset from Kaggle. Through data preprocessing like augmentation and normalization, and transfer learning by fine-tuning a DenseNet pre-trained on ImageNet, the model achieved 95.27% accuracy in classifying five cervical cell types. Evaluation of the model showed high average precision, recall, and F1-score, demonstrating its ability to correctly classify different cervical cell images. The research highlights the potential of deep learning models for automating cervical cancer screening and improving early detection.
This document presents research on using machine learning algorithms to predict breast cancer. Six classification algorithms (logistic regression, decision tree, KNN, random forest, SVM, naive Bayes) are tested on a breast cancer dataset containing 569 cases. The algorithms are trained on 80% of the data and tested on the remaining 20%. Logistic regression and random forest achieved the highest testing accuracy of 96.49%. The results indicate that machine learning algorithms can accurately predict breast cancer and may help pathologists in diagnosis.
Breast Cancer Detection Using Machine LearningIRJET Journal
The document proposes using machine learning classifiers like k-Nearest Neighbors (KNN) and Naive Bayes to classify breast cancer as benign or malignant based on features in the Wisconsin Breast Cancer Dataset, and finds that the KNN algorithm achieved 97.13% accuracy in classification; it also describes developing a web application with doctor and patient login for entering patient details and viewing cancer reports and classification results.
Skin Lesion Classification using Supervised Algorithm in Data Miningijtsrd
Skin cancer is one of the major types of cancers with an increasing incidence over the past decades. Accurately diagnosing skin lesions to discriminate between benign and skin lesions is crucial.J48 Algorithm and SVM SUPPORT VECTOR MACHINE based techniques to estimate effort. In this work proposed system of the project is using data mining techniques for collecting the datasets for skin cancer. So that system can overcome to diagnosing the disease quickly and accuracy. Comparing to other algorithm proposed algorithm has more accuracy. When we have to using two kind of algorithm .They are J48, SVM. J48 Algorithm produced better accuracy more than SVM algorithm. The accuracy of the proposed system is 90.2381 . It means this prediction is very close to the actual values. G. Saranya | Dr. S. M. Uma "Skin Lesion Classification using Supervised Algorithm in Data Mining" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-6 , October 2019, URL: https://www.ijtsrd.com/papers/ijtsrd29346.pdf Paper URL: https://www.ijtsrd.com/computer-science/data-miining/29346/skin-lesion-classification-using-supervised-algorithm-in-data-mining/g-saranya
Breast Cancer Prediction using Machine LearningIRJET Journal
This document discusses using machine learning algorithms to predict breast cancer from patient data and imaging results. It first provides background on breast cancer, noting it is the most commonly diagnosed cancer worldwide. The document then reviews prior works applying machine learning to breast cancer prediction, finding support vector machines achieved the highest accuracy. It describes the dataset used, from the University of Wisconsin, containing patient data and tumor characteristics. Finally, it explores the data and discusses implementing classification algorithms like logistic regression, support vector machines, random forests and neural networks to predict cancer type, finding logistic regression achieved the highest accuracy of 98.24%.
Breast cancer detection using machine learning approaches: a comparative studyIJECEIAES
As the cause of the breast cancer disease has not yet clearly identified and a method to prevent its occurrence has not yet been developed, its early detection has a significant role in enhancing survival rate. In fact, artificial intelligent approaches have been playing an important role to enhance the diagnosis process of breast cancer. This work has selected eight classification models that are mostly used to predict breast cancer to be under investigation. These classifiers include single and ensemble classifiers. A trusted dataset has been enhanced by applying five different feature selection methods to pick up only weighted features and to neglect others. Accordingly, a dataset of only 17 features has been developed. Based on our experimental work, three classifiers, multi-layer perceptron (MLP), support vector machine (SVM) and stack are competing with each other by attaining high classification accuracy compared to others. However, SVM is ranked on the top by obtaining an accuracy of 97.7% with classification errors of 0.029 false negative (FN) and 0.019 false positive (FP). Therefore, it is noteworthy to mention that SVM is the best classifier and it outperforms even the stack classier.
Robust Breast Cancer Diagnosis on Four Different Datasets Using Multi-Classif...ahmad abdelhafeez
The goal of this paper is to compare between different classifiers or multi-classifiers fusion with respect to accuracy in discovering breast cancer for four different data sets. We present an implementation among various classification techniques which represent the most known algorithms in this field on four different datasets of breast cancer two for diagnosis and two for prognosis. We present a fusion between classifiers to get the best multi-classifier fusion approach to each data set individually. By using confusion matrix to get classification accuracy which built in 10-fold cross validation technique. Also, using fusion majority voting (the mode of the classifier output). The experimental results show that no classification technique is better than the other if used for all datasets, since the classification task is affected by the type of dataset. By using multi-classifiers fusion the results show that accuracy improved in three datasets out of four.
ICU Patient Deterioration Prediction : A Data-Mining Approachcsandit
A huge amount of medical data is generated every da
y, which presents a challenge in analysing
these data. The obvious solution to this challenge
is to reduce the amount of data without
information loss. Dimension reduction is considered
the most popular approach for reducing
data size and also to reduce noise and redundancies
in data. In this paper, we investigate the
effect of feature selection in improving the predic
tion of patient deterioration in ICUs. We
consider lab tests as features. Thus, choosing a su
bset of features would mean choosing the
most important lab tests to perform. If the number
of tests can be reduced by identifying the
most important tests, then we could also identify t
he redundant tests. By omitting the redundant
tests, observation time could be reduced and early
treatment could be provided to avoid the risk.
Additionally, unnecessary monetary cost would be av
oided. Our approach uses state-of-the-art
feature selection for predicting ICU patient deteri
oration using the medical lab results. We
apply our technique on the publicly available MIMIC
-II database and show the effectiveness of
the feature selection. We also provide a detailed a
nalysis of the best features identified by our
approach.
Similar to SVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTION (20)
Global Situational Awareness of A.I. and where its headedvikram sood
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be un-leashed, and before long, The Project will be on. If we’re lucky, we’ll be in an all-out race with the CCP; if we’re unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
Learn SQL from basic queries to Advance queriesmanishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
State of Artificial intelligence Report 2023kuntobimo2016
Artificial intelligence (AI) is a multidisciplinary field of science and engineering whose goal is to create intelligent machines.
We believe that AI will be a force multiplier on technological progress in our increasingly digital, data-driven world. This is because everything around us today, ranging from culture to consumer products, is a product of intelligence.
The State of AI Report is now in its sixth year. Consider this report as a compilation of the most interesting things we’ve seen with a goal of triggering an informed conversation about the state of AI and its implication for the future.
We consider the following key dimensions in our report:
Research: Technology breakthroughs and their capabilities.
Industry: Areas of commercial application for AI and its business impact.
Politics: Regulation of AI, its economic implications and the evolving geopolitics of AI.
Safety: Identifying and mitigating catastrophic risks that highly-capable future AI systems could pose to us.
Predictions: What we believe will happen in the next 12 months and a 2022 performance review to keep us honest.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad and Procure.FYI's Co-Found
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Aggregage
This webinar will explore cutting-edge, less familiar but powerful experimentation methodologies which address well-known limitations of standard A/B Testing. Designed for data and product leaders, this session aims to inspire the embrace of innovative approaches and provide insights into the frontiers of experimentation!
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
SVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTION
1. International Journal on Soft Computing, Artificial Intelligence and Applications (IJSCAI), Vol.9, No.4,
November 2020
DOI :10.5121/ijscai.2020.9401 1
SVM &GA-CLUSTERING BASED FEATURE
SELECTION APPROACH FOR BREAST
CANCER DETECTION
Rashmi Priya1
and Syed Wajahat Abbas Rizvi2
1
Assistant Professor GD Goenka University ,India
2
Amity University, Uttar Pradesh ,India
ABSTRACT
Mortality leading among women in developed countries is breast cancer. Breast cancer is women's second
most prominent cause of cancer mortality worldwide. In recent decades, women's high prevalence of breast
cancer has risen dramatically. This paper discussed several data analysis methods used to detect breast
cancer early. Breast cancer diagnosis distinguishes benign and malignant breast lumps. Using data
processing tools, we tackled this disease analysis. Data mining is an important step of library discovery
where intelligent methods are used to detect patterns. Several clinical breast cancer studies were
conducted using soft computing and machine learning techniques. Sometimes their algorithms are easier,
easier, or more comprehensive than others. This research is focused on genetic programming and machine
learning algorithms to reliably identify benign and malignant breast cancer. This study aimed to optimise
the testing algorithm. We used genetic programming methods to choose classification machines' best
features and parameter values. Data mining is an important step of library discovery where intelligent
methods are used to detect patterns. We are analysing data accessible from the U.C.I. deep-learning data
set in Wisconsin. In this experiment, we equate four Weka clustering strategies with genetic clustering. A
comparison of results reveals that sequential minimal optimization (S.M.O.) is better than I.B.K. and B.F.
Tree processes, i.e. 97.71%.
KEYWORDS
S.M.O., Breast cancer, Machine learning, Feature selection, and WEKA
1. INTRODUCTION
Breast cancer is the most prevalent non-skin cancer in women and the second-largest cause of
cancer mortality in women. [1]. [1]. "Mammography usually represents thick-area breast cancer
and clusters. A normal benign mass has a circular border, circumscribed and round, but malignant
cancer usually has a suspected raw and fuzzy boundary. "[2],[3].
Nowadays, demand for machine learning grows until it becomes an operation. Sadly, machine
learning also takes skills and is a field with very high barriers. The creation of a successful
machine learning model involves many skills and experience, including pre-processing steps,
feature selection, and classification processes. Using computer analysis and machine learning
methods in medical fields is prevalent, as these techniques can be regarded as a great aid in
decision-making processes for medical professionals. A vast variety of databases are being used
for breast cancer cases that are helpful in supporting scientific and academic studies and even
more in integrating previous field computer analysis and machine learning.
2. International Journal on Soft Computing, Artificial Intelligence and Applications (IJSCAI), Vol.9, No.4,
November 2020
2
In certain fields and implementations, solving such a problem is based on retrieving features from
the original images obtained in the physical world and organised as vectors. The processing
system 's efficiency depends heavily on the correct choice of these vectors. But, in many
situations, problem solving becomes almost impossible due to the excessive dimensionality of
these vectors or anomalies that may exist in the results. Therefore, reducing the size of the dataset
samples to a more suitable size is sometimes helpful and often necessary, even though this
reduction may lead to minor information loss.
Accurate and accurate data will be gathered to help the doctor's early diagnosis and treatment of
illness, both stable and malignant, using an exact model. This will save doctors time and improve
efficiency. This article reflects on how benign or malignant breast cancer diagnosis is, also at an
early stage. Forecast criteria are based on breast cancer symptoms. This paper's data set contains
32 attributes.
A breast cancer diagnosis may be useful in predicting the outcomes of complex diseases or
recognising the molecular nature of the tumour. Many methods for examining and identifying
trends of breast cancer. This paper compares empirically the utility of three classical tree
classifiers designed to specifically evaluate their effects. Traditional SVM preparation normally
requires Q.P. Set, and it takes longer to solve Q.P. Optimization problem, particularly for large
dataset problems. SVM planning is slow, and the creation of large data sets takes time. The
SMO-SVM method minimises data, is more reliable and easier to implement[4]. We suggest a
hybrid SVM-GA (Support vector machines with genetic attributes) approach to achieve optimal
results on Dataset breast cancer.
2. LITERATURE SURVEY
Machine learning techniques usually refer to life scientific analysis. Numerous research focused
on medical diagnostic technology has been written. These studies have applied different solutions
to the problem and achieved detailed classification precision[5] using an artificial cortical
network to assess breast cancer therapy. They have tested their system on a limited set of data,
but results suggest they understand actual survival. And al.[6] Also in breast cancer patients, a
naïve bay, decision tree, and neural backpropagation network were used.
Though the results were high (about 90% accuracy), they were not appropriate because the data
were split into two groups: one for more than five years of life and the other for those who died in
five years. Findings became meaningless. [7] Program pick approach to usability assessment of
feature selectors. This seeks a simple, coherent set of functions without losing the predictive
precision dimension. Using a ranking algorithm applies confidence to characteristics. [8]
Proposed hybrid GA / SVM approach using fuzzy logic to minimise the initial problem size.
Identify a sub-set of balanced genes, which are then checked by SVM. [9] This analysis aimed to
compare the performance of the Artificial Neural Network.
(ANN) and Vector Machine Support (SVM) for liver cancer classification. On BUPA Liver
Disease Dataset, both model accuracy, durability, precision, and performance were contrasted
and validated. Curved Field (A.U.C.). [10] used in tandem with heart disease modelling in mining
and genetic algorithms. The proposed approach used Gini genetic mutation index statistics for
interface process and crossover. They used a technique for gathering consistency functions.
3. CLASSIFICATION
Classification is one of the data mining techniques specifically used to analyse and assign a given
dataset to a particular class[11]. This method is designed to remove classification errors.
3. International Journal on Soft Computing, Artificial Intelligence and Applications (IJSCAI), Vol.9, No.4,
November 2020
3
Classification enables model extraction that determines the forms for a given data set. Three
different testing approaches are used in information processing: guided and unrestricted study.
[12] [12]. Classification is a first step in evaluating many situations. The overarching aim is to
enhance market comprehension or forecasts. Analysis has proposed many common ways of
classification techniques. Data mining 's central operation is designing accurate classification
systems. A vital mission for data processing and machine learning research. Like decision-
making trees, naive- Bayesian systems, serial minB.B.F.um optimization (S.M.O.), I.B.K., B.F.
Chain modes of grouping methods, etc.
4. FEATURE SELECTION
Increased use of computers from both directions contributes to comprehensive data processing.
This data are large, systematically interconnected data to identify acceptable patterns, making
data mining a crucial area for data processing, prediction, and other activities. It has joined a
complex science area to address real- time theoretical problems. Big data mining is used in many
areas where data analysis is needed. These data mining and creation techniques were commonly
used at various levels, such as pattern recognition, etc., and collecting apps plays an important
role in virtually every field.
The collection attempts to assess the smallest possible subset of characteristics. The architecture
selects the foundation of original features by removing redundant and unused dimensionality
features without losing information. Until data-mining activities are implemented, the pre-
processing step is critical. Mining accuracy, measurement time, and test comprehension are
improved. Three filtering strategies comprises of philtres, wrappers and embedded approaches.
As discussed in [16], the Filter selects the function without the classifier type used. The
advantage of this method is that it is only necessary to select features once, it is simple and
irrespective of the classifier used. This procedure, however, lacks the classifier relationship; each
feature is interpreted separately from functional dependence. Wrapper's approach depends on
classification. Classifier results are used to assess the goodness of the specified feature or
attribute. Another method has the benefit that the filtering cycle eliminates the downside, which
is simpler than the filtering system as it still takes all the dependencies. The next embedded
approach is to combine a philtre algorithm with a wrapper approach to find an optimal sub-set in
the classification structure. This method 's advantage is less expensive and less vulnerable than
wrapper approach. Different feature selection applications over the past two decades:
• Text mining
• Image processing and computer vision
• Industrial applications
• Bioinformatics
4. International Journal on Soft Computing, Artificial Intelligence and Applications (IJSCAI), Vol.9, No.4,
November 2020
4
5. MATERIAL AND METHODS
Weka is a data mining application that uses algorithms. Such algorithms can be directly applied
to the data or code-named Java.
"Weka's a platform for:
Regression.
Clusters.
Association.
Pre-processing data.
Classification Rating.
Visualization.”[13]
We have Weka classifiers to approximate measurable or numerical quantities. Decision and lists
are available for learning systems, vector-support machines, case-dependent classifiers, technical
regression, and Bayes networks. If loaded, all tabs are allowed. The best algorithm for basic data
representation can be found for parameters, tests and errors.
The tab Cluster aims to identify clusters or event groups in data set. Clustering provides the
customer details for analysis. Clustering uses the training set, percentage division, test set, and
classes issued, for which users can ignore other requirement-based data set attributes. "K-Means,
EM, Cobweb, X-means and Farthest First in Weka."
6. PROPOSED METHOD:
6.1. Sequential Minimal Optimization (S.M.O.)
The new teaching algorithm supporting vector machines ( SVMs) is minimal sequential
optimization. In 1998, John Platt developed it for minimal sequential optimization (S.M.O.)[14],
5. International Journal on Soft Computing, Artificial Intelligence and Applications (IJSCAI), Vol.9, No.4,
November 2020
5
a simple and easy approach for SVM preparation. The underlying theory is to solve the dual
quadratic optimization by maximising the complete two-component sub-set.
S.M.O. 's advantage is the simple and clear introduction. A broad topic of quadratic programming
is understanding a vector-supporting computer.
"S.M.O. distinguishes this core quadratic programming problem in a number of small potentials.
Such a wide square
Analytically solved problem method, which avoids the use of time-consuming quadratic
numerical programming as an internal loop. S.M.O. has a continuous memory feature, allowing
S.M.O. to do intense exercises. Since matrix calculation is avoided, standard S.M.O. scales for
different testing problems like linear and quadratic chunking SVM algorithm scales like linear
and cubic. S.M.O. time is regulated by SVM evaluation; S.M.O. is also the fastest for linear,
small data sets.
7. FRAMEWORK
Fig.3 provides an illustrative outline of SVM and gene-dependent clustering process. Below is a
genetic clustering enhanced SVM process algorithm.
Fig. 3. Proposed Framework of S.M.O. classification with genetic algorithm
Select initial variables indexed by SVM philtre. Output dependent on adaptation, precision, estimation,
memory, accuracy. In the data mining industry, these metric metrics played a crucial role in assessing the
effects of different classifications and directing the algorithms as shown in Table I
6. International Journal on Soft Computing, Artificial Intelligence and Applications (IJSCAI), Vol.9, No.4,
November 2020
6
Classification efficiency is calculated in the tables below.
7.1. The Proposed Hybrid Feature Selection Algorithm
Within this section, as our observations indicate, we identify a primary genetic clustering S.M.O.
algorithm. Several operations must be determined before using genetic algorithm. It is the first
population, fitness, selection, and crossover.
STEP 1: Population initialization: randomly generated initial population. A random function (g)
generates a random number in [0 , 1] for each element. If the number exceeds a threshold, g=1.
Otherwise, g=0. For example, the threshold value may be 0.5 to equate 1 or 0.
7. International Journal on Soft Computing, Artificial Intelligence and Applications (IJSCAI), Vol.9, No.4,
November 2020
7
STEP 2: Minimal sequential vector recognition interface aid preparation.
STEP 3: It extracts all missing properties globally and converts them into binary attributes. This
functions are also simply standardised. (This is important in the study of output parameters based
on structured data, not original data.
STEP 4: The genetic algorithm integrates genetic algorithm K-means into a modern statistical
clustering method.
8. RESULTS AND DISCUSSION
In our opinion, genetic algorithms are original methods that can be used for the extraction of
functions. They suggest a system for the evaluation of individuals based on the combination of
SVM[15] classifiers that have been qualified for each function. More specifically, we associate
an SVM classifier with each primitive and implement a selection of classifiers. In the method we
propose, classifier learning is Performed one step before each primitive genetic algorithm. The
genetic algorithm fitness function is calculated by a mixture of these classes. We significantly
reduce the training period for the that classifier. Therefore, the proposed selection approach
significantly reduces implementation time. This dataset includes569 reports of breast cancer
cases, 357 of whom were healthy, and 212 of whom were malignant. Class name and 32 function
number are identical to the brain or malignant type of breast cancer. This functions are Measured
by visual representation of the breast mass aspirates needle (F.N.A.) representing the cell nucleus
characteristics in the photo.
8.1. S.M.O. Based SVM Algorithm Results
We run 5 K folded philtre output S.M.O. Classifier. Various other algorithms were contrasted
with our system and can be seen below:
Where, B = Benign, M = Malignant.
It can be shown that S.M.O. algorithm worked better in terms of Accuracy, Recall, F-measure,
and R.O.C. field than other algorithms.
8. International Journal on Soft Computing, Artificial Intelligence and Applications (IJSCAI), Vol.9, No.4,
November 2020
8
8.2. Genetic Clustering Algorithm Results
Founded by [15]. Standard Manhattan Distance class is used for outcome similarity calculations.
Easy K-Means incorporates the basic missing qualities. If a mechanism creates a chromosome
with all the records of one cluster, chromosomes are changed until at least two
Here, the algorithm took 38.81 seconds to run.
Ultimately, it was found that the overall wrongly clustered instances is 36, 6.32 percent of the
entire data collection. Visualization of cluster forming can be seen in the figure below indicating
cluster formation selection.
9. International Journal on Soft Computing, Artificial Intelligence and Applications (IJSCAI), Vol.9, No.4,
November 2020
9
Blue in the database shows malignant examples, and red in the Map indicates Benign cases. The
plot reveals that clustering was successful.
9. CONCLUSION
This essay presents a new approach to feature collection. This selection approach is based on
genetic algorithms combined with 5k fold-cross-validation SVM classification. Our system was
evaluated on the Wisconsin dataset for breast cancer with several other forms classifier. We
obtained very high accuracy results in grouping and quick results in clustering Breast-cancer
diagnosis. This method not only provides a more accurate model estimation utility estimation, but
also helps to assess the best ML algorithms parameters. Results show the suggested method's
robustness. Our goal is to launch the G.A. immediately. Analysis to ensure the required balance
of consistency and time of diverse datasets from other fields of inquiry.
REFERENCES
1. E.C. E.C. Fear, P.M. Meaney, and M.A.Stuchly,"Breast Cancer Detection Microwaves, "IEEE
Potentials, vol.22, pp.12-18,February-March 2003.
10. International Journal on Soft Computing, Artificial Intelligence and Applications (IJSCAI), Vol.9, No.4,
November 2020
10
2. Homer MJ.Mammographic: A realistic solution. Boston, MA, second edition, 1997.
3. Reston VA, Illustrated Breast Imaging Reporting and Data System (BI-RADSTM), third edition,
1998.
4. Urmaliya, Ajay, Singhai. (2013). (2013). Minimal sequential optimization to help vector machine for
breast cancer diagnosis function collection. 2013 IEEE 2nd International Image Processing
Conference, IEEE ICIIP 2013. 481–486. ICIIP.2013. 6707638 10.1109/2013.
5. Bittern, R., Dolgobrodov, D., Marshall, R., Moore, P., Steele, R. Artificial neural cancer networks.
All Hands Conference 19 (2007), pp.251 – 263.
6. Bellaachia, A., AND Guven, E. Estimating breast cancer longevity through data mining.
7. Afef Ben Brahim, Mohamed Limam, 2017, "Ensemble feature selection for high-dimensional data: a
new approach and comparative analysis," IFCS, vol. 12(4), 937-952
8. Huerta, E.B., 2006, "A Hybrid GA / SVM method for gene discovery and microarray data
classification," springer, vol.3907 pp.34–44
9. Sallehuddin, R.N.H.aidillah, S.H., Mustaffa, N.H., 2014, "Liver cancer detection using artificial
neural networks and supporting vector machines" In: Proceedings of the International Conference on
Progress in Communication Network, and Computation, Elsevier Research, pp 487-493.
10. Jabbar, M.A., Deekshatulu, B.L., Chandra, 2012, "Heart disease prediction method using associative
and genetic algorithm."
11. S. S. Nikam, "A Comparative Study of Data Mining Algorithms Classification Strategies," Oriental
Computer Science & Technology Journal, vol.8, pp.13-19, April 2015.
12. S. Neelamegam, E.Ramaraj, "International Journal of P2P Network Developments and Technology
(IJPTT), vol. 4, pp. 369-374, September 2013.
13. Eibe Frank, Mark A. Hall, Witten (2016). Workbench WEKA. "Data Mining: Functional Machine
Learning Methods and Techniques," Morgan Kaufmann, Fourth Edition, 2016.
14. Fourteen. "Platt, J.C.: Minimum sequential optimization: quick algorithm for training vector
machines. MSR-TR-98- 14, Microsoft Research, 1998.
15. Fifteen. 'Islam, M. Z., Estivill-Castro, V., Rahman, M. A. (From 2018). Combining K-Means and a
Genetic Algorithm into a Novel Method for High Quality Clustering. Application-expert systems.
91:402-417 AM.
16. Sangaiah, Vincent Antony Kumar, A. (2019). (2019). Improving the efficiency of medical diagnosis
using hybrid function selection by reliefs and entropy-based genetic search (RF-EGA) approach:
breast cancer prediction. Computing array. 22. 22. 10.1007 / s10586-1702-5.