This study aims to predict Melasma from users' community data combined with dermatologists' medical practice data, and to make recommendations for patient screening. It also helps reduce treatment costs and supports remote patient treatment. We built a machine learning model to assist dermatologists in predicting a person's risk of Melasma from his or her community information. People can use this model through an application to track their risk of Melasma. Combining input community data with the expertise of Melasma specialists, we built a dataset with relevant information to predict Melasma. Based on this dataset, we statistically described the data characteristics and the correlated parameters that may cause Melasma, then used the XGBoost algorithm to build a machine learning model that predicts whether a person has Melasma. The results will be applied to help predict whether a person may have Melasma from community information combined with medical practice knowledge about the disease. This result makes it possible to continue researching and applying artificial intelligence to support the diagnosis and treatment of Melasma.
Breast cancer detection using machine learning approaches: a comparative study IJECEIAES
As the cause of breast cancer has not yet been clearly identified and no method to prevent its occurrence has yet been developed, early detection plays a significant role in improving the survival rate. Artificial intelligence approaches have been playing an important role in enhancing the diagnosis of breast cancer. This work investigates eight classification models that are commonly used to predict breast cancer. These classifiers include single and ensemble classifiers. A trusted dataset was refined by applying five different feature selection methods to keep only weighted features and discard the others. Accordingly, a dataset of only 17 features was developed. In our experiments, three classifiers, multi-layer perceptron (MLP), support vector machine (SVM), and stack, compete with each other by attaining higher classification accuracy than the others. SVM ranks at the top, with an accuracy of 97.7% and classification errors of 0.029 false negative (FN) and 0.019 false positive (FP). SVM is therefore the best classifier, outperforming even the stack classifier.
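The accuracy, false negative, and false positive figures quoted above can be derived from a confusion matrix. The sketch below is illustrative only: the counts are hypothetical, and the abstract does not say which denominators the authors used, so the rate definitions here (FN over actual positives, FP over actual negatives) are an assumption.

```python
def confusion_rates(tp, tn, fp, fn):
    """Return accuracy plus false negative and false positive rates.

    Assumed definitions (not confirmed by the abstract):
      fn_rate = FN / (FN + TP), the share of actual positives that were missed
      fp_rate = FP / (FP + TN), the share of actual negatives that were flagged
    """
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "fn_rate": fn / (fn + tp),
        "fp_rate": fp / (fp + tn),
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
rates = confusion_rates(tp=90, tn=100, fp=4, fn=6)
print(rates)
```

In a screening setting the false negative rate is usually the figure to watch, since a missed malignant case costs more than an unnecessary follow-up.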
This document describes an advanced machine learning approach for predicting skin cancer. It discusses using machine learning algorithms like Naive Bayes, Decision Tree, Random Forest on a dataset to estimate disease risk and determine algorithm accuracy. The paper focuses on developing a system that integrates symptom and medical data using machine learning algorithms like K-means to provide accurate disease predictions.
An approach of cervical cancer diagnosis using class weighting and oversampli... TELKOMNIKA JOURNAL
Globally, there were 604,127 new cases of cervical cancer and 341,831 deaths in 2020, according to the Global Cancer Observatory. In addition, the number of cervical cancer patients who have no symptoms has grown recently. Giving patients early notice of the possibility of cervical cancer is therefore useful, since it gives them a clear understanding of their state of health. Artificial intelligence (AI), particularly machine learning, is increasingly used to detect cervical cancer. With the help of a logit model and a new deep learning technique, we hope to identify cervical cancer using patient-provided data. For better outcomes, we employ Keras deep learning together with class weighting and oversampling. Compared with the actual diagnostic results, the experimental model accuracy is 94.18%, which also demonstrates successful cervical cancer prediction with the logit model.
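Class weighting and oversampling, the two imbalance techniques named in this abstract, can be sketched in plain Python. This is a minimal illustration under assumptions: the abstract does not give the weighting formula or the oversampling scheme, so the common "balanced" heuristic (n_samples / (n_classes * class_count)) and simple random duplication of minority rows are used here, and all function names are hypothetical.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency (assumed heuristic)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, cnt in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - cnt):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical toy data: 8 negative rows and 2 positive rows.
labels = [0] * 8 + [1] * 2
rows = [[i] for i in range(10)]
weights = balanced_class_weights(labels)
new_rows, new_labels = random_oversample(rows, labels)
print(weights, Counter(new_labels))
```

Weights of this form are what a training routine would typically accept as per-class loss multipliers; oversampling should be applied to the training split only, so that duplicated rows never leak into evaluation data.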
Breast cancer diagnosis: a survey of pre-processing, segmentation, feature e... IJECEIAES
Machine learning methods have attracted interest in medicine for many years and have achieved successful results in various fields of medical science. This paper examines the effects of using machine learning algorithms in the diagnosis and classification of breast cancer from mammography imaging data. Cancer diagnosis is the identification of images as cancer or non-cancer, and it involves image preprocessing, feature extraction, classification, and performance analysis. This article studies 93 references from previous years in the field of processing and tries to find an effective way to diagnose and classify breast cancer. Based on the results of this research, it can be concluded that most of today’s successful methods rely on deep learning. Finding a new method requires an overview of existing deep learning methods in order to make a comparison and case study.
IRJET- Cancer Disease Prediction using Machine Learning over Big Data IRJET Journal
1. The document discusses using machine learning algorithms to predict cancer and other diseases by analyzing big healthcare data. It specifically looks at using support vector machines (SVM) for cancer prediction and classification.
2. SVM is presented as a powerful machine learning tool for cancer classification and identification of biomarkers, drug targets, and cancer-driving genes. The paper also examines applying machine learning algorithms like SVM to predict outbreaks of chronic diseases in populations using medical data.
3. The researchers aim to improve disease prediction accuracy by addressing issues with incomplete or inconsistent medical data from different regions. They also seek to enable early diagnosis and treatment by analyzing large healthcare datasets with machine learning.
Breast Tumor Detection Using Efficient Machine Learning and Deep Learning Tec... mlaij
Breast cancer tissues grow when cells in the breast expand and divide uncontrollably, resulting in a lump of tissue commonly called a tumor. Breast cancer is the second most prevalent cancer among women, following skin cancer. While it is more commonly diagnosed in women aged 50 and above, it can affect individuals of any age. Although it is rare, men can also develop breast cancer, accounting for less than 1% of all cases, with approximately 2,600 cases reported annually in the United States. Early detection of breast tumors is crucial in reducing the risk of developing breast cancer. A publicly available dataset containing features of breast tumors was used to identify breast tumors with machine learning and deep learning techniques. Various prediction models were constructed, including logistic regression (LR), decision tree (DT), random forest (RF), support vector machine (SVM), Gradient Boosting (GB), Extreme Gradient Boosting (XGB), Light GBM, and a recurrent neural network (RNN) model. These models were trained to classify and predict breast tumor cases based on the provided features.
BREAST TUMOR DETECTION USING EFFICIENT MACHINE LEARNING AND DEEP LEARNING TEC... mlaij
This document describes a study that uses machine learning and deep learning techniques to detect breast tumors using a publicly available dataset. Various classification models were developed including logistic regression, decision trees, random forests, support vector machines, gradient boosting, extreme gradient boosting, LightGBM, and a recurrent neural network. These models were trained on features from the dataset to classify breast tumor cases as benign or malignant. The models' performance was evaluated using various metrics like accuracy, precision, recall, and F1 score. Ensemble techniques like bagging and boosting were also explored to improve the models' performance at detecting breast cancer.
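The evaluation metrics listed in this summary (accuracy, precision, recall, and F1 score) follow directly from true and predicted labels. The sketch below uses hypothetical labels, with 1 standing for malignant and 0 for benign; it illustrates the standard binary definitions and is not the study's own evaluation code.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels: 1 = malignant, 0 = benign.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```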
Breast Tumor Detection Using Efficient Machine Learning and Deep Learning Tec... mlaij
Machine Learning and Applications: An International Journal (MLAIJ) is a quarterly open access peer-reviewed journal that publishes articles which contribute new results in all areas of machine learning. The journal is devoted to the publication of high quality papers on theoretical and practical aspects of machine learning and applications. The goal of this journal is to bring together researchers and practitioners from academia and industry to focus on machine learning advancements and to establish new collaborations in these areas. Original research papers and state-of-the-art reviews are invited for publication in all areas of machine learning.
Authors are solicited to contribute to the journal by submitting articles that illustrate research results, projects, surveying works and industrial experiences that describe significant advances in the areas of machine learning.
This document presents the aim and methodology of a study that aims to develop a machine learning model to predict measles outbreaks. The study will collect a large, diverse dataset from various health sources to train models. It will preprocess the data, select features, train and evaluate models, and deploy the best model in a web app. The model is expected to accurately predict measles likelihood and outbreaks by identifying important risk factors from the extensive dataset. The results could help control measles spread, especially in under-resourced areas.
Melanoma Skin Cancer Detection using Image Processing and Machine Learning ijtsrd
Dermatological diseases are among the biggest medical issues of the 21st century because their diagnosis is highly complex and expensive, and human interpretation is difficult and subjective. For fatal diseases like melanoma, diagnosis in the early stages plays a vital role in determining the probability of being cured. We believe that automated methods will help in early diagnosis, especially with a set of images covering a variety of diagnoses. Hence, in this article we present a completely automated system for recognizing dermatological diseases from lesion images, a machine intervention in contrast to conventional detection by medical personnel. Our model is designed in three phases: data collection and augmentation, model design, and finally prediction. We used multiple AI algorithms, such as a Convolutional Neural Network and a Support Vector Machine, and combined them with image processing tools to form a better structure, leading to a higher accuracy of 85%. Vijayalakshmi M M "Melanoma Skin Cancer Detection using Image Processing and Machine Learning" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-4, June 2019, URL: https://www.ijtsrd.com/papers/ijtsrd23936.pdf
Paper URL: https://www.ijtsrd.com/engineering/other/23936/melanoma-skin-cancer-detection-using-image-processing-and-machine-learning/vijayalakshmi-m-m
A comprehensive study on disease risk predictions in machine learning IJECEIAES
Over recent years, multiple disease risk prediction models have been developed. These models use various patient characteristics to estimate the probability of outcomes over a certain period of time and hold the potential to improve decision making and individualize care. Discovering hidden patterns and interactions in medical databases, together with growing evaluation of disease prediction models, has become crucial. Traditional clinical findings require many trials, which can complicate disease prediction. This paper presents a comprehensive study of the different strategies used to predict disease. Applying these techniques to healthcare data has improved risk prediction models for identifying the patients who would benefit from disease management programs to reduce hospital readmission and healthcare cost, but the results of these endeavors have been shifted.
Machine learning is the field that focuses on how computers learn from data. Today, machine learning is playing an integral role in the medical industry. This is due to its ability to process huge datasets beyond the scope of human capability, and then convert the data analyzed into clinical insights that aid physicians in providing care. Machine learning is a powerful, relatively easy to implement tool with numerous possibilities to enhance medical practice. The applications of machine learning in medicine are advancing medicine into a new realm. Therefore, educating the next generation of medical professionals with machine learning is essential. This paper provides a brief introduction to applying machine learning in medicine. Matthew N. O Sadiku | Sarhan M. Musa | Adedamola Omotoso "Machine Learning in Medicine: A Primer" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-2 , February 2019, URL: https://www.ijtsrd.com/papers/ijtsrd20255.pdf
Paper URL: https://www.ijtsrd.com/computer-science/other/20255/machine-learning-in-medicine-a-primer/matthew-n-o-sadiku
Machine Learning and the Value of Health Technologies Covance
Machine learning can be applied through the development of algorithms that can unravel or "learn" complex associations in large datasets with limited human input. These algorithms are capable of making predictions that go beyond our capabilities as humans and they can process and analyze more possibilities. Machine learning may help us find answers to questions that we didn't even think of in the past, revealing evidence previously hidden among the data. We can use these methods to dig up imperceptible patterns and allow health technologies to be used at the right time and for the right patient population. (A4 Version)
Giving more insight for automatic risk prediction during pregnancy with inter... journalBEEI
The maternal mortality rate (MMR) reported in Indonesia's intercensal population survey (SUPAS) is considered high. For pregnancy risk detection, the public health center (puskesmas) applies a Poedji Rochjati screening card (KSPR) with 20 features. In addition to the KSPR, pregnancy risk monitoring is assisted by a pregnancy control card. Because the two cards differ in the number of features, the features need to be reconciled. Our objectives are to determine the most influential features, explore the links among features on the KSPR and pregnancy control cards, and build a machine learning model for predicting pregnancy risk. For the first objective, we use correlation-based feature selection (CFS) and the C5.0 algorithm. The next objective is addressed by taking the union of the features produced by the two techniques. In the machine learning experiment on these features, the XGBoost algorithm achieved the highest accuracy at 94%, followed by the random forest, Naïve Bayes, and k-nearest neighbor algorithms at 87%, 66%, and 60% respectively. Interpretability is implemented with SHAP and LIME to provide more insight into the classification model. In conclusion, the similar features generated by the two interpretation approaches confirmed that Cesar was dominant in determining pregnancy risk.
Tomato Disease Fusion and Classification using Deep Learning IJCI JOURNAL
Tomato plants' susceptibility to diseases imperils agricultural yields. About 30% of the total crop loss is attributable to plants with disease. Detecting such illnesses in the plant is crucial to avoid significant output losses. This study introduces "data fusion" to enhance disease classification by amalgamating distinct disease-specific traits from leaf halves. Data fusion generates synthetic samples, fortifying a TensorFlow Keras deep learning model using a diverse tomato leaf image dataset. Results show the augmented model's efficacy, particularly for diseases marked by overlapping traits, along with improved disease recognition accuracy and insights into disease interactions. Evaluation metrics (accuracy 0.95, precision 0.58, recall 0.50, F1 score 0.51) spotlight balanced performance. While the accuracy is commendable, the precision-recall interplay calls for further examination. In conclusion, data fusion emerges as a promising avenue for refining disease classification, effectively addressing challenges rooted in trait overlap. The integration of TensorFlow Keras underscores the potential for enhancing agricultural practices. Sustained efforts toward better recall remain pivotal for future advancements.
This document discusses how machine learning can be applied to help plastic surgeons better analyze and interpret the large amounts of patient data that are now routinely collected. It begins by explaining that traditional data analysis techniques struggle with "big data," which contains complex patterns. Machine learning, a subfield of artificial intelligence, can generate algorithms capable of acquiring knowledge from historical examples to help address this challenge. The document then provides examples of how machine learning has already been successfully applied in other fields and in cancer treatment. It proposes that plastic surgeons should also look to machine learning approaches to more efficiently deliver healthcare and improve surgical outcomes by extracting meaningful insights from their extensive patient data collections. Specific potential applications discussed include burn surgery, microsurgery, and various types of reconstruct
PREDICTION OF BREAST CANCER USING DATA MINING TECHNIQUES IAEME Publication
Women who have recovered from breast cancer (BC) constantly fear a relapse. Having endured meticulous treatment, they regard recurrence as their biggest fear. With current advances in technology, however, early prediction of recurrence can enable patients to get treatment sooner. The availability of extensive data and advanced techniques makes precise and fast prediction possible. This study aims to compare the accuracy of several existing data mining algorithms in predicting BC recurrence. It embeds particle swarm optimization as feature selection into an ANN classifier, with the objective of increasing the accuracy of the prediction model.
Ensemble Classifier Approach in Breast Cancer Detection and Malignancy Gradin... ijmpict
The number of diagnosed cases of breast cancer is increasing annually and unfortunately translates into a high mortality rate. Cancer is hard to detect in its early stages because the malicious cells show properties (density) similar to those of the non-malicious cells. The mortality rate could be reduced if breast cancer were detected in its early stages, but current systems have not achieved a fully automatic system that can both detect breast cancer and determine its stage. Estimating the malignancy grade is important for diagnosing the degree of growth of malicious cells as well as for selecting a proper therapy for the patient. Therefore, a complete and efficient clinical decision support system is proposed that performs breast cancer malignancy grading very efficiently. The system is based on the image processing and machine learning domains. The class imbalance problem, a machine learning problem, occurs when instances of one class far outnumber the instances of the other class, resulting in inefficient classification of samples and hence a poor decision support system. Therefore EUSBoost, an ensemble-based classifier, is proposed; it is efficient and able to outperform other classifiers because it combines the benefits of a boosting algorithm with random undersampling techniques. A comparison of EUSBoost with other techniques is also shown in the paper.
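The random undersampling idea that this abstract pairs with boosting can be sketched as follows. This shows only the resampling step, under assumptions: it is not the EUSBoost algorithm itself, whose selection strategy the abstract does not detail, and the function name and toy data are hypothetical.

```python
import random
from collections import defaultdict


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to match the smallest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    target = min(len(items) for items in by_class.values())
    out_rows, out_labels = [], []
    for label, items in by_class.items():
        # sample() draws without replacement, so no row is duplicated.
        for row in rng.sample(items, target):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical toy data: 8 majority rows (class 0) and 2 minority rows (class 1).
rows = list(range(10))
labels = [0] * 8 + [1] * 2
small_rows, small_labels = random_undersample(rows, labels)
print(small_rows, small_labels)
```

In a boosting ensemble, a step like this would typically be repeated with a different random subset for each weak learner, so that the discarded majority rows still influence the ensemble as a whole.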
Classification Algorithm Based Analysis of Breast Cancer Data IIRindia
Classification algorithms are frequently used to analyze the various kinds of data available in different repositories that have real-world applications. The main objective of this research work is to assess the performance of classification algorithms in analyzing breast cancer data by analyzing mammogram images based on their characteristics. Different attribute values of cancer-affected mammogram images are considered for analysis in this work. The patients' food habits, age, lifestyle, occupation, their problems related to the disease, and other information are taken into account for classification. Finally, the performance of the classification algorithms J48, CART, and ADTree is reported along with their accuracy. The accuracy of these algorithms is measured by various measures such as specificity, sensitivity, and kappa statistics (Errors).
- The document discusses the use of Bayesian belief networks (BBNs) to model complex public health programs. BBNs allow representation of relationships between various inputs, outputs, and impact indicators in public health.
- BayesiaLab software is highlighted as a tool for building and understanding BBN models of public health programs. Various examples are provided of building BBN models from existing data on topics like malaria programs, market shares, and student perceptions.
- The examples demonstrate how BBNs can be constructed from expert knowledge or learned from data using algorithms based on information theory and the minimum description length principle. Conditional probabilities are key to representing dependencies between variables in a BBN model.
This document proposes using a DenseNet-II neural network model to classify mammogram images as benign or malignant. It first preprocesses mammogram images through normalization and data augmentation. It then improves the original DenseNet model by replacing the first convolutional layer with an Inception structure, creating a new DenseNet-II model. This model, along with other common models, is tested on mammogram data, and the DenseNet-II model achieves the highest average accuracy of 94.55% for benign-malignant classification.
Natural language processing through the subtractive mountain clustering algor... kevig
In this work, the subtractive mountain clustering algorithm has been adapted to the problem of natural language processing with a view to constructing a chatbot that answers questions posed by the user. The implemented version of the algorithm allows for the association of a set of words into clusters. After finding the centre of every cluster (the most relevant word), all the others are aggregated according to a defined metric adapted to the language processing realm. All the relevant stored information (necessary to answer the questions) is processed by the algorithm, as are the questions. The correct processing of the text enables the chatbot to produce answers that relate to the posed queries. Since we have in view a chatbot to help elderly people with medication, to validate the method we use the package insert of a drug as the available information and formulate associated questions. Errors in medication intake among elderly people are very common. One of the main causes of this is their loss of ability to retain information. The large amount of medicine required at an advanced age is another limiting factor. Hence, an interactive aid system, preferably using natural language, to help the older population with medication is in demand. A chatbot based on a subtractive cluster algorithm is the chosen solution.
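The centre-finding step of subtractive clustering can be sketched on numeric points. This is a simplified illustration under stated assumptions: it uses Euclidean distance on 2-D points rather than the word metric described in the abstract, the radii `ra` and `rb` and the `stop_ratio` threshold are hypothetical defaults, and the stopping rule is a reduced form of the usual accept/reject criteria.

```python
import math


def subtractive_clustering(points, ra=1.0, rb=1.5, stop_ratio=0.15):
    """Pick cluster centres as the data points with the highest density potential."""
    alpha = 4.0 / ra ** 2   # controls the neighbourhood that adds to a potential
    beta = 4.0 / rb ** 2    # controls the neighbourhood suppressed around a centre

    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Initial potential: each point scores high when many points lie close to it.
    potential = [
        sum(math.exp(-alpha * sqdist(p, q)) for q in points) for p in points
    ]
    centres, first_peak = [], None
    while True:
        k = max(range(len(points)), key=potential.__getitem__)
        peak = potential[k]
        if first_peak is None:
            first_peak = peak
        elif peak < stop_ratio * first_peak:
            break  # remaining density is too low to justify another centre
        centres.append(points[k])
        # Subtract the chosen centre's influence so nearby points are not re-picked.
        potential = [
            p - peak * math.exp(-beta * sqdist(x, points[k]))
            for p, x in zip(potential, points)
        ]
    return centres


# Hypothetical toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
centres = subtractive_clustering(points)
print(centres)
```

Because every centre is itself a data point, the same scheme carries over to words once a suitable distance between words is defined, which is the adaptation the abstract describes.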
A deep convolutional structure-based approach for accurate recognition of ski... IJECEIAES
One-third of all cancer diagnoses worldwide are skin malignancies. One of the most common tumors, skin cancer can develop from a variety of dermatological conditions and is subdivided into different categories based on its texture, color, body, and other morphological characteristics. Because skin cancer incidence has been rising recently, early identification is the most effective strategy for lowering the mortality rate of melanoma. This research proposes a computer-aided diagnosis (CAD) system to categorize dermoscopy images into the four diagnosis classifications of melanoma, benign, malignant, and human against machine (HAM) not melanoma. Experimental results show that the proposed approach achieved 97.25% classification accuracy. The proposed technique offers a less complex, cutting-edge framework for automating the identification of skin cancer and expediting diagnosis in order to save lives.
Predictive modeling for breast cancer based on machine learning algorithms an...IJECEIAES
Breast cancer is one of the leading causes of death among women worldwide. However, early prediction of breast cancer plays a crucial role. Therefore, strong needs exist for automatic accurate early prediction of breast cancer. In this paper, machine learning (ML) classifiers combined with features selection methods are used to build an intelligent tool for breast cancer prediction. The Wisconsin diagnostic breast cancer (WDBC) dataset is used to train and test the model. Classification algorithms, including support vector machine (SVM), light gradient boosting machine (LightGBM), random forest (RF), logistic regression (LR), k-nearest neighbors (k-NN), and naïve Bayes, were employed. Performance measures for each of them were obtained, namely: accuracy, precision, recall, F-score, Kappa, Matthews correlation coefficient (MCC), and time. The results indicate that without feature selection, LightGBM achieves the highest accuracy at 95%. With minimum redundancy maximum relevance (mRMR) feature selection (15 features), LightGBM outperforms other classifiers, achieving an accuracy of 98%. For Pearson correlation coefficient feature selection (15 features), LightGBM also excels with a 95% accuracy rate. Lasso feature selection (5 features) produces varied results across classifiers, with logistic regression achieving the highest accuracy at 96%. These findings underscore the importance of feature selection in refining model performance and in improving detection for breast cancer.
A Comprehensive Survey On Predictive Analysis Of Breast CancerAngela Shin
This document summarizes a research paper that analyzes different techniques for predicting breast cancer. It begins by noting that breast cancer is one of the most common cancers and early prediction can reduce deaths. It then discusses how earlier methods used data mining, machine learning, and hybrid approaches to predict diagnoses, but proposes using a deep learning technique with a faster RNN algorithm to achieve higher accuracy. The document reviews different machine learning and data mining algorithms that have been used for breast cancer prediction, including supervised techniques like Gaussian mixture models, decision trees, and random forests. It concludes that machine learning and deep learning can provide cheap, easy, and accurate methods to detect tumor type and help medical studies.
IRJET - Development of a Predictive Fuzzy Logic Model for Monitoring the Risk...IRJET Journal
This document describes the development of a predictive fuzzy logic model to monitor the risk of sexually transmitted diseases (STDs) in female humans. The researchers identified 9 non-invasive risk factors associated with STDs in Nigeria such as marital status, socio-economic status, age of first sexual intercourse, number of sexual partners, and history of STDs. Fuzzy logic modeling was used to develop a classification system where the risk factors were input variables and the output was the predicted risk of STDs. Membership functions were created to map the linguistic labels of the risk factors and predicted risk levels. Over 2,300 inference rules were formulated relating combinations of risk factor levels to predicted STD risk. The model was simulated in MATLAB and results showed it
Basavarajeeyam is an important text for ayurvedic physician belonging to andhra pradehs. It is a popular compendium in various parts of our country as well as in andhra pradesh. The content of the text was presented in sanskrit and telugu language (Bilingual). One of the most famous book in ayurvedic pharmaceutics and therapeutics. This book contains 25 chapters called as prakaranas. Many rasaoushadis were explained, pioneer of dhatu druti, nadi pareeksha, mutra pareeksha etc. Belongs to the period of 15-16 century. New diseases like upadamsha, phiranga rogas are explained.
NVBDCP.pptx Nation vector borne disease control programSapna Thakur
NVBDCP was launched in 2003-2004 . Vector-Borne Disease: Disease that results from an infection transmitted to humans and other animals by blood-feeding arthropods, such as mosquitoes, ticks, and fleas. Examples of vector-borne diseases include Dengue fever, West Nile Virus, Lyme disease, and malaria.
APPLYING MACHINE LEARNING TO PREDICT MELASMA
Ho Van Lam(1), Vu Tuan Anh(2), Pham Thi Hoang Bich Diu(2), Tran Xuan Viet(2)
1. Faculty of Information Technology, Quy Nhon University.
2. Quyhoa National Leprosy Dermatology Hospital, Binh Dinh, Vietnam.
Email: hovanlam@qnu.edu.vn; drvtavn@gmail.com; bichdiuqnqh@gmail.com; thstranxuanviet@gmail.com
Abstract - This study aims to predict Melasma from
community data supplied by users, combined with
dermatologists' medical practice data, and to make
necessary recommendations for patient screening. It
is also intended to help reduce treatment costs and
support remote patient treatment. We built a machine
learning model to assist dermatologists in predicting
a person's risk of Melasma from his/her community
information. People can use this model through an
application to track their risk of Melasma. Combining
input community data with the expertise of Melasma
specialists, we built a dataset with information
relevant to predicting Melasma. Based on this dataset,
we statistically described the data characteristics
and the correlated data parameters that may cause
Melasma, then used the XGBoost algorithm to build a
machine learning model that predicts whether a person
has Melasma or not. The results will be applied to
help predict whether a person may have Melasma from
community information combined with medical practice
knowledge about the disease. They also provide a basis
for further research on applying artificial
intelligence to support the diagnosis and treatment
of Melasma.
Key words: XGBoost algorithm, Melasma disease,
machine learning, Melasma prediction.
I. INTRODUCTION
Machine Learning is a field of Artificial
Intelligence concerned with techniques that let
computers learn from data without hand-written
decision rules. Normally, a computer program needs
explicit rules to execute a given task, but with
machine learning, a computer can learn how to
perform the task from input data. In other words,
the computer derives its decision rules from
examples rather than from a programmer. Another view
holds that machine learning is a method of drawing
lines that represent the relationships in a data set
[1], [5], [13]. Combining the expertise of
dermatologists with people's public information on
Melasma [10], we used data analysis techniques,
namely descriptive analysis and data visualisation,
to show correlations among the features of the data.
These may help experts and the public easily monitor
the possibility of having Melasma from a person's
daily information. From the results of the analysis,
we built a machine learning model whose input
combines the expertise of dermatologists
specializing in Melasma with patients' community
information, so that a computer can help predict
whether a person has Melasma or not. The model is
based on the XGBoost algorithm, a machine learning
algorithm regarded as having numerous advantages
[8]. In this study, the XGBoost model predicts
whether a person is at risk of Melasma and estimates
the probability. We also tune the model's parameters
by analyzing properties such as cross-validation
results, learning curves, the confusion matrix, the
ROC-AUC curve, the Precision-Recall curve, and the
data variables that affect the predictive model. We
apply several model evaluation methods to assess the
results obtained, determine whether the model has
met the set goals, analyze the indicators it
achieved, and decide how to use the analysis results
in practice. We also show how the model has been
deployed at Quyhoa National Leprosy Dermatology
Hospital, where it is used in a web application that
helps users predict the likelihood of Melasma after
providing some survey information.
II. DATA IN MELASMA DISEASE
Melasma is an acquired hypermelanosis with
complex etiology and pathogenesis. The primary
lesions of the disease are macules and/or dark brown,
symmetrical patches in sun-exposed areas. Commonly
affected sites are the cheeks, upper lip, chin, and
forehead. Though the disease is benign, it greatly
affects the psychology and appearance of the patients
[9]. In women, the disease can be idiopathic or
related to pregnancy [10].
Descriptive statistics (age, geographical
distribution, group of facial hyperpigmentation
disease, education level, occupation, marital status,
maternity history, family history of Melasma,
medical history and cosmetics use...) were measured
by frequency and percentage. From the collected
data, we removed inadequate data and built a
Melasma prediction model.
International Journal of Computer Science and Information Security (IJCSIS),
Vol. 19, No. 11, November 2021
56 https://sites.google.com/site/ijcsis/
ISSN 1947-5500
The data set used in training and testing the
machine learning model was collected from the
community through a survey. It includes
21 data fields containing information about the persons
being checked and medical practice information, with a
total of 795 recorded samples organized as .csv files.
Descriptive analysis of the data set gives us
further information, such as the variables describing
clinical characteristics and daily habit factors
for Melasma. Through this descriptive analysis,
we obtained independent variables for clinical
characteristics, such as age, occupation, ethnicity,
comorbidities, family history..., as well as
independent variables for habits, such as sun exposure,
cosmetic use, pregnancy, and oral contraceptive use.
This makes the training, evaluation and model
correction process more effective.
From the age information in the data set, we
obtain the age distribution of people who are likely to
have Melasma (Figure 1). The figure shows
that subjects aged 35-45 have a high probability
of Melasma.
Figure 1. Distribution of Melasma by age
In terms of family economic status, the analysis
results also indicate that the proportion of poor and
near-poor patients with Melasma is higher than that
of the non-poor group. A multivariable logistic
regression analysis reveals that the poor and near-
poor groups have a 3.91 times higher risk of
Melasma than the non-poor group.
The analysis also shows that the percentage of
Melasma patients who are pregnant is higher than
that of Melasma patients who are not. A multivariable
logistic regression analysis indicates that those with
a history of Melasma during pregnancy have a 2.93
times higher risk of Melasma than those
without such a history, as
presented in Figure 2.
Figure 2. Distribution of Melasma by history of
Melasma during pregnancy
Occupation also appears to influence
the likelihood of Melasma, as shown in Figure 3.
Figure 3. Distribution of Melasma by occupation
Cosmetics use is another factor that influences
Melasma. A multivariable logistic regression
analysis shows that the use of whitening cosmetics
increases the risk of Melasma by 1.5 times compared
with the group that does not use them, as presented in
Figure 4.
Figure 4. Distribution of Melasma by Cosmetics
use
The dataset includes 21 features:
Birthcontrolpills, Occupation, FamilyEconomy,
Melasmaduringpregnancy, Religion, Familyhistory,
Monthofpregnancy, Morning, Afternoon,
Numberofpregnancies, Usingcosmetics, Noon,
Numberofhoursofsunlightexposure, Yearofbird,
Ageofusingcosmetics, Pathology, Marriage,
Education, Numberofbirths, Ethnicity and
Chemicalexposure. We analyzed their correlation to
gain more insight into the dataset, as shown in
Figure 5. Correlation analysis is used to study the
strength of the relationships and possible connections
between features in the dataset.
Through correlation analysis we obtain a ranking of
correlation coefficients. For example, the pair
(Numberofpregnancies, Numberofbirths) has a
correlation of 98% and the pair (Morning, Afternoon)
42%. This helps us make accurate assessments and
find ways to improve our machine learning
model.
Figure 5. Correlation analysis in the data set
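The ranking of correlation coefficients described above can be sketched as follows. This is a minimal illustration that assumes the Pearson coefficient (the paper does not name the coefficient it used); the helper names `pearson` and `rank_pairs` and the toy values are hypothetical and are not taken from the survey data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / sqrt(var_x * var_y)

def rank_pairs(columns):
    """Return (name_a, name_b, r) for every column pair, strongest first."""
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            pairs.append((a, b, pearson(columns[a], columns[b])))
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)

# Made-up toy values for illustration only, NOT the survey data.
toy = {
    "Numberofpregnancies": [0, 1, 2, 2, 3, 4],
    "Numberofbirths":      [0, 1, 2, 1, 3, 4],
    "Morning":             [1, 0, 1, 1, 0, 1],
}
ranking = rank_pairs(toy)
for a, b, r in ranking:
    print(f"{a} / {b}: {r:.2f}")
```

Sorting by the absolute value keeps strongly negative correlations near the top of the ranking as well.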
Based on the dataset and the analysis results
above, we chose the XGBoost algorithm for the
machine learning model that predicts whether a person
has Melasma, with input data combining the
expertise of dermatologists specializing in
Melasma and patients' community
information. XGBoost is short for eXtreme Gradient
Boosting. It is an efficient and scalable
implementation of the gradient boosting framework [1],
[8], [14]. It has several features: 1. Speed: XGBoost
can automatically do parallel computation. 2. Input
Type: XGBoost takes several types of input data:
Dense Matrix, Sparse Matrix, Data File. 3. Sparsity:
XGBoost accepts sparse input for both the tree booster
and the linear booster, and is optimized for sparse input.
4. Customization: XGBoost supports customized
objective and evaluation functions. 5.
Performance: XGBoost performs well on
several different datasets. In the next section we
present how XGBoost works and apply it to our
dataset to predict whether a person
has Melasma.
III. MACHINE LEARNING MODEL
1) XGBoost algorithm
XGBoost, designed for speed and performance,
is a machine learning algorithm that has been
widely and successfully applied by the machine learning
community in applications and in competitions
on Kaggle. XGBoost stands for eXtreme
Gradient Boosting. It builds on decision tree
algorithms, applies techniques for combining decision
trees, smooths the training loss, and performs
regularization. The following four attributes
have made XGBoost so successful [5], [8], [13]:
- Proportional reduction of leaf nodes
(pruning), which improves model generalization. It is
important that the weak learners have skill but
remain weak. There are a number of ways in which the
trees can be constrained. A good general heuristic is
that the more constrained the tree creation is, the more
trees the model will need; conversely,
the less constrained the individual trees are, the fewer
trees will be required.
- Newton Boosting, which finds the minima
directly instead of following the slope, making the
learning process faster. The predictions of each tree
are added together sequentially. The contribution of
each tree to this sum can be weighted to slow down
the learning of the algorithm. This weighting is
called shrinkage or the learning rate. Each update is
simply scaled by the value of the “learning rate
parameter v”. Similar to a learning rate in stochastic
optimization, shrinkage reduces the influence of
each individual tree and leaves space for future trees
to improve the model.
- An additional randomization parameter, which
reduces the correlation between trees, ultimately
improving the strength of the ensemble. A big insight
from bagging ensembles and random forests was that
trees can be greedily created from subsamples of the
training dataset. The same benefit can be used to reduce the
correlation between the trees in the sequence in
gradient boosting models. This variation of boosting
is called stochastic gradient boosting: at each
iteration, a subsample of the training data is drawn at
random from the full training dataset. The randomly
selected subsample is then used, instead of the full
sample, to fit the base learner.
- Penalization of the trees. Classical
decision trees like CART (Classification and
Regression Tree) are not used as weak learners;
instead, a modified form called a regression tree is
used that has numeric values in the leaf nodes (also
called terminal nodes). The values in the leaves of
the trees are called weights in some literature.
Input: a training set $\{(x_i, y_i)\}_{i=1}^{N}$, a differentiable loss function $L(y, F(x))$, a number of weak learners $M$ and a learning rate $\nu$.
Output: Find a function $F^*(x) = F^*_{(M)}(x) = \sum_{m=0}^{M} F^*_m(x)$ that minimizes the expected error function.
1. Initialize the model with the constant value
$$F^*_{(0)}(x) = \arg\min_{\theta} \sum_{i=1}^{N} L(y_i, \theta).$$
2. For m = 1 to M:
a. Compute the "gradients" $g_m(x_i)$ and "hessians" $h_m(x_i)$:
$$g_m(x_i) = \left[\frac{\partial L(y_i, F(x_i))}{\partial F(x_i)}\right]_{F(x) = F^*_{(m-1)}(x)},$$
$$h_m(x_i) = \left[\frac{\partial^2 L(y_i, F(x_i))}{\partial F(x_i)^2}\right]_{F(x) = F^*_{(m-1)}(x)}.$$
b. Recompute the learning function using the training set $\left\{\left(x_i, -\frac{g_m(x_i)}{h_m(x_i)}\right)\right\}_{i=1}^{N}$ by solving the following optimization problem:
$$\phi_m = \arg\min_{\phi} \sum_{i=1}^{N} \frac{1}{2} h_m(x_i) \left[-\frac{g_m(x_i)}{h_m(x_i)} - \phi(x_i)\right]^2,$$
$$F^*_m(x) = \nu\, \phi_m(x).$$
c. Update the model: $F^*_{(m)}(x) = F^*_{(m-1)}(x) + F^*_m(x).$
3. Returned result
$$F^*(x) = F^*_{(M)}(x) = \sum_{m=0}^{M} F^*_m(x).$$
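The steps above can be sketched in plain Python. This is a minimal illustration and not the XGBoost implementation: it assumes the logistic (log) loss, a single numeric feature, and one-split regression stumps as weak learners, and it omits regularization, pruning and subsampling. The function names (`fit_stump`, `newton_boost`) and the toy data are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(xs, g, h):
    """Step 2b: fit a one-split stump to targets -g/h with weights h.

    The weighted squared error is minimized in each leaf by the value
    -sum(g) / sum(h) taken over the samples that fall in the leaf.
    """
    def leaf_value(idx):
        return -sum(g[i] for i in idx) / sum(h[i] for i in idx)

    def weighted_error(idx, value):
        return sum(0.5 * h[i] * (-g[i] / h[i] - value) ** 2 for i in idx)

    best = None
    cuts = sorted(set(xs))
    for low, high in zip(cuts, cuts[1:]):
        threshold = (low + high) / 2.0
        left = [i for i, x in enumerate(xs) if x <= threshold]
        right = [i for i, x in enumerate(xs) if x > threshold]
        w_left, w_right = leaf_value(left), leaf_value(right)
        error = weighted_error(left, w_left) + weighted_error(right, w_right)
        if best is None or error < best[0]:
            best = (error, threshold, w_left, w_right)
    _, threshold, w_left, w_right = best
    return lambda x: w_left if x <= threshold else w_right

def newton_boost(xs, ys, n_rounds=20, learning_rate=0.3):
    """Return F(x), the raw log-odds score, for binary labels ys in {0, 1}."""
    p0 = sum(ys) / len(ys)
    f0 = math.log(p0 / (1.0 - p0))          # step 1: best constant for log loss
    learners = []
    scores = [f0] * len(xs)
    for _ in range(n_rounds):               # step 2
        p = [sigmoid(s) for s in scores]
        g = [pi - yi for pi, yi in zip(p, ys)]   # 2a: gradient of log loss
        h = [pi * (1.0 - pi) for pi in p]        # 2a: hessian of log loss
        phi = fit_stump(xs, g, h)                # 2b: weak learner
        learners.append(phi)
        # 2c: add the new learner, scaled by the learning rate
        scores = [s + learning_rate * phi(x) for s, x in zip(scores, xs)]
    # step 3: the final model is the sum of all contributions
    return lambda x: f0 + learning_rate * sum(phi(x) for phi in learners)

def log_loss(ys, scores):
    return -sum(y * math.log(sigmoid(s)) + (1 - y) * math.log(1.0 - sigmoid(s))
                for y, s in zip(ys, scores)) / len(ys)

# Made-up toy data: one feature, labels that are not perfectly separable.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
model = newton_boost(xs, ys)
print("log loss of constant model:", round(log_loss(ys, [0.0] * len(ys)), 3))
print("log loss after boosting:   ", round(log_loss(ys, [model(x) for x in xs]), 3))
```

Applying `sigmoid` to the returned score gives the predicted probability of the positive class.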
Figure 6 shows part of an XGBoost decision tree
built on our Melasma dataset.
Figure 6. A part of XGBoost's decision tree diagram
2) Model used for predicting Melasma
The input data used to build and train the Melasma
prediction model is the dataset of a study on
clinical characteristics and some factors related to
Melasma in women in 2016, provided by Quyhoa
National Leprosy Dermatology Hospital, with 795
data samples [11]. The goal is to predict the outcome
variable (y = Results; non-infected = 0, infected = 1).
The data contain 238 Melasma
infected cases and 557 non-infected cases. Gradient
boosting is one of the most powerful techniques for
building predictive models, and we used
XGBoost to build our predictive model with 67% of
the dataset as the training set and 33% as the testing set.
A benefit of using gradient boosting is that after
the boosted trees are constructed, it is relatively
straightforward to retrieve importance scores for
each feature.
Generally, importance provides a score that
indicates how useful or valuable each feature was in
the construction of the boosted decision trees within
the model. The more an attribute is used to make key
decisions with decision trees, the higher its relative
importance.
This importance is calculated explicitly for each
attribute in the dataset, allowing attributes to be
ranked and compared to each other.
Importance is calculated for a single decision tree
by the amount that each attribute split point
improves the performance measure, weighted by the
number of observations the node is responsible for.
The performance measure may be the purity used to
select the split points or another more specific error
function.
The feature importances are then averaged across
all of the decision trees within our model and
shown in Figure 7.
Figure 7. Importance level of features affecting the
outcome
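The importance calculation described above can be sketched as follows. This is a minimal illustration of the description in the text, not XGBoost's internal code: each tree is assumed to be given as a list of `(feature, improvement, n_observations)` tuples, one per split, and the split values below are hypothetical.

```python
from collections import defaultdict

def tree_importance(splits):
    """Importance per feature for one tree.

    Each split's improvement is weighted by the number of observations
    reaching the node; the scores are then normalized to sum to 1.
    """
    scores = defaultdict(float)
    for feature, improvement, n_obs in splits:
        scores[feature] += improvement * n_obs
    total = sum(scores.values())
    return {f: s / total for f, s in scores.items()} if total else {}

def forest_importance(trees):
    """Average the per-tree importances over all trees in the ensemble."""
    summed = defaultdict(float)
    for splits in trees:
        for feature, score in tree_importance(splits).items():
            summed[feature] += score
    return {f: s / len(trees) for f, s in summed.items()}

# Hypothetical splits for two tiny trees; the numbers are illustrative only.
trees = [
    [("Birthcontrolpills", 0.20, 100), ("Familyhistory", 0.10, 60)],
    [("Familyhistory", 0.30, 100), ("FamilyEconomy", 0.10, 40)],
]
importance = forest_importance(trees)
for feature, score in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{feature}: {score:.3f}")
```

Because every tree's scores are normalized before averaging, the final importances also sum to 1 and can be read as percentages.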
Important (influential) variables on the outcome
(infected or non-infected) are: Birth control pills
(9.5%), Family history (9.3%), Family Economy
(8.8%), Month of pregnancy (8.7%), Location
(8.6%), Melasma during pregnancy (6.2%),
Pathology (5.1%), Year of birth (4%), Sun exposure
in the afternoon (4%), Number of pregnancies (3.8%),
Sun exposure at noon (3.7%), Sun exposure in the
morning (3.7%), Number of hours of sunlight exposure
(3.7%), Age of using cosmetics (3.5%) and Education
(3.5%). Occupation, Religion, Using cosmetics,
Number of births, Marriage, …, have little
effect on the outcome.
XGBoost supports early stopping: training halts when
the evaluation metric has not improved for a fixed
number of iterations. On our dataset, the model is
trained on 67% of the data and evaluated on the
remaining 33% after every training epoch.
…
[51] validation_0-logloss:0.553238
[52] validation_0-logloss:0.552787
[53] validation_0-logloss:0.553458
…
[67] validation_0-logloss:0.558049
Stopping. Best iteration:
[52] validation_0-logloss:0.552787
We can see that the model stopped training at
epoch 67 and that the model with the best loss was
observed at epoch 52.
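The stopping rule can be sketched as below. This is a minimal illustration of patience-based early stopping, not the XGBoost code; the function name `early_stop`, the `patience` parameter and the loss values are hypothetical. In the log above, the gap between the best epoch (52) and the stopping epoch (67) is 15 rounds, which would be consistent with a patience of 15, although the setting is not stated in the text.

```python
def early_stop(losses, patience):
    """Return (best_round, stop_round) for a sequence of validation losses.

    Training stops once the loss has not improved for `patience`
    consecutive rounds; rounds are numbered from 0.
    """
    best_round, best_loss = 0, float("inf")
    for round_no, loss in enumerate(losses):
        if loss < best_loss:
            best_round, best_loss = round_no, loss
        elif round_no - best_round >= patience:
            return best_round, round_no
    # The loss kept improving (or the data ran out) before patience expired.
    return best_round, len(losses) - 1

# Made-up validation losses for illustration only.
toy_losses = [0.60, 0.57, 0.55, 0.56, 0.555, 0.56, 0.57]
best, stopped = early_stop(toy_losses, patience=3)
print(f"best round: {best}, stopped at round: {stopped}")
```

The model kept for prediction is the one from the best round, not the one from the round at which training stopped.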
To evaluate our XGBoost predictive model on this
Melasma dataset, we use several methods to
assess the machine learning model's performance,
as described below:
Cross validation: With K-fold cross validation (K = 10),
the search fits 10 folds for each of 81 candidates,
totalling 810 fits. The best parameters across
all searched parameters are {'gamma': 0.6,
'learning_rate': 0.1, 'max_depth': 10, 'n_estimators':
100}, and the cross validation results are
[0.66666667, 0.66666667, 0.71698113, 0.71698113,
0.69811321, 0.75471698, 0.81132075, 0.75471698,
0.67924528, 0.75471698], giving a cross validation
mean accuracy of 0.722013.
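The arithmetic behind these figures can be checked with a short sketch. The fold accuracies are those reported above; everything else is an assumption made for illustration: that the 81 candidates come from four hyperparameters with three values each, and that the folds are cut from a 532-sample training split (67% of 795), which is consistent with the reported fold scores (for example 36/54 and 38/53). The helper names are hypothetical.

```python
def kfold_sizes(n_samples, k):
    """Sizes of k contiguous folds; the first n_samples % k folds get one extra."""
    base, extra = divmod(n_samples, k)
    return [base + 1 if i < extra else base for i in range(k)]

def kfold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for unshuffled k-fold splitting."""
    start = 0
    for size in kfold_sizes(n_samples, k):
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

# Fold accuracies as reported in the text.
fold_scores = [0.66666667, 0.66666667, 0.71698113, 0.71698113, 0.69811321,
               0.75471698, 0.81132075, 0.75471698, 0.67924528, 0.75471698]
mean_accuracy = sum(fold_scores) / len(fold_scores)
print(f"mean accuracy: {mean_accuracy:.6f}")

# Assumption: 4 hyperparameters with 3 candidate values each.
n_candidates = 3 ** 4
n_fits = n_candidates * len(fold_scores)
print(n_candidates, "candidates,", n_fits, "fits")

# Assumption: folds are cut from a 532-sample training split.
fold_sizes = kfold_sizes(532, 10)
print(fold_sizes)
```

Each sample appears in exactly one test fold, so every training record contributes once to the validation scores.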
Learning Curves: We can retrieve the
performance of the model on the evaluation dataset
and plot it to get insight into how learning unfolded
while training. We can then use these collected
performance measures to create a line plot and gain
further insight into how the model behaved on train
and test datasets over training epochs.
Figure 8 shows the logarithmic loss of the
XGBoost model for each epoch on our training and
test datasets.
Figure 9 shows the classification error of the
XGBoost model for each epoch on our training and
test datasets.
Figure 8. XGBoost Learning Curve Log Loss
From the Figure, it looks like there is an opportunity
to stop the learning early, perhaps somewhere
around epoch 40 to epoch 60.
Figure 9. XGBoost Learning Curve Classification
Error
We see a similar story for classification error,
where error appears to go back up at around epoch
60.
Confusion Matrix: A confusion matrix is a
table that compares the predictions of a model with
the actual class labels of the data points. The
confusion matrix of our XGBoost predictive model on
the 795 records of the Melasma dataset is shown in
Figure 10.
In this matrix: Positive (P): Observation is positive
(e.g. is infected). Negative (N): Observation is not
positive (e.g. is not infected). True Positive (TP):
Outcome where the model correctly predicts the
positive class (192 of the 238 infected cases). True
Negative (TN): Outcome where the model correctly
predicts the negative class (519 of the 557
non-infected cases). False Positive (FP): Also called
a type 1 error, an outcome where the model incorrectly
predicts the positive class when it is actually
negative (38). False Negative (FN): Also called a
type 2 error, an outcome where the model
incorrectly predicts the negative class when it is
actually positive (46).
Figure 10. Confusion matrix
Accuracy is the fraction of all predictions
that the model gets right.
Accuracy = Correct Predictions / Total
Predictions.
In terms of the confusion matrix, Accuracy = (TP +
TN)/(TP+TN+FP+FN). For our XGBoost predictive
model, Accuracy = (519+192) /
(519+192+38+46) = 89.4%.
Precision-Recall Curves
Precision
Precision is a ratio of the number of true
positives divided by the sum of the true positives and
false positives. It describes how good a model is at
predicting the positive class. Precision is referred to
as the positive predictive value.
Precision (non-infected) = 92%
Precision (infected) = 83%
Recall
Recall is calculated as the ratio of the number of
true positives divided by the sum of the true
positives and the false negatives
Recall (non-infected) = 93%
Recall (infected) = 81%
Figure 11. Precision-Recall curve
F-Measure
The F-measure (F1-score) is the harmonic mean of
precision and recall.
F1-score (non-infected) = 93%
F1-score (infected) = 82%
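The per-class values above can be recomputed from the four counts in the accuracy formula. The sketch below is a minimal check, not the authors' code. It assumes that 519 and 192 are the correctly classified non-infected and infected cases, which is the assignment consistent with the class totals (557 non-infected, 238 infected); the variable and function names are hypothetical.

```python
def class_metrics(tp, fp, fn):
    """Precision, recall and F1-score for one class treated as positive."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Counts taken from the accuracy formula in the text; the mapping of counts
# to classes is an assumption based on the class totals (557 and 238).
correct_non_infected = 519        # non-infected predicted as non-infected
correct_infected = 192            # infected predicted as infected
non_infected_as_infected = 38     # 519 + 38 = 557 non-infected cases
infected_as_non_infected = 46     # 192 + 46 = 238 infected cases

infected = class_metrics(correct_infected,
                         non_infected_as_infected,
                         infected_as_non_infected)
non_infected = class_metrics(correct_non_infected,
                             infected_as_non_infected,
                             non_infected_as_infected)
total = (correct_non_infected + correct_infected
         + non_infected_as_infected + infected_as_non_infected)
accuracy = (correct_non_infected + correct_infected) / total

for name, (p, r, f1) in (("non-infected", non_infected), ("infected", infected)):
    print(f"{name}: precision={p:.0%} recall={r:.0%} F1={f1:.0%}")
print(f"accuracy={accuracy:.1%}")
```

Note that a false positive for one class is a false negative for the other, which is why the two calls swap the last two arguments.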
ROC-AUC Curves
A useful tool when predicting the probability of
a binary outcome is the Receiver Operating
Characteristic curve, or ROC curve.
It is a plot of the false positive rate (x-axis)
versus the true positive rate (y-axis) for a number of
different candidate threshold values between 0.0
and 1.0. Put another way, it plots the false alarm
rate versus the hit rate.
The true positive rate is calculated as the number
of true positives divided by the sum of the number
of true positives and the number of false negatives.
It describes how good the model is at predicting the
positive class when the actual outcome is positive.
The ROC-AUC curve in Figure 12 presents the
discriminative performance of the model. The area
under the ROC curve (AUC) is 0.93 for "non-infected"
and 0.93 for "infected".
Figure 12. ROC-AUC of model
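How a ROC curve and its AUC are obtained from predicted probabilities can be sketched as follows. This is a minimal illustration with made-up labels and scores; it does not reproduce the values in Figure 12, and the function names are hypothetical.

```python
def roc_points(labels, scores):
    """ROC points (FPR, TPR) obtained by sweeping the threshold over the scores."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Made-up labels (1 = infected) and predicted probabilities, for illustration only.
labels = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.3, 0.35, 0.6, 0.7, 0.9]
points = roc_points(labels, scores)
area = auc(points)
print(points)
print(f"AUC = {area:.3f}")
```

An AUC of 1.0 corresponds to perfect separation of the two classes, and 0.5 to a model that ranks cases no better than chance.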
IV. CONCLUSIONS
In this study, we presented the steps of a data analysis
process in practice and built a machine learning
model using the XGBoost algorithm to predict the
possibility of a user having Melasma.
With this approach, the proposed method
exploits existing community data together with data
collected through surveys to help the machine
learning model reach a prediction accuracy of
89.4%, which assists in the prevention,
diagnosis and treatment of the disease, thereby
helping to reduce the cost of treatment.
The machine learning model that predicts the risk
of Melasma is packaged and embedded into the web
application at https://ramma.bvquyhoa.vn to help
users know their likelihood of having
Melasma and to inform them of habits that may
cause Melasma, so that they can prevent it.
Dermatologists use the application to contact
patients and to help evaluate and improve the model
through their expertise and practical results. The
application updates the data once enough
new data has been collected (the model is set to be
retrained each time 100 new records are entered)
to enhance the accuracy of the model.
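The retraining rule can be sketched as a simple counter. This is a hypothetical illustration and not the deployed application's code: the class name `RetrainScheduler` and the `retrain` callback are assumptions, and only the batch size of 100 comes from the text.

```python
class RetrainScheduler:
    """Collect new records and trigger retraining after every full batch."""

    def __init__(self, retrain, batch_size=100):
        self.retrain = retrain        # callback that receives all records so far
        self.batch_size = batch_size
        self.records = []
        self.pending = 0              # records added since the last retraining

    def add(self, record):
        self.records.append(record)
        self.pending += 1
        if self.pending >= self.batch_size:
            self.retrain(list(self.records))
            self.pending = 0

# Usage with a stand-in callback that only notes how much data it was given.
training_sizes = []
scheduler = RetrainScheduler(retrain=lambda data: training_sizes.append(len(data)))
for i in range(250):
    scheduler.add({"id": i})
print(training_sizes)
```

Passing a copy of the accumulated records to the callback keeps later additions from changing the data while a retraining run is in progress.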
In order to get higher accuracy for the model, it
is necessary to collect community data of many
individuals and from many different regions, though
it would require a lot of effort, time and expense.
REFERENCES
[1] A. Panesar, “Machine Learning and AI for
Healthcare”, Arjun Panesar, 2019.
[2] Dhar, V., “Data science and
prediction”. Communications of the
ACM. 56 (12): 64. 2013. doi:10.1145/2500499.
[3] Deepak Sahoo, Rakesh Chandra Balabantaray,
“Single-Sentence Compression using
XGBoost”, International Journal of Information
Retrieval Research, Volume 9 Issue 3, July-
September 2019
[4] Jacob Montiel et al, “Adaptive XGBoost for
Evolving Data Streams”, arXiv:2005.07353v1
[cs.LG] 15 May 2020.
[5] Jason Brownlee, “XGBoost with Python”,
Machine Learning Mastery, update 2021.
[6] Jinghui Ma et al, “Application of the XGBoost
Machine Learning Method in PM2.5
Prediction: A Case Study of Shanghai”,
Aerosol and Air Quality Research, 20: 128–
138, 2020.
[7] Ramraj S, Nishant Uzir, Sunil R and Shatadeep
Banerjee, “Experimenting XGBoost Algorithm
for Prediction and Classification of Different
Datasets”, International Journal of Control
Theory and Applications, Volume 9, Number
40, 2016.
[8] Tianqi Chen, Carlos Guestrin, “XGBoost : A
scalable tree boosting system”, March 9, 2016,
arXiv:1603.02754 [cs.LG].
[9] Nguyen Van Thuong, “Melasma disease”,
Dermatology Pathology, Volume 1, Medical
Publishing, 143-148, 2017.
[10] Ratna Rajaratnam Asad Salim, Eva Soos
Domanne, "Melasma", Evidence-based
Dermatology. Third Edition, 2014.
[11] Quyhoa National Leprosy Dermatology
Hospital, Binh Dinh, Vietnam, “Dataset of
Study on clinical characteristics and some
factors related to melasma in women in 2016”,
2016.
[12] Sami Smadi et al., “VPN Encrypted Traffic
classification using XGBoost”, International
Journal of Emerging Trends in Engineering
Research, 9(7), July 2021, 960 – 966
[13] Zhiyuan He, Danchen Lin, Thomas Lau, and
Mike Wu, “Gradient Boosting Machine: A
Survey”, arXiv:1908.06951v1 [stat.ML] 19
Aug 2019.
[14] https://github.com/dmlc/xgboost