Logistic Regression (LR) is a well known classification method in the field of statistical learning. It allows probabilistic classification and shows promising results on several benchmark problems. Logistic regression enables us to investigate the relationship between a categorical outcome and a set of explanatory variables. Artificial Neural Networks (ANNs) are popularly used as universal non-linear inference models and have gained extensive popularity in recent years. Research activities are considerable and literature is growing. The goal of this research work is to compare the performance of Logistic Regression and Neural Network models on publicly available medical datasets. The evaluation process of the model is as follows. The logistic regression and neural network methods with sensitivity analysis have been evaluated for the effectiveness of the classification. The Classification Accuracy is used to measure the performance of both the models. From the experimental results it is confirmed that the neural network model with sensitivity analysis model gives more efficient result.
GRAPHICAL MODEL AND CLUSTERINGREGRESSION BASED METHODS FOR CAUSAL INTERACTION...ijaia
The early detection of Breast Cancer, the deadly disease that mostly affects women is extremely complex because it requires various features of the cell type. Therefore, the efficient approach to diagnosing Breast Cancer at the early stage was to apply artificial intelligence where machines are simulated with intelligence and programmed to think and act like a human. This allows machines to passively learn and find a pattern, which can be used later to detect any new changes that may occur. In general, machine learning is quite useful particularly in the medical field, which depends on complex genomic measurements such as microarray technique and would increase the accuracy and precision of results. With this technology, doctors can easily diagnose patients with cancer quickly and apply the proper treatment in a timely manner. Therefore, the goal of this paper is to address and propose a robust Breast Cancer diagnostic system using complex genomic analysis via microarray technology. The system will combine two machine learning methods, K-means cluster, and linear regression.
Adaptive Classification of Imbalanced Data using ANN with Particle of Swarm O...ijtsrd
Customary characterization calculations can be constrained in their execution on exceedingly uneven informational collections. A famous stream of work for countering the substance of class inelegance has been the use of an assorted of inspecting methodologies. In this correspondence, we center on planning alterations neural system to properly handle the issue of class irregularity. We consolidate distinctive rebalance heuristics in ANN demonstrating, including cost delicate learning, and over and under testing. These ANN based systems are contrasted and different best in class approaches on an assortment of informational collections by utilizing different measurements, including G mean, region under the collector working trademark curve, F measure, and region under the exactness review curve. Numerous regular strategies, which can be classified into testing, cost delicate, or gathering, incorporate heuristic and task subordinate procedures. So as to accomplish a superior arrangement execution by detailing without heuristics and errand reliance, presently propose RBF based Network RBF NN . Its target work is the symphonious mean of different assessment criteria got from a perplexity grid, such criteria as affectability, positive prescient esteem, and others for negatives. This target capacity and its enhancement are reliably detailed on the system of CM KLOGR, in light of least characterization mistake and summed up probabilistic plunge MCE GPD learning. Because of the benefits of the consonant mean, CM KLOGR, and MCE GPD, RBF NN improves the multifaceted exhibitions in a very much adjusted way. It shows the definition of RBF NN and its adequacy through trials that nearly assessed RBF NN utilizing benchmark imbalanced datasets. Nitesh Kumar | Dr. Shailja Sharma "Adaptive Classification of Imbalanced Data using ANN with Particle of Swarm Optimization" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-5 , August 2019, URL: https://www.ijtsrd.com/papers/ijtsrd25255.pdfPaper URL: https://www.ijtsrd.com/computer-science/other/25255/adaptive-classification-of-imbalanced-data-using-ann-with-particle-of-swarm-optimization/nitesh-kumar
International Journal of Engineering and Science Invention (IJESI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJESI publishes research articles and reviews within the whole field Engineering Science and Technology, new teaching methods, assessment, validation and the impact of new technologies and it will continue to provide information on the latest trends and developments in this ever-expanding subject. The publications of papers are selected through double peer reviewed to ensure originality, relevance, and readability. The articles published in our journal can be accessed online.
A MODIFIED MAXIMUM RELEVANCE MINIMUM REDUNDANCY FEATURE SELECTION METHOD BASE...gerogepatton
Parkinson’s disease is a complex chronic neurodegenerative disorder of the central nervous system. One of the common symptoms for the Parkinson’s disease subjects, is vocal performance degradation. Patients usually advised to follow personalized rehabilitative treatment sessions with speech experts. Recent research trends aim to investigate the potential of using sustained vowel phonations for replicating the speech experts’ assessments of Parkinson’s disease subjects’ voices. With the purpose of improving the accuracy and efficiency of Parkinson’s disease treatment, this article proposes a two-stage diagnosis model to evaluate an LSVT dataset. Firstly, we propose a modified minimum Redundancy-Maximum Relevance (mRMR) feature selection approach, based on Cuckoo Search and Tabu Search to reduce the features numbers. Secondly, we apply simple random sampling technique to dataset to increase the samples of the minority class. Promisingly, the developed approach obtained a classification Accuracy rate of 95% with 24 features by 10-fold CV method.
Existing model uses structured data to predict the patients of either high risk or low risk.
But for a complex disease, structured data is not a good way to describe the disease.
We propose a new convolutional neural network (CNN)-based multimodal disease risk prediction algorithm using structured and unstructured data from hospital.
In this paper, we mainly focus on the risk prediction of cerebral infarction.
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/scoring-and-predicting-risk-preferences/
This study presents amethodology to determine risk scores of individ-uals, for a given financial risk preference survey. To this end, we use a regression-based iterative algorithm to determine the weights for survey questions in the scoring process. Next, we generate classification models to classify individuals into risk-averse and risk-seeking categories, using a subset of survey questions. We illustrate the methodology through a sample survey with 656 respondents. We find that the demographic (indirect) questions can be almost as successful as risk-related (direct) questions in predicting risk preference classes of respon-dents. Using a decision-tree based classification model, we discuss how one can generate actionable business rules based on the findings.
GRAPHICAL MODEL AND CLUSTERINGREGRESSION BASED METHODS FOR CAUSAL INTERACTION...ijaia
The early detection of Breast Cancer, the deadly disease that mostly affects women is extremely complex because it requires various features of the cell type. Therefore, the efficient approach to diagnosing Breast Cancer at the early stage was to apply artificial intelligence where machines are simulated with intelligence and programmed to think and act like a human. This allows machines to passively learn and find a pattern, which can be used later to detect any new changes that may occur. In general, machine learning is quite useful particularly in the medical field, which depends on complex genomic measurements such as microarray technique and would increase the accuracy and precision of results. With this technology, doctors can easily diagnose patients with cancer quickly and apply the proper treatment in a timely manner. Therefore, the goal of this paper is to address and propose a robust Breast Cancer diagnostic system using complex genomic analysis via microarray technology. The system will combine two machine learning methods, K-means cluster, and linear regression.
Adaptive Classification of Imbalanced Data using ANN with Particle of Swarm O...ijtsrd
Customary characterization calculations can be constrained in their execution on exceedingly uneven informational collections. A famous stream of work for countering the substance of class inelegance has been the use of an assorted of inspecting methodologies. In this correspondence, we center on planning alterations neural system to properly handle the issue of class irregularity. We consolidate distinctive rebalance heuristics in ANN demonstrating, including cost delicate learning, and over and under testing. These ANN based systems are contrasted and different best in class approaches on an assortment of informational collections by utilizing different measurements, including G mean, region under the collector working trademark curve, F measure, and region under the exactness review curve. Numerous regular strategies, which can be classified into testing, cost delicate, or gathering, incorporate heuristic and task subordinate procedures. So as to accomplish a superior arrangement execution by detailing without heuristics and errand reliance, presently propose RBF based Network RBF NN . Its target work is the symphonious mean of different assessment criteria got from a perplexity grid, such criteria as affectability, positive prescient esteem, and others for negatives. This target capacity and its enhancement are reliably detailed on the system of CM KLOGR, in light of least characterization mistake and summed up probabilistic plunge MCE GPD learning. Because of the benefits of the consonant mean, CM KLOGR, and MCE GPD, RBF NN improves the multifaceted exhibitions in a very much adjusted way. It shows the definition of RBF NN and its adequacy through trials that nearly assessed RBF NN utilizing benchmark imbalanced datasets. Nitesh Kumar | Dr. Shailja Sharma "Adaptive Classification of Imbalanced Data using ANN with Particle of Swarm Optimization" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-5 , August 2019, URL: https://www.ijtsrd.com/papers/ijtsrd25255.pdfPaper URL: https://www.ijtsrd.com/computer-science/other/25255/adaptive-classification-of-imbalanced-data-using-ann-with-particle-of-swarm-optimization/nitesh-kumar
International Journal of Engineering and Science Invention (IJESI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJESI publishes research articles and reviews within the whole field Engineering Science and Technology, new teaching methods, assessment, validation and the impact of new technologies and it will continue to provide information on the latest trends and developments in this ever-expanding subject. The publications of papers are selected through double peer reviewed to ensure originality, relevance, and readability. The articles published in our journal can be accessed online.
A MODIFIED MAXIMUM RELEVANCE MINIMUM REDUNDANCY FEATURE SELECTION METHOD BASE...gerogepatton
Parkinson’s disease is a complex chronic neurodegenerative disorder of the central nervous system. One of the common symptoms for the Parkinson’s disease subjects, is vocal performance degradation. Patients usually advised to follow personalized rehabilitative treatment sessions with speech experts. Recent research trends aim to investigate the potential of using sustained vowel phonations for replicating the speech experts’ assessments of Parkinson’s disease subjects’ voices. With the purpose of improving the accuracy and efficiency of Parkinson’s disease treatment, this article proposes a two-stage diagnosis model to evaluate an LSVT dataset. Firstly, we propose a modified minimum Redundancy-Maximum Relevance (mRMR) feature selection approach, based on Cuckoo Search and Tabu Search to reduce the features numbers. Secondly, we apply simple random sampling technique to dataset to increase the samples of the minority class. Promisingly, the developed approach obtained a classification Accuracy rate of 95% with 24 features by 10-fold CV method.
Existing model uses structured data to predict the patients of either high risk or low risk.
But for a complex disease, structured data is not a good way to describe the disease.
We propose a new convolutional neural network (CNN)-based multimodal disease risk prediction algorithm using structured and unstructured data from hospital.
In this paper, we mainly focus on the risk prediction of cerebral infarction.
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/scoring-and-predicting-risk-preferences/
This study presents amethodology to determine risk scores of individ-uals, for a given financial risk preference survey. To this end, we use a regression-based iterative algorithm to determine the weights for survey questions in the scoring process. Next, we generate classification models to classify individuals into risk-averse and risk-seeking categories, using a subset of survey questions. We illustrate the methodology through a sample survey with 656 respondents. We find that the demographic (indirect) questions can be almost as successful as risk-related (direct) questions in predicting risk preference classes of respon-dents. Using a decision-tree based classification model, we discuss how one can generate actionable business rules based on the findings.
Cancer prognosis prediction using balanced stratified samplingijscai
High accuracy in cancer prediction is important to improve the quality of the treatment and to improve the
rate of survivability of patients. As the data volume is increasing rapidly in the healthcare research, the
analytical challenge exists in double. The use of effective sampling technique in classification algorithms
always yields good prediction accuracy. The SEER public use cancer database provides various prominent
class labels for prognosis prediction. The main objective of this paper is to find the effect of sampling
techniques in classifying the prognosis variable and propose an ideal sampling method based on the
outcome of the experimentation. In the first phase of this work the traditional random sampling and
stratified sampling techniques have been used. At the next level the balanced stratified sampling with
variations as per the choice of the prognosis class labels have been tested. Much of the initial time has been
focused on performing the pre-processing of the SEER data set. The classification model for
experimentation has been built using the breast cancer, respiratory cancer and mixed cancer data sets with
three traditional classifiers namely Decision Tree, Naïve Bayes and K-Nearest Neighbour. The three
prognosis factors survival, stage and metastasis have been used as class labels for experimental
comparisons. The results shows a steady increase in the prediction accuracy of balanced stratified model
as the sample size increases, but the traditional approach fluctuates before the optimum results.
Life is the most precious gift to man and safeguarding this gift is of utmost importance.With
increasing number of diseases and fast paced lives, people have less time to look after themselves and
their family members or to even visit the doctor for regular check-ups.Our E-Health patient
monitoring system can remotely monitor the health of the patients and intimate the doctor of critical
conditions without human intervention. Some of the existing E-Health systems include telemedicine
network for Francophone African countries (RAFT) and LOBIN. RAFT is implemented in java and
uses asymmetric public – private key encryption, however it is expensive, does not support mobility
and is not a context aware system. LOBIN is a hardware/software platform to locate and monitor a set
of physiological parameters and context parameters of several patients within hospital facilities.
Although it is a context aware system it cannot handle high and concurrent data traffic load.
To overcome the above flaws, our proposed system puts forward an idea of patient monitoring
using various knowledge based techniques like K-means clustering, Gaussian kernel function, ANN
and Fuzzy inference engine. In our project we intend to do remote patient health monitoring in which
we will be using three-four machines which will send various sensed health parameters to the
centralised server that will make clusters of the sensed health parameters based on criticality of the
health condition. Then depending upon clusters formed and on comparison with the threshold values
appropriate reports will be generated and send to the doctors and caretakers.
MISSING DATA CLASSIFICATION OF CHRONIC KIDNEY DISEASEIJDKP
In this paper we propose an approach on chronic kidney disease classification with the presence of missing data. We implemented a classification system to solve the challenge of detecting chronic kidney diseases based on medical test data. The approach is comparing three different techniques that deals with missing data including deletion, mean imputation, and selection of best features. Each techniques is tested using the K-NN classifier, Naïve Bayes classifier, decision tree, and support vector machines (SVM). The final accuracy of each system is determined using 10-fold cross validation.
CLUSTERING ALGORITHM FOR A HEALTHCARE DATASET USING SILHOUETTE SCORE VALUEijcsit
The huge amount of healthcare data, coupled with the need for data analysis tools has made data mining
interesting research areas. Data mining tools and techniques help to discover and understand hidden
patterns in a dataset which may not be possible by mainly visualization of the data. Selecting appropriate
clustering method and optimal number of clusters in healthcare data can be confusing and difficult most
times. Presently, a large number of clustering algorithms are available for clustering healthcare data, but
it is very difficult for people with little knowledge of data mining to choose suitable clustering algorithms.
This paper aims to analyze clustering techniques using healthcare dataset, in order to determine suitable
algorithms which can bring the optimized group clusters. Performances of two clustering algorithms (Kmeans
and DBSCAN) were compared using Silhouette score values. Firstly, we analyzed K-means
algorithm using different number of clusters (K) and different distance metrics. Secondly, we analyzed
DBSCAN algorithm using different minimum number of points required to form a cluster (minPts) and
different distance metrics. The experimental result indicates that both K-means and DBSCAN algorithms
have strong intra-cluster cohesion and inter-cluster separation. Based on the analysis, K-means algorithm
performed better compare to DBSCAN algorithm in terms of clustering accuracy and execution time.
Novel modelling of clustering for enhanced classification performance on gene...IJECEIAES
Gene expression data is popularized for its capability to disclose various disease conditions. However, the conventional procedure to extract gene expression data itself incorporates various artifacts that offer challenges in diagnosis a complex disease indication and classification like cancer. Review of existing research approaches indicates that classification approaches are few to proven to be standard with respect to higher accuracy and applicable to gene expression data apart from unaddresed problems of computational complexity. Therefore, the proposed manuscript introduces a novel and simplified model capable using Graph Fourier Transform, Eigen Value and vector for offering better classification performance considering case study of microarray database, which is one typical example of gene expressiondata. The study outcome shows that proposed system offers comparatively better accuracy and reduced computational complexity with the existing clustering approaches.
How to establish and evaluate clinical prediction models - StatsworkStats Statswork
A clinical prediction model can be used in various clinical contexts, including screening for asymptomatic illness, forecasting future events such as disease, and assisting doctors in their decision-making and health education. Despite the positive effects of clinical prediction models on practice, prediction modeling is a difficult process that necessitates meticulous statistical analysis and sound clinical judgments. Statswork offers statistical services as per the requirements of the customers. When you Order statistical Services at Statswork, we promise you the following always on Time, outstanding customer support, and High-quality Subject Matter Experts.
Read More With Us: https://bit.ly/3dxn32c
Why Statswork?
Plagiarism Free | Unlimited Support | Prompt Turnaround Times | Subject Matter Expertise | Experienced Bio-statisticians & Statisticians | Statistics across Methodologies | Wide Range of Tools & Technologies Supports | Tutoring Services | 24/7 Email Support | Recommended by Universities
Contact Us:
Website: www.statswork.com
Email: info@statswork.com
United Kingdom: 44-1143520021
India: 91-4448137070
WhatsApp: 91-8754446690
A Framework for Statistical Simulation of Physiological Responses (SSPR).Waqas Tariq
The problem of variable selection from a large number of variables to predict certain important dependent variables has been of interest to both applied statisticians and other researchers in applied physiology. For this purpose, various statistical techniques have been developed. This framework embedded various statistical techniques of sampling and resampling and help in Statistical Simulation for Physiological Responses under different Environmental condition. The population generation and other statistical calculations are based on the inputs provided by the user as mean vector and covariance matrix and the data. This framework is developed in a way that it can work for the original data as well as for simulated data generated by the software. Approach: The mean vector and covariance matrix are sufficient statistics when the underlying distribution is multivariate normal. This framework uses these two inputs and is able to generate simulated multivariate normal population for any number of variables. The software changes the manual operation into a computer-based system to automate the study, provide efficiency, accuracy, timelessness, and economy. Result: A complete framework that can statistically simulate any type and any number of responses or variables. If the simulated data is analyzed using statistical techniques; the results of such analysis will be the same as that using the original data. If the data is missing for some of the variables, in that case the system will also help. Conclusion: The proposed system makes it possible to carry out the physiological studies and statistical calculations even if the actual data is not present.
Heart Disease Prediction Using Data Mining TechniquesIJRES Journal
There are huge amounts of data in the medical industry which is not processed properly and hence cannot be used effectively in making decisions. We can use data mining techniques to mine these patterns and relationships. This research has developed a prototype Heart Disease Prediction using data mining techniques, namely Neural Network, K-Means Clustering and Frequent Item Set Generation. Using medical profiles such as age, sex, blood pressure and blood sugar it can predict the likelihood patients getting a heart disease. It enables significant knowledge, e.g. patterns, relationships between medical factors related to heart disease to be established. Performance of these techniques is compared through sensitivity, specificity and accuracy. It has been observed that Artificial Neural Networks outperform K Means clustering in all the parameters i.e. Sensitivity, Specificity and Accuracy.
This presentation discusses the application of logistic model in sports research. One can understand the model and the procedure involved in developing it if the assumptions for this analysis is satisfied.
Logistic Regression: Predicting The Chances Of Coronary Heart DiseaseMichael Lieberman
Logistic Regression - Predicting the Chances of Coronary Heart Disease weighs risks factors for heart disease and calculates the odds of contracting the disease within the next ten years.
Cancer prognosis prediction using balanced stratified samplingijscai
High accuracy in cancer prediction is important to improve the quality of the treatment and to improve the
rate of survivability of patients. As the data volume is increasing rapidly in the healthcare research, the
analytical challenge exists in double. The use of effective sampling technique in classification algorithms
always yields good prediction accuracy. The SEER public use cancer database provides various prominent
class labels for prognosis prediction. The main objective of this paper is to find the effect of sampling
techniques in classifying the prognosis variable and propose an ideal sampling method based on the
outcome of the experimentation. In the first phase of this work the traditional random sampling and
stratified sampling techniques have been used. At the next level the balanced stratified sampling with
variations as per the choice of the prognosis class labels have been tested. Much of the initial time has been
focused on performing the pre-processing of the SEER data set. The classification model for
experimentation has been built using the breast cancer, respiratory cancer and mixed cancer data sets with
three traditional classifiers namely Decision Tree, Naïve Bayes and K-Nearest Neighbour. The three
prognosis factors survival, stage and metastasis have been used as class labels for experimental
comparisons. The results shows a steady increase in the prediction accuracy of balanced stratified model
as the sample size increases, but the traditional approach fluctuates before the optimum results.
Life is the most precious gift to man and safeguarding this gift is of utmost importance.With
increasing number of diseases and fast paced lives, people have less time to look after themselves and
their family members or to even visit the doctor for regular check-ups.Our E-Health patient
monitoring system can remotely monitor the health of the patients and intimate the doctor of critical
conditions without human intervention. Some of the existing E-Health systems include telemedicine
network for Francophone African countries (RAFT) and LOBIN. RAFT is implemented in java and
uses asymmetric public – private key encryption, however it is expensive, does not support mobility
and is not a context aware system. LOBIN is a hardware/software platform to locate and monitor a set
of physiological parameters and context parameters of several patients within hospital facilities.
Although it is a context aware system it cannot handle high and concurrent data traffic load.
To overcome the above flaws, our proposed system puts forward an idea of patient monitoring
using various knowledge based techniques like K-means clustering, Gaussian kernel function, ANN
and Fuzzy inference engine. In our project we intend to do remote patient health monitoring in which
we will be using three-four machines which will send various sensed health parameters to the
centralised server that will make clusters of the sensed health parameters based on criticality of the
health condition. Then depending upon clusters formed and on comparison with the threshold values
appropriate reports will be generated and send to the doctors and caretakers.
MISSING DATA CLASSIFICATION OF CHRONIC KIDNEY DISEASEIJDKP
In this paper we propose an approach on chronic kidney disease classification with the presence of missing data. We implemented a classification system to solve the challenge of detecting chronic kidney diseases based on medical test data. The approach is comparing three different techniques that deals with missing data including deletion, mean imputation, and selection of best features. Each techniques is tested using the K-NN classifier, Naïve Bayes classifier, decision tree, and support vector machines (SVM). The final accuracy of each system is determined using 10-fold cross validation.
CLUSTERING ALGORITHM FOR A HEALTHCARE DATASET USING SILHOUETTE SCORE VALUEijcsit
The huge amount of healthcare data, coupled with the need for data analysis tools has made data mining
interesting research areas. Data mining tools and techniques help to discover and understand hidden
patterns in a dataset which may not be possible by mainly visualization of the data. Selecting appropriate
clustering method and optimal number of clusters in healthcare data can be confusing and difficult most
times. Presently, a large number of clustering algorithms are available for clustering healthcare data, but
it is very difficult for people with little knowledge of data mining to choose suitable clustering algorithms.
This paper aims to analyze clustering techniques using healthcare dataset, in order to determine suitable
algorithms which can bring the optimized group clusters. Performances of two clustering algorithms (Kmeans
and DBSCAN) were compared using Silhouette score values. Firstly, we analyzed K-means
algorithm using different number of clusters (K) and different distance metrics. Secondly, we analyzed
DBSCAN algorithm using different minimum number of points required to form a cluster (minPts) and
different distance metrics. The experimental result indicates that both K-means and DBSCAN algorithms
have strong intra-cluster cohesion and inter-cluster separation. Based on the analysis, K-means algorithm
performed better compare to DBSCAN algorithm in terms of clustering accuracy and execution time.
Novel modelling of clustering for enhanced classification performance on gene...IJECEIAES
Gene expression data is popularized for its capability to disclose various disease conditions. However, the conventional procedure to extract gene expression data itself incorporates various artifacts that offer challenges in diagnosis a complex disease indication and classification like cancer. Review of existing research approaches indicates that classification approaches are few to proven to be standard with respect to higher accuracy and applicable to gene expression data apart from unaddresed problems of computational complexity. Therefore, the proposed manuscript introduces a novel and simplified model capable using Graph Fourier Transform, Eigen Value and vector for offering better classification performance considering case study of microarray database, which is one typical example of gene expressiondata. The study outcome shows that proposed system offers comparatively better accuracy and reduced computational complexity with the existing clustering approaches.
How to establish and evaluate clinical prediction models - StatsworkStats Statswork
A clinical prediction model can be used in various clinical contexts, including screening for asymptomatic illness, forecasting future events such as disease, and assisting doctors in their decision-making and health education. Despite the positive effects of clinical prediction models on practice, prediction modeling is a difficult process that necessitates meticulous statistical analysis and sound clinical judgments. Statswork offers statistical services as per the requirements of the customers. When you Order statistical Services at Statswork, we promise you the following always on Time, outstanding customer support, and High-quality Subject Matter Experts.
Read More With Us: https://bit.ly/3dxn32c
Why Statswork?
Plagiarism Free | Unlimited Support | Prompt Turnaround Times | Subject Matter Expertise | Experienced Bio-statisticians & Statisticians | Statistics across Methodologies | Wide Range of Tools & Technologies Supports | Tutoring Services | 24/7 Email Support | Recommended by Universities
Contact Us:
Website: www.statswork.com
Email: info@statswork.com
United Kingdom: 44-1143520021
India: 91-4448137070
WhatsApp: 91-8754446690
A Framework for Statistical Simulation of Physiological Responses (SSPR).Waqas Tariq
The problem of variable selection from a large number of variables to predict certain important dependent variables has been of interest to both applied statisticians and other researchers in applied physiology. For this purpose, various statistical techniques have been developed. This framework embedded various statistical techniques of sampling and resampling and help in Statistical Simulation for Physiological Responses under different Environmental condition. The population generation and other statistical calculations are based on the inputs provided by the user as mean vector and covariance matrix and the data. This framework is developed in a way that it can work for the original data as well as for simulated data generated by the software. Approach: The mean vector and covariance matrix are sufficient statistics when the underlying distribution is multivariate normal. This framework uses these two inputs and is able to generate simulated multivariate normal population for any number of variables. The software changes the manual operation into a computer-based system to automate the study, provide efficiency, accuracy, timelessness, and economy. Result: A complete framework that can statistically simulate any type and any number of responses or variables. If the simulated data is analyzed using statistical techniques; the results of such analysis will be the same as that using the original data. If the data is missing for some of the variables, in that case the system will also help. Conclusion: The proposed system makes it possible to carry out the physiological studies and statistical calculations even if the actual data is not present.
Heart Disease Prediction Using Data Mining TechniquesIJRES Journal
There are huge amounts of data in the medical industry which is not processed properly and hence cannot be used effectively in making decisions. We can use data mining techniques to mine these patterns and relationships. This research has developed a prototype Heart Disease Prediction using data mining techniques, namely Neural Network, K-Means Clustering and Frequent Item Set Generation. Using medical profiles such as age, sex, blood pressure and blood sugar it can predict the likelihood patients getting a heart disease. It enables significant knowledge, e.g. patterns, relationships between medical factors related to heart disease to be established. Performance of these techniques is compared through sensitivity, specificity and accuracy. It has been observed that Artificial Neural Networks outperform K Means clustering in all the parameters i.e. Sensitivity, Specificity and Accuracy.
This presentation discusses the application of logistic model in sports research. One can understand the model and the procedure involved in developing it if the assumptions for this analysis is satisfied.
Logistic Regression: Predicting The Chances Of Coronary Heart DiseaseMichael Lieberman
Logistic Regression - Predicting the Chances of Coronary Heart Disease weighs risks factors for heart disease and calculates the odds of contracting the disease within the next ten years.
A gentle introduction to 2 classification techniques, as presented by Kriti Puniyani to the NYC Predictive Analytics group (April 14, 2011). To download the file please go here: http://www.meetup.com/NYC-Predictive-Analytics/files/
Genome-wide transcription profiling is a powerful technique in studying disease susceptible footprints. Moreover, when applied to disease tissue it may reveal quantitative and qualitative alterations in gene expression that give information on the context or underlying basis for the disease and may provide a new diagnostic approach. However, the data obtained from high-density microarrays is highly complex and poses considerable challenges in data mining. Past researches prove that neuro diseases damage the brain network interaction, protein- protein interaction and gene-gene interaction. A number of neurological research paper also analyze the relationship among damaged part. Analysis of gene-gene interaction network drawn by using state-of-the-art gene database of Alzheimer’s patient can conclude a lot of information. In this paper we used gene dataset affected with Alzheimer’s disease and normal patient’s dataset from NCBI databank. After proper processing the .CEL affymetrix data using RMA, we use the processed data to find gene interaction outputs. Then we filter the output files using probe set filtering attributes p-value and fold count and draw a gene-gene interaction network. Then we analyze the interaction network using GeneMania software.
ABSTRACT
Genome-wide transcription profiling is a powerful technique in studying disease susceptible footprints. Moreover, when applied to disease tissue it may reveal quantitative and qualitative alterations in gene expression that give information on the context or underlying basis for the disease and may provide a new diagnostic approach. However, the data obtained from high-density microarrays is highly complex and poses considerable challenges in data mining. Past researches prove that neuro diseases damage the brain network interaction, protein- protein interaction and gene-gene interaction. A number of neurological research paper also analyze the relationship among damaged part. Analysis of gene-gene interaction network drawn by using state-of-the-art gene database of Alzheimer’s patient can conclude a lot of information. In this paper we used gene dataset affected with Alzheimer’s disease and normal patient’s dataset from NCBI databank. After proper processing the .CEL affymetrix data using RMA, we use the processed data to find gene interaction outputs. Then we filter the output files using probe set filtering attributes p-value and fold count and draw a gene-gene interaction network. Then we analyze the interaction network using GeneMania software.
DATA MINING CLASSIFICATION ALGORITHMS FOR KIDNEY DISEASE PREDICTION IJCI JOURNAL
Data mining is a non-trivial process of categorizing valid, novel, potentially useful and ultimately understandable patterns in data. In terms, it accurately state as the extraction of information from a huge database. Data mining is a vital role in several applications such as business organizations, educational institutions, government sectors, health care industry, scientific and engineering. . In the health care
industry, the data mining is predominantly used for disease prediction. Enormous data mining techniques are existing for predicting diseases namely classification, clustering, association rules, summarizations, regression and etc. The main objective of this research work is to predict kidney diseases using classification algorithms such as Naïve Bayes and Support Vector Machine. This research work mainly
focused on finding the best classification algorithm based on the classification accuracy and execution time performance factors. From the experimental results it is observed that the performance of the SVM is better than the Naive Bayes classifier algorithm.
Multivariate sample similarity measure for feature selection with a resemblan...IJECEIAES
Feature selection improves the classification performance of machine learning models. It also identifies the important features and eliminates those with little significance. Furthermore, feature selection reduces the dimensionality of training and testing data points. This study proposes a feature selection method that uses a multivariate sample similarity measure. The method selects features with significant contributions using a machine-learning model. The multivariate sample similarity measure is evaluated using the University of California, Irvine heart disease dataset and compared with existing feature selection methods. The multivariate sample similarity measure is evaluated with metrics such as minimum subset selected, accuracy, F1-score, and area under the curve (AUC). The results show that the proposed method is able to diagnose chest pain, thallium scan, and major vessels scanned using X-rays with a high capability to distinguish between healthy and heart disease patients with a 99.6% accuracy.
GRAPHICAL MODEL AND CLUSTERINGREGRESSION BASED METHODS FOR CAUSAL INTERACTION...gerogepatton
The early detection of Breast Cancer, the deadly disease that mostly affects women is extremely complex because it requires various features of the cell type. Therefore, the efficient approach to diagnosing Breast Cancer at the early stage was to apply artificial intelligence where machines are simulated with intelligence and programmed to think and act like a human. This allows machines to passively learn and find a pattern, which can be used later to detect any new changes that may occur. In general, machine learning is quite useful particularly in the medical field, which depends on complex genomic measurements such as microarray technique and would increase the accuracy and precision of results. With this technology, doctors can easily diagnose patients with cancer quickly and apply the proper treatment in a timely manner. Therefore, the goal of this paper is to address and propose a robust Breast Cancer diagnostic system using complex genomic analysis via microarray technology. The system will combine two machine learning methods, K-means cluster, and linear regression.
Graphical Model and Clustering-Regression based Methods for Causal Interactio...gerogepatton
The early detection of Breast Cancer, the deadly disease that mostly affects women is extremely complex
because it requires various features of the cell type. Therefore, the efficient approach to diagnosing Breast
Cancer at the early stage was to apply artificial intelligence where machines are simulated with
intelligence and programmed to think and act like a human. This allows machines to passively learn and
find a pattern, which can be used later to detect any new changes that may occur. In general, machine
learning is quite useful particularly in the medical field, which depends on complex genomic
measurements such as microarray technique and would increase the accuracy and precision of results.
With this technology, doctors can easily diagnose patients with cancer quickly and apply the proper
treatment in a timely manner. Therefore, the goal of this paper is to address and propose a robust Breast
Cancer diagnostic system using complex genomic analysis via microarray technology. The system will
combine two machine learning methods, K-means cluster, and linear regression.
A discriminative-feature-space-for-detecting-and-recognizing-pathologies-of-t...Damian R. Mingle, MBA
Each year it has become more and more difficult for healthcare providers to determine if a patient has a pathology
related to the vertebral column. There is great potential to become more efficient and effective in terms of quality
of care provided to patients through the use of automated systems. However, in many cases automated systems
can allow for misclassification and force providers to have to review more causes than necessary. In this study, we
analyzed methods to increase the True Positives and lower the False Positives while comparing them against stateof-the-art
techniques in the biomedical community. We found that by applying the studied techniques of a data-driven
model, the benefits to healthcare providers are significant and align with the methodologies and techniques utilized
in the current research community.
Performance enhancement of machine learning algorithm for breast cancer diagn...IJECEIAES
Breast cancer is the most fatal women’s cancer, and accurate diagnosis of this disease in the initial phase is crucial to abate death rates worldwide. The demand for computer-aided disease diagnosis technologies in healthcare is growing significantly to assist physicians in ensuring the effectual treatment of critical diseases. The vital purpose of this study is to analyze and evaluate the classification efficiency of several machine learning algorithms with hyperparameter optimization techniques using grid search and random search to reveal an efficient breast cancer diagnosis approach. Choosing the optimal combination of hyperparameters using hyperparameter optimization for machine learning models has a straight influence on the performance of models. According to the findings of several evaluation studies, the k-nearest neighbor is addressed in this study for effective diagnosis of breast cancer, which got a 100.00% recall value with hyperparameters found utilizing grid search. k-nearest neighbor, logistic regression, and multilayer perceptron obtained 99.42% accuracy after utilizing hyperparameter optimization. All machine learning models showed higher efficiency in breast cancer diagnosis with grid search-based hyperparameter optimization except for XGBoost. Therefore, the evaluation outcomes strongly validate the effectiveness and reliability of the proposed technique for breast cancer diagnosis.
A MODIFIED MAXIMUM RELEVANCE MINIMUM REDUNDANCY FEATURE SELECTION METHOD BASE...gerogepatton
Parkinson’s disease is a complex chronic neurodegenerative disorder of the central nervous system. One of the common symptoms for the Parkinson’s disease subjects, is vocal performance degradation. Patients usually advised to follow personalized rehabilitative treatment sessions with speech experts. Recent research trends aim to investigate the potential of using sustained vowel phonations for replicating the speech experts’ assessments of Parkinson’s disease subjects’ voices. With the purpose of improving the accuracy and efficiency of Parkinson’s disease treatment, this article proposes a two-stage diagnosis model to evaluate an LSVT dataset. Firstly, we propose a modified minimum Redundancy-Maximum Relevance (mRMR) feature selection approach, based on Cuckoo Search and Tabu Search to reduce the features numbers. Secondly, we apply simple random sampling technique to dataset to increase the samples of the minority class. Promisingly, the developed approach obtained a classification Accuracy rate of 95% with 24 features by 10-fold CV method
A MODIFIED MAXIMUM RELEVANCE MINIMUM REDUNDANCY FEATURE SELECTION METHOD BASE...ijaia
Parkinson’s disease is a complex chronic neurodegenerative disorder of the central nervous system. One of the common symptoms for the Parkinson’s disease subjects, is vocal performance degradation. Patients usually advised to follow personalized rehabilitative treatment sessions with speech experts. Recent research trends aim to investigate the potential of using sustained vowel phonations for replicating the speech experts’ assessments of Parkinson’s disease subjects’ voices. With the purpose of improving the accuracy and efficiency of Parkinson’s disease treatment, this article proposes a two-stage diagnosis model to evaluate an LSVT dataset. Firstly, we propose a modified minimum Redundancy-Maximum Relevance (mRMR) feature selection approach, based on Cuckoo Search and Tabu Search to reduce the features numbers. Secondly, we apply simple random sampling technique to dataset to increase the samples of the minority class. Promisingly, the developed approach obtained a classification Accuracy rate of 95% with 24 features by 10-fold CV method.
Breast cancer is the leading cause of death for women worldwide. Cancer can be discovered early, lowering the rate of death. Machine learning techniques are a hot field of research, and they have been shown to be helpful in cancer prediction and early detection. The primary purpose of this research is to identify which machine learning algorithms are the most successful in predicting and diagnosing breast cancer, according to five criteria: specificity, sensitivity, precision, accuracy, and F1 score. The project is finished in the Anaconda environment, which uses Python's NumPy and SciPy numerical and scientific libraries as well as matplotlib and Pandas. In this study, the Wisconsin diagnostic breast cancer dataset was used to evaluate eleven machine learning classifiers: decision tree, quadratic discriminant analysis, AdaBoost, Bagging meta estimator, Extra randomized trees, Gaussian process classifier, Ridge, Gaussian nave Bayes, k-Nearest neighbors, multilayer perceptron, and support vector classifier. During performance analysis, extremely randomized trees outperformed all other classifiers with an F1-score of 96.77% after data collection and data analysis.
Evolving Efficient Clustering and Classification Patterns in Lymphography Dat...ijsc
Data mining refers to the process of retrieving knowledge by discovering novel and relative patterns from large datasets. Clustering and Classification are two distinct phases in data mining that work to provide an established, proven structure from a voluminous collection of facts. A dominant area of modern-day research in the field of medical investigations includes disease prediction and malady categorization. In this paper, our focus is to analyze clusters of patient records obtained via unsupervised clustering techniques and compare the performance of classification algorithms on the clinical data. Feature selection is a supervised method that attempts to select a subset of the predictor features based on the information gain. The Lymphography dataset comprises of 18 predictor attributes and 148 instances with the class label having four distinct values. This paper highlights the accuracy of eight clustering algorithms in detecting clusters of patient records and predictor attributes and highlights the performance of sixteen classification algorithms on the Lymphography dataset that enables the classifier to accurately perform multi-class categorization of medical data. Our work asserts the fact that the Random Tree algorithm and the Quinlan’s C4.5 algorithm give 100 percent classification accuracy with all the predictor features and also with the feature subset selected by the Fisher Filtering feature selection algorithm.. It is also stated here that the Density Based Spatial Clustering of Applications with Noise (DBSCAN) clustering algorithm offers increased clustering accuracy in less computation time.
EVOLVING EFFICIENT CLUSTERING AND CLASSIFICATION PATTERNS IN LYMPHOGRAPHY DAT...ijsc
Data mining refers to the process of retrieving knowledge by discovering novel and relative patterns from
large datasets. Clustering and Classification are two distinct phases in data mining that work to provide an
established, proven structure from a voluminous collection of facts. A dominant area of modern-day
research in the field of medical investigations includes disease prediction and malady categorization. In
this paper, our focus is to analyze clusters of patient records obtained via unsupervised clustering
techniques and compare the performance of classification algorithms on the clinical data. Feature
selection is a supervised method that attempts to select a subset of the predictor features based on the
information gain. The Lymphography dataset comprises of 18 predictor attributes and 148 instances with
the class label having four distinct values. This paper highlights the accuracy of eight clustering algorithms
in detecting clusters of patient records and predictor attributes and highlights the performance of sixteen
classification algorithms on the Lymphography dataset that enables the classifier to accurately perform
multi-class categorization of medical data. Our work asserts the fact that the Random Tree algorithm and
the Quinlan’s C4.5 algorithm give 100 percent classification accuracy with all the predictor features and
also with the feature subset selected by the Fisher Filtering feature selection algorithm.. It is also stated
here that the Density Based Spatial Clustering of Applications with Noise (DBSCAN) clustering algorithm
offers increased clustering accuracy in less computation time.
MULTI MODEL DATA MINING APPROACH FOR HEART FAILURE PREDICTIONIJDKP
Developing predictive modelling solutions for risk estimation is extremely challenging in health-care
informatics. Risk estimation involves integration of heterogeneous clinical sources having different
representation from different health-care provider making the task increasingly complex. Such sources are
typically voluminous, diverse, and significantly change over the time. Therefore, distributed and parallel
computing tools collectively termed big data tools are in need which can synthesize and assist the physician
to make right clinical decisions. In this work we propose multi-model predictive architecture, a novel
approach for combining the predictive ability of multiple models for better prediction accuracy. We
demonstrate the effectiveness and efficiency of the proposed work on data from Framingham Heart study.
Results show that the proposed multi-model predictive architecture is able to provide better accuracy than
best model approach. By modelling the error of predictive models we are able to choose sub set of models
which yields accurate results. More information was modelled into system by multi-level mining which has
resulted in enhanced predictive accuracy.
EFFICACY OF NON-NEGATIVE MATRIX FACTORIZATION FOR FEATURE SELECTION IN CANCER...IJDKP
Over the past few years, there has been a considerable spread of microarray technology in many biological patterns, particularly in those pertaining to cancer diseases like leukemia, prostate, colon cancer, etc. The primary bottleneck that one experiences in the proper understanding of such datasets lies in their dimensionality, and thus for an efficient and effective means of studying the same, a reduction in their dimension to a large extent is deemed necessary. This study is a bid to suggesting different algorithms and approaches for the reduction of dimensionality of such microarray datasets.This study exploits the matrix-like structure of such microarray data and uses a popular technique called Non-Negative Matrix Factorization (NMF) to reduce the dimensionality, primarily in the field of biological data. Classification accuracies are then compared for these algorithms.This technique gives an accuracy of 98%.
Similar to Evaluation of Logistic Regression and Neural Network Model With Sensitivity Analysis on Medical Datasets (20)
Francesca Gottschalk - How can education support child empowerment.pptxEduSkills OECD
Francesca Gottschalk from the OECD’s Centre for Educational Research and Innovation presents at the Ask an Expert Webinar: How can education support child empowerment?
2024.06.01 Introducing a competency framework for languag learning materials ...Sandy Millin
http://sandymillin.wordpress.com/iateflwebinar2024
Published classroom materials form the basis of syllabuses, drive teacher professional development, and have a potentially huge influence on learners, teachers and education systems. All teachers also create their own materials, whether a few sentences on a blackboard, a highly-structured fully-realised online course, or anything in between. Despite this, the knowledge and skills needed to create effective language learning materials are rarely part of teacher training, and are mostly learnt by trial and error.
Knowledge and skills frameworks, generally called competency frameworks, for ELT teachers, trainers and managers have existed for a few years now. However, until I created one for my MA dissertation, there wasn’t one drawing together what we need to know and do to be able to effectively produce language learning materials.
This webinar will introduce you to my framework, highlighting the key competencies I identified from my research. It will also show how anybody involved in language teaching (any language, not just English!), teacher training, managing schools or developing language learning materials can benefit from using the framework.
Normal Labour/ Stages of Labour/ Mechanism of LabourWasim Ak
Normal labor is also termed spontaneous labor, defined as the natural physiological process through which the fetus, placenta, and membranes are expelled from the uterus through the birth canal at term (37 to 42 weeks
A Strategic Approach: GenAI in EducationPeter Windle
Artificial Intelligence (AI) technologies such as Generative AI, Image Generators and Large Language Models have had a dramatic impact on teaching, learning and assessment over the past 18 months. The most immediate threat AI posed was to Academic Integrity with Higher Education Institutes (HEIs) focusing their efforts on combating the use of GenAI in assessment. Guidelines were developed for staff and students, policies put in place too. Innovative educators have forged paths in the use of Generative AI for teaching, learning and assessments leading to pockets of transformation springing up across HEIs, often with little or no top-down guidance, support or direction.
This Gasta posits a strategic approach to integrating AI into HEIs to prepare staff, students and the curriculum for an evolving world and workplace. We will highlight the advantages of working with these technologies beyond the realm of teaching, learning and assessment by considering prompt engineering skills, industry impact, curriculum changes, and the need for staff upskilling. In contrast, not engaging strategically with Generative AI poses risks, including falling behind peers, missed opportunities and failing to ensure our graduates remain employable. The rapid evolution of AI technologies necessitates a proactive and strategic approach if we are to remain relevant.
The French Revolution, which began in 1789, was a period of radical social and political upheaval in France. It marked the decline of absolute monarchies, the rise of secular and democratic republics, and the eventual rise of Napoleon Bonaparte. This revolutionary period is crucial in understanding the transition from feudalism to modernity in Europe.
For more information, visit-www.vavaclasses.com
Read| The latest issue of The Challenger is here! We are thrilled to announce that our school paper has qualified for the NATIONAL SCHOOLS PRESS CONFERENCE (NSPC) 2024. Thank you for your unwavering support and trust. Dive into the stories that made us stand out!
Biological screening of herbal drugs: Introduction and Need for
Phyto-Pharmacological Screening, New Strategies for evaluating
Natural Products, In vitro evaluation techniques for Antioxidants, Antimicrobial and Anticancer drugs. In vivo evaluation techniques
for Anti-inflammatory, Antiulcer, Anticancer, Wound healing, Antidiabetic, Hepatoprotective, Cardio protective, Diuretics and
Antifertility, Toxicity studies as per OECD guidelines
Embracing GenAI - A Strategic ImperativePeter Windle
Artificial Intelligence (AI) technologies such as Generative AI, Image Generators and Large Language Models have had a dramatic impact on teaching, learning and assessment over the past 18 months. The most immediate threat AI posed was to Academic Integrity with Higher Education Institutes (HEIs) focusing their efforts on combating the use of GenAI in assessment. Guidelines were developed for staff and students, policies put in place too. Innovative educators have forged paths in the use of Generative AI for teaching, learning and assessments leading to pockets of transformation springing up across HEIs, often with little or no top-down guidance, support or direction.
This Gasta posits a strategic approach to integrating AI into HEIs to prepare staff, students and the curriculum for an evolving world and workplace. We will highlight the advantages of working with these technologies beyond the realm of teaching, learning and assessment by considering prompt engineering skills, industry impact, curriculum changes, and the need for staff upskilling. In contrast, not engaging strategically with Generative AI poses risks, including falling behind peers, missed opportunities and failing to ensure our graduates remain employable. The rapid evolution of AI technologies necessitates a proactive and strategic approach if we are to remain relevant.
A Survey of Techniques for Maximizing LLM Performance.pptx
Evaluation of Logistic Regression and Neural Network Model With Sensitivity Analysis on Medical Datasets
1. Raghavendra B.K., & S.K. Srivatsa
International Journal of Computer Science and Security (IJCSS), Volume (5) : Issue (5) : 2011 503
Evaluation of Logistic Regression and Neural Network Model
With Sensitivity Analysis on Medical Datasets
Raghavendra B.K. raghavendra_bk@rediffmail.com
Department of Computer Science & Engineering
Dr. M.G.R. Educational & Research Institute
Chennai, 600 095, India
S.K. Srivatsa profsks@rediffmail.com
Senior Professor
St. Joseph’s College of Engineering
Chennai, 600 119, India
Abstract
Logistic Regression (LR) is a well known classification method in the field of statistical learning. It
allows probabilistic classification and shows promising results on several benchmark problems.
Logistic regression enables us to investigate the relationship between a categorical outcome and
a set of explanatory variables. Artificial Neural Networks (ANNs) are popularly used as universal
non-linear inference models and have gained extensive popularity in recent years. Research
activities are considerable and literature is growing. The goal of this research work is to compare
the performance of logistic regression and neural network models on publicly available medical
datasets. The evaluation process of the model is as follows. The logistic regression and neural
network methods with sensitivity analysis have been evaluated for the effectiveness of the
classification. The classification accuracy is used to measure the performance of both the
models. From the experimental results it is confirmed that the neural network model with
sensitivity analysis model gives more efficient result.
Keywords: Artificial Neural Network, Classification Accuracy, Logistic Regression, Medical
Dataset, Sensitivity Analysis.
1. INTRODUCTION
In the last few years, digital revolution has provided relatively inexpensive and available means to
collect and store large amounts of patient data in databases, i.e., containing rich medical
information and made available through the Internet for Health Services globally. Data mining
techniques like logistic regression is applied on these databases to identify the patterns that are
helpful in predicting or diagnosing the diseases and to take therapeutic measure of those
diseases.
Nowadays statistical methods constitute a very powerful tool for supporting medical decisions.
The size of medical data that any analysis or test of patients makes that doctors can be helped by
statistical models to interpret correctly and to support their decisions. The models are a very
powerful tool for doctors and these cannot substitute their viewpoint. On the other hand, the
characteristics of medical data and the huge number of variables to be considered as
fundamental point for the development of new technique as neural network for the analysis of the
data [1].
Neural networks are considered as a field of artificial intelligence. The development of the models
was inspired by the neural architecture of human brain. ANN have been applied in many
disciplines, including biology, psychology, statistics, mathematics, medical science, and computer
science. It has also been applied to a variety of business areas such as accounting and auditing,
finance, management and decision making, marketing and production. Recently, artificial neural
2. Raghavendra B.K., & S.K. Srivatsa
International Journal of Computer Science and Security (IJCSS), Volume (5) : Issue (5) : 2011 504
networks (ANNs) become a very popular model and have been applied to diagnose disease and
predict the survival ratio of the patients. However, for medical analysis, ANNs have been shown
to have some disadvantages as well as advantages. The most important advantages of ANNs
are their discrimination power, detection of complex and nonlinear relationship between
independent and dependent variables, and prediction of the case. The ANN model is developed
empirically, they can be over-fitted for training data, and their usage is very difficult because of
computational requirements. The performance of an ANN depends on the number of parameters,
the network weights, the selection of an appropriate training algorithm, the type of transfer
functions used, and the determination of the network size. Another disadvantage of using ANNs
is that they require the initialization and adjustment of many individual parameters to optimize the
classification performance. Many researchers have compared ANN versus LR. Some of them
found that ANN and LR have similar classification performance. Compared to LR, neural network
models are more flexible [2].
The rest of the paper is organized as follows: Section 2 reviews the prior literature, Logistic
Regression technique is discussed in Section 3. Design of neural network is discussed in Section
4. Experimental validation using publicly available medical dataset is given in Section 5. Section 6
includes Experimental results and discussions followed by conclusion.
2. LITERATURE SURVEY
Logistic regression is used in power distribution fault diagnosis, while neural network, has been
extensively used in power system reliability researches. Evaluation criteria of the goodness of the
classifier includes: correct classification rate, true positive rate, true negative rate, and geometric
mean [3].
The features of logistic regression and ANN have been compared and an experiment has been
conducted on graft outcomes prediction using a kidney transplant dataset. The results shown
reveal that ANN coupled with bagging is an effective data mining method for predicting kidney
graft outcomes. This also confirms that different techniques can potentially be integrated to obtain
a better prediction. Overall, the results reveal that in most cases, the ANN technique outperforms
logistic regression [4].
The author’s presents an evaluation tool for the diagnosis of breast cancer of patients using
clinical, pathological, and immunohistochemical data. The main aim was to compare the LR and
NN models performances in classification. The neural network approach highlights different
inputs from classical statistical model selection [5].
The research work was focuses on a machine learning approach to the classification of LR. The
author’s applies a logistic regression based algorithm to three types of classification tasks: binary
classification, multiple classifications and classification into a hierarchy. The results confirm that
the logistic regression coupled with cross validation is an effective machine learning algorithm. It
is not only a robust classification algorithm but also a very effective dimensionality reduction
method. The author compared classification performance of logistic regression with several
neural network algorithms: backpropagation, fuzzy artificial resonance ART, general regression,
radial basis function, self-organizing map-kohonen. The best neural network: the fuzzy artificial
resonance network, trained on 12 variables, and achieved 82.6% of correct predictions as
compared to 90% for the logistic regression [6].
The research work from the authors was aims to identify the most and least significant factors for
breast cancer survival analysis by means of feature evaluation indices derived from multilayer
feedforward backpropagation neural networks (MLJFBPNN), fuzzy k-nearest neighbor classifier,
and a logistic regression based backward stepwise method (LR). The results appear to suggest
that SPF and NPIh appear to be the most and least important prognostic factors, respectively, for
survival analysis in breast cancer patients, and should be investigated accordingly in future
clinical studies in oncology. The results shown from each method identify a different set of factors
as being the most important. It should therefore be suggested that it could be inappropriate to rely
3. Raghavendra B.K., & S.K. Srivatsa
International Journal of Computer Science and Security (IJCSS), Volume (5) : Issue (5) : 2011 505
on one method’s outcome for assessment of the factors, and thus it may be necessary to look at
more than one method‘s outcome for a reliable prognostic factor assessment [7].
In another research work the author compares the performance of LR, NN, and CART decision
tree methodologies and to identify important features for the small business credit scoring model
on a Croatian bank dataset. The models obtained by all three methodologies were estimated and
validated on the same hold-out sample, and their performance is compared. The results shows
that the best NN model is better associated with data than LR and CART models [8].
The classification system was developed by the author and it was based on MLFFNN and LR to
assess the risk of a family having HNPCC, purely on the basis of pedigree data. The proposed
system can eliminate human errors associated with human fatigue and habits. Overall, MLFFNN
outperformed to the LR in terms of the number of cases correctly classified and in terms of
sensitivity, specificity and accuracy. Two out of 313 cases were misclassified by MLFFNN as
opposed to 20 out of 313 by LR [9].
Artificial neural networks can be constructively used to improve the quality of linear models in
medical statistics. ANNs are popularly used as universal non-linear inference models and they
suffer from two major drawbacks. Their operation is not transparent because of the distributed
nature of the representations they form, and this makes it different to interpret what they do.
There is no clearly accepted model of generality, which makes it difficult to demonstrate reliability
when applied to future data. In this paper neural networks generate hypotheses concerning
interaction terms which are integrated into standard statistical models that are linear in the
parameters, where the significance of the non-linear terms, and the generality of the model, can
be assured using well established statistical tests [10].
The use of Artificial Neural Networks (ANN) is to construct distributions to carry out plausible
reasoning in the field of medicine. It describes a comparison between Multivariate Logistic
Regression (MLR) and the Entropy Maximization Network (EMN) in terms of explicit assessment
of their predictive capabilities. The EMN and MLR have been used to determine the probability of
harboring lymph node metastases at the time of initial surgery by assessment of tumor based
parameters. Both predictors were trained on a set of 84 early breast cancer patient records and
evaluated on a separate set of 92 patient records. Differences in performance were evaluated by
comparing the areas under the receiver operating characteristic curve. The EMN model
performed more accurately than the MLR model with AZ = 0.839, compared to the MLR model
with AZ, = 0.809. The difference was statistically significant with two-tailed P value of less than
0.001. Accurate estimation of the prognosis would provide better stratification of patients for
further treatment or investigation [11].
3. LOGISTIC REGRESSION
Regression is the analysis, or measure, of the association between a dependent variable and one
or more independent variables. This association is usually formulated as an equation in which the
independent variables have parametric coefficients that enable future values of the dependent
variable to be predicted. Two of the main types of regression are: linear regression and logistic
regression. In linear regression the dependent variable is continuous and in logistic it is either
discrete or categorical. For logistic regression to be used, the discrete variable must be
transformed into a continuous value that is a function of the probability of the event occurring.
Regression is used for three main purposes: (1) description, (2) control and (3) prediction [12].
Logistic regression is also called as logistic model or logit model, is a type of predictive model
which can be used, when the target variable is a categorical variable with two categories - for
example active or inactive, healthy or unhealthy, win or loss, purchase product or doesn’t
purchase product etc. Logistic regression is used for the prediction of the probability of
occurrence of an event by fitting the data into a logistic curve. Like many forms of regression
analysis, it makes use of predictor variables; variables may be either numerical or categorical. For
example, the probability that a person has a heart attack in a specified time that might be
4. Raghavendra B.K., & S.K. Srivatsa
International Journal of Computer Science and Security (IJCSS), Volume (5) : Issue (5) : 2011 506
predicted from the knowledge of person’s age, sex and body mass index. Logistic regression is
used extensively in the medical and social sciences as well as in marketing applications such as
prediction of customer’s propensity to purchase a product or cease a subscription.
The response, Y, of a subject can take one of two possible values, denoted by 1 and 0 (for
example, Y=1 if a disease is present; otherwise Y=0). Let X=(x1, x2, . . ., xn) be the vector of
explanatory variables. The logistic regression model is used to explain the effects of the
explanatory variables in the form of binary response.
Logit{Pr(Y=1|x)} = log { Pr( Y=1|x) / (1- Pr(Y=1|x) ) } = β0+β1x1+β2x2+β3x3+…+βkxk (1)
Where β0 is called the intercept” and β1, β2, β3, and so on are called the “regression coefficients”
of x1, x2, x3 respectively. Each of the regression coefficients describes the size of the contribution
of the risk factor. A positive regression coefficient means that the risk factor increases the
probability of outcome, where as a negative regression coefficient means that the risk factor
decreases the probability of outcome, a large regression coefficient means that the risk factor
strongly influences the probability of that outcome, a non-zero regression coefficient means that
the risk factor has little influence on the probability of outcome [13].
The logistic function is given by
P=1/(1+e
-logit(p)
) (2)
A graph of the function is shown in Figure 1. The logistic function is useful because it can take an
input any value from negative infinity to positive infinity, whereas the output is confined to values
between 0 and 1.
FIGURE 1: A graph of logistic regression function
4. DESIGN OF NEURAL NETWORK
A Neural network is a complex nonlinear modeling technique based on a model of a human
neuron. A neural net is used to predict outputs (dependent variables) from a set of inputs
(independent variables) by taking linear combinations of the inputs and then making nonlinear
transformations of the linear combinations using activation function. It can be shown theoretically
that such combinations and transformations can approximate virtually any type of response
function. Thus, neural nets use large numbers of parameters to approximate any model. Neural
nets are often applied to predict future outcome based on prior experience. For example, a neural
net application could be used to predict who will respond to a direct mailing.
Neural networks are becoming very popular with data mining practitioners, particularly in medical
research, finance and marketing. This is because they have proven their predictive power through
comparison with other statistical techniques using real data sets. The example of a simple feed
forward neural network with two layers is shown in Figure 2.
There are two main types of neural network models: supervised neural networks such as the
multi-layer perceptron or radial basis functions, and unsupervised neural networks such as
5. Raghavendra B.K., & S.K. Srivatsa
International Journal of Computer Science and Security (IJCSS), Volume (5) : Issue (5) : 2011 507
Kohonen feature maps. A supervised neural network uses training and testing data to build a
model. The data involves historical data sets containing input variables, or data fields, which
correspond to an output. The training data is what the neural network uses to “learn” how to
predict the known output, and the testing data is used for validation. The aim is for the neural
networks to predict the output for any record given the input variables only [14].
One of the simplest feedforward neural networks (FFNN), such as the one in Figure 2, consists of
two layers: an input layer, and output layer. In each layer there are one or more processing
elements (PEs). PEs are meant to simulate the neurons in the brain and this is why they are often
referred to as neurons or nodes. A PE receives inputs from either the outside world or the
previous layer. There are connections between the PEs in each layer that have a weight
(parameter) associated with them. This weight is adjusted during training. Information only travels
in the forward direction through the network - there are no feedback loops.
FIGURE 2: Example of a simple feed forward neural network with two layers
The simplified process for training a FFNN is as follows:
1. Input data is presented to the network and propagated through the network until it reaches the
output layer. This forward process produces a predicted output.
2. The predicted output is subtracted from the actual output and an error value for the networks is
calculated.
3. The neural network then uses supervised learning, which in most cases is backpropagation, to
train the network. Backpropagation is a learning algorithm for adjusting the weights.
4. Once backpropagation has finished, the forward process starts again, and this cycle is
continued until the error between predicted and actual outputs is minimized [13].
5. EXPERIMENTAL VALIDATION
The framework for neural network model with sensitivity analysis is shown in Figure 3. The
process of evaluation is as follows. Sensitivity analysis has been done for the selected features
from the dataset. The logistic function with steepness parameter (σ) is calculated using the
following equation.
P=1/ (1+e-logit (p) * σ
) (3)
where σ=2, 3
The response Y is then calculated as follows by using threshold (τ). Then the probability is
calculated to develop a predictive model for classification using neural network. A tenfold cross
validation has been used for evaluation on all publicly available medical dataset [15].
Y = 1 if P ≥ τ (4)
0 otherwise
where τ=0.2, 0,4, . . .
PEInput
Input
Input
Output
Input Layer Output Layer
.
.
.
PE
PE
PE
6. Raghavendra B.K., & S.K. Srivatsa
International Journal of Computer Science and Security (IJCSS), Volume (5) : Issue (5) : 2011 508
In 10-fold cross validation, the original sample is partitioned into 10 sub samples, of the 10 sub
samples, a single sub sample is retained as the validation data for testing the model, and the
remaining 9 sub samples are used as training data. The cross-validation process is then repeated
10 times (the folds), with each of the 10 sub samples used exactly once as the validation data.
The 10 results from the folds then can be averaged (or otherwise combined) to produce a single
estimation. The advantage of this method over repeated random sub sampling is that all
observations are used for both training and validation and each observation is used for validation
exactly once.
FIGURE 3: Neural network model with sensitivity analysis framework
6. RESULTS AND DISCUSSION
We have used publicly available medical datasets for our experiments whose technical
specifications are as shown in Table 1. All the chosen datasets had at least one or more
attributes that were continuous. The classification accuracy is used to measure the performance
of logistic regression and neural network model on publicly available medical datasets. The
results of the evaluation are given in Table 2. Figure 4 gives classification accuracy details after
evaluation process. From the results it can be observed that the neural network model with
sensitivity analysis gives more efficient result.
Sl.
No
Medical Dataset No of
instances
Total no. of
attributes
No of
classes
1 Asthma 2464 5 2
2 Blood-transfusion 748 5 2
3 Flushot 159 4 2
4 Haberman 306 4 2
5 Liver-disorders 345 7 2
6 Spect test 187 23 2
7 Echocardiagram 132 11 2
TABLE 1: Specifications for the medical datasets
CVS
Neural Network
Model
Feature from the
Dataset
Sensitivity Analysis
Evaluation
Steepness
Parameter and
Threshold
Medical Dataset
7. Raghavendra B.K., & S.K. Srivatsa
International Journal of Computer Science and Security (IJCSS), Volume (5) : Issue (5) : 2011 509
FIGURE 4: Classification accuracy after evaluation
Sl.
No.
Name of the
Dataset LR
LR
NN
with
t = 0.4
LR
NN
with
t=0.2,
σ=2
LR
NN
with
t=0.4,
σ=3
LR
NN
with
SA,
t=0.4
LR
NN
with
SA,
t=0.4,
σ=2
LR NN
with
SA,
t=0.4,
σ=3
1 Asthma 56.16 54.58 54.62 55.19 76.29 76.29 76.33
2
blood-
transfusion 77.13 77.4 77.27 77.27 71.65 76.6 77.94
3 Flushot 86.16 85 85 85 85 85 85
4 haberman 74.18 73.52 73.85 73.52 73.2 74.18 73.85
5 liver-disorders 68.11 63.18 62.31 65.21 63.18 64.34 65.21
6 SPECT test 65.24 69.51 69.51 70.58 86.63 87.16 87.16
7 echocardiogram 65.15 75.75 75 75 75.75 75 75
TABLE 2: Logistic regression and neural network specification with sensitivity analysis for the medical
datasets
7. CONCLUSION
Finally, the conclusions of this work are based on the publicly available medical data sets. The
results of this study are more promising. In this research work an attempt was made to evaluate
logistic regression and neural network model with sensitivity analysis on publicly available
medical data sets. The classification accuracy is used to measure the performance of both the
models. From the experimental results it is confirmed that neural network model with sensitivity
analysis gives more efficient result.
8. REFERENCES
[1]. Luis Mariano Esteban Escaño, Gerardo Sanz Saiz, Francisco Javier López Lorente, Ángel
Borque Fernando and José Moría Vergara Ugarriza, “Logistic Regression Versus Neural
Networks for Medical Data”, Monografías del Seminario Matemático García de Galdeano
33, 245-252, 2006.
8. Raghavendra B.K., & S.K. Srivatsa
International Journal of Computer Science and Security (IJCSS), Volume (5) : Issue (5) : 2011 510
[2]. Bahar Tasdelen, Sema Helvaci, Hakan Kaleagasi, Aynur Ozge, “Artificial Neural Network
Analysis for Prediction of Headache Prognosis in Elderly Patients”, Turk J Med Sci 2009;
39(1); 5-12.
[3]. LeXu, Mo-Yuen Chow, and Xiao-Zhi Gao, “Comparisons of Logistic Regression and
Artificial Neural Network on Power Distribution Systems Fault Cause Identification”,
Proceedings of 2005 IEEE Mid-Summer Workshop on Soft Computing in Industrial
Applications (SMCia/05), Helsinki, Finland, June 28-30, 2005.
[4]. Fariba Shadabi and Dharmendra Sharma, “Comparison of Artificial Neural Networks with
Logistic Regression in Prediction of Kidney Transplant Outcomes”, Proceedings of the
2009 International Conference of Future Computer and Communication (ICFCC), 543-547,
2009.
[5]. V.S. Bourdes, S. Bonnevay, P.J.G. Lisbosa, M.S.H. Aung, S. Chabaud, T. Bachelot, D.
Perol and S. Negrier, “Breast Cancer Predictions by Neural Networks Analysis: a
Comparison with Logistic Regression”, Proceedings of the 29th
International Conference of
the IEEE Engineering in Medicine and Biology Society, Lyon, France, August 23-26, 2007,
5424-7.
[6]. Jack R. Brzezinski George J. Knaft, “Logistic Regression Modeling for Context-Based
Classification”, DEXA Database and Expert Systems Applications Workshop, 1999.
[7]. Seker H., Odetayo M., Petrovic D., Naguib R.N.G., Bartoli C., Alasio L., Lakshmi M.S.,
Sherbet G.V. (2002), “An Artificial Neural Network Based Feature Evaluation Index for the
Assessment of Clinical Factors in Breast Cancer Survival Analysis”, Proceedings of the
IEEE Canadian Conference on Electrical and Computer Engineering.
[8]. Marijana Zekic-Susac, Natasa Sarlija, Mirta Bensic, “Small Business Credit Scoring: A
Comparison of Logistic Regression, Neural Network, and Decision Tree Models”, 26
th
International Conference on Information Technology Interfaces (ITI 2004), Cavtat, Croatia,
265-270.
[9]. M Münevver Köküuer, Raouf N. G. Naguib, Peter Jančovič, H. Banfield Younghusband and
Roger Green, “A Comparison of Multi-Layer Neural Network and Logistic Regression in
Hereditary Non-Polyposis Colorectal Cancer Risk Assessment”, Proceedings of the 2005
IEEE Engineering in Medicine and Biology , 27
th
Annual Conference, Shanghai, China,
September 2005, 2417-2420.
[10]. Lisbosa P.J.G., and H. Wong (2001), “Are neural networks best used to help logistic
regression? An example from breast cancer survival analysis”, IEEE Transactions on
Neural Networks, 2472-2477.
[11]. Poh Lian Choong, and Christopher J.S. DeSilva (1996), “A Comparison of Maximum
Entropy Estimation and Multivariate Logistic Regression in the Prediction of Axillary Lymph
Node Metastasis in Early Breast Cancer Patients”, The 1996 IEEE International
Conference on Neural Networks, 1468-1473.
[12]. Neter J., Kutner M.H., Nachtsheim C.J., Wasserman W., Applied Linear Regression
Models, 3rd
Ed. 1996, Irwin, USA (ISBN 0-256-08601-X).
[13]. http://en.wikipedia.org/wiki/Logistic_regression
[14]. Portia A. Cerny, 2001, Datamining and Neural Networks from a Commercial Perspective,
Auckland, New Zealand Student of the Department of Mathematical Sciences, University of
Technology, Sydney, Australia.
9. Raghavendra B.K., & S.K. Srivatsa
International Journal of Computer Science and Security (IJCSS), Volume (5) : Issue (5) : 2011 511
[15]. C.L. Blake, C.J. Merz, “UCI repository of machine learning databases”.
[http://www.ics.uci.edu/~mlearn/ MLRepository.html], Department of Information and
Computer Science, University of California, Irvine.