These slides cover more advanced statistical applications, including those used in data science.
Each concept is introduced first, followed by an illustration and its use in a real context.
This document introduces an advanced statistical manual for Ayurveda research. It provides more advanced statistical applications, including those used in data science. The topics covered include repeated measures analysis, multiple linear regression, superiority/bioequivalence/non-inferiority trials, logistic regression, and other machine learning techniques. Examples from Ayurveda research are provided to illustrate key statistical concepts and their applications. The goal is to present concepts first, then illustrate them using real contexts in order to help students and researchers better understand and apply advanced statistics.
IRJET- Human Heart Disease Prediction using Ensemble Learning and Particle Sw... (IRJET Journal)
The document discusses using ensemble learning and particle swarm optimization to predict heart disease. It aims to select the machine learning algorithm from among AdaBoost, gradient descent, random forest, decision tree and Gaussian naive Bayes that achieves the highest accuracy. Particle swarm optimization is used to select important predictive features from the dataset. The proposed approach uses AdaBoost and particle swarm optimization to achieve an accuracy of 84.88% in predicting heart disease, with an error rate of 4%.
Predicting of Hosting Animal Centre Outcome Based on Supervised Machine Learn... (sushantparte)
This document is a research project submission for an MSc in Data Analytics at the National College of Ireland. The project uses supervised machine learning models to predict animal shelter outcomes from a dataset from the Austin Animal Center. Four classification models (logistic regression, neural network, XGBoost, and random forest) are implemented and evaluated on metrics such as accuracy, logarithmic loss, sensitivity, and specificity. The best-performing model is XGBoost, which achieves an accuracy of 65.33% on the Austin animal shelter outcomes dataset.
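The evaluation metrics named in this summary can all be computed directly from predictions. A minimal sketch for the binary case only (the shelter task has several outcome classes, so this is a simplification); the 0.5 decision threshold and the clipping constant `eps` are assumptions, not values from the project.

```python
import math

def binary_metrics(y_true, p_pred, threshold=0.5, eps=1e-15):
    """Accuracy, log loss, sensitivity and specificity for 0/1 labels.

    p_pred holds predicted probabilities of class 1; threshold and eps are assumed defaults.
    """
    y_hat = [1 if p >= threshold else 0 for p in p_pred]
    pairs = list(zip(y_true, y_hat))
    tp = sum(1 for y, h in pairs if y == 1 and h == 1)
    tn = sum(1 for y, h in pairs if y == 0 and h == 0)
    fp = sum(1 for y, h in pairs if y == 0 and h == 1)
    fn = sum(1 for y, h in pairs if y == 1 and h == 0)
    # Clip probabilities away from 0 and 1 so the logarithm is always defined.
    clipped = [min(max(p, eps), 1 - eps) for p in p_pred]
    log_loss = -sum(
        y * math.log(p) + (1 - y) * math.log(1 - p) for y, p in zip(y_true, clipped)
    ) / len(y_true)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "log_loss": log_loss,
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),  # true positive rate
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),  # true negative rate
    }

m = binary_metrics([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.1])
print(m["accuracy"], m["sensitivity"], m["specificity"])  # 0.75 0.5 1.0
```

Log loss penalizes confident wrong probabilities, which is why it is often reported alongside accuracy.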
Automatic Unsupervised Data Classification Using Jaya Evolutionary Algorithm (aciijournal)
The document describes an automatic unsupervised data classification method using the Jaya evolutionary algorithm. It proposes using Jaya to optimize multiple cluster validity indices (CVIs) simultaneously to determine the optimal number of clusters and cluster assignments. Twelve real-world datasets from different domains are used to evaluate the method. The results show that the proposed AutoJAYA algorithm is able to accurately detect the number of clusters in each dataset and achieve good performance according to various CVIs, demonstrating its effectiveness at automatic unsupervised data classification.
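The core of Jaya is a single, parameter-free update that moves each candidate toward the current best solution and away from the worst: x' = x + r1*(best - |x|) - r2*(worst - |x|), with r1 and r2 drawn uniformly from [0, 1] and the candidate kept only if it improves. The sketch below minimizes a generic function under box bounds with that update; it does not reproduce AutoJAYA's multi-objective cluster-validity setup, and `jaya_minimize`, the population size, and the iteration count are illustrative assumptions.

```python
import random

def jaya_minimize(f, bounds, pop_size=20, iters=200, seed=0):
    """Minimize f over box bounds with the Jaya update and greedy acceptance."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [f(x) for x in pop]
    for _ in range(iters):
        best = pop[min(range(pop_size), key=fit.__getitem__)]
        worst = pop[max(range(pop_size), key=fit.__getitem__)]
        for i, x in enumerate(pop):
            cand = []
            for j, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Move toward the best solution and away from the worst one.
                v = x[j] + r1 * (best[j] - abs(x[j])) - r2 * (worst[j] - abs(x[j]))
                cand.append(min(max(v, lo), hi))  # clamp to the search box
            fc = f(cand)
            if fc < fit[i]:  # keep the candidate only if it improves
                pop[i], fit[i] = cand, fc
    k = min(range(pop_size), key=fit.__getitem__)
    return pop[k], fit[k]

# Hypothetical test problem: the sphere function, minimum 0 at the origin.
sphere = lambda x: sum(v * v for v in x)
best_x, best_f = jaya_minimize(sphere, [(-5, 5), (-5, 5)])
print(best_x, best_f)
```

In an automatic clustering setting, f would instead score a candidate partition with one or more cluster validity indices.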
Implementation of Improved ID3 Algorithm to Obtain more Optimal Decision Tree (IJERD Editor)
Decision tree learning builds a predictive model that maps the items in a set to their respective target values in a way that holds for every element. Owing to its simplicity and effectiveness, it is used in statistics, data mining, and machine learning. Among the various strategies for constructing decision trees, ID3 is one of the simplest and most widely used algorithms, but when selecting a node it favors attributes that have many distinct values. This major shortcoming reduces the accuracy of the resulting tree. In this paper we propose an improvement to the ID3 algorithm using an association function (AF). Experimental results show that the improved ID3 algorithm overcomes this shortcoming of ID3 and improves its accuracy.
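The bias described above comes from ID3's information-gain criterion: an attribute with a distinct value for every row (such as a record ID) splits the data into pure singletons and so receives the maximum possible gain. A minimal sketch on a hypothetical six-row table; it demonstrates the shortcoming only, not the paper's association-function (AF) fix.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr, target):
    """ID3 split criterion: target entropy minus the weighted entropy after splitting on attr."""
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([r[target] for r in rows]) - remainder

# Hypothetical data: "id" is unique per row, "outlook" is a genuine attribute.
rows = [
    {"id": 1, "outlook": "sunny", "play": "no"},
    {"id": 2, "outlook": "sunny", "play": "no"},
    {"id": 3, "outlook": "rain", "play": "yes"},
    {"id": 4, "outlook": "rain", "play": "yes"},
    {"id": 5, "outlook": "overcast", "play": "yes"},
    {"id": 6, "outlook": "sunny", "play": "yes"},
]
print(information_gain(rows, "id", "play") > information_gain(rows, "outlook", "play"))  # True
```

The meaningless "id" attribute wins because its gain equals the full target entropy, even though it cannot generalize to new rows.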
IRJET- Classification of Chemical Medicine or Drug using K Nearest Neighb... (IRJET Journal)
This document proposes using a combination of K-nearest neighbors (KNN) and genetic algorithms to classify chemical medicine or drug data with improved accuracy. KNN is described as a simple and effective classification algorithm that stores training data instances. Genetic algorithms are presented as evolutionary algorithms useful for optimization problems. The proposed system applies genetic search to rank attribute importance, selects high-ranked attributes, and then applies both KNN and genetic algorithms to classify the drug data, aiming to improve classification accuracy over using either technique alone. The combination of KNN and genetic algorithms is expected to better optimize classification of complex medical data compared to other algorithms.
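The KNN step can be sketched as follows: the stored training instances are ranked by Euclidean distance to the query, and the majority label among the k nearest wins. This covers KNN alone; the genetic-search attribute ranking is not shown, and k=3 and the toy points are assumptions.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify query by majority vote among its k nearest stored training instances."""
    # Rank stored instances by Euclidean distance to the query point.
    neighbours = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical two-feature descriptors with two classes.
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(X, y, (0.2, 0.2)), knn_predict(X, y, (5.5, 5.5)))  # A B
```

Because KNN does no training beyond storing instances, the choice of attributes and their scaling drives its accuracy, which is what the genetic attribute ranking targets.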
EXTRACTING USEFUL RULES THROUGH IMPROVED DECISION TREE INDUCTION USING INFORM... (ijistjournal)
Classification is a widely used technique in data mining, where scalability and efficiency are the immediate problems for classification algorithms on large databases. We suggest improvements to the existing C4.5 decision tree algorithm. In this paper, attribute-oriented induction (AOI) and relevance analysis are combined with concept hierarchy knowledge and the HeightBalancePriority algorithm to construct a decision tree with multi-level mining. Priorities are assigned to attributes by evaluating information entropy at different levels of abstraction, and the decision tree is built with the HeightBalancePriority algorithm. Modified DMQL queries are used to understand and explore the shortcomings of the decision trees generated by the C4.5 classifier for an education dataset, and the results are compared with the proposed approach.
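C4.5's standard split criterion, the gain ratio, divides information gain by the entropy of the attribute's own value distribution (the split information), which penalizes many-valued attributes. A minimal sketch on a hypothetical table; it illustrates the entropy evaluation C4.5 relies on, not the HeightBalancePriority algorithm or the AOI steps proposed in the paper.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def gain_ratio(rows, attr, target):
    """C4.5 criterion: information gain divided by the split information of attr."""
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r[target])
    gain = entropy([r[target] for r in rows]) - sum(
        len(g) / len(rows) * entropy(g) for g in groups.values()
    )
    split_info = entropy([r[attr] for r in rows])  # large for many-valued attributes
    return gain / split_info if split_info > 0 else 0.0

# Hypothetical table: "id" is unique per row, "outlook" determines "play".
rows = [
    {"id": i, "outlook": o, "play": p}
    for i, (o, p) in enumerate(
        [("sunny", "no"), ("sunny", "no"), ("rain", "yes"),
         ("rain", "yes"), ("overcast", "yes"), ("overcast", "yes")]
    )
]
# Both attributes have the same information gain here, but gain ratio prefers "outlook".
print(gain_ratio(rows, "outlook", "play") > gain_ratio(rows, "id", "play"))  # True
```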
A Survey on Heart Disease Prediction Techniques (ijtsrd)
Heart disease has been the main cause of a huge number of deaths worldwide over the last few decades and has become the most life-threatening disease. The healthcare industry is rich in information, so there is a need to discover the hidden patterns and trends in it. For this purpose, data mining techniques can be applied to extract knowledge from large data sets. Many researchers have recently used several machine learning techniques to predict heart-related diseases, as these techniques can predict the disease effectively. Even though machine learning techniques prove effective in assisting decision makers, there is still scope for developing an accurate and efficient system to diagnose and predict heart disease, thereby easing doctors' work. This paper presents a survey of various techniques used for predicting heart disease and reviews their performance. G. Niranjana | Dr I. Elizabeth Shanthi "A Survey on Heart Disease Prediction Techniques" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-5 | Issue-2 , February 2021, URL: https://www.ijtsrd.com/papers/ijtsrd38349.pdf Paper URL: https://www.ijtsrd.com/computer-science/other/38349/a-survey-on-heart-disease-prediction-techniques/g-niranjana
Automatic Unsupervised Data Classification Using Jaya Evolutionary Algorithm (aciijournal)
In this paper we attempt to solve an automatic clustering problem by concurrently optimizing multiple objectives, namely automatic k-determination and a set of cluster validity indices (CVIs). The proposed automatic clustering technique uses the recent Jaya optimization algorithm as its underlying optimization strategy. This evolutionary technique aims to reach the global best solution rather than a local best solution in larger datasets. Through its exploration and exploitation, the proposed method detects the number of clusters automatically, finds an appropriate partitioning of the data sets, and reaches near-optimal values on the CVI frontiers. Twelve datasets of varying complexity are used to validate the performance of the proposed algorithm. The experiments show that the theoretical advantages of multi-objective clustering optimized with evolutionary approaches translate into realistic and scalable performance gains.
Research scholars evaluation based on guides view using id3 (eSAT Journals)
Abstract: Research scholars face many problems in their research and development activities while completing their research work at universities. This paper presents an efficient way to analyze the performance of research scholars based on feedback from guides and experts. A dataset is formed from this information, with the guides' view of each scholar as the outcome class attribute. The ID3 decision tree algorithm is applied to this dataset to construct a decision tree. Scholars can then enter test data consisting of attribute values to obtain the guides' view for that test data. The constructed tree can also be used to give scholars guidelines for improving their outcomes.
Adaptive Classification of Imbalanced Data using ANN with Particle of Swarm O... (ijtsrd)
Conventional classification algorithms can be limited in their performance on highly imbalanced datasets. A popular line of work for countering the problem of class imbalance has been the use of a variety of sampling strategies. In this communication, we focus on designing modifications to neural networks to properly handle the problem of class imbalance. We incorporate different rebalancing heuristics in ANN modeling, including cost-sensitive learning and over- and undersampling. These ANN-based methods are compared with various state-of-the-art approaches on a variety of datasets using several metrics, including G-mean, area under the receiver operating characteristic curve, F-measure, and area under the precision-recall curve. Many conventional methods, which can be categorized as sampling, cost-sensitive, or ensemble methods, include heuristic and task-dependent procedures. To achieve better classification performance with a formulation free of heuristics and task dependence, we propose an RBF-based network (RBF NN). Its objective function is the harmonic mean of various evaluation criteria derived from a confusion matrix, such as sensitivity, positive predictive value, and their counterparts for negatives. This objective function and its optimization are consistently formulated within the framework of CM-KLOGR, based on minimum classification error and generalized probabilistic descent (MCE/GPD) learning. Owing to the advantages of the harmonic mean, CM-KLOGR, and MCE/GPD, RBF NN improves these multifaceted performance measures in a well-balanced way. The paper presents the formulation of RBF NN and demonstrates its effectiveness through experiments that comparatively evaluate RBF NN on benchmark imbalanced datasets. Nitesh Kumar | Dr. Shailja Sharma "Adaptive Classification of Imbalanced Data using ANN with Particle of Swarm Optimization" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-5 , August 2019, URL: https://www.ijtsrd.com/papers/ijtsrd25255.pdf Paper URL: https://www.ijtsrd.com/computer-science/other/25255/adaptive-classification-of-imbalanced-data-using-ann-with-particle-of-swarm-optimization/nitesh-kumar
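Several of the metrics named above follow directly from a binary confusion matrix; the F-measure is the harmonic mean of positive predictive value and sensitivity, and the G-mean is the geometric mean of sensitivity and specificity. A minimal sketch, assuming the minority class is labeled positive and all denominators are nonzero; the counts in the example are made up.

```python
import math

def imbalance_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity, PPV, G-mean and F-measure from binary confusion counts.

    Assumes the minority class is the positive class and all denominators are nonzero.
    """
    sensitivity = tp / (tp + fn)   # recall on the positive (minority) class
    specificity = tn / (tn + fp)   # recall on the negative class
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "g_mean": math.sqrt(sensitivity * specificity),
        # Harmonic mean of PPV and sensitivity.
        "f_measure": 2 * ppv * sensitivity / (ppv + sensitivity),
    }

print(imbalance_metrics(tp=8, fn=2, fp=4, tn=86))
```

With these counts accuracy is 94%, yet the F-measure is 8/11 (about 0.73), which is why such metrics are preferred over accuracy on imbalanced data.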
A new model for iris data set classification based on linear support vector m... (IJECEIAES)
1. The authors propose a new model for classifying the iris data set using a linear support vector machine (SVM) classifier with genetic algorithm optimization of the SVM's C and gamma parameters.
2. Principal component analysis was used to reduce the iris data set features from four to three before classification.
3. The genetic algorithm was shown to optimize the SVM parameters, achieving 98.7% accuracy on the iris data set classification compared to 95.3% accuracy without parameter optimization.
Novel Frequency Domain Classification Algorithm Based On Parameter Weight Fac... (ahmedbohy)
This work proposes two new classification techniques for predicting hepatitis mortality using a dataset from Ljubljana University. The first technique estimates missing values by finding the minimum difference between attribute values of the instance with missing values and other instances. The second technique computes a weight factor for each attribute by correlating the decision attribute with other attributes, and classifies new instances using correlation in the frequency domain on the top seven attributes. Experimental results on 155 instances show the frequency domain technique achieved a mean accuracy of 90.4%, higher than the first technique and previous methods.
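One plausible reading of the first technique is nearest-instance (hot-deck) imputation: copy the missing value from the complete instance whose observed attribute values differ least from those of the incomplete one. The sketch assumes numeric attributes, the sum of absolute differences as the "difference", `None` as the missing-value marker, and at least one complete row; `impute_nearest` is a hypothetical helper and the paper's exact rule may differ.

```python
def impute_nearest(rows):
    """Fill None entries by copying from the most similar complete row.

    Similarity = sum of absolute differences over the attributes observed in the
    incomplete row (an assumed reading of "minimum difference").
    """
    complete = [r for r in rows if None not in r]
    filled = []
    for r in rows:
        if None not in r:
            filled.append(list(r))
            continue
        observed = [j for j, v in enumerate(r) if v is not None]
        donor = min(complete, key=lambda c: sum(abs(c[j] - r[j]) for j in observed))
        filled.append([donor[j] if v is None else v for j, v in enumerate(r)])
    return filled

# Hypothetical three-attribute instances; the third has a missing second attribute.
print(impute_nearest([[1.0, 2.0, 3.0], [10.0, 20.0, 30.0], [1.1, None, 2.9]])[2])  # [1.1, 2.0, 2.9]
```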
This document compares classification and regression models using the CARET package in R. Four classification algorithms are evaluated on Titanic survival data and three regression algorithms are evaluated on property liability data. For classification, random forests performed best based on the F-measure metric. For regression, gradient boosted models performed best based on RMSE. The document concludes that classification can predict the characteristics of Titanic survivors, while regression can predict property hazards.
Prediction of Dengue, Diabetes and Swine Flu using Random Forest Classificati... (IRJET Journal)
This document describes a disease prediction system that uses the Random Forest classification algorithm to predict Dengue, diabetes, and swine flu. The system trains on labeled datasets for each disease. It then takes user-entered symptoms as input and predicts the likelihood of each disease. If a disease is predicted to be positive, the system recommends a specialized doctor. The document discusses related work on disease prediction using data mining techniques. It provides an overview of how the Random Forest algorithm works for classification problems and ensemble learning. The proposed system aims to help users predict diseases and find appropriate doctors for treatment.
POSSIBILISTIC SHARPE RATIO BASED NOVICE PORTFOLIO SELECTION MODELS (cscpconf)
This document summarizes a research paper that proposes new portfolio selection models using possibilistic Sharpe ratio to account for uncertainty in fuzzy environments. It defines possibilistic moments like mean, variance, skewness, and risk premium for fuzzy numbers. It then defines possibilistic Sharpe ratio as the ratio of possibilistic risk premium to standard deviation. New bi-objective and multi-objective portfolio models are presented that maximize possibilistic Sharpe ratio and skewness to allow for asymmetric returns. The models are solved using a genetic algorithm and tested on stock price data to demonstrate the approach.
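To make the ratio concrete, the sketch below uses the Carlsson and Fullér possibilistic mean and variance of a triangular fuzzy return with peak a, left spread alpha and right spread beta: E = a + (beta - alpha)/6 and Var = (alpha + beta)^2/24. Treating these as the paper's moments is an assumption (its definitions, and its skewness term, may differ), and the asset parameters are hypothetical.

```python
import math

def possibilistic_sharpe(a, alpha, beta, risk_free):
    """Possibilistic Sharpe ratio of a triangular fuzzy return (a, alpha, beta).

    a is the most possible return; alpha and beta are the left and right spreads.
    Moments follow Carlsson and Fullér for triangular fuzzy numbers (assumed here).
    """
    mean = a + (beta - alpha) / 6            # possibilistic mean
    variance = (alpha + beta) ** 2 / 24      # possibilistic variance
    risk_premium = mean - risk_free
    return risk_premium / math.sqrt(variance)

# Hypothetical asset: peak return 10%, spreads of 4% either side, risk-free rate 2%.
print(possibilistic_sharpe(0.10, 0.04, 0.04, 0.02))
```

A right-skewed return (beta > alpha) raises the possibilistic mean without changing the variance, so it scores a higher ratio than its mirror image; this asymmetry is what the skewness objective in the models is meant to capture.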
Fuzzy logic applications for data acquisition systems of practical measurement (IJECEIAES)
In laboratory work, measurement errors, misreadings of measuring devices, similarity of experimental data, and a lack of understanding of practicum materials are often found. These lead to inaccurate and invalid data. As an alternative solution, fuzzy logic is applied to a data acquisition system that uses a web server. This research focuses on the design of a data acquisition system with the goal of reducing the error rate in measuring experimental data in the laboratory. Measurements on the laboratory practice module are taken as analog data, which are converted into digital data via an Arduino and stored on the server. To obtain valid data, the server processes the data using a fuzzy logic method. The valid data are integrated into a web server so that they can be accessed as needed. The results show that the fuzzy-logic-based data acquisition system is able to recommend measurement results for lab work based on the membership degree and truth value. The fuzzy logic selects measured data with a maximum error of 5% and chooses the measurement result with the minimum error rate.
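A minimal sketch of the selection step, assuming a known reference value for each measurement, percent error as the input variable, and a linear membership function for "valid" that is 1 at 0% error and falls to 0 at the 5% limit. The function names and the membership shape are hypothetical; the paper's fuzzy rules and web-server pipeline are not reproduced.

```python
def percent_error(measured, reference):
    """Absolute percentage error of a reading against a reference value."""
    return abs(measured - reference) / abs(reference) * 100

def validity_membership(error_pct, max_error_pct=5.0):
    """Linear membership in 'valid': 1 at 0% error, falling to 0 at max_error_pct."""
    return max(0.0, 1.0 - error_pct / max_error_pct)

def recommend(readings, reference, max_error_pct=5.0):
    """Return the reading with the highest validity degree among those within the error limit."""
    candidates = [
        (validity_membership(percent_error(r, reference), max_error_pct), r)
        for r in readings
        if percent_error(r, reference) <= max_error_pct
    ]
    return max(candidates)[1] if candidates else None

# Hypothetical voltage readings against a 5.0 V reference.
print(recommend([4.70, 4.95, 5.30, 5.20], 5.0))  # 4.95
```

Readings outside the limit are discarded, and among the rest the highest membership degree corresponds to the minimum error, matching the selection behavior described above.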
This document summarizes an article that proposes a novel cost-free learning (CFL) approach called ABC-SVM to address the class imbalance problem. The approach aims to maximize the normalized mutual information of the predicted and actual classes to balance errors and rejects without requiring cost information. It optimizes misclassification costs, SVM parameters, and feature selection simultaneously using an artificial bee colony algorithm. Experimental results on several datasets show the method performs effectively compared to sampling techniques for class imbalance.
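Normalized mutual information between actual and predicted classes can be computed from a confusion matrix. A minimal sketch; normalizing by the entropy of the actual classes is an assumed choice (other normalizations exist), and the ABC-SVM optimization itself is not shown.

```python
import math

def normalized_mutual_information(confusion):
    """NMI between actual classes (rows) and predicted classes (columns) of a count matrix.

    Normalized by the entropy of the actual classes (an assumed choice); needs at least
    two actual classes so that this entropy is nonzero.
    """
    n = sum(sum(row) for row in confusion)
    row_tot = [sum(row) for row in confusion]
    col_tot = [sum(col) for col in zip(*confusion)]
    mi = 0.0
    for i, row in enumerate(confusion):
        for j, nij in enumerate(row):
            if nij:  # empty cells contribute nothing
                mi += nij / n * math.log(nij * n / (row_tot[i] * col_tot[j]))
    h_actual = -sum(t / n * math.log(t / n) for t in row_tot if t)
    return mi / h_actual

# Always predicting the majority class of a 90/10 split.
print(normalized_mutual_information([[90, 0], [10, 0]]))  # 0.0
```

That classifier scores 90% accuracy but an NMI of 0, which illustrates why an information-based objective does not reward ignoring the minority class.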
Application of Hybrid Genetic Algorithm Using Artificial Neural Network in Da... (IOSRjournaljce)
The main purpose of data mining is to extract knowledge from large amounts of data. Artificial neural networks (ANNs) have already been applied in a variety of domains with remarkable success. This paper presents a hybrid model for stroke disease that integrates a genetic algorithm and the back-propagation algorithm. Selecting a good subset of features without sacrificing accuracy is of great importance for neural networks to be applied successfully in this area. In addition, the hybrid model yields better classification accuracy than the result produced by the genetic algorithm alone. In this study, a new hybrid model of neural networks and a genetic algorithm (GA) is used to initialize and optimize the connection weights of the ANN, thereby improving its performance, and it is applied to the medical problem of predicting stroke disease to verify the results.
IMPROVED NEURAL NETWORK PREDICTION PERFORMANCES OF ELECTRICITY DEMAND: MODIFY... (csandit)
Accurate prediction of electricity demand can bring extensive benefits to any country, as the forecast values help the relevant authorities make more appropriate decisions about electricity generation, transmission, and distribution. The literature reveals that, compared with conventional time series techniques, improved artificial intelligence approaches provide better prediction accuracy. However, the accuracy of predictions from intelligent approaches such as neural networks is strongly influenced by the correct selection of inputs and the number of neuro-forecasters used for prediction. This research shows how a cluster analysis that groups similar day types can help select a better set of neuro-forecasters in neural networks. Daily total electricity demand over five years was considered in the analysis, and each date was assigned to one of thirteen day types in a Sri Lankan context. Because a stochastic trend was visible over the years, the trend was removed by taking the first difference of the series before performing k-means clustering. Silhouette plots indicated three clusters, so three neuro-forecasters were used for prediction. This paper illustrates the proposed modified neural network procedure using electricity demand data.
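The two preprocessing steps described above, first-differencing to remove the stochastic trend and k-means clustering, can be sketched in plain Python. The sketch clusters one-dimensional values with a deterministic initialization and k fixed at 3 (the number the Silhouette plots suggested); how the paper summarizes each of the thirteen day types before clustering is not stated, so the input values are hypothetical and the silhouette step is omitted.

```python
def first_difference(series):
    """Remove a stochastic trend: d_t = y_t - y_(t-1)."""
    return [b - a for a, b in zip(series, series[1:])]

def kmeans_1d(values, k=3, iters=100):
    """Plain k-means on scalars with deterministic, evenly spread initial centres."""
    ordered = sorted(values)
    if k == 1:
        centres = [ordered[0]]
    else:
        centres = [ordered[(len(ordered) - 1) * i // (k - 1)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centres[c]))
            clusters[nearest].append(v)
        # Recompute each centre as its cluster mean (keep the old centre if a cluster is empty).
        new_centres = [sum(c) / len(c) if c else centres[i] for i, c in enumerate(clusters)]
        if new_centres == centres:
            break
        centres = new_centres
    return centres, clusters

# Hypothetical daily totals, then clustering of hypothetical day-type values.
print(first_difference([100, 103, 101, 106]))  # [3, -2, 5]
centres, clusters = kmeans_1d([1.0, 1.2, 0.9, 5.0, 5.1, 9.8, 10.2], k=3)
print(centres, clusters)
```

Each resulting cluster would then get its own neuro-forecaster, which is how the cluster count determines the number of forecasters.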
Today, huge amounts of data are generated every minute and transferred frequently. Although data is sometimes static, it is most commonly dynamic and transactional, and newly generated data is constantly added to the existing data. One approach to discovering knowledge from such incremental data is to rerun the algorithm on each modified data set, which is time-consuming. Moreover, analyzing the data sets properly requires an efficient classifier model, whose objective is to classify unlabeled data into the appropriate classes. The paper proposes a dimension reduction algorithm that can be applied in a dynamic environment to generate a reduced attribute set as a dynamic reduct, and an optimization algorithm that uses the reduct to build the corresponding classification system. When a new data set becomes available, the method analyzes it, modifies the reduct to fit the entire data set, and generates interesting optimal classification rule sets from the entire data set. The Rough Set Theory concepts of discernibility relation, attribute dependency, and attribute significance are integrated to generate the dynamic reduct set, and optimal classification rules are selected using particle swarm optimization (PSO), which not only reduces complexity but also helps achieve higher accuracy of the decision system. The proposed method has been applied to benchmark data sets collected from the UCI repository; the dynamic reduct is computed, and optimal classification rules are generated from it. Experimental results show the efficiency of the proposed method.
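The attribute dependency and attribute significance mentioned above have compact definitions: the dependency of decision D on attributes B is gamma_B(D) = |POS_B(D)| / |U|, the fraction of objects whose B-indiscernibility class carries a single decision value, and the significance of attribute a is the drop in gamma when a is removed. A minimal sketch on a hypothetical four-object table; the incremental (dynamic) reduct update and the PSO rule selection are not shown.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Equivalence classes of the indiscernibility relation induced by attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def dependency(rows, attrs, decision):
    """gamma_B(D): fraction of objects whose B-class has a single decision value (positive region)."""
    positive = sum(
        len(block)
        for block in partition(rows, attrs)
        if len({rows[i][decision] for i in block}) == 1
    )
    return positive / len(rows)

def significance(rows, attrs, decision, a):
    """Drop in dependency when attribute a is removed from attrs."""
    reduced = [x for x in attrs if x != a]
    return dependency(rows, attrs, decision) - dependency(rows, reduced, decision)

# Hypothetical decision table: condition attributes a, b, c and decision d.
U = [
    {"a": 1, "b": 0, "c": 1, "d": "yes"},
    {"a": 1, "b": 1, "c": 0, "d": "no"},
    {"a": 0, "b": 0, "c": 1, "d": "yes"},
    {"a": 0, "b": 1, "c": 1, "d": "no"},
]
C = ["a", "b", "c"]
print(dependency(U, C, "d"), significance(U, C, "d", "b"))  # 1.0 0.5
```

Here {b} alone already gives full dependency, so it is a reduct of this table; an incremental method would recheck such dependencies as new objects arrive.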
Comparative study of various supervised classification methods for analysing def... (eSAT Publishing House)
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together scientists, academicians, field engineers, scholars and students in related fields of Engineering and Technology.
The document provides an overview of concepts and topics to be covered in the MIS End Term Exam for AI and A2 on February 6th 2020, including: decision trees, classifier algorithms like ID3, CART and Naive Bayes; supervised and unsupervised learning; clustering using K-means; bias and variance; overfitting and underfitting; ensemble learning techniques like bagging and random forests; and the use of test and train data.
Evidential reasoning based decision system to select health care location (IJAAS Team)
Demand from the general public in Bangladesh for safe health care is rising rapidly as living standards improve. However, the allocation of limited and unbalanced medical resources is undermining the assurance of safe health for the people. Therefore, building new hospitals with a rational allocation of resources is imminent and significant. Selecting a site for a hospital is one of the crucial policy-related decisions taken by planners and policy makers. The hospital site selection process is inherently complicated because it involves many factors to be measured and evaluated. These factors are expressed in both objective and subjective ways, and a hierarchical relationship exists among them. In addition, qualitative factors are difficult to measure quantitatively, which results in incomplete data and hence uncertainty. It is therefore essential to address this uncertainty with an apt methodology; otherwise, the chosen site may be unsuitable. This paper demonstrates the application of a novel method, an intelligent decision system (IDS) based on the belief rule-based inference methodology (RIMER), which is capable of identifying a suitable hospital site while taking into account a large number of criteria of both subjective and objective nature.
Classification on multi label dataset using rule mining technique (eSAT Publishing House)
Performance evaluation of hepatitis diagnosis using single and multi classifi... (ahmedbohy)
The goal of our paper is to obtain superior accuracy in diagnosing hepatitis with different classifiers or multi-classifier fusion, using a worldwide data set from Ljubljana University. We implement several classification methods that are considered among the best algorithms in the medical field, and then fuse classifiers to find the best multi-classifier fusion approach. Classification accuracy is obtained from confusion matrices built with 10-fold cross-validation. The experimental results show that, for all data sets (complete, reduced, and with no missing values), multi-classifier fusion achieves better accuracy than the single classifiers.
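The evaluation setup described here can be sketched with two helpers: splitting instance indices into 10 folds and fusing several classifiers' predictions. Plurality voting is assumed as the fusion rule (the paper may use another), folds are contiguous rather than shuffled or stratified for clarity, and 155 is used only because it is the instance count reported for the Ljubljana hepatitis data above.

```python
from collections import Counter

def k_fold_indices(n, k=10):
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most one."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def majority_vote(predictions_per_classifier):
    """Fuse label lists from several classifiers by plurality vote per instance.

    Ties go to the label that appears first among the classifiers' votes.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions_per_classifier)]

folds = k_fold_indices(155, k=10)
print([len(f) for f in folds])  # [16, 16, 16, 16, 16, 15, 15, 15, 15, 15]
print(majority_vote([["live", "die", "live"], ["live", "live", "die"], ["die", "die", "die"]]))
# ['live', 'die', 'die']
```

In each round, one fold is held out for testing, every classifier is trained on the other nine, and the fused predictions on the held-out fold fill one slice of the confusion matrix.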
This document introduces an advanced statistical manual for Ayurveda research. It summarizes 14 statistical and machine learning techniques covered in the manual, including logistic regression, decision trees, random forests, support vector machines, naive Bayes classifiers, neural networks, and K-nearest neighbors. For each technique, it provides a brief conceptual overview and an illustrative example using Ayurveda data. The goal of the manual is to cover more advanced statistical applications relevant for data science in Ayurveda research.
This document describes a major project aimed at predicting health insurance costs using regression models. The objectives are to implement efficient algorithms that provide accurate predictions and to compare different regression algorithms. The project will use multiple linear regression, decision tree regression, and gradient boosting regression on health insurance data to predict costs. Literature on using machine learning and deep learning models for health insurance cost prediction is reviewed. The hardware, software, methods, and key concepts of multiple linear regression, decision tree regression, and gradient boosting regression are described.
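Multiple linear regression, the first of the three models listed, can be fitted by ordinary least squares through the normal equations (X^T X) b = X^T y. A minimal dependency-free sketch; the feature values are hypothetical stand-ins for insurance attributes, and for real data a library routine such as `numpy.linalg.lstsq` is numerically safer than forming X^T X explicitly.

```python
def fit_mlr(X, y):
    """Ordinary least squares for y = b0 + b1*x1 + ... + bp*xp via the normal equations."""
    A = [[1.0] + list(row) for row in X]  # design matrix with an intercept column
    p = len(A[0])
    # Augmented system [X^T X | X^T y].
    M = [
        [sum(r[i] * r[j] for r in A) for j in range(p)] + [sum(r[i] * yi for r, yi in zip(A, y))]
        for i in range(p)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, p):
            factor = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (M[i][p] - sum(M[i][j] * b[j] for j in range(i + 1, p))) / M[i][i]
    return b

def predict(b, x):
    """Predicted value for one feature vector x."""
    return b[0] + sum(bj * xj for bj, xj in zip(b[1:], x))

# Hypothetical two-feature data generated exactly from y = 1 + 2*x1 + 3*x2.
X = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [9, 8, 19, 18, 26]
b = fit_mlr(X, y)
print([round(v, 6) for v in b])  # [1.0, 2.0, 3.0]
```

Because the data are noise-free, the fit recovers the generating coefficients; with real insurance data the residuals would be summarized with a metric such as RMSE when comparing against the tree-based regressors.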
A Survey on Heart Disease Prediction Techniques ijtsrd
Heart disease has been the main cause of a huge number of deaths in the world over the last few decades and has become the most life-threatening disease. The health care industry is rich in information, so there is a need to discover the hidden patterns and trends in it. For this purpose, data mining techniques can be applied to extract knowledge from large sets of data. In recent times, many researchers have been using machine learning techniques to predict heart-related diseases, as these techniques can predict the disease effectively. Even though machine learning techniques prove effective in assisting decision makers, there is still scope for developing an accurate and efficient system to diagnose and predict heart diseases, thereby making doctors' work easier. This paper presents a survey of various techniques used for predicting heart disease and reviews their performance. G. Niranjana | Dr I. Elizabeth Shanthi "A Survey on Heart Disease Prediction Techniques" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-5 | Issue-2, February 2021, URL: https://www.ijtsrd.com/papers/ijtsrd38349.pdf Paper Url: https://www.ijtsrd.com/computer-science/other/38349/a-survey-on-heart-disease-prediction-techniques/g-niranjana
Automatic Unsupervised Data Classification Using Jaya Evolutionary Algorithm aciijournal
In this paper we attempt to solve an automatic clustering problem by optimizing multiple objectives concurrently, namely automatic determination of k and a set of cluster validity indices (CVIs). The proposed automatic clustering technique uses the recent Jaya optimization algorithm as its underlying optimization strategy. This evolutionary technique aims to attain the global best solution rather than a local best solution in larger datasets. The exploration and exploitation performed by the proposed method detect the number of clusters automatically, find an appropriate partitioning of the datasets, and yield near-optimal values on the CVI frontiers. Twelve datasets of varying complexity are used to validate the performance of the proposed algorithm. The experiments show that the theoretical advantages of multi-objective clustering optimized with evolutionary approaches translate into realistic and scalable performance gains.
Research scholars evaluation based on guides view using id3 eSAT Journals
Abstract: Research scholars face many problems in their research and development activities while completing their research work in universities. This paper gives an efficient way of analyzing the performance of a research scholar based on feedback from guides and experts, and a dataset is formed from this information. The outcome class attribute is the guide's view of the scholar. We apply the ID3 decision tree algorithm to this dataset to construct the decision tree. Scholars can then enter testing data consisting of attribute values to obtain the guides' view for that testing data. The constructed tree can also be used to give scholars guidelines for improving their outcomes.
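ID3 grows its tree by repeatedly splitting on the attribute with the highest information gain. The plain-Python sketch below shows only that attribute-selection step. The attribute names (`publications`, `attendance`) and the guide views are hypothetical and are not taken from the paper's dataset.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy of the labels minus the weighted entropy after splitting on attr."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def best_attribute(rows, labels, attrs):
    """ID3's choice for the next split: the attribute with maximum information gain."""
    return max(attrs, key=lambda a: information_gain(rows, labels, a))

# Hypothetical feedback table: the class is the guide's view of each scholar.
rows = [
    {"publications": "high", "attendance": "regular"},
    {"publications": "high", "attendance": "irregular"},
    {"publications": "low", "attendance": "regular"},
    {"publications": "low", "attendance": "irregular"},
]
views = ["good", "good", "needs_improvement", "needs_improvement"]
```

In this toy table `publications` separates the two views perfectly, with a gain of 1 bit, and `attendance` carries no information, with a gain of 0. ID3 would therefore split on `publications` first and then recurse on each branch.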
Adaptive Classification of Imbalanced Data using ANN with Particle of Swarm O...ijtsrd
Traditional classification algorithms can be limited in their performance on highly imbalanced datasets. A popular stream of work for countering the problem of class imbalance has been the use of a variety of sampling strategies. In this paper, we focus on designing modifications to neural networks so that they handle class imbalance appropriately. We incorporate different rebalancing heuristics in ANN modelling, including cost-sensitive learning and over- and under-sampling. These ANN-based methods are compared with various state-of-the-art approaches on a variety of datasets using different metrics, including G-mean, area under the receiver operating characteristic curve, F-measure, and area under the precision-recall curve. Many common methods, which can be categorized as sampling, cost-sensitive, or ensemble methods, include heuristic and task-dependent procedures. To achieve better classification performance with a formulation that is free of heuristics and task dependence, we propose an RBF-based network (RBF NN). Its objective function is the harmonic mean of various evaluation criteria derived from a confusion matrix, such as sensitivity, positive predictive value, and their counterparts for the negative class. This objective function and its optimization are consistently formulated in the framework of CM-KLOGR, based on minimum classification error and generalized probabilistic descent (MCE/GPD) learning. Owing to the benefits of the harmonic mean, CM-KLOGR, and MCE/GPD, RBF NN improves multiple performance measures in a well-balanced way. The paper presents the formulation of RBF NN and demonstrates its effectiveness through experiments that comparatively evaluated RBF NN on benchmark imbalanced datasets. Nitesh Kumar | Dr. Shailja Sharma "Adaptive Classification of Imbalanced Data using ANN with Particle of Swarm Optimization" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-5, August 2019, URL: https://www.ijtsrd.com/papers/ijtsrd25255.pdf Paper URL: https://www.ijtsrd.com/computer-science/other/25255/adaptive-classification-of-imbalanced-data-using-ann-with-particle-of-swarm-optimization/nitesh-kumar
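On imbalanced data, accuracy can look high even when most of the minority class is missed. The sketch below computes G-mean, F-measure, and a harmonic mean of four confusion-matrix rates. Two things here are assumptions: that the RBF NN objective combines exactly these four rates (sensitivity, specificity, PPV, and NPV), and the example counts used to illustrate them.

```python
import math

def imbalance_metrics(tp, fn, fp, tn):
    """Confusion-matrix metrics for imbalanced classification.
    Assumes no row or column of the confusion matrix is empty."""
    sensitivity = tp / (tp + fn)   # recall on the positive (minority) class
    specificity = tn / (tn + fp)   # recall on the negative (majority) class
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    npv = tn / (tn + fn)           # negative predictive value
    rates = [sensitivity, specificity, ppv, npv]
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "g_mean": math.sqrt(sensitivity * specificity),
        "f_measure": 2 * ppv * sensitivity / (ppv + sensitivity),
        # The harmonic mean is pulled toward the weakest rate, so no class can be ignored.
        "harmonic_mean": 0.0 if min(rates) == 0 else len(rates) / sum(1 / r for r in rates),
    }
```

Take 8 true positives, 2 false negatives, 4 false positives, and 86 true negatives. Accuracy is 0.94, but the F-measure is only about 0.73. This is the gap that motivates metrics such as G-mean and the harmonic-mean objective.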
A new model for iris data set classification based on linear support vector m...IJECEIAES
1. The authors propose a new model for classifying the iris data set using a linear support vector machine (SVM) classifier with genetic algorithm optimization of the SVM's C and gamma parameters.
2. Principal component analysis was used to reduce the iris data set features from four to three before classification.
3. The genetic algorithm was shown to optimize the SVM parameters, achieving 98.7% accuracy on the iris data set classification compared to 95.3% accuracy without parameter optimization.
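The genetic-algorithm step can be sketched in plain Python as a search over (log10 C, log10 gamma). In the described model, each fitness evaluation would be the cross-validated accuracy of an SVM trained on the three PCA components. Here `fitness` is a hypothetical smooth stand-in, so the example runs without an SVM library. The operators used (truncation selection, uniform crossover, Gaussian mutation, and elitism) are choices made for this sketch, not details taken from the paper.

```python
import random

def genetic_search(fitness, bounds, pop_size=20, generations=40, mutation_sd=0.3, seed=0):
    """Maximise fitness over a box of real-valued genes; returns (best, history)."""
    rng = random.Random(seed)

    def clip(value, i):
        lo, hi = bounds[i]
        return min(max(value, lo), hi)

    population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        parents = ranked[: pop_size // 2]          # truncation selection
        children = [ranked[0][:]]                  # elitism: the best individual survives
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each gene comes from one parent at random.
            child = [a[i] if rng.random() < 0.5 else b[i] for i in range(len(bounds))]
            # Gaussian mutation, clipped back into the search box.
            child = [clip(g + rng.gauss(0, mutation_sd), i) for i, g in enumerate(child)]
            children.append(child)
        population = children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history

def fitness(genes):
    """Hypothetical stand-in for CV accuracy, peaking at C = 10, gamma = 0.01."""
    log_c, log_gamma = genes
    return -((log_c - 1) ** 2) - (log_gamma + 2) ** 2
```

Because of elitism, the best fitness never decreases from one generation to the next. The returned genes are in log10 units, so the tuned parameters would be recovered as `C = 10 ** best[0]` and `gamma = 10 ** best[1]`.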
Novel Frequency Domain Classification Algorithm Based On Parameter Weight Fac...ahmedbohy
This work proposes two new classification techniques for predicting hepatitis mortality using a dataset from Ljubljana University. The first technique estimates missing values by finding the minimum difference between attribute values of the instance with missing values and other instances. The second technique computes a weight factor for each attribute by correlating the decision attribute with other attributes, and classifies new instances using correlation in the frequency domain on the top seven attributes. Experimental results on 155 instances show the frequency domain technique achieved a mean accuracy of 90.4%, higher than the first technique and previous methods.
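One reading of the first technique is a form of nearest-neighbour imputation. Each missing value is filled from the complete instance whose observed attributes differ least from the incomplete one. The sketch below uses the sum of absolute differences as that distance. Both the distance and the toy rows are assumptions for illustration, not the paper's exact rule.

```python
def impute_nearest(rows):
    """Fill None entries from the complete row with the smallest summed absolute
    difference on the attributes that the incomplete row does have."""
    complete = [r for r in rows if None not in r]
    filled = []
    for r in rows:
        if None not in r:
            filled.append(list(r))
            continue
        observed = [i for i, v in enumerate(r) if v is not None]
        donor = min(complete, key=lambda c: sum(abs(r[i] - c[i]) for i in observed))
        filled.append([donor[i] if v is None else v for i, v in enumerate(r)])
    return filled
```

For the row `[1.1, None, 3.2]`, the complete row `[1.0, 2.0, 3.0]` differs by only 0.3 on the observed attributes, so it donates the missing value 2.0.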
This document compares classification and regression models using the CARET package in R. Four classification algorithms are evaluated on Titanic survival data and three regression algorithms are evaluated on property liability data. For classification, random forests performed best based on the F-measure metric. For regression, gradient boosted models performed best based on RMSE. The document concludes classification can predict Titanic survivor characteristics while regression can predict property hazards.
Prediction of Dengue, Diabetes and Swine Flu using Random Forest Classificati...IRJET Journal
This document describes a disease prediction system that uses the Random Forest classification algorithm to predict Dengue, diabetes, and swine flu. The system trains on labeled datasets for each disease. It then takes user-entered symptoms as input and predicts the likelihood of each disease. If a disease is predicted to be positive, the system recommends a specialized doctor. The document discusses related work on disease prediction using data mining techniques. It provides an overview of how the Random Forest algorithm works for classification problems and ensemble learning. The proposed system aims to help users predict diseases and find appropriate doctors for treatment.
EXTRACTING USEFUL RULES THROUGH IMPROVED DECISION TREE INDUCTION USING INFORM...ijistjournal
Classification is a widely used technique in the data mining domain, where scalability and efficiency are the immediate problems for classification algorithms on large databases. We suggest improvements to the existing C4.5 decision tree algorithm. In this paper, attribute-oriented induction (AOI) and relevance analysis are combined with concept hierarchy knowledge and the HeightBalancePriority algorithm to construct the decision tree, together with multi-level mining. Priorities are assigned to attributes by evaluating information entropy at different levels of abstraction when building the decision tree with the HeightBalancePriority algorithm. Modified DMQL queries are used to understand and explore the shortcomings of the decision trees generated by the C4.5 classifier for an education dataset, and the results are compared with the proposed approach.
POSSIBILISTIC SHARPE RATIO BASED NOVICE PORTFOLIO SELECTION MODELS cscpconf
This document summarizes a research paper that proposes new portfolio selection models using possibilistic Sharpe ratio to account for uncertainty in fuzzy environments. It defines possibilistic moments like mean, variance, skewness, and risk premium for fuzzy numbers. It then defines possibilistic Sharpe ratio as the ratio of possibilistic risk premium to standard deviation. New bi-objective and multi-objective portfolio models are presented that maximize possibilistic Sharpe ratio and skewness to allow for asymmetric returns. The models are solved using a genetic algorithm and tested on stock price data to demonstrate the approach.
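The following is a worked sketch of a possibilistic Sharpe ratio. It assumes the return is a trapezoidal fuzzy number A = (a, b, α, β), with core [a, b] and left and right spreads α and β. It uses the Carlsson and Fullér possibilistic mean and variance:

- E(A) = (a+b)/2 + (β−α)/6
- Var(A) = ((b−a)/2 + (α+β)/6)² + (α+β)²/72

The ratio is then the risk premium E(A) − r_f divided by √Var(A). The paper's own definitions, including its skewness term, may differ, and the numbers in the usage note are illustrative only.

```python
import math

def possibilistic_mean(a, b, alpha, beta):
    """Carlsson-Fuller possibilistic mean of a trapezoidal fuzzy number (a, b, alpha, beta)."""
    return (a + b) / 2 + (beta - alpha) / 6

def possibilistic_variance(a, b, alpha, beta):
    """Carlsson-Fuller possibilistic variance of the same trapezoidal fuzzy number."""
    return ((b - a) / 2 + (alpha + beta) / 6) ** 2 + (alpha + beta) ** 2 / 72

def possibilistic_sharpe(a, b, alpha, beta, risk_free):
    """Possibilistic risk premium divided by possibilistic standard deviation."""
    premium = possibilistic_mean(a, b, alpha, beta) - risk_free
    return premium / math.sqrt(possibilistic_variance(a, b, alpha, beta))
```

As an example, take a symmetric triangular return centred at 10% with spreads of 6%, and a 4% risk-free rate. The mean is 0.10 and the variance is 0.0004 + 0.0002 = 0.0006, which gives a ratio of 0.06 / √0.0006 ≈ 2.45.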
Fuzzy logic applications for data acquisition systems of practical measurement IJECEIAES
In laboratory work, errors in measurement and in reading the measuring devices, similarity of experimental data, and lack of understanding of practicum materials are often found. These lead to inaccurate and invalid data. As an alternative solution, fuzzy logic is applied to the data acquisition system using a web server. This research focuses on the design of a data acquisition system that reduces the error rate in measuring experimental data in the laboratory. Measurements on the laboratory practice module are made by taking the analog data produced by the measurement. The data are then converted into digital data via an Arduino and stored on the server. To obtain valid data, the server processes the data using the fuzzy logic method. The valid data are integrated into a web server so that they can be accessed as needed. The results showed that the fuzzy-logic-based data acquisition system is able to recommend measurement results for lab work based on the membership degree and the truth value. The fuzzy logic selects measured data with a maximum error of 5% and chooses the measurement result with the minimum error rate.
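The selection rule can be illustrated with a triangular membership function over the relative error. Membership is 1 at zero error and falls linearly to 0 at the 5% limit. The reference value, the readings, and the membership shape are assumptions for illustration; the paper's fuzzy rules and truth values are not reproduced.

```python
def membership(reading, reference, max_error=0.05):
    """Triangular membership: 1 at zero relative error, 0 at or beyond max_error."""
    error = abs(reading - reference) / abs(reference)
    return max(0.0, 1.0 - error / max_error)

def select_measurement(readings, reference, max_error=0.05):
    """Keep readings within max_error, then recommend the one with the highest membership."""
    accepted = [r for r in readings if abs(r - reference) / abs(reference) <= max_error]
    best = max(accepted, key=lambda r: membership(r, reference, max_error)) if accepted else None
    return best, accepted
```

Suppose the reference is 10.0 and the readings are 10.3, 9.8, and 11.0. The reading 11.0 (10% error) is rejected. Of the two accepted readings, 9.8 (2% error, membership 0.6) is recommended over 10.3 (3% error, membership 0.4).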
This document summarizes an article that proposes a novel cost-free learning (CFL) approach called ABC-SVM to address the class imbalance problem. The approach aims to maximize the normalized mutual information of the predicted and actual classes to balance errors and rejects without requiring cost information. It optimizes misclassification costs, SVM parameters, and feature selection simultaneously using an artificial bee colony algorithm. Experimental results on several datasets show the method performs effectively compared to sampling techniques for class imbalance.
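The quantity that the CFL approach maximizes can be computed directly from a confusion matrix. The sketch below normalizes the mutual information by the entropy of the actual classes. That normalization is one common choice and is assumed here, because the summary does not state which one ABC-SVM uses.

```python
import math

def normalized_mutual_information(cm):
    """I(actual; predicted) / H(actual) for a confusion matrix cm[actual][predicted].
    Assumes at least two actual classes are present (otherwise H(actual) = 0)."""
    total = sum(sum(row) for row in cm)
    p_actual = [sum(row) / total for row in cm]
    p_pred = [sum(row[j] for row in cm) / total for j in range(len(cm[0]))]
    mi = 0.0
    for i, row in enumerate(cm):
        for j, count in enumerate(row):
            if count:
                p = count / total
                mi += p * math.log2(p / (p_actual[i] * p_pred[j]))
    h_actual = -sum(p * math.log2(p) for p in p_actual if p > 0)
    return mi / h_actual
```

A perfect classifier scores 1. Predictions that are statistically independent of the true class, such as `[[2, 2], [3, 3]]`, score 0. No misclassification costs are needed for this comparison.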
Application of Hybrid Genetic Algorithm Using Artificial Neural Network in Da...IOSRjournaljce
The main purpose of data mining is to extract knowledge from large amounts of data. Artificial neural networks (ANN) have already been applied in a variety of domains with remarkable success. This paper presents the application of a hybrid model for stroke disease that integrates a genetic algorithm and the back-propagation algorithm. Selecting a good subset of features without sacrificing accuracy is of great importance for neural networks to be applied successfully in this area. In addition, the hybrid model leads to further improved classification accuracy compared to the result produced by the genetic algorithm alone. In this study, a new hybrid model of neural networks and a genetic algorithm (GA) is used to initialize and optimize the connection weights of the ANN so as to improve its performance, and the model has been applied to the medical problem of predicting stroke disease to verify the results.
IMPROVED NEURAL NETWORK PREDICTION PERFORMANCES OF ELECTRICITY DEMAND: MODIFY...csandit
Accurate prediction of electricity demand can bring extensive benefits to any country as the
forecast values help the relevant authorities to take decisions regarding electricity generation,
transmission and distribution more appropriately. The literature reveals that, when compared
to conventional time series techniques, the improved artificial intelligent approaches provide
better prediction accuracies. However, the accuracy of predictions using intelligent approaches
like neural networks is strongly influenced by the correct selection of inputs and the number of
neuro-forecasters used for prediction. This research shows how a cluster analysis performed to
group similar day types could contribute towards selecting a better set of neuro-forecasters in
neural networks. Daily total electricity demands for five years were considered for the analysis
and each date was assigned to one of the thirteen day-types, in a Sri Lankan context. As a
stochastic trend could be seen over the years, prior to performing the k-means clustering, the
trend was removed by taking the first difference of the series. Three different clusters were
found using Silhouette plots, and thus three neuro-forecasters were used for predictions. This
paper illustrates the proposed modified neural network procedure using electricity demand
data.
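The preprocessing and cluster-count steps can be sketched on scalar values, under a few simplifying assumptions. First differencing removes the stochastic trend. Plain k-means is then run for several candidate k, and the k with the highest mean silhouette is kept. The paper clusters dates by day type, whereas this one-dimensional version and its toy values are illustrative only.

```python
def first_difference(series):
    """Remove a stochastic trend by differencing: d[t] = x[t+1] - x[t]."""
    return [b - a for a, b in zip(series, series[1:])]

def kmeans_1d(values, k, iters=100):
    """Plain k-means on scalars (k >= 2), initialised at evenly spaced order statistics."""
    ordered = sorted(values)
    n = len(ordered)
    centers = [ordered[round(i * (n - 1) / (k - 1))] for i in range(k)]
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def mean_silhouette(values, labels):
    """Average silhouette s = (b - a) / max(a, b); points in singleton clusters score 0."""
    scores = []
    for i, v in enumerate(values):
        own = [abs(v - w) for j, w in enumerate(values) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within the point's own cluster
        b = min(                 # mean distance to the nearest other cluster
            sum(abs(v - w) for w, lab in zip(values, labels) if lab == c) / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

def choose_k(values, candidates=(2, 3, 4)):
    """Pick the number of clusters with the highest mean silhouette."""
    return max(candidates, key=lambda k: mean_silhouette(values, kmeans_1d(values, k)))
```

With three well-separated groups of differenced values, the silhouette criterion selects k = 3. Under the procedure described above, that would correspond to three neuro-forecasters.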
In the present day, a huge amount of data is generated every minute and transferred frequently. Although
the data is sometimes static, most commonly it is dynamic and transactional. Newly generated data
is constantly added to the old/existing data. To discover knowledge from this
incremental data, one approach is to run the algorithm repeatedly on the modified datasets, which is time
consuming. Moreover, to analyze the datasets properly, an efficient classifier model must be constructed.
The objective of developing such a classifier is to classify an unlabeled dataset into appropriate classes. The
paper proposes a dimension reduction algorithm that can be applied in a dynamic environment to
generate a reduced attribute set as a dynamic reduct, and an optimization algorithm that uses the
reduct to build the corresponding classification system. The method analyzes each new dataset when it
becomes available and modifies the reduct accordingly to fit the entire dataset. It then generates interesting,
optimal classification rule sets from the entire dataset. The concepts of discernibility relation,
attribute dependency and attribute significance from Rough Set Theory are integrated to generate the
dynamic reduct set. Optimal classification rules are selected using the PSO method, which not only
reduces the complexity but also helps to achieve higher accuracy of the decision system. The proposed
method has been applied to several benchmark datasets collected from the UCI repository: the dynamic
reduct is computed, and optimal classification rules are generated from it. Experimental
results show the efficiency of the proposed method.
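The rough-set quantities named above are easy to compute on a small decision table. In this sketch, the dependency degree γ(C, D) is the fraction of objects whose equivalence class under the condition attributes lies entirely inside one decision class (the positive region). An attribute's significance is the drop in γ when that attribute is removed. The table is hypothetical, and the dynamic-reduct update and PSO rule selection are not shown.

```python
from collections import defaultdict

def dependency(rows, attrs, decision):
    """gamma(attrs, decision) = |positive region| / |universe|."""
    blocks = defaultdict(list)  # equivalence classes under the condition attributes
    for r in rows:
        blocks[tuple(r[a] for a in attrs)].append(r[decision])
    positive = sum(len(ds) for ds in blocks.values() if len(set(ds)) == 1)
    return positive / len(rows)

def significance(rows, attrs, attr, decision):
    """How much dependency is lost when attr is dropped from attrs."""
    rest = [a for a in attrs if a != attr]
    return dependency(rows, attrs, decision) - dependency(rows, rest, decision)

# Hypothetical decision table with condition attributes a, b and decision d.
table = [
    {"a": 0, "b": 0, "d": "no"},
    {"a": 0, "b": 1, "d": "yes"},
    {"a": 1, "b": 0, "d": "yes"},
    {"a": 1, "b": 1, "d": "yes"},
]
```

Here {a, b} determines d completely (γ = 1), while either attribute alone gives γ = 0.5. Both attributes are therefore indispensable, and the reduct is {a, b}.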
Comparative study of various supervised classification methods for analysing def... eSAT Publishing House
The document provides an overview of concepts and topics to be covered in the MIS End Term Exam for AI and A2 on February 6th 2020, including: decision trees, classifier algorithms like ID3, CART and Naive Bayes; supervised and unsupervised learning; clustering using K-means; bias and variance; overfitting and underfitting; ensemble learning techniques like bagging and random forests; and the use of test and train data.
Evidential reasoning based decision system to select health care location IJAAS Team
The general public's demand for safe health care in Bangladesh is rising rapidly as living standards improve. However, the limited and unbalanced allocation of medical resources is undermining the assurance of safe health for the people. Therefore, building new hospitals with a rational allocation of resources is urgent and significant. The site selection for establishing a hospital is one of the crucial policy-related decisions taken by planners and policy makers. The process of hospital site selection is inherently complicated because it involves many factors to be measured and evaluated. These factors are expressed in both objective and subjective ways, and a hierarchical relationship exists among them. In addition, it is difficult to measure qualitative factors quantitatively, which results in incomplete data and hence uncertainty. Besides, it is essential to address this uncertainty with an apt methodology; otherwise, the decision to choose a suitable site will be inapt. Therefore, this paper demonstrates the application of a novel method, an intelligent decision system (IDS) based on the belief rule-based inference methodology (RIMER). The method is capable of identifying a suitable hospital site by taking into account a large number of criteria, including factors of both subjective and objective nature.
Classification on multi label dataset using rule mining technique eSAT Publishing House
Analysis on Data Mining Techniques for Heart Disease Dataset IRJET Journal
This document analyzes various data mining techniques for classifying heart disease datasets. It compares the performance of classification algorithms like decision trees and lazy learning on aspects like time taken to build models. The algorithms are tested on a heart disease dataset from a public repository using the KEEL data mining tool. Decision trees and k-nearest neighbors are implemented using distance functions like Euclidean and HVDM across different validation modes. The results show that k-nearest neighbors with no validation is the most efficient algorithm for predicting heart disease, taking the least time to build models of the dataset. The study aims to determine the optimal classification algorithm for heart disease prediction systems.
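The short model-building time reported for k-nearest neighbors follows from it being a lazy learner. "Training" only stores the examples, and all distance computation happens at prediction time. The sketch below uses Euclidean distance only; HVDM for mixed attribute types and the KEEL tool are not reproduced, and the feature values are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training points nearest to query (Euclidean distance)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

# Hypothetical two-feature records labelled 0 (no disease) or 1 (disease).
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = [0, 0, 0, 1, 1, 1]
```

Sorting all training points for every query costs O(n log n) per prediction. The time saved at model-building is therefore paid back when classifying.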
IRJET- Machine Learning Classification Algorithms for Predictive Analysis in ...IRJET Journal
This document discusses machine learning classification algorithms and their applications for predictive analysis in healthcare. It provides an overview of data mining techniques like association, classification, clustering, prediction, and sequential patterns. Specific classification algorithms discussed include Naive Bayes, Support Vector Machine, Decision Trees, K-Nearest Neighbors, Neural Networks, and Bayesian Methods. The document examines examples of these algorithms being used for disease diagnosis, prognosis, and healthcare management. It analyzes their predictive performance on datasets for conditions like breast cancer, heart disease, and ICU readmissions. Overall, the document reviews how machine learning techniques can enhance predictive accuracy for various healthcare problems.
The document discusses various statistical methodologies that can be applied to Ayurveda research, including experimentation, surveys, case-control studies, meta-analysis, survival studies, and time series analysis. It provides an overview of how these methods are currently used in Ayurveda research and highlights some areas that could be improved, such as employing stratification and larger sample sizes. Logistic regression and decision trees are presented as effective analytical techniques for case-control studies.
Assigning Scores For Ordered Categorical Responses Mary Montoya
This document summarizes a research article that proposes a new method for assigning scores to ordered categorical response variables in statistical analysis. Specifically, it discusses the ordered stereotype model, which allows for uneven spacing between categories of an ordinal variable through estimated score parameters. The article presents simulation studies showing the disadvantages of assuming equal spacing, and applies the ordered stereotype model to a real dataset, demonstrating non-equal spacing. It also proposes a new median measure for ordinal data based on estimated score parameters from the ordered stereotype model.
MULTI MODEL DATA MINING APPROACH FOR HEART FAILURE PREDICTION IJDKP
Developing predictive modelling solutions for risk estimation is extremely challenging in health-care
informatics. Risk estimation involves integrating heterogeneous clinical sources with different
representations from different health-care providers, making the task increasingly complex. Such sources are
typically voluminous and diverse, and they change significantly over time. Therefore, distributed and parallel
computing tools, collectively termed big data tools, are needed to synthesize these sources and help the physician
make the right clinical decisions. In this work we propose a multi-model predictive architecture, a novel
approach for combining the predictive ability of multiple models for better prediction accuracy. We
demonstrate the effectiveness and efficiency of the proposed work on data from the Framingham Heart Study.
Results show that the proposed multi-model predictive architecture provides better accuracy than
the best-model approach. By modelling the error of the predictive models, we are able to choose a subset of models
that yields accurate results. More information was modelled into the system by multi-level mining, which
resulted in enhanced predictive accuracy.
Health Care Application using Machine Learning and Deep Learning IRJET Journal
This document presents a study on using machine learning and deep learning techniques for healthcare applications like disease prediction. It discusses algorithms like logistic regression, decision trees, random forests, SVMs and deep learning models like VGG16 applied on various disease datasets. For diabetes, heart and liver diseases, ML algorithms were used while CNN models were used for malaria and pneumonia image datasets. Random forest achieved the highest accuracy of 84.81% for diabetes prediction, SVM had 81.57% accuracy for heart disease and random forest was best at 83.33% for liver disease. The VGG16 model attained accuracies of 94.29% and 95.48% for malaria and pneumonia respectively. The study aims to develop an intelligent healthcare application for predicting different
A comparative analysis of classification techniques on medical data sets eSAT Publishing House
HEALTH PREDICTION ANALYSIS USING DATA MINING Ashish Salve
As we know, the health care industry is completely based on assumptions, which are then tested and verified through various tests, and patients have to depend on the doctor's knowledge of the topic. So we built a system that uses data mining techniques to predict a person's health from the results of various medical tests and the analysis the system performs on them. The system is currently designed only for heart issues. For this, we used the Statlog (Heart) Data Set from the UCI Machine Learning Repository to train the system; it includes attributes such as age, sex, chest pain type, cholesterol, sugar, and outcomes. We only need to pass a few general inputs to generate the prediction. The prediction results from all algorithms are then merged by calculating their mean value, and that value gives the final outcome of the prediction process, which runs entirely in the background.
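The merging step described above amounts to averaging the algorithms' outputs. A minimal sketch follows. It assumes each algorithm returns a probability of heart disease and that the mean is thresholded at 0.5. Both the threshold and the example model outputs are hypothetical.

```python
def ensemble_mean(probabilities, threshold=0.5):
    """Average per-model probabilities of heart disease and threshold the mean."""
    mean = sum(probabilities) / len(probabilities)
    return mean, int(mean >= threshold)
```

Suppose three hypothetical models output 0.8, 0.6, and 0.3. Their mean is about 0.57, so the combined prediction is positive even though one model disagrees. Averaging smooths out individual models' errors, but it weights a weak model as heavily as a strong one.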
The healthcare industry contains big and complex data that can be used to discover interesting patterns of diseases and to make effective decisions with the help of different machine learning techniques. Advanced data mining techniques are used to discover knowledge in databases and for medical research. This paper analyzes prediction systems for diabetes, kidney and liver disease using a larger number of input attributes. Two data mining classification techniques, Support Vector Machine (SVM) and Random Forest (RF), are analyzed on diabetes, kidney and liver disease databases. The performance of these techniques is compared based on precision, recall, accuracy and F-measure, as well as time. As a result of the study, the proposed algorithm is designed using SVM and RF, and the experimental results show accuracies of 99.35%, 99.37% and 99.14% on diabetes, kidney and liver disease respectively.
AN ALGORITHM FOR PREDICTIVE DATA MINING APPROACH IN MEDICAL DIAGNOSISijcsit
The document describes a predictive data mining algorithm for medical diagnosis that uses support vector machine (SVM) and random forest (RF) algorithms. It analyzes diabetes, kidney, and liver disease databases using these techniques. The proposed algorithm applies SVM and RF to the datasets and achieves high prediction accuracies of 99.35%, 99.37%, and 99.14% for diabetes, kidney, and liver diseases respectively. It also compares the performance of SVM and RF based on metrics like precision, recall, accuracy, and execution time.
IRJET - Survey on Analysis of Breast Cancer Prediction IRJET Journal
This document compares three machine learning techniques - Support Vector Machine (SVM), Random Forest (RF), and Naive Bayes (NB) - for predicting breast cancer using a dataset of 198 patient records. It finds that SVM achieved the highest accuracy of 96.97% for classification, followed by RF at 96.45% and NB at 95.45%. SVM also had the highest recall rate at 0.97, indicating it was best at correctly identifying malignant tumors. While NB had the lowest precision of 0.92, meaning it incorrectly identified some benign cases as malignant, all three techniques showed high performance in predicting breast cancer.
Prediction of Diabetes using Probability Approach IRJET Journal
This document discusses using a Bayesian Network classifier to predict whether individuals have diabetes based on various attributes. It analyzes a Pima Indian Diabetes dataset containing information on individuals with and without diabetes. The study aims to help identify diabetes and improve people's lifestyles by making them aware of the disease and how to treat it. It evaluates the prediction performance of Bayesian algorithms for classifying individuals as diabetic or non-diabetic.
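The study uses a Bayesian network classifier. As a simpler stand-in, the sketch below uses naive Bayes, the simplest Bayesian network, in which the class node is the only parent of every attribute. It works on discretized attributes with Laplace smoothing. The attribute names and values are hypothetical and are not taken from the Pima Indian Diabetes dataset.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (class, attribute) -> counts of each value
    values_seen = defaultdict(set)        # attribute -> distinct values in training data
    for row, label in zip(rows, labels):
        for attr, value in row.items():
            value_counts[(label, attr)][value] += 1
            values_seen[attr].add(value)
    return class_counts, value_counts, values_seen

def predict_naive_bayes(model, row):
    """Return the class maximising log P(class) + sum of log P(value | class)."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior
        for attr, value in row.items():
            # Laplace smoothing keeps an unseen value from zeroing out the class.
            count = value_counts[(label, attr)][value]
            score += math.log((count + 1) / (n + len(values_seen[attr])))
        if score > best_score:
            best, best_score = label, score
    return best

# Hypothetical discretized records.
rows = [
    {"glucose": "high", "bmi": "high"},
    {"glucose": "high", "bmi": "normal"},
    {"glucose": "normal", "bmi": "normal"},
    {"glucose": "normal", "bmi": "high"},
    {"glucose": "normal", "bmi": "normal"},
]
labels = ["diabetic", "diabetic", "non_diabetic", "non_diabetic", "non_diabetic"]
```

Working in log-probabilities avoids numerical underflow when many attributes are multiplied together. Because log is monotonic, the chosen class is the same as with raw products.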
This document provides an overview of data mining techniques and tools. It discusses data mining processes like predictive and descriptive data mining. It describes various data mining tasks such as classification, clustering, regression, and association rule learning. It then examines specific techniques for prediction using data mining, including classification analysis, association rule learning, decision trees, neural networks, and clustering analysis. Finally, it reviews several popular open-source tools that can be used to implement these data mining techniques, such as RapidMiner, Oracle Data Mining, IBM SPSS Modeler, KNIME, Python, Orange, Kaggle, Rattle, and Weka.
IRJET- Disease Prediction using Machine Learning IRJET Journal
This document discusses using machine learning techniques to predict diseases based on patient symptoms. Specifically, it proposes using naive bayes, k-nearest neighbors (KNN), and logistic regression algorithms on structured and unstructured hospital data to predict diseases like diabetes, malaria, jaundice, dengue, and tuberculosis. The system is intended to make disease prediction more accessible to end users by analyzing their symptoms without needing to visit a doctor. It aims to improve prediction accuracy by handling both structured and unstructured data using machine learning models.
Early Identification of Diseases Based on Responsible Attribute using Data Mi... (IRJET Journal)
This document describes a proposed method for early identification of diseases using data mining and classification techniques. It begins with an introduction to classification and discusses how it is commonly used in healthcare for tasks like predicting patient risk levels. It then reviews related literature applying classification methods to diseases like heart disease and diabetes. The document outlines the problem of selecting the best classification technique for a given healthcare dataset. It proposes an architecture and method for disease prediction that assigns recommended values to attributes and classifies unknown data based on calculating totals. The method is experimentally analyzed using a heart disease dataset, and its accuracy is compared to Bayesian classification. In conclusion, the proposed method seeks to reduce attributes and complexity while accurately classifying patient data for early disease identification.
Efficiency of Prediction Algorithms for Mining Biological Databases (IOSR Journals)
This document analyzes the efficiency of various prediction algorithms for mining biological databases. It discusses prediction through mining biological databases to identify disease risks. It then evaluates several prediction algorithms (ZeroR, OneR, JRip, PART, Decision Table) on a breast cancer dataset using measures like accuracy, sensitivity, specificity, and predictive values. The results show that the JRip and PART algorithms generally had the highest accuracy rates, around 70%, while ZeroR had the lowest accuracy. However, ZeroR had a perfect positive predictive value. The study aims to assess the most efficient algorithms for predictive mining of biological data.
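ZeroR ignores every attribute and always predicts the most frequent class. That is why it serves as the baseline other algorithms are compared against. The sketch below computes the measures named above for a ZeroR model. It uses hypothetical labels, not the study's breast cancer dataset, and shows how some measures become trivial or undefined when one class is never predicted. For brevity it evaluates on the training labels; in practice held-out data or cross-validation would be used.

```python
from collections import Counter


def zero_r(train_labels):
    """ZeroR: ignore every attribute and always predict the most frequent class."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda _record: majority


def diagnostic_measures(y_true, y_pred, positive):
    """Accuracy, sensitivity, specificity, PPV and NPV for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(num, den):
        return num / den if den else None  # undefined when nothing falls in the denominator

    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Hypothetical outcome labels; ZeroR never looks at the patient records themselves.
labels = ["recurrence"] * 3 + ["no-recurrence"] * 7
model = zero_r(labels)
predictions = [model(None) for _ in labels]
measures = diagnostic_measures(labels, predictions, positive="recurrence")
print(measures)
```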
The document discusses several practical issues in learning decision trees: 1) determining the depth to grow the tree to avoid overfitting, 2) handling continuous attributes, 3) choosing an appropriate attribute selection measure, 4) handling missing attribute values, and 5) handling attributes with differing costs. It also discusses techniques for avoiding overfitting like pre-pruning and post-pruning trees as well as reduced error pruning and rule post-pruning.
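Item 2 above, handling continuous attributes, is often solved by turning the attribute into a binary test (value <= t). The threshold t is chosen to maximize information gain. The sketch below tries midpoints between consecutive distinct sorted values, which is one common way to do this. The glucose readings and labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_threshold(values, labels):
    """Choose the cut point t (test: value <= t) with the largest information gain.

    Candidates are midpoints between consecutive distinct sorted values.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy(labels)
    best_t, best_gain = None, -1.0
    for (v1, _), (v2, _) in zip(pairs, pairs[1:]):
        if v1 == v2:
            continue
        t = (v1 + v2) / 2
        left = [lab for v, lab in pairs if v <= t]
        right = [lab for v, lab in pairs if v > t]
        gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain


# Hypothetical fasting glucose values (mg/dL) and diagnosis labels.
glucose = [85, 92, 99, 110, 126, 140]
outcome = ["no", "no", "no", "yes", "yes", "yes"]
print(best_threshold(glucose, outcome))
```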
This document discusses medical data mining and classification techniques. It begins with an introduction to data mining and its applications in healthcare to improve treatment. Medical data mining can help discover patterns in medical data to aid diagnosis. Classification algorithms like decision trees can categorize medical records and help predict outcomes. Specifically, the document discusses the J48 decision tree algorithm available in the WEKA data mining tool, which implements the C4.5 algorithm for classification. Decision trees work by recursively splitting the data into subsets based on attribute values, forming a tree structure. The document concludes that while data mining can help with medical analysis, results from small medical datasets should be interpreted cautiously.
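C4.5, and therefore J48, ranks candidate splits by gain ratio: information gain divided by the split information of the attribute itself. This penalizes attributes with many distinct values. The sketch below covers only this split criterion. The full algorithm adds further heuristics, handling of continuous attributes and missing values, and pruning. The records are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def information_gain(rows, attribute, target):
    """Drop in target entropy after splitting `rows` on `attribute`."""
    n = len(rows)
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy([r[target] for r in rows]) - remainder


def gain_ratio(rows, attribute, target):
    """Information gain divided by split information (entropy of the attribute itself)."""
    split_info = entropy([r[attribute] for r in rows])
    if split_info == 0:
        return 0.0  # a single-valued attribute cannot split the data
    return information_gain(rows, attribute, target) / split_info


# Hypothetical records: a unique patient ID fits the training data perfectly but says nothing general.
rows = [
    {"patient_id": "p1", "smoker": "yes", "disease": "yes"},
    {"patient_id": "p2", "smoker": "yes", "disease": "yes"},
    {"patient_id": "p3", "smoker": "no", "disease": "no"},
    {"patient_id": "p4", "smoker": "no", "disease": "no"},
]

for attr in ("patient_id", "smoker"):
    print(attr, information_gain(rows, attr, "disease"), gain_ratio(rows, attr, "disease"))
```

Both attributes have the same information gain here. The gain ratio prefers `smoker`, because `patient_id` is penalized for splitting the data into four single-record branches.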
This document discusses different types of statistical distributions found in nature. It gives examples of normal distributions that describe many biological traits, such as height and IQ, which tend to form a bell curve. Income distributions often follow a lognormal pattern, with most people in lower income groups. Tree diameter distributions in natural forests typically take an inverse J-shape. Population age structures also form distinctive patterns over time, such as the pyramid shape seen in the past, when child mortality rates were high.
This document summarizes the findings of an online survey on obesity prevalence and associated factors. The survey received a poor response, with only 53 entries. After excluding pregnant women, 50 observations remained from people in India and other countries. Of these, 62% were either overweight or obese according to BMI standards. Multiple regression analysis found that age and disease condition were both significantly associated with BMI, with age linked to higher BMI and disease to lower BMI. Stress, diet, and exercise habits may also contribute to the high rates of overweight and obesity seen in the sample, though larger studies are needed to verify these relationships.
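The "overweight or obese" share comes from classifying each participant's BMI (weight in kg divided by height in m squared). The sketch below assumes the WHO international adult cut-offs of 18.5, 25 and 30, since the survey's exact standard is not stated. Lower cut-offs are often used for Asian populations. The participant data are hypothetical.

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def who_category(value):
    """WHO adult categories (assumed here; Asian-specific cut-offs are lower)."""
    if value < 18.5:
        return "underweight"
    if value < 25:
        return "normal"
    if value < 30:
        return "overweight"
    return "obese"


# Hypothetical (weight in kg, height in m) pairs, not the survey data.
participants = [(70, 1.75), (85, 1.70), (60, 1.60), (95, 1.65)]
categories = [who_category(bmi(w, h)) for w, h in participants]
share = sum(c in ("overweight", "obese") for c in categories) / len(categories)
print(categories, f"overweight or obese: {share:.0%}")
```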
This document discusses health behaviors and health education. It defines types of health behaviors like preventive, illness, and sick-role behaviors. It describes factors that influence health behaviors like lifestyle, culture, knowledge, beliefs, attitudes, values, and norms. It outlines enabling and reinforcing factors for behaviors. It discusses the aims and approaches of health education in motivating healthy behaviors and helping people develop skills to implement their health decisions. It provides tips for effective health messaging like making messages evidence-based, affordable, realistic, culturally acceptable, and meeting felt needs.
AyurData is celebrating its first anniversary and providing an overview of its activities over the past year. It is a group of consultants specializing in clinical trial design and analysis for Ayurvedic research. In the past year, AyurData has released basic and advanced manuals on medical statistics and provided statistical support and training to researchers. It has now tied up with a US herbal firm to conduct Ayurvedic clinical trials. It is also part of an international Ayurveda research network and has conducted a survey on obesity prevalence.
The document provides information on Ayurveda colleges and courses in India as of October 2020. It lists details of several colleges, including their location, state, contact information, website, email, and courses offered. Most colleges offer Bachelor of Ayurvedic Medicine and Surgery (BAMS) degrees, and many also have postgraduate programs with 2 to 6 seats per course. The colleges are located across several states, including Andhra Pradesh, Assam, Bihar, Chhattisgarh, Delhi, Goa, Gujarat, Haryana, Himachal Pradesh, Jharkhand, Karnataka, and Jammu & Kashmir.
This document introduces an advanced statistical manual for Ayurveda research. It summarizes 14 statistical topics covered in the manual, including stratified multistage sampling, multiple linear regression, time series analysis, and survival analysis. The goal is to incorporate modern statistical methods into Ayurveda research to help bring Ayurveda into the scientific mainstream. Training workshops are offered to help researchers apply these techniques.
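Survival analysis is one of the topics listed. A standard starting point is the Kaplan-Meier estimator, which multiplies (1 - d/n) over the event times, where d is the number of events and n the number at risk. The sketch below is a generic version and is not necessarily the form used in the manual. It counts subjects censored at an event time as still at risk at that time. The follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimates at each distinct time with at least one event.

    events[i] is 1 if the event (e.g. death or relapse) was observed at times[i],
    0 if the subject was censored then.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Group all subjects sharing this time.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk   # S(t) = product of (1 - d_i / n_i)
            curve.append((t, survival))
        at_risk -= deaths + censored          # all subjects at time t leave the risk set afterwards
    return curve


# Hypothetical follow-up times (months) and event indicators.
times = [2, 3, 3, 5, 8, 9]
events = [1, 1, 0, 1, 0, 1]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
```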
After a long period of stagnation since its inception, Ayurveda research has gathered speed in recent times. Research methodology in general has been modernized, in terms of both data capture methods and the inferential process. As a result, we are seeing increasingly sophisticated study designs and more allopathic parameters being measured in Ayurveda investigations. This article attempts to consolidate some of the methodological developments currently being pursued in the domain.
This document introduces an advanced statistical manual for Ayurveda research. It covers more advanced statistical applications, including those used in data science. Topics include repeated measures analysis, multiple linear regression, classification techniques such as logistic regression, decision trees and random forests, and clustering analysis. Examples of principal component analysis and cluster analysis illustrate how these techniques can be used to reduce dimensionality and to classify objects, respectively. Overall, the document provides an overview of advanced statistical topics and techniques for research in an Ayurveda context.
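Cluster analysis groups similar objects together. One widely used method is k-means (Lloyd's algorithm), although the manual's own examples may use a different method, such as hierarchical clustering. The sketch below alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its points. It starts from fixed initial centroids so the result is reproducible. The points are hypothetical scores on two measurements.

```python
import math


def k_means(points, centroids, max_iter=100):
    """Lloyd's k-means: alternate nearest-centroid assignment and centroid update.

    Starts from the given initial centroids so the result is reproducible.
    """
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its points (keep it if the cluster is empty).
        updated = [
            tuple(sum(coords) / len(cl) for coords in zip(*cl)) if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


# Hypothetical scores on two measurements for six subjects.
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, clusters = k_means(points, [(0, 0), (10, 10)])
print(centroids)
print(clusters)
```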
Advanced statistical manual for ayurveda research sample (Ayurdata)
We are glad to note that we have come up with a second statistical manual on Ayurveda research, this time on more advanced forms of statistical analysis. We hope that researchers will make good use of the information contained in this manual. The presentation involves some mathematics, but the concepts are described in simple terms and illustrated with examples from Ayurveda or, where needed, from a more general medical context.
‘Allopathy’ is an archaic term used only in India; the correct term is Modern Medicine. Modern Medicine requires that all drugs be proven effective and their safety well established before they are administered to humans.
This document discusses meta-analysis and network meta-analysis in Ayurveda. It defines meta-analysis as a systematic literature review using statistical methods to aggregate findings from multiple related studies. Network meta-analysis extends this concept by including indirect treatment comparisons across different interventions studied. The document provides examples of outcomes that can be analyzed and models used. It also discusses integrating real-world evidence from non-clinical sources with randomized clinical trial data to better predict real-world results.
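The simplest way to aggregate findings statistically is fixed-effect, inverse-variance pooling. Each study's estimate is weighted by 1/SE², and the pooled standard error is the square root of 1 divided by the sum of the weights. The sketch below shows only this basic step. Random-effects models and the indirect comparisons of network meta-analysis build on it but are not shown. The study results are hypothetical.

```python
import math


def fixed_effect_pool(estimates, standard_errors, z=1.96):
    """Inverse-variance fixed-effect pooled estimate, its SE and an approximate 95% CI."""
    weights = [1 / se ** 2 for se in standard_errors]
    total_w = sum(weights)
    pooled = sum(w * est for w, est in zip(weights, estimates)) / total_w
    se = math.sqrt(1 / total_w)
    return pooled, se, (pooled - z * se, pooled + z * se)


# Hypothetical study results (e.g. mean differences) with their standard errors.
pooled, se, ci = fixed_effect_pool([0.30, 0.50, 0.40], [0.10, 0.20, 0.10])
print(f"pooled={pooled:.3f}  SE={se:.3f}  95% CI=({ci[0]:.3f}, {ci[1]:.3f})")
```

The imprecise second study (SE 0.20) gets a quarter of the weight of each of the others, so the pooled value lies closer to 0.30 and 0.40 than a simple average would.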
A manual on statistical analysis in ayurveda research (Ayurdata)
It took no time for AyurData to recognize the need for a comprehensive document describing the basic aspects of statistical applications in Ayurveda research; no such specialized publication with examples from Ayurveda was available. So, our first attempt was to bring one out. Moreover, the content was meant to follow the syllabus specified for the course on Medical Statistics for post-graduate students of Ayurveda.
The publication is now available as a reference for students and other researchers in Ayurveda, both for conducting experiments or surveys and for analyzing and interpreting their results.
This document discusses sample size calculations for clinical trials. It explains that sample size is determined by key factors such as the primary variable, the test statistic, the null and alternative hypotheses, the type I and II error rates, and estimates of variability. It provides an example calculation for a trial comparing two analgesics. The document also reviews the International Conference on Harmonisation guidelines on justifying sample size estimates and assumptions, investigating the sensitivity of the sample size to deviations from those assumptions, and conventions for setting type I and II error rates.
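The factors listed above combine in the normal-approximation formula for comparing two means with equal group sizes and a two-sided test: n per group = 2(z₁₋α/₂ + z₁₋β)²σ²/δ². Here σ is the outcome standard deviation and δ is the difference to detect. The sketch below assumes a continuous outcome such as a pain score. The source's analgesic example may use a different outcome type or formula, and the σ and δ values are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Sample size per arm for a two-sided comparison of two means (equal arms).

    n = 2 * (z_{1-alpha/2} + z_{1-beta})^2 * sigma^2 / delta^2, rounded up.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)


# Hypothetical: pain-score SD of 10 points, clinically relevant difference of 5 points.
print(n_per_group(sigma=10, delta=5))
print(n_per_group(sigma=10, delta=5, power=0.90))
```

Re-running the function over a range of plausible σ values is a simple way to perform the sensitivity check the guidelines ask for.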
Classifiers are algorithms that map input data to categories; they are used to build models that predict the classes of unseen data. Several types of classifiers can be used, including logistic regression, decision trees, random forests, support vector machines, Naive Bayes, and neural networks. Each uses a different technique, such as splitting the data, averaging predictions, or maximizing margins. The best classifier depends on the problem and is judged by how well it achieves high accuracy, sensitivity, and specificity.
Logistic regression is used to model class probabilities in binary and multiclass classification problems. It assumes a linear relationship between the predictors and the log-odds of the target variable. The regression coefficients are estimated by maximum likelihood in an iterative process. Model fit is assessed using measures such as deviance and likelihood ratio tests rather than R², with smaller deviance indicating better fit. The predictive ability of logistic regression models can be evaluated using metrics such as accuracy from a confusion matrix, cross-validation, and the area under the ROC curve (AUC).
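The iterative maximum-likelihood step can be sketched for a single predictor with Newton-Raphson. For logistic regression this coincides with the iteratively reweighted least squares used by many packages. The example also computes the residual deviance and compares it with the intercept-only (null) deviance. The dose/response data are hypothetical and deliberately not perfectly separable, so the maximum-likelihood estimates exist.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, y, max_iter=25, tol=1e-10):
    """Maximum-likelihood fit of logit(p) = b0 + b1*x by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        p = [sigmoid(b0 + b1 * xi) for xi in x]
        # Score vector: gradient of the log-likelihood.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix X'WX with weights p(1 - p).
        w = [pi * (1 - pi) for pi in p]
        a = sum(w)
        b = sum(wi * xi for wi, xi in zip(w, x))
        c = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = a * c - b * b
        # Newton step: solve the 2x2 system (X'WX) * step = score.
        step0 = (c * g0 - b * g1) / det
        step1 = (a * g1 - b * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


def deviance(x, y, b0, b1):
    """Residual deviance, -2 * log-likelihood, for 0/1 outcomes."""
    ll = 0.0
    for xi, yi in zip(x, y):
        p = sigmoid(b0 + b1 * xi)
        ll += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return -2.0 * ll


# Hypothetical data: dose level (x) and whether a response was seen (y).
x = [1, 2, 3, 4, 5, 6]
y = [0, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(x, y)
null_dev = deviance(x, y, math.log(sum(y) / (len(y) - sum(y))), 0.0)  # intercept-only model
print(f"b0={b0:.3f}  b1={b1:.3f}  odds ratio per unit x={math.exp(b1):.3f}")
print(f"residual deviance={deviance(x, y, b0, b1):.3f}  null deviance={null_dev:.3f}")
```

The difference between the null and residual deviances is the likelihood ratio statistic for the predictor.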
AyurData is a consulting firm specialized in clinical trial design and analysis with an emphasis on Ayurveda research. The firm aims to promote scientific rigor in Ayurveda research through modern statistical standards and methods. Services include statistical support for student works and active researchers, training programs, and data analysis services. The firm reviewed current clinical research practices and statistical trends to effectively support researchers.
The Naive Bayes algorithm, in particular, is a probability-based technique that is simple yet so powerful that it is known to often outperform more complex algorithms on very large datasets.
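Naive Bayes applies Bayes' theorem under the "naive" assumption that features are independent given the class. It then picks the class with the highest posterior score. The minimal sketch below is a categorical variant with Laplace (add-one) smoothing and log probabilities. The feature names and records are hypothetical, and the sketch makes no claim about performance on large datasets.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(records, labels):
    """Count class frequencies and per-class value frequencies of each categorical feature."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (feature index, class) -> Counter of values
    feature_values = defaultdict(set)     # feature index -> set of observed values
    for record, label in zip(records, labels):
        for i, value in enumerate(record):
            value_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values


def predict_naive_bayes(model, record, alpha=1.0):
    """Return the class with the highest log posterior score.

    'Naive' = features are treated as independent given the class; alpha is
    Laplace (add-one) smoothing so unseen values do not zero out a class.
    """
    class_counts, value_counts, feature_values = model
    n = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for cls, count in class_counts.items():
        score = math.log(count / n)  # log prior P(class)
        for i, value in enumerate(record):
            k = len(feature_values[i])
            likelihood = (value_counts[(i, cls)][value] + alpha) / (count + alpha * k)
            score += math.log(likelihood)
        if score > best_score:
            best_class, best_score = cls, score
    return best_class


# Hypothetical records: (excessive_thirst, frequent_urination).
records = [("yes", "yes"), ("yes", "no"), ("yes", "yes"),
           ("no", "no"), ("no", "yes"), ("no", "no")]
labels = ["diabetic"] * 3 + ["non-diabetic"] * 3
model = train_naive_bayes(records, labels)
print(predict_naive_bayes(model, ("yes", "yes")))
print(predict_naive_bayes(model, ("no", "no")))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied.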
We are pleased to share with you the latest VCOSA statistical report on the cotton and yarn industry for the month of May 2024.
Starting from January 2024, the full weekly and monthly reports will be available free of charge only to VCOSA members. To access the complete weekly report, with figures, charts, and detailed analysis of the cotton fiber market over the past week, interested parties are kindly requested to contact VCOSA to subscribe to the newsletter.
We are pleased to share with you the latest VCOSA statistical report on the cotton and yarn industry for the month of March 2024.
Build applications with generative AI on Google Cloud (Márton Kodok)
We will explore Vertex AI Model Garden experiences and learn more about the integration of its generative AI APIs. We will see in action what the Gemini family of generative models offers developers who build and deploy AI-driven applications. Vertex AI includes a suite of foundation models, the PaLM and Gemini families of generative AI models, which come in different versions. We will cover how to use the API to:
- execute prompts in text and chat;
- cover multimodal use cases with image prompts;
- fine-tune and distill models to improve knowledge domains;
- run function calls with foundation models to optimize them for specific tasks.
By the end of the session, developers will understand how to innovate with generative AI and develop apps that follow current generative AI industry trends.