Machine learning aims to generate classifying expressions simple enough to be easily understood by humans. Many machine learning approaches are available for classification, among which decision tree learning is one of the most popular. In this paper we propose a systematic, decision-tree-based approach for automatically determining a patient's post-operative recovery status. Decision tree structures are constructed using data mining methods and then used to classify discharge decisions.
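The decision-tree idea above can be sketched with scikit-learn. The feature names and coded values below are hypothetical stand-ins for post-operative attributes, not the paper's actual dataset:

```python
# Illustrative sketch: a decision tree for discharge decisions.
# Features and codings are invented for illustration only.
from sklearn.tree import DecisionTreeClassifier, export_text

# Each row: [temperature code, blood pressure code, comfort score]
X = [[0, 1, 7], [1, 1, 10], [0, 0, 5], [1, 0, 9], [0, 1, 6], [1, 1, 8]]
# Target: 0 = discharge to general ward, 1 = keep in intensive care
y = [1, 0, 1, 0, 1, 0]

clf = DecisionTreeClassifier(criterion="entropy", max_depth=2, random_state=0)
clf.fit(X, y)

# The learned tree prints as human-readable rules, the property the
# abstract highlights as decision trees' main appeal.
print(export_text(clf, feature_names=["temp", "bp", "comfort"]))
print(clf.predict([[1, 1, 9]]))  # classify a new patient
```

The printed rule list is exactly the kind of "classifying expression simple enough to be understood" that the abstract refers to.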
PREDICTION OF MALIGNANCY IN SUSPECTED THYROID TUMOUR PATIENTS BY THREE DIFFER... (cscpconf)
In the present study, the abilities of three data mining classification methods, namely artificial neural networks with the feed-forward back-propagation algorithm, the J48 decision tree method, and logistic regression analysis, are compared on a real medical dataset. The objective of the study is the prediction of malignancy in suspected thyroid tumour patients. The accuracy of the predictions (the minimum error rate), the time consumed in the modelling process, and the interpretability and simplicity of the results for clinical experts are the factors considered in choosing the best method.
Propose an Enhanced Framework for Prediction of Heart Disease (IJERA Editor)
Heart disease diagnosis is a complex task that requires considerable experience. Traditionally, heart disease is assessed through a number of medical tests prescribed by the doctor, such as cardiac MRI, ECG, and stress tests. Today, the health care industry holds a huge amount of health care data containing hidden information, and effective decisions can be made by extracting that hidden information with advanced, computer-based data mining techniques. Artificial neural networks (ANNs) are mathematical techniques used for inference and categorisation in the empirical sciences, and also for modelling real neural networks. Mental phenomena such as acting, wanting, knowing, remembering, perceiving, thinking, and inferring can be understood using ANN theory, and since ANNs are powerful instruments for inference and classification, problems of probability and induction naturally arise. In this paper, two classification techniques, the Naive Bayes classification algorithm and artificial neural networks, are used to classify the attributes in the given dataset, while two attribute filtering techniques, PCA (Principal Component Analysis) and Information Gain attribute subset evaluation, perform feature selection for predicting heart disease symptoms. A new framework based on these techniques is proposed: the input dataset is fed into a feature selection block, which chooses whichever technique yields the fewest attributes; classification is then performed with the two algorithms, and the attributes selected by both classification tasks are used for heart disease prediction.
The framework reduces the time needed to predict the symptoms of heart disease and lets the user identify the important attributes.
The aim of this paper is to apply text mining (TM) concepts in the field of health care. Decision making in health care nowadays involves a number of opinions, given by a group of medical experts for a specific disease, that are stored in medical databases as text. These decisions are then mined from the database with the help of data mining techniques, and text document clustering serves as a tool for performing information-based operations. K-means clustering is normally used for this purpose; in this paper we use the bisecting K-means clustering technique, which performs better than traditional K-means. The objective is to study the revealed groupings of similar opinion types associated with physicians and medical experts.
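A minimal sketch of the bisecting K-means idea: repeatedly split the largest cluster in two with ordinary K-means until the desired number of clusters is reached. The 2-D points below are synthetic stand-ins for the document vectors the paper clusters:

```python
# Sketch of bisecting k-means (illustrative, not the paper's implementation).
import numpy as np
from sklearn.cluster import KMeans

def bisecting_kmeans(X, n_clusters, seed=0):
    clusters = [np.arange(len(X))]          # start with one cluster of all points
    while len(clusters) < n_clusters:
        clusters.sort(key=len)              # pick the largest cluster...
        largest = clusters.pop()
        km = KMeans(n_clusters=2, n_init=10, random_state=seed).fit(X[largest])
        clusters.append(largest[km.labels_ == 0])   # ...and bisect it
        clusters.append(largest[km.labels_ == 1])
    labels = np.empty(len(X), dtype=int)
    for i, idx in enumerate(clusters):
        labels[idx] = i
    return labels

# Three well-separated pairs of points
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 5.2], [9.0, 0.0], [9.1, 0.1]])
print(bisecting_kmeans(X, 3))
```

Because each step solves only a 2-way problem, bisecting K-means tends to be less sensitive to initialisation than running K-means directly with a large k, which is one reason it is often preferred for document clustering.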
A comparative analysis of classification techniques on medical data sets (eSAT Publishing House)
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academicians, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
Comprehensive Survey of Data Classification & Prediction Techniques (ijsrd.com)
In this paper, we present a literature survey of modern data classification and prediction algorithms. These algorithms are important in real-world applications such as heart disease prediction and cancer prediction. Classification of data is a popular but computationally expensive task. The fundamentals of data classification are also discussed in brief.
Distributed Digital Artifacts on the Semantic Web (Editor IJCATR)
Distributed digital artifacts incorporate cryptographic hash values into URIs, called trusty URIs, in a distributed environment, producing high-quality, verifiable, and immutable web resources that resist the rising man-in-the-middle attack. The greatest challenge of a centralized system is that it gives users no way to check whether data have been modified, and communication is limited to a single server. The solution is a distributed digital artifact system in which resources are spread among different domains to enable inter-domain communication. With the rapid development of the web, attacks have increased sharply; among them, the man-in-the-middle attack (MIMA) is a serious threat to user security. This work aims to prevent MIMA to an extent by providing self-reference and trusty URIs even in a distributed environment. Any manipulation of the data is efficiently identified, and further access to that data is blocked by informing the user that the uniform location has changed. The system uses self-reference so that each resource contains its trusty URI, a lineage algorithm for generating the seed, and the SHA-512 hash algorithm to ensure security. It is implemented on the semantic web, an extension of the world wide web, using RDF (Resource Description Framework) to identify resources. The framework thus overcomes existing challenges by distributing digital artifacts on the semantic web, enabling secure communication between different domains across the network and thereby preventing MIMA.
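The core trusty-URI mechanism can be illustrated with a small sketch. The URI scheme and helper names below are invented for illustration; only the SHA-512 hashing step reflects the abstract, and the lineage/seed algorithm it mentions is not modelled here:

```python
# Sketch of the trusty-URI idea: embed a SHA-512 digest of a resource in its
# identifier so any later modification is detectable by any reader.
import hashlib

def make_trusty_uri(base: str, content: bytes) -> str:
    # Append the content digest to an (illustrative) base URI.
    digest = hashlib.sha512(content).hexdigest()
    return f"{base}#{digest}"

def verify(uri: str, content: bytes) -> bool:
    # Recompute the digest and compare it with the one carried by the URI.
    claimed = uri.rsplit("#", 1)[1]
    return hashlib.sha512(content).hexdigest() == claimed

data = b"<rdf:Description>original resource</rdf:Description>"
uri = make_trusty_uri("http://example.org/artifact/1", data)
print(verify(uri, data))                 # content untouched
print(verify(uri, data + b" tampered"))  # manipulation detected
```

Because the digest travels with the identifier itself, a man-in-the-middle who alters the resource cannot also fix up every copy of the URI that references it.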
Preprocessing and Classification in WEKA Using Different Classifiers (IJERA Editor)
Data mining is the process of extracting information from a dataset and transforming it into an understandable structure for further use; it also discovers patterns in large data sets [1]. Data mining comprises a number of important techniques, such as preprocessing and classification. Classification is a technique based on supervised learning, used to predict group membership for data instances. In this paper we apply preprocessing and classification to a diabetes database, run several classifiers on it, and compare their results on certain parameters using WEKA. In India, 77.2 million people suffer from pre-diabetes, and the ICMR estimates that around 65.1 million are diabetes patients. Globally, in 2010, 227 to 285 million people had diabetes, about 90% of the cases being type 2, equal to 3.3% of the population, with equal rates in women and men; in 2011 diabetes caused 1.4 million deaths worldwide, making it a leading cause of death.
A Survey On Decision Tree Learning Algorithms for Knowledge Discovery (IJERA Editor)
Immense volumes of data are populated into repositories from various applications, and data mining techniques are very helpful for finding the desired information and knowledge in such large datasets. Classification is one of the knowledge discovery techniques, and within classification, decision trees are very popular in the research community due to their simplicity and easy comprehensibility. This paper presents an updated review of recent developments in the field of decision trees.
DATA MINING METHODOLOGIES TO STUDY STUDENT'S ACADEMIC PERFORMANCE USING THE... (ijcsa)
The study placed a particular emphasis on the so-called data mining algorithms, but focuses the bulk of its attention on the C4.5 algorithm. Every educational institution, in general, aims to deliver a high quality of education, and this depends on identifying the students likely to perform poorly before they enter the final examination. Data mining techniques offer many tasks that can be used to investigate students' performance. The main objective of this paper is to build a classification model that can be used to improve students' academic records in the Faculty of Mathematical Science and Statistics. The model was built using the C4.5 algorithm, as it is a well-known, commonly used data mining technique. The importance of this study is that predicting student performance is useful in many different settings. Data from previous students' academic records in the faculty were used to illustrate the considered algorithm and build the classification model.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Abstract: In this paper, the concept of data mining is summarized and the significance of its methodologies illustrated. Data mining based on neural networks and genetic algorithms is examined in detail, and the key technologies and ways of achieving data mining with neural networks and genetic algorithms are surveyed. The paper also conducts a formal review of the area of rule extraction from ANNs and GAs. Keywords: Data Mining, Neural Network, Genetic Algorithm, Rule Extraction.
Classification is a data mining technique, based on machine learning, that assigns each item in a dataset to one of a set of predefined classes or groups. Classification serves many applications, such as text classification, image classification, and class prediction. In this paper, we present the major classification techniques used for predicting classes from supervised learning datasets, covering several major methods including Random Forest, Naive Bayes, and Support Vector Machine (SVM) techniques. The goal of this review paper is to provide a review of, and an accuracy comparison between, different classification techniques in data mining.
DATA MINING CLASSIFICATION ALGORITHMS FOR KIDNEY DISEASE PREDICTION (IJCI JOURNAL)
Data mining is a non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data; put simply, it is the extraction of information from a huge database. Data mining plays a vital role in several domains such as business organizations, educational institutions, government sectors, the health care industry, science, and engineering. In the health care industry, data mining is predominantly used for disease prediction, and numerous data mining techniques exist for predicting diseases, namely classification, clustering, association rules, summarization, and regression. The main objective of this research work is to predict kidney diseases using classification algorithms, namely Naïve Bayes and Support Vector Machine, and to find the best classification algorithm based on classification accuracy and execution time. The experimental results show that the SVM performs better than the Naive Bayes classifier.
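The accuracy-and-time comparison described above can be sketched roughly as follows; synthetic data stands in for the kidney disease dataset, which is not given here, so the numbers are illustrative only:

```python
# Illustrative Naive Bayes vs. SVM comparison on accuracy and training time.
# Synthetic data is a stand-in for the (unavailable) kidney disease dataset.
import time
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

results = {}
for name, model in [("Naive Bayes", GaussianNB()), ("SVM", SVC(kernel="rbf"))]:
    start = time.perf_counter()
    model.fit(X_tr, y_tr)
    elapsed = time.perf_counter() - start
    results[name] = accuracy_score(y_te, model.predict(X_te))
    print(f"{name}: accuracy={results[name]:.3f}, train time={elapsed*1000:.1f} ms")
```

Which classifier wins depends on the dataset; the paper reports SVM ahead on its kidney data, but Naive Bayes typically trains faster.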
Feature selection is one of the most fundamental steps in machine learning and is closely related to dimensionality reduction. A commonly used approach in feature selection is to rank the individual features according to some criterion and then search for an optimal feature subset based on an evaluation criterion that tests optimality. The objective of this work is to predict more accurately the presence of Learning Disability (LD) in school-aged children using a reduced number of symptoms. For this purpose, a novel hybrid feature selection approach is proposed that integrates a popular rough-set-based feature ranking process with a modified backward feature elimination algorithm. The feature ranking calculates the significance, or priority, of each LD symptom according to its contribution in representing the knowledge contained in the dataset; each symptom's significance value reflects its relative importance for predicting LD across the various cases. Then, by eliminating the least significant features one by one and evaluating the feature subset at each stage of the process, an optimal feature subset is generated. For comparative analysis, and to establish the importance of rough set theory in feature selection, the backward feature elimination algorithm is also combined with two state-of-the-art filter-based feature ranking techniques, viz. information gain and gain ratio. The experimental results show that the proposed feature selection approach outperforms the other two in terms of data reduction; moreover, the proposed method efficiently eliminates all redundant attributes from the LD dataset without sacrificing classification performance.
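The rank-then-backward-eliminate procedure can be sketched roughly as below. Mutual information stands in for the paper's rough-set significance measure, so this is an analogy under stated assumptions, not the authors' algorithm:

```python
# Sketch: rank features, then drop the least significant one at a time,
# keeping the reduced subset only while cross-validated accuracy holds up.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=8, n_informative=3,
                           random_state=1)
# Rank features by a significance score (least significant first).
ranking = np.argsort(mutual_info_classif(X, y, random_state=1))

selected = list(range(X.shape[1]))
baseline = cross_val_score(DecisionTreeClassifier(random_state=1), X, y, cv=5).mean()
for feat in ranking:                          # eliminate least significant first
    trial = [f for f in selected if f != feat]
    if not trial:
        break
    score = cross_val_score(DecisionTreeClassifier(random_state=1),
                            X[:, trial], y, cv=5).mean()
    if score >= baseline - 0.02:              # tolerate a small accuracy drop
        selected, baseline = trial, max(baseline, score)

print("kept features:", sorted(selected))
```

Evaluating the subset at every elimination step is what lets the method discard redundant attributes without sacrificing classification performance, as the abstract claims for the rough-set version.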
CLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEY (Editor IJMTER)
A data mining environment produces a large amount of data that must be analysed, and patterns must be extracted from it to gain knowledge. In this new era, with an explosion of both ordered and unordered data, it has become difficult to process, manage, and analyse patterns using traditional databases and architectures; to gain knowledge from Big Data, a proper architecture must be understood. Classification is an important data mining technique with broad applications, used to classify the various kinds of data found in nearly every field of our life, assigning each item to one of a predefined set of classes according to its features. This paper provides an inclusive survey of different classification algorithms and sheds light on several of them, including J48, C4.5, k-nearest neighbour, Naive Bayes, and SVM, using the random concept.
Feature Selection : A Novel Approach for the Prediction of Learning Disabilit... (csandit)
Feature selection is a problem closely related to dimensionality reduction. A commonly used approach in feature selection is to rank the individual features according to some criterion and then search for an optimal feature subset based on an evaluation criterion that tests optimality. The objective of this work is to predict more accurately the presence of Learning Disability (LD) in school-aged children using a reduced number of symptoms. For this purpose, a novel hybrid feature selection approach is proposed that integrates a popular rough-set-based feature ranking process with a modified backward feature elimination algorithm. The approach ranks the symptoms of LD according to their importance in the data domain; each symptom's significance value reflects its relative importance for predicting LD across the various cases. Then, by eliminating the least significant features one by one and evaluating the feature subset at each stage of the process, an optimal feature subset is generated. The experimental results show the success of the proposed method in efficiently removing redundant attributes from the LD dataset without sacrificing classification performance.
Hypothesis on Different Data Mining Algorithms (IJERA Editor)
In this paper, different classification algorithms for data mining are discussed. Data mining is about explaining the past and predicting the future by means of data analysis. Classification is a data mining task that categorizes data based on numerical or categorical variables. Many algorithms have been proposed to classify data; of these, five are comparatively studied here. There are four different classification approaches, namely frequency table, covariance matrix, similarity functions, and others. As research on classification methods, algorithms such as Naive Bayes, K Nearest Neighbours, Decision Tree, Artificial Neural Network, and Support Vector Machine are studied and examined using benchmark datasets such as Iris and Lung Cancer.
CLUSTERING DICHOTOMOUS DATA FOR HEALTH CARE (ijistjournal)
Dichotomous data is a type of categorical data that is binary, with the categories zero and one. Health care data is one of the most heavily used kinds of categorical data. Binary data is the simplest form of data used in health care databases, where close-ended questions can be used; it is very efficient, in terms of computational efficiency and memory capacity, for representing categorical data. Clustering health care or medical data is very tedious due to its complex data representation models, high dimensionality, and data sparsity. In this paper, clustering is performed after transforming the dichotomous data into real values by the Wiener transformation. The proposed algorithm can be used to determine the correlation of health disorders and symptoms observed in large medical and health binary databases. Computational results show that clustering based on the Wiener transformation is very efficient in terms of objectivity and subjectivity.
A Framework for Statistical Simulation of Physiological Responses (SSPR) (Waqas Tariq)
The problem of selecting, from a large number of variables, those that predict certain important dependent variables has long interested both applied statisticians and researchers in applied physiology, and various statistical techniques have been developed for this purpose. The framework presented here embeds various statistical sampling and resampling techniques and supports statistical simulation of physiological responses under different environmental conditions. Population generation and other statistical calculations are based on the inputs provided by the user: a mean vector, a covariance matrix, and the data. The framework is built so that it works both on original data and on simulated data generated by the software. Approach: The mean vector and covariance matrix are sufficient statistics when the underlying distribution is multivariate normal; using these two inputs, the framework can generate a simulated multivariate normal population for any number of variables. The software converts the manual procedure into a computer-based system, automating the study and providing efficiency, accuracy, timeliness, and economy. Result: A complete framework that can statistically simulate any type and any number of responses or variables. If the simulated data are analysed using statistical techniques, the results of the analysis will match those obtained from the original data, and the system also helps when data are missing for some of the variables. Conclusion: The proposed system makes it possible to carry out physiological studies and statistical calculations even when the actual data are not available.
Growth in medical informatics can be observed nowadays. Advances in different medical fields uncover various critical diseases and provide guidelines for their cure. This has been possible only because of rich medical databases and the automation of the data analysis process. This analysis requires a great deal of learning and intelligence, for which data mining techniques provide the basis; many such techniques are available, including decision tree induction, rule-based classification and mining, support vector machines, stochastic classification, logistic regression, Naïve Bayes, artificial neural networks, fuzzy logic, and genetic algorithms. This paper covers the basics of data mining, surveys the effective techniques available in the medical sciences, and reviews the efforts made on medical databases using data mining techniques for human disease diagnosis.
Classification of Health Care Data Using Machine Learning Technique (inventionjournals)
Diagnosing a disease with an intelligent system yields high accuracy and can be treated as a classification problem; classifying health care data with machine learning techniques can therefore serve as an intelligent health care system (IHSM). Heart disease is a major issue, commonly found in human beings owing to changing lifestyles, and needs to be identified in advance. This paper explores various machine learning techniques for classifying heart data downloaded from the UCI repository. After applying a feature selection technique (FST), a decision-tree-based technique, CART, produces the highest accuracy with four features, followed by the other classification techniques.
Distributed Digital Artifacts on the Semantic WebEditor IJCATR
Distributed digital artifacts incorporate cryptographic hash values to URI called trusty URIs in a distributed environment
building good in quality, verifiable and unchangeable web resources to prevent the rising man in the middle attack. The greatest
challenge of a centralized system is that it gives users no possibility to check whether data have been modified and the communication
is limited to a single server. As a solution for this, is the distributed digital artifact system, where resources are distributed among
different domains to enable inter-domain communication. Due to the emerging developments in web, attacks have increased rapidly,
among which man in the middle attack (MIMA) is a serious issue, where user security is at its threat. This work tries to prevent MIMA
to an extent, by providing self reference and trusty URIs even when presented in a distributed environment. Any manipulation to the
data is efficiently identified and any further access to that data is blocked by informing user that the uniform location has been
changed. System uses self-reference to contain trusty URI for each resource, lineage algorithm for generating seed and SHA-512 hash
generation algorithm to ensure security. It is implemented on the semantic web, which is an extension to the world wide web, using
RDF (Resource Description Framework) to identify the resource. Hence the framework was developed to overcome existing
challenges by making the digital artifacts on the semantic web distributed to enable communication between different domains across
the network securely and thereby preventing MIMA.
Preprocessing and Classification in WEKA Using Different ClassifiersIJERA Editor
Data mining is a process of extracting information from a dataset and transform it into understandable structure
for further use, also it discovers patterns in large data sets [1]. Data mining has number of important techniques
such as preprocessing, classification. Classification is one such technique which is based on supervised learning.
It is a technique used for predicting group membership for the data instance. Here in this paper we use
preprocessing, classification on diabetes database. Here we apply classifiers on this database and compare the
result based on certain parameters using WEKA. 77.2 million people in India are suffering from pre diabetes.
ICMR estimates that around 65.1million are diabetes patients. Globally in year 2010, 227 to 285 million people
had diabetes, out of that 90% cases are related to type 2 ,this is equal to 3.3% of the population with equal rates
in both women and men in 2011 it resulted in 1.4 million deaths worldwide making it the leading cause of
death.
A Survey Ondecision Tree Learning Algorithms for Knowledge DiscoveryIJERA Editor
Theimmense volumes of data are populated into repositories from various applications. In order to find out desired information and knowledge from large datasets, the data mining techniques are very much helpful. Classification is one of the knowledge discovery techniques. In Classification, Decision trees are very popular in research community due to simplicity and easy comprehensibility. This paper presentsan updated review of recent developments in the field of decision trees.
DATA MINING METHODOLOGIES TO STUDY STUDENT'S ACADEMIC PERFORMANCE USING THE...ijcsa
The study placed a particular emphasis on the so ca
lled data mining algorithms, but focuses the bulk o
f
attention on the C4.5 algorithm. Each educational i
nstitution, in general, aims to present a high qual
ity of
education. This depends upon predicting the student
s with poor results prior they entering in to final
examination. Data mining techniques give many tasks
that could be used to investigate the students'
performance. The main objective of this paper is to
build a classification model that can be used to i
mprove
the students' academic records in Faculty of Mathem
atical Science and Statistics. This model has been
done using the C4.5 algorithm as it is a well-known
, commonly used data mining technique. The
importance of this study is that predicting student
performance is useful in many different settings.
Data
from the previous students' academic records in the
faculty have been used to illustrate the considere
d
algorithm in order to build our classification mode
l.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Abstract: In this paper, the concept of data mining is summarized and the significance of its methodologies is illustrated. Data mining based on neural networks and genetic algorithms is examined in detail, and the key technologies and approaches for achieving data mining with neural networks and genetic algorithms are surveyed. The paper also conducts a formal review of rule extraction from ANNs and GAs. Keywords: Data Mining, Neural Network, Genetic Algorithm, Rule Extraction.
Classification is a data mining technique based on machine learning that assigns each item in a dataset to one of a set of predefined classes or groups. Classification tasks arise in many applications, such as text classification, image classification and class prediction. In this paper, we present the major classification techniques used to predict classes from supervised learning datasets, including Random Forest, Naive Bayes and Support Vector Machine (SVM) techniques. The goal of this review paper is to provide a review of, and an accuracy comparison between, different classification techniques in data mining.
DATA MINING CLASSIFICATION ALGORITHMS FOR KIDNEY DISEASE PREDICTION IJCI JOURNAL
Data mining is the non-trivial process of identifying valid, novel, potentially useful and ultimately understandable patterns in data; in short, it is the extraction of information from a huge database. Data mining plays a vital role in several applications, such as business organizations, educational institutions, government sectors, the health care industry, and science and engineering. In the health care industry, data mining is predominantly used for disease prediction, with techniques including classification, clustering, association rules, summarization and regression. The main objective of this research work is to predict kidney disease using classification algorithms, namely Naive Bayes and Support Vector Machine, and to find the best classification algorithm based on the performance factors of classification accuracy and execution time. The experimental results show that the performance of the SVM is better than that of the Naive Bayes classifier.
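The accuracy-versus-execution-time comparison described above can be sketched in miniature. The two classifiers and the toy lab values below are hypothetical stand-ins, not the study's actual Naive Bayes and SVM models:

```python
import time

def majority_classifier(train_X, train_y):
    """Baseline: predicts the most frequent training label for every sample."""
    label = max(set(train_y), key=train_y.count)
    return lambda x: label

def nearest_centroid_classifier(train_X, train_y):
    """Predicts the class whose feature-wise mean (centroid) is closest."""
    centroids = {}
    for c in set(train_y):
        members = [x for x, y in zip(train_X, train_y) if y == c]
        centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    def predict(x):
        return min(centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))
    return predict

def evaluate(make_clf, train_X, train_y, test_X, test_y):
    """Return (accuracy, elapsed seconds) for one classifier."""
    start = time.perf_counter()
    predict = make_clf(train_X, train_y)
    correct = sum(predict(x) == y for x, y in zip(test_X, test_y))
    return correct / len(test_y), time.perf_counter() - start

# Invented two-feature records (e.g. creatinine, blood pressure).
train_X = [[1.0, 120.0], [1.2, 130.0], [1.1, 122.0], [3.1, 210.0], [3.3, 205.0]]
train_y = ["healthy", "healthy", "healthy", "ckd", "ckd"]
test_X, test_y = [[1.1, 125.0], [3.2, 208.0]], ["healthy", "ckd"]

results = {
    name: evaluate(make, train_X, train_y, test_X, test_y)
    for name, make in [("majority", majority_classifier),
                       ("centroid", nearest_centroid_classifier)]
}
```

Each entry of `results` pairs an accuracy with a wall-clock time, which is exactly the shape of comparison the paper reports for Naive Bayes versus SVM.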
Feature selection is one of the most fundamental steps in machine learning and is closely related to dimensionality reduction. A commonly used approach in feature selection is to rank the individual features according to some criterion and then search for an optimal feature subset, using an evaluation criterion to test optimality. The objective of this work is to predict more accurately the presence of Learning Disability (LD) in school-aged children using a reduced number of symptoms. For this purpose, a novel hybrid feature selection approach is proposed that integrates a popular Rough Set based feature ranking process with a modified backward feature elimination algorithm. The ranking process calculates the significance, or priority, of each LD symptom according to its contribution in representing the knowledge contained in the dataset; each symptom's significance value reflects its relative importance in predicting LD across the various cases. Then, by eliminating the least significant features one by one and evaluating the feature subset at each stage of the process, an optimal feature subset is generated. For comparative analysis, and to establish the importance of rough set theory in feature selection, the backward feature elimination algorithm is also combined with two state-of-the-art filter-based feature ranking techniques, viz. information gain and gain ratio. The experimental results show that the proposed feature selection approach outperforms the other two in terms of data reduction. The proposed method also eliminates all redundant attributes from the LD dataset efficiently without sacrificing classification performance.
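The ranking-then-backward-elimination loop described above can be sketched as follows. The symptom names, significance scores and evaluation function are invented placeholders, not the Rough Set measures used in the paper:

```python
def backward_eliminate(features, rank, evaluate):
    """Drop the lowest-ranked feature one at a time, keeping the best subset seen.
    `rank[f]` is a significance score; `evaluate(subset)` returns an accuracy."""
    current = sorted(features, key=lambda f: rank[f], reverse=True)
    best_subset, best_score = list(current), evaluate(current)
    while len(current) > 1:
        current = current[:-1]          # remove least significant remaining feature
        score = evaluate(current)
        if score >= best_score:         # tolerate ties: prefer the smaller subset
            best_subset, best_score = list(current), score
    return best_subset, best_score

# Hypothetical symptom scores and an evaluation in which only s1 and s2 matter.
rank = {"s1": 0.9, "s2": 0.7, "s3": 0.1}
evaluate = lambda subset: 0.95 if {"s1", "s2"} <= set(subset) else 0.80
subset, score = backward_eliminate(list(rank), rank, evaluate)
# subset == ["s1", "s2"]: the redundant symptom s3 is eliminated without loss.
```

The sketch captures the paper's key property: redundant symptoms are removed while classification performance is preserved.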
CLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEYEditor IJMTER
A data mining environment produces a large amount of data that needs to be analysed, and patterns must be extracted from it to gain knowledge. In this new era, with its flood of both ordered and unordered data, it has become difficult to process, manage and analyse patterns using traditional databases and architectures; to gain knowledge from Big Data, a proper architecture must be understood. Classification is an important data mining technique with broad applications, used to classify the various kinds of data found in nearly every field of our life; it assigns items to a predefined set of classes according to their features. This paper provides an inclusive survey of different classification algorithms, shedding light on J48, C4.5, the k-nearest neighbour classifier, Naive Bayes, SVM and others, using the random concept.
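Of the algorithms surveyed, the k-nearest neighbour classifier is simple enough to sketch in a few lines; the training points below are made up for illustration:

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among the k nearest training points.
    `train` is a list of (feature_vector, label) pairs."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    neighbours = sorted(train, key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Two invented clusters of points with labels "A" and "B".
train = [([1, 1], "A"), ([1, 2], "A"), ([2, 1], "A"),
         ([8, 8], "B"), ([8, 9], "B")]
label = knn_predict(train, [1.5, 1.5], k=3)   # nearest three are all "A"
```

Since k-NN stores the whole training set and does no model building, it is a natural baseline against which the survey's other algorithms (J48, Naive Bayes, SVM) trade accuracy for training cost.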
Feature Selection : A Novel Approach for the Prediction of Learning Disabilit...csandit
Feature selection is a problem closely related to dimensionality reduction. A commonly used approach in feature selection is to rank the individual features according to some criterion and then search for an optimal feature subset, using an evaluation criterion to test optimality. The objective of this work is to predict more accurately the presence of Learning Disability (LD) in school-aged children using a reduced number of symptoms. For this purpose, a novel hybrid feature selection approach is proposed that integrates a popular Rough Set based feature ranking process with a modified backward feature elimination algorithm. The approach ranks the symptoms of LD according to their importance in the data domain; each symptom's significance value reflects its relative importance in predicting LD across the various cases. Then, by eliminating the least significant features one by one and evaluating the feature subset at each stage of the process, an optimal feature subset is generated. The experimental results show the success of the proposed method in removing redundant attributes from the LD dataset efficiently without sacrificing classification performance.
Hypothesis on Different Data Mining AlgorithmsIJERA Editor
In this paper, different classification algorithms for data mining are discussed. Data mining is about explaining the past and predicting the future by means of data analysis. Classification is a data mining task that categorises data based on numerical or categorical variables. Many algorithms have been proposed for classifying data; of these, five are comparatively studied here. There are four different classification approaches, namely Frequency Table, Covariance Matrix, Similarity Functions and Others. As part of this research on classification methods, the Naive Bayesian, K Nearest Neighbours, Decision Tree, Artificial Neural Network and Support Vector Machine algorithms are studied and examined using benchmark datasets such as Iris and Lung Cancer.
CLUSTERING DICHOTOMOUS DATA FOR HEALTH CAREijistjournal
Dichotomous data is a type of categorical data that is binary, with categories zero and one. Health care data is one of the most heavily used kinds of categorical data. Binary data is the simplest form of data used in health care databases, where close-ended questions can be used; it is very efficient, in terms of computational cost and memory capacity, for representing categorical data. Clustering health care or medical data is very tedious due to its complex data representation models, high dimensionality and data sparsity. In this paper, clustering is performed after transforming the dichotomous data into real values by the Wiener transformation. The proposed algorithm can be used to determine correlations between the health disorders and symptoms observed in large medical and health binary databases. Computational results show that clustering based on the Wiener transformation is very efficient in terms of objectivity and subjectivity.
A Framework for Statistical Simulation of Physiological Responses (SSPR).Waqas Tariq
The problem of selecting, from a large number of variables, those that predict certain important dependent variables has long been of interest to both applied statisticians and researchers in applied physiology, and various statistical techniques have been developed for this purpose. This framework embeds various statistical techniques of sampling and resampling and supports statistical simulation of physiological responses under different environmental conditions. Population generation and other statistical calculations are based on inputs provided by the user: a mean vector, a covariance matrix and the data. The framework is developed so that it can work with original data as well as with simulated data generated by the software. Approach: The mean vector and covariance matrix are sufficient statistics when the underlying distribution is multivariate normal. The framework uses these two inputs and is able to generate a simulated multivariate normal population for any number of variables. The software turns a manual operation into a computer-based system that automates the study and provides efficiency, accuracy, timeliness and economy. Result: A complete framework that can statistically simulate any type and any number of responses or variables. If the simulated data is analysed using statistical techniques, the results of the analysis will be the same as those from the original data; the system also helps when data is missing for some of the variables. Conclusion: The proposed system makes it possible to carry out physiological studies and statistical calculations even when the actual data is not available.
Growth in medical informatics can be observed nowadays. Advances in different medical fields have uncovered various critical diseases and provided guidelines for their cure. This has been possible only because of rich medical databases and the automation of the data analysis process. This analysis requires considerable learning and intelligence, for which data mining techniques provide the basis; various techniques are available, such as decision tree induction, rule-based classification and mining, support vector machines, stochastic classification, logistic regression, Naive Bayes, artificial neural networks with fuzzy logic, and genetic algorithms. This paper presents the basics of data mining and the techniques available in the medical sciences, and reviews the efforts made on medical databases using data mining techniques for human disease diagnosis.
Classification of Health Care Data Using Machine Learning Techniqueinventionjournals
Diagnosing diseases with an intelligent system produces high accuracy and can be treated as a classification problem; classification of health care data using machine learning techniques can thus serve as an intelligent health care system (IHSM). Heart disease is a major issue, commonly found in human beings due to changing lifestyles, and needs to be identified in advance. This paper explores various machine learning techniques to classify heart data downloaded from the UCI repository site. After applying a feature selection technique (FST), a decision tree based technique, CART, produces the highest accuracy with four features, followed by the other classification techniques.
Diagnosis of health conditions is a very challenging task for every human being, because life is directly related to health. Data mining based classification is one of the important applications for classifying data. In this research work, we have used various classification techniques to classify thyroid data; CART gives the highest accuracy, 99.47%, as the best model. Feature selection plays a very important role in making models computationally efficient and in increasing their performance. This research work focuses on the Info Gain and Gain Ratio feature selection techniques, which remove irrelevant features from the original dataset and computationally improve the model. We applied both feature selection techniques to the best model, i.e. CART. Our proposed CART-Info Gain and CART-Gain Ratio give 99.47% and 99.20% accuracy with 25 and 3 features respectively.
Robust Breast Cancer Diagnosis on Four Different Datasets Using Multi-Classif...ahmad abdelhafeez
Abstract- The goal of this paper is to compare different classifiers, and multi-classifier fusion, with respect to accuracy in discovering breast cancer on four different datasets. We implement various classification techniques representing the best-known algorithms in this field on four breast cancer datasets, two for diagnosis and two for prognosis, and fuse classifiers to find the best multi-classifier fusion approach for each dataset individually. Classification accuracy is obtained from the confusion matrix under 10-fold cross validation, and fusion uses majority voting (the mode of the classifier outputs). The experimental results show that no classification technique is better than the others across all datasets, since the classification task is affected by the type of dataset; with multi-classifier fusion, accuracy improved on three of the four datasets.
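The majority-voting fusion step (taking the mode of the classifier outputs per sample) can be sketched as follows; the three classifiers and their labels are hypothetical:

```python
from collections import Counter

def fuse_by_majority(predictions):
    """Combine per-classifier prediction lists by taking the mode per sample.
    `predictions` is a list of lists, one inner list per classifier."""
    fused = []
    for votes in zip(*predictions):          # one tuple of votes per sample
        fused.append(Counter(votes).most_common(1)[0][0])
    return fused

# Three hypothetical classifiers labelling four biopsy samples.
clf_a = ["benign", "malignant", "benign", "malignant"]
clf_b = ["benign", "benign",    "benign", "malignant"]
clf_c = ["malignant", "malignant", "benign", "benign"]
fused = fuse_by_majority([clf_a, clf_b, clf_c])
# fused == ["benign", "malignant", "benign", "malignant"]
```

Note that sample 2 is corrected by the ensemble even though one classifier disagrees, which is the mechanism behind the accuracy improvement the paper reports on three of its four datasets.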
Data mining techniques are being rapidly developed for many applications. In recent years, data mining in healthcare has become an emerging field for the research and development of intelligent medical diagnosis systems. Classification is a major research topic in data mining, and decision trees are popular methods for classification. In this paper, several decision tree classifiers are used for the diagnosis of medical datasets: the AD Tree, J48, NB Tree, Random Tree and Random Forest algorithms are applied to a heart disease dataset, a diabetes dataset and a hepatitis disorder dataset to test the decision tree models. Aung Nway Oo | Thin Naing "Decision Tree Models for Medical Diagnosis" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-3, April 2019, URL: https://www.ijtsrd.com/papers/ijtsrd23510.pdf
Paper URL: https://www.ijtsrd.com/computer-science/data-miining/23510/decision-tree-models-for-medical-diagnosis/aung-nway-oo
TOP K-OPINION DECISIONS RETRIEVAL IN HEALTHCARE SYSTEM csandit
The aim of this paper is to apply data mining techniques and opinion mining (OM) concepts to the field of health informatics. Decision making in health informatics involves a number of opinions given by a group of medical experts for a specific disease, presented in a medical database in the form of text; these decision-based opinions are then mined from the database with the help of a mining technique. Text document clustering plays a major role in the fast-developing information explosion and is considered a tool for performing information-based operations. Text document clustering generates clusters from the whole document collection automatically; normally the K-means clustering technique is used for this purpose. In this paper we use the bisecting K-means clustering technique, which performs better than traditional K-means. The objective is to study the revealed groupings of similar opinion types associated with the likelihood judgments of physicians and medical experts.
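A minimal sketch of the bisecting K-means idea: repeatedly split the cluster with the largest within-cluster scatter using plain 2-means until the desired number of clusters is reached. The document vectors below are invented placeholders for real text features:

```python
import random

def two_means(points, iters=20, seed=0):
    """Plain 2-means: split `points` (tuples of floats) into two clusters."""
    rng = random.Random(seed)
    centres = rng.sample(points, 2)
    clusters = (list(points), [])
    for _ in range(iters):
        clusters = ([], [])
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centres]
            clusters[d.index(min(d))].append(p)
        if not clusters[0] or not clusters[1]:
            break
        centres = [tuple(sum(col) / len(cl) for col in zip(*cl)) for cl in clusters]
    return clusters

def sse(cluster):
    """Within-cluster sum of squared distances to the centroid."""
    if not cluster:
        return 0.0
    centre = [sum(col) / len(cluster) for col in zip(*cluster)]
    return sum(sum((a - b) ** 2 for a, b in zip(p, centre)) for p in cluster)

def bisecting_kmeans(points, k):
    """Repeatedly bisect the cluster with the largest SSE until k clusters exist."""
    clusters = [list(points)]
    while len(clusters) < k:
        worst = max(clusters, key=sse)
        clusters.remove(worst)
        clusters.extend(list(c) for c in two_means(worst))
    return clusters

# Six invented 2-D "document" vectors forming three visible groups.
docs = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1), (5.1, 5.0), (9.0, 0.0), (9.1, 0.1)]
clusters = bisecting_kmeans(docs, 3)
```

Because each bisection only ever runs 2-means on one cluster, the method avoids the sensitivity of plain K-means to a single global initialisation, which is the advantage the paper claims for it.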
Evolving Efficient Clustering and Classification Patterns in Lymphography Dat...ijsc
Data mining refers to the process of retrieving knowledge by discovering novel and relevant patterns in large datasets. Clustering and classification are two distinct phases in data mining that work to provide an established, proven structure from a voluminous collection of facts. A dominant area of modern-day research in the field of medical investigations is disease prediction and malady categorization. In this paper, our focus is to analyze clusters of patient records obtained via unsupervised clustering techniques and to compare the performance of classification algorithms on the clinical data. Feature selection is a supervised method that attempts to select a subset of the predictor features based on information gain. The Lymphography dataset comprises 18 predictor attributes and 148 instances, with a class label taking four distinct values. This paper reports the accuracy of eight clustering algorithms in detecting clusters of patient records and predictor attributes, and the performance of sixteen classification algorithms on the Lymphography dataset, enabling accurate multi-class categorization of medical data. Our work shows that the Random Tree algorithm and Quinlan's C4.5 algorithm give 100 percent classification accuracy both with all the predictor features and with the feature subset selected by the Fisher Filtering feature selection algorithm. We also find that the Density Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm offers increased clustering accuracy in less computation time.
Regularized Weighted Ensemble of Deep Classifiers ijcsa
An ensemble of classifiers increases classification performance, since the decisions of many experts are fused to generate the resultant prediction. Deep learning is a classification approach in which, along with the basic learning technique, fine-tuning is performed for improved precision of learning; deep classifier ensemble learning therefore offers good scope for research. Feature subset selection is another way of creating the individual classifiers to be fused for ensemble learning. All of these ensemble techniques face the ill-posed problem of overfitting. A regularized weighted ensemble of deep support vector machines performs prediction analysis on three UCI repository problems, the Iris, Ionosphere and Seeds datasets, thereby increasing the generalization of the boundary between the classes of each dataset. Norm-2 regularization reduced by singular value decomposition, combined with the two-level deep classifier ensemble, gives the best result in our experiments.
Enhanced Detection System for Trust Aware P2P Communication NetworksEditor IJCATR
A botnet is a set of computers that have been co-opted to forward transmissions to other computers without the knowledge of their users, and detecting botnets is highly important. However, peer-to-peer (P2P) structured botnets are very difficult to detect because they have no centralized server. In this paper, we present a P2P infrastructure that improves the trust of the peers and their data. To identify botnets we provide a technique called data provenance integrity, which ensures the correct origin or source of information and prevents opponents from using host resources. A reputation based trust model is used to select trusted peers: each peer has a reputation value calculated from its past activity, and a hash table, whose stored data is based on the reputation values, is used for efficient file searching.
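A reputation table of the kind described, with scores updated from past activity and the most trusted peer selected for transactions, might look like this in outline (the update rule, starting score and peer names are assumptions, not the paper's model):

```python
class ReputationTable:
    """Tracks a reputation score per peer based on its past transactions.
    Scores move up on successful transfers and down on failed or bad ones."""

    def __init__(self):
        self.scores = {}

    def record(self, peer, success):
        score = self.scores.get(peer, 0.5)        # unknown peers start neutral
        delta = 0.1 if success else -0.2          # penalise failures more heavily
        self.scores[peer] = min(1.0, max(0.0, score + delta))

    def most_trusted(self):
        return max(self.scores, key=self.scores.get)

table = ReputationTable()
for peer, outcome in [("alice", True), ("alice", True),
                      ("bob", False), ("bob", True)]:
    table.record(peer, outcome)
best = table.most_trusted()   # "alice": two successes outweigh bob's mixed record
```

Penalising failures more than rewarding successes makes it slow for a misbehaving peer (such as a bot) to rebuild trust, which fits the detection goal described above.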
Comparative Study of Diabetic Patient Data’s Using Classification Algorithm i...Editor IJCATR
Data mining refers to extracting knowledge from large amounts of data. Real-life data mining problems are interesting because they often present a different set of challenges, as with diabetic patients' data; classification is one of the main problems in the field. This research gives an algorithmic discussion of the J48, J48 Graft, Random Tree, REP and LAD classifiers, comparing their computing time, correctly classified instances, kappa statistics, MAE, RMSE, RAE and RRSE, and measuring the error rate of each classifier in Weka. The diabetic patient dataset classified in this paper was developed by collecting data from a hospital repository and consists of 1865 instances with different attributes; the instances fall into two categories, blood tests and urine tests. The Weka tool is used to classify the data, evaluation uses 10-fold cross validation, and the results are compared. Comparing the performance of the algorithms, we found J48 to be the better algorithm in most cases.
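The 10-fold cross validation protocol used for this comparison can be sketched as follows; here `train_and_score` stands in for any of the classifiers being compared:

```python
def k_fold_indices(n, k=10):
    """Split sample indices 0..n-1 into k folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(X, y, train_and_score, k=10):
    """Average a classifier's score over k train/test splits.
    `train_and_score(train_X, train_y, test_X, test_y)` returns an accuracy."""
    scores = []
    for fold in k_fold_indices(len(X), k):
        test = set(fold)
        train_idx = [i for i in range(len(X)) if i not in test]
        scores.append(train_and_score(
            [X[i] for i in train_idx], [y[i] for i in train_idx],
            [X[i] for i in fold], [y[i] for i in fold]))
    return sum(scores) / k

folds = k_fold_indices(20, k=10)   # ten disjoint folds of two samples each
```

Every sample appears in the test split exactly once, so the averaged score is an estimate of generalisation accuracy rather than training-set fit, which is why tools like Weka report it by default.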
Data Mining System and Applications: A Reviewijdpsjournal
In the Information Technology era, information plays a vital role in every sphere of human life. It is very important to gather data from different sources, store and maintain it, generate information and knowledge, and disseminate data, information and knowledge to every stakeholder. Due to the vast use of computers and electronic devices and the tremendous growth in computing power and storage capacity, there has been explosive growth in data collection; storing data in a data warehouse enables an entire enterprise to access a reliable, current database. Analysing this vast amount of data and drawing fruitful conclusions and inferences requires special tools, called data mining tools. This paper gives an overview of data mining systems and some of their applications.
ICU PATIENT DETERIORATION PREDICTION: A DATA-MINING APPROACHcscpconf
A huge amount of medical data is generated every day, which presents a challenge in analysing
these data. The obvious solution to this challenge is to reduce the amount of data without
information loss. Dimension reduction is considered the most popular approach for reducing
data size and for reducing noise and redundancy in data. In this paper, we investigate the
effect of feature selection in improving the prediction of patient deterioration in ICUs. We
consider lab tests as features. Thus, choosing a subset of features would mean choosing the
most important lab tests to perform. If the number of tests can be reduced by identifying the
most important tests, then we could also identify the redundant tests. By omitting the redundant
tests, observation time could be reduced and early treatment could be provided to avoid the risk.
Additionally, unnecessary monetary cost would be avoided. Our approach uses state-of-the-art
feature selection for predicting ICU patient deterioration using the medical lab results. We
apply our technique on the publicly available MIMIC-II database and show the effectiveness of
the feature selection. We also provide a detailed analysis of the best features identified by our
approach.
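One simple way to flag redundant lab tests, in the spirit of the feature selection described above, is to drop any test that correlates strongly with one already kept. The correlation threshold and the lab values below are illustrative assumptions, not the paper's actual method:

```python
def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a) ** 0.5
    vb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (va * vb)

def redundant_features(columns, threshold=0.95):
    """Return names of lab tests highly correlated with an earlier-kept test.
    `columns` is a list of (name, values) pairs."""
    kept, dropped = [], []
    for name, values in columns:
        if any(abs(pearson(values, kv)) >= threshold for _, kv in kept):
            dropped.append(name)
        else:
            kept.append((name, values))
    return dropped

# Hypothetical lab results: BUN is an exact linear copy of creatinine here.
columns = [
    ("creatinine", [1.0, 1.4, 2.0, 2.6]),
    ("bun",        [10.0, 14.0, 20.0, 26.0]),   # redundant with creatinine
    ("sodium",     [140.0, 138.0, 141.0, 137.0]),
]
dropped = redundant_features(columns)   # -> ["bun"]
```

Omitting such redundant tests is exactly what would shorten observation time and reduce monetary cost, as the abstract argues.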
Similar to Decision Tree Classifiers to determine the patient’s Post-operative Recovery Decision
The Use of Java Swing’s Components to Develop a WidgetWaqas Tariq
A widget is a kind of application that provides a single service, such as a map, news feed, simple clock or battery-life indicator. This kind of interactive software object has been developed to facilitate user interface (UI) design; a given UI function may be implemented using different widgets with the same function. In this article, we present the widget as a platform used in various applications, such as the desktop, web browsers and mobile phones. We also describe a visual menu of Java Swing's components that will be used to build a widget, assuming the reader has successfully compiled and run a program that uses Swing components.
3D Human Hand Posture Reconstruction Using a Single 2D ImageWaqas Tariq
Passive sensing of the 3D geometric posture of the human hand has been studied extensively over the past decade. However, these research efforts have been hampered by the computational complexity caused by inverse kinematics and 3D reconstruction. In this paper, we focus on 3D hand posture estimation from a single 2D image, with robotic applications in mind. We introduce a human hand model with 27 degrees of freedom (DOFs) and analyze some of its constraints to reduce the DOFs without any significant degradation of performance. A novel algorithm is proposed to estimate the 3D hand posture from eight projected 2D feature points. Experimental results using real images confirm that our algorithm gives good estimates of the 3D hand pose. Keywords: 3D hand posture estimation; Model-based approach; Gesture recognition; human-computer interface; machine vision.
Camera as Mouse and Keyboard for Handicap Person with Troubleshooting Ability...Waqas Tariq
The camera mouse has been widely used by handicapped people to interact with computers. Above all, a camera mouse must be able to replace all the roles of a typical mouse and keyboard: it must provide all mouse click events and keyboard functions (including all shortcut keys), allow users to troubleshoot by themselves, and eliminate neck fatigue when used for long periods. In this paper, we propose a camera mouse system that uses a timer as the left click event and blinking as the right click event. We also modify the original on-screen keyboard layout by adding two buttons (a "drag/drop" button for mouse drag and drop events, and a button to call the task manager for troubleshooting) and by changing the behaviour of the CTRL, ALT, SHIFT and CAPS LOCK keys to provide keyboard shortcuts. We further develop a recovery method that allows users to move away from the camera and come back again, in order to eliminate neck fatigue. Experiments involving several users have been carried out in our laboratory. The results show that our camera mouse allows users to type, perform left and right click events and drag and drop events, and troubleshoot, all without using the hands. With this system, handicapped people can use a computer more comfortably and with less eye dryness.
A Proposed Web Accessibility Framework for the Arab DisabledWaqas Tariq
The Web is providing unprecedented access to information and interaction for people with disabilities. This paper presents a Web accessibility framework that makes the Web easier to access for disabled Arab users and facilitates their lifelong learning as well. The proposed framework provides disabled Arab users with an easy means of access in their mother language, so they do not have to overcome the barrier of learning the target spoken language. The framework is based on analyzing a web page's meta-language, extracting its content and reformulating it in a format suitable for disabled users. Its basic objective is to support the equal rights of disabled Arab people to access education and training alongside non-disabled people. Key Words: Arabic Moon code, Arabic Sign Language, Deaf, Deaf-blind, E-learning Interactivity, Moon code, Web accessibility, Web framework, Web System, WWW.
Real Time Blinking Detection Based on Gabor FilterWaqas Tariq
A new method of blink detection is proposed. Above all, a blink detection method must be robust against different users, noise, and changes in eye shape. In this paper, we propose a blink detection method that measures the distance between the two arcs of the eye (the upper and lower parts). We detect the eye arcs by applying a Gabor filter to the eye image; the Gabor filter is advantageous in image processing because it extracts spatially localized spectral features, so lines, arcs and other shapes are more easily detected. Once the two eye arcs are detected, we measure the distance between them using a connected-component labelling method: an open eye is marked when the distance between the two arcs exceeds a threshold, and a closed eye is marked otherwise. The experimental results show that our proposed method is robust against different users, noise, and changes in eye shape, with perfect accuracy.
Computer Input with Human Eyes-Only Using Two Purkinje Images Which Works in ...Waqas Tariq
A method for computer input with human eyes-only using two Purkinje images which works in a real time basis without calibration is proposed. Experimental results shows that cornea curvature can be estimated by using two light sources derived Purkinje images so that no calibration for reducing person-to-person difference of cornea curvature. It is found that the proposed system allows usersf movements of 30 degrees in roll direction and 15 degrees in pitch direction utilizing detected face attitude which is derived from the face plane consisting three feature points on the face, two eyes and nose or mouth. Also it is found that the proposed system does work in a real time basis.
Toward a More Robust Usability concept with Perceived Enjoyment in the contex...Waqas Tariq
Mobile multimedia service is relatively new but has quickly dominated people¡¯s lives, especially among young people. To explain this popularity, this study applies and modifies the Technology Acceptance Model (TAM) to propose a research model and conduct an empirical study. The goal of study is to examine the role of Perceived Enjoyment (PE) and what determinants can contribute to PE in the context of using mobile multimedia service. The result indicates that PE is influencing on Perceived Usefulness (PU) and Perceived Ease of Use (PEOU) and directly Behavior Intention (BI). Aesthetics and flow are key determinants to explain Perceived Enjoyment (PE) in mobile multimedia usage.
Collaborative Learning of Organisational KnolwedgeWaqas Tariq
This paper presents recent research into methods used in Australian Indigenous Knowledge sharing and looks at how these can support the creation of suitable collaborative envi- ronments for timely organisational learning. The protocols and practices as used today and in the past by Indigenous communities are presented and discussed in relation to their relevance to a personalised system of knowledge sharing in modern organisational cultures. This research focuses on user models, knowledge acquisition and integration of data for constructivist learning in a networked repository of or- ganisational knowledge. The data collected in the repository is searched to provide collections of up-to-date and relevant material for training in a work environment. The aim is to improve knowledge collection and sharing in a team envi- ronment. This knowledge can then be collated into a story or workflow that represents the present knowledge in the organisation.
Our research aims to propose a global approach for specification, design and verification of context awareness Human Computer Interface (HCI). This is a Model Based Design approach (MBD). This methodology describes the ubiquitous environment by ontologies. OWL is the standard used for this purpose. The specification and modeling of Human-Computer Interaction are based on Petri nets (PN). This raises the question of representation of Petri nets with XML. We use for this purpose, the standard of modeling PNML. In this paper, we propose an extension of this standard for specification, generation and verification of HCI. This extension is a methodological approach for the construction of PNML with Petri nets. The design principle uses the concept of composition of elementary structures of Petri nets as PNML Modular. The objective is to obtain a valid interface through verification of properties of elementary Petri nets represented with PNML.
Development of Sign Signal Translation System Based on Altera’s FPGA DE2 BoardWaqas Tariq
The main aim of this paper is to build a system that is capable of detecting and recognizing the hand gesture in an image captured by using a camera. The system is built based on Altera’s FPGA DE2 board, which contains a Nios II soft core processor. Image processing techniques and a simple but effective algorithm are implemented to achieve this purpose. Image processing techniques are used to smooth the image in order to ease the subsequent processes in translating the hand sign signal. The algorithm is built for translating the numerical hand sign signal and the result are displayed on the seven segment display. Altera’s Quartus II, SOPC Builder and Nios II EDS software are used to construct the system. By using SOPC Builder, the related components on the DE2 board can be interconnected easily and orderly compared to traditional method that requires lengthy source code and time consuming. Quartus II is used to compile and download the design to the DE2 board. Then, under Nios II EDS, C programming language is used to code the hand sign translation algorithm. Being able to recognize the hand sign signal from images can helps human in controlling a robot and other applications which require only a simple set of instructions provided a CMOS sensor is included in the system.
An overview on Advanced Research Works on Brain-Computer InterfaceWaqas Tariq
A brain–computer interface (BCI) is a proficient result in the research field of human- computer synergy, where direct articulation between brain and an external device occurs resulting in augmenting, assisting and repairing human cognitive. Advanced works like generating brain-computer interface switch technologies for intermittent (or asynchronous) control in natural environments or developing brain-computer interface by Fuzzy logic Systems or by implementing wavelet theory to drive its efficacies are still going on and some useful results has also been found out. The requirements to develop this brain machine interface is also growing day by day i.e. like neuropsychological rehabilitation, emotion control, etc. An overview on the control theory and some advanced works on the field of brain machine interface are shown in this paper.
Exploring the Relationship Between Mobile Phone and Senior Citizens: A Malays...Waqas Tariq
There is growing ageing phenomena with the rise of ageing population throughout the world. According to the World Health Organization (2002), the growing ageing population indicates 694 million, or 223% is expected for people aged 60 and over, since 1970 and 2025.The growth is especially significant in some advanced countries such as North America, Japan, Italy, Germany, United Kingdom and so forth. This growing older adult population has significantly impact the social-culture, lifestyle, healthcare system, economy, infrastructure and government policy of a nation. However, there are limited research studies on the perception and usage of a mobile phone and its service for senior citizens in a developing nation like Malaysia. This paper explores the relationship between mobile phones and senior citizens in Malaysia from the perspective of a developing country. We conducted an exploratory study using contextual interviews with 5 senior citizens of how they perceive their mobile phones. This paper reveals 4 interesting themes from this preliminary study, in addition to the findings of the desirable mobile requirements for local senior citizens with respect of health, safety and communication purposes. The findings of this study bring interesting insight to local telecommunication industries as a whole, and will also serve as groundwork for more in-depth study in the future.
Principles of Good Screen Design in WebsitesWaqas Tariq
Visual techniques for proper arrangement of the elements on the user screen have helped the designers to make the screen look good and attractive. Several visual techniques emphasize the arrangement and ordering of the screen elements based on particular criteria for best appearance of the screen. This paper investigates few significant visual techniques in various web user interfaces and showcases the results for better understanding and their presence.
Virtual teams are used more and more by companies and other organizations to receive benefits. They are a great way to enable teamwork in situations where people are not sitting in the same physical place at the same time. As companies seek to increase the use of virtual teams, a need exists to explore the context of these teams, the virtuality of a team and software that may help these teams working virtualy. Virtual teams have the same basic principles as traditional teams, but there is one big difference. This difference is the way the team members communicate. Instead of using the dynamics of in-office face-to-face exchange, they now rely on special communication channels enabled by modern technologies, such as e-mails, faxes, phone calls and teleconferences, virtual meetings etc. This is why this paper is focused on the issues regarding virtual teams, and how these teams are created and progressing in Albania.
Cognitive Approach Towards the Maintenance of Web-Sites Through Quality Evalu...Waqas Tariq
It is a well established fact that the Web-Applications require frequent maintenance because of cutting– edge business competitions. The authors have worked on quality evaluation of web-site of Indian ecommerce domain. As a result of that work they have made a quality-wise ranking of these sites. According to their work and also the survey done by various other groups Futurebazaar web-site is considered to be one of the best Indian e-shopping sites. In this research paper the authors are assessing the maintenance of the same site by incorporating the problems incurred during this evaluation. This exercise gives a real world maintainability problem of web-sites. This work will give a clear picture of all the quality metrics which are directly or indirectly related with the maintainability of the web-site.
USEFul: A Framework to Mainstream Web Site Usability through Automated Evalua...Waqas Tariq
A paradox has been observed whereby web site usability is proven to be an essential element in a web site, yet at the same time there exist an abundance of web pages with poor usability. This discrepancy is the result of limitations that are currently preventing web developers in the commercial sector from producing usable web sites. In this paper we propose a framework whose objective is to alleviate this problem by automating certain aspects of the usability evaluation process. Mainstreaming comes as a result of automation, therefore enabling a non-expert in the field of usability to conduct the evaluation. This results in reducing the costs associated with such evaluation. Additionally, the framework allows the flexibility of adding, modifying or deleting guidelines without altering the code that references them since the guidelines and the code are two separate components. A comparison of the evaluation results carried out using the framework against published evaluations of web sites carried out by web site usability professionals reveals that the framework is able to automatically identify the majority of usability violations. Due to the consistency with which it evaluates, it identified additional guideline-related violations that were not identified by the human evaluators.
Robot Arm Utilized Having Meal Support System Based on Computer Input by Huma...Waqas Tariq
A robot arm utilized having meal support system based on computer input by human eyes only is proposed. The proposed system is developed for handicap/disabled persons as well as elderly persons and tested with able persons with several shapes and size of eyes under a variety of illumination conditions. The test results with normal persons show the proposed system does work well for selection of the desired foods and for retrieve the foods as appropriate as usersf requirements. It is found that the proposed system is 21% much faster than the manually controlled robotics.
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text EditorWaqas Tariq
In recent decades speech interactive systems have gained increasing importance. Performance of an ASR system mainly depends on the availability of large corpus of speech. The conventional method of building a large vocabulary speech recognizer for any language uses a top-down approach to speech. This approach requires large speech corpus with sentence or phoneme level transcription of the speech utterances. The transcriptions must also include different speech order so that the recognizer can build models for all the sounds present. But, for Telugu language, because of its complex nature, a very large, well annotated speech database is very difficult to build. It is very difficult, if not impossible, to cover all the words of any Indian language, where each word may have thousands and millions of word forms. A significant part of grammar that is handled by syntax in English (and other similar languages) is handled within morphology in Telugu. Phrases including several words (that is, tokens) in English would be mapped on to a single word in Telugu.Telugu language is phonetic in nature in addition to rich in morphology. That is why the speech technology developed for English cannot be applied to Telugu language. This paper highlights the work carried out in an attempt to build a voice enabled text editor with capability of automatic term suggestion. Main claim of the paper is the recognition enhancement process developed by us for suitability of highly inflecting, rich morphological languages. This method results in increased speech recognition accuracy with very much reduction in corpus size. It also adapts Telugu words to the database dynamically, resulting in growth of the corpus.
An Improved Approach for Word Ambiguity RemovalWaqas Tariq
Word ambiguity removal is a task of removing ambiguity from a word, i.e. correct sense of word is identified from ambiguous sentences. This paper describes a model that uses Part of Speech tagger and three categories for word sense disambiguation (WSD). Human Computer Interaction is very needful to improve interactions between users and computers. For this, the Supervised and Unsupervised methods are combined. The WSD algorithm is used to find the efficient and accurate sense of a word based on domain information. The accuracy of this work is evaluated with the aim of finding best suitable domain of word. Keywords: Human Computer Interaction, Supervised Training, Unsupervised Learning, Word Ambiguity, Word sense disambiguation
Parameters Optimization for Improving ASR Performance in Adverse Real World N...Waqas Tariq
From the existing research it has been observed that many techniques and methodologies are available for performing every step of Automatic Speech Recognition (ASR) system, but the performance (Minimization of Word Error Recognition-WER and Maximization of Word Accuracy Rate- WAR) of the methodology is not dependent on the only technique applied in that method. The research work indicates that, performance mainly depends on the category of the noise, the level of the noise and the variable size of the window, frame, frame overlap etc is considered in the existing methods. The main aim of the work presented in this paper is to use variable size of parameters like window size, frame size and frame overlap percentage to observe the performance of algorithms for various categories of noise with different levels and also train the system for all size of parameters and category of real world noisy environment to improve the performance of the speech recognition system. This paper presents the results of Signal-to-Noise Ratio (SNR) and Accuracy test by applying variable size of parameters. It is observed that, it is really very hard to evaluate test results and decide parameter size for ASR performance improvement for its resultant optimization. Hence, this study further suggests the feasible and optimum parameter size using Fuzzy Inference System (FIS) for enhancing resultant accuracy in adverse real world noisy environmental conditions. This work will be helpful to give discriminative training of ubiquitous ASR system for better Human Computer Interaction (HCI). Keywords: ASR Performance, ASR Parameters Optimization, Multi-Environmental Training, Fuzzy Inference System for ASR, ubiquitous ASR system, Human Computer Interaction (HCI)
Unit 8 - Information and Communication Technology (Paper I).pdfThiyagu K
This slides describes the basic concepts of ICT, basics of Email, Emerging Technology and Digital Initiatives in Education. This presentations aligns with the UGC Paper I syllabus.
The Art Pastor's Guide to Sabbath | Steve ThomasonSteve Thomason
What is the purpose of the Sabbath Law in the Torah. It is interesting to compare how the context of the law shifts from Exodus to Deuteronomy. Who gets to rest, and why?
We all have good and bad thoughts from time to time and situation to situation. We are bombarded daily with spiraling thoughts(both negative and positive) creating all-consuming feel , making us difficult to manage with associated suffering. Good thoughts are like our Mob Signal (Positive thought) amidst noise(negative thought) in the atmosphere. Negative thoughts like noise outweigh positive thoughts. These thoughts often create unwanted confusion, trouble, stress and frustration in our mind as well as chaos in our physical world. Negative thoughts are also known as “distorted thinking”.
The French Revolution, which began in 1789, was a period of radical social and political upheaval in France. It marked the decline of absolute monarchies, the rise of secular and democratic republics, and the eventual rise of Napoleon Bonaparte. This revolutionary period is crucial in understanding the transition from feudalism to modernity in Europe.
For more information, visit-www.vavaclasses.com
Ethnobotany and Ethnopharmacology:
Ethnobotany in herbal drug evaluation,
Impact of Ethnobotany in traditional medicine,
New development in herbals,
Bio-prospecting tools for drug discovery,
Role of Ethnopharmacology in drug evaluation,
Reverse Pharmacology.
How to Split Bills in the Odoo 17 POS ModuleCeline George
Bills have a main role in point of sale procedure. It will help to track sales, handling payments and giving receipts to customers. Bill splitting also has an important role in POS. For example, If some friends come together for dinner and if they want to divide the bill then it is possible by POS bill splitting. This slide will show how to split bills in odoo 17 POS.
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptxEduSkills OECD
Andreas Schleicher presents at the OECD webinar ‘Digital devices in schools: detrimental distraction or secret to success?’ on 27 May 2024. The presentation was based on findings from PISA 2022 results and the webinar helped launch the PISA in Focus ‘Managing screen time: How to protect and equip students against distraction’ https://www.oecd-ilibrary.org/education/managing-screen-time_7c225af4-en and the OECD Education Policy Perspective ‘Students, digital devices and success’ can be found here - https://oe.cd/il/5yV
Instructions for Submissions thorugh G- Classroom.pptxJheel Barad
This presentation provides a briefing on how to upload submissions and documents in Google Classroom. It was prepared as part of an orientation for new Sainik School in-service teacher trainees. As a training officer, my goal is to ensure that you are comfortable and proficient with this essential tool for managing assignments and fostering student engagement.
Operation “Blue Star” is the only event in the history of Independent India where the state went into war with its own people. Even after about 40 years it is not clear if it was culmination of states anger over people of the region, a political game of power or start of dictatorial chapter in the democratic setup.
The people of Punjab felt alienated from main stream due to denial of their just demands during a long democratic struggle since independence. As it happen all over the word, it led to militant struggle with great loss of lives of military, police and civilian personnel. Killing of Indira Gandhi and massacre of innocent Sikhs in Delhi and other India cities was also associated with this movement.
Home assignment II on Spectroscopy 2024 Answers.pdf
D.Shanthi, Dr.G.Sahoo & Dr.N.Saravanan
International Journal of Artificial Intelligence and Expert Systems (IJAE), Volume (1): Issue (4) 75
Decision Tree Classifiers to Determine the Patient’s Post-
Operative Recovery Decision
D.Shanthi dshan71@gmail.com
Department of Computer Science and IT
Mazoon University College
Seeb, P.O.Box 101, Muscat, Sultanate of Oman
Dr.G.Sahoo gsahoo@bitmesra.ac.in
Department of Information Technology
Birla Institute of Technology
Mesra, Ranchi, India
Dr.N.Saravanan saranshanu@gmail.com
Department of Computer Science and IT
Mazoon University College
Seeb, P.O.Box 101, Muscat, Sultanate of Oman
Abstract
Machine learning aims to generate classifying expressions simple enough to be easily
understood by humans. Many machine learning approaches are available for
classification, among which decision tree learning is one of the most popular. In this
paper we propose a systematic approach, based on decision trees, to automatically
determine a patient's post-operative recovery status. Decision tree structures are
constructed using data mining methods and are then used to classify discharge decisions.
Keywords: Data Mining, Decision Tree, Machine Learning, Post-operative Recovery.
1. INTRODUCTION
Decision support systems help physicians and play an important role in medical decision
making. They are based on different models, and the best of them provide an explanation
together with an accurate, reliable and quick response. Clinical decision-making is a unique
process that involves the interplay between knowledge of pre-existing pathological conditions,
explicit patient information, nursing care and experiential learning [1]. Decision trees are among
the most popular machine learning approaches. A data object, referred to as an example,
is described by a set of attributes or variables; one of the attributes describes the class that
an example belongs to and is thus called the class attribute or class variable. The other attributes
are often called independent or predictor attributes (or variables). The set of examples used to learn
the classification model is called the training data set. Tasks related to classification include
regression, which builds a model from training data to predict numerical values, and clustering,
which groups examples to form categories. Classification belongs to the category of supervised
learning, as distinguished from unsupervised learning: in supervised learning the training data
consist of pairs of input data (typically vectors) and desired outputs, while in unsupervised
learning there is no a priori output. For years decision trees have been successfully used in many medical
decision making applications. Their transparent representation of acquired knowledge and fast
algorithms made decision trees what they are today: one of the most often used symbolic
machine learning approaches [2]. Decision trees have already been used successfully in
medicine, but as in traditional statistics, some hard real-world problems cannot be solved
successfully in the traditional way of induction [3]. A hybrid neuro-genetic approach has been
used to select input features for a neural network; the experimental results showed
that the performance of an ANN can be improved by selecting a good combination of input variables,
and that this hybrid approach gives better average prediction accuracy than a traditional ANN [4].
Classification has various applications, such as learning from a patient database to diagnose a
disease based on the symptoms of a patient, analyzing credit card transactions to identify
fraudulent transactions, automatic recognition of letters or digits based on handwriting samples,
and distinguishing highly active compounds from inactive ones based on the structures of
compounds for drug discovery [5]. Seven algorithms have been compared for training a
multi-layered neural network architecture to predict the patient's post-operative recovery area,
and the best classification rates reported [6].
In this study, the Gini splitting algorithm is applied, with promising results, to a crucial and at
the same time complicated classification problem: predicting the patient's post-operative
recovery area.
2. MOTIVATIONS AND RELATED WORK
In data mining there are three primary components: model representation, model evaluation and
search. The search methods used in data mining are of two basic types: parameter search and
model search [15]. In parameter search, the algorithm searches for the set of parameters, for a
fixed model representation, that optimizes the model evaluation criteria given the observed data.
For relatively simple problems the search is simple and the optimal parameter estimates can be
obtained in closed form. Typically, for more general models, a closed-form solution is not
available. In such cases iterative methods, such as the gradient descent method of
back-propagation for neural networks, are commonly used. The gradient descent method is one
of the most popular search techniques in conventional optimization [16].
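The parameter-search loop described above can be sketched concretely. The following is a minimal, illustrative gradient-descent example; the one-parameter linear model, the squared-error loss and the learning rate are assumptions chosen for illustration, not part of the original study:

```python
# Minimal parameter search by gradient descent (illustrative sketch).
# For a fixed model representation y = w * x, search for the parameter w
# that minimizes the mean squared error on the observed data.

def gradient_descent(xs, ys, lr=0.01, steps=500):
    w = 0.0  # initial parameter estimate
    for _ in range(steps):
        # gradient of the mean squared error with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad  # step against the gradient
    return w

# Observations generated by y = 3x; the search recovers w close to 3.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]
w = gradient_descent(xs, ys)
```

For this simple quadratic loss the optimum is also available in closed form; the iterative loop stands in for the more general models mentioned above, such as neural network training, where it is not.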
Decision trees are considered one of the most popular approaches for representing classifiers;
a decision tree learner induces a tree from data. A decision tree is a tree-structured prediction
model where each internal node denotes a test on an attribute, each outgoing branch represents
an outcome of the test, and each leaf node is labeled with a class or class distribution. Decision
trees are often used in classification and prediction, and are a simple yet powerful way of
representing knowledge. The models produced by decision tree learners take the form of a tree
structure. Learning a decision tree involves deciding which split to make at each node, and how
deep the tree should be. A leaf node indicates the class of the examples; instances are
classified by sorting them down the tree from the root node to some leaf node [2].
FIGURE 1: Decision Tree Structure
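The structure in Figure 1 can be sketched as a small data structure. The attribute names below come from the dataset used later in the paper, but the particular tests and branches are hypothetical, chosen only to show how an instance is sorted from the root down to a leaf:

```python
# A decision tree node: either a leaf carrying a class label, or an
# internal node testing one attribute and branching on its outcome.

class Node:
    def __init__(self, attribute=None, branches=None, label=None):
        self.attribute = attribute      # attribute tested at this node
        self.branches = branches or {}  # outcome value -> child node
        self.label = label              # class label if this is a leaf

def classify(node, example):
    """Sort an example down the tree from the root node to a leaf."""
    while node.label is None:
        node = node.branches[example[node.attribute]]
    return node.label

# Hypothetical two-level tree for the discharge decision task.
tree = Node(attribute="BloodPressStab", branches={
    "stable": Node(label="Go-Home"),
    "mod-stable": Node(attribute="InternalTemp", branches={
        "high": Node(label="Intensive-Care"),
        "mid": Node(label="General-Hospital"),
        "low": Node(label="General-Hospital"),
    }),
    "unstable": Node(label="Intensive-Care"),
})

print(classify(tree, {"BloodPressStab": "mod-stable", "InternalTemp": "high"}))
# prints "Intensive-Care"
```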
Decision trees (DTs) are either univariate or multivariate [17]. Univariate decision trees (UDTs)
approximate the underlying distribution by partitioning the feature space recursively with axis-
parallel hyperplanes. The underlying function, or relationship between inputs and outputs, is
approximated by a synthesis of the hyper-rectangles generated from the partitions. Multivariate
decision trees (MDTs) have more complicated partitioning methodologies and are
computationally more expensive than UDTs. A typical decision tree learning algorithm adopts a
top-down recursive divide-and-conquer strategy to construct a decision tree. Starting from a root
node representing the whole training data, the data is split into two or more subsets based on the
values of an attribute chosen according to a splitting criterion. For each subset a child node is
created and the subset is associated with the child. The process is then separately repeated on
the data in each of the child nodes, and so on, until a termination criterion is satisfied. Many
decision tree learning algorithms exist. They differ mainly in their attribute-selection criteria (such
as information gain, gain ratio [7] and the Gini index [8]), termination criteria and post-pruning strategies.
Post-pruning is a technique that removes some branches of the tree after the tree is constructed
to prevent the tree from over-fitting the training data. Representative decision tree algorithms
include CART [8] and C4.5 [7]. There are also studies on fast and scalable construction of
decision trees. Representative algorithms of such kind include RainForest [9] and SPRINT [10].
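One of the attribute-selection criteria named above, the Gini index [8], can be computed directly. The sketch below scores a candidate split by the weighted impurity of the child subsets it produces; the class labels are illustrative:

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a set of class labels: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_split(subsets):
    """Weighted Gini impurity of the child subsets produced by a candidate split."""
    n = sum(len(s) for s in subsets)
    return sum(len(s) / n * gini(s) for s in subsets)

# A split that separates the classes cleanly scores lower (better)
# than one that leaves the classes mixed in its children.
good = gini_split([["A", "A", "A"], ["B", "B", "B"]])  # pure children
bad = gini_split([["A", "B", "A"], ["B", "A", "B"]])   # mixed children
```

During tree construction, the attribute whose split minimizes this weighted impurity (equivalently, maximizes the impurity reduction from the parent) is chosen at each node.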
3. METHODS AND METHODOLOGY
The tool used in this study is DTREG [14]. It follows the same basic principles as many other
decision tree building tools, but it also implements several extensions. One of these, dynamic
discretization of continuous attributes, was used with success in our experiments. For this study
we consider a classification problem: determining the class (General-Hospital, Go-Home,
Intensive-Care) for a given input vector. The data were taken from the UCI Machine Learning
Repository (http://www.ics.uci.edu/~mlearn/MLSummary.html) [11]. The data were originally
created by Sharon Summers (School of Nursing, University of Kansas) and Linda Woolery
(School of Nursing, University of Missouri) and donated by Jerzy W. Grzymala-Busse. The goal
is to determine where patients in a postoperative recovery area should be sent next. Because
hypothermia is a significant concern after surgery, the attributes correspond roughly to body
temperature measurements. The dataset contains 90 records with 8 characteristics of a patient's
state in the postoperative period.
1) InternalTemp - patient's internal temperature in C: high (> 37), mid (>= 36 and <= 37),
low (< 36)
2) SurfaceTemp - patient's surface temperature in C: high (> 36.5), mid (>= 35 and <= 36.5),
low (< 35)
3) OxygenSat - oxygen saturation in %: excellent (>= 98), good (>= 90 and < 98),
fair (>= 80 and < 90), poor (< 80)
4) BloodPress - last measurement of blood pressure: high (> 130/90), mid (<= 130/90
and >= 90/70), low (< 90/70)
5) SurfTempStab - stability of patient's surface temperature: stable, mod-stable, unstable
6) IntTempStab - stability of patient's internal temperature: stable, mod-stable, unstable
7) BloodPressStab - stability of patient's blood pressure: stable, mod-stable, unstable
8) Comfort - patient's perceived comfort at discharge, measured as an integer between 0 and 20
A. Target Column
9) Discharge Decision - discharge decision (Intensive-Care, Go-Home, General-Hospital):
Intensive-Care (patient sent to Intensive Care Unit),
Go-Home (patient prepared to go home),
General-Hospital (patient sent to general hospital floor).
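As a sketch, the discretization of raw measurements into these categorical values can be written directly from the thresholds above. The helper names and the example record are assumptions for illustration, not part of the dataset itself:

```python
# Map raw measurements to the categorical attribute values defined above.

def internal_temp(celsius):
    """InternalTemp: high (> 37), mid (>= 36 and <= 37), low (< 36)."""
    if celsius > 37.0:
        return "high"
    return "mid" if celsius >= 36.0 else "low"

def oxygen_sat(percent):
    """OxygenSat: excellent (>= 98), good (>= 90), fair (>= 80), poor otherwise."""
    if percent >= 98:
        return "excellent"
    if percent >= 90:
        return "good"
    return "fair" if percent >= 80 else "poor"

# An illustrative (not real) patient record with target class attached.
record = {
    "InternalTemp": internal_temp(36.4),      # "mid"
    "OxygenSat": oxygen_sat(95),              # "good"
    "Comfort": 15,                            # integer between 0 and 20
    "DischargeDecision": "General-Hospital",  # target class
}
```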
4. EXPERIMENTAL RESULTS AND EVALUATION
The input to a classification problem is a dataset of training records and each record has several
attributes. Attributes whose domains are numerical are called numerical attributes (continuous
attributes), whereas attributes whose domains are not are called categorical attributes (discrete
attributes). There is one distinguished attribute, called class label, which is a categorical attribute
with a very small domain. The remaining attributes are called predictor attributes; they are either
continuous or discrete in nature. The goal of classification is to build a concise model of the
distribution of the class label in terms of the predictor attributes [12].
Sl.No Attributes
1. InternalTemp
2. SurfaceTemp
3. OxygenSat
4. BloodPress
5. SurfTempStab
6. IntTempStab
7. BloodPressStab
8. Comfort
TABLE 1: Input Attributes
In this analysis the last column is treated as the target and the other columns are treated as
inputs. The target classes are listed in Table 2.
Sl.No Output
1. General-Hospital
2. Go Home
3. Intensive Care
TABLE 2: Target Classes
The data set contains 9 variables in total, and data sub-setting was done using all data rows.
The total number of data rows is 88, with a total weight of 88. The analysis shows that there
are no rows with missing target or weight values; 3 rows with missing predictor values were
discarded. Decision tree inducers are algorithms that automatically construct a decision tree
from a given dataset. Typically the goal is to find the optimal decision tree by minimizing the
generalization error, although other target functions can also be defined, for instance
minimizing the number of nodes or the average depth.
Target variable: Discharge Decision
Number of predictor variables: 8
Type of model: Single tree
Maximum splitting levels: 10
Type of analysis: Classification
Splitting algorithm: Gini
Category weights (priors): Data file distribution
Misclassification costs: Equal (unitary)
Variable weights: Equal
Minimum size node to split: 10
Max. categories for continuous predictors: 200
Use surrogate splitters for missing values: Yes
Always compute surrogate splitters: No
Tree pruning and validation method: Cross validation
Tree pruning criterion: Minimum cost complexity (0.00 S.E.)
Number of cross-validation folds: 10
TABLE 3: Decision Tree Parameters
The Gini splitting algorithm is used for classification trees. Each split is chosen to maximize the homogeneity of the target variable's categories within the child nodes, i.e. to maximize the reduction in impurity. The cross-validation method is used to evaluate the quality of the model. In this study, a single decision tree model was built with a maximum splitting level of 10 and equal misclassification costs for all categories. Surrogate splitters are used for the classification of rows with missing values.
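The parameters in Table 3 come from the DTREG tool the authors used [14]. As a rough, hypothetical analogue, a comparable tree can be configured in scikit-learn; the toy data below is an illustrative stand-in, not the actual post-operative dataset:

```python
# Hypothetical scikit-learn analogue of the Table 3 setup (the paper used DTREG).
# The data here is randomly generated and only mirrors the dataset's shape.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(88, 8))  # 8 categorically coded predictors, 88 rows
y = np.array(["General-Hospital"] * 63 + ["Go-Home"] * 23 + ["Intensive-Care"] * 2)

# Gini splitting, max depth 10, minimum node size to split 10 -- as in Table 3
clf = DecisionTreeClassifier(criterion="gini", max_depth=10,
                             min_samples_split=10, random_state=0)

# Plain (non-stratified) 10-fold CV: the Intensive-Care class has only 2 rows
cv = KFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv)
print("mean CV accuracy: %.3f" % scores.mean())
```

Note that with only 2 Intensive-Care rows, stratified folds are impossible, which is one reason that class is never predicted correctly in the paper's own results.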
There are various top-down decision tree inducers, such as ID3 [13], C4.5 [7] and CART [8]. Some consist of two conceptual phases, growing and pruning (C4.5 and CART); other inducers perform only the growing phase. In most cases the discrete splitting functions are univariate, meaning that an internal node is split according to the value of a single attribute. Consequently, the inducer searches for the best attribute upon which to split. The various univariate criteria can be characterized in different ways:
• According to the origin of the measure: information theory, dependence, and distance.
• According to the measure structure: impurity-based criteria, normalized impurity-based criteria and binary criteria.
The Gini index is an impurity-based criterion that measures the divergence between the probability distributions of the target attribute's values. In this work the Gini index has been used; for a target attribute y over a set of records S it is defined as

Gini(y, S) = 1 - \sum_{c_j \in dom(y)} \left( \frac{|\sigma_{y = c_j} S|}{|S|} \right)^2    (1)

Consequently, the evaluation criterion for selecting the attribute a_i is defined as:

GiniGain(a_i, S) = Gini(y, S) - \sum_{v_{i,j} \in dom(a_i)} \frac{|\sigma_{a_i = v_{i,j}} S|}{|S|} \cdot Gini(y, \sigma_{a_i = v_{i,j}} S)    (2)
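As a concrete illustration of equations (1) and (2), a minimal pure-Python sketch of the Gini impurity and the Gini gain of a categorical split; the toy labels below are invented for the example:

```python
# Gini impurity (eq. 1) and Gini gain of a categorical split (eq. 2).
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_gain(attr_values, labels):
    """Impurity reduction from partitioning `labels` by `attr_values`."""
    n = len(labels)
    groups = {}
    for v, y in zip(attr_values, labels):
        groups.setdefault(v, []).append(y)
    weighted = sum(len(g) / n * gini(g) for g in groups.values())
    return gini(labels) - weighted

# Toy example: a perfectly separating attribute removes all impurity.
labels = ["Go-Home", "Go-Home", "General-Hospital", "General-Hospital"]
print(gini(labels))                                       # 0.5
print(gini_gain(["low", "low", "high", "high"], labels))  # 0.5
```

The inducer evaluates this gain for every candidate attribute and splits on the one with the largest value.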
The summary of variables section displays information about each variable present in the input dataset. The first column shows the name of the variable; the second shows how the variable was used (the possibilities are Target, Predictor, Weight and Unused); the third shows whether the variable is categorical or continuous; the fourth shows how many data rows had missing values for the variable; and the fifth shows how many categories (discrete values) the variable has, as shown in Table 4.
No  Variable            Class      Type         Missing rows  Categories
1   InternalTemp        Predictor  Categorical  0             3
2   SurfaceTemp         Predictor  Categorical  0             3
3   OxygenSat           Predictor  Categorical  0             2
4   BloodPress          Predictor  Categorical  0             3
5   SurfaceTempStab     Predictor  Categorical  0             2
6   IntTempStab         Predictor  Categorical  0             3
7   BloodPressStab      Predictor  Categorical  0             3
8   Comfort             Predictor  Continuous   3             4
9   Discharge Decision  Target     Categorical  0             3
TABLE 4: Summary of variables

In Table 4, Discharge Decision is the target variable, whose values are to be modeled and predicted by the other variables; there must be one and only one target variable in a model. Variables such as InternalTemp, SurfaceTemp and OxygenSat are predictor variables, whose values are used to predict the value of the target variable. In this study 8 predictor variables and 1 target variable are used to build the model. All variables except Comfort are categorical.

Maximum depth of the tree: 9
Total number of group splits: 16
Number of terminal (leaf) nodes: 9 (the minimum validation relative error occurs with 9 nodes)
Relative error value: 1.1125
Standard error: 0.1649
The tree will be pruned from 28 to 7 nodes.
TABLE 5: Model Size - Summary Report

Table 5 displays information about the maximum-size tree that was built, and it shows summary information about the parameters that were used to prune the tree.

Nodes  Val cost  Val std err  RS cost  Complexity
9      1.1125    0.1649       0.7200   0.000000
1      1.0000    0.0000       1.0000   0.009943
TABLE 6: Validation Statistics
Table 6 displays information about the size of the generated tree and statistics used to prune the
tree. There are five columns in the table:
Nodes – This is the number of terminal nodes in a particular pruned version of the tree. It will
range from 1 up to the maximum nodes in the largest tree that was generated. The maximum
number of nodes will be limited by the maximum depth of the tree and the minimum node size
allowed to be split on the Design property page for the model.
Val cost – This is the validation cost of the tree pruned to the reported number of nodes. It is the
error cost computed using either cross-validation or the random-row-holdback data. The
displayed cost value is the cost relative to the cost for a tree with one node.
The validation cost is the best measure of how well the tree will fit an independent dataset
different from the learning dataset.
Val std. err. – This is the standard error of the validation cost value.
RS cost – This is the re-substitution cost computed by running the learning dataset through the
tree. The displayed re-substitution cost is scaled relative to the cost for a tree with one node.
Since the tree is being evaluated by the same data that was used to build it, the re-substitution
cost does not give an honest estimate of the predictive accuracy of the tree for other data.
Complexity – This is a Cost Complexity measure that shows the relative tradeoff between the
number of terminal nodes and the misclassification cost.
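Minimum cost-complexity pruning, the criterion reported in Table 5, is also exposed by scikit-learn. A sketch on stand-in data (not the paper's dataset) shows how increasing the complexity parameter trades terminal nodes for misclassification cost:

```python
# Minimum cost-complexity pruning: larger alpha -> smaller pruned subtree.
# The data is randomly generated, only for illustration.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(88, 8)).astype(float)
y = rng.integers(0, 3, size=88)

# The pruning path lists the effective alphas at which nodes are removed
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)
trees = [DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X, y)
         for a in path.ccp_alphas]
print([t.get_n_leaves() for t in trees])  # leaf counts shrink as alpha grows
```

The last alpha on the path always yields the trivial one-node tree, corresponding to the "1 node" row of Table 6.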
Category          Actual Count  Actual Weight  Misclassified Count  Misclassified Weight  Percent  Cost
General-Hospital  63            63             1                    1                     1.587    0.016
Go-Home           23            23             15                   15                    65.217   0.652
Intensive-Care    2             2              2                    2                     100.00   1.000
Total             88            88             18                   18                    20.455   0.205
TABLE 7: Misclassification Table - Training Data
Table 7 shows the misclassifications for the training dataset: the tree misclassified 1 General-Hospital row, 15 Go-Home rows and 2 Intensive-Care rows. The misclassification cost for Intensive-Care is the highest.
Category          Actual Count  Actual Weight  Misclassified Count  Misclassified Weight  Percent  Cost
General-Hospital  63            63             8                    8                     12.698   0.127
Go-Home           23            23             18                   18                    78.261   0.783
Intensive-Care    2             2              2                    2                     100.00   1.000
Total             88            88             28                   28                    31.818   0.318
TABLE 8: Misclassification Table - Validation Data
Table 8 shows the misclassifications for the validation dataset and total misclassification percent
is 31.818%. A Confusion Matrix provides detailed information about how data rows are classified
by the model. The matrix has a row and column for each category of the target variable. The
categories shown in the first column are the actual categories of the target variable. The
categories shown across the top of the table are the predicted categories. The numbers in the
cells are the weights of the data rows with the actual category of the row and the predicted
category of the column. Table 9 & 10 shows confusion matrix for training data and for validation
data.
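The construction described above can be sketched in a few lines; the labels below are illustrative, not rows from the study:

```python
# Building a confusion matrix: rows are actual categories, columns predicted.
from collections import Counter

actual    = ["General-Hospital", "Go-Home", "Go-Home", "Intensive-Care"]
predicted = ["General-Hospital", "General-Hospital", "Go-Home", "General-Hospital"]

classes = ["General-Hospital", "Go-Home", "Intensive-Care"]
counts = Counter(zip(actual, predicted))       # (actual, predicted) pair counts
matrix = [[counts[(a, p)] for p in classes] for a in classes]
for cls, row in zip(classes, matrix):
    print(cls, row)
```

With unit row weights, as here, the cell values are simply counts; with non-unit weights each pair would contribute its row weight instead.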
Actual Category   Predicted: General-Hospital  Go-Home  Intensive-Care
General-Hospital  62                           1        0
Go-Home           15                           8        0
Intensive-Care    2                            0        0
TABLE 9: Confusion Matrix - Training Data
The numbers in the diagonal cells are the weights for the correctly classified cases where the
actual category matches the predicted category. The off-diagonal cells have misclassified row
weights.
Actual Category   Predicted: General-Hospital  Go-Home  Intensive-Care
General-Hospital  58                           8        0
Go-Home           18                           5        0
Intensive-Care    1                            1        0
TABLE 10: Confusion Matrix - Validation Data
A. Lift/Gain for Training Data
The lift and gain table is a useful tool for measuring the value of a predictive model. Lift and gain
values are especially useful when a model is being used to target (prioritize) marketing efforts.
The following three tables show Lift/Gain for the three outputs: General-Hospital, Go-Home, and Intensive-Care.
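A sketch of how the lift/gain columns in the tables below are computed, assuming rows are ranked by the model's score for the target category and divided into 10 bins; the scores and labels here are simulated, not taken from the study:

```python
# Gain per bin = (cumulative % of the target class captured) /
#                (cumulative % of the population covered).
import random

random.seed(1)
# (model score, is the row actually in the target class?) -- simulated
rows = [(random.random(), random.random() < 0.3) for _ in range(100)]
rows.sort(key=lambda r: r[0], reverse=True)        # best-scored rows first

n_pos = sum(is_pos for _, is_pos in rows)
bin_size = len(rows) // 10
cum_pos = 0
for i in range(10):
    chunk = rows[i * bin_size:(i + 1) * bin_size]
    cum_pos += sum(is_pos for _, is_pos in chunk)
    pct_pop = (i + 1) * bin_size / len(rows) * 100  # Cum % Population
    pct_cls = cum_pos / n_pos * 100                 # Cum % of Class
    print(f"bin {i + 1}: gain = {pct_cls / pct_pop:.2f}")
```

A gain above 1 in the top bins means the model concentrates the target category better than random selection; by construction the cumulative gain over all ten bins is exactly 1.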
Lift/Gain for Discharge Decision = General-Hospital
Bin  Class % of Bin  Cum % Pop.  Cum % of Class  Gain  % of Pop.  % of Class  Lift
1 88.89 10.23 12.70 1.24 10.23 12.70 1.24
2 66.67 20.45 22.22 1.09 10.23 9.52 0.93
3 77.78 30.68 33.33 1.09 10.23 11.11 1.09
4 88.89 40.91 46.03 1.13 10.23 12.70 1.24
5 77.78 51.14 57.14 1.12 10.23 11.11 1.09
6 77.78 61.36 68.25 1.11 10.23 11.11 1.09
7 100.00 71.59 82.54 1.15 10.23 14.29 1.40
8 66.67 81.82 92.06 1.13 10.23 9.52 0.93
9 44.44 92.05 98.41 1.07 10.23 6.35 0.62
10 14.29 100.00 100.00 1.00 7.95 1.59 0.20
TABLE 11: Lift and Gain for training data (General-Hospital)
Average gain = 1.112
Percent of cases with Discharge Decision = General-Hospital: 71.59%
Lift/Gain for Discharge Decision = Go-Home
Bin  Class % of Bin  Cum % Pop.  Cum % of Class  Gain  % of Pop.  % of Class  Lift
1 88.89 10.23 34.78 3.40 10.23 34.78 3.40
2 0.00 20.45 34.78 1.70 10.23 0.00 0.00
3 33.33 30.68 47.83 1.56 10.23 13.04 1.28
4 11.11 40.91 52.17 1.28 10.23 4.35 0.43
5 22.22 51.14 60.87 1.19 10.23 8.70 0.85
6 22.22 61.36 69.57 1.13 10.23 8.70 0.85
7 22.22 71.59 78.26 1.09 10.23 8.70 0.85
8 22.22 81.82 86.96 1.06 10.23 8.70 0.85
9 11.11 92.05 91.30 0.99 10.23 4.35 0.43
10 28.57 100.00 100.00 1.00 7.95 8.70 1.09
TABLE 12: Lift and Gain for training data (Go-Home)
Average gain = 1.441
Percent of cases with Discharge Decision = Go-Home: 26.14%
Lift/Gain for Discharge Decision = Intensive-Care
Bin  Class % of Bin  Cum % Pop.  Cum % of Class  Gain  % of Pop.  % of Class  Lift
1 2.27 10.00 10.00 1.00 10.00 10.00 1.00
2 2.27 20.00 20.00 1.00 10.00 10.00 1.00
3 2.27 30.00 30.00 1.00 10.00 10.00 1.00
4 2.27 40.00 40.00 1.00 10.00 10.00 1.00
5 2.27 50.00 50.00 1.00 10.00 10.00 1.00
6 2.27 60.00 60.00 1.00 10.00 10.00 1.00
7 2.27 70.00 70.00 1.00 10.00 10.00 1.00
8 2.27 80.00 80.00 1.00 10.00 10.00 1.00
9 2.27 90.00 90.00 1.00 10.00 10.00 1.00
10 2.27 100.00 100.00 1.00 10.00 10.00 1.00
TABLE 13: Lift and Gain for training data (Intensive -Care)
Average gain = 1.000
Percent of cases with Discharge Decision = Intensive-Care: 2.27%
Lift and Gain for validation data
Lift/Gain for Discharge Decision = General-Hospital
Bin  Class % of Bin  Cum % Pop.  Cum % of Class  Gain  % of Pop.  % of Class  Lift
1 77.78 10.23 11.11 1.09 10.23 11.11 1.09
2 55.56 20.45 19.05 0.93 10.23 7.94 0.78
3 77.78 30.68 30.16 0.98 10.23 11.11 1.09
4 77.78 40.91 41.27 1.01 10.23 11.11 1.09
5 88.89 51.14 53.97 1.06 10.23 12.70 1.24
6 88.89 61.36 66.67 1.09 10.23 12.70 1.24
7 100.00 71.59 80.95 1.13 10.23 14.29 1.40
8 44.44 81.82 87.30 1.07 10.23 6.35 0.62
9 44.44 92.05 93.65 1.02 10.23 6.35 0.62
10 57.14 100.00 100.00 1.00 7.95 6.35 0.80
TABLE 14: Lift and Gain for Validation data (General Hospital)
Average gain = 1.037
Percent of cases with Discharge Decision = General-Hospital: 71.59%
Lift/Gain for Discharge Decision = Go-Home
Bin  Class % of Bin  Cum % Pop.  Cum % of Class  Gain  % of Pop.  % of Class  Lift
1 22.22 10.23 8.70 0.85 10.23 8.70 0.85
2 44.44 20.45 26.09 1.28 10.23 17.39 1.70
3 44.44 30.68 43.48 1.42 10.23 17.39 1.70
4 33.33 40.91 56.52 1.38 10.23 13.04 1.28
5 22.22 51.14 65.22 1.28 10.23 8.70 0.85
6 22.22 61.36 73.91 1.20 10.23 8.70 0.85
7 11.11 71.59 78.26 1.09 10.23 4.35 0.43
8 22.22 81.82 86.96 1.06 10.23 8.70 0.85
9 22.22 92.05 95.65 1.04 10.23 8.70 0.85
10 14.29 100.00 100.00 1.00 7.95 4.35 0.55
TABLE 15: Lift and Gain for Validation data (Go-Home)
Average gain = 1.160
Percent of cases with Discharge Decision = Go-Home: 26.14%
Lift/Gain for Discharge Decision = Intensive-Care
Bin  Class % of Bin  Cum % Pop.  Cum % of Class  Gain  % of Pop.  % of Class  Lift
1 2.27 10.00 10.00 1.00 10.00 10.00 1.00
2 2.27 20.00 20.00 1.00 10.00 10.00 1.00
3 2.27 30.00 30.00 1.00 10.00 10.00 1.00
4 2.27 40.00 40.00 1.00 10.00 10.00 1.00
5 2.27 50.00 50.00 1.00 10.00 10.00 1.00
6 2.27 60.00 60.00 1.00 10.00 10.00 1.00
7 2.27 70.00 70.00 1.00 10.00 10.00 1.00
8 2.27 80.00 80.00 1.00 10.00 10.00 1.00
9 2.27 90.00 90.00 1.00 10.00 10.00 1.00
10 2.27 100.00 100.00 1.00 10.00 10.00 1.00
TABLE 16: Lift and Gain for Validation data (Intensive Care)
Average gain = 1.000
Percent of cases with Discharge Decision = Intensive-Care: 2.27%
Table 17 shows the relative importance of each predictor variable. The calculation is performed using sensitivity analysis, where the values of each variable are randomized in turn and the effect on the quality of the model is measured.
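The sensitivity analysis described above is essentially permutation importance. A minimal sketch on stand-in data (not the paper's model or dataset):

```python
# Permutation importance: shuffle one predictor at a time and measure how much
# accuracy is lost. Data and target rule here are invented for illustration.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(88, 8)).astype(float)
y = (X[:, 0] + X[:, 1] > 2).astype(int)       # target driven by two columns

clf = DecisionTreeClassifier(random_state=0).fit(X, y)
base = clf.score(X, y)                        # baseline accuracy

importance = []
for j in range(X.shape[1]):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])      # break column j's relationship
    importance.append(base - clf.score(Xp, y))  # accuracy lost = importance
print(importance)
```

In Table 17 these raw scores are rescaled so the most important variable (Comfort) is 100.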
Sl.No Variable Importance
1 Comfort 100.00
2 BloodPressStab 51.553
3 InternalTemp 34.403
4 SurfaceTemp 25.296
5 IntTempStab 20.605
6 BloodPress 20.045
TABLE 17: Overall Importance of Variables
The experimental results of the above analysis show that the decision tree based method yields accurate predictions and helps clinical experts in taking discharge decisions such as General-Hospital, Go-Home and Intensive-Care. Accuracy can be expected to improve further on larger datasets.
5. CONCLUSION
This paper has presented a decision tree classifier to determine the patient’s post-operative
recovery decisions such as General Hospital, Go-Home and Intensive-care. Figure 2 depicts the
decision tree model for the prediction of patient’s post-operative recovery area.
FIGURE 2: Decision tree for the prediction of post-operative recovery area
In Figure 2, each node represents a set of records from the original dataset. Nodes that have child nodes, such as 1, 2, 3, 4, 25, 7, 26 and 9, are called interior nodes; nodes 5, 24, 6, 27, 8, 28, 29, 18 and 19 are called leaf nodes; node 1 is the root node. The value N in each node shows the number of rows that fall into that node. The target variable is Discharge Decision. The misclassification percentage in each node is the percentage of rows in that node whose target category differs from the category assigned to the node, in other words, the percentage of rows that were misclassified. For instance, the tree makes clear that when the value of the predictor variable Comfort is <= 6, the decision is Go-Home; if Comfort is > 6, an additional split is required to classify the record. The total time consumed for the analysis is 33 ms.
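The root split described above reads off as a plain rule. A hypothetical sketch (the threshold comes from the text; the function name is invented):

```python
# The top of the tree as an explicit rule (illustrative sketch only).
def discharge_rule(comfort):
    # Comfort <= 6 resolves immediately at the root; higher values descend
    # into the subtree and require further splits on other predictors.
    if comfort <= 6:
        return "Go-Home"
    return "needs further splitting"

print(discharge_rule(5))   # Go-Home
print(discharge_rule(10))  # needs further splitting
```

This readability, i.e. that each path through the tree is a human-checkable rule, is the clinical-interpretability advantage the paper emphasizes.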
The Decision tree analysis discussed in this paper highlights the patient sub-groups and critical
values in variables assessed. Importantly the results are visually informative and often have clear
clinical interpretation about risk factors faced by these subgroups. The results show 71.59% of cases with General-Hospital, 26.14% with Go-Home and 2.27% with Intensive-Care.
6. REFERENCES
1. Banning, M. (2007). "A review of clinical decision making models and current research". Journal of Clinical Nursing, [online] available at: http://www.blackwellsynergy.com/doi/pdf/10.1111/j.1365-2702.2006.01791.x
2. T. af Klercker (1996): “Effect of Pruning of a Decision-Tree for the Ear, Nose and Throat
Realm in Primary Health Care Based on Case-Notes”. Journal of Medical Systems, 20(4):
215-226.
3. M. Zorman, V.Podgorelec, P.Kokol, M.Peterson, J. Lane (2000). “Decision tree's induction
strategies evaluated on a hard real world problem.” In: 13th IEEE symposium on computer-
based medical systems 22-24 June 2000, Houston, Texas, USA: proceedings, Los Alamitos,
IEEE Computer Society 19-24.
4. Shanthi D, Sahoo G, Saravanan N. (2008). “Input Feature Selection using Hybrid Neuro-
Genetic Approach in the diagnosis of Stroke.” International Journal of Computer Science and
Network Security, ISSN: 1738-7906, 8(12):99-107.
5. F.S. Khan, R.M. Anwer, O. Torgersson and G. Falkman (2009). "Data Mining in Oral Medicine Using Decision Trees". International Journal of Biological and Medical Sciences 4:3.
6. Shanthi D, Sahoo G, Saravanan N. (2009), ”Comparison of Neural Network Training
Algorithms for the prediction of the patient’s post-operative recovery area”, Journal of
Convergence Information Technology, ISSN: 1975-9320,4(1):24-32.
7. Quinlan JR (1993) C4.5: “programs for machine learning”. California: Morgan Kaufmann
Publishers.
8. Breiman L, Friedman JH, Olshen RA, et al., (1984). “Classification and regression trees.”
Belmont Calif: Wadsworth International Group.
9. Gehrke, J., Ramakrishnan, R., & Ganti, V. (1998). "RainForest - A framework for fast decision tree construction of large datasets". Proceedings of the 24th International Conference on Very Large Data Bases, pp. 416-427.
10. Shafer, J., Agrawal, R., & Mehta, M. (1996). "SPRINT: A scalable parallel classifier for data mining." Proceedings of the 22nd International Conference on Very Large Data Bases, pp. 544-555.
11. "UCI Machine Learning Repository" [online], available at: http://www.ics.uci.edu/~mlearn/MLSummary.html. The data was originally created by Sharon Summers (School of Nursing, University of Kansas) and Linda Woolery (School of Nursing, University of Missouri) and donated by Jerzy W. Grzymala-Busse.
12. Abdel-badeeh M.Salem, Abeer M.Mahmoud (2002). “A hybrid genetic algorithm –Decision
tree classifier,” IJICIS, 2(2)
13. Quinlan, J. R. (1986). “Induction of decision trees”. Machine Learning, 1, 81–106.
14. DTREG version 9.1 “Decision Tree Software for Predictive Modeling and Forecasting”.
15. Fayyad, U., Piatetsky-Shaprio, G., Smyth, P. & Uthurusamy, R. (1996). (eds.),” Advances in
Knowledge Discovery and Data Mining”, MIT Press, Cambridge, MA
16. Hillier, F. & Lieberman, G. (2001). “Introduction to Operations Research,” McGrawHill,
Boston.
17. Abbass, H.A., Towsey M., &Finn G. (2001). C-Net: “A Method for Generating Non-
deterministic and Dynamic Multivariate Decision Trees.” Knowledge and Information
Systems: An International Journal, Springer-Verlag, 5(2).