An educational institution needs to have an approximate prior knowledge of enrolled students to predict
their performance in future academics. This helps them to identify promising students and also provides
them an opportunity to pay attention to and improve those who would probably get lower grades. As a
solution, we have developed a system which can predict the performance of students from their previous
performances using concepts of data mining techniques under Classification. We have analyzed the data
set containing information about students, such as gender, marks scored in the board examinations of
classes X and XII, marks and rank in entrance examinations and results in first year of the previous batch
of students. By applying the ID3 (Iterative Dichotomiser 3) and C4.5 classification algorithms on this data,
we have predicted the general and individual performance of freshly admitted students in future
examinations.
STUDENTS’ PERFORMANCE PREDICTION SYSTEM USING MULTI AGENT DATA MINING TECHNIQUEIJDKP
A high prediction accuracy of the students’ performance is more helpful to identify the low performance students at the beginning of the learning process. Data mining is used to attain this objective. Data mining techniques are used to discover models or patterns of data, and it is much helpful in the decision-making.Boosting technique is the most popular techniques for constructing ensembles of classifier to improve the classification accuracy. Adaptive Boosting (AdaBoost) is a generation of boosting algorithm. It is used for
the binary classification and not applicable to multiclass classification directly. SAMME boosting
technique extends AdaBoost to a multiclass classification without reduce it to a set of sub-binaryclassification.In this paper, students’ performance prediction system usingMulti Agent Data Mining is proposed to predict the performance of the students based on their data with high prediction accuracy and provide helpto the low students by optimization rules.The proposed system has been implemented and evaluated by investigate the prediction accuracy ofAdaboost.M1 and LogitBoost ensemble classifiers methods and with C4.5 single classifier method. The results show that using SAMME Boosting technique improves the prediction accuracy and outperformed
C4.5 single classifier and LogitBoost.
A Survey on the Classification Techniques In Educational Data MiningEditor IJCATR
Due to increasing interest in data mining and educational system, educational data mining is the emerging topic for research
community. educational data mining means to extract the hidden knowledge from large repositories of data with the use of technique
and tools. educational data mining develops new methods to discover knowledge from educational database and used for decision
making in educational system. The various techniques of data mining like classification. clustering can be applied to bring out hidden
knowledge from the educational data.
In this paper, we focus on the educational data mining and classification techniques. In this study we analyze attributes for the
prediction of student's behavior and academic performance by using WEKA open source data mining tool and various classification
methods like decision trees, C4.5 algorithm, ID3 algorithm etc.
An Empirical Study of the Applications of Classification Techniques in Studen...IJERA Editor
University servers and databases store a huge amount of data including personal details, registration details, evaluation assessment, performance profiles, and many more for students and lecturers alike. main problem that faces any system administration or any users is data increasing per-second, which is stored in different type and format in the servers, learning about students from a huge amount of data including personal details, registration details, evaluation assessment, performance profiles, and many more for students and lecturers alike. Graduation and academic information in the future and maintaining structure and content of the courses according to their previous results become importance. The paper objectives are extract knowledge from incomplete data structure and what the suitable method or technique of data mining to extract knowledge from a huge amount of data about students to help the administration using technology to make a quick decision. Data mining aims to discover useful information or knowledge by using one of data mining techniques, this paper used classification technique to discover knowledge from student’s server database, where all students’ information were registered and stored. The classification task is used, the classifier tree C4.5, to predict the final academic results, grades, of students. We use classifier tree C4.5 as the method to classify the grades for the students .The data include four years period [2006-2009]. Experiment results show that classification process succeeded in training set. Thus, the predicted instances is similar to the training set, this proves the suggested classification model. Also the efficiency and effectiveness of C4.5 algorithm in predicting the academic results, grades, classification is very good. The model also can improve the efficiency of the academic results retrieving and evidently promote retrieval precision.
Using ID3 Decision Tree Algorithm to the Student Grade Analysis and Predictionijtsrd
Data mining techniques play an important role in data analysis. For the construction of a classification model which could predict performance of students, particularly for engineering branches, a decision tree algorithm associated with the data mining techniques have been used in the research. A number of factors may affect the performance of students. Data mining technology which can related to this student grade well and we also used classification algorithms prediction. In this paper, we used educational data mining to predict students final grade based on their performance. We proposed student data classification using ID3 Iterative Dichotomiser 3 Decision Tree Algorithm Khin Khin Lay | San San Nwe "Using ID3 Decision Tree Algorithm to the Student Grade Analysis and Prediction" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-5 , August 2019, URL: https://www.ijtsrd.com/papers/ijtsrd26545.pdfPaper URL: https://www.ijtsrd.com/computer-science/data-miining/26545/using-id3-decision-tree-algorithm-to-the-student-grade-analysis-and-prediction/khin-khin-lay
Correlation based feature selection (cfs) technique to predict student perfro...IJCNCJournal
Education data mining is an emerging stream which h
elps in mining academic data for solving various
types of problems. One of the problems is the selec
tion of a proper academic track. The admission of a
student in engineering college depends on many fact
ors. In this paper we have tried to implement a
classification technique to assist students in pred
icting their success in admission in an engineering
stream.We have analyzed the data set containing inf
ormation about student’s academic as well as socio-
demographic variables, with attributes such as fami
ly pressure, interest, gender, XII marks and CET ra
nk
in entrance examinations and historical data of pre
vious batch of students. Feature selection is a pro
cess
for removing irrelevant and redundant features whic
h will help improve the predictive accuracy of
classifiers. In this paper first we have used featu
re selection attribute algorithms Chi-square.InfoGa
in, and
GainRatio to predict the relevant features. Then we
have applied fast correlation base filter on given
features. Later classification is done using NBTree
, MultilayerPerceptron, NaiveBayes and Instance bas
ed
–K- nearest neighbor. Results showed reduction in c
omputational cost and time and increase in predicti
ve
accuracy for the student model
WEB-BASED DATA MINING TOOLS : PERFORMING FEEDBACK ANALYSIS AND ASSOCIATION RU...IJDKP
This paper aims to explain the web-enabled tools for educational data mining. The proposed web-based
tool developed using Asp.Net framework and php can be helpful for universities or institutions providing
the students with elective courses as well improving academic activities based on feedback collected from
students. In Asp.Net tool, association rule mining using Apriori algorithm is used whereas in php based
Feedback Analytical Tool, feedback related to faculty and institutional infrastructure is collected from
students and based on that Feedback it shows performance of faculty and institution. Using that data, it
helps management to improve in-house training skills and gains knowledge about educational trends which
is to be followed by faculty to improve the effectiveness of the course and teaching skills.
STUDENTS’ PERFORMANCE PREDICTION SYSTEM USING MULTI AGENT DATA MINING TECHNIQUEIJDKP
A high prediction accuracy of the students’ performance is more helpful to identify the low performance students at the beginning of the learning process. Data mining is used to attain this objective. Data mining techniques are used to discover models or patterns of data, and it is much helpful in the decision-making.Boosting technique is the most popular techniques for constructing ensembles of classifier to improve the classification accuracy. Adaptive Boosting (AdaBoost) is a generation of boosting algorithm. It is used for
the binary classification and not applicable to multiclass classification directly. SAMME boosting
technique extends AdaBoost to a multiclass classification without reduce it to a set of sub-binaryclassification.In this paper, students’ performance prediction system usingMulti Agent Data Mining is proposed to predict the performance of the students based on their data with high prediction accuracy and provide helpto the low students by optimization rules.The proposed system has been implemented and evaluated by investigate the prediction accuracy ofAdaboost.M1 and LogitBoost ensemble classifiers methods and with C4.5 single classifier method. The results show that using SAMME Boosting technique improves the prediction accuracy and outperformed
C4.5 single classifier and LogitBoost.
A Survey on the Classification Techniques In Educational Data MiningEditor IJCATR
Due to increasing interest in data mining and educational system, educational data mining is the emerging topic for research
community. educational data mining means to extract the hidden knowledge from large repositories of data with the use of technique
and tools. educational data mining develops new methods to discover knowledge from educational database and used for decision
making in educational system. The various techniques of data mining like classification. clustering can be applied to bring out hidden
knowledge from the educational data.
In this paper, we focus on the educational data mining and classification techniques. In this study we analyze attributes for the
prediction of student's behavior and academic performance by using WEKA open source data mining tool and various classification
methods like decision trees, C4.5 algorithm, ID3 algorithm etc.
An Empirical Study of the Applications of Classification Techniques in Studen...IJERA Editor
University servers and databases store a huge amount of data including personal details, registration details, evaluation assessment, performance profiles, and many more for students and lecturers alike. main problem that faces any system administration or any users is data increasing per-second, which is stored in different type and format in the servers, learning about students from a huge amount of data including personal details, registration details, evaluation assessment, performance profiles, and many more for students and lecturers alike. Graduation and academic information in the future and maintaining structure and content of the courses according to their previous results become importance. The paper objectives are extract knowledge from incomplete data structure and what the suitable method or technique of data mining to extract knowledge from a huge amount of data about students to help the administration using technology to make a quick decision. Data mining aims to discover useful information or knowledge by using one of data mining techniques, this paper used classification technique to discover knowledge from student’s server database, where all students’ information were registered and stored. The classification task is used, the classifier tree C4.5, to predict the final academic results, grades, of students. We use classifier tree C4.5 as the method to classify the grades for the students .The data include four years period [2006-2009]. Experiment results show that classification process succeeded in training set. Thus, the predicted instances is similar to the training set, this proves the suggested classification model. Also the efficiency and effectiveness of C4.5 algorithm in predicting the academic results, grades, classification is very good. The model also can improve the efficiency of the academic results retrieving and evidently promote retrieval precision.
Using ID3 Decision Tree Algorithm to the Student Grade Analysis and Predictionijtsrd
Data mining techniques play an important role in data analysis. For the construction of a classification model which could predict performance of students, particularly for engineering branches, a decision tree algorithm associated with the data mining techniques have been used in the research. A number of factors may affect the performance of students. Data mining technology which can related to this student grade well and we also used classification algorithms prediction. In this paper, we used educational data mining to predict students final grade based on their performance. We proposed student data classification using ID3 Iterative Dichotomiser 3 Decision Tree Algorithm Khin Khin Lay | San San Nwe "Using ID3 Decision Tree Algorithm to the Student Grade Analysis and Prediction" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-5 , August 2019, URL: https://www.ijtsrd.com/papers/ijtsrd26545.pdfPaper URL: https://www.ijtsrd.com/computer-science/data-miining/26545/using-id3-decision-tree-algorithm-to-the-student-grade-analysis-and-prediction/khin-khin-lay
Correlation based feature selection (cfs) technique to predict student perfro...IJCNCJournal
Education data mining is an emerging stream which h
elps in mining academic data for solving various
types of problems. One of the problems is the selec
tion of a proper academic track. The admission of a
student in engineering college depends on many fact
ors. In this paper we have tried to implement a
classification technique to assist students in pred
icting their success in admission in an engineering
stream.We have analyzed the data set containing inf
ormation about student’s academic as well as socio-
demographic variables, with attributes such as fami
ly pressure, interest, gender, XII marks and CET ra
nk
in entrance examinations and historical data of pre
vious batch of students. Feature selection is a pro
cess
for removing irrelevant and redundant features whic
h will help improve the predictive accuracy of
classifiers. In this paper first we have used featu
re selection attribute algorithms Chi-square.InfoGa
in, and
GainRatio to predict the relevant features. Then we
have applied fast correlation base filter on given
features. Later classification is done using NBTree
, MultilayerPerceptron, NaiveBayes and Instance bas
ed
–K- nearest neighbor. Results showed reduction in c
omputational cost and time and increase in predicti
ve
accuracy for the student model
WEB-BASED DATA MINING TOOLS : PERFORMING FEEDBACK ANALYSIS AND ASSOCIATION RU...IJDKP
This paper aims to explain the web-enabled tools for educational data mining. The proposed web-based
tool developed using Asp.Net framework and php can be helpful for universities or institutions providing
the students with elective courses as well improving academic activities based on feedback collected from
students. In Asp.Net tool, association rule mining using Apriori algorithm is used whereas in php based
Feedback Analytical Tool, feedback related to faculty and institutional infrastructure is collected from
students and based on that Feedback it shows performance of faculty and institution. Using that data, it
helps management to improve in-house training skills and gains knowledge about educational trends which
is to be followed by faculty to improve the effectiveness of the course and teaching skills.
DATA MINING METHODOLOGIES TO STUDY STUDENT'S ACADEMIC PERFORMANCE USING THE...ijcsa
The study placed a particular emphasis on the so ca
lled data mining algorithms, but focuses the bulk o
f
attention on the C4.5 algorithm. Each educational i
nstitution, in general, aims to present a high qual
ity of
education. This depends upon predicting the student
s with poor results prior they entering in to final
examination. Data mining techniques give many tasks
that could be used to investigate the students'
performance. The main objective of this paper is to
build a classification model that can be used to i
mprove
the students' academic records in Faculty of Mathem
atical Science and Statistics. This model has been
done using the C4.5 algorithm as it is a well-known
, commonly used data mining technique. The
importance of this study is that predicting student
performance is useful in many different settings.
Data
from the previous students' academic records in the
faculty have been used to illustrate the considere
d
algorithm in order to build our classification mode
l.
EXTRACTING USEFUL RULES THROUGH IMPROVED DECISION TREE INDUCTION USING INFORM...ijistjournal
Classification is widely used technique in the data mining domain, where scalability and efficiency are the immediate problems in classification algorithms for large databases. We suggest improvements to the existing C4.5 decision tree algorithm. In this paper attribute oriented induction (AOI) and relevance analysis are incorporated with concept hierarchy’s knowledge and HeightBalancePriority algorithm for construction of decision tree along with Multi level mining. The assignment of priorities to attributes is done by evaluating information entropy, at different levels of abstraction for building decision tree using HeightBalancePriority algorithm. Modified DMQL queries are used to understand and explore the shortcomings of the decision trees generated by C4.5 classifier for education dataset and the results are compared with the proposed approach.
Data mining or Knowledge discovery (KDD) is
extracting unknown (hidden) and useful knowledge from data.
Data mining is widely used in many areas like retail, sales, ecommerce,
remote sensing, bioinformatics etc. Student’s
performance has become one of the most complex puzzle for
universities and colleges in recent past with the tremendous
growth. In this paper, authors deployed data mining techniques
like classification, association rule, chi-square etc. for knowledge
discovery. For this study, authors have used data set containing
Approx. 180 MCA (post graduate) students results data of 3
colleges. Study found that one can apply data mining
functionalities like Chi-square, Association rule and Lift in
Education and discover areas of improvement.
A Model for Predicting Students’ Academic Performance using a Hybrid of K-mea...Editor IJCATR
Higher learning institutions nowadays operate in a more complex and competitive due to a high demand from prospective
students and an emerging increase of universities both public and private. Management of Universities face challenges and concerns of
predicting students’ academic performance in to put mechanisms in place prior enough for their improvement. This research aims at
employing Decision tree and K-means data mining algorithms to model an approach to predict the performance of students in advance
so as to devise mechanisms of alleviating student dropout rates and improve on performance. In Kenya for example, there has been
witnessed an increase student enrolling in universities since the Government started free primary education. Therefore the Government
expects an increased workforce of professionals from these institutions without compromising quality so as to achieve its millennium
development and vision 2030. Backlog of students not finishing their studies in stipulated time due to poor performance is another
issue that can be addressed from the results of this research since predicting student performance in advance will enable University
management to devise ways of assisting weak students and even make more decisions on how to select students for particular courses.
Previous studies have been done Educational Data Mining mostly focusing on factors affecting students’ performance and also used
different algorithms in predicting students’ performance. In all these researches, accuracy of prediction is key and what researchers
look forward to try and improve.
Role of education is very critical for the development of any country. So it is the responsibility of each and every person to do something for the betterment of education. Taking this fact into consideration we start working on the education system. Education system ranging from basic to higher education. Now a day education system generates a lots of data related to student. If we cannot analyze that data properly then that data is useless. With the help of data mining techniques we can find the hidden information from the data collected for the different educational setting. With the help of that information we can review our educational process or make improvement in our education system. Here in this article we are considering a case of an engineering college student and try to predict the final result in advance. The result of the prediction provides timely help to those students who are on risk of failure in the final examination. There are different techniques of data mining are available and we are using J48, RandomForest, and ADTree to predict the performance of the student in their final examination. On the basis of this predication we can make a decision whether the student will be promoted to next year or not. We the help of the result we can improve the performance of the student who are on risk of fail or promoted. After the declaration of the final result of the student, result is fed into the system and hence the result will analysed for the next semester. The comparative result shows that, prediction help in the improvement of overall result of the weaker students.
EXTRACTING USEFUL RULES THROUGH IMPROVED DECISION TREE INDUCTION USING INFORM...ijistjournal
Classification is widely used technique in the data mining domain, where scalability and efficiency are the immediate problems in classification algorithms for large databases. We suggest improvements to the existing C4.5 decision tree algorithm. In this paper attribute oriented induction (AOI) and relevance analysis are incorporated with concept hierarchy’s knowledge and HeightBalancePriority algorithm for construction of decision tree along with Multi level mining. The assignment of priorities to attributes is done by evaluating information entropy, at different levels of abstraction for building decision tree using HeightBalancePriority algorithm. Modified DMQL queries are used to understand and explore the shortcomings of the decision trees generated by C4.5 classifier for education dataset and the results are compared with the proposed approach.
Data mining referred to extracting the hidden predictive information from huge amount of data set. Recently, there are number of private institution are came into existence and they put their efforts to get fruitful admissions. In this paper, the techniques of data mining are used to analyze the mind setup of student after matriculate. One of the best tools of data mining is known as WEKA (Waikato Environment Knowledge Analysis), is used to formulate the process of analysis.
Data Mining Techniques for School Failure and Dropout SystemKumar Goud
Abstract: Data mining techniques are applied to predict college failure and bum of the student. This is method uses real data on middle-school students for prediction of failure and drop out. It implements white-box classification strategies, like induction rules and decision trees or call trees. Call tree could be a call support tool that uses tree-like graph or a model of call and their possible consequences. A call tree is a flowchart-like structure in which internal node represents a "test" on an attribute. Attribute is the real information of students that is collected from college in middle or pedagogy, each branch represents the outcome of the test and each leaf node represents a class label. The paths from root to leaf represent classification rules and it consists of three kinds of nodes which incorporates call node, likelihood node and finish node. It is specifically used in call analysis. Using this technique to boost their correctness for predicting which students might fail or dropout (idler) by first, using all the accessible attributes next, choosing the most effective attributes. Attribute choice is done by using WEKA tool.
Keywords: dataset, classification, clustering.
Association rule discovery for student performance prediction using metaheuri...csandit
According to the increase of using data mining tech
niques in improving educational systems
operations, Educational Data Mining has been introd
uced as a new and fast growing research
area. Educational Data Mining aims to analyze data
in educational environments in order to
solve educational research problems. In this paper
a new associative classification technique
has been proposed to predict students final perform
ance. Despite of several machine learning
approaches such as ANNs, SVMs, etc. associative cla
ssifiers maintain interpretability along
with high accuracy. In this research work, we have
employed Honeybee Colony Optimization
and Particle Swarm Optimization to extract associat
ion rule for student performance prediction
as a multi-objective classification problem. Result
s indicate that the proposed swarm based
algorithm outperforms well-known classification tec
hniques on student performance prediction
classification problem.
Prediction of student performance has become an essential issue for improving the educational system. However, this has turned to be a challenging task due to the huge quantity of data in the educational environment. Educational data mining is an emerging field that aims to develop techniques to manipulate and explore the sizable educational data. Classification is one of the primary approaches of the educational data mining methods that is the most widely used for predicting student performance and characteristics. In this work, three linear classification techniques; logistic regression, support vector machines (SVM), and stochastic gradient descent (SGD), and three nonlinear classification methods; decision tree, random forest and adaptive boosting (AdaBoost) are explored and evaluated on a dataset of ASSISTment system. A k-fold cross validation method is used to evaluate the implemented techniques. The results demonstrate that decision tree algorithm outperforms the other techniques, with an average accuracy of 0.7254, an average sensitivity of 0.8036 and an average specificity of 0.901. Furthermore, the importance of the utilized features is obtained and the system performance is computed using the most significant features. The results reveal that the best performance is reached using the first 80 important features with accuracy, sensitivity and specificity of 0.7252, 0.8042 and 0.9016, respectively.
With the emergence of virtualization and cloud computing technologies, several services are housed on virtualization platform. Virtualization is the technology that many cloud service providers rely on for efficient management and coordination of the resource pool. As essential services are also housed on cloud platform, it is necessary to ensure continuous availability by implementing all necessary measures. Windows Active Directory is one such service that Microsoft developed for Windows domain networks. It is included in Windows Server operating systems as a set of processes and services for authentication and authorization of users and computers in a Windows domain type network. The service is required to run continuously without downtime. As a result, there are chances of accumulation of errors or garbage leading to software aging which in turn may lead to system failure and associated consequences. This results in software aging. In this work, software aging patterns of Windows active directory service is studied. Software aging of active directory needs to be predicted properly so that rejuvenation can be triggered to ensure continuous service delivery. In order to predict the accurate time, a model that uses time series forecasting technique is built.
Distributed Digital Artifacts on the Semantic WebEditor IJCATR
Distributed digital artifacts incorporate cryptographic hash values to URI called trusty URIs in a distributed environment
building good in quality, verifiable and unchangeable web resources to prevent the rising man in the middle attack. The greatest
challenge of a centralized system is that it gives users no possibility to check whether data have been modified and the communication
is limited to a single server. As a solution for this, is the distributed digital artifact system, where resources are distributed among
different domains to enable inter-domain communication. Due to the emerging developments in web, attacks have increased rapidly,
among which man in the middle attack (MIMA) is a serious issue, where user security is at its threat. This work tries to prevent MIMA
to an extent, by providing self reference and trusty URIs even when presented in a distributed environment. Any manipulation to the
data is efficiently identified and any further access to that data is blocked by informing user that the uniform location has been
changed. System uses self-reference to contain trusty URI for each resource, lineage algorithm for generating seed and SHA-512 hash
generation algorithm to ensure security. It is implemented on the semantic web, which is an extension to the world wide web, using
RDF (Resource Description Framework) to identify the resource. Hence the framework was developed to overcome existing
challenges by making the digital artifacts on the semantic web distributed to enable communication between different domains across
the network securely and thereby preventing MIMA.
A Decision tree is termed as good DT when it has small size and when new data is introduced it can be classified accurately. Pre-processing the input data is one of the good approaches for generating a good DT. When different data pre-processing methods are used with the combination of DT classifier it evaluates to give high performance. This paper involves the accuracy variation in the ID3 classifier when used in combination with different data pre-processing and feature selection method. The performances of DTs are produced from comparison of original and pre-processed input data and experimental results are shown by using standard decision tree algorithm-ID3 on a dataset.
CLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEYEditor IJMTER
Data mining environment produces a large amount of data, that need to be
analyses, pattern have to be extracted from that to gain knowledge. In this new period with
rumble of data both ordered and unordered, by using traditional databases and architectures, it
has become difficult to process, manage and analyses patterns. To gain knowledge about the
Big Data a proper architecture should be understood. Classification is an important data mining
technique with broad applications to classify the various kinds of data used in nearly every
field of our life. Classification is used to classify the item according to the features of the item
with respect to the predefined set of classes. This paper provides an inclusive survey of
different classification algorithms and put a light on various classification algorithms including
j48, C4.5, k-nearest neighbor classifier, Naive Bayes, SVM etc., using random concept.
SURVEY ON CLASSIFICATION ALGORITHMS USING BIG DATASETEditor IJMTER
Data mining environment produces a large amount of data that need to be analyzed.
Using traditional databases and architectures, it has become difficult to process, manage and analyze
patterns. To gain knowledge about the Big Data a proper architecture should be understood.
Classification is an important data mining technique with broad applications to classify the various
kinds of data used in nearly every field of our life. Classification is used to classify the item
according to the features of the item with respect to the predefined set of classes. This paper put a
light on various classification algorithms including j48, C4.5, Naive Bayes using large dataset.
Advanced Computational Intelligence: An International Journal (ACII)aciijournal
Today, enormous amount of data is collected in medical databases. These databases may contain valuable
information encapsulated in nontrivial relationships among symptoms and diagnoses. Extracting such
dependencies from historical data is much easier to done by using medical systems. Such knowledge can be
used in future medical decision making. In this paper, a new algorithm based on C4.5 to mind data for
medince applications proposed and then it is evaluated against two datasets and C4.5 algorithm in terms of
accuracy.
A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINEaciijournal
Today, enormous amount of data is collected in medical databases. These databases may contain valuable
information encapsulated in nontrivial relationships among symptoms and diagnoses. Extracting such
dependencies from historical data is much easier to done by using medical systems. Such knowledge can be
used in future medical decision making. In this paper, a new algorithm based on C4.5 to mind data for
medince applications proposed and then it is evaluated against two datasets and C4.5 algorithm in terms of
accuracy.
DATA MINING METHODOLOGIES TO STUDY STUDENT'S ACADEMIC PERFORMANCE USING THE...ijcsa
The study placed a particular emphasis on the so ca
lled data mining algorithms, but focuses the bulk o
f
attention on the C4.5 algorithm. Each educational i
nstitution, in general, aims to present a high qual
ity of
education. This depends upon predicting the student
s with poor results prior they entering in to final
examination. Data mining techniques give many tasks
that could be used to investigate the students'
performance. The main objective of this paper is to
build a classification model that can be used to i
mprove
the students' academic records in Faculty of Mathem
atical Science and Statistics. This model has been
done using the C4.5 algorithm as it is a well-known
, commonly used data mining technique. The
importance of this study is that predicting student
performance is useful in many different settings.
Data
from the previous students' academic records in the
faculty have been used to illustrate the considere
d
algorithm in order to build our classification mode
l.
EXTRACTING USEFUL RULES THROUGH IMPROVED DECISION TREE INDUCTION USING INFORM...ijistjournal
Classification is widely used technique in the data mining domain, where scalability and efficiency are the immediate problems in classification algorithms for large databases. We suggest improvements to the existing C4.5 decision tree algorithm. In this paper attribute oriented induction (AOI) and relevance analysis are incorporated with concept hierarchy’s knowledge and HeightBalancePriority algorithm for construction of decision tree along with Multi level mining. The assignment of priorities to attributes is done by evaluating information entropy, at different levels of abstraction for building decision tree using HeightBalancePriority algorithm. Modified DMQL queries are used to understand and explore the shortcomings of the decision trees generated by C4.5 classifier for education dataset and the results are compared with the proposed approach.
Data mining or Knowledge discovery (KDD) is
extracting unknown (hidden) and useful knowledge from data.
Data mining is widely used in many areas like retail, sales, ecommerce,
remote sensing, bioinformatics etc. Student’s
performance has become one of the most complex puzzle for
universities and colleges in recent past with the tremendous
growth. In this paper, authors deployed data mining techniques
like classification, association rule, chi-square etc. for knowledge
discovery. For this study, authors have used data set containing
Approx. 180 MCA (post graduate) students results data of 3
colleges. Study found that one can apply data mining
functionalities like Chi-square, Association rule and Lift in
Education and discover areas of improvement.
A Model for Predicting Students’ Academic Performance using a Hybrid of K-mea...Editor IJCATR
Higher learning institutions nowadays operate in a more complex and competitive due to a high demand from prospective
students and an emerging increase of universities both public and private. Management of Universities face challenges and concerns of
predicting students’ academic performance in to put mechanisms in place prior enough for their improvement. This research aims at
employing Decision tree and K-means data mining algorithms to model an approach to predict the performance of students in advance
so as to devise mechanisms of alleviating student dropout rates and improve on performance. In Kenya for example, there has been
witnessed an increase student enrolling in universities since the Government started free primary education. Therefore the Government
expects an increased workforce of professionals from these institutions without compromising quality so as to achieve its millennium
development and vision 2030. Backlog of students not finishing their studies in stipulated time due to poor performance is another
issue that can be addressed from the results of this research since predicting student performance in advance will enable University
management to devise ways of assisting weak students and even make more decisions on how to select students for particular courses.
Previous studies have been done Educational Data Mining mostly focusing on factors affecting students’ performance and also used
different algorithms in predicting students’ performance. In all these researches, accuracy of prediction is key and what researchers
look forward to try and improve.
Role of education is very critical for the development of any country. So it is the responsibility of each and every person to do something for the betterment of education. Taking this fact into consideration we start working on the education system. Education system ranging from basic to higher education. Now a day education system generates a lots of data related to student. If we cannot analyze that data properly then that data is useless. With the help of data mining techniques we can find the hidden information from the data collected for the different educational setting. With the help of that information we can review our educational process or make improvement in our education system. Here in this article we are considering a case of an engineering college student and try to predict the final result in advance. The result of the prediction provides timely help to those students who are on risk of failure in the final examination. There are different techniques of data mining are available and we are using J48, RandomForest, and ADTree to predict the performance of the student in their final examination. On the basis of this predication we can make a decision whether the student will be promoted to next year or not. We the help of the result we can improve the performance of the student who are on risk of fail or promoted. After the declaration of the final result of the student, result is fed into the system and hence the result will analysed for the next semester. The comparative result shows that, prediction help in the improvement of overall result of the weaker students.
EXTRACTING USEFUL RULES THROUGH IMPROVED DECISION TREE INDUCTION USING INFORM...ijistjournal
Classification is widely used technique in the data mining domain, where scalability and efficiency are the immediate problems in classification algorithms for large databases. We suggest improvements to the existing C4.5 decision tree algorithm. In this paper attribute oriented induction (AOI) and relevance analysis are incorporated with concept hierarchy’s knowledge and HeightBalancePriority algorithm for construction of decision tree along with Multi level mining. The assignment of priorities to attributes is done by evaluating information entropy, at different levels of abstraction for building decision tree using HeightBalancePriority algorithm. Modified DMQL queries are used to understand and explore the shortcomings of the decision trees generated by C4.5 classifier for education dataset and the results are compared with the proposed approach.
Data mining referred to extracting the hidden predictive information from huge amount of data set. Recently, there are number of private institution are came into existence and they put their efforts to get fruitful admissions. In this paper, the techniques of data mining are used to analyze the mind setup of student after matriculate. One of the best tools of data mining is known as WEKA (Waikato Environment Knowledge Analysis), is used to formulate the process of analysis.
Data Mining Techniques for School Failure and Dropout SystemKumar Goud
Abstract: Data mining techniques are applied to predict college failure and bum of the student. This is method uses real data on middle-school students for prediction of failure and drop out. It implements white-box classification strategies, like induction rules and decision trees or call trees. Call tree could be a call support tool that uses tree-like graph or a model of call and their possible consequences. A call tree is a flowchart-like structure in which internal node represents a "test" on an attribute. Attribute is the real information of students that is collected from college in middle or pedagogy, each branch represents the outcome of the test and each leaf node represents a class label. The paths from root to leaf represent classification rules and it consists of three kinds of nodes which incorporates call node, likelihood node and finish node. It is specifically used in call analysis. Using this technique to boost their correctness for predicting which students might fail or dropout (idler) by first, using all the accessible attributes next, choosing the most effective attributes. Attribute choice is done by using WEKA tool.
Keywords: dataset, classification, clustering.
Association rule discovery for student performance prediction using metaheuri...csandit
According to the increase of using data mining tech
niques in improving educational systems
operations, Educational Data Mining has been introd
uced as a new and fast growing research
area. Educational Data Mining aims to analyze data
in educational environments in order to
solve educational research problems. In this paper
a new associative classification technique
has been proposed to predict students final perform
ance. Despite of several machine learning
approaches such as ANNs, SVMs, etc. associative cla
ssifiers maintain interpretability along
with high accuracy. In this research work, we have
employed Honeybee Colony Optimization
and Particle Swarm Optimization to extract associat
ion rule for student performance prediction
as a multi-objective classification problem. Result
s indicate that the proposed swarm based
algorithm outperforms well-known classification tec
hniques on student performance prediction
classification problem.
Prediction of student performance has become an essential issue for improving the educational system. However, this has turned to be a challenging task due to the huge quantity of data in the educational environment. Educational data mining is an emerging field that aims to develop techniques to manipulate and explore the sizable educational data. Classification is one of the primary approaches of the educational data mining methods that is the most widely used for predicting student performance and characteristics. In this work, three linear classification techniques; logistic regression, support vector machines (SVM), and stochastic gradient descent (SGD), and three nonlinear classification methods; decision tree, random forest and adaptive boosting (AdaBoost) are explored and evaluated on a dataset of ASSISTment system. A k-fold cross validation method is used to evaluate the implemented techniques. The results demonstrate that decision tree algorithm outperforms the other techniques, with an average accuracy of 0.7254, an average sensitivity of 0.8036 and an average specificity of 0.901. Furthermore, the importance of the utilized features is obtained and the system performance is computed using the most significant features. The results reveal that the best performance is reached using the first 80 important features with accuracy, sensitivity and specificity of 0.7252, 0.8042 and 0.9016, respectively.
With the emergence of virtualization and cloud computing technologies, several services are housed on virtualization platform. Virtualization is the technology that many cloud service providers rely on for efficient management and coordination of the resource pool. As essential services are also housed on cloud platform, it is necessary to ensure continuous availability by implementing all necessary measures. Windows Active Directory is one such service that Microsoft developed for Windows domain networks. It is included in Windows Server operating systems as a set of processes and services for authentication and authorization of users and computers in a Windows domain type network. The service is required to run continuously without downtime. As a result, there are chances of accumulation of errors or garbage leading to software aging which in turn may lead to system failure and associated consequences. This results in software aging. In this work, software aging patterns of Windows active directory service is studied. Software aging of active directory needs to be predicted properly so that rejuvenation can be triggered to ensure continuous service delivery. In order to predict the accurate time, a model that uses time series forecasting technique is built.
Distributed Digital Artifacts on the Semantic WebEditor IJCATR
Distributed digital artifacts incorporate cryptographic hash values to URI called trusty URIs in a distributed environment
building good in quality, verifiable and unchangeable web resources to prevent the rising man in the middle attack. The greatest
challenge of a centralized system is that it gives users no possibility to check whether data have been modified and the communication
is limited to a single server. As a solution for this, is the distributed digital artifact system, where resources are distributed among
different domains to enable inter-domain communication. Due to the emerging developments in web, attacks have increased rapidly,
among which man in the middle attack (MIMA) is a serious issue, where user security is at its threat. This work tries to prevent MIMA
to an extent, by providing self reference and trusty URIs even when presented in a distributed environment. Any manipulation to the
data is efficiently identified and any further access to that data is blocked by informing user that the uniform location has been
changed. System uses self-reference to contain trusty URI for each resource, lineage algorithm for generating seed and SHA-512 hash
generation algorithm to ensure security. It is implemented on the semantic web, which is an extension to the world wide web, using
RDF (Resource Description Framework) to identify the resource. Hence the framework was developed to overcome existing
challenges by making the digital artifacts on the semantic web distributed to enable communication between different domains across
the network securely and thereby preventing MIMA.
A Decision tree is termed as good DT when it has small size and when new data is introduced it can be classified accurately. Pre-processing the input data is one of the good approaches for generating a good DT. When different data pre-processing methods are used with the combination of DT classifier it evaluates to give high performance. This paper involves the accuracy variation in the ID3 classifier when used in combination with different data pre-processing and feature selection method. The performances of DTs are produced from comparison of original and pre-processed input data and experimental results are shown by using standard decision tree algorithm-ID3 on a dataset.
CLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEYEditor IJMTER
Data mining environment produces a large amount of data, that need to be
analyses, pattern have to be extracted from that to gain knowledge. In this new period with
rumble of data both ordered and unordered, by using traditional databases and architectures, it
has become difficult to process, manage and analyses patterns. To gain knowledge about the
Big Data a proper architecture should be understood. Classification is an important data mining
technique with broad applications to classify the various kinds of data used in nearly every
field of our life. Classification is used to classify the item according to the features of the item
with respect to the predefined set of classes. This paper provides an inclusive survey of
different classification algorithms and put a light on various classification algorithms including
j48, C4.5, k-nearest neighbor classifier, Naive Bayes, SVM etc., using random concept.
SURVEY ON CLASSIFICATION ALGORITHMS USING BIG DATASETEditor IJMTER
Data mining environment produces a large amount of data that need to be analyzed.
Using traditional databases and architectures, it has become difficult to process, manage and analyze
patterns. To gain knowledge about the Big Data a proper architecture should be understood.
Classification is an important data mining technique with broad applications to classify the various
kinds of data used in nearly every field of our life. Classification is used to classify the item
according to the features of the item with respect to the predefined set of classes. This paper put a
light on various classification algorithms including j48, C4.5, Naive Bayes using large dataset.
Advanced Computational Intelligence: An International Journal (ACII)aciijournal
Today, enormous amount of data is collected in medical databases. These databases may contain valuable
information encapsulated in nontrivial relationships among symptoms and diagnoses. Extracting such
dependencies from historical data is much easier to done by using medical systems. Such knowledge can be
used in future medical decision making. In this paper, a new algorithm based on C4.5 to mind data for
medince applications proposed and then it is evaluated against two datasets and C4.5 algorithm in terms of
accuracy.
A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINEaciijournal
Today, enormous amount of data is collected in medical databases. These databases may contain valuable
information encapsulated in nontrivial relationships among symptoms and diagnoses. Extracting such
dependencies from historical data is much easier to done by using medical systems. Such knowledge can be
used in future medical decision making. In this paper, a new algorithm based on C4.5 to mind data for
medince applications proposed and then it is evaluated against two datasets and C4.5 algorithm in terms of
accuracy.
Perfomance Comparison of Decsion Tree Algorithms to Findout the Reason for St...ijcnes
Educational data mining is used to study the data available in the educational field and bring out the hidden knowledge from it. Classification methods like decision trees, rule mining can be applied on the educational data for predicting the students behavior. This paper focuses on finding thesuitablealgorithm which yields the best result to find out the reason behind students absenteeism in an academic year. The first step in this processis to gather students data by using questionnaire.The datais collected from 123 under graduate students from a private college which is situated in a semirural area. The second step is to clean the data which is appropriate for mining purpose and choose the relevant attributes. In the final step, three different Decision tree induction algorithms namely, ID3(Iterative Dichotomiser), C4.5 and CART(Classification and Regression Tree)were applied for comparison of results for the same data sample collected using questionnaire. The results were compared to find the algorithm which yields the best result in predicting the reason for student s absenteeism.
Efficient classification of big data using vfdt (very fast decision tree)eSAT Journals
Abstract
Decision Tree learning algorithms have been able to capture knowledge successfully. Decision Trees are best considered when
instances are described by attribute-value pairs and when the target function has a discrete value. The main task of these
decision trees is to use inductive methods to the given values of attributes of an unknown object and determine an
appropriate classification by applying decision tree rules. Decision Trees are very effective forms to evaluate the performance
and represent the algorithms because of their robustness, simplicity, capability of handling numerical and categorical data,
ability to work with large datasets and comprehensibility to a name a few. There are various decision tree algorithms available
like ID3, CART, C4.5, VFDT, QUEST, CTREE, GUIDE, CHAID, CRUISE, etc. In this paper a comparative study on three of
these popular decision tree algorithms - (Iterative Dichotomizer 3), C4.5 which is an evolution of ID3 and VFDT (Very
Fast Decision Tree has been made. An empirical study has been conducted to compare C4.5 and VFDT in terms of accuracy
and execution time and various conclusions have been drawn.
Key Words: Decision tree, ID3, C4.5, VFDT, Information Gain, Gain Ratio, Gini Index, Over−fitting.
Evaluating the efficiency of rule techniques for file classificationeSAT Journals
Abstract Text mining refers to the process of deriving high quality information from text. It is also known as knowledge discovery from text (KDT), deals with the machine supported analysis of text. It is used in various areas such as information retrieval, marketing, information extraction, natural language processing, document similarity, and so on. Document Similarity is one of the important techniques in text mining. In document similarity, the first and foremost step is to classify the files based on their category. In this research work, various classification rule techniques are used to classify the computer files based on their extensions. For example, the extension of computer files may be pdf, doc, ppt, xls, and so on. There are several algorithms for rule classifier such as decision table, JRip, Ridor, DTNB, NNge, PART, OneR and ZeroR. In this research work, three classification algorithms namely decision table, DTNB and OneR classifiers are used for performing classification of computer files based on their extension. The results produced by these algorithms are analyzed by using the performance factors classification accuracy and error rate. From the experimental results, DTNB proves to be more efficient than other two techniques. Index Terms: Data mining, Text mining, Classification, Decision table, DTNB, OneR
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
Study and Analysis of K-Means Clustering Algorithm Using RapidminerIJERA Editor
Institution is a place where teacher explains and student just understands and learns the lesson. Every student has his own definition for toughness and easiness and there isn’t any absolute scale for measuring knowledge but examination score indicate the performance of student. In this case study, knowledge of data mining is combined with educational strategies to improve students’ performance. Generally, data mining (sometimes called data or knowledge discovery) is the process of analysing data from different perspectives and summarizing it into useful information. Data mining software is one of a number of analytical tools for data. It allows users to analyse data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational database. Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters).This project describes the use of clustering data mining technique to improve the efficiency of academic performance in the educational institutions .In this project, a live experiment was conducted on students .By conducting an exam on students of computer science major using MOODLE(LMS) and analysing that data generated using RapidMiner(Datamining Software) and later by performing clustering on the data. This method helps to identify the students who need special advising or counselling by the teacher to give high quality of education.
Data mining referred to extracting the hidden predictive information from huge amount of data set. Recently, there are number of private institution are came into existence and they put their efforts to get fruitful admissions. In this paper, the techniques of data mining are used to analyze the mind setup of student after matriculate. One of the best tools of data mining is known as WEKA (Waikato Environment Knowledge Analysis), is used to formulate the process of analysis.
Data Mining System and Applications: A Reviewijdpsjournal
In the Information Technology era information plays vital role in every sphere of the human life. It is very important to gather data from different data sources, store and maintain the data, generate information, generate knowledge and disseminate data, information and knowledge to every stakeholder. Due to vast use of computers and electronics devices and tremendous growth in computing power and storage capacity, there is explosive growth in data collection. The storing of the data in data warehouse enables entire enterprise to access a reliable current database. To analyze this vast amount of data and drawing fruitful conclusions and inferences it needs the special tools called data mining tools. This paper gives overview of the data mining systems and some of its applications.
Analysis on different Data mining Techniques and algorithms used in IOTIJERA Editor
In this paper, we discusses about five functionalities of data mining in IOT that affects the performance and that
are: Data anomaly detection, Data clustering, Data classification, feature selection, time series prediction. Some
important algorithm has also been reviewed here of each functionalities that show advantages and limitations as
well as some new algorithm that are in research direction. Here we had represent knowledge view of data
mining in IOT.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
AN EFFICIENT FEATURE SELECTION IN CLASSIFICATION OF AUDIO FILEScscpconf
In this paper we have focused on an efficient feature selection method in classification of audio files.
The main objective is feature selection and extraction. We have selected a set of features for further
analysis, which represents the elements in feature vector. By extraction method we can compute a
numerical representation that can be used to characterize the audio using the existing toolbox. In this
study Gain Ratio (GR) is used as a feature selection measure. GR is used to select splitting attribute
which will separate the tuples into different classes. The pulse clarity is considered as a subjective
measure and it is used to calculate the gain of features of audio files. The splitting criterion is
employed in the application to identify the class or the music genre of a specific audio file from
testing database. Experimental results indicate that by using GR the application can produce a
satisfactory result for music genre classification. After dimensionality reduction best three features
have been selected out of various features of audio file and in this technique we will get more than
90% successful classification result.
In this paper we have focused on an efficient feature selection method in classification of audio files.
The main objective is feature selection and extraction. We have selected a set of features for further
analysis, which represents the elements in feature vector. By extraction method we can compute a
numerical representation that can be used to characterize the audio using the existing toolbox. In this
study Gain Ratio (GR) is used as a feature selection measure. GR is used to select splitting attribute
which will separate the tuples into different classes. The pulse clarity is considered as a subjective
measure and it is used to calculate the gain of features of audio files. The splitting criterion is
employed in the application to identify the class or the music genre of a specific audio file from
testing database. Experimental results indicate that by using GR the application can produce a
satisfactory result for music genre classification. After dimensionality reduction best three features
have been selected out of various features of audio file and in this technique we will get more than
90% successful classification result.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Similar to Predicting students' performance using id3 and c4.5 classification algorithms (20)
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Predicting students' performance using id3 and c4.5 classification algorithms
1. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.3, No.5, September 2013
DOI : 10.5121/ijdkp.2013.3504 39
PREDICTING STUDENTS’ PERFORMANCE USING
ID3 AND C4.5 CLASSIFICATION ALGORITHMS
Kalpesh Adhatrao, Aditya Gaykar, Amiraj Dhawan, Rohit Jha and Vipul Honrao
Department of Computer Engineering,
Fr. C.R.I.T., Navi Mumbai, Maharashtra, India
ABSTRACT
An educational institution needs to have an approximate prior knowledge of enrolled students to predict
their performance in future academics. This helps them to identify promising students and also provides
them an opportunity to pay attention to and improve those who would probably get lower grades. As a
solution, we have developed a system which can predict the performance of students from their previous
performances using concepts of data mining techniques under Classification. We have analyzed the data
set containing information about students, such as gender, marks scored in the board examinations of
classes X and XII, marks and rank in entrance examinations and results in first year of the previous batch
of students. By applying the ID3 (Iterative Dichotomiser 3) and C4.5 classification algorithms on this data,
we have predicted the general and individual performance of freshly admitted students in future
examinations.
KEYWORDS
Classification, C4.5, Data Mining, Educational Research, ID3, Predicting Performance
1. INTRODUCTION
Every year, educational institutes admit students under various courses from different locations,
educational background and with varying merit scores in entrance examinations. Moreover,
schools and junior colleges may be affiliated to different boards, each board having different
subjects in their curricula and also different level of depths in their subjects. Analyzing the past
performance of admitted students would provide a better perspective of the probable academic
performance of students in the future. This can very well be achieved using the concepts of data
mining.
For this purpose, we have analysed the data of students enrolled in first year of engineering. This
data was obtained from the information provided by the admitted students to the institute. It
includes their full name, gender, application ID, scores in board examinations of classes X and
XII, scores in entrance examinations, category and admission type. We then applied the ID3 and
C4.5 algorithms after pruning the dataset to predict the results of these students in their first
semester as precisely as possible.
2. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.3, No.5, September 2013
40
2. LITERATURE SURVEY
2.1. Data Mining
Data mining is the process of discovering interesting knowledge, such as associations, patterns,
changes, significant structures and anomalies, from large amounts of data stored in databases or
data warehouses or other information repositories [1]. It has been widely used in recent years due
to the availability of huge amounts of data in electronic form, and there is a need for turning such
data into useful information and knowledge for large applications. These applications are found in
fields such as Artificial Intelligence, Machine Learning, Market Analysis, Statistics and Database
Systems, Business Management and Decision Support [2].
2.1.1. Classification
Classification is a data mining technique that maps data into predefined groups or classes. It is a
supervised learning method which requires labelled training data to generate rules for classifying
test data into predetermined groups or classes [2]. It is a two-phase process. The first phase is the
learning phase, where the training data is analyzed and classification rules are generated. The next
phase is the classification, where test data is classified into classes according to the generated
rules. Since classification algorithms require that classes be defined based on data attribute
values, we had created an attribute “class” for every student, which can have a value of either
“Pass” or “Fail”.
2.1.2. Clustering
Clustering is the process of grouping a set of elements in such a way that the elements in the same
group or cluster are more similar to each other than to those in other groups or clusters [1]. It is a
common technique for statistical data analysis used in the fields of pattern recognition,
information retrieval, bioinformatics, machine learning and image analysis. Clustering can be
achieved by various algorithms that differ about the similarities required between elements of a
cluster and how to find the elements of the clusters efficiently. Most algorithms used for
clustering try to create clusters with small distances among the cluster elements, intervals, dense
areas of the data space or particular statistical distributions.
2.2. Selecting Classification over Clustering
In clustering, classes are unknown apriori and are discovered from the data. Since our goal is to
predict students’ performance into either of the predefined classes - “Pass” and “Fail”, clustering
is not a suitable choice and so we have used classification algorithms instead of clustering
algorithms.
2.3. Issues Regarding Classification
2.3.1. Missing Data
Missing data values cause problems during both the training phase and to the classification
process itself. For example, the reason for non-availability of data may be due to [2]:
• Equipment malfunction
• Deletion due to inconsistency with other recorded data
3. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.3, No.5, September 2013
41
• Non-entry of data due to misunderstanding
• Certain data considered unimportant at the time of entry
• No registration of data or its change
This missing data can be handled using following approaches [3]:
• Data miners can ignore the missing data
• Data miners can replace all missing values with a single global constant
• Data miners can replace a missing value with its feature mean for the given class
• Data miners and domain experts, together, can manually examine samples with missing
values and enter a reasonable, probable or expected value
In our case, the chances of getting missing values in the training data are very less. The training
data is to be retrieved from the admission records of a particular institute and the attributes
considered for the input of classification process are mandatory for each student. The tuple which
is found to have missing value for any attribute will be ignored from training set as the missing
values cannot be predicted or set to some default value. Considering low chances of the
occurrence of missing data, ignoring missing data will not affect the accuracy adversely.
2.3.2. Measuring Accuracy
Determining which data mining technique is best depends on the interpretation of the problem by
users. Usually, the performance of algorithms is examined by evaluating the accuracy of the
result. Classification accuracy is calculated by determining the percentage of tuples placed in the
correct class. At the same time there may be a cost associated with an incorrect assignment to the
wrong class which can be ignored.
2.4. ID3 Algorithm
In decision tree learning, ID3 (Iterative Dichotomiser 3) is an algorithm invented by Ross
Quinlan used to generate a decision tree from the dataset. ID3 is typically used in the machine
learning and natural language processing domains. The decision tree technique involves
constructing a tree to model the classification process. Once a tree is built, it is applied to each
tuple in the database and results in classification for that tuple. The following issues are faced by
most decision tree algorithms [2]:
• Choosing splitting attributes
• Ordering of splitting attributes
• Number of splits to take
• Balance of tree structure and pruning
• Stopping criteria
The ID3 algorithm is a classification algorithm based on Information Entropy, its basic idea is
that all examples are mapped to different categories according to different values of the condition
attribute set; its core is to determine the best classification attribute form condition attribute sets.
The algorithm chooses information gain as attribute selection criteria; usually the attribute that
has the highest information gain is selected as the splitting attribute of current node, in order to
make information entropy that the divided subsets need smallest [4]. According to the different
values of the attribute, branches can be established, and the process above is recursively called on
4. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.3, No.5, September 2013
42
each branch to create other nodes and branches until all the samples in a branch belong to the
same category. To select the splitting attributes, the concepts of Entropy and Information Gain are
used.
2.4.1. Entropy
Given probabilities p1, p2, …, ps, where ∑pi = 1, Entropy is defined as
H(p1, p2, …, ps) = ∑ - (pi log pi)
Entropy finds the amount of order in a given database state. A value of H = 0 identifies a
perfectly classified set. In other words, the higher the entropy, the higher the potential to improve
the classification process.
2.4.2. Information Gain
ID3 chooses the splitting attribute with the highest gain in information, where gain is defined as
difference between how much information is needed after the split. This is calculated by
determining the differences between the entropies of the original dataset and the weighted sum of
the entropies from each of the subdivided datasets. The formula used for this purpose is:
G(D, S) = H(D) - ∑P(Di)H(Di)
2.5. C4.5
C4.5 is a well-known algorithm used to generate a decision trees. It is an extension of the ID3
algorithm used to overcome its disadvantages. The decision trees generated by the C4.5 algorithm
can be used for classification, and for this reason, C4.5 is also referred to as a statistical classifier.
The C4.5 algorithm made a number of changes to improve ID3 algorithm [2]. Some of these are:
• Handling training data with missing values of attributes
• Handling differing cost attributes
• Pruning the decision tree after its creation
• Handling attributes with discrete and continuous values
Let the training data be a set S = s1, s2 ... of already classified samples. Each sample Si = xl, x2... is
a vector where xl, x2 ... represent attributes or features of the sample. The training data is a vector
C = c1, c2..., where c1, c2... represent the class to which each sample belongs to.
At each node of the tree, C4.5 chooses one attribute of the data that most effectively splits data set
of samples S into subsets that can be one class or the other [5]. It is the normalized information
gain (difference in entropy) that results from choosing an attribute for splitting the data. The
attribute factor with the highest normalized information gain is considered to make the decision.
The C4.5 algorithm then continues on the smaller sub-lists having next highest normalized
information gain.
5. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.3, No.5, September 2013
43
3. TECHNOLOGIES USED
3.1. HTML and CSS
HyperText Markup Language (HTML) is a markup language for creating web pages or other
information to display in a web browser. HTML allows images and objects to be included and
that can be used to create interactive forms. From this, structured documents are created by using
structural semantics for text such as headings, links, lists, paragraphs, quotes etc.
CSS (Cascading Style Sheets) is designed to enable the separation between document content (in
HTML or similar markup languages) and document presentation. This technique is used to
improve content accessibility also to provide more flexibility and control in the specification of
content and presentation characteristics. This enables multiple pages to share formatting and
reduce redundancies.
3.2. PHP and the CodeIgniter Framework
PHP (recursive acronym for PHP: Hypertext Preprocessor) is a widely-used open source general-
purpose server side scripting language that is especially suited for web development and can be
embedded into HTML.
CodeIgniter is a well-known open source web application framework used for building dynamic
web applications in PHP [6]. Its goal is to enable developers to develop projects quickly by
providing a rich set of libraries and functionalities for commonly used tasks with a simple
interface and logical structure for accessing these libraries. CodeIgniter is loosely based on the
Model-View-Controller (MVC) pattern and we have used it to build the front end of our
implementation.
3.3. MySQL
MySQL is the most popular open source RDBMS which is supported, distributed and developed
by Oracle [8]. In the implementation of our web application, we have used it to store user
information and students’ data.
3.4. RapidMiner
RapidMiner is an open source data mining tool that provides data mining and machine learning
procedures including data loading and transformation, data preprocessing and visualization,
modelling, evaluation, and deployment [7]. It is written in the Java programming language and
makes use of learning schemes and attribute evaluators from the WEKA machine learning
environment and statistical modelling schemes for the R-Project. We have used RapidMiner to
generate decision trees of ID3 and C4.5 algorithms.
4. IMPLEMENTATION
We had divided the entire implementation into five stages. In the first stage, information about
students who have been admitted to the second year was collected. This included the details
submitted to the college at the time of enrolment. In the second stage, extraneous information was
removed from the collected data and the relevant information was fed into a database. The third
stage involved applying the ID3 and C4.5 algorithms on the training data to obtain decision trees
6. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.3, No.5, September 2013
44
of both the algorithms. In the next stage, the test data, i.e. information about students currently
enrolled in the first year, was applied to the decision trees. The final stage consisted of developing
the front end in the form of a web application.
These stages of implementation are depicted in Figure 1.
Figure 1. Processing model
4.1. Student Database
We were provided with a training dataset consisting of information about students admitted to the
first year. This data was in the form of a Microsoft Excel 2003 spreadsheet and had details of
each student such as full name, application ID, gender, caste, percentage of marks obtained in
board examinations of classes X and XII, percentage of marks obtained in Physics, Chemistry and
Mathematics in class XII, marks obtained in the entrance examination, admission type, etc. For
ease of performing data mining operations, the data was filled into a MySQL database.
4.2. Data Preprocessing
Once we had details of all the students, we then segmented the training dataset further,
considering various feasible splitting attributes, i.e. the attributes which would have a higher
impact on the performance of a student. For instance, we had considered ‘location’ as a splitting
attribute, and then segmented the data according to students’ locality.
A snapshot of the student database is shown in Figure 2. Here, irrelevant attributes such as
students residential address, name, application ID, etc. had been removed. For example, the
admission date of the student was irrelevant in predicting the future performance of the student.
The attributes that had been retained are those for merit score or marks scored in entrance
examination, gender, percentage of marks scored in Physics, Chemistry and Mathematics in the
board examination of class XII and admission type. Finally, the “class” attribute was added and it
held the predicted result, which can be either “Pass” or “Fail”.
7. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.3, No.5, September 2013
45
Since the attributes for marks would have discrete values, to produce better results, specific
classes were defined. Thus, the “merit” attribute had a value “good” if the merit score of the
student was 120 or above out of a maximum score of 200, and was classified as “bad” if the merit
score was below 120. Also, the value that can be held by the “percentage” attribute of the student
are three - “distinction” if the percentage of marks scored by the student in the subjects of
Physics, Chemistry and Mathematics was 70 or above, “first_class” if the percentage was less
than 70 and greater than or equal to 60, then it was classified as “second_class” if the percentage
was less than 60. The attribute for admission type is labelled “type” and the value held by a
student for it can be either “AI” (short for All-India), if the student was admitted to a seat
available for All-India candidates, or “OTHER” if the student was admitted to another seat.
Figure 2. Preprocessed student database
4.3. Data Processing Using RapidMiner
The next step was to feed the pruned student database as input to RapidMiner. This helped us in
evaluating interesting results by applying classification algorithms on the student training dataset.
The results obtained are shown in the following subsections:
4.3.1. ID3 Algorithm
Since ID3 is a decision tree algorithm, we obtained a decision tree as the final result with all the
splitting attributes and it is shown in Figure 3.
8. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.3, No.5, September 2013
46
4.3.2. C4.5 Algorithm
The C4.5 algorithm too generates a decision tree, and we obtained one from RapidMiner in the
same way as ID3. This tree, shown in Figure 4, has fewer decision nodes as compared to the tree
for improved ID3, which is shown in Figure 3.
4.4. Implementing the Performance Prediction Web Application
RapidMiner helped significantly in finding hidden information from the training dataset. These
newly learnt predictive patterns for predicting students’ performance were then implemented in a
working web application for staff members to use to get the predicted results of admitted
students.
Figure 3. Decision tree for ID3
Figure 4. Decision tree for C4.5
9. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.3, No.5, September 2013
47
4.4.1. CodeIgniter
The web application was developed using a popular PHP framework named CodeIgniter. The
application has provisions for multiple simultaneous staff registrations and staff logins. This
ensures that the work of no two staff members is interrupted during performance evaluation.
Figure 5 and Figure 6 depict the staff registration and staff login pages respectively.
4.4.2. Mapping Decision Trees to PHP
The essence of the web application was to map the results achieved after data processing to code.
This was done in form of class methods in PHP. The result of the improved ID3 and C4.5
algorithms were in the form of trees and these were translated to code in the form of if-else
ladders. We then placed these ladders into PHP class methods that accept only the splitting
attributes - PCM percentage, merit marks, admission type and gender as method parameters. The
class methods return the final result of that particular evaluation, indicating whether that student
would pass or fail in the first semester examination. Figure 7 shows a class method with the if-
else ladder.
Figure 5. Registration page for staff members
Figure 6. Login page for staff members
10. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.3, No.5, September 2013
48
4.4.3. Singular Evaluation
Once the decision trees were mapped as class methods, we built a web page for staff members to
feed values for the name, application ID and splitting attributes of a student, as can be seen in
Figure 8. These values were then used to predict the result of that student as either “Pass” or
“Fail”.
4.4.4. Upload Excel Sheet
Singular Evaluation is beneficial when the results of a small number of students are to be
predicted, one at a time. But in case of large testing datasets, it is feasible to upload a data file in a
format such as that of a Microsoft Excel spreadsheet, and evaluate each student's record. For this,
staff members can upload a spreadsheet containing records of students with attributes in a
predetermined order. Figure 9 shows the upload page for Excel spreadsheets.
Figure 7. PHP class method mapping a decision tree
Figure 8. Web page for Singular Evaluation
11. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.3, No.5, September 2013
49
4.4.5. Bulk Evaluation
Under the Bulk Evaluation tab, a staff member can choose an uploaded dataset to evaluate the
results, along with the algorithm to be applied over it. After submitting the dataset and algorithm,
the predicted result of each student is displayed in a table as the value of the attribute "class". A
sample result of Bulk Evaluation can be seen in Figure 10.
Figure 9. Page to upload Excel spreadsheet
Figure 10. Page showing results after Bulk Evaluation
4.4.6. Verifying Accuracy of Predicted Results
The accuracy of the algorithm results can be tested under the Verify tab. A staff member has to
select the uploaded verification file which already has the actual results and the algorithm that has
to be tested for accuracy. After submission the predicted result of evaluation is compared with
actual results obtained and the accuracy is calculated. Figure 11 shows that the accuracy achieved
is 75.145% for both ID3 and C4.5 algorithms. Figure 12 shows the mismatched tuples, i.e. the
tuples which were predicted wrongly by the application for the current test data.
12. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.3, No.5, September 2013
50
Figure 11. Accuracy achieved after evaluation
Figure 12. Mismatched tuples shown during verification
4.4.7. Singular Evaluation History
Using the web interface, staff members can view all Singular Evaluations they had conducted in
the past. This is displayed in the form of a table, containing attributes of the student and the
predicted result. If required, a record from this table may be deleted by a staff member. A
snapshot of this table is shown in Figure 13.
Figure 13. History of Singular Evaluations performed by staff members
13. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.3, No.5, September 2013
51
5. FUTURE WORK
In this project, prediction parameters such as the decision trees generated using RapidMiner are
not updated dynamically within the source code. In the future, we plan to make the entire
implementation dynamic to train the prediction parameters itself when new training sets are fed
into the web application. Also, in the current implementation, we have not considered extra-
curricular activities and other vocational courses completed by students, which we believe may
have a significant impact on the overall performance of the students. Considering such parameters
would result in better accuracy of prediction.
6. CONCLUSIONS
In this paper, we have explained the system we have used to predict the results of students
currently in the first year of engineering, based on the results obtained by students currently in the
second year of engineering during their first year.
The results of Bulk Evaluation are shown in Table 1. Random test cases considered during
individual testing resulted in approximately equal accuracy, as indicated in Table 2.
Table 1. Results of Bulk Evaluation
Algorithm Total Students
Students whose results are
correctly predicted
Accuracy (%)
Execution Time
(in milliseconds)
ID3 173 130 75.145 47.6
C4.5 173 130 75.145 39.1
Table 2. Results of Singular Evaluation.
Algorithm Total Students Students whose results are correctly predicted Accuracy (%)
ID3 9 7 77.778
C4.5 9 7 77.778
Thus, for a total of 182 students, the average percentage of accuracy achieved in Bulk and
Singular Evaluations is approximately 75.275.
ACKNOWLEDGEMENTS
We express sincere gratitude to our project guides, Ms M. Kiruthika, and Ms Smita Dange for
their guidance and support.
REFERENCES
[1] Han, J. and Kamber, M., (2006) Data Mining: Concepts and Techniques, Elsevier.
[2] Dunham, M.H., (2003) Data Mining: Introductory and Advanced Topics, Pearson Education Inc.
[3] Kantardzic, M., (2011) Data Mining: Concepts, Models, Methods and Algorithms, Wiley-IEEE
Press.
14. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.3, No.5, September 2013
52
[4] Ming, H., Wenying, N. and Xu, L., (2009) “An improved decision tree classification algorithm
based on ID3 and the application in score analysis”, Chinese Control and Decision Conference
(CCDC), pp1876-1879.
[5] Xiaoliang, Z., Jian, W., Hongcan Y., and Shangzhuo, W., (2009) “Research and Application of
the improved Algorithm C4.5 on Decision Tree”, International Conference on Test and
Measurement (ICTM), Vol. 2, pp184-187.
[6] CodeIgnitor User Guide Version 2.14, http://ellislab.com/codeigniter/user-guide/toc.html
[7] RapidMiner, http://rapid-i.com/content/view/181/190/
[8] MySQL – The world’s most popular open source database, http://www.mysql.com/