Prediction of student performance has become an essential issue for improving the educational system. However, this has turned to be a challenging task due to the huge quantity of data in the educational environment. Educational data mining is an emerging field that aims to develop techniques to manipulate and explore the sizable educational data. Classification is one of the primary approaches of the educational data mining methods that is the most widely used for predicting student performance and characteristics. In this work, three linear classification techniques; logistic regression, support vector machines (SVM), and stochastic gradient descent (SGD), and three nonlinear classification methods; decision tree, random forest and adaptive boosting (AdaBoost) are explored and evaluated on a dataset of ASSISTment system. A k-fold cross validation method is used to evaluate the implemented techniques. The results demonstrate that decision tree algorithm outperforms the other techniques, with an average accuracy of 0.7254, an average sensitivity of 0.8036 and an average specificity of 0.901. Furthermore, the importance of the utilized features is obtained and the system performance is computed using the most significant features. The results reveal that the best performance is reached using the first 80 important features with accuracy, sensitivity and specificity of 0.7252, 0.8042 and 0.9016, respectively.
Predicting students' performance using id3 and c4.5 classification algorithmsIJDKP
An educational institution needs to have an approximate prior knowledge of enrolled students to predict
their performance in future academics. This helps them to identify promising students and also provides
them an opportunity to pay attention to and improve those who would probably get lower grades. As a
solution, we have developed a system which can predict the performance of students from their previous
performances using concepts of data mining techniques under Classification. We have analyzed the data
set containing information about students, such as gender, marks scored in the board examinations of
classes X and XII, marks and rank in entrance examinations and results in first year of the previous batch
of students. By applying the ID3 (Iterative Dichotomiser 3) and C4.5 classification algorithms on this data,
we have predicted the general and individual performance of freshly admitted students in future
examinations.
STUDENTS’ PERFORMANCE PREDICTION SYSTEM USING MULTI AGENT DATA MINING TECHNIQUEIJDKP
The document discusses a proposed students' performance prediction system using multi-agent data mining techniques. It aims to predict student performance with high accuracy and help low-performing students. The system uses ensemble classifiers like Adaboost.M1 and LogitBoost and compares their prediction accuracy to the single classifier C4.5 decision tree. Experimental results showed SAMME boosting improved prediction accuracy over C4.5 and LogitBoost.
ADABOOST ENSEMBLE WITH SIMPLE GENETIC ALGORITHM FOR STUDENT PREDICTION MODELijcsit
Predicting the student performance is a great concern to the higher education managements.This
prediction helps to identify and to improve students' performance.Several factors may improve this
performance.In the present study, we employ the data mining processes, particularly classification, to
enhance the quality of the higher educational system. Recently, a new direction is used for the improvement
of the classification accuracy by combining classifiers.In thispaper, we design and evaluate a fastlearning
algorithm using AdaBoost ensemble with a simple genetic algorithmcalled “Ada-GA” where the genetic
algorithm is demonstrated to successfully improve the accuracy of the combined classifier performance.
The Ada-GA algorithm proved to be of considerable usefulness in identifying the students at risk early,
especially in very large classes. This early prediction allows the instructor to provide appropriate advising
to those students. The Ada/GA algorithm is implemented and tested on ASSISTments dataset, the results
showed that this algorithm hassuccessfully improved the detection accuracy as well as it reduces the
complexity of computation.
This document analyzes and compares the performance of various classification algorithms (J48, Random Forest, Multilayer Perceptron, IB1, Decision Table) in predicting student performance using data from 260 students. Random Forest performed the best with 89.23% accuracy, taking the least time to build the model and having the lowest error rates compared to the other algorithms. Attributes like attendance, economic status, and parental education were found to be most important factors influencing student results. The analysis provides insight into how different factors impact student performance.
Extending the Student’s Performance via K-Means and Blended Learning IJEACS
In this paper, we use the clustering technique to monitor the status of students’ scholastic recital. This paper spotlights on upliftment the education system via K-means clustering. Clustering is the process of grouping the similar objects. Commonly in the academic, the performances of the students are grouped by their Graded Point (GP). We adopted K-means algorithm and implemented it on students’ mark data. This system is a promising index to screen the development of students and categorize the students by their academic performance. From the categories, we train the students based on their GP. It was implemented in MATLAB and obtained the clusters of students exactly.
IJERA (International journal of Engineering Research and Applications) is International online, ... peer reviewed journal. For more detail or submit your article, please visit www.ijera.com
This document provides a systematic review of educational data mining (EDM) techniques and their applications. It discusses how EDM can be used to extract hidden information from large student data repositories using clustering, classification, prediction, and recommendation algorithms. These algorithms help group similar students, categorize students, predict student outcomes, and suggest courses. The document also reviews literature applying these EDM techniques and outlines future work on semantic and opinion mining to improve adaptive learning systems.
A Survey on the Classification Techniques In Educational Data MiningEditor IJCATR
Due to increasing interest in data mining and educational system, educational data mining is the emerging topic for research
community. educational data mining means to extract the hidden knowledge from large repositories of data with the use of technique
and tools. educational data mining develops new methods to discover knowledge from educational database and used for decision
making in educational system. The various techniques of data mining like classification. clustering can be applied to bring out hidden
knowledge from the educational data.
In this paper, we focus on the educational data mining and classification techniques. In this study we analyze attributes for the
prediction of student's behavior and academic performance by using WEKA open source data mining tool and various classification
methods like decision trees, C4.5 algorithm, ID3 algorithm etc.
Predicting students' performance using id3 and c4.5 classification algorithmsIJDKP
An educational institution needs to have an approximate prior knowledge of enrolled students to predict
their performance in future academics. This helps them to identify promising students and also provides
them an opportunity to pay attention to and improve those who would probably get lower grades. As a
solution, we have developed a system which can predict the performance of students from their previous
performances using concepts of data mining techniques under Classification. We have analyzed the data
set containing information about students, such as gender, marks scored in the board examinations of
classes X and XII, marks and rank in entrance examinations and results in first year of the previous batch
of students. By applying the ID3 (Iterative Dichotomiser 3) and C4.5 classification algorithms on this data,
we have predicted the general and individual performance of freshly admitted students in future
examinations.
STUDENTS’ PERFORMANCE PREDICTION SYSTEM USING MULTI AGENT DATA MINING TECHNIQUEIJDKP
The document discusses a proposed students' performance prediction system using multi-agent data mining techniques. It aims to predict student performance with high accuracy and help low-performing students. The system uses ensemble classifiers like Adaboost.M1 and LogitBoost and compares their prediction accuracy to the single classifier C4.5 decision tree. Experimental results showed SAMME boosting improved prediction accuracy over C4.5 and LogitBoost.
ADABOOST ENSEMBLE WITH SIMPLE GENETIC ALGORITHM FOR STUDENT PREDICTION MODELijcsit
Predicting the student performance is a great concern to the higher education managements.This
prediction helps to identify and to improve students' performance.Several factors may improve this
performance.In the present study, we employ the data mining processes, particularly classification, to
enhance the quality of the higher educational system. Recently, a new direction is used for the improvement
of the classification accuracy by combining classifiers.In thispaper, we design and evaluate a fastlearning
algorithm using AdaBoost ensemble with a simple genetic algorithmcalled “Ada-GA” where the genetic
algorithm is demonstrated to successfully improve the accuracy of the combined classifier performance.
The Ada-GA algorithm proved to be of considerable usefulness in identifying the students at risk early,
especially in very large classes. This early prediction allows the instructor to provide appropriate advising
to those students. The Ada/GA algorithm is implemented and tested on ASSISTments dataset, the results
showed that this algorithm hassuccessfully improved the detection accuracy as well as it reduces the
complexity of computation.
This document analyzes and compares the performance of various classification algorithms (J48, Random Forest, Multilayer Perceptron, IB1, Decision Table) in predicting student performance using data from 260 students. Random Forest performed the best with 89.23% accuracy, taking the least time to build the model and having the lowest error rates compared to the other algorithms. Attributes like attendance, economic status, and parental education were found to be most important factors influencing student results. The analysis provides insight into how different factors impact student performance.
Extending the Student’s Performance via K-Means and Blended Learning IJEACS
In this paper, we use the clustering technique to monitor the status of students’ scholastic recital. This paper spotlights on upliftment the education system via K-means clustering. Clustering is the process of grouping the similar objects. Commonly in the academic, the performances of the students are grouped by their Graded Point (GP). We adopted K-means algorithm and implemented it on students’ mark data. This system is a promising index to screen the development of students and categorize the students by their academic performance. From the categories, we train the students based on their GP. It was implemented in MATLAB and obtained the clusters of students exactly.
IJERA (International journal of Engineering Research and Applications) is International online, ... peer reviewed journal. For more detail or submit your article, please visit www.ijera.com
This document provides a systematic review of educational data mining (EDM) techniques and their applications. It discusses how EDM can be used to extract hidden information from large student data repositories using clustering, classification, prediction, and recommendation algorithms. These algorithms help group similar students, categorize students, predict student outcomes, and suggest courses. The document also reviews literature applying these EDM techniques and outlines future work on semantic and opinion mining to improve adaptive learning systems.
A Survey on the Classification Techniques In Educational Data MiningEditor IJCATR
Due to increasing interest in data mining and educational system, educational data mining is the emerging topic for research
community. educational data mining means to extract the hidden knowledge from large repositories of data with the use of technique
and tools. educational data mining develops new methods to discover knowledge from educational database and used for decision
making in educational system. The various techniques of data mining like classification. clustering can be applied to bring out hidden
knowledge from the educational data.
In this paper, we focus on the educational data mining and classification techniques. In this study we analyze attributes for the
prediction of student's behavior and academic performance by using WEKA open source data mining tool and various classification
methods like decision trees, C4.5 algorithm, ID3 algorithm etc.
A Study on Learning Factor Analysis – An Educational Data Mining Technique fo...iosrjce
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
Analyzing undergraduate students’ performance in various perspectives using d...Alexander Decker
This document discusses analyzing undergraduate student performance using data mining techniques. It analyzes student performance in two perspectives: 1) supervised vs unsupervised assessment instruments and 2) performance in mathematics, English, and programming courses. The study uses association rule mining with the Apriori algorithm to discover patterns in student performance data from both analyses. The goal is to identify useful insights that can help improve assessment methods, curriculum structure, and course prerequisites.
IRJET- Using Data Mining to Predict Students PerformanceIRJET Journal
This document describes a study that used logistic regression to predict student performance based on educational data. The researchers collected student data including exam scores, attendance, study hours, family income, etc. from a large dataset. Logistic regression achieved the best prediction accuracy of 82.03% compared to other models like naive bayes, K-nearest neighbor, and multi-layer perceptron. The results indicate that around 230 students would perform poorly, 600 would perform fairly, and 200 would perform well based on the predictive model. This analysis can help identify students needing extra support and help universities improve academic outcomes.
Data mining or Knowledge discovery (KDD) is
extracting unknown (hidden) and useful knowledge from data.
Data mining is widely used in many areas like retail, sales, ecommerce,
remote sensing, bioinformatics etc. Student’s
performance has become one of the most complex puzzle for
universities and colleges in recent past with the tremendous
growth. In this paper, authors deployed data mining techniques
like classification, association rule, chi-square etc. for knowledge
discovery. For this study, authors have used data set containing
Approx. 180 MCA (post graduate) students results data of 3
colleges. Study found that one can apply data mining
functionalities like Chi-square, Association rule and Lift in
Education and discover areas of improvement.
DATA MINING METHODOLOGIES TO STUDY STUDENT'S ACADEMIC PERFORMANCE USING THE...ijcsa
The study placed a particular emphasis on the so ca
lled data mining algorithms, but focuses the bulk o
f
attention on the C4.5 algorithm. Each educational i
nstitution, in general, aims to present a high qual
ity of
education. This depends upon predicting the student
s with poor results prior they entering in to final
examination. Data mining techniques give many tasks
that could be used to investigate the students'
performance. The main objective of this paper is to
build a classification model that can be used to i
mprove
the students' academic records in Faculty of Mathem
atical Science and Statistics. This model has been
done using the C4.5 algorithm as it is a well-known
, commonly used data mining technique. The
importance of this study is that predicting student
performance is useful in many different settings.
Data
from the previous students' academic records in the
faculty have been used to illustrate the considere
d
algorithm in order to build our classification mode
l.
A Model for Predicting Students’ Academic Performance using a Hybrid of K-mea...Editor IJCATR
Higher learning institutions nowadays operate in a more complex and competitive due to a high demand from prospective
students and an emerging increase of universities both public and private. Management of Universities face challenges and concerns of
predicting students’ academic performance in to put mechanisms in place prior enough for their improvement. This research aims at
employing Decision tree and K-means data mining algorithms to model an approach to predict the performance of students in advance
so as to devise mechanisms of alleviating student dropout rates and improve on performance. In Kenya for example, there has been
witnessed an increase student enrolling in universities since the Government started free primary education. Therefore the Government
expects an increased workforce of professionals from these institutions without compromising quality so as to achieve its millennium
development and vision 2030. Backlog of students not finishing their studies in stipulated time due to poor performance is another
issue that can be addressed from the results of this research since predicting student performance in advance will enable University
management to devise ways of assisting weak students and even make more decisions on how to select students for particular courses.
Previous studies have been done Educational Data Mining mostly focusing on factors affecting students’ performance and also used
different algorithms in predicting students’ performance. In all these researches, accuracy of prediction is key and what researchers
look forward to try and improve.
Predicting instructor performance using data mining techniques in higher educ...redpel dot com
Predicting instructor performance using data mining techniques in higher education
for more ieee paper / full abstract / implementation , just visit www.redpel.com
Educational Data Mining to Analyze Students Performance – Concept PlanIRJET Journal
This document discusses using data mining techniques to analyze student performance data from educational institutions. It proposes using clustering and classification algorithms like K-means and Naive Bayesian on data collected from sources like learning management systems and surveys. The goals are to classify students into performance levels, identify factors affecting performance, and make recommendations to help students improve. Clustering could group students and classification could predict performance based on attributes. Analyzing the data may provide insights to enhance guidance and outcomes. The paper presents this as a conceptual plan to apply data mining in education.
Student Performance Evaluation in Education Sector Using Prediction and Clust...IJSRD
Data mining is the crucial steps to find out previously unknown information from large relational database. various technique and algorithm are their used in data mining such as association rules, clustering and classification and prediction techniques. Ease of the techniques contains particular characteristics and behaviour. In this paper the prime focus on clustering technique and prediction technique. Now a days large amount of data stored in educational database increasing rapidly. The database for particular set of student was collected. The clustering and prediction is made on some detailed manner and the results were produce. The K-means clustering algorithm is used here. To find nearest possible a cluster a similar group the turning point India is the performance in higher education for all students. This academic performance is influenced by various factor, therefore to identify the difference between high learners and slow learner students it is important for student performance to develop predictive data mining model.
This document summarizes a research paper that evaluates the performance of decision tree and clustering techniques using the WEKA data mining tool. The paper uses student academic and performance data to apply decision tree and clustering algorithms and compare the results of each technique. Specifically, it uses WEKA to classify and cluster a dataset containing the marks and percentages of students from educational institutions. The paper aims to determine which technique (decision tree or clustering) provides more accurate and useful results for predicting student performance.
Buddi health class imbalance based deep learningRam Swaminathan
BUDDI Health is a disruptive Deep Learning Platform which helps automate healthcare administrative functions such as Medical Coding, Claims Denial Prediction, Clinical Documentation Improvement and more.
PREDICTING STUDENT ACADEMIC PERFORMANCE IN BLENDED LEARNING USING ARTIFICIAL ...ijaia
Along with the spreading of online education, the importance of active support of students involved in
online learning processes has grown. The application of artificial intelligence in education allows
instructors to analyze data extracted from university servers, identify patterns of student behavior and
develop interventions for struggling students. This study used student data stored in a Moodle server and
predicted student success in course, based on four learning activities - communication via emails,
collaborative content creation with wiki, content interaction measured by files viewed and self-evaluation
through online quizzes. Next, a model based on the Multi-Layer Perceptron Neural Network was trained to
predict student performance on a blended learning course environment. The model predicted the
performance of students with correct classification rate, CCR, of 98.3%.
Correlation based feature selection (cfs) technique to predict student perfro...IJCNCJournal
Education data mining is an emerging stream which h
elps in mining academic data for solving various
types of problems. One of the problems is the selec
tion of a proper academic track. The admission of a
student in engineering college depends on many fact
ors. In this paper we have tried to implement a
classification technique to assist students in pred
icting their success in admission in an engineering
stream.We have analyzed the data set containing inf
ormation about student’s academic as well as socio-
demographic variables, with attributes such as fami
ly pressure, interest, gender, XII marks and CET ra
nk
in entrance examinations and historical data of pre
vious batch of students. Feature selection is a pro
cess
for removing irrelevant and redundant features whic
h will help improve the predictive accuracy of
classifiers. In this paper first we have used featu
re selection attribute algorithms Chi-square.InfoGa
in, and
GainRatio to predict the relevant features. Then we
have applied fast correlation base filter on given
features. Later classification is done using NBTree
, MultilayerPerceptron, NaiveBayes and Instance bas
ed
–K- nearest neighbor. Results showed reduction in c
omputational cost and time and increase in predicti
ve
accuracy for the student model
Using ID3 Decision Tree Algorithm to the Student Grade Analysis and Predictionijtsrd
Data mining techniques play an important role in data analysis. For the construction of a classification model which could predict performance of students, particularly for engineering branches, a decision tree algorithm associated with the data mining techniques have been used in the research. A number of factors may affect the performance of students. Data mining technology which can related to this student grade well and we also used classification algorithms prediction. In this paper, we used educational data mining to predict students final grade based on their performance. We proposed student data classification using ID3 Iterative Dichotomiser 3 Decision Tree Algorithm Khin Khin Lay | San San Nwe "Using ID3 Decision Tree Algorithm to the Student Grade Analysis and Prediction" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-5 , August 2019, URL: https://www.ijtsrd.com/papers/ijtsrd26545.pdfPaper URL: https://www.ijtsrd.com/computer-science/data-miining/26545/using-id3-decision-tree-algorithm-to-the-student-grade-analysis-and-prediction/khin-khin-lay
Data mining to predict academic performance. Ranjith Gowda
This document proposes using data warehousing and data mining techniques to predict student academic performance in schools. It describes collecting student data like scores, attendance, discipline, and assignments into a data warehouse. Data mining methods are then used to analyze the student data and identify relationships between variables to predict performance, such as whether students are progressing, being retained, or conditionally progressing. The results could help schools identify students at risk of failing and take actions to help them succeed.
Distributed Digital Artifacts on the Semantic WebEditor IJCATR
Distributed digital artifacts incorporate cryptographic hash values to URI called trusty URIs in a distributed environment
building good in quality, verifiable and unchangeable web resources to prevent the rising man in the middle attack. The greatest
challenge of a centralized system is that it gives users no possibility to check whether data have been modified and the communication
is limited to a single server. As a solution for this, is the distributed digital artifact system, where resources are distributed among
different domains to enable inter-domain communication. Due to the emerging developments in web, attacks have increased rapidly,
among which man in the middle attack (MIMA) is a serious issue, where user security is at its threat. This work tries to prevent MIMA
to an extent, by providing self reference and trusty URIs even when presented in a distributed environment. Any manipulation to the
data is efficiently identified and any further access to that data is blocked by informing user that the uniform location has been
changed. System uses self-reference to contain trusty URI for each resource, lineage algorithm for generating seed and SHA-512 hash
generation algorithm to ensure security. It is implemented on the semantic web, which is an extension to the world wide web, using
RDF (Resource Description Framework) to identify the resource. Hence the framework was developed to overcome existing
challenges by making the digital artifacts on the semantic web distributed to enable communication between different domains across
the network securely and thereby preventing MIMA.
Role of education is very critical for the development of any country. So it is the responsibility of each and every person to do something for the betterment of education. Taking this fact into consideration we start working on the education system. Education system ranging from basic to higher education. Now a day education system generates a lots of data related to student. If we cannot analyze that data properly then that data is useless. With the help of data mining techniques we can find the hidden information from the data collected for the different educational setting. With the help of that information we can review our educational process or make improvement in our education system. Here in this article we are considering a case of an engineering college student and try to predict the final result in advance. The result of the prediction provides timely help to those students who are on risk of failure in the final examination. There are different techniques of data mining are available and we are using J48, RandomForest, and ADTree to predict the performance of the student in their final examination. On the basis of this predication we can make a decision whether the student will be promoted to next year or not. We the help of the result we can improve the performance of the student who are on risk of fail or promoted. After the declaration of the final result of the student, result is fed into the system and hence the result will analysed for the next semester. The comparative result shows that, prediction help in the improvement of overall result of the weaker students.
The document discusses using Learning Factor Analysis (LFA), an educational data mining technique, to model student knowledge based on student-tutor interaction log data. LFA uses a multiple logistic regression model with difficulty factors defined by subject experts to quantify skills. A combinatorial search method called A* search is used to select the best-fitting model. The document illustrates applying LFA to data from an online math tutor, identifying 5 skills and presenting the results of the logistic regression modeling, including fit statistics and learning rates for skills. Learning curves are used to visualize student performance over time.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Educational Data Mining is used to find interesting patterns from the data taken from
educational settings to improve teaching and learning. Assessing student’s ability and performance with
EDM methods in e-learning environment for math education in school level in India has not been
identified in our literature review. Our method is a novel approach in providing quality math education
with assessments indicating the knowledge level of a student in each lesson. This paper illustrates how
Learning Curve – an EDM visualization method is used to compare rural and urban students’ progress
in learning mathematics in an e-learning environment. The experiment is conducted in two different
schools in Tamil Nadu, India. After practicing the problems the students attended the test and their
interaction data are collected and analyzed their performance in different aspects: Knowledge
component level, time taken to solve a problem, error rate. This work studies the student actions for
identifying learning progress. The results show that the learning curve method is much helpful to the
teachers to visualize the students’ performance in granular level which is not possible manually. Also it
helps the students in knowing about their skill level when they complete each unit.
M-Learners Performance Using Intelligence and Adaptive E-Learning Classify th...IRJET Journal
This document discusses using machine learning classification algorithms to predict student performance based on educational data. It compares the performance of five classification algorithms - J48, Naive Bayes, Bayes Net, Backpropagation Network, and Radial Basis Function Network - in predicting student academic achievement using attributes like demographic information, test scores, and academic factors. The experiment found that the Radial Basis Function Network algorithm achieved the highest accuracy, correctly classifying 100% of instances, compared to 75-95% accuracy for the other algorithms. Convolutional neural networks are also discussed as a powerful tool for image and language processing in educational data mining.
Clustering Students of Computer in Terms of Level of ProgrammingEditor IJCATR
Educational data mining (EDM) is one of the applications of data mining. In educational data mining, there are two key domains, i.e. student domain and faculty domain. Different type of research work has been done in both domains.
In existing system the faculty performance has calculated on the basis of two parameters i.e. Student feedback and the result of student in that subject. In existing system we define two approaches one is multiple classifier approach and the other is a single classifier approach and comparing them, for relative evaluation of faculty performance using data mining
Techniques. In multiple classifier approach K-nearest neighbor (KNN) is used in first step and Rule based classification is used in the second step of classification while in single classifier approach only KNN is used in both steps of classification.
But in proposed system, I will analyse the faculty performance using 4 parameters i.e., student complaint about faculty, Student review feedback for faculty, students feedback, and students result etc.
For this proposed system I will be going to use opinion mining technique for analyzing performance of faculty and calculating score of each faculty.
A Study on Learning Factor Analysis – An Educational Data Mining Technique fo...iosrjce
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
Analyzing undergraduate students’ performance in various perspectives using d...Alexander Decker
This document discusses analyzing undergraduate student performance using data mining techniques. It analyzes student performance in two perspectives: 1) supervised vs unsupervised assessment instruments and 2) performance in mathematics, English, and programming courses. The study uses association rule mining with the Apriori algorithm to discover patterns in student performance data from both analyses. The goal is to identify useful insights that can help improve assessment methods, curriculum structure, and course prerequisites.
IRJET- Using Data Mining to Predict Students PerformanceIRJET Journal
This document describes a study that used logistic regression to predict student performance based on educational data. The researchers collected student data including exam scores, attendance, study hours, family income, etc. from a large dataset. Logistic regression achieved the best prediction accuracy of 82.03% compared to other models like naive bayes, K-nearest neighbor, and multi-layer perceptron. The results indicate that around 230 students would perform poorly, 600 would perform fairly, and 200 would perform well based on the predictive model. This analysis can help identify students needing extra support and help universities improve academic outcomes.
Data mining or Knowledge discovery (KDD) is
extracting unknown (hidden) and useful knowledge from data.
Data mining is widely used in many areas like retail, sales, ecommerce,
remote sensing, bioinformatics etc. Student’s
performance has become one of the most complex puzzle for
universities and colleges in recent past with the tremendous
growth. In this paper, authors deployed data mining techniques
like classification, association rule, chi-square etc. for knowledge
discovery. For this study, authors have used data set containing
Approx. 180 MCA (post graduate) students results data of 3
colleges. Study found that one can apply data mining
functionalities like Chi-square, Association rule and Lift in
Education and discover areas of improvement.
DATA MINING METHODOLOGIES TO STUDY STUDENT'S ACADEMIC PERFORMANCE USING THE...ijcsa
The study placed a particular emphasis on the so ca
lled data mining algorithms, but focuses the bulk o
f
attention on the C4.5 algorithm. Each educational i
nstitution, in general, aims to present a high qual
ity of
education. This depends upon predicting the student
s with poor results prior they entering in to final
examination. Data mining techniques give many tasks
that could be used to investigate the students'
performance. The main objective of this paper is to
build a classification model that can be used to i
mprove
the students' academic records in Faculty of Mathem
atical Science and Statistics. This model has been
done using the C4.5 algorithm as it is a well-known
, commonly used data mining technique. The
importance of this study is that predicting student
performance is useful in many different settings.
Data
from the previous students' academic records in the
faculty have been used to illustrate the considere
d
algorithm in order to build our classification mode
l.
A Model for Predicting Students’ Academic Performance using a Hybrid of K-mea...Editor IJCATR
Higher learning institutions nowadays operate in a more complex and competitive due to a high demand from prospective
students and an emerging increase of universities both public and private. Management of Universities face challenges and concerns of
predicting students’ academic performance in to put mechanisms in place prior enough for their improvement. This research aims at
employing Decision tree and K-means data mining algorithms to model an approach to predict the performance of students in advance
so as to devise mechanisms of alleviating student dropout rates and improve on performance. In Kenya for example, there has been
witnessed an increase student enrolling in universities since the Government started free primary education. Therefore the Government
expects an increased workforce of professionals from these institutions without compromising quality so as to achieve its millennium
development and vision 2030. Backlog of students not finishing their studies in stipulated time due to poor performance is another
issue that can be addressed from the results of this research since predicting student performance in advance will enable University
management to devise ways of assisting weak students and even make more decisions on how to select students for particular courses.
Previous studies have been done Educational Data Mining mostly focusing on factors affecting students’ performance and also used
different algorithms in predicting students’ performance. In all these researches, accuracy of prediction is key and what researchers
look forward to try and improve.
Predicting instructor performance using data mining techniques in higher educ...redpel dot com
Predicting instructor performance using data mining techniques in higher education
for more ieee paper / full abstract / implementation , just visit www.redpel.com
Educational Data Mining to Analyze Students Performance – Concept PlanIRJET Journal
This document discusses using data mining techniques to analyze student performance data from educational institutions. It proposes using clustering and classification algorithms like K-means and Naive Bayesian on data collected from sources like learning management systems and surveys. The goals are to classify students into performance levels, identify factors affecting performance, and make recommendations to help students improve. Clustering could group students and classification could predict performance based on attributes. Analyzing the data may provide insights to enhance guidance and outcomes. The paper presents this as a conceptual plan to apply data mining in education.
Student Performance Evaluation in Education Sector Using Prediction and Clust...IJSRD
Data mining is the crucial steps to find out previously unknown information from large relational database. various technique and algorithm are their used in data mining such as association rules, clustering and classification and prediction techniques. Ease of the techniques contains particular characteristics and behaviour. In this paper the prime focus on clustering technique and prediction technique. Now a days large amount of data stored in educational database increasing rapidly. The database for particular set of student was collected. The clustering and prediction is made on some detailed manner and the results were produce. The K-means clustering algorithm is used here. To find nearest possible a cluster a similar group the turning point India is the performance in higher education for all students. This academic performance is influenced by various factor, therefore to identify the difference between high learners and slow learner students it is important for student performance to develop predictive data mining model.
This document summarizes a research paper that evaluates the performance of decision tree and clustering techniques using the WEKA data mining tool. The paper uses student academic and performance data to apply decision tree and clustering algorithms and compare the results of each technique. Specifically, it uses WEKA to classify and cluster a dataset containing the marks and percentages of students from educational institutions. The paper aims to determine which technique (decision tree or clustering) provides more accurate and useful results for predicting student performance.
Buddi health class imbalance based deep learningRam Swaminathan
BUDDI Health is a disruptive Deep Learning Platform which helps automate healthcare administrative functions such as Medical Coding, Claims Denial Prediction, Clinical Documentation Improvement and more.
PREDICTING STUDENT ACADEMIC PERFORMANCE IN BLENDED LEARNING USING ARTIFICIAL ...ijaia
Along with the spreading of online education, the importance of active support of students involved in
online learning processes has grown. The application of artificial intelligence in education allows
instructors to analyze data extracted from university servers, identify patterns of student behavior and
develop interventions for struggling students. This study used student data stored in a Moodle server and
predicted student success in course, based on four learning activities - communication via emails,
collaborative content creation with wiki, content interaction measured by files viewed and self-evaluation
through online quizzes. Next, a model based on the Multi-Layer Perceptron Neural Network was trained to
predict student performance on a blended learning course environment. The model predicted the
performance of students with correct classification rate, CCR, of 98.3%.
Correlation based feature selection (cfs) technique to predict student perfro...IJCNCJournal
Education data mining is an emerging stream which h
elps in mining academic data for solving various
types of problems. One of the problems is the selec
tion of a proper academic track. The admission of a
student in engineering college depends on many fact
ors. In this paper we have tried to implement a
classification technique to assist students in pred
icting their success in admission in an engineering
stream.We have analyzed the data set containing inf
ormation about student’s academic as well as socio-
demographic variables, with attributes such as fami
ly pressure, interest, gender, XII marks and CET ra
nk
in entrance examinations and historical data of pre
vious batch of students. Feature selection is a pro
cess
for removing irrelevant and redundant features whic
h will help improve the predictive accuracy of
classifiers. In this paper first we have used featu
re selection attribute algorithms Chi-square.InfoGa
in, and
GainRatio to predict the relevant features. Then we
have applied fast correlation base filter on given
features. Later classification is done using NBTree
, MultilayerPerceptron, NaiveBayes and Instance bas
ed
–K- nearest neighbor. Results showed reduction in c
omputational cost and time and increase in predicti
ve
accuracy for the student model
Using ID3 Decision Tree Algorithm to the Student Grade Analysis and Predictionijtsrd
Data mining techniques play an important role in data analysis. For the construction of a classification model which could predict performance of students, particularly for engineering branches, a decision tree algorithm associated with the data mining techniques have been used in the research. A number of factors may affect the performance of students. Data mining technology which can related to this student grade well and we also used classification algorithms prediction. In this paper, we used educational data mining to predict students final grade based on their performance. We proposed student data classification using ID3 Iterative Dichotomiser 3 Decision Tree Algorithm Khin Khin Lay | San San Nwe "Using ID3 Decision Tree Algorithm to the Student Grade Analysis and Prediction" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-5 , August 2019, URL: https://www.ijtsrd.com/papers/ijtsrd26545.pdfPaper URL: https://www.ijtsrd.com/computer-science/data-miining/26545/using-id3-decision-tree-algorithm-to-the-student-grade-analysis-and-prediction/khin-khin-lay
Data mining to predict academic performance. Ranjith Gowda
This document proposes using data warehousing and data mining techniques to predict student academic performance in schools. It describes collecting student data like scores, attendance, discipline, and assignments into a data warehouse. Data mining methods are then used to analyze the student data and identify relationships between variables to predict performance, such as whether students are progressing, being retained, or conditionally progressing. The results could help schools identify students at risk of failing and take actions to help them succeed.
Distributed Digital Artifacts on the Semantic WebEditor IJCATR
Distributed digital artifacts incorporate cryptographic hash values to URI called trusty URIs in a distributed environment
building good in quality, verifiable and unchangeable web resources to prevent the rising man in the middle attack. The greatest
challenge of a centralized system is that it gives users no possibility to check whether data have been modified and the communication
is limited to a single server. As a solution for this, is the distributed digital artifact system, where resources are distributed among
different domains to enable inter-domain communication. Due to the emerging developments in web, attacks have increased rapidly,
among which man in the middle attack (MIMA) is a serious issue, where user security is at its threat. This work tries to prevent MIMA
to an extent, by providing self reference and trusty URIs even when presented in a distributed environment. Any manipulation to the
data is efficiently identified and any further access to that data is blocked by informing user that the uniform location has been
changed. System uses self-reference to contain trusty URI for each resource, lineage algorithm for generating seed and SHA-512 hash
generation algorithm to ensure security. It is implemented on the semantic web, which is an extension to the world wide web, using
RDF (Resource Description Framework) to identify the resource. Hence the framework was developed to overcome existing
challenges by making the digital artifacts on the semantic web distributed to enable communication between different domains across
the network securely and thereby preventing MIMA.
Role of education is very critical for the development of any country. So it is the responsibility of each and every person to do something for the betterment of education. Taking this fact into consideration we start working on the education system. Education system ranging from basic to higher education. Now a day education system generates a lots of data related to student. If we cannot analyze that data properly then that data is useless. With the help of data mining techniques we can find the hidden information from the data collected for the different educational setting. With the help of that information we can review our educational process or make improvement in our education system. Here in this article we are considering a case of an engineering college student and try to predict the final result in advance. The result of the prediction provides timely help to those students who are on risk of failure in the final examination. There are different techniques of data mining are available and we are using J48, RandomForest, and ADTree to predict the performance of the student in their final examination. On the basis of this predication we can make a decision whether the student will be promoted to next year or not. We the help of the result we can improve the performance of the student who are on risk of fail or promoted. After the declaration of the final result of the student, result is fed into the system and hence the result will analysed for the next semester. The comparative result shows that, prediction help in the improvement of overall result of the weaker students.
The document discusses using Learning Factor Analysis (LFA), an educational data mining technique, to model student knowledge based on student-tutor interaction log data. LFA uses a multiple logistic regression model with difficulty factors defined by subject experts to quantify skills. A combinatorial search method called A* search is used to select the best-fitting model. The document illustrates applying LFA to data from an online math tutor, identifying 5 skills and presenting the results of the logistic regression modeling, including fit statistics and learning rates for skills. Learning curves are used to visualize student performance over time.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Educational Data Mining is used to find interesting patterns from the data taken from
educational settings to improve teaching and learning. Assessing student’s ability and performance with
EDM methods in e-learning environment for math education in school level in India has not been
identified in our literature review. Our method is a novel approach in providing quality math education
with assessments indicating the knowledge level of a student in each lesson. This paper illustrates how
Learning Curve – an EDM visualization method is used to compare rural and urban students’ progress
in learning mathematics in an e-learning environment. The experiment is conducted in two different
schools in Tamil Nadu, India. After practicing the problems the students attended the test and their
interaction data are collected and analyzed their performance in different aspects: Knowledge
component level, time taken to solve a problem, error rate. This work studies the student actions for
identifying learning progress. The results show that the learning curve method is much helpful to the
teachers to visualize the students’ performance in granular level which is not possible manually. Also it
helps the students in knowing about their skill level when they complete each unit.
M-Learners Performance Using Intelligence and Adaptive E-Learning Classify th...IRJET Journal
This document discusses using machine learning classification algorithms to predict student performance based on educational data. It compares the performance of five classification algorithms - J48, Naive Bayes, Bayes Net, Backpropagation Network, and Radial Basis Function Network - in predicting student academic achievement using attributes like demographic information, test scores, and academic factors. The experiment found that the Radial Basis Function Network algorithm achieved the highest accuracy, correctly classifying 100% of instances, compared to 75-95% accuracy for the other algorithms. Convolutional neural networks are also discussed as a powerful tool for image and language processing in educational data mining.
Clustering Students of Computer in Terms of Level of ProgrammingEditor IJCATR
Educational data mining (EDM) is one of the applications of data mining. In educational data mining, there are two key domains, i.e. student domain and faculty domain. Different type of research work has been done in both domains.
In existing system the faculty performance has calculated on the basis of two parameters i.e. Student feedback and the result of student in that subject. In existing system we define two approaches one is multiple classifier approach and the other is a single classifier approach and comparing them, for relative evaluation of faculty performance using data mining
Techniques. In multiple classifier approach K-nearest neighbor (KNN) is used in first step and Rule based classification is used in the second step of classification while in single classifier approach only KNN is used in both steps of classification.
But in proposed system, I will analyse the faculty performance using 4 parameters i.e., student complaint about faculty, Student review feedback for faculty, students feedback, and students result etc.
For this proposed system I will be going to use opinion mining technique for analyzing performance of faculty and calculating score of each faculty.
The main objective of this paper is to develop a basic prototype model which can determine and extract
unknown knowledge (patterns, concepts and relations) related with multiple factors from past database records of
specific students. Data mining is science and engineering study of extracting previously undiscovered patterns
from a huge set of data. Data mining techniques are helpful for decision making as well as for discovering patterns
of data. In this paper students eligibility prediction system using Rule based classification is proposed to predict
the eligibility of students based on their details with high prediction accuracy. In Educational Institutes, a
tremendous amount of data is generated. This paper outlines the idea of predicting a particular student’s placement
eligibility by performing operations on the data stored. In this paper an efficient algorithm with the technique
Fuzzy for prediction is proposed.
A Systematic Review On Educational Data MiningKatie Robinson
This document provides a summary of a research paper on educational data mining. It discusses how educational data mining applies machine learning, statistics, and data mining techniques to educational data sets. One focus is on clustering algorithms as a preprocessing technique for educational data mining. The paper reviews over 30 years of literature on applying clustering algorithms to educational contexts. It finds that clustering can provide insights into student learning styles and variables that differentiate student groups. However, educational data has a hierarchical and non-independent nature, so clustering algorithms must be carefully chosen to match the research question.
A Model for Predicting Students’ Academic Performance using a Hybrid of K-mea...Editor IJCATR
Higher learning institutions nowadays operate in a more complex and competitive due to a high demand from prospective
students and an emerging increase of universities both public and private. Management of Universities face challenges and concerns of
predicting students’ academic performance in to put mechanisms in place prior enough for their improvement. This research aims at
employing Decision tree and K-means data mining algorithms to model an approach to predict the performance of students in advance
so as to devise mechanisms of alleviating student dropout rates and improve on performance. In Kenya for example, there has been
witnessed an increase student enrolling in universities since the Government started free primary education. Therefore the Government
expects an increased workforce of professionals from these institutions without compromising quality so as to achieve its millennium
development and vision 2030. Backlog of students not finishing their studies in stipulated time due to poor performance is another
issue that can be addressed from the results of this research since predicting student performance in advance will enable University
management to devise ways of assisting weak students and even make more decisions on how to select students for particular courses.
Previous studies have been done Educational Data Mining mostly focusing on factors affecting students’ performance and also used
different algorithms in predicting students’ performance. In all these researches, accuracy of prediction is key and what researchers
look forward to try and improve.
UNIVERSITY ADMISSION SYSTEMS USING DATA MINING TECHNIQUES TO PREDICT STUDENT ...IRJET Journal
This document summarizes a research study that aimed to predict student performance and support decision making for university admission systems using data mining techniques. The study analyzed data from 2,039 students at a university in Saudi Arabia to compare the predictive power of different data mining classification models (ANN, decision trees, SVM, naive Bayes). It found that a student's score on the pre-admission Scholastic Proficiency Admission Test was the best predictor of their first year GPA. Based on this, the university adjusted its admission criteria to give greater weight to this pre-admission test score. After making this change, the number of students with high GPAs increased while the number with low GPAs decreased.
An approach for improved students’ performance prediction using homogeneous ...IJECEIAES
Web-based learning technologies of educational institutions store a massive amount of interaction data which can be helpful to predict students’ performance through the aid of machine learning algorithms. With this, various researchers focused on studying ensemble learning methods as it is known to improve the predictive accuracy of traditional classification algorithms. This study proposed an approach for enhancing the performance prediction of different single classification algorithms by using them as base classifiers of homogeneous ensembles (bagging and boosting) and heterogeneous ensembles (voting and stacking). The model utilized various single classifiers such as multilayer perceptron or neural networks (NN), random forest (RF), naïve Bayes (NB), J48, JRip, OneR, logistic regression (LR), k-nearest neighbor (KNN), and support vector machine (SVM) to determine the base classifiers of the ensembles. In addition, the study made use of the University of California Irvine (UCI) open-access student dataset to predict students’ performance. The comparative analysis of the model’s accuracy showed that the best-performing single classifier’s accuracy increased further from 93.10% to 93.68% when used as a base classifier of a voting ensemble method. Moreover, results in this study showed that voting heterogeneous ensemble performed slightly better than bagging and boosting homogeneous ensemble methods.
The increasing need for data driven decision making recently has resulted in the application of data mining in various fields including the educational sector which is referred to as educational data mining. The need for improving the performance of data mining models has also been identified as a gap for future researcher. In Nigeria, higher educational institutions collect various students’ data, but these data are rarely used in any decision or policy making to improve the academic performance of students. This research work, attempts to improve the performance of data mining models for predicting students’ academic performance using stacking classifiers ensemble and synthetic minority over-sampling techniques. The research was conducted by adopting and evaluating the performance of J48, IBK and SMO classifiers. The individual classifiers models, standard stacking classifier ensemble model and stacking classifiers ensemble model were trained and tested on 206 students’ data set from the faculty of science federal university Dutse. Students’ specific previous academic performance records at Unified Tertiary Matriculation Examination, Senior Secondary Certificate Examination and first year Cumulative Grade Point Average of students are used as data inputs in WEKA 3.9.1 data mining tool to predict students’ graduation classes of degrees at undergraduate level. The result shows that application of synthetic minority over-sampling technique for class balancing improves all the various models performance with the proposed modified stacking classifiers ensemble model outperforming the various classifiers models in both performance accuracy and RSME values making it the best model.
A New Active Learning Technique Using Furthest Nearest Neighbour Criterion fo...ijcsa
Active learning is a supervised learning method that is based on the idea that a machine learning algorithm can achieve greater accuracy with fewer labelled training images if it is allowed to choose the image from which it learns. Facial age classification is a technique to classify face images into one of the several predefined age groups. The proposed study applies an active learning approach to facial age classification which allows a classifier to select the data from which it learns. The classifier is initially trained using a small pool of labeled training images. This is achieved by using the bilateral two dimension linear discriminant analysis. Then the most informative unlabeled image is found out from the unlabeled pool using the furthest nearest neighbor criterion, labeled by the user and added to the
appropriate class in the training set. The incremental learning is performed using an incremental version of bilateral two dimension linear discriminant analysis. This active learning paradigm is proposed to be applied to the k nearest neighbor classifier and the support vector machine classifier and to compare the performance of these two classifiers.
Performance Evaluation of Feature Selection Algorithms in Educational Data Mi...IIRindia
Educational Data mining(EDM)is a prominent field concerned with developing methods for exploring the unique and increasingly large scale data that come from educational settings and using those methods to better understand students in which they learn. It has been proved in various studies and by the previous study by the authors that data mining techniques find widespread applications in the educational decision making process for improving the performance of students in higher educational institutions. Classification techniques assumes significant importance in the machine learning tasks and are mostly employed in the prediction related problems. In machine learning problems, feature selection techniques are used to reduce the attributes of the class variables by removing the redundant and irrelevant features from the dataset. The aim of this research work is to compares the performance of various feature selection techniques is done using WEKA tool in the prediction of students’ performance in the final semester examination using different classification algorithms. Particularly J48, Naïve Bayes, Bayes Net, IBk, OneR, and JRip are used in this research work. The dataset for the study were collected from the student’s performance report of a private college in Tamil Nadu state of India. The effectiveness of various feature selection algorithms was compared with six classifiers and the results are discussed. The results of this study shows that the accuracy of IBK is 99.680% which is found to be
This document discusses using data mining techniques to analyze faculty performance at an engineering college in India. It proposes analyzing 4 parameters - student complaints, feedback, results, and reviews - to evaluate faculty instead of just 2 parameters (feedback and results) used previously. It will use opinion mining to analyze faculty performance and calculate scores. The system will collect data, preprocess it, apply a KNN algorithm to the 4 parameters to calculate scores for each faculty, sum the scores, classify results using rule-based classification, and analyze outcomes by subject and class. It reviews related work applying educational data mining and concludes the multiple classifier approach is better, and future work could consider more parameters and expand to all college branches and departments.
A comparative study of machine learning algorithms for virtual learning envir...IAESIJAI
Virtual learning environment is becoming an increasingly popular study option for students from diverse cultural and socioeconomic backgrounds around the world. Although this learning environment is quite adaptable, improving student performance is difficult due to the online-only learning method. Therefore, it is essential to investigate students' participation and performance in virtual learning in order to improve their performance. Using a publicly available Open University learning analytics dataset, this study examines a variety of machine learning-based prediction algorithms to determine the best method for predicting students' academic success, hence providing additional alternatives for enhancing their academic achievement. Support vector machine, random forest, Nave Bayes, logical regression, and decision trees are employed for the purpose of prediction using machine learning methods. It is noticed that the random forest and logistic regression approach predict student performance with the highest average accuracy values compared to the alternatives. In a number of instances, the support vector machine has been seen to outperform the other methods.
A Survey on Educational Data Mining TechniquesIIRindia
Educational data mining (EDM) creates high impact in the field of academic domain. The methods used in this topic are playing a major advanced key role in increasing knowledge among students. EDM explores and gives ideas in understanding behavioral patterns of students to choose a correct path for choosing their carrier. This survey focuses on such category and it discusses on various techniques involved in making educational data mining for their knowledge improvement. Also, it discusses about different types of EDM tools and techniques in this article. Among the different tools and techniques, best categories are suggested for real world usage.
Study and Analysis of K-Means Clustering Algorithm Using RapidminerIJERA Editor
Institution is a place where teacher explains and student just understands and learns the lesson. Every student has his own definition for toughness and easiness and there isn’t any absolute scale for measuring knowledge but examination score indicate the performance of student. In this case study, knowledge of data mining is combined with educational strategies to improve students’ performance. Generally, data mining (sometimes called data or knowledge discovery) is the process of analysing data from different perspectives and summarizing it into useful information. Data mining software is one of a number of analytical tools for data. It allows users to analyse data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational database. Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters).This project describes the use of clustering data mining technique to improve the efficiency of academic performance in the educational institutions .In this project, a live experiment was conducted on students .By conducting an exam on students of computer science major using MOODLE(LMS) and analysing that data generated using RapidMiner(Datamining Software) and later by performing clustering on the data. This method helps to identify the students who need special advising or counselling by the teacher to give high quality of education.
A Survey on Research work in Educational Data Miningiosrjce
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
This document summarizes research on educational data mining. It discusses topics such as student modeling, improving educational software, mining assessment data, and generic frameworks/methods. Student modeling research focuses on automatically improving student models and predicting student performance. Research on improving software examines identifying learning behaviors and adapting intelligent tutoring systems based on individual differences. Assessment data mining analyzes optimal/worst-case mastery learning and predicting dropout using social behavior data. Generic frameworks include knowledge tracing approaches and tools for visualizing interaction networks. The conclusion recommends continued collaboration across research, education, and industry to further the field.
Technology Enabled Learning to Improve Student Performance: A SurveyIIRindia
The use of recent technology creates more impact in the teaching and learning process nowadays. Improvement of students’ knowledge by using the various technologies like smart class room environment, internet, mobile phones, television programs, use of iPods and etc. are play a very important role. Most of the education institutions used classroom teaching using advanced technologies such as smart class environment, visualization by power point projector and etc. This research work focusses on such technologies used for the improvement of student’s performance using some of the Data Mining (DM) techniques particularly classification and clustering. Information repositories (Educational Data Bases, Data Warehouses) are the source place for collecting study materials and use them for their learning purposes is the number one source for preparation of examinations. Particularly, this research work analyzes about the use of clustering and classification algorithms to enable the student’s performances and their learning capabilities using these modern technologies. During the study period, the student’s family background and their economic status are also play a very important role in their daily activities. These things are not considered in this survey work. A comparative study is carried out in this work by comparing students performance based on their results. The comparison is carried out based on the results of some of the classification and clustering algorithms. Finally, it states that the best algorithm for the improvement of students performance using these algorithms.
Similar to A Comparative Study of Educational Data Mining Techniques for Skill-based Predicting Student Performance (20)
This presentation was provided by Racquel Jemison, Ph.D., Christina MacLaughlin, Ph.D., and Paulomi Majumder. Ph.D., all of the American Chemical Society, for the second session of NISO's 2024 Training Series "DEIA in the Scholarly Landscape." Session Two: 'Expanding Pathways to Publishing Careers,' was held June 13, 2024.
Beyond Degrees - Empowering the Workforce in the Context of Skills-First.pptxEduSkills OECD
Iván Bornacelly, Policy Analyst at the OECD Centre for Skills, OECD, presents at the webinar 'Tackling job market gaps with a skills-first approach' on 12 June 2024
Philippine Edukasyong Pantahanan at Pangkabuhayan (EPP) CurriculumMJDuyan
(𝐓𝐋𝐄 𝟏𝟎𝟎) (𝐋𝐞𝐬𝐬𝐨𝐧 𝟏)-𝐏𝐫𝐞𝐥𝐢𝐦𝐬
𝐃𝐢𝐬𝐜𝐮𝐬𝐬 𝐭𝐡𝐞 𝐄𝐏𝐏 𝐂𝐮𝐫𝐫𝐢𝐜𝐮𝐥𝐮𝐦 𝐢𝐧 𝐭𝐡𝐞 𝐏𝐡𝐢𝐥𝐢𝐩𝐩𝐢𝐧𝐞𝐬:
- Understand the goals and objectives of the Edukasyong Pantahanan at Pangkabuhayan (EPP) curriculum, recognizing its importance in fostering practical life skills and values among students. Students will also be able to identify the key components and subjects covered, such as agriculture, home economics, industrial arts, and information and communication technology.
𝐄𝐱𝐩𝐥𝐚𝐢𝐧 𝐭𝐡𝐞 𝐍𝐚𝐭𝐮𝐫𝐞 𝐚𝐧𝐝 𝐒𝐜𝐨𝐩𝐞 𝐨𝐟 𝐚𝐧 𝐄𝐧𝐭𝐫𝐞𝐩𝐫𝐞𝐧𝐞𝐮𝐫:
-Define entrepreneurship, distinguishing it from general business activities by emphasizing its focus on innovation, risk-taking, and value creation. Students will describe the characteristics and traits of successful entrepreneurs, including their roles and responsibilities, and discuss the broader economic and social impacts of entrepreneurial activities on both local and global scales.
Leveraging Generative AI to Drive Nonprofit InnovationTechSoup
In this webinar, participants learned how to utilize Generative AI to streamline operations and elevate member engagement. Amazon Web Service experts provided a customer specific use cases and dived into low/no-code tools that are quick and easy to deploy through Amazon Web Service (AWS.)
How to Setup Warehouse & Location in Odoo 17 InventoryCeline George
In this slide, we'll explore how to set up warehouses and locations in Odoo 17 Inventory. This will help us manage our stock effectively, track inventory levels, and streamline warehouse operations.
LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UPRAHUL
This Dissertation explores the particular circumstances of Mirzapur, a region located in the
core of India. Mirzapur, with its varied terrains and abundant biodiversity, offers an optimal
environment for investigating the changes in vegetation cover dynamics. Our study utilizes
advanced technologies such as GIS (Geographic Information Systems) and Remote sensing to
analyze the transformations that have taken place over the course of a decade.
The complex relationship between human activities and the environment has been the focus
of extensive research and worry. As the global community grapples with swift urbanization,
population expansion, and economic progress, the effects on natural ecosystems are becoming
more evident. A crucial element of this impact is the alteration of vegetation cover, which plays a
significant role in maintaining the ecological equilibrium of our planet.Land serves as the foundation for all human activities and provides the necessary materials for
these activities. As the most crucial natural resource, its utilization by humans results in different
'Land uses,' which are determined by both human activities and the physical characteristics of the
land.
The utilization of land is impacted by human needs and environmental factors. In countries
like India, rapid population growth and the emphasis on extensive resource exploitation can lead
to significant land degradation, adversely affecting the region's land cover.
Therefore, human intervention has significantly influenced land use patterns over many
centuries, evolving its structure over time and space. In the present era, these changes have
accelerated due to factors such as agriculture and urbanization. Information regarding land use and
cover is essential for various planning and management tasks related to the Earth's surface,
providing crucial environmental data for scientific, resource management, policy purposes, and
diverse human activities.
Accurate understanding of land use and cover is imperative for the development planning
of any area. Consequently, a wide range of professionals, including earth system scientists, land
and water managers, and urban planners, are interested in obtaining data on land use and cover
changes, conversion trends, and other related patterns. The spatial dimensions of land use and
cover support policymakers and scientists in making well-informed decisions, as alterations in
these patterns indicate shifts in economic and social conditions. Monitoring such changes with the
help of Advanced technologies like Remote Sensing and Geographic Information Systems is
crucial for coordinated efforts across different administrative levels. Advanced technologies like
Remote Sensing and Geographic Information Systems
9
Changes in vegetation cover refer to variations in the distribution, composition, and overall
structure of plant communities across different temporal and spatial scales. These changes can
occur natural.
Walmart Business+ and Spark Good for Nonprofits.pdfTechSoup
"Learn about all the ways Walmart supports nonprofit organizations.
You will hear from Liz Willett, the Head of Nonprofits, and hear about what Walmart is doing to help nonprofits, including Walmart Business and Spark Good. Walmart Business+ is a new offer for nonprofits that offers discounts and also streamlines nonprofits order and expense tracking, saving time and money.
The webinar may also give some examples on how nonprofits can best leverage Walmart Business+.
The event will cover the following::
Walmart Business + (https://business.walmart.com/plus) is a new shopping experience for nonprofits, schools, and local business customers that connects an exclusive online shopping experience to stores. Benefits include free delivery and shipping, a 'Spend Analytics” feature, special discounts, deals and tax-exempt shopping.
Special TechSoup offer for a free 180 days membership, and up to $150 in discounts on eligible orders.
Spark Good (walmart.com/sparkgood) is a charitable platform that enables nonprofits to receive donations directly from customers and associates.
Answers about how you can do more with Walmart!"
This document provides an overview of wound healing, its functions, stages, mechanisms, factors affecting it, and complications.
A wound is a break in the integrity of the skin or tissues, which may be associated with disruption of the structure and function.
Healing is the body’s response to injury in an attempt to restore normal structure and functions.
Healing can occur in two ways: Regeneration and Repair
There are 4 phases of wound healing: hemostasis, inflammation, proliferation, and remodeling. This document also describes the mechanism of wound healing. Factors that affect healing include infection, uncontrolled diabetes, poor nutrition, age, anemia, the presence of foreign bodies, etc.
Complications of wound healing like infection, hyperpigmentation of scar, contractures, and keloid formation.
A Comparative Study of Educational Data Mining Techniques for Skill-based Predicting Student Performance
1. A Comparative Study of Educational Data Mining
Techniques for skill-based Predicting Student
Performance
Mustafa M. Zain Eddin1
, Nabila A. Khodeir1
, Heba A. Elnemr2
1
Dep. Of Informatics research, Electronics Research Institute, Giza, Egypt
2
Computers and Systems Dep., Electronics Research Institute, Giza, Egypt
mustafa@eri.sci.eg
nkhodeir@eri.sci.eg
heba@eri.sci.eg
Abstract— Prediction of student performance has become an
essential issue for improving the educational system. However,
this has turned to be a challenging task due to the huge quantity
of data in the educational environment. Educational data mining
is an emerging field that aims to develop techniques to
manipulate and explore the sizable educational data.
Classification is one of the primary approaches of the
educational data mining methods that is the most widely used for
predicting student performance and characteristics. In this work,
three linear classification techniques; logistic regression, support
vector machines (SVM), and stochastic gradient descent (SGD),
and three nonlinear classification methods; decision tree, random
forest and adaptive boosting (AdaBoost) are explored and
evaluated on a dataset of ASSISTment system. A k-fold cross
validation method is used to evaluate the implemented
techniques. The results demonstrate that decision tree algorithm
outperforms the other techniques, with an average accuracy of
0.7254, an average sensitivity of 0.8036 and an average specificity
of 0.901. Furthermore, the importance of the utilized features is
obtained and the system performance is computed using the most
significant features. The results reveal that the best performance
is reached using the first 80 important features with accuracy,
sensitivity and specificity of 0.7252, 0.8042 and 0.9016,
respectively.
I. INTRODUCTION
Intelligent Tutoring Systems (ITSs) are computer-assisted
tutoring systems that permit personalization of the system
interactions. ITSs consider a student model as an input to
different tutoring systems components to adapt their
behaviours accordingly. Student model is a representation of
the student features [1] such as knowledge [2, 3], performance,
interests [4], goals [5, 6] and individual traits [7].
Diverse techniques are utilized in student modelling
process. They are generally classified into two categories:
cognitive science and data mining approaches [8]. Cognitive
Science approaches modelled how humans learn based on
domain modelling and expert systems [9]. Model-tracing (MT)
and constraint-based modelling (CBM) are the two common
techniques in cognitive science approach. Student modelling
in MT is represented in terms of rules that represent the
procedural knowledge of the domain. The model reasons for
student knowledge through tracing student execution of the
defined domain rules [10]. In CBM, student modelling is
based on recognizing the student errors in terms of violations
of defined domain constraints. Domain constraints are used to
represent not only the procedural knowledge as MT but also
the declarative knowledge [11].
On the other hand, educational data mining student
modelling approaches are based on the generated data through
interactions between the system and a large number of
students. For example, the student answers to the presented
questions and different assistance types that are requested by
the student. Reasonable data mining techniques are used as a
tool to understand and model the student properties of interest
using such data. The focus of student models that are based on
cognitive science is modelling of the student knowledge. On
the other hand, data mining techniques model different
features such as student performance, student affect, and
student learning [12, 13].
Generally, the student model is used to predict the student
response to tutorial actions and to assess the student or the ITS.
Data mining-based student models can achieve such tasks [14].
Different data mining techniques are designed to achieve
predictive tasks. Based on the values of different independent
variables, the value of a target dependent variable is predicted
[12, 14]. In ITSs, the most popular predicted variables are the
correctness of a student’s response to a question.
The focus point of this paper is to discuss the
implementation of educational data mining techniques for
predicting student performance. In this study, we appraise and
compare some of the most widespread classification
algorithms that tackle this problem. Six data mining
techniques: logistic regression, support vector machine (SVM),
stochastic gradient descent (SGD), decision tree, random
forest and adaptive boosting (AdaBoost), are investigated and
assessed. Furthermore, the feature importance method is
accomplished to reduce the feature space as well as selecting
the significant features to enhance the system performance.
The paper is organized as follows: A short overview of
educational data mining techniques is presented in section II,
International Journal of Computer Science and Information Security (IJCSIS),
Vol. 16, No. 3, March 2018
56 https://sites.google.com/site/ijcsis/
ISSN 1947-5500
2. while section III reviews some related work on using
classification techniques in students' performance prediction.
The procedure for the comparative study is exhibited in
section IV. The conclusion and future work are provided in
section V.
II. EDUCATIONAL DATA MINING TECHNIQUES
Educational data mining is a procedure applied to distill
valuable information and patterns from a massive educational
database [15]. The valuable information and patterns may be
exploited for predicting students' performance. Accordingly,
this can help the educationalists in affording an effectual
teaching methodology.
Typically, in educational data mining technique, predictive
modelling is applied to portend the student performance.
Several tasks are commonly used to build the predictive
model, including classification, categorization and regression.
Yet, classification is considered the most widespread
method to predict students' performance. Several classification
techniques can be applied to portend the students'
performance. Among these algorithms are the logistic
regression, SGD, SVM, decision tree, random forest and
AdaBoost classifier.
Classification aims to take an input vector 𝑥 and assign it
to one of 𝐾 discrete classes 𝐶 𝑘 where 𝑘 = 1, 2, … , 𝐾 .
Typically, the classes are separated and therefore each input is
assigned to one and only one class. In other words, the input
space is thereby divided into decision regions whose
boundaries are called decision surfaces. In linear models for
classification, the decision surfaces are linear functions of the
input vector.
Linear prediction methods, such as logistic regression and
SVM, have been widely used in statistics and machine
learning. In this paper, we will focus on logistic regression [16,
17, 18], SVM [19] and SGD [20].
On the other hand, the decision tree has the ability to
capture the non-linearity in the data by dividing the space into
smaller sub-spaces based on the taken decision. In this work,
the decision tree technique is explored for predicting the
student performance [21, 22]. Ensembles of decision trees,
random forest and AdaBoost, which combine the predictions
of several base estimators, are also considered. [23, 24]. A
brief description of the utilized data mining methods is
presented in the next section
A. Logistic Regression
Regression aims to estimate the conditional expectations of
continuous variables using input-output relationships. Logistic
regression is used for categorical or binary outcomes.
Therefore logistic regression is a linear model for
classification rather than regression. It is extensively utilized
owing mainly to its simplicity. Logistic regression investigates
the relation between a categorical dependent variable and
several independent variables, and predicts the probability of
incidence of an event by fitting the data into a logistic curve.
Generally, logistic regression is categorized into two models:
binary logistic regression and multinomial logistic regression.
The binary logistic regression is normally used if the
dependent variable is dichotomous, whether the independent
variables are continuous or categorical. Whereas, the
multinomial logistic regression are used if the dependent
variable is not dichotomous and contains more than two
classes. [16, 17, 18].
B. Support Vector Machine
SVM is a supervised learning method that analyses the
training data and builds a model, which in turn assigns the
unlabelled data to the appropriate class. For a binary class
learning task, the SVM aims to obtain the superior
classification function to discriminate between the two classes
elements in the training data. In this work, we have adopted
linear SVM, where a linear classification function is used to
create a separating hyperplane that goes along the middle path
separating the two classes. Outwardly, there exist infinitely
several hyperplanes that may detach the training data. The
SVM technique attempts to find the hyperplanes that
maximize the gap between both classes so as to achieve the
testing classification without perpetrating misclassification
errors. Maximizing the margin to get the largest possible
distance between the separating hyperplane and the instances
on either side of it is proven to reduce an upper bound on the
expected generalization error [19].
C. Stochastic Gradient Descent Machine
SGD is a linear classifier beneath convex loss functions like
SVM and Logistic Regression. It is considered an effective
approach to discriminative learning. Despite the availability of
this approach in the machine learning society since long time,
it has lately got more attention owing to its potency with
large-scale and sparse data mining problems. It has been
successfully exploited in issues related to natural language
processing and text classification.
Generally, gradient descent is deemed the best method used
insomuch that the parameters cannot be determined
analytically and need to be searched for through an
optimization algorithm [20]. Therefore, SGD is considered an
optimization algorithm used to find the values of parameters
of a function (f) that minimizes a cost function (cost). The aim
of the algorithm is to get model parameters that reduce the
error of the model on the training dataset. That is achieved by
changing the model that moves it through a gradient or slope
of errors down on the way to a minimum error value. This
provides the algorithm with its name of “gradient descent".
D. Decision Trees
Decision tree is a supervised learning scheme based on a
non-parametric model and utilized for classification and
regression. Decision tree constructs a model to portend the
target variable value through learning plain decision rules
inferred from the data features [21].
Decision tree algorithms start with a set of cases and create
a tree data structure that can be utilized to classify new cases.
Each case is defined by a set of features which can have
numeric or symbolic values. A label representing the name of
a class is associated with each training case [22]. Decision
International Journal of Computer Science and Information Security (IJCSIS),
Vol. 16, No. 3, March 2018
57 https://sites.google.com/site/ijcsis/
ISSN 1947-5500
3. Tree is characterized by simplicity to understand and to
interpret. Decision Tree is able to handle both numerical and
categorical data. Other techniques are usually specialised in
analysing datasets that have only one type of variable. On the
other hand, decision Tree model can create over-complex
trees that do not generalise the data well (overfitting). Setting
the minimum number of samples required at a leaf node or
setting the maximum depth of the tree is necessary to avoid
this problem.
Small deviations in the data may generate a completely
distinct decision tree, which causes instability to the tree. This
problem is alleviated by adopting an ensemble of the decision
tree.
E. Ensemble Methods
To enhance generalizability robustness over a single
classifier, ensemble methods are used which combine the
predictions of several base estimators built with a learning
algorithm.
Principally, the ensemble methods are categorized into the
averaging methods and the boosting methods. In the averaging
methods, several estimators are created independently and
then their average prediction is computed. Random forests
classifier is an example of the average methods. Conversely,
the base estimators in the boosting methods are constructed
sequentially and one attempts to decrease the bias of the
joined estimator. AdaBoost is an example of boosting
methods.
The ensemble methods target to combine several feeble
models to generate a powerful ensemble. Generally, the joined
estimator is usually superior to any of a single base estimator
as its variance is reduced [23, 24].
1) Random forests
Random Forest algorithm is an example of averaging
ensemble methods. Random Forests is more robust than
decision trees and able to model large feature spaces.
Random Forests is a bagged classifier linking a collection
of decision tree classifiers which constitute a forest of trees
[23]. The varied set of classifiers is created by introducing
randomness in the classifier construction. The prediction
of the ensemble is given as the averaged prediction of the
discrete classifiers. In random forests, each tree in the
ensemble is grown on a different bootstrap sample that
containing randomly drawn instances with replacement
from the original training sample. In addition, random
forest uses random feature selection where at each node of
the decision tree t, m features are nominated at random out
of the M features and the best split selected out of this m.
When splitting a node during the building of the tree, the
split that is selected is no longer the best split among all
features. Alternatively, the selected split is the best split
between a random subset of features. Accordingly, the
forest bias typically increases to some extent regarding the
bias of an individual non-random tree. However, averaging
is usually more than compensating for the increase in bias
that gives an overall better model
2) Adaptive Boosting (AdaBoost)
Boosting is a general ensemble method that produces a
strong classifier from a number of weak classifiers. It is
based on building a model from the training data, then
creating a second model that tries to correct the errors
from the first model. Models are added up to the training
set is predicted perfectly or a maximum number of models
are added. AdaBoost was the first really successful
boosting algorithm established for binary classification. It
can be used in combination with many other types of
learning algorithms to increase performance. The output of
the other learning algorithms is combined into a weighted
sum that denotes the final output of the boosted classifier.
AdaBoost is adaptive in the sense that subsequent weak
learners are tweaked in favour of those instances
misclassified by preceding classifiers [24].
For each consecutive iteration, the weights of sample
data are singly modified and then, the learning process is
reapplied to the reweighted sample. The incorrectly
predicted training examples induced by the boosted model
at the previous step possess increased weights, while the
correctly predicted ones hold decreased weights. Usually,
this permits decreasing the variance within the model.
III. RELATED WORK
Predicting student performance in solving problems is the
focus of a number of literature. Different educational data
mining techniques, such as decision trees [25], artificial neural
networks [26], matrix factorization [27], collaborative filters
[28] and probabilistic graphical models [29], have been
applied to develop prediction algorithms. These classifiers can
be used to identify weak students and thus assist the students
develop their learning activities.
Pereira et al, utilizes decision tree classifiers to predict the
student marks based on previous semester marks and internal
grades [25]. The accuracy of the classifiers was computed and
it was shown that the decision tree classifier CHAID has the
highest accuracy followed by C4.5.
Thai et al, compared the accuracy of decision tree and
Bayesian Network algorithms for predicting the academic
performance of undergraduate and postgraduate students at
two very different academic based on their grades and
demographic information. They concluded that the decision
tree was more accurate than the Bayesian Network [30].
Xu et al. developed an ongoing prediction of the students’
performance using ensemble learning technique.
Exponentially Weighted Average Forecaster (EWAF) is used
as a building block algorithm to enable progressive prediction
of students’ performance [31].
Feng and Heffernan utilized skill model for predicting
student performance in a large scale test Massachusetts
Comprehensive Assessment System (MCAS). Skill model is a
matrix that relates questions to the needed skills to solve the
problem. Based on the assessed skills of the students, the
model of the performance of the predicted MCAS test score is
measured. They used Mixed-effects logistic regression model
International Journal of Computer Science and Information Security (IJCSIS),
Vol. 16, No. 3, March 2018
58 https://sites.google.com/site/ijcsis/
ISSN 1947-5500
4. to predict the student response based on student response data
through time, and skills parameter. Two different grain-sized
skill models were tried. Skill model that has large grain-sized
of skills gave more accurate prediction [32].
Ostrow et al. implemented partial credit models within
ASSISTments system to predict student performance. Partial
credit scoring is defined based on penalties for hints and
attempts. The maximum likelihood probabilities for the next
problem correctness within each test fold are used as predicted
values. Implementing partial credit scoring improves
prediction of student performance within adaptive tutoring
systems [33].
Wang et al. introduced the Opportunity Count Model
(OCM) and investigated the significance of considering OC in
student models [34]. The OCM built separate models for
differing OCs by using random forest to determine
fluctuations in the importance of student performance details
across a dataset stratified by OC.
IV.DATASET AND METHOD
The ASSISTment system is based on Intelligent Tutoring
System technology and is delivered through the web. The
main feature of ASSISTments is that they offer instructional
assistance in the process of assessing students. ASSISTments
utilizes amount and type of the assistance that students receive
as a way to assess the student knowledge. In addition, the
questions and related needed skills to be solved are defined in
[33]. As shown in table I, The dataset contained performance
details for 96,331 transaction log logged by 2,889 unique
students for 978 problems spanning 106 unique Skill Builders
and 269 of these problems are multi-skilled problems. Multi-
skilled problems are problems that need more than one skill to
be solved. Such problems are called original problems.
TABLE I
USED DATASET STATISTICS
Students Questions Original Questions Logs
2889 978 269 96331
ASSISTment system provides a number of original
questions and associated scaffolding questions. The original
questions typically have the same text as in MCAS test
whereas the scaffolding questions were created by content
experts to train students who fail to answer the original
question. The process starts when the student submits an
incorrect answer, then the student is not allowed to try the
original question further, but instead must then answer a
sequence of scaffolding questions that are presented one at a
time. Students work through the scaffolding questions,
probably with hints, until he finally gets the problem correct.
Scaffolding questions allow tracking the learning of
individual skills where each question is related and tagged by
the needed skills to be solved. That is used to express the skill
model. The dataset of our experiment has 103 skills that are
defined for all used questions.
The next sections indicate the reason of for selecting
dataset features. The conducting of the comparative study of
applying various classification techniques on the addressed
dataset is considered also for predicting the student
performance.
A. Dataset Features
Our model works in predicting either the student will
answer a question or not based on a set of features. These
features are the student-id who is accessing the system, the
skills of the question if the question is a multi-skilled question
or single-skill and either the student tried to solve this
question before or not (is_done).
The system based on this information can learn and predict
the student’s behavior against different questions even a
newly added question, as the question itself is an independent
factor in our learning progress, but we care about the skills. So
whatever the question is even the questions not included in the
learning process or added later, we can predict student’s
behavior only based on the skills of a question.
B. Comparative Analysis
This work presents a comprehensive comparative study of
the applicability of the data mining techniques for predicting
student performance. Several data mining techniques, are
investigated and evaluated, logistic regression, SVM and SGD
as linear prediction methods as well as decision tree, random
forest and AdaBoost classifier as non-linear prediction
methods
C. Evaluation methods
The evaluation stage is an essential step for selecting the
appropriate classifier for a given data. In this work, we
adopted several methods to achieve the evaluation task; mean
squared error (MSE), accuracy, sensitivity and specificity.
MSE measures the average of the squares of the errors or
deviations. The model is initially fit on a training dataset,
which is a set of examples used to fit the classifier parameters.
The resulting model is run with the training dataset and
produces a result, which is then compared with the target, for
each input vector in the training dataset, to measure the MSE
for the training data. Then, the fitted model is used to predict
the responses for the observations in another part of the
dataset that called the validation dataset. The obtained results
are compared with each input vector in the validating dataset
to compute the Testing MSE. The MSE is obtained as follows
𝑀𝑆𝐸(𝑦, 𝑦̂) =
1
𝑛 𝑠𝑎𝑚𝑝𝑙𝑒𝑠
∑ (𝑦𝑖, 𝑦̂𝑖)2
(1)
𝑛 𝑠𝑎𝑚𝑝𝑙𝑒𝑠−1
𝑖=0
where 𝑛 𝑠𝑎𝑚𝑝𝑙𝑒𝑠 is the number of predictions, 𝑦̂ is the vector of
observed values, and y ̂ is the vector of predicted values.
Estimation of the accuracy, sensitivity and specificity is
based on four terms, namely:
True positives (TP): positive instances that are correctly
predicted.
True negatives (TN): negative instances that are correctly
predicted.
International Journal of Computer Science and Information Security (IJCSIS),
Vol. 16, No. 3, March 2018
59 https://sites.google.com/site/ijcsis/
ISSN 1947-5500
5. False positives (FP): positive instances that are incorrectly
predicted.
False negatives (FN): negative instances that are incorrectly
predicted.
The accuracy is an empirical rate of correct prediction, the
sensitivity is the ability to correctly classify the output to a
particular class, and the specificity is the ability to predict that
the outputs of other classes are not part of a stated class. These
performance measures are computed as follows:
𝐴𝑐𝑐𝑢𝑟𝑎𝑐𝑦 =
𝑇𝑃 + 𝑇𝑁
𝑇𝑃 + 𝑇𝑁 + 𝐹𝑃 + 𝐹𝑁
(2)
𝑆𝑒𝑛𝑠𝑖𝑡𝑖𝑣𝑖𝑡𝑦 =
𝑇𝑃
𝑇𝑃 + 𝐹𝑁
(3)
𝑆𝑒𝑛𝑠𝑖𝑡𝑖𝑣𝑖𝑡𝑦 =
𝑇𝑃
𝑇𝑃 + 𝐹𝑁
(4)
A k-fold cross-validation is performed to assess the overall
performance of the implemented classifier techniques. The
dataset is split randomly into k consecutive subsets called
folds of approximately the same size. The model is trained
and tested k times. In this paper, 10-fold cross-validation is
applied.
D. Result and Discussion
TABLE II
THE AVERAGE TRAINING AND TESTING MSE FOR DIFFERENT CLASSIFIERS
Classifier Training MSE Testing MSE
Linear SVM 0.5069 0.5079
SGD Classifier 0.4922 0.4929
Logistic Regression 0.3085 0.3089
Decision Tree 0.1158 0.2746
Random Forest Classifier 0.1263 0.2717
AdaBoost Classifier 0.281 0.2817
In the light of the training and testing MSE illustrated in
figure 1 and table II, we can see that the linear SVM has the
worst training and testing MSE (0.507 & 0.508, respectively),
which means that the linear SVM has high variance and high
bias. Decision tree classifier, on the other hand, achieves the
least MSE for the training phase (0.116), while random forest
classifier attains the least MSE for the testing phase (0.272)
with an insignificant difference of the decision tree (0.275).
Fig. 1. The average Training and Testing MSE for the different classifiers
Furthermore, the accuracy, sensitivity and specificity for
the different classifiers are recorded in Table II and displayed
in figure 2, figure 3 and figure 4, respectively. It can be
observed that the worst accuracy value (0.49) is made using
the linear SVM classifier. Additionally the poorest
specificity (~0.0) is produced when applying the linear SVM
and SGD techniques. Conversely, both have the highest
sensitivity, ~1.0. While the minimum sensitivity (0.66) is
reached using the logistic regression method. Regarding the
accuracy, the decision tree and random forest classifiers
realize the best performance (~0.73) with a trivial difference.
On the other hand, the specificity accomplishes its maximum
value (0.9) using the decision tree.
TABLE III
AVERAGE ACCURACY, SENSITIVITY AND SPECIFICITY FOR THE DIFFERENT
CLASSIFIERS
Classifier Accuracy Sensitivity Specificity
Linear SVM 0.4921 0.9975 0.0042
SGD classifier 0.5071 1 0
Logistic Regression 0.6911 0.6641 0.7485
Decision tree 0.7254 0.8036 0.901
Random Forest classifier 0.7283 0.8378 0.8562
AdaBoost classifier 0.7183 0.671 0.764
Fig. 2. Accuracy of the different classifiers
International Journal of Computer Science and Information Security (IJCSIS),
Vol. 16, No. 3, March 2018
60 https://sites.google.com/site/ijcsis/
ISSN 1947-5500
6. Fig. 3. Sensitivity of the different classifiers
Fig. 4. Specificity of the different classifiers
The perfect sensitivity and deficient specificity of the linear
SVM and SGD, signifies that both classifiers are good for
capturing actual true cases, but they fail to catch false
positives. Furthermore, the results indicate that the decision
tree and random forest classifiers have roughly the best
training MSE, testing MSE, accuracy and specificity as well
as a high sensitivity. Nevertheless, their sensitivity is less than
that of linear SVM and SGD classifiers. However, the SVM
and SGD Classifiers have very small accuracy and poor
specificity, as well as very high training and testing MSE.
Additionally, despite that the random forest classifier has a
comparable performance to decision tree approach, it
disburses a long time for training large datasets. Therefore, it
may be emphasized that the decision tree classifier is the best
techniques for predicting new data that hadn’t been fitted
before.
E. Feature Selection
Feature selection is the process of selecting the most
significant features that can be used for model construction.
Furthermore, feature selection techniques delivers a way for
reducing the computation time as well as improving the
prediction performance. In order to reduce the feature domain,
the feature importance for the triumphed classifier (decision
tree) is detected using the Gini coefficient.
The Gini importance reveals the frequency of selecting a
specific feature for a split and the potential of this feature for
discrimination during the classification problem. As stated this
criterion, the features are ranked and nominated preceding to
the classification procedure. Every time a split of a node is
made on variable, the Gini impurity criterion for the two
descendant nodes is less than the parent node. Adding up the
Gini decreases for each individual variable over all trees in the
forest, gives a fast variable importance that is often very
consistent with the permutation importance measure. The Gini
importance principle demonstrated robustness versus noise
and efficiency in selecting beneficial features.
Each feature has an importance in detecting the student’s
behavior against questions, the table below shows the most
important features in predicting student’s behavior. The higher
ratio indicates the more important feature.
As shown in Fig. 5, the most important features for our
model is the student-id, then whether the student fulfilled this
question or not, without caring about his previous answer was
wrong or not, thereafter, if this question is multi_skilled or not.
Fig. 5. Different features important for decision trees classification
The impact of the feature selection on the student
performance prediction is investigated. The features are first
rated descending according to their importance. Then, the
effect of selecting different number of the first ranked features
on predicting the student performance is recorded. Fig. 6
portrays the effect of the number of features on the system
performance. The results reveal that the best performance is
achieved using the initial 80 important features. Despite the
slight improvement in the prediction accuracy (0.7252),
sensitivity (0.8042) and specificity (0.9016), the feature space
is reduced significantly.
Fig. 6. The system performance using different number of features
International Journal of Computer Science and Information Security (IJCSIS),
Vol. 16, No. 3, March 2018
61 https://sites.google.com/site/ijcsis/
ISSN 1947-5500
7. V. CONCLUSION
Students’ performance prediction is typically valuable to
assist the instructors and learners refining their teaching and
learning process. This paper discusses the performance of
different data mining techniques in predicting the student
performance. Logistic regression, SVM, SGD, decision-Trees,
and AdaBoost classifier are investigated and their
classification performance is compared. The obtained results
unveiled that data mining tools may broadly be employed by
education institutions to predict the students’ performance.
The uppermost classification accuracy (0.7254), sensitivity
(0.8036) and specificity (0.901) are produced using decision
tree scheme. Feature selection technique based on Gini
coefficient is performed to reduce the feature space and
improve the system performance. The superior performance is
attained using the first 80 important features.
For future work more extended datasets may be utilized.
Furthermore, hybrid classification techniques can be applied
for predicting the students’ performance.
VI.ACKNOWLEDGMENT
We used the 'Assistments Math 2006-2007 (5046 Students)'
dataset accessed via DataShop [35].
REFERENCES
[1] J. Beck, M. Stern, and E. Haugsjaa, Applications of AI in Education,
ACM Crossroads, 1996
[2] P. Brusilovsky, E. Schwarz, and G. Weber: ELM-ART: An intelligent
tutoring system on World Wide Webí. Third International Conference
on Intelligent Tutoring Systems, ITS-96, 1996, Montreal.
[3] Z. Pardos, N. Heffernan, Modeling individualization in a Bayesian
networks implementation of Knowledge Tracing. In De Bra, P., Kobsa,
A., Chin, D. (eds.) UMAP2010 Proceedings of the 18th International
Conference on User Modeling, Adaptation, and Personalization, LNCS
vol. 6075, pp. 255-266,. 2010, Springer, Heidelberg
[4] M. Feng, and N. Heffernan, Assessing Students’ Performance: Item
Difficulty Parameter vs. Skill Learning Tracking. Paper presented at
the National Council on Educational Measurement 2007 Annual
Conference, Chicago.
[5] R. Roll, V, Baker, and K. Koedinger, What goals do students have
when choosing the actions they perform? In Proceedings of the 6th
International Conference on Cognitive Modelling, (pp. 380-381), 2004,.
Pittsburgh, PA.
[6] M. Mhlenbrock, and O. Scheue, Using action analysis in active math to
estimate student motivation. In Proceedings of the 13th Annual
Workshop of the SIG Adaptivity and User Modelling in Interactive
Systems (ABIS 2005), pp. 56-60, 2005, Saarbrcken Germany.
[7] C. Carmona, G. Castillo, and E. Millan, Designing a dynamic network
for modelling students’ learning style. In Proceedings of the 8th IEEE
International Conference on Advanced Learning Technologies, pp.
346-350, 2008. Santander, Cantabria.
[8] Y. Gong,. Student modeling in Intelligent Tutoring Systems. Worcester
Polytechnic Institute, 2014
[9] N. Khodeir, H. Elazhary and N. Wanas :Rule-based Cognitive
Modeling and Model Tracing for Symbolization in a Math Story
Problem Tutor. International Journal of Emerging Technologies in
Learning 12 (4). 2017.
[10] J. Anderson, A Corbett, Cognitive tutors: Lessons learned., Journal of
the Learning Sciences 4(2): 167-207, 1995.
[11] S. Ohlsson, Learning from performance errors." Psychological Review
103: 241-262. 1996
[12] W. Hämäläinen, M. Vinni,. Comparison of machine learning methods
for intelligent tutoring systems. In international conference in
intelligent tutoring systems, Taiwan, 525-534. 2006.
[13] C. Romero , S. Ventura, "Educational data mining: A survey from
1995 to 2005", Expert Systems with Applications 33 (2007) 135–146
[14] Tan, P., M. Steinbach, et al., Eds. Introduction to Data Mining.,
Addison-Wesley, 2005.
[15] A. Shahiria, W.Husaina, and N. Rashida, A Review on Predicting
Student’s Performance using Data Mining Techniques”, Procedia
Computer Science 72 414 – 422, 2015 .
[16] Nasrabadi, Nasser M, Pattern Recognition and Machine Learning,
Journal of electronic imaging, 16(4), 2007.
[17] M. Schmidt, N. Le Roux, and F. Bach, Minimizing Finite Sums with
the Stochastic Average Gradient. 162(1,2), p 83—112, 2017
[18] A. Defazio, F. Bach, S. Lacoste-Julien: SAGA, A Fast Incremental
Gradient Method With Support for Non-Strongly Convex Composite
Objectives, Advances in neural information processing systems,
p1646—1654, 2014
[19] C. Burges, A tutorial on support vector machines for pattern
recognition. Data Mining and Knowledge Discovery 2: 1:47, 1998.
[20] T. Zhang, “Solving large scale linear prediction problems using
stochastic gradient descent algorithms” - In Proceedings of ICML ‘04.
2004.
[21] T. Hastie, R. Tibshirani and J. Friedman. Elements of Statistical
Learning, Springer, 2009.
[22] Quinlan, J. Ross, C4. 5: programs for machine learning, Morgan
Kaufmann, 1993.
[23] A. Prinzie and P. Van den, Random multiclass classification:
Generalizing random forests to random mnl and random nb,
International Conference on Database and Expert Systems Applications,
p. 349--358, 2007.
[24] J. Zhu, H. Zou, S. Rosset, T. Hastie, Multi-class AdaBoost , 2009.
[25] BA Pereira, A. Pai, C Fernandes. A Comparative Analysis of Decision
Tree Algorithms for Predicting Student’s Performance - International
Journal of Engineering Science and Computing, April, 2017
[26] W., Y.-h., and Liao, H.-C. Data mining for adaptive learning in a tesl-
based e-learning system. Expert Systems with Applications 38(6), p
6480–6485, . 2011.
[27] N. Thai-Nghe, L.; Drumond, L. and T. Horvath, T.;
[28] L. Schmidt-Thieme, Multi-relational factorization models for
predicting student performance. In Proc. of the KDD Workshop on
Knowledge Discovery in Educational Data. Citeseer. 2011.
[29] A., Toscher, and M. Jahrer, Collaborative filtering applied to
educational data mining. KDD cup. 2010
[30] Z. Pardos, and N. Heffernan, Using hmms and bagged decision trees to
leverage rich features of user and skill from an intelligent tutoring
system dataset. Journal of Machine Learning Research W & CP. 2010.
[31] N. Thai, Janecek, P. Haddawy, A comparative analysis of techniques
for predicting academic performance. In Proc. of 37th Conf. on
ASEE/IEEE Frontiers in Education, 2007
[32] J. Xu, Y. Han, D. Marcu, M. van der Schaar. Progressive Prediction of
Student Performance in College Programs. Proceedings of the Thirty-
First AAAI Conference on Artificial Intelligence (AAAI-17) p. 1604—
1610, 2017.
[33] M. Feng and N. Heffernan, Assessing Students’ Performance
Longitudinally: Item Difficulty Parameter vs. Skill Learning Tracking
[34] K. Ostrow, C. Donnelly, and N. Heffernan, “Optimizing partial credit
algorithms to predict student performance,” Proc. 8th International
Conference on Educational Data Mining, pp.404–407, Madrid, Spain,
2015
[35] Y Wang, K Ostrow, S Adjei, N Heffernan, The Opportunity Count
Model: A Flexible Approach to Modeling Student Performance.
Proceedings of the Third ACM Conference on Learning @ Scale Pages
p 113-116, 2016.
[36] K. Koedinger, R. Baker, K, Cunningham, A. Skogsholm, B. Leber, J.
Stamper, A Data Repository for the EDM community: The PSLC
DataShop. In Romero, C., Ventura, S., Pechenizkiy, M., Baker, R.S.J.d.
(Eds.) Handbook of Educational Data Mining. Boca Raton, FL: CRC
Press, 2010.
International Journal of Computer Science and Information Security (IJCSIS),
Vol. 16, No. 3, March 2018
62 https://sites.google.com/site/ijcsis/
ISSN 1947-5500