In this paper, we apply different data mining approaches for the purpose of examining and predicting students’ dropouts through their university programs. For the subject of the study we select a total of 1290 records of computer science students Graduated from ALAQSA University between 2005 and 2011. The
collected data included student study history and transcript for courses taught in the first two years of
computer science major in addition to student GPA , high school average , and class label of (yes ,No) to
indicate whether the student graduated from the chosen major or not. In order to classify and predict
dropout students, different classifiers have been trained on our data sets including Decision Tree (DT),
Naive Bayes (NB). These methods were tested using 10-fold cross validation. The accuracy of DT, and NlB
classifiers were 98.14% and 96.86% respectively. The study also includes discovering hidden relationships
between student dropout status and enrolment persistence by mining a frequent cases using FP-growth
algorithm.
Predicting instructor performance using data mining techniques in higher educ...redpel dot com
Predicting instructor performance using data mining techniques in higher education
for more ieee paper / full abstract / implementation , just visit www.redpel.com
Data Mining Techniques for School Failure and Dropout SystemKumar Goud
Abstract: Data mining techniques are applied to predict college failure and bum of the student. This is method uses real data on middle-school students for prediction of failure and drop out. It implements white-box classification strategies, like induction rules and decision trees or call trees. Call tree could be a call support tool that uses tree-like graph or a model of call and their possible consequences. A call tree is a flowchart-like structure in which internal node represents a "test" on an attribute. Attribute is the real information of students that is collected from college in middle or pedagogy, each branch represents the outcome of the test and each leaf node represents a class label. The paths from root to leaf represent classification rules and it consists of three kinds of nodes which incorporates call node, likelihood node and finish node. It is specifically used in call analysis. Using this technique to boost their correctness for predicting which students might fail or dropout (idler) by first, using all the accessible attributes next, choosing the most effective attributes. Attribute choice is done by using WEKA tool.
Keywords: dataset, classification, clustering.
Predicting student performance in higher education using multi-regression modelsTELKOMNIKA JOURNAL
Supporting the goal of higher education to produce graduation who will be a professional leader is a crucial. Most of universities implement intelligent information system (IIS) to support in achieving their vision and mission. One of the features of IIS is student performance prediction. By implementing data mining model in IIS, this feature could precisely predict the student’ grade for their enrolled subjects. Moreover, it can recognize at-risk students and allow top educational management to take educative interventions in order to succeed academically. In this research, multi-regression model was proposed to build model for every student. In our model, learning management system (LMS) activity logs were computed. Based on the testing result on big students datasets, courses, and activities indicates that these models could improve the accuracy of prediction model by over 15%.
Data Mining Techniques in Higher Education an Empirical Study for the Univer...IJMER
Nowadays, ones of the biggest challenges that educational institutions face is the explosive
growth of educational data. and how to use these data to improve the quality of managerial decisions.
Data mining, as an analytical tools that can be used to extract meaningful knowledge from large data
sets, can be used to achieve this goal.
This paper addresses the applications of Educational Data Mining (EDM) to extract useful information
from registration information of student at university of Palestine in Gaza strip. The data include five
years period [2005-2011] by providing analytical tool to view and use this information for decision
making processes by taking real life example such as grade and GPA for the students. abstract should
summarize the content of the paper.
Students Knowledge based Advisor System for Colleges Admission With an Applie...IJRES Journal
This paper presents a knowledge based advisor system designed to aid preparatory year students to achieve their desire of colleges they wanted to be dwelled allocated. The system predicts the 2nd term GPA, and the final term GPA by knowing the 1st semester GPA, then advise students to suitable colleges. A student who is far from achieving his desire is advised to study some aided courses. The system uses historical data extracted from the university database. Then statistical prediction algorithms are developed, to evaluate and to match students’ current desires with colleges’ qualified criteria.
STUDENTS’ PERFORMANCE PREDICTION SYSTEM USING MULTI AGENT DATA MINING TECHNIQUEIJDKP
The document discusses a proposed students' performance prediction system using multi-agent data mining techniques. It aims to predict student performance with high accuracy and help low-performing students. The system uses ensemble classifiers like Adaboost.M1 and LogitBoost and compares their prediction accuracy to the single classifier C4.5 decision tree. Experimental results showed SAMME boosting improved prediction accuracy over C4.5 and LogitBoost.
Student Performance Evaluation in Education Sector Using Prediction and Clust...IJSRD
Data mining is the crucial steps to find out previously unknown information from large relational database. various technique and algorithm are their used in data mining such as association rules, clustering and classification and prediction techniques. Ease of the techniques contains particular characteristics and behaviour. In this paper the prime focus on clustering technique and prediction technique. Now a days large amount of data stored in educational database increasing rapidly. The database for particular set of student was collected. The clustering and prediction is made on some detailed manner and the results were produce. The K-means clustering algorithm is used here. To find nearest possible a cluster a similar group the turning point India is the performance in higher education for all students. This academic performance is influenced by various factor, therefore to identify the difference between high learners and slow learner students it is important for student performance to develop predictive data mining model.
A Model for Predicting Students’ Academic Performance using a Hybrid of K-mea...Editor IJCATR
Higher learning institutions nowadays operate in a more complex and competitive due to a high demand from prospective
students and an emerging increase of universities both public and private. Management of Universities face challenges and concerns of
predicting students’ academic performance in to put mechanisms in place prior enough for their improvement. This research aims at
employing Decision tree and K-means data mining algorithms to model an approach to predict the performance of students in advance
so as to devise mechanisms of alleviating student dropout rates and improve on performance. In Kenya for example, there has been
witnessed an increase student enrolling in universities since the Government started free primary education. Therefore the Government
expects an increased workforce of professionals from these institutions without compromising quality so as to achieve its millennium
development and vision 2030. Backlog of students not finishing their studies in stipulated time due to poor performance is another
issue that can be addressed from the results of this research since predicting student performance in advance will enable University
management to devise ways of assisting weak students and even make more decisions on how to select students for particular courses.
Previous studies have been done Educational Data Mining mostly focusing on factors affecting students’ performance and also used
different algorithms in predicting students’ performance. In all these researches, accuracy of prediction is key and what researchers
look forward to try and improve.
Predicting instructor performance using data mining techniques in higher educ...redpel dot com
Predicting instructor performance using data mining techniques in higher education
for more ieee paper / full abstract / implementation , just visit www.redpel.com
Data Mining Techniques for School Failure and Dropout SystemKumar Goud
Abstract: Data mining techniques are applied to predict college failure and bum of the student. This is method uses real data on middle-school students for prediction of failure and drop out. It implements white-box classification strategies, like induction rules and decision trees or call trees. Call tree could be a call support tool that uses tree-like graph or a model of call and their possible consequences. A call tree is a flowchart-like structure in which internal node represents a "test" on an attribute. Attribute is the real information of students that is collected from college in middle or pedagogy, each branch represents the outcome of the test and each leaf node represents a class label. The paths from root to leaf represent classification rules and it consists of three kinds of nodes which incorporates call node, likelihood node and finish node. It is specifically used in call analysis. Using this technique to boost their correctness for predicting which students might fail or dropout (idler) by first, using all the accessible attributes next, choosing the most effective attributes. Attribute choice is done by using WEKA tool.
Keywords: dataset, classification, clustering.
Predicting student performance in higher education using multi-regression modelsTELKOMNIKA JOURNAL
Supporting the goal of higher education to produce graduation who will be a professional leader is a crucial. Most of universities implement intelligent information system (IIS) to support in achieving their vision and mission. One of the features of IIS is student performance prediction. By implementing data mining model in IIS, this feature could precisely predict the student’ grade for their enrolled subjects. Moreover, it can recognize at-risk students and allow top educational management to take educative interventions in order to succeed academically. In this research, multi-regression model was proposed to build model for every student. In our model, learning management system (LMS) activity logs were computed. Based on the testing result on big students datasets, courses, and activities indicates that these models could improve the accuracy of prediction model by over 15%.
Data Mining Techniques in Higher Education an Empirical Study for the Univer...IJMER
Nowadays, ones of the biggest challenges that educational institutions face is the explosive
growth of educational data. and how to use these data to improve the quality of managerial decisions.
Data mining, as an analytical tools that can be used to extract meaningful knowledge from large data
sets, can be used to achieve this goal.
This paper addresses the applications of Educational Data Mining (EDM) to extract useful information
from registration information of student at university of Palestine in Gaza strip. The data include five
years period [2005-2011] by providing analytical tool to view and use this information for decision
making processes by taking real life example such as grade and GPA for the students. abstract should
summarize the content of the paper.
Students Knowledge based Advisor System for Colleges Admission With an Applie...IJRES Journal
This paper presents a knowledge based advisor system designed to aid preparatory year students to achieve their desire of colleges they wanted to be dwelled allocated. The system predicts the 2nd term GPA, and the final term GPA by knowing the 1st semester GPA, then advise students to suitable colleges. A student who is far from achieving his desire is advised to study some aided courses. The system uses historical data extracted from the university database. Then statistical prediction algorithms are developed, to evaluate and to match students’ current desires with colleges’ qualified criteria.
STUDENTS’ PERFORMANCE PREDICTION SYSTEM USING MULTI AGENT DATA MINING TECHNIQUEIJDKP
The document discusses a proposed students' performance prediction system using multi-agent data mining techniques. It aims to predict student performance with high accuracy and help low-performing students. The system uses ensemble classifiers like Adaboost.M1 and LogitBoost and compares their prediction accuracy to the single classifier C4.5 decision tree. Experimental results showed SAMME boosting improved prediction accuracy over C4.5 and LogitBoost.
Student Performance Evaluation in Education Sector Using Prediction and Clust...IJSRD
Data mining is the crucial steps to find out previously unknown information from large relational database. various technique and algorithm are their used in data mining such as association rules, clustering and classification and prediction techniques. Ease of the techniques contains particular characteristics and behaviour. In this paper the prime focus on clustering technique and prediction technique. Now a days large amount of data stored in educational database increasing rapidly. The database for particular set of student was collected. The clustering and prediction is made on some detailed manner and the results were produce. The K-means clustering algorithm is used here. To find nearest possible a cluster a similar group the turning point India is the performance in higher education for all students. This academic performance is influenced by various factor, therefore to identify the difference between high learners and slow learner students it is important for student performance to develop predictive data mining model.
A Model for Predicting Students’ Academic Performance using a Hybrid of K-mea...Editor IJCATR
Higher learning institutions nowadays operate in a more complex and competitive due to a high demand from prospective
students and an emerging increase of universities both public and private. Management of Universities face challenges and concerns of
predicting students’ academic performance in to put mechanisms in place prior enough for their improvement. This research aims at
employing Decision tree and K-means data mining algorithms to model an approach to predict the performance of students in advance
so as to devise mechanisms of alleviating student dropout rates and improve on performance. In Kenya for example, there has been
witnessed an increase student enrolling in universities since the Government started free primary education. Therefore the Government
expects an increased workforce of professionals from these institutions without compromising quality so as to achieve its millennium
development and vision 2030. Backlog of students not finishing their studies in stipulated time due to poor performance is another
issue that can be addressed from the results of this research since predicting student performance in advance will enable University
management to devise ways of assisting weak students and even make more decisions on how to select students for particular courses.
Previous studies have been done Educational Data Mining mostly focusing on factors affecting students’ performance and also used
different algorithms in predicting students’ performance. In all these researches, accuracy of prediction is key and what researchers
look forward to try and improve.
Clustering Students of Computer in Terms of Level of ProgrammingEditor IJCATR
Educational data mining (EDM) is one of the applications of data mining. In educational data mining, there are two key domains, i.e. student domain and faculty domain. Different type of research work has been done in both domains.
In existing system the faculty performance has calculated on the basis of two parameters i.e. Student feedback and the result of student in that subject. In existing system we define two approaches one is multiple classifier approach and the other is a single classifier approach and comparing them, for relative evaluation of faculty performance using data mining
Techniques. In multiple classifier approach K-nearest neighbor (KNN) is used in first step and Rule based classification is used in the second step of classification while in single classifier approach only KNN is used in both steps of classification.
But in proposed system, I will analyse the faculty performance using 4 parameters i.e., student complaint about faculty, Student review feedback for faculty, students feedback, and students result etc.
For this proposed system I will be going to use opinion mining technique for analyzing performance of faculty and calculating score of each faculty.
Data Analysis and Result Computation (DARC) Algorithm for Tertiary InstitutionsIOSR Journals
The document describes a Data Analysis and Result Computation (DARC) algorithm written in Fortran for analyzing student data and computing examination results in tertiary institutions. DARC takes student information, course data, examination scores, and previous academic records as input. It outputs analysis of student demographics, individual student result sheets showing grades and GPA/CGPA calculations, summaries of academic performance, and logs of courses passed or still outstanding. Testing on sample student data validated DARC's reliability in accurately processing records and computing results for large student populations in a tertiary institution.
This paper highlights important issues of higher education system such as predicting student’s academic performance. This is trivial to study predominantly from the point of view of the institutional administration, management, different stakeholder, faculty, students as well as parents. For making analysis on the student data we selected algorithms like Decision Tree, Naive Bayes, Random Forest, PART and Bayes Network with three most important techniques such as 10-fold cross-validation, percentage split (74%) and training set. After performing analysis on different metrics (Time to build Classifier, Mean Absolute Error, Root Mean Squared Error, Relative Absolute Error, Root Relative Squared Error, Precision, Recall, F-Measure, ROC Area) by different data mining algorithm, we are able to find which algorithm is performing better than other on the student dataset in hand, so that we are able to make a guideline for future improvement in student performance in education. According to analysis of student dataset we found that Random Forest algorithm gave the best result as compared to another algorithm with Recall value approximately equal to one. The analysis of different data mini g algorithm gave an in-depth awareness about how these algorithms predict student the performance of different student and enhance their skill.
Data mining to predict academic performance. Ranjith Gowda
This document proposes using data warehousing and data mining techniques to predict student academic performance in schools. It describes collecting student data like scores, attendance, discipline, and assignments into a data warehouse. Data mining methods are then used to analyze the student data and identify relationships between variables to predict performance, such as whether students are progressing, being retained, or conditionally progressing. The results could help schools identify students at risk of failing and take actions to help them succeed.
Predicting Success : An Application of Data Mining Techniques to Student Outc...IJDKP
This project examines the effectiveness of applying machine learning techniques to the realm of college
student success, specifically with the intent of discovering and identifying those student characteristics and
factors that show the strongest predictive capability with regards to successful graduation. The student
data examined consists of first time freshmen and transfer students who matriculated at California State
University San Marcos in the period of Fall 2000 through Fall 2010 and who either graduated successfully
or discontinued their education. Operating on over 30,000 student observations, random forests are used
to determine the relative importance of the student characteristics with genetic algorithms to perform
feature selection and pruning. To improve the machine learning algorithm cross validated hyperparameter
tuning was also implemented. Overall predictive strength is relatively high as measured by the
Matthews Correlation Coefficient, and both intuitive and novel features which provide support for the
learning model are explored.
This document discusses using data mining techniques to analyze faculty performance at an engineering college in India. It proposes analyzing 4 parameters - student complaints, feedback, results, and reviews - to evaluate faculty instead of just 2 parameters (feedback and results) used previously. It will use opinion mining to analyze faculty performance and calculate scores. The system will collect data, preprocess it, apply a KNN algorithm to the 4 parameters to calculate scores for each faculty, sum the scores, classify results using rule-based classification, and analyze outcomes by subject and class. It reviews related work applying educational data mining and concludes the multiple classifier approach is better, and future work could consider more parameters and expand to all college branches and departments.
Predicting students' performance using id3 and c4.5 classification algorithmsIJDKP
An educational institution needs to have an approximate prior knowledge of enrolled students to predict
their performance in future academics. This helps them to identify promising students and also provides
them an opportunity to pay attention to and improve those who would probably get lower grades. As a
solution, we have developed a system which can predict the performance of students from their previous
performances using concepts of data mining techniques under Classification. We have analyzed the data
set containing information about students, such as gender, marks scored in the board examinations of
classes X and XII, marks and rank in entrance examinations and results in first year of the previous batch
of students. By applying the ID3 (Iterative Dichotomiser 3) and C4.5 classification algorithms on this data,
we have predicted the general and individual performance of freshly admitted students in future
examinations.
Role of education is very critical for the development of any country. So it is the responsibility of each and every person to do something for the betterment of education. Taking this fact into consideration we start working on the education system. Education system ranging from basic to higher education. Now a day education system generates a lots of data related to student. If we cannot analyze that data properly then that data is useless. With the help of data mining techniques we can find the hidden information from the data collected for the different educational setting. With the help of that information we can review our educational process or make improvement in our education system. Here in this article we are considering a case of an engineering college student and try to predict the final result in advance. The result of the prediction provides timely help to those students who are on risk of failure in the final examination. There are different techniques of data mining are available and we are using J48, RandomForest, and ADTree to predict the performance of the student in their final examination. On the basis of this predication we can make a decision whether the student will be promoted to next year or not. We the help of the result we can improve the performance of the student who are on risk of fail or promoted. After the declaration of the final result of the student, result is fed into the system and hence the result will analysed for the next semester. The comparative result shows that, prediction help in the improvement of overall result of the weaker students.
Machine Learning Regression Analysis of EDX 2012-13 Data for Identifying the ...IJITE
Predictive models are able to predict edX student grades with an accuracy error of 0.1 (10%, about one letter grade standard deviation), based on participation data. Student background variables are not useful for predicting grades. By using a combination of segmentation, random forest regression, linear transformation and application beyond the segmented data, it is possible to determine the population of the Auditors student use case, a population larger than those students completing courses with grades.
Predictive models are able to predict edX student grades with an accuracy error of 0.1 (10%, about one
letter grade standard deviation), based on participation data. Student background variables are not useful
for predicting grades. By using a combination of segmentation, random forest regression, linear
transformation and application beyond the segmented data, it is possible to determine the population of the
Auditors student use case, a population larger than those students completing courses with grades.
This document presents a study that uses linear regression to predict university freshmen's academic performance (GPA) based on their scores on the Joint Matriculation Examination (JME). The study finds a weak positive correlation (R=0.137) between GPA and JME scores, with the regression model only explaining 1.9% of variability in GPA. Statistical tests show no significant relationship between JME score and university GPA (p>0.05). The study concludes that JME score is not a strong predictor of freshmen academic performance.
Empirical Study on Classification Algorithm For Evaluation of Students Academ...iosrjce
Data mining techniques (DMT) are extensively used in educational field to find new hidden patterns
from student’s data. In recent years, the greatest issues that educational institutions are facing the unstable
expansion of educational data and to utilize this information data to progress the quality of managerial
decisions. Educational institutions are playing a prominent role in the public and also playing an essential role
for enlargement and progress of nation. The idea is predicting the paths of students, thus identifying the student
achievement. The data mining methods are very useful in predicting the educational database. Educational data
mining is concerns with improving techniques for determining knowledge from data which comes from the
educational database. However it has issue with accuracy of classification algorithms. To overcome this
problem the higher accuracy of the classification J48 algorithm is used. This work takes consideration with the
locality and the performance of the student in education in order to analyse the student achievement is high over
schooling or in graduation
A Study on Learning Factor Analysis – An Educational Data Mining Technique fo...iosrjce
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
PREDICTING ACADEMIC MAJOR OF STUDENTS USING BAYESIAN NETWORKS TO THE CASE OF ...ijcax
In this study, which took place current year in the city of Maragheh in IRAN. Number of high school students in the fields of study: mathematics, Experimental Sciences, humanities, vocational, business and science were studied and compared. The purpose of this research is to predict the academic major of high school students using Bayesian networks. The effective factors have been used in academic major selection for the first time as an effective indicator of Bayesian networks. Evaluation of Impacts of indicators on each other, discretization data and processing them was performed by GeNIe. The proper course would be advised for students to continue their education.
Recommendation of Data Mining Technique in Higher Education Prof. Priya Thaka...ijceronline
International Journal of Computational Engineering Research (IJCER) is dedicated to protecting personal information and will make every reasonable effort to handle collected information appropriately. All information collected, as well as related requests, will be handled as carefully and efficiently as possible in accordance with IJCER standards for integrity and objectivity.
The efficiency examination of teaching of different normalization methodsijdms
1. The document discusses different methods for teaching database normalization to students and examines their effectiveness through surveys of students. It introduces conventional normalization, normalization based on dependency diagrams, normalization through solving practice tests, and a "cookbook" normalization method.
2. Surveys found that students generally found normalization the most difficult topic, with determining normal forms and decomposing tables being particularly challenging. Additional methods helped improve student understanding to varying degrees.
3. The author concludes by discussing how they teach multiple normalization methods, including conventional and dependency diagram approaches, and have students practice through computational tests to reinforce the normalization process step-by-step. Further research on the most effective teaching methods could help optimize student learning of this important database
PATTERN GENERATION FOR COMPLEX DATA USING HYBRID MININGIJDKP
This document discusses a hybrid data mining approach called combined mining that can generate informative patterns from complex data sources. It proposes applying three techniques: 1) Using the Lossy-counting algorithm on individual data sources to obtain frequent itemsets, 2) Generating incremental pair and cluster patterns using a multi-feature approach, 3) Combining FP-growth and Bayesian Belief Network using a multi-method approach to generate classifiers. The approach is tested on two datasets to obtain more useful knowledge and the results are compared.
IDENTIFICATION OF OUTLIERS IN OXAZOLINES AND OXAZOLES HIGH DIMENSION MOLECULA...IJDKP
This document summarizes an algorithm called Principal Component Outlier Detection (PrCmpOut) for identifying outliers in high-dimensional molecular descriptor datasets. PrCmpOut uses principal component analysis to transform the data into a lower-dimensional space, where it can more efficiently detect outliers using robust estimators of location and covariance. The properties of PrCmpOut are analyzed and compared to other robust outlier detection methods through simulation studies using a dataset of oxazoline and oxazole molecular descriptors. Numerical results show PrCmpOut performs well at outlier detection in high-dimensional data.
USING GOOGLE’S KEYWORD RELATION IN MULTIDOMAIN DOCUMENT CLASSIFICATIONIJDKP
The document describes a new method for multi-domain document classification using keyword sequences extracted from documents. It introduces the Word AdHoc Network (WANET) system which uses Google's Keyword Relation and a new similarity measurement called Google Purity to classify documents into domains based on their extracted 4-word keyword sequences, without requiring pre-established keyword repositories. Experimental results showed the classification was accurate and efficient, allowing cross-domain classification and management of knowledge from different sources.
On a business level, everyone wants to get hold of the business value and other organizational advantages that big data has to offer. Analytics has arisen as the primitive path to business value from big data. Hadoop is not just a storage platform for big data; it’s also a computational and processing platform for business analytics. Hadoop is, however, unsuccessful in fulfilling business requirements when it comes to live data streaming. The initial architecture of Apache Hadoop did not solve the problem of live stream data mining. In summary, the traditional approach of big data being co-relational to Hadoop is false; focus needs to be given on business value as well. Data Warehousing, Hadoop and stream processing complement each other very well. In this paper, we have tried reviewing a few frameworks and products
which use real time data streaming by providing modifications to Hadoop.
This document discusses methods for incremental machine learning. It begins with an introduction that defines incremental learning as adapting what has been learned based on new examples over time, without assuming all training data is initially available.
The document then reviews two traditional approaches: data accumulation, where a new hypothesis is generated based on all accumulated data when new data arrives, and ensemble learning, where new hypotheses are generated from new data and combined through voting.
Specific incremental learning algorithms are then summarized, including unsupervised methods like CF1, CF2, and ARTMAP, and supervised methods like Learn++, Learn++.NC, and using genetic algorithms. The document provides an overview of current incremental learning research.
This document summarizes and evaluates various rule extraction algorithms from trained artificial neural networks. It begins with an introduction explaining the importance of explanation capabilities for neural networks. It then provides a taxonomy for classifying rule extraction approaches based on the expressiveness of the extracted rules, whether the approach takes an open-box or black-box view of the neural network, any specialized training regimes used, the quality of explanations generated, and computational complexity. The document discusses sensitivity analysis as a basic method for understanding neural network relationships before focusing on decompositional and pedagogical rule extraction approaches.
Clustering Students of Computer in Terms of Level of ProgrammingEditor IJCATR
Educational data mining (EDM) is one of the applications of data mining. In educational data mining, there are two key domains, i.e. student domain and faculty domain. Different type of research work has been done in both domains.
In existing system the faculty performance has calculated on the basis of two parameters i.e. Student feedback and the result of student in that subject. In existing system we define two approaches one is multiple classifier approach and the other is a single classifier approach and comparing them, for relative evaluation of faculty performance using data mining
Techniques. In multiple classifier approach K-nearest neighbor (KNN) is used in first step and Rule based classification is used in the second step of classification while in single classifier approach only KNN is used in both steps of classification.
But in proposed system, I will analyse the faculty performance using 4 parameters i.e., student complaint about faculty, Student review feedback for faculty, students feedback, and students result etc.
For this proposed system I will be going to use opinion mining technique for analyzing performance of faculty and calculating score of each faculty.
Data Analysis and Result Computation (DARC) Algorithm for Tertiary InstitutionsIOSR Journals
The document describes a Data Analysis and Result Computation (DARC) algorithm written in Fortran for analyzing student data and computing examination results in tertiary institutions. DARC takes student information, course data, examination scores, and previous academic records as input. It outputs analysis of student demographics, individual student result sheets showing grades and GPA/CGPA calculations, summaries of academic performance, and logs of courses passed or still outstanding. Testing on sample student data validated DARC's reliability in accurately processing records and computing results for large student populations in a tertiary institution.
This paper highlights important issues of higher education system such as predicting student’s academic performance. This is trivial to study predominantly from the point of view of the institutional administration, management, different stakeholder, faculty, students as well as parents. For making analysis on the student data we selected algorithms like Decision Tree, Naive Bayes, Random Forest, PART and Bayes Network with three most important techniques such as 10-fold cross-validation, percentage split (74%) and training set. After performing analysis on different metrics (Time to build Classifier, Mean Absolute Error, Root Mean Squared Error, Relative Absolute Error, Root Relative Squared Error, Precision, Recall, F-Measure, ROC Area) by different data mining algorithm, we are able to find which algorithm is performing better than other on the student dataset in hand, so that we are able to make a guideline for future improvement in student performance in education. According to analysis of student dataset we found that Random Forest algorithm gave the best result as compared to another algorithm with Recall value approximately equal to one. The analysis of different data mini g algorithm gave an in-depth awareness about how these algorithms predict student the performance of different student and enhance their skill.
Data mining to predict academic performance. Ranjith Gowda
This document proposes using data warehousing and data mining techniques to predict student academic performance in schools. It describes collecting student data like scores, attendance, discipline, and assignments into a data warehouse. Data mining methods are then used to analyze the student data and identify relationships between variables to predict performance, such as whether students are progressing, being retained, or conditionally progressing. The results could help schools identify students at risk of failing and take actions to help them succeed.
Predicting Success : An Application of Data Mining Techniques to Student Outc...IJDKP
This project examines the effectiveness of applying machine learning techniques to the realm of college
student success, specifically with the intent of discovering and identifying those student characteristics and
factors that show the strongest predictive capability with regards to successful graduation. The student
data examined consists of first time freshmen and transfer students who matriculated at California State
University San Marcos in the period of Fall 2000 through Fall 2010 and who either graduated successfully
or discontinued their education. Operating on over 30,000 student observations, random forests are used
to determine the relative importance of the student characteristics with genetic algorithms to perform
feature selection and pruning. To improve the machine learning algorithm cross validated hyperparameter
tuning was also implemented. Overall predictive strength is relatively high as measured by the
Matthews Correlation Coefficient, and both intuitive and novel features which provide support for the
learning model are explored.
This document discusses using data mining techniques to analyze faculty performance at an engineering college in India. It proposes analyzing 4 parameters - student complaints, feedback, results, and reviews - to evaluate faculty instead of just 2 parameters (feedback and results) used previously. It will use opinion mining to analyze faculty performance and calculate scores. The system will collect data, preprocess it, apply a KNN algorithm to the 4 parameters to calculate scores for each faculty, sum the scores, classify results using rule-based classification, and analyze outcomes by subject and class. It reviews related work applying educational data mining and concludes the multiple classifier approach is better, and future work could consider more parameters and expand to all college branches and departments.
Predicting students' performance using id3 and c4.5 classification algorithmsIJDKP
An educational institution needs to have an approximate prior knowledge of enrolled students to predict
their performance in future academics. This helps them to identify promising students and also provides
them an opportunity to pay attention to and improve those who would probably get lower grades. As a
solution, we have developed a system which can predict the performance of students from their previous
performances using concepts of data mining techniques under Classification. We have analyzed the data
set containing information about students, such as gender, marks scored in the board examinations of
classes X and XII, marks and rank in entrance examinations and results in first year of the previous batch
of students. By applying the ID3 (Iterative Dichotomiser 3) and C4.5 classification algorithms on this data,
we have predicted the general and individual performance of freshly admitted students in future
examinations.
Role of education is very critical for the development of any country. So it is the responsibility of each and every person to do something for the betterment of education. Taking this fact into consideration we start working on the education system. Education system ranging from basic to higher education. Now a day education system generates a lots of data related to student. If we cannot analyze that data properly then that data is useless. With the help of data mining techniques we can find the hidden information from the data collected for the different educational setting. With the help of that information we can review our educational process or make improvement in our education system. Here in this article we are considering a case of an engineering college student and try to predict the final result in advance. The result of the prediction provides timely help to those students who are on risk of failure in the final examination. There are different techniques of data mining are available and we are using J48, RandomForest, and ADTree to predict the performance of the student in their final examination. On the basis of this predication we can make a decision whether the student will be promoted to next year or not. We the help of the result we can improve the performance of the student who are on risk of fail or promoted. After the declaration of the final result of the student, result is fed into the system and hence the result will analysed for the next semester. The comparative result shows that, prediction help in the improvement of overall result of the weaker students.
Machine Learning Regression Analysis of EDX 2012-13 Data for Identifying the ...IJITE
Predictive models are able to predict edX student grades with an accuracy error of 0.1 (10%, about one letter grade standard deviation), based on participation data. Student background variables are not useful for predicting grades. By using a combination of segmentation, random forest regression, linear transformation and application beyond the segmented data, it is possible to determine the population of the Auditors student use case, a population larger than those students completing courses with grades.
Predictive models are able to predict edX student grades with an accuracy error of 0.1 (10%, about one
letter grade standard deviation), based on participation data. Student background variables are not useful
for predicting grades. By using a combination of segmentation, random forest regression, linear
transformation and application beyond the segmented data, it is possible to determine the population of the
Auditors student use case, a population larger than those students completing courses with grades.
This document presents a study that uses linear regression to predict university freshmen's academic performance (GPA) based on their scores on the Joint Matriculation Examination (JME). The study finds a weak positive correlation (R=0.137) between GPA and JME scores, with the regression model only explaining 1.9% of variability in GPA. Statistical tests show no significant relationship between JME score and university GPA (p>0.05). The study concludes that JME score is not a strong predictor of freshmen academic performance.
Empirical Study on Classification Algorithm For Evaluation of Students Academ...iosrjce
Data mining techniques (DMT) are extensively used in educational field to find new hidden patterns
from student’s data. In recent years, the greatest issues that educational institutions are facing the unstable
expansion of educational data and to utilize this information data to progress the quality of managerial
decisions. Educational institutions are playing a prominent role in the public and also playing an essential role
for enlargement and progress of nation. The idea is predicting the paths of students, thus identifying the student
achievement. The data mining methods are very useful in predicting the educational database. Educational data
mining is concerns with improving techniques for determining knowledge from data which comes from the
educational database. However it has issue with accuracy of classification algorithms. To overcome this
problem the higher accuracy of the classification J48 algorithm is used. This work takes consideration with the
locality and the performance of the student in education in order to analyse the student achievement is high over
schooling or in graduation
A Study on Learning Factor Analysis – An Educational Data Mining Technique fo...iosrjce
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
PREDICTING ACADEMIC MAJOR OF STUDENTS USING BAYESIAN NETWORKS TO THE CASE OF ...ijcax
In this study, which took place current year in the city of Maragheh in IRAN. Number of high school students in the fields of study: mathematics, Experimental Sciences, humanities, vocational, business and science were studied and compared. The purpose of this research is to predict the academic major of high school students using Bayesian networks. The effective factors have been used in academic major selection for the first time as an effective indicator of Bayesian networks. Evaluation of Impacts of indicators on each other, discretization data and processing them was performed by GeNIe. The proper course would be advised for students to continue their education.
Recommendation of Data Mining Technique in Higher Education Prof. Priya Thaka...ijceronline
International Journal of Computational Engineering Research (IJCER) is dedicated to protecting personal information and will make every reasonable effort to handle collected information appropriately. All information collected, as well as related requests, will be handled as carefully and efficiently as possible in accordance with IJCER standards for integrity and objectivity.
The efficiency examination of teaching of different normalization methodsijdms
1. The document discusses different methods for teaching database normalization to students and examines their effectiveness through surveys of students. It introduces conventional normalization, normalization based on dependency diagrams, normalization through solving practice tests, and a "cookbook" normalization method.
2. Surveys found that students generally found normalization the most difficult topic, with determining normal forms and decomposing tables being particularly challenging. Additional methods helped improve student understanding to varying degrees.
3. The author concludes by discussing how they teach multiple normalization methods, including conventional and dependency diagram approaches, and have students practice through computational tests to reinforce the normalization process step-by-step. Further research on the most effective teaching methods could help optimize student learning of this important database
PATTERN GENERATION FOR COMPLEX DATA USING HYBRID MININGIJDKP
This document discusses a hybrid data mining approach called combined mining that can generate informative patterns from complex data sources. It proposes applying three techniques: 1) Using the Lossy-counting algorithm on individual data sources to obtain frequent itemsets, 2) Generating incremental pair and cluster patterns using a multi-feature approach, 3) Combining FP-growth and Bayesian Belief Network using a multi-method approach to generate classifiers. The approach is tested on two datasets to obtain more useful knowledge and the results are compared.
IDENTIFICATION OF OUTLIERS IN OXAZOLINES AND OXAZOLES HIGH DIMENSION MOLECULA...IJDKP
This document summarizes an algorithm called Principal Component Outlier Detection (PrCmpOut) for identifying outliers in high-dimensional molecular descriptor datasets. PrCmpOut uses principal component analysis to transform the data into a lower-dimensional space, where it can more efficiently detect outliers using robust estimators of location and covariance. The properties of PrCmpOut are analyzed and compared to other robust outlier detection methods through simulation studies using a dataset of oxazoline and oxazole molecular descriptors. Numerical results show PrCmpOut performs well at outlier detection in high-dimensional data.
USING GOOGLE’S KEYWORD RELATION IN MULTIDOMAIN DOCUMENT CLASSIFICATIONIJDKP
The document describes a new method for multi-domain document classification using keyword sequences extracted from documents. It introduces the Word AdHoc Network (WANET) system which uses Google's Keyword Relation and a new similarity measurement called Google Purity to classify documents into domains based on their extracted 4-word keyword sequences, without requiring pre-established keyword repositories. Experimental results showed the classification was accurate and efficient, allowing cross-domain classification and management of knowledge from different sources.
On a business level, everyone wants to get hold of the business value and other organizational advantages that big data has to offer. Analytics has arisen as the primitive path to business value from big data. Hadoop is not just a storage platform for big data; it’s also a computational and processing platform for business analytics. Hadoop is, however, unsuccessful in fulfilling business requirements when it comes to live data streaming. The initial architecture of Apache Hadoop did not solve the problem of live stream data mining. In summary, the traditional approach of big data being co-relational to Hadoop is false; focus needs to be given on business value as well. Data Warehousing, Hadoop and stream processing complement each other very well. In this paper, we have tried reviewing a few frameworks and products
which use real time data streaming by providing modifications to Hadoop.
This document discusses methods for incremental machine learning. It begins with an introduction that defines incremental learning as adapting what has been learned based on new examples over time, without assuming all training data is initially available.
The document then reviews two traditional approaches: data accumulation, where a new hypothesis is generated based on all accumulated data when new data arrives, and ensemble learning, where new hypotheses are generated from new data and combined through voting.
Specific incremental learning algorithms are then summarized, including unsupervised methods like CF1, CF2, and ARTMAP, and supervised methods like Learn++, Learn++.NC, and using genetic algorithms. The document provides an overview of current incremental learning research.
This document summarizes and evaluates various rule extraction algorithms from trained artificial neural networks. It begins with an introduction explaining the importance of explanation capabilities for neural networks. It then provides a taxonomy for classifying rule extraction approaches based on the expressiveness of the extracted rules, whether the approach takes an open-box or black-box view of the neural network, any specialized training regimes used, the quality of explanations generated, and computational complexity. The document discusses sensitivity analysis as a basic method for understanding neural network relationships before focusing on decompositional and pedagogical rule extraction approaches.
The music objects are classified into Monophonic and Polyphonic. In Monophonic there is only one track
which is the main melody that leads the song. In Polyphonic objects, there are several tracks that
accompany the main melody. Each track is a sequence of notes played simultaneously with other tracks.
But, the main melody captures the essence of the music and plays vital role in MIR. The MIR involves
representation of main melody as a sequence of notes played, extraction of repeating patterns from it and
matching of query sequence with frequent repeating sequential patterns constituting the music object.
Repeating patterns are subsequences of notes played time and again in a main melody with possible
variations in the notes to a tolerable extent. Similarly, the query sequence meant for retrieving a music
object may not contain the repeating patterns of the main melody in its exact form. Hence, extraction of
approximate patterns is essential for a MIR system. This paper proposes a novel method of finding
approximate repeating patterns for the purpose of MIR. The effectiveness of methodology is tested and
found satisfactory on real world data namely ‘Raga Surabhi’ an Indian Carnatic Music portal.
A comparative study on term weighting methods for automated telugu text categ...IJDKP
Automatic Text categorization refers to the process of assigning a category or some categories
automatically among predefined ones. Text categorization is challenging in Indian languages has rich in
morphology, a large number of word forms and large feature spaces. This paper investigates the
performance of different classification approaches using different term weighting approaches in order to
decide the most applicable one to Telugu text classification problem. We have investigated on different
term weighting methods for Telugu corpus in combination with Naive Bayes ( NB), Support Vector
Machine (SVM) and k Nearest Neighbor (kNN) classifiers.
Incremental learning from unbalanced data with concept class, concept drift a...IJDKP
Recently, stream data mining applications has drawn vital attention from several research communities.
Stream data is continuous form of data which is distinguished by its online nature. Traditionally, machine
learning area has been developing learning algorithms that have certain assumptions on underlying
distribution of data such as data should have predetermined distribution. Such constraints on the problem
domain lead the way for development of smart learning algorithms performance is theoretically verifiable.
Real-word situations are different than this restricted model. Applications usually suffers from problems
such as unbalanced data distribution. Additionally, data picked from non-stationary environments are also
usual in real world applications, resulting in the “concept drift” which is related with data stream
examples. These issues have been separately addressed by the researchers, also, it is observed that joint
problem of class imbalance and concept drift has got relatively little research. If the final objective of
clever machine learning techniques is to be able to address a broad spectrum of real world applications,
then the necessity for a universal framework for learning from and tailoring (adapting) to, environment
where drift in concepts may occur and unbalanced data distribution is present can be hardly exaggerated.
In this paper, we first present an overview of issues that are observed in stream data mining scenarios,
followed by a complete review of recent research in dealing with each of the issue.
A statistical model for gist generation a case study on hindi news articleIJDKP
Every day, huge number of news articles are reported and disseminated on the Internet. By generating gist
of an article, reader can go through the main topics instead of reading the whole article as it takes much
time for reader to read the entire content of the article. An ideal system would understand the document
and generate the appropriate theme(s) directly from the results of the understanding. In the absence of
natural language understanding system, it is required to design an appropriate system. Gist generation is a
difficult task because it requires both maximizing text content in short summary and maintains
grammaticality of the text. In this paper we present a statistical approach to generate a gist of a Hindi
news article. The experimental results are evaluated using the standard measures such as precision, recall
and F1 measure for different statistical models and their combination on the article before pre-processing
and after pre-processing.
Data Linkage is an important step that can provide valuable insights for evidence-based decision making, especially for crucial events. Performing sensible queries across heterogeneous databases containing millions of records is a complex task that requires a complete understanding of each contributing database’s schema to define the structure of its information. The key aim is to approximate the structure
and content of the induced data into a concise synopsis in order to extract and link meaningful data-driven facts. We identify such problems as four major research issues in Data Linkage: associated costs in pairwise matching, record matching overheads, semantic flow of information restrictions, and single order classification limitations. In this paper, we give a literature review of research in Data Linkage. The
purpose for this review is to establish a basic understanding of Data Linkage, and to discuss the
background in the Data Linkage research domain. Particularly, we focus on the literature related to the recent advancements in Approximate Matching algorithms at Attribute Level and Structure Level. Their efficiency, functionality and limitations are critically analysed and open-ended problems have been
exposed.
This document summarizes a proposed framework for sentiment classification using fuzzy logic. The framework aims to detect both implicit and explicit sentiment expressions in text by incorporating multiple datasets and techniques. It involves preprocessing text data, classifying sentiment, applying fuzzy logic to reduce emotions, and using fuzzy c-means clustering to further group similar emotions. The goal is to more accurately extract sentiment from transcripts by identifying both implicit and explicit expressions as well as topics through this combined approach. Evaluation metrics like precision, recall and f-measure will be used to assess performance.
Evaluation metric plays a critical role in achieving the optimal classifier during the classification training.
Thus, a selection of suitable evaluation metric is an important key for discriminating and obtaining the
optimal classifier. This paper systematically reviewed the related evaluation metrics that are specifically
designed as a discriminator for optimizing generative classifier. Generally, many generative classifiers
employ accuracy as a measure to discriminate the optimal solution during the classification training.
However, the accuracy has several weaknesses which are less distinctiveness, less discriminability, less
informativeness and bias to majority class data. This paper also briefly discusses other metrics that are
specifically designed for discriminating the optimal solution. The shortcomings of these alternative metrics
are also discussed. Finally, this paper suggests five important aspects that must be taken into consideration
in constructing a new discriminator metric.
A NEAR-DUPLICATE DETECTION ALGORITHM TO FACILITATE DOCUMENT CLUSTERINGIJDKP
Web Ming faces huge problems due to Duplicate and Near Duplicate Web pages. Detecting Near
Duplicates is very difficult in large collection of data like ”internet”. The presence of these web pages
plays an important role in the performance degradation while integrating data from heterogeneous
sources. These pages either increase the index storage space or increase the serving costs. Detecting these
pages has many potential applications for example may indicate plagiarism or copyright infringement.
This paper concerns detecting, and optionally removing duplicate and near duplicate documents which are
used to perform clustering of documents .We demonstrated our approach in web news articles domain. The
experimental results show that our algorithm outperforms in terms of similarity measures. The near
duplicate and duplicate document identification has resulted reduced memory in repositories.
THE EVALUATION OF FACTORS INFLUENCING SAFETY PERFORMANCE: A CASE IN AN INDUST...IJDKP
Safety has become a very important element in firms and organisations especially in Ghana. The impact of safety factors on a firm’s 3E’s (Employee, Environment and Equipment) can improve or deteriorate firm’s public image. This paper identified the key safety indicators and also provided a set of core factors that contribute meaningful in promoting safety performance in an Industrial Gas producer in Ghana using the Analytic Hierarchy Process. Organisational, Human, Technical and Environmental factors were identified as the safety indicators in relation to the study area. The studies revealed that organisational factor is the most important factor or criterion that could facilitate a better safety performance of the Industrial Gas
Company. In addition, employees was identified the best safety alternative, whilst environment and equipment followed sequentially.
Mining high utility itemsets in data streams based on the weighted sliding wi...IJDKP
This document proposes an algorithm for mining high utility itemsets from data streams based on the weighted sliding window model. The weighted sliding window model allows the user to specify the number of windows, window size, and weight for each window. The algorithm, called HUI_W, makes a single pass over the data stream and takes advantage of reusing stored information to efficiently discover high utility itemsets. It first identifies high transaction-weighted utilization (HTWU) 1-itemsets, then generates candidate itemsets and determines actual high utility itemsets based on a transaction-weighted downward closure property to prune the search space.
DATA MINING IN EDUCATION : A REVIEW ON THE KNOWLEDGE DISCOVERY PERSPECTIVEIJDKP
Knowledge Discovery in Databases is the process of finding knowledge in massive amount of data where
data mining is the core of this process. Data mining can be used to mine understandable meaningful patterns from large databases and these patterns may then be converted into knowledge.Data mining is the process of extracting the information and patterns derived by the KDD process which helps in crucial decision-making.Data mining works with data warehouse and the whole process is divded into action plan to be performed on data: Selection, transformation, mining and results interpretation. In this paper, we have reviewed Knowledge Discovery perspective in Data Mining and consolidated different areas of data
mining, its techniques and methods in it.
The recruitment of new personnel is one of the most essential business processes which affect the quality of
human capital within any company. It is highly essential for the companies to ensure the recruitment of
right talent to maintain a competitive edge over the others in the market. However IT companies often face
a problem while recruiting new people for their ongoing projects due to lack of a proper framework that
defines a criteria for the selection process. In this paper we aim to develop a framework that would allow
any project manager to take the right decision for selecting new talent by correlating performance
parameters with the other domain-specific attributes of the candidates. Also, another important motivation
behind this project is to check the validity of the selection procedure often followed by various big
companies in both public and private sectors which focus only on academic scores, GPA/grades of students
from colleges and other academic backgrounds. We test if such a decision will produce optimal results in
the industry or is there a need for change that offers a more holistic approach to recruitment of new talent
in the software companies. The scope of this work extends beyond the IT domain and a similar procedure
can be adopted to develop a recruitment framework in other fields as well. Data-mining techniques provide
useful information from the historical projects depending on which the hiring-manager can make decisions
for recruiting high-quality workforce. This study aims to bridge this hiatus by developing a data-mining
framework based on an ensemble-learning technique to refocus on the criteria for personnel selection. The
results from this research clearly demonstrated that there is a need to refocus on the selection-criteria for
quality objectives.
Social bookmarking system is a web-based resource sharing system that allows users to upload, share and
organize their resources i.e. bookmarks and publications. The system has shifted the paradigm of
bookmarking from an individual activity limited to desktop to a collective activity on the web. It also
facilitates user to annotate his resource with free form tags that leads to large communities of users to
collaboratively create accessible repositories of web resources. Tagging process has its own challenges
like ambiguity, redundancy or misspelled tags and sometimes user tends to avoid it as he has to describe
tag at his own. The resultant tag space is noisy or very sparse and dilutes the purpose of tagging. The
effective solution is Tag Recommendation System that automatically suggests appropriate set of tags to
user while annotating resource. In this paper, we propose a framework that does not depend on tagging
history of the resource or user and thereby capable of suggesting tags to the resources which are being
submitted to the system first time. We model tag recommendation task as multi-label text classification
problem and use Naive Bayes classifier as the base learner of the multilabel classifier. We experiment with
Boolean, bag-of-words and term frequency-inverse document frequency (TFIDF) representation of the
resources and fit appropriate distribution to the data based on the representation used. Impact of feature
selection on the effectiveness of the tag recommendation is also studied. Effectiveness of the proposed
framework is evaluated through precision, recall and f-measure metrics.
This paper describes the Collective Experience Engine (CEE), a system for indexing Experiential-
Knowledge about Web knowledge-sources (websites), and performing relative-experience calculations
between participants of the CEE. The CEE provides an in-browser interface to query the collective
experience of others participating in the CEE. This interface accepts a list of URLs, to which the CEE adds
additional information based on the Queryee's previously indexed Experiential-Knowledge. The core of the
CEE is its Experiential-Context Conversation (ECConversation) functionality, whereby an collection of a
person’s Web Experiential-Knowledge can be utilized to allow a real-world conversation-like exchange of
information to take place, including adjusting information-flow based on the Queryee's experiential
background and knowledge, and providing additional experientially-related knowledge integrated into the
answer from multiple selected 'experience donors'. A relative-experience calculation ensures that
information is retrieved only from relative-experts, to ensure sufficient additional information exists, but
that such information isn't too advanced for the Queryee to process. This paper gives an overview of the
CEE, and the underlying algorithms and data structures, and describes a system architecture and
implementation details.
A Model for Predicting Students’ Academic Performance using a Hybrid of K-mea...Editor IJCATR
Higher learning institutions nowadays operate in a more complex and competitive due to a high demand from prospective
students and an emerging increase of universities both public and private. Management of Universities face challenges and concerns of
predicting students’ academic performance in to put mechanisms in place prior enough for their improvement. This research aims at
employing Decision tree and K-means data mining algorithms to model an approach to predict the performance of students in advance
so as to devise mechanisms of alleviating student dropout rates and improve on performance. In Kenya for example, there has been
witnessed an increase student enrolling in universities since the Government started free primary education. Therefore the Government
expects an increased workforce of professionals from these institutions without compromising quality so as to achieve its millennium
development and vision 2030. Backlog of students not finishing their studies in stipulated time due to poor performance is another
issue that can be addressed from the results of this research since predicting student performance in advance will enable University
management to devise ways of assisting weak students and even make more decisions on how to select students for particular courses.
Previous studies have been done Educational Data Mining mostly focusing on factors affecting students’ performance and also used
different algorithms in predicting students’ performance. In all these researches, accuracy of prediction is key and what researchers
look forward to try and improve.
A comparative study of machine learning algorithms for virtual learning envir...IAESIJAI
Virtual learning environment is becoming an increasingly popular study option for students from diverse cultural and socioeconomic backgrounds around the world. Although this learning environment is quite adaptable, improving student performance is difficult due to the online-only learning method. Therefore, it is essential to investigate students' participation and performance in virtual learning in order to improve their performance. Using a publicly available Open University learning analytics dataset, this study examines a variety of machine learning-based prediction algorithms to determine the best method for predicting students' academic success, hence providing additional alternatives for enhancing their academic achievement. Support vector machine, random forest, Nave Bayes, logical regression, and decision trees are employed for the purpose of prediction using machine learning methods. It is noticed that the random forest and logistic regression approach predict student performance with the highest average accuracy values compared to the alternatives. In a number of instances, the support vector machine has been seen to outperform the other methods.
Accurate prediction and early identification of student at-risk of attrition are of high concern for higher educational institutions (HEIs). It is of a great importance not only to the students but also to the educational administrators and the institutions in the areas of improving academic quality and efficient utilisation of the available resources for effective intervention. However, despite the different frameworks and various models that researchers have used across institutions for predicting performance, only negligible success has been recorded in terms of accuracy, efficiency and reduction of student attrition. This has been attributed to the inadequate and selective use of variables for the predictive models. This paper presents a multi-dimensional and an integrated system framework that involves considerable learners’ input and engagement in predicting their academic performance and intervention in HEIs. The purpose and functionality of the framework are to produce a comprehensive, unbiased and efficient way of predicting student performance that its implementation is based upon multi-sources data and database system. It makes use of student demographic and learning management system (LMS) data from the institutional databases as well as the student psychosocial-personality (SPP) data from the survey collected from the student to predict performance. The proposed approach will be robust, generalizable, and possibly give a prediction at a higher level of accuracy that educational administrators can rely on for providing timely intervention to students. --
Accurate prediction and early identification of student at-risk of attrition are of high concern for higher
educational institutions (HEIs). It is of a great importance not only to the students but also to the
educational administrators and the institutions in the areas of improving academic quality and
efficient utilisation of the available resources for effective intervention. However, despite the different
frameworks and various models that researchers have used across institutions for predicting performance,
only negligible success has been recorded in terms of accuracy, efficiency and reduction of student
attrition. This has been attributed to the inadequate and selective use of variables for the predictive models.
AN INTEGRATED SYSTEM FRAMEWORK FOR PREDICTING STUDENTS’ ACADEMIC PERFORMANCE ...ijcsit
Accurate prediction and early identification of student at-risk of attrition are of high concern for higher educational institutions (HEIs). It is of a great importance not only to the students but also to the educational administrators and the institutions in the areas of improving academic quality and efficient utilisation of the available resources for effective intervention. However, despite the different frameworks and various models that researchers have used across institutions for predicting performance, only negligible success has been recorded in terms of accuracy, efficiency and reduction of student
attrition. This has been attributed to the inadequate and selective use of variables for the predictive models. This paper presents a multi-dimensional and an integrated system framework that involves considerable learners’ input and engagement in predicting their academic performance and intervention in HEIs. The purpose and functionality of the framework are to produce a comprehensive, unbiased and efficient way of predicting student performance that its implementation is based upon multi-sources data and database
system. It makes use of student demographic and learning management system (LMS) data from the institutional databases as well as the student psychosocial-personality (SPP) data from the survey collected from the student to predict performance. The proposed approach will be robust, generalizable, and possibly give a prediction at a higher level of accuracy that educational administrators can rely on for providing timely intervention to students.
UNIVERSITY ADMISSION SYSTEMS USING DATA MINING TECHNIQUES TO PREDICT STUDENT ...IRJET Journal
This document summarizes a research study that aimed to predict student performance and support decision making for university admission systems using data mining techniques. The study analyzed data from 2,039 students at a university in Saudi Arabia to compare the predictive power of different data mining classification models (ANN, decision trees, SVM, naive Bayes). It found that a student's score on the pre-admission Scholastic Proficiency Admission Test was the best predictor of their first year GPA. Based on this, the university adjusted its admission criteria to give greater weight to this pre-admission test score. After making this change, the number of students with high GPAs increased while the number with low GPAs decreased.
PREDICTING SUCCESS: AN APPLICATION OF DATA MINING TECHNIQUES TO STUDENT OUTCOMESIJDKP
This project examines the effectiveness of applying machine learning techniques to the realm of college
student success, specifically with the intent of discovering and identifying those student characteristics and
factors that show the strongest predictive capability with regards to successful graduation. The student
data examined consists of first time freshmen and transfer students who matriculated at California State
University San Marcos in the period of Fall 2000 through Fall 2010 and who either graduated successfully
or discontinued their education. Operating on over 30,000 student observations, random forests are used
to determine the relative importance of the student characteristics with genetic algorithms to perform
feature selection and pruning. To improve the machine learning algorithm cross validated hyperparameter tuning was also implemented. Overall predictive strength is relatively high as measured by the
Matthews Correlation Coefficient, and both intuitive and novel features which provide support for the
learning model are explored.
PREDICTING SUCCESS: AN APPLICATION OF DATA MINING TECHNIQUES TO STUDENT OUTCOMESIJDKP
This project examines the effectiveness of applying machine learning techniques to the realm of college
student success, specifically with the intent of discovering and identifying those student characteristics and
factors that show the strongest predictive capability with regards to successful graduation. The student
data examined consists of first time freshmen and transfer students who matriculated at California State
University San Marcos in the period of Fall 2000 through Fall 2010 and who either graduated successfully
or discontinued their education. Operating on over 30,000 student observations, random forests are used
to determine the relative importance of the student characteristics with genetic algorithms to perform
feature selection and pruning. To improve the machine learning algorithm cross validated hyperparameter tuning was also implemented. Overall predictive strength is relatively high as measured by the
Matthews Correlation Coefficient, and both intuitive and novel features which provide support for the
learning model are explored.
Predictive and Statistical Analyses for Academic Advisory Supportijcsit
The ability to recognize students’ weakness and solving any problem may confront them in timely fashion is
always a target of all educational institutions. This study was designed to explore how can predictive and
statistical analysis support the academic advisor’s work mainly in analysis students’ progress. The sample
consisted of a total of 249 undergraduate students; 46% of them were Female and 54% Male. A one-way
analysis of variance (ANOVA) and t-test were conducted to analysis if there was different behaviour in
registering courses. Predictive data mining is used for support advisor in decision making. Several
classification techniques with 10-fold Cross-validation were applied. Among of them, C4.5 constitutes the
best agreement among the finding results.
Data mining approach to predict academic performance of studentsBOHRInternationalJou1
Powerful data mining techniques are available in a variety of educational fields. Educational research is
advancing rapidly due to the vast amount of student data that can be used to create insightful patterns
related to student learning. Educational data mining is a tool that helps universities assess and identify student
performance. Well-known classification techniques have been widely used to determine student success in
data mining. A decisive and growing exploration area in educational data mining (EDM) is predicting student
academic performance. This area uses data mining and automaton learning approaches to extract data from
education repositories. According to relevant research, there are several academic performance prediction
methods aimed at improving administrative and teaching staff in academic institutions. In the put-forwarded
approach, the collected data set is preprocessed to ensure data quality and labeled student education data
is used to apply ANN classifiers, support vector classifiers, random forests, and DT Compute and train a
classifier. The achievement of the four classifications is measured by accuracy value, receiver operating curve
(ROC), F1 score, and confusion matrix scored by each model. Finally, we found that the top three algorithmic
models had an accuracy of 86–95%, an F1 score of 85–95%, and an average area under ROC curve of
OVA of 98–99.6%
IRJET- A Conceptual Framework to Predict Academic Performance of Students usi...IRJET Journal
This document presents a conceptual framework for predicting student academic performance using classification algorithms. The framework uses factors like socioeconomic status, psychological attributes, cognitive attributes, and lifestyle to analyze student performance based on their semester GPA. The document proposes classifying student performance into three classes (first class, second class, third class) based on their first semester GPA. Various classification algorithms like Naive Bayes, random forest, and bagging are evaluated on the student data to identify the best model for predicting performance. The conceptual framework is intended to guide the development of a recommendation system that can help educational institutions identify at-risk students early and improve student outcomes.
A LEARNING ANALYTICS APPROACH FOR STUDENT PERFORMANCE ASSESSMENTTye Rausch
This document discusses learning analytics, academic analytics, and educational data mining. It defines each term and differentiates their processes and purposes. Learning analytics uses predictive models and data analysis to optimize student learning experiences and identify at-risk students. Educational data mining focuses on developing new data analysis methods and algorithms to solve educational issues. Academic analytics applies business intelligence principles to improve decision-making in educational institutions. The document provides examples of universities using learning analytics to track performance, predict outcomes, and improve retention.
Due to the increasing interest in big data especially in the educational field and online education has led to a conflict in terms of performance indicators of the student. In this paper we discuss the methodology of assessing the student performance in terms of the success indicators revealing a number of indicators that is recommended to indicate success of the final academic achievement.
IRJET- Analysis of Student Performance using Machine Learning TechniquesIRJET Journal
This document discusses using machine learning techniques to analyze student performance data and predict student outcomes. It begins with an abstract describing how educational data has become important for supporting student success. It then discusses prior related work applying classification algorithms like decision trees to predict student grades or performance. The document goes on to describe applying various classification algorithms like J48 decision trees, K-nearest neighbors, and others to student data and comparing their performance at predicting outcomes. It discusses preprocessing the data with k-means clustering before classification. The goal is to identify at-risk students early to better support them.
Due to the increasing interest in big data especially in the educational field and online education has led to a conflict in terms of performance indicators of the student. In this paper we discuss the methodology of assessing the student performance in terms of the success indicators revealing a number of indicators that is recommended to indicate success of the final academic achievement
Multiple educational data mining approaches to discover patterns in universit...IJICTJOURNAL
This paper presented the utilization of pattern discovery techniques by using multiple relationships and clustering educational data mining approaches to establish a knowledge base that will aid in the prediction of ideal college program selection and enrollment forecasting for incoming freshmen. Results show a significant level of accuracy in predicting college programs for students by mining two years of student college admission and graduation final grade scholastic records. The results of educational predictive data mining methods can be applied in improving the services of the admission department of an educational institution, particularly in its course alignment, student mentoring, admission forecast, marketing, and enrollment preparedness.
M-Learners Performance Using Intelligence and Adaptive E-Learning Classify th...IRJET Journal
This document discusses using machine learning classification algorithms to predict student performance based on educational data. It compares the performance of five classification algorithms - J48, Naive Bayes, Bayes Net, Backpropagation Network, and Radial Basis Function Network - in predicting student academic achievement using attributes like demographic information, test scores, and academic factors. The experiment found that the Radial Basis Function Network algorithm achieved the highest accuracy, correctly classifying 100% of instances, compared to 75-95% accuracy for the other algorithms. Convolutional neural networks are also discussed as a powerful tool for image and language processing in educational data mining.
The International Journal of Mechanical Engineering Research and Technology is an international online journal published Quarterly offers fast publication schedule whilst maintaining rigorous peer review. The use of recommended electronic formats for article delivery expedites the process All submitted research articles are subjected to the immediate rapid screening by editors consultation with Editorial Board or others working in the field of appropriate to ensure that they are likely to be the level of interest and importance of appropriate for the journal.
ISSN 2454-535X
International Journal of Mechanical Engineering Research and Technology aims to provide the best possible service to authors of original research articles, and the fairest system of peer review.
This document discusses using machine learning to predict student performance in online learning environments. It reviews studies that have examined online course data to predict student outcomes using machine learning techniques. The studies identified features of online courses used for prediction, outputs of prediction models, methodologies for feature extraction, evaluation metrics, and challenges. Machine learning algorithms commonly used in the studies include logistic regression, naive Bayes, decision trees, AdaBoost, k-nearest neighbor, and neural networks. The document provides an in-depth analysis of different machine learning models and their effectiveness in predicting student certificate acquisition, grades, and students at risk of failure.
Similar to Data mining in higher education university student dropout case study (20)
Have you ever been confused by the myriad of choices offered by AWS for hosting a website or an API?
Lambda, Elastic Beanstalk, Lightsail, Amplify, S3 (and more!) can each host websites + APIs. But which one should we choose?
Which one is cheapest? Which one is fastest? Which one will scale to meet our needs?
Join me in this session as we dive into each AWS hosting service to determine which one is best for your scenario and explain why!
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
5th LF Energy Power Grid Model Meet-up SlidesDanBrown980551
5th Power Grid Model Meet-up
It is with great pleasure that we extend to you an invitation to the 5th Power Grid Model Meet-up, scheduled for 6th June 2024. This event will adopt a hybrid format, allowing participants to join us either through an online Mircosoft Teams session or in person at TU/e located at Den Dolech 2, Eindhoven, Netherlands. The meet-up will be hosted by Eindhoven University of Technology (TU/e), a research university specializing in engineering science & technology.
Power Grid Model
The global energy transition is placing new and unprecedented demands on Distribution System Operators (DSOs). Alongside upgrades to grid capacity, processes such as digitization, capacity optimization, and congestion management are becoming vital for delivering reliable services.
Power Grid Model is an open source project from Linux Foundation Energy and provides a calculation engine that is increasingly essential for DSOs. It offers a standards-based foundation enabling real-time power systems analysis, simulations of electrical power grids, and sophisticated what-if analysis. In addition, it enables in-depth studies and analysis of the electrical power grid’s behavior and performance. This comprehensive model incorporates essential factors such as power generation capacity, electrical losses, voltage levels, power flows, and system stability.
Power Grid Model is currently being applied in a wide variety of use cases, including grid planning, expansion, reliability, and congestion studies. It can also help in analyzing the impact of renewable energy integration, assessing the effects of disturbances or faults, and developing strategies for grid control and optimization.
What to expect
For the upcoming meetup we are organizing, we have an exciting lineup of activities planned:
-Insightful presentations covering two practical applications of the Power Grid Model.
-An update on the latest advancements in Power Grid -Model technology during the first and second quarters of 2024.
-An interactive brainstorming session to discuss and propose new feature requests.
-An opportunity to connect with fellow Power Grid Model enthusiasts and users.
Monitoring and Managing Anomaly Detection on OpenShift.pdfTosin Akinosho
Monitoring and Managing Anomaly Detection on OpenShift
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
HCL Notes and Domino License Cost Reduction in the World of DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
This presentation provides valuable insights into effective cost-saving techniques on AWS. Learn how to optimize your AWS resources by rightsizing, increasing elasticity, picking the right storage class, and choosing the best pricing model. Additionally, discover essential governance mechanisms to ensure continuous cost efficiency. Whether you are new to AWS or an experienced user, this presentation provides clear and practical tips to help you reduce your cloud costs and get the most out of your budget.
Best 20 SEO Techniques To Improve Website Visibility In SERPPixlogix Infotech
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
Digital Marketing Trends in 2024 | Guide for Staying AheadWask
https://www.wask.co/ebooks/digital-marketing-trends-in-2024
Feeling lost in the digital marketing whirlwind of 2024? Technology is changing, consumer habits are evolving, and staying ahead of the curve feels like a never-ending pursuit. This e-book is your compass. Dive into actionable insights to handle the complexities of modern marketing. From hyper-personalization to the power of user-generated content, learn how to build long-term relationships with your audience and unlock the secrets to success in the ever-shifting digital landscape.
GraphRAG for Life Science to increase LLM accuracyTomaz Bratanic
GraphRAG for life science domain, where you retriever information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slackshyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
Taking AI to the Next Level in Manufacturing.pdfssuserfac0301
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
6. Ideas and approaches to help build your organization's AI strategy.
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on integration of Salesforce with Bonterra Impact Management.
Interested in deploying an integration with Salesforce for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Programming Foundation Models with DSPy - Meetup SlidesZilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...Alex Pruden
Folding is a recent technique for building efficient recursive SNARKs. Several elegant folding protocols have been proposed, such as Nova, Supernova, Hypernova, Protostar, and others. However, all of them rely on an additively homomorphic commitment scheme based on discrete log, and are therefore not post-quantum secure. In this work we present LatticeFold, the first lattice-based folding protocol based on the Module SIS problem. This folding protocol naturally leads to an efficient recursive lattice-based SNARK and an efficient PCD scheme. LatticeFold supports folding low-degree relations, such as R1CS, as well as high-degree relations, such as CCS. The key challenge is to construct a secure folding protocol that works with the Ajtai commitment scheme. The difficulty, is ensuring that extracted witnesses are low norm through many rounds of folding. We present a novel technique using the sumcheck protocol to ensure that extracted witnesses are always low norm no matter how many rounds of folding are used. Our evaluation of the final proof system suggests that it is as performant as Hypernova, while providing post-quantum security.
Paper Link: https://eprint.iacr.org/2024/257
Your One-Stop Shop for Python Success: Top 10 US Python Development Providersakankshawande
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfChart Kalyan
A Mix Chart displays historical data of numbers in a graphical or tabular form. The Kalyan Rajdhani Mix Chart specifically shows the results of a sequence of numbers over different periods.
Building Production Ready Search Pipelines with Spark and MilvusZilliz
Spark is the widely used ETL tool for processing, indexing and ingesting data to serving stack for search. Milvus is the production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to Milvus vector database for search serving.
Building Production Ready Search Pipelines with Spark and Milvus
Data mining in higher education university student dropout case study
1. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.5, No.1, January 2015
DOI : 10.5121/ijdkp.2015.5102 15
DATA MINING IN HIGHER EDUCATION :
UNIVERSITY STUDENT DROPOUT CASE STUDY
Ghadeer S. Abu-Oda and Alaa M. El-Halees
Faculty of Information Technology, Islamic University of Gaza – Palestine
ABSTRACT
In this paper, we apply different data mining approaches for the purpose of examining and predicting
students’ dropouts through their university programs. For the subject of the study we select a total of 1290
records of computer science students Graduated from ALAQSA University between 2005 and 2011. The
collected data included student study history and transcript for courses taught in the first two years of
computer science major in addition to student GPA , high school average , and class label of (yes ,No) to
indicate whether the student graduated from the chosen major or not. In order to classify and predict
dropout students, different classifiers have been trained on our data sets including Decision Tree (DT),
Naive Bayes (NB). These methods were tested using 10-fold cross validation. The accuracy of DT, and NlB
classifiers were 98.14% and 96.86% respectively. The study also includes discovering hidden relationships
between student dropout status and enrolment persistence by mining a frequent cases using FP-growth
algorithm.
1. INTRODUCTION
Nowadays, higher learning institutions encounter many problems, which keep them away from
achieving their quality objectives. Most of these problems caused from knowledge gap.
Knowledge gap is the lack of significant knowledge at the educational main processes such as
advising, planning, registration, evaluation and marketing. [1] For example, many learning
institutions do not have access to the necessary information to advise students. Therefore, they
are not able to give suitable recommendation for them.
Data mining is a powerful technology that can be best defined as the automated process of
extracting useful knowledge and information including, patterns, associations [2]. The knowledge
discovered by data mining techniques would enable the higher learning institutions in divers
ways not limited to making better decisions, having more advanced planning in directing
students, predicting individual behaviors with higher accuracy, and enabling the institution to
allocate resources and staff more effectively. It results in improving the effectiveness and
efficiency of the processes [3] [4] .
One of the biggest challenges that higher education faces today is predicting the academic paths
of student. Many higher education systems are unable detecting student population who are likely
to drop out because of lack of intelligence method to use information and guidance from the
university system. Although, various types of studies exist to analyze student persistence patterns,
2. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.5, No.1, January 2015
16
it remains a challenging task to accurately predict a currently enrolled student’s likelihood of
returning to university the next term or change his major [5] [6].
Developing learning and teaching initiatives to improve retention and progression in education
process is an important academic concern, which mainly depends on monitoring student
performance and exploiting student feedback. Student marks and achievement are the main
sources to study student feedback and progress, yet university and educational centers can use to
predict the student performance, student dropout, and study path.
The main objective of this study is to identify those students who are less likely to return from
one semester to other. Having these students identified soon enough for the institution to
accommodate its interventions and marketing strategies will greatly enhance the student
persistence rate in specific majors. This paper focuses on computer science major of students
Graduated from ALAQSA University. In the rest of the paper, we cited related studies that
employ data mining approaches in educational domain in section 2. Section 3 expresses in more
detail the tasks of preparation and preprocessing of data set for the further analysis. Classification
models and their accuracy are written on section 4, where the mined association rules and their
analysis are in section 5.
2. RELATED WORK
Luan in [7] aims to predict student’s persistency by considering the students who either re-enroll
or does not re-enroll the following semester. Using the prediction techniques, the likelihood of a
student’s persistency can be determined. The identification of these students is very useful to
universities because if they know those who are less likely to persist, then the university can
attempt to increase the persistency rate of these students. By knowing students who are not
persistence, the faculty can identify the factors affecting their non-persistency. Therefore, the
university’s managerial systems have to attempt on improving these factors, which would result
in improving the student’s persistency rate.
In [8] applied the classification task as one data mining technique to evaluate student’
performance, they used decision tree method for classification. They depend on extracting actual
student performance indicators by the end of semester examination using the student’ previous
database including Attendance, Class test, Seminar, and Assignment marks. This study helps
earlier in identifying the dropouts and students who need special attention and allow the teacher
to provide appropriate advising.
In [9] have employed decision tree to predict to help have the tutor to identify the weak students
and improve their Performance before being dropouts. The model aims to classify the students
into PASS and FAIL target even before the students take their exams.
Applying modern machine learning methods and cost effective analysis is the main contribute of
author in [10], he predicts dropout of new students with satisfying accuracy and thus become a
useful tool in an attempt to reduce dropouts. He collects the data set is from the module
‘Introduction in Informatics’ from the distance education system as an application on eLearning
systems. With the support of university freshmen, authors conducted a study to early predict
student failure that may lead to drop-out [11] , They used decision tree, Bayesian classifiers,
3. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.5, No.1, January 2015
17
logistic models, the rule-based learner and random forest to detect/predict first year student drop
out .
Authors in [12] go a further step for acquiring data-set in educational domain, they depend on
social networks to acquire relevant information that can be used in educational data mining tasks
to improve the accuracy of classification models in comparison with usage of only demographic
and academic attributes, e.g., students’ age, gender, or number of finished semesters. They
measure the prediction of dropouts: degree, indegree, outdegree, and betweenness. They conclude
that these improve the accuracy of classification models.
3. DATA PREPARATION AND PREPROCESSING
The data is collected from ALAQSA university database for bachelor degree students graduated
between 2005 and 2011. The attributes selected are indicated in table 3.1. The courses selected
for the study are those courses presented in computer science major taken in the first two years of
study according to major academic plan, as a suggestion that the student may likely be dropped
from the major in the first two years of study. The number of records obtained is about (1290)
record representing graduate students and their transcripts. For the first view of data, figure 3.1
shows the large number of students dropped from their original chosen major which is in this
case computer science department.
Figure 3-1 Number of dropouts
3.1 Data Integration
The goal of integration is to retrieve the records that represents the graduated students history
with their associated enrollment data, graduated GPA, major, and their marks in computer
4. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.5, No.1, January 2015
18
science subjects (subject is interchangeably means course) conforming to the computer science
department plan.
According to the relational database schema in the ALAQSA university system, the student
enrollment information is held in different relation to the one that holds courses transcripts for
each university semester. Figure 2 shows a set of retrieved records by a simple join of the two
relations; each student has a record for each course he had registered. However, this kind of
combining is not suitable for our processing for instance, (i) there are many repeated records for
each student who studied more than one course of the chosen major. (ii) The student mark in each
course is scattered in midterm, work term, and final term. (iii) Each record has the major from
which the student had graduated including the computer science major; which means the target
class has no rigid values but the different university academic programs.
Figure 3-2 Integration resulted records
For data preparation we do the following modifications: 1) get the sum of the three attribute
(midterm, work term, and final term) can result the student overall mark in each course. 2) We
use two values for target class (MAJOR_NO), yes value if student has graduated from the
computer science major and no, for others. Thus, we map target class (graduated major) to a
computer science major or not, which is more applicable in our case. 3) To solve the repeated
records for each student course we prepare SQL-based method to roll student courses into
separated columns. Figure 3 shows how the data is after modification. The highlighted two
records are for different student records, the courses taken is depicted as columns, and the student
mark in the associated course is the column value otherwise the mark will be NULL in case the
student did not join that course.
5. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.5, No.1, January 2015
19
Figure 3-3 The resulted modified records
The final attributes selected for each student record and used for further analysis and data-mining
tasks is declared in the following table 6.attribute role, type, and description is defined.
Table 3.1 data-set Attributes and description
3.2 Data preprocessing:
We did the following preprocessing steps:
3.2.1 Replace missing values
As seen in figure (5), it is a student record in the data set collected, notice there are empty values
for some columns. It is actually not error in the data, that is because the students did not enroll all
the courses. Thus, his mark simply will be nothing.
6. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.5, No.1, January 2015
20
Figure 3-4 Missing Value Example
The missed values replaced by zero to indicate the student did not register the course at all (it
does not replaced by the academic zero value, since it may be considered as being failed in the
course)
Figure 3-5 Missing values is replaced
3.2.2 Over Sampling:
According to the pie chart (see figure (3-7)), there is a large gap between the number of students
graduated from computer science major to those graduated from the university in other majors
(ratio 0.33, 0.66). I use over sampling technique to balance the number of records in the target
class values. The sampling task in this case is done by feeding the data with records belong to the
least occurred target class value. The added instances will be the existence records but trying to
repeat it many time until balance the peer value of the label.
Figure 3-6 Ratio of 'Yes' to 'No' Labels
The resulted records after oversampling are increased to (1895) record because of inserted
records with (964) for no label and (931) for yes label (Figure 3-8,3- 9).
7. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.5, No.1, January 2015
21
Figure 3-7 Oversampling output
Figure 3-8 Over sampling resulted ratio
3.2.3 Discretize attributes values into groups:
Before applying any classification algorithm, the data has specific preprocessing step; it is
discretize which is the task of converting the absolute column values into bin or ranges. For
example, we need to have the columns of subject’s marks, GPA, and average attributes using the
ranges in figure (3-10) below. The resulted records will show as that in figure (3-11), notice
integer values replaced by F, A, C. . . Categories.
8. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.5, No.1, January 2015
22
Figure 3-9 Bins ranges
Figure 3-10 Discreteize process resulted records
4. CLASSIFICATION
Classifying whether the student is dropped or graduated from the major is the core goal of this
study; classification task is depicted to do the rules or paths that would lead to each case. At this
phase, we train different models for predicting student graduation major. At all classification
models the data is split into training of 0.6 (860) of data to 0.3 (430) of data for testing. We use
10-fold cross validation to test the accuracy of each model.
4.1 Decision tree
Decision Trees (DTs) are a supervised learning method used for classification and regression
[13]. The goal is to create a model that predicts the value of a target variable by learning simple
decision rules inferred from the data features. We apply this method on our data set of student
records to map observations about each student instance to conclusions about his target major of
graduation.
Figure 4-1 depicts the decision tree that resulted from applying the decision-tree classification
algorithm on the graduation major as a target class for each student record. As it is seen from the
figure, the attribute of course “COMP2315” has a great influence on whether the student will
drop out of computer science department or not. In other words, if the student got grade F on
algorithm analysis course, the model concludes that he/she would dropped from the major.
The model presented in figure 4-1 has an accuracy of 98.14% as shown in figure 4-2. To interpret
the rules in the decision tree, Table 4-1 contains more observed cases based on training set.
9. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.5, No.1, January 2015
23
Figure 4-1 Decision Tree classifier output
Table 4-1 Decision Tree observation cases
Figure 4-2 Decision-tree model accuracy
4.2 Naïve base algorithm
A Bayes classifier [14] is a simple probabilistic classifier depends on applying Bayes' theorem
with strong independence assumptions. In simple terms, a naive Bayes classifier assumes that the
presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of
any other feature.
10. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.5, No.1, January 2015
24
By applying the algorithm in classifying around 1800 record in the built model, the chart in figure
(4-3) indicates the classification for student college in case he register “digital design” course: if
he got grade A , he would classified to be graduated from computer science department. The
degree is reducing until it reached the maximum likelihood of being dropped in case he did fail
the course and the classifier behavior will change. The accuracy is about 96% (see figure 14).
Figure 4-3 classifying record based on (COMP2316) course Grade
Figure 4-4 Naïve base algorithm model accuracy
From this task, it can be concluded that the prediction technique is useful in predicting the
likelihood of a student persistency as early as possible. By identifying the students who are less
likely to persist, the university interacts by providing them the academic assist and as a result,
student retention rate will be increased. This has a positive impact on improving the existence
rate through enhancing the student’s assessment process in a higher learning institution.
Table 4-2 Naïve Bayes vs Decision-Tree accuracy
11. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.5, No.1, January 2015
25
5. ASSOCIATION RULES
In Data Mining, the task of finding frequent pattern in large databases is very important and has
been studied in large scale in the past few years. It is useful for discovering interesting
relationships hidden in large data sets. The association analysis reveals a set of strong relations
between study cases that expressed into a set of association rules. An Association rules are if/then
statements that help uncover relationships between seemingly unrelated data in a relational
database or other information repository.
In educational domain, such a rule may indeed influence the university strategy and practices in
aim to increase student performance, reduce the number of dropped students, or even review
courses syllables early.
5.1 Frequent-pattern growth (FP-growth):
The FP-Growth Algorithm, proposed by Han in [15] is an efficient and scalable method for
mining the complete set of frequent patterns by encode the data set using a compact data structure
into frequent-pattern tree called FP-tree and extract frequent item sets directly from this structure
In our study, we apply FP_growth algorithm for a set of (1800) record samples in the hope of
discovering un-known relationships between student study path and persistent enrollment in
“computer science” major and not being dropout. Figure 15, and 16 show asset of rules obtained
for label ‘no’ and ‘yes’ respectively.
Figure 5-1 Rules for label 'NO'
12. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.5, No.1, January 2015
26
Figure 5-2 Rules for label 'Yes'
Table 5-1 Example of Association Rules
For table 5-1, we explore two examples of association rules inferred after applying the FP-growth
algorithm. Rule 1 argues the following: if a student is a boy and got grade F (less than 50%) in
“algorithm analysis” course, he is likely to be dropped from computer science department and
subsequently fail in “programming language” course. That is true by 0.95 for boys who fail in
this course. In addition, it is 0.994 for all accrued cases.
However, Rule 2 indicates the following: If the case in the hand is for a student who is resident in
the south areas of Gaza Strip, and get grade A (more than 80%) in both data structure I and II
courses, and “algorithm analysis” course, then the student likely continue in computer science
department and will not be dropped. It will be happens for all students has the same conditions by
0.95 and this is true by 0.994 according to accrued cases.
6. CONCLUSION
In this paper, we study the student dropout in computer science major in ALAQSA University for
the purpose of improving the current teaching procedures and education strategies.
Technologically, we do not propose new methods like FB-growth or decision-tree. However, we
only use the mature classification and approaches in this study, cannot present more competitive
algorithms or improve the existing algorithms. The study finds that mastering “digital design”
and “algorithm analysis” courses has a great affect on predicting student persistence in the major
and decrease student likelihood of dropout.
13. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.5, No.1, January 2015
27
REFERENCES
[1] P. Baepler and C. J. Murdoch, "Academic Analytics and Data Mining in Higher Education,"
International Journal for the Scholarship of Teaching and Learning, vol. 4, no. 2, pp. 1-9, 2010.
[2] U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth and R. Uthurusamy, "Advances in knowledge
discovery and data mining," 1996.
[3] R. S. J. D. Baker and K. Yacef, "The State of Educational Data Mining in 2009 : A Review and
Future Visions," Journal of Educational Data Mining, vol. 1, no. 1, pp. 3-16, 2009.
[4] A. AL-Malaise, A. Malibari and M. Alkhozae, "STUDENTS’ PERFORMANCE PREDICTION
SYSTEM USING MULTI AGENT DATA MINING TECHNIQUE," International Journal of Data
Mining & Knowledge Management Process (IJDKP) , vol. 4, 2014.
[5] C. Romero and S. Ventura, "Educational Data Mining: A Review of the State of the Art," IEEE
Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 40, no. 6,
2010.
[6] B. R.B., T. S.S and S. A.K, "Importance of Data Mining in Higher Education System," Journal Of
Humanities And Social Science (IOSR-JHSS), vol. 6, no. 6, pp. 18-21, 2013.
[7] J. Luan, "Data Mining and Knowledge Management in Higher Education -Potential Applications.," in
Processdings ofAIR Forum, Torento , Canada, 2002.
[8] B. Baradwaj and S. Pal, "Mining educational data to analyze student's performance," Internation
Journal od Advamced Computer Science and Applications, vol. 2, no. 6, pp. 63-69, 2012.
[9] A. Kumar and Vijayalakshmi, "Implication Of Classification Techniques In Predicting Student’s
Recital," International Journal of Data Mining & Knowledge Management Process, vol. 1, no. 5, pp.
41-51, 2011.
[10] S. Kotsiantis, "Educational data mining: a case study for predicting dropout-prone students,"
International Journal of Knowledge Engineering and Soft Data Paradigms, vol. 1, no. 2, p. 101, 2009.
[11] D. G. W, P. Mykola and V. J. M, "Predicting Students Drop Out: A Case Study," International
Working Group on Educational Data Mining, 2009.
[12] B. Jaroslav, H. Bydzovská, J. Géryk, T. Obsivac and L. Popelinsky, "Predicting Drop-Out from
Social Behaviour of Students," International Educational Data Mining Society, 2012.
[13] L. Rokach, Data mining with decision trees: theory and applications, vol. 69, World scientific, 2008.
[14] S. J. Russell and P. Norvig., Artificial Intelligence: A Modern Approach (AIMA), 3rd ed., Prentice
Hall, 2009.
[15] J. a. P. Han and Y. Jian and Yin, "Mining frequent patterns without candidate generation," in ACM
SIGMOD Record, 2000.