Educational data mining is an emerging field that mines academic data to solve various types of
problems. One such problem is the selection of a suitable academic track. The admission of a
student to an engineering college depends on many factors. In this paper we implement a
classification technique to help students predict their chances of admission to an engineering
stream. We analyzed a data set containing students' academic and socio-demographic variables, with
attributes such as family pressure, interest, gender, XII marks and CET rank in entrance
examinations, together with historical data from the previous batch of students. Feature selection
is a process for removing irrelevant and redundant features, which helps improve the predictive
accuracy of classifiers. We first used the feature selection algorithms Chi-square, InfoGain and
GainRatio to identify the relevant features, and then applied the fast correlation-based filter to
the selected features. Classification was then performed using NBTree, MultilayerPerceptron,
NaiveBayes and the instance-based k-nearest neighbor classifier. The results showed a reduction in
computational cost and time and an increase in predictive accuracy for the student model.
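As a rough illustration of the InfoGain and GainRatio rankings mentioned above, the following sketch scores two categorical attributes against an admission label. The toy records, the attribute values and the helper names (`entropy`, `info_gain`, `gain_ratio`) are assumptions made for this example; they are not taken from the paper, which does not show its data or implementation.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def info_gain(values, labels):
    """Reduction in label entropy obtained by splitting on an attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """Information gain normalised by the entropy of the attribute itself."""
    split_info = entropy(values)
    return info_gain(values, labels) / split_info if split_info > 0 else 0.0


# Hypothetical toy records; the attribute names echo the abstract only loosely.
records = [
    {"interest": "high", "gender": "F", "admitted": "yes"},
    {"interest": "high", "gender": "M", "admitted": "yes"},
    {"interest": "low", "gender": "F", "admitted": "no"},
    {"interest": "low", "gender": "M", "admitted": "no"},
]
labels = [r["admitted"] for r in records]

# Rank attributes from most to least informative.
ranking = sorted(
    ["interest", "gender"],
    key=lambda a: info_gain([r[a] for r in records], labels),
    reverse=True,
)
```

In this toy set, `interest` separates the classes perfectly while `gender` carries no information, so a filter of this kind would keep the former and drop the latter.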
A New Active Learning Technique Using Furthest Nearest Neighbour Criterion fo...ijcsa
Active learning is a supervised learning method based on the idea that a machine learning algorithm can achieve greater accuracy with fewer labelled training images if it is allowed to choose the images from which it learns. Facial age classification is a technique for classifying face images into one of several predefined age groups. The proposed study applies an active learning approach to facial age classification, allowing the classifier to select the data from which it learns. The classifier is initially trained on a small pool of labelled training images using bilateral two-dimensional linear discriminant analysis. The most informative unlabelled image is then selected from the unlabelled pool using the furthest nearest neighbour criterion, labelled by the user and added to the appropriate class in the training set. Incremental learning is performed using an incremental version of bilateral two-dimensional linear discriminant analysis. The study proposes applying this active learning paradigm to the k-nearest neighbour classifier and the support vector machine classifier and comparing the performance of the two.
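One plausible reading of the furthest nearest neighbour criterion is sketched below: for every unlabelled sample, measure the distance to its nearest labelled sample, then query the sample for which that distance is largest. This interpretation, the function name `furthest_nearest_neighbor` and the two-dimensional toy points are assumptions for illustration; the study itself works in a subspace produced by bilateral two-dimensional linear discriminant analysis, which is not reproduced here.

```python
from math import dist


def furthest_nearest_neighbor(unlabeled, labeled):
    """Return the index of the unlabeled point whose nearest labeled
    point is furthest away (Euclidean distance)."""

    def nearest_distance(point):
        return min(dist(point, other) for other in labeled)

    return max(range(len(unlabeled)), key=lambda i: nearest_distance(unlabeled[i]))


# Hypothetical feature vectors standing in for projected face images.
labeled = [(0.0, 0.0), (1.0, 0.0)]
unlabeled = [(0.1, 0.0), (0.5, 2.0), (1.0, 0.2)]

# Index of the sample that would be sent to the user for labelling.
pick = furthest_nearest_neighbor(unlabeled, labeled)
```

The selected sample lies far from everything already labelled, which is why it is treated as the most informative one to ask the user about.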
Clustering Students of Computer in Terms of Level of ProgrammingEditor IJCATR
Educational data mining (EDM) is one of the applications of data mining. It has two key domains: the student domain and the faculty domain. Different types of research have been done in both domains.
In the existing system, faculty performance is calculated on the basis of two parameters: student feedback and the students' results in the subject. The existing system defines two approaches, a multiple-classifier approach and a single-classifier approach, and compares them for the relative evaluation of faculty performance using data mining techniques. In the multiple-classifier approach, K-nearest neighbor (KNN) is used in the first step and rule-based classification in the second step, while in the single-classifier approach KNN is used in both steps.
In the proposed system, I will analyse faculty performance using four parameters: student complaints about faculty, student review feedback for faculty, student feedback, and student results.
For the proposed system I will use an opinion mining technique to analyse faculty performance and calculate a score for each faculty member.
With the emergence of virtualization and cloud computing technologies, many services are hosted on virtualization platforms. Virtualization is the technology that many cloud service providers rely on for efficient management and coordination of the resource pool. As essential services are also hosted on cloud platforms, it is necessary to ensure their continuous availability by implementing all necessary measures. Windows Active Directory is one such service, developed by Microsoft for Windows domain networks. It is included in Windows Server operating systems as a set of processes and services for authentication and authorization of users and computers in a Windows domain network. The service is required to run continuously without downtime. As a result, errors or garbage may accumulate, leading to software aging, which in turn may lead to system failure and its associated consequences. In this work, the software aging patterns of the Windows Active Directory service are studied. Software aging of Active Directory needs to be predicted properly so that rejuvenation can be triggered to ensure continuous service delivery. To predict the time accurately, a model based on a time series forecasting technique is built.
ADABOOST ENSEMBLE WITH SIMPLE GENETIC ALGORITHM FOR STUDENT PREDICTION MODELijcsit
Predicting student performance is a great concern for higher education management. This
prediction helps to identify and improve students' performance. Several factors may improve this
performance. In the present study, we employ data mining processes, particularly classification, to
enhance the quality of the higher education system. Recently, a new direction has been used to
improve classification accuracy by combining classifiers. In this paper, we design and evaluate a
fast learning algorithm using an AdaBoost ensemble with a simple genetic algorithm, called “Ada-GA”,
where the genetic algorithm is demonstrated to successfully improve the accuracy of the combined
classifier. The Ada-GA algorithm proved to be of considerable usefulness in identifying students at
risk early, especially in very large classes. This early prediction allows the instructor to
provide appropriate advising to those students. The Ada-GA algorithm was implemented and tested on
the ASSISTments dataset; the results showed that the algorithm successfully improved the detection
accuracy and reduced the complexity of computation.
STUDENTS’ PERFORMANCE PREDICTION SYSTEM USING MULTI AGENT DATA MINING TECHNIQUEIJDKP
High prediction accuracy for students’ performance helps to identify low-performing students at the beginning of the learning process. Data mining is used to attain this objective. Data mining techniques are used to discover models or patterns in data, and they are very helpful in decision-making. Boosting is one of the most popular techniques for constructing ensembles of classifiers to improve classification accuracy. Adaptive Boosting (AdaBoost) is one generation of boosting algorithm. It is used for binary classification and is not directly applicable to multiclass classification. The SAMME boosting technique extends AdaBoost to multiclass classification without reducing it to a set of binary sub-classification problems. In this paper, a students’ performance prediction system using Multi Agent Data Mining is proposed to predict the performance of students based on their data with high prediction accuracy and to provide help to low-performing students through optimization rules. The proposed system has been implemented and evaluated by investigating the prediction accuracy of the AdaBoost.M1 and LogitBoost ensemble classifier methods and of the C4.5 single classifier method. The results show that using the SAMME boosting technique improves the prediction accuracy and outperforms the C4.5 single classifier and LogitBoost.
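To make the multiclass extension concrete, the sketch below performs a single SAMME reweighting round. It uses the published SAMME classifier weight, alpha = ln((1 - err) / err) + ln(K - 1), where K is the number of classes; with K = 2 the extra term vanishes and the familiar binary AdaBoost weight remains. The function name `samme_round` and the toy inputs are assumptions for illustration, and the weak learner itself is left out.

```python
from math import exp, log


def samme_round(weights, correct, n_classes):
    """One SAMME boosting round.

    weights   -- current sample weights
    correct   -- True where the weak learner classified the sample correctly
    n_classes -- number of classes K
    Returns the weak learner's weight alpha and the renormalised sample weights.
    """
    err = sum(w for w, ok in zip(weights, correct) if not ok) / sum(weights)
    if not 0.0 < err < 1.0:
        raise ValueError("weighted error must lie strictly between 0 and 1")
    alpha = log((1.0 - err) / err) + log(n_classes - 1)
    # Only misclassified samples are up-weighted before renormalising.
    updated = [w if ok else w * exp(alpha) for w, ok in zip(weights, correct)]
    total = sum(updated)
    return alpha, [w / total for w in updated]


# Hypothetical round: four equally weighted samples, one misclassified, three classes.
alpha, new_weights = samme_round([0.25] * 4, [True, True, True, False], n_classes=3)
```

The extra ln(K - 1) term keeps alpha positive as long as the weak learner beats random guessing among K classes, rather than requiring an error below one half.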
Predicting students' performance using id3 and c4.5 classification algorithmsIJDKP
An educational institution needs to have an approximate prior knowledge of enrolled students to predict
their performance in future academics. This helps them to identify promising students and also provides
them an opportunity to pay attention to and improve those who would probably get lower grades. As a
solution, we have developed a system which can predict the performance of students from their previous
performances using concepts of data mining techniques under Classification. We have analyzed the data
set containing information about students, such as gender, marks scored in the board examinations of
classes X and XII, marks and rank in entrance examinations and results in first year of the previous batch
of students. By applying the ID3 (Iterative Dichotomiser 3) and C4.5 classification algorithms on this data,
we have predicted the general and individual performance of freshly admitted students in future
examinations.
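The ID3 procedure named above can be sketched in a few lines: at each node, pick the attribute with the highest information gain, split on its values and recurse. The records, attribute values and function names below are hypothetical and are not the authors' data or code; C4.5 refinements such as gain ratio, continuous attributes and pruning are not covered.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def id3(rows, attributes, target):
    """Build a decision tree as nested dicts; leaves are class labels."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:  # pure node
        return labels[0]
    if not attributes:  # nothing left to split on: majority class
        return Counter(labels).most_common(1)[0][0]

    def gain(attr):
        remainder = 0.0
        for value in set(r[attr] for r in rows):
            subset = [r[target] for r in rows if r[attr] == value]
            remainder += len(subset) / len(rows) * entropy(subset)
        return entropy(labels) - remainder

    best = max(attributes, key=gain)
    rest = [a for a in attributes if a != best]
    return {
        best: {
            value: id3([r for r in rows if r[best] == value], rest, target)
            for value in set(r[best] for r in rows)
        }
    }


def predict(tree, row):
    """Follow the tree until a leaf label is reached (raises KeyError on unseen values)."""
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr][row[attr]]
    return tree


# Hypothetical records of a previous batch.
rows = [
    {"gender": "F", "xii_marks": "high", "result": "pass"},
    {"gender": "M", "xii_marks": "high", "result": "pass"},
    {"gender": "F", "xii_marks": "low", "result": "fail"},
    {"gender": "M", "xii_marks": "low", "result": "fail"},
]
tree = id3(rows, ["gender", "xii_marks"], "result")
```

A new student record can then be passed to `predict(tree, record)` to obtain the predicted class.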
Fuzzy Association Rule Mining based Model to Predict Students’ Performance IJECEIAES
The main aim of higher education institutions is to provide quality education to their students. One way to reach the highest level of quality in the higher education system is to discover knowledge for predictions regarding internal assessment and end-semester examinations. The proposed work approaches this objective by using a fuzzy inference technique to classify student score data according to performance level. In this paper, student performance is evaluated using fuzzy association rule mining to predict performance at the end of the semester on the basis of data such as attendance, mid-semester marks, previous semester marks and previous academic records collected from the students’ previous database, in order to identify students who need individual attention, reduce the failure ratio and take suitable action before the next semester examination.
Using ID3 Decision Tree Algorithm to the Student Grade Analysis and Predictionijtsrd
Data mining techniques play an important role in data analysis. To construct a classification model that can predict the performance of students, particularly in engineering branches, a decision tree algorithm associated with data mining techniques was used in this research. A number of factors may affect the performance of students. Data mining technology relates well to student grades, and we also used classification algorithms for prediction. In this paper, we used educational data mining to predict students' final grades based on their performance. We propose student data classification using the ID3 (Iterative Dichotomiser 3) decision tree algorithm.
Khin Khin Lay | San San Nwe "Using ID3 Decision Tree Algorithm to the Student Grade Analysis and Prediction" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-5, August 2019, URL: https://www.ijtsrd.com/papers/ijtsrd26545.pdf Paper URL: https://www.ijtsrd.com/computer-science/data-miining/26545/using-id3-decision-tree-algorithm-to-the-student-grade-analysis-and-prediction/khin-khin-lay
Association rule discovery for student performance prediction using metaheuri...csandit
With the increasing use of data mining techniques to improve the operation of educational
systems, Educational Data Mining has been introduced as a new and fast-growing research
area. Educational Data Mining aims to analyze data in educational environments in order to
solve educational research problems. In this paper a new associative classification technique
is proposed to predict students' final performance. Unlike several machine learning
approaches such as ANNs and SVMs, associative classifiers maintain interpretability along
with high accuracy. In this research work, we employ Honeybee Colony Optimization
and Particle Swarm Optimization to extract association rules for student performance prediction
as a multi-objective classification problem. Results indicate that the proposed swarm-based
algorithm outperforms well-known classification techniques on the student performance prediction
classification problem.
Data mining refers to extracting hidden predictive information from large data sets. Recently, a number of private institutions have come into existence, and they put effort into securing admissions. In this paper, data mining techniques are used to analyze the mindset of students after matriculation. WEKA (Waikato Environment for Knowledge Analysis), one of the best tools for data mining, is used to carry out the analysis.
A Study on Learning Factor Analysis – An Educational Data Mining Technique fo...iosrjce
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
PREDICTING ACADEMIC MAJOR OF STUDENTS USING BAYESIAN NETWORKS TO THE CASE OF ...ijcax
This study took place in the current year in the city of Maragheh, Iran. A number of high school students in the following fields of study were studied and compared: mathematics, experimental sciences, humanities, vocational, business and science. The purpose of this research is to predict the academic major of high school students using Bayesian networks. The factors that influence the selection of an academic major are used for the first time as indicators in Bayesian networks. Evaluation of the impact of the indicators on each other, data discretization and data processing were performed with GeNIe. The appropriate course is then recommended to students for continuing their education.
Data Mining Application in Advertisement Management of Higher Educational Ins...ijcax
In recent years, competition among Indian higher educational institutes to attract students for enrollment has grown rapidly. To attract students, educational institutes select the best advertisement method. Many different advertisement media are available in the market, but choosing among them is very difficult for institutes. This paper helps institutes select the best advertisement medium using data mining methods.
Predictive models are quasi-experimental structures used to determine future
patterns in data. These meaningful data patterns form the building blocks of any
decision support system. Researchers all over the world have built many prediction
models for major industries. Research work in the educational sector has increased
steeply, possibly owing to the high availability of data in the educational domain.
This survey reviews a selection of the literature on academic performance prediction
for engineering students, with a focus on grade prediction. Meaningful
interpretations are made, and inferences are presented at the end of the paper.
A Model for Predicting Students’ Academic Performance using a Hybrid of K-mea...Editor IJCATR
Higher learning institutions nowadays operate in a more complex and competitive environment due to high demand from prospective
students and an increasing number of universities, both public and private. University management faces the challenge of
predicting students’ academic performance in order to put mechanisms in place early enough to improve it. This research aims to
employ decision tree and K-means data mining algorithms to model an approach for predicting the performance of students in advance,
so as to devise mechanisms for reducing student dropout rates and improving performance. In Kenya, for example, student
enrolment in universities has increased since the Government introduced free primary education. The Government therefore
expects an increased workforce of professionals from these institutions, without compromising quality, so as to achieve its millennium
development goals and Vision 2030. The backlog of students not finishing their studies in the stipulated time due to poor performance is another
issue that can be addressed using the results of this research, since predicting student performance in advance will enable university
management to devise ways of assisting weak students and to make better decisions on how to select students for particular courses.
Previous studies in educational data mining have mostly focused on factors affecting students’ performance and have used
different algorithms to predict it. In all this research, prediction accuracy is key and is what researchers
seek to improve.
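The abstract does not say how K-means and the decision tree are combined. One common arrangement, assumed here purely for illustration, is to cluster students by marks first and then use the resulting cluster labels as classes for a tree learner. The sketch below covers only the clustering step on one-dimensional marks; the function name `kmeans_1d`, the marks and the initial centroids are all hypothetical.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Plain k-means on one-dimensional data with caller-supplied initial centroids.

    Returns the final centroids and the cluster index assigned to each value.
    """

    def nearest(value, centres):
        return min(range(len(centres)), key=lambda i: abs(value - centres[i]))

    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            clusters[nearest(v, centroids)].append(v)
        # Keep the old centroid if a cluster happens to be empty.
        centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids, [nearest(v, centroids) for v in values]


# Hypothetical end-of-semester marks and initial guesses for "weak" and "strong" groups.
marks = [35, 40, 42, 78, 82, 85]
centroids, groups = kmeans_1d(marks, centroids=[30.0, 90.0])
```

Under this assumed pipeline, the entries of `groups` would serve as the target column when training a decision tree on the students' other attributes.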
DATA MINING METHODOLOGIES TO STUDY STUDENT'S ACADEMIC PERFORMANCE USING THE...ijcsa
The study places particular emphasis on the so-called data mining algorithms, but focuses the bulk of
attention on the C4.5 algorithm. Each educational institution, in general, aims to provide a high quality of
education. This depends upon identifying students with poor results before they enter the final
examination. Data mining techniques offer many tasks that can be used to investigate students'
performance. The main objective of this paper is to build a classification model that can be used to improve
the students' academic records in the Faculty of Mathematical Science and Statistics. This model has been
built using the C4.5 algorithm, as it is a well-known, commonly used data mining technique. The
importance of this study is that predicting student performance is useful in many different settings. Data
from previous students' academic records in the faculty have been used to illustrate the considered
algorithm in order to build our classification model.
ANALYSIS OF STUDENT ACADEMIC PERFORMANCE USING MACHINE LEARNING ALGORITHMS:– ...indexPub
Student academic performance is of great value to institutes, universities and colleges. All colleges focus primarily on the career development of students, and academic performance plays a vital role in establishing a bright career. Better academic performance leads to better student placement, which in turn is reflected in better admissions and prospects. Machine learning can be deployed to predict student performance. This article discusses various algorithms that can be used for predicting student academic performance, along with their methods, predictive features and accuracy. The primary factors used for predicting student performance are academic institution, sessional marks, semester progress, family occupation, methods and algorithms. The accuracy levels of various machine learning algorithms are also discussed.
Performance Evaluation of Feature Selection Algorithms in Educational Data Mi...IIRindia
Educational data mining (EDM) is a prominent field concerned with developing methods for exploring the unique and increasingly large-scale data that come from educational settings, and with using those methods to better understand students and the settings in which they learn. Various studies, including a previous study by the authors, have shown that data mining techniques find widespread application in the educational decision-making process for improving the performance of students in higher educational institutions. Classification techniques are of significant importance in machine learning tasks and are mostly employed in prediction-related problems. In machine learning problems, feature selection techniques are used to reduce the number of attributes by removing redundant and irrelevant features from the dataset. The aim of this research work is to compare the performance of various feature selection techniques, using the WEKA tool, in predicting students’ performance in the final semester examination with different classification algorithms. In particular, J48, Naïve Bayes, Bayes Net, IBk, OneR and JRip are used. The dataset for the study was collected from the student performance reports of a private college in the state of Tamil Nadu, India. The effectiveness of various feature selection algorithms was compared across six classifiers, and the results are discussed. The results of this study show that the accuracy of IBk is 99.680%, which is found to be
Data Mining Techniques for School Failure and Dropout SystemKumar Goud
Abstract: Data mining techniques are applied to predict school failure and dropout of students. The method uses real data on middle-school students to predict failure and dropout. It implements white-box classification strategies, such as induction rules and decision trees. A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences. It is a flowchart-like structure in which each internal node represents a "test" on an attribute, each branch represents the outcome of the test and each leaf node represents a class label. The attributes are the real information about students collected from the school in middle education. The paths from root to leaf represent classification rules. A decision tree consists of three kinds of nodes: decision nodes, chance nodes and end nodes. It is used specifically in decision analysis. The technique is applied to improve accuracy in predicting which students might fail or drop out by first using all the available attributes and then choosing the most effective attributes. Attribute selection is done using the WEKA tool.
Keywords: dataset, classification, clustering.
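The tree structure described in the abstract above (internal nodes as attribute tests, branches as outcomes, leaves as class labels, root-to-leaf paths as rules) can be illustrated with a minimal Python sketch. The attribute names, values and labels below are hypothetical and are not taken from the paper's dataset.

```python
# Hypothetical toy tree: attribute names, branch values and labels are
# illustrative assumptions, not the attributes used in the paper.
TREE = {
    "attribute": "attendance",
    "branches": {
        "low": {"label": "fail"},
        "high": {
            "attribute": "prior_grade",
            "branches": {
                "low": {"label": "fail"},
                "high": {"label": "pass"},
            },
        },
    },
}


def classify(node, record):
    """Follow the branch matching each attribute test until a leaf is reached."""
    while "label" not in node:
        value = record[node["attribute"]]
        node = node["branches"][value]
    return node["label"]


def extract_rules(node, conditions=()):
    """Return one (conditions, label) rule per root-to-leaf path."""
    if "label" in node:
        return [(conditions, node["label"])]
    rules = []
    for value, child in node["branches"].items():
        test = (node["attribute"], value)
        rules.extend(extract_rules(child, conditions + (test,)))
    return rules


if __name__ == "__main__":
    student = {"attendance": "high", "prior_grade": "low"}
    print(classify(TREE, student))
    for conditions, label in extract_rules(TREE):
        tests = " AND ".join(f"{a} = {v}" for a, v in conditions)
        print(f"IF {tests} THEN {label}")
```

Each printed rule corresponds to one path through the tree, which is what makes this kind of model "white-box": the prediction can be read directly as an if-then statement.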
Fuzzy Association Rule Mining based Model to Predict Students’ Performance IJECEIAES
The main aim of higher education institutions is to provide quality education to their students. One way to achieve the highest level of quality in a higher education system is to discover knowledge for predicting internal assessment and end-semester examination results. The proposed work approaches this objective by using a fuzzy inference technique to classify student score data according to performance level. In this paper, students’ performance is evaluated using fuzzy association rule mining, which predicts performance at the end of the semester from previous records such as attendance, mid-semester marks, previous semester marks and previous academic records, collected from the students’ database. The goal is to identify students who need individual attention, in order to reduce the failure ratio and take suitable action before the next semester examination.
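As a rough illustration of how fuzzy association rules can be scored, the sketch below computes fuzzy support and confidence over student records. The abstract does not give its formulation, so everything here is an assumption: the ramp-shaped membership function, the fuzzy set names and thresholds, and the use of the minimum as the conjunction operator.

```python
def ramp_up(x, low, high):
    """Membership that rises linearly from 0 at `low` to 1 at `high`."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


# Hypothetical fuzzy sets; names and thresholds are illustrative only.
FUZZY_SETS = {
    "attendance_high": lambda r: ramp_up(r["attendance"], 60, 100),
    "midsem_high": lambda r: ramp_up(r["midsem"], 40, 80),
}


def fuzzy_support(records, items):
    """Mean, over all records, of the minimum membership across `items`."""
    total = sum(min(FUZZY_SETS[i](r) for i in items) for r in records)
    return total / len(records)


def fuzzy_confidence(records, antecedent, consequent):
    """Support of the whole rule divided by support of its antecedent."""
    base = fuzzy_support(records, antecedent)
    if base == 0:
        return 0.0
    return fuzzy_support(records, antecedent + consequent) / base


if __name__ == "__main__":
    records = [
        {"attendance": 90, "midsem": 80},
        {"attendance": 70, "midsem": 40},
    ]
    print(fuzzy_support(records, ["attendance_high"]))
    print(fuzzy_confidence(records, ["attendance_high"], ["midsem_high"]))
```

Unlike crisp rule mining, a student with 70% attendance contributes partially to the "high attendance" set rather than being counted as fully in or out.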
Using ID3 Decision Tree Algorithm to the Student Grade Analysis and Predictionijtsrd
Data mining techniques play an important role in data analysis. In this research, a decision tree algorithm is used to construct a classification model that can predict the performance of students, particularly in engineering branches. A number of factors may affect the performance of students. Data mining technology relates well to student grade data, and classification algorithms are used for prediction. In this paper, we use educational data mining to predict students' final grades based on their performance. We propose student data classification using the ID3 (Iterative Dichotomiser 3) decision tree algorithm. Khin Khin Lay | San San Nwe "Using ID3 Decision Tree Algorithm to the Student Grade Analysis and Prediction" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-5, August 2019, URL: https://www.ijtsrd.com/papers/ijtsrd26545.pdf Paper URL: https://www.ijtsrd.com/computer-science/data-miining/26545/using-id3-decision-tree-algorithm-to-the-student-grade-analysis-and-prediction/khin-khin-lay
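ID3 chooses the attribute to split on by information gain, that is, the reduction in label entropy after partitioning the data on that attribute. The sketch below shows only that selection step in Python; the records and attribute names are made up for illustration and are not from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels).values()
    return -sum((c / total) * math.log2(c / total) for c in counts)


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def best_attribute(rows, attributes, target):
    """ID3 split choice: the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


if __name__ == "__main__":
    # Hypothetical student records (illustrative only).
    rows = [
        {"attendance": "high", "gender": "M", "grade": "pass"},
        {"attendance": "high", "gender": "F", "grade": "pass"},
        {"attendance": "low", "gender": "M", "grade": "fail"},
        {"attendance": "low", "gender": "F", "grade": "fail"},
    ]
    print(best_attribute(rows, ["attendance", "gender"], "grade"))
```

A full ID3 implementation would apply `best_attribute` recursively to each partition until the labels in a partition are pure or no attributes remain.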
Association rule discovery for student performance prediction using metaheuri...csandit
According to the increase of using data mining techniques in improving educational systems operations, Educational Data Mining has been introduced as a new and fast growing research area. Educational Data Mining aims to analyze data in educational environments in order to solve educational research problems. In this paper a new associative classification technique has been proposed to predict students final performance. Despite of several machine learning approaches such as ANNs, SVMs, etc. associative classifiers maintain interpretability along with high accuracy. In this research work, we have employed Honeybee Colony Optimization and Particle Swarm Optimization to extract association rule for student performance prediction as a multi-objective classification problem. Results indicate that the proposed swarm based algorithm outperforms well-known classification techniques on student performance prediction classification problem.
Data mining refers to extracting hidden predictive information from huge data sets. Recently, a number of private institutions have come into existence, and they put considerable effort into securing admissions. In this paper, data mining techniques are used to analyze the mindset of students after matriculation. WEKA (Waikato Environment for Knowledge Analysis), one of the best-known data mining tools, is used to carry out the analysis.
A Study on Learning Factor Analysis – An Educational Data Mining Technique fo...iosrjce
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
PREDICTING ACADEMIC MAJOR OF STUDENTS USING BAYESIAN NETWORKS TO THE CASE OF ...ijcax
This study took place in the current year in the city of Maragheh, Iran. A number of high school students in the fields of mathematics, experimental sciences, humanities, vocational studies, business and science were studied and compared. The purpose of this research is to predict the academic major of high school students using Bayesian networks. The factors that influence the choice of academic major are used for the first time as indicators in a Bayesian network. Evaluation of the impact of the indicators on each other, data discretization and data processing were performed with GeNIe. The model advises students on a suitable course in which to continue their education.
Data Mining Application in Advertisement Management of Higher Educational Ins...ijcax
In recent years, competition among Indian higher educational institutes to attract students for enrollment has grown rapidly. To attract students, educational institutes try to select the best advertisement method. Many advertisement media are available in the market, but choosing among them is very difficult for institutes. This paper helps institutes select the best advertisement medium using data mining methods.
Predictive models are quasi-experimental structures used to determine future
patterns in data. These meaningful data patterns form the building blocks of any
decision support system. Researchers all over the world have built many prediction
models for major industries. Research work in the educational sector has increased
steeply. This steep increase may be due to the high availability of data in the
educational domain. This survey reviews a selection of the literature on
academic performance prediction of engineering students, with a focus on grade
prediction. Meaningful interpretations have been made and inferences are presented
at the end of this paper.
DATA MINING METHODOLOGIES TO STUDY STUDENT'S ACADEMIC PERFORMANCE USING THE...ijcsa
The study places particular emphasis on so-called data mining algorithms, but focuses the bulk of its attention on the C4.5 algorithm. Each educational institution, in general, aims to provide a high quality of education. This depends on identifying students likely to obtain poor results before they enter the final examination. Data mining techniques offer many tasks that can be used to investigate students' performance. The main objective of this paper is to build a classification model that can be used to improve the students' academic records in the Faculty of Mathematical Science and Statistics. This model has been built using the C4.5 algorithm, as it is a well-known, commonly used data mining technique. The importance of this study is that predicting student performance is useful in many different settings. Data from previous students' academic records in the faculty have been used to illustrate the considered algorithm in order to build our classification model.
ANALYSIS OF STUDENT ACADEMIC PERFORMANCE USING MACHINE LEARNING ALGORITHMS:– ...indexPub
Student academic performance is of great value to institutes, universities and colleges. All colleges focus mainly on the career development of students, and academic performance plays a vital role in establishing a bright career. Better academic performance leads to better placement of students, which in turn is reflected in better admissions and a better future. Machine learning can be deployed to predict student performance. This article discusses various algorithms that can be deployed for predicting student academic performance, along with the methods, predictive features and accuracy of these machine learning algorithms. The primary factors used for predicting students' performance are academic institution, sessional marks, semester progress, family occupation, methods and algorithms. The accuracy level of the various machine learning algorithms is also discussed.
Performance Evaluation of Feature Selection Algorithms in Educational Data Mi...IIRindia
Educational Data Mining (EDM) is a prominent field concerned with developing methods for exploring the unique and increasingly large-scale data that come from educational settings, and with using those methods to better understand students and the settings in which they learn. Various studies, including a previous study by the authors, have shown that data mining techniques find widespread application in the educational decision-making process for improving the performance of students in higher educational institutions. Classification techniques assume significant importance in machine learning tasks and are mostly employed in prediction-related problems. In machine learning problems, feature selection techniques are used to reduce the number of attributes by removing redundant and irrelevant features from the dataset. The aim of this research work is to compare, using the WEKA tool, the performance of various feature selection techniques in predicting students’ performance in the final semester examination with different classification algorithms. In particular, J48, Naïve Bayes, Bayes Net, IBk, OneR, and JRip are used in this research work. The dataset for the study was collected from the student performance reports of a private college in the state of Tamil Nadu, India. The effectiveness of the various feature selection algorithms was compared across the six classifiers and the results are discussed. The results of this study show that the accuracy of IBk is 99.680%, which is found to be
ASSOCIATION RULE DISCOVERY FOR STUDENT PERFORMANCE PREDICTION USING METAHEURI...cscpconf
According to the increase of using data mining techniques in improving educational systems
operations, Educational Data Mining has been introduced as a new and fast growing research
area. Educational Data Mining aims to analyze data in educational environments in order to
solve educational research problems. In this paper a new associative classification technique
has been proposed to predict students final performance. Despite of several machine learning
approaches such as ANNs, SVMs, etc. associative classifiers maintain interpretability along
with high accuracy. In this research work, we have employed Honeybee Colony Optimization
and Particle Swarm Optimization to extract association rule for student performance prediction
as a multi-objective classification problem. Results indicate that the proposed swarm based
algorithm outperforms well-known classification techniques on student performance prediction
classification problem.
The increasing need for data-driven decision making has recently resulted in the application of data mining in various fields, including the educational sector, where it is referred to as educational data mining. The need to improve the performance of data mining models has also been identified as a gap for future research. In Nigeria, higher educational institutions collect various student data, but these data are rarely used in decision or policy making to improve the academic performance of students. This research work attempts to improve the performance of data mining models for predicting students’ academic performance using a stacking classifier ensemble and the synthetic minority over-sampling technique. The research was conducted by adopting and evaluating the performance of the J48, IBk and SMO classifiers. The individual classifier models, the standard stacking classifier ensemble model and the modified stacking classifier ensemble model were trained and tested on a data set of 206 students from the Faculty of Science, Federal University Dutse. Students’ previous academic performance records in the Unified Tertiary Matriculation Examination and the Senior Secondary Certificate Examination, together with their first-year Cumulative Grade Point Average, are used as data inputs in the WEKA 3.9.1 data mining tool to predict students’ class of degree at undergraduate level. The results show that applying the synthetic minority over-sampling technique for class balancing improves the performance of all the models, with the proposed modified stacking classifier ensemble model outperforming the other classifier models in both accuracy and RMSE, making it the best model.
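The class-balancing step mentioned above can be sketched in a few lines of Python. This is a simplified illustration of the interpolation idea behind synthetic minority over-sampling, not the WEKA filter used in the study: the function name `smote_like`, its parameters and the sample points are all hypothetical.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: squared_distance(p, base))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


if __name__ == "__main__":
    # Hypothetical minority-class samples, e.g. (entry score, first-year CGPA).
    minority = [(1.0, 1.0), (2.0, 2.0), (3.0, 1.0)]
    for point in smote_like(minority, n_new=4):
        print(point)
```

Because every synthetic point lies on a segment between two real minority samples, the minority class is enlarged without simply duplicating existing records.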
With the growth of voluminous amounts of data in educational institutes, there is a need to mine these large datasets to produce useful information. In this research we focus on building a decision support system for educational institutes that can help them assess the placement possibilities of students. Our research is not limited to estimating placement possibility: we also carried out a multi-level analysis of a student performance dataset to predict which level of the interview process a student is likely to pass. For this we applied Naïve Bayes and an Improved Naïve Bayes, which is integrated with the Relief feature selection technique, to obtain the predictions. Data analysis was done using NetBeans and WEKA. Our proposed technique achieved an accuracy of 84.7%, better than the existing Naïve Bayes, which achieved 80.96%.
Data mining approach to predict academic performance of studentsBOHRInternationalJou1
Powerful data mining techniques are available in a variety of educational fields. Educational research is
advancing rapidly due to the vast amount of student data that can be used to create insightful patterns
related to student learning. Educational data mining is a tool that helps universities assess and identify student
performance. Well-known classification techniques have been widely used in data mining to determine student
success. A decisive and growing research area in educational data mining (EDM) is predicting student
academic performance. This area uses data mining and machine learning approaches to extract data from
education repositories. According to relevant research, there are several academic performance prediction
methods aimed at supporting administrative and teaching staff in academic institutions. In the proposed
approach, the collected data set is preprocessed to ensure data quality, and labeled student education data
is used to train ANN classifiers, support vector classifiers, random forests, and decision trees (DT).
The performance of the four classifiers is measured by accuracy, receiver operating characteristic (ROC)
curve, F1 score, and the confusion matrix of each model. Finally, we found that the top three algorithmic
models had an accuracy of 86–95%, an F1 score of 85–95%, and an average area under the ROC curve of
OVA of 98–99.6%.
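The accuracy and F1 score named above are both derived from the confusion matrix. The following Python sketch shows the standard binary-class definitions; the label vectors are invented for illustration and are not results from the paper.

```python
def confusion_counts(y_true, y_pred, positive):
    """Return (tp, fp, fn, tn) for one class treated as positive."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


if __name__ == "__main__":
    # Hypothetical labels: 1 = student passes, 0 = student fails.
    y_true = [1, 1, 0, 0, 1]
    y_pred = [1, 0, 0, 1, 1]
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive=1)
    print("confusion (tp, fp, fn, tn):", (tp, fp, fn, tn))
    print("accuracy:", accuracy(tp, fp, fn, tn))
    print("F1:", f1_score(tp, fp, fn))
```

For a multi-class problem evaluated one-vs-all, the same counts would be computed once per class, treating that class as positive and all others as negative.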
Student Performance Evaluation in Education Sector Using Prediction and Clust...IJSRD
Data mining is a crucial step in discovering previously unknown information from large relational databases. Various techniques and algorithms are used in data mining, such as association rules, clustering, classification and prediction. Each of these techniques has particular characteristics and behaviour. This paper focuses primarily on clustering and prediction techniques. Nowadays the amount of data stored in educational databases is increasing rapidly. Data for a particular set of students was collected, clustering and prediction were carried out in detail, and the results were produced. The K-means clustering algorithm is used here to group similar students into the nearest possible cluster. Performance in higher education is a turning point for all students in India. This academic performance is influenced by various factors; therefore, to distinguish high learners from slow learners, it is important to develop a predictive data mining model of student performance.
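As an illustration of the K-means step, the sketch below implements the basic assign-and-update loop in plain Python. It is a minimal version under stated assumptions: the initial centroids are simply the first k points, and the student scores are invented for the example.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=100):
    """Basic K-means: assign each point to its nearest centroid, then move
    each centroid to the mean of its cluster, until nothing changes."""
    centroids = [points[i] for i in range(k)]  # naive init: first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
            if cluster else centroids[i]  # keep the old centroid if cluster is empty
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


if __name__ == "__main__":
    # Hypothetical (internal marks, attendance score) pairs on a 0-10 scale.
    students = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (9.0, 8.0)]
    centroids, clusters = kmeans(students, k=2)
    print(centroids)
    print(clusters)
```

With two well-separated groups such as these, the clusters can be read as slow learners and high learners; real data would usually need feature scaling and a better initialisation than the first k points.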
The journal publishes original works with practical significance and academic value. Authors are invited to submit theoretical or empirical papers in all aspects of management, including strategy, human resources, marketing, operations, technology, information systems, finance and accounting, business economics, and public sector management.
The Architecture of System for Predicting Student Performance based on the Da...Thada Jantakoon
The goals of this study are to develop the architecture of a system for predicting student performance based on data science approaches (SPPS-DSA Architecture) and to evaluate that architecture. The research process is divided into two stages: (1) context analysis and (2) development and assessment. The data are analyzed statistically by means and standard deviations. The research findings suggest that the SPPS-DSA architecture consists of three key components: (i) data source, (ii) machine learning methods and attributes, and (iii) data science process. Overall, the SPPS-DSA architecture is rated at the highest level of appropriateness. Predicting student performance helps educators and students improve their teaching and learning processes. Methods for predicting student performance using various analytical techniques are reviewed here. Most researchers used CGPA and internal assessment as data sets. In terms of prediction methods, classification is widely used in educational data science. Among classification techniques, researchers most commonly used neural networks and decision trees to predict student performance.
A Survey on the Classification Techniques In Educational Data MiningEditor IJCATR
Due to increasing interest in data mining and educational systems, educational data mining is an emerging topic for the research
community. Educational data mining means extracting hidden knowledge from large repositories of data with the use of techniques
and tools. Educational data mining develops new methods to discover knowledge from educational databases, which is then used for decision
making in the educational system. Various data mining techniques, such as classification and clustering, can be applied to bring out hidden
knowledge from educational data.
In this paper, we focus on educational data mining and classification techniques. In this study we analyze attributes for the
prediction of students' behavior and academic performance using the WEKA open source data mining tool and various classification
methods such as decision trees, the C4.5 algorithm and the ID3 algorithm.
Vehicle Ad Hoc Networks (VANETs) have become a viable technology to improve traffic flow and safety on the roads. Due to its effectiveness and scalability, the Wingsuit Search-based Optimised Link State Routing Protocol (WS-OLSR) is frequently used for data distribution in VANETs. However, the selection of MultiPoint Relays (MPRs) plays a pivotal role in WS-OLSR's performance. This paper presents an improved MPR selection algorithm tailored to WS-OLSR, designed to enhance overall routing efficiency and reduce overhead. The analysis found that the current OLSR protocol has problems such as redundancy of HELLO and TC message packets and failure to update routing information in time, so a WS-OLSR routing protocol based on an improved MPR selection algorithm is proposed. Firstly, factors such as node mobility and link changes are comprehensively considered to reflect network topology changes, and the broadcast cycle of node HELLO messages is controlled according to those topology changes. Secondly, a new MPR selection algorithm is proposed that takes link stability and node characteristics into account. Finally, its effectiveness is evaluated in terms of packet delivery ratio, end-to-end delay, and control message overhead. Simulation results demonstrate the superior performance of our improved MPR selection algorithm compared to traditional approaches.
A Novel Medium Access Control Strategy for Heterogeneous Traffic in Wireless ...IJCNCJournal
So far, Wireless Body Area Networks (WBANs) have played a pivotal role in driving the development of intelligent healthcare systems with broad applicability across various domains. Each WBAN consists of one or more types of sensors that can be embedded in clothing, attached directly to the body, or even implanted beneath an individual's skin. These sensors typically serve a single application. However, the traffic generated by each sensor may have distinct requirements. This diversity necessitates a dual approach: tailored treatment based on the specific needs of each traffic type and the fulfillment of application requirements, such as reliability and timeliness. Nevertheless, the presence of energy constraints and the unreliable nature of wireless communications make QoS provisioning under such networks a non-trivial task. In this context, the current paper introduces a novel Medium Access Control (MAC) strategy for the regular traffic applications of WBANs, designed to significantly enhance efficiency when compared to the established MAC protocols IEEE 802.15.4 and IEEE 802.15.6, with a particular focus on improving reliability, timeliness, and energy efficiency.
May_2024 Top 10 Read Articles in Computer Networks & Communications.pdfIJCNCJournal
The International Journal of Computer Networks & Communications (IJCNC) is a bi monthly open access peer-reviewed journal that publishes articles which contribute new results in all areas of Computer Networks & Communications. The journal focuses on all technical and practical aspects of Computer Networks & data Communications. The goal of this journal is to bring together researchers and practitioners from academia and industry to focus on advanced networking concepts and establishing new collaborations in these areas.
A Topology Control Algorithm Taking into Account Energy and Quality of Transm...IJCNCJournal
The efficient use of energy in wireless sensor networks is critical for extending node lifetime. The network topology is one of the factors that have a significant impact on the energy usage at the nodes and the quality of transmission (QoT) in the network. In this paper, we propose a topology control algorithm for software-defined wireless sensor networks (SDWSNs). Our method formulates topology control as a nonlinear programming (NP) problem with the objective of optimizing two metrics: maximum communication range and desired degree. This NP problem is solved at the SDWSN controller by employing a genetic algorithm (GA) to determine the best topology. The simulation results show that the proposed algorithm outperforms the MaxPower algorithm in terms of average node degree and energy expansion ratio.
Multi-Server user Authentication Scheme for Privacy Preservation with Fuzzy C...IJCNCJournal
The integration of artificial intelligence technology with a scalable Internet of Things (IoT) platform facilitates diverse smart communication services, allowing remote users to access services from anywhere at any time. The multi-server environment within IoT introduces a flexible security service model, enabling users to interact with any server through a single registration. To ensure secure and privacy preservation services for resources, an authentication scheme is essential. Zhao et al. recently introduced a user authentication scheme for the multi-server environment, utilizing passwords and smart cards, claiming resilience against well-known attacks. This paper conducts cryptanalysis on Zhao et al.'s scheme, focusing on denial of service and privacy attacks, revealing a lack of user-friendliness. Subsequently, we propose a new multi-server user authentication scheme for privacy preservation with fuzzy commitment over the IoT environment, addressing the shortcomings of Zhao et al.'s scheme. Formal security verification of the proposed scheme is conducted using the ProVerif simulation tool. Through both formal and informal security analyses, we demonstrate that the proposed scheme is resilient against various known attacks and those identified in Zhao et al.'s scheme.
Advanced Privacy Scheme to Improve Road Safety in Smart Transportation SystemsIJCNCJournal
In a Vehicular Ad-Hoc Network (VANET), vehicles continuously exchange spatiotemporal data with neighboring vehicles, thereby establishing a comprehensive 360-degree traffic awareness system. Vehicular network safety applications facilitate the transmission of messages at regular intervals between vehicles that are near each other, enhancing drivers' contextual understanding of the driving environment and significantly improving traffic safety. Privacy schemes in VANETs are vital to safeguard the identities of vehicles and of their owners or drivers. Privacy schemes prevent unauthorized parties from linking a vehicle's communications to a specific real-world identity by employing techniques such as pseudonyms, randomization, or cryptographic protocols. Nevertheless, these communications frequently contain important vehicle information that malicious groups could use to monitor the vehicle over a long period. Acquiring this shared data could make it possible to reconstruct vehicle trajectories, thereby posing a risk to the privacy of the driver. Developing effective and scalable privacy-preserving protocols for communication in vehicle networks is therefore a critical challenge of the highest priority. These protocols aim to reduce the transmission of confidential data while ensuring the required level of communication. This paper proposes an Advanced Privacy Vehicle Scheme (APV) that periodically changes pseudonyms to protect vehicle identities and improve privacy. The APV scheme uses a concept called the silent period, which involves changing the pseudonym of a vehicle periodically based on the tracking of neighboring vehicles. The pseudonym is a temporary identifier that vehicles use to communicate with each other in a VANET. By changing the pseudonym regularly, the APV scheme makes it difficult for unauthorized entities to link a vehicle's communications to its real-world identity.
The proposed APV is compared to the SLOW, RSP, CAPS, and CPN techniques. The data indicate that APV improves on these techniques in privacy metrics and offers enhanced safety for vehicles during transportation in the smart city.
April 2024 - Top 10 Read Articles in Computer Networks & CommunicationsIJCNCJournal
The International Journal of Computer Networks & Communications (IJCNC) is a bi monthly open access peer-reviewed journal that publishes articles which contribute new results in all areas of Computer Networks & Communications. The journal focuses on all technical and practical aspects of Computer Networks & data Communications. The goal of this journal is to bring together researchers and practitioners from academia and industry to focus on advanced networking concepts and establishing new collaborations in these areas.
DEF: Deep Ensemble Neural Network Classifier for Android Malware DetectionIJCNCJournal
Malware is one of the threats to security of computer networks and information systems. Since malware instances are available sufficiently, there is increased interest among researchers on usage of Artificial Intelligence (AI). Of late AI-enabled methods such as machine learning (ML) and deep learning paved way for solving many real-world problems. As it is a learning-based approach, accumulated training samples help in improving thequality of training and thus leveraging malware detection accuracy. Existing deep learning methods are focusing on learning-based malware detection systems. However, there is need for improving the state of the art through ensemble approach. Towards this end, in this paper we proposed a framework known as Deep Ensemble Framework (DEF) for automatic malware detection. The framework obtains features from training samples. From given malware instance a grayscale image is generated. There is another process to extract the opcode sequences. Convolutional Neural Network (CNN) and Long Short Term Memory (LSTM) techniques are used to obtain grayscale image and opcode sequence respectively. Afterwards, a stacking ensemble is employed in order to achieve efficient malware detection and classification. Malware samples collected fromthe Internet sources and Microsoft are used for theempirical study. An algorithm known as Ensemble Learning for Automatic Malware Detection (EL-AML) is proposed to realize our framework. Another algorithm named Pre-Process is proposed to assist the EL-AML algorithm for obtaining intermediate features required by CNN and LSTM.Empirical study reveals that our framework outperforms many existing methods in terms of speed-up and accuracy.
High Performance NMF Based Intrusion Detection System for Big Data IOT TrafficIJCNCJournal
With the emergence of smart devices and the Internet of Things (IoT), millions of users connected to the network produce massive network traffic datasets. These vast datasets of network traffic, Big Data are challenging to store, deal with and analyse using a single computer. In this paper we developed parallel implementation using a High Performance Computer (HPC) for the Non-Negative Matrix Factorization technique as an engine for an Intrusion Detection System (HPC-NMF-IDS). The large IoT traffic datasets of order of millions samples are distributed evenly on all the computing cores for both storage and speedup purpose. The distribution of computing tasks involved in the Matrix Factorization takes into account the reduction of the communication cost between the computing cores. The experiments we conducted on the proposed HPC-IDS-NMF give better results than the traditional ML-based intrusion detection systems. We could train the HPC model with datasets of one million samples in only 31 seconds instead of the 40 minutes using one processor), that is a speed up of 87 times. Moreover, we have got an excellent detection accuracy rate of 98% for KDD dataset.
A Novel Medium Access Control Strategy for Heterogeneous Traffic in Wireless ...IJCNCJournal
So far, Wireless Body Area Networks (WBANs) have played a pivotal role in driving the development of intelligent healthcare systems with broad applicability across various domains. Each WBAN consists of one or more types of sensors that can be embedded in clothing, attached directly to the body, or even implanted beneath an individual's skin. These sensors typically serve asingle application. However, the traffic generated by each sensor may have distinct requirements. This diversity necessitates a dual approach: tailored treatment based on the specific needs of each traffic typeand the fulfillment of application requirements, such asreliability and timeliness. Never the less, the presence of energy constraints and the unreliable nature of wireless communications make QoS provisioning under such networks a non-trivial task. In this context, the current paper introduces a novel Medium AccessControl (MAC) strategy for the regular traffic applications of WBANs, designed to significantly enhance efficiency when compared to the established MAC protocols IEEE 802.15.4 and IEEE 802.15.6, with a particular focus on improving reliability, timeliness, and energy efficiency.
A Topology Control Algorithm Taking into Account Energy and Quality of Transm...IJCNCJournal
The efficient use of energy in wireless sensor networks is critical for extending node lifetime. The network topology is one of the factors that have a significant impact on the energy usage at the nodes and the quality of transmission (QoT) in the network. We propose a topology control algorithm for software-defined wireless sensor networks (SDWSNs) in this paper. Our method is to formulate topology control algorithm as a nonlinear programming (NP) problem with the objective to optimizing two metrics, maximum communication range, and desired degree. This NP problem is solved at the SDWSN controller by employing the genetic algorithm (GA) to determine the best topology. The simulation results show that the proposed algorithm outperforms the MaxPower algorithm in terms of average node degree and energy expansion ratio.
Multi-Server user Authentication Scheme for Privacy Preservation with Fuzzy C...IJCNCJournal
The integration of artificial intelligence technology with a scalable Internet of Things (IoT) platform facilitates diverse smart communication services, allowing remote users to access services from anywhere at any time. The multi-server environment within IoT introduces a flexible security service model, enabling users to interact with any server through a single registration. To ensure secure and privacy preservation services for resources, an authentication scheme is essential. Zhao et al. recently introduced a user authentication scheme for the multi-server environment, utilizing passwords and smart cards, claiming resilience against well-known attacks. This paper conducts cryptanalysis on Zhao et al.'s scheme, focusing on denial of service and privacy attacks, revealing a lack of user-friendliness. Subsequently, we propose a new multi-server user authentication scheme for privacy preservation with fuzzy commitment over the IoT environment, addressing the shortcomings of Zhao et al.'s scheme. Formal security verification of the proposed scheme is conducted using the ProVerif simulation tool. Through both formal and informal security analyses, we demonstrate that the proposed scheme is resilient against various known attacks and those identified in Zhao et al.'s scheme.
Advanced Privacy Scheme to Improve Road Safety in Smart Transportation SystemsIJCNCJournal
In -Vehicle Ad-Hoc Network (VANET), vehicles continuously transmit and receive spatiotemporal data with neighboring vehicles, thereby establishing a comprehensive 360-degree traffic awareness system. Vehicular Network safety applications facilitate the transmission of messages between vehicles that are near each other, at regular intervals, enhancing drivers' contextual understanding of the driving environment and significantly improving traffic safety. Privacy schemes in VANETs are vital to safeguard vehicles’ identities and their associated owners or drivers. Privacy schemes prevent unauthorized parties from linking the vehicle's communications to a specific real-world identity by employing techniques such as pseudonyms, randomization, or cryptographic protocols. Nevertheless, these communications frequently contain important vehicle information that malevolent groups could use to Monitor the vehicle over a long period. The acquisition of this shared data has the potential to facilitate the reconstruction of vehicle trajectories, thereby posing a potential risk to the privacy of the driver. Addressing the critical challenge of developing effective and scalable privacy-preserving protocols for communication in vehicle networks is of the highest priority. These protocols aim to reduce the transmission of confidential data while ensuring the required level of communication. This paper aims to propose an Advanced Privacy Vehicle Scheme (APV) that periodically changes pseudonyms to protect vehicle identities and improve privacy. The APV scheme utilizes a concept called the silent period, which involves changing the pseudonym of a vehicle periodically based on the tracking of neighboring vehicles. The pseudonym is a temporary identifier that vehicles use to communicate with each other in a VANET. By changing the pseudonym regularly, the APV scheme makes it difficult for unauthorized entities to link a vehicle's communications to its real-world identity. 
The proposed APV is compared to the SLOW, RSP, CAPS, and CPN techniques. The data indicates that the efficiency of APV is a better improvement in privacy metrics. It is evident that the AVP offers enhanced safety for vehicles during transportation in the smart city.
DEF: Deep Ensemble Neural Network Classifier for Android Malware DetectionIJCNCJournal
Malware is one of the threats to security of computer networks and information systems. Since malware instances are available sufficiently, there is increased interest among researchers on usage of Artificial Intelligence (AI). Of late AI-enabled methods such as machine learning (ML) and deep learning paved way for solving many real-world problems. As it is a learning-based approach, accumulated training samples help in improving thequality of training and thus leveraging malware detection accuracy. Existing deep learning methods are focusing on learning-based malware detection systems. However, there is need for improving the state of the art through ensemble approach. Towards this end, in this paper we proposed a framework known as Deep Ensemble Framework (DEF) for automatic malware detection. The framework obtains features from training samples. From given malware instance a grayscale image is generated. There is another process to extract the opcode sequences. Convolutional Neural Network (CNN) and Long Short Term Memory (LSTM) techniques are used to obtain grayscale image and opcode sequence respectively. Afterwards, a stacking ensemble is employed in order to achieve efficient malware detection and classification. Malware samples collected fromthe Internet sources and Microsoft are used for theempirical study. An algorithm known as Ensemble Learning for Automatic Malware Detection (EL-AML) is proposed to realize our framework. Another algorithm named Pre-Process is proposed to assist the EL-AML algorithm for obtaining intermediate features required by CNN and LSTM.Empirical study reveals that our framework outperforms many existing methods in terms of speed-up and accuracy.
High Performance NMF based Intrusion Detection System for Big Data IoT TrafficIJCNCJournal
With the emergence of smart devices and the Internet of Things (IoT), millions of users connected to the network produce massive network traffic datasets. These vast datasets of network traffic, Big Data are challenging to store, deal with and analyse using a single computer. In this paper we developed parallel implementation using a High Performance Computer (HPC) for the Non-Negative Matrix Factorization technique as an engine for an Intrusion Detection System (HPC-NMF-IDS). The large IoT traffic datasets of order of millions samples are distributed evenly on all the computing cores for both storage and speedup purpose. The distribution of computing tasks involved in the Matrix Factorization takes into account the reduction of the communication cost between the computing cores. The experiments we conducted on the proposed HPC-IDS-NMF give better results than the traditional ML-based intrusion detection systems. We could train the HPC model with datasets of one million samples in only 31 seconds instead of the 40 minutes using one processor), that is a speed up of 87 times. Moreover, we have got an excellent detection accuracy rate of 98% for KDD dataset.
IoT Guardian: A Novel Feature Discovery and Cooperative Game Theory Empowered...IJCNCJournal
Cyber intrusion attacks increasingly target the Internet of Things (IoT) ecosystem, exploiting vulnerable devices and networks. Malicious activities must be identified early to minimize damage and mitigate threats. Using actual benign and attack traffic from the CICIoT2023 dataset, this WORK aims to evaluate and benchmark machine-learning techniques for IoT intrusion detection. There are four main phases to the system. First, the CICIoT2023 dataset is refined to remove irrelevant features and clean up missing and duplicate data. The second phase employs statistical models and artificial intelligence to discover novel features. The most significant features are then selected in the third phase based on cooperative game theory. Using the original CICIoT2023 dataset and a dataset containing only novel features, we train and evaluate a variety of machine learning classifiers. On the original dataset, Random Forest achieved the highest accuracy of 99%. Still, with novel features, Random Forest's performance dropped only slightly (96%) while other models achieved significantly lower accuracy. As a whole, the work contributes substantial contributions to tailored feature engineering, feature selection, and rigorous benchmarking of IoT intrusion detection techniques. IoT networks and devices face continuously evolving threats, making it necessary to develop robust intrusion detection systems.
Enhancing Traffic Routing Inside a Network through IoT Technology & Network C...IJCNCJournal
IoT networking uses real items as stationary or mobile nodes. Mobile nodes complicate networking. Internet of Things (IoT) networks have a lot of control overhead messages because devices are mobile. These signals are generated by the constant flow of control data as such device identity, geographical positioning, node mobility, device configuration, and others. Network clustering is a popular overhead communication management method. Many cluster-based routing methods have been developed to address system restrictions. Node clustering based on the Internet of Things (IoT) protocol, may be used to cluster all network nodes according to predefined criteria. Each cluster will have a Smart Designated Node. SDN cluster management is efficient. Many intelligent nodes remain in the network. The network design spreads these signals. This paper presents an intelligent and responsive routing approach for clustered nodes in IoT networks. An existing method builds a new sub-area clustered topology. The Nodes Clustering Based on the Internet of Things (NCIoT) method improves message transmission between any two nodes. This will facilitate the secure and reliable interchange of healthcare data between professionals and patients. NCIoT is a system that organizes nodes in the Internet of Things (IoT) by grouping them together based on their proximity. It also picks SDN routes for these nodes. This approach involves selecting one option from a range of choices and preparing for likely outcomes problem addressing limitations on activities is a primary focus during the review process. Predictive inquiry employs the process of analyzing data to forecast and anticipate future events. This document provides an explanation of compact units. The Predictive Inquiry Small Packets (PISP) improved its backup system and partnered with SDN to establish a routing information table for each intelligent node, resulting in higher routing performance. 
Both principal and secondary roads are available for use. The simulation findings indicate that NCIoT algorithms outperform CBR protocols. Enhancements lead to a substantial 78% boost in network performance. In addition, the end-to-end latency dropped by 12.5%. The PISP methodology produces 5.9% more inquiry packets compared to alternative approaches. The algorithms are constructed and evaluated against academic ones.
IoT Guardian: A Novel Feature Discovery and Cooperative Game Theory Empowered...IJCNCJournal
Cyber intrusion attacks increasingly target the Internet of Things (IoT) ecosystem, exploiting vulnerable devices and networks. Malicious activities must be identified early to minimize damage and mitigate threats. Using actual benign and attack traffic from the CICIoT2023 dataset, this WORK aims to evaluate and benchmark machine-learning techniques for IoT intrusion detection. There are four main phases to the system. First, the CICIoT2023 dataset is refined to remove irrelevant features and clean up missing and duplicate data. The second phase employs statistical models and artificial intelligence to discover novel features. The most significant features are then selected in the third phase based on cooperative game theory. Using the original CICIoT2023 dataset and a dataset containing only novel features, we train and evaluate a variety of machine learning classifiers. On the original dataset, Random Forest achieved the highest accuracy of 99%. Still, with novel features, Random Forest's performance dropped only slightly (96%) while other models achieved significantly lower accuracy. As a whole, the work contributes substantial contributions to tailored feature engineering, feature selection, and rigorous benchmarking of IoT intrusion detection techniques. IoT networks and devices face continuously evolving threats, making it necessary to develop robust intrusion detection systems.
** Connect, Collaborate, And Innovate: IJCNC - Where Networking Futures Take ...IJCNCJournal
The International Journal of Computer Networks & Communications (IJCNC) is a bi monthly open access peer-reviewed journal that publishes articles which contribute new results in all areas of Computer Networks & Communications. The journal focuses on all technical and practical aspects of Computer Networks & data Communications. The goal of this journal is to bring together researchers and practitioners from academia and industry to focus on advanced networking concepts and establishing new collaborations in these areas.
Enhancing Traffic Routing Inside a Network through IoT Technology & Network C...IJCNCJournal
IoT networking uses real items as stationary or mobile nodes. Mobile nodes complicate networking. Internet of Things (IoT) networks have a lot of control overhead messages because devices are mobile. These signals are generated by the constant flow of control data as such device identity, geographical positioning, node mobility, device configuration, and others. Network clustering is a popular overhead communication management method. Many cluster-based routing methods have been developed to address system restrictions. Node clustering based on the Internet of Things (IoT) protocol, may be used to cluster all network nodes according to predefined criteria. Each cluster will have a Smart Designated Node. SDN cluster management is efficient. Many intelligent nodes remain in the network. The network design spreads these signals. This paper presents an intelligent and responsive routing approach for clustered nodes in IoT networks. An existing method builds a new sub-area clustered topology. The Nodes Clustering Based on the Internet of Things (NCIoT) method improves message transmission between any two nodes. This will facilitate the secure and reliable interchange of healthcare data between professionals and patients. NCIoT is a system that organizes nodes in the Internet of Things (IoT) by grouping them together based on their proximity. It also picks SDN routes for these nodes. This approach involves selecting one option from a range of choices and preparing for likely outcomes problem addressing limitations on activities is a primary focus during the review process. Predictive inquiry employs the process of analyzing data to forecast and anticipate future events. This document provides an explanation of compact units. The Predictive Inquiry Small Packets (PISP) improved its backup system and partnered with SDN to establish a routing information table for each intelligent node, resulting in higher routing performance. 
Both principal and secondary roads are available for use. The simulation findings indicate that NCIoT algorithms outperform CBR protocols. Enhancements lead to a substantial 78% boost in network performance. In addition, the end-to-end latency dropped by 12.5%. The PISP methodology produces 5.9% more inquiry packets compared to alternative approaches. The algorithms are constructed and evaluated against academic ones.
Explore the innovative world of trenchless pipe repair with our comprehensive guide, "The Benefits and Techniques of Trenchless Pipe Repair." This document delves into the modern methods of repairing underground pipes without the need for extensive excavation, highlighting the numerous advantages and the latest techniques used in the industry.
Learn about the cost savings, reduced environmental impact, and minimal disruption associated with trenchless technology. Discover detailed explanations of popular techniques such as pipe bursting, cured-in-place pipe (CIPP) lining, and directional drilling. Understand how these methods can be applied to various types of infrastructure, from residential plumbing to large-scale municipal systems.
Ideal for homeowners, contractors, engineers, and anyone interested in modern plumbing solutions, this guide provides valuable insights into why trenchless pipe repair is becoming the preferred choice for pipe rehabilitation. Stay informed about the latest advancements and best practices in the field.
Water scarcity is the lack of fresh water resources to meet the standard water demand. There are two type of water scarcity. One is physical. The other is economic water scarcity.
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Dr.Costas Sachpazis
Terzaghi's soil bearing capacity theory, developed by Karl Terzaghi, is a fundamental principle in geotechnical engineering used to determine the bearing capacity of shallow foundations. This theory provides a method to calculate the ultimate bearing capacity of soil, which is the maximum load per unit area that the soil can support without undergoing shear failure. The Calculation HTML Code included.
Cosmetic shop management system project report.pdfKamal Acharya
Buying new cosmetic products is difficult. It can even be scary for those who have sensitive skin and are prone to skin trouble. The information needed to alleviate this problem is on the back of each product, but it's thought to interpret those ingredient lists unless you have a background in chemistry.
Instead of buying and hoping for the best, we can use data science to help us predict which products may be good fits for us. It includes various function programs to do the above mentioned tasks.
Data file handling has been effectively used in the program.
The automated cosmetic shop management system should deal with the automation of general workflow and administration process of the shop. The main processes of the system focus on customer's request where the system is able to search the most appropriate products and deliver it to the customers. It should help the employees to quickly identify the list of cosmetic product that have reached the minimum quantity and also keep a track of expired date for each cosmetic product. It should help the employees to find the rack number in which the product is placed.It is also Faster and more efficient way.
Final project report on grocery store management system..pdfKamal Acharya
In today’s fast-changing business environment, it’s extremely important to be able to respond to client needs in the most effective and timely manner. If your customers wish to see your business online and have instant access to your products or services.
Online Grocery Store is an e-commerce website, which retails various grocery products. This project allows viewing various products available enables registered users to purchase desired products instantly using Paytm, UPI payment processor (Instant Pay) and also can place order by using Cash on Delivery (Pay Later) option. This project provides an easy access to Administrators and Managers to view orders placed using Pay Later and Instant Pay options.
In order to develop an e-commerce website, a number of Technologies must be studied and understood. These include multi-tiered architecture, server and client-side scripting techniques, implementation technologies, programming language (such as PHP, HTML, CSS, JavaScript) and MySQL relational databases. This is a project with the objective to develop a basic website where a consumer is provided with a shopping cart website and also to know about the technologies used to develop such a website.
This document will discuss each of the underlying technologies to create and implement an e- commerce website.
Overview of the fundamental roles in Hydropower generation and the components involved in wider Electrical Engineering.
This paper presents the design and construction of hydroelectric dams from the hydrologist’s survey of the valley before construction, all aspects and involved disciplines, fluid dynamics, structural engineering, generation and mains frequency regulation to the very transmission of power through the network in the United Kingdom.
Author: Robbie Edward Sayers
Collaborators and co editors: Charlie Sims and Connor Healey.
(C) 2024 Robbie E. Sayers
CORRELATION BASED FEATURE SELECTION (CFS) TECHNIQUE TO PREDICT STUDENT PERFORMANCE
International Journal of Computer Networks & Communications (IJCNC) Vol.6, No.3, May 2014
DOI: 10.5121/ijcnc.2014.6315
Mital Doshi 1, Dr. Setu K Chaturvedi, Ph.D. 2
1 M.Tech. Research Scholar, Technocrats Institute of Technology Bhopal, India
2 Professor & HOD (Dept. of CSE), Technocrats Institute of Technology Bhopal, India
ABSTRACT
Educational data mining is an emerging field that mines academic data to solve various types of
problems. One such problem is the selection of a proper academic track. The admission of a student
to an engineering college depends on many factors. In this paper we implement a classification
technique to assist students in predicting their success in admission to an engineering stream. We
analyze a data set containing students' academic and socio-demographic variables, with attributes
such as family pressure, interest, gender, XII marks and CET rank in entrance examinations, together
with historical data from a previous batch of students. Feature selection is a process for removing
irrelevant and redundant features, which helps improve the predictive accuracy of classifiers. We
first use the attribute selection algorithms Chi-square, InfoGain, and GainRatio to identify the
relevant features. We then apply the fast correlation-based filter to the given features.
Classification is then done using NBTree, MultilayerPerceptron, NaiveBayes and instance-based
k-nearest neighbor. Results showed a reduction in computational cost and time and an increase in
predictive accuracy for the student model.
KEYWORDS
Chi-square, Correlation feature selection, IBK, Infogain, Gainratio, Multilayer perceptron, NaiveBayes,
NBTree
1. INTRODUCTION
Feature selection is a preprocessing step in machine learning. Feature selection algorithms fall
into three main categories: wrapper, filter and embedded [1]. The filter model selects features
without the help of any learning algorithm. The wrapper model uses a predetermined learning
algorithm to find the relevant features and test them. The wrapper model is more expensive than the
filter model because it requires more computation, so the filter model is generally preferred when
there are a large number of features. In this paper we use the filter model, and our aim is to
improve the accuracy of recommending a stream to the student by predicting success at the earliest,
helping the student build a bright future according to his or her choice. The fast correlation-based
filter is an algorithm that is very successful at removing redundant and irrelevant features from
the dataset, so that computation time decreases and predictive accuracy increases.
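To make the filter idea concrete, the following is a minimal sketch that ranks features by information gain without involving any classifier. The function names and the toy data are illustrative assumptions and are not taken from the paper's dataset or from WEKA's implementation.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def information_gain(feature, labels):
    """Reduction in label entropy obtained by splitting on one feature."""
    total = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (len(subset) / total) * entropy(subset)
    return entropy(labels) - conditional


def rank_features(columns, labels):
    """Filter-style ranking: every feature is scored on its own, no learner is trained."""
    scores = {name: information_gain(col, labels) for name, col in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data (assumption): two categorical attributes, four students.
columns = {
    "interest": ["high", "high", "low", "low"],
    "gender": ["F", "M", "F", "M"],
}
labels = ["admitted", "admitted", "rejected", "rejected"]
ranking = rank_features(columns, labels)
```

In this toy case "interest" separates the two classes completely and "gender" carries no information about the class, so "interest" is ranked first. A wrapper method would instead train and evaluate a classifier for each candidate feature subset, which is why it costs more.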
2. CLASSIFICATION TECHNIQUES
2.1 NBTree
NBTree is a hybrid algorithm that combines a Decision Tree with Naïve Bayes. The basic concept of
recursive partitioning remains the same as in a decision tree, but the leaf nodes are Naïve Bayes
classifiers rather than nodes that predict a single class [2].
2.2 Naïve Bayes
The Naïve Bayes classifier technique is used when the dimensionality of the inputs is high. It is a
simple algorithm but often gives better output than others. We use it to predict the dropout of
students by calculating the probability of each input for a predictable state. It can be trained on
weighted training data and also helps prevent overfitting.
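The probability calculation described above can be sketched as follows for categorical attributes. This is an illustrative sketch, not the WEKA NaiveBayes implementation used in the paper; the add-one (Laplace) smoothing, the function names and the toy rows are all assumptions.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class frequencies of each attribute value."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    vocab = defaultdict(set)             # attribute index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            vocab[i].add(value)
    return class_counts, value_counts, vocab


def predict(model, row):
    """Return the class with the highest log-posterior under the independence assumption."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        for i, value in enumerate(row):
            # Add-one smoothing (assumption) so unseen values do not give zero probability.
            numerator = value_counts[(i, label)][value] + 1
            denominator = count + len(vocab[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data (assumption): (interest, XII marks band) per student.
rows = [("high", "good"), ("high", "poor"), ("low", "good"), ("low", "poor")]
labels = ["admitted", "admitted", "rejected", "rejected"]
model = train_naive_bayes(rows, labels)
prediction = predict(model, ("high", "good"))
```

Working in log space avoids numerical underflow when many attribute probabilities are multiplied together.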
2.3 Instance-based-k-nearest neighbor
In this technique a new item is classified by comparing it with the memorized data items using a
distance measure. This requires storing the dataset. Items are matched by finding the stored items
that lie closest to the new item. The number of nearest neighbors can be chosen either
automatically, using cross-validation, or manually.
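A minimal sketch of this procedure is shown below, assuming numeric attributes, Euclidean distance and majority voting. These choices, the function names and the toy points are assumptions for illustration; the paper uses WEKA's IBk, whose defaults may differ.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k closest stored items."""
    neighbors = sorted(
        zip(train_points, train_labels),
        key=lambda pair: euclidean(pair[0], query),
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data (assumption): two attributes scaled to the range [0, 1].
train_points = [(0.9, 0.8), (0.8, 0.9), (0.85, 0.85), (0.2, 0.3), (0.3, 0.2), (0.25, 0.25)]
train_labels = ["admitted", "admitted", "admitted", "rejected", "rejected", "rejected"]
prediction = knn_predict(train_points, train_labels, (0.7, 0.75), k=3)
```

Because every prediction scans the stored dataset, the memory and lookup cost grow with the number of training items, which is the storage requirement mentioned above.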
2.4 Multilayer Perceptron
It is one of the most widely used and popular neural networks. The network consists of a set of
sensory elements that form the input layer, one or more hidden layers of processing elements, and
an output layer of processing elements. An ANN trained with the back-propagation algorithm can be
used for predicting both continuous and discrete data. The ANN algorithm represents each cluster by
a neuron, modeled on the neural structure of the brain. Each connection has an associated weight,
which is calculated adaptively during learning. The main drawback of an ANN is its long training
time, so it is more suitable for applications where long training is feasible. Here we have used
the Multilayer Perceptron technique of ANN [3].
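The layered structure described above can be illustrated with a forward pass through a small network. This sketch assumes sigmoid activations and hand-picked weights; it does not show back-propagation training and is not the configuration used in the paper.

```python
import math


def sigmoid(z):
    """Logistic activation, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases):
    """One fully connected layer: weighted sum per neuron, then sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


def mlp_forward(inputs, layers):
    """Propagate the inputs through each (weights, biases) layer in turn."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations


# Hypothetical hand-picked weights (assumption); in practice they are learned.
layers = [
    ([[0.5, -0.5], [-0.5, 0.5]], [0.0, 0.0]),  # hidden layer: 2 inputs -> 2 neurons
    ([[1.0, -1.0]], [0.0]),                    # output layer: 2 hidden -> 1 neuron
]
output = mlp_forward([1.0, 0.0], layers)
```

During learning, back-propagation would adjust every entry of `layers` to reduce the prediction error, which is where the long training time comes from.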
3. RELATED WORK
Pumpuang [4] proposed a classifier algorithm for building a Course Registration Planning Model from
a historical dataset. The model used four classifiers: Bayesian Network, C4.5, Decision Forest and
NBTree. Results showed that NBTree seemed to be the best for predicting the GPA of the student.
Tanna [5] implemented a decision support system for admission to engineering colleges based on
entrance exam marks. The system returns colleges and streams categorized as Ambitious, Best Bargain
and Safe using an offset value.
In [6] Malaya used a knowledge-based decision technique to guide the student towards admission in a
proper branch of engineering. Two algorithms, a decision tree algorithm and ANN, were compared to
find out which is more accurate for decision making. Results showed that the MLP algorithm was more
accurate, reaching up to 86% for training partition size 50 and testing partition size 50.
Al-Radaideh [7] proposed a simple classification model to provide a guideline that helps students
and school management choose the right track of study for a student. A decision tree was built
using the C4.5 algorithm (J48 in WEKA), selecting the best attributes with the information gain
measure. The classification rules found were based on more than one factor, such as the Ratio, the
average of the student's marks in the 10th class (AVERAGE), and the average of the student's marks
in the 8th, 9th, and 10th classes (AVG89_10). Results show that the accuracy of the model was 87.9%,
with 218 out of 248 students correctly classified.
Harb and Moustafa [8] applied six classifiers to the ASSISTments dataset, which has 15 features.
They used VF1, IBK, NaiveBayesUpdateable, OneR, J48 and k-means clustering to rank the
features. Results showed that k-means clustering was the best at ranking features and that
Naïve Bayes gave better prediction accuracy.
Yu and Liu [9] proposed a feature selection algorithm designed for high-dimensional data,
called the fast correlation-based filter (FCBF). The algorithm removes irrelevant and
redundant features. They applied FCBF, ReliefF, CorrF, and ConSF to four datasets
and recorded the running time and the number of features selected. They then applied C4.5 and NBC
classification to the data.
Bharadwaj and Pal [10] conducted an experiment to predict performance at the end of the semester
using student data such as attendance, class test, seminar and assignment marks from the students'
previous database results.
Hijazi and Naqvi [11] conducted a study of student performance on 300 students from a group of
colleges of Punjab University. Results showed that students' attitude towards attendance in class
depends on the time they spend studying in college after college hours. Other factors, such
as mother's age and education, were found by simple linear regression analysis to be related to
student performance.
Khan [12] conducted an experiment on 200 boys and 200 girls from the secondary school of Aligarh
Muslim University. The main aim was to find the variables that determine success in
higher education in the science stream, so demographic variables and personality measures were
used as input. A cluster sampling technique was used to divide the population into groups, or
clusters, and a random sample of clusters was used for further analysis. Results showed that girls
with high socio-economic status had relatively higher academic achievement in science, whereas
boys with low socio-economic status had higher academic achievement in general.
Z. J. Kovacic [13] presented a case study in educational data mining to identify the extent to
which enrolment data can be used to predict student success. CHAID and CART were applied to
students of a diploma college in New Zealand. The two resulting decision trees had classification
accuracies of 59.4% and 60.5%.
4. CORRELATION FEATURE SELECTION
Feature selection is a preprocessing step to machine learning which is effective in reducing
dimensionality, removing irrelevant data, increasing learning accuracy, and improving result
comprehensibility. [14]
4.1 STEPS OF FEATURE SELECTION
A feature subset is good if its features are highly correlated with the class but not much
correlated with one another. [15]
Steps:
a. Subset generation: We used four classifiers to rank all the features of the data set,
then took the top 3, 4, and 5 features for classification.
b. Subset evaluation: Each classifier is applied to the generated subset.
c. Stopping criterion: The testing process continues until 5 features of the subset have been selected.
d. Result validation: We used the 10-fold cross-validation method to test each
classifier's accuracy.
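The steps above can be sketched in Python as follows. This is an illustration under stated assumptions: the function names and the feature identifiers are hypothetical, the folds are formed by simple interleaving rather than by the stratified splitting that WEKA performs, and `evaluate` stands in for whatever classifier is trained and scored on each fold.

```python
def top_k_subsets(ranked_features, sizes=(3, 4, 5)):
    """Subset generation: take the first k features of a ranked list for each k."""
    return {k: ranked_features[:k] for k in sizes}

def k_fold_indices(n_instances, n_folds=10):
    """Split the instance indices 0..n_instances-1 into n_folds disjoint test folds."""
    return [list(range(start, n_instances, n_folds)) for start in range(n_folds)]

def cross_validate(n_instances, evaluate, n_folds=10):
    """Result validation: average evaluate(train_idx, test_idx) over all folds."""
    scores = []
    for test_idx in k_fold_indices(n_instances, n_folds):
        held_out = set(test_idx)
        train_idx = [i for i in range(n_instances) if i not in held_out]
        scores.append(evaluate(train_idx, test_idx))
    return sum(scores) / len(scores)

# Hypothetical identifiers for a ranked feature list, best first.
ranked = ["family_pressure", "admission_type", "interest",
          "mother_occupation", "hostel_residence"]
subsets = top_k_subsets(ranked)   # keys 3, 4 and 5
folds = k_fold_indices(380)       # 380 instances, as in the dataset described later
```

Each subset in `subsets` would then be passed through `cross_validate` once per classifier, and the loop stops once the five-feature subset has been evaluated.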
4.2 CORRELATION-BASED MEASURES
Here we discuss the measures used to assess the goodness of a feature for classification. We
consider a feature good if it is relevant to the class and not redundant with respect to any other
feature. In short, a feature should be highly correlated with the class and not much correlated
with any other feature. To measure this we use information theory, based on entropy.
Entropy is a measure of the uncertainty of a random variable. For a variable X it is defined by
equation 1 as

$$H(X) = -\sum_{i} P(x_i) \log_2 P(x_i) \tag{1}$$

The entropy of X after observing values of another variable Y is defined in equation 2 as

$$H(X \mid Y) = -\sum_{j} P(y_j) \sum_{i} P(x_i \mid y_j) \log_2 P(x_i \mid y_j) \tag{2}$$

Here, $P(x_i)$ are the prior probabilities for all values of X, and $P(x_i \mid y_j)$ are the
posterior probabilities of X given the values of Y. The amount by which the entropy of X decreases
reflects the additional information about X provided by Y. It is called information gain and is
given by equation 3 as

$$IG(X \mid Y) = H(X) - H(X \mid Y) \tag{3}$$

We can conclude that feature Y is regarded as more correlated with feature X than
with feature Z if $IG(X \mid Y) > IG(Z \mid Y)$.
A further measure, symmetrical uncertainty, shows the correlation between features and is
defined by equation 4 as

$$SU(X, Y) = 2 \left[ \frac{IG(X \mid Y)}{H(X) + H(Y)} \right] \tag{4}$$
SU compensates for information gain's bias toward features with more values and normalizes its
value to the range [0, 1], where 1 indicates that knowledge of either variable completely predicts
the value of the other and 0 indicates that X and Y are independent. It treats a pair of features
symmetrically. Entropy-based measures require nominal features, but they can also be applied to
measure correlations between continuous features if those are discretized properly.
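The four measures in equations 1 to 4 can be estimated directly from value counts. The Python sketch below is one way to do so for nominal features held as lists; the function names are our own, and returning 0 when both variables are constant is an assumption made to avoid division by zero.

```python
import math
from collections import Counter

def entropy(values):
    """Equation 1: entropy of a list of nominal values, in bits."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def conditional_entropy(x, y):
    """Equation 2: entropy of x within each group of equal y, weighted by P(y)."""
    n = len(x)
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(yi, []).append(xi)
    return sum((len(g) / n) * entropy(g) for g in groups.values())

def information_gain(x, y):
    """Equation 3: reduction in the entropy of x once y is known."""
    return entropy(x) - conditional_entropy(x, y)

def symmetrical_uncertainty(x, y):
    """Equation 4: information gain normalized to the range [0, 1]."""
    denom = entropy(x) + entropy(y)
    if denom == 0:
        return 0.0  # assumption: two constant variables are treated as uncorrelated
    return 2.0 * information_gain(x, y) / denom

x = ["a", "a", "b", "b"]
same = symmetrical_uncertainty(x, x)                            # x predicts itself
independent = symmetrical_uncertainty(x, ["p", "q", "p", "q"])  # no shared information
```

The two calls at the end illustrate the extremes of the range: a variable compared with itself, and two variables whose values are independent of each other.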
5. ALGORITHM
Based on the methodology presented above, we have used the following algorithm, named
FCBF (Fast Correlation-Based Filter). [9]
Input:  S(F1, F2, ..., FN, C)   // training data set
        δ                       // predefined threshold value
Output: Sbest                   // an optimal subset

begin
    for i = 1 to N do begin
        calculate SUi,c for Fi;
        if (SUi,c ≥ δ)
            append Fi to S'list;
    end;
    order S'list in descending SUi,c value;
    Fp = getFirstElement(S'list);
    do begin
        Fq = getNextElement(S'list, Fp);
        if (Fq <> NULL)
            do begin
                F'q = Fq;
                if (SUp,q ≥ SUq,c)
                    remove Fq from S'list;
                    Fq = getNextElement(S'list, F'q);
                else Fq = getNextElement(S'list, Fq);
            end until (Fq == NULL);
        Fp = getNextElement(S'list, Fp);
    end until (Fp == NULL);
    Sbest = S'list;
end;
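For readers who prefer runnable code, the listing above can be sketched in Python as below. This is not the Java implementation used in this work: the function names, the dictionary-based data layout and the toy data are assumptions for illustration, and symmetrical uncertainty is computed through the joint entropy, using the identity IG(X|Y) = H(X) + H(Y) - H(X, Y).

```python
import math
from collections import Counter

def _entropy(values):
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def su(x, y):
    """Symmetrical uncertainty of two lists of nominal values."""
    hx, hy = _entropy(x), _entropy(y)
    if hx + hy == 0:
        return 0.0  # assumption: constant variables are treated as uncorrelated
    joint = _entropy(list(zip(x, y)))  # H(X, Y)
    return 2.0 * (hx + hy - joint) / (hx + hy)

def fcbf(features, target, delta=0.0):
    """features: dict of name -> value list; target: list of class labels.

    Returns the selected feature names, most relevant first.
    """
    relevance = {name: su(col, target) for name, col in features.items()}
    candidates = sorted(
        (name for name in features if relevance[name] >= delta),
        key=lambda name: relevance[name],
        reverse=True,
    )
    selected = []
    while candidates:
        fp = candidates.pop(0)  # current predominant feature Fp
        selected.append(fp)
        # Keep only the features Fq that Fp does not make redundant,
        # i.e. drop Fq whenever SU(p, q) >= SU(q, c).
        candidates = [
            fq for fq in candidates
            if su(features[fp], features[fq]) < relevance[fq]
        ]
    return selected

# Made-up toy data: f2 is a noisier copy of f1, f3 is unrelated to the class.
labels = ["y", "y", "y", "y", "n", "n", "n", "n"]
toy = {
    "f1": ["h", "h", "h", "h", "l", "l", "l", "h"],
    "f2": ["h", "h", "h", "h", "l", "l", "h", "h"],
    "f3": ["a", "b", "a", "b", "a", "b", "a", "b"],
    "f4": ["p", "p", "q", "q", "q", "q", "q", "q"],
}
chosen = fcbf(toy, labels, delta=0.1)
```

In this toy run f3 falls below the threshold, f2 is removed as redundant with f1, and f4 survives because it is more correlated with the class than with f1.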
6. PROPOSED SYSTEM
6.1 DATA PREPARATIONS
We collected data on students of a Mumbai college enrolling in 2014. This forms the
training dataset, consisting of information about students admitted to the first year. The data is
in Excel format and holds each student's personal and academic record. It includes details such as
the student's name, admission type, sex, marks in 12th standard, marks in math, physics and
chemistry, the average of all, common entrance test marks, and personal details such as father's
occupation and qualification, mother's qualification and occupation, and the interest of the student.
6.2 DATA PROCESSING
The student data warehouse contains 380 instances with 32 attributes.
From this list we selected the 17 attributes that we felt were relevant to our work.
Table 1 lists the reduced set of attributes.
Table 1: List of Attributes
6.3 IMPLEMENTATION OF MODEL
WEKA is open-source software, freely available for data mining, that implements a large
collection of mining algorithms. It accepts data in various formats and also includes converters,
so we converted the student dataset into an ARFF file. The file was loaded into the
WEKA Explorer. The Classify panel is used for classification, to estimate the accuracy of the
resulting predictive model, and to visualize erroneous predictions or the model itself. NetBeans is
used to implement FCBF. For good results we need to know the weightage of each variable in
the success of a student's admission to engineering, so we used the feature selection
tests Info Gain, Chi-squared and Gain Ratio. Table 2 shows the features ranked
according to each algorithm.
Table 2: Rank of features and Average rank.
Instead of trusting any one attribute selector, we took the average of their ranks and
selected the features accordingly. The resulting ranking is 17, 8, 4, 16, 10. From the above table
we conclude that family pressure is the most important factor for predicting admission to
engineering, followed by admission_type, interest of student, mother's occupation, and residence in hostel.
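The averaging of ranks across attribute selectors can be sketched in Python as follows. The feature labels and the three orderings are made up for illustration and are not the values of Table 2; the function name and the tie-breaking by label are likewise assumptions.

```python
def average_rank(rankings):
    """rankings: dict of selector name -> list of features, best first.

    Returns (features ordered by mean rank, dict of feature -> mean rank),
    where rank 1 is the best position.
    """
    totals = {}
    for order in rankings.values():
        for position, feature in enumerate(order, start=1):
            totals[feature] = totals.get(feature, 0) + position
    mean = {feature: total / len(rankings) for feature, total in totals.items()}
    ordered = sorted(mean, key=lambda feature: (mean[feature], feature))
    return ordered, mean

# Made-up rankings of four features by three selectors.
rankings = {
    "InfoGain":   ["A", "B", "C", "D"],
    "ChiSquared": ["B", "A", "C", "D"],
    "GainRatio":  ["A", "C", "B", "D"],
}
order, mean_rank = average_rank(rankings)
```

A feature that every selector places near the top ends up with a low mean rank, so disagreement between individual selectors is smoothed out.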
Next we applied the classification algorithms NBTree, MultilayerPerceptron, Naïve Bayes and
IBK to the selected features. We start with the subset of 3 features and then add one feature at a
time to see how the accuracy of the algorithms changes. Table 3 shows the evaluation of the
classifiers on these feature subsets.
TABLE 3: EVALUATION OF CLASSIFIERS USING SUBSET OF 3, 4, 5 FEATURES (PA-Predictive
Accuracy)
From the table we can see that the highest PA of NBTree is 65%, with three features. For MLP the
highest PA is 65.83%, with four features. For Naïve Bayes the highest accuracy is 61.66%, with
four features, and for IBK we get a PA of 75%, also with four features. Among all the classifiers,
we conclude that IBK is the best, and it also takes the minimum time.
Now we repeat the classification using features selected by the FCBF algorithm, which is
implemented in Java using NetBeans. FCBF is not supported by WEKA.
The attributes are selected according to their symmetrical uncertainty
values. The most important factor found using this algorithm is family income,
followed by father's qualification and all-India rank in the common entrance test. We then apply
the classifiers to the selected attributes. Table 4 shows the classification results using 3, 4,
and 5 features.
TABLE 4: EVALUATION OF CLASSIFIERS USING FCBF ALGORITHM SUBSET OF 3, 4, 5
FEATURES (PA-Predictive Accuracy)
No. of      NBTree          MLP             Naïve Bayes     IBK
Features    PA      time    PA      time    PA      time    PA      time
3           65.83   0.05    75      0.82    65.83   0       75      0
4           65.83   0.08    81.6    1.64    66.6    0       100     0
5           75      0.26    87.5    1.89    65.83   0.01    100     0
MAX         75              87.5            66.6            100
Results show that with FCBF the maximum accuracy, 100%, is obtained with the IBK classifier.
The table also shows that the PA of NBTree is 75%, that of MLP is 87.5%, and that of Naïve Bayes
is 66.6%. We also conclude that time is saved and accuracy is increased.
7. CONCLUSION
From the above results we conclude that feature selection techniques can improve the
accuracy and efficiency of classification algorithms by removing irrelevant and
redundant features. By averaging the ranks from InfoGain, GainRatio, and the Chi-square test
we obtain the most relevant attributes. Four classifiers were applied to the selected
attributes. From the results we conclude that family pressure and interest of the student are the
most important factors for predicting a student's admission to engineering. This gives a
predictive indication of whether or not a student should take admission in engineering. We also
conclude that, among all the selection techniques used, FCBF gives the best output in terms of
relevancy of features. In future, other feature selection techniques can be applied to the
dataset.
REFERENCES
[1] Ladha L. and Deepa T., "Feature Selection Methods and Algorithms", International Journal on
Computer Science and Engineering (IJCSE), 2011.
[2] R. Kohavi, "Scaling Up the Accuracy of Naive-Bayes Classifiers: a Decision-Tree Hybrid",
Proceedings of the Second International Conference on Knowledge Discovery and Data Mining,
1996.
[3] Baker, R.S.J.D. (2010). Data Mining for Education. In B. McGaw, P. Peterson, E. Baker (eds.),
International Encyclopaedia of Education (3rd edition), (pp. 112-118). Oxford, UK: Elsevier.
[4] Pathom Pumpuang, Anongnart Srivihok, Prasong Praneetpolgrang, "Comparisons of Classifier
Algorithms: Bayesian Network, C4.5, Decision Forest and NBTree for Course Registration
Planning Model of Undergraduate Students", 1-4244-2384-2/08, 2008 IEEE.
[5] Miren Tanna, "Decision Support System for Admission in Engineering Colleges based on
Entrance Exam Marks", IJCA (0975 – 8887), Volume 52, No. 11, August 2012.
[6] Malaya Dutta Borah, Rajni Jindal, Daya Gupta, Ganesh Chandra Deka, "Application of knowledge
based decision technique to predict student enrollment decision", 978-1-4577-0792-6/11, 2011
IEEE.
[7] Qasem A. Al-Radaideh, Ahmad Al Ananbeh, and Emad M. Al-Shawakfa, "A classification model
for predicting the suitable study track for school students", Vol8 Issue2/IJRRAS_8_2_15.pdf,
August 2011.
[8] Hany M. Harb, Malaka A. Moustafa, "Selecting optimal subset of features for student
performance model", IJCSI Vol. 9, Issue 5, No 1, September 2012, 1694-0814.
[9] Lei Yu, Huan Liu, "Feature Selection for High-Dimensional Data: A Fast Correlation-Based
Filter Solution", (ICML-2003), Washington DC, 2003.
[10] B. K. Bharadwaj and S. Pal, "Mining Educational Data to Analyze Students' Performance",
International Journal of Advance Computer Science and Applications (IJACSA), Vol. 2, No. 6,
pp. 63-69, 2011.
[11] S. T. Hijazi and R. S. M. M. Naqvi, "Factors affecting student's performance: A Case of Private
Colleges", Bangladesh e-Journal of Sociology, Vol. 3, No. 1, 2006.
[12] Z. N. Khan, "Scholastic achievement of higher secondary students in science stream", Journal of
Social Sciences, Vol. 1, No. 2, pp. 84-87, 2005.
[13] Z. J. Kovacic, "Early prediction of student success: Mining student enrollment data", Proceedings
of Informing Science & IT Education Conference 2010.
[14] Blum & Langley, 1997; Kohavi & John, 1997.
[15] Hall, M. (1999). Correlation based feature selection for machine learning. Doctoral dissertation,
University of Waikato, Dept. of Computer Science.
[16] WEKA, http://www.cs.waikato.ac.nz/ml/weka, Last access, 8 April 2008.
Authors
Mital Mehta, B.E. in Computer Engineering. Pursuing M.Tech in Software Systems from Bhopal T.I.T College