This document describes a study that uses data mining techniques to construct a model for predicting graduates' employability in Malaysia. The study uses classification algorithms like Bayes algorithms and decision trees to analyze data from a government survey on graduates' employment status. The algorithms are tested on a dataset of over 12,000 graduate profiles. Results show that the J48 decision tree algorithm achieved the highest accuracy of 92.3% at predicting whether graduates were employed, unemployed, or of undetermined status. Attribute analysis found that the top three attributes impacting employability were job sector, job status, and reason for not working. The study aims to help guide education planning and improve graduates' employment outcomes.
Data Mining Model for Predicting Student Enrolment in STEM Courses in Higher ...Editor IJCATR
Educational data mining is the process of applying data mining tools and techniques to analyze data at educational
institutions. In this paper, educational data mining was used to predict enrollment of students in Science, Technology, Engineering and
Mathematics (STEM) courses in higher educational institutions. The study examined the extent to which individual, sociodemographic
and school-level contextual factors help in pre-identifying successful and unsuccessful students in enrollment in STEM
disciplines in Higher Education Institutions in Kenya. The Cross Industry Standard Process for Data Mining framework was applied to
a dataset drawn from the first, second and third year undergraduate female students enrolled in STEM disciplines in one University in
Kenya to model student enrollment. Feature selection was used to rank the predictor variables by their importance for further analysis.
Various predictive algorithms were evaluated in predicting enrollment of students in STEM courses. Empirical results showed the
following: (i) the most important factors separating successful from unsuccessful students are: High School final grade, teacher
inspiration, career flexibility, pre-university awareness and mathematics grade. (ii) among classification algorithms for prediction,
decision tree (CART) was the most successful classifier with an overall percentage of correct classification of 85.2%. This paper
showcases the importance of Prediction and Classification based data mining algorithms in the field of education and also presents
some promising future lines.
ISSN 2454-535X
International Journal of Mechanical Engineering Research and Technology aims to provide the best possible service to authors of original research articles, and the fairest system of peer review.
The International Journal of Mechanical Engineering Research and Technology is an international online journal in English published Quarterly. This offers a fast publication schedule whilst maintaining rigorous peer review; the use of recommended electronic formats for article delivery expedites the process. All submitted research articles are subjected to immediate rapid screening by the editors, in consultation with the Editorial Board or others working in the field as appropriate, to ensure they are likely to be of the level of interest and importance appropriate for the journal.
This tracer study determined the employment status of BS Computer Science graduates of LPU from 2004-2009. It also assessed the relevance of the BSCS curricula, knowledge, skills and work values acquired by the graduates to their employment, and identified the personal and professional characteristics and job placement of Computer Science graduates and the school-related factors associated with their employment. The findings of the study served as the researcher's basis to improve, update or enhance the curricula of the BSCS program to make it more responsive to the needs of fast-changing technology.
Eighty-five percent of the surveyed respondents were gainfully employed. The majority held professional, technical and supervisory positions, landed a first job related to the course they completed, obtained their first job in less than 1 year, and stayed in their first job for more than 1 year. Career challenge and salaries and benefits were the prime reasons for changing jobs, and lack of work experience was the number 1 problem they encountered when looking for a job.
Information Technology and communication skills developed by LPU were considered very much useful to the present work of the respondents. Work-related values like love for God, supportiveness, courage, tolerance and perseverance were also deemed very much useful to the present employment of the respondents. The proposed program of the study focused on academic development, employment opportunity and enhancing the leadership capability of Computer Science graduates.
It is strongly recommended that graduating students be given ample time before graduation to experience pre-employment examinations and interviews. Faculty development trainings must be given to the faculty members teaching professional subjects. As to General Education subjects, Mathematics and Language subjects must also be strengthened. All Offices and Departments must continue to improve their services towards the attainment of maximum customer satisfaction.
Multiple educational data mining approaches to discover patterns in universit...IJICTJOURNAL
This paper presented the utilization of pattern discovery techniques by using multiple relationships and clustering educational data mining approaches to establish a knowledge base that will aid in the prediction of ideal college program selection and enrollment forecasting for incoming freshmen. Results show a significant level of accuracy in predicting college programs for students by mining two years of student college admission and graduation final grade scholastic records. The results of educational predictive data mining methods can be applied in improving the services of the admission department of an educational institution, particularly in its course alignment, student mentoring, admission forecast, marketing, and enrollment preparedness.
Data mining approach to predict academic performance of studentsBOHRInternationalJou1
Powerful data mining techniques are available in a variety of educational fields. Educational research is advancing rapidly due to the vast amount of student data that can be used to create insightful patterns related to student learning. Educational data mining is a tool that helps universities assess and identify student performance. Well-known classification techniques have been widely used in data mining to determine student success. An important and growing research area in educational data mining (EDM) is predicting student academic performance. This area uses data mining and machine learning approaches to extract data from education repositories. According to relevant research, there are several academic performance prediction methods aimed at helping administrative and teaching staff in academic institutions. In the proposed approach, the collected data set is preprocessed to ensure data quality, and the labeled student education data is used to train four classifiers: ANN, support vector, random forest, and decision tree (DT). The performance of the four classifiers is measured by accuracy, receiver operating characteristic (ROC) curve, F1 score, and the confusion matrix of each model. Finally, we found that the top three algorithmic models had an accuracy of 86–95%, an F1 score of 85–95%, and an average OVA area under the ROC curve of 98–99.6%.
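The evaluation measures named in this abstract (accuracy, F1 score, confusion matrix) can be illustrated with a minimal sketch. The labels and predictions below are made up for illustration and are not data from the paper; the function names are likewise hypothetical, and only the Python standard library is used.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels, in `labels` order."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def f1_score(y_true, y_pred, positive):
    """Harmonic mean of precision and recall for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Illustrative (invented) outcomes for eight students.
y_true = ["pass", "pass", "fail", "pass", "fail", "fail", "pass", "fail"]
y_pred = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "fail"]

matrix = confusion_matrix(y_true, y_pred, ["pass", "fail"])
acc = accuracy(y_true, y_pred)
f1_pass = f1_score(y_true, y_pred, "pass")
print(matrix, acc, f1_pass)
```

In a multi-class setting such as the one the abstract reports, the same F1 computation would be repeated once per class, treating that class as the positive one.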
Automated Data Integration, Cleaning and Analysis Using Data Mining and SPSS ...CSCJournals
Students' performance plays a major role in determining the quality of our education system. Sijil Pelajaran Malaysia (SPM) is a public examination that is compulsory for Form 5 students in Malaysia. The performance gap is not only a school and classroom issue but also a national issue that must be addressed properly. This study aims to integrate, clean and analyse data through automated data mining techniques. Data mining is one of the processes for transforming raw data from the current educational system into meaningful information that can help the school community make the right decisions to achieve much better results. This shows that DM provides the means to assist both educators and students and to improve the quality of education. The results and findings of the study show that the automated system gives the same results as the manual system of integration and analysis, and could also be used by management to make faster and more efficient decisions in order to map or plan an efficient teaching approach for students in the future.
Data Mining Techniques in Higher Education an Empirical Study for the Univer...IJMER
Nowadays, one of the biggest challenges that educational institutions face is the explosive growth of educational data and how to use these data to improve the quality of managerial decisions. Data mining, as an analytical tool that can be used to extract meaningful knowledge from large data sets, can be used to achieve this goal.
This paper addresses the applications of Educational Data Mining (EDM) to extract useful information from the registration information of students at the University of Palestine in the Gaza Strip. The data cover a five-year period [2005-2011], and the paper provides an analytical tool to view and use this information for decision-making processes, taking real-life examples such as grade and GPA for the students.
This paper introduces the competency models for Operations Manager, User Interface
Designer, and Application Developers. It will serve as a guide for Information Systems students
to identify which among the three of the offered tracks would be most suited for them to pursue
according to their knowledge, skills, values and interests. The Holland’s RIASEC model and the
Values Search model of Bronwyn and Holt were utilized to determine the most dominant interest
and most dominant values of the industry computing experts. Survey assessment forms were sent
to IT Operations Manager, User Interface Designer, and Application Developer. Most dominant
values and interests of industry computing experts were determined as well as the knowledge
and skills which are mostly required by the industry in their particular area. The survey results
show that application developers and user interface designers have closely related
values. Thus a second round of the survey would be needed to come up with the most
exclusive dominant values for the particular information systems specialization track.
The Big data concept emerged to meet the growing demands in analysing large
volumes of fast moving, heterogeneous and complex data, which traditional data
analysis systems could not manage further. The application of big data technology
across various sectors of the economy has aided better utilization of multiple data
collated and hence decision making. Organizations no longer base operations on
assumptions or constructed models solely, but can make inferences from generated
data. Educational organizations are more efficient and the pedagogical processes
more effective, when multiple streams of data can be collated from the various
personnel and facilitators involved. This data, when analysed, maximizes the
performance of administrators and recipients alike. This paper looks at the
components and techniques in big data technology, and how it can be implemented in
the education system for effective administration and delivery.
Higher education institutions nowadays operate in an increasingly complex and
competitive environment. The application of innovation is a must for sustaining their competitive advantage.
Institution leaders are using data management and analytics to question the status quo and develop effective
solutions. Achieving these insights and information requires not a single report from a single system, but
rather the ability to access, share, and explore institution-wide data that can be transformed into meaningful
insights at every level of the institution. Consequently, institutions are facing problems in providing necessary
information technology support for fulfilling excellence in performance. More specifically, the best practices
of big data management and analytics need to be considered within higher education institutions. Therefore,
the study aimed at investigating big data and analytics, in terms of: (1) definition; (2) its most important
principles; (3) models; and (4) benefits of its use to fulfill performance excellence in higher education
institutions. This involves shedding light on big data and analytics models and the possibility of its use in
higher education institutions, and exploring the effect of using big data and analytics in achieving performance
excellence. To reach these objectives, the researcher employed a qualitative research methodology for
collecting and analyzing data. The study's most important finding is that there is a significant
relationship between big data and analytics and excellence of performance, as big data management and
analytics mainly aim at achieving tasks quickly with the least effort and cost. These positive results support
the use of big data and analytics in institutions and improving knowledge in this field and providing a practical
guide adaptable to the institution structure. This paper also identifies the role of big data and analytics in
institutions of higher education worldwide and outlines the implementation challenges and opportunities in the
education industry.
Predicting student performance in higher education using multi-regression modelsTELKOMNIKA JOURNAL
Supporting the goal of higher education to produce graduates who will be professional leaders is crucial. Most universities implement an intelligent information system (IIS) to support them in achieving their vision and mission. One of the features of an IIS is student performance prediction. By implementing a data mining model in the IIS, this feature could precisely predict students' grades for their enrolled subjects. Moreover, it can recognize at-risk students and allow top educational management to take educative interventions so that students succeed academically. In this research, a multi-regression model was proposed to build a model for every student. In our model, learning management system (LMS) activity logs were computed. Testing on big datasets of students, courses, and activities indicates that these models could improve the accuracy of the prediction model by over 15%.
The recruitment of new personnel is one of the most essential business processes which affect the quality of
human capital within any company. It is highly essential for the companies to ensure the recruitment of
right talent to maintain a competitive edge over the others in the market. However IT companies often face
a problem while recruiting new people for their ongoing projects due to lack of a proper framework that
defines a criteria for the selection process. In this paper we aim to develop a framework that would allow
any project manager to take the right decision for selecting new talent by correlating performance
parameters with the other domain-specific attributes of the candidates. Also, another important motivation
behind this project is to check the validity of the selection procedure often followed by various big
companies in both public and private sectors which focus only on academic scores, GPA/grades of students
from colleges and other academic backgrounds. We test if such a decision will produce optimal results in
the industry or is there a need for change that offers a more holistic approach to recruitment of new talent
in the software companies. The scope of this work extends beyond the IT domain and a similar procedure
can be adopted to develop a recruitment framework in other fields as well. Data-mining techniques provide
useful information from the historical projects depending on which the hiring-manager can make decisions
for recruiting high-quality workforce. This study aims to bridge this hiatus by developing a data-mining
framework based on an ensemble-learning technique to refocus on the criteria for personnel selection. The
results from this research clearly demonstrated that there is a need to refocus on the selection-criteria for
quality objectives.
The increasing interest in big data, especially in the educational field and online education, has led to a conflict in terms of student performance indicators. In this paper we discuss the methodology of assessing student performance in terms of success indicators, revealing a number of indicators that are recommended to indicate the success of the final academic achievement.
Nowadays data volumes are growing rapidly in several domains. Many factors have contributed to this growth, including, inter alia, the proliferation of observational devices, miniaturization of various sensors, improved logging and tracking of systems, and improvements in the quality and capacity of both disk storage and networks. Analyzing such data provides insights that can be used to guide decision making. To be effective, analysis must be timely and cope with data scales. The scale of the data and the rates at which they arrive make manual inspection infeasible. As an educational management tool, predictive analytics can help improve the quality of education by letting decision makers address critical issues such as enrollment management and curriculum development. This paper presents an analytical study of this approach's prospects for education planning. The goals of predictive analytics are to produce relevant information, actionable insight, better outcomes, and smarter decisions, and to predict future events by analyzing the volume, veracity, velocity, variety, and value of large amounts of data and interactive exploration.
DATA MINING FOR STUDENTS’ EMPLOYABILITY PREDICTIONCSEIJJournal
This study has been undertaken to predict student employability. Assessing student employability provides a method of integrating student abilities and employer business requirements, which is becoming an increasingly important concern for academic institutions. Improving student evaluation techniques for employability can help students to have a better understanding of business organizations and find the right one for them. The data for training the classification models is gathered through a survey in which students are asked to fill out a questionnaire indicating their abilities and academic achievement. This information may be used to determine their competency in a variety of skill categories, including soft skills, problem-solving skills, technical abilities and so on. The goal of this research is to use data mining to predict student employability by considering different factors such as the skills that the students have gained during their diploma level and the time duration with respect to the knowledge they have captured when they expect placement at the end of graduation. Further, the most specific skills relevant to each job category were also identified during this research. For the prediction of student employability, different data mining models such as KNN, Naive Bayes, and Decision Tree were evaluated, and the best model for this institute's student employability prediction was identified. Classification and association techniques were thus used and evaluated in this research.
Data Mining Techniques for School Failure and Dropout SystemKumar Goud
Abstract: Data mining techniques are applied to predict school failure and dropout of students. The method uses real data on middle-school students for the prediction of failure and dropout. It implements white-box classification strategies, such as induction rules and decision trees. A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences. It is a flowchart-like structure in which each internal node represents a "test" on an attribute, each branch represents the outcome of the test and each leaf node represents a class label. The attributes are the real information on students collected from schools in middle or secondary education. The paths from root to leaf represent classification rules, and the tree consists of three kinds of nodes: decision nodes, chance nodes and end nodes. It is specifically used in decision analysis. This technique is used to boost the accuracy of predicting which students might fail or drop out by first using all the available attributes and then choosing the most effective attributes. Attribute selection is done using the WEKA tool.
Keywords: dataset, classification, clustering.
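The tree structure described in this abstract (internal nodes that test an attribute, branches for the outcomes, leaves that carry a class label, and root-to-leaf paths that read as rules) can be sketched as follows. The tree, its attribute names (`absences`, `average_grade`) and its thresholds are hypothetical, hand-written for illustration; they are not taken from the paper and were not learned from data.

```python
# Hypothetical hand-built tree: internal nodes are dicts, leaves are label strings.
TREE = {
    "attribute": "absences",
    "threshold": 10,
    "low": {                      # branch taken when absences < 10
        "attribute": "average_grade",
        "threshold": 50,
        "low": "fail",            # leaf: class label
        "high": "pass",           # leaf: class label
    },
    "high": "dropout",            # branch taken when absences >= 10
}

def classify(node, record):
    """Walk from the root to a leaf, applying each node's test to the record."""
    while isinstance(node, dict):
        branch = "low" if record[node["attribute"]] < node["threshold"] else "high"
        node = node[branch]
    return node

def rules(node, conditions=()):
    """List every root-to-leaf path as a (condition, class label) rule."""
    if not isinstance(node, dict):
        return [(" AND ".join(conditions) or "TRUE", node)]
    test = f'{node["attribute"]} < {node["threshold"]}'
    negated = f'{node["attribute"]} >= {node["threshold"]}'
    return (rules(node["low"], conditions + (test,))
            + rules(node["high"], conditions + (negated,)))

student = {"absences": 3, "average_grade": 72}
print(classify(TREE, student))
for condition, label in rules(TREE):
    print(f"IF {condition} THEN {label}")
```

Because every prediction can be traced to one readable rule, this kind of model is what the abstract calls a white-box classifier.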
A Model for Predicting Students’ Academic Performance using a Hybrid of K-mea...Editor IJCATR
Higher learning institutions nowadays operate in a more complex and competitive environment due to high demand from prospective students and an increase in the number of universities, both public and private. University management faces the challenge of predicting students' academic performance in order to put mechanisms in place early enough for their improvement. This research aims at employing Decision tree and K-means data mining algorithms to model an approach to predict the performance of students in advance so as to devise mechanisms for reducing student dropout rates and improving performance. In Kenya, for example, there has been an increase in students enrolling in universities since the Government started free primary education. The Government therefore expects an increased workforce of professionals from these institutions without compromising quality so as to achieve its millennium development goals and Vision 2030. The backlog of students not finishing their studies in the stipulated time due to poor performance is another issue that can be addressed from the results of this research, since predicting student performance in advance will enable University management to devise ways of assisting weak students and even make more decisions on how to select students for particular courses. Previous studies in Educational Data Mining have mostly focused on factors affecting students' performance and have used different algorithms to predict students' performance. In all this research, accuracy of prediction is key and is what researchers seek to improve.
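The abstract does not say how K-means and the decision tree are combined, so the sketch below covers only the clustering step: a one-dimensional K-means that groups students by a single score. The grades, the starting centroids and the function names are all invented for illustration. One plausible use, stated here as an assumption, is that the resulting cluster index serves as a performance band that a later classifier learns to predict.

```python
def assign(values, centroids):
    """Put each value in the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for v in values:
        nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
        clusters[nearest].append(v)
    return clusters

def kmeans_1d(values, initial_centroids, iterations=20):
    """Alternate assignment and centroid update until the centroids stop moving."""
    centroids = list(initial_centroids)
    for _ in range(iterations):
        clusters = assign(values, centroids)
        # An empty cluster keeps its previous centroid.
        updated = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(values, centroids)

# Invented average grades for nine students, with fixed starting centroids
# so that the run is deterministic.
grades = [35, 40, 42, 55, 58, 60, 78, 82, 85]
centroids, clusters = kmeans_1d(grades, [30, 60, 90])
print(centroids)
print(clusters)
```

Fixed starting centroids are used here only to keep the example reproducible; practical implementations usually pick them at random and repeat the run several times.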
Nowadays, ones of the biggest challenges that educational institutions face is the explosive
growth of educational data. and how to use these data to improve the quality of managerial decisions.
Data mining, as an analytical tools that can be used to extract meaningful knowledge from large data
sets, can be used to achieve this goal.
This paper addresses the applications of Educational Data Mining (EDM) to extract useful information
from registration information of student at university of Palestine in Gaza strip. The data include five
years period [2005-2011] by providing analytical tool to view and use this information for decision
making processes by taking real life example such as grade and GPA for the students. abstract should
summarize the content of the paper.
This paper introduces the competency models for Operations Manager, User Interface
Designer, and Application Developers. It will serve as a guide for Information Systems students
to identify which among the three of the offered tracks would be most suited for them to pursue
according to their knowledge, skills, values and interests. The Holland’s RIASEC model and the
Values Search model of Bronwyn and Holt were utilized to determine the most dominant interest
and most dominant values of the industry computing experts. Survey assessment forms were sent
to IT Operations Manager, User Interface Designer, and Application Developer. Most dominant
values and interests of industry computing experts were determined as well as the knowledge
and skills which are mostly required by the industry in their particular area. Based on the result
of the survey, it shows that application developer and user interface designer have a closely
related values. Thus a second round of a survey would be needed to come up with the most
exclusive dominant values for the particular information systems specialization track.
The Big data concept emerged to meet the growing demands in analysing large
volumes of fast moving, heterogeneous and complex data, which traditional data
analysis systems could not manage further. The application of big data technology
across various sectors of the economy has aided better utilization of multiple data
collated and hence decision making. Organizations no longer base operations on
assumptions or constructed models solely, but can make inferences from generated
data. Educational organizations are more efficient and the pedagogical processes
more effective, when multiple streams of data can be collated from the various
personnel and facilitators involved. This data when analysed, maximizes the
performance of administrators andrecipients alike. This paper looks at the
components and techniques in bigdata technology, and how it can be implemented in
the education system for effective administration and delivery
Higher education institutions now a days are operating in an increasingly complex and
competitive environment. The application of innovation is a must for sustaining its competitive advantage.
Institution leaders are using data management and analytics to question the status quo and develop effective
solutions. Achieving these insights and information requires not a single report from a single system, but
rather the ability to access, share, and explore institution-wide data that can be transformed into meaningful
insights at every level of the institution. Consequently, institutions are facing problems in providing necessary
information technology support for fulfilling excellence in performance. More specifically, the best practices
of big data management and analytics need to be considered within higher education institutions. Therefore,
the study aimed at investigating big data and analytics, in terms of: (1) definition; (2) its most important
principles; (3) models; and (4) benefits of its use to fulfill performance excellence in higher education
institutions. This involves shedding light on big data and analytics models and the possibility of its use in
higher education institutions, and exploring the effect of using big data and analytics in achieving performance
excellence. To reach these objectives, the researcher employed a qualitative research methodology for
collecting and analyzing data. The study concluded the most important result, that there is a significant
relationship between big data and analytics and excellence of performance as big data management and
analytics mainly aims at achieving tasks quickly with the least effort and cost. These positive results support
the use of big data and analytics in institutions and improving knowledge in this field and providing a practical
guide adaptable to the institution structure. This paper also identifies the role of big data and analytics in
institutions of higher education worldwide and outlines the implementation challenges and opportunities in the
education industry.
Predicting student performance in higher education using multi-regression modelsTELKOMNIKA JOURNAL
Supporting the goal of higher education to produce graduation who will be a professional leader is a crucial. Most of universities implement intelligent information system (IIS) to support in achieving their vision and mission. One of the features of IIS is student performance prediction. By implementing data mining model in IIS, this feature could precisely predict the student’ grade for their enrolled subjects. Moreover, it can recognize at-risk students and allow top educational management to take educative interventions in order to succeed academically. In this research, multi-regression model was proposed to build model for every student. In our model, learning management system (LMS) activity logs were computed. Based on the testing result on big students datasets, courses, and activities indicates that these models could improve the accuracy of prediction model by over 15%.
The recruitment of new personnel is one of the most essential business processes which affect the quality of
human capital within any company. It is highly essential for the companies to ensure the recruitment of
right talent to maintain a competitive edge over the others in the market. However IT companies often face
a problem while recruiting new people for their ongoing projects due to lack of a proper framework that
defines a criteria for the selection process. In this paper we aim to develop a framework that would allow
any project manager to take the right decision for selecting new talent by correlating performance
parameters with the other domain-specific attributes of the candidates. Also, another important motivation
behind this project is to check the validity of the selection procedure often followed by various big
companies in both public and private sectors which focus only on academic scores, GPA/grades of students
from colleges and other academic backgrounds. We test if such a decision will produce optimal results in
the industry or is there a need for change that offers a more holistic approach to recruitment of new talent
in the software companies. The scope of this work extends beyond the IT domain and a similar procedure
can be adopted to develop a recruitment framework in other fields as well. Data-mining techniques provide
useful information from the historical projects depending on which the hiring-manager can make decisions
for recruiting high-quality workforce. This study aims to bridge this hiatus by developing a data-mining
framework based on an ensemble-learning technique to refocus on the criteria for personnel selection. The
results from this research clearly demonstrated that there is a need to refocus on the selection-criteria for
quality objectives.
Due to the increasing interest in big data especially in the educational field and online education has led to a conflict in terms of performance indicators of the student. In this paper we discuss the methodology of assessing the student performance in terms of the success indicators revealing a number of indicators that is recommended to indicate success of the final academic achievement.
ow-a-days data volumes are growing rapidly in several domains. Many factors have contributed to this growth, including inter alia proliferation of observational devices, miniaturization of various sensors ,improved logging and tracking of systems, and improvements in the quality and capacity of both disk storage and networks .Analyzing such data provides insights that can be used to guide decision making. To be effective, analysis must be timely and cope with data scales. The scale of the data and the rates at which they arrive make manual inspection infeasible. As an educational management tool, predictive analytics can help and improve the quality of education by letting decision makers address critical issues such as enrollment management and curriculum Development. This paper presents an analytical study of this approach’s prospects for education planning. The goals of predictive analytics are to produce relevant information, actionable insight, better outcomes, and smarter decisions, and to predict future events by analyzing the volume, veracity, velocity, variety, value of large amounts of data and interactive exploration.
Due to the increasing interest in big data especially in the educational field and online education has led to a conflict in terms of performance indicators of the student. In this paper we discuss the methodology of assessing the student performance in terms of the success indicators revealing a number of indicators that is recommended to indicate success of the final academic achievement
DATA MINING FOR STUDENTS’ EMPLOYABILITY PREDICTIONCSEIJJournal
This study has been undertaken to predict the student employability.Assessing student employability
provides a method of integrating student abilities and employer business requirements, which is becoming
an increasingly important concern for academic institutions. Improving student evaluation techniques for
employability can help students to have a better understanding of business organizations and find the right
one for them. The data for the training classification models is gathered through a survey in which students
are asked to fill out a questionnaire in which they may indicate their abilities and academic achievement.
This information may be used to determine their competency in a variety of skill categories, including soft
skills, problem-solving skills and technical abilities and so on.The goal of this research is to use data
mining to predict student employability by considering different factors such as skills that the students have
gained during their diploma level and time duration with respect to the knowledge they have captured
when they expect the placement at the end of graduation. Further during this research most specific skills
with relevant to each job category also was identified. In this research for the prediction of the student
employability different data mining models such as such as KNN, Naive Bayer’s, and Decision Tree were
evaluated and out of that best model also was identified for this institute's student’s employability
prediction.So, in this research classification and association techniques were used and evaluated.
Data Mining Techniques for School Failure and Dropout SystemKumar Goud
Abstract: Data mining techniques are applied to predict college failure and bum of the student. This is method uses real data on middle-school students for prediction of failure and drop out. It implements white-box classification strategies, like induction rules and decision trees or call trees. Call tree could be a call support tool that uses tree-like graph or a model of call and their possible consequences. A call tree is a flowchart-like structure in which internal node represents a "test" on an attribute. Attribute is the real information of students that is collected from college in middle or pedagogy, each branch represents the outcome of the test and each leaf node represents a class label. The paths from root to leaf represent classification rules and it consists of three kinds of nodes which incorporates call node, likelihood node and finish node. It is specifically used in call analysis. Using this technique to boost their correctness for predicting which students might fail or dropout (idler) by first, using all the accessible attributes next, choosing the most effective attributes. Attribute choice is done by using WEKA tool.
Keywords: dataset, classification, clustering.
A Model for Predicting Students’ Academic Performance using a Hybrid of K-mea...Editor IJCATR
Higher learning institutions nowadays operate in a more complex and competitive due to a high demand from prospective
students and an emerging increase of universities both public and private. Management of Universities face challenges and concerns of
predicting students’ academic performance in to put mechanisms in place prior enough for their improvement. This research aims at
employing Decision tree and K-means data mining algorithms to model an approach to predict the performance of students in advance
so as to devise mechanisms of alleviating student dropout rates and improve on performance. In Kenya for example, there has been
witnessed an increase student enrolling in universities since the Government started free primary education. Therefore the Government
expects an increased workforce of professionals from these institutions without compromising quality so as to achieve its millennium
development and vision 2030. Backlog of students not finishing their studies in stipulated time due to poor performance is another
issue that can be addressed from the results of this research since predicting student performance in advance will enable University
management to devise ways of assisting weak students and even make more decisions on how to select students for particular courses.
Previous studies have been done Educational Data Mining mostly focusing on factors affecting students’ performance and also used
different algorithms in predicting students’ performance. In all these researches, accuracy of prediction is key and what researchers
look forward to try and improve.
A Data Mining Approach to Construct
Graduates Employability Model in Malaysia
Myzatul Akmam Sapaat, Aida Mustapha, Johanna Ahmad, Khadijah Chamili,
Rahamirzam Muhamad
Faculty of Computer Science and Information Technology, Universiti Putra Malaysia,
43400 UPM Serdang, Selangor, Malaysia
{angahmyz@yahoo.com, aida@fsktm.upm.edu.my, anna_lee207@yahoo.co.uk,
khadijah@usim.edu.my, raha_muhd@yahoo.com}
ABSTRACT
This study constructs a Graduates
Employability Model using classification,
a data mining task. To this end, we use
data sourced from the Tracer Study, a
web-based survey system from the Ministry
of Higher Education, Malaysia (MOHE), for
the year 2009. The classification experiment
is performed using various Bayes algorithms
to determine whether a graduate has been
employed, remains unemployed, or is in an
undetermined situation. The performance of
the Bayes algorithms is also compared against
that of a number of tree-based algorithms.
Information Gain is used to rank the
attributes, and the results show that the
three attributes with the most direct impact
on employability are job sector, job status,
and reason for not working. J48, a
decision-tree algorithm, achieved the highest
accuracy at 92.3%, compared with an average
of 91.3% for the Bayes algorithms. This leads
to the conclusion that a tree-based classifier
is more suitable for the tracer data due to
its information gain strategy.
KEYWORDS
Classification, Bayes Methods, Decision
Tree, Employability
1 INTRODUCTION
Tracer Study is a web-based survey
system developed by the Ministry of
Higher Education, Malaysia (MOHE). It
must be completed by all students
graduating from polytechnics and public
or private institutions before their
convocation, for any level of degree
awarded. The purpose of the survey is
to guide future planning and to improve
various aspects of the local higher
education administrative system. The
survey also serves as a tool to gauge the
adequacy of higher education in
Malaysia in supplying manpower needs
in all areas across the technical,
managerial, and social science fields.
Data sourced from the Tracer Study is
invaluable because it relates graduates'
qualifications and skills to their
employment status.
Graduate employability remains a
national issue due to the increasing
number of graduates produced by higher
education institutions each year.
According to statistics generated from
the Tracer Study, the total number of
graduates produced by higher education
institutions in 2008 was 139,278. In 2009,
the number increased to 155,278
graduates. Of the 2009 graduates, 50%
are bachelor's degree holders from public
and private universities. Only 49.20% of
them (38,191) were employed within the
first six months after finishing their
studies. Previous research on graduate
employability covers a wide range of
domains such as education, engineering,
and social science. While this research
is mainly based on surveys or interviews,
little has been done using data mining
techniques.
International Journal on New Computer Architectures and Their Applications (IJNCAA) 1(4): 1086-1098
The Society of Digital Information and Wireless Communications, 2011 (ISSN: 2220-9085)
Bayes' theorem is among the earliest
statistical methods used to identify
patterns in data. But as datasets have
grown in size and complexity, data
mining has emerged as a technology that
applies methods such as neural networks,
genetic algorithms, decision trees, and
support vector machines to uncover
hidden patterns [1]. Today, data mining
technologies deal with huge amounts of
data from various sources, for example
relational or transactional databases,
data warehouses, images, flat files, or
the World Wide Web.
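To make the role of Bayes' theorem in pattern identification concrete, the following is a minimal sketch of a naive Bayes classifier over categorical attributes. It is not the authors' implementation: the attribute names (`sector`, `degree`), the six toy records, and the function names are all hypothetical, and Laplace (add-one) smoothing is an assumption added so that unseen attribute values do not produce zero probabilities.

```python
from collections import Counter, defaultdict
import math

# Toy, made-up graduate profiles (NOT taken from the Tracer Study).
# Each row is (attributes, class label).
TRAIN = [
    ({"sector": "private", "degree": "bachelor"}, "employed"),
    ({"sector": "private", "degree": "diploma"}, "employed"),
    ({"sector": "public", "degree": "bachelor"}, "employed"),
    ({"sector": "none", "degree": "bachelor"}, "unemployed"),
    ({"sector": "none", "degree": "diploma"}, "unemployed"),
    ({"sector": "none", "degree": "bachelor"}, "others"),
]


def train_naive_bayes(rows):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(label for _, label in rows)
    value_counts = defaultdict(Counter)  # (label, attr) -> Counter of values
    attr_values = defaultdict(set)       # attr -> distinct values seen
    for attrs, label in rows:
        for attr, value in attrs.items():
            value_counts[(label, attr)][value] += 1
            attr_values[attr].add(value)
    return class_counts, value_counts, attr_values


def predict(model, attrs):
    """Return the class with the highest posterior under the naive
    independence assumption: P(c | x) is proportional to P(c) * prod P(x_i | c)."""
    class_counts, value_counts, attr_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        for attr, value in attrs.items():
            seen = value_counts[(label, attr)][value]
            k = len(attr_values[attr])
            # Laplace smoothing (an assumption of this sketch).
            score += math.log((seen + 1) / (count + k))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_naive_bayes(TRAIN)
print(predict(model, {"sector": "private", "degree": "bachelor"}))
```

Log probabilities are summed instead of multiplying raw probabilities, which avoids numerical underflow when a profile has many attributes.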
Classification is the task of
generalizing from observations in the
training data, each of which is
accompanied by its class label. The
objective of this paper is to predict
whether a graduate has been employed,
remains unemployed, or is in an
undetermined situation within the first
six months after graduation. This will be
achieved through a classification
experiment that classifies a graduate
profile as employed, unemployed, or
others. The main contribution of this
paper is the comparison of classification
accuracy between various algorithms from
the two most commonly used data mining
techniques in the education domain,
namely Bayes methods and decision trees.
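Decision-tree learners in the ID3 family choose their splits with information gain, the same measure the abstract uses to rank attributes. The sketch below shows how that measure can be computed and used for ranking. It is illustrative only: the attribute names (`job_sector`, `gender`), the four toy records, and the function names are hypothetical and do not come from the Tracer Study data.

```python
from collections import Counter
import math

# Toy, made-up records (NOT from the Tracer Study): (attributes, class label).
ROWS = [
    ({"job_sector": "private", "gender": "f"}, "employed"),
    ({"job_sector": "private", "gender": "m"}, "employed"),
    ({"job_sector": "none", "gender": "f"}, "unemployed"),
    ({"job_sector": "none", "gender": "m"}, "unemployed"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, attr):
    """Entropy of the class labels minus the weighted entropy that
    remains after partitioning the rows by the values of `attr`."""
    labels = [label for _, label in rows]
    groups = {}
    for attrs, label in rows:
        groups.setdefault(attrs[attr], []).append(label)
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_attributes(rows):
    """Return attribute names sorted from most to least informative."""
    attrs = rows[0][0].keys()
    return sorted(attrs, key=lambda a: information_gain(rows, a), reverse=True)


print(rank_attributes(ROWS))
```

In this toy data `job_sector` separates the two classes perfectly while `gender` carries no information about the class, so `job_sector` ranks first.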
The remainder of this paper is
organized as follows. Section 2 presents
related work on graduate employability
and reviews recent techniques employed
in data mining. Section 3 introduces the
dataset and the experimental setting.
Section 4 discusses the findings.
Finally, Section 5 concludes the paper
with some directions for future work.
2 RELATED WORK
Several studies have sought to identify
the factors that influence graduate
employability in Malaysia, as an initial
step toward aligning higher education
with industry, two sectors that
unquestionably affect each other.
Nonetheless, most of the previous works
were carried out outside the data mining
domain. In addition, the data for
previous works were collected and
assembled through surveys of sample
populations.
Research in [2] identifies three major
requirements that employers look for
when hiring, which are basic academic
skills, higher-order thinking skills, and
personal qualities. The work is
restricted to the education domain,
specifically analyzing the effectiveness
of one subject, English for Occupational
Purposes (EOP), in
enhancing employability skills. Similar
to [2], work by [3] proposes to
restructure the curriculum and methods
of instruction in preparing future
graduates for the forthcoming challenges
based on the model of the T-shaped
professional and newly developed field
International Journal on New Computer Architectures and Their Applications (IJNCAA) 1(4): 1086-1098
The Society of Digital Information and Wireless Communications, 2011 (ISSN: 2220-9085)
of Service Science, Management and
Engineering (SSME).
More recently, [4] proposes a new
Malaysian Engineering Employability
Skills Framework (MEES), which is
constructed from the requirements of
accrediting and professional bodies and
from existing research findings in
employability skills, as a guideline for
training packages and qualifications in
Malaysia. Nonetheless, not surprisingly,
graduate employability is rarely
studied within the scope of data mining,
mainly because authentic data sources
are limited.
Employability issues have also been
taken into consideration in other
countries. Research by The Higher
Education Academy with the Council for
Industry and Higher Education (CIHE)
in the United Kingdom concluded that
there are six competencies that employers
look for in individuals who can transform
organizations and add value in their
careers [5]. The six competencies are
cognitive skills or brainpower, generic
competencies, personal capabilities,
technical ability, business or
organization awareness and practical
elements. Furthermore, employability
covers a set of achievements, comprising
skills, understandings and personal
attributes, that make graduates more
likely to gain employment and be
successful in their chosen occupations,
which benefits the graduates, the
community and also the economy.
Data mining techniques have nonetheless
been employed in the education
domain, for instance in the prediction
and classification of student academic
performance using an Artificial Neural
Network [6, 7] and a combination of
clustering and decision tree classification
techniques [6]. Experiments in [8]
classify students to predict their final
grade using six common classifiers
(Quadratic Bayesian classifier, 1-nearest
neighbour (1-NN), k-nearest neighbour
(k-NN), Parzen-window, multilayer
perceptron (MLP), and Decision Tree).
With regard to student performance, [9]
discovers individual student
characteristics that are associated with
success, measured by grade point
average (GPA), using the Microsoft
Decision Trees (MDT) classification
technique. [10] shows some
applications of data mining in
educational institutions that extract
useful information from huge data sets.
Analytical data mining tools let users
view and use current information in
decision making, for example in
organizing the syllabus, predicting the
registration of students in an
educational programme, predicting
student performance, detecting cheating
in online examinations, and
identifying abnormal or erroneous values.
Among the related work, we found
that the work in [11] is most closely
related to this research. It mines
historical data of students' academic
results using different classifiers (Bayes,
trees, functions) to rank the influencing
factors that contribute to predicting
student academic performance.
3 MATERIALS AND METHODS
The main objective of this paper is to
classify a graduate profile as employed,
unemployed or undetermined using data
sourced from the Tracer Study database
for the year of 2009. The dataset consists
of 12,830 instances and 20 attributes
related to graduate profiles from 19
public universities and 138 private
universities. Table 1 shows the complete
list of attributes in the Tracer Study
dataset.
To construct the classifiers, we use
the Waikato Environment for
Knowledge Analysis (WEKA), an
open-source data mining tool [12]
developed at the University of Waikato,
New Zealand. It provides various learning
algorithms that can easily be applied to
the dataset. WEKA's native dataset
format is the Attribute-Relation File
Format (ARFF). Therefore, once
data preparation was done, we
converted the dataset into an ARFF file
with the extension .arff.
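As an illustration of the ARFF layout, the following Python sketch renders a few nominal attributes from Table 1 as ARFF text. The helper name to_arff, the relation name and the two sample records are made up for this example and are not taken from the Tracer Study data; the sketch also assumes every attribute is nominal and that no value needs quoting.

```python
def to_arff(relation, attributes, rows):
    """Render nominal data as ARFF text.

    attributes: list of (name, [allowed values]) pairs, all nominal.
    rows: list of value lists, one value per attribute, in order.
    """
    lines = [f"@relation {relation}", ""]
    for name, values in attributes:
        lines.append(f"@attribute {name} {{{','.join(values)}}}")
    lines += ["", "@data"]
    for row in rows:
        # Reject values that were not declared in the attribute header.
        for (name, values), value in zip(attributes, row):
            if value not in values:
                raise ValueError(f"{value!r} is not a declared value of {name}")
        lines.append(",".join(row))
    return "\n".join(lines) + "\n"


# Three attributes from Table 1 and two invented sample records.
attributes = [
    ("sex", ["male", "female"]),
    ("univ", ["public_univ", "private_univ"]),
    ("emp_status", ["employed", "unemployed", "others"]),
]
rows = [
    ["female", "public_univ", "employed"],
    ["male", "private_univ", "unemployed"],
]
arff_text = to_arff("tracer_study_sample", attributes, rows)
print(arff_text)
```

The header declares each nominal attribute with its value set in braces, and every line after @data is one instance with comma-separated values in header order.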
Table 1. Attributes from the Tracer Study dataset after the pre-processing is performed.
No. | Attribute | Values | Description
1 | sex | {male, female} | Gender of the graduate
2 | age | {20-25, 25-30, 30-40, 40-50, >50} | Age of the graduate
3 | univ | {public_univ, private_univ} | University/institution of current qualification
4 | level | {certificate, diploma, advanced_diploma, first_degree, postGraduate_diploma, masters_thesis, masters_courseWork&Thesis, masters_courseWork, phd_thesis, phd_courseWork&Thesis, professional} | Level of study for current qualification
5 | field | {technical, ict, education, science, art&soc_science} | Field of study for current qualification
6 | cgpa | {2.00-2.49, 2.50-2.99, 3.00-3.66, 3.67-4.00, failed, 4.01-6.17} | CGPA for current qualification
7 | emp_status | {employed, unemployed, others} | Current employment status
8 | general_IT skills | {satisfied, extremely_satisfied, average, strongly_not_satisfied, not_satisfied, not_applicable} (shared by attributes 8-17) | Level of IT skills, Malay and English language proficiency, general knowledge, interpersonal communication, creative and critical thinking, analytical skills, problem solving, inculcation of positive values, and teamwork acquired from the programme of study (shared by attributes 8-17)
9 | Malay_lang | as attribute 8 | as attribute 8
10 | English_lang | as attribute 8 | as attribute 8
11 | gen_knowledge | as attribute 8 | as attribute 8
12 | interpersonal_comm | as attribute 8 | as attribute 8
13 | cc_thinking | as attribute 8 | as attribute 8
14 | analytical | as attribute 8 | as attribute 8
15 | prob_solving | as attribute 8 | as attribute 8
16 | positive_value | as attribute 8 | as attribute 8
17 | teamwork | as attribute 8 | as attribute 8
18 | job_status | {permanent, contract, temp, self_employed, family_business} | Job status of employed graduates
19 | job_sector | {local_private_company, multinational_company, own_company, government, NGO, GLC, statutory_body, others} | Job sector of employed graduates
20 | reason_not_working | {job_hunting, waiting_for_posting, further_study, participating_skills_program, waiting_posting_of_study, unsuitable_job, resting, others, family_responsibilities, medical_issues, not_interested_to_work, not_going_to_work, lack_of_confidence, chambering} | Reason for not working for unemployed graduates
3.1 Data-Preprocessing
The raw data retrieved from the Tracer
Study database required pre-processing
to prepare the dataset for the
classification task. First, cleaning
involved eliminating records with
missing values in critical attributes,
identifying outliers, correcting
inconsistent data, and removing
duplicate records. Of the 89,290
instances in the raw data, 12,830
instances remained after cleaning and
were ready to be mined. For
the remaining missing values (i.e., in
the age attribute), we replaced them with
the mean value of the attribute.
Second, data discretization and recoding
are required because most attributes
from the Tracer Study are stored as
numbers, either continuous values or
numeric codes. We discretized continuous
values into intervals and mapped numeric
codes to labels, so that the dataset
contains only categorical (nominal)
attributes, as follows.
• cgpa, previously a continuous number,
is transformed into a grade range
• sex, previously coded as 1 and 2, is
transformed into a nominal attribute
• age, previously a continuous number,
is transformed into an age range
• field of study, previously a numerical
code 1-4, is transformed into a nominal
attribute
• skill information (i.e., language
proficiency, general knowledge,
interpersonal communication etc.),
previously a numerical value 1-9, is
transformed into a nominal attribute
• employment status, previously a
numerical code 1-3, is transformed into a
nominal attribute
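The pre-processing steps above can be sketched in Python as follows. This is only an illustration: the interval labels come from Table 1, but the boundary handling (for example, which range an age of exactly 25 falls into), the rule that a CGPA below 2.00 maps to failed, and the code-to-label mapping for sex are assumptions, since the text does not specify them.

```python
from statistics import mean

# Hypothetical mapping: the paper does not say which code stands for which sex.
SEX_LABELS = {1: "male", 2: "female"}


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def age_range(age):
    """Map a numeric age to a Table 1 age range (boundary handling assumed)."""
    if age < 20:
        raise ValueError("age below the lowest range in Table 1")
    if age > 50:
        return ">50"
    if age >= 40:
        return "40-50"
    if age >= 30:
        return "30-40"
    if age >= 25:
        return "25-30"
    return "20-25"


def cgpa_range(cgpa):
    """Map a numeric CGPA to a Table 1 grade range (cut-offs assumed)."""
    if cgpa < 2.00:
        return "failed"  # assumption: anything below 2.00 counts as failed
    if cgpa < 2.50:
        return "2.00-2.49"
    if cgpa < 3.00:
        return "2.50-2.99"
    if cgpa < 3.67:
        return "3.00-3.66"
    if cgpa <= 4.00:
        return "3.67-4.00"
    return "4.01-6.17"


# Invented sample values, with one missing age.
ages = impute_mean([23, None, 31, 27])
print([age_range(a) for a in ages])
print(cgpa_range(3.2), SEX_LABELS[2])
```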
3.2 Classification Task
The classification task at hand is to
predict the employment status
(employed, unemployed, others) for
graduate profiles in the Tracer Study.
The task is performed in two stages,
training and testing. Once the classifier
is constructed from the training data,
the testing dataset is used to estimate
its predictive accuracy.
WEKA offers four test options: use
training set, supplied test set,
cross-validation and percentage split. If
we use the training set as the test
option, the test data are the same data
the classifier was trained on, so the
resulting estimate of the true error rate
is optimistic and unreliable. A supplied
test set lets us provide test data that
have been prepared separately from the
training data. Cross-validation is
suitable for limited datasets, and the
number of folds can be set by the user.
10-fold cross-validation is widely used
to get the best estimate of error, a
choice supported by extensive tests on
numerous datasets with different
learning techniques [13]. Because our
dataset is large, and to avoid
overfitting, we employed the hold-out
validation method with a 70-30 percentage
split, whereby 70% of the 12,830
instances are used for training while the
remaining instances are used for testing.
Various algorithms from both the Bayes
and decision tree families are used to
predict the employment status.
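A hold-out split of this kind can be sketched in a few lines of Python. The function name holdout_split, the fixed seed and the rounding rule are assumptions made for illustration; this is not WEKA's own percentage-split implementation.

```python
import random


def holdout_split(instances, train_fraction=0.7, seed=1):
    """Shuffle the instances and cut them into a training and a test part."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Instance indices stand in for the 12,830 graduate profiles.
train, test = holdout_split(range(12830))
print(len(train), len(test))
```

With this rounding rule, 12,830 instances yield 8,981 training and 3,849 test instances. Section 4 reports 3,840 test instances; the text does not state how that figure was obtained, so the small difference is left as reported.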
Information Gain. Information Gain is
the attribute selection measure used in
ID3. If node N represents the tuples of
partition D, the attribute with the
highest information gain is chosen as
the splitting attribute for node N. This
choice minimizes the expected number of
tests needed to classify a given tuple
and guarantees that a simple tree is
found. The expected information needed
to classify a tuple in D is given by

$$\mathrm{Info}(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)$$

where m is the number of classes and p_i
is the probability that a tuple in D
belongs to class C_i.
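The measure can be illustrated with a short Python sketch. The gain of an attribute A is computed here with the standard ID3 definition, Gain(A) = Info(D) - Info_A(D), where Info_A(D) is the size-weighted information of the partitions induced by A; that second formula is not spelled out in the text above and is added as background. The four sample rows are invented.

```python
from collections import Counter
from math import log2


def info(labels):
    """Expected information (entropy, in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def info_gain(rows, attribute, target):
    """Info(D) minus the size-weighted information after splitting on attribute."""
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(p) / len(rows) * info(p) for p in partitions.values())
    return info([row[target] for row in rows]) - remainder


# Invented toy data: univ separates the classes perfectly, sex not at all.
rows = [
    {"sex": "male", "univ": "public_univ", "emp_status": "employed"},
    {"sex": "female", "univ": "public_univ", "emp_status": "employed"},
    {"sex": "male", "univ": "private_univ", "emp_status": "unemployed"},
    {"sex": "female", "univ": "private_univ", "emp_status": "unemployed"},
]
ranking = sorted(["sex", "univ"],
                 key=lambda a: info_gain(rows, a, "emp_status"),
                 reverse=True)
print(ranking)
```

Sorting the attributes by their gain, as in the last lines, is the same idea as the attribute ranking reported in Section 4.1.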
Bayes Methods. In Bayes methods, the
classification task consists of
predicting a class variable, given a set
of attribute variables. It is a type of
statistical inference in which the prior
distribution is estimated from the data
before any new data are observed, hence
every parameter is assigned a prior
probability distribution [14]. A Bayesian
classifier learns from the samples over
both class and attribute variables.
The naïve Bayesian classifier works
as follows: Let D be a training set of
tuples and their associated class labels.
As usual, each tuple is represented by an
n-dimensional attribute vector, X = (x1,
x2, …, xn), depicting n measurements
made on the tuple from n attributes,
respectively, A1, A2, … , An.
Suppose that there are m classes, C1,
C2, …, Cm. Given a tuple, X, the
classifier will predict that X belongs to
the class having the highest posterior
probability, conditioned on X. That is,
the naïve Bayesian classifier predicts
that tuple X belongs to the class Ci if and
only if
P(Ci|X) > P(Cj|X) for 1 ≤ j ≤ m; j ≠ i
Thus, we maximize P(Ci|X). The class Ci
for which P(Ci|X) is maximized is called
the maximum a posteriori hypothesis.
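The decision rule can be sketched as a small Python class for nominal attributes. By Bayes' theorem P(Ci|X) is proportional to P(X|Ci)P(Ci), and the naïve assumption factorizes P(X|Ci) into a product of per-attribute probabilities. The class name NaiveBayes, the add-one (Laplace) smoothing and the four training tuples are assumptions made for this example; this is not the WEKA implementation.

```python
from collections import Counter, defaultdict


class NaiveBayes:
    """Naive Bayes for nominal attributes with add-one smoothing (an assumption)."""

    def fit(self, X, y):
        self.classes = Counter(y)            # class -> number of training tuples
        self.n = len(y)
        self.counts = defaultdict(Counter)   # (class, attribute index) -> value counts
        self.values = defaultdict(set)       # attribute index -> distinct values seen
        for row, c in zip(X, y):
            for k, v in enumerate(row):
                self.counts[(c, k)][v] += 1
                self.values[k].add(v)
        return self

    def posterior_scores(self, x):
        """Return P(Ci) * prod_k P(x_k | Ci), which is proportional to P(Ci | X)."""
        scores = {}
        for c, n_c in self.classes.items():
            score = n_c / self.n  # prior P(Ci)
            for k, v in enumerate(x):
                score *= (self.counts[(c, k)][v] + 1) / (n_c + len(self.values[k]))
            scores[c] = score
        return scores

    def predict(self, x):
        scores = self.posterior_scores(x)
        return max(scores, key=scores.get)  # maximum a posteriori class


# Invented training tuples: (univ, English_lang) -> emp_status.
X = [("public_univ", "satisfied"), ("public_univ", "average"),
     ("private_univ", "not_satisfied"), ("private_univ", "average")]
y = ["employed", "employed", "unemployed", "unemployed"]
model = NaiveBayes().fit(X, y)
print(model.predict(("public_univ", "satisfied")))
```

The scores are not normalized by P(X), because P(X) is the same for every class and does not change which class attains the maximum.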
Under the Bayes methods in WEKA, we
performed the experiment with eight
algorithms, which are Averaged
One-Dependence Estimators (AODE),
AODEsr, WAODE, Bayes Network,
HNB, Naïve Bayesian, Naïve Bayesian
Simple and Naïve Bayesian Updateable.
AODE, HNB and Naïve Bayesian were
also used in [11], and the remaining
algorithms were chosen to further
compare the results of the Bayes
algorithms on the same dataset.
AODE achieves highly accurate
classification by averaging over all of
a small space of alternative
naive-Bayes-like models that have
weaker, and hence less detrimental,
independence assumptions than naive
Bayes. The resulting algorithm is
computationally efficient while
delivering highly accurate classification
on many learning tasks. AODEsr and
WAODE are extended from AODE.
AODEsr complements AODE with
Subsumption Resolution, which
detects specializations between
two attribute values at classification time
and deletes the generalization attribute
value.
Meanwhile, WAODE constructs the
model called Weightily Averaged
One-Dependence Estimators by assigning
a weight to each one-dependence
estimator. Bayes Network learns a
Bayesian network using various search
algorithms and quality measures. HNB
constructs a Hidden Naive Bayes
classification model with high
classification accuracy and AUC. In Naive
Bayes, numeric estimator precision values
are chosen based on analysis of the
training data. The Naïve Bayes Updateable
classifier uses a default precision of
0.1 for numeric attributes when
buildClassifier is called with zero
training instances. Naive Bayes Simple
models numeric attributes by a normal
distribution.
Tree Methods. Tree-based methods
classify instances by sorting the
instances down the tree from the root to
some leaf node, which provides the
classification of a particular instance.
Each node in the tree specifies a test of
some attribute of the instance, and each
branch descending from that node
corresponds to one of the possible values
for this attribute [15]. Figure 1 shows the
model produced by a decision tree
learner, which is represented in the form
of a tree structure.
Under the tree methods in WEKA, we
performed the classification experiment
with nine algorithms, which are ID3,
J48, REPTree, J48graft, Random Tree,
Decision Stump, LADTree, Random
Forest and Simple Cart. J48 and
REPTree were also used in [11], but we
did not manage to use NBTree and
BFTree because the experiment worked
on a large amount of data, which
exceeded the memory allocation
in WEKA. The FT, User Classifier and
LMT algorithms experienced the same
problem as NBTree and BFTree. In
addition, we employed ID3, J48graft,
Random Tree, Decision Stump,
LADTree, Random Forest and Simple
Cart to experiment with alternative
decision tree algorithms.
Figure 1. In a tree structure, each node denotes a
test on an attribute value, each branch represents
an outcome of the test, and tree leaves represent
classes or class distributions. A leaf node
indicates the class of the examples. The instances
are classified by sorting them down the tree from
the root node to some leaf node.
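Sorting an instance down a tree can be sketched in Python with a nested dictionary. The tree below is hypothetical and hand-written for illustration; it is not the model learned from the Tracer Study data, and the dictionary layout (keys attribute and branches) is an assumption of this example.

```python
def classify(node, instance):
    """Follow the branch matching the instance's value until a leaf is reached."""
    while isinstance(node, dict):          # internal node: test one attribute
        value = instance[node["attribute"]]
        node = node["branches"][value]     # descend along the matching branch
    return node                            # leaf: the predicted class label


# Hypothetical tree: internal nodes are dicts, leaves are class labels.
tree = {
    "attribute": "univ",
    "branches": {
        "public_univ": "employed",
        "private_univ": {
            "attribute": "sex",
            "branches": {"male": "unemployed", "female": "others"},
        },
    },
}
print(classify(tree, {"univ": "private_univ", "sex": "female"}))
```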
ID3 is a class for constructing an
unpruned decision tree based on the ID3
algorithm, which only deals with
nominal attributes. J48 is a class for
generating a pruned or unpruned C4.5
decision tree, while J48graft generates
a grafted (pruned or unpruned) C4.5
decision tree. REPTree is a fast decision
tree learner that builds a decision or
regression tree using information gain or
variance and prunes it using
reduced-error pruning (with backfitting).
Decision Stump is usually used in
conjunction with a boosting algorithm.
LADTree generates a multi-class
alternating decision tree using the
LogitBoost strategy. Random Forest
constructs a forest of random trees,
whereas Random Tree constructs a tree
that considers K randomly chosen
attributes at each node and performs no
pruning. SimpleCart implements minimal
cost-complexity pruning.
4 RESULTS AND DISCUSSIONS
We divided the experimental results
into three parts. The first is the result
of ranking the attributes in the Tracer
Study dataset using Information
Gain. The second and third parts
present the predictive accuracy achieved
by various algorithms from the Bayes
and decision tree families,
respectively.
4.1 Information Gain
In this study, we employed Information
Gain to rank the attributes by how well
they determine the target values, as well
as to reduce the size of the prediction
problem. Decision
tree algorithms adopt a
mutual-information criterion and branch
on the attribute that gains the most
information. This is inherently
a simple preference bias that explicitly
searches for a simple hypothesis.
Ranking attributes also increases the
speed and accuracy of prediction. Based
on attribute selection using Information
Gain, the job sector attribute was found
to be the most important factor in
discriminating the graduate profiles to
predict a graduate’s employment status.
This is shown in Figure 2.
Figure 2. Job sector is ranked the highest by attribute selection based on Information Gain. This is largely
because the attribute has a small set of values, thus one instance is easily distinguished from the remaining
instances.
4.2 Bayes Methods
Table 2 shows the classification
accuracies for various algorithms under
the Bayes methods. In addition, the table
provides comparative results for the
kappa statistic, mean absolute error,
root mean squared error, relative
absolute error, and root relative squared
error over the total of 3,840 testing
instances.
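For readers unfamiliar with the kappa statistic, the following Python sketch computes accuracy, error rate and Cohen's kappa from a confusion matrix, using kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The 3x3 matrix is invented for illustration; the paper does not report its confusion matrices.

```python
def accuracy_and_kappa(confusion):
    """Return (accuracy, kappa) for a square confusion matrix.

    confusion[i][j] counts instances of actual class i predicted as class j.
    """
    n = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(len(confusion))) / n
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    # Chance agreement: product of the marginal proportions, summed over classes.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Invented matrix for the classes employed, unemployed, others.
confusion = [
    [40, 5, 5],
    [4, 30, 6],
    [1, 5, 4],
]
accuracy, kappa = accuracy_and_kappa(confusion)
error_rate = 1 - accuracy
print(round(accuracy, 3), round(error_rate, 3), round(kappa, 3))
```

A kappa near 1 indicates agreement well beyond chance, while a value near 0 indicates that the classifier does no better than chance given the class distribution.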
The Weightily Averaged
One-Dependence Estimators (WAODE)
algorithm achieved the highest accuracy
among the Bayes algorithms. Instead of
treating each tree-augmented naive Bayes
equally, [16] extended AODE by assigning
a different weight to each tree-augmented
naive Bayes, because the attributes do
not all play the same role in
classification.
Table 2. Classification accuracy using various algorithms under Bayes method in WEKA.
Algorithm | Accuracy (%) | Error Rate (%) | Kappa Statistic | Mean Absolute Error | Root Mean Squared Error | Relative Absolute Error (%) | Root Relative Squared Error (%)
WAODE | 91.3 | 8.7 | 0.834 | 0.073 | 0.203 | 20.8 | 48.4
AODE | 91.1 | 8.9 | 0.827 | 0.069 | 0.208 | 19.5 | 49.6
Naïve Bayesian | 90.9 | 9.1 | 0.825 | 0.072 | 0.214 | 20.5 | 51.3
Naïve Bayes Simple | 90.9 | 9.1 | 0.825 | 0.072 | 0.214 | 20.5 | 51.3
BayesNet | 90.9 | 9.1 | 0.824 | 0.072 | 0.215 | 20.5 | 51.4
AODEsr | 90.9 | 9.1 | 0.824 | 0.071 | 0.210 | 20.1 | 50.2
Naïve Bayes Updateable | 90.9 | 9.1 | 0.825 | 0.072 | 0.214 | 20.5 | 51.3
HNB | 90.3 | 9.7 | 0.816 | 0.091 | 0.214 | 25.7 | 51.1
4.3 Tree Methods
Table 3 shows the classification
accuracies for various algorithms under
the tree methods. In addition, the table
provides comparative results for the
kappa statistic, mean absolute error,
root mean squared error, relative
absolute error, and root relative squared
error over the total of 3,840 testing
instances.
Table 3. Classification accuracy using various algorithms under Tree method in WEKA.
Algorithm | Accuracy (%) | Error Rate (%) | Kappa Statistic | Mean Absolute Error | Root Mean Squared Error | Relative Absolute Error (%) | Root Relative Squared Error (%)
J48Graft | 92.3 | 7.7 | 0.849 | 0.078 | 0.204 | 22.1 | 48.7
J48 | 92.2 | 7.8 | 0.848 | 0.078 | 0.204 | 22.2 | 48.8
Simple Cart | 92.0 | 8.0 | 0.844 | 0.079 | 0.199 | 22.3 | 47.5
Random Forest | 91.4 | 8.6 | 0.832 | 0.083 | 0.205 | 23.4 | 49.1
LAD Tree | 91.3 | 8.7 | 0.830 | 0.077 | 0.197 | 22.0 | 47.0
REPTree | 91.0 | 9.0 | 0.825 | 0.080 | 0.213 | 22.8 | 50.9
Decision Stump | 91.0 | 9.0 | 0.821 | 0.108 | 0.232 | 30.6 | 55.3
RandomTree | 88.9 | 11.1 | 0.787 | 0.081 | 0.269 | 23.0 | 64.4
ID3 | 86.7 | 13.3 | 0.795 | 0.072 | 0.268 | 21.1 | 65.2
The J48Graft algorithm achieved the
highest accuracy among the tree
algorithms. J48Graft
generates a grafted C4.5 decision tree,
whether pruned or unpruned. Grafting
is an inductive process that adds nodes
to the inferred decision tree. Unlike
pruning, which uses only local
information as the tree grows, grafting
uses non-local information to provide
better predictive accuracy. Figure 3
shows the difference in tree structure
between a J48 tree and the grafted J48
tree.
Figure 3. The top figure is the tree structure for
J48 and the bottom figure is the tree structure for
grafted J48. Grafting adds nodes to the decision
trees to increase the predictive accuracy. In the
grafted J48, new branches are added in the place
of a single leaf or graft within leaves.
Comparing the performance of both
Bayes and tree-based methods, the
J48Graft algorithm achieved the highest
accuracy of 92.3% using the Tracer
Study dataset. The second highest
accuracy also belongs to a tree method,
the J48 algorithm, with an accuracy
of 92.2%. The best Bayes algorithm,
WAODE, reaches a prediction accuracy of
91.3%, which places it behind the four
most accurate tree algorithms in Table 3.
Nonetheless, we found that both
classification approaches were
complementary because the Bayes
methods provide a better view of
associations or dependencies among the
attributes, while the results from the
tree methods are easier to interpret.
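The comparison can be reproduced from the accuracy columns of Tables 2 and 3 with a few lines of Python. The accuracy values are copied from the two tables; the variable names are chosen for this sketch only.

```python
# Accuracy (%) per algorithm, copied from Table 2 (Bayes) and Table 3 (tree).
bayes = {
    "WAODE": 91.3, "AODE": 91.1, "Naive Bayesian": 90.9,
    "Naive Bayes Simple": 90.9, "BayesNet": 90.9, "AODEsr": 90.9,
    "Naive Bayes Updateable": 90.9, "HNB": 90.3,
}
tree = {
    "J48Graft": 92.3, "J48": 92.2, "Simple Cart": 92.0,
    "Random Forest": 91.4, "LAD Tree": 91.3, "REPTree": 91.0,
    "Decision Stump": 91.0, "RandomTree": 88.9, "ID3": 86.7,
}

# Rank all seventeen algorithms by accuracy, highest first.
ranked = sorted({**tree, **bayes}.items(), key=lambda kv: kv[1], reverse=True)
top_three = [name for name, _ in ranked[:3]]

best_bayes = max(bayes, key=bayes.get)
trees_ahead = [name for name, acc in tree.items() if acc > bayes[best_bayes]]
mean_bayes = sum(bayes.values()) / len(bayes)

print(top_three)
print(best_bayes, trees_ahead)
print(round(mean_bayes, 1))
```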
Figure 4 plots the root mean squared
error values that resulted
from the classification experiment. This
knowledge could be used to gain
insight into the employment trends of
graduates from local higher institutions.
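[Figure 4 chart: radial plot of root mean squared error, radial axis from 0 to 0.3, five spokes numbered 1 to 5. Series: Bayes Methods, Tree-based Methods. Spoke labels as extracted: AODE vs. J48Graft; Naïve Bayesian; Naïve Bayes Simple vs. REPTree; BayesNet vs. RandomTr; HNB vs. ID3.]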
Figure 4. A radial display of the root mean squared error across algorithms under both Bayes and tree-
based methods relative to accuracy. The smaller the root mean squared error, the better the forecast. Based on
this figure, three out of five tree-based algorithms give a better forecast than the corresponding
algorithms under the Bayes methods.
5 CONCLUSIONS
As the education sector grows every
year, graduates face stiff
competition to secure employment in the
industry. The sole
purpose of the Tracer Study system is to
aid higher educational institutions in
preparing their graduates with sufficient
skills to enter the job market. This paper
focused on identifying attributes that
influence graduates’ employability
based on actual data collected from the
graduates themselves six months after
graduation. Nonetheless, assembling the
dataset was difficult because only 90%
of the attributes made their way to the
classification task. Owing to
confidentiality and sensitivity issues,
the remaining 10% of the
attributes were not released by the data
owner.
This paper attempts to predict
whether a graduate has been employed,
remains unemployed or is in an
undetermined situation within the first
six months after graduation. The
prediction has been performed through a
series of classification experiments using
various algorithms under the Bayes and
decision tree methods to classify a
graduate profile as employed, unemployed
or others. Results showed that J48Graft,
a grafted variant of the J48 decision
tree algorithm, yielded the highest
accuracy of 92.3%, compared with 91.3%
for WAODE, the most accurate of the
Bayes algorithms.
As for future work, we hope to
expand the dataset from the Tracer Study
with more attributes and to annotate the
attributes with information such as the
correlation factor between the current
employer and the previous employer.
We are also looking at integrating
datasets from different sources of data,
for instance graduate profiles from the
alumni organizations of the respective
educational institutions. With these in
place, we next plan to introduce
clustering as part of pre-processing to
cluster the attributes before attribute
ranking is performed. Finally, other data
mining techniques such as anomaly
detection or classification-based
association may be implemented in order
to gain more knowledge on graduate
employability in Malaysia.
Acknowledgments. Special thanks to
Prof. Dr. Md Yusof Abu Bakar and Puan
Salwati Badaroddin from Ministry of
Higher Education Malaysia (MOHE) for
their help with data gathering as well as
expert opinion.
7 REFERENCES
1. Han, J., Kamber, M.: Data Mining: Concepts
and Techniques. Morgan Kaufmann (2006)
2. Shafie, L.A, Nayan, S.: Employability
Awareness among Malaysian
Undergraduates. International Journal of
Business and Management, 5(8):119--123
(2010)
3. Mukhtar, M., Yahya, Y., Abdullah, S.,
Hamdan, A.R., Jailani, N., Abdullah, Z.:
Employability and Service Science: Facing
the Challenges via Curriculum Design and
Restructuring. In: International Conference
on Electrical Engineering and Informatics,
pp. 357--361 (2009)
4. Zaharim, A., Omar, M.Z., Yusoff, Y.M.,
Muhamad, N., Mohamed, A., Mustapha, R.:
Practical Framework of Employability Skills
for Engineering Graduate in Malaysia. In:
IEEE EDUCON Education Engineering
2010: The Future Of Global Learning
Engineering Education, pp. 921--927 (2010)
5. Rees, C., Forbes, P., Kubler, B.: Student
Employability Profiles: A Guide for Higher
Education Practitioners (2006)
6. Wook, M., Yahaya, Y.H., Wahab, N., Isa,
M.R.M.: Predicting NDUM Student’s
Academic Performance using Data Mining
Techniques. In: Second International
Conference on Computer and Electrical
Engineering, pp. 357--361 (2009)
7. Ogor, E.N.: Student Academic Performance
Monitoring and Evaluation Using Data
Mining Techniques. In: Fourth Congress of
Electronics, Robotics and Automotive
Mechanics, pp. 354--359 (2007)
8. Minaei-Bidgoli, B., Kashy, D.A.,
Kortemeyer, G., Punch, W.F.: Predicting
Student Performance: An Application of Data
Mining Methods with an Educational Web-
based System. In: 33rd Frontiers in Education
Conference, pp. 13--18 (2003)
9. Guruler, H., Istanbullu, A., Karahasan, M.: A
New Student Performance Analysing System
using Knowledge Discovery in Higher
Educational Databases. Computers &
Education. 55(1), pp 247--254 (2010)
10. Kumar, V., Chadha, A.: An Empirical Study
of the Applications of Data Mining
Techniques in Higher Education,
International Journal of Advanced Computer
Science and Applications, Vol. 2, No.3,
March 2011, pp 80-84 (2011)
11. Affendey, L.S., Paris, I.H.M., Mustapha, N.,
Sulaiman, M.N., Muda, Z.: Ranking of
Influencing Factors in Predicting Student
Academic Performance. Information
Technology Journal. 9(4):832--837 (2010)
12. Hall, M., Frank, E., Holmes, G., Pfahringer,
B., Reutemann, P., Witten, I.H.: The WEKA
Data Mining Software: An Update. SIGKDD
Explorations, Volume 11, Issue 1 (2009)
13. Witten, I.H., Frank, E.: Data Mining:
Practical Machine Learning Tools and
Techniques. Morgan Kaufmann (2005)
14. Jaynes, E.T.: Probability Theory: The Logic
of Science. Cambridge University Press
(2003)
15. Mitchell, T.: Machine Learning. McGraw
Hill, New York (1997)
16. Jiang, L., Zhang, H.: Weightily Averaged One-
Dependence Estimators. In: Proceedings of
the 9th Biennial Pacific Rim International
Conference on Artificial Intelligence,
PRICAI 2006, pp. 970--974 (2006)