Analysis of Results of Final Semester of B.E. (Civil Engineering),
2014 of Purbanchal University using Weka Tool: A mini Research
Raj Kumar Thakur
Associate Professor (Computer)
Purbanchal University School of Engineering and Technology
Biratnagar, Nepal
Abstract - Data mining has become an important tool in many areas of research, industry
and business. This paper applies a data mining tool in the educational domain to analyze
the results of a final semester examination and to express the factors that control those
results as precise rules. Once these rules have been found, management control measures
can be developed and implemented to improve examination results.
Keywords- Data Mining, Business Intelligence, WEKA, Data Visualization, Classification
1. Introduction
With the opening of more universities in Nepal and in the neighboring countries India and China,
public universities and educational institutions are likely to face an admission crisis in the
near future. Nevertheless, the number of admissions to B.E. degrees, especially Civil
Engineering, has been rising consistently over the past few years. A university would
always like to ensure not only that quality education is provided by all colleges run under
it, but also that the pass percentage is as high as possible, especially for final semester
students. As a large number of students are being admitted to the B.E. Civil Engineering program
run in different colleges of Purbanchal University all over Nepal, it is essential that
students, especially those in the final semester, study hard and that, ideally, all of them pass
their final semester examination. However, the final examination results of B.E. programs are
not encouraging.
Data mining is a powerful tool for extracting hidden predictive information
from large databases, and it has great potential to help educational institutes focus on the most
important information in the data they have generated. Data mining techniques therefore need to
be applied to determine precise rules controlling the final examination results.
With the help of data mining techniques such as classification, it is possible to discover the key
decision rules from the final examination results of students and possibly to use those rules to
identify the key course(s) that control whether a student passes or fails. This paper presents
classification based on the J48 algorithm as a simple and efficient tool to analyze the final
examination results of B.E. (Civil Engineering) of Purbanchal University, Nepal.
2. Methodology
The study followed the steps suggested by Fayyad, Piatetsky-Shapiro, and Smyth (1996)
for the knowledge discovery process: data selection, data pre-processing and cleanup, data
transformation, data mining, data interpretation, and the evaluation of results. Among the
available data mining techniques, the decision tree algorithm J48 (WEKA's implementation of
C4.5, which is itself an extension of the ID3 algorithm) creates a small tree representing rules
that provide valuable insight into the classification and prediction of data.
The algorithm used by ID3 is as follows:
Algorithm
function ID3
Input: (R: a set of non-target attributes,
        C: the target attribute,
        S: a training set) returns a decision tree;
begin
  If S is empty, return a single node with value failure;
  If S consists of records all with the same
    value for the target attribute,
    return a single leaf node with that value;
  If R is empty, then return a single node with
    the value of the most frequent of the values of the
    target attribute that are found in records of S [in that case
    there may be errors, that is, examples that will be improperly classified];
  Let A be the attribute with the largest Gain(A,S) among attributes in R;
  Let {aj | j=1,2, .., m} be the values of attribute A;
  Let {Sj | j=1,2, .., m} be the subsets of S consisting
    respectively of records with value aj for A;
  Return a tree with root labeled A and arcs
    labeled a1, a2, .., am going respectively
    to the trees ID3(R-{A}, C, S1), ID3(R-{A}, C, S2),
    ....., ID3(R-{A}, C, Sm);
  [ID3 is thus applied recursively to the subsets {Sj | j=1,2, .., m}
    until one of the stopping conditions above is met]
end.
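The pseudocode above can be sketched in Python as follows. This is a minimal illustration rather than the implementation used in the study: it assumes nominal attributes only, uses the standard entropy-based definition of Gain(A,S), and runs on a small invented set of records (the attribute names and values in `toy` are hypothetical and not taken from the university's data).

```python
import math
from collections import Counter


def entropy(records, target):
    """Shannon entropy (in bits) of the target attribute over the records."""
    counts = Counter(r[target] for r in records)
    total = len(records)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def gain(records, attribute, target):
    """Information gain obtained by splitting the records on one attribute."""
    total = len(records)
    subsets = {}
    for r in records:
        subsets.setdefault(r[attribute], []).append(r)
    remainder = sum(len(s) / total * entropy(s, target) for s in subsets.values())
    return entropy(records, target) - remainder


def id3(attributes, target, records):
    """Return a leaf label or a nested dict {attribute: {value: subtree}}."""
    if not records:
        return "failure"
    labels = [r[target] for r in records]
    if len(set(labels)) == 1:
        return labels[0]  # all records agree: single leaf
    if not attributes:
        return Counter(labels).most_common(1)[0][0]  # majority label
    best = max(attributes, key=lambda a: gain(records, a, target))
    remaining = [a for a in attributes if a != best]
    tree = {best: {}}
    for value in sorted({r[best] for r in records}):
        subset = [r for r in records if r[best] == value]
        tree[best][value] = id3(remaining, target, subset)
    return tree


# Hypothetical toy records, invented for illustration only.
toy = [
    {"hydropower": "B", "construction": "A", "result": "Passed"},
    {"hydropower": "B", "construction": "C", "result": "Passed"},
    {"hydropower": "F", "construction": "A", "result": "Failed"},
    {"hydropower": "F", "construction": "C", "result": "Failed"},
]
toy_tree = id3(["construction", "hydropower"], "result", toy)
print(toy_tree)
```

On these toy records the split on `hydropower` separates the two classes completely, so it has the larger gain and becomes the root of the tree.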
2.1 Building the model
A. Data Collection
MS Excel sheets containing the final examination results of the Eighth semester of B.E.
(Civil Engineering) for the year 2014 from seven colleges under Purbanchal University
were obtained from the Examination Office of Purbanchal University. The result sheets
contained a total of 474 students. For every student, all subjects taught, the marks for
internal assessment in theory and practical and for the final examination, the grade obtained
in each course, the total marks obtained, the SGPA and the final result were
taken into account.
B. Tools Used
To apply the classification algorithm, we used the WEKA toolkit, a widely used data mining
software package developed at the University of Waikato in New Zealand. This
toolkit provides a wide range of data mining algorithms implemented in Java.
It has been widely used in educational data mining research and for teaching purposes.
C. Data Preparation and Pre-Processing
During this phase, the collected data were pre-processed to prepare them
for the mining techniques. First, some irrelevant attributes, e.g. S. No. and Exam Roll
No., were eliminated. The Excel sheets were then converted into ARFF files so that they
could be analyzed with the WEKA tool.
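The preparation step described above can be sketched in Python as follows. The paper does not say how the conversion was done, so this is only one possible approach: it assumes the Excel sheet has first been exported as CSV, and the column names, the marks, and the helper `rows_to_arff` are hypothetical.

```python
import csv
import io


def rows_to_arff(relation, header, rows, nominal):
    """Build ARFF text; columns named in `nominal` are nominal, the rest numeric."""
    lines = [f"@relation {relation}", ""]
    for i, name in enumerate(header):
        if name in nominal:
            # Nominal attributes list their distinct values; "?" marks a missing value.
            values = sorted({row[i] for row in rows if row[i] != "?"})
            lines.append(f"@attribute {name} {{{','.join(values)}}}")
        else:
            lines.append(f"@attribute {name} numeric")
    lines += ["", "@data"]
    lines += [",".join(row) for row in rows]
    return "\n".join(lines) + "\n"


# Hypothetical CSV export of a result sheet (the marks are invented).
sample_csv = """SNo,HYDROPOWER-ENGINEERING-FINAL-TH,HYDROPOWER-ENGINEERING-GRADE,RESULT
1,45,B,Passed
2,20,F,Failed
"""
table = list(csv.reader(io.StringIO(sample_csv)))
header, rows = table[0], table[1:]

# Drop the serial-number column, as irrelevant attributes were eliminated.
keep = [i for i, name in enumerate(header) if name != "SNo"]
header = [header[i] for i in keep]
rows = [[row[i] for i in keep] for row in rows]

arff_text = rows_to_arff(
    "results", header, rows,
    nominal={"HYDROPOWER-ENGINEERING-GRADE", "RESULT"},
)
print(arff_text)
```

The resulting text has the `@relation`, `@attribute` and `@data` sections that WEKA expects in an ARFF file.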
E. Classification
In this research, the aim was to find the causes and the decision rules that
controlled whether a student passed or failed. A classification technique was
used because the objective of classification in educational data mining is to
identify the important factors that contribute to categorizing students' results.
Decision trees are the most popular classification technique in data mining. They
represent a group of classification rules in tree form, and they have several
advantages over other techniques, as stated in [1]:
The simplicity of their presentation makes them easy to understand.
They can work with different types of attributes, nominal or numerical.
They can classify new examples fast.
One of the most widely used decision tree algorithms is C4.5, developed by Ross Quinlan
[2]. The basic idea of this algorithm is to build trees from a set of training data using the
concept of information entropy [3]. J48 is an open source Java implementation of the
C4.5 algorithm in WEKA. We chose this algorithm because it has been shown to
handle educational datasets well and to give highly accurate results, as reported in
[4]-[6].
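For reference, the entropy-based quantities mentioned above can be written out as follows. These are the standard textbook definitions, given here as an aid and not quoted from the paper; the symbols $p_c$ (the fraction of records in $S$ with class $c$) and SplitInfo are introduced for illustration, while $S$, $A$ and $S_j$ follow the ID3 pseudocode.

```latex
\begin{align}
H(S) &= -\sum_{c} p_c \log_2 p_c \\
\mathrm{Gain}(A, S) &= H(S) - \sum_{j=1}^{m} \frac{|S_j|}{|S|}\, H(S_j) \\
\mathrm{SplitInfo}(A, S) &= -\sum_{j=1}^{m} \frac{|S_j|}{|S|} \log_2 \frac{|S_j|}{|S|} \\
\mathrm{GainRatio}(A, S) &= \frac{\mathrm{Gain}(A, S)}{\mathrm{SplitInfo}(A, S)}
\end{align}
```

ID3 selects the attribute with the largest Gain(A,S), whereas C4.5 normalizes the gain by the split information, which reduces the bias toward attributes with many distinct values.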
F. Results and Discussion
Classification using the J48 tree algorithm was applied to the examination
results of one of the colleges, referred to here as the first dataset, which is shown as an
MS Excel sheet in Fig. 1 and as an ARFF file in Fig. 2. The results of the analysis are shown in
Fig. 3. Here, the examination results of the 76 students of this college were
initially used as the training data.
Fig. 1: First dataset corresponding to a college
The result of classification shows that, out of a total of 76 data instances (each instance being
the marks obtained in all subjects by a particular student, along with the grades obtained and
the final result), the J48 classification algorithm predicted the result correctly for 75 instances.
That is, there is only one incorrectly classified data instance. The tree produced by the
WEKA Classifier Tree Visualizer is shown in Fig. 4. A total of 12 students who got grade B
in the course Hydropower Engineering passed their examination, and a total of 63 students who got
grade F in the same subject failed the examination. The one remaining student also got grade B
but did not pass; this is the single misclassified instance, shown as (13.0/1.0) in the tree below.
Thus, for this college, where the results of 76 students were analyzed, the course that controlled
the results was Hydropower Engineering. The decision rules that controlled the attribute RESULT
are given below.
=== Classifier model (full training set) ===
J48 pruned tree
------------------
HYDROPOWER-ENGINEERING-GRADE = A: Failed (0.0)
HYDROPOWER-ENGINEERING-GRADE = B: Passed (13.0/1.0)
HYDROPOWER-ENGINEERING-GRADE = C: Failed (0.0)
HYDROPOWER-ENGINEERING-GRADE = D: Failed (0.0)
HYDROPOWER-ENGINEERING-GRADE = F: Failed (63.0)
HYDROPOWER-ENGINEERING-GRADE = I: Failed (0.0)
Fig. 4: J48 Tree visualized for the first dataset
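The leaf counts in the J48 listing above can be checked with a few lines of Python. In WEKA's J48 output, a leaf annotated (n/e) means that n training instances reach the leaf and e of them are misclassified. The sketch below is only an illustrative check: the regular expression and the helper `leaf_counts` are assumptions made here and are not part of WEKA.

```python
import re

# Leaf lines copied from the J48 pruned tree for the first dataset.
j48_output = """\
HYDROPOWER-ENGINEERING-GRADE = A: Failed (0.0)
HYDROPOWER-ENGINEERING-GRADE = B: Passed (13.0/1.0)
HYDROPOWER-ENGINEERING-GRADE = C: Failed (0.0)
HYDROPOWER-ENGINEERING-GRADE = D: Failed (0.0)
HYDROPOWER-ENGINEERING-GRADE = F: Failed (63.0)
HYDROPOWER-ENGINEERING-GRADE = I: Failed (0.0)
"""

# Matches ": <class> (<instances>)" or ": <class> (<instances>/<errors>)".
LEAF = re.compile(r": (\w+) \((\d+(?:\.\d+)?)(?:/(\d+(?:\.\d+)?))?\)\s*$")


def leaf_counts(text):
    """Return (predicted class, instances at leaf, misclassified) per leaf."""
    leaves = []
    for line in text.splitlines():
        m = LEAF.search(line)
        if m:
            leaves.append((m.group(1), float(m.group(2)), float(m.group(3) or 0)))
    return leaves


leaves = leaf_counts(j48_output)
total = sum(n for _, n, _ in leaves)
errors = sum(e for _, _, e in leaves)
accuracy = (total - errors) / total
print(f"{total:.0f} instances, {errors:.0f} misclassified, accuracy {accuracy:.2%}")
```

The counts add up to the 76 instances and the single misclassification reported in the text, which corresponds to an accuracy of 75/76 on the training data.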
When the model was applied to sample data identical to the training data, except that the
result values were left blank and represented in the ARFF file as ?, the same algorithm
gave exactly the same examination results.
The predicted result is shown in Fig. 5.
Fig. 5: Predicted result of first dataset in ARFF
Classification using the same J48 algorithm on the datasets of five other colleges also
showed that the course Hydropower Engineering controlled the results of students.
However, for one of the colleges, classification using the same J48 tree algorithm gave the
results shown in Fig. 7.
Fig. 7: Analysis output of the dataset corresponding to the seventh college.
In the classification for the final examination of the Eighth semester of the seventh college,
the results of all 75 students were correctly classified; that is, the accuracy of classification
was found to be 100%. The visualized tree for the seventh college is shown in Fig. 8.
Fig. 8: J48 Tree visualized for the dataset corresponding to the seventh college
The rules that controlled classification for the seventh college are:
=== Classifier model (full training set) ===
J48 pruned tree
------------------
HYDROPOWER-ENGINEERING-FINAL-TH <= 27: Failed (16.0)
HYDROPOWER-ENGINEERING-FINAL-TH > 27
| CONSTRUCTION-MANAGEMENT-TOTAL <= 42: Failed (3.0)
| CONSTRUCTION-MANAGEMENT-TOTAL > 42: Passed (56.0)
This decision tree shows that students who scored less than or equal to 27 marks in the final
(theoretical) examination of the course Hydropower Engineering failed, and there were 16 of
them. Students who scored more than 27 marks in the final (theoretical) examination of the
course Hydropower Engineering but scored less than or equal to 42 marks in total for the course
Construction Management also failed, and there were three of them in this class.
Students who scored more than 27 marks in the final (theoretical) examination of the course
Hydropower Engineering and more than 42 marks in total for the course Construction
Management passed the Eighth semester examination, and there were 56
of them in this class.
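The rules read from this tree can be written as a small Python function. This is a direct transcription of the two thresholds (27 and 42) for illustration; the function name, the parameter names and the example marks are hypothetical.

```python
def predict_result(hydropower_final_th, construction_management_total):
    """Apply the J48 rules reported for the seventh college."""
    if hydropower_final_th <= 27:
        return "Failed"
    if construction_management_total <= 42:
        return "Failed"
    return "Passed"


# Hypothetical marks, invented to exercise each branch of the tree.
print(predict_result(25, 60))  # Failed: Hydropower final theory marks <= 27
print(predict_result(30, 40))  # Failed: Construction Management total <= 42
print(predict_result(30, 50))  # Passed: both thresholds exceeded
```

The order of the checks mirrors the tree: the Construction Management total is only consulted for students who scored more than 27 in the Hydropower Engineering final theory examination.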
4. Conclusions
In this paper, a mini research study was conducted using a data mining technique in order
to investigate what causes the majority of students to fail their Eighth semester
examination. The analysis showed that it was mainly one course, Hydropower
Engineering, that controlled the results of students. This shows the potential of data
mining in higher education, where it was used here to identify the courses that cause
students to fail. Once this is known, the college management can take appropriate
measures to guide students and help them improve in the course
Hydropower Engineering.
5. References
[1] W. Hämäläinen and M. Vinni, "Classifiers for educational data mining," Handbook
of Educational Data Mining, 2010.
[2] J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan
[3] D. Kabakchieva, "Predicting student performance by using data mining methods for
classification," Cybernetics and Information Technologies, vol. 13, 2013.
[4] Q. A. Al-Radaideh, A. A. Ananbeh, and E. M. Al-Shawakfa, "A classification model
for predicting the suitable study track for school students," International Journal of
Research and Reviews in Applied Sciences, vol. 8, 2001.
[5] A. Nandeshwar and S. Chaudhari. (2009). Enrollment Prediction
Models Using Data Mining. [Online]. Available:
http://nandeshwar.info/wp-content/uploads/2008/11/DMWVU_Project.pdf
[6] D. García-Saiz and M. Zorrilla, "Comparing classification methods for predicting
distance students' performance," The Journal of Machine Learning Research, 2011.