Prepared by Mohammed Kharma
Impact of University Students' Social Status on Their Selection of Academic Specialization and Their Performance
Top 11 Factors Extracted
 College
 Specialization
 Gender
 Grade of secondary school
 Parent family availability: whether the student has a one-parent or a two-parent family.
 Whether the student has a job or not.
 Financial aid: whether the student received a loan for the semester.
 Educational level of parents.
 Geographical location.
 Positive social life
 Academic overload
Experiments
 We applied a number of algorithms to data obtained from AlQadi University. Because some columns, such as the secondary-school grade, had missing values, part of the data was generated randomly, so the results of applying these algorithms may not be fully trustworthy. The outcome was encoded as 1 for a good predicted student result and 0 for a bad predicted student result.
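A minimal sketch of how the missing secondary-school grades could be filled in randomly, as described above (the column name and the grade range are our assumptions, not the project's actual code):

```python
# Hypothetical illustration of random fill-in of missing grades.
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)
df = pd.DataFrame({"grade_secondary": [72.0, None, 87.7, None, 94.2]})

# Replace missing grades with random values in an assumed 65-99 range.
missing = df["grade_secondary"].isna()
df.loc[missing, "grade_secondary"] = rng.uniform(65, 99, missing.sum()).round(1)
print(df)
```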
Data Processing
The values used for each attribute are classified as below.
Each of the following tables describes the expected labels for the identified attribute; for example, the first attribute, "College", may take one of the two values listed in the "Attribute 1: College" table below.
Attribute 1: College
12 Pharmacy
3 Science and Technology
Gender
1 Male
2 Female
Educational level of parents
2 High
1 Low
*High - above secondary level
*Low - secondary level or less
Data Processing (cont.)
Has job
1 Yes
0 No
Specialization
0305 Math
1201 Pharmacy
0302 Physics
City
1 Ramallah
2 Hebron
3 Jeneen
Parent family availability
2 Both
1 One of them
0 None
Data Processing (cont.)
Positive social life
0 Positive
1 Negative
*Positive - good and stable social life
*Negative - poor or unstable social life
Academic overload
0 High
1 Low
*High - 15 hours or more
*Low - fewer than 15 hours
Financial aid
1 Has aid
0 No aid
*Has aid - student received financial aid
*No aid - student did not receive financial aid
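For later processing, these encodings could be kept as simple lookup tables; a minimal sketch in Python (the dictionary names are our own, not from the original project):

```python
# Lookup tables mirroring the attribute encodings defined above.
COLLEGE = {12: "Pharmacy", 3: "Science and Technology"}
GENDER = {1: "Male", 2: "Female"}
PARENT_EDUCATION = {2: "High (above secondary)", 1: "Low (secondary or less)"}
HAS_JOB = {1: "Yes", 0: "No"}
SPECIALIZATION = {"0305": "Math", "1201": "Pharmacy", "0302": "Physics"}
CITY = {1: "Ramallah", 2: "Hebron", 3: "Jeneen"}
PARENT_AVAILABILITY = {2: "Both", 1: "One of them", 0: "None"}
SOCIAL_LIFE = {0: "Positive", 1: "Negative"}
ACADEMIC_OVERLOAD = {0: "High (15 hours or more)", 1: "Low (fewer than 15)"}
FINANCIAL_AID = {1: "Has aid", 0: "No aid"}

# Example: decode a raw record into readable values.
record = {"college": 12, "gender": 2, "city": 3}
print(COLLEGE[record["college"]], GENDER[record["gender"]], CITY[record["city"]])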
First Experiment by Random Tree
1. Choose m input variables to be used to determine the decision at a node of the tree.
2. Take a bootstrap sample (training set).
3. For each node of the tree, choose m variables on which to base the decision at that node. Calculate the best split based on these m variables in the training set. The value of m remains constant while the forest grows.
4. Each tree is grown to the largest extent possible and is not pruned, as it would be when constructing a normal tree classifier. (A sketch of this procedure follows below.)
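A minimal sketch of this growing scheme, using scikit-learn as a stand-in for the Weka RandomTree run actually used; the toy records and column names are hypothetical, following the encodings from the Data Processing tables:

```python
# Hypothetical random-forest sketch; not the project's original code.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

data = pd.DataFrame({
    "college":         [12, 3, 12, 3, 12, 3],
    "gender":          [1, 2, 2, 1, 2, 1],
    "grade_secondary": [72.0, 94.2, 87.7, 76.8, 90.0, 86.9],
    "city":            [1, 2, 3, 1, 2, 3],
    "has_job":         [0, 1, 0, 0, 1, 0],
    "result":          [0, 1, 1, 0, 0, 1],  # 1 = good result, 0 = bad
})

X = data.drop(columns="result")
y = data["result"]

# max_features plays the role of m, the variables tried at each node
# (steps 1 and 3); bootstrap=True takes the bootstrap sample (step 2);
# trees are grown unpruned by default (step 4).
clf = RandomForestClassifier(n_estimators=10, max_features=2,
                             bootstrap=True, random_state=0)
clf.fit(X, y)
print(clf.predict(X))
```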
Result
 "Grade of secondary" = Grade of
 secondary : Result (1/0)
 "Grade of secondary" = 72
 | City = City : Result (0/0)
 | City = 3 : 1 (2/0)
 | City = 2
 | | ID = Id : Result (0/0)
 | | ID = 20920135 : Result (0/0)
 | | ID = 20920171 : Result (0/0)
 :
 | | | | ID = 21011651 : Result (0/0)
 | | ID = 21011733 : Result (0/0)
 | City = 1 : 0 (1/0)
 "Grade of secondary" = 76.8 : 0 (1/0)
 "Grade of secondary" = 87.7 : 1 (1/0)
 "Grade of secondary" = 94.2
 | Colleges = Colleges : Result (0/0)
 | Colleges = 3 : 0 (1/0)
 | Colleges = 12 : 1 (1/0)
 "Grade of secondary" = 70.8 : 1 (1/0)
 ""Grade of secondary" = 86.9 : 1 (1/0)
 "Grade of secondary" = 90 : 0 (2/0)
 "Grade of secondary" = 94.1 : 1 (1/0)
 "Grade of secondary" = 95.9 : 0 (1/0)
 "Grade of secondary" = 92.9 : 1 (2/0)
 "Grade of secondary" = 97.1 : 1 (1/0)
 "Grade of secondary" = 91 : 1 (1/0)
 "Grade of secondary" = 93.4 : 0 (1/0)
 :
 "Grade of secondary" = 72.9 : 1 (1/0)
 "Grade of secondary" = 78 : 0 (1/0)
 "Grade of secondary" = 93.1 : 1 (1/0)
 "Grade of secondary" = 71.2 : 1 (1/0)
 "Grade of secondary" = 88.1 : 0 (1/0)
 "Grade of secondary" = 74.5 : 0 (1/0)
 "Grade of secondary" = 87 : 1 (1/0)
 "Grade of secondary" = 73.8 : 0 (1/0)
 "Grade of secondary" = 86.3 : 1 (1/0)
 "Grade of secondary" = 96.5 : 0 (1/0)

 Size of the tree : 150
Second Experiment by W-LADTree
 LADTree is a multi-class alternating decision tree technique that combines decision trees with the predictive accuracy of LogitBoost into a set of interpretable classification rules. The original formulation of the tree-induction algorithm restricted attention to binary classification problems.
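A minimal sketch of how such a run could be reproduced, assuming python-weka-wrapper3 and a hypothetical students.arff export of the data (in recent Weka versions LADTree is installed from the alternatingDecisionTrees package):

```python
# Hypothetical LADTree run via python-weka-wrapper3.
import weka.core.jvm as jvm
from weka.core.converters import Loader
from weka.classifiers import Classifier, Evaluation
from weka.core.classes import Random

jvm.start(packages=True)

data = Loader(classname="weka.core.converters.ArffLoader").load_file("students.arff")
data.class_is_last()  # the Result attribute is the class

# -B sets the number of LogitBoost boosting iterations.
ladtree = Classifier(classname="weka.classifiers.trees.LADTree",
                     options=["-B", "10"])

evaluation = Evaluation(data)
evaluation.crossvalidate_model(ladtree, data, 10, Random(1))
print(evaluation.summary())

jvm.stop()
```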
Result : 0,0,0
 | (1)Gender = 1: -1,1.211,-0.211
 | (1)Gender != 1: -0.946,0.339,0.607
 | | (3)Parent family availability = 2: -0.521,0.288,0.234
 | | | (8)City = 1: -0.467,0.96,-0.493
 | | | | (10)Id = 21010890: -0.448,-2.442,2.891
 | | | | (10)Id != 21010890: -0.455,0.542,-0.087
 | | | (8)City != 1: -0.472,-0.143,0.615
 | | (3)Parent family availability != 2: 1.277,-2.193,0.916
 | | (5)Academic overload = 1: -0.509,-0.269,0.778
 | | (5)Academic overload != 1: -0.027,0.362,-0.335
 | (2)Colleges = 3: -0.594,-0.02,0.614
 | (2)Colleges != 3: -0.375,0.721,-0.346
 | | (4)Positive social life = 1: -0.525,-1.934,2.459
 | | (4)Positive social life != 1: -0.103,0.238,-0.136
 | | | (6)Id = 21010654: -0.452,-2.441,2.893
 | | | (6)Id != 21010654: 0.087,0.139,-0.227
 | | | | (7)Id = 21010742: -0.452,-2.441,2.893
 | | | | (7)Id != 21010742: 0.022,0.131,-0.153
 | | | | | (9)Id = 21010850: -0.446,-2.443,2.89
 | | | | | (9)Id != 21010850: 0.11,0.283,-0.394
 Legend: Result, 1, 0
 #Tree size (total): 31
 #Tree size (number of predictor nodes): 21
 #Leaves (number of predictor nodes): 13
 #Expanded nodes: 100
 #Processed examples: 3146
 #Ratio e/n: 31.46
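To read this output: each line gives one score per class, in the legend's order, and an alternating decision tree predicts by summing the scores of every predictor node whose condition an instance satisfies, then picking the class with the largest total. A small illustration using the numbers above (only the root and the first splitter are shown for brevity):

```python
# LADTree scoring: sum per-class scores of all matching predictor nodes.
root = [0.0, 0.0, 0.0]               # Result : 0,0,0
gender_eq_1 = [-1.0, 1.211, -0.211]  # (1)Gender = 1

total = [r + g for r, g in zip(root, gender_eq_1)]
classes = ["Result", "1", "0"]       # order from the legend
print(classes[max(range(len(total)), key=total.__getitem__)])  # -> "1"
```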
Tree View Result of LADTree
Third Experiment by W-J48
 J48 is a standard machine-learning algorithm based on decision tree induction. It employs two pruning methods. In the first, "sub-tree replacement", nodes in a decision tree may be replaced with a leaf, reducing the number of tests along a specific path; the algorithm starts from the leaves of the fully formed tree and works backwards towards the root.
 In the second method, "sub-tree raising", a node may be moved upwards towards the root of the tree, replacing other nodes along the path. Sub-tree raising usually has a negligible effect on decision tree models. (A sketch of a J48 run follows below.)
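A minimal sketch of a comparable J48 run, under the same python-weka-wrapper3 and hypothetical students.arff assumptions as before:

```python
# Hypothetical J48 run via python-weka-wrapper3.
import weka.core.jvm as jvm
from weka.core.converters import Loader
from weka.classifiers import Classifier

jvm.start()

data = Loader(classname="weka.core.converters.ArffLoader").load_file("students.arff")
data.class_is_last()

# -C is the pruning confidence factor; adding "-S" would disable
# sub-tree raising, which is on by default.
j48 = Classifier(classname="weka.classifiers.trees.J48", options=["-C", "0.25"])
j48.build_classifier(data)
print(j48)  # textual tree view of the pruned model

jvm.stop()
```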
Result
Tree View Result
Thank you
