This is Activity 1 of the Data Science Nigeria bootcamp, in which we need to predict the “Score” of each student (identified by “S/N”) in a specific semester.
On Kaggle, this competition consists of two datasets, “Training” and “Testing”, and is considered a medium-sized dataset.
The Training dataset has shape (325, 23) and the Testing dataset has shape (323, 21), where each pair gives the number of rows and attributes, respectively.
4. Dataset and Attributes
Numerical attributes:
Attribute | Meaning | Value
S/N | Student Number |
Age | Student’s Age | 10 – 17
Traveltime | Home to school travel time | 1: <15 min; 2: 15 to 30 min; 3: 30 min to 1 hour; 4: >1 hour
Studytime | Weekly study time | 1: <2 hours; 2: 2-5 hours; 3: 5-10 hours; 4: >10 hours
failures | Number of past class failures | n if 1<=n<3, else 4
Categorical attributes:
Attribute | Meaning | Value
Gender | Student’s Gender | M: Male; F: Female
Location | Student’s Home Address Type | U: Urban; R: Rural
Famsize | Family Size | LE3: Less or equal to 3; GT3: Greater than 3
Pstatus | Parent’s Status | T: Living Together; A: Apart
Medu, Fedu | Mother Education, Father Education | 0: none; 1: Lower Primary; 2: Upper Primary to JSS3; 3: SSCE level; 4: Higher Education
Schoolsup | Extra Educational Support | Yes; No
5. Dataset and Attributes
Numerical attributes:
Attribute | Meaning | Value
Famrel | Quality of Family Relationships | 1 (very bad) to 5 (excellent)
Freetime | Free time after School | 1 (very low) to 5 (very high)
Health | Current Health | 1 (very bad) to 5 (very good)
Absences | Number of School Absences | 0 to 93
Scores | Score in a subject | 0-60
Categorical attributes:
Attribute | Meaning | Value
Famsup | Family Educational Support | Yes; No
Paid | Extra Paid Classes within the Course Subject | Yes; No
Activities | Extra Curricular Activities | Yes; No
Nursery | Attended Nursery School | Yes; No
Higher | Wants to take higher Education | Yes; No
Internet | Internet Access at Home | Yes; No
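The categorical attributes listed in the tables above are text codes, so a regression model needs them turned into numbers first. The following is a minimal, dependency-free sketch of one way to do that. The integer codes (0/1), the function name `encode_student`, and the sample record are assumptions made for illustration; the slides do not say which encoding was actually used.

```python
# Code tables copied from the attribute tables; the 0/1 targets are an assumption.
BINARY_CODES = {
    "Gender": {"M": 0, "F": 1},
    "Location": {"U": 0, "R": 1},
    "Famsize": {"LE3": 0, "GT3": 1},
    "Pstatus": {"T": 0, "A": 1},
}
YES_NO_COLUMNS = {"Schoolsup", "Famsup", "Paid", "Activities",
                  "Nursery", "Higher", "Internet"}


def encode_student(row):
    """Return a copy of `row` with categorical text values replaced by integers."""
    encoded = dict(row)
    for column, value in row.items():
        if value is None:
            continue  # leave missing values for a later imputation step
        if column in BINARY_CODES:
            encoded[column] = BINARY_CODES[column][value]
        elif column in YES_NO_COLUMNS:
            encoded[column] = 1 if value.strip().lower() == "yes" else 0
    return encoded


# A made-up student record, for illustration only.
student = {"Age": 15, "Gender": "F", "Location": "U", "Famsize": "GT3",
           "Pstatus": "T", "Internet": "Yes", "Paid": "No"}
encoded_student = encode_student(student)
print(encoded_student)
```

Numerical attributes such as Age pass through unchanged, and the original record is not modified.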
6. Steps Followed for Prediction
Level 1: Concatenate Datasets
• Training Set + Testing Set → Full Dataset
Level 2: Data Preparation
• Removing Outliers
• Removing Irrelevant Attributes
• Filling Missing Values
• Handling Categorical and Numerical Values
Level 3: Creating Regression Models
• Feature Engineering
• Feature Importance
• Regression Models and Hyperparameter Tuning
Level 4: Ensemble Method, Final Results
• Voting
• Bagging
• Stacking
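For the Voting step of Level 4, one common form for regression is to average the predictions of several models. The sketch below illustrates only that idea. The three "models" are hypothetical stand-in functions with made-up coefficients, not the tuned regressors from the slides, and equal weighting is an assumption.

```python
from statistics import mean


# Hypothetical stand-ins for trained regressors (coefficients are made up).
def model_a(student):
    return 30.0 + 5.0 * student["Studytime"]


def model_b(student):
    return 45.0 - 6.0 * student["failures"]


def model_c(student):
    return 36.0 + 4.5 * student["Studytime"] - 4.0 * student["failures"]


def voting_predict(models, student):
    """Equal-weight voting: average the score predicted by every model."""
    return mean(model(student) for model in models)


# A made-up student, for illustration only.
student = {"Studytime": 2, "failures": 1}
models = [model_a, model_b, model_c]
ensemble_score = voting_predict(models, student)
print(ensemble_score)
```

Averaging tends to smooth out the errors of individual models. Bagging and Stacking combine models in different ways and are not shown here.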
7. Steps Followed for Prediction
• Training Set: (325, 23)
• Testing Set: (323, 21)
• Full Dataset: Training Set and Testing Set combined
• Full Dataset columns: Categorical Values, Numerical Values, Missing Values Columns
• Dropped Gender Column
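The concatenation step can be sketched as follows: stack the training and testing rows into one full dataset, remember where each row came from so the sets can be separated again, drop the Gender column, and list the columns that contain missing values. This is a dependency-free illustration under assumptions: the helper names, the `source` tag, and the three sample rows are hypothetical, and with pandas the same stacking would typically be done with `pandas.concat`.

```python
def concatenate(train_rows, test_rows):
    """Stack both sets into one list, tagging each row with its origin."""
    columns = []
    for row in train_rows + test_rows:
        for name in row:
            if name not in columns:
                columns.append(name)
    full = []
    for source, rows in (("train", train_rows), ("test", test_rows)):
        for row in rows:
            # Columns absent from a row (e.g. Scores in the testing set) become None.
            padded = {name: row.get(name) for name in columns}
            padded["source"] = source
            full.append(padded)
    return full


def drop_column(rows, name):
    """Return new rows without the given column."""
    return [{k: v for k, v in row.items() if k != name} for row in rows]


def columns_with_missing(rows):
    """Names of columns that hold at least one None value."""
    return sorted({k for row in rows for k, v in row.items() if v is None})


# Made-up rows, for illustration only.
train_rows = [
    {"S/N": 1, "Gender": "M", "Age": 15, "Scores": 42},
    {"S/N": 2, "Gender": "F", "Age": None, "Scores": 55},
]
test_rows = [{"S/N": 3, "Gender": "F", "Age": 14}]

full_dataset = drop_column(concatenate(train_rows, test_rows), "Gender")
missing = columns_with_missing(full_dataset)
print(len(full_dataset), missing)
```

Preparing both sets together keeps the cleaning and encoding consistent between them. Note that Scores shows up as missing only because the testing rows do not carry the target.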
8. Steps Followed for Prediction
Plotting and Removing Outliers:
• Count plot
• Scatter plot
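The slides identify outliers visually from plots. As a numeric complement, the sketch below uses the interquartile range (IQR) rule, which is a swapped-in technique and not necessarily what was done in the original work. The function name `remove_outliers`, the factor `k=1.5`, and the sample absences are assumptions for illustration.

```python
from statistics import quantiles


def remove_outliers(rows, column, k=1.5):
    """Keep rows whose value lies within [Q1 - k*IQR, Q3 + k*IQR]."""
    values = [row[column] for row in rows]
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [row for row in rows if low <= row[column] <= high]


# Made-up absence counts; 93 is the maximum listed in the attribute table.
students = [{"S/N": i + 1, "Absences": a}
            for i, a in enumerate([0, 2, 4, 4, 6, 8, 10, 93])]
cleaned = remove_outliers(students, "Absences")
print([row["Absences"] for row in cleaned])
```

A scatter plot of Absences against Scores would show the same extreme point standing apart from the rest, which is the visual check the slide refers to.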