Data Science Tools and Application
Final Project
Heba Zaina – Fatma Al-Naimi
Overview and Goal of Kaggle Competition
Dataset and Attributes
Training Set: (325, 23)
Testing Set: (323, 21)
Student’s Score: (323,)
Dataset and Attributes
Numerical attributes (Attribute: Meaning; Value)
• S/N: Student number
• Age: Student’s age; 10 – 17
• Traveltime: Home to school travel time; 1: <15 min, 2: 15 to 30 min, 3: 30 min to 1 hour, 4: >1 hour
• Studytime: Weekly study time; 1: <2 hours, 2: 2-5 hours, 3: 5-10 hours, 4: >10 hours
• failures: Number of past class failures; n if 1 <= n < 3, else 4
Categorical attributes (Attribute: Meaning; Value)
• Gender: Student’s gender; M: Male, F: Female
• Location: Student’s home address type; U: Urban, R: Rural
• Famsize: Family size; LE3: less than or equal to 3, GT3: greater than 3
• Pstatus: Parents’ status; T: living together, A: apart
• Medu, Fedu: Mother’s education, father’s education; 0: none, 1: Lower Primary, 2: Upper Primary to JSS3, 3: SSCE level, 4: Higher Education
• Schoolsup: Extra educational support; Yes, No
Dataset and Attributes (continued)
Numerical attributes (Attribute: Meaning; Value)
• Famrel: Quality of family relationships; 1: very bad to 5: excellent
• Freetime: Free time after school; 1: very low to 5: very high
• Health: Current health; 1: very bad to 5: very good
• Absences: Number of school absences; 0 to 93
• Scores: Score in a subject; 0-60
Categorical attributes (Attribute: Meaning; Value)
• Famsup: Family educational support; Yes, No
• Paid: Extra paid classes within the course subject; Yes, No
• Activities: Extracurricular activities; Yes, No
• Nursery: Attended nursery school; Yes, No
• Higher: Wants to take higher education; Yes, No
• Internet: Internet access at home; Yes, No
Steps Followed for Prediction
Level 1: Concatenate Datasets
• Training Set + Testing Set = Full Dataset
Level 2: Data Preparation
• Removing Outliers
• Removing Irrelevant Attributes
• Filling Missing Values
• Handling Categorical and Numerical Values
Level 3: Creating Regression Models
• Feature Engineering
• Feature Importance
• Regression Models and Hyperparameter Tuning
Level 4: Ensemble Method, Final Results
• Voting
• Bagging
• Stacking
Steps Followed for Prediction
Full Dataset = Training Set (325, 23) + Testing Set (323, 21)
• Categorical Values
• Numerical Values
• Missing Values Columns
• Dropped Gender Column
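The preparation steps on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the tiny frames and their values are invented, only the column names come from the attribute tables, and the fill strategy (median for numbers, most frequent value for categories) is an assumption.

```python
import pandas as pd

# Invented sample rows; column names follow the attribute tables.
train = pd.DataFrame({
    "Gender": ["M", "F", "F"],
    "Location": ["U", "R", "U"],
    "Internet": ["Yes", "No", None],
    "Age": [15, None, 16],
    "Scores": [41, 35, 52],
})
test = pd.DataFrame({
    "Gender": ["F", "M"],
    "Location": ["R", "U"],
    "Internet": ["Yes", "Yes"],
    "Age": [14, 17],
})

n_train = len(train)
target = train.pop("Scores")  # keep the target out of the feature frame

# Level 1: concatenate so both sets get identical preprocessing.
full = pd.concat([train, test], ignore_index=True)

# Drop the Gender column, as the slide states.
full = full.drop(columns=["Gender"])

# Fill missing values (assumed strategy: median / most frequent).
full["Age"] = full["Age"].fillna(full["Age"].median())
full["Internet"] = full["Internet"].fillna(full["Internet"].mode()[0])

# Turn categorical values into numerical ones.
full["Internet"] = full["Internet"].map({"Yes": 1, "No": 0})
full = pd.get_dummies(full, columns=["Location"])

# Split back into the original training and testing rows.
X_train = full.iloc[:n_train]
X_test = full.iloc[n_train:]
```

Concatenating first guarantees that the encoded training and testing frames end up with the same columns.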
Steps Followed for Prediction
Plotting and Removing Outliers:
• Counter plot
• Scatter plot
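The slide does not say which rule was used to drop outliers after plotting. As one possible sketch, the common 1.5 × IQR rule is applied below to an invented Absences column; both the rule and the sample values are assumptions.

```python
import pandas as pd

# Invented sample; 93 is the maximum listed for Absences in the attribute table.
df = pd.DataFrame({
    "Absences": [0, 2, 4, 6, 8, 10, 93],
    "Scores": [40, 38, 45, 30, 36, 33, 12],
})

# Interquartile range of the column under inspection.
q1, q3 = df["Absences"].quantile([0.25, 0.75])
iqr = q3 - q1

# Keep rows whose value lies within 1.5 * IQR of the quartiles.
mask = df["Absences"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
cleaned = df[mask]
```

Outliers should only be removed from the training rows, since every testing row needs a prediction.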
Steps Followed for Prediction
Feature Engineering:
• Next Step
Steps Followed for Prediction
Feature Importance:
• Linear Models (model.coef_)
• Decision Tree Ensembles (model.feature_importances_)
• One-Liner (SelectFromModel)
• Pearson Correlation Coefficient
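The four approaches listed above could look like this in scikit-learn. The data is synthetic and the model settings (alpha, number of trees, the "median" threshold) are illustrative assumptions, not values from the project.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

# Synthetic stand-in for the prepared training data.
X, y = make_regression(n_samples=200, n_features=6, n_informative=2,
                       noise=5.0, random_state=0)

# 1. Linear models: absolute coefficient size (features on a comparable scale).
lasso = Lasso(alpha=1.0).fit(X, y)
linear_scores = np.abs(lasso.coef_)

# 2. Decision tree ensembles: impurity-based importances.
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
tree_scores = forest.feature_importances_

# 3. SelectFromModel: keep features whose importance reaches the median.
selector = SelectFromModel(
    RandomForestRegressor(n_estimators=100, random_state=0),
    threshold="median",
).fit(X, y)
X_selected = selector.transform(X)

# 4. Pearson correlation of each feature with the target.
pearson = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
```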
Steps Followed for Prediction
Prediction Models (model: score)
• Lasso: 10.95029
• ElasticNet: 9.85833
• Ridge Regressor: 9.99448
• KNN Regressor: 10.77772
• SVR: 10.85440
• Random Forest: 9.80351
• AdaBoost: 9.63525
• Gradient Boosting: 9.94054
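A comparison like the one above can be produced with cross-validation. The sketch below runs on synthetic data, so its numbers will not match the slide; the slide does not name the metric, so RMSE is an assumption, as are all hyperparameters and the small Ridge grid used to illustrate tuning.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (AdaBoostRegressor, GradientBoostingRegressor,
                              RandomForestRegressor)
from sklearn.linear_model import ElasticNet, Lasso, Ridge
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR

# Synthetic stand-in for the prepared training data.
X, y = make_regression(n_samples=150, n_features=8, noise=10.0, random_state=0)

models = {
    "Lasso": Lasso(),
    "ElasticNet": ElasticNet(),
    "Ridge": Ridge(),
    "KNN": KNeighborsRegressor(),
    "SVR": SVR(),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "AdaBoost": AdaBoostRegressor(random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
}

# 5-fold cross-validated RMSE per model (the scorer returns negative RMSE).
rmse = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5,
                             scoring="neg_root_mean_squared_error")
    rmse[name] = -scores.mean()

# Hyperparameter tuning for one model via grid search (illustrative grid).
search = GridSearchCV(Ridge(), {"alpha": [0.1, 1.0, 10.0]}, cv=5,
                      scoring="neg_root_mean_squared_error").fit(X, y)
best_alpha = search.best_params_["alpha"]
```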
Steps Followed for Prediction
Ensemble Methods
• Voting/Averaging
• Bagging (to be done next)
• Stacking
Steps Followed for Prediction
Ensemble Methods (Voting/Averaging)
• ElasticNet: 9.85833
• Random Forest: 9.80351
• AdaBoost: 9.63525
• Voting/Averaging result: 9.62969
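Averaging the three models above can be sketched with scikit-learn's VotingRegressor, which predicts the unweighted mean of its fitted members by default. The data is synthetic, the hyperparameters are assumptions, and RMSE is again an assumed metric.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import (AdaBoostRegressor, RandomForestRegressor,
                              VotingRegressor)
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the prepared data.
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

voter = VotingRegressor(estimators=[
    ("elasticnet", ElasticNet(alpha=0.1)),
    ("forest", RandomForestRegressor(n_estimators=100, random_state=0)),
    ("adaboost", AdaBoostRegressor(n_estimators=50, random_state=0)),
])
voter.fit(X_train, y_train)
voted = voter.predict(X_test)

# The same result by hand: the mean of each fitted member's predictions.
manual = np.mean([est.predict(X_test) for est in voter.estimators_], axis=0)

rmse = float(np.sqrt(np.mean((y_test - voted) ** 2)))
```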
Steps Followed for Prediction
Ensemble Methods
(Stacking)
ElasticNet Random Forest
Gradient
Boosting
9.95711
SVR AdaBoost
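A stacking setup over these five models might look like the sketch below. The slide does not say which model combined the base predictions, so the Ridge final estimator is a hypothetical choice, as are the synthetic data and the hyperparameters.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import (AdaBoostRegressor, GradientBoostingRegressor,
                              RandomForestRegressor, StackingRegressor)
from sklearn.linear_model import ElasticNet, Ridge
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Synthetic stand-in for the prepared data.
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

base_models = [
    ("elasticnet", ElasticNet(alpha=0.1)),
    ("forest", RandomForestRegressor(n_estimators=100, random_state=0)),
    ("gboost", GradientBoostingRegressor(random_state=0)),
    ("svr", SVR()),
    ("adaboost", AdaBoostRegressor(n_estimators=50, random_state=0)),
]

# The final estimator is trained on out-of-fold predictions of the base models.
stack = StackingRegressor(estimators=base_models,
                          final_estimator=Ridge(),  # hypothetical meta-learner
                          cv=5)
stack.fit(X_train, y_train)
stacked = stack.predict(X_test)

rmse = float(np.sqrt(np.mean((y_test - stacked) ** 2)))
```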
Submission on Kaggle and Ranking in Kaggle Public Leaderboard
Next Step:
• Feature Engineering: adding new features, Box-Cox transform for skewed features
• Ensemble Methods: Bagging
• Submit the results to Kaggle and report the best score
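As a rough preview of two of these next steps, the sketch below applies a Box-Cox transform to a right-skewed, invented "absences-like" feature and fits a bagging ensemble on synthetic data. The +1 shift (Box-Cox needs strictly positive input), the decision-tree base model and all settings are assumptions.

```python
import numpy as np
from scipy import stats
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

# Invented right-skewed feature.
rng = np.random.default_rng(0)
absences = rng.exponential(scale=10.0, size=500)

skew_before = stats.skew(absences)
# Shift by 1 so zeros are allowed; lambda is estimated by maximum likelihood.
transformed, lam = stats.boxcox(absences + 1.0)
skew_after = stats.skew(transformed)

# Bagging: average many trees, each fit on a bootstrap sample of the rows.
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)
bagger = BaggingRegressor(DecisionTreeRegressor(), n_estimators=20,
                          random_state=0)
bagger.fit(X, y)
bagged = bagger.predict(X)
```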
Thank you

Wazobia Students Score Prediction.pptx
