PROJECT REPORT
STUDENT PERFORMANCE
(DATA MINING)
BS (SE) 2017
GROUP MEMBER(S):
NAME: HAFSA HABIB 2017/COMP/BS(SE)-21597
NAME: MUNIBA JAVIAD 2017/COMP/BS(SE)-21621
SUPERVISOR:
MISS SADIA JAVED
29TH APRIL, 2019
DEPARTMENT OF COMPUTER SCIENCE AND INFORMATION TECHNOLOGY
JINNAH UNIVERSITY FOR WOMEN
5-C NAZIMABAD, KARACHI 74600
Table of Contents
1. Introduction
2. Description of the problem and problem domain
3. Description of implemented data mining techniques/methods
3.1. Naïve Bayes Classifier
4. Data Set
4.1. Exploring the Data Set
4.1.1. General Distribution of Exam Scores
4.1.2. Exam scores based on the gender
4.1.3. Exam scores based on the Parent Level of Education
4.1.4. Exam scores based on the Lunch Type
4.1.5. Exam scores based on the Test Preparation Course
5. Implementation
5.1. Operators
6. Results and evaluation/discussion of the results
7. Future directions/ideas how to extend and enhance the technique
8. Conclusion
9. References
1. Introduction
Using the Students Performance in Exams data set, we try to understand what affects students' exam scores. The data is limited, but it supports visualizations that make the relationships easy to spot. We first explore the data and then apply the Naïve Bayes classification technique to predict each student's overall performance.
2. Description of the problem and problem domain
The problem is to understand how factors such as the parents' background and test preparation influence students' performance.
Objectives
• Check the data set and tidy the data if needed.
• Visualize the data to understand the effects of different factors on student performance.
• Check the effectiveness of the test preparation course.
• Identify the major factors influencing the test scores.
3. Description of implemented data mining techniques/methods
3.1. Naïve Bayes Classifier
Bayesian classifiers are statistical classifiers that predict class membership probabilities, such as the probability that a given sample belongs to a particular class. Naïve Bayes algorithms assume that the effect of an attribute on a given class is independent of the values of the other attributes. In practice, however, dependencies often exist among attributes; Bayesian networks address this with graphical models that can describe joint conditional probability distributions.
Bayesian classifiers are popular classification algorithms because of their simplicity, computational efficiency and good performance on many real-world problems. Another important advantage is that Bayesian models are fast to train and to evaluate, and they achieve high accuracy in many domains.
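To make the independence assumption concrete, here is a minimal sketch of a categorical Naïve Bayes classifier in plain Python. It is not the RapidMiner operator used in this project. It scores each class c as log P(c) plus the sum of log P(x_i | c) over the attributes and returns the highest-scoring class. The Laplace smoothing constant `alpha`, the function names and the toy rows are assumptions chosen for illustration. Numeric attributes such as the exam scores would need a different likelihood, for example a per-class normal distribution, or would have to be discretised first.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels, alpha=1.0):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    attr_values = defaultdict(set)      # distinct values per attribute, for smoothing
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            attr_values[i].add(value)
    return {"class_counts": class_counts, "value_counts": value_counts,
            "attr_values": attr_values, "alpha": alpha, "n": len(labels)}

def predict(model, row):
    """Return the class with the highest log posterior (up to a constant)."""
    alpha = model["alpha"]
    best_class, best_score = None, -math.inf
    for c, c_count in model["class_counts"].items():
        score = math.log(c_count / model["n"])  # log prior P(c)
        for i, value in enumerate(row):
            k = len(model["attr_values"][i])
            count = model["value_counts"][(i, c)][value]
            # Laplace-smoothed likelihood P(x_i = value | c)
            score += math.log((count + alpha) / (c_count + alpha * k))
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Toy rows: (test preparation course, lunch) -> made-up performance label
rows = [
    ("completed", "standard"),
    ("completed", "standard"),
    ("none", "free/reduced"),
    ("none", "free/reduced"),
    ("none", "standard"),
]
labels = ["Good", "Good", "Bad", "Bad", "Good"]
model = train_naive_bayes(rows, labels)
print(predict(model, ("completed", "standard")))  # Good
```

Working in log space avoids numerical underflow when many small probabilities are multiplied. The smoothing keeps an attribute value that was never seen with a class from forcing that class's probability to zero.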
4. Data Set
The data set contains the following attributes:
• Gender: gender of the student (i.e. Male, Female)
• Ethnicity: ethnic group to which the student belongs (i.e. group A, B, C, D, E)
• Parent level of Education: education level of the student's parents/guardian (i.e. high school, bachelor's degree, master's degree, some college, associate's degree)
• Lunch: type of lunch the student receives at school (i.e. standard, free/reduced)
• Test preparation course: whether the student completed the test preparation course (i.e. none, completed)
• Math score: mathematics score of the student (0 to 100)
• Reading score: reading score of the student (0 to 100)
• Writing score: writing score of the student (0 to 100)
• Student Performance: overall performance of the student (i.e. Good, Average, Bad, Worst)
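The report does not state how the four-class Student Performance label was derived from the three scores. Purely as a hypothetical illustration, one way to build such a label is to threshold the average score. The function name and the cut-offs of 80, 60 and 40 below are assumptions, not the authors' rule.

```python
def performance_label(math_score, reading_score, writing_score):
    """Map three exam scores (0-100) to one of four classes.

    Hypothetical cut-offs: the report does not say how its
    Good/Average/Bad/Worst label was defined.
    """
    average = (math_score + reading_score + writing_score) / 3
    if average >= 80:
        return "Good"
    if average >= 60:
        return "Average"
    if average >= 40:
        return "Bad"
    return "Worst"

print(performance_label(70, 75, 78))  # average 74.33 -> Average
```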
4.1. Exploring the Data Set
First, we import the data set from the repository and display its first few rows.
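In this project the import was done with RapidMiner's Retrieve operator (see Section 5.1). A rough plain-Python equivalent is sketched below with the standard `csv` module. The column names are assumed from the header of the Kaggle `StudentsPerformance.csv` file and should be checked against the downloaded copy. The two sample rows are made up.

```python
import csv
import io

# Column names assumed from the Kaggle StudentsPerformance.csv header.
SCORE_COLUMNS = ("math score", "reading score", "writing score")

def load_rows(file_obj):
    """Read CSV records into dicts, converting the score columns to int."""
    rows = []
    for record in csv.DictReader(file_obj):
        for column in SCORE_COLUMNS:
            record[column] = int(record[column])
        rows.append(record)
    return rows

def head(rows, n=5):
    """Return the first n rows, like displaying the top of the table."""
    return rows[:n]

# Made-up sample; for the real file use:
#   with open("StudentsPerformance.csv", newline="") as f:
#       rows = load_rows(f)
sample = io.StringIO(
    "gender,race/ethnicity,parental level of education,lunch,"
    "test preparation course,math score,reading score,writing score\n"
    "female,group B,bachelor's degree,standard,none,70,75,78\n"
    "male,group C,some college,free/reduced,completed,65,80,82\n"
)
rows = load_rows(sample)
for row in head(rows):
    print(row["gender"], row["lunch"], row["math score"])
```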
4.1.1. General Distribution of Exam Scores
There are five categorical features that might affect the score in each exam. The first thing to analyse is how the scores are distributed within each exam (math, reading and writing), so we plot a histogram for each exam to see whether the distributions differ.
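A histogram simply counts how many scores fall into each score range. The sketch below computes those counts for one exam and prints them as text bars. It would be called once per exam (math, reading, writing). The 10-point bin width and the sample scores are assumptions; a plotting library would draw the same counts as bars.

```python
from collections import Counter

def histogram(scores, bin_width=10):
    """Count scores per bin of width bin_width on the 0-100 scale.

    Assumes bin_width divides 100; a score of exactly 100 goes in the last bin.
    """
    counts = Counter()
    for s in scores:
        start = min(s // bin_width * bin_width, 100 - bin_width)
        counts[start] += 1
    return dict(sorted(counts.items()))

def print_histogram(name, scores, bin_width=10):
    """Print one text bar ('#' per student) for each non-empty bin."""
    print(name)
    for start, count in histogram(scores, bin_width).items():
        end = start + bin_width - 1 if start + bin_width < 100 else 100
        print(f"{start:3d}-{end:3d} | {'#' * count}")

math_scores = [45, 52, 58, 63, 67, 69, 71, 74, 78, 85, 100]  # made-up sample
print_histogram("Math score", math_scores)
```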
The scores appear to be roughly normally (Gaussian) distributed. It is hard to draw further conclusions from the histograms: they all look very similar, and there is not enough data for the plots to look smooth.
4.1.2. Exam scores based on the gender
The exam scores are plotted separately for each gender (i.e. Male, Female).
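One simple way to quantify what the gender plots show is to compare the mean score in each exam for each group. The helper below is a sketch that groups rows by any column and averages each exam. The rows are assumed to be dicts keyed by the Kaggle column names (`gender`, `math score` and so on), and the sample students are made up.

```python
from collections import defaultdict
from statistics import mean

SUBJECTS = ("math score", "reading score", "writing score")

def mean_scores_by(rows, group_column):
    """Return {group value: {subject: mean score}} for one grouping column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_column]].append(row)
    return {
        group: {s: mean(r[s] for r in members) for s in SUBJECTS}
        for group, members in groups.items()
    }

rows = [  # made-up students
    {"gender": "female", "math score": 60, "reading score": 80, "writing score": 78},
    {"gender": "female", "math score": 70, "reading score": 90, "writing score": 88},
    {"gender": "male", "math score": 75, "reading score": 65, "writing score": 60},
    {"gender": "male", "math score": 65, "reading score": 71, "writing score": 64},
]
result = mean_scores_by(rows, "gender")
for gender, means in result.items():
    print(gender, means)
```

The same helper works for any other categorical attribute, such as lunch or test preparation course.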
4.1.3. Exam scores based on the Parent Level of Education
We display the mean score in each exam for each parental education level as a table or a heat map.
It appears that a lower parental level of education is associated with lower exam scores. Children whose parents' highest level of education was college or high school have noticeably lower exam scores than their peers, whereas children of parents with a master's or bachelor's degree score much better in the exams.
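A heat map of this kind is the table of mean scores with each cell shaded by its value. The sketch below renders such a table as text. It orders the rows from the lowest to the highest education level and scales the shading between the smallest and largest mean. The ordering list, the shade characters and the example means are assumptions for illustration, not results from this report.

```python
# Education levels from the data set description, lowest to highest (assumed order).
EDUCATION_ORDER = [
    "high school", "some college", "associate's degree",
    "bachelor's degree", "master's degree",
]
SHADES = " .:*#"  # lightest to darkest

def text_heatmap(means, row_order, columns):
    """Format {row: {column: mean}} as text lines, one shade character per cell.

    Shading is scaled between the smallest and largest mean in the table;
    rows missing from row_order are placed last.
    """
    values = [means[r][c] for r in means for c in columns]
    low, high = min(values), max(values)
    span = (high - low) or 1  # avoid dividing by zero if all means are equal

    def rank(r):
        return row_order.index(r) if r in row_order else len(row_order)

    lines = []
    for r in sorted(means, key=rank):
        cells = []
        for c in columns:
            level = round((means[r][c] - low) / span * (len(SHADES) - 1))
            cells.append(f"{means[r][c]:6.1f}{SHADES[level]}")
        lines.append(f"{r:<20}" + " ".join(cells))
    return lines

example_means = {  # made-up values, not results from the report
    "master's degree": {"math score": 70, "reading score": 75},
    "high school": {"math score": 62, "reading score": 65},
    "bachelor's degree": {"math score": 68, "reading score": 72},
}
lines = text_heatmap(example_means, EDUCATION_ORDER, ["math score", "reading score"])
for line in lines:
    print(line)
```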
4.1.4. Exam scores based on the Lunch Type
It might seem odd that the type of lunch students have would be correlated with their exam scores. However, the data set distinguishes only two types of lunch, standard and free/reduced, so this attribute reflects the parents' financial situation rather than the type of dish. There may therefore be some correlation here, so we visualize it.
The visualization shows a large gap in exam scores between students who have a free/reduced lunch and those who have a standard lunch.
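Whether a gap between two groups is "large" is easier to judge relative to the spread of the scores. One common measure is Cohen's d, the difference in means divided by the pooled standard deviation; by Cohen's rough conventions, values around 0.2, 0.5 and 0.8 are read as small, medium and large effects. The sketch below is a plain-Python version with made-up average scores for each lunch type; it was not part of the original analysis.

```python
import math
from statistics import mean, stdev

def cohens_d(group_a, group_b):
    """Difference in means divided by the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * stdev(group_a) ** 2 +
                  (n_b - 1) * stdev(group_b) ** 2) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)

standard_lunch = [70, 74, 78]  # made-up average scores
reduced_lunch = [60, 64, 68]
print(round(cohens_d(standard_lunch, reduced_lunch), 2))  # 2.5
```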
4.1.5. Exam scores based on the Test Preparation Course
Finally, we use a heat map to examine how completing the test preparation course affects the exam scores. This attribute has only two categories: none and completed.
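The effect of the preparation course can also be summarised as a small cross-tabulation of preparation status against pass/fail. The sketch below counts these pairs for one exam and converts them to fail rates. The pass mark of 40 is a hypothetical choice, since the report does not state the one it used, and the sample students are made up.

```python
from collections import Counter

PASS_MARK = 40  # hypothetical: the report does not state the pass mark it used

def pass_fail_table(rows, score_column="math score", pass_mark=PASS_MARK):
    """Count (test preparation status, pass/fail) pairs for one exam."""
    table = Counter()
    for row in rows:
        outcome = "pass" if row[score_column] >= pass_mark else "fail"
        table[(row["test preparation course"], outcome)] += 1
    return table

def fail_rate(table, status):
    """Share of students with the given preparation status who failed."""
    total = table[(status, "pass")] + table[(status, "fail")]
    return table[(status, "fail")] / total if total else 0.0

rows = [  # made-up students
    {"test preparation course": "completed", "math score": 55},
    {"test preparation course": "completed", "math score": 80},
    {"test preparation course": "completed", "math score": 35},
    {"test preparation course": "none", "math score": 30},
    {"test preparation course": "none", "math score": 38},
    {"test preparation course": "none", "math score": 62},
    {"test preparation course": "none", "math score": 70},
]
table = pass_fail_table(rows)
for status in ("completed", "none"):
    print(status, f"fail rate {fail_rate(table, status):.0%}")
```

Comparing rates rather than raw counts matters here because the two groups need not be the same size.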
5. Implementation
The data set is clean and contains no unwanted data, so no data-cleaning steps are needed. We apply the Naïve Bayes classification technique to the Student Performance data set. The Naïve Bayes classifier is a well-known supervised learning method: a model is first trained on labelled training data and then used to classify test data. The data set has 8 features and 1 label, Student Performance.
Student Performance is the class label to be predicted. Because no separate test data is provided, we split the data set into training and test sets in a 70:30 ratio.
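In the process itself the split is done by the Split Data operator. A minimal plain-Python sketch of a shuffled 70:30 split follows; the function name and the fixed seed are assumptions. A stratified split, which keeps the class proportions equal in both parts, is a common alternative when some classes are rare.

```python
import random

def train_test_split(rows, train_ratio=0.7, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    # round() rather than int() so a product like 699.999... is not truncated
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

train, test = train_test_split(range(1000))
print(len(train), len(test))  # 700 300
```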
We train the Naïve Bayes model on 70% of the data set and classify the remaining 30%. We then measure accuracy, precision and recall to show how accurate the model is on this data set.
5.1. Operators
The operators used to build the process are as follows:
• Retrieve
This operator accesses data stored in the repository and loads it into the process.
• Set Role
This operator changes the role of one or more attributes; here it assigns the label role to Student Performance.
• Split Data
This operator splits the given data set into the desired number of subsets (here, 70% for training and 30% for testing).
• Naïve Bayes
This operator generates a Naïve Bayes classification model.
• Apply Model
This operator applies a model to the given data set.
• Performance
This operator is used for performance evaluation. It delivers a list of performance criteria values, which are determined automatically to fit the type of learning task.
6. Results and evaluation/discussion of the results
Confusion Matrix
The result of the process on the “Student Performance” data set is presented in the form of a confusion matrix. This table shows the accuracy, class precision and class recall.
Because the Student Performance label has four classes (Good, Average, Bad, Worst), this is a polynominal rather than a binominal classification task. The following criteria are reported:
• Accuracy
• Precision
• Recall
Accuracy is the percentage of correct predictions over the total number of examples, where a correct prediction is an example whose predicted value equals the value of its label attribute. Class precision is the fraction of examples predicted as a class that actually belong to it, and class recall is the fraction of the examples of a class that are predicted correctly.
The accuracy obtained on the Student Performance data set is 92.64%.
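The metrics above can be computed directly from a confusion matrix. The sketch below builds the matrix from actual and predicted labels and derives accuracy plus per-class precision and recall. It is an illustration in plain Python, not the RapidMiner Performance operator, and the labels in the example are made up.

```python
from collections import Counter

def evaluate(actual, predicted):
    """Return (accuracy, per-class precision, per-class recall, confusion matrix)."""
    matrix = Counter(zip(actual, predicted))  # (actual, predicted) -> count
    classes = sorted(set(actual) | set(predicted))
    accuracy = sum(matrix[(c, c)] for c in classes) / len(actual)
    precision, recall = {}, {}
    for c in classes:
        predicted_as_c = sum(matrix[(a, c)] for a in classes)  # column total
        actually_c = sum(matrix[(c, p)] for p in classes)      # row total
        precision[c] = matrix[(c, c)] / predicted_as_c if predicted_as_c else 0.0
        recall[c] = matrix[(c, c)] / actually_c if actually_c else 0.0
    return accuracy, precision, recall, matrix

actual = ["Good", "Good", "Average", "Bad", "Average"]  # made-up labels
predicted = ["Good", "Average", "Average", "Bad", "Average"]
accuracy, precision, recall, matrix = evaluate(actual, predicted)
print(f"accuracy {accuracy:.2%}")  # accuracy 80.00%
```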
7. Future directions/ideas how to extend and enhance the technique
The process or model can be used to learn more about student performance and the factors involved in it.
In future, the process could be applied at any university to predict a student's GPA in advance from their previous GPA.
In schools, it could be used to identify the students predicted to perform worst, so that teachers can give those students more attention.
8. Conclusion
The insights gained from the data are summarized below:
• 135 students failed mathematics, 90 failed reading and 114 failed writing; overall, 103 students failed the examination.
• Reading and writing scores are positively linearly correlated, with a correlation coefficient of approximately 0.95.
• Students in ethnicity group D performed very well.
• The test preparation course is effective: fewer of the students who had completed it failed.
• Students who had a standard lunch performed better than the others.
• Children of parents with a master's or bachelor's degree score much better in the exams.
• The Naïve Bayes classifier process achieved an accuracy of 92.64% on the Student Performance data set.
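The correlation between reading and writing scores reported above is a Pearson coefficient. A short plain-Python sketch of that calculation follows; the function name and the test inputs are illustrative. On Python 3.10 and later, `statistics.correlation` computes the same quantity.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```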
9. References
https://www.kaggle.com/spscientist/students-performance-in-exams#StudentsPerformance.csv

More Related Content

What's hot

Student management system
Student management systemStudent management system
Student management systemGaurav Subham
 
Student Grade Prediction
Student Grade PredictionStudent Grade Prediction
Student Grade PredictionGaurav Sawant
 
College Management System project srs 2015
College Management System project srs 2015College Management System project srs 2015
College Management System project srs 2015Surendra Mahala
 
Software requirement specification for online examination system
Software requirement specification for online examination systemSoftware requirement specification for online examination system
Software requirement specification for online examination systemkarthik venkatesh
 
data modeling and models
data modeling and modelsdata modeling and models
data modeling and modelssabah N
 
System analysis and design chapter 2
System analysis and design chapter 2System analysis and design chapter 2
System analysis and design chapter 2Einrez Pugao
 
Online Exam Management System(OEMS)
Online Exam Management System(OEMS)Online Exam Management System(OEMS)
Online Exam Management System(OEMS)PUST
 
Analysis modeling & scenario based modeling
Analysis modeling &  scenario based modeling Analysis modeling &  scenario based modeling
Analysis modeling & scenario based modeling Benazir Fathima
 
Predicting students performance using classification techniques in data mining
Predicting students performance using classification techniques in data miningPredicting students performance using classification techniques in data mining
Predicting students performance using classification techniques in data miningLovely Professional University
 
1.2 steps and functionalities
1.2 steps and functionalities1.2 steps and functionalities
1.2 steps and functionalitiesKrish_ver2
 
Advanced topics in software engineering
Advanced topics in software engineeringAdvanced topics in software engineering
Advanced topics in software engineeringRupesh Vaishnav
 
Exploratory Data Analysis
Exploratory Data AnalysisExploratory Data Analysis
Exploratory Data AnalysisUmair Shafique
 
SRS for online examination system
SRS for online examination systemSRS for online examination system
SRS for online examination systemlunarrain
 
Data Science-1 (1).ppt
Data Science-1 (1).pptData Science-1 (1).ppt
Data Science-1 (1).pptSanjayAcharaya
 
Synopsis of online Attendance System
Synopsis of online Attendance SystemSynopsis of online Attendance System
Synopsis of online Attendance SystemShyam Sundar Pandey
 
Unified process model
Unified process modelUnified process model
Unified process modelRyndaMaala
 

What's hot (20)

Student management system
Student management systemStudent management system
Student management system
 
Student Grade Prediction
Student Grade PredictionStudent Grade Prediction
Student Grade Prediction
 
College Management System project srs 2015
College Management System project srs 2015College Management System project srs 2015
College Management System project srs 2015
 
Software requirement specification for online examination system
Software requirement specification for online examination systemSoftware requirement specification for online examination system
Software requirement specification for online examination system
 
data modeling and models
data modeling and modelsdata modeling and models
data modeling and models
 
Kdd process
Kdd processKdd process
Kdd process
 
System analysis and design chapter 2
System analysis and design chapter 2System analysis and design chapter 2
System analysis and design chapter 2
 
Online Exam Management System(OEMS)
Online Exam Management System(OEMS)Online Exam Management System(OEMS)
Online Exam Management System(OEMS)
 
Analysis modeling & scenario based modeling
Analysis modeling &  scenario based modeling Analysis modeling &  scenario based modeling
Analysis modeling & scenario based modeling
 
Predicting students performance using classification techniques in data mining
Predicting students performance using classification techniques in data miningPredicting students performance using classification techniques in data mining
Predicting students performance using classification techniques in data mining
 
1.2 steps and functionalities
1.2 steps and functionalities1.2 steps and functionalities
1.2 steps and functionalities
 
System Analysis and Design
System Analysis and DesignSystem Analysis and Design
System Analysis and Design
 
School Management System
School Management SystemSchool Management System
School Management System
 
Advanced topics in software engineering
Advanced topics in software engineeringAdvanced topics in software engineering
Advanced topics in software engineering
 
Exploratory Data Analysis
Exploratory Data AnalysisExploratory Data Analysis
Exploratory Data Analysis
 
SRS for online examination system
SRS for online examination systemSRS for online examination system
SRS for online examination system
 
Data Science-1 (1).ppt
Data Science-1 (1).pptData Science-1 (1).ppt
Data Science-1 (1).ppt
 
Synopsis of online Attendance System
Synopsis of online Attendance SystemSynopsis of online Attendance System
Synopsis of online Attendance System
 
Unified process model
Unified process modelUnified process model
Unified process model
 
13 software metrics
13 software metrics13 software metrics
13 software metrics
 

Similar to Student Performance Data Mining Project Report

IRJET- Using Data Mining to Predict Students Performance
IRJET-  	  Using Data Mining to Predict Students PerformanceIRJET-  	  Using Data Mining to Predict Students Performance
IRJET- Using Data Mining to Predict Students PerformanceIRJET Journal
 
M-Learners Performance Using Intelligence and Adaptive E-Learning Classify th...
M-Learners Performance Using Intelligence and Adaptive E-Learning Classify th...M-Learners Performance Using Intelligence and Adaptive E-Learning Classify th...
M-Learners Performance Using Intelligence and Adaptive E-Learning Classify th...IRJET Journal
 
Multi Criteria Decision Making Methodology on Selection of a Student for All ...
Multi Criteria Decision Making Methodology on Selection of a Student for All ...Multi Criteria Decision Making Methodology on Selection of a Student for All ...
Multi Criteria Decision Making Methodology on Selection of a Student for All ...ijtsrd
 
ADMINISTRATION SCORING AND REPORTING.pdf
ADMINISTRATION  SCORING AND REPORTING.pdfADMINISTRATION  SCORING AND REPORTING.pdf
ADMINISTRATION SCORING AND REPORTING.pdfOM VERMA
 
educ331 Linear Regression for Baseball
educ331 Linear Regression for Baseballeduc331 Linear Regression for Baseball
educ331 Linear Regression for Baseballboernerj
 
IMPROVING FAIRNESS ON STUDENTS’ OVERALL MARKS VIA DYNAMIC RESELECTION OF ASSE...
IMPROVING FAIRNESS ON STUDENTS’ OVERALL MARKS VIA DYNAMIC RESELECTION OF ASSE...IMPROVING FAIRNESS ON STUDENTS’ OVERALL MARKS VIA DYNAMIC RESELECTION OF ASSE...
IMPROVING FAIRNESS ON STUDENTS’ OVERALL MARKS VIA DYNAMIC RESELECTION OF ASSE...IJITE
 
Improving Fairness on Students' Overall Marks via Dynamic Reselection of Asse...
Improving Fairness on Students' Overall Marks via Dynamic Reselection of Asse...Improving Fairness on Students' Overall Marks via Dynamic Reselection of Asse...
Improving Fairness on Students' Overall Marks via Dynamic Reselection of Asse...IJITE
 
Improving Fairness on Students' Overall Marks via Dynamic Reselection of Asse...
Improving Fairness on Students' Overall Marks via Dynamic Reselection of Asse...Improving Fairness on Students' Overall Marks via Dynamic Reselection of Asse...
Improving Fairness on Students' Overall Marks via Dynamic Reselection of Asse...IJITE
 
IMPROVING FAIRNESS ON STUDENTS’ OVERALL MARKS VIA DYNAMIC RESELECTION OF ASSE...
IMPROVING FAIRNESS ON STUDENTS’ OVERALL MARKS VIA DYNAMIC RESELECTION OF ASSE...IMPROVING FAIRNESS ON STUDENTS’ OVERALL MARKS VIA DYNAMIC RESELECTION OF ASSE...
IMPROVING FAIRNESS ON STUDENTS’ OVERALL MARKS VIA DYNAMIC RESELECTION OF ASSE...IJITE
 
IRJET - A Study on Student Career Prediction
IRJET - A Study on Student Career PredictionIRJET - A Study on Student Career Prediction
IRJET - A Study on Student Career PredictionIRJET Journal
 
Exams evaluate students. Who’s evaluating exams? Data-Informed Exam Design
Exams evaluate students. Who’s evaluating exams? Data-Informed Exam DesignExams evaluate students. Who’s evaluating exams? Data-Informed Exam Design
Exams evaluate students. Who’s evaluating exams? Data-Informed Exam DesignG. Alex Ambrose
 
IRJET-Student Performance Prediction for Education Loan System
IRJET-Student Performance Prediction for Education Loan SystemIRJET-Student Performance Prediction for Education Loan System
IRJET-Student Performance Prediction for Education Loan SystemIRJET Journal
 
Automating the Assessment of Learning Outcomes.pdf
Automating the Assessment of Learning Outcomes.pdfAutomating the Assessment of Learning Outcomes.pdf
Automating the Assessment of Learning Outcomes.pdfCharlie Congdon
 
WSU new standards mean new baseline
WSU new standards mean new baselineWSU new standards mean new baseline
WSU new standards mean new baselineGlenn E. Malone, EdD
 

Similar to Student Performance Data Mining Project Report (20)

IRJET- Using Data Mining to Predict Students Performance
IRJET-  	  Using Data Mining to Predict Students PerformanceIRJET-  	  Using Data Mining to Predict Students Performance
IRJET- Using Data Mining to Predict Students Performance
 
M-Learners Performance Using Intelligence and Adaptive E-Learning Classify th...
M-Learners Performance Using Intelligence and Adaptive E-Learning Classify th...M-Learners Performance Using Intelligence and Adaptive E-Learning Classify th...
M-Learners Performance Using Intelligence and Adaptive E-Learning Classify th...
 
Multi Criteria Decision Making Methodology on Selection of a Student for All ...
Multi Criteria Decision Making Methodology on Selection of a Student for All ...Multi Criteria Decision Making Methodology on Selection of a Student for All ...
Multi Criteria Decision Making Methodology on Selection of a Student for All ...
 
ADMINISTRATION SCORING AND REPORTING.pdf
ADMINISTRATION  SCORING AND REPORTING.pdfADMINISTRATION  SCORING AND REPORTING.pdf
ADMINISTRATION SCORING AND REPORTING.pdf
 
educ331 Linear Regression for Baseball
educ331 Linear Regression for Baseballeduc331 Linear Regression for Baseball
educ331 Linear Regression for Baseball
 
IMPROVING FAIRNESS ON STUDENTS’ OVERALL MARKS VIA DYNAMIC RESELECTION OF ASSE...
IMPROVING FAIRNESS ON STUDENTS’ OVERALL MARKS VIA DYNAMIC RESELECTION OF ASSE...IMPROVING FAIRNESS ON STUDENTS’ OVERALL MARKS VIA DYNAMIC RESELECTION OF ASSE...
IMPROVING FAIRNESS ON STUDENTS’ OVERALL MARKS VIA DYNAMIC RESELECTION OF ASSE...
 
Improving Fairness on Students' Overall Marks via Dynamic Reselection of Asse...
Improving Fairness on Students' Overall Marks via Dynamic Reselection of Asse...Improving Fairness on Students' Overall Marks via Dynamic Reselection of Asse...
Improving Fairness on Students' Overall Marks via Dynamic Reselection of Asse...
 
Improving Fairness on Students' Overall Marks via Dynamic Reselection of Asse...
Improving Fairness on Students' Overall Marks via Dynamic Reselection of Asse...Improving Fairness on Students' Overall Marks via Dynamic Reselection of Asse...
Improving Fairness on Students' Overall Marks via Dynamic Reselection of Asse...
 
IMPROVING FAIRNESS ON STUDENTS’ OVERALL MARKS VIA DYNAMIC RESELECTION OF ASSE...
IMPROVING FAIRNESS ON STUDENTS’ OVERALL MARKS VIA DYNAMIC RESELECTION OF ASSE...IMPROVING FAIRNESS ON STUDENTS’ OVERALL MARKS VIA DYNAMIC RESELECTION OF ASSE...
IMPROVING FAIRNESS ON STUDENTS’ OVERALL MARKS VIA DYNAMIC RESELECTION OF ASSE...
 
B05110409
B05110409B05110409
B05110409
 
I-ready Research
I-ready ResearchI-ready Research
I-ready Research
 
C0364010013
C0364010013C0364010013
C0364010013
 
IRJET - A Study on Student Career Prediction
IRJET - A Study on Student Career PredictionIRJET - A Study on Student Career Prediction
IRJET - A Study on Student Career Prediction
 
Exams evaluate students. Who’s evaluating exams? Data-Informed Exam Design
Exams evaluate students. Who’s evaluating exams? Data-Informed Exam DesignExams evaluate students. Who’s evaluating exams? Data-Informed Exam Design
Exams evaluate students. Who’s evaluating exams? Data-Informed Exam Design
 
IRJET-Student Performance Prediction for Education Loan System
IRJET-Student Performance Prediction for Education Loan SystemIRJET-Student Performance Prediction for Education Loan System
IRJET-Student Performance Prediction for Education Loan System
 
Primer_NATG12.pptx
Primer_NATG12.pptxPrimer_NATG12.pptx
Primer_NATG12.pptx
 
Automating the Assessment of Learning Outcomes.pdf
Automating the Assessment of Learning Outcomes.pdfAutomating the Assessment of Learning Outcomes.pdf
Automating the Assessment of Learning Outcomes.pdf
 
Chap 15
Chap 15Chap 15
Chap 15
 
WSU new standards mean new baseline
WSU new standards mean new baselineWSU new standards mean new baseline
WSU new standards mean new baseline
 
Csrde discriminant analysis final
Csrde discriminant analysis finalCsrde discriminant analysis final
Csrde discriminant analysis final
 

Recently uploaded

Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 
XpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsXpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsMehedi Hasan Shohan
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样umasea
 
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...aditisharan08
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfkalichargn70th171
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number SystemsJheuzeDellosa
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfPower Karaoke
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEOrtus Solutions, Corp
 

Recently uploaded (20)

Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
XpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsXpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software Solutions
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant

Student Performance Data Mining Project Report

PROJECT REPORT
STUDENT PERFORMANCE (DATA MINING)
BS (SE) 2017
GROUP MEMBERS:
NAME: HAFSA HABIB 2017/COMP/BS(SE)-21597
NAME: MUNIBA JAVIAD 2017/COMP/BS(SE)-21621
SUPERVISOR: MISS SADIA JAVED
29TH APRIL, 2019
DEPARTMENT OF COMPUTER SCIENCE AND INFORMATION TECHNOLOGY
JINNAH UNIVERSITY FOR WOMEN
5-C NAZIMABAD, KARACHI 74600
Table of Contents
1. Introduction
2. Description of the problem and problem domain
3. Description of implemented data mining techniques/methods
3.1. Naïve Bayes Classifier
4. Data Set
4.1. Exploring the Data Set
4.1.1. General Distribution of Exam Scores
4.1.2. Exam Scores Based on Gender
4.1.3. Exam Scores Based on the Parental Level of Education
4.1.4. Exam Scores Based on the Lunch Type
4.1.5. Exam Scores Based on the Test Preparation Course
5. Implementation
5.1. Operators
6. Results and evaluation/discussion of the results
7. Future directions/ideas on how to extend and enhance the technique
8. Conclusion
9. References
1. Introduction
Using the Students Performance in Exams dataset, we try to understand what affects students' exam scores. The data is limited, but it is enough for visualizations that reveal the main relationships. We first explore the data and then apply the Naïve Bayes classification technique for evaluation.

2. Description of the problem and problem domain
The goal is to understand how the parents' background, test preparation and similar factors influence students' performance.
Objectives:
- Check the dataset and tidy the data if needed.
- Visualize the data to understand the effects of different factors on student performance.
- Check the effectiveness of the test preparation course.
- Identify the major factors influencing the test scores.

3. Description of implemented data mining techniques/methods
3.1. Naïve Bayes Classifier
Bayesian classifiers are statistical classifiers that predict class membership probabilities, such as the probability that a given sample belongs to a particular class. Naïve Bayes assumes that the effect of an attribute value on a given class is independent of the values of the other attributes (class-conditional independence). In practice, however, dependencies often exist among attributes; Bayesian networks, which are graphical models that can describe joint conditional probability distributions, can represent such dependencies. Bayesian classifiers are popular because of their simplicity, computational efficiency and good performance on real-world problems. They are also fast to train and to evaluate, and achieve high accuracy in many domains.

4. Data Set
- Gender: gender of the student (male, female)
- Ethnicity: ethnic group to which the student belongs (group A, B, C, D, E)
- Parent level of education: education level of the student's parents/guardian (high school, bachelor's degree, master's degree, some college, associate's degree)
- Lunch: type of lunch provided to the student at school (standard, free/reduced)
- Test preparation course: whether the student took the preparation course (none, completed)
- Math score: mathematics score of the student (0 to 100)
- Reading score: reading score of the student (0 to 100)
- Writing score: writing score of the student (0 to 100)
- Student Performance: overall performance of the student (Good, Average, Bad, Worst)
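The classifier described in Section 3.1 can be sketched in a few lines of plain Python. This is a minimal illustration, not the process used in this report: it handles only categorical attributes (numeric scores would need binning or a Gaussian likelihood, which this sketch does not cover), and the function names, the Laplace smoothing constant `alpha` and the toy records are hypothetical. For each class c it adds log P(c) to the sum of log P(x_i | c) over all attributes, which is where the independence assumption enters, and returns the class with the largest total.

```python
from collections import Counter, defaultdict
import math

def train_naive_bayes(rows, labels, alpha=1.0):
    """Count class frequencies and, per class, how often each attribute value occurs."""
    class_counts = Counter(labels)
    value_counts = defaultdict(lambda: defaultdict(Counter))  # [class][attribute index][value]
    domain = defaultdict(set)  # distinct values seen for each attribute
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[label][i][value] += 1
            domain[i].add(value)
    return class_counts, value_counts, domain, alpha

def predict(model, row):
    """Return the class with the largest log P(class) + sum of log P(value | class)."""
    class_counts, value_counts, domain, alpha = model
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for cls, n_cls in class_counts.items():
        score = math.log(n_cls / total)
        for i, value in enumerate(row):
            # Naive assumption: each attribute contributes independently given the class.
            # Laplace smoothing (alpha) keeps an unseen value from zeroing out the product.
            count = value_counts[cls][i][value]
            score += math.log((count + alpha) / (n_cls + alpha * len(domain[i])))
        if score > best_score:
            best_class, best_score = cls, score
    return best_class

# Hypothetical toy records: (gender, lunch, test preparation course) -> performance class.
rows = [
    ("female", "standard", "completed"), ("female", "standard", "none"),
    ("male", "free/reduced", "none"), ("male", "free/reduced", "none"),
    ("female", "free/reduced", "completed"), ("male", "standard", "completed"),
]
labels = ["Good", "Good", "Bad", "Bad", "Average", "Good"]
model = train_naive_bayes(rows, labels)
print(predict(model, ("male", "free/reduced", "none")))  # Bad
```

Working with log probabilities instead of multiplying raw probabilities avoids numerical underflow when there are many attributes.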
4.1. Exploring the Data Set
First, we import the dataset from the repository and display its first few rows.
4.1.1. General Distribution of Exam Scores
There are five features that might affect the score of each exam. The first thing to analyze is how the scores are distributed within each exam (math, reading, and writing). We plot histograms to see whether there are any differences between the score distributions.
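The histograms mentioned above boil down to counting scores per bin. A minimal sketch in plain Python, assuming integer scores from 0 to 100 and a bin width that divides 100 evenly; the 10-point bin width, the function names and the sample scores are illustrative assumptions, not values from the dataset.

```python
from collections import Counter

def histogram(scores, bin_width=10):
    """Count 0-100 scores per bin; assumes bin_width divides 100 evenly."""
    n_bins = 100 // bin_width
    # A score of 100 would start a bin of its own, so it is folded into the last bin.
    counts = Counter(min(score // bin_width, n_bins - 1) for score in scores)
    return [counts[b] for b in range(n_bins)]

def print_histogram(name, scores, bin_width=10):
    """Print one text bar of '#' characters per bin."""
    n_bins = 100 // bin_width
    for b, count in enumerate(histogram(scores, bin_width)):
        low = b * bin_width
        high = 100 if b == n_bins - 1 else low + bin_width - 1
        print(f"{name:8}{low:3}-{high:<3} | {'#' * count}")

# Hypothetical sample scores, for illustration only.
math_scores = [72, 69, 90, 47, 76, 71, 88, 40, 64, 38, 100, 58]
print_histogram("math", math_scores)
```

Calling `print_histogram` once per exam with the same bin width makes the three distributions directly comparable.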
The scores are approximately Gaussian (normally) distributed. It is hard to draw further conclusions from the graphs above: they all look very similar, and there is not enough data for the plots to look smoother.
4.1.2. Exam Scores Based on Gender
Graphical representation of the exam scores based on gender (male, female).
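The observation in Section 4.1.1 that the scores look Gaussian can also be checked numerically, and the same summary can be computed separately for male and female students. A minimal sketch, assuming each group's scores are available as a list of numbers that are not all identical: for a normal distribution, both skewness and excess kurtosis are close to zero. The function name and the sample scores are hypothetical.

```python
import statistics

def shape_summary(scores):
    """Return (mean, standard deviation, skewness, excess kurtosis) using population moments.

    Assumes the scores are not all identical (otherwise the standard deviation is zero).
    """
    n = len(scores)
    mean = statistics.fmean(scores)
    sd = statistics.pstdev(scores)
    skewness = sum((x - mean) ** 3 for x in scores) / (n * sd ** 3)
    excess_kurtosis = sum((x - mean) ** 4 for x in scores) / (n * sd ** 4) - 3
    return mean, sd, skewness, excess_kurtosis

# Hypothetical scores for two groups; real use would take each gender's scores from the data set.
female_math = [62, 71, 55, 80, 67, 74, 59, 69]
male_math = [68, 75, 61, 83, 70, 77, 64, 72]
for group, scores in (("female", female_math), ("male", male_math)):
    mean, sd, skew, kurt = shape_summary(scores)
    print(f"{group:6} mean={mean:.1f} sd={sd:.1f} skew={skew:.2f} excess kurtosis={kurt:.2f}")
```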
4.1.3. Exam Scores Based on the Parental Level of Education
The mean scores are displayed as a table and as a heat map. Indeed, a lower parental level of education seems to have a negative impact on exam scores: children whose parents' highest level of education was college or high school have noticeably lower exam scores than their peers. Similarly, children of parents with a master's or bachelor's degree score much better in the exams.
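The table of mean values that the heat map colors can be computed by grouping the records and averaging each subject. A minimal sketch in plain Python; the record layout, the key names (`parent_education`, `math`, `reading`, `writing`) and the sample values are hypothetical, not the dataset's actual column names or data.

```python
from collections import defaultdict
import statistics

SUBJECTS = ("math", "reading", "writing")

def mean_scores_by(records, group_key, subjects=SUBJECTS):
    """Return {group value: {subject: mean score}} for the given grouping key."""
    grouped = defaultdict(lambda: defaultdict(list))
    for record in records:
        for subject in subjects:
            grouped[record[group_key]][subject].append(record[subject])
    return {group: {subject: statistics.fmean(scores) for subject, scores in by_subject.items()}
            for group, by_subject in grouped.items()}

# Hypothetical records; the key names are illustrative, not the data set's column names.
records = [
    {"parent_education": "master's degree", "math": 80, "reading": 85, "writing": 86},
    {"parent_education": "master's degree", "math": 70, "reading": 79, "writing": 80},
    {"parent_education": "high school", "math": 60, "reading": 64, "writing": 62},
    {"parent_education": "high school", "math": 58, "reading": 66, "writing": 60},
]
table = mean_scores_by(records, "parent_education")
# Print groups from the highest to the lowest average over all three subjects.
for group, means in sorted(table.items(), key=lambda item: -statistics.fmean(item[1].values())):
    print(f"{group:18}" + "".join(f"{means[s]:8.1f}" for s in SUBJECTS))
```

The same function works for any other categorical attribute, such as gender or lunch type, by changing `group_key`.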
4.1.4. Exam Scores Based on the Lunch Type
It might seem amusing to think that the type of lunch students have is correlated with their exam scores. However, the dataset contains two types of lunch, standard and free/reduced, so the lunch type reflects the parents' financial situation rather than the dish itself. There might be some correlation here, so let us visualize it. The visualization above shows a large disproportion between students who have a free/reduced lunch and those who have a standard lunch.
4.1.5. Exam Scores Based on the Test Preparation Course
The last thing we explore in this dataset is how completing the test preparation course affects the exam scores, using a heat map. The variable has only two categories: none and completed.
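A disproportion like the one described for lunch types is easiest to see as a cross-tabulation in which each row is normalized to percentages of that group. A minimal sketch, assuming the data is available as (group, outcome) pairs; the function name and the sample pairs are hypothetical. The same function applies unchanged to the test preparation course by passing (none/completed, performance class) pairs.

```python
from collections import Counter, defaultdict

def crosstab_percent(pairs):
    """Given (group, outcome) pairs, return {group: {outcome: % of that group's students}}."""
    counts = defaultdict(Counter)
    for group, outcome in pairs:
        counts[group][outcome] += 1
    return {group: {outcome: 100 * n / sum(c.values()) for outcome, n in c.items()}
            for group, c in counts.items()}

# Hypothetical (lunch type, performance class) pairs, for illustration only.
pairs = [
    ("standard", "Good"), ("standard", "Good"), ("standard", "Average"), ("standard", "Bad"),
    ("free/reduced", "Bad"), ("free/reduced", "Average"), ("free/reduced", "Worst"), ("free/reduced", "Bad"),
]
for lunch, shares in crosstab_percent(pairs).items():
    print(lunch, {label: f"{pct:.0f}%" for label, pct in sorted(shares.items())})
```

Normalizing per row matters because the groups can have very different sizes; raw counts alone would mix group size with outcome.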
5. Implementation
This dataset is clean and free of unwanted data, so we do not have to go through a data-cleaning process. We apply the Naïve Bayes classification technique to our Student Performance data set. The Naïve Bayes classifier is a well-known supervised learning approach: the model is first trained on labelled training data and is then used to classify test data. There are 8 features and 1 label, Student Performance, which is the class to be predicted. Because no separate test data is provided, we split this dataset into training and testing sets in a 70:30 ratio. We train the Naïve Bayes model on 70% of the dataset and then classify the remaining 30%. Finally, we measure the performance parameters, i.e. accuracy, precision and recall, to show how accurate the model is on this dataset.
5.1. Operators
The operators used to build the process are as follows:
- Retrieve: accesses stored information in the Repository and loads it into the Process.
- Set Role: changes the role of one or more Attributes.
- Split Data: produces the desired number of subsets of the given Data Set.
- Naïve Bayes: generates a Naive Bayes classification model.
- Apply Model: applies a model to the given Data Set.
- Performance: evaluates performance. It delivers a list of performance criteria values, which are automatically determined to fit the type of learning task.
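The 70:30 split performed by the Split Data operator can be approximated in plain Python. This is a hypothetical sketch: it uses a simple seeded shuffle, whereas the sampling method used in the report's process is not stated, and the function name, seed and placeholder examples are illustrative. Rows and labels are split by the same shuffled indices so that each example keeps its label.

```python
import random

def train_test_split(rows, labels, train_ratio=0.7, seed=42):
    """Shuffle the example indices with a fixed seed and split them into training and testing sets."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    cut = round(len(indices) * train_ratio)
    train_idx, test_idx = indices[:cut], indices[cut:]
    return ([rows[i] for i in train_idx], [labels[i] for i in train_idx],
            [rows[i] for i in test_idx], [labels[i] for i in test_idx])

# Hypothetical usage: ten placeholder examples with alternating labels.
rows = list(range(10))
labels = ["Good" if r % 2 == 0 else "Bad" for r in rows]
X_train, y_train, X_test, y_test = train_test_split(rows, labels)
print(len(X_train), len(X_test))  # 7 3
```

Fixing the seed makes the split, and therefore the reported metrics, reproducible between runs.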
6. Results and evaluation/discussion of the results
Confusion Matrix
The result of the process on the "Student Performance" data set is shown below in the form of a confusion matrix. The table shows the accuracy, class precision and class recall. The following criteria are reported:
- Accuracy
- Precision
- Recall
Accuracy is the percentage of correct predictions over the total number of examples, where a correct prediction is an example whose predicted value equals the value of the label attribute. The accuracy on the Student Performance data set is 92.64%.
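The definitions above can be made concrete with a short sketch that builds a confusion matrix from actual and predicted labels and derives accuracy and per-class precision and recall. It illustrates the definitions only, not how the Performance operator is implemented; the labels are hypothetical, and reporting 0.0 when a class is never predicted (or never occurs) is an assumed convention. Precision for a class divides its correct predictions by all examples predicted as that class; recall divides them by all examples that truly belong to it.

```python
from collections import Counter

def evaluate(actual, predicted):
    """Return the confusion matrix counts, accuracy, and per-class precision and recall."""
    classes = sorted(set(actual) | set(predicted))
    # matrix[(true class, predicted class)] -> number of examples
    matrix = Counter(zip(actual, predicted))
    accuracy = sum(matrix[(c, c)] for c in classes) / len(actual)
    precision, recall = {}, {}
    for c in classes:
        predicted_c = sum(matrix[(t, c)] for t in classes)  # examples predicted as c
        actual_c = sum(matrix[(c, p)] for p in classes)     # examples that truly are c
        precision[c] = matrix[(c, c)] / predicted_c if predicted_c else 0.0
        recall[c] = matrix[(c, c)] / actual_c if actual_c else 0.0
    return matrix, accuracy, precision, recall

# Hypothetical labels, for illustration only.
actual = ["Good", "Good", "Average", "Bad", "Worst", "Average"]
predicted = ["Good", "Average", "Average", "Bad", "Bad", "Average"]
matrix, accuracy, precision, recall = evaluate(actual, predicted)
print(f"accuracy = {accuracy:.2%}")
for c in sorted(precision):
    print(f"{c:8} precision = {precision[c]:.2f} recall = {recall[c]:.2f}")
```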
7. Future directions/ideas on how to extend and enhance the technique
Using this process or model, we can learn more about student performance and the factors involved in it. In the future, this process could be implemented in any university to predict a student's GPA in advance from their previous GPA. In schools, it could identify the worst-performing students so that teachers can focus more on them.
8. Conclusion
We have already seen insights from the data; they are summarized below:
- 135 students failed in mathematics, 90 students failed the reading examination, 114 students failed the writing examination, and overall 103 students failed the examination.
- Reading and writing scores are positively and linearly correlated, with a correlation coefficient of approximately 0.95.
- Students from ethnic group D performed very well.
- The test preparation course is very effective: fewer of the students who had completed it failed.
- Students who had a standard lunch performed much better than the others.
- For parental education level, children of parents with a master's or bachelor's degree score much better in the exams.
- The accuracy on the Student Performance data set, calculated by the Naïve Bayes classifier process, is 92.64%.
9. References
https://www.kaggle.com/spscientist/students-performance-in-exams#StudentsPerformance.csv