PROJECT REPORT
STUDENT PERFORMANCE
(DATAMINING)
BS (SE) 2017
GROUP MEMBER(S):
NAME: HAFSA HABIB 2017/COMP/BS(SE)-21597
NAME: MUNIBA JAVIAD 2017/COMP/BS(SE)-21621
SUPERVISOR:
MISS SADIA JAVED
29TH APRIL, 2019
DEPARTMENT OF COMPUTER SCIENCE AND INFORMATION TECHNOLOGY
JINNAH UNIVERSITY FOR WOMEN
5-C NAZIMABAD, KARACHI 74600
Table of Contents
1. Introduction
2. Description of the problem and problem domain
3. Description of implemented data mining techniques/methods
3.1. Naïve Bayes Classifier
4. Data Set
4.1. Exploring the Data Set
4.1.1. General Distribution of Exam Scores
4.1.2. Exam scores based on the gender
4.1.3. Exam scores based on the Parent Level of Education
4.1.4. Exam scores based on the Lunch Type
4.1.5. Exam scores based on the Test Preparation Course
5. Implementation
5.1. Operators
6. Results and evaluation/discussion of the results
7. Future directions/ideas how to extend and enhance the technique
8. Conclusion
9. References
1. Introduction
Using the Students Performance in Exams dataset, we try to understand which factors affect students'
exam scores. The dataset is small, but visualizing it is enough to reveal some of the relationships.
We first explore the data and then apply the Naïve Bayes classification technique and evaluate its
results.
2. Description of the problem and problem domain
The goal is to understand how factors such as the parents' background and test preparation
influence students' performance.
Objectives
• Check the dataset and tidy the data if needed.
• Visualize the data to understand the effects of different factors on student performance.
• Check the effectiveness of the test preparation course.
• Identify the major factors influencing the test scores.
3. Description of implemented data mining techniques/methods
3.1. Naïve Bayes Classifier
Bayesian classifiers are statistical classifiers that predict class membership probabilities, such
as the probability that a given sample belongs to a particular class. Naïve Bayes algorithms
assume that the effect of an attribute on a given class is independent of the values of the other
attributes. Formally, a sample x = (x1, …, xn) is assigned to the class C that maximizes
P(C) · P(x1 | C) · … · P(xn | C). In practice, however, attributes are often dependent on each
other; Bayesian networks, which are graphical models of joint conditional probability
distributions, can describe such dependencies.
Bayesian classifiers are popular because of their simplicity and computational efficiency: they
are fast to train and to evaluate, and they achieve high accuracy on many real-world problems.
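The report applies Naïve Bayes through a set of operators rather than code. As a rough illustration of the rule above, the following minimal Python sketch implements a categorical Naïve Bayes classifier from scratch. It assumes every feature is categorical (numeric scores would first have to be binned) and uses Laplace (add-one) smoothing, which is our choice and not necessarily what the original process used; the class name CategoricalNaiveBayes is hypothetical.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes for categorical features with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength; alpha > 0 avoids zero probabilities

    def fit(self, rows, labels):
        """rows: list of equal-length tuples of category values; labels: class of each row."""
        n_features = len(rows[0])
        self.n = len(labels)
        self.class_counts = Counter(labels)
        # counts[c][j][v] = number of training rows of class c whose feature j equals v
        self.counts = defaultdict(lambda: [Counter() for _ in range(n_features)])
        # distinct values seen for each feature, used in the smoothing denominator
        self.values = [set() for _ in range(n_features)]
        for row, c in zip(rows, labels):
            for j, v in enumerate(row):
                self.counts[c][j][v] += 1
                self.values[j].add(v)
        return self

    def log_posterior(self, row):
        """Unnormalised log P(c | row) for every class c."""
        scores = {}
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / self.n)  # log prior P(c)
            for j, v in enumerate(row):
                # naive assumption: features are independent given the class
                k = len(self.values[j])
                score += math.log((self.counts[c][j][v] + self.alpha) / (n_c + self.alpha * k))
            scores[c] = score
        return scores

    def predict(self, row):
        """Return the class with the highest posterior score."""
        scores = self.log_posterior(row)
        return max(scores, key=scores.get)
```

Summing logarithms instead of multiplying probabilities gives the same ranking of classes while avoiding numerical underflow when there are many features.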
4. Data Set
• Gender: Gender of the student (i.e. male, female)
• Ethnicity: Ethnic group to which the student belongs (i.e. group A, B, C, D, E)
• Parental level of education: Highest education level of the student's parents/guardian (i.e.
high school, bachelor's degree, master's degree, some college, associate's degree)
• Lunch: Type of lunch the student receives at school (i.e. standard, free/reduced)
• Test preparation course: Whether the student took the preparation course (i.e. none,
completed)
• Math score: Mathematics score of the student (from 0 to 100)
• Reading score: Reading score of the student (from 0 to 100)
• Writing score: Writing score of the student (from 0 to 100)
• Student Performance: Overall performance of the student (i.e. Good, Average, Bad,
Worst)
4.1. Exploring the Data Set
First, we import the dataset from the repository and display its first few rows.
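For readers working outside the original tool, a minimal sketch of this import step in plain Python follows. It assumes the CSV file from the Kaggle page listed in the references has been downloaded locally; the function name head is hypothetical, and the column names in the sample are assumptions about the file's header.

```python
import csv
import io
from itertools import islice


def head(csv_file, n=5):
    """Return the first n records of an open CSV file as dicts keyed by the header row."""
    return list(islice(csv.DictReader(csv_file), n))


if __name__ == "__main__":
    # Tiny made-up sample standing in for the real file; the column names are assumed.
    sample = io.StringIO(
        "gender,lunch,math score\n"
        "female,standard,70\n"
        "male,free/reduced,55\n"
    )
    for record in head(sample, n=2):
        print(record)
```

With the real file, open it with open("StudentsPerformance.csv", newline="") and pass the file object to head. Note that csv.DictReader returns every value as a string, so scores must be converted with int or float before any arithmetic.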
4.1.1. General Distribution of Exam Scores
Five features might affect the score in each exam. The first thing to analyse is how the scores
are distributed within each exam (math, reading, and writing). We plot histograms to see whether
the score distributions differ.
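A histogram is simply a count of scores per fixed-width bin. The sketch below computes those counts and prints a text histogram; it assumes numeric scores in the 0 to 100 range stated in the data set description and a bin width of 10, both of which are our own illustrative choices. The function names are hypothetical.

```python
def score_histogram(scores, bin_width=10, low=0, high=100):
    """Count scores in equal-width bins [low, low + w), ..., with the top bin closed at high."""
    n_bins = (high - low) // bin_width
    counts = [0] * n_bins
    for s in scores:
        if not low <= s <= high:
            raise ValueError(f"score {s} outside [{low}, {high}]")
        # a perfect score of 100 would open a new bin, so it is placed in the last one
        idx = min(int((s - low) // bin_width), n_bins - 1)
        counts[idx] += 1
    return counts


def print_text_histogram(counts, bin_width=10, low=0):
    """Print one row of '#' characters per bin."""
    for i, c in enumerate(counts):
        start = low + i * bin_width
        print(f"{start:3d}-{start + bin_width:3d} | {'#' * c}")
```

Running score_histogram separately on the math, reading, and writing columns and printing the three results side by side gives a quick, dependency-free version of the comparison described above.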
The scores appear to be approximately normally (Gaussian) distributed. It is hard to draw further
conclusions from the histograms: they all look very similar, and there is not enough data for the
plots to look smoother.
4.1.2. Exam scores based on the gender
Graphical representation of the exam scores by gender (male and female).
4.1.3. Exam scores based on the Parent Level of Education
The mean exam scores for each parental level of education are displayed as a table or a heat map.
A lower parental level of education appears to be associated with lower exam scores. Children
whose parents' highest level of education is college or high school score noticeably lower than
their peers, whereas children of parents with a master's or bachelor's degree score much better in
the exams.
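The table of mean scores per group is a simple group-by computation. The sketch below assumes each record is a dict keyed by the dataset's column names (the names used here, and the helper name mean_scores_by, are assumptions for illustration). Passing "lunch" or "test preparation course" as group_key gives the same kind of comparison for sections 4.1.4 and 4.1.5.

```python
from collections import defaultdict
from statistics import mean

# Assumed column names for the three exam scores.
SCORE_KEYS = ("math score", "reading score", "writing score")


def mean_scores_by(records, group_key, score_keys=SCORE_KEYS):
    """Return {group value: {score column: mean score}} for the given grouping column."""
    groups = defaultdict(list)
    for r in records:
        groups[r[group_key]].append(r)
    return {
        g: {k: mean(float(r[k]) for r in rows) for k in score_keys}
        for g, rows in groups.items()
    }
```

Each inner dict corresponds to one row of the table (or heat map) of mean scores; the float conversion lets it accept the string values that a CSV reader returns.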
4.1.4. Exam scores based on the Lunch Type
It may seem odd to think that the type of lunch students have is correlated with their exam scores.
However, the dataset distinguishes only two types of lunch, standard and free/reduced, so this
attribute likely reflects the parents' financial situation rather than the food itself. There may
therefore be a correlation here, so we visualize it.
The visualization shows a large gap in exam scores between students who receive a free/reduced
lunch and those who have a standard lunch.
4.1.5. Exam scores based on the Test Preparation Course
Finally, we use a heat map to examine how completing the test preparation course affects the exam
scores. This attribute has only two categories: none and completed.
5. Implementation
The dataset is clean and contains no unwanted data, so no data-cleaning step is needed. We apply
the Naïve Bayes classification technique to the Student Performance dataset. The Naïve Bayes
classifier is a well-known supervised learning method: the model is first trained on labelled
training data and is then used to classify the test data. The dataset has 8 features and 1 label,
Student Performance, which is the class label to be predicted. Because no separate test data is
provided, we split the dataset into training and test sets in a 70:30 ratio.
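The report does not state how rows were assigned to the two sets. A common approach, sketched here under that assumption, is to shuffle the row indices with a fixed seed and cut them at 70%. The function name and the seed value are hypothetical; a stratified split, which keeps the class proportions equal in both parts, would be a reasonable alternative for an imbalanced label.

```python
import random


def train_test_split(rows, labels, train_ratio=0.7, seed=42):
    """Shuffle row indices reproducibly and split them into training and test parts."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # fixed seed makes the split repeatable
    cut = round(train_ratio * len(indices))
    train_idx, test_idx = indices[:cut], indices[cut:]
    return (
        [rows[i] for i in train_idx],
        [labels[i] for i in train_idx],
        [rows[i] for i in test_idx],
        [labels[i] for i in test_idx],
    )
```

Shuffling indices rather than rows keeps each feature row paired with its own label.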
We then train the Naïve Bayes model on 70% of the dataset and classify the remaining 30%. After
that, we measure performance parameters, i.e. accuracy, precision, and recall, to show how
accurate the model is on this dataset.
5.1. Operators
The operators used to create the process are as follows:
• Retrieve
This operator accesses data stored in the repository and loads it into the process.
• Set Role
This operator is used to change the role of one or more attributes.
• Split Data
This operator produces the desired number of subsets of the given data set.
• Naïve Bayes
This operator generates a Naïve Bayes classification model.
• Apply Model
This operator applies a model to the given data set.
• Performance
This operator is used for performance evaluation. It delivers a list of performance criterion
values, which are determined automatically to fit the type of learning task.
6. Results and evaluation/discussion of the results
Confusion Matrix
The result of the process on the Student Performance dataset is shown below in the form of a
confusion matrix. The table shows the accuracy, class precision, and class recall.
The following criteria are reported for this classification task:
• Accuracy
• Precision
• Recall
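Accuracy is the percentage of correct predictions over the total number of examples, where a
prediction is correct if the value of the prediction attribute equals the value of the label
attribute. Class precision is the fraction of examples predicted as a given class that truly belong
to it, and class recall is the fraction of examples of a given class that are predicted as that class.
On the Student Performance dataset, the Naïve Bayes model reaches an accuracy of 92.64%.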
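The accuracy, together with the class precision and class recall shown in the confusion matrix, follows from simple counts. Below is a minimal Python sketch of these measures; it assumes the actual and predicted labels are available as two equal-length lists, and the function names are hypothetical rather than part of the process described above.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Map each (actual, predicted) pair to the number of examples with that pair."""
    return Counter(zip(actual, predicted))


def accuracy(actual, predicted):
    """Fraction of examples whose prediction equals the label."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def class_precision_recall(actual, predicted, cls):
    """Precision and recall for one class; 0.0 when the denominator is empty."""
    cm = confusion_matrix(actual, predicted)
    tp = cm[(cls, cls)]
    predicted_as_cls = sum(n for (a, p), n in cm.items() if p == cls)
    actually_cls = sum(n for (a, p), n in cm.items() if a == cls)
    precision = tp / predicted_as_cls if predicted_as_cls else 0.0
    recall = tp / actually_cls if actually_cls else 0.0
    return precision, recall
```

Because the label has four classes (Good, Average, Bad, Worst), precision and recall are computed per class, one column and one row of the confusion matrix at a time.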
7. Future directions/ideas how to extend and enhance the technique
The process or model can be used to predict more about student performance and the factors
involved in it.
In the future, a similar process could be implemented in any university to predict a student's GPA
in advance from their previous GPA.
In schools, it could be used to identify the students predicted to perform worst, so that teachers
can focus more on those students.
8. Conclusion
The insights gained from the data are summarized below:
• 135 students failed mathematics, 90 students failed reading, 114 students failed writing,
and overall 103 students failed the examination.
• Reading and writing scores are strongly positively (linearly) correlated, with a correlation
coefficient of approximately 0.95.
• Students in ethnicity group D performed very well.
• The test preparation course appears to be very effective: fewer of the students who had
completed it failed.
• Students who have a standard lunch performed better than the others.
• Regarding parental level of education, children of parents with a master's or bachelor's
degree score much better in the exams.
• The Naïve Bayes classifier process achieves an accuracy of 92.64% on the Student Performance
data set.
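The correlation coefficient of about 0.95 between reading and writing scores is, presumably, Pearson's linear correlation coefficient, r = Σ(xi − x̄)(yi − ȳ) / √(Σ(xi − x̄)² · Σ(yi − ȳ)²). A minimal stdlib sketch of that formula follows; the function name pearson_r is hypothetical, and the values in the usage note are illustrative only, not data from this study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # covariance numerator and the two spread terms of the formula
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / (sx * sy)
```

For example, pearson_r([1, 2, 3], [1, 3, 2]) gives 0.5 (up to floating-point rounding); a value near 1, as reported for reading and writing, means the two scores rise together almost linearly.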
9. References
Students Performance in Exams dataset, Kaggle. https://www.kaggle.com/spscientist/students-performance-in-exams#StudentsPerformance.csv

 
How Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptxHow Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptx
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
 

Student Performance Data Mining Project Report

  • 1. PROJECT REPORT
    STUDENT PERFORMANCE (DATA MINING)
    BS (SE) 2017
    GROUP MEMBER(S):
    NAME: HAFSA HABIB 2017/COMP/BS(SE)-21597
    NAME: MUNIBA JAVIAD 2017/COMP/BS(SE)-21621
    SUPERVISOR: MISS SADIA JAVED
    29TH APRIL, 2019
    DEPARTMENT OF COMPUTER SCIENCE AND INFORMATION TECHNOLOGY
    JINNAH UNIVERSITY FOR WOMEN
    5-C NAZIMABAD, KARACHI 74600
  • 2. Table of Contents
    1. Introduction (p. 3)
    2. Description of the problem and problem domain (p. 3)
    3. Description of implemented data mining techniques/methods (p. 3)
       3.1. Naïve Bayes Classifier (p. 3)
    4. Data Set (p. 3)
       4.1. Exploring the Data Set (p. 4)
          4.1.1. General Distribution of Exam Scores (p. 4)
          4.1.2. Exam scores based on the gender (p. 5)
          4.1.3. Exam scores based on the Parent Level of Education (p. 6)
          4.1.4. Exam scores based on the Lunch Type (p. 7)
          4.1.5. Exam scores based on the Test Preparation Course (p. 7)
    5. Implementation (p. 8)
       5.1. Operators (p. 8)
    6. Results and evaluation/discussion of the results (p. 9)
    7. Future directions/ideas on how to extend and enhance the technique (p. 10)
    8. Conclusion (p. 10)
    9. References (p. 10)
  • 3. 1. Introduction
    Using the Students Performance in Exams dataset, we try to understand what affects students' exam scores. The data is limited, but it lends itself to visualizations that make the relationships easy to spot. First, we explore the data; then we apply the Naïve Bayes classification technique for evaluation.
    2. Description of the problem and problem domain
    The aim is to understand how the parents' background, test preparation, etc. influence students' performance.
    Objectives:
    - Check the dataset and tidy the data if needed.
    - Visualize the data to understand the effects of different factors on student performance.
    - Check the effectiveness of the test preparation course.
    - Identify the major factors influencing the test scores.
    3. Description of implemented data mining techniques/methods
    3.1. Naïve Bayes Classifier
    Bayesian classifiers are statistical classifiers that predict class membership probabilities, such as the probability that a given sample belongs to a particular class. Naïve Bayes algorithms assume that the effect of an attribute on a given class is independent of the values of the other attributes. In practice, however, dependencies often exist among attributes; Bayesian networks, which are graphical models that can describe joint conditional probability distributions, can represent such dependencies. Bayesian classifiers are popular classification algorithms because of their simplicity, computational efficiency and very good performance on real-world problems. Another important advantage is that Bayesian models are fast to train and to evaluate, and they achieve high accuracy in many domains.
    4. Data Set
    - Gender: gender of the student (male, female)
    - Ethnicity: ethnic group to which the student belongs (group A, B, C, D, E)
    - Parent level of education: education level of the student's parents/guardian (high school, bachelor's degree, master's degree, some college, associate's degree)
    - Lunch: type of lunch provided to the student at school (standard, free/reduced)
    - Test preparation course: whether the student completed the preparation course (none, completed)
    - Math score: mathematics score of the student (0 to 100)
    - Reading score: reading score of the student (0 to 100)
    - Writing score: writing score of the student (0 to 100)
    - Student Performance: overall performance of the student (Good, Average, Bad, Worst)
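To make the independence assumption above concrete, here is a minimal from-scratch sketch of a categorical Naïve Bayes classifier in Python. It is not the tool the report used; the function names, the Laplace smoothing parameter `alpha` and the toy rows are illustrative assumptions. For each class C the sketch computes log P(C) + Σ log P(x_i | C) from counts and picks the class with the largest score.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    seen_values = defaultdict(set)       # attribute index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            seen_values[i].add(value)
    return class_counts, value_counts, seen_values


def predict(model, row, alpha=1.0):
    """Return the class maximising log P(C) + sum_i log P(x_i | C)."""
    class_counts, value_counts, seen_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_label in class_counts.items():
        score = math.log(n_label / total)  # log prior P(C)
        for i, value in enumerate(row):
            # Attributes are treated as independent given the class (the "naive" assumption);
            # Laplace smoothing (alpha) keeps unseen values from producing log(0).
            count = value_counts[(i, label)][value]
            k = len(seen_values[i])
            score += math.log((count + alpha) / (n_label + alpha * k))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy rows: (test preparation course, lunch) -> Student Performance
rows = [("completed", "standard"), ("completed", "standard"),
        ("none", "free/reduced"), ("none", "free/reduced"), ("none", "standard")]
labels = ["Good", "Good", "Bad", "Bad", "Good"]
model = train_naive_bayes(rows, labels)
print(predict(model, ("completed", "standard")))  # Good
print(predict(model, ("none", "free/reduced")))   # Bad
```

The three score attributes are numeric, so a count-based sketch like this would need them binned first, or a per-class Gaussian likelihood instead of counts.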
  • 4. 4.1. Exploring the Data Set
    First, we import the dataset from the repository and display its first few rows.
    4.1.1. General Distribution of Exam Scores
    There are 5 features that might affect the score in each exam. The first thing to analyse is how the scores are distributed within each exam (math, reading and writing). We plot histograms to see whether there are any differences in the score distributions.
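A minimal stdlib sketch of the loading and histogram step, assuming the CSV column is named `math score` (as in the Kaggle file header) and using 10-point bins; the helper names and the three-row excerpt are hypothetical. It prints a text histogram rather than a plotted one.

```python
import csv
import io
from collections import Counter


def load_column(fileobj, column):
    """Read one integer column from a CSV file object that has a header row."""
    return [int(row[column]) for row in csv.DictReader(fileobj)]


def score_histogram(scores, bin_width=10):
    """Count scores in bins [0, 10), [10, 20), ...; a score of 100 goes in the last bin."""
    counts = Counter()
    for s in scores:
        if not 0 <= s <= 100:
            raise ValueError(f"score out of range: {s}")
        start = min(s // bin_width * bin_width, 100 - bin_width)
        counts[start] += 1
    return dict(sorted(counts.items()))


# Hypothetical three-row excerpt standing in for the real CSV file
sample = io.StringIO("math score,reading score\n72,72\n69,90\n100,95\n")
math_scores = load_column(sample, "math score")
for start, n in score_histogram(math_scores).items():
    print(f"{start:3d}-{start + 10:3d} | {'#' * n}")
```

With the real file, passing `open("StudentsPerformance.csv", newline="")` instead of the `StringIO` excerpt should work the same way.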
  • 5. The scores appear to follow a roughly Gaussian distribution. It is hard to draw any conclusion from the graphs above: they all look very similar, and we don't have enough data for the plots to look smoother.
    4.1.2. Exam scores based on the gender
    Graphical representation of the exam scores by gender (male, female).
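One simple number behind a gender comparison is the difference of mean scores per subject. This is a minimal sketch assuming rows are dicts keyed like the Kaggle CSV header, with gender values `female` and `male`; the function name and rows are hypothetical.

```python
from statistics import mean

SUBJECTS = ("math score", "reading score", "writing score")


def gender_gap(rows, key="gender"):
    """Mean score per subject for female students minus that for male students."""
    female = [r for r in rows if r[key] == "female"]
    male = [r for r in rows if r[key] == "male"]
    if not female or not male:
        raise ValueError("both groups need at least one student")
    return {s: mean(r[s] for r in female) - mean(r[s] for r in male) for s in SUBJECTS}


# Hypothetical rows; positive values mean female students scored higher on average
rows = [
    {"gender": "female", "math score": 60, "reading score": 80, "writing score": 78},
    {"gender": "female", "math score": 70, "reading score": 84, "writing score": 82},
    {"gender": "male", "math score": 72, "reading score": 70, "writing score": 66},
]
print(gender_gap(rows))
```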
  • 6. 4.1.3. Exam scores based on the Parent Level of Education
    The mean scores for each parental education level are displayed as a table or a heat map. Indeed, it seems that a lower parental level of education has a negative impact on exam scores: children whose parents' highest education level is college or high school have noticeably lower exam scores than their peers. Conversely, children of parents with a master's or bachelor's degree score much better in the exams.
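The table of means can be sketched as below; drawing the heat map itself would need a plotting library, so this only produces the numbers behind it. The column names follow the Kaggle CSV header (an assumption), and the function name and rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean

SUBJECTS = ("math score", "reading score", "writing score")


def education_table(rows, key="parental level of education"):
    """Rows of (level, math, reading, writing, overall), best overall mean first."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    table = []
    for level, members in groups.items():
        means = [mean(float(r[s]) for r in members) for s in SUBJECTS]
        table.append((level, *means, mean(means)))
    return sorted(table, key=lambda t: t[-1], reverse=True)


rows = [  # hypothetical rows
    {"parental level of education": "high school", "math score": 60, "reading score": 62, "writing score": 58},
    {"parental level of education": "master's degree", "math score": 80, "reading score": 85, "writing score": 90},
]
table = education_table(rows)
for level, m, r, w, overall in table:
    print(f"{level:<20} {m:6.1f} {r:6.1f} {w:6.1f} {overall:6.1f}")
```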
  • 7. 4.1.4. Exam scores based on the Lunch Type
    It might seem amusing to think that the type of lunch students have is correlated with their exam scores. However, the dataset has only two lunch types, standard and free/reduced, so this variable reflects the parents' financial situation rather than the type of dish. There might be some correlation here, so let's try to visualize it. According to the visualization above, there is a large disparity in exam scores between students who have a free/reduced lunch and those who have a standard lunch.
    4.1.5. Exam scores based on the Test Preparation Course
    The last thing we explore in this dataset is how completing the test preparation course affects the exam scores, using a heat map. The variable has only two categories: none and completed.
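A heat map over two categorical variables is essentially a cross-tabulation of mean scores. This minimal sketch crosses lunch type with test preparation for the math score; the column names follow the Kaggle CSV header (an assumption), and the function name and rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def crosstab_mean(rows, row_key, col_key, value_key):
    """Nested dict {row value: {column value: mean of value_key}}: the cells of a heat map."""
    cells = defaultdict(list)
    for r in rows:
        cells[(r[row_key], r[col_key])].append(float(r[value_key]))
    table = defaultdict(dict)
    for (row_value, col_value), values in sorted(cells.items()):
        table[row_value][col_value] = mean(values)
    return dict(table)


rows = [  # hypothetical rows
    {"lunch": "standard", "test preparation course": "completed", "math score": 78},
    {"lunch": "standard", "test preparation course": "none", "math score": 68},
    {"lunch": "free/reduced", "test preparation course": "completed", "math score": 64},
    {"lunch": "free/reduced", "test preparation course": "none", "math score": 56},
    {"lunch": "free/reduced", "test preparation course": "none", "math score": 60},
]
table = crosstab_mean(rows, "lunch", "test preparation course", "math score")
print(table)
```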
  • 8. 5. Implementation
    The dataset is clean and free of unwanted data, so no data-cleaning step is needed. We apply the Naïve Bayes classification technique to our Student Performance dataset. The Naïve Bayes classifier is a well-known supervised learning method: a model is first trained on labelled training data and is then used to classify the test data. The dataset has 8 features and 1 label, Student Performance, which is the class to be predicted. Because no separate test set is provided, we split the dataset into training and testing sets in a 70:30 ratio: the Naïve Bayes model is trained on 70% of the data and then classifies the remaining 30%. Finally, we measure accuracy, precision and recall to evaluate how accurate the model is on this dataset.
    5.1. Operators
    The operators used to build the process are as follows:
    - Retrieve: accesses stored information in the repository and loads it into the process.
    - Set Role: changes the role of one or more attributes.
    - Split Data: produces the desired number of subsets of the given dataset.
    - Naïve Bayes: generates a Naïve Bayes classification model.
    - Apply Model: applies a model to the given dataset.
    - Performance: evaluates performance and delivers a list of performance criteria values, which are automatically chosen to fit the learning task type.
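The Split Data step with a 70:30 ratio can be sketched in plain Python as below. The report does not say how rows are sampled, so the random shuffle and the fixed `seed` are assumptions; the function name is hypothetical. A fixed seed makes the split reproducible across runs.

```python
import random


def train_test_split(rows, train_ratio=0.7, seed=42):
    """Shuffle a copy of the rows and split it train_ratio : (1 - train_ratio)."""
    if not 0 < train_ratio < 1:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


train, test = train_test_split(range(1000))
print(len(train), len(test))  # 700 300
```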
  • 9. 6. Results and evaluation/discussion of the results
    Confusion Matrix
    The result of running the process on the "Student Performance" dataset is shown below as a confusion matrix. The table shows the accuracy, class precision and class recall. The following criteria are reported for this classification task:
    - Accuracy
    - Precision
    - Recall
    Accuracy is the percentage of correct predictions over the total number of examples, where a correct prediction is an example whose predicted value equals the value of the label attribute. The accuracy on the Student Performance dataset is 92.64%.
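Accuracy, class precision and class recall can all be read off a confusion matrix. This sketch assumes rows are true classes and columns are predicted classes (tools differ, and some transpose this); the function name and the 3-class matrix are hypothetical and are not the report's results.

```python
def metrics_from_confusion(matrix, classes):
    """Accuracy plus per-class precision and recall.

    Assumed layout: matrix[i][j] counts examples whose true class is classes[i]
    and whose predicted class is classes[j].
    """
    n = len(classes)
    total = sum(sum(row) for row in matrix)
    accuracy = sum(matrix[i][i] for i in range(n)) / total
    precision, recall = {}, {}
    for k, name in enumerate(classes):
        predicted_k = sum(matrix[i][k] for i in range(n))  # column total
        actual_k = sum(matrix[k])                           # row total
        precision[name] = matrix[k][k] / predicted_k if predicted_k else 0.0
        recall[name] = matrix[k][k] / actual_k if actual_k else 0.0
    return accuracy, precision, recall


# Hypothetical 3-class matrix (not the report's results)
classes = ["Good", "Average", "Bad"]
matrix = [[5, 1, 0],
          [2, 4, 0],
          [0, 1, 7]]
acc, prec, rec = metrics_from_confusion(matrix, classes)
print(f"accuracy = {acc:.2%}")  # accuracy = 80.00%
```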
  • 10. 7. Future directions/ideas on how to extend and enhance the technique
    Using this process or model, we can learn more about student performance and the factors involved. In the future, the process could be applied in any university to predict a student's GPA in advance from their previous GPA. In schools, it could be used to identify the students predicted to perform worst, so that teachers can focus more on those students.
    8. Conclusion
    We have already seen the insights from the data; they are summarized below:
    - 135 students failed in mathematics, 90 students failed the reading examination, 114 students failed the writing examination, and overall 103 students failed the examination.
    - Reading and writing scores are positively linearly correlated, with a correlation coefficient of approximately 0.95.
    - Students in ethnic group D performed very well.
    - The test preparation course is very effective: fewer of the students who completed it failed.
    - Students who have a standard lunch performed better than others.
    - Regarding parental education level, children of parents with a master's or bachelor's degree score much better in the exams.
    - The Naïve Bayes classifier process achieved an accuracy of 92.64% on the Student Performance dataset.
    9. References
    https://www.kaggle.com/spscientist/students-performance-in-exams#StudentsPerformance.csv
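Two of the summary figures, the reading/writing correlation and the failure counts, come from simple computations, sketched below. The report does not state its pass mark, so `pass_mark` is a required, hypothetical parameter, and the scores are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def count_failed(scores, pass_mark):
    """Number of scores strictly below pass_mark (the report does not state its pass mark)."""
    return sum(1 for s in scores if s < pass_mark)


reading = [72, 90, 95, 57, 78]  # hypothetical scores
writing = [74, 88, 93, 44, 75]
print(round(pearson(reading, writing), 3))
print(count_failed(writing, pass_mark=50))  # 1
```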