15th November, 2017
BANA7041 HW: STATISTICAL
METHODS - MODULE II
SECTION 001
PART A WRITTEN/ PART B COMPUTATIONAL
ASSIGNED GROUP NUMBER: 2
GROUP MEMBER NAMES:
Datta, Sourapratim (M12399768)
Kalra, Ravish (M12382149)
Popuri, Venkata Sai Lakshmi Srikanth (M12388241)
Alumni Donation Case Study
INTRODUCTION
This study examines a constituent database of U.S. universities to identify the strongest
predictors of the alumni giving rate at private and public institutions. The analysis was done
largely in R and SAS. After several modifications to the model, our best result achieves an R²
of 0.719, which outperforms the baseline model (R² = 0.5747).
DATASET OVERVIEW
The Alumni dataset describes the alumni giving rate of universities in the US. It contains 48
observations, each with the following variables:
• School: The name of the institution.
• Student/Faculty Ratio: The number of students attending the institution divided by the
number of faculty.
• % of Classes Under 20: The percentage of classes with fewer than 20 students.
• Private: A categorical variable indicating whether the institution is privately owned.
• Alumni Giving Rate: The percentage of alumni who donate to the institution.
To improve the model's predictive power, additional data was collected from:
vault.hanover.edu/~dodge/Statistics/DownloadData/Alumni%20Giving.xls
The additional predictor used in the model is Graduation Rate: the average graduation rate of
students at the institution over the given period.
Distribution of the variables:
Figures 1 and 2 show the distributions of the variables used in the model.
Figure 1 - Distribution of variables
Figure 2 - Distribution of variables
Relation between the variables
Figure 3 shows the scatter plots between pairs of variables.
Figure 3 - Scatter plot of the variables
                        Graduation Rate   % of Classes Under 20   Student/Faculty Ratio   Alumni Giving Rate
Graduation Rate               1.0000000               0.5827884              -0.6049379            0.7559436
% of Classes Under 20         0.5827884               1.0000000              -0.7855593            0.6456504
Student/Faculty Ratio        -0.6049379              -0.7855593               1.0000000           -0.7423975
Alumni Giving Rate            0.7559436               0.6456504              -0.7423975            1.0000000
Table 1 - Correlation between the variables
The scatter plot suggests a positive association between Graduation Rate and Alumni Giving Rate.
Table 1 confirms this: the correlation between the two variables is 0.756, the largest in
magnitude of any predictor with the response.
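The entries of Table 1 are Pearson correlation coefficients. As a minimal sketch of how one such entry is computed, the snippet below uses only the Python standard library. The five data points are hypothetical and are not taken from the Alumni dataset; the function name `pearson_r` is likewise an illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical values for five schools (NOT from the Alumni dataset).
graduation_rate = [70, 75, 80, 85, 90]
giving_rate = [15, 22, 20, 30, 38]
r = pearson_r(graduation_rate, giving_rate)
print(f"r = {r:.3f}")
```

A value close to +1, as here, indicates a strong positive linear association; the 0.756 in Table 1 is read the same way.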
FITTING LINEAR REGRESSION MODEL
Base Model
A multiple linear regression using the Alumni Donation Data was conducted.
• The Alumni Giving Rate served as the response variable (Y)
Three predictor variables were used in the model:
• Percentage of classes with fewer than 20 students (X1)
• Student/faculty Ratio (X2)
• Private (X3)
The initial estimated model is given as
𝒀̂ = 36.784 + 0.077X1 – 1.398X2 + 6.285X3
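As an illustration of how the base model is used for prediction, the sketch below plugs values into the estimated equation. The coefficients are copied from the equation above; the example school (50% of classes under 20, a student/faculty ratio of 10, private) is hypothetical, and the coding Private = 1 for private institutions is an assumption.

```python
# Coefficients copied from the base model equation in the text.
BASE_COEF = {"intercept": 36.784, "x1": 0.077, "x2": -1.398, "x3": 6.285}

def predict_base(pct_classes_under_20, student_faculty_ratio, private):
    """Predicted Alumni Giving Rate under the base model.

    `private` is assumed to be coded 1 for private and 0 for public.
    """
    return (BASE_COEF["intercept"]
            + BASE_COEF["x1"] * pct_classes_under_20
            + BASE_COEF["x2"] * student_faculty_ratio
            + BASE_COEF["x3"] * private)

# Hypothetical school, not an observation from the dataset.
y_hat = predict_base(pct_classes_under_20=50, student_faculty_ratio=10, private=1)
print(f"predicted giving rate: {y_hat:.3f}")
```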
Improved model
To test whether Graduation Rate (X4) adds to the model, we performed a partial F-test comparing
the improved model with the base model. We obtained an F-statistic of 113.2 with a p-value of
2.666e-05. The null hypothesis is β4 = 0, where β4 is the coefficient of X4. At a significance
level of 0.05, we reject the null hypothesis.
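For a single added predictor, the partial F-statistic can also be recovered from the two R² values reported in the Conclusion. The sketch below assumes n = 48 observations and 5 parameters in the full model (intercept plus four predictors); the function name is an illustration. Computed this way the statistic comes to about 22.1, which differs from the 113.2 quoted above. The source does not show how the quoted value was obtained, so this should be read as a cross-check only.

```python
def partial_f(r2_reduced, r2_full, n, p_full, q):
    """Partial F statistic for adding q predictors to a nested linear model.

    p_full counts all parameters of the full model, including the intercept.
    """
    numerator = (r2_full - r2_reduced) / q
    denominator = (1.0 - r2_full) / (n - p_full)
    return numerator / denominator

# R² values as reported in the Conclusion; n and p_full are assumptions
# based on 48 observations and four predictors plus an intercept.
f_stat = partial_f(r2_reduced=0.5747, r2_full=0.7191, n=48, p_full=5, q=1)
print(f"partial F on (1, 43) degrees of freedom: {f_stat:.2f}")
```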
The improved model is thus given by:
𝒀̂ = -25.549 – 0.0809X1 – 0.815X2 + 7.555X3 + 0.7652X4
This model suggests the following, in each case holding the other predictors fixed:
• With every one-point increase in the percentage of classes with under 20 students, the
Alumni Giving Rate decreases by 0.0809 percentage points.
• With every unit increase in the Student/Faculty Ratio, the Alumni Giving Rate decreases by
0.815 percentage points.
• Private institutions have an Alumni Giving Rate 7.555 percentage points higher than public
institutions.
• With every one-point increase in Graduation Rate, the Alumni Giving Rate increases by
0.7652 percentage points.
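The coefficient on Private is an additive shift, not a multiplier. A short sketch makes this concrete by predicting for two otherwise identical schools, one private and one public. The coefficients are copied from the improved model above; the school characteristics are hypothetical, and Private = 1 for private institutions is an assumption.

```python
# Coefficients copied from the improved model equation in the text.
COEF = {"intercept": -25.549, "x1": -0.0809, "x2": -0.815, "x3": 7.555, "x4": 0.7652}

def predict_improved(pct_classes_under_20, student_faculty_ratio, private, graduation_rate):
    """Predicted Alumni Giving Rate under the improved model (private assumed 0/1)."""
    return (COEF["intercept"]
            + COEF["x1"] * pct_classes_under_20
            + COEF["x2"] * student_faculty_ratio
            + COEF["x3"] * private
            + COEF["x4"] * graduation_rate)

# Two hypothetical schools that differ only in ownership.
public = predict_improved(50, 10, private=0, graduation_rate=85)
private = predict_improved(50, 10, private=1, graduation_rate=85)
print(f"public: {public:.3f}, private: {private:.3f}, gap: {private - public:.3f}")
```

The gap between the two predictions equals the Private coefficient regardless of the values chosen for the other predictors.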
MODEL DIAGNOSTICS
Diagnostics were performed on the residuals of the improved model. Based on the plots in
Figures 4 and 5, the following conclusions can be made:
• Residuals vs. fitted (Figure 4, top left): apart from points 21 and 33, the residuals appear
to have constant variance. No quadratic pattern is visible, indicating that linearity of the
regression function holds.
• Normal Q-Q plot (Figure 4, top right): the residuals depart from normality towards both ends
of the line, and point 21 lies notably above it, but overall the residuals appear approximately
normally distributed. Compared with the previous homework, the residuals from this model
appear closer to normal.
• Outliers: point 21, with a value of 21.88, is an outlier (see Figure 5).
• Influential points: points 9, 21, 26 and 43 were found to be influential (see Table 2).
• Leverage: a point is categorized as a high leverage point if the value of hii associated with
the observation is greater than 2p/n, where p is the number of parameters and n is the sample
size. For this model, 2p/n = 2(5)/48 = 0.208. Based on this threshold, two observations are
high leverage points: observations 1 and 43.
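The 2p/n leverage rule described above can be sketched in a few lines. The cutoff uses p = 5 and n = 48 as in the text; the hat values in the example are hypothetical and are not the leverages of the fitted model.

```python
def leverage_cutoff(p, n):
    """Rule-of-thumb cutoff 2p/n for flagging high-leverage observations."""
    return 2 * p / n

def high_leverage(hat_values, p, n):
    """Return the observation labels whose hat value exceeds 2p/n."""
    cutoff = leverage_cutoff(p, n)
    return [label for label, h in hat_values.items() if h > cutoff]

cutoff = leverage_cutoff(p=5, n=48)

# Hypothetical hat values keyed by observation number (NOT from the model).
example_hats = {1: 0.25, 2: 0.10, 3: 0.21}
flagged = high_leverage(example_hats, p=5, n=48)
print(f"cutoff = {cutoff:.3f}, flagged = {flagged}")
```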
Figure 4 - Model Diagnostics
Figure 5 - Residual Boxplot
Influence measures of
lm(formula = y ~ x1 + x2 + x3 + x4) :
     dfb.1_   dfb.x1   dfb.x2   dfb.x3   dfb.x4   dffit   cov.r     cook.d    hat   inf
9      0.01    -0.01    -0.01    -0.01    -0.01   -0.01    1.35   4.12e-05   0.17    *
21    -0.05    -0.50    -0.03    -0.35     0.39    1.03    0.36   1.71e-01   0.08    *
26     0.04    -0.03    -0.07     0.06    -0.01   -0.10    1.37   2.00e-03   0.19    *
43    -0.22     0.21     0.07     0.14     0.12   -0.26    1.41   1.42e-02   0.20    *
Table 2 - Influence and Leverage Points
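To see which measure is likely to have triggered each asterisk in Table 2, the rows can be compared against size-adjusted cutoffs. The cutoffs below (|DFBETAS| > 1, |DFFITS| > 3·sqrt(k/(n−k)), |1 − COVRATIO| > 3k/(n−k), hat > 3k/n, with k = 5 and n = 48) are an assumption about the conventions behind the output, not something stated in the source. The Cook's distance check is omitted because it needs an F-distribution quantile. The inputs are the rounded values printed in Table 2, with the first column read as the intercept, so borderline cases could differ from the original run.

```python
from math import sqrt

def influence_flags(row, k, n):
    """Names of the (assumed) size-adjusted cutoffs that a table row exceeds."""
    flags = []
    if any(abs(v) > 1 for v in row["dfbetas"]):
        flags.append("dfbeta")
    if abs(row["dffit"]) > 3 * sqrt(k / (n - k)):
        flags.append("dffit")
    if abs(1 - row["cov_r"]) > 3 * k / (n - k):
        flags.append("cov.r")
    if row["hat"] > 3 * k / n:
        flags.append("hat")
    return flags

# Rounded values as printed in Table 2 (intercept DFBETAS first).
table2 = {
    9:  {"dfbetas": [0.01, -0.01, -0.01, -0.01, -0.01], "dffit": -0.01, "cov_r": 1.35, "hat": 0.17},
    21: {"dfbetas": [-0.05, -0.50, -0.03, -0.35, 0.39], "dffit": 1.03, "cov_r": 0.36, "hat": 0.08},
    26: {"dfbetas": [0.04, -0.03, -0.07, 0.06, -0.01], "dffit": -0.10, "cov_r": 1.37, "hat": 0.19},
    43: {"dfbetas": [-0.22, 0.21, 0.07, 0.14, 0.12], "dffit": -0.26, "cov_r": 1.41, "hat": 0.20},
}
flags = {obs: influence_flags(row, k=5, n=48) for obs, row in table2.items()}
for obs, names in flags.items():
    print(obs, names)
```

Under these assumed cutoffs, all four observations are flagged through the covariance ratio, and observation 21 additionally through DFFITS.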
CONCLUSION
Evaluation Metrics
R², a common measure of regression fit that gives the proportion of the variation in the
response explained by the model, has been used to evaluate the model.
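As a reminder of what the metric measures, R² equals 1 − SSE/SST, where SSE is the sum of squared residuals and SST is the total sum of squares about the mean of the response. The sketch below computes it for four hypothetical observed and fitted values, which are not taken from the Alumni data.

```python
def r_squared(observed, fitted):
    """Coefficient of determination: 1 - SSE/SST."""
    mean_y = sum(observed) / len(observed)
    sse = sum((y - f) ** 2 for y, f in zip(observed, fitted))  # unexplained variation
    sst = sum((y - mean_y) ** 2 for y in observed)             # total variation
    return 1.0 - sse / sst

# Hypothetical observed and fitted giving rates (NOT from the dataset).
observed = [10, 20, 30, 40]
fitted = [12, 18, 33, 37]
r2 = r_squared(observed, fitted)
print(f"R^2 = {r2:.3f}")
```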
Result
Two models with different predictor variables have been compared based on R².
• Base Model: This model, with 3 predictor variables, gave a Multiple R² of 0.5747 and an
Adjusted R² of 0.5457.
• Improved Model: This model, with Graduation Rate as an additional predictor, gave a Multiple
R² of 0.7191 and an Adjusted R² of 0.693. Adding Graduation Rate to the linear regression model
therefore improves both the Multiple and the Adjusted R².
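The Adjusted R² values follow from the Multiple R² values through the standard formula 1 − (1 − R²)(n − 1)/(n − k − 1), where k is the number of predictors. The short sketch below, assuming n = 48 observations, reproduces both reported figures; the function name is an illustration.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R² for a model with k predictors (plus intercept) and n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Multiple R² values as reported above; n = 48 is the dataset size.
base_adj = adjusted_r2(0.5747, n=48, k=3)
improved_adj = adjusted_r2(0.7191, n=48, k=4)
print(f"base: {base_adj:.4f}, improved: {improved_adj:.4f}")
```

Because the adjustment penalizes the extra predictor, the rise in Adjusted R² from the base to the improved model shows that the gain is not just an artifact of adding a variable.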
