Group Project
Analyses of Athlete’s BMI and hemoglobin level
Introduction
The purpose of this report is to give an overall understanding of the available data, the
type and description of the data and present analyses on the data. The dataset was collected in
a study on how the characteristics of the athletes’ body varied with sport and sex. The data is
available within the DAAG package in R as `ais`.
Data Description
The original data contained 202 observations and 13 variables. Five columns that were
not needed for the analyses were removed, reducing the dataset to 202 observations and 8
variables. The table below lists the variables retained for the analyses.
| Variable Name | Data Type | Description |
|---|---|---|
| Rcc | Numeric | Red blood cell count |
| Hg | Numeric | Hemoglobin concentration, in g per deciliter |
| BMI | Numeric | Body mass index, kg/m^2 |
| Lbm | Numeric | Lean body mass, kg |
| Ht | Numeric | Height, cm |
| wt | Numeric | Weight, kg |
| Sex | Factor | A factor with levels f for female, m for male |
| Sport | Factor | A factor with 10 sports |
Data Cleaning
The dataset does not contain any missing values. Five columns were removed because they
were not used in the analyses.
Data Summary
A summary of the variables in the dataset is shown in Figure 1.
Figure 1: Summary of variables in the dataset
• There are 6 numeric variables and 2 factor variables (sex and sport).
Data Analysis
1.0 Initial exploration of data
To get a feel for the data, we first look at the two variables of interest: BMI (body mass
index) and HG (blood hemoglobin concentration).
Figure 3: Histogram of BMI
Figures 2 and 3 show the histograms of HG and BMI. Later in the report we use hypothesis
tests to check whether these distributions are consistent with normality.
2.0 Empirical CDF
The empirical CDF (ECDF) is a nonparametric estimate of the underlying CDF of a random
variable: its value at x is the fraction of observations less than or equal to x. Plotting it
shows how quickly the CDF rises to 1.
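The definition can be sketched in a few lines. The report's analysis was done in R; the Python below is only an illustration, and `make_ecdf` and the toy values are hypothetical names and data, not part of the original work.

```python
from bisect import bisect_right


def make_ecdf(sample):
    """Return a function F with F(x) = fraction of observations <= x."""
    ordered = sorted(sample)
    n = len(ordered)
    if n == 0:
        raise ValueError("sample must not be empty")

    def ecdf(x):
        # bisect_right counts how many sorted values are <= x
        return bisect_right(ordered, x) / n

    return ecdf


# Hypothetical BMI-like values, NOT taken from the ais dataset
toy_bmi = [19.8, 21.4, 22.0, 22.9, 23.5, 24.1, 25.3, 26.7]
F = make_ecdf(toy_bmi)
print(F(22.0))  # fraction of toy values at or below 22.0
print(F(30.0))  # the ECDF equals 1 beyond the largest observation
```

The ECDF is a step function that jumps by 1/n at each observation, which is why a steep section of the plot marks a region where the data are concentrated.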
Figure 5: ECDF of the normal distribution of BMI
Figure 4 shows the ECDF of the sample BMI. For comparison, Figure 5 shows the ECDF of a
normal distribution with the mean and standard deviation of BMI. The sample ECDF is close
to the normal one, although it rises to 1 somewhat more quickly.
Figure 2: Histogram of hemoglobin level
Figure 4: ECDF of the sample BMI
The ECDF of the blood hemoglobin level is plotted alongside that of a normal distribution.
Figure 7: ECDF of the normal with mean and sd of hg
The ECDF of HG is close to normal, as shown in Figures 6 and 7.
3.0 BMI Categorization and Confidence interval estimation of BMI mean
Categorical proportions for the BMI variable are calculated using the ecdf function, based
on the following ranges of BMI values:
• Underweight: BMI less than 18.5
• Normal: BMI between 18.5 and 24.9
• Overweight: BMI between 25 and 30
• Obese: BMI more than 30
These proportions describe the athletes from all sports taken together.
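As a rough sketch of how category proportions follow from the ECDF, the Python below takes differences of the ECDF at the cut points. It assumes half-open bins with cuts at 18.5, 25 and 30 so that the four proportions sum to 1; the function name and the toy values are hypothetical and do not come from the `ais` data.

```python
from bisect import bisect_left, bisect_right


def bmi_category_proportions(bmi_values):
    """Proportion of values in each BMI category, via ECDF differences."""
    ordered = sorted(bmi_values)
    n = len(ordered)
    if n == 0:
        raise ValueError("bmi_values must not be empty")

    def below(x):
        # P(BMI < x): count of values strictly less than x
        return bisect_left(ordered, x) / n

    def at_or_below(x):
        # P(BMI <= x): the usual ECDF
        return bisect_right(ordered, x) / n

    return {
        "underweight": below(18.5),                       # BMI < 18.5
        "normal": below(25.0) - below(18.5),              # 18.5 <= BMI < 25
        "overweight": at_or_below(30.0) - below(25.0),    # 25 <= BMI <= 30
        "obese": 1.0 - at_or_below(30.0),                 # BMI > 30
    }


# Hypothetical values for illustration only
toy_bmi = [17.9, 19.8, 21.4, 22.0, 22.9, 24.95, 25.3, 26.7, 30.0, 31.2]
for category, share in bmi_category_proportions(toy_bmi).items():
    print(f"{category}: {share:.2f}")
```

Placing the cut at 25 rather than 24.9 is a choice made here so that a value such as 24.95 is not left uncategorized.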
We are now interested in a confidence interval for the mean BMI of all the athletes.
We compute it with both the non-parametric and the parametric bootstrap, so that the two
approaches can be compared.
Figure 6: ECDF of sample hemoglobin concentration
Non-Parametric Model approach:
• Mean: 22.95
• SE: 0.203
• CI: (22.557, 23.354)
Parametric Model approach:
• Mean: 22.956
• SE: 0.201
• CI: (22.562, 23.351)
Both approaches produce similar results. There does not seem to be a clear winner
here; the confidence interval from the parametric approach is slightly narrower.
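The two approaches can be sketched as follows. This is an illustration under stated assumptions, not the report's R code: the interval is taken as mean ± 1.96 × bootstrap SE, the number of resamples is arbitrary, all function names are hypothetical, and the data are synthetic stand-ins rather than the real `ais` BMI values.

```python
import random
import statistics


def nonparametric_bootstrap_means(sample, n_boot=1000, seed=0):
    """Resample the data with replacement and record each resample's mean."""
    rng = random.Random(seed)
    n = len(sample)
    return [statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_boot)]


def parametric_bootstrap_means(sample, n_boot=1000, seed=0):
    """Draw new samples from a normal fitted to the data and record each mean."""
    rng = random.Random(seed)
    n = len(sample)
    mu = statistics.fmean(sample)
    sigma = statistics.stdev(sample)
    return [
        statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(n_boot)
    ]


def summarize(sample, boot_means, z=1.96):
    """Return (sample mean, bootstrap SE, normal-approximation interval)."""
    center = statistics.fmean(sample)
    se = statistics.stdev(boot_means)
    return center, se, (center - z * se, center + z * se)


# Synthetic stand-in for 202 BMI values (NOT the real ais data)
data_rng = random.Random(42)
toy_bmi = [data_rng.gauss(23.0, 2.9) for _ in range(202)]

for label, boot in [
    ("non-parametric", nonparametric_bootstrap_means(toy_bmi)),
    ("parametric", parametric_bootstrap_means(toy_bmi)),
]:
    mean, se, (low, high) = summarize(toy_bmi, boot)
    print(f"{label}: mean={mean:.3f} SE={se:.3f} CI=({low:.3f}, {high:.3f})")
```

The only difference between the two functions is where the resamples come from: the data themselves, or a normal distribution fitted to the data.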
HG (hemoglobin concentration) is another variable of interest, so we also estimate a
confidence interval for its mean. The parametric method is chosen because the
histogram of this variable, shown in Figure 2, looks approximately normal.
• Mean: 14.56
• SE: 0.0945
• CI: (14.377, 14.748)
4.0 Hypothesis testing
This section includes all the hypothesis testing that was performed on this dataset.
4.1 Testing whether the distributions are normal
Both the BMI and HG variables are tested to check whether they follow a normal distribution.
A permutation test was used for this purpose. For each variable, 1000 random numbers were
generated from a normal distribution with the mean and standard deviation of that variable
and tested against the sample data.
The null hypothesis of the permutation test is that both samples come from the same
distribution. We reject it if the p-value is less than 0.05.
The results of the test are given below.
BMI:
• P-Value: 0.948
• Conclusion: We do not have evidence against the null, so we cannot reject the null
hypothesis. A high p-value does not prove normality; it means the data are consistent with it.
HG:
• P-Value: 0.71
• Conclusion: We do not have evidence against the null, so we cannot reject the null
hypothesis. Again, the data are consistent with a normal distribution.
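A two-sample permutation test of this kind can be sketched as below. The report does not state which test statistic was used, so the absolute difference in means is an assumption made here; the function name, the number of permutations and the synthetic data are likewise hypothetical.

```python
import random
import statistics


def permutation_test(x, y, n_perm=1000, seed=0):
    """Two-sided permutation p-value for the absolute difference in means."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(x) - statistics.fmean(y))
    pooled = list(x) + list(y)
    n_x = len(x)
    extreme = 0
    for _ in range(n_perm):
        # Under the null the group labels are exchangeable, so reshuffle them
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_x]) - statistics.fmean(pooled[n_x:]))
        if diff >= observed:
            extreme += 1
    # This can return exactly 0; (extreme + 1) / (n_perm + 1) is a common variant
    return extreme / n_perm


# Synthetic stand-in for a sample of 202 values (NOT the real ais data)
data_rng = random.Random(1)
toy_sample = [data_rng.gauss(23.0, 2.9) for _ in range(202)]

# 1000 normal draws with the sample's own mean and standard deviation
mu, sigma = statistics.fmean(toy_sample), statistics.stdev(toy_sample)
reference = [data_rng.gauss(mu, sigma) for _ in range(1000)]

print("p-value:", permutation_test(toy_sample, reference))
```

A difference-in-means statistic is only sensitive to a shift in location; a statistic based on the distance between the two ECDFs would be more sensitive to differences in shape.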
4.2 Hypothesis to test if BMI varies according to sex
The BMI values for males and females were separated and the Wald test was applied. The aim
of this test is to determine whether sex influences BMI. The null hypothesis is that the
difference in the means is equal to zero.
The p-value obtained from the test is 1.27e-06. Hence, we reject the null hypothesis;
there is a significant difference between the mean BMI of male and female athletes.
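A Wald test for a difference in two means can be sketched as follows. This is a minimal illustration that assumes the usual plug-in standard error and a standard normal reference distribution; the function name and the toy values are hypothetical and are not the `ais` measurements.

```python
import math
import statistics


def wald_test_two_means(x, y):
    """Return (Wald statistic, two-sided p-value) for H0: mean(x) == mean(y)."""
    diff = statistics.fmean(x) - statistics.fmean(y)
    # Estimated standard error of the difference in sample means
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    w = diff / se
    # Under H0 the statistic is approximately standard normal
    p_value = 2.0 * (1.0 - statistics.NormalDist().cdf(abs(w)))
    return w, p_value


# Hypothetical BMI-like values for two groups, for illustration only
toy_male = [23.1, 24.5, 22.8, 25.0, 23.9, 24.2]
toy_female = [21.0, 22.3, 20.8, 21.9, 22.5, 21.4]

w, p = wald_test_two_means(toy_male, toy_female)
print(f"W = {w:.2f}, p-value = {p:.2g}")
```

The normal approximation behind the p-value relies on reasonably large groups, which is the case for the 202 athletes in the report but not for the six toy values per group used here.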
4.3 Hypotheses testing on HG based on category of sport
Sports in our dataset are divided into two categories:
• Endurance Sport: Row, Swim, T_400, Tennis, WaterPolo
• Power Sport: Netball, B_Ball, Field, Gym, TSprint
4.3.1 Endurance sport vs Endurance sport
We are interested in whether HG concentration differs between two sports within
the same category. We apply the Wald test with the null hypothesis that there is no difference
between the two sports. The p-value is 0.36, which is greater than 0.05, so we
cannot reject the null hypothesis.
A permutation test was also used to test whether the samples come from the same
distribution. The p-value is 0.638, which is greater than 0.05, so again the null hypothesis
cannot be rejected.
The p-value from the permutation test is larger than that from the Wald test. This does
not make the null hypothesis stronger; both tests simply find insufficient evidence against it.
4.3.2 Endurance sport vs Power sport
We now study whether HG concentration differs between sports
from two different categories. The sports selected are Rowing (as the endurance
sport) and Netball (as the power sport).
The null hypothesis is that hemoglobin concentration does not vary with the category of
sport. The p-value obtained from the Wald test is 6.33e-6. The null hypothesis is rejected;
there is a significant difference in hemoglobin concentration between athletes of the two sports.
The permutation test gives a similar result, with a p-value reported as 0, meaning that none
of the permuted statistics was as extreme as the observed one. Both tests reject the null
hypothesis decisively, with the permutation p-value marginally smaller than the Wald p-value.
A single p-value does not measure the power of a test.
5.0 Maximum Likelihood Estimator
Based on the results in section 4.1, both BMI and HG are treated as following a
normal distribution. The aim now is to find point estimates for the parameters of the normal
distribution using maximum likelihood estimation. The results are given below, written as
N(mean, standard deviation).
MLE for BMI ~ N(22.95, 2.86)
MLE for HG ~ N(14.57, 1.36)
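For a normal model the maximum likelihood estimates have a closed form: the sample mean, and the standard deviation computed with divisor n rather than n - 1. The sketch below illustrates this on a small toy sample; the function names and data are hypothetical and are not the `ais` values.

```python
import math
import statistics


def normal_mle(sample):
    """Closed-form MLE (mean, standard deviation) for a normal model."""
    mu_hat = statistics.fmean(sample)
    # pstdev divides by n, which is what maximizes the normal likelihood
    sigma_hat = statistics.pstdev(sample, mu=mu_hat)
    return mu_hat, sigma_hat


def normal_log_likelihood(sample, mu, sigma):
    """Log-likelihood of the sample under N(mu, sigma^2)."""
    n = len(sample)
    squared_error = sum((x - mu) ** 2 for x in sample)
    return (
        -0.5 * n * math.log(2.0 * math.pi * sigma ** 2)
        - squared_error / (2.0 * sigma ** 2)
    )


# Toy sample for illustration only
toy = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu_hat, sigma_hat = normal_mle(toy)
print("MLE:", mu_hat, sigma_hat)
print("log-likelihood at MLE:", normal_log_likelihood(toy, mu_hat, sigma_hat))
```

Evaluating `normal_log_likelihood` at nearby parameter values gives a quick check that the closed-form estimates sit at the maximum.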
6.0 Bayesian Analysis
The approach used so far follows the frequentist
philosophy, in which we find a confidence interval around the estimate
of a parameter. Let us now look at how things differ when we find a posterior (credible)
interval for the parameter itself using the Bayesian approach. We do this for both the
bmi and hg variables in our dataset.
BMI:
We assume a prior for the mean bmi that follows N(0, 1).
The posterior for the mean follows N(22.95, 0.04) (results are calculated based on
the formula derived in class).
Posterior interval for mean bmi: (22.877, 22.898)
Hg:
We assume a prior for the mean hg that follows N(1, 2).
The posterior for the mean follows N(14.56, 0.07).
Posterior interval for mean hg: (14.429, 14.465)
In the second case, we deliberately chose a higher variance for the prior
to understand the effect of higher prior variance on the posterior. We observe that the
posterior variance is low in both cases, and that a larger prior variance tends to increase it.
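The formula derived in class is not reproduced in this report. As an illustration only, the sketch below uses the standard conjugate update for a normal mean with the data variance treated as known, which is an assumption and may differ from the formula actually used, so its output is not expected to reproduce the figures quoted above. The inputs are the sample size, mean and standard deviation of BMI reported earlier; the function names are hypothetical.

```python
import math


def normal_posterior(prior_mean, prior_var, sample_mean, data_var, n):
    """Posterior (mean, variance) for a normal mean with known data variance."""
    prior_precision = 1.0 / prior_var
    data_precision = n / data_var
    post_var = 1.0 / (prior_precision + data_precision)
    # The posterior mean is a precision-weighted average of prior and data
    post_mean = post_var * (prior_precision * prior_mean + data_precision * sample_mean)
    return post_mean, post_var


def posterior_interval(post_mean, post_var, z=1.96):
    """Central interval of the normal posterior."""
    half_width = z * math.sqrt(post_var)
    return post_mean - half_width, post_mean + half_width


# n, mean and sd of BMI as reported above; the sd is treated as known (assumption)
n, xbar, sd = 202, 22.95, 2.86

# Compare two prior variances around the same prior mean of 0
for prior_var in (1.0, 2.0):
    mean, var = normal_posterior(0.0, prior_var, xbar, sd ** 2, n)
    low, high = posterior_interval(mean, var)
    print(f"prior var {prior_var}: posterior mean={mean:.3f} "
          f"var={var:.4f} interval=({low:.3f}, {high:.3f})")
```

In this update the posterior variance is 1 / (1/prior_var + n/data_var), so increasing the prior variance can only increase the posterior variance, and with n = 202 the data dominate either prior.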
Conclusion
• When the distribution of the population is known, the parametric bootstrap gives a
slightly narrower confidence interval than the non-parametric bootstrap
• The permutation test and the Wald test led to the same decisions on the chosen
dataset, with the permutation test giving a marginally smaller p-value when the null was rejected
• In the Bayesian approach, the closer the prior is to the data, the closer the posterior is
to the sample estimate
• Mean Body Mass Index (BMI) differs significantly between male and female athletes
• Hemoglobin concentration differs significantly between the endurance sport (rowing) and
the power sport (netball) that were compared
