Group Project
Analyses of Athletes' BMI and Hemoglobin Level
Introduction
This report describes the available data, including the type and meaning of each variable,
and presents analyses of that data. The dataset was collected in a study of how athletes'
body characteristics vary with sport and sex. The data is available within the DAAG package
in R as `ais`.
Data Description
The original data contained 202 observations and 13 variables. Five columns that were not
needed for the analyses were removed, reducing the dataset to 202 observations and 8
variables. The table below lists the variables that were retained for the analyses.
Variable Name   Data Type   Description
Rcc             Numeric     Red blood cell count
Hg              Numeric     Hemoglobin concentration, g per deciliter
BMI             Numeric     Body mass index, kg/m^2
Lbm             Numeric     Lean body mass, kg
Ht              Numeric     Height, cm
Wt              Numeric     Weight, kg
Sex             Factor      A factor with levels f for female, m for male
Sport           Factor      A factor with 10 sports
Data Cleaning
The dataset does not contain any missing values. Five columns were removed because they
were not used in the analyses.
Data Summary
A summary of the variables in the dataset is shown in Figure 1.
Figure 1: Summary of variables in the dataset
• There are 6 numeric variables and 2 factor variables (sex and sport).
Data Analysis
1.0 Initial exploration of data
To get a feel for the data, we first examine the two variables of interest: BMI (Body Mass
Index) and HG (blood hemoglobin concentration).
Figure 2: Histogram of hemoglobin level
Figure 3: Histogram of BMI
Figures 2 and 3 show the histograms of the two variables. Later in the project we use
hypothesis testing to check whether the distributions are consistent with normality.
2.0 Empirical CDF
The empirical CDF (ECDF) is a nonparametric way of estimating the underlying CDF of a
random variable. Plotting it gives a visual display of how quickly the CDF increases to 1.
Figure 4: ECDF of the sample BMI
Figure 5: ECDF of the normal distribution of BMI
Figure 4 shows the ECDF of the sample BMI, and Figure 5 shows the ECDF of the normal
distribution used for comparison. The sample ECDF is close to normal, although it
increases to 1 somewhat more quickly.
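The comparison between a sample ECDF and a normal reference can be sketched in a few lines. The report itself worked in R; the Python sketch below uses only the standard library and a synthetic stand-in for the BMI column (the values, the seed, and the helper name `ecdf` are assumptions made for illustration, not the `ais` data or the authors' code).

```python
import random
from bisect import bisect_right
from statistics import NormalDist, mean, stdev


def ecdf(sample):
    """Return a function F where F(x) is the fraction of sample values <= x."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


# Synthetic stand-in for the BMI column (assumption: NOT the real ais data).
rng = random.Random(1)
bmi = [rng.gauss(23.0, 2.9) for _ in range(202)]

F = ecdf(bmi)
# Normal reference with the same mean and standard deviation as the sample.
reference = NormalDist(mean(bmi), stdev(bmi))

# Largest vertical gap between the ECDF and the normal CDF over a grid of values.
grid = [15 + 0.1 * i for i in range(200)]
max_gap = max(abs(F(x) - reference.cdf(x)) for x in grid)
print(f"max |ECDF - normal CDF| on the grid: {max_gap:.3f}")
```

A small maximum gap corresponds to the visual impression that the two curves nearly coincide; a plot of `F` and `reference.cdf` over the same grid would reproduce the kind of comparison shown in the figures.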
The ECDF of the blood hemoglobin level is likewise plotted against that of a normal
distribution.
Figure 7: ECDF of the normal with mean and sd of hg
The ECDF of HG is close to normal, as shown in Figures 6 and 7.
3.0 BMI Categorization and Confidence interval estimation of BMI mean
Categorical proportions for the BMI variable are calculated using the ecdf function, based
on the following ranges of BMI values:
• Underweight: BMI less than 18.5
• Normal: BMI between 18.5 and 24.9
• Overweight: BMI between 25 and 30
• Obese: BMI more than 30
These proportions give a general idea of how the athletes, taken together across all
sports, are distributed over the BMI categories.
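As a rough illustration of how category proportions follow from an ECDF, the sketch below differences the ECDF at the category cut points. It is written in Python rather than R, runs on synthetic values rather than the `ais` data, and the boundary handling (a single cut at 25 between "normal" and "overweight", and values of exactly 30 counted as obese) is an assumption, since the ranges above leave those edges open.

```python
import random
from bisect import bisect_left


def ecdf_below(sample):
    """Return a function G where G(x) is the fraction of sample values < x."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_left(ordered, x) / n


# Synthetic stand-in for the BMI column (assumption: NOT the real ais data).
rng = random.Random(2)
bmi = [rng.gauss(23.0, 2.9) for _ in range(202)]
G = ecdf_below(bmi)

# Each proportion is the difference of the ECDF at two neighbouring cut points.
proportions = {
    "underweight": G(18.5),
    "normal": G(25.0) - G(18.5),
    "overweight": G(30.0) - G(25.0),
    "obese": 1.0 - G(30.0),
}

for category, share in proportions.items():
    print(f"{category:12s} {share:.3f}")
```

Because the four proportions are differences of one cumulative function, they always sum to 1, which is a convenient sanity check on the cut points.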
We are now interested in calculating a confidence interval for the mean BMI of all the
athletes. We do this using both non-parametric and parametric approaches and compare
the results to see which approach works better.
Figure 6: ECDF of sample hemoglobin concentration
Non-Parametric Model approach:
• Mean: 22.95
• SE: 0.203
• CI: (22.557, 23.354)
Parametric Model approach:
• Mean: 22.956
• SE: 0.201
• CI: (22.562, 23.351)
Both approaches produce similar results. There does not seem to be a clear winner
here, although the confidence interval from the parametric approach is slightly narrower.
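The two approaches above can be sketched as a non-parametric bootstrap (resampling the data with replacement) and a parametric bootstrap (simulating from a fitted normal). The Python sketch below is illustrative only: the data are synthetic, the number of replicates `B` and the helper `normal_ci` are assumptions, and the interval is taken as the estimate plus or minus 1.96 standard errors, which may differ from the interval construction used in the report.

```python
import random
from statistics import fmean, stdev

# Synthetic stand-in for the BMI column (assumption: NOT the real ais data).
rng = random.Random(3)
bmi = [rng.gauss(23.0, 2.9) for _ in range(202)]
n = len(bmi)
B = 1000  # number of bootstrap replicates (hypothetical choice)


def normal_ci(center, se, z=1.96):
    """Approximate 95% interval: center plus or minus z standard errors."""
    return (center - z * se, center + z * se)


# Non-parametric bootstrap: resample the observed values with replacement.
np_means = [fmean(rng.choices(bmi, k=n)) for _ in range(B)]
np_se = stdev(np_means)

# Parametric bootstrap: simulate new samples from the fitted normal distribution.
mu_hat, sd_hat = fmean(bmi), stdev(bmi)
p_means = [fmean([rng.gauss(mu_hat, sd_hat) for _ in range(n)]) for _ in range(B)]
p_se = stdev(p_means)

print("non-parametric:", round(np_se, 3), normal_ci(mu_hat, np_se))
print("parametric:    ", round(p_se, 3), normal_ci(mu_hat, p_se))
```

Both standard errors should land near sd / sqrt(n), which is why the two intervals in the report differ only in the third decimal place.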
HG (hemoglobin concentration) is the other variable of interest, so we also estimate a
confidence interval for its mean. The parametric method is chosen because the histogram
of this variable, shown in Figure 2, looks approximately normal.
• Mean: 14.56
• SE: 0.0945
• CI: (14.377, 14.748)
4.0 Hypothesis testing
This section covers the hypothesis tests that were performed on this dataset.
4.1 Testing whether the distributions follow normality
Both BMI and HG are tested to check whether they follow a normal distribution. A
permutation test was used for this purpose. For each variable, 1000 random numbers were
generated from a normal distribution with the mean and standard deviation of that variable
and tested against the sample data.
The null hypothesis of the permutation test is that both samples come from the same
distribution. We reject it if the p-value is less than 0.05.
The results of the test are given below.
BMI:
• P-Value: 0.948
• Conclusion: We do not have evidence against the null, so we cannot reject the null
hypothesis. The p-value is very high, so the data are consistent with a normal distribution.
HG:
• P-Value: 0.71
• Conclusion: We do not have evidence against the null, so we cannot reject the null
hypothesis. The p-value is high, so the data are consistent with a normal distribution.
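A two-sample permutation test of the kind described in this section can be sketched as follows. The report does not state which test statistic was used, so the absolute difference in sample means is an assumption here, as are the function name `permutation_p_value`, the number of permutations, and the synthetic data standing in for the BMI column.

```python
import random
from statistics import fmean, stdev


def permutation_p_value(x, y, n_perm=500, seed=0):
    """Estimate the p-value for H0: x and y come from the same distribution.

    Statistic (assumed): absolute difference of the two sample means.
    """
    rng = random.Random(seed)
    observed = abs(fmean(x) - fmean(y))
    pooled = list(x) + list(y)
    n_x = len(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the pooled observations
        stat = abs(fmean(pooled[:n_x]) - fmean(pooled[n_x:]))
        if stat >= observed:
            hits += 1
    return hits / n_perm


# Synthetic stand-in for the BMI column (assumption: NOT the real ais data).
rng = random.Random(4)
bmi = [rng.gauss(23.0, 2.9) for _ in range(202)]

# 1000 draws from a normal with the sample's mean and sd, as described above.
normal_draws = [rng.gauss(fmean(bmi), stdev(bmi)) for _ in range(1000)]

p_value = permutation_p_value(bmi, normal_draws)
print(f"permutation p-value: {p_value:.3f}")
```

With a mean-based statistic, a comparison sample that shares the data's mean and standard deviation will tend to give a large p-value by construction; a statistic sensitive to shape, such as the maximum ECDF gap, would make the check more demanding.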
4.2 Testing whether BMI varies according to sex
The BMI values for males and females were separated and a Wald test was applied. The aim
of this test is to determine whether BMI differs by sex. The null hypothesis is that the
difference in the means is equal to zero.
The p-value obtained from the test is 1.27e-06. Hence, we reject the null hypothesis:
there is a significant difference between the mean BMI of males and females.
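For a difference in means, the Wald statistic is the observed difference divided by its estimated standard error, compared against a standard normal distribution. The sketch below shows that calculation in Python on synthetic groups; the group sizes, means, and the function name `wald_two_sample` are hypothetical and chosen only for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, variance


def wald_two_sample(x, y):
    """Wald test of H0: mean(x) - mean(y) = 0.

    Returns the statistic W and the two-sided p-value from the standard normal.
    """
    diff = fmean(x) - fmean(y)
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    w = diff / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(w)))
    return w, p


# Synthetic stand-ins for male and female BMI (assumption: NOT the real ais data).
rng = random.Random(5)
bmi_male = [rng.gauss(24.0, 2.8) for _ in range(100)]
bmi_female = [rng.gauss(22.0, 2.6) for _ in range(100)]

w, p = wald_two_sample(bmi_male, bmi_female)
print(f"W = {w:.2f}, p-value = {p:.2e}")
```

The null hypothesis is rejected at the 5% level when the absolute value of W exceeds 1.96, which is equivalent to a p-value below 0.05.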
4.3 Hypothesis testing on HG based on category of sport
Sports in our dataset are divided into two categories:
• Endurance Sport: Row, Swim, T_400, Tennis, WaterPolo
• Power Sport: Netball, B_Ball, Field, Gym, TSprint
4.3.1 Endurance sport vs Endurance sport
We are interested in finding out whether hg concentration differs between sports within
the same category. We apply a Wald test with the null hypothesis that there is no
difference between sports in the same category. The p-value is 0.36, which is greater than
0.05, so we cannot reject the null hypothesis.
A permutation test was also used to test whether the samples come from the same
distribution. The p-value is 0.638, which is greater than 0.05, so the null hypothesis
cannot be rejected. There is insufficient evidence against the null hypothesis.
The p-value from the permutation test is larger than the one obtained from the Wald test,
but both tests lead to the same conclusion.
4.3.2 Endurance sport vs Power sport
We are now interested in whether hg concentration differs between sports from two
different categories. The sports selected are Rowing (an endurance sport) and Netball
(a power sport).
The null hypothesis is that hemoglobin concentration does not vary with the category of
sport. The p-value obtained from the Wald test is 6.33e-6. The null hypothesis is rejected:
there is a significant difference in hemoglobin concentration between athletes in the two
sports.
The permutation test gives a similar result, with a p-value of 0 (none of the permuted
statistics was as extreme as the observed one). This suggests that both tests have power
close to 1 for a difference of this size, with the permutation test ahead of the Wald test
by a very small margin.
5.0 Maximum Likelihood Estimator
Based on the results in section 4.1, both BMI and HG were found to be consistent with a
normal distribution. The aim now is to find point estimates for the parameters of the
normal distribution using maximum likelihood estimation. The results are given below as
N(mean, standard deviation).
MLE for BMI ~ N(22.95, 2.86)
MLE for HG ~ N(14.57, 1.36)
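For a normal model the maximum likelihood estimates have a closed form: the sample mean, and the square root of the average squared deviation from it (dividing by n rather than n - 1). The Python sketch below computes both on synthetic data and checks that the log-likelihood is no higher at a nearby parameter value; the function names and the data are assumptions made for illustration.

```python
import random
from math import log, pi, sqrt
from statistics import fmean


def normal_mle(sample):
    """Closed-form MLE for a normal model: (mean, sd with divisor n)."""
    mu = fmean(sample)
    sigma = sqrt(fmean([(x - mu) ** 2 for x in sample]))
    return mu, sigma


def log_likelihood(sample, mu, sigma):
    """Log-likelihood of i.i.d. normal data at the given mean and sd."""
    n = len(sample)
    sq = sum((x - mu) ** 2 for x in sample)
    return -0.5 * n * log(2.0 * pi * sigma**2) - sq / (2.0 * sigma**2)


# Synthetic stand-in for the BMI column (assumption: NOT the real ais data).
rng = random.Random(6)
bmi = [rng.gauss(23.0, 2.9) for _ in range(202)]

mu_hat, sigma_hat = normal_mle(bmi)
print(f"MLE: mean = {mu_hat:.2f}, sd = {sigma_hat:.2f}")
print("log-likelihood at MLE:    ", round(log_likelihood(bmi, mu_hat, sigma_hat), 2))
print("log-likelihood off by 0.5:", round(log_likelihood(bmi, mu_hat + 0.5, sigma_hat), 2))
```

With 202 observations the divisor-n estimate of the standard deviation is only marginally smaller than the usual sample standard deviation.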
6.0 Bayesian Analysis
The approach used so far follows the Frequentist philosophy, in which we construct a
confidence interval around the estimate of a parameter. Let us now look at how things
differ when we find a posterior interval for the parameter itself using the Bayesian
approach. We do this for both the bmi and hg variables in our dataset.
BMI:
Let us assume a prior for the mean bmi that follows N(0, 1).
The posterior for the mean then follows N(22.95, 0.04) (results are calculated based on
the formula derived in class).
Posterior interval for mean bmi: (22.877, 22.898)
Hg:
Let us assume a prior for the mean hg that follows N(1, 2).
The posterior for the mean then follows N(14.56, 0.07).
Posterior interval for mean hg: (14.429, 14.465)
In the second case, we deliberately used a higher variance for the prior to understand
the effect of a higher prior variance on the posterior. We observe that, in our case,
because the posterior variance is low, the higher prior variance tends to inflate it.
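The report refers only to "the formula derived in class" for the posterior. As a hedged illustration, the sketch below assumes the standard conjugate update for a normal mean with a normal prior and a known data variance; this may not be the exact formula the authors used, so its output is not expected to reproduce the numbers above. The data are synthetic and the function names are hypothetical.

```python
import random
from math import sqrt
from statistics import fmean, variance


def normal_posterior(sample, prior_mean, prior_var, data_var):
    """Conjugate update for a normal mean with known data variance (assumed model).

    Posterior precision is the sum of the prior precision and n / data_var.
    """
    n = len(sample)
    precision = 1.0 / prior_var + n / data_var
    post_var = 1.0 / precision
    post_mean = post_var * (prior_mean / prior_var + n * fmean(sample) / data_var)
    return post_mean, post_var


def posterior_interval(post_mean, post_var, z=1.96):
    """Central 95% posterior interval for a normal posterior."""
    half_width = z * sqrt(post_var)
    return (post_mean - half_width, post_mean + half_width)


# Synthetic stand-in for the hg column (assumption: NOT the real ais data).
rng = random.Random(7)
hg = [rng.gauss(14.5, 1.4) for _ in range(202)]
data_var = variance(hg)  # treated as known for the purpose of the sketch

narrow = normal_posterior(hg, prior_mean=1.0, prior_var=1.0, data_var=data_var)
wide = normal_posterior(hg, prior_mean=1.0, prior_var=2.0, data_var=data_var)

print("prior variance 1:", narrow, posterior_interval(*narrow))
print("prior variance 2:", wide, posterior_interval(*wide))
```

Under this model a larger prior variance lowers the prior precision, so the posterior variance grows slightly and the posterior mean moves closer to the sample mean, which matches the direction of the effect described above.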
Conclusion
• If the distribution of the population is known, the parametric bootstrap gives better
results than the non-parametric bootstrap.
• The power of the permutation test is slightly higher than that of the Wald test for the
chosen dataset.
• In the Bayesian approach, the closer the prior is to the posterior, the better the
posterior estimate.
• Body Mass Index (BMI) differs by sex.
• Hemoglobin concentration depends on the category of sport.
