Matters of the Heart
Lei Ge, Elizabeth Lydon, Michael Saxton
University of Central Florida
1 Introduction
The heart is an amazing muscle that pumps roughly 7,000 liters of blood through our body each day
[1]. Arguably, the heart is the most important organ that each one of us possesses and one which we
should take good care of. Every year in the United States of America approximately 610,000 people
die from coronary heart disease. That is approximately 1 in every 4 deaths in the United States, an
astonishing statistic. That makes heart disease the number one cause of death in the United States
for men and women alike, although more than half of the deaths are among men [2]. Not only is
heart disease the number one killer in the United States, it is the leading cause of death worldwide
[3].
Figure 1: Heart Disease Death Rates by State
Source: www.cdc.gov
The heart is composed of blood vessels (veins and arteries) and two pumps, one on the right side
and one on the left. Blood returning from the other vital organs enters the right side of the heart,
which pumps it out to the lungs. The lungs remove waste carbon dioxide from the blood and
replenish it with fresh oxygen. The oxygenated blood then returns to the left side of the heart,
which pumps it out to the other vital organs [4]. Coronary heart disease (CHD) is a disease in which
the coronary arteries narrow because of a buildup of plaque on their walls. As a result, the blood
flow to the heart muscle can slow down or stop completely. This can lead to heart failure, heart
attack, or even a stroke [5].
There are many risk factors that are thought to contribute to heart disease. Some of these factors
include, but are not limited to:
• Low-density lipoprotein (LDL) level: also known as "bad" cholesterol. It leads to plaque
buildup in major blood vessels [6].
• Age: plaque buildup in the vessels becomes more likely as a person gets older [7].
• Systolic blood pressure (SBP): the pressure in the arteries when the heart contracts.
The higher the pressure, the higher the risk of CHD [8].
• Body mass index (BMI): the higher the BMI, the more likely a person is to have a high SBP
and a high level of LDL [9].
Our goal for this paper is to measure the strength of the relationship between LDL and age, SBP,
and BMI. To do so, we run a multiple linear regression in R using the formula LDL
∼ Age + SBP + BMI. We are also interested in comparing the mean SBP and BMI of a
sample of people known to have CHD against those of a sample of people without CHD. We use
hypothesis tests to examine whether the means of the two populations are the same or different.
To address our questions, we used a data set collected from the University of Kentucky in 2011
for the Applied Statistical Modeling for Medicine and Public Health course [10]. Information on
the factors contributing to heart disease (outlined above) was collected from men aged
15-64.
2 Results and Statistical Analysis
Before beginning any hypothesis tests, we calculated the trimmed sample means and variances for
SBP and BMI. We discarded the first and fourth quartiles of the sample values and used only the
second and third quartiles when computing the sample mean and variance in R.
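To make the trimming step concrete, here is a minimal sketch on simulated values. The vector `sbp_sim` and the helper `middle_half` are hypothetical and are not part of the data set. The sketch keeps the observations lying between the first and third quartiles, then compares the result with R's built-in `mean(x, trim = 0.25)`, which drops the lowest and highest 25% of the observations by count.

```r
set.seed(1)
sbp_sim <- rnorm(160, mean = 140, sd = 24)   # hypothetical stand-in for an SBP column

# Keep only the values lying between the first and third quartiles
middle_half <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.75))
  x[x >= q[1] & x <= q[2]]
}

mid <- middle_half(sbp_sim)
length(mid)   # 80 of the 160 values remain
mean(mid)     # mean of the second and third quartiles
var(mid)      # variance of the same subset

# Built-in alternative: trims 25% of the observations from each end before averaging
mean(sbp_sim, trim = 0.25)
```

Note that `var()` has no `trim` argument, so a trimmed variance has to be computed on an explicitly filtered subset, as `var(mid)` does here.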
Table 1: Sample mean and variance for second and third quartiles of SBP and BMI
Disease Status   Size   Mean (SBP)   Var (SBP)   Mean (BMI)   Var (BMI)
CHD Present      80     139.8375     560.6225    26.38325     19.28372
CHD Absent       150    132.5461     323.4585    25.36599     16.73416
From [11], confidence intervals for the means are:
\[ \bar{x} - \frac{s}{\sqrt{n}}\, t_{n-1,\alpha/2} \;\le\; \mu \;\le\; \bar{x} + \frac{s}{\sqrt{n}}\, t_{n-1,\alpha/2} \]
where sample size n is as listed in Table 1.
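The interval formula can be evaluated directly from summary statistics. The sketch below is illustrative only: `mean_ci` is a hypothetical helper name, and the call uses the CHD-present SBP row of Table 1 as input.

```r
# Two-sided t confidence interval for a mean, computed from summary statistics
mean_ci <- function(xbar, s2, n, level = 0.95) {
  alpha <- 1 - level
  half  <- qt(1 - alpha / 2, df = n - 1) * sqrt(s2 / n)   # t quantile times s / sqrt(n)
  c(lower = xbar - half, upper = xbar + half)
}

# Illustrative call with the CHD-present SBP values from Table 1
ci95 <- mean_ci(xbar = 139.8375, s2 = 560.6225, n = 80)
ci99 <- mean_ci(xbar = 139.8375, s2 = 560.6225, n = 80, level = 0.99)
ci95
ci99
```

The half-width is the t quantile times s/√n, so the interval narrows as n grows. Leaving out the √n would instead give the much wider range x̄ ± t·s, which describes the spread of individual observations and not the uncertainty in the mean.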
Thus the confidence intervals for the means of SBP and BMI are as follows:
Table 2: Confidence intervals for mean of SBP
Disease Status   95% Confidence Interval   99% Confidence Interval
CHD Present      [97.32966, 190.14]        [82.76802, 204.707]
CHD Absent       [100.2098, 170.7108]      [89.14901, 181.7715]
Table 3: Confidence intervals for mean BMI
Disease Status   95% Confidence Interval   99% Confidence Interval
CHD Present      [18.01594, 35.22993]      [15.31528, 37.93059]
CHD Absent       [17.7196, 33.7553]        [15.20379, 36.27111]
Although we anticipated the mean values of BMI and SBP to be larger for those with CHD than
for those without CHD, the confidence intervals in Tables 2 and 3 overlap substantially, which
suggests that the means are actually similar. To check this, we perform hypothesis tests (assuming
all samples are i.i.d. normally distributed) of the following hypotheses:
1. $H_0 : \mu_{sx} = \mu_{sy}$ versus $H_1 : \mu_{sx} \neq \mu_{sy}$, where $\mu_{sx}$ represents the average SBP of those with
CHD and $\mu_{sy}$ represents the average SBP of those without CHD
2. $H_0 : \mu_{bx} = \mu_{by}$ versus $H_1 : \mu_{bx} \neq \mu_{by}$, where $\mu_{bx}$ represents the average BMI of those with
CHD and $\mu_{by}$ represents the average BMI of those without CHD
For the first hypothesis test, the sample SBP variances are dissimilar. Thus, we can perform a
modification of the two-sample t test using the test statistic
\[ T = \frac{\bar{X} - \bar{Y}}{\sqrt{\dfrac{S_x^2}{n} + \dfrac{S_y^2}{m}}} \tag{1} \]
where $S_x^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$ and $S_y^2 = \frac{1}{m-1}\sum_{i=1}^{m} (Y_i - \bar{Y})^2$ [11], where n = 161 and m = 301
and we have assumed that $\sigma_x^2 \neq \sigma_y^2$. Once we run R and collect the value of T we look at
\[ |T| > t_\nu = t_1 \tag{2} \]
where
\[ \nu = \frac{\left(\dfrac{S_x^2}{n} + \dfrac{S_y^2}{m}\right)^2}{\dfrac{S_x^4}{n^2(n-1)} + \dfrac{S_y^4}{m^2(m-1)}} \approx 1 \]
and also the p-value to confirm or reject our null hypothesis $H_0$.
After running R for the first hypothesis test we found that T = 8.87964 while t1 = 0.834625.
Since |T| > t1, we reject H0. We also found that the p-value was small, which supports our
decision to reject H0.
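The unequal-variance test can be sketched end to end as follows. The two samples are simulated and purely hypothetical; only the formulas for the test statistic (1) and the approximate degrees of freedom come from the text. Note that `qt()` returns a quantile (the critical value), whereas `pt()` returns a probability.

```r
set.seed(2)
x <- rnorm(160, mean = 144, sd = 24)   # hypothetical "CHD present" SBP values
y <- rnorm(300, mean = 135, sd = 18)   # hypothetical "CHD absent" SBP values
n <- length(x)
m <- length(y)

# Test statistic (1): difference in means over the unpooled standard error
se2     <- var(x) / n + var(y) / m
T_welch <- (mean(x) - mean(y)) / sqrt(se2)

# Approximate degrees of freedom (Welch-Satterthwaite)
nu <- se2^2 / (var(x)^2 / (n^2 * (n - 1)) + var(y)^2 / (m^2 * (m - 1)))

# Two-sided 5% critical value and p-value
t_crit <- qt(0.975, df = nu)
p_val  <- 2 * pt(abs(T_welch), df = nu, lower.tail = FALSE)

# Cross-check against R's built-in Welch test
bt <- t.test(x, y, var.equal = FALSE)
c(T = T_welch, nu = nu, t_crit = t_crit, p = p_val)
```

H0 is rejected at the 5% level when `abs(T_welch) > t_crit`, which is equivalent to `p_val < 0.05`.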
For the second hypothesis test, the sample BMI variances are close enough that we can perform
a hypothesis test under the assumption that $\sigma_x^2 = \sigma_y^2$, meaning we can use the test statistic
\[ T = \frac{\bar{X} - \bar{Y}}{\sqrt{\left(\dfrac{1}{n} + \dfrac{1}{m}\right) S_p^2}} \tag{3} \]
where $S_p^2 = \frac{1}{n+m-2}\left( \sum_{i=1}^{n} (X_i - \bar{X})^2 + \sum_{i=1}^{m} (Y_i - \bar{Y})^2 \right)$ [11].
Once we run R and collect the value of T we look at
\[ |T| > t_{n+m-2} = t_{460}, \quad \text{since } n = 161 \text{ and } m = 301, \tag{4} \]
and also the p-value to confirm or reject our null hypothesis $H_0$.
After running R we found that T = 16.28965 while t460 = 0.834625. We also found that the
p-value is large, so our decision to reject H0 is not convincing.
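For comparison, the pooled-variance test can be sketched in the same way. Again the samples are simulated and hypothetical; the pooled variance and the test statistic (3) follow the formulas in the text.

```r
set.seed(3)
x <- rnorm(160, mean = 26.4, sd = 4.4)   # hypothetical "CHD present" BMI values
y <- rnorm(300, mean = 25.4, sd = 4.1)   # hypothetical "CHD absent" BMI values
n <- length(x)
m <- length(y)

# Pooled variance: both sums of squared deviations divided by n + m - 2
Sp2 <- (sum((x - mean(x))^2) + sum((y - mean(y))^2)) / (n + m - 2)

# Test statistic (3)
T_pooled <- (mean(x) - mean(y)) / sqrt((1 / n + 1 / m) * Sp2)

# Two-sided 5% critical value and p-value on n + m - 2 degrees of freedom
t_crit <- qt(0.975, df = n + m - 2)
p_val  <- 2 * pt(abs(T_pooled), df = n + m - 2, lower.tail = FALSE)

# Cross-check against R's built-in equal-variance test
bt <- t.test(x, y, var.equal = TRUE)
c(T = T_pooled, t_crit = t_crit, p = p_val)
```

A large |T| always corresponds to a small p-value, so the two decision rules, `abs(T_pooled) > t_crit` and `p_val < 0.05`, cannot disagree.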
LDL cholesterol is considered high at 4.5 mmol/L and above, and "borderline"
high between 3.5 and 4.5 mmol/L [12]. In the left panel we have an LDL vs. age scatter
plot. We can see that the patients who are younger than 20 years of age have healthy cholesterol
levels. Other than that, LDL and age do not form any distinct pattern.
According to [13], a male is considered underweight if his BMI is less than 18.5, healthy if
it is between 18.5 and 24.9, overweight if it is between 25 and 29.9, and obese if it is
over 30. In the middle panel we have an LDL vs. BMI scatter plot. It seems that the majority of
those in the healthy BMI range have lower LDL cholesterol levels than those who are overweight or
obese. Although there are outliers, most of the men who are overweight or obese have high LDL
cholesterol. It is safe to say that having a BMI in the healthy range is conducive to having a
healthy level of LDL cholesterol.
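The BMI ranges quoted from [13] can be turned into a grouping variable, which makes it possible to compare LDL across categories numerically instead of reading it off a scatter plot. This is a minimal sketch: `bmi_category` is a hypothetical helper, the example values are made up, and treating a BMI of exactly 30 as obese is an assumption, since the text only says "over 30".

```r
# Cut-offs quoted from [13]; intervals are closed on the left, e.g. [18.5, 25)
bmi_category <- function(bmi) {
  cut(bmi,
      breaks = c(-Inf, 18.5, 25, 30, Inf),
      labels = c("underweight", "healthy", "overweight", "obese"),
      right  = FALSE)
}

bmi_example <- c(17.9, 18.5, 24.9, 25, 29.9, 30, 34.2)   # made-up values
cats <- bmi_category(bmi_example)
table(cats)
```

With the CHD sample loaded as in the R-Code section, a call such as `tapply(chd$ldl, bmi_category(chd$BMI), median)` would give the median LDL per BMI category.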
According to [14], low SBP is 70-89 mmHg, ideal SBP is 90-120, pre-hypertension SBP is 121-140, and
hypertension SBP is 140 and up. In the rightmost panel we have an LDL vs. SBP scatter plot. It is
not immediately obvious from this plot whether high SBP is associated with high LDL. We can see
people with LDL levels of 11-13 having SBP levels around 120. Conversely, we can see people with
LDL levels below 4.5 with extremely high SBP levels, around 220. However, it does seem that
most of the patients with LDL around 4.5 have SBP levels that are considered pre-hypertension or
hypertension.
Figure 2: Plots of the multilinear regression error diagnosis
The following tables summarize the multiple linear regression LDL ∼ age + SBP + BMI fitted to the
sample of men with CHD.
Table 4: Residuals
Minimum   1Q        Median    3Q       Maximum
-3.5020   -1.4329   -0.4148   0.8804   8.3035
Table 5: Coefficients
Category   Estimate   Standard Error   t-value   Pr(>|t|)
CHD age    0.639573   1.445884         0.442     0.658856
CHD SBP    0.001277   0.007488         0.170     0.864840
CHD BMI    0.152727   0.038973         3.919     0.000133
Table 6: ANOVA table
Category   Df   Sum Squared   Mean Squared   F-Value   Pr(>F)
CHD age    1    8.31          8.315          1.8318    0.1778755
CHD SBP    1    1.05          1.046          0.2304    0.6319355
CHD BMI    1    69.71         69.710         15.3569   0.0001329
From the model summary, $R^2 = 0.1004$. This is low: the model explains only about 10% of the
variance in LDL, whereas a perfect fit would have $R^2 = 1$. The F-statistic is 5.806 on 3 and 156
degrees of freedom, and the p-value is 0.000868.
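The reported summary values can be checked against each other. For a regression with p predictors, the overall F statistic is determined by R² and the residual degrees of freedom. The sketch below uses only the numbers quoted in the text; the variable names are illustrative.

```r
r2       <- 0.1004   # reported R^2
p        <- 3        # predictors: age, SBP, BMI
df_resid <- 156      # reported residual degrees of freedom

# Overall F statistic implied by R^2
F_stat <- (r2 / p) / ((1 - r2) / df_resid)

# Upper-tail p-value of the reported F statistic
p_val <- pf(5.806, df1 = p, df2 = df_resid, lower.tail = FALSE)

c(F_from_R2 = F_stat, p_value = p_val)
```

The F value recovered from R² agrees with the reported 5.806 up to rounding of R². A small overall p-value together with a low R² means the predictors are jointly related to LDL but leave most of its variation unexplained.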
We made one last comparison between those who have CHD and those without. We got the
following plots:
We can easily see the trend that those with CHD have higher levels of LDL than those without
CHD.
3 Conclusions and Further Work
Based on our first hypothesis test we rejected H0. Thus our initial assumption that those who have
CHD have, on average, a higher blood pressure than those without CHD is supported. Conversely,
based on our second hypothesis test we did not have confidence in rejecting H0. Thus we keep our
assumption that the average BMI of those who have CHD and of those who do not is roughly the same.
We could not draw any strong conclusions from our linear regressions. In future studies, we
should run the regressions with the trimmed sample data. That is, we should run a regression
using only the second and third quartiles. Also, our sample size is small and we did not
consider how other factors contribute to LDL levels, such as family history of CHD, smoking, alcohol
consumption, or exercise. This applies to the multiple regression as well. In addition, we could try
a log transformation in the multiple linear regression to make the residuals better behaved.
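The residuals in Table 4 are skewed to the right (the maximum is much farther from the median than the minimum), which is the situation a log transformation is meant to address. The following is a rough sketch on simulated data, not on the study data: the data frame `sim` and its coefficients are invented for illustration, and only the model formula mirrors the one used in the paper.

```r
set.seed(4)
n <- 160
sim <- data.frame(
  age = runif(n, 15, 64),
  sbp = rnorm(n, 140, 24),
  BMI = rnorm(n, 26, 4)
)
# Hypothetical right-skewed response: log(LDL) is linear in BMI plus noise
sim$ldl <- exp(0.8 + 0.03 * sim$BMI + rnorm(n, sd = 0.35))

fit_raw <- lm(ldl ~ age + sbp + BMI, data = sim)        # model on the original scale
fit_log <- lm(log(ldl) ~ age + sbp + BMI, data = sim)   # model on the log scale

# Compare the residual quartiles of the two fits for symmetry around zero
summary(residuals(fit_raw))
summary(residuals(fit_log))

# On the log scale, a coefficient acts multiplicatively on LDL
exp(coef(fit_log)["BMI"])   # factor by which LDL changes per unit of BMI
```

If the log model fits better, its residual quartiles should be roughly symmetric about zero, and the diagnostics from `plot(fit_log)` should look closer to the normal reference line.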
4 R-Code
# ############################ #
# matters of the heart project #
# ############################ #
# Load the two samples: men with CHD (chd) and men without CHD (chdnull)
chd     <- read.csv(file.choose(), header = TRUE)
chdnull <- read.csv(file.choose(), header = TRUE)
#----- see data structure
# chd
quantile(chd$sbp, probs = c(0, 0.25, 0.50, 0.75, 1))
mean(chd$sbp, trim = 0.25)   # 139.8375 (mean of the middle 50% of values)
var(chd$sbp)                 # 560.6225 (var() has no trim option; uses all rows)
quantile(chd$BMI, probs = c(0, 0.25, 0.50, 0.75, 1))
mean(chd$BMI, trim = 0.25)   # 26.38325
var(chd$BMI)                 # 19.28372
# chd null
quantile(chdnull$sbp, probs = c(0, 0.25, 0.50, 0.75, 1))
mean(chdnull$sbp, trim = 0.25)   # 132.5461
var(chdnull$sbp)                 # 323.4585
quantile(chdnull$BMI, probs = c(0, 0.25, 0.50, 0.75, 1))
mean(chdnull$BMI, trim = 0.25)
var(chdnull$BMI)
#----- multiple linear regression
mod <- lm(ldl ~ age + sbp + BMI, data = chd)
par(mfrow = c(1, 1))
plot(mod)      # residual diagnostic plots
summary(mod)   # coefficients, R^2 and F statistic
anova(mod)     # sequential ANOVA table
#------ hypothesis test
n <- nrow(chd)       # size of the CHD sample
m <- nrow(chdnull)   # size of the non-CHD sample
# sbp: unequal variances, test statistic (1)
S1 <- var(chd$sbp)       # sample variance, divisor n - 1
S2 <- var(chdnull$sbp)   # sample variance, divisor m - 1
T1 <- (mean(chd$sbp) - mean(chdnull$sbp)) / sqrt(S1 / n + S2 / m)
# reported: 8.874964
# BMI: pooled variance, test statistic (3)
Sp2 <- ((n - 1) * var(chd$BMI) + (m - 1) * var(chdnull$BMI)) / (n + m - 2)
T2 <- (mean(chd$BMI) - mean(chdnull$BMI)) / sqrt((1 / n + 1 / m) * Sp2)
# reported: 16.28965
#---test results
t1 <- pt(0.975, 200 - 2, lower.tail = TRUE)   # 0.834625; pt() returns the probability P(T <= 0.975)
t_crit <- qt(0.975, df = n + m - 2)           # two-sided 5% critical value to compare with |T|
T1 - t1   # reported: 16.89937
T2 - t1   # reported: 3.83544
References
[1] http://blood.ygoy.com
[2] http://www.cdc.gov
[3] http://www.who.int
[4] http://www.heartfailurematters.org
[5] http://www.nhlbi.nih.gov
[6] http://www.nhlbi.nih.govl
[7] http://www.heart.org
[8] http://www.heart.org
[9] https://www.nhlbi.nih.gov
[10] http://web.as.uky.edul
[11] George Casella and Roger L. Berger, Statistical Inference, 2nd edition, Wadsworth Group,
2002.
[12] http://www.cvtoolbox.coml
[13] http://www.webmd.com
[14] http://www.bloodpressureuk.org