Heart Stats(4)-1

1 INTRODUCTION
Matters of the Heart
Lei Ge, Elizabeth Lydon, Michael Saxton
University of Central Florida
1 Introduction
The heart is an amazing muscle that pumps roughly 7,000 liters of blood through the body each day
[1]. Arguably, it is the most important organ each of us possesses, and one we should take good
care of. Every year in the United States, approximately 610,000 people die from coronary heart
disease (CHD). That is approximately 1 in every 4 deaths, an astonishing statistic. Heart disease is
the number one cause of death in the United States for men and women alike, although more than
half of these deaths occur in men [2]. Heart disease is also the leading cause of death worldwide
[3].
Figure 1: Heart Disease Death Rates by State
Source: www.cdc.gov
The heart consists of blood vessels (veins and arteries) and two pumps: one that pumps blood in
and one that pumps blood out. Blood entering the heart from the other vital organs flows into the
right pump, which sends it to the lungs. The lungs remove waste carbon dioxide from the blood and
replenish it with fresh oxygen. The oxygenated blood then returns to the heart via the left pump
and is subsequently pumped out to the other vital organs [4]. Coronary heart disease is a
narrowing of the coronary arteries caused by a buildup of plaque on their walls. As a result, blood
flow to and from the heart can slow down or stop completely. This can lead to heart failure, a
heart attack, or even a stroke [5].
There are many risk factors that are thought to contribute to heart disease. These include, but
are not limited to:
• Low-Density Lipoprotein (LDL) levels: LDL is also known as "bad" cholesterol. It leads to
plaque buildup in major blood vessels [6].
• Age: As you get older, you are likely to have more plaque buildup in your vessels [7].
• Systolic Blood Pressure (SBP): This is the pressure in the arteries when the heart contracts.
The higher the pressure, the higher the risk of CHD [8].
• Body Mass Index (BMI): The higher the BMI, the more likely you are to have a high SBP
and a high level of LDL [9].
Our goal in this paper is to measure the strength of the relationship between LDL and age, SBP,
and BMI. To do so, we run a multiple linear regression in R using the formula LDL ∼ Age + SBP +
BMI. We are also interested in comparing the mean SBP and mean BMI of a sample that is known to
have CHD against those of a sample of people without CHD. We use hypothesis tests to examine
whether the means are the same or different for the two populations.
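As a rough illustration of the regression described above, the sketch below fits the same model formula to a simulated data frame. The data, the object names `demo` and `fit`, and the coefficients used to generate the response are all assumptions made for illustration; only the column names (`ldl`, `age`, `sbp`, `BMI`) follow the data set used in this paper.

```r
# Minimal sketch: multiple linear regression of LDL on age, SBP, and BMI.
# The data frame is simulated for illustration only; it is NOT the study
# data, and the generating coefficients are arbitrary.
set.seed(1)
n <- 100
demo <- data.frame(
  age = runif(n, 15, 64),
  sbp = rnorm(n, mean = 135, sd = 20),
  BMI = rnorm(n, mean = 26, sd = 4)
)
demo$ldl <- 1 + 0.01 * demo$age + 0.005 * demo$sbp + 0.1 * demo$BMI + rnorm(n)

# Same model formula as in the text: LDL ~ Age + SBP + BMI
fit <- lm(ldl ~ age + sbp + BMI, data = demo)
summary(fit)   # coefficient estimates, standard errors, t-values, R-squared
anova(fit)     # sequential sums of squares for each predictor
```

Passing the data frame through the `data` argument keeps the coefficient names short (`age` rather than `demo$age`), which makes the summary tables easier to read.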
To address our questions, we used a data set from the University of Kentucky, collected in 2011
for the Applied Statistical Modeling for Medicine and Public Health course [10]. Information on
the risk factors outlined above was collected from men aged 15-64.
2 Results and Statistical Analysis
Before beginning any hypothesis tests, we calculated the trimmed sample means and variances for
SBP and BMI. We discarded the first and last quartiles of the sample values and used only the
second and third quartiles when computing the sample mean and variance in R.
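The trimming step can be sketched on a small made-up vector (not the study data). The names `x`, `middle`, `tm`, and `v_middle` are assumptions for illustration. The sketch also shows that a variance over the same middle values has to be computed explicitly, since `var()` itself does no trimming.

```r
# Sketch of a 25% trimmed mean on a small made-up vector (not study data).
x <- c(2, 4, 6, 8, 10, 12, 14, 100)   # 100 plays the role of an outlier

# mean(x, trim = 0.25) drops floor(0.25 * n) values from each end.
tm <- mean(x, trim = 0.25)

# The same thing by hand: sort, then keep only the middle values.
xs <- sort(x)
k <- floor(0.25 * length(x))
middle <- xs[(k + 1):(length(x) - k)]
tm_manual <- mean(middle)

# var() has no trim argument, so the variance of the middle values
# must be computed on the subset explicitly.
v_middle <- var(middle)
```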
Table 1: Sample mean and variance for second and third quartiles of SBP and BMI
Disease Status   Size   Mean (SBP)   Var (SBP)   Mean (BMI)   Var (BMI)
CHD Present      80     139.8375     560.6225    26.38325     19.28372
CHD Absent       150    132.5461     323.4585    25.36599     16.73416
From [11], the confidence interval for the mean $\mu$ is
$$\bar{x} - \frac{s}{\sqrt{n}}\, t_{n-1,\alpha/2} \;\le\; \mu \;\le\; \bar{x} + \frac{s}{\sqrt{n}}\, t_{n-1,\alpha/2},$$
where the sample size n is as listed in Table 1.
Thus the confidence intervals for the means of SBP and BMI are as follows:
Table 2: Confidence intervals for mean of SBP
Disease Status   95% Confidence Interval   99% Confidence Interval
CHD Present      [97.32966, 190.14]        [82.76802, 204.707]
CHD Absent       [100.2098, 170.7108]      [89.14901, 181.7715]
Table 3: Confidence intervals for mean BMI
Disease Status   95% Confidence Interval   99% Confidence Interval
CHD Present      [18.01594, 35.22993]      [15.31528, 37.93059]
CHD Absent       [17.7196, 33.7553]        [15.20379, 36.27111]
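For reference, the t-interval formula from [11] can be sketched in R as follows. The vector `x` is made up for illustration and is not the study data; the names `half`, `ci`, and `ci_builtin` are likewise assumptions of this sketch.

```r
# Sketch: two-sided t confidence interval for a mean,
#   xbar -/+ (s / sqrt(n)) * t_{n-1, alpha/2}.
# The vector x is made up for illustration (not study data).
x <- c(128, 135, 142, 131, 150, 138, 127, 145, 133, 140)
alpha <- 0.05

n    <- length(x)
xbar <- mean(x)
s    <- sd(x)
half <- qt(1 - alpha / 2, df = n - 1) * s / sqrt(n)   # half-width
ci   <- c(lower = xbar - half, upper = xbar + half)

# Built-in equivalent, for comparison.
ci_builtin <- t.test(x, conf.level = 1 - alpha)$conf.int
```

Note that the half-width shrinks with the square root of the sample size, so intervals for a mean are much narrower than the spread of the individual observations.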
Although we anticipated the mean values of BMI and SBP to be larger for those with CHD than for
those without CHD, the confidence intervals in Tables 2 and 3 overlap heavily, suggesting that the
means are actually similar. We check this by performing hypothesis tests (assuming all samples are
i.i.d. and normally distributed) with the following hypotheses:
1. $H_0: \mu_{sx} = \mu_{sy}$ versus $H_1: \mu_{sx} \neq \mu_{sy}$, where $\mu_{sx}$ represents the
average SBP of those with CHD and $\mu_{sy}$ represents the average SBP of those without CHD.
2. $H_0: \mu_{bx} = \mu_{by}$ versus $H_1: \mu_{bx} \neq \mu_{by}$, where $\mu_{bx}$ represents the
average BMI of those with CHD and $\mu_{by}$ represents the average BMI of those without CHD.
For the first hypothesis test, the sample SBP variances are dissimilar. Thus, we perform a
modification of the two-sample t test using the test statistic
$$T = \frac{\bar{X} - \bar{Y}}{\sqrt{\dfrac{S_x^2}{n} + \dfrac{S_y^2}{m}}} \qquad (1)$$
where $S_x^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$ and
$S_y^2 = \frac{1}{m-1}\sum_{i=1}^{m}(Y_i - \bar{Y})^2$ [11], where n = 161 and m = 301
and we have assumed that $\sigma_x^2 \neq \sigma_y^2$. Once we compute T in R, we check whether
$$|T| > t_v = t_1 \qquad (2)$$
where
$$v = \frac{\left(\dfrac{S_x^2}{n} + \dfrac{S_y^2}{m}\right)^2}{\dfrac{S_x^4}{n^2(n-1)} + \dfrac{S_y^4}{m^2(m-1)}} \approx 1,$$
and we also examine the p-value to decide whether to reject our null hypothesis $H_0$.
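A minimal sketch of this unequal-variance test is given below, under the assumption that the critical value is taken from the t quantile function `qt()`. The two samples are simulated stand-ins rather than the CHD data, and all object names are hypothetical. The last line runs R's built-in Welch test as a cross-check on the hand computation.

```r
# Sketch of the unequal-variance two-sample t statistic (equation (1)) and
# its approximate degrees of freedom v (equation (2)).
# x and y are simulated stand-ins, NOT the CHD data.
set.seed(42)
x <- rnorm(40, mean = 140, sd = 24)   # hypothetical "CHD present" SBP
y <- rnorm(60, mean = 132, sd = 18)   # hypothetical "CHD absent" SBP

n <- length(x)
m <- length(y)
vx <- var(x) / n                      # S_x^2 / n
vy <- var(y) / m                      # S_y^2 / m

T_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)
v      <- (vx + vy)^2 / (vx^2 / (n - 1) + vy^2 / (m - 1))

t_crit  <- qt(0.975, df = v)              # two-sided 5% critical value
p_value <- 2 * pt(-abs(T_stat), df = v)   # two-sided p-value
reject  <- abs(T_stat) > t_crit

welch <- t.test(x, y, var.equal = FALSE)  # built-in cross-check
```

Here `qt()` returns a quantile (the critical value), whereas `pt()` returns a probability; the two are easy to confuse.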
After running R for the first hypothesis test, we found that T = 8.87964 while t1 = 0.834625.
Since |T| > t1, we reject H0. The p-value was also small, which supports our decision to reject
H0.
For the second hypothesis test, the sample BMI variances are close enough that we can perform
the test under the assumption that $\sigma_x^2 = \sigma_y^2$, meaning we can use the test statistic
$$T = \frac{\bar{X} - \bar{Y}}{\sqrt{\left(\dfrac{1}{n} + \dfrac{1}{m}\right) S_p^2}} \qquad (3)$$
where $S_p^2 = \frac{1}{n+m-2}\left(\sum_{i=1}^{n}(X_i - \bar{X})^2 + \sum_{i=1}^{m}(Y_i - \bar{Y})^2\right)$ [11].
Once we compute T in R, we check whether
$$|T| > t_{n+m-2} = t_{460}, \quad \text{since } n = 161 \text{ and } m = 301, \qquad (4)$$
and we also examine the p-value to decide whether to reject our null hypothesis $H_0$.
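The pooled-variance statistic can be sketched in the same way. Again the samples are simulated placeholders rather than the CHD data, the object names are assumptions, and the built-in `t.test(..., var.equal = TRUE)` serves as a cross-check.

```r
# Sketch of the pooled-variance two-sample t statistic (equation (3)).
# x and y are simulated stand-ins, NOT the CHD data.
set.seed(7)
x <- rnorm(40, mean = 26.4, sd = 4.2)   # hypothetical "CHD present" BMI
y <- rnorm(60, mean = 25.4, sd = 4.1)   # hypothetical "CHD absent" BMI

n <- length(x)
m <- length(y)

# Pooled variance: both sums of squared deviations share the divisor n + m - 2.
Sp2 <- (sum((x - mean(x))^2) + sum((y - mean(y))^2)) / (n + m - 2)

T_stat  <- (mean(x) - mean(y)) / sqrt((1 / n + 1 / m) * Sp2)
dof     <- n + m - 2
t_crit  <- qt(0.975, df = dof)              # two-sided 5% critical value
p_value <- 2 * pt(-abs(T_stat), df = dof)   # two-sided p-value

pooled <- t.test(x, y, var.equal = TRUE)    # built-in cross-check
```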
After running R, we found that T = 16.28965 while t460 = 0.834625. However, the p-value is
large, so the case for rejecting H0 is not convincing.
LDL cholesterol is considered high at 4.5 mmol/L or above and "borderline" high between 3.5 and
4.5 mmol/L [12]. The left panel shows a scatter plot of LDL vs. age. Patients younger than 20
years of age have healthy cholesterol levels. Beyond that, the relationship between LDL and age
does not show any distinct pattern.
According to [13], a male is considered underweight if his BMI is less than 18.5, healthy if
it is between 18.5 and 24.9, overweight if it is between 25 and 29.9, and obese if it is over 30.
The middle panel shows a scatter plot of LDL vs. BMI. The majority of those in the healthy BMI
range appear to have lower LDL cholesterol levels than those who are overweight or obese.
Although there are outliers, most of the men who are overweight or obese have high LDL
cholesterol. This suggests that a BMI in the healthy range is conducive to a healthy level
of LDL cholesterol.
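The BMI bands quoted from [13] can be turned into a grouping variable with `cut()`, as in the sketch below. The BMI values are made up, the names `bmi` and `category` are assumptions, and the sketch treats a BMI of exactly 30 as obese, which [13] as quoted leaves ambiguous.

```r
# Sketch: assign BMI categories using the bands quoted from [13].
# The values are made up; treating exactly 30 as "obese" is an assumption.
bmi <- c(17.9, 22.4, 24.9, 25.0, 29.9, 30.0, 34.2)

category <- cut(
  bmi,
  breaks = c(-Inf, 18.5, 25, 30, Inf),
  labels = c("underweight", "healthy", "overweight", "obese"),
  right  = FALSE   # intervals are [lower, upper)
)

table(category)    # counts per BMI band
```

A factor like this could then be used to compare LDL levels between bands, for example with `tapply()` or a box plot.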
According to [14], low SBP is 70-89, ideal SBP is 90-120, pre-hypertension SBP is 121-140, and
hypertension SBP is 140 and up. The rightmost panel shows a scatter plot of LDL vs. SBP. It is
not immediately obvious from this plot whether high SBP is associated with high LDL. We can see
people with LDL levels of 11-13 mmol/L having SBP levels around 120. Conversely, we can see people
with LDL levels below 4.5 mmol/L with extremely high SBP levels, around 220. However, it does seem
that most of the patients with LDL around 4.5 mmol/L have SBP levels that are considered
pre-hypertension or hypertension.
Figure 2: Diagnostic plots for the multiple linear regression
The following tables summarize the multiple linear regression LDL ∼ age + SBP + BMI fitted to the
sample of men with CHD.
Table 4: Residuals
Minimum   1Q        Median    3Q       Maximum
-3.5020   -1.4329   -0.4148   0.8804   8.3035
Table 5: Coefficients
Category   Estimate   Standard Error   t-value   Pr(>|t|)
CHD age    0.639573   1.445884         0.442     0.658856
CHD SBP    0.001277   0.007488         0.170     0.864840
CHD BMI    0.152727   0.038973         3.919     0.000133
Table 6: ANOVA table
Category   Df   Sum Squared   Mean Squared   F-Value   Pr(>F)
CHD age    1    8.31          8.315          1.8318    0.1778755
CHD SBP    1    1.05          1.046          0.2304    0.6319355
CHD BMI    1    69.71         69.710         15.3569   0.0001329
The model summary gives R^2 = 0.1004. This is low: the model explains only about 10% of the
variance in LDL, whereas R^2 = 1 would indicate a perfect fit. The F-statistic is 5.806 on 3 and
156 degrees of freedom, and the p-value is 0.000868.
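As a consistency check, R^2 and the overall F-statistic can be recovered from the mean squares in Table 6. The residual mean square is not printed in the table, so this sketch backs it out from the BMI row, assuming each F-value is the term's mean square divided by the residual mean square. All object names are hypothetical.

```r
# Sketch: recover R^2 and the overall F-statistic from Table 6.
# Assumption: F-value = (term mean square) / (residual mean square),
# so the residual mean square can be backed out from the BMI row.
ms_terms <- c(age = 8.315, sbp = 1.046, BMI = 69.710)  # mean squares, 1 df each
df_model <- 3
df_resid <- 156

ms_resid <- unname(ms_terms["BMI"]) / 15.3569   # residual mean square
ss_resid <- ms_resid * df_resid                 # residual sum of squares
ss_model <- sum(ms_terms)                       # model sum of squares (1 df per term)

r_squared <- ss_model / (ss_model + ss_resid)
f_stat    <- (ss_model / df_model) / ms_resid
p_value   <- pf(f_stat, df_model, df_resid, lower.tail = FALSE)

round(c(R2 = r_squared, F = f_stat), 4)
```

Both quantities should land close to the values reported in the model summary, up to rounding in the table entries.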
Finally, we compared those who have CHD to those who do not, which produced the following
plots:
The plots show a trend: those with CHD tend to have higher levels of LDL than those without
CHD.
3 Conclusions and Further Work
Based on our first hypothesis test, we rejected H0. Thus our initial assumption that those who
have CHD have, on average, a higher blood pressure than those without CHD was correct.
Conversely, based on our second hypothesis test, we did not have confidence in rejecting H0. Thus
our assumption that the average BMI of those who have CHD and of those who do not is roughly the
same stands.
We could not draw any strong conclusions from our linear regressions. In future studies, we
should run the regressions with the trimmed sample data, that is, using only the second and third
quartiles. Also, our sample size is small and we did not consider how other factors contribute to
LDL levels, such as family history of CHD, smoking, alcohol consumption, or exercise. This applies
to the multiple regression as well. In addition, we could try a log transformation in the multiple
linear regression to improve the fit of the residuals.
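One way the suggested log transformation might look is sketched below, under the assumption that the transformation is applied to the response (LDL). The data are simulated with a right-skewed, strictly positive response, loosely mimicking the long upper tail of the residuals in Table 4. They are not the study data, and all object names are hypothetical.

```r
# Sketch: compare a raw-scale and a log-scale multiple regression.
# Simulated data only (NOT the study data); the response is generated to be
# right-skewed and strictly positive so that log() is defined.
set.seed(3)
n <- 120
demo <- data.frame(
  age = runif(n, 15, 64),
  sbp = rnorm(n, mean = 135, sd = 20),
  BMI = rnorm(n, mean = 26, sd = 4)
)
demo$ldl <- exp(0.5 + 0.03 * demo$BMI + rnorm(n, sd = 0.3))

fit_raw <- lm(ldl ~ age + sbp + BMI, data = demo)
fit_log <- lm(log(ldl) ~ age + sbp + BMI, data = demo)

# Compare how close each set of residuals is to normality,
# for example with a Shapiro-Wilk test or Q-Q plots.
p_raw <- shapiro.test(residuals(fit_raw))$p.value
p_log <- shapiro.test(residuals(fit_log))$p.value
```

Coefficients from the log-scale model describe multiplicative rather than additive changes in LDL, so they are not directly comparable to those in Table 5.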
4 R-Code
# ############################## #
# matters of the heart project   #
# ############################## #
# file.choose() opens a file picker: choose the CHD sample first,
# then the sample without CHD.
chd <- read.csv(file.choose(), header = TRUE)
chdnull <- read.csv(file.choose(), header = TRUE)
#----- see data structure
# chd
quantile(chd$sbp, probs = c(0, 0.25, 0.50, 0.75, 1))
mean(chd$sbp, trim = 0.25) # 139.8375 (drops 25% of the values from each end)
var(chd$sbp) # 560.6225 (var() has no trim argument: full sample)
quantile(chd$BMI, probs = c(0, 0.25, 0.50, 0.75, 1))
mean(chd$BMI, trim = 0.25) # 26.38325
var(chd$BMI) # 19.28372
# chd null
quantile(chdnull$sbp, probs = c(0, 0.25, 0.50, 0.75, 1))
mean(chdnull$sbp, trim = 0.25) # 132.5461
var(chdnull$sbp) # 323.4585
quantile(chdnull$BMI, probs = c(0, 0.25, 0.50, 0.75, 1))
mean(chdnull$BMI, trim = 0.25) # 25.36599 (Table 1)
var(chdnull$BMI) # 16.73416 (Table 1)
#----- multiple linear regression
mod <- lm(ldl ~ age + sbp + BMI, data = chd)
par(mfrow = c(2, 2)) # 2 x 2 grid so the four diagnostic plots fit on one page
plot(mod)
summary(mod)
anova(mod)
#------ hypothesis test
n <- 160 # number of men with CHD
m <- 302 # number of men without CHD
# sbp: unequal variances, equation (1)
S1 <- var(chd$sbp[1:n])     # sample variance, divisor n - 1
S2 <- var(chdnull$sbp[1:m]) # sample variance, divisor m - 1
T1 <- (mean(chd$sbp) - mean(chdnull$sbp)) / sqrt(S1 / n + S2 / m)
# reported: 8.874964
# BMI: pooled variance, equation (3)
S3 <- sum((chd$BMI[1:n] - mean(chd$BMI[1:n]))^2) / (n + m - 2)
S4 <- sum((chdnull$BMI[1:m] - mean(chdnull$BMI[1:m]))^2) / (n + m - 2)
T2 <- (mean(chd$BMI) - mean(chdnull$BMI)) / sqrt((1 / n + 1 / m) * (S3 + S4))
# reported: 16.28965
#---test results
# pt() returns the probability P(t <= 0.975) for 198 degrees of freedom;
# a critical value would instead come from qt(0.975, df).
t1 <- pt(0.975, 200 - 2, lower.tail = TRUE) # 0.834625
T1 - t1 # reported: 16.89937
T2 - t1 # reported: 3.83544
References
[1] http://blood.ygoy.com
[2] http://www.cdc.gov
[3] http://www.who.int
[4] http://www.heartfailurematters.org
[5] http://www.nhlbi.nih.gov
[6] http://www.nhlbi.nih.gov
[7] http://www.heart.org
[8] http://www.heart.org
[9] https://www.nhlbi.nih.gov
[10] http://web.as.uky.edu
[11] George Casella and Roger L. Berger, Statistical Inference, 2nd edition, Wadsworth Group,
2002.
[12] http://www.cvtoolbox.com
[13] http://www.webmd.com
[14] http://www.bloodpressureuk.org