Determinants of Dying from Coronary Artery Disease

Submitted to: Dr. Jackie Khorassani
Instructor of Econ 421
December 6, 2010

Introduction:

In 2001, 700,000 people died from coronary artery disease. Coronary artery disease is the hardening of the arteries near the heart. This hardening can lead to reduced blood flow, heart attacks, and death (“What is Coronary...” 2003). Why are so many people dying from coronary artery disease? Using OLS regression analysis of 49 observations from the year 2000, I will examine thirteen variables that may affect coronary artery disease. The thirteen variables are education level, income level, lack of health insurance, state health expenditures, alcohol consumption, inactivity, state stress levels, tobacco consumption, average age of the population, diabetes, high blood cholesterol, high blood pressure, and obesity.

This study is organized in seven sections. Section one is the introduction. Section two describes each variable and the reasons for its inclusion in my model. Section three discusses the meaning of the raw data. Sections four and five test my regression analysis for common errors. Section six discusses the significance of each variable and the effects of each significant variable. Section seven concludes with a brief wrap-up of this study.

Empirical Model:

For the purpose of measuring the effects of thirteen factors on the death rate from coronary artery disease among the US population in 2001, Equation 1 is estimated with a cross-sectional data set that consists of 49 observations from 49 US states.¹ The method of estimation is OLS, and the estimation software is EViews:

Equation 1: LCAD = F(EDU, INC, LHI, SHE, ALC, INY, SSL, TOB, AAP, DIB, HBC, HBP, OBY) + error term

The dependent variable, LCAD, is the number of deaths from coronary artery disease per 100,000 population. Table 1 includes the definition of each independent variable and its expected effect on the dependent variable.

¹ Florida was excluded for lack of state health expenditure data.

Table 1: Independent Variables

Variable | Definition | Expected Sign of the Coefficient
EDU | percentage of the population aged 25 and older that has a college degree | negative
INC | per capita disposable personal income in current dollars | negative
LHI | percentage of the population without health insurance | positive
SHE | per capita dollars spent by the state on health care | negative
ALC | average gallons of beer consumed per person | ambiguous
INY | percentage of adults with no leisure-time physical activity | positive
SSL | percentage of the population of each state living in metropolitan areas | ambiguous
TOB | percentage of the population that reports having smoked 100 or more cigarettes during their lifetime and currently smokes every day or some days | positive
AAP | average age of the population | positive
DIB | percentage of the population that has been diagnosed with diabetes | positive
HBC | percentage of the population with high cholesterol | positive
HBP | percentage of adults with high blood pressure | positive
OBY | percentage of adults who are obese | positive

The nature of my thirteen independent variables allows me to group them into three categories: economic, lifestyle, and medical/genetic independent variables. The economic independent variables are EDU (education), INC (income), LHI (lack of health insurance), and SHE (state health expenditures). The lifestyle independent variables are ALC (alcohol), INY (inactivity), SSL (state stress levels), and TOB (tobacco). The medical/genetic independent variables are AAP (average age of the population), DIB (diabetes), HBC (high blood cholesterol), HBP (high blood pressure), and OBY (obesity).

Economic Independent Variables:

The first economic variable is EDU, or education. EDU measures the percentage of the population aged 25 and older that has a college degree in each state in the year 2000. The effect of education on the number of cases of lethal coronary artery disease overlaps with the effect of many of the other variables in my study. The Tromso Heart Study (1988) found that more educated people are less likely to be overweight, seem to smoke less, are more physically active, and have better diets (“Risk factors for...” 1988). That is why I expect the sign of the coefficient of EDU to be negative.

The second economic variable is INC, or income. More specifically, INC measures the per capita disposable personal income in current dollars in each state in the year 2000. A Canadian health study (2001) shows that, in general, the higher the income level, the more likely a person is to have an active lifestyle and a healthy weight, and the less likely to smoke or to drink dangerous amounts of alcohol (“Health and Wealth...” 2001). Based on this study, I expect the sign of the coefficient of INC to be negative.

The third economic variable is LHI, or lack of health insurance. LHI measures the percentage of the population who did not have health insurance in each state in 2002. According to a 2003 report by the Robert Wood Johnson Foundation, the uninsured are more likely than those who have health coverage to receive second-rate care and to die from health-related problems (Anil Kumar 2004). Given these findings, I expect the sign of the coefficient of LHI to be positive.

The fourth economic variable is SHE, or state health care expenditures. SHE measures the per capita dollars spent by the state on health care in each state in the year 2000. The effect of this variable on the death rate from coronary artery disease is similar to the effects of income and health insurance. State health care expenditures can take different forms, such as funding for clinics and hospitals, state-funded insurance, and funds given directly to citizens to spend on health care. Money spent on clinics and hospitals improves the quality of the care provided, which should decrease lethal coronary artery disease. State-funded health insurance and funds given to citizens increase the quantity of health care a person can afford, which should mean fewer cases of lethal coronary artery disease. Due to these effects, I expect the sign of the coefficient of SHE to be negative.

Lifestyle Independent Variables:

The first lifestyle variable is ALC, or alcohol consumption. More specifically, ALC measures the average gallons of beer consumed per person in each state in the year 2000. The effects of alcohol on the heart are uncertain. The Cleveland Clinic Heart Center (2004) has found that “moderate alcohol consumption (wine or beer) does offer some protection against heart disease for some people” (“Heart Disease: Alcohol...” 2004). Alcohol’s toxic effects, however, may be dangerous to the heart. The same article warns that those who already have heart disease should avoid alcohol, and it advises non-drinkers not to start drinking, because the same benefits can be obtained through healthy eating and exercise (“Heart Disease: Alcohol...” 2004). Due to the uncertainty of the effects of alcohol, I expect the sign of the coefficient of ALC to be ambiguous.

The second lifestyle variable is INY, or inactivity. This variable is measured as the percentage of adults who reported no leisure-time physical activity in each state in 2000. Inactivity prevents the heart from benefiting from exercise. Exercise has numerous benefits for the heart, including strengthening the heart and cardiovascular system, improving circulation and helping the body use oxygen better, improving heart failure symptoms, lowering blood pressure, and helping reduce stress, tension, anxiety, and depression (“Heart Disease: Exercise...” 2004). Because being active clearly benefits the heart, I predict that the sign of the coefficient of INY will be positive.

The third lifestyle variable is SSL, or state stress levels. SSL is measured as the percentage of the population of each state living in metropolitan areas in 2000. This is not a perfect measure of stress. It does not include sources of stress other than living in a city, such as the number of children per couple, the nature of people’s jobs, or how well people respond to stressful situations. Data for these and other sources of stress are not available; therefore, I was unable to include them in my measure of stress. I also have to consider that people in metropolitan areas most likely have better access to health care, which may affect the results for this variable. The connection between stress and heart health has not been proven because, according to an article by the Texas Heart Institute (2004) on heart disease, people define and respond to stress in different ways (“Causes of Heart Disease.” 2004). It is hard to determine why stress may be damaging to the heart. In general, however, the article points to three effects of stress that would damage the heart: 1) stressful situations increase heart rate and blood pressure, which makes the heart demand additional oxygen; 2) during stress, extra hormones are released, which causes blood pressure to increase; and 3) stress increases the amount of clotting agents flowing in the blood. The need for additional oxygen can cause angina (pain in and around the heart) in persons with preexisting heart disease.
Angina can damage the heart and blood vessels further, leading to hardening of the arteries. The increase in blood pressure can damage artery walls. When this damage heals, the arteries may become hard and more prone to collect plaque. Additional clotting agents in the blood make a clot more likely to form in arteries that are already partially blocked by plaque (“Causes of Heart Disease.” 2004). Considering these effects of stress on the heart, but keeping in mind the issues of measuring stress in this manner, I expect the sign of the coefficient of SSL to be ambiguous.

The fourth lifestyle variable is TOB, or tobacco consumption. TOB is measured as the percentage of the population that reports having smoked 100 or more cigarettes during their lifetime and currently smokes every day or some days in each state as of the year 2000. There is a strong link between smoking and developing lethal coronary artery disease. According to The Cleveland Clinic, smoking increases the risk of coronary artery disease in four ways: 1) decreased oxygen to the heart, 2) increased blood pressure and heart rate, 3) increased blood clotting, and 4) damage to the cells that line the coronary arteries and other blood vessels (“Heart Disease: Smoking...” 2004). I have already established that these effects damage the heart and lead to more cases of lethal coronary artery disease. That is why I expect the sign of the coefficient of TOB to be positive.

Medical/Genetic Independent Variables:

The first medical/genetic variable is AAP, or average age of the population. AAP is measured exactly as it sounds: the average age of the population in every state in 2000. It is common sense that the older you are, the more health problems you are likely to have. Statistics also show that about 80% of the people who die from coronary artery disease are age 65 and older (“Coronary.” 2005). Therefore, I predict that the sign of the coefficient of AAP is positive.

The second medical/genetic variable is DIB, or diabetes. This is measured as the percentage of the population that has been diagnosed with diabetes in each state in the year 2001. The reasons why diabetes increases cases of coronary artery disease are not completely understood. However, according to The Cleveland Clinic, the high glucose levels in the blood from diabetes may damage the small blood vessels of the heart and predispose a person to atherosclerosis (hardening) of the large arteries (“Diabetes...” 2004). Since this is the definition of coronary artery disease, I expect the sign of the coefficient of DIB to be positive.

The third medical/genetic variable is HBC, or high blood cholesterol. HBC is measured as the percentage of the adult population in each state that reported having high cholesterol in 2001. According to the National Heart, Lung, and Blood Institute (2003), too much cholesterol in the blood can build up in the walls of the arteries. This buildup of cholesterol is called plaque. Over time, plaque can cause hardening of the arteries (“What is Coronary...” 2003). Given that coronary artery disease is defined as hardening of the arteries, high cholesterol is clearly a cause of coronary artery disease (“What is Coronary...” 2003). Therefore, I expect the sign of the coefficient of HBC to be positive.

The fourth medical/genetic variable is HBP, or high blood pressure. This is measured as the percentage of adults who have ever been told by a health-care provider that they have high blood pressure. High blood pressure may be caused by smoking, excessive alcohol consumption, inactivity, and obesity, all of which are among my thirteen independent variables (“Heart Disease: Risk ...” 2004). However, to a certain extent high blood pressure seems to be genetic, and it may not be a good indicator of the overall health of the circulatory system. Even so, its long list of possible causes makes it the most common coronary artery disease risk factor, so it should still have a direct impact.

High blood pressure increases cases of lethal coronary artery disease for two reasons: 1) it makes the heart work harder to supply the body with blood, and 2) it contributes to the hardening of the arteries.

Why does it make the heart work harder? First, for clarification, high blood pressure causes the heart to work harder, but the heart working harder does not necessarily cause high blood pressure. Blood pressure is determined by two forces: 1) the pumping of the heart, and 2) the force of the arteries resisting the blood flow (“Blood Pressure” 2004). In most cases, it is increased resistance to blood flow from the arteries that causes high blood pressure. As the resistance to blood flow increases, the heart must work harder to accomplish its job, and a harder-working heart has a shorter life. Worse, the increase in resistance to blood flow happens when the arteries are damaged and harden. High blood pressure, therefore, can be considered a sign that some coronary artery disease may be present. Therefore, I predict that the sign of the coefficient of HBP is positive.

The fifth medical/genetic variable is OBY, or obesity. OBY is measured as the percentage of adults who were obese in each state in 2001. Since inactivity can cause obesity, the two share the same links to coronary artery disease, but obesity has additional effects. A study by the American Heart Association (1997) finds that obesity is connected to heart disease both indirectly (through other factors) and directly (“Obesity and Heart Disease” 1997). Considering this evidence, I expect the sign of the coefficient of OBY to be positive.

Data Analysis:

Table 2 shows the lowest values and their corresponding states, the highest values and their corresponding states, and the mean values for each variable.

Table 2: Data Analysis: Maximum, Minimum, and Mean

Variable | Minimum | Maximum | Mean
LCAD | 171.0 (Minnesota) | 329.0 (Mississippi) | 238.12
EDU | 15.3% (West Virginia) | 34.6% (Colorado) | 25.0%
INC | $19,258 (West Virginia) | $32,556 (Connecticut) | $24,076
LHI | 7.9% (Minnesota) | 21.1% (New Mexico) | 13.8%
SHE | $504.09 (Nevada) | $2,001.49 (Alaska) | $970.55
ALC | 12.59 gallons (Utah) | 33.09 gallons (Nevada) | 22.87 gallons
INY | 15.5% (Utah) | 41.1% (Kentucky) | 26.8%
SSL | 27.8% (Vermont) | 100.0% (New Jersey) | 67.2%
TOB | 12.9% (Utah) | 29.1% (Nevada) | 22.9%
AAP | 27.1 years (Utah) | 38.9 years (West Virginia) | 35.5 years
DIB | 2.71% (Alaska) | 6.08% (West Virginia) | 4.37%
HBC | 24.8% (New Mexico) | 37.7% (West Virginia) | 30.5%
HBP | 14.0% (Arizona) | 31.6% (Alabama) | 24.6%
OBY | 13.8% (Colorado) | 24.3% (Mississippi) | 19.5%

Note: LCAD = lethal coronary artery disease (deaths per 100,000 population); EDU = education; INC = income; LHI = lack of health insurance; SHE = state health expenditures; ALC = alcohol consumption; INY = inactivity; SSL = state stress levels; TOB = tobacco consumption; AAP = average age of the population; DIB = diabetes; HBC = high blood cholesterol; HBP = high blood pressure; OBY = obesity.

According to my data, the worst state to live in when worried about lethal coronary artery disease is Mississippi, and the best is Minnesota. The difference between these two extremes is 158 deaths per 100,000 people.

The most striking observation involves West Virginia. This state appears five times in Table 2, and each time it is bad news with respect to lethal coronary artery disease. West Virginia has the minimum for EDU (education), a variable expected to have a negative effect on LCAD (lethal coronary artery disease). It also has the minimum for INC (income), which is closely related to EDU and is also expected to have a negative effect on LCAD. West Virginia next appears as the maximum for AAP (average age of the population), a variable expected to affect LCAD positively. The state also appears as the maximum for DIB (diabetes) and HBC (high blood cholesterol), which are also expected to affect LCAD positively.
Each appearance as a maximum or minimum suggests that West Virginia should be more likely to have lethal cases of coronary artery disease. With this much working against it, one would expect West Virginia to have the maximum LCAD (lethal coronary artery disease), but it does not. However, it comes in second from the maximum at 296 per 100,000 people, only 33 per 100,000 people below Mississippi.

Another state that stands out in Table 2 is Utah. Utah holds the minimum for four variables: ALC (alcohol), INY (inactivity), TOB (tobacco), and AAP (average age of the population). INY, TOB, and AAP are expected to have a positive effect on LCAD, and the effect of ALC is ambiguous. With Utah holding the minimum for this many variables that are expected to raise LCAD, it is likely that Utah has a very low death rate from coronary artery disease. In fact, it is third from the minimum at 185.2 per 100,000 people, just 14.2 per 100,000 people above Minnesota.

Alaska is also interesting. It holds the maximum for state health care expenditures and the minimum for diabetes. New Mexico appears twice in Table 2, first as the maximum for lack of health insurance and second as the minimum for high blood cholesterol. Nevada appears three times: as the minimum for state health care expenditures and as the maximum for both alcohol and tobacco consumption. Colorado also appears twice, holding the minimum for obesity and the maximum for education.

Multicollinearity Test:

Any equation must be tested for problems that may affect the results of the estimation. One such problem is multicollinearity. A. H. Studenmund (2001) states that multicollinearity is either perfect or imperfect (A. H. Studenmund 2001). Perfect multicollinearity is a violation of the classical assumption that no independent variable is a perfect linear function of any other independent variable(s). With perfect multicollinearity, the coefficients of the variables involved cannot be determined, and their standard errors are infinite. Imperfect multicollinearity occurs when the linear relationship between two or more independent variables is strong enough to affect the estimation results. Imperfect multicollinearity results in increased variances and standard errors of the coefficients and decreased t-statistics. Multicollinearity, however, does not bias the coefficients of the equation, and the overall fit of the equation is largely unaffected.

The test for multicollinearity involves examining the correlation coefficients. A correlation coefficient between two independent variables is considered a problem only if its absolute value is higher than 0.7 and higher than the absolute values of the correlations between the dependent variable and each of the two independent variables.

Table 3: Correlation Coefficients

Variable | LCAD | EDU | INC | LHI | SHE | ALC | INY | SSL | TOB | AAP | DIB | HBC | HBP | OBY
LCAD | 1 | -0.513 | -0.260 | 0.150 | 0.092 | -0.079 | 0.706 | 0.070 | 0.587 | 0.195 | 0.831 | 0.478 | 0.530 | 0.631
EDU | | 1 | 0.761†* | -0.271 | 0.127 | -0.229 | -0.373 | 0.444 | -0.556 | -0.018 | -0.475 | -0.270 | -0.461 | -0.597
INC | | | 1 | -0.260 | 0.219 | -0.172 | -0.281 | 0.682‡* | -0.241 | 0.141 | -0.256 | 0.042 | -0.225 | -0.512
LHI | | | | 1 | -0.064 | 0.116 | 0.059 | -0.002 | 0.050 | -0.429* | 0.044 | 0.010 | 0.064 | 0.189
SHE | | | | | 1 | -0.153* | 0.069 | 0.181* | 0.122 | 0.193 | 0.069 | -0.134 | 0.010 | 0.026
ALC | | | | | | 1 | 0.061 | -0.318* | 0.348 | 0.193 | -0.077 | -0.002 | 0.072 | -0.052
INY | | | | | | | 1 | -0.135 | 0.463 | 0.230 | 0.604 | 0.188 | 0.233 | 0.497
SSL | | | | | | | | 1 | -0.144 | -0.153 | 0.164 | 0.138 | 0.112 | -0.164
TOB | | | | | | | | | 1 | 0.386 | 0.442 | 0.368 | 0.583 | 0.482
AAP | | | | | | | | | | 1 | 0.318 | 0.254 | 0.101 | -0.075
DIB | | | | | | | | | | | 1 | 0.426 | 0.546 | 0.596
HBC | | | | | | | | | | | | 1 | 0.448 | 0.273
HBP | | | | | | | | | | | | | 1 | 0.482
OBY | | | | | | | | | | | | | | 1

Note: Only correlations between independent variables are marked. † = absolute value above 0.7; ‡ = absolute value close to 0.7; * = absolute value larger than the correlations of both variables with LCAD.

As Table 3 shows, there is only one clear multicollinearity problem among my thirteen independent variables. The high correlation coefficient of 0.761 is between INC (income) and EDU (education), which is clearly higher than 0.7. This correlation coefficient is also larger than the absolute value of the correlation coefficient between LCAD (lethal coronary artery disease) and EDU, and larger than the absolute value of the correlation coefficient between LCAD and INC. These results show that there is a severe multicollinearity problem between INC and EDU.

The correlation coefficient between SSL (state stress levels) and INC (income), 0.682, is too near 0.7 to ignore. The issue is amplified because the correlation coefficients between LCAD and SSL and between LCAD and INC are considerably smaller than the correlation coefficient between SSL and INC. These results show there may be a severe multicollinearity problem between SSL and INC.

The correlation coefficients between ALC (alcohol) and SHE (state health care expenditures), SSL (state stress levels) and SHE, SSL and ALC, and AAP (average age of the population) and LHI (lack of health insurance) may also indicate multicollinearity problems.
Each of these coefficients is larger in absolute value than the correlation coefficients between the two independent variables involved and the dependent variable. These results show there may be a multicollinearity problem within each of these pairs. Considering the relatively small size of these correlation coefficients, the estimates should not be affected significantly; therefore, I am not doing anything to fix these problems.

In order to limit the effects of multicollinearity, Equation 1 will be split into two variations: Equation 1-A and Equation 1-B. Equation 1-A will exclude EDU (education) and SSL (state stress levels), and Equation 1-B will exclude INC (income).

Heteroskedasticity Test:

Heteroskedasticity is another issue to deal with when estimating an equation. As defined by A. H. Studenmund (2001), heteroskedasticity is a violation of the classical assumption that the observations of the error term are drawn from a distribution that has a constant variance (A. H. Studenmund 2001). There are two types of heteroskedasticity: pure and impure. Pure heteroskedasticity occurs when the assumption is violated even though the equation is correctly specified. Correctly specified means there are no irrelevant or omitted variables, the functional form is correct (linear), and there are no sample errors. In the case of pure heteroskedasticity, the coefficients of the variables are not biased, but OLS typically underestimates their standard errors, so the t-statistics are bigger than they should be, which increases the chance that a variable will be considered significant. Impure heteroskedasticity occurs when the equation is not correctly specified (i.e., irrelevant or omitted variables, wrong functional form, sample errors). The results of impure heteroskedasticity are biased variable coefficients and incorrect standard errors.

In order to test for heteroskedasticity, I am using the White test, named after its creator, Halbert White. The White test has three steps. The first step is to obtain the residuals of Equation 1-A. The second step is to use the squared residuals as the dependent variable in a second (auxiliary) equation. The independent variables of the second equation are the independent variables of Equation 1-A, the squares of those variables, and the products of each pair of independent variables of Equation 1-A. However, the White test, although considered the best test for cross-sectional equations, does have one flaw: it cannot be used if the second equation has more variables than there are observations. I have 49 observations, but the second equation for Equation 1-A has more than 49 variables. The only way for the White test to work here is to use its other form, which drops the cross products and uses only the independent variables and their squares. Here, I have the same 49 observations but only 22 variables (the 11 independent variables of Equation 1-A and their squares). This form of the test is also sufficient to determine whether there is a heteroskedasticity problem. The third step is to multiply the number of observations by the unadjusted R² of the second equation (n*R²). The decision rule is that if n*R² is greater than the critical chi-squared value, there is a heteroskedasticity problem. For Equation 1-A, n*R² = 24.26, while the critical chi-squared value at the 5% level with 22 degrees of freedom is 33.92. Repeating the three steps for Equation 1-B, which also uses the simple version of the White test, gives n*R² = 27.24, against a critical chi-squared value of 36.41 with 24 degrees of freedom. By the decision rule, there is no serious problem with heteroskedasticity in either Equation 1-A or Equation 1-B.

Empirical Estimation Results:

Table 4 reports the results of the estimation of Equation 1-A and Equation 1-B.

Table 4: Estimation Results for Equation 1-A and Equation 1-B

Independent Variable | Equation 1-A | Equation 1-B | Expected Sign of Coefficient
Intercept | 40.25261 (0.499160) | 56.58221 (0.657541) |
EDU | excluded | 40.05244 (0.379967) | negative
INC | 0.000110 (0.113182) | excluded | negative
LHI | 56.77456 (0.712506) | 55.23382 (0.686585) | positive
SHE | 0.004282 (0.430282) | 0.005972 (0.596954) | negative
ALC | -1.223578 (-1.595043) | -1.307321 (-1.675342) | ambiguous
INY | 199.6210 (2.969297) | 186.9634 (2.654275) | positive
SSL | excluded | -13.09042 (-0.653808) | ambiguous
TOB | 347.0400 (2.599307) | 374.4734 (2.549625) | positive
AAP | -2.839518 (-1.438962) | -3.610169 (-1.558149) | positive
DIB | 2676.302 (4.645088) | 2953.804 (4.038012) | positive
HBC | 246.6707 (1.824513) | 269.3550 (2.008087) | positive
HBP | -41.07500 (-0.334783) | -38.39052 (-0.310557) | positive
OBY | -14.01551 (-0.085363) | -63.05713 (-0.372228) | positive
Adjusted R² | 0.797766 | 0.794527 |

Note: t-statistics are in parentheses.

As Table 4 shows, the adjusted R² is 0.798 for Equation 1-A and 0.795 for Equation 1-B. According to A. H. Studenmund (2001), the closer the adjusted R² is to 1, the more closely the estimated equation fits the data (A. H. Studenmund 2001). Therefore, the fit is quite strong for both equations and slightly stronger for Equation 1-A.

In order to test the hypothesis that each coefficient has a significant impact on LCAD (lethal coronary artery disease), I use the t-test. For the coefficients whose signs are expected to be positive or negative, I use a one-sided test. The decision rule for a one-sided test is that if the absolute value of the t-statistic (in Table 4) is greater than the critical t-value, the coefficient is significant. For coefficients whose expected signs are ambiguous, I use a two-sided test. The decision rule for a two-sided test is that if the t-statistic is greater than the positive critical t-value or less than the negative critical t-value, the coefficient is significant. Out of the thirteen variables, eight are not significant.
These eight are LHI (lack of health insurance), HBP (high blood pressure), OBY (obesity), EDU (education), INC (income), SHE (state health expenditures), ALC (alcohol), and SSL (state stress levels). All eight were tested at the 90% level of certainty and found to be insignificant.
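
The decision rules above can be sketched in a few lines of Python. This is a minimal illustration, not the EViews procedure used for the paper: it hard-codes the t-statistics from Table 4 and the one-sided critical values quoted in the next paragraphs (2.704, 2.423, 1.684, and 1.303, which match a t-table row for about 40 degrees of freedom). It assumes that ambiguous coefficients are tested two-sided at the 90% level with a critical value of 1.684 (5% in each tail), and it reports separately whether each estimated sign agrees with the expected one. The names `classify`, `EQ_1A`, and `EQ_1B` are hypothetical.

```python
"""Sketch: apply the t-test decision rules to the t-statistics in Table 4."""

# One-sided critical t-values quoted in the text (about 40 degrees of freedom).
ONE_SIDED = [(99.5, 2.704), (99.0, 2.423), (95.0, 1.684), (90.0, 1.303)]
# Assumption: a two-sided test at the 90% level puts 5% in each tail.
TWO_SIDED_90 = 1.684

# (t-statistic, expected sign) for each coefficient, copied from Table 4.
EQ_1A = {
    "INC": (0.113182, "negative"), "LHI": (0.712506, "positive"),
    "SHE": (0.430282, "negative"), "ALC": (-1.595043, "ambiguous"),
    "INY": (2.969297, "positive"), "TOB": (2.599307, "positive"),
    "AAP": (-1.438962, "positive"), "DIB": (4.645088, "positive"),
    "HBC": (1.824513, "positive"), "HBP": (-0.334783, "positive"),
    "OBY": (-0.085363, "positive"),
}
EQ_1B = {
    "EDU": (0.379967, "negative"), "LHI": (0.686585, "positive"),
    "SHE": (0.596954, "negative"), "ALC": (-1.675342, "ambiguous"),
    "INY": (2.654275, "positive"), "SSL": (-0.653808, "ambiguous"),
    "TOB": (2.549625, "positive"), "AAP": (-1.558149, "positive"),
    "DIB": (4.038012, "positive"), "HBC": (2.008087, "positive"),
    "HBP": (-0.310557, "positive"), "OBY": (-0.372228, "positive"),
}


def classify(t, expected):
    """Return (highest level at which |t| beats the critical value, sign agrees?).

    The level is None when no critical value is exceeded. Sign agreement is
    None for coefficients whose expected sign is ambiguous.
    """
    if expected == "ambiguous":
        return (90.0 if abs(t) > TWO_SIDED_90 else None), None
    sign_agrees = (t > 0) == (expected == "positive")
    for level, critical in ONE_SIDED:
        if abs(t) > critical:
            return level, sign_agrees
    return None, sign_agrees


# Print the result for every variable in each equation that includes it.
for name in sorted(set(EQ_1A) | set(EQ_1B)):
    row = {label: classify(*table[name])
           for label, table in (("1-A", EQ_1A), ("1-B", EQ_1B)) if name in table}
    print(name, row)
```

With these inputs, DIB, INY, TOB, and HBC clear their critical values with the expected sign in both equations (INY in Equation 1-A even clears 2.704, but not in Equation 1-B), while AAP clears 1.303 only in absolute value, with the opposite sign to the one expected.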

There are five significant variables, all of which are expected to affect lethal coronary artery disease positively. In interpreting their coefficients, note that the percentage variables enter the regression as proportions (for example, 0.0437 rather than 4.37 for the mean of DIB); only on that basis do the Table 4 coefficients, evaluated at the Table 2 means, reproduce the mean LCAD of about 238. A one percentage point increase in a variable therefore corresponds to a change of 0.01, and its effect on LCAD is the coefficient divided by 100.

For the 99.5% level of certainty, the critical t-value is 2.704. Of the five significant independent variables, only DIB (diabetes) is significant at this level in both equations. This means that, at the 99.5% level, every additional percentage point of the population that has been diagnosed with diabetes is associated with roughly 26.8 to 29.5 more deaths from coronary artery disease per 100,000 people (coefficients of 2,676.302 to 2,953.804).

For the 99% level of certainty, the critical t-value is 2.423. At this level, INY (inactivity) and TOB (tobacco) are significant in both equations. For INY, every additional percentage point of the population with no leisure-time physical activity is associated with roughly 1.9 to 2.0 more deaths per 100,000 people (coefficients of 186.9634 to 199.6210). For TOB, every additional percentage point of the population that has smoked 100 or more cigarettes and currently smokes is associated with roughly 3.5 to 3.7 more deaths per 100,000 people (coefficients of 347.0400 to 374.4734).

For the 95% level of certainty, the critical t-value is 1.684. At this level, HBC (high blood cholesterol) is significant in both equations: every additional percentage point of the population with high cholesterol is associated with roughly 2.5 to 2.7 more deaths per 100,000 people (coefficients of 246.6707 to 269.3550).

For the 90% level of certainty, the critical t-value is 1.303. At this level, AAP (average age of the population) is significant in both equations. This poses a problem: the coefficient of AAP is expected to be positive, but the estimated coefficient is negative. One explanation is an omitted variable.
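
How an omitted variable can flip a sign can be illustrated with a small simulation on synthetic data; none of the numbers below come from the paper's data set. A hypothetical "quality of care" variable is built to be positively correlated with age and to lower the death rate, while age itself has a positive true effect. Regressing the death rate on age alone then yields a negative slope, equal (up to sampling noise) to the true age effect plus the omitted coefficient times the slope of quality on age. The variable names and the coefficients 2.0 and -5.0 are illustrative assumptions.

```python
"""Sketch: omitted-variable bias flipping the sign of a coefficient (synthetic data)."""
import random


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x with an intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


random.seed(42)
N = 5000
BETA_AGE = 2.0       # hypothetical true effect of average age (positive)
BETA_QUALITY = -5.0  # hypothetical true effect of care quality (negative)

age = [random.gauss(0.0, 1.0) for _ in range(N)]
# Quality of care is positively related to age by construction (r > 0).
quality = [0.8 * a + random.gauss(0.0, 0.6) for a in age]
deaths = [BETA_AGE * a + BETA_QUALITY * q + random.gauss(0.0, 1.0)
          for a, q in zip(age, quality)]

# Short regression: quality is omitted, so its effect loads onto age.
short_slope = ols_slope(age, deaths)
# Bias = (coefficient of the omitted variable) * (slope of omitted on included).
predicted = BETA_AGE + BETA_QUALITY * ols_slope(age, quality)

print(f"slope on age with quality omitted: {short_slope:.2f}")
print(f"true effect plus predicted bias:   {predicted:.2f}")
```

Because the omitted variable's coefficient is negative and its correlation with the included variable is positive, the bias is negative and large enough to reverse the sign of a genuinely positive effect.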
An omitted variable biases the estimated coefficient of an included variable when the omitted variable affects the dependent variable and is correlated with the included variable. The bias is negative, which can turn an expected positive sign negative, if one of the following is true: the correlation coefficient (r) between the omitted variable and AAP is negative and the coefficient (β) of the omitted variable is positive, or r is positive and β is negative. One possible omitted variable is a measure of the quality of health care, which would have a positive r and a negative β. Of my thirteen variables, only SHE (state health care expenditures) captures any part of the quality of health care, and that is only a small part of what that variable measures. That is why I believe the incorrect sign comes from the omitted variable, quality of health care.

Conclusion:

In brief, this paper investigates the effects of thirteen variables on lethal coronary artery disease through OLS regression analysis using 49 observations from the year 2000. Equation 1 was tested for multicollinearity and heteroskedasticity. There was no problem with heteroskedasticity, but there was a problem with multicollinearity. In order to fix it, Equation 1 was split into two variations: Equation 1-A and Equation 1-B. Each new equation was estimated, resulting in a very good fit (high adjusted R²), with Equation 1-A fitting slightly better.

Out of the thirteen variables, eight were insignificant. These variables are LHI (lack of health insurance), HBP (high blood pressure), OBY (obesity), EDU (education), INC (income), SHE (state health expenditures), ALC (alcohol), and SSL (state stress levels). That leaves five variables that significantly affect lethal coronary artery disease: DIB (diabetes), INY (inactivity), TOB (tobacco), HBC (high blood cholesterol), and AAP (average age of the population). AAP is the only variable that is significant with the opposite sign from the one expected. I attribute this to an omitted variable, namely, the quality of health care. My results show that in order to avoid coronary artery disease, the best strategy is to avoid getting diabetes, be more active, smoke less, and control cholesterol levels.
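
The size of these effects can be checked with a few lines of arithmetic. The sketch below assumes that the percentage variables entered the regression as proportions (4.37% as 0.0437). Under that assumption, plugging the Table 2 means into the Equation 1-A coefficients from Table 4 returns a fitted LCAD close to the Table 2 mean of 238.12 (the small gap presumably reflects rounding in Table 2), whereas treating the percentages as whole numbers gives an impossible value. Each coefficient times 0.01 is then the change in deaths per 100,000 for a one percentage point change. The names `COEF_1A`, `MEANS`, and `fitted` are hypothetical.

```python
"""Sketch: check the units of the Equation 1-A coefficients and convert them
to effects per percentage point (assumes percentages entered as proportions)."""

# Equation 1-A coefficients from Table 4.
COEF_1A = {
    "const": 40.25261, "INC": 0.000110, "LHI": 56.77456, "SHE": 0.004282,
    "ALC": -1.223578, "INY": 199.6210, "TOB": 347.0400, "AAP": -2.839518,
    "DIB": 2676.302, "HBC": 246.6707, "HBP": -41.07500, "OBY": -14.01551,
}
# Sample means from Table 2; percentages written as proportions (assumption).
MEANS = {
    "INC": 24076, "LHI": 0.138, "SHE": 970.55, "ALC": 22.87, "INY": 0.268,
    "TOB": 0.229, "AAP": 35.5, "DIB": 0.0437, "HBC": 0.305, "HBP": 0.246,
    "OBY": 0.195,
}
PERCENT_VARS = {"LHI", "INY", "TOB", "DIB", "HBC", "HBP", "OBY"}


def fitted(coef, x):
    """Intercept plus the sum of coefficient * value over the regressors in x."""
    return coef["const"] + sum(coef[name] * value for name, value in x.items())


at_means = fitted(COEF_1A, MEANS)
# Same calculation if the percentages had been entered as whole numbers (4.37).
as_whole_numbers = fitted(
    COEF_1A, {k: v * 100 if k in PERCENT_VARS else v for k, v in MEANS.items()})

print(f"fitted LCAD at the means (proportions):   {at_means:.1f}")
print(f"fitted LCAD at the means (whole numbers): {as_whole_numbers:.1f}")

# A one percentage point increase is +0.01 in proportion terms.
per_point = {name: COEF_1A[name] * 0.01 for name in ("DIB", "INY", "TOB", "HBC")}
for name, effect in per_point.items():
    print(f"{name}: {effect:.2f} more deaths per 100,000 per percentage point")
```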
If this study is done again, I have two recommendations. First, I recommend including the omitted variable, the quality of health care. This should remove the bias in the coefficient of average age (AAP) and make the equation more accurate. Second, I recommend spending more time finding data from a single year. Although data from a year or two apart should not change the results much, the equation would be more accurate if all of the data came from the same year.
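The first recommendation rests on the omitted-variable argument for AAP, and a small simulation can illustrate it. The sketch below assumes a made-up data-generating process in which a hypothetical quality-of-care index rises with average age and lowers the coronary death rate. Every number in it (the coefficients, the 5,000 observations used so that sampling noise does not hide the bias, and the names `ols` and `simulate`) is an illustrative assumption, not an estimate from this paper's 49-state data. It regresses the death rate on average age alone, which is the omitted-variable case, and then on average age plus the quality index.

```python
import numpy as np


def ols(y: np.ndarray, X: np.ndarray) -> np.ndarray:
    """OLS coefficients of y on X (X must already contain a constant column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def simulate(n: int = 5000, seed: int = 0) -> tuple[float, float]:
    """Return the AAP coefficient with quality omitted and with quality included.

    Hypothetical data-generating process (illustrative numbers only):
      quality = 0.8 * (aap - 36) + u          -> quality rises with average age
      lcad    = 100 + 2 * aap - 10 * quality + e
    so the true AAP effect is +2 and the true quality effect is -10.
    """
    rng = np.random.default_rng(seed)
    aap = rng.normal(36.0, 2.0, n)
    quality = 0.8 * (aap - 36.0) + rng.normal(0.0, 1.0, n)
    lcad = 100.0 + 2.0 * aap - 10.0 * quality + rng.normal(0.0, 5.0, n)

    const = np.ones(n)
    short_fit = ols(lcad, np.column_stack([const, aap]))          # quality omitted
    full_fit = ols(lcad, np.column_stack([const, aap, quality]))  # quality included
    return float(short_fit[1]), float(full_fit[1])


b_short, b_full = simulate()
print(f"AAP coefficient, quality omitted:  {b_short:.2f}")
print(f"AAP coefficient, quality included: {b_full:.2f}")
```

Under these assumptions, the expected coefficient from the short regression is the true AAP effect plus the omitted variable's effect times the slope of quality on age: 2 + (-10)(0.8) = -6. The short estimate should therefore come out negative and close to -6, while the full regression recovers a value close to the true +2.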
Data Sources
The data for lethal coronary artery disease was obtained from the National Center for Chronic Disease Prevention and Health Promotion at: http://www.cdc.gov/nccdphp/burdenbook2004/Section02/heart.htm
The data for education was obtained from the U.S. Census Bureau at: http://www.census.gov/prod/2003pubs/02statab/educ.pdf
The data for income was obtained from the U.S. Census Bureau at: http://www.census.gov/prod/2002pubs/01statab/income.pdf
The data for lack of health insurance was obtained from the United Health Foundation at: http://www.unitedhealthfoundation.org/shr2003/components/lackinsurance.html
The data for state health expenditures was obtained from the Milbank Memorial Fund at: http://www.milbank.org/reports/2000shcer/nasbotable14.html
The data for alcohol consumption was obtained from the Brewers Association at: http://22.214.171.124/beerinfo/bystate.shtml
The data for inactivity was obtained from the National Center for Chronic Disease Prevention and Health Promotion at: http://www.cdc.gov/nccdphp/burdenbook2002/03_leisureadult.htm
The data for state stress levels was obtained from the U.S. Census Bureau at: http://www.census.gov/prod/2003pubs/02statab/pop.pdf
The data for tobacco consumption was obtained from the U.S. Census Bureau at: http://www.census.gov/prod/2003pubs/02statab/health.pdf
The data for average age of the population was obtained from the U.S. Census Bureau at: http://tinyurl.com/6lez4
The data for diabetes was obtained from the National Center for Chronic Disease Prevention and Health Promotion at: http://tinyurl.com/6jkc6
The data for high blood cholesterol was obtained from the National Center for Chronic Disease Prevention and Health Promotion at: http://www.cdc.gov/nccdphp/burdenbook2004/Section03/cholesterol.htm
The data for high blood pressure was obtained from the National Center for Chronic Disease Prevention and Health Promotion at: http://www.cdc.gov/mmwr/preview/mmwrhtml/mm5121a2.htm
The data for obesity was obtained from the American Obesity Association at: http://www.obesity.org/subs/fastfacts/obesity_US.shtml
The data for state populations was obtained from the U.S. Census Bureau at: http://www.census.gov/population/cen2000/phc-t2/tab01.pdf
Works Cited
A. H. Studenmund. “Using Econometrics: A Practical Guide.” 4th edition. Addison Wesley Longman. 2001. April 18, 2005.
Anil Kumar. “Who Doesn’t Have Health Insurance and Why?” Federal Reserve Bank of Dallas. 2004. April 18, 2005. http://www.dallasfed.org/research/swe/2004/swe0406a.html
“Blood Pressure.” American Heart Association. 2004. April 18, 2005. http://www.americanheart.org/presenter.jhtml?identifier=4473
“Causes of Heart Disease.” Texas Heart Institute. December 8, 2004. April 18, 2005. http://chinese-school.netfirms.com/heart-disease-causes.html
“Coronary.” Mama’s Health. April 18, 2005. April 18, 2005. http://www.mamashealth.com/Coronary.asp
“Diabetes: Type 2 Diabetes.” WebMD. June, 2004. April 18, 2005. http://my.webmd.com/content/article/59/66844.htm
“Health and Wealth – A Fundamental Look.” Region of Peel. 2001. April 18, 2005. http://www.region.peel.on.ca/health/health-status-report/pdfs/health_wealth.pdf
“Heart Disease: Alcohol and Your Heart.” WebMD. June, 2004. April 18, 2005. http://my.webmd.com/content/pages/9/1675_57836.htm
“Heart Disease: Exercise for a Healthy Heart.” WebMD. June, 2004. April 18, 2005. http://my.webmd.com/content/pages/9/1675_57839.htm
“Heart Disease: Risk Factors For Heart Disease.” WebMD. June, 2004. April 18, 2005. http://my.webmd.com/content/pages/9/1675_57840.htm
“Heart Disease: Smoking and Heart Disease.” WebMD. June, 2004. April 18, 2005. http://my.webmd.com/content/pages/9/1675_57857.htm
“Obesity and Heart Disease.” American Heart Association. 1997. April 18, 2005. http://circ.ahajournals.org/cgi/content/full/96/9/3248
“Risk factors for coronary heart disease and level of education.” The Tromso Heart Study. 1988. April 18, 2005. http://aje.oupjournals.org/cgi/content/abstract/127/5/923
“What is Coronary Artery Disease?” National Heart, Lung, and Blood Institute. August, 2003. April 18, 2005. http://www.nhlbi.nih.gov/health/dci/Diseases/Cad/CAD_WhatIs.html