Hm306 week 5
- 1. © 2016
A Practical Approach to Analyzing
Healthcare Data
Chapter 6 – Analyzing the
Relationship between Two
Variables
- 2. © 2016
Categorical Variables
• Descriptive Statistics
– Contingency tables
– Used to display and analyze the relationship between two
categorical variables
– Notice in table below:
• 20/32 = 62.5% of female patients were discharged home
• 10/24 = 41.7% of male patients were discharged home
• Inferential Statistics
– Is this just a random occurrence or is this evidence that there
is a significant relationship between gender and being
discharged to home?
– A hypothesis test may be used to answer that question
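The discharge proportions above can be reproduced directly from the contingency table counts. A minimal sketch in Python, using the 20/32 and 10/24 counts from the slide:

```python
# Contingency table counts from the slide: discharged home vs. gender
female_home, female_total = 20, 32
male_home, male_total = 10, 24

pct_female_home = female_home / female_total  # 20/32 = 0.625
pct_male_home = male_home / male_total        # 10/24 ≈ 0.417

print(f"Female discharged home: {pct_female_home:.1%}")  # 62.5%
print(f"Male discharged home:   {pct_male_home:.1%}")    # 41.7%
```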
- 3. © 2016
Example: Chi-squared Test of
Independence
Step Response
1. Determine the null and
alternative hypotheses
Ho: Discharged to Home and
Gender are independent
H1: Discharged to Home and
Gender are not independent
2. Set the acceptable type I error or
alpha level
The analyst is willing to accept a
5% chance or probability of
rejecting the null hypothesis when it
is true. Alpha = 5% or 0.05
3. Select the appropriate test
statistic
Chi-squared
- 4. © 2016
Example: Chi-squared Test of
Independence
• Test statistics typically compare the value observed in the
sample to the null hypothesis value.
• If gender and discharged home were independent, then we
would expect the counts in the four cells
(male/female x home/not home) to follow the row and column
totals, with no pattern linking the two variables.
• In other words, the proportion of males sent home should be
similar to the proportion of the females sent home if the null
hypothesis were indeed true.
• The basis of the chi-squared test statistic is the observed and
expected frequencies in each of the table cells
- 7. © 2016
Example: Chi-squared Test of
Independence
• Last two steps in hypothesis test:
4. Compare the test statistic to a critical value based on the alpha level and the distribution of the
test statistic
5. Reject the null hypothesis if the test statistic is more extreme than the critical value. If not, do not
reject the null hypothesis.
• Chi-squared test statistic follows the Chi-squared distribution with (r-1)x(c-1) degrees of
freedom. r = rows in contingency table and c = columns
– Chi-squared distribution is always non-negative
– Degrees of freedom define the shape
• Since alpha was set to be 0.05 (5%), reject H0 if the test statistic is greater than 3.841
– X2 = 2.39 which is not greater than 3.841
– Do not reject H0
• Conclusion: The sample data does not provide sufficient evidence to reject H0; we
cannot conclude that there is a significant relationship between gender and the
likelihood of being discharged to the home setting
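The whole chi-squared calculation can be sketched in plain Python. The cell counts below are reconstructed from the slide's proportions (20 of 32 females and 10 of 24 males discharged home), so treat them as an assumption:

```python
# Observed 2x2 table: rows = female/male, columns = home/not home
# (reconstructed from the slide's 20/32 and 10/24 proportions)
observed = [[20, 12],
            [10, 14]]

row_totals = [sum(row) for row in observed]        # [32, 24]
col_totals = [sum(col) for col in zip(*observed)]  # [30, 26]
n = sum(row_totals)                                # 56

# Expected count for each cell: (row total * column total) / grand total
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-squared statistic: sum of (O - E)^2 / E over all cells
chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))

critical = 3.841  # chi-squared critical value, alpha = 0.05, (2-1)x(2-1) = 1 d.f.
print(f"chi2 = {chi2:.2f}, reject H0: {chi2 > critical}")  # chi2 = 2.39, reject H0: False
```

Since 2.39 does not exceed 3.841, the code reaches the same "do not reject" decision as the slide.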
- 8. © 2016
Sensitivity and Specificity
• Measure the accuracy of predictions made with
categorical variables
• Example: using one categorical variable (smoking
status) to predict another categorical variable
(cancer status)
• Sensitivity – the number of subjects with the
indicator present and a positive test, divided by the
total number with the indicator present
• Specificity – the number of subjects without the
indicator and a negative test, divided by the
total number without the indicator
- 9. © 2016
Sensitivity/Specificity Example
A health plan wishes to use accessing their patient portal as a predictor of
whether or not a patient will seek care at an emergency room during the year.
That is, they believe that patients that do not access the patient portal are more
likely to experience an ER visit. They collected the following data based on
enrollees during the previous plan year. Calculate the sensitivity and specificity
of patient portal use as a predictor of ER use.
Note that the contingency table is set up so that ‘no’ for patient portal access
and ‘yes’ for ER visit are in cell ‘A’ (upper left hand corner). This is because the
health plan believes that patients that do not use the patient portal are MORE
likely to experience an ER visit.
ER Visit During Previous Year?
Patient Portal Access? Yes No
No 30 23
Yes 15 86
- 10. © 2016
Sensitivity/Specificity Example
ER Visit During Previous Year?
Patient Portal Access? Yes No
No A: 30 B: 23
Yes C: 15 D: 86
Sensitivity = A / (A + C) = 30 / (30 + 15) = 30/45 = 0.667 = 66.7%
Specificity = D / (D + B) = 86 / (86 + 23) = 86/109 = 0.789 = 78.9%
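The two calculations above translate directly into code; the cell counts are the ones from the slide's table:

```python
# Cell counts from the slide's contingency table
# A: no portal & ER visit, B: no portal & no ER visit,
# C: portal & ER visit,    D: portal & no ER visit
A, B, C, D = 30, 23, 15, 86

sensitivity = A / (A + C)  # 30 / 45
specificity = D / (D + B)  # 86 / 109

print(f"Sensitivity = {sensitivity:.1%}")  # 66.7%
print(f"Specificity = {specificity:.1%}")  # 78.9%
```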
- 11. © 2016
Descriptive Statistics - Correlation
• Pearson’s correlation coefficient (r)
– Measures the linear association between two continuous
variables
• Spearman’s Rho (ρ)
– Measures the association, based on ranks, between two ordinal
variables or one ordinal and one continuous variable
• Correlation between two variables does not imply causation –
only that the two have a relationship or are ‘associated’
• Be aware that correlation measures the linear association of
two variables
– They may be related in a non-linear way that may result in
misleading values for the correlation coefficients
- 12. © 2016
Descriptive Statistics –
Pearson’s Correlation Coefficient
• Used for measuring the linear association between
two continuous variables
• Values from -1 to +1
• Positive value means that both variables
increase/decrease together
– Example: Charges and length of stay
• Negative value means that one variable increases
as the other decreases
– Example: Experience and time to code a medical
record
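Pearson's r can be computed directly from its definition (covariance scaled by both standard deviations). A sketch with hypothetical length-of-stay and charge values, not the book's data:

```python
import math

# Hypothetical paired data: length of stay (days) and total charges ($)
los     = [1, 2, 3, 4, 5]
charges = [12000, 17000, 20000, 26000, 30000]

def pearson_r(x, y):
    """Pearson's correlation: covariance divided by both standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

r = pearson_r(los, charges)
print(f"r = {r:.3f}")  # close to +1: charges rise with length of stay
```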
- 13. © 2016
Descriptive Statistics –
Pearson’s Correlation Coefficient
• Example of negative correlation
– More experienced coders require less time to
code records – in general
- 14. © 2016
Descriptive Statistics –
Pearson’s Correlation Coefficient
• Example of positive correlation
– Longer lengths of stay are associated with higher
charges – in general
- 16. © 2016
Descriptive Statistics –
Spearman’s Rho Correlation Coefficient
• Used for measuring the association, based on ranks, between two ordinal
variables or an ordinal and continuous variable
• Operates on the ranks for the paired values and not the actual variable
values
– Typically rank ties are broken with average ranks
• Values from -1 to +1
• Positive value means that both variables increase/decrease together
– Example: patient severity level and charges
• Negative value means that one variable increases as the other decreases
– Example: Grade in elementary school and time to run 100 yards
• Same formula as Pearson’s r, but use ranks instead of actual values
• If there are no ties in the ranks, may use the shortcut formula (where Di is the
difference between the ranks of the ith pair of variables and n is the sample size):
ρ = 1 − (6 Σ Di²) / (n(n² − 1))
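The no-ties shortcut can be sketched in Python. It assumes no tied ranks (ties would need average ranks, as noted above), and the severity/charge values here are made up for illustration:

```python
def spearman_rho(x, y):
    """Spearman's rho via the no-ties shortcut: 1 - 6*sum(D_i^2) / (n(n^2 - 1))."""
    n = len(x)
    # Rank each value by its position in the sorted list (assumes no ties)
    rank_x = {v: i + 1 for i, v in enumerate(sorted(x))}
    rank_y = {v: i + 1 for i, v in enumerate(sorted(y))}
    d_sq = sum((rank_x[a] - rank_y[b]) ** 2 for a, b in zip(x, y))
    return 1 - 6 * d_sq / (n * (n ** 2 - 1))

# Hypothetical severity scores vs. charges (in $1000s), no tied values
severity = [86, 97, 99, 100, 101]
charges  = [2, 20, 28, 27, 50]
print(spearman_rho(severity, charges))  # 0.9
```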
- 17. © 2016
Inferential Statistics –
T-test for correlations
• Used to test the null hypothesis that the correlation
coefficient is zero
• Same formula for both Pearson’s and Spearman’s
correlation coefficients
• Note that the sample size is in the numerator of the
test statistic
• For very large samples, the test may reject the
hypothesis of 0 correlation when the value of the
sample correlation is not practically significant
- 18. © 2016
Inferential Statistics –
T-test for correlations - Example
• Test the hypothesis that the correlation between
length of stay and charges in the previous example is
different from zero.
• Step 1: State the null and alternative hypotheses
– Ho: r ≤ 0
– Ha: r > 0
– Note: In practice, a one sided test of significance is
used for r. If the sample value is > 0, then the
alternative hypothesis is ‘>0’. If the sample value is
negative, then the alternative hypothesis is ‘<0’.
• Step 2: Set the acceptable alpha level = 0.05
- 19. © 2016
Inferential Statistics –
T-test for correlations - Example
• Step 3: Determine the test statistic and
calculate the value
– T-test for correlations
– t = r × √((n − 2) / (1 − r²)) = 0.93 × √((5 − 2) / (1 − 0.93²)) = 4.38
• Step 4: Compare the test statistic to the
critical value
– Critical value from the t-distribution with
d.f. = n − 2 = 3 and alpha = 0.05 (one-sided) is 2.353
– t = 4.38 > 2.353
• Step 5: Reject the null hypothesis since
4.38 > 2.353 and conclude that the
correlation between LOS and charge is
greater than zero
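The test statistic is easy to check with a few lines of Python. Note that 0.93 × √(3 / (1 − 0.93²)) works out to about 4.38 (the √ factor alone is 4.71); either way it exceeds the one-sided critical value of 2.353, so the decision is the same:

```python
import math

r, n = 0.93, 5
t = r * math.sqrt((n - 2) / (1 - r ** 2))
critical = 2.353  # t critical value, d.f. = 3, alpha = 0.05, one-sided

print(f"t = {t:.2f}, reject H0: {t > critical}")  # t = 4.38, reject H0: True
```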
- 20. © 2016
Inferential Statistics
Simple Linear Regression
• Used to formulate a functional relationship between two
continuous variables
• A linear function of the independent variable (X) is estimated
to predict values of the dependent variable (Y)
• Slope-intercept form of a line:
– Y = a + bX
– a is the y-intercept
– b is the slope of the line
• If variables are positively correlated, the slope of the line is
positive
• If variables are negatively correlated, the slope of the line is
negative
- 21. © 2016
Inferential Statistics
Simple Linear Regression - Example
• Least squares regression
– Minimizes the vertical distance from each point to line
– Vertical distance called the ‘error’ or ‘residual’
• Least square line provides a line that comes as close as
possible to all points, but may not actually intersect with
any of them
- 22. © 2016
Inferential Statistics
Simple Linear Regression - Example
• Slope of line is 4,443
– Interpretation: The expected charge increase for each additional day is
$4,443
• Intercept of line is $7,801
– Interpretation: The expected charge with a zero day stay is $7,801
– Zero stay is not realistic, but intercept gives an estimate of the fixed cost
of admitting a patient while the slope represents the variable cost.
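The least-squares slope and intercept can be computed from their closed-form formulas. The data below are hypothetical, not the book's (which produced slope 4,443 and intercept 7,801):

```python
# Hypothetical length-of-stay (days) and charge ($) pairs
los     = [1, 2, 3, 4, 5]
charges = [12000, 17000, 20000, 26000, 30000]

n = len(los)
mean_x = sum(los) / n
mean_y = sum(charges) / n

# Least-squares slope: b = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(los, charges))
     / sum((x - mean_x) ** 2 for x in los))
a = mean_y - b * mean_x  # intercept: the fitted line passes through the means

print(f"charges = {a:.0f} + {b:.0f} * LOS")  # charges = 7500 + 4500 * LOS
```

Read the same way as the slide: roughly $4,500 of variable cost per additional day on top of a $7,500 fixed cost.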
- 23. © 2016
Inferential Statistics
Simple Linear Regression - Example
Multiple R = Pearson’s r
R Square = Pearson’s r squared
R Square estimates the amount of
variance in the dependent variable
explained by the independent variable
T stat and p-value for
testing the null hypothesis
that the intercept and
slope are equal to zero
Note: If p-value is less
than alpha, then reject the
null hypothesis
- 24. © 2016
Coefficient of Determination
• In simple linear regression (one independent variable)
– Multiple R is the Pearson’s Correlation Coefficient value for
the correlation
– R Square is also called the coefficient of determination
– The coefficient of determination measures the amount of
variance in the dependent variable that is explained by the
independent variable
– In our example, 87% of the variance in charge is explained
by length of stay
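The coefficient of determination can be checked two ways: as the square of Pearson's r, and as the share of variance explained, 1 − SS_residual / SS_total. A sketch with made-up data (both routes agree):

```python
import math

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Route 1: Pearson's r, then square it
cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
r = cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

# Route 2: fit the least-squares line, then compute 1 - SS_res / SS_tot
slope = cov / sum((a - mx) ** 2 for a in x)
intercept = my - slope * mx
ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
ss_tot = sum((b - my) ** 2 for b in y)
r2_from_fit = 1 - ss_res / ss_tot

print(round(r ** 2, 3), round(r2_from_fit, 3))  # 0.6 0.6
```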
- 25. © 2016
Regression Hypothesis Tests
• Two hypothesis tests are presented in this table
– Ho: Intercept = 0 vs H1: Intercept ≠ 0
• P-value = 0.121 > 0.05 → do not reject Ho
• Even though the intercept is not statistically different from
zero (do not reject the null hypothesis that it is equal to
zero), the intercept is typically kept in the model
– Ho: Slope = 0 vs H1: Slope ≠ 0
• P-value = 0.021 < 0.05 → reject Ho and conclude that the slope is
not equal to zero
• The interpretation here is that LOS gives us useful
information about the charge since the slope of the
regression line is non-zero
- 26. © 2016
Regression Assumptions
• Residuals
– Difference between the actual value of the dependent
variable and the value predicted using the regression
equation
– The vertical (y-axis) distance from an individual point
to the regression line
• Must test the following assumptions regarding the
residuals:
– Independence
– Normally distributed
– Mean of zero
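The mean-of-zero property can be verified directly: whenever an intercept is included, least-squares residuals sum to zero. A sketch reusing hypothetical data (this checks only the mean; independence and normality need separate diagnostics):

```python
# Hypothetical data and a least-squares fit computed from it
los     = [1, 2, 3, 4, 5]
charges = [12000, 17000, 20000, 26000, 30000]

n = len(los)
mx, my = sum(los) / n, sum(charges) / n
b = (sum((x - mx) * (y - my) for x, y in zip(los, charges))
     / sum((x - mx) ** 2 for x in los))
a = my - b * mx

# Residual = actual value minus the value predicted by the regression line
residuals = [y - (a + b * x) for x, y in zip(los, charges)]
mean_resid = sum(residuals) / n

print(residuals)               # [0.0, 500.0, -1000.0, 500.0, 0.0]
print(abs(mean_resid) < 1e-9)  # True: residuals average to zero
```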