- 1. Chapter 4 More about relationships between 2 variables
- 2. 4.1 TRANSFORMING TO ACHIEVE LINEARITY
- 3. What if the scatterplot is not linear? • Of course, not all data are linear! • Our method is to apply a mathematical transformation to one or both of the explanatory and response variables • An inverse transformation is then used to create a non-linear regression model • This will be a little “mathy”
- 4. Transformations • Before we begin transformations, remember that some well-known phenomena behave in predictable ways – e.g. when working with time and gravity, you should know that there is a square relationship between distance and time!
- 5. The Basics • The data from measurements (raw data) must be operated on. • Apply the same mathematical transformation on the raw data – Ex. “Square every response” • Use methods from the previous chapter to find the LSRL for the transformed data • Analyze your regression to ensure the LSRL is appropriate • Apply an inverse transformation on the LSRL to find the regression for the raw data.
- 6. Example Please refer to p 265 exercise 4.2

  | Length (cm) | Period (s) |
  |---|---|
  | 16.5 | 0.777 |
  | 17.5 | 0.839 |
  | 19.5 | 0.912 |
  | 22.5 | 0.878 |
  | 28.5 | 1.004 |
  | 31.5 | 1.087 |
  | 34.5 | 1.129 |
  | 37.5 | 1.111 |
  | 43.5 | 1.290 |
  | 46.5 | 1.371 |
  | 106.5 | 2.115 |

- 7. Example • Data inputted into L1 and L2 • Scatterplot • Looks pretty good, right?
- 8. Exercise • LSRL • Y=.6+.015X r = 0.991 • Residual Plot • Perhaps we can do better!
- 9. Example • L3 = L2^.5 (square root) • LinReg L1, L3 • Note that the value of r^2 has increased • Note that the value of the residual of the last point has decreased
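The workflow from "The Basics" can be sketched in Python with the pendulum data. This is only an illustration under an assumption: it squares every response (the transformation named on the "Basics" slide and suggested by the physics of a pendulum), which is not necessarily the list operation used on the calculator above. The helper names `lsrl` and `predict_period` are made up for this sketch.

```python
import math

# Pendulum data from the example: length (cm) and period (s).
length = [16.5, 17.5, 19.5, 22.5, 28.5, 31.5, 34.5, 37.5, 43.5, 46.5, 106.5]
period = [0.777, 0.839, 0.912, 0.878, 1.004, 1.087, 1.129, 1.111, 1.290, 1.371, 2.115]


def lsrl(xs, ys):
    """Return (intercept a, slope b, correlation r) for the line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r


# Step 1: LSRL on the raw data.
a_raw, b_raw, r_raw = lsrl(length, period)

# Step 2 (assumed transformation): square every response, then refit.
period_sq = [t ** 2 for t in period]
a_sq, b_sq, r_sq = lsrl(length, period_sq)

print(f"raw:     period   = {a_raw:.3f} + {b_raw:.4f}*length, r^2 = {r_raw ** 2:.4f}")
print(f"squared: period^2 = {a_sq:.3f} + {b_sq:.4f}*length, r^2 = {r_sq ** 2:.4f}")


def predict_period(x):
    """Step 3: inverse transformation, undoing the squaring to predict a raw period."""
    return math.sqrt(a_sq + b_sq * x)


print(f"predicted period at 30 cm: {predict_period(30):.3f} s")
```

Comparing the two r^2 values (and, ideally, the two residual plots) is what tells you whether the transformation helped.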
- 10. Exponential Models • Many natural phenomena are explained by an exponential model. • Exponential models are marked by sharp growth or decay. • Basic model: y = A·B^x • For this transformation, you need to take the logarithm of the response data. • You may use “log10” or “ln”, your choice. – I prefer “ln” (of course)
- 11. Exponential Models After the transformation, we have the following linear model: ln(y) = a + b·x
  1. ln(y) = a + b·x
  2. e^(ln(y)) = e^(a + b·x) (exponentiate)
  3. y = e^a · e^(b·x) (property of exponents)
  4. Let ‘A’ = e^a and ‘B’ = e^b (redefine variables)
  5. y = A·B^x (this is our model)
- 12. Exponential Models • Since this is an ‘applied math’ course, you need not remember how to apply the inverse transformation • Whew • BUT you do need to memorize: when ln(y) = a + b·x, y = A·B^x, where ‘A’ = e^a and ‘B’ = e^b
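The rule to memorize can be checked numerically. The sketch below fits ln(y) = a + b·x and back-transforms to y = A·B^x. The data are hypothetical, generated to follow y = 3·1.5^x exactly, and are not the data set used on the slides. The function names are illustrative.

```python
import math


def lsrl(xs, ys):
    """Least-squares intercept a and slope b for the line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b


def fit_exponential(xs, ys):
    """Fit ln(y) = a + b*x, then back-transform to y = A * B**x."""
    a, b = lsrl(xs, [math.log(y) for y in ys])
    return math.exp(a), math.exp(b)  # A = e^a, B = e^b


# Hypothetical data that follow y = 3 * 1.5**x exactly (not from the slides).
xs = [0, 1, 2, 3, 4, 5]
ys = [3 * 1.5 ** x for x in xs]

A, B = fit_exponential(xs, ys)
print(f"y = {A:.2f} * {B:.2f}^x")
```

Because the made-up data are exactly exponential, the fit should recover A = 3 and B = 1.5 up to floating-point error. Real data will leave residuals that still need checking.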
- 13. Exponential Models Let’s try this data
- 14. Exponential Models Take the ln of L2 (the response list) and store it in L3
- 15. Exponential Models These are our “transformed responses”
- 16. Exponential Models From our homescreen, we perform an LSRL using the transformed data
- 17. Exponential Models We don’t have to store this regression for transformed data
- 18. Exponential Models Take note of the values of ‘a’ and ‘b’
- 19. Exponential Models A quick look at the residuals
- 20. Exponential Models The values of the residuals are small .. . no defined pattern
- 21. Exponential Models • Our regression model is exponential: y = A·B^x, where A = e^a and B = e^b • y = e^0.701 · (e^0.184)^x
- 22. Exponential Models • Our regression model is exponential: y = A·B^x, where A = e^a and B = e^b • y = e^0.701 · (e^0.184)^x
- 23. Exponential Models • Our regression model is exponential: y = A·B^x, where A = e^a and B = e^b • y = e^0.701 · (e^0.184)^x • Or y = 2.02 · (1.20)^x
- 24. Exponential Models Put our regression in Y1
- 25. Exponential Models Change Plot1 from a resid. to a scatter plot
- 26. Exponential Models Looks pretty good, eh?
- 27. Power Models • These models are used when the rate of increase is less severe than an exponential model, or if you suspect a ‘root’ model • For this model, you will find the logarithms of both the explanatory variable and the response variable
- 28. Power models LSRL on transformed data yields: ln(y) = a + b·ln(x)
  1. ln(y) = a + b·ln(x)
  2. e^(ln(y)) = e^(a + b·ln(x))
  3. y = e^a · e^(ln(x^b))
  4. y = e^a · x^b
  5. Let ‘A’ = e^a
  6. y = A · x^b
- 29. Power models Let’s use this data to find a power model
- 30. Power models This time we need to transform both lists
- 31. Power models This time we need to transform both lists
- 32. Power models Transformed exp = L3 Transformed resp = L4
- 33. Power models LSRL on transformed data no need to store in Y1
- 34. Power models Take note of the values of ‘a’ and ‘b’
- 35. Power models A quick look at the residuals
- 36. Power models Note that we use the transformed exp var
- 37. Power models No defined pattern
- 38. Power models Residuals are all small in size
- 39. Power models • When ln(y) = a + b·ln(x), y = A · x^b where ‘A’ = e^a • Our model is y = (e^1.31) · x^1.27
- 40. Power models • When ln(y) = a + b·ln(x), y = A · x^b where ‘A’ = e^a • Our model is y = (e^1.31) · x^1.27
- 41. Power models • When ln(y) = a + b·ln(x), y = A · x^b where ‘A’ = e^a • Our model is y = (e^1.31) · x^1.27 • Or y = 3.71 · x^1.27
- 42. Power models Regression in Y1
- 43. Power models Change from resid to scatter plot
- 44. Power models (notice L1 and L2)
- 45. Power models Looks pretty good!
- 46. Power models • Much like the exponential model, you only need to know how the transformed model becomes the model for the raw data. • When ln(y) = a + b·ln(x), y = A · x^b where ‘A’ = e^a
- 47. Transformation thoughts • Although this is not a major topic for the course, you still need to be able to apply these two transformations (exp and power) • Be sure to check the residuals for the LSRL on transformed data! You may have picked the wrong model :/ • If one model doesn’t work, try the other. I would start with the exponential model. • Don’t transform into a cockroach. Ask Kafka!
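The advice "if one model doesn't work, try the other" can be sketched by fitting both transformed models to the same data and comparing r^2 and residuals. The example below reuses the pendulum data from the start of the section. Applying it to that data set is a choice made for this sketch, not something the slides do, and `lsrl` is an illustrative helper.

```python
import math

# Pendulum data from the earlier example: length (cm) and period (s).
length = [16.5, 17.5, 19.5, 22.5, 28.5, 31.5, 34.5, 37.5, 43.5, 46.5, 106.5]
period = [0.777, 0.839, 0.912, 0.878, 1.004, 1.087, 1.129, 1.111, 1.290, 1.371, 2.115]


def lsrl(xs, ys):
    """Return (a, b, r^2, residuals) for the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    return a, b, sxy ** 2 / (sxx * syy), residuals


ln_x = [math.log(x) for x in length]
ln_y = [math.log(y) for y in period]

# Exponential model: regress ln(y) on x.
a_exp, b_exp, r2_exp, resid_exp = lsrl(length, ln_y)
# Power model: regress ln(y) on ln(x).
a_pow, b_pow, r2_pow, resid_pow = lsrl(ln_x, ln_y)

for name, r2, resid in [("exponential", r2_exp, resid_exp), ("power", r2_pow, resid_pow)]:
    largest = max(abs(e) for e in resid)
    print(f"{name:12s} r^2 = {r2:.3f}, largest |residual| = {largest:.3f}")

# Back-transform the power fit: y = A * x**b with A = e^a.
A = math.exp(a_pow)
print(f"power model: y = {A:.3f} * x^{b_pow:.3f}")
```

A higher r^2 alone does not settle the choice. The residuals on the transformed scale should also be small and show no pattern, as the slides emphasize.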
- 48. Assn 4.1 • pg 276 #5, 8, 9, 11, 12
- 49. 4.2 RELATIONSHIPS BETWEEN CATEGORICAL VARIABLES
- 50. Marginal Distributions • Tables that relate two categorical variables are called “Two-Way Tables” – Ex 4.11 pg 292 • Marginal Distribution – Very fancy term for “row totals and column totals” – Named because the totals appear in the margins of the table. Wow. • Often, each row or column total expressed as a percentage of the grand total is very informative
- 51. Marginal Distributions

  | Age Group | Female | Male | Total |
  |---|---|---|---|
  | 15-17 | 89 | 61 | 150 |
  | 18-24 | 5668 | 4697 | 10365 |
  | 25-34 | 1904 | 1589 | 3494 |
  | 35 or older | 1660 | 970 | 2630 |
  | Totals | 9321 | 7317 | 16639 |

- 52. Marginal Distributions

  | Age Group | Female | Male | Total |
  |---|---|---|---|
  | 15-17 | 89 | 61 | 150 |
  | 18-24 | 5668 | 4697 | 10365 |
  | 25-34 | 1904 | 1589 | 3494 |
  | 35 or older | 1660 | 970 | 2630 |
  | Totals | 9321 | 7317 | 16639 |

  Column Totals

- 53. Marginal Distributions

  | Age Group | Female | Male | Total |
  |---|---|---|---|
  | 15-17 | 89 | 61 | 150 |
  | 18-24 | 5668 | 4697 | 10365 |
  | 25-34 | 1904 | 1589 | 3494 |
  | 35 or older | 1660 | 970 | 2630 |
  | Totals | 9321 | 7317 | 16639 |

  Row Totals

- 54. Marginal Distributions

  | Age Group | Female | Male | Total |
  |---|---|---|---|
  | 15-17 | 89 | 61 | 150 |
  | 18-24 | 5668 | 4697 | 10365 |
  | 25-34 | 1904 | 1589 | 3494 |
  | 35 or older | 1660 | 970 | 2630 |
  | Totals | 9321 | 7317 | 16639 |

  Grand Total

- 55. Marginal Distributions “Age Group”
- 56. Marginal Distributions “Age Group”

  | Age Group | Female | Male | Total | Marg. Dist. |
  |---|---|---|---|---|
  | 15-17 | 89 | 61 | 150 | |
  | 18-24 | 5668 | 4697 | 10365 | |
  | 25-34 | 1904 | 1589 | 3494 | |
  | 35 or older | 1660 | 970 | 2630 | |
  | Totals | 9321 | 7317 | 16639 | |

- 57. Marginal Distributions “Age Group”

  | Age Group | Female | Male | Total | Marg. Dist. |
  |---|---|---|---|---|
  | 15-17 | 89 | 61 | 150 | |
  | 18-24 | 5668 | 4697 | 10365 | |
  | 25-34 | 1904 | 1589 | 3494 | |
  | 35 or older | 1660 | 970 | 2630 | |
  | Totals | 9321 | 7317 | 16639 | |

  Row total / grand total: 150/16639 = 0.009

- 58. Marginal Distributions “Age Group”

  | Age Group | Female | Male | Total | Marg. Dist. |
  |---|---|---|---|---|
  | 15-17 | 89 | 61 | 150 | 0.9% |
  | 18-24 | 5668 | 4697 | 10365 | |
  | 25-34 | 1904 | 1589 | 3494 | |
  | 35 or older | 1660 | 970 | 2630 | |
  | Totals | 9321 | 7317 | 16639 | |

  Row total / grand total: 150/16639 = 0.009

- 59. Marginal Distributions “Age Group”

  | Age Group | Female | Male | Total | Marg. Dist. |
  |---|---|---|---|---|
  | 15-17 | 89 | 61 | 150 | 0.9% |
  | 18-24 | 5668 | 4697 | 10365 | 62.3% |
  | 25-34 | 1904 | 1589 | 3494 | 21.0% |
  | 35 or older | 1660 | 970 | 2630 | 15.8% |
  | Totals | 9321 | 7317 | 16639 | 100% |

  Adds to 100%

- 60. Marginal Distributions “Gender”

  | Age Group | Female | Male | Total |
  |---|---|---|---|
  | 15-17 | 89 | 61 | 150 |
  | 18-24 | 5668 | 4697 | 10365 |
  | 25-34 | 1904 | 1589 | 3494 |
  | 35 or older | 1660 | 970 | 2630 |
  | Totals | 9321 | 7317 | 16639 |
  | Marg. dist. | 56% | 44% | 100% |

  Similarly for columns

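The marginal distributions can be reproduced with a few lines of Python. This sketch sums the cell counts itself, so its totals can differ by one from the printed totals in the table (the printed margins do not add up exactly, presumably because of rounding in the source data). The dictionary layout is just one convenient choice.

```python
# Cell counts from the two-way table: rows are age groups, columns are gender.
table = {
    "15-17": {"Female": 89, "Male": 61},
    "18-24": {"Female": 5668, "Male": 4697},
    "25-34": {"Female": 1904, "Male": 1589},
    "35 or older": {"Female": 1660, "Male": 970},
}

# Row totals, column totals, and the grand total.
row_totals = {age: sum(cells.values()) for age, cells in table.items()}
col_totals = {}
for cells in table.values():
    for gender, count in cells.items():
        col_totals[gender] = col_totals.get(gender, 0) + count
grand_total = sum(row_totals.values())

# Marginal distribution = each total divided by the grand total.
age_marginal = {age: total / grand_total for age, total in row_totals.items()}
gender_marginal = {g: total / grand_total for g, total in col_totals.items()}

for age, share in age_marginal.items():
    print(f"{age:12s} {share:6.1%}")
for gender, share in gender_marginal.items():
    print(f"{gender:12s} {share:6.1%}")
```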
- 61. Describing Relationships • Some relationships are easier to see when we look at the proportions within each group • These distributions are called “Conditional Distributions” • To find a conditional distribution, find each percentage of the row or column total. • Let’s look at the same table, and find the conditional distribution of gender, given each age group
- 62. Conditional Distributions

  | Age Group | Female | Male | Total |
  |---|---|---|---|
  | 15-17 | 89 | 61 (40.7%) | 150 (100%) |
  | 18-24 | 5668 (54.7%) | 4697 (45.3%) | 10365 (100%) |
  | 25-34 | 1904 (54.5%) | 1589 (45.5%) | 3494 (100%) |
  | 35 or older | 1660 (63.1%) | 970 (36.9%) | 2630 (100%) |
  | Totals | 9321 (56%) | 7317 (44%) | 16639 (100%) |

- 63. Conditional Distributions

  | Age Group | Female | Male | Total |
  |---|---|---|---|
  | 15-17 | 89 | 61 (40.7%) | 150 (100%) |
  | 18-24 | 5668 (54.7%) | 4697 (45.3%) | 10365 (100%) |
  | 25-34 | 1904 (54.5%) | 1589 (45.5%) | 3494 (100%) |
  | 35 or older | 1660 (63.1%) | 970 (36.9%) | 2630 (100%) |
  | Totals | 9321 (56%) | 7317 (44%) | 16639 (100%) |

  We will look at the conditional distribution for this row

- 64. Conditional Distributions

  | Age Group | Female | Male | Total |
  |---|---|---|---|
  | 15-17 | 89 | 61 (40.7%) | 150 (100%) |
  | 18-24 | 5668 (54.7%) | 4697 (45.3%) | 10365 (100%) |
  | 25-34 | 1904 (54.5%) | 1589 (45.5%) | 3494 (100%) |
  | 35 or older | 1660 (63.1%) | 970 (36.9%) | 2630 (100%) |
  | Totals | 9321 (56%) | 7317 (44%) | 16639 (100%) |

  This cell is 89/150 (cell count / row total) = 59.3%

- 65. Conditional Distributions

  | Age Group | Female | Male | Total |
  |---|---|---|---|
  | 15-17 | 89 (59.3%) | 61 (40.7%) | 150 (100%) |
  | 18-24 | 5668 (54.7%) | 4697 (45.3%) | 10365 (100%) |
  | 25-34 | 1904 (54.5%) | 1589 (45.5%) | 3494 (100%) |
  | 35 or older | 1660 (63.1%) | 970 (36.9%) | 2630 (100%) |
  | Totals | 9321 (56%) | 7317 (44%) | 16639 (100%) |

  This cell is 89/150 (cell count / row total) = 59.3%

- 66. Conditional Distributions

  | Age Group | Female | Male | Total |
  |---|---|---|---|
  | 15-17 | 89 (59.3%) | 61 (40.7%) | 150 (100%) |
  | 18-24 | 5668 (54.7%) | 4697 (45.3%) | 10365 (100%) |
  | 25-34 | 1904 (54.5%) | 1589 (45.5%) | 3494 (100%) |
  | 35 or older | 1660 (63.1%) | 970 (36.9%) | 2630 (100%) |
  | Totals | 9321 (56%) | 7317 (44%) | 16639 (100%) |

  This cell is 61/150 (cell count / row total) = 40.7%

- 67. Conditional Distributions

  | Age Group | Female | Male | Total |
  |---|---|---|---|
  | 15-17 | 89 (59.3%) | 61 (40.7%) | 150 (100%) |
  | 18-24 | 5668 (54.7%) | 4697 (45.3%) | 10365 (100%) |
  | 25-34 | 1904 (54.5%) | 1589 (45.5%) | 3494 (100%) |
  | 35 or older | 1660 (63.1%) | 970 (36.9%) | 2630 (100%) |
  | Totals | 9321 (56%) | 7317 (44%) | 16639 (100%) |

  This cell is 61/150 (cell count / row total) = 40.7%

- 68. Conditional Distributions

  | Age Group | Female | Male | Total |
  |---|---|---|---|
  | 15-17 | 89 (59.3%) | 61 (40.7%) | 150 (100%) |
  | 18-24 | 5668 (54.7%) | 4697 (45.3%) | 10365 (100%) |
  | 25-34 | 1904 (54.5%) | 1589 (45.5%) | 3494 (100%) |
  | 35 or older | 1660 (63.1%) | 970 (36.9%) | 2630 (100%) |
  | Totals | 9321 (56%) | 7317 (44%) | 16639 (100%) |

- 69. Conditional Distributions

  | Age Group | Female | Male | Total |
  |---|---|---|---|
  | 15-17 | 89 (59.3%) | 61 (40.7%) | 150 (100%) |
  | 18-24 | 5668 (54.7%) | 4697 (45.3%) | 10365 (100%) |
  | 25-34 | 1904 (54.5%) | 1589 (45.5%) | 3494 (100%) |
  | 35 or older | 1660 (63.1%) | 970 (36.9%) | 2630 (100%) |
  | Totals | 9321 (56%) | 7317 (44%) | 16639 (100%) |

  The table with complete conditional distributions for each row

- 70. Conditional Distributions

  | Age Group | Female | Male | Total |
  |---|---|---|---|
  | 15-17 | 89 (59.3%) | 61 (40.7%) | 150 (100%) |
  | 18-24 | 5668 (54.7%) | 4697 (45.3%) | 10365 (100%) |
  | 25-34 | 1904 (54.5%) | 1589 (45.5%) | 3494 (100%) |
  | 35 or older | 1660 (63.1%) | 970 (36.9%) | 2630 (100%) |
  | Totals | 9321 (56%) | 7317 (44%) | 16639 (100%) |

  For an analysis of the effect of age groups, compare a row’s conditional distribution…

- 71. Conditional Distributions

  | Age Group | Female | Male | Total |
  |---|---|---|---|
  | 15-17 | 89 (59.3%) | 61 (40.7%) | 150 (100%) |
  | 18-24 | 5668 (54.7%) | 4697 (45.3%) | 10365 (100%) |
  | 25-34 | 1904 (54.5%) | 1589 (45.5%) | 3494 (100%) |
  | 35 or older | 1660 (63.1%) | 970 (36.9%) | 2630 (100%) |
  | Totals | 9321 (56%) | 7317 (44%) | 16639 (100%) |

  With the marginal distribution for the columns…

- 72. Conditional Distributions

  | Age Group | Female | Male | Total |
  |---|---|---|---|
  | 15-17 | 89 (59.3%) | 61 (40.7%) | 150 (100%) |
  | 18-24 | 5668 (54.7%) | 4697 (45.3%) | 10365 (100%) |
  | 25-34 | 1904 (54.5%) | 1589 (45.5%) | 3494 (100%) |
  | 35 or older | 1660 (63.1%) | 970 (36.9%) | 2630 (100%) |
  | Totals | 9321 (56%) | 7317 (44%) | 16639 (100%) |

  They should be close …

- 73. Conditional Distributions

  | Age Group | Female | Male | Total |
  |---|---|---|---|
  | 15-17 | 89 (59.3%) | 61 (40.7%) | 150 (100%) |
  | 18-24 | 5668 (54.7%) | 4697 (45.3%) | 10365 (100%) |
  | 25-34 | 1904 (54.5%) | 1589 (45.5%) | 3494 (100%) |
  | 35 or older | 1660 (63.1%) | 970 (36.9%) | 2630 (100%) |
  | Totals | 9321 (56%) | 7317 (44%) | 16639 (100%) |

  … unless there is an effect caused by the age group (?)

- 74. Conditional Distributions

  | Age Group | Female | Male | Total |
  |---|---|---|---|
  | 15-17 | 89 (59.3%) | 61 (40.7%) | 150 (100%) |
  | 18-24 | 5668 (54.7%) | 4697 (45.3%) | 10365 (100%) |
  | 25-34 | 1904 (54.5%) | 1589 (45.5%) | 3494 (100%) |
  | 35 or older | 1660 (63.1%) | 970 (36.9%) | 2630 (100%) |
  | Totals | 9321 (56%) | 7317 (44%) | 16639 (100%) |

  … and these are not close to the marginal distribution!

- 75. Conditional Distributions • Based on the previous table, the distributions of “gender given age group” are not that different. • We can see that the “35 and older” group seems to differ slightly from the overall trend.
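The "gender given age group" percentages come from dividing each cell by its row total. A minimal sketch, with an illustrative function name and the cell counts copied from the table:

```python
# Cell counts from the two-way table: rows are age groups, columns are gender.
table = {
    "15-17": {"Female": 89, "Male": 61},
    "18-24": {"Female": 5668, "Male": 4697},
    "25-34": {"Female": 1904, "Male": 1589},
    "35 or older": {"Female": 1660, "Male": 970},
}


def conditional_by_row(counts):
    """For each row, divide every cell count by that row's total."""
    result = {}
    for row, cells in counts.items():
        row_total = sum(cells.values())
        result[row] = {col: count / row_total for col, count in cells.items()}
    return result


gender_given_age = conditional_by_row(table)
for age, dist in gender_given_age.items():
    print(f"{age:12s} Female {dist['Female']:.1%}  Male {dist['Male']:.1%}")
```

Each printed row is one conditional distribution and adds to 100%. Comparing the rows with the overall 56% / 44% split is the comparison the slides walk through.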
- 76. Conditional Distributions “age group given gender”

  | Age Group | Female | Male | Total |
  |---|---|---|---|
  | 15-17 | 89 (1.0%) | 61 (0.8%) | 150 (0.9%) |
  | 18-24 | 5668 (60.8%) | 4697 (64.2%) | 10365 (62.3%) |
  | 25-34 | 1904 (20.4%) | 1589 (21.7%) | 3494 (21.0%) |
  | 35 or older | 1660 (17.8%) | 970 (13.3%) | 2630 (15.8%) |
  | Totals | 9321 (100%) | 7317 (100%) | 16639 (100%) |

- 77. Conditional Distributions “age group given gender”

  | Age Group | Female | Male | Total |
  |---|---|---|---|
  | 15-17 | 89 (1.0%) | 61 (0.8%) | 150 (0.9%) |
  | 18-24 | 5668 (60.8%) | 4697 (64.2%) | 10365 (62.3%) |
  | 25-34 | 1904 (20.4%) | 1589 (21.7%) | 3494 (21.0%) |
  | 35 or older | 1660 (17.8%) | 970 (13.3%) | 2630 (15.8%) |
  | Totals | 9321 (100%) | 7317 (100%) | 16639 (100%) |

  Here is the same chart with the conditional distributions by gender…

- 78. Conditional Distributions “age group given gender”

  | Age Group | Female | Male | Total |
  |---|---|---|---|
  | 15-17 | 89 (1.0%) | 61 (0.8%) | 150 (0.9%) |
  | 18-24 | 5668 (60.8%) | 4697 (64.2%) | 10365 (62.3%) |
  | 25-34 | 1904 (20.4%) | 1589 (21.7%) | 3494 (21.0%) |
  | 35 or older | 1660 (17.8%) | 970 (13.3%) | 2630 (15.8%) |
  | Totals | 9321 (100%) | 7317 (100%) | 16639 (100%) |

  Is there a gender effect noticeable from this table?

- 79. Conditional Distributions “age group given gender”

  | Age Group | Female | Male | Total |
  |---|---|---|---|
  | 15-17 | 89 (1.0%) | 61 (0.8%) | 150 (0.9%) |
  | 18-24 | 5668 (60.8%) | 4697 (64.2%) | 10365 (62.3%) |
  | 25-34 | 1904 (20.4%) | 1589 (21.7%) | 3494 (21.0%) |
  | 35 or older | 1660 (17.8%) | 970 (13.3%) | 2630 (15.8%) |
  | Totals | 9321 (100%) | 7317 (100%) | 16639 (100%) |

- 80. Conditional Distribution Conclusions from the previous chart • Females are more likely to be in the “35 and older” group and less likely to be in the “18 to 24” group • Males are more likely to be in the “18 to 24” group and less likely to be in the “35 and older” group • These differences appear slight. Are they actually “significant” with respect to the overall distribution?
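The "age group given gender" percentages behind these conclusions divide each cell by its column total instead of its row total. A minimal sketch with an illustrative function name:

```python
# Cell counts from the two-way table: rows are age groups, columns are gender.
table = {
    "15-17": {"Female": 89, "Male": 61},
    "18-24": {"Female": 5668, "Male": 4697},
    "25-34": {"Female": 1904, "Male": 1589},
    "35 or older": {"Female": 1660, "Male": 970},
}


def conditional_by_column(counts):
    """For each column, divide every cell count by that column's total."""
    col_totals = {}
    for cells in counts.values():
        for col, count in cells.items():
            col_totals[col] = col_totals.get(col, 0) + count
    return {
        col: {row: counts[row][col] / total for row in counts}
        for col, total in col_totals.items()
    }


age_given_gender = conditional_by_column(table)
for gender, dist in age_given_gender.items():
    shares = ", ".join(f"{age}: {share:.1%}" for age, share in dist.items())
    print(f"{gender}: {shares}")
```

Whether the gaps between the two columns are "significant" is a separate question that these percentages alone do not answer.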
- 81. Conditional Distribution • No single graph portrays the form of the relationship between categorical variables. • No single numerical measure (such as correlation) summarizes the strength of the association.
- 82. Simpson’s Paradox • Associations that hold true for all of several groups can reverse direction when the data are combined to form a single group. • EX 4.15 pg 299 • This phenomenon is often the result of an “unaccounted” variable.
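The reversal is easy to reproduce with small numbers. The counts below are hypothetical, invented purely for illustration, and are not the data from the textbook example. Option A has the higher success rate inside each group, yet the lower rate once the groups are pooled, because A is used mostly in the group where success is rare.

```python
# Hypothetical (successes, trials) counts, invented for illustration only.
results = {
    "Group 1": {"A": (90, 100), "B": (800, 1000)},
    "Group 2": {"A": (300, 1000), "B": (20, 100)},
}


def rate(successes, trials):
    """Success rate as a fraction."""
    return successes / trials


# Within each group, compare A with B.
for group, arms in results.items():
    print(f"{group}: A = {rate(*arms['A']):.1%}, B = {rate(*arms['B']):.1%}")

# Pool the groups: add up successes and trials for each option.
combined = {}
for arms in results.values():
    for arm, (successes, trials) in arms.items():
        prev_s, prev_t = combined.get(arm, (0, 0))
        combined[arm] = (prev_s + successes, prev_t + trials)

print(f"Combined: A = {rate(*combined['A']):.1%}, B = {rate(*combined['B']):.1%}")
```

Here the group is the "unaccounted" variable: ignoring it changes which option looks better.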
- 83. Assignment 4.2 • Pg 298 #23-25, 29, 31-35
- 84. 4.3 ESTABLISHING CAUSATION
- 85. Different Relationships • Suppose two variables (X and Y) have some correlation – i.e. when X increases in value, Y increases as well – One of the following relationships may hold.
- 86. Different Relationships Causation • In this relationship, the explanatory variable is somehow affecting the response variable. • In most instances, we are looking to find evidence of a causation relationship
- 87. Different Relationships Causation
- 88. Different Relationships Common Response • In this relationship, both X and Y are correlated to a third (unknown) variable (Z). • Ex. When Z increases, X increases and Y increases. • Unless we know about Z, it appears as though X and Y have a causation relationship.
- 89. Different Relationships Common Response
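A small simulation can make common response concrete. In this sketch (all numbers are arbitrary assumptions chosen for illustration), X and Y are each built from a shared variable Z plus independent noise, so neither one influences the other, yet they end up strongly correlated.

```python
import math
import random

random.seed(0)  # fixed seed so the run is repeatable


def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


n = 2000
z = [random.gauss(0, 1) for _ in range(n)]    # the lurking variable Z
x = [zi + random.gauss(0, 0.5) for zi in z]   # X responds to Z only
y = [zi + random.gauss(0, 0.5) for zi in z]   # Y responds to Z only

r_xy = correlation(x, y)

# Remove Z's contribution from each variable; only independent noise is left.
x_without_z = [xi - zi for xi, zi in zip(x, z)]
y_without_z = [yi - zi for yi, zi in zip(y, z)]
r_without_z = correlation(x_without_z, y_without_z)

print(f"r(X, Y) = {r_xy:.2f}")
print(f"r(X, Y) after removing Z = {r_without_z:.2f}")
```

The first correlation is large even though X never enters the formula for Y. Once Z is accounted for, the association essentially disappears.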
- 90. Different Relationships Confounding • X and Y have correlation • An (often unknown) third variable ‘Z’ also has correlation with Y • Is X the explanatory variable, or is Z the explanatory variable, or are they both explanatory variables?
- 91. Different Relationships Confounding
- 92. Causation • The best way to establish causation is with a carefully designed experiment – Possible ‘lurking variables’ are controlled • Experiments cannot always be conducted – Many times, they are costly or even unethical • Some guidelines need to be established in cases where an observational study is the only method to measure variables.
- 93. Causation: some criteria • Association is strong • Association is consistent (among different studies) • Larger values of the explanatory variable are associated with stronger responses • The alleged cause precedes the effect in time • The alleged cause is plausible
- 94. Assignment 4.3 Pg312 #41, 45, 50, 51
- 95. Chapter 4 Review • #37, 53, 54, 57
