
- 1. 1 Research Methods in Health Chapter 9. Estimation Young Moon Chae, Ph.D. Graduate School of Public Health Yonsei University, Korea ymchae@yuhs.ac
- 2. 2 Correlation
- 3. 3 Questions • Why does the maximum value of r equal 1.0? • What does it mean when a correlation is positive? Negative? • What is the purpose of the Fisher r to z transformation? • What is range restriction? Range enhancement? What do they do to r? • Give an example in which data properly analyzed by ANOVA cannot be used to infer causality. • Why do we care about the sampling distribution of the correlation coefficient? • What is the effect of reliability on r?
- 4. 4 Basic Ideas • Nominal vs. continuous IV • Degree (direction) & closeness (magnitude) of linear relations -Sign (+ or -) for direction -Absolute value for magnitude • Pearson product-moment correlation coefficient: r = Σ(z_X z_Y) / N
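The z-score formula for r on this slide can be sketched directly in Python. This is a minimal illustration, not library code; the function name `pearson_r` and the toy data are mine, and the population standard deviation (divide by N) is used so that r is exactly the mean cross-product of z-scores:

```python
import math

def pearson_r(x, y):
    """Pearson r as the mean cross-product of z-scores: r = sum(z_x * z_y) / N."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x) / n)  # population SD of X
    sy = math.sqrt(sum((v - my) ** 2 for v in y) / n)  # population SD of Y
    return sum((a - mx) / sx * (b - my) / sy for a, b in zip(x, y)) / n

# Perfectly linear data: every point lies on the line z_X = z_Y, so r hits its maximum
r = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])  # ~1.0 up to float rounding
```

Reversing the direction of one variable flips only the sign, matching the slide's point that the sign carries direction and the absolute value carries magnitude.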
- 5. 5 Illustrations [Three scatterplots: Weight by Height, Errors by Study Time, SAT-V by Toe Size] Positive, negative, zero
- 6. 6 Graphic Representation [Scatterplot of Weight by Height in raw scores (means: 66.8 inches, 150.7 lbs.) and in z-scores, with the four quadrants marked + and -] 1. Conversion from raw to z. 2. Points & quadrants. Positive & negative products. 3. Correlation is the average of cross products. Sign & magnitude of r depend on where the points fall. 4. Product at maximum (average = 1) when points fall on the line where z_X = z_Y.
- 7. 7 Correlation Analysis • Measures the closeness of the relationship between two or more variables • Indicates the degree of association or covariation between variables, not causality • Measures of association by level of measurement • Interpretation of correlation -t-test
- 8. 8 Regression
- 9. 9 Questions • What are predictors and criteria? • Write an equation for the linear regression. Describe each term. • How do changes in the slope and intercept affect (move) the regression line? • What does it mean to test the significance of the regression sum of squares? R-square? • What is R-square? • What does it mean to choose a regression line to satisfy the loss function of least squares? • How do we find the slope and intercept for the regression line with a single independent variable? (Either formula for the slope is acceptable.) • Why does testing for the regression sum of squares turn out to have the same result as testing for R- square?
- 10. 10 Basic Ideas • Jargon -IV = X = Predictor (pl. predictors) -DV = Y = Criterion (pl. criteria) -Regression of Y on X, e.g., GPA on SAT • Linear Model = relations between IV and DV represented by a straight line. • A score on Y has 2 parts: (1) linear function of X and (2) error: Y_i = α + βX_i + e_i (population values)
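The least-squares slope and intercept asked about in the questions above can be sketched as follows. This is an illustrative single-predictor fit, assuming the standard formulas b = cov(X, Y) / var(X) and a = mean(Y) − b·mean(X); the function name and data are made up:

```python
def fit_line(x, y):
    """Least-squares estimates for Y = a + b*X:
    b = sum((x - mx)(y - my)) / sum((x - mx)^2), a = my - b * mx."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx
    return a, b

# Data generated from the exact line y = 1 + 2x, so the fit recovers it
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```

With noisy data the same code returns the line minimizing the sum of squared errors, which is the loss function the questions on slide 9 refer to.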
- 11. 11 Regression Analysis • Refers to the techniques used to derive an equation that relates the criterion variable to one or more predictor variables • Method of least squares • Standardized coefficients • Goodness of fit -F test, t test, coefficient of determination • Multicollinearity
- 12. 12 Multiple linear regression
- 13. 13 ANOVA as linear regression
- 14. 14 Results
- 15. 15 Raw & Standardized Regression Weights • Each X has a raw score slope, b. • Slope tells expected change in Y if X changes 1 unit*. • Large b weights should indicate important variables, but b depends on variance of X. • A b for height in inches would be 12 times larger than b for height in feet. • If we standardize X and Y, all units of X are the same. • Relative size of b now meaningful. *strictly speaking, holding other X variables constant.
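The height-in-inches vs. height-in-feet point above can be demonstrated numerically. A sketch under my own assumptions (made-up height/weight data, helper names mine): the raw slope scales by 12 when units change, while the standardized weight beta = b · sd(X) / sd(Y) does not:

```python
import math

def slope(x, y):
    """Raw least-squares slope b = cov(x, y) / var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

def sd(v):
    """Population standard deviation."""
    n = len(v)
    m = sum(v) / n
    return math.sqrt(sum((x - m) ** 2 for x in v) / n)

inches = [60, 63, 66, 69, 72]
weight = [110, 130, 150, 170, 190]
feet = [h / 12 for h in inches]

b_in = slope(inches, weight)           # lbs per inch
b_ft = slope(feet, weight)             # lbs per foot: exactly 12x larger
beta = b_in * sd(inches) / sd(weight)  # unit-free standardized weight
```

Because the raw b depends on the variance of X, only the standardized beta is comparable across predictors measured in different units.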
- 16. 16 Tests of R2 vs Tests of b • Slopes (b) tell about the relation between Y and the unique part of X. R2 tells about the proportion of variance in Y accounted for by the set of predictors all together. • Correlations among X variables increase the standard errors of b weights but not R2. • Possible to get a significant R2 but no or few significant b weights • Possible but unlikely to have significant b but not significant R2. Look to R2 first. If it is n.s., avoid interpreting b weights.
- 17. 17 Testing Incremental R2 You can start regression with a set of one or more variables and then add predictors 1 or more at a time. When you add predictors, R2 will never go down. It usually goes up, and you can test whether the increment in R2 is significant or likely due to chance. F = [(R²_L − R²_S) / (k_L − k_S)] / [(1 − R²_L) / (N − k_L − 1)], where R²_L = R-square for the larger model, R²_S = R-square for the smaller model, k_L = number of predictors in the larger model, k_S = number of predictors in the smaller model.
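The incremental-R² F test on this slide translates to a one-line computation. The function name and the illustrative R² values and sample size below are mine, not from the slides:

```python
def f_increment(r2_large, r2_small, k_large, k_small, n):
    """F for the R-square increment when predictors are added to a smaller
    nested model, with (k_large - k_small) and (n - k_large - 1) df."""
    num = (r2_large - r2_small) / (k_large - k_small)
    den = (1 - r2_large) / (n - k_large - 1)
    return num / den

# Hypothetical values: adding 2 predictors raises R^2 from .40 to .50, N = 50
F = f_increment(0.50, 0.40, 5, 3, 50)  # compare against F(2, 44)
```

The resulting F is compared against the F distribution with (k_L − k_S) numerator and (N − k_L − 1) denominator degrees of freedom.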
- 18. 18 (cont.) • In regression problems, the most commonly used indices of importance are the correlation, r, and the increment to R-square when the variable of interest is considered last. The second is sometimes called a last-in R-square change. The last-in increment corresponds to the Type III sums of squares and is closely related to the b weight. • The correlation tells about the importance of the variable ignoring all other predictors. • The last-in increment tells about the importance of the variable as a unique contributor to the prediction of Y, above and beyond all other predictors in the model. •“Importance” is not well defined statistically when IVs are correlated. Doesn’t include mediated models (path analysis).
- 19. 19 Collinearity Defined • The problem of large correlations among the independent variables • Within the set of IVs, one or more IVs are (nearly) totally predicted by the other IVs. • In such a case, the b or beta weights are poorly estimated. • Problem of the “Bouncing Betas.”
- 20. 20 Dealing with Collinearity • Lump it. Admit ambiguity; SE of b weights. Refer also to correlations. • Select or combine variables. • Factor analyze set of IVs. • Use another type of analysis (e.g., path analysis). • Use another type of regression (ridge regression). • Unit weights (no longer regression).
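One common way to quantify the collinearity problem described above is the variance inflation factor, VIF = 1 / (1 − R²), where R² comes from regressing one IV on the others. The VIF itself is not named on these slides; this is a minimal two-predictor sketch (with two IVs that R² is just their squared correlation), with made-up data:

```python
import math

def corr(x, y):
    """Pearson correlation between two variables."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif(x1, x2):
    """Variance inflation factor for either of two predictors: 1 / (1 - r^2).
    Large values flag the 'bouncing betas' problem."""
    r = corr(x1, x2)
    return 1.0 / (1.0 - r ** 2)

v = vif([1, 2, 3, 4], [1, 3, 2, 4])  # r = 0.8 here, so VIF = 1/0.36
```

As the correlation between the IVs approaches 1, the VIF (and with it the standard error of each b weight) blows up, which is exactly why nearly redundant predictors yield poorly estimated slopes.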
- 21. 21 Diagnostics Checking Assumptions and Bad Data
- 22. 22 Good-Looking Graph [Scatterplot of Y by X] No apparent departures from line.
- 23. 23 Problem with Linearity [Scatterplot of Miles per Gallon by Horsepower; R Sq Linear = 0.595]
- 24. 24 Outliers [Scatterplot of Y by X with two points labeled "Outlier"] Outlier = pathological point
- 25. 25 Non-parametric or Distribution-free Tests
- 26. 26 Non-parametric or Distribution-free Tests • Two kinds of assertions in statistical tests: 1. Assertion directly related to the purpose of the investigation, i.e., the hypothesis to be tested 2. Assertion needed to make a probability statement. The set of all assertions is called the model • Testing a hypothesis without a model is a non-parametric test: that is, a test which does not make basic assumptions about, and does not require knowledge of, the distribution of the population parameters
- 27. 27 Characteristics 1. Do not depend on any assumptions about properties / parameters of the parent population, i.e., do not suppose any particular distribution or its consequential assumptions (parametric tests like the t and F tests assume homogeneity of variances); no such assumptions, or less restrictive ones 2. When measurements are not very accurate, non-parametric tests come in handy 3. Most non-parametric tests assume only nominal or ordinal data, i.e., they are more suitable (than parametric tests) for nominal & ordinal (or rated) data 4. Involve few arithmetic computations
- 28. 28 (cont.) 5. Usually less efficient & powerful than parametric tests, as they are based on no assumptions 6. Greater risk of accepting a false hypothesis and committing a type II error; non-parametric tests require more observations than parametric tests to achieve the same type I and type II error rates 7. The null hypothesis is somewhat loosely defined, & hence rejection of the null hypothesis may lead to a less precise conclusion than parametric tests 8. A trade-off between loss in sharpness of estimating intervals and the gain of using less information & calculating faster
- 29. 29 Some important applications are (I) concerning a single value for the given data (II) differences among 2 or more sets of data (III) relations between variables (IV) variation in the given data (V) randomness of a sample (VI) association or dependency of categorical data (VII) comparing a theoretical population with actual data in categories
- 30. 30 Typical situations 1. Data not likely to be normally distributed 2. Nominal data from responses to a questionnaire 3. Partially filled questionnaires, i.e., handling incomplete / missing data and making the necessary adjustments to extract maximum information from the data 4. Reasonably good results even from very small samples, but more observations needed than for parametric tests to achieve the same type I and type II error rates
- 32. 32 McNemar Test • Useful for testing nominal data of two related samples and before-after measurements of the same subjects, with a view to judging the significance of any observed change after treatment
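The McNemar statistic uses only the two discordant cells of the 2x2 before/after table. A minimal sketch (function name and cell counts are illustrative, and no continuity correction is applied):

```python
def mcnemar_chi2(b, c):
    """McNemar statistic (b - c)^2 / (b + c), where b and c are the two
    discordant cells of a 2x2 before/after table (subjects who changed
    yes->no and no->yes). Compared against chi-square with 1 df."""
    return (b - c) ** 2 / (b + c)

# Hypothetical change data: 15 subjects switched one way, 5 the other
stat = mcnemar_chi2(15, 5)
```

Subjects who gave the same response before and after treatment (the concordant cells) carry no information about change and drop out of the statistic.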
- 33. 33 Chi-Square Test • An important non-parametric test for significance of association as well as for testing hypotheses regarding (i) goodness of fit and (ii) homogeneity or significance of population variance • When responses are classified into two mutually exclusive classes like favor - not favor, like - dislike, etc. • To find whether differences exist between observed and expected data • χ² is not a measure of the degree of relationship • Assumes random observations • Items in the sample are independent
- 34. 34 (cont.) • Constraints are linear, no cell contains a frequency below five, and the overall number of items must be reasonably large (Yates' correction can be applied to a 2x2 table if cell frequencies are smaller than five); otherwise use the Kolmogorov-Smirnov test • Phi coefficient, φ = √(χ²/N), a non-parametric measure of correlation that helps estimate the magnitude of association • Cramer's V-measure, V = √(φ²/min(r-1, c-1)) • Coefficient of contingency, C = √(χ²/(χ²+N)), also known as the coefficient of mean square contingency, is a non-parametric measure of relationship useful where contingency tables are of higher order than 2x2 and combining classes is not possible for Yule's coefficient of association
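The χ² statistic and the three association measures above can be computed together from a contingency table. A self-contained sketch (function name and table values are mine; expected counts are the usual row-total × column-total / N):

```python
import math

def chi2_and_measures(table):
    """Chi-square for an r x c table of observed counts, plus phi = sqrt(chi2/N),
    Cramer's V = sqrt(chi2 / (N * min(r-1, c-1))), and the coefficient of
    contingency C = sqrt(chi2 / (chi2 + N))."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    chi2 = sum((table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
               for i in range(len(rows)) for j in range(len(cols)))
    phi = math.sqrt(chi2 / n)
    v = math.sqrt(chi2 / (n * min(len(rows) - 1, len(cols) - 1)))
    c = math.sqrt(chi2 / (chi2 + n))
    return chi2, phi, v, c

# Hypothetical 2x2 table (e.g., favor / not favor by group); all expected counts = 15
chi2, phi, v, c = chi2_and_measures([[10, 20], [20, 10]])
```

For a 2x2 table min(r−1, c−1) = 1, so Cramer's V reduces to φ, as the example shows.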
- 35. 35 Wilcoxon-Mann-Whitney U-Test • A most powerful non-parametric test to determine whether two independent samples have been drawn from the same population. Used as an alternative to the t-test for both qualitative and quantitative data • Both samples are pooled together and the elements arranged in ascending order to find U
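The pooling-and-ranking step on this slide can be sketched directly. An illustrative implementation (function name and data mine), using midranks for ties and reporting the smaller of the two U values, which is the form usually taken to tables:

```python
def mann_whitney_u(x, y):
    """Mann-Whitney U: pool both samples, rank them (midranks for ties),
    then U1 = R1 - n1(n1+1)/2 and U2 = n1*n2 - U1; return the smaller."""
    pooled = sorted(x + y)

    def rank(v):  # midrank of value v in the pooled sample
        less = sum(1 for p in pooled if p < v)
        equal = sum(1 for p in pooled if p == v)
        return less + (equal + 1) / 2

    n1, n2 = len(x), len(y)
    r1 = sum(rank(v) for v in x)
    u1 = r1 - n1 * (n1 + 1) / 2
    return min(u1, n1 * n2 - u1)

# Complete separation of the two samples gives the minimum possible U of 0
u = mann_whitney_u([1, 2, 3], [4, 5, 6])
```

The more the two samples interleave when pooled, the closer U gets to n1·n2/2, which is its value under the null hypothesis of identical populations.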
- 36. 36 Wilcoxon Matched Pair or Signed Rank Test • Used in the context of two related samples where we can determine both the direction and magnitude of differences. Examples: wife & husband, subjects studied before & after an experiment, comparing the output of two machines, etc. • As it attaches greater weight to pairs that show a larger difference, it is a more powerful test than the sign test • The null hypothesis (H0) is that there is no difference between the two groups with respect to the characteristic under study
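The "greater weight to larger differences" idea comes from ranking the absolute paired differences. A minimal sketch (function name and before/after data are illustrative; zero differences are dropped and ties get midranks, the usual conventions):

```python
def wilcoxon_w(before, after):
    """Wilcoxon signed-rank W: rank the |differences| (zeros dropped),
    sum the ranks of positive and of negative differences, return the smaller.
    Larger differences get larger ranks, hence more weight than in a sign test."""
    diffs = [a - b for a, b in zip(after, before) if a != b]
    abs_d = sorted(abs(d) for d in diffs)

    def rank(v):  # midrank of |difference| v among all absolute differences
        less = sum(1 for a in abs_d if a < v)
        equal = sum(1 for a in abs_d if a == v)
        return less + (equal + 1) / 2

    w_pos = sum(rank(abs(d)) for d in diffs if d > 0)
    w_neg = sum(rank(abs(d)) for d in diffs if d < 0)
    return min(w_pos, w_neg)

# Hypothetical before/after scores for 4 subjects: three improved, one declined slightly
w = wilcoxon_w([10, 12, 14, 16], [12, 15, 13, 20])
```

Under H0 the positive and negative rank sums should be about equal, so a very small W is evidence of a systematic before/after change.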
- 37. 37 K Sample (i.e., more than two sample) Tests The Kruskal-Wallis Test or H Test: • Similar to the U test • H0: the K individual random samples come from identical universes; does not require the approximation of a normal distribution, as H follows the chi-square distribution; use the chi-square table.
- 38. 38 A few points on K-W • Calculation of P-values (avoiding type I errors): – F statistic: F distribution (requires normality) – K-W statistic: χ² distribution (requires large samples) – Either statistic: permutation tests • Power (avoiding type II errors): – K-W statistic more resistant to outliers – F statistic more powerful in the case of normality • K-W statistic: no need to worry about transformations
- 39. 39 References • Cohen, Louis and Manion, Lawrence. Research methods in education. London: Routledge, 1980. • Goode, William J. and Hatt, Paul K. Methods in social research. London: McGraw-Hill, 1981. • Gopal, M.H. An introduction to research procedures in social sciences. Bombay: Asia Publishing House, 1970. • Koosis, Donald J. Business statistics. New York: John Wiley, 1972.
- 40. 40 Multivariate Analysis • Discriminant Analysis -It joins a nominally scaled criterion or dependent variable with one or more independent variables that are interval or ratio scaled. • Multivariate ANOVA -Assesses the relationship between two or more dependent variables and classificatory variables or factors • LISREL (Linear Structural Relationships) -Measurement and Structural equation model -Causality testing
- 41. 41 Interdependency Techniques • Factor analysis -A factor is a linear combination of variables -Constructs a new set of variables based on the relationships in the correlation matrix -Factor loading -Orthogonal or oblique rotation • Cluster Analysis -A set of techniques for grouping similar objects or people
