Published in:
Technology

PROBLEM STATEMENT, RESEARCH QUESTIONS, PURPOSES, BENEFITS

What exactly do I want to find out?

What is a researchable problem?

What are the obstacles in terms of knowledge, data availability, time, or resources?

Do the benefits outweigh the costs?

THEORY, ASSUMPTIONS, BACKGROUND LITERATURE

What does the relevant literature in the field indicate about this problem?

Which theory or conceptual framework does the work fit within?

What are the criticisms of this approach, or how does it constrain the research process?

What do I know for certain about this area?

What is the background to the problem that needs to be made available in reporting the work?

VARIABLES AND HYPOTHESES

What will I take as given in the environment, i.e., what is the starting point?

Which are the independent and which are the dependent variables?

Are there control variables?

Is the hypothesis specific enough to be researchable yet still meaningful?

How certain am I of the relationship(s) between variables?

OPERATIONAL DEFINITIONS AND MEASUREMENT

Does the problem need scoping/simplifying to make it achievable?

Which variables will be measured, and how?

What degree of error in the findings is tolerable?

Is the approach defensible?

RESEARCH DESIGN AND METHODOLOGY

What is my overall strategy for doing this research?

Will this design permit me to answer the research question?

What constraints will the approach place on the work?

INSTRUMENTATION/SAMPLING

How will I get the data I need to test my hypothesis?

What tools or devices will I use to make or record observations?

Are valid and reliable instruments available, or must I construct my own?

How will I choose the sample?

Am I interested in representativeness?

If so, of whom or what, and with what degree of accuracy or level of confidence?

DATA ANALYSIS

What combinations of analytical and statistical process will be applied to the data?

Which of these will allow me to accept or reject my hypotheses?

Do the findings show numerical differences, and are those differences important?

CONCLUSIONS, INTERPRETATIONS, RECOMMENDATIONS

Was my initial hypothesis supported?

What if my findings are negative?

What are the implications of my findings for the theory base, for the background assumptions, or relevant literature?

What recommendations result from the work?

What suggestions can I make for further research on this topic?

Administrators can tell us

We notice anecdotally or through qualitative research that a particular subgroup of students is experiencing higher risk

We decide to survey everyone and go from there

3 factors that influence sample representativeness

Sampling procedure

Sample size

Participation (response)

When might you sample the entire population?

When your population is very small

When you have extensive resources

When you don’t expect a very high response

Because some members of the population have no chance of being sampled, the extent to which a convenience sample – regardless of its size – actually represents the entire population cannot be known

Inferential statistics investigate questions, models, and hypotheses. In many cases, the conclusions from inferential statistics extend beyond the immediate data alone. For instance, we use inferential statistics to try to infer from the sample data what the population thinks. Or, we use inferential statistics to make judgments of the probability that an observed difference between groups is a dependable one or one that might have happened by chance in this study. Thus, we use inferential statistics to make inferences from our data to more general conditions; we use descriptive statistics simply to describe what's going on in our data.

- 1. Data Analysis and Surveying 101: Basic research methods and biostatistics as they apply to the Theresa Jackson Hughes, MPH American College Health Association December 2006
- 2. What we will cover today Research Methods • Sampling Frame and Sampling • Generalizability • Bias • Reliability and Validity • Levels of measurement Biostatistics • Statistical significance • Other key terms • Appropriate statistical tests • Fun examples from the Spring 2005 dataset! Get excited! It’s data time!!!
- 3. Research Methods
- 4. “To do successful research, you don't need to know everything, you just need to know of one thing that isn't known.” • Arthur Schawlow “That's the nature of research - you don't know what in hell you're doing.” • Harold "Doc" Edgerton “If we knew what it was we were doing, it would not be called research, would it?” • Albert Einstein
- 5. What exactly is research? “Scientific research is systematic, controlled, empirical, and critical investigation of natural phenomena guided by theory and hypotheses about the presumed relations among such phenomena.” • Kerlinger, 1986 Research is an organized and systematic way of finding answers to questions
- 6. Important Components of Empirical Research Problem statement, research questions, purposes, benefits Theory, assumptions, background literature Variables and hypotheses Operational definitions and measurement Research design and methodology Instrumentation, sampling Data analysis Conclusions, interpretations, recommendations
- 7. Sampling What is your population of interest? • To whom do you want to generalize your results? All students (18 and over) Undergraduates only Greeks Athletes Other Can you sample the entire population?
- 8. Sampling A sample is “a smaller (but hopefully representative) collection of units from a population used to determine truths about that population” (Field, 2005) Why sample? • Resources (time, money) and workload • Gives results with known accuracy that can be calculated mathematically The sampling frame is the list from which the potential respondents are drawn • Registrar’s office • Class rosters • Must assess sampling frame errors
- 9. Types of Samples Probability (Random) Samples • Simple random sample • Systematic random sample • Stratified random sample Proportionate Disproportionate • Cluster sample Non-Probability Samples • Convenience sample • Purposive sample • Quota
- 10. Sample Size

  | Size of Campus | Final Desired N |
  |---|---|
  | <600 | All students |
  | 600-2,999 | 600 |
  | 3,000-9,999 | 700 |
  | 10,000-19,999 | 800 |
  | 20,000-29,000 | 900 |
  | ≥30,000 | 1,000 |

  Depends on expected response rate • Average 85% for paper: FINAL SAMPLE DESIRED / .85 = SAMPLE • Average 25% for web: FINAL SAMPLE DESIRED / .25 = SAMPLE
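The sample size rule on this slide can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the 20,000-29,000 tier is assumed to extend to 29,999, and capping the draw at the campus size is an added assumption rather than something the slide states.

```python
import math

# (exclusive upper bound of campus size, final desired N), taken from the slide.
# Assumption: the slide lists "20,000-29,000"; campuses up to 29,999 are
# treated as part of that tier here.
SIZE_TIERS = [(3_000, 600), (10_000, 700), (20_000, 800), (30_000, 900)]


def final_desired_n(campus_size: int) -> int:
    """Return the final desired number of completed surveys."""
    if campus_size < 600:
        return campus_size  # "All students"
    for upper_bound, desired_n in SIZE_TIERS:
        if campus_size < upper_bound:
            return desired_n
    return 1_000  # campuses of 30,000 or more


def sample_to_draw(campus_size: int, expected_response_rate: float) -> int:
    """FINAL SAMPLE DESIRED / response rate, rounded up.

    Assumption: the draw is capped at the campus size, since you cannot
    invite more students than exist.
    """
    needed = math.ceil(final_desired_n(campus_size) / expected_response_rate)
    return min(needed, campus_size)


paper_sample = sample_to_draw(5_000, 0.85)  # paper survey, 85% expected response
web_sample = sample_to_draw(5_000, 0.25)    # web survey, 25% expected response
print(paper_sample, web_sample)
```

For a campus of 5,000 students the desired final N is 700, so a web survey at a 25% response rate needs four times that many invitations, while a paper survey needs only slightly more than 700.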
- 11. Bias and Error
- 12. Bias and Error Systematic Error or Bias: unknown or unacknowledged error created during the design, measurement, sampling, procedure, or choice of problem studied • Error tends to go in one direction • Examples: Selection, Recall, Social desirability Random Error • Unrelated to true measures • Example: Momentary fatigue
- 13. Reliability and Validity Reliability • The extent to which a test is repeatable and yields consistent scores • Affected by random error/bias Validity • The extent to which a test measures what it is supposed to measure • A subjective judgment made on the basis of experience and empirical indicators • Asks “Is the test measuring what you think it’s measuring?” • Affected by systematic error/bias
- 14. Reliability vs. Validity In order to be valid, a test must be reliable; but reliability does not guarantee validity.
- 15. Levels of Measurement
- 16. Levels of Measurement Nominal • Gender: Male, Female • Vaccinations: Yes, No, Unsure Ordinal • Personal health status: Excellent, Very good, Good, Fair, Poor • Last 30 days: Never used, Not in last 30 days, 1-2 days, 3-5 days, 6-9 days, 10-19 days, 20-29 days, All 30 days Interval • Body Mass Index (BMI) Ratio • Number of drinks • Number of sexual partners • Perception percentages • Blood alcohol concentration (BAC)
- 17. Biostatistics
- 18. “It is commonly believed that anyone who tabulates numbers is a statistician. This is like believing that anyone who owns a scalpel is a surgeon.” • R. Hooke “Torture numbers, and they'll confess to anything.” • Gregg Easterbrook “98% of all statistics are made up.” • Author Unknown
- 19. Types of Statistics Descriptive statistics • Describe the basic features of data in a study • Provide summaries about the sample and measures Inferential statistics • Investigate questions, models, and hypotheses • Infer population characteristics based on sample • Make judgments about what we observe
- 20. Descriptive Statistics • Central Tendency: Mode, Median, Mean • Variation: Range, Variance, Standard Deviation • Frequency
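As a small illustration of the measures named on this slide, the sketch below computes each one with Python's standard library. The `drinks` values are made up for the example and are not taken from the Spring 2005 dataset.

```python
import statistics
from collections import Counter

# Hypothetical responses to "how many drinks", invented for illustration only.
drinks = [2, 4, 4, 5, 7, 8]

summary = {
    # Central tendency
    "mode": statistics.mode(drinks),
    "median": statistics.median(drinks),
    "mean": statistics.mean(drinks),
    # Variation (sample variance and standard deviation, n - 1 denominator)
    "range": max(drinks) - min(drinks),
    "variance": statistics.variance(drinks),
    "std_dev": statistics.stdev(drinks),
}

# Frequency: how often each value occurs
frequency = Counter(drinks)

for name, value in summary.items():
    print(f"{name}: {value}")
print(dict(frequency))
```

The variance and standard deviation here use the sample (n - 1) formulas; `statistics.pvariance` and `statistics.pstdev` would give the population versions.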
- 21. Descriptive Statistics Examples: Categorical Variables (Nominal/Ordinal). Q1 Gen health

  | Response | Frequency | Percent | Valid Percent | Cumulative Percent |
  |---|---|---|---|---|
  | 1 excellent | 9145 | 16.9 | 17.0 | 17.0 |
  | 2 very good | 23767 | 43.9 | 44.2 | 61.2 |
  | 3 good | 16442 | 30.4 | 30.6 | 91.8 |
  | 4 fair | 3737 | 6.9 | 6.9 | 98.7 |
  | 5 poor | 565 | 1.0 | 1.1 | 99.8 |
  | 6 don't know | 132 | .2 | .2 | 100.0 |
  | Total (valid) | 53788 | 99.4 | 100.0 | |
  | Missing (System) | 323 | .6 | | |
  | Total | 54111 | 100.0 | | |

- 22. Descriptive Statistics Examples: Categorical Variables (Nominal/Ordinal). Q49 Year in school * Q46 Sex Crosstabulation

  | Q49 Year in school | Female: Count | Female: % of Total | Male: Count | Male: % of Total | Total: Count | Total: % of Total |
  |---|---|---|---|---|---|---|
  | 1 1st year undergrad | 7366 | 14.5% | 4154 | 8.2% | 11520 | 22.7% |
  | 2 2nd year under | 6755 | 13.3% | 3678 | 7.2% | 10433 | 20.6% |
  | 3 3rd year under | 6195 | 12.2% | 3333 | 6.6% | 9528 | 18.8% |
  | 4 4th year under | 5192 | 10.2% | 2676 | 5.3% | 7868 | 15.5% |
  | 5 5th year or more under | 1380 | 2.7% | 985 | 1.9% | 2365 | 4.7% |
  | 6 graduate | 5088 | 10.0% | 3246 | 6.4% | 8334 | 16.4% |
  | 7 adult special | 203 | .4% | 105 | .2% | 308 | .6% |
  | 8 other | 266 | .5% | 145 | .3% | 411 | .8% |
  | Total | 32445 | 63.9% | 18322 | 36.1% | 50767 | 100.0% |

- 23. Descriptive Statistics Examples: Continuous Variables (Interval/Ratio)

  | Variable | N | Range | Minimum | Maximum | Mean | Std. Deviation | Variance |
  |---|---|---|---|---|---|---|---|
  | Q48 Weight in pounds | 51935 | 534 | 52 | 586 | 153.16 | 35.791 | 1281.031 |
  | HT_INCH Height in Inches | 52017 | 56.00 | 48.00 | 104.00 | 67.2035 | 4.01241 | 16.099 |
  | Q13 How many drinks | 53374 | 88 | 0 | 88 | 4.42 | 4.401 | 19.370 |
  | Q12 Hours alcohol | 53326 | 65 | 0 | 65 | 2.99 | 2.726 | 7.430 |
  | BAC Blood Alcohol Content | 50604 | 2.47 | .00 | 2.47 | .0731 | .08357 | .007 |
  | Valid N (listwise) | 50218 | | | | | | |

- 24. Hypotheses Null hypotheses • Presumed true until statistical evidence in the form of a hypothesis test indicates otherwise There is no effect/relationship There is no difference in means Alternative hypotheses • Tested using inferential statistics There is an effect/relationship There is a difference in means
- 25. Alpha, Beta, Power, Effect Size Alpha – probability of making a Type I error • Reject null when null is true • Level of significance, p value Beta – probability of making a Type II error • Fail to reject null when null is false Power – probability of correctly rejecting null • 1 – Beta Effect Size • Measure of the strength of the relationship between two variables

  | | Null is true | Null is false |
  |---|---|---|
  | Reject null | Alpha: Type I error | 1 – Beta: Power (CORRECT REJECTION) |
  | Fail to reject null | 1 – Alpha (CORRECT NON-REJECTION) | Beta: Type II error |

- 26. Let’s test some hypotheses!!!
- 27. Test of the mean of one continuous variable. College students report drinking an average of 5 drinks the last time they “partied”/socialized • Hypotheses Ho: µ = 5 HA: µ ≠ 5 • Test: Two-tailed t-test • Result: Reject null

  One-Sample Statistics

  | | N | Mean | Std. Deviation | Std. Error Mean |
  |---|---|---|---|---|
  | How many drinks | 53374 | 4.42 | 4.401 | .019 |

  One-Sample Test (Test Value = 5)

  | | t | df | Sig. (2-tailed) | Mean Difference | 95% CI of the Difference: Lower | 95% CI of the Difference: Upper |
  |---|---|---|---|---|---|---|
  | How many drinks | -30.352 | 53373 | .000 | -.578 | -.62 | -.54 |

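The one-sample t statistic on this slide can be approximately reproduced from the summary figures alone. The sketch below is not the SPSS procedure itself: the function name is hypothetical, the mean of 4.422 is inferred from the reported mean difference of -.578, and the two-tailed p-value uses a normal approximation, which is an assumption that is reasonable only because df is very large.

```python
import math


def one_sample_t(mean: float, sd: float, n: int, test_value: float):
    """Return (t, df, standard error) for a one-sample t-test from summary stats."""
    standard_error = sd / math.sqrt(n)
    t = (mean - test_value) / standard_error
    return t, n - 1, standard_error


# Mean taken as 5 - 0.578 = 4.422 (the slide rounds it to 4.42).
t_stat, df, se = one_sample_t(mean=4.422, sd=4.401, n=53374, test_value=5)

# Assumption: with df this large the t distribution is close to the normal,
# so a two-tailed p-value is approximated with the normal tail area.
p_two_tailed = math.erfc(abs(t_stat) / math.sqrt(2))

print(f"t = {t_stat:.2f}, df = {df}, SE = {se:.3f}, p = {p_two_tailed:.3g}")
```

Because the inputs are rounded, the result lands near but not exactly on the slide's t of -30.352; the conclusion (reject the null) is the same.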
- 28. Test of a single proportion of one categorical variable. 20% of college students report their health is excellent • Hypotheses Ho: p = .20 HA: p < .20 (one-tailed) • Test: Z-test for a single proportion • Result: Reject null

  Binomial Test

  | Gen health | Category | N | Observed Prop. | Test Prop. | Asymp. Sig. (1-tailed) |
  |---|---|---|---|---|---|
  | Group 1 | <= 1 | 9145 | .170 | .2 | .000 (a, b) |
  | Group 2 | > 1 | 44643 | .830 | | |
  | Total | | 53788 | 1.000 | | |

  a. Alternative hypothesis states that the proportion of cases in the first group < .2. b. Based on Z Approximation.

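A Z-test for a single proportion needs only the counts shown on this slide. The following sketch uses a hypothetical function name and computes the lower-tail p-value, matching the output's note that the alternative is a proportion below .2; it illustrates the formula and is not a reproduction of the SPSS routine.

```python
import math


def one_proportion_z(successes: int, n: int, p0: float):
    """Return (observed proportion, z, lower-tail p) for H0: p = p0."""
    p_hat = successes / n
    # The standard error uses the null proportion p0, as in the usual z-test.
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / standard_error
    # Lower-tail probability of the standard normal: P(Z <= z).
    p_lower = 0.5 * math.erfc(-z / math.sqrt(2))
    return p_hat, z, p_lower


# 9145 of 53788 valid respondents rated their health "excellent".
p_hat, z, p_lower = one_proportion_z(successes=9145, n=53788, p0=0.20)
print(f"observed = {p_hat:.3f}, z = {z:.2f}, one-tailed p = {p_lower:.3g}")
```

The observed proportion of about .170 sits many standard errors below .20, which is why the slide rejects the null.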
- 29. Test of a relationship between two continuous variables. There is a relationship between the number of drinks students report drinking the last time they drank and the number of sex partners they have had within the last school year • Hypotheses Ho: ρ = 0 HA: ρ ≠ 0 • Test: Pearson Product Moment Correlation • Result: Reject null

  Correlations

  | | | How many drinks | Partners you had |
  |---|---|---|---|
  | How many drinks | Pearson Correlation | 1 | .238** |
  | | Sig. (2-tailed) | | .000 |
  | | N | 53374 | 52576 |
  | Partners you had | Pearson Correlation | .238** | 1 |
  | | Sig. (2-tailed) | .000 | |
  | | N | 52576 | 52896 |

  **. Correlation is significant at the 0.01 level (2-tailed).

- 30. Test of the difference between two means. Men and women report significantly different numbers of sexual partners over the past 12 months • Hypotheses Ho: µ1 = µ2 HA: µ1 ≠ µ2 • Test: Independent Samples t-test OR One-way ANOVA • Result: Reject null

  Group Statistics (Partners you had)

  | Sex | N | Mean | Std. Deviation | Std. Error Mean |
  |---|---|---|---|---|
  | female | 32687 | 1.34 | 2.017 | .011 |
  | male | 18474 | 1.82 | 3.627 | .027 |

  Independent Samples Test (Partners you had). Levene's Test for Equality of Variances: F = 867.978, Sig. = .000

  | t-test for Equality of Means | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 95% CI of the Difference: Lower | 95% CI of the Difference: Upper |
  |---|---|---|---|---|---|---|---|
  | Equal variances assumed | -19.360 | 51159 | .000 | -.483 | .025 | -.532 | -.434 |
  | Equal variances not assumed | -16.704 | 25065.988 | .000 | -.483 | .029 | -.540 | -.426 |

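Levene's test on this slide is significant, so the "equal variances not assumed" row is the relevant one. That row corresponds to Welch's t-test, which can be sketched from the group summary statistics as follows. The function name is hypothetical, and because the slide's means and standard deviations are rounded, the result only approximates the reported t of -16.704 and df of 25065.988.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t-test from summary statistics: returns (t, df, SE of difference)."""
    var_term1 = sd1 ** 2 / n1
    var_term2 = sd2 ** 2 / n2
    se_diff = math.sqrt(var_term1 + var_term2)
    t = (mean1 - mean2) / se_diff
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (var_term1 + var_term2) ** 2 / (
        var_term1 ** 2 / (n1 - 1) + var_term2 ** 2 / (n2 - 1)
    )
    return t, df, se_diff


# Group 1 = female, group 2 = male, values as printed in Group Statistics.
t_stat, df, se_diff = welch_t(1.34, 2.017, 32687, 1.82, 3.627, 18474)
print(f"t = {t_stat:.2f}, df = {df:.0f}, SE of difference = {se_diff:.3f}")
```

The standard error of the difference comes out at about .029, in line with the second row of the output.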
- 31. Test of the difference between two or more means. Mean BAC reported differs across student residences • Hypotheses Ho: µ1 = µ2 = µ3 = µ4 = µ5 = µ6 HA: µi ≠ µj for at least one pair i, j • Test: One-way ANOVA • Result: Reject null

  Descriptives (Blood Alcohol Content)

  | Currently live | N | Mean | Std. Deviation | Std. Error | 95% CI for Mean: Lower Bound | 95% CI for Mean: Upper Bound | Minimum | Maximum |
  |---|---|---|---|---|---|---|---|---|
  | residence hall | 21285 | .0741 | .08215 | .00056 | .0730 | .0752 | .00 | 1.27 |
  | frat/sorority house | 781 | .1127 | .09278 | .00332 | .1062 | .1193 | .00 | .75 |
  | other university housing | 3620 | .0622 | .07357 | .00122 | .0598 | .0646 | .00 | 1.41 |
  | off campus | 18151 | .0773 | .08539 | .00063 | .0760 | .0785 | .00 | 2.47 |
  | with parents | 4279 | .0606 | .08490 | .00130 | .0581 | .0631 | .00 | 1.17 |
  | other | 2266 | .0579 | .08296 | .00174 | .0545 | .0613 | .00 | 1.26 |
  | Total | 50382 | .0731 | .08357 | .00037 | .0724 | .0738 | .00 | 2.47 |

  ANOVA (Blood Alcohol Content)

  | | Sum of Squares | df | Mean Square | F | Sig. |
  |---|---|---|---|---|---|
  | Between Groups | 3.188 | 5 | .638 | 92.123 | .000 |
  | Within Groups | 348.695 | 50376 | .007 | | |
  | Total | 351.884 | 50381 | | | |

- 32. Test of the difference between two or more means. Multiple Comparisons. Dependent Variable: Blood Alcohol Content. Games-Howell

  | (I) Currently live | (J) Currently live | Mean Difference (I-J) | Std. Error | Sig. | 95% CI: Lower Bound | 95% CI: Upper Bound |
  |---|---|---|---|---|---|---|
  | residence hall | frat/sorority house | -.03865* | .00337 | .000 | -.0483 | -.0290 |
  | residence hall | other university housing | .01190* | .00135 | .000 | .0081 | .0157 |
  | residence hall | off campus | -.00316* | .00085 | .003 | -.0056 | -.0007 |
  | residence hall | with parents | .01350* | .00141 | .000 | .0095 | .0175 |
  | residence hall | other | .01623* | .00183 | .000 | .0110 | .0215 |
  | frat/sorority house | residence hall | .03865* | .00337 | .000 | .0290 | .0483 |
  | frat/sorority house | other university housing | .05055* | .00354 | .000 | .0404 | .0606 |
  | frat/sorority house | off campus | .03548* | .00338 | .000 | .0258 | .0451 |
  | frat/sorority house | with parents | .05215* | .00356 | .000 | .0420 | .0623 |
  | frat/sorority house | other | .05488* | .00375 | .000 | .0442 | .0656 |
  | other university housing | residence hall | -.01190* | .00135 | .000 | -.0157 | -.0081 |
  | other university housing | frat/sorority house | -.05055* | .00354 | .000 | -.0606 | -.0404 |
  | other university housing | off campus | -.01506* | .00138 | .000 | -.0190 | -.0111 |
  | other university housing | with parents | .00160 | .00178 | .947 | -.0035 | .0067 |
  | other university housing | other | .00433 | .00213 | .323 | -.0017 | .0104 |
  | off campus | residence hall | .00316* | .00085 | .003 | .0007 | .0056 |
  | off campus | frat/sorority house | -.03548* | .00338 | .000 | -.0451 | -.0258 |
  | off campus | other university housing | .01506* | .00138 | .000 | .0111 | .0190 |
  | off campus | with parents | .01667* | .00144 | .000 | .0125 | .0208 |
  | off campus | other | .01940* | .00185 | .000 | .0141 | .0247 |
  | with parents | residence hall | -.01350* | .00141 | .000 | -.0175 | -.0095 |
  | with parents | frat/sorority house | -.05215* | .00356 | .000 | -.0623 | -.0420 |
  | with parents | other university housing | -.00160 | .00178 | .947 | -.0067 | .0035 |
  | with parents | off campus | -.01667* | .00144 | .000 | -.0208 | -.0125 |
  | with parents | other | .00273 | .00217 | .809 | -.0035 | .0089 |
  | other | residence hall | -.01623* | .00183 | .000 | -.0215 | -.0110 |
  | other | frat/sorority house | -.05488* | .00375 | .000 | -.0656 | -.0442 |
  | other | other university housing | -.00433 | .00213 | .323 | -.0104 | .0017 |
  | other | off campus | -.01940* | .00185 | .000 | -.0247 | -.0141 |
  | other | with parents | -.00273 | .00217 | .809 | -.0089 | .0035 |

  *. The mean difference is significant at the .05 level.

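The F ratio in the ANOVA output for mean BAC by residence follows directly from the sums of squares and degrees of freedom. The sketch below recomputes it and also reports eta squared as an effect size. Eta squared does not appear on the slide; it is added here as an illustration, and the function name is hypothetical.

```python
def anova_f(ss_between: float, df_between: int, ss_within: float, df_within: int):
    """Return (F, eta squared) from one-way ANOVA sums of squares."""
    mean_square_between = ss_between / df_between
    mean_square_within = ss_within / df_within
    f_ratio = mean_square_between / mean_square_within
    # Eta squared: share of total variation explained by group membership.
    eta_squared = ss_between / (ss_between + ss_within)
    return f_ratio, eta_squared


# Sums of squares and df as printed in the ANOVA table for Blood Alcohol Content.
f_ratio, eta_squared = anova_f(3.188, 5, 348.695, 50376)
print(f"F = {f_ratio:.2f}, eta squared = {eta_squared:.4f}")
```

With rounded inputs the F ratio comes out close to the reported 92.123. The eta squared of roughly .009 suggests that residence accounts for under 1% of the variation in reported BAC, even though the test is highly significant at this sample size.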
- 33. Test for a relationship between two categorical variables Is there an association between being a member of a fraternity/sorority and ever being diagnosed with depression? • Hypotheses Ho: There is no association between being a member of a fraternity/sorority and ever being diagnosed with depression. HA: There is an association between being a member of a fraternity/sorority and ever being diagnosed with depression. • Test: Chi-square test for independence • Result: Fail to reject null
- 34. Test for relationship between two categorical variables

  Ever - Depression * Frat or sorority? Crosstabulation

  | Ever - Depression | | Frat or sorority? yes | Frat or sorority? no | Total |
  |---|---|---|---|---|
  | yes | Count | 681 | 7692 | 8373 |
  | yes | Expected Count | 715.6 | 7657.4 | 8373.0 |
  | no | Count | 3744 | 39657 | 43401 |
  | no | Expected Count | 3709.4 | 39691.6 | 43401.0 |
  | Total | Count | 4425 | 47349 | 51774 |
  | Total | Expected Count | 4425.0 | 47349.0 | 51774.0 |

  Chi-Square Tests

  | | Value | df | Asymp. Sig. (2-sided) | Exact Sig. (2-sided) | Exact Sig. (1-sided) |
  |---|---|---|---|---|---|
  | Pearson Chi-Square | 2.185 (b) | 1 | .139 | | |
  | Continuity Correction (a) | 2.122 | 1 | .145 | | |
  | Likelihood Ratio | 2.211 | 1 | .137 | | |
  | Fisher's Exact Test | | | | .141 | .073 |
  | Linear-by-Linear Association | 2.185 | 1 | .139 | | |
  | N of Valid Cases | 51774 | | | | |

  a. Computed only for a 2x2 table. b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 715.62.

- 35. Important Points to Remember • A significant association does not indicate causation • Statistical significance is not always the same as practical significance • Multiple factors contribute to whether your results are significant • It gets easier and easier as you practice!
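The Pearson chi-square value for depression by fraternity/sorority membership can be recomputed from the observed counts. The sketch below uses a hypothetical function name and obtains the p-value from a closed form that holds only for 1 degree of freedom, which is the case for a 2x2 table.

```python
import math

# Observed counts: rows = ever diagnosed with depression (yes, no),
# columns = fraternity/sorority member (yes, no).
observed = [[681, 7692], [3744, 39657]]


def chi_square_independence(table):
    """Return (chi-square, df, expected counts) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


chi2, df, expected = chi_square_independence(observed)

# For df = 1 only: upper-tail chi-square probability equals erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))
print(f"chi-square = {chi2:.3f}, df = {df}, p = {p_value:.3f}")
```

The p-value is well above .05, consistent with the slide's decision to fail to reject the null.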
- 36. Questions???
