Chapter 1: Where do we get the data


1. 1. Basic Definitions: Population <ul><li>Population </li></ul><ul><ul><li>The entire group to be studied </li></ul></ul><ul><li>Census </li></ul><ul><ul><li>A collection of data and information from the population </li></ul></ul><ul><li>Parameter </li></ul><ul><ul><li>A numeric measurement or calculation of census data </li></ul></ul><ul><ul><li>Quantifies an attribute of the population </li></ul></ul>
2. 2. Basic Definitions: Sample <ul><li>Fundamental concept </li></ul><ul><ul><li>A census may not be possible or practical </li></ul></ul><ul><li>Sample </li></ul><ul><ul><li>(Noun) A subset of a population that (we hope) represents the population </li></ul></ul><ul><ul><li>(Verb) The process of collecting data from the subset. </li></ul></ul><ul><li>Statistic </li></ul><ul><ul><li>A numeric measurement or calculation of sample data </li></ul></ul><ul><ul><li>Estimates the parameter of a population </li></ul></ul>
3. 3. Another Fundamental Concept <ul><li>We are never 100% sure that our sample exactly represents the population </li></ul><ul><li>So a statistic is just an estimate </li></ul><ul><li>We will learn many techniques to deal with this uncertainty </li></ul>
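A minimal sketch of the parameter/statistic distinction from the slides above. The population here is simulated test-score data (an assumption made purely for illustration, not data from the course), and the sample size of 50 is arbitrary.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: made-up test scores for 1,000 students.
population = [rng.gauss(70, 10) for _ in range(1000)]

# Parameter: a number computed from the entire population (a census).
parameter = statistics.mean(population)

# Statistic: the same calculation on a sample; it only estimates the parameter.
sample = rng.sample(population, 50)
statistic = statistics.mean(sample)

print(f"parameter (population mean): {parameter:.2f}")
print(f"statistic (sample mean):     {statistic:.2f}")
```

The two printed values will usually be close but not identical, which is the uncertainty the slide refers to.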
4. 4. Where do we get the data? <ul><li>Census vs. sample </li></ul><ul><li>Observations </li></ul><ul><ul><li>“Watching” real activity and collecting data </li></ul></ul><ul><ul><li>Opinion polls </li></ul></ul><ul><li>Experiments </li></ul><ul><ul><li>Running the activity and measuring the results </li></ul></ul><ul><ul><li>Relatively easy to control </li></ul></ul>
5. 5. For Example <ul><li>TV watching and test scores </li></ul><ul><li>Observation </li></ul><ul><ul><li>Use a survey that asks your sampled students about their TV watching habits and their test scores. </li></ul></ul><ul><li>Experiment </li></ul><ul><ul><li>Design varied TV-watching schedules for your samples </li></ul></ul><ul><ul><li>Design and/or administer a test to measure learning </li></ul></ul><ul><li>Car crashworthiness and make </li></ul><ul><li>Observation </li></ul><ul><ul><li>Collect accident data and auto repair data </li></ul></ul><ul><li>Experiment </li></ul><ul><ul><li>Deliberately crash cars and measure the results </li></ul></ul>
6. 6. Live Example <ul><li>Movie popularity </li></ul><ul><li>Observation </li></ul><ul><li>Experiment </li></ul><ul><li>Cell Phone Reception </li></ul><ul><li>Observation </li></ul><ul><li>Experiment </li></ul>
7. 7. Homework <ul><li>Describe an experiment to gather data that tests each of the following claims. </li></ul><ul><ul><li>Reading books improves school performance </li></ul></ul><ul><ul><li>Blondes have more fun </li></ul></ul>
8. 8. Variables <ul><li>Variable refers to any characteristic that could affect an outcome being tested. </li></ul><ul><ul><li>Variables have to be measurable </li></ul></ul><ul><li>What characteristics affect SAT scores? </li></ul><ul><li>What characteristics affect car crashworthiness? </li></ul>
9. 9. Varying and Controlling <ul><li>In a statistics study, we test if one variable really has an effect on the outcome. </li></ul><ul><li>We will vary the test variable </li></ul><ul><ul><li>Change the value to see if the outcome also changes </li></ul></ul><ul><li>To prevent confounding, we will control the other variables </li></ul><ul><ul><li>Confounding: The effects of two or more variables cannot be distinguished </li></ul></ul><ul><ul><li>Control: Samples with similar values for the other variables may be grouped </li></ul></ul>
10. 10. C and A: How do you raise a smart kid? <ul><li>Economics professor has correlated test scores with family characteristics </li></ul><ul><ul><li>Educated parents </li></ul></ul><ul><ul><li>High socio-economic status </li></ul></ul><ul><ul><li>30-year-old mom </li></ul></ul><ul><ul><li>Books in home </li></ul></ul><ul><ul><li>English in the house </li></ul></ul><ul><ul><li>PTA participation </li></ul></ul><ul><ul><li>Birth weight </li></ul></ul><ul><ul><li>Adopted </li></ul></ul>
11. 11. Your Turn, Homework. <ul><li>Let's assume we are designing a study of car crashworthiness. Your assignment is to do the following. </li></ul><ul><li>List 6 variables of a car or driver that you feel affect crashworthiness. </li></ul><ul><li>Of these variables, pick one that you would like to test. </li></ul><ul><li>Using the remaining variables as control variables, describe three groups of cars and/or drivers you would create to test your variable. </li></ul>
12. 12. Treatment <ul><li>When running an experiment that tests a variable: </li></ul><ul><ul><li>The sample will be split into groups </li></ul></ul><ul><ul><li>Each group will be administered one level of the variable </li></ul></ul><ul><ul><li>Who or what is assigned to each group is randomly determined. </li></ul></ul><ul><li>In some experiments the test variable is all or none. </li></ul><ul><ul><li>E.g., a drug </li></ul></ul><ul><ul><li>One group, the treatment group, receives all (called the treatment) </li></ul></ul><ul><ul><li>The other group, the control group, receives nothing or a pretend treatment called a placebo </li></ul></ul>
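The random assignment step on the Treatment slide can be sketched as follows. The function name `assign_groups` and the subject labels are hypothetical, chosen only for this illustration; an even two-way split is assumed.

```python
import random

def assign_groups(subjects, seed=None):
    """Randomly split subjects into a treatment group and a control group."""
    rng = random.Random(seed)
    shuffled = list(subjects)   # copy so the caller's list is not reordered
    rng.shuffle(shuffled)       # chance, not the experimenter, decides the order
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical sample of 20 subjects.
subjects = [f"subject_{i}" for i in range(1, 21)]
treatment, control = assign_groups(subjects, seed=1)

print("treatment:", treatment)
print("control:  ", control)
```

Shuffling and then cutting the list in half means every subject is equally likely to land in either group.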
13. 13. Placebo Effect <ul><li>Subjects, especially those in the control group, might think they are being given the treatment and start to act accordingly. </li></ul><ul><li>If the experiment is blinded, the subjects are not told if they are receiving the real treatment or placebo. </li></ul><ul><ul><li>The subjects should also not be told the outcome </li></ul></ul><ul><li>If the experiment is double-blinded, the people administering the experiment are also not told </li></ul>
14. 14. Your turn/homework <ul><li>You are charged with testing a new SAT prep course </li></ul><ul><li>Describe how the placebo effect might come into play in your experiment </li></ul><ul><li>Describe how you would counteract that effect </li></ul>
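15. 15. Sampling <ul><li>Sampling: picking a subset of a population </li></ul><ul><li>Sample’s characteristics should reflect the population’s in the same proportion </li></ul><ul><li>E.g., our school’s demographic break-down is </li></ul><table><tr><th></th><th>Frosh</th><th>Sophomore</th><th>Junior</th><th>Senior</th></tr><tr><th>Male</th><td>13%</td><td>12%</td><td>12%</td><td>13%</td></tr><tr><th>Female</th><td>13%</td><td>13%</td><td>11%</td><td>13%</td></tr></table>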
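As a rough illustration of "same proportion", the sketch below turns the school's demographic percentages from the slide above into per-group head counts for a sample. The sample size of 200 is an assumption picked so the counts come out as whole numbers.

```python
# Percentages taken from the demographic breakdown on the Sampling slide.
breakdown = {
    ("Male", "Frosh"): 13, ("Male", "Sophomore"): 12,
    ("Male", "Junior"): 12, ("Male", "Senior"): 13,
    ("Female", "Frosh"): 13, ("Female", "Sophomore"): 13,
    ("Female", "Junior"): 11, ("Female", "Senior"): 13,
}

sample_size = 200  # assumed for illustration

# A proportional sample takes the same share of each group as the population has.
quota = {cell: round(pct * sample_size / 100) for cell, pct in breakdown.items()}

for (sex, year), count in quota.items():
    print(f"{sex:6} {year:10} {count}")
```

With other sample sizes the rounded counts may not add up exactly to the target, so a real design would need a rule for distributing the remainder.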
16. 16. Sample Scheme Characteristics <ul><li>Random sample </li></ul><ul><ul><li>Each member of the population has an equal chance to be selected </li></ul></ul><ul><li>Simple random sample </li></ul><ul><ul><li>Each subset of the population of a given size has an equal chance of being selected. </li></ul></ul>
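To make the simple random sample definition concrete, the sketch below draws many samples of size 2 from a tiny made-up population of four members and tallies how often each possible subset appears. The population labels, sample size, and number of draws are all assumptions for illustration.

```python
import random
from collections import Counter
from itertools import combinations

population = ["A", "B", "C", "D"]  # hypothetical four-member population
k = 2                              # sample size

# Every possible sample of size k: 6 subsets for 4 members taken 2 at a time.
all_subsets = list(combinations(population, k))

rng = random.Random(42)
counts = Counter()
for _ in range(6000):
    # random.sample draws k distinct members without replacement.
    draw = tuple(sorted(rng.sample(population, k)))
    counts[draw] += 1

for subset in all_subsets:
    print(subset, counts[subset])
```

Each of the six subsets should appear roughly 1,000 times, which is what "equal chance of being selected" means for subsets rather than individual members.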
17. 17. Sampling Strategies <ul><li>Self-selected </li></ul><ul><ul><li>Population members volunteer </li></ul></ul><ul><ul><li>E.g., Call-in phone lines </li></ul></ul><ul><ul><li>Easy to implement </li></ul></ul><ul><ul><li>Difficult to get a proportional sample </li></ul></ul><ul><ul><li>Susceptible to bias </li></ul></ul><ul><li>Convenience sampling </li></ul><ul><ul><li>Whoever happens by </li></ul></ul><ul><ul><li>E.g., Mall surveys </li></ul></ul><ul><ul><li>Also susceptible to bias </li></ul></ul>
18. 18. Sampling Strategies <ul><li>Random sample </li></ul><ul><ul><li>Each member of the population is selected at random </li></ul></ul><ul><ul><li>E.g., Generate random student IDs </li></ul></ul><ul><li>Systematic sampling </li></ul><ul><ul><li>Population is put into some order </li></ul></ul><ul><ul><li>Select some starting point, then select every nth individual in a population </li></ul></ul><ul><ul><li>The starting point and maybe the interval (n) are picked at random </li></ul></ul>
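Systematic sampling as described above might be sketched like this. The helper name `systematic_sample` and the student ID range are hypothetical; the interval is fixed here and only the starting point is chosen at random.

```python
import random

def systematic_sample(ordered_population, interval, seed=None):
    """Pick a random start within the first interval, then take every nth member."""
    rng = random.Random(seed)
    start = rng.randrange(interval)          # random starting point: 0 .. interval-1
    return ordered_population[start::interval]

# Hypothetical ordered list of 100 student IDs.
student_ids = list(range(1000, 1100))
sample = systematic_sample(student_ids, interval=10, seed=3)

print(sample)
```

Because the start falls inside the first interval, every member of the ordered list has the same chance of being picked, but members close together in the ordering can never be selected together.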
19. 19. More Sampling Selection and Collection <ul><li>Stratified sampling </li></ul><ul><ul><li>Divide the population into groups. </li></ul></ul><ul><ul><ul><li>Groups are determined by control variables </li></ul></ul></ul><ul><ul><li>Randomly sample within each group </li></ul></ul><ul><li>Cluster sample </li></ul><ul><ul><li>Divide the population into clusters, randomly pick a cluster, then sample all (or most) members of the cluster </li></ul></ul>
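A minimal sketch contrasting the two strategies on the slide above. The roster, the homeroom clusters, the 10% sampling fraction, and both function names are assumptions made for illustration only.

```python
import random
from collections import defaultdict

def stratified_sample(members, stratum_of, fraction, seed=None):
    """Group members into strata, then randomly sample within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for m in members:
        strata[stratum_of(m)].append(m)
    sample = []
    for group in strata.values():
        n = max(1, round(len(group) * fraction))  # at least one per stratum
        sample.extend(rng.sample(group, n))
    return sample

def cluster_sample(clusters, seed=None):
    """Randomly pick one cluster and take every member of it."""
    rng = random.Random(seed)
    chosen = rng.choice(sorted(clusters))
    return chosen, list(clusters[chosen])

# Hypothetical roster: (student_id, class_year), 40 students per year.
years = ["Frosh", "Sophomore", "Junior", "Senior"]
roster = [(i, years[i % 4]) for i in range(160)]

# Stratified: class year is the control variable that defines the groups.
strat = stratified_sample(roster, stratum_of=lambda s: s[1], fraction=0.1, seed=7)

# Cluster: eight hypothetical homerooms of 20 students each.
homerooms = {f"room_{r}": [s for s in roster if s[0] // 20 == r] for r in range(8)}
room, members = cluster_sample(homerooms, seed=7)

print(len(strat), "students in the stratified sample")
print(room, "selected with", len(members), "students")
```

The stratified sample is guaranteed to include every class year in proportion, while the cluster sample is cheaper to collect but depends on the chosen homeroom resembling the whole school.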
20. 20. Example: Student Opinion Poll <ul><li>Self-selecting </li></ul><ul><li>Random sampling </li></ul><ul><li>Systematic sampling </li></ul><ul><li>Convenience sampling </li></ul><ul><li>Stratified sampling </li></ul><ul><li>Cluster sample </li></ul>
21. 21. Example: Crashworthiness <ul><li>Self-selecting </li></ul><ul><li>Random sample </li></ul><ul><li>Systematic sampling </li></ul><ul><li>Convenience sampling </li></ul><ul><li>Stratified sampling </li></ul><ul><li>Cluster sample </li></ul>
22. 22. Bias <ul><li>Sampling members of a population… </li></ul><ul><ul><li>With a specific characteristic </li></ul></ul><ul><ul><li>That will give a specific outcome </li></ul></ul><ul><ul><li>“Rigging the game” </li></ul></ul><ul><li>Selection and undercoverage bias </li></ul><ul><ul><li>E.g., FOX news and health care </li></ul></ul><ul><li>Non-response bias </li></ul><ul><ul><li>Counting non-response as one answer </li></ul></ul><ul><li>Voluntary response bias </li></ul><ul><ul><li>Only people who feel strongly might respond to a survey. </li></ul></ul>
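Voluntary response bias can be illustrated with a small simulation. Everything here is made up for the sketch: the opinion-strength scale, the population size, and especially the response rates, which simply assume that people who feel strongly are more likely to reply.

```python
import random

rng = random.Random(5)

# Hypothetical population: opinion strength from 0 (indifferent) to 2 (feels strongly).
population = [rng.choice([0, 1, 2]) for _ in range(10000)]

# Assumed response rates (invented for illustration): stronger feelings, more replies.
response_rate = {0: 0.05, 1: 0.20, 2: 0.80}
respondents = [s for s in population if rng.random() < response_rate[s]]

def share_strong(group):
    """Fraction of a group that feels strongly (strength 2)."""
    return sum(1 for s in group if s == 2) / len(group)

print(f"feel strongly, whole population: {share_strong(population):.0%}")
print(f"feel strongly, survey responses: {share_strong(respondents):.0%}")
```

Under these assumptions, strong opinions make up about a third of the population but a clear majority of the responses, so the survey overstates how strongly people feel.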
23. 23. More on Bias <ul><li>If I want my test to support the claim that watching too much TV hurts SAT scores, how do I rig the sample? </li></ul><ul><li>If I want my test to support the claim that European cars are safer than Japanese cars, how do I rig the sample? </li></ul>