SlideShare a Scribd company logo
1 of 12
+ 
Data Collection Principles: Overview 
Slides edited by Valerio Di Fonzo for www.globalpolis.org 
Based on the work of Mine Çetinkaya-Rundel of OpenIntro 
The slides may be copied, edited, and/or shared via the CC BY-SA license 
Some images may be included under fair use guidelines (educational purposes)
Populations and Samples 
http://well.blogs.nytimes.com/2012/08/29/finding-your- 
ideal-running-form 
Research Question 
Can people become better, more efficient 
runners on their own, merely by running? 
Population of Interest 
All people 
Sample 
Group of adult women who recently joined a 
running group 
Population to which results can 
be generalized 
Adult women, if the data are randomly 
sampled
Anecdotal evidence and early smoking research 
● Anti-smoking research started in the 1930s and 1940s when 
cigarette smoking became increasingly popular. While some 
smokers seemed to be sensitive to cigarette smoke, others were 
completely unaffected. 
● Anti-smoking research was faced with resistance based on 
anecdotal evidence such as "My uncle smokes three packs a day 
and he's in perfectly good health", evidence based on a limited 
sample size that might not be representative of the population. 
● It was concluded that "smoking is a complex human behavior, by its 
nature difficult to study, confounded by human variability." 
● In time researchers were able to examine larger samples of cases 
(smokers), and trends showing that smoking has negative health 
impacts became much clearer. 
Brandt, The Cigarette Century (2009), Basic Books.
Census 
● Wouldn't it be better to just include everyone and "sample" the 
entire population? 
o This is called a census. 
● There are problems with taking a census: 
o It can be difficult to complete a census: there always seem to 
be some individuals who are hard to locate or hard to measure. 
And these difficult-to-find people may have certain 
characteristics that distinguish them from the rest of the 
population. 
o Populations rarely stand still. Even if you could take a census, 
the population changes constantly, so it's never possible to get 
a perfect measure. 
o Taking a census may be more complex than sampling.
http://www.npr.org/templates/story/story.php?storyId=125380052
Exploratory analysis to inference 
● Sampling is natural. 
● Think about sampling something you are cooking - you taste 
(examine) a small part of what you're cooking to get an idea about the 
dish as a whole. 
● When you taste a spoonful of soup and decide the spoonful you tasted 
isn't salty enough, that's exploratory analysis. 
● If you generalize and conclude that your entire soup needs salt, that's 
an inference. 
● For your inference to be valid, the spoonful you tasted (the sample) 
needs to be representative of the entire pot (the population). 
o If your spoonful comes only from the surface and the salt is 
collected at the bottom of the pot, what you tasted is probably not 
representative of the whole pot. 
o If you first stir the soup thoroughly before you taste, your spoonful 
will more likely be representative of the whole pot.
Sampling bias 
Non-response: If only a small fraction of the randomly sampled people 
choose to respond to a survey, the sample may no longer be 
representative of the population. 
Voluntary response: Occurs when the sample consists of people who 
volunteer to respond because they have strong opinions on the issue. 
Such a sample will also not be representative of the population. 
Convenience sample: Individuals who are easily accessible are more likely 
to be included in the sample.
Sampling bias example: Landon vs. FDR 
A historical example of a biased sample yielding misleading results 
In 1936, Landon sought the 
Republican presidential 
nomination opposing the re-election 
of FDR. 
● The Literary Digest polled about 10 million 
Americans, and got responses from about 2.4 
million. 
● The poll showed that Landon would likely be the 
overwhelming winner and FDR would get only 
43% of the votes. 
● Election result: FDR won, with 62% of the votes. 
● The magazine was completely discredited 
because of the poll, and was soon discontinued.
The Literary Digest Poll - what went 
wrong? 
● The magazine had surveyed 
o its own readers, 
o registered automobile owners, and 
o registered telephone users. 
● These groups had incomes well above the national average 
of the day (remember, this is Great Depression era) which 
resulted in lists of voters far more likely to support 
Republicans than a truly typical voter of the time, i.e. the 
sample was not representative of the American population at 
the time.
Large samples are preferable, but... 
● The Literary Digest election poll was based on a sample 
size of 2.4 million, which is huge, but since the sample was 
biased, the sample did not yield an accurate prediction. 
● Back to the soup analogy: If the soup is not well stirred, it 
doesn't matter how large a spoon you have, it will still not 
taste right. If the soup is well stirred, a small spoon will 
suffice to test the soup.
Explanatory and Response Variables 
● To identify the explanatory variable in a pair of variables, 
identify which of the two is suspected of affecting the other: 
● Labeling variables as explanatory and response does not 
guarantee the relationship between the two is actually 
causal, even if there is an association identified between the 
two variables. We use these labels only to keep track of 
which variable we suspect affects the other.
Explanatory and Response Variables 
Observational study: Researchers collect data in a way that does not directly 
interfere with how the data arise, i.e. they merely "observe", and can only 
establish an association between the explanatory and response variables. 
Experiment: Researchers randomly assign subjects to various treatments in 
order to establish causal connections between the explanatory and 
response variables. 
If you're going to walk away with one thing from this class, let it be "correlation 
does not imply causation". 
http://xkcd.com/552/

More Related Content

Similar to Data Collection and Sampling

1.4 How not to do Statistics
1.4 How not to do Statistics1.4 How not to do Statistics
1.4 How not to do StatisticsMaryWall14
 
Survey (Primer on Questions, Sampling + Case Study)
Survey (Primer on Questions, Sampling + Case Study)Survey (Primer on Questions, Sampling + Case Study)
Survey (Primer on Questions, Sampling + Case Study)Dada Veloso-Beltran
 
Research and questionnaire analysis pro forma
Research and questionnaire analysis pro formaResearch and questionnaire analysis pro forma
Research and questionnaire analysis pro formaHannahMizen
 
Research and Questionnaire Analysis
Research and Questionnaire Analysis Research and Questionnaire Analysis
Research and Questionnaire Analysis cloestead
 
Health powerpoint
Health powerpointHealth powerpoint
Health powerpoint415554
 
Designing Samples
Designing SamplesDesigning Samples
Designing Samplesbiancaj5
 
Longitudinal Study.pptx
Longitudinal Study.pptxLongitudinal Study.pptx
Longitudinal Study.pptxJulielynMAmano
 
Lesson 2 conformity
Lesson 2   conformityLesson 2   conformity
Lesson 2 conformitygbaptie
 
Fraud and Why Studies are Flawed: Should Journalists Trust Peer Review?
Fraud and Why Studies are Flawed: Should Journalists Trust Peer Review? Fraud and Why Studies are Flawed: Should Journalists Trust Peer Review?
Fraud and Why Studies are Flawed: Should Journalists Trust Peer Review? Ivan Oransky
 
Being Primer on Sampling
Being Primer on Sampling Being Primer on Sampling
Being Primer on Sampling Peanut Labs
 
201.04 Sociological Research Methods(1).ppt
201.04 Sociological Research Methods(1).ppt201.04 Sociological Research Methods(1).ppt
201.04 Sociological Research Methods(1).pptwelduweldegebriel1
 

Similar to Data Collection and Sampling (20)

1.4 How not to do Statistics
1.4 How not to do Statistics1.4 How not to do Statistics
1.4 How not to do Statistics
 
Survey (Primer on Questions, Sampling + Case Study)
Survey (Primer on Questions, Sampling + Case Study)Survey (Primer on Questions, Sampling + Case Study)
Survey (Primer on Questions, Sampling + Case Study)
 
Research and questionnaire analysis pro forma
Research and questionnaire analysis pro formaResearch and questionnaire analysis pro forma
Research and questionnaire analysis pro forma
 
O Behave! Issue 26
O Behave! Issue 26O Behave! Issue 26
O Behave! Issue 26
 
Research and Questionnaire Analysis
Research and Questionnaire Analysis Research and Questionnaire Analysis
Research and Questionnaire Analysis
 
SociologyExchange.co.uk Shared Resource
SociologyExchange.co.uk Shared ResourceSociologyExchange.co.uk Shared Resource
SociologyExchange.co.uk Shared Resource
 
Online survey
Online surveyOnline survey
Online survey
 
O Behave! Issue 18
O Behave! Issue 18O Behave! Issue 18
O Behave! Issue 18
 
Health powerpoint
Health powerpointHealth powerpoint
Health powerpoint
 
Designing Samples
Designing SamplesDesigning Samples
Designing Samples
 
Henry jonah
Henry jonahHenry jonah
Henry jonah
 
O Behave! Issue 23
O Behave! Issue 23O Behave! Issue 23
O Behave! Issue 23
 
Longitudinal Study.pptx
Longitudinal Study.pptxLongitudinal Study.pptx
Longitudinal Study.pptx
 
Henry jonah
Henry jonahHenry jonah
Henry jonah
 
Research
ResearchResearch
Research
 
Lesson 2 conformity
Lesson 2   conformityLesson 2   conformity
Lesson 2 conformity
 
Fraud and Why Studies are Flawed: Should Journalists Trust Peer Review?
Fraud and Why Studies are Flawed: Should Journalists Trust Peer Review? Fraud and Why Studies are Flawed: Should Journalists Trust Peer Review?
Fraud and Why Studies are Flawed: Should Journalists Trust Peer Review?
 
Final Draft
Final DraftFinal Draft
Final Draft
 
Being Primer on Sampling
Being Primer on Sampling Being Primer on Sampling
Being Primer on Sampling
 
201.04 Sociological Research Methods(1).ppt
201.04 Sociological Research Methods(1).ppt201.04 Sociological Research Methods(1).ppt
201.04 Sociological Research Methods(1).ppt
 

More from Global Polis

Normal distribution
Normal distributionNormal distribution
Normal distributionGlobal Polis
 
Bayeasian inference
Bayeasian inferenceBayeasian inference
Bayeasian inferenceGlobal Polis
 
Conditional probability, and probability trees
Conditional probability, and probability treesConditional probability, and probability trees
Conditional probability, and probability treesGlobal Polis
 
Introduction to probability
Introduction to probabilityIntroduction to probability
Introduction to probabilityGlobal Polis
 
Experimental Design and Blocking
Experimental Design and BlockingExperimental Design and Blocking
Experimental Design and BlockingGlobal Polis
 
Gender Discrimination: A case study
Gender Discrimination: A case studyGender Discrimination: A case study
Gender Discrimination: A case studyGlobal Polis
 

More from Global Polis (7)

Normal distribution
Normal distributionNormal distribution
Normal distribution
 
Bayeasian inference
Bayeasian inferenceBayeasian inference
Bayeasian inference
 
Conditional probability, and probability trees
Conditional probability, and probability treesConditional probability, and probability trees
Conditional probability, and probability trees
 
Introduction to probability
Introduction to probabilityIntroduction to probability
Introduction to probability
 
Experimental Design and Blocking
Experimental Design and BlockingExperimental Design and Blocking
Experimental Design and Blocking
 
Categorical Data
Categorical DataCategorical Data
Categorical Data
 
Gender Discrimination: A case study
Gender Discrimination: A case studyGender Discrimination: A case study
Gender Discrimination: A case study
 

Recently uploaded

Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxDigi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxTanveerAhmed817946
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service LucknowAminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknowmakika9823
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 

Recently uploaded (20)

Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxDigi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptx
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service LucknowAminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 

Data Collection and Sampling

  • 1. + Data Collection Principles: Overview Slides edited by Valerio Di Fonzo for www.globalpolis.org Based on the work of Mine Çetinkaya-Rundel of OpenIntro The slides may be copied, edited, and/or shared via the CC BY-SA license Some images may be included under fair use guidelines (educational purposes)
  • 2. Populations and Samples http://well.blogs.nytimes.com/2012/08/29/finding-your- ideal-running-form Research Question Can people become better, more efficient runners on their own, merely by running? Population of Interest All people Sample Group of adult women who recently joined a running group Population to which results can be generalized Adult women, if the data are randomly sampled
  • 3. Anecdotal evidence and early smoking research ● Anti-smoking research started in the 1930s and 1940s when cigarette smoking became increasingly popular. While some smokers seemed to be sensitive to cigarette smoke, others were completely unaffected. ● Anti-smoking research was faced with resistance based on anecdotal evidence such as "My uncle smokes three packs a day and he's in perfectly good health", evidence based on a limited sample size that might not be representative of the population. ● It was concluded that "smoking is a complex human behavior, by its nature difficult to study, confounded by human variability." ● In time researchers were able to examine larger samples of cases (smokers), and trends showing that smoking has negative health impacts became much clearer. Brandt, The Cigarette Century (2009), Basic Books.
  • 4. Census ● Wouldn't it be better to just include everyone and "sample" the entire population? o This is called a census. ● There are problems with taking a census: o It can be difficult to complete a census: there always seem to be some individuals who are hard to locate or hard to measure. And these difficult-to-find people may have certain characteristics that distinguish them from the rest of the population. o Populations rarely stand still. Even if you could take a census, the population changes constantly, so it's never possible to get a perfect measure. o Taking a census may be more complex than sampling.
  • 6. Exploratory analysis to inference ● Sampling is natural. ● Think about sampling something you are cooking - you taste (examine) a small part of what you're cooking to get an idea about the dish as a whole. ● When you taste a spoonful of soup and decide the spoonful you tasted isn't salty enough, that's exploratory analysis. ● If you generalize and conclude that your entire soup needs salt, that's an inference. ● For your inference to be valid, the spoonful you tasted (the sample) needs to be representative of the entire pot (the population). o If your spoonful comes only from the surface and the salt is collected at the bottom of the pot, what you tasted is probably not representative of the whole pot. o If you first stir the soup thoroughly before you taste, your spoonful will more likely be representative of the whole pot.
  • 7. Sampling bias Non-response: If only a small fraction of the randomly sampled people choose to respond to a survey, the sample may no longer be representative of the population. Voluntary response: Occurs when the sample consists of people who volunteer to respond because they have strong opinions on the issue. Such a sample will also not be representative of the population. Convenience sample: Individuals who are easily accessible are more likely to be included in the sample.
  • 8. Sampling bias example: Landon vs. FDR A historical example of a biased sample yielding misleading results In 1936, Landon sought the Republican presidential nomination opposing the re-election of FDR. ● The Literary Digest polled about 10 million Americans, and got responses from about 2.4 million. ● The poll showed that Landon would likely be the overwhelming winner and FDR would get only 43% of the votes. ● Election result: FDR won, with 62% of the votes. ● The magazine was completely discredited because of the poll, and was soon discontinued.
  • 9. The Literary Digest Poll - what went wrong? ● The magazine had surveyed o its own readers, o registered automobile owners, and o registered telephone users. ● These groups had incomes well above the national average of the day (remember, this is Great Depression era) which resulted in lists of voters far more likely to support Republicans than a truly typical voter of the time, i.e. the sample was not representative of the American population at the time.
  • 10. Large samples are preferable, but... ● The Literary Digest election poll was based on a sample size of 2.4 million, which is huge, but since the sample was biased, the sample did not yield an accurate prediction. ● Back to the soup analogy: If the soup is not well stirred, it doesn't matter how large a spoon you have, it will still not taste right. If the soup is well stirred, a small spoon will suffice to test the soup.
  • 11. Explanatory and Response Variables ● To identify the explanatory variable in a pair of variables, identify which of the two is suspected of affecting the other: ● Labeling variables as explanatory and response does not guarantee the relationship between the two is actually causal, even if there is an association identified between the two variables. We use these labels only to keep track of which variable we suspect affects the other.
  • 12. Explanatory and Response Variables Observational study: Researchers collect data in a way that does not directly interfere with how the data arise, i.e. they merely "observe", and can only establish an association between the explanatory and response variables. Experiment: Researchers randomly assign subjects to various treatments in order to establish causal connections between the explanatory and response variables. If you're going to walk away with one thing from this class, let it be "correlation does not imply causation". http://xkcd.com/552/