1.4 How not to do Statistics
1. Section 1.4: Examples, p. 1
• Who funded the study?
Researchers may have an incentive to produce favorable results
– In the 1960s, tobacco companies funded studies claiming the connection
between smoking and lung cancer was inconclusive.
– When soft drink companies fund studies on the effects of sugar, the results may
be unreliable.
– Surveys from well-respected organizations are more reliable
• E.g.: Pew Research, Gallup, J.D. Power, and universities
• Were the questions poorly worded?
– The wording of the questions can cause hidden bias – where the way a
question is asked influences a person’s response.
– Definitely biased: “Do you oppose street repair taxes by our wasteful city
government?”
– Somewhat biased: “Do you oppose street repair taxes?”
– Better: “Do you favor or oppose taxes for street repair?”
2. Section 1.4: Examples, p. 2
• How was the sample obtained?
– Good: Random, stratified, systematic, cluster, and matched pairs samples
– Bad: Voluntary response – Social media and other online polls. People with
strong opinions are more likely to respond.
• Examples: sports, entertainment, politics
– Bad: Convenience – Polls of family/friends/co-workers. They may have much in
common.
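The sampling designs listed above can be sketched with Python's standard library. The population, strata, and sample sizes here are invented for illustration (cluster and matched pairs designs are omitted for brevity):

```python
# Sketch of three good sampling designs, using only the standard library.
# The population of 1000 IDs and the two strata are hypothetical.
import random

random.seed(1)
population = list(range(1000))  # hypothetical population of 1000 IDs

# Simple random sample: every individual is equally likely to be chosen.
srs = random.sample(population, 50)

# Stratified sample: divide the population into groups (strata),
# then take a random sample within each group.
strata = [population[:500], population[500:]]
stratified = [x for s in strata for x in random.sample(s, 25)]

# Systematic sample: pick a random start, then every k-th individual.
k = len(population) // 50
start = random.randrange(k)
systematic = population[start::k]

print(len(srs), len(stratified), len(systematic))  # 50 50 50
```

A voluntary-response or convenience sample, by contrast, would let the individuals (or the researcher's social circle) decide who ends up in the sample.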
• How large is the study?
– National surveys of good quality usually have at least 1000 respondents.
– Local surveys should have at least 100.
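One rationale for these sample sizes is the standard margin-of-error formula for a sample proportion, 1.96·√(p(1−p)/n) at 95% confidence, using the worst case p = 0.5. The specific code below is illustrative:

```python
# Approximate 95% margin of error for a sample proportion:
# 1.96 * sqrt(p * (1 - p) / n), with worst case p = 0.5.
import math

def margin_of_error(n, p=0.5):
    return 1.96 * math.sqrt(p * (1 - p) / n)

print(round(margin_of_error(1000), 3))  # 0.031 -> roughly 3 percentage points
print(round(margin_of_error(100), 3))   # 0.098 -> roughly 10 percentage points
```

So a well-run national survey of about 1000 people is accurate to within roughly 3 percentage points, and a local survey of 100 to within roughly 10.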
• Non-response
– Those who did not respond may be significantly different than those who did.
3. Section 1.4: Examples, p. 3
• Is there likely or possible bias in the study?
– Statistical bias is different from the kind of bias discussed in civil rights,
politics, or sociology. It is generally not intentional, but may result from
not designing the study carefully enough.
– Examples: The sample has different proportions of ethnic or economic groups or
genders
– Randomization is one way to minimize bias.
• Response Bias: Occurs when the responses given are not accurate
– Misunderstanding the question or ignorance about the issue
• Example: Was the recent tax cut the right thing to do?
– Respondents may not know the details or the impacts on business and the national debt.
– False or misrepresented answers
– The person may not want to tell the truth
– Are you in a gang?
• Their self-assessment (ego) is inaccurate
– Are you a much better than average driver?
4. Section 1.4: Examples, p. 4
• Could there be possible cause and effect confusion?
– Correlation/association between 2 variables does not prove that one
caused the other
– Is the popularity of opera in a particular country related to whether or
not the country had a dictator? (Factoid: some great operas were written
during the times of the Tsars in Russia.) Does one affect the other?
– Suppose a survey finds that people with dogs are happier than others, on
average. Does that mean that dogs tend to make people happy? Or does
it mean that people who are already happy tend to get dogs? Without
further information, we cannot answer these questions.
• A third factor may cause both A and B
– Both sunscreen use and heat exhaustion increase in the summer
• But sunscreen use does not cause heat exhaustion, and heat exhaustion does
not cause sunscreen use. The 3rd factor is the temperature in the summer.
• Cause and effect can be determined by a well-designed experiment.
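The sunscreen example can be simulated: in the sketch below, temperature (the third factor) drives both variables, which makes them strongly correlated even though neither causes the other. All numbers are invented for illustration:

```python
# A third factor (temperature) drives both sunscreen use and heat
# exhaustion, producing a correlation between the two even though
# neither causes the other. All numbers are made up for illustration.
import random

random.seed(0)
temps = [random.uniform(10, 35) for _ in range(500)]  # daily temperature (C)
sunscreen = [t + random.gauss(0, 3) for t in temps]   # rises with heat
exhaustion = [t + random.gauss(0, 3) for t in temps]  # also rises with heat

def corr(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

print(round(corr(sunscreen, exhaustion), 2))  # strongly positive
```

The two simulated variables are strongly correlated, yet by construction each one depends only on temperature, not on the other.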
5. Section 1.4: Examples, p. 5
• “Lurking” or “Confounding” variables
– When you cannot rule out the possibility that the observed effect is due
to some other variable rather than the factor being studied.
– Example: Historical trends in car prices and food prices rise together, but
both reflect overall inflation rather than one causing the other.
• Overgeneralization
– Doing a study on one group and then claiming that the results apply to
all groups.
• A study of women may not apply to men or children or babies.
• A study of one ethnic group or of one nation may not apply to others.
• A study of healthy people may not apply to sick people.
6. Other issues
• Sampling error – This is the difference between the sample results and the
true population results.
• It is unavoidable, but other kinds of errors can be avoided.
• A new sample with different individuals would be different.
• This can be minimized by having a large sample size.
• The P-value includes the effect of sampling error.
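The effect of sample size on sampling error can be shown by simulation. In the sketch below, a population with a known 60% "yes" rate is sampled repeatedly at several sizes; the 60% rate and the sizes are invented for illustration:

```python
# Sampling error shrinks as sample size grows: draw repeated samples
# from a population with a known 60% "yes" rate and measure how far
# the sample estimates stray from the truth, on average.
import random

random.seed(42)
TRUE_P = 0.60

def sample_estimate(n):
    """Proportion of 'yes' in one simulated sample of size n."""
    return sum(random.random() < TRUE_P for _ in range(n)) / n

avg_error = {}
for n in (100, 1000, 10000):
    errors = [abs(sample_estimate(n) - TRUE_P) for _ in range(200)]
    avg_error[n] = sum(errors) / len(errors)
    print(n, round(avg_error[n], 4))
```

The average error falls as n grows: a new sample with different individuals still gives a different answer, but the answers cluster more tightly around the truth.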
7. Example of a biased sample:
1936 US Presidential Election Literary Digest Poll
• Predicted: Alfred Landon would win 57% of the vote
• Actual result: Alfred Landon won 37% to 61% for Roosevelt
– Wrong by 20 percentage points
• The polling techniques were not good.
– Sent out 10 million ballots; 2.4 million returned
– Surveyed: Its readers, car owners, random telephone numbers
– All these groups were high income
– Also, the non-response rate was high. Those who didn’t respond
were different from those who did respond, on average
[Photos: A. Landon, F. D. Roosevelt]
https://en.wikipedia.org/wiki/1936_United_States_presidential_election
https://en.wikipedia.org/wiki/The_Literary_Digest
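The Literary Digest failure can be mimicked in a simulation: a huge sample drawn from a frame that over-represents high-income voters lands far from the truth, while a much smaller random sample comes close. The income split and group-level support numbers below are invented; only the overall 61% Roosevelt share comes from the slide:

```python
# A huge sample from a biased frame vs. a small random sample.
# Hypothetical population: 61% support Roosevelt overall, but
# high-income voters favor Landon. All group numbers are invented.
import random

random.seed(7)
population = (
    [("high", "Landon")] * 20 + [("high", "Roosevelt")] * 10 +
    [("low", "Landon")] * 19 + [("low", "Roosevelt")] * 51
) * 1000  # 100,000 voters, 61% Roosevelt overall

# Biased frame: over-represents high-income voters (like magazine
# readers, car owners, and telephone owners in 1936).
frame = [v for v in population if v[0] == "high"] * 3 + population

biased_sample = random.sample(frame, 10000)  # big, but biased
random_sample = random.sample(population, 1000)  # small, but random

def roosevelt_share(sample):
    return sum(v[1] == "Roosevelt" for v in sample) / len(sample)

print(round(roosevelt_share(biased_sample), 2))  # well below 0.61
print(round(roosevelt_share(random_sample), 2))  # close to 0.61
```

A sample of 10,000 from the wrong frame is far less accurate than a sample of 1,000 drawn at random: sample size cannot fix a biased sampling method.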
End of Section