Frontiers of Computational Journalism week 7 - Randomness and Statistical Significance



  1. 1. Frontiers of Computational Journalism Columbia Journalism School Week 7: Randomness and Spooky Significance October 31, 2018
  2. 2. This class • Randomness • Significance testing in Journalism • $%#$! P-Values • Bayesian inference • The Garden of Forking Paths • Analysis of Competing Hypotheses
  3. 3. Randomness
  4. 4. Margin of Error
  5. 5. Which one is random?
  6. 6. One star per box – “less” random
  7. 7. Two principles of randomness 1. Random data has “patterns” in it way more often than you think. 2. This problem gets much more extreme when you have less data.
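The second principle on slide 7 can be checked with a small simulation. The sketch below is an illustration, not part of the original deck: it rolls a fair die and reports how far the most lopsided face strays from the fair frequency of 1/6. The function name `max_deviation`, the sample sizes, and the seed are assumptions chosen for the example.

```python
import random
from collections import Counter

def max_deviation(n_rolls, rng):
    """Roll a fair six-sided die n_rolls times and return the largest gap
    between any face's observed frequency and the fair value 1/6."""
    counts = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    return max(abs(counts[face] / n_rolls - 1 / 6) for face in range(1, 7))

rng = random.Random(7)  # arbitrary seed, only so the run is repeatable
for n in (30, 300, 30_000):
    # Small samples from a perfectly fair die typically look "loaded".
    print(n, round(max_deviation(n, rng), 3))
```

With only a few dozen rolls the gap is usually large enough to look like a pattern; it shrinks as the number of rolls grows.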
  8. 8. Is this die loaded?
  9. 9. How about this one?
  10. 10. Is this one loaded?
  11. 11. Two dice: non-uniform distribution
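Why the sum of two fair dice is not uniform can be shown by enumerating all 36 equally likely outcomes. This is a minimal sketch; the function name `two_dice_distribution` is an assumption for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def two_dice_distribution():
    """Exact probability of each total when rolling two fair six-sided dice."""
    counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
    return {total: Fraction(n, 36) for total, n in sorted(counts.items())}

for total, prob in two_dice_distribution().items():
    # More combinations produce 7 than 2 or 12, so the middle totals dominate.
    print(total, prob)
```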
  12. 12. Is something causing cancer? Cancer rate per county. Darker = greater incidence of cancer. From Graphical Inference for Infovis, Wickham et al.
  13. 13. Which of these is real data?
  14. 14. Global temperature record How likely is it that the temperature won't increase over next decade?
  15. 15. From The Signal and the Noise, Nate Silver
  16. 16. It is conceivable that the 14 elderly people who are reported to have died soon after receiving the vaccination died of other causes. Government officials in charge of the program claim that it is all a coincidence, and point out that old people drop dead every day. The American people have even become familiar with a new statistic: Among every 100,000 people 65 to 75 years old, there will be nine or ten deaths in every 24-hour period under most normal circumstances. Even using the official statistic, it is disconcerting that three elderly people in one clinic in Pittsburgh, all vaccinated within the same hour, should die within a few hours thereafter. This tragedy could occur by chance, but the fact remains that it is extremely improbable that such a group of deaths should take place in such a peculiar cluster by pure coincidence. - New York Times editorial, 14 October 1976
  17. 17. Assuming that about 40 percent of elderly Americans were vaccinated within the first 11 days of the program, then about 9 million people aged 65 and older would have received the vaccine in early October 1976. Assuming that there were 5,000 clinics nationwide, this would have been 164 vaccinations per clinic per day. A person aged 65 or older has about a 1-in-7,000 chance of dying on any particular day; the odds of at least three such people dying on the same day from among a group of 164 patients are indeed very long, about 480,000 to one against. However, under our assumptions, there were 55,000 opportunities for this “extremely improbable” event to occur—5,000 clinics, multiplied by 11 days. The odds of this coincidence occurring somewhere in America, therefore, were much shorter—only about 8 to 1 - Nate Silver, The Signal and the Noise, Ch. 7 footnote 20
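The arithmetic in the Silver footnote on slide 17 can be reproduced approximately. The sketch below assumes each patient dies independently with probability 1/7,000 per day (a binomial model) and that the 55,000 clinic-days are independent; these modelling choices and all variable names are assumptions made for illustration, not taken from the book.

```python
from math import comb

def prob_at_least(k_min, n, p):
    """Binomial tail: probability of k_min or more events among n independent trials."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))

P_DEATH = 1 / 7000          # daily chance of death for someone 65 or older (from the slide)
PATIENTS = 164              # vaccinations per clinic per day (from the slide)
OPPORTUNITIES = 5000 * 11   # clinics times days (from the slide)

# Chance that one particular clinic sees three or more deaths on one particular day.
p_cluster = prob_at_least(3, PATIENTS, P_DEATH)
# Chance that this happens at least once across all clinic-days.
p_somewhere = 1 - (1 - p_cluster) ** OPPORTUNITIES

odds_against_one = (1 - p_cluster) / p_cluster
odds_against_any = (1 - p_somewhere) / p_somewhere
print(f"one clinic-day: about {odds_against_one:,.0f} to 1 against")
print(f"anywhere in the program: about {odds_against_any:.1f} to 1 against")
```

Under these assumptions the results should land close to the quoted figures of about 480,000 to 1 and about 8 to 1. An event that is very unlikely in any one place becomes fairly ordinary once there are tens of thousands of chances for it to happen.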
  18. 18. The Howland Will Trial
  19. 19. Significance Testing in Journalism
  20. 20. Randomization to detect insider trading
  21. 21. Looking at executives' trading in the week before their companies made news, the Journal found that one of every 33 who dipped in and out posted average returns of more than 20% (or avoided 20% downturns) in the following week. By contrast, only one in 117 executives who traded in an annual pattern did that well. Executives’ Good Luck in Trading Own Stock, Wall Street Journal, 2012
  22. 22. Randomization to detect tennis fixing Why look at betting data? Well, the main point of fixing a match is to make money off the betting. In a normal match, some people bet that one player will win and some people bet on the other, based on the odds that bookmakers have set. But if huge bets start pouring in on one side, that looks very much like a sign that some gamblers think they know more than the bookmaker about how that match is going to go. Perhaps they know one player is going to tank. … To estimate how often they should have been expected to lose, I ran 1 million computer simulations per player. How BuzzFeed News Used Betting Data To Investigate Match-Fixing In Tennis, John Templon, Buzzfeed, 2016
  23. 23. Problems with statistical tests alone “It’s very, very dangerous to make blasé assumptions about a match being dubious because of prematch movements,” Dan Weston, a tennis analyst and trader who writes for the website of the sports book Pinnacle, said in a telephone interview. (Using only data on betting and results to demonstrate fixing has proven problematic in other sports.) “By itself, the analysis of betting data does not prove match-fixing,” Schoofs said in his statement. “That’s why we did not name the players and are declining to comment, and also why our investigation went much wider than the algorithm and was based on a cache of leaked documents, interviews across three continents, and much more.” Why Betting Data Alone Can’t Identify Match Fixers In Tennis, FiveThirtyEight
  24. 24. Detecting campaign finance violations? In late October 2016, Donald Trump’s personal attorney Michael Cohen paid adult star Stormy Daniels $130,000 in order to purchase her silence about an alleged affair a decade earlier. … Sharp-eyed observers have noted that, in late October 2016, the Trump campaign made a series of five large payments to Trump-affiliated entities, totaling $129,999.72. Ultimately, our model suggests that the probability of a set of payments coincidentally coming so close to $130,000 is approximately 0.1%, or one out of one thousand. In other words, about 99.9% of the time, random chance would not produce a set of payments this close to $130,000. Therefore, the probability that the Trump campaign payments were related to the Daniels payoff is very high. Statistical Model Strongly Suggests the Stormy Daniels Payoff Came from the Trump Campaign, Will Stancil
  25. 25. Statistical Model Strongly Suggests the Stormy Daniels Payoff Came from the Trump Campaign, Will Stancil “The simulation confirmed that it is extremely unlikely that, by random chance alone, a set of payments near a specific date would almost equal $130,000.”
  26. 26. $%@*! P-Values
  27. 27. P-value p = Pr(data at least as extreme as what you observed | null hypothesis) What’s it good for? What’s it bad for? From A dirty dozen: twelve p-value misconceptions, S. Goodman
  28. 28. T-test for two groups with different variance. The test statistic is expected to follow a t-distribution under the null hypothesis of equal scores. Is one classroom better than another?
  29. 29. Things that depend on which classroom a student is in Things that don’t depend on which classroom they’re in Reasons for possible differences
  30. 30. Things that depend on which classroom a student is in Things that don’t depend on which classroom they’re in Reasons for possible differences
  31. 31. Break the relationship
  32. 32. observed difference between classes
  33. 33. observed difference between classes 14% of all resamples have a class difference > observed, so p = 0.14
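The "break the relationship" procedure on slides 31 to 33 is a permutation test: shuffle the scores so classroom labels no longer matter, and count how often the shuffled difference is at least as large as the observed one. The sketch below is one way to write it. The scores are invented, the function name is an assumption, and it uses the absolute difference with `>=`, which is a common convention but may differ from the exact comparison used in the deck.

```python
import random
from statistics import mean

def permutation_p_value(class_a, class_b, n_resamples=10_000, seed=0):
    """Fraction of random relabelings whose mean difference is at least
    as large (in absolute value) as the observed difference."""
    rng = random.Random(seed)
    observed = abs(mean(class_a) - mean(class_b))
    pooled = list(class_a) + list(class_b)
    n_a = len(class_a)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # break any link between score and classroom
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    return hits / n_resamples

# Hypothetical test scores for two classrooms (not from the slides).
class_a = [78, 85, 92, 70, 88, 95, 81]
class_b = [72, 80, 69, 90, 75, 77, 68]
print(permutation_p_value(class_a, class_b))
```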
  34. 34. New samples from the data
  35. 35. Bootstrapping: resample with replacement. This gives an excellent approximation of the sampling distribution, even if non-normal. Computing the sampling distribution
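A minimal bootstrap sketch, assuming the statistic of interest is the mean of one classroom's scores. The data, function names, and the simple percentile interval are assumptions for illustration; other interval constructions exist.

```python
import random
from statistics import mean

def bootstrap_means(data, n_resamples=5_000, seed=0):
    """Resample the data with replacement many times; return the sorted resample means."""
    rng = random.Random(seed)
    return sorted(mean(rng.choices(data, k=len(data))) for _ in range(n_resamples))

def percentile_interval(sorted_values, level=0.95):
    """Read a central interval straight off the sorted bootstrap distribution."""
    tail = (1 - level) / 2
    lo = sorted_values[int(tail * len(sorted_values))]
    hi = sorted_values[int((1 - tail) * len(sorted_values)) - 1]
    return lo, hi

scores = [72, 85, 90, 66, 78, 95, 81, 70, 88, 76]  # hypothetical scores
means = bootstrap_means(scores)
print(percentile_interval(means))
```

The spread of `means` approximates the sampling distribution of the mean without assuming the scores are normally distributed.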
  36. 36. A dirty dozen: twelve p-value misconceptions, S. Goodman
  37. 37. A dirty dozen: twelve p-value misconceptions, S. Goodman
  38. 38. Bayesian Inference
  39. 39. Conditional Probability Pr(B|A) = Pr(AB)/Pr(A)
  40. 40. Accident No Accident Blue Yellow
  41. 41. Accident No Accident Blue Yellow P(Accident|Blue) = 0.1
  42. 42. Relative risk as conditional probability N = a+b+c+d N(disease) = a+c N(no disease) = b+d Pr(disease) = (a+c) / (a+b+c+d) Pr(disease|smoker) = a / (a+b) Pr(disease|non-smoker) = c / (c+d) RR = Pr(disease|smoker) / Pr(disease|non-smoker) = [a/(a+b)] / [c/(c+d)]
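The relative-risk formula translates directly into code. In this sketch `a`, `b`, `c`, `d` follow the slide's 2x2 table (smokers with and without disease, then non-smokers with and without disease); the counts in the example are made up.

```python
def relative_risk(a, b, c, d):
    """Relative risk from a 2x2 table.

    a: smokers with disease      b: smokers without disease
    c: non-smokers with disease  d: non-smokers without disease
    """
    risk_smoker = a / (a + b)        # Pr(disease | smoker)
    risk_non_smoker = c / (c + d)    # Pr(disease | non-smoker)
    return risk_smoker / risk_non_smoker

# Hypothetical counts: 30 of 100 smokers and 10 of 100 non-smokers have the disease.
print(relative_risk(30, 70, 10, 90))
```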
  43. 43. Base Rates - Taxi Accidents Imagine you live in a city where 15% of all rides end in an accident, and last year there were - 75 accidents involving yellow cabs - 25 accidents involving blue cabs Which taxi company is more dangerous?
  44. 44. Base rate We know P(accident) = 0.15 P(blue|accident) = 0.25 P(yellow|accident) = 0.75 We do not know the “base rate”: P(yellow) or equivalently N(yellow)
  45. 45. Evidence and Conditional Probability Hypothesis H = Alice has a cold Evidence E = we just saw her cough
  46. 46. Alice is coughing. Does she have a cold? Most people with colds cough P(coughing|cold) = 0.9
  47. 47. P(A|B) ≠ P(B|A) Most people with colds cough P(coughing|cold) = 0.9 but we want P(cold | coughing)
  48. 48. Bayes’ Theorem Tells us how to go from Pr(A|B) to Pr(B|A) Pr(B|A) = Pr(A|B)Pr(B) / Pr(A)
  49. 49. Alice is coughing. Does she have a cold? Prior P(H) = 0.05 (5% of our friends have a cold) Likelihood P(E|H) = 0.9 (most people with colds cough) Base rate P(E) = 0.1 (10% of everyone coughs today) P(H|E) = P(E|H)P(H)/P(E) = 0.9 * 0.05 / 0.1 = 0.45 If you believe your initial probability estimates, you should now believe there's a 45% chance she has a cold.
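The calculation on slide 49 as a small function. The name `posterior` and the consistency check are assumptions added for illustration; the three input numbers come from the slide.

```python
def posterior(prior, likelihood, evidence_rate):
    """Bayes' theorem: P(H|E) = P(E|H) * P(H) / P(E)."""
    joint = likelihood * prior  # P(E and H)
    if joint > evidence_rate:
        # P(E) can never be smaller than P(E and H); the inputs contradict each other.
        raise ValueError("inconsistent inputs: P(E|H) * P(H) exceeds P(E)")
    return joint / evidence_rate

# Alice: 5% of friends have a cold, 90% of people with colds cough, 10% of everyone coughs.
print(posterior(prior=0.05, likelihood=0.9, evidence_rate=0.1))  # about 0.45
```

The check matters in practice: made-up estimates of the three inputs can easily be mutually impossible, which would otherwise produce a "probability" above 1.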
  50. 50. Bayes’ Theorem - Diagnostic tests Suppose I tell you: • 14 of 1000 women under 50 have breast cancer • If a woman has cancer, a mammogram is positive 75% of the time • If a woman does not have cancer, a mammogram is positive 10% of the time If a woman has a positive mammogram, how likely is she to have cancer?
  51. 51. The Signal and the Noise, Nate Silver
  52. 52. cancer no cancer positive negative
  53. 53. cancer no cancer positive negative Pr(positive|cancer) = 0.75 = N(positive & cancer) / N(cancer) N(cancer) = 4 N(positive & cancer) = 3
  54. 54. cancer no cancer positive negative Pr(positive|no cancer) = 0.1 = N(positive & no cancer) / N(no cancer) N(no cancer) = 1000 N(positive & no cancer) = 100
  55. 55. cancer no cancer positive negative Pr(cancer) = 0.014 = N(cancer) / N
  56. 56. Conditional probabilities Pr(positive|cancer) = 75% Pr(positive|no cancer) = 10% What is Pr(cancer|positive)?
  57. 57. cancer no cancer positive negative Pr(cancer|positive) = 9.6%
  58. 58. Bayesian diagnostics Pr(cancer|positive) = Pr(positive|cancer) Pr(cancer) / Pr(positive) Pr(positive|cancer) = 0.75 Pr(cancer) = 0.014 Pr(positive) = Pr(positive|no cancer)Pr(no cancer) + Pr(positive|cancer)Pr(cancer) = 0.10*0.986 + 0.75*0.014 = 0.1091
  59. 59. Bayesian diagnostics Pr(cancer|positive) = Pr(positive|cancer) Pr(cancer) / Pr(positive) = (0.75 * 0.014) / (0.1091) = 0.0962 = 9.6% chance she has cancer if mammogram is positive
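The 9.6% result can also be checked by counting, in the spirit of the grid pictures on slides 52 to 57. The sketch below simulates a large population using the three rates from slide 50 and counts what fraction of positive mammograms belong to women with cancer. The function name, population size, and seed are assumptions for illustration.

```python
import random

def simulate_screening(n_women, p_cancer=0.014, p_pos_given_cancer=0.75,
                       p_pos_given_healthy=0.10, seed=0):
    """Estimate Pr(cancer | positive) by simulating n_women screenings."""
    rng = random.Random(seed)
    positives = 0
    true_positives = 0
    for _ in range(n_women):
        has_cancer = rng.random() < p_cancer
        p_positive = p_pos_given_cancer if has_cancer else p_pos_given_healthy
        if rng.random() < p_positive:
            positives += 1
            true_positives += has_cancer  # True counts as 1
    return true_positives / positives

# Should come out near the analytic answer of 0.096, up to simulation noise.
print(simulate_screening(200_000))
```

Most positives come from the much larger group of women without cancer, which is why the answer is so far below 75%.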
  60. 60. Evidence Information that justifies a belief. Presented with evidence E for X, we should believe X "more." In terms of probability, P(X|E) > P(X)
  61. 61. Bayes “learns” from evidence Pr(H|E) = Pr(E|H) Pr(H) / Pr(E), or equivalently Pr(H|E) = [Pr(E|H) / Pr(E)] * Pr(H). Posterior Pr(H|E): How likely is H given evidence E? Prior Pr(H): How likely was H to begin with? Likelihood Pr(E|H): Probability of seeing E if H is true. Base rate Pr(E): How commonly do we see E at all?
  62. 62. A more complete theory Compare probability of multiple alternatives.
  63. 63. Did the stoplight reduce accidents?
  64. 64. [Figure: nine panels numbered 1 to 9, each with axis ticks 0, 2, 4, 6, 8] Simulated without stoplight
  65. 65. [Figure: nine panels numbered 1 to 9, each with axis ticks 0, 2, 4, 6, 8] Simulated with a 50% effective stoplight
  66. 66. Probability distribution over hypotheses Is the NYPD targeting mosques for stop-and-frisk? [Figure: probability from 0 to 1 assigned to each of H0 (Never), H1 (Once or twice), H2 (Routinely)] *Tricky: you have to imagine a hypothesis before you can assign it a probability.
  67. 67. Parameter Estimation Computing probability for a continuum of hypotheses P(𝛳|E) = Pr(E|𝛳)/Pr(E) * Pr(𝛳)
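One way to make "a continuum of hypotheses" concrete is a grid approximation: treat each candidate value of θ as its own hypothesis and apply Bayes' theorem to all of them at once. The sketch below assumes θ is an unknown proportion, a flat prior, and hypothetical evidence of 7 positives in 10 observations; none of these specifics come from the slides.

```python
def grid_posterior(successes, trials, grid_size=101):
    """Posterior over a proportion theta on an evenly spaced grid, with a flat prior."""
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    prior = [1 / grid_size] * grid_size  # flat prior: an assumption
    # Pr(E | theta); the binomial coefficient is omitted because it cancels on normalizing.
    likelihood = [t**successes * (1 - t) ** (trials - successes) for t in thetas]
    unnormalized = [lik * p for lik, p in zip(likelihood, prior)]
    evidence = sum(unnormalized)  # plays the role of Pr(E)
    return thetas, [u / evidence for u in unnormalized]

thetas, post = grid_posterior(successes=7, trials=10)
best = thetas[post.index(max(post))]
print("most probable theta on the grid:", best)
```

The output is a full distribution over θ, not a single estimate, so it also shows how much probability remains on values far from the peak.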
  68. 68. Relative strength of evidence Can we find a p-value equivalent? There is the “Bayes factor”: Pr(H1|E)/Pr(H2|E) = [Pr(E|H1)Pr(H1)/Pr(E)] / [Pr(E|H2)Pr(H2)/Pr(E)] = [Pr(E|H1)/Pr(E|H2)] * [Pr(H1)/Pr(H2)], where the first ratio, Pr(E|H1)/Pr(E|H2), is the Bayes factor and the second is the prior odds.
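A short sketch of the posterior-odds identity on slide 68. The example reuses Pr(coughing|cold) = 0.9 and the 5% prior from slide 49; the competing "allergies" hypothesis and its numbers are invented purely for illustration.

```python
def posterior_odds(lik_h1, lik_h2, prior_h1, prior_h2):
    """Return (Bayes factor, posterior odds of H1 versus H2)."""
    bayes_factor = lik_h1 / lik_h2    # Pr(E|H1) / Pr(E|H2)
    prior_odds = prior_h1 / prior_h2  # Pr(H1) / Pr(H2)
    return bayes_factor, bayes_factor * prior_odds

# H1: Alice has a cold. H2: Alice has allergies (hypothetical numbers for H2).
bf, odds = posterior_odds(lik_h1=0.9, lik_h2=0.3, prior_h1=0.05, prior_h2=0.10)
print("Bayes factor:", bf)
print("posterior odds, H1 vs H2:", odds)
```

Pr(E) cancels out of the ratio, so comparing two hypotheses does not require knowing how common the evidence is overall.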
  69. 69. Ok, but what’s a “significant” Bayes Factor? From Bayes Factors, Kass and Raftery There’s this, but the whole idea of “significance” is probably flawed.
  70. 70. The Garden of Forking Paths
  71. 71. I Fooled Millions Into Thinking Chocolate Helps Weight Loss. Here's How. John Bohannon
  72. 72. Science Isn’t Broken, FiveThirtyEight
  73. 73. “Statistical significance” is usually asking the wrong question.
  74. 74. Does the model reproduce the data? Testing for Racial Discrimination in Police Searches of Motor Vehicles, Simoiu et al.
  75. 75. Analysis of Competing Hypotheses
  76. 76. Cognitive biases Availability heuristic: we use examples that come to mind, instead of statistics. Preference for earlier information: what we learn first has a much greater effect on our judgment. Memory formation: whatever seems important at the time is what gets remembered. Confirmation bias: we seek out and give greater importance to information that confirms our expectations.
  77. 77. Confirmation bias Comes in many forms. ...unconsciously filtering information that doesn't fit expectations. ...not looking for contrary information. ...not imagining the alternatives.
  78. 78. Method of competing hypotheses Start with multiple hypotheses H0, H1, ... HN (Remember, if you can't imagine it, you can't conclude it!) Go looking for information that gives you the best ability to discriminate between hypotheses. Evidence which supports Hi is much less useful than evidence which supports Hi much more than Hj, if the goal is to choose a hypothesis.
  79. 79. In practice: Triangulation A good conclusion is one which is supported by multiple lines of evidence from multiple methods. “Philosophy ought to imitate the successful sciences in its methods, so far as to proceed only from tangible premises which can be subjected to careful scrutiny, and to trust rather to the multitude and variety of its arguments than to the conclusiveness of any one. Its reasoning should not form a chain which is no stronger than its weakest link, but a cable whose fibers may be ever so slender, provided they are sufficiently numerous and intimately connected.” - Charles Sanders Peirce
  80. 80. A difficult example NYPD performs ~600,000 street stop and frisks per year. What sorts of conclusions could we draw from this data? How?
  81. 81. Stop and Frisk Causation Suppose you take the address of every mosque in NYC, and discover that there are 15% more stop-and-frisks within 100m of mosques than the overall average. Can we conclude that the police are targeting Muslims?