Descriptive and Inferential Statistical Methods: Analysis of Voting and Elections

A presentation highlighting the relevance of statistical methods for the analysis and forecasting of elections:
* Voter Turnout by Income, Age, and Gender, with detailed graphs and explanations
* Polls and Election Forecasting, with an explanation of the 95% confidence interval
* Representativeness and random sampling
* Aggregated election forecast models

Published in: News & Politics

  1. Descriptive and Inferential Statistical Methods: Analysis of Voting and Elections. Toni Menninger, https://www.slideshare.net/amenning/presentations
  2. Descriptive Statistics: Voter Turnout by Income, Age, and Gender
  3. Voting and Elections in Statistics. Source: http://www.demos.org/publication/why-voting-gap-matters
  4. The graph displays voter turnout from 2008 to 2012 by income group, presenting both longitudinal and cross-sectional data. This graph is not a frequency distribution: the variable shown is the conditional probability that a member of an income group cast a vote in a general election. This probability is strongly and positively correlated with income. The gap between presidential and midterm election turnout is also greatest in the lowest income group. Source: http://www.demos.org/publication/why-voting-gap-matters
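To make the distinction concrete, here is a minimal Python sketch contrasting the two quantities. The group names and counts are hypothetical, invented purely for illustration, and are not the Demos data shown on the slide.

```python
# Hypothetical counts per income group: (population, voters). NOT real data.
groups = {
    "low": (400, 200),
    "middle": (400, 260),
    "high": (200, 160),
}

def frequency_distribution(groups):
    """Share of the whole population falling in each group (sums to 1)."""
    total = sum(pop for pop, _ in groups.values())
    return {g: pop / total for g, (pop, _) in groups.items()}

def turnout_rate(groups):
    """Conditional probability P(voted | group) = voters / population of the group."""
    return {g: voters / pop for g, (pop, voters) in groups.items()}

freq = frequency_distribution(groups)
rates = turnout_rate(groups)
for g in groups:
    print(f"{g:>6}: share of population {freq[g]:.0%}, turnout {rates[g]:.0%}")
```

The population shares sum to 1 across groups, while the turnout rates do not: each rate is a proportion computed within its own group.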
  5. [Chart: Distribution of Household Income in the USA; vertical axis: number of households in millions.] Source: US Census Bureau 2012
  6. Source: http://www.demos.org
  7. This time-series (longitudinal) chart shows voter registration rates for three income groups since 1972. The population has been divided into five income groups (quintiles); it is not stated whether income refers to household or individual income. Of the lowest quintile, only about 50% are registered, whereas more than 80% of the highest quintile are registered. As a result, the low-income population is heavily underrepresented among registered voters.
  8. Source: http://www.demos.org/publication/why-voting-gap-matters
  9. This chart again combines longitudinal data (four consecutive presidential elections) with cross-sectional data (age). The data represent the conditional probabilities that a member of a given age group is registered to vote (blue), that a registered member actually voted (orange), and that a member voted (green), the last being the product of the other two. The data indicate a huge gap in voter turnout between the young and older age groups, especially those aged 55-75, who are almost twice as likely to vote. The gap has shrunk recently. Note that age and income are correlated, so the income and age gaps are not independent phenomena. Remember that the 18-30 age group is quite large (see next chart) but underrepresented in elections.
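The green series follows from the other two by the multiplication rule, assuming (as the slide does) that only registered people can vote: P(voted | age) = P(registered | age) × P(voted | registered, age). A minimal sketch with hypothetical rates, not the Census figures from the chart:

```python
# Hypothetical rates for two age groups (NOT the Census figures on the slide).
registered = {"18-29": 0.60, "55-74": 0.80}           # P(registered | age group)
voted_if_registered = {"18-29": 0.80, "55-74": 0.90}  # P(voted | registered, age group)

def turnout(group):
    """P(voted | group) by the multiplication rule; assumes only registered people vote."""
    return registered[group] * voted_if_registered[group]

young, older = turnout("18-29"), turnout("55-74")
print(f"18-29: {young:.2f}, 55-74: {older:.2f}, ratio {older / young:.2f}")
```

Because the two factors multiply, a moderate gap in each (registration and follow-through) compounds into a larger gap in overall turnout.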
  10. [Chart: Population Profile of USA according to 2010 Census; population in millions by five-year age group (under 5 years to 90 years and over), female and male.]
  11. Source: Census
  12. Again, here is a longitudinal view of the age gap. Over the last 50 years, all age groups, but especially the young and middle-aged, have become less involved in elections.
  13. Source: Census
  14. Source: Census
  15. The table compares the relative frequency distribution of the whole population by age group with that of the population of voters. The 18-29 age group represents 21% of the population but only 15% of voters: its voting power is diluted by the age gap in voter turnout. The age groups 45 and older are overrepresented among voters relative to their share of the population. The next chart displays the difference graphically.
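One way to summarize the table is to divide each age group's share of voters by its share of the population; the name representation_index below is a hypothetical label chosen for this sketch, not a term used in the slides. With the slide's figures for the 18-29 group:

```python
def representation_index(share_of_voters, share_of_population):
    """Ratio > 1: over-represented among voters; < 1: under-represented."""
    return share_of_voters / share_of_population

# Figures quoted on the slide for the 18-29 age group.
idx = representation_index(0.15, 0.21)
print(f"18-29 representation index: {idx:.2f}")
```

A group's share of voters divided by its share of the population equals its turnout rate divided by the overall turnout rate, so an index of about 0.71 means that 18-29 year olds turned out at roughly 71% of the overall rate.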
  16. Source: Census
  17. Source: Census. There is also a gender gap in voting behavior: women are more likely to vote than men, except among those 65 years and older.
  18. Source: Census. Further reading:
     * File, Thom. 2014. “Young-Adult Voting: An Analysis of Presidential Elections, 1964–2012.” Current Population Survey Reports, P20-573. U.S. Census Bureau, Washington, DC. http://www.census.gov/prod/2014pubs/p20-573.pdf
     * File, Thom. 2013. “The Diversifying Electorate—Voting Rates by Race and Hispanic Origin in 2012 (and Other Recent Elections).” Current Population Survey Reports, P20-568. U.S. Census Bureau, Washington, DC. http://www.census.gov/prod/2013pubs/p20-568.pdf
     * File, Thom, and Sarah Crissey. 2010. “Voting and Registration in the Election of November 2008.” Current Population Survey Reports, P20-562. U.S. Census Bureau, Washington, DC. http://www.census.gov/prod/2010pubs/p20-562.pdf
  19. Inferential Statistics: Polls and Election Forecasting
  20. Polls and Election Forecasting. An election poll can be considered a binomial experiment if the survey question is binary (“do you support candidate X, yes or no?”). Let p be the “true” proportion of the candidate’s supporters in the population (which, however, can change over time), q = 1 − p, and n the sample size. The number of supporters in the sample (the sample statistic) is then a binomial random variable with mean np and variance npq. Since pq ≤ 1/4, with equality at p = 1/2, the variance is at most n/4 and close to it in competitive races, so the standard deviation is SD ≈ √n / 2. For a large enough sample size, the binomial distribution approaches a normal distribution. The chance is therefore about 95% that the sample statistic will lie within 2 (1.96, to be precise) standard deviations of the mean, i.e. within np ± √n. Dividing everything by the sample size n gives the tolerance interval for the sample proportion: p ± 1/√n. For n = 100, the tolerance interval is p ± 10%; for n = 400, p ± 5%; for n = 1,000, roughly p ± 3%.
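The shortcut 1/√n can be checked against the normal-approximation half-width 1.96·√(pq/n). A minimal Python sketch; the function names are illustrative, not from any library:

```python
import math

def sd_count(n, p):
    """Standard deviation of the binomial count of supporters: sqrt(n p q)."""
    return math.sqrt(n * p * (1 - p))

def margin_exact(n, p, z=1.96):
    """Half-width of the ~95% interval for the sample proportion: z * sqrt(pq / n)."""
    return z * math.sqrt(p * (1 - p) / n)

def margin_rule_of_thumb(n):
    """Slide's shortcut: 2 SDs of the count (2 * sqrt(n)/2) divided by n = 1/sqrt(n).
    Assumes pq = 1/4 and z = 2."""
    return 1 / math.sqrt(n)

for n in (100, 400, 1000):
    print(f"n={n:5d}  exact (p=0.5): ±{margin_exact(n, 0.5):.1%}  "
          f"rule of thumb: ±{margin_rule_of_thumb(n):.1%}")
```

At p = 0.5 the two agree closely: about 9.8% vs 10% for n = 100, 4.9% vs 5% for n = 400, and about 3.1% vs 3.2% for n = 1,000. For p away from 0.5 the exact margin is somewhat smaller, since pq < 1/4.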
  21. Polls and Election Forecasting. Example: p = 0.45, n = 100. Mean = 45, SD ≈ 5. The 95% tolerance interval for the sample statistic is [35, 55], or 45% ± 10%. For n = 400 we have mean = 180, SD ≈ 10, and tolerance interval [160, 200], or 45% ± 5%. That is, even if the candidate’s true level of support is only 45%, a poll of 100 people has a substantial chance (roughly one in four) of showing 49% or more, from which the actual outcome of the election could not be reliably inferred. These examples show that most polls carry considerable uncertainty. It takes a large sample for the sample statistic to give a good estimate of the population parameter (the actual level of support for the candidate), especially in close races. The reliability of a statistical estimate is expressed by its confidence interval, which is not the same as the tolerance interval: the tolerance interval describes where the sample proportion is likely to fall for a known p, whereas the confidence interval is computed from the observed sample and gives a range of plausible values for the unknown p. Poll reports usually state this reliability; e.g. “plus or minus 3 percentage points, 19 times out of 20” means that the 95% confidence interval is ± 3 percentage points.
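The claim about observing 49% or more can be checked with the exact binomial distribution, and a confidence interval can be sketched with the simple Wald formula p̂ ± 1.96·√(p̂(1 − p̂)/n). The Wald interval is one common choice among several interval methods, and the poll result of 540 supporters out of 1,200 respondents is hypothetical.

```python
import math

def binom_tail(n, p, k):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def wald_ci(successes, n, z=1.96):
    """Approximate 95% confidence interval for p from an observed poll (Wald method)."""
    p_hat = successes / n
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

# Slide example: true support 45%. How often does a poll show 49% or more?
for n in (100, 400):
    k = 49 * n // 100  # 49% of n; exact here because n is a multiple of 100
    print(f"n={n}: P(sample share >= 49%) = {binom_tail(n, 0.45, k):.3f}")

# Hypothetical poll: 540 of 1,200 respondents support the candidate.
low, high = wald_ci(540, 1200)
print(f"95% CI for p: [{low:.3f}, {high:.3f}]")
```

The tail probability comes out at roughly 0.24 for n = 100 and roughly 0.06 for n = 400, consistent with the narrower tolerance interval of the larger sample.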
  22. Polls and Election Forecasting. A poll is only valid if it is based on a true random sample, which is difficult to achieve in practice. Refer to book chapters 7.4 and 7.5 for details. Certain population groups are known to be harder to reach and tend to be underrepresented in polls. A difficulty in recent years has been the growing number of people who can be reached only by cell phone. Two recent articles in the New York Times, one related to polling and the other to the estimate of unemployment, highlight these difficulties in obtaining random samples: “Why Polls Tend to Undercount Democrats,” NYT 10/30/2014; “A New Reason to Question the Official Unemployment Rate,” NYT 8/26/2014.
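A small simulation illustrates why a larger sample does not cure a non-random one. It assumes a hypothetical response model in which supporters and non-supporters of a candidate agree to answer with different probabilities (6% vs 9%, invented for illustration), while true support is 50%.

```python
import random

def biased_poll(n, true_support, reach_supporter, reach_other, rng):
    """Simulate a poll in which supporters and non-supporters respond with
    different probabilities (a hypothetical nonresponse model)."""
    yes = no = 0
    while yes + no < n:
        supporter = rng.random() < true_support
        reach = reach_supporter if supporter else reach_other
        if rng.random() < reach:  # only people who respond end up in the sample
            if supporter:
                yes += 1
            else:
                no += 1
    return yes / n

rng = random.Random(42)
for n in (400, 10_000):
    print(n, round(biased_poll(n, 0.50, 0.06, 0.09, rng), 3))
```

Respondents are 0.5·0.06 / (0.5·0.06 + 0.5·0.09) = 40% supporters, so both polls hover around 40% rather than 50%: increasing n shrinks the random error but leaves the bias untouched.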
  23. Polls and Election Forecasting. More sophisticated election forecasts aggregate data from many polls to calculate the probability of each election outcome. Examples are the New York Times election model (methodology explained on the NYT site) and Nate Silver’s ‘FiveThirtyEight’ election forecast model (see next slide). These forecasters have developed models to predict the overall outcome of Senate or presidential elections, which are decided state by state. They estimate the probability of each statewide outcome and then simulate the national election many times using these estimated probabilities. The simulation results are used to estimate the probability distribution of the overall outcome, e.g. the number of Senate seats held by each party. On the NYT election web site you can see this in action: when you click a button, a fortune wheel starts spinning for each Senate race, displaying the estimated probabilities.
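The simulation approach can be sketched as a simple Monte Carlo in Python. The race probabilities and the 47 seats assumed safe for party A are hypothetical, and drawing each race independently is a simplification: real forecast models typically allow polling errors to be correlated across states.

```python
import random
from collections import Counter

# Hypothetical probabilities that party A wins each contested race (not real forecasts).
RACE_PROBS = [0.9, 0.75, 0.6, 0.5, 0.45, 0.3, 0.2]
SAFE_SEATS_A = 47  # hypothetical seats party A holds outside these races

def simulate(race_probs, safe_seats, n_sims, rng):
    """Return the estimated distribution of party A's total Senate seats."""
    tally = Counter()
    for _ in range(n_sims):
        # Each race is an independent weighted coin flip, like one spin of a fortune wheel.
        wins = sum(rng.random() < p for p in race_probs)
        tally[safe_seats + wins] += 1
    return {seats: count / n_sims for seats, count in sorted(tally.items())}

dist = simulate(RACE_PROBS, SAFE_SEATS_A, 100_000, random.Random(2014))
for seats, prob in dist.items():
    print(f"{seats} seats: {prob:.3f}")
p_majority = sum(prob for seats, prob in dist.items() if seats >= 51)
print(f"P(party A holds at least 51 of 100 seats) = {p_majority:.3f}")
```

The expected seat count is simply 47 plus the sum of the race probabilities (50.7 here); the simulation adds the full distribution around that mean, from which the probability of a majority is read off.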
  24. Forecasting: Nate Silver’s famous “FiveThirtyEight” model accurately predicted the 2008 and 2012 presidential elections. For details, see http://fivethirtyeight.com/interactives/senate-forecast/
