Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Making Sense of Statistics in HCI: part 2 - doing it

571 views

Published on

Doing it – if not p then what

http://alandix.com/statistics/course/

In this part we will look at the major kinds of statistical analysis methods:

* Hypothesis testing (the dreaded p!) – robust but confusing
* Confidence intervals – powerful but underused
* Bayesian stats – mathematically clean but fragile

None of these is a magic bullet; all need care and a level of statistical understanding to apply.

We will discuss how these are related including the relationship between ‘likelihood’ in hypothesis testing and conditional probability as used in Bayesian analysis. There are common issues including the need to clearly report numbers and tests/distributions used. avoiding cherry picking, dealing with outliers, non-independent effects and correlated features. However, there are also specific issues for each method.

Classic statistical methods used in hypothesis testing and confidence intervals depend on ideas of ‘worse’ for measures, which are sometimes obvious, sometimes need thought (one vs. two tailed test), and sometimes outright confusing. In addition, care is needed in hypothesis testing to avoid classic fails such as treating non-significant as no-effect and inflated effect sizes.

In Bayesian statistics different problems arise: the need to be able to decide in a robust and defensible manner what are the expected likelihood of different hypothesis before an experiment; and the dangers of common causes leading to inflated probability estimates due to a single initial fluke event or optimistic prior.

Crucially, while all methods have problems that need to be avoided, we will see how not using statistics at all can be far worse.

Published in: Data & Analytics
  • Be the first to comment

  • Be the first to like this

Making Sense of Statistics in HCI: part 2 - doing it

  1. 1. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix Making Sense of Statistics in HCI Part 2 – Doing It if not p then what Alan Dix http://alandix.com/statistics/
  2. 2. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix probing the unknown quantified uncertainty accepting unknowing and common sense.
  3. 3. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix recall … the job of statistics
  4. 4. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix conditionals and likelihood probability road will be busy vs probability busy given 4am on Sunday probability die is a 6 vs prob a 6 given it is 4 or bigger conditional probability is probability given some knowledge
  5. 5. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix likelihood likelihood is probability of measurement given unknown things in the world (conditional probability) e.g. prob. of six heads given coin is fair = 1/64 … but is it fair?
  6. 6. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix counterfactual statistics turns this round from likelihood of measurement probabilistic knowledge given the unknown world to uncertain knowledge of the unknown world based on measurement … but in various different ways
  7. 7. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix statistical reasoning
  8. 8. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix types of statistics • hypothesis testing (the dreaded p!) – robust but confusing • confidence intervals – powerful but underused • Bayesian stats – mathematically clean but fragile – issues of independence
  9. 9. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
  10. 10. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix hypothesis testing the ubiquitous p
  11. 11. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix significance test hypotheses: H1 – what want to show H0 – null hypothesis (to disprove) idea (when experiment/study successful) if H0 were true then observed effect is very unlikely therefore H1 is (likely to be) true
  12. 12. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix 5% significance level? it says: if H0 were true then probability observed effect happening by chance is less than 1 in 20 (5%) so H0 is unlikely to be true and H1 is likely to be true
  13. 13. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix does not say 8 probability of H0 is < 1 in 20 8 probability of H1 is > 0.95 8 effect is large or important i.e. significant in the real sense
  14. 14. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix all it says 4 if H0 were true ... ... effect is unlikely ( prob. < 1 in 20)
  15. 15. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix non-significant result can NEVER reason: non-significant result  H1 false things are the same all you can say is: H1 is not statistically proven
  16. 16. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
  17. 17. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix confidence intervals bounds of uncertainty
  18. 18. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix proving equality non-significant – not proved different real difference may always be smaller than experimental error  can never prove equality but can put bounds on inequality
  19. 19. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix confidence interval bound on true value – same theory as p values e.g. mean of data is 0.3 95% confidence interval is [–0.7,1.3] says if the real value not in the range [–0.7,1.3] probability of seeing observed data is less than 5%
  20. 20. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix counterfactuals 95% confidence interval is [–0.7,1.3] does not say: there is 95% probability that the real mean is in the range [–0.7,1.3] it either is or it isn’t! all it says: probability of seeing the observed data if real value outside the range is less than 5%
  21. 21. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix proven … what? H0: no difference (real mean is zero) experimental result: mean is 0.3 significance test: n.s. at 5% – so what? 95% confidence interval: [–0.7,1.3] ? is 1.3 is an important difference
  22. 22. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix … and don’t forget … you still need to say what test/distribution – e.g. Student’s T how many – degrees of freedom it is still uncertain the real value could be outside the interval
  23. 23. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
  24. 24. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix Bayesian statistics putting a number on it
  25. 25. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix traditional statistics probability of having antennae: if Martian = 1 if human = 0.001 hypotheses: H0: no Martians in the High Street H1: the Martians have landed ! you meet someone with antennae reject null hypothesis p ≤ 0.1% call the Men in Black!
  26. 26. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix Bayesian thinking prior probability of meeting: a Martian = 0.000 001 a human = 0.999 999 probability of having antennae: Martian = 1; human = 0.001 ! you meet someone with antennae posterior probability of being: a Martian ≈ 0.001 a human ≈ 0.999 what you think before evidence
  27. 27. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix Bayesian inference combine prior with likelihood prior distribution posterior
  28. 28. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix Bayesian issues how do you get the prior? sometimes doesn't matter too much! traditional stats rather like uniform prior handling multiple evidence can re-apply iteratively problems with interactions internecine warfare traditionalists and Bayesians often fight ;)
  29. 29. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
  30. 30. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix philosophical differences probing the unknown
  31. 31. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix philosophical stance ? what do you know about the world traditional statistics – assume nothing! – reason from unknowledge Bayesian statistics – 'prior' probabilities – reason from guess-timates
  32. 32. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix uncertain and unknown assumptions: traditional – NO knowledge of real value Bayesian – precise probability distribution in reality … sort of half know traditional – choice of acceptable p Bayesian – ‘spread’ distributions (uniform, Cauchy) ideally should do sensitivity analysis … but no one does 
  33. 33. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
  34. 34. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix issues from “what is worse” to really significant
  35. 35. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix issues likelihood is conditional probability what to do with 'non-sig'. priors for experiments? finding or verifying? significance vs effect size vs power idea of 'worse' for measures
  36. 36. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix example: playing cards run of cards badly shuffled?
  37. 37. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix example: playing cards (2) probability three cards make a decreasing run: – first card needs to be 3 or bigger: 11/13 – second card exactly the next one down: 1/51 – third card exactly one down again: 1/50 = 11/33150 approx 1 in 3000 – p <0.001 any card except ace or 2 next card down next again
  38. 38. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix example: playing cards (3) but equally surprised if … – either direction – 15 positions for the first card = 15x2x11/33150 approx 1 in 100 and then … how often do I play?
  39. 39. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix dangers if it can go wrong then with 95% confidence …
  40. 40. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix dangers avoiding cherry picking multiple tests, multiple stats outliers, post-hoc hypotheses non-independent effects e.g. fat and sugar correlated features e.g. weight, diet and exercise including trying both traditional and Bayes! the same UI but ‘not’ consistent
  41. 41. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
  42. 42. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix so which is it? to p or not to p Bayes forward?
  43. 43. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix the statistical crisis? mostly well known problems with known solutions maybe more people doing stats with less training? some of the papers about it use flawed stats!
  44. 44. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix on balance (my advice!) traditional more conservative – if done properly Bayes many of the same problems as trad. plus a few more … needs expertise for most things stick with trad. … but always confidence intervals as well as p for special things go for Bayes if you know prior, decision making with costs, internal algorithms
  45. 45. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix
  46. 46. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix a quick experiment … but are they fair
  47. 47. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix calculations – six coins given coin is fair: probability six heads = 1/26 = 1/64 probability six tails = 1/26 = 1/64 probability either = 2/64 ~ 3% H0 – coin is fair H1 – coin is not-fair likelihood ( HHHHHH or TTTTTT | H0 ) < 5%
  48. 48. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix your experiment toss 6 coins record how many heads or tails if HHHHHH or TTTTTTT you can reject H0 with p< 5% repeat exp. 2 more times with rest of coins
  49. 49. Making Sense of Statistics in HCI: From P to Bayes and Beyond – Alan Dix

×