Graphical inference

1,772 views
1,666 views

Published on

Published in: Education, Technology
0 Comments
2 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
1,772
On SlideShare
0
From Embeds
0
Number of Embeds
13
Actions
Shares
0
Downloads
21
Comments
0
Likes
2
Embeds 0
No embeds

No notes for slide

Graphical inference

  1. 1. “The human understanding, on account of its own nature, readily supposes a greater order and uniformity in things than it finds. And ... it devises parallels and correspondences and relations which are not there.” —Francis Bacon, 1620 Wednesday, 10 November 2010
  2. 2. “The human understanding, on account of its own nature, readily supposes a greater order and uniformity in things than it finds. And ... it devises parallels and correspondences and relations which are not there.” —Francis Bacon, 1620 Is what we see reallythere? Wednesday, 10 November 2010
  3. 3. October 2010 Hadley Wickham, Dianne Cook, Heike Hofmann, Andreas Buja Graphical inference for infovis Wednesday, 10 November 2010
  4. 4. Which one of these plots is not like the others? Which of these plots just doesn’t belong? Wednesday, 10 November 2010
  5. 5. 7 of those plots were plots of random (null) data. 1 plot was the real data. If you correctly picked the true plot from the null plots then we have evidence that it really is different. In fact, we have rigorous statistical evidence that there is a difference, just using Sesame Street skills! Wednesday, 10 November 2010
  6. 6. 1. The statistical justice system 2. Line up protocol 3. Rorschach protocol 4. Future work Wednesday, 10 November 2010
  7. 7. http://www.flickr.com/photos/joegratz/117048243 Hypothesis testing? Wednesday, 10 November 2010
  8. 8. http://www.flickr.com/photos/joegratz/117048243 The statistical justice system Hypothesis testing? Wednesday, 10 November 2010
  9. 9. Ho: null hypothesis Ha: alternative hypothesis Defence Prosecution Wednesday, 10 November 2010
  10. 10. Ho: null hypothesis Ha: alternative hypothesis Defence Prosecution Null distribution Innocents Wednesday, 10 November 2010
  11. 11. Ho: null hypothesis Ha: alternative hypothesis Defence Prosecution Reject the null Fail to reject the null Guilty Not guilty Null distribution Innocents Wednesday, 10 November 2010
  12. 12. Ho: null hypothesis Ha: alternative hypothesis Defence Prosecution Reject the null Fail to reject the null Guilty Not guilty Null distribution Innocents p-value Probability that a truly innocent dataset would look as guilty as the suspect Wednesday, 10 November 2010
  13. 13. Line up Wednesday, 10 November 2010
  14. 14. believe believe case caseclosely closely descendants descendants few few long long modified modified variations variations very very view view believe believe case caseclosely closely descendants descendants few few long long modified modified variations variations very very view view believe believe case caseclosely closely descendants descendants few few long long modified modified variations variations very very view view believe believe case caseclosely closely descendants descendants few few long long modified modified variations variations very very view view believe believe case caseclosely closely descendants descendants few few long long modified modified variations variations very very view view Five tag clouds of selected words from the 1st (red) and 6th (blue) editions of Darwin’s “Origin of Species”. Four of the tag clouds were generated under the null hypothesis of no difference between editions, and one is the true data. Can you spot it? Wednesday, 10 November 2010
  15. 15. believe believe case caseclosely closely descendants descendants few few long long modified modified variations variations very very view view believe believe case caseclosely closely descendants descendants few few long long modified modified variations variations very very view view believe believe case caseclosely closely descendants descendants few few long long modified modified variations variations very very view view believe believe case caseclosely closely descendants descendants few few long long modified modified variations variations very very view view believe believe case caseclosely closely descendants descendants few few long long modified modified variations variations very very view view Five tag clouds of selected words from the 1st (red) and 6th (blue) editions of Darwin’s “Origin of Species”. Four of the tag clouds were generated under the null hypothesis of no difference between editions, and one is the true data. Can you spot it? Wednesday, 10 November 2010
  16. 16. Protocol Generate n-1 decoys (null datasets) Plot the decoys + the real data (randomly positioned) Show to an impartial observer. Can they spot the real data? If so, you have evidence for true difference (p-value = 1/n) Wednesday, 10 November 2010
  17. 17. E. L. Scott, C. D. Shane, and M. D. Swanson. Comparison of the synthetic and actual distribution of galaxies on a photographic plate. Astrophysical Journal, 119:91–112, Jan. 1954. Wednesday, 10 November 2010
  18. 18. A. M. Noll. Human or machine: A subjective comparison of Piet Mondrian’s “composition with lines” (1917) and a computer- generated picture. The Psychological Record, 16:1–10, 1966. Wednesday, 10 November 2010
  19. 19. vs. classical tests Of course, if we know what we’re looking for, we can always develop an algorithm or numerical test. The advantage of visual inference is that works for very general tasks, including when you don’t know exactly what you’re looking for. Wednesday, 10 November 2010
  20. 20. ower of the test ! Power 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 sigma = 12 !15 !10 !5 0 5 10 15 sigma = 5 !15 !10 !5 0 5 10 15 samplesize=100samplesize=300 power_curve Theoretical test Visual test lower_CL upper_CL Recent work shows that power only a little worse than classical test Wednesday, 10 November 2010
  21. 21. Plot Task Choropleth map Is there a spatial trend? Treemap Is the distribution in higher level categories the same? Scatterplot Are the two variables independent? Time series Is there a trend in mean or variability? Wednesday, 10 November 2010
  22. 22. Wednesday, 10 November 2010
  23. 23. Wednesday, 10 November 2010
  24. 24. Wednesday, 10 November 2010
  25. 25. Once we’ve seen the plot, we’re no longer impartial Wednesday, 10 November 2010
  26. 26. Code # Support package written in R # http://github.com/ggobi/nullabor # Provides reference implementation of ideas library(nullabor) library(ggplot2) qplot(angle * 180 / pi, r, data = threept) %+% lineup(null_model(r ~ poly(angle, 2)), n = 10) + facet_wrap(~ .sample, ncol = 5) Wednesday, 10 November 2010
  27. 27. Rorschach Wednesday, 10 November 2010
  28. 28. Rorschach We’re surprisingly bad at appreciating the amount of variation in random data. Showing only null plots is a good way to calibrate our intuition. We also plan on using these plots as an empirical tool to understand what features people pick up on. Anecdotally, undergrads focus too much on outliers Wednesday, 10 November 2010
  29. 29. result count 0 20 40 60 80 100 0 20 40 60 80 100 0 20 40 60 80 100 1 4 7 0.0 0.2 0.4 0.6 0.8 1.0 2 5 8 0.0 0.2 0.4 0.6 0.8 1.0 3 6 9 0.0 0.2 0.4 0.6 0.8 1.0 Wednesday, 10 November 2010
  30. 30. Future work Wednesday, 10 November 2010
  31. 31. Future work How can visual inference be integrated into visualisation software at a fundamental level? How does training impact results? How do novices vs. experts differ? What patterns do people pick up on? What are the alternatives that people respond to? Wednesday, 10 November 2010
  32. 32. Questions? Wednesday, 10 November 2010
  33. 33. Wednesday, 10 November 2010
  34. 34. This work is licensed under the Creative Commons Attribution-Noncommercial 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/ 3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA. Wednesday, 10 November 2010

×