# 10 things statistics taught us about big data

2. Research Blogging Teaching
3. Research Blogging Teaching jtleek.com
4. Research Blogging Teaching simplystatistics.org
5. Research Blogging Teaching jhudatascience.org
6. from: jtleek@gmail.com Roger let me know you gave him a ballpark figure for the number of students registered for his course "Computing for Data Analysis". Could you give me an idea of how many have registered for my course, "Data Analysis"?
7. from: pangwei@coursera.org Hi Jeff, 7,000 students! It's pretty awesome. (You'll be able to check this out yourself next week, once the class sites are up.)
8. from: rdpeng@gmail.com You are f**ed. -roger
9. Plot: enrollment over time
10. Plot: enrollment over time
11. Plot: enrollment over time
12. 9 classes, 1 month long, every month
13. Plot: enrollment over time
14. 1,000,000+ enrolled
15. http://goo.gl/vQK0RH
16. http://goo.gl/xWAlPi
17. 10 statistics things
    1. Problem first, not solution backward
    2. Define a metric for success first
    3. Analyze interactively
    4. Plot your data first and always
    5. Know your real sample size
    6. Watch out for confounders
    7. Correct for multiple testing
    8. Average many predictors
    9. Smooth over time and space
    10. Have others check your work

    http://goo.gl/wTAuvR
18. Problem first, not solution backward
19. http://goo.gl/3vA1OB
20. http://hyperboleandahalf.blogspot.com/
21. http://cran.r-project.org//
22. http://bioconductor.org/
23. Define a metric for success before you start
24. http://www.agendia.com/managed-care/breast-cancer/mammaprint/
25. 89% sensitivity, 42% specificity, 65% accuracy
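The three figures on the slide above measure different things, which is why the metric needs to be chosen up front. A minimal sketch of how each is computed from confusion-matrix counts follows. The counts are hypothetical, picked only so the results land near the slide's numbers; they are not the data behind the slide.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                 # true positives among actual positives
    specificity = tn / (tn + fp)                 # true negatives among actual negatives
    accuracy = (tp + tn) / (tp + fn + tn + fp)   # all correct calls among all cases
    return sensitivity, specificity, accuracy


# Hypothetical counts (assumption): 100 actual positives, 100 actual negatives.
sens, spec, acc = classification_metrics(tp=89, fn=11, tn=42, fp=58)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.3f}")
```

A test can look strong on one metric and weak on another, so which number counts as "success" depends on the problem being solved.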
26. http://www.biomedcentral.com/1471-2164/14/336/figure/F3
27. Analyze interactively
31. Plot your data first and always
33. http://www.biostat.wisc.edu/~kbroman/topten_worstgraphs/
34. Know your real sample size
35. Watch out for confounders
36. http://xkcd.com/552/
37. shoe size & literacy
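The shoe size and literacy example can be sketched as a small simulation. It assumes the usual reading of this example, that age is the confounder: older children have both bigger feet and better reading scores. All coefficients and noise levels below are invented for illustration.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def residuals(y, x):
    """Residuals of y after a simple least-squares regression on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]


rng = random.Random(0)  # fixed seed so the run is reproducible
age = [rng.uniform(5, 15) for _ in range(2000)]        # assumed confounder
shoe = [0.5 * a + rng.gauss(0, 1) for a in age]        # depends on age only
literacy = [10 * a + rng.gauss(0, 15) for a in age]    # depends on age only

raw = pearson(shoe, literacy)
adjusted = pearson(residuals(shoe, age), residuals(literacy, age))
print(f"raw correlation: {raw:.2f}, after adjusting for age: {adjusted:.2f}")
```

Shoe size and literacy never influence each other in this simulation, yet the raw correlation is strong. Regressing both on age and correlating the residuals removes it.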
38. Correct for multiple testing
39. http://xkcd.com/882/
40. http://xkcd.com/882/
41. http://xkcd.com/882/
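A rough sketch of why the correction matters: with many tests, the chance of at least one false positive grows quickly. The setup below (20 independent tests at a 0.05 threshold) is an assumption chosen for illustration, and Bonferroni is only one of several possible corrections.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """P(at least one false positive) across n independent tests of true nulls."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


n_tests = 20  # assumed number of tests
fwer = familywise_error_rate(n_tests)
threshold = bonferroni_threshold(n_tests)
fwer_corrected = familywise_error_rate(n_tests, alpha=threshold)

print(f"uncorrected chance of a false positive: {fwer:.3f}")
print(f"Bonferroni per-test threshold: {threshold:.4f}")
print(f"corrected chance of a false positive: {fwer_corrected:.3f}")
```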
42. Average many predictors
43. 5 independent, 70% accurate classifiers, majority vote: 10(0.7^3)(0.3^2) + 5(0.7^4)(0.3) + 0.7^5 = 83.7% accuracy http://www.cbcb.umd.edu/~hcorrada/PracticalML/pdf/lectures/EnsembleMethods.pdf Adapted from Todd Halloway
44. 101 independent, 70% accurate classifiers, majority vote: 99.9% accuracy http://www.cbcb.umd.edu/~hcorrada/PracticalML/pdf/lectures/EnsembleMethods.pdf Adapted from Todd Halloway
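The arithmetic on the two slides above is a binomial tail sum: the majority vote is right whenever more than half of the classifiers are right. A minimal sketch, assuming an odd number of classifiers that err independently (the function name is hypothetical):

```python
from math import comb


def majority_vote_accuracy(n_classifiers, p_correct):
    """Probability that a strict majority of independent classifiers is correct."""
    k_min = n_classifiers // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_classifiers, k) * p_correct**k * (1 - p_correct) ** (n_classifiers - k)
        for k in range(k_min, n_classifiers + 1)
    )


acc_5 = majority_vote_accuracy(5, 0.7)
acc_101 = majority_vote_accuracy(101, 0.7)
print(f"5 classifiers:   {acc_5:.3f}")
print(f"101 classifiers: {acc_101:.5f}")
```

The gain depends heavily on the independence assumption. Classifiers trained on the same data usually make correlated errors, so real ensembles improve less than this calculation suggests.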
46. Smooth (average) over time and space
47. http://simplystatistics.org/2014/02/13/loess-explained-in-a-gif/
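The idea behind loess can be sketched in a few lines: at each point, fit a weighted straight line to the nearest neighbours and keep only the fitted value at that point. This is a simplified illustration (tricube weights, local linear fits, no robustness iterations), not the implementation used in the linked post. The function name and the `frac` parameter are assumptions.

```python
import math
import random


def loess(xs, ys, frac=0.5):
    """Local linear regression with tricube weights, evaluated at each x in xs."""
    n = len(xs)
    k = max(2, int(round(frac * n)))  # number of neighbours in each local fit
    fitted = []
    for x0 in xs:
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1e-12  # bandwidth: distance to the k-th nearest point
        w = [max(0.0, 1 - (abs(x - x0) / h) ** 3) ** 3 for x in xs]
        sw = sum(w)  # always positive: the point itself has weight 1
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


# Sanity check: an exactly linear series should pass through unchanged.
xs_lin = list(range(20))
ys_lin = [2 * x + 1 for x in xs_lin]
line_fit = loess(xs_lin, ys_lin)

# Made-up noisy data to smooth.
rng = random.Random(1)
xs = [i / 10 for i in range(100)]
noisy = [math.sin(x) + rng.gauss(0, 0.3) for x in xs]
smooth = loess(xs, noisy, frac=0.3)
```

A larger `frac` averages over more neighbours and gives a smoother but more biased curve; a smaller one follows the noise more closely.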
48. http://fivethirtyeight.com/
49. Have others check your work
50. 10 statistics things
    1. Problem first, not solution backward
    2. Define a metric for success first
    3. Analyze interactively
    4. Plot your data first and always
    5. Know your real sample size
    6. Watch out for confounders
    7. Correct for multiple testing
    8. Average many predictors
    9. Smooth over time and space
    10. Have others check your work

    http://goo.gl/wTAuvR
51. jtleek.com/talks

