10 things statistics taught us about big data

1. 10 things statistics taught us about big data

2. Research Blogging Teaching

3. Research Blogging Teaching jtleek.com

4. Research Blogging Teaching simplystatistics.org

5. Research Blogging Teaching jhudatascience.org

7. from: jtleek@gmail.com Roger let me know you gave him a ballpark figure for the number of students registered for his course "Computing for Data Analysis". Could you give me an idea of how many have registered for my course "Data Analysis"?

8. from: pangwei@coursera.org Hi Jeff, 7,000 students! It's pretty awesome. (You'll be able to check this out yourself next week, once the class sites are up.)

9. from: rdpeng@gmail.com You are f**ed. -roger

10. Enrollment Time

11. Enrollment Time

12. Enrollment Time

13. 9 classes, 1 month long, every month

14. Enrollment Time

15. 1,000,000+ Enrolled

16. http://goo.gl/vQK0RH

17. http://goo.gl/xWAlPi

18. 10 statistics things
    1. Problem first, not solution backward
    2. Define a metric for success first
    3. Analyze interactively
    4. Plot your data first and always
    5. Know your real sample size
    6. Watch out for confounders
    7. Correct for multiple testing
    8. Average many predictors
    9. Smooth over time and space
    10. Have others check your work
    http://goo.gl/wTAuvR

19. Problem first, not solution backward

21. http://goo.gl/3vA1OB

22. http://hyperboleandahalf.blogspot.com/

23. http://cran.r-project.org/

24. http://bioconductor.org/

25. Define a metric for success before you start

26. http://www.agendia.com/managed-care/breast-cancer/mammaprint/

27. 89% sensitivity, 42% specificity, 65% accuracy
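
The three numbers on this slide measure different things, which is why the metric has to be chosen up front. A minimal sketch of how they relate, assuming a binary test summarized by a confusion matrix. The counts in the usage lines are hypothetical and are not from the talk. `implied_prevalence` is plain arithmetic on the identity accuracy = sensitivity * prevalence + specificity * (1 - prevalence), and it assumes all three figures come from the same sample, which the slide does not state.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),          # true positives among actual positives
        "specificity": tn / (tn + fp),          # true negatives among actual negatives
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


def implied_prevalence(sensitivity: float, specificity: float, accuracy: float) -> float:
    """Solve accuracy = sens * prev + spec * (1 - prev) for prev."""
    return (accuracy - specificity) / (sensitivity - specificity)


# Hypothetical counts (not from the talk): 100 positives, 100 negatives.
metrics = classification_metrics(tp=89, fp=58, tn=42, fn=11)

# Prevalence implied by the slide's figures, under the same-sample assumption.
prevalence = implied_prevalence(0.89, 0.42, 0.65)
```

A test with high sensitivity can still have mediocre accuracy when specificity is low, so which of the three counts as "success" depends on the cost of each kind of error.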

28. http://www.biomedcentral.com/1471-2164/14/336/figure/F3

29. Analyze Interactively

30. http://had.co.nz/

31. https://twitter.com/EllieMcDonagh/status/469184554549248000

32. http://www.chrisstucchio.com/blog/2013/hadoop_hatred.html

33. Plot your data first and always

34. http://www.chrisstucchio.com/blog/2013/hadoop_hatred.html

35. http://www.biostat.wisc.edu/~kbroman/topten_worstgraphs/

36. Know your real sample size

39. Watch out for confounders

40. http://xkcd.com/552/

42. shoe size & literacy
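
A small simulation can make the shoe size and literacy example concrete. This is a sketch under a stated assumption: that age is the confounder driving both variables, which is the usual reading of this example but is not spelled out on the slide. All numbers are simulated, and the helper names `pearson` and `residuals` are illustrative.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def residuals(y, x):
    """Residuals of y after a least-squares straight-line fit on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return [b - my - slope * (a - mx) for a, b in zip(x, y)]


rng = random.Random(0)  # fixed seed so the simulation is repeatable
age = [rng.uniform(5, 15) for _ in range(2000)]       # simulated ages in years
shoe_size = [a + rng.gauss(0, 1) for a in age]        # depends on age only
literacy = [2 * a + rng.gauss(0, 2) for a in age]     # depends on age only

raw = pearson(shoe_size, literacy)
adjusted = pearson(residuals(shoe_size, age), residuals(literacy, age))
```

Here `raw` is strongly positive even though neither variable affects the other, while `adjusted`, the correlation after removing the linear effect of age from both, is close to zero.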

45. Correct for multiple testing
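
The slide does not name a specific correction. As an illustration, here is a minimal sketch of two common ones: Bonferroni, which controls the family-wise error rate, and the Benjamini-Hochberg step-up procedure, which controls the false discovery rate. The p-values in the usage lines are made up for the example.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a hypothesis only if its p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up: reject the k smallest p-values, where k is
    the largest rank with p_(k) <= alpha * k / m."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    max_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            max_rank = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[i] = True
    return rejected


# Made-up p-values for illustration only.
p = [0.5, 0.001, 0.035, 0.015, 0.025]
strict = bonferroni(p)            # only the smallest p-value survives
lenient = benjamini_hochberg(p)   # the four small p-values survive
```

Bonferroni is the more conservative of the two; Benjamini-Hochberg trades a controlled fraction of false discoveries for more power, which matters when thousands of tests are run at once.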

50. Average many predictors

51. 5 independent, 70% accurate classifiers: 10(0.7^3)(0.3^2) + 5(0.7^4)(0.3) + (0.7^5) = 83.7% accuracy. http://www.cbcb.umd.edu/~hcorrada/PracticalML/pdf/lectures/EnsembleMethods.pdf Adapted from Todd Halloway

52. 101 independent, 70% accurate classifiers: 99.9% accuracy. http://www.cbcb.umd.edu/~hcorrada/PracticalML/pdf/lectures/EnsembleMethods.pdf Adapted from Todd Halloway

53. http://www.cbcb.umd.edu/~hcorrada/PracticalML/pdf/lectures/EnsembleMethods.pdf Adapted from Todd Halloway
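
The majority-vote arithmetic on the two preceding slides can be checked with a short binomial sum. This is a sketch that assumes, as the slides do, that the classifiers are independent and equally accurate, and that the ensemble is correct whenever more than half of its members are. The function name is illustrative.

```python
from math import comb


def majority_vote_accuracy(n_classifiers: int, p_correct: float) -> float:
    """Probability that a strict majority of n independent classifiers,
    each correct with probability p_correct, gives the right answer."""
    k_min = n_classifiers // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_classifiers, k) * p_correct**k * (1 - p_correct) ** (n_classifiers - k)
        for k in range(k_min, n_classifiers + 1)
    )


five = majority_vote_accuracy(5, 0.7)         # about 0.837, matching the slide
hundred_one = majority_vote_accuracy(101, 0.7)  # above 0.999
```

In practice classifiers trained on the same data are rarely independent, so real ensembles gain less than this idealized calculation suggests.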

54. Smooth (average) over time and space

55. http://simplystatistics.org/2014/02/13/loess-explained-in-a-gif/

56. http://fivethirtyeight.com/
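
As a rough illustration of averaging over time, here is a centered moving average. It is a much simpler smoother than the loess method linked above and is swapped in only to show the idea of replacing each point with a local average. The function name and the window size are assumptions made for the example.

```python
def moving_average(values, window=3):
    """Centered moving average; the window shrinks at the two ends."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        chunk = values[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


trend = moving_average([1, 2, 3, 4, 5], window=3)    # [1.5, 2.0, 3.0, 4.0, 4.5]
noisy = moving_average([0, 10, 0, 10, 0], window=3)  # swings shrink toward the mean
```

A wider window removes more noise but also blurs real changes, so the window size is itself a modeling choice.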

57. Have others check your work

63. 10 statistics things
    1. Problem first, not solution backward
    2. Define a metric for success first
    3. Analyze interactively
    4. Plot your data first and always
    5. Know your real sample size
    6. Watch out for confounders
    7. Correct for multiple testing
    8. Average many predictors
    9. Smooth over time and space
    10. Have others check your work
    http://goo.gl/wTAuvR

64. jtleek.com/talks
