10 things statistics taught us about big data
Talk at DC Business Intelligentsia.

  1. 10 things statistics taught us about big data
  2. Research, Blogging, Teaching
  3. Research, Blogging, Teaching: jtleek.com
  4. Research, Blogging, Teaching: simplystatistics.org
  5. Research, Blogging, Teaching: jhudatascience.org
  6. From jtleek@gmail.com: Roger let me know you gave him a ballpark figure for the number of students registered for his course "Computing for Data Analysis". Could you give me an idea of how many have registered for my course "Data Analysis"?
  7. From pangwei@coursera.org: Hi Jeff, 7,000 students! It's pretty awesome. (You'll be able to check this out yourself next week, once the class sites are up.)
  8. From rdpeng@gmail.com: You are f**ed. -roger
  9. [Plot: enrollment over time]
  10. [Plot: enrollment over time]
  11. [Plot: enrollment over time]
  12. 9 classes, 1 month long, every month
  13. [Plot: enrollment over time]
  14. 1,000,000+ enrolled
  15. http://goo.gl/vQK0RH
  16. http://goo.gl/xWAlPi
  17. 10 statistics things (http://goo.gl/wTAuvR)
      1. Problem first, not solution backward
      2. Define a metric for success first
      3. Analyze interactively
      4. Plot your data first and always
      5. Know your real sample size
      6. Watch out for confounders
      7. Correct for multiple testing
      8. Average many predictors
      9. Smooth over time and space
      10. Have others check your work
  18. Problem first, not solution backward
  19. http://goo.gl/3vA1OB
  20. http://hyperboleandahalf.blogspot.com/
  21. http://cran.r-project.org//
  22. http://bioconductor.org/
  23. Define a metric for success before you start
  24. http://www.agendia.com/managed-care/breast-cancer/mammaprint/
  25. 89% sensitivity, 42% specificity, 65% accuracy
  26. http://www.biomedcentral.com/1471-2164/14/336/figure/F3
  27. Analyze interactively
  28. http://had.co.nz/
  29. https://twitter.com/EllieMcDonagh/status/469184554549248000
  30. http://www.chrisstucchio.com/blog/2013/hadoop_hatred.html
  31. Plot your data first and always
  32. http://www.chrisstucchio.com/blog/2013/hadoop_hatred.html
  33. http://www.biostat.wisc.edu/~kbroman/topten_worstgraphs/
  34. Know your real sample size
  35. Watch out for confounders
  36. http://xkcd.com/552/
  37. Shoe size & literacy
  38. Correct for multiple testing
  39. http://xkcd.com/882/
  40. http://xkcd.com/882/
  41. http://xkcd.com/882/
  42. Average many predictors
  43. 5 independent, 70% accurate classifiers: 10(0.7^3)(0.3^2) + 5(0.7^4)(0.3) + (0.7^5) = 83.7% accuracy. http://www.cbcb.umd.edu/~hcorrada/PracticalML/pdf/lectures/EnsembleMethods.pdf (adapted from Todd Halloway)
  44. 101 independent, 70% accurate classifiers: 99.9% accuracy. http://www.cbcb.umd.edu/~hcorrada/PracticalML/pdf/lectures/EnsembleMethods.pdf (adapted from Todd Halloway)
  45. http://www.cbcb.umd.edu/~hcorrada/PracticalML/pdf/lectures/EnsembleMethods.pdf (adapted from Todd Halloway)
  46. Smooth (average) over time and space
  47. http://simplystatistics.org/2014/02/13/loess-explained-in-a-gif/
  48. http://fivethirtyeight.com/
  49. Have others check your work
  50. 10 statistics things (http://goo.gl/wTAuvR)
      1. Problem first, not solution backward
      2. Define a metric for success first
      3. Analyze interactively
      4. Plot your data first and always
      5. Know your real sample size
      6. Watch out for confounders
      7. Correct for multiple testing
      8. Average many predictors
      9. Smooth over time and space
      10. Have others check your work
  51. jtleek.com/talks
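
The arithmetic on slides 43 and 44 can be checked with a short sketch. It assumes, as the slides state, independent classifiers that are each correct with probability 0.7. It also assumes the ensemble predicts by strict majority vote, which the slide's formula implies but does not state. The function name `majority_vote_accuracy` is illustrative and does not come from the talk.

```python
from math import comb


def majority_vote_accuracy(n_classifiers: int, p_correct: float) -> float:
    """Probability that a strict majority of independent classifiers is correct.

    Assumes every classifier is correct with the same probability and that
    their errors are independent (the idealized setting on the slides).
    """
    if n_classifiers % 2 == 0:
        raise ValueError("use an odd number of classifiers to avoid ties")
    smallest_majority = n_classifiers // 2 + 1
    # Sum the binomial probabilities of k correct votes, for every k that
    # forms a majority.
    return sum(
        comb(n_classifiers, k)
        * p_correct**k
        * (1 - p_correct) ** (n_classifiers - k)
        for k in range(smallest_majority, n_classifiers + 1)
    )


if __name__ == "__main__":
    for n in (5, 101):
        print(f"{n} classifiers: {majority_vote_accuracy(n, 0.7):.4f}")
```

With 5 classifiers the sum has exactly the three terms shown on slide 43 and comes to about 0.837. With 101 classifiers the result is above 0.999, consistent with slide 44. Real classifiers are rarely independent, so these figures are an upper bound on what averaging correlated predictors would deliver.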