data science in academia and the real world

talk given 2014-04-29 to the New York Open Statistical Programming Meetup (http://www.meetup.com/nyhackr/) by Chris Wiggins (columbia/NYT/hackNY)

  1. 1. what is a computational biologist doing at the New York Times? (and what can academia do for a 163-year-old company?) chris.wiggins@columbia.edu chris.wiggins@nytimes.com chris.wiggins@hackNY.org @chrishwiggins
  2. 2. context/background
  3. 3. context/background (before ‘the talk’)
  4. 4. biology: 1892 vs. 1995 biology changed for good.
  5. 5. genetics: 1837 vs. 2012 from “segments” to algorithms
  6. 6. genetics: 1837 vs. 2012 from intuition to prediction
  7. 7. data science: web scale
  8. 8. example: 163 yr old
  9. 9. bit.ly/nyt-interactive-2013
  10. 10. R+D: nytlabs.com
  11. 11. developer.nytimes.com: 2008
  12. 12. example: millions of views per hour
  13. 13. from “segments” to algorithms insert figure here
  14. 14. from intuition to prediction insert figure here
  15. 15. data science: the web
  16. 16. data science: the web is your “online presence”
  17. 17. data science: the web is a microscope
  18. 18. data science: the web is an experimental tool
  19. 19. data science: the web is an optimization tool
  20. 20. </header>
  21. 21. </header> i.e., <body>
  22. 22. common requirements in data science:
  23. 23. common requirements in data science: 1. practices 2. skills 3. culture
  24. 24. common requirements in data science: 1. practices 2. skills 3. culture
  25. 25. common requirements in data science: 1. practices 2. skills 3. culture
  26. 26. data science: practice
  27. 27. data science: practice - reframe domain questions as machine learning tasks
  28. 28. data science: practice - better wrong than "nice"
  29. 29. data science: practice - be relevant
  30. 30. data science: practice - be relevant
  31. 31. data science: practice - be relevant
  32. 32. data science: practice - hypotheses are not data jeopardy
  33. 33. data science: practice - befriend experimentalists
  34. 34. data science: practice - befriend experimentalists
  35. 35. data science: practice - befriend experimentalists
  36. 36. data science: skills
  37. 37. data science: skills - find quantifiables
  38. 38. data science: skills - find quantifiables (choose carefully)
  39. 39. data science: skills - straw man first
  40. 40. data science: skills - straw man first
  41. 41. data science: skills - small wins before feature engineering
  42. 42. data science: skills - data engineering before data science
  43. 43. data science: culture
  44. 44. data science: culture - be communicative
  45. 45. data science: culture - be communicative (promote rhetorical literacy)
  46. 46. data science: culture - be communicative (promote rhetorical literacy) - related: strive to build models which are both predictive and interpretable
  47. 47. data science: culture - be skeptical (promote critical literacy)
  48. 48. data science: culture - be empowering
  49. 49. data science: culture - be transparent
  50. 50. data science: culture - promote literacy: functional, critical, rhetorical (cf. Selber, Multiliteracies for a Digital Age. 2004)
  51. 51. data science: culture - promote literacies: 1. functional 2. critical 3. rhetorical (cf. Selber, Multiliteracies for a Digital Age. 2004)
  52. 52. data science: culture - promote literacies: 1. functional 2. critical 3. rhetorical (cf. Selber, Multiliteracies for a Digital Age. 2004)
  53. 53. data science: culture - promote literacies: 1. functional 2. critical 3. rhetorical (cf. Selber, Multiliteracies for a Digital Age. 2004)
  54. 54. data science: culture - promote literacies: 1. functional 2. critical 3. rhetorical (cf. Selber, Multiliteracies for a Digital Age. 2004)
  55. 55. </body> i.e., <footer>
  56. 56. summary:
  57. 57. summary: pay attention to: 1. practices 2. skills 3. culture
  58. 58. practices: 1. reframe questions as ML 2. better wrong than "nice" 3. be relevant 4. aim for hypothesis vs data jeopardy 5. befriend experimentalists
  59. 59. skills: 1. find quantifiables 2. straw man first 3. small wins before feature engineering 4. data engineering before data science
  60. 60. culture: 1. be communicative 2. be skeptical 3. be empowering 4. be transparent 5. promote literacies
  61. 61. find out more! 1. postdoc/student opportunities: chris.wiggins@columbia.edu 2. always hiring: chris.wiggins@nytimes.com 3. let’s talk: - @chrishwiggins - gist.github.com/chrishwiggins/
  62. 62. what is a computational biologist doing at the New York Times? (and what can academia do for a 163-year-old company?) chris.wiggins@columbia.edu chris.wiggins@nytimes.com chris.wiggins@hackNY.org @chrishwiggins