data science @ the new york times: IACS@harvard talk 2014-10-31

"Data Science at The New York Times"
talk given to the students of IACS, Harvard
2014-10-31

announcement:
http://www.seas.harvard.edu/calendar/event/77966

article:
http://www.thecrimson.com/article/2014/11/3/data-scientist-times-seminar/

  1. data science @ The New York Times (and what can academia do for a 163-year-old company?) chris.wiggins@columbia.edu chris.wiggins@nytimes.com @chrishwiggins
  2. 0. the path
  3. biology: 1892 vs. 1995 biology changed for good.
  4. genetics: 1837 vs. 2012 from “segments” to algorithms
  5. genetics: 1837 vs. 2012 from intuition to prediction
  6. genetics: 1837 vs. 2012 ML toolset; data science mindset
  7. data science: web scale
  8. example: 163 yr old
  9. bit.ly/nyt-interactive-2013
  10. R+D: nytlabs.com
  11. example: millions of views per hour
  12. data science: the web
  13. learning the “genome” of loyal subscribers insert figure here
  14. from “segments” to algorithms insert figure here
  15. from intuition to prediction insert figure here
  16. data science: the web
  17. data science: the web is your “online presence”
  18. data science: the web is a microscope
  19. data science: the web is an experimental tool
  20. data science: the web is an optimization tool
  21. 0. the path
  22. 1. common tools
  23. 1. common tools - supervised learning - unsupervised learning - reinforcement learning
  24. supervised learning, e.g.,
  25. supervised learning, e.g., “the funnel”
  26. interpretable supervised learning supercoolstuff
  27. supervised learning, e.g., “logistics”
  28. unsupervised learning, e.g., “segments”
  29. unsupervised learning, e.g., “segments”
  30. unsupervised learning, e.g., “segments” argmax_z p(z|x) = 14
  31. unsupervised learning, e.g., “segments” “soccer mom”
  32. reinforcement learning
  33. reinforcement learning aka “A/B testing”; RCT
  34. reinforcement learning
  35. reinforcement learning img: MSR SV (RIP) e.g., multi-armed bandits
  36. 1. common tools - supervised learning - unsupervised learning - reinforcement learning
  37. 2. differing goals
  38. “data”:
  39. “data”: “metrics” “business analytics” “Excel” “reporting”
  40. Reporting
  41. Reporting business as usual
  42. Reporting Learning business as usual
  43. Reporting Learning (esp. supervised) business as usual
  44. Reporting Learning Test business as usual (esp. supervised)
  45. Reporting Learning Test aka “A/B testing”; RCT business as usual (esp. supervised)
  46. Reporting Learning Test Optimizing aka “A/B testing”; RCT (esp. supervised) business as usual
  47. Reporting Learning Test Optimizing aka “A/B testing”; RCT (i.e. reinforcement learning) (esp. supervised) business as usual
  48. Reporting Learning Test Optimizing Explore aka “A/B testing”; RCT (i.e. reinforcement learning) (esp. supervised) business as usual
  49. Reporting Learning Test Optimizing Explore aka “A/B testing”; RCT aka “segmenting” (i.e. reinforcement learning) (esp. supervised) business as usual
  50. Reporting Learning Test Optimizing Explore
  51. Reporting Learning Optimizing tech co
  52. Reporting Optimizing big co
  53. Reporting theoretical co
  54. Reporting Learning Test Optimizing Explore startups:
  55. every publisher is now a startup
  56. 3. creating a DS culture
  57. DS team functions: what does DS deliver?
  58. DS team functions: what does DS deliver? - build data products - build APIs - impact roadmaps
  59. - build data products
  60. - build data products
  61. - build data products (including internal products)
  62. - build APIs
  63. - build APIs
  64. - impact roadmaps flickr/McJex
  65. DS&E team composition flickr/eggsalad78
  66. DS&E team composition - data engineering - data science - data visualization - data product
  67. 4. doing DS well
  68. common requirements in data science:
  69. common requirements in data science: 1. people 2. ideas 3. things cf. USAF
  70. data science: things
  71. data science: things - find quantifiables
  72. data science: things - find quantifiables (choose carefully)
  73. data science: things - straw man first
  74. data science: things - straw man first
  75. data science: things - small wins before feature engineering
  76. data science: things - data engineering before data science
  77. data science: ideas
  78. data science: ideas - reframe domain questions as machine learning tasks
  79. data science: ideas - better wrong than "nice"
  80. data science: ideas - be relevant
  81. data science: ideas - be relevant
  82. data science: ideas - be relevant
  83. data science: ideas - befriend experimentalists
  84. data science: ideas - befriend experimentalists
  85. data science: ideas - befriend experimentalists
  86. data science: ideas - befriend experimentalists supercoolstuff
  87. data science: people
  88. data science: people - be communicative
  89. data science: people - be communicative (promote rhetorical literacy)
  90. data science: people - be communicative (promote rhetorical literacy) - related: strive to build models which are both predictive and interpretable
  91. data science: people - be skeptical (promote critical literacy)
  92. data science: people - be empowering
  93. data science: people - be transparent
  94. data science: people - promote literacy: functional critical rhetorical (cf. Selber, Multiliteracies for a Digital Age. 2004)
  95. data science: people - promote literacies: 1. functional 2. critical 3. rhetorical (cf. Selber, Multiliteracies for a Digital Age. 2004)
  96. data science: people - promote literacies: 1. functional 2. critical 3. rhetorical (cf. Selber, Multiliteracies for a Digital Age. 2004)
  97. data science: people - promote literacies: 1. functional 2. critical 3. rhetorical (cf. Selber, Multiliteracies for a Digital Age. 2004)
  98. data science: people - promote literacies: 1. functional 2. critical 3. rhetorical (cf. Selber, Multiliteracies for a Digital Age. 2004)
  99. summary: pay attention to: 1. people 2. ideas 3. things cf. USAF
  100. people: 1. be communicative 2. be skeptical 3. be empowering 4. be transparent 5. promote literacies
  101. ideas: 1. reframe questions as ML 2. better wrong than "nice" 3. be relevant 4. aim for hypothesis vs. data jeopardy 5. befriend experimentalists
  102. things: 1. find quantifiables 2. straw man first 3. small wins before feature engineering 4. data engineering before data science
  103. find out more! 1. postdoc/student opportunities: chris.wiggins@columbia.edu 2. we are hiring! chris.wiggins@nytimes.com 3. let’s talk: chris.wiggins@ @chrishwiggins
  104. chris.wiggins@columbia.edu chris.wiggins@nytimes.com @chrishwiggins
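
Slides 24 to 27 mention supervised learning applied to "the funnel" and stress interpretability. The talk shows no code; as a minimal sketch, assuming hypothetical reader features (a scaled weekly article count and an app-usage flag) and invented toy labels for whether each reader subscribed, a logistic regression fit by batch gradient descent gives a predictive model whose coefficients can be read directly:

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Predicted probability that a reader with features x subscribes."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic(
    X: List[Sequence[float]], y: List[int], lr: float = 0.5, epochs: int = 2000
) -> Tuple[List[float], float]:
    """Fit logistic-regression weights by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # d(log-loss)/d(logit)
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical toy data: [articles read per week (scaled to 0..1), uses the app (0/1)]
X = [[0.1, 0], [0.2, 0], [0.3, 1], [0.5, 0], [0.7, 1], [0.9, 1], [1.0, 1], [0.05, 0]]
y = [0, 0, 0, 0, 1, 1, 1, 0]  # 1 = subscribed (invented labels)

weights, bias = fit_logistic(X, y)
print("weights:", weights, "bias:", bias)
```

The sign and size of each entry in `weights` show how a feature moves the predicted subscription probability, which is one way to get a model that is both predictive and interpretable (slide 90). Real funnel data would also need held-out evaluation, which this sketch omits.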

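Slide 30 writes segment assignment as argmax_z p(z|x): each reader x is placed in the segment z with the highest posterior probability, where p(z|x) is proportional to p(z) p(x|z). A minimal sketch, assuming a hypothetical mixture of independent Bernoulli segments over which sections a reader visits; the section names, priors, and per-segment visit probabilities are made up here (in practice they would be learned, e.g. by expectation-maximization):

```python
import math
from typing import List, Sequence

# Hypothetical sections a reader may visit (x[j] = 1 if visited).
SECTIONS = ["sports", "food", "politics", "parenting"]
# Hypothetical segment priors p(z) and per-segment visit probabilities p(x_j = 1 | z).
PRIORS = [0.5, 0.3, 0.2]
THETAS = [
    [0.1, 0.1, 0.9, 0.1],  # segment 0: mostly politics
    [0.3, 0.8, 0.2, 0.8],  # segment 1: food and parenting
    [0.9, 0.2, 0.3, 0.1],  # segment 2: mostly sports
]


def posterior(
    x: Sequence[int], priors: Sequence[float], thetas: Sequence[Sequence[float]]
) -> List[float]:
    """Return p(z | x) for every segment z under a Bernoulli mixture model."""
    log_joint = []
    for pi, theta in zip(priors, thetas):
        lp = math.log(pi)  # log p(z)
        for xj, tj in zip(x, theta):
            lp += math.log(tj) if xj else math.log(1.0 - tj)  # log p(x_j | z)
        log_joint.append(lp)
    m = max(log_joint)  # subtract the max before exponentiating, for stability
    unnorm = [math.exp(v - m) for v in log_joint]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def assign_segment(
    x: Sequence[int],
    priors: Sequence[float] = PRIORS,
    thetas: Sequence[Sequence[float]] = THETAS,
) -> int:
    """argmax_z p(z | x): index of the most probable segment for reader x."""
    post = posterior(x, priors, thetas)
    return max(range(len(post)), key=post.__getitem__)


reader = [1, 1, 0, 1]  # visited sports, food and parenting, not politics
print(assign_segment(reader))  # → 1
```

Working in log space keeps the product over many sections from underflowing; with dozens of segments, as the "= 14" on slide 30 suggests, the same argmax applies unchanged.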
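
Slides 33 and 35 link A/B testing to reinforcement learning and mention multi-armed bandits. Where a classic A/B test splits traffic evenly for a fixed period, a bandit shifts traffic toward the better-performing variant as evidence accumulates. The sketch below uses Thompson sampling with Beta-Bernoulli posteriors, one common bandit algorithm chosen here for illustration (the talk does not say which one is used); the click rates are hypothetical and the clicks are simulated:

```python
import random
from typing import List, Sequence, Tuple


def thompson_sampling(
    true_rates: Sequence[float], n_rounds: int, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Simulate a Beta-Bernoulli Thompson-sampling bandit.

    true_rates are hypothetical click rates used only to simulate feedback;
    the selection rule never reads them directly.
    """
    rng = random.Random(seed)
    k = len(true_rates)
    successes = [0] * k
    failures = [0] * k
    pulls = [0] * k
    for _ in range(n_rounds):
        # Draw one plausible rate per variant from its Beta(1 + clicks, 1 + non-clicks) posterior.
        draws = [rng.betavariate(1 + successes[a], 1 + failures[a]) for a in range(k)]
        arm = max(range(k), key=draws.__getitem__)  # show the variant with the highest draw
        if rng.random() < true_rates[arm]:  # simulated click
            successes[arm] += 1
        else:
            failures[arm] += 1
        pulls[arm] += 1
    return pulls, successes


pulls, clicks = thompson_sampling([0.10, 0.30, 0.15], n_rounds=2000)
print("times shown:", pulls, "clicks:", clicks)
```

Unlike an even A/B split, the allocation in `pulls` is expected to concentrate on the variant with the highest underlying rate. The trade-off is that a bandit optimizes while it learns, whereas a randomized test with a fixed split is simpler to analyze.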