data science @ The New York Times
chris.wiggins@columbia.edu
chris.wiggins@nytimes.com
@chrishwiggins
references: bit.ly/brown-refs
“data science”
jobs, jobs, jobs
data science: mindset & toolset
drew conway, 2010
modern history:
2009
“data science”
ancient history: 2001
data science
context
home schooled
B.A. & M.Sc. from Brown
PhD in topology
“By the end of late 1945, I was a
statistician rather than a topologist”
invented: “bit”
invented: “software”
invented: “FFT”
“the progenitor of data science.” - @mshron
“The Future of Data Analysis,” 1962
John W. Tukey
introduces:
“Exploratory data analysis”
Tukey 1965, via John Chambers
TUKEY BEGAT S WHICH BEGAT R
Tukey 1972
Tukey 1975
In 1975, while at Princeton, Tufte was asked to teach a
statistics course to a group of journalists who were visiting
the school to study economics. He developed a set of
readings and lectures on statistical graphics, which he
further developed in joint seminars he subsequently taught
with renowned statistician John Tukey (a pioneer in the field
of information design). These course materials became the
foundation for his first book on information design, The
Visual Display of Quantitative Information.
TUKEY BEGAT VDQI
Tukey 1977
TUKEY BEGAT EDA
fast forward -> 2001
“The primary agents for change should be
university departments themselves.”
histories
1. slow burn @Bell: as heretical
statistics (see also Breiman)
2. caught fire 2009-now: as job
description
historical rant: bit.ly/data-rant
biology: 1892 vs. 1995
biology changed for good.
new toolset, new mindset
genetics: 1837 vs. 2012
ML toolset; data science mindset
arxiv.org/abs/1105.5821 ; github.com/rajanil/mkboost
data science: mindset & toolset
1851
news: 20th century
church state
news: 21st century
church state
engineering
newspapering: 1851 vs. 1996
example:
millions of views per hour, 2015
"...social activities generate large quantities of potentially
valuable data...The data were not generated for the
purpose of learning; however, the potential for learning
is great" - J. Chambers, Bell Labs, 1993
data science: the web
is your “online presence”
data science: the web
is a microscope
data science: the web
is an experimental tool
newspapering: 1851 vs. 1996 vs. 2008
“a startup is a temporary organization in search of a
repeatable and scalable business model” —Steve Blank
every publisher is now a startup
news: 21st century
church state
engineering
learnings
- predictive modeling
- descriptive modeling
- prescriptive modeling
(actually ML, shhhh…)
- (supervised learning)
- (unsupervised learning)
- (reinforcement learning)
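The pairing above (predictive with supervised, descriptive with unsupervised, prescriptive with reinforcement) can be sketched with three toy stand-ins. This is only an illustration, not the models used at the Times: the function names `fit_line`, `two_means`, and `epsilon_greedy`, the toy data, and the choice of least squares, 1-D k-means, and an epsilon-greedy bandit are all assumptions made for this sketch.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Predictive (supervised): least-squares fit of y = a + b*x from labeled pairs."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def two_means(xs, iters=20):
    """Descriptive (unsupervised): 1-D k-means with k=2, started at the min and max."""
    lo, hi = min(xs), max(xs)
    if lo == hi:
        raise ValueError("need at least two distinct values")
    for _ in range(iters):
        left = [x for x in xs if abs(x - lo) <= abs(x - hi)]
        right = [x for x in xs if abs(x - lo) > abs(x - hi)]
        lo, hi = mean(left), mean(right)
    return lo, hi


def epsilon_greedy(true_rates, steps=2000, eps=0.1, seed=0):
    """Prescriptive (reinforcement): choose actions, observe rewards, adapt.

    true_rates are hypothetical per-arm success probabilities, hidden from the policy.
    """
    rng = random.Random(seed)
    n = len(true_rates)
    pulls, wins = [0] * n, [0] * n
    for _ in range(steps):
        # Explore with probability eps, or while some arm has never been tried.
        if rng.random() < eps or 0 in pulls:
            arm = rng.randrange(n)
        else:
            arm = max(range(n), key=lambda i: wins[i] / pulls[i])
        pulls[arm] += 1
        wins[arm] += rng.random() < true_rates[arm]  # bool counts as 0 or 1
    return pulls, wins
```

The first two functions only observe data; the third decides which data to collect next, which is what separates prescriptive modeling from the other two.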
cf. modelingsocialdata.org
predictive modeling, e.g.,
“the funnel”
interpretable predictive modeling
supercoolstuff
arxiv.org/abs/q-bio/0701021
optimization & learning, e.g.,
“How The New York Times Works,” Popular Mechanics, 2015
optimization & prediction, e.g.,
“How The New York Times Works,” Popular Mechanics, 2015
(some models)
(some moneys)
recommendation as predictive modeling
bit.ly/AlexCTM
descriptive modeling, e.g.,
cf. daeilkim.com ; import bnpy
modeling your audience
bit.ly/Hughes-Kim-Sudderth-AISTATS15
modeling your audience
(optimization, ultimately)
also allows insight+targeting as inference
prescriptive modeling
aka “A/B testing”;
RCT
cf. modelingsocialdata.org
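An A/B test read as a randomized controlled trial comes down to comparing outcome rates between two randomly assigned groups. A minimal sketch, assuming binary conversions and a pooled two-proportion z-test; the function name `ab_test` and its arguments are hypothetical, and random assignment of users to arms is assumed to have happened upstream.

```python
from math import erfc, sqrt


def ab_test(conv_a, n_a, conv_b, n_b):
    """Two-sided, pooled two-proportion z-test for arm B versus arm A.

    conv_* are conversion counts and n_* are group sizes (hypothetical inputs).
    Returns (lift, z, p_value), where lift = rate_B - rate_A.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value under a standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = erfc(abs(z) / sqrt(2))
    return p_b - p_a, z, p_value
```

Because assignment is random, a small p-value supports a causal reading of the lift; the same arithmetic applied to observational groups would not.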
prescriptive modeling, e.g.,
Reporting
Learning
Test
Optimizing
Explore
descriptive:
predictive:
prescriptive:
common requirements in
data science:
1. people
2. ideas
3. things
cf. John Boyd, USAF
data science: ideas
data skills
data science and…
- data engineering
- data embeds
- data product
- data multiliteracies
cf. “data scientists at work”, ch 1
data science: ideas
- new mindset > new toolset
data science: people
thanks to the data science team!

data science @NYT ; inaugural Data Science Initiative Lecture