Putting the Science Back in Data Science
Daniel Whitenack, @dwhitena
Data Scientist and Advocate, @pachydermIO
Chicago Data Science Conference 2017
Outline
1. Why do we care about Reproducibility?
2. How can we achieve Reproducibility?
3. Demo with R and Pachyderm
4. Resources
@dwhitena, @datascienceassn, #DSAChicago2017
Why do we care about
Reproducibility?
How can we achieve
Reproducibility?
(at scale)
Demo
[Diagram, built up over several slides]
iris.csv (1.3,1.4,...) -> train.R -> model.rda, model.txt
1.csv (1.3,1.4,...) + trained model -> infer.R -> 1 (setosa)
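The demo's two stages can be pictured as a pair of functions that communicate only through files: training writes a model file, and inference reads that model plus an attributes file. The following is a rough Python illustration of that pattern, not the talk's train.R and infer.R. The nearest-centroid classifier, the JSON model file, and the tiny inline dataset are all assumptions made up for the example.

```python
import csv
import json
import tempfile
from pathlib import Path

# Hypothetical toy rows (petal length, petal width, species); not the real iris.csv.
TOY_TRAINING_ROWS = [
    (1.4, 0.2, "setosa"),
    (1.3, 0.2, "setosa"),
    (4.7, 1.4, "versicolor"),
    (4.5, 1.5, "versicolor"),
]


def train(training_csv: Path, model_path: Path) -> None:
    """Training stage: read labeled rows, write per-species centroids as JSON."""
    sums = {}
    with training_csv.open(newline="") as f:
        for length, width, species in csv.reader(f):
            total = sums.setdefault(species, [0.0, 0.0, 0])
            total[0] += float(length)
            total[1] += float(width)
            total[2] += 1
    centroids = {s: [t[0] / t[2], t[1] / t[2]] for s, t in sums.items()}
    model_path.write_text(json.dumps(centroids, sort_keys=True))


def infer(model_path: Path, attributes_csv: Path, output_path: Path) -> None:
    """Inference stage: label each attribute row with the nearest centroid."""
    centroids = json.loads(model_path.read_text())
    predictions = []
    with attributes_csv.open(newline="") as f:
        for length, width in csv.reader(f):
            point = (float(length), float(width))
            nearest = min(
                centroids,
                key=lambda s: (centroids[s][0] - point[0]) ** 2
                + (centroids[s][1] - point[1]) ** 2,
            )
            predictions.append(nearest)
    output_path.write_text("\n".join(predictions) + "\n")


def run_demo() -> str:
    """Run both stages in a scratch directory and return the inference output."""
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp)
        training_csv = work / "training.csv"
        with training_csv.open("w", newline="") as f:
            csv.writer(f).writerows(TOY_TRAINING_ROWS)
        attributes_csv = work / "1.csv"
        attributes_csv.write_text("1.5,0.3\n")
        train(training_csv, work / "model.json")
        infer(work / "model.json", attributes_csv, work / "1")
        return (work / "1").read_text()
```

Because the stages share nothing except files, either one can be rerun, swapped, or moved to another machine without touching the other, which is the property the rest of the talk builds on.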
… enter Pachyderm
An open-source framework for distributed processing and data versioning,
built on containers.
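One way to picture data versioning combined with processing is to give every input and output an identifier derived from its content, and to record which code and which input produced each output. The Python sketch below is a toy illustration of that idea only. It does not use or imitate Pachyderm's actual API, and the names `commit_id`, `Provenance`, and `run_stage` are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict


def commit_id(data: bytes) -> str:
    """Identify a version of some data by the SHA-256 hash of its content."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class Provenance:
    """Record of what produced an output: which code, which input, which result."""
    code_id: str
    input_id: str
    output_id: str


def run_stage(code: str, transform: Callable[[bytes], bytes], data: bytes,
              log: Dict[str, Provenance]) -> bytes:
    """Apply a processing step and log the provenance of its output."""
    output = transform(data)
    record = Provenance(commit_id(code.encode()), commit_id(data), commit_id(output))
    log[record.output_id] = record
    return output


# Usage: the same code and input always map to the same identifiers,
# while any change to the input yields a new input version.
log: Dict[str, Provenance] = {}
first = run_stage("upper-v1", bytes.upper, b"setosa", log)
again = run_stage("upper-v1", bytes.upper, b"setosa", log)
changed = run_stage("upper-v1", bytes.upper, b"virginica", log)
```

Given any output identifier, the log answers the reproducibility question directly: which version of the data and which version of the code produced this result.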
Pachyderm
[Diagram: training (iris.csv) -> model (running train.R) -> model (model.rda, model.txt); model + attributes (1.csv) -> inference (running infer.R) -> inference (1)]
Pachyderm
[Diagram: training (iris.csv) -> model (running train.R) -> model (model.rda, model.txt); model + attributes (1.csv) -> Inference 1, Inference 2, ..., Inference N -> inference (1)]
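The "Inference 1 ... Inference N" picture suggests that each attributes file can be processed independently, so the inference stage can be spread across many workers. A minimal Python sketch of that pattern follows. It assumes a hypothetical `predict_file` function with a made-up threshold rule and uses in-memory strings as stand-ins for the input files; it says nothing about how Pachyderm itself schedules work.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict


def predict_file(contents: str) -> str:
    """Hypothetical stand-in for running the inference script on one input file."""
    petal_length = float(contents.split(",")[0])
    # Made-up threshold rule, used only so the example has something to compute.
    return "setosa" if petal_length < 2.5 else "not setosa"


def infer_all(files: Dict[str, str], workers: int = 4) -> Dict[str, str]:
    """Process every input file independently and collect one result per file."""
    names = sorted(files)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(predict_file, (files[n] for n in names))
        return dict(zip(names, results))
```

For example, `infer_all({"1.csv": "1.4,0.2", "2.csv": "4.7,1.4"})` yields one prediction per input file, and adding more files changes only how much work the pool has, not the code.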
Pachyderm
[Diagram: training -> model (stage) -> model (output); model + attributes -> inference (stage) -> inference (output)]
Pachyderm
[Diagram: training -> model (stage) -> model (output); model + attributes -> inference (stage) -> inference (output); plots (stage) -> plots (output)]
Pachyderm
[Diagram: raw_data -> training (stage) -> training (output) -> model (stage) -> model (output); model + attributes -> inference (stage) -> inference (output); plots (stage) -> plots (output)]
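[Diagram: raw_data -> training (stage) -> training (output) -> model (stage) -> model (output); raw_attr -> attributes (stage, #!/bin/bash) -> attributes (output); model + attributes -> inference (stage) -> inference (output); plots (stage) -> plots (output)]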
Pachyderm
[Diagram: raw_data -> training (stage) -> training (output) -> model (stage) -> model (output); raw_attr -> attributes (stage) -> attributes (output); model + attributes -> inference (stage) -> inference (output); plots (stage) -> plots (output); additional labels: attributes, attributes, inference, training]
Conclusion/Resources
● Run the code/pipeline
● Join the Pachyderm Slack channel
● Check out the Pachyderm docs
● Slack/tweet me @dwhitena
● Read a related article