SlideShare a Scribd company logo
OPINIONATED ANALYSIS
DEVELOPMENT
HILARY PARKER
@HSPTER
Not So Standard Deviations
Reproducible
Accurate
Collaborative
SOFTWARE DEVELOPER
SOFTWARE ENGINEER
NARRATIVE
• What models are you going to use?
• What argument are you going to make?
• How are you going to convince your audience
of something?
TECHNICAL ARTIFACT
• What tools are you going to use to make the
deliverable?
• What technical and coding choices will you
make for producing the deliverable?
mastering the statistical tooling :: mastering a musical instrument
mastering the statistical tooling :: mastering a musical instrument
statistical theory :: musical theory
mastering the statistical tooling :: mastering a musical instrument
statistical theory :: musical theory
full analysis :: piece of music
disclaimer: Karl is extremely nice, helpful & non-judgmental
BAD ANALYST? :(
ment
WHY?
don’t want to “limit creativity”
embarrassed by personal process
WHY DOES THAT
MATTER?
There are known, common problems when
developing an analysis
Making these problems easy to avoid frees up time
for the creativity of developing the narrative
You re-run the analysis and get different results.
Someone else can’t repeat the analysis.
You can’t re-run the analysis on different data.
An external library you’re using is updated, and you can’t recreate the results.
You change code but don’t re-run downstream code, so the results aren’t consistently updated.
You change code but don’t re-execute it, so the results are out of date.
You update your analysis and can’t compare it to previous results.
You can’t point to code changes that resulted in a different analysis results.
A second analyst can’t understand your code.
Can you re-use logic in different parts of the analysis?
You change the logic used in analysis, but only in some places where it’s implemented.
Your code is not performing as expected, but you don’t notice.
Your data becomes corrupted, but you don’t notice.
You use a new dataset that renders your statistical methods invalid, but you don’t notice.
You make a mistake in your code.
You use inefficient code.
A second analyst wants to contribute code to the analysis, but can’t do so.
Two analysts want to combine code but cannot.
You aren’t able to track and communicate known next steps in your analysis.
Your collaborators can only make requests in email or meetings, and they aren’t incorporated into the project.
You re-run the analysis and get different results.
Someone else can’t repeat the analysis.
You can’t re-run the analysis on different data.
An external library you’re using is updated, and you can’t recreate the results.
You change code but don’t re-run downstream code, so the results aren’t consistently updated.
You change code but don’t re-execute it, so the results are out of date.
You update your analysis and can’t compare it to previous results.
You can’t point to code changes that resulted in a different analysis results.
A second analyst can’t understand your code.
Can you re-use logic in different parts of the analysis?
You change the logic used in analysis, but only in some places where it’s implemented.
Your code is not performing as expected, but you don’t notice.
Your data becomes corrupted, but you don’t notice.
You use a new dataset that renders your statistical methods invalid, but you don’t notice.
You make a mistake in your code.
You use inefficient code.
A second analyst wants to contribute code to the analysis, but can’t do so.
Two analysts want to combine code but cannot.
You aren’t able to track and communicate known next steps in your analysis.
Your collaborators can only make requests in email or meetings, and they aren’t incorporated into the project.
Reproducible
Accurate
Collaborative
OPERATIONS
Switch from “blaming” a person to blaming the
process that they used.
The person did not want to make an error, and
acted in a way that she thought wouldn’t create
an error.
The current system failed the person with good
intentions
AN EXAMPLE OF A
SUCCESSFUL(ISH) POST-MORTEM
”
“
— Hilary Parker, college essay
THE INCIDENT:
BROTHER LOSES HIS CD BINDER
ON A PLANE
”
“
— Hilary Parker, college essay
”
“
— Hilary Parker, college essay
TYPE A QUOTE HERE.
• Identify the incident, record what happened
• Change the system to make it hard for a person acting in
good faith to have the incident occur
• Use outside resources and best practices to inform
decisions
• Make trade-offs
BLAMEFUL PROCESS
”
“
— Mom or Dad
“ALEX, KEEP BETTER TRACK OF YOUR
CDS NEXT TIME!”
”
“
— Boss
“CHECK YOUR ANALYSIS MORE
CAREFULLY NEXT TIME!”
”
“
— John Allspaw
An engineer who thinks they’re going to be reprimanded is disincentivized to give the
details necessary to get an understanding of the mechanism, pathology, and
operation of the failure.
https://codeascraft.com/2012/05/22/blameless-postmortems/
You re-run the analysis and get different results.
Someone else can’t repeat the analysis.
You can’t re-run the analysis on different data.
An external library you’re using is updated, and you can’t recreate the results.
You change code but don’t re-run downstream code, so the results aren’t consistently updated.
You change code but don’t re-execute it, so the results are out of date.
You update your analysis and can’t compare it to previous results.
You can’t point to code changes that resulted in a different analysis results.
A second analyst can’t understand your code.
Can you re-use logic in different parts of the analysis?
You change the logic used in analysis, but only in some places where it’s implemented.
Your code is not performing as expected, but you don’t notice.
Your data becomes corrupted, but you don’t notice.
You use a new dataset that renders your statistical methods invalid, but you don’t notice.
You make a mistake in your code.
You use inefficient code.
A second analyst wants to contribute code to the analysis, but can’t do so.
Two analysts want to combine code but cannot.
You aren’t able to track and communicate known next steps in your analysis.
Your collaborators can only make requests in email or meetings, and they aren’t incorporated into the project.
Incident Happens
Blameless postmortem:
discuss system changes to prevent error
weigh cost of implementing these changes
Adopt new process
Executable analysis scripts You re-run the analysis and get different results.
Someone else can’t repeat the analysis.
You can’t re-run the analysis on different data.
Defined dependencies An external library you’re using is updated, and you can’t recreate the results.
You change code but don’t re-run downstream code, so the results aren’t consistently updated.
Watchers for changed code and data You change code but don’t re-execute it, so the results are out of date.
Version control (individual) You update your analysis and can’t compare it to previous results.
You can’t point to code changes that resulted in a different analysis results.
Code review A second analyst can’t understand your code.
Modular, tested code Can you re-use logic in different parts of the analysis?
You change the logic used in analysis, but only in some places where it’s implemented.
Your code is not performing as expected, but you don’t notice.
Assertive testing of data, Your data becomes corrupted, but you don’t notice.
assumptions and results You use a new dataset that renders your statistical methods invalid, but you don’t notice.
Code review You make a mistake in your code.
You use inefficient code.
Version Control (collaborative) A second analyst wants to contribute code to the analysis, but can’t do so.
Two analysts want to combine code but cannot.
Issue tracking You aren’t able to track and communicate known next steps in your analysis.
Your collaborators can only make requests in email or meetings, and they aren’t incorporated into the project.
Executable analysis scripts You re-run the analysis and get different results.
Someone else can’t repeat the analysis.
You can’t re-run the analysis on different data.
Defined dependencies An external library you’re using is updated, and you can’t recreate the results.
You change code but don’t re-run downstream code, so the results aren’t consistently updated.
Watchers for changed code and data You change code but don’t re-execute it, so the results are out of date.
Version control (individual) You update your analysis and can’t compare it to previous results.
You can’t point to code changes that resulted in a different analysis results.
Code review A second analyst can’t understand your code.
Modular, tested code Can you re-use logic in different parts of the analysis?
You change the logic used in analysis, but only in some places where it’s implemented.
Your code is not performing as expected, but you don’t notice.
Assertive testing of data, Your data becomes corrupted, but you don’t notice.
assumptions and results You use a new dataset that renders your statistical methods invalid, but you don’t notice.
Code review You make a mistake in your code.
You use inefficient code.
Version Control (collaborative) A second analyst wants to contribute code to the analysis, but can’t do so.
Two analysts want to combine code but cannot.
Issue tracking You aren’t able to track and communicate known next steps in your analysis.
Your collaborators can only make requests in email or meetings, and they aren’t incorporated into the project.
“Opinions”
EXECUTABLE ANALYSIS
SCRIPTS
You re-run the analysis and get different results.
Someone else can’t repeat the analysis.
You can’t re-run the analysis on different data.
DEFINED
DEPENDENCIES
An external library you’re using is updated, and
you can’t recreate the results.
You change code but don’t re-run downstream
code, so the results aren’t consistently updated.
WATCHERS FOR CHANGED
DATA AND CODE
You change code but don’t re-execute it, so the
results are out of date.
MODULAR, TESTED
CODE
You can’t re-use logic in different parts of the
analysis.
You change the logic used in analysis, but only
in some places where it’s implemented.
Your code is not performing as expected, but
you don’t notice.
VERSION CONTROL
You update your analysis and can’t compare it
to previous results.
You can’t point to code changes that resulted in
a different analysis results.
Two analysts want to combine code but cannot.
ASSERTIVE TESTING OF DATA,
ASSUMPTIONS AND RESULTS
Your data becomes corrupted, but you don’t
notice.
You use a new dataset that renders your
statistical methods invalid, but you don’t notice.
CODE REVIEW
You make a mistake in your code.
You use inefficient code.
ISSUE TRACKING
You aren’t able to track and communicate
known next steps in your analysis.
Your collaborators can only make requests in
email or meetings, and they aren’t incorporated
into the project.
WHY DID YOU SPEND MOST OF
YOUR KEYNOTE TIME ON
PROCESS?
WITHOUT GOOD PROCESS,
OPINIONS CAN CREATE TOXIC
CULTURE
30+ YEARS OF BEING
OPINIONATED
”
“
— a friend at his birthday party
HILARY, I NEVER FIGHT WITH
ANYONE ELSE EXCEPT FOR YOU
”
“
–John Allspaw
A funny thing happens when engineers make mistakes and feel safe when giving
details about it: they are not only willing to be held accountable, they are also
enthusiastic in helping the rest of the company avoid the same error in the future.
They are, after all, the most expert in their own error. They ought to be heavily
involved in coming up with remediation items.
https://codeascraft.com/2012/05/22/blameless-postmortems/
”
“
–helpful (but misguided) coworker
“You should be using GitHub.”
”
“
“You should be using GitHub.”
“If this is a project where you want to compare past and future results, you might
want to use version control. GitHub is a good option, and RStudio integrates well with
it.”
–actually helpful coworker
”
“
“Excel is the worst.”
–helpful (but misguided) coworker
”
“
“Excel is the worst.”
“Excel does make it hard to create scripts. If you want to redo your analysis
frequently, writing a script will take time now but save you lots of time later.”
–actually helpful coworker
Shift from “I know better than you” to “lots of people
have run into this problem before, and here’s
software that makes a solution easy”
WHY
“OPINIONATED”?
”
“
— Stuart Eccles
Opinionated Software is a software product that believes a certain way of
approaching a business process is inherently better and provides software crafted
around that approach.
https://medium.com/@stueccles/the-rise-of-opinionated-software-ca1ba0140d5b#.by7aq2c3c
”
“
–David Heinemeier Hansson, creator of Ruby on Rails
It’s a strong disagreement with the conventional wisdom that everything should be
configurable, that the framework should be impartial and objective. In my mind,
that’s the same as saying that everything should be equally hard.
This is bowling. There are rules.
This is bowling. There are rules.
analysis
This is bowling. There are rules.
analysis development
DEFINE THE PROCESS:
ANALYSIS DEVELOPMENT
CREATE AND ENCOURAGE
A BLAMELESS CULTURE
FRAME DISCUSSIONS AROUND
INCIDENTS. DISCUSS OPINIONS,
NOT SOFTWARE CHOICES
ENCOURAGE SOFTWARE MAKERS
TO CREATE OPINIONATED
ANALYSIS SOFTWARE
THANKS!

More Related Content

Similar to Opinionated Analysis Development -- EARL SF Keynote

Practices and Tools for Better Software Testing
Practices and Tools for  Better Software TestingPractices and Tools for  Better Software Testing
Practices and Tools for Better Software Testing
Delft University of Technology
 
Data+Science+in+Python+-+Data+Prep+&+EDA.pdf
Data+Science+in+Python+-+Data+Prep+&+EDA.pdfData+Science+in+Python+-+Data+Prep+&+EDA.pdf
Data+Science+in+Python+-+Data+Prep+&+EDA.pdf
neelakandan2001kpm
 
Rated Ranking Evaluator (RRE) Hands-on Relevance Testing @Chorus
Rated Ranking Evaluator (RRE) Hands-on Relevance Testing @ChorusRated Ranking Evaluator (RRE) Hands-on Relevance Testing @Chorus
Rated Ranking Evaluator (RRE) Hands-on Relevance Testing @Chorus
Sease
 
Helping Programmers Write Better Tests
Helping Programmers Write Better TestsHelping Programmers Write Better Tests
Helping Programmers Write Better Tests
Geoffrey Dunn
 
Mingle spot project
Mingle spot projectMingle spot project
Mingle spot project
saikrishnabachuwar
 
Mingle spot project
Mingle spot  project Mingle spot  project
Mingle spot project
saikrishnabachuwar
 
Code Smell and Refactoring
Code Smell and RefactoringCode Smell and Refactoring
Code Smell and Refactoring
kimsrung lov
 
Adaptation of the technology of the static code analyzer for developing paral...
Adaptation of the technology of the static code analyzer for developing paral...Adaptation of the technology of the static code analyzer for developing paral...
Adaptation of the technology of the static code analyzer for developing paral...
PVS-Studio
 
Code Review
Code ReviewCode Review
Code ReviewRavi Raj
 
STM-UNIT-1.pptx
STM-UNIT-1.pptxSTM-UNIT-1.pptx
STM-UNIT-1.pptx
nischal55
 
#Resource1
#Resource1#Resource1
#Resource1
Dragan Mihai
 
Regular use of static code analysis in team development
Regular use of static code analysis in team developmentRegular use of static code analysis in team development
Regular use of static code analysis in team development
PVS-Studio
 
Static white box testing lecture 12
Static white box testing lecture 12Static white box testing lecture 12
Static white box testing lecture 12Abdul Basit
 
White box testing
White box testingWhite box testing
White box testing
Abdul Basit
 
Regular use of static code analysis in team development
Regular use of static code analysis in team developmentRegular use of static code analysis in team development
Regular use of static code analysis in team development
PVS-Studio
 
Regular use of static code analysis in team development
Regular use of static code analysis in team developmentRegular use of static code analysis in team development
Regular use of static code analysis in team development
Andrey Karpov
 
Stm unit1
Stm unit1Stm unit1
Stm unit1
Chaitanya Kn
 
Programming Fundamentals lecture 3
Programming Fundamentals lecture 3Programming Fundamentals lecture 3
Programming Fundamentals lecture 3
REHAN IJAZ
 
Problems of testing 64-bit applications
Problems of testing 64-bit applicationsProblems of testing 64-bit applications
Problems of testing 64-bit applications
PVS-Studio
 
Static analysis as part of the development process in Unreal Engine
Static analysis as part of the development process in Unreal EngineStatic analysis as part of the development process in Unreal Engine
Static analysis as part of the development process in Unreal Engine
PVS-Studio
 

Similar to Opinionated Analysis Development -- EARL SF Keynote (20)

Practices and Tools for Better Software Testing
Practices and Tools for  Better Software TestingPractices and Tools for  Better Software Testing
Practices and Tools for Better Software Testing
 
Data+Science+in+Python+-+Data+Prep+&+EDA.pdf
Data+Science+in+Python+-+Data+Prep+&+EDA.pdfData+Science+in+Python+-+Data+Prep+&+EDA.pdf
Data+Science+in+Python+-+Data+Prep+&+EDA.pdf
 
Rated Ranking Evaluator (RRE) Hands-on Relevance Testing @Chorus
Rated Ranking Evaluator (RRE) Hands-on Relevance Testing @ChorusRated Ranking Evaluator (RRE) Hands-on Relevance Testing @Chorus
Rated Ranking Evaluator (RRE) Hands-on Relevance Testing @Chorus
 
Helping Programmers Write Better Tests
Helping Programmers Write Better TestsHelping Programmers Write Better Tests
Helping Programmers Write Better Tests
 
Mingle spot project
Mingle spot projectMingle spot project
Mingle spot project
 
Mingle spot project
Mingle spot  project Mingle spot  project
Mingle spot project
 
Code Smell and Refactoring
Code Smell and RefactoringCode Smell and Refactoring
Code Smell and Refactoring
 
Adaptation of the technology of the static code analyzer for developing paral...
Adaptation of the technology of the static code analyzer for developing paral...Adaptation of the technology of the static code analyzer for developing paral...
Adaptation of the technology of the static code analyzer for developing paral...
 
Code Review
Code ReviewCode Review
Code Review
 
STM-UNIT-1.pptx
STM-UNIT-1.pptxSTM-UNIT-1.pptx
STM-UNIT-1.pptx
 
#Resource1
#Resource1#Resource1
#Resource1
 
Regular use of static code analysis in team development
Regular use of static code analysis in team developmentRegular use of static code analysis in team development
Regular use of static code analysis in team development
 
Static white box testing lecture 12
Static white box testing lecture 12Static white box testing lecture 12
Static white box testing lecture 12
 
White box testing
White box testingWhite box testing
White box testing
 
Regular use of static code analysis in team development
Regular use of static code analysis in team developmentRegular use of static code analysis in team development
Regular use of static code analysis in team development
 
Regular use of static code analysis in team development
Regular use of static code analysis in team developmentRegular use of static code analysis in team development
Regular use of static code analysis in team development
 
Stm unit1
Stm unit1Stm unit1
Stm unit1
 
Programming Fundamentals lecture 3
Programming Fundamentals lecture 3Programming Fundamentals lecture 3
Programming Fundamentals lecture 3
 
Problems of testing 64-bit applications
Problems of testing 64-bit applicationsProblems of testing 64-bit applications
Problems of testing 64-bit applications
 
Static analysis as part of the development process in Unreal Engine
Static analysis as part of the development process in Unreal EngineStatic analysis as part of the development process in Unreal Engine
Static analysis as part of the development process in Unreal Engine
 

More from Hilary Parker

WiDS Claremont 2022.pdf
WiDS Claremont 2022.pdfWiDS Claremont 2022.pdf
WiDS Claremont 2022.pdf
Hilary Parker
 
eCOTS 2020
eCOTS 2020eCOTS 2020
eCOTS 2020
Hilary Parker
 
rstudio::conf(2019L)
rstudio::conf(2019L)rstudio::conf(2019L)
rstudio::conf(2019L)
Hilary Parker
 
Using Data Effectively: Beyond Art and Science
Using Data Effectively: Beyond Art and ScienceUsing Data Effectively: Beyond Art and Science
Using Data Effectively: Beyond Art and Science
Hilary Parker
 
ICOTS 2018
ICOTS 2018ICOTS 2018
ICOTS 2018
Hilary Parker
 
Women in Analytics Conference, April 2018
Women in Analytics Conference, April 2018Women in Analytics Conference, April 2018
Women in Analytics Conference, April 2018
Hilary Parker
 
testdat: An R package for unit testing of tabular data
testdat: An R package for unit testing of tabular datatestdat: An R package for unit testing of tabular data
testdat: An R package for unit testing of tabular data
Hilary Parker
 

More from Hilary Parker (7)

WiDS Claremont 2022.pdf
WiDS Claremont 2022.pdfWiDS Claremont 2022.pdf
WiDS Claremont 2022.pdf
 
eCOTS 2020
eCOTS 2020eCOTS 2020
eCOTS 2020
 
rstudio::conf(2019L)
rstudio::conf(2019L)rstudio::conf(2019L)
rstudio::conf(2019L)
 
Using Data Effectively: Beyond Art and Science
Using Data Effectively: Beyond Art and ScienceUsing Data Effectively: Beyond Art and Science
Using Data Effectively: Beyond Art and Science
 
ICOTS 2018
ICOTS 2018ICOTS 2018
ICOTS 2018
 
Women in Analytics Conference, April 2018
Women in Analytics Conference, April 2018Women in Analytics Conference, April 2018
Women in Analytics Conference, April 2018
 
testdat: An R package for unit testing of tabular data
testdat: An R package for unit testing of tabular datatestdat: An R package for unit testing of tabular data
testdat: An R package for unit testing of tabular data
 

Recently uploaded

1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
Tiktokethiodaily
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
ArpitMalhotra16
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
NABLAS株式会社
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
pchutichetpong
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
ocavb
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
ukgaet
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
balafet
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Linda486226
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
ewymefz
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
yhkoc
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 

Recently uploaded (20)

1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 

Opinionated Analysis Development -- EARL SF Keynote

  • 2. Not So Standard Deviations
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 9.
  • 10.
  • 12.
  • 14. • What models are you going to use? • What argument are you going to make? • How are you going to convince your audience of something?
  • 16. • What tools are you going to use to make the deliverable? • What technical and coding choices will you make for producing the deliverable?
  • 17. mastering the statistical tooling :: mastering a musical instrument
  • 18. mastering the statistical tooling :: mastering a musical instrument statistical theory :: musical theory
  • 19. mastering the statistical tooling :: mastering a musical instrument statistical theory :: musical theory full analysis :: piece of music
  • 20.
  • 21. disclaimer: Karl is extremely nice, helpful & non-judgmental
  • 23.
  • 24. ment
  • 25. WHY?
  • 26.
  • 27. don’t want to “limit creativity” embarrassed by personal process
  • 29. There are known, common problems when developing an analysis Making these problems easy to avoid frees up time for the creativity of developing the narrative
  • 30. You re-run the analysis and get different results. Someone else can’t repeat the analysis. You can’t re-run the analysis on different data. An external library you’re using is updated, and you can’t recreate the results. You change code but don’t re-run downstream code, so the results aren’t consistently updated. You change code but don’t re-execute it, so the results are out of date. You update your analysis and can’t compare it to previous results. You can’t point to code changes that resulted in a different analysis results. A second analyst can’t understand your code. Can you re-use logic in different parts of the analysis? You change the logic used in analysis, but only in some places where it’s implemented. Your code is not performing as expected, but you don’t notice. Your data becomes corrupted, but you don’t notice. You use a new dataset that renders your statistical methods invalid, but you don’t notice. You make a mistake in your code. You use inefficient code. A second analyst wants to contribute code to the analysis, but can’t do so. Two analysts want to combine code but cannot. You aren’t able to track and communicate known next steps in your analysis. Your collaborators can only make requests in email or meetings, and they aren’t incorporated into the project.
  • 31. You re-run the analysis and get different results. Someone else can’t repeat the analysis. You can’t re-run the analysis on different data. An external library you’re using is updated, and you can’t recreate the results. You change code but don’t re-run downstream code, so the results aren’t consistently updated. You change code but don’t re-execute it, so the results are out of date. You update your analysis and can’t compare it to previous results. You can’t point to code changes that resulted in a different analysis results. A second analyst can’t understand your code. Can you re-use logic in different parts of the analysis? You change the logic used in analysis, but only in some places where it’s implemented. Your code is not performing as expected, but you don’t notice. Your data becomes corrupted, but you don’t notice. You use a new dataset that renders your statistical methods invalid, but you don’t notice. You make a mistake in your code. You use inefficient code. A second analyst wants to contribute code to the analysis, but can’t do so. Two analysts want to combine code but cannot. You aren’t able to track and communicate known next steps in your analysis. Your collaborators can only make requests in email or meetings, and they aren’t incorporated into the project. Reproducible Accurate Collaborative
  • 33.
  • 34. Switch from “blaming” a person to blaming the process that they used.
  • 35. The person did not want to make an error, and acted in a way that she thought wouldn’t create an error. The current system failed the person with good intentions
  • 36. AN EXAMPLE OF A SUCCESSFUL(ISH) POST-MORTEM
  • 37.
  • 38. ” “ — Hilary Parker, college essay
  • 39.
  • 40. THE INCIDENT: BROTHER LOSES HIS CD BINDER ON A PLANE
  • 41. ” “ — Hilary Parker, college essay
  • 42.
  • 43.
  • 44. ” “ — Hilary Parker, college essay TYPE A QUOTE HERE.
  • 45.
  • 46.
  • 47.
  • 48.
  • 49. • Identify the incident, record what happened • Change the system to make it hard for a person acting in good faith to have the incident occur • Use outside resources and best practices to inform decisions • Make trade-offs
  • 51. ” “ — Mom or Dad “ALEX, KEEP BETTER TRACK OF YOUR CDS NEXT TIME!”
  • 52. ” “ — Boss “CHECK YOUR ANALYSIS MORE CAREFULLY NEXT TIME!”
  • 53.
  • 54. ” “ — John Allspaw An engineer who thinks they’re going to be reprimanded is disincentivized to give the details necessary to get an understanding of the mechanism, pathology, and operation of the failure. https://codeascraft.com/2012/05/22/blameless-postmortems/
  • 55. You re-run the analysis and get different results. Someone else can’t repeat the analysis. You can’t re-run the analysis on different data. An external library you’re using is updated, and you can’t recreate the results. You change code but don’t re-run downstream code, so the results aren’t consistently updated. You change code but don’t re-execute it, so the results are out of date. You update your analysis and can’t compare it to previous results. You can’t point to code changes that resulted in a different analysis results. A second analyst can’t understand your code. Can you re-use logic in different parts of the analysis? You change the logic used in analysis, but only in some places where it’s implemented. Your code is not performing as expected, but you don’t notice. Your data becomes corrupted, but you don’t notice. You use a new dataset that renders your statistical methods invalid, but you don’t notice. You make a mistake in your code. You use inefficient code. A second analyst wants to contribute code to the analysis, but can’t do so. Two analysts want to combine code but cannot. You aren’t able to track and communicate known next steps in your analysis. Your collaborators can only make requests in email or meetings, and they aren’t incorporated into the project.
  • 56. Incident Happens Blameless postmortem: discuss system changes to prevent error weigh cost of implementing these changes Adopt new process
  • 57. Executable analysis scripts You re-run the analysis and get different results. Someone else can’t repeat the analysis. You can’t re-run the analysis on different data. Defined dependencies An external library you’re using is updated, and you can’t recreate the results. You change code but don’t re-run downstream code, so the results aren’t consistently updated. Watchers for changed code and data You change code but don’t re-execute it, so the results are out of date. Version control (individual) You update your analysis and can’t compare it to previous results. You can’t point to code changes that resulted in a different analysis results. Code review A second analyst can’t understand your code. Modular, tested code Can you re-use logic in different parts of the analysis? You change the logic used in analysis, but only in some places where it’s implemented. Your code is not performing as expected, but you don’t notice. Assertive testing of data, Your data becomes corrupted, but you don’t notice. assumptions and results You use a new dataset that renders your statistical methods invalid, but you don’t notice. Code review You make a mistake in your code. You use inefficient code. Version Control (collaborative) A second analyst wants to contribute code to the analysis, but can’t do so. Two analysts want to combine code but cannot. Issue tracking You aren’t able to track and communicate known next steps in your analysis. Your collaborators can only make requests in email or meetings, and they aren’t incorporated into the project.
  • 58. Executable analysis scripts You re-run the analysis and get different results. Someone else can’t repeat the analysis. You can’t re-run the analysis on different data. Defined dependencies An external library you’re using is updated, and you can’t recreate the results. You change code but don’t re-run downstream code, so the results aren’t consistently updated. Watchers for changed code and data You change code but don’t re-execute it, so the results are out of date. Version control (individual) You update your analysis and can’t compare it to previous results. You can’t point to code changes that resulted in a different analysis results. Code review A second analyst can’t understand your code. Modular, tested code Can you re-use logic in different parts of the analysis? You change the logic used in analysis, but only in some places where it’s implemented. Your code is not performing as expected, but you don’t notice. Assertive testing of data, Your data becomes corrupted, but you don’t notice. assumptions and results You use a new dataset that renders your statistical methods invalid, but you don’t notice. Code review You make a mistake in your code. You use inefficient code. Version Control (collaborative) A second analyst wants to contribute code to the analysis, but can’t do so. Two analysts want to combine code but cannot. Issue tracking You aren’t able to track and communicate known next steps in your analysis. Your collaborators can only make requests in email or meetings, and they aren’t incorporated into the project. “Opinions”
  • 60. You re-run the analysis and get different results. Someone else can’t repeat the analysis. You can’t re-run the analysis on different data.
  • 62. An external library you’re using is updated, and you can’t recreate the results. You change code but don’t re-run downstream code, so the results aren’t consistently updated.
  • 64. You change code but don’t re-execute it, so the results are out of date.
  • 66. You can’t re-use logic in different parts of the analysis. You change the logic used in analysis, but only in some places where it’s implemented. Your code is not performing as expected, but you don’t notice.
  • 68. You update your analysis and can’t compare it to previous results. You can’t point to code changes that resulted in a different analysis results. Two analysts want to combine code but cannot.
  • 69. ASSERTIVE TESTING OF DATA, ASSUMPTIONS AND RESULTS
  • 70. Your data becomes corrupted, but you don’t notice. You use a new dataset that renders your statistical methods invalid, but you don’t notice.
  • 72. You make a mistake in your code. You use inefficient code.
  • 74. You aren’t able to track and communicate known next steps in your analysis. Your collaborators can only make requests in email or meetings, and they aren’t incorporated into the project.
  • 75.
  • 76. WHY DID YOU SPEND MOST OF YOUR KEYNOTE TIME ON PROCESS?
  • 77. WITHOUT GOOD PROCESS, OPINIONS CAN CREATE TOXIC CULTURE
  • 78. 30+ YEARS OF BEING OPINIONATED
  • 79. ” “ — a friend at his birthday party HILARY, I NEVER FIGHT WITH ANYONE ELSE EXCEPT FOR YOU
  • 80. ” “ –John Allspaw A funny thing happens when engineers make mistakes and feel safe when giving details about it: they are not only willing to be held accountable, they are also enthusiastic in helping the rest of the company avoid the same error in the future. They are, after all, the most expert in their own error. They ought to be heavily involved in coming up with remediation items. https://codeascraft.com/2012/05/22/blameless-postmortems/
  • 81.
  • 82.
  • 83.
  • 84. ” “ –helpful (but misguided) coworker “You should be using GitHub.”
  • 85. ” “ “You should be using GitHub.” “If this is a project where you want to compare past and future results, you might want to use version control. GitHub is a good option, and RStudio integrates well with it.” –actually helpful coworker
  • 86. ” “ “Excel is the worst.” –helpful (but misguided) coworker
  • 87. ” “ “Excel is the worst.” “Excel does make it hard to create scripts. If you want to redo your analysis frequently, writing a script will take time now but save you lots of time later.” –actually helpful coworker
  • 88. Shift from “I know better than you” to “lots of people have run into this problem before, and here’s software that makes a solution easy”
  • 90. ” “ — Stuart Eccles Opinionated Software is a software product that believes a certain way of approaching a business process is inherently better and provides software crafted around that approach. https://medium.com/@stueccles/the-rise-of-opinionated-software-ca1ba0140d5b#.by7aq2c3c
  • 91. ” “ –David Heinemeier Hansson, creator of Ruby on Rails It’s a strong disagreement with the conventional wisdom that everything should be configurable, that the framework should be impartial and objective. In my mind, that’s the same as saying that everything should be equally hard.
  • 92. This is bowling. There are rules.
  • 93. This is bowling. There are rules. analysis
  • 94. This is bowling. There are rules. analysis development
  • 96. CREATE AND ENCOURAGE A BLAMELESS CULTURE
  • 97. FRAME DISCUSSIONS AROUND INCIDENTS. DISCUSS OPINIONS, NOT SOFTWARE CHOICES
  • 98. ENCOURAGE SOFTWARE MAKERS TO CREATE OPINIONATED ANALYSIS SOFTWARE
  • 99.