Scaling Analysis
Responsibly
Hilary Parker
@hspter
#rcatladies
Not So Standard Deviations
@keegsdur
“We just don’t have enough analysts!”
“Let’s scale by building the perfect BI tool!”
That sounds great!
We should
automate some
of the things
that are slowing
you down
PRODUCT
TEAM
DATA
http://xkcd.com/
That seems perfectly
reasonable!
Let’s just enlist
some folks from
engineering to
help you with it
DATA PRODUCT
TEAM
DATA ENG
Sure thing!
...and finally can
it add this last
graph?
several months pass…
ENG
Sure! File a ticket!
Can we add
these 132 extra
metrics to the
testing?
PRODUCT
TEAM
You can’t do that,
your family-wise
error rate will tend
to 1!!
ENG PRODUCT
TEAM
DATA
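A minimal sketch of why the analyst objects, assuming the extra metrics are tested independently, each at the same level alpha. Under that assumption the family-wise error rate is 1 - (1 - alpha)^m. The function names below are illustrative and not from the talk.

```r
# Family-wise error rate for m independent tests, each run at level alpha.
# Independence is an assumption of this sketch, not a claim from the slides.
fwer <- function(m, alpha = 0.05) {
  1 - (1 - alpha)^m
}

# Bonferroni-adjusted per-test level that keeps the FWER at or below alpha.
bonferroni_alpha <- function(m, alpha = 0.05) {
  alpha / m
}

# Compare one metric, ten metrics, and the 132 extra metrics from the slide.
m_values <- c(1, 10, 132)
fwer_table <- data.frame(
  metrics = m_values,
  fwer = fwer(m_values),
  per_test_alpha = bonferroni_alpha(m_values)
)
print(fwer_table)
```

With 132 metrics at alpha = 0.05 the uncorrected rate is above 0.99, which is the "tends to 1" in the speech bubble. Bonferroni is only one possible correction, chosen here because it is the simplest to show.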
ENG
That’s a reasonable
expectation for an
internal product. I’m on
it!
I’d really like this
tool to be more
stable.
PRODUCT
TEAM
Our test violates a
subtle statistical
assumption for this
new application, and
we need to gut this
stable product!
ENG PRODUCT
TEAM
DATA
Almost impossible to avoid 2-against-1 dysfunction as
product teams become “self-service” with engineering
support
Invariably becomes a race to the bottom as internal
competition for the simplest tool emerges
Stability prioritized over flexibility
(In tech)
Building = Owning
Analysis Developer!
“Analysis Developer”
Someone on the analyst team who develops reproducible,
flexible analyses in R and helps all analysts scale their
work
I’ll work with the analysis
developer on my team!
We should
automate some
of the things
that are slowing
you down
PRODUCT
TEAM
DATA
Avoids common types of dysfunction
Allows for flexible, accurate analysis
Analysts acquire marketable skills!
Instead of creating dashboards or using static BI
tools...
http://dilbert.com/strip/2007-05-16
Series of R packages highly specified for business case,
“mix and match” elements to rapidly create common
reports.
library("internal_package")
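A rough sketch of what "mix and match" elements could look like. `internal_package` is only the slide's placeholder, so the three functions below are hypothetical and defined inline in base R so the example runs on its own.

```r
# Hypothetical building blocks an internal package might export.
# None of these names come from the talk.

# Summarise one metric by one grouping column: count and mean per group.
summarise_metric <- function(data, metric, by) {
  groups <- split(data[[metric]], data[[by]])
  data.frame(
    group = names(groups),
    n = unname(vapply(groups, length, integer(1))),
    mean = unname(vapply(groups, mean, numeric(1))),
    row.names = NULL
  )
}

# Turn a title and a table into the text lines of one report section.
format_section <- function(title, table) {
  c(
    title,
    strrep("-", nchar(title)),
    utils::capture.output(print(table, row.names = FALSE)),
    ""
  )
}

# Combine any number of sections into a single report string.
build_report <- function(...) {
  paste(unlist(list(...)), collapse = "\n")
}

# Toy data standing in for a business dataset.
orders <- data.frame(
  region = c("east", "east", "west", "west", "west"),
  revenue = c(120, 80, 200, 150, 100)
)

by_region <- summarise_metric(orders, metric = "revenue", by = "region")
report <- build_report(format_section("Revenue by region", by_region))
cat(report)
```

The point of the design is that each common report becomes a short composition of shared pieces, so a fix to one piece reaches every report that uses it.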
Instead of “assembly line” data processing…
Close 2-way partnership with data engineers to optimize
the creation of datasets for certain common analyses.
The assembly line handoff from scientist to engineer creates [an
uncreative] environment. The trick is to create an environment
that allows for autonomy, ownership, and focus for everyone
involved. - Jeff Magnusson
http://multithreaded.stitchfix.com/blog/2016/03/16/engineers-shouldnt-write-etl/
Instead of PM anxiously watching dashboards…
https://www.youtube.com/watch?v=CCbWyYr82BM
Analysts can create shorter-lived, reproducible reports
Manage expectations about the report's shorter lifespan, but
make clear that the report will require less work from teams
once created
Productionize in the short term with cron jobs
Can add in more stats this way! Y/Y turns into
semiparametric models, etc.
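One way a short-lived, reproducible report could be scheduled, sketched in base R. The function names, the toy data, and the script path in the crontab comment are assumptions for illustration, and a real job would pull from the data warehouse rather than an inline data frame.

```r
# Year-over-year change: compare each month with the same month one year earlier.
yoy_change <- function(data) {
  previous <- data.frame(
    year = data$year + 1,
    month = data$month,
    previous_value = data$value
  )
  merged <- merge(data, previous, by = c("year", "month"))
  merged$yoy_pct <- 100 * (merged$value - merged$previous_value) / merged$previous_value
  merged[order(merged$year, merged$month), ]
}

# Write a dated CSV so every scheduled run leaves an artifact that can be re-created.
write_report <- function(data, out_dir = tempdir(), run_date = Sys.Date()) {
  file_name <- paste0("yoy_report_", format(run_date, "%Y-%m-%d"), ".csv")
  path <- file.path(out_dir, file_name)
  utils::write.csv(yoy_change(data), path, row.names = FALSE)
  path
}

# Toy input for the sketch.
sales <- data.frame(
  year = c(2015, 2015, 2016, 2016),
  month = c(1, 2, 1, 2),
  value = c(100, 200, 110, 180)
)

yoy <- yoy_change(sales)
report_path <- write_report(sales)

# Example crontab entry (hypothetical path): run every Monday at 07:00.
# 0 7 * * 1 Rscript /path/to/yoy_report.R
```

Because the report is a script rather than a dashboard, swapping the Y/Y calculation for a richer model later means changing one function and leaving the schedule alone.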
“The Problem with Dashboards (And A Solution)” by Stephanie Evergreen
http://stephanieevergreen.com/problem-with-dashboards/
http://dilbert.com/strip/2004-04-05
Instead of promotion based on deliverables…
Consider skill acquisition for analyst promotion
Analysis developers are promoted based on whether they
helped other analysts become more efficient
Support for skill acquisition!
Education support for learning better analysis
development methods for all analysts
Internally created resources
Instead of PMs self-teaching analysis based on what’s
presented in dashboarding tools...
https://xkcd.com/605/
PMs can use the tools built for educating analysts if they
want to “ramp up” on analytical skills like R
This way you can bake in statistical education as well.
“Isn’t this just package development?”
No!
Ad-hoc spreadsheet work
  + scripting
R workflows
  + reproducibility, some functions, “analysis testing”
Reproducible R analyses
  + workplace-wide audience, documentation, testing
  - problem-specific writeups and functions
Internal package development
  + industry-wide audience
  - company-specific code and functions
External package development
Analysis Developer
Open-Source Developer
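The slides do not define “analysis testing”. One plausible reading, sketched here as an assumption, is that an analysis checks what it expects of its input before computing anything. The column names and the experiment layout are invented for the example.

```r
# Checks that encode what the analysis assumes about its input.
# The columns (user_id, group, converted) are hypothetical.
check_experiment_data <- function(data) {
  stopifnot(
    is.data.frame(data),
    all(c("user_id", "group", "converted") %in% names(data)),
    !anyDuplicated(data$user_id),
    all(data$group %in% c("control", "treatment")),
    all(data$converted %in% c(0, 1))
  )
  invisible(data)
}

# The analysis refuses to run on data that fails the checks.
conversion_rates <- function(data) {
  check_experiment_data(data)
  tapply(data$converted, data$group, mean)
}

good <- data.frame(
  user_id = 1:4,
  group = c("control", "control", "treatment", "treatment"),
  converted = c(0, 1, 1, 1)
)
rates <- conversion_rates(good)

# A duplicated user violates the one-row-per-user assumption and stops the analysis.
bad <- rbind(good, good[1, ])
bad_result <- tryCatch(conversion_rates(bad), error = function(e) e)
```

Moving from this stage to internal package development would mean lifting checks like these into documented, tested functions that other analysts can reuse.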
Analysis Developer
Stop trying to scale with static BI tools -- this will (almost)
always lead to dysfunction
Instead, scale by increasing analyst efficiency using R and
education!
Hire Analysis Developers to help with all this!
Thanks!
Hilary Parker
@hspter

ASML's Taxonomy Adventure by Daniel Canter
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docx
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 

Scaling Analysis Responsibly with Analysis Developers

  • 2. #rcatladies Not So Standard Deviations @keegsdur
  • 3. “We just don’t have enough analysts!”
  • 4. “Let’s scale by building the perfect BI tool!”
  • 5. That sounds great! We should automate some of the things that are slowing you down PRODUCT TEAM DATA http://xkcd.com/
  • 6. That seems perfectly reasonable! Let’s just enlist some folks from engineering to help you with it DATA PRODUCT TEAM
  • 7. DATA ENG Sure thing! ...and finally can it add this last graph?
  • 9. ENG Sure! File a ticket! Can we add these 132 extra metrics to the testing? PRODUCT TEAM
  • 10. You can’t do that, your family-wise error rate will tend to 1!! ENG PRODUCT TEAM DATA
  • 11. ENG That’s a reasonable expectation for an internal product. I’m on it! I’d really like this tool to be more stable. PRODUCT TEAM
  • 12. Our test violates a subtle statistical assumption for this new application, and we need to gut this stable product! ENG PRODUCT TEAM DATA
  • 13. Almost impossible to avoid 2-against-1 dysfunction as product teams become “self-service” with engineering support. Invariably becomes a race to the bottom as internal competition for the simplest tool emerges. Stability prioritized over flexibility.
  • 16. “Analysis Developer” Someone on the analyst team who develops reproducible, flexible analyses in R and helps all analysts scale their work
  • 17. I’ll work with the analysis developer on my team! We should automate some of the things that are slowing you down PRODUCT TEAM DATA
  • 18. Avoids common types of dysfunction. Allows for flexible, accurate analysis. Analysts acquire marketable skills!
  • 19. Instead of creating dashboards or using static BI tools... http://dilbert.com/strip/2007-05-16
  • 20. A series of R packages highly specific to the business case, with “mix and match” elements to rapidly create common reports. library("internal_package")
  • 22. Instead of “assembly line” data processing…
  • 23. Close 2-way partnership with data engineers to optimize the creation of datasets for certain common analyses. “The assembly line handoff from scientist to engineer creates [an uncreative] environment. The trick is to create an environment that allows for autonomy, ownership, and focus for everyone involved.” - Jeff Magnusson http://multithreaded.stitchfix.com/blog/2016/03/16/engineers-shouldnt-write-etl/
  • 24. Instead of PM anxiously watching dashboards… https://www.youtube.com/watch?v=CCbWyYr82BM
  • 25. Analysts can create shorter-lived, reproducible reports
  • 26. Manage expectations about the report's shorter lifespan, but point out that the report will require less work from teams once created. Productionize in the short term with cron jobs. Can add in more stats this way! Y/Y turns into semiparametric models, etc.
  • 27. “The Problem with Dashboards (And A Solution)” by Stephanie Evergreen http://stephanieevergreen.com/problem-with-dashboards/
  • 29. Consider skill acquisition for analyst promotion. Analysis developers are promoted based on whether they were able to help other analysts become more efficient. Support for skill acquisition!
  • 30. Education support for all analysts to learn better analysis development methods. Internally created resources.
  • 31. Instead of PMs self-teaching analysis based on what’s presented in dashboarding tools... https://xkcd.com/605/
  • 32. PMs can use the analysts' educational tools if they want to “ramp up” on analytical skills like R. This way you can bake in statistical education as well.
  • 33. “Isn’t this just package development?”
  • 34. “Isn’t this just package development?” No!
  • 44. Ad-hoc spreadsheet work; R workflows (+ scripting); Reproducible R analyses (+ reproducibility, some functions, “analysis testing”); Internal package development (+ workplace-wide audience, documentation, testing; - problem-specific writeups and functions); External package development (+ industry-wide audience; - company-specific code and functions). Roles: Analysis Developer, Open-Source Developer.
  • 45. Analysis Developer: Stop trying to scale with static BI tools -- this will (almost) always lead to dysfunction. Instead, scale by increasing analyst efficiency using R and education! Hire Analysis Developers to help with all this!
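
The warning on slide 10 about the family-wise error rate can be checked with a short calculation. What follows is a minimal sketch in R, not something taken from the deck: it assumes the 132 extra metrics are tested independently and each at a per-test level of 0.05 (neither is stated in the slides), and `fwer` is a hypothetical helper name. The p-values are made up purely for illustration.

```r
# Hypothetical helper: probability of at least one false positive across
# m independent tests, each run at significance level alpha.
fwer <- function(m, alpha = 0.05) {
  1 - (1 - alpha)^m
}

# Assumed setup: 132 metrics, independent tests, alpha = 0.05.
m <- 132
rate <- fwer(m)
print(round(rate, 4))

# One standard correction: Bonferroni-adjust the p-values with stats::p.adjust.
# Passing n = m treats these four values as part of a family of 132 tests.
p_raw <- c(0.0001, 0.004, 0.03, 0.2)
p_adj <- p.adjust(p_raw, method = "bonferroni", n = m)
print(p_adj)
```

Under these assumptions the rate comes out at roughly 0.999, which is the sense in which it "tends to 1". Bonferroni is only one option; `p.adjust` also supports less conservative methods such as `"holm"`.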