SlideShare a Scribd company logo
Scaling Analysis
Responsibly
Hilary Parker
@hspter
#rcatladies
Not So Standard Deviations
@keegsdur
“We just don’t have enough analysts!”
“Let’s scale by building the perfect BI tool!”
That sounds great!
We should
automate some
of the things
that are slowing
you down
PRODUCT
TEAM
DATA
http://xkcd.com/
That seems perfectly
reasonable!
Let’s just enlist
some folks from
engineering to
help you with it
DATAPRODUCT
TEAM
DATA ENG
Sure thing!
...and finally can
it add this last
graph?
several months pass…
ENG
Sure! File a ticket!
Can we add
these 132 extra
metrics to the
testing?
PRODUCT
TEAM
You can’t do that,
your family-wise
error rate will tend
to 1!!
ENG PRODUCT
TEAM
DATA
ENG
That’s a reasonable
expectation for an
internal product. I’m on
it!
I’d really like this
tool to be more
stable.
PRODUCT
TEAM
Our test violates a
subtle statistical
assumption for this
new application, and
we need to gut this
stable product!
ENG PRODUCT
TEAM
DATA
Almost impossible to avoid 2-against-1 dysfunction as
product teams become “self-service” with engineering
support
Invariably becomes a race to the bottom as internal
competition for the simplest tool emerges
Stability prioritized over flexibility
(In tech)
Building = Owning
Analysis Developer!
“Analysis Developer”
Someone on the analyst team who develops reproducible,
flexible analyses in R and helps all analysts scale their
work
I’ll work with the analysis
developer on my team!
We should
automate some
of the things
that are slowing
you down
PRODUCT
TEAM
DATA
Avoids common types of dysfunction
Allows for flexible, accurate analysis
Analysts acquire marketable skills!
Instead of creating dashboards or using static BI
tools...
http://dilbert.com/strip/2007-05-16
Series of R packages highly specified for business case,
“mix and match” elements to rapidly create common
reports.
library(“internal_package”)
Instead of “assembly line” data processing…
Close 2-way partnership with data engineers to optimize
the creation of datasets for certain common analyses.
The assembly line handoff from scientist to engineer creates [an
uncreative] environment. The trick is to create an environment
that allows for autonomy, ownership, and focus for everyone
involved. - Jeff Magnusson
http://multithreaded.stitchfix.com/blog/2016/03/16/engineers-shouldnt-write-etl/
Instead of PM anxiously watching dashboards…
https://www.youtube.com/watch?v=CCbWyYr82BM
Analysts can create shorter-lived, reproducible reports
Expectation manage the shorter lifespan of the report, but
include that report will require less work from teams once
created
Productionize in the short-term with CRON jobs
Can add in more stats this way! Y/Y turns into
semiparametric models, etc.
“The Problem with
Dashboards (And A
Solution)” by
Stephanie Evergreen
http://stephanieevergreen.
com/problem-with-dashboards/
http://dilbert.com/strip/2004-04-05
Instead of promotion based on deliverables…
Consider skill acquisition for analyst promotion
For analysis developers, promoted based on whether or
not they were able to help other analysts become more
efficient
Support for skill acquisition!
Education support for learning better analysis
development methods for all analysts
Internally created resources
Instead of PMs self-teaching analysis based on what’s
presented in dashboarding tools..
https://xkcd.com/605/
PMs can use tools for education analysts if they want to
“ramp up” on analytical skills like R
This way you can bake in statistical education as well.
“Isn’t this just package development?”
“Isn’t this just package development?”
No!
Ad-hoc spreadsheet work
Ad-hoc spreadsheet work
+ scripting
Ad-hoc spreadsheet work
R workflows
+ scripting
Ad-hoc spreadsheet work
R workflows
+ scripting
+ reproducibility, some functions, “analysis testing”
Ad-hoc spreadsheet work
R workflows
Reproducible R analyses
+ scripting
+ reproducibility, some functions, “analysis testing”
Ad-hoc spreadsheet work
R workflows
Reproducible R analyses
+ scripting
+ reproducibility, some functions, “analysis testing”
+ workplace-wide audience, documentation, testing
- problem-specific writeups and functions
Ad-hoc spreadsheet work
R workflows
Reproducible R analyses
Internal package development
+ scripting
+ reproducibility, some functions, “analysis testing”
+ workplace-wide audience, documentation, testing
- problem-specific writeups and functions
Ad-hoc spreadsheet work
R workflows
Reproducible R analyses
Internal package development
+ scripting
+ reproducibility, some functions, “analysis testing”
+ workplace-wide audience, documentation, testing
- problem-specific writeups and functions
+ industry-wide audience
- company-specific code and functions
Ad-hoc spreadsheet work
R workflows
Reproducible R analyses
Internal package development
External package development
+ scripting
+ reproducibility, some functions, “analysis testing”
+ workplace-wide audience, documentation, testing
- problem-specific writeups and functions
+ industry-wide audience
- company-specific code and functions
Ad-hoc spreadsheet work
R workflows
Reproducible R analyses
Internal package development
External package development
+ reproducibility, some functions, “analysis testing”
+ scripting
+ workplace-wide audience, documentation, testing
- problem-specific writeups and functions
+ industry-wide audience
- company-specific code and functions
Analysis Developer
Open-Source Developer
Analysis Developer
Stop trying to scale with static BI tools -- this will (almost)
always lead to dysfunction
Instead, scale by increasing analyst efficiency using R and
education!
Hire Analysis Developers to help with all this!
Thanks!
Hilary Parker
@hspter

More Related Content

What's hot

Version Control in Machine Learning + AI (Stanford)
Version Control in Machine Learning + AI (Stanford)Version Control in Machine Learning + AI (Stanford)
Version Control in Machine Learning + AI (Stanford)Anand Sampat
 
Provenance in Production-Grade Machine Learning
Provenance in Production-Grade Machine LearningProvenance in Production-Grade Machine Learning
Provenance in Production-Grade Machine LearningAnand Sampat
 
From NASA to Startups to Big Commerce
From NASA to Startups to Big CommerceFrom NASA to Startups to Big Commerce
From NASA to Startups to Big CommerceDaniel Greenfeld
 
Managing and Versioning Machine Learning Models in Python
Managing and Versioning Machine Learning Models in PythonManaging and Versioning Machine Learning Models in Python
Managing and Versioning Machine Learning Models in PythonSimon Frid
 
Using dataset versioning in data science
Using dataset versioning in data scienceUsing dataset versioning in data science
Using dataset versioning in data scienceVenkata Pingali
 
Open Source Big Graph Analytics on Neo4j with Apache Spark
Open Source Big Graph Analytics on Neo4j with Apache SparkOpen Source Big Graph Analytics on Neo4j with Apache Spark
Open Source Big Graph Analytics on Neo4j with Apache SparkKenny Bastani
 
Query or Not to Query? Using Apache Spark Metrics to Highlight Potentially Pr...
Query or Not to Query? Using Apache Spark Metrics to Highlight Potentially Pr...Query or Not to Query? Using Apache Spark Metrics to Highlight Potentially Pr...
Query or Not to Query? Using Apache Spark Metrics to Highlight Potentially Pr...Databricks
 
OSS Java Analysis - What You Might Be Missing
OSS Java Analysis - What You Might Be MissingOSS Java Analysis - What You Might Be Missing
OSS Java Analysis - What You Might Be MissingCoverity
 
Adopting Agile
Adopting AgileAdopting Agile
Adopting AgileCoverity
 
Static Analysis Primer
Static Analysis PrimerStatic Analysis Primer
Static Analysis PrimerCoverity
 
Finding Defects in C#: Coverity vs. FxCop
Finding Defects in C#: Coverity vs. FxCopFinding Defects in C#: Coverity vs. FxCop
Finding Defects in C#: Coverity vs. FxCopCoverity
 
jlettvin.resume.20160922.STAR
jlettvin.resume.20160922.STARjlettvin.resume.20160922.STAR
jlettvin.resume.20160922.STARJonathan Lettvin
 
Getting Started Contributing to Apache Spark – From PR, CR, JIRA, and Beyond
Getting Started Contributing to Apache Spark – From PR, CR, JIRA, and BeyondGetting Started Contributing to Apache Spark – From PR, CR, JIRA, and Beyond
Getting Started Contributing to Apache Spark – From PR, CR, JIRA, and BeyondDatabricks
 
Resource Leaks in Java
Resource Leaks in JavaResource Leaks in Java
Resource Leaks in JavaCoverity
 
Bug prediction based on your code history
Bug prediction based on your code historyBug prediction based on your code history
Bug prediction based on your code historyAlexey Tokar
 
Web Applications of the Future with TypeScript and GraphQL
Web Applications of the Future with TypeScript and GraphQLWeb Applications of the Future with TypeScript and GraphQL
Web Applications of the Future with TypeScript and GraphQLRoy Derks
 
Taking Jupyter Notebooks and Apache Spark to the Next Level PixieDust with Da...
Taking Jupyter Notebooks and Apache Spark to the Next Level PixieDust with Da...Taking Jupyter Notebooks and Apache Spark to the Next Level PixieDust with Da...
Taking Jupyter Notebooks and Apache Spark to the Next Level PixieDust with Da...Databricks
 
Stream Processing: Choosing the Right Tool for the Job
Stream Processing: Choosing the Right Tool for the JobStream Processing: Choosing the Right Tool for the Job
Stream Processing: Choosing the Right Tool for the JobDatabricks
 
Computer Tools for Academic Research
Computer Tools for Academic ResearchComputer Tools for Academic Research
Computer Tools for Academic ResearchMiklos Koren
 

What's hot (20)

Version Control in Machine Learning + AI (Stanford)
Version Control in Machine Learning + AI (Stanford)Version Control in Machine Learning + AI (Stanford)
Version Control in Machine Learning + AI (Stanford)
 
Provenance in Production-Grade Machine Learning
Provenance in Production-Grade Machine LearningProvenance in Production-Grade Machine Learning
Provenance in Production-Grade Machine Learning
 
From NASA to Startups to Big Commerce
From NASA to Startups to Big CommerceFrom NASA to Startups to Big Commerce
From NASA to Startups to Big Commerce
 
Managing and Versioning Machine Learning Models in Python
Managing and Versioning Machine Learning Models in PythonManaging and Versioning Machine Learning Models in Python
Managing and Versioning Machine Learning Models in Python
 
Using dataset versioning in data science
Using dataset versioning in data scienceUsing dataset versioning in data science
Using dataset versioning in data science
 
Open Source Big Graph Analytics on Neo4j with Apache Spark
Open Source Big Graph Analytics on Neo4j with Apache SparkOpen Source Big Graph Analytics on Neo4j with Apache Spark
Open Source Big Graph Analytics on Neo4j with Apache Spark
 
Reproducible Data Science with R
Reproducible Data Science with RReproducible Data Science with R
Reproducible Data Science with R
 
Query or Not to Query? Using Apache Spark Metrics to Highlight Potentially Pr...
Query or Not to Query? Using Apache Spark Metrics to Highlight Potentially Pr...Query or Not to Query? Using Apache Spark Metrics to Highlight Potentially Pr...
Query or Not to Query? Using Apache Spark Metrics to Highlight Potentially Pr...
 
OSS Java Analysis - What You Might Be Missing
OSS Java Analysis - What You Might Be MissingOSS Java Analysis - What You Might Be Missing
OSS Java Analysis - What You Might Be Missing
 
Adopting Agile
Adopting AgileAdopting Agile
Adopting Agile
 
Static Analysis Primer
Static Analysis PrimerStatic Analysis Primer
Static Analysis Primer
 
Finding Defects in C#: Coverity vs. FxCop
Finding Defects in C#: Coverity vs. FxCopFinding Defects in C#: Coverity vs. FxCop
Finding Defects in C#: Coverity vs. FxCop
 
jlettvin.resume.20160922.STAR
jlettvin.resume.20160922.STARjlettvin.resume.20160922.STAR
jlettvin.resume.20160922.STAR
 
Getting Started Contributing to Apache Spark – From PR, CR, JIRA, and Beyond
Getting Started Contributing to Apache Spark – From PR, CR, JIRA, and BeyondGetting Started Contributing to Apache Spark – From PR, CR, JIRA, and Beyond
Getting Started Contributing to Apache Spark – From PR, CR, JIRA, and Beyond
 
Resource Leaks in Java
Resource Leaks in JavaResource Leaks in Java
Resource Leaks in Java
 
Bug prediction based on your code history
Bug prediction based on your code historyBug prediction based on your code history
Bug prediction based on your code history
 
Web Applications of the Future with TypeScript and GraphQL
Web Applications of the Future with TypeScript and GraphQLWeb Applications of the Future with TypeScript and GraphQL
Web Applications of the Future with TypeScript and GraphQL
 
Taking Jupyter Notebooks and Apache Spark to the Next Level PixieDust with Da...
Taking Jupyter Notebooks and Apache Spark to the Next Level PixieDust with Da...Taking Jupyter Notebooks and Apache Spark to the Next Level PixieDust with Da...
Taking Jupyter Notebooks and Apache Spark to the Next Level PixieDust with Da...
 
Stream Processing: Choosing the Right Tool for the Job
Stream Processing: Choosing the Right Tool for the JobStream Processing: Choosing the Right Tool for the Job
Stream Processing: Choosing the Right Tool for the Job
 
Computer Tools for Academic Research
Computer Tools for Academic ResearchComputer Tools for Academic Research
Computer Tools for Academic Research
 

Viewers also liked

R for Everything
R for EverythingR for Everything
R for EverythingWork-Bench
 
Inside the R Consortium
Inside the R ConsortiumInside the R Consortium
Inside the R ConsortiumWork-Bench
 
Scaling Data Science at Airbnb
Scaling Data Science at AirbnbScaling Data Science at Airbnb
Scaling Data Science at AirbnbWork-Bench
 
One Algorithm to Rule Them All: How to Automate Statistical Computation
One Algorithm to Rule Them All: How to Automate Statistical ComputationOne Algorithm to Rule Them All: How to Automate Statistical Computation
One Algorithm to Rule Them All: How to Automate Statistical ComputationWork-Bench
 
The Political Impact of Social Penumbras
The Political Impact of Social PenumbrasThe Political Impact of Social Penumbras
The Political Impact of Social PenumbrasWork-Bench
 
Reflection on the Data Science Profession in NYC
Reflection on the Data Science Profession in NYCReflection on the Data Science Profession in NYC
Reflection on the Data Science Profession in NYCWork-Bench
 
Broom: Converting Statistical Models to Tidy Data Frames
Broom: Converting Statistical Models to Tidy Data FramesBroom: Converting Statistical Models to Tidy Data Frames
Broom: Converting Statistical Models to Tidy Data FramesWork-Bench
 
Analyzing NYC Transit Data
Analyzing NYC Transit DataAnalyzing NYC Transit Data
Analyzing NYC Transit DataWork-Bench
 
I Don't Want to Be a Dummy! Encoding Predictors for Trees
I Don't Want to Be a Dummy! Encoding Predictors for TreesI Don't Want to Be a Dummy! Encoding Predictors for Trees
I Don't Want to Be a Dummy! Encoding Predictors for TreesWork-Bench
 
R Packages for Time-Varying Networks and Extremal Dependence
R Packages for Time-Varying Networks and Extremal DependenceR Packages for Time-Varying Networks and Extremal Dependence
R Packages for Time-Varying Networks and Extremal DependenceWork-Bench
 
Improving Data Interoperability for Python and R
Improving Data Interoperability for Python and RImproving Data Interoperability for Python and R
Improving Data Interoperability for Python and RWork-Bench
 
Iterating over statistical models: NCAA tournament edition
Iterating over statistical models: NCAA tournament editionIterating over statistical models: NCAA tournament edition
Iterating over statistical models: NCAA tournament editionWork-Bench
 
Using R at NYT Graphics
Using R at NYT GraphicsUsing R at NYT Graphics
Using R at NYT GraphicsWork-Bench
 
Thinking Small About Big Data
Thinking Small About Big DataThinking Small About Big Data
Thinking Small About Big DataWork-Bench
 

Viewers also liked (15)

R for Everything
R for EverythingR for Everything
R for Everything
 
Inside the R Consortium
Inside the R ConsortiumInside the R Consortium
Inside the R Consortium
 
Scaling Data Science at Airbnb
Scaling Data Science at AirbnbScaling Data Science at Airbnb
Scaling Data Science at Airbnb
 
One Algorithm to Rule Them All: How to Automate Statistical Computation
One Algorithm to Rule Them All: How to Automate Statistical ComputationOne Algorithm to Rule Them All: How to Automate Statistical Computation
One Algorithm to Rule Them All: How to Automate Statistical Computation
 
The Political Impact of Social Penumbras
The Political Impact of Social PenumbrasThe Political Impact of Social Penumbras
The Political Impact of Social Penumbras
 
Reflection on the Data Science Profession in NYC
Reflection on the Data Science Profession in NYCReflection on the Data Science Profession in NYC
Reflection on the Data Science Profession in NYC
 
Broom: Converting Statistical Models to Tidy Data Frames
Broom: Converting Statistical Models to Tidy Data FramesBroom: Converting Statistical Models to Tidy Data Frames
Broom: Converting Statistical Models to Tidy Data Frames
 
Analyzing NYC Transit Data
Analyzing NYC Transit DataAnalyzing NYC Transit Data
Analyzing NYC Transit Data
 
The Feels
The FeelsThe Feels
The Feels
 
I Don't Want to Be a Dummy! Encoding Predictors for Trees
I Don't Want to Be a Dummy! Encoding Predictors for TreesI Don't Want to Be a Dummy! Encoding Predictors for Trees
I Don't Want to Be a Dummy! Encoding Predictors for Trees
 
R Packages for Time-Varying Networks and Extremal Dependence
R Packages for Time-Varying Networks and Extremal DependenceR Packages for Time-Varying Networks and Extremal Dependence
R Packages for Time-Varying Networks and Extremal Dependence
 
Improving Data Interoperability for Python and R
Improving Data Interoperability for Python and RImproving Data Interoperability for Python and R
Improving Data Interoperability for Python and R
 
Iterating over statistical models: NCAA tournament edition
Iterating over statistical models: NCAA tournament editionIterating over statistical models: NCAA tournament edition
Iterating over statistical models: NCAA tournament edition
 
Using R at NYT Graphics
Using R at NYT GraphicsUsing R at NYT Graphics
Using R at NYT Graphics
 
Thinking Small About Big Data
Thinking Small About Big DataThinking Small About Big Data
Thinking Small About Big Data
 

Similar to Scaling Analysis Responsibly

Neotys PAC - Stijn Schepers
Neotys PAC - Stijn SchepersNeotys PAC - Stijn Schepers
Neotys PAC - Stijn SchepersNeotys_Partner
 
Meet a 100% R-based CRO. The summary of a 5-year journey
Meet a 100% R-based CRO. The summary of a 5-year journeyMeet a 100% R-based CRO. The summary of a 5-year journey
Meet a 100% R-based CRO. The summary of a 5-year journeyAdrian Olszewski
 
Meet a 100% R-based CRO - The summary of a 5-year journey
Meet a 100% R-based CRO - The summary of a 5-year journeyMeet a 100% R-based CRO - The summary of a 5-year journey
Meet a 100% R-based CRO - The summary of a 5-year journeyAdrian Olszewski
 
Maintainable Machine Learning Products
Maintainable Machine Learning ProductsMaintainable Machine Learning Products
Maintainable Machine Learning ProductsAndrew Musselman
 
Sync Workitems between multiple Team Projects #vssatpn
Sync Workitems between multiple Team Projects #vssatpnSync Workitems between multiple Team Projects #vssatpn
Sync Workitems between multiple Team Projects #vssatpnLorenzo Barbieri
 
ChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityVictorSzoltysek
 
OSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine LearningOSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine LearningPaco Nathan
 
OSA Con 2022 - Scaling your Pandas Analytics with Modin - Doris Lee - Ponder.pdf
OSA Con 2022 - Scaling your Pandas Analytics with Modin - Doris Lee - Ponder.pdfOSA Con 2022 - Scaling your Pandas Analytics with Modin - Doris Lee - Ponder.pdf
OSA Con 2022 - Scaling your Pandas Analytics with Modin - Doris Lee - Ponder.pdfAltinity Ltd
 
A simple test paper from Chen
A simple test paper from ChenA simple test paper from Chen
A simple test paper from Chentechweb08
 
Chen's second test slides again
Chen's second test slides againChen's second test slides again
Chen's second test slides againHima Challa
 
A simple test paper from Chen
A simple test paper from ChenA simple test paper from Chen
A simple test paper from Chentechweb08
 
Chen's second test slides
Chen's second test slidesChen's second test slides
Chen's second test slidesHima Challa
 
A simple test paper from Chen
A simple test paper from ChenA simple test paper from Chen
A simple test paper from Chentechweb08
 
Azure DevOps Realtime Work Item Sync: the good, the bad, the ugly!
Azure DevOps Realtime Work Item Sync: the good, the bad, the ugly!Azure DevOps Realtime Work Item Sync: the good, the bad, the ugly!
Azure DevOps Realtime Work Item Sync: the good, the bad, the ugly!Lorenzo Barbieri
 
Scalable code Design with slimmer Django models .. and more
Scalable code  Design with slimmer Django models .. and moreScalable code  Design with slimmer Django models .. and more
Scalable code Design with slimmer Django models .. and moreDawa Sherpa
 
Enterprise Data Science
Enterprise Data ScienceEnterprise Data Science
Enterprise Data ScienceMisha Lisovich
 
Ben ford intro
Ben ford introBen ford intro
Ben ford introPuppet
 
Telemetry doesn't have to be scary; Ben Ford
Telemetry doesn't have to be scary; Ben FordTelemetry doesn't have to be scary; Ben Ford
Telemetry doesn't have to be scary; Ben FordPuppet
 
Operationalizing analytics to scale
Operationalizing analytics to scaleOperationalizing analytics to scale
Operationalizing analytics to scaleLooker
 

Similar to Scaling Analysis Responsibly (20)

Neotys PAC - Stijn Schepers
Neotys PAC - Stijn SchepersNeotys PAC - Stijn Schepers
Neotys PAC - Stijn Schepers
 
Meet a 100% R-based CRO. The summary of a 5-year journey
Meet a 100% R-based CRO. The summary of a 5-year journeyMeet a 100% R-based CRO. The summary of a 5-year journey
Meet a 100% R-based CRO. The summary of a 5-year journey
 
Meet a 100% R-based CRO - The summary of a 5-year journey
Meet a 100% R-based CRO - The summary of a 5-year journeyMeet a 100% R-based CRO - The summary of a 5-year journey
Meet a 100% R-based CRO - The summary of a 5-year journey
 
Maintainable Machine Learning Products
Maintainable Machine Learning ProductsMaintainable Machine Learning Products
Maintainable Machine Learning Products
 
Sync Workitems between multiple Team Projects #vssatpn
Sync Workitems between multiple Team Projects #vssatpnSync Workitems between multiple Team Projects #vssatpn
Sync Workitems between multiple Team Projects #vssatpn
 
ChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps Productivity
 
OSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine LearningOSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine Learning
 
OSA Con 2022 - Scaling your Pandas Analytics with Modin - Doris Lee - Ponder.pdf
OSA Con 2022 - Scaling your Pandas Analytics with Modin - Doris Lee - Ponder.pdfOSA Con 2022 - Scaling your Pandas Analytics with Modin - Doris Lee - Ponder.pdf
OSA Con 2022 - Scaling your Pandas Analytics with Modin - Doris Lee - Ponder.pdf
 
A simple test paper from Chen
A simple test paper from ChenA simple test paper from Chen
A simple test paper from Chen
 
Chen's second test slides again
Chen's second test slides againChen's second test slides again
Chen's second test slides again
 
A simple test paper from Chen
A simple test paper from ChenA simple test paper from Chen
A simple test paper from Chen
 
Chen's second test slides
Chen's second test slidesChen's second test slides
Chen's second test slides
 
A simple test paper from Chen
A simple test paper from ChenA simple test paper from Chen
A simple test paper from Chen
 
Azure DevOps Realtime Work Item Sync: the good, the bad, the ugly!
Azure DevOps Realtime Work Item Sync: the good, the bad, the ugly!Azure DevOps Realtime Work Item Sync: the good, the bad, the ugly!
Azure DevOps Realtime Work Item Sync: the good, the bad, the ugly!
 
Scalable code Design with slimmer Django models .. and more
Scalable code  Design with slimmer Django models .. and moreScalable code  Design with slimmer Django models .. and more
Scalable code Design with slimmer Django models .. and more
 
Enterprise Data Science
Enterprise Data ScienceEnterprise Data Science
Enterprise Data Science
 
Ben ford intro
Ben ford introBen ford intro
Ben ford intro
 
Telemetry doesn't have to be scary; Ben Ford
Telemetry doesn't have to be scary; Ben FordTelemetry doesn't have to be scary; Ben Ford
Telemetry doesn't have to be scary; Ben Ford
 
Operationalizing analytics to scale
Operationalizing analytics to scaleOperationalizing analytics to scale
Operationalizing analytics to scale
 
Idea v9 product profile
Idea v9 product profileIdea v9 product profile
Idea v9 product profile
 

More from Work-Bench

2017 Enterprise Almanac
2017 Enterprise Almanac2017 Enterprise Almanac
2017 Enterprise AlmanacWork-Bench
 
AI to Enable Next Generation of People Managers
AI to Enable Next Generation of People ManagersAI to Enable Next Generation of People Managers
AI to Enable Next Generation of People ManagersWork-Bench
 
Startup Recruiting Workbook: Sourcing and Interview Process
Startup Recruiting Workbook: Sourcing and Interview ProcessStartup Recruiting Workbook: Sourcing and Interview Process
Startup Recruiting Workbook: Sourcing and Interview ProcessWork-Bench
 
Cloud Native Infrastructure Management Solutions Compared
Cloud Native Infrastructure Management Solutions ComparedCloud Native Infrastructure Management Solutions Compared
Cloud Native Infrastructure Management Solutions ComparedWork-Bench
 
Building a Demand Generation Machine at MongoDB
Building a Demand Generation Machine at MongoDBBuilding a Demand Generation Machine at MongoDB
Building a Demand Generation Machine at MongoDBWork-Bench
 
How to Market Your Startup to the Enterprise
How to Market Your Startup to the EnterpriseHow to Market Your Startup to the Enterprise
How to Market Your Startup to the EnterpriseWork-Bench
 
Marketing & Design for the Enterprise
Marketing & Design for the EnterpriseMarketing & Design for the Enterprise
Marketing & Design for the EnterpriseWork-Bench
 
Playing the Marketing Long Game
Playing the Marketing Long GamePlaying the Marketing Long Game
Playing the Marketing Long GameWork-Bench
 

More from Work-Bench (8)

2017 Enterprise Almanac
2017 Enterprise Almanac2017 Enterprise Almanac
2017 Enterprise Almanac
 
AI to Enable Next Generation of People Managers
AI to Enable Next Generation of People ManagersAI to Enable Next Generation of People Managers
AI to Enable Next Generation of People Managers
 
Startup Recruiting Workbook: Sourcing and Interview Process
Startup Recruiting Workbook: Sourcing and Interview ProcessStartup Recruiting Workbook: Sourcing and Interview Process
Startup Recruiting Workbook: Sourcing and Interview Process
 
Cloud Native Infrastructure Management Solutions Compared
Cloud Native Infrastructure Management Solutions ComparedCloud Native Infrastructure Management Solutions Compared
Cloud Native Infrastructure Management Solutions Compared
 
Building a Demand Generation Machine at MongoDB
Building a Demand Generation Machine at MongoDBBuilding a Demand Generation Machine at MongoDB
Building a Demand Generation Machine at MongoDB
 
How to Market Your Startup to the Enterprise
How to Market Your Startup to the EnterpriseHow to Market Your Startup to the Enterprise
How to Market Your Startup to the Enterprise
 
Marketing & Design for the Enterprise
Marketing & Design for the EnterpriseMarketing & Design for the Enterprise
Marketing & Design for the Enterprise
 
Playing the Marketing Long Game
Playing the Marketing Long GamePlaying the Marketing Long Game
Playing the Marketing Long Game
 

Recently uploaded

AI Imagen for data-storytelling Infographics.pdf
AI Imagen for data-storytelling Infographics.pdfAI Imagen for data-storytelling Infographics.pdf
AI Imagen for data-storytelling Infographics.pdfMichaelSenkow
 
Business update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMIBusiness update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMIAlejandraGmez176757
 
how can i exchange pi coins for others currency like Bitcoin
how can i exchange pi coins for others currency like Bitcoinhow can i exchange pi coins for others currency like Bitcoin
how can i exchange pi coins for others currency like BitcoinDOT TECH
 
basics of data science with application areas.pdf
basics of data science with application areas.pdfbasics of data science with application areas.pdf
basics of data science with application areas.pdfvyankatesh1
 
一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理cyebo
 
Exploratory Data Analysis - Dilip S.pptx
Exploratory Data Analysis - Dilip S.pptxExploratory Data Analysis - Dilip S.pptx
Exploratory Data Analysis - Dilip S.pptxDilipVasan
 
MALL CUSTOMER SEGMENTATION USING K-MEANS CLUSTERING.pptx
MALL CUSTOMER SEGMENTATION USING K-MEANS CLUSTERING.pptxMALL CUSTOMER SEGMENTATION USING K-MEANS CLUSTERING.pptx
MALL CUSTOMER SEGMENTATION USING K-MEANS CLUSTERING.pptxNidaFaviankaNawawi
 
Artificial_General_Intelligence__storm_gen_article.pdf
Artificial_General_Intelligence__storm_gen_article.pdfArtificial_General_Intelligence__storm_gen_article.pdf
Artificial_General_Intelligence__storm_gen_article.pdfscitechtalktv
 
一比一原版纽卡斯尔大学毕业证成绩单如何办理
一比一原版纽卡斯尔大学毕业证成绩单如何办理一比一原版纽卡斯尔大学毕业证成绩单如何办理
一比一原版纽卡斯尔大学毕业证成绩单如何办理cyebo
 
一比一原版阿德莱德大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理一比一原版阿德莱德大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理pyhepag
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJames Polillo
 
2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...
2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...
2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...elinavihriala
 
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPsWebinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPsCEPTES Software Inc
 
Pre-ProductionImproveddsfjgndflghtgg.pptx
Pre-ProductionImproveddsfjgndflghtgg.pptxPre-ProductionImproveddsfjgndflghtgg.pptx
Pre-ProductionImproveddsfjgndflghtgg.pptxStephen266013
 
一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理pyhepag
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsalex933524
 
How I opened a fake bank account and didn't go to prison
How I opened a fake bank account and didn't go to prisonHow I opened a fake bank account and didn't go to prison
How I opened a fake bank account and didn't go to prisonPayment Village
 
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理pyhepag
 
2024 Q1 Tableau User Group Leader Quarterly Call
2024 Q1 Tableau User Group Leader Quarterly Call2024 Q1 Tableau User Group Leader Quarterly Call
2024 Q1 Tableau User Group Leader Quarterly Calllward7
 

Recently uploaded (20)

AI Imagen for data-storytelling Infographics.pdf
AI Imagen for data-storytelling Infographics.pdfAI Imagen for data-storytelling Infographics.pdf
AI Imagen for data-storytelling Infographics.pdf
 
Business update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMIBusiness update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMI
 
how can i exchange pi coins for others currency like Bitcoin
how can i exchange pi coins for others currency like Bitcoinhow can i exchange pi coins for others currency like Bitcoin
how can i exchange pi coins for others currency like Bitcoin
 
basics of data science with application areas.pdf
basics of data science with application areas.pdfbasics of data science with application areas.pdf
basics of data science with application areas.pdf
 
一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理
 
Exploratory Data Analysis - Dilip S.pptx
Exploratory Data Analysis - Dilip S.pptxExploratory Data Analysis - Dilip S.pptx
Exploratory Data Analysis - Dilip S.pptx
 
MALL CUSTOMER SEGMENTATION USING K-MEANS CLUSTERING.pptx
MALL CUSTOMER SEGMENTATION USING K-MEANS CLUSTERING.pptxMALL CUSTOMER SEGMENTATION USING K-MEANS CLUSTERING.pptx
MALL CUSTOMER SEGMENTATION USING K-MEANS CLUSTERING.pptx
 
Artificial_General_Intelligence__storm_gen_article.pdf
Artificial_General_Intelligence__storm_gen_article.pdfArtificial_General_Intelligence__storm_gen_article.pdf
Artificial_General_Intelligence__storm_gen_article.pdf
 
一比一原版纽卡斯尔大学毕业证成绩单如何办理
一比一原版纽卡斯尔大学毕业证成绩单如何办理一比一原版纽卡斯尔大学毕业证成绩单如何办理
一比一原版纽卡斯尔大学毕业证成绩单如何办理
 
一比一原版阿德莱德大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理一比一原版阿德莱德大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
 
2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...
2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...
2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...
 
Machine Learning for Accident Severity Prediction
Machine Learning for Accident Severity PredictionMachine Learning for Accident Severity Prediction
Machine Learning for Accident Severity Prediction
 
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPsWebinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
 
Pre-ProductionImproveddsfjgndflghtgg.pptx
Pre-ProductionImproveddsfjgndflghtgg.pptxPre-ProductionImproveddsfjgndflghtgg.pptx
Pre-ProductionImproveddsfjgndflghtgg.pptx
 
一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
 
How I opened a fake bank account and didn't go to prison
How I opened a fake bank account and didn't go to prisonHow I opened a fake bank account and didn't go to prison
How I opened a fake bank account and didn't go to prison
 
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
 
2024 Q1 Tableau User Group Leader Quarterly Call
2024 Q1 Tableau User Group Leader Quarterly Call2024 Q1 Tableau User Group Leader Quarterly Call
2024 Q1 Tableau User Group Leader Quarterly Call
 

Scaling Analysis Responsibly

  • 2. #rcatladies Not So Standard Deviations @keegsdur
  • 3. “We just don’t have enough analysts!”
  • 4. “Let’s scale by building the perfect BI tool!”
  • 5. That sounds great! We should automate some of the things that are slowing you down PRODUCT TEAM DATA http://xkcd.com/
  • 6. That seems perfectly reasonable! Let’s just enlist some folks from engineering to help you with it DATAPRODUCT TEAM
  • 7. DATA ENG Sure thing! ...and finally can it add this last graph?
  • 9. ENG Sure! File a ticket! Can we add these 132 extra metrics to the testing? PRODUCT TEAM
  • 10. You can’t do that, your family-wise error rate will tend to 1!! ENG PRODUCT TEAM DATA
  • 11. ENG That’s a reasonable expectation for an internal product. I’m on it! I’d really like this tool to be more stable. PRODUCT TEAM
  • 12. Our test violates a subtle statistical assumption for this new application, and we need to gut this stable product! ENG PRODUCT TEAM DATA
  • 13. Almost impossible to avoid 2-against-1 dysfunction as product teams become “self-service” with engineering support Invariably becomes a race to the bottom as internal competition for the simplest tool emerges Stability prioritized over flexibility
  • 16. “Analysis Developer” Someone on the analyst team who develops reproducible, flexible analyses in R and helps all analysts scale their work
  • 17. I’ll work with the analysis developer on my team! We should automate some of the things that are slowing you down PRODUCT TEAM DATA
  • 18. Avoids common types of dysfunction Allows for flexible, accurate analysis Analysts acquire marketable skills!
  • 19. Instead of creating dashboards or using static BI tools... http://dilbert.com/strip/2007-05-16
  • 20. Series of R packages highly specified for business case, “mix and match” elements to rapidly create common reports. library(“internal_package”)
  • 21.
  • 22. Instead of “assembly line” data processing…
  • 23. Close 2-way partnership with data engineers to optimize the creation of datasets for certain common analyses. The assembly line handoff from scientist to engineer creates [an uncreative] environment. The trick is to create an environment that allows for autonomy, ownership, and focus for everyone involved. - Jeff Magnusson http://multithreaded.stitchfix.com/blog/2016/03/16/engineers-shouldnt-write-etl/
  • 24. Instead of PM anxiously watching dashboards… https://www.youtube.com/watch?v=CCbWyYr82BM
  • 25. Analysts can create shorter-lived, reproducible reports
  • 26. Expectation manage the shorter lifespan of the report, but include that report will require less work from teams once created Productionize in the short-term with CRON jobs Can add in more stats this way! Y/Y turns into semiparametric models, etc.
  • 27. “The Problem with Dashboards (And A Solution)” by Stephanie Evergreen http://stephanieevergreen. com/problem-with-dashboards/
  • 29. Consider skill acquisition for analyst promotion For analysis developers, promoted based on whether or not they were able to help other analysts become more efficient Support for skill acquisition!
  • 30. Education support for learning better analysis development methods for all analysts Internally created resources
  • 31. Instead of PMs self-teaching analysis based on what’s presented in dashboarding tools.. https://xkcd.com/605/
  • 32. PMs can use tools for education analysts if they want to “ramp up” on analytical skills like R This way you can bake in statistical education as well.
  • 33. “Isn’t this just package development?”
  • 34. “Isn’t this just package development?” No!
  • 37. Ad-hoc spreadsheet work R workflows + scripting
  • 38. Ad-hoc spreadsheet work R workflows + scripting + reproducibility, some functions, “analysis testing”
  • 39. Ad-hoc spreadsheet work R workflows Reproducible R analyses + scripting + reproducibility, some functions, “analysis testing”
  • 40. Ad-hoc spreadsheet work R workflows Reproducible R analyses + scripting + reproducibility, some functions, “analysis testing” + workplace-wide audience, documentation, testing - problem-specific writeups and functions
  • 41. Ad-hoc spreadsheet work R workflows Reproducible R analyses Internal package development + scripting + reproducibility, some functions, “analysis testing” + workplace-wide audience, documentation, testing - problem-specific writeups and functions
  • 42. Ad-hoc spreadsheet work R workflows Reproducible R analyses Internal package development + scripting + reproducibility, some functions, “analysis testing” + workplace-wide audience, documentation, testing - problem-specific writeups and functions + industry-wide audience - company-specific code and functions
  • 43. Ad-hoc spreadsheet work R workflows Reproducible R analyses Internal package development External package development + scripting + reproducibility, some functions, “analysis testing” + workplace-wide audience, documentation, testing - problem-specific writeups and functions + industry-wide audience - company-specific code and functions
  • 44. Ad-hoc spreadsheet work R workflows Reproducible R analyses Internal package development External package development + reproducibility, some functions, “analysis testing” + scripting + workplace-wide audience, documentation, testing - problem-specific writeups and functions + industry-wide audience - company-specific code and functions Analysis Developer Open-Source Developer
  • 45. Analysis Developer Stop trying to scale with static BI tools -- this will (almost) always lead to dysfunction Instead, scale by increasing analyst efficiency using R and education! Hire Analysis Developers to help with all this!