data competitive
Dr June Andrews
March 26, 2019
Delphi Data
outline
Data X
Data Informed
Data Enabled
Data Driven
Data Competitive
Predicting Data ROI Challenges
Metrics
Repetition in DS
Replication
Systematic Building with ML
Uncanny Valley of ML
data x
data informed use case
Data is an input into a decision system based on expert knowledge.
The task could be done without cloud-based data; it would just take longer.
Process map for operations of a plant. [Siemens]
data informed requirements
Checklist:
∙ Existing application already in production
∙ Known quantity to measure and how to measure it
∙ Ability to stream measurement to people at the right time
∙ Known policy on how to react to different measured values
data informed roi
Low Variance on Expected ROI
Predictable costs and benefits.
Cost:
∙ Infrastructure to stream measurements
∙ Portal to view measurements
∙ Data storage for compliance
Benefit:
∙ Saves time and manpower over manually collecting measurements
∙ Reliable auditing of historical values
∙ Centralized monitoring
data enabled use case
Data enables a product that can be designed with only product knowledge.
Pfizer converts from Data Informed to Data Enabled [HBS 2015]
data enabled requirements
Checklist:
∙ Product vision of what needs to be built.
∙ Clear specs on what data is needed.
∙ Clear specs on how to use the data.
∙ Reliable serving and maintenance of the data.
data enabled roi
Conditional Impact on Expected ROI
If data enables the use case, then traditional product ROI.
Cost:
∙ All costs of Data Informed (infrastructure, serving & storage)
∙ New: data acquisition
∙ New: data processing (may require ETL & some ML)
∙ New: higher personnel costs given the above.
Benefit:
∙ New product opportunities.
∙ Better customization in a global marketplace.
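The conditional nature of this ROI can be sketched as simple arithmetic. This is an illustration only: the function name `expected_roi`, the definition of ROI as (expected benefit minus cost) divided by cost, the probability that the data enables the use case, and every dollar figure below are hypothetical assumptions, not values from the talk.

```python
def expected_roi(p_enabled: float, product_benefit: float, total_cost: float) -> float:
    """Expected ROI when the benefit only materializes if the data enables the use case.

    Assumption: ROI = (expected benefit - cost) / cost.
    """
    if not 0.0 <= p_enabled <= 1.0:
        raise ValueError("p_enabled must be a probability in [0, 1]")
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    # Costs are paid either way; the benefit is weighted by the chance of success.
    expected_benefit = p_enabled * product_benefit
    return (expected_benefit - total_cost) / total_cost


# Hypothetical figures for illustration only.
roi_if_certain = expected_roi(1.0, product_benefit=3_000_000, total_cost=1_000_000)
roi_if_coin_flip = expected_roi(0.5, product_benefit=3_000_000, total_cost=1_000_000)
```

With these made-up numbers, the same product looks far less attractive once it is uncertain whether the data will actually enable it, which is the sense in which the impact on expected ROI is conditional.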
data driven use case
Given the company’s goals & constraints, data decides the outcome.
Netflix has a 2019 Content Creation budget of $15B, spent with a mix of traditional & data driven approaches.
data driven requirements
Checklist:
∙ Clear goals & constraints
∙ Data for mining
∙ Infrastructure for data exploration
∙ People
∙ A conduit for results to be acted on
data driven roi
Uncertain ROI
Similar to R&D costs and benefits.
Expensive to start & hard to predict.
Cost:
∙ All costs of data enabled
∙ New: additional infrastructure & personnel for exploration
∙ New: discovery rate, i.e. how many attempts it takes to produce a finding
Benefit:
∙ Make rapid decisions without becoming an expert
∙ Create products that couldn’t have existed otherwise
∙ Optimize product market fit
data driven doesn’t always pay off
New is always interesting, not always better.
Forbes & HBS have many articles on the failures of companies to benefit from their attempts to become data driven.
data competitive use case
Data is used systematically over the course of years to optimize the components of a company that give it a competitive advantage.
A combination of data informed (Like button), data enabled (Messages), and data driven (News Feed).
data competitive requirements
Checklist:
∙ No checklist is going to work. The company needs to evolve in the face of changing markets and technological abilities.
new mindset for creating a data competitive company
The goal should be to have enough system support for everything to work, and then to invest heavily in what contributes to competitive advantage.
components of a data competitive company
Traditional areas of a company
∙ People
∙ Vision
∙ Strategy
∙ Market
∙ Product
∙ Communication
Then Data
Treat data as an additional component of a functioning company
that needs to integrate with existing components.
investments in data can then be focused on what matters
Usain Bolt’s taste buds work at the level of guaranteeing he can eat
enough safe food for the rest of his body. He does not have the best
sense of taste in the world.
data competitive roi
Start with Low ROI, then iterate.
Learn from what has worked to predict what will work & adjust.
Cost:
∙ Combination of {Data Informed, Data Enabled, Data Driven}
∙ Continuous integration of data into all aspects of the company
Benefit:
∙ Ability to adjust to market demands
∙ Flexibility to integrate latest advantages of AI/ML/DS
predicting data roi challenges
Predicting Data ROI Challenges
Metrics
Repetition in DS
Replication
Systematic Building with ML
Uncanny Valley of ML
challenge - metric design
Metrics are the translation layer between what we want and what we tell the machines we want.
example - fire ring model
Simple case: ask ML to sign up as many users as possible.
The Fire Ring model of network effects demonstrates how the number of users on a site can expand exponentially and then quickly disappear.
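The gap between a sign-up metric and what the business actually wants can be sketched with a toy simulation. This is not the Fire Ring model itself, whose details are not given here. It is a simple SIR-style stand-in in which active users recruit from the people who have never joined and a fixed fraction of active users leaves each step. The function name `simulate_signups` and all parameter values are hypothetical.

```python
def simulate_signups(population=100_000, seed_users=10, invite_rate=0.5,
                     churn_rate=0.1, steps=200):
    """Return per-step (cumulative_signups, active_users) for a toy network-effect model."""
    never_joined = float(population - seed_users)
    active = float(seed_users)
    history = []
    for _ in range(steps):
        # Active users recruit from the shrinking pool of people who never joined.
        new_signups = min(never_joined, invite_rate * active * never_joined / population)
        # A fixed fraction of active users leaves each step and does not return.
        churned = churn_rate * active
        never_joined -= new_signups
        active += new_signups - churned
        history.append((population - never_joined, active))
    return history


history = simulate_signups()
cumulative_signups = [signups for signups, _ in history]
active_users = [active for _, active in history]
# Step at which the number of active users is highest.
peak_step = max(range(len(active_users)), key=active_users.__getitem__)
```

In this sketch the cumulative sign-up count never goes down, so a system optimized for sign-ups alone would report success throughout, even while the active user count peaks at `peak_step` and then collapses.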
solution - metric design principles
challenge - longevity of data science results
Sample of Common Data Science Focuses

Analysis             Tools                     People
Funnel Analysis      Logging                   Recruiting
User Targeting       Data Quality              Organization
Forecasting          Experiments               Data Literacy
Opportunity Sizing   Metrics & Dashboarding    Ladders
Spam Minimization    Productionizing Models    Decision Processes
Deep Dives           Central Repositories
solution - repetition in data science
Roughly every 2 years, work in these areas will be redone because of:
∙ Upgrades
∙ Changing Landscapes
∙ Ownership Bias (‘Not Analyzed by Me’)
∙ Forgetfulness
Solutions:
∙ ‘Bow on Top’ time dedicated at the end of every project
∙ Document Tribal History
∙ Invite previous employees back for reviews
challenge - trusting data science
solution - trusting data science
Bottom line:
It matters which data scientist does an analysis.
Solutions:
∙ Keep track of each data scientist’s success rate
∙ Identify skill gaps and train folks
∙ Identify critical and chaotic decisions, and have multiple data experts produce solutions
challenge - systematic innovation
challenge - uncanny valley of ai
Progression of breakthrough CGI characters in movies: Toy Story (1995), Final Fantasy: The Spirits Within (2001), and Terminator Genisys (2015)
ideal outcome of ml
Ideally, we want to design systems that are only improved by ML.
finding the uncanny valley of ml
Mechanical Turk UI for labeling the number of coffee mugs in an image.
Note: adding the extra question ‘Is the Suggestion Correct’ and the phrase ‘If NOT correct’ was necessary. Without those UI components, Mechanical Turk workers labeled images on autopilot and ignored the suggested number of coffee mugs, resulting in the same labeling accuracy regardless of the ML accuracy. Additionally, workers who completed fewer than 3 labels were filtered out as noise.
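The filtering and comparison described above could look roughly like the following. This is a minimal sketch under stated assumptions: the record layout `(worker_id, ml_accuracy, label_is_correct)`, the function name `accuracy_by_ml_accuracy`, and the sample records are all hypothetical. Only the threshold of 3 labels per worker comes from the slide.

```python
from collections import Counter, defaultdict

MIN_LABELS_PER_WORKER = 3  # threshold stated on the slide


def accuracy_by_ml_accuracy(records):
    """Human labeling accuracy per ML accuracy level, after dropping low-volume workers.

    records: iterable of (worker_id, ml_accuracy, label_is_correct) tuples (assumed layout).
    """
    records = list(records)
    labels_per_worker = Counter(worker_id for worker_id, _, _ in records)
    totals = defaultdict(int)
    correct = defaultdict(int)
    for worker_id, ml_accuracy, label_is_correct in records:
        # Workers with fewer than 3 labels are treated as noise and skipped.
        if labels_per_worker[worker_id] < MIN_LABELS_PER_WORKER:
            continue
        totals[ml_accuracy] += 1
        correct[ml_accuracy] += int(label_is_correct)
    return {level: correct[level] / totals[level] for level in totals}


# Made-up records purely for illustration; not data from the experiment.
sample_records = [
    ("w1", 0.5, True), ("w1", 0.5, False), ("w1", 0.9, True), ("w1", 0.9, True),
    ("w2", 0.5, True), ("w2", 0.9, True), ("w2", 0.9, False),
    ("w3", 0.5, False),  # only one label: filtered out as noise
]
result = accuracy_by_ml_accuracy(sample_records)
```

Plotting the resulting human accuracy against the ML accuracy level is one way to look for a dip, which is what an uncanny valley would show up as.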
uncanny valley of ml
conclusion
Data Science can be systematic, principled, and foundational.
First it must take its own advice & measure what it wants to improve.
After looking in the mirror, iterate, improve & compete.
