calculation | consulting
data science leadership
(TM)
c|c
(TM)
charles@calculationconsulting.com
Who Are We?
Dr. Charles H. Martin, PhD
University of Chicago, Chemical Physics
NSF Fellow in Theoretical Chemistry
Over 10 years of experience in applied Machine Learning
Developed ML algos for Demand Media, the first $1B IPO since Google
Lean Start Ups: Aardvark (acquired by Google), eHow
Wall Street: GLG, BGI, BlackRock
Fortune 500: Roche, France Telecom
Tech / Retail: GoDaddy, eBay, Walmart
Investment: Griffin Advisors, Page Family Offices

www.calculationconsulting.com
charles@calculationconsulting.com
Recent AI News: Epic Systems
When machine learning FAILS
“[The] definition of sepsis based on
billing codes alone is imprecise and not
the one that is clinically meaningful to a
health system or to patients.”
Recent AI News: Zillow
When machine learning (or AI) FAILS
Data Science is Different
Data Science Leadership: Becoming Data-Led
1. Data Informed: Operational Visibility
2. Data Driven: Tooling and Insights for Growth
3. Data Led: Automation and Innovation
creating the data-led organization
Data Science Leadership: 4 Steps to Leading
• Strategy: How can you leverage data?
• Stage: How mature is your data?
• Team: What team do we need?
• Tools: What tools do they need?
creating the data-led organization
Strategy: Algo Gas Station Analogy
Problem: where to open a gas station?
Need: good traffic, weak competition
[Figure: three candidate locations: fewer competitors but no traffic; the sweet spot; great traffic but too many competitors]
ML algorithms can predict supply and demand
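As a minimal sketch of the analogy, the snippet below ranks candidate sites by expected traffic per station. The site names, the numbers, and the scoring formula are all illustrative assumptions, not taken from the talk; in practice both traffic and competition would be predicted by models trained on real data.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    daily_traffic: float  # predicted demand (hypothetical numbers)
    competitors: int      # existing nearby stations (hypothetical numbers)


def sweet_spot_score(site: Site) -> float:
    # Assumed heuristic: demand is shared equally among the existing
    # competitors plus the new station.
    return site.daily_traffic / (site.competitors + 1)


sites = [
    Site("quiet road", daily_traffic=200, competitors=0),
    Site("suburban junction", daily_traffic=6000, competitors=1),
    Site("downtown strip", daily_traffic=12000, competitors=9),
]

best = max(sites, key=sweet_spot_score)
print(best.name)
```

The quiet road has no competition but no demand, and the downtown strip has demand but splits it ten ways; the score favors the site in between.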
Strategy: Data Science Process
• Acquire Domain Knowledge
• Formulate Hypothesis
• Generate Model(s) from the Data
• Predict Revenue Gains
• Backtest Predictions on your Data
• A/B Test in Production
• Attribute Gains to Model(s)
[Diagram labels: framing, solving, acting]
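The last two steps of the process (A/B test in production, attribute gains to the model) can be sketched with a standard two-proportion z-test. This is only an illustration under stated assumptions: the conversion counts and the revenue per conversion are made-up numbers, and the function name is hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (lift, z, two-sided p-value) for conversion rate of B vs. A."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value


# Hypothetical experiment: A = control, B = traffic served by the model.
lift, z, p_value = two_proportion_z_test(conv_a=200, n_a=10_000,
                                         conv_b=260, n_b=10_000)

# Assumed revenue per conversion, used to attribute a gain to the model.
revenue_per_conversion = 40.0
attributed_gain = lift * 10_000 * revenue_per_conversion

print(f"lift={lift:.4f} z={z:.2f} p={p_value:.4f} gain={attributed_gain:.0f}")
```

Attribution here is simply the measured lift times the treated traffic times revenue per conversion; it only means something if the test is significant and the two groups were assigned at random.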
Strategy: Learning from Data
• Systems Thinking: leveraging the inter-relationships between data, marketing, and the customer
• Knowledge Transfer: mentoring, not training, to develop both personal mastery and team learning
• Mental Models: create a base of small-scale models for thinking about how to use your data
• Knowledge Sharing: foster collaboration between research, engineering, and product to drive revenue
Strategy: Data Science is not IT
• Cross-functional: engineering, product, marketing, finance
• Autonomous: separate from the traditional engineering product lifecycle; self-organizing and self-managing
• Experimental: form hypotheses, analyze data, make predictions, run backtests, A/B test
• Self-sustaining: not a cost center; generates revenue
Problem: Externalities
external factors can change
Stage: Your Data Maturity
• Where is your data? Transaction database? Web logs? 3rd-party system? Data lake?
• What product does it serve? Billing? CRM?
• Can you access it? Security? Regulations?
• Who owns it? Who is responsible for quality?
Data is only as accurate as its original intent demanded
Problem: Data Quality Mismatch
Example: Recommender System
Quality of product metadata may not materially impact billing
[Figure: product metadata entries marked wrong or missing]
Solution: Be Paranoid and Test Everything
“Only the paranoid survive” (Andy Grove, Intel)
Software engineers can be paranoid about programming.
In fact, Paranoid Programming is a thing.
You have to be paranoid about your data.
Thing is, bad code can usually be fixed.
But bad data usually has to be thrown away
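One way to be paranoid about data is to write tests for it, the same way engineers write tests for code. The sketch below checks product metadata for missing and wrong values before it reaches a recommender. The field names, the rules, and the sample records are hypothetical, chosen only to illustrate the idea.

```python
def check_product_metadata(records,
                           required=("product_id", "title", "category", "price")):
    """Return a list of (row_index, problem) for every failed check."""
    problems = []
    seen_ids = set()
    for i, rec in enumerate(records):
        # Missing values: required fields that are absent or empty.
        for field in required:
            value = rec.get(field)
            if value is None or value == "":
                problems.append((i, f"missing {field}"))
        # Wrong values: a price must be positive.
        price = rec.get("price")
        if isinstance(price, (int, float)) and price <= 0:
            problems.append((i, "wrong price"))
        # Duplicates: product ids must be unique.
        pid = rec.get("product_id")
        if pid is not None and pid in seen_ids:
            problems.append((i, "duplicate product_id"))
        seen_ids.add(pid)
    return problems


# Hypothetical sample records.
records = [
    {"product_id": 1, "title": "Garden hose", "category": "garden", "price": 19.99},
    {"product_id": 2, "title": "", "category": "garden", "price": 5.00},
    {"product_id": 2, "title": "Rake", "category": None, "price": -1},
]

problems = check_product_metadata(records)
for row, problem in problems:
    print(row, problem)
```

Checks like these can run on every data load, so a quality problem is caught before a model is trained on it.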
Problem: Data Contraband
data “from a friend” that may violate compliance
[Diagram labels: data actually stored in DB; data pulled into spreadsheet / CSV; data passed around by email, etc.; recommender system]
Solutions: Data Contraband
• Google Sheets, SAP, etc. (where you can track everything)
• Move functions to the data (stored procedures, Spark, etc.)
• Document tracking systems: Jira, GitHub, Confluence, …
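"Move functions to the data" can be illustrated with a small sketch: instead of exporting raw rows to a spreadsheet and emailing them around, run the aggregation inside the database so only the summary leaves it. The example uses an in-memory SQLite database from the Python standard library as a stand-in; the table, columns, and values are hypothetical.

```python
import sqlite3

# In-memory database standing in for the production store (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 20.0), (1, 35.0), (2, 10.0), (3, 50.0), (3, 5.0)],
)

# The aggregation runs inside the database; raw rows never leave it.
rows = conn.execute(
    "SELECT customer_id, SUM(amount) AS total "
    "FROM orders GROUP BY customer_id ORDER BY customer_id"
).fetchall()
conn.close()

print(rows)
```

The same pattern applies to stored procedures or Spark jobs: the query is versioned and auditable, and no untracked copy of the data is created.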
Team: Data Scientists are Different
not all techies are the same
• theoretical physics (machine learning / AI specialist): data science group. Can be very isolated. Very research-y & difficult to productionalize
• applied physics (data scientist): embedded data scientist; solves problems, builds solutions, and deploys them
• engineer (software, browser tech, dev ops, …): software and IT services. Great at managing code and systems. Not great with data, math, or ambiguity
FAANG Managers: Fallen Gods
the Earth is flat and they have fallen off
FAANG infrastructure is 10-20 years ahead
you need infrastructure to deliver data products
Data Strategy: Think like a Beginner
cultivate a beginner’s mind
- Test your assumptions. Literally.
- Look for problems early on. And never stop looking.
- Distinguish between statistical and structural outliers.
- Repair your data, if possible.
- Start with simple, robust methods.
- Sophisticated models are more sensitive to errors and are more easily overtrained.
- Evaluate your predictions on real data, and figure out how to attribute results to your models.
- Re-calibrate your models if necessary.
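As one example of a simple, robust method, the sketch below flags candidate outliers with modified z-scores built from the median and the median absolute deviation (MAD) rather than the mean and standard deviation, which a single bad value can distort. The sample values are made up, and the 3.5 cutoff is a commonly used convention assumed here, not something stated in the talk. Whether a flagged point is a statistical or a structural outlier still needs a human look at how the data was produced.

```python
import statistics


def robust_z_scores(values):
    """Modified z-scores based on the median and the MAD."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        raise ValueError("MAD is zero; robust z-scores are undefined")
    # 0.6745 rescales the MAD so that scores are comparable to ordinary
    # z-scores when the data is normally distributed.
    return [0.6745 * (v - med) / mad for v in values]


# Hypothetical measurements with one suspicious entry.
values = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 99.0]
scores = robust_z_scores(values)

THRESHOLD = 3.5  # assumed cutoff
flagged = [v for v, s in zip(values, scores) if abs(s) > THRESHOLD]
print(flagged)
```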
Tools: What the Team Needs
• Infrastructure: Data storage, cloud services, etc.
• Analytics: Measuring what's going on
• Operations: Keeping things running
• Machine Learning and AI: Growth and Innovation
Algorithms, not data lakes, generate revenue
Tools: What the Team Needs to Know
• Metrics: What KPIs you have, and what targets to hit
• Access: How to get what they need (i.e., self-service)
• Impact: How tooling (used and built) supports the business
• Truth: What data is reliable, what is not
Algorithms, not data lakes, generate revenue
Final Thoughts: Algorithmic Accountability
An asset is an economic resource.
Anything tangible or intangible that is capable of
being owned or controlled to produce value and
that is held to have positive economic value is
considered an asset.
algorithms can be valuable assets
Algorithmic Accountability
does revenue depend on hidden algos?
• WebMD Google SEO
• Amazon Product Listing Algo
• Pinterest Relevance Algo
• Twitter Spam filter
• Apple App Store Rankings
do decisions depend on hidden factors?
A 'Crisis' in Online Ads: One-Third of Traffic Is Bogus
http://www.wsj.com/articles/SB10001424052702304026304579453253860786362
Now Algorithms Are Deciding Whom To Hire…
http://www.npr.org/blogs/alltechconsidered/2015/03/23/394827451/now-algorithms-are-deciding-whom-to-hire-based-on-voice
What you don’t know about Internet algorithms is hurting you…
http://www.washingtonpost.com/news/the-intersect/wp/2015/03/23/what-you-dont-know-about-internet-algorithms-is-hurting-you-and-you-probably-dont-know-very-much/
Solution: Algorithmic Transparency
can you be transparent and not be gamed?
How do you govern a (hidden, fluid and amoral) algorithm?
http://fortune.com/2015/03/18/how-do-you-govern-a-hidden-fluid-and-amoral-algorithm/
83% of the participants in the study changed their behavior once they knew about the algorithm
participants mistakenly believed that their friends intentionally chose not to show them stories
Algorithmic Accountability
Do you depend on someone else’s marketplace?
How does your revenue depend on algos?
Do you need an internal algo?
Who will manage it? build it? maintain it?
algorithms have unforeseen liabilities

Georgetown B-school Talk 2021
