Explaining advanced data science techniques to humans
SuperWeek ‘20
Hi, it’s still me! Just “Mightier”
Doug Hall, Director of analytics
“Everything should be made as simple as possible, but not simpler”
Einstein
Data Science is for everyone
Data Science needs to be non-exclusive - high level, not mathy. Everyone gets value!
“Any sufficiently advanced technology is indistinguishable from magic”
Arthur C. Clarke
Techniques appear to be like magical illusions
At first glance data science seems to be nothing short of actual wizardry. As soon as you know the trick, the mystery evaporates - utility grows!
“When a distinguished but elderly scientist states that something is possible, she/he is almost certainly right. When she/he states that something is impossible, she/he is very probably wrong”
Arthur C. Clarke’s first law
Opposite is true for HiPPOs
We can fix this!
The power of analogy, simile and metaphor
Beware your inner honey-badger
“As hard as woodpecker lips”
“Busier than a one-armed bricklayer”
Nick Cummins, The Honey Badger - Aussie Rugby-ist
Vocabulary matters
Don’t speak nerd to non-nerds
Do you...
Use multiple independent variables in an ANOVA test to reject the null hypothesis?
Or do you...
Analyse which brand of cereal has most calories?
Explain well
Use your human words.
What you’re about to hear
Things that will be explained.
● Models
○ How to choose a model
○ Prediction
○ Classification
○ Linear regression
○ Logistic regression
● Classification
○ K-means clustering
● Data engineering
○ Data preparation
○ Dimensionality reduction
■ Principal Component Analysis
■ Embedding
○ Structured/unstructured data
○ Integer and one-hot encoding
○ Correlation != Causation
● Machine Learning
○ Supervised/unsupervised/reinforcement machine learning
○ Type I, Type II errors
○ Precision, Recall, Accuracy
○ Underfitting, overfitting
● Attribution
○ Shapley Values and Markov Chains
● Testing
○ The null hypothesis
○ p-values
Protection against data science detail
BUSINESS QUESTION: How can we split our customers into different groups to market to?
DATA SCIENCE QUESTION: How can we run a clustering algorithm to segment customer data?
DATA SCIENCE ANSWER: A k-means clustering found 3 distinct groups
BUSINESS ANSWER: Here are 3 types of customers: new, high spending, and commercial
@BecomingDataSci
Model = decision support tool
Contemporary decisioning - making choices in a noisy room, stressed, on fire, doing 1000mph, with 50 other demands on your time while being kicked.
It’s hard.
Go to your mind palace.
Use carefully
Don’t abdicate decision making responsibility
@FryRSquared
Let’s get started
First question from the business is:
How do I choose a model?
It depends...
How do I choose a model?
It’s not that simple at first glance
What do you want to do?
What’s the model for?
Prediction
Models
Classification
Models
Linear Regression
Start simple
Simplest of crystal balls
Goal - prediction: If I have a fifth cup of coffee, how much will my productivity increase by?
Goal - explanation: How much did the 6th coffee reduce hours slept?
Models
Simplest of crystal balls
Goal - prediction: If page speed increases, what happens to bounce rate?
Goal - explanation: Can a change in page speed explain my change in bounce rate?
Models
Simplest of crystal balls
A most basic model of correlation
Models
SIMPLE Linear Regression isn’t extrapolation
Models
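As a minimal sketch of the "simplest of crystal balls", the Python below fits a straight line to a handful of made-up page load times and bounce rates using ordinary least squares. The data and the names `load_time`, `bounce_rate`, `fit_line` and `predict` are illustrative assumptions, not something from the talk, and refusing to predict outside the observed range is just one way to honour "isn't extrapolation".

```python
# Hypothetical observations: page load time (seconds) and bounce rate (percent).
load_time = [1.0, 2.0, 3.0, 4.0, 5.0]
bounce_rate = [30.0, 35.0, 41.0, 44.0, 50.0]


def fit_line(xs, ys):
    """Ordinary least squares with one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(load_time, bounce_rate)


def predict(seconds):
    """Estimate bounce rate, but only inside the range the line was fitted on."""
    if not min(load_time) <= seconds <= max(load_time):
        raise ValueError("outside the observed range: that would be extrapolation")
    return intercept + slope * seconds


print(f"Each extra second adds about {slope:.1f} points of bounce rate")
print(f"Estimated bounce rate at 3.5 s: {predict(3.5):.2f}%")
```

Asking for 3.5 seconds gives an estimate read off the fitted line; asking for 30 seconds raises an error, because the line has never seen data out there.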
Logistic Regression
Think probability
Is it a cat?
How confident are you?
Models
Is it a cat?
How confident are you now?
Models
Is it a cat?
More confident?
Models
Is it a cat?
Got it?
Models
It’s a cat!
Kittie!
Models
Is it a puppy?
How confident are you?
Models
Is it a puppy?
How confident are you?
Models
Is it a puppy?
How confident are you?
Models
It’s a puppy!
Puppee!!!!!!!!
Models
Is it a fish?
How confident are you?
Models
Is it a fish?
How confident are you?
Models
Yup, fish...
FishEEEEEE!!!!!
Models
Logistic Regression
As bounce rate and device category change, what’s the conversion probability?
How can budget and time of day changes to bidding help impression volume for my display campaign?
Models
Logistic Regression
How does the probability of an answer (yes/no) change when 1 or more other things change?
Models
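To make "think probability" concrete, here is a small sketch of how a logistic regression turns inputs into a yes/no probability. The coefficients below are invented for illustration (a real model would learn them from data), and the names `conversion_probability`, `COEF_BOUNCE` and `COEF_MOBILE` are assumptions rather than anything from the slides.

```python
import math


def sigmoid(z):
    """Squash any number into the range 0..1 so it can be read as a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical, hand-picked coefficients. In practice these are fitted to data.
INTERCEPT = 1.0
COEF_BOUNCE = -4.0   # effect of bounce rate, expressed as a fraction from 0 to 1
COEF_MOBILE = -0.5   # effect of the visit being on a mobile device


def conversion_probability(bounce_rate, is_mobile):
    """Probability of a 'yes' (conversion) for one combination of inputs."""
    z = INTERCEPT + COEF_BOUNCE * bounce_rate + COEF_MOBILE * (1 if is_mobile else 0)
    return sigmoid(z)


for bounce in (0.10, 0.25, 0.50, 0.75):
    desktop = conversion_probability(bounce, is_mobile=False)
    mobile = conversion_probability(bounce, is_mobile=True)
    print(f"bounce {bounce:.2f}: desktop {desktop:.2f}, mobile {mobile:.2f}")
```

The output is never a flat yes or no. It is a confidence level that slides up and down as the inputs change, much like the cat, puppy and fish guesses.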
Solving for high dimensionality
This sounds hard...
Data looks like this
Data Engineering
We need data like this
Data Engineering
DATA SCIENCE WORDS
● Perform an n-dimensional linear transformation
● Eigendecomposition of covariance matrices
● Derive eigenvectors and eigenvalues
● Extract Principal Components
We just employ Principal Component Analysis
Data Engineering
Clients be like...
What does this taste of? (NOT a trick question!)
Data Engineering
Chicken Tikka Terrine
Sweet Pickled Carrot, Smoked Garlic Yogurt, Beans Kachumber
Data Engineering
Multiple ingredients
Complex, sophisticated palate
Just like our data
Data Engineering
Which ingredients combine to influence the overall flavour of the dish?
Sweetness? Savouriness? Heat? Sourness? Umami?
Data Engineering
Classify based on most important ingredients (variables)
Data Engineering
Clients be like...
Visualising 34 dimensions on a graph
Data Engineering
● Visualisation
○ Now you can see the data
● Less computation
○ You get your output faster and cheaper
● Now use it
Dimension reduction = simpler visualisation
Data Engineering
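The "data science words" for Principal Component Analysis can be sketched in a few lines for the smallest possible case: two columns squeezed down to one. The five points below are made up, the function name `pca_2d` is an assumption, and the closed-form eigenvalue step only works for two dimensions; a real 34-dimension problem would use a linear algebra library.

```python
import math

# Hypothetical data: two measurements per row that mostly move together.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.0)]


def pca_2d(pts):
    """Return (direction, scores, explained) for the first principal component."""
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    centred = [(x - mean_x, y - mean_y) for x, y in pts]

    # Covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(x * x for x, _ in centred) / (n - 1)
    syy = sum(y * y for _, y in centred) / (n - 1)
    sxy = sum(x * y for x, y in centred) / (n - 1)

    # Eigenvalues of a symmetric 2x2 matrix, in closed form.
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    largest, smallest = half_trace + root, half_trace - root

    # Eigenvector for the largest eigenvalue (assumes sxy is not zero).
    vx, vy = sxy, largest - sxx
    length = math.hypot(vx, vy)
    direction = (vx / length, vy / length)

    # One number per row instead of two: the projection onto that direction.
    scores = [x * direction[0] + y * direction[1] for x, y in centred]
    explained = largest / (largest + smallest)
    return direction, scores, explained


direction, scores, explained = pca_2d(points)
print(f"Main direction: ({direction[0]:.2f}, {direction[1]:.2f})")
print(f"Share of variance kept by one dimension: {explained:.1%}")
```

In dish terms: two ingredients that always rise and fall together can be summarised as a single flavour with very little lost.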
K-means clustering
Unsupervised learning - grouping like data points
Remember your first day at school?
Kids cluster when they are alike.
Classification
Yep, I’m the kid with no mates.
BUT
Outliers can be interesting data points
Classification
Cluster on principal components
Audience on intent
Classification
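A rough sketch of how k-means does the "kids cluster when they are alike" trick, using one made-up number per customer (monthly spend). The spend values, the starting centroids and the name `kmeans_1d` are all assumptions for illustration. Real use would involve more dimensions and a library implementation.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Plain k-means on single numbers, starting from the given centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Step 1: every value joins the group whose centre is closest.
        clusters = [[] for _ in centroids]
        for value in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(value - centroids[i]))
            clusters[nearest].append(value)
        # Step 2: every centre moves to the average of its group.
        # An empty group keeps its old centre.
        centroids = [
            sum(group) / len(group) if group else centroids[i]
            for i, group in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical monthly spend per customer.
spend = [10, 12, 11, 95, 100, 105, 500, 520, 480]

# Three starting guesses taken from the data itself.
centroids, clusters = kmeans_1d(spend, centroids=[10, 100, 500])

for centre, group in zip(centroids, clusters):
    print(f"group centred on {centre:.0f}: {group}")
```

The algorithm only finds the groups. Naming them (for example "new", "high spending" and "commercial") is still a human job.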
Can we “just” model this?
Where do you keep your PhD?
Preparation
Mise en place for your data
>80% of the effort
Data Engineering
Prepping data takes effort
No hiding this fact
Why prep data
Data is oil...blah blah, refine it
Clive Humby
Data is meat...prepare it before it spoils
@strasm
Data Engineering
Structured or Unstructured?
How’s YOUR data today?
Structured/Unstructured
Data Engineering
Models are a bit strict
Models, like bureaucrats, expect input in a specific format...
Data Engineering
Categorical data (unstructured)
Data Engineering
Integer encoding
1 2 3
Data Engineering
Integer encoding
1 2 3
Is a dog 2x a cat?
Data Engineering
One-hot encoding
1 0 0
I haz a cat!
Data Engineering
One-hot encoding
Dimensionality problem
1110000110010001010110100101
Data Engineering
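The two encodings can be sketched side by side. The animal list here is an assumption (the third animal on the slide is not named in the transcript, so "fish" is borrowed from earlier in the deck), as are the names `integer_codes` and `one_hot`.

```python
# Hypothetical categorical column.
animals = ["cat", "dog", "fish", "cat"]

# A fixed, ordered list of the distinct categories.
categories = sorted(set(animals))          # ['cat', 'dog', 'fish']

# Integer encoding: one number per category. Compact, but it implies an order
# and a size ("is a dog 2x a cat?") that does not really exist.
integer_codes = {name: position + 1 for position, name in enumerate(categories)}
integer_encoded = [integer_codes[animal] for animal in animals]


def one_hot(name):
    """One column per category, with a single 1 marking which one this row is."""
    return [1 if name == category else 0 for category in categories]


# One-hot encoding: no fake ordering, but the width grows with every new category.
one_hot_encoded = [one_hot(animal) for animal in animals]

print("integer:", integer_encoded)
print("one-hot:", one_hot_encoded)
print("columns needed for one-hot:", len(categories))
```

With three animals the one-hot rows are three columns wide. With thousands of categories they become thousands of columns wide, which is the dimensionality problem.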
Embedding
2 dimensions rather than 12
Data Engineering
what3words
Data Engineering
Better measurement -> better data
Better data -> better decision making
Does your site measurement look like this?
Data Engineering
● If it moves, fire an event
● Event confetti
● Vanity metrics
● Signal to noise
● Lots to go wrong
● Hard to understand
● Expensive to model
● There is another way...
Measure what matters
Data Engineering
Correlation != causation
This old chestnut...
Spurious correlation
Data Engineering
Zero correlation?
No tactical activation?
Question it and potentially bin it.
Data Engineering
Think long tail removal?
Do these events contribute to signal?
Data Engineering
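One way to ask "do these events contribute to signal?" is to measure how closely each event's daily count moves with daily conversions. The sketch below computes a Pearson correlation by hand. The event names and all of the numbers are invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: +1 moves together, -1 moves opposite, 0 no linear link."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical daily counts over five days.
conversions = [2, 4, 6, 8, 10]
add_to_basket_events = [1, 2, 3, 4, 5]   # rises in step with conversions
scroll_events = [2, 1, 3, 1, 2]          # wanders about on its own

r_basket = pearson(add_to_basket_events, conversions)
r_scroll = pearson(scroll_events, conversions)

print(f"add_to_basket vs conversions: r = {r_basket:.2f}")
print(f"scroll vs conversions:        r = {r_scroll:.2f}")
```

An event with a correlation near zero and no tactical use is a candidate for the bin. A correlation near one is a reason to look closer, not proof that the event causes conversions.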
Supervised or unsupervised?
Learning is fun, kids!
AI is ML with a marketing dept
Let’s talk ML
Machine Learning
Supervised learning
I need that report STAT!
Machine Learning
UNsupervised learning
My data feels weird...can you take a look?
Machine Learning
Reinforcement learning
GOOD BOY!
Machine Learning
I need a PERFECT model
Uh huh...no such thing
We have credit
Machine Learning
Did you spend this?!
Machine Learning
Did I buy stuff?
Machine Learning
How wrong can you be?
Being wrong on so many levels.
Type I & Type II errors
Machine Learning
Credit card history inquisition
Machine Learning
Recall/Precision
How many card transactions can I recall with precision for the last 6 months?
Machine Learning
Recall/Precision
Precision = 33 true positive / (33 true positive + 1 false positive) = 0.971
Recall = 33 true positive / (33 true positive + 14 false negative) = 0.702
Accuracy = (33 true positive + 2 true negative) / 50 total = 0.7
Machine Learning
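The three scores can be checked with a few lines of arithmetic. The counts below are the ones on the slide (33 true positives, 1 false positive, 14 false negatives, 2 true negatives, 50 transactions in total). The variable names are assumptions.

```python
# Counts from the credit card example on the slide.
true_positive = 33    # said "yes, I bought that" and really did
false_positive = 1    # said "yes" but did not (a Type I error)
false_negative = 14   # said "no" but actually did (a Type II error)
true_negative = 2     # said "no" and really did not

total = true_positive + false_positive + false_negative + true_negative

# Of everything I claimed, how much was right?
precision = true_positive / (true_positive + false_positive)

# Of everything that really happened, how much did I remember?
recall = true_positive / (true_positive + false_negative)

# Of all my answers, yes and no, how many were correct?
accuracy = (true_positive + true_negative) / total

print(f"precision = {precision:.3f}")
print(f"recall    = {recall:.3f}")
print(f"accuracy  = {accuracy:.3f}")
```

High precision with lower recall means "what I do remember is right, but I forget a lot", which is why a single score rarely tells the whole story.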
Precision
Machine Learning
Recall
Machine Learning
Accuracy
Machine Learning
Underfitting and overfitting
Finding a balance
Underfitting
Getting stuck on “Mount Stupid”
Just. Doesn’t. Learn!
Machine Learning
Overfitting.
“That’s the way I’ve always done it!”
Unable to generalise.
A student learning by rote can’t handle the exam.
Machine Learning
Underfitting doesn’t learn.
Overfitting doesn’t generalise.
Sweet spot in between.
Machine Learning
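A toy illustration of the three students, under invented data: the "revision notes" follow y = 2x with a bit of noise, and the "exam" asks about points in between. The three models (`underfit`, `rote`, `balanced`) and all numbers are assumptions chosen to make the contrast visible, not a benchmark.

```python
# Hypothetical revision notes: roughly y = 2x, but with noise of plus or minus 1.
train = [(0, 1), (1, 1), (2, 5), (3, 5), (4, 9), (5, 9)]
# Hypothetical exam: unseen questions that follow y = 2x exactly.
exam = [(0.5, 1), (1.5, 3), (2.5, 5), (3.5, 7), (4.5, 9)]


def mse(model, data):
    """Mean squared error: lower is better."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


n = len(train)
mean_x = sum(x for x, _ in train) / n
mean_y = sum(y for _, y in train) / n


def underfit(x):
    """Ignores the question and always answers with the average."""
    return mean_y


def rote(x):
    """Memorises the notes and repeats the answer to the closest question seen."""
    _, nearest_y = min(train, key=lambda point: abs(point[0] - x))
    return nearest_y


# A straight line fitted by least squares: learns the trend, not the noise.
slope = sum((x - mean_x) * (y - mean_y) for x, y in train) / sum(
    (x - mean_x) ** 2 for x, _ in train
)
intercept = mean_y - slope * mean_x


def balanced(x):
    return intercept + slope * x


for name, model in (("underfit", underfit), ("rote", rote), ("balanced", balanced)):
    print(f"{name:9s} notes error {mse(model, train):6.2f}   exam error {mse(model, exam):6.2f}")
```

The rote learner is perfect on its own notes and then stumbles in the exam, the underfit model is poor everywhere, and the simple line sits in the sweet spot.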
Attribution
Peter O'Neill’s favourite
Yay Spurs!
Attribution
How did the team perform?
Danny Rose, Harry Winks, Harry Kane
Attribution
Did Rose and Winks not turn up?
Danny Rose: 0 goals, Harry Winks: 0 goals, Harry Kane: 2 goals
Attribution
Consider a whole season
What’s the performance when they’re all involved in passages of play?
100 goals, 5 goals, 45 goals
Attribution
Shapley Values
4.2, 5.7, 91.2
Attribution
Shapley values for channels as a measure of contribution to total conversions
Channel              Last Click   Mid campaign
Display Campaign A   10           200
Social Campaign B    50           300
Organic              300          250
Direct               500          300
Referral             400          250
Attribution
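The idea behind Shapley values can be sketched for three channels: try every order in which the channels could "join the team", note how much each one adds when it arrives, and average. The conversion figures for each combination below are invented and do not reproduce the numbers on the slides. The names `coalition_value` and `shapley_values` are assumptions.

```python
from itertools import permutations

# Hypothetical conversions achieved by each combination of channels.
coalition_value = {
    frozenset(): 0,
    frozenset({"display"}): 10,
    frozenset({"social"}): 20,
    frozenset({"organic"}): 30,
    frozenset({"display", "social"}): 40,
    frozenset({"display", "organic"}): 50,
    frozenset({"social", "organic"}): 60,
    frozenset({"display", "social", "organic"}): 90,
}


def shapley_values(players, value):
    """Average each player's added value over every possible joining order."""
    totals = {player: 0.0 for player in players}
    orderings = list(permutations(players))
    for order in orderings:
        already_in = frozenset()
        for player in order:
            with_player = already_in | {player}
            # Marginal contribution: what changes when this player joins.
            totals[player] += value[with_player] - value[already_in]
            already_in = with_player
    return {player: total / len(orderings) for player, total in totals.items()}


credit = shapley_values(["display", "social", "organic"], coalition_value)
for channel, share in credit.items():
    print(f"{channel:8s} gets credit for {share:.0f} conversions")
```

The shares add up to the total for the full team, so no conversion is counted twice or lost. Checking every ordering is only practical for a handful of channels; larger problems need sampling or a dedicated library.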
I need to get to Holborn from Rickmansworth via the tube.
Attribution
Today is a bad day on the tube...
Attribution
What route to take?
Attribution
Once I’m in Central London...I have options
Attribution
Change at Kings Cross?
Attribution
Walk from Euston Square?
Attribution
I’ve done this before...when was I on time, and when was I late?
Attribution
Markov Chain to see likelihood of being on time
Attribution
On-time conversion attribution
Walking from Euston Square gets me on time most often.
Attribution
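The commuting story can be sketched as a tiny Markov chain: count how often each step followed another in past journeys, turn the counts into probabilities, then follow the chain to the end. The journey log is made up, the state names are assumptions, and the recursive lookup assumes journeys never loop back on themselves.

```python
from collections import Counter, defaultdict

# Hypothetical log of past journeys, each one a sequence of states.
journeys = [
    ["start", "kings_cross", "late"],
    ["start", "kings_cross", "on_time"],
    ["start", "kings_cross", "late"],
    ["start", "kings_cross", "late"],
    ["start", "euston_square", "on_time"],
    ["start", "euston_square", "on_time"],
    ["start", "euston_square", "on_time"],
    ["start", "euston_square", "late"],
]


def transition_probabilities(paths):
    """Estimate P(next state | current state) from observed paths."""
    counts = defaultdict(Counter)
    for path in paths:
        for here, there in zip(path, path[1:]):
            counts[here][there] += 1
    return {
        state: {nxt: n / sum(seen.values()) for nxt, n in seen.items()}
        for state, seen in counts.items()
    }


def chance_on_time(state, probs):
    """Probability of ending in 'on_time' from this state (no loops assumed)."""
    if state == "on_time":
        return 1.0
    if state == "late":
        return 0.0
    return sum(p * chance_on_time(nxt, probs) for nxt, p in probs[state].items())


probs = transition_probabilities(journeys)
for option in ("kings_cross", "euston_square"):
    print(f"{option}: on time {chance_on_time(option, probs):.0%} of the time")
print(f"overall: {chance_on_time('start', probs):.0%}")
```

Swap stations for marketing channels and "on time" for "converted" and the same counting gives a picture of which steps in a customer journey most often lead to a conversion.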
Conversion attribution for channels
Attribution
How’s your testing going?
Is the test done yet?
What’s a null hypothesis?
Your new default
Start with “I’m wrong”...
Testing
But wait!
Testing
Being confident you’re not wrong.
p<0.05
Testing
p is a measure of surprise
p<0.05
p says I saw a change that I wasn’t expecting (according to the null hypothesis)
Testing
p is a measure of surprise
p=0.5
p says Meh...
Testing
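"p is a measure of surprise" can be worked through with a coin-flip style test. Assume, purely for illustration, that the null hypothesis is "the new variant is no better than a coin flip against the old one", and that we count how many of 20 head-to-head comparisons the variant wins. The scenario, the numbers and the name `two_sided_p_value` are assumptions.

```python
from math import comb


def two_sided_p_value(wins, trials):
    """Chance, if the null (50/50) were true, of a result at least this lopsided."""
    expected = trials / 2
    observed_gap = abs(wins - expected)
    # Count every outcome that is as far from 50/50 as ours, or further.
    as_extreme = sum(
        comb(trials, k)
        for k in range(trials + 1)
        if abs(k - expected) >= observed_gap
    )
    return as_extreme / 2 ** trials


surprising = two_sided_p_value(16, 20)   # variant wins 16 of 20
meh = two_sided_p_value(11, 20)          # variant wins 11 of 20

print(f"16 wins out of 20: p = {surprising:.3f}")
print(f"11 wins out of 20: p = {meh:.3f}")
```

A small p says "a fair coin would rarely do this, so I am surprised". A large p says "a fair coin does this all the time", which is the "Meh" case.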
What you’ve just heard
Things that you can now explain.
● Models
○ How to choose a model
○ Prediction
○ Classification
○ Linear regression
○ Logistic regression
● Classification
○ K-means clustering
● Data engineering
○ Data preparation
○ Dimensionality reduction
■ Principal Component Analysis
■ Embedding
○ Structured/unstructured data
○ Integer and one-hot encoding
○ Correlation != Causation
● Machine Learning
○ Supervised/unsupervised/reinforcement machine learning
○ Type I, Type II errors
○ Precision, Recall, Accuracy
○ Underfitting, overfitting
● Attribution
○ Shapley Values and Markov Chains
● Testing
○ The null hypothesis
○ p-values
Now you can speak Data Science
Good luck at dinner parties
THANK YOU
Doug Hall
Director of analytics
MIGHTYHIVE.COM

SPWK '20 - explaining data science to humans.pptx

  • 1. Explaining advanced data science techniques to humans SuperWeek ‘20
  • 2. “ Hi, it’s still me! Just “Mightier” Doug Hall, Director of analytics
  • 3.
  • 4. “ Everything should be made as simple as possible, but not simpler Einstein
  • 5. Data Science is for everyone Data Science needs to be non-exclusive - high level, not mathy. Everyone gets value!
  • 6. “ Any sufficiently advanced technology is indistinguishable from magic Arthur C. Clarke
  • 7. Techniques appear to be like magical illusions At first appearance data science seems to be nothing short of actual wizardry. As soon as you know the trick, the mystery evaporates - utility grows!
  • 8. “ When a distinguished but elderly scientist states that something is possible, she/he is most certainly right. When she/he states that something is impossible, she/he is very probably wrong Arthur C. Clarke’s first law
  • 9. Opposite is true for HiPPOs We can fix this!
  • 10. The power of analogy, simile and metaphor Beware your inner honey-badger
  • 11. “ As hard as woodpecker lips Busier than a one armed bricklayer Nick Cummins, The Honey Badger - Aussie Rugby-ist
  • 12. Vocabulary matters Don’t speak nerd to non-nerds
  • 13. Do you... Use multiple independent variables in an ANOVA test to reject the null hypothesis?
  • 14. Or do you... Analyse which brand of cereal has most calories?
  • 15. Explain well Use your human words.
  • 16. ● Models ○ How to choose a model ○ Prediction ○ Classification ○ Linear regression ○ Logistic regression ● Classification ○ K-means clustering ● Data engineering ○ Data preparation ○ Dimensionality reduction ■ Principal Component Analysis ■ Embedding ○ Structured/unstructured data ○ Integer and one-hot encoding ○ Correlation != Causation ● Machine Learning ○ Supervised/unsupervised/reinforcement machine learning ○ Type I, Type II errors ○ Precision, Recall, Accuracy ○ Under fitting, over fitting ● Attribution ○ Shapely Values and Markov Chains ● Testing ○ The null hypothesis ○ p-values What you’re about to hear Things that will be explained.
  • 17. BUSINESS QUESTION Protection against data science detail How can we split our customers into different groups to market to? DATA SCIENCE QUESTION How can we run a clustering algorithm to segment customer data? DATA SCIENCE ANSWER A k-means clustering found 3 distinct groups BUSINESS ANSWER Here are 3 types of customers, new, high spending, and commercial @BecomingDataSci
  • 18. Model = decision support tool Contemporary decisioning - making choices in a noisy room, stressed, on fire, doing 1000mph, with 50 other demands on your time while being kicked. It’s hard. Go to your mind palace.
  • 19. Use carefully Don’t abdicate decision making responsibility
  • 20. ½ ark ½ Light @FryRSquared
  • 21. ½ ark ½ Light @FryRSquared
  • 22. ½ ark ½ Light @FryRSquared
  • 23. ½ ark ½ Light @FryRSquared
  • 24. Let’s get started First question from the business is:
  • 25. How do I choose a model? It depends...
  • 26. How do I choose a model? It’s not that simple at first glance
  • 27.
  • 28. What do you want to do? What’s the model for?
  • 29. Prediction Models
  • 30. Classification Models
  • 32. Simplest of crystal balls Goal - prediction: If I have a fifth cup of coffee, how much will my productivity increase by? Goal - explanation: How much did the 6th coffee reduce hours slept? Models
  • 33. Simplest of crystal balls Goal - prediction: If page speed increases, what happens to bounce rate? Goal - explanation: Can a change in page speed explain my change in bounce rate? Models
  • 34. Simplest of crystal balls A most basic model of correlation Models
  • 35. SIMPLE Linear Regression isn’t extrapolation Models
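The page speed and bounce rate question above can be sketched as a simple linear regression. This is a minimal illustration in plain Python, not the speaker's method: the data points and the names `fit_line` and `predict` are hypothetical, and the prediction deliberately refuses to go outside the observed range, since simple linear regression isn't extrapolation.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations: page load time (seconds) vs bounce rate (%).
load_seconds = [1.0, 2.0, 3.0, 4.0, 5.0]
bounce_rate = [30.0, 35.0, 41.0, 44.0, 50.0]

slope, intercept = fit_line(load_seconds, bounce_rate)


def predict(x):
    """Predict bounce rate, but only inside the range we actually observed."""
    if not min(load_seconds) <= x <= max(load_seconds):
        raise ValueError("outside the observed range: that would be extrapolation")
    return slope * x + intercept


print(f"Each extra second adds about {slope:.1f} points of bounce rate")
print(f"Predicted bounce rate at 2.5s: {predict(2.5):.2f}%")
```

The slope answers the "explanation" goal (how much bounce rate moves per second of load time), while `predict` answers the "prediction" goal.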
  • 37. Is it cat? How confident are you? Models
  • 38. Is it cat? How confident are you now? Models
  • 39. Is it cat? More confident? Models
  • 40. Is it cat? Got it? Models
  • 41. It’s a cat! Kittie! Models
  • 42. Is it a puppy? How confident are you? Models
  • 43. Is it a puppy? How confident are you? Models
  • 44. Is it a puppy? How confident are you? Models
  • 45. It’s a puppy! Puppee!!!!!!!! Models
  • 46. Is it a fish? How confident are you? Models
  • 47. Is it a fish? How confident are you? Models
  • 48. Yup, fish... FishEEEEEE!!!!! Models
  • 49. Logistic Regression As bounce rate and device category change, what’s the conversion probability? How can budget and time of day changes to bidding help impression volume for my display campaign? Models
  • 50. Logistic Regression How does the probability of an answer (yes/no) change when 1 or more other things change? Models
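To make the logistic regression question concrete, here is a small sketch of how a yes/no probability moves as inputs change. The weights are hypothetical values chosen for illustration, not fitted from any real data, and `conversion_probability` is an assumed name.

```python
import math


def conversion_probability(bounce_rate, is_mobile, weights):
    """Logistic model: squash a weighted sum into a probability between 0 and 1."""
    z = (
        weights["intercept"]
        + weights["bounce_rate"] * bounce_rate
        + weights["is_mobile"] * is_mobile
    )
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights: higher bounce rate and mobile traffic both lower the odds.
weights = {"intercept": 1.0, "bounce_rate": -4.0, "is_mobile": -0.5}

p_desktop_low_bounce = conversion_probability(0.2, 0, weights)
p_mobile_high_bounce = conversion_probability(0.8, 1, weights)

print(f"Desktop, 20% bounce rate: {p_desktop_low_bounce:.2f}")
print(f"Mobile, 80% bounce rate:  {p_mobile_high_bounce:.2f}")
```

In practice the weights would be learned from historical sessions; the point here is only that the output is a confidence level, like the "is it a cat?" slides, rather than a hard yes or no.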
  • 52. Data looks like this Data Engineering
  • 53. We need data like this Data Engineering
  • 54. DATA SCIENCE WORDS ● Perform an n dimensional linear transformation ● Eigendecomposition of covariance matrices ● Derive eigenvectors and eigenvalues ● Extract Principal Components We just employ Principal Component Analysis Data Engineering
  • 56. What does this taste of? (NOT a trick question!) Data Engineering
  • 57. Chicken Tikka Terrine Sweet Pickled Carrot, Smoked Garlic Yogurt, Beans Kachumber Data Engineering
  • 58. Multiple ingredients Complex, sophisticated palate Just like our data Data Engineering
  • 59. Which ingredients combine to influence the overall flavour of the dish? Sweetness? Savouriness? Heat? Sourness? Umami? Data Engineering
  • 60. Classify based on most important ingredients (variables) Data Engineering
  • 62. Visualising 34 dimensions on a graph Data Engineering
  • 63. ● Visualisation ○ Now you can see the data ● Less computation ○ You get your output faster and cheaper ● Now use it Dimension reduction = simpler visualisation Data Engineering
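The idea of keeping only the most important combination of variables can be sketched with a tiny Principal Component Analysis. This is an assumption-laden toy: it reduces two hypothetical, strongly related measurements to one dimension using the closed-form eigenvalue of a 2x2 covariance matrix, whereas the slide's 34 dimensions would normally be handled by a numerical library. The name `pca_2d` and the data are illustrative.

```python
import math


def pca_2d(points):
    """Return the first principal direction, 1-D scores and explained variance share."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centred = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the sample covariance matrix [[var_x, cov_xy], [cov_xy, var_y]].
    var_x = sum(x * x for x, _ in centred) / (n - 1)
    var_y = sum(y * y for _, y in centred) / (n - 1)
    cov_xy = sum(x * y for x, y in centred) / (n - 1)

    # Largest eigenvalue of a symmetric 2x2 matrix, in closed form.
    half_trace = (var_x + var_y) / 2
    eigenvalue = half_trace + math.sqrt(((var_x - var_y) / 2) ** 2 + cov_xy ** 2)

    # Matching eigenvector; fall back to an axis when the variables are uncorrelated.
    if cov_xy != 0:
        direction = (cov_xy, eigenvalue - var_x)
    else:
        direction = (1.0, 0.0) if var_x >= var_y else (0.0, 1.0)
    norm = math.hypot(direction[0], direction[1])
    direction = (direction[0] / norm, direction[1] / norm)

    # Project every centred point onto the principal direction.
    scores = [x * direction[0] + y * direction[1] for x, y in centred]
    explained = eigenvalue / (var_x + var_y)
    return direction, scores, explained


# Hypothetical data: two measurements that mostly move together.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.0)]
direction, scores, explained = pca_2d(points)
print(f"One component keeps {explained:.1%} of the variance")
```

When one component keeps nearly all of the variance, plotting or modelling that single score loses very little, which is the "simpler visualisation, less computation" trade described above.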
  • 64. K-means clustering Unsupervised learning - grouping like data points
  • 65. Remember your first day at school? Kids cluster when they are alike. Classification
  • 66. Yep, I’m the kid with no mates. BUT Outliers can be interesting data points Classification
  • 67. Cluster on principal components Audience on intent Classification
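The "kids cluster when they are alike" idea can be sketched as a bare-bones k-means loop. The points and starting centroids below are hypothetical, and real projects would usually rely on a library implementation with smarter initialisation; this version only shows the assign-then-recentre cycle.

```python
def kmeans(points, centroids, max_iterations=20):
    """Plain k-means: assign each point to its nearest centroid, then recentre."""
    clusters = [[] for _ in centroids]
    for _ in range(max_iterations):
        # Assignment step: every point joins the group with the closest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            distances = [
                sum((a - b) ** 2 for a, b in zip(point, centroid))
                for centroid in centroids
            ]
            clusters[distances.index(min(distances))].append(point)

        # Update step: move each centroid to the mean of its group.
        new_centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
        if new_centroids == centroids:
            break  # nothing moved, so the grouping is stable
        centroids = new_centroids
    return centroids, clusters


# Hypothetical customers described by two scores (for example, two principal components).
points = [
    (1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
    (8.0, 8.0), (9.0, 8.5), (8.5, 9.0),
    (1.0, 9.0), (1.5, 8.5), (2.0, 9.0),
]
initial_centroids = [points[0], points[3], points[6]]

centroids, clusters = kmeans(points, initial_centroids)
for centroid, cluster in zip(centroids, clusters):
    print(centroid, cluster)
```

No labels are supplied at any point, which is what makes this unsupervised: the groups come purely from which points sit close together.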
  • 68. Can we “just” model this? Where do you keep your PhD?
  • 69. Preparation Mise en place for your data >80% of the effort Data Engineering
  • 70. Prepping data takes effort No hiding this fact
  • 71. Why prep data Data is oil...blah blah, refine it Clive Humby Data is meat...prepare it before it spoils @strasm Data Engineering
  • 73. Structured/ Unstructured Data Engineering
  • 74. Structured/ Unstructured Data Engineering
  • 75. Structured/ Unstructured Data Engineering
  • 76. Models are a bit strict Models, like bureaucrats, expect input in a specific format... Data Engineering
  • 77. Categorical data (unstructured) Data Engineering
  • 78. Integer encoding 1 2 3 Data Engineering
  • 79. Integer encoding 1 2 3 Is a dog 2x a cat? Data Engineering
  • 80. One hot encoding 1 0 0 I haz a cat! Data Engineering
  • 81. One hot encoding Dimensionality problem 1110000110010001010110100101 Data Engineering
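The difference between integer and one-hot encoding is easy to see in a few lines. This sketch uses a hypothetical list of pets; the names `integer_codes` and `one_hot` are illustrative.

```python
# Hypothetical categorical column.
animals = ["cat", "dog", "fish", "dog"]
categories = sorted(set(animals))  # ['cat', 'dog', 'fish']

# Integer encoding: compact, but implies an order (is a dog 2x a cat?).
integer_codes = {name: index + 1 for index, name in enumerate(categories)}
integer_encoded = [integer_codes[animal] for animal in animals]


def one_hot(value, categories):
    """One column per category, with a single 1 marking the match."""
    return [1 if value == category else 0 for category in categories]


# One-hot encoding: no false ordering, but one extra column per category.
one_hot_encoded = [one_hot(animal, categories) for animal in animals]

print(integer_encoded)
print(one_hot_encoded)
```

With three categories the one-hot rows are three columns wide; with thousands of categories they become thousands of columns wide, which is the dimensionality problem that embeddings are meant to ease.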
  • 82. Embedding 2 dimensions rather than 12 Data Engineering
  • 83. what3words Data Engineering
  • 84. Better measurement -> better data Better data -> better decision making
  • 85. Does your site measurement look like this? Data Engineering
  • 86. ● If it moves, fire an event ● Event confetti ● Vanity metrics ● Signal to noise ● Lots to go wrong ● Hard to understand ● Expensive to model ● There is another way.. Measure what matters Data Engineering
  • 87. Correlation != causation This old chestnut...
  • 88. Spurious correlation Data Engineering
  • 89. Zero correlation? No tactical activation? Question it and potentially bin it. Data Engineering
  • 90. Think long tail removal? Do these events contribute to signal? Data Engineering
  • 92. AI is ML with a marketing dept
  • 93. Let’s talk ML Machine Learning
  • 94. Supervised learning I need that report STAT! Machine Learning
  • 95. UNsupervised learning My data feels weird...can you take a look? Machine Learning
  • 96. Reinforcement learning GOOD BOY! Machine Learning
  • 97. I need a PERFECT model Uh huh...no such thing
  • 98. We have credit Machine Learning
  • 99. Did you spend this?! Machine Learning
  • 100. Did I buy stuff? Machine Learning
  • 101. How wrong can you be? Being wrong on so many levels.
  • 102. Type I & Type II errors Machine Learning
  • 104. Recall/ Precision How many card transactions can I recall with precision for the last 6 months? Machine Learning
  • 105. Recall/ Precision. Precision = 33 true positive / (33 true positive + 1 false positive) = 0.971. Recall = 33 true positive / (33 true positive + 14 false negative) = 0.702. Accuracy = (33 true positive + 2 true negative) / 50 total = 0.7. Machine Learning
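The three scores on this slide follow directly from the four counts in a confusion matrix. The sketch below reuses the slide's own numbers (33 true positives, 1 false positive, 14 false negatives, 2 true negatives); only the variable names are assumptions.

```python
# Counts taken from the slide: 50 card transactions in total.
true_positive = 33
false_positive = 1   # Type I error: flagged, but should not have been
false_negative = 14  # Type II error: missed, but should have been flagged
true_negative = 2

total = true_positive + false_positive + false_negative + true_negative

# Of everything I flagged, how much was right?
precision = true_positive / (true_positive + false_positive)

# Of everything I should have flagged, how much did I find?
recall = true_positive / (true_positive + false_negative)

# Of all transactions, how many did I get right either way?
accuracy = (true_positive + true_negative) / total

print(f"precision={precision:.3f} recall={recall:.3f} accuracy={accuracy:.1f}")
```

High precision with lower recall, as here, means the positives that are reported can be trusted, but a fair number of real ones are being missed.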
  • 110. Underfitting Getting stuck on “Mount Stupid” Just. Doesn’t. Learn! Machine Learning
  • 111. Overfitting. “That’s the way I’ve always done it!” Unable to generalise. A student learning by rote can’t handle the exam. Machine Learning
  • 113. Underfitting doesn’t learn. Overfitting doesn’t generalise. Sweet spot in between. Machine Learning
  • 115. Yay Spurs! Attribution
  • 116. How did the team perform? Danny Rose Harry Winks Harry Kane Attribution
  • 117. Did Rose and Winks not turn up? 0 goals 0 goals 2 goals Attribution
  • 118. Consider a whole season What’s the performance when they’re all involved in passages of play? 100 goals 5 goals 45 goals Attribution
  • 119. Shapley Values 4.2 5.7 91.2 Attribution
  • 120. Shapley values for channels as a measure of contribution to total conversions. Channel: Last Click / Mid campaign. Display Campaign A: 10 / 200. Social Campaign B: 50 / 300. Organic: 300 / 250. Direct: 500 / 300. Referral: 400 / 250. Attribution
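Shapley values share out a total by averaging each player's marginal contribution over every possible order of joining the team. The sketch below computes them exactly for three marketing channels. The conversion counts per channel combination are hypothetical, invented purely to show the mechanics, and `shapley_values` is an assumed name.

```python
from itertools import permutations


def shapley_values(players, value):
    """Average each player's marginal contribution over all join orders."""
    totals = {player: 0.0 for player in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for player in order:
            with_player = coalition | {player}
            totals[player] += value(with_player) - value(coalition)
            coalition = with_player
    return {player: total / len(orderings) for player, total in totals.items()}


# Hypothetical conversions observed for each combination of channels.
conversions = {
    frozenset(): 0,
    frozenset({"display"}): 10,
    frozenset({"social"}): 20,
    frozenset({"organic"}): 40,
    frozenset({"display", "social"}): 45,
    frozenset({"display", "organic"}): 60,
    frozenset({"social", "organic"}): 70,
    frozenset({"display", "social", "organic"}): 100,
}

channels = ["display", "social", "organic"]
credit = shapley_values(channels, conversions.__getitem__)
for channel, share in credit.items():
    print(f"{channel}: {share:.1f}")
```

The shares always add up to the total achieved by the full team, so every conversion is credited exactly once, including to players who rarely "score" themselves but make others more effective. The exhaustive loop over orderings grows factorially, so real attribution tools typically approximate it for more than a handful of channels.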
  • 121. I need to get to Holborn from Rickmansworth via the tube. Attribution
  • 122. Today is a bad day on the tube... Attribution
  • 123. What route to take? Attribution
  • 124. Once I’m in Central London...I have options Attribution
  • 125. Change at Kings Cross? Attribution
  • 126. Walk from Euston Square? Attribution
  • 127. I’ve done this before...when was I on time, and when was I late? Attribution
  • 128. Markov Chain to see likelihood of being on time Attribution
  • 129. On-time conversion attribution Walking from Euston Square gets me on time most often. Attribution
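The commute analogy can be sketched as a small Markov chain: each state has probabilities of moving to the next state, and the journey ends in either "On time" or "Late". The transition probabilities below are hypothetical stand-ins for the counts one would gather from past journeys, and `on_time_probability` is an illustrative name.

```python
# Hypothetical transition probabilities estimated from past commutes.
transitions = {
    "Rickmansworth": {"Kings Cross": 0.5, "Euston Square": 0.5},
    "Kings Cross": {"On time": 0.6, "Late": 0.4},
    "Euston Square": {"On time": 0.8, "Late": 0.2},
    "On time": {"On time": 1.0},  # absorbing state: the journey is over
    "Late": {"Late": 1.0},        # absorbing state: the journey is over
}


def on_time_probability(start, transitions, steps=10):
    """Push probability mass through the chain and read off the 'On time' share."""
    distribution = {start: 1.0}
    for _ in range(steps):
        next_distribution = {}
        for state, mass in distribution.items():
            for target, probability in transitions[state].items():
                next_distribution[target] = (
                    next_distribution.get(target, 0.0) + mass * probability
                )
        distribution = next_distribution
    return distribution.get("On time", 0.0)


for start in ("Rickmansworth", "Kings Cross", "Euston Square"):
    print(f"{start}: {on_time_probability(start, transitions):.2f}")
```

Swapping stations for marketing channels and "On time" for "Converted" gives the channel attribution view: the chain shows which touchpoints most often lead to the outcome you care about.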
  • 130. Conversion attribution for channels Attribution
  • 131. Conversion attribution for channels Attribution
  • 132. How’s your testing going? Is the test done yet?
  • 133. What’s a null hypothesis? Your new default
  • 134. Start with “I’m wrong”... Testing
  • 135. Testing
  • 136. But wait! Testing
  • 137. Being confident you’re not wrong. p<0.05 Testing
  • 138. p is a measure of surprise p<0.05 p says I saw a change that I wasn’t expecting (according to the null hypothesis) Testing
  • 139. p is a measure of surprise p=0.5 p says Meh…. Testing
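"p is a measure of surprise" can be made concrete with an exact binomial test. The sketch assumes a null hypothesis that the conversion rate is still 10%, then asks how likely it would be to see at least the observed number of conversions among 100 visitors if that were true. The rate, the visitor count and both observed results are hypothetical, as is the name `p_value_at_least`.

```python
from math import comb


def p_value_at_least(observed, visitors, null_rate):
    """Chance of seeing `observed` or more conversions if the null hypothesis holds."""
    return sum(
        comb(visitors, k) * null_rate ** k * (1 - null_rate) ** (visitors - k)
        for k in range(observed, visitors + 1)
    )


visitors = 100
null_rate = 0.10  # null hypothesis: nothing changed, we still convert 10%

p_surprising = p_value_at_least(18, visitors, null_rate)  # a big jump
p_meh = p_value_at_least(11, visitors, null_rate)         # barely above expectation

print(f"18 conversions: p = {p_surprising:.3f}")
print(f"11 conversions: p = {p_meh:.3f}")
```

A small p says the result would be surprising if the null hypothesis were true, so the "I'm wrong" default looks shaky; a large p says the result is the kind of thing chance produces all the time.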
  • 140. What you’ve just heard Things that you can now explain. ● Models ○ How to choose a model ○ Prediction ○ Classification ○ Linear regression ○ Logistic regression ● Classification ○ K-means clustering ● Data engineering ○ Data preparation ○ Dimensionality reduction ■ Principal Component Analysis ■ Embedding ○ Structured/unstructured data ○ Integer and one-hot encoding ○ Correlation != Causation ● Machine Learning ○ Supervised/unsupervised/reinforcement machine learning ○ Type I, Type II errors ○ Precision, Recall, Accuracy ○ Underfitting, overfitting ● Attribution ○ Shapley Values and Markov Chains ● Testing ○ The null hypothesis ○ p-values
  • 141. Now you can speak Data Science Good luck at dinner parties
  • 142. THANK YOU Doug Hall, Director of analytics MIGHTYHIVE.COM