7. Techniques appear to be like magical illusions
At first glance, data science seems to be nothing short of actual wizardry. As soon as you know the trick, the
mystery evaporates and the utility grows!
8. “When a distinguished but elderly scientist states that something is possible, she/he is almost certainly right. When she/he states that something is impossible, she/he is very probably wrong.”
Arthur C. Clarke’s first law
16. What you’re about to hear
Things that will be explained.
● Models
○ How to choose a model
○ Prediction
○ Classification
○ Linear regression
○ Logistic regression
● Clustering
○ K-means clustering
● Data engineering
○ Data preparation
○ Dimensionality reduction
■ Principal Component Analysis
■ Embedding
○ Structured/unstructured data
○ Integer and one-hot encoding
○ Correlation != Causation
● Machine Learning
○ Supervised/unsupervised/reinforcement machine learning
○ Type I, Type II errors
○ Precision, Recall, Accuracy
○ Underfitting, overfitting
● Attribution
○ Shapley Values and Markov Chains
● Testing
○ The null hypothesis
○ p-values
17. Protection against data science detail
BUSINESS QUESTION: How can we split our customers into different groups to market to?
DATA SCIENCE QUESTION: How can we run a clustering algorithm to segment customer data?
DATA SCIENCE ANSWER: A k-means clustering found 3 distinct groups
BUSINESS ANSWER: Here are 3 types of customers: new, high spending, and commercial
@BecomingDataSci
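To make the "data science answer" less magical, here is a minimal k-means sketch in plain Python. The customer figures (orders per year, average order value) are invented for illustration, and the deterministic "farthest point" starting rule is an assumption chosen so the example is repeatable; real projects would normally use a library implementation.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=10):
    # Deterministic start (an assumption for repeatability): take the first
    # point, then keep adding the point farthest from the centroids so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, clusters

# Hypothetical customers: (orders per year, average order value)
customers = [
    (1, 20), (2, 25), (1, 30),            # look like new customers
    (10, 200), (12, 220), (11, 210),      # look like high spenders
    (50, 1000), (55, 1100), (52, 1050),   # look like commercial accounts
]
centroids, clusters = kmeans(customers, k=3)
for centre, members in zip(centroids, clusters):
    print(centre, members)
```

The algorithm only returns three groups of numbers. Calling them "new", "high spending" and "commercial" is the business answer, and that naming is still a human job.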
18. Model = decision support tool
Contemporary decisioning - making choices in a noisy room, stressed, on fire, doing 1000mph,
with 50 other demands on your time while being kicked.
It’s hard.
Go to your mind palace.
32. ½ ark ½ Light
Simplest of
crystal balls
Goal - prediction:
If I have a fifth cup of coffee, how much will my productivity increase by?
Goal - explanation:
How much did the sixth coffee reduce hours slept?
Models
33.
Simplest of crystal balls
Goal - prediction:
If page speed increases, what happens to bounce rate?
Goal - explanation:
Can a change in page speed explain my change in bounce rate?
Models
34.
Simplest of
crystal balls
A most basic model of correlation
Models
35.
SIMPLE Linear
Regression isn’t
extrapolation
Models
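The coffee question can be sketched as a simple linear regression in a few lines of Python. The observations below are made up for illustration, and the `fit_line` and `predict` helpers are hypothetical names, not part of any library. The guard in `predict` reflects the point that a fitted line is not a licence to extrapolate.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input: returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical observations: cups of coffee drunk and tasks completed that day.
cups = [1, 2, 3, 4, 5, 6]
tasks = [3, 5, 6, 8, 9, 11]
slope, intercept = fit_line(cups, tasks)

def predict(x):
    # Refuse to extrapolate: the line is only supported by the range we observed.
    if not min(cups) <= x <= max(cups):
        raise ValueError("outside the observed range")
    return intercept + slope * x

print(f"Each extra cup is associated with about {slope:.2f} more tasks")
print(f"Predicted tasks at 5 cups: {predict(5):.1f}")
```

The slope answers the prediction question (how much more per cup). Whether coffee actually causes the change is a separate question that the line alone cannot settle.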
42.
Is it a puppy?
How confident are you?
Models
43.
Is it a puppy?
How confident are you?
Models
44.
Is it a puppy?
How confident are you?
Models
45.
It’s a puppy!
Puppee!!!!!!!!
Models
46.
Is it a fish?
How confident are you?
Models
47.
Is it a fish?
How confident are you?
Models
48.
Yup, fish...
FishEEEEEE!!!!!
Models
49.
Logistic Regression
As bounce rate and device category change, what’s the conversion probability?
How can budget and time-of-day changes to bidding help impression volume for my display campaign?
Models
50.
Logistic Regression
How does the probability of an answer (yes/no) change when 1 or more other things change?
Models
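A rough sketch of that yes/no idea: logistic regression with one input, fitted by plain gradient descent. The session data, the learning rate and the iteration count are all assumptions picked for the illustration; a statistics library would normally do the fitting.

```python
import math

def sigmoid(z):
    # Squashes any number into a probability between 0 and 1.
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical sessions: pages viewed, and whether the visit converted (1) or not (0).
pages = [1, 2, 3, 4, 5, 6, 7, 8]
converted = [0, 0, 0, 1, 0, 1, 1, 1]

weight, bias = 0.0, 0.0
learning_rate = 0.1  # assumed value, small enough to be stable on this data
for _ in range(5000):
    grad_w = grad_b = 0.0
    for x, y in zip(pages, converted):
        error = sigmoid(weight * x + bias) - y  # predicted probability minus outcome
        grad_w += error * x
        grad_b += error
    weight -= learning_rate * grad_w / len(pages)
    bias -= learning_rate * grad_b / len(pages)

def conversion_probability(x):
    return sigmoid(weight * x + bias)

for x in (1, 4, 8):
    print(f"{x} pages viewed -> P(convert) = {conversion_probability(x):.2f}")
```

The output is a probability rather than a yes/no, which is the "how confident are you?" from the puppy and fish slides.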
52.
Data looks like
this
Data Engineering
53.
We need data
like this
Data Engineering
54. DATA SCIENCE WORDS
● Perform an n-dimensional linear transformation
● Eigendecomposition of covariance matrices
● Derive eigenvectors and eigenvalues
● Extract Principal Components
We just employ Principal Component Analysis
Data Engineering
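Behind the long words, the recipe is short. The sketch below centres the data, builds the covariance matrix, and finds its dominant eigenvector with power iteration, which is one simple way to get the first principal component. The two-column data set and the function name are invented for illustration; libraries usually do this with a full eigendecomposition or SVD.

```python
import math

def first_principal_component(rows, iterations=100):
    n, d = len(rows), len(rows[0])
    # 1. Centre each column on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    # 2. Covariance matrix of the centred columns.
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    # 3. Power iteration: repeatedly multiply and normalise to find the
    #    eigenvector with the largest eigenvalue.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    explained = eigenvalue / sum(cov[i][i] for i in range(d))
    # 4. Project every row onto the component: d columns become 1 score.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centred]
    return v, explained, scores

# Hypothetical two-column data that mostly varies along one direction.
rows = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8)]
component, explained, scores = first_principal_component(rows)
print("direction:", [round(x, 3) for x in component])
print(f"share of variance kept: {explained:.1%}")
```

Here two columns collapse into one score per row while keeping almost all of the variation, which is the whole point of the exercise.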
56.
What does this
taste of?
(NOT a trick
question!)
Data Engineering
57.
Chicken Tikka
Terrine
Sweet Pickled
Carrot, Smoked
Garlic Yogurt,
Beans
Kachumber
Data Engineering
58.
Multiple
ingredients
Complex,
sophisticated
palate
Just like our
data
Data Engineering
59.
Which
ingredients
combine to
influence the
overall flavour
of the dish?
Sweetness?
Savouriness?
Heat?
Sourness?
Umami?
Data Engineering
60.
Classify based
on most
important
ingredients
variables
Data Engineering
62.
Visualising 34
dimensions on
a graph
Data Engineering
63. ● Visualisation
○ Now you can see the data
● Less computation
○ You get your output faster and cheaper
● Now use it
Dimension reduction = simpler visualisation
Data Engineering
85.
Does your site
measurement
look like this?
Data Engineering
86. ● If it moves, fire an event
● Event confetti
● Vanity metrics
● Signal to noise
● Lots to go wrong
● Hard to understand
● Expensive to model
● There is another way...
Measure what matters
Data Engineering
110.
Underfitting
Getting stuck
on “Mount
Stupid”
Just.
Doesn’t.
Learn!
Machine Learning
111.
Overfitting.
“That’s the way
I’ve always
done it!”
Unable to
generalise.
A student
learning by
rote can’t
handle the
exam.
Machine Learning
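The two failure modes can be shown side by side with a toy comparison. Everything here is an assumption made for illustration: the data follow the trend y = 2x with an alternating +1/-1 wobble, the "underfit" model always predicts the training average, and the "rote" model memorises the training set and repeats the answer of the nearest example it has seen.

```python
def mse(predict, xs, ys):
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Training data: the trend y = 2x plus a deterministic wobble standing in for noise.
train_x = list(range(10))
train_y = [2 * x + (1 if x % 2 == 0 else -1) for x in train_x]
# "Exam" data: unseen x values measured on the underlying trend.
test_x = [x + 0.25 for x in train_x]
test_y = [2 * x for x in test_x]

# Underfit: ignores x entirely and always answers with the training average.
average = sum(train_y) / len(train_y)
def underfit(x):
    return average

# Overfit: rote learning, repeats the memorised answer of the nearest training x.
def rote(x):
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

# Reasonable fit: a least-squares straight line.
mean_x = sum(train_x) / len(train_x)
slope = (sum((x - mean_x) * (y - average) for x, y in zip(train_x, train_y))
         / sum((x - mean_x) ** 2 for x in train_x))
def line(x):
    return average + slope * (x - mean_x)

results = {}
for name, model in [("underfit", underfit), ("rote", rote), ("line", line)]:
    results[name] = (mse(model, train_x, train_y), mse(model, test_x, test_y))
    print(f"{name:9s} train error {results[name][0]:6.2f}   test error {results[name][1]:6.2f}")
```

The rote learner scores perfectly on what it has already seen and worse on the exam; the underfit model is poor on both; the straight line is slightly wrong everywhere but generalises.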
116.
How did the
team perform?
Danny Rose Harry Winks Harry Kane
Attribution
117.
Did Rose and
Winks not turn
up?
0 goals 0 goals 2 goals
Attribution
118.
Consider a
whole season
What’s the
performance
when they’re
all involved in
passages of
play?
100 goals
5 goals
45 goals
Attribution
119.
Shapley Values
4.2 5.7
91.2
Attribution
120.
Shapley values for channels as a measure of contribution to total conversions
Channel              Last Click   Mid campaign
Display Campaign A           10            200
Social Campaign B            50            300
Organic                     300            250
Direct                      500            300
Referral                    400            250
Attribution
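A minimal sketch of how Shapley values are computed: for every possible order in which the players (here, channels) could join, record how much each one adds when it arrives, then average. The channel names and the conversion counts for each combination are hypothetical and are not the figures from the slides.

```python
from itertools import permutations

def shapley_values(players, value):
    """Average each player's marginal contribution over every joining order."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        seen = frozenset()
        for p in order:
            totals[p] += value(seen | {p}) - value(seen)
            seen = seen | {p}
    return {p: total / len(orderings) for p, total in totals.items()}

# Hypothetical conversions observed when only these channels were involved.
conversions = {
    frozenset(): 0,
    frozenset({"display"}): 10,
    frozenset({"social"}): 20,
    frozenset({"organic"}): 30,
    frozenset({"display", "social"}): 40,
    frozenset({"display", "organic"}): 50,
    frozenset({"social", "organic"}): 60,
    frozenset({"display", "social", "organic"}): 90,
}

credit = shapley_values(["display", "social", "organic"], lambda s: conversions[s])
print(credit)
```

The credits always add up to the total for the full team (90 here), so nothing is double counted and nobody who helped gets zero just because they did not score last. The number of orderings grows factorially, so real channel mixes usually need sampling or approximation.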
121.
I need to get to
Holborn from
Rickmansworth
via the tube.
Attribution
122.
Today is a bad
day on the
tube...
Attribution
123.
What route to
take?
Attribution
124.
Once I’m in
Central
London...I have
options
Attribution
125.
Change at
Kings Cross?
Attribution
126.
Walk from
Euston Square?
Attribution
127.
I’ve done this
before...when
was I on time,
and when was I
late?
Attribution
128.
Markov Chain
to see
likelihood of
being on time
Attribution
129.
On-time
conversion
attribution
Walking from Euston Square gets me on
time most often.
Attribution
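The tube story can be sketched as a small Markov chain: count how often each step followed another in past journeys, turn the counts into probabilities, and follow them through to "On time" or "Late". The journey log below is invented for illustration, and the recursion assumes a route never loops back on itself.

```python
from collections import Counter, defaultdict

# Hypothetical log of past journeys, each ending "On time" or "Late".
journeys = [
    ["Rickmansworth", "Kings Cross", "On time"],
    ["Rickmansworth", "Kings Cross", "Late"],
    ["Rickmansworth", "Kings Cross", "Late"],
    ["Rickmansworth", "Euston Square", "On time"],
    ["Rickmansworth", "Euston Square", "On time"],
    ["Rickmansworth", "Euston Square", "On time"],
    ["Rickmansworth", "Euston Square", "Late"],
]

# Count every observed step, then normalise the counts into probabilities.
counts = defaultdict(Counter)
for path in journeys:
    for here, there in zip(path, path[1:]):
        counts[here][there] += 1
transitions = {
    state: {nxt: c / sum(following.values()) for nxt, c in following.items()}
    for state, following in counts.items()
}

def p_on_time(state):
    # "On time" and "Late" are end states; everything else passes its odds along.
    if state == "On time":
        return 1.0
    if state == "Late":
        return 0.0
    return sum(p * p_on_time(nxt) for nxt, p in transitions[state].items())

for stop in ("Kings Cross", "Euston Square", "Rickmansworth"):
    print(f"{stop}: P(on time) = {p_on_time(stop):.2f}")
```

Swap stations for marketing channels and "On time" for "Converted" and the same bookkeeping gives a channel-level view of which touchpoints tend to lead to conversion.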
130.
Conversion
attribution for
channels
Attribution
131.
Conversion
attribution for
channels
Attribution
137.
Being
confident
you’re not
wrong.
p<0.05
Testing
138.
p is a measure
of surprise
p<0.05
p says I saw a change
that I wasn’t expecting
(according to the null
hypothesis)
Testing
139.
p is a measure
of surprise
p=0.5
p says Meh…
Testing
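"Surprise" can be put into numbers with a small exact calculation. The sketch below assumes a null hypothesis that each visitor converts with probability 0.5 and asks how likely it is to see at least the observed number of conversions by chance alone. The counts and the function name are made up for illustration, and this is a one-sided binomial test rather than whatever test a given tool runs.

```python
from math import comb

def p_value_at_least(successes, trials, null_rate=0.5):
    """Chance of seeing this many successes or more if the null hypothesis is true."""
    return sum(
        comb(trials, k) * null_rate ** k * (1 - null_rate) ** (trials - k)
        for k in range(successes, trials + 1)
    )

surprising = p_value_at_least(60, 100)  # 60 conversions out of 100 visitors
meh = p_value_at_least(52, 100)         # 52 conversions out of 100 visitors

print(f"60/100: p = {surprising:.3f}")
print(f"52/100: p = {meh:.3f}")
```

A small p means the result would be rare if nothing had really changed. It does not measure how big or how valuable the change is.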
140. What you’ve
just heard
Things that you can now explain.
● Models
○ How to choose a model
○ Prediction
○ Classification
○ Linear regression
○ Logistic regression
● Clustering
○ K-means clustering
● Data engineering
○ Data preparation
○ Dimensionality reduction
■ Principal Component Analysis
■ Embedding
○ Structured/unstructured data
○ Integer and one-hot encoding
○ Correlation != Causation
● Machine Learning
○ Supervised/unsupervised/reinforcement machine learning
○ Type I, Type II errors
○ Precision, Recall, Accuracy
○ Underfitting, overfitting
● Attribution
○ Shapley Values and Markov Chains
● Testing
○ The null hypothesis
○ p-values
141. Now you can speak Data Science
Good luck at dinner parties