4. “Data Scientist: The Sexiest Job
of the 21st Century”? Perhaps not! We will all
work as data specialists.
https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/
6. • Most startups don’t know what they’ll be
when they grow up.
• Markets get disrupted by data.
• Evaluate data to understand what’s working
and what’s not.
13. –Avinash Kaushik
“Analytics is the analysis of qualitative and
quantitative data from your business and
competition to drive continual improvement
of the online experience that your customers
and potential customers have which translates to
your desired outcomes, both online and offline.”
16. –Pat Symonds
“Bloody hell, it's so different these days. I used to be
able to have dinner with the drivers on Friday night!
Nowadays they spend evenings mulling over graphs
and spreadsheets to inform their race preparation.”
17. • 150 sensors per car
• 1,000 data points per lap
• 200 Gigabytes move from track and factory
every race weekend
• 50 Data Engineers per race
23. A good metric is…
• comparable to another time period, group,
competitor,…
• understandable for the target audience.
• a ratio or rate
• targeted to the right audience (developers, marketers,
business development,…)
• a behaviour changer.
24. • Qualitative metrics are unstructured,
anecdotal, revealing and hard to aggregate. (Discover)
• Quantitative metrics are numbers and stats:
hard facts but fewer insights. (Proof)
27. VANITY METRICS
• Vanity metrics make you feel good but don’t
change how you’ll act.
• Vanity metrics are bad!
28. • Hits
• PageViews
• Visits
• Users
• Followers/friends/likes
• Logins
Only for Ad Inventory
Count People
Cross it with People
Why they stuck or left?
Count actions!
What are the actions and value?
34. CORRELATION
• Use R, Excel, Numbers or online tools to calculate the strength of the linear relationship between
two variables.
• Interpret the correlation coefficient r:
• ±1: a perfect linear relationship
• ±0.70: a strong linear relationship
• ±0.50: a moderate linear relationship
• ±0.30: a weak linear relationship
• 0: no linear relationship
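As a rough illustration of the calculation and the interpretation scale above, here is a minimal Python sketch that uses only the standard library. The function names `pearson_r` and `describe`, the sample numbers, and the choice to treat each listed value as a lower bound for its label are all assumptions made for this example, not part of the slides.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)

def describe(r):
    """Label |r| with the slide's scale (cutoffs as lower bounds: an assumption)."""
    strength = abs(r)
    if math.isclose(strength, 1.0, abs_tol=1e-9):
        return "perfect"
    if strength >= 0.70:
        return "strong"
    if strength >= 0.50:
        return "moderate"
    if strength >= 0.30:
        return "weak"
    return "little or no"

# Hypothetical data: weekly ad spend vs. weekly signups.
ad_spend = [100, 150, 200, 250, 300, 350]
signups = [12, 15, 21, 22, 30, 31]
r = pearson_r(ad_spend, signups)
print(f"r = {r:.2f} ({describe(r)} linear relationship)")
```

The sign of r gives the direction of the relationship; the label only describes its strength.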
35. STANDARD DEVIATION
• Standard Deviation (SD) is a measure that is used to
quantify the amount of variation or dispersion of a set
of data values.
• A standard deviation close to 0 indicates that the data
points tend to be very close to the mean of the set,
while a high standard deviation indicates that the data
points are spread out over a wider range of values.
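To make the contrast concrete, the sketch below compares two made-up series of daily signups that share the same mean but differ in spread. It uses `statistics.pstdev` (population standard deviation); using the population rather than the sample formula is an assumption for this example, and the data and the helper name `summarize` are hypothetical.

```python
import statistics

def summarize(values):
    """Return (mean, population standard deviation) of a list of numbers."""
    return statistics.mean(values), statistics.pstdev(values)

# Hypothetical daily signups: same mean, very different dispersion.
steady = [50, 51, 49, 50, 50]
spiky = [10, 90, 20, 80, 50]

for name, series in (("steady", steady), ("spiky", spiky)):
    mean, sd = summarize(series)
    print(f"{name}: mean = {mean:.1f}, SD = {sd:.2f}")
```

Both series average 50 signups a day, but only the standard deviation shows that the second one swings widely around that mean.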
36. STATISTICAL SIGNIFICANCE
• Statistical significance is fundamental to statistical
hypothesis testing (deciding whether a null
hypothesis should be rejected or retained).
• In any experiment or observation that involves
drawing a sample from a population, there is
always the possibility that an observed effect
would have occurred due to sampling error alone.
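A small simulation can show what "sampling error alone" looks like. In this sketch, two groups are drawn from the same population with an identical true conversion rate, and we count how often they still differ by at least two percentage points. The function name `chance_gap_rate`, the 10% rate, the sample sizes, and the fixed random seed are all assumptions chosen for illustration.

```python
import random

def chance_gap_rate(true_rate, sample_size, min_gap, trials=500, seed=42):
    """Share of trials in which two samples with the SAME true rate
    differ by at least min_gap purely through sampling error."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        a = sum(rng.random() < true_rate for _ in range(sample_size))
        b = sum(rng.random() < true_rate for _ in range(sample_size))
        if abs(a - b) / sample_size >= min_gap:
            hits += 1
    return hits / trials

# Same true rate (10%) in both groups; only the sample size changes.
small_sample = chance_gap_rate(0.10, sample_size=200, min_gap=0.02)
large_sample = chance_gap_rate(0.10, sample_size=2000, min_gap=0.02)
print(f"n=200:  gap of 2+ points in {small_sample:.0%} of trials")
print(f"n=2000: gap of 2+ points in {large_sample:.0%} of trials")
```

With small samples, a gap that looks like a real effect appears by chance far more often, which is why a test of significance is needed before acting on a difference.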
38. Find correlation → Test causality → Optimize the
causal factor
• Correlation lets you predict the future.
• Causality lets you change the future.
39. • A/B testing (also known as split testing) is a
method of comparing two versions against
each other to determine which one performs
better.
• Testing takes the guesswork out of
optimization and enables data-backed decisions
that shift business conversations from “we
think” to “we know.”
47. • Define a Hypothesis and Goal.
• Define the minimum sample size.
• Wait until the minimum sample size has run
through the test.
• Calculate whether the differences between versions
are statistically significant.
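The steps above can be sketched in Python with the standard library only. This is one common approach, not necessarily the one the presenter used: the sample size comes from the usual normal approximation for comparing two proportions, and significance is checked with a pooled two-proportion z-test. The function names, the 5% significance level, the 80% power, and all visitor and conversion counts are assumptions for illustration.

```python
import math
from statistics import NormalDist

def min_sample_size(p_base, p_target, alpha=0.05, power=0.80):
    """Approximate visitors needed PER VERSION to detect a change
    from conversion rate p_base to p_target (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_base * (1 - p_base) + p_target * (1 - p_target)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_target - p_base) ** 2)

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothesis (hypothetical): the new page lifts conversion from 5% to 6%.
needed = min_sample_size(0.05, 0.06)
print(f"minimum sample size per version: {needed}")

# Hypothetical results once both versions have reached that size.
p_value = two_proportion_p_value(conv_a=410, n_a=8200, conv_b=500, n_b=8200)
verdict = "significant" if p_value < 0.05 else "not significant"
print(f"p-value = {p_value:.4f} ({verdict} at the 5% level)")
```

Fixing the sample size before the test starts matters: stopping as soon as the p-value dips below the threshold inflates the chance of a false positive.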
60. THREE ENGINES
• Stickiness
Approach: keep people coming back.
Math that matters: get customers faster than you lose them.
AARRR metric: Retention.
• Virality
Approach: make people invite friends.
Math that matters: how many they tell, how fast they tell them.
AARRR metric: Referral.
• Price
Approach: spend revenue getting customers.
Math that matters: customers are worth more than they cost to get.
AARRR metric: Revenue.
61. In its early days, Facebook
had only 150k users,
little revenue and many
superior competitors
…but 75% of users visited
one or more times per
day.
And within one month
of launching on a new
campus, it could acquire
90% of the students.
63. • Cohort analysis is a subset of behavioural
analytics that takes the data from a product
and rather than looking at all users as one unit,
it breaks them into related groups for analysis.
• These related groups, or cohorts, usually share
common characteristics or experiences within
a defined timespan.
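A minimal sketch of the idea, assuming cohorts are defined by signup month and the measure of interest is retention: the share of each cohort that is still active a given number of months after signing up. The record layout, the user IDs, and the function name `retention_by_cohort` are hypothetical, and the sketch assumes every user has at least one activity record.

```python
from collections import defaultdict

# Hypothetical activity records: (user_id, signup_month, active_month),
# with months numbered 0, 1, 2, ...
events = [
    ("u1", 0, 0), ("u1", 0, 1), ("u1", 0, 2),
    ("u2", 0, 0), ("u2", 0, 1),
    ("u3", 1, 1), ("u3", 1, 2),
    ("u4", 1, 1),
]

def retention_by_cohort(events):
    """Return {signup_month: {months_since_signup: share of cohort active}}."""
    cohort_users = defaultdict(set)  # signup_month -> all users in the cohort
    active = defaultdict(set)        # (signup_month, offset) -> active users
    for user, signup, month in events:
        cohort_users[signup].add(user)
        active[(signup, month - signup)].add(user)
    table = defaultdict(dict)
    for (cohort, offset), users in active.items():
        table[cohort][offset] = len(users) / len(cohort_users[cohort])
    return {c: dict(sorted(row.items())) for c, row in sorted(table.items())}

retention = retention_by_cohort(events)
for cohort, row in retention.items():
    cells = ", ".join(f"month +{k}: {v:.0%}" for k, v in row.items())
    print(f"cohort {cohort}: {cells}")
```

Reading across one cohort shows how its users drop off over time; reading down one column compares cohorts at the same point in their lifecycle, which an all-users average would hide.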