Data Science Toolkit for Product Managers

DATA SCIENCE TOOLKIT FOR
PRODUCT MANAGERS
Mahmoud Jalajel | @mjalajel

“While others may deliver deadlines for management,
product managers deliver value for users.”

ACKNOWLEDGEMENTS
• Abdallah Al-Khalidi
• Ashraf Samhouri
• Ibrahem Abu Hijleh
• Mohammad Obaidat
• Rawan Abu Khadra
• Rema Malkawi
• Sereen Yaseen
• Yousef Alsayeh
Thank you for sharing your experiences and
doing what product managers do best: nagging!

WHO AM I?
• JOSA member & AIESEC alumnus
• Past Entrepreneur (Currently Undercover)
• Full-Stack Data Scientist:
• Recommender Systems
• Real-time systems
• Other Activities: NLP, Machine Learning, Programming, DevOps,
Hardware, Bash-scripting

WHY ARE WE HERE?
• Develop data intuition and argue about data
• Build data culture
• Make data-driven decisions
• Leverage data to build better products

AGENDA
• Why data?
• Reading into Data
• Data-driven products

CLARIFICATION
• Data Science is not Big Data
• Data Science sometimes uses Big Data technologies.
• Data Science is about extracting value from data
• Motivation: “How can I use this data to drive more value?”
• Big Data is about solving the data problem
• Motivation: “God! We have too much data and our servers are crashing!”
• This talk is about statistics and data science, not big data.

MOTIVATING QUESTIONS
• Which is better? Asking users or watching them use
the product?
• How do you get users’ feedback actually fed back
into the product?
• How do you discover user needs?
• How do you do any of these with thousands of users?

WHY A DATA CULTURE
• You can’t improve what you can’t measure
• Data is the best equalizer: From top management to
the freshest interns
• Create accountability around data results
• It's all about culture: Build (or enforce) a data culture.

PRODUCT LIFECYCLE
Stage | Classical (iteration #1) | Data-Driven (iteration #2)
Secondary Research | Study Market | Analyze Open Datasets
Primary Research | Interviews / User Testing | Usage Data Collection
User Criteria (profiling) | Demographics Segmentation | User Behavioral Profiling
Personas and Scenario | Assumptions & Interviews | User Clustering
Execute: Get feedback | Qualitative Feedback | A/B Testing
Improve and Test | Qualitative Feedback | A/B Testing, Anomalies in usage patterns

READING DATA
“READING INTO” DATA

“Three statisticians are out hunting. A bird flies up out of
the bush, and the first statistician aims and fires.
Unfortunately for them, he misses, the bullet going
about a foot below the bird. The second one fires, but
the bullet goes about a foot above the bird.
The third statistician puts down his gun and says:
‘All right! We got him!’”

ANTI-PATTERN: AVERAGE
• Averaging reduces a whole dataset to a single,
often misleading figure.
• average(0, 50, 100) = 50
• average(49, 50, 51) = 50
• average(8, 9, 9, 10, 14, 250) = 50
• average(0, 0, 0, 0, 0, 0, 0, 0, 0, 500) = 50

MEDIAN & PERCENTILES
• Percentiles are computed on ordered data and guard
against outliers
• Median: Value occurring at the center of the ordered
values.
• Xth Percentile: Number larger than X% of the data
Dataset | Average | Median | 90th P
0, 50, 100 | 50 | 50 | 50
49, 50, 51 | 50 | 50 | 50
8, 9, 9, 10, 14, 250 | 50 | 9.5 | 14
0, 0, …, 500 | 50 | 0 | 0
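The figures for the four example datasets can be reproduced with a short sketch. One assumption to flag: percentile conventions differ between tools, and the helper `percentile_lower` below (a hypothetical name) uses a "take the lower neighbour" rule because that rule yields the 90th-percentile values listed here. Other conventions, such as linear interpolation, would give different numbers.

```python
import statistics

def percentile_lower(values, p):
    """Return the p-th percentile using a 'lower' convention (an assumption):
    pick the sorted value at index floor(p/100 * (n - 1))."""
    ordered = sorted(values)
    index = int(p * (len(ordered) - 1) // 100)
    return ordered[index]

def summarize(values):
    """Return (average, median, 90th percentile) for a list of numbers."""
    return (
        statistics.mean(values),
        statistics.median(values),
        percentile_lower(values, 90),
    )

datasets = [
    [0, 50, 100],
    [49, 50, 51],
    [8, 9, 9, 10, 14, 250],
    [0] * 9 + [500],
]

for values in datasets:
    average, median, p90 = summarize(values)
    print(f"{values}: average={average}, median={median}, 90th P={p90}")
```

All four datasets share the same average, while the median and the 90th percentile expose how different they really are.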

STANDARD DEVIATION
• Used for normally-distributed datasets
• How spread out is the dataset?
Dataset | Average | σ | Median | 90th P
0, 50, 100 | 50 | 50 | 50 | 50
49, 50, 51 | 50 | 1 | 50 | 50
8, 9, 9, 10, 14, 250 | 50 | 98 | 9.5 | 14
0, 0, …, 500 | 50 | 158.1 | 0 | 0
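A minimal sketch of the σ column, assuming it is the sample standard deviation (divide by n − 1). That assumption matches the values shown; the population version (divide by n) would give smaller numbers. The helper name `spread` is illustrative only.

```python
import statistics

def spread(values):
    """Sample standard deviation, rounded to one decimal place."""
    return round(statistics.stdev(values), 1)

datasets = [
    [0, 50, 100],
    [49, 50, 51],
    [8, 9, 9, 10, 14, 250],
    [0] * 9 + [500],
]

for values in datasets:
    print(f"{values}: average={statistics.mean(values)}, sigma={spread(values)}")
```

A large σ next to an unchanged average is a quick hint that the average is hiding something.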

DOESN’T EVERYTHING FOLLOW
THE NORMAL DISTRIBUTION?
• Summarizing with Mean and Standard Deviation implicitly assumes
that data is normally distributed (bell-curve shaped)
• In normally-distributed datasets: mean, median, and
mode are all the same.
• Most data is thought to be normally-distributed,
but it actually is not!

APPLICATIONS
• Service level and server uptime:
• On average, each API call will take 200ms
• 80% of calls under 100ms, 95% of calls under 200ms
• Paying customers:
• On average, each user pays $7
• Segment users, remove outliers and represent them with percentiles:
• 90% of basic users pay $5 or more per month
• 90% of premium users pay $13 or more per month
• One outlier paid us $200 last month. Interesting, let’s investigate!

ANTI-PATTERN: ACCURACY
• Accuracy also compresses information into a misleading and usually useless figure.
• Example: Suppose 1% of your email is spam.
• Solution#1: Mark all emails as spam → catches 100% of spam, but only 1% accurate.
• Solution#2: Mark all emails as non-spam → 99% accurate, but catches no spam.
• We care a lot about:
• What kind of error happened?
• Can we tolerate it?
• Are all errors born equal? Assign cost per error type.
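A rough sketch of the spam example, assuming a hypothetical inbox of 1,000 emails of which 1% are spam. The per-error costs are made-up numbers for illustration: here a blocked legitimate email is assumed to hurt ten times more than a spam email that slips through.

```python
def accuracy(tp, tn, fp, fn):
    """Share of all predictions that were correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def total_cost(fp, fn, cost_fp, cost_fn):
    """Weigh each error type by its own (assumed) cost."""
    return fp * cost_fp + fn * cost_fn

# Hypothetical inbox: 1,000 emails, 10 spam, 990 legitimate.
mark_all_spam = {"tp": 10, "tn": 0, "fp": 990, "fn": 0}
mark_all_clean = {"tp": 0, "tn": 990, "fp": 0, "fn": 10}

# Assumed costs: blocking a real email = 10, letting spam through = 1.
COST_FP, COST_FN = 10, 1

for name, result in [("all spam", mark_all_spam), ("all non-spam", mark_all_clean)]:
    acc = accuracy(**result)
    cost = total_cost(result["fp"], result["fn"], COST_FP, COST_FN)
    print(f"{name}: accuracy={acc:.0%}, cost={cost}")
```

Neither filter does anything useful, yet one scores 99% accuracy. Changing the assumed costs changes which errors matter, which is the point of pricing each error type.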

CONFUSION MATRIX
My Guess (“Pregnancy Test”) \ The Truth (“Pregnancy”) | TRUE (Pregnant) | FALSE (Not Pregnant)
Positive (Yes) | True Positive (TP): Pregnant woman! | False Positive (FP): Pregnant man!
Negative (No) | False Negative (FN): Pregnant woman told she isn’t! | True Negative (TN): Non-pregnant man!

PRECISION AND RECALL
• Accuracy = (TP + TN) / All
• Treats FP and FN equally
• Precision = TP / (TP + FP)
• 100% when FP=0 (no errors returned)
• Useful for search results, sensitive and important information
• “I’d rather say nothing than tell a lie! or embarrass myself with a wrong answer”
• Recall = TP / (TP + FN)
• 100% when FN=0 (When all correct results are returned)
• Useful for passive interactions like recommender systems and loose-searching (similar items)
• “I won’t hide anything from you, even the useless details”
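The two formulas can be sketched directly from confusion-matrix counts. The counts in the example are hypothetical: a search that returns 10 results, 8 of them relevant, while 12 other relevant items are never shown.

```python
def precision(tp, fp):
    """Of everything returned, how much was correct? TP / (TP + FP)."""
    returned = tp + fp
    return tp / returned if returned else 0.0

def recall(tp, fn):
    """Of everything correct, how much was returned? TP / (TP + FN)."""
    relevant = tp + fn
    return tp / relevant if relevant else 0.0

# Hypothetical search: 8 relevant results shown, 2 irrelevant shown,
# 12 relevant items missing from the results.
tp, fp, fn = 8, 2, 12

print(f"precision={precision(tp, fp):.0%}, recall={recall(tp, fn):.0%}")
```

Returning fewer, safer results tends to raise precision and lower recall; returning everything does the opposite.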

PRECISION VS.
RECALL
And why you can’t have both!

OTHER MEASURES
• Weighted errors
• if result >= truth, consider it correct
• if result < truth, consider it wrong
• Loss functions
• if result > truth, take difference as error
• if result < truth, take ten times difference as error
• Sum all errors and try to build a model with the least amount of error
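The loss-function bullets can be sketched as follows. The function names and the sample predictions are illustrative assumptions; the factor of ten for undershooting comes from the bullets above.

```python
def asymmetric_loss(result, truth, under_penalty=10):
    """Overshooting costs the difference; undershooting costs
    `under_penalty` times the difference."""
    difference = result - truth
    if difference >= 0:
        return difference
    return under_penalty * -difference

def total_loss(pairs):
    """Sum the loss over (result, truth) pairs; lower means a better model."""
    return sum(asymmetric_loss(result, truth) for result, truth in pairs)

# Two hypothetical models predicting the same three true values.
truths = [10, 20, 30]
model_high = [12, 22, 32]  # always 2 too high
model_low = [8, 18, 28]    # always 2 too low

print(total_loss(zip(model_high, truths)))
print(total_loss(zip(model_low, truths)))
```

Both models are off by the same amount, but the one that undershoots is penalized ten times more, so the loss function steers model choice toward the error the product can tolerate.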

APPLICATIONS
• Server caching systems (for speed)
• Search and Recommender Systems
• Product design and error-prioritization
• What kind of error would the user tolerate?

A/B TESTING
The road to proper A/B testing is filled with coincidences,
correlations, a comic, and a fancy “null hypothesis”

WHEN A/B TESTS BECOME
HARMFUL
A bad A/B test can lead to:
• Wasting time, money and effort.
• Making the wrong decision and doing the wrong
thing.
• Inconsistent User Experience.

How can we avoid A/B side-effects?
By finding the mind reader in the audience!

COINCIDENCE
• If you pull a random answer for any question, how
many times will you be correct?
• How to build a system that makes sure you’re not
doing good coincidentally?
• How harmful can it be to extrapolate a few
samples?
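To put a number on coincidence, here is a small sketch under stated assumptions: every person guesses a series of fair coin flips purely at random, and guesses are independent. The audience size and number of flips are hypothetical.

```python
def chance_of_lucky_guesser(audience_size, flips):
    """Probability that at least one random guesser gets every flip right."""
    p_single_success = 0.5 ** flips
    p_nobody_succeeds = (1 - p_single_success) ** audience_size
    return 1 - p_nobody_succeeds

# Hypothetical: 1,000 people each guess 10 coin flips.
probability = chance_of_lucky_guesser(audience_size=1000, flips=10)
print(f"Chance that someone looks like a mind reader: {probability:.0%}")
```

With enough people (or enough metrics, segments, or test variants), someone will look brilliant by luck alone. Extrapolating from that one sample is how a coincidence becomes a decision.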

CORRELATIONS
1. Alice doesn’t study and gets a full mark, Bob studies hard and fails the
exam → Studying makes you fail.
2. I lost weight and got invited to talk to you → My weight loss caused
you to invite me!
3. Whenever windmills rotate quickly, wind is strong → Windmills cause
wind
4. Sick people smell bad. → Bad odors cause diseases.
5. High altitudes are colder → Altitude causes cold

NULL HYPOTHESIS
• By default, assume every result is random.
• Aim: Disprove the null hypothesis (make it fail the
“random” test)
• Usually rejected at a confidence level of 95% or 99%

FAIR COIN?
• Classical question: How many times should you toss a coin before
deciding it’s a Fair Coin?
• Answer: Depends on:
• What kind of error are you willing to tolerate?
• How confident/sure you want to be
• With 1% error and 68.27% confidence: 2,500 tosses
• With 1% error and 99.90% confidence: 27,225 tosses
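Both toss counts are consistent with the usual normal-approximation estimate n = Z² / (4E²), where E is the tolerated error and Z is the z-score for the chosen confidence. That this is the formula behind the figures is an assumption; Z = 1 corresponds to 68.27% confidence, and Z = 3.3 is a rounded z-score for 99.90%.

```python
def tosses_needed(max_error, z_score):
    """Estimate tosses needed so the observed heads rate is within
    `max_error` of the true rate, using n = Z^2 / (4 * E^2)."""
    return round(z_score ** 2 / (4 * max_error ** 2))

print(tosses_needed(max_error=0.01, z_score=1.0))  # 68.27% confidence
print(tosses_needed(max_error=0.01, z_score=3.3))  # ~99.90% confidence
```

Halving the tolerated error quadruples the number of tosses, which is why small improvements are expensive to prove.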

A/B TESTS ARE RARELY CLEAN
Variety | Seasonality | Error | Randomness

A/B TESTING GUIDE
• Ensure random split.
• Measure current performance.
• Measure target performance (rise above noise/variation)
• Calculate number of trials needed
• Don’t peek!
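The "calculate number of trials" step can be sketched with a textbook two-proportion formula based on the normal approximation. This is an illustration, not necessarily the method used by any particular online calculator. The 5% significance level, 80% power, and conversion rates are assumed defaults.

```python
import math
from statistics import NormalDist

def sample_size_per_variant(baseline, target, alpha=0.05, power=0.8):
    """Approximate users needed in EACH variant to detect a change in
    conversion rate from `baseline` to `target` (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    pooled = (baseline + target) / 2
    numerator = (
        z_alpha * math.sqrt(2 * pooled * (1 - pooled))
        + z_beta * math.sqrt(baseline * (1 - baseline) + target * (1 - target))
    ) ** 2
    return math.ceil(numerator / (target - baseline) ** 2)

# Hypothetical: current conversion is 10%.
small_lift = sample_size_per_variant(baseline=0.10, target=0.11)
large_lift = sample_size_per_variant(baseline=0.10, target=0.15)
print(small_lift, large_lift)
```

Fix the sample size before the test starts and evaluate only once it is reached; stopping as soon as the numbers look good inflates the false-positive rate.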

A/B SAMPLE SIZE CALCULATOR
http://www.evanmiller.org/ab-testing/sample-size.html
Read the article here: http://www.evanmiller.org/how-not-to-run-an-ab-test.html

USES
• Duplicate detection
• Recommender Systems
• Clustering and Segmentation
• A lot of other applications

SIMILARITY CRITERIA
• Attributes
• Users’ interaction
• Interacts with both together
• Interacts with both sequentially

BASIC RECOMMENDER
• Create co-occurrence table between Product A
and all other products
• When user is on Product#A, show Related Items.
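A minimal sketch of a co-occurrence recommender, assuming "co-occurrence" means two products appearing in the same user session. The session data and function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

def build_cooccurrence(sessions):
    """Count how often each pair of products appears in the same session."""
    table = defaultdict(Counter)
    for items in sessions:
        # De-duplicate within a session so repeat views count once.
        for a, b in combinations(sorted(set(items)), 2):
            table[a][b] += 1
            table[b][a] += 1
    return table

def related_items(table, product, top_n=3):
    """Return the products most often seen together with `product`."""
    counts = table.get(product, Counter())
    return [item for item, _ in counts.most_common(top_n)]

# Hypothetical sessions: each list is the products one user viewed.
sessions = [
    ["A", "B", "C"],
    ["A", "B"],
    ["A", "D"],
    ["B", "C"],
]

table = build_cooccurrence(sessions)
print(related_items(table, "A"))
```

Raw counts favor popular products; that is acceptable for a basic version and is one of the things a more sophisticated recommender corrects for.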

SOPHISTICATED
RECOMMENDER
• Time: New (fresh) products appear first
• Location: Products physically near user appear first
• Context: Where is the user seeing the result? Mobile / web / extension / chatbot?
• History: How did user interact with this category/brand/tag before?
• Price-Sensitivity: How price-sensitive is the user?
• Ephemeral: Are the current searches & browsing habits converging around a certain pattern?
• Related: How did the user react to similar products?
• Quality: Given an internal quality measure, how good is this product?

CONCLUSIONS
• Raw data matters, a lot! Use Tableau or a similar
product.
• Always ask one more question.
• Question data sources.
• Build a culture around data possibilities.

WATCH / LISTEN / READ
• Book: “How to Lie with Statistics”: http://a.co/0DIGMwt
• Podcast: “Data Driven Product Management At Yammer”: http://bit.ly/data-driven-yammer
• Choice, happiness and spaghetti sauce | Malcolm Gladwell: https://youtu.be/iIiAAhUeR6Y
• The Bayesian Trap: https://youtu.be/R13BD8qKeTg
• Scientific Studies: Last Week Tonight with John Oliver (HBO): https://youtu.be/0Rnq1NpHdmw
• The Future of Product Management, Janice Fraser: https://youtu.be/f116MblyZbQ

THANK YOU!
Questions?
Mahmoud Jalajel | @mjalajel
