DATA SCIENCE TOOLKIT FOR
PRODUCT MANAGERS
Mahmoud Jalajel | @mjalajel
“While others may deliver deadlines for management,
product managers deliver value for users.”
ACKNOWLEDGEMENTS
• Abdallah Al-Khalidi
• Ashraf Samhouri
• Ibrahem Abu Hijleh
• Mohammad Obaidat
• Rawan Abu Khadra
• Rema Malkawi
• Sereen Yaseen
• Yousef Alsayeh
Thank you for sharing your experiences and
doing what product managers do best: nagging!
WHO AM ME?
• JOSA member & AIESEC alumnus
• Past Entrepreneur (Currently Undercover)
• Full-Stack Data Scientist:
• Recommender Systems
• Real-time systems
• Other Activities: NLP, Machine Learning, Programming, DevOps,
Hardware, Bash scripting
WHY ARE WE HERE?
• Develop data intuition and argue about data
• Build data culture
• Make data-driven decisions
• Leverage data to build better products
AGENDA
• Why data?
• Reading into Data
• Data-driven products
CLARIFICATION
• Data Science is not Big Data
• Data Science sometimes uses Big Data technologies.
• Data Science is about extracting value from data
• Motivation: “How can I use this data to drive more value?”
• Big Data is about solving the data problem
• Motivation: “God! We have too much data; our servers are crashing!”
• This talk is about statistics and data science, not big data.
Data Science
WHY DATA?
MOTIVATING QUESTIONS
• Which is better? Asking users or watching them use
the product?
• How do you get users’ feedback actually fed back
into the product?
• How do you discover user needs?
• How do you do any of these with thousands of users?
WHY A DATA CULTURE
• You can’t improve what you can’t measure
• Data is the best equalizer: From top management to
the freshest interns
• Create accountability around data results
• It's all about culture: Build (or enforce) a data culture.
PRODUCT LIFECYCLE
Stage | Classical (iteration #1) | Data-Driven (iteration #2)
Secondary Research | Study Market | Analyze Open Datasets
Primary Research | Interviews / User Testing | Usage Data Collection
User Criteria (profiling) | Demographics Segmentation | User Behavioral Profiling
Personas and Scenario | Assumptions & Interviews | User Clustering
Execute: Get feedback | Qualitative Feedback | A/B Testing
Improve and Test | Qualitative Feedback | A/B Testing, Anomalies in usage patterns
READING DATA
“READING INTO” DATA
“Three statisticians are out hunting. A bird flies up out of
the bush, and the first statistician aims and fires.
Unfortunately for them, he misses, the bullet going
about a foot below the bird. The second one fires, but
the bullet goes about a foot above the bird.
The third statistician puts down his gun and says:
‘All right! We got him!’”
ANTI-PATTERN: AVERAGE
• An average compresses all the information into a single,
often misleading figure.
• average(0, 50, 100) = 50
• average(49, 50, 51) = 50
• average(8, 9, 9, 10, 14, 250) = 50
• average(0, 0, 0, 0, 0, 0, 0, 0, 0, 500) = 50
MEDIAN & PERCENTILES
• Percentiles depend on the order of the data, which guards
against outliers
• Median: Value occurring at the center of the ordered values.
• Xth percentile: Number larger than X% of the data
Data | Average | Median | 90th P
0, 50, 100 | 50 | 50 | 50
49, 50, 51 | 50 | 50 | 50
8, 9, 9, 10, 14, 250 | 50 | 9.5 | 14
0, 0, …, 500 | 50 | 0 | 0
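The table above can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library. The helper `percentile_lower` is a hypothetical name, and it assumes the "lower" percentile convention (take the sorted value at index floor(X% × (n − 1))), which is the convention that appears to match the 90th P column. Other conventions, such as nearest rank or linear interpolation, give different values on samples this small.

```python
from statistics import mean, median


def percentile_lower(values, pct):
    """Return the pct-th percentile using the 'lower' convention.

    Assumption: the slide's 90th P column follows this convention,
    i.e. the sorted value at index floor(pct/100 * (n - 1)).
    """
    ordered = sorted(values)
    index = (pct * (len(ordered) - 1)) // 100  # integer floor, no float noise
    return ordered[index]


# The four datasets from the slides.
datasets = {
    "0, 50, 100": [0, 50, 100],
    "49, 50, 51": [49, 50, 51],
    "8, 9, 9, 10, 14, 250": [8, 9, 9, 10, 14, 250],
    "nine 0s and one 500": [0] * 9 + [500],
}

for label, values in datasets.items():
    print(label, "|", mean(values), "|", median(values), "|",
          percentile_lower(values, 90))
```

All four datasets share the same average, but the median and the 90th percentile tell them apart.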
STANDARD DEVIATION
• Used for normally distributed datasets
• How spread out is the dataset?
Data | Average | σ | Median | 90th P
0, 50, 100 | 50 | 50 | 50 | 50
49, 50, 51 | 50 | 1 | 50 | 50
8, 9, 9, 10, 14, 250 | 50 | 98 | 9.5 | 14
0, 0, …, 500 | 50 | 158.1 | 0 | 0
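The σ column can be checked the same way. A short sketch, assuming the slide uses the sample standard deviation (dividing by n − 1), since that is what reproduces the values shown. The population standard deviation (`pstdev`, dividing by n) comes out smaller and is printed for comparison.

```python
from statistics import pstdev, stdev

# The four datasets from the slides.
datasets = [
    [0, 50, 100],
    [49, 50, 51],
    [8, 9, 9, 10, 14, 250],
    [0] * 9 + [500],
]

for values in datasets:
    sample_sd = stdev(values)       # divides by n - 1 (assumed slide convention)
    population_sd = pstdev(values)  # divides by n
    print(values, round(sample_sd, 1), round(population_sd, 1))
```

Note how a single outlier (250 or 500) inflates σ far beyond the spread of the remaining values.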
STANDARD DEVIATIONS
DOESN’T EVERYTHING FOLLOW
THE NORMAL DISTRIBUTION?
• Mean and standard deviation summarize data well only when the data is
normally distributed (has a bell-curve shape)
• In normally distributed datasets, the mean, median, and
mode are all the same.
• Most data is thought to be normally distributed,
but much of it actually is not!
BOX PLOTS
BOXPLOT DETAILS
APPLICATIONS
• Service level and server uptime:
• On average, each API call will take 200ms
• 80% of calls under 100ms, 95% of calls under 200ms
• Paying customers:
• On average, each user pays $7
• Segment users, remove outliers and represent them with percentiles:
• 90% of basic users pay $5 or more per month
• 90% of premium users pay $13 or more per month
• One outlier paid us $200 last month. Interesting, let’s investigate!
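A statement such as "90% of users pay $X or more" is a 10th percentile read from the top down. The sketch below shows one way to compute it per segment. The function name `pays_at_least` and all payment figures are hypothetical, chosen only to illustrate the idea.

```python
def pays_at_least(payments, percent=90):
    """Largest amount that at least `percent`% of payments reach or exceed.

    Expects a non-empty list and 1 <= percent <= 100.
    """
    ordered = sorted(payments, reverse=True)
    # How many users must be at or above the threshold (integer ceiling).
    needed = (len(ordered) * percent + 99) // 100
    return ordered[needed - 1]


# Hypothetical monthly payments in dollars, grouped by plan.
payments_by_segment = {
    "basic": [5, 5, 5, 6, 6, 7, 7, 8, 9, 4],
    "premium": [13, 14, 15, 13, 20, 13, 16, 18, 12, 200],
}

for segment, payments in payments_by_segment.items():
    threshold = pays_at_least(payments)
    print(f"90% of {segment} users pay ${threshold} or more per month")
```

The single $200 payment in the hypothetical premium segment does not move the threshold, whereas it would pull an average sharply upward.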
ANTI-PATTERN: ACCURACY
• Accuracy also compresses information into a misleading and usually useless figure.
• Example: Suppose 1% of your email is spam.
• Solution #1: Mark all emails as spam → catches 100% of spam, but is only 1% accurate.
• Solution #2: Mark all emails as non-spam → 99% accurate, but catches no spam.
• We care a lot about:
• What kind of error happened?
• Can we tolerate it?
• Are all errors born equal? Assign cost per error type.
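The spam example can be worked through with concrete counts. This is a sketch under stated assumptions: an inbox of 1,000 emails with 1% spam, and made-up costs per error type (losing a real email is assumed to cost 20 units, letting one spam through 1 unit). The names `accuracy` and `total_cost` are illustrative.

```python
def accuracy(tp, fp, fn, tn):
    """Share of all decisions that were correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def total_cost(counts, cost_fp, cost_fn):
    """Weight each error type by its own (assumed) cost."""
    return counts["fp"] * cost_fp + counts["fn"] * cost_fn


# Assumed inbox: 1,000 emails, of which 1% (10) are spam.
total, spam = 1000, 10
legit = total - spam

# Filter 1 marks everything as spam; filter 2 marks nothing as spam.
mark_all_spam = {"tp": spam, "fp": legit, "fn": 0, "tn": 0}
mark_none_spam = {"tp": 0, "fp": 0, "fn": spam, "tn": legit}

for name, counts in [("all spam", mark_all_spam), ("none spam", mark_none_spam)]:
    print(name,
          "accuracy:", accuracy(**counts),
          "cost:", total_cost(counts, cost_fp=20, cost_fn=1))
```

Neither filter is useful, yet one scores 99% accuracy. Attaching a cost to each error type makes the trade-off explicit instead of hiding it in one number.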
CONFUSION MATRIX
My Guess (“Pregnancy Test”) | The Truth (“Pregnancy”): TRUE (Pregnant) | The Truth (“Pregnancy”): FALSE (Not-Pregnant)
Positive (Yes) | True Positive (TP): Pregnant Woman! | False Positive (FP): Pregnant Man!
Negative (No) | False Negative (FN): Pregnant Woman told she isn’t! | True Negative (TN): Non-Pregnant Man!
PRECISION AND RECALL
• Accuracy = (TP + TN) / All
• Treats FP and FN equally
• Precision = TP / (TP + FP)
• 100% when FP=0 (no wrong results returned)
• Useful for search results, sensitive and important information
• “I’d rather say nothing than tell a lie or embarrass myself with a wrong answer!”
• Recall = TP / (TP + FN)
• 100% when FN=0 (all correct results are returned)
• Useful for passive interactions like recommender systems and loose-searching (similar items)
• “I won’t hide anything from you, even the useless details”
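The formulas above translate directly into code. The sketch below compares two hypothetical search engines over a collection with 20 relevant documents: a cautious one that returns little, and a generous one that returns a lot. The counts are invented for illustration.

```python
def precision(tp, fp):
    """TP / (TP + FP). Returns None when nothing was returned at all."""
    returned = tp + fp
    return tp / returned if returned else None


def recall(tp, fn):
    """TP / (TP + FN). Returns None when there was nothing to find."""
    relevant = tp + fn
    return tp / relevant if relevant else None


# Hypothetical engines; 20 relevant documents exist in total.
engines = {
    # Returns 5 results, all of them relevant, and misses 15.
    "cautious": {"tp": 5, "fp": 0, "fn": 15},
    # Returns 100 results, including all 20 relevant ones.
    "generous": {"tp": 20, "fp": 80, "fn": 0},
}

for name, c in engines.items():
    print(name,
          "precision:", precision(c["tp"], c["fp"]),
          "recall:", recall(c["tp"], c["fn"]))
```

The cautious engine reaches perfect precision at low recall, and the generous one the reverse: pushing one measure up tends to pull the other down.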
PRECISION VS.
RECALL
And why you can’t have both!
OTHER MEASURES
• Weighted errors
• if result >= truth, consider it correct
• if result < truth, consider it wrong
• Loss functions
• if result > truth, take difference as error
• if result < truth, take ten times difference as error
• Sum all errors and try to build a model with the least amount of error
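The loss-function rule above can be sketched as follows. The function names and the forecast numbers are hypothetical. The factor of ten for undershooting comes from the slide, and a result equal to the truth is taken to cost nothing.

```python
def asymmetric_loss(result, truth, under_penalty=10):
    """Error for one prediction: overshooting costs the difference,
    undershooting costs `under_penalty` times the difference."""
    diff = result - truth
    if diff >= 0:
        return diff
    return under_penalty * -diff


def total_loss(results, truths, under_penalty=10):
    """Sum the per-item errors; a lower total means a better model."""
    return sum(asymmetric_loss(r, t, under_penalty)
               for r, t in zip(results, truths))


# Hypothetical forecasts against a true value of 100 each time.
truths = [100, 100, 100]
results = [110, 90, 100]  # over by 10, under by 10, exact

print(total_loss(results, truths))  # 10 + 100 + 0 = 110
```

Two models with the same average error can rank very differently under this loss, which is the point of choosing a measure that reflects what the user can tolerate.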
APPLICATIONS
• Server caching systems (for speed)
• Search and Recommender Systems
• Product design and error-prioritization
• What kind of error would the user tolerate?
A/B TESTING
The road to proper A/B testing is filled with coincidences,
correlations, a comic, and fancy “null hypothesis”
WHEN A/B TESTS BECOME
HARMFUL
A bad A/B test can lead to:
• Wasting time, money and effort.
• Making the wrong decision and doing the wrong
thing.
• Inconsistent User Experience.
How can we avoid A/B side-effects?
By finding the mind reader in the audience!
COINCIDENCE
• If you pull a random answer for any question, how
many times will you be correct?
• How do you build a system that makes sure you’re not
doing well by coincidence?
• How harmful can it be to extrapolate from a few
samples?
CORRELATIONS
1. Alice doesn’t study and gets a full mark, Bob studies hard and fails the
exam → Studying makes you fail.
2. I lost weight and got invited to talk to you → My weight loss caused
you to invite me!
3. Whenever windmills rotate quickly, wind is strong → Windmills cause
wind
4. Sick people smell bad. → Bad odors cause diseases.
5. High altitudes are colder → Altitude causes cold
NULL HYPOTHESIS
• By default, assume everything is random.
• Aim: Disprove the null hypothesis (make the data fail the
“random” test)
• Usually disproven with a confidence of 95% or 99%
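One common way to make the "random" test concrete is a p-value: the probability of seeing a result at least as extreme as the observed one if the null hypothesis were true. The sketch below computes an exact two-sided p-value for coin tosses under the null hypothesis of a fair coin. The function name `coin_p_value` is hypothetical, and the choice of an exact binomial test is an assumption; the slides do not name a specific test.

```python
from math import comb


def coin_p_value(heads, tosses):
    """Exact two-sided p-value for `heads` out of `tosses`,
    under the null hypothesis that the coin is fair."""
    # With a fair coin every sequence is equally likely, so a head count k
    # is exactly as likely as the number of sequences producing it.
    observed = comb(tosses, heads)
    as_extreme = sum(
        comb(tosses, k)
        for k in range(tosses + 1)
        if comb(tosses, k) <= observed  # outcomes no more likely than observed
    )
    return as_extreme / 2 ** tosses


for heads in (50, 55, 60, 65):
    p = coin_p_value(heads, 100)
    verdict = "reject" if p < 0.05 else "cannot reject"
    print(f"{heads}/100 heads: p = {p:.4f} -> {verdict} 'fair coin' at 95%")
```

At a 95% confidence level the null hypothesis is rejected only when the p-value falls below 0.05; at 99%, only below 0.01.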
STANDARD DEVIATIONS
FAIR COIN?
• Classical question: How many times should you toss a coin before
deciding it’s a Fair Coin?
• Answer: Depends on:
• What kind of error are you willing to tolerate?
• How confident do you want to be?
• With 1% error and 68.27% confidence: 2,500 tosses
• With 1% error and 99.90% confidence: 27,225 tosses
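Both figures above are consistent with the usual normal-approximation formula for estimating a proportion, n = z² · p(1 − p) / E², with p = 0.5 and E = 0.01. That the slide used this formula is an inference from the numbers, not something it states. The sketch below uses hypothetical helper names; 2,500 corresponds to z = 1 and 27,225 to z = 3.3, slightly above the z ≈ 3.29 that a two-sided 99.9% level gives.

```python
import math
from statistics import NormalDist


def z_for_confidence(confidence):
    """Two-sided z-score for a confidence level such as 0.95."""
    return NormalDist().inv_cdf((1 + confidence) / 2)


def tosses_needed(margin, z, p=0.5):
    """n = z^2 * p * (1 - p) / margin^2, rounded up to a whole toss."""
    n = z ** 2 * p * (1 - p) / margin ** 2
    return math.ceil(round(n, 6))  # round first to absorb float noise


print(tosses_needed(0.01, z=1.0))  # 2500
print(tosses_needed(0.01, z=3.3))  # 27225

# Deriving z from the confidence level instead of fixing it by hand:
for confidence in (0.6827, 0.95, 0.999):
    z = z_for_confidence(confidence)
    print(confidence, round(z, 3), tosses_needed(0.01, z))
```

Because the margin is squared in the denominator, halving the tolerated error quadruples the number of tosses.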
A/B TESTS ARE RARELY CLEAN
Variety | Seasonality | Error | Randomness
A/B TESTING GUIDE
• Ensure random split.
• Measure current performance.
• Measure target performance (rise above noise/variation)
• Calculate number of trials needed
• Don’t peek! Fix the number of trials up front and don’t stop early on interim results.
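For the first step of the guide, one widely used way to get a stable, effectively random split is to hash the user ID together with the experiment name. This is a sketch of that technique, not a method the slides prescribe. The function name `assign_variant` and the experiment name are hypothetical.

```python
import hashlib


def assign_variant(user_id, experiment, variants=("A", "B")):
    """Deterministically map a user to a variant.

    The same user always lands in the same variant of a given experiment,
    while different experiments split users independently of each other.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


# Check that the split is roughly even over hypothetical user IDs.
counts = {"A": 0, "B": 0}
for i in range(10_000):
    counts[assign_variant(f"user-{i}", "new-checkout")] += 1
print(counts)
```

Splitting by signup date, geography, or odd/even days would tie the variant to seasonality or user type, which is exactly the kind of noise a random split is meant to rule out.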
A/B SAMPLE SIZE CALCULATOR
http://www.evanmiller.org/ab-testing/sample-size.html
Read the article here: http://www.evanmiller.org/how-not-to-run-an-ab-test.html
SIMILARITY
USES
• Duplicate detection
• Recommender Systems
• Clustering and Segmentation
• A lot of other applications
SIMILARITY CRITERIA
• Attributes
• Users’ interaction
• Interacts with both together
• Interacts with both sequentially
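One simple way to turn these criteria into a number is the Jaccard index: the size of the overlap between two sets divided by the size of their union. The slides do not name a specific measure, so treat this as one possible choice. The products and user IDs below are made up.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical data: which users interacted with which product.
users_by_product = {
    "lamp": {"u1", "u2", "u3"},
    "desk": {"u2", "u3", "u4"},
    "sofa": {"u5"},
}

print(jaccard(users_by_product["lamp"], users_by_product["desk"]))  # 0.5
print(jaccard(users_by_product["lamp"], users_by_product["sofa"]))  # 0.0
```

The same function works on attribute sets (tags, categories) instead of user sets, which covers the "Attributes" criterion as well.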
BASIC RECOMMENDER
• Create co-occurrence table between Product A
and all other products
• When user is on Product#A, show Related Items.
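The two steps above can be sketched in a few lines. This version assumes that "co-occurrence" means two products appearing in the same user session, and the sessions are invented sample data. The function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(sessions):
    """Count, for each product, how often every other product
    appears in the same session."""
    table = defaultdict(Counter)
    for session in sessions:
        # Use a set so a product viewed twice in one session counts once.
        for a, b in combinations(sorted(set(session)), 2):
            table[a][b] += 1
            table[b][a] += 1
    return table


def related_items(table, product, top_n=3):
    """Products most often seen together with `product`."""
    return [item for item, _ in table[product].most_common(top_n)]


# Hypothetical browsing sessions.
sessions = [
    ["A", "B", "C"],
    ["A", "B"],
    ["A", "C"],
    ["A", "B", "D"],
    ["B", "D"],
]

table = build_cooccurrence(sessions)
print(related_items(table, "A"))  # ['B', 'C', 'D']
```

Raw counts favor popular products, so a real system would usually normalize them, but the table alone is enough for a first "Related Items" box.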
SOPHISTICATED
RECOMMENDER
• Time: New (fresh) products appear first
• Location: Products physically near user appear first
• Context: Where is the user seeing the result? Mobile / web / extension / chatbot?
• History: How did user interact with this category/brand/tag before?
• Price-Sensitivity: How price-sensitive is the user?
• Ephemeral: Are the current searches & browsing habits converging around a certain pattern?
• Related: How did the user react to similar products?
• Quality: Given an internal quality measure, how good is this product?
CONCLUSIONS
• Raw data matters, a lot! Use Tableau or a similar
product.
• Always ask one more question.
• Question data sources.
• Build a culture around data possibilities.
WATCH / LISTEN / READ
• Book: “How to Lie with Statistics”: http://a.co/0DIGMwt
• Podcast: “Data Driven Product Management At Yammer”: http://bit.ly/data-driven-yammer
• Choice, happiness and spaghetti sauce | Malcolm Gladwell: https://youtu.be/iIiAAhUeR6Y
• The Bayesian Trap: https://youtu.be/R13BD8qKeTg
• Scientific Studies: Last Week Tonight with John Oliver (HBO): https://youtu.be/0Rnq1NpHdmw
• The Future of Product Management, Janice Fraser: https://youtu.be/f116MblyZbQ
THANK YOU!
Questions?
Mahmoud Jalajel | @mjalajel

Data Science Toolkit for Product Managers

  • 1. DATA SCIENCE TOOLKIT FOR PRODUCT MANAGERS Mahmoud Jalajel | @mjalajel
  • 2. “While others may deliver deadlines for management, product managers deliver value for users.”
  • 3. ACKNOWLEDGEMENTS • Abdallah Al-Khalidi • Ashraf Samhouri • Ibrahem Abu Hijleh • Mohammad Obaidat • Rawan Abu Khadra • Rema Malkawi • Sereen Yaseen • Yousef Alsayeh Thank you for sharing your experiences and doing what product managers do best: nagging!
  • 4. WHO AM ME? • JOSA member & AIESEC alumnus • Past Entrepreneur (Currently Undercover) • Full-Stack Data Scientist: • Recommender Systems • Real-time systems • Other Activities : NLP, Machine Learning, Programming, DevOps, Hardware, Bash-scripting
  • 5. WHY ARE WE HERE? • Develop data intuition and argue about data • Build data culture • Make data-driven decisions • Leverage data to build better products
  • 6. AGENDA • Why data? • Reading into Data • Data-driven products
  • 7. CLARIFICATION • Data Science is not Big Data • Data Science sometimes uses Big Data technologies. • Data Science is about extracting value from data • Motivation: “How can I use this data to drive more value?” • Big Data is about solving the data problem • Motivation: “God! We have too much data, our servers are crashing!” • This talk is about statistics and data science, not big data.
  • 10. MOTIVATING QUESTIONS • Which is better: asking users, or watching them use the product? • How do you get user feedback actually fed back into the product? • How do you discover user needs? • How do you do any of these with thousands of users?
  • 11. WHY A DATA CULTURE • You can’t improve what you can’t measure • Data is the best equalizer: From top management to the freshest interns • Create accountability around data results • It's all about culture: Build (or enforce) a data culture.
  • 12. PRODUCT LIFECYCLE (stage: Classical, iteration #1 | Data-Driven, iteration #2) • Secondary Research: Study Market | Analyze Open Datasets • Primary Research: Interviews / User Testing | Usage Data Collection • User Criteria (profiling): Demographics Segmentation | User Behavioral Profiling • Personas and Scenario: Assumptions & Interviews | User Clustering • Execute, Get Feedback: Qualitative Feedback | A/B Testing • Improve and Test: Qualitative Feedback | A/B Testing, Anomalies in usage patterns
  • 14. “Three statisticians are out hunting. A bird flies up out of the bush, and the first statistician aims and fires. Unfortunately for them, he misses, the bullet going about a foot below the bird. The second one fires, but the bullet goes about a foot above the bird. The third statistician puts down his gun and says: ‘All right! We got him!’”
  • 15. ANTI-PATTERN: AVERAGE • An average reduces all the information to a single, often misleading figure. • average(0, 50, 100) = 50 • average(49, 50, 51) = 50 • average(8, 9, 9, 10, 14, 250) = 50 • average(0, 0, 0, 0, 0, 0, 0, 0, 0, 500) = 50
  • 17. MEDIAN & PERCENTILES • Percentiles are based on the order of the data, which guards against outliers • Median: Value occurring at the center of the ordered values. • Xth Percentile: Number larger than X% of the data • Table (Dataset: Average | Median | 90th P) • 0, 50, 100: 50 | 50 | 50 • 49, 50, 51: 50 | 50 | 50 • 8, 9, 9, 10, 14, 250: 50 | 9.5 | 14 • 0, 0, … 500: 50 | 0 | 0
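The figures on this slide can be reproduced with a short sketch. The `percentile_lower` helper is an assumption, not something named in the talk: it picks the nearest lower-ranked value without interpolation, which is the convention that happens to match the slide's 90th-percentile column. Other percentile conventions would give different values for such tiny datasets.

```python
from statistics import mean, median


def percentile_lower(values, p):
    """Return the p-th percentile (p is an integer 0-100) using the nearest lower rank."""
    ordered = sorted(values)
    index = (p * (len(ordered) - 1)) // 100  # integer rank, rounded down
    return ordered[index]


# The four example datasets from the slide.
datasets = {
    "0, 50, 100": [0, 50, 100],
    "49, 50, 51": [49, 50, 51],
    "8, 9, 9, 10, 14, 250": [8, 9, 9, 10, 14, 250],
    "nine 0s and one 500": [0] * 9 + [500],
}

summary = {
    name: (mean(values), median(values), percentile_lower(values, 90))
    for name, values in datasets.items()
}

for name, (avg, med, p90) in summary.items():
    print(f"{name:>22}: average={avg}, median={med}, 90th percentile={p90}")
```

All four datasets share the same average, while the median and 90th percentile expose the outliers.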
  • 18. STANDARD DEVIATION • Used for normally distributed datasets • How spread out is the dataset? • Table (Dataset: Average | σ | Median | 90th P) • 0, 50, 100: 50 | 50 | 50 | 50 • 49, 50, 51: 50 | 1 | 50 | 50 • 8, 9, 9, 10, 14, 250: 50 | 98 | 9.5 | 14 • 0, 0, … 500: 50 | 158.1 | 0 | 0
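As a quick check of the σ column, here is a minimal sketch using Python's standard library. It assumes the slide reports the sample standard deviation (dividing by n - 1), since that is the variant that matches the listed values.

```python
from statistics import stdev

# The four example datasets from the slide, in the same order.
datasets = [
    [0, 50, 100],
    [49, 50, 51],
    [8, 9, 9, 10, 14, 250],
    [0] * 9 + [500],
]

# Sample standard deviation of each dataset, rounded to one decimal place.
spreads = [round(stdev(values), 1) for values in datasets]
print(spreads)
```

Every dataset has an average of 50, so the standard deviation is the first figure here that tells them apart.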
  • 20. DOESN’T EVERYTHING FOLLOW THE NORMAL DISTRIBUTION? • Summarizing data by its mean and standard deviation assumes that the data is normally distributed (has a bell-curve shape) • In normally distributed datasets, the mean, median, and mode are all the same. • Most data is thought to be normally distributed, but it actually is not!
  • 23. APPLICATIONS • Service level and server uptime: • On average, each API call will take 200ms • 80% of calls under 100ms, 95% of calls under 200ms • Paying customers: • On average, each user pays $7 • Segment users, remove outliers and represent them with percentiles: • 90% of basic users pay $5 or more per month • 90% of premium users pay $13 or more per month • One outlier paid us $200 last month. Interesting, let’s investigate!
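To illustrate the service-level example, the sketch below compares an average with "share of calls under a threshold" figures. The latency values are hypothetical, made up for illustration only, and `share_under` is an assumed helper name.

```python
from statistics import mean


def share_under(latencies_ms, threshold_ms):
    """Fraction of calls that finished faster than the threshold."""
    fast = sum(1 for latency in latencies_ms if latency < threshold_ms)
    return fast / len(latencies_ms)


# Hypothetical API latencies in milliseconds: nine quick calls and one very slow one.
latencies_ms = [40, 55, 60, 70, 80, 85, 90, 95, 150, 1275]

average_ms = mean(latencies_ms)
under_100 = share_under(latencies_ms, 100)
under_200 = share_under(latencies_ms, 200)

print(f"average: {average_ms} ms")
print(f"{under_100:.0%} of calls under 100 ms, {under_200:.0%} under 200 ms")
```

In this made-up sample a single slow call pulls the average up to 200 ms, even though most calls finish in under 100 ms.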
  • 24. ANTI-PATTERN: ACCURACY • Accuracy also compresses information into a misleading and usually useless figure. • Example: If 1% of your email is spam. • Solution #1: Mark all emails as spam → catches 100% of spam, but is only 1% accurate. • Solution #2: Mark all emails as non-spam → 99% accurate, but catches no spam. • We care a lot about: • What kind of error happened? • Can we tolerate it? • Are all errors born equal? Assign a cost per error type.
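The spam example can be worked through in a few lines. This is a sketch over a hypothetical inbox of 1,000 emails with 1% spam. Computed this way, the "mark everything as spam" filter scores 1% accuracy and the "mark nothing as spam" filter scores 99%, even though neither filter does anything useful.

```python
def accuracy(predictions, truth):
    """Share of predictions that match the truth."""
    correct = sum(p == t for p, t in zip(predictions, truth))
    return correct / len(truth)


# Hypothetical inbox: 10 spam emails (True) out of 1,000, i.e. 1% spam.
truth = [True] * 10 + [False] * 990

mark_all_spam = [True] * len(truth)   # Solution #1
mark_no_spam = [False] * len(truth)   # Solution #2

acc_all_spam = accuracy(mark_all_spam, truth)
acc_no_spam = accuracy(mark_no_spam, truth)

print(f"mark everything as spam: {acc_all_spam:.0%} accurate")
print(f"mark nothing as spam:    {acc_no_spam:.0%} accurate")
```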
  • 25. CONFUSION MATRIX (The Truth (Facts): “Pregnancy” | My Guess: “Pregnancy Test”) • Guess Positive (Yes), truth TRUE (Pregnant): True Positive (TP), Pregnant Woman! • Guess Positive (Yes), truth FALSE (Not-Pregnant): False Positive (FP), Pregnant Man! • Guess Negative (No), truth TRUE (Pregnant): False Negative (FN), Pregnant Woman told she isn’t! • Guess Negative (No), truth FALSE (Not-Pregnant): True Negative (TN), Non-Pregnant Man!
  • 26. PRECISION AND RECALL • Accuracy = (TP + TN) / All • Treats FP and FN equally • Precision = TP / (TP + FP) • 100% when FP=0 (no wrong results returned) • Useful for search results, sensitive and important information • “I’d rather say nothing than tell a lie or embarrass myself with a wrong answer!” • Recall = TP / (TP + FN) • 100% when FN=0 (when all correct results are returned) • Useful for passive interactions like recommender systems and loose searching (similar items) • “I won’t hide anything from you, even the useless details”
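The three formulas on this slide translate directly into code. The counts below are hypothetical, chosen only to show how precision and recall can differ while accuracy still looks high.

```python
def accuracy(tp, tn, fp, fn):
    """(TP + TN) / All: treats both error types equally."""
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp, fp):
    """TP / (TP + FP): of everything returned, how much was right?"""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """TP / (TP + FN): of everything that should be returned, how much was found?"""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical confusion-matrix counts for 100 items.
tp, fp, fn, tn = 8, 2, 4, 86

acc = accuracy(tp, tn, fp, fn)
prec = precision(tp, fp)
rec = recall(tp, fn)

print(f"accuracy={acc:.2f}, precision={prec:.2f}, recall={rec:.2f}")
```

Returning fewer, safer results tends to raise precision and lower recall. Returning more results does the opposite.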
  • 27. PRECISION VS. RECALL And why you can’t have both!
  • 28. OTHER MEASURES • Weighted errors • if result >= truth, consider it correct • if result < truth, consider it wrong • Loss functions • if result > truth, take difference as error • if result < truth, take ten times difference as error • Sum all errors and try to build a model with the least amount of error
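A minimal sketch of the two measures described on this slide: a one-sided "correct if not below the truth" rule, and a loss function that charges ten times more for undershooting than for overshooting. The function names and sample numbers are assumptions for illustration.

```python
def is_correct(result, truth):
    """Weighted error rule: a result at or above the truth counts as correct."""
    return result >= truth


def asymmetric_loss(result, truth, under_penalty=10):
    """Overshoot costs the difference; undershoot costs ten times the difference."""
    difference = result - truth
    if difference >= 0:
        return difference
    return under_penalty * -difference


def total_loss(results, truths):
    """Sum of all errors: the quantity a model would try to minimize."""
    return sum(asymmetric_loss(r, t) for r, t in zip(results, truths))


# Hypothetical predictions against a true value of 10 each.
results = [12, 8, 10]
truths = [10, 10, 10]

loss = total_loss(results, truths)  # 2 (over by 2) + 20 (under by 2, times 10) + 0
print(loss)
```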
  • 29. APPLICATIONS • Server caching systems (for speed) • Search and Recommender Systems • Product design and error-prioritization • What kind of error would the user tolerate?
  • 30. A/B TESTING The road to proper A/B testing is filled with coincidences, correlations, a comic, and a fancy “null hypothesis”
  • 31. WHEN A/B TESTS BECOME HARMFUL A bad A/B test can lead to: • Wasting time, money and effort. • Making the wrong decision and doing the wrong thing. • Inconsistent User Experience.
  • 32. How can we avoid A/B side-effects? By finding the mind reader in the audience!
  • 33. COINCIDENCE • If you pick a random answer to any question, how often will you be correct? • How do you build a system that makes sure you’re not doing well by coincidence? • How harmful can it be to extrapolate from a few samples?
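One way to put a number on coincidence is to ask how likely it is that somebody in a room guesses a run of coin flips correctly by pure luck. The sketch below is an assumed illustration rather than the talk's own demo, and the audience size and number of flips are hypothetical.

```python
def chance_of_lucky_guesser(audience_size, flips):
    """Probability that at least one person guesses every flip correctly by chance."""
    p_single = 0.5 ** flips                 # one person guessing all flips right
    p_nobody = (1 - p_single) ** audience_size
    return 1 - p_nobody


# Hypothetical room: 200 people each guessing 5 coin flips.
p_one_person = 0.5 ** 5
p_someone = chance_of_lucky_guesser(audience_size=200, flips=5)

print(f"one given person: {p_one_person:.1%}, someone in the room: {p_someone:.1%}")
```

Any single person is unlikely to be right, but with enough people a "mind reader" is almost certain to turn up, which is why a few lucky samples should not be extrapolated.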
  • 35. CORRELATIONS 1. Alice doesn’t study and gets a full mark, Bob studies hard and fails the exam → Studying makes you fail. 2. I lost weight and got invited to talk to you → My weight loss caused you to invite me! 3. Whenever windmills rotate quickly, wind is strong → Windmills cause wind 4. Sick people smell bad. → Bad odors cause diseases. 5. High altitudes are colder → Altitude causes cold
  • 37. NULL HYPOTHESIS • By default, everything is random. • Aim: Disprove null hypothesis (Make it fail the “random” test) • Usually disproven with confidence of 95% or 99%
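As a sketch of what "failing the random test" can look like, the code below computes an exact two-sided p-value for a coin under the null hypothesis that it is fair. The function name and the example of 60 heads in 100 tosses are assumptions for illustration.

```python
from math import comb


def fair_coin_p_value(heads, tosses):
    """Chance of a result at least this far from 50/50 if the coin really is fair."""
    observed_gap = abs(heads - tosses / 2)
    extreme_outcomes = sum(
        comb(tosses, k)
        for k in range(tosses + 1)
        if abs(k - tosses / 2) >= observed_gap
    )
    return extreme_outcomes / 2 ** tosses


# Hypothetical experiment: 60 heads in 100 tosses.
p_value = fair_coin_p_value(heads=60, tosses=100)
rejected_at_95 = p_value < 0.05  # 95% confidence corresponds to p < 0.05

print(f"p-value: {p_value:.4f}, null hypothesis rejected at 95%: {rejected_at_95}")
```

A small p-value means the observed result would be rare under pure randomness, and only then is the null hypothesis rejected.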
  • 39. FAIR COIN? • Classical question: How many times should you toss a coin before deciding it’s a Fair Coin? • Answer: Depends on: • What kind of error are you willing to tolerate? • How confident/sure you want to be • With 1% error and 68.27% confidence: 2,500 tosses • With 1% error and 99.90% confidence: 27,225 tosses
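The two toss counts can be reproduced with the standard worst-case estimate n = (z / (2E))², where E is the tolerated error and z is the z-score for the chosen confidence. This is a sketch under stated assumptions: the slide does not show its formula, and the rounded z-scores used here (1.0 for 68.27%, 3.3 for 99.90%) are the values that match its numbers.

```python
def tosses_needed(max_error, z_score):
    """Worst-case number of tosses, taken at p = 0.5: n = (z / (2 * E)) ** 2."""
    return round((z_score / (2 * max_error)) ** 2)


# Assumed rounded z-scores for the two confidence levels on the slide.
tosses_68 = tosses_needed(max_error=0.01, z_score=1.0)   # 68.27% confidence
tosses_999 = tosses_needed(max_error=0.01, z_score=3.3)  # 99.90% confidence

print(tosses_68, tosses_999)
```

Halving the tolerated error quadruples the number of tosses, so tight error bounds get expensive quickly.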
  • 40. A/B TESTS ARE RARELY CLEAN Variety | Seasonality | Error | Randomness
  • 41. A/B TESTING GUIDE • Ensure random split. • Measure current performance. • Measure target performance (rise above noise/variation) • Calculate number of trials needed • Don’t Peek!
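For the "calculate number of trials needed" step, here is a rough sketch based on a common textbook approximation for comparing two conversion rates. It is not necessarily the method used by any particular online calculator, and the baseline rate, target rate, significance level and power are hypothetical inputs.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, target_rate, alpha=0.05, power=0.80):
    """Approximate users needed in EACH variant to detect baseline -> target."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # desired statistical power
    pooled = (baseline_rate + target_rate) / 2
    spread_null = sqrt(2 * pooled * (1 - pooled))
    spread_alt = sqrt(
        baseline_rate * (1 - baseline_rate) + target_rate * (1 - target_rate)
    )
    numerator = (z_alpha * spread_null + z_beta * spread_alt) ** 2
    return ceil(numerator / (target_rate - baseline_rate) ** 2)


# Hypothetical test: current conversion is 10%, and we hope to reach 12%.
needed = sample_size_per_variant(baseline_rate=0.10, target_rate=0.12)
print(f"about {needed} users per variant")
```

Fixing this number before the test starts is what makes "don't peek" enforceable: the test ends when the sample size is reached, not when the result looks good.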
  • 42. A/B SAMPLE SIZE CALCULATOR http://www.evanmiller.org/ab-testing/sample-size.html Read the article here: http://www.evanmiller.org/how-not-to-run-an-ab-test.html
  • 45. USES • Duplicate detection • Recommender Systems • Clustering and Segmentation • A lot of other applications
  • 46. SIMILARITY CRITERIA • Attributes • Users’ interaction • Interacts with both together • Interacts with both sequentially
  • 47. BASIC RECOMMENDER • Create co-occurrence table between Product A and all other products • When user is on Product#A, show Related Items.
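A minimal sketch of a co-occurrence recommender, assuming "co-occurrence" means two products appearing in the same user session. The session data, product names and the `related_items` helper are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical sessions: the products each user interacted with together.
sessions = [
    ["A", "B", "C"],
    ["A", "B"],
    ["A", "D"],
    ["B", "C"],
]

# co_occurrence[x][y] = number of sessions containing both x and y.
co_occurrence = defaultdict(Counter)
for items in sessions:
    for x, y in combinations(sorted(set(items)), 2):
        co_occurrence[x][y] += 1
        co_occurrence[y][x] += 1


def related_items(product, top_n=3):
    """Products that most often appear together with the given product."""
    return [item for item, _ in co_occurrence[product].most_common(top_n)]


print(related_items("A"))
```

When a user is on Product A, the items with the highest counts in its row of the table become the "Related Items" list.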
  • 48. SOPHISTICATED RECOMMENDER • Time: New (fresh) products appear first • Location: Products physically near the user appear first • Context: Where is the user seeing the result? Mobile / web / extension / chatbot? • History: How did the user interact with this category/brand/tag before? • Price-Sensitivity: How price-sensitive is the user? • Ephemeral: Are the current searches & browsing habits converging around a certain pattern? • Related: How did the user react to similar products? • Quality: Given an internal quality measure, how good is this product?
  • 50. CONCLUSIONS • Raw data matters, a lot! Use Tableau or a similar product. • Always ask one more question. • Question data sources. • Build a culture around data possibilities.
  • 51. WATCH / LISTEN / READ • Book: “How to Lie with Statistics”: http://a.co/0DIGMwt • Podcast: “Data Driven Product Management At Yammer”: http://bit.ly/data-driven-yammer • Choice, happiness and spaghetti sauce | Malcolm Gladwell: https://youtu.be/iIiAAhUeR6Y • The Bayesian Trap: https://youtu.be/R13BD8qKeTg • Scientific Studies: Last Week Tonight with John Oliver (HBO): https://youtu.be/0Rnq1NpHdmw • The Future of Product Management, Janice Fraser: https://youtu.be/f116MblyZbQ