© 2015 RentPath, LLC. All rights reserved.
Data Science
Péter Molnár <pmolnar@rentpath.com>
© 2015 RentPath, LLC. All rights reserved. © 2017 RentPath, LLC. All rights reserved.
A design process
Purpose
• Determine the purpose of the database.
• This helps prepare for the remaining steps.
Organize Information
• Find and organize the information required.
• Gather all of the types of information to record in the database.
Tables
• Divide information items into major entities or subjects, such as Products or Orders.
• Each subject then becomes a table.
Columns
• Decide what information needs to be stored in each table. Each item becomes a field and is displayed as a column in the table.
Primary Keys
• The primary key is a column, or a set of columns, used to uniquely identify each row.
Relationships
• Look at each table and decide how the data in one table is related to the data in other tables.
• Add fields to tables or create new tables to clarify the relationships, as necessary.
Design Refinement
• Analyze the design for errors.
• Create tables and add a few records of sample data.
• Check results.
• Make adjustments to the design, as needed.
Normalization
• Apply the data normalization rules to see if tables are structured correctly.
• Make adjustments to the tables.
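The steps above can be tried end to end in a scratch database. The sketch below uses Python's built-in `sqlite3` module as a stand-in for whatever DBMS is actually in use; the `products` and `orders` tables and their columns are hypothetical, borrowed from the Products/Orders example. It creates one table per subject, gives each a primary key, expresses the relationship as a foreign key, adds a few sample records, and checks the result with a join.

```python
import sqlite3

# Hypothetical schema: one table per subject, a primary key per table,
# and a foreign key expressing the Orders -> Products relationship.
SCHEMA = """
CREATE TABLE products (
    product_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    price      REAL NOT NULL
);
CREATE TABLE orders (
    order_id   INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES products(product_id),
    quantity   INTEGER NOT NULL CHECK (quantity > 0)
);
"""

def build_sample_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    # Design refinement: add a few records of sample data ...
    conn.executemany("INSERT INTO products VALUES (?, ?, ?)",
                     [(1, "Desk", 150.0), (2, "Chair", 60.0)])
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                     [(10, 1, 2), (11, 2, 4)])
    return conn

conn = build_sample_db()
# ... and check results by querying across the relationship.
rows = conn.execute(
    "SELECT o.order_id, p.name, o.quantity * p.price AS total "
    "FROM orders o JOIN products p USING (product_id) ORDER BY o.order_id"
).fetchall()
```

With foreign keys enabled, inserting an order that points at a nonexistent product raises `sqlite3.IntegrityError`, which is one cheap way to "analyze the design for errors" before real data arrives.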
Cross Industry Standard Process for Data Mining (CRISP-DM)
https://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining
Learning
• Supervised learning
• Unsupervised learning
Areas of Machine Learning
• Supervised learning, discrete-level target: Classification
• Supervised learning, continuous-value target: Regression
• Unsupervised learning: Clustering, Association Rules, Disjoint Segmentation
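Unsupervised methods work without a target y. As a minimal illustration of clustering, the sketch below implements plain k-means (Lloyd's algorithm), which partitions unlabeled points into k disjoint clusters. The 2-D points are hypothetical; in practice they would be the feature vectors of customers or properties.

```python
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's k-means: split unlabeled points into k disjoint clusters."""
    centroids = random.Random(seed).sample(points, k)  # k distinct starting points
    clusters = []
    for _ in range(iterations):
        # Assignment step: every point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else c
            for members, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# No target y here, only hypothetical feature vectors X.
X = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(X, k=2)
```

A supervised method would instead receive a label for each point and learn to predict it; here the two groups emerge from the geometry of X alone.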
Star Schema vs. Vector Space
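Most learning algorithms expect one fixed-length feature vector per entity, whereas a star schema stores many fact rows per entity, keyed to dimension tables. A minimal sketch of that transformation, assuming the fact rows have already been joined to their dimensions and arrive as hypothetical `(customer_id, event_type, count)` tuples: each distinct event type becomes one dimension of the vector space, and repeated facts for the same customer and event are summed.

```python
from collections import defaultdict

# Hypothetical fact rows, as a star-schema query might return them.
fact_rows = [
    ("c1", "listing_view", 12),
    ("c1", "lead_submit", 2),
    ("c2", "listing_view", 3),
    ("c2", "login", 5),
    ("c1", "listing_view", 4),
]

def to_vectors(rows):
    """Pivot long fact rows into one fixed-length feature vector per customer."""
    features = sorted({event for _, event, _ in rows})  # one dimension per event type
    index = {name: i for i, name in enumerate(features)}
    vectors = defaultdict(lambda: [0] * len(features))
    for customer, event, count in rows:
        vectors[customer][index[event]] += count  # aggregate repeated facts
    return features, dict(vectors)

features, vectors = to_vectors(fact_rows)
```

Done inside the database, the same pivot is usually a `GROUP BY` with one conditional aggregate per feature; at large scale that query is exactly the kind of extraction load the editor's notes warn about.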
Churn Prediction
• Who is at risk of leaving?
• Why are customers leaving?
• Which factors contribute to churn?
Data Collection & Feature Selection/Engineering
(Figure: Churn vs. Stay)
Training & Validation
(Figure: the model is fit on training data X with labels y; on the test data, its predictions Y are compared with the actual labels y.)
Confusion matrix (actual vs. predicted):
• Actual Churn, predicted Churn: True Positive. We said the customer would churn, and they left.
• Actual Churn, predicted Stay: False Negative. They left, and we didn’t see it coming.
• Actual Stay, predicted Churn: False Positive. We thought they would leave, but they stayed.
• Actual Stay, predicted Stay: True Negative. We know our loyal customers.
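Validation only means something if the test rows were never seen during training. A minimal sketch of a hold-out split in plain Python, assuming rows of X and labels y are aligned lists; the function name mirrors scikit-learn's `train_test_split`, but this is a hand-rolled stand-in, and the 20-customer dataset is hypothetical.

```python
import random

def train_test_split(X, y, test_fraction=0.25, seed=42):
    """Shuffle row indices once, then hold out a fraction of rows as the test set."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of rows")
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(X) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test

# Hypothetical data: 20 customers, label 1 = churn, 0 = stay.
X = [[i, i % 3] for i in range(20)]
y = [1 if i % 5 == 0 else 0 for i in range(20)]
X_train, X_test, y_train, y_test = train_test_split(X, y)
```

Because churners are rare, a purely random split can leave very few of them in the test set; scikit-learn's `train_test_split` offers a `stratify=y` argument that keeps the churn rate similar in both parts.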
Evaluation
• Let’s assume 3% of 10,000 customers (300) churn.
• The model produces the results below.
Confusion matrix (actual vs. predicted):
• Actual Churn, predicted Churn: True Positive = 201 (2.01%). We said the customer would churn, and they left.
• Actual Churn, predicted Stay: False Negative = 99 (0.99%). They left, and we didn’t see it coming.
• Actual Stay, predicted Churn: False Positive = 485 (4.85%). We thought they would leave, but they stayed.
• Actual Stay, predicted Stay: True Negative = 9,215 (92.15%). We know our loyal customers.
Recall Churn = TP / (TP + FN) = 201 / 300 = 0.67
Recall Stay = TN / (TN + FP) = 9,215 / 9,700 = 0.95
Precision Churn = TP / (TP + FP) = 201 / 686 = 0.29
Precision Stay = TN / (TN + FN) = 9,215 / 9,314 = 0.99
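The metrics on this slide follow directly from the four counts. A short Python check, using only the numbers above, reproduces them and adds one comparison: overall accuracy is 0.94, lower than the 0.97 that a trivial model achieves by predicting "stay" for every customer. This is why per-class precision and recall matter more than accuracy when churn is rare.

```python
def churn_metrics(tp, fn, fp, tn):
    """Per-class precision and recall plus overall accuracy from a 2x2 confusion matrix."""
    total = tp + fn + fp + tn
    return {
        "recall_churn": tp / (tp + fn),     # share of actual churners we caught
        "recall_stay": tn / (tn + fp),      # share of loyal customers we recognized
        "precision_churn": tp / (tp + fp),  # share of churn alerts that were right
        "precision_stay": tn / (tn + fn),   # share of "stay" predictions that were right
        "accuracy": (tp + tn) / total,
    }

metrics = churn_metrics(tp=201, fn=99, fp=485, tn=9215)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
# recall_churn: 0.67
# recall_stay: 0.95
# precision_churn: 0.29
# precision_stay: 0.99
# accuracy: 0.94

# Baseline that predicts "stay" for everyone: 0 true positives, 9,700 true negatives.
# (churn_metrics would divide by zero for precision_churn here, so compute accuracy directly.)
baseline_accuracy = (0 + 9700) / 10000
```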
Why did they leave?
• Many machine learning algorithms are like a magic black box: we don’t know what’s going on inside.
• Sometimes the goal is not to create the best model but to understand the factors that contribute to churn.
(Figures: Decision Tree, Logistic Regression)
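Logistic regression is one of the interpretable options named above. A minimal pure-Python sketch, fit by batch gradient descent on a tiny hypothetical dataset (the feature names and values are invented for illustration): the sign of each coefficient says whether a factor raises or lowers the odds of churn, and `exp(coefficient)` is the odds ratio per one-unit increase. Real projects would more likely use a library such as scikit-learn or statsmodels, and would standardize features before comparing coefficient magnitudes.

```python
import math

def fit_logistic_regression(X, y, learning_rate=0.05, epochs=5000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for row, label in zip(X, y):
            z = bias + sum(w * x for w, x in zip(weights, row))
            error = 1.0 / (1.0 + math.exp(-z)) - label  # predicted probability minus label
            for j, x in enumerate(row):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias

# Hypothetical features per customer: (support_tickets, tenure_months); 1 = churned.
X = [(5, 1), (4, 2), (6, 1), (5, 3), (1, 8), (0, 9), (2, 7), (1, 10)]
y = [1, 1, 1, 1, 0, 0, 0, 0]
weights, bias = fit_logistic_regression(X, y)
for name, w in zip(["support_tickets", "tenure_months"], weights):
    # exp(w) is the multiplicative change in churn odds per one-unit increase.
    print(f"{name}: coefficient {w:+.2f}, odds ratio {math.exp(w):.2f}")
```

The toy data is perfectly separable, so without regularization the coefficients keep growing as training continues; read their signs here, not their exact magnitudes.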
Recommendation Engine
Customize the consumer experience
Categorize properties for predictive modeling
Which properties are similar?
• Compare properties by
– Price point
– Amenities
– Style
– Location
– Size
– and many other qualities
Pros: the assumption of similarity should hold for new inventory.
Cons: biased; our assumption about similarity might be wrong.
• Let consumers define similarity through
– Ratings of various properties by the same consumer
– Lead submissions by the same consumer to different properties
Pros: unbiased.
Cons: requires historical data on consumer preferences; fails on new inventory.
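The first approach (comparing properties by their attributes) can be sketched as a nearest-neighbor search over scaled attribute vectors. The attributes and values below are hypothetical, and weighting every scaled attribute equally is itself an assumption about similarity, which is exactly the risk the slide lists. Because it needs no consumer history, this approach also works for a property listed today.

```python
import math

# Hypothetical property attributes: (monthly_rent_usd, bedrooms, has_pool).
properties = {
    "A": (1200, 1, 0),
    "B": (1250, 1, 0),
    "C": (3000, 3, 1),
}

def min_max_scale(rows):
    """Rescale every attribute to [0, 1] so rent in dollars does not dominate."""
    columns = list(zip(*rows.values()))
    lows = [min(col) for col in columns]
    spans = [(max(col) - min(col)) or 1 for col in columns]  # guard constant columns
    return {key: tuple((v - lo) / span for v, lo, span in zip(values, lows, spans))
            for key, values in rows.items()}

def most_similar(target, scaled):
    """Return the other property with the smallest Euclidean distance to target."""
    others = (key for key in scaled if key != target)
    return min(others, key=lambda key: math.dist(scaled[target], scaled[key]))

scaled = min_max_scale(properties)
print(most_similar("A", scaled))  # B
```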
Collaborative Filter
(Figure: a Users × Items matrix of star ratings, alongside Item Attributes)
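The second approach lets consumers define similarity through their ratings. One common variant is item-based collaborative filtering: two properties are similar if the same users rated them similarly, and a missing rating is predicted as a similarity-weighted average of the user's existing ratings. The slide does not say which variant is used; the sketch below assumes cosine similarity over a small hypothetical Users × Items rating matrix, with 0 standing for "not rated".

```python
import math

# Hypothetical star ratings (1-5); a missing key means "not rated".
ratings = {
    "u1": {"p1": 5, "p2": 4, "p3": 1},
    "u2": {"p1": 4, "p2": 5},
    "u3": {"p2": 1, "p3": 5},
    "u4": {"p1": 5, "p3": 2},
}

def item_vector(ratings, item, users):
    """Column of the Users x Items matrix for one item (0 = not rated)."""
    return [ratings[u].get(item, 0) for u in users]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def predict(ratings, user, item):
    """Similarity-weighted average of the user's ratings on other items."""
    users = sorted(ratings)
    target = item_vector(ratings, item, users)
    weighted, total = 0.0, 0.0
    for other, score in ratings[user].items():
        sim = cosine(target, item_vector(ratings, other, users))
        weighted += sim * score
        total += abs(sim)
    return weighted / total if total else None

print(round(predict(ratings, "u2", "p3"), 2))
```

A property with no ratings yet has an all-zero column, so its similarity to everything is 0 and no prediction is possible; this is the "fails on new inventory" problem, and one reason to combine ratings with item attributes.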
Image Classification
AlexNet
https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
ImageNet http://image-net.org
Places205-AlexNet: a CNN trained on 205 scene categories of the Places Database (used in NIPS'14) with ~2.5 million images.
Summary
• Supervised learning, discrete-level target: Classification
• Supervised learning, continuous-value target: Regression
• Unsupervised learning: Clustering, Association Rules, Disjoint Segmentation
(Figure: actual labels y vs. predicted labels Y)
Confusion matrix:
TP FN
FP TN
Péter Molnár <pmolnar@rentpath.com>

 

How Data Scientists Work


Editor's Notes

  1. Data Science: The talk aims to give an overview of common data science topics and how they may relate to database administration. Often the required data live on multiple database systems, in structures that differ from the format the analytics algorithm requires. At large scale, this can add pressure on the existing DB infrastructure and slow down the data extraction process. Ideally, one would consider those requirements in the DB design process. However, many data science projects are exploratory and have a short lifespan. Dr. Péter Molnár is a data scientist at RentPath, LLC and faculty at the Institute of Insight at Georgia State University. As an academic and business professional, he advances and applies data science theories and tools in both public and private domains, including research in robotics, artificial intelligence, and machine learning.
  2. Data transformation: Star schema vs vector space