Analyzed gender differences in mate selection by evaluating survey data from a speed dating experiment. Built a model to predict a match between two individuals based on their partner preferences. Presented findings visually using Tableau.
3. 3
BUSINESS CASE
DATA EXPLANATION
MODELING
APPROACHES
DATA INSIGHTS
FUTURE ACTIONS
As of today, the dating industry is worth approximately $2.4 billion, of
which $1.1 billion comes from online dating. About 10% of the U.S.
population visits dating sites every month, which equates to
approximately 30 million unique users (with either profiles or
subscriptions). We are trying to tap into the online dating
segment by introducing virtual speed dating to customers.
BUSINESS CASE
5. 5
Our two major competitors, Match.com and eHarmony,
charge monthly fees of $42 and $60, respectively.
Currently, our speed dating events run weekly,
for which we would charge a monthly rate of $48 per
person.
FINANCIAL IMPLICATIONS
Confusion Matrix  Description                                           Financial Impact
True Positive     People that were predicted to match and did           $48.00
False Positive    People that were predicted to match but didn't        $48.00
True Negative     People that were not predicted to match and didn't    $0.00
False Negative    People that were not predicted to match and could've  ($48.00)
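The per-person figures in the table above can be rolled up into a total impact once a model's confusion-matrix counts are known. The sketch below is illustrative only: the function name and the example counts are hypothetical and are not results from the study; only the dollar amounts per cell come from the table.

```python
# Monthly dollar impact per person for each confusion-matrix cell,
# copied from the table above (False Negative is a loss, so it is negative).
IMPACT_PER_PERSON = {
    "true_positive": 48.00,
    "false_positive": 48.00,
    "true_negative": 0.00,
    "false_negative": -48.00,
}


def total_financial_impact(counts):
    """Sum the dollar impact over all confusion-matrix cells."""
    return sum(IMPACT_PER_PERSON[cell] * n for cell, n in counts.items())


# Hypothetical counts, for illustration only (not taken from the deck).
example_counts = {
    "true_positive": 50,
    "false_positive": 20,
    "true_negative": 100,
    "false_negative": 30,
}

print(total_financial_impact(example_counts))  # 1920.0
```

Under these assumed counts, the false negatives offset part of the revenue from predicted matches, which is one way to see why missed matches are treated as costly here.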
6. 6
Over the last decade, individuals have come to prefer finding a partner through a preselection process
because of factors such as:
• Values
• Demographics
• Safety
SOCIAL IMPLICATIONS
13. 13
MODELS
• Predicting match between males and females using their mutual
interests.
• Predicting the decision of males and females using their
preferences for the opposite gender.
14. 14
Type of Model:         Predictive
Target Variable:       Dec (1=yes, 0=no)
Predictive Variables:  See Appendix

Females    Males
attr       attr
shar       fun
fun        fun1_1
race       sinc1_1
shar1_1    from
15. 15
Type of Model:         Predictive
Target Variable:       Match (1=yes, 0=no)
Predictive Variables:  See Appendix

Females       Males
attr_o        fun_o
attr          attr
fun           attr_o
shar          shar
cat_prob_o    pf_o_fun
16. 16
MODELING APPROACHES
Predicting Decision for Females
Predicting Decision for Males
Model                        Accuracy  Precision  Recall  F-Score  AUC     Average Log Loss  Training Log Loss
Linear Regression            0.7651    0.7204     0.5839  0.6450   0.8344  0.4810            26.7300
Boosted Decision (1 tree)    0.7452    0.6805     0.5708  0.6209   0.7982  0.5278            19.5959
Boosted Decision (100 tree)  0.8193    0.7624     0.7342  0.7481   0.8920  0.9318            -41.9357
Decision Forest              0.7532    0.6511     0.6993  0.6744   0.8387  0.7870            -19.8750
Neural Network               0.7938    0.7762     0.6122  0.6845   0.8573  0.7758            -18.1732
Model                        Accuracy  Precision  Recall  F-Score  AUC     Average Log Loss  Training Log Loss
Linear Regression            0.7846    0.7782     0.7638  0.7709   0.8739  0.4505            34.8875
Boosted Decision (1 tree)    0.7639    0.7174     0.8291  0.7692   0.8361  0.5025            27.3735
Boosted Decision (100 tree)  0.8291    0.8091     0.8375  0.8230   0.9085  0.8423            -21.7416
Decision Forest              0.7909    0.8224     0.7136  0.7641   0.8661  1.0454            -51.1021
Neural Network               0.7893    0.7515     0.8308  0.7892   0.8683  0.8381            -21.1342
Key Metric – Recall
Base Rate - 47%
Base Rate - 36%
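The F-Score column in the tables above appears to be the harmonic mean of precision and recall (the usual F1 definition). That is an assumption, since the slides do not define it, but it can be checked against any row. A minimal sketch, with a hypothetical helper name:

```python
def f_score(precision, recall):
    """F1 score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall copied from the first "Linear Regression" row above.
reported_f_score = 0.6450
computed = f_score(0.7204, 0.5839)

# The inputs are rounded to four decimals, so compare with a small tolerance.
print(round(computed, 4), abs(computed - reported_f_score) < 5e-4)
```

The computed value should land close to the reported 0.6450. Because recall is named as the key metric, a model with a slightly lower F-Score but higher recall may still be the preferred one.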
17. 17
MODELING APPROACHES
Key Metric – Recall
Predicting Match for Females
Predicting Match for Males
Model                        Accuracy  Precision  Recall  F-Score  AUC     Average Log Loss  Training Log Loss
Linear Regression            0.8609    0.6739     0.2995  0.4147   0.8352  0.3349            25.0940
Boosted Decision (1 tree)    0.8498    0.5652     0.3768  0.4522   0.7923  0.3817            14.6349
Decision Forest              0.8482    0.5930     0.2464  0.3481   0.7814  0.8849            -97.9012
Neural Network               0.8180    0.4476     0.4541  0.4508   0.7918  0.8554            -91.3179
Boosted Decision (100 tree)  0.8386    0.5093     0.5266  0.5178   0.8264  0.6255            -39.8906
Model                        Accuracy  Precision  Recall  F-Score  AUC     Average Log Loss  Training Log Loss
Linear Regression            0.8510    0.5943     0.3043  0.4026   0.8318  0.3486            22.1469
Boosted Decision (1 tree)    0.8478    0.5678     0.3237  0.4123   0.7694  0.4000            10.6589
Boosted Decision (100 tree)  0.8430    0.5439     0.2995  0.3863   0.7715  1.1412            -154.8671
Neural Network               0.8478    0.5435     0.4831  0.5115   0.8287  0.3826            14.5651
Decision Forest              0.8351    0.5000     0.4251  0.4595   0.8024  0.4823            -7.7106
Base Rate - 16%
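One way to read the base rates alongside the accuracy columns is to compare each model with a trivial classifier that always predicts the more common outcome. The sketch below assumes "base rate" means the share of positive (1=yes) outcomes in the data, which the slides do not state explicitly; the function name is hypothetical.

```python
def majority_baseline_accuracy(base_rate):
    """Accuracy of always predicting the more common class.

    base_rate is assumed to be the fraction of positive (1=yes) cases.
    """
    return max(base_rate, 1.0 - base_rate)


# Base rates quoted on the slides: 47% and 36% for decision, 16% for match.
for label, rate in [("decision, 47%", 0.47), ("decision, 36%", 0.36), ("match, 16%", 0.16)]:
    print(label, round(majority_baseline_accuracy(rate), 2))
```

With a 16% base rate, always predicting "no match" would already be about 84% accurate, so the match-model accuracies of roughly 0.82 to 0.86 say little on their own. This is consistent with the slides naming recall, not accuracy, as the key metric.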