The goal of this project is to build the best-performing model for predicting which current non-subscribers are most likely to convert to premium (paid) subscribers.
• Built and evaluated each model on training and validation data; pre-processed the data with oversampling and normalization, and tuned the models with parameter optimization
• Used a majority-voting ensemble of k-Nearest Neighbors (k-NN), Support Vector Machines (SVM) and Neural Networks to develop the most cost-effective model
• Applied feature selection and oversampling to improve model accuracy, and used ensemble techniques to trade some accuracy for a lower misclassification cost
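The voting rule and the cost trade-off described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's code: the three prediction lists stand in for already-trained k-NN, SVM and neural-network models, and the costs `cost_fp` and `cost_fn` are hypothetical values.

```python
from collections import Counter


def majority_vote(*predictions):
    """Combine per-model label lists by hard majority voting.

    Each argument is one model's list of predicted labels (one per record).
    """
    combined = []
    for votes in zip(*predictions):
        # most_common(1) returns [(label, count)] for the label with the most votes
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def misclassification_cost(y_true, y_pred, cost_fp, cost_fn):
    """Total cost when false positives and false negatives are priced differently.

    Label 1 = converts to premium, 0 = stays a non-subscriber.
    """
    cost = 0
    for t, p in zip(y_true, y_pred):
        if p == 1 and t == 0:
            cost += cost_fp  # predicted converter who did not convert
        elif p == 0 and t == 1:
            cost += cost_fn  # missed converter
    return cost


# Hypothetical predictions from three trained models on five records
knn = [1, 0, 1, 0, 0]
svm = [1, 1, 0, 0, 0]
nnet = [0, 1, 1, 0, 1]
ensemble = majority_vote(knn, svm, nnet)
print(ensemble)  # [1, 1, 1, 0, 0]

y_true = [1, 0, 1, 0, 0]
# Hypothetical costs: missing a converter is assumed 5x worse than a false alarm
print(misclassification_cost(y_true, ensemble, cost_fp=1, cost_fn=5))  # 1
```

With three voters and two classes every record gets a strict majority, so no tie-breaking rule is needed.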
5. Techniques that worked
Data Processing
• Steps: Data Preprocessing, Generate Attributes, Attribute Selection, Optimize Parameters
• Data techniques: PCA, SMOTE, Normalization, Filter Examples sampling
• Ensemble techniques: Voting, Bagging, Boosting, Stacking
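Normalization matters here because k-NN and SVM compare records through distances or dot products, so an attribute with a large range can dominate one with a small range. A minimal sketch of two common rescalings follows; the slide does not say which one the project used, so both are shown as assumptions, and the tenure values are made up.

```python
from statistics import mean, pstdev


def z_normalize(values):
    """Rescale a numeric attribute to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # constant attribute: nothing to scale, map every value to 0
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def min_max_normalize(values):
    """Rescale a numeric attribute to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


tenure = [1, 3, 5, 7]  # hypothetical tenure values
print(min_max_normalize(tenure))
print(z_normalize(tenure))
```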
6. SMOTE: Resampling Approach
• SMOTE (Synthetic Minority Over-sampling Technique) combines informed oversampling of the minority class with random undersampling of the majority class.
• For each minority sample:
– Find its k nearest minority neighbors
– Randomly select j of these neighbors
– Randomly generate synthetic samples along the lines joining the minority sample and its j selected neighbors
*Among re-sampling and probability-estimate modification techniques, SMOTE currently yields the best results (Chawla, 2003).
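The steps above can be sketched from scratch as follows. This is an illustrative implementation under simple assumptions (numeric features, Euclidean distance, one synthetic point per selected neighbor); the function names are hypothetical, and libraries such as imbalanced-learn ship a maintained SMOTE implementation.

```python
import random


def smote(minority, k=5, j=2, seed=0):
    """Create synthetic minority samples following the SMOTE steps.

    minority: list of feature vectors (lists of floats) from the minority class
    k: number of nearest minority neighbors to find for each sample
    j: number of those neighbors to pick at random (one synthetic point each)
    """
    rng = random.Random(seed)
    synthetic = []
    for i, x in enumerate(minority):
        # k nearest minority neighbors of x (excluding x), by squared Euclidean distance
        others = [m for idx, m in enumerate(minority) if idx != i]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)))
        neighbors = others[:k]
        # randomly select j of them, without replacement
        for nb in rng.sample(neighbors, min(j, len(neighbors))):
            gap = rng.random()  # random position on the segment from x to nb
            synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


def random_undersample(majority, n_keep, seed=0):
    """Keep a random subset of n_keep majority-class samples."""
    return random.Random(seed).sample(majority, n_keep)


minority = [[1.0, 1.0], [2.0, 1.5], [1.5, 3.0], [3.0, 2.0]]
new_points = smote(minority, k=3, j=2)
print(len(new_points))  # 8 (4 minority samples x 2 neighbors each)
```

Because each synthetic point is a convex combination of two real minority samples, it always lies inside the region those samples span.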
7. Deep Dive: SMOTE Sampling
[Figure: minority samples, synthetic samples and a nearby majority sample]
What happens if there is a nearby majority sample?
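One way to see the problem the slide raises: SMOTE interpolates only between minority samples and never looks at the majority class, so a synthetic point can land right next to a majority sample. The toy layout below is hypothetical and chosen so that this happens.

```python
import math


def nearest_label(point, labelled):
    """Return the label of the labelled sample closest to point (1-NN)."""
    return min(labelled, key=lambda item: math.dist(point, item[0]))[1]


# Hypothetical 2-D layout: two minority samples with a majority sample between them
minority = [(0.0, 0.0), (4.0, 0.0)]
majority = [(2.0, 0.5)]

# SMOTE interpolates between the two minority samples and ignores the majority one
gap = 0.5
x, nb = minority
synthetic = tuple(a + gap * (b - a) for a, b in zip(x, nb))

labelled = [(p, "minority") for p in minority] + [(p, "majority") for p in majority]
print(synthetic, nearest_label(synthetic, labelled))  # (2.0, 0.0) majority
```

The new point is closer to the majority sample (distance 0.5) than to either minority parent (distance 2.0), which blurs the class boundary a classifier has to learn.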
8. Techniques that did not work
• MetaCost
• Forward Selection
• Logistic Regression
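For reference, forward selection is a greedy search: start with no attributes and keep adding the single attribute that most improves a validation score. The sketch assumes a caller-supplied `score` function (here a toy one with made-up weights and hypothetical attribute names); in practice each call would train and validate a model, so every round costs one model fit per remaining attribute.

```python
def forward_selection(features, score, min_gain=0.0):
    """Greedy forward selection.

    features: candidate attribute names
    score: callable taking a list of attributes and returning a validation
           score (higher is better)
    Stops when no remaining attribute improves the score by more than min_gain.
    """
    selected = []
    best = score(selected)
    remaining = list(features)
    while remaining:
        candidate_scores = {f: score(selected + [f]) for f in remaining}
        f_best = max(candidate_scores, key=candidate_scores.get)
        if candidate_scores[f_best] - best <= min_gain:
            break  # no attribute helps enough
        selected.append(f_best)
        best = candidate_scores[f_best]
        remaining.remove(f_best)
    return selected, best


# Toy scoring function with hypothetical per-attribute contributions
weights = {"tenure": 0.3, "friend_age": 0.2, "songs_listened": -0.1}


def toy_score(attrs):
    return 0.5 + sum(weights[a] for a in attrs)


chosen, best_score = forward_selection(list(weights), toy_score)
print(chosen)  # ['tenure', 'friend_age']
```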
10. Scope for Improvement
Filter Examples: effect of each filter condition on model performance
• Average Friend Age from 17 to 31: positive
• Tenure > 4: positive
• Songs Listened > 1: negative
• Age > 8 and < 70: negative
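Applying the two conditions with a positive effect is a simple row filter. The field names and records below are hypothetical, the bounds are assumed inclusive, and combining both conditions in one filter is an assumption; the slide reports each condition's effect on its own.

```python
# Hypothetical customer records; field names are illustrative, not from the original data set
records = [
    {"avg_friend_age": 25, "tenure": 6},
    {"avg_friend_age": 40, "tenure": 10},
    {"avg_friend_age": 20, "tenure": 2},
    {"avg_friend_age": 30, "tenure": 5},
]


def keep(r):
    # The two conditions reported as positive; inclusive bounds are an assumption
    return 17 <= r["avg_friend_age"] <= 31 and r["tenure"] > 4


filtered = [r for r in records if keep(r)]
print(len(filtered))  # 2
```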
11. Key Learnings: Warnings
• Remove oversampling: it introduces bias in the data
• Generate Calculated Attributes
• More complex ≠ higher F-measure
• Train your models on records that capture relatively high variability
– Use Filter Examples to select them
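Since F-measure is the harmonic mean of precision and recall, a model only scores well when both hold up, however complex it is. A small sketch for comparing models on it, using made-up confusion-matrix counts for two hypothetical models:

```python
def f_measure(tp, fp, fn, beta=1.0):
    """F-beta score from confusion-matrix counts (beta=1 gives the usual F1)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts for two models on the same validation set
model_a = f_measure(tp=40, fp=20, fn=20)  # precision 0.667, recall 0.667
model_b = f_measure(tp=45, fp=60, fn=15)  # precision 0.429, recall 0.75
print(round(model_a, 3), round(model_b, 3))  # 0.667 0.545
```

Model B finds more converters but pays for it with many false positives, so its F1 is lower.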