2. Dataset
• Dataset consists of userstats, one row per user (key = user_id)
Features:
• First login
• Last login
• Number of games
• Number of wins
• Number of losses
• Total number of correct answers
3. Assumption and new features
Assume the dataset was created on a fixed date (Cutoff = 20150622, i.e. 22 June 2015)
Calculated features:
• Life = Last login – First login
• Freq = Number of games / Life
• Windeep = Number of correct answers / Number of games
• Sinceplayed = Cutoff – Last login
• Stoppedplaying = IF Sinceplayed > 31 days THEN true ELSE false
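A minimal R sketch of the calculated features, assuming a data frame with hypothetical columns first_login and last_login (stored as YYYYMMDD numbers, like the cutoff), games and correct_answers. Life and Sinceplayed are measured in days. A user whose first and last login fall on the same day gets Life = 0, so Freq becomes Inf; how the original analysis handled that case is not stated.

```r
# Hypothetical column names: first_login, last_login (YYYYMMDD numbers),
# games, correct_answers. The cutoff value is the one given on the slide.
to_date <- function(x) as.Date(as.character(x), format = "%Y%m%d")

add_features <- function(df, cutoff = 20150622) {
  first <- to_date(df$first_login)
  last  <- to_date(df$last_login)
  df$life <- as.numeric(last - first)                  # days between first and last login
  df$freq <- df$games / df$life                        # games per day (Inf if life == 0)
  df$windeep <- df$correct_answers / df$games          # correct answers per game
  df$sinceplayed <- as.numeric(to_date(cutoff) - last) # days since last login
  df$stoppedplaying <- df$sinceplayed > 31             # TRUE if inactive for more than 31 days
  df
}

# Three made-up users, only to show the calculation
users <- data.frame(
  user_id = 1:3,
  first_login = c(20150101, 20150301, 20140601),
  last_login  = c(20150620, 20150401, 20150101),
  games = c(340, 31, 90),
  correct_answers = c(3000, 500, 1200)
)
add_features(users)
```

For these rows, Life is 170, 31 and 214 days and Sinceplayed is 2, 82 and 172 days, so only the first user counts as still playing.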
4. Decision Tree (rpart) on Stoppedplaying
Good players (or winners)
Correctanswers > 2450 => Stoppedplaying = FALSE (75%)
Bad players (or quitters)
Correctanswers < 938 => Stoppedplaying = TRUE (88%)
Mediocre players (or doubters)
938 <= Correctanswers <= 2450 => neither rule applies
Most interesting group!
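The three groups follow directly from the two splits. As a minimal sketch (the function name segment_players and the lowercase labels are illustrative, not part of the original analysis), a vector of correct-answer totals can be assigned to groups like this. Players at exactly 938 or 2450 land among the doubters, because neither strict inequality holds for them.

```r
# Thresholds are the two splits reported by the tree; labels follow the slide.
segment_players <- function(correctanswers) {
  ifelse(correctanswers > 2450, "winner",
         ifelse(correctanswers < 938, "quitter", "doubter"))
}

segment_players(c(500, 938, 1500, 2450, 3000))
# [1] "quitter" "doubter" "doubter" "doubter" "winner"
```

The same helper could be used to keep only the doubters before fitting a second tree, e.g. `subset(df, segment_players(correctanswers) == "doubter")`, where df is a hypothetical data frame with a correctanswers column.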
6. Decision tree for doubters
Add the frequency features to the inputs used to train the model.
Result: mediocre players are more likely to keep playing if they play
(and win) less frequently, or if they win a smaller share of the
games they play.
It is as if mediocre players quit once they are reminded of their
mediocrity often enough! (Different samples give 58% to 66%.)
7. Decision tree for doubters
• R code
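library(rpart)       # rpart(), rpart.control()
library(rpart.plot)  # prp()
# Classification tree for stopped, fitted on columns 2-16 of the doubters sample
fit <- rpart(stopped ~ freq + winfreq + windeep + howgood + won + correctanswers,
             data = mysample[, 2:16],
             method = "class",
             control = rpart.control(minsplit = 30, cp = 0.01))
# Plot: variable names up to 20 characters (varlen = 20), all nodes labelled
# with separate left/right split labels (type = 4), per-class probabilities (extra = 4)
prp(fit, varlen = 20, type = 4, extra = 4)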
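Because mysample is not included here, the following is a self-contained sketch of the same rpart call on synthetic data. The data frame doubters and all its values are made up, with stopping deliberately made more likely for frequent players to echo the direction described for doubters; nothing in it reproduces the real results. winfreq, howgood and won are left out because the slides do not define them, and the plot is skipped so that only the rpart package is needed.

```r
library(rpart)

set.seed(42)
n <- 300
# Synthetic stand-in for mysample: random values, not the real userstats data
doubters <- data.frame(
  freq = runif(n, 0, 5),
  windeep = runif(n, 2, 12),
  correctanswers = sample(938:2450, n, replace = TRUE)
)
# Assumed relationship: the higher freq, the more likely the user stopped
doubters$stopped <- factor(runif(n) < plogis(doubters$freq - 2.5))

fit <- rpart(stopped ~ freq + windeep + correctanswers,
             data = doubters, method = "class",
             control = rpart.control(minsplit = 30, cp = 0.01))
print(fit)
# Predicted probability of each class (FALSE / TRUE) for the first users
head(predict(fit, type = "prob"))
```

minsplit = 30 stops rpart from attempting a split in nodes with fewer than 30 observations, and cp = 0.01 discards splits that do not improve the overall fit by at least that fraction.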