This document summarizes research on predicting when non-paying video game users will become paying users (convert) through the use of ensemble learning techniques. It finds that survival analysis methods like Conditional Inference Survival Ensembles and Random Survival Forests outperform traditional Cox regression models at predicting the time until a user's first purchase. These ensemble methods provide more accurate predictions of conversion lifetime, level reached, and playtime compared to Cox regression. The research aims to help personalize the game experience by identifying players likely to convert in order to better engage and retain them.
ACM FDG 2019, SLO, CA, USA, From Non-Paying to Premium: Predicting User Conversion in Video Games with Ensemble Learning
1. From Non-Paying to Premium:
Predicting User Conversion
in Video Games with
Ensemble Learning
Anna Guitart, Shi Hui Tan, Ana Fernández del Río, Pei Pei Chen and África Periáñez
Yokozuna Data, a Keywords Studio
The 2019 Workshop on User Experience of Artificial Intelligence in Games
FDG 2019 San Luis Obispo, CA | 26th August 2019
2. Ensemble learning, time-series forecasting,
sequential analysis methods and validation techniques.
MSc Theoretical Physics | MSc Artificial Intelligence
Co-author of 10 peer-reviewed articles in Game Data Science
Anna Guitart, MSc
Data Scientist
3. WHAT IS
YOKOZUNA DATA
Founded in 2015 inside Silicon Studio, joined Keywords Studios in 2018
to push back the frontiers of General Behavioral Machine Learning
and to revolutionize the video-game industry: Personalized games
6. BIG DATA
Utilizing the latest
techniques in
big data processing and
cloud computing,
YOKOZUNA data scales
to datasets of any size.
7. From Non-Paying to Premium:
Predicting User Conversion in
Video Games with Ensemble Learning
CHALLENGE: Prediction of user conversion
When are players going to become paying users?
8. Survival Analysis
Time-to-event modelling
“Censoring” (dataset with incomplete information)
Classical methods, like regressions, are appropriate only
when all individuals have experienced the “event”
The survival analysis methods used here do not assume any
particular statistical distribution: they are fitted from the data
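To illustrate how censoring is handled without assuming a distribution, here is a minimal sketch of the Kaplan-Meier estimator in plain Python. The talk does not show this code; the function name and the toy data are hypothetical. Players who have not purchased by the end of the observation window still count in the risk set until they drop out.

```python
from collections import Counter

def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each distinct event time.

    durations: days until first purchase, or until observation ended.
    observed:  True if the purchase happened, False if censored.
    """
    events = Counter(t for t, e in zip(durations, observed) if e)
    exits = Counter(durations)  # every player leaves the risk set at their own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Probability of "surviving" (not converting) past t, given at risk at t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]  # events and censored players both leave
    return curve

# Hypothetical toy data: 7 players, 3 of them censored.
days = [2, 3, 3, 5, 8, 8, 10]
purchased = [True, False, True, True, False, True, False]
km_curve = kaplan_meier(days, purchased)
for t, s in km_curve:
    print(f"day {t}: P(not yet converted) = {s:.3f}")
```

Censored players never trigger a drop in the curve, but they enlarge the denominator while they are still being observed. A regression on observed purchase times alone would discard that information.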
9. EVENT
Churn prediction (1, 2, 3): when a player leaves the game
difficult to determine the moment of the event
User conversion: non-paying user to paying user
event happens when the user first makes a purchase
time-to-first-purchase in terms of days (1, 2), level (3), played hours (3)
1) Rothenbuehler J. et al., 2015. Hidden Markov models for churn prediction.
2) Periáñez A. et al., 2016. Churn prediction in mobile social games: towards a complete assessment using survival ensembles.
3) Bertens P. et al., 2016. Games and big data: a scalable multi-dimensional churn prediction model.
11. COX REGRESSION (4)
Cox regression (4), also called proportional hazards regression
Semi-parametric model.
Fixed link between output and covariates (linear-exponential relation):
assumption of proportional hazards
(the ratio of the hazard functions of any two individuals is constant over time)
h0 - baseline hazard function (failure rate)
xi - covariates
β - regression coefficients
4) Cox D.R., 1972. Regression Models and Life-Tables.
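The symbols above correspond to the standard Cox model, h(t | x) = h0(t) * exp(β · x). The following sketch evaluates that formula and checks the proportional hazards property numerically. The baseline hazard, the coefficients and the two players are made-up values for illustration, not fitted results from the talk.

```python
import math

def cox_hazard(baseline_hazard, beta, x, t):
    """Standard Cox form: h(t | x) = h0(t) * exp(beta . x)."""
    linear_predictor = sum(b * xi for b, xi in zip(beta, x))
    return baseline_hazard(t) * math.exp(linear_predictor)

def h0(t):
    # Hypothetical baseline hazard; in practice it is estimated from the data.
    return 0.01 + 0.002 * t

beta = [0.8, -0.5]      # hypothetical regression coefficients
player_a = [1.0, 2.0]   # hypothetical covariates of two players
player_b = [0.0, 1.0]

# The hazard ratio between two players does not depend on t.
ratios = [
    cox_hazard(h0, beta, player_a, t) / cox_hazard(h0, beta, player_b, t)
    for t in (1, 10, 100)
]
print(ratios)
```

The baseline hazard cancels in the ratio, so only exp(β · (x_a − x_b)) remains. This fixed link is the restriction that the tree-based ensembles in the next slides do not impose.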
12. SURVIVAL TREE
Split the feature space recursively
Based on a survival statistical criterion, the root
node is divided into two daughter nodes
Maximize the survival difference between nodes
A single tree produces unstable predictions
SURVIVAL ENSEMBLES
Make use of hundreds of trees
Outstanding predictions
Robust information about variable importance
Fairly robust against overfitting
Less biased approach
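As a rough sketch of how one node of a survival tree could be split, the code below scans the thresholds of a single feature and keeps the one that maximizes the survival difference between the two daughter nodes. The slides do not say which statistic is used at this step. The two-sample log-rank statistic is a common choice and is an assumption here, as are the function names and the toy data.

```python
def logrank_statistic(times, events, in_left):
    """Chi-square log-rank statistic comparing the left and right groups."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for u in times if u >= t)                        # at risk, both groups
        n_left = sum(1 for u, g in zip(times, in_left) if u >= t and g)
        d = sum(1 for u, e in zip(times, events) if u == t and e)  # events at t
        d_left = sum(1 for u, e, g in zip(times, events, in_left)
                     if u == t and e and g)
        observed_minus_expected += d_left - d * n_left / n
        if n > 1:
            variance += d * (n_left / n) * (1 - n_left / n) * (n - d) / (n - 1)
    return observed_minus_expected ** 2 / variance if variance > 0 else 0.0

def best_split(values, times, events):
    """Try every threshold of one feature; keep the largest survival difference."""
    best_threshold, best_stat = None, 0.0
    for threshold in sorted(set(values))[:-1]:
        in_left = [v <= threshold for v in values]
        stat = logrank_statistic(times, events, in_left)
        if stat > best_stat:
            best_threshold, best_stat = threshold, stat
    return best_threshold, best_stat

# Hypothetical toy data: daily sessions vs. days until first purchase.
sessions = [1, 2, 3, 7, 8, 9]
days = [20, 25, 30, 2, 3, 4]
purchased = [True, True, False, True, True, True]
threshold, stat = best_split(sessions, days, purchased)
print(threshold, round(stat, 3))
```

A full tree would repeat this search recursively in each daughter node and over many features. An ensemble grows hundreds of such trees on resampled data, which is what smooths out the instability of any single tree.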
13. CONDITIONAL INFERENCE SURVIVAL ENSEMBLES (5)
(Conditional Inference Forest)
Fully non-parametric tree-based method.
It uses a weighted Kaplan-Meier estimate as a splitting criterion.
Two-step algorithm (conditional inference trees):
1) the optimal split variable is selected: association between covariates and response
2) the optimal split point is determined by comparing two-sample linear statistics for all
possible partitions of the split variable
5) Hothorn T. et al., 2006. Unbiased recursive partitioning: A conditional inference framework.
14. RANDOM SURVIVAL FOREST (RSF) (6)
RSF is based on the original random forest algorithm (7).
Ensemble of decision trees trained using bootstrap samples, fully non-parametric.
RSF favors variables with many possible split points over variables with fewer
- Selection of the split variable and the split point is performed at the same step.
- Selection of the splitting variable at each node at random.
- The split point is the one that maximizes a predefined splitting criterion (Gini impurity measure).
The ensemble is constructed using tree-based Nelson-Aalen estimators.
6) Ishwaran H. et al., 2008. Random Survival Forests.
7) Breiman L., 2001. Random forests. Machine learning.
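The Nelson-Aalen estimator of the cumulative hazard is H(t) = sum over event times t_j <= t of d_j / n_j, where d_j is the number of events at t_j and n_j the number of individuals still at risk. The sketch below computes it for the terminal node in which a player lands in each tree and averages across trees, which is how the RSF ensemble is described in (6). Tree growing and bootstrapping are omitted, and the terminal-node samples are hypothetical.

```python
def nelson_aalen(times, events):
    """Cumulative hazard H(t) = sum_{t_j <= t} d_j / n_j, as {time: H(time)}."""
    cumulative = 0.0
    curve = {}
    for t in sorted(set(times)):
        n = sum(1 for u in times if u >= t)                        # still at risk
        d = sum(1 for u, e in zip(times, events) if u == t and e)  # events at t
        cumulative += d / n
        curve[t] = cumulative
    return curve

def step_value(curve, t):
    """Evaluate the right-continuous step function at t (0 before the first time)."""
    earlier = [u for u in curve if u <= t]
    return curve[max(earlier)] if earlier else 0.0

def ensemble_cumulative_hazard(terminal_nodes, t):
    """Average the per-tree Nelson-Aalen estimates at time t."""
    curves = [nelson_aalen(times, events) for times, events in terminal_nodes]
    return sum(step_value(c, t) for c in curves) / len(curves)

# Hypothetical terminal nodes reached by one player in three trees:
# (days until purchase or censoring, purchase observed?)
terminal_nodes = [
    ([2, 4, 6], [True, True, False]),
    ([3, 4, 9], [True, False, True]),
    ([2, 5, 5], [True, True, True]),
]
h5 = ensemble_cumulative_hazard(terminal_nodes, 5)
print(round(h5, 4))
```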
15. Random Survival Forests with competing risks (8)
Extension of the random survival forest considering competing risks.
Reasons for not becoming a paying user (PU):
1) lack of interest in purchasing
2) churning (leave the game)
8) Ishwaran H. et al., 2014. Random survival forests for competing risks.
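With competing risks, a player can leave the risk set either by purchasing or by churning, and the quantity of interest becomes the cumulative incidence of each cause. Below is a minimal sketch of the non-parametric cumulative incidence estimate (Aalen-Johansen form) on a single sample. The cause coding (0 = censored, 1 = purchase, 2 = churn) and the data are assumptions for illustration. The forest in (8) builds an ensemble on top of estimates of this kind, which is not shown here.

```python
def cumulative_incidence(times, causes, cause):
    """CIF_k(t) = sum_{t_j <= t} S(t_j-) * d_kj / n_j, as [(t, CIF_k(t))].

    causes: 0 = censored, 1 = purchase, 2 = churn (hypothetical coding).
    S is the Kaplan-Meier survival for "no event of any cause yet".
    """
    survival = 1.0
    incidence = 0.0
    curve = []
    for t in sorted(set(times)):
        n = sum(1 for u in times if u >= t)
        d_all = sum(1 for u, c in zip(times, causes) if u == t and c != 0)
        d_k = sum(1 for u, c in zip(times, causes) if u == t and c == cause)
        incidence += survival * d_k / n   # uses survival just before t
        survival *= 1.0 - d_all / n       # any cause removes the player
        curve.append((t, incidence))
    return curve

# Hypothetical toy data for six players.
days = [1, 2, 3, 4, 5, 6]
causes = [2, 1, 0, 1, 2, 0]
purchase_cif = cumulative_incidence(days, causes, cause=1)
churn_cif = cumulative_incidence(days, causes, cause=2)
print(purchase_cif[-1], churn_cif[-1])
```

Treating churned players as merely censored would overstate the probability of conversion, because those players can no longer purchase. The cumulative incidence corrects for this by weighting each purchase by the probability of still being in the game.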
17. DATASETS
January 2015 - February 2017: 30,000 users, 5.32% PU
June 2017 - May 2018: 10,000 users, 5.30% PU
RPG free-to-play games
18. DATASETS
FEATURES
Daily records of playtime, actions, sessions, level-up.
Performing statistical operations (average, etc.).
RESPONSE VARIABLES
Lifetime: Number of days from the user’s registration date until the first purchase.
Level: Latest game level reached by the player when purchasing for the first time.
Playtime: How many seconds the user played the game until first purchase.
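The three response variables can be derived from daily records roughly as follows. This is a sketch under assumptions: the record schema (keys "day", "playtime_s", "level", "purchases") and the function name are hypothetical, and with daily granularity the whole playtime of the purchase day is counted. Players without a purchase are returned as censored observations.

```python
from datetime import date

def conversion_responses(registration, daily_records):
    """Compute lifetime, level and playtime up to the first purchase.

    daily_records: list of dicts sorted by "day" (hypothetical schema).
    """
    playtime = 0
    level = 0
    last_day = registration
    for record in daily_records:
        playtime += record["playtime_s"]
        level = max(level, record["level"])
        last_day = record["day"]
        if record["purchases"] > 0:
            break
    converted = bool(daily_records) and daily_records[-1] is not None and any(
        r["purchases"] > 0 for r in daily_records
    )
    return {
        "lifetime_days": (last_day - registration).days,
        "level": level,
        "playtime_s": playtime,
        "converted": converted,  # False means the observation is censored
    }

# Hypothetical player who first purchases on the fourth calendar day.
records = [
    {"day": date(2017, 6, 1), "playtime_s": 1200, "level": 3, "purchases": 0},
    {"day": date(2017, 6, 2), "playtime_s": 1800, "level": 5, "purchases": 0},
    {"day": date(2017, 6, 4), "playtime_s": 600, "level": 6, "purchases": 1},
    {"day": date(2017, 6, 5), "playtime_s": 900, "level": 7, "purchases": 0},
]
responses = conversion_responses(date(2017, 6, 1), records)
print(responses)
```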
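23. Conditional Inference Survival Ensembles, Random Survival Forest, Cox Regression
[Figure: scatter plots of observed vs. predicted “times” of occurrence of the event (becoming a PU) for Age of Ishtaria. Panels: Conditional Inference Survival Ensembles, Random Survival Forest, Cox Regression. Scale note on the slide: logarithm.]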
25. SUMMARY AND CONCLUSIONS
Survival analysis is a suitable framework to study user
conversion in video games.
Ensemble models outperform the classical Cox regression model.
- RSF method yields slightly better predictions in terms of lifetime and
level, but critically fails at predicting playtime.
- RSF + competing risks does not have a clear positive impact.
- Conditional inference survival ensembles are the most viable model in
controlled production settings.
26. SUMMARY AND CONCLUSIONS
Stepping towards personalization of the game experience:
Target players individually, not only based on current or past actions but
also on their future expected behavior.
Actions can be taken on players that have potential to become PUs
- to ensure they remain long enough in the game
- to accelerate conversion
Future extensions:
- Apply the same approach to identify VIP players.
- Detect conversions between different types of purchasing behavior.