From Non-Paying to Premium:
Predicting User Conversion
in Video Games with
Ensemble Learning
Anna Guitart, Shi Hui Tan, Ana Fernández del Río, Pei Pei Chen and África Periáñez
Yokozuna Data, a Keywords Studio
The 2019 Workshop on User Experience of Artificial Intelligence in Games
ACM FDG 2019, San Luis Obispo, CA | 26th August 2019
Ensemble learning, time-series forecasting,
sequential analysis methods and validation techniques.
MSc Theoretical Physics | MSc Artificial Intelligence
Co-author of 10 peer-reviewed articles in Game Data Science
Anna Guitart, MSc
Data Scientist
WHAT IS YOKOZUNA DATA
Founded in 2015 inside Silicon Studio, Yokozuna Data joined Keywords Studios in 2018 to push back the frontiers of general behavioral machine learning and to revolutionize the video game industry through personalized games.
OPERATIONAL PLAYER BEHAVIOURAL PREDICTION
The first recommendation system and prediction platform that utilizes next-gen AI algorithms to move game development into the future.
DEEP LEARNING AND ENSEMBLE LEARNING
Deep learning and ensemble learning provide individual behavioral predictions, simulations, and personalized challenges and rewards.
BIG DATA
Using the latest techniques in big data processing and cloud computing, Yokozuna Data scales to datasets of any size.
CHALLENGE: Prediction of user conversion
When are players going to become paying users (PUs)?
Survival Analysis
Time-to-event modelling.
Handles “censoring”: incomplete information, e.g., users who have not yet made a purchase when the observation period ends.
Classical methods, such as regression, are only appropriate when every individual has experienced the “event”.
Non-parametric survival analysis methods do not assume any particular statistical distribution: the survival curve is estimated from the data.
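A minimal Kaplan-Meier sketch in plain Python makes the censoring point concrete: censored users are neither discarded nor counted as converters; they simply stay in the risk set until they drop out of the observation. The function name and the input format (parallel lists of durations and 0/1 flags meaning "first purchase observed") are assumptions for illustration, not part of the original study.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier estimate of S(t), the probability of not having converted by t.

    times  : observed durations (time to first purchase, or to censoring)
    events : 1 if the first purchase was observed, 0 if the user is censored
    Returns a list of (t, S(t)) pairs at each distinct event time.
    """
    conversions = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)      # users leaving the risk set at each time (events + censored)
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = conversions.get(t, 0)
        if d:
            survival *= 1.0 - d / at_risk   # users censored at t still count as at risk at t
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve
```

For example, `kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])` gives S(t) of 0.8, 0.6 and 0.3 at t = 1, 2 and 3 (up to floating-point rounding); the user censored at t = 2 still counts as at risk at that time, which is the usual convention.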
EVENT
Churn prediction [1,2,3]: the event is a player leaving the game, whose exact moment is difficult to determine.
User conversion: from non-paying user to paying user; the event happens when the user makes their first purchase.
Time to first purchase, measured in days [1,2], levels [3] or hours played [3].
1) Rothenbuehler J. et al., 2015. Hidden Markov models for churn prediction.
2) Periáñez A. et al., 2016. Churn prediction in mobile social games: towards a complete assessment using survival ensembles.
3) Bertens P. et al., 2016. Games and big data: a scalable multi-dimensional churn prediction model.
MODELS
COX REGRESSION [4] (proportional hazards regression)
Semi-parametric model.
Fixed link between output and covariates (exponential of a linear predictor):
$h(t \mid x_i) = h_0(t)\,\exp(\beta^\top x_i)$
Assumption of proportional hazards: the ratio of the hazard functions of any two individuals is constant over time.
h_0(t): baseline hazard function (failure rate when all covariates are zero)
x_i: covariates of individual i
β: regression coefficients
4) Cox D.R., 1972. Regression models and life-tables.
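The sketch below fits a one-covariate Cox model by maximizing the partial likelihood with Newton-Raphson. It shows why the model is semi-parametric: the baseline hazard h_0(t) cancels out of the partial likelihood, so only β is estimated. It assumes a single numeric covariate and treats tied times with the Breslow convention; all function names are hypothetical, and in practice a library such as R's `survival::coxph` or Python's `lifelines.CoxPHFitter` would be used instead.

```python
import math


def _risk_set_sums(beta, times, x, t):
    """Sums over the risk set {j : times[j] >= t}, weighted by exp(beta * x_j)."""
    s0 = s1 = s2 = 0.0
    for tj, xj in zip(times, x):
        if tj >= t:
            w = math.exp(beta * xj)
            s0 += w
            s1 += w * xj
            s2 += w * xj * xj
    return s0, s1, s2


def partial_loglik(beta, times, events, x):
    """Cox partial log-likelihood for one covariate; h_0(t) does not appear."""
    ll = 0.0
    for ti, ei, xi in zip(times, events, x):
        if ei:
            s0, _, _ = _risk_set_sums(beta, times, x, ti)
            ll += beta * xi - math.log(s0)
    return ll


def score_and_information(beta, times, events, x):
    """Gradient of the partial log-likelihood and its negated second derivative."""
    score = information = 0.0
    for ti, ei, xi in zip(times, events, x):
        if ei:
            s0, s1, s2 = _risk_set_sums(beta, times, x, ti)
            mean = s1 / s0
            score += xi - mean
            information += s2 / s0 - mean * mean
    return score, information


def fit_cox_1d(times, events, x, tol=1e-10, max_iter=50):
    """Maximize the partial likelihood with Newton-Raphson; returns beta_hat."""
    beta = 0.0
    for _ in range(max_iter):
        score, information = score_and_information(beta, times, events, x)
        step = score / information
        beta += step
        if abs(step) < tol:
            break
    return beta
```

On toy data, `math.exp(fit_cox_1d(times, events, x))` is the estimated hazard ratio between users with x = 1 and users with x = 0. It is the same at every t, which is exactly the proportional hazards assumption.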
SURVIVAL TREE
Recursively splits the feature space.
Using a survival splitting criterion (e.g., the log-rank statistic), each node, starting from the root, is divided into two daughter nodes.
Splits are chosen to maximize the survival difference between the daughter nodes.
A single tree produces unstable predictions.
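One common survival splitting criterion is the two-sample log-rank statistic. The sketch below scores every threshold of a single covariate and keeps the one that maximizes the standardized statistic |Z|, i.e., the split with the largest survival difference between the two daughter nodes. The names and the `min_node` parameter are hypothetical; a full survival tree would repeat this search over all covariates and recurse into each daughter node.

```python
import math


def logrank_z(times, events, in_left):
    """Standardized two-sample log-rank statistic |Z| for one candidate split.

    in_left[i] is 1 if user i goes to the left daughter node, 0 otherwise.
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n = n_left = d = d_left = 0
        for ti, ei, li in zip(times, events, in_left):
            if ti >= t:                       # still at risk at time t
                n += 1
                n_left += li
                if ti == t and ei:            # event (first purchase) at time t
                    d += 1
                    d_left += li
        observed_minus_expected += d_left - d * n_left / n
        if n > 1:
            variance += d * (n_left / n) * (1 - n_left / n) * (n - d) / (n - 1)
    return abs(observed_minus_expected) / math.sqrt(variance) if variance > 0 else 0.0


def best_split(times, events, feature, min_node=2):
    """Return (threshold, |Z|) for the split `feature <= threshold` with the largest statistic."""
    best_threshold, best_z = None, 0.0
    for threshold in sorted(set(feature))[:-1]:
        in_left = [1 if v <= threshold else 0 for v in feature]
        n_left = sum(in_left)
        if n_left < min_node or len(feature) - n_left < min_node:
            continue                          # keep both daughter nodes reasonably large
        z = logrank_z(times, events, in_left)
        if z > best_z:
            best_threshold, best_z = threshold, z
    return best_threshold, best_z
```

Searching every threshold of every variable in this way is what makes exhaustive tree growing favor covariates with many distinct values, the bias that the conditional inference approach below addresses.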
SURVIVAL ENSEMBLES
Combine hundreds of trees
More accurate and more stable predictions than a single tree
Robust information about variable importance
Relatively robust to overfitting
Less biased approach
CONDITIONAL INFERENCE SURVIVAL ENSEMBLES [5]
(Conditional Inference Forest)
Fully non-parametric tree-based method.
It uses a weighted Kaplan-Meier estimate as a splitting criterion.
Two-step algorithm (conditional inference trees):
1) the split variable is selected by testing the association between each covariate and the response, and choosing the most strongly associated covariate
2) the split point is determined by comparing two-sample linear statistics for all possible partitions of the selected variable
5) Hothorn T. et al., 2006. Unbiased recursive partitioning: A conditional inference framework.
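Step 1 can be sketched as a permutation test. The response is transformed into log-rank scores (each user's event indicator minus the Nelson-Aalen cumulative hazard at their observed time), each covariate's association with those scores is measured by a centered linear statistic, and the covariate with the smallest Bonferroni-adjusted p-value is selected; if none is significant, the node is not split. Because this choice does not depend on how many split points a covariate offers, it avoids the selection bias of exhaustive search. This is a simplified Monte Carlo version under stated assumptions (the names, `alpha`, `n_perm` and the seed are illustrative), not the exact statistic used in Hothorn et al.'s implementation; step 2 would then search split points of the selected variable only.

```python
import random


def logrank_scores(times, events):
    """a_i = delta_i - H(t_i), where H is the Nelson-Aalen cumulative hazard of the node."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    hazard_at, cumulative = {}, 0.0
    for t in event_times:
        at_risk = sum(1 for tj in times if tj >= t)
        deaths = sum(1 for tj, ej in zip(times, events) if tj == t and ej)
        cumulative += deaths / at_risk
        hazard_at[t] = cumulative
    scores = []
    for ti, ei in zip(times, events):
        past = [t for t in event_times if t <= ti]
        scores.append(ei - (hazard_at[past[-1]] if past else 0.0))
    return scores


def permutation_pvalue(x, scores, n_perm=999, seed=0):
    """Monte Carlo p-value of the centered linear statistic |sum_i (x_i - mean(x)) * a_i|."""
    rng = random.Random(seed)
    x_mean = sum(x) / len(x)
    centered = [xi - x_mean for xi in x]
    observed = abs(sum(c * a for c, a in zip(centered, scores)))
    permuted = list(scores)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(permuted)
        if abs(sum(c * a for c, a in zip(centered, permuted))) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def select_split_variable(columns, times, events, alpha=0.05, n_perm=999, seed=0):
    """Step 1: choose the covariate most associated with the survival response.

    columns: one list of values per covariate. Returns (index, adjusted p-value);
    index is None when no covariate is significant, i.e. the node is not split.
    """
    scores = logrank_scores(times, events)
    p_values = [permutation_pvalue(col, scores, n_perm, seed) for col in columns]
    best = min(range(len(columns)), key=lambda k: p_values[k])
    adjusted = min(1.0, p_values[best] * len(columns))   # Bonferroni adjustment
    return (best if adjusted <= alpha else None), adjusted
```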
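RANDOM SURVIVAL FOREST (RSF) [6]
RSF is based on the original random forest algorithm [7].
Ensemble of decision trees trained on bootstrap samples; fully non-parametric.
RSF favors variables with many possible split points over variables with fewer:
- The split variable and the split point are selected in the same step.
- At each node, a random subset of candidate splitting variables is drawn.
- The chosen split point is the one that maximizes a predefined survival splitting criterion (e.g., the log-rank statistic).
The ensemble cumulative hazard is the average of the tree-based Nelson-Aalen estimators:
$H_e(t \mid x) = \frac{1}{B} \sum_{b=1}^{B} H_b(t \mid x)$, where $H_b(t \mid x) = \sum_{t_{l,h} \le t} d_{l,h} / Y_{l,h}$ is the Nelson-Aalen estimate in the terminal node $h$ of tree $b$ that contains $x$, with $d_{l,h}$ events and $Y_{l,h}$ individuals at risk at time $t_{l,h}$.
6) Ishwaran H. et al., 2008. Random survival forests.
7) Breiman L., 2001. Random forests. Machine Learning.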
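The ensemble Nelson-Aalen estimator can be illustrated with a heavily simplified forest. Each tree is grown on a bootstrap sample but has depth one, splits a randomly chosen variable at its bootstrap median (rather than searching split points with a log-rank rule, as RSF does), and stores a Nelson-Aalen estimate in each terminal node; the prediction H_e(t | x) averages the terminal-node estimates that x falls into. All names and these simplifications are assumptions for illustration; the R package randomForestSRC provides a full RSF implementation.

```python
import random
from bisect import bisect_right


def nelson_aalen(times, events):
    """Nelson-Aalen cumulative hazard as step-function knots: (event_times, H)."""
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    knots, hazards, h, i = [], [], 0.0, 0
    while i < len(pairs):
        t, d, leaving = pairs[i][0], 0, 0
        while i < len(pairs) and pairs[i][0] == t:
            d += pairs[i][1]
            leaving += 1
            i += 1
        if d:
            h += d / at_risk                  # d events among at_risk users at time t
            knots.append(t)
            hazards.append(h)
        at_risk -= leaving
    return knots, hazards


def chf_at(curve, t):
    """Evaluate a step-function cumulative hazard at time t (0 before the first event)."""
    knots, hazards = curve
    k = bisect_right(knots, t)
    return hazards[k - 1] if k else 0.0


def fit_forest(X, times, events, n_trees=100, seed=0):
    """Bagged depth-one survival trees (simplified stand-in for RSF)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]       # bootstrap sample
        var = rng.randrange(p)                             # random split variable
        vals = sorted(X[i][var] for i in boot)
        thr = vals[len(vals) // 2]                         # median split point
        left = [i for i in boot if X[i][var] < thr]
        right = [i for i in boot if X[i][var] >= thr]
        if not left or not right:                          # degenerate split: single node
            left = right = boot
        leaves = [nelson_aalen([times[i] for i in node], [events[i] for i in node])
                  for node in (left, right)]
        forest.append((var, thr, leaves))
    return forest


def ensemble_chf(forest, x, t):
    """H_e(t | x): average of the terminal-node Nelson-Aalen estimates over all trees."""
    total = 0.0
    for var, thr, (left, right) in forest:
        total += chf_at(left if x[var] < thr else right, t)
    return total / len(forest)
```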
Random Survival Forests with competing risks [8]
Extension of the random survival forest that accounts for competing risks.
Reasons for not becoming a PU:
1) lack of interest in purchasing
2) churning (leaving the game)
8) Ishwaran H. et al., 2014. Random survival forests for competing risks.
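With competing risks, the probability of becoming a PU by time t is given by the cumulative incidence function rather than by 1 minus a Kaplan-Meier curve in which churners are treated as censored; the latter would overstate conversion, since a player who has churned can no longer convert. A minimal Aalen-Johansen sketch, assuming a hypothetical encoding of 0 = censored, 1 = first purchase and 2 = churn:

```python
def cumulative_incidence(times, causes, cause):
    """Aalen-Johansen cumulative incidence F_k(t) for one cause under competing risks.

    causes: 0 = censored, 1 = first purchase (becomes a PU), 2 = churn (competing event).
    Returns (t, F_k(t)) pairs at every time where an event of any cause occurs.
    """
    at_risk = len(times)
    event_free = 1.0          # all-cause Kaplan-Meier survival just before t
    incidence = 0.0
    curve = []
    for t in sorted(set(times)):
        at_t = [c for ti, c in zip(times, causes) if ti == t]
        d_any = sum(1 for c in at_t if c != 0)
        d_cause = sum(1 for c in at_t if c == cause)
        if d_any:
            incidence += event_free * d_cause / at_risk
            event_free *= 1.0 - d_any / at_risk
            curve.append((t, incidence))
        at_risk -= len(at_t)
    return curve
```

A quick sanity check: the cumulative incidences of all causes add up to 1 minus the all-cause Kaplan-Meier survival at every time point.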
DATASETS
Two free-to-play RPG games:
- January 2015 to February 2017: 30,000 users, 5.32% PU
- June 2017 to May 2018: 10,000 users, 5.30% PU
DATASETS
FEATURES
Daily records of playtime, actions, sessions and level-ups, aggregated with statistical operations (averages, etc.).
RESPONSE VARIABLES
Lifetime: number of days from the user’s registration date to the first purchase.
Level: game level reached by the player at the time of the first purchase.
Playtime: number of seconds the user played before the first purchase.
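As an illustration of how the features and the three response variables could be derived from daily records, here is a sketch built on a hypothetical `DailyRecord` layout. It assumes that features are averaged over an early observation window (to avoid using information from after the purchase), that users who never purchase are censored at their last observed day, and that, at daily granularity, the whole purchase day counts toward playtime. The slides do not describe the study's exact preprocessing.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class DailyRecord:
    day: int            # days since registration (0 = registration day)
    playtime_s: int     # seconds played that day
    actions: int
    sessions: int
    level: int          # level reached by the end of that day
    purchased: bool     # True if the user made a purchase that day


def build_sample(records, window_days=7):
    """Return (features, responses, event) for one user.

    features  : aggregates over the first `window_days` days after registration
    responses : lifetime (days), level and playtime (s) until the first purchase,
                or until the last observed day if the user never purchased (censored)
    event     : True if a first purchase was observed, False if censored
    """
    records = sorted(records, key=lambda r: r.day)
    window = [r for r in records if r.day < window_days] or records[:1]
    features = {
        "avg_playtime_s": mean(r.playtime_s for r in window),
        "avg_actions": mean(r.actions for r in window),
        "avg_sessions": mean(r.sessions for r in window),
        "max_level": max(r.level for r in window),
    }
    first = next((r for r in records if r.purchased), None)
    history = records if first is None else [r for r in records if r.day <= first.day]
    end = history[-1]
    responses = {
        "lifetime_days": end.day,
        "level": end.level,
        "playtime_s": sum(r.playtime_s for r in history),
    }
    return features, responses, first is not None
```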
[Figure: Cumulative incidence function, probability of becoming a PU (all players); panels: Age of Ishtaria, Grand Sphere]
[Figure: Cumulative incidence function, probability of becoming a PU (paying users only); panels: Age of Ishtaria, Grand Sphere]
RESULTS
[Figure: Scatter plots of observed vs. predicted times of occurrence of the event (becoming a PU), Age of Ishtaria; panels: Conditional Inference Survival Ensembles, Random Survival Forest, Cox Regression]
[Figure: Scatter plots of observed vs. predicted times of occurrence of the event (becoming a PU), logarithmic scale, Age of Ishtaria; panels: Conditional Inference Survival Ensembles, Random Survival Forest, Cox Regression]
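The slides compare observed and predicted times visually. One common way to summarize such predictions numerically while respecting censoring is Harrell's concordance index; the sketch below is offered as an example of such a metric, not as the evaluation used in this study. It expects predicted times (a later prediction means later conversion); for risk scores, the comparison would be inverted.

```python
def concordance_index(times, events, predicted_times):
    """Harrell's C: share of comparable pairs whose predicted order matches the observed one.

    A pair (i, j) is comparable when times[i] < times[j] and user i's event was observed
    (not censored). Ties in the prediction count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if predicted_times[i] < predicted_times[j]:
                    concordant += 1.0
                elif predicted_times[i] == predicted_times[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")
```

A value of 1.0 means a perfect ranking of conversion times, 0.5 is no better than a constant prediction, and 0.0 is a perfectly reversed ranking.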
SUMMARY AND CONCLUSIONS
Survival analysis is a suitable framework for studying user conversion in video games.
Ensemble models outperform the classical Cox regression model.
- RSF yields slightly better predictions of lifetime and level, but fails critically at predicting playtime.
- Adding competing risks to RSF has no clear positive impact.
- Conditional inference survival ensembles are the most viable model for controlled production settings.
SUMMARY AND CONCLUSIONS
Stepping towards personalization of the game experience:
Target players individually, based not only on their current or past actions but also on their expected future behavior.
Actions can be taken on players who have the potential to become PUs:
- to ensure they remain in the game long enough
- to accelerate conversion
Future extensions:
- Apply the same approach to identify VIP players.
- Detect conversions between different types of purchasing behavior.
THANK YOU!
aguitart@yokozunadata.com
linkedin.com/company/yokozunadata
www.yokozunadata.com
@yokozunadata
