Understanding Game Analytics & Behavioral Clustering for Games
 Analytics is the core process of Business Intelligence
 Analytics is the process of discovering and communicating
patterns in data to solve problems in business
 Supporting decision management
 Driving action
 Improving performance
 Research & Development
 - Or for purely frivolous and artistic reasons!
 Game analytics = the subdomain of analytics applied to game
development and game research
 The game as a product: user experience, behavior, revenue,
system …
 The game as a project: the process of developing the game
 Evidence-driven support for decision making
 Strategic GA: the global view on how a game should evolve
based on analysis of user behavior and the business model.
 Defining a monetization model, scoping DLC
 Tactical GA: informs game design in the short term
 A/B test of a new game feature, prediction, profiling
 Operational GA: Analysis and evaluation of the immediate,
current situation in the game.
 Removing a bug, adapting game to user behavior in real-time,
reacting to cheating/piracy
User data
By far the most common data source!
 User metrics
 Metrics related to the users, viewing them as either
customers (revenue sources) or players (behaviors)
 Metrics related to the game system interacting with players
(e.g. AI director)
 Metrics related to artificial agents behaving as players (bots,
mobs, etc.)
 ARPU, DAU, MAU -> customer focus -> revenue goal
 Avg. Playtime, Completion Rate -> player focus -> user
experience goal
 Navigation, strategies, responses, agent behavior -> system
focus -> game optimization
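These predefined metrics are straightforward to compute from raw telemetry. A minimal sketch in Python, assuming a hypothetical event log of (user, day, revenue) tuples; the field names and values are invented for illustration:

```python
from collections import defaultdict

# Hypothetical raw event log: (user_id, day, revenue) per session/purchase.
events = [
    ("u1", "2024-01-01", 0.0),
    ("u2", "2024-01-01", 4.99),
    ("u1", "2024-01-02", 0.0),
    ("u3", "2024-01-02", 0.99),
    ("u1", "2024-01-02", 1.99),
]

# DAU: number of distinct users seen per day.
active = defaultdict(set)
for user, day, _ in events:
    active[day].add(user)
dau = {day: len(users) for day, users in active.items()}

# ARPU over the window: total revenue divided by distinct users in the window.
users = {user for user, _, _ in events}
arpu = sum(rev for _, _, rev in events) / len(users)
```

Note that both metrics summarize, rather than explain, behavior — which is exactly the "cannot stand on their own" caveat discussed next.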
The Knowledge Discovery Process in GA
The GA knowledge discovery cycle: attribute definition -> data
acquisition -> data pre-processing -> metrics development -> analysis
and evaluation -> visualization -> reporting -> knowledge deployment
(and back around the cycle).
Hypothesis testing
Confirming ideas/looking for confirmation
Example: classification
Problem: requires existing knowledge/theory
Explorative analysis
We do not know what is going on and want to find out
Example: clustering
Problem: “feature creep” -> resource demanding
In practice often …
Mixture of explorative and hypothesis-driven
(and highly iterative)
Also consider:
Descriptive stats enough? (KISS)
Spatial – non-spatial solution?
Sampling or not?
Static vs. dynamic deliverable?
 Predefined metrics (DAU, MAU…)?
 Useful for KPIs
 Cannot stand on their own
 Say nothing beyond themselves (no exploration)
 Emergent patterns: temporal dynamics
 “Top-down and core out”: user experience, system, revenue
 Common mechanics -> genre/game specific
 Core mechanics -> peripheral mechanics
 Law of Diminishing Returns
 Look at players and system together
 Keep an eye out for unexpected behaviors
(clustering)
Unexpected Behaviors
 Dealing with high-dimensional, (massive) datasets
 Clustering is used for reducing dimensionality and finding
commonalities
 Explorative: What is going on? Why?
 The most common questions in game analytics …
 Formal: Group objects so that intra-cluster similarity is high and
inter-cluster similarity is low.
 Practice: Interpretable clusters that accurately encapsulate
player/AI/system/X behavior
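The formal objective can be made concrete by computing both quantities for a given hard assignment. A minimal Python sketch, where the 2-D behavioral features and the assignment are invented for illustration:

```python
# Toy 2-D behavioral features (e.g. playtime, kills) for six players,
# with a hand-made hard assignment into two clusters -- all values invented.
points = {"p1": (1.0, 1.0), "p2": (1.2, 0.8), "p3": (0.9, 1.1),
          "p4": (8.0, 8.0), "p5": (7.8, 8.3), "p6": (8.1, 7.7)}
assignment = {"p1": 0, "p2": 0, "p3": 0, "p4": 1, "p5": 1, "p6": 1}

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(members):
    coords = [points[p] for p in members]
    return tuple(sum(c) / len(c) for c in zip(*coords))

clusters = {k: [p for p, c in assignment.items() if c == k] for k in (0, 1)}
centroids = {k: centroid(m) for k, m in clusters.items()}

# Intra-cluster dispersion: squared distance of each object to its own
# centroid, summed (low = clusters are internally similar).
intra = sum(sq_dist(points[p], centroids[assignment[p]]) for p in points)
# Inter-cluster separation: squared distance between the two centroids
# (high = clusters are mutually dissimilar).
inter = sq_dist(centroids[0], centroids[1])
```

A good clustering drives `intra` down and `inter` up; validity indices such as the silhouette coefficient combine both into one score.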
 Compare and benchmark games
 Understanding player behavior
 Evaluate if there is sufficient variety in gameplay
 Detect churn behavior and link with root causes
 Discover what playstyles people use
 Detecting extreme behaviors
 Develop profiles for adaptive systems
 Skill profiling -> then cater to them via AI
 Experience management/experience personalization
 Finding the factors most important to characterize user behavior
 Monitor development in players' profiles to track behavior changes:
target novice -> expert
▪ Useful for evaluating learning curves, for example.
SIVM: finding extreme profiles
 Assassins
 Veterans
 Target dummies
 Assault-Recon
 Medic-Engineer
 Driver
 Assault wannabe
 Each represents a different playstyle, with different things keeping
them in the game:
 "Driver": drives, flies, sails – all the time, and favors maps with
vehicles
 "Assassin": kills – from afar or up close – no vehicles!
 "Target dummies": unskilled novices – high dropout unless they
quickly transfer to another cluster
 Many algorithms tailored to different problems in different fields
 What a "cluster" is varies depending on the model
 Any assignment of objects to clusters can be either hard
or soft.
 Hard: Player belongs to the “Rainbowdash” cluster
 Soft: Player belongs 73% to “Rainbowdash”, 27% to “Fluttershy”
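The hard/soft distinction can be sketched in a few lines. The cluster centres, player coordinates, and the inverse-distance weighting below are illustrative stand-ins; a real soft assignment would typically use e.g. Gaussian-mixture posterior probabilities:

```python
import math

# Hypothetical centres for two playstyle clusters and one player's features.
centres = {"Rainbowdash": (1.0, 1.0), "Fluttershy": (5.0, 5.0)}
player = (2.0, 2.0)

# Hard assignment: the nearest centre wins outright.
hard = min(centres, key=lambda c: math.dist(player, centres[c]))

# Soft assignment (one simple scheme): inverse-distance weights normalized
# to sum to 1 -- a stand-in for mixture-model membership probabilities.
inv = {c: 1.0 / math.dist(player, centres[c]) for c in centres}
total = sum(inv.values())
soft = {c: w / total for c, w in inv.items()}
# hard -> "Rainbowdash"; soft -> roughly 75% Rainbowdash, 25% Fluttershy
```

Soft memberships are often more honest for borderline players, at the cost of being harder to communicate to stakeholders.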
 NOT an automatic process – iterative procedure, human
decisions
 Same data can lead to different outcomes depending on
algorithm and parameters
 No “correct” algorithm
 Established model can (potentially) be automated
 Hierarchical clustering : agglomerative methods based on
proximities
 K-means: popular, simple, intuitive, but …
 Clusters represent averages and are not always interpretable in
terms of the behavior of “real” objects/players
 Centroid clustering: represent clusters in terms of central
vectors which do not need to be actual objects
 Distribution-based clustering: uses statistical distribution
models such as Gaussian mixtures.
 Clusters reflect how likely it is that objects belong to the same
distribution.
 Density clustering: determines areas of high density and
applies local density estimators so that clusters (i.e. regions in
data space) can have arbitrary shapes.
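Of these families, k-means is simple enough to sketch in full. A minimal version of Lloyd's algorithm, alternating assignment and centroid-update steps; the toy data and parameters are illustrative:

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal Lloyd's k-means: alternate assignment and centroid updates."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)          # initialize centres from the data
    for _ in range(iters):
        # Assignment step: each point joins its nearest centre.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: math.dist(p, centres[i]))
            clusters[j].append(p)
        # Update step: each centre moves to the mean of its members.
        for j, members in enumerate(clusters):
            if members:
                centres[j] = tuple(sum(c) / len(c) for c in zip(*members))
    return centres, clusters

# Two well-separated toy blobs resolve into two clusters of two points each.
centres, clusters = kmeans([(0, 0), (0.2, 0.1), (10, 10), (10.1, 9.9)], k=2)
```

Note how the returned centres are averages of their members — exactly the interpretability caveat raised above: the "typical player" a centre describes need not exist.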
1) Are the data high-dimensional and/or sparse?
If so, consider models tailored to sparse data, such as
archetypal analysis (AA) or non-negative matrix factorization (NMF).
2) What is the overall goal? To build general models of behavior or
to detect extreme behaviors (e.g. cheating, gold-farming)?
For the former, consider centroid-seeking models (k-means,
k-medoids)
For the latter, consider models such as AA.
3) Are the data numerical or relational?
For the latter, use spectral clustering or kernel methods.
4) Are the players tightly grouped in the data space?
If so, k-means might have difficulties distinguishing them.
If so, consider density-based approaches that do not operate on
Euclidean distances.
5) Are the data noisy?
If so, density-based methods might be appropriate, as they are
more tunable
(but also require more knowledge/expertise)
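Points 4 and 5 above can be illustrated with a density-based method. A toy sketch of the DBSCAN idea, simplified in that noise points adopted as border points are not re-checked as cores; `eps` and `min_pts` values are hypothetical:

```python
import math

def dbscan(points, eps, min_pts):
    """Toy DBSCAN sketch: grow clusters from dense cores, label sparse points -1."""
    labels = [None] * len(points)

    def neighbours(i):
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1             # noise (simplified: never revisited as core)
            continue
        cluster += 1
        labels[i] = cluster
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster    # noise point adopted as a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbours(j)
            if len(more) >= min_pts:   # j is itself a core point: keep expanding
                seeds.extend(more)
    return labels

# Two elongated, dense chains plus one isolated point: the chains become
# clusters of arbitrary shape, the stray point is flagged as noise (-1).
data = [(0, 0), (0.5, 0), (1, 0), (1.5, 0),
        (10, 0), (10.5, 0), (11, 0), (5, 5)]
labels = dbscan(data, eps=0.6, min_pts=2)
```

The elongated chains here are exactly the kind of non-globular structure k-means handles poorly, and the explicit noise label is what makes density methods attractive for noisy telemetry.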
 Validation: how do we validate the resulting clusters?
 Interpretation: what do the clusters signify?
 Time/progress-dependency: players change behavior and
progress
 Data type mixing: normalization generally advisable
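The normalization point can be sketched with z-scoring, which puts mixed-scale features on comparable footing before any distance-based clustering; the feature values below are invented for illustration:

```python
import statistics

# Mixed-scale behavioral features (values invented): playtime in hours vs.
# kill counts. Unnormalized, kills would dominate any Euclidean distance.
playtime = [1.5, 2.0, 40.0, 38.5]
kills = [120, 90, 3500, 4100]

def zscore(xs):
    """Standardize to mean 0 and (sample) standard deviation 1."""
    mu, sd = statistics.mean(xs), statistics.stdev(xs)
    return [(x - mu) / sd for x in xs]

norm_playtime = zscore(playtime)
norm_kills = zscore(kills)
# Both features now contribute on comparable scales to distance computations.
```

For categorical features mixed in with numerical ones, encoding choices (and distance measures such as Gower's) matter as much as scaling.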
Handout: Reference list covering the topics of this presentation
(at reception desk, also on andersdrachen.com)
Clustering Game Behavior Data – a more detailed guide
(printed copies at reception, arxiv.org)
Introducing Clustering I-IV – a beginner's guide (blog.gameanalytics.com;
gamasutra.com)
anders@gameanalytics.com / @andersdrachen
“You cannot improve what
you cannot measure”
Lord Kelvin
