These are the slides from my presentation at the excellent AI/Game conference in Vienna, July 2014 (http://gameaiconf.com/), covering two topics: 1) a broad introduction to the practice of game analytics; 2) a description of how to use cluster analysis to build profiles of players based on game telemetry.
3. Analytics is the process of Business Intelligence
Analytics is the process of discovering and communicating
patterns in data towards solving problems in business
Supporting decision management
Driving action
Improving performance
Research & Development
- Or for purely frivolous and artistic reasons!
4. Game analytics = subdomain of analytics: game development
and game research
The game as a product: user experience, behavior, revenue,
system …
The game as a project: the process of developing the game
Evidence-driven support for decision making
5. Strategic GA: The global view on how a game should evolve
based on analysis of user behavior and the business model.
Defining a monetization model, scoping DLC
Tactical GA: Informs game design in the short term.
A/B test of a new game feature, prediction, profiling
Operational GA: Analysis and evaluation of the immediate,
current situation in the game.
Removing a bug, adapting game to user behavior in real-time,
reacting to cheating/piracy
7. User metrics
Metrics related to the users, viewing them as either
customers (revenue sources) or players (behaviors)
Metrics related to the game system interacting with players
(e.g. AI director)
Metrics related to artificial agents behaving as players (bots,
mobs, etc.)
8. ARPU, DAU, MAU -> customer focus -> revenue goal
Avg. Playtime, Completion Rate -> player focus -> user
experience goal
Navigation, strategies, responses, agent behavior -> system
focus -> game optimization
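As a rough illustration of the customer-focused metrics above, the sketch below computes DAU, MAU and ARPU from a small made-up event log. The record layout (user id, date, revenue) and the function names are assumptions for illustration only. ARPU is taken here as revenue divided by the number of active users in the month, which is one common definition among several.

```python
from datetime import date

# Hypothetical telemetry: (user_id, session_date, revenue_in_usd)
EVENTS = [
    ("u1", date(2014, 7, 1), 0.00),
    ("u2", date(2014, 7, 1), 1.99),
    ("u1", date(2014, 7, 2), 4.99),
    ("u3", date(2014, 7, 15), 0.00),
    ("u2", date(2014, 8, 1), 0.99),
]

def dau(events, day):
    """Daily active users: distinct users with at least one event on `day`."""
    return len({u for u, d, _ in events if d == day})

def mau(events, year, month):
    """Monthly active users: distinct users seen in the calendar month."""
    return len({u for u, d, _ in events if (d.year, d.month) == (year, month)})

def arpu(events, year, month):
    """Average revenue per active user in the calendar month."""
    users = mau(events, year, month)
    revenue = sum(r for _, d, r in events if (d.year, d.month) == (year, month))
    return revenue / users if users else 0.0

print(dau(EVENTS, date(2014, 7, 1)))    # distinct users on 1 July
print(mau(EVENTS, 2014, 7))             # distinct users in July
print(round(arpu(EVENTS, 2014, 7), 2))  # July revenue per active user
```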
11. Hypothesis testing
Confirming ideas/looking for confirmation
Example: classification
Problem: require existing knowledge/theory
Explorative analysis
We do not know what is going on/want to find out what is going on
Example: clustering
Problem: “feature creep” -> resource demanding
12. In practice often …
Mixture of explorative and hypothesis-driven
(and highly iterative)
Also consider:
Descriptive stats enough? (KISS)
Spatial – non-spatial solution?
Sampling or not?
Static vs. dynamic deliverable?
14. Predefined metrics (DAU, MAU…)?
Useful for KPIs
Cannot stand on their own
Say nothing beyond themselves (no exploration)
Emergent patterns: temporal dynamics
“Top-down and core out”: user experience, system, revenue
Common mechanics -> genre/game specific
Core mechanics -> peripheral mechanics
15. Law of Diminishing Returns
Look at players and system together
Keep an eye out for unexpected behaviors
(clustering)
19. Dealing with high-dimensional, (massive) datasets
Clustering is used for reducing dimensionality and finding
commonalities
Explorative: What is going on? Why?
The most common questions in game analytics …
20. Formal: Group objects so that intra-cluster similarity is high and
inter-cluster similarity is low.
Practice: Interpretable clusters that accurately encapsulate
player/AI/system/X behavior
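One simple way to check the formal criterion is to compare average distances within and between clusters. The sketch below is a minimal example that uses Euclidean distance as the (dis)similarity measure, so high similarity corresponds to small distance. The feature names and the helper `mean_intra_inter` are hypothetical.

```python
from itertools import combinations
from math import dist

def mean_intra_inter(points, labels):
    """Return (mean within-cluster distance, mean between-cluster distance)."""
    intra, inter = [], []
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        (intra if lp == lq else inter).append(dist(p, q))
    mean = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return mean(intra), mean(inter)

# Hypothetical features per player: (hours played, kills per minute)
players = [(1.0, 0.2), (1.5, 0.3), (9.0, 2.0), (9.5, 2.2)]
labels = [0, 0, 1, 1]

intra, inter = mean_intra_inter(players, labels)
print(intra < inter)  # a good grouping keeps intra small relative to inter
```

Swapping the labels so that each cluster mixes the two groups makes the within-cluster distance larger than the between-cluster one, which is the pattern to avoid.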
22. Compare and benchmark games
Understanding player behavior
Evaluate if there is sufficient variety in gameplay
Detect churn behavior and link with root causes
Discover what playstyles people use
Detecting extreme behaviors
23. Develop profiles for adaptive systems
Skill profiling -> then cater to them via AI
Experience management/experience personalization
Finding the factors most important to characterize user behavior
Monitor development in players' profiles to track behavior changes:
target novice -> expert
▪ Useful for evaluating learning curves, for example.
25. Each profile has a different playstyle, and different things keep its players in
the game:
“Driver”: drives, flies, sails – all the time and favors maps with
vehicles
“Assassin”: kills – from afar or up close – no vehicles!
“Target dummies”: unskilled novices – high dropout unless they
quickly transfer to another cluster
28. Many algorithms tailored to different problems in different fields
What a “cluster” is varies depending on the model
Any assignment of objects to clusters can be either hard
or soft.
Hard: Player belongs to the “Rainbowdash” cluster
Soft: Player belongs 73% to “Rainbowdash”, 27% to “Fluttershy”
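The difference between hard and soft assignment can be sketched in a few lines. This is an illustrative example only: the cluster centers are made up, and the soft weights come from a softmax over negative squared distances, which is one possible choice and not necessarily the one behind the numbers above. The `sharpness` parameter is a hypothetical knob that controls how quickly membership falls off with distance.

```python
from math import dist, exp

# Hypothetical cluster centers in a 2-D feature space.
CENTERS = {"Rainbowdash": (8.0, 2.0), "Fluttershy": (2.0, 0.5)}

def soft_assign(player, centers, sharpness=1.0):
    """Membership weights that sum to 1 (softmax over negative squared distance)."""
    scores = {name: -sharpness * dist(player, c) ** 2 for name, c in centers.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    weights = {name: exp(s - top) for name, s in scores.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

def hard_assign(player, centers):
    """Single label: the nearest center."""
    return min(centers, key=lambda name: dist(player, centers[name]))

player = (5.5, 1.3)
print(hard_assign(player, CENTERS))
print(soft_assign(player, CENTERS, sharpness=0.1))
```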
29. NOT an automatic process – iterative procedure, human
decisions
Same data can lead to different outcomes depending on
algorithm and parameters
No “correct” algorithm
Established model can (potentially) be automated
30. Hierarchical clustering: agglomerative methods based on
proximities
K-means: popular, simple, intuitive, but …
Clusters represent averages and are not always interpretable in
terms of the behavior of “real” objects/players
Centroid clustering: represent clusters in terms of central
vectors which do not need to be actual objects
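To make the point about averages concrete, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. The player features, the seed and the function name `kmeans` are assumptions for illustration. A production analysis would normally rely on a tested library implementation.

```python
import random
from math import dist

def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's algorithm: returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                updated.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                updated.append(centroids[j])  # keep an empty cluster's centroid
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return centroids, labels

# Hypothetical features per player: (hours played, kills per minute)
players = [(1.0, 0.2), (1.5, 0.3), (1.2, 0.1), (9.0, 2.0), (9.5, 2.2), (8.8, 1.9)]
centroids, labels = kmeans(players, k=2)
print(labels)
print(centroids)  # averages of the members, not actual players
```

In this toy data neither centroid coincides with a real player, which is why k-means profiles can be hard to read as the behavior of any actual person.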
31. Distribution-based clustering: uses statistical distribution
models such as Gaussian mixtures.
Clusters reflect how likely it is that objects belong to the same
distribution.
Density clustering: determines areas of high density and
applies local density estimators so that clusters (i.e. regions in
data space) can have arbitrary shapes.
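As an example of the density idea, the sketch below implements a minimal version of DBSCAN, one well-known density-based algorithm (chosen here for illustration; the slides do not name a specific method). The parameters `eps` (neighbourhood radius) and `min_pts` (minimum neighbours for a dense region, counting the point itself) and the sample data are assumptions.

```python
from math import dist

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one label per point, -1 marks noise."""
    NOISE, UNSEEN = -1, None
    labels = [UNSEEN] * len(points)

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            reach = neighbours(j)
            if len(reach) >= min_pts:  # j is a core point: keep expanding
                queue.extend(reach)
    return labels

# An elongated chain, a compact blob and one outlier (made-up data).
points = [(0, 0), (0.5, 0), (1, 0), (1.5, 0.2), (2, 0.5),
          (10, 10), (10.3, 10.1), (10.1, 10.4), (5, 20)]
print(dbscan(points, eps=0.8, min_pts=2))
```

The chain is recovered as a single cluster even though its end points are far apart, and the isolated point is flagged as noise, which a centroid-based method would not do on its own.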
32. 1) Are the data high-dimensional and/or sparse?
If so, consider models tailored to sparse data, such as archetypal analysis (AA)
or non-negative matrix factorization (NMF).
33. 2) What is the overall goal? To build general models of behavior or
to detect extreme behaviors (e.g. cheating, gold-farming)?
For the former, consider centroid-seeking models (k-means,
k-medoids)
For the latter, consider models such as AA.
34. 3) Are the data numerical or relational?
For the latter, use spectral clustering or kernel methods.
35. 4) Are the players tightly grouped in the data space?
If so, k-means might have difficulties distinguishing them.
In that case, consider density-based approaches that do not operate on
Euclidean distances.
36. 5) Are the data noisy?
If so, density-based methods might be appropriate, as they are
more tunable
(but also require more knowledge/expertise)
37. Validation: validating clusters
Interpretation: what do the clusters signify?
Time/progress-dependency: players change behavior and
progress
Data type mixing: normalization generally advisable
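A minimal sketch of one common normalization, z-scoring each feature, is shown below. The feature choice (playtime in seconds next to a kill/death ratio) is a made-up example of mixed scales; without rescaling, distance-based clustering would be dominated by the playtime column.

```python
from statistics import mean, pstdev

def zscore_columns(rows):
    """Rescale each feature column to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    stats = [(mean(c), pstdev(c)) for c in columns]
    return [
        # A constant column (standard deviation 0) is mapped to 0.0.
        tuple((x - m) / s if s else 0.0 for x, (m, s) in zip(row, stats))
        for row in rows
    ]

# Hypothetical features on very different scales:
# (playtime in seconds, kill/death ratio)
players = [(36000, 0.8), (7200, 1.9), (18000, 1.2)]
for row in zscore_columns(players):
    print(tuple(round(v, 2) for v in row))
```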
38. Handout: Reference list covering the topics of this presentation
(at reception desk, also on andersdrachen.com)
Clustering Game Behavior Data – a more detailed guide
(printed copies at reception, arxiv.com)
Introducing Clustering I-IV – a beginner's guide (blog.gameanalytics.com;
gamasutra.com)
anders@gameanalytics.com / @andersdrachen