expressiveintelligencestudio
Game Analytics
& Machine Learning
Guest Lecture: UCSC CMPM 146
Ben Weber
Sr. Data Scientist
Electronic Arts
beweber@ea.com
expressiveintelligencestudio UC Santa Cruz
About Me
 Ben G. Weber
 Senior Data Scientist
 Electronic Arts
 Experience
 PhD in Computer Science, UCSC
 Technical Analyst Intern, EA
 User Research Analyst, Microsoft Studios
 Ecommerce Analyst, Sony Online Entertainment
 Director of BI & Analytics, Daybreak Games
expressiveintelligencestudio UC Santa Cruz
Games & Services
expressiveintelligencestudio UC Santa Cruz
Talk Overview
 Game Analytics
 Classification Algorithms
 Strategy Prediction in StarCraft: Brood War
 Membership Conversion in DC Universe Online
 Regression Algorithms
 Player Retention in Madden NFL
 Recommendation Algorithms
 Recommendation System in EverQuest Landmark
 Data Science at EA
expressiveintelligencestudio UC Santa Cruz
What is Game Analytics?
 Using data to inform game design and improve player
lifecycles
 Gathering feedback from players, making changes, and
measuring the impact
 Example Applications
 Tuning game difficulty
 Balancing the game economy
 Content recommendations
 Computing the lifetime value of players
expressiveintelligencestudio UC Santa Cruz
What Data Can We Analyze?
 Game data
 In-Game events
 Commerce events
 Operational metrics
 User Data
 Web sessions
 Acquisition channel
 3rd Party Data
 Steam Spy
 Platform Data
expressiveintelligencestudio UC Santa Cruz
How can we analyze players?
 Descriptive statistics and exploratory data analysis
 Experimentation & Measurement
 Segmentation
 Forecasting
 Predictive analytics
expressiveintelligencestudio UC Santa Cruz
What teams use Analytics?
 Design
 Product Management
 Marketing
 Business Development
 Strategy
 Finance
 Dev Ops
 Community Management
expressiveintelligencestudio UC Santa Cruz
Analytics for Design
 Question: Identify questions about the
current design
 Record: Enumerate which data needs to be
collected and deploy the game
 Analyze: Determine if the recorded data
matches expectations
 Refine: Given the findings, analyze the design
and formulate additional questions
expressiveintelligencestudio UC Santa Cruz
Tools for Analytics
 Storage
 DBMS, Hadoop, Cloud Storage
 Reporting
 Excel, MicroStrategy, Tableau, D3
 Analysis
 SQL, R, Python, Scala
 Model Building
 R, Weka, MLlib
expressiveintelligencestudio UC Santa Cruz
Applications of Analytics in Games
 Dashboard Analytics (Metrics & KPIs)
 Spatial Analytics (Visualization)
 User Research Analytics
 Market Research Analytics
 Data Science
expressiveintelligencestudio UC Santa Cruz
Dashboard Analytics
 Dead Space 2 Data Cracker (Ben Medler, GDC 2011)
expressiveintelligencestudio UC Santa Cruz
Spatial Analytics
 Player movement in SWTOR (Georg Zoeller, GDC 2010)
expressiveintelligencestudio UC Santa Cruz
User Research Analytics
 My Heart on Halo, Ben Lewis-Evans
expressiveintelligencestudio UC Santa Cruz
Market Research Analytics
 Player Retention data scraped from Raptr.com
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1.0
1 2 3 4 5 6 7 8 9
GameActivityRatio
Weeks Since Release
Dead Rising 2 Case Zero
Mafia II
Crackdown 2
NHL 11
NCAA 11
expressiveintelligencestudio UC Santa Cruz
Recommended Reading (Gamasutra)
 Intro to User Analytics
 What to track and how to analyze data
 Anders Drachen et al.
 Game Analytics 101
 What technologies to use?
 Dmitri Williams
 Indie Game Analytics 101
 What metrics to track?
 Dylan Jones
expressiveintelligencestudio UC Santa Cruz
Machine Learning (ML)
 Algorithms that learn from data and make inferences
or predictions
 Types of Problems
 Classification:
 Identifying the category of an instance
 Prediction: label
 Regression:
 Estimating relationships between variables
 Prediction: value
 Clustering
 Identifying similar groups of instances
expressiveintelligencestudio UC Santa Cruz
The 3 Cultures of Machine Learning
Jason Eisner (2015)
http://cs.jhu.edu/~jason/tutorials/ml-simplex.html
expressiveintelligencestudio UC Santa Cruz
Application to Game Analytics
Forecasting
Predictive
Analytics
A/B Testing
expressiveintelligencestudio UC Santa Cruz
Applying ML to Games
 How to represent the problem?
 What features to use?
 How to evaluate the model being produced?
 How to deploy the model?
expressiveintelligencestudio UC Santa Cruz
Classification Algorithms
 Goal
 Identify the category that an instance belongs to
 Examples
 Is a player going to make a purchase?
 What is an opponent’s build order?
 Is a user going to quit playing this week?
 Algorithms
 Decision Trees
 Logistic Regression
 Neural Networks
 Boosting
 Support Vector Machines
 Nearest Neighbor
expressiveintelligencestudio UC Santa Cruz
Strategy Prediction in StarCraft
StarCraft: Brood War, Blizzard Entertainment 1998
expressiveintelligencestudio UC Santa Cruz
Predicting Build-Orders in StarCraft
 Goal
 Predict the build order of opponents
 Data Sources
 Thousands of replays from Professional players
 Results
 Able to identify opponent build order minutes
before it is executed
 Also able to predict the timing of specific units
expressiveintelligencestudio UC Santa Cruz
Application of ML to StarCraft
 Problem Representation
 Multiclass classification, 6 labels for mid-game builds
 Collect data from professional StarCraft replays
 Features
 Build-order timings of units and features
 Model Evaluation
 Offline evaluation of classification algorithms
 Simulated game time between 0 – 12 minutes
 Model Deployment
 During games, encode game state and predict build order
expressiveintelligencestudio UC Santa Cruz
Timing Distributions
 Different structure and units timings provide
indicators of different build orders
0
100
200
300
400
0 4Game Time (minutes)
Spawning Pool Timing (ZvT)
0
100
200
300
400
500
600
0 6Game Time (minutes)
Factory Timing (TvP)
expressiveintelligencestudio UC Santa Cruz
StarCraft Replay Data
 A partial game log from a Terran versus Zerg game
Player Game Time Action
2
1
2
1
1
2
1
2
1
2
0:00
0:00
1:18
1:22
2:04
2:25
2:50
2:54
3:18
4:10
Train Drone
Train SCV
Train Overlord
Build Supply Depot
Build Barracks
Build Hatchery
Build Barracks
Build Spawning Pool
Train Marine
Train Zergling
expressiveintelligencestudio UC Santa Cruz
Approach
 Feature encoding
 Each player’s actions are encoded in a single vector
 Vectors are labeled using a build-order rule set
 Features describe the game cycle when a unit or
building type is first produced by a player
t, time when x is first produced by P
0, x was not (yet) produced by P
f(x) = {
expressiveintelligencestudio UC Santa Cruz
Labeling Replays
 A rule set for mid game strategies was built for each race
based on analysis of expert play
 Replays are labeled based on the order in which the tech tree
is expanded
Zealot Legs
(Fast Legs)
Archives
(Fast DT)
Support Bay
(Reaver)
Observatory
(Standard)
Nexus
(Expand)
Stargate
(Fast Air)
Robotics
Bay
Citadel
Tier 1
Strategy
Protoss Rule Set
expressiveintelligencestudio UC Santa Cruz
Experiment Methodology
 Algorithms explored
 Nearest neighbor variants
 Decision trees
 Boosting methods
 State lattice
 Rule set
 Simulation Approach
 Set all features to 0
 Step through replay events and update features
 Predict build order every 30 seconds
expressiveintelligencestudio UC Santa Cruz
Build-Order Prediction Results
0
0.2
0.4
0.6
0.8
1
0 1 2 3 4 5 6 7 8 9 10 11 12
RecallPrecision
Game Time (minutes)
NNge Boosting Rule Set State Lattice
expressiveintelligencestudio UC Santa Cruz
Timing Prediction Results
0
0.5
1
1.5
2
MeanPredictionError(minutes)
Linear Regression M5' Additive Regression
expressiveintelligencestudio UC Santa Cruz
EISBot
expressiveintelligencestudio UC Santa Cruz
Membership Conversion in DCUO
DC Universe Online, Daybreak Games 2011
expressiveintelligencestudio UC Santa Cruz
Membership Conversion in DCUO
 Goal
 Predict which users will convert to membership
 Data Sources
 Session and commerce data
 Detailed in-game telemetry
 Results
 Weekly deployment of targeted user list
 Experimented with different targeting strategies
and uplift modeling
expressiveintelligencestudio UC Santa Cruz
Application of ML to DCUO
 Problem Representation
 Binary classification (Converter vs. non-converter)
 Features
 Login patterns, recent purchases, game-feature usage
 Model Evaluation
 Offline evaluation of classification algorithms
 Simulated upsell
 Model Deployment
 Send list of targeted users to game team
expressiveintelligencestudio UC Santa Cruz
Model Evaluation
 Lift
 Compares sampled response to baseline response
0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100
Predicted/ActualConversions
Percentage of Users Targeted
expressiveintelligencestudio UC Santa Cruz
Uplift Modeling
 Goal
 Target only persuadable users
Source: Predictive Analytics (Eric Siegel, 2013)
expressiveintelligencestudio UC Santa Cruz
Uplift Modeling in DCUO
 2-Phase Approach
 Phase 1
 Select users for targeting
 Phase 2
 Split users into Sure Things & Persuadable
 Target users in the Persuadable group
expressiveintelligencestudio UC Santa Cruz
Measuring Impact
 Goal
 Maximize Incremental revenue
 Approach
 Select targeted user group
 Split targeted group into control and test
 Deploy targeting and measure conversions
 Compare test and control groups
 Calculate incremental revenue
expressiveintelligencestudio UC Santa Cruz
Regression Algorithms
 Goal
 Predict the value of the dependent variable of an instance
 Examples
 How many games is a user going to play?
 What is the lifetime value of a player?
 How to allocate players for balanced matchmaking?
 Algorithms
 Linear Regression
 Regression Tree
 Curve fitting
 Boosting
 Neural Networks
 Nearest Neighbor
expressiveintelligencestudio UC Santa Cruz
Player Retention in Madden NFL
Madden NFL 11, Electronic Arts 2010
expressiveintelligencestudio UC Santa Cruz
Player Retention in Madden NFL
 Goal
 Predict how long players will play
 Identify features correlated with retention
 Data Sources
 Play-by-play game logs
 Results
 Recommendations provided to game team
 Helped develop data-driven culture at EA
expressiveintelligencestudio UC Santa Cruz
Application of ML to Madden
 Problem Representation
 Regression & Simulation
 Features
 Gameplay features used
 Player performance
 Model Evaluation
 Offline evaluation of regression algorithms
 Unique Effect Analysis
 Model Deployment
 Recommendations provided to game team
expressiveintelligencestudio UC Santa Cruz
Robust Unique Effect Analysis
 An algorithm that performs regression and
analyzes unique effects to rank features
 Algorithm overview
1. Build regression models for predicting retention
2. Perturb the inputs to the models
3. Compute the impact of individual features
expressiveintelligencestudio UC Santa Cruz
Algorithm Overview
Testing
Data
Models
Feature
Rankings
Feature
Tweaking
Analyst
Training
Data
ML
Algorithms
Players
expressiveintelligencestudio UC Santa Cruz
Player Representation
 Each player’s behavior is encoded as the following
features (46 total):
 Game modes
 Usage
 Win rates
 Performance metrics
 Turnovers
 Gain
 End conditions
 Completions
 Peer quits
 Feature usage
 Gameflow
 Scouting
 Audibles
 Special moves
 Play Preference
 Running
 Play Diversity
expressiveintelligencestudio UC Santa Cruz
Predicting the Number of Games Played
0
50
100
150
200
250
0 50 100 150 200 250
ActualGamesPlayed
Predicted Games Played
Correlation Coefficient: 0.88
expressiveintelligencestudio UC Santa Cruz
Feature Impact on Number of Games Played
 How does tweaking a single feature impact retention?
0
10
20
30
40
50
60
70
0 0.2 0.4 0.6 0.8 1
PredictedNumberofGamesPlayed
Value of tweaked Feature (normalized)
Peer Quit Ratio
Play Diversity
Actions Per Play
Sacks Allowed
Online Franchise Games
expressiveintelligencestudio UC Santa Cruz
Most Influential Features
 The following features were identified as the most influential
in predicting player retention
Feature Impact
Play Diversity Negative
Online Franchise Wins Positive
Running Plays Positive
Sacks Made Positive
Actions per Play Positive
Interceptions Caught Positive
Sacks Allowed Negative
Peer Quit Ratio Negative
CorrelationStrength
expressiveintelligencestudio UC Santa Cruz
Predicted Number of Games for Different Win Rates
0
5
10
15
20
25
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
PredictedNumberofGames
Win Rate
PlayNow
Ranked
Superstar
Franchise
expressiveintelligencestudio UC Santa Cruz
Madden Project Findings
 Simplify playbooks
 Players presented with a large variety of plays have
lower retention and less success
 Clearly present the controls
 Knowledge of controls had a larger impact than
winning on player retention
 Provide the correct challenge
 Multiplayer matches should be as even as possible,
while single player should greatly favor the player
expressiveintelligencestudio UC Santa Cruz
Recommendation Algorithms
 Content-Based Filtering
 Collaborative Filtering
 Item-to-Item
 User-to-User
 Model Based
 Bayesian Inference
expressiveintelligencestudio UC Santa Cruz
Content-Based Filtering
 Steam Storefront
expressiveintelligencestudio UC Santa Cruz
Collaborative Filtering
 EverQuest Landmark
expressiveintelligencestudio UC Santa Cruz
Model Based
 Xbox Recommendation System
[Koenigstein et al., RecSys 2012]
expressiveintelligencestudio UC Santa Cruz
Choosing an Algorithm
 How big is the item catalog? Is it curated?
 What is the target number of users?
 What player context will be used to provide
item recommendations?
expressiveintelligencestudio UC Santa Cruz
Collaborative Filtering (User-Based)
 Rates items for a player based on the player’s
similarity to other players
 Does not require meta-data to be maintained
 Can use explicit and implicit data collection
 Challenges include scalability and cold starts
expressiveintelligencestudio UC Santa Cruz
User-Based Collaborative Filtering
 Users
 Items
A B
101 102 103
Recommendation
Similar
User
expressiveintelligencestudio UC Santa Cruz
Algorithm Overview
Computing a recommendation for a user, U:
For every other user, V
Compute the similarity, S, between U and V
For every item, I, rated by V
Add V’s rating for I, weighted by S to a running average of I
Return the top rated items
expressiveintelligencestudio UC Santa Cruz
Prototyping a Recommender
 Apache Mahout
 Free & scalable Java machine learning library
 Functionality
 User-based and item-based collaborative filtering
 Single machine and cluster implementations
 Built-in evaluation methods
expressiveintelligencestudio UC Santa Cruz
Getting Started with Mahout
1. Choose what to recommend:
ratings or rankings
2. Select a recommendation algorithm
3. Select a similarity measure
4. Encode your data into Mahout’s format
5. Evaluate the results
6. Encode additional features and iterate
expressiveintelligencestudio UC Santa Cruz
Generating Recommendations
Building the Recommender
model = new DataModel(new File(“SalesData.csv”));
similarity = new TanimotoSimilarity(model);
recommender = new UserBasedRecommender(
model, similarity);
Generating a List
recommendations = recommender.recommend(1, 6);
expressiveintelligencestudio UC Santa Cruz
Evaluating Recommendations
0%
10%
20%
30%
40%
50%
1 2 3 4 5 6 7 8 9 10 11 12
#Relevant/#Retrieved
Number of Items Recommended
User-Based (Tanimoto)
User-Based (Log Likelihood)
Item-Based (Tanimoto)
Item-Based (Log Likelihood)
Precision on the
Landmark Dataset
expressiveintelligencestudio UC Santa Cruz
Holdout Experiment
 An experiment that excludes a single item from a
player’s list of purchases
 Goals
 Generate the smallest list that includes the item
 Enable offline evaluation of different algorithms
 Compare recommendations with rule-based approaches
expressiveintelligencestudio UC Santa Cruz
Landmark’s Holdout Results
Recommendations significantly
outperform a top sellers list
80% increase in the holdout
Recall Rate at 6 items
22%
33%
42%
48%
40%
54%
65%
73%
0%
10%
20%
30%
40%
50%
60%
70%
80%
6 Items 12 Items 18 Items 24 Items
RecallRate
# of Recommended Items
Top Sellers Recommendations
expressiveintelligencestudio UC Santa Cruz
Deployment in Landmark
 In-house implementation
 Current Deployment
 Recommendations are generated on the fly and cached
 Planned Expansion
 An offline process builds a user-similarity matrix
 An online process generates item recommendations in
near real-time
expressiveintelligencestudio UC Santa Cruz
Recommended Reading
expressiveintelligencestudio UC Santa Cruz
Data Science at Electronic Arts
 Team Structure
 Technology Stack
 Project Lifecycles
 Career Path
expressiveintelligencestudio UC Santa Cruz
My Projects at EA
 R Server & R Packages
 Origin & EA Access Analytics
 Analytics Best Practices
 Technology Innovation
expressiveintelligencestudio UC Santa Cruz
Questions?
 Ben G. Weber
 Senior Data Scientist, Electronic Arts
 beweber@ea.com, @bgweber
 http://tinyurl.com/WeberGameML
 http://careers.ea.com

Game Analytics & Machine Learning

  • 1.
    expressiveintelligencestudio Game Analytics & MachineLearning Guest Lecture: UCSC CMPM 146 Ben Weber Sr. Data Scientist Electronic Arts beweber@ea.com
  • 2.
    expressiveintelligencestudio UC SantaCruz About Me  Ben G. Weber  Senior Data Scientist  Electronic Arts  Experience  PhD in Computer Science, UCSC  Technical Analyst Intern, EA  User Research Analyst, Microsoft Studios  Ecommerce Analyst, Sony Online Entertainment  Director of BI & Analytics, Daybreak Games
  • 3.
  • 4.
    expressiveintelligencestudio UC SantaCruz Talk Overview  Game Analytics  Classification Algorithms  Strategy Prediction in StarCraft: Brood War  Membership Conversion in DC Universe Online  Regression Algorithms  Player Retention in Madden NFL  Recommendation Algorithms  Recommendation System in EverQuest Landmark  Data Science at EA
  • 5.
    expressiveintelligencestudio UC SantaCruz What is Game Analytics?  Using data to inform game design and improve player lifecycles  Gathering feedback from players, making changes, and measuring the impact  Example Applications  Tuning game difficulty  Balancing the game economy  Content recommendations  Computing the lifetime value of players
  • 6.
    expressiveintelligencestudio UC SantaCruz What Data Can We Analyze?  Game data  In-Game events  Commerce events  Operational metrics  User Data  Web sessions  Acquisition channel  3rd Party Data  Steam Spy  Platform Data
  • 7.
    expressiveintelligencestudio UC SantaCruz How can we analyze players?  Descriptive statistics and exploratory data analysis  Experimentation & Measurement  Segmentation  Forecasting  Predictive analytics
  • 8.
    expressiveintelligencestudio UC SantaCruz What teams use Analytics?  Design  Product Management  Marketing  Business Development  Strategy  Finance  Dev Ops  Community Management
  • 9.
    expressiveintelligencestudio UC SantaCruz Analytics for Design  Question: Identify questions about the current design  Record: Enumerate which data needs to be collected and deploy the game  Analyze: Determine if the recorded data matches expectations  Refine: Given the findings, analyze the design and formulate additional questions
  • 10.
    expressiveintelligencestudio UC SantaCruz Tools for Analytics  Storage  DBMS, Hadoop, Cloud Storage  Reporting  Excel, MicroStrategy, Tableau, D3  Analysis  SQL, R, Python, Scala  Model Building  R, Weka, MLlib
  • 11.
    expressiveintelligencestudio UC SantaCruz Applications of Analytics in Games  Dashboard Analytics (Metrics & KPIs)  Spatial Analytics (Visualization)  User Research Analytics  Market Research Analytics  Data Science
  • 12.
    expressiveintelligencestudio UC SantaCruz Dashboard Analytics  Dead Space 2 Data Cracker (Ben Medler, GDC 2011)
  • 13.
    expressiveintelligencestudio UC SantaCruz Spatial Analytics  Player movement in SWTOR (Georg Zoeller, GDC 2010)
  • 14.
    expressiveintelligencestudio UC SantaCruz User Research Analytics  My Heart on Halo, Ben Lewis-Evans
  • 15.
    expressiveintelligencestudio UC SantaCruz Market Research Analytics  Player Retention data scraped from Raptr.com 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1 2 3 4 5 6 7 8 9 GameActivityRatio Weeks Since Release Dead Rising 2 Case Zero Mafia II Crackdown 2 NHL 11 NCAA 11
  • 16.
    expressiveintelligencestudio UC SantaCruz Recommended Reading (Gamasutra)  Intro to User Analytics  What to track and how to analyze data  Anders Drachen et al.  Game Analytics 101  What technologies to use?  Dmitri Williams  Indie Game Analytics 101  What metrics to track?  Dylan Jones
  • 17.
    expressiveintelligencestudio UC SantaCruz Machine Learning (ML)  Algorithms that learn from data and make inferences or predictions  Types of Problems  Classification:  Identifying the category of an instance  Prediction: label  Regression:  Estimating relationships between variables  Prediction: value  Clustering  Identifying similar groups of instances
  • 18.
    expressiveintelligencestudio UC SantaCruz The 3 Cultures of Machine Learning Jason Eisner (2015) http://cs.jhu.edu/~jason/tutorials/ml-simplex.html
  • 19.
    expressiveintelligencestudio UC SantaCruz Application to Game Analytics Forecasting Predictive Analytics A/B Testing
  • 20.
    expressiveintelligencestudio UC SantaCruz Applying ML to Games  How to represent the problem?  What features to use?  How to evaluate the model being produced?  How to deploy the model?
  • 21.
    expressiveintelligencestudio UC SantaCruz Classification Algorithms  Goal  Identify the category that an instance belongs to  Examples  Is a player going to make a purchase?  What is an opponent’s build order?  Is a user going to quit playing this week?  Algorithms  Decision Trees  Logistic Regression  Neural Networks  Boosting  Support Vector Machines  Nearest Neighbor
  • 22.
    expressiveintelligencestudio UC SantaCruz Strategy Prediction in StarCraft StarCraft: Brood War, Blizzard Entertainment 1998
  • 23.
    expressiveintelligencestudio UC SantaCruz Predicting Build-Orders in StarCraft  Goal  Predict the build order of opponents  Data Sources  Thousands of replays from Professional players  Results  Able to identify opponent build order minutes before it is executed  Also able to predict the timing of specific units
  • 24.
    expressiveintelligencestudio UC SantaCruz Application of ML to StarCraft  Problem Representation  Multiclass classification, 6 labels for mid-game builds  Collect data from professional StarCraft replays  Features  Build-order timings of units and features  Model Evaluation  Offline evaluation of classification algorithms  Simulated game time between 0 – 12 minutes  Model Deployment  During games, encode game state and predict build order
  • 25.
    expressiveintelligencestudio UC SantaCruz Timing Distributions  Different structure and units timings provide indicators of different build orders 0 100 200 300 400 0 4Game Time (minutes) Spawning Pool Timing (ZvT) 0 100 200 300 400 500 600 0 6Game Time (minutes) Factory Timing (TvP)
  • 26.
    expressiveintelligencestudio UC SantaCruz StarCraft Replay Data  A partial game log from a Terran versus Zerg game Player Game Time Action 2 1 2 1 1 2 1 2 1 2 0:00 0:00 1:18 1:22 2:04 2:25 2:50 2:54 3:18 4:10 Train Drone Train SCV Train Overlord Build Supply Depot Build Barracks Build Hatchery Build Barracks Build Spawning Pool Train Marine Train Zergling
  • 27.
    expressiveintelligencestudio UC SantaCruz Approach  Feature encoding  Each player’s actions are encoded in a single vector  Vectors are labeled using a build-order rule set  Features describe the game cycle when a unit or building type is first produced by a player t, time when x is first produced by P 0, x was not (yet) produced by P f(x) = {
  • 28.
    expressiveintelligencestudio UC SantaCruz Labeling Replays  A rule set for mid game strategies was built for each race based on analysis of expert play  Replays are labeled based on the order in which the tech tree is expanded Zealot Legs (Fast Legs) Archives (Fast DT) Support Bay (Reaver) Observatory (Standard) Nexus (Expand) Stargate (Fast Air) Robotics Bay Citadel Tier 1 Strategy Protoss Rule Set
  • 29.
    expressiveintelligencestudio UC SantaCruz Experiment Methodology  Algorithms explored  Nearest neighbor variants  Decision trees  Boosting methods  State lattice  Rule set  Simulation Approach  Set all features to 0  Step through replay events and update features  Predict build order every 30 seconds
  • 30.
    expressiveintelligencestudio UC SantaCruz Build-Order Prediction Results 0 0.2 0.4 0.6 0.8 1 0 1 2 3 4 5 6 7 8 9 10 11 12 RecallPrecision Game Time (minutes) NNge Boosting Rule Set State Lattice
  • 31.
    expressiveintelligencestudio UC SantaCruz Timing Prediction Results 0 0.5 1 1.5 2 MeanPredictionError(minutes) Linear Regression M5' Additive Regression
  • 32.
  • 33.
    expressiveintelligencestudio UC SantaCruz Membership Conversion in DCUO DC Universe Online, Daybreak Games 2011
  • 34.
    expressiveintelligencestudio UC SantaCruz Membership Conversion in DCUO  Goal  Predict which users will convert to membership  Data Sources  Session and commerce data  Detailed in-game telemetry  Results  Weekly deployment of targeted user list  Experimented with different targeting strategies and uplift modeling
  • 35.
    expressiveintelligencestudio UC SantaCruz Application of ML to DCUO  Problem Representation  Binary classification (Converter vs. non-converter)  Features  Login patterns, recent purchases, game-feature usage  Model Evaluation  Offline evaluation of classification algorithms  Simulated upsell  Model Deployment  Send list of targeted users to game team
  • 36.
    expressiveintelligencestudio UC SantaCruz Model Evaluation  Lift  Compares sampled response to baseline response 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 Predicted/ActualConversions Percentage of Users Targeted
  • 37.
    expressiveintelligencestudio UC SantaCruz Uplift Modeling  Goal  Target only persuadable users Source: Predictive Analytics (Eric Siegel, 2013)
  • 38.
    expressiveintelligencestudio UC SantaCruz Uplift Modeling in DCUO  2-Phase Approach  Phase 1  Select users for targeting  Phase 2  Split users into Sure Things & Persuadable  Target users in the Persuadable group
  • 39.
    expressiveintelligencestudio UC SantaCruz Measuring Impact  Goal  Maximize Incremental revenue  Approach  Select targeted user group  Split targeted group into control and test  Deploy targeting and measure conversions  Compare test and control groups  Calculate incremental revenue
  • 40.
    expressiveintelligencestudio UC SantaCruz Regression Algorithms  Goal  Predict the value of the dependent variable of an instance  Examples  How many games is a user going to play?  What is the lifetime value of a player?  How to allocate players for balanced matchmaking?  Algorithms  Linear Regression  Regression Tree  Curve fitting  Boosting  Neural Networks  Nearest Neighbor
  • 41.
    expressiveintelligencestudio UC SantaCruz Player Retention in Madden NFL Madden NFL 11, Electronic Arts 2010
  • 42.
    expressiveintelligencestudio UC SantaCruz Player Retention in Madden NFL  Goal  Predict how long players will play  Identify features correlated with retention  Data Sources  Play-by-play game logs  Results  Recommendations provided to game team  Helped develop data-driven culture at EA
  • 43.
    expressiveintelligencestudio UC SantaCruz Application of ML to Madden  Problem Representation  Regression & Simulation  Features  Gameplay features used  Player performance  Model Evaluation  Offline evaluation of regression algorithms  Unique Effect Analysis  Model Deployment  Recommendations provided to game team
  • 44.
    expressiveintelligencestudio UC SantaCruz Robust Unique Effect Analysis  An algorithm that performs regression and analyzes unique effects to rank features  Algorithm overview 1. Build regression models for predicting retention 2. Perturb the inputs to the models 3. Compute the impact of individual features
  • 45.
    expressiveintelligencestudio UC SantaCruz Algorithm Overview Testing Data Models Feature Rankings Feature Tweaking Analyst Training Data ML Algorithms Players
  • 46.
    expressiveintelligencestudio UC SantaCruz Player Representation  Each player’s behavior is encoded as the following features (46 total):  Game modes  Usage  Win rates  Performance metrics  Turnovers  Gain  End conditions  Completions  Peer quits  Feature usage  Gameflow  Scouting  Audibles  Special moves  Play Preference  Running  Play Diversity
  • 47.
    expressiveintelligencestudio UC SantaCruz Predicting the Number of Games Played 0 50 100 150 200 250 0 50 100 150 200 250 ActualGamesPlayed Predicted Games Played Correlation Coefficient: 0.88
  • 48.
    expressiveintelligencestudio UC SantaCruz Feature Impact on Number of Games Played  How does tweaking a single feature impact retention? 0 10 20 30 40 50 60 70 0 0.2 0.4 0.6 0.8 1 PredictedNumberofGamesPlayed Value of tweaked Feature (normalized) Peer Quit Ratio Play Diversity Actions Per Play Sacks Allowed Online Franchise Games
  • 49.
    expressiveintelligencestudio UC SantaCruz Most Influential Features  The following features were identified as the most influential in predicting player retention Feature Impact Play Diversity Negative Online Franchise Wins Positive Running Plays Positive Sacks Made Positive Actions per Play Positive Interceptions Caught Positive Sacks Allowed Negative Peer Quit Ratio Negative CorrelationStrength
  • 50.
    expressiveintelligencestudio UC SantaCruz Predicted Number of Games for Different Win Rates 0 5 10 15 20 25 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% PredictedNumberofGames Win Rate PlayNow Ranked Superstar Franchise
  • 51.
    expressiveintelligencestudio UC SantaCruz Madden Project Findings  Simplify playbooks  Players presented with a large variety of plays have lower retention and less success  Clearly present the controls  Knowledge of controls had a larger impact than winning on player retention  Provide the correct challenge  Multiplayer matches should be as even as possible, while single player should greatly favor the player
  • 52.
    expressiveintelligencestudio UC SantaCruz Recommendation Algorithms  Content-Based Filtering  Collaborative Filtering  Item-to-Item  User-to-User  Model Based  Bayesian Inference
  • 53.
    expressiveintelligencestudio UC SantaCruz Content-Based Filtering  Steam Storefront
  • 54.
    expressiveintelligencestudio UC SantaCruz Collaborative Filtering  EverQuest Landmark
  • 55.
    expressiveintelligencestudio UC SantaCruz Model Based  Xbox Recommendation System [Koenigstein et al., RecSys 2012]
  • 56.
    expressiveintelligencestudio UC SantaCruz Choosing an Algorithm  How big is the item catalog? Is it curated?  What is the target number of users?  What player context will be used to provide item recommendations?
  • 57.
    expressiveintelligencestudio UC SantaCruz Collaborative Filtering (User-Based)  Rates items for a player based on the player’s similarity to other players  Does not require meta-data to be maintained  Can use explicit and implicit data collection  Challenges include scalability and cold starts
  • 58.
    expressiveintelligencestudio UC SantaCruz User-Based Collaborative Filtering  Users  Items A B 101 102 103 Recommendation Similar User
  • 59.
    expressiveintelligencestudio UC SantaCruz Algorithm Overview Computing a recommendation for a user, U: For every other user, V Compute the similarity, S, between U and V For every item, I, rated by V Add V’s rating for I, weighted by S to a running average of I Return the top rated items
  • 60.
    expressiveintelligencestudio UC SantaCruz Prototyping a Recommender  Apache Mahout  Free & scalable Java machine learning library  Functionality  User-based and item-based collaborative filtering  Single machine and cluster implementations  Built-in evaluation methods
  • 61.
    expressiveintelligencestudio UC SantaCruz Getting Started with Mahout 1. Choose what to recommend: ratings or rankings 2. Select a recommendation algorithm 3. Select a similarity measure 4. Encode your data into Mahout’s format 5. Evaluate the results 6. Encode additional features and iterate
  • 62.
    expressiveintelligencestudio UC SantaCruz Generating Recommendations Building the Recommender model = new DataModel(new File(“SalesData.csv”)); similarity = new TanimotoSimilarity(model); recommender = new UserBasedRecommender( model, similarity); Generating a List recommendations = recommender.recommend(1, 6);
  • 63.
    expressiveintelligencestudio UC SantaCruz Evaluating Recommendations 0% 10% 20% 30% 40% 50% 1 2 3 4 5 6 7 8 9 10 11 12 #Relevant/#Retrieved Number of Items Recommended User-Based (Tanimoto) User-Based (Log Likelihood) Item-Based (Tanimoto) Item-Based (Log Likelihood) Precision on the Landmark Dataset
  • 64.
    expressiveintelligencestudio UC SantaCruz Holdout Experiment  An experiment that excludes a single item from a player’s list of purchases  Goals  Generate the smallest list that includes the item  Enable offline evaluation of different algorithms  Compare recommendations with rule-based approaches
  • 65.
    expressiveintelligencestudio UC SantaCruz Landmark’s Holdout Results Recommendations significantly outperform a top sellers list 80% increase in the holdout Recall Rate at 6 items 22% 33% 42% 48% 40% 54% 65% 73% 0% 10% 20% 30% 40% 50% 60% 70% 80% 6 Items 12 Items 18 Items 24 Items RecallRate # of Recommended Items Top Sellers Recommendations
  • 66.
    expressiveintelligencestudio UC SantaCruz Deployment in Landmark  In-house implementation  Current Deployment  Recommendations are generated on the fly and cached  Planned Expansion  An offline process builds a user-similarity matrix  An online process generates item recommendations in near real-time
  • 67.
    expressiveintelligencestudio UC SantaCruz Recommended Reading
  • 68.
    expressiveintelligencestudio UC SantaCruz Data Science at Electronic Arts  Team Structure  Technology Stack  Project Lifecycles  Career Path
  • 69.
    expressiveintelligencestudio UC SantaCruz My Projects at EA  R Server & R Packages  Origin & EA Access Analytics  Analytics Best Practices  Technology Innovation
  • 70.
    expressiveintelligencestudio UC SantaCruz Questions?  Ben G. Weber  Senior Data Scientist, Electronic Arts  beweber@ea.com, @bgweber  http://tinyurl.com/WeberGameML  http://careers.ea.com