Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Game Analytics & Machine Learning

3,771 views

Published on

Slides from my guest lecture for the Game AI course at UC Santa Cruz on data science in the gaming industry. March 2016.

Published in: Technology

Game Analytics & Machine Learning

  1. 1. expressiveintelligencestudio Game Analytics & Machine Learning Guest Lecture: UCSC CMPM 146 Ben Weber Sr. Data Scientist Electronic Arts beweber@ea.com
  2. 2. expressiveintelligencestudio UC Santa Cruz About Me  Ben G. Weber  Senior Data Scientist  Electronic Arts  Experience  PhD in Computer Science, UCSC  Technical Analyst Intern, EA  User Research Analyst, Microsoft Studios  Ecommerce Analyst, Sony Online Entertainment  Director of BI & Analytics, Daybreak Games
  3. 3. expressiveintelligencestudio UC Santa Cruz Games & Services
  4. 4. expressiveintelligencestudio UC Santa Cruz Talk Overview  Game Analytics  Classification Algorithms  Strategy Prediction in StarCraft: Brood War  Membership Conversion in DC Universe Online  Regression Algorithms  Player Retention in Madden NFL  Recommendation Algorithms  Recommendation System in EverQuest Landmark  Data Science at EA
  5. 5. expressiveintelligencestudio UC Santa Cruz What is Game Analytics?  Using data to inform game design and improve player lifecycles  Gathering feedback from players, making changes, and measuring the impact  Example Applications  Tuning game difficulty  Balancing the game economy  Content recommendations  Computing the lifetime value of players
  6. 6. expressiveintelligencestudio UC Santa Cruz What Data Can We Analyze?  Game data  In-Game events  Commerce events  Operational metrics  User Data  Web sessions  Acquisition channel  3rd Party Data  Steam Spy  Platform Data
  7. 7. expressiveintelligencestudio UC Santa Cruz How can we analyze players?  Descriptive statistics and exploratory data analysis  Experimentation & Measurement  Segmentation  Forecasting  Predictive analytics
  8. 8. expressiveintelligencestudio UC Santa Cruz What teams use Analytics?  Design  Product Management  Marketing  Business Development  Strategy  Finance  Dev Ops  Community Management
  9. 9. expressiveintelligencestudio UC Santa Cruz Analytics for Design  Question: Identify questions about the current design  Record: Enumerate which data needs to be collected and deploy the game  Analyze: Determine if the recorded data matches expectations  Refine: Given the findings, analyze the design and formulate additional questions
  10. 10. expressiveintelligencestudio UC Santa Cruz Tools for Analytics  Storage  DBMS, Hadoop, Cloud Storage  Reporting  Excel, MicroStrategy, Tableau, D3  Analysis  SQL, R, Python, Scala  Model Building  R, Weka, MLlib
  11. 11. expressiveintelligencestudio UC Santa Cruz Applications of Analytics in Games  Dashboard Analytics (Metrics & KPIs)  Spatial Analytics (Visualization)  User Research Analytics  Market Research Analytics  Data Science
  12. 12. expressiveintelligencestudio UC Santa Cruz Dashboard Analytics  Dead Space 2 Data Cracker (Ben Medler, GDC 2011)
  13. 13. expressiveintelligencestudio UC Santa Cruz Spatial Analytics  Player movement in SWTOR (Georg Zoeller, GDC 2010)
  14. 14. expressiveintelligencestudio UC Santa Cruz User Research Analytics  My Heart on Halo, Ben Lewis-Evans
  15. 15. expressiveintelligencestudio UC Santa Cruz Market Research Analytics  Player Retention data scraped from Raptr.com 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1 2 3 4 5 6 7 8 9 GameActivityRatio Weeks Since Release Dead Rising 2 Case Zero Mafia II Crackdown 2 NHL 11 NCAA 11
  16. 16. expressiveintelligencestudio UC Santa Cruz Recommended Reading (Gamasutra)  Intro to User Analytics  What to track and how to analyze data  Anders Drachen et al.  Game Analytics 101  What technologies to use?  Dmitri Williams  Indie Game Analytics 101  What metrics to track?  Dylan Jones
  17. 17. expressiveintelligencestudio UC Santa Cruz Machine Learning (ML)  Algorithms that learn from data and make inferences or predictions  Types of Problems  Classification:  Identifying the category of an instance  Prediction: label  Regression:  Estimating relationships between variables  Prediction: value  Clustering  Identifying similar groups of instances
  18. 18. expressiveintelligencestudio UC Santa Cruz The 3 Cultures of Machine Learning Jason Eisner (2015) http://cs.jhu.edu/~jason/tutorials/ml-simplex.html
  19. 19. expressiveintelligencestudio UC Santa Cruz Application to Game Analytics Forecasting Predictive Analytics A/B Testing
  20. 20. expressiveintelligencestudio UC Santa Cruz Applying ML to Games  How to represent the problem?  What features to use?  How to evaluate the model being produced?  How to deploy the model?
  21. 21. expressiveintelligencestudio UC Santa Cruz Classification Algorithms  Goal  Identify the category that an instance belongs to  Examples  Is a player going to make a purchase?  What is an opponent’s build order?  Is a user going to quit playing this week?  Algorithms  Decision Trees  Logistic Regression  Neural Networks  Boosting  Support Vector Machines  Nearest Neighbor
  22. 22. expressiveintelligencestudio UC Santa Cruz Strategy Prediction in StarCraft StarCraft: Brood War, Blizzard Entertainment 1998
  23. 23. expressiveintelligencestudio UC Santa Cruz Predicting Build-Orders in StarCraft  Goal  Predict the build order of opponents  Data Sources  Thousands of replays from Professional players  Results  Able to identify opponent build order minutes before it is executed  Also able to predict the timing of specific units
  24. 24. expressiveintelligencestudio UC Santa Cruz Application of ML to StarCraft  Problem Representation  Multiclass classification, 6 labels for mid-game builds  Collect data from professional StarCraft replays  Features  Build-order timings of units and features  Model Evaluation  Offline evaluation of classification algorithms  Simulated game time between 0 – 12 minutes  Model Deployment  During games, encode game state and predict build order
  25. 25. expressiveintelligencestudio UC Santa Cruz Timing Distributions  Different structure and units timings provide indicators of different build orders 0 100 200 300 400 0 4Game Time (minutes) Spawning Pool Timing (ZvT) 0 100 200 300 400 500 600 0 6Game Time (minutes) Factory Timing (TvP)
  26. 26. expressiveintelligencestudio UC Santa Cruz StarCraft Replay Data  A partial game log from a Terran versus Zerg game Player Game Time Action 2 1 2 1 1 2 1 2 1 2 0:00 0:00 1:18 1:22 2:04 2:25 2:50 2:54 3:18 4:10 Train Drone Train SCV Train Overlord Build Supply Depot Build Barracks Build Hatchery Build Barracks Build Spawning Pool Train Marine Train Zergling
  27. 27. expressiveintelligencestudio UC Santa Cruz Approach  Feature encoding  Each player’s actions are encoded in a single vector  Vectors are labeled using a build-order rule set  Features describe the game cycle when a unit or building type is first produced by a player t, time when x is first produced by P 0, x was not (yet) produced by P f(x) = {
  28. 28. expressiveintelligencestudio UC Santa Cruz Labeling Replays  A rule set for mid game strategies was built for each race based on analysis of expert play  Replays are labeled based on the order in which the tech tree is expanded Zealot Legs (Fast Legs) Archives (Fast DT) Support Bay (Reaver) Observatory (Standard) Nexus (Expand) Stargate (Fast Air) Robotics Bay Citadel Tier 1 Strategy Protoss Rule Set
  29. 29. expressiveintelligencestudio UC Santa Cruz Experiment Methodology  Algorithms explored  Nearest neighbor variants  Decision trees  Boosting methods  State lattice  Rule set  Simulation Approach  Set all features to 0  Step through replay events and update features  Predict build order every 30 seconds
  30. 30. expressiveintelligencestudio UC Santa Cruz Build-Order Prediction Results 0 0.2 0.4 0.6 0.8 1 0 1 2 3 4 5 6 7 8 9 10 11 12 RecallPrecision Game Time (minutes) NNge Boosting Rule Set State Lattice
  31. 31. expressiveintelligencestudio UC Santa Cruz Timing Prediction Results 0 0.5 1 1.5 2 MeanPredictionError(minutes) Linear Regression M5' Additive Regression
  32. 32. expressiveintelligencestudio UC Santa Cruz EISBot
  33. 33. expressiveintelligencestudio UC Santa Cruz Membership Conversion in DCUO DC Universe Online, Daybreak Games 2011
  34. 34. expressiveintelligencestudio UC Santa Cruz Membership Conversion in DCUO  Goal  Predict which users will convert to membership  Data Sources  Session and commerce data  Detailed in-game telemetry  Results  Weekly deployment of targeted user list  Experimented with different targeting strategies and uplift modeling
  35. 35. expressiveintelligencestudio UC Santa Cruz Application of ML to DCUO  Problem Representation  Binary classification (Converter vs. non-converter)  Features  Login patterns, recent purchases, game-feature usage  Model Evaluation  Offline evaluation of classification algorithms  Simulated upsell  Model Deployment  Send list of targeted users to game team
  36. 36. expressiveintelligencestudio UC Santa Cruz Model Evaluation  Lift  Compares sampled response to baseline response 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 Predicted/ActualConversions Percentage of Users Targeted
  37. 37. expressiveintelligencestudio UC Santa Cruz Uplift Modeling  Goal  Target only persuadable users Source: Predictive Analytics (Eric Siegel, 2013)
  38. 38. expressiveintelligencestudio UC Santa Cruz Uplift Modeling in DCUO  2-Phase Approach  Phase 1  Select users for targeting  Phase 2  Split users into Sure Things & Persuadable  Target users in the Persuadable group
  39. 39. expressiveintelligencestudio UC Santa Cruz Measuring Impact  Goal  Maximize Incremental revenue  Approach  Select targeted user group  Split targeted group into control and test  Deploy targeting and measure conversions  Compare test and control groups  Calculate incremental revenue
  40. 40. expressiveintelligencestudio UC Santa Cruz Regression Algorithms  Goal  Predict the value of the dependent variable of an instance  Examples  How many games is a user going to play?  What is the lifetime value of a player?  How to allocate players for balanced matchmaking?  Algorithms  Linear Regression  Regression Tree  Curve fitting  Boosting  Neural Networks  Nearest Neighbor
  41. 41. expressiveintelligencestudio UC Santa Cruz Player Retention in Madden NFL Madden NFL 11, Electronic Arts 2010
  42. 42. expressiveintelligencestudio UC Santa Cruz Player Retention in Madden NFL  Goal  Predict how long players will play  Identify features correlated with retention  Data Sources  Play-by-play game logs  Results  Recommendations provided to game team  Helped develop data-driven culture at EA
  43. 43. expressiveintelligencestudio UC Santa Cruz Application of ML to Madden  Problem Representation  Regression & Simulation  Features  Gameplay features used  Player performance  Model Evaluation  Offline evaluation of regression algorithms  Unique Effect Analysis  Model Deployment  Recommendations provided to game team
  44. 44. expressiveintelligencestudio UC Santa Cruz Robust Unique Effect Analysis  An algorithm that performs regression and analyzes unique effects to rank features  Algorithm overview 1. Build regression models for predicting retention 2. Perturb the inputs to the models 3. Compute the impact of individual features
  45. 45. expressiveintelligencestudio UC Santa Cruz Algorithm Overview Testing Data Models Feature Rankings Feature Tweaking Analyst Training Data ML Algorithms Players
  46. 46. expressiveintelligencestudio UC Santa Cruz Player Representation  Each player’s behavior is encoded as the following features (46 total):  Game modes  Usage  Win rates  Performance metrics  Turnovers  Gain  End conditions  Completions  Peer quits  Feature usage  Gameflow  Scouting  Audibles  Special moves  Play Preference  Running  Play Diversity
  47. 47. expressiveintelligencestudio UC Santa Cruz Predicting the Number of Games Played 0 50 100 150 200 250 0 50 100 150 200 250 ActualGamesPlayed Predicted Games Played Correlation Coefficient: 0.88
  48. 48. expressiveintelligencestudio UC Santa Cruz Feature Impact on Number of Games Played  How does tweaking a single feature impact retention? 0 10 20 30 40 50 60 70 0 0.2 0.4 0.6 0.8 1 PredictedNumberofGamesPlayed Value of tweaked Feature (normalized) Peer Quit Ratio Play Diversity Actions Per Play Sacks Allowed Online Franchise Games
  49. 49. expressiveintelligencestudio UC Santa Cruz Most Influential Features  The following features were identified as the most influential in predicting player retention Feature Impact Play Diversity Negative Online Franchise Wins Positive Running Plays Positive Sacks Made Positive Actions per Play Positive Interceptions Caught Positive Sacks Allowed Negative Peer Quit Ratio Negative CorrelationStrength
  50. 50. expressiveintelligencestudio UC Santa Cruz Predicted Number of Games for Different Win Rates 0 5 10 15 20 25 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% PredictedNumberofGames Win Rate PlayNow Ranked Superstar Franchise
  51. 51. expressiveintelligencestudio UC Santa Cruz Madden Project Findings  Simplify playbooks  Players presented with a large variety of plays have lower retention and less success  Clearly present the controls  Knowledge of controls had a larger impact than winning on player retention  Provide the correct challenge  Multiplayer matches should be as even as possible, while single player should greatly favor the player
  52. 52. expressiveintelligencestudio UC Santa Cruz Recommendation Algorithms  Content-Based Filtering  Collaborative Filtering  Item-to-Item  User-to-User  Model Based  Bayesian Inference
  53. 53. expressiveintelligencestudio UC Santa Cruz Content-Based Filtering  Steam Storefront
  54. 54. expressiveintelligencestudio UC Santa Cruz Collaborative Filtering  EverQuest Landmark
  55. 55. expressiveintelligencestudio UC Santa Cruz Model Based  Xbox Recommendation System [Koenigstein et al., RecSys 2012]
  56. 56. expressiveintelligencestudio UC Santa Cruz Choosing an Algorithm  How big is the item catalog? Is it curated?  What is the target number of users?  What player context will be used to provide item recommendations?
  57. 57. expressiveintelligencestudio UC Santa Cruz Collaborative Filtering (User-Based)  Rates items for a player based on the player’s similarity to other players  Does not require meta-data to be maintained  Can use explicit and implicit data collection  Challenges include scalability and cold starts
  58. 58. expressiveintelligencestudio UC Santa Cruz User-Based Collaborative Filtering  Users  Items A B 101 102 103 Recommendation Similar User
  59. 59. expressiveintelligencestudio UC Santa Cruz Algorithm Overview Computing a recommendation for a user, U: For every other user, V Compute the similarity, S, between U and V For every item, I, rated by V Add V’s rating for I, weighted by S to a running average of I Return the top rated items
  60. 60. expressiveintelligencestudio UC Santa Cruz Prototyping a Recommender  Apache Mahout  Free & scalable Java machine learning library  Functionality  User-based and item-based collaborative filtering  Single machine and cluster implementations  Built-in evaluation methods
  61. 61. expressiveintelligencestudio UC Santa Cruz Getting Started with Mahout 1. Choose what to recommend: ratings or rankings 2. Select a recommendation algorithm 3. Select a similarity measure 4. Encode your data into Mahout’s format 5. Evaluate the results 6. Encode additional features and iterate
  62. 62. expressiveintelligencestudio UC Santa Cruz Generating Recommendations Building the Recommender model = new DataModel(new File(“SalesData.csv”)); similarity = new TanimotoSimilarity(model); recommender = new UserBasedRecommender( model, similarity); Generating a List recommendations = recommender.recommend(1, 6);
  63. 63. expressiveintelligencestudio UC Santa Cruz Evaluating Recommendations 0% 10% 20% 30% 40% 50% 1 2 3 4 5 6 7 8 9 10 11 12 #Relevant/#Retrieved Number of Items Recommended User-Based (Tanimoto) User-Based (Log Likelihood) Item-Based (Tanimoto) Item-Based (Log Likelihood) Precision on the Landmark Dataset
  64. 64. expressiveintelligencestudio UC Santa Cruz Holdout Experiment  An experiment that excludes a single item from a player’s list of purchases  Goals  Generate the smallest list that includes the item  Enable offline evaluation of different algorithms  Compare recommendations with rule-based approaches
  65. 65. expressiveintelligencestudio UC Santa Cruz Landmark’s Holdout Results Recommendations significantly outperform a top sellers list 80% increase in the holdout Recall Rate at 6 items 22% 33% 42% 48% 40% 54% 65% 73% 0% 10% 20% 30% 40% 50% 60% 70% 80% 6 Items 12 Items 18 Items 24 Items RecallRate # of Recommended Items Top Sellers Recommendations
  66. 66. expressiveintelligencestudio UC Santa Cruz Deployment in Landmark  In-house implementation  Current Deployment  Recommendations are generated on the fly and cached  Planned Expansion  An offline process builds a user-similarity matrix  An online process generates item recommendations in near real-time
  67. 67. expressiveintelligencestudio UC Santa Cruz Recommended Reading
  68. 68. expressiveintelligencestudio UC Santa Cruz Data Science at Electronic Arts  Team Structure  Technology Stack  Project Lifecycles  Career Path
  69. 69. expressiveintelligencestudio UC Santa Cruz My Projects at EA  R Server & R Packages  Origin & EA Access Analytics  Analytics Best Practices  Technology Innovation
  70. 70. expressiveintelligencestudio UC Santa Cruz Questions?  Ben G. Weber  Senior Data Scientist, Electronic Arts  beweber@ea.com, @bgweber  http://tinyurl.com/WeberGameML  http://careers.ea.com

×