expressiveintelligencestudio
Game Analytics
& Machine Learning
Guest Lecture: UCSC CMPM 146
Ben Weber
Sr. Data Scientist
Electronic Arts
beweber@ea.com
expressiveintelligencestudio UC Santa Cruz
About Me
 Ben G. Weber
 Senior Data Scientist
 Electronic Arts
 Experience
 PhD in Computer Science, UCSC
 Technical Analyst Intern, EA
 User Research Analyst, Microsoft Studios
 Ecommerce Analyst, Sony Online Entertainment
 Director of BI & Analytics, Daybreak Games
Games & Services
Talk Overview
 Game Analytics
 Classification Algorithms
 Strategy Prediction in StarCraft: Brood War
 Membership Conversion in DC Universe Online
 Regression Algorithms
 Player Retention in Madden NFL
 Recommendation Algorithms
 Recommendation System in EverQuest Landmark
 Data Science at EA
What is Game Analytics?
 Using data to inform game design and improve player
lifecycles
 Gathering feedback from players, making changes, and
measuring the impact
 Example Applications
 Tuning game difficulty
 Balancing the game economy
 Content recommendations
 Computing the lifetime value of players
What Data Can We Analyze?
 Game data
 In-Game events
 Commerce events
 Operational metrics
 User Data
 Web sessions
 Acquisition channel
 3rd Party Data
 Steam Spy
 Platform Data
How can we analyze players?
 Descriptive statistics and exploratory data analysis
 Experimentation & Measurement
 Segmentation
 Forecasting
 Predictive analytics
What teams use Analytics?
 Design
 Product Management
 Marketing
 Business Development
 Strategy
 Finance
 Dev Ops
 Community Management
Analytics for Design
 Question: Identify questions about the
current design
 Record: Enumerate which data needs to be
collected and deploy the game
 Analyze: Determine if the recorded data
matches expectations
 Refine: Given the findings, analyze the design
and formulate additional questions
Tools for Analytics
 Storage
 DBMS, Hadoop, Cloud Storage
 Reporting
 Excel, MicroStrategy, Tableau, D3
 Analysis
 SQL, R, Python, Scala
 Model Building
 R, Weka, MLlib
Applications of Analytics in Games
 Dashboard Analytics (Metrics & KPIs)
 Spatial Analytics (Visualization)
 User Research Analytics
 Market Research Analytics
 Data Science
Dashboard Analytics
 Dead Space 2 Data Cracker (Ben Medler, GDC 2011)
Spatial Analytics
 Player movement in SWTOR (Georg Zoeller, GDC 2010)
User Research Analytics
 My Heart on Halo, Ben Lewis-Evans
Market Research Analytics
 Player Retention data scraped from Raptr.com
[Figure: Game activity ratio (0.0 to 1.0) vs. weeks since release (1 to 9) for Dead Rising 2 Case Zero, Mafia II, Crackdown 2, NHL 11, and NCAA 11]
Recommended Reading (Gamasutra)
 Intro to User Analytics
 What to track and how to analyze data
 Anders Drachen et al.
 Game Analytics 101
 What technologies to use?
 Dmitri Williams
 Indie Game Analytics 101
 What metrics to track?
 Dylan Jones
Machine Learning (ML)
 Algorithms that learn from data and make inferences
or predictions
 Types of Problems
 Classification:
 Identifying the category of an instance
 Prediction: label
 Regression:
 Estimating relationships between variables
 Prediction: value
 Clustering
 Identifying similar groups of instances
The 3 Cultures of Machine Learning
Jason Eisner (2015)
http://cs.jhu.edu/~jason/tutorials/ml-simplex.html
Application to Game Analytics
Forecasting
Predictive Analytics
A/B Testing
Applying ML to Games
 How to represent the problem?
 What features to use?
 How to evaluate the model being produced?
 How to deploy the model?
Classification Algorithms
 Goal
 Identify the category that an instance belongs to
 Examples
 Is a player going to make a purchase?
 What is an opponent’s build order?
 Is a user going to quit playing this week?
 Algorithms
 Decision Trees
 Logistic Regression
 Neural Networks
 Boosting
 Support Vector Machines
 Nearest Neighbor
Strategy Prediction in StarCraft
StarCraft: Brood War, Blizzard Entertainment 1998
Predicting Build-Orders in StarCraft
 Goal
 Predict the build order of opponents
 Data Sources
 Thousands of replays from Professional players
 Results
 Able to identify opponent build order minutes
before it is executed
 Also able to predict the timing of specific units
Application of ML to StarCraft
 Problem Representation
 Multiclass classification, 6 labels for mid-game builds
 Collect data from professional StarCraft replays
 Features
 Build-order timings of units and features
 Model Evaluation
 Offline evaluation of classification algorithms
 Simulated game time between 0 – 12 minutes
 Model Deployment
 During games, encode game state and predict build order
Timing Distributions
 Different structure and unit timings provide indicators of different build orders
[Figure: Spawning Pool timing (ZvT), distribution over game time from 0 to 4 minutes]
[Figure: Factory timing (TvP), distribution over game time from 0 to 6 minutes]
StarCraft Replay Data
 A partial game log from a Terran versus Zerg game
Player  Game Time  Action
2       0:00       Train Drone
1       0:00       Train SCV
2       1:18       Train Overlord
1       1:22       Build Supply Depot
1       2:04       Build Barracks
2       2:25       Build Hatchery
1       2:50       Build Barracks
2       2:54       Build Spawning Pool
1       3:18       Train Marine
2       4:10       Train Zergling
Approach
 Feature encoding
 Each player’s actions are encoded in a single vector
 Vectors are labeled using a build-order rule set
 Features describe the game cycle when a unit or
building type is first produced by a player
$$
f(x) = \begin{cases} t, & \text{the time when } x \text{ is first produced by } P \\ 0, & \text{if } x \text{ was not (yet) produced by } P \end{cases}
$$
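The encoding above can be sketched in a few lines of Java. This is a minimal illustration, not the encoder used in the original work: the `ReplayEncoder` class, the `Event` record, and the `upToSeconds` cutoff are all assumed names, and times are expressed in seconds rather than game cycles.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: class, record, and parameter names are assumptions.
final class ReplayEncoder {
    // One replay event: acting player, game time in seconds, and the type produced.
    record Event(int player, int timeSeconds, String produced) {}

    // Computes f(x) for every type x in featureOrder, using only events up to
    // upToSeconds. Events are assumed to be sorted by time.
    static int[] encode(List<Event> events, int player,
                        List<String> featureOrder, int upToSeconds) {
        int[] features = new int[featureOrder.size()]; // 0 = not (yet) produced
        for (Event e : events) {
            if (e.player() != player || e.timeSeconds() > upToSeconds) {
                continue;
            }
            int index = featureOrder.indexOf(e.produced());
            if (index >= 0 && features[index] == 0) {
                features[index] = e.timeSeconds(); // keep the first production time
            }
        }
        return features;
    }

    public static void main(String[] args) {
        // Player 2's events from the partial game log, converted to seconds.
        List<Event> events = List.of(
                new Event(2, 78, "Overlord"),
                new Event(2, 145, "Hatchery"),
                new Event(2, 174, "Spawning Pool"),
                new Event(2, 250, "Zergling"));
        List<String> order = List.of("Overlord", "Hatchery", "Spawning Pool", "Zergling");
        // Encode the game state three minutes in; the Zergling is not produced yet.
        System.out.println(Arrays.toString(encode(events, 2, order, 180)));
        // prints [78, 145, 174, 0]
    }
}
```

Because 0 doubles as the "not produced" marker, anything produced at exactly time 0 is indistinguishable from something never produced. The sketch keeps that property of the encoding rather than working around it.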
Labeling Replays
 A rule set for mid game strategies was built for each race
based on analysis of expert play
 Replays are labeled based on the order in which the tech tree
is expanded
[Figure: Protoss rule set. A tree over tech-tree expansion with nodes Tier 1 Strategy, Citadel, and Robotics Bay, and strategy labels Zealot Legs (Fast Legs), Archives (Fast DT), Support Bay (Reaver), Observatory (Standard), Nexus (Expand), and Stargate (Fast Air)]
Experiment Methodology
 Algorithms explored
 Nearest neighbor variants
 Decision trees
 Boosting methods
 State lattice
 Rule set
 Simulation Approach
 Set all features to 0
 Step through replay events and update features
 Predict build order every 30 seconds
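The simulation approach above might look like the following sketch. It substitutes a plain 1-nearest-neighbor classifier for the algorithms listed on the slide (it is not NNge or the state lattice), and the `BuildOrderSimulation`, `Event`, and `Example` names are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a 1-nearest-neighbor stand-in for the slide's classifiers.
final class BuildOrderSimulation {
    // A replay event for one player: game time and the feature it sets.
    record Event(int timeSeconds, int featureIndex) {}
    // A labeled training vector taken from a fully encoded replay.
    record Example(double[] features, String label) {}

    // Returns the label of the closest training example (squared Euclidean distance).
    static String predict(double[] query, List<Example> training) {
        String best = null;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (Example ex : training) {
            double distance = 0;
            for (int i = 0; i < query.length; i++) {
                double diff = query[i] - ex.features()[i];
                distance += diff * diff;
            }
            if (distance < bestDistance) {
                bestDistance = distance;
                best = ex.label();
            }
        }
        return best;
    }

    // Steps through time-sorted events and records a prediction every stepSeconds.
    static List<String> simulate(List<Event> events, int numFeatures, int endSeconds,
                                 int stepSeconds, List<Example> training) {
        double[] features = new double[numFeatures]; // set all features to 0
        List<String> predictions = new ArrayList<>();
        int next = 0;
        for (int t = 0; t <= endSeconds; t += stepSeconds) {
            // Apply every event that has happened by time t.
            while (next < events.size() && events.get(next).timeSeconds() <= t) {
                Event e = events.get(next++);
                if (features[e.featureIndex()] == 0) {
                    features[e.featureIndex()] = e.timeSeconds();
                }
            }
            predictions.add(predict(features, training));
        }
        return predictions;
    }
}
```

Calling `simulate(events, numFeatures, 720, 30, training)` would produce one prediction per 30-second step across a 12-minute window, which is the shape of data a precision-over-game-time chart needs.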
Build-Order Prediction Results
[Figure: Recall/precision (0 to 1) vs. game time (0 to 12 minutes) for NNge, Boosting, Rule Set, and State Lattice]
Timing Prediction Results
[Figure: Mean prediction error in minutes (0 to 2) for Linear Regression, M5', and Additive Regression]
EISBot
Membership Conversion in DCUO
DC Universe Online, Daybreak Games 2011
Membership Conversion in DCUO
 Goal
 Predict which users will convert to membership
 Data Sources
 Session and commerce data
 Detailed in-game telemetry
 Results
 Weekly deployment of targeted user list
 Experimented with different targeting strategies
and uplift modeling
Application of ML to DCUO
 Problem Representation
 Binary classification (Converter vs. non-converter)
 Features
 Login patterns, recent purchases, game-feature usage
 Model Evaluation
 Offline evaluation of classification algorithms
 Simulated upsell
 Model Deployment
 Send list of targeted users to game team
Model Evaluation
 Lift
 Compares sampled response to baseline response
[Figure: Predicted/actual conversions vs. percentage of users targeted (0 to 100)]
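Lift at a given targeting percentage is commonly computed as the conversion rate among the top-scored users divided by the conversion rate of the whole population. The sketch below follows that common definition; the `LiftCalculator` name and the input arrays are assumptions, and the slides do not show how lift was computed for DCUO.

```java
import java.util.Arrays;

// Illustrative only: a common definition of lift, not the DCUO implementation.
final class LiftCalculator {
    // scores[i] is the model's conversion score for user i, converted[i] the outcome.
    // fraction is the share of users targeted, e.g. 0.2 for the top 20%.
    static double liftAt(double[] scores, boolean[] converted, double fraction) {
        int n = scores.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        // Sort user indices by model score, highest first.
        Arrays.sort(order, (a, b) -> Double.compare(scores[b], scores[a]));
        int k = Math.max(1, (int) Math.round(n * fraction));
        int hitsInTop = 0;
        int hitsTotal = 0;
        for (int rank = 0; rank < n; rank++) {
            if (converted[order[rank]]) {
                hitsTotal++;
                if (rank < k) {
                    hitsInTop++;
                }
            }
        }
        double targetedRate = (double) hitsInTop / k; // sampled response
        double baselineRate = (double) hitsTotal / n; // baseline response
        return targetedRate / baselineRate;           // NaN if nobody converted
    }
}
```

A lift of 1.0 means the model does no better than targeting users at random; targeting everyone always yields 1.0.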
Uplift Modeling
 Goal
 Target only persuadable users
Source: Predictive Analytics (Eric Siegel, 2013)
Uplift Modeling in DCUO
 2-Phase Approach
 Phase 1
 Select users for targeting
 Phase 2
 Split users into Sure Things & Persuadable
 Target users in the Persuadable group
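One way to picture the Phase 2 split is as a rule over two estimated probabilities per user: the chance of converting with the offer and without it. The sketch below is an assumption about how such a split could work, not the method used in DCUO; the thresholds, the `OTHER` bucket, and all names are hypothetical.

```java
// Illustrative only: thresholds and segment rules are hypothetical.
final class UpliftSegments {
    enum Segment { PERSUADABLE, SURE_THING, OTHER }

    // pTreated: estimated conversion probability if the user is targeted.
    // pControl: estimated conversion probability if the user is left alone.
    static Segment segment(double pTreated, double pControl,
                           double minUplift, double sureThingThreshold) {
        if (pControl >= sureThingThreshold) {
            return Segment.SURE_THING; // likely to convert without the offer
        }
        if (pTreated - pControl >= minUplift) {
            return Segment.PERSUADABLE; // the offer changes the outcome
        }
        return Segment.OTHER;
    }
}
```

For example, `segment(0.30, 0.05, 0.10, 0.50)` returns `PERSUADABLE`, while a user with a 70% chance of converting untreated falls into `SURE_THING` regardless of the offer.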
Measuring Impact
 Goal
 Maximize Incremental revenue
 Approach
 Select targeted user group
 Split targeted group into control and test
 Deploy targeting and measure conversions
 Compare test and control groups
 Calculate incremental revenue
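The final two steps can be made concrete with a small calculation. This sketch assumes a fixed revenue per conversion and scales the difference in conversion rates by the size of the test group; both choices are simplifying assumptions, and the `IncrementalRevenue` name is illustrative.

```java
// Illustrative only: assumes a flat revenue per conversion.
final class IncrementalRevenue {
    // Extra conversions in the test group beyond what the control rate predicts.
    static double incrementalConversions(int testUsers, int testConversions,
                                         int controlUsers, int controlConversions) {
        double testRate = (double) testConversions / testUsers;
        double controlRate = (double) controlConversions / controlUsers;
        return (testRate - controlRate) * testUsers;
    }

    // Incremental conversions multiplied by an assumed revenue per conversion.
    static double incrementalRevenue(int testUsers, int testConversions,
                                     int controlUsers, int controlConversions,
                                     double revenuePerConversion) {
        return incrementalConversions(testUsers, testConversions,
                controlUsers, controlConversions) * revenuePerConversion;
    }
}
```

With 1,000 users in each group, 60 test conversions, and 40 control conversions, the targeting accounts for about 20 incremental conversions; only those, not all 60, count toward incremental revenue.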
Regression Algorithms
 Goal
 Predict the value of the dependent variable of an instance
 Examples
 How many games is a user going to play?
 What is the lifetime value of a player?
 How to allocate players for balanced matchmaking?
 Algorithms
 Linear Regression
 Regression Tree
 Curve fitting
 Boosting
 Neural Networks
 Nearest Neighbor
Player Retention in Madden NFL
Madden NFL 11, Electronic Arts 2010
Player Retention in Madden NFL
 Goal
 Predict how long players will play
 Identify features correlated with retention
 Data Sources
 Play-by-play game logs
 Results
 Recommendations provided to game team
 Helped develop data-driven culture at EA
Application of ML to Madden
 Problem Representation
 Regression & Simulation
 Features
 Gameplay features used
 Player performance
 Model Evaluation
 Offline evaluation of regression algorithms
 Unique Effect Analysis
 Model Deployment
 Recommendations provided to game team
Robust Unique Effect Analysis
 An algorithm that performs regression and
analyzes unique effects to rank features
 Algorithm overview
1. Build regression models for predicting retention
2. Perturb the inputs to the models
3. Compute the impact of individual features
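Steps 2 and 3 can be sketched as follows, assuming the regression model is available as a function from a normalized feature vector to a predicted number of games. How the original analysis perturbed inputs and aggregated results is not spelled out on the slides, so the sweep from 0 to 1 and the `UniqueEffect` name are assumptions.

```java
import java.util.function.ToDoubleFunction;

// Illustrative only: one plausible way to measure a single feature's effect.
final class UniqueEffect {
    // Mean prediction over all players when one feature is forced to a value
    // and every other feature keeps the player's own value.
    static double meanPredictionWith(ToDoubleFunction<double[]> model,
                                     double[][] players, int index, double value) {
        double sum = 0;
        for (double[] player : players) {
            double[] tweaked = player.clone(); // leave the original data untouched
            tweaked[index] = value;
            sum += model.applyAsDouble(tweaked);
        }
        return sum / players.length;
    }

    // Change in mean prediction as the normalized feature moves from 0 to 1.
    // The sign gives the direction of the effect, the magnitude its strength.
    static double impact(ToDoubleFunction<double[]> model, double[][] players, int index) {
        return meanPredictionWith(model, players, index, 1.0)
                - meanPredictionWith(model, players, index, 0.0);
    }
}
```

Ranking features by the absolute value of `impact` gives an ordering comparable in spirit to a feature ranking, and evaluating `meanPredictionWith` at several values between 0 and 1 traces a curve per feature.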
Algorithm Overview
[Figure: Pipeline diagram with the elements Players, Training Data, Testing Data, ML Algorithms, Models, Feature Tweaking, Feature Rankings, and Analyst]
Player Representation
 Each player’s behavior is encoded as the following
features (46 total):
 Game modes
 Usage
 Win rates
 Performance metrics
 Turnovers
 Gain
 End conditions
 Completions
 Peer quits
 Feature usage
 Gameflow
 Scouting
 Audibles
 Special moves
 Play Preference
 Running
 Play Diversity
Predicting the Number of Games Played
[Figure: Actual games played vs. predicted games played, both axes 0 to 250. Correlation coefficient: 0.88]
Feature Impact on Number of Games Played
 How does tweaking a single feature impact retention?
[Figure: Predicted number of games played (0 to 70) vs. normalized value of the tweaked feature (0 to 1) for Peer Quit Ratio, Play Diversity, Actions Per Play, Sacks Allowed, and Online Franchise Games]
Most Influential Features
 The following features were identified as the most influential
in predicting player retention
Feature                 Impact
Play Diversity          Negative
Online Franchise Wins   Positive
Running Plays           Positive
Sacks Made              Positive
Actions per Play        Positive
Interceptions Caught    Positive
Sacks Allowed           Negative
Peer Quit Ratio         Negative
(Rows are arranged along the slide's Correlation Strength axis.)
Predicted Number of Games for Different Win Rates
[Figure: Predicted number of games (0 to 25) vs. win rate (0% to 100%) for the PlayNow, Ranked, Superstar, and Franchise modes]
Madden Project Findings
 Simplify playbooks
 Players presented with a large variety of plays have
lower retention and less success
 Clearly present the controls
 Knowledge of controls had a larger impact than
winning on player retention
 Provide the correct challenge
 Multiplayer matches should be as even as possible,
while single player should greatly favor the player
Recommendation Algorithms
 Content-Based Filtering
 Collaborative Filtering
 Item-to-Item
 User-to-User
 Model Based
 Bayesian Inference
Content-Based Filtering
 Steam Storefront
Collaborative Filtering
 EverQuest Landmark
Model Based
 Xbox Recommendation System
[Koenigstein et al., RecSys 2012]
Choosing an Algorithm
 How big is the item catalog? Is it curated?
 What is the target number of users?
 What player context will be used to provide
item recommendations?
Collaborative Filtering (User-Based)
 Rates items for a player based on the player’s
similarity to other players
 Does not require meta-data to be maintained
 Can use explicit and implicit data collection
 Challenges include scalability and cold starts
User-Based Collaborative Filtering
[Figure: Users A and B linked to items 101, 102, and 103; an item is recommended to one user through a similar user]
Algorithm Overview
Computing a recommendation for a user, U:
  For every other user, V
    Compute the similarity, S, between U and V
    For every item, I, rated by V
      Add V’s rating for I, weighted by S, to a running average of I
  Return the top rated items
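The pseudocode above translates fairly directly into Java. The sketch below is a minimal single-machine version, assuming ratings are held in nested maps and using Tanimoto similarity over the sets of rated items; the `UserBasedCf` class and its method names are illustrative and are not part of Mahout.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: a small in-memory version of the algorithm overview.
final class UserBasedCf {
    // Tanimoto (Jaccard) similarity between two sets of rated items.
    static double tanimoto(Set<String> a, Set<String> b) {
        Set<String> both = new HashSet<>(a);
        both.retainAll(b);
        int union = a.size() + b.size() - both.size();
        return union == 0 ? 0.0 : (double) both.size() / union;
    }

    // ratings maps user -> (item -> rating). Returns up to howMany unseen items.
    static List<String> recommend(Map<String, Map<String, Double>> ratings,
                                  String user, int howMany) {
        Map<String, Double> mine = ratings.getOrDefault(user, Map.of());
        Map<String, Double> weightedSum = new HashMap<>();
        Map<String, Double> weightTotal = new HashMap<>();
        for (Map.Entry<String, Map<String, Double>> other : ratings.entrySet()) {
            if (other.getKey().equals(user)) {
                continue;
            }
            double s = tanimoto(mine.keySet(), other.getValue().keySet());
            if (s <= 0) {
                continue; // no overlap, so V contributes nothing
            }
            for (Map.Entry<String, Double> rating : other.getValue().entrySet()) {
                if (mine.containsKey(rating.getKey())) {
                    continue; // U already has this item
                }
                weightedSum.merge(rating.getKey(), s * rating.getValue(), Double::sum);
                weightTotal.merge(rating.getKey(), s, Double::sum);
            }
        }
        Map<String, Double> average = new HashMap<>();
        for (String item : weightedSum.keySet()) {
            average.put(item, weightedSum.get(item) / weightTotal.get(item));
        }
        List<String> items = new ArrayList<>(average.keySet());
        // Highest similarity-weighted average first; ties broken by item name.
        items.sort((a, b) -> {
            int byScore = Double.compare(average.get(b), average.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return new ArrayList<>(items.subList(0, Math.min(howMany, items.size())));
    }
}
```

Every request scans all other users, which is the scalability challenge noted for user-based filtering. A user with no ratings overlaps with nobody and gets an empty list, which is the cold-start problem.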
Prototyping a Recommender
 Apache Mahout
 Free & scalable Java machine learning library
 Functionality
 User-based and item-based collaborative filtering
 Single machine and cluster implementations
 Built-in evaluation methods
Getting Started with Mahout
1. Choose what to recommend:
ratings or rankings
2. Select a recommendation algorithm
3. Select a similarity measure
4. Encode your data into Mahout’s format
5. Evaluate the results
6. Encode additional features and iterate
Generating Recommendations
Building the Recommender
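// Class names are from Mahout's Taste API (org.apache.mahout.cf.taste)
DataModel model = new FileDataModel(new File("SalesData.csv"));
UserSimilarity similarity = new TanimotoCoefficientSimilarity(model);
// The neighborhood size is not given on the slide; 10 is a placeholder
UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, model);
Recommender recommender = new GenericUserBasedRecommender(
model, neighborhood, similarity);
Generating a List
// Top 6 recommendations for user ID 1
List<RecommendedItem> recommendations = recommender.recommend(1, 6);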
Evaluating Recommendations
[Figure: Precision on the Landmark dataset. Precision (#relevant / #retrieved, 0% to 50%) vs. number of items recommended (1 to 12) for User-Based (Tanimoto), User-Based (Log Likelihood), Item-Based (Tanimoto), and Item-Based (Log Likelihood)]
Holdout Experiment
 An experiment that excludes a single item from a
player’s list of purchases
 Goals
 Generate the smallest list that includes the item
 Enable offline evaluation of different algorithms
 Compare recommendations with rule-based approaches
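A holdout experiment of this kind can be scored as a recall rate: the fraction of players whose held-out item shows up in the generated list. The sketch below assumes each player's purchases are a set of item IDs and holds out the first item in iteration order. The `HoldoutEvaluation` name, the recommender signature, and the top-sellers baseline are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiFunction;

// Illustrative only: offline scoring of any recommender with a single-item holdout.
final class HoldoutEvaluation {
    // Rule-based baseline: the best-selling items the player does not own yet.
    static BiFunction<Set<String>, Integer, List<String>> topSellers(List<String> ranked) {
        return (owned, k) -> {
            List<String> list = new ArrayList<>();
            for (String item : ranked) {
                if (list.size() >= k) {
                    break;
                }
                if (!owned.contains(item)) {
                    list.add(item);
                }
            }
            return list;
        };
    }

    // Fraction of players whose held-out purchase appears in a list of k items.
    static double recallAtK(List<Set<String>> purchases,
                            BiFunction<Set<String>, Integer, List<String>> recommender,
                            int k) {
        int trials = 0;
        int hits = 0;
        for (Set<String> owned : purchases) {
            if (owned.size() < 2) {
                continue; // nothing would remain after the holdout
            }
            String heldOut = owned.iterator().next(); // hold out a single item
            Set<String> remaining = new HashSet<>(owned);
            remaining.remove(heldOut);
            trials++;
            if (recommender.apply(remaining, k).contains(heldOut)) {
                hits++;
            }
        }
        return trials == 0 ? 0.0 : (double) hits / trials;
    }
}
```

Running `recallAtK` with the same purchases and `k` for both a collaborative-filtering recommender and `topSellers(...)` gives a like-for-like comparison of the two approaches at each list size.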
Landmark’s Holdout Results
Recommendations significantly outperform a top sellers list: 80% increase in the holdout recall rate at 6 items
Recall rate by number of recommended items:
# of Recommended Items  Top Sellers  Recommendations
6                       22%          40%
12                      33%          54%
18                      42%          65%
24                      48%          73%
Deployment in Landmark
 In-house implementation
 Current Deployment
 Recommendations are generated on the fly and cached
 Planned Expansion
 An offline process builds a user-similarity matrix
 An online process generates item recommendations in
near real-time
Recommended Reading
Data Science at Electronic Arts
 Team Structure
 Technology Stack
 Project Lifecycles
 Career Path
My Projects at EA
 R Server & R Packages
 Origin & EA Access Analytics
 Analytics Best Practices
 Technology Innovation
Questions?
 Ben G. Weber
 Senior Data Scientist, Electronic Arts
 beweber@ea.com, @bgweber
 http://tinyurl.com/WeberGameML
 http://careers.ea.com

More Related Content

What's hot

What's hot (20)

최소 300억은 버는 글로벌 게임 기획 : 몬스터슈퍼리그 사례를 중심으로
최소 300억은 버는 글로벌 게임 기획 : 몬스터슈퍼리그 사례를 중심으로최소 300억은 버는 글로벌 게임 기획 : 몬스터슈퍼리그 사례를 중심으로
최소 300억은 버는 글로벌 게임 기획 : 몬스터슈퍼리그 사례를 중심으로
 
Report on gaming industry in india
Report on gaming industry in indiaReport on gaming industry in india
Report on gaming industry in india
 
Sports Analytics
Sports AnalyticsSports Analytics
Sports Analytics
 
Jago Studios Pitch Deck
Jago Studios Pitch DeckJago Studios Pitch Deck
Jago Studios Pitch Deck
 
Live ops in mobile gaming - how to do it right?
 Live ops in mobile gaming - how to do it right? Live ops in mobile gaming - how to do it right?
Live ops in mobile gaming - how to do it right?
 
Using Data Science to grow games / Robert Magyar (SuperScale)
Using Data Science to grow games / Robert Magyar (SuperScale)Using Data Science to grow games / Robert Magyar (SuperScale)
Using Data Science to grow games / Robert Magyar (SuperScale)
 
GDC Talk: Lifetime Value: The long tail of Mid-Core games
GDC Talk: Lifetime Value: The long tail of Mid-Core gamesGDC Talk: Lifetime Value: The long tail of Mid-Core games
GDC Talk: Lifetime Value: The long tail of Mid-Core games
 
Game development Pre-Production
Game development Pre-ProductionGame development Pre-Production
Game development Pre-Production
 
Personalisation as the key to optimising your game's revenue & LTV.
Personalisation as the key to optimising your game's revenue & LTV.Personalisation as the key to optimising your game's revenue & LTV.
Personalisation as the key to optimising your game's revenue & LTV.
 
Introducing PlayFab -- Effective LiveOps
Introducing PlayFab -- Effective LiveOpsIntroducing PlayFab -- Effective LiveOps
Introducing PlayFab -- Effective LiveOps
 
Game design through the eyes of gaming history
Game design through the eyes of gaming historyGame design through the eyes of gaming history
Game design through the eyes of gaming history
 
Game Design Process
Game Design ProcessGame Design Process
Game Design Process
 
Game Development Step by Step
Game Development Step by StepGame Development Step by Step
Game Development Step by Step
 
Phases of game development
Phases of game developmentPhases of game development
Phases of game development
 
Effective LiveOps Strategies for F2P Games
Effective LiveOps Strategies for F2P GamesEffective LiveOps Strategies for F2P Games
Effective LiveOps Strategies for F2P Games
 
Case Study: Introducing LiveOps and F2P to Traditional Game Mechanics in Roll...
Case Study: Introducing LiveOps and F2P to Traditional Game Mechanics in Roll...Case Study: Introducing LiveOps and F2P to Traditional Game Mechanics in Roll...
Case Study: Introducing LiveOps and F2P to Traditional Game Mechanics in Roll...
 
Riot Games Interview presentation Product launch plan
Riot Games Interview presentation Product launch planRiot Games Interview presentation Product launch plan
Riot Games Interview presentation Product launch plan
 
Intro to liveops
Intro to liveopsIntro to liveops
Intro to liveops
 
Zombi - Shoot for Survive
Zombi - Shoot for SurviveZombi - Shoot for Survive
Zombi - Shoot for Survive
 
[PandoraCube] 게임 기획자 면접 시 가장 많이 하는 질문들과 나의 답
[PandoraCube] 게임 기획자 면접 시 가장 많이 하는 질문들과 나의 답[PandoraCube] 게임 기획자 면접 시 가장 많이 하는 질문들과 나의 답
[PandoraCube] 게임 기획자 면접 시 가장 많이 하는 질문들과 나의 답
 

Similar to Game Analytics & Machine Learning

[Seminar] 200807 Seongyeol Choe
[Seminar] 200807 Seongyeol Choe[Seminar] 200807 Seongyeol Choe
[Seminar] 200807 Seongyeol Choe
ivaderivader
 

Similar to Game Analytics & Machine Learning (20)

Modern Data Stack for Game Analytics / Dmitry Anoshin (Microsoft Gaming, The ...
Modern Data Stack for Game Analytics / Dmitry Anoshin (Microsoft Gaming, The ...Modern Data Stack for Game Analytics / Dmitry Anoshin (Microsoft Gaming, The ...
Modern Data Stack for Game Analytics / Dmitry Anoshin (Microsoft Gaming, The ...
 
Dissertation defense
Dissertation defenseDissertation defense
Dissertation defense
 
Mining the Madden Experience
Mining the Madden ExperienceMining the Madden Experience
Mining the Madden Experience
 
4Front Game Data Science
4Front Game Data Science4Front Game Data Science
4Front Game Data Science
 
Testing Blockbuster Games: Lessons for All Testers
Testing Blockbuster Games: Lessons for All TestersTesting Blockbuster Games: Lessons for All Testers
Testing Blockbuster Games: Lessons for All Testers
 
Baseball Prediction Model on Tensorflow
Baseball Prediction Model on TensorflowBaseball Prediction Model on Tensorflow
Baseball Prediction Model on Tensorflow
 
Ludocore: A Logical Game Engine for Modeling Videogames
Ludocore: A Logical Game Engine for Modeling VideogamesLudocore: A Logical Game Engine for Modeling Videogames
Ludocore: A Logical Game Engine for Modeling Videogames
 
[Pandora 22] Boosting Game Design with Analytics - Nikola Vasiljevic
[Pandora 22] Boosting Game Design with Analytics - Nikola Vasiljevic[Pandora 22] Boosting Game Design with Analytics - Nikola Vasiljevic
[Pandora 22] Boosting Game Design with Analytics - Nikola Vasiljevic
 
Careers in Computer Games: Game Analytics
Careers in Computer Games: Game AnalyticsCareers in Computer Games: Game Analytics
Careers in Computer Games: Game Analytics
 
Applying Machine Learning for Mobile Games by Neil Patrick Del Gallego
Applying Machine Learning for Mobile Games by Neil Patrick Del GallegoApplying Machine Learning for Mobile Games by Neil Patrick Del Gallego
Applying Machine Learning for Mobile Games by Neil Patrick Del Gallego
 
Brand Analytics Management: Measuring CLV Across Platforms, Devices and Apps
Brand Analytics Management: Measuring CLV Across Platforms, Devices and AppsBrand Analytics Management: Measuring CLV Across Platforms, Devices and Apps
Brand Analytics Management: Measuring CLV Across Platforms, Devices and Apps
 
V like Velocity, Predicting in Real-Time with Azure ML
V like Velocity, Predicting in Real-Time with Azure MLV like Velocity, Predicting in Real-Time with Azure ML
V like Velocity, Predicting in Real-Time with Azure ML
 
Ia G4h 2008
Ia G4h 2008Ia G4h 2008
Ia G4h 2008
 
[Seminar] 200807 Seongyeol Choe
[Seminar] 200807 Seongyeol Choe[Seminar] 200807 Seongyeol Choe
[Seminar] 200807 Seongyeol Choe
 
Tracking a soccer game with Big Data
Tracking a soccer game with Big DataTracking a soccer game with Big Data
Tracking a soccer game with Big Data
 
Strata 2014 Talk:Tracking a Soccer Game with Big Data
Strata 2014 Talk:Tracking a Soccer Game with Big DataStrata 2014 Talk:Tracking a Soccer Game with Big Data
Strata 2014 Talk:Tracking a Soccer Game with Big Data
 
Connecting Your Customers – Building Successful Mobile Games through the Powe...
Connecting Your Customers – Building Successful Mobile Games through the Powe...Connecting Your Customers – Building Successful Mobile Games through the Powe...
Connecting Your Customers – Building Successful Mobile Games through the Powe...
 
Applying AI in Games (GDC2019)
Applying AI in Games (GDC2019)Applying AI in Games (GDC2019)
Applying AI in Games (GDC2019)
 
Msis5633 cfb presentation
Msis5633 cfb presentationMsis5633 cfb presentation
Msis5633 cfb presentation
 
Resume_updated
Resume_updatedResume_updated
Resume_updated
 

More from Ben Weber (7)

Impact AI 2020: Portfolio-Scale Data Science at Zynga
Impact AI 2020: Portfolio-Scale Data Science at ZyngaImpact AI 2020: Portfolio-Scale Data Science at Zynga
Impact AI 2020: Portfolio-Scale Data Science at Zynga
 
Ai expo 2019
Ai expo 2019Ai expo 2019
Ai expo 2019
 
Building an Applied Science Portfolio
Building an Applied Science PortfolioBuilding an Applied Science Portfolio
Building an Applied Science Portfolio
 
Building a Recommendation System for EverQuest Landmark’s Marketplace
Building a Recommendation System for EverQuest Landmark’s MarketplaceBuilding a Recommendation System for EverQuest Landmark’s Marketplace
Building a Recommendation System for EverQuest Landmark’s Marketplace
 
Holding Effective Data Meetings
Holding Effective Data MeetingsHolding Effective Data Meetings
Holding Effective Data Meetings
 
Applying Reactive Planning Idioms to Behavior Trees
Applying Reactive Planning Idioms to Behavior TreesApplying Reactive Planning Idioms to Behavior Trees
Applying Reactive Planning Idioms to Behavior Trees
 
Building a Recommendation System for EverQuest Landmark's Marketplace
Building a Recommendation System for EverQuest Landmark's MarketplaceBuilding a Recommendation System for EverQuest Landmark's Marketplace
Building a Recommendation System for EverQuest Landmark's Marketplace
 

Recently uploaded

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 

Recently uploaded (20)

Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 

Game Analytics & Machine Learning

  • 1. expressiveintelligencestudio Game Analytics & Machine Learning Guest Lecture: UCSC CMPM 146 Ben Weber Sr. Data Scientist Electronic Arts beweber@ea.com
  • 2. expressiveintelligencestudio UC Santa Cruz About Me: Ben G. Weber, Senior Data Scientist, Electronic Arts. Experience: PhD in Computer Science, UCSC; Technical Analyst Intern, EA; User Research Analyst, Microsoft Studios; Ecommerce Analyst, Sony Online Entertainment; Director of BI & Analytics, Daybreak Games.
  • 4. Talk Overview: Game Analytics; Classification Algorithms (Strategy Prediction in StarCraft: Brood War; Membership Conversion in DC Universe Online); Regression Algorithms (Player Retention in Madden NFL); Recommendation Algorithms (Recommendation System in EverQuest Landmark); Data Science at EA.
  • 5. What is Game Analytics? Using data to inform game design and improve player lifecycles. Gathering feedback from players, making changes, and measuring the impact. Example applications: tuning game difficulty; balancing the game economy; content recommendations; computing the lifetime value of players.
  • 6. What Data Can We Analyze? Game data: in-game events, commerce events, operational metrics. User data: web sessions, acquisition channel. 3rd-party data: Steam Spy, platform data.
  • 7. How can we analyze players? Descriptive statistics and exploratory data analysis; experimentation & measurement; segmentation; forecasting; predictive analytics.
  • 8. What teams use Analytics? Design; Product Management; Marketing; Business Development; Strategy; Finance; Dev Ops; Community Management.
  • 9. Analytics for Design. Question: identify questions about the current design. Record: enumerate which data needs to be collected and deploy the game. Analyze: determine if the recorded data matches expectations. Refine: given the findings, analyze the design and formulate additional questions.
  • 10. Tools for Analytics. Storage: DBMS, Hadoop, cloud storage. Reporting: Excel, MicroStrategy, Tableau, D3. Analysis: SQL, R, Python, Scala. Model building: R, Weka, MLlib.
  • 11. Applications of Analytics in Games: dashboard analytics (metrics & KPIs); spatial analytics (visualization); user research analytics; market research analytics; data science.
  • 12. Dashboard Analytics: Dead Space 2 Data Cracker (Ben Medler, GDC 2011).
  • 13. Spatial Analytics: player movement in SWTOR (Georg Zoeller, GDC 2010).
  • 14. User Research Analytics: My Heart on Halo, Ben Lewis-Evans.
  • 15. Market Research Analytics: player retention data scraped from Raptr.com. [Chart: game activity ratio (0.0 to 1.0) by weeks since release (1 to 9) for Dead Rising 2 Case Zero, Mafia II, Crackdown 2, NHL 11, and NCAA 11.]
  • 16. Recommended Reading (Gamasutra). Intro to User Analytics (what to track and how to analyze data), Anders Drachen et al. Game Analytics 101 (what technologies to use?), Dmitri Williams. Indie Game Analytics 101 (what metrics to track?), Dylan Jones.
  • 17. Machine Learning (ML): algorithms that learn from data and make inferences or predictions. Types of problems: classification (identifying the category of an instance; prediction: label); regression (estimating relationships between variables; prediction: value); clustering (identifying similar groups of instances).
  • 18. The 3 Cultures of Machine Learning, Jason Eisner (2015), http://cs.jhu.edu/~jason/tutorials/ml-simplex.html
  • 19. Application to Game Analytics: forecasting; predictive analytics; A/B testing.
  • 20. Applying ML to Games: How to represent the problem? What features to use? How to evaluate the model being produced? How to deploy the model?
  • 21. Classification Algorithms. Goal: identify the category that an instance belongs to. Examples: Is a player going to make a purchase? What is an opponent’s build order? Is a user going to quit playing this week? Algorithms: decision trees, logistic regression, neural networks, boosting, support vector machines, nearest neighbor.
  • 22. Strategy Prediction in StarCraft (StarCraft: Brood War, Blizzard Entertainment, 1998).
  • 23. Predicting Build-Orders in StarCraft. Goal: predict the build order of opponents. Data sources: thousands of replays from professional players. Results: able to identify an opponent’s build order minutes before it is executed; also able to predict the timing of specific units.
  • 24. Application of ML to StarCraft. Problem representation: multiclass classification with 6 labels for mid-game builds; data collected from professional StarCraft replays. Features: build-order timings of units and features. Model evaluation: offline evaluation of classification algorithms; simulated game time between 0 and 12 minutes. Model deployment: during games, encode the game state and predict the build order.
  • 25. Timing Distributions: different structure and unit timings provide indicators of different build orders. [Charts: Spawning Pool timing (ZvT) and Factory timing (TvP), each plotted against game time in minutes.]
  • 26. StarCraft Replay Data: a partial game log from a Terran versus Zerg game. Rows as (Player, Game Time, Action): (2, 0:00, Train Drone); (1, 0:00, Train SCV); (2, 1:18, Train Overlord); (1, 1:22, Build Supply Depot); (1, 2:04, Build Barracks); (2, 2:25, Build Hatchery); (1, 2:50, Build Barracks); (2, 2:54, Build Spawning Pool); (1, 3:18, Train Marine); (2, 4:10, Train Zergling).
  • 27. Approach: feature encoding. Each player’s actions are encoded in a single vector. Vectors are labeled using a build-order rule set. Features describe the game cycle when a unit or building type is first produced by a player: f(x) = t, the time when x is first produced by P; f(x) = 0 if x was not (yet) produced by P.
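The encoding on slide 27 can be sketched in Java as follows. This is a minimal illustration, not the code used in the study: the class name `BuildOrderFeatures`, the `Event` type, the use of seconds as the time unit, and the `cutoffSeconds` parameter are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class BuildOrderFeatures {
    /** One replay event (hypothetical type): who acted, when, and what was produced. */
    static final class Event {
        final int player;
        final int timeSeconds;
        final String produced;

        Event(int player, int timeSeconds, String produced) {
            this.player = player;
            this.timeSeconds = timeSeconds;
            this.produced = produced;
        }
    }

    /**
     * Encodes one player's events as f(x): the time x was first produced,
     * or 0 if x has not been produced by cutoffSeconds.
     * Note: as in the slide's definition, production at exactly time 0
     * is indistinguishable from "not produced".
     */
    static Map<String, Integer> encode(List<Event> log, int player,
                                       List<String> featureNames, int cutoffSeconds) {
        Map<String, Integer> features = new LinkedHashMap<>();
        for (String name : featureNames) {
            features.put(name, 0); // 0 means "not (yet) produced"
        }
        for (Event e : log) {
            if (e.player != player || e.timeSeconds > cutoffSeconds) {
                continue;
            }
            // Only the first production of each tracked type is recorded.
            if (features.containsKey(e.produced) && features.get(e.produced) == 0) {
                features.put(e.produced, e.timeSeconds);
            }
        }
        return features;
    }
}
```

Calling `encode` with increasing cutoffs (0, 30, 60, ... seconds) yields the feature vector as it would look at each point in the game, which is one way to mimic a prediction made every 30 seconds.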
  • 28. Labeling Replays. A rule set for mid-game strategies was built for each race based on analysis of expert play. Replays are labeled based on the order in which the tech tree is expanded. [Figure: Protoss rule set for tier 1 strategy, with nodes Citadel, Robotics Bay, Zealot Legs (Fast Legs), Archives (Fast DT), Support Bay (Reaver), Observatory (Standard), Nexus (Expand), and Stargate (Fast Air).]
  • 29. Experiment Methodology. Algorithms explored: nearest neighbor variants, decision trees, boosting methods, state lattice, rule set. Simulation approach: set all features to 0; step through replay events and update features; predict the build order every 30 seconds.
  • 30. Build-Order Prediction Results. [Chart: recall precision (0 to 1) by game time (0 to 12 minutes) for NNge, Boosting, Rule Set, and State Lattice.]
  • 31. Timing Prediction Results. [Chart: mean prediction error (0 to 2 minutes) for Linear Regression, M5', and Additive Regression.]
  • 33. Membership Conversion in DCUO (DC Universe Online, Daybreak Games, 2011).
  • 34. Membership Conversion in DCUO. Goal: predict which users will convert to membership. Data sources: session and commerce data; detailed in-game telemetry. Results: weekly deployment of a targeted user list; experimented with different targeting strategies and uplift modeling.
  • 35. Application of ML to DCUO. Problem representation: binary classification (converter vs. non-converter). Features: login patterns, recent purchases, game-feature usage. Model evaluation: offline evaluation of classification algorithms; simulated upsell. Model deployment: send the list of targeted users to the game team.
  • 36. Model Evaluation. Lift: compares sampled response to baseline response. [Chart: predicted/actual conversions by percentage of users targeted (0 to 100).]
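One common way to compute lift is sketched below, under the assumption that "sampled response" means the conversion rate among the top-scored fraction of users and "baseline response" means the conversion rate of the whole population. The class and method names are hypothetical, and the slide does not say exactly how the DCUO curve was computed.

```java
import java.util.Arrays;
import java.util.Comparator;

final class LiftExample {
    /**
     * Lift at a targeting fraction: conversion rate among the top-scored users
     * divided by the conversion rate of the whole population.
     * Returns NaN if nobody in the population converted.
     */
    static double liftAt(double[] scores, boolean[] converted, double fraction) {
        int n = scores.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        // Rank users from highest to lowest predicted score.
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> -scores[i]));

        int k = Math.max(1, (int) Math.round(fraction * n));
        int hitsInTop = 0;
        int hitsOverall = 0;
        for (int rank = 0; rank < n; rank++) {
            if (converted[order[rank]]) {
                hitsOverall++;
                if (rank < k) {
                    hitsInTop++;
                }
            }
        }
        double targetedRate = (double) hitsInTop / k;
        double baselineRate = (double) hitsOverall / n;
        return targetedRate / baselineRate;
    }
}
```

A lift of 1.0 means the model targets no better than picking users at random; targeting 100% of users always gives a lift of 1.0.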
  • 37. Uplift Modeling. Goal: target only persuadable users. Source: Predictive Analytics (Eric Siegel, 2013).
  • 38. Uplift Modeling in DCUO: 2-phase approach. Phase 1: select users for targeting. Phase 2: split users into Sure Things and Persuadable; target users in the Persuadable group.
  • 39. Measuring Impact. Goal: maximize incremental revenue. Approach: select the targeted user group; split the targeted group into control and test; deploy targeting and measure conversions; compare test and control groups; calculate incremental revenue.
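The last two steps of the approach on slide 39 reduce to simple arithmetic, sketched here. The class name and the assumption of a fixed revenue per conversion are illustrative; the deck does not state how revenue was attributed in DCUO.

```java
final class IncrementalRevenue {
    /**
     * Conversions in the test group beyond what the control group's
     * conversion rate would predict for a group of the same size.
     */
    static double incrementalConversions(int testUsers, int testConversions,
                                         int controlUsers, int controlConversions) {
        double testRate = (double) testConversions / testUsers;
        double controlRate = (double) controlConversions / controlUsers;
        return (testRate - controlRate) * testUsers;
    }

    /** Assumes every conversion is worth the same amount (a simplification). */
    static double incrementalRevenue(int testUsers, int testConversions,
                                     int controlUsers, int controlConversions,
                                     double revenuePerConversion) {
        return incrementalConversions(testUsers, testConversions,
                                      controlUsers, controlConversions)
                * revenuePerConversion;
    }
}
```

For example, with 1,000 users in each group, 80 test conversions, and 50 control conversions, the targeting accounts for about 30 extra conversions; at a hypothetical 15 currency units per conversion that is about 450 in incremental revenue.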
  • 40. Regression Algorithms. Goal: predict the value of the dependent variable of an instance. Examples: How many games is a user going to play? What is the lifetime value of a player? How to allocate players for balanced matchmaking? Algorithms: linear regression, regression tree, curve fitting, boosting, neural networks, nearest neighbor.
  • 41. Player Retention in Madden NFL (Madden NFL 11, Electronic Arts, 2010).
  • 42. Player Retention in Madden NFL. Goal: predict how long players will play; identify features correlated with retention. Data sources: play-by-play game logs. Results: recommendations provided to the game team; helped develop a data-driven culture at EA.
  • 43. Application of ML to Madden. Problem representation: regression & simulation. Features: gameplay features used; player performance. Model evaluation: offline evaluation of regression algorithms; unique effect analysis. Model deployment: recommendations provided to the game team.
  • 44. Robust Unique Effect Analysis: an algorithm that performs regression and analyzes unique effects to rank features. Algorithm overview: 1. Build regression models for predicting retention. 2. Perturb the inputs to the models. 3. Compute the impact of individual features.
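Steps 2 and 3 of the overview can be illustrated with a small perturbation routine. This is a sketch of the general idea only: the slide does not specify how inputs are perturbed or how impact is aggregated, so setting each feature to a low and a high value and averaging the change in prediction is an assumption, as are all names below. The trained regression model is represented by any function from a feature vector to a predicted number of games.

```java
import java.util.function.ToDoubleFunction;

final class FeaturePerturbation {
    /**
     * For each feature, sets it to lo and then hi for every player while
     * leaving the other features unchanged, and averages the resulting
     * change in the model's prediction. A positive value means raising
     * the feature raises the prediction.
     */
    static double[] featureImpact(ToDoubleFunction<double[]> model,
                                  double[][] players, double lo, double hi) {
        int numFeatures = players[0].length;
        double[] impact = new double[numFeatures];
        for (int f = 0; f < numFeatures; f++) {
            double total = 0.0;
            for (double[] player : players) {
                double[] tweaked = player.clone(); // keep the original row intact
                tweaked[f] = lo;
                double low = model.applyAsDouble(tweaked);
                tweaked[f] = hi;
                double high = model.applyAsDouble(tweaked);
                total += high - low;
            }
            impact[f] = total / players.length;
        }
        return impact;
    }
}
```

Sorting features by the absolute value of their impact gives a ranking, and the sign indicates whether the effect on predicted retention is positive or negative.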
  • 45. Algorithm Overview. [Diagram with elements: Players, Training Data, Testing Data, ML Algorithms, Models, Feature Tweaking, Feature Rankings, Analyst.]
  • 46. Player Representation. Each player’s behavior is encoded as the following features (46 total): game modes (usage, win rates); performance metrics (turnovers, gain); end conditions (completions, peer quits); feature usage (gameflow, scouting, audibles, special moves); play preference (running, play diversity).
  • 47. Predicting the Number of Games Played. [Chart: actual games played vs. predicted games played, both 0 to 250; correlation coefficient: 0.88.]
  • 48. Feature Impact on Number of Games Played: how does tweaking a single feature impact retention? [Chart: predicted number of games played (0 to 70) by value of the tweaked feature (normalized, 0 to 1) for Peer Quit Ratio, Play Diversity, Actions Per Play, Sacks Allowed, and Online Franchise Games.]
  • 49. Most Influential Features. The following features were identified as the most influential in predicting player retention, listed as (Feature, Impact) in the slide's order along its correlation-strength axis: (Play Diversity, Negative); (Online Franchise Wins, Positive); (Running Plays, Positive); (Sacks Made, Positive); (Actions per Play, Positive); (Interceptions Caught, Positive); (Sacks Allowed, Negative); (Peer Quit Ratio, Negative).
  • 50. Predicted Number of Games for Different Win Rates. [Chart: predicted number of games (0 to 25) by win rate (0% to 100%) for PlayNow, Ranked, Superstar, and Franchise.]
  • 51. Madden Project Findings. Simplify playbooks: players presented with a large variety of plays have lower retention and less success. Clearly present the controls: knowledge of controls had a larger impact than winning on player retention. Provide the correct challenge: multiplayer matches should be as even as possible, while single player should greatly favor the player.
  • 52. Recommendation Algorithms: content-based filtering; collaborative filtering (item-to-item, user-to-user); model based (Bayesian inference).
  • 53. Content-Based Filtering: Steam Storefront.
  • 54. Collaborative Filtering: EverQuest Landmark.
  • 55. Model Based: Xbox Recommendation System [Koenigstein et al., RecSys 2012].
  • 56. Choosing an Algorithm: How big is the item catalog? Is it curated? What is the target number of users? What player context will be used to provide item recommendations?
  • 57. Collaborative Filtering (User-Based): rates items for a player based on the player’s similarity to other players; does not require meta-data to be maintained; can use explicit and implicit data collection; challenges include scalability and cold starts.
  • 58. User-Based Collaborative Filtering. [Diagram: users A and B linked to items 101, 102, and 103, with the labels Similar User and Recommendation.]
  • 59. Algorithm Overview. Computing a recommendation for a user, U: for every other user, V, compute the similarity, S, between U and V; for every item, I, rated by V, add V’s rating for I, weighted by S, to a running average of I; return the top-rated items.
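The pseudocode on slide 59 can be written out as a small self-contained Java class. This is a sketch under stated assumptions rather than the Landmark implementation: it uses Tanimoto similarity over the sets of rated items, skips items the user has already rated, and keeps all ratings in an in-memory map. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class UserBasedCf {
    /** Tanimoto (Jaccard) similarity between the sets of items two users rated. */
    static double tanimoto(Set<String> a, Set<String> b) {
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        int union = a.size() + b.size() - common.size();
        return union == 0 ? 0.0 : (double) common.size() / union;
    }

    /** Returns up to howMany item ids, ordered by similarity-weighted average rating. */
    static List<String> recommend(Map<String, Map<String, Double>> ratings,
                                  String user, int howMany) {
        Map<String, Double> own = ratings.get(user);
        Map<String, Double> weightedSum = new HashMap<>();
        Map<String, Double> weightTotal = new HashMap<>();

        for (Map.Entry<String, Map<String, Double>> other : ratings.entrySet()) {
            if (other.getKey().equals(user)) {
                continue; // only compare against other users
            }
            double s = tanimoto(own.keySet(), other.getValue().keySet());
            if (s <= 0.0) {
                continue; // users with nothing in common contribute nothing
            }
            for (Map.Entry<String, Double> r : other.getValue().entrySet()) {
                if (own.containsKey(r.getKey())) {
                    continue; // assumption: do not recommend items the user already has
                }
                weightedSum.merge(r.getKey(), s * r.getValue(), Double::sum);
                weightTotal.merge(r.getKey(), s, Double::sum);
            }
        }

        List<String> items = new ArrayList<>(weightedSum.keySet());
        // Sort by weighted average rating, highest first.
        items.sort((x, y) -> Double.compare(
                weightedSum.get(y) / weightTotal.get(y),
                weightedSum.get(x) / weightTotal.get(x)));
        return items.subList(0, Math.min(howMany, items.size()));
    }
}
```

The nested loop over all users is what makes scalability a challenge for this approach, and a user with no ratings gets no recommendations at all, which is the cold-start problem in its simplest form.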
  • 60. Prototyping a Recommender. Apache Mahout: a free & scalable Java machine learning library. Functionality: user-based and item-based collaborative filtering; single-machine and cluster implementations; built-in evaluation methods.
  • 61. Getting Started with Mahout: 1. Choose what to recommend: ratings or rankings. 2. Select a recommendation algorithm. 3. Select a similarity measure. 4. Encode your data into Mahout’s format. 5. Evaluate the results. 6. Encode additional features and iterate.
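  • 62. Generating Recommendations. Building the recommender: DataModel model = new FileDataModel(new File("SalesData.csv")); UserSimilarity similarity = new TanimotoCoefficientSimilarity(model); UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, model); /* a neighborhood size of 10 is an illustrative assumption */ Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity); Generating a list: List<RecommendedItem> recommendations = recommender.recommend(1, 6);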
  • 63. Evaluating Recommendations. [Chart: precision on the Landmark dataset (# relevant / # retrieved, 0% to 50%) by number of items recommended (1 to 12) for User-Based (Tanimoto), User-Based (Log Likelihood), Item-Based (Tanimoto), and Item-Based (Log Likelihood).]
  • 64. Holdout Experiment: an experiment that excludes a single item from a player’s list of purchases. Goals: generate the smallest list that includes the item; enable offline evaluation of different algorithms; compare recommendations with rule-based approaches.
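A holdout recall measurement of this kind might look like the sketch below. The slide does not say which purchase is held out, so removing each player's last purchase is an assumption, and the recommender is passed in as a plain function so that any algorithm (or a top-sellers list) can be evaluated the same way. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

final class HoldoutRecall {
    /**
     * Hides one purchase per player, asks the recommender for k items based on
     * the remaining purchases, and returns the fraction of players for whom
     * the hidden item appears in the recommended list.
     */
    static double recallAt(List<List<String>> purchasesByPlayer,
                           BiFunction<List<String>, Integer, List<String>> recommender,
                           int k) {
        int trials = 0;
        int hits = 0;
        for (List<String> purchases : purchasesByPlayer) {
            if (purchases.size() < 2) {
                continue; // need at least one visible purchase to recommend from
            }
            List<String> visible = new ArrayList<>(purchases);
            // Assumption: the held-out item is the player's last purchase.
            String heldOut = visible.remove(visible.size() - 1);
            trials++;
            if (recommender.apply(visible, k).contains(heldOut)) {
                hits++;
            }
        }
        return trials == 0 ? 0.0 : (double) hits / trials;
    }
}
```

Running this for several values of k and for each candidate algorithm produces recall-rate curves that can be compared directly against a rule-based baseline.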
  • 65. Landmark’s Holdout Results. Recommendations significantly outperform a top sellers list: 80% increase in the holdout recall rate at 6 items. Recall rate by number of recommended items, as (Items, Top Sellers, Recommendations): (6, 22%, 40%); (12, 33%, 54%); (18, 42%, 65%); (24, 48%, 73%).
  • 66. Deployment in Landmark: in-house implementation. Current deployment: recommendations are generated on the fly and cached. Planned expansion: an offline process builds a user-similarity matrix; an online process generates item recommendations in near real-time.
  • 67. Recommended Reading
  • 68. Data Science at Electronic Arts: team structure; technology stack; project lifecycles; career path.
  • 69. My Projects at EA: R Server & R packages; Origin & EA Access analytics; analytics best practices; technology innovation.
  • 70. Questions? Ben G. Weber, Senior Data Scientist, Electronic Arts. beweber@ea.com, @bgweber. http://tinyurl.com/WeberGameML http://careers.ea.com