Automated Feature Engineering at Zynga
Ben Weber
Data Science @ Zynga
November 14, 2019
Zynga Games
Our Challenge
• We have tens of millions of players and dozens of
games across multiple platforms
• Our games have diverse event taxonomies
• We want to build accurate models for personalizing
our gameplay experiences
“One of the holy grails of machine learning is
to automate more and more of the feature
engineering process.”
Pedro Domingos
CACM 2012
Our Approach
• Leverage ML libraries to automate feature engineering
• Develop Portfolio-Scale data products
• Empower our game studios with ML models
Use Cases
Applications
Propensity Models: What actions are players performing?
Segmentation: Who are our players?
Anomaly Detection: Which players are bad actors?
Recommendation: What actions should they take?
Feature Encoding
Input Dataset
• Thousands of events per player
Feature Generation
• Aggregation with FeatureTools
Output Dataset
• A single row per player
Figure: Raw Event Data -> Player Summaries
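The feature encoding step above (many events per player in, a single row per player out) can be sketched with a plain Pandas aggregation. This is a hand-written stand-in rather than FeatureTools itself, and the column names (`player_id`, `event_type`, `value`) are hypothetical.

```python
import pandas as pd

# Hypothetical raw event log: several rows per player.
events = pd.DataFrame({
    "player_id": [1, 1, 1, 2, 2, 3],
    "event_type": ["level_up", "purchase", "level_up", "purchase", "login", "login"],
    "value": [1.0, 4.99, 1.0, 9.99, 0.0, 0.0],
})

def summarize_players(events: pd.DataFrame) -> pd.DataFrame:
    """Collapse an event log into one row of aggregate features per player."""
    return events.groupby("player_id").agg(
        event_count=("value", "count"),
        value_sum=("value", "sum"),
        value_mean=("value", "mean"),
        distinct_events=("event_type", "nunique"),
    )

player_summaries = summarize_players(events)
print(player_summaries)
```

Each aggregate here is chosen by hand; the appeal of automated feature generation is that the list of aggregates does not have to be written per game.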
Propensity Models
• We predict which users are likely to act using classification models
• Game studios use propensity scores to define experiment groups
• Feature generation reduces the need for manual feature engineering
Figure: Data Extract -> Feature Engineering -> Feature Application -> Model Training -> Model Publish
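As a rough illustration of a propensity model, the sketch below fits a small logistic regression with NumPy and turns the scores into two experiment groups. It is not the production setup: the feature columns, the six toy players, the 0.5 cutoff, and the group names are all assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_bias(X):
    """Prepend a column of ones so the model learns an intercept."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic-regression weights with batch gradient descent."""
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        w -= lr * Xb.T @ (sigmoid(Xb @ w) - y) / len(y)
    return w

# Hypothetical generated features: [sessions_last_week, purchases_last_month]
X_raw = np.array([[1, 0], [2, 0], [3, 0], [8, 1], [9, 2], [10, 3]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1], dtype=float)  # 1 = player performed the action

# Standardize features so a fixed learning rate behaves well.
X = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

w = train_logistic(X, y)
scores = sigmoid(add_bias(X) @ w)  # propensity score per player
groups = np.where(scores >= 0.5, "high_propensity", "low_propensity")
print(np.round(scores, 3), groups)
```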
Segmentation
• Generated features are used as input to k-means clustering
• Archetype labels are assigned based on qualitative analysis
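A minimal sketch of the clustering step, assuming the generated features are already in a numeric matrix. It implements plain Lloyd-style k-means in NumPy; the six two-dimensional points are made up, and the archetype naming that follows clustering remains a manual, qualitative step.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain k-means: assign points to the nearest center, then recompute centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Distance from every point to every center, shape (n_points, k).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Hypothetical player feature vectors forming two well-separated groups.
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centers = kmeans(X, k=2)
print(labels)
```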
Anomaly Detection
• Players are represented as 1D images
• We train an autoencoder to reduce dimensionality
• Players with large vector differences are flagged as suspect
Figure: Autoencoder. Input Vectors (Players x Features) -> Input Layer -> Latent Space -> Output Layer -> Output Vectors (Players x Features)
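The flagging idea can be sketched without a deep learning library by swapping in a linear stand-in: PCA computed with an SVD behaves like a linear autoencoder, compressing each player vector to a small latent space and decoding it again. This is not the autoencoder from the talk; the synthetic player data, the latent size of 2, and the 99th-percentile threshold are assumptions for illustration.

```python
import numpy as np

def fit_linear_autoencoder(X, latent_dim):
    """Learn a mean and a (features x latent_dim) projection via PCA."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:latent_dim].T

def reconstruction_error(X, mean, components):
    """Encode to the latent space, decode, and measure the vector difference."""
    latent = (X - mean) @ components
    restored = latent @ components.T + mean
    return np.linalg.norm(X - restored, axis=1)

rng = np.random.default_rng(0)

# Hypothetical typical players: 6 features that mostly lie in a 2-D subspace.
basis = rng.normal(size=(2, 6))
typical = rng.normal(size=(200, 2)) @ basis + 0.05 * rng.normal(size=(200, 6))

mean, components = fit_linear_autoencoder(typical, latent_dim=2)
threshold = np.percentile(reconstruction_error(typical, mean, components), 99)

# A player whose features break the usual pattern reconstructs poorly.
suspect = (typical[0] + 5.0 * rng.normal(size=6)).reshape(1, -1)
suspect_error = reconstruction_error(suspect, mean, components)
print(suspect_error > threshold)
```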
Recommendation Systems
• Feature engineering is used for item & guild recommendations
• Cosine similarity is applied to normalized generated features
Item Recommendations
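$$\mathrm{sim}(u, v) = \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}$$
$$\mathrm{weight}_i = \sum_{w} \mathrm{sim}(u, w) \cdot \mathrm{rating}(w, i), \quad w \in \text{user neighborhood}$$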
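The two formulas above translate directly into NumPy. In this sketch the target user, the two neighbors, and the ratings matrix are invented so the result is easy to check by hand.

```python
import numpy as np

def cosine_similarity(u, v):
    """sim(u, v) = (u . v) / (||u|| * ||v||)"""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def item_weights(target, neighbors, ratings):
    """weight_i = sum over neighbors w of sim(target, w) * rating(w, i)."""
    sims = np.array([cosine_similarity(target, w) for w in neighbors])
    return sims @ ratings

# Hypothetical normalized feature vectors for a target user and two neighbors.
target = np.array([1.0, 0.0])
neighbors = np.array([[1.0, 0.0], [0.0, 1.0]])

# ratings[w, i] is neighbor w's rating of item i (three items).
ratings = np.array([[5.0, 0.0, 1.0],
                    [0.0, 4.0, 1.0]])

weights = item_weights(target, neighbors, ratings)
best_item = int(np.argmax(weights))
print(weights, best_item)
```

The first neighbor is identical to the target (similarity 1) and the second is orthogonal (similarity 0), so only the first neighbor's ratings contribute to the weights.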
Feature Engineering
FeatureTools
• A Python library for deep feature synthesis
• Represents data as entity sets
• Identifies feature descriptors for transforming your
data into a shallow and wide format
• Open-source version maintained by FeatureLabs
Kaggle NHL Dataset
Data Frames
game_df
plays_df
Entity Sets
• Define the tables and
relationships for DFS
• Operate on Pandas
data frames
1-Hot Encoding
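A small sketch of one-hot encoding a categorical column with Pandas. The `plays_df` name comes from the slides, but its columns and values here are hypothetical.

```python
import pandas as pd

# Hypothetical slice of a plays table: one row per play.
plays_df = pd.DataFrame({
    "play_id": [1, 2, 3, 4],
    "game_id": [100, 100, 101, 101],
    "event": ["Shot", "Goal", "Shot", "Hit"],
})

# Each distinct event value becomes its own 0/1 indicator column.
encoded = pd.get_dummies(plays_df, columns=["event"], prefix="event", dtype=int)
print(encoded)
```

Once the categories are numeric indicators, they can be summed or averaged per game like any other numeric column.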
Deep Feature Synthesis
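The "deep" part of deep feature synthesis is the stacking of aggregations across related tables. The sketch below does that stacking by hand in Pandas, which is only an approximation of what FeatureTools derives automatically from an entity set. The `season` and `period_time` columns and all values are hypothetical, and the feature names are loosely modeled on the FeatureTools naming style.

```python
import pandas as pd

# Hypothetical parent and child tables linked by game_id.
game_df = pd.DataFrame({"game_id": [100, 101, 102], "season": [2017, 2017, 2018]})
plays_df = pd.DataFrame({
    "play_id": [1, 2, 3, 4, 5, 6],
    "game_id": [100, 100, 101, 102, 102, 102],
    "period_time": [30, 90, 45, 10, 20, 60],
})

# Depth 1: aggregate plays up to their parent game.
game_features = plays_df.groupby("game_id").agg(**{
    "COUNT(plays)": ("play_id", "count"),
    "MEAN(plays.period_time)": ("period_time", "mean"),
}).reset_index()
games = game_df.merge(game_features, on="game_id", how="left")

# Depth 2: aggregate the game-level features up to the season.
season_features = games.groupby("season").agg(**{
    "MEAN(games.COUNT(plays))": ("COUNT(plays)", "mean"),
    "MAX(games.MEAN(plays.period_time))": ("MEAN(plays.period_time)", "max"),
})
print(season_features)
```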
Applying FeatureTools
• We translate our raw tracking events into player summaries
• Supports dozens of games with diverse taxonomies
• Minimizes manual steps in our data science workflows
• Scales to millions of players and billions of records
Deployment
Tech Stack
• Databricks for PySpark
• FeatureTools for generation
• Pandas UDFs for distribution
• MLlib for predictive modeling
Pandas UDFs
• Introduced in Spark 2.3
• Provide scalar and grouped map operations
• Partitioned using a groupby clause
• Enable distributing code that uses Pandas
Grouped Map UDFs
Figure: Spark Input is split into several Pandas Input partitions; each partition passes through the UDF to produce a Pandas Output, and the outputs are combined into the Spark Output.
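The split-apply-combine flow of a grouped map UDF can be emulated locally with Pandas alone. This sketch runs on a single machine and uses no Spark API; `grouped_map` and `summarize_partition` are hypothetical helpers, and in Spark the same per-group function would be registered as a grouped map Pandas UDF and applied after a groupby.

```python
import pandas as pd

def summarize_partition(pdf: pd.DataFrame) -> pd.DataFrame:
    """Per-group function: receives one group's rows as a Pandas data frame."""
    return pd.DataFrame({
        "player_id": [pdf["player_id"].iloc[0]],
        "event_count": [len(pdf)],
        "value_sum": [pdf["value"].sum()],
    })

def grouped_map(df: pd.DataFrame, key: str, func) -> pd.DataFrame:
    """Local stand-in for the Spark flow: split by key, apply, combine."""
    parts = [func(group) for _, group in df.groupby(key)]
    return pd.concat(parts, ignore_index=True)

# Hypothetical event log; in Spark each group could run on a different worker.
events = pd.DataFrame({
    "player_id": [1, 1, 2, 3, 3, 3],
    "value": [1.0, 2.0, 5.0, 1.0, 1.0, 1.0],
})

result = grouped_map(events, "player_id", summarize_partition)
print(result)
```

Because the per-group function only sees ordinary Pandas data frames, Pandas-based libraries can run inside it unchanged, which is what makes this pattern useful for distributing feature generation.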
Feature Generation at Scale
AutoModel System
• Generates hundreds of propensity models
• Powers features in our games & live services
Figure: Data Extract -> Feature Engineering -> Feature Application -> Model Training -> Model Publish
Wrapping Up
Machine Learning at Zynga
Old Approach
• Custom data science and
engineering work per model
• Months-long development cycles
• Ad-hoc process for deploying
models to production
New Approach
• Minimal effort spent on the
feature engineering stage
• No custom work for new games
• Model outputs are published to
application databases
Takeaways
• Zynga is leveraging automated feature engineering to build
Portfolio-Scale data products
• We are using PySpark to scale to tens of millions of players
• Feature generation has unlocked novel data products
Automated Feature Engineering at Zynga
Ben Weber
Distinguished Data Scientist
bweber@zynga.com
https://www.zynga.com/jobs/

AI Expo 2019