AI Expo 2019

Automated Feature Engineering at Zynga
Ben Weber
Data Science @ Zynga
November 14, 2019

Our Challenge
• We have tens of millions of players and dozens of
games across multiple platforms
• Our games have diverse event taxonomies
• We want to build accurate models for personalizing
our gameplay experiences

“One of the holy grails of machine learning is
to automate more and more of the feature
engineering process.”
Pedro Domingos
CACM 2012

Our Approach
• Leverage ML libraries to automate feature engineering
• Develop Portfolio-Scale data products
• Empower our game studios with ML models

Applications
Propensity Models: What actions are players likely to perform?
Segmentation: Who are our players?
Anomaly Detection: Which players are bad actors?
Recommendation: What actions should they take?

Feature Encoding
Input Dataset
• Thousands of events per player
Feature Generation
• Aggregation with FeatureTools
Output Dataset
• A single row per player
(Figure: raw event data aggregated into player summaries)
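A minimal sketch of this encoding step, written in plain Python rather than FeatureTools so that it runs on its own. The event names, the `count_`/`sum_` column naming, and the choice of count and sum aggregations are assumptions for illustration, not Zynga's actual taxonomy.

```python
from collections import defaultdict

# Hypothetical raw tracking events: (player_id, event_type, value).
RAW_EVENTS = [
    ("p1", "session_start", 0.0),
    ("p1", "purchase", 4.99),
    ("p1", "session_start", 0.0),
    ("p2", "session_start", 0.0),
    ("p2", "level_up", 1.0),
]


def summarize_players(events):
    """Collapse many event rows per player into one wide summary row per player."""
    event_types = sorted({event_type for _, event_type, _ in events})

    def empty_row():
        # Every player gets the same columns, even for event types they never fired.
        row = {"total_events": 0}
        for event_type in event_types:
            row[f"count_{event_type}"] = 0
            row[f"sum_{event_type}"] = 0.0
        return row

    summaries = defaultdict(empty_row)
    for player_id, event_type, value in events:
        row = summaries[player_id]
        row["total_events"] += 1
        row[f"count_{event_type}"] += 1
        row[f"sum_{event_type}"] += value
    return dict(summaries)
```

`summarize_players(RAW_EVENTS)` returns one dict per player with identical keys, which is the "single row per player" shape the slide describes.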

Propensity Models
• We use classification models to predict which players are likely to perform an action
• Game studios use propensity scores to define experiment groups
• Feature generation reduces the need for manual feature engineering
(Figure: pipeline stages Data Extract → Feature Engineering → Feature Application → Model Training → Model Publish)
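The deck does not say which classifier is used (the tech stack later lists MLlib). As a self-contained stand-in, the sketch below fits a logistic regression by batch gradient descent in plain Python and turns the resulting propensity scores into experiment groups. The 0.5 threshold and the random treatment/control split within each tier are hypothetical choices, not the studios' actual design.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def propensity(weights, x):
    """Score one player's feature vector; weights[0] is the bias term."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))


def train_logistic_regression(rows, labels, learning_rate=0.5, epochs=500):
    """Fit logistic-regression weights by batch gradient descent on log loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        gradient = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            error = propensity(weights, x) - y  # derivative of log loss w.r.t. the logit
            gradient[0] += error
            for j, v in enumerate(x):
                gradient[j + 1] += error * v
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, gradient)]
    return weights


def assign_experiment_groups(scores, threshold=0.5, seed=0):
    """Tier players by propensity, then randomize treatment/control within each tier."""
    rng = random.Random(seed)
    groups = {}
    for player_id, score in scores.items():
        tier = "high" if score >= threshold else "low"
        groups[player_id] = (tier, rng.choice(["treatment", "control"]))
    return groups
```

Randomizing within each propensity tier keeps treatment and control comparable, so a studio can measure an experiment's effect separately on likely and unlikely actors.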

Segmentation
• Generated features are used as input to k-means clustering
• Archetype labels are assigned based on qualitative analysis
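A minimal k-means (Lloyd's algorithm) sketch in plain Python. It assumes the generated features have already been scaled to comparable ranges; the deck does not say how clusters are initialized or how k is chosen, so the random initialization and fixed `k` here are assumptions.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster feature vectors with Lloyd's algorithm; returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each player joins the nearest centroid (Euclidean distance).
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centroids.append([sum(dim) / len(members) for dim in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster where it was
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids, assignments
```

Cluster ids are arbitrary, so archetype names come afterwards: an analyst inspects each centroid and names the segment, which is the qualitative step the slide describes.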

Anomaly Detection
• Players are represented as 1D images
• We train an autoencoder to reduce dimensionality
• Players with large differences between their input and output vectors are flagged as suspect
(Figure: an autoencoder maps player feature vectors from an input layer through a latent space to an output layer, turning input vectors into output vectors)
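The deck trains a neural autoencoder. To keep the sketch short, it swaps in a *linear* autoencoder solved in closed form with an SVD in NumPy: at its optimum, a linear autoencoder trained on squared error spans the same subspace as the leading principal components. The latent size and the "mean plus three standard deviations" cutoff are assumptions, not values from the talk.

```python
import numpy as np


def fit_linear_autoencoder(x, latent_dim):
    """Closed-form linear autoencoder: returns the data mean and top principal directions."""
    mean = x.mean(axis=0)
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:latent_dim]


def reconstruct(x, mean, components):
    latent = (x - mean) @ components.T   # encoder: input vectors -> latent space
    return latent @ components + mean    # decoder: latent space -> output vectors


def flag_anomalies(x, mean, components, z_threshold=3.0):
    """Flag rows whose reconstruction error is far above the rest of the population."""
    errors = np.linalg.norm(x - reconstruct(x, mean, components), axis=1)
    cutoff = errors.mean() + z_threshold * errors.std()
    return errors, errors > cutoff
```

Players whose behavior follows the common patterns reconstruct well; a player whose feature vector lies far from those patterns has a large input/output difference and is flagged.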

Recommendation Systems
• Feature engineering is used for item & guild recommendations
• Cosine similarity is applied to normalized generated features
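Item Recommendations
$$\mathrm{sim}(u, v) = \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}$$
$$\mathrm{weight}_i = \sum_{w \in N(u)} \mathrm{sim}(u, w) \cdot \mathrm{rating}(w, i)$$
where N(u) is the user neighborhood of u.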
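A plain-Python sketch of both formulas. The slide says features are normalized but not how, so min-max scaling is an assumption; choosing the neighborhood as the top-k most similar players, and the item and rating data in any example, are likewise hypothetical.

```python
import math


def min_max_scale(rows):
    """Scale each feature column to [0, 1] (one possible normalization)."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def cosine_similarity(u, v):
    """sim(u, v) = (u . v) / (||u|| * ||v||); 0.0 if either vector is all zeros."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def neighborhood(user, others, k):
    """Return the k players whose feature vectors are most similar to `user`."""
    ranked = sorted(others, key=lambda pid: cosine_similarity(user, others[pid]), reverse=True)
    return {pid: others[pid] for pid in ranked[:k]}


def item_weights(user, neighbors, ratings):
    """weight_i = sum over neighbors w of sim(user, w) * rating(w, i)."""
    weights = {}
    for pid, features in neighbors.items():
        sim = cosine_similarity(user, features)
        for item, rating in ratings.get(pid, {}).items():
            weights[item] = weights.get(item, 0.0) + sim * rating
    return weights
```

Items with the highest weights are the candidates to recommend; guild recommendations could follow the same pattern with guild membership in place of item ratings.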

FeatureTools
• A Python library for deep feature synthesis (DFS)
• Represents data as entity sets
• Identifies feature descriptors for transforming your data into a shallow and wide format
• Open-source version maintained by FeatureLabs

Data Frames
(Figure: the game_df and plays_df data frames)

Entity Sets
• Define the tables and relationships for DFS
• Operate on Pandas data frames
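The sketch below mimics, in plain Python, what an entity set plus deep feature synthesis does for a one-to-many relationship: each parent row (a game) gets aggregates over its child rows (plays). It is not the FeatureTools API; the column names are hypothetical, and the feature names only imitate FeatureTools' naming style (for example, `MEAN(plays.goals)`). Real DFS also stacks primitives across deeper relationships, which this depth-one sketch omits.

```python
# Hypothetical stand-ins for game_df (parent) and plays_df (child);
# the column names are invented for illustration.
GAMES = [{"game_id": 1}, {"game_id": 2}, {"game_id": 3}]
PLAYS = [
    {"play_id": 10, "game_id": 1, "shots": 2, "goals": 1},
    {"play_id": 11, "game_id": 1, "shots": 4, "goals": 0},
    {"play_id": 12, "game_id": 2, "shots": 3, "goals": 2},
]

# Aggregation primitives applied to each numeric child column.
PRIMITIVES = {
    "SUM": sum,
    "MEAN": lambda values: sum(values) / len(values),
    "MAX": max,
}


def aggregate_features(parents, parent_index, children, child_name, numeric_columns):
    """One row per parent: COUNT of children plus every primitive x numeric column.

    The relationship is parent[parent_index] == child[parent_index] (one-to-many).
    """
    features = {}
    for parent in parents:
        key = parent[parent_index]
        kids = [child for child in children if child[parent_index] == key]
        row = {f"COUNT({child_name})": len(kids)}
        for column in numeric_columns:
            values = [child[column] for child in kids]
            for name, primitive in PRIMITIVES.items():
                # This sketch leaves aggregates of an empty group as None.
                row[f"{name}({child_name}.{column})"] = primitive(values) if values else None
        features[key] = row
    return features
```

With three primitives and two numeric columns, each game gets 1 + 3 × 2 = 7 features; adding primitives or columns multiplies the feature count, which is how DFS produces a wide table without hand-written features.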

Applying FeatureTools
• We translate our raw tracking events into player summaries
• Supports dozens of games with diverse taxonomies
• Minimizes manual steps in our data science workflows
• Scales to millions of players and billions of records

Tech Stack
• Databricks for PySpark
• FeatureTools for generation
• Pandas UDFs for distribution
• MLlib for predictive modeling

Pandas UDFs
• Introduced in Spark 2.3
• Provide scalar and grouped map operations
• Grouped map UDFs are partitioned using a groupby clause
• Enable distributing code that uses Pandas

Grouped Map UDFs
(Figure: the Spark input is split into groups; each group is passed to a UDF as Pandas input, and the Pandas outputs are combined into the Spark output)
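A local sketch of grouped-map semantics using plain pandas: the data is split by a key, a function that takes and returns a Pandas data frame runs once per group, and the results are concatenated. The `summarize_player` function, its columns, and the `apply_grouped_map` helper are hypothetical; on a cluster, Spark would do the splitting and concatenation instead of `apply_grouped_map`.

```python
import pandas as pd


def summarize_player(events: pd.DataFrame) -> pd.DataFrame:
    """Per-group function: receives one player's events, returns one summary row."""
    return pd.DataFrame({
        "player_id": [events["player_id"].iloc[0]],
        "event_count": [len(events)],
        "total_value": [float(events["value"].sum())],
    })


def apply_grouped_map(df: pd.DataFrame, key: str, func) -> pd.DataFrame:
    """Local emulation: split by key, run func on each group, concatenate the outputs."""
    outputs = [func(group) for _, group in df.groupby(key, sort=True)]
    return pd.concat(outputs, ignore_index=True)
```

In Spark 2.3 and 2.4, for example, `summarize_player` could be decorated with `pandas_udf(schema, PandasUDFType.GROUPED_MAP)` and invoked as `df.groupby("player_id").apply(summarize_player)`; Spark 3.0 adds `df.groupBy("player_id").applyInPandas(summarize_player, schema)`, where `schema` describes the output columns. Because the per-group function is ordinary Pandas code, it can be tested locally before it is distributed.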

Feature Generation at Scale

AutoModel System
• Generates hundreds of propensity models
• Powers features in our games & live services
(Figure: pipeline stages Data Extract → Feature Engineering → Feature Application → Model Training → Model Publish)
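A minimal sketch of how one generic pipeline can produce a model per game without custom code. It assumes data extraction has already produced, per game, training events, training labels, and events to score; `train_model` is a hypothetical callable standing in for the model-training step, and a plain dict stands in for the application database. The key detail is that feature definitions derived during Feature Engineering are reused unchanged during Feature Application, so scoring data always has the training columns.

```python
def engineer_features(train_events):
    """Feature Engineering: derive feature definitions (here, one count per event type)."""
    return sorted({event_type for _, event_type in train_events})


def apply_features(definitions, events):
    """Feature Application: encode an event log using previously derived definitions."""
    rows = {}
    for player_id, event_type in events:
        row = rows.setdefault(player_id, dict.fromkeys(definitions, 0))
        if event_type in row:  # event types unseen at training time are ignored
            row[event_type] += 1
    return rows


def run_automodel(games, train_model, app_db):
    """Run the same stages for every game and publish scores to app_db."""
    for game_id, (train_events, train_labels, score_events) in games.items():
        definitions = engineer_features(train_events)           # Feature Engineering
        train_rows = apply_features(definitions, train_events)  # Feature Application
        model = train_model(train_rows, train_labels)           # Model Training
        score_rows = apply_features(definitions, score_events)  # Feature Application
        app_db[game_id] = {pid: model(row) for pid, row in score_rows.items()}  # Model Publish
    return app_db
```

Under these assumptions, supporting a new game means adding an entry to `games`, not writing new feature code.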

Machine Learning at Zynga
Old Approach
• Custom data science and
engineering work per model
• Months-long development cycles
• Ad-hoc process for deploying
models to production
New Approach
• Minimal effort spent on the
feature engineering stage
• No custom work for new games
• Model outputs are published to
application databases

Takeaways
• Zynga is leveraging automated feature engineering to build
Portfolio-Scale data products
• We are using PySpark to scale to tens of millions of players
• Feature generation has unlocked novel data products

Ben Weber
Distinguished Data Scientist
bweber@zynga.com
https://www.zynga.com/jobs/
