AI Expo 2019

https://www.ai-expo.net/northamerica/speaker/ben-weber/

Published in: Data & Analytics

  1. Automated Feature Engineering at Zynga. Ben Weber, Data Science @ Zynga. November 14, 2019
  2. Zynga Games
  3. Our Challenge
       • We have tens of millions of players and dozens of games across multiple platforms
       • Our games have diverse event taxonomies
       • We want to build accurate models for personalizing our gameplay experiences
  4. “One of the holy grails of machine learning is to automate more and more of the feature engineering process.” Pedro Domingos, CACM 2012
  5. Our Approach
       • Leverage ML libraries to automate feature engineering
       • Develop Portfolio-Scale data products
       • Empower our game studios with ML models
  6. Use Cases
  7. Applications
       • Propensity Models: What actions are players performing?
       • Segmentation: Who are our players?
       • Anomaly Detection: Which players are bad actors?
       • Recommendation: What actions should they take?
  8. Feature Encoding
       • Input Dataset: Thousands of events per player
       • Feature Generation: Aggregation with FeatureTools
       • Output Dataset: A single row per player
       [Diagram: Raw Event Data → Player Summaries]
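
The encoding step on slide 8 turns many event rows per player into a single row per player. The plain-Python sketch below illustrates that change of shape by hand. It is not FeatureTools and not Zynga's pipeline; the event fields (`player_id`, `event`, `value`) and the helper `summarize_players` are hypothetical names chosen for illustration.

```python
from collections import defaultdict


def summarize_players(events):
    """Collapse a list of event dicts into one summary row per player."""
    rows = defaultdict(
        lambda: {"event_count": 0, "value_sum": 0.0, "value_max": float("-inf")}
    )
    for e in events:
        row = rows[e["player_id"]]
        row["event_count"] += 1
        row["value_sum"] += e["value"]
        row["value_max"] = max(row["value_max"], e["value"])
        # Each event type gets its own count column, giving a wide format.
        key = "count_" + e["event"]
        row[key] = row.get(key, 0) + 1
    for row in rows.values():
        row["value_mean"] = row["value_sum"] / row["event_count"]
    return dict(rows)


# Hypothetical raw events: several rows per player.
events = [
    {"player_id": "p1", "event": "level_up", "value": 1.0},
    {"player_id": "p1", "event": "purchase", "value": 5.0},
    {"player_id": "p1", "event": "level_up", "value": 3.0},
    {"player_id": "p2", "event": "purchase", "value": 2.0},
]
summaries = summarize_players(events)
```

A library such as FeatureTools generates aggregations of this kind automatically across many columns and relationships, which is what removes the hand-written loop above.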
  9. Propensity Models
       • We predict which users are likely to act using classification models
       • Game studios use propensity scores to define experiment groups
       • Feature generation reduces the need for manual feature engineering
       [Pipeline diagram: Data Extract → Feature Engineering → Feature Application → Model Training → Model Publish]
  10. Segmentation
       • Generated features are used as input to k-means clustering
       • Archetype labels are assigned based on qualitative analysis
  11. Anomaly Detection
       • Players are represented as 1D images
       • We train an autoencoder to reduce dimensionality
       • Players with large differences between their input and output vectors are flagged as suspect
       [Diagram: Autoencoder. Input vectors (players × features) pass through an input layer, a latent space, and an output layer to produce output vectors (players × features)]
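
The flagging rule on slide 11 can be sketched without the autoencoder itself. The example below assumes a trained model has already produced an output vector for each player, and flags players whose output differs most from their input. The Euclidean distance, the fixed `threshold`, and all names are assumptions for illustration; the slides do not say which distance or cutoff is used.

```python
import math


def reconstruction_error(x, y):
    """Euclidean distance between an input vector and its reconstruction."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def flag_suspects(player_ids, inputs, outputs, threshold):
    """Return the ids of players whose reconstruction error exceeds threshold."""
    return [
        pid
        for pid, x, y in zip(player_ids, inputs, outputs)
        if reconstruction_error(x, y) > threshold
    ]


# Hypothetical feature vectors and the (assumed) autoencoder outputs.
player_ids = ["p1", "p2", "p3"]
inputs = [[0.1, 0.2, 0.3], [0.5, 0.5, 0.5], [0.9, 0.1, 0.4]]
outputs = [[0.1, 0.2, 0.3], [0.5, 0.4, 0.5], [0.1, 0.9, 0.4]]

suspects = flag_suspects(player_ids, inputs, outputs, threshold=0.5)
```

The intuition is that an autoencoder trained mostly on typical players reconstructs them well, so a poorly reconstructed player is unusual relative to the training data.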
  12. Recommendation Systems
       • Feature engineering is used for item & guild recommendations
       • Cosine similarity is applied to normalized generated features
       • Item recommendations: $\mathrm{sim}(u, v) = \frac{u \cdot v}{\|u\| \, \|v\|}$ and $\mathrm{weight}_i = \sum_{w} \mathrm{sim}(u, w) \cdot \mathrm{rating}(w, i)$, where $w$ ranges over the user neighborhood
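
The two formulas on slide 12 can be written out directly. The sketch below computes the cosine similarity between feature vectors and then the similarity-weighted item scores over a neighborhood. The data layout (`features` and `ratings` dictionaries) and the function names are assumptions for illustration, and the neighborhood is simply passed in rather than selected.

```python
import math


def cosine_similarity(u, v):
    """sim(u, v) = (u . v) / (||u|| * ||v||); returns 0.0 for a zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def item_weights(user, neighbors, features, ratings):
    """weight_i = sum over neighbors w of sim(user, w) * rating(w, i)."""
    weights = {}
    for w in neighbors:
        s = cosine_similarity(features[user], features[w])
        for item, rating in ratings.get(w, {}).items():
            weights[item] = weights.get(item, 0.0) + s * rating
    return weights


# Hypothetical generated features and item ratings.
features = {"u": [1.0, 0.0], "a": [1.0, 0.0], "b": [0.0, 1.0]}
ratings = {"a": {"sword": 5.0}, "b": {"shield": 4.0}}

weights = item_weights("u", ["a", "b"], features, ratings)
```

Items rated by neighbors who resemble the user receive higher weights, so here the item from the identical neighbor `a` outranks the item from the orthogonal neighbor `b`.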
  13. Feature Engineering
  14. FeatureTools
       • A Python library for deep feature synthesis
       • Represents data as entity sets
       • Identifies feature descriptors for transforming your data into a shallow and wide format
       • Open-source version maintained by Feature Labs
  15. Kaggle NHL Dataset
  16. Data Frames: game_df, plays_df
  17. Entity Sets
       • Define the tables and relationships for DFS
       • Operate on Pandas data frames
  18. 1-Hot Encoding
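
Slide 18 names one-hot encoding without showing the details in this transcript. As a reminder of the technique, the sketch below expands a categorical column into one 0/1 column per category in plain Python. The `plays` rows and the column name `event` are hypothetical stand-ins for the NHL plays data, not values taken from the slides.

```python
def one_hot(rows, column):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every field except the categorical one being expanded.
        new_row = {k: v for k, v in row.items() if k != column}
        for cat in categories:
            new_row[f"{column}_{cat}"] = 1 if row[column] == cat else 0
        encoded.append(new_row)
    return encoded


# Hypothetical play records with a categorical event type.
plays = [
    {"play_id": 1, "event": "Shot"},
    {"play_id": 2, "event": "Goal"},
    {"play_id": 3, "event": "Shot"},
]
encoded = one_hot(plays, "event")
```

Once categories are numeric indicator columns, aggregations such as sums and means per game or per player become meaningful for them.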
  19. Deep Feature Synthesis
  20. Applying FeatureTools
       • We translate our raw tracking events into player summaries
       • Supports dozens of games with diverse taxonomies
       • Minimizes manual steps in our data science workflows
       • Scales to millions of players and billions of records
  21. Deployment
  22. Tech Stack
       • Databricks for PySpark
       • FeatureTools for generation
       • Pandas UDFs for distribution
       • MLlib for predictive modeling
  23. Pandas UDFs
       • Introduced in Spark 2.3
       • Provide Scalar and Grouped Map operations
       • Partitioned using a groupby clause
       • Enable distributing code that uses Pandas
  24. Grouped Map UDFs
       [Diagram: Spark Input is split into several Pandas Inputs; each is passed through a UDF to produce a Pandas Output; the outputs are combined into Spark Output]
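
The grouped map pattern on slides 23 and 24 is split, apply, combine: partition the rows by a key, run the same function on each partition, and concatenate the results. The sketch below is a single-process, plain-Python analogy rather than PySpark code; `grouped_map`, `summarize_partition`, and the `partition` field are hypothetical names.

```python
from collections import defaultdict


def grouped_map(rows, key, func):
    """Split rows by key, apply func to each group, and concatenate the results."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    output = []
    # In Spark each group could be processed on a different worker;
    # here the groups are simply processed one after another.
    for group_key in sorted(groups):
        output.extend(func(groups[group_key]))
    return output


def summarize_partition(rows):
    """Example per-group function: one summary row per partition."""
    return [
        {
            "partition": rows[0]["partition"],
            "players": len({r["player_id"] for r in rows}),
            "events": len(rows),
        }
    ]


# Hypothetical events, pre-assigned to partitions (for example by hashing player_id).
rows = [
    {"partition": 0, "player_id": "p1"},
    {"partition": 1, "player_id": "p2"},
    {"partition": 0, "player_id": "p1"},
    {"partition": 0, "player_id": "p3"},
]
result = grouped_map(rows, "partition", summarize_partition)
```

In PySpark 2.3 the corresponding construct is a `pandas_udf` of type `PandasUDFType.GROUPED_MAP` applied with `groupby(...).apply(...)`, where each group arrives in the function as a Pandas data frame.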
  25. Feature Generation at Scale
  26. AutoModel System
       • Generates hundreds of propensity models
       • Powers features in our games & live services
       [Pipeline diagram: Data Extract → Feature Engineering → Feature Application → Model Training → Model Publish]
  27. Wrapping Up
  28. Machine Learning at Zynga
       Old Approach:
       • Custom data science and engineering work per model
       • Months-long development cycles
       • Ad-hoc process for deploying models to production
       New Approach:
       • Minimal effort spent on the feature engineering stage
       • No custom work for new games
       • Model outputs are published to application databases
  29. Takeaways
       • Zynga is leveraging automated feature engineering to build Portfolio-Scale data products
       • We are using PySpark to scale to tens of millions of players
       • Feature generation has unlocked novel data products
  30. Automated Feature Engineering at Zynga. Ben Weber, Distinguished Data Scientist. bweber@zynga.com https://www.zynga.com/jobs/