Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Fact Store at Scale for Netflix Recommendations with Nitin Sharma and Kedar Sadekar


Published on

As a data driven company, we use Machine Learning algos and A/B tests to drive all of the content recommendations for our members. To improve the quality of our personalized recommendations, we try an idea offline using historical data. Ideas that improve our offline metrics are then pushed as A/B tests which are measured through statistically significant improvements in core metrics such as member engagement, satisfaction, and retention.The heart of such offline analyses are historical facts data that are used to generate features required by the machine learning model. For example, viewing history of a member, videos in mylist etc.

Building a fact store at an ever evolving Netflix scale is non trivial. Ensuring we capture enough fact data to cover all stratification needs of various experiments and guarantee that the data we serve is temporally accurate is an important requirement. In this talk, we will present the key requirements, evolution of our fact store design, its implementation, the scale and our learnings.

We will also take a deep dive into fact vs feature logging, design tradeoffs, infrastructure performance, reliability and query API for the store. We use Spark and Scala extensively and variety of compression techniques to store/retrieve data efficiently.

Published in: Data & Analytics
  • Login to see the comments

Fact Store at Scale for Netflix Recommendations with Nitin Sharma and Kedar Sadekar

  1. 1. Kedar Sadekar, Netflix Nitin Sharma, Netflix Fact Store - Netflix Recommendations #DevSAIS11
  2. 2. Agenda ● ● ● ● ● ● #DevSAIS11
  3. 3. #DevSAIS11 Recommendations at Netflix ● Personalized Homepage for each member ○ Goal: Quickly help members find content they’d like to watch ○ Risk: Member may lose interest and abandon the service ○ Challenge: Recommendations at Scale
  4. 4. #DevSAIS11 Scale @ Netflix ● ● ● ●
  5. 5. #DevSAIS11 Experimentation Cycle @ Netflix Design a New Experiment to Test Out Different Ideas Offline Experiment Online System Design Experiment Model Testing Collect Label Data Offline Feature Generation Model Training Model Validation Metrics Online A/B Testing
  6. 6. #DevSAIS11 ML Feature Engineering - Architectural View Online Feature Generation Microservices Online Scoring Offline Feature Generation Shared Feature Encoders Model Training Deploy Models Online SystemOffline Experiment Features Facts
  7. 7. #DevSAIS11 What is a Fact? ● Fact ○ Input data for feature encoders. Used to construct a feature ○ Example: Viewing history of member, my list of a member ● Historical Version of a fact ○ Rewindable - State of the world at that time ● Temporal ■ Facts are temporal i.e. they change with time ■ Each online scoring service uses the latest value of a fact
  8. 8. #DevSAIS11 Online Scoring Predictor Fact Microservices Features Facts Log these Online Scoring Predictor Fact Microservices Features Facts Log these Recommendations Recommendations Feature Logging Fact Logging
  9. 9. #DevSAIS11 Fact Logging - Pull Architecture Pull ● Daily snapshots of key facts ● Storage ○ S3 & Parquet ● Api to access the data ○ RDD & DataFrames ● Cons ○ Lacks temporal accuracy ○ Load on Microservices ○ Missing Experiment specific facts Capture Snapshots Fact Microservices Stratified Member sets Snapshots
  10. 10. #DevSAIS11 Fact Logging - Push Architecture ● Compute engines themselves control what to log ● Stratification ● Temporal accuracy Compute Services Fact Store Fact Transformer Fact Fetcher Fact Logger ML Workflows Feature Generation Model Training
  11. 11. #DevSAIS11 Fact Logger Precompute Live Compute Fact Logger ● Library ● Facts ○ User Related ○ Video Related ○ Computation Specific ● Serialization ● Stratification Service ● Fact Stream ● Storage Base Fact Tables Stratification
  12. 12. #DevSAIS11 Fact Logging - Scalability Precompute Live Compute Fact Logger ● 5-10x increase in data through Kafka ● SLA Impact; Cost Increase ● Compression - 70% decrease
  13. 13. Storage & Access Fact Store Fact Transformer Deduplication Precompute Live Compute Fact Logger ● Pipeline load ○ Repeated facts ● Aggressive or not ○ Loss threshold Conditional push ● Spark Job ○ Fact pointers ○ SLA #DevSAIS11
  14. 14. #DevSAIS11 API Lookback Member ID My List View History Thumbs 122312 My List Value View History Pointer Thumbs Value 254637 My List Pointer View History Pointer Thumbs Pointer Member n My List Pointer View History Value Thumbs Pointer My List Partition 1 Values Partition m Values View History Partition 1 Values Partition m Values Thumbs Partition 1 Values Partition m Values log_time - x log_time - y log_time - z
  15. 15. Storage & Access Fact Store Fact Transformer Read API Precompute Live Compute Fact Logger ● Query performance ○ Slow moving facts ● Point query ○ Connector ● Query time reduction ○ Hours to minutes Deduplication Conditional push Write Read #DevSAIS11
  16. 16. Performance: Storage • Partitioning scheme – Noisy neighbor • Storage format – Exploratory vs production • Fast & Slow lane – Lookback limit #DevSAIS11
  17. 17. Performance: Spark reads • Bloom Filters – Reduce scan • Cache Access – EVCache, Spectator • MapPartitions vs UDF – Eager vs Lazy – SPARK-11438, SPARK-11469, SPARK-20586 Application ML Library Read API #DevSAIS11
  18. 18. Future Work • Structured with schema evolution – Best of both (POJO & Spark SQL), Iceberg • Streaming vs Batch – Multiple lanes, accountability, independent scale • Duplication – Storage vs Runtime cost #DevSAIS11
  19. 19. #DevSAIS11 Questions?