Netflix Recommendations - Fact Store

•

0 likes•66 views

Karthik Murugesan

Technology

Kedar Sadekar, Netflix
Nitin Sharma, Netflix
Fact Store - Netflix
Recommendations
#DevSAIS11

#DevSAIS11
Recommendations at Netflix
● Personalized Homepage for each member
○ Goal: Quickly help members find content they’d like to
watch
○ Risk: Member may lose interest and abandon the service
○ Challenge: Recommendations at Scale

#DevSAIS11
Experimentation Cycle @ Netflix
Design a New Experiment to Test Out Different Ideas
Offline
Experiment
Online
System
Design Experiment Model Testing
Collect Label Data
Offline Feature
Generation
Model Training Model Validation Metrics
Online A/B Testing

#DevSAIS11
ML Feature Engineering - Architectural View
Online Feature
Generation
Microservices
Online Scoring
Offline Feature
Generation
Shared Feature
Encoders
Model
Training
Deploy Models
Online SystemOffline Experiment
Features
Facts

#DevSAIS11
What is a Fact?
● Fact
○ Input data for feature encoders. Used to construct a feature
○ Example: Viewing history of member, my list of a member
● Historical Version of a fact
○ Rewindable - State of the world at that time
● Temporal
■ Facts are temporal i.e. they change with time
■ Each online scoring service uses the latest value of a fact

#DevSAIS11
Online
Scoring
Predictor
Fact Microservices
Features
Facts
Log these
Online
Scoring
Predictor
Fact Microservices
Features
Facts
Log these
Recommendations Recommendations
Feature Logging Fact Logging

#DevSAIS11
Fact Logging - Pull Architecture
Pull
● Daily snapshots of key facts
● Storage
○ S3 & Parquet
● Api to access the data
○ RDD & DataFrames
● Cons
○ Lacks temporal accuracy
○ Load on Microservices
○ Missing Experiment specific facts
Capture
Snapshots
Fact Microservices
Stratified
Member sets
Snapshots

#DevSAIS11
Fact Logging - Push Architecture
● Compute engines
themselves control
what to log
● Stratification
● Temporal accuracy
Compute Services
Fact Store
Fact Transformer
Fact Fetcher
Fact Logger
ML Workflows
Feature
Generation
Model
Training

#DevSAIS11
Fact Logger
Precompute Live Compute
Fact
Logger
● Library
● Facts
○ User Related
○ Video Related
○ Computation Specific
● Serialization
● Stratification Service
● Fact Stream
● Storage
Base Fact Tables
Stratification

#DevSAIS11
Fact Logging - Scalability
Precompute Live Compute
Fact Logger
● 5-10x increase in data through
Kafka
● SLA Impact; Cost Increase
● Compression - 70% decrease

Storage & Access
Fact Store
Fact Transformer
Deduplication
Precompute Live Compute
Fact Logger
● Pipeline load
○ Repeated facts
● Aggressive or not
○ Loss threshold
Conditional push
● Spark Job
○ Fact pointers
○ SLA
#DevSAIS11

#DevSAIS11
API Lookback
Member ID My List View History Thumbs
122312 My List Value View History Pointer Thumbs Value
254637 My List Pointer View History Pointer Thumbs Pointer
Member n My List Pointer View History Value Thumbs Pointer
My List
Partition 1
Values
Partition m
Values
View
History
Partition 1
Values
Partition m
Values
Thumbs
Partition 1
Values
Partition m
Values
log_time - x
log_time - y
log_time - z

Storage & Access
Fact Store
Fact Transformer
Read API
Precompute Live Compute
Fact Logger
● Query performance
○ Slow moving facts
● Point query
○ Connector
● Query time reduction
○ Hours to minutes
Deduplication
Conditional push
Write
Read
#DevSAIS11

Performance: Storage
• Partitioning scheme
– Noisy neighbor
• Storage format
– Exploratory vs production
• Fast & Slow lane
– Lookback limit
#DevSAIS11

Performance: Spark reads
• Bloom Filters
– Reduce scan
• Cache Access
– EVCache, Spectator
• MapPartitions vs UDF
– Eager vs Lazy
– SPARK-11438, SPARK-11469,
SPARK-20586
Application
ML Library
Read API
#DevSAIS11

Future Work
• Structured with schema evolution
– Best of both (POJO & Spark SQL), Iceberg
• Streaming vs Batch
– Multiple lanes, accountability, independent scale
• Duplication
– Storage vs Runtime cost
#DevSAIS11

What's hot

The Past, Present, and Future of Machine Learning APIsBigML, Inc

A Microservices Framework for Real-Time Model Scoring Using Structured Stream...Databricks

“Houston, we have a model...” Introduction to MLOpsRui Quintino

Home robots meet IBM Watson for Voice UI, and AIBill Liu

Sf big analytics: bigheadChester Chen

MLflow: Infrastructure for a Complete Machine Learning Life CycleDatabricks

Nasscom ml ops webinarSameer Mahajan

Seamless End-to-End Production Machine Learning with Seldon and MLflowDatabricks

Use MLflow to manage and deploy Machine Learning model on Spark Herman Wu

Lambda Architecture 2.0 for Reactive AB TestingTrieu Nguyen

Introduction to ML.NETGianni Rosa Gallina

Zipline—Airbnb’s Declarative Feature Engineering FrameworkDatabricks

ML-Ops:Philosophy, Best-Practices and ToolsJorge Davila-Chacon

Storage Challenges for Production Machine LearningNisha Talagala

Jakub Hava, H2O.ai - Productionizing Apache Spark Models using H2O - H2O Worl...Sri Ambati

platform for Machine LearningSivapriyaS12

SFBigAnalytics- hybrid data management using cdapChester Chen

MLFlow as part of ML CI/CD at AvalaraManoj Mahalingam

Rishubh Agrawal ResumeRish Agrawal

resume-2016springJingshuang Ge

What's hot (20)

The Past, Present, and Future of Machine Learning APIs

A Microservices Framework for Real-Time Model Scoring Using Structured Stream...

“Houston, we have a model...” Introduction to MLOps

Home robots meet IBM Watson for Voice UI, and AI

Sf big analytics: bighead

MLflow: Infrastructure for a Complete Machine Learning Life Cycle

Nasscom ml ops webinar

Seamless End-to-End Production Machine Learning with Seldon and MLflow

Use MLflow to manage and deploy Machine Learning model on Spark

Lambda Architecture 2.0 for Reactive AB Testing

Introduction to ML.NET

Zipline—Airbnb’s Declarative Feature Engineering Framework

ML-Ops:Philosophy, Best-Practices and Tools

Storage Challenges for Production Machine Learning

Jakub Hava, H2O.ai - Productionizing Apache Spark Models using H2O - H2O Worl...

platform for Machine Learning

SFBigAnalytics- hybrid data management using cdap

MLFlow as part of ML CI/CD at Avalara

Rishubh Agrawal Resume

resume-2016spring

Similar to Netflix Recommendations - Fact Store

Near Real-Time Netflix Recommendations Using Apache Spark Streaming with Nit...Databricks

Near Real-Time Netflix Recommendations Using Apache Spark Streaming Karthik Murugesan

RealTime Recommendations @Netflix - SparkNitin S

Data Con LA 2018 - Enabling real-time exploration and analytics at scale at H...Data Con LA

Structured Streaming in SparkDigital Vidya

Netflix - Realtime Impression Store Nitin S

Revealing ALLSTOCKERMasashi Umezawa

Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...Demi Ben-Ari

Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...Codemotion

Growing into a proactive Data PlatformLivePerson

Sql Server Machine Learning Services - Sql Saturday Prague 2018 #SqlSatPragueLuis Beltran

Sparking up Data Engineering: Spark Summit East talk by Rohan SharmaSpark Summit

Extracting Insights from Data at TwitterPrasad Wagle

Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017Demi Ben-Ari

Netflix Recommendations Feature Engineering with Time TravelFaisal Siddiqi

AWS Lambda and Serverless framework: lessons learned while building a serverl...Luciano Mammino

KFServing and FeastAnimesh Singh

Netflix Big Data Paris 2017Jason Flittner

Data Science in the Cloud @StitchFixC4Media

Apache Pinot Meetup Sept02, 2020Mayank Shrivastava

Similar to Netflix Recommendations - Fact Store (20)

Near Real-Time Netflix Recommendations Using Apache Spark Streaming with Nit...

Near Real-Time Netflix Recommendations Using Apache Spark Streaming

RealTime Recommendations @Netflix - Spark

Data Con LA 2018 - Enabling real-time exploration and analytics at scale at H...

Structured Streaming in Spark

Netflix - Realtime Impression Store

Revealing ALLSTOCKER

Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...

Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...

Growing into a proactive Data Platform

Sql Server Machine Learning Services - Sql Saturday Prague 2018 #SqlSatPrague

Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma

Extracting Insights from Data at Twitter

Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017

Netflix Recommendations Feature Engineering with Time Travel

AWS Lambda and Serverless framework: lessons learned while building a serverl...

KFServing and Feast

Netflix Big Data Paris 2017

Data Science in the Cloud @StitchFix

Apache Pinot Meetup Sept02, 2020

Recently uploaded

Pigging Solutions Piggable Sweeping ElbowsPigging Solutions

SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren

Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55

Bluetooth Controlled Car with Arduino.pdfngoud9212

AI as an Interface for Commercial BuildingsMemoori

Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm

Build your next Gen AI Breakthrough - April 2024Neo4j

Science&tech:THE INFORMATION AGE STS.pdfjimielynbastida

Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies

"ML in Production",Oleksandr BaganFwdays

Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge

Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang

Gen AI in Business - Global Trends Report 2024.pdfAddepto

Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software

Artificial intelligence in the post-deep learning eraDeakin University

APIForce Zurich 5 April Automation LPDGMarianaLemus7

Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren

New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada

E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxnull - The Open Security Community

Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106

Recently uploaded (20)

Pigging Solutions Piggable Sweeping Elbows

SQL Database Design For Developers at php[tek] 2024

Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...

Bluetooth Controlled Car with Arduino.pdf

AI as an Interface for Commercial Buildings

Streamlining Python Development: A Guide to a Modern Project Setup

Build your next Gen AI Breakthrough - April 2024

Science&tech:THE INFORMATION AGE STS.pdf

Benefits Of Flutter Compared To Other Frameworks

"ML in Production",Oleksandr Bagan

Designing IA for AI - Information Architecture Conference 2024

Bun (KitWorks Team Study 노별마루 발표 2024.4.22)

Gen AI in Business - Global Trends Report 2024.pdf

Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation

Artificial intelligence in the post-deep learning era

APIForce Zurich 5 April Automation LPDG

Advanced Test Driven-Development @ php[tek] 2024

New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024

E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx

Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics

Netflix Recommendations - Fact Store

1. Kedar Sadekar, Netflix Nitin Sharma, Netflix Fact Store - Netflix Recommendations #DevSAIS11

2. Agenda ● ● ● ● ● ● #DevSAIS11

3. #DevSAIS11 Recommendations at Netflix ● Personalized Homepage for each member ○ Goal: Quickly help members find content they’d like to watch ○ Risk: Member may lose interest and abandon the service ○ Challenge: Recommendations at Scale

4. #DevSAIS11 Scale @ Netflix ● ● ● ●

5. #DevSAIS11 Experimentation Cycle @ Netflix Design a New Experiment to Test Out Different Ideas Offline Experiment Online System Design Experiment Model Testing Collect Label Data Offline Feature Generation Model Training Model Validation Metrics Online A/B Testing

6. #DevSAIS11 ML Feature Engineering - Architectural View Online Feature Generation Microservices Online Scoring Offline Feature Generation Shared Feature Encoders Model Training Deploy Models Online SystemOffline Experiment Features Facts

7. #DevSAIS11 What is a Fact? ● Fact ○ Input data for feature encoders. Used to construct a feature ○ Example: Viewing history of member, my list of a member ● Historical Version of a fact ○ Rewindable - State of the world at that time ● Temporal ■ Facts are temporal i.e. they change with time ■ Each online scoring service uses the latest value of a fact

8. #DevSAIS11 Online Scoring Predictor Fact Microservices Features Facts Log these Online Scoring Predictor Fact Microservices Features Facts Log these Recommendations Recommendations Feature Logging Fact Logging

9. #DevSAIS11 Fact Logging - Pull Architecture Pull ● Daily snapshots of key facts ● Storage ○ S3 & Parquet ● Api to access the data ○ RDD & DataFrames ● Cons ○ Lacks temporal accuracy ○ Load on Microservices ○ Missing Experiment specific facts Capture Snapshots Fact Microservices Stratified Member sets Snapshots

10. #DevSAIS11 Fact Logging - Push Architecture ● Compute engines themselves control what to log ● Stratification ● Temporal accuracy Compute Services Fact Store Fact Transformer Fact Fetcher Fact Logger ML Workflows Feature Generation Model Training

11. #DevSAIS11 Fact Logger Precompute Live Compute Fact Logger ● Library ● Facts ○ User Related ○ Video Related ○ Computation Specific ● Serialization ● Stratification Service ● Fact Stream ● Storage Base Fact Tables Stratification

12. #DevSAIS11 Fact Logging - Scalability Precompute Live Compute Fact Logger ● 5-10x increase in data through Kafka ● SLA Impact; Cost Increase ● Compression - 70% decrease

13. Storage & Access Fact Store Fact Transformer Deduplication Precompute Live Compute Fact Logger ● Pipeline load ○ Repeated facts ● Aggressive or not ○ Loss threshold Conditional push ● Spark Job ○ Fact pointers ○ SLA #DevSAIS11

14. #DevSAIS11 API Lookback Member ID My List View History Thumbs 122312 My List Value View History Pointer Thumbs Value 254637 My List Pointer View History Pointer Thumbs Pointer Member n My List Pointer View History Value Thumbs Pointer My List Partition 1 Values Partition m Values View History Partition 1 Values Partition m Values Thumbs Partition 1 Values Partition m Values log_time - x log_time - y log_time - z

15. Storage & Access Fact Store Fact Transformer Read API Precompute Live Compute Fact Logger ● Query performance ○ Slow moving facts ● Point query ○ Connector ● Query time reduction ○ Hours to minutes Deduplication Conditional push Write Read #DevSAIS11

16. Performance: Storage • Partitioning scheme – Noisy neighbor • Storage format – Exploratory vs production • Fast & Slow lane – Lookback limit #DevSAIS11

17. Performance: Spark reads • Bloom Filters – Reduce scan • Cache Access – EVCache, Spectator • MapPartitions vs UDF – Eager vs Lazy – SPARK-11438, SPARK-11469, SPARK-20586 Application ML Library Read API #DevSAIS11

18. Future Work • Structured with schema evolution – Best of both (POJO & Spark SQL), Iceberg • Streaming vs Batch – Multiple lanes, accountability, independent scale • Duplication – Storage vs Runtime cost #DevSAIS11

19. #DevSAIS11 Questions?

Netflix Recommendations - Fact Store

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Netflix Recommendations - Fact Store

Similar to Netflix Recommendations - Fact Store (20)

More from Karthik Murugesan

More from Karthik Murugesan (20)

Recently uploaded

Recently uploaded (20)

Netflix Recommendations - Fact Store