Using Big Data to Drive Big Engagement

Speaker: George Chiu (邱志威), Sr. Industry Consultant, Teradata
Learn how Netflix engages customers by leveraging Teradata as a critical component of its data and analytics platform to create a data-driven, customer-focused business.

  1. Using Big Data to Drive Big Engagement. Name: George Chiu. Company: Teradata.
  2. Netflix: Using Big Data to Drive Big Engagement. 40PB Analytics in AWS. George Chiu, Sr. Industry Consultant. Oct. 2017.
  3. #1 Streaming video service
      • Started 1998 when Reed Hastings accrued a $40 late fee on “Apollo 13”
      • In 2000, Blockbuster Video declined the chance to purchase Netflix for $50M
      • Current market cap: $56B
      • Teradata customer since 2007
      • 86M members in 190 countries
      • Streams 132M hrs/day, aka 92K hrs/min, aka 10.5 yrs/min
      • 600B events generated daily
      • 40PB on AWS S3
      • Read/write 10% daily
      • 350 active big data users
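
The three streaming figures on this slide are the same quantity expressed in different units. A quick arithmetic check, assuming a 24-hour day and a 365-day year (the variable names are illustrative, not from the deck):

```python
# Figure quoted on the slide: 132M hours streamed per day.
HOURS_STREAMED_PER_DAY = 132_000_000

MINUTES_PER_DAY = 24 * 60
HOURS_PER_YEAR = 24 * 365  # assumption: non-leap year

# Hours of video streamed during each minute of wall-clock time.
hours_per_minute = HOURS_STREAMED_PER_DAY / MINUTES_PER_DAY

# The same amount expressed as years of continuous viewing.
years_per_minute = hours_per_minute / HOURS_PER_YEAR

print(f"{hours_per_minute:,.0f} hours streamed per minute")
print(f"{years_per_minute:.1f} years of video streamed per minute")
```

Under these assumptions the results round to roughly 92K hours and 10.5 years per minute, which agrees with the slide.
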
  4. Agenda
      1. What analytics does Netflix use to drive more engagement?
      2. Insights & Approach
      3. Netflix Architecture on AWS with Teradata DW.a.a.S.
  5. © 2017 Teradata. What analytics does Netflix use to drive more engagement?
  6. Netflix
      • Focus is on making it easy to find things to watch
      • Spends $150m on data & analytics
        ➢ 20x more than average
        ➢ 2% of ARPU
      • Processing 400bn interactions daily
      • Hundreds of analysts continually deriving new metadata
  7. Differentiate or Disappear
      • More content, newer, more exclusive
      • Make it easy for customers to find
      • Make it easy to watch
      • Provide a great service
      • Provide relevant, timely and consistent interactions
      • Provide flexible packages
      Source: https://business.tivo.com/content/dam/tivo/resources/whitepapers/Q3_2016_Video_Trends_Report.pdf
  8. Can we influence customer engagement?
      • 1.2% of high-value TV package subscribers down spin each month (+11% on last year)
      • Perceived value diminishes when the initial discount ends (12 months and beyond)
      • Subscribers who down spin are not engaged with the content and watch 15% less exclusive/premium TV
      • Current marketing is limited, with no 121 (one-to-one) content
      Goal: identify at-risk customers and prevent down spin with personalised recommendations.
  9. Insights and Approach
  10. Approach
      • Step 1: Profile Subscriber Viewing Against Genres
      • Step 2: Create Behavioural Clusters
      • Step 3: Which Subscribers to target per cluster?
      • Step 4: Build Recommendations per subscriber
      • Step 5: Apply Business Rules
  11. Step 1: Profile Subscriber Viewing Against Genres
      Identify the proportion of each subscriber's viewing duration that can be attributed to each genre. Example subscriber (viewing duration, then proportion of total):
      • News: 5, 0.07
      • Soccer: 10, 0.13
      • Reality: 32, 0.43
      • Documentary: 18, 0.24
      • Horror: 1, 0.01
      • Music: 4, 0.05
      • Crime Drama: 5, 0.07
      • … …
      This subscriber watches mostly Reality content (43%), but also likes Documentaries (24%) and Soccer (13%).
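
The profiling step amounts to dividing each genre's viewing duration by the subscriber's total. A minimal sketch, assuming the seven genres visible on the slide are the whole profile (the elided "…" genres are left out) and that "Crime Drama" is a single genre; the function name `genre_profile` is illustrative:

```python
def genre_profile(viewing_by_genre):
    """Return each genre's share of the subscriber's total viewing duration."""
    total = sum(viewing_by_genre.values())
    if total == 0:
        # A subscriber with no viewing gets an all-zero profile.
        return {genre: 0.0 for genre in viewing_by_genre}
    return {genre: duration / total for genre, duration in viewing_by_genre.items()}


# Durations from the slide's example subscriber. The unit (hours, minutes, ...)
# is not stated in the source, and it does not affect the proportions.
viewing = {
    "News": 5,
    "Soccer": 10,
    "Reality": 32,
    "Documentary": 18,
    "Horror": 1,
    "Music": 4,
    "Crime Drama": 5,
}

profile = genre_profile(viewing)
for genre, share in sorted(profile.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{genre:12s} {share:.2f}")
```

With these inputs the shares for Reality, Documentary and Soccer round to 0.43, 0.24 and 0.13, matching the slide.
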
  12. Step 2: Create Behavioural Clusters
      • Cluster 0: Soccer, Drama, News (61k subscribers)
      • Cluster 8: Soccer, News, Sports Talk (32k subscribers)
      • Cluster 11: Children, Animated, Adventure (56k subscribers)
      • Cluster 13: Crime Drama (28k subscribers)
      • Cluster 15: Reality (57k subscribers)
      • Cluster 17: Reality, Documentary, Ents (85k subscribers)
      • Cluster 21: Documentary (56k subscribers)
      • Cluster 25: Music (25k subscribers)
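
The slides do not say which clustering algorithm produced these groups. As one possible illustration, the sketch below runs a plain k-means over genre-proportion vectors. The choice of k-means, the three-genre vectors and the initialisation from the first k points are all assumptions made for the example:

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Very small k-means: returns (cluster index per point, centroids)."""
    # Assumption: seed the centroids with the first k points (deterministic).
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments, centroids


# Hypothetical genre profiles as (Soccer, Reality, Music) proportions.
profiles = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.1, 0.8],
    [0.0, 0.2, 0.8],
]

assignments, centroids = kmeans(profiles, k=3)
print(assignments)
```

Each resulting cluster can then be labelled by the genres that dominate its centroid, which is how names such as "Soccer, News, Sports Talk" could be derived.
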
  13. Step 3: Which Subscribers to Target Per Cluster?
      Chart: % Channels Viewed Premium against % Duration Viewed Premium, with Low Engagement and High Engagement regions.
      Deciding on a threshold (chart: Threshold against Recall of Churners): focusing on subscribers who watch less than 30% Premium content and channels allows us to identify 80% of the churning population (those who churn within the next month). This is the 30:30 Rule.
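
The 30:30 Rule can be read as a simple filter whose quality is measured by recall over known churners. The sketch below assumes both percentages must fall under the threshold. The `Subscriber` fields and the sample records are invented for illustration and are not data from the deck:

```python
from dataclasses import dataclass


@dataclass
class Subscriber:
    subscriber_id: str
    pct_premium_duration: float  # % of viewing time spent on premium content
    pct_premium_channels: float  # % of channels viewed that are premium
    churned_next_month: bool


def is_low_engagement(s, threshold=30.0):
    """Assumed reading of the 30:30 Rule: both measures below the threshold."""
    return s.pct_premium_duration < threshold and s.pct_premium_channels < threshold


def churner_recall(subscribers, threshold=30.0):
    """Share of actual churners that the rule flags as low engagement."""
    churners = [s for s in subscribers if s.churned_next_month]
    if not churners:
        return 0.0
    flagged = sum(1 for s in churners if is_low_engagement(s, threshold))
    return flagged / len(churners)


# Invented sample: five churners, four of them below both thresholds.
sample = [
    Subscriber("a", 10.0, 20.0, True),
    Subscriber("b", 25.0, 5.0, True),
    Subscriber("c", 12.0, 29.0, True),
    Subscriber("d", 0.0, 0.0, True),
    Subscriber("e", 55.0, 40.0, True),
    Subscriber("f", 70.0, 60.0, False),
    Subscriber("g", 15.0, 10.0, False),
]

print(churner_recall(sample))
```

Sweeping `threshold` over a range of values and recording the recall at each point would give the kind of threshold curve the slide refers to.
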
  14. Step 4: Build Recommendations per Subscriber (Series)
      Uses a ‘People Like Me’ Collaborative Filtering approach to identify similar programmes based on subscribers who watch programmes together.
      Diagram: a Programmes by Subscribers grid for Subscriber 1, Subscriber 2 and Subscriber 3, with programmes marked as recommended to Subscriber 1 and to Subscriber 2.
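
One simple way to realise a ‘People Like Me’ approach is to count how often two programmes are watched by the same subscriber, then score unseen programmes by their co-watch counts with what the subscriber has already seen. The deck does not give the exact method, so the sketch below is an assumption. The programme names and viewing histories are hypothetical:

```python
from collections import Counter
from itertools import permutations


def co_watch_counts(history):
    """For each programme, count how often every other programme shares a viewer."""
    counts = {}
    for programmes in history.values():
        for a, b in permutations(set(programmes), 2):
            counts.setdefault(a, Counter())[b] += 1
    return counts


def recommend(subscriber, history, top_n=3):
    """Rank programmes the subscriber has not watched by co-watch score."""
    counts = co_watch_counts(history)
    watched = set(history[subscriber])
    scores = Counter()
    for programme in watched:
        for other, n in counts.get(programme, {}).items():
            if other not in watched:
                scores[other] += n
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [programme for programme, _ in ranked[:top_n]]


# Hypothetical viewing histories.
history = {
    "Subscriber 1": ["Programme A", "Programme B"],
    "Subscriber 2": ["Programme A", "Programme B", "Programme C"],
    "Subscriber 3": ["Programme B", "Programme C", "Programme D"],
}

print(recommend("Subscriber 1", history))
print(recommend("Subscriber 2", history))
```

Raw co-watch counts favour popular programmes, so a production system would normally normalise them, for example with cosine or Jaccard similarity.
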
  15. Step 4: Build Recommendations per Subscriber (Movies)
      Similarity of movies watched in the same cluster is computed using a Pearson Correlation metric based on the IMDB features of the movies (Genre, Director, Cast, Rating, etc.).
      Diagram: a Programmes by Subscribers grid for Subscriber 1, Subscriber 2 and Subscriber 3.
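
Pearson correlation between two movies can be computed once their features are encoded as numeric vectors. The encoding below (genre flags, a director flag, a cast flag and a scaled rating) is a hypothetical example, since the deck does not describe how the IMDB features were encoded:

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric vectors."""
    if len(x) != len(y) or not x:
        raise ValueError("vectors must be non-empty and of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variance_x = sum((a - mean_x) ** 2 for a in x)
    variance_y = sum((b - mean_y) ** 2 for b in y)
    if variance_x == 0 or variance_y == 0:
        # Correlation is undefined for a constant vector; treat as no similarity.
        return 0.0
    return covariance / sqrt(variance_x * variance_y)


# Hypothetical encoding:
# [is_drama, is_comedy, is_thriller, same_director_flag, same_lead_actor_flag, rating / 10]
movie_a = [1, 0, 1, 1, 0, 0.8]
movie_b = [1, 0, 1, 1, 1, 0.7]
movie_c = [0, 1, 0, 0, 1, 0.6]

print(pearson(movie_a, movie_b))
print(pearson(movie_a, movie_c))
```

In this toy example `movie_b` correlates positively with `movie_a` and `movie_c` negatively, so `movie_b` would be the better recommendation for someone who watched `movie_a`.
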
  16. Step 5: Apply Business Rules
      Starting from all recommendations:
      • Eliminate previously watched content and content no longer available live or on demand
      • Apply business profitability rules
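
The business rules act as a chain of filters over the candidate recommendations. The deck does not define the profitability rules, so the sketch below stands in a hypothetical per-programme margin with a minimum threshold. All names and values are illustrative:

```python
def apply_business_rules(candidates, watched, available, margin_by_programme, min_margin=0.0):
    """Filter ranked candidates, preserving their original order."""
    kept = []
    for programme in candidates:
        if programme in watched:
            continue  # rule 1: drop previously watched content
        if programme not in available:
            continue  # rule 2: drop content no longer available live or on demand
        if margin_by_programme.get(programme, 0.0) < min_margin:
            continue  # rule 3 (assumed form): drop unprofitable recommendations
        kept.append(programme)
    return kept


# Hypothetical inputs.
candidates = ["Programme A", "Programme B", "Programme C", "Programme D"]
watched = {"Programme A"}
available = {"Programme A", "Programme B", "Programme C"}
margin_by_programme = {"Programme B": 0.4, "Programme C": -0.1}

final = apply_business_rules(candidates, watched, available, margin_by_programme)
print(final)
```
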
  17. QlikView: Behavioural Cluster Dashboard. A dashboard can be created to convey the outputs of advanced analytics.
  18. Next Steps
      Example banner: “We think you’ll like this, Ruth”
      • How effective are personalised recommendations in engaging customers with premium and package-exclusive content?
        o Personalised banner in weekly email
        o Measurement of downspin, Test versus Control
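
A Test versus Control measurement compares the downspin rate of subscribers who received the personalised banner with the rate of those who did not. The sketch below shows the arithmetic only. The group sizes and counts are invented, with the control rate set to the 1.2% monthly figure quoted earlier in the deck:

```python
def downspin_rate(downspun, total):
    """Fraction of a group that downgraded during the measurement period."""
    if total <= 0:
        raise ValueError("group size must be positive")
    return downspun / total


def relative_reduction(test_rate, control_rate):
    """How much lower the test rate is, as a fraction of the control rate."""
    if control_rate == 0:
        raise ValueError("control rate must be non-zero")
    return (control_rate - test_rate) / control_rate


# Invented example: 10,000 subscribers in each group.
control_rate = downspin_rate(120, 10_000)  # 1.2%, no personalised banner
test_rate = downspin_rate(90, 10_000)      # hypothetical result with the banner

reduction = relative_reduction(test_rate, control_rate)
print(f"control {control_rate:.2%}, test {test_rate:.2%}, reduction {reduction:.0%}")
```

A real evaluation would also need random assignment to the two groups and a significance test before attributing the difference to the banner.
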
  19. Netflix AWS Architecture with Teradata DW.a.a.S
  20. NETFLIX Architecture (diagram)
      • Sources: Users, Cassandra
      • Log Collection & ODS: Keystone (Kafka)
      • Storage: Amazon S3, 40PB disk, 3,500 QPD (queries per day)
      • ETL on EMR: Pig, Hive
      • Redshift (three instances shown, marked $$$)
      • Future Analytic Engines
      • DWaaS: 1,100,000 QPD (50,000 analytic), 300TB disk
  22. 100% Open Source SQL Query Engine for the Modern Data Ecosystem
  23. What is Presto?
      Diagram: a Client, a Presto Coordinator and four Presto workers. Example query submitted by the client, joining a MySQL table to a Hive table:
      SELECT u.UserID, count(*) AS ClickCnt
      FROM MySQL.MDM.Users AS u
      JOIN Hive.Web.Clicks AS s ON u.SessID = s.SessID
      GROUP BY u.UserID
      ORDER BY ClickCnt DESC;
  24. What is Presto?
      LOOKS like a Database
      • ANSI SQL compliant
      • Advanced SQL features
      • In-Memory operations
      • ODBC / JDBC drivers
      NOT a Database
      • No persistent store
      • Sources data at runtime
      • Doesn’t run at “relational speed”
      Also, NOT Hadoop
      • Not an Apache Project
      • Daemon based, not MapReduce
      • Typically a stand-alone cluster
      • Hadoop is a large source of data
  25. Why Presto@Netflix? Selection Criteria
      • Petabyte Scale
      • Open Source
      • ANSI Compliant
      • Hadoop-Friendly
      • Running at Facebook
      • Well Designed Java
      • 1 Month to Write S3 API
      • Performance
  26. Presto Use Cases @ Netflix
      Each entry reads: If you need to… / Then try… / However, if… / Then use…
      • Run reports via Tableau or MSTR, or analytics on aggregate data / Teradata / Data needed at a lower grain, or for a longer historical period / Presto
      • Ad hoc interactive exploration on detail data / Presto / Joining 2 big tables, or otherwise doesn’t fit into memory / Hive
      • Long-running queries joining big tables / Hive
      • Sub-second analysis on pre-generated cube structures / Druid / Question falls outside cube definition / Teradata / Presto
      • Run batch ETL in legacy framework / Pig / Building new ETL in future framework / Spark
      • Build new ETL from scratch / Spark / Data size too big / Pig
      • Validate ETL accuracy / Presto / Joining 2 big tables, or otherwise doesn’t fit into memory / Hive
      Slide label: EMR
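
The use-case table on this slide is effectively a lookup: a need maps to a first-choice engine, plus a fallback when the stated caveat applies. A minimal sketch of that lookup follows. The dictionary keys and the `choose_engine` helper are hypothetical names, while the engines and caveats are taken from the slide:

```python
# need -> (first choice, caveat from the slide, engine to use if the caveat applies)
ENGINE_GUIDE = {
    "reports_on_aggregates": ("Teradata", "data needed at a lower grain or longer history", "Presto"),
    "adhoc_exploration": ("Presto", "joining 2 big tables or does not fit into memory", "Hive"),
    "long_running_joins": ("Hive", None, None),
    "sub_second_cube_analysis": ("Druid", "question falls outside cube definition", "Teradata / Presto"),
    "legacy_batch_etl": ("Pig", "building new ETL in future framework", "Spark"),
    "new_etl": ("Spark", "data size too big", "Pig"),
    "validate_etl": ("Presto", "joining 2 big tables or does not fit into memory", "Hive"),
}


def choose_engine(need, caveat_applies=False):
    """Return the engine the slide suggests for a need (hypothetical helper)."""
    first_choice, _caveat, fallback = ENGINE_GUIDE[need]
    if caveat_applies and fallback is not None:
        return fallback
    return first_choice


print(choose_engine("adhoc_exploration"))
print(choose_engine("adhoc_exploration", caveat_applies=True))
```
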
  27. Analytics at Netflix
      Presto
      • Detailed Exploration
        – Network behavior prior to event
        – User segment clustering
        – Historical viewing trends
        – Historic user behavior
        – Program correlation analysis
        – Recommendation validation
        – Predictive production decisions
        – Etc.
      Teradata
      • Enterprise reporting (MicroStrategy)
        – Subscriptions by country
        – Average Minutes per Sitting
        – Errors per 1M streams
        – Monthly profitability by device
      • BI tool exploration & analytics (Tableau)
        – Reasons for quitting mid-stream
        – Seasonal viewing trends by genre
        – Marketing responsiveness
  28. Netflix User Experience: Very positive!
      • ~3,500 queries per day
      • 90% of queries complete in under 1 minute
      • 60% of queries complete in under 5 seconds
      • Integrated into Big Data Portal
      • Easy cluster scaling up/down
      Adoption was rapid and overwhelmingly positive.
  29. Netflix Data Pipeline (diagram)
      • Operational: Cloud Apps, Cassandra, Kafka
      • Load intervals shown: 15 minutes, Daily
      • Storage: Amazon S3
      • Compute: EMR
  30. Netflix Data Pipeline (diagram)
      • Compute: EMR
      • Service: MetaCat
      • Tools: Forklift, Sting, Charlotte, Quinto, Lipstick
      • Tool functions shown: Data Movement, Data Visualization, Data Lineage, Data Quality, Pig Workflow Visualization, Job/Cluster Perf. Visualization
      • API connectors link these to the Big Data Portal
      • Big Data Portal: engine selector (Teradata shown), a query box with the example SELECT * FROM MyTable; and a Submit button
      • Services: Teradata, Presto, EMR, Hive, Spark, Druid
  31. THANK YOU. https://www.linkedin.com/in/george-chiu/
