Using Big Data to Drive Big Engagement

Name: George Chiu
Company: Teradata

Netflix: Using Big Data to Drive Big Engagement
40PB Analytics in AWS
George Chiu, Sr. Industry Consultant
Oct. 2017

• #1 streaming video service
• Started 1998 when Reed Hastings accrued a $40 late fee on “Apollo 11”
• In 2000, Blockbuster Video declined the chance to purchase Netflix for $50M
• Current market cap: $56B
• Teradata customer since 2007
• 86M members in 190 countries
• Stream 132M hrs/day, aka 92K hrs/min, aka 10.5 yrs/min
• 600B events generated daily
• 40PB on AWS S3; read/write 10% daily
• 350 active big data users
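The streaming-rate equivalents on this slide follow from plain unit conversion. A quick check, assuming a 24-hour day and a 365-day year (the slide does not say which conventions it used):

```python
# Figure taken from the slide.
HOURS_STREAMED_PER_DAY = 132_000_000

MINUTES_PER_DAY = 24 * 60
HOURS_PER_YEAR = 24 * 365  # assumption: non-leap year

# Hours of video streamed during each minute of wall-clock time.
hours_per_minute = HOURS_STREAMED_PER_DAY / MINUTES_PER_DAY

# The same quantity expressed as years of video per wall-clock minute.
years_per_minute = hours_per_minute / HOURS_PER_YEAR

print(f"{hours_per_minute / 1000:.0f}K hrs/min")
print(f"{years_per_minute:.1f} yrs/min")
```

Both results round to the values quoted on the slide (92K hrs/min and 10.5 yrs/min).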

Agenda
1. What analytics did Netflix use to drive more engagement?
2. Insights & Approach
3. Netflix Architecture on AWS with Teradata DWaaS

What analytics did Netflix use to drive more engagement?
© 2017 Teradata

Netflix
• Focus is on making it easy to find things to watch
• Spends $150M on data & analytics
➢ 20x more than average
➢ 2% of ARPU
• Processing 400bn interactions daily
• Hundreds of analysts continually deriving new metadata

Differentiate or Disappear
• More content, newer, more exclusive
• Make it easy for customers to find
• Make it easy to watch
• Provide a great service
• Provide relevant, timely and consistent interactions
• Provide flexible packages
Source: https://business.tivo.com/content/dam/tivo/resources/whitepapers/Q3_2016_Video_Trends_Report.pdf

Can we influence customer engagement?
• 1.2% of high-value TV package subscribers downspin each month (+11% on last year)
• Perceived value diminishes when the initial discount ends, at 12 months and beyond
• Subscribers who downspin are not engaged with the content and watch 15% less exclusive/premium TV
• Current marketing is limited, with no 1-to-1 content
Goal: identify at-risk customers and prevent downspin with personalised recommendations

Insights and Approach

Approach
Step 1: Profile Subscriber Viewing Against Genres
Step 2: Create Behavioural Clusters
Step 3: Which Subscribers to Target per Cluster?
Step 4: Build Recommendations per Subscriber
Step 5: Apply Business Rules

Step 1: Profile Subscriber Viewing Against Genres
Identify the proportion of each subscriber's viewing duration that can be attributed to each genre.

Genre            | News | Soccer | Reality | Documentary | Horror | Music | Crime Drama | … | …
Viewing duration | 5    | 10     | 32      | 18          | 1      | 4     | 5           | … | …
Proportion       | 0.07 | 0.13   | 0.43    | 0.24        | 0.01   | 0.05  | 0.07        | … | …

This subscriber mostly watches Reality content (43%), but also likes Documentaries (24%) and Soccer (13%).
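The profiling step amounts to dividing each genre's viewing duration by the subscriber's total. A minimal sketch using the figures from the slide, assuming the elided "…" columns contribute nothing to the total (the function name `genre_proportions` is made up for illustration):

```python
def genre_proportions(viewing):
    """Return each genre's share of the subscriber's total viewing duration."""
    total = sum(viewing.values())
    if total == 0:
        # A subscriber with no viewing gets an all-zero profile.
        return {genre: 0.0 for genre in viewing}
    return {genre: duration / total for genre, duration in viewing.items()}


# Viewing durations from the slide (units are not stated in the source).
viewing = {
    "News": 5,
    "Soccer": 10,
    "Reality": 32,
    "Documentary": 18,
    "Horror": 1,
    "Music": 4,
    "Crime Drama": 5,
}

profile = genre_proportions(viewing)
for genre, share in profile.items():
    print(f"{genre}: {share:.2f}")
```

Because every profile sums to 1, subscribers with very different total viewing time become directly comparable, which is what the clustering step needs.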

Step 2: Create Behavioural Clusters

Cluster # | Genres                        | # Subscribers
0         | Soccer, Drama, News           | 61k
8         | Soccer, News, Sports Talk     | 32k
17        | Reality, Documentary, Ents    | 85k
25        | Music                         | 25k
13        | Crime Drama                   | 28k
21        | Documentary                   | 56k
11        | Children, Animated, Adventure | 56k
15        | Reality                       | 57k
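The slides do not say which clustering algorithm produced these groups. As one plausible illustration, the sketch below runs a small k-means over genre-proportion vectors; the choice of k-means, the three-genre vectors and the sample subscribers are all assumptions made for this example.

```python
import math


def kmeans(points, k, iterations=20):
    """Tiny k-means: returns (cluster index per point, centroids)."""
    # Naive deterministic initialisation: the first k points become centroids.
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid (Euclidean distance).
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Hypothetical profiles: share of viewing in (Soccer, Reality, Documentary).
profiles = [
    (0.8, 0.1, 0.1),
    (0.1, 0.8, 0.1),
    (0.1, 0.1, 0.8),
    (0.7, 0.2, 0.1),
    (0.2, 0.7, 0.1),
    (0.0, 0.2, 0.8),
]

assignment, centroids = kmeans(profiles, k=3)
print(assignment)
```

Each centroid's largest components suggest a label for the cluster, in the same spirit as the "Soccer, News, Sports Talk" style names on the slide.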

Step 3: Which Subscribers to Target Per Cluster?
[Chart: % Duration Viewed Premium against % Channels Viewed Premium, divided into Low Engagement and High Engagement regions]
Deciding on a threshold:
[Chart: Recall of Churners against Threshold]
30:30 Rule: focusing on subscribers who watch less than 30% Premium content and channels allows us to identify 80% of the churning population (those who churn within the next month).
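The 30:30 rule and the recall figure it is judged by can be expressed in a few lines. This is a sketch under assumptions: the rule is read as requiring both percentages to be below the threshold, and the field names and sample subscribers are hypothetical.

```python
def is_low_engagement(pct_duration_premium, pct_channels_premium, threshold=30.0):
    """30:30 rule: both premium percentages fall below the threshold."""
    return pct_duration_premium < threshold and pct_channels_premium < threshold


def recall_of_churners(subscribers, threshold=30.0):
    """Fraction of actual churners that the rule flags as low engagement."""
    churners = [s for s in subscribers if s["churned"]]
    if not churners:
        return 0.0
    flagged = [
        s for s in churners
        if is_low_engagement(s["pct_duration"], s["pct_channels"], threshold)
    ]
    return len(flagged) / len(churners)


# Hypothetical sample: five churners and two retained subscribers.
subscribers = [
    {"pct_duration": 10, "pct_channels": 20, "churned": True},
    {"pct_duration": 25, "pct_channels": 15, "churned": True},
    {"pct_duration": 5, "pct_channels": 29, "churned": True},
    {"pct_duration": 12, "pct_channels": 8, "churned": True},
    {"pct_duration": 60, "pct_channels": 45, "churned": True},
    {"pct_duration": 70, "pct_channels": 55, "churned": False},
    {"pct_duration": 20, "pct_channels": 10, "churned": False},
]

print(recall_of_churners(subscribers))
```

Sweeping `threshold` over a range of values and plotting the result would reproduce the kind of recall-versus-threshold curve the slide uses to settle on 30%.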

Step 4: Build Recommendations per Subscriber (Series)
[Diagram: Programmes linked to Subscriber 1, Subscriber 2 and Subscriber 3, with programmes recommended to Subscriber 1 and to Subscriber 2]
Uses a ‘People Like Me’ collaborative filtering approach to identify similar programmes, based on which programmes are watched by the same subscribers.
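One simple way to realise a "people like me" recommender is to count how often two programmes are watched by the same subscriber and then suggest the programmes that co-occur most with what a subscriber has already seen. The sketch below takes that co-occurrence approach as an assumption; the slides do not specify the exact scoring, and the viewing histories are invented.

```python
from collections import Counter
from itertools import combinations


def co_watch_counts(history):
    """Count, for each ordered programme pair, how many subscribers watched both."""
    counts = Counter()
    for programmes in history.values():
        for a, b in combinations(sorted(programmes), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def recommend(subscriber, history, top_n=3):
    """Rank unwatched programmes by co-occurrence with the subscriber's history."""
    watched = history[subscriber]
    scores = Counter()
    for (seen, candidate), n in co_watch_counts(history).items():
        if seen in watched and candidate not in watched:
            scores[candidate] += n
    return [programme for programme, _ in scores.most_common(top_n)]


# Hypothetical viewing histories.
history = {
    "Subscriber 1": {"Show A", "Show B"},
    "Subscriber 2": {"Show A", "Show B", "Show C"},
    "Subscriber 3": {"Show B", "Show C", "Show D"},
}

print(recommend("Subscriber 1", history))
```

Here Subscriber 1 is offered Show C ahead of Show D, because Show C is watched alongside both of their programmes by other subscribers.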

Step 4: Build Recommendations per Subscriber (Movies)
[Diagram: Programmes linked to Subscriber 1, Subscriber 2 and Subscriber 3]
Similarity of movies watched in the same cluster is computed using a Pearson correlation metric based on the IMDb features of the movies (genre, director, cast, rating, etc.).
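Pearson correlation between two movies can be computed once each movie is encoded as a numeric feature vector. The encoding below (binary genre, director and cast flags plus a rating scaled to 0..1) is a hypothetical choice for illustration; the slides do not describe how the IMDb features were encoded.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        # A constant vector has no defined correlation; treat as unrelated.
        return 0.0
    return cov / (spread_x * spread_y)


# Hypothetical features: [is_drama, is_comedy, director_x, lead_actor_y, rating_0_to_1]
movie_a = [1, 0, 1, 1, 0.8]
movie_b = [1, 0, 1, 0, 0.7]
movie_c = [0, 1, 0, 0, 0.6]

print(pearson(movie_a, movie_b))
print(pearson(movie_a, movie_c))
```

With this encoding, movie_b correlates positively with movie_a (shared genre and director) while movie_c correlates negatively, so movie_b would rank higher as a recommendation for someone who watched movie_a.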

Step 5: Apply Business Rules
Starting from all recommendations:
• Eliminate previously watched content and content no longer available live or on demand
• Apply business profitability rules
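The rule stage is essentially a filter over the candidate list. A minimal sketch, assuming each recommendation carries a `title` and a `margin` field and that "profitability" means a minimum margin; these field names and the margin rule are hypothetical, since the slide does not define the business rules.

```python
def apply_business_rules(recommendations, watched, available, min_margin=0.0):
    """Keep only unwatched, still-available, sufficiently profitable titles."""
    return [
        rec for rec in recommendations
        if rec["title"] not in watched          # drop previously watched content
        and rec["title"] in available           # drop content no longer offered
        and rec["margin"] >= min_margin         # hypothetical profitability rule
    ]


# Hypothetical candidate list for one subscriber.
candidates = [
    {"title": "Show A", "margin": 0.30},
    {"title": "Show B", "margin": 0.25},
    {"title": "Show C", "margin": 0.05},
    {"title": "Show D", "margin": 0.40},
]

final = apply_business_rules(
    candidates,
    watched={"Show A"},
    available={"Show A", "Show B", "Show C"},
    min_margin=0.10,
)
print([rec["title"] for rec in final])
```

Keeping the rules separate from the recommender means they can change (for example when content rights expire) without retraining anything upstream.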

QlikView: Behavioural Cluster Dashboard
A dashboard can be created to convey the outputs of advanced analytics.

Next Steps
Example message: “We think you’ll like this, Ruth”
• How effective are personalised recommendations in engaging customers with premium and package-exclusive content?
o Personalised banner in weekly email
o Measurement of downspin: Test versus Control
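A test-versus-control comparison of downspin comes down to comparing two rates. The sketch below uses invented group sizes and counts purely to show the calculation (the control rate is set to the 1.2% monthly baseline quoted earlier in the deck); it does not report any actual result.

```python
def downspin_rate(downspun, total):
    """Share of a group that downspun during the measurement period."""
    return downspun / total


def relative_reduction(test_rate, control_rate):
    """How much lower the test rate is, as a fraction of the control rate."""
    return (control_rate - test_rate) / control_rate


# Hypothetical groups: control gets the normal email, test gets the personalised banner.
control_rate = downspin_rate(downspun=120, total=10_000)
test_rate = downspin_rate(downspun=90, total=10_000)

print(f"control: {control_rate:.2%}, test: {test_rate:.2%}")
print(f"relative reduction: {relative_reduction(test_rate, control_rate):.0%}")
```

In practice the difference would also need a significance check before being attributed to the recommendations, which this sketch leaves out.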

Netflix AWS Architecture with Teradata DWaaS

NETFLIX Architecture
[Diagram components]
• Users
• Cassandra
• Log Collection & ODS: Keystore (Kafka)
• Amazon S3: 40PB disk, 3,500 QPD
• ETL on EMR: Pig, Hive
• Redshift ($$$)
• Future analytic engines
• DWaaS: 1,100,000 QPD (50,000 analytic), 300TB disk

Presto: 100% Open Source SQL Query Engine for the Modern Data Ecosystem

What is Presto?
[Diagram: Client, Presto Coordinator, four Presto workers]
Example of one query joining two different sources:
SELECT u.UserID,
       count(*) AS ClickCnt
FROM MySQL.MDM.Users AS u
JOIN Hive.Web.Clicks AS s
  ON u.SessID = s.SessID
GROUP BY u.UserID
ORDER BY ClickCnt DESC;

What is Presto?
LOOKS like a Database
• ANSI SQL compliant
• Advanced SQL features
• In-memory operations
• ODBC / JDBC drivers
NOT a Database
• No persistent store
• Sources data at runtime
• Doesn’t run at “relational speed”
Also, NOT Hadoop
• Not an Apache project
• Daemon based, not MapReduce
• Typically a stand-alone cluster
• Hadoop is a large source of data

Why Presto@Netflix?
Selection Criteria
• Petabyte scale
• Open source
• ANSI compliant
• Hadoop-friendly
• Running at Facebook
• Well-designed Java
• 1 month to write S3 API
• Performance

Presto Use Cases @ Netflix

If you need to… | Then try… | However, if… | Then use…
Run reports via Tableau or MSTR, or analytics on aggregate data | Teradata | Data needed at a lower grain, or for longer historical period | Presto
Ad hoc interactive exploration on detail data | Presto | Joining 2 big tables, or otherwise doesn’t fit into memory | Hive
Long-running queries joining big tables | Hive | |
Sub-second analysis on pre-generated cube structures | Druid | Question falls outside cube definition | Teradata / Presto
Run batch ETL in legacy framework | Pig | Building new ETL in future framework | Spark
Build new ETL from scratch | Spark | Data size too big | Pig
Validate ETL accuracy | Presto | Joining 2 big tables, or otherwise doesn’t fit into memory | Hive

(Slide label: EMR)
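The use-case table reads naturally as a lookup: pick a first-choice engine for the task, then switch to the alternative if the "However, if…" caveat applies. A small sketch of that logic; the dictionary keys and the function name are hypothetical labels invented here, while the engine pairings come from the table.

```python
# need -> (first choice, alternative when the "However, if..." caveat applies)
ENGINE_ROUTES = {
    "reporting_on_aggregates": ("Teradata", "Presto"),
    "adhoc_detail_exploration": ("Presto", "Hive"),
    "long_running_big_joins": ("Hive", None),  # the table lists no alternative
    "subsecond_cube_analysis": ("Druid", "Teradata / Presto"),
    "legacy_batch_etl": ("Pig", "Spark"),
    "new_etl": ("Spark", "Pig"),
    "validate_etl": ("Presto", "Hive"),
}


def pick_engine(need, caveat_applies=False):
    """Return the engine suggested by the table for a given need."""
    first_choice, alternative = ENGINE_ROUTES[need]
    if caveat_applies and alternative is not None:
        return alternative
    return first_choice


print(pick_engine("adhoc_detail_exploration"))
print(pick_engine("adhoc_detail_exploration", caveat_applies=True))
```

For ad hoc exploration this returns Presto by default and Hive once the query involves joining two big tables or no longer fits in memory, matching the second row of the table.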

Analytics at Netflix
Presto
• Detailed exploration
– Network behavior prior to event
– User segment clustering
– Historical viewing trends
– Historic user behavior
– Program correlation analysis
– Recommendation validation
– Predictive production decisions
– Etc.
Teradata
• Enterprise reporting (MicroStrategy)
– Subscriptions by country
– Average minutes per sitting
– Errors per 1M streams
– Monthly profitability by device
• BI tool exploration & analytics (Tableau)
– Reasons for quitting mid-stream
– Seasonal viewing trends by genre
– Marketing responsiveness

Netflix User Experience
Very positive!
• ~3,500 queries per day
• 90% of queries complete in under 1 minute
• 60% of queries complete in under 5 seconds
• Integrated into Big Data Portal
• Easy cluster scaling up/down
Adoption was rapid and overwhelmingly positive

Netflix Data Pipeline
[Diagram components]
• Operational: Cloud Apps, Cassandra, Kafka
• Intervals shown: 15 minutes, Daily
• Storage: Amazon S3
• Compute: EMR

Netflix Data Pipeline
[Diagram components]
• Compute: EMR
• Service: MetaCat
• Tools (each exposed through an API):
– Forklift: Data Movement
– Sting: Data Visualization
– Charlotte: Data Lineage
– Quinto: Data Quality
– Lipstick: Pig Workflow Visualization
– Job Cluster Perf. Visualization
• Big Data Portal: engine selector (Teradata shown) with a query box, e.g. SELECT * FROM MyTable; and a Submit button
• Services available through the portal: Teradata, Presto, EMR Hive, Spark, Druid

https://www.linkedin.com/in/george-chiu/
THANK YOU
