Speaker: George Chiu (邱志威), Sr. Industry Consultant, Teradata
Learn how Netflix engages customers by leveraging Teradata as a critical component of its data and analytics platform to create a data-driven, customer-focused business.
1. Using Big Data to Drive Big Engagement
Name: George Chiu
Company: Teradata
2. Netflix: Using Big Data to Drive Big Engagement
40PB Analytics in AWS
George Chiu, Sr. Industry Consultant
Oct. 2017
3.
• #1 streaming video service
• Started in 1998, when Reed Hastings accrued a $40 late fee on “Apollo 13”
• In 2000, Blockbuster Video declined the chance to purchase Netflix for $50M
• Current market cap: $56B
• Teradata customer since 2007
• 86M members in 190 countries
• Streams 132M hrs/day, aka 92K hrs/min, aka 10.5 yrs/min
• 600B events generated daily
• 40PB on AWS S3; read/write 10% daily
• 350 active big data users
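The streaming figures on this slide are unit conversions of one another, so they can be checked with a few lines of arithmetic. The sketch below does that, and also derives two numbers that are not on the slide: events per second, and the daily read/write volume. The latter assumes "read/write 10% daily" means 10% of the 40PB stored, which is an interpretation rather than something the slide states.

```python
# Sanity-check the unit conversions quoted on the slide.
HOURS_STREAMED_PER_DAY = 132e6      # from the slide
MINUTES_PER_DAY = 24 * 60
HOURS_PER_YEAR = 24 * 365

hours_per_minute = HOURS_STREAMED_PER_DAY / MINUTES_PER_DAY
years_per_minute = hours_per_minute / HOURS_PER_YEAR

# Derived figures (not on the slide).
EVENTS_PER_DAY = 600e9              # from the slide
events_per_second = EVENTS_PER_DAY / (24 * 60 * 60)

STORED_PB = 40                      # from the slide
# Assumption: "10% daily" refers to 10% of the stored 40PB.
daily_read_write_pb = STORED_PB * 0.10

print(f"{hours_per_minute:,.0f} hrs/min")        # roughly 92K
print(f"{years_per_minute:.1f} yrs/min")         # roughly 10.5
print(f"{events_per_second:,.0f} events/sec")
print(f"{daily_read_write_pb:.0f} PB read/written per day")
```

Both conversions agree with the rounded values on the slide.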
4. Agenda
1. What analytics did Netflix use to drive more engagement?
2. Insights & Approach
3. Netflix architecture on AWS with Teradata DWaaS
23. What is Presto?
[Diagram: a Client, one Presto Coordinator, and four Presto workers]
SELECT u.UserID,
       count(*) AS ClickCnt
FROM MySQL.MDM.Users AS u
JOIN Hive.Web.Clicks AS s
  ON u.SessID = s.SessID
GROUP BY u.UserID
ORDER BY ClickCnt DESC;
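To make the coordinator/worker picture concrete, here is a toy Python sketch of the same query shape: a join between a small "users" source and a larger "clicks" source, with the clicks split across four workers and the partial counts merged by a coordinator. This is only an illustration of the pattern, not how Presto is implemented. The in-memory lists stand in for the MySQL and Hive tables, all sample rows are made up, and the sketch assumes each SessID belongs to exactly one user.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for the two sources in the slide's query.
USERS = [  # plays the role of MySQL.MDM.Users
    {"UserID": "u1", "SessID": "s1"},
    {"UserID": "u1", "SessID": "s2"},
    {"UserID": "u2", "SessID": "s3"},
]
CLICKS = [  # plays the role of Hive.Web.Clicks
    {"SessID": "s1"}, {"SessID": "s1"}, {"SessID": "s2"},
    {"SessID": "s3"}, {"SessID": "s3"}, {"SessID": "s3"}, {"SessID": "s3"},
    {"SessID": "s9"},  # no matching user, so the inner join drops it
]


def worker(click_split, sess_to_user):
    """Join one split of clicks to users; return partial counts per UserID."""
    partial = Counter()
    for click in click_split:
        user = sess_to_user.get(click["SessID"])
        if user is not None:
            partial[user] += 1
    return partial


def coordinator(users, clicks, n_workers=4):
    """Split the clicks, fan out to workers, then merge and sort the counts."""
    # Assumes one user per session, so the small side fits in a lookup dict.
    sess_to_user = {u["SessID"]: u["UserID"] for u in users}
    splits = [clicks[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(worker, splits, [sess_to_user] * n_workers))
    total = Counter()
    for partial in partials:
        total.update(partial)
    # Equivalent of ORDER BY ClickCnt DESC.
    return sorted(total.items(), key=lambda kv: kv[1], reverse=True)


print(coordinator(USERS, CLICKS))
```

The result does not depend on the number of workers, since each worker only produces partial counts that the coordinator sums.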
24. What is Presto?
LOOKS like a Database
• ANSI SQL compliant
• Advanced SQL features
• In-memory operations
• ODBC / JDBC drivers
NOT a Database
• No persistent store
• Sources data at runtime
• Doesn’t run at “relational speed”
Also, NOT Hadoop
• Not an Apache project
• Daemon-based, not MapReduce
• Typically a stand-alone cluster
• Hadoop is a large source of data
25. Why Presto@Netflix?
Selection Criteria
• Petabyte Scale
• Open Source
• ANSI Compliant
• Hadoop-Friendly
• Running in production at Facebook
• Well-designed Java code
• S3 API written in 1 month
• Performance
26. Presto Use Cases @ Netflix
| If you need to… | Then try… | However, if… | Then use… |
|---|---|---|---|
| Run reports via Tableau or MSTR, or analytics on aggregate data | Teradata | Data needed at a lower grain, or for a longer historical period | Presto |
| Ad hoc interactive exploration on detail data | Presto | Joining 2 big tables, or otherwise doesn’t fit into memory | Hive |
| Long-running queries joining big tables | Hive | | |
| Sub-second analysis on pre-generated cube structures | Druid | Question falls outside cube definition | Teradata / Presto |
| Run batch ETL in legacy framework | Pig | Building new ETL in future framework | Spark |
| Build new ETL from scratch | Spark | Data size too big | Pig |
| Validate ETL accuracy | Presto | Joining 2 big tables, or otherwise doesn’t fit into memory | Hive |
EMR
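The use-case table reads as a two-step decision rule: pick a first-choice engine for the task, then switch to the alternative if the "However, if…" condition holds. A minimal Python sketch of that rule follows. The short task keys and the function name are hypothetical labels invented for this example; only the engine names come from the slide.

```python
# Rows transcribed from the slide: task -> (first choice, alternative).
# The task keys are hypothetical labels invented for this sketch.
USE_CASES = {
    "reporting_on_aggregates": ("Teradata", "Presto"),
    "adhoc_detail_exploration": ("Presto", "Hive"),
    "long_running_big_joins": ("Hive", None),  # the slide lists no alternative
    "subsecond_cube_analysis": ("Druid", "Teradata / Presto"),
    "legacy_batch_etl": ("Pig", "Spark"),
    "new_etl_from_scratch": ("Spark", "Pig"),
    "validate_etl_accuracy": ("Presto", "Hive"),
}


def pick_engine(need, caveat_applies=False):
    """Return the engine the slide suggests for a task.

    caveat_applies means the row's "However, if..." condition is true,
    for example the join no longer fits into memory.
    """
    first_choice, alternative = USE_CASES[need]
    if caveat_applies and alternative is not None:
        return alternative
    return first_choice


print(pick_engine("adhoc_detail_exploration"))                       # Presto
print(pick_engine("adhoc_detail_exploration", caveat_applies=True))  # Hive
```

Keeping the rule as a lookup table mirrors the slide: adding or changing a recommendation means editing one row rather than branching logic.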
27. Analytics at Netflix
Presto
• Detailed exploration
– Network behavior prior to event
– User segment clustering
– Historical viewing trends
– Historical user behavior
– Program correlation analysis
– Recommendation validation
– Predictive production decisions
– Etc.
Teradata
• Enterprise reporting (MicroStrategy)
– Subscriptions by country
– Average minutes per sitting
– Errors per 1M streams
– Monthly profitability by device
• BI tool exploration & analytics (Tableau)
– Reasons for quitting mid-stream
– Seasonal viewing trends by genre
– Marketing responsiveness
28. Netflix User Experience
Very positive!
• ~3500 queries per day
• 90% of queries complete in under 1 minute
• 60% of queries complete in under 5 seconds
• Integrated into Big Data Portal
• Easy cluster scaling up/down
Adoption was rapid and overwhelmingly positive
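The percentages above translate into rough daily query counts. The short sketch below works them out, treating the slide's "~3500" as exactly 3500 for the sake of the arithmetic, so the results are approximate.

```python
QUERIES_PER_DAY = 3500  # the slide says "~3500", so all results are approximate

under_1_min = QUERIES_PER_DAY * 90 // 100   # 90% finish in under 1 minute
under_5_sec = QUERIES_PER_DAY * 60 // 100   # 60% finish in under 5 seconds

# Queries taking between 5 seconds and 1 minute, and those taking longer.
between_5s_and_1min = under_1_min - under_5_sec
over_1_min = QUERIES_PER_DAY - under_1_min

print(under_5_sec, between_5s_and_1min, over_1_min)
```

Under that assumption, about 2,100 queries a day return within 5 seconds, about 1,050 take between 5 seconds and a minute, and about 350 run longer than a minute.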
30. Netflix Data Pipeline
[Diagram]
Compute: EMR
Service: MetaCat
Tools: Forklift, Sting, Charlotte, Quinto, Lipstick
Tool functions: Data Movement, Data Visualization, Data Lineage, Data Quality, Pig Workflow Visualization, Job/Cluster Perf. Visualization
Components connect through APIs to the Big Data Portal.
Big Data Portal screen: engine selector set to Teradata, query box containing "SELECT * FROM MyTable;", and a Submit button.
Services (marked ✓): Teradata, Presto, EMR Hive, Spark, Druid