Using Big Data to Drive Big Engagement
Name: George Chiu
Company: Teradata
Netflix: Using Big Data to Drive Big Engagement
40PB Analytics in AWS
George Chiu, Sr. Industry Consultant
Oct. 2017
• #1 streaming video service
• Started 1998, when Reed Hastings accrued a $40 late fee on “Apollo 13”
• In 2000, Blockbuster Video declined the chance to purchase Netflix for $50M
• Current market cap: $56B
• Teradata customer since 2007
• 86M members in 190 countries
• Stream 132M hrs/day, aka 92K hrs/min, aka 10.5 yrs/min
• 600B events generated daily
• 40PB on AWS-S3; read/write 10% daily
• 350 active big data users
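The three streaming figures above are unit conversions of the same number. A quick check, assuming a 365-day year (the slide does not state how the years were counted):

```python
# 132M hrs/day is the figure from the slide; everything else is derived from it.
HOURS_STREAMED_PER_DAY = 132_000_000

MINUTES_PER_DAY = 24 * 60
HOURS_PER_YEAR = 24 * 365  # assumption: non-leap year

hours_per_minute = HOURS_STREAMED_PER_DAY / MINUTES_PER_DAY
years_per_minute = hours_per_minute / HOURS_PER_YEAR

print(f"{hours_per_minute:,.0f} hrs/min")  # rounds to roughly 92K
print(f"{years_per_minute:.1f} yrs/min")
```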
Agenda
1. What analytics did Netflix use to drive more engagement?
2. Insights & Approach
3. Netflix Architecture on AWS with Teradata DW.a.a.S.
5 © 2017 Teradata
What analytics did Netflix use to drive more engagement?
Netflix
• Focus is on making it easy to find things to watch
• Spends $150M on data & analytics
➢ 20x more than average
➢ 2% of ARPU
• Processes 400bn interactions daily
• Hundreds of analysts continually deriving new metadata
Differentiate or Disappear
• More content, newer, more exclusive
• Make it easy for customers to find
• Make it easy to watch
• Provide a great service
• Provide relevant, timely and consistent interactions
• Provide flexible packages
https://business.tivo.com/content/dam/tivo/resources/whitepapers/Q3_2016_Video_Trends_Report.pdf
Can we influence customer engagement?
• 1.2% of high-value TV package subscribers downspin each month (+11% on last year)
• Perceived value diminishes when the initial discount ends, at 12 months and beyond
• Subscribers who downspin are not engaged with the content and watch 15% less exclusive/premium TV
• Current marketing is limited, with no one-to-one (1:1) content
Identify at-risk customers and prevent downspin with personalised recommendations
Insights and Approach
Approach
Step 1: Profile Subscriber Viewing Against Genres
Step 2: Create Behavioural Clusters
Step 3: Which Subscribers to Target per Cluster?
Step 4: Build Recommendations per Subscriber
Step 5: Apply Business Rules
Step 1: Profile Subscriber Viewing Against Genres
Viewing duration per genre:
News | Soccer | Reality | Documentary | Horror | Music | Crime Drama | … | …
5 | 10 | 32 | 18 | 1 | 4 | 5 | … | …
Proportion of viewing duration per genre:
News | Soccer | Reality | Documentary | Horror | Music | Crime Drama | … | …
0.07 | 0.13 | 0.43 | 0.24 | 0.01 | 0.05 | 0.07 | … | …
Identify the proportion of each subscriber's viewing duration that can be attributed to each genre.
This subscriber mostly watches Reality content (43%), but also likes Documentaries (24%) and Soccer (13%).
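The profiling step is a row normalisation: each genre's duration divided by the subscriber's total. A minimal sketch using the seven values visible on the slide; the elided columns ("…") are treated as zero here, which is an assumption, and the function name is illustrative:

```python
def genre_proportions(durations):
    """Return each genre's share of a subscriber's total viewing duration."""
    total = sum(durations.values())
    if total == 0:
        # A subscriber with no viewing gets an all-zero profile.
        return {genre: 0.0 for genre in durations}
    return {genre: duration / total for genre, duration in durations.items()}

# Durations of the example subscriber on the slide (units are not stated there).
example = {
    "News": 5, "Soccer": 10, "Reality": 32, "Documentary": 18,
    "Horror": 1, "Music": 4, "Crime Drama": 5,
}

profile = genre_proportions(example)
for genre, share in profile.items():
    print(f"{genre}: {share:.2f}")
```

Rounded to two decimals, the shares reproduce the second row of the slide's table.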
Step 2: Create Behavioural Clusters
Cluster # | Genres | # Subscribers
0 | Soccer, Drama, News | 61k
8 | Soccer, News, Sports Talk | 32k
11 | Children, Animated, Adventure | 56k
13 | Crime Drama | 28k
15 | Reality | 57k
17 | Reality, Documentary, Ents | 85k
21 | Documentary | 56k
25 | Music | 25k
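The slides do not say which clustering algorithm produced these groups. As one possible illustration, the sketch below runs a plain k-means over genre-proportion profiles; the algorithm choice, the deterministic seeding, and the four sample profiles are all assumptions made for the example:

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means on equal-length tuples; the first k points seed the centroids."""
    centroids = [points[i] for i in range(k)]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign each profile to its nearest centroid (Euclidean distance).
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignment, centroids

# Hypothetical profiles: (Soccer, Reality, Documentary) shares per subscriber.
profiles = [
    (0.80, 0.10, 0.10),
    (0.10, 0.70, 0.20),
    (0.75, 0.15, 0.10),
    (0.05, 0.60, 0.35),
]

labels, centres = kmeans(profiles, k=2)
print(labels)
```

A cluster can then be named after the genres with the largest values in its centroid, which is how labels such as "Reality, Documentary, Ents" could be derived.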
Step 3: Which Subscribers to Target Per Cluster?
[Chart: % Duration Viewed Premium and % Channels Viewed Premium, with subscribers ranging from Low Engagement to High Engagement]
Deciding on a threshold:
[Chart: Recall of Churners by Threshold]
30:30 Rule
Focusing on subscribers who watch less than 30% premium content and channels allows us to identify 80% of the churning population (those who churn within the next month).
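The threshold choice can be read as a recall calculation: of the subscribers who actually churned, what share does the rule flag? A minimal sketch, assuming the 30:30 rule means below 30% on both measures; the field names and the five sample subscribers are hypothetical:

```python
def is_low_engagement(pct_channels_premium, pct_duration_premium, threshold=0.30):
    """30:30 rule as read here: below the threshold on both premium measures."""
    return pct_channels_premium < threshold and pct_duration_premium < threshold

def recall_of_churners(subscribers, threshold=0.30):
    """Share of actual churners that the rule flags as low engagement."""
    churners = [s for s in subscribers if s["churned"]]
    if not churners:
        return 0.0
    flagged = [
        s for s in churners
        if is_low_engagement(s["channels"], s["duration"], threshold)
    ]
    return len(flagged) / len(churners)

# Hypothetical subscribers, made up for illustration only.
subscribers = [
    {"channels": 0.10, "duration": 0.05, "churned": True},
    {"channels": 0.25, "duration": 0.20, "churned": True},
    {"channels": 0.50, "duration": 0.45, "churned": True},
    {"channels": 0.15, "duration": 0.10, "churned": False},
    {"channels": 0.60, "duration": 0.70, "churned": False},
]

recall = recall_of_churners(subscribers)
print(f"recall of churners at 30%: {recall:.2f}")
```

Sweeping `threshold` over a range of values and plotting the result gives a recall-by-threshold curve of the kind the slide refers to.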
Step 4: Build Recommendations per Subscriber (Series)
[Diagram: Programmes linked to the Subscribers who watch them (Subscriber 1, Subscriber 2, Subscriber 3), with programmes recommended to Subscriber 1 and to Subscriber 2]
Uses a ‘People Like Me’ Collaborative Filtering approach to identify similar programmes based on subscribers who watch programmes together.
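One simple way to express "programmes watched together" is a co-watch count between programme pairs. The sketch below is an illustration of that idea, not the deck's actual implementation; the scoring rule and the viewing history are assumptions:

```python
from collections import defaultdict
from itertools import combinations

def co_watch_counts(history):
    """Count, for every pair of programmes, how many subscribers watched both."""
    counts = defaultdict(int)
    for programmes in history.values():
        for a, b in combinations(sorted(programmes), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts

def recommend(subscriber, history, top_n=3):
    """Rank unseen programmes by how often they are co-watched with seen ones."""
    counts = co_watch_counts(history)
    seen = history[subscriber]
    scores = defaultdict(int)
    for (a, b), n in counts.items():
        if a in seen and b not in seen:
            scores[b] += n
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_n]

# Hypothetical viewing history.
history = {
    "Subscriber 1": {"Series A", "Series B"},
    "Subscriber 2": {"Series A", "Series B", "Series C"},
    "Subscriber 3": {"Series B", "Series C", "Series D"},
}

print(recommend("Subscriber 1", history))
```

Raw counts favour popular programmes; a production system would usually normalise them, for example with a cosine or Jaccard measure.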
Step 4: Build Recommendations per Subscriber (Movies)
[Diagram: Programmes and Subscribers (Subscriber 1, Subscriber 2, Subscriber 3)]
Similarity of movies watched in the same cluster is computed using a Pearson Correlation metric based on the IMDB features of the movies (Genre, Director, Cast, Rating, etc.).
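The Pearson correlation between two movies can be computed once each movie is encoded as a numeric feature vector. How the deck encoded the IMDB features is not stated, so the encoding below (binary flags plus a scaled rating) and the three movies are hypothetical:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant vector
    return cov / math.sqrt(var_x * var_y)

# Hypothetical encoding: [is_drama, is_comedy, director_X, lead_actor_Y, rating / 10]
movies = {
    "Movie A": [1, 0, 1, 1, 0.8],
    "Movie B": [1, 0, 1, 0, 0.7],
    "Movie C": [0, 1, 0, 0, 0.4],
}

for other in ("Movie B", "Movie C"):
    print(other, round(pearson(movies["Movie A"], movies[other]), 3))
```

Movie B shares genre and director with Movie A and so correlates positively with it, while Movie C correlates negatively.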
Step 5: Apply Business Rules
All Recommendations
• Eliminate previously watched content & content no longer available live or on demand
• Apply business profitability rules
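The rules above act as successive filters over the ranked recommendation list. A minimal sketch; the slide does not define the profitability rule, so the `margin` field and `min_margin` cut-off are hypothetical stand-ins:

```python
def apply_business_rules(recommendations, watched, available, min_margin=0.0):
    """Filter a ranked recommendation list, preserving its order."""
    kept = []
    for rec in recommendations:
        if rec["title"] in watched:
            continue  # previously watched
        if rec["title"] not in available:
            continue  # no longer available live or on demand
        if rec["margin"] < min_margin:
            continue  # hypothetical profitability rule
        kept.append(rec["title"])
    return kept

# Hypothetical ranked recommendations for one subscriber.
recommendations = [
    {"title": "Series C", "margin": 0.4},
    {"title": "Series D", "margin": 0.1},
    {"title": "Movie E", "margin": 0.5},
    {"title": "Movie F", "margin": 0.3},
]

final = apply_business_rules(
    recommendations,
    watched={"Series C"},
    available={"Series C", "Series D", "Movie F"},
    min_margin=0.2,
)
print(final)
```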
QlikView: Behavioural Cluster Dashboard
A dashboard can be created to convey the outputs of advanced analytics.
Next Steps
[Sample email banner: “We think you’ll like this, Ruth”]
• How effective are personalised recommendations in engaging customers with premium and package exclusive content?
o Personalised banner in weekly email
o Measurement of downspin: Test versus Control
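The test-versus-control measurement comes down to comparing downspin rates between the group that received the personalised banner and the group that did not. A sketch with made-up counts; the group sizes and outcomes below are purely illustrative, not results from the deck:

```python
def downspin_rate(downspun, total):
    """Share of a group that downspun during the measurement period."""
    return downspun / total

# Hypothetical counts for one month (not results from the deck).
test_rate = downspin_rate(downspun=90, total=10_000)      # received the banner
control_rate = downspin_rate(downspun=120, total=10_000)  # did not

absolute_reduction = control_rate - test_rate
relative_reduction = absolute_reduction / control_rate

print(f"test {test_rate:.2%}, control {control_rate:.2%}")
print(f"relative reduction in downspin: {relative_reduction:.0%}")
```

With real data, a significance test on the two proportions would be needed before attributing the difference to the recommendations.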
Netflix AWS Architecture with Teradata DW.a.a.S
NETFLIX Architecture
[Architecture diagram. Labelled components: Users, Cassandra, Log Collection & ODS, Keystone (Kafka), Amazon S3, Pig, Hive, EMR, ETL, Redshift ($$$), DWaaS, Future Analytic Engines]
DWaaS: 1,100,000 QPD (queries per day), of which 50,000 analytic; 300TB disk
Amazon S3: 3,500 QPD; 40PB disk
100% Open Source SQL Query Engine for the Modern Data Ecosystem
What is Presto?
[Diagram: Client → Presto Coordinator → Presto workers (four shown)]
SELECT u.UserID,
       count(*) AS ClickCnt
FROM MySQL.MDM.Users AS u
JOIN Hive.Web.Clicks AS s
  ON u.SessID = s.SessID
GROUP BY u.UserID
ORDER BY ClickCnt DESC;
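The query joins a table from one source (MySQL) with a table from another (Hive) and counts clicks per user. The sketch below reproduces only the query logic, using Python's built-in sqlite3 module as a stand-in for both sources; it does not connect to Presto, and the table contents are hypothetical:

```python
import sqlite3

# In-memory stand-in: both tables live in one SQLite database here,
# whereas Presto would read them from separate catalogs at query time.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users  (UserID INTEGER, SessID TEXT);
    CREATE TABLE Clicks (SessID TEXT, Url TEXT);
    INSERT INTO Users  VALUES (1, 's1'), (2, 's2');
    INSERT INTO Clicks VALUES ('s1', '/a'), ('s1', '/b'), ('s1', '/c'), ('s2', '/a');
""")

rows = conn.execute("""
    SELECT u.UserID, count(*) AS ClickCnt
    FROM Users AS u
    JOIN Clicks AS s ON u.SessID = s.SessID
    GROUP BY u.UserID
    ORDER BY ClickCnt DESC
""").fetchall()

for user_id, click_count in rows:
    print(user_id, click_count)
```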
What is Presto?
LOOKS like a Database
• ANSI SQL compliant
• Advanced SQL features
• In-Memory operations
• ODBC / JDBC drivers
NOT a Database
• No persistent store
• Sources data at runtime
• Doesn’t run at “relational speed”
Also, NOT Hadoop
• Not an Apache Project
• Daemon based, not MapReduce
• Typically stand-alone cluster
• Hadoop is a large source of data
Why Presto@Netflix?
Selection Criteria
• Petabyte Scale
• Open Source
• ANSI Compliant
• Hadoop-Friendly
• Running at Facebook
• Well Designed Java
• 1 Month to Write S3 API
• Performance
Presto Use Cases @ Netflix
If you need to… | Then try… | However, if… | Then use…
Run reports via Tableau or MSTR, or analytics on aggregate data | Teradata | Data needed at a lower grain, or for longer historical period | Presto
Ad hoc interactive exploration on detail data | Presto | Joining 2 big tables, or otherwise doesn’t fit into memory | Hive
Long running queries joining big tables | Hive | |
Sub-second analysis on pre-generated cube structures | Druid | Question falls outside cube definition | Teradata / Presto
Run batch ETL in legacy framework | Pig | Building new ETL in future framework | Spark
Build new ETL from scratch | Spark | Data size too big | Pig
Validate ETL accuracy | Presto | Joining 2 big tables, or otherwise doesn’t fit into memory | Hive
EMR
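The use-case table reads as a two-stage lookup: a first-choice engine per need, and a fallback when the caveat applies. A minimal sketch; the dictionary keys are shortened paraphrases of the table rows and the function name is illustrative:

```python
# (first choice, caveat that changes the choice, fallback), transcribed from the table.
USE_CASES = {
    "reports or analytics on aggregate data":
        ("Teradata", "lower grain or longer history needed", "Presto"),
    "ad hoc exploration on detail data":
        ("Presto", "joining 2 big tables or does not fit in memory", "Hive"),
    "long-running joins of big tables":
        ("Hive", None, None),
    "sub-second analysis on pre-built cubes":
        ("Druid", "question falls outside cube definition", "Teradata / Presto"),
    "batch ETL in legacy framework":
        ("Pig", "building new ETL in future framework", "Spark"),
    "new ETL from scratch":
        ("Spark", "data size too big", "Pig"),
    "validate ETL accuracy":
        ("Presto", "joining 2 big tables or does not fit in memory", "Hive"),
}

def choose_engine(need, caveat_applies=False):
    """Return the engine the table suggests for a need, honouring its caveat."""
    first_choice, _caveat, fallback = USE_CASES[need]
    if caveat_applies and fallback is not None:
        return fallback
    return first_choice

print(choose_engine("ad hoc exploration on detail data"))
print(choose_engine("ad hoc exploration on detail data", caveat_applies=True))
```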
Analytics at Netflix
Presto
• Detailed Exploration
– Network behavior prior to event
– User segment clustering
– Historical viewing trends
– Historic user behavior
– Program correlation analysis
– Recommendation validation
– Predictive production decisions
– Etc.
Teradata
• Enterprise reporting (MicroStrategy)
– Subscriptions by country
– Average Minutes per Sitting
– Errors per 1M streams
– Monthly profitability by device
• BI tool exploration & analytics (Tableau)
– Reasons for quitting mid-stream
– Seasonal viewing trends by genre
– Marketing responsiveness
Netflix User Experience
Very positive!
• ~3,500 queries per day
• 90% of queries complete in under 1 minute
• 60% of queries complete in under 5 seconds
• Integrated into Big Data Portal
• Easy cluster scaling up/down
Adoption was rapid and overwhelmingly positive
Netflix Data Pipeline
[Pipeline diagram. Sources: Cloud Apps, Cassandra, Kafka. Storage: Amazon S3. Compute: EMR. Load frequencies shown: Operational, 15 minutes, Daily]
Netflix Data Pipeline
[Pipeline diagram. Compute: EMR. Service: MetaCat. Tools: Forklift, Sting, Charlotte, Quinto, Lipstick. Tool functions: Data Movement, Data Visualization, Data Lineage, Data Quality, Pig Workflow Visualization, Job Cluster Perf. Visualization. Each connects through an API to the Big Data Portal]
[Big Data Portal screenshot: engine selector set to Teradata, query box containing "SELECT * FROM MyTable;", Submit button]
Services: Teradata, Presto, EMR Hive, Spark, Druid
https://www.linkedin.com/in/george-chiu/
THANK YOU

Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Using Big Data to Drive Big Engagement

  • 1. Using Big Data to Drive Big Engagement. Name: George Chiu. Company: Teradata.
  • 2. Netflix: Using Big Data to Drive Big Engagement. 40PB Analytics in AWS. George Chiu, Sr. Industry Consultant, Oct. 2017.
  • 3. #1 streaming video service. Started 1998, when Reed Hastings accrued a $40 late fee on “Apollo 11”. In 2000, Blockbuster Video declined the chance to purchase Netflix for $50M. Current market cap: $56B. Teradata customer since 2007. 86M members in 190 countries. Streams 132M hrs/day, aka 92K hrs/min, aka 10.5 yrs/min. 600B events generated daily. 40PB on AWS S3; read/write 10% daily. 350 active big data users.
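The per-minute figures on slide 3 follow from the daily total by plain unit conversion. A minimal check, using only the 132M hrs/day figure from the slide (the variable names are ours, and a 365-day year is assumed):

```python
# Daily streaming volume quoted on the slide.
HOURS_STREAMED_PER_DAY = 132_000_000

# Standard conversion constants (365-day year assumed).
MINUTES_PER_DAY = 24 * 60
HOURS_PER_YEAR = 24 * 365

# Hours of video streamed during each minute of wall-clock time.
hours_per_minute = HOURS_STREAMED_PER_DAY / MINUTES_PER_DAY

# The same quantity expressed as years of video per minute.
years_per_minute = hours_per_minute / HOURS_PER_YEAR

print(f"{hours_per_minute:,.0f} hours streamed per minute")
print(f"{years_per_minute:.1f} years of video per minute")
```

Both results round to the slide's 92K hrs/min and 10.5 yrs/min.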
  • 4. Agenda: (1) What analytics did Netflix use to drive more engagement? (2) Insights & Approach. (3) Netflix Architecture on AWS with Teradata DW.a.a.S.
  • 5. What analytics did Netflix use to drive more engagement? © 2017 Teradata
  • 6. Netflix • Focus is on making it easy to find things to watch • Spends $150m on data & analytics ➢ 20x more than average ➢ 2% of ARPU • Processing 400bn interactions daily • Hundreds of analysts continually deriving new metadata © 2016 Teradata
  • 7. Differentiate or Disappear • More content, newer, more exclusive • Make it easy for customers to find • Make it easy to watch • Provide a great service • Provide relevant, timely and consistent interactions • Provide flexible packages. Source: https://business.tivo.com/content/dam/tivo/resources/whitepapers/Q3_2016_Video_Trends_Report.pdf
  • 8. Can we influence customer engagement? • 1.2% of high value TV package subscribers down spin each month (+11% on LY) • Perceived value diminishes when the initial discount ends (12 months & beyond) • Subscribers who down spin are not engaged with the content and watch 15% less exclusive/premium TV • Current marketing is limited, with no one-to-one (121) content. Identify at-risk customers and prevent down spin with personalised recommendations.
  • 9. Insights and Approach
  • 10. Approach. Step 1: Profile Subscriber Viewing Against Genres. Step 2: Create Behavioural Clusters. Step 3: Which Subscribers to Target per Cluster? Step 4: Build Recommendations per Subscriber. Step 5: Apply Business Rules.
  • 11. Step 1: Profile Subscriber Viewing Against Genres. Viewing duration by genre: News 5, Soccer 10, Reality 32, Documentary 18, Horror 1, Music 4, Crime Drama 5, … As a proportion of the total: News 0.07, Soccer 0.13, Reality 0.43, Documentary 0.24, Horror 0.01, Music 0.05, Crime Drama 0.07, … Identify the proportion of each subscriber's viewing duration that can be attributed to each genre. This subscriber watches mostly Reality content (43%), but also likes Documentaries (24%) and Soccer (13%).
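The Step 1 profile is a row normalisation: each genre's viewing duration divided by the subscriber's total. A minimal sketch using the seven genre values visible on the slide; the elided "…" columns are left out, and the function name and the unit of the durations are assumptions:

```python
def genre_proportions(duration_by_genre):
    """Return each genre's share of a subscriber's total viewing duration."""
    total = sum(duration_by_genre.values())
    if total == 0:
        # A subscriber with no viewing gets an all-zero profile.
        return {genre: 0.0 for genre in duration_by_genre}
    return {genre: d / total for genre, d in duration_by_genre.items()}


# Values shown on the slide for one subscriber (unit not stated on the slide).
viewing = {
    "News": 5,
    "Soccer": 10,
    "Reality": 32,
    "Documentary": 18,
    "Horror": 1,
    "Music": 4,
    "Crime Drama": 5,
}

profile = genre_proportions(viewing)

# Print genres from most to least watched.
for genre, share in sorted(profile.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{genre:12s} {share:.2f}")
```

With these seven values the shares round to the figures on the slide (Reality 0.43, Documentary 0.24, Soccer 0.13).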
  • 12. Step 2: Create Behavioural Clusters. Cluster 0: Soccer, Drama, News (61k subscribers). Cluster 8: Soccer, News, Sports Talk (32k). Cluster 11: Children, Animated, Adventure (56k). Cluster 13: Crime Drama (28k). Cluster 15: Reality (57k). Cluster 17: Reality, Documentary, Ents (85k). Cluster 21: Documentary (56k). Cluster 25: Music (25k).
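The slide does not say which clustering algorithm produced these groups. As one plausible illustration, the sketch below runs a small k-means over genre-share profiles like those from Step 1. The choice of k-means, the three-genre profiles, and the starting centroids are all assumptions made for the example:

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to the point (Euclidean distance)."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))


def k_means(points, centroids, iterations=10):
    """Plain k-means: assign points to centroids, then recompute the means."""
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Keep the old centroid if a group ends up empty.
        centroids = [
            tuple(sum(dim) / len(g) for dim in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    labels = [nearest(p, centroids) for p in points]
    return labels, centroids


# Hypothetical profiles: (Reality, Documentary, Soccer) viewing shares.
profiles = [
    (0.80, 0.15, 0.05),
    (0.70, 0.20, 0.10),
    (0.10, 0.10, 0.80),
    (0.05, 0.15, 0.80),
    (0.10, 0.85, 0.05),
    (0.15, 0.80, 0.05),
]

# Fixed starting centroids keep the example deterministic.
initial = [profiles[0], profiles[2], profiles[4]]
labels, centres = k_means(profiles, initial)
print(labels)
```

Each resulting cluster can then be named after its dominant genres, as on the slide.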
  • 13. Step 3: Which Subscribers to Target Per Cluster? [Chart: % Channels Viewed Premium against % Duration Viewed Premium, split into Low Engagement and High Engagement by the 30:30 Rule.] Deciding on a threshold [chart: Recall of Churners against Threshold]: focusing on subscribers who watch less than 30% Premium content and channels allows us to identify 80% of the churning population (those who churn within the next month).
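The 30:30 rule amounts to a two-sided threshold, and "recall of churners" is the share of actual churners that the threshold flags. A minimal sketch with made-up subscribers; the field names and data are hypothetical, and only the 30% threshold comes from the slide:

```python
from dataclasses import dataclass


@dataclass
class Subscriber:
    subscriber_id: str
    premium_duration_share: float  # share of viewing time spent on premium content
    premium_channel_share: float   # share of viewed channels that are premium
    churned_next_month: bool


def is_low_engagement(s, threshold=0.30):
    """30:30 rule: below the threshold on both premium measures."""
    return (s.premium_duration_share < threshold
            and s.premium_channel_share < threshold)


def churner_recall(subscribers, threshold=0.30):
    """Fraction of actual churners flagged as low engagement."""
    churners = [s for s in subscribers if s.churned_next_month]
    if not churners:
        return 0.0
    caught = sum(is_low_engagement(s, threshold) for s in churners)
    return caught / len(churners)


# Illustrative data only, not taken from the presentation.
subscribers = [
    Subscriber("s1", 0.10, 0.20, True),
    Subscriber("s2", 0.25, 0.15, True),
    Subscriber("s3", 0.05, 0.28, True),
    Subscriber("s4", 0.60, 0.55, True),   # engaged, but churned anyway
    Subscriber("s5", 0.45, 0.50, False),
    Subscriber("s6", 0.20, 0.10, False),  # low engagement, did not churn
]

targets = [s.subscriber_id for s in subscribers if is_low_engagement(s)]
print(targets)
print(churner_recall(subscribers))
```

Sweeping `threshold` over a range of values and plotting `churner_recall` would give a curve of the kind the slide uses to choose 30%.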
  • 14. Step 4: Build Recommendations per Subscriber (Series). [Diagram: Programmes by Subscribers matrix for Subscribers 1 to 3, marking the programmes recommended to Subscriber 1 and Subscriber 2.] Uses a ‘People Like Me’ Collaborative Filtering approach to identify similar programmes based on subscribers who watch programmes together.
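One common way to realise a "people like me" approach is item-to-item collaborative filtering: two programmes count as similar when largely the same subscribers watch both. The sketch below scores similarity with the Jaccard index of the two audiences. The slide does not name a similarity measure, so Jaccard, the function names, and the viewing data are assumptions:

```python
from collections import defaultdict


def programme_audiences(viewing):
    """Invert subscriber -> programmes into programme -> set of subscribers."""
    audiences = defaultdict(set)
    for subscriber, programmes in viewing.items():
        for programme in programmes:
            audiences[programme].add(subscriber)
    return audiences


def jaccard(a, b):
    """Overlap of two audiences: |A and B| / |A or B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(subscriber, viewing, top_n=3):
    """Rank unwatched programmes by similarity to what the subscriber watched."""
    audiences = programme_audiences(viewing)
    watched = viewing[subscriber]
    scores = defaultdict(float)
    for candidate, audience in audiences.items():
        if candidate in watched:
            continue
        for seen in watched:
            scores[candidate] += jaccard(audience, audiences[seen])
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [programme for programme, score in ranked[:top_n] if score > 0]


# Hypothetical viewing histories.
viewing = {
    "subscriber_1": {"Series A", "Series B"},
    "subscriber_2": {"Series A", "Series B", "Series C"},
    "subscriber_3": {"Series B", "Series C", "Series D"},
}

print(recommend("subscriber_1", viewing))
```

Here Series C ranks first for Subscriber 1 because its audience overlaps most with the audiences of Series A and Series B.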
  • 15. Step 4: Build Recommendations per Subscriber (Movies). [Diagram: Programmes by Subscribers matrix for Subscribers 1 to 3.] Similarity of movies watched in the same cluster is computed using a Pearson Correlation metric based on the IMDb features of the movies (Genre, Director, Cast, Rating, etc.).
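For movies, the slide describes a content-based similarity: a Pearson correlation between feature vectors. A minimal sketch, assuming each movie has already been encoded as a numeric vector (one-hot genre flags, a shared-director flag, a shared-cast flag, and a rating scaled to 0 to 1). The encoding and the three movies are hypothetical:

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant vector; treat as no similarity.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


# Feature order (assumed): drama, comedy, thriller, director_X, actor_Y, rating/10
movie_a = [1, 0, 1, 1, 0, 0.8]
movie_b = [1, 0, 1, 1, 0, 0.7]
movie_c = [0, 1, 0, 0, 1, 0.6]

print(f"a vs b: {pearson(movie_a, movie_b):.2f}")
print(f"a vs c: {pearson(movie_a, movie_c):.2f}")
```

Movies with nearly identical features correlate close to 1, while movies with opposite features correlate negatively and would not be recommended together.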
  • 16. Step 5: Apply Business Rules. Start from all recommendations; eliminate previously watched content and content no longer available live or on demand; apply business profitability rules.
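Step 5 is a chain of filters over the candidate list. A minimal sketch; the presentation does not define its profitability rules, so the `margin` field and `min_margin` cut-off below are hypothetical placeholders for them:

```python
def apply_business_rules(recommendations, watched, available, min_margin=0.0):
    """Drop watched, unavailable, and unprofitable titles, preserving order."""
    kept = []
    for rec in recommendations:
        if rec["title"] in watched:
            continue  # previously watched
        if rec["title"] not in available:
            continue  # no longer available live or on demand
        if rec["margin"] < min_margin:
            continue  # hypothetical stand-in for a profitability rule
        kept.append(rec)
    return kept


# Illustrative candidates only.
recommendations = [
    {"title": "Series C", "margin": 0.4},
    {"title": "Series D", "margin": 0.1},
    {"title": "Movie X", "margin": 0.5},
    {"title": "Movie Y", "margin": -0.2},
]
watched = {"Series D"}
available = {"Series C", "Series D", "Movie Y"}

final = apply_business_rules(recommendations, watched, available)
print([rec["title"] for rec in final])
```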
  • 17. QlikView: Behavioural Cluster Dashboard. A dashboard can be created to convey the outputs of advanced analytics.
  • 18. Next Steps. “We think you’ll like this, Ruth” • How effective are personalised recommendations in engaging customers with premium and package exclusive content? o Personalised banner in weekly email o Measurement of downspin, Test versus Control
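The proposed test-versus-control measurement compares the downspin rate of subscribers who received the personalised banner against a control group that did not. A minimal sketch with invented group sizes and counts; only the 1.2% monthly baseline echoes a figure from the presentation:

```python
def downspin_rate(downspun, group_size):
    """Share of a group that downspun during the measurement period."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    return downspun / group_size


def relative_reduction(test_rate, control_rate):
    """How much lower the test rate is, relative to the control rate."""
    if control_rate == 0:
        return 0.0
    return (control_rate - test_rate) / control_rate


# Hypothetical campaign results.
control_rate = downspin_rate(downspun=120, group_size=10_000)  # 1.2% baseline
test_rate = downspin_rate(downspun=90, group_size=10_000)

print(f"control: {control_rate:.2%}, test: {test_rate:.2%}")
print(f"relative reduction: {relative_reduction(test_rate, control_rate):.0%}")
```

In practice the difference would also need a significance test before being attributed to the banner.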
  • 19. Netflix AWS Architecture with Teradata DW.a.a.S
  • 21. 21
  • 22. 100% Open Source SQL Query Engine for the Modern Data Ecosystem
  • 23. What is Presto? [Diagram: a Client, one Presto Coordinator and four Presto workers.] Example query: SELECT u.UserID, count(s.*) AS ClickCnt FROM MySQL.MDM.Users AS u JOIN Hive.Web.Clicks AS s ON u.SessID = s.SessID GROUP BY u.UserID ORDER BY ClickCnt DESC;
  • 24. What is Presto? Also, NOT Hadoop: • Not an Apache Project • Daemon based, not MapReduce • Typically stand-alone cluster • Hadoop large source of data. LOOKS like a Database: • ANSI SQL compliant • Advanced SQL features • In-Memory operations • ODBC / JDBC drivers. NOT a Database: • No persistent store • Sources data at runtime • Doesn’t run at “relational speed”
  • 25. Why Presto@Netflix? Selection Criteria • Petabyte Scale • Open Source • ANSI Compliant • Hadoop-Friendly • Running at Facebook • Well Designed Java • 1 Month to Write S3 API • Performance
  • 26. Presto Use Cases @ Netflix. Columns: If you need to… / Then try… / However, if… / Then use…
      – Run reports via Tableau or MSTR, or analytics on aggregate data / Teradata / Data needed at a lower grain, or for longer historical period / Presto
      – Adhoc interactive exploration on detail data / Presto / Joining 2 big tables, or otherwise doesn’t fit into memory / Hive
      – Long running queries joining big tables / Hive
      – Sub-second analysis on pre-generated cube structures / Druid / Question falls outside cube definition / Teradata / Presto
      – Run batch ETL in legacy framework / Pig / Building new ETL in future framework / Spark
      – Build new ETL from scratch / Spark / Data size too big / Pig
      – Validate ETL accuracy / Presto / Joining 2 big tables, or otherwise doesn’t fit into memory / Hive
      [Diagram label: EMR]
  • 27. Analytics at Netflix. Presto • Detailed Exploration – Network behavior prior to event – User segment clustering – Historical viewing trends – Historic user behavior – Program correlation analysis – Recommendation validation – Predictive production decisions – Etc. Teradata • Enterprise reporting (MicroStrategy) – Subscriptions by country – Average Minutes per Sitting – Errors per 1M streams – Monthly profitability by device • BI tool exploration & analytics (Tableau) – Reasons for quitting mid-stream – Seasonal viewing trends by genre – Marketing responsiveness
  • 28. Netflix User Experience: Very positive! • ~3500 Queries per Day • 90% of queries complete under 1 minute • 60% of queries complete under 5 seconds • Integrated into Big Data Portal • Easy cluster scaling up/down. Adoption was rapid and overwhelmingly positive.
  • 29. Netflix Data Pipeline. [Diagram labels: Cloud Apps, Cassandra, Kafka; Storage: Amazon S3; Compute: EMR; Operational; 15 minutes; Daily.]
  • 30. Netflix Data Pipeline. [Diagram labels: Compute: EMR. Service: MetaCat. Tools: Forklift, Sting, Charlotte, Quinto, Lipstick. Functions: Data Movement, Data Visualization, Data Lineage, Data Quality, Pig Workflow Visualization, Job Cluster Perf. Visualization; each connects through an API to the Big Data Portal. Big Data Portal query box with Teradata selected: SELECT * FROM MyTable; Submit. Services: Teradata, Presto, EMR, Hive, Spark, Druid.]