Data-Driven @ Netflix
Michelle Ufford
Principal Architect
Data Engineering & Analytics
Michelle Ufford
Highlights
● Principal Architect at Netflix
Data Engineering & Analytics
● Prev. Engineering Manager at GoDaddy
Data Platform
● Microsoft Data Platform MVP
● 10+ years building web-scale analytics &
data engineering infrastructure
● advises on Big Data topics
Microsoft, Hortonworks, Teradata, etc.
Gratuitous picture of my kids
By the Numbers.
The business numbers.
86.7 million
members
1000+ devices
supported
125+ million
hours watched
every. day.
launched
19 years ago
Any device. Anywhere.*
* Well, almost anywhere.
The data numbers.
4 petabyte
DW reads
300 terabyte
DW writes
40 petabyte
data warehouse
700+ billion
events written
Data in Action.
Content.
What should we license?
Predicting Value for
Licensed Content.
Feature Engineering → Predictive Models → License Terms → Content Efficiency
Feature Engineering
● past performance of similar content on Netflix
● broadcast & Box Office performance
● talent (writers, actors, directors, etc.)
● critic & user reviews
● awards & accolades
License Terms
● terms (length, exclusivity, etc.)
● bid amount
● negotiations
Content Efficiency
● value / cost
● if efficient, license
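The efficiency rule on this slide (value / cost; if efficient, license) can be sketched in a few lines of Python. This is only an illustration, not Netflix's actual model: the `LicenseOffer` type, its field names, the sample numbers, and the threshold of 1.0 are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class LicenseOffer:
    """Hypothetical licensing candidate; field names are illustrative only."""
    title: str
    predicted_value: float  # model-estimated value, in the same units as cost
    cost: float             # proposed bid amount


def efficiency(offer: LicenseOffer) -> float:
    """Efficiency as stated on the slide: value divided by cost."""
    if offer.cost <= 0:
        raise ValueError("cost must be positive")
    return offer.predicted_value / offer.cost


def should_license(offer: LicenseOffer, threshold: float = 1.0) -> bool:
    """Assumed decision rule: license when efficiency meets the threshold."""
    return efficiency(offer) >= threshold


# Made-up sample data, purely for demonstration.
offers = [
    LicenseOffer("Title A", predicted_value=12.0, cost=8.0),
    LicenseOffer("Title B", predicted_value=5.0, cost=10.0),
]
decisions = {o.title: should_license(o) for o in offers}
print(decisions)
```

In practice the predicted value would come from the predictive-models stage and the cost from the negotiated terms; here both are hard-coded so the arithmetic is easy to follow.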
“last year our original content overall
was some of our most efficient content.”
“We are building a studio in the cloud
and pioneering new approaches to movie
production, optimizing pitches, production
schedules, subtitling, and digital asset
management for our Original content.”
What should we license? create?
Product UX.
Data. Driven. Experience.
There are 86 million different
versions of Netflix.
billboard
rows, row order
titles, title order
title artwork
Public Relations.
Analytics of news. Analytics is news.
Content Delivery.
Monitoring a global service.
YouTube video of Vizceral demo
https://youtu.be/JctsPpgEsVs
Behind the Scenes.
[Diagram: Big Data Platform. Labels shown: events data, operational data, elastic storage, AWS S3, fast storage, Amazon Redshift, data processing, Apache Pig, data services, Metacat, data access, data viz]
Philosophy.
Freedom &
Responsibility.
Context,
Not Control.
Highly Aligned &
Loosely Coupled.
Big Data
Platform
Data Engineering & Analytics
Marketing, Product, Playback, Content, Finance
105 talented engineers & analysts
data viz engineers
analytics engineers
data engineers
Big Data
Platform
analysts
Results,
Not Opinions.
Experimentation Platform
Batch &
Ad Hoc
Analysis
Questions?
Thank you
for attending!
Michelle Ufford
linkedin.com/in/mufford
@sqlfool
Data @
Netflix
@NetflixData
hadoopsie.com techblog.netflix.com
tinyurl.com/NetflixData

Editor's Notes

  • #2 Abstract: Netflix is the quintessential data-driven company. Its 83 million members stream more than 125 million hours in over 190 countries every day and generate more than 700 billion events in the process. In this session, we’ll share how data is used to make informed decisions across the entire business, from content acquisition to content delivery and everything in between. We’ll look at how Netflix successfully employs a scalable cloud-based data platform to support a constant deluge of data and a small army of data analysts, engineers, and scientists. We’ll discuss the advanced analytical capabilities that are enabled through modern data technologies. Lastly, we’ll explore some of the architectural & operational principles that enable Netflix to make such effective use of its data.
  • #3 Obligatory “why should you listen to me talk?” slide
  • #5 Numbers as of Q3 2016
  • #6 During CES 2016 this January, Netflix ‘flipped the switch’, making the service available in 130+ new countries. Netflix is presently available in over 190 countries worldwide.
  • #9 What content should we license? How much should we bid? How should we value exclusivity? How should we measure content performance?
  • #17 Originals content: 2015: 450 hours; 2016: 600 hours; 2017: 1000 hours
  • #20 Netflix website: circa 2012
  • #21 Netflix website: circa 2013
  • #22 Netflix website: circa 2014
  • #23 Netflix website: circa 2015
  • #24 Netflix website: circa 2016
  • #32 Vizceral Open-Source Project: https://github.com/netflix/vizceral http://techblog.netflix.com/2016/08/vizceral-open-source.html http://techblog.netflix.com/2015/10/flux-new-approach-to-system-intuition.html
  • #34 Genie – federated job execution engine; Metacat – federated metadata service; Kragle – Python APIs
  • #36 15m views on SlideShare
  • #37 Minimize rules; make smart choices; take ownership
  • #39 Avoid prescriptive requirements; give visibility
  • #42 Set context (strategy, goals); communicate only as much as needed
  • #45 At Netflix, we use the scientific method. We’re often right at predicting behavior – for people exactly like us. Most people aren’t like us.
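
The "Results, Not Opinions" slide and note #45 describe settling product questions by experiment rather than by intuition. As a rough illustration of what that can look like, here is a minimal sketch of a two-proportion z-test comparing a control and a variant. The deck does not say which statistical methods the Experimentation Platform uses, so the choice of test, the function name, and the sample counts below are all assumptions.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (illustrative only)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts: 1000 members per cell, 200 vs. 250 conversions.
z, p_value = two_proportion_z_test(200, 1000, 250, 1000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

The point of the sketch is the workflow rather than the specific test: state a hypothesis, measure both groups, and let the measured difference decide.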