Binging on Data: Enabling Analytics at Netflix
BLAKE IRVINE
TABLEAU CONFERENCE 2018
BLAKE IRVINE | TABLEAU CONFERENCE 2018
World Markets
ANALYTICS
BIG DATA
Intro & Topics
DATA ENGINEERING + INFRASTRUCTURE
● Binging on Data
● Enabling Analytics
● Tableau Environment & Challenges
Topics
DATA CULTURE
Binging on Data
● Data and Analytics are embraced across the company
○ Engineering, UX, Customer Service, Finance, & more
● A/B Testing of almost everything...
○ Product, Signup Methods, Payments, Messaging, & more
● Algorithms for...
○ Recommendations, Content, Marketing, & more
Data is Ubiquitous
Employees
5000 employees
300 in data teams
200+ in dedicated analytic teams
Analytic Ecosystem
Enabling Analytics
How do we Enable Analytics?
BIG
DATA
Enablement
Big Data Portal
Data Platform
Big Data Portal
Why not use Tableau Server?
Data Portal - Tables
Data Portal - Tables 2
Enabling Analytics
People
● Highly Aligned, Loosely Coupled
Alignment
Process
Context
● Vertical Teams
Organization
Content, Marketing, Growth, Tech
Data Engineering
Science & Analytics
Business Teams
Analytic Teams
Engineering Teams
#content-analytics
#marketing-analytics
#growth-analytics
#tech-analytics
● Growing user base
● We’ve started up:
○ A Tableau User Group
○ Education tracks
● Early days... much more to do here!
○ Office Hours
○ Tableau Days
○ Data Doctor & more
Community
Enabling Analytics
Tableau Environment
Overview
2000+ Users
250 Developers
On version 10.4; upgrading to 2018.2 in Q4
Tableau Servers
7 x =
> 448 vCPU
> 1.8 TB RAM
> 175 Gigabit IO
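As a rough check on the figures above, the per-server share can be worked out by dividing by the 7 servers. This is only a sketch: it assumes the three figures are cluster-wide totals, that the servers are identical, and that TB means 1000 GB. The slide does not name the instance type.

```python
# Cluster totals as read from the slide (assumed to be totals across 7 identical servers).
NODES = 7
TOTAL_VCPU = 448
TOTAL_RAM_TB = 1.8
TOTAL_IO_GBIT = 175


def per_node(total, nodes=NODES):
    """Split a cluster-wide total evenly across the nodes."""
    return total / nodes


vcpu_per_node = per_node(TOTAL_VCPU)               # 64 vCPU per server
ram_gb_per_node = per_node(TOTAL_RAM_TB * 1000)    # about 257 GB per server
io_gbit_per_node = per_node(TOTAL_IO_GBIT)         # 25 Gigabit per server

print(vcpu_per_node, round(ram_gb_per_node), io_gbit_per_node)
```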
Cluster Config
● The vast majority of our data sources are Extracts
○ Very few live connections
● Why?
○ BIG DATA
○ Some direct connections to Presto or MPP
● Extracts provide an aggregation and caching layer
We Love Data Extracts!
Tableau Data Sources
1500
50% via Extract API
50% run on Server
1 Use Big Data Portal to develop query
2 Commit query to ETL repository & deploy
3 Configure ETL workflow so data dependencies are met
4 Use ETL job to publish TDE to server
5 Connect to TDE, Develop Viz, Publish to server, Share
“Best Practice” Pattern
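Step 3 of this pattern (data dependencies are met before the extract is published) can be illustrated with a small ordering sketch. The job names are hypothetical and the standard-library `graphlib` module stands in for whatever ETL scheduler is actually used; the slides do not describe the real workflow tool.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical jobs: each key lists the jobs that must finish before it can run.
dependencies = {
    "publish_tde": {"build_aggregate"},
    "build_aggregate": {"load_events", "load_accounts"},
    "load_events": set(),
    "load_accounts": set(),
}


def run_order(deps):
    """Return one valid execution order in which every job follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
print(order)  # upstream loads first, the TDE publish step last
```

Under this reading, the TDE is only published once every upstream table has been refreshed, which is the robustness the pattern is aiming for.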
1 Use Big Data Portal to develop query
2 Paste the query into Tableau
3 Develop Viz
4 Publish, and Schedule data refresh on Tableau Server
“Self-Serve” Pattern
● “Best Practice” Pattern is:
○ More robust
○ But complex
● “Self-Serve” Pattern is:
○ Easy and convenient
○ Less scalable
○ Harder to manage
Dilemma...
easy
Publish & Refresh
BIG
DATA
PORTAL
Enabling Analytics
Challenges at Netflix
● Data Scale
● Data Lineage
● Push Reporting
Challenges
Challenge 1: Data Scale
We have REALLY big data
1 Trillion
New Data Events Daily
150 Petabyte
Warehouse
300 Terabytes
Written Daily
5 Petabytes
Read Daily
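To put the daily figures above into per-second terms, a quick back-of-the-envelope calculation helps. It assumes the load is spread evenly over 24 hours and that TB and PB are decimal units, neither of which the slide states.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures from the slide.
events_per_day = 10**12        # 1 trillion new data events daily
written_tb_per_day = 300       # terabytes written daily
read_pb_per_day = 5            # petabytes read daily
warehouse_pb = 150             # warehouse size in petabytes

# Averages under the even-load assumption.
events_per_second = events_per_day / SECONDS_PER_DAY                  # roughly 11.6 million
written_gb_per_second = written_tb_per_day * 1000 / SECONDS_PER_DAY   # roughly 3.5 GB/s
read_share_of_warehouse = read_pb_per_day / warehouse_pb              # roughly 3.3% per day

print(round(events_per_second), round(written_gb_per_second, 2), round(read_share_of_warehouse, 3))
```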
● Data volume
● Level of Detail
Constantly Balancing
● Speed of access
● Data prep
Development Choices
              Choice 1              Choice 2        Choice 3
Data Engine   MPP                   Cloud           TDE
Data Size     < 1B rows             < 10B rows      < 100M rows
Performance   Up to many minutes    Many minutes    Up to many seconds
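The size limits in the table can be read as a simple decision rule. The sketch below assumes one picks the fastest engine whose row limit still fits the data; that ordering is an assumption, since the slide only lists the limits. The function name is hypothetical, and the fallback for larger data corresponds to the custom Druid-backed tools described as Choice 4.

```python
def suggest_engine(rows: int) -> str:
    """Pick a data engine from the slide's row-count limits (assumed decision rule)."""
    if rows < 100_000_000:        # < 100M rows: TDE, up to many seconds
        return "TDE"
    if rows < 1_000_000_000:      # < 1B rows: MPP, up to many minutes
        return "MPP"
    if rows < 10_000_000_000:     # < 10B rows: Cloud, many minutes
        return "Cloud"
    return "Custom analytic tool on Druid"  # beyond the table: Choice 4


for n in (5_000_000, 500_000_000, 5_000_000_000, 50_000_000_000):
    print(n, suggest_engine(n))
```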
● For REALLY big data use cases
● For very fast interactivity
● For custom UI/UX/dataviz
● Custom Analytic Tools
○ Web app built with JavaScript
○ Data stored in Druid
Choice 4...
● Druid
○ An open source data system for analytic applications
○ Distributed, horizontally scalable architecture
○ VERY, VERY fast
○ Queries are in JSON format to REST endpoint
Druid white paper: http://static.druid.io/docs/druid.pdf
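To make "queries are in JSON format to REST endpoint" concrete, here is a minimal sketch of a Druid native timeseries query prepared as an HTTP POST to a Broker's `/druid/v2/` endpoint. The data source name, metric name, interval, and broker URL are hypothetical placeholders, not taken from the talk, and the request is only built here, not sent.

```python
import json
from urllib import request


def build_timeseries_query(datasource, interval, metric):
    """Build a Druid native timeseries query that sums one metric per hour."""
    return {
        "queryType": "timeseries",
        "dataSource": datasource,
        "granularity": "hour",
        "intervals": [interval],
        "aggregations": [
            {"type": "longSum", "name": metric, "fieldName": metric},
        ],
    }


def make_request(broker_url, query):
    """Wrap the query as a JSON POST to the Broker's native query endpoint."""
    return request.Request(
        url=f"{broker_url}/druid/v2/",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical names and address, for illustration only.
query = build_timeseries_query("playback_events", "2018-10-01/2018-10-02", "play_count")
req = make_request("http://localhost:8082", query)
print(json.dumps(query, indent=2))
# request.urlopen(req) would send the query to a running Broker.
```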
● Can we connect Tableau to Druid?
○ All the performance benefits of Druid...
○ Tableau or web apps use same data store…
● We are exploring this...
○ There is now a Druid SQL layer based on Apache Calcite
○ Have done some testing, finding limitations
Tableau ?
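For comparison, the Druid SQL layer accepts a SQL string wrapped in a small JSON object posted to `/druid/v2/sql/`. The sketch below only builds that payload; the table and column names are hypothetical, and nothing here shows how Tableau itself would connect.

```python
import json


def build_sql_payload(sql: str) -> str:
    """Wrap a SQL statement in the JSON body expected by the Druid SQL endpoint."""
    return json.dumps({"query": sql})


# Hypothetical table and metric; __time is Druid's built-in timestamp column.
sql = (
    "SELECT FLOOR(__time TO HOUR) AS hour_start, SUM(play_count) AS plays "
    "FROM playback_events "
    "GROUP BY FLOOR(__time TO HOUR)"
)
payload = build_sql_payload(sql)
print(payload)  # POST this body to <broker>/druid/v2/sql/ with Content-Type: application/json
```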
● TDE -> Hyper with 2018.2 upgrade
○ Happening now(ish)
○ Expectations: faster for small and medium data (<100M rows)
● Snowflake
○ Fast for “large” data stores (1B+ rows)
● Data scale is always a challenge!
In the meantime...
Challenge 2: Data Lineage
● Where did this data come from?
● Can I trust this data?
● Tableau PRO: very easy to pull in data, analyze, and publish
● Tableau CON: very easy to pull in data, analyze, and publish
Example
Workbooks
Data Sources
Data Tables
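The three layers above (workbooks, data sources, data tables) form a small dependency graph, and the lineage question "which workbooks are affected if this table changes?" is a walk across it. The sketch below uses plain dictionaries with hypothetical workbook, extract, and table names; it is not how lineage is actually stored at Netflix or in Tableau.

```python
# Hypothetical lineage data: workbook -> data sources, data source -> warehouse tables.
workbook_sources = {
    "Signup Funnel": ["signup_daily_extract"],
    "Retention Overview": ["signup_daily_extract", "retention_extract"],
}
source_tables = {
    "signup_daily_extract": ["dw.signups"],
    "retention_extract": ["dw.signups", "dw.playback"],
}


def workbooks_using_table(table):
    """Return the workbooks that read the given table through any of their data sources."""
    hits = []
    for workbook, sources in workbook_sources.items():
        if any(table in source_tables.get(source, []) for source in sources):
            hits.append(workbook)
    return sorted(hits)


print(workbooks_using_table("dw.playback"))
print(workbooks_using_table("dw.signups"))
```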
We have Data Lineage...
● ...but not about Tableau
● Can the upcoming Metadata APIs and Object Model help?
● Metadata APIs:
○ Inventory of workbooks, data sources, and metrics
○ Identify similar existing data and workbooks?
● Automate building of similar insights, and integrate with our existing data lineage system
Metadata APIs
Data Model
● Better practices across our “vertical” teams
● Manual / brute force methods
● Potentially evaluate Alation, Unifi, Collibra, AtScale, Dremio
In the meantime...
Challenge 3: Push Reporting
What we do...
● Improved layout & pagination
● Export to different formats
● Distribution management: what, who, and when
What we’d like
Looking Forward
In 2019 and Beyond
easy
Before we wrap up...
Thank YOU!
Q&A
Blake Irvine
birvine@netflix.com
@blakeirvine
linkedin.com/in/blakeirvine/
Don’t forget the Survey!
