In this conference session, we share how we are using Tableau “out of the box” and also describe how it fits into our overall data environment. In addition, we’ll describe how we expect to use the Data Catalog and Object Model, our explorations of large-scale data stores, and challenges we are working on, including governance and data lineage. Video of the session can be viewed here: https://youtu.be/Nr24tw3dmZQ
Presented at Strata San Jose 2018. Shares how Netflix enables business teams to perform cohort analysis on very large, high-dimensional data by using Big Data and web application technologies such as Spark, Druid, Node, React, and D3.
User Behavior Analytics at Netflix, presented at Predictive Analytics World in 2017. Slides include the data processing architecture, the analytic component that identifies abnormal patterns, a rules engine and the overall modular framework that fits all these pieces together to provide an end-to-end solution.
Presentation at the Netflix Expo session at RecSys 2020 virtual conference on 2020-09-24. It provides an overview of recommendation and personalization at Netflix and then highlights some of the things we’ve been working on as well as some important open research questions in the field of recommendations.
Talk with Yves Raimond at the GPU Tech Conference on March 28, 2018 in San Jose, CA.
Abstract:
In this talk, we will survey how Deep Learning methods can be applied to personalization and recommendations. We will cover why standard Deep Learning approaches don't perform better than typical collaborative filtering techniques. Then we will go over recently published research at the intersection of Deep Learning and recommender systems, looking at how they integrate new types of data, explore new models, or change the recommendation problem statement. We will also highlight some of the ways that neural networks are used at Netflix and how we can use GPUs to train recommender systems. Finally, we will highlight promising new directions in this space.
RecSys 2020: A Human Perspective on Algorithmic Similarity (Zachary Schendel, September 2020)
In the Netflix user interface (UI), when a row or UI element is named “Because you Watched...”, “More Like This”, or “Because you added to your list”, the overarching goal is to recommend a movie or TV show that a member might like based on the fact that they took a meaningful action on a source item. We have employed similar recommendations in many UI elements: on the homepage as a row of recommendations, after you click into a title, or as a piece of information about why a member should watch a title.
From an algorithmic perspective, there are many ways to define a “successful” similar recommendation. We sought to broaden that definition of success. To this end, the Consumer Insights team recently completed a suite of research projects to explore the intricacies of member perceptions of similar recommendations. The Netflix Consumer Insights team employs qualitative (e.g., in-depth interviews) and quantitative (e.g., surveys) research methods, interfacing directly with Netflix members to uncover pain points that can inspire new product innovation. The research concluded that, while the typical member believes movies are broadly similar when they share a common genre or theme, similarity is more complex, nuanced, and personal than we might have imagined. The vernacular we use in the UI implies that there should be at least some kind of relationship between the source item and the recommendations that follow. Many of our similar recommendations felt “out of place”, mostly because the relationship between the source item and the recommendation was unclear or absent. When similar recommendations tell a completely misleading, incorrect, or confusing story, member trust can be broken.
We will structure the presentation around three new insights that our research found to have an influence on the perception of similarity in the context of Netflix as well as the research methods used to uncover those insights. First, the reason a member loves a given movie will vary. For example, do you want to watch other baseball movies like Field of Dreams, or would you prefer other romances like Field of Dreams? Second, members are more or less flexible about how similar a recommendation actually needs to be depending on the properties of and their interactions with the canvas containing the recommendation. For example, a Because You Watched row on the homepage implies vaguer similarity while a More Like This gallery behind a click into the source item implies stricter similarity. Finally, even when we held the UI element constant, we found that similar recommendations are only valuable in some contexts. After finishing a movie, a member might prefer a similar recommendation one day and a change of pace the next. Research methods discussed will include Inverse Multi-Dimensional Scaling [1], survey experimentation, and ways to apply qualitative research to improve algorithmic recommendations.
Talk from QCon SF on 2018-11-05
For many years, the main goal of the Netflix personalized recommendation system has been to get the right titles in front of each of our members at the right time. With a catalog spanning thousands of titles and a diverse member base spanning over a hundred million accounts, recommending the titles that are just right for each member is crucial. But the job of recommendation does not end there. Why should you care about any particular title we recommend? What can we say about a new and unfamiliar title that will pique your interest? How do we convince you that a title is worth watching? Answering these questions is critical in helping our members discover great content, especially for unfamiliar titles. One way to do this is to consider the artwork or imagery we use to visually portray each title. If the artwork representing a title captures something compelling to you, then it acts as a gateway into that title and gives you some visual “evidence” for why the title might be good for you. Selecting good artwork is important because it may be the first time a member becomes aware of a title (and sometimes the only time), so it must speak to them in a meaningful way. In this talk, we will present an approach for personalizing the artwork we show for each title on the Netflix homepage. We will look at how to frame this as a machine learning problem using contextual multi-armed bandits in a recommendation system setting. We will also describe the algorithmic and system challenges involved in getting this type of approach for artwork personalization to succeed at Netflix scale. Finally, we will discuss some of the future opportunities that we see to expand and improve upon this approach.
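The contextual multi-armed bandit framing above can be sketched with a toy epsilon-greedy learner. Everything here is an illustrative assumption, not Netflix's actual system: the arm names, the single "drama_fan" context, and the simulated click-through rates are all invented.

```python
import random

class EpsilonGreedyBandit:
    """Toy contextual bandit: per context, explore with probability epsilon,
    otherwise play the arm with the best observed click-through rate."""

    def __init__(self, arms, epsilon=0.1):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.counts = {}   # (context, arm) -> number of plays
        self.rewards = {}  # (context, arm) -> total reward

    def select(self, context):
        if random.random() < self.epsilon:
            return random.choice(self.arms)  # explore
        def ctr(arm):
            n = self.counts.get((context, arm), 0)
            return self.rewards.get((context, arm), 0.0) / n if n else 0.0
        return max(self.arms, key=ctr)       # exploit

    def update(self, context, arm, reward):
        key = (context, arm)
        self.counts[key] = self.counts.get(key, 0) + 1
        self.rewards[key] = self.rewards.get(key, 0.0) + reward

random.seed(0)
bandit = EpsilonGreedyBandit(arms=["car_chase", "lead_actor", "romance_scene"])
# Simulated world: members in the "drama_fan" context click the romance artwork more.
true_ctr = {"car_chase": 0.05, "lead_actor": 0.10, "romance_scene": 0.30}
for _ in range(5000):
    arm = bandit.select("drama_fan")
    bandit.update("drama_fan", arm, 1.0 if random.random() < true_ctr[arm] else 0.0)
```

After enough rounds, the bandit concentrates its plays on the artwork variant with the highest observed click-through rate for that context, while exploration keeps refreshing its estimates for the other variants.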
Netflix Data Engineering @ Uber Engineering Meetup (Blake Irvine)
People, Platform, Projects: these slides give an overview of how Netflix works with Big Data. I share how our teams are organized, the roles we typically have on the teams, an overview of our Big Data Platform, and two example projects.
At Netflix, we take the context of the member seriously.
In this keynote talk, we will see how modeling contextual factors such as time or device can help members find the right content at the right moment.
In the end, the goal is to maximize member satisfaction and retention.
These slides go through which contextual factors matter for the video service and why we chose to use them or not.
Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se... (Sudeep Das, Ph.D.)
In this talk, we will provide an overview of Deep Learning methods applied to personalization and search at Netflix. We will set the stage by describing the unique challenges faced at Netflix in the areas of recommendations and information retrieval. Then we will delve into how we leverage a blend of traditional algorithms and emergent deep learning methods and new types of embeddings, especially hyperbolic space embeddings, to address these challenges.
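The "hyperbolic space embeddings" mentioned above rely on a distance function unlike the Euclidean one; a common choice is the Poincaré ball model, where points near the boundary become exponentially far apart, which is what lets such embeddings capture tree-like, hierarchical structure. The sketch below computes that distance for illustrative 2-D points; it is not Netflix's implementation.

```python
import math

def poincare_distance(u, v):
    """Distance in the Poincare ball model of hyperbolic space:
    d(u, v) = arcosh(1 + 2*||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2)))
    Inputs must lie strictly inside the unit ball."""
    diff_sq = sum((a - b) ** 2 for a, b in zip(u, v))
    norm_u_sq = sum(a * a for a in u)
    norm_v_sq = sum(b * b for b in v)
    arg = 1 + 2 * diff_sq / ((1 - norm_u_sq) * (1 - norm_v_sq))
    return math.acosh(arg)

origin = (0.0, 0.0)
near = (0.1, 0.0)        # close to the origin
boundary = (0.95, 0.0)   # close to the boundary of the ball
```

The same Euclidean gap costs far more hyperbolic distance near the boundary than near the origin, so "leaf" items can spread out along the rim while "root" concepts sit near the center.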
Slides from Michelle Ufford's talk, Data-Driven @ Netflix, given at PASS Summit 2016 in October 2016.
Netflix is the quintessential data-driven company. Its 83 million members stream more than 125 million hours in over 190 countries every day and generate more than 700 billion events in the process. In this session, we’ll share how data is used to make informed decisions across the entire business — from content acquisition to content delivery, and everything in between. We’ll look at how Netflix successfully employs a scalable cloud-based data platform to support a constant deluge of data and a small army of data analysts, engineers, and scientists. We’ll discuss the advanced analytical capabilities that are enabled through modern data technologies. Lastly, we’ll explore some of the architectural & operational principles that enable Netflix to so effectively make use of its data.
Personalizing "The Netflix Experience" with Deep LearningAnoop Deoras
These are the slides from my talk presented at the AI Next Con conference in Seattle in January 2019. Here I talk in a bit more detail about the intuition behind collaborative filtering and go a bit deeper into the details of non-linear deep-learned models.
Netflix - Enabling a Culture of Analytics (Blake Irvine)
These are slides from a conference where I presented how we are enabling a culture of analytics at Netflix. I highlight aspects of our culture, our Data Science team organization, our BI tool evolution, and how we are making data accessible.
The Netflix experience is driven by a number of Machine Learning algorithms: personalized ranking, page generation, search, similarity, ratings, etc. On the 6th of January, we simultaneously launched Netflix in 130 new countries around the world, which brings the total to over 190 countries. Preparing for such a rapid expansion while ensuring each algorithm was ready to work seamlessly created new challenges for our recommendation and search teams. In this post, we highlight the four most interesting challenges we’ve encountered in making our algorithms operate globally and, most importantly, how this improved our ability to connect members worldwide with stories they'll love.
Recommendation systems today are widely used across many applications such as multimedia content platforms, social networks, and e-commerce, to provide suggestions to users that are most likely to fulfill their needs, thereby improving the user experience. Academic research, to date, largely focuses on the performance of recommendation models in terms of ranking quality or accuracy measures, which often don’t directly translate into improvements in the real world. In this talk, we present some of the most interesting challenges that we face in the personalization efforts at Netflix. The goal of this talk is to shine a light on challenging research problems in industrial recommendation systems and start a conversation about exciting areas of future research.
Artwork Personalization at Netflix, RecSys 2018 (Fernando Amat)
For many years, the main goal of the Netflix personalized recommendation system has been to get the right titles in front of our members at the right time. But the job of recommendation does not end there. The homepage should be able to convey to the member enough evidence of why a title may be good for her, especially for shows that the member has never heard of. One way to address this challenge is to personalize the way we portray the titles on our service. An important aspect of how to portray titles is through the artwork or imagery we display to visually represent each title. The artwork may highlight an actor that you recognize, capture an exciting moment like a car chase, or contain a dramatic scene that conveys the essence of a movie or show. It is important to select good artwork because it may be the first time a member becomes aware of a title (and sometimes the only time), so it must speak to them in a meaningful way. In this talk, we will present an approach for personalizing the artwork we use on the Netflix homepage. The system selects an image for each member and video to give better visual evidence for why the title might be appealing to that particular member.
(Presented at the Deep Learning Re-Work SF Summit on 01/25/2018)
In this talk, we go through the traditional recommendation systems set-up, and show that deep learning approaches in that set-up don't bring a lot of extra value. We then focus on different ways to leverage these techniques, most of which rely on breaking away from that traditional set-up: providing additional data to your recommendation algorithm, modeling different facets of user/item interactions, and most importantly re-framing the recommendation problem itself. In particular, we show a few results obtained by casting the problem as a contextual sequence prediction task, and using it to model time (a very important dimension in most recommendation systems).
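The "contextual sequence prediction" re-framing can be illustrated by how training examples are built: instead of (user, item, rating) triples, we slide over a member's chronological play history and ask a model to predict the next item given the preceding items plus context. The field names and context features below (hour-of-day, device) are assumptions for illustration, not the talk's actual feature set.

```python
from datetime import datetime

def make_training_examples(history, window=3):
    """history: list of (timestamp, device, item) tuples, oldest first.
    Yields one example per play: the preceding items (up to `window`),
    the context at play time, and the played item as the target."""
    examples = []
    for i in range(1, len(history)):
        ts, device, target = history[i]
        prev_items = [item for _, _, item in history[max(0, i - window):i]]
        context = {"hour": ts.hour, "device": device}
        examples.append({"sequence": prev_items, "context": context, "target": target})
    return examples

history = [
    (datetime(2018, 1, 25, 20), "tv", "show_a"),
    (datetime(2018, 1, 25, 21), "tv", "show_b"),
    (datetime(2018, 1, 26, 8), "phone", "show_c"),
]
examples = make_training_examples(history)
```

A sequence model (e.g. an RNN) trained on such examples learns both item-to-item transitions and how context shifts them; the morning-on-phone play here carries a different context than the evening-on-TV plays that precede it.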
At Netflix, we try to provide the best personalized video recommendations to our members. To do this, we need to adapt our recommendations for each contextual situation, which depends on information such as time or device. In this talk, I will describe how state of the art Contextual Recommendations are used at Netflix. A first example of contextual adaptation is the model that powers the Continue Watching row. It uses a feature-based approach with a carefully constructed training set to learn how to adapt to the context of the member. Next, I will dive into more modern approaches such as Tensor Factorization and LSTMs and share some results from deployments of these methods. I will highlight lessons learned and some common pitfalls of using these powerful methods in industrial scale systems. Finally, I will touch upon system reliability, choice of optimization metrics, hidden costs, risks and benefits of using highly adaptive systems.
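The tensor-factorization approach mentioned above can be sketched as a CP-style decomposition: a (member, item, context) triple is scored by summing, over latent dimensions, the product of a member factor, an item factor, and a context factor. The factor values below are made up for illustration and are not learned from real data.

```python
def score(user_vec, item_vec, context_vec):
    """CP-style tensor factorization score for a (member, item, context) triple:
    sum over latent dimensions of the elementwise product of the three factors."""
    return sum(u * i * c for u, i, c in zip(user_vec, item_vec, context_vec))

user = [0.9, 0.1]           # member's latent factors (illustrative)
item = [0.8, 0.3]           # title's latent factors (illustrative)
evening_tv = [1.0, 0.2]     # context factors for (evening, TV)
morning_phone = [0.2, 1.0]  # context factors for (morning, phone)
```

Because the context vector rescales each latent dimension, the same member-item pair can score very differently in different contexts, which is exactly the adaptation the talk describes for rows like Continue Watching.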
Déjà Vu: The Importance of Time and Causality in Recommender Systems (Justin Basilico)
Talk at RecSys 2017 in Como, Italy on 2017-08-29.
Abstract:
Time plays a key role in recommendation. Handling it properly is especially critical when using recommender systems in real-world applications, which may not be as clear when doing research with historical data. In this talk, we will discuss some of the important challenges of handling time in recommendation algorithms at Netflix. We will focus on challenges related to how our users, items, and systems all change over time. We will then discuss some strategies for tackling these challenges, which revolve around proper treatment of causality in our systems.
Past, Present & Future of Recommender Systems: An Industry Perspective (Justin Basilico)
Slides from our talk at the RecSys 2016 conference in Boston, MA on 2016-09-18, on our perspective on important areas for future work in recommender systems.
Tutorial on Deep Learning in Recommender Systems, LARS Summer School 2019 (Anoop Deoras)
I had a fun time giving a tutorial on the topic of deep learning in recommender systems at the Latin America School on Recommender Systems (LARS) in Fortaleza, Brazil.
Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta... (Spark Summit)
Netflix is the world’s largest streaming service, with 80 million members in over 190 countries. Netflix uses machine learning to inform nearly every aspect of the product, from the recommendations you get, to the boxart you see, to the decisions made about which TV shows and movies are created.
Given this scale, we utilized Apache Spark as the engine of our recommendation pipeline. Apache Spark enables Netflix to use a single, unified framework/API for ETL, feature generation, model training, and validation. With the Pipeline framework in Spark ML, each step within the Netflix recommendation pipeline (e.g., label generation, feature encoding, model training, model evaluation) is encapsulated as Transformers, Estimators, and Evaluators, enabling modularity, composability, and testability. Thus, we can build our own feature engineering logic as Transformers, learning algorithms as Estimators, and customized metrics as Evaluators; with these building blocks, we can more easily experiment with new pipelines and rapidly deploy them to production.
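The Transformer/Estimator composition described above can be sketched in plain Python (Spark ML's real API lives in pyspark.ml; this toy mirrors only the pattern). All class names, column names, and the 30-minute label threshold are illustrative assumptions.

```python
class LabelGenerator:
    """A 'Transformer': maps data to data with a label column added."""
    def transform(self, rows):
        return [dict(r, label=1 if r["minutes_watched"] > 30 else 0) for r in rows]

class MeanEncoder:
    """An 'Estimator': fit() learns from data and returns a fitted Transformer."""
    def __init__(self, column):
        self.column = column
    def fit(self, rows):
        column = self.column
        mean = sum(r[column] for r in rows) / len(rows)
        class Model:  # the fitted Transformer, closing over the learned mean
            def transform(model_self, rows):
                return [dict(r, **{column + "_centered": r[column] - mean}) for r in rows]
        return Model()

class Pipeline:
    """Chains stages: fits Estimators in order, applying each stage's output
    as the next stage's input, as Spark ML's Pipeline does."""
    def __init__(self, stages):
        self.stages = stages
    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            stage = stage.fit(rows) if hasattr(stage, "fit") else stage
            rows = stage.transform(rows)
            fitted.append(stage)
        return fitted, rows

rows = [{"minutes_watched": 45}, {"minutes_watched": 10}]
pipeline = Pipeline([LabelGenerator(), MeanEncoder("minutes_watched")])
_, out = pipeline.fit(rows)
```

The payoff of the pattern is exactly what the abstract claims: each stage is independently testable, and swapping a label rule or an encoder means swapping one stage rather than rewriting the pipeline.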
In this talk, we will discuss how Apache Spark is used as the distributed framework on top of which we build our own algorithms to generate personalized recommendations for each of our 80+ million subscribers, the specific techniques we use at Netflix to scale, and the various pitfalls we’ve found along the way.
Personalization at Netflix - Making Stories Travel (Sudeep Das, Ph.D.)
I give a high-level overview of how personalization at Netflix helps our members find titles that spark joy, as well as how it helps stories travel across the world.
Netflix was a trailblazing innovator in machine learning as applied to personalization and recommendation systems but there are many other applications of machine learning at Netflix, especially as we further evolve into a global entertainment company. This talk will give an overview of how machine learning is leveraged before content launches on Netflix and how machine learning can support the creative process and serve as a tool for decision makers in our content and marketing organization. The process of creating content is a high-touch, creative endeavor so we need to be similarly creative in the machine learning innovations we develop. From neural nets that predict audience size for content that doesn't exist yet, to NLP and deep learning techniques that mine scripts to highlight properties we need legal clearance for ... we are building unprecedented innovations. The talk will also broadly cover the challenges we face in this space, including data scarcity and making ML interpretable for non-technical stakeholders.
At Netflix we take context of the member seriously.
In this keynote talk we will see how modeling contextual factors such as time or device can help members to find the right content at the right moment
At the end, the goal is to maximize member satisfaction and retention
These slides will go through which contextual factors matters for the video service and why we choose to use them or not.
Deeper Things: How Netflix Leverages Deep Learning in Recommendations and Se...Sudeep Das, Ph.D.
In this talk, we will provide an overview of Deep Learning methods applied to personalization and search at Netflix. We will set the stage by describing the unique challenges faced at Netflix in the areas of recommendations and information retrieval. Then we will delve into how we leverage a blend of traditional algorithms and emergent deep learning methods and new types of embeddings, especially hyperbolic space embeddings, to address these challenges.
Slides from Michelle Ufford's talk, Data-Driven @ Netflix. Talk given at PASS Summit 2016 in October 2016.
Netflix is the quintessential data-driven company. It’s 83 million members stream more than 125 million hours in over 190 countries every day and generate more than 700 billion events in the process. In this session, we’ll share how data is used to make informed decisions across the entire business — from content acquisition to content delivery, and everything in between. We’ll look at how Netflix successfully employs a scalable cloud-based data platform to support a constant deluge of data and a small army of data analysts, engineers, and scientists. We’ll discuss the advanced analytical capabilities that are enabled through modern data technologies. Lastly, we’ll explore some of the architectural & operational principals that enable Netflix to so effectively make use of its data.
Personalizing "The Netflix Experience" with Deep LearningAnoop Deoras
These are the slides from my talk presented at AI Next Con conference in Seattle in Jan 2019. Here I talk in a bit more detail about the intuition behind collaborative filtering and go a bit deeper into the details of non linear deep learned models.
Netflix - Enabling a Culture of AnalyticsBlake Irvine
These are slides from a conference where I presented how we are enabling a culture of analytics at Netflix. I highlight aspects of our culture, our Data Science team organization, our BI tool evolution, and how we are making data accessible.
The Netflix experience is driven by a number of Machine Learning algorithms: personalized ranking, page generation, search, similarity, ratings, etc. On the 6th of January, we simultaneously launched Netflix in 130 new countries around the world, which brings the total to over 190 countries. Preparing for such a rapid expansion while ensuring each algorithm was ready to work seamlessly created new challenges for our recommendation and search teams. In this post, we highlight the four most interesting challenges we’ve encountered in making our algorithms operate globally and, most importantly, how this improved our ability to connect members worldwide with stories they'll love.
Recommendation systems today are widely used across many applications such as in multimedia content platforms, social networks, and ecommerce, to provide suggestions to users that are most likely to fulfill their needs, thereby improving the user experience. Academic research, to date, largely focuses on the performance of recommendation models in terms of ranking quality or accuracy measures, which often don’t directly translate into improvements in the real-world. In this talk, we present some of the most interesting challenges that we face in the personalization efforts at Netflix. The goal of this talk is to sunshine challenging research problems in industrial recommendation systems and start a conversation about exciting areas of future research.
Artwork Personalization at Netflix Fernando Amat RecSys2018 Fernando Amat
For many years, the main goal of the Netflix personalized recommendation system has been to get the right titles in front of our members at the right time. But the job of recommendation does not end there. The homepage should be able to convey to the member enough evidence of why a title may be good for her, especially for shows that the member has never heard of. One way to address this challenge is to personalize the way we portray the titles on our service. An important aspect of how to portray titles is through the artwork or imagery we display to visually represent each title. The artwork may highlight an actor that you recognize, capture an exciting moment like a car chase, or contain a dramatic scene that conveys the essence of a movie or show. It is important to select good artwork because it may be the first time a member becomes aware of a title (and sometimes the only time), so it must speak to them in a meaningful way. In this talk, we will present an approach for personalizing the artwork we use on the Netflix homepage. The system selects an image for each member and video to give better visual evidence for why the title might be appealing to that particular member.
(Presented at the Deep Learning Re-Work SF Summit on 01/25/2018)
In this talk, we go through the traditional recommendation systems set-up, and show that deep learning approaches in that set-up don't bring a lot of extra value. We then focus on different ways to leverage these techniques, most of which relying on breaking away from that traditional set-up; through providing additional data to your recommendation algorithm, modeling different facets of user/item interactions, and most importantly re-framing the recommendation problem itself. In particular we show a few results obtained by casting the problem as a contextual sequence prediction task, and using it to model time (a very important dimension in most recommendation systems).
At Netflix, we try to provide the best personalized video recommendations to our members. To do this, we need to adapt our recommendations for each contextual situation, which depends on information such as time or device. In this talk, I will describe how state of the art Contextual Recommendations are used at Netflix. A first example of contextual adaptation is the model that powers the Continue Watching row. It uses a feature-based approach with a carefully constructed training set to learn how to adapt to the context of the member. Next, I will dive into more modern approaches such as Tensor Factorization and LSTMs and share some results from deployments of these methods. I will highlight lessons learned and some common pitfalls of using these powerful methods in industrial scale systems. Finally, I will touch upon system reliability, choice of optimization metrics, hidden costs, risks and benefits of using highly adaptive systems.
Déjà Vu: The Importance of Time and Causality in Recommender SystemsJustin Basilico
Talk at RecSys 2017 in Como, Italy on 2017-08-29.
Abstract:
Time plays a key role in recommendation. Handling it properly is especially critical when using recommender systems in real-world applications, which may not be as clear when doing research with historical data. In this talk, we will discuss some of the important challenges of handling time in recommendation algorithms at Netflix. We will focus on challenges related to how our users, items, and systems all change over time. We will then discuss some strategies for tackling these challenges, which revolves around proper treatment of causality in our systems.
Past, Present & Future of Recommender Systems: An Industry Perspective - Justin Basilico
Slides from our talk at the RecSys 2016 conference in Boston, MA on 2016-09-18, on what we see as important areas for future work in recommender systems.
Tutorial on Deep Learning in Recommender Systems, LARS Summer School 2019 - Anoop Deoras
I had a fun time giving a tutorial on deep learning in recommender systems at the Latin America School on Recommender Systems (LARS) in Fortaleza, Brazil.
Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta... - Spark Summit
Netflix is the world’s largest streaming service, with 80 million members in over 190 countries. Netflix uses machine learning to inform nearly every aspect of the product, from the recommendations you get, to the boxart you see, to the decisions made about which TV shows and movies are created.
Given this scale, we utilize Apache Spark as the engine of our recommendation pipeline. Apache Spark gives Netflix a single, unified framework/API for ETL, feature generation, model training, and validation. With the pipeline framework in Spark ML, each step within the Netflix recommendation pipeline (e.g., label generation, feature encoding, model training, model evaluation) is encapsulated as Transformers, Estimators, and Evaluators, enabling modularity, composability, and testability. Netflix engineers can thus build their own feature engineering logic as Transformers, learning algorithms as Estimators, and customized metrics as Evaluators, and with these building blocks more easily experiment with new pipelines and rapidly deploy them to production.
In this talk, we will discuss how Apache Spark serves as the distributed framework on top of which we build our own algorithms to generate personalized recommendations for each of our 80+ million subscribers, the specific techniques we use at Netflix to scale, and the various pitfalls we’ve found along the way.
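The Transformer/Estimator composition described above can be sketched in miniature. This is plain Python that mimics the Spark ML pipeline pattern, not the Spark ML API itself; the stage names and the toy "minutes watched" feature are hypothetical:

```python
# Schematic of the pipeline pattern: Transformers rewrite a dataset,
# Estimators learn from it and produce fitted Transformers, and a
# Pipeline chains the stages so each can be swapped and tested alone.
# (Illustrative only -- not the actual Spark ML API.)

class Transformer:
    def transform(self, rows):
        raise NotImplementedError

class Estimator:
    def fit(self, rows):  # returns a fitted Transformer
        raise NotImplementedError

class FeatureEncoder(Transformer):
    """Stand-in for a feature-engineering stage."""
    def transform(self, rows):
        return [dict(r, minutes_watched_sq=r["minutes_watched"] ** 2)
                for r in rows]

class MeanModel(Transformer):
    """A fitted model produced by MeanEstimator."""
    def __init__(self, mean):
        self.mean = mean
    def transform(self, rows):
        # Label each row by whether it exceeds the learned mean.
        return [dict(r, above_mean=r["minutes_watched"] > self.mean)
                for r in rows]

class MeanEstimator(Estimator):
    """Stand-in for a learning algorithm: learns the mean watch time."""
    def fit(self, rows):
        vals = [r["minutes_watched"] for r in rows]
        return MeanModel(sum(vals) / len(vals))

class Pipeline:
    def __init__(self, stages):
        self.stages = stages
    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            if isinstance(stage, Estimator):
                stage = stage.fit(rows)   # fit, then use the model
            rows = stage.transform(rows)
            fitted.append(stage)
        return Pipeline(fitted)           # a fully fitted pipeline
    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows

data = [{"minutes_watched": 10}, {"minutes_watched": 30}]
model = Pipeline([FeatureEncoder(), MeanEstimator()]).fit(data)
```

Because every step shares the same two-method contract, swapping in a different feature encoder or learning algorithm means changing one element of the stage list, which is the modularity the talk describes.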
Personalization at Netflix - Making Stories Travel - Sudeep Das, Ph.D.
I give a high-level overview of how personalization at Netflix helps our members find titles that spark joy, as well as helps stories travel across the world.
Netflix was a trailblazing innovator in machine learning as applied to personalization and recommendation systems but there are many other applications of machine learning at Netflix, especially as we further evolve into a global entertainment company. This talk will give an overview of how machine learning is leveraged before content launches on Netflix and how machine learning can support the creative process and serve as a tool for decision makers in our content and marketing organization. The process of creating content is a high-touch, creative endeavor so we need to be similarly creative in the machine learning innovations we develop. From neural nets that predict audience size for content that doesn't exist yet, to NLP and deep learning techniques that mine scripts to highlight properties we need legal clearance for ... we are building unprecedented innovations. The talk will also broadly cover the challenges we face in this space, including data scarcity and making ML interpretable for non-technical stakeholders.
Analytics is Taking over the World (Again) - UKOUG Tech'17 - Rittman Analytics
In this presentation we'll look at some of the new industries and new technologies that are only possible today with analytics, how employee empowerment and improving your fitness are spin-offs of the same technology used to track boxes around a warehouse and spot fraudulent bank transactions, and how Oracle are embedding these new analytics capabilities in their cloud-based HR.
Northern New England Tableau User Group (TUG) May 2024 - patrickdtherriault
Join us live in Portland or over the wire for networking and two fantastic presentations! Data viz freelancer Desireé Abbott will demonstrate how adding interactivity to your dashboards will delight and spark curiosity in your users. Then, Charlotte Taft & Laurie Rugemer will reprise their TC24 presentation on the keys to building a successful analytics team.
Delivered at Kristu Jayanti College, Feb 1, 2018, during the IEEE International Conference on Current Trends in Advanced Computing.
Github - https://github.com/raghu-icecraft/tech-talks/tree/master/Tableau_Feb%2018
ML, Statistics, and Spark with Databricks for Maximizing Revenue in a Delayed... - Databricks
In this talk, we will present how we used Spark, Databricks, Airflow and MLflow to process big data and build a pipeline of both ML (XGBoost) and statistical models that maximizes our revenues in one of our core products, the “Offer Wall”. The Offer Wall is a mobile product, integrated with existing apps, that suggests tasks users can perform in exchange for in-app currency. The problem gets even more interesting when you consider that some of the tasks take 15 minutes and some may take up to two weeks, forcing us to make revenue-determining decisions in an uncertain space all of the time. The solution we developed utilizes Databricks and Spark’s strengths and diversity in machine learning and big data, along with MLflow and Airflow integrations, allowing us to deliver a production-grade solution with short development time between experiments.
SAP Process Mining in Action: Hear from Two Customers - Celonis
Hear about insights gained and other benefits of leveraging SAP Process Mining by Celonis at two of the largest global enterprises in their respective industries: SAP SE and Schlumberger.
Mark Saul, Head of Process Management at SAP SE, has been spearheading the planning, introduction and successful implementation of SAP Process Mining at SAP. He will outline the benefits and use cases of SAP Process Mining with SAP S/4HANA and SAP Data Hub that are relevant for Europe’s largest software company, and the positive outcomes for the company.
Jim Brady, Vice President Architecture & Governance at Schlumberger, will highlight the company’s SAP go-live, one of the largest launches in recent history, and in particular the use of SAP Process Mining during the vital hypercare period of that global SAP launch. The focus during that critical time is on adoption monitoring, conformance monitoring, de-bottlenecking, and in part design validation to ensure the SAP launch proves to be a big success.
Presenters:
Alex Marx, Global Partner Director, SAP
James P. Brady, Vice President IT Architecture & Governance, Schlumberger
Mark Saul, Head of Process Management, SAP
Case Study: Lessons from Newell Rubbermaid's SAP HANA Proof of Concept - SAPinsider Events
View this session from Reporting & Analytics 2014. Coming to Las Vegas in November! www.reporting2015.com
In this session, Newell Rubbermaid guides you through the key elements that comprised its SAP HANA business case and proof of concept, including an emphasis on process improvement. Learn firsthand how Newell Rubbermaid:
· Identified which business processes were most likely to realize significant improvement as a result of utilizing SAP HANA
· Established a “current state” baseline and demonstrated a “projected state” that could be realized through the use of SAP HANA
· Determined which SAP BI tools to use based on specific reporting scenarios and end user requirements
Delivered at PSG College of Technology, Mar 24, 2018
Github - https://github.com/raghu-icecraft/tech-talks/tree/master/Tableau/Mar_18
Basics of BI and data visualization; Tableau features and integration with R.
Discussed Tableau Public and Tableau Desktop.
Additions compared to the ICCTAC 2018 session:
More emphasis on topics related to data science.
Added slides on the 2018 Gartner Magic Quadrants for BI and data science.
A slide dedicated to the foremost principles of data visualization, with a note on Edward Tufte and the Gestalt laws.
The audience was MSc Data Science students along with other teaching staff.
The workshop took place at PSG College of Technology, Coimbatore (Department of MCA).
Get ready to boost the efficiency and effectiveness of your day with the Cartegraph Fall 2018 release! This free webinar will walk you through our brand new enhancements, including setting the system into Maintenance Mode, improving the accuracy of your scenarios, and three new Analytics Dashboard gadgets.
Save your webinar spot now and learn how to:
- Simplify system maintenance by seeing who's signed in, notifying users, and locking the system
- Fine-tune your Scenario Builder projections with budget categories and combined triggers
- Track progress toward a goal using the new key performance indicator (KPI) gadget
How to Use Data Effectively by Abra Sr. Business Analyst - Product School
Key Takeaways from this presentation include:
- How data is used to run day to day operations
- How data is used to influence product decisions and marketing strategies
- Which skills are necessary to become self-serving in data tasks regardless of core responsibilities
Building an immersive Data Function in Large Scale Organizations.
Data is hard, analytics is hard. Many challenges in both fields have been mastered, but many more lie ahead. One of them is how to establish the combination of both data and analytics as a company function in a large organization. In this talk, I shared insights from the ongoing journey to build a data function at Mercedes-Benz Cars Finance and to embed it into the company’s innermost workings.
Analytic Excellence - Saying Goodbye to Old Constraints - Inside Analysis
The Briefing Room with Dr. Robin Bloor and Actian
Live Webcast August 6, 2013
http://www.insideanalysis.com
With all the innovations in compute power these days, one of the hardest hurdles to overcome is the tendency to think in old ways. By and large, the processing constraints of yesterday no longer apply. The new constraints revolve around the strategic management of data, and the effective use of business analytics. How can your organization take the helm in this new era of analysis?
Register for this episode of The Briefing Room to find out! Veteran Analyst Wayne Eckerson of The BI Leadership Forum will explain how a handful of key innovations have significantly changed the game for data processing and analytics. He'll be briefed by John Santaferraro of Actian, who will tout his company's unique position in "scale-up and scale-out" for analyzing data.
#askSAP Analytics Innovations Community Call: Delivering Big Data Insights wi... - SAP Analytics
http://bit.ly/askSAP_BigData_Insight - Moderated by SAP Mentor Tammy Powlas, you’ll hear from SAP experts Ty Miller, Angela Harvey, and Tammy Powlas on our BI product strategy for both Enterprise BI with SAP BusinessObjects BI 4.2 and Trusted Data Discovery with SAP Lumira. We’ll also have guest speaker Justin Sears from Hortonworks, who will share how companies can derive new insights thanks to the powerful combination of Hortonworks Data Platform and SAP Lumira.
Other topics of discussion include:
- Our latest roadmap on dashboarding innovations, including the 1.6 release of SAP Design Studio.
- A deep-dive on how new data-wrangling capabilities in SAP Lumira bring the power of Big Data to Trusted Data Discovery, and a real-world demo of this new functionality.
- See how customers are leveraging SAP Lumira and Hortonworks for big data analytics
Learn more at SAPBI.com
Similar to Tableau Conference 2018: Binging on Data - Enabling Analytics at Netflix
Levelwise PageRank with Loop-Based Dead End Handling Strategy: SHORT REPORT ... - Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation of ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by a large submission of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
As Europe's leading economic powerhouse and the fourth-largest economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like Russia and China, Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in cyberattack sophistication aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to Advanced Persistent Threats (APTs), threatening national security and business integrity.
🔑 Key findings include:
🔍 Increased frequency and complexity of cyber threats.
🔍 Escalation of state-sponsored and criminally motivated cyber operations.
🔍 Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. For more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Techniques to optimize the PageRank algorithm usually fall into two categories. One is to reduce the work per iteration, and the other is to reduce the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, which share the same in-links, helps avoid duplicate computations and thus could reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance; the final ranks of chain nodes can be easily calculated. This could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order. This could help reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in PageRank computation. The combination of all of the above methods is the STICD algorithm. [sticd] For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
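For reference, the monolithic baseline that these optimizations improve on is plain power iteration. A minimal sketch (illustrative only, not the STICD implementation; the tiny example graph is made up):

```python
# Minimal power-iteration PageRank. `graph` maps each vertex to the
# list of its out-links.

def pagerank(graph, damping=0.85, tol=1e-10, max_iter=100):
    n = len(graph)
    ranks = {v: 1.0 / n for v in graph}
    for _ in range(max_iter):
        # Dangling vertices (no out-links) spread their rank uniformly,
        # one common way to satisfy the "no dead ends" precondition.
        dangling = sum(ranks[v] for v, out in graph.items() if not out)
        contrib = {v: 0.0 for v in graph}
        for v, out in graph.items():
            for u in out:
                contrib[u] += ranks[v] / len(out)
        new = {v: (1 - damping) / n + damping * (contrib[v] + dangling / n)
               for v in graph}
        # Global convergence check; the per-vertex skip described above
        # would refine this so converged vertices stop being recomputed.
        if sum(abs(new[v] - ranks[v]) for v in graph) < tol:
            return new
        ranks = new
    return ranks

g = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(g)
```

Every vertex is touched in every iteration here, which is exactly the cost the component-ordered and convergence-skipping variants avoid.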
13. ● Data and Analytics are embraced across the company
○ Engineering, UX, Customer Service, Finance, & more
● A/B Testing of almost everything...
○ Product, Signup Methods, Payments, Messaging, & more
● Algorithms for...
○ Recommendations, Content, Marketing, & more
Data is Ubiquitous
BLAKE IRVINE | TABLEAU CONFERENCE 2018
14. Employees
5000 employees
300 in data teams
200+ in dedicated analytic teams
40. ● Vertical Teams
Organization (diagram): verticals for Content, Marketing, Growth, and Tech, each spanning Business Teams, Analytic Teams (Data Engineering, Science & Analytics), and Engineering Teams, with dedicated channels #content-analytics, #marketing-analytics, #growth-analytics, and #tech-analytics
41. ● Growing user base
● We’ve started up:
○ A Tableau User Group
○ Education tracks
● Early days... much more to do here!
○ Office Hours
○ Tableau Days
○ Data Doctor & more
Community
47. ● The vast majority of our data sources are Extracts
○ Very few live connections
● Why?
○ BIG DATA
○ Some direct connections to Presto or MPP
● Extracts provide an aggregation and caching layer
We Love Data Extracts!
49. 1 Use Big Data Portal to develop query
2 Commit query to ETL repository & deploy
3 Configure ETL workflow so data dependencies are met
4 Use ETL job to publish TDE to server
5 Connect to TDE, Develop Viz, Publish to server, Share
“Best Practice” Pattern
50. 1 Use Big Data Portal to develop query
2 Paste the query into Tableau
3 Develop Viz
4 Publish, and Schedule data refresh on Tableau Server
“Self-Serve” Pattern
51. ● “Best Practice” Pattern is:
○ More robust
○ But complex
● “Self-Serve” Pattern is:
○ Easy and convenient
○ Less scalable
○ Harder to manage
Dilemma...
57. We have REALLY big data
1 Trillion New Data Events Daily
150 Petabyte Warehouse
300 Terabytes Written Daily
5 Petabytes Read Daily
58. Constantly Balancing
● Data volume
● Level of Detail
● Speed of access
● Data prep
59. Development Choices

             Choice 1             Choice 2      Choice 3
Data Engine  MPP                  Cloud         TDE
Data Size    < 1B rows            < 10B rows    < 100M rows
Performance  Up to many minutes   Many minutes  Up to many seconds
60. ● For REALLY big data use cases
● For very fast interactivity
● For custom UI/UX/dataviz
● Custom Analytic Tools
○ Web app built with Javascript
○ Data stored in Druid
Choice 4...
61. ● Druid
○ An open source data system for analytic applications
○ Distributed, horizontally scalable architecture
○ VERY, VERY fast
○ Queries are in JSON format to REST endpoint
Druid white paper: http://static.druid.io/docs/druid.pdf
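The "queries in JSON format to a REST endpoint" point can be made concrete: a Druid native query is just a JSON document POSTed to a broker's /druid/v2 endpoint. A minimal sketch of the shape, where the datasource, field names, interval, and broker address are hypothetical:

```python
import json

# A Druid native "timeseries" query: hourly event counts over one day.
# The datasource and field names here are made up for illustration.
query = {
    "queryType": "timeseries",
    "dataSource": "playback_events",
    "granularity": "hour",
    "intervals": ["2018-10-01/2018-10-02"],
    "aggregations": [
        {"type": "longSum", "name": "events", "fieldName": "count"}
    ],
}
payload = json.dumps(query)

# The payload would be POSTed to a broker, e.g.:
# requests.post("http://broker:8082/druid/v2", data=payload,
#               headers={"Content-Type": "application/json"})
```

Because the query is plain JSON over HTTP, any web app (like the custom Druid-backed analytic tools mentioned above) can issue it directly from application code.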
62. ● Can we connect Tableau to Druid?
○ All the performance benefits of Druid...
○ Tableau or web apps use same data store…
● We are exploring this...
○ There is now a Druid SQL layer based on Apache Calcite
○ Have done some testing, finding limitations
Tableau ?
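The Druid SQL layer mentioned above lets the same question be phrased as SQL, wrapped in JSON and POSTed to the broker's /druid/v2/sql endpoint. A sketch, with hypothetical table and column names:

```python
import json

# Druid SQL equivalent of an hourly event count; __time is Druid's
# built-in time column, while the table and "count" column are made up.
sql = """
SELECT FLOOR(__time TO HOUR) AS hr, SUM("count") AS events
FROM playback_events
WHERE __time >= TIMESTAMP '2018-10-01 00:00:00'
GROUP BY 1
ORDER BY 1
"""
payload = json.dumps({"query": sql})

# POSTed to the broker's SQL endpoint, e.g.:
# requests.post("http://broker:8082/druid/v2/sql", data=payload,
#               headers={"Content-Type": "application/json"})
```

This SQL surface (built on Apache Calcite) is what makes a Tableau connection plausible at all, since Tableau speaks SQL rather than Druid's native JSON query language.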
63. ● TDE -> Hyper with 2018.2 upgrade
○ Happening now(ish)
○ Expectations: faster for small and medium data (<100M)
● Snowflake
○ Fast for “large” data stores (1B+)
● Data scale is always a challenge!
In the meantime...
65. Challenge 2: Data Lineage
● Where did this data come from?
● Can I trust this data?
● Tableau PRO: very easy to pull in data, analyze, and publish
● Tableau CON: very easy to pull in data, analyze, and publish
68. ● ...but not about Tableau
We have Data Lineage...
69. ● Can the upcoming Metadata APIs and Object Model help?
● Metadata APIs:
○ Inventory of workbooks, data sources, and metrics
○ Identify similar existing data and workbooks?
● Automate building of similar insights, and integrate with our existing data lineage system
Metadata APIs
76. ● Improved layout & pagination
● Export to different formats
● Distribution management: what, who, and when
What we’d like