Gian Merlino, Matt Herman, Vivek Pasari, Samarth Jain
Druid Meetup
11/14/18 - Los Gatos, CA
Agenda
● Druid Deployment @ Netflix
● Scaling & Sketch Strings
● Druid Roadmap
● Q&A
Druid
Deployment &
Use Cases
@Netflix
Overview
● Druid in the Netflix data warehouse
● Data ingestion
● Deploying Druid @ Netflix
● Use Cases
Netflix Data Warehouse Pipeline
Druid Ingestion
● Batch ingestion
● Druid Hadoop indexer
● Input: Hive text/Parquet tables
● S3 deep storage
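The batch path above (Hadoop indexer reading Hive-backed data, writing segments to S3 deep storage) can be sketched as a minimal Hadoop indexing task. The datasource name, S3 path, columns, and interval below are illustrative placeholders, not Netflix's actual spec:

```python
# Sketch of a Druid "index_hadoop" task spec as a Python dict.
# All names, paths, and columns are hypothetical.
hadoop_index_task = {
    "type": "index_hadoop",
    "spec": {
        "dataSchema": {
            "dataSource": "client_perf",  # hypothetical datasource
            "parser": {
                "type": "parquet",  # Parquet input via the parquet extension
                "parseSpec": {
                    "format": "timeAndDims",
                    "timestampSpec": {"column": "ts", "format": "auto"},
                    "dimensionsSpec": {"dimensions": ["country", "device"]},
                },
            },
            "metricsSpec": [{"type": "count", "name": "count"}],
            "granularitySpec": {
                "segmentGranularity": "day",
                "queryGranularity": "minute",
                "intervals": ["2018-11-01/2018-11-02"],
            },
        },
        "ioConfig": {
            "type": "hadoop",
            "inputSpec": {
                "type": "static",
                "paths": "s3://bucket/warehouse/client_perf",  # hypothetical
            },
        },
        "tuningConfig": {"type": "hadoop"},
    },
}
```

The task would be POSTed to the Overlord; the internal `BigDataApi` wrapper shown on the next slide hides this plumbing.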
Druid Ingestion
Druid Ingestion
import BigDataApi as bda

tbl = bda.Table("hive/table_name")

# build the ingestion spec from the Hive table
spec = (bda.druid.DruidSpec.from_table(tbl)
        .spec("ingestion_spec.json"))

# create a job from the spec
job = (bda.genie.DruidIndexerJob(spec)
       .cluster(bda.druid.clusters.DRUID_CLUSTER_NAME))

# submit the job
job.execute()
Druid Cluster @ Netflix
● r4.16xlarge instance type
● Druid version 0.12.2
● ~100s of nodes
Multitenancy
● Single tier
● Router
○ Ad hoc
○ Experimental - broker downtime acceptable; used for query fine-tuning, etc.
○ Reporting - pre-defined queries/dashboards
Autoscale
● Favor keeping segments in memory
● Autoscale up when cluster disk utilization exceeds 80%
● Handle large data ingestions without worrying about the cluster tipping over
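The scale-up rule above can be sketched in a few lines; the function name and the step size are hypothetical, and only the 80% disk-utilization trigger comes from the slide:

```python
def desired_historical_count(current_count, disk_used_bytes, disk_total_bytes,
                             threshold=0.80, step=2):
    """Sketch of the autoscale-up rule: add nodes when cluster disk
    utilization crosses the threshold. Scale-down is handled elsewhere,
    so this never returns fewer than the current count."""
    utilization = disk_used_bytes / disk_total_bytes
    if utilization > threshold:
        return current_count + step
    return current_count
```

For example, a 10-node tier at 85% disk utilization would grow to 12 nodes, while one at 50% stays put.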
Deployment Pipeline
● Spinnaker (https://www.spinnaker.io/)
● Clusters upgraded using red/black deployment
○ Jenkins jobs build the Druid tarball and Debian package
○ Deploy components with the new code line
○ Wait for segments to load
○ Switch DNS records
○ Scale down the old cluster
● Rollback
○ Switch DNS back to the old cluster
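The red/black flow above can be sketched as an orchestration function. Every helper passed in below (`deploy_cluster`, `segments_fully_loaded`, `switch_dns`, `scale_down`) is hypothetical; in practice Spinnaker pipelines drive these stages:

```python
import time

def red_black_upgrade(new_version, dns_record, old_cluster,
                      deploy_cluster, segments_fully_loaded,
                      switch_dns, scale_down, poll_seconds=60):
    """Sketch of a red/black upgrade: stand up the new cluster,
    wait until it has loaded its segments, cut DNS over, then
    retire the old cluster. Rollback is just switch_dns back."""
    new_cluster = deploy_cluster(new_version)      # deploy new code line
    while not segments_fully_loaded(new_cluster):  # wait for segment load
        time.sleep(poll_seconds)
    switch_dns(dns_record, new_cluster)            # switch DNS records
    scale_down(old_cluster)                        # scale down old cluster
    return new_cluster
```

Keeping the old cluster intact until after the DNS switch is what makes rollback cheap: pointing DNS back is the only step needed.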
Deployment Pipeline
Use Cases
● Dashboard backend
● Sub-second query times
○ User-interactive slice and dice
○ Longer data retention vs. Redshift
○ More dimensions vs. Redshift
● Custom UI
AWS Capacity Planning
Other use cases
● Payments analysis
● Algorithms comparison
● Security
● Quality of Experience (QoE)
Future work
● Real-time ingestion
○ Tranquility or Kafka indexing
● Open source a T-Digest-based histogram module
● Investigate tiering
● Revisit the auto-scaling policy with EBS in mind
Scaling & Sketch
Strings
How Netflix Processes 160B
Daily Customer Actions to
Monitor Client Performance
#netflixeverywhere
“With this launch, consumers around the world
will be able to enjoy TV shows and movies
simultaneously -- no more waiting. With the help
of the Internet, we are putting power in
consumers’ hands to watch whenever, wherever
and on whatever device.”
● 160 billion client-side data points daily
● 135+ million members
● 190 countries
● 300 million devices
● 4 major UI platforms: TVUI, Web, iOS, Android
Measure Everything Consistently
Client Performance
● Metrics
○ App launch times
○ Play delay
○ Details page time
● 29 dimensions
○ Geo
○ Network
○ Device
○ A/B test cell
Client Performance Metrics
Architecture
Show Me
The Data
Summarize Instead
Anscombe’s Quartet
Saved by Sketch Strings
Box & Whisker Plots
Median application load times are similar
Country B has a larger IQR and long tail
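The comparison above rests on the five-number summary behind a box-and-whisker plot. A sketch of computing it from raw load-time samples (the function name is ours; the production pipeline derives these values from quantile sketches rather than raw data):

```python
import statistics

def five_number_summary(samples):
    """Median, quartiles, and IQR: the ingredients of a box plot.
    Two countries can share a median while differing widely in IQR."""
    q1, median, q3 = statistics.quantiles(samples, n=4)
    return {"min": min(samples), "q1": q1, "median": median,
            "q3": q3, "max": max(samples), "iqr": q3 - q1}
```

A country whose summary shows a similar median but a larger IQR and max is exactly the "similar median, longer tail" case described above.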
Cumulative Distribution Functions
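A CDF view can be sketched as an empirical CDF over raw samples; again, the real pipeline evaluates these curves from sketch strings rather than raw data:

```python
def ecdf(samples):
    """Empirical CDF: for each sorted value x, the fraction of
    samples <= x. Plotting these pairs gives the CDF curve."""
    xs = sorted(samples)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]
```

Reading a load-time CDF at, say, y = 0.95 gives the 95th-percentile load time directly, which is why CDFs complement box plots for tail analysis.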
Recap
● Ingesting consistent and
highly dimensional data
● Analyzing data via custom
web visualizations
● Summarizing responsibly via
sketch strings
● Druid helps us provide the
best customer experience
Druid Roadmap
Roadmap and community update
Gian Merlino
gian@imply.io
Who am I?
Gian Merlino
Committer & PMC member on Apache Druid
Cofounder at Imply
>10 years working on scalable systems
Druid 0.13.0
…and beyond!!
Druid 0.13.0
400 new features and bug fixes from 81 contributors!
Druid 0.13.0
Our first Apache release!
(After years as an independent project.)
Druid 0.13.0
● Native parallel batch indexing (phase 1)
● Automatic compaction (phase 1)
● Ingestion statistics and errors via API
● SQL system tables: segments, tasks, servers
● SQL standard-compliant null handling option
● Additional aggregators (stringFirst/stringLast, new HllSketch)
● Support for multiple grouping specs in groupBy query
● Backpressure, compact result formats for large result sets
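The new SQL system tables can be queried over Druid's SQL HTTP API (`POST /druid/v2/sql` on the broker). A minimal sketch, where the broker URL and the example query are placeholders:

```python
import json
from urllib import request

def build_sql_request(query):
    """JSON payload for Druid's SQL HTTP API."""
    return json.dumps({"query": query}).encode("utf-8")

def druid_sql(broker_url, query):
    """POST a SQL query to a Druid broker and return the parsed rows."""
    req = request.Request(
        broker_url + "/druid/v2/sql",
        data=build_sql_request(query),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())

# e.g. (assuming a broker on the default port):
# druid_sql("http://localhost:8082",
#           "SELECT datasource, COUNT(*) FROM sys.segments GROUP BY 1")
```

The `sys.segments`, `sys.tasks`, and `sys.servers` tables make cluster introspection possible from the same SQL interface used for data queries.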
…and beyond!!
● Native parallel batch indexing (phase 2)
● Automatic compaction (phase 2)
● Smaller, faster compression (FastPFOR, etc)
● Faster quantiles: Fixed-bin histograms, moments sketches
● Dynamic prioritization
● Simpler, self-configuring deployment
● … your item here!!
Try this at home
Download
Druid community site (current): http://druid.io/
Druid community site (new): https://druid.apache.org/
Imply distribution: https://imply.io/get-started
Contribute
https://github.com/apache/druid
Stay in touch
@druidio
http://druid.io/community
Q&A
