Gian Merlino, Matt Herman, Vivek Pasari, Samarth Jain
Druid Meetup
11/14/18 - Los Gatos, CA
Agenda
● Druid Deployment @ Netflix
● Scaling & Sketch Strings
● Druid Roadmap
● Q&A
Druid
Deployment &
Use Cases
@Netflix
Overview
● Druid in the Netflix data warehouse
● Data ingestion
● Deploying Druid @ Netflix
● Use Cases
Netflix Data Warehouse Pipeline
Druid Ingestion
● Batch ingestion
● Druid Hadoop indexer
● Input: Hive text/Parquet tables
● S3 deep storage
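The batch path above (Hadoop indexer reading Hive-backed data, writing segments to S3 deep storage) can be sketched as a minimal Hadoop indexing task. The datasource name, S3 path, columns, and interval below are illustrative placeholders, not Netflix's actual spec:

```python
# Sketch of a Druid "index_hadoop" task spec as a Python dict.
# All names, paths, and columns are hypothetical.
hadoop_index_task = {
    "type": "index_hadoop",
    "spec": {
        "dataSchema": {
            "dataSource": "client_perf",  # hypothetical datasource
            "parser": {
                "type": "parquet",  # Parquet input via the parquet extension
                "parseSpec": {
                    "format": "timeAndDims",
                    "timestampSpec": {"column": "ts", "format": "auto"},
                    "dimensionsSpec": {"dimensions": ["country", "device"]},
                },
            },
            "metricsSpec": [{"type": "count", "name": "count"}],
            "granularitySpec": {
                "segmentGranularity": "day",
                "queryGranularity": "minute",
                "intervals": ["2018-11-01/2018-11-02"],
            },
        },
        "ioConfig": {
            "type": "hadoop",
            "inputSpec": {
                "type": "static",
                "paths": "s3://bucket/warehouse/client_perf",  # hypothetical
            },
        },
        "tuningConfig": {"type": "hadoop"},
    },
}
```

The task would be POSTed to the Overlord; the internal `BigDataApi` wrapper shown on the next slide hides this plumbing.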
Druid Ingestion
Druid Ingestion
import BigDataApi as bda

tbl = bda.Table("hive/table_name")

# build the ingestion spec from the Hive table
spec = (bda.druid.DruidSpec.from_table(tbl)
        .spec("ingestion_spec.json"))

# create a job from the spec
job = (bda.genie.DruidIndexerJob(spec)
       .cluster(bda.druid.clusters.DRUID_CLUSTER_NAME))

# submit the job
job.execute()
Druid Cluster @ Netflix
● r4.16xlarge instance type
● Druid version 0.12.2
● ~100s of nodes
Multitenancy
● Single tier
● Router
○ Ad hoc
○ Experimental - broker downtime acceptable; used for query fine-tuning, etc.
○ Reporting - pre-defined queries/dashboards
Autoscale
● Favor keeping segments in memory
● Autoscale up when cluster disk utilization exceeds 80%
● Handle large data ingestions without worrying about the cluster tipping over
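The scale-up rule above can be sketched in a few lines; the function name and the step size are hypothetical, and only the 80% disk-utilization trigger comes from the slide:

```python
def desired_historical_count(current_count, disk_used_bytes, disk_total_bytes,
                             threshold=0.80, step=2):
    """Sketch of the autoscale-up rule: add nodes when cluster disk
    utilization crosses the threshold. Scale-down is handled elsewhere,
    so this never returns fewer than the current count."""
    utilization = disk_used_bytes / disk_total_bytes
    if utilization > threshold:
        return current_count + step
    return current_count
```

For example, a 10-node tier at 85% disk utilization would grow to 12 nodes, while one at 50% stays put.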
Deployment Pipeline
● Spinnaker (https://www.spinnaker.io/)
● Clusters upgraded using red/black deployment
○ Jenkins jobs build the Druid tarball and Debian package
○ Deploy components with the new code line
○ Wait for segments to load
○ Switch DNS records
○ Scale down the old cluster
● Rollback
○ Switch DNS back to the old cluster
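The red/black flow above can be sketched as an orchestration function. Every helper passed in below (`deploy_cluster`, `segments_fully_loaded`, `switch_dns`, `scale_down`) is hypothetical; in practice Spinnaker pipelines drive these stages:

```python
import time

def red_black_upgrade(new_version, dns_record, old_cluster,
                      deploy_cluster, segments_fully_loaded,
                      switch_dns, scale_down, poll_seconds=60):
    """Sketch of a red/black upgrade: stand up the new cluster,
    wait until it has loaded its segments, cut DNS over, then
    retire the old cluster. Rollback is just switch_dns back."""
    new_cluster = deploy_cluster(new_version)      # deploy new code line
    while not segments_fully_loaded(new_cluster):  # wait for segment load
        time.sleep(poll_seconds)
    switch_dns(dns_record, new_cluster)            # switch DNS records
    scale_down(old_cluster)                        # scale down old cluster
    return new_cluster
```

Keeping the old cluster intact until after the DNS switch is what makes rollback cheap: pointing DNS back is the only step needed.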
Deployment Pipeline
Use Cases
● Dashboard backend
● Sub-second query times
○ User-interactive slice and dice
○ Longer data retention vs. Redshift
○ More dimensions vs. Redshift
● Custom UI
AWS Capacity Planning
Other use cases
● Payments analysis
● Algorithms comparison
● Security
● Quality of Experience (QoE)
Future work
● Real-time ingestion
○ Tranquility or Kafka indexing
● Open source a T-Digest-based histogram module
● Investigate tiering
● Revisit the auto-scaling policy with EBS in mind
Scaling & Sketch
Strings
How Netflix Processes 160B
Daily Customer Actions to
Monitor Client Performance
#netflixeverywhere
“With this launch, consumers around the world
will be able to enjoy TV shows and movies
simultaneously -- no more waiting. With the help
of the Internet, we are putting power in
consumers’ hands to watch whenever, wherever
and on whatever device.”
● 160 billion client-side data points daily
● 135+ million members
● 190 countries
● 300 million devices
● 4 major UI platforms: TVUI, Web, iOS, Android
Measure Everything Consistently
Client Performance
● Metrics
○ App launch times
○ Play delay
○ Details page time
● 29 dimensions
○ Geo
○ Network
○ Device
○ A/B test cell
Client Performance Metrics
Architecture
Show Me
The Data
Summarize Instead
Anscombe’s Quartet
Saved by Sketch Strings
Box & Whisker Plots
Median application load times are similar
Country B has a larger IQR and long tail
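The comparison above rests on the five-number summary behind a box-and-whisker plot. A sketch of computing it from raw load-time samples (the function name is ours; the production pipeline derives these values from quantile sketches rather than raw data):

```python
import statistics

def five_number_summary(samples):
    """Median, quartiles, and IQR: the ingredients of a box plot.
    Two countries can share a median while differing widely in IQR."""
    q1, median, q3 = statistics.quantiles(samples, n=4)
    return {"min": min(samples), "q1": q1, "median": median,
            "q3": q3, "max": max(samples), "iqr": q3 - q1}
```

A country whose summary shows a similar median but a larger IQR and max is exactly the "similar median, longer tail" case described above.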
Cumulative Distribution Functions
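A CDF view can be sketched as an empirical CDF over raw samples; again, the real pipeline evaluates these curves from sketch strings rather than raw data:

```python
def ecdf(samples):
    """Empirical CDF: for each sorted value x, the fraction of
    samples <= x. Plotting these pairs gives the CDF curve."""
    xs = sorted(samples)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]
```

Reading a load-time CDF at, say, y = 0.95 gives the 95th-percentile load time directly, which is why CDFs complement box plots for tail analysis.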
Recap
● Ingesting consistent and
highly dimensional data
● Analyzing data via custom
web visualizations
● Summarizing responsibly via
sketch strings
● Druid helps us provide the
best customer experience
Druid Roadmap
Roadmap and community update
Gian Merlino
gian@imply.io
Who am I?
Gian Merlino
Committer & PMC member on Apache Druid
Cofounder at Imply
>10 years working on scalable systems
Druid 0.13.0
…and beyond!!
Druid 0.13.0
400 new features and bug fixes from 81 contributors!
Druid 0.13.0
Our first Apache release!
(After years as an independent project.)
Druid 0.13.0
● Native parallel batch indexing (phase 1)
● Automatic compaction (phase 1)
● Ingestion statistics and errors via API
● SQL system tables: segments, tasks, servers
● SQL standard-compliant null handling option
● Additional aggregators (stringFirst/stringLast, new HllSketch)
● Support for multiple grouping specs in groupBy query
● Backpressure, compact result formats for large result sets
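The new SQL system tables can be queried over Druid's SQL HTTP API (`POST /druid/v2/sql` on the broker). A minimal sketch, where the broker URL and the example query are placeholders:

```python
import json
from urllib import request

def build_sql_request(query):
    """JSON payload for Druid's SQL HTTP API."""
    return json.dumps({"query": query}).encode("utf-8")

def druid_sql(broker_url, query):
    """POST a SQL query to a Druid broker and return the parsed rows."""
    req = request.Request(
        broker_url + "/druid/v2/sql",
        data=build_sql_request(query),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())

# e.g. (assuming a broker on the default port):
# druid_sql("http://localhost:8082",
#           "SELECT datasource, COUNT(*) FROM sys.segments GROUP BY 1")
```

The `sys.segments`, `sys.tasks`, and `sys.servers` tables make cluster introspection possible from the same SQL interface used for data queries.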
…and beyond!!
● Native parallel batch indexing (phase 2)
● Automatic compaction (phase 2)
● Smaller, faster compression (FastPFOR, etc)
● Faster quantiles: Fixed-bin histograms, moments sketches
● Dynamic prioritization
● Simpler, self-configuring deployment
● … your item here!!
Try this at home
Download
Druid community site (current): http://druid.io/
Druid community site (new): https://druid.apache.org/
Imply distribution: https://imply.io/get-started
Contribute
https://github.com/apache/druid
Stay in touch
@druidio
http://druid.io/community
Q&A
