1 © Hortonworks Inc. 2011–2018. All rights reserved
An Introduction to Druid
Nishant Bangarwa
Software Developer
Agenda
History and Motivation
Introduction
Data Storage Format
Druid Architecture – Indexing and Querying Data
Druid In Production
Recent Improvements
HISTORY
• Druid was open sourced in late 2012
• Initial use case
• Powering an ad-tech analytics product
• Requirements
• Query any combination of metrics and dimensions
• Scalability: trillions of events/day
• Real-time: data freshness
• Streaming ingestion
• Interactive: low-latency queries
How big is the initial use case?
MOTIVATION
• Business Intelligence queries
• Arbitrary slicing and dicing of data
• Interactive real-time visualizations on complex data streams
• Answer BI questions
• How many unique male visitors visited my website last month?
• How many products were sold last quarter, broken down by demographic and product category?
• Not interested in dumping the entire dataset
Introduction
What is Druid?
• Column-oriented distributed datastore
• Sub-second query times
• Real-time streaming ingestion
• Arbitrary slicing and dicing of data
• Automatic data summarization
• Approximate algorithms (HyperLogLog, theta sketches)
• Scalable to petabytes of data
• Highly available
Companies Using Druid
Druid Architecture
Node Types
• Realtime Nodes
• Historical Nodes
• Broker Nodes
• Coordinator Nodes
Druid Architecture
(diagram) Streaming data is ingested by Realtime Index Tasks, which periodically hand off completed segments to Historical Nodes; batch data is indexed directly into segments served by Historical Nodes; Broker Nodes sit in front of both to serve queries.
Druid Architecture
(diagram) Queries enter through Broker Nodes, which fan out to Realtime Index Tasks and Historical Nodes; Coordinator Nodes use the Metadata Store and ZooKeeper to manage segment assignment and the handoff of segments from realtime tasks to Historical Nodes.
Storage Format
Druid: Segments
• Data in Druid is stored in Segment Files.
• Partitioned by time
• Ideally, segment files are each smaller than 1GB.
• If files are large, smaller time partitions are needed.
Time →
Segment 1: Monday | Segment 2: Tuesday | Segment 3: Wednesday | Segment 4: Thursday | Segments 5_1, 5_2: Friday
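The time partitioning above can be sketched in a few lines — a minimal illustration (not Druid's actual segment-creation code) that buckets ISO-8601 events into one segment per calendar day:

```python
from collections import defaultdict
from datetime import datetime

def assign_segments(timestamps):
    """Bucket ISO-8601 event timestamps into one group per calendar day,
    mirroring how Druid partitions segment files by a time interval."""
    segments = defaultdict(list)
    for ts in timestamps:
        day = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").date().isoformat()
        segments[day].append(ts)
    return dict(segments)

events = ["2011-01-01T00:01:35Z", "2011-01-01T00:05:35Z", "2011-01-02T00:08:35Z"]
daily_segments = assign_segments(events)  # two segments: Jan 1 and Jan 2
```

If a day's bucket grows too large, a real deployment would switch to a finer granularity (e.g. hourly) or shard the interval, as the 5_1/5_2 Friday split above suggests.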
Example Wikipedia Edit Dataset
timestamp page language city country … added deleted
2011-01-01T00:01:35Z Justin Bieber en SF USA 10 65
2011-01-01T00:03:63Z Justin Bieber en SF USA 15 62
2011-01-01T00:04:51Z Justin Bieber en SF USA 32 45
2011-01-01T00:05:35Z Ke$ha en Calgary CA 17 87
2011-01-01T00:06:41Z Ke$ha en Calgary CA 43 99
2011-01-02T00:08:35Z Selena Gomes en Calgary CA 12 53
Timestamp Dimensions Metrics
Data Rollup
timestamp page language city country … added deleted
2011-01-01T00:01:35Z Justin Bieber en SF USA 10 65
2011-01-01T00:03:63Z Justin Bieber en SF USA 15 62
2011-01-01T00:04:51Z Justin Bieber en SF USA 32 45
2011-01-01T00:05:35Z Ke$ha en Calgary CA 17 87
2011-01-01T00:06:41Z Ke$ha en Calgary CA 43 99
2011-01-02T00:08:35Z Selena Gomes en Calgary CA 12 53
timestamp page language city country count sum_added sum_deleted min_added max_added ….
2011-01-01T00:00:00Z Justin Bieber en SF USA 3 57 172 10 32
2011-01-01T00:00:00Z Ke$ha en Calgary CA 2 60 186 17 43
2011-01-02T00:00:00Z Selena Gomes en Calgary CA 1 12 53 12 12
Rollup by hour
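The hourly rollup shown above can be sketched as a plain group-by — a hypothetical illustration of the idea, not Druid's ingestion code (only the `added` metric is aggregated here to keep it short):

```python
def rollup(rows):
    """Aggregate raw (timestamp, page, added) rows to hourly granularity,
    keyed by (hour-truncated timestamp, page); computes count/sum/min/max."""
    out = {}
    for ts, page, added in rows:
        hour = ts[:13] + ":00:00Z"  # truncate ISO-8601 timestamp to the hour
        key = (hour, page)
        if key not in out:
            out[key] = {"count": 0, "sum_added": 0, "min_added": added, "max_added": added}
        agg = out[key]
        agg["count"] += 1
        agg["sum_added"] += added
        agg["min_added"] = min(agg["min_added"], added)
        agg["max_added"] = max(agg["max_added"], added)
    return out

rows = [
    ("2011-01-01T00:01:35Z", "Justin Bieber", 10),
    ("2011-01-01T00:03:53Z", "Justin Bieber", 15),
    ("2011-01-01T00:04:51Z", "Justin Bieber", 32),
    ("2011-01-01T00:05:35Z", "Ke$ha", 17),
    ("2011-01-01T00:06:41Z", "Ke$ha", 43),
]
hourly = rollup(rows)  # reproduces the count/sum/min/max columns in the table
```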
Dictionary Encoding
• Create and store integer IDs for each value
• e.g. page column
⬢ Values - Justin Bieber, Ke$ha, Selena Gomes
⬢ Encoding - Justin Bieber : 0, Ke$ha: 1, Selena Gomes: 2
⬢ Column Data - [0 0 0 1 1 2]
• city column - [0 0 0 1 1 1]
timestamp page language city country … added deleted
2011-01-01T00:01:35Z Justin Bieber en SF USA 10 65
2011-01-01T00:03:63Z Justin Bieber en SF USA 15 62
2011-01-01T00:04:51Z Justin Bieber en SF USA 32 45
2011-01-01T00:05:35Z Ke$ha en Calgary CA 17 87
2011-01-01T00:06:41Z Ke$ha en Calgary CA 43 99
2011-01-02T00:08:35Z Selena Gomes en Calgary CA 12 53
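The dictionary encoding above can be reproduced with a few lines of Python — a sketch of the technique, not Druid's internal implementation:

```python
def dictionary_encode(values):
    """Assign each distinct value an integer ID in order of first
    appearance; return (value -> ID dictionary, encoded column)."""
    ids = {}
    encoded = []
    for v in values:
        if v not in ids:
            ids[v] = len(ids)
        encoded.append(ids[v])
    return ids, encoded

pages = ["Justin Bieber"] * 3 + ["Ke$ha"] * 2 + ["Selena Gomes"]
ids, column = dictionary_encode(pages)
# ids    == {"Justin Bieber": 0, "Ke$ha": 1, "Selena Gomes": 2}
# column == [0, 0, 0, 1, 1, 2], matching the slide
```

The encoded column stores small integers instead of repeated strings, which both shrinks the column and makes the bitmap indexing on the next slide cheap to build.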
Bitmap Indices
• Store Bitmap Indices for each value
⬢ Justin Bieber -> [0, 1, 2] -> [1 1 1 0 0 0]
⬢ Ke$ha -> [3, 4] -> [0 0 0 1 1 0]
⬢ Selena Gomes -> [5] -> [0 0 0 0 0 1]
• Queries
⬢ Justin Bieber or Ke$ha -> [1 1 1 0 0 0] OR [0 0 0 1 1 0] -> [1 1 1 1 1 0]
⬢ language = en and country = CA -> [1 1 1 1 1 1] AND [0 0 0 1 1 1] -> [0 0 0 1 1 1]
• Indexes compressed with Concise or Roaring encoding
timestamp page language city country … added deleted
2011-01-01T00:01:35Z Justin Bieber en SF USA 10 65
2011-01-01T00:03:63Z Justin Bieber en SF USA 15 62
2011-01-01T00:04:51Z Justin Bieber en SF USA 32 45
2011-01-01T00:01:35Z Ke$ha en Calgary CA 17 87
2011-01-01T00:01:35Z Ke$ha en Calgary CA 43 99
2011-01-01T00:01:35Z Selena Gomes en Calgary CA 12 53
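The bitmap construction and the OR/AND query evaluation above can be sketched as follows — an uncompressed toy version (real indexes use Concise or Roaring compression):

```python
def bitmap_index(column):
    """Build one bitmap (list of 0/1, one bit per row) per distinct value."""
    n = len(column)
    index = {}
    for i, v in enumerate(column):
        index.setdefault(v, [0] * n)[i] = 1
    return index

def bitmap_or(a, b):
    return [x | y for x, y in zip(a, b)]

def bitmap_and(a, b):
    return [x & y for x, y in zip(a, b)]

pages = ["Justin Bieber"] * 3 + ["Ke$ha"] * 2 + ["Selena Gomes"]
cities = ["SF"] * 3 + ["Calgary"] * 3
page_idx = bitmap_index(pages)
city_idx = bitmap_index(cities)

# page = 'Justin Bieber' OR page = 'Ke$ha'
either = bitmap_or(page_idx["Justin Bieber"], page_idx["Ke$ha"])
# language = en (all six rows) AND city = Calgary (stand-in for country = CA)
en_and_ca = bitmap_and([1, 1, 1, 1, 1, 1], city_idx["Calgary"])
```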
Approximate Sketch Columns
timestamp page userid language city country … added deleted
2011-01-01T00:01:35Z Justin Bieber user1111111 en SF USA 10 65
2011-01-01T00:03:63Z Justin Bieber user1111111 en SF USA 15 62
2011-01-01T00:04:51Z Justin Bieber user2222222 en SF USA 32 45
2011-01-01T00:05:35Z Ke$ha user3333333 en Calgary CA 17 87
2011-01-01T00:06:41Z Ke$ha user4444444 en Calgary CA 43 99
2011-01-02T00:08:35Z Selena Gomes user1111111 en Calgary CA 12 53
timestamp page language city country count sum_added sum_deleted min_added userid_sketch ….
2011-01-01T00:00:00Z Justin Bieber en SF USA 3 57 172 10 {sketch}
2011-01-01T00:00:00Z Ke$ha en Calgary CA 2 60 186 17 {sketch}
2011-01-02T00:00:00Z Selena Gomes en Calgary CA 1 12 53 12 {sketch}
Rollup by hour
Approximate Sketch Columns
• Better rollup for high-cardinality columns, e.g. userid
• Reduced storage size
• Use Cases
• Fast approximate distinct counts
• Approximate histograms
• Funnel/retention analysis
• Limitations
• Exact counts are not possible
• Cannot filter on individual row values
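The key property that makes sketch columns work with rollup is that sketches merge. The toy sketch below models each hour's "sketch" as an exact Python set purely to illustrate that property; production systems use fixed-size HyperLogLog or theta sketches instead, trading exactness for constant memory:

```python
# Per-hour "sketches" modeled as exact sets -- an assumption made only for
# illustration; a real sketch (HLL/theta) merges the same way but is bounded
# in size and gives approximate answers.
hour1_users = {"user1111111", "user2222222"}
hour2_users = {"user3333333", "user4444444"}
hour3_users = {"user1111111"}  # repeat visitor

def merge(*sketches):
    """Union per-interval sketches so distinct counts over any combination
    of intervals can be answered without rescanning raw rows."""
    out = set()
    for s in sketches:
        out |= s
    return out

distinct_all_hours = len(merge(hour1_users, hour2_users, hour3_users))
```

Note what is lost: the merged sketch answers "how many distinct users?" but cannot recover whether a *specific* userid appeared, which is exactly the filtering limitation listed above.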
Indexing Data
Indexing Service
• Indexing is performed by
• Overlord
• Middle Managers
• Peons
• Middle Managers spawn peons, which run ingestion tasks
• Each peon runs a single task
• A task definition specifies which task to run and its properties
Streaming Ingestion : Realtime Index Tasks
• Ability to ingest streams of data
• Stores data in a write-optimized structure
• Periodically converts the write-optimized structure to read-optimized segments
• Events are queryable as soon as they are ingested
• Both push- and pull-based ingestion
Streaming Ingestion : Tranquility
• Helper library for coordinating streaming ingestion
• Simple API to send events to Druid
• Transparently manages
• Realtime index task creation
• Partitioning and replication
• Schema evolution
• Can be used with your favourite ETL framework, e.g. Flink, NiFi, Samza, Spark, Storm
• At-least-once ingestion
Kafka Indexing Service (experimental)
• Supports exactly-once ingestion
• Messages are pulled by Kafka Index Tasks
• Each Kafka Index Task consumes from a set of partitions with specific start and end offsets
• Each message is verified to ensure ordering
• Kafka offsets and the corresponding segments are persisted atomically in the same metadata transaction
• Kafka Supervisor
• Embedded inside the overlord
• Manages Kafka index tasks
• Retries failed tasks
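The atomic offset-plus-segment commit is what makes the exactly-once guarantee work. The sketch below is a conceptual illustration using SQLite — the table names and schema are invented for the example and are not Druid's actual metadata schema — showing that either both the new offset and the segment are recorded, or neither is:

```python
import sqlite3

# Hypothetical metadata store: one table for consumer offsets, one for
# published segments. Committing both rows in ONE transaction means a task
# restarted from the stored offset can never double-count or drop events.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE offsets (part INTEGER PRIMARY KEY, next_offset INTEGER)")
db.execute("CREATE TABLE segments (segment_id TEXT PRIMARY KEY)")

def commit_segment(part, next_offset, segment_id):
    """Persist the new offset and the segment it covers atomically."""
    with db:  # single transaction: both writes succeed, or both roll back
        db.execute("INSERT OR REPLACE INTO offsets VALUES (?, ?)", (part, next_offset))
        db.execute("INSERT INTO segments VALUES (?)", (segment_id,))

commit_segment(0, 1000, "wikipedia_2011-01-01_partition0")
```

If the segment insert fails (e.g. a duplicate publish after a crash), the offset update rolls back with it, so the task resumes from the last consistent state.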
Batch Ingestion
• HadoopIndexTask
• Peon launches Hadoop MR job
• Mappers read data
• Reducers create Druid segment files
• Index Task
• Runs in a single JVM, i.e. the peon
• Suitable for small data sizes (< 1 GB)
• Integrations with Apache Hive and Spark for batch ingestion
Querying Data
Querying Data from Druid
• Druid supports
• JSON Queries over HTTP
• Built-in SQL (experimental)
• Querying libraries available for
• Python
• R
• Ruby
• JavaScript
• Clojure
• PHP
• Multiple Open source UI tools
JSON Over HTTP
• HTTP REST API
• Queries and results expressed in JSON
• Multiple Query Types
• Time Boundary
• Timeseries
• TopN
• GroupBy
• Select
• Segment Metadata
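As a concrete illustration, a timeseries query over the example dataset might look like the body below. This is a sketch: the datasource name and interval are assumptions based on the earlier Wikipedia example, not taken from a real cluster:

```python
import json

# Illustrative Druid timeseries query body; "wikipedia", the interval, and
# the column names are assumptions from the example dataset in this deck.
timeseries_query = {
    "queryType": "timeseries",
    "dataSource": "wikipedia",
    "granularity": "hour",
    "intervals": ["2011-01-01/2011-01-03"],
    "filter": {"type": "selector", "dimension": "language", "value": "en"},
    "aggregations": [
        {"type": "longSum", "name": "added", "fieldName": "added"},
        {"type": "longSum", "name": "deleted", "fieldName": "deleted"},
    ],
}
body = json.dumps(timeseries_query)  # POSTed to a broker over HTTP
```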
Built-in SQL (experimental)
• Apache Calcite-based parser and planner
• Ability to connect Druid to any BI tool that supports JDBC
• SQL via JSON over HTTP
• Supports approximate queries
• APPROX_COUNT_DISTINCT(col)
• Ability to do fast approximate TopN queries
• APPROX_QUANTILE(column, probability)
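Putting the two approximate functions together, a "SQL via JSON over HTTP" request body could look like the sketch below; the table and column names come from the example dataset in this deck and are illustrative, not from a real deployment:

```python
import json

# Hypothetical query: top pages by approximate unique users, plus an
# approximate 95th percentile of the `added` metric.
sql = """
SELECT page,
       APPROX_COUNT_DISTINCT(userid) AS unique_users,
       APPROX_QUANTILE(added, 0.95) AS p95_added
FROM wikipedia
GROUP BY page
ORDER BY unique_users DESC
LIMIT 10
"""
# The SQL text is wrapped in a small JSON envelope and POSTed to the broker.
request_body = json.dumps({"query": sql})
```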
Integrated with multiple Open Source UI tools
• Superset
• Developed at Airbnb
• In Apache incubation since May 2017
• Grafana – Druid plugin
• Metabase
• With built-in SQL, connect with any BI tool supporting JDBC
Druid in Production
Druid in Production
• Is Druid suitable for my use case?
• Will Druid meet my performance requirements at scale?
• How complex is it to operate and manage a Druid cluster?
• How do I monitor a Druid cluster?
• High availability?
• How do I upgrade a Druid cluster without downtime?
• Security?
Suitable Use Cases
• Powering interactive, user-facing applications
• Arbitrary slicing and dicing of large datasets
• User behavior analysis
• measuring distinct counts
• retention analysis
• funnel analysis
• A/B testing
• Exploratory analytics / root-cause analysis
• Not interested in dumping the entire dataset
Performance and Scalability : Fast Facts
• Most events per day: 300 billion events/day (Metamarkets)
• Most computed metrics: 1 billion metrics/min (Jolata)
• Largest cluster: 200 nodes (Metamarkets)
• Largest hourly ingestion: 2 TB per hour (Netflix)
Performance Numbers
• Query latency
• average – 500 ms
• 90%ile < 1 sec
• 95%ile < 5 sec
• 99%ile < 10 sec
• Query volume
• 1000s of queries per minute
• Benchmarking code
• https://github.com/druid-io/druid-benchmark
Simplified Druid Cluster Management with Ambari
• Install, configure and manage Druid and all external dependencies from Ambari
• Easy to enable HA, security, monitoring, …
Simplified Druid Cluster Management with Ambari
Monitoring a Druid Cluster
• Each Druid Node emits metrics for
• Query performance
• Ingestion Rate
• JVM Health
• Query Cache performance
• System health
• Emitted as JSON objects to a runtime log file or over HTTP to other services
• Emitters available for Ambari Metrics Server, Graphite, StatsD, Kafka
• Easy to implement your own metrics emitter
Monitoring using Ambari Metrics Server
• HDP 2.6.1 contains pre-defined Grafana dashboards for
• Health of Druid nodes
• Ingestion
• Query performance
• Easy to create new dashboards and set up alerts
• Auto-configured when both Druid and Ambari Metrics Server are installed
Monitoring using Ambari Metrics Server
Monitoring using Ambari Metrics Server
High Availability
• Deploy Coordinator/Overlord on multiple instances
• Leader election via ZooKeeper
• Broker – install multiple brokers
• Use the Druid Router or any load balancer to route queries to brokers
• Realtime Index Tasks – create redundant tasks
• Historical Nodes – create a load rule with replication factor >= 2 (default = 2)
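As an illustration of the last bullet, a "load forever" rule replicating every segment twice on the default tier could be expressed roughly as below. Treat this as a sketch: the field names follow Druid's documented rule format to the best of my knowledge, not copy-paste configuration for a specific version:

```python
import json

# Hypothetical retention/load rule: keep all segments loaded forever with
# two replicas on the default historical tier.
load_rule = {
    "type": "loadForever",
    "tieredReplicants": {"_default_tier": 2},
}
rule_json = json.dumps([load_rule])  # rules are submitted as a JSON list
```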
Rolling Upgrades
• Shared-nothing architecture
⬢ Maintain backwards compatibility
⬢ Data redundancy
• Upgrade one Druid component at a time
⬢ No downtime
Security
• Supports authentication via Kerberos/SPNEGO
• Easy wizard-based Kerberos security enablement via Ambari
(diagram) 1. The user's browser obtains a ticket from the KDC server via kinit; 2. the token is presented to Druid.
Summary
• Easy installation and management via Ambari
• Real-time
• Ingestion latency: seconds or less
• Query latency: seconds or less
• Arbitrarily slice and dice big data like a ninja
• No more pre-canned drill-downs
• Query with more fine-grained granularity
• High availability and rolling-upgrade capabilities
• Secure and production-ready
• Vibrant and active community
• Available as Tech Preview in HDP 2.6.1
Useful Resources
• Druid website – http://druid.io
• Druid User Group – users@druid.incubator.apache.org
• Druid Dev Group – dev@druid.incubator.apache.org
Thank you
Twitter - @NishantBangarwa
Email - nbangarwa@hortonworks.com
Questions?
Extending Core Druid
• Plugin-based architecture
• Leverages Guice to load extensions at runtime
• Extension points
• Add a new deep storage implementation
• Add a new firehose for ingestion
• Add aggregators
• Add complex metrics
• Add new query types
• Add new Jersey resources
• Bundle your extension with all the other Druid extensions
Performance : Approximate Algorithms
• Ability to store approximate data sketches for high-cardinality columns, e.g. userid
• Reduced storage size
• Use Cases
• Fast approximate distinct counts
• Approximate Top-K queries
• Approximate histograms
• Funnel/retention analysis
• Limitations
• Exact counts are not possible
• Cannot filter on individual row values
Superset
• Python backend
• Flask-AppBuilder
• Authentication
• Pandas for rich analytics
• SQLAlchemy as the SQL toolkit
• JavaScript frontend
• React, NVD3
• Deep integration with Druid
Superset Rich Dashboarding Capabilities: Treemaps
Superset Rich Dashboarding Capabilities: Sunburst
Superset UI Provides Powerful Visualizations
Rich library of dashboard visualizations:
Basic:
• Bar Charts
• Pie Charts
• Line Charts
Advanced:
• Sankey Diagrams
• Treemaps
• Sunburst
• Heatmaps
And More!


Editor's Notes

  • #3 Motivation; Druid introduction and use case; demo; Druid architecture; storage internals; recent improvements.
  • #4 Initial use case: powering the ad-tech analytics product at Metamarkets — similar to the picture on the right, a dashboard where you can visualize timeseries data and do arbitrary filtering and grouping on any combination of dimensions. Requirements: the data store needs to support arbitrary queries, i.e. users should be able to filter and group on any combination of dimensions. Scalability: it should be able to handle trillions of events/day. Interactive: since the data store was going to power an interactive dashboard, low-latency queries were a must. Real-time: the time between when an event occurs and when it is visible on the dashboard should be minimal (on the order of a few seconds). High availability: no central point of failure. Rolling upgrades: the architecture was required to support rolling upgrades.
  • #6 MOTIVATION Interactive real-time visualizations on complex data streams. Answer BI questions: How many unique male visitors visited my website last month? How many products were sold last quarter, broken down by a demographic and product category? Not interested in dumping the entire dataset. Suppose I am running an ad campaign and I want to understand what kind of impressions there are, what my click-through rate is, and how many users decided to purchase my services. We have a user activity stream and we may want to know how the users are behaving. We may have a stream of firewall events and we want to detect any anomalies in those streams in real time. Also, for very large distributed clusters there is a need to answer questions about application performance: How is each individual node in my cluster behaving? Are there any anomalies in query response time? All the above use cases can have data streams which are huge in volume, depending on the scale of the business. How do I analyze this information? How do I get insights from these streams of events in real time?
  • #8 What is Druid? Column-oriented distributed datastore - data is stored in columnar format; many datasets have a large number of dimensions, e.g. 100s or 1000s, but most queries only need 5-10 columns, and the column-oriented format lets Druid scan only the required columns. Sub-second query times - it uses techniques like bitmap indexes for fast filtering, memory-mapped files to serve data from memory, data summarization and compression, and query caching, and has highly optimized algorithms for different query types, so it is able to achieve sub-second query times. Realtime streaming ingestion from almost any ETL pipeline. Arbitrary slicing and dicing of data - no need to create pre-canned drill-downs. Automatic data summarization - during ingestion it can summarize your data; e.g. if my dashboard only shows events aggregated by HOUR, we can optionally configure Druid to pre-aggregate at ingestion time. Approximate algorithms (HyperLogLog, theta sketches) for fast approximate answers. Scalable to petabytes of data. Highly available.
  • #9 This shows some of the production users. I can talk about some of the large ones which have common use cases. Alibaba and eBay use Druid for e-commerce and user behavior analytics. Cisco has a realtime analytics product for analyzing network flows. Yahoo uses Druid for user behavior analytics and realtime cluster monitoring. Hulu does interactive analysis of user and application behavior. PayPal and SK Telecom use Druid for business analytics.
  • #11 Realtime Nodes - handle real-time ingestion; support both pull- and push-based ingestion; store data in a row-oriented, write-optimized structure; periodically convert the write-optimized structure to a read-optimized structure; able to serve queries as soon as data is ingested. Historical Nodes - the main workhorses of a Druid cluster; use memory-mapped files to load columnar data; respond to user queries. Broker Nodes - keep track of which node is serving which portion of the data; scatter queries across multiple historical and realtime nodes; caching layer.
  • #12 Druid has the concept of different node types, where each node is designed and optimized to perform a specific set of tasks. Realtime Index Tasks / Realtime Nodes - handle real-time ingestion, supporting both pull- and push-based ingestion. Handle queries - able to serve queries as soon as data is ingested. Store data in a write-optimized structure on heap, periodically convert it to read-optimized, time-partitioned immutable segments, and persist them to deep storage. In case you need to do any ETL, like data enrichment or joining multiple streams of data, you can do it in a separate ETL pipeline and send the massaged data to Druid. Deep storage can be any distributed FS and acts as a permanent backup of the data. Historical Nodes - the main workhorses of a Druid cluster; use memory-mapped files to load immutable segments; respond to user queries. Now let's see how data can be queried. Broker Nodes - keep track of the data chunks loaded by each node in the cluster; scatter queries across multiple historical and realtime nodes; caching layer. Now let's discuss another case: when you do not have streaming data but want to ingest batch data into Druid. Batch ingestion can be done using either a Hadoop MR or Spark job, which converts your data into time-partitioned segments and persists them to deep storage.
  • #13 With many historical nodes in a cluster there is a need to balance the load across them. This is done by the Coordinator Nodes - use ZooKeeper for coordination; ask historical nodes to load or drop data; also move data across historical nodes to balance load in the cluster; manage data replication. External dependencies - metadata storage: for storing metadata about the segments, i.e. the location of segments, information on how to load the segments, etc. Memcached/Redis cache: you can optionally add a memcached or Redis cache which can be used to cache partial query results.
  • #15 Druid: Segments. Data in Druid is stored in segment files, partitioned by time. Ideally, segment files are each smaller than 1 GB. If files are larger, smaller time partitions are needed.
  • #16 Example Wikipedia Edit Dataset
  • #17 Data Rollup - rollup by hour.
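The hourly rollup described above can be sketched in a few lines of Python: raw events are pre-aggregated at ingestion time by truncating the timestamp to the hour and summing the metric per unique (hour, dimensions) combination. The event rows here are made up for illustration.

```python
# Toy illustration of Druid-style rollup at ingestion time.
from collections import defaultdict
from datetime import datetime

raw_events = [
    # (timestamp, page, city, count_added)
    ("2018-01-01T01:05:00", "Justin Bieber", "SF", 10),
    ("2018-01-01T01:20:00", "Justin Bieber", "SF", 5),
    ("2018-01-01T01:45:00", "Ke$ha", "LA", 7),
    ("2018-01-01T02:10:00", "Ke$ha", "LA", 2),
]

rollup = defaultdict(int)
for ts, page, city, added in raw_events:
    # truncate the timestamp to the hour: this is the rollup granularity
    hour = datetime.fromisoformat(ts).strftime("%Y-%m-%dT%H:00:00")
    rollup[(hour, page, city)] += added

for key, total in sorted(rollup.items()):
    print(key, total)
# 4 raw rows collapse into 3 stored rows; sub-hour detail is lost.
```

This is the storage-for-precision trade Druid makes: fewer stored rows and faster scans, at the cost of no longer being able to query below the rollup granularity.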
  • #18 Dictionary Encoding Create and store ids for each value, e.g. for the page column: Values - Justin Bieber, Ke$ha, Selena Gomez. Encoding - Justin Bieber: 0, Ke$ha: 1, Selena Gomez: 2. Column data - [0 0 0 1 1 2]. city column - [0 0 0 1 1 1]
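The dictionary encoding above is easy to show as a runnable sketch: each distinct string gets an integer id in order of first appearance, and the column is stored as an array of ids.

```python
# Sketch of dictionary encoding for a string column.
def dictionary_encode(column):
    ids = {}       # value -> integer id
    encoded = []   # the column, as ids
    for value in column:
        if value not in ids:
            ids[value] = len(ids)  # assign ids in order of first appearance
        encoded.append(ids[value])
    return ids, encoded


page = ["Justin Bieber", "Justin Bieber", "Justin Bieber",
        "Ke$ha", "Ke$ha", "Selena Gomez"]
ids, encoded = dictionary_encode(page)
print(ids)       # {'Justin Bieber': 0, 'Ke$ha': 1, 'Selena Gomez': 2}
print(encoded)   # [0, 0, 0, 1, 1, 2]
```

Storing small integers instead of repeated strings both shrinks the column and makes it compress well.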
  • #19 Bitmap Indices Store bitmap indices for each value: Justin Bieber -> rows [0, 1, 2] -> [1 1 1 0 0 0]; Ke$ha -> rows [3, 4] -> [0 0 0 1 1 0]; Selena Gomez -> row [5] -> [0 0 0 0 0 1]. Queries: Justin Bieber OR Ke$ha -> [1 1 1 0 0 0] OR [0 0 0 1 1 0] -> [1 1 1 1 1 0]; language = en AND country = CA -> [1 1 1 1 1 1] AND [0 0 0 1 1 1] -> [0 0 0 1 1 1]. Indexes are compressed with Concise or Roaring encoding.
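The same bitmap example, as a runnable sketch: one bit array per distinct value, so boolean filters become bitwise OR/AND over the bitmaps. (Real segments store these compressed with Concise or Roaring; plain lists are used here for clarity.)

```python
# Bitmap index sketch: one bit array per distinct column value.
page = ["Justin Bieber"] * 3 + ["Ke$ha"] * 2 + ["Selena Gomez"]


def build_bitmaps(column):
    bitmaps = {}
    for row, value in enumerate(column):
        bitmaps.setdefault(value, [0] * len(column))[row] = 1
    return bitmaps


bitmaps = build_bitmaps(page)
print(bitmaps["Justin Bieber"])  # [1, 1, 1, 0, 0, 0]

# filter: page = 'Justin Bieber' OR page = 'Ke$ha'
bieber_or_kesha = [a | b for a, b in zip(bitmaps["Justin Bieber"],
                                         bitmaps["Ke$ha"])]
print(bieber_or_kesha)           # [1, 1, 1, 1, 1, 0]
```

The resulting bitmap tells the query engine exactly which rows to scan, which is why arbitrary filter combinations stay fast.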
  • #20 Data Rollup - rollup by hour.
  • #23 The Indexing Service is a highly-available, distributed service that runs indexing-related tasks. It is composed of three main components: Overlord - responsible for accepting tasks, coordinating task distribution, creating locks around tasks, and returning statuses to callers. Middle Managers - worker nodes that execute submitted tasks; they launch peons that actually run the tasks. Peons - managed by middle managers, each runs a single task; a peon gets a task definition, which is a JSON spec file that describes the task to perform. All the coordination and communication for task assignment and announcing task statuses is done via ZooKeeper.
  • #24 Streaming Ingestion Done by realtime index tasks. Ability to ingest streams of data. Stores data in a write-optimized structure - a row-oriented key-value store indexed by time and dimension values. Periodically, based on either a time interval or a threshold on the number of rows, it converts the write-optimized structure to read-optimized segments. Events are queryable as soon as they are ingested. Both push- and pull-based ingestion.
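The write-optimized-to-read-optimized flow above can be sketched as a toy in Python: events accumulate in a row-oriented in-memory buffer, and once a row-count threshold is hit the buffer is pivoted into an immutable, column-oriented "segment". The threshold and event shape are invented for illustration.

```python
# Toy ingestion flow: row buffer -> columnar segment on a row threshold.
MAX_ROWS = 3          # hypothetical flush threshold

buffer, segments = [], []


def ingest(event):
    buffer.append(event)                       # write-optimized: append a row
    if len(buffer) >= MAX_ROWS:
        # read-optimized: pivot rows into immutable columns
        columns = {k: tuple(row[k] for row in buffer) for k in buffer[0]}
        segments.append(columns)
        buffer.clear()


for i in range(4):
    ingest({"ts": i, "page": "Ke$ha"})

print(segments[0]["ts"])   # (0, 1, 2)  -- first three rows, now columnar
print(len(buffer))         # 1          -- fourth row still in the row buffer
```

In Druid, a query at this point would consult both the in-memory rows and the persisted segments, which is how events are queryable immediately after ingestion.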
  • #25 Tranquility is a helper library for Druid which provides easy coordination and task management for streaming ingestion into Druid. It has a very simple API which you can use to send events to Druid. On the right-hand side you can see a simple example of sending an event to Druid. You just create a Tranquilizer with a config; the config contains the location of the Druid overlord, the name of your datasource, and other ingestion-related properties. Simply call send on the tranquilizer, and it automatically takes care of creating a Druid task, managing the lifecycle of the task, discovering the location of the task, and sending data to that task.
  • #26 We have also added experimental support for ingesting data from Kafka that supports exactly-once consumption of data. Kafka works as follows: each message written to Kafka is placed into an ordered and immutable sequence called a partition and is assigned a sequentially incrementing identifier called an offset. Messages are pulled by Druid tasks, which verify the offsets to ensure ordering. Then, at the time of persisting the data, both the segments and the information about the Kafka offsets are persisted in a single transaction. Since we have the offsets in the metadata, in case of failure we can resume reading from that offset.
  • #27 Batch Ingestion - ingest data in batch. HadoopIndexTask: a peon launches a Hadoop MR job; mappers read the data; reducers create Druid segment files. IndexTask: suitable for small data sizes (<1 GB).
  • #30 Druid broker nodes expose HTTP endpoints where users can post queries. Queries and results are expressed in JSON. Multiple query types. On the right is an example of a groupBy query. In the JSON query you can specify the datasource; the granularity - the time bucket by which you want to group your data; any filter you may want to use; the list of aggregations to perform; and any post-aggregations, like averages.
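A groupBy query of the kind described above, built as a Python dict and serialized to JSON. The datasource, filter values, and aggregator names here are illustrative, not from the original deck.

```python
# Build a Druid groupBy query body as JSON.
import json

query = {
    "queryType": "groupBy",
    "dataSource": "wikipedia",            # hypothetical datasource
    "granularity": "hour",                # bucket results by hour
    "dimensions": ["page"],
    "filter": {"type": "selector", "dimension": "country", "value": "US"},
    "aggregations": [
        {"type": "longSum", "name": "edits", "fieldName": "count"}
    ],
    "intervals": ["2018-01-01/2018-01-02"],
}

body = json.dumps(query, indent=2)
# POST `body` to the broker's query endpoint; results come back as JSON rows.
print(body)
```

Every knob the note mentions maps to a field in this body: filter, granularity, the aggregation list, and (if needed) a `postAggregations` list for derived values like averages.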
  • #31 The second and easier way to query Druid is using SQL (built-in SQL support is experimental at present). We leverage Apache Calcite for parsing and planning the query. It also uses Avatica, which is a framework for building JDBC drivers for databases. Using this, you can connect any BI tool that supports JDBC to Druid. Druid also defines some new operators for supporting approximate queries.
  • #35 Retention analysis
  • #36 Most events per day: 300 billion events/day (Metamarkets). Most computed metrics: 1 billion metrics/min (Jolata). Largest cluster: 200 nodes (Metamarkets). Largest hourly ingestion: 2 TB per hour (Netflix).
  • #37 Query latency: average ~500 ms; 90th percentile < 1 s; 95th percentile < 5 s; 99th percentile < 10 s. Query volume: 1000s of queries per minute.
  • #40 Query performance – query time, segment scan time … Ingestion Rate – events ingested, events persisted … JVM Health – JVM Heap usage, GC stats … Cache Related – cache hits, cache misses, cache evictions … System related – cpu, disk, network, swap usage etc..
  • #45 No Downtime Data redundancy Rolling upgrades
  • #46 You can secure Druid nodes using Kerberos, and use the SPNEGO mechanism to interact with Druid HTTP endpoints.
  • #47 Summary It is easy to install and manage Druid via Ambari. Realtime, with ingestion and query latencies on the order of a few seconds. Arbitrary slicing and dicing of data.
  • #48 Summary It is easy to install and manage Druid via Ambari. Realtime, with ingestion and query latencies on the order of a few seconds. Arbitrary slicing and dicing of data.
  • #51 Guice, which is a lightweight dependency injection framework.