DATA ANALYTICS WITH DRUID
YOU SUN JEONG
WHO AM I?
Senior Software Engineer at SK Telecom
Commercial Products
Big Data Discovery Solution (~’16)
Hadoop DW (~’15)
PaaS (Cloud Foundry) (~’13)
IaaS (OpenStack) (~’13)
Mail to: jerryjung@apache.org
FOOTPRINTS
2014
2015
- Hadoop DW
- Realtime NW Analytics
2016
- Big Data Discovery
- Streaming Processing
AGENDA
‣ History
‣ What is Druid?
‣ Druid Architecture
‣ Real-Time Ingestion Demo (15m)
‣ Cohort Analysis (15m)
HISTORY
▸ Development started at Metamarkets in 2011
▸ Relicensed under the Apache License 2.0 in early 2015
▸ 150+ contributors today
▸ https://github.com/druid-io
DATA LAKE
Source: https://www.linkedin.com/pulse/more-analytics-than-just-fishing-data-lake-john-poppelaars
DW VS DATA LAKE
Source: http://www.kdnuggets.com/2015/09/data-lake-vs-data-warehouse-key-differences.html
WHAT IS DRUID
Distributed, in-memory, multi-dimensional OLAP store
PROBLEMS
timestamp domain user gender clicked
2011-01-01T00:01:35Z bieber.com 4312345532 Female 1
2011-01-01T00:03:03Z bieber.com 3484920241 Female 0
2011-01-01T00:04:51Z ultra.com 9530174728 Male 1
2011-01-01T00:05:33Z ultra.com 4098310573 Male 1
2011-01-01T00:05:53Z ultra.com 5832057930 Female 0
2011-01-01T00:06:17Z ultra.com 5789283478 Female 1
2011-01-01T00:23:15Z bieber.com 4730093842 Female 0
2011-01-01T00:38:51Z ultra.com 3909846810 Male 1
2011-01-01T00:49:33Z bieber.com 4930097162 Female 1
2011-01-01T00:49:53Z ultra.com 0381837193 Female 0
timestamp impressions clicks
2011-01-01T00:00:00Z 10 6
timestamp domain user gender clicked
2011-01-01T00:01:35Z bieber.com 4312345532 Female 1
2011-01-01T00:03:03Z bieber.com 3484920241 Female 0
2011-01-01T00:04:51Z ultra.com 9530174728 Male 1
2011-01-01T00:05:33Z ultra.com 4098310573 Male 1
2011-01-01T00:05:53Z ultra.com 5832057930 Female 0
2011-01-01T00:06:17Z ultra.com 5789283478 Female 1
2011-01-01T00:23:15Z bieber.com 4730093842 Female 0
2011-01-01T00:38:51Z ultra.com 9530174728 Male 1
2011-01-01T00:49:33Z bieber.com 4930097162 Female 1
2011-01-01T00:49:53Z ultra.com 0381837193 Female 0
timestamp domain gender impressions clicks
2011-01-01T00:00:00Z bieber.com Female 4 2
2011-01-01T00:00:00Z ultra.com Female 3 1
2011-01-01T00:00:00Z ultra.com Male 3 3
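The tables above show raw click events being rolled up into hourly summaries, first with no dimensions and then by domain and gender. A minimal sketch of that aggregation in plain Python follows. It is not Druid code; the names `EVENTS`, `truncate_to_hour`, and `roll_up` are made up for illustration, and the rows are copied from the first raw table.

```python
from collections import defaultdict

# Rows from the first raw table: (timestamp, domain, user, gender, clicked).
EVENTS = [
    ("2011-01-01T00:01:35Z", "bieber.com", "4312345532", "Female", 1),
    ("2011-01-01T00:03:03Z", "bieber.com", "3484920241", "Female", 0),
    ("2011-01-01T00:04:51Z", "ultra.com", "9530174728", "Male", 1),
    ("2011-01-01T00:05:33Z", "ultra.com", "4098310573", "Male", 1),
    ("2011-01-01T00:05:53Z", "ultra.com", "5832057930", "Female", 0),
    ("2011-01-01T00:06:17Z", "ultra.com", "5789283478", "Female", 1),
    ("2011-01-01T00:23:15Z", "bieber.com", "4730093842", "Female", 0),
    ("2011-01-01T00:38:51Z", "ultra.com", "3909846810", "Male", 1),
    ("2011-01-01T00:49:33Z", "bieber.com", "4930097162", "Female", 1),
    ("2011-01-01T00:49:53Z", "ultra.com", "0381837193", "Female", 0),
]

# Position of each dimension inside an event tuple.
COLUMN = {"domain": 1, "gender": 3}


def truncate_to_hour(ts):
    """Truncate an ISO-8601 UTC timestamp string to the start of its hour."""
    return ts[:13] + ":00:00Z"


def roll_up(events, dimensions=()):
    """Aggregate events per (hour, *dimensions) into (impressions, clicks)."""
    totals = defaultdict(lambda: [0, 0])
    for event in events:
        key = (truncate_to_hour(event[0]),) + tuple(
            event[COLUMN[d]] for d in dimensions
        )
        totals[key][0] += 1         # each raw row is one impression
        totals[key][1] += event[4]  # sum of the clicked flag
    return {key: tuple(value) for key, value in totals.items()}


if __name__ == "__main__":
    print(roll_up(EVENTS))                        # one row per hour
    print(roll_up(EVENTS, ("domain", "gender")))  # one row per hour, domain, gender
```

The per-user detail is gone after roll-up, which is the trade-off: fewer rows to store and scan, at the cost of no longer being able to query individual events.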
BIG DATA DISCOVERY
▸ Roll-up
  ▸ Summarizing over a dimension
▸ Drill-down
  ▸ Focusing (zooming in)
▸ Slicing and dicing
  ▸ Reducing dimensions (slice)
  ▸ Picking values of specific dimensions (dice)
▸ Pivoting
  ▸ Rotating the multi-dimensional cube
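The operations listed above can be illustrated on a tiny two-dimensional cube. The sketch below is plain Python rather than any Druid API; the function names are invented for illustration, and the cube holds the impression counts by domain and gender from the earlier example.

```python
from collections import defaultdict

# A tiny cube keyed by (domain, gender) with impressions as the measure.
CUBE = {
    ("bieber.com", "Female"): 4,
    ("ultra.com", "Female"): 3,
    ("ultra.com", "Male"): 3,
}


def roll_up(cube, axis):
    """Summarize over one dimension: drop that axis and sum the measure."""
    out = defaultdict(int)
    for key, value in cube.items():
        out[key[:axis] + key[axis + 1:]] += value
    return dict(out)


def slice_(cube, axis, value):
    """Fix one dimension to a single value, which removes that dimension."""
    return {
        key[:axis] + key[axis + 1:]: v
        for key, v in cube.items()
        if key[axis] == value
    }


def dice(cube, allowed):
    """Keep only cells whose coordinates are in the allowed set for each axis."""
    return {
        key: v
        for key, v in cube.items()
        if all(key[axis] in values for axis, values in allowed.items())
    }


def pivot(cube):
    """Rotate a two-dimensional cube by swapping its two axes."""
    return {(b, a): v for (a, b), v in cube.items()}


if __name__ == "__main__":
    print(roll_up(CUBE, axis=1))                      # per-domain totals
    print(slice_(CUBE, axis=1, value="Female"))       # Female only, gender axis removed
    print(dice(CUBE, {0: {"ultra.com"}, 1: {"Male"}}))
    print(pivot(CUBE))                                # keyed by (gender, domain)
```

Drill-down is the reverse of `roll_up`: going from the per-domain totals back to the full `CUBE`, which is only possible if the finer-grained data was kept.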
OLAP CUBE
▸ Slice and Dice
IN-MEMORY
COLUMNAR STORAGE
DRUID TERMS
▸ Data
  ▸ Timestamp
  ▸ Dimension
  ▸ Metric
▸ Datasource
▸ Segment
▸ Granularity
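To make these terms concrete, the sketch below maps them onto the click-event example. It is a hypothetical Python model, not Druid's ingestion spec format: the `Datasource` class, its field names, and `segment_start` are all invented for illustration. It assumes a segment covers one time chunk whose size is set by the segment granularity.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Datasource:
    """Hypothetical stand-in for a datasource definition (not Druid's format)."""
    name: str                 # datasource: roughly a table
    timestamp_column: str     # every row carries a timestamp
    dimensions: tuple         # columns to filter and group by
    metrics: tuple            # columns to aggregate
    segment_granularity: str  # "hour" or "day" in this sketch


def segment_start(ds, row):
    """Return the start of the time chunk (segment interval) a row falls into."""
    if ds.segment_granularity not in ("hour", "day"):
        raise ValueError("unsupported granularity: " + ds.segment_granularity)
    ts = datetime.strptime(row[ds.timestamp_column], "%Y-%m-%dT%H:%M:%SZ")
    ts = ts.replace(tzinfo=timezone.utc, minute=0, second=0)
    if ds.segment_granularity == "day":
        ts = ts.replace(hour=0)
    return ts.isoformat()


# The click example expressed in these terms.
CLICKS = Datasource(
    name="clicks",
    timestamp_column="timestamp",
    dimensions=("domain", "gender"),
    metrics=("impressions", "clicks"),
    segment_granularity="hour",
)

if __name__ == "__main__":
    # Hypothetical row; the hour differs from the slide data to show bucketing.
    row = {"timestamp": "2011-01-01T13:38:51Z", "domain": "ultra.com", "gender": "Male"}
    print(segment_start(CLICKS, row))
```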
DRUID ARCHITECTURE
[Diagram: Realtime, Broker, and Historical nodes]
ARCHITECTURE - BATCH INGESTION
[Diagram: HDFS, three Historical nodes, and a Broker node; labels: Segments, Queries]
ARCHITECTURE - STREAMING INGESTION
[Diagram: a Realtime node, three Historical nodes, and a Broker node; labels: Streaming, Segments, Queries]
ARCHITECTURE - LAMBDA
[Diagram: a Realtime node, HDFS, three Historical nodes, and a Broker node; labels: Streaming, Segments, Queries]
GLUE ARCHITECTURE
[Diagram: a stream processor (Tranquility), the Kafka Indexing Service, a real-time task, three Historical nodes, and a Broker node; labels: Streaming, Segments, Queries]
REAL WORLD ARCHITECTURE
[Diagram components: HAProxy; Broker nodes (×2); Memcached #1 to #3; Historical nodes #1 to #N; Overlord; MiddleManagers #1 to #N; Coordinator; MySQL; Data nodes #1 to #N; ZooKeeper ZK1 to ZK3]
DRUID MONITORING
Source: http://www.slideshare.net/CharlesAllen9/programmatic-bidding-data-streams-druid
DRUID DATASOURCE
RDRUID
https://github.com/druid-io/RDruid
PYDRUID
https://github.com/druid-io/pydruid
DEMO
▸ Jupyter Notebook (PyDruid)
▸ Mobile app user events for 1 week: 2 billion events
▸ Scenario: unique users, cohort analysis
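The demo scenario, unique users and cohort analysis, can be sketched on a handful of events in plain Python. This is not the notebook from the talk and does not use PyDruid; the events below are made up for illustration, and the function names are hypothetical. It assumes a cohort is defined by the day a user is first seen.

```python
from collections import defaultdict
from datetime import date

# Made-up sample events (user_id, event_date); not data from the demo.
EVENTS = [
    ("u1", date(2016, 1, 1)),
    ("u2", date(2016, 1, 1)),
    ("u1", date(2016, 1, 1)),  # repeat event, must not be double counted
    ("u1", date(2016, 1, 2)),
    ("u3", date(2016, 1, 2)),
    ("u1", date(2016, 1, 3)),
    ("u3", date(2016, 1, 3)),
]


def unique_users_per_day(events):
    """Count distinct users for each day."""
    users = defaultdict(set)
    for user, day in events:
        users[day].add(user)
    return {day: len(seen) for day, seen in sorted(users.items())}


def cohort_retention(events):
    """Map (cohort_day, days_since_first_seen) to the number of active users."""
    first_seen = {}
    for user, day in sorted(events, key=lambda e: e[1]):
        first_seen.setdefault(user, day)  # earliest day wins
    active = defaultdict(set)
    for user, day in events:
        cohort = first_seen[user]
        active[(cohort, (day - cohort).days)].add(user)
    return {key: len(users) for key, users in sorted(active.items())}


if __name__ == "__main__":
    print(unique_users_per_day(EVENTS))
    print(cohort_retention(EVENTS))
```

Exact sets like these work for a toy example. At the scale mentioned above, billions of events, keeping every user ID in memory is usually impractical, and approximate distinct-count aggregation is the more common choice.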
MAY THE FORCE BE WITH YOU
REFERENCES
▸ Druid
  : http://www.popit.kr/tag/druid/ (https://www.facebook.com/popitkr/)
  : http://druid.io/
▸ Cohort Analysis
  : http://www.gregreda.com/2015/08/23/cohort-analysis-with-python/
▸ Druid Meetup@Seoul
  : http://www.meetup.com/Druid-Seoul/
POPIT
https://www.facebook.com/popitkr/
Q&A
THANK YOU
