Web analytics at scale
with Druid at naver.com
Jason Heo (analytic.js.heo@navercorp.com)
Doo Yong Kim (dooyong.kim@navercorp.com)
• Part 1
• About naver.com
• What is & Why Druid
• The Architecture of our service
• Part 2
• Druid Segment File Structure
• Spark Druid Connector
• TopN Query
• Plywood & Split-Apply-Combine
• How to fix TopN’s unstable results
• Appendix
Agenda
About naver.com
https://en.wikipedia.org/wiki/Naver
• naver.com
• The biggest website in South Korea
• The Google of South Korea
• 74.7% of all web searches in South Korea
• Developed Analytics Systems at Naver
• Working with Databases since 2000
• Author of 3 MySQL books
• Currently Elasticsearch, Spark, Kudu,
and Druid
• Working on Spark and Druid-based OLAP
platform
• Implemented search infrastructure at
coupang.com
• Have been interested in MPP and advanced file
formats for big data
Jason Heo Doo Yong Kim
About Speakers
Platforms we've tested so far (storage formats and query engines)
Parquet, ORC, Carbon Data, Elasticsearch, ClickHouse, Kudu, Druid,
SparkSQL, Hive, Impala, Drill, Presto, Kylin, Phoenix
• What is Druid?
• Our Requirements
• Why Druid?
• Experimental Results
What is & Why Druid
• Column-oriented distributed datastore
• Real-time streaming ingestion
• Scalable to petabytes of data
• Approximate algorithms (hyperLogLog, theta sketch)
https://www.slideshare.net/HadoopSummit/scalable-realtime-analytics-using-druid
From HORTONWORKS
What is Druid?
From my point of view
• Druid is a cumbersome version of Elasticsearch (w/o search feature)
• Similar points
• Secondary Index
• DSLs for query
• Flow of Query Processing
• Terms Aggregation ↔	TopN Query, Coordinator ↔	Broker, Data Node ↔	Historical
• Different points
• more complicated to operate
• better with much more data
• better for Ultra High Cardinality
• less GC overhead
• better for Spark Connectivity (for Full Scan)
What is Druid?
What is Druid? - Architecture
• Ingestion: Kafka → Real-time Node; Index Service (Overlord, Middle Manager)
• Segment management: Coordinator
• Query Service: Clients send Druid DSL to the Broker, which fans out to Historicals and Real-time Nodes
• Metadata: MySQL
• Cluster mgmt.: Zookeeper
• Deep Storage (HDFS, S3): stores Druid segments for durability; Historicals download segments to serve queries
Real-time Node / Historical / Broker
{
  "queryType": "groupBy",
  "dataSource": "sample_data",
  "dimensions": ["country", "device"],
  "filter": {},
  "aggregations": [...],
  "limitSpec": {...}
}
{
  "queryType": "topN",
  "dataSource": "sample_data",
  "dimension": "sample_dim",
  "filter": {...},
  "aggregations": [...],
  "threshold": 5
}
SELECT ... FROM dataSource
What is Druid? - Queries
• SQLs can be converted to Druid DSL
• No JOIN
SELECT COUNT(*)
FROM logs
WHERE url = ?;
1. Random Access
(OLTP)
SELECT url,
COUNT(*)
FROM logs
GROUP BY url
ORDER BY COUNT(*)
DESC
LIMIT 10;
2. Most Viewed
SELECT visitor,
COUNT(*)
FROM logs
GROUP BY visitor;
3. Full Aggregation
SELECT ...
FROM logs INNER
JOIN users
GROUP BY ...
HAVING ...
4. JOIN
Why Druid? - Requirements
• Supports Bitmap Index
• Fast Random Access
Perfect solution for OLTP and OLAP
For OLTP
• Supports TopN Query
• 100x faster than GroupBy query
• Supports Complex Queries
• JOIN, HAVING, etc
• with our Spark Druid Connector
For OLAP
Why Druid?
1. Random Access ★★★★☆
2. Most Viewed ★★★★★
3. Full Aggregation ★★★★☆
4. JOIN ★★★★☆
• Fast Random Access
• Terms Aggregation
• TopN Query
• Easy to manage
Pros
Cons
• Slow full scan with es-hadoop
• Low Performance for multi-field terms aggregation
(esp. High Cardinality)
• GC Overhead
Comparison – Elasticsearch
1. Random Access ★★★★★
2. Most Viewed ★★★☆☆
3. Full Aggregation ☆☆☆☆☆
4. JOIN ☆☆☆☆☆
• Fast Random Access via Primary Key
• Fast OLAP with Impala
Pros
• No Secondary Index
• No TopN Query
Cons
Comparison – Kudu + Impala
1. Random Access ★★★★★ (PK) / ★☆☆☆☆ (non-PK)
2. Most Viewed ☆☆☆☆☆
3. Full Aggregation ★★★★★
4. JOIN ★★★★★
Experimental Results – Response Time (sec, lower is better)
Random Access: Elasticsearch 0.003, Kudu+Impala 0.14, Druid 0.03
Most Viewed, 1 Field: Elasticsearch 0.25, Kudu+Impala 0.35, Druid 0.08
Most Viewed, 2 Fields: Elasticsearch 2.7, Kudu+Impala 2.9, Druid 0.78
Experimental Results – Notes
• ES: Lucene Index
• Kudu+Impala: Primary Key
• Druid: Bitmap Index
Random Access
• ES: Terms Aggregation
• Kudu+Impala: Group By
• Druid: TopN
• Split-Apply-Combine for Multi Fields
Most Viewed
• 210 mil. rows
• same parallelism
• same number of shards/partitions/segments
Data Sets
The Architecture of our service
• Ingestion: logs flow through Kafka; the Kafka Indexing Service handles Real-time Ingestion, while a daily batch job ingests Parquet (logs are transformed and duplicated logs removed along the way), after which real-time segments are switched to the batch-ingested ones
• Druid cluster: Coordinator, Overlord, Middle Manager, Peon, Broker, Historical
• Query: Zeppelin and an API Server go through Plywood, which generates Druid DSL for the Broker; SparkSQL goes through the Spark Thrift Server to Spark Executors that read segment files directly on the Historicals
Introduction – Who am I?
1. Doo Yong Kim
2. Naver
3. Software engineer
4. Big data
Contents
1. Druid Storage Model
2. Spark Druid Connector Implementation
3. TopN Query
4. Plywood & Split-Apply-Combine
5. Extending Druid Query
Druid Storage Model – 4 characteristics
• Columnar format
• Explicitly distinguishes between dimensions and metrics
• Bitmap index
• Dictionary encoded
Druid Storage Model - background
Druid treats dimensions and metrics separately.
• Dimension: has a Bitmap Index; used as GROUP BY fields
• Metric: used as the argument of aggregate functions
{
"dimensionsSpec": {
"dimensions": ["country", "device", ...]
},
...
"metricsSpec": [
{ "type": "count", "name": "count" },
{ "type": "doubleSum", "fieldName": "duration", "name": "duration" }
]
}
Druid Ingestion Spec
Druid Storage Model - Dimension
Country (Dimension)
Korea
UK
Korea
Korea
Korea
UK
Korea ↔ 0
UK ↔ 1
Dictionary for country
UK appears in 2nd, 6th rows
Korea → 101110
UK → 010001
Bitmap for Korea
0
1
0
0
0
1
Dictionary Encoded Values
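To make the dictionary and the bitmaps above concrete, here is a minimal sketch (plain Scala, not Druid's actual code) that builds the dictionary, the dictionary-encoded column, and one bitmap per value for the country column:

// Minimal sketch of dictionary encoding + bitmap indexing (not Druid's actual code)
val country = Seq("Korea", "UK", "Korea", "Korea", "Korea", "UK")

// Dictionary: Korea <-> 0, UK <-> 1
val dictionary: Map[String, Int] = country.distinct.sorted.zipWithIndex.toMap

// Dictionary-encoded column: 0 1 0 0 0 1
val encoded: Seq[Int] = country.map(dictionary)

// One bitmap per dictionary value: Korea -> 101110, UK -> 010001
val bitmaps: Map[String, Seq[Int]] =
  dictionary.keys.map(v => v -> country.map(c => if (c == v) 1 else 0)).toMap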
Druid Storage Model - Metric
13
2
15
29
30
14
Country (Dimension) duration (Metric)
Korea 13
UK 2
Korea 15
Korea 29
Korea 30
UK 14
Row
Filter it manually
device LIKE 'Iphone%'
Druid Storage Model
Bitmapcountry Filtering
Bitmapdevice Filtering
duration Filtering
Filter by bitmap
country = 'Korea'
('Korea', 'Iphone 6s', 13)
SELECT country, device, duration
FROM logs
WHERE country = 'Korea'
AND device LIKE 'Iphone%'
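Continuing the sketch above (plain Scala with toy data, not Druid's real cursor code), the query would roughly resolve like this: the equality predicate is answered from the bitmap, the LIKE predicate is checked manually per row, and duration is only read for the surviving rows:

// Toy data; the country bitmap (101110) comes from the previous sketch
val countryBitmap = Seq(1, 0, 1, 1, 1, 0)                 // bitmap for country = 'Korea'
val device        = Seq("Iphone 6s", "Galaxy S9", "Iphone X", "Iphone 6s", "PC", "Iphone 8")
val duration      = Seq(13, 2, 15, 29, 30, 14)

// device LIKE 'Iphone%' has no precomputed answer, so filter it manually
val deviceMatches = device.map(d => if (d.startsWith("Iphone")) 1 else 0)

// AND the two filters, then read duration only where both match
val rows = countryBitmap.zip(deviceMatches).zipWithIndex.collect {
  case ((1, 1), i) => ("Korea", device(i), duration(i))   // first row: ("Korea", "Iphone 6s", 13)
}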
Spark Druid Connector
Spark Druid Connector
1. 3 Ways to implement, Our implementation
2. What is needed to implement
3. Sample Codes, Performance Test
4. How to implement
Spark Druid Connector - 3 Ways to implement
1st way: the Spark Driver rewrites SQL into DSL and sends it to the Druid Broker
• Good if SQL is rewritable to DSL
• But DSL does not support all SQL
• Ex: JOIN, sub-query
2nd way: Spark Executors send a Select DSL to the Druid Historicals and receive large JSON results
• Easy to implement
• No need to understand Druid Index Library
• Ser/de operation is expensive
• Parallelism is bounded by the number of Historicals
Spark Druid Connector - 3 Ways to implement
3rd way: Spark Executors read Druid segment files directly on the Historical nodes, using the Druid library
• Read Druid segment files directly
• Similar to the way of reading Parquet
• Difficult to implement
• Need to understand Druid segment library
• Allocate Spark executors onto the Historical nodes
We chose this way!
spark.read
.format("com.navercorp.ni.druid.spark.druid")
.option("coordinator", "host1.com:18081")
.option("broker", "host2.com:18082")
.option("datasource", "logs").load()
.createOrReplaceTempView("logs")
Spark Druid Connector – How to use
spark.sql("""
SELECT country, device, duration
FROM logs
WHERE country = 'Korea'
AND device LIKE 'Iphone%'
""").show(false)
Create table Execute Query
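Because the Druid datasource is now just another Spark table, SQL that Druid DSL cannot express (JOIN, HAVING, sub-queries) can run in Spark on top of it. A hedged sketch, assuming a hypothetical users table (e.g. loaded from Parquet) has also been registered as a view:

// Sketch only: "users" and its "grade" column are hypothetical; "logs" is the Druid-backed view created above
spark.sql("""
  SELECT u.grade, COUNT(*) AS pv
  FROM logs l
  JOIN users u ON l.visitor = u.visitor
  GROUP BY u.grade
  HAVING COUNT(*) > 1000
""").show(false)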
Spark Druid Connector - Performance (total 4.4B rows; seconds, lower is better)
Random Access: Spark Druid 0.21, Spark Parquet 7.5
Full Scan & GROUP BY: Spark Druid 24.1, Spark Parquet 7.7
Spark Druid Connector – How to implement
Spark Druid Connector – How to implement
1. Druid Rest API
2. Druid Segment Library
3. Spark Data Source API
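A skeleton of how these three pieces fit together under Spark's Data Source (v1) API; class and package names below are illustrative, not the connector's actual source:

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, Filter, PrunedFilteredScan, RelationProvider}
import org.apache.spark.sql.types.StructType

// Entry point resolved by spark.read.format(...)
class DefaultSource extends RelationProvider {
  override def createRelation(sqlContext: SQLContext,
                              parameters: Map[String, String]): BaseRelation =
    new DruidRelation(parameters)(sqlContext)
}

class DruidRelation(options: Map[String, String])(@transient val sqlContext: SQLContext)
  extends BaseRelation with PrunedFilteredScan {

  // 1. Druid REST API: segmentMetadata response -> Spark schema
  override def schema: StructType = ???

  // 2 + 3. An RDD whose partitions are Druid segments, read with the Druid segment library
  override def buildScan(requiredColumns: Array[String],
                         filters: Array[Filter]): RDD[Row] = ???
}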
Spark Druid Connector – Get table schema
Spark
Driver
Druid
Broker
{
"queryType": "segmentMetaData",
"dataSource": "logs",
"merge": "true"
}
{
  "columns": {
    "__time": {...},
    "country": {...},
    "device": {...},
    "duration": {...},
    ...
  }
}
spark.read
.format("...")
.option("coordinator", "...")
.option("broker", "...")
.option("datasource", "logs")
.load()
Schema
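A rough sketch of turning the segmentMetadata columns into a Spark schema; the type mapping below is assumed, not the connector's exact code:

import org.apache.spark.sql.types._

// druidColumns: (column name, Druid type string) pairs parsed from the segmentMetadata response
def toSparkSchema(druidColumns: Seq[(String, String)]): StructType =
  StructType(druidColumns.map { case (name, druidType) =>
    val sparkType = druidType match {
      case "LONG"   => LongType
      case "FLOAT"  => FloatType
      case "DOUBLE" => DoubleType
      case _        => StringType   // dimensions are strings
    }
    StructField(name, sparkType, nullable = true)
  })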
Spark Druid Connector – Partition pruning
WHERE country = 'Korea'
AND __time = CAST('2018-05-23' AS TIMESTAMP)
Segments can be pruned
by interval condition and single dimension
partition
1. Interval condition
serverview returns only matched segments
2. Single dimension partition
compare start and end with given filter
Spark
Driver
Druid
Coordinator
GET /.../logs/intervals/2018-05-23/serverview
[
  {
    "segment": {
      "shardSpec": {
        "dimension": "country",
        "start": null, "end": "b", ...
      },
      "id": "segmentId"
    },
    "servers": [
      {"host": "host1"},
      {"host": "host2"}
    ]
  },
  { "segment": ...},
  ...
]
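A minimal sketch of the second pruning step, assuming the serverview response has been parsed into small case classes (the real connector's types differ):

// Simplified model of a single-dimension shardSpec; None means unbounded
case class ShardSpec(dimension: String, start: Option[String], end: Option[String])
case class SegmentInfo(id: String, shardSpec: ShardSpec, hosts: Seq[String])

// Keep a segment only if the equality filter value can fall inside its [start, end) range
def mayContain(spec: ShardSpec, dim: String, value: String): Boolean =
  spec.dimension != dim ||
    (spec.start.forall(_ <= value) && spec.end.forall(value < _))

def prune(segments: Seq[SegmentInfo]): Seq[SegmentInfo] =
  segments.filter(s => mayContain(s.shardSpec, "country", "Korea"))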
Spark Druid Connector – Spark filters to Druid filters
WHERE country = 'Korea'
AND city = 'Seoul'
buildScan(requiredColumns: [country, device, duration],
filters: [EqualTo(country, Korea), EqualTo(city, Seoul)])
Spark's filters are converted into Druid's DimFilter
private def toDruidDimFilters(sparkFilter: Filter): DimFilter = {
  sparkFilter match {
    ...
    case EqualTo(attribute, value) =>
      new SelectorDimFilter(
        attribute,
        value.toString,
        null
      )
    case GreaterThan(attribute, value) => ...
  }
}
Spark Druid Connector – Attach locality to RACK_LOCAL
• getPreferredLocations(partition: Partition)
• Returns Hosts having Druid Segments
• Caution: Spark does not always guarantee that executors launch on preferred locations
• Set spark.locality.wait to a very large value
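A sketch of both points, assuming a custom RDD whose partitions already know which Historicals hold their segment (class names are illustrative):

import org.apache.spark.{Partition, SparkConf, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row

// One partition per Druid segment, remembering the Historicals that serve it
class DruidPartition(val index: Int, val segmentId: String, val hosts: Seq[String]) extends Partition

class DruidRDD(sc: SparkContext, parts: Array[DruidPartition]) extends RDD[Row](sc, Nil) {
  override protected def getPartitions: Array[Partition] = parts.map(p => p: Partition)
  override protected def getPreferredLocations(split: Partition): Seq[String] =
    split.asInstanceOf[DruidPartition].hosts              // hosts holding the segment
  override def compute(split: Partition, context: TaskContext): Iterator[Row] = ???
}

// And make the scheduler wait (practically) forever for a local slot
val conf = new SparkConf().set("spark.locality.wait", "1000000s")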
Spark Druid Connector - How to implement
Done!
Now Spark executors can read records from Druid segment files.
Segment
File
Spark Druid
Connector
Spark
TopN Query
TopN Query
1. How TopN Query works
2. Performance
3. Limitation
TopN Query – We heavily use TopN query
TopN Query flow (N=100):
1. Each Historical node returns its local top 100 results (served from its Segment Cache)
2. The Broker merges each node's results and makes the final records
3. The client gets the merged result
TopN Query - Example
Top 3 country ORDER BY SUM(duration)

Top 3 of Historical a:
country SUM(duration)
korea 114
uk 47
usa 21

Top 3 of Historical b:
country SUM(duration)
uk 67
korea 24
usa 3

Top 3 of Historical c:
country SUM(duration)
korea 87
uk 57
china 33

Broker (merged):
country SUM(duration)
korea 225
uk 171
china 33
usa 24

Top 3 Result:
country SUM(duration)
korea 225
uk 171
china 33
TopN – is an approximate approach

Historical a (all countries):
country SUM(duration)
korea 114
uk 47
usa 21
china 17

Historical b (all countries):
country SUM(duration)
uk 67
korea 24
usa 3
china 1

Historical c (all countries):
country SUM(duration)
korea 87
uk 57
usa 22
china 33

Merged Top 3:
country SUM(duration)
korea 225
uk 171
china 33
Missing! usa never reaches the final result and china is undercounted (true totals: usa 46, china 51), because each Historical only returned its local top 3.
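The loss above is easy to reproduce: merge only the per-node local top 3s and usa's contribution from Historical c never arrives. A toy sketch of the Broker-side merge (plain Scala, not Druid code):

// Local top 3 per Historical, taken from the tables above (rows outside each local top 3 are already gone)
val localTop3 = Seq(
  Map("korea" -> 114L, "uk" -> 47L, "usa" -> 21L),   // Historical a
  Map("uk" -> 67L, "korea" -> 24L, "usa" -> 3L),     // Historical b
  Map("korea" -> 87L, "uk" -> 57L, "china" -> 33L)   // Historical c
)

val merged = localTop3.flatten.groupBy(_._1).map { case (c, kvs) => c -> kvs.map(_._2).sum }
val top3   = merged.toSeq.sortBy(-_._2).take(3)
// => (korea,225), (uk,171), (china,33): usa sums to only 24 here and drops out,
//    even though its true total across all rows is 46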
TopN – 100x faster than GroupBy
GroupBy (few minutes) vs TopN (1,536 ms): 100x faster!
rank  GroupBy metric  TopN metric
1     1,948,297       1,948,297
2     1,404,167       1,404,167
3     1,383,538       1,383,538
4     1,141,977       1,141,977
5     1,099,028       1,090,277
6     1,090,277       1,079,242
7     1,051,448       1,051,448
8     996,961         996,961
9     941,284         941,284
10    937,078         937,078
1. rank changed: rank 5 → rank 6
2. value changed: 1,099,028 → 1,079,242
TopN – Limitations
1. TopN only has one dimension.
2. Unstable results when the replication factor is 2 or larger.
Plywood
1. Plywood
2. Split-Apply-Combine
3. Our Improvement
1. https://www.jstatsoft.org/article/view/v040i01/v40i01.pdf
2. http://plywood.imply.io/index
// Split [ country, city, device ]
ply()
.apply(dataSource, $(dataSource).filter(...)) // Filter1
.apply(dataSource, $(dataSource).filter(...)) // Filter2
.apply(dataSource, $(dataSource).filter(...)) // Filter3
.apply('country', $(dataSource).split(...)
.apply(...) // Filter to Split1 (country)
.apply('city', $(dataSource).split(...)
.apply(...) // Filter to Split2 (city)
.apply(...) // Filter to Split2 (city)
.apply('device', $(dataSource).split(...)
.apply(...) // Filter to Split3 (device)
)
)
)
SELECT country, city, device
FROM $TABLE
WHERE …
GROUP BY country, city, device
≒
Split Apply Combine - SAC
Plywood tuning: Before / After
Tuning Results: Throughput (qps, higher is better), Before vs After
Stable TopN - Motivation
Challenge: the same query can return different results when the replication factor is 2 or more.
Example: Seg_1, Seg_2, Seg_3 are each replicated on Historical 1 and Historical 2. On the first run the Broker merges TopN(Seg_1 + Seg_2) from one Historical with TopN(Seg_3) from the other; on the second run it merges TopN(Seg_1) with TopN(Seg_2 + Seg_3). First Result != Second Result: results can be different.
The by_segment patch: bypass the Historical-side TopN merge and let the Broker merge the TopN results for each segment, keyed by segment ID.
With the patch, both runs effectively compute TopN(Seg_1) + TopN(Seg_2) + TopN(Seg_3), no matter which Historical serves each replica, so First Result == Second Result: always identical.
Navis @ SK Telecom, Ens @ Naver
Special Thanks
Thank you!
Appendix
• 10 Broker Nodes
• 40 Historical Nodes
• 2 MiddleManager & Overlord Nodes
• 2 Coordinator Nodes
• 10 Yarn & HDFS Nodes for Batch Ingestion
• Spark Standalone Cluster runs on Historical Nodes
• for Locality
Druid Deploy & Configuration (1)
• Druid version : 0.11
• H/W Spec for Broker & Historical
• CPU: 40 cores (w/ hyperthread)
• RAM: 128GB
• HDD: SSD w/ RAID 5
• Memory Configuration
Configuration Value for Broker Value for Historical
-Xmx 20GB 12GB
-XX:MaxDirectMemorySize 30GB 45GB
druid.processing.numMergeBuffers 10 20
druid.processing.numThreads 20 30
druid.processing.buffer.sizeBytes 512MB 800MB
druid.cache.sizeInBytes 0 5GB
druid.server.http.numThreads 40 40
Druid Deploy & Configuration (2)
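For reference, the Historical column above corresponds roughly to the following jvm.config flags and runtime.properties entries (a sketch; paths and the remaining properties are omitted):

-Xmx12g -XX:MaxDirectMemorySize=45g

druid.processing.numMergeBuffers=20
druid.processing.numThreads=30
druid.processing.buffer.sizeBytes=838860800
druid.cache.sizeInBytes=5368709120
druid.server.http.numThreads=40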
Use Yarn External Resource for Batch Ingestion
"tuningConfig": {
"type": "hadoop",
"jobProperties": {
"yarn.resourcemanager.hostname" : "host1.com",
"yarn.resourcemanager.address" : "host1.com:8032",
"yarn.resourcemanager.scheduler.address": "host1.com:8030",
"yarn.resourcemanager.webapp.address": "host1.com:8088",
"yarn.resourcemanager.resource-tracker.address": "host1.com:8031",
"yarn.resourcemanager.admin.address": "host1.com:8033"
}
}
Ingest Spec for External Yarn and HDFS
Use External HDFS for intermediate MR output
"tuningConfig": {
"type": "hadoop",
"jobProperties": {
"fs.defaultFS": "hdfs://DEFAULT_FS:8020",
"dfs.namenode.http-address": "NAMENODE:50070",
"dfs.namenode.https-address": "NAMENODE:50470",
"dfs.namenode.servicerpc-address": "NAMENODE:8022"
}
}
Ingest Spec for External Yarn and HDFS
Lambda Architecture with Two Databases
https://en.wikipedia.org/wiki/Lambda_architecture
Lambda Architecture with Druid
https://www.slideshare.net/gianmerlino/druid-at-sf-big-analytics-2015-1201
Why Druid? – Simple Lambda Architecture
How: Kafka Indexing Service
Druid on CDH: https://github.com/knoguchi/cm-druid
Extending Druid Query
1. Accumulated Metric in TopN
2. Stable TopN Result
Extending Druid Query
(Flow: Client → Broker → Historical. Inside the Historical, a Cursor produces a row stream that feeds Aggregation. Today the client has to send a Query and a Second Query and gets two separate Results back.)
Extending Druid Query - Motivation
2 queries are needed to make the following table:
1. A TopN query for the top 3 countries
2. An aggregation query for the total duration
Country SUM(duration) Ratio over total duration
korea 225 20%
uk 171 15.2%
usa 33 2.9%
Can we do it at once?
Extending Druid Query - Background
Yes we can!
Just do TopN operation and SUM operation simultaneously!
Segment Data:
country duration
korea 100
korea 14
uk 40
uk 7
usa 21
china 17

Aggregated in map structure:
country SUM(duration)
korea 114
uk 47
usa 21
china 17

Final records:
country SUM(duration)
korea 114
uk 47
usa 21

Total duration equals the sum of all the map's metric values!
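In other words, once the segment rows are aggregated into the map, the accumulated total is just the sum of the map's values, so the TopN result and the total come out of one pass. A toy illustration (plain Scala, not the actual patch):

// The per-country map built while scanning the segment (values from the tables above)
val aggregated = Map("korea" -> 114L, "uk" -> 47L, "usa" -> 21L, "china" -> 17L)

val topN  = aggregated.toSeq.sortBy(-_._2).take(3)   // final records: korea, uk, usa
val total = aggregated.values.sum                    // 199, returned alongside as the accumulated metric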
{
"queryType": "topN",
...
"metric": "edits",
"accMetrics": ["edits"],
...
}
{
...
"edits": 33,
"__acc_edits": 1234
...
}
User Request
Druid Response
Extending Druid Query in TopN
(Flow: Broker → Historical → Cursor → row stream → TopN Aggregation / TopN Queue, plus a Count Metric accumulator.)
We customized Druid to calculate the total edits and the TopN metric at once!
Huge intermediate files with MapReduce
• Druid's default Batch Ingestion uses MapReduce
• To ingest a 1.4GB Parquet file (Single Dim. Partition):
• Read: 16.6GB
• Write: 20.5GB
• Total: 41.1GB
Druid Spark Batch
We modified the original Druid Spark Batch
• https://github.com/metamx/druid-spark-batch
• The original version of Druid Spark Batch is from Metamarkets (creator of Druid)
• We added some features:
• Parquet input
• Single Dimension Partition
• Query Granularity
• Same ingest spec as Druid MapReduce Batch
Druid Spark Batch
Druid Spark Batch
• Disk Read + Write (GB, lower is better): MapReduce 37.1, Spark 7
• Ingest time, Single Dim Partition, 3 Segments of 430MB each (seconds, lower is better): MapReduce 759, Spark 2260
• Ingest time, Single Dim Partition, 11 Segments of 135MB each (seconds, lower is better): MapReduce 333, Spark 376