Benchmarking Apache Druid
July 16, 2020
Matt Sarrel (matt.sarrel@imply.io)
Developer Evangelist
Agenda:
1. Intro
2. Why Benchmark?
3. Star Schema Benchmark
4. What We Did
5. DIY Druid Benchmarking
Imply Overview
Founded by the creators of Apache Druid
Funded by Tier 1 investors
Trusted by innovative enterprises
Best-in-class revenue growth
41x ARR growth in 3 years
Leading contributor to Druid
Open core
Imply’s open engine, Druid, is becoming a standard part of modern data infrastructure.
Druid
● Next generation analytics engine
● Widely adopted
Workflow transformation
● Subsecond speed unlocks new workflows
● Self-service explanations of data patterns
● Make data fun again
Core Design
● Real-time ingestion
● Flexible schema
● Full text search
● Batch ingestion
● Efficient storage
● Fast analytic queries
● Optimized storage for time-based datasets
● Time-based functions
(Combines characteristics of a search platform, a time series DB, and OLAP)
Key features
● Column oriented
● High concurrency
● Scalable to 1000s of servers, millions of messages/sec
● Continuous, real-time ingest
● Query through SQL
● Target query latency sub-second to a few seconds
Druid in Data Pipeline
(Pipeline diagram: raw data, such as clicks, ad impressions, network telemetry, and application events, flows from data lakes and message buses through staging and processing into the analytics database and on to the end-user application.)
Druid Architecture
Pick your servers
Data Nodes
● Large-ish
● Scales with size of data and query volume
● Lots of cores, lots of memory, fast NVMe disk

Query Nodes
● Medium-ish
● Scales with concurrency and # of Data nodes
● Typically CPU bound

Master Nodes
● Small-ish
● Coordinator scales with # of segments
● Overlord scales with # of supervisors and tasks
Test Configs
Data Nodes: 3 i3.2xlarge (8 CPU / 61GB RAM / 1.9TB NVMe SSD storage)
Query Nodes: 2 m5d.large (2 CPU / 8GB RAM)
Master Nodes: 1 m5.large (2 CPU / 8GB RAM)
Streaming Ingestion
Kafka
● Supervisor type: kafka
● How it works: Druid reads directly from Apache Kafka.
● Can ingest late data? Yes
● Exactly-once guarantees? Yes

Kinesis
● Supervisor type: kinesis
● How it works: Druid reads directly from Amazon Kinesis.
● Can ingest late data? Yes
● Exactly-once guarantees? Yes

Tranquility
● Supervisor type: N/A
● How it works: Tranquility, a library that ships separately from Druid, is used to push data into Druid.
● Can ingest late data? No (late data is dropped based on the windowPeriod config)
● Exactly-once guarantees? No
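As a sketch, Kafka ingestion is started by POSTing a supervisor spec to the Overlord (/druid/indexer/v1/supervisor). The datasource name, topic, columns, and broker address below are illustrative placeholders, not values from the benchmark:

```json
{
  "type": "kafka",
  "spec": {
    "dataSchema": {
      "dataSource": "ssb_stream",
      "timestampSpec": { "column": "timestamp", "format": "iso" },
      "dimensionsSpec": { "dimensions": ["c_nation", "s_nation", "p_category"] },
      "metricsSpec": [
        { "type": "longSum", "name": "lo_revenue", "fieldName": "lo_revenue" }
      ],
      "granularitySpec": { "segmentGranularity": "HOUR", "queryGranularity": "MINUTE" }
    },
    "ioConfig": {
      "topic": "ssb_stream",
      "inputFormat": { "type": "json" },
      "consumerProperties": { "bootstrap.servers": "kafka01:9092" }
    },
    "tuningConfig": { "type": "kafka" }
  }
}
```

Once submitted, the supervisor manages Kafka indexing tasks and offsets on its own, which is what enables the exactly-once guarantee above.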
Batch Ingestion
Native batch (simple)
● Parallel? No. Each task is single-threaded.
● Can append or overwrite? Yes, both.
● File formats: text file formats (CSV, TSV, JSON).
● Rollup modes: perfect if forceGuaranteedRollup = true in the tuningConfig.
● Partitioning options: hash-based partitioning is supported when forceGuaranteedRollup = true in the tuningConfig.

Native batch (parallel)
● Parallel? Yes, if the firehose is splittable and maxNumConcurrentSubTasks > 1 in the tuningConfig. See the firehose documentation for details.
● Can append or overwrite? Yes, both.
● File formats: text file formats (CSV, TSV, JSON).
● Rollup modes: perfect if forceGuaranteedRollup = true in the tuningConfig.
● Partitioning options: hash-based partitioning (when forceGuaranteedRollup = true).

Hadoop-based
● Parallel? Yes, always.
● Can append or overwrite? Overwrite only.
● File formats: any Hadoop InputFormat.
● Rollup modes: always perfect.
● Partitioning options: hash-based or range-based partitioning via partitionsSpec.
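A parallel native batch task spec combining these settings might look like the following sketch; the datasource name, columns, bucket path, and shard count are illustrative placeholders:

```json
{
  "type": "index_parallel",
  "spec": {
    "dataSchema": {
      "dataSource": "ssb_data",
      "timestampSpec": { "column": "lo_orderdate", "format": "auto" },
      "dimensionsSpec": { "dimensions": ["c_nation", "s_nation", "p_category"] },
      "granularitySpec": { "segmentGranularity": "MONTH", "rollup": true }
    },
    "ioConfig": {
      "type": "index_parallel",
      "firehose": { "type": "static-s3", "uris": ["s3://my-bucket/ssb/denormalized.csv"] }
    },
    "tuningConfig": {
      "type": "index_parallel",
      "maxNumConcurrentSubTasks": 4,
      "forceGuaranteedRollup": true,
      "partitionsSpec": { "type": "hashed", "numShards": 8 }
    }
  }
}
```

Note that forceGuaranteedRollup = true requires a hash-based partitionsSpec, per the table above.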
Is Druid Right For My Project?
Data Characteristics
● Timestamp dimension
● Streaming
● Denormalized
● Many attributes (30+ dimensions)
● High cardinality

Use Case Characteristics
● Large dataset
● Fast query response (<1s)
● Low latency data ingestion
● Interactive, ad-hoc queries
● Arbitrary slicing and dicing (OLAP)
● Query real-time & historical data
● Infrequent updates
Long Term Benchmark Plan
● Loosely follow the enterprise digital transformation journey
● Using widely accepted benchmarks, characterize query
performance on batched data
● Using widely accepted data sets and benchmarks, characterize
streaming data ingestion and query performance
● Fully characterize ingestion with respect to timing and storage
● Develop the Streaming OLAP Benchmark the world needs
Druid and Data Warehouses
● Druid is not a DW
● Druid augments DW to provide the following
○ consistent, sub-second SLA
○ pre-aggregation/metrics generation upon ingest
○ simple schema
○ high concurrency reads
● Hot and warm queries in Druid, cold queries in DW
● Druid for internal and external customers, powering realtime
visualization
● DW for internal customers
Confidential. Do not redistribute.
Realtime DW Solution Architecture
(Architecture diagram: events from apps, storage, machines, and managed/unmanaged data centers flow through a stream > parse > search > detect > correlate pipeline into a custom dashboard, with notify, ETL, and ML outputs and block/control/permit/allow/prohibit/custom actions.)
Logical Test Architecture
Star Schema Benchmark
● Designed to evaluate database system performance of star
schema data warehouse queries
● Based on TPC-H
● Widely used since 2007
● Combines standard generated test data with 13 SQL queries
● https://www.cs.umb.edu/~poneil/StarSchemaB.PDF
Star Schema Benchmark Data Generation
● DBGEN utility
● Generates:
○ Fact table: lineorder.tbl
○ Dimension tables: customer.tbl, part.tbl, supplier.tbl, date.tbl
● Scale Factor (SF) determines data volume; here it was set to
generate 600 million rows, or roughly 100GB
SSB ETL and Ingestion
● TBL files are tab delimited
● Generate on EBS, store on S3
● Amazon Athena (Hive-compatible DDL) used to denormalize the 5 files into
one
● Saved in ORC and Parquet formats for flexibility (ORC tested in
Druid)
How data is structured
● Druid stores data in immutable segments
● Column-oriented compressed format
● Dictionary-encoded at column level
● Bitmap index compression: Concise & Roaring
○ Roaring is typically recommended; faster for boolean operations such
as filters
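The bitmap codec is selected in the indexSpec of a task's tuningConfig. A fragment, assuming a native batch task (compression codecs shown are the defaults):

```json
"tuningConfig": {
  "type": "index_parallel",
  "indexSpec": {
    "bitmap": { "type": "roaring" },
    "dimensionCompression": "lz4",
    "metricCompression": "lz4"
  }
}
```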
● Rollup (partial aggregation)
Optimize segment size
Ideally 300-700 MB (~5 million rows)
To control segment size
● Alter segment granularity
● Specify partition spec
● Use Automatic Compaction
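Automatic compaction, for instance, is enabled per datasource through the Coordinator's compaction config API. A sketch, with an illustrative datasource name:

```json
{
  "dataSource": "ssb_data",
  "skipOffsetFromLatest": "P1D",
  "tuningConfig": {
    "partitionsSpec": { "type": "dynamic", "maxRowsPerSegment": 5000000 }
  }
}
```

POSTed to the Coordinator at /druid/coordinator/v1/config/compaction, this lets Druid merge undersized segments in the background.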
Controlling Segment Size
● Segment Granularity: increase if there is only 1 file per segment and it is < 200MB
"segmentGranularity": "HOUR"
● Max Rows Per Segment: increase if a single segment is < 200MB
"maxRowsPerSegment": 5000000
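Both knobs live in the ingestion spec; the fragment below shows where each setting goes, assuming a native batch task:

```json
"granularitySpec": {
  "type": "uniform",
  "segmentGranularity": "HOUR",
  "queryGranularity": "NONE",
  "rollup": true
},
"tuningConfig": {
  "type": "index_parallel",
  "maxRowsPerSegment": 5000000
}
```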
Partitioning beyond time
● Druid always partitions by time
● Decide which dimension to partition on next
● Partition by some dimension you often filter on
● Improves locality, compression, storage size, and query performance
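In a native batch tuningConfig, secondary partitioning on a frequently filtered dimension can be sketched as follows (d_yearmonth is used here because the benchmark dataset was later partitioned on it):

```json
"partitionsSpec": {
  "type": "single_dim",
  "partitionDimension": "d_yearmonth",
  "targetRowsPerSegment": 5000000
}
```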
Ingestion (and the 5 million rows)
Run Rules
We ran JMeter against each platform’s HTTP API under the following conditions:
● Query cache off
● Each SSB query was run 10 times (10 samples per query)
● Each query flight consisted of all 13 SSB queries run in succession
● For each test, Average Response Time, Lowest Response Time, Highest Response Time, and Average Response Time Standard Deviation per query were calculated
● Each test was repeated five times
● The lowest and highest test results were discarded, a standard practice to remove outliers from performance testing results, leaving results from 3 test runs
● The remaining 3 results for each query were averaged to produce the final Average Response Time, Lowest Response Time, Highest Response Time, and Average Response Time Standard Deviation per query
Star Schema Benchmark Queries
● Designed around classic DW use cases
● Select from table exactly once
● Restrictions on dimensions
● Druid supports native and SQL queries
13 Queries in Plain English
Query Flight 1 has restrictions on 1 dimension and measures the revenue increase from eliminating ranges of discounts in given product order quantity intervals shipped in a given year.
Q1.1 has restrictions d_year = 1993, lo_quantity < 25, and lo_discount between 1 and 3.
Q1.2 changes the restrictions of Q1.1 to d_yearmonthnum = 199401, lo_quantity between 26 and 35, and lo_discount between 4 and 6.
Q1.3 changes the restrictions to d_weeknuminyear = 6 and d_year = 1994, lo_quantity between 36 and 40, and lo_discount between 5 and 7.
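In SQL, Q1.1 from the SSB paper reads:

```sql
SELECT SUM(lo_extendedprice * lo_discount) AS revenue
FROM lineorder, date
WHERE lo_orderdate = d_datekey
  AND d_year = 1993
  AND lo_discount BETWEEN 1 AND 3
  AND lo_quantity < 25;
```

The other queries in the flight keep this shape and only swap the predicate constants.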
Query Flight 2 has restrictions on 2 dimensions. The query compares revenues for certain product classes and suppliers in a certain region, grouped by more restrictive product classes and all years of orders.
Q2.1 has restrictions on p_category and s_region.
Q2.2 changes the restrictions of Q2.1 to p_brand1 between 'MFGR#2221' and 'MFGR#2228' and s_region to 'ASIA'.
Q2.3 changes the restrictions to p_brand1 = 'MFGR#2339' and s_region = 'EUROPE'.
Query Flight 3 has restrictions on 3 dimensions. The query retrieves total revenue for lineorder transactions within a given region in a certain time period, grouped by customer nation, supplier nation, and year.
Q3.1 has restrictions c_region = 'ASIA' and s_region = 'ASIA', restricts d_year to a 6-year period, and groups by c_nation, s_nation, and d_year.
Q3.2 changes the region restrictions to c_nation = 'UNITED STATES' and s_nation = 'UNITED STATES', grouping revenue by customer city, supplier city, and year.
Q3.3 changes the restrictions on c_city and s_city to two cities in 'UNITED KINGDOM' and retrieves revenue grouped by c_city, s_city, and d_year.
Q3.4 changes the date restriction to a single month. After partitioning the 12 billion row dataset on d_yearmonth, we needed to rewrite the query for d_yearmonthnum.
Query Flight 4 provides a "what-if" sequence of queries that might be generated in an OLAP style of exploration. Starting with a query with rather weak constraints on three dimension columns, we retrieve aggregate profit, sum(lo_revenue - lo_supplycost), grouped by d_year and c_nation. Successive queries modify predicate constraints by drilling down to find the source of an anomaly.
Q4.1 restricts c_region and s_region both to 'AMERICA', and p_mfgr to one of two possibilities.
Q4.2 follows a typical workflow to dig deeper into the results. We pivot away from grouping by s_nation, restrict d_year to 1997 and 1998, and drill down to group by p_category to see where the profit change arises.
Q4.3 digs deeper, restricting s_nation to 'UNITED STATES' and p_category = 'MFGR#14', drilling down to group by s_city (in the USA) and p_brand1 (within p_category 'MFGR#14').
Query Optimization
● Date! Date! Date! The biggest impacts in optimization came from aligning dates as ingested with anticipated queries.
● Optimize SQL expressions
● Vectorize
Query 4.3 at each optimization stage:

SSB (Original):
select d_year, s_city, p_brand1, sum(lo_revenue - lo_supplycost) as profit
from denormalized
where s_nation = 'UNITED STATES' and (d_year = 1997 or d_year = 1998) and p_category = 'MFGR#14'
group by d_year, s_city, p_brand1
order by d_year, s_city, p_brand1

Apache Druid:
select d_year, s_nation, p_category, sum(lo_revenue) - sum(lo_supplycost) as profit
from ${jmDataSource}
where c_region = 'AMERICA' and s_region = 'AMERICA'
  and (FLOOR("__time" to YEAR) = TIME_PARSE('1997-01-01T00:00:00.000Z')
    or FLOOR("__time" to YEAR) = TIME_PARSE('1998-01-01T00:00:00.000Z'))
  and (p_mfgr = 'MFGR#1' or p_mfgr = 'MFGR#2')
group by d_year, s_nation, p_category
order by d_year, s_nation, p_category
Explain Plan
EXPLAIN PLAN FOR
SELECT d_year, s_city, p_brand1, sum(lo_revenue - lo_supplycost) as profit
FROM ssb_data
WHERE s_nation = 'UNITED STATES' and (d_year = 1997 or d_year = 1998) and p_category = 'MFGR#14'
GROUP BY d_year, s_city, p_brand1
ORDER BY d_year, s_city, p_brand1
JMeter Config
JMeter Queries
Apache Druid SSB Results
Now Go Do It Yourself!
● Spec out your test project thoroughly
● Representative Data
● Representative Queries
● Install a small cluster (Quickstart)
● Ingest and tune
● Query via console for functional testing
● Install JMeter (on query server and locally)
● Run queries against the HTTP API (no GUI, query server)
● Change, rerun, measure differences and learn
● The best way to learn is to just do it!
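Running queries against the HTTP API means POSTing a JSON payload to the query server's SQL endpoint (/druid/v2/sql). A sketch, with a placeholder datasource name:

```json
{
  "query": "SELECT d_year, SUM(lo_revenue) AS revenue FROM ssb_data GROUP BY d_year",
  "resultFormat": "object"
}
```

This is the same request shape JMeter sends in the run rules above, with Content-Type: application/json.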
Resources
● druid.apache.org
● druid.apache.org/community
● ASF #druid Slack channel
● jmeter.apache.org
● https://www.cs.umb.edu/~poneil/StarSchemaB.PDF
● https://github.com/lemire/StarSchemaBenchmark
● https://github.com/implydata/benchmark-tools
