Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Page1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
LLAP: Sub-Second Analytical Queries in Hive
Gopal Vijayaraghavan
Page2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Why LLAP?
• People like Hive
• Disk->Mem is getting further away...
Page3 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
What is LLAP?
• Hybrid model combining daemons and containers fo...
Page4 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Hive 2.0 (+ LLAP)
• Transparent to Hive users, BI tools, etc.
• ...
Page5 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Industry benchmark – 10Tb scale
0
5
10
15
20
25
30
35
40
45
50
q...
Page6 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Evaluation from a customer case study
0
20000
1 3 5 7 9 11 13 15...
Page7 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Evaluation from a customer case study
• Display a large report
E...
Page8 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Cut-awaytodemo
(GIF)
Page9 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
How does LLAP make queries faster?
Page10 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
LLAP
Queue
Technical overview – execution
• LLAP daemon has a n...
Page11 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Executor
Technical overview – IO layer
• Optional: when executi...
Page12 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Parallel queries – priorities, preemption
• Lower-priority frag...
Page13 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
I/OExecution
In-memory processing – present and future
Decoder
...
Page14 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
First query erformance
• Cold LLAP is nearly as fast as
shared ...
Page15 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
JIT Performance – heavy use
• Cache disabled!
13.23
7.93 7.7
6....
Page16 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Parallel query execution – LLAP vs Hive 1.2
3131
2416
1741
1420...
Page17 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Parallel query execution – 10Tb scale
531
291
147
94
75
0
100
2...
Page18 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Performance – cache on HDFS, 1Tb scale
20.2
18.3
17.6
16.2 16.2...
Page19 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
LLAP as a “relational” datanode
Page20 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Example - SparkSQL integration – execution flow
Page 20
HadoopR...
Page21 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Monitoring LLAP Queries
Page22 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Monitoring
• LLAP exposes a UI for
monitoring
• Also has jmx en...
Page23 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Watching queries – Tez UI integration
Page24 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Questions?
?
Interested? Stop by the Hortonworks booth to learn...
Upcoming SlideShare
Loading in …5
×

LLAP: Sub-Second Analytical Queries in Hive

4,421 views

Published on

The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Analytics

Published in: Technology

LLAP: Sub-Second Analytical Queries in Hive

  1. 1. Page1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved LLAP: Sub-Second Analytical Queries in Hive Gopal Vijayaraghavan
  2. 2. Page2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Why LLAP? • People like Hive • Disk->Mem is getting further away – Cloud Storage isn’t co-located – Disks are connected to the CPU via network • Security landscape is changing – Cells & Columns are the new security boundary, not files – Safely masking columns needs a process boundary • Concurrency, Performance & Scale are at conflict – Concurrency at 100k queries/hour – Latencies at 2-5 seconds/query – Petabyte scale warehouses (with terabytes of “hot” data) Node LLAP Process Cache Query Fragment HDFS Query Fragment
  3. 3. Page3 © Hortonworks Inc. 2011 – 2016. All Rights Reserved What is LLAP? • Hybrid model combining daemons and containers for fast, concurrent execution of analytical workloads (e.g. Hive SQL queries) • Concurrent queries without specialized YARN queue setup • Multi-threaded execution of vectorized operator pipelines • Asynchronous IO and efficient in-memory caching • Relational view of the data available thru the API • High performance scans, execution code pushdown • Centralized data security Node LLAP Process Cache Query Fragment HDFS Query Fragment
  4. 4. Page4 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Hive 2.0 (+ LLAP) • Transparent to Hive users, BI tools, etc. • Hive decides where query fragments run (LLAP, Container, AM) based on configuration, data size, format, etc. • Each Query coordinated independently by a Tez AM • Number of concurrent queries throttled by number of active AMs • Hive Operators used for processing • Tez Runtime used for data transfer HiveServer2 Query/AM Controller Client(s) YARN Cluster AM1 llapd llapd Container AM1 Container AM1 llapd Container AM2 AM2 AM3 llapd
  5. 5. Page5 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Industry benchmark – 10Tb scale 0 5 10 15 20 25 30 35 40 45 50 query3 query12 query20 query21 query26 query27 query42 query52 query55 query73 query89 query91 query98 Time(seconds) LLAP vs Hive 1.x 10TB Scale Hive 1.x LLAP90% faster
  6. 6. Page6 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Evaluation from a customer case study 0 20000 1 3 5 7 9 11 13 15 17 19 21 23 25 27 • Aggregate daily statistics for a time interval: SELECT yyyymmdd, sum(total_1), sum(total_2), ... from table where yyyymmdd >= xxx and yyyymmdd < xxx and userid = xxx group by userid, yyyymmdd; Max Max Max Avg Avg Avg 0 1 2 3 4 5 6 7 8 D W M Y D W M Y D W M Y Tez Phoenix LLAP Executiontime,s
  7. 7. Page7 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Evaluation from a customer case study • Display a large report Execution Time in seconds over time range Max Max Avg Avg 0 5 10 15 20 D W M Y D W M Y Tez LLAP Executiontime,s
  8. 8. Page8 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Cut-awaytodemo (GIF)
  9. 9. Page9 © Hortonworks Inc. 2011 – 2016. All Rights Reserved How does LLAP make queries faster?
  10. 10. Page10 © Hortonworks Inc. 2011 – 2016. All Rights Reserved LLAP Queue Technical overview – execution • LLAP daemon has a number of executors (think containers) that execute work "fragments" • Fragments are parts of one, or multiple parallel workloads (e.g. Hive SQL queries) • Work queue with pluggable priority • Geared towards low latency queries over long- running queries (by default) • I/O is similar to containers – read/write to HDFS, shuffle, other storages and formats • Streaming output for data API Executor Q1 Map 1 Executor External read Executor Q3 Reducer 3 Q1 Map 1 Q1 Map 1 Q3 Map 19 HDFS Waiting for shuffle inputs HBase Container (shuffle input) Spark executor
  11. 11. Page11 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Executor Technical overview – IO layer • Optional: when executing inside LLAP • All other formats use in-sync mode • Asynchronous IO for Hive • Wraps over InputFormat, reads through cache • Supported with ORC • Transparent, compressed in-memory cache • Format-specific, extensible • NVMe/NVDIMM caches RecordReade r Fragment Cache IO thread Plan & decode Read, decompress Metadata cache Actual data (HDFS, S3, …) What to read Data buffers Splits Vectorized data Indexes
  12. 12. Page12 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Parallel queries – priorities, preemption • Lower-priority fragments can be preempted • For example, a fragment can start running before its inputs are ready, for better pipelining; such fragments may be preempted • LLAP work queue examines the DAG parameters to give preference to interactive (BI) queries LLAP QueueExecutor Executor Interactive query map 1/3 … Interactive query map 3/3 Executor Interactive query map 2/3 Wide query reduce waiting Time ClusterUtilization Long-Running Query Short-Running Query
  13. 13. Page13 © Hortonworks Inc. 2011 – 2016. All Rights Reserved I/OExecution In-memory processing – present and future Decoder Distributed FS Fragment Hive operator Hive operator Vectorized processin g col1 col2 Low-level pushdown (e.g. filter)* SSD cache* Off-heap cache Compression codec Native data vectors - work in progress * Compact encoded data Compressed data col1 col2
  14. 14. Page14 © Hortonworks Inc. 2011 – 2016. All Rights Reserved First query erformance • Cold LLAP is nearly as fast as shared pre-warmed containers (impractical on real clusters) • Realistic (long-running) LLAP ~3x faster than realistic (no prewarm) Tez No prewarm, 20.94 Shared prewarm 11.96 Cold LLAP 13.23 Realistic LLAP 7.49 0 5 10 15 20 25 Firstqueryruntime,s
  15. 15. Page15 © Hortonworks Inc. 2011 – 2016. All Rights Reserved JIT Performance – heavy use • Cache disabled! 13.23 7.93 7.7 6.29 5.44 5.3 5.01 4.89 4.83 7.49 6.1 4.61 4.59 4.93 4.62 4.63 4.45 4.25 0 2 4 6 8 10 12 14 0 1 2 3 4 5 6 7 8 Queryruntime,s Query run Cold LLAP LLAP after an unrelated workload ~2xtimesaving For cold start, the warmup took 3 runs
  16. 16. Page16 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Parallel query execution – LLAP vs Hive 1.2 3131 2416 1741 1420 1508 531 291 147 94 75 0 500 1000 1500 2000 2500 3000 3500 1 2 4 8 16 Totalruntimeforallqueries,s. Number of concurrent users Hive 1.2 total runtime LLAP total runtime Hive 1.2 linear scaling LLAP linear scaling Parallelism eats into latency
  17. 17. Page17 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Parallel query execution – 10Tb scale 531 291 147 94 75 0 100 200 300 400 500 600 1 2 4 8 16 Totalruntimeforallqueries,s. Number of concurrent users LLAP total runtime LLAP linear scaling Latency sensitive pre-emption
  18. 18. Page18 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Performance – cache on HDFS, 1Tb scale 20.2 18.3 17.6 16.2 16.2 16.8 14.5 11.3 0.00% 20.00% 40.00% 60.00% 80.00% 100.00% 0 5 10 15 20 25 24 26 28 30 32 34 36 38 40 42 Cachehitrate Queryruntime,s Cache size, Gb Query runtime, after unrelated workload Query runtime, after related workload Metadata hit rate Data hit rate
  19. 19. Page19 © Hortonworks Inc. 2011 – 2016. All Rights Reserved LLAP as a “relational” datanode
  20. 20. Page20 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Example - SparkSQL integration – execution flow Page 20 HadoopRDD.compute() LlapInputFormat.getRecordReader() - Open socket for incoming data - Send package/plan to LLAP Check permissions Generate splits w/LLAP locations Return securely signed splits Ranger+HiveServer2 Spark Executor HadoopRDD.getPartitions() Get Hive splits Cluster nodes Verify splits Run scan + transform Send data back LLAP Partitions/ Splits Request splits : : var llapContext = LlapContext.newInstance( sparkContext, jdbcUrl) var df: DataFrame = llapContext.sql("select * from tpch_text_5.region") DataFrame for Hive/LLAP data
  21. 21. Page21 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Monitoring LLAP Queries
  22. 22. Page22 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Monitoring • LLAP exposes a UI for monitoring • Also has jmx endpoint with much more data, logs and jstack endpoints as usual • Aggregate monitoring UI is work in progress
  23. 23. Page23 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Watching queries – Tez UI integration
  24. 24. Page24 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Questions? ? Interested? Stop by the Hortonworks booth to learn more

×