More Related Content Similar to LLAP: Sub-Second Analytical Queries in Hive (20) More from DataWorks Summit/Hadoop Summit (20) LLAP: Sub-Second Analytical Queries in Hive1. Page1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
LLAP: Sub-Second Analytical Queries in Hive
Gopal Vijayaraghavan
2. Page2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Why LLAP?
• People like Hive
• Disk->Mem is getting further away
– Cloud Storage isn’t co-located
– Disks are connected to the CPU via network
• Security landscape is changing
– Cells & Columns are the new security boundary, not files
– Safely masking columns needs a process boundary
• Concurrency, Performance & Scale are at conflict
– Concurrency at 100k queries/hour
– Latencies at 2-5 seconds/query
– Petabyte scale warehouses (with terabytes of “hot” data)
Node
LLAP Process
Cache
Query Fragment
HDFS
Query Fragment
3. Page3 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
What is LLAP?
• Hybrid model combining daemons and containers for
fast, concurrent execution of analytical workloads
(e.g. Hive SQL queries)
• Concurrent queries without specialized YARN queue setup
• Multi-threaded execution of vectorized operator pipelines
• Asynchronous IO and efficient in-memory caching
• Relational view of the data available thru the API
• High performance scans, execution code pushdown
• Centralized data security
Node
LLAP Process
Cache
Query Fragment
HDFS
Query Fragment
4. Page4 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Hive 2.0 (+ LLAP)
• Transparent to Hive users, BI tools, etc.
• Hive decides where query fragments run
(LLAP, Container, AM) based on
configuration, data size, format, etc.
• Each Query coordinated independently by
a Tez AM
• Number of concurrent queries throttled
by number of active AMs
• Hive Operators used for processing
• Tez Runtime used for data transfer
HiveServer2
Query/AM
Controller
Client(s) YARN Cluster
AM1
llapd
llapd
Container AM1
Container AM1
llapd
Container AM2
AM2
AM3
llapd
5. Page5 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Industry benchmark – 10Tb scale
0
5
10
15
20
25
30
35
40
45
50
query3 query12 query20 query21 query26 query27 query42 query52 query55 query73 query89 query91 query98
Time(seconds)
LLAP vs Hive 1.x 10TB Scale
Hive 1.x LLAP90% faster
6. Page6 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Evaluation from a customer case study
0
20000
1 3 5 7 9 11 13 15 17 19 21 23 25 27
• Aggregate daily statistics for a time interval:
SELECT yyyymmdd,
sum(total_1),
sum(total_2),
...
from table
where yyyymmdd >= xxx
and yyyymmdd < xxx
and userid = xxx
group by userid, yyyymmdd;
Max
Max
Max
Avg
Avg Avg
0
1
2
3
4
5
6
7
8
D W M Y D W M Y D W M Y
Tez Phoenix LLAP
Executiontime,s
7. Page7 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Evaluation from a customer case study
• Display a large report
Execution Time in seconds over time range
Max
Max
Avg
Avg
0
5
10
15
20
D W M Y D W M Y
Tez LLAP
Executiontime,s
9. Page9 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
How does LLAP make queries faster?
10. Page10 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
LLAP
Queue
Technical overview – execution
• LLAP daemon has a number of executors (think
containers) that execute work "fragments"
• Fragments are parts of one, or multiple parallel
workloads (e.g. Hive SQL queries)
• Work queue with pluggable priority
• Geared towards low latency queries over long-
running queries (by default)
• I/O is similar to containers – read/write to HDFS,
shuffle, other storages and formats
• Streaming output for data API
Executor
Q1 Map 1
Executor
External read
Executor
Q3 Reducer 3
Q1 Map 1
Q1 Map 1
Q3 Map 19
HDFS
Waiting for
shuffle inputs
HBase
Container
(shuffle input)
Spark
executor
11. Page11 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Executor
Technical overview – IO layer
• Optional: when executing inside LLAP
• All other formats use in-sync mode
• Asynchronous IO for Hive
• Wraps over InputFormat, reads through cache
• Supported with ORC
• Transparent, compressed in-memory cache
• Format-specific, extensible
• NVMe/NVDIMM caches
RecordReade
r
Fragment
Cache
IO thread
Plan & decode
Read, decompress
Metadata cache
Actual data (HDFS, S3, …)
What to read Data buffers
Splits Vectorized data
Indexes
12. Page12 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Parallel queries – priorities, preemption
• Lower-priority fragments can be preempted
• For example, a fragment can start running before its inputs are
ready, for better pipelining; such fragments may be preempted
• LLAP work queue examines the DAG parameters to give
preference to interactive (BI) queries
LLAP
QueueExecutor
Executor
Interactive
query map 1/3
…
Interactive
query map 3/3
Executor
Interactive
query map 2/3
Wide query
reduce waiting
Time
ClusterUtilization
Long-Running Query
Short-Running Query
13. Page13 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
I/OExecution
In-memory processing – present and future
Decoder
Distributed FS
Fragment
Hive
operator
Hive
operator
Vectorized
processin
g
col1 col2
Low-level pushdown
(e.g. filter)*
SSD cache*
Off-heap cache Compression
codec
Native data
vectors
- work in
progress
*
Compact encoded data
Compressed data
col1
col2
14. Page14 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
First query erformance
• Cold LLAP is nearly as fast as
shared pre-warmed containers
(impractical on real clusters)
• Realistic (long-running) LLAP
~3x faster than realistic (no
prewarm) Tez
No
prewarm,
20.94
Shared
prewarm
11.96
Cold LLAP
13.23
Realistic
LLAP
7.49
0
5
10
15
20
25
Firstqueryruntime,s
15. Page15 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
JIT Performance – heavy use
• Cache disabled!
13.23
7.93 7.7
6.29
5.44 5.3 5.01 4.89 4.83
7.49
6.1
4.61 4.59
4.93
4.62 4.63 4.45 4.25
0
2
4
6
8
10
12
14
0 1 2 3 4 5 6 7 8
Queryruntime,s
Query run
Cold LLAP
LLAP after an
unrelated workload
~2xtimesaving
For cold start, the
warmup took 3 runs
16. Page16 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Parallel query execution – LLAP vs Hive 1.2
3131
2416
1741
1420
1508
531
291
147
94 75
0
500
1000
1500
2000
2500
3000
3500
1 2 4 8 16
Totalruntimeforallqueries,s.
Number of concurrent users
Hive 1.2 total runtime
LLAP total runtime
Hive 1.2 linear scaling
LLAP linear scaling
Parallelism eats into latency
17. Page17 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Parallel query execution – 10Tb scale
531
291
147
94
75
0
100
200
300
400
500
600
1 2 4 8 16
Totalruntimeforallqueries,s.
Number of concurrent users
LLAP total runtime
LLAP linear scaling
Latency sensitive pre-emption
18. Page18 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Performance – cache on HDFS, 1Tb scale
20.2
18.3
17.6
16.2 16.2
16.8
14.5
11.3
0.00%
20.00%
40.00%
60.00%
80.00%
100.00%
0
5
10
15
20
25
24 26 28 30 32 34 36 38 40 42
Cachehitrate
Queryruntime,s
Cache size, Gb
Query runtime, after
unrelated workload
Query runtime, after
related workload
Metadata hit rate
Data hit rate
20. Page20 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Example - SparkSQL integration – execution flow
Page 20
HadoopRDD.compute()
LlapInputFormat.getRecordReader()
- Open socket for incoming data
- Send package/plan to LLAP
Check permissions
Generate splits w/LLAP locations
Return securely signed splits
Ranger+HiveServer2
Spark Executor
HadoopRDD.getPartitions()
Get Hive splits
Cluster nodes
Verify splits
Run scan + transform
Send data back
LLAP
Partitions/
Splits
Request
splits
: :
var llapContext =
LlapContext.newInstance(
sparkContext, jdbcUrl)
var df: DataFrame =
llapContext.sql("select *
from tpch_text_5.region")
DataFrame for Hive/LLAP data
22. Page22 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Monitoring
• LLAP exposes a UI for
monitoring
• Also has jmx endpoint with
much more data, logs and
jstack endpoints as usual
• Aggregate monitoring UI is
work in progress
23. Page23 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Watching queries – Tez UI integration
24. Page24 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Questions?
?
Interested? Stop by the Hortonworks booth to learn more