SlideShare a Scribd company logo
Page1 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
LLAP: Sub-Second Analytical Queries in Hive
Gunther Hagleitner
Page2 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Perf overview – a demo in a gif
Page3 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
4 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Hive 2 with LLAP: 5x Performance Boost at 10 TB Scale
0
2
4
6
8
10
12
14
16
18
0
500
1000
1500
2000
2500
Improvement(Factor)
Querytime,s
Hive 2 with LLAP: 5x Performance Improvement Across All Query Types
(10 TB, 10 Nodes (256GB/32 cores), TPC-DS Queries)
Hive 2 LLAP Improvement v Hive 1 (Factor)
5 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
0
5
10
15
20
25
30
35
40
45
50
0
50
100
150
200
250
Speedup(xFactor)
QueryTime(s)(LowerisBetter)
Hive 2 with LLAP averages 26x faster than Hive 1
Hive 1 / Tez Time (s) Hive 2 / LLAP Time(s) Speedup (x Factor)
Hive 2 with LLAP: 25+x Performance Boost: Interactive / 1TB Scale
6 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Apache Hive vs. Apache Impala at 10TB
 10TB scale on 10 identical
nodes.
 Hive and Impala showed
similar times on most
smaller queries.
 Hive scaled better,
completing larger queries
quicker with lower end-to-
end time for all queries
Highlights
1
10
100
1000
10000
Querytime,s
Impala 2.8 Hive 2.1
7 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Apache Hive vs. Presto on a partitioned 1TB dataset.
 Presto lacks basic
performance optimizations
like dynamic partition
pruning.
 On a real dataset / workload
Presto perform poorly
without full re-writes.
 Example: Query 55 without
re-writes = 185.17s, with re-
writes = 16s. LLAP = 1.37s.
Highlights
8 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Hive LLAP: Stable Performance under High Concurrency
4x Queries,
2.8x
Runtime
Difference
5x Queries,
4.6x
Runtime
Difference
Mark
Concurrent
Queries
Average
Runtime
5 7.76s
25 36.24s
100 102.89s
9 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Hive with LLAP: Linear Scale
Average and Maximum query times
Mixed SQL workload of simple and complex queries
1TB Dataset Size
0
20
40
60
80
100
120
8 16 32
Time(s)
Concurrent Users
Average Query Time by Concurrency, 1TB
Average Time: 8 Node Average Time: 16 Node
0
50
100
150
200
250
300
8 16 32
Time(s)
Concurrent Users
Maximum Query Time by Concurrency, 1TB
Max Time: 8 Node Max Time: 16 Node
Linear Scale:
Double Nodes,
Consistent Runtime
Page10 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Performance – cache on HDFS, 1Tb scale
20.2
18.3
17.6
16.2 16.2
16.8
14.5
11.3
0.00%
20.00%
40.00%
60.00%
80.00%
100.00%
0
5
10
15
20
25
24 26 28 30 32 34 36 38 40 42
Cachehitrate
Queryruntime,s
Cache size, Gb
Query runtime, after
unrelated workload
Query runtime, after
related workload
Metadata hit rate
Data hit rate
Page11 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Performance – cache on cloud storage
• Cloud storage, slower than
on-prem HDFS
• Large scan on 1Tb scale
• Local HD/SDD cache
• Massive improvement on slow
storage with little memory cost
0
50
100
150
200
250
300
350
400
Cold cache Hot cache
Runtime,s
Entire query
Scan portion
Page12 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Hive with LLAP on text data (CSV, logs, etc)
-
5.00
10.00
15.00
20.00
25.00
30.00
35.00
40.00
45.00
50.00
0
200
400
600
800
1000
1200
1400
Text: No Cache
Text: Cached
Improvement (Factor)
Page13 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Hive with LLAP: Architecture Overview
Deep
Storage
YARN Cluster
LLAP Daemon
Query
Executors
LLAP Daemon
Query
Executors
LLAP Daemon
Query
Executors
LLAP Daemon
Query
Executors
Query
Coordinators
Coord-
inator
Coord-
inator
Coord-
inator
HiveServer2
(Query
Endpoint)
ODBC /
JDBC
SQL
Queries
In-Memory Cache
(Shared Across All Users)
HDFS and
Compatible
S3 WASB Isilon
Page14 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
I/OExecution
LLAP: In-memory processing
Decoder
Distributed FS
Fragment
Hive
operator
Hive
operator
Vectorized
processin
g
col1 col2
Low-level pushdown
(e.g. filter)*
SSD/HD cache
Off-heap cache Compression
codec
Native data
vectors
Compact encoded data
Compressed data
col1
col2
Page15 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Parallel queries – priorities, preemption
• Lower-priority fragments can be preempted
• For example, a fragment can start running before its inputs are
ready, for better pipelining; such fragments may be preempted
• LLAP work queue examines the DAG parameters to give
preference to interactive (BI) queries
LLAP
QueueExecutor
Executor
Interactive
query map 1/3
…
Interactive
query map 3/3
Executor
Interactive
query map 2/3
Wide query
reduce waiting
Time
ClusterUtilization
Long-Running Query
Short-Running Query
Page16 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Sidebar: Relational data node
• LLAP can provide a "relational datanode" view of the data
• Standard interface to read tables directly from LLAP nodes
• Supports column-level security
• LLAP handles ACID, schema-evolution
• Optimized for fast access/scans
• Read horizontal table partitions in parallel on compute nodes
• Push projections, filters, aggregates into LLAP
• Utilizes cache and LLAP IO subsystem
Page17 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Example - SparkSQL integration
• Allowing direct access to Hive
table data from Spark breaks
security model for warehouses
secured using Ranger or SQL
Standard Authorization
• Other features may not work
correctly unless support added in
Spark
• Row/Cell level security (when
implemented), ACID, Schema
Evolution, …
• SparkSQL has interfaces for external
sources; they can be implemented to
access Hive data via HS2/LLAP
• SparkSQL Catalyst optimizer hooks
can be used to push processing into
LLAP (e.g. filters on the tables)
• Some optimizations, like dynamic partition
pruning, can be used by Hive even if SparkSQL
doesn't support them; if execution pushdown
is advanced enough
Page18 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Example - SparkSQL integration – execution flow
Page 18
HadoopRDD.compute()
LlapInputFormat.getRecordReader()
- Open socket for incoming data
- Send package/plan to LLAP
Check permissions
Generate splits w/LLAP locations
Return securely signed splits
HS2/Metastore
Spark Executor
HadoopRDD.getPartitions()
Get Hive splits
Cluster nodes
Verify splits
Run query plan
Send data back
LLAP
Partitions/
Splits
Request
splits
: :
var llapContext =
LlapContext.newInstance(
sparkContext, jdbcUrl)
var df: DataFrame =
llapContext.sql("select *
from tpch_text_5.region")
DataFrame for Hive/LLAP data
Page19 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
LLAP Availability
• First version shipped in Apache Hive 2.0
• Hive 2.0.1 contains important bug fixes to make it production ready
• Hive 2.1 has new features and further perf improvements
• HDP 2.5 includes beta version of LLAP
• HDP 2.6 includes GA version of Apache Hive 2.1 w/ LLAP
Page20 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Setup: Ambari
• Simple setup
• Integrated into Hive configuration page
• Launches HS2 instance and LLAP cluster
• Handles configuration, slider, security,
alerts and monitoring, …
• Requires Ambari 2.5.x
• Requires HDP 2.6.x stack (includes Apache
Hive 2.1 w/ LLAP)
Page21 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Setup: Cloud
• Spin up LLAP clusters in minutes
• Exposes HS2 endpoint
• JDBC/ODBC
• Zeppelin/Hive view also options
• RDS + S3 friendly
• Use HD cache for fast S3 access
• RDS for shared metastore instance
• HDP 2.6 available now!
Page22 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Monitoring
• Simple UI for monitoring
• JMX endpoint with more data
• Logs and jstack endpoints
Page23 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Monitoring
• HDP ships with number of default
Grafana dashboards for LLAP
• Show wealth of metrics for cache,
compute and scheduling
• Helps to identify bottlenecks and issues
• Helps with capacity planning
• Easy to extend or integrate with other
systems
Page24 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Watching queries – Tez UI integration
Page25 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Future work
• Fine-grained workload management
• Reservations & priorities
• Configurable guardrails
• Maximum permissible scan range, runtime, cost
• Integrated with CBO’s cost model
• Admin tools
• Identify hotspots & data layout issues
• Easy reporting on health, performance and utilization
Page26 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Summary
• LLAP is a new execution substrate for fast, concurrent
analytical workloads, harnessing Hive vectorized SQL
engine and efficient in-memory caching layer
• Provides secure relational view of the data through a
simple API
• Available in Hive 2, integrated with ecosystem (Ambari,
Cloud, Grafana, jmx, Hive & Tez views)
Page27 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Questions?
?
Interested? Stop by the Hortonworks booth to learn more

More Related Content

What's hot

Major advancements in Apache Hive towards full support of SQL compliance
Major advancements in Apache Hive towards full support of SQL complianceMajor advancements in Apache Hive towards full support of SQL compliance
Major advancements in Apache Hive towards full support of SQL compliance
DataWorks Summit/Hadoop Summit
 
A TPC Benchmark of Hive LLAP and Comparison with Presto
A TPC Benchmark of Hive LLAP and Comparison with PrestoA TPC Benchmark of Hive LLAP and Comparison with Presto
A TPC Benchmark of Hive LLAP and Comparison with Presto
Yu Liu
 
Apache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, ScaleApache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, Scale
DataWorks Summit/Hadoop Summit
 
Unifying Stream, SWL and CEP for Declarative Stream Processing with Apache Flink
Unifying Stream, SWL and CEP for Declarative Stream Processing with Apache FlinkUnifying Stream, SWL and CEP for Declarative Stream Processing with Apache Flink
Unifying Stream, SWL and CEP for Declarative Stream Processing with Apache Flink
DataWorks Summit/Hadoop Summit
 
Migrating pipelines into Docker
Migrating pipelines into DockerMigrating pipelines into Docker
Migrating pipelines into Docker
DataWorks Summit/Hadoop Summit
 
Hadoop & cloud storage object store integration in production (final)
Hadoop & cloud storage  object store integration in production (final)Hadoop & cloud storage  object store integration in production (final)
Hadoop & cloud storage object store integration in production (final)
Chris Nauroth
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
DataWorks Summit/Hadoop Summit
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
DataWorks Summit/Hadoop Summit
 
Sub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scaleSub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scale
Yifeng Jiang
 
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBaseApache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
DataWorks Summit/Hadoop Summit
 
Llap: Locality is Dead
Llap: Locality is DeadLlap: Locality is Dead
Llap: Locality is Dead
t3rmin4t0r
 
From Device to Data Center to Insights
From Device to Data Center to InsightsFrom Device to Data Center to Insights
From Device to Data Center to Insights
DataWorks Summit/Hadoop Summit
 
Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem
DataWorks Summit/Hadoop Summit
 
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
alanfgates
 
The Future of Apache Ambari
The Future of Apache AmbariThe Future of Apache Ambari
The Future of Apache Ambari
DataWorks Summit
 
Overview of Apache Flink: the 4G of Big Data Analytics Frameworks
Overview of Apache Flink: the 4G of Big Data Analytics FrameworksOverview of Apache Flink: the 4G of Big Data Analytics Frameworks
Overview of Apache Flink: the 4G of Big Data Analytics Frameworks
DataWorks Summit/Hadoop Summit
 
A Multi Colored YARN
A Multi Colored YARNA Multi Colored YARN
A Multi Colored YARN
DataWorks Summit/Hadoop Summit
 
Practice of large Hadoop cluster in China Mobile
Practice of large Hadoop cluster in China MobilePractice of large Hadoop cluster in China Mobile
Practice of large Hadoop cluster in China Mobile
DataWorks Summit
 
Network for the Large-scale Hadoop cluster at Yahoo! JAPAN
Network for the Large-scale Hadoop cluster at Yahoo! JAPANNetwork for the Large-scale Hadoop cluster at Yahoo! JAPAN
Network for the Large-scale Hadoop cluster at Yahoo! JAPAN
DataWorks Summit/Hadoop Summit
 
Streaming in the Wild with Apache Flink
Streaming in the Wild with Apache FlinkStreaming in the Wild with Apache Flink
Streaming in the Wild with Apache Flink
DataWorks Summit/Hadoop Summit
 

What's hot (20)

Major advancements in Apache Hive towards full support of SQL compliance
Major advancements in Apache Hive towards full support of SQL complianceMajor advancements in Apache Hive towards full support of SQL compliance
Major advancements in Apache Hive towards full support of SQL compliance
 
A TPC Benchmark of Hive LLAP and Comparison with Presto
A TPC Benchmark of Hive LLAP and Comparison with PrestoA TPC Benchmark of Hive LLAP and Comparison with Presto
A TPC Benchmark of Hive LLAP and Comparison with Presto
 
Apache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, ScaleApache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, Scale
 
Unifying Stream, SWL and CEP for Declarative Stream Processing with Apache Flink
Unifying Stream, SWL and CEP for Declarative Stream Processing with Apache FlinkUnifying Stream, SWL and CEP for Declarative Stream Processing with Apache Flink
Unifying Stream, SWL and CEP for Declarative Stream Processing with Apache Flink
 
Migrating pipelines into Docker
Migrating pipelines into DockerMigrating pipelines into Docker
Migrating pipelines into Docker
 
Hadoop & cloud storage object store integration in production (final)
Hadoop & cloud storage  object store integration in production (final)Hadoop & cloud storage  object store integration in production (final)
Hadoop & cloud storage object store integration in production (final)
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
 
Sub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scaleSub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scale
 
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBaseApache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
 
Llap: Locality is Dead
Llap: Locality is DeadLlap: Locality is Dead
Llap: Locality is Dead
 
From Device to Data Center to Insights
From Device to Data Center to InsightsFrom Device to Data Center to Insights
From Device to Data Center to Insights
 
Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem
 
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
 
The Future of Apache Ambari
The Future of Apache AmbariThe Future of Apache Ambari
The Future of Apache Ambari
 
Overview of Apache Flink: the 4G of Big Data Analytics Frameworks
Overview of Apache Flink: the 4G of Big Data Analytics FrameworksOverview of Apache Flink: the 4G of Big Data Analytics Frameworks
Overview of Apache Flink: the 4G of Big Data Analytics Frameworks
 
A Multi Colored YARN
A Multi Colored YARNA Multi Colored YARN
A Multi Colored YARN
 
Practice of large Hadoop cluster in China Mobile
Practice of large Hadoop cluster in China MobilePractice of large Hadoop cluster in China Mobile
Practice of large Hadoop cluster in China Mobile
 
Network for the Large-scale Hadoop cluster at Yahoo! JAPAN
Network for the Large-scale Hadoop cluster at Yahoo! JAPANNetwork for the Large-scale Hadoop cluster at Yahoo! JAPAN
Network for the Large-scale Hadoop cluster at Yahoo! JAPAN
 
Streaming in the Wild with Apache Flink
Streaming in the Wild with Apache FlinkStreaming in the Wild with Apache Flink
Streaming in the Wild with Apache Flink
 

Similar to LLAP: Sub-Second Analytical Queries in Hive

LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
DataWorks Summit/Hadoop Summit
 
Hive acid and_2.x new_features
Hive acid and_2.x new_featuresHive acid and_2.x new_features
Hive acid and_2.x new_features
Alberto Romero
 
LLAP: long-lived execution in Hive
LLAP: long-lived execution in HiveLLAP: long-lived execution in Hive
LLAP: long-lived execution in Hive
DataWorks Summit
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data Warehouse
DataWorks Summit
 
Big data processing engines, Atlanta Meetup 4/30
Big data processing engines, Atlanta Meetup 4/30Big data processing engines, Atlanta Meetup 4/30
Big data processing engines, Atlanta Meetup 4/30
Ashish Narasimham
 
Hadoop 3 in a Nutshell
Hadoop 3 in a NutshellHadoop 3 in a Nutshell
Hadoop 3 in a Nutshell
DataWorks Summit/Hadoop Summit
 
Apache Hive 2.0; SQL, Speed, Scale
Apache Hive 2.0; SQL, Speed, ScaleApache Hive 2.0; SQL, Speed, Scale
Apache Hive 2.0; SQL, Speed, Scale
Hortonworks
 
Hive 3.0 - HDPの最新バージョンで実現する新機能とパフォーマンス改善
Hive 3.0 - HDPの最新バージョンで実現する新機能とパフォーマンス改善Hive 3.0 - HDPの最新バージョンで実現する新機能とパフォーマンス改善
Hive 3.0 - HDPの最新バージョンで実現する新機能とパフォーマンス改善
HortonworksJapan
 
Apache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, ScaleApache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, Scale
DataWorks Summit/Hadoop Summit
 
Hive 3 New Horizons DataWorks Summit Melbourne February 2019
Hive 3 New Horizons DataWorks Summit Melbourne February 2019Hive 3 New Horizons DataWorks Summit Melbourne February 2019
Hive 3 New Horizons DataWorks Summit Melbourne February 2019
alanfgates
 
What is New in Apache Hive 3.0?
What is New in Apache Hive 3.0?What is New in Apache Hive 3.0?
What is New in Apache Hive 3.0?
DataWorks Summit
 
Stinger.Next by Alan Gates of Hortonworks
Stinger.Next by Alan Gates of HortonworksStinger.Next by Alan Gates of Hortonworks
Stinger.Next by Alan Gates of Hortonworks
Data Con LA
 
What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?
DataWorks Summit
 
What's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - TokyoWhat's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - Tokyo
DataWorks Summit
 
What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?
DataWorks Summit
 
Apache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateApache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community Update
DataWorks Summit
 
Apache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, ScaleApache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, Scale
DataWorks Summit/Hadoop Summit
 
Intro to Big Data Analytics using Apache Spark and Apache Zeppelin
Intro to Big Data Analytics using Apache Spark and Apache ZeppelinIntro to Big Data Analytics using Apache Spark and Apache Zeppelin
Intro to Big Data Analytics using Apache Spark and Apache Zeppelin
Alex Zeltov
 
Moving towards enterprise ready Hadoop clusters on the cloud
Moving towards enterprise ready Hadoop clusters on the cloudMoving towards enterprise ready Hadoop clusters on the cloud
Moving towards enterprise ready Hadoop clusters on the cloud
DataWorks Summit/Hadoop Summit
 
Paris FOD Meetup #5 Hortonworks Presentation
Paris FOD Meetup #5 Hortonworks PresentationParis FOD Meetup #5 Hortonworks Presentation
Paris FOD Meetup #5 Hortonworks Presentation
Abdelkrim Hadjidj
 

Similar to LLAP: Sub-Second Analytical Queries in Hive (20)

LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
 
Hive acid and_2.x new_features
Hive acid and_2.x new_featuresHive acid and_2.x new_features
Hive acid and_2.x new_features
 
LLAP: long-lived execution in Hive
LLAP: long-lived execution in HiveLLAP: long-lived execution in Hive
LLAP: long-lived execution in Hive
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data Warehouse
 
Big data processing engines, Atlanta Meetup 4/30
Big data processing engines, Atlanta Meetup 4/30Big data processing engines, Atlanta Meetup 4/30
Big data processing engines, Atlanta Meetup 4/30
 
Hadoop 3 in a Nutshell
Hadoop 3 in a NutshellHadoop 3 in a Nutshell
Hadoop 3 in a Nutshell
 
Apache Hive 2.0; SQL, Speed, Scale
Apache Hive 2.0; SQL, Speed, ScaleApache Hive 2.0; SQL, Speed, Scale
Apache Hive 2.0; SQL, Speed, Scale
 
Hive 3.0 - HDPの最新バージョンで実現する新機能とパフォーマンス改善
Hive 3.0 - HDPの最新バージョンで実現する新機能とパフォーマンス改善Hive 3.0 - HDPの最新バージョンで実現する新機能とパフォーマンス改善
Hive 3.0 - HDPの最新バージョンで実現する新機能とパフォーマンス改善
 
Apache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, ScaleApache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, Scale
 
Hive 3 New Horizons DataWorks Summit Melbourne February 2019
Hive 3 New Horizons DataWorks Summit Melbourne February 2019Hive 3 New Horizons DataWorks Summit Melbourne February 2019
Hive 3 New Horizons DataWorks Summit Melbourne February 2019
 
What is New in Apache Hive 3.0?
What is New in Apache Hive 3.0?What is New in Apache Hive 3.0?
What is New in Apache Hive 3.0?
 
Stinger.Next by Alan Gates of Hortonworks
Stinger.Next by Alan Gates of HortonworksStinger.Next by Alan Gates of Hortonworks
Stinger.Next by Alan Gates of Hortonworks
 
What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?
 
What's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - TokyoWhat's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - Tokyo
 
What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?
 
Apache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateApache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community Update
 
Apache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, ScaleApache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, Scale
 
Intro to Big Data Analytics using Apache Spark and Apache Zeppelin
Intro to Big Data Analytics using Apache Spark and Apache ZeppelinIntro to Big Data Analytics using Apache Spark and Apache Zeppelin
Intro to Big Data Analytics using Apache Spark and Apache Zeppelin
 
Moving towards enterprise ready Hadoop clusters on the cloud
Moving towards enterprise ready Hadoop clusters on the cloudMoving towards enterprise ready Hadoop clusters on the cloud
Moving towards enterprise ready Hadoop clusters on the cloud
 
Paris FOD Meetup #5 Hortonworks Presentation
Paris FOD Meetup #5 Hortonworks PresentationParis FOD Meetup #5 Hortonworks Presentation
Paris FOD Meetup #5 Hortonworks Presentation
 

More from DataWorks Summit/Hadoop Summit

Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in ProductionRunning Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in Production
DataWorks Summit/Hadoop Summit
 
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache ZeppelinState of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
DataWorks Summit/Hadoop Summit
 
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
DataWorks Summit/Hadoop Summit
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
DataWorks Summit/Hadoop Summit
 
Revolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and ZeppelinRevolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and Zeppelin
DataWorks Summit/Hadoop Summit
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense
DataWorks Summit/Hadoop Summit
 
Hadoop Crash Course
Hadoop Crash CourseHadoop Crash Course
Hadoop Crash Course
DataWorks Summit/Hadoop Summit
 
Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
DataWorks Summit/Hadoop Summit
 
Apache Spark Crash Course
Apache Spark Crash CourseApache Spark Crash Course
Apache Spark Crash Course
DataWorks Summit/Hadoop Summit
 
Dataflow with Apache NiFi
Dataflow with Apache NiFiDataflow with Apache NiFi
Dataflow with Apache NiFi
DataWorks Summit/Hadoop Summit
 
Schema Registry - Set you Data Free
Schema Registry - Set you Data FreeSchema Registry - Set you Data Free
Schema Registry - Set you Data Free
DataWorks Summit/Hadoop Summit
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
DataWorks Summit/Hadoop Summit
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
DataWorks Summit/Hadoop Summit
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
DataWorks Summit/Hadoop Summit
 
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
DataWorks Summit/Hadoop Summit
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
DataWorks Summit/Hadoop Summit
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
DataWorks Summit/Hadoop Summit
 
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
DataWorks Summit/Hadoop Summit
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
DataWorks Summit/Hadoop Summit
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
DataWorks Summit/Hadoop Summit
 

More from DataWorks Summit/Hadoop Summit (20)

Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in ProductionRunning Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in Production
 
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache ZeppelinState of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
 
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
 
Revolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and ZeppelinRevolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and Zeppelin
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense
 
Hadoop Crash Course
Hadoop Crash CourseHadoop Crash Course
Hadoop Crash Course
 
Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Apache Spark Crash Course
Apache Spark Crash CourseApache Spark Crash Course
Apache Spark Crash Course
 
Dataflow with Apache NiFi
Dataflow with Apache NiFiDataflow with Apache NiFi
Dataflow with Apache NiFi
 
Schema Registry - Set you Data Free
Schema Registry - Set you Data FreeSchema Registry - Set you Data Free
Schema Registry - Set you Data Free
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
 
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
 
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
 

Recently uploaded

Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 

Recently uploaded (20)

Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 

LLAP: Sub-Second Analytical Queries in Hive

  • 1. Page1 © Hortonworks Inc. 2011 – 2017. All Rights Reserved LLAP: Sub-Second Analytical Queries in Hive Gunther Hagleitner
  • 2. Page2 © Hortonworks Inc. 2011 – 2017. All Rights Reserved Perf overview – a demo in a gif
  • 3. Page3 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
  • 4. 4 © Hortonworks Inc. 2011 – 2017. All Rights Reserved Hive 2 with LLAP: 5x Performance Boost at 10 TB Scale 0 2 4 6 8 10 12 14 16 18 0 500 1000 1500 2000 2500 Improvement(Factor) Querytime,s Hive 2 with LLAP: 5x Performance Improvement Across All Query Types (10 TB, 10 Nodes (256GB/32 cores), TPC-DS Queries) Hive 2 LLAP Improvement v Hive 1 (Factor)
  • 5. 5 © Hortonworks Inc. 2011 – 2017. All Rights Reserved 0 5 10 15 20 25 30 35 40 45 50 0 50 100 150 200 250 Speedup(xFactor) QueryTime(s)(LowerisBetter) Hive 2 with LLAP averages 26x faster than Hive 1 Hive 1 / Tez Time (s) Hive 2 / LLAP Time(s) Speedup (x Factor) Hive 2 with LLAP: 25+x Performance Boost: Interactive / 1TB Scale
  • 6. 6 © Hortonworks Inc. 2011 – 2017. All Rights Reserved Apache Hive vs. Apache Impala at 10TB  10TB scale on 10 identical nodes.  Hive and Impala showed similar times on most smaller queries.  Hive scaled better, completing larger queries quicker with lower end-to- end time for all queries Highlights 1 10 100 1000 10000 Querytime,s Impala 2.8 Hive 2.1
  • 7. 7 © Hortonworks Inc. 2011 – 2017. All Rights Reserved Apache Hive vs. Presto on a partitioned 1TB dataset.  Presto lacks basic performance optimizations like dynamic partition pruning.  On a real dataset / workload Presto perform poorly without full re-writes.  Example: Query 55 without re-writes = 185.17s, with re- writes = 16s. LLAP = 1.37s. Highlights
  • 8. 8 © Hortonworks Inc. 2011 – 2017. All Rights Reserved Hive LLAP: Stable Performance under High Concurrency 4x Queries, 2.8x Runtime Difference 5x Queries, 4.6x Runtime Difference Mark Concurrent Queries Average Runtime 5 7.76s 25 36.24s 100 102.89s
  • 9. 9 © Hortonworks Inc. 2011 – 2017. All Rights Reserved Hive with LLAP: Linear Scale Average and Maximum query times Mixed SQL workload of simple and complex queries 1TB Dataset Size 0 20 40 60 80 100 120 8 16 32 Time(s) Concurrent Users Average Query Time by Concurrency, 1TB Average Time: 8 Node Average Time: 16 Node 0 50 100 150 200 250 300 8 16 32 Time(s) Concurrent Users Maximum Query Time by Concurrency, 1TB Max Time: 8 Node Max Time: 16 Node Linear Scale: Double Nodes, Consistent Runtime
  • 10. Page10 © Hortonworks Inc. 2011 – 2017. All Rights Reserved Performance – cache on HDFS, 1Tb scale 20.2 18.3 17.6 16.2 16.2 16.8 14.5 11.3 0.00% 20.00% 40.00% 60.00% 80.00% 100.00% 0 5 10 15 20 25 24 26 28 30 32 34 36 38 40 42 Cachehitrate Queryruntime,s Cache size, Gb Query runtime, after unrelated workload Query runtime, after related workload Metadata hit rate Data hit rate
  • 11. Page11 © Hortonworks Inc. 2011 – 2017. All Rights Reserved Performance – cache on cloud storage • Cloud storage, slower than on-prem HDFS • Large scan on 1Tb scale • Local HD/SDD cache • Massive improvement on slow storage with little memory cost 0 50 100 150 200 250 300 350 400 Cold cache Hot cache Runtime,s Entire query Scan portion
  • 12. Page12 © Hortonworks Inc. 2011 – 2017. All Rights Reserved Hive with LLAP on text data (CSV, logs, etc) - 5.00 10.00 15.00 20.00 25.00 30.00 35.00 40.00 45.00 50.00 0 200 400 600 800 1000 1200 1400 Text: No Cache Text: Cached Improvement (Factor)
  • 13. Page13 © Hortonworks Inc. 2011 – 2017. All Rights Reserved Hive with LLAP: Architecture Overview Deep Storage YARN Cluster LLAP Daemon Query Executors LLAP Daemon Query Executors LLAP Daemon Query Executors LLAP Daemon Query Executors Query Coordinators Coord- inator Coord- inator Coord- inator HiveServer2 (Query Endpoint) ODBC / JDBC SQL Queries In-Memory Cache (Shared Across All Users) HDFS and Compatible S3 WASB Isilon
  • 14. Page14 © Hortonworks Inc. 2011 – 2017. All Rights Reserved I/OExecution LLAP: In-memory processing Decoder Distributed FS Fragment Hive operator Hive operator Vectorized processin g col1 col2 Low-level pushdown (e.g. filter)* SSD/HD cache Off-heap cache Compression codec Native data vectors Compact encoded data Compressed data col1 col2
  • 15. Page15 © Hortonworks Inc. 2011 – 2017. All Rights Reserved Parallel queries – priorities, preemption • Lower-priority fragments can be preempted • For example, a fragment can start running before its inputs are ready, for better pipelining; such fragments may be preempted • LLAP work queue examines the DAG parameters to give preference to interactive (BI) queries LLAP QueueExecutor Executor Interactive query map 1/3 … Interactive query map 3/3 Executor Interactive query map 2/3 Wide query reduce waiting Time ClusterUtilization Long-Running Query Short-Running Query
  • 16. Page16 © Hortonworks Inc. 2011 – 2017. All Rights Reserved Sidebar: Relational data node • LLAP can provide a "relational datanode" view of the data • Standard interface to read tables directly from LLAP nodes • Supports column-level security • LLAP handles ACID, schema-evolution • Optimized for fast access/scans • Read horizontal table partitions in parallel on compute nodes • Push projections, filters, aggregates into LLAP • Utilizes cache and LLAP IO subsystem
  • 17. Page17 © Hortonworks Inc. 2011 – 2017. All Rights Reserved Example - SparkSQL integration • Allowing direct access to Hive table data from Spark breaks security model for warehouses secured using Ranger or SQL Standard Authorization • Other features may not work correctly unless support added in Spark • Row/Cell level security (when implemented), ACID, Schema Evolution, … • SparkSQL has interfaces for external sources; they can be implemented to access Hive data via HS2/LLAP • SparkSQL Catalyst optimizer hooks can be used to push processing into LLAP (e.g. filters on the tables) • Some optimizations, like dynamic partition pruning, can be used by Hive even if SparkSQL doesn't support them; if execution pushdown is advanced enough
  • 18. Page18 © Hortonworks Inc. 2011 – 2017. All Rights Reserved Example - SparkSQL integration – execution flow Page 18 HadoopRDD.compute() LlapInputFormat.getRecordReader() - Open socket for incoming data - Send package/plan to LLAP Check permissions Generate splits w/LLAP locations Return securely signed splits HS2/Metastore Spark Executor HadoopRDD.getPartitions() Get Hive splits Cluster nodes Verify splits Run query plan Send data back LLAP Partitions/ Splits Request splits : : var llapContext = LlapContext.newInstance( sparkContext, jdbcUrl) var df: DataFrame = llapContext.sql("select * from tpch_text_5.region") DataFrame for Hive/LLAP data
  • 19. Page19 © Hortonworks Inc. 2011 – 2017. All Rights Reserved LLAP Availability • First version shipped in Apache Hive 2.0 • Hive 2.0.1 contains important bug fixes to make it production ready • Hive 2.1 has new features and further perf improvements • HDP 2.5 includes beta version of LLAP • HDP 2.6 includes GA version of Apache Hive 2.1 w/ LLAP
  • 20. Page20 © Hortonworks Inc. 2011 – 2017. All Rights Reserved Setup: Ambari • Simple setup • Integrated into Hive configuration page • Launches HS2 instance and LLAP cluster • Handles configuration, slider, security, alerts and monitoring, … • Requires Ambari 2.5.x • Requires HDP 2.6.x stack (includes Apache Hive 2.1 w/ LLAP)
  • 21. Page21 © Hortonworks Inc. 2011 – 2017. All Rights Reserved Setup: Cloud • Spin up LLAP clusters in minutes • Exposes HS2 endpoint • JDBC/ODBC • Zeppelin/Hive view also options • RDS + S3 friendly • Use HD cache for fast S3 access • RDS for shared metastore instance • HDP 2.6 available now!
  • 22. Page22 © Hortonworks Inc. 2011 – 2017. All Rights Reserved Monitoring • Simple UI for monitoring • JMX endpoint with more data • Logs and jstack endpoints
  • 23. Page23 © Hortonworks Inc. 2011 – 2017. All Rights Reserved Monitoring • HDP ships with number of default Grafana dashboards for LLAP • Show wealth of metrics for cache, compute and scheduling • Helps to identify bottlenecks and issues • Helps with capacity planning • Easy to extend or integrate with other systems
  • 24. Page24 © Hortonworks Inc. 2011 – 2017. All Rights Reserved Watching queries – Tez UI integration
  • 25. Page25 © Hortonworks Inc. 2011 – 2017. All Rights Reserved Future work • Fine-grained workload management • Reservations & priorities • Configurable guardrails • Maximum permissible scan range, runtime, cost • Integrated with CBO’s cost model • Admin tools • Identify hotspots & data layout issues • Easy reporting on health, performance and utilization
  • 26. Page26 © Hortonworks Inc. 2011 – 2017. All Rights Reserved Summary • LLAP is a new execution substrate for fast, concurrent analytical workloads, harnessing Hive vectorized SQL engine and efficient in-memory caching layer • Provides secure relational view of the data through a simple API • Available in Hive 2, integrated with ecosystem (Ambari, Cloud, Grafana, jmx, Hive & Tez views)
  • 27. Page27 © Hortonworks Inc. 2011 – 2017. All Rights Reserved Questions? ? Interested? Stop by the Hortonworks booth to learn more

Editor's Notes

  1. Introduce myself Set stage for demo
  2. Llap off -> 10s Llap on -> < 1s
  3. Observations: -> same hive, same interface (only ‘mode’ difference) -> huge speed up, esp significant when working online (tableau, ad hoc) -> always on (+ cache, memory) v on demand -> why containers? Throughput, shared cluster Rest of presentation: More details about performance and behavior, then technical details
  4. TPCDS (decision support: ad-hoc, reporting, iterative OLAP, data mining) TPCDS, 10TB, Hive 2 (llap) v Hive 1 (containers) (availability: hive 2.0+ has llap) Benchmark on 10 nodes (There are new techniques to bring down the long running queries, not included (optimizer v runtime)) Things to note: -> red line is factor improvement (< 15x, average 5x), tremendous improvement across the board -> analytical queries (ad hoc, drill down, etc) very fast (< 5s, 10s) -> heavy lifting still takes time (other ways to handle) Average is 5x
  5. This time: Choose interactive (adhoc, iterative) queries, smaller data set. Avg is 25x, Blue hive 2 times hard to see on this graph. All hive 2 queries are low seconds again. -> Good for working w/ data set.
  6. HDP 2.5 ORC/ZLib, CDH 5.8 Parquet/Snappy
  7. Hive version = Hive 2.1 Presto version = Presto 0.163
  8. Additional details: Queries are 20 interactive queries taken from TPC-DS using 1 TB of data from TPC-DS schema. Queries run at random using a jmeter test harness Average moves linearly System tries to handle short queries first, w/o starving longer ones
  9. -> cache only speed up on hdfs: llap benefits always there, cache size is variable -> HDFS - this is speed up over: locality + buffer cache -> unrelated workload: query stays the same -> Metadata cache is immediately useful, even if cache is small -> Cache size doesn’t have to be as big as whole data set (pushdown proj + filter, hot data set) -> 2x speed up when all is cached
  10. Story is different when there is no locality
  11. - Text is a horrible format for analytical engines: verbose, low information density, row major, inefficient serialization of data - Typically: ETL step to transform text data to columnar format - LLAP: use it directly, first access will transform and cache the data Cache can be HD or SD (large cache on slower medium still preferable) Steady state: Similar performance to ORC
  12. Start left: No change to clients, HS2 still enpoint for ODBC, JDBC Number of coordinators, each one handles one query at a time (limits concurrency), run in yarn These coordinators break the query into “fragments (part of the plan + part of the data)” and schedule the work out to the daemons Coos let HS2 know the status of the query execution LLAP daemons consist of 2 parts IO subsystem: Cache + elevator, cache is shared Executors: Multi threaded query execution Storage can be either local (hdfs) or remote (s3, wasb, etc) -> Perf: Daemons + coos run permanently, no spin up cost Daemons + coos run fixed set of code: JIT gains Executors are multi threaded, more use of cluster Cache avoids reload of data Multi-threaded IO
  13. Left side: Fragment (part of plan + part of data) Operators need to work on data: -> setup will initialize operators + IO -> IO scan (table, range, filters, projections) -> pool of column vectors (consumer/producer) Pushdown: I/O has metadata cache (column index, min/max, bloom) Will jump over unnecessary data Cache: Read through + coherent Trade-off: Encoding (decoder + cpu v space), dictionary + runlength Off heap (GC), ssd/hd Eviction: lrfu (don’t swap entire table for 1 scan) Cache locality (multiple v wait) Text conversion! Performance: Threading: No more ping/pong, independent IO/proc Multiple fragments in same process JIT Push down Metadata cache: Don’t need to read headers/footers/index Primary data cache: Don’t touch disk
  14. Scheduling: Each llap daemon has queue of fragments Coordinators submit these fragments These fragments can be pre-empted even after they start running Might be running but waiting for input Short running query comes in to interrupt long running (to keep system responsive)
  15. - Think distributed result set or relational datanode - Granular security
  16. Quick way to setup on premise Pre-emption is required for guaranteed startup Pick queue, #nodes, #concurrent queries Done! Under the cover computes: Memory sizes, cache size, #executors, jvm parameters, etc etc All of that can be fine tuned (later or on advanced page) Will use slider to spin up llap, check RM UI and monitoring pages for more info
  17. Hortonworks data cloud Get it on the amazon marketplace, check hwx.com for details HDC allows for transient + permanent clusters, multiple use cases, single view on data EDW-Analytics based on LLAP