LLAP: long-lived execution in Hive
DataWorks Summit
Hadoop Summit 2015
1.
LLAP: long-lived execution in Hive
Sergey Shelukhin
© Hortonworks Inc. 2011 – 2015. All Rights Reserved
2.
LLAP: long-lived execution in Hive
• Stinger recap and even faster queries
• LLAP: overview
• Query fragment execution
• IO elevator and caching
• Performance
• Current status and future directions
• Query fragment API
3.
Hive performance recap
• Stinger: An Open Roadmap to improve Apache Hive’s performance 100x
  • Delivered in 100% Apache Open Source
• Stinger.Next: Enterprise SQL at Hadoop Scale
  • Launched in September 2014, phase 1 delivered in 2015
[Diagram: Hive 0.10 (batch processing) to Hive 0.14 (human interactive, 5 seconds): 100-150x query speedup via the vectorized SQL engine, Tez execution engine, ORC columnar format, and cost-based optimizer]
4.
The road ahead to sub-second queries
• Startup costs are now a key bottleneck
  • Example: JVM takes 100s of ms to start up
• Vectorized code can benefit from JIT optimization
  • JIT optimizer needs (run)time to do its work
• Improved operator performance shifts focus on IO
  • Reading data is serialized with data processing
  • Reading from HDFS is relatively expensive
• Large machines provide opportunities for data sharing
  • Both between parallel computation (sharing) and serial (caching)
5.
LLAP: overview
6.
What is LLAP?
• Hybrid execution with daemons in Hive
  • Eliminates startup costs for tasks
  • Allows the JIT optimizer to have time to optimize
  • Multi-threaded execution of vectorized operator pipelines
  • Also allows sharing of metadata, map join tables, etc.
• Asynchronous IO elevator and caching
  • Reduces IO cost and parallelizes IO and processing
  • Can be spindle-aware; other IO optimizations
• Query fragment API
[Diagram: a node running an LLAP process with a cache and query fragments, reading from HDFS]
7.
What LLAP isn't
• Not a Hive execution engine (like Tez, MR, Spark…)
  • Execution engines provide coordination and scheduling
  • Some work (e.g. large shuffles) can still be scheduled in containers
• Not a storage layer
  • Daemons are stateless and read (and cache) data from HDFS
• Does not supersede existing Hive
  • Container-based execution still fully supported
8.
Example execution: MR vs Tez vs Tez+LLAP
[Diagram: three execution models side by side. Map-Reduce: intermediate results in HDFS. Tez: optimized pipeline. Tez with LLAP: resident process on nodes with an in-memory columnar cache. Map tasks read HDFS.]
9.
LLAP in your cluster
• LLAP daemons run on existing YARN
  • Apache Slider is used for provisioning and recovery
  • Easy to bring up, tear down, and share clusters
• Resource management via YARN delegation model (WIP)
  • LLAP and containers dynamically balance resource usage (WIP)
10.
Benefits unrelated to performance (WIP)
• Concurrent query execution and priority enforcement
• Access control, including column-level security
• ACID improvements
• Can be used externally via the API
  • Will be usable e.g. by Spark, Pig, Cascading, …
11.
Query fragment API
12.
Query Fragment API - overview
• Hadoop RPC, protobuf are used to send fragments
• Fragments are "physical algebra": operators, metadata, input sources and output channels
• Results are returned asynchronously via output channels
• Hive will produce fragments for LLAP as part of physical optimization
• Other applications can compile their own physical algebra
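To make the "physical algebra" description concrete, here is a minimal sketch of what a fragment could look like as a data structure. The slides do not show the actual protobuf schema, so every class and field name below (Operator, QueryFragment, params, and so on) is a hypothetical stand-in, not the real LLAP API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Operator:
    """One node of the physical plan (hypothetical shape)."""
    kind: str                                   # e.g. "Scan", "Filter", "Group By"
    params: Dict[str, str] = field(default_factory=dict)
    children: List["Operator"] = field(default_factory=list)


@dataclass
class QueryFragment:
    """The four parts the slide lists: operators, metadata, inputs, outputs."""
    root: Operator
    metadata: Dict[str, str]
    input_sources: List[str]
    output_channels: List[str]


def operator_kinds(op: Operator) -> List[str]:
    """Walk the operator tree top-down and list the operator kinds."""
    kinds = [op.kind]
    for child in op.children:
        kinds.extend(operator_kinds(child))
    return kinds


# Illustrative fragment: scan -> filter -> group by (table and column names are made up).
scan = Operator("Scan", {"table": "store_sales"})
flt = Operator("Filter", {"predicate": "ss_quantity > 10"}, [scan])
gby = Operator("Group By", {"keys": "ss_item_sk"}, [flt])
fragment = QueryFragment(
    root=gby,
    metadata={"query_id": "q1"},
    input_sources=["hdfs:///warehouse/store_sales"],
    output_channels=["channel-0"],
)
```

A client other than Hive would build a tree like this from its own planner and serialize it for submission; how that serialization looks is not specified in the slides.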
13.
Query Fragment API – algebra
• Operators: Scan, Filter, Group By, Hash/Merge join, etc.
• Operators may include statistics for local optimization
• Expressions: comparison, arithmetic, Hive built-in functions
• All Hive datatypes
  • Complex types like map/list/etc. – WIP
14.
Query Fragment API – client API
• Encapsulates creation, submission of query fragments
• Also helps with IO from LLAP
  • Getting vectorized record readers, batches, etc.
  • Working with output channels (cancellation, availability of records, failure)
15.
Query execution
16.
LLAP: Query Execution
• Overview of Query Execution
• Scheduling
• Coordination via Tez
• What Fragments run in LLAP vs Containers
• Future work
17.
Tez + LLAP – overview
• Hive on Tez already proven to perform well
• Tez being enhanced to allow it to coordinate work to external systems (TEZ-2003)
  • Pluggable Scheduling
  • Pluggable communication – custom execution specifications, protocols
• DAG coordination remains unchanged
• Hive Operators / Tez Runtime components used for Processing and data transfer
18.
Deciding on where query components run
• Fragments can run in LLAP, regular containers, AM (as threads)
• Decision made by the Hive Client
• Configurable – all in LLAP, none in LLAP, intelligent mix
• Criteria for running in LLAP (in auto mode)
  • No user code (or only blessed user code)
  • Data source – HDFS
  • ORC and vectorized execution (for now)
    • Others can still run in LLAP in "all" mode, w/o IO elevator and cache
  • Data size limitations (avoid heavy / long running processing within LLAP)
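The placement criteria above can be read as a simple predicate. The sketch below is one way to write it down, assuming a planner that knows a few facts about each fragment; the names (LlapMode, Fragment, runs_in_llap, max_input_bytes) and the exact form of the size limit are illustrative assumptions, not Hive's actual configuration or code.

```python
from dataclasses import dataclass
from enum import Enum


class LlapMode(Enum):
    NONE = "none"   # nothing runs in LLAP
    AUTO = "auto"   # intelligent mix, using the criteria from the slide
    ALL = "all"     # everything runs in LLAP


@dataclass
class Fragment:
    # Hypothetical facts a planner might know about a fragment.
    has_unblessed_user_code: bool
    data_source: str            # e.g. "hdfs"
    file_format: str            # e.g. "orc"
    vectorized: bool
    estimated_input_bytes: int


def runs_in_llap(fragment: Fragment, mode: LlapMode, max_input_bytes: int) -> bool:
    """Decide whether a fragment goes to an LLAP daemon or a regular container."""
    if mode is LlapMode.NONE:
        return False
    if mode is LlapMode.ALL:
        return True
    # Auto mode: every criterion from the slide has to hold.
    return (
        not fragment.has_unblessed_user_code
        and fragment.data_source == "hdfs"
        and fragment.file_format == "orc"
        and fragment.vectorized
        and fragment.estimated_input_bytes <= max_input_bytes
    )


small_scan = Fragment(False, "hdfs", "orc", True, estimated_input_bytes=64 << 20)
udf_scan = Fragment(True, "hdfs", "orc", True, estimated_input_bytes=64 << 20)
limit = 1 << 30  # assumed 1 GiB cap, purely for illustration
```

With these inputs, small_scan qualifies in auto mode while udf_scan is sent to a container; in "all" mode both run in LLAP.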
19.
So…
[Diagram: the map (M) and reduce (R) tasks of a query running on Tez]
20.
So…
[Diagram: the query's tasks on Tez (M, R) next to Tez with LLAP in "auto" mode (T, R), with the AM]
21.
So…
[Diagram: the query's tasks on Tez (M, R), on Tez with LLAP in "auto" mode, and on Tez with LLAP in "all" mode (T, R), each LLAP variant with its AM]
22.
Scheduling for LLAP in Tez AM
• Greedy scheduling per query – assumes entire cluster available
• Schedule work to preferred location (HDFS locality)
• Multiple independent queries set the same preferred location if accessing the same data (improves cache locality)
• LLAP Daemons schedule fragments independently – across multiple queries
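The slide does not say how independent queries end up with the same preferred location. One way to get that property, sketched below as an assumption rather than as the Tez AM's actual logic, is to pick deterministically among the HDFS replica hosts by hashing the identity of the split. The function name and parameters are hypothetical.

```python
import hashlib
from typing import Sequence


def preferred_daemon(file_path: str, offset: int, replica_hosts: Sequence[str]) -> str:
    """Pick one replica host for a split, the same one for every query."""
    # Sort so the answer does not depend on the order HDFS reports replicas in.
    hosts = sorted(replica_hosts)
    digest = hashlib.sha256(f"{file_path}:{offset}".encode("utf-8")).digest()
    return hosts[int.from_bytes(digest[:8], "big") % len(hosts)]


# Two queries that see the replicas in a different order still agree.
first = preferred_daemon("/warehouse/item/part-0.orc", 0, ["node3", "node1", "node2"])
second = preferred_daemon("/warehouse/item/part-0.orc", 0, ["node1", "node2", "node3"])
```

Because the choice depends only on the split and the replica set, repeated reads of the same data land on the daemon that is most likely to have it cached.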
23.
Queuing fragments
• LLAP daemon has a number of executors (think containers)
• Wait queue with pluggable priority
  • Geared towards low latency queries (default)
  • Models estimated work left in query
  • Sequencing within a query handled via topological order
  • Fragment start time factors into scheduling decision
[Diagram: an LLAP daemon with four executors running Q1 Reducer 2, Q1 Map 1, Q1 Map 1 and Q3 Map 19, and an LLAP queue holding Q1 Reducer 2, Q1 Map 1, Q3 Map 19 and Q1 Reducer 2]
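The wait queue described above can be sketched as a heap with a pluggable priority function. This is an illustration under stated assumptions: the default priority below (least estimated work left in the query first, earlier start time as tie-breaker) is one reading of the slide, and all names (QueuedFragment, WaitQueue, low_latency_priority) are hypothetical.

```python
import heapq
import itertools
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class QueuedFragment:
    name: str
    query_work_left: int   # assumed estimate of work remaining in the whole query
    start_time: float      # assumed submission timestamp


def low_latency_priority(f: QueuedFragment) -> Tuple[int, float]:
    # Smaller tuples run first: queries close to finishing, then older fragments.
    return (f.query_work_left, f.start_time)


class WaitQueue:
    def __init__(self, num_executors: int,
                 priority: Callable[[QueuedFragment], tuple] = low_latency_priority):
        self._free = num_executors
        self._priority = priority
        self._heap: List[tuple] = []
        self._tie = itertools.count()  # keeps heap entries comparable

    def submit(self, fragment: QueuedFragment) -> bool:
        """Return True if the fragment starts now, False if it has to wait."""
        if self._free > 0:
            self._free -= 1
            return True
        heapq.heappush(self._heap, (self._priority(fragment), next(self._tie), fragment))
        return False

    def executor_finished(self) -> Optional[QueuedFragment]:
        """Hand the freed executor to the best waiting fragment, if any."""
        if self._heap:
            return heapq.heappop(self._heap)[2]
        self._free += 1
        return None


q = WaitQueue(num_executors=1)
started = q.submit(QueuedFragment("Q1 Map 1", query_work_left=50, start_time=0.0))
q.submit(QueuedFragment("Q3 Map 19", query_work_left=500, start_time=1.0))
q.submit(QueuedFragment("Q1 Reducer 2", query_work_left=50, start_time=2.0))
next_up = q.executor_finished()
```

Here "Q1 Reducer 2" overtakes "Q3 Map 19" even though it arrived later, because its query has less work left. Swapping in a different priority function changes the policy without touching the queue.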
24.
LLAP Scheduling – pipelining and preemption
• A fragment can run when inputs are not yet available (for pipelining)
• A fragment is "finishable" if all the source data is ready
[Diagram: an LLAP queue and executors holding the fragments "Interactive query map 1/3", "Interactive query map 2/3", "Interactive query map 3/3" and "Wide query reduce"; callout: "Well, 10 mappers out of 100 are done!"]
25.
LLAP Scheduling – pipelining and preemption
• A fragment can run when inputs are not yet available (for pipelining)
• A fragment is "finishable" if all the source data is ready
  • If the data is not ready, may never free the executor
• Non-finishable fragments can be preempted
  • Improves throughput, prevents deadlocks
[Diagram: an LLAP queue and executors holding the fragments "Interactive query map 1/3", "Interactive query map 2/3", "Interactive query map 3/3" and "Wide query reduce"]
26.
LLAP Scheduling – pipelining and preemption
• A fragment can run when inputs are not yet available (for pipelining)
• A fragment is "finishable" if all the source data is ready
  • If the data is not ready, may never free the executor
• Non-finishable fragments can be preempted
  • Improves throughput, prevents deadlocks
[Diagram: an LLAP queue and executors holding only "Interactive query map 1/3", "Interactive query map 2/3" and "Interactive query map 3/3"; the wide query reduce is no longer shown]
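The "finishable" rule and the preemption it enables can be sketched as follows. The slides define finishable as "all the source data is ready" and say non-finishable fragments can be preempted; the victim selection rule here (evict the fragment with the smallest share of its inputs done, and only for a finishable newcomer) is an assumption made for illustration, as are all names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RunningFragment:
    name: str
    sources_total: int   # upstream tasks this fragment reads from
    sources_done: int    # how many of them have completed

    @property
    def finishable(self) -> bool:
        # Finishable means every source is ready, so the fragment cannot block forever.
        return self.sources_done == self.sources_total


def pick_victim(running: List[RunningFragment],
                incoming: RunningFragment) -> Optional[RunningFragment]:
    """Choose a running fragment to preempt for `incoming`, or None."""
    if not incoming.finishable:
        return None  # assumption: only finishable work may displace others
    candidates = [f for f in running if not f.finishable]
    if not candidates:
        return None
    # Assumption: evict whoever is furthest from having all inputs ready.
    return min(candidates, key=lambda f: f.sources_done / f.sources_total)


executors = [
    RunningFragment("Interactive query map 1/3", 1, 1),
    RunningFragment("Interactive query map 2/3", 1, 1),
    RunningFragment("Wide query reduce", sources_total=100, sources_done=10),
]
waiting = RunningFragment("Interactive query map 3/3", 1, 1)
victim = pick_victim(executors, waiting)
```

In this example the wide query reduce, with 10 of 100 mappers done, is preempted so the third interactive map can run, which matches the sequence of diagrams on these slides.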
27.
IO elevator and other internals
28.
LLAP: IO elevator and other internals
• Asynchronous IO and decompression
• Off-heap data caching
• File metadata caching
• Map join table sharing
• Better JIT usage thanks to persistent daemon
29.
Asynchronous IO
• Currently, Hive IO and input decoding is interleaved with processing
  • Remote HDFS reads are expensive
  • Even local disk might be
  • Data decompression and decoding is expensive
30.
Asynchronous IO
• With IO elevator, reading, decoding and processing are parallel
• IO threads can be spindle aware (WIP)
• Depending on workload, IO and processing threads can balance resource usage (throttle IO, etc.) (WIP)
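The idea of running reading, decoding and processing in parallel can be illustrated with a producer/consumer pair. This is a generic sketch, not LLAP's IO elevator: zlib stands in for the ORC decoding step, the in-memory chunk list stands in for HDFS reads, and the bounded queue is an assumed way to throttle IO when processing falls behind.

```python
import queue
import threading
import zlib
from typing import Iterable

_DONE = object()  # sentinel that marks the end of the stream


def io_thread(compressed_chunks: Iterable[bytes], out: "queue.Queue") -> None:
    """Read and decompress chunks ahead of the consumer."""
    for chunk in compressed_chunks:
        out.put(zlib.decompress(chunk))  # blocks when the buffer is full
    out.put(_DONE)


def process(compressed_chunks: Iterable[bytes], max_buffered: int = 4) -> int:
    """Consume decoded chunks while the IO thread keeps reading; returns bytes seen."""
    buffered: "queue.Queue" = queue.Queue(maxsize=max_buffered)
    reader = threading.Thread(target=io_thread, args=(compressed_chunks, buffered), daemon=True)
    reader.start()
    total = 0
    while True:
        data = buffered.get()
        if data is _DONE:
            break
        total += len(data)  # stand-in for running the operator pipeline
    reader.join()
    return total


chunks = [zlib.compress(b"x" * 1000) for _ in range(10)]
bytes_processed = process(chunks)
```

The queue size is the throttle: a small buffer keeps memory use low and makes the reader wait for the processor, a larger one lets IO run further ahead.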
31.
Caching and off-heap data
• Decompressed data is cached off-heap
  • Simplifies memory management, mitigates some GC problems
  • Saves HDFS and decompression costs, esp. on dimension tables
  • In future, processing cache data directly possible to avoid copies
• Replacement policy is pluggable
  • Currently, simple local policies are used e.g. FIFO, LRFU
  • Other policies possible (e.g. workflow-adaptable, or lazily coordinated for better cache affinity)
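A pluggable replacement policy can be sketched as a small interface with two implementations, FIFO and LRFU, the two policies the slide names. This is not LLAP's cache code: the class names and the decay parameter are assumptions, and the LRFU variant follows the textbook formulation in which each access adds 1 to a score that decays by a factor of 2^(-decay * elapsed time).

```python
from collections import OrderedDict
from typing import Dict, Hashable, Optional, Protocol, Tuple


class EvictionPolicy(Protocol):
    def on_access(self, key: Hashable, now: float) -> None: ...
    def evict(self, now: float) -> Hashable: ...


class FifoPolicy:
    def __init__(self) -> None:
        self._order: "OrderedDict[Hashable, None]" = OrderedDict()

    def on_access(self, key: Hashable, now: float) -> None:
        self._order.setdefault(key, None)  # position is fixed at first insert

    def evict(self, now: float) -> Hashable:
        key, _ = self._order.popitem(last=False)  # oldest insert goes first
        return key


class LrfuPolicy:
    def __init__(self, decay: float = 0.1) -> None:
        self._decay = decay
        self._state: Dict[Hashable, Tuple[float, float]] = {}  # key -> (score, last access)

    def _decayed(self, score: float, last: float, now: float) -> float:
        return score * 2.0 ** (-self._decay * (now - last))

    def on_access(self, key: Hashable, now: float) -> None:
        score, last = self._state.get(key, (0.0, now))
        self._state[key] = (1.0 + self._decayed(score, last, now), now)

    def evict(self, now: float) -> Hashable:
        victim = min(self._state, key=lambda k: self._decayed(*self._state[k], now))
        del self._state[victim]
        return victim


class Cache:
    def __init__(self, capacity: int, policy: EvictionPolicy) -> None:
        self._capacity, self._policy = capacity, policy
        self._data: Dict[Hashable, object] = {}

    def get(self, key: Hashable, now: float) -> Optional[object]:
        if key not in self._data:
            return None
        self._policy.on_access(key, now)
        return self._data[key]

    def put(self, key: Hashable, value: object, now: float) -> None:
        if key not in self._data and len(self._data) >= self._capacity:
            del self._data[self._policy.evict(now)]
        self._data[key] = value
        self._policy.on_access(key, now)
```

With a capacity of two, inserting "a" and "b", reading "a" twice and then inserting "c" evicts "a" under FIFO but "b" under LRFU, since LRFU rewards the repeated reads. A decay near zero makes LRFU behave like LFU, and a large decay makes it behave like LRU.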
32.
Cache size vs operator memory requirement
• Cache space takes away from operator space
  • Sort buffers, hash join tables, GBY buffers take space
• Tradeoff between HDFS reads and operator speed
  • Depends on workflow, dataset size, etc.
• New vectorization changes in Hive will speed up operators and allow for larger cache
33.
Other benefits
• File metadata and indexes are cached
  • Much faster PPD (predicate pushdown) application for selective queries – no HDFS reads
  • Same replacement policy as the data cache (but higher priority)
• Map join hash tables and fragment plans are shared
  • Multiple tasks do not all generate the table or deserialize the plans
• Better use of JIT optimizer
  • Because the daemons are persistent, JIT has more time to kick in
  • Especially good with vectorization!
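The benefit of sharing a map join hash table inside one long-lived process can be sketched as a build-once cache that several tasks consult. Everything here is an illustrative assumption: `SharedTableCache`, `get_or_build`, the table identifier and the sample rows are hypothetical names and data, not LLAP code.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SharedTableCache:
    """Builds each hash table at most once and shares it between tasks."""

    def __init__(self):
        self._lock = threading.Lock()
        self._tables = {}
        self.builds = 0  # counts how often a table was actually built

    def get_or_build(self, table_id, build):
        with self._lock:
            if table_id not in self._tables:
                self._tables[table_id] = build()
                self.builds += 1
            return self._tables[table_id]


def build_dimension_table():
    # Stands in for loading a small dimension table into a join hash table.
    return {1: "books", 2: "music", 3: "games"}


cache = SharedTableCache()
fact_rows = [(10, 1), (11, 3), (12, 2), (13, 1)]  # (sale_id, category_id)


def map_task(rows):
    # Every task asks for the table; only the first request builds it.
    table = cache.get_or_build("dim_category", build_dimension_table)
    return [(sale_id, table[category_id]) for sale_id, category_id in rows]


with ThreadPoolExecutor(max_workers=2) as pool:
    parts = list(pool.map(map_task, [fact_rows[0::2], fact_rows[1::2]]))

joined = sorted(row for part in parts for row in part)
print(cache.builds, joined)
```

With separate short-lived containers, each task would pay the build cost itself; in a persistent daemon the second task reuses the table that the first one built.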
34.
Performance
35.
Setup
• 13 physical machines (12 cores, 40 GB RAM each)
  • Note – smaller cluster than previous Tez perf runs
• TPC-DS 200, interactive queries
• Both setups: ORC, vectorized, Hadoop 2.8, queries via HS2 w/JMeter
• TEZ: Hive 1.2 + Tez 0.8 (snapshot)
  • Pre-warm and container reuse enabled
• LLAP: Branch in pre-alpha stage + Tez 0.8 (snapshot)
  • Bias towards executors – small cache
  • Otherwise no tuning
36.
Summary
• NOTE – early stage; these are pre-alpha-release perf results
• Still, interactive queries are already 1.5-4 times faster
• First query result after launching CLI is significantly improved
  • In real life, LLAP daemons would also already be warm
• Parallel queries are already better
• Lots of work still ahead – epic locks in Kryo, Log4j, HDFS, HiveServer2; better object sharing, better priority enforcement
• Should be much faster in short order
37.
Query execution time
[Bar chart: execution time in seconds (axis 0 to 35) for query55, query42, query52, query3, query12, query27, query26, query7, query19, query96, query43, query15, query82 and query13, comparing Hive (1.2.0) with Hive (LLAP)]
38.
Parallel query execution
• 8 users, 4 parallel executors on HS
• Tez: 50% of serial time; LLAP alpha: 41% of serial time
[Bar chart: total execution time in seconds for 13 queries (axis 0 to 300), serial vs. parallel, comparing Hive (1.2.0) with Hive (LLAP)]
39.
Current status and future directions
40.
Current status
• Putting the finishing touches on the CTP (alpha release)
  • Watch the Hortonworks blog and the Apache Hive mailing lists for details!
• The basic features are functional
• Currently only on Tez; IO only on vectorized and ORC
  • AKA the fastest Hive setup possible
• Lots of performance improvements not yet realized
• Lots of advanced features are WIP or planned
41.
Work in progress
• Further performance improvement
• Concurrent query execution improvements
• Better vectorized operators (join, group by, …)
• Defining the API
42.
Future work
• Security, including column level security
• Tighter integration with YARN, e.g. resource delegation
• Guaranteed capacities for better SLA guarantees, maybe with a central scheduler
• Dynamic daemon sizing with off-heap storage
• ACID support
• Better (maybe centrally coordinated) locality and caching
• Temp tables, intermediate query results in LLAP
• Interleaving of fragment execution
  • Past processing is not lost (as opposed to preemption)
  • A rogue / badly scheduled query will not hog the system
43.
Questions?
Interested? Stop by the Hortonworks booth to learn more