Hive Now Sparks
DataWorks Summit
Rui Li (Intel, Hive Committer), Xuefu Zhang (Cloudera, Hive PMC)
1.
© Cloudera, Inc. All rights reserved.
Hive Now Sparks
Rui Li, Intel, Hive Committer
Xuefu Zhang, Cloudera, Hive PMC
2.
Agenda
• Background
• Architecture and Design
• Challenges
• Current Status
• Benchmarks
• Q&A
3.
Background
• Apache Hive: popular SQL-based data processing tool in Hadoop with a large user base
• Apache Spark: generalized distributed processing framework to succeed MapReduce
• Marrying the two benefits users in both communities
• HIVE-7292, the most watched (160+)
4.
Motivation
• Succeed MapReduce as the generalized batch processing engine
• Introduce faster batch processing in Hive
• Greater adoption for both projects (Apache Hive and Apache Spark)
5.
Community Involvement
• Effort from both communities (Hive and Spark)
• Contributions from many organizations
• Freelance contributors, early adopters
6.
Design Principles
• No or limited impact on Hive's existing code path
• Maximal code reuse
• Minimum feature customization
• Low future maintenance cost
7.
Hive Internal
[Diagram. Components: Hive Client (Beeline, JDBC), HiveServer2, MetaStore, Parser, Semantic Analyzer, Task Compiler, Hadoop Client, Hadoop. Data passed between stages: HQL, AST, Operator Tree, Tasks, Result.]
8.
Class Hierarchy
• TaskCompiler: MapRedCompiler, TezCompiler
• Task: MapRedTask, TezTask
• Work: MapRedWork, TezWork
[Diagram labels: TaskCompiler generates Task; Task is described by Work]
9.
Class Hierarchy
• TaskCompiler: MapRedCompiler, TezCompiler, SparkCompiler
• Task: MapRedTask, TezTask, SparkTask
• Work: MapRedWork, TezWork, SparkWork
[Diagram labels: TaskCompiler generates Task; Task is described by Work]
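The compiler/task/work relationship on this slide can be sketched in a few lines. This is only an illustration in Python: Hive's real classes are Java and carry far more state, and every field and method body below (`name`, `has_reduce`, `stages`, `compile`) is an assumption made for the sketch. Only the class names come from the slide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Work:
    """Metadata that describes what a task should run."""
    name: str


@dataclass
class MapRedWork(Work):
    has_reduce: bool = False  # hypothetical flag: one map stage plus a possible reduce


@dataclass
class SparkWork(Work):
    stages: List[str] = field(default_factory=list)  # simplified: stage names only


@dataclass
class Task:
    """A unit of execution, described by a Work object."""
    work: Work


class MapRedTask(Task):
    pass


class SparkTask(Task):
    pass


class TaskCompiler:
    """Base compiler: turns an operator tree into a Task."""

    def compile(self, operator_tree: List[str]) -> Task:
        raise NotImplementedError


class MapRedCompiler(TaskCompiler):
    def compile(self, operator_tree: List[str]) -> Task:
        work = MapRedWork("mr-work", has_reduce=len(operator_tree) > 1)
        return MapRedTask(work)


class SparkCompiler(TaskCompiler):
    def compile(self, operator_tree: List[str]) -> Task:
        return SparkTask(SparkWork("spark-work", stages=list(operator_tree)))


# Each compiler generates its own task type, described by its own work type.
task = SparkCompiler().compile(["MapWork1", "ReduceWork1", "ReduceWork2"])
```

The point of the layout is that adding an engine means adding one subclass at each of the three levels, leaving the existing MapReduce and Tez paths untouched.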
10.
Work – Metadata for Task
• MapRedWork contains one MapWork and a possible ReduceWork
• SparkWork contains a graph of MapWorks and ReduceWorks
• Ex. Query: select name, sum(value) as v from dec group by name order by v;
[Diagram: the query as two MR jobs (MR job 1: MapWork1, ReduceWork1; MR job 2: MapWork2, ReduceWork2) versus one Spark job (MapWork1, ReduceWork1, ReduceWork2)]
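To make the difference concrete, the sketch below builds both shapes for a plan with a given number of reduce stages. The example query needs two (one for group by, one for order by). The function names and the tuple representation are assumptions for illustration and are not Hive's data structures.

```python
from typing import List, Optional, Tuple


def as_mr_jobs(reduce_stages: int) -> List[Tuple[str, Optional[str]]]:
    """Each MR job holds one MapWork and at most one ReduceWork."""
    if reduce_stages == 0:
        return [("MapWork1", None)]
    return [(f"MapWork{i}", f"ReduceWork{i}") for i in range(1, reduce_stages + 1)]


def as_spark_work(reduce_stages: int) -> List[Tuple[str, str]]:
    """A single SparkWork: edges of a chain MapWork1 -> ReduceWork1 -> ..."""
    nodes = ["MapWork1"] + [f"ReduceWork{i}" for i in range(1, reduce_stages + 1)]
    return list(zip(nodes, nodes[1:]))


mr_jobs = as_mr_jobs(2)       # two jobs, the second with its own MapWork2
spark_edges = as_spark_work(2)  # one graph, no extra map stage between reduces
```

Under MapReduce the second job needs an extra map stage (MapWork2) to read the first job's output, whereas the Spark graph chains the two reduce stages directly.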
11.
Spark Client
• Sits alongside the MR client and Tez client
• Talks to the Spark cluster
• Supports local, local-cluster, standalone, yarn-cluster, yarn-client
• Job submission, monitoring, error reporting, statistics, metrics, counters
12.
Spark Context
• Core of Spark client
• Heavyweight, thread-unsafe
• Designed for a single-user application
• Doesn't work in a multi-session environment
• Doesn't scale with user sessions
13.
Remote Spark Context (RSC)
• Created and lives outside HiveServer2
• In yarn-cluster mode, Spark context lives in application master (AM)
• Otherwise, Spark context lives in a separate process (other than HS2)
[Diagram: User 1 and User 2 with Session 1 and Session 2 in HiveServer2; two AM (RSC) instances in a YARN cluster of Node 1, Node 2, and Node 3]
14.
Data Processing via MapReduce
• Table as HDFS files is read by MR framework
• Map-side processing
• Map output is shuffled by MR framework
• Reduce-side processing
• Reduce output is written to disk as part of reduce-side processing
• Output may be further processed by next MR job or returned to client
15.
Data Processing via Spark
• Treat Table as HadoopRDD (input RDD)
• Apply the function that wraps MR's map-side processing
• Shuffle map output using Spark's transformations (groupByKey, sortByKey, etc.)
• Apply the function that wraps MR's reduce-side processing
• Output is either written to file or shuffled again
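The steps above can be emulated without a cluster. The sketch below is plain Python, not the Spark API: `group_by_key` and `sort_by_key` merely stand in for the shuffles that Spark's groupByKey and sortByKey perform, and the sample rows are made up. It runs the deck's example query (select name, sum(value) as v from dec group by name order by v).

```python
from collections import defaultdict

# Hypothetical (name, value) rows standing in for table `dec`.
rows = [("a", 3), ("b", 1), ("a", 4), ("c", 2)]


def map_side(row):
    """Map-side processing: emit (key, value) pairs keyed by name."""
    name, value = row
    return name, value


def group_by_key(pairs):
    """Stand-in for a groupByKey shuffle: collect values per key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return list(groups.items())


def sort_by_key(pairs):
    """Stand-in for a sortByKey shuffle: order pairs by key."""
    return sorted(pairs, key=lambda kv: kv[0])


def run_query(table):
    mapped = [map_side(r) for r in table]
    # First reduce side: sum(value) per name.
    summed = [(name, sum(values)) for name, values in group_by_key(mapped)]
    # Output is shuffled again: re-key by v so the sort implements `order by v`.
    ordered = sort_by_key([(v, name) for name, v in summed])
    # Second reduce side: restore the (name, v) column order.
    return [(name, v) for v, name in ordered]


result = run_query(rows)
```

The map-side and reduce-side functions are the parts Hive reuses unchanged. Only the two shuffle steps are supplied by the execution engine.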
16.
Spark Plan
• MapInput – encapsulate a table
• MapTran – map-side processing
• ShuffleTran – shuffling
• ReduceTran – reduce-side processing
• Query: select name, sum(value) as v from dec group by name order by v;
[Diagram nodes: MapInput, MapTran, ShuffleTran, ReduceTran, ShuffleTran, ReduceTran]
17.
Advantages
• Reuse existing map-side and reduce-side processing
• Agnostic to Spark's special transformations or actions
• No need to reinvent wheels
• Adopt existing features: authorization, window functions, UDFs, etc.
• Open to future features
18.
Challenges
• Missing features or functional gaps in Spark
– Concurrent data pipelines
– Spark Context issues
– Scala vs Java API
– Scheduling issues
• Large code base in Hive, many contributors working in different areas
• Library dependency conflicts among projects
19.
Dynamic Executor Scaling
• Spark cluster per user session
• Heavy user vs light user
• Big query vs small query
• Solution: scale executors up and down based on load
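A toy version of "executors up and down based on load" is shown below. It is not Spark's allocation algorithm. The policy (one executor per `cores_per_executor` pending tasks, clamped to a range) and all parameter names and defaults are assumptions chosen for illustration.

```python
import math


def target_executors(pending_tasks: int,
                     cores_per_executor: int,
                     min_executors: int = 0,
                     max_executors: int = 64) -> int:
    """Return how many executors a session should hold for its current load."""
    needed = math.ceil(pending_tasks / cores_per_executor)
    # Clamp so an idle session shrinks to the floor and a heavy one hits the cap.
    return max(min_executors, min(max_executors, needed))


idle = target_executors(0, 4)         # light user: shrinks to the minimum
small = target_executors(10, 4)       # small query: a few executors
heavy = target_executors(1000, 4)     # big query: capped at max_executors
```

With a policy of this shape, a light user's session releases resources that a heavy user's session can then claim, which addresses the per-session cluster problem listed on the slide.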
20.
Current Status
• All functionality is implemented
• First round of optimization is completed
• More optimization and benchmarking
• Beta release in CDH in Feb. 2015
• Released in Apache Hive 1.1
• Included in CDH5.4 (coming soon) with support for selected customers
• Follow HIVE-7292 for current and future work
21.
Optimizations
• Map join (bucket map join, SMB, skew join (static and dynamic))
• Split generation and grouping
• CBO, vectorization
• More to come, including table caching, dynamic partition pruning
22.
Summary
• Community driven project
• Multi-Organization support
• Combining merits from multiple projects
• Benefiting a large user base
• Bearing solid foundations
• A fast-moving, evolving project
23.
BenchMarks – Cluster Setup
• 8 physical nodes
• Each has 32 logical cores and 64GB memory
• 10Gb/s network between the nodes

Component   Version
Hive        Spark-branch
Spark       1.3.0
Hadoop      2.6.0
Tez         0.5.3
24.
BenchMarks – Test Configurations
• 320GB and 4TB TPC-DS datasets
• The three engines share most configurations
– Each node is allocated 32 cores and 48GB memory
– Vectorization enabled
– CBO enabled
– hive.auto.convert.join.noconditionaltask.size = 600MB
25.
BenchMarks – Test Configurations
Hive on Spark
• spark.master = yarn-client
• spark.executor.memory = 5120m
• spark.yarn.executor.memoryOverhead = 1024
• spark.executor.cores = 4
• spark.kryo.referenceTracking = false
• spark.io.compression.codec = lzf
Hive on Tez
• hive.prewarm.numcontainers = 250
• hive.tez.auto.reducer.parallelism = true
• hive.tez.dynamic.partition.pruning = true
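As a rough check on how these Spark settings fit the cluster, the arithmetic below estimates how many executors fit per node. The input numbers come from the slides (8 nodes, 32 cores and 48GB allocated per node, 5120m executor memory, 1024 MB overhead, 4 cores per executor). The calculation itself is a simplification that ignores YARN's container rounding and the application master's own container, so treat the result as an upper bound.

```python
NODES = 8
NODE_MEMORY_MB = 48 * 1024      # memory allocated per node
NODE_CORES = 32                 # cores allocated per node
EXECUTOR_MEMORY_MB = 5120       # spark.executor.memory
EXECUTOR_OVERHEAD_MB = 1024     # spark.yarn.executor.memoryOverhead
EXECUTOR_CORES = 4              # spark.executor.cores


def executors_per_node(node_memory_mb: int, node_cores: int,
                       executor_memory_mb: int, overhead_mb: int,
                       executor_cores: int) -> int:
    """Executors that fit on one node, limited by memory or by cores."""
    by_memory = node_memory_mb // (executor_memory_mb + overhead_mb)
    by_cores = node_cores // executor_cores
    return min(by_memory, by_cores)


per_node = executors_per_node(NODE_MEMORY_MB, NODE_CORES, EXECUTOR_MEMORY_MB,
                              EXECUTOR_OVERHEAD_MB, EXECUTOR_CORES)
cluster_total = per_node * NODES
```

Under these assumptions the memory limit (49152 MB / 6144 MB) and the core limit (32 / 4) both come out at 8 executors per node, or 64 across the cluster.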
26.
BenchMarks – Data Collecting
• We run each query 2 times and measure the 2nd run
• Spark on YARN waits for a number of executors to register before scheduling tasks, and thus has a bigger start-up overhead
• We also measure a few queries for Tez with dynamic partition pruning disabled for fair comparison, as this optimization hasn't been implemented in Hive on Spark yet
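The "run twice, measure the second run" procedure can be sketched as a small timing helper. The helper name and the `run_query` callable are hypothetical. The deck does not say what tooling was used to collect timings.

```python
import time
from typing import Callable


def measure_second_run(run_query: Callable[[], object], runs: int = 2) -> float:
    """Run the query `runs` times and return the elapsed seconds of the last run.

    Earlier runs act as warm-up, absorbing one-off costs such as
    executor start-up.
    """
    elapsed = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        elapsed = time.perf_counter() - start
    return elapsed


# Usage with a stand-in workload; a real harness would submit a query instead.
calls = []
seconds = measure_second_run(lambda: calls.append(sum(range(1000))))
```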
27.
MR vs Spark vs Tez, 320GB
28.
MR vs Spark vs Tez, 320GB
DPP helps Tez
29.
Benchmarks - Dynamic Partition Pruning
• Prune partitions at runtime
• Can dramatically improve performance when tables are joined on partitioned columns
• To be implemented for Hive on Spark
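The idea can be illustrated with a small in-memory join. This is a conceptual sketch and not how Hive or Tez implement the optimization: the partition layout, the table contents, and all function names are made up. The filtered dimension side is evaluated first, and only the fact-table partitions whose key survives the filter are scanned.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple


def scan(partitions: Dict[int, List[str]],
         wanted: Optional[Set[int]] = None) -> Tuple[List[Tuple[int, str]], int]:
    """Read rows partition by partition, skipping partitions not in `wanted`."""
    rows, partitions_read = [], 0
    for key, part_rows in partitions.items():
        if wanted is not None and key not in wanted:
            continue  # pruned at runtime: this partition is never read
        partitions_read += 1
        rows.extend((key, row) for row in part_rows)
    return rows, partitions_read


def join_with_pruning(fact_partitions: Dict[int, List[str]],
                      dim_rows: List[Tuple[int, int]],
                      dim_filter: Callable[[int], bool]):
    """Join fact and dimension rows on the fact table's partition key."""
    # Evaluate the dimension side first to learn which join keys survive.
    dim = {key: attr for key, attr in dim_rows if dim_filter(attr)}
    fact_rows, partitions_read = scan(fact_partitions, wanted=set(dim))
    joined = [(key, row, dim[key]) for key, row in fact_rows]
    return joined, partitions_read


# Hypothetical data: sales partitioned by date key, dates carrying a year.
sales = {1: ["s1", "s2"], 2: ["s3"], 3: ["s4"]}
dates = [(1, 2014), (2, 2015), (3, 2015)]
joined, partitions_read = join_with_pruning(sales, dates, lambda year: year == 2015)
```

Without pruning, all three partitions would be scanned and the 2014 rows discarded at join time. With pruning, only two partitions are read.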
30.
Spark vs Tez vs Tez w/o DPP, 320GB
31.
MR vs Spark vs Tez, 4TB
32.
Spark vs Tez, 4TB
33.
Spark vs Tez, 4TB
DPP helps Tez
34.
Spark vs Tez vs Tez w/o DPP, 4TB
35.
Spark vs Tez, 4TB
Spark is faster
36.
Spark vs Tez, 4TB
Tez is faster
37.
BenchMark - Summary
• Spark is as fast as or faster than Tez on many queries (Q28, Q82, Q88, etc.)
• Dynamic partition pruning helps Tez a lot on certain queries (Q3, Q15, Q19). Without DPP for Tez, Spark is close or even faster.
• Spark is slower than Tez on certain queries (common join, Q84). Investigation and improvement are under way.
• A bigger dataset seems to help Spark more.
• Spark will likely be faster after DPP is implemented.
38.
Q&A