SlideShare a Scribd company logo
1 of 20
Dag Delete:
Improving Reserved Space
Utilization
2
CONCEPTS AND COMPONENTS
● Apache Tez Session Mode
• Runs multiple queries as part of a
single session
• One Application Master, multiple
DAGs
• Apache Hive and Hue primarily
use session mode
• Many transactions = many
DAGs
Fig. Multiple DAGs can form a session under the same AppMaster
DAG 1
Tez AM
DAG 2 DAG 3 DAG 4
3
CONCEPTS AND COMPONENTS (Contd.)
● What is Container Reuse?
• MapReduce has single use
containers
• Apache Tez avoids initialization
cost by reusing containers for
tasks across DAGs and vertices
Task
Attempt1
YARN Container
Fig. DAGs and tasks can share the the same container
Task
Attempt2
4
PROBLEM - THE LIFE OF SESSION INTERMEDIATE DATA
● DAG shuffle data lives beyond its usefulness.
● Bigger the shuffle, larger the disk footprint.
DAG 1 DAG 2 DAG 3 DAG 4
Fig. Multiple DAGs can form a session under the same AppMaster
Tez AM
Shuffle
Data 1
Shuffle
Data 2
Unused for rest of the session
5
PROBLEM - THE LIFE OF SESSION INTERMEDIATE DATA (Contd.)
● Session Bloating
• Shuffle data occupies significant chunk of the
disks
• May elbow out other apps on the same node
Fig. Session Bloating
DAG 3
Tez AM
Shuffle
Data
Shuffle
Data
DAG 4
Shuffle
Data
Shuffle
DataShuffle
Data
Shuffle
DataShuffle
Data
Shuffle
Data
Shuffle
Data
Shuffle
Data
A Solution:
DAG Delete
7
GETTING TO DAG DELETE
● Associate intermediate shuffle data to its DAG.
8
A SOLUTION : DAG DELETE
● Shuffle Handler that understands DAG deletion requests.
• Container Launchers are clients
• Tez Shuffle Handler is the server
• DAG deletion queries asynchronous HTTP requests sent
over multiple threads.
Tez AM
Container Launcher
Manager
Container
Launcher1
Container
Launcher2
Deletion
Tracker1
Deletion
Tracker2
Tez Shuffle Handler on NM
Tez Shuffle Handler on NM
Tez Shuffle Handler on NM
Tez Shuffle Handler on NM
DagComplete
LAUNCH LAUNCH
<NodeId, ShufflePort>
<NodeId, ShufflePort>
<NodeId, ShufflePort> <NodeId, ShufflePort>
Fig. Dag Deletion Architecture and Control Flow
DAG
I just finished!!
9
● Deletion Policy is pluggable
• Write your own
• Container Launchers may or may not want one
• Allow different “types” of shuffle data deletion services
• Not every type of container may know its shuffle port
• Implement use case specific optimization
A SOLUTION : DAG DELETE
Results
11
● Defining Waste
• More SHUFFLE_BYTES unused.
• More idle time for the same data.
● Defining the metric
DAG DELETE EVALUATION
12
• Percentage Reserved Space-Time
Savings : 54.4%
• Multiple DAGs with comparable sizes and
runtimes = more savings
THE BEST OF TIMES
0
1
2
3
4
5
6
7
8
0 500 1000 1500 2000
SHUFFLE_BYTES(GBs) Time (sec)
DAG Shuffle Bytes over Time
After Before
13
• Percentage Reserved Space-Time
Savings : 33.4%
• Multiple DAGs with nearly equal sizes
and runtimes = An entire DAG worth of
saving each time.
0
1
2
3
4
0 200 400 600 800
SHUFFLE_BYTES(GBs) Time (sec)
DAG Shuffle Bytes over Time
After Before
THE UNEXPECTED TIMES
14
• Percentage Reserved Space-Time
Savings : 26.05%
• 9 DAGs, most of them with no shuffle
data, interspersed by large shuffle DAGs
= Reasonable savings!
0
1
2
0 500 1000 1500
SHUFFLE_BYTES(GBs) Time (sec)
DAG Shuffle Bytes over Time
After Before
THE UNEXPECTED TIMES
15
• Percentage Reserved Space-Time Savings
: 0.00004%
• Multiple DAGs with almost no shuffle data
culminating with a shuffle intensive DAG.
0
1
2
3
0 50 100 150 200
SHUFFLE_BYTES(GBs) Time (sec)
DAG Shuffle Bytes over Time
After Before
THE WORST TIMES
16
• The Tez Shuffle Handler (TEZ-3334) shows phenomenal performance gain through composite fetch.
• The shuffle times drop by orders of magnitude.
• Performance improvements with actual job times dropping by several minutes
• More jobs per unit of time, more throughput
• The utilization of disk space is markedly better through Dag Delete.
• Shuffle heavy long running sessions show space-time savings of ~50%
• Makes Tez Shuffle Handler a better partner in a multi-tenant setting
CONCLUSION
17
● Vertex Deletion (TEZ-3363)
• Can we predict when vertex intermediate data is stale enough to be deleted?
• How will it impact the worst case we saw during DAG delete?
• Think delete data at a certain depth of a vertex’s completion
● Multi file (Unordered) (TEZ-3367)
• The Unordered case does not need multiple spill files
• How to limit multiple output files from taking up all the INodes?
• Removing buffers for key value reads
• Address skew scenario
FUTURE WORK
18
● Slow Start with MRInput (TEZ-3274)
• Vertices with shuffle input and MRInput do not respect slow start
● Empty partition improvements (TEZ-3605)
• Ordered case writes empty partitions to the IFiles
• In a heavily auto-reduced scenario, such partitions are fetched and then thrown away
FUTURE WORK
19
● The Apache Tez Community
● Rohini Palaniswamy, PMC/Committer, Apache Pig, Apache TEZ, Apache Oozie
● Jason Lowe , PMC/Committer, Apache Hadoop, Apache Tez
● Siddharth Seth, PMC/Committer, Apache Hadoop, Apache Tez, Apache Hive
● Hitesh Shah, PMC/Committer, Apache Hadoop, Apache Tez, Apache Ambari
ACKNOWLEDGEMENTS
Tez Shuffle Handler: Shuffling at Scale with Apache Hadoop

More Related Content

What's hot

Hive+Tez: A performance deep dive
Hive+Tez: A performance deep diveHive+Tez: A performance deep dive
Hive+Tez: A performance deep divet3rmin4t0r
 
Ozone and HDFS's Evolution
Ozone and HDFS's EvolutionOzone and HDFS's Evolution
Ozone and HDFS's EvolutionDataWorks Summit
 
Apache Hive Hook
Apache Hive HookApache Hive Hook
Apache Hive HookMinwoo Kim
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsAlluxio, Inc.
 
Hive Data Modeling and Query Optimization
Hive Data Modeling and Query OptimizationHive Data Modeling and Query Optimization
Hive Data Modeling and Query OptimizationEyad Garelnabi
 
Cost-Based Optimizer in Apache Spark 2.2
Cost-Based Optimizer in Apache Spark 2.2 Cost-Based Optimizer in Apache Spark 2.2
Cost-Based Optimizer in Apache Spark 2.2 Databricks
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiDatabricks
 
Spark shuffle introduction
Spark shuffle introductionSpark shuffle introduction
Spark shuffle introductioncolorant
 
Optimizing Hive Queries
Optimizing Hive QueriesOptimizing Hive Queries
Optimizing Hive QueriesOwen O'Malley
 
HDFS on Kubernetes—Lessons Learned with Kimoon Kim
HDFS on Kubernetes—Lessons Learned with Kimoon KimHDFS on Kubernetes—Lessons Learned with Kimoon Kim
HDFS on Kubernetes—Lessons Learned with Kimoon KimDatabricks
 
Hive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilHive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilDatabricks
 
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...HostedbyConfluent
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsDatabricks
 
SQL to Hive Cheat Sheet
SQL to Hive Cheat SheetSQL to Hive Cheat Sheet
SQL to Hive Cheat SheetHortonworks
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudNoritaka Sekiyama
 
Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...
Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...
Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...Aaron Shilo
 
Oracle Table Partitioning - Introduction
Oracle Table Partitioning  - IntroductionOracle Table Partitioning  - Introduction
Oracle Table Partitioning - IntroductionMyOnlineITCourses
 
Arquitetura de Memoria do PostgreSQL
Arquitetura de Memoria do PostgreSQLArquitetura de Memoria do PostgreSQL
Arquitetura de Memoria do PostgreSQLRaul Oliveira
 

What's hot (20)

Hive+Tez: A performance deep dive
Hive+Tez: A performance deep diveHive+Tez: A performance deep dive
Hive+Tez: A performance deep dive
 
The Impala Cookbook
The Impala CookbookThe Impala Cookbook
The Impala Cookbook
 
Ozone and HDFS's Evolution
Ozone and HDFS's EvolutionOzone and HDFS's Evolution
Ozone and HDFS's Evolution
 
Apache Hive Hook
Apache Hive HookApache Hive Hook
Apache Hive Hook
 
Hive tuning
Hive tuningHive tuning
Hive tuning
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
Hive Data Modeling and Query Optimization
Hive Data Modeling and Query OptimizationHive Data Modeling and Query Optimization
Hive Data Modeling and Query Optimization
 
Cost-Based Optimizer in Apache Spark 2.2
Cost-Based Optimizer in Apache Spark 2.2 Cost-Based Optimizer in Apache Spark 2.2
Cost-Based Optimizer in Apache Spark 2.2
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Spark shuffle introduction
Spark shuffle introductionSpark shuffle introduction
Spark shuffle introduction
 
Optimizing Hive Queries
Optimizing Hive QueriesOptimizing Hive Queries
Optimizing Hive Queries
 
HDFS on Kubernetes—Lessons Learned with Kimoon Kim
HDFS on Kubernetes—Lessons Learned with Kimoon KimHDFS on Kubernetes—Lessons Learned with Kimoon Kim
HDFS on Kubernetes—Lessons Learned with Kimoon Kim
 
Hive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilHive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas Patil
 
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL Joins
 
SQL to Hive Cheat Sheet
SQL to Hive Cheat SheetSQL to Hive Cheat Sheet
SQL to Hive Cheat Sheet
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
 
Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...
Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...
Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...
 
Oracle Table Partitioning - Introduction
Oracle Table Partitioning  - IntroductionOracle Table Partitioning  - Introduction
Oracle Table Partitioning - Introduction
 
Arquitetura de Memoria do PostgreSQL
Arquitetura de Memoria do PostgreSQLArquitetura de Memoria do PostgreSQL
Arquitetura de Memoria do PostgreSQL
 

Similar to Tez Shuffle Handler: Shuffling at Scale with Apache Hadoop

MapReduce:Simplified Data Processing on Large Cluster Presented by Areej Qas...
MapReduce:Simplified Data Processing on Large Cluster  Presented by Areej Qas...MapReduce:Simplified Data Processing on Large Cluster  Presented by Areej Qas...
MapReduce:Simplified Data Processing on Large Cluster Presented by Areej Qas...areej qasrawi
 
August 2013 HUG: Removing the NameNode's memory limitation
August 2013 HUG: Removing the NameNode's memory limitation August 2013 HUG: Removing the NameNode's memory limitation
August 2013 HUG: Removing the NameNode's memory limitation Yahoo Developer Network
 
Aioug vizag oracle12c_new_features
Aioug vizag oracle12c_new_featuresAioug vizag oracle12c_new_features
Aioug vizag oracle12c_new_featuresAiougVizagChapter
 
MapReduce presentation
MapReduce presentationMapReduce presentation
MapReduce presentationVu Thi Trang
 
Colvin exadata mistakes_ioug_2014
Colvin exadata mistakes_ioug_2014Colvin exadata mistakes_ioug_2014
Colvin exadata mistakes_ioug_2014marvin herrera
 
Hadoop performance optimization tips
Hadoop performance optimization tipsHadoop performance optimization tips
Hadoop performance optimization tipsSubhas Kumar Ghosh
 
In-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great TasteIn-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great TasteDataWorks Summit
 
Tech Talk #4 : Multi - threading and GCD ( grand central dispatch ) in iOS - ...
Tech Talk #4 : Multi - threading and GCD ( grand central dispatch ) in iOS - ...Tech Talk #4 : Multi - threading and GCD ( grand central dispatch ) in iOS - ...
Tech Talk #4 : Multi - threading and GCD ( grand central dispatch ) in iOS - ...Nexus FrontierTech
 
Scylla Summit 2022: Scylla 5.0 New Features, Part 2
Scylla Summit 2022: Scylla 5.0 New Features, Part 2Scylla Summit 2022: Scylla 5.0 New Features, Part 2
Scylla Summit 2022: Scylla 5.0 New Features, Part 2ScyllaDB
 
Dongwon Kim – A Comparative Performance Evaluation of Flink
Dongwon Kim – A Comparative Performance Evaluation of FlinkDongwon Kim – A Comparative Performance Evaluation of Flink
Dongwon Kim – A Comparative Performance Evaluation of FlinkFlink Forward
 
A Comparative Performance Evaluation of Apache Flink
A Comparative Performance Evaluation of Apache FlinkA Comparative Performance Evaluation of Apache Flink
A Comparative Performance Evaluation of Apache FlinkDongwon Kim
 
Spark Overview and Performance Issues
Spark Overview and Performance IssuesSpark Overview and Performance Issues
Spark Overview and Performance IssuesAntonios Katsarakis
 
Accelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cacheAccelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cacheDavid Grier
 
Accelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket CacheAccelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket CacheNicolas Poggi
 
GLOC 2014 NEOOUG - Oracle Database 12c New Features
GLOC 2014 NEOOUG - Oracle Database 12c New FeaturesGLOC 2014 NEOOUG - Oracle Database 12c New Features
GLOC 2014 NEOOUG - Oracle Database 12c New FeaturesBiju Thomas
 
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_SummaryHiram Fleitas León
 

Similar to Tez Shuffle Handler: Shuffling at Scale with Apache Hadoop (20)

MapReduce:Simplified Data Processing on Large Cluster Presented by Areej Qas...
MapReduce:Simplified Data Processing on Large Cluster  Presented by Areej Qas...MapReduce:Simplified Data Processing on Large Cluster  Presented by Areej Qas...
MapReduce:Simplified Data Processing on Large Cluster Presented by Areej Qas...
 
August 2013 HUG: Removing the NameNode's memory limitation
August 2013 HUG: Removing the NameNode's memory limitation August 2013 HUG: Removing the NameNode's memory limitation
August 2013 HUG: Removing the NameNode's memory limitation
 
Aioug vizag oracle12c_new_features
Aioug vizag oracle12c_new_featuresAioug vizag oracle12c_new_features
Aioug vizag oracle12c_new_features
 
MapReduce presentation
MapReduce presentationMapReduce presentation
MapReduce presentation
 
Colvin exadata mistakes_ioug_2014
Colvin exadata mistakes_ioug_2014Colvin exadata mistakes_ioug_2014
Colvin exadata mistakes_ioug_2014
 
Hadoop performance optimization tips
Hadoop performance optimization tipsHadoop performance optimization tips
Hadoop performance optimization tips
 
In-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great TasteIn-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great Taste
 
Tech Talk #4 : Multi - threading and GCD ( grand central dispatch ) in iOS - ...
Tech Talk #4 : Multi - threading and GCD ( grand central dispatch ) in iOS - ...Tech Talk #4 : Multi - threading and GCD ( grand central dispatch ) in iOS - ...
Tech Talk #4 : Multi - threading and GCD ( grand central dispatch ) in iOS - ...
 
Scylla Summit 2022: Scylla 5.0 New Features, Part 2
Scylla Summit 2022: Scylla 5.0 New Features, Part 2Scylla Summit 2022: Scylla 5.0 New Features, Part 2
Scylla Summit 2022: Scylla 5.0 New Features, Part 2
 
Storage Management
Storage ManagementStorage Management
Storage Management
 
Dongwon Kim – A Comparative Performance Evaluation of Flink
Dongwon Kim – A Comparative Performance Evaluation of FlinkDongwon Kim – A Comparative Performance Evaluation of Flink
Dongwon Kim – A Comparative Performance Evaluation of Flink
 
A Comparative Performance Evaluation of Apache Flink
A Comparative Performance Evaluation of Apache FlinkA Comparative Performance Evaluation of Apache Flink
A Comparative Performance Evaluation of Apache Flink
 
Drupal performance
Drupal performanceDrupal performance
Drupal performance
 
Spark Overview and Performance Issues
Spark Overview and Performance IssuesSpark Overview and Performance Issues
Spark Overview and Performance Issues
 
Accelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cacheAccelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cache
 
Accelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket CacheAccelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket Cache
 
GLOC 2014 NEOOUG - Oracle Database 12c New Features
GLOC 2014 NEOOUG - Oracle Database 12c New FeaturesGLOC 2014 NEOOUG - Oracle Database 12c New Features
GLOC 2014 NEOOUG - Oracle Database 12c New Features
 
Spark architechure.pptx
Spark architechure.pptxSpark architechure.pptx
Spark architechure.pptx
 
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
 
Map db
Map dbMap db
Map db
 

More from DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetEnjoy Anytime
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 

Recently uploaded (20)

Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 

Tez Shuffle Handler: Shuffling at Scale with Apache Hadoop

  • 1. Dag Delete: Improving Reserved Space Utilization
  • 2. 2 CONCEPTS AND COMPONENTS ● Apache Tez Session Mode • Runs multiple queries as part of a single session • One Application Master, multiple DAGs • Apache Hive and Hue primarily use session mode • Many transactions = many DAGs Fig. Multiple DAGs can form a session under the same AppMaster DAG 1 Tez AM DAG 2 DAG 3 DAG 4
  • 3. 3 CONCEPTS AND COMPONENTS (Contd.) ● What is Container Reuse? • MapReduce has single use containers • Apache Tez avoids initialization cost by reusing containers for tasks across DAGs and vertices Task Attempt1 YARN Container Fig. DAGs and tasks can share the the same container Task Attempt2
  • 4. 4 PROBLEM - THE LIFE OF SESSION INTERMEDIATE DATA ● DAG shuffle data lives beyond its usefulness. ● Bigger the shuffle, larger the disk footprint. DAG 1 DAG 2 DAG 3 DAG 4 Fig. Multiple DAGs can form a session under the same AppMaster Tez AM Shuffle Data 1 Shuffle Data 2 Unused for rest of the session
  • 5. 5 PROBLEM - THE LIFE OF SESSION INTERMEDIATE DATA (Contd.) ● Session Bloating • Shuffle data occupies significant chunk of the disks • May elbow out other apps on the same node Fig. Session Bloating DAG 3 Tez AM Shuffle Data Shuffle Data DAG 4 Shuffle Data Shuffle DataShuffle Data Shuffle DataShuffle Data Shuffle Data Shuffle Data Shuffle Data
  • 7. 7 GETTING TO DAG DELETE ● Associate intermediate shuffle data to its DAG.
  • 8. 8 A SOLUTION : DAG DELETE ● Shuffle Handler that understands DAG deletion requests. • Container Launchers are clients • Tez Shuffle Handler is the server • DAG deletion queries asynchronous HTTP requests sent over multiple threads. Tez AM Container Launcher Manager Container Launcher1 Container Launcher2 Deletion Tracker1 Deletion Tracker2 Tez Shuffle Handler on NM Tez Shuffle Handler on NM Tez Shuffle Handler on NM Tez Shuffle Handler on NM DagComplete LAUNCH LAUNCH <NodeId, ShufflePort> <NodeId, ShufflePort> <NodeId, ShufflePort> <NodeId, ShufflePort> Fig. Dag Deletion Architecture and Control Flow DAG I just finished!!
  • 9. 9 ● Deletion Policy is pluggable • Write your own • Container Launchers may or may not want one • Allow different “types” of shuffle data deletion services • Not every type of container may know its shuffle port • Implement use case specific optimization A SOLUTION : DAG DELETE
  • 11. 11 ● Defining Waste • More SHUFFLE_BYTES unused. • More idle time for the same data. ● Defining the metric DAG DELETE EVALUATION
  • 12. 12 • Percentage Reserved Space-Time Savings : 54.4% • Multiple DAGs with comparable sizes and runtimes = more savings THE BEST OF TIMES 0 1 2 3 4 5 6 7 8 0 500 1000 1500 2000 SHUFFLE_BYTES(GBs) Time (sec) DAG Shuffle Bytes over Time After Before
  • 13. 13 • Percentage Reserved Space-Time Savings : 33.4% • Multiple DAGs with nearly equal sizes and runtimes = An entire DAG worth of saving each time. 0 1 2 3 4 0 200 400 600 800 SHUFFLE_BYTES(GBs) Time (sec) DAG Shuffle Bytes over Time After Before THE UNEXPECTED TIMES
  • 14. 14 • Percentage Reserved Space-Time Savings : 26.05% • 9 DAGs, most of them with no shuffle data, interspersed by large shuffle DAGs = Reasonable savings! 0 1 2 0 500 1000 1500 SHUFFLE_BYTES(GBs) Time (sec) DAG Shuffle Bytes over Time After Before THE UNEXPECTED TIMES
  • 15. 15 • Percentage Reserved Space-Time Savings : 0.00004% • Multiple DAGs with almost no shuffle data culminating with a shuffle intensive DAG. 0 1 2 3 0 50 100 150 200 SHUFFLE_BYTES(GBs) Time (sec) DAG Shuffle Bytes over Time After Before THE WORST TIMES
  • 16. 16 • The Tez Shuffle Handler (TEZ-3334) shows phenomenal performance gain through composite fetch. • The shuffle times drop by orders of magnitude. • Performance improvements with actual job times dropping by several minutes • More jobs per unit of time, more throughput • The utilization of disk space is markedly better through Dag Delete. • Shuffle heavy long running sessions show space-time savings of ~50% • Makes Tez Shuffle Handler a better partner in a multi-tenant setting CONCLUSION
  • 17. 17 ● Vertex Deletion (TEZ-3363) • Can we predict when vertex intermediate data is stale enough to be deleted? • How will it impact the worst case we saw during DAG delete? • Think delete data at a certain depth of a vertex’s completion ● Multi file (Unordered) (TEZ-3367) • The Unordered case does not need multiple spill files • How to limit multiple output files from taking up all the INodes? • Removing buffers for key value reads • Address skew scenario FUTURE WORK
  • 18. 18 ● Slow Start with MRInput (TEZ-3274) • Vertices with shuffle input and MRInput do not respect slow start ● Empty partition improvements (TEZ-3605) • Ordered case writes empty partitions to the IFiles • In a heavily auto-reduced scenario, such partitions are fetched and then thrown away FUTURE WORK
  • 19. 19 ● The Apache Tez Community ● Rohini Palaniswamy, PMC/Committer, Apache Pig, Apache TEZ, Apache Oozie ● Jason Lowe , PMC/Committer, Apache Hadoop, Apache Tez ● Siddharth Seth, PMC/Committer, Apache Hadoop, Apache Tez, Apache Hive ● Hitesh Shah, PMC/Committer, Apache Hadoop, Apache Tez, Apache Ambari ACKNOWLEDGEMENTS