Apache Tez: Accelerating Hadoop Query Processing
Arun C. Murthy (@acmurthy), Founder & Architect, Hortonworks
Bikas Saha (@bikassaha), Hortonworks
(@hortonworks)
© Hortonworks Inc. 2013
Hello!
• Founder/Architect at Hortonworks Inc.
–Lead: MapReduce/YARN/Tez
–Formerly Architect, Hadoop MapReduce, Yahoo
–Responsible for running Hadoop MapReduce as a service for all of Yahoo (~50k nodes footprint)
• Apache Hadoop, ASF
–Former VP, Apache Hadoop, ASF (Chair of Apache Hadoop PMC)
–Long-term Committer/PMC member (full time for 7 years)
–Release Manager for hadoop-2.x
Once upon a time …
… long, long ago, there was a kingdom we shall call
Apache Hadoop
http://2.bp.blogspot.com/-hIp99urgxCk/UAsSFo4i8YI/AAAAAAAAAFg/IzjNDwrBBVg/s1600/magickingdo
Hadoop begat …
… a two-headed monster on every node in the kingdom;
each belonged to a different clan and answered to a
different master
http://4.bp.blogspot.com/_C7CsfdqySYc/TNSKvIwiFcI/AAAAAAAAAbs/2FSU2TV_rRA/s1600/Two-Headed+Monster+-+With+Identifiers+-+Jan+19,+2009_0.jpg
Knights of Bytes - HDFS
… stored data uncompromisingly in directories/files, nary a
care about contents
http://whoiscraigmoser.com/Images/identity/knight.png
Prince of Processing - MapReduce
He ruled with an iron fist by mapping,
and then by mercilessly reducing data
http://media.comicvine.com/uploads/14/144886/2868181-sauron.jpg
Peace Reigned
… for a while with the odd change in the direction of the wind
http://www.get-covers.com/wp-content/uploads/2012/07/Peace.jpg
Slowly, but surely …
Human beings define reality through misery and suffering.
- Agent Smith
http://api.ning.com/files/*oWmhl7LBlXuodD2itWUUtOautEVfD*pbBn57L8ThCyYIykiTuzkO4lJY1bwaNbJF7GecTDwsVj3EFHpDM-F1y-UW4b3Xsvh/matrix_revolutions_agent_smith_04.bmp
Slowly, but surely …
… people of the kingdom clamored for more.
A palpable sense of greed & expectation.
http://sidoxia.files.wordpress.com/2011/11/wall-st-greed-st1.jpg
Signs of Distress
SQL said some, others said Machine Learning,
still others said Real-Time Event Processing
http://www.truth-seeker.info/wp-content/uploads/2012/11/distress.jpg
A Meeting at the Summit
MapReduce is dead!
Err… not quite.
We need more options! We need more!
True…
http://4.bp.blogspot.com/-oqr1t6avx6g/TW55kUnmQvI/AAAAAAAAMMk/q9Jc87MSG4g/s400/arab%2Bleague%2Bround%2Btable%2B%2Bbig%2Bgood%2B2011.bmp
A Meeting at the Summit
A common thread, YARN, running through all applications…
Long live the King!
http://whipup.net/wp-content/images/2008/08/yarn.gif
The Edict
Henceforth, in the Kingdom of King YARN…
MapReduce has been relegated to the status
of, merely, one of the applications!
http://www.napavintners.org/images/winery_Labels/EdictWines-800HW.jpg
Reign of King YARN
King YARN came to the throne with promises to return power to all applications equally,
lower performance taxes and resource management…
http://images.fineartamerica.com/images-medium-large/the-coronation-the-crown-that-queen-everett.jpg
Oh the Shame!
Well, at least, Prince MapReduce still had powerful allies like
Highness Hive, Powerful Pig, Cheery Cascading…
http://www.gibbsmagazine.com/MPj03414090000%5B1%5D.jpg
Things get worse before better
Unfortunately, things got a lot worse for Prince MapReduce…
http://www.deviantart.com/download/144412184/Smile__Tomorrow_will_be_worse__by_daGrevis.jpg
Knight Tez
He did MapReduce, and so much more…
Smartly aligned himself to Kingdom YARN.
http://twomorrows.com/alterego/media/08shiningknight.gif
Knight Tez
Long-term alliances of MapReduce with Hive, Pig, Cascading etc. broke up…
http://www.officialpsds.com/images/thumbs/broken-glass-psd44132.png
… they decided to throw in their lot with Knight Tez!
http://informatica.upg-ploiesti.ro/62689/img/partners.jpg
Happily ever after…
(nothing cute to say)
On a more serious note…
Every season has a flavor…
SQL-on-Hadoop is the new black!
SQL-on-Hadoop will be solved within
the existing ecosystem
Looking ahead
What will it be next year?
Real-time event processing?
Machine Learning?
Play to our strengths
Invest in the Apache Hadoop platform
and the ecosystem (Hive et al).
Seriously…
Technical Details
Tez – Introduction
• Distributed execution framework targeted towards data-processing applications.
• Based on expressing a computation as a dataflow graph.
• Built on top of YARN, the resource management framework for Hadoop.
• Open-source Apache Incubator project, Apache licensed.
Tez – Design Themes
• Empowering End Users
• Execution Performance
Tez – Empowering End Users
• Expressive dataflow definition APIs
• Flexible Input-Processor-Output runtime model
• Data type agnostic
• Simplifying deployment
Tez – Empowering End Users
• Expressive dataflow definition APIs
–Enable definition of complex dataflow pipelines using simple graph connection APIs. Tez expands the logical plan at runtime.
–Targeted towards data processing applications like Hive/Pig but not limited to them. Hive/Pig query plans naturally map to Tez dataflow graphs with no translation impedance.
[Figure: dataflow graph of tasks TaskA-1, TaskA-2, TaskB-1, TaskB-2, TaskC-1, TaskC-2, TaskD-1, TaskD-2, TaskE-1, TaskE-2, grouped into Preprocessor, Partition and Aggregate stages]
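The idea of defining a pipeline with simple graph-connection calls, and expanding the logical plan into tasks at runtime, can be sketched in a few lines of Python. This is a toy model, not the Tez API: the `Dag` class, its method names, the stage wiring and the parallelism of 2 are all assumptions made for illustration.

```python
from collections import defaultdict, deque


class Dag:
    """Toy dataflow graph (hypothetical, not the Tez API)."""

    def __init__(self):
        self.vertices = {}              # vertex name -> number of parallel tasks
        self.edges = defaultdict(list)  # producer name -> list of consumer names

    def add_vertex(self, name, parallelism):
        self.vertices[name] = parallelism
        return self

    def add_edge(self, producer, consumer):
        if producer not in self.vertices or consumer not in self.vertices:
            raise KeyError("add both vertices before connecting them")
        self.edges[producer].append(consumer)
        return self

    def execution_order(self):
        """Kahn's algorithm: a vertex is ready once all its producers are done."""
        indegree = {v: 0 for v in self.vertices}
        for consumers in self.edges.values():
            for c in consumers:
                indegree[c] += 1
        ready = deque(v for v in self.vertices if indegree[v] == 0)
        order = []
        while ready:
            v = ready.popleft()
            order.append(v)
            for c in self.edges[v]:
                indegree[c] -= 1
                if indegree[c] == 0:
                    ready.append(c)
        if len(order) != len(self.vertices):
            raise ValueError("graph has a cycle")
        return order

    def expand(self):
        """Expand the logical plan into named tasks per vertex."""
        return {v: [f"{v}-{i}" for i in range(1, n + 1)]
                for v, n in self.vertices.items()}


# Assumed wiring of the three stage names that appear on the slide.
dag = (Dag()
       .add_vertex("Preprocessor", 2)
       .add_vertex("Partition", 2)
       .add_vertex("Aggregate", 2)
       .add_edge("Preprocessor", "Partition")
       .add_edge("Partition", "Aggregate"))
```

Calling `dag.execution_order()` yields the stages in dependency order, and `dag.expand()` turns each logical vertex into its tasks.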
Tez – Empowering End Users
• Expressive dataflow definition APIs
[Figure: Distributed Sort. Labels: Sampler, Samples, Ranges; three groups of tasks, each with Task-1 and Task-2]
Tez – Empowering End Users
• Flexible Input-Processor-Output runtime model
–Construct physical runtime executors dynamically by connecting
different inputs, processors and outputs.
–End goal is to have a library of inputs, outputs and processors that
can be programmatically composed to generate useful operators.
[Figure: three example executors composed from inputs, processors and outputs]
IntermediateReduce: ShuffleInput, ReduceProcessor, FileSortedOutput
FinalReduce: ShuffleInput, ReduceProcessor, HDFSOutput
PairwiseJoin: Input1, Input2, JoinProcessor, FileSortedOutput
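A minimal Python sketch of the Input-Processor-Output idea: an executor is built by plugging together an input, a processor and an output. The function names below only echo the labels on the slide. They are hypothetical stand-ins, not Tez classes, and the sum-per-key reduce logic is an assumption chosen to keep the example small.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def shuffle_input(partitions):
    """Input: merge already-sorted partitions into one sorted (key, value) stream."""
    return heapq.merge(*partitions)


def reduce_processor(records):
    """Processor: sum the values of each key (expects input sorted by key)."""
    for key, group in groupby(records, key=itemgetter(0)):
        yield key, sum(value for _, value in group)


def sorted_list_output(records):
    """Output: materialise records sorted by key (stand-in for a sorted file)."""
    return sorted(records)


def compose(input_fn, processor_fn, output_fn):
    """Build an executor by connecting one input, one processor and one output."""
    def run(source):
        return output_fn(processor_fn(input_fn(source)))
    return run


intermediate_reduce = compose(shuffle_input, reduce_processor, sorted_list_output)

result = intermediate_reduce([
    [("a", 1), ("c", 3)],   # sorted partition from one upstream task
    [("a", 2), ("b", 5)],   # sorted partition from another upstream task
])
```

Swapping `sorted_list_output` for a different output function would give a different executor without touching the input or processor, which is the point of the composition.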
Tez – Empowering End Users
• Data type agnostic
–Tez is only concerned with the movement of data: files and streams of bytes.
–Does not impose any data format on the user application. An MR application can use key-value pairs on top of Tez. Hive and Pig can use tuple-oriented formats that are natural and native to them.
[Figure: a Tez Task moves Bytes in and out (File, Stream); User Code interprets them as Key Value pairs or Tuples]
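The separation between byte movement and data format can be illustrated with a small Python sketch. The framework-side function only passes bytes through; each piece of user code chooses its own encoding. All names and both record formats here are assumptions for illustration and do not come from Tez.

```python
import struct


def run_task(raw: bytes, user_code) -> bytes:
    """Framework side: hands bytes to user code and returns bytes. No format knowledge."""
    return user_code(raw)


def kv_user_code(raw: bytes) -> bytes:
    """MR-style user code: one 'key<TAB>value' record per line; upper-cases values."""
    out = []
    for line in raw.decode("utf-8").splitlines():
        key, value = line.split("\t", 1)
        out.append(f"{key}\t{value.upper()}")
    return "\n".join(out).encode("utf-8")


def tuple_user_code(raw: bytes) -> bytes:
    """Tuple-style user code: fixed-width (int32, int32) tuples; keeps even first fields."""
    kept = [t for t in struct.iter_unpack("<ii", raw) if t[0] % 2 == 0]
    return b"".join(struct.pack("<ii", *t) for t in kept)


kv_out = run_task(b"k1\tv1\nk2\tv2", kv_user_code)
tuple_out = run_task(struct.pack("<ii", 1, 10) + struct.pack("<ii", 2, 20),
                     tuple_user_code)
```

The same `run_task` serves both applications because it never inspects the payload.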
Tez – Empowering End Users
• Simplifying deployment
–Tez is a completely client-side application.
–No deployments to do. Simply upload the Tez libraries to any accessible FileSystem and change the local Tez configuration to point to that location.
–Enables running different versions concurrently. Easy to test new functionality while keeping stable versions for production.
–Leverages YARN local resources and distributed cache.
[Figure: two Client Machines, each running a TezClient; two Node Managers, each running a TezTask; HDFS holding Tez Lib 1 and Tez Lib 2]
Tez – Empowering End Users
• Expressive dataflow definition APIs
• Flexible Input-Processor-Output runtime model
• Data type agnostic
• Simplifying usage
With great power APIs come great responsibilities
Tez – Execution Performance
• Performance gains over MapReduce
• Plan reconfiguration at runtime
• Optimal resource management
• Dynamic physical data flow decisions
Tez – Execution Performance
• Performance gains over MapReduce
–Eliminate replicated write barrier between successive
computations.
–Eliminate job launch overhead of workflow jobs.
–Eliminate extra stage of map reads in every workflow job.
–Eliminate queue and resource contention suffered by workflow
jobs that are started after a predecessor job completes.
[Figure: Pig/Hive - MR compared with Pig/Hive - Tez]
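The overheads listed above can be made concrete with a toy count in Python. It assumes a query that compiles to a chain of `num_jobs` MapReduce jobs and compares it with the same work submitted as one DAG. The function names and the cost model are illustrative assumptions that simply restate the bullets; they are not measurements.

```python
def mr_workflow_cost(num_jobs: int) -> dict:
    """Chain of MR jobs: every job is launched separately, and every job except
    the last writes replicated output that the next job's maps must read back."""
    if num_jobs < 1:
        raise ValueError("need at least one job")
    return {
        "job_launches": num_jobs,
        "replicated_intermediate_writes": num_jobs - 1,
        "extra_map_read_stages": num_jobs - 1,
    }


def single_dag_cost() -> dict:
    """Same work as one DAG: one launch, no replicated hand-off between stages."""
    return {
        "job_launches": 1,
        "replicated_intermediate_writes": 0,
        "extra_map_read_stages": 0,
    }


# Hypothetical query that compiles to a chain of four MR jobs.
as_mr = mr_workflow_cost(4)
as_dag = single_dag_cost()
```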
Tez – Execution Performance
• Plan reconfiguration at runtime
–Dynamic runtime concurrency control based on data size, user operator resources, available cluster resources and locality.
–Advanced changes in dataflow graph structure.
–Progressive graph construction in concert with user optimizer.
[Figure: Stage 1 runs 50 maps producing 100 partitions, informed by HDFS Blocks and YARN Resources. Stage 2 is planned with 100 reducers; at runtime, with only 10 GB of data, it is cut to 10 reducers.]
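The reducer adjustment in the figure can be sketched as a small sizing rule in Python. The function name and the 1 GiB-per-reducer target are assumptions chosen so that the numbers match the slide (100 planned reducers, 10 GB of data, 10 reducers); the slides do not state the actual heuristic.

```python
GIB = 1024 ** 3


def choose_reducers(planned: int, observed_bytes: int, bytes_per_reducer: int) -> int:
    """Pick the reducer count at runtime from the observed size of the upstream
    output, never exceeding the planned count and never going below one."""
    needed = (observed_bytes + bytes_per_reducer - 1) // bytes_per_reducer  # ceiling
    return max(1, min(planned, needed))


# Planned for 100 reducers, but the map stage produced only 10 GiB.
reducers = choose_reducers(planned=100, observed_bytes=10 * GIB, bytes_per_reducer=GIB)
```

With 100 upstream partitions and 10 reducers, each reducer would read 10 partitions instead of one.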
Tez – Execution Performance
• Optimal resource management
–Reuse YARN containers to launch new tasks.
–Reuse YARN containers to enable shared objects across tasks.
[Figure: the Tez Application Master, in its own YARN Container, sends Start Task to a TezTask Host in another YARN Container, receives Task Done, then sends Start Task again. The host runs TezTask1 and then TezTask2, which share SharedObjects.]
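A toy Python model of container reuse: one long-lived container runs several tasks in sequence and keeps a shared-object registry alive between them, so an expensive object is built once. The `Container` class and the lookup-table example are hypothetical and only illustrate the idea, not how Tez implements it.

```python
class Container:
    """Toy stand-in for a reused YARN container hosting several tasks."""

    def __init__(self, container_id: str):
        self.container_id = container_id
        self.shared_objects = {}   # survives across tasks in this container
        self.tasks_run = []

    def run(self, task_name: str, task_fn):
        self.tasks_run.append(task_name)
        return task_fn(self.shared_objects)


def get_lookup_table(shared: dict) -> dict:
    """Build the (pretend-expensive) table only on first use in a container."""
    if "lookup" not in shared:
        shared["lookup"] = {"a": 1, "b": 2}
        shared["loads"] = shared.get("loads", 0) + 1
    return shared["lookup"]


def task1(shared):
    return get_lookup_table(shared)["a"]


def task2(shared):
    return get_lookup_table(shared)["b"]


container = Container("container_01")
first = container.run("TezTask1", task1)    # builds the table
second = container.run("TezTask2", task2)   # reuses it
```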
Tez – Execution Performance
• Dynamic physical data flow decisions
–Decide the type of physical byte movement and storage on the fly.
–Store intermediate data on distributed store, local store or in-memory.
–Transfer bytes via blocking files or streaming and the spectrum in between.
[Figure: decided at runtime. A Producer with small output hands data to the Consumer In-Memory; otherwise the Producer writes a Local File that the Consumer reads.]
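The runtime choice between in-memory and local-file hand-off can be sketched in Python as a size check made when the producer finishes. The function names and the byte threshold are assumptions for illustration; the slides do not say how Tez makes the decision.

```python
import os
import tempfile


def write_intermediate(data: bytes, memory_limit: int):
    """Producer side: keep small outputs in memory, spill larger ones to a local file."""
    if len(data) <= memory_limit:
        return ("in-memory", data)
    fd, path = tempfile.mkstemp(prefix="intermediate-")
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return ("local-file", path)


def read_intermediate(handle) -> bytes:
    """Consumer side: read the data the same way regardless of where it was stored."""
    kind, ref = handle
    if kind == "in-memory":
        return ref
    with open(ref, "rb") as f:
        data = f.read()
    os.remove(ref)  # the spill file is no longer needed once consumed
    return data


small = write_intermediate(b"abc", memory_limit=10)
large = write_intermediate(b"x" * 100, memory_limit=10)
```

The consumer code stays the same in both cases; only the handle returned by the producer differs.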
Tez – Current status
• Apache Incubator Project
–Rapid development. Over 270 JIRAs opened. Over 170 resolved.
–Growing community.
• Focus on stability
–Testing and quality are highest priority.
–Code ready and deployed on multi-node clusters.
• DAG of MR processing is working
–Already functionally equivalent to MapReduce. Existing MapReduce jobs can be executed on Tez with few or no changes.
–Working Hive prototype that can target Tez for execution of queries.
–Work started on prototype of Pig that can target Tez.
Tez – Current status
[Figure: typical pattern in a TPC-DS query. A Fact Table goes through three Joins with Dimension Tables 1, 2 and 3, producing Result Tables 1, 2 and 3. A second panel, labeled "Optimization for small data sets", shows the Fact Table together with the dimension tables.]
Both can now run as a single Tez job
Tez – Looking ahead
• Early adopters and contributors welcome
–Adopters to drive more scenarios. Contributors to make them happen.
• Stay tuned for Tez meetups with deep dives on Tez architecture and using Tez
• Useful links
–Work tracking: https://issues.apache.org/jira/browse/TEZ
–Code: https://github.com/apache/incubator-tez
–High level design document and API specification: https://issues.apache.org/jira/browse/TEZ-65
–Developer list: dev@tez.incubator.apache.org
–User list: user@tez.incubator.apache.org
–Issues list: issues@tez.incubator.apache.org
Tez – Takeaways
• Distributed execution framework that works on computations represented as dataflow graphs
• Naturally maps to execution plans produced by query optimizers
• Execution architecture designed to enable dynamic performance optimizations at runtime
• Open-source Apache project: your use cases and code are welcome
• It works and is already being used by Hive
Tez
Thanks for your time and attention!
Questions?

More Related Content

What's hot

Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query ProcessingApache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query ProcessingHortonworks
 
Apache Tez - Accelerating Hadoop Data Processing
Apache Tez - Accelerating Hadoop Data ProcessingApache Tez - Accelerating Hadoop Data Processing
Apache Tez - Accelerating Hadoop Data Processinghitesh1892
 
Transactional operations in Apache Hive: present and future
Transactional operations in Apache Hive: present and futureTransactional operations in Apache Hive: present and future
Transactional operations in Apache Hive: present and futureDataWorks Summit
 
Performance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaPerformance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaCloudera, Inc.
 
Hudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesHudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesNishith Agarwal
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsDatabricks
 
YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez Hortonworks
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiDatabricks
 
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...HostedbyConfluent
 
Securing Hadoop with Apache Ranger
Securing Hadoop with Apache RangerSecuring Hadoop with Apache Ranger
Securing Hadoop with Apache RangerDataWorks Summit
 
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeSimplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeDatabricks
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDatabricks
 
LLAP: long-lived execution in Hive
LLAP: long-lived execution in HiveLLAP: long-lived execution in Hive
LLAP: long-lived execution in HiveDataWorks Summit
 
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in SparkSpark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in SparkBo Yang
 
Sqoop on Spark for Data Ingestion
Sqoop on Spark for Data IngestionSqoop on Spark for Data Ingestion
Sqoop on Spark for Data IngestionDataWorks Summit
 
HDFS on Kubernetes—Lessons Learned with Kimoon Kim
HDFS on Kubernetes—Lessons Learned with Kimoon KimHDFS on Kubernetes—Lessons Learned with Kimoon Kim
HDFS on Kubernetes—Lessons Learned with Kimoon KimDatabricks
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudNoritaka Sekiyama
 

What's hot (20)

Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query ProcessingApache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing
 
Apache Tez - Accelerating Hadoop Data Processing
Apache Tez - Accelerating Hadoop Data ProcessingApache Tez - Accelerating Hadoop Data Processing
Apache Tez - Accelerating Hadoop Data Processing
 
Transactional operations in Apache Hive: present and future
Transactional operations in Apache Hive: present and futureTransactional operations in Apache Hive: present and future
Transactional operations in Apache Hive: present and future
 
Performance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaPerformance Optimizations in Apache Impala
Performance Optimizations in Apache Impala
 
Hudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesHudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilities
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL Joins
 
Hive: Loading Data
Hive: Loading DataHive: Loading Data
Hive: Loading Data
 
Achieving 100k Queries per Hour on Hive on Tez
Achieving 100k Queries per Hour on Hive on TezAchieving 100k Queries per Hour on Hive on Tez
Achieving 100k Queries per Hour on Hive on Tez
 
YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
 
Securing Hadoop with Apache Ranger
Securing Hadoop with Apache RangerSecuring Hadoop with Apache Ranger
Securing Hadoop with Apache Ranger
 
Scaling HBase for Big Data
Scaling HBase for Big DataScaling HBase for Big Data
Scaling HBase for Big Data
 
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeSimplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache Spark
 
LLAP: long-lived execution in Hive
LLAP: long-lived execution in HiveLLAP: long-lived execution in Hive
LLAP: long-lived execution in Hive
 
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in SparkSpark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
 
Sqoop on Spark for Data Ingestion
Sqoop on Spark for Data IngestionSqoop on Spark for Data Ingestion
Sqoop on Spark for Data Ingestion
 
HDFS on Kubernetes—Lessons Learned with Kimoon Kim
HDFS on Kubernetes—Lessons Learned with Kimoon KimHDFS on Kubernetes—Lessons Learned with Kimoon Kim
HDFS on Kubernetes—Lessons Learned with Kimoon Kim
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
 

Similar to Apache Tez: Accelerating Hadoop Query Processing

February 2014 HUG : Tez Details and Insides
February 2014 HUG : Tez Details and InsidesFebruary 2014 HUG : Tez Details and Insides
February 2014 HUG : Tez Details and InsidesYahoo Developer Network
 
Tez big datacamp-la-bikas_saha
Tez big datacamp-la-bikas_sahaTez big datacamp-la-bikas_saha
Tez big datacamp-la-bikas_sahaData Con LA
 
Apache Tez -- A modern processing engine
Apache Tez -- A modern processing engineApache Tez -- A modern processing engine
Apache Tez -- A modern processing enginebigdatagurus_meetup
 
Apache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingApache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingBikas Saha
 
Apache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingApache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingTeddy Choi
 
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.02013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0Adam Muise
 
Apache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingApache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingDataWorks Summit
 
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopApache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopHortonworks
 
Tez: Accelerating Data Pipelines - fifthel
Tez: Accelerating Data Pipelines - fifthelTez: Accelerating Data Pipelines - fifthel
Tez: Accelerating Data Pipelines - fifthelt3rmin4t0r
 
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...Data Con LA
 
Apache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data ApplicationsApache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data ApplicationsHortonworks
 
Apache Tez – Present and Future
Apache Tez – Present and FutureApache Tez – Present and Future
Apache Tez – Present and FutureJianfeng Zhang
 
Apache Tez – Present and Future
Apache Tez – Present and FutureApache Tez – Present and Future
Apache Tez – Present and FutureRajesh Balamohan
 
Get Started Building YARN Applications
Get Started Building YARN ApplicationsGet Started Building YARN Applications
Get Started Building YARN ApplicationsHortonworks
 
Running Non-MapReduce Big Data Applications on Apache Hadoop
Running Non-MapReduce Big Data Applications on Apache HadoopRunning Non-MapReduce Big Data Applications on Apache Hadoop
Running Non-MapReduce Big Data Applications on Apache Hadoophitesh1892
 
Introduction sur Tez par Olivier RENAULT de HortonWorks Meetup du 25/11/2014
Introduction sur Tez par Olivier RENAULT de HortonWorks Meetup du 25/11/2014Introduction sur Tez par Olivier RENAULT de HortonWorks Meetup du 25/11/2014
Introduction sur Tez par Olivier RENAULT de HortonWorks Meetup du 25/11/2014Modern Data Stack France
 
Hadoop past, present and future
Hadoop past, present and futureHadoop past, present and future
Hadoop past, present and futureCodemotion
 
YARN: Future of Data Processing with Apache Hadoop
YARN: Future of Data Processing with Apache HadoopYARN: Future of Data Processing with Apache Hadoop
YARN: Future of Data Processing with Apache HadoopHortonworks
 
Stinger Initiative: Leveraging Hive & Yarn for High-Performance/Interactive Q...
Stinger Initiative: Leveraging Hive & Yarn for High-Performance/Interactive Q...Stinger Initiative: Leveraging Hive & Yarn for High-Performance/Interactive Q...
Stinger Initiative: Leveraging Hive & Yarn for High-Performance/Interactive Q...Caserta
 
3. Apache Tez Introducation - Apache Kylin Meetup @Shanghai
3. Apache Tez Introducation - Apache Kylin Meetup @Shanghai3. Apache Tez Introducation - Apache Kylin Meetup @Shanghai
3. Apache Tez Introducation - Apache Kylin Meetup @ShanghaiLuke Han
 

Similar to Apache Tez: Accelerating Hadoop Query Processing (20)

February 2014 HUG : Tez Details and Insides
February 2014 HUG : Tez Details and InsidesFebruary 2014 HUG : Tez Details and Insides
February 2014 HUG : Tez Details and Insides
 
Tez big datacamp-la-bikas_saha
Tez big datacamp-la-bikas_sahaTez big datacamp-la-bikas_saha
Tez big datacamp-la-bikas_saha
 
Apache Tez -- A modern processing engine
Apache Tez -- A modern processing engineApache Tez -- A modern processing engine
Apache Tez -- A modern processing engine
 
Apache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingApache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query Processing
 
Apache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingApache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query Processing
 
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.02013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
 
Apache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingApache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data Processing
 
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopApache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with Hadoop
 
Tez: Accelerating Data Pipelines - fifthel
Tez: Accelerating Data Pipelines - fifthelTez: Accelerating Data Pipelines - fifthel
Tez: Accelerating Data Pipelines - fifthel
 
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
 
Apache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data ApplicationsApache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data Applications
 
Apache Tez – Present and Future
Apache Tez – Present and FutureApache Tez – Present and Future
Apache Tez – Present and Future
 
Apache Tez – Present and Future
Apache Tez – Present and FutureApache Tez – Present and Future
Apache Tez – Present and Future
 
Get Started Building YARN Applications
Get Started Building YARN ApplicationsGet Started Building YARN Applications
Get Started Building YARN Applications
 
Running Non-MapReduce Big Data Applications on Apache Hadoop
Running Non-MapReduce Big Data Applications on Apache HadoopRunning Non-MapReduce Big Data Applications on Apache Hadoop
Running Non-MapReduce Big Data Applications on Apache Hadoop
 
Introduction sur Tez par Olivier RENAULT de HortonWorks Meetup du 25/11/2014
Introduction sur Tez par Olivier RENAULT de HortonWorks Meetup du 25/11/2014Introduction sur Tez par Olivier RENAULT de HortonWorks Meetup du 25/11/2014
Introduction sur Tez par Olivier RENAULT de HortonWorks Meetup du 25/11/2014
 
Hadoop past, present and future
Hadoop past, present and futureHadoop past, present and future
Hadoop past, present and future
 
YARN: Future of Data Processing with Apache Hadoop
YARN: Future of Data Processing with Apache HadoopYARN: Future of Data Processing with Apache Hadoop
YARN: Future of Data Processing with Apache Hadoop
 
Stinger Initiative: Leveraging Hive & Yarn for High-Performance/Interactive Q...
Stinger Initiative: Leveraging Hive & Yarn for High-Performance/Interactive Q...Stinger Initiative: Leveraging Hive & Yarn for High-Performance/Interactive Q...
Stinger Initiative: Leveraging Hive & Yarn for High-Performance/Interactive Q...
 
3. Apache Tez Introducation - Apache Kylin Meetup @Shanghai
3. Apache Tez Introducation - Apache Kylin Meetup @Shanghai3. Apache Tez Introducation - Apache Kylin Meetup @Shanghai
3. Apache Tez Introducation - Apache Kylin Meetup @Shanghai
 

More from DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
Apache Tez: Accelerating Hadoop Query Processing

  • 1. Apache Tez: Accelerating Hadoop Query Processing. Arun C. Murthy, Founder & Architect (@acmurthy); Bikas Saha, Hortonworks (@bikassaha); @hortonworks
  • 2. Hello!
      • Founder/Architect at Hortonworks Inc.
        – Lead - Map-Reduce/YARN/Tez
        – Formerly, Architect Hadoop MapReduce, Yahoo
        – Responsible for running Hadoop MapReduce as a service for all of Yahoo (~50k nodes footprint)
      • Apache Hadoop, ASF
        – Former VP, Apache Hadoop, ASF (Chair of Apache Hadoop PMC)
        – Long-term Committer/PMC member (full time for 7 years)
        – Release Manager for hadoop-2.x
  • 3. Once upon a time … long, long ago, there was a kingdom we shall call Apache Hadoop http://2.bp.blogspot.com/-hIp99urgxCk/UAsSFo4i8YI/AAAAAAAAAFg/IzjNDwrBBVg/s1600/magickingdo
  • 4. Hadoop begat … a two-headed monster on every node in the kingdom; each belonged to a different clan and answered to a different master http://4.bp.blogspot.com/_C7CsfdqySYc/TNSKvIwiFcI/AAAAAAAAAbs/2FSU2TV_rRA/s1600/Two-Headed+Monster+-+With+Identifiers+-+Jan+19,+2009_0.jpg
  • 5. Knights of Bytes - HDFS … stored data uncompromisingly in directories/files, nary a care about contents http://whoiscraigmoser.com/Images/identity/knight.png
  • 6. Prince of Processing - MapReduce He ruled with an iron fist by mapping, and then by mercilessly reducing data http://media.comicvine.com/uploads/14/144886/2868181-sauron.jpg
  • 7. Peace Reigned … for a while with the odd change in the direction of the wind http://www.get-covers.com/wp-content/uploads/2012/07/Peace.jpg
  • 8. Slowly, but surely … Human beings define reality through misery and suffering. - Agent Smith http://api.ning.com/files/*oWmhl7LBlXuodD2itWUUtOautEVfD*pbBn57L8ThCyYIykiTuzkO4lJY1bwaNbJF7GecTDwsVj3EFHpDM-F1y-UW4b3Xsvh/matrix_revolutions_agent_smith_04.bmp
  • 9. Slowly, but surely … Human beings define reality through misery and suffering. - Agent Smith http://api.ning.com/files/*oWmhl7LBlXuodD2itWUUtOautEVfD*pbBn57L8ThCyYIykiTuzkO4lJY1bwaNbJF7GecTDwsVj3EFHpDM-F1y-UW4b3Xsvh/matrix_revolutions_agent_smith_04.bmp
  • 10. Slowly, but surely … people of the kingdom clamored for more. A palpable sense of greed & expectation. http://sidoxia.files.wordpress.com/2011/11/wall-st-greed-st1.jpg
  • 11. Signs of Distress SQL said some, others said Machine Learning, still others said Real-Time Event Processing http://www.truth-seeker.info/wp-content/uploads/2012/11/distress.jpg
  • 12. A Meeting at the Summit MapReduce is dead! Err… not quite. We need more options! We need more! True… http://4.bp.blogspot.com/-oqr1t6avx6g/TW55kUnmQvI/AAAAAAAAMMk/q9Jc87MSG4g/s400/arab%2Bleague%2Bround%2Btable%2B%2Bbig%2Bgood%2B2011.bmp
  • 13. A Meeting at the Summit A common thread YARN running through all applications… Long live the King! http://whipup.net/wp-content/images/2008/08/yarn.gif
  • 14. The Edict Henceforth, in the Kingdom of King YARN… MapReduce has been relegated to the status of, merely, one of the applications! http://www.napavintners.org/images/winery_Labels/EdictWines-800HW.jpg
  • 15. Reign of King YARN King YARN came to throne with promises to return power to all applications equally, lower performance taxes and resource management… http://images.fineartamerica.com/images-medium-large/the-coronation-the-crown-that-queen-everett.jpg
  • 16. Oh the Shame! Well, at least, Prince MapReduce still had powerful allies like Highness Hive, Powerful Pig, Cheery Cascading… http://www.gibbsmagazine.com/MPj03414090000%5B1%5D.jpg
  • 17. Things get worse before better Unfortunately, things got a lot worse for the Prince MapReduce… http://www.deviantart.com/download/144412184/Smile__Tomorrow_will_be_worse__by_daGrevis.jpg
  • 18. Knight Tez He did MapReduce, and so much more… Smartly aligned himself to Kingdom YARN. http://twomorrows.com/alterego/media/08shiningknight.gif
  • 19. Knight Tez Long term alliances of MapReduce with Hive, Pig, Cascading etc. broke up… http://www.officialpsds.com/images/thumbs/broken-glass-psd44132.png … they decided to throw in their lot with Knight Tez! http://informatica.upg-ploiesti.ro/62689/img/partners.jpg
  • 20. Happily ever after… (nothing cute to say)
  • 21. On a more serious note…
  • 22. Every season has a flavor… SQL-on-Hadoop is the new black! SQL-on-Hadoop will be solved within the existing ecosystem
  • 23. Looking ahead What will it be next year? Real-time event processing? Machine Learning?
  • 24. Play to our strengths Invest in the Apache Hadoop platform and the ecosystem (Hive et al).
  • 25. Seriously… Technical Details
  • 26. Tez – Introduction
      • Distributed execution framework targeted towards data-processing applications.
      • Based on expressing a computation as a dataflow graph.
      • Built on top of YARN, the resource management framework for Hadoop.
      • Open source Apache incubator project and Apache licensed.
  • 27. Tez – Design Themes
      • Empowering End Users
      • Execution Performance
  • 28. Tez – Empowering End Users
      • Expressive dataflow definition APIs
      • Flexible Input-Processor-Output runtime model
      • Data type agnostic
      • Simplifying deployment
  • 29. Tez – Empowering End Users
      • Expressive dataflow definition APIs
        – Enable definition of complex data flow pipelines using simple graph connection APIs. Tez expands the logical plan at runtime.
        – Targeted towards data processing applications like Hive/Pig but not limited to them. Hive/Pig query plans naturally map to Tez dataflow graphs with no translation impedance.
      [Diagram: dataflow graph of tasks TaskA-1, TaskA-2, TaskB-1, TaskB-2, TaskC-1, TaskC-2, TaskD-1, TaskD-2, TaskE-1, TaskE-2]
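The idea of defining a logical plan by connecting vertices, then expanding it into physical tasks at runtime, can be sketched in a few lines of Python. This is a toy model and not the Tez API (Tez itself is a Java library): the `Dag` class, its methods, and the edges between vertices A to E are all assumptions made for illustration; the slide only names the resulting tasks.

```python
from dataclasses import dataclass, field


@dataclass
class Dag:
    """Hypothetical logical plan: named vertices plus producer -> consumer edges."""
    parallelism: dict = field(default_factory=dict)  # vertex name -> task count
    edges: list = field(default_factory=list)        # (producer, consumer) pairs

    def add_vertex(self, name, tasks):
        self.parallelism[name] = tasks
        return self

    def connect(self, producer, consumer):
        for name in (producer, consumer):
            if name not in self.parallelism:
                raise KeyError(f"unknown vertex: {name}")
        self.edges.append((producer, consumer))
        return self

    def topological_order(self):
        """Kahn's algorithm: a vertex is ready once all its producers are placed."""
        indegree = {name: 0 for name in self.parallelism}
        for _, consumer in self.edges:
            indegree[consumer] += 1
        ready = [name for name, degree in indegree.items() if degree == 0]
        order = []
        while ready:
            current = ready.pop(0)
            order.append(current)
            for producer, consumer in self.edges:
                if producer == current:
                    indegree[consumer] -= 1
                    if indegree[consumer] == 0:
                        ready.append(consumer)
        if len(order) != len(self.parallelism):
            raise ValueError("the graph contains a cycle")
        return order

    def expand(self):
        """Expand the logical plan into physical task names such as TaskA-1."""
        return [
            f"Task{name}-{index}"
            for name in self.topological_order()
            for index in range(1, self.parallelism[name] + 1)
        ]


# The edges below are an assumption; the slide does not spell them out.
dag = Dag()
for vertex in "ABCDE":
    dag.add_vertex(vertex, tasks=2)
dag.connect("A", "C").connect("B", "C").connect("C", "D").connect("C", "E")
physical_tasks = dag.expand()
```

The user only states vertices, their parallelism, and connections; the list of ten physical tasks is derived, which is the division of labour the slide describes.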
  • 30. Tez – Empowering End Users
      • Expressive dataflow definition APIs
      [Diagram: Distributed Sort. Preprocessor Stage, Partition Stage and Aggregate Stage, each with Task-1 and Task-2; a Sampler; labels: Samples, Ranges]
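The distributed sort in the diagram appears to follow the usual sample-then-range-partition pattern. The sketch below is a single-process Python illustration of that pattern under that assumption; the function names, the sample size, and the in-memory lists standing in for input splits are all hypothetical.

```python
import bisect
import random


def sample_keys(splits, per_split, seed=0):
    """Preprocessor stage: each task contributes a few sampled keys."""
    rng = random.Random(seed)
    samples = []
    for split in splits:
        samples.extend(rng.sample(split, min(per_split, len(split))))
    return samples


def compute_ranges(samples, num_partitions):
    """Sampler: turn the samples into num_partitions - 1 sorted cut points."""
    if not samples:
        raise ValueError("need at least one sample to compute ranges")
    ordered = sorted(samples)
    return [ordered[(i * len(ordered)) // num_partitions]
            for i in range(1, num_partitions)]


def partition(splits, boundaries):
    """Partition stage: route every key to the range it falls into."""
    parts = [[] for _ in range(len(boundaries) + 1)]
    for split in splits:
        for key in split:
            parts[bisect.bisect_right(boundaries, key)].append(key)
    return parts


def distributed_sort(splits, num_partitions, per_split=4):
    """Aggregate stage: sort each range locally; ranges are already ordered."""
    boundaries = compute_ranges(sample_keys(splits, per_split), num_partitions)
    return [sorted(part) for part in partition(splits, boundaries)]


splits = [[9, 3, 7, 1], [8, 2, 6, 4], [5, 0]]
sorted_parts = distributed_sort(splits, num_partitions=3)
flattened = [key for part in sorted_parts for key in part]
```

Because every key in partition i is smaller than every key in partition i + 1, concatenating the locally sorted partitions yields a globally sorted result without a final merge.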
  • 31. Tez – Empowering End Users
      • Flexible Input-Processor-Output runtime model
        – Construct physical runtime executors dynamically by connecting different inputs, processors and outputs.
        – End goal is to have a library of inputs, outputs and processors that can be programmatically composed to generate useful operators.
      [Diagram: three example compositions. IntermediateReduce: ShuffleInput, ReduceProcessor, FileSortedOutput. FinalReduce: ShuffleInput, ReduceProcessor, HDFSOutput. PairwiseJoin: Input1 and Input2, JoinProcessor, FileSortedOutput.]
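A rough way to picture the Input-Processor-Output model is a task assembled from three interchangeable callables. The Python sketch below is only an analogy: `Task`, `shuffle_input`, `reduce_processor`, `sorted_output` and `store_output` are hypothetical names loosely echoing the labels in the diagram, not classes from the Tez library.

```python
from collections import defaultdict


class Task:
    """A runtime executor assembled from three pluggable pieces."""

    def __init__(self, read_input, process, write_output):
        self.read_input = read_input
        self.process = process
        self.write_output = write_output

    def run(self, source):
        return self.write_output(self.process(self.read_input(source)))


def shuffle_input(partitions):
    """Group (key, value) pairs fetched from upstream partitions by key."""
    grouped = defaultdict(list)
    for part in partitions:
        for key, value in part:
            grouped[key].append(value)
    return sorted(grouped.items())


def reduce_processor(combine):
    """Build a processor that folds each key's values with `combine`."""
    def process(grouped):
        return [(key, combine(values)) for key, values in grouped]
    return process


def sorted_output(records):
    """Hand records to the next stage in key order."""
    return sorted(records)


def store_output(store):
    """Build an output that appends to `store`, a stand-in for a final sink."""
    def write(records):
        store.extend(records)
        return len(records)
    return write


upstream = [[("a", 1), ("b", 2)], [("a", 3)]]
final_store = []
# Same input and processor, different outputs: only the last piece is swapped.
intermediate_reduce = Task(shuffle_input, reduce_processor(sum), sorted_output)
final_reduce = Task(shuffle_input, reduce_processor(sum), store_output(final_store))
intermediate_result = intermediate_reduce.run(upstream)
records_written = final_reduce.run(upstream)
```

The intermediate and final reducers differ only in the output they are wired to, which mirrors how the diagram reuses ShuffleInput and ReduceProcessor in both compositions.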
  • 32. Tez – Empowering End Users
      • Data type agnostic
        – Tez is only concerned with the movement of data. Files and streams of bytes.
        – Does not impose any data format on the user application. MR application can use Key-Value pairs on top of Tez. Hive and Pig can use tuple oriented formats that are natural and native to them.
      [Diagram labels: Tez Task, User Code, Bytes, Bytes, File, Stream, Key Value, Tuples]
  • 33. Tez – Empowering End Users
      • Simplifying deployment
        – Tez is a completely client side application.
        – No deployments to do. Simply upload to any accessible FileSystem and change local Tez configuration to point to that.
        – Enables running different versions concurrently. Easy to test new functionality while keeping stable versions for production.
        – Leverages YARN local resources and distributed cache.
      [Diagram labels: two Client Machines, each with a TezClient; HDFS holding Tez Lib 1 and Tez Lib 2; two Node Managers, each with a TezTask]
  • 34. Tez – Empowering End Users
      • Expressive dataflow definition APIs
      • Flexible Input-Processor-Output runtime model
      • Data type agnostic
      • Simplifying usage
      With great power APIs come great responsibilities
  • 35. Tez – Execution Performance
      • Performance gains over Map Reduce
      • Plan reconfiguration at runtime
      • Optimal resource management
      • Dynamic physical data flow decisions
  • 36. Tez – Execution Performance
      • Performance gains over Map Reduce
        – Eliminate replicated write barrier between successive computations.
        – Eliminate job launch overhead of workflow jobs.
        – Eliminate extra stage of map reads in every workflow job.
        – Eliminate queue and resource contention suffered by workflow jobs that are started after a predecessor job completes.
      [Diagram: Pig/Hive on MR compared with Pig/Hive on Tez]
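The first three savings listed above can be tallied with simple bookkeeping. The Python sketch below assumes a query that compiles to a linear chain of `num_jobs` MapReduce jobs and compares it with the same work submitted as one DAG; the function name and the dictionary keys are hypothetical, and the counts are an accounting model, not measurements.

```python
def workflow_overheads(num_jobs, engine):
    """Count per-workflow overheads for a linear chain of computations.

    engine "mr":  every step is its own MapReduce job.
    engine "dag": the whole chain is submitted as a single dataflow graph.
    """
    if num_jobs < 1:
        raise ValueError("a workflow needs at least one job")
    if engine == "mr":
        return {
            "job_launches": num_jobs,
            # every job but the last writes replicated output for its successor
            "replicated_intermediate_writes": num_jobs - 1,
            # every job but the first starts with a map stage that re-reads it
            "extra_map_read_stages": num_jobs - 1,
        }
    if engine == "dag":
        return {
            "job_launches": 1,
            "replicated_intermediate_writes": 0,
            "extra_map_read_stages": 0,
        }
    raise ValueError(f"unknown engine: {engine}")


mr_cost = workflow_overheads(4, "mr")
dag_cost = workflow_overheads(4, "dag")
```

For a four-job chain the model gives four launches, three replicated intermediate writes and three extra map stages under MapReduce, against a single launch and none of the other two for the DAG.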
  • 37. Tez – Execution Performance
      • Plan reconfiguration at runtime
        – Dynamic runtime concurrency control based on data size, user operator resources, available cluster resources and locality.
        – Advanced changes in dataflow graph structure.
        – Progressive graph construction in concert with user optimizer.
      [Diagram: inputs are HDFS Blocks and YARN Resources. As planned: Stage 1 with 50 maps and 100 partitions, Stage 2 with 100 reducers. At runtime, with only 10 GB of data: Stage 1 with 50 maps and 100 partitions, Stage 2 cut from 100 to 10 reducers.]
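The diagram's example (100 planned reducers cut to 10 once only 10 GB of data shows up) can be reproduced with a small sizing rule. The Python sketch below is an illustration only: the target of 1 GiB per reducer and both function names are assumptions, since the slide does not say how the runtime picks the new reducer count.

```python
GIB = 1024 ** 3


def choose_reducers(observed_bytes, planned_reducers, bytes_per_reducer):
    """Shrink the planned reducer count to what the observed data needs."""
    needed = max(1, (observed_bytes + bytes_per_reducer - 1) // bytes_per_reducer)
    return min(planned_reducers, needed)


def assign_partitions(num_partitions, num_reducers):
    """Give each reducer a contiguous, near-equal block of map partitions."""
    base, extra = divmod(num_partitions, num_reducers)
    assignments, start = [], 0
    for reducer in range(num_reducers):
        size = base + (1 if reducer < extra else 0)
        assignments.append(range(start, start + size))
        start += size
    return assignments


# Stage 1 produced 100 partitions, but they hold only 10 GiB in total.
reducers = choose_reducers(10 * GIB, planned_reducers=100, bytes_per_reducer=1 * GIB)
assignments = assign_partitions(100, reducers)
```

Stage 1 keeps its 100 partitions; only the grouping of partitions onto reducers changes, so the decision can be made after the map output sizes are known.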
  • 38. Tez – Execution Performance
      • Optimal resource management
        – Reuse YARN containers to launch new tasks.
        – Reuse YARN containers to enable shared objects across tasks.
      [Diagram: a Tez Application Master in one YARN Container exchanges Start Task, Task Done, Start Task with a TezTask Host in another YARN Container, which runs TezTask1 and TezTask2 and holds SharedObjects]
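Both benefits of container reuse, fewer container launches and objects shared across tasks, show up in a small sequential simulation. The Python sketch below is a toy model; `Container`, `run_tasks`, `make_lookup_task` and the dimension-table example are hypothetical and do not correspond to YARN or Tez classes.

```python
class Container:
    """Stand-in for a YARN container that can host several tasks in turn."""

    def __init__(self, container_id):
        self.container_id = container_id
        self.shared_objects = {}  # survives from one task to the next

    def run(self, task):
        return task(self.shared_objects)


def run_tasks(tasks, reuse):
    """Run tasks one after another; return (results, containers_launched)."""
    launched = 0
    idle = None
    results = []
    for task in tasks:
        if reuse and idle is not None:
            container = idle  # "Task Done" followed by "Start Task" on the same host
        else:
            container = Container(launched)
            launched += 1
        results.append(container.run(task))
        idle = container
    return results, launched


def make_lookup_task(key, load_log):
    """Task that needs a lookup table and loads it only if the container lacks it."""
    def task(shared_objects):
        if "dimension_table" not in shared_objects:
            load_log.append(key)  # records one expensive load
            shared_objects["dimension_table"] = {"us": "United States", "in": "India"}
        return shared_objects["dimension_table"][key]
    return task


loads_with_reuse, loads_without_reuse = [], []
keys = ("us", "in", "us")
results_reuse, launched_reuse = run_tasks(
    [make_lookup_task(k, loads_with_reuse) for k in keys], reuse=True)
results_fresh, launched_fresh = run_tasks(
    [make_lookup_task(k, loads_without_reuse) for k in keys], reuse=False)
```

With reuse, three tasks need one container and one table load; without it, each task pays for its own container and its own load, while the results are identical.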
  • 39. Tez – Execution Performance
      • Dynamic physical data flow decisions
        – Decide the type of physical byte movement and storage on the fly.
        – Store intermediate data on distributed store, local store or in-memory.
        – Transfer bytes via blocking files or streaming and the spectrum in between.
      [Diagram: decided at runtime. Producer (small size) → In-Memory → Consumer; Producer → Local File → Consumer]
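A runtime decision of this kind can be written as a small rule over what is known once the producer has run. The Python sketch below is an assumption-laden illustration: the `Transport` enum, the `needs_durability` flag and the memory-budget threshold are invented for the example, and the slide does not state which criteria Tez actually applies.

```python
from enum import Enum


class Transport(Enum):
    IN_MEMORY = "in-memory"
    LOCAL_FILE = "local file"
    DISTRIBUTED_STORE = "distributed store"


def choose_transport(output_bytes, memory_budget_bytes, needs_durability=False):
    """Pick where a producer's intermediate output should live.

    Assumed rules, for illustration only:
    - data that must survive node loss goes to the distributed store;
    - output that fits the memory budget is handed over in memory;
    - everything else is spilled to a local file.
    """
    if needs_durability:
        return Transport.DISTRIBUTED_STORE
    if output_bytes <= memory_budget_bytes:
        return Transport.IN_MEMORY
    return Transport.LOCAL_FILE


MIB = 1024 ** 2
small = choose_transport(8 * MIB, memory_budget_bytes=64 * MIB)
large = choose_transport(512 * MIB, memory_budget_bytes=64 * MIB)
durable = choose_transport(8 * MIB, memory_budget_bytes=64 * MIB, needs_durability=True)
```

The point of the sketch is that the choice is made per edge from observed sizes, so a small producer and a large producer in the same graph can end up using different transports.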
  • 40. Tez – Current status
      • Apache Incubator Project
        – Rapid development. Over 270 jiras opened. Over 170 resolved.
        – Growing community.
      • Focus on stability
        – Testing and quality are highest priority.
        – Code ready and deployed on multi-node clusters.
      • DAG of MR processing is working
        – Already functionally equivalent to Map Reduce. Existing Map Reduce jobs can be executed on Tez with few or no changes.
        – Working Hive prototype that can target Tez for execution of queries.
        – Work started on prototype of Pig that can target Tez.
  • 41. Tez – Current status
      [Diagram, typical pattern in a TPC-DS query: Fact Table, three Joins, Dimension Tables 1 to 3, Result Tables 1 to 3]
      [Diagram, optimization for small data sets: Fact Table and Dimension Table 1 (label shown three times)]
      Both can now run as a single Tez job
  • 42. Tez – Looking ahead
      • Early adopters and contributors welcome
        – Adopters to drive more scenarios. Contributors to make them happen.
      • Stay tuned for Tez meetups with deep dives on Tez architecture and using Tez
      • Useful links
        – Work tracking: https://issues.apache.org/jira/browse/TEZ
        – Code: https://github.com/apache/incubator-tez
        – High level design document and API specification: https://issues.apache.org/jira/browse/TEZ-65
        – Developer list: dev@tez.incubator.apache.org
        – User list: user@tez.incubator.apache.org
        – Issues list: issues@tez.incubator.apache.org
  • 43. Tez – Takeaways
      • Distributed execution framework that works on computations represented as dataflow graphs
      • Naturally maps to execution plans produced by query optimizers
      • Execution architecture designed to enable dynamic performance optimizations at runtime
      • Open source Apache project – your use-cases and code are welcome
      • It works and is already being used by Hive
  • 44. Tez Thanks for your time and attention! Questions?