Tez is a data processing framework that allows dataflow jobs to be expressed as directed acyclic graphs (DAGs). It is built on top of YARN for resource management and aims to outperform MapReduce through container reuse and late binding of tasks, while also simplifying operations. Tez defines APIs that let developers express DAGs and the processing logic used to customize jobs.
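To make the idea of expressing a dataflow job as a DAG concrete, here is a minimal sketch in plain Python. It does not use the Tez API: the `Dag` class, its methods, and the vertex names (`scan_orders`, `scan_users`, `join`, `aggregate`) are all hypothetical and chosen only for illustration. It shows vertices as processing steps, edges as data movement between them, and one way a scheduler could derive an order in which every producer runs before its consumers.

```python
from collections import deque


class Dag:
    """A toy dataflow graph: vertices are processing steps, edges carry data.

    Hypothetical illustration only; this is not the Tez DAG API.
    """

    def __init__(self):
        self.vertices = []   # vertex names, in insertion order
        self.edges = {}      # producer name -> list of consumer names

    def add_vertex(self, name):
        if name in self.vertices:
            raise ValueError(f"duplicate vertex: {name}")
        self.vertices.append(name)
        self.edges[name] = []

    def add_edge(self, producer, consumer):
        for name in (producer, consumer):
            if name not in self.vertices:
                raise ValueError(f"unknown vertex: {name}")
        self.edges[producer].append(consumer)

    def execution_order(self):
        """Order vertices so each producer precedes its consumers (Kahn's algorithm)."""
        # Count how many unfinished producers each vertex is still waiting on.
        pending = {v: 0 for v in self.vertices}
        for consumers in self.edges.values():
            for consumer in consumers:
                pending[consumer] += 1

        ready = deque(v for v in self.vertices if pending[v] == 0)
        order = []
        while ready:
            vertex = ready.popleft()
            order.append(vertex)
            for consumer in self.edges[vertex]:
                pending[consumer] -= 1
                if pending[consumer] == 0:
                    ready.append(consumer)

        # Any vertex left unscheduled sits on a cycle, so the graph is not a DAG.
        if len(order) != len(self.vertices):
            raise ValueError("graph contains a cycle, so it is not a DAG")
        return order


def build_example():
    """Two scans feed a join, whose output feeds an aggregation."""
    dag = Dag()
    for name in ("scan_orders", "scan_users", "join", "aggregate"):
        dag.add_vertex(name)
    dag.add_edge("scan_orders", "join")
    dag.add_edge("scan_users", "join")
    dag.add_edge("join", "aggregate")
    return dag


if __name__ == "__main__":
    print(build_example().execution_order())
```

In this sketch the two scans have no dependencies and could run concurrently, while the join has to wait for both. A job shaped like this fits naturally into a single DAG, which is the kind of structure the description above refers to.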
Container Reuse
Fault Tolerance
Recovery
Routing Data Efficiently
Elasticity
It is hard to expect the framework to perform every last optimization on its own. Sometimes users want to instruct the framework on what should be done at runtime, and Tez allows such customizations.
A Tez deployment is easy to operate, experiment with, and upgrade. This is hugely important, since we had to take downtime for the entire cluster with the previous MR deployment.