Tez: UI & Debugging 
Fall 2014 
Version 1.0 
Page 1 © Hortonworks Inc. 2014 
gopalv@apache.org
TEZ (nomenclature) 
• DAG 
• Vertex 
• Task 
• Attempt 
• Container 
• Edge 
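These terms nest inside each other. A DAG is a set of vertices connected by edges. Each vertex runs as a number of tasks. Each task is executed as one or more attempts, because a retry after a failure adds another attempt. Attempts run inside YARN containers, and Tez can reuse a container for several attempts. The minimal Python sketch below models only that nesting. Every class and field name in it is hypothetical and is not the Tez Java API.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical model of the Tez nomenclature; names are illustrative, not the Tez API.

@dataclass
class Attempt:
    attempt_no: int       # 0 for the first try, incremented on each retry
    container_id: str     # YARN container the attempt ran in (containers may be reused)
    succeeded: bool

@dataclass
class Task:
    index: int                                      # task index within its vertex
    attempts: List[Attempt] = field(default_factory=list)

@dataclass
class Vertex:
    name: str                                       # e.g. "Map 1", "Reducer 2"
    tasks: List[Task] = field(default_factory=list)

@dataclass
class Edge:
    source: str                                     # producing vertex name
    destination: str                                # consuming vertex name

@dataclass
class DAG:
    name: str
    vertices: List[Vertex] = field(default_factory=list)
    edges: List[Edge] = field(default_factory=list)

    def attempt_count(self) -> int:
        """Total number of attempts across all tasks of all vertices."""
        return sum(len(t.attempts) for v in self.vertices for t in v.tasks)
```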
Page 2 © Hortonworks Inc. 2014 FOR: BAY AREA HADOOP USER GROUP MEETUP #49
Directed Acyclic Graphs 
How to view raw DAGs from logs 
• Tez application logs contain .dot files in Graphviz format
• To generate images: dot -Tpng -o dag.png dag.dot
• Or use the JavaScript version: http://people.apache.org/~gopalv/dagviz/
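Because the .dot file is plain text, it can also be inspected programmatically. The following is a minimal Python sketch. It assumes edge statements of the form "Map 3" -> "Map 1", with quoted vertex names and optional [attributes]. The exact layout of Tez-generated .dot files may differ, and parse_edges and topological_order are hypothetical helpers. The sketch extracts the edges and orders the vertices with Kahn's algorithm, which also confirms that the graph really is acyclic.

```python
import re
from collections import defaultdict, deque
from typing import List, Tuple

# ASSUMPTION: edges look like  "Map 3" -> "Map 1" [label=...]  (quoted names, optional attributes).
EDGE_RE = re.compile(r'"([^"]+)"\s*->\s*"([^"]+)"')

def parse_edges(dot_text: str) -> List[Tuple[str, str]]:
    """Return (source, destination) pairs for every edge statement in the dot text."""
    return EDGE_RE.findall(dot_text)

def topological_order(edges: List[Tuple[str, str]]) -> List[str]:
    """Kahn's algorithm: vertices in dependency order; raises ValueError on a cycle."""
    successors = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1
        nodes.update((src, dst))
    # Start from vertices with no inputs (sorted for a deterministic result).
    ready = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(nodes):
        raise ValueError("graph has a cycle; not a valid DAG")
    return order
```

Run against the contents of a .dot file from the application logs, this lists the vertices in the order in which their inputs become available.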
TEZ-8 JIRA & branch 
• Tez UI for progress tracking and history
• https://issues.apache.org/jira/browse/TEZ-8
• https://github.com/apache/tez/tree/TEZ-8
• UI-centric feature branch
Tez UI: Landing page
Tez UI: DAG view 
Tez UI: Vertex view 
Tez UI: Vertex -> Tasks view 
Tez UI: Task logs 
Tez UI: Task counters 
Tez UI: Task counters 
(Screenshot callout: search for counters)
Tez UI: Per-edge shuffle counters 
(Screenshot callout: Map 3 to Map 1 only)
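The per-edge view and the counter search both narrow a large counter set: first to one (source, destination) edge, then to the counter names that match a search term. The small Python sketch below illustrates that filtering. It assumes a hypothetical in-memory layout that maps an edge to a dictionary of counter values. This is not how the Tez UI stores or fetches counters, and counters_for_edge is an illustrative helper.

```python
from typing import Dict, Tuple

# ASSUMPTION: hypothetical layout, (source vertex, destination vertex) -> {counter name: value}
EdgeCounters = Dict[Tuple[str, str], Dict[str, int]]

def counters_for_edge(counters: EdgeCounters, source: str, destination: str,
                      search: str = "") -> Dict[str, int]:
    """Counters of one edge whose names contain `search` (case-insensitive).

    An empty search term keeps every counter of the edge; an unknown edge yields {}.
    """
    per_edge = counters.get((source, destination), {})
    needle = search.lower()
    return {name: value for name, value in per_edge.items() if needle in name.lower()}
```

For example, counters_for_edge(c, "Map 3", "Map 1", "bytes") keeps only the byte-related counters of the Map 3 to Map 1 edge.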
Tez UI: Payload view 
Tez UI: Failed DAGs (diagnostic) 
Tez UI: Failed tasks indication 
Tez UI: Failed tasks 
Tez UI: Failed attempts 
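The separate views for failed tasks and failed attempts follow from the task/attempt split. A task that eventually succeeds can still leave failed attempts behind, and a task only counts as failed once its attempts are exhausted. The simplified Python sketch below encodes that rule. It assumes a fixed per-task attempt limit and ignores killed attempts. task_state and its state strings are hypothetical.

```python
from typing import List

def task_state(attempt_results: List[bool], max_attempts: int) -> str:
    """Classify a task from its attempt outcomes (True = that attempt succeeded).

    Simplified rule (an assumption for illustration): the task succeeds as soon as
    one attempt succeeds, fails once `max_attempts` attempts have failed, and is
    otherwise still running/retrying.
    """
    if any(attempt_results):
        return "SUCCEEDED"
    if len(attempt_results) >= max_attempts:
        return "FAILED"
    return "RUNNING"

def failed_attempts(attempt_results: List[bool]) -> int:
    """Number of failed attempts, which can be non-zero even for a SUCCEEDED task."""
    return sum(1 for ok in attempt_results if not ok)
```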
Post-hoc/Ad-hoc analysis helpers 
• tez/tez-tools ships with two helper tools 
• swimlanes 
• tez-tfile-parser 
Swimlanes 
• Usage: ./yarn-swimlanes.sh application_1415860665053_0098 (the argument is the YARN application ID)
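A swimlane chart draws one horizontal lane per container and shows each task attempt as a bar from its start time to its end time, so idle gaps and stragglers stand out. The Python sketch below only illustrates that layout step. It works on hypothetical (container, attempt, start, end) records with times relative to application start, and it is not how yarn-swimlanes.sh is implemented. Its input would first have to be extracted from the application logs.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple

class AttemptSpan(NamedTuple):
    container: str
    attempt: str
    start_ms: int   # ASSUMPTION: milliseconds relative to application start
    end_ms: int

def build_lanes(spans: List[AttemptSpan]) -> Dict[str, List[AttemptSpan]]:
    """One lane per container (lanes sorted by container id), each sorted by start time."""
    lanes: Dict[str, List[AttemptSpan]] = defaultdict(list)
    for span in spans:
        lanes[span.container].append(span)
    return {c: sorted(lane, key=lambda s: s.start_ms) for c, lane in sorted(lanes.items())}

def idle_gaps(lane: List[AttemptSpan]) -> List[int]:
    """Idle time (ms) between consecutive attempts in one sorted lane."""
    return [max(0, nxt.start_ms - cur.end_ms) for cur, nxt in zip(lane, lane[1:])]

def render_ascii(lanes: Dict[str, List[AttemptSpan]], ms_per_char: int = 1000) -> str:
    """Rough text swimlanes: '#' while an attempt runs, '.' while the container is idle.

    Assumes attempts within one container do not overlap in time.
    """
    rows = []
    for container, lane in lanes.items():
        row: List[str] = []
        for span in lane:
            start, end = span.start_ms // ms_per_char, span.end_ms // ms_per_char
            row.extend("." * (start - len(row)))   # idle time before this attempt
            row.extend("#" * (end - start))        # the attempt itself
        rows.append(f"{container:<12}{''.join(row)}")
    return "\n".join(rows)
```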
TFile parser 
• Tez logs can be parsed with Pig
• This lets us treat our logs exactly like we treat the rest of our big data
• Processing is done with "pig -x tez" plus UDFs [1]
-- assumes the jar containing TFileLoader (tez-tfile-parser) has already been REGISTERed
rawLogs = load '/app-logs/root/logs/application_1409012059361_0539/*' using
    org.apache.tez.tools.TFileLoader() as (machine:chararray, key:chararray, line:chararray);
[1] - https://github.com/rajeshbalamohan/tez_log_parser/blob/master/src/main/resources/pig/udf.groovy 
TFile parser (contd) 
• For instance, parsing the shuffle INFO logs to get the time taken and the machine
(Chart callout: problematic machine)
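The analysis on this slide comes down to extracting two fields from each shuffle INFO line: the machine involved and the time the fetch took. The Python sketch below shows that step on (machine, line) pairs, which correspond to the machine and line fields of the TFileLoader schema. The log-line pattern is purely an assumption for illustration. Real Tez shuffle messages look different and vary between versions, so FETCH_RE must be adapted. parse_fetches and Fetch are hypothetical names.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Tuple

# ASSUMPTION: a made-up log shape such as
#   "... INFO ... fetch from host=node-120 took 5321 ms ..."
# Replace this pattern with one that matches the actual shuffle log messages.
FETCH_RE = re.compile(r"INFO.*host=(?P<source>[\w.-]+).*took (?P<ms>\d+) ms")

class Fetch(NamedTuple):
    fetcher: str   # machine whose log contains the line (assumed to be the fetching side)
    source: str    # machine the map output was pulled from
    millis: int    # time taken for the fetch

def parse_fetches(rows: Iterable[Tuple[str, str]]) -> Iterator[Fetch]:
    """Yield a Fetch for every (machine, line) row whose line matches FETCH_RE."""
    for machine, line in rows:
        match = FETCH_RE.search(line)
        if match:
            yield Fetch(machine, match.group("source"), int(match.group("ms")))
```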
TFile parser (node/rack traffic at 350 nodes) 
(Chart callouts)
• Problematic machine: the fetcher on node-100 is always slow, irrespective of where it is pulling data from
• Other faulty nodes: map output served from node-100 to node-120 (to any node) is always slow
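The readings on this chart can be approximated numerically. One callout finds a host slow as a fetcher regardless of source; the other finds map output slow to serve regardless of destination. To detect these cases, compare each host's median fetch time against the cluster-wide median, once in the fetching role and once in the serving role. The Python sketch below does this on (fetcher, source, millis) records. The 2x threshold and the slow_hosts helper are assumptions and are not what the original analysis used.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, NamedTuple

class Fetch(NamedTuple):
    fetcher: str   # host that pulled the map output
    source: str    # host that served the map output
    millis: int    # time taken for the fetch

def slow_hosts(fetches: Iterable[Fetch], role: str, factor: float = 2.0) -> List[str]:
    """Hosts whose median fetch time in `role` ("fetcher" or "source") exceeds
    `factor` times the median over all fetches. Requires at least one fetch."""
    fetches = list(fetches)
    overall = median(f.millis for f in fetches)
    by_host: Dict[str, List[int]] = defaultdict(list)
    for f in fetches:
        by_host[getattr(f, role)].append(f.millis)
    return sorted(host for host, times in by_host.items() if median(times) > factor * overall)
```

A host that appears in slow_hosts(data, "fetcher") but not in slow_hosts(data, "source") matches the "always slow, irrespective of where it is pulling data from" pattern.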
Questions? 
• Thanks to all Tez contributors for their efforts!
• FYI, the Hadoop Summit 2015 (Europe) call for papers is out

November 2014 HUG: Apache Tez - A Performance View into Large Scale Data-processing, Shuffle Throughput, Reducer parallelism and Reducer Skew