
3. Apache Tez Introduction - Apache Kylin Meetup @Shanghai


  1. Apache Tez - Next-Generation Execution Engine on Hadoop
     Jeff Zhang (@zjffdu)
     © Hortonworks Inc. 2015
  2. Who's this guy
     • Started using Pig in 2009. Became a Pig committer in Nov 2009
     • Joined Hortonworks in 2014
     • Tez committer since Oct 2014
  3. Agenda
     • Tez Introduction
     • Tez Feature Deep Dive
     • Tez Status & Roadmap
  4. Example of MR versus Tez
     • Hive - MR: four separate jobs, with an I/O synchronization barrier between stages
       – Job 1 (Join a & b)
       – Job 2 (Group by of a Join b)
       – Job 3 (Group by of c)
       – Job 4 (Join of S & R)
     • Hive - Tez: a single job
       – Join a & b
       – Group by of a Join b
       – Group by of c
       – Job 4 (Join of S & R)
  5. Tez – Introduction
     • Distributed execution framework targeted towards data-processing applications.
     • Based on expressing a computation as a dataflow graph (DAG).
     • Highly customizable to meet a broad spectrum of use cases.
     • Built on top of YARN, the resource management framework for Hadoop.
     • Open source Apache project and Apache licensed.
  6. What is DAG & Why DAG
     • DAG: Directed Acyclic Graph
     • Any complicated DAG can be composed of the following three basic paradigms
       – Sequential (Projection, Filter, GroupBy, …)
       – Merge (Join, Union, Intersect, …)
       – Divide (Split, …)
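The three paradigms can be sketched with a plain adjacency list. This is only an illustration: the vertex names and the `topo_order` helper are hypothetical and not part of Tez.

```python
from collections import deque

# Edges point from producer to consumer. Vertex names are hypothetical.
dag = {
    "load_a": ["join"],                 # merge: two producers feed one consumer
    "load_b": ["join"],
    "join": ["filter"],                 # sequential: one producer, one consumer
    "filter": ["group_by", "project"],  # divide: one producer, two consumers
    "group_by": [],
    "project": [],
}

def topo_order(graph):
    """Return vertices so that every producer precedes its consumers (Kahn's algorithm)."""
    indegree = {v: 0 for v in graph}
    for consumers in graph.values():
        for c in consumers:
            indegree[c] += 1
    ready = deque(sorted(v for v, d in indegree.items() if d == 0))
    order = []
    while ready:
        v = ready.popleft()
        order.append(v)
        for c in graph[v]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)
    if len(order) != len(graph):
        raise ValueError("graph contains a cycle, so it is not a DAG")
    return order
```

Because the graph is acyclic, a valid execution order always exists; a cycle would leave some vertex waiting on itself, which is why the helper rejects it.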
  7. Expressing DAG in Tez API
     • DAG API (Logic View)
       – Allows the user to build a DAG
       – Topological structure of the data computation flow
     • Runtime API (Runtime View)
       – Application logic of each computation unit (vertex)
       – How to move/read/write data between vertices
  8. DAG API (Logic View)
     • Vertex (Processor, Parallelism, Resource, etc.)
     • Edge (EdgeProperty)
       – DataMovement
         – Scatter Gather (Join, GroupBy, …)
         – Broadcast (Pig Replicated Join / Hive Broadcast Join)
         – One-to-One (Pig Order by)
         – Custom
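As a rough sketch of how the three built-in data movement types differ, the function below decides which downstream tasks receive a record. It models the idea only: `route` and its parameters are hypothetical, and the hash-based partitioning is an assumption, not the Tez implementation.

```python
from zlib import crc32

def route(movement, src_task, key, num_dest_tasks):
    """Return the indices of the destination tasks that receive one record."""
    if movement == "SCATTER_GATHER":
        # Each key goes to exactly one destination, chosen by a stable hash (assumed partitioner).
        return [crc32(key.encode("utf-8")) % num_dest_tasks]
    if movement == "BROADCAST":
        # Every destination task receives the full output of every source task.
        return list(range(num_dest_tasks))
    if movement == "ONE_TO_ONE":
        # Source task i feeds destination task i, so both vertices need equal parallelism.
        if src_task >= num_dest_tasks:
            raise ValueError("one-to-one requires matching parallelism")
        return [src_task]
    raise ValueError("unsupported movement type: " + movement)
```

For example, `route("BROADCAST", 0, "k", 3)` yields all three destinations, while `route("ONE_TO_ONE", 2, "k", 3)` yields only task 2.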
  9. Runtime API (Runtime View)
     [Diagram: Input → Processor → Output]
     • Input
       – Through which the processor receives data on an edge
       – One vertex can have multiple inputs
     • Processor
       – Application logic (one processor per vertex)
       – Consumes the inputs and produces the outputs
     • Output
       – Through which the processor writes data to an edge
       – One vertex can have multiple outputs
     • Example of Input/Output/Processor
       – MRInput & MROutput (InputFormat/OutputFormat)
       – OrderedGroupedKVInput & OrderedPartitionedKVOutput (Scatter Gather)
       – UnorderedKVInput & UnorderedKVOutput (Broadcast & One-to-One)
       – PigProcessor/HiveProcessor
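The Input/Processor/Output split can be illustrated with a small in-memory model. All class and method names below (`ListInput`, `ListOutput`, `WordCountProcessor`, `run`) are hypothetical stand-ins chosen for this sketch, not the Tez runtime classes.

```python
class ListInput:
    """Hypothetical input: lets a processor read the records arriving on one edge."""
    def __init__(self, records):
        self._records = list(records)

    def read(self):
        yield from self._records


class ListOutput:
    """Hypothetical output: collects the key/value pairs written to one edge."""
    def __init__(self):
        self.records = []

    def write(self, key, value):
        self.records.append((key, value))


class WordCountProcessor:
    """Application logic of one vertex: consumes every input, writes to every output."""
    def run(self, inputs, outputs):
        counts = {}
        for edge_input in inputs.values():
            for line in edge_input.read():
                for word in line.split():
                    counts[word] = counts.get(word, 0) + 1
        for word in sorted(counts):
            for edge_output in outputs.values():
                edge_output.write(word, counts[word])
```

The processor never learns how records reached it or where they go next; swapping the input or output class changes the data movement without touching the application logic.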
  10. Benefit of DAG
      • Easier to express computation in DAG
      • No intermediate data written to HDFS
      • Less pressure on NameNode
      • No resource queuing effort & less resource contention
      • More optimization opportunity with more global context
  11. Agenda
      • Tez Introduction
      • Tez Feature Deep Dive
      • Tez Improvement & Debuggability
      • Tez Status & Roadmap
  12. Container-Reuse
      • Reuse the same container across DAGs/Vertices/Tasks
      • Benefit of Container-Reuse
        – Fewer resources consumed
        – Reduces the overhead of launching JVMs
        – Reduces the overhead of negotiating with the ResourceManager
        – Reduces the overhead of resource localization
        – Reduces network I/O
        – Object Caching (Object Sharing)
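A back-of-the-envelope sketch of the launch-overhead argument follows. The function names and the 2-second launch cost are assumptions for illustration only; real costs depend on the cluster.

```python
def container_launches(num_tasks, max_containers, reuse):
    """Count container (JVM) launches needed to run num_tasks tasks."""
    if num_tasks < 0 or max_containers < 1:
        raise ValueError("need num_tasks >= 0 and max_containers >= 1")
    if reuse:
        # A container is launched once and then runs tasks back to back.
        return min(num_tasks, max_containers)
    # Without reuse, every task pays for its own container launch.
    return num_tasks


def launch_overhead_seconds(num_tasks, max_containers, reuse, launch_cost_s=2.0):
    """Total launch overhead under an assumed fixed cost per launch."""
    return container_launches(num_tasks, max_containers, reuse) * launch_cost_s
```

With 100 tasks and 10 concurrent containers, this model gives 10 launches with reuse versus 100 without.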
  13. Tez Session
      • Multiple Jobs/DAGs in one AM
      • Container-reuse across Jobs/DAGs
      • Data sharing between Jobs/DAGs
  14. Dynamic Parallelism Estimation
      • VertexManager
        – Listens to the status of the other vertices
        – Coordinates and schedules its tasks
        – Handles communication between vertices
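One way a vertex manager could use upstream status to pick parallelism is sketched below. The heuristic (total upstream output divided by a desired input size per task, capped at the configured parallelism) and all names are assumptions for illustration, not the actual VertexManager logic.

```python
import math

def estimate_parallelism(upstream_output_bytes, desired_bytes_per_task,
                         configured_parallelism, min_tasks=1):
    """Estimate how many downstream tasks are needed for the observed upstream output.

    upstream_output_bytes: output sizes reported by finished upstream tasks.
    desired_bytes_per_task: assumed target input size for each downstream task.
    configured_parallelism: the parallelism chosen at plan time; used as the upper bound.
    """
    if desired_bytes_per_task <= 0:
        raise ValueError("desired_bytes_per_task must be positive")
    total = sum(upstream_output_bytes)
    needed = math.ceil(total / desired_bytes_per_task)
    # Only shrink relative to the plan, and never go below min_tasks.
    return max(min_tasks, min(configured_parallelism, needed))
```

The point of deciding at runtime is that the plan-time estimate cannot know how much data a filter or join will actually emit.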
  15. ATS Integration
      • Tez is fully integrated with YARN ATS (Application Timeline Service)
        – DAG Status, DAG Metrics, Task Status, Task Metrics are captured
      • Diagnostics & Performance analysis
        – Data source for monitoring & diagnostics
        – Data source for performance analysis
  16. Recovery
      • AM can crash in corner cases
        – OOM
        – Node failure
        – …
      • Continue from the last checkpoint
      • Transparent to end users
      [Diagram: AM Crash]
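"Continue from the last checkpoint" can be pictured as replaying a log of completed work after a restart. The sketch below assumes a simple list of `(event_kind, task_id)` tuples; the event name `TASK_FINISHED` and the `tasks_to_rerun` helper are hypothetical.

```python
def tasks_to_rerun(event_log, all_tasks):
    """Replay the recovery log and return the tasks a restarted AM still has to run.

    event_log: iterable of (event_kind, task_id) tuples written before the crash.
    all_tasks: every task id in the DAG, in scheduling order.
    """
    finished = {task for kind, task in event_log if kind == "TASK_FINISHED"}
    # Tasks that were started but never finished are treated as lost and run again.
    return [task for task in all_tasks if task not in finished]
```

Because finished work is skipped, the user sees a longer-running DAG rather than a failed one.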
  17. Order By of Pig
      f = Load 'foo' as (x, y);
      o = Order f by x;
      [Diagram labels, first plan: Load, Sample (Calculate Histogram), HDFS, Partition, Sort, Broadcast]
      [Diagram labels, second plan: Load, Sample (Calculate Histogram), Partition, Sort, One-to-One, Scatter Gather, Scatter Gather]
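The sample, partition, and sort stages named in the diagram can be sketched in a single process. This is an illustration of range partitioning from a sample, not Pig's implementation; the function names, sample size, and seed are assumptions.

```python
import bisect
import random

def sample_boundaries(keys, num_partitions, sample_size=100, seed=0):
    """Sample the keys and pick num_partitions - 1 cut points at evenly spaced quantiles."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    return [sample[len(sample) * i // num_partitions] for i in range(1, num_partitions)]

def order_by(keys, num_partitions):
    """Range-partition the keys using sampled boundaries, then sort each partition."""
    if not keys:
        return [[] for _ in range(num_partitions)]
    # In a distributed plan, the boundaries would be broadcast to every partitioning task.
    boundaries = sample_boundaries(keys, num_partitions)
    partitions = [[] for _ in range(num_partitions)]
    for key in keys:
        # Partition i holds keys between boundary i-1 (inclusive) and boundary i (exclusive).
        partitions[bisect.bisect_right(boundaries, key)].append(key)
    return [sorted(partition) for partition in partitions]
```

Concatenating the sorted partitions in order gives a globally sorted result, which is why the sampling step only needs to balance partition sizes, not determine correctness.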
  18. Tez UI
  19. Tez UI
  20. Tez UI
      • Download data from ATS
  21. RoadMap
      • Shared output edges
        – Same output to multiple vertices
      • Local mode stabilization
      • Optimizing (include/exclude) vertex at runtime
      • Partial completion VertexManager
      • Co-Scheduling
      • Framework stats for better runtime decisions
  22. Tez – Adoption
      • Apache Hive
        – Starting from Hive 0.13
        – set hive.execution.engine=tez;
      • Apache Pig
        – Starting from Pig 0.14
        – pig -x tez
      • Cascading
      • Flink
  23. Tez Community
      • Useful Links
        – http://tez.apache.org/
        – JIRA: https://issues.apache.org/jira/browse/TEZ
        – Code Repository: https://git-wip-us.apache.org/repos/asf/tez.git
        – Mailing Lists
          – Dev List: dev@tez.apache.org
          – User List: user@tez.apache.org
          – Issues List: issues@tez.apache.org
      • Tez Meetup
        – http://www.meetup.com/Apache-Tez-User-Group
  24. Thank You!
      Questions & Answers
