
Tez: Accelerating Data Pipelines - fifthel


Apache Tez at FifthElephant.in

Published in: Technology


  1. Accelerating Hadoop Data Pipelines
     FifthElephant.in 2014
     gopalv @ apache.org
     © Hortonworks Inc. 2014
  2. Tez – Introduction
     • Distributed execution framework targeted towards data-processing applications.
     • Based on expressing a computation as a dataflow graph.
     • Highly customizable to meet a broad spectrum of use cases.
     • Built on top of YARN – the resource management framework for Hadoop.
     • Open source Apache project and Apache licensed.
  3. Hadoop 1 -> Hadoop 2
     Hadoop 1.0 (monolithic)
     • HDFS (redundant, reliable storage)
     • MapReduce (cluster resource management & data processing)
     • Pig (data flow), Hive (SQL), others (Cascading)
     Hadoop 2.0 (layered)
     • HDFS2 (redundant, reliable storage)
     • YARN (cluster resource management)
     • Tez (execution engine): data flow (Pig), SQL (Hive), others (Cascading)
     • Batch: MapReduce
     • Real-time stream processing: Storm
     • Online data processing: HBase, Accumulo
     Monolithic
     • Resource Management
     • Execution Engine
     • User API
     Layered
     • Resource Management – YARN
     • Execution Engine – Tez
     • User API – Hive, Pig, Cascading, Your App!
  4. Tez – Design considerations
     Don’t solve problems that have already been solved. Or you will have to solve them again!
     • Leverage the discrete task-based compute model for elasticity, scalability and fault tolerance
     • Leverage several man-years of work in Hadoop Map-Reduce data shuffling operations
     • Leverage the proven resource sharing and multi-tenancy model for Hadoop and YARN
     • Leverage built-in security mechanisms in Hadoop for privacy and isolation
     Look to the Future with an eye on the Past
  5. Tez – Problems that it addresses
     • Expressing the computation
       – Direct and elegant representation of the data processing flow
       – Interfacing with application code and new technologies
     • Performance
       – Late binding: make decisions as late as possible, using real data available at runtime
       – Leverage the resources of the cluster efficiently
       – Just work out of the box!
       – Customizable engine to let applications tailor the job to meet their specific requirements
     • Operational simplicity
       – Painless to operate, experiment and upgrade
  6. Tez – Simplifying Operations
     • Tez is a pure YARN application. Easy and safe to try it out!
     • No deployments to do, no servers to run
     • Enables running different versions concurrently. Easy to test new functionality while keeping stable versions for production.
     • Leverages YARN local resources.
     [Diagram labels: Client Machine, TezClient, Node Manager, TezTask, HDFS, Tez Lib 1, Tez Lib 2]
  7. Tez – Expressing the computation
     Distributed data processing jobs typically look like DAGs (Directed Acyclic Graphs).
     • Vertices in the graph represent data transformations
     • Edges represent data movement from producers to consumers
     [Diagram: Distributed Sort. Labels: Preprocessor Stage, Sampler, Partition Stage, Aggregate Stage (Task-1 and Task-2 in each stage); Samples, Ranges]
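The vertex-and-edge picture on this slide can be made concrete with a small, self-contained sketch. It does not use the Tez API; it only models a DAG as an adjacency list and derives an order in which vertices could run, using Kahn's topological sort. The edges chosen for the distributed sort (preprocessor feeds sampler and partition, sampler feeds partition, partition feeds aggregate) are an assumption read off the diagram labels, and the class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative model only; not the Tez DAG API.
final class SortDagSketch {
    // Producer vertex -> consumer vertices, in insertion order.
    private final Map<String, List<String>> consumers = new LinkedHashMap<>();

    public void addVertex(String name) {
        consumers.putIfAbsent(name, new ArrayList<>());
    }

    // An edge represents data movement from a producer to a consumer.
    public void addEdge(String producer, String consumer) {
        addVertex(producer);
        addVertex(consumer);
        consumers.get(producer).add(consumer);
    }

    // Kahn's algorithm: a vertex becomes ready once all its producers are done.
    public List<String> executionOrder() {
        Map<String, Integer> pendingInputs = new LinkedHashMap<>();
        for (String v : consumers.keySet()) {
            pendingInputs.put(v, 0);
        }
        for (List<String> outs : consumers.values()) {
            for (String c : outs) {
                pendingInputs.merge(c, 1, Integer::sum);
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : pendingInputs.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String v = ready.remove();
            order.add(v);
            for (String c : consumers.get(v)) {
                if (pendingInputs.merge(c, -1, Integer::sum) == 0) {
                    ready.add(c);
                }
            }
        }
        if (order.size() != consumers.size()) {
            throw new IllegalStateException("cycle detected: not a DAG");
        }
        return order;
    }

    // Assumed edge layout for the distributed sort example.
    public static SortDagSketch distributedSort() {
        SortDagSketch dag = new SortDagSketch();
        dag.addEdge("preprocessor", "sampler");   // samples
        dag.addEdge("preprocessor", "partition"); // data to be partitioned
        dag.addEdge("sampler", "partition");      // ranges
        dag.addEdge("partition", "aggregate");
        return dag;
    }

    public static void main(String[] args) {
        System.out.println(distributedSort().executionOrder());
    }
}
```

With the assumed edges, the partition vertex only becomes ready after both the preprocessor and the sampler have finished, which is the dependency the Samples and Ranges labels suggest.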
  8. MR is a 2-vertex subset of Tez
  9. But Tez is so much more
  10. Tez – Expressing the computation
      Tez defines the following APIs to define the work
      • DAG API
        – Defines the structure of the data processing and the relationship between producers and consumers
        – Enables definition of complex data flow pipelines using simple graph connection APIs. Tez expands the logical DAG at runtime
        – This is how all the tasks in the job get specified
      • Runtime API
        – Defines the interface through which the framework and app code interact with each other
        – App code transforms data and moves it between tasks
        – This is how we specify what actually executes in each task on the cluster nodes
  11. Tez – DAG API
      // Define DAG
      DAG dag = new DAG();
      // Define Vertex
      Vertex source = new Vertex(Processor.class);
      // Define Edge
      Edge edge = new Edge(source, destination,
          SCATTER_GATHER, PERSISTED, SEQUENTIAL,
          Output.class, Input.class);
      // Connect them
      dag.addVertex(source).addEdge(edge)…
      Defines the global processing flow
      [Diagram: vertices map1, reduce1, map2, reduce2, join1; edges labeled Scatter_Gather, Bipartite, Sequential]
  12. Tez – Logical DAG expansion at Runtime
      [Diagram: vertices map1, reduce1, map2, reduce2, join1]
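To give a feel for what expanding a logical DAG means, here is a minimal sketch that counts the task-to-task connections one logical edge turns into once each vertex has a task count. It is not Tez code. The movement names follow the edge types mentioned in this deck (scatter-gather, broadcast, one-to-one); the edge layout (map1 to reduce1, map2 to reduce2, both reducers to join1) and the task counts are assumptions made for illustration.

```java
// Illustrative model only; not the Tez runtime.
final class DagExpansionSketch {
    /** Data-movement patterns named on the slides. */
    enum Movement { SCATTER_GATHER, BROADCAST, ONE_TO_ONE }

    /** Number of task-to-task connections a single logical edge expands into. */
    public static int physicalConnections(Movement movement, int producerTasks, int consumerTasks) {
        if (producerTasks <= 0 || consumerTasks <= 0) {
            throw new IllegalArgumentException("task counts must be positive");
        }
        switch (movement) {
            case ONE_TO_ONE:
                // Task i of the producer feeds only task i of the consumer.
                if (producerTasks != consumerTasks) {
                    throw new IllegalArgumentException("one-to-one needs equal task counts");
                }
                return producerTasks;
            case SCATTER_GATHER: // every consumer reads one partition from every producer
            case BROADCAST:      // every consumer reads the full output of every producer
                return producerTasks * consumerTasks;
            default:
                throw new IllegalStateException("unknown movement: " + movement);
        }
    }

    /** Total connections for the five-vertex DAG under assumed task counts. */
    public static int exampleTotal() {
        int map1 = 4, reduce1 = 2, map2 = 3, reduce2 = 2, join1 = 2; // hypothetical parallelism
        return physicalConnections(Movement.SCATTER_GATHER, map1, reduce1)
             + physicalConnections(Movement.SCATTER_GATHER, map2, reduce2)
             + physicalConnections(Movement.SCATTER_GATHER, reduce1, join1)
             + physicalConnections(Movement.SCATTER_GATHER, reduce2, join1);
    }

    public static void main(String[] args) {
        System.out.println("physical connections: " + exampleTotal());
    }
}
```

Scatter-gather and broadcast produce the same number of connections here; they differ in how much data travels over each connection, which this count does not capture.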
  13. Tez – Library of Inputs and Outputs
      • Classical ‘Map’: HDFS Input -> Map Processor -> Sorted Output
      • Classical ‘Reduce’: Shuffle Input -> Reduce Processor -> HDFS Output
      • Intermediate ‘Reduce’ for Map-Reduce-Reduce: Shuffle Input -> Reduce Processor -> Sorted Output
      • What is built in?
        – Hadoop InputFormat/OutputFormat
        – SortedGroupedPartitioned Key-Value Input/Output
        – UnsortedGroupedPartitioned Key-Value Input/Output
        – Key-Value Input/Output
  14. Tez – Broadcast Edge
      Hive: Broadcast Join
      SELECT ss.ss_item_sk, ss.ss_quantity, avg_price, inv.inv_quantity_on_hand
      FROM (select avg(ss_sold_price) as avg_price, ss_item_sk, ss_quantity_sk
            from store_sales
            group by ss_item_sk) ss
      JOIN inventory inv
      ON (inv.inv_item_sk = ss.ss_item_sk);
      • Hive – MR
        – Store Sales scan. Group by and aggregation.
        – Inventory and Store Sales (aggr.) output scan and shuffle join.
      • Hive – Tez
        – Store Sales scan. Group by and aggregation reduce size of this input.
        – Inventory scan and Join
        – Broadcast edge
  15. Tez – Custom Edge
      Hive: Dynamically Partitioned Hash Join
      SELECT ss.ss_item_sk, ss.ss_quantity, inv.inv_quantity_on_hand
      FROM store_sales ss
      JOIN inventory inv
      ON (inv.inv_item_sk = ss.ss_item_sk);
      • Hive – MR
        – Inventory scan (runs as a single local map task)
        – Store Sales scan and Join (Inventory hash table read as side file)
      • Hive – Tez
        – Inventory scan (runs on the cluster, potentially on more than 1 mapper)
        – Store Sales scan and Join (custom vertex reads both inputs – no side file reads)
        – Custom edge (routes outputs of the previous stage to the correct Mappers of the next stage)
  16. Tez – Multiple Outputs
      Hive: Multi-insert queries
      FROM (SELECT * FROM store_sales, date_dim
            WHERE ss_sold_date_sk = d_date_sk and d_year = 2000)
      INSERT INTO TABLE t1 SELECT distinct ss_item_sk
      INSERT INTO TABLE t2 SELECT distinct ss_customer_sk;
      • Hive – MR
        – Map join date_dim/store sales
        – Materialize join on HDFS
        – Two MR jobs to do the distinct
      • Hive – Tez
        – Broadcast Join (scan date_dim, join store sales)
        – Distinct for customer + items
  17. Tez – One to One Edge
      Pig: Skewed Join
      l = LOAD 'left' AS (x, y);
      r = LOAD 'right' AS (x, z);
      j = JOIN l BY x, r BY x USING 'skewed';
      • Pig – MR
        – Load & Sample, Aggregate, Partition L, Partition R, Join
        – Stage sample map on distributed cache
      • Pig – Tez
        – Load & Sample, Aggregate, Partition L and Partition R, Join
        – Pass through input via 1-1 edge
        – Broadcast sample map
  18. Tez – Bringing it all together
      [Figure: TPCDS – Query-27 with Hive on Tez. Annotations:]
      • Tez Session populates container pool
      • Dimension table calculation and HDFS split generation in parallel
      • Dimension tables broadcast to Hive MapJoin tasks
      • Final Reducer pre-launched and fetches completed inputs
  19. Tez – Performance
      • Benefits of expressing the data processing as a DAG
        – Reducing overheads and queuing effects
        – Gives the system the global picture for better planning
      • Efficient use of resources
        – Re-use resources to maximize utilization
        – Pre-launch, pre-warm and cache
        – Locality & resource aware scheduling
      • Support for application-defined DAG modifications at runtime for optimized execution
        – Change task concurrency
        – Change task scheduling
        – Change DAG edges
        – Change DAG vertices
  20. Tez – Benefits of DAG execution
      • Faster Execution and Higher Predictability
        – Eliminate the replicated write barrier between successive computations.
        – Eliminate the job launch overhead of workflow jobs.
        – Eliminate the extra stage of map reads in every workflow job.
        – Eliminate the queue and resource contention suffered by workflow jobs that are started after a predecessor job completes.
        – Better locality because the engine has the global picture
      [Diagram: Pig/Hive - MR vs Pig/Hive - Tez]
  21. Tez – Container Re-Use
      • Reuse YARN containers/JVMs to launch new tasks
      • Reduce scheduling and launching delays
      • Shared in-memory data across tasks
      • JVM JIT friendly execution
      [Diagram labels: Tez Application Master, YARN Container / JVM, TezTask Host, TezTask1, TezTask2, Shared Objects, Start Task, Task Done]
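The effect of container re-use on launch count can be illustrated with a toy simulation. This is a sketch under simplifying assumptions, not how the Tez Application Master is implemented: tasks run in fixed waves, every task takes one wave, and a container is either returned to an idle pool (re-use) or discarded after its task. All names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy simulation; not the Tez scheduler.
final class ContainerReuseSketch {
    /** Returns how many containers (JVMs) get launched to run taskCount tasks. */
    public static int containerLaunches(int taskCount, int maxContainers, boolean reuse) {
        if (taskCount < 0 || maxContainers <= 0) {
            throw new IllegalArgumentException("need taskCount >= 0 and maxContainers > 0");
        }
        Deque<Integer> idle = new ArrayDeque<>(); // ids of containers waiting for work
        int launches = 0;
        int remaining = taskCount;
        while (remaining > 0) {
            int wave = Math.min(remaining, maxContainers);
            List<Integer> busy = new ArrayList<>();
            for (int i = 0; i < wave; i++) {
                // Prefer an idle container; launch a new one only when none is available.
                Integer container = reuse ? idle.poll() : null;
                if (container == null) {
                    container = launches++;
                }
                busy.add(container);
            }
            remaining -= wave;
            if (reuse) {
                idle.addAll(busy); // "Task Done": the container goes back to the pool
            }
        }
        return launches;
    }

    public static void main(String[] args) {
        System.out.println("without re-use: " + containerLaunches(10, 3, false));
        System.out.println("with re-use:    " + containerLaunches(10, 3, true));
    }
}
```

For 10 tasks on at most 3 concurrent containers, the model launches 10 containers without re-use and 3 with it. The saved launches stand in for the scheduling and JVM start-up delays the slide refers to.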
  22. Tez – Sessions
      • Standard concepts of pre-launch and pre-warm applied
      • Key for interactive queries
      • Represents a connection between the user and the cluster
      • Multiple DAGs/queries executed in the same AM
      • Containers re-used across queries
      • Takes care of data locality and releasing resources when idle
      [Diagram labels: Client, Start Session, Submit DAG, Application Master, Task Scheduler, Container Pool, Shared Object Registry, Pre-Warmed JVM]
  23. Tez – Re-Use in Action
      [Figure: Task Execution Timeline]
  24. Tez – Customizable Core Engine
      • Vertex Manager
        – Determines task parallelism
        – Determines when tasks in a vertex can start
      • DAG Scheduler
        – Determines priority of task
      • Task Scheduler
        – Allocates containers from YARN and assigns them to tasks
      [Diagram labels: Vertex-1, Vertex-2, Start vertex, Vertex Manager, Start tasks, DAG Scheduler, Get Priority, Task Scheduler, Get container]
  25. Tez – Theory to Practice
      • In theory, there is no difference between theory and practice.
      • But, in practice, there is.
  26. Tez – Data at scale
      [Figure: Hive TPC-DS, Scale 10TB]
  27. Tez – Pig performance gains
      • Demonstrate performance gains from a basic translation to a Tez DAG
      • Deeper integration in the works for further boost
      [Chart: time in secs, MR vs Tez. Replicated Join (2.8x); Join + Groupby (1.5x); Join + Groupby + Orderby (1.5x); 3 way Split + Join + Groupby + Orderby (2.6x)]
  28. Tez – Iterative algorithms
      • Pig can do iterative algorithms on top of Tez
      • This uses heavy-weight iteration (for-loop + map)
      • Future work for faster loop-unrolled out-of-order iteration
      • 1-1 edges between loops allow building morsel-style parallelism
      [Chart: k-means, time in secs by iteration count (10, 50, 100), MR vs Tez; speedup labels 14.84X, 13.12X, 5.37X]
      * Source code at http://hortonworks.com/blog/new-apache-pig-features-part-2-embedding
  29. Tez – Designed for big, busy clusters
      • Number of stages in the DAG
        – The more stages the DAG has, the better Tez performs relative to MR.
      • Cluster/queue capacity
        – The more congested a queue is, the better Tez performs relative to MR, due to container reuse.
      • Size of intermediate output
        – The larger the intermediate output, the better Tez performs relative to MR, due to reduced HDFS usage (cross-rack traffic).
      • Size of data in the job
        – For smaller data and more stages, Tez performs better relative to MR because launch overhead is a high percentage of the total time for smaller jobs.
      • Move workloads from gateway boxes to the cluster
        – Move as much work as possible to the cluster by modelling it via the job DAG. Exploit the parallelism and resources of the cluster.
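The point about launch overhead and data size can be checked with back-of-the-envelope arithmetic. The sketch below assumes a deliberately crude cost model that is not from the deck: a chain of MR jobs pays one fixed launch overhead per stage, a single DAG pays it once, and the actual processing time is the same in both cases. The overhead and work figures are hypothetical.

```java
// Crude cost model for illustration; all numbers are hypothetical.
final class LaunchOverheadSketch {
    /**
     * Estimated speedup of one DAG over a chain of separate jobs,
     * where each job in the chain pays launchSecs and the DAG pays it once.
     */
    public static double speedup(int stages, double launchSecs, double workSecs) {
        if (stages <= 0 || launchSecs < 0 || workSecs < 0) {
            throw new IllegalArgumentException("invalid inputs");
        }
        double chainedJobs = stages * launchSecs + workSecs;
        double singleDag = launchSecs + workSecs;
        return chainedJobs / singleDag;
    }

    public static void main(String[] args) {
        // Small job: 40 s of work, 4 stages, 20 s launch overhead per job.
        System.out.printf("small job: %.2fx%n", speedup(4, 20.0, 40.0));
        // Large job: same stages and overhead, 4000 s of work.
        System.out.printf("large job: %.2fx%n", speedup(4, 20.0, 4000.0));
    }
}
```

Under this model the small job comes out at 2x while the large job gains under 2%, because the fixed overhead is a much smaller share of its total time. The model ignores the other effects listed on the slide, such as container reuse and reduced HDFS writes.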
  30. Tez – What if you can’t get enough containers?
      • 78 vertices + 8374 tasks on 50 YARN containers
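A quick calculation shows what those numbers imply for re-use. The sketch below computes a lower bound on the number of task waves, assuming every container runs exactly one task at a time; dependencies between the 78 vertices can only increase the real figure. The class and method names are hypothetical.

```java
// Lower-bound arithmetic only; ignores dependencies between vertices.
final class TaskWavesSketch {
    /** Minimum number of waves needed to run all tasks, one task per container at a time. */
    public static int minimumWaves(int tasks, int containers) {
        if (tasks < 0 || containers <= 0) {
            throw new IllegalArgumentException("need tasks >= 0 and containers > 0");
        }
        return (tasks + containers - 1) / containers; // integer ceiling division
    }

    public static void main(String[] args) {
        System.out.println("minimum waves: " + minimumWaves(8374, 50)); // prints "minimum waves: 168"
    }
}
```

With 8374 tasks and 50 containers, each container has to run about 167 tasks on average, so the job depends on containers being re-used many times over.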
  31. Tez – Adoption
      • Hive
        – Hadoop standard for declarative access via SQL-like interface
      • Pig
        – Hadoop standard for procedural scripting and pipeline processing
      • Cascading
        – Developer friendly Java API and SDK
        – Scalding (Scala API on Cascading)
      • Commercial Vendors
        – ETL: Use Tez instead of MR or custom pipelines
        – Analytics Vendors: Use Tez as a target platform for scaling parallel analytical tools to large data-sets
  32. Tez – Roadmap
      • Richer DAG support
        – Addition of vertices at runtime
        – Shared edges for shared outputs
        – Enhance Input/Output collections
      • Performance optimizations
        – Improve throughput at high concurrency
        – Improve locality aware scheduling (co-scheduling)
        – Add framework level data statistics
        – HDFS memory storage integration
      • Usability
        – Stability and testability
        – API ease of use
        – Tools for performance analysis and debugging
  33. Tez – Community
      • Early adopters and code contributors welcome
        – Adopters to drive more scenarios. Contributors to make them happen.
      • Technical blog series
        – http://hortonworks.com/blog/apache-tez-a-new-chapter-in-hadoop-data-processing
      • Useful links
        – Work tracking: https://issues.apache.org/jira/browse/TEZ
        – Code: https://github.com/apache/tez
        – Developer list: dev@tez.apache.org
        – User list: user@tez.apache.org
        – Issues list: issues@tez.apache.org
