Transcript of "Tez: Accelerating Data Pipelines - fifthel"

1. Accelerating Hadoop Data Pipelines
   FifthElephant.in 2014
   gopalv @ apache.org
   © Hortonworks Inc. 2014
2. Tez – Introduction
   • Distributed execution framework targeted towards data-processing applications.
   • Based on expressing a computation as a dataflow graph.
   • Highly customizable to meet a broad spectrum of use cases.
   • Built on top of YARN – the resource management framework for Hadoop.
   • Open source Apache project and Apache licensed.
3. Hadoop 1 -> Hadoop 2
   • HADOOP 1.0 (monolithic): HDFS (redundant, reliable storage) with MapReduce handling both cluster resource management and data processing; Pig (data flow), Hive (SQL) and others (Cascading) layered on MapReduce.
   • HADOOP 2.0 (layered): HDFS2 (redundant, reliable storage); YARN (cluster resource management); Tez (execution engine); on top, data flow (Pig), SQL (Hive), others (Cascading), batch (MapReduce), real-time stream processing (Storm), online data processing (HBase, Accumulo).
   • Monolithic: resource management, execution engine and user API in one.
   • Layered: resource management – YARN; execution engine – Tez; user API – Hive, Pig, Cascading, your app!
4. Tez – Design considerations
   Don't solve problems that have already been solved. Or you will have to solve them again!
   • Leverage the discrete task-based compute model for elasticity, scalability and fault tolerance
   • Leverage several man-years of work on Hadoop MapReduce data shuffling operations
   • Leverage the proven resource sharing and multi-tenancy model of Hadoop and YARN
   • Leverage built-in security mechanisms in Hadoop for privacy and isolation
   Look to the future with an eye on the past.
5. Tez – Problems that it addresses
   • Expressing the computation
     – Direct and elegant representation of the data processing flow
     – Interfacing with application code and new technologies
   • Performance
     – Late binding: make decisions as late as possible, using real data available at runtime
     – Leverage the resources of the cluster efficiently
     – Just work out of the box!
     – Customizable engine to let applications tailor the job to their specific requirements
   • Operational simplicity
     – Painless to operate, experiment and upgrade
6. Tez – Simplifying Operations
   • Tez is a pure YARN application. Easy and safe to try it out!
   • No deployments to do, no servers to run
   • Enables running different versions concurrently. Easy to test new functionality while keeping stable versions for production.
   • Leverages YARN local resources.
   (Diagram: client machines running TezClient submit jobs; Tez libraries (Tez Lib 1, Tez Lib 2) live in HDFS; node managers launch TezTask containers.)
7. Tez – Expressing the computation
   Distributed data processing jobs typically look like DAGs (Directed Acyclic Graphs).
   • Vertices in the graph represent data transformations
   • Edges represent data movement from producers to consumers
   (Diagram: a distributed sort – a Preprocessor stage feeds a Sampler, whose sampled ranges drive a Partition stage, which feeds an Aggregate stage; each stage runs multiple tasks.)
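The vertices-and-edges structure described on this slide can be modelled with a minimal sketch. This is illustrative only, not the Tez API: the `Dag` class, its methods, and the stage names are all hypothetical. Vertices are transformation names, edges record producer-to-consumer movement, and a topological sort yields a valid execution order (producers always before consumers).

```java
import java.util.*;

// Minimal DAG sketch (hypothetical, not the Tez API): vertices are stage
// names, edges are producer -> consumer data movement.
public class Dag {
    private final Map<String, List<String>> edges = new LinkedHashMap<>();

    public Dag vertex(String name) { edges.putIfAbsent(name, new ArrayList<>()); return this; }
    public Dag edge(String from, String to) { vertex(from); vertex(to); edges.get(from).add(to); return this; }

    // Kahn's algorithm: any resulting order runs producers before consumers.
    public List<String> topoOrder() {
        Map<String, Integer> indeg = new LinkedHashMap<>();
        for (String v : edges.keySet()) indeg.put(v, 0);
        for (List<String> ts : edges.values())
            for (String t : ts) indeg.merge(t, 1, Integer::sum);
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : indeg.entrySet())
            if (e.getValue() == 0) ready.add(e.getKey());
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String v = ready.poll();
            order.add(v);
            for (String t : edges.get(v))
                if (indeg.merge(t, -1, Integer::sum) == 0) ready.add(t);
        }
        return order;
    }
}
```

Building the distributed-sort stages from the slide and asking for a topological order gives the intuition behind "the engine has the global picture": the whole pipeline is known up front, not discovered one MR job at a time.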
8. MR is a 2-vertex subset of Tez
9. But Tez is so much more
10. Tez – Expressing the computation
    Tez defines the following APIs to define the work:
    • DAG API
      – Defines the structure of the data processing and the relationship between producers and consumers
      – Enables definition of complex data flow pipelines using simple graph connection APIs. Tez expands the logical DAG at runtime.
      – This is how all the tasks in the job get specified.
    • Runtime API
      – Defines the interface through which the framework and app code interact with each other
      – App code transforms data and moves it between tasks
      – This is how we specify what actually executes in each task on the cluster nodes.
11. Tez – DAG API
    // Define DAG
    DAG dag = new DAG();
    // Define Vertex
    Vertex source = new Vertex(Processor.class);
    // Define Edge
    Edge edge = Edge(source, destination,
        SCATTER_GATHER, PERSISTED, SEQUENTIAL,
        Output.class, Input.class);
    // Connect them
    dag.addVertex(source).addEdge(edge)…
    Defines the global processing flow.
    (Diagram: map1 and map2 feed reduce1 and reduce2 over scatter-gather, bipartite, sequential edges; both reduces feed join1.)
12. Tez – Logical DAG expansion at Runtime
    (Diagram: the logical vertices map1, map2, reduce1, reduce2 and join1 each expand into multiple parallel tasks at runtime.)
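A rough sketch of what "expansion" means for one scatter-gather edge, under the assumption (consistent with the MapReduce shuffle) that every producer task sends one partition to each consumer task. The `Expansion` class and its naming scheme are hypothetical, purely to make the fan-out countable:

```java
import java.util.*;

// Sketch (not Tez code): expand one logical scatter-gather edge into the
// physical task-to-task connections -- every producer task feeds every
// consumer task, so a logical edge becomes srcTasks * dstTasks connections.
public class Expansion {
    public static List<String> expandScatterGather(String src, int srcTasks,
                                                   String dst, int dstTasks) {
        List<String> conns = new ArrayList<>();
        for (int i = 0; i < srcTasks; i++)
            for (int j = 0; j < dstTasks; j++)
                conns.add(src + "_" + i + " -> " + dst + "_" + j);
        return conns;
    }
}
```

The point of doing this at runtime rather than at submit time is that task counts (`srcTasks`, `dstTasks`) can be picked late, from real data sizes, as the earlier "late binding" slide describes.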
13. Tez – Library of Inputs and Outputs
    • Classical 'Map': Map Processor with HDFS Input and Sorted Output
    • Classical 'Reduce': Reduce Processor with Shuffle Input and HDFS Output
    • Intermediate 'Reduce' for Map-Reduce-Reduce: Reduce Processor with Shuffle Input and Sorted Output
    What is built in?
    – Hadoop InputFormat/OutputFormat
    – SortedGroupedPartitioned key-value Input/Output
    – UnsortedGroupedPartitioned key-value Input/Output
    – Key-value Input/Output
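To make "SortedGroupedPartitioned key-value" concrete, here is a toy model of that contract (the `KvOutput` class is hypothetical, not the Tez library class): records are partitioned by key hash, and within each partition keys are kept sorted with their values grouped, which is exactly what a downstream reduce-style consumer needs to iterate keys in order.

```java
import java.util.*;

// Toy model (not Tez code) of a SortedGroupedPartitioned key-value output:
// partition by key hash, then keep each partition sorted by key (TreeMap)
// with all values for a key grouped together.
public class KvOutput {
    public static List<TreeMap<String, List<Integer>>> write(
            List<Map.Entry<String, Integer>> recs, int partitions) {
        List<TreeMap<String, List<Integer>>> parts = new ArrayList<>();
        for (int p = 0; p < partitions; p++) parts.add(new TreeMap<>());
        for (Map.Entry<String, Integer> r : recs) {
            int p = Math.floorMod(r.getKey().hashCode(), partitions); // partitioned
            parts.get(p)                                              // sorted + grouped
                 .computeIfAbsent(r.getKey(), k -> new ArrayList<>())
                 .add(r.getValue());
        }
        return parts;
    }
}
```

The unsorted variant in the list above would drop the ordering guarantee (a plain `HashMap` per partition) and save the sort cost when the consumer does not need ordered keys.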
14. Tez – Broadcast Edge
    Hive: broadcast join.
    SELECT ss.ss_item_sk, ss.ss_quantity, avg_price, inv.inv_quantity_on_hand
    FROM (SELECT avg(ss_sold_price) AS avg_price, ss_item_sk, ss_quantity_sk
          FROM store_sales GROUP BY ss_item_sk) ss
    JOIN inventory inv ON (inv.inv_item_sk = ss.ss_item_sk);
    • Hive – MR: store_sales scan with group-by and aggregation (which reduces the size of this input) writes to HDFS; a second job scans inventory and the aggregated store_sales output and does a shuffle join.
    • Hive – Tez: store_sales scan with group-by and aggregation feeds the inventory scan-and-join vertex directly over a broadcast edge, with no intermediate HDFS write.
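The join that the broadcast edge enables can be sketched as follows. This is the general broadcast hash join idea, not Hive/Tez source code, and the names are illustrative: the small aggregated side is built into an in-memory hash table that every scan task of the large side probes locally, so the large table is never shuffled.

```java
import java.util.*;

// Sketch of a broadcast hash join (illustrative, not Hive/Tez code):
// smallSide = item_sk -> avg_price (the aggregated store_sales subquery,
// broadcast to every task); largeSide = inventory rows {item_sk, quantity}.
public class BroadcastJoin {
    public static List<String> join(Map<Integer, Double> smallSide, List<int[]> largeSide) {
        List<String> out = new ArrayList<>();
        for (int[] row : largeSide) {                // scan the big side once
            Double avgPrice = smallSide.get(row[0]); // local hash probe, no shuffle
            if (avgPrice != null)
                out.add(row[0] + "," + row[1] + "," + avgPrice);
        }
        return out;
    }
}
```

This only pays off when the broadcast side fits in memory on each task, which is why the group-by "reduces the size of this input" matters in the plan above.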
15. Tez – Custom Edge
    Hive: dynamically partitioned hash join.
    SELECT ss.ss_item_sk, ss.ss_quantity, inv.inv_quantity_on_hand
    FROM store_sales ss
    JOIN inventory inv ON (inv.inv_item_sk = ss.ss_item_sk);
    • Hive – MR: inventory scan runs as a single local map task; the store_sales scan-and-join tasks read the inventory hash table as a side file from HDFS.
    • Hive – Tez: inventory scan runs on the cluster (potentially more than one mapper); a custom edge routes outputs of the previous stage to the correct mappers of the next stage, and a custom vertex reads both inputs with no side-file reads.
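One plausible reading of "routes outputs of the previous stage to the correct mappers" is partition-aligned routing: partition p from every producer task goes only to consumer task p, instead of scatter-gather's all-to-all pattern. The sketch below is hypothetical (the class and naming scheme are mine, not a Tez edge implementation) and only illustrates that routing rule:

```java
// Sketch of a custom edge's routing rule (illustrative, not Tez code):
// with producerTasks producers each emitting numbered partitions, partition p
// from every producer is routed to the single consumer task p -- unlike
// scatter-gather, where routing is all-to-all.
public class CustomEdgeRouting {
    public static String[] routesForPartition(int partition, int producerTasks) {
        String[] routes = new String[producerTasks];
        for (int t = 0; t < producerTasks; t++)
            routes[t] = "producer_" + t + ":part_" + partition + " -> consumer_" + partition;
        return routes;
    }
}
```

Aligning partitions this way lets each join task build its inventory hash table from exactly the partitions it needs, which is what removes the side-file reads in the Tez plan.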
16. Tez – Multiple Outputs
    Hive: multi-insert queries.
    FROM (SELECT * FROM store_sales, date_dim
          WHERE ss_sold_date_sk = d_date_sk AND d_year = 2000)
    INSERT INTO TABLE t1 SELECT DISTINCT ss_item_sk
    INSERT INTO TABLE t2 SELECT DISTINCT ss_customer_sk;
    • Hive – MR: map join of date_dim/store_sales, the join materialized on HDFS, then two MR jobs to do the distincts.
    • Hive – Tez: broadcast join (scan date_dim, join store_sales); one vertex feeds the distinct computation for customers and items, with no materialization of the join on HDFS.
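The multiple-outputs idea reduces to: scan the joined rows once and feed two consumers from that single pass. A toy single-process version (the `MultiOutput` class and output names are hypothetical; in Tez the two distinct-sets would be separate downstream vertices):

```java
import java.util.*;

// Toy model of a vertex with two outputs (illustrative, not Tez code):
// one pass over the joined rows {item_sk, customer_sk} feeds both the
// distinct-items output (t1) and the distinct-customers output (t2).
public class MultiOutput {
    public static Map<String, Set<Integer>> scanOnce(List<int[]> rows) {
        Set<Integer> items = new TreeSet<>();
        Set<Integer> customers = new TreeSet<>();
        for (int[] r : rows) {      // single scan, two outputs
            items.add(r[0]);
            customers.add(r[1]);
        }
        Map<String, Set<Integer>> out = new LinkedHashMap<>();
        out.put("t1_items", items);
        out.put("t2_customers", customers);
        return out;
    }
}
```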
17. Tez – One to One Edge
    Pig: skewed join.
    l = LOAD 'left' AS (x, y);
    r = LOAD 'right' AS (x, z);
    j = JOIN l BY x, r BY x USING 'skewed';
    • Pig – MR: one job to aggregate and sample L, staging the sample map on the distributed cache via HDFS; then a job to partition and join.
    • Pig – Tez: load-and-sample feeds the aggregate, the sample map is broadcast to Partition L and Partition R, and the loaded input is passed through to the join via a 1-1 edge, all in one DAG.
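The sampling step behind a skewed join can be sketched as follows. This is the general idea, not Pig's actual implementation: count keys in a sample and flag any key whose share exceeds a threshold, so its rows can later be spread across extra reducers instead of landing on one. The `SkewSampler` class and the threshold convention are assumptions for illustration.

```java
import java.util.*;

// Sketch of skew detection (illustrative, not Pig source): a key is "heavy"
// if its share of the sampled keys exceeds the threshold; heavy keys get
// split across multiple reducers in the subsequent join.
public class SkewSampler {
    public static Set<String> heavyKeys(List<String> sampleKeys, double threshold) {
        Map<String, Integer> counts = new HashMap<>();
        for (String k : sampleKeys) counts.merge(k, 1, Integer::sum);
        Set<String> heavy = new TreeSet<>();
        int n = sampleKeys.size();
        counts.forEach((k, c) -> { if ((double) c / n > threshold) heavy.add(k); });
        return heavy;
    }
}
```

The 1-1 edge is what lets the already-loaded input flow straight into the join task that handled its sample, rather than being re-read from HDFS as in the MR plan.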
18. Tez – Bringing it all together
    TPC-DS Query 27 with Hive on Tez:
    • Tez session populates the container pool
    • Dimension table calculation and HDFS split generation in parallel
    • Dimension tables broadcasted to Hive MapJoin tasks
    • Final reducer pre-launched and fetches completed inputs
19. Tez – Performance
    • Benefits of expressing the data processing as a DAG
      – Reducing overheads and queuing effects
      – Gives the system the global picture for better planning
    • Efficient use of resources
      – Re-use resources to maximize utilization
      – Pre-launch, pre-warm and cache
      – Locality- and resource-aware scheduling
    • Support for application-defined DAG modifications at runtime for optimized execution
      – Change task concurrency
      – Change task scheduling
      – Change DAG edges
      – Change DAG vertices
20. Tez – Benefits of DAG execution
    • Faster execution and higher predictability
      – Eliminate the replicated write barrier between successive computations.
      – Eliminate job launch overhead of workflow jobs.
      – Eliminate the extra stage of map reads in every workflow job.
      – Eliminate queue and resource contention suffered by workflow jobs that are started after a predecessor job completes.
      – Better locality, because the engine has the global picture.
    (Diagram: Pig/Hive on MR runs a chain of separate MR jobs; Pig/Hive on Tez runs the same pipeline as one DAG.)
21. Tez – Container Re-Use
    • Reuse YARN containers/JVMs to launch new tasks
    • Reduce scheduling and launching delays
    • Shared in-memory data across tasks
    • JVM JIT-friendly execution
    (Diagram: the Tez Application Master exchanges Start Task/Task Done messages with a YARN container; one container/JVM hosts TezTask1 then TezTask2 with shared objects.)
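A rough analogy for container re-use, not Tez scheduler code: like Tez keeping a warm JVM alive and handing it new tasks, a fixed thread pool runs many tasks without paying a per-task startup cost. The `ReusePool` class is hypothetical and exists only to make the analogy runnable.

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

// Analogy only (not Tez code): two pooled threads play the role of two warm
// containers -- every submitted task runs without a fresh "JVM launch".
public class ReusePool {
    public static int runTasks(int nTasks) {
        ExecutorService pool = Executors.newFixedThreadPool(2); // two warm "containers"
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < nTasks; i++)
            pool.submit(done::incrementAndGet);                 // many tasks, no startup cost
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return done.get();
    }
}
```

The JIT point on the slide follows from the same property: because the JVM survives across tasks, hot code paths stay compiled instead of being re-interpreted in a fresh process.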
22. Tez – Sessions
    • Standard concepts of pre-launch and pre-warm applied
    • Key for interactive queries
    • Represents a connection between the user and the cluster
    • Multiple DAGs/queries executed in the same AM
    • Containers re-used across queries
    • Takes care of data locality and releasing resources when idle
    (Diagram: the client starts a session and submits DAGs to an Application Master that holds a task scheduler, a container pool of pre-warmed JVMs, and a shared object registry.)
23. Tez – Re-Use in Action
    (Chart: task execution timeline showing containers being re-used across tasks.)
24. Tez – Customizable Core Engine
    • Vertex Manager
      – Determines task parallelism
      – Determines when tasks in a vertex can start
    • DAG Scheduler: determines the priority of each task
    • Task Scheduler: allocates containers from YARN and assigns them to tasks
    (Diagram: starting a vertex goes through the Vertex Manager, which starts tasks, consults the DAG Scheduler for priorities and the Task Scheduler for containers.)
25. Tez – Theory to Practice
    • In theory, there is no difference between theory and practice.
    • But, in practice, there is.
26. Tez – Data at scale
    Hive TPC-DS at scale 10TB.
    (Chart: Hive query times at the 10TB scale.)
27. Tez – Pig performance gains
    • Demonstrates performance gains from a basic translation to a Tez DAG
    • Deeper integration in the works for further boost
    (Chart: time in seconds, MR vs Tez – Replicated Join (2.8x), Join + Groupby (1.5x), Join + Groupby + Orderby (1.5x), 3-way Split + Join + Groupby + Orderby (2.6x).)
28. Tez – iterative algorithms
    • Pig can do iterative algorithms on top of Tez
    • This uses heavy-weight iteration (for-loop + map)
    • Future work for faster loop-unrolled out-of-order iteration
    • 1-1 edges between loops allow building morsel-style parallelism
    (Chart: k-means time in seconds at 10/50/100 iterations, MR vs Tez – speedups of 14.84x, 13.12x, 5.37x.)
    * Source code at http://hortonworks.com/blog/new-apache-pig-features-part-2-embedding
29. Tez – Designed for big, busy clusters
    • Number of stages in the DAG
      – The higher the number of stages in the DAG, the better the performance of Tez (over MR) will be.
    • Cluster/queue capacity
      – The more congested a queue is, the better the performance of Tez (over MR) will be, due to container reuse.
    • Size of intermediate output
      – The larger the intermediate output, the better the performance of Tez (over MR) will be, due to reduced HDFS usage (cross-rack traffic).
    • Size of data in the job
      – For smaller data and more stages, the performance of Tez (over MR) will be better, as the percentage of launch overhead in the total time is high for smaller jobs.
    • Move workloads from gateway boxes to the cluster
      – Move as much work as possible to the cluster by modelling it via the job DAG. Exploit the parallelism and resources of the cluster.
30. Tez – what if you can't get enough containers?
    • 78 vertices + 8374 tasks on 50 YARN containers
31. Tez – Adoption
    • Hive
      – Hadoop standard for declarative access via a SQL-like interface
    • Pig
      – Hadoop standard for procedural scripting and pipeline processing
    • Cascading
      – Developer-friendly Java API and SDK
      – Scalding (Scala API on Cascading)
    • Commercial vendors
      – ETL: use Tez instead of MR or custom pipelines
      – Analytics vendors: use Tez as a target platform for scaling parallel analytical tools to large data-sets
32. Tez – Roadmap
    • Richer DAG support
      – Addition of vertices at runtime
      – Shared edges for shared outputs
      – Enhance Input/Output collections
    • Performance optimizations
      – Improve throughput at high concurrency
      – Improve locality-aware scheduling (co-scheduling)
      – Add framework-level data statistics
      – HDFS memory storage integration
    • Usability
      – Stability and testability
      – API ease of use
      – Tools for performance analysis and debugging
33. Tez – Community
    • Early adopters and code contributors welcome
      – Adopters to drive more scenarios. Contributors to make them happen.
    • Technical blog series
      – http://hortonworks.com/blog/apache-tez-a-new-chapter-in-hadoop-data-processing
    • Useful links
      – Work tracking: https://issues.apache.org/jira/browse/TEZ
      – Code: https://github.com/apache/tez
      – Developer list: dev@tez.apache.org; user list: user@tez.apache.org; issues list: issues@tez.apache.org
