Hadoop - Past, Present and Future - v2.0

A session focused on ramping you up on what Hadoop is, how it works and what it's capable of. We will also look at what Hadoop 2.x and YARN bring to the table and some future projects in the Hadoop space to keep an eye on.

Published in: Data & Analytics

  1. BIG DATA INTELLIGENCE PRACTICE. HADOOP: PAST, PRESENT AND FUTURE. © 2014 Trace3, All rights reserved.
  2. Roadmap (~1 hour): (1) What Makes Up Hadoop 1.x? (2) What's New In Hadoop 2.x? (3) The Future Of Hadoop …
  3. WHAT MAKES UP HADOOP 1.0?
  4. What's a "Node"? Node aka Server: Compute, Storage, Operating System, Memory.
  5. Hadoop 1.0: HDFS + MapReduce. [Diagram: Client, NameNode, JobTracker and four DataNode / TaskTracker nodes; labels 1-1, 1-2, 1-3.]
  6. Hadoop 1.0: HDFS + MapReduce. [Diagram: Client, NameNode, JobTracker and four DataNode / TaskTracker nodes running Map and Reduce; labels 1-1, 1-2, 1-3, 2-1, 2-2, 2-3, 3-1, 3-2, 3-3, 4-1, 4-2, 4-3.]
  7. MapReduce v1 Limitations. Scalability: Maximum cluster size is 4,000 nodes and maximum concurrent tasks is 40,000. Availability: JobTracker failure kills all queued and running jobs. Resources Partitioned into Map and Reduce: Hard partitioning of Map and Reduce slots led to low resource utilization. No Support for Alternate Paradigms / Services: Only MapReduce batch jobs, nothing else.
  8. Hadoop 1.0: Single Use System. [Diagram: Batch Apps; Pig; Hive; MapReduce (cluster resource management and data processing); HDFS (redundant, reliable storage).]
  9. WHAT'S NEW IN HADOOP 2.0?
  10. YARN: Yet Another Resource Negotiator. YARN Replaces MapReduce. YARN will be the de facto distributed operating system for Big Data.
  11. YARN: No Longer Just Batch Apps. Store DATA in one place. Interact with that data in MULTIPLE WAYS with Predictable Performance and Quality of Service. Applications Run Natively IN Hadoop: BATCH (MapReduce), INTERACTIVE (Tez), ONLINE (HBase), STREAMING (DataTorrent), GRAPH (Giraph). YARN (cluster resource management). HDFS2 (redundant, reliable storage).
  12. YARN: Applications. Running all on the same Hadoop cluster to give applications access to all the same source data! Labels: MapReduce v2, Stream Processing, Master-Worker, Online, In-Memory, Apache Storm.
  13. YARN: Quickly Maturing. [Timeline: 2010, 2011, 2012, 2013, 2014, Today. Milestones: Conceived at Yahoo!; Alpha Releases (2.0); Beta Releases (2.1); GA Released (2.2); Version 2.3; Version 2.4.] 100,000+ nodes, 400,000+ jobs daily. 10 million+ hours of compute daily.
  14. YARN: Dr. Evil Approved
  15. YARN: What Has Changed? YARN vs. MRv1: ResourceManager (RM) and ApplicationMaster (AM) vs. JobTracker (JT); Scheduler vs. Scheduler; NodeManager (NM) vs. TaskTracker (TT); Container vs. Map & Reduce Slot. [Diagram: a ResourceManager with its Scheduler and NodeManagers hosting an ApplicationMaster and Containers, beside a JobTracker with its Scheduler and TaskTrackers hosting Map and Reduce slots.]
  16. The 6 Benefits Of YARN: Scale; New programming models and services; Improved cluster utilization; Agility; Backwards compatible with MapReduce v1; Mixed workloads on the same source of data.
  17. THE FUTURE OF HADOOP
  18. SQL on Hadoop. Speed: Deliver interactive query performance. SQL: Support an array of SQL semantics for analytic applications running against Hadoop. Scale: SQL interface to Hadoop designed for queries that scale from Terabytes to Petabytes.
  19. SQL on Hadoop: Hive on Apache Tez (Hortonworks HDP2); Hive on Apache Spark (Cloudera CDH5); Apache Drill (MapR M7); Cloudera Impala (Cloudera CDH5); Pivotal HAWQ (Pivotal Big Data Suite).
  20. HOYA: HBase (NoSQL) on YARN. Dynamic Scaling: On-demand cluster size. Increase and decrease the size with load. Easier Deployment: APIs to create, start, stop and delete HBase clusters. Availability: Recover from Region Server loss with a new container.
  21. Microsoft REEF (Retainable Evaluator Execution Framework). Machine Learning: Framework well suited for building machine learning jobs. Scalable / Fault Tolerant: Makes it easy to implement scalable, fault-tolerant runtime environments for a range of computational models. Maintain State: Users can build jobs that utilize data from where it's needed and also maintain state after jobs are done.
  22. Heterogeneous Storage. THEN: NameNode, Storage. NOW: NameNode, SATA, SSD, Fusion IO.
  23. Hadoop Roadmap. Apache Hadoop 2.5 (Q3 2014): NodeManager Restart w/o disruption; Dynamic Resource Configuration. Apache Hadoop 2.6 (Q4 2014): Memory As Storage Tier; Support For Docker Containers.
  24. HADOOP: PAST, PRESENT & FUTURE. I KNOW YOU HAVE QUESTIONS. NO SUCH THING AS A STUPID QUESTION.
  25. ONE LAST THING … SD Big Data Meetup: meetup.com/sdbigdata. 2nd Wednesday Of The Month. Next: August 13th @ 5:45P.
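
As a rough illustration of the Map and Reduce stages shown on the "Hadoop 1.0: HDFS + MapReduce" slides, here is a minimal single-process Python sketch of a word count. It is not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the sample blocks are made up for illustration, and on a real cluster the map tasks would run in parallel on the nodes that hold each block.

```python
from collections import defaultdict


def map_phase(block):
    """Map: emit a (word, 1) pair for every word in one input block."""
    return [(word.lower(), 1) for word in block.split()]


def shuffle(mapped_pairs):
    """Shuffle: group values by key so each reduce call sees one key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all counts for one word into a single total."""
    return key, sum(values)


def run_job(blocks):
    """Run map over every block, shuffle, then reduce each key."""
    mapped = []
    for block in blocks:  # hypothetical stand-in for per-node map tasks
        mapped.extend(map_phase(block))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input: three text blocks, as if one file were split across nodes.
blocks = [
    "hadoop stores data",
    "hadoop processes data",
    "yarn schedules work",
]
counts = run_job(blocks)
print(counts)
```

The map step only ever looks at one block, which is what lets the work be spread across the nodes that store the data; only the shuffle needs to bring values for the same key together.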
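
The "MapReduce v1 Limitations" slide says that hard partitioning of Map and Reduce slots led to low resource utilization. The toy calculation below sketches why, under loud simplifying assumptions: every task needs exactly one slot or one container, the node sizes are invented, and the function names are hypothetical. Real YARN containers are sized by resources such as memory, so treat this as an intuition aid only.

```python
def mrv1_running(pending_maps, pending_reduces, map_slots, reduce_slots):
    """MRv1-style: map tasks may only use map slots, reduce tasks only reduce slots."""
    return min(pending_maps, map_slots) + min(pending_reduces, reduce_slots)


def yarn_running(pending_maps, pending_reduces, containers):
    """YARN-style (simplified): any task may use any free container."""
    return min(pending_maps + pending_reduces, containers)


# Hypothetical node: 6 map slots + 4 reduce slots, or 10 generic containers.
map_slots, reduce_slots = 6, 4
capacity = map_slots + reduce_slots

# Hypothetical workload: a map-heavy phase with no reduce tasks ready yet.
pending_maps, pending_reduces = 10, 0

mrv1 = mrv1_running(pending_maps, pending_reduces, map_slots, reduce_slots)
yarn = yarn_running(pending_maps, pending_reduces, capacity)

mrv1_utilization = mrv1 / capacity
yarn_utilization = yarn / capacity
print(f"MRv1: {mrv1}/{capacity} busy, YARN: {yarn}/{capacity} busy")
```

In this made-up workload the reduce slots sit idle under the fixed split, while the generic containers can all be used for map tasks.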
