
Riding the Elephant - Hadoop 2.0

Hadoop 2.0, and in particular YARN, has opened up many potential applications beyond MapReduce. This presentation explains some of the ways this happened and what you can now do that you couldn't before. It also introduces some new tools (Spark) and infrastructure pieces (Mesos) for making even more efficient use of a cluster.

  1. Riding the Elephant: Hadoop 2.0. Simon Elliston Ball, Head of Big Data, Red Gate Ventures (@sireb). http://bit.ly/RidingElephants
  2. In the beginning… An append-only distributed file system, MapReduce, Java.
  3. More languages: JVM based (Scala, Groovy, Jython, Clojure); Streaming (Python, whatever); HDP for Windows and .NET SDK.
  4. Abstraction: Hive, Pig, Cascading, Scalding. (Photo: https://www.flickr.com/photos/puroticorico/)
  5. Learning to share the toys: SQL on Hadoop, HBase, Solr on Hadoop. Sharing HDFS…
  6. Map Reduce v1 (diagram): a Job goes to the JobTracker on the Head Node, which hands Tasks (Map / Reduce) to a TaskTracker on each Data Node. Every TaskTracker has a fixed set of map slots (m slot 1 … m slot n) and reduce slots (r slot 1 … r slot n).
  7. Map Reduce v1 (diagram, continued): the same layout, with MR Status flowing back from the TaskTrackers to the JobTracker.
  8-9. Typical Hadoop 1.x setup (diagram): HBase, Production, Adhoc.
  10-13. YARN architecture (diagram, built up over four slides): a YARN Client talks to the Resource Manager; each Data Node runs a Node Manager hosting Containers, some of which run an Application Master, plus free slots for new containers.
  14. Advantages: removing the choke point; 60%-150% better usage; long-running applications.
  15. Operating system for Big Data? Not quite… but a framework for Big Data apps. Security. Data access abstraction.
  16. A whole batch of new applications: MapReduce 2, Tez (Stinger), HOYA, Storm on YARN, Giraph, <Insert your application here>.
  17. Spinning YARNs with Spring: batch applications, services, direct to YARN APIs, Spring Data Hadoop abstraction.
  18. Why? Streaming, machine learning, graphs, services, Distributed Shell: anything.
  19. Spark: a higher abstraction. In memory ✓, distributed ✓, fault tolerant ✓, real-time ✓. Hadoop based? ❌ … but can run on YARN. RDDs (Resilient Distributed Datasets) ✓
  20. Mesos: wider sharing (diagram). Layers: Hardware; Mesos; Mesos frameworks (Hadoop, Spark, Aurora); YARN running MapReduce, HBase, etc.; HDFS.
  21. The new world: Hadoop is more than MapReduce; YARN opens up new paradigms; infrastructure is maturing, with better sharing.
  22. Hadoop and beyond!
  23. Thank you
  24. Questions? Simon Elliston Ball, Head of Big Data, Red Gate Ventures (@sireb). simon@simonellistonball.com http://bit.ly/RidingElephants
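
Slide 2 starts the story with MapReduce written in Java. As a hedged illustration of the programming model only (not Hadoop's Java API), here is a minimal in-process sketch of a word count split into the three phases the framework runs: map, shuffle (group by key) and reduce. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Emit (word, 1) for every whitespace-separated word in one input line.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group values by key; on a cluster this happens between map and reduce.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Combine all values for one key; here, sum the counts.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    print(word_count(["the elephant", "The yarn and the elephant"]))
```

Run as a script, this prints `{'and': 1, 'elephant': 2, 'the': 3, 'yarn': 1}`. Only the map and reduce functions are user code; distributing them and doing the shuffle is the framework's job.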
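
Slide 3 mentions Streaming as the way to use non-JVM languages such as Python. Hadoop Streaming runs the mapper and reducer as external programs that read lines on stdin and write tab-separated `key<TAB>value` lines on stdout, with the framework sorting mapper output by key before the reducer sees it. A minimal sketch of such a script follows; the file name `wordcount.py` and the `map`/`reduce` command-line switch are assumptions for illustration.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    # Read raw input lines and emit one "word\t1" line per word.
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    # Input arrives sorted by key, so equal words are adjacent and can be grouped.
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Hypothetical CLI: `python wordcount.py map` or `python wordcount.py reduce`.
if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1] in ("map", "reduce"):
    stage = mapper if sys.argv[1] == "map" else reducer
    for out in stage(sys.stdin):
        print(out)
```

Locally, a pipeline such as `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce` imitates the sort Hadoop performs between the two stages; on a cluster the same commands would be given to the streaming jar's `-mapper` and `-reducer` options.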
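
Slide 6 shows that in Map Reduce v1 every TaskTracker has a fixed number of map slots and reduce slots. The toy model below, with hypothetical slot counts, illustrates why that wastes capacity: a map task can only run in a map slot, so a backlog of maps cannot use idle reduce slots. It models one scheduling round only and ignores data locality.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TaskTracker:
    # Hadoop 1.x style: each node has a fixed number of map and reduce slots.
    map_slots: int
    reduce_slots: int


def schedule_round(trackers: List[TaskTracker], pending_maps: int,
                   pending_reduces: int) -> Tuple[int, int, int]:
    """Return (maps_running, reduces_running, idle_slots) for one round.

    Map tasks may only use map slots and reduce tasks only reduce slots,
    so spare capacity of one type cannot absorb a backlog of the other.
    """
    total_map = sum(t.map_slots for t in trackers)
    total_reduce = sum(t.reduce_slots for t in trackers)
    maps = min(pending_maps, total_map)
    reduces = min(pending_reduces, total_reduce)
    idle = (total_map - maps) + (total_reduce - reduces)
    return maps, reduces, idle


if __name__ == "__main__":
    # Hypothetical cluster: three TaskTrackers with 4 map and 2 reduce slots each.
    cluster = [TaskTracker(map_slots=4, reduce_slots=2) for _ in range(3)]
    # Map-heavy backlog: 12 maps run, 88 wait, and all 6 reduce slots sit idle.
    print(schedule_round(cluster, pending_maps=100, pending_reduces=0))
```

The printed result is `(12, 0, 6)`: six slots are free while 88 map tasks queue, which is the kind of inefficiency YARN's generic containers address.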
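
Slides 10-13 replace fixed slots with a Resource Manager that grants generic Containers on Node Managers to each application's Application Master. The sketch below is a deliberately simplified, hypothetical first-fit allocator: requests are expressed as memory and virtual cores, as YARN resource requests are, but the real Resource Manager uses pluggable schedulers (for example the Capacity and Fair schedulers), locality preferences and queues that are not modelled here. Node names and application ids are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class NodeManager:
    # Free resources on one Data Node; no map/reduce distinction.
    name: str
    free_mb: int
    free_vcores: int


@dataclass
class ResourceManager:
    nodes: List[NodeManager]
    # app id -> names of the nodes hosting its containers
    allocations: Dict[str, List[str]] = field(default_factory=dict)

    def allocate(self, app_id: str, mb: int, vcores: int) -> Optional[str]:
        """Grant one container on the first node with room; None if nothing fits."""
        for node in self.nodes:
            if node.free_mb >= mb and node.free_vcores >= vcores:
                node.free_mb -= mb
                node.free_vcores -= vcores
                self.allocations.setdefault(app_id, []).append(node.name)
                return node.name
        return None


if __name__ == "__main__":
    rm = ResourceManager([NodeManager("dn1", 8192, 8), NodeManager("dn2", 4096, 4)])
    # A batch job and a long-running service share the same nodes.
    print(rm.allocate("mr-job", 1024, 1))
    print(rm.allocate("long-running-service", 6144, 4))
    print(rm.allocate("mr-job", 2048, 1))
```

Because any application can ask for any mix of memory and cores, batch jobs and long-running services (slide 14) can share one pool instead of living on separate clusters.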
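
Slide 19 lists Spark as in-memory and fault tolerant, built on RDDs. An RDD is fault tolerant because it records its lineage (the parent dataset and the transformation applied), so a lost in-memory partition can be recomputed. The class below is a single-process toy of that idea, not PySpark: the method names `map`, `filter`, `persist` and `collect` echo Spark's, while `lose_cache` is a hypothetical hook for simulating a failure.

```python
from typing import Callable, Iterable, List, Optional


class ToyRDD:
    """Single-process toy of an RDD: an immutable dataset defined by its lineage."""

    def __init__(self, source: Optional[Iterable] = None,
                 parent: Optional["ToyRDD"] = None,
                 fn: Optional[Callable[[List], List]] = None) -> None:
        self._source = list(source) if source is not None else None
        self._parent = parent  # lineage: the dataset this one derives from
        self._fn = fn          # lineage: how to derive it from the parent
        self._persist = False
        self._cached: Optional[List] = None

    def map(self, f: Callable) -> "ToyRDD":
        # Transformations are lazy: they only record a new lineage step.
        return ToyRDD(parent=self, fn=lambda xs: [f(x) for x in xs])

    def filter(self, pred: Callable) -> "ToyRDD":
        return ToyRDD(parent=self, fn=lambda xs: [x for x in xs if pred(x)])

    def persist(self) -> "ToyRDD":
        # Keep the result in memory once an action has computed it.
        self._persist = True
        return self

    def lose_cache(self) -> None:
        # Hypothetical: simulate losing the in-memory copy (e.g. a failed worker).
        self._cached = None

    def collect(self) -> List:
        # Actions force evaluation: use the in-memory copy if present,
        # otherwise rebuild the data by replaying the lineage.
        if self._cached is not None:
            return list(self._cached)
        if self._parent is None:
            result = list(self._source or [])
        else:
            result = self._fn(self._parent.collect())
        if self._persist:
            self._cached = result
        return list(result)
```

For example, `ToyRDD(range(5)).map(lambda x: x * x).persist().filter(lambda x: x % 2 == 0).collect()` yields `[0, 4, 16]`; after `lose_cache()` on the persisted dataset, the next `collect()` recomputes the squares from lineage instead of failing.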
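
Slide 20 places Mesos underneath several frameworks (Hadoop, Spark, Aurora) for wider sharing. Mesos uses two-level scheduling: the master offers each agent's free resources to frameworks, and each framework decides how much of an offer to use. The sketch below is a toy of that handshake under stated simplifications: integer CPUs, a fixed framework order, and leftovers simply passed to the next framework. Real Mesos chooses which framework receives offers through a pluggable allocator (Dominant Resource Fairness by default). Agent and framework names are illustrative labels.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Offer:
    # Free resources on one agent, as offered by the master.
    agent: str
    cpus: int
    mem_mb: int


@dataclass
class ToyFramework:
    name: str
    task_cpus: int
    task_mem_mb: int
    tasks_wanted: int
    launched: int = 0

    def accept(self, offer: Offer) -> None:
        # Second level: the framework decides how much of the offer to use.
        while (self.tasks_wanted > 0 and offer.cpus >= self.task_cpus
               and offer.mem_mb >= self.task_mem_mb):
            offer.cpus -= self.task_cpus
            offer.mem_mb -= self.task_mem_mb
            self.tasks_wanted -= 1
            self.launched += 1


def offer_round(offers: List[Offer], frameworks: List[ToyFramework]) -> Dict[str, int]:
    # First level: each agent's resources are offered to frameworks in turn;
    # whatever one framework leaves unused is offered to the next.
    for offer in offers:
        for fw in frameworks:
            fw.accept(offer)
    return {fw.name: fw.launched for fw in frameworks}


if __name__ == "__main__":
    offers = [Offer("agent-1", 4, 8192), Offer("agent-2", 2, 4096)]
    frameworks = [ToyFramework("spark", 2, 2048, 2), ToyFramework("aurora", 1, 1024, 5)]
    print(offer_round(offers, frameworks))
```

Here the first framework fills `agent-1` and stops once it has what it asked for, so `agent-2`'s resources go to the second framework. This is the finer-grained, cross-framework sharing the slide points at.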
