Riding the Elephant - Hadoop 2.0

Hadoop 2.0, and in particular YARN, has opened up a lot of potential applications beyond MapReduce. This presentation explains some of the ways this happened and what you can now do that you couldn't before. It also introduces some new tools (Spark) and infrastructure pieces (Mesos) for even more efficient cluster use.

Transcript

  • 1. Riding the Elephant: Hadoop 2.0. Simon Elliston Ball, Head of Big Data, Red Gate Ventures (@sireb). http://bit.ly/RidingElephants
  • 2. In the beginning…: an append-only distributed file system, MapReduce, Java.
  • 3. More languages: JVM-based (Scala, Groovy, Jython, Clojure); Streaming (Python, whatever); HDP for Windows and .NET SDK.
  • 4. Abstraction: Hive, Pig; Cascading, Scalding. (Photo: https://www.flickr.com/photos/puroticorico/)
  • 5. Learning to share the toys: SQL on Hadoop; HBase; Solr on Hadoop. Sharing HDFS…
  • 6–7. Map Reduce v1 (diagram): the JobTracker on the head node hands a job's tasks to a TaskTracker on each Data Node; every TaskTracker has a fixed set of map slots (m slot 1 … m slot n) and reduce slots (r slot 1 … r slot n), and slide 7 adds the MR status reported back to the JobTracker.
  • 8–9. Typical Hadoop 1.x setup (diagram): HBase, Production, Adhoc.
  • 10–13. YARN architecture (diagram, built up over four slides): a YARN Client talks to the Resource Manager; each Data Node runs a Node Manager hosting Containers, some of which run an Application Master; one node shows a free slot.
  • 14. Advantages: removing the choke point; 60%-150% better usage; long-running applications.
  • 15. Operating system for Big Data? Not quite… but a framework for Big Data apps. Security; data access abstraction.
  • 16. A whole batch of new applications: Storm on YARN, HOYA, Tez (Stinger), MapReduce 2, Giraph, <insert your application here>.
  • 17. Spinning YARNs with Spring: batch applications and services; direct to YARN APIs; Spring Data Hadoop abstraction.
  • 18. Why? Streaming, machine learning, graphs, services, Distributed Shell: anything.
  • 19. Spark: a higher abstraction. In memory ✓; distributed ✓; fault tolerant ✓; real-time ✓; RDDs ✓; Hadoop based? ❌ … but can run on YARN.
  • 20. Wider sharing: Mesos (diagram: Hadoop, Spark and Aurora running as Mesos frameworks on shared hardware, with the Hadoop stack of HDFS, YARN, MapReduce, HBase etc.).
  • 21. The new world: Hadoop is more than MapReduce; YARN opens up new paradigms; infrastructure is maturing, enabling better sharing.
  • 22. Hadoop and beyond!
  • 23. Thank you
  • 24. Questions? Simon Elliston Ball, Head of Big Data, Red Gate Ventures (@sireb), simon@simonellistonball.com, http://bit.ly/RidingElephants
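Slides 2 and 3 mention MapReduce and Hadoop Streaming (Python). Here is a minimal sketch of that model: a word count written as a mapper and a reducer. It assumes the usual Streaming convention of tab-separated key/value lines on standard input and output. A local driver stands in for Hadoop's shuffle by simply sorting the mapper output. The function names are illustrative only and are not part of any Hadoop API.

```python
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'word<TAB>1' for every whitespace-separated word (lower-cased)."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts per word; assumes input is sorted by key, as the shuffle guarantees."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(lines: Iterable[str]) -> list[str]:
    """Simulate map -> shuffle/sort -> reduce in one process (no cluster involved)."""
    return list(reducer(sorted(mapper(lines))))
```

On a real cluster, each function would live in its own script that reads `sys.stdin` and prints its output. Those scripts are passed to the Hadoop Streaming jar through its `-mapper` and `-reducer` options, and the framework, not the scripts, does the sort between them. Locally, `run_locally(["the elephant", "The yarn"])` returns `['elephant\t1', 'the\t2', 'yarn\t1']`.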
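Slides 6 to 14 contrast Hadoop 1.x's fixed map and reduce slots with YARN's generic containers. The toy model below counts how many tasks can run at once under two schemes: slots reserved per task type, versus any task taking any free memory. It is a hypothetical sketch, not how the JobTracker or any YARN scheduler is actually implemented. The cluster shape, slot counts and per-task memory are made-up numbers chosen only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class Node:
    map_slots: int     # MRv1-style: slots usable only by map tasks
    reduce_slots: int  # MRv1-style: slots usable only by reduce tasks
    memory_mb: int     # YARN-style: memory a node offers to any container


def mrv1_running(nodes: list[Node], pending_maps: int, pending_reduces: int) -> tuple[int, int]:
    """Concurrent (maps, reduces) when each slot is reserved for one task type."""
    maps = min(pending_maps, sum(n.map_slots for n in nodes))
    reduces = min(pending_reduces, sum(n.reduce_slots for n in nodes))
    return maps, reduces


def yarn_running(nodes: list[Node], pending_maps: int, pending_reduces: int,
                 task_mb: int) -> tuple[int, int]:
    """Concurrent (maps, reduces) when any task may use any free memory.

    Greedy toy placement: fill each node with maps first, then reduces.
    """
    maps = reduces = 0
    for node in nodes:
        free = node.memory_mb
        while free >= task_mb and maps < pending_maps:
            free -= task_mb
            maps += 1
        while free >= task_mb and reduces < pending_reduces:
            free -= task_mb
            reduces += 1
    return maps, reduces
```

Take three nodes, each with 4 map slots, 2 reduce slots and 6144 MB, and give every task 1024 MB. A map-only phase of 20 tasks then runs 12 maps at once under the slot model, leaving 6 reduce slots idle, but 18 under the container model. This is the "removing the choke point" idea in miniature. It does not reproduce the 60%-150% figure quoted on slide 14.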