Simon Elliston Ball
Head of Big Data - Red Gate Ventures
@sireb
Riding the Elephant:
Hadoop 2.0
http://bit.ly/RidingElephants
In the beginning…
Append-only distributed file system
Map Reduce
Java.
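The original MapReduce model splits a job into three steps. A map step emits key/value pairs, a shuffle groups them by key, and a reduce step combines the values for each key. Below is a minimal pure-Python sketch of that flow for a word count. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical. It runs in a single process and is not the Hadoop Java API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group every emitted value by its key (done by the framework in Hadoop)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce over the input lines, with keys in sorted order."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in sorted(shuffle(pairs).items()))
```

For example, `word_count(["the elephant", "The yellow elephant"])` returns `{'elephant': 2, 'the': 2, 'yellow': 1}`.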
More languages
JVM-based (Scala, Groovy, Jython, Clojure)
Streaming (Python, whatever)
HDP for Windows and .NET SDK
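Hadoop Streaming lets any executable act as a mapper or reducer. The program reads lines on stdin and writes tab-separated key/value lines on stdout. The framework sorts the mapper output by key before the reducer sees it, so the reducer has to spot key changes itself. Below is a minimal word-count sketch under those assumptions. The script name `wc.py` and its `map`/`reduce` arguments are hypothetical.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Streaming mapper: read raw text, write 'word<TAB>1' lines."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    """Streaming reducer: input arrives sorted by key, so consecutive lines
    with the same key are grouped without holding all data in memory."""
    parsed = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical entry point: "python wc.py map" or "python wc.py reduce".
    stage = mapper if sys.argv[1] == "map" else reducer
    for out in stage(sys.stdin):
        print(out)
```

Locally, the same contract can be approximated with a pipeline such as `cat input.txt | python wc.py map | sort | python wc.py reduce`. On a cluster, the commands would be supplied through the streaming jar's `-mapper` and `-reducer` options.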
Abstraction
Photo: https://www.flickr.com/photos/puroticorico/
Hive, Pig
Cascading
Scalding
SQL on Hadoop
Learning to share the toys
HBase
Solr on Hadoop
Sharing HDFS…
Map Reduce v1
[Diagram: a JobTracker on the Head Node accepts a Job and assigns Tasks (Map / Reduce) to a TaskTracker on each Data Node. Each TaskTracker has a fixed set of map slots (m slot 1 … m slot n) and reduce slots (r slot 1 … r slot n) and reports MR Status back to the JobTracker.]
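In MapReduce v1, each TaskTracker has a fixed number of map slots and reduce slots. In Hadoop 1 these are set by `mapred.tasktracker.map.tasks.maximum` and `mapred.tasktracker.reduce.tasks.maximum`. A map task cannot borrow an idle reduce slot. The toy calculation below uses hypothetical names and slot counts to show how a map-heavy phase leaves part of the cluster idle.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskTracker:
    """One MRv1 worker; slot counts are fixed by configuration, per task type."""
    map_slots: int
    reduce_slots: int


def slot_utilization(trackers: List[TaskTracker], pending_maps: int, pending_reduces: int) -> float:
    """Fraction of all slots in use when map tasks may only occupy map slots
    and reduce tasks may only occupy reduce slots."""
    total_map = sum(t.map_slots for t in trackers)
    total_reduce = sum(t.reduce_slots for t in trackers)
    busy = min(pending_maps, total_map) + min(pending_reduces, total_reduce)
    return busy / (total_map + total_reduce)
```

Take three TaskTrackers with four map slots and two reduce slots each. With 100 pending maps and no reduces, only 12 of the 18 slots are busy (about 67%), however long the map queue grows.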
Typical Hadoop 1.x setup
[Diagram: HBase | Production | Ad hoc]
YARN architecture
[Diagram: a YARN Client submits applications to the Resource Manager. Each Data Node runs a Node Manager hosting Containers. Each application has its own Application Master running in a Container, and unused capacity appears as a Free Slot.]
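In YARN, the Resource Manager hands out generic containers sized by memory and virtual cores. Every application runs in those containers on the Node Managers, including its own ApplicationMaster. Below is a first-fit placement sketch with hypothetical class and application names. The real Resource Manager delegates this decision to a pluggable scheduler such as the CapacityScheduler or FairScheduler, which is far more involved.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NodeManager:
    """One worker node's remaining capacity and the containers placed on it."""
    name: str
    free_mb: int
    free_vcores: int
    containers: List[str] = field(default_factory=list)


@dataclass(frozen=True)
class ContainerRequest:
    """A request for a container of a given size on behalf of an application."""
    app: str
    memory_mb: int
    vcores: int


def allocate(nodes: List[NodeManager], requests: List[ContainerRequest]) -> List[ContainerRequest]:
    """Toy first-fit Resource Manager: put each request on the first node with
    enough free memory and vcores. Returns the requests that did not fit."""
    unplaced = []
    for req in requests:
        for node in nodes:
            if node.free_mb >= req.memory_mb and node.free_vcores >= req.vcores:
                node.free_mb -= req.memory_mb
                node.free_vcores -= req.vcores
                node.containers.append(req.app)
                break
        else:  # no node had room for this request
            unplaced.append(req)
    return unplaced
```

Any container can host any application's work. A MapReduce task and a long-running Storm worker can therefore land on the same Node Manager, which is the kind of sharing that fixed MRv1 slots prevented.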
Removing the choke point
Advantages
60%-150% better cluster utilization
Long-running applications
Operating system for Big Data?
Not quite…
Security
Data Access abstraction
…but a framework for Big Data Apps
A whole batch of new applications
Storm on YARN
HOYA (HBase on YARN)
Tez (Stinger)
MapReduce 2
Giraph
<Insert your application here>
Spinning YARNs with Spring
Batch applications
Services
Direct to YARN APIs
Spring Data Hadoop abstraction
Streaming
Why?
Machine Learning
Graphs
Services
Distributed Shell - Anything.
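Giraph, listed among the YARN applications above, implements the Pregel bulk-synchronous model. Computation proceeds in supersteps, and each vertex processes the messages sent to it in the previous superstep. A vertex with nothing new to report stops sending. Below is a plain-Python sketch of that model propagating the maximum value through a graph. It illustrates the technique only; it is not the Giraph API, and the function name is hypothetical.

```python
from typing import Dict, List, Set, Tuple


def propagate_max(values: Dict[str, int], edges: Dict[str, List[str]]) -> Tuple[Dict[str, int], int]:
    """Pregel-style max propagation. Returns final vertex values and supersteps run."""
    values = dict(values)
    active: Set[str] = set(values)  # every vertex starts active
    supersteps = 0
    while active:
        # Each active vertex sends its current value along its out-edges.
        inbox: Dict[str, List[int]] = {v: [] for v in values}
        for v in active:
            for neighbour in edges.get(v, []):
                inbox[neighbour].append(values[v])
        supersteps += 1
        # Barrier: messages are delivered; a vertex stays active only if one raised its value.
        active = set()
        for v, messages in inbox.items():
            if messages and max(messages) > values[v]:
                values[v] = max(messages)
                active.add(v)
    return values, supersteps
```

In this formulation, a four-vertex chain with values 3, 6, 2, 1 ends with every vertex at 6 after three supersteps.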
Spark
A higher abstraction
Hadoop based?
… but can run on YARN
In Memory
Distributed
Fault tolerant
Real-time
✓
✓
✓
✓
❌
RDDs ✓
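Spark's fault tolerance comes from RDD lineage. Each dataset remembers how it was derived from its parent, so a partition lost from memory can be recomputed rather than restored from a replica. The class below is a hypothetical, heavily simplified stand-in; `ToyRDD` is not the Spark API. It keeps every computed partition in memory, roughly as if `cache()` had been called, and rebuilds a dropped partition from its parent.

```python
from typing import Callable, Dict, List


class ToyRDD:
    """Hypothetical stand-in for a Spark RDD: partitions are computed lazily
    from a recipe (the lineage) and kept in memory once computed."""

    def __init__(self, num_partitions: int, compute: Callable[[int], List[int]]):
        self.num_partitions = num_partitions
        self._compute = compute               # lineage: partition index -> data
        self._cache: Dict[int, List[int]] = {}
        self.recomputations = 0               # how many partitions were (re)built

    @classmethod
    def parallelize(cls, data: List[int], num_partitions: int) -> "ToyRDD":
        chunks = [data[i::num_partitions] for i in range(num_partitions)]
        return cls(num_partitions, lambda i: list(chunks[i]))

    def map(self, fn: Callable[[int], int]) -> "ToyRDD":
        # Lazy transformation: records how to derive each partition from the parent.
        return ToyRDD(self.num_partitions, lambda i: [fn(x) for x in self.partition(i)])

    def partition(self, i: int) -> List[int]:
        if i not in self._cache:
            self.recomputations += 1
            self._cache[i] = self._compute(i)
        return self._cache[i]

    def lose_partition(self, i: int) -> None:
        """Simulate an executor failure by dropping one in-memory partition."""
        self._cache.pop(i, None)

    def collect(self) -> List[int]:
        return [x for i in range(self.num_partitions) for x in self.partition(i)]
```

After `lose_partition`, the next `collect` rebuilds only the missing partition. It does so from the parent's in-memory copy, without rereading the source data.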
Mesos
Wider sharing
Hadoop, Spark, Aurora
Mesos Framework
Hardware
YARN
MapReduce, HBase, etc.
HDFS
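Mesos shares a cluster through resource offers. The master offers each agent's free CPU and memory to frameworks, and each framework's own scheduler decides which tasks to launch on that offer. Below is a toy sketch of one offer round with hypothetical names. Real Mesos uses a pluggable allocator to choose which framework receives each offer; this version simply passes the unused remainder to the next framework in a list.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Offer:
    """Free resources on one agent, offered by the master."""
    agent: str
    cpus: int
    mem_mb: int


@dataclass
class Framework:
    """Hypothetical framework scheduler: it, not the master, decides what to run."""
    name: str
    task_cpus: int
    task_mem_mb: int
    pending_tasks: int
    launched_on: List[str] = field(default_factory=list)

    def resource_offer(self, offer: Offer) -> Offer:
        """Launch as many pending tasks as fit in the offer; return the remainder."""
        cpus, mem = offer.cpus, offer.mem_mb
        while self.pending_tasks > 0 and cpus >= self.task_cpus and mem >= self.task_mem_mb:
            cpus -= self.task_cpus
            mem -= self.task_mem_mb
            self.pending_tasks -= 1
            self.launched_on.append(offer.agent)
        return Offer(offer.agent, cpus, mem)


def offer_round(agents: List[Offer], frameworks: List[Framework]) -> List[Offer]:
    """Toy master: offer each agent's resources to every framework in turn,
    passing on whatever the previous framework left unused."""
    leftovers = []
    for offer in agents:
        for fw in frameworks:
            offer = fw.resource_offer(offer)
        leftovers.append(offer)
    return leftovers
```

The design point is two-level scheduling. The master only decides who sees which resources, and each framework (Hadoop, Spark, Aurora) keeps its own task placement logic.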
Hadoop is more than MapReduce
The new world
YARN opens up new paradigms
Infrastructure maturing: better sharing
Hadoop and beyond!
Thank you
Questions?
Simon Elliston Ball
Head of Big Data - Red Gate Ventures
@sireb
simon@simonellistonball.com
http://bit.ly/RidingElephants

Hadoop 2.0, and in particular YARN, has opened up a lot of potential applications beyond MapReduce. This presentation explains some of the ways this happened, and what you can now do that you couldn't before. It also introduces some new tools (Spark) and infrastructure pieces (Mesos) to achieve even more efficient cluster use.
