Simon Elliston Ball
Head of Big Data - Red Gate Ventures
@sireb
Riding the Elephant:
Hadoop 2.0
http://bit.ly/RidingElephants
HDFS: append-only distributed file system
In the beginning…
Map Reduce
Java.
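The MapReduce model itself does not depend on Java or on a cluster. Below is a minimal single-process sketch in Python: the map phase turns each record into (key, value) pairs, the shuffle groups values by key, and the reduce phase folds each group into one result. The function `map_reduce` and the temperature records are hypothetical, and nothing here is distributed.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Single-process MapReduce: mapper(record) yields (key, value) pairs,
    reducer(key, values) returns one result per key."""
    groups = defaultdict(list)
    for record in records:
        # Map phase: a record may emit zero or more (key, value) pairs.
        for key, value in mapper(record):
            # Shuffle phase: collect every value under its key.
            groups[key].append(value)
    # Reduce phase: visit keys in sorted order, as a Hadoop reducer sees them.
    return {key: reducer(key, groups[key]) for key in sorted(groups)}


# Hypothetical input: "year,temperature" records; find the maximum per year.
readings = ["1950,0", "1950,22", "1949,111", "1949,78", "1950,-11"]


def year_temp_mapper(line):
    year, temp = line.split(",")
    yield year, int(temp)


def max_reducer(year, temps):
    return max(temps)


max_by_year = map_reduce(readings, year_temp_mapper, max_reducer)
```

Hadoop runs the same three phases across many machines, with the shuffle moving intermediate data between nodes.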
JVM-based (Scala, Groovy, Jython, Clojure)
More languages
Streaming (Python, whatever)
HDP for Windows and .NET SDK
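Hadoop Streaming lets any executable act as mapper or reducer: it reads lines on stdin and writes tab-separated key/value lines to stdout, and the framework sorts mapper output by key before the reducer reads it. A word-count sketch under those assumptions (the single-script `map`/`reduce` argument convention is hypothetical, not part of Streaming):

```python
import sys
from itertools import groupby


def mapper(lines):
    """Streaming mapper: emit 'word<TAB>1' for every word on every input line."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(lines):
    """Streaming reducer: input arrives sorted by key, so equal words are adjacent."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__" and sys.argv[1:] in (["map"], ["reduce"]):
    # Hypothetical convention: one script, first argument selects the phase.
    step = mapper if sys.argv[1] == "map" else reducer
    for out_line in step(sys.stdin):
        print(out_line)
```

Locally, `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce` approximates the cluster run (`wordcount.py` is a placeholder name); on Hadoop the two commands would go to the streaming jar's `-mapper` and `-reducer` options.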
Abstraction
Photo: https://www.flickr.com/photos/puroticorico/
Hive, Pig
Cascading
Scalding
SQL on Hadoop
Learning to share the toys
HBase
Solr on Hadoop
Sharing HDFS…
Map Reduce v1
[Diagram: a JobTracker on the Head Node receives each Job and hands Tasks (Map / Reduce) to a TaskTracker on every Data Node; each TaskTracker has a fixed set of map slots (m slot 1 … m slot n) and reduce slots (r slot 1 … r slot n)]
Map Reduce v1
[Diagram: the same JobTracker / TaskTracker layout, now with MR Status reported from the TaskTrackers back to the JobTracker]
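Why fixed slots waste capacity: in MRv1 each TaskTracker is configured with a fixed number of map and reduce slots (`mapred.tasktracker.map.tasks.maximum` and `mapred.tasktracker.reduce.tasks.maximum`), and a map task cannot use an idle reduce slot. The back-of-the-envelope sketch below assumes equal-sized tasks that are all ready to run and ignores data locality. It compares MRv1 with interchangeable slots, a simplification of YARN's generic containers (YARN actually allocates memory and CPU, not slots). The function names are hypothetical.

```python
def mrv1_busy_slots(nodes, map_slots, reduce_slots, pending_maps, pending_reduces):
    """MRv1: map tasks may only use map slots and reduce tasks only reduce slots."""
    running_maps = min(pending_maps, nodes * map_slots)
    running_reduces = min(pending_reduces, nodes * reduce_slots)
    return running_maps + running_reduces


def shared_busy_slots(nodes, map_slots, reduce_slots, pending_maps, pending_reduces):
    """Same total capacity, but any pending task may take any free slot."""
    capacity = nodes * (map_slots + reduce_slots)
    return min(pending_maps + pending_reduces, capacity)


def slot_usage(busy, nodes, map_slots, reduce_slots):
    """Fraction of the cluster's slots that are doing work."""
    return busy / (nodes * (map_slots + reduce_slots))


# Hypothetical cluster: 3 TaskTrackers with 2 map + 2 reduce slots each,
# and a map-only backlog of 100 tasks.
mrv1 = slot_usage(mrv1_busy_slots(3, 2, 2, 100, 0), 3, 2, 2)
shared = slot_usage(shared_busy_slots(3, 2, 2, 100, 0), 3, 2, 2)
```

With a map-only backlog, MRv1 keeps 6 of the 12 slots busy (`mrv1` is 0.5), while interchangeable slots keep all 12 busy (`shared` is 1.0).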
Typical Hadoop 1.x setup
[Diagram: HBase, Production, Ad hoc]
YARN architecture
[Diagram: a YARN Client submits work to the Resource Manager; a Node Manager on each Data Node runs Containers, two of which host an Application Master, and one node still has a Free Slot]
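A toy model of the flow in the diagram follows. It assumes memory is the only resource and uses first-fit placement; real ResourceManagers use pluggable schedulers such as the Capacity or Fair Scheduler, and the class and function names here are hypothetical. Each submission first gets a container for the application's own ApplicationMaster, and then containers for its tasks. Because every application brings its own master, per-job bookkeeping no longer sits in one central JobTracker.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    name: str
    free_mb: int
    containers: list = field(default_factory=list)


class ResourceManager:
    """Toy ResourceManager: places memory-sized containers first-fit."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, app_id, role, memory_mb):
        for node in self.nodes:
            if node.free_mb >= memory_mb:
                node.free_mb -= memory_mb
                container = f"{app_id}/{role}@{node.name}"
                node.containers.append(container)
                return container
        return None  # no node has room: the request has to wait


def submit_application(rm, app_id, master_mb, task_mb, num_tasks):
    """Client submission: first a container for the app's ApplicationMaster,
    then the containers it requests for tasks (requested all at once here)."""
    master = rm.allocate(app_id, "AppMaster", master_mb)
    if master is None:
        return None, []
    granted = [rm.allocate(app_id, f"task{i}", task_mb) for i in range(num_tasks)]
    return master, [c for c in granted if c is not None]


rm = ResourceManager([NodeManager("dn1", 2048), NodeManager("dn2", 2048),
                      NodeManager("dn3", 1024)])
app1 = submit_application(rm, "app1", master_mb=512, task_mb=1024, num_tasks=4)
app2 = submit_application(rm, "app2", master_mb=512, task_mb=1024, num_tasks=1)
```

Here `app1` fills most of the cluster, so `app2` gets its ApplicationMaster on the last 512 MB of `dn1`, but its task request stays unfilled until capacity frees up.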
Removing the choke point
Advantages
60%-150% better cluster utilization
Long running applications
Not quite…
Operating system for Big Data?
Security
…but a framework for Big Data Apps
Data Access abstraction
Storm on YARN
A whole batch of new applications
HOYA (HBase on YARN)
Tez (Stinger)
MapReduce 2
Giraph
<Insert your application here>
Batch applications
Spinning YARNs with Spring
Services
Direct to YARN APIs
Spring Data Hadoop abstraction
Streaming
Why?
Machine Learning
Graphs
Services
Distributed Shell: run anything
Spark
A higher abstraction
Hadoop-based?
… but can run on YARN
In Memory
Distributed
Fault tolerant
Real-time
✓
✓
✓
✓
❌
RDDs (Resilient Distributed Datasets)
✓
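Spark's RDDs keep in-memory data fault tolerant by recording the transformations that produced each dataset (its lineage), so lost data can be recomputed rather than replicated. A toy illustration of that idea follows. It is not Spark's API and leaves out Spark's partitioning; `ToyRDD` and its methods are hypothetical.

```python
class ToyRDD:
    """Toy dataset that remembers its lineage (parent + transformation),
    loosely modelled on Spark's RDDs; not the Spark API."""

    def __init__(self, source=None, parent=None, transform=None):
        self.source = source        # raw data, only set on the root dataset
        self.parent = parent        # dataset this one was derived from
        self.transform = transform  # turns the parent's rows into this dataset's rows
        self.cached = None          # in-memory copy, which can be lost

    def map(self, fn):
        return ToyRDD(parent=self, transform=lambda rows: [fn(r) for r in rows])

    def filter(self, pred):
        return ToyRDD(parent=self, transform=lambda rows: [r for r in rows if pred(r)])

    def collect(self):
        if self.cached is not None:
            return self.cached
        if self.parent is None:
            return list(self.source)
        # Nothing in memory: rebuild from the parent by replaying the lineage.
        return self.transform(self.parent.collect())

    def cache(self):
        self.cached = self.collect()
        return self

    def lose_cache(self):
        """Simulate losing the in-memory copy, e.g. when a worker dies."""
        self.cached = None


squares = (ToyRDD(source=range(10))
           .filter(lambda x: x % 2 == 0)
           .map(lambda x: x * x)
           .cache())
squares.lose_cache()
recovered = squares.collect()  # recomputed from lineage, not from a replica
```

Real Spark tracks lineage per partition, so only the partitions that lived on a failed machine are rebuilt.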
Mesos
Wider sharing
[Diagram: Hadoop, Spark and Aurora running as Mesos frameworks on shared hardware; within Hadoop, MapReduce, HBase etc. run on YARN over HDFS]
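Mesos shares a cluster through two-level scheduling. The Mesos master offers each machine's free resources to a framework, and that framework's own scheduler decides whether to accept and what to launch. The sketch below runs a single round of offers. The framework names, resource sizes and `GreedyFramework` class are hypothetical, and in this toy an accepted offer is used by one framework, with any leftover simply released.

```python
from dataclasses import dataclass, field


@dataclass
class Offer:
    agent: str
    cpus: int
    mem_mb: int


@dataclass
class GreedyFramework:
    """Hypothetical framework scheduler: launches as many identical tasks as fit."""
    name: str
    task_cpus: int
    task_mem_mb: int
    tasks_wanted: int
    launched: list = field(default_factory=list)

    def resource_offer(self, offer):
        """Second scheduling level: the framework decides. Returns tasks launched."""
        cpus, mem = offer.cpus, offer.mem_mb
        count = 0
        while self.tasks_wanted > 0 and cpus >= self.task_cpus and mem >= self.task_mem_mb:
            cpus -= self.task_cpus
            mem -= self.task_mem_mb
            self.tasks_wanted -= 1
            self.launched.append(offer.agent)
            count += 1
        return count  # 0 means the offer is declined


def offer_round(offers, frameworks):
    """First scheduling level: offer each agent's resources to frameworks in turn."""
    for offer in offers:
        for framework in frameworks:
            if framework.resource_offer(offer) > 0:
                break  # accepted; in this toy, leftovers are released, not re-offered


spark = GreedyFramework("spark", task_cpus=2, task_mem_mb=4096, tasks_wanted=3)
aurora = GreedyFramework("aurora", task_cpus=1, task_mem_mb=512, tasks_wanted=2)
offer_round([Offer("agent1", 4, 8192), Offer("agent2", 2, 1024)], [spark, aurora])
```

Spark fills `agent1` with two tasks, declines `agent2` for lack of memory, and Aurora takes that offer, leaving Spark waiting for a later round.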
Hadoop is more than MapReduce
The new world
YARN opens up new paradigms
Infrastructure maturing: better sharing
Hadoop and beyond!
Thank you
Questions?
simon@simonellistonball.com
