Hadoop 1.x vs Hadoop 2
Rommel Garcia
Solutions Engineer - Big Data

Hortonworks
  1. Hadoop 1.x vs Hadoop 2
     Rommel Garcia, Solutions Engineer - Big Data, Hortonworks
  2. Transition To Big Data
     Relational, Dimensional (EDW), Big Data
  3. Data Explosion
  4. 3 Design Dimensions
  5. Key Hadoop Data Types
     Sentiment, Clickstream, Sensor/Machine, Geographic, Server Logs, Text
  6. Hadoop is NOT
     ESB, NoSQL, HPC, Relational, Real-time, The “Jack of all Trades”
  7. Hadoop 1
     Limited to about 4,000 nodes per cluster
     O(# of tasks in a cluster)
     JobTracker bottleneck: resource management, job scheduling, and monitoring
     Only one namespace for managing HDFS
     Map and Reduce slots are static
     MapReduce is the only kind of job that can run
  8. Hadoop 1 - Basics
     [Diagram: MapReduce (Computation Framework) and HDFS (Storage Framework), with boxes labeled A, B, and C]
  9. Hadoop 1 - Reading Files
     [Diagram]
     Components: Hadoop Client, NameNode, SNameNode (fsimage/edit), DN | TT (DataNode | TaskTracker) nodes in Rack1, Rack2, Rack3 ... RackN
     Flows: read file; return DNs, block ids, etc.; read blocks; heartbeat/block report; checkpoint
  10. Hadoop 1 - Writing Files
     [Diagram]
     Components: Hadoop Client, NameNode, SNameNode (fsimage/edit), DN | TT nodes in Rack1, Rack2, Rack3 ... RackN
     Flows: request write; return DNs, etc.; write blocks; replication pipelining; block report; checkpoint
  11. Hadoop 1 - Running Jobs
     [Diagram]
     Components: Hadoop Client, JobTracker, DN | TT nodes in Rack1, Rack2, Rack3 ... RackN
     Flows: submit job; deploy job; map; shuffle (part 0, part 0); reduce
  12. Hadoop 1 - Security
     [Diagram]
     Components: Users, LDAP/AD (authN/authZ), firewall, KDC, Hadoop Cluster, Client Node/Spoke Server, Encryption Plugin
     Flows: service request; block token; delegation token
     * The block token is for accessing data
     * The delegation token is for running jobs
  13. Hadoop 1 - APIs
     org.apache.hadoop.mapreduce.Partitioner
     org.apache.hadoop.mapreduce.Mapper
     org.apache.hadoop.mapreduce.Reducer
     org.apache.hadoop.mapreduce.Job
  14. Hadoop 2
     Potentially up to 10,000 nodes per cluster
     O(cluster size)
     Supports multiple namespaces for managing HDFS
     Efficient cluster utilization (YARN)
     Backward and forward compatible with MRv1
     Any application can integrate with Hadoop
     Beyond Java
  15. Hadoop 2 - Basics
  16. Hadoop 2 - Reading Files (w/ NN Federation)
     [Diagram]
     Components: Hadoop Client, NN1/ns1, NN2/ns2, NN3/ns3, NN4/ns4, SNameNode per NN or Backup NN per NN, DN | NM (DataNode | NodeManager) nodes in Rack1, Rack2, Rack3 ... RackN
     Flows: read file; return DNs, block ids, etc.; read blocks; fsimage/edit copy; fs sync; checkpoint; register/heartbeat/block report
     Block Pools: ns1 (dn1, dn2), ns2 (dn1, dn3), ns3 (dn4, dn5), ns4 (dn4, dn5)
  17. Hadoop 2 - Writing Files
     [Diagram]
     Components: Hadoop Client, NN1/ns1, NN2/ns2, NN3/ns3, NN4/ns4, SNameNode per NN or Backup NN per NN, DN | NM nodes in Rack1, Rack2, Rack3 ... RackN
     Flows: request write; return DNs, etc.; write blocks; replication pipelining; block report; fsimage/edit copy; fs sync; checkpoint
  18. Hadoop 2 - Running Jobs
     [Diagram]
     Components: Hadoop Client 1, Hadoop Client 2, ResourceManager (ASM, Scheduler, queues), NodeManagers in Rack1, Rack2 ... RackN hosting AM1, AM2, and containers C1.1 to C1.4 and C2.1 to C2.3
     Flows: create app1, submit app1; create app2, submit app2; negotiates Containers; reports to ASM; Scheduler partitions Resources; status report
  19. Hadoop 2 - Security
     [Diagram]
     Components: DMZ, KDC, LDAP/AD, Knox Gateway Cluster, Enterprise/Cloud SSO Provider, two firewalls, Hadoop Cluster
     Clients: JDBC Client, REST Client, Browser (HUE)
     Native Hive/HBase Encryption
  20. Hadoop 2 - APIs
     org.apache.hadoop.yarn.api.ApplicationClientProtocol
     org.apache.hadoop.yarn.api.ApplicationMasterProtocol
     org.apache.hadoop.yarn.api.ContainerManagementProtocol
  21. Resources
     http://hortonworks.com/products/hortonworks-sandbox/
     http://hortonworks.com/products/hdp-2/
     http://hortonworks.com/resources/
     http://hadoopsummit.org/san-jose/
  22. Hadoop Summit 2014
  23. Thank you!
     www.linkedin.com/in/rommelgarcia
     twitter.com/rommelgarcia
     rgarcia@hortonworks.com
     Hortonworks
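
The "Hadoop 1 - Running Jobs" and "Hadoop 1 - APIs" slides name the map, shuffle, and reduce stages and the Mapper, Partitioner, and Reducer classes. The sketch below imitates those roles in plain Python on a single machine. It does not use the Hadoop API: the function names, the CRC-based partitioner, and the word-count workload are all assumptions chosen for illustration.

```python
import zlib
from collections import defaultdict


def mapper(line):
    """Map role: emit a (word, 1) pair for every word in the line."""
    for word in line.split():
        yield word.lower(), 1


def partitioner(key, num_partitions):
    """Partitioner role: pick the reducer partition for a key.

    A CRC32 hash is an assumption made here so the result is stable
    across runs; it is not Hadoop's own hashing scheme.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def reducer(key, values):
    """Reduce role: combine all values seen for one key."""
    return key, sum(values)


def run_job(lines, num_reducers=2):
    # Shuffle: group mapper output by partition, then by key.
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for line in lines:
        for key, value in mapper(line):
            partitions[partitioner(key, num_reducers)][key].append(value)

    # Each partition is reduced independently, keys in sorted order.
    results = {}
    for part in partitions:
        for key in sorted(part):
            out_key, out_value = reducer(key, part[key])
            results[out_key] = out_value
    return results


counts = run_job(["Hadoop stores data", "Hadoop processes data"])
```

Every occurrence of a key lands in the same partition, which is what lets each reducer work without coordinating with the others.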
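
The "Reading Files" and "Writing Files" slides show the client asking the NameNode for DataNodes and block ids, then moving block data directly to or from the DataNodes, with replicas forwarded along a pipeline. The toy model below is one way to picture that exchange. All class and function names, the 4-byte block size, and the round-robin replica placement are assumptions made for the example; real HDFS placement is rack-aware and considerably more involved.

```python
class DataNode:
    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> bytes

    def write(self, block_id, data, pipeline):
        # Store the block locally, then forward it to the next
        # DataNode in the pipeline (replication pipelining).
        self.blocks[block_id] = data
        if pipeline:
            pipeline[0].write(block_id, data, pipeline[1:])


class NameNode:
    """Holds metadata only: which blocks make up a file and where they live."""

    def __init__(self, datanodes, replication=3):
        self.datanodes = datanodes
        self.replication = replication
        self.files = {}  # path -> list of (block_id, [DataNode, ...])
        self._next_block = 0

    def allocate_block(self, path):
        block_id = self._next_block
        self._next_block += 1
        # Round-robin placement, an assumption made for this sketch.
        start = block_id % len(self.datanodes)
        targets = [
            self.datanodes[(start + i) % len(self.datanodes)]
            for i in range(self.replication)
        ]
        self.files.setdefault(path, []).append((block_id, targets))
        return block_id, targets

    def get_block_locations(self, path):
        return self.files[path]


def write_file(namenode, path, data, block_size=4):
    # The client asks the NameNode where each block goes, then sends
    # the bytes to the first DataNode only.
    for offset in range(0, len(data), block_size):
        block_id, targets = namenode.allocate_block(path)
        targets[0].write(block_id, data[offset:offset + block_size], targets[1:])


def read_file(namenode, path):
    # The client gets block ids and locations from the NameNode and
    # reads each block from one of its DataNodes.
    return b"".join(
        targets[0].blocks[block_id]
        for block_id, targets in namenode.get_block_locations(path)
    )


datanodes = [DataNode(f"dn{i}") for i in range(1, 5)]
namenode = NameNode(datanodes)
write_file(namenode, "/demo.txt", b"hello hadoop")
restored = read_file(namenode, "/demo.txt")
```

In this model the NameNode never touches file contents, which mirrors the split between metadata and block traffic in the diagrams.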
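
The Hadoop 1 slide notes that Map and Reduce slots are static, while the Hadoop 2 slide lists efficient cluster utilization (YARN). A small arithmetic sketch can make the difference concrete. The slot and container counts are invented for the example, and it assumes every task fits in exactly one slot or one container.

```python
def running_with_static_slots(map_tasks, reduce_tasks, map_slots, reduce_slots):
    """Hadoop 1 style: map tasks can only use map slots, reduce tasks only reduce slots."""
    return min(map_tasks, map_slots) + min(reduce_tasks, reduce_slots)


def running_with_containers(map_tasks, reduce_tasks, total_containers):
    """YARN style (simplified): any task may use any free container."""
    return min(map_tasks + reduce_tasks, total_containers)


# Hypothetical node: 4 map slots + 2 reduce slots versus 6 generic containers.
# Workload: 6 map tasks waiting, no reduce tasks yet.
static_running = running_with_static_slots(6, 0, map_slots=4, reduce_slots=2)
container_running = running_with_containers(6, 0, total_containers=6)
```

Under these assumptions the static layout leaves the two reduce slots idle during the map phase, while the shared pool keeps all six units of capacity busy.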