Hadoop 1.x vs Hadoop 2
Rommel Garcia
Solutions Engineer - Big Data
Hortonworks

There is a big shift at both the architecture and API level between Hadoop 1 and Hadoop 2, particularly with YARN. We held our first meetup to talk about this (http://www.meetup.com/Atlanta-YARN-User-Group/) on 10/13/2013.
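
One way to picture the architectural shift is to contrast the two resource models the slides name: "Map and Reduce slots are static" for Hadoop 1 and "Efficient cluster utilization (YARN)" for Hadoop 2. The sketch below is a toy, in-memory model in Python, not Hadoop code. The class names, slot counts, memory sizes, and the first-come, first-served placement are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SlotNode:
    """Hadoop 1 style worker: fixed map and reduce slots (hypothetical counts)."""
    map_slots: int
    reduce_slots: int

    def runnable(self, map_tasks, reduce_tasks):
        # A map task can only occupy a map slot, and a reduce task a reduce slot,
        # so idle reduce slots cannot help a map-heavy workload.
        return min(map_tasks, self.map_slots) + min(reduce_tasks, self.reduce_slots)


@dataclass
class ContainerNode:
    """Hadoop 2 style worker: one memory pool carved into containers on request."""
    memory_mb: int

    def runnable(self, requests_mb):
        free = self.memory_mb
        started = 0
        # Assumption: naive first-come, first-served placement.
        # A real scheduler (queues, priorities, locality) is much richer.
        for need in requests_mb:
            if need <= free:
                free -= need
                started += 1
        return started


slot_node = SlotNode(map_slots=4, reduce_slots=4)
container_node = ContainerNode(memory_mb=8192)

# Workload: six map tasks of 1 GB each and no reduce tasks.
slots_used = slot_node.runnable(map_tasks=6, reduce_tasks=0)
containers_used = container_node.runnable([1024] * 6)

print("tasks started with static slots:", slots_used)
print("tasks started with containers:", containers_used)
```

In this made-up workload the slot-based node leaves its reduce slots idle while two map tasks wait, whereas the container-based node can start all six tasks because capacity is not tied to a task type.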


Hadoop 1.x vs 2

  1. Hadoop 1.x vs Hadoop 2. Rommel Garcia, Solutions Engineer - Big Data, Hortonworks
  2. Transition To Big Data: Relational, Dimensional (EDW), Big Data
  3. Data Explosion
  4. 3 Design Dimensions
  5. Key Hadoop Data Types: Sentiment, Clickstream, Sensor/Machine, Geographic, Server Logs, Text
  6. Hadoop is NOT: ESB, NoSQL, HPC, Relational, Real-time, The “Jack of all Trades”
  7. Hadoop 1: Limited to 4,000 nodes per cluster; O(# of tasks in a cluster); JobTracker bottleneck (resource management, job scheduling and monitoring); Only one namespace for managing HDFS; Map and Reduce slots are static; The only job type that can run is MapReduce
  8. Hadoop 1 - Basics: MapReduce (Computation Framework); HDFS (Storage Framework). [Diagram: blocks labeled A, B and C]
  9. Hadoop 1 - Reading Files. [Diagram: Hadoop Client; NameNode (fsimage/edit); SNameNode; DN | TT (DataNode | TaskTracker) nodes on Rack1, Rack2, Rack3 ... RackN. Labels: read file; return DNs, block ids, etc.; read blocks; heartbeat/block report; checkpoint]
  10. Hadoop 1 - Writing Files. [Diagram: Hadoop Client; NameNode (fsimage/edit); SNameNode; DN | TT nodes on Rack1, Rack2, Rack3 ... RackN. Labels: request write; return DNs, etc.; write blocks; replication pipelining; block report; checkpoint]
  11. Hadoop 1 - Running Jobs. [Diagram: Hadoop Client; JobTracker; DN | TT nodes on Rack1, Rack2, Rack3 ... RackN. Labels: submit job; deploy job; map; shuffle; reduce; part 0]
  12. Hadoop 1 - Security. [Diagram: Users; Firewall; LDAP/AD; Client Node/Spoke Server; KDC; Hadoop Cluster; Encryption Plugin. Labels: authN/authZ; service request; block token; delegation token] The block token is for accessing data; the delegation token is for running jobs.
  13. Hadoop 1 - APIs: org.apache.hadoop.mapreduce.Partitioner; org.apache.hadoop.mapreduce.Mapper; org.apache.hadoop.mapreduce.Reducer; org.apache.hadoop.mapreduce.Job
  14. Hadoop 2: Potentially up to 10,000 nodes per cluster; O(cluster size); Supports multiple namespaces for managing HDFS; Efficient cluster utilization (YARN); MRv1 backward and forward compatible; Any apps can integrate with Hadoop; Beyond Java
  15. Hadoop 2 - Basics
  16. Hadoop 2 - Reading Files (w/ NN Federation). [Diagram: Hadoop Client; NN1/ns1, NN2/ns2, NN3/ns3, NN4/ns4; SNameNode per NN or Backup NN per NN; DN | NM (DataNode | NodeManager) nodes on Rack1, Rack2, Rack3 ... RackN. Labels: read file; return DNs, block ids, etc.; read blocks; register/heartbeat/block report; fsimage/edit copy; checkpoint; fs sync. Block Pools: ns1 (dn1, dn2); ns2 (dn1, dn3); ns3 (dn4, dn5); ns4 (dn4, dn5)]
  17. Hadoop 2 - Writing Files. [Diagram: Hadoop Client; NN1/ns1, NN2/ns2, NN3/ns3, NN4/ns4; SNameNode per NN or Backup NN per NN; DN | NM nodes on Rack1, Rack2, Rack3 ... RackN. Labels: request write; return DNs, etc.; write blocks; replication pipelining; block report; fsimage/edit copy; checkpoint; fs sync]
  18. Hadoop 2 - Running Jobs. [Diagram: Hadoop Client 1; Hadoop Client 2; ResourceManager (Scheduler, ASM, queues); NodeManagers on Rack1, Rack2 ... RackN; AM1 with containers C1.1 to C1.4; AM2 with containers C2.1 to C2.3. Labels: create app1; submit app1; create app2; submit app2; negotiates; reports to; partitions; status report; Resources; Containers]
  19. Hadoop 2 - Security. [Diagram: JDBC Client; REST Client; Browser (HUE); Firewall; DMZ; Knox Gateway Cluster; Firewall; LDAP/AD; KDC; Enterprise/Cloud SSO Provider; Hadoop Cluster; Native Hive/HBase Encryption]
  20. Hadoop 2 - APIs: org.apache.hadoop.yarn.api.ApplicationClientProtocol; org.apache.hadoop.yarn.api.ApplicationMasterProtocol; org.apache.hadoop.yarn.api.ContainerManagementProtocol
  21. Resources: http://hortonworks.com/products/hortonworks-sandbox/ http://hortonworks.com/products/hdp-2/ http://hortonworks.com/resources/ http://hadoopsummit.org/san-jose/
  22. Hadoop Summit 2014
  23. Thank you! www.linkedin.com/in/rommelgarcia, twitter.com/rommelgarcia, rgarcia@hortonworks.com, Hortonworks
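
The "Running Jobs" slide for Hadoop 1 names the map, shuffle, and reduce stages, and the "APIs" slide lists Mapper, Partitioner, Reducer, and Job. As a rough illustration of how those roles fit together, here is a minimal in-memory word count in plain Python. It does not use the Hadoop Java API: the function names are hypothetical, `zlib.crc32` stands in for a key hash, and the whole "cluster" is a list of dictionaries.

```python
import zlib
from collections import defaultdict


def mapper(line):
    """Map stage: emit (word, 1) for every word in one input line."""
    for word in line.split():
        yield word.lower(), 1


def partition(key, num_reducers):
    """Partitioner role: pick the reducer that receives a given key.

    Assumption: crc32 is used only because it is deterministic across runs.
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reducer(key, values):
    """Reduce stage: combine all values seen for one key."""
    return key, sum(values)


def run_job(lines, num_reducers=2):
    """Job role: wire map, shuffle, and reduce together in memory."""
    # Shuffle: route each mapped pair to its partition and group values by key.
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for line in lines:
        for key, value in mapper(line):
            partitions[partition(key, num_reducers)][key].append(value)

    # Each partition is reduced independently, as one reduce task would do.
    result = {}
    for part in partitions:
        for key in sorted(part):
            word, total = reducer(key, part[key])
            result[word] = total
    return result


counts = run_job(["Hadoop 1 runs MapReduce", "Hadoop 2 runs YARN and MapReduce"])
print(counts)
```

Because every occurrence of a key is routed to the same partition, each reduce step sees all values for its keys, which is the property the shuffle stage exists to guarantee.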