Basic Hadoop Architecture V1 vs V2

Introduction
Open-source framework from the Apache Software
Foundation, written in the Java programming language
Google published papers on GFS in Oct 2003 and
MapReduce in Dec 2004.
GFS (Google File System) is Google's proprietary
distributed file system, built to store and manage
data efficiently and reliably on commodity hardware.
MapReduce is a parallel and distributed
programming model used for processing and
generating large datasets.
10 December 2011
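To make the MapReduce model above concrete, here is a minimal single-process sketch of word count in Java. It only simulates the map, shuffle and reduce phases with plain collections; the class and method names (WordCountSketch, map, shuffle, reduce, run) are hypothetical, and this is not Hadoop's Mapper/Reducer API. On a real cluster, each map call would run on a different node against its own input split.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-process simulation of the MapReduce model (word count).
// Hypothetical sketch: NOT the Hadoop API, it only mimics map -> shuffle -> reduce.
class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by key (sorted, like Hadoop's sort step).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return groups;
    }

    // Reduce phase: sum the values collected for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs the three phases; each input line stands in for one input split.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> intermediate = new ArrayList<>();
        for (String line : lines) {
            intermediate.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(intermediate).entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = List.of("Hadoop stores data", "Hadoop processes data in parallel");
        System.out.println(run(input)); // {data=2, hadoop=2, in=1, parallel=1, processes=1, stores=1}
    }
}
```

Because map works on one line at a time and reduce on one key at a time, both phases can be spread across many nodes; only the shuffle needs to move data between them.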

Advantages
• Open source – free license
• High availability – data is replicated across nodes
• High scalability – stores and distributes huge
volumes of data
• Better performance – distributes work to different
nodes and runs tasks in parallel, so it can process
petabytes (PB) of data
• Handles huge volumes and varied types of data
using parallel computing
• Very flexible – new data sources can be integrated
• Can solve complex problems
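The replication and distribution ideas behind high availability and scalability can be sketched as follows, assuming a 128 MB block size (the Hadoop 2 default) and a replication factor of 3. The class ReplicationSketch and its methods are hypothetical, and the round-robin placement is a simplification chosen for illustration; real HDFS uses a rack-aware placement policy.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of splitting a file into fixed-size blocks
// and storing several replicas of each block on different nodes.
// Round-robin placement is an illustration only; HDFS itself is rack-aware.
class ReplicationSketch {

    // Number of blocks needed for a file of fileSize bytes (ceiling division).
    static long blockCount(long fileSize, long blockSize) {
        return (fileSize + blockSize - 1) / blockSize;
    }

    // Assigns each block to `replication` distinct nodes in round-robin order.
    static List<List<String>> placeReplicas(long blocks, List<String> nodes, int replication) {
        if (replication > nodes.size()) {
            throw new IllegalArgumentException("replication factor exceeds number of nodes");
        }
        List<List<String>> placement = new ArrayList<>();
        for (long b = 0; b < blocks; b++) {
            List<String> replicas = new ArrayList<>();
            for (int r = 0; r < replication; r++) {
                replicas.add(nodes.get((int) ((b + r) % nodes.size())));
            }
            placement.add(replicas);
        }
        return placement;
    }

    // A block stays readable as long as at least one node holding a replica is alive.
    static boolean allBlocksAvailable(List<List<String>> placement, List<String> failedNodes) {
        for (List<String> replicas : placement) {
            boolean anyAlive = false;
            for (String node : replicas) {
                if (!failedNodes.contains(node)) {
                    anyAlive = true;
                    break;
                }
            }
            if (!anyAlive) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024;           // 128 MB block size (assumed)
        long fileSize = 1024L * 1024 * 1024;           // a 1 GB file
        long blocks = blockCount(fileSize, blockSize); // 8 blocks
        List<String> nodes = List.of("node1", "node2", "node3", "node4");
        List<List<String>> placement = placeReplicas(blocks, nodes, 3);
        System.out.println(blocks + " blocks, placement: " + placement);
        System.out.println(allBlocksAvailable(placement, List.of("node1", "node2"))); // true
    }
}
```

With three replicas on distinct nodes, any two node failures still leave at least one copy of every block, which is the idea behind the "replication technique" bullet.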

Applications
• Recommendation systems
• Processing very large data sets
• Processing diverse types of data
• Best suited to processing data at rest
• Log processing

Limitations
• Not suitable for small data sets
• Not suitable for executing complex queries
• Difficult to process data in motion (streaming
data)

Components
HADOOP 1 | HADOOP 2
HDFS | HDFS
MapReduce (MRv1) | YARN / MapReduce (MRv2)

DAEMONS
HADOOP 1 | HADOOP 2
NameNode | NameNode
DataNode | DataNode
Secondary NameNode | Secondary NameNode
JobTracker | ResourceManager
TaskTracker | NodeManager

HADOOP V1
FsImage – a file stored on the NameNode's local (OS)
file system that contains the complete directory
structure (namespace) of HDFS
Log file (EditLog) – a file that records events; in
HDFS, the NameNode's EditLog records every change
made to the file system metadata
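How the FsImage and the EditLog fit together can be sketched like this: the current namespace is the FsImage with the EditLog replayed on top of it, and in Hadoop 1 the Secondary NameNode periodically performs this merge (a checkpoint). The sketch below is a hypothetical, heavily simplified model; the CheckpointSketch class, the Edit type and the CREATE/DELETE operation names are assumptions, and real HDFS stores inodes, permissions and block mappings rather than a plain set of paths.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical, heavily simplified model of NameNode metadata:
// the FsImage is a snapshot of the namespace (here just a set of file paths),
// and the EditLog lists the changes made since that snapshot.
class CheckpointSketch {

    // One metadata change recorded in the edit log.
    static final class Edit {
        final String op;   // "CREATE" or "DELETE" in this sketch (illustrative names)
        final String path;

        Edit(String op, String path) {
            this.op = op;
            this.path = path;
        }
    }

    // Replays the edits, in order, on top of the FsImage to produce a new FsImage.
    static TreeSet<String> checkpoint(TreeSet<String> fsImage, List<Edit> editLog) {
        TreeSet<String> namespace = new TreeSet<>(fsImage); // copy, so the old image is untouched
        for (Edit e : editLog) {
            if ("CREATE".equals(e.op)) {
                namespace.add(e.path);
            } else if ("DELETE".equals(e.op)) {
                namespace.remove(e.path);
            } else {
                throw new IllegalArgumentException("unknown op: " + e.op);
            }
        }
        return namespace;
    }

    public static void main(String[] args) {
        TreeSet<String> fsImage = new TreeSet<>(List.of("/data/a.log", "/data/b.log"));
        List<Edit> editLog = new ArrayList<>();
        editLog.add(new Edit("CREATE", "/data/c.log"));
        editLog.add(new Edit("DELETE", "/data/a.log"));
        // After a checkpoint, the merged changes no longer need to be kept in the edit log.
        System.out.println(checkpoint(fsImage, editLog)); // [/data/b.log, /data/c.log]
    }
}
```

Checkpointing keeps the EditLog short, so a restarting NameNode has fewer changes to replay on top of the FsImage.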

Limitations of Hadoop V1
• Supports only batch processing of huge amounts of data
• Not suitable for real-time data processing
• Not suitable for data streaming
• Supports only up to about 4,000 nodes per cluster

HADOOP V2
• HDFS
• YARN
• MapReduce

Differences between Hadoop 1.x and 2.x
Hadoop 1.x | Hadoop 2.x
Manages only one namespace | Manages multiple namespaces (HDFS Federation)
Supports only one programming model, i.e., MapReduce | Supports multiple programming models through YARN, such as MapReduce, streaming and graph processing
Has many scalability limitations | Overcomes those limitations with a new architecture
No multi-tenancy support | Multi-tenancy support
Uses fixed-size slots (map and reduce slots) to allocate cluster resources | Uses variable-sized containers
Supports a maximum of about 4,000 nodes per cluster | Supports more than 10,000 nodes per cluster
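The slots vs. containers row above can be illustrated with a small single-node comparison. This is a hypothetical sketch, not YARN's scheduler: the class name, method names and memory figures are assumptions chosen for illustration. It shows why fixed map and reduce slots can leave capacity idle when the waiting work is all map tasks, while variable-sized containers can give the same capacity to whatever tasks are waiting.

```java
import java.util.List;

// Hypothetical comparison of Hadoop 1 slot allocation and Hadoop 2 container
// allocation on one node. Numbers and names are illustrative, not Hadoop settings.
class SlotsVsContainers {

    // Hadoop 1 style: a fixed number of map slots and reduce slots per node.
    // A map task can only use a map slot, and a reduce task only a reduce slot.
    static int tasksRunnableWithSlots(int mapSlots, int reduceSlots, int mapTasks, int reduceTasks) {
        return Math.min(mapSlots, mapTasks) + Math.min(reduceSlots, reduceTasks);
    }

    // Hadoop 2 style: the node has a memory capacity, and each request that fits in
    // the remaining free memory gets a container of exactly the size it asked for.
    static int tasksRunnableWithContainers(int nodeMemoryMb, List<Integer> requestMb) {
        int free = nodeMemoryMb;
        int started = 0;
        for (int mb : requestMb) {
            if (mb <= free) {
                free -= mb;
                started++;
            }
        }
        return started;
    }

    public static void main(String[] args) {
        // Workload: 8 map tasks and no reduce tasks yet (e.g. the start of a job).
        // Hadoop 1 node with 4 map slots + 4 reduce slots: the reduce slots sit idle.
        System.out.println(tasksRunnableWithSlots(4, 4, 8, 0)); // 4
        // Hadoop 2 node with the same total capacity (8 x 1024 MB), 1024 MB per map container.
        System.out.println(tasksRunnableWithContainers(8 * 1024,
                List.of(1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024))); // 8
    }
}
```

Because containers are sized per request, a job can also ask for larger containers for memory-hungry tasks instead of being forced into one slot size.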
