2. Contents
What is Hadoop?
How big is it?
Why the need for large computing?
How does it work?
Advantages
Disadvantages
Who uses Hadoop?
Conclusion
3. What is Hadoop?
It is open-source software written in Java.
The Hadoop software library is a framework that
allows for the distributed processing of large
data sets across clusters of computers using
simple programming models.
HDFS: self-healing, high-bandwidth clustered
storage.
MapReduce: fault-tolerant distributed
processing.
Operates on both structured and unstructured
data.
4. How Big Is It?
• We have ~20,000 machines running Hadoop.
• Our largest clusters are currently 2,000 nodes.
• Several petabytes of user data (compressed, unreplicated).
• We run hundreds of thousands of jobs every month.
5. Why the Need for Large Computing?
The New York Stock Exchange generates
about one terabyte of new trade data per
day.
Facebook hosts approximately 10 billion
photos, taking up one petabyte of storage.
The Internet Archive stores around 2
petabytes of data, and is growing at a rate
of 20 terabytes per month.
9. MapReduce
Example: word count over a given set of strings.
Input:
We love India
We play tennis
Map output:
We 1
love 1
India 1
We 1
play 1
tennis 1
Reduce output:
We 2
love 1
India 1
play 1
tennis 1
The Hadoop MapReduce framework harnesses a cluster of
machines and executes user-defined MapReduce jobs across
the nodes in the cluster.
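The word-count example above can be sketched in plain Python. This is a local, single-process simulation of the map, shuffle, and reduce phases, not the actual Hadoop Java API; the function names are illustrative only.

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    # Map phase: emit a (word, 1) pair for every word in the line.
    for word in line.split():
        yield (word, 1)

def reducer(word, counts):
    # Reduce phase: sum all the counts collected for one word.
    return (word, sum(counts))

def word_count(lines):
    # Shuffle/sort: group the map output by key, as the framework
    # would do between the map and reduce phases.
    pairs = sorted(kv for line in lines for kv in mapper(line))
    return [reducer(word, (count for _, count in group))
            for word, group in groupby(pairs, key=itemgetter(0))]

print(word_count(["We love India", "We play tennis"]))
# [('India', 1), ('We', 2), ('love', 1), ('play', 1), ('tennis', 1)]
```

In real Hadoop the mapper and reducer run on many nodes in parallel and the shuffle moves data over the network, but the data flow is the same as in this sketch.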
10. Advantages
Reliable shared storage.
Simple analysis system.
Distributed file system.
Tasks are independent.
Easy to handle partial failures: entire nodes
can fail and restart.
11. Disadvantages
The lack of a central data store can be frustrating.
There is still a single master, which requires care
and may limit scaling.
Managing job flow isn't trivial when
intermediate data should be kept.
13. Conclusion
Hadoop is a data-grid operating system
that provides an economically scalable
solution for storing and processing large
amounts of structured or unstructured data
over long periods of time.