Hadoop

Structured, Unstructured and Complex Data
Management

Amit Chaudhary 11MCA03
Karthik Iyer 11MCA05

Hadoop
 What is Hadoop?
 Structure of Hadoop
 Is Hadoop right for me?
 Where is Hadoop used?

 Any idea? (Idea SIM card)

What is Hadoop?
 It is an open-source project of the
Apache Software Foundation for processing
large data sets
 It was inspired by Google’s MapReduce
and Google File System (GFS) papers
 It was originally created by Doug
Cutting
 Incidentally, it is named after his son’s
toy elephant

What Does Large Data Mean?
 1000 Kilobytes = 1 Megabyte
 1000 Megabytes = 1 Gigabyte
 1000 Gigabytes = 1 Terabyte
 1000 Terabytes = 1 Petabyte
 1000 Petabytes = 1 Exabyte
 1000 Exabytes = 1 Zettabyte
 1000 Zettabytes = 1 Yottabyte
 1000 Yottabytes = 1 Brontobyte (informal name)
 1000 Brontobytes = 1 Geopbyte (informal name)
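To make the scale concrete, here is a minimal Python sketch that expresses a raw byte count in the largest decimal unit from the list above. The function name humanize and the UNITS list are illustrative choices, not part of Hadoop, and the list stops at yottabytes.

```python
# Decimal (factor of 1000) units, smallest to largest.
UNITS = ["bytes", "kilobytes", "megabytes", "gigabytes", "terabytes",
         "petabytes", "exabytes", "zettabytes", "yottabytes"]


def humanize(num_bytes):
    """Express a byte count in the largest unit that keeps the value >= 1."""
    index = 0
    # Integer comparison avoids floating-point drift on very large counts.
    while index < len(UNITS) - 1 and num_bytes >= 1000 ** (index + 1):
        index += 1
    return f"{num_bytes / 1000 ** index:.1f} {UNITS[index]}"


if __name__ == "__main__":
    print(humanize(2_500_000_000_000))  # a 2.5 terabyte data set
```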

So what’s the big deal?
 Scalable: New nodes can be added as
needed, without changing data formats
 Flexible: It is schema-less, and can
absorb any type of data, structured or
not, from any number of sources
 Fault tolerant: The system redirects work
to another node if a node fails

Hadoop = HDFS + MapReduce
 HDFS: For storing massive datasets
using low-cost storage
 MapReduce: The programming model,
popularized by Google, for processing
that data in parallel across the cluster
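The MapReduce idea can be sketched on a single machine with a word count. This is a plain Python illustration of the map, shuffle and reduce steps, not Hadoop's own API; the function names map_phase, shuffle, reduce_phase and word_count are made up for this example. On a real cluster the map and reduce calls would run on many nodes at once.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big elephant"]))
```

Each map call looks at only one line and each reduce call at only one key, which is what lets the work be spread over many machines.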

HDFS
 It is a fault-tolerant storage system
 Able to store huge amounts of
information
 It creates clusters of machines and
coordinates work among them
 If a machine fails, the cluster keeps
operating without losing data or
interrupting work, by shifting work to the
remaining machines in the cluster

HDFS
 It manages storage on the cluster by
breaking incoming files into
pieces, called blocks
 It stores each of the blocks redundantly
across the pool of servers
 By default it keeps three copies of each
file, by copying each block to three
different servers
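The block and replica idea above can be sketched in a few lines of Python. This is a toy model, not HDFS code: the function names are hypothetical, the 4-byte block size is only for the demo, and the round-robin placement is an assumption made for simplicity (real HDFS chooses servers with its own placement policy).

```python
def split_into_blocks(data, block_size):
    """Break incoming bytes into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks, servers, replication=3):
    """Assign each block to `replication` different servers.

    Round-robin placement is an assumption for this sketch only.
    """
    if replication > len(servers):
        raise ValueError("need at least as many servers as replicas")
    return {
        block_id: [servers[(block_id + r) % len(servers)]
                   for r in range(replication)]
        for block_id in range(num_blocks)
    }


def readable_after_failure(placement, failed_server):
    """True if every block still has a copy on some surviving server."""
    return all(
        any(server != failed_server for server in replicas)
        for replicas in placement.values()
    )


if __name__ == "__main__":
    blocks = split_into_blocks(b"abcdefghij", block_size=4)
    placement = place_replicas(len(blocks), ["s1", "s2", "s3", "s4"])
    print(placement)
    print(readable_after_failure(placement, "s1"))
```

Because each block lives on three different servers, losing any single server still leaves two copies of every block.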

Which companies are
using Hadoop?
 LinkedIn
 Walt Disney
 Walmart
 General Electric
 Nokia
 Bank of America
 Foursquare

Hadoop at Foursquare
 Foursquare: Mobile + Location + Social
Networking

Is Hadoop right for me?
