HADOOP DISTRIBUTED FILE
SYSTEM AND MAPREDUCE
BY
Eadara Harsha Siva Sai
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
CONTENTS
1. INTRODUCTION TO HADOOP
2. HADOOP ARCHITECTURE
3. HADOOP DISTRIBUTED FILE SYSTEM (HDFS)
4. MAPREDUCE
INTRODUCTION TO HADOOP
What is Hadoop?
• Hadoop is an open-source framework for storing and
processing big data sets using distributed computing.
• Hadoop was introduced by Doug Cutting.
• It was designed to answer the question: “How do we
process big data at reasonable cost and in reasonable
time?”
INTRODUCTION TO HADOOP
• In a traditional non-distributed architecture, data is
stored on one server, and every client program accesses
that central server to read the data.
• The non-distributed model has a few issues. In this
model, you mostly scale vertically, by adding more
CPUs, more storage, and so on.
• This architecture is also not reliable: if the main server
fails, you must restore the data from a backup, and
access to huge data sets is slow.
INTRODUCTION TO HADOOP
In a Hadoop distributed architecture:
• Each server offers local computation and storage. That is,
when you run a query against a large data set, every server
in the cluster executes the query on its own machine
against its local portion of the data. Finally, the result sets
from all these local servers are consolidated.
• You don’t need one powerful server. Instead, you use
several inexpensive commodity servers as individual
Hadoop nodes. If any node fails, the cluster can still return
the data set correctly, because Hadoop replicates and
distributes the data across multiple nodes.
• Hadoop is written in Java, so it can run on any platform.
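The scatter-gather idea above can be sketched in plain Python. This is only a simulation of the pattern, not real Hadoop code; the node data and the query are made-up examples:

```python
# Scatter-gather sketch: each "node" holds a local slice of the data set,
# runs the same query locally, and the partial results are consolidated.
# A real Hadoop cluster does this via HDFS and MapReduce, not Python lists.

nodes = [
    [3, 14, 7],    # local data on node 1 (illustrative values)
    [20, 1],       # local data on node 2
    [9, 16, 5],    # local data on node 3
]

def local_query(data, threshold):
    """Each node counts how many of its local records exceed the threshold."""
    return sum(1 for x in data if x > threshold)

# Every node executes the query against its own local data set...
partial_results = [local_query(data, threshold=8) for data in nodes]

# ...and the result sets from all the local servers are consolidated.
total = sum(partial_results)
print(partial_results, total)
```

Note that no node ever ships its raw data to a central server; only the small partial results travel over the network.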
HADOOP ARCHITECTURE
Hadoop’s architecture includes:
1. Name Node
2. Secondary Name Node
3. Job Tracker
4. Data Node
5. Task Tracker
HADOOP ARCHITECTURE
HADOOP DISTRIBUTED FILE SYSTEM (HDFS)
• HDFS is Hadoop’s system for distributing data across the
data nodes. A typical HDFS block size is 64 MB.
• A single Name Node manages the file system metadata.
It divides the data into 64 MB blocks.
• The Name Node decides which data node each block is
sent to, and it also instructs that data node to store
replicas of the block on two other nodes.
• After storing the data, each data node reports back how
much space it has available.
• Every 3 seconds, each data node sends a heartbeat to the
Name Node. If a data node fails to send a heartbeat, the
Name Node waits for 30 seconds; if no heartbeat arrives
in that time, it declares the data node dead.
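The block-splitting, replication, and heartbeat rules above can be sketched as follows. The node names and the round-robin placement are illustrative only; a real Name Node uses rack-aware placement. The 64 MB block size, 3 copies, and 30-second timeout match the figures stated above:

```python
import math

BLOCK_SIZE_MB = 64   # typical HDFS block size, as stated above
REPLICATION = 3      # one primary copy plus replicas on two other nodes

def split_into_blocks(file_size_mb):
    """Number of 64 MB blocks the Name Node divides a file into."""
    return math.ceil(file_size_mb / BLOCK_SIZE_MB)

def place_replicas(block_id, data_nodes):
    """Toy placement: pick 3 distinct data nodes for one block (round-robin).
    A real Name Node uses rack-aware placement; this is only a sketch."""
    n = len(data_nodes)
    return [data_nodes[(block_id + i) % n] for i in range(REPLICATION)]

def node_is_dead(seconds_since_last_heartbeat):
    """Data nodes heartbeat every 3 s; after 30 s of silence the
    Name Node declares the node dead (per the figures above)."""
    return seconds_since_last_heartbeat > 30

data_nodes = ["dn1", "dn2", "dn3", "dn4"]   # hypothetical node names
print(split_into_blocks(200))               # a 200 MB file becomes 4 blocks
print(place_replicas(0, data_nodes))        # 3 nodes chosen for block 0
print(node_is_dead(31))                     # past the 30 s grace period
```

A 200 MB file therefore occupies 4 blocks, and with 3-way replication it consumes roughly 600 MB of cluster storage in total.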
HADOOP DISTRIBUTED FILE SYSTEM(HDFS)
MAPREDUCE
• MapReduce is the process of obtaining output from
(getting back) your stored data. The following describes
MapReduce using the Hadoop framework.
• When the client wants output from the stored data, the
client writes a program and sends it to the Job Tracker.
• The Job Tracker asks the Name Node whether metadata
has been created for this data. If it has, the Name Node
sends the metadata.
• The Job Tracker then orders the data nodes to process
the data they hold.
• After processing, each data node sends its output to the
Job Tracker. The Job Tracker then sends the collected
outputs to another data node to produce the final output.
MAPREDUCE
• After receiving the final output, the Job Tracker sends it
to the client.
• If a data node fails while processing the data, the Job
Tracker orders another data node, one holding a replica
of the file, to process that data instead.
• After receiving the outputs from the data nodes, the Job
Tracker sees which data node has the least work at that
moment and sends the outputs there to compute the
final output.
• The following diagram explains MapReduce.
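The flow described above is easiest to see with the classic word-count example, sketched here in plain Python. This simulates the map, shuffle, and reduce phases in a single process; in a real Hadoop job these phases run across Task Trackers on many data nodes, and the input lines here are made up:

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key before reducing."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: sum the counts for one word."""
    return key, sum(values)

splits = ["hadoop stores big data", "hadoop processes big data"]

# Each "data node" maps its own split locally...
mapped = [pair for split in splits for pair in map_phase(split)]
# ...the pairs are grouped by key...
grouped = shuffle(mapped)
# ...and a reducer consolidates the final output, which the
# Job Tracker would then return to the client.
counts = dict(reduce_phase(k, v) for k, v in grouped.items())
print(counts)
```

Because each map call touches only its own split, the work parallelizes naturally, which is exactly why the Job Tracker can reassign a failed node’s split to any node holding a replica.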
MAPREDUCE
ADVANTAGES AND DISADVANTAGES
ADVANTAGES:
1. Cost effective
2. Flexible
3. Fast
4. Resilient to failure
DISADVANTAGES:
1. Security concerns
2. Not fit for small data
3. Potential stability issues
CONCLUSION
• Facebook, Google, Amazon, Flipkart, and others use
Hadoop.
• Hadoop solves many problems in storing data in the
cloud. Because Hadoop is open source, it is free, and it
can run on any operating system.
ANY QUESTIONS?
THANK YOU!
