How Hadoop Exploits Data Locality


AGENDA
• How Hadoop stores files in HDFS
• A brief MapReduce flow
• What data locality is and how Hadoop exploits it

HOW DOES HADOOP STORE FILES IN HDFS?

WHAT IS DATA LOCALITY AND HOW DOES HADOOP EXPLOIT IT?
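• Hadoop is built on the principle that “moving computation is cheaper
than moving data.”
• Data locality means the computation is data-aware: a task is scheduled
on, or as close as possible to, the node that stores its input data.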
• In Hadoop, when a slave node sends a heartbeat message reporting that
it has free map slots, the master node first looks for a map task
whose input data is stored on that slave node. If it finds one, it
schedules the task on that node and achieves node-level data locality.
Otherwise, Hadoop looks for a task that can achieve rack-level data
locality.
• Hadoop puts the MapReduce job's JAR in HDFS. The task trackers that
need it fetch it from there. The JAR is therefore distributed to some
nodes and then loaded on demand by the nodes that actually need it.
Usually, a node needs the JAR because it is going to process local
data.
• A Hadoop cluster is "stateless" with respect to jobs. Each job is
treated as something new, and "side effects" of previous jobs are not
reused.
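
The heartbeat-driven scheduling described above can be sketched roughly as follows. This is a simplified illustration, not Hadoop's actual scheduler code: the names MapTask, pick_task, and rack_of are hypothetical, and the final "off-rack" fallback is an assumption added so that the sketch always returns a pending task when one exists.

```python
from dataclasses import dataclass


@dataclass
class MapTask:
    task_id: str
    replica_hosts: set  # hosts holding a replica of this task's input split


def pick_task(node, pending, rack_of):
    """Pick a map task for `node`, preferring the closest input data.

    Returns (task, locality_level), or (None, None) if nothing is pending.
    `rack_of` is a hypothetical host -> rack mapping.
    """
    # 1. Node-level locality: the input split has a replica on this node.
    for task in pending:
        if node in task.replica_hosts:
            return task, "node-local"

    # 2. Rack-level locality: a replica lives on a host in the same rack.
    node_rack = rack_of.get(node)
    if node_rack is not None:
        for task in pending:
            if any(rack_of.get(h) == node_rack for h in task.replica_hosts):
                return task, "rack-local"

    # 3. Assumed fallback: run any pending task and read its input remotely.
    if pending:
        return pending[0], "off-rack"
    return None, None


# Example cluster: n1 and n2 share rack r1, n3 sits in rack r2.
rack_of = {"n1": "r1", "n2": "r1", "n3": "r2"}
pending = [MapTask("t1", {"n3"}), MapTask("t2", {"n2"}), MapTask("t3", {"n1"})]

# n1 reports a free map slot in its heartbeat.
task, level = pick_task("n1", pending, rack_of)
print(task.task_id, level)  # t3 node-local
```

The sketch scans all pending tasks for a node-local match before considering rack-local ones, so a task further down the queue can be chosen over an earlier task whose data is farther away.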

TRADITIONAL HADOOP NETWORK TOPOLOGY

DATA LOCALITY DURING JOB INITIALIZATION
