

Data Locality in HDFS

Data Locality: the ability to process data where it is locally stored.

Note: During the Map phase, the JobTracker attempts to use data locality to schedule map tasks where the data is locally stored. This is not perfect and depends on which data nodes hold the data. This is a consideration when choosing the replication factor. More replicas tend to create a higher probability of data locality.

Observations
• The initial spike in RX traffic occurs before the Reducers kick in.
• It represents data each map task needs that is not local.
• Looking at the spike, it is mainly data from only a few nodes.

[Figure: job timeline marked Job Start, Maps Start, Maps Finish, Reducers Start, Job Complete. Caption: Map Tasks: initial spike for non-local data. Sometimes a task may be scheduled on a node that does not have the data available locally.]
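The link between replication factor and locality can be illustrated with a small back-of-the-envelope calculation. The sketch below is not how the JobTracker schedules tasks; it is a simplified model that assumes a block's replicas sit on distinct nodes chosen uniformly at random, and that a given number of nodes with free task slots is also chosen uniformly at random. The function name `locality_probability` and its parameters are hypothetical, introduced only for this illustration.

```python
from math import comb


def locality_probability(num_nodes: int, replicas: int, free_nodes: int) -> float:
    """Chance that at least one free node holds a replica of a given block.

    Simplifying assumptions (not taken from the slides): replicas are on
    distinct, uniformly chosen nodes, and the free nodes are a uniformly
    random subset of the cluster.
    """
    if not 0 < replicas <= num_nodes:
        raise ValueError("replicas must be between 1 and num_nodes")
    if not 0 <= free_nodes <= num_nodes:
        raise ValueError("free_nodes must be between 0 and num_nodes")
    # Probability that every free node misses all replicas:
    # choose the free nodes only from the nodes without a replica.
    miss = comb(num_nodes - replicas, free_nodes) / comb(num_nodes, free_nodes)
    return 1.0 - miss


if __name__ == "__main__":
    # Compare replication factors on a hypothetical 100-node cluster
    # with 5 nodes currently able to accept a map task.
    for r in (1, 2, 3):
        print(r, round(locality_probability(100, r, 5), 4))
```

Under these assumptions the probability rises with each additional replica, which matches the slide's point that more replicas tend to improve the odds of a data-local map task. Real clusters deviate from the model because of rack-aware placement and uneven load.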

Published in: Technology, Education