Loop-aware Scheduling: Place on the same physical machines those map and reduce tasks that occur in different iterations but access the same data.
Scheduling Algorithm: The number of reduce tasks should be invariant across iterations, so that the hash function assigning mapper outputs to reducer nodes remains unchanged. The master node maintains a mapping from each slave node to the data partitions that this node processed in the previous iteration.
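The bookkeeping described above can be sketched in Python. This is only an illustration under stated assumptions: the class LoopAwareScheduler, its methods, and the least-loaded fallback for partitions no node has seen yet are hypothetical and are not taken from any real scheduler API.

```python
from collections import defaultdict


class LoopAwareScheduler:
    """Toy master-side scheduler (hypothetical names, illustration only)."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        # node -> partitions that node processed in the previous iteration
        self.previous = defaultdict(set)
        # node -> partitions assigned so far in the current iteration
        self.current = defaultdict(set)

    def assign(self, partition):
        """Pick a node for `partition`, preferring the node that had it last time."""
        for node in self.nodes:
            if partition in self.previous[node]:
                self.current[node].add(partition)
                return node
        # Assumed fallback: no node has this partition cached, so use the
        # node with the fewest assignments in the current iteration.
        node = min(self.nodes, key=lambda n: len(self.current[n]))
        self.current[node].add(partition)
        return node

    def end_iteration(self):
        """Roll the current assignments over into the previous-iteration map."""
        self.previous = self.current
        self.current = defaultdict(set)


scheduler = LoopAwareScheduler(["node0", "node1"])
first = {p: scheduler.assign(p) for p in range(4)}
scheduler.end_iteration()
second = {p: scheduler.assign(p) for p in range(4)}
```

In this sketch, every partition in the second iteration lands on the node that processed it in the first, which is the condition that lets a node reuse locally cached data.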
Caches
Reducer Input Cache: The same key must always be hashed to the same reducer. This requires that the partition function f be deterministic, stay the same across iterations, and take the tuple t as its only input, and that the number of reducers remain unchanged.
Reducer Output Cache: If two Reduce function calls produce the same output key from two different reducer input keys, both reducer input keys must be in the same partition so that they are sent to the same reduce task.
Mapper Input Cache
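For the reducer input cache, the conditions on the partition function f can be illustrated with a short Python sketch. The constant NUM_REDUCERS, the (key, value) tuple layout, and the choice of CRC32 are assumptions made for this example. Python's built-in hash() is avoided because string hashing is randomized per process by default, so it would not give the same result across separate runs.

```python
import zlib

# Assumed to be fixed for the whole job, i.e. unchanged across iterations.
NUM_REDUCERS = 4


def f(t):
    """Partition function: maps a mapper output tuple t to a reducer index.

    It depends only on t (here on its key, t[0]) and on a fixed constant,
    so the same key is routed to the same reducer in every iteration.
    """
    key = str(t[0]).encode("utf-8")
    return zlib.crc32(key) % NUM_REDUCERS


# Tuples with the same key but different values go to the same reducer.
iteration_1 = f(("url-17", 0.25))
iteration_2 = f(("url-17", 0.31))
```

Changing NUM_REDUCERS between iterations would change the result of the modulo for most keys, which is why the number of reducers has to stay unchanged for the cache to be usable.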