Kmeans.py
• Uses the "in-mapper combining" technique to implement combiner functionality inside every map task. Note: this is not a separate combiner phase.
• It makes a discrete Combine step between Map and Reduce unnecessary. With a regular combiner, there is no guarantee that the combiner function will be called for every mapper or, if it is called, that it will be called only once.
• With the in-mapper combining design pattern, combiner-like key aggregation is guaranteed to happen in every mapper, rather than optionally in only some mappers.
2012-12-20 5
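A minimal sketch of in-mapper combining for one k-means iteration, assuming points arrive as tuples of floats and each map task already holds the current centroids. The function names (`nearest`, `map_with_in_mapper_combining`, `reduce_centroids`) are hypothetical, not taken from Kmeans.py. A real Hadoop Streaming job would read stdin and print tab-separated lines; plain Python iterables are used here to keep the sketch self-contained.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple

Partial = Tuple[List[float], int]  # (partial coordinate sum, point count)


def nearest(point: Sequence[float], centroids: Sequence[Sequence[float]]) -> int:
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_with_in_mapper_combining(
    points: Iterable[Sequence[float]], centroids: Sequence[Sequence[float]]
) -> Iterator[Tuple[int, Partial]]:
    """Hypothetical mapper: aggregate in memory, emit once per centroid.

    Instead of emitting (centroid_id, point) for every record, partial sums
    and counts are kept in dicts and emitted only after the whole split has
    been consumed, so aggregation happens in every map task.
    """
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = defaultdict(int)
    for point in points:
        cid = nearest(point, centroids)
        acc = sums.setdefault(cid, [0.0] * len(point))
        for d, value in enumerate(point):
            acc[d] += value
        counts[cid] += 1
    for cid in sorted(sums):
        yield cid, (sums[cid], counts[cid])


def reduce_centroids(pairs: Iterable[Tuple[int, Partial]]) -> Dict[int, List[float]]:
    """Merge partial (sum, count) pairs per centroid into new centroids."""
    totals: Dict[int, List[float]] = {}
    counts: Dict[int, int] = defaultdict(int)
    for cid, (partial, n) in pairs:
        acc = totals.setdefault(cid, [0.0] * len(partial))
        for d, value in enumerate(partial):
            acc[d] += value
        counts[cid] += n
    return {cid: [v / counts[cid] for v in acc] for cid, acc in totals.items()}


# Two map tasks over two input splits; each emits at most k pairs.
centroids = [(0.0, 0.0), (10.0, 10.0)]
emitted = list(map_with_in_mapper_combining([(1.0, 1.0), (0.0, 2.0), (9.0, 9.0)], centroids))
emitted += list(map_with_in_mapper_combining([(2.0, 0.0), (11.0, 11.0)], centroids))
print(reduce_centroids(emitted))  # {0: [1.0, 1.0], 1: [10.0, 10.0]}
```

Each map task emits at most one pair per centroid, no matter how many points its split holds, which is why the reducer has very little left to do.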
Kmeans.py
• The aggregation is done entirely in memory, without touching disk, and it happens before any emission code is called.
• However, the pattern itself does not protect against running out of memory (a "memory leak"): the in-memory aggregation buffer can grow too large, so the mapper's Python code must keep it under control.
• Results (3.6 GB test dataset):
  • Old: 30+ min
  • Current: 9+ min; the reduce phase now takes only 1-2 seconds, a significant time saving.
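One common way to keep the in-memory buffer bounded is to flush partial aggregates whenever it grows past a threshold. The sketch below assumes additive values and uses a hypothetical `max_keys` limit on the number of distinct buffered keys; the class name and threshold policy are illustrative, not from Kmeans.py. For k-means the buffer holds at most k keys, so this matters mainly when the key space is large.

```python
from typing import Callable, Dict, Hashable


class BoundedInMapperCombiner:
    """In-mapper combiner that flushes partial sums when the buffer grows.

    Once `max_keys` distinct keys are buffered, every partial sum is emitted
    and the buffer is cleared. Flushing early keeps results correct because
    the reducer adds up all partial sums for a key.
    """

    def __init__(self, max_keys: int, emit: Callable[[Hashable, float], None]) -> None:
        if max_keys < 1:
            raise ValueError("max_keys must be at least 1")
        self.max_keys = max_keys
        self.emit = emit
        self.buffer: Dict[Hashable, float] = {}

    def add(self, key: Hashable, value: float) -> None:
        """Aggregate one value; flush if the buffer reached its limit."""
        self.buffer[key] = self.buffer.get(key, 0.0) + value
        if len(self.buffer) >= self.max_keys:
            self.flush()

    def flush(self) -> None:
        """Emit all buffered partial sums and free the memory."""
        for key, total in self.buffer.items():
            self.emit(key, total)
        self.buffer.clear()


emitted = []
combiner = BoundedInMapperCombiner(max_keys=2, emit=lambda k, v: emitted.append((k, v)))
for key, value in [("a", 1.0), ("a", 2.0), ("b", 1.0), ("c", 5.0), ("a", 1.0)]:
    combiner.add(key, value)
combiner.flush()  # final flush at the end of the map task
print(emitted)  # [('a', 3.0), ('b', 1.0), ('c', 5.0), ('a', 1.0)]
```

The trade-off is that a smaller threshold uses less memory but emits more intermediate pairs (key "a" is emitted twice above), so some aggregation work shifts back to the shuffle and reduce phases.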