2. Parallelism is Here 12-Aug-10 Invent Show
In the words of Sun Microsystems researcher Guy Steele: "The bag of programming tricks that has served us so well for the last 50 years is the wrong way to think going forward and must be thrown out."
In the words of Berkeley professor Dave Patterson: "We desperately need a new approach to hardware and software based on parallelism, since industry has bet its future that parallelism works."
3. The Paradigm Shift – What Caused It?
Moore's Law: "The density of transistors on a chip doubles every 18 months, for the same cost." It has now failed:
We have reached a limit in reducing transistor size – the Power Wall
Memory bandwidth is now a bottleneck – the Memory Wall
The set of problems we can solve with a single computer is not going to get any larger – the ILP Wall
Solution:
Parallel computing – multicores
Distributed computing – data centers (Google, Facebook, Yahoo)
4. So What is the Difference?
Good sequential code: minimizes the total number of operations; minimizes space usage; stresses linear problem decomposition.
Good parallel code: performs redundant operations; requires extra space; requires multiway problem decomposition.
5. Basics
Not all code can be parallelized. The Fibonacci function F(k+2) = F(k) + F(k+1) is inherently sequential: each term depends on the two before it.
But most computations can be parallelized: a large amount of uniform data to be processed with no dependencies.
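The contrast can be sketched in Python (an illustrative example, not from the slides): each Fibonacci term depends on the two previous results, so the loop forms a sequential dependency chain, while applying an independent operation to every element of an array has no such dependency and could be split among workers.

```python
def fib(n):
    # Each step needs the two previous results: an inherent
    # dependency chain, so the loop cannot be parallelized.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def square_all(data):
    # Each element is processed independently of the others,
    # so the work can be split across any number of workers.
    return [x * x for x in data]

print(fib(10))                 # 55
print(square_all([1, 2, 3]))   # [1, 4, 9]
```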
6. Basic Model – Master/Worker Model (1/2)
Consider a huge array that can be broken into sub-arrays.
7. Basic Model – Master/Worker Model (2/2)
MASTER: initializes the array and splits it up according to the number of WORKERs; sends each WORKER its subarray; receives the results from each WORKER.
WORKER: receives its subarray from the MASTER; performs processing on the subarray; returns the results to the MASTER.
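The MASTER/WORKER steps above can be sketched with Python's multiprocessing Pool (a minimal illustration; the function names `master` and `worker` and the sum-of-squares task are assumptions, not from the slides):

```python
from multiprocessing import Pool

def worker(subarray):
    # WORKER: receives a subarray, processes it, returns the result.
    return sum(x * x for x in subarray)

def master(array, num_workers=4):
    # MASTER: split the array into one chunk per worker...
    chunk = (len(array) + num_workers - 1) // num_workers
    subarrays = [array[i:i + chunk] for i in range(0, len(array), chunk)]
    # ...send each WORKER its subarray and collect the results.
    with Pool(num_workers) as pool:
        partials = pool.map(worker, subarrays)
    # Combine the per-worker partial results.
    return sum(partials)

if __name__ == "__main__":
    print(master(list(range(1000))))
```

The same pattern works whether the workers are processes on one machine or nodes in a cluster; only the "send" and "receive" mechanisms change.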
8. MapReduce
A simple data-parallel programming model designed for scalability and fault tolerance.
Pioneered by Google – processes 20 petabytes of data per day.
Popularized by the open-source Hadoop project – used at Yahoo!, Facebook, Amazon, …
9. What is MapReduce used for? (1/2)
At Google: index construction for Google Search; article clustering for Google News; statistical machine translation.
At Yahoo!: the "Web map" powering Yahoo! Search; spam detection for Yahoo! Mail.
At Facebook: data mining; ad optimization; spam detection.
10. What is MapReduce used for? (2/2)
In research: astronomical image analysis (Washington); bioinformatics (Maryland); analyzing Wikipedia conflicts (PARC); natural language processing (CMU); particle physics (Nebraska); ocean climate simulation (Washington).
VisionerBOT – our custom Web crawler.
11. MapReduce Programming Model
Data type: key–value records
Map function: (K_in, V_in) → list(K_inter, V_inter)
Reduce function: (K_inter, list(V_inter)) → list(K_out, V_out)
12. Example: Word Count
def mapper(line):
    for word in line.split():
        output(word, 1)

def reducer(key, values):
    output(key, sum(values))
13. Word Count Execution
[Diagram: dataflow Input → Map → Shuffle & Sort → Reduce → Output]
Input (three lines): "the quick brown fox", "the fox ate the mouse", "how now brown cow".
Map: each line produces (word, 1) pairs, e.g. (the, 1), (quick, 1), (brown, 1), (fox, 1) from the first line.
Shuffle & Sort: the pairs from all mappers are grouped by key.
Reduce: each reducer sums the counts for its keys, yielding (ate, 1), (brown, 2), (cow, 1), (fox, 2), (how, 1), (mouse, 1), (now, 1), (quick, 1), (the, 3).
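The three stages above can be simulated in a few lines of plain Python on the slide's three input lines (a single-machine sketch of the dataflow, not a distributed implementation; the `shuffle` helper stands in for the framework's shuffle & sort phase):

```python
from collections import defaultdict

def mapper(line):
    # Map: emit a (word, 1) pair for every word in the line.
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle & Sort: group all values by key across all mappers.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reducer(key, values):
    # Reduce: sum the counts for one word.
    return (key, sum(values))

lines = ["the quick brown fox", "the fox ate the mouse", "how now brown cow"]
pairs = [pair for line in lines for pair in mapper(line)]
counts = dict(reducer(key, values) for key, values in shuffle(pairs))
print(counts["the"], counts["fox"], counts["brown"])  # 3 2 2
```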
15. MapReduce Execution Details
A single master controls job execution on multiple slaves.
There can be a hierarchy of masters under the control of an absolute master.
Mappers are preferably placed close to each other in order to minimize network delay.
There should be checkpoints so that the job can recover if an operation crashes.
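The recovery idea can be sketched as a master-side loop that re-runs a task when a worker attempt crashes (an illustration only; `run_with_retries` and the simulated flaky worker are assumptions, not part of MapReduce's actual implementation):

```python
def run_with_retries(task, args, max_attempts=3):
    # Master-side recovery: if a worker attempt crashes, reschedule
    # the task (up to max_attempts) instead of failing the whole job.
    for attempt in range(1, max_attempts + 1):
        try:
            return task(*args)
        except Exception:
            if attempt == max_attempts:
                raise

flaky_calls = {"n": 0}

def flaky_square(x):
    # Simulated worker that crashes on its first attempt only.
    flaky_calls["n"] += 1
    if flaky_calls["n"] == 1:
        raise RuntimeError("worker crashed")
    return x * x

print(run_with_retries(flaky_square, (7,)))  # 49
```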
Gordon Moore was a co-founder of Intel; his law held true for 40 years but has now failed. Even a fast processor has to wait for data to arrive, because we are now limited by memory bandwidth. Developing faster CPUs no longer serves us in terms of performance speedup. The set of problems we can solve with a single computer is not going to get any larger.
Good sequential code processes one thing at a time and accumulates results, using clever tricks to reuse previously computed results and to reuse storage. Good parallel code performs operations again and again to reduce communication, and uses extra space to permit temporal decoupling. We have lots of computing resources: time is the issue, not the computing resources.
A function to compute Fibonacci from the recurrence above cannot be parallelized, because each computed value depends on previously computed values. Consider a huge array which can be broken up into sub-arrays. If the same processing is required for each array element, with no dependencies in the computations and no communication required between tasks, we have an ideal parallel computing opportunity. A common implementation technique for this is master/worker.