1. Massively Parallel Computing CS 264 / CSCI E-292, Lecture #1: Introduction | January 25th, 2011 | Nicolas Pinto (MIT, Harvard), email@example.com
3. Distant Students
4. Take a picture with...
5. a friend I like
6. his dog I like
7. cool hardware
8. your mom
9. Send it to: firstname.lastname@example.org
13. Massively Parallel Computing [word cloud: Supercomputing, Many-core, MPC, High-Throughput Computing, Cloud Computing, Human Computing?, "Computing"]
14. Massively Parallel Computing [same word cloud, repeated as a section divider]
16. Modeling & Simulation• Physics, astronomy, molecular dynamics, ﬁnance, etc.• Data and processing intensive• Requires high-performance computing (HPC)• Driving HPC architecture development
17. (2009) CS 264 Top Dog (2008)• Roadrunner, LANL • #1 on top500.org in 2008 (now #7) • 1.105 petaflop/s • 3000 nodes with dual-core AMD Opteron processors • Each node connected via PCIe to two IBM Cell processors • Nodes are connected via InfiniBand 4x DDR
19. Tianhe-1A at NSC Tianjin: 2.507 Petaflop/s, 7168 Tesla M2050 GPUs. 1 Petaflop/s = ~1M high-end laptops = ~the world population with hand calculators, 24/7/365, for ~16 years. Slide courtesy of Bill Dally (NVIDIA)
37. How much Data?• Google processes 24 PB / day, 8 EB / year (’10)• Wayback Machine has 3 PB,100 TB/month (’09)• Facebook user data: 2.5 PB, 15 TB/day (’09)• Facebook photos: 15 B, 3 TB/day (’09) - 90 B (now)• eBay user data: 6.5 PB, 50 TB/day (’09)• “all words ever spoken by human beings”~ 42 ZB Adapted from http://www.umiacs.umd.edu/~jimmylin/cloud-2010-Spring/
38. “640K ought to be enough for anybody.” - Bill Gates (1981)... just a rumor
39. Disk Throughput• Average Google job size: 180 GB• 1 SATA HDD = 75 MB / sec• Time to read 180 GB off disk: 45 mins• Solution: parallel reads• 1000 HDDs = 75 GB / sec• Google’s solutions: BigTable, MapReduce, etc.
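A quick back-of-the-envelope check of those numbers (the slide rounds the single-disk time up slightly):

\[ t_{\text{1 disk}} = \frac{180\ \text{GB}}{75\ \text{MB/s}} \approx 2{,}400\ \text{s} \approx 40\text{-}45\ \text{min}, \qquad t_{\text{1000 disks}} = \frac{180\ \text{GB}}{75\ \text{GB/s}} \approx 2.4\ \text{s} \]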
40. Cloud Computing• Clear trend: centralization of computing resources in large data centers• Q: What do Oregon, Iceland, and abandoned mines have in common?• A: Fiber, juice, and space• Utility computing!
41. Massively Parallel Computing [same word cloud, repeated as a section divider]
42. Instrument Data Explosion: Sloan Digital Sky Survey; ATLUM / Connectome Project
53. Diesel-powered HPC, life support… Murchison Widefield Array (slide courtesy of Hanspeter Pfister)
54. How much Data?• NOAA has ~1 PB climate data (‘07)• MWA radio telescope: 8 GB/sec of data• Connectome: 1 PB / mm3 of brain tissue (1 EB for 1 cm3)• CERN’s LHC will generate 15 PB a year (‘08)
55. High Flops / Watt
56. Massively Parallel Computing [same word cloud, repeated as a section divider]
57. Computer Games• PC gaming business: • $15B / year market (2010) • $22B / year in 2015? • WoW: $1B / year• NVIDIA shipped 1B GPUs since 1993: • 10 years to ship 200M GPUs (1993-2003)• 1/3 of all PCs have more than one GPU• High-end GPUs sell for around $300• Now used for science applications
67. Massively Parallel Computing [same word cloud, repeated as a section divider]
68. Massively Parallel Computing [same word cloud, repeated as a section divider]
69. Massively Parallel Human Computing ???• “Crowdsourcing”• Amazon Mechanical Turk (artiﬁcial artiﬁcial intelligence)• Wikipedia• Stackoverﬂow• etc.
70. What is this course about?
71. What is this course about? Massively parallel processors • GPU computing with CUDA Cloud computing • Amazon’s EC2 as an example of utility computing • MapReduce, the “back-end” of cloud computing
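As a first taste of what "GPU computing with CUDA" looks like, here is a minimal vector-add sketch; it is illustrative only (the names, sizes, and launch configuration are ours, not course material), but it shows the basic pattern of allocating device memory, copying data, and launching a kernel with many threads.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Each thread adds one pair of elements: data parallelism in its simplest form.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;                    // 1M elements (illustrative size)
    const size_t bytes = n * sizeof(float);

    // Host arrays
    float *ha = (float *)malloc(bytes), *hb = (float *)malloc(bytes), *hc = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

    // Device arrays and host-to-device copies
    float *da, *db, *dc;
    cudaMalloc((void **)&da, bytes);
    cudaMalloc((void **)&db, bytes);
    cudaMalloc((void **)&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    // Launch enough blocks of 256 threads to cover all n elements
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(da, db, dc, n);

    // Copy the result back (this call also synchronizes with the kernel)
    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %.1f\n", hc[0]);           // expect 3.0

    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(ha); free(hb); free(hc);
    return 0;
}
```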
72. Less like Rodin...
73. More like Bob...
76. Anant Agarwal, MIT
77. Power Cost• Power ∝ Voltage2 x Frequency• Frequency ∝ Voltage• Power ∝ Frequency3 Jack Dongarra
78. Power Cost (Jack Dongarra)

               Cores   Freq   Perf   Power   Perf/Watt
  CPU            1      1      1      1       1x
  "New" CPU      1      1.5    1.5    3.3     0.45x
  Multicore      2      0.75   1.5    0.8     1.88x
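Combining the two proportionalities from the previous slide gives the cubic law, which is where the Power column comes from (a rough check; the slide rounds its numbers):

\[ P \propto V^2 f \quad\text{and}\quad f \propto V \;\Rightarrow\; P \propto f^3 \]

\[ \text{"New" CPU: } 1 \times 1.5^3 \approx 3.4 \ (\text{slide: } 3.3), \qquad \text{Multicore: } 2 \times 0.75^3 \approx 0.84 \ (\text{slide: } 0.8), \qquad \text{Perf/Watt} = \tfrac{1.5}{0.8} \approx 1.9\times \]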
79. Problem with Buses Anant Agarwal, MIT
80. Problem with Memory http://www.OpenSparc.net/
81. Problem with Disks 64 MB / sec Tom’s Hardware
82. Good News• Moore’s Law marches on• Chip real-estate is essentially free• Many-core architectures are commodities• Space for new innovations
83. Bad News• Power limits improvements in clock speed• Parallelism is the only route to improve performance• Computation / communication ratio will get worse• More frequent hardware failures?
85. A “Simple” Matter of Software• We have to use all the cores efﬁciently• Careful data and memory management• Must rethink software design• Must rethink algorithms• Must learn new skills!• Must learn new strategies!• Must learn new tools...
86. Our mantra: always use the right tool !
88. Instructor: Nicolas Pinto The Rowland Institute at Harvard HARVARD UNIVERSITY
89. ~50% of the brain is for vision!
90. Everyone knows that...
91. The Approach: Reverse and Forward Engineering the Brain
92. The Approach: Reverse and Forward Engineering the Brain. REVERSE: study the natural system; FORWARD: build an artificial system
93. brain = 20 petaflops!
95. “If you want to have good ideas you must have many ideas.” “Most of them will be wrong, and what you have to learn is which ones to throw away.” Linus Pauling (double Nobel Prize winner)
96. High-throughput Screening
97. The curse of speed... and the blessing of massively parallel computing: thousands of big models, large amounts of unsupervised learning experience
98. The curse of speed... and the blessing of massively parallel computing. No off-the-shelf solution? DIY! Engineering (Hardware/SysAdmin/Software) + Science. Leverage non-scientific high-tech markets and their $billions of R&D... Gaming: Graphics Cards (GPUs), PlayStation 3; Web 2.0: Cloud Computing (Amazon, Google)
99. Build your own!
100. The blessing of GPUs DIY GPU pr0n (since 2006) Sony Playstation 3s (since 2007)
101. Speed (in billion floating-point operations per second):
  Q9450 (Matlab/C)        0.3
  Q9450 (C/SSE)           9.0
  7900GTX (OpenGL/Cg)    68.2
  PS3/Cell (C/ASM)      111.4
  8800GTX (CUDA 1.x)    192.7
  GTX280 (CUDA 2.x)     339.3
  GTX480 (CUDA 3.x)     974.3 (Fermi)
  A >1000X speedup is game changing...
  Pinto, Doukhan, DiCarlo, Cox, PLoS 2009; Pinto, Cox, GPU Computing Gems 2011
102. Tired Of Waiting For Your Computations? Supercomputing on your desktop: programming the next generation of cheap and massively parallel hardware using CUDA. This IAP has been designed to give students extensive hands-on experience in using a new potentially disruptive technology. This technology enables the masses to have access to supercomputing capabilities. We will introduce students to the CUDA programming language developed by NVIDIA Corp., which has been an essential step towards simplifying and unifying the programming of massively parallel chips. This IAP is supported by generous contributions from NVIDIA Corp., The Rowland Institute at Harvard, and MIT (OEIT, BCS, EECS), and will be featuring talks given by experts from various fields. 6.963 (IAP 09)
103. Co-Instructor: Hanspeter Pfister
104. Visual Computing• Large image & video collections• Physically-based modeling• Face modeling and recognition• Visualization
105. VolumePro 500 Released 1999
108. NSF CDI Grant ’08-’11
109. NVIDIA CUDA Center of Excellence
110. TFs• Claudio Andreoni (MIT Course 18)• Dwight Bell (Harvard DCE)• Krunal Patel (Accelereyes)• Jud Porter (Harvard SEAS)• Justin Riley (MIT OEIT)• Mike Roberts (Harvard SEAS)
111. Claudio Andreoni (MIT Course 18)
112. Dwight Bell (Harvard DCE)
113. Krunal Patel (Accelereyes)
114. Jud Porter (Harvard SEAS)
115. Justin Riley (MIT OEIT)
116. Mike Roberts (Harvard SEAS)
117. About You
118. About you...• Undergraduate? Graduate?• Programming? >5 years? <2 years?• CUDA? MPI? MapReduce?• CS? Life Sciences? Applied Sciences? Engineering? Math? Physics?• Humanities? Social Sciences? Economics?
120. CS 264 Goals• Have fun!• Learn basic principles of parallel computing• Learn programming with CUDA• Learn to program a cluster of GPUs (e.g. MPI)• Learn basics of EC2 and MapReduce• Learn new learning strategies, tools, etc.• Implement a ﬁnal project
121. Experimental Learning Strategy: repeat, repeat, repeat... Memory "recall"
123. Lectures “Format”• 2x ~ 45min regular “lectures”• ~ 15min “Clinic” • we’ll be here to ﬁx your problems• ~ 5 min: Life and Code “Hacking”: • GTD Zen • Presentation Zen • Ninja Programming Tricks & Tools, etc. • Interested? email email@example.com
124. Act I: GPU Computing• Introduction to GPU Computing• CUDA Basics• CUDA Advanced• CUDA Ninja Tricks !
125. 3D Filterbank Convolution: Performance / Effort
               Performance (gflops)   Development time (hours)
  Matlab             0.3                     0.5
  C/SSE              9.0                    10.0
  PS3              111.4                    30.0
  GT200            339.3                    10.0
126. Empirical results... Performance (gflops):
  Q9450 (Matlab/C)        0.3
  Q9450 (C/SSE)           9.0
  7900GTX (Cg)           68.2
  PS3/Cell (C/ASM)      111.4
  8800GTX (CUDA 1.x)    192.7
  GTX280 (CUDA 2.x)     339.3
  GTX480 (CUDA 3.x)     974.3
  A >1000X speedup is game changing...
127. Act II: Cloud Computing• Introduction to utility computing• EC2 & starcluster (Justin Riley, MIT OEIT)• Hadoop (Zak Stone, SEAS)• MapReduce with GPU Jobs on EC2
128. Amazon’s Web Services• Elastic Compute Cloud (EC2) • Rent computing resources by the hour • Basic unit of accounting = instance-hour • Additional costs for bandwidth• You’ll be getting free AWS credits for course assignments
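A purely hypothetical illustration of the instance-hour unit (the $0.10 rate is made up for the example and is not an actual AWS price):

\[ \text{cost} \approx N_{\text{instances}} \times t_{\text{hours}} \times \text{rate} = 10 \times 3\ \text{h} \times \$0.10/\text{instance-hour} = \$3\ (+\ \text{bandwidth charges}) \]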
129. MapReduce• Functional programming meets distributed processing• Processing of lists with <key, value> pairs• Batch data processing infrastructure• Move the computation where the data is
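To make the <key, value> idea concrete, here is a minimal, purely local word-count sketch of the map and reduce phases; the helper names (map_fn, reduce_fn) are ours for illustration and are not Hadoop's API. On a real cluster the framework shards the input, runs many map tasks in parallel, shuffles intermediate pairs by key, and runs the reduce tasks where the data lives.

```cpp
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// map: one input record -> list of <key, value> pairs (here: <word, 1>)
std::vector<std::pair<std::string, int> > map_fn(const std::string &line) {
    std::vector<std::pair<std::string, int> > out;
    std::istringstream iss(line);
    std::string word;
    while (iss >> word) out.push_back(std::make_pair(word, 1));
    return out;
}

// reduce: one key plus all of its values -> an aggregated value (here: the sum)
int reduce_fn(const std::string & /*key*/, const std::vector<int> &values) {
    int sum = 0;
    for (size_t i = 0; i < values.size(); ++i) sum += values[i];
    return sum;
}

int main() {
    std::vector<std::string> input;
    input.push_back("the quick brown fox");
    input.push_back("the lazy dog");

    // "shuffle": group intermediate values by key
    std::map<std::string, std::vector<int> > groups;
    for (size_t i = 0; i < input.size(); ++i) {
        std::vector<std::pair<std::string, int> > pairs = map_fn(input[i]);
        for (size_t j = 0; j < pairs.size(); ++j)
            groups[pairs[j].first].push_back(pairs[j].second);
    }

    // reduce each group and print <word, count>
    for (std::map<std::string, std::vector<int> >::const_iterator it = groups.begin();
         it != groups.end(); ++it)
        std::cout << it->first << "\t" << reduce_fn(it->first, it->second) << "\n";
    return 0;
}
```

(In Hadoop the same map and reduce logic would be expressed through the framework's Mapper/Reducer interfaces or via Hadoop Streaming.)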
130. Act III: Guest Lectures• Andreas Klöckner (NYU): OpenCL & PyOpenCL• John Owens (UC Davis): fundamental algorithms/data structures and irregular parallelism• Nathan Bell (NVIDIA): Thrust• Duane Merrill* (Virginia Tech): Ninja Tricks• Mike Bauer* (Stanford): Sequoia• Greg Diamos (Georgia Tech): Ocelot• Other lecturers* from Google, Yahoo, Sun, Intel, NCSA, AMD, Cloudera, etc.
131. Labs• Led by TF(s)• Work on an interesting small problem• From skeleton code to solution• Hands-on
132. 53 Church St.
133. 53 Church St.
134. 53 Church St.
135. 53 Church St., Room 104 | Thu, Fri 7:35-9:35 pm
139. What do you need to know?• Programming (ideally in C / C++) • See HW 0• Basics of computer systems • CS 61 or similar
140. Homeworks• Programming assignments• “Issue Spotter” (code debug & review, Q&A)• Contribution to the community (OSS, Wikipedia, Stackoverﬂow, etc.)• Due: Fridays at 11 pm EST • Hard deadline - 2 “bonus” days
141. Office Hours• Led by a TF• 104 @ 53 Church St (check website and news feed)
142. Participation• HW0 (this week)• Mandatory attendance for guest lectures• forum.cs264.org • Answer questions, help others • Post relevant links and discussions (!)
143. Final Project• Implement a substantial project• Pick from a list of suggested projects or design your own• Milestones along the way (idea, proposal, etc.)• In-class final presentations• $500+ prize for the best project
144. Grading• On a 0-100 scale • Participation: 10% • Homework: 50% • Final project: 40%
145. www.cs264.org• Detailed schedule (soon)• News blog w/ RSS feed• Video feeds• Forum (forum.cs264.org)• Academic honesty policy• HW0 (due Fri 2/4)
146. Thank you!
147. one more thing from WikiLeaks?
148. Is this course for me ???
149. This course is not for you... • If you’re not genuinely interested in the topic • If you can’t cope with uncertainty, unpredictability, poor documentation, and immature software • If you’re not ready to do a lot of programming • If you’re not open to thinking about computing in new ways • If you can’t put in the time Slide after Jimmy Lin, iSchool, Maryland
150. Otherwise...It will be a richly rewarding experience!
152. Be Patient, Be Flexible, Be Constructive http://davidzinger.wordpress.com/2007/05/page/2/
153. It would be a win-win-win situation!(The Ofﬁce Season 2, Episode 27: Conﬂict Resolution)
154. Hypergrowth ?
155. Acknowledgements• Hanspeter Pﬁster & Henry Leitner, DCE• TFs• Rob Parrott & IT Team, SEAS• Gabe Russell & Video Team, DCE• NVIDIA, esp. David Luebke• Amazon
156. COME
157. Next?• Fill out the survey: http://bit.ly/enrb1r• Get ready for HW0 (Lab 1 & 2)• Subscribe to http://forum.cs264.org• Subscribe to RSS feed: http://bit.ly/eFIsqR