[Harvard CS264] 01 - Introduction
http://cs264.org

[Harvard CS264] 01 - Introduction Presentation Transcript

  • 1. Massively Parallel Computing, CS 264 / CSCI E-292 | Lecture #1: Introduction | January 25th, 2011 | Nicolas Pinto (MIT, Harvard), pinto@mit.edu
  • 2. ...
  • 3. Distant Students
  • 4. Take a picture with...
  • 5. a friend I like
  • 6. his dog I like
  • 7. cool hardware
  • 8. your mom
  • 9. Send it to:pinto@mit.edu
  • 10. Today
  • 11. Outline
  • 12. Outline
  • 13. Massively Parallel Computing (word cloud: Supercomputing, Many-core, MPC, High-Throughput Computing, Cloud Computing, Human "Computing"?)
  • 14. Massively Parallel Computing (word cloud: Supercomputing, Many-core, MPC, High-Throughput Computing, Cloud Computing, Human "Computing"?)
  • 15. http://www.youtube.com/watch?v=jj0WsQYtT7M
  • 16. Modeling & Simulation• Physics, astronomy, molecular dynamics, finance, etc.• Data and processing intensive• Requires high-performance computing (HPC)• Driving HPC architecture development
  • 17. CS 264 Top Dog (2008-2009): Roadrunner, LANL • #1 on top500.org in 2008 (now #7) • 1.105 petaflop/s • 3,000 nodes with dual-core AMD Opteron processors • each node connected via PCIe to two IBM Cell processors • nodes connected via InfiniBand 4x DDR
  • 18. http://www.top500.org/lists/2010/11
  • 19. Tianhe-1A at NSC Tianjin: 2.507 petaflop/s, 7,168 Tesla M2050 GPUs. 1 petaflop/s = ~1M high-end laptops = ~world population with hand calculators 24/7/365 for ~16 years. Slide courtesy of Bill Dally (NVIDIA)
  • 20. http://news.cnet.com/8301-13924_3-20021122-64.html
  • 21. What $100+ million can buy you...Roadrunner (#7) Jaguar (#2)
  • 22. Roadrunner (#7) http://www.lanl.gov/roadrunner/
  • 23. Jaguar (#2)
  • 24. Who uses HPC?
  • 25. Who uses HPC?
  • 26. Massively Parallel Computing (word cloud: Supercomputing, Many-core, MPC, High-Throughput Computing, Cloud Computing, Human "Computing"?)
  • 27. Cloud Computing?
  • 28. Buzzword ?
  • 29. Careless Computing?
  • 30. Response from the legend:...
  • 31. http://techcrunch.com/2010/12/14/stallman-cloud-computing-careless-computing/
  • 32. Cloud Utility Computing? for CS264
  • 33. http://code.google.com/appengine/
  • 34. http://aws.amazon.com/ec2/
  • 35. http://www.nilkanth.com/my-uploads/2008/04/comparingpaas.png
  • 36. Web Data Explosion
  • 37. How much Data? • Google processes 24 PB / day, 8 EB / year ('10) • Wayback Machine has 3 PB, 100 TB/month ('09) • Facebook user data: 2.5 PB, 15 TB/day ('09) • Facebook photos: 15 B, 3 TB/day ('09), 90 B (now) • eBay user data: 6.5 PB, 50 TB/day ('09) • "all words ever spoken by human beings" ~ 42 ZB. Adapted from http://www.umiacs.umd.edu/~jimmylin/cloud-2010-Spring/
  • 38. "640k ought to be enough for anybody." - Bill Gates (1981)... just a rumor
  • 39. Disk Throughput• Average Google job size: 180 GB• 1 SATA HDD = 75 MB / sec• Time to read 180 GB off disk: 45 mins• Solution: parallel reads• 1000 HDDs = 75 GB / sec• Google’s solutions: BigTable, MapReduce, etc.
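A quick back-of-the-envelope check of the numbers on that slide (the rounding is the slide's own):

    one disk:     180 GB / 75 MB/s = 180,000 MB / 75 MB/s ≈ 2,400 s ≈ 40-45 min
    1,000 disks:  1,000 x 75 MB/s = 75 GB/s, so 180 GB / 75 GB/s ≈ 2.4 s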
  • 40. Cloud Computing• Clear trend: centralization of computing resources in large data centers• Q: What do Oregon, Iceland, and abandoned mines have in common?• A: Fiber, juice, and space• Utility computing!
  • 41. Massively Parallel Computing pu ting om Supe r eC rcom putin any -co g M MPC H igh-T uting hrou ghpu p t Co om Hu mput dC Clou ma ing n? “C om pu tin g”
  • 42. Instrument Data Explosion: Sloan Digital Sky Survey, ATLUM / Connectome Project
  • 43. Another example? hint: Switzerland
  • 44. CERN in 2005....
  • 45. CERN Summer School 2005
  • 46. CERN Summer School 2005: bad taste party...
  • 47. CERN Summer School 2005: pitchers...
  • 48. LHC
  • 49. LHC Maximilien Brice, © CERN
  • 50. LHC Maximilien Brice, © CERN
  • 51. CERN's Cluster: ~5000 nodes ('05)
  • 52. CERN Summer School 2005: presentations...
  • 53. Diesel Powered HPC / Life Support… Murchison Widefield Array (slide courtesy of Hanspeter Pfister)
  • 54. How much Data? • NOAA has ~1 PB climate data ('07) • MWA radio telescope: 8 GB/sec of data • Connectome: 1 PB / mm³ of brain tissue (1 EB for 1 cm³) • CERN's LHC will generate 15 PB a year ('08)
  • 55. High Flops / Watt
  • 56. Massively Parallel Computing (word cloud: Supercomputing, Many-core, MPC, High-Throughput Computing, Cloud Computing, Human "Computing"?)
  • 57. Computer Games • PC gaming business: • $15B / year market (2010) • $22B / year in 2015? • WoW: $1B / year • NVIDIA shipped 1B GPUs since 1993: • 10 years to ship 200M GPUs (1993-2003) • 1/3 of all PCs have more than one GPU • High-end GPUs sell for around $300 • Now used for science applications
  • 58. CryEngine 2, CRYTEK
  • 59. Many-Core Processors: Intel Core i7-980X Extreme (6 cores, 1.17B transistors) vs. NVIDIA GTX 580 (512 cores, 3B transistors). http://en.wikipedia.org/wiki/Transistor_count
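On the GPU side, the CUDA runtime can report this layout directly. Below is a minimal device-query sketch (it assumes a CUDA-capable device 0 and the CUDA toolkit; it is an illustration, not course code):

    // devquery.cu -- print the multiprocessor layout of CUDA device 0 (sketch)
    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
            std::printf("no CUDA device found\n");
            return 1;
        }
        // multiProcessorCount reports SMs; the marketing "core" count is
        // SMs x CUDA cores per SM (32 per SM on Fermi parts such as the GTX 580).
        std::printf("%s: %d SMs, compute capability %d.%d, %.1f GB global memory\n",
                    prop.name, prop.multiProcessorCount, prop.major, prop.minor,
                    prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
        return 0;
    }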
  • 60. Data Throughput chart: GPUs exploit massive data parallelism on huge data; CPUs exploit instruction-level parallelism on data that fits in cache. David Kirk, NVIDIA
  • 61. 3 of the Top 5 Supercomputers [bar chart] Bill Dally, NVIDIA
  • 62. Personal Supercomputers: ~4 teraflops @ 1500 watts
  • 63. Disruptive Technologies• Utility computing • Commodity off-the-shelf (COTS) hardware • Compute servers with 100s-1000s of processors• High-throughput computing • Mass-market hardware • Many-core processors with 100s-1000s of cores • High compute density / high flops/W
  • 64. Green HPC: NVIDIA/NCSA Green 500 Entry
  • 65. Green HPC: NVIDIA/NCSA Green 500 Entry • 128 nodes, each with: 1x Core i3 530 (2 cores, 2.93 GHz => 23.4 GFLOPS peak), 1x Tesla C2050 (14 cores, 1.15 GHz => 515.2 GFLOPS peak), 4x QDR InfiniBand, 4 GB DRAM • Theoretical peak perf: 68.95 TF • Footprint: ~20 ft² => 3.45 TF/ft² • Cost: $500K (street price) => 137.9 MF/$ • Linpack: 33.62 TF, 36.0 kW => 934 MF/W
  • 66. One more thing...
  • 67. Massively Parallel Computing (word cloud: Supercomputing, Many-core, MPC, High-Throughput Computing, Cloud Computing, Human "Computing"?)
  • 68. Massively Parallel Computing (word cloud: Supercomputing, Many-core, MPC, High-Throughput Computing, Cloud Computing, Human "Computing"?)
  • 69. Massively Parallel Human Computing ???• “Crowdsourcing”• Amazon Mechanical Turk (artificial artificial intelligence)• Wikipedia• Stackoverflow• etc.
  • 70. What is this course about?
  • 71. What is this course about? Massively parallel processors: • GPU computing with CUDA. Cloud computing: • Amazon's EC2 as an example of utility computing • MapReduce, the "back-end" of cloud computing
  • 72. Less like Rodin...
  • 73. More like Bob...
  • 74. Outline
  • 75. wikipedia.org
  • 76. Anant Agarwal, MIT
  • 77. Power Cost • Power ∝ Voltage² × Frequency • Frequency ∝ Voltage • Power ∝ Frequency³ (Jack Dongarra)
  • 78. Power Cost (Jack Dongarra):
                  Cores   Freq   Perf   Power   Perf/W
    CPU             1      1      1      1       1
    "New" CPU       1      1.5    1.5    3.3     0.45x
    Multicore       2      0.75   1.5    0.8     1.88x
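A quick sanity check of the Multicore row, assuming the table applies the Power ∝ Frequency³ rule from the previous slide to each core (this arithmetic is not on the slide itself):

    Perf   = 2 cores x 0.75 freq         = 1.5x
    Power  = 2 cores x 0.75³ ≈ 2 x 0.42  ≈ 0.84x   (shown as 0.8 in the table)
    Perf/W = 1.5 / 0.8                   ≈ 1.88x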
  • 79. Problem with Buses Anant Agarwal, MIT
  • 80. Problem with Memory http://www.OpenSparc.net/
  • 81. Problem with Disks 64 MB / sec Tom’s Hardware
  • 82. Good News• Moore’s Law marches on• Chip real-estate is essentially free• Many-core architectures are commodities• Space for new innovations
  • 83. Bad News• Power limits improvements in clock speed• Parallelism is the only route to improve performance• Computation / communication ratio will get worse• More frequent hardware failures?
  • 84. Bad News
  • 85. A “Simple” Matter of Software• We have to use all the cores efficiently• Careful data and memory management• Must rethink software design• Must rethink algorithms• Must learn new skills!• Must learn new strategies!• Must learn new tools...
  • 86. Our mantra: always use the right tool !
  • 87. Outline
  • 88. Instructor: Nicolas Pinto The Rowland Institute at Harvard HARVARD UNIVERSITY
  • 89. ~50% of the cortex is for vision!
  • 90. Everyone knows that...
  • 91. The Approach: Reverse and Forward Engineering the Brain
  • 92. The Approach: Reverse and Forward Engineering the Brain • REVERSE: study the natural system • FORWARD: build an artificial system
  • 93. brain = 20 petaflops!
  • 94. http://vimeo.com/7945275
  • 95. "If you want to have good ideas you must have many ideas." "Most of them will be wrong, and what you have to learn is which ones to throw away." Linus Pauling (double Nobel Prize winner)
  • 96. High-throughput Screening
  • 97. The curse of speed... and the blessing of massively parallel computing: thousands of big models, large amounts of unsupervised learning experience
  • 98. The curse of speed... and the blessing of massively parallel computing: No off-the-shelf solution? DIY! Engineering (hardware/sysadmin/software) + science. Leverage non-scientific high-tech markets and their $billions of R&D... Gaming: graphics cards (GPUs), PlayStation 3; Web 2.0: cloud computing (Amazon, Google)
  • 99. Build your own!
  • 100. The blessing of GPUs DIY GPU pr0n (since 2006) Sony Playstation 3s (since 2007)
  • 101. Speed (in billion floating-point operations per second):
    Q9450 (Matlab/C) [2008]: 0.3
    Q9450 (C/SSE) [2008]: 9.0
    7900GTX (OpenGL/Cg) [2006]: 68.2
    PS3/Cell (C/ASM) [2007]: 111.4
    8800GTX (CUDA 1.x) [2007]: 192.7
    GTX280 (CUDA 2.x) [2008]: 339.3
    GTX480 (CUDA 3.x, Fermi) [2010]: 974.3
    >1000X speedup is game changing...
    Pinto, Doukhan, DiCarlo, Cox, PLoS 2009; Pinto, Cox, GPU Comp. Gems 2011
  • 102. Tired Of Waiting For Your Computations? Supercomputing on your desktop: programming the next generation of cheap and massively parallel hardware using CUDA. This IAP has been designed to give students extensive hands-on experience in using a potentially disruptive new technology. This technology enables the masses having access to supercomputing capabilities. We will introduce students to the CUDA programming language developed by NVIDIA Corp., which has been an essential step towards simplifying and unifying the programming of massively parallel chips. This IAP is supported by generous contributions from NVIDIA Corp., The Rowland Institute at Harvard, and MIT (OEIT, BCS, EECS) and will be featuring talks given by experts from various fields. 6.963 (IAP 09)
  • 103. Co-Instructor: Hanspeter Pfister
  • 104. Visual Computing• Large image & video collections• Physically-based modeling• Face modeling and recognition• Visualization
  • 105. VolumePro 500 Released 1999
  • 106. GPGPU
  • 107. Connectome
  • 108. NSF CDI Grant ’08-’11
  • 109. NVIDIA CUDA Center of Excellence
  • 110. TFs • Claudio Andreoni (MIT Course 18) • Dwight Bell (Harvard DCE) • Krunal Patel (Accelereyes) • Jud Porter (Harvard SEAS) • Justin Riley (MIT OEIT) • Mike Roberts (Harvard SEAS)
  • 111. Claudio Andreoni (MIT Course 18)
  • 112. Dwight Bell (Harvard DCE)
  • 113. Krunal Patel (Accelereyes)
  • 114. Jud Porter (Harvard SEAS)
  • 115. Justin Riley (MIT OEIT)
  • 116. Mike Roberts (Harvard SEAS)
  • 117. About You
  • 118. About you...• Undergraduate ? Graduate ?• Programming ? >5 years ? <2 years ?• CUDA ? MPI ? MapReduce ?• CS ? Life Sc ? Applied Sc ? Engineering ? Math ? Physics ?• Humanities ? Social Sc ? Economy ?
  • 119. Outline
  • 120. CS 264 Goals• Have fun!• Learn basic principles of parallel computing• Learn programming with CUDA• Learn to program a cluster of GPUs (e.g. MPI)• Learn basics of EC2 and MapReduce• Learn new learning strategies, tools, etc.• Implement a final project
  • 121. Experimental Learning Strategy: repeat, repeat, repeat... Memory "recall"
  • 122. Lectures • Theory, Architecture, Patterns? • Act I: GPU Computing • Act II: Cloud Computing • Act III: Guest Lectures
  • 123. Lectures “Format”• 2x ~ 45min regular “lectures”• ~ 15min “Clinic” • we’ll be here to fix your problems• ~ 5 min: Life and Code “Hacking”: • GTD Zen • Presentation Zen • Ninja Programming Tricks & Tools, etc. • Interested? email staff+spotlight@cs264.org
  • 124. Act I: GPU Computing• Introduction to GPU Computing• CUDA Basics• CUDA Advanced• CUDA Ninja Tricks !
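As a taste of the "CUDA Basics" part of Act I, here is a minimal vector-add sketch in CUDA C. It is illustrative only: the kernel name, problem size, and launch configuration are made up for this example and are not taken from the course materials.

    // vadd.cu -- minimal CUDA C vector addition (illustrative sketch)
    #include <cstdio>
    #include <cstdlib>
    #include <cuda_runtime.h>

    __global__ void vadd(const float *a, const float *b, float *c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;   // global thread index
        if (i < n) c[i] = a[i] + b[i];                   // one element per thread
    }

    int main() {
        const int n = 1 << 20;
        const size_t bytes = n * sizeof(float);

        // host buffers
        float *ha = (float *)malloc(bytes);
        float *hb = (float *)malloc(bytes);
        float *hc = (float *)malloc(bytes);
        for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

        // device buffers and host-to-device copies
        float *da, *db, *dc;
        cudaMalloc((void **)&da, bytes);
        cudaMalloc((void **)&db, bytes);
        cudaMalloc((void **)&dc, bytes);
        cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

        // launch enough 256-thread blocks to cover all n elements
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;
        vadd<<<blocks, threads>>>(da, db, dc, n);
        cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);  // copy back (synchronizes)

        std::printf("c[0] = %.1f (expect 3.0)\n", hc[0]);
        cudaFree(da); cudaFree(db); cudaFree(dc);
        free(ha); free(hb); free(hc);
        return 0;
    }

Compiled with nvcc (e.g. nvcc vadd.cu -o vadd), the same one-thread-per-element pattern is what lets a kernel spread work across the hundreds of cores discussed earlier.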
  • 125. 3D Filterbank Convolution: Performance / Effort
                Performance (gflops)   Development Time (hours)
    Matlab              0.3                    0.5
    C/SSE               9.0                   10.0
    PS3               111.4                   30.0
    GT200             339.3                   10.0
  • 126. Empirical results... Performance (gflops):
    Q9450 (Matlab/C) [2008]: 0.3
    Q9450 (C/SSE) [2008]: 9.0
    7900GTX (Cg) [2006]: 68.2
    PS3/Cell (C/ASM) [2007]: 111.4
    8800GTX (CUDA 1.x) [2007]: 192.7
    GTX280 (CUDA 2.x) [2008]: 339.3
    GTX480 (CUDA 3.x) [2010]: 974.3
    >1000X speedup is game changing...
  • 127. Act II: Cloud Computing• Introduction to utility computing• EC2 & starcluster (Justin Riley, MIT OEIT)• Hadoop (Zak Stone, SEAS)• MapReduce with GPU Jobs on EC2
  • 128. Amazon’s Web Services• Elastic Compute Cloud (EC2) • Rent computing resources by the hour • Basic unit of accounting = instance-hour • Additional costs for bandwidth• You’ll be getting free AWS credits for course assignments
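For example (the numbers here are made up to illustrate the accounting unit, not actual AWS pricing): an 8-instance cluster kept running for 5 hours bills as 8 x 5 = 40 instance-hours, plus any bandwidth used, whether it runs one large job or many small ones.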
  • 129. MapReduce• Functional programming meets distributed processing• Processing of lists with <key, value> pairs• Batch data processing infrastructure• Move the computation where the data is
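To make the <key, value> pattern concrete, here is a tiny single-process word-count sketch in plain C++ (the function names and the in-memory "shuffle" are this example's own; a real Hadoop job distributes the same map and reduce steps across many machines):

    // wordcount_sketch.cpp -- the MapReduce pattern in a single process (illustrative)
    #include <cstdio>
    #include <map>
    #include <sstream>
    #include <string>
    #include <utility>
    #include <vector>

    // map: one input record -> list of <word, 1> pairs
    std::vector<std::pair<std::string, int>> map_fn(const std::string &line) {
        std::vector<std::pair<std::string, int>> out;
        std::istringstream in(line);
        std::string word;
        while (in >> word) out.push_back({word, 1});
        return out;
    }

    // reduce: <word, list of counts> -> total count for that word
    int reduce_fn(const std::string &, const std::vector<int> &counts) {
        int total = 0;
        for (int c : counts) total += c;
        return total;
    }

    int main() {
        std::vector<std::string> records = {"the quick brown fox", "the lazy dog", "the fox"};

        // "shuffle": group intermediate values by key
        std::map<std::string, std::vector<int>> groups;
        for (const auto &rec : records)
            for (const auto &kv : map_fn(rec))
                groups[kv.first].push_back(kv.second);

        for (const auto &g : groups)
            std::printf("%s\t%d\n", g.first.c_str(), reduce_fn(g.first, g.second));
        return 0;
    }

Running it prints each word with its total count (e.g. "the 3"); the point is only the shape of the computation: map emits <key, value> pairs, the framework groups them by key, and reduce folds each group.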
  • 130. Act III: Guest Lectures • Andreas Klöckner (NYU): OpenCL & PyOpenCL • John Owens (UC Davis): fundamental algorithms/data structures and irregular parallelism • Nathan Bell (NVIDIA): Thrust • Duane Merrill* (Virginia Tech): Ninja Tricks • Mike Bauer* (Stanford): Sequoia • Greg Diamos (Georgia Tech): Ocelot • Other lecturers* from Google, Yahoo, Sun, Intel, NCSA, AMD, Cloudera, etc.
  • 131. Labs • Led by TF(s) • Work on an interesting small problem • From skeleton code to solution • Hands-on
  • 132. 53 Church St.
  • 133. 53 Church St.
  • 134. 53 Church St.
  • 135. 53 Church St., Room 104 • Thu, Fri 7:35-9:35 pm
  • 136. 53 Church St., Room 105
  • 137. NVIDIA Quadro FX 4800 • MacPro • NVIDIA Quadro FX 4800, 1.5 GB
  • 138. Resonance @ SEAS • Quad-core Intel Xeon host, 3 GHz, 8 GB • 8 Tesla S1070s (32 GPUs, 4 GB each) • 16 quad-core Intel Xeons, 2 GHz, 16 GB • http://community.crimsongrid.harvard.edu/getting-started/resources/resonance-cuda-host
  • 139. What do you need to know?• Programming (ideally in C / C++) • See HW 0• Basics of computer systems • CS 61 or similar
  • 140. Homeworks• Programming assignments• “Issue Spotter” (code debug & review, Q&A)• Contribution to the community (OSS, Wikipedia, Stackoverflow, etc.)• Due: Fridays at 11 pm EST • Hard deadline - 2 “bonus” days
  • 141. Office Hours • Led by a TF • Room 104 @ 53 Church St (check website and news feed)
  • 142. Participation• HW0 (this week)• Mandatory attendance for guest lectures• forum.cs264.org • Answer questions, help others • Post relevant links and discussions (!)
  • 143. Final Project • Implement a substantial project • Pick from a list of suggested projects or design your own • Milestones along the way (idea, proposal, etc.) • In-class final presentations • $500+ prize for the best project
  • 144. Grading• On a 0-100 scale • Participation: 10% • Homework: 50% • Final project: 40%
  • 145. www.cs264.org• Detailed schedule (soon)• News blog w/ RSS feed• Video feeds• Forum (forum.cs264.org)• Academic honesty policy• HW0 (due Fri 2/4)
  • 146. Thank you!
  • 147. one more thing from WikiLeaks?
  • 148. Is this course for me ???
  • 149. This course is not for you... • If you're not genuinely interested in the topic • If you can't cope with uncertainty, unpredictability, poor documentation, and immature software • If you're not ready to do a lot of programming • If you're not open to thinking about computing in new ways • If you can't put in the time. Slide after Jimmy Lin, iSchool, Maryland
  • 150. Otherwise... It will be a richly rewarding experience!
  • 151. Guaranteed?!
  • 152. Be Patient. Be Flexible. Be Constructive. http://davidzinger.wordpress.com/2007/05/page/2/
  • 153. It would be a win-win-win situation!(The Office Season 2, Episode 27: Conflict Resolution)
  • 154. Hypergrowth ?
  • 155. Acknowledgements• Hanspeter Pfister & Henry Leitner, DCE• TFs• Rob Parrott & IT Team, SEAS• Gabe Russell & Video Team, DCE• NVIDIA, esp. David Luebke• Amazon
  • 156. COME
  • 157. Next?• Fill out the survey: http://bit.ly/enrb1r• Get ready for HW0 (Lab 1 & 2)• Subscribe to http://forum.cs264.org• Subscribe to RSS feed: http://bit.ly/eFIsqR