[Harvard CS264] 01 - Introduction

http://cs264.org

  1. 1. Massively Parallel Computing CS 264 / CSCI E-292 | Lecture #1: Introduction | January 25th, 2011 | Nicolas Pinto (MIT, Harvard) pinto@mit.edu
  2. 2. ...
  3. 3. Distant Students
  4. 4. Take a picture with...
  5. 5. a friend I like
  6. 6. his dog I like
  7. 7. cool hardware
  8. 8. your mom
  9. 9. Send it to: pinto@mit.edu
  10. 10. Today
  11. 11. Outline
  12. 12. Outline
  13. 13. Massively Parallel Computing (MPC): Supercomputing, Many-core Computing, High-Throughput Computing, Cloud Computing, Human “Computing”?
  14. 14. Massively Parallel Computing (MPC): Supercomputing, Many-core Computing, High-Throughput Computing, Cloud Computing, Human “Computing”?
  15. 15. http://www.youtube.com/watch?v=jj0WsQYtT7M
  16. 16. Modeling & Simulation• Physics, astronomy, molecular dynamics, finance, etc.• Data and processing intensive• Requires high-performance computing (HPC)• Driving HPC architecture development
  17. 17. (2009) CS 264 Top Dog (2008) • Roadrunner, LANL • #1 on top500.org in 2008 (now #7) • 1.105 petaflop/s • 3000 nodes with dual-core AMD Opteron processors • Each node connected via PCIe to two IBM Cell processors • Nodes are connected via InfiniBand 4x DDR
  18. 18. http://www.top500.org/lists/2010/11
  19. 19. Tianhe-1A at NSC Tianjin: 2.507 petaflop/s, 7168 Tesla M2050 GPUs. 1 petaflop/s = ~1M high-end laptops = ~world population with hand calculators 24/7/365 for ~16 years. Slide courtesy of Bill Dally (NVIDIA)
  20. 20. http://news.cnet.com/8301-13924_3-20021122-64.html
  21. 21. What $100+ million can buy you... Roadrunner (#7), Jaguar (#2)
  22. 22. Roadrunner (#7) http://www.lanl.gov/roadrunner/
  23. 23. Jaguar (#2)
  24. 24. Who uses HPC?
  25. 25. Who uses HPC?
  26. 26. Massively Parallel Computing (MPC): Supercomputing, Many-core Computing, High-Throughput Computing, Cloud Computing, Human “Computing”?
  27. 27. Cloud Computing?
  28. 28. Buzzword ?
  29. 29. Careless Computing?
  30. 30. Response from the legend:...
  31. 31. http://techcrunch.com/2010/12/14/stallman-cloud-computing-careless-computing/
  32. 32. Cloud Utility Computing? for CS264
  33. 33. http://code.google.com/appengine/
  34. 34. http://aws.amazon.com/ec2/
  35. 35. http://www.nilkanth.com/my-uploads/2008/04/comparingpaas.png
  36. 36. Web Data Explosion
  37. 37. How much Data? • Google processes 24 PB / day, 8 EB / year (’10) • Wayback Machine has 3 PB, 100 TB/month (’09) • Facebook user data: 2.5 PB, 15 TB/day (’09) • Facebook photos: 15 B, 3 TB/day (’09) - 90 B (now) • eBay user data: 6.5 PB, 50 TB/day (’09) • “all words ever spoken by human beings” ~ 42 ZB. Adapted from http://www.umiacs.umd.edu/~jimmylin/cloud-2010-Spring/
  38. 38. “640k ought to be enough for anybody.” - Bill Gates (1981), just a rumor
  39. 39. Disk Throughput• Average Google job size: 180 GB• 1 SATA HDD = 75 MB / sec• Time to read 180 GB off disk: 45 mins• Solution: parallel reads• 1000 HDDs = 75 GB / sec• Google’s solutions: BigTable, MapReduce, etc.
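A quick back-of-the-envelope check of the slide's read-time figure (the exact result depends on whether “GB” is read as 10^9 or 2^30 bytes; the slide's ~45 min suggests the binary reading):

```latex
\frac{180 \times 2^{30}\ \text{bytes}}{75 \times 10^{6}\ \text{bytes/s}} \approx 2577\ \text{s} \approx 43\ \text{min},
\qquad
1000\ \text{HDDs} \times 75\ \text{MB/s} = 75\ \text{GB/s}
\;\Rightarrow\;
\frac{180\ \text{GB}}{75\ \text{GB/s}} \approx 2.4\ \text{s}
```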
  40. 40. Cloud Computing• Clear trend: centralization of computing resources in large data centers• Q: What do Oregon, Iceland, and abandoned mines have in common?• A: Fiber, juice, and space• Utility computing!
  41. 41. Massively Parallel Computing (MPC): Supercomputing, Many-core Computing, High-Throughput Computing, Cloud Computing, Human “Computing”?
  42. 42. Instrument Data Explosion: Sloan Digital Sky Survey, ATLUM / Connectome Project
  43. 43. Another example? hint: Switzerland
  44. 44. CERN in 2005....
  45. 45. CERN Summer School 2005
  46. 46. CERN Summer School 2005: bad taste party...
  47. 47. CERN Summer School 2005: pitchers...
  48. 48. LHC
  49. 49. LHC Maximilien Brice, © CERN
  50. 50. LHC Maximilien Brice, © CERN
  51. 51. CERN’s Cluster: ~5000 nodes (‘05)
  52. 52. CERN Summer School 2005: presentations...
  53. 53. Diesel-Powered HPC Life Support... Murchison Widefield Array. Slide courtesy of Hanspeter Pfister
  54. 54. How much Data?• NOAA has ~1 PB climate data (‘07)• MWA radio telescope: 8 GB/sec of data• Connectome: 1 PB / mm3 of brain tissue (1 EB for 1 cm3)• CERN’s LHC will generate 15 PB a year (‘08)
  55. 55. High Flops / Watt
  56. 56. Massively Parallel Computing (MPC): Supercomputing, Many-core Computing, High-Throughput Computing, Cloud Computing, Human “Computing”?
  57. 57. Computer Games • PC gaming business: $15B / year market (2010), $22B / year in 2015? WOW: $1B / year • NVIDIA shipped 1B GPUs since 1993 (10 years to ship 200M GPUs, 1993-2003) • 1/3 of all PCs have more than one GPU • High-end GPUs sell for around $300 • Now used for science applications
  58. 58. CryEngine 2, CRYTEK
  59. 59. Many-Core Processors: Intel Core i7-980X Extreme (6 cores, 1.17B transistors) vs. NVIDIA GTX 580 (512 cores, 3B transistors) http://en.wikipedia.org/wiki/Transistor_count
  60. 60. Data throughput vs. parallelism (chart): GPU = massive data parallelism, huge data; CPU = instruction-level parallelism, data fits in cache. David Kirk, NVIDIA
  61. 61. 3 of Top 5 Supercomputers (chart) Bill Dally, NVIDIA
  62. 62. Personal Supercomputers: ~4 teraflops @ 1500 watts
  63. 63. Disruptive Technologies• Utility computing • Commodity off-the-shelf (COTS) hardware • Compute servers with 100s-1000s of processors• High-throughput computing • Mass-market hardware • Many-core processors with 100s-1000s of cores • High compute density / high flops/W
  64. 64. Green HPC: NVIDIA/NCSA Green 500 Entry
  65. 65. Green HPC: NVIDIA/NCSA Green 500 Entry. 128 nodes, each with: 1x Core i3 530 (2 cores, 2.93 GHz => 23.4 GFLOP peak), 1x Tesla C2050 (14 cores, 1.15 GHz => 515.2 GFLOP peak), 4x QDR InfiniBand, 4 GB DRAM. Theoretical peak perf: 68.95 TF. Footprint: ~20 ft^2 => 3.45 TF/ft^2. Cost: $500K (street price) => 137.9 MF/$. Linpack: 33.62 TF, 36.0 kW => 934 MF/W
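The headline figures on this slide follow directly from the per-node numbers; a quick sanity check:

```latex
128 \times (23.4 + 515.2)\ \text{GFLOP/s} \approx 68.9\ \text{TF (peak)},\qquad
\frac{68.95\ \text{TF}}{\sim 20\ \text{ft}^2} \approx 3.45\ \text{TF/ft}^2,\qquad
\frac{68.95\ \text{TF}}{\$500\text{K}} \approx 137.9\ \text{MFLOP/s per \$},\qquad
\frac{33.62\ \text{TF}}{36.0\ \text{kW}} \approx 934\ \text{MFLOP/s per W}
```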
  66. 66. One more thing...
  67. 67. Massively Parallel Computing (MPC): Supercomputing, Many-core Computing, High-Throughput Computing, Cloud Computing, Human “Computing”?
  68. 68. Massively Parallel Computing (MPC): Supercomputing, Many-core Computing, High-Throughput Computing, Cloud Computing, Human “Computing”?
  69. 69. Massively Parallel Human Computing ???• “Crowdsourcing”• Amazon Mechanical Turk (artificial artificial intelligence)• Wikipedia• Stackoverflow• etc.
  70. 70. What is this course about?
  71. 71. What is this course about? Massively parallel processors • GPU computing with CUDA Cloud computing • Amazon’s EC2 as an example of utility computing • MapReduce, the “back-end” of cloud computing
  72. 72. Less like Rodin...
  73. 73. More like Bob...
  74. 74. Outline
  75. 75. wikipedia.org
  76. 76. Anant Agarwal, MIT
  77. 77. Power Cost • Power ∝ Voltage² × Frequency • Frequency ∝ Voltage • Power ∝ Frequency³ (Jack Dongarra)
  78. 78. Power Cost (Jack Dongarra)
                    Cores   Freq   Perf   Power   Perf/W
      CPU               1   1.00   1.0    1.0     1x
      “New” CPU         1   1.50   1.5    3.3     0.45x
      Multicore         2   0.75   1.5    0.8     1.88x
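The table rows follow from the scaling on the previous slide (Power ∝ Frequency³, performance ∝ cores × frequency); for example, for the last two rows:

```latex
\text{“New” CPU: } \text{Power} = 1.5^{3} \approx 3.3,\quad \text{Perf/W} = \tfrac{1.5}{3.3} \approx 0.45\times
\qquad
\text{Multicore: } \text{Power} = 2 \times 0.75^{3} \approx 0.8,\quad \text{Perf/W} = \tfrac{1.5}{0.8} \approx 1.88\times
```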
  79. 79. Problem with Buses Anant Agarwal, MIT
  80. 80. Problem with Memory http://www.OpenSparc.net/
  81. 81. Problem with Disks 64 MB / sec Tom’s Hardware
  82. 82. Good News• Moore’s Law marches on• Chip real-estate is essentially free• Many-core architectures are commodities• Space for new innovations
  83. 83. Bad News• Power limits improvements in clock speed• Parallelism is the only route to improve performance• Computation / communication ratio will get worse• More frequent hardware failures?
  84. 84. Bad News
  85. 85. A “Simple” Matter of Software• We have to use all the cores efficiently• Careful data and memory management• Must rethink software design• Must rethink algorithms• Must learn new skills!• Must learn new strategies!• Must learn new tools...
  86. 86. Our mantra: always use the right tool !
  87. 87. Outline
  88. 88. Instructor: Nicolas Pinto The Rowland Institute at Harvard HARVARD UNIVERSITY
  89. 89. ~50% of the brain is for vision!
  90. 90. Everyone knows that...
  91. 91. The ApproachReverse and Forward Engineering the Brain
  92. 92. The ApproachReverse and Forward Engineering the Brain REVERSE FORWARD Study Build Natural System Artificial System
  93. 93. brain = 20 petaflops!
  94. 94. http://vimeo.com/7945275
  95. 95. “ If you want to have good ideas you must have many ideas. ” “ Most of them will be wrong, and what you have to learn is which ones to throw away. ” Linus Pauling (double Nobel Prize Winner)
  96. 96. High-throughput Screening
  97. 97. The curse of speed... and the blessing of massively parallel computing: thousands of big models, large amounts of unsupervised learning experience
  98. 98. The curse of speed... and the blessing of massively parallel computing. No off-the-shelf solution? DIY! Engineering (Hardware/SysAdmin/Software), Science. Leverage non-scientific high-tech markets and their $billions of R&D... Gaming: Graphics Cards (GPUs), PlayStation 3; Web 2.0: Cloud Computing (Amazon, Google)
  99. 99. Build your own!
  100. 100. The blessing of GPUs DIY GPU pr0n (since 2006) Sony Playstation 3s (since 2007)
  101. 101. Speed (in billion floating point operations per second): Q9450 (Matlab/C) [2008]: 0.3; Q9450 (C/SSE) [2008]: 9.0; 7900GTX (OpenGL/Cg) [2006]: 68.2; PS3/Cell (C/ASM) [2007]: 111.4; 8800GTX (CUDA 1.x) [2007]: 192.7; GTX280 (CUDA 2.x) [2008]: 339.3; GTX480 “Fermi” (CUDA 3.x) [2010]: 974.3. >1000X speedup is game changing... Pinto, Doukhan, DiCarlo, Cox PLoS 2009; Pinto, Cox GPU Comp. Gems 2011
  102. 102. Tired of waiting for your computations? Supercomputing on your desktop: programming the next generation of cheap and massively parallel hardware using CUDA. This IAP has been designed to give students extensive hands-on experience in using a new, potentially disruptive technology. This technology enables the masses to have access to supercomputing capabilities. We will introduce students to the CUDA programming language developed by NVIDIA Corp., which has been an essential step towards simplifying and unifying the programming of massively parallel chips. This IAP is supported by generous contributions from NVIDIA Corp., The Rowland Institute at Harvard, and MIT (OEIT, BCS, EECS), and will be featuring talks given by experts from various fields. 6.963 (IAP 09)
  103. 103. Co-Instructor: Hanspeter Pfister
  104. 104. Visual Computing• Large image & video collections• Physically-based modeling• Face modeling and recognition• Visualization
  105. 105. VolumePro 500 Released 1999
  106. 106. GPGPU
  107. 107. Connectome
  108. 108. NSF CDI Grant ’08-’11
  109. 109. NVIDIA CUDA Center of Excellence
  110. 110. TFs• Claudio Andreoni (MIT Course 18)• Dwight Bell (Harvard DCE)• Krunal Patel (Accelereyes)• Jud Porter (Harvard SEAS)• Justin Riley (MIT OEIT)• Mike Roberts (Harvard SEAS)
  111. 111. Claudio Andreoni (MIT Course 18)
  112. 112. Dwight Bell (Harvard DCE)
  113. 113. Krunal Patel (Accelereyes)
  114. 114. Jud Porter (Harvard SEAS)
  115. 115. Justin Riley (MIT OEIT)
  116. 116. Mike Roberts (Harvard SEAS)
  117. 117. About You
  118. 118. About you...• Undergraduate ? Graduate ?• Programming ? >5 years ? <2 years ?• CUDA ? MPI ? MapReduce ?• CS ? Life Sc ? Applied Sc ? Engineering ? Math ? Physics ?• Humanities ? Social Sc ? Economy ?
  119. 119. Outline
  120. 120. CS 264 Goals• Have fun!• Learn basic principles of parallel computing• Learn programming with CUDA• Learn to program a cluster of GPUs (e.g. MPI)• Learn basics of EC2 and MapReduce• Learn new learning strategies, tools, etc.• Implement a final project
  121. 121. Experimental Learning Strategy: repeat, repeat, repeat... Memory “recall”
  122. 122. Lectures • Theory, Architecture, Patterns? • Act I: GPU Computing • Act II: Cloud Computing • Act III: Guest Lectures
  123. 123. Lectures “Format”• 2x ~ 45min regular “lectures”• ~ 15min “Clinic” • we’ll be here to fix your problems• ~ 5 min: Life and Code “Hacking”: • GTD Zen • Presentation Zen • Ninja Programming Tricks & Tools, etc. • Interested? email staff+spotlight@cs264.org
  124. 124. Act I: GPU Computing• Introduction to GPU Computing• CUDA Basics• CUDA Advanced• CUDA Ninja Tricks !
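As a taste of the material in Act I, here is a minimal CUDA sketch of the kernel-launch model. It is not taken from the course; the array size, names, and the scale-by-a-constant operation are arbitrary illustrations.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Minimal data-parallel kernel: each thread scales one array element.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // guard against overrun
        data[i] *= factor;
}

int main() {
    const int n = 1 << 20;                          // ~1M elements (arbitrary)
    float *d_data = NULL;
    cudaMalloc((void **)&d_data, n * sizeof(float));  // allocate on the GPU
    cudaMemset(d_data, 0, n * sizeof(float));

    int threads = 256;                              // threads per block
    int blocks  = (n + threads - 1) / threads;      // enough blocks to cover n
    scale<<<blocks, threads>>>(d_data, 2.0f, n);    // launch on the GPU
    cudaDeviceSynchronize();                        // wait for completion

    cudaFree(d_data);
    printf("launched %d blocks of %d threads\n", blocks, threads);
    return 0;
}
```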
  125. 125. 3D Filterbank Convolution: Performance / Effort. Performance (gflops) vs. development time (hours): Matlab: 0.3 gflops, 0.5 h; C/SSE: 9.0 gflops, 10.0 h; PS3: 111.4 gflops, 30.0 h; GT200: 339.3 gflops, 10.0 h
  126. 126. Empirical results... Performance (gflops): Q9450 (Matlab/C) [2008]: 0.3; Q9450 (C/SSE) [2008]: 9.0; 7900GTX (Cg) [2006]: 68.2; PS3/Cell (C/ASM) [2007]: 111.4; 8800GTX (CUDA 1.x) [2007]: 192.7; GTX280 (CUDA 2.x) [2008]: 339.3; GTX480 (CUDA 3.x) [2010]: 974.3. >1000X speedup is game changing...
  127. 127. Act II: Cloud Computing• Introduction to utility computing• EC2 & starcluster (Justin Riley, MIT OEIT)• Hadoop (Zak Stone, SEAS)• MapReduce with GPU Jobs on EC2
  128. 128. Amazon’s Web Services• Elastic Compute Cloud (EC2) • Rent computing resources by the hour • Basic unit of accounting = instance-hour • Additional costs for bandwidth• You’ll be getting free AWS credits for course assignments
  129. 129. MapReduce• Functional programming meets distributed processing• Processing of lists with <key, value> pairs• Batch data processing infrastructure• Move the computation where the data is
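A toy, single-process sketch of the programming model described above, using word count (the canonical example). The map step emits <word, 1> pairs, a local grouping stands in for the shuffle, and the reduce step sums each group; the names and data are illustrative only and do not reflect any particular framework's API. Written as plain host-side C++, so it also compiles as CUDA host code.

```cpp
#include <cstdio>
#include <map>
#include <string>
#include <vector>

// Toy, single-process illustration of the MapReduce model: map() emits
// <key, value> pairs, a local grouping stands in for the shuffle, and
// reduce() folds each group. Real frameworks (e.g. Hadoop) distribute
// these steps across many machines; only the programming model is shown.
using KV = std::pair<std::string, int>;

// map: split a document into words and emit <word, 1> per occurrence.
std::vector<KV> map_fn(const std::string &doc) {
    std::vector<KV> out;
    std::string word;
    for (char c : doc + ' ') {
        if (c == ' ') {
            if (!word.empty()) out.push_back({word, 1});
            word.clear();
        } else {
            word += c;
        }
    }
    return out;
}

// reduce: sum all counts emitted for one key.
int reduce_fn(const std::vector<int> &values) {
    int sum = 0;
    for (int v : values) sum += v;
    return sum;
}

int main() {
    std::vector<std::string> docs = {"the cat", "the dog and the cat"};

    std::map<std::string, std::vector<int>> groups;   // "shuffle": group by key
    for (const auto &doc : docs)
        for (const auto &kv : map_fn(doc))
            groups[kv.first].push_back(kv.second);

    for (const auto &g : groups)                       // one reduce per key
        printf("%s\t%d\n", g.first.c_str(), reduce_fn(g.second));
    return 0;
}
```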
  130. 130. Act III: Guest Lectures • Andreas Klöckner (NYU): OpenCL & PyOpenCL • John Owens (UC Davis): fundamental algorithms/data structures and irregular parallelism • Nathan Bell (NVIDIA): Thrust • Duane Merrill* (Virginia Tech): Ninja Tricks • Mike Bauer* (Stanford): Sequoia • Greg Diamos (Georgia Tech): Ocelot • Other lecturers* from Google, Yahoo, Sun, Intel, NCSA, AMD, Cloudera, etc.
  131. 131. Labs • Led by TF(s) • Work on an interesting small problem • From skeleton code to solution • Hands-on
  132. 132. 53 Church St.
  133. 133. 53 Church St.
  134. 134. 53 Church St.
  135. 135. 53 Church St., Room 104 | Thu, Fri 7:35-9:35 pm
  136. 136. 53 Church St., Room 105
  137. 137. NVIDIA Quadro FX 4800 • Mac Pro • NVIDIA Quadro FX 4800, 1.5 GB
  138. 138. Resonance @ SEAS • Quad-core Intel Xeon host, 3 GHz, 8 GB • 8 Tesla S1070s (32 GPUs, 4 GB each) • 16 quad-core Intel Xeons, 2 GHz, 16 GB • http://community.crimsongrid.harvard.edu/getting-started/resources/resonance-cuda-host
  139. 139. What do you need to know?• Programming (ideally in C / C++) • See HW 0• Basics of computer systems • CS 61 or similar
  140. 140. Homeworks• Programming assignments• “Issue Spotter” (code debug & review, Q&A)• Contribution to the community (OSS, Wikipedia, Stackoverflow, etc.)• Due: Fridays at 11 pm EST • Hard deadline - 2 “bonus” days
  141. 141. Office Hours • Led by a TF • 104 @ 53 Church St (check website and news feed)
  142. 142. Participation• HW0 (this week)• Mandatory attendance for guest lectures• forum.cs264.org • Answer questions, help others • Post relevant links and discussions (!)
  143. 143. Final Project • Implement a substantial project • Pick from a list of suggested projects or design your own • Milestones along the way (idea, proposal, etc.) • In-class final presentations • $500+ prize for the best project
  144. 144. Grading• On a 0-100 scale • Participation: 10% • Homework: 50% • Final project: 40%
  145. 145. www.cs264.org• Detailed schedule (soon)• News blog w/ RSS feed• Video feeds• Forum (forum.cs264.org)• Academic honesty policy• HW0 (due Fri 2/4)
  146. 146. Thank you!
  147. 147. one more thing from WikiLeaks?
  148. 148. Is this course for me ???
  149. 149. This course is not for you... • If you’re not genuinely interested in the topic • If you can’t cope with uncertainty, unpredictability, poor documentation, and immature software • If you’re not ready to do a lot of programming • If you’re not open to thinking about computing in new ways • If you can’t put in the time. Slide after Jimmy Lin, iSchool, Maryland
  150. 150. Otherwise...It will be a richly rewarding experience!
  151. 151. Guaranteed?!
  152. 152. Be Patient, Be Flexible, Be Constructive http://davidzinger.wordpress.com/2007/05/page/2/
  153. 153. It would be a win-win-win situation!(The Office Season 2, Episode 27: Conflict Resolution)
  154. 154. Hypergrowth ?
  155. 155. Acknowledgements• Hanspeter Pfister & Henry Leitner, DCE• TFs• Rob Parrott & IT Team, SEAS• Gabe Russell & Video Team, DCE• NVIDIA, esp. David Luebke• Amazon
  156. 156. COME
  157. 157. Next?• Fill out the survey: http://bit.ly/enrb1r• Get ready for HW0 (Lab 1 & 2)• Subscribe to http://forum.cs264.org• Subscribe to RSS feed: http://bit.ly/eFIsqR
