[Harvard CS264] 15a - The Onset of Parallelism, Changes in Computer Architecture and Microsoft's Role in the Transition (David Rich, Microsoft Research)
David Rich, April 2011
The Onset of Parallelism: Changes in computer architecture and Microsoft’s role in the transition
Your introduction – some questions…
• What kind of software do you see yourself working on in the future? Scientific? Web? Games? Business?
• Have you worked on a distributed app? MPI?
• Have you used Visual Studio?
• Which will limit performance in the future: Power consumption? Latency? Lack of parallelism? Bugs?
Nanook of the North
• Made in 1922 by Robert Flaherty
• Considered to be the first full-length documentary, though some scenes were staged
• http://en.wikipedia.org/wiki/Nanook_of_the_north
Job Specialization
Bricklayer / Masonry; Carpenter (construction); Caulker / Pointer / Cleaners; Cement Mason (construction); Construction Lineman; Drywall Finisher/Taper; Electrician, Construction; Electrician, HVAC/Environmental Control System Servicer & Installer; Electrician, General Journeyman (Inside); Electrician, Limited Energy Technician A; Electrician, Limited Energy Technician B; Electrician, Limited Renewable Energy Technician; Electrician, Limited Residential; Electrician, Sign Maker-Erector / Sign Hanger / Sign Assembler-Fabricator; Elevator Mechanic; Exterior/Interior Specialist (metal framing & drywall); Finisher, Masonry; Floorcoverer; Glazier (construction); Heat / Frost Insulator; Heavy Duty Repairer; Industrial Pipefitter; Industrial Welder; Ironworker, Structural; Laborer; Machinery Erector; Marble Setter, Masonry; Millwright; Operating Engineer; Painter-Decorator / Traffic Control Painter; Pile Driver; Pipefitter; Plasterer; Plumber; Renewable Energy Technician; Roofer; Scaffold Erector; Sheet Metal Worker; Solar Heating/Cooling Systems Installer; Sprinkler Fitter; Steamfitter; Technical Engineer; Terrazzo Worker, Masonry; Tilesetter, Masonry; Tree Trimmer, Power Line; Truck Driver (Heavy)
• What about?
– Architect
– Surveyor
– Inspector
• Or people that work in the companies that produce pre-fab components?
– Pipes, wires, windows, fixtures, etc.
Preparing for the Future – What Will Your Machine Look Like in 5 to 10 Years?
• Look at the Top500, predict and divide:
1. At any point in time, most organizations can afford a machine that is 1/1000th the size of the #1 machine on the Top500.
2. An exaflop comes from 2x efficiency, 2x frequency and 100x the cores.

| | Today’s #1: Tianhe-1A | Test: Is this within your budget? (1/1000th) | Exaflop | Your Future Platform (1/1000th) |
|---|---|---|---|---|
| Perf | 2.5 PF | 2.5 TF | 1000 PF | 1 PF |
| Nodes | 7,168 | 7 | 500,000? | 500? |
| Cores | x86: 86,016; GPU: 3,211,164 | x86: 86 (~14 Xeons); GPU: 3,211 (~7 Teslas) | 130 million | 130 thousand |
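The “predict and divide” rule of thumb is plain arithmetic, so it can be checked directly. A minimal sketch, assuming the slide’s Tianhe-1A figures and taking the 1/1000th budget fraction and the 2x efficiency × 2x frequency × 100x cores decomposition as given; the function names are illustrative only:

```python
# Sketch of the slide's "predict and divide" arithmetic.
# Inputs are the slide's Tianhe-1A figures (Top500 #1, Nov 2010).
# BUDGET_FRACTION is the slide's 1/1000th rule of thumb (an assumption, not a law).

BUDGET_FRACTION = 1 / 1000

tianhe_1a = {
    "perf_pflops": 2.5,
    "nodes": 7_168,
    "x86_cores": 86_016,
    "gpu_cores": 3_211_164,
}

def affordable_slice(machine, fraction=BUDGET_FRACTION):
    """Scale each resource of the #1 machine down to a typical budget.
    Performance stays fractional; counts are rounded to whole units."""
    return {
        key: (value * fraction if key == "perf_pflops" else round(value * fraction))
        for key, value in machine.items()
    }

def exaflop_projection(perf_pflops, efficiency=2, frequency=2, core_multiplier=100):
    """Slide's decomposition: today's peak x 2 (efficiency) x 2 (frequency) x 100 (cores)."""
    return perf_pflops * efficiency * frequency * core_multiplier

today_slice = affordable_slice(tianhe_1a)
exa_pflops = exaflop_projection(tianhe_1a["perf_pflops"])
future_slice_pflops = exa_pflops * BUDGET_FRACTION

print(f"1/1000th of Tianhe-1A: {today_slice['perf_pflops'] * 1000:.1f} TF, "
      f"{today_slice['nodes']} nodes, {today_slice['x86_cores']} x86 cores, "
      f"{today_slice['gpu_cores']} GPU cores")
print(f"Projected exaflop machine: {exa_pflops:.0f} PF")
print(f"1/1000th of it: {future_slice_pflops:.1f} PF")
```

Dividing 2.5 PF by 1,000 gives 2.5 TF, and 2.5 PF × 2 × 2 × 100 gives 1,000 PF, so the affordable slice of a future exaflop machine comes out at about 1 PF.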
Core Counts On the Rise
[Chart: Number of Cores in Top500 #1 Over Time, June 1993 to November 2010, rising to about 3,500,000 cores. Labeled systems: Fujitsu, ASCI Red, ASCI White, Earth Simulator, Blue Gene, RoadRunner, Jaguar and Tianhe-1A, annotated “GPUs Get to #1...”]
Good News: Everybody gets a Petaflop!
Bad News: You have to find 200,000-way parallelism.
Caveat: No biology since high school…
Niche vs. Commodity Computing in HPC
[Figure: timeline from the ’80s to the ’20s, vertical axis “Homogeneity”]
• ’80s, Single Machines: IBM, Digital, Cray, HP (Apollo), Data General, Prime, Masscomp, Gould…
• ’90s, Vertically Integrated: RISC + *nix; IBM, Digital, SGI…
• ’00s, Horizontal Industry / Commodity Clusters: 64-bit x86 + Linux, cluster of SMP, plus MPI; IBM, HP, Dell and many others
• ’10s and ’20s, Commodity Clusters? Plus: GPU, multicore, cloud, FPGA, “big data” & Windows!
• The “Perfect Predator”: performance growth with decreasing cost and no code changes
[Figure: Microsoft online services at scale; the service names and logos were not preserved]
• Service ages shown: 2, 6, 7, 11, 12, 13 and 15 years
• Labels and figures shown: 12 MM users; 2 billion emails/day; 5 billion conference minutes/yr.; Update; 12 billion queries/mo.; 40 petabytes/mo.; 500 million active Windows Live IDs; 550 MM users/mo.; 9.9 billion messages/day via WL Messenger; over 1 million BPOS users in 36 countries; 450 MM users
Microsoft’s Datacenter Evolution
• Generation 1: Datacenter Co-Location
• Generation 2: Quincy and San Antonio
• Generation 3: Chicago and Dublin
• Generation 4: Modular Datacenter
[Chart labels: Server Capacity, Facility PAC, Time to Market, Lower TCO]
Generation 3 - Chicago Data Center
• $500M+ investment
• 1.5 million person-hours of labor
• 3,000 construction-related jobs
• 3,400 tons of steel
• 707,000 sq ft
• 190 miles of conduit
• 2,400 tons of copper
• 7.5 miles of chilled water piping
• 26,000 cubic yards of concrete
Visual Studio
• Visual Studio is used by over half of the professional programmers in the world
• VS2010 – released a year ago – has been downloaded over 7 million times (more than 4 million extension downloads)
• Main point: when we release a new capability into Visual Studio, it automatically gets large adoption
• (story about the ISC developers)
Microsoft and GPUs
The volume business…
GPU Hardware Evolution

| Year | Version | Defining Feature |
|---|---|---|
| 1996 | DirectX3 | Hardware rasterization |
| 1997 | DirectX5 | 2 shading options to select |
| 1998 | DirectX6 | Multi-texture operations |
| 1999 | DirectX7 | Vertex processing in hardware |
| 2000 | DirectX8 | Programmable shaders: vertex and pixel |
| 2002 | DirectX9 | High Level Shading Language, 32 instr |
| 2003 | DirectX9c | 1000s of instructions per shader |
| 2006 | DirectX10 | Unified shaders: consistent shader models |
| 2009 | DirectX11 | Compute Shader: explicit SIMD, random I/O |
The GPGPU Software Stack
[Diagram, top to bottom]
• High level tools and libraries: PGI “x86 CUDA”, CAPS, Culatools, Volara, Acceleware
• Low level programming: CUDA, OpenCL, DirectCompute
• Hardware: GPU: AMD & NVIDIA; Multicore x86: AMD & Intel
• Windows has broad support at all levels:
– Supports all HW
– Each of CUDA, OpenCL and DirectCompute
– Almost all high level tools and libraries
DirectCompute
• What is DirectCompute?
– Microsoft’s GPGPU programming solution
– API of the DirectX family
– Component of the Direct3D API
• Why use DirectCompute over other APIs?
– Interoperability with the rest of the 2D, 3D and video rendering APIs (to display computed results)
– Cross-hardware compatibility
– Feature compatibility guarantees
– Access to fixed-function hardware
• Used extensively by the gaming community
http://msdn.microsoft.com/directx
GPGPU Development on Windows
• Choice: CUDA, OpenCL or DirectCompute
• Tools and libraries: Nsight and Visual Studio, PGI, CAPS, MATLAB, Jacket, PyCUDA, Quantifi, CUDA.NET, Culatools, NAG, Scicomp… many others
• NVIDIA reports that over 80% of CUDA SDK downloads are for Windows
Microsoft and NVIDIA
NVIDIA’s Parallel Nsight is integrated with Microsoft’s Visual Studio
MATLAB
[Diagram]
• Desktop computer: Parallel Computing Toolbox
• Computer cluster: MATLAB Distributed Computing Server workers on Windows HPC Server
[Diagram: On-Premise Cluster Computing]
• Applications: cluster HPC ISV / OSS applications, Excel, MPI applications, SOA applications
• Middleware: HPC Pack
• Operating systems: HPC Edition
*Note: SP1 does not support MPI applications on Azure.
Performance Parity Between Linux and Windows
Benchmark: 1 million active cells, 1,000 wells, black oil. Elapsed time in seconds (lower is better):

| Cores | RedHat 5 U3 | Win HPC R2 SP1 |
|---|---|---|
| 1 | 5200.43 | 5404.38 |
| 2 | 3385.17 | 3298.55 |
| 4 | 3095.72 | 3175.9 |
| 6 | 2281.25 | 2171.37 |
| 8 | 1790.59 | 1736.11 |
| 16 | 1014.42 | 992.82 |
| 24 | 776.71 | 745.43 |
| 32 | 638.43 | 610.88 |
| 48 | 621.42 | 549.74 |

Make your choice based on features and TCO…
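Speedup (1-core time divided by N-core time) and parallel efficiency (speedup divided by N) make the table easier to compare than raw seconds. A small sketch over the slide’s elapsed times; the helper name `scaling` is only for illustration:

```python
# Speedup and parallel efficiency from the slide's elapsed times
# (benchmark: 1M active cells, 1000 wells, black oil).

cores = [1, 2, 4, 6, 8, 16, 24, 32, 48]
elapsed = {
    "RedHat 5 U3":    [5200.43, 3385.17, 3095.72, 2281.25, 1790.59, 1014.42, 776.71, 638.43, 621.42],
    "Win HPC R2 SP1": [5404.38, 3298.55, 3175.9, 2171.37, 1736.11, 992.82, 745.43, 610.88, 549.74],
}

def scaling(times, cores):
    """Return (cores, speedup, efficiency) tuples relative to the 1-core run."""
    base = times[0]
    return [(n, base / t, base / t / n) for n, t in zip(cores, times)]

for os_name, times in elapsed.items():
    print(os_name)
    for n, speedup, eff in scaling(times, cores):
        print(f"  {n:2d} cores: speedup {speedup:5.2f}x, efficiency {eff:6.1%}")

# Ratio > 1 means Windows finished faster at that core count.
ratios = [lx / win for lx, win in zip(elapsed["RedHat 5 U3"], elapsed["Win HPC R2 SP1"])]
print([f"{r:.3f}" for r in ratios])
```

With these figures, 48 cores deliver roughly an 8.4x speedup on RedHat and 9.8x on Windows, which is under 21% parallel efficiency on both: the platforms are close, and neither comes near linear scaling.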
Excel SOA Client
• Connects to the cluster as a SOA client
• VSTO code in workbook calls out to SOA service
• Input and output managed by Excel developer
Excel Workbook on the Cluster (NEW)
• Run multiple instances of Excel 2010 on an HPC cluster
• Each instance runs an iteration of the same workbook
• Can be launched from Excel 2010 or a Windows program
• Excel dialog suppression
Excel UDF on the Cluster (NEW)
• Run User Defined Functions in parallel on a cluster
• Excel 2010 includes a new API and options for HPC
• Support for .XLL files developed through Excel SDK
• Easy to develop on a desktop and then deploy to a cluster
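Running UDFs “in parallel on a cluster” works because each call is independent of the others. A hypothetical Python analog of that parametric-sweep pattern, not the Excel or HPC Pack API: `pricing_udf` is an invented example function, and a local thread pool stands in for the cluster’s compute nodes (for CPU-bound Python code, separate processes or real cluster nodes would be needed for an actual speedup):

```python
# Hypothetical analog of "Excel UDF on the cluster": many independent calls to
# the same function, farmed out to workers and gathered back in input order.
# A ThreadPoolExecutor stands in for the cluster; pricing_udf is invented.
from concurrent.futures import ThreadPoolExecutor
import math

def pricing_udf(rate: float, years: int = 10, principal: float = 100.0) -> float:
    """Hypothetical UDF: present value of a lump sum discounted at `rate`."""
    return principal / math.pow(1.0 + rate, years)

def run_sweep(rates, max_workers=4):
    # Calls share no state, so any worker may run any call in any order;
    # executor.map still yields results in the order of the inputs.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(pricing_udf, rates))

rates = [0.01 * i for i in range(1, 9)]
results = run_sweep(rates)
for r, pv in zip(rates, results):
    print(f"rate {r:.2f}: PV {pv:8.2f}")
```

The key design point mirrors the slide: because no call depends on another, the same workbook logic can be developed on a desktop and scaled out simply by adding workers.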
• Use Azure servers to run HPC compute jobs
• Can be used to “burst out” to the cloud to handle peak demand
• Can create clusters that include dedicated on-premise servers, non-dedicated workstations and shared Azure servers
• Jobs can run unchanged across all 3 types of compute nodes (no support for MPI in SP1)
• Azure nodes are added to the cluster using the Administration console (just like workstation nodes)
[Diagram labels: HPC Clients, Head & Broker Nodes, Jobs, Requests, Azure Gateway, Azure]
Compute Nodes On-Premise and in Azure Simultaneously
[Diagram labels: Desktops, HPC Head Node, Broker Node, On-premise Compute Nodes, Azure Compute Proxies, Azure Compute Instances]
• “Burst” into the cloud on demand while keeping control over data and corporate policies
• Pay only for what you use
• A stepping stone to hybrid and public clouds
• Dynamically adjust how much runs on-premise and in the cloud
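The burst-out model can be pictured as a placement policy: fill local capacity first and rent cloud capacity only for the overflow. A toy sketch under stated assumptions: tasks are independent single-core tasks (consistent with MPI jobs not being supported on Azure in SP1), and the pool names, core counts, 8 cores per cloud instance and the priority order are all hypothetical; this is not HPC Pack’s actual scheduler:

```python
# Hypothetical "burst to the cloud" placement: dedicated on-premise nodes first,
# then non-dedicated workstations, and only then rented cloud instances.
# Names, capacities and policy are invented for illustration.
from dataclasses import dataclass
import math

@dataclass
class NodePool:
    name: str
    cores: int             # cores currently available in this pool
    elastic: bool = False  # True if the pool can grow on demand (cloud)

def place_tasks(num_tasks, pools, cores_per_cloud_instance=8):
    """Assign single-core tasks to pools in priority order.

    Returns (tasks per pool, number of cloud instances to start)."""
    placement = {}
    remaining = num_tasks
    cloud_instances = 0
    for pool in pools:
        if remaining == 0:
            break
        if pool.elastic:
            # Rent just enough instances for the overflow ("pay only for what you use").
            cloud_instances = math.ceil(remaining / cores_per_cloud_instance)
            placement[pool.name] = remaining
            remaining = 0
        else:
            used = min(pool.cores, remaining)
            placement[pool.name] = used
            remaining -= used
    return placement, cloud_instances

pools = [
    NodePool("on-premise compute nodes", cores=64),
    NodePool("idle workstations", cores=16),
    NodePool("azure", cores=0, elastic=True),
]
placement, instances = place_tasks(100, pools)
print(placement)   # tasks per pool
print(instances)   # cloud instances to start for the overflow
```

With 100 tasks, 64 cores on-premise and 16 on workstations, the remaining 20 tasks go to the cloud, which takes 3 eight-core instances; a smaller load never touches the cloud at all.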
Parallel Development
“Combined with Intel Parallel Studio, I think it is reasonable to say that Windows has the richest and most complete set of tools for multicore programming.” -- James Reinders, Intel, 12-April-2010
Solution Begins with DEVELOPERS
Make it easier to express and manage the correctness, efficiency and maintainability of parallelism on Microsoft platforms for developers of all skill levels:
• Enable developers to express parallelism easily and focus on the problem to be solved
• Simplify the process of designing and testing parallel applications
• Improve the efficiency and scalability of parallel applications
Visual Studio 2010: Tools, Programming Models, Runtimes
• Tools (Visual Studio IDE): Parallel Debugger Tool Windows; Profiler Concurrency Analysis
• Programming models
– Managed (.NET Framework 4): Parallel LINQ; Task Parallel Library; Data Structures
– Native (Visual C++ 10): Parallel Pattern Library; Agents Library; Data Structures
• Runtimes
– Managed: ThreadPool, Task Scheduler, Resource Manager
– Native (Concurrency Runtime): Task Scheduler, Resource Manager
• Operating system: Windows threads, UMS threads
World’s Fastest House Construction: Three and a Half Hours
http://www.microsoft.com/hpc
David Rich
darich at microsoft.com