[Harvard CS264] 15a - The Onset of Parallelism, Changes in Computer Architecture and Microsoft's Role in the Transition (David Rich, Microsoft Research)


    Presentation Transcript

    • David Rich, April 2011. The Onset of Parallelism: Changes in computer architecture and Microsoft’s role in the transition
    • Your introduction – some questions…
      – What kind of software do you see yourself working on in the future? Scientific? Web? Games? Business?
      – Have you worked on a distributed app? MPI?
      – Have you used Visual Studio?
      – Which will limit performance in the future: Power consumption? Latency? Lack of parallelism? Bugs?
    • Nanook of the North
      – Made in 1922 by Robert Flaherty
      – Considered to be the first full-length documentary, though some scenes were staged
      – http://en.wikipedia.org/wiki/Nanook_of_the_north
    • Job Specialization
      – Construction trades: Bricklayer / Masonry; Carpenter (construction); Caulker / Pointer / Cleaners; Cement Mason (construction); Construction Lineman; Drywall Finisher/Taper; Electrician, Elevator Mechanic; Electrician, HVAC--Environmental Control System Servicer & Installer; Electrician, General Journeyman (Inside); Electrician, Limited Energy Technician A; Electrician, Limited Energy Technician B; Electrician, Limited Renewable Energy Technician; Electrician, Limited Residential; Electrician, Sign Maker-Erector / Sign Hanger / Sign Assembler-Fabricator; Exterior/Interior Specialist (metal framing & drywall); Finisher, Masonry; Floorcoverer; Glazier (construction); Heat / Frost Insulator; Heavy Duty Repairer; Industrial Pipefitter; Industrial Welder; Ironworker, Structural; Laborer; Marble Setter, Masonry; Millwright Construction Machinery Erector; Operating Engineer; Painter--Decorator / Traffic Control Painter; Pile Driver; Pipefitter; Plasterer; Plumber; Renewable Energy Technician; Roofer; Scaffold Erector; Sheet Metal Worker; Solar Heating/Cooling Systems Installer; Sprinkler Fitter; Steamfitter; Technical Engineer; Terrazzo Worker, Masonry; Tilesetter, Masonry; Tree Trimmer, Power Line; Truck Driver (Heavy)
      – What about? Architect, Surveyor, Inspector
      – Or people that work in the companies that produce pre-fab components? Pipes, wires, windows, fixtures, etc.
    • Guggenheim Museum in Bilbao; Acorn pre-fab house
    • Preparing for the Future – What Will Your Machine Look Like in 5 to 10 Years?
      – Look at the Top500, predict and divide:
        1. At any point in time, most organizations can afford a machine that is 1/1000th the size of the #1 machine on the Top500.
        2. An exaflop comes from 2x efficiency, 2x frequency and 100x the cores.
                | Today’s #1 (Tianhe-1A)           | Test: is this within your budget?         | Exaflop     | Your future platform (1/1000th)
        Perf    | 2.5 PFs                          | 2.5 TFs                                   | 1000 PFs    | 1 PF
        Nodes   | 7,168                            | 7                                         | 500,000?    | 500?
        Cores   | x86: 86,016; GPU: 3,211,164      | x86: 86 (~14 Xeons); GPU: 3,211 (~7 Tesla) | 130 Million | 130 Thousand
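The "predict and divide" rule on this slide is plain arithmetic, so a short sketch may help check the performance and node figures. The dictionary keys and function names below are illustrative assumptions, not anything from the talk; only the input numbers and the two rules come from the slide.

```python
# Rule-of-thumb projection from the slide. All identifiers are illustrative.

TOP500_NUMBER_ONE = {           # Tianhe-1A figures as quoted on the slide
    "perf_pflops": 2.5,
    "nodes": 7_168,
    "x86_cores": 86_016,
    "gpu_cores": 3_211_164,
}

# Rule 1: most organizations can afford 1/1000th of the #1 machine.
AFFORDABLE_FRACTION = 1 / 1000

# Rule 2: an exaflop comes from 2x efficiency, 2x frequency, 100x the cores.
EXAFLOP_FACTORS = {"efficiency": 2, "frequency": 2, "cores": 100}


def affordable_today(machine):
    """Scale every figure of the #1 machine down to 1/1000th."""
    return {key: value * AFFORDABLE_FRACTION for key, value in machine.items()}


def exaflop_perf_pflops(machine):
    """Multiply today's performance by the three growth factors."""
    growth = 1
    for factor in EXAFLOP_FACTORS.values():
        growth *= factor
    return machine["perf_pflops"] * growth


if __name__ == "__main__":
    today = affordable_today(TOP500_NUMBER_ONE)
    future_top = exaflop_perf_pflops(TOP500_NUMBER_ONE)
    future_yours = future_top * AFFORDABLE_FRACTION
    print(f"Affordable today: {today['perf_pflops'] * 1000:.1f} TFs, "
          f"~{today['nodes']:.0f} nodes, ~{today['x86_cores']:.0f} x86 cores")
    print(f"Future #1: {future_top:.0f} PFs; "
          f"your future platform: {future_yours:.0f} PF")
```

The sketch only reproduces the performance and 1/1000th scaling rows; the slide's projected core counts involve assumptions that are not stated in the transcript, so they are left out.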
    • Core Counts On the Rise [Chart: Number of Cores in Top500 #1 Over Time, Jun 93 to Nov 10. Labeled systems: Fujitsu, ASCI Red, ASCI White, Earth Simulator, Blue Gene, RoadRunner, Jaguar, Tianhe-1A (“GPUs Get to #1...”).]
    • Good News: Everybody gets a Petaflop! Bad News: You have to find 200,000-way parallelism.
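One way to see why 200,000-way parallelism is "bad news" is Amdahl's law. The slide does not mention it; it is brought in here as an assumption for illustration only. The sketch shows how even a tiny serial fraction limits the speedup available from 200,000 workers.

```python
def amdahl_speedup(workers, serial_fraction):
    """Upper bound on speedup under Amdahl's law.

    workers:          number of parallel workers (cores)
    serial_fraction:  share of the run time that cannot be parallelized
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    workers = 200_000  # the degree of parallelism quoted on the slide
    for serial_fraction in (0.0, 1e-6, 1e-4, 1e-3):
        bound = amdahl_speedup(workers, serial_fraction)
        print(f"serial fraction {serial_fraction:g}: speedup bound {bound:,.0f}x")
```

Under this model, a serial fraction of just 0.1% already caps the speedup at a little under 1,000x, regardless of how many of the 200,000 workers are available.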
    • Caveat: No biology since high school…
    • Niche vs. Commodity Computing in HPC
      – “Perfect Predator”: performance growth with decreasing cost and no code changes
      – [Diagram: homogeneity over time]
      – 80’s: Vertically Integrated Single Machines: IBM, Digital, Cray, HP (Apollo, Data General, Prime, Masscomp, Gould…)
      – 90’s: Cluster of SMP: RISC + *nix + MPI: IBM, Digital, SGI…
      – 00’s: Horizontal Industry Commodity Clusters: 64bit x86 + Linux: IBM, Dell, HP, many others
      – 10’s: Commodity Clusters Plus: GPU, Multicore, Cloud, FPGA, “big data” & Windows!
      – 20’s: ?
    • www.calxeda.com
    • 2 years 6 years 12 MM users 2 Bil emails/day 7 years 5 Bil conf mins/yr. 11 years Update 12 Bil queries/mo. 12 years 40 Petabytes/mo. 13 years 500 Million active Windows Live IDs 550 MM users/mo. 9.9 Billion messages/day via WL Messenger Over 1 Million BPOS Users in 36 Countries 15 years 450 MM users
    • Microsoft’s Datacenter Evolution
      – Generation 1: Datacenter Co-Location
      – Generation 2: Quincy and San Antonio
      – Generation 3: Chicago and Dublin
      – Generation 4: Modular Datacenter (Facility PAC)
      – Across generations: Server Capacity, Time to Market, Lower TCO
    • Generation 3 - Chicago Data Center
      – $500M+ investment
      – 1.5 million person hours of labor
      – 3000 construction related jobs
      – 3400 tons of steel
      – 707,000 sq ft
      – 190 miles of conduit
      – 2400 tons of copper
      – 7.5 miles of chilled water piping
      – 26,000 cubic yards of concrete
    • Visual Studio
      – Visual Studio is used by over half of the professional programmers in the world
      – VS2010, released a year ago, has been downloaded over 7 million times (more than 4 million extension downloads)
      – Main point: when we release a new capability into Visual Studio it automatically gets large adoption
      – (story about the ISC developers)
    • Microsoft and GPUs The volume business….
    • GPU Hardware Evolution
        Year | Version   | Defining Feature
        1996 | DirectX3  | Hardware rasterization
        1997 | DirectX5  | 2 Shading options to select
        1998 | DirectX6  | Multi-texture operations
        1999 | DirectX7  | Vertex Processing in hardware
        2000 | DirectX8  | Programmable Shaders: Vertex and Pixel
        2002 | DirectX9  | High Level Shading Language, 32 instr
        2003 | DirectX9c | 1000s of instructions per shader
        2006 | DirectX10 | Unified Shaders: consistent shader models
        2009 | DirectX11 | Compute Shader: explicit SIMD, random I/O
    • The GPGPU Software Stack
      – High level tools and libraries: PGI “x86 CUDA”, CAPS, Culatools, Volara, Acceleware
      – Low level programming: CUDA, OpenCL, DirectCompute
      – Hardware: GPU (AMD & NVIDIA); multicore x86 (AMD & Intel)
      – Windows has broad support at all levels:
        – Supports all HW
        – Each of CUDA, OpenCL and DirectCompute
        – Almost all high level tools and libraries
    • DirectCompute
      – What is DirectCompute?
        – Microsoft’s GPGPU Programming Solution
        – API of the DirectX Family
        – Component of the Direct3D API
      – Why Use DirectCompute Over Other APIs?
        – Interoperability with rest of 2D, 3D, Video rendering APIs (display computed results)
        – Cross-hardware compatibility
        – Feature compatibility guarantees
        – Access to fixed-function hardware
      – Used extensively by the gaming community
      – http://msdn.microsoft.com/directx
    • GPGPU Development on Windows
      – Choice: CUDA, OpenCL or DirectCompute
      – Tools and libraries: Nsight and Visual Studio, PGI, CAPS, MATLAB, Jacket, PyCUDA, Quantifi, CUDA.NET, Culatools, NAG, Scicomp… many others
      – NVIDIA reports that over 80% of CUDA SDK downloads are for Windows
    • Microsoft and NVIDIA: NVIDIA’s Parallel Nsight is integrated with Microsoft’s Visual Studio
    • MATLAB [Diagram: a desktop computer running MATLAB with Parallel Computing Toolbox, connected to a computer cluster running MATLAB Distributed Computing Server workers on Windows HPC Server.]
    • [Diagram labels: Cluster SOA Applications; HPC ISV / OSS Applications; Excel; MPI; HPC Pack (middleware); HPC Edition (operating systems); On Premise Cluster Computing.] Note: SP1 does not support MPI applications on Azure.
    • Performance Parity Between Linux and Windows
      – Benchmark: 1 million active cells, 1000 wells, Blackoil. Elapsed time [secs] by core count:
        Cores          | 1       | 2       | 4       | 6       | 8       | 16      | 24     | 32     | 48
        RedHat 5 U3    | 5200.43 | 3385.17 | 3095.72 | 2281.25 | 1790.59 | 1014.42 | 776.71 | 638.43 | 621.42
        Win HPC R2 SP1 | 5404.38 | 3298.55 | 3175.9  | 2171.37 | 1736.11 | 992.82  | 745.43 | 610.88 | 549.74
      – Make your choice based on features and TCO…
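The elapsed times on this slide can be turned into speedup and parallel efficiency, which makes the scaling of the two platforms easier to compare. The sketch below uses the slide's numbers verbatim; the function names and the choice of the 1-core run as the baseline are assumptions made for illustration.

```python
# Elapsed times in seconds, copied from the slide.
CORES = [1, 2, 4, 6, 8, 16, 24, 32, 48]
ELAPSED_SECS = {
    "RedHat 5 U3": [5200.43, 3385.17, 3095.72, 2281.25, 1790.59,
                    1014.42, 776.71, 638.43, 621.42],
    "Win HPC R2 SP1": [5404.38, 3298.55, 3175.9, 2171.37, 1736.11,
                       992.82, 745.43, 610.88, 549.74],
}


def speedup(times):
    """Speedup of each run relative to the first (1-core) run."""
    baseline = times[0]
    return [baseline / t for t in times]


def efficiency(times, cores):
    """Parallel efficiency: speedup divided by the number of cores used."""
    return [s / c for s, c in zip(speedup(times), cores)]


if __name__ == "__main__":
    for platform, times in ELAPSED_SECS.items():
        print(platform)
        rows = zip(CORES, speedup(times), efficiency(times, CORES))
        for cores, s, e in rows:
            print(f"  {cores:>2} cores: speedup {s:5.2f}x, efficiency {e:6.1%}")
```

Each platform is normalized to its own 1-core time, so the speedup columns compare scaling behavior rather than absolute run time.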
    • NEW
    • Excel on the cluster
      – Excel SOA Client
        – Connects to the cluster as a SOA client
        – VSTO code in workbook calls out to SOA Service
        – Input and output managed by Excel developer
      – Excel Workbook on the Cluster (NEW)
        – Run multiple instances of Excel 2010 on an HPC Cluster
        – Each instance runs an iteration of the same workbook
        – Can be launched from Excel 2010 or a Windows program
        – Excel Dialog Suppression
      – Excel UDF on the Cluster (NEW)
        – Run User Defined Functions in parallel on a cluster
        – Excel 2010 includes a new API and options for HPC Cluster
        – Support for .XLL files developed through Excel SDK
        – Easy to develop on a desktop and then deploy to a cluster
    • Use Azure servers to run HPC compute jobs
      – Can be used to “burst out” to the cloud to handle peak demand
      – Can create clusters that include dedicated on-premise servers, non-dedicated workstations and shared Azure servers
      – Jobs can run unchanged across all 3 types of compute nodes (no support for MPI in SP1)
      – Azure nodes are added to the cluster using the Administration console (just like Workstation nodes)
      – [Diagram labels: HPC Clients; Head & Broker Nodes; Jobs; Requests; Azure Gateway; Azure]
    • Compute Nodes On-Premise and in Azure Simultaneously
      – “Burst” into cloud on-demand while keeping control over data and corporate policies
      – Pay only for what you use
      – A stepping stone to hybrid and public clouds
      – Dynamically adjust how much runs on-premise and in the cloud
      – [Diagram labels: HPC Head Node; Desktops; Broker Node; On-premise Compute Nodes; Azure Compute Proxies; Azure Compute Instances]
    • Parallel Development: “Combined with Intel Parallel Studio, I think it is reasonable to say that Windows has the richest and most complete set of tools for multicore programming.” -- James Reinders, Intel, 12-April-2010
    • Solution Begins with DEVELOPERS
      – Make it easier to express and manage the correctness, efficiency and maintainability of parallelism on Microsoft platforms for developers of all skill levels
      – Enable developers to express parallelism easily and focus on the problem to be solved
      – Simplify the process of designing and testing parallel applications
      – Improve the efficiency and scalability of parallel applications
    • Visual Studio 2010: Tools, Programming Models, Runtimes
      – Tools (Visual Studio IDE): Parallel Debugger Tool Windows; Profiler Concurrency Analysis
      – Programming models, managed (.NET Framework 4): Parallel LINQ; Task Parallel Library; Data Structures
      – Programming models, native (Visual C++ 10): Parallel Pattern Library; Agents Library; Data Structures
      – Runtime, managed: ThreadPool (Task Scheduler, Resource Manager)
      – Runtime, native: Concurrency Runtime (Task Scheduler, Resource Manager)
      – Operating system: Windows Threads; UMS Threads
      – Legend: Managed, Native, Tooling
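The libraries named on this slide (Task Parallel Library, Parallel Pattern Library) are C# and C++ APIs and are not reproduced here. As a rough analogy only, the fork-join pattern they target can be sketched with Python's standard `concurrent.futures` module: split the data, submit one task per chunk, then combine the partial results. The helper names `split`, `sum_of_squares` and `parallel_sum_of_squares` are hypothetical.

```python
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor


def sum_of_squares(chunk):
    """The per-task work: a plain sequential loop over one chunk."""
    return sum(x * x for x in chunk)


def split(data, parts):
    """Split data into `parts` contiguous chunks of nearly equal size."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def parallel_sum_of_squares(data, executor: Executor, parts=4):
    """Fork one task per chunk, then join by summing the partial results."""
    futures = [executor.submit(sum_of_squares, chunk)
               for chunk in split(data, parts)]
    return sum(future.result() for future in futures)


if __name__ == "__main__":
    data = list(range(1_000_000))
    # The same code runs on a thread pool or a process pool; for CPU-bound
    # work in CPython, the process pool is the one that can use many cores.
    for pool_type in (ThreadPoolExecutor, ProcessPoolExecutor):
        with pool_type(max_workers=4) as pool:
            print(pool_type.__name__, parallel_sum_of_squares(data, pool))
```

Passing the executor in as a parameter keeps the algorithm separate from the scheduling policy, which loosely mirrors the slide's separation of programming models from the runtimes beneath them.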
    • World’s Fastest House Construction Three and a Half Hours
    • http://www.microsoft.com/hpc David Rich darich at microsoft.com
    • © 2010 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.