Future of Scientific Computing Marvin Theimer

  • © 2004 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary. 06/21/10 07:33

    1. 2. Future of Scientific Computing. Marvin Theimer, Software Architect, Windows Server High Performance Computing Group, Microsoft Corporation
    2. 3. Supercomputing Goes Personal

        |              | 1991                                  | 1998                                    | 2005                                              |
        |--------------|---------------------------------------|-----------------------------------------|---------------------------------------------------|
        | System       | Cray Y-MP C916                        | Sun HPC10000                            | Shuttle @ NewEgg.com                              |
        | Architecture | 16 x Vector, 4GB, Bus                 | 24 x 333MHz Ultra-SPARCII, 24GB, SBus   | 4 x 2.2GHz x64, 4GB, GigE                         |
        | OS           | UNICOS                                | Solaris 2.5.1                           | Windows Server 2003 SP1                           |
        | GFlops       | ~10                                   | ~10                                     | ~10                                               |
        | Top500 #     | 1                                     | 500                                     | N/A                                               |
        | Price        | $40,000,000                           | $1,000,000 (40x drop)                   | < $4,000 (250x drop)                              |
        | Customers    | Government Labs                       | Large Enterprises                       | Every Engineer & Scientist                        |
        | Applications | Classified, Climate, Physics Research | Manufacturing, Energy, Finance, Telecom | Bioinformatics, Materials Sciences, Digital Media |
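The price drops quoted in the table can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the ~10 GFlops figure applies equally to all three systems and treating the "< $4,000" entry as an upper bound of $4,000. The function and variable names are illustrative, not from the slides.

```python
# Prices taken from the table; 4_000 is the upper bound implied by "< $4,000".
systems = [
    ("Cray Y-MP C916", 1991, 40_000_000),
    ("Sun HPC10000", 1998, 1_000_000),
    ("Shuttle @ NewEgg.com", 2005, 4_000),
]
GFLOPS = 10  # assumption: each system delivers roughly 10 GFlops


def price_drops(rows):
    """Return the price ratio between each system and its successor."""
    return [prev[2] / cur[2] for prev, cur in zip(rows, rows[1:])]


def dollars_per_gflops(rows, gflops=GFLOPS):
    """Return the cost of one GFlops for each system."""
    return {name: price / gflops for name, _, price in rows}


drops = price_drops(systems)
cost = dollars_per_gflops(systems)

for name, year, _ in systems:
    print(f"{year}  {name:22s} ${cost[name]:>12,.0f} per GFlops")
print("Price drop factors:", drops)
```

Under these assumptions the drop factors come out as 40 and 250, matching the figures in the Price row.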
    3. 4. Molecular Biologist’s Workstation
        • High-end workstation with internal cluster nodes
            • 8 Opteron, 20 Gflops workstation/cluster for O($10,000)
        • Turn-key system purchased from a standard OEM
        • Pre-installed set of bioinformatics applications
        • Run interactive workstation applications that offload computationally intensive tasks to attached cluster nodes
        • Run workflows consisting of visualization and analysis programs that process the outputs of simulations running on attached cluster nodes
    4. 5. The Future: Supercomputing on a Chip
        • IBM Cell processor
            • 256 Gflops today
            • 4 node personal cluster => 1 Tflops
            • 32 node personal cluster => Top100
        • Intel many-core chips
            • “100’s of cores on a chip in 2015” (Justin Rattner, Intel)
            • “4 cores”/Tflop => 25 Tflops/chip
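The cluster and chip figures on this slide follow from simple multiplication. The sketch below reproduces them, assuming one Cell processor per node, peak (not sustained) rates, and 100 cores as the low end of "100's of cores". The helper names are hypothetical.

```python
CELL_GFLOPS = 256  # peak figure quoted on the slide


def cluster_tflops(nodes, gflops_per_node=CELL_GFLOPS):
    """Aggregate peak rate of a cluster, assuming one chip per node."""
    return nodes * gflops_per_node / 1000


def chip_tflops(cores, cores_per_tflop=4):
    """Peak rate of a many-core chip at the quoted '4 cores per Tflop'."""
    return cores / cores_per_tflop


four_node = cluster_tflops(4)        # the "1 Tflops" personal cluster
thirty_two_node = cluster_tflops(32)  # the "Top100" personal cluster
many_core = chip_tflops(100)          # the "25 Tflops/chip" estimate

print(f"4 nodes:   {four_node} Tflops")
print(f"32 nodes:  {thirty_two_node} Tflops")
print(f"100 cores: {many_core} Tflops")
```

These are peak numbers only; they say nothing about what an application would sustain.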
    5. 6. The Continuing Trend Towards Decentralized, Dedicated Resources
        • Grids of personal & departmental clusters
        • Personal workstations & departmental servers
        • Minicomputers
        • Mainframes
    6. 7. The Evolving Nature of HPC
        • Departmental Cluster (conventional scenario)
            • IT owns large clusters due to complexity and allocates resources on per job basis
            • Users submit batch jobs via scripts
            • In-house and ISV apps, many based on MPI
            • Focus: scheduling multiple users’ applications onto scarce compute cycles; cluster systems administration
        • Personal/Workgroup Cluster (emerging scenario)
            • Clusters are pre-packaged OEM appliances, purchased and managed by end-users
            • Desktop HPC applications transparently and interactively make use of cluster resources
            • Desktop development tools integration
            • Focus: interactive applications; compute grids (distributed systems management)
        • HPC Application Integration (future scenario)
            • Multiple simulations and data sources integrated into a seamless application workflow
            • Network topology and latency awareness for optimal distribution of computation
            • Structured data storage with rich meta-data
            • Applications and data potentially span organizational boundaries
            • Focus: data-centric, “whole-system” workflows; data grids (distributed data management)
        [Diagram labels: Manual, batch execution; Interactive Computation and Visualization; IT Mgr; SQL]
    7. 8. Exploding Data Sizes
        • Experimental data: TBs → PBs
        • Modeling data:
            • Today:
                • 10’s to 100’s of GB per simulation is the common case
                • Applications mostly run in isolation
            • Tomorrow:
                • 10’s to 100’s of TBs, all of it to be archived
                • Whole-system modeling and multi-application workflows
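    8. 9. How Do You Move A Terabyte?* (*Material courtesy of Jim Gray)

        | Context              | Speed (Mbps) | Rent ($/month) | $/Mbps | $/TB sent | Time/TB    |
        |----------------------|--------------|----------------|--------|-----------|------------|
        | Home phone           | 0.04         | 40             | 1,000  | 3,086     | 6 years    |
        | Home DSL             | 0.6          | 70             | 117    | 360       | 5 months   |
        | T1                   | 1.5          | 1,200          | 800    | 2,469     | 2 months   |
        | T3                   | 43           | 28,000         | 651    | 2,010     | 2 days     |
        | OC3                  | 155          | 49,000         | 316    | 976       | 14 hours   |
        | OC 192               | 9600         | 1,920,000      | 200    | 617       | 14 minutes |
        | 100 Mbps (LAN)       | 100          |                |        |           | 1 day      |
        | Gbps (LAN)           | 1000         |                |        |           | 2.2 hours  |
        | 10 Gbps (LAN)        | 10000        |                |        |           | 13 minutes |
        | FedEx                | 100          |                |        | 50        | 24 hours   |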
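The Time/TB and $/TB columns can be recomputed from the link speed and monthly rent. The sketch below does so under two assumptions that are not stated on the slide but that reproduce its numbers: a decimal terabyte (10^12 bytes) and a 30-day billing month. The function names are illustrative.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: 30-day billing month
MEGABITS_PER_TB = 8_000_000         # assumption: 1 TB = 10**12 bytes


def seconds_per_tb(speed_mbps):
    """Time to push one terabyte through a link running at full speed."""
    return MEGABITS_PER_TB / speed_mbps


def dollars_per_tb(speed_mbps, rent_per_month):
    """Share of the monthly rent consumed while sending one terabyte."""
    return rent_per_month * seconds_per_tb(speed_mbps) / SECONDS_PER_MONTH


# (context, speed in Mbps, rent in $/month), taken from the table
links = [
    ("Home phone", 0.04, 40),
    ("Home DSL", 0.6, 70),
    ("T1", 1.5, 1_200),
    ("T3", 43, 28_000),
    ("OC3", 155, 49_000),
    ("OC 192", 9600, 1_920_000),
]

for name, speed, rent in links:
    hours = seconds_per_tb(speed) / 3600
    print(f"{name:10s} {hours:12.1f} h/TB   ${dollars_per_tb(speed, rent):>7,.0f}/TB")
```

The same formula gives about a day for the 100 Mbps LAN row, which is why an overnight courier shipment at a quoted $50 per terabyte is competitive with every wide-area link in the table except OC 192.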
    9. 10. Anticipated HPC Grid Topology
        • Islands of high connectivity
        • Simulations done on personal & workgroup clusters
        • Data stored in data warehouses
        • Data analysis best done inside the data warehouse
        • Wide-area data sharing/replication via FedEx?
        [Diagram labels: Data warehouse; Workgroup cluster; Personal cluster]
    10. 11. Data Analysis and Mining
        • Traditional approach:
            • Keep data in flat files
            • Write C or Perl programs to compute specific analysis queries
            • Problems with this approach:
                • Imposes significant development times
                • Scientists must reinvent DB indexing and query technologies
                • Have to copy the data from the file system to the compute cluster for every query
        • Results from the astronomy community:
            • Relational databases can yield speed-ups of one to two orders of magnitude
            • SQL + application/domain-specific stored procedures greatly simplify creation of analysis queries
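To illustrate the contrast with hand-written flat-file programs, the sketch below expresses an analysis query declaratively and lets the database use an index. It uses Python's built-in `sqlite3` module purely as a stand-in for a relational data warehouse. The `objects` table, its columns, and the sample rows are hypothetical and are not taken from any astronomy survey schema.

```python
import sqlite3

# In-memory database standing in for the data warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE objects ("
    "id INTEGER PRIMARY KEY, ra REAL, declination REAL, magnitude REAL)"
)

# Hypothetical sample catalogue: (id, right ascension, declination, magnitude)
rows = [
    (1, 10.5, -5.0, 17.2),
    (2, 10.7, -5.1, 21.4),
    (3, 200.0, 30.0, 16.0),
    (4, 10.6, -4.9, 15.8),
]
conn.executemany("INSERT INTO objects VALUES (?, ?, ?, ?)", rows)

# The database maintains the index; no custom indexing code is needed.
conn.execute("CREATE INDEX idx_objects_ra ON objects (ra)")


def bright_objects_in_strip(conn, ra_min, ra_max, max_magnitude):
    """Return ids of objects in an RA range that are brighter than a limit."""
    query = (
        "SELECT id FROM objects "
        "WHERE ra BETWEEN ? AND ? AND magnitude < ? "
        "ORDER BY id"
    )
    return [r[0] for r in conn.execute(query, (ra_min, ra_max, max_magnitude))]


result = bright_objects_in_strip(conn, 10.0, 11.0, 18.0)
print(result)
```

A four-row table cannot demonstrate the speed-ups the slide reports; the point is only that the query is a few lines of SQL and runs where the data lives, rather than after copying files to a compute cluster.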
    11. 12. Is That the End of the Story?
        [Diagram labels: Relational data warehouse; Workgroup cluster; Personal cluster]
    12. 13. Too Much Complexity
        [Diagram labels: Relational data warehouse; Workgroup cluster; Personal cluster; Domain science]
        • Distributed systems issues:
            • Security
            • System management
            • Directory services
            • Storage management
        • Digital experimentation:
            • Experiment management
            • Provenance (data & workflows)
            • Version management (data & workflows)
        • Parallel application development:
            • Chip-level, node-level, cluster-level, LAN grid-level, WAN grid-level parallelism
            • OpenMP, MPI, HPF, Global Arrays, …
            • Component architectures
            • Performance configuration & tuning
            • Debugging/profiling/tracing/analysis
        • 2004 NAS supercomputing report: O(35) new computational scientists graduated per year
    13. 14. Separating the Domain Scientist from the Computer Scientist
        • Roles: Computer scientist; Computational scientist; Domain scientist
        • Layers:
            • Parallel domain application development
            • Parallel/distributed file systems, relational data warehouses, dynamic systems management, Web Services & HPC grids
            • (Interactive) scientific workflow, integrated with collaboration-enhanced office automation tools
        • Abstraction levels: Concrete concurrency; Abstract concurrency; Concrete workflow; Abstract workflow
        • Example:
            • Write scientific paper (Word)
            • Record experiment data (Excel)
            • Individual experiment run (Workflow orchestrator)
            • Analyze data (SQL-Server)
            • Share paper with co-authors (Sharepoint)
            • Collaborate with co-authors (NetMeeting)
    14. 15. Scientific Information Worker: Past and Future
        • Past
            • Buy lab equipment
            • Keep lab notebook
            • Run experiments by hand
            • Assemble & analyze data (using stat pkg)
            • Collaborate by phone/email; write up results with LaTeX
            • Metaphor:
                • Physical experimentation
                • “Do it yourself”
                • Lots of disparate systems/pieces
        • Future
            • Buy hardware & software
            • Automatic provenance
            • Workflow with 3rd party domain packages
            • Excel & Access/SQL-Server
            • Office tool suite with collaboration support
            • Metaphor:
                • Digital experimentation
                • Turn-key desktop supercomputer
                • Single integrated system
