Grid07 7 Gagliardi

  1. 1. 6/28/2007 11:46 PM

     Opportunities and Challenges in e_Science
     Fabrizio Gagliardi & Christophe Van Mollekot, Microsoft Corporation

     Outline
     • Introductory remarks
     • Reviewing emergence of e_Science
       – the intensive computing side
       – the massive data side
     • The opportunity of e_Science
     • The challenges of e_Science
     • A Microsoft contribution
     • Conclusions

     Introductory remarks
     • Who am I?
     • A computer scientist who has spent 30 years at CERN (and in other scientific laboratories) developing HPC systems for physics and other sciences
     • Started in real-time, data acquisition and networking
     • Pioneered ES, AI, MPP systems, cluster computing and, in the last 7 years, Grid computing
     • Initiator of EU-DataGrid, EGEE and more than 10 other HPC and Grid projects (mostly within the EU IST programmes)
     • Co-founder of the Global Grid Forum (started in Amsterdam in 2001 together with EU-DataGrid)
     • See my last article on IEEE Spectrum Magazine (July 2006)

     Introductory remarks 2
     • Joined Microsoft on 1/November/2005
     • My mission: Promoting Microsoft Computing into Science and Science into Microsoft Computing
       – by exploring and building important collaborations with science in Europe, Middle East, Africa and Latin America
     • Director in the Technical Computing team led by Tony Hey (Corporate VP)

     A New Science Paradigm
     • Thousand years ago: Experimental Science
       – description of natural phenomena
     • Last few hundred years: Theoretical Science
       – Newton's Laws, Maxwell's Equations …
       – $\left(\frac{\dot{a}}{a}\right)^2 = \frac{4\pi G\rho}{3} - K\frac{c^2}{a^2}$
     • Last few decades: Computational Science
       – simulation of complex phenomena
     • Today: e-Science or Data-centric Science
       – unify theory, experiment, and simulation
       – using massive computing and large data exploration and mining:
         • Data captured by instruments
         • Data generated by simulations
         • Data generated by sensor networks
     Scientists mostly work on computers
     (With thanks to Jim Gray)

     Accelerating Discovery
     [Figure: Multidisciplinary Research linking Life Sciences, Social Sciences, Earth Sciences, New Materials, Technologies & Processes, Computer & Information Sciences, and Math and Physical Science]

     © 2006 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
  2. 2.

     CERN LHC
     • 40 million particle collisions every second
     • reduced by online computers to a few hundred "good" events per sec.
     • which are recorded on disk and magnetic tape at 100-1,000 MegaBytes/sec
     • ~15 PetaBytes per year for all four experiments

     Technology evolution has helped…
     Year:         1991 | 1998 | 2005
     System:       Cray Y-MP C916 | Sun HPC10000 | Small Form Factor PCs
     Architecture: 16 x Vector, 4GB, Bus | 24 x 333MHz Ultra-SPARCII, 24GB, SBus | 4 x 2.2GHz Athlon64, 4GB, GigE
     OS:           UNICOS | Solaris 2.5.1 | Windows Server 2003 SP1
     GFlops:       ~10 | ~10 | ~10
     Top500 #:     1 | 500 | N/A
     Price:        $40,000,000 | $1,000,000 (40x drop) | < $4,000 (250x drop)
     Customers:    Government Labs | Large Enterprises | Every Engineer & Scientist
     Applications: Classified, Climate, Physics Research | Manufacturing, Energy, Finance, Telecom | Bioinformatics, Materials Sciences, Digital Media

     High Energy Physics (LCG)
     LCG depends on two major science Grid infrastructures (plus regional Grids):
     • EGEE - Enabling Grids for E-Science
     • OSG - US Open Science Grid
     Scale (June 2006):
     • ~ 200 sites in 40 countries
     • ~ 25 000 CPUs
     • > 10 PB storage
     • > 35 000 jobs per day
     • > 100 Virtual Organizations
     Enabling Grids for E-sciencE (INFSO-RI-508833)

     Top 500 Architectures / Systems
     [Chart: number of Top 500 systems (0 to 500) by architecture (SIMD, Single Proc., SMP, Const., Cluster, MPP) for each year from 1993 to 2006]

     Grids in Biomedical Sciences
     • A multiplication of projects around the world
       – Example: the National Bioinformatics Initiative in Holland
     • The example of EGEE
       – More than 20 applications in medical imaging, bioinformatics and drug discovery
       – Large scale deployment of in silico drug discovery initiatives
     • In silico docking on malaria on 5 grid infrastructures is breaking the world record for in silico docking throughput
     • Impact of mutations on drug efficiency against H5N1
     [Chart: T01 (E119A) energy statistics, compound numbers against binding energy and docking energy in kcal/mol]
     Enabling Grids for E-sciencE (EGEE-II INFSO-RI-031688)

     Future ITER Fusion reactor
     • Applications with distributed calculations: Monte Carlo, Separate estimates, …
     • Multiple Ray Tracing: e.g. TRUBA
     • Stellarator Optimization: VMEC
     • Transport and Kinetic Theory: Monte Carlo Codes
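The figures quoted on the CERN LHC and technology-evolution slides can be cross-checked with a little arithmetic. The following is a minimal sketch, not part of the original talk: the function names are hypothetical, decimal units (1 PB = 10^15 bytes) and recording around the clock for a full year are assumptions, and "a few hundred" recorded events per second is taken as 300 purely for illustration.

```python
# Back-of-the-envelope check of the numbers quoted on the slides.
# Assumptions (not from the slides): decimal units and continuous
# recording for a full 365-day year.
SECONDS_PER_YEAR = 365 * 24 * 3600


def sustained_rate_mb_per_s(petabytes_per_year: float) -> float:
    """Average recording rate in MB/s implied by a yearly data volume."""
    bytes_per_year = petabytes_per_year * 1e15
    return bytes_per_year / SECONDS_PER_YEAR / 1e6


def reduction_factor(collisions_per_s: float, recorded_per_s: float) -> float:
    """How many collisions occur for each event that is kept."""
    return collisions_per_s / recorded_per_s


def price_drop(old_price: float, new_price: float) -> float:
    """Ratio between two prices for the same ~10 GFlops of performance."""
    return old_price / new_price


if __name__ == "__main__":
    # ~15 PB/year averages out inside the quoted 100-1,000 MB/s range.
    print(f"average rate: {sustained_rate_mb_per_s(15):.0f} MB/s")
    # 300 events/s is an illustrative stand-in for "a few hundred".
    print(f"online reduction: about 1 in {reduction_factor(40e6, 300):,.0f}")
    print(f"1991 to 1998: {price_drop(40_000_000, 1_000_000):.0f}x drop")
    print(f"1998 to 2005: {price_drop(1_000_000, 4_000):.0f}x drop")
```

Under these assumptions the yearly volume corresponds to an average of roughly 476 MB/s, which sits inside the 100-1,000 MB/s range given on the slide, and the two price ratios reproduce the 40x and 250x drops in the table.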
  3. 3.

     The data deluge
     [Image courtesy of Carole Goble]

     Data, Data, Data
     • e_Science is now dominated by huge amounts of data
     • Many discoveries are hidden in those data, but…
     • How to organize, mine and understand the data?
     • How to address the above issues in a scientist-friendly environment? This is where commodity computing tools developed by Microsoft for business and industry could help…

     [Image courtesy of Carole Goble]

     The opportunity in e_Science
     • Replacing experimental activity (or part of it) with computing simulation and modelling based on large distributed computing infrastructures is what is now called e_Science
     • Allowing sharing of resources, not only computing but also data and people's knowledge, is what motivated the emergence of grid computing and the establishment of international virtual organisations which replace local resident scientists
     • This is a major paradigm shift which requires scientists to become expert in complex computing methods

     The challenges (still) in e_Science
     • The applied scientist is obliged to become also a computer scientist
     • Far too much time is spent in developing often over-engineered computing solutions, distracting the applied scientist from their primary mission
     • This has shifted the conventional scientific computing paradigm and could limit scientific discovery in the future and produce major setbacks

     The Problem for the e-Scientist
     [Diagram: Experiments & Instruments, Other Archives, Literature and Simulations supply facts; the scientist poses questions and receives answers]
     • Data ingest
     • Managing Petabytes
     • Common schemas
     • How to organize it?
     • How to reorganize it?
     • How to coexist & cooperate with others?
     • Data Query and Visualization tools
     • Support/training
     • Performance
       – Execute queries in a minute
       – Batch (big) query scheduling
  4. 4.

     Can "Here and Now" technologies accelerate discovery?
     Can "Business" tools and techniques for dealing with
     • Computational Modeling
     • Real-world Data
     • Persistent Distributed Data
     • Workflow, Data Mining & Algorithms
     • Interpretation & Insight
     be used in scientific research to allow researchers to be scientists and not computer scientists…

     Conclusion
     • We need to advance in making computing easy to use, so that scientists can concentrate their energy on their science rather than on the computing tools
     • Only in this way will e_Science be successful in accelerating discovery and producing new breakthroughs
     • Microsoft is investigating solutions in collaborations with leading scientists around the world with its Technical Computing Initiative
     [Diagram: Computational Modeling; Real-world Data; Persistent Distributed Data; Workflow, Data Mining & Algorithms; Interpretation & Insight]

     Four 'Pillars' of Technical Computing @ Microsoft
     • Commitment to Science
     • Global Collaboration
     • Technology Excellence
     • Interoperability

     Technical Computing @ Microsoft
     Mission Statement: 'Promoting Computing into Science and Science into Computing'
  5. 5.

     Technical Computing at Microsoft
     • Advanced Computing for Science and Engineering
       – Application of new algorithms, tools and technologies to scientific and engineering problems
     • High Productivity Computing
       – Application of high performance clusters, information worker tools and database technologies to industrial and scientific applications
     • Radical Computing
       – Research in potential breakthrough technologies

     Fighting HIV with Computer Science
     • A major problem: Over 40 million infected
       – Drug treatments are effective but are an expensive life commitment
       – Vaccine needed for third world countries
       – Effective vaccine could eradicate disease
     • Methods from computer science are helping with the design of vaccine
       – Machine learning: Finding biological patterns that may stimulate the immune system to fight the HIV virus
       – Optimization methods: Compressing these patterns into a small, effective vaccine

     MICROSOFT SPONSORED RESEARCH AT THE CENTER FOR BIOINFORMATICS AND GENOME BIOLOGY AND THE FUNDACION CIENCIA PARA LA VIDA, CHILE
     [Image courtesy of David Holmes]

     Technical Computing and HPC
     Collaboration with MS HPC product groups complement and extend MS HPC institutes. Some examples:
     • HPC for Aerospace at Southampton
     • Cancer research, financial and climate modeling at Oxford OeRC
     • HPC for automotive industry at HLRS Stuttgart
     • HPC support to computational system biology at MSRC joint centre with University of Trento in Italy

     Top Challenges
     "Make high-end computing easier and more productive to use. Emphasis should be placed on time to solution, the major metric of value to high-end computing users… A common software environment for scientific computation encompassing desktop to high-end systems will enhance productivity gains by promoting ease of use and manageability of systems."
     High-End Computing Revitalization Task Force, 2004 (Office of Science and Technology Policy, Executive Office of the President)
     • Setup is painful
       – Takes a long time to get clusters up and running
     • Clusters are separate islands
       – Lack of integration into IT infrastructure
     • Job management
       – Lack of integration into end-user apps
     • Application availability
       – Limited eco-system of applications that can exploit parallel processing capabilities

     Microsoft HPC Institutes
     • TACC – University of Texas, Austin, TX USA
     • University of Virginia, Charlottesville, VA USA
     • Southampton University, Southampton, UK
     • Nizhni Novgorod University, Nizhni Novgorod, Russia
     • University of Utah, Salt Lake City, UT USA
     • Cornell Theory Center, Ithaca, NY USA
     • Tokyo Institute of Technology, Tokyo, Japan
     • University of Tennessee, Knoxville, TN USA
     • HLRS – University of Stuttgart, Stuttgart, Germany
     • Shanghai Jiao Tong University, Shanghai, PRC
  6. 6.

     Radical Computing: The End of Moore's Law?
     [Chart: Power Density (W/cm2), 1 to 10,000, from '70 to '10, for the 4004, 8008, 8080, 8085, 8086, 286, 386, 486 and Pentium® processors, compared with a Hot Plate, a Nuclear Reactor, a Rocket Nozzle and the Sun's Surface]
     Intel Developer Forum, Spring 2004 - Pat Gelsinger

     Future of silicon chips
     • "100's of cores on a chip in 2015" (Justin Rattner, Intel)
     • Challenge for IT industry and Computer Science community
       – Can we make parallel computing on a chip easier than message-passing?
     • Challenge for the Scientific Community
       – How will the Multi-Core transition affect scientific computing?

     Radical Computing @ BSC
     Major collaboration at the Barcelona Super Computer Centre (Prof. Mateo Valero) on development of S/W environment for support of Many-multicore architectures in collaboration with Microsoft Research in Cambridge

     Summary
     Microsoft wishes to work with the university research and business communities to:
     • develop interoperable high-level services, work flows, tools and data services (make computing easy)
     • accelerate progress in a small number of societally important scientific applications (make a difference)
     • explore radical new directions in computing and ways and applications to exploit on-chip parallelism
     www.microsoft.com/science
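The slides ask whether parallel computing on a chip can be made easier than message-passing, and earlier list Monte Carlo codes with separate estimates as typical distributed calculations. As a minimal sketch of that idea, and not anything shown in the talk, the Python below runs independent Monte Carlo batches and averages their separate estimates. The names `pi_batch` and `combined_estimate`, the choice of estimating pi, and the batch size are all hypothetical, chosen only to keep the example small.

```python
import random


def pi_batch(seed: int, samples: int = 50_000) -> float:
    """One independent Monte Carlo batch: estimate pi from random points
    in the unit square, using a private generator seeded per batch."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:  # point falls inside the quarter circle
            hits += 1
    return 4.0 * hits / samples


def combined_estimate(seeds, map_fn=map) -> float:
    """Run one batch per seed through map_fn and average the separate
    estimates. The batches share no state, so map_fn may be the built-in
    serial map or any parallel map with the same calling convention."""
    estimates = list(map_fn(pi_batch, seeds))
    return sum(estimates) / len(estimates)


if __name__ == "__main__":
    # Serial run; the result is reproducible because every batch has its own seed.
    print(combined_estimate(range(8)))
```

Because the batches are independent, the same function can be spread over the cores of one machine without any explicit message-passing code, for example by passing the `map` method of a `concurrent.futures.ProcessPoolExecutor` as `map_fn` (on platforms that spawn worker processes, that call has to sit under the `if __name__ == "__main__":` guard). This only illustrates the easy, embarrassingly parallel case; it says nothing about codes whose tasks must communicate.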
