
Grid Computing July 2009

I presented this keynote talk at the WorldComp conference in Las Vegas, on July 13, 2009. In it, I summarize what grid is about (focusing in particular on the "integration" function, rather than the "outsourcing" function--what people call "cloud" today), using biomedical examples in particular.

Published in: Technology, Education

  1. Grid computing (Ian Foster, Computation Institute, Argonne National Laboratory & University of Chicago)
  2. “When the network is as fast as the computer’s internal links, the machine disintegrates across the net into a set of special-purpose appliances” (George Gilder, 2001)
  3. “I’ve been doing cloud computing since before it was called grid.”
  4. “Computation may someday be organized as a public utility … The computing utility could become the basis for a new and important industry.” (John McCarthy, 1961)
  5. Scientific collaboration
  6. Addressing urban health needs
  7. Important characteristics
     - We must integrate systems that may not have worked together before
     - These are human systems, with differing goals, incentives, and capabilities
     - All components are dynamic: change is the norm, not the exception
     - Processes also evolve rapidly
     We are not building something simple like a bridge or an airline reservation system
  8. We are dealing with complex adaptive systems
     - “A complex adaptive system is a collection of individual agents that have the freedom to act in ways that are not always predictable and whose actions are interconnected such that one agent’s actions change the context for other agents.” (Crossing the Quality Chasm, IOM, 2001, pp. 312–13)
     - Non-linear and dynamic
     - Agents are independent and intelligent
     - Goals and behaviors often in conflict
     - Self-organization through adaptation and learning
     - No single point(s) of control
     - Hierarchical decomposition has limited value
  9. We need to function in the zone of complexity
     [Stacey diagram: axes “Certainty about outcomes” and “Agreement about outcomes,” low to high; regions “Plan and control,” “Zone of complexity,” and “Chaos.” Ralph Stacey, Complexity and Creativity in Organizations, 1996]
  10. We need to function in the zone of complexity
     [Same Stacey diagram, repeated as a build slide]
  11. “The Anatomy of the Grid,” 2001
     “The … problem that underlies the Grid concept is coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations. The sharing that we are concerned with is not primarily file exchange but rather direct access to computers, software, data, and other resources, as is required by a range of collaborative problem-solving and resource-brokering strategies emerging in industry, science, and engineering. This sharing is, necessarily, highly controlled, with resource providers and consumers defining clearly and carefully just what is shared, who is allowed to share, and the conditions under which sharing occurs. A set of individuals and/or institutions defined by such sharing rules form what we call a virtual organization (VO).”
  12. Examples (from “The Anatomy of the Grid,” 2001)
     - “The application service providers, storage service providers, cycle providers, and consultants engaged by a car manufacturer to perform scenario evaluation during planning for a new factory”
     - “Members of an industrial consortium bidding on a new aircraft”
     - “A crisis management team and the databases and simulation systems that they use to plan a response to an emergency situation”
     - “Members of a large, international, multiyear high-energy physics collaboration”
  13. From the organizational behavior and management community
     - “[A] group of people who interact through interdependent tasks guided by common purpose [that] works across space, time, and organizational boundaries with links strengthened by webs of communication technologies” (Lipnack & Stamps, 1997)
     - Yes, but adding cyber-infrastructure:
       - People → computational agents & services
       - Communication technologies → IT infrastructure
     Collaboration based on rich data & computing capabilities
  14. NSF Workshops on Building Effective Virtual Organizations [search “BEVO 2008”]
  15. The Grid paradigm
     - Principles and mechanisms for dynamic VOs
     - Leverage service-oriented architecture (SOA)
     - Loose coupling of data and services
     - Open software, open architecture
     [Timeline, 1995–2010: adoption spreading from computer science and physics to astronomy, engineering, biology, biomedicine, and healthcare]
  16. We call these groupings virtual organizations (VOs)
     A set of individuals and/or institutions engaged in the controlled sharing of resources in pursuit of a common goal
     - Healthcare = dynamic, overlapping VOs, linking:
       - Patient – primary care
       - Sub-specialist – hospital
       - Pharmacy – laboratory
       - Insurer – …
     But the U.S. health system is marked by fragmented and inefficient VOs with insufficient mechanisms for controlled sharing
     - “I advocate … a model of virtual integration rather than true vertical integration …” (G. Halvorson, CEO, Kaiser Permanente)
  17. The Grid paradigm and information integration
     [Layer diagram, build 1: data sources (radiology, pathology, genomics, labs, medical records, RHIO) under platform services that name resources and move data around, make resources accessible over the network, make resources usable and useful, and manage who can do what]
  18. The Grid paradigm and information integration
     [Build 2: the platform services named as publication, management, integration, and security and policy; higher layers transform data into knowledge, enhance user cognitive processes, and incorporate results into business processes]
  19. The Grid paradigm and information integration
     [Build 3, complete stack: data sources; platform services (publication, management, integration, security and policy); value services (analysis, cognitive support, applications)]
  20. We partition the multi-faceted interoperability problem
     - Process interoperability: integrate work across the healthcare enterprise
     - Data interoperability
       - Syntactic: move structured data among system elements
       - Semantic: use information across system elements
     - Systems interoperability: communicate securely and reliably among system elements
     [Maps onto the publication, management, integration, analysis, and applications layers]
  21. Security and policy: managing who can do what
     - Familiar division of labor
     - Publication level: bridge between local and global
     - Integration level: VO-specific policies, based on attributes → attribute authorities
  22. Access control models, ordered by policy language abstraction level and expressiveness:
     - Identity-based authorization: simplest, but not scalable
     - Unix access control lists (discretionary access control, DAC): groups, directories, simple administration
     - POSIX ACLs / MS ACLs: finer-grained administrative policy
     - Role-based access control (RBAC): separation of role/group administration from rule administration
     - Mandatory access control (MAC): clearance, classification, compartmentalization
     - Attribute-based access control (ABAC): generalization of roles and labels to arbitrary attributes
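A minimal sketch of the most expressive model on this spectrum, attribute-based access control: a request is granted when some rule's subject and resource attribute conditions all hold. The policy format, attribute names, and example subjects here are hypothetical, not taken from GAARDS or any real system.

```python
# Hypothetical ABAC evaluator: grant access if every condition of any one
# policy rule matches the subject and resource attributes.

def permits(policy, subject, resource, action):
    """Return True if some rule authorizes this action."""
    for rule in policy:
        if (rule["action"] == action
                and all(subject.get(k) == v for k, v in rule["subject"].items())
                and all(resource.get(k) == v for k, v in rule["resource"].items())):
            return True
    return False

# VO-specific rule: radiologists in a (hypothetical) trial VO may read imaging data.
policy = [{"action": "read",
           "subject": {"role": "radiologist", "vo": "trial-42"},
           "resource": {"type": "dicom-image"}}]

print(permits(policy, {"role": "radiologist", "vo": "trial-42"},
              {"type": "dicom-image"}, "read"))   # True
print(permits(policy, {"role": "nurse", "vo": "trial-42"},
              {"type": "dicom-image"}, "read"))   # False
```

Because policy is driven purely by attributes, the VO can delegate attribute assertion to attribute authorities rather than enumerating identities.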
  23. Globus / caGrid GAARDS
  24. Publication: make information accessible
     - Make data available in a remotely accessible, reusable manner
     - Leave mediation to the integration layer
     - Gateway from local policy/protocol into wide-area mechanisms (transport, security, …)
  25. TeraGrid participants
  26. Federating computers for physics data analysis
  27. Earth System Grid
     Main ESG portal:
     - 198 TB of data at four locations; 1,150 datasets; 1,032,000 files
     - Includes the past 6 years of joint DOE/NSF climate modeling experiments
     - Downloads to date: 49 TB, 176,000 files
     CMIP3 (IPCC AR4) ESG portal:
     - 35 TB of data at one location; 74,700 files
     - Generated by a modeling campaign coordinated by the Intergovernmental Panel on Climate Change; data from 13 countries, representing 25 models
     - Downloads to date: 387 TB, 1,300,000 files, 500 GB/day (average)
     8,000 registered users; 1,900 registered projects; 400 scientific papers published to date based on analysis of CMIP3 (IPCC AR4) data; ESG usage at over 500 sites worldwide. Built on Globus.
  28. Children’s Oncology Group
     [Enterprise/grid interface service: a wide-area service actor with plug-in adapters bridges enterprise protocols (DICOM, XDS, HL7, vendor-specific) to grid protocols (Web services)]
  29. Automating service creation and deployment
     - Introduce: define service, create skeleton, discover types, add operations, configure security
     - gRAVI (Grid Remote Application Virtualization Infrastructure): wrap executables
     [Lifecycle: create a service with Introduce, store it in a repository, deploy the GAR to a container, and advertise it in an index service; clients discover the service, invoke it, and get results]
     caGrid, Introduce, gRAVI: Ohio State, U. Chicago
  30. As of Oct 19, 2008: 122 participants; 105 services (70 data, 35 analytical)
  31. Management: naming and moving information
     - Persistent, uniform global naming of objects, independent of type
     - Orchestration of data movement among services
  32. LIGO Data Grid (LIGO gravitational wave observatory)
     - Replicating >1 terabyte/day to 8 sites (including Birmingham, Cardiff, AEI/Golm)
     - 770 TB replicated to date: >120 million replicas
     - MTBF = 1 month
     Ann Chervenak et al., ISI; Scott Koranda et al., LIGO. Built on Globus.
  33. Data replication service: pull “missing” files to a storage system
     [Architecture: the Data Replication Service takes a list of required files, locates replicas via the Replica Location Index and Local Replica Catalogs, and moves data with the Reliable File Transfer Service over GridFTP]
     “Design and Implementation of a Data Replication Service Based on the Lightweight Data Replicator System,” Chervenak et al., 2005
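The "pull missing files" pattern on this slide reduces to comparing a local replica catalog against the list of required files and fetching whatever is absent. In this hedged sketch, `fetch()` and the in-memory catalog stand in for the real GridFTP transfer and replica catalog services:

```python
# Sketch of the pull-based replication loop; the catalog is a plain dict
# and fetch() is a stand-in for a wide-area (e.g. GridFTP) transfer.

def missing_files(required, local_catalog):
    """Files required by the VO but not yet replicated locally."""
    return [name for name in required if name not in local_catalog]

def replicate(required, local_catalog, fetch):
    """Pull each missing file into the local catalog."""
    for name in missing_files(required, local_catalog):
        local_catalog[name] = fetch(name)   # wide-area transfer
    return local_catalog

local = {"event-001.dat": b"..."}
replicate(["event-001.dat", "event-002.dat"], local,
          fetch=lambda name: b"<remote bytes>")
print(sorted(local))   # ['event-001.dat', 'event-002.dat']
```

The real service adds what the sketch omits: reliable restartable transfers, replica location indexing across sites, and registration of new replicas.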
  34. Naming objects: a prerequisite to management
     The naming problem:
     - “Health objects” = patient information, images, records, etc.
     - “Names” refer to health objects in records, files, databases, papers, reports, research, emails, etc.
     Challenges:
     - No systematic way of naming health objects
     - Many health objects, like DICOM images and reports, include references to other objects through non-unique, ambiguous, PHI-tainted identifiers
     (“A Framework for Distributed Digital Object Services,” Kahn & Wilensky, 1995)
  35. Health Object Identifier (HOI) naming system
     Example: uri:hdl://888.us.npi.1234567890.dicom/8A648C33-A5…4939EBE
     - uri:hdl — HOI’s URI schema identifier, based on the Handle system
     - 888 — CHI’s top-level naming authority
     - 1234567890 — National Provider Id, used in the hierarchical identifier namespace
     - dicom — application context’s namespace, governed by the provider naming authority
     - Identifier body — a random string: PHI-free and guaranteed unique
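The HOI structure above can be illustrated by assembling its parts into a Handle-style URI. This is a sketch only: the prefix values come from the slide's example, while `uuid4` is an assumed stand-in for whatever random-string scheme the real system uses for the PHI-free identifier body.

```python
# Illustrative HOI construction: naming authority + provider NPI +
# application context, followed by a random identifier body.
import uuid

def make_hoi(naming_authority, provider_npi, context):
    body = uuid.uuid4().hex.upper()   # random and unique; carries no PHI
    return f"uri:hdl://{naming_authority}.us.npi.{provider_npi}.{context}/{body}"

hoi = make_hoi("888", "1234567890", "dicom")
# e.g. uri:hdl://888.us.npi.1234567890.dicom/8A648C33A5...
```

Because the body is random rather than derived from patient data, the name can be published and cross-referenced without leaking protected health information.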
  36. Data movement in clinical trials
  37. Community public health: digital retinopathy screening network
  38. Integration: making information useful
     [Chart: degree of communication achieved vs. degree of prior syntactic and semantic agreement (0%–100%), contrasting a rigid standards-based approach, a loosely coupled approach, and an adaptive approach]
  39. Integration via mediation
     - Map between models
     - Scoped to domain use (multiple concurrent uses)
     - Bottom-up mediation: between standards and versions, between local versions, in the absence of agreement
     [Mediator architecture (Levy, 2000): a query in the global data model is reformulated into the union of exported source schemas, optimized, and run by a distributed query execution engine against wrappers that translate into each source’s schema]
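The core of mediation is query reformulation: a query phrased against the global data model is rewritten into each source's local schema before execution. A minimal sketch, with invented source names and field mappings:

```python
# Hypothetical global-to-local schema mappings for two sources.
MAPPINGS = {
    "source_a": {"patient_id": "pt_num", "specimen": "sample_code"},
    "source_b": {"patient_id": "mrn",    "specimen": "specimen_id"},
}

def reformulate(global_query, source):
    """Rewrite global-model field names into the source's local schema."""
    mapping = MAPPINGS[source]
    return {mapping[field]: value for field, value in global_query.items()}

q = {"patient_id": "P-17", "specimen": "S-3"}
print(reformulate(q, "source_b"))   # {'mrn': 'P-17', 'specimen_id': 'S-3'}
```

The real mediator also handles semantic mismatches (units, vocabularies, versions), not just field renaming, but the bottom-up principle is the same: mappings live at the mediator, so sources need not agree with each other in advance.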
  40. ECOG 5202 integrated sample management
     [A web portal queries a mediator via OGSA-DQP, which federates OGSA-DAI data services at the ECOG Coordinating Center, ECOG PCO, and MD Anderson]
  41. Analytics: transform data into knowledge
     “The overwhelming success of genetic and genomic research efforts has created an enormous backlog of data with the potential to improve the quality of patient care and cost effectiveness of treatment.” (US President’s Council of Advisors on Science and Technology, Personalized Medicine themes, 2008)
  42. Microarray clustering using Taverna (Wei Tan)
     - Query and retrieve microarray data from a caArray data service: cagridnode.c2b2.columbia.edu:8080/wsrf/services/cagrid/CaArrayScrub
     - Normalize microarray data using a GenePattern analytical service: node255.broad.mit.edu:6060/wsrf/services/cagrid/PreprocessDatasetMAGEService
     - Hierarchical clustering using a geWorkbench analytical service: cagridnode.c2b2.columbia.edu:8080/wsrf/services/cagrid/HierarchicalClusteringMage
     [Workflow diagram legend: workflow in/outputs, caGrid services, “shim” services, others]
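The three-step workflow above can be sketched as a plain function pipeline. Each stage body is a hypothetical local stand-in for the corresponding remote service (caArray, GenePattern, geWorkbench); the data values are fabricated:

```python
# Retrieve -> normalize -> cluster, as a local pipeline sketch.

def retrieve_microarray(query):
    """Stand-in for the caArray data service: return an expression matrix."""
    return [[1.0, 2.0], [3.0, 4.0]]

def normalize(matrix):
    """Stand-in for GenePattern preprocessing: min-max scale to [0, 1]."""
    flat = [v for row in matrix for v in row]
    lo, hi = min(flat), max(flat)
    return [[(v - lo) / (hi - lo) for v in row] for row in matrix]

def cluster(matrix):
    """Stand-in for geWorkbench hierarchical clustering."""
    return {"clusters": [matrix]}

result = cluster(normalize(retrieve_microarray("hypothetical-experiment-id")))
```

What Taverna adds over naive composition is exactly what the slide's "shim" services suggest: adapters between mismatched service data formats, plus provenance and distributed execution.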
  43. Many, many tasks: identifying potential drug targets
     - 2M+ ligands screened against each protein target
     (Mike Kubal, Benoit Roux, and others)
  44. Docking pipeline, per target
     - Inputs: PDB protein descriptions (1 protein, ~1 MB; a receptor file per protein, manually prepared for DOCK6 and for FRED, defines the pocket to bind to) and 2M ZINC 3-D ligand structures (~6 GB)
     - DOCK6 / FRED screening: ~4M tasks × 60 s × 1 CPU ≈ 60K CPU-hours; select best ~5K from each
     - Amber scoring (1. AmberizeLigand, 2. AmberizeReceptor, 3. AmberizeComplex, 4. generate NAB script from template and parameters defining flexible residues and number of MD steps, 5. run NAB script): ~10K tasks × 20 min × 1 CPU ≈ 3K CPU-hours; select best ~500
     - GCMC: ~500 tasks × 10 hr × 100 CPUs ≈ 500K CPU-hours
     For 1 target: ~4 million tasks, ~500,000 CPU-hours (50 CPU-years)
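The slide's per-stage budgets can be sanity-checked with simple arithmetic. All task counts and durations below are the slide's own order-of-magnitude estimates, so the totals are rough:

```python
# Back-of-envelope CPU budget for one docking target.

def cpu_hours(tasks, seconds_each, cpus_per_task=1):
    """Total CPU-hours for a bag of identical tasks."""
    return tasks * seconds_each * cpus_per_task / 3600

dock  = cpu_hours(4_000_000, 60)         # ~66,667 (slide rounds to ~60K)
amber = cpu_hours(10_000, 20 * 60)       # ~3,333  (slide: ~3K)
gcmc  = cpu_hours(500, 10 * 3600, 100)   # 500,000 (slide: ~500K)

total = dock + amber + gcmc              # ~570,000 CPU-hours per target
```

The screening stage dominates the task count while GCMC dominates the CPU budget, which is why the pipeline aggressively narrows the candidate set between stages.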
  45. DOCK on BG/P: ~1M tasks on 118,000 CPUs
     - CPU cores: 118,784
     - Tasks: 934,803
     - Elapsed time: 7,257 s
     - Compute time: 21.43 CPU-years
     - Average task time: 667 s
     - Relative efficiency: 99.7% (from 16 to 32 racks)
     - Utilization: 99.6% sustained, 78.3% overall
  46. Scaling POSIX to petascale
     [Architecture: large datasets staged from a global file system, across torus and tree interconnects, onto a compute-node-striped intermediate file system (e.g. Chirp multicast, MosaStore striping) and node-local file systems holding local datasets]
  47. Efficiency for 4-second tasks and varying data size (1 KB to 1 MB) for CIO and GPFS, up to 32K processors
  48. “Sine” workload: 2M tasks, 10MB:10ms ratio, 100 nodes, GCC policy, 50 GB caches/node (Ioan Raicu)
  49. Same scenario, but with dynamic resource provisioning
  50. Data diffusion, sine-wave workload: summary
     - GPFS: 5.70 hrs, ~8 Gb/s, 1,138 CPU-hrs
     - DD+SRP: 1.80 hrs, ~25 Gb/s, 361 CPU-hrs
     - DD+DRP: 1.86 hrs, ~24 Gb/s, 253 CPU-hrs
  51. Recap
     - Increased recognition that information systems and data understanding are the limiting factor
       - “… much of the promise associated with health IT requires high levels of adoption … and high levels of use of interoperable systems (in which information can be exchanged across unrelated systems) …” (RAND COMPARE)
     - The health system is a complex adaptive system
       - “There is no single point(s) of control. System behaviors are often unpredictable and uncontrollable, and no one is ‘in charge.’” (W. Rouse, NAE The Bridge)
     - With diverse and evolving requirements and user communities
       - “… I advocate … a model of virtual integration rather than true vertical integration …” (G. Halvorson, CEO, Kaiser Permanente)
  52. Functioning in the zone of complexity
     [Stacey diagram, repeated. Ralph Stacey, Complexity and Creativity in Organizations, 1996]
  53. The Grid paradigm and information integration
     [Complete layer diagram, repeated: data sources; platform services (publication, management, integration, security and policy); value services (analysis, cognitive support, applications)]
  54. “The computer revolution hasn’t happened yet.” (Alan Kay, 1997)
  55. [Chart: connectivity (log scale) vs. time for science, enterprise, and consumer computing, annotated “Grid,” “Cloud,” and “????”] “When the network is as fast as the computer’s internal links, the machine disintegrates across the net into a set of special purpose appliances” (George Gilder, 2001)
  56. Thank you! Computation Institute, www.ci.uchicago.edu
