Grid And Healthcare For IOM July 2009
Carl Kesselman and I (along with our colleagues Stephan Erberich, Jonathan Silverstein, and Steve Tuecke) participated in an interesting workshop at the Institute of Medicine on July 14, 2009. Along with Patrick Soon-Shiong, we presented our views on how grid technologies can help address the challenges inherent in healthcare data integration.

  • We were asked to consider an H1N1 pandemic, certainly a challenging use case for healthcare integration. As the pandemic proceeds, we see an expanding set of individuals and institutions involved: CDC, HHS, local hospitals, clinics. There is a need for rapid access to information from many sources that have not previously interacted, and for dynamic integration of new capabilities such as data mining and simulation. The number of sick people explodes, new tests are needed, and so on. In summary: rapid integration of systems that haven’t worked together before (incompatible EHR implementations); a one-off new data model; a rapidly changing set of participants; dynamic integration of new capabilities; unknown scale.
  • A second, very different example – information integration in a poor urban setting. A more constrained set of participants, but otherwise not that different. Many different IT systems (or nonsystems) pose significant barriers to entry and make integration difficult. Thus many untapped opportunities: better patient care, healthcare effectiveness research, clinical trial recruitment, etc.
  • What these (and other examples that we will not have time to review) have in common …
  • We cite [Rouse, Health Care as a CAS: Implications for Design… , NAE 2008] for the right-hand side part. Must support: dynamic composition for a specific purpose; evolving community, function, environment; messy data, failure, incomplete knowledge. Nice, but insufficient: data standards; platform standards; federal policies.
  • Another perspective on the problem. A few words of explanation. If we are deploying a hospital IT system, we are (hopefully) in the bottom left hand corner. “You can’t achieve success via central planning.” (quoted in Crossing the Quality Chasm, p. 312). In our scenarios, we don’t have that ability to control.
  • What is the alternative? We can put in place mechanisms that facilitate groups with some common goal to form and function. Over time, things change, these groups evolve. If we are successful, they can expand, perhaps merge. Challenges: make this easy. Leverage scale effects.
  • These are issues that the grid community has been working on for many years. We call these groupings Virtual Organizations. In healthcare today, there are of course many such “VOs.” But they are hard to form, fragmented, …
  • Principles and mechanisms that have been under development for some years. First CS, then physical sciences, then biology, most recently biomedicine.
  • What are these grid mechanisms and concepts, then? Hard to say something sensible in a few minutes. But basically it is about separating out concerns in a way that reduces barriers to entry and permits flexible use.
  • Talk about API vs. protocol. Add “ilities,” function benefits to stack.
  • [Create an image here.] For example, DICOM and HL7 combine messaging and data model in the same interoperability standard. People are contextualizing this problem at the data interoperability level. Systems interoperability is often neglected. This is an area of differentiation: bringing best practice in industry and science into the health care space. Open source platform. Experience with systems interoperability standards: IETF, OASIS, W3C.
  • Attribute authorities emerge as an important system component. Bridge between local and global: honest broker is an example. Not sure what “policy in the network” means.
  • List services from
  • DO SOMETHING INTERESTING ON THE RIGHT. Scaling via automating data adapters. Representations of those things and semantics of those representations. Talk about how services are published, data modeling, etc. Publish databases. Publish services. Name published objects.
  • 07/25/09 Test Built using the same mechanisms used to build SOI: PKI, delegation, attribute-based authorization; registries, monitoring. Operating a service is a pain! It would be nice to outsource, but services need to be near the data, which also raises privacy concerns. So things become complicated.
  • Objects are published and need to be named; then they can be moved around without losing track of them. Bulk data movement. Fine-grained access for data integration.
  • Clinical, administrative, research. Issues are often hidden and escalate. Uniqueness: no guaranteed global uniqueness. Name ownership: no ability to prove that a certain entity issued that name. PHI-tainted names: filenames for some images have the patient ID embedded, so sharing the name alone may constitute a HIPAA violation.
  • Talk about handle….
  • TO PUT IN A SLIDE? Loose coupling and encapsulation. Interoperability through integration based on data mediation. Evolutionary in nature. Set of scalable systems and methods. Explicit in architecture: data integration layer. Demonstrated in GSI, GridFTP, MDS, ECOG.
  • Free text: common in electronic health records. Tight encoding: common in clinical trials and biomedical research. Post-hoc: good but not sufficient to maintain context (e.g. Google Health fiasco). Constraining: ideal but burdensome (e.g. caBIG/caDSR deployment challenges). Warehouses: query is difficult.
  • Granularity varies according to purpose. ICD-9: International Statistical Classification of Diseases. CPT: Current Procedural Terminology. Physicians prefer free text for maximum expressivity, but subsequent NLP/encoding loses context.
  • This would be a good place for a graphic, perhaps showing top down vs. bottom up.
  • Show the types of data below? Do we really have to use CHI appliances? (That seems a substantial barrier to entry.)
  • Workflows are becoming a widespread mechanism for coordinating the execution of scientific services and linking scientific resources: analytical and data processing pipelines. Is this stuff real? EBI: 3 million+ web service API submissions in 2007. A lot? We want to publish workflows as services. Think of caBIG services as service providers that then invoke grid services to execute services (e.g., via TeraGrid gateways).
  • “Docking” is the identification of the low-energy binding modes of small molecules (ligands) within the active site of a macromolecule (receptor) whose structure is known. A compound that interacts strongly with (i.e., binds) a receptor associated with a disease may inhibit its function and thus act as a drug. Typical workload: application size 7MB (static binary); static input data 35MB (binary and ASCII text); dynamic input data 10KB (ASCII text); output data 10KB (ASCII text); expected execution time 5~5000 seconds; parameter space 1 billion tasks.
  • More precisely, step 3 is “GCMC + hydration.” Mike Kubal says: “This task is a Free Energy Perturbation computation using the Grand Canonical Monte Carlo algorithm for modeling the transition of the ligand (compound) between different potential states and the General Solvent Boundary Partition to explicitly model the water molecules in the volume around the ligand and pocket of the protein. The result is a binding energy just like the task at the top of the funnel; it is just a more rigorous attempt to model the actual interaction of protein and compound. To refer to the task in short hand, you can use "GCMC + hydration". This is a method that Benoit has pioneered.”
  • Application Efficiency was computed between the 16 rack and 32 rack runs. Sustained Utilization is the utilization achieved during the part of the experiment while there was enough work to do, 0 to 5300 sec. Overall utilization is the number of CPU hours used divided by the total number of CPU hours allocated. The experiment included the caching of the 36 MB (52MB uncompressed) archive on the first access per node. We use “dd” to move data to and from GPFS. The application itself had some bad I/O patterns in the write, which prevented it from scaling well, so we decided to write to RAM, and then dd back to GPFS. For this particular run, we had 464 Falkon services running on 464 I/O nodes, 118K workers (256 per Falkon service), and 1 client on a login node. The 32 rack job took 15 minutes to start. It took the client 6 minutes to establish a connection and set up the corresponding state with all 464 Falkon services. It took the client 40 seconds to dispatch 118K tasks to 118K CPUs. The rest can be seen from the graph and slide text…
  • We could show these things as moving if we wanted to be really clever. Over time, things change, these groups evolve. If we are successful, they merge.

Grid And Healthcare For IOM July 2009: Presentation Transcript

  • Grid computing and health information sharing: A platform proposal. Ian Foster, Director, Computation Institute; Chan Soon-Shiong Scholar; U. Chicago & Argonne Natl Lab; National Coalition for Health Integration. Carl Kesselman, Co-Director, Center for Health Informatics, University of Southern California.
  • Responding to a pandemic
  • Addressing urban health needs
  • Important characteristics
    • We must integrate systems that may not have worked together before
    • These are human systems, with differing goals, incentives, capabilities
    • All components are dynamic—change is the norm, not the exception
    • Processes are evolving rapidly too
    We are not building something simple like a bridge or an airline reservation system
  • Healthcare is a complex adaptive system
    • A complex adaptive system is a collection of individual agents that have the freedom to act in ways that are not always predictable and whose actions are interconnected such that one agent’s actions changes the context for other agents.
    • Crossing the Quality Chasm, IOM, 2001; pp 312-13
    • Non-linear and dynamic
    • Agents are independent and intelligent
    • Goals and behaviors often in conflict
    • Self-organization through adaptation and learning
    • No single point(s) of control
    • Hierarchical decomposition has limited value
  • We need to function in the zone of complexity [Figure: axes are agreement about outcomes and certainty about outcomes, each running between low and high; regions: plan and control, zone of complexity, chaos. Source: Ralph Stacey, Complexity and Creativity in Organizations, 1996]
  • We need to function in the zone of complexity [Figure: the same axes, agreement about outcomes and certainty about outcomes, each running between low and high; regions: plan and control, chaos. Source: Ralph Stacey, Complexity and Creativity in Organizations, 1996]
  • We call these groupings virtual organizations (VOs)
    • Healthcare = dynamic, overlapping VOs, linking
      • Patient – primary care
      • Sub-specialist – hospital
      • Pharmacy – laboratory
      • Insurer – …
    A set of individuals and/or institutions engaged in the controlled sharing of resources in pursuit of a common goal. But the U.S. health system is marked by fragmented and inefficient VOs with insufficient mechanisms for controlled sharing.
      • I advocate … a model of virtual integration rather than true vertical integration … G. Halvorson, CEO Kaiser
  • The Grid paradigm
    • Principles and mechanisms for dynamic VOs
    • Leverage service oriented architecture (SOA)
    • Loose coupling of data and services
    • Open software, architecture
    [Timeline, 1995 to 2010, in order: computer science, physics, astronomy, engineering, biology, biomedicine, healthcare]
  • The Grid paradigm and healthcare information integration [Figure: data sources (radiology, medical records, pathology, genomics, labs, RHIO) and platform services: make data accessible over the network; make data usable and useful; name data and move it around; manage who can do what]
  • The Grid paradigm and healthcare information integration [Figure: the same data sources; platform services labeled publication, integration, management, and security and policy; further labels: transform data into knowledge, enhance user cognitive processes, incorporate into business processes]
  • The Grid paradigm and healthcare information integration [Figure: the same data sources and platform services (publication, integration, management, security and policy), plus value services: analysis, cognitive support, applications]
  • We partition the multi-faceted interoperability problem
    • Process interoperability
      • Integrate work across healthcare enterprise
    • Data interoperability
      • Syntactic: move structured data among system elements
      • Semantic: use information across system elements
    • Systems interoperability
      • Communicate securely, reliably among system elements
    Analysis Management Integration Publication Applications
  • Security and policy : Managing who can do what
    • Familiar division of labor
    • Publication level: bridge between local and global
    • Integration level: VO-specific policies, based on attributes
    •  Attribute authorities
  • Access control models, in order of increasing policy language abstraction level and expressiveness
    • Identity-based authZ: most simple, not scalable
    • Unix Access Control Lists (Discretionary Access Control: DAC): groups, directories, simple admin
    • POSIX ACLs/MS-ACLs: finer-grained admin policy
    • Role-based Access Control (RBAC): separation of role/group from rule admin
    • Mandatory Access Control (MAC): clearance, classification, compartmentalization
    • Attribute-based Access Control (ABAC): generalization of attributes
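The list of access control models on this slide ends at attribute-based access control, where a decision depends on attributes of the requester and of the resource rather than on a list of identities. The following is a minimal Python sketch of that idea. The attribute names (`vos`, `role`, `vo`, `phi`) and both rules are hypothetical illustrations, and nothing here is the Globus or caGrid GAARDS API.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    """One access request; every attribute name used here is hypothetical."""
    subject: dict = field(default_factory=dict)   # e.g. asserted by an attribute authority
    resource: dict = field(default_factory=dict)  # e.g. attached when data is published
    action: str = "read"


def same_vo_read(req):
    """VO members may read resources shared with their VO, unless they hold PHI."""
    return (req.action == "read"
            and req.resource.get("vo") in req.subject.get("vos", ())
            and not req.resource.get("phi", False))


def clinician_read_phi(req):
    """Clinicians in the owning VO may also read resources that hold PHI."""
    return (req.action == "read"
            and req.subject.get("role") == "clinician"
            and req.resource.get("vo") in req.subject.get("vos", ()))


# A VO-specific policy is just a collection of attribute-based rules.
POLICY = (same_vo_read, clinician_read_phi)


def is_permitted(req, policy=POLICY):
    """Permit if any rule matches; deny by default."""
    return any(rule(req) for rule in policy)


if __name__ == "__main__":
    researcher = {"vos": ["trial-vo"], "role": "researcher"}
    image_with_phi = {"vo": "trial-vo", "phi": True}
    print(is_permitted(Request(researcher, image_with_phi, "read")))
```

Because the rules refer only to attributes, a new participant can be admitted by having an attribute authority assert the right attributes, without editing any per-user list.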
  • Globus / caGrid GAARDS
  • Publication : Make information accessible
    • Make data available in a remotely accessible, reusable manner
    • Leave mediation for integration layer
    • Gateway from local policy/protocol into wide area mechanisms (transport, security, …)
  • Imaging clinical trials use case Childrens Oncology Group VO Neuroblastoma Cancer Foundation VO
  • Automating service creation, deployment
    • Introduce
      • Define service
      • Create skeleton
      • Discover types
      • Add operations
      • Configure security
    • Grid Remote Application Virtualization Infrastructure
      • Wrap executables
    [Figure: Introduce, container, index service, repository service, and application service; steps: create, store, advertise, discover, invoke and get results, transfer GAR, deploy. caGrid, Introduce, gRAVI: Ohio State, U.Chicago]
  • As of Oct 19, 2008: 122 participants 105 services 70 data 35 analytical
  • Management : Naming and moving data
    • Persistent, uniform global naming of objects, independent of type
    • Orchestration of data movement among services
    [Figure: data D and services S1, S2, S3, drawn three times]
  • Naming health objects: A prerequisite to management
    • The naming problem:
    • “Health objects” = patient information, images, records, etc.
    • “Names” refer to health objects in records, files, databases, papers, reports, research, emails, etc.
    • Challenges:
    • No systematic way of naming health objects
    • Many health objects, like DICOM images and reports, include references to other objects through non-unique, ambiguous, PHI-tainted identifiers
    A framework for distributed digital object services: Kahn, Wilensky, 1995
  • Health Object Identifier (HOI) naming system
    uri:hdl://888.us.npi.1234567890.dicom/8A648C33-A5…4939EBE
    • uri:hdl: HOI’s URI schema identifier, based on Handle
    • 888: CHI’s top-level naming authority
    • us.npi.1234567890: National Provider Id used in hierarchical Identifier Namespace
    • dicom: Application Context’s Namespace governed by provider Naming Authority
    • 8A648C33-A5…4939EBE: Random String for Identifier-Body, PHI-free and guaranteed unique
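To make the structure of such a name concrete, here is a small Python sketch that assembles an HOI-style identifier from the parts named on the slide. It rests on several assumptions: that the spaces in the slide's example are layout artifacts, that a random UUID is an acceptable stand-in for the "random string" body (the slide does not say how the body is generated, and a random UUID is unique only with overwhelming probability, not by guarantee), and that `make_hoi` and its validation rules are hypothetical helpers rather than part of any CHI or Handle System software.

```python
import re
import uuid

HANDLE_SCHEME = "uri:hdl://"   # scheme prefix as printed on the slide
TOP_LEVEL_AUTHORITY = "888"    # CHI's top-level naming authority, per the slide


def make_hoi(npi: str, context: str) -> str:
    """Build an HOI-style name for one health object (illustrative only).

    npi     -- the provider's 10-digit National Provider Identifier
    context -- application context namespace, e.g. "dicom"
    """
    if not re.fullmatch(r"\d{10}", npi):
        raise ValueError("NPI must be exactly 10 digits")
    if not re.fullmatch(r"[a-z0-9]+", context):
        raise ValueError("context must be a lowercase alphanumeric label")
    # Assumption: a random UUID serves as the PHI-free identifier body.
    # Nothing about the patient or the object content goes into the name.
    body = str(uuid.uuid4()).upper()
    return f"{HANDLE_SCHEME}{TOP_LEVEL_AUTHORITY}.us.npi.{npi}.{context}/{body}"


if __name__ == "__main__":
    print(make_hoi("1234567890", "dicom"))
```

The provider-scoped prefix records which naming authority issued the name, while the random body keeps patient identifiers out of anything that might be shared.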
  • Data movement in clinical trials
  • Community public health: Digital retinopathy screening network
  • Integration: Making data usable and useful [Figure: axes are degree of prior syntactic and semantic agreement (0% to 100%) and degree of communication (0% to 100%); labeled items: rigid standards-based approach, loosely coupled approach, adaptive approach, and a “?”]
  • Integration: Generally used approaches
    • Allow free text and lose interoperability
    • Tightly encode data elements specific to purpose but lose expressivity/re-use and interoperability
    • Post-hoc tying data elements to biomedical vocabularies
    • Constraining choices to concepts in biomedical vocabularies
    • Assemble raw data into warehouses
  • Semantic expressivity is generally problematic in biomedical data
    • Biomedical concepts are context dependent
      • For billing data, ICD and CPT work
      • For quality/effectiveness/research more detail is required
    • Encode data for semantic interoperability and re-use— or collect specific to context?
      • Physicians prefer free text
      • Biomedical researchers collect data in highly specific contexts -> tying data to standard vocabularies alone is insufficient and burdensome
  • Integration via mediation
    • Map between models
    • Scoped to domain use
      • Multiple concurrent use
    • Bottom up mediation
      • between standards and versions
      • between local versions
      • in absence of agreement
    [Figure: mediator architecture over a Global Data Model (Levy 2000). Labels: query reformulation; query optimization; query execution engine; distributed query execution; query in union of exported source schema; wrappers, each receiving the query in the source schema]
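The mediation approach on this slide can be sketched in a few lines of Python: a query is posed against a global model, each wrapper reformulates it into its own source's schema, and the mediator collects the answers. Everything concrete below is a hypothetical illustration (the class names, the two in-memory "sources", and field names such as `sample_id` and `specimen_id`); it performs no query optimization or distributed execution and does not use OGSA-DAI or OGSA-DQP.

```python
class Wrapper:
    """Exposes one source through the global model; field_map is global -> local."""

    def __init__(self, name, rows, field_map):
        self.name = name
        self.rows = rows
        self.field_map = field_map

    def can_answer(self, filters):
        """A source can answer only if it maps every field the query mentions."""
        return all(key in self.field_map for key in filters)

    def query(self, filters):
        # Reformulate: rewrite the global-model filters in the source's own schema.
        local_filters = {self.field_map[k]: v for k, v in filters.items()}
        for row in self.rows:
            if all(row.get(k) == v for k, v in local_filters.items()):
                # Translate the matching row back into global-model field names.
                translated = {glob: row[local]
                              for glob, local in self.field_map.items()
                              if local in row}
                translated["source"] = self.name
                yield translated


def mediate(wrappers, **filters):
    """Send a global-model query to every capable source and union the results."""
    results = []
    for wrapper in wrappers:
        if wrapper.can_answer(filters):
            results.extend(wrapper.query(filters))
    return results


if __name__ == "__main__":
    site_a = Wrapper("site_a",
                     [{"specimen_id": "A1", "tissue": "marrow"},
                      {"specimen_id": "A2", "tissue": "blood"}],
                     {"sample_id": "specimen_id", "sample_type": "tissue"})
    site_b = Wrapper("site_b",
                     [{"id": "B7", "kind": "marrow"}],
                     {"sample_id": "id", "sample_type": "kind"})
    for hit in mediate([site_a, site_b], sample_type="marrow"):
        print(hit)
```

Each site keeps its local schema, and only the mapping held by its wrapper has to change when a source or a standard evolves, which is the bottom-up property the slide emphasizes.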
  • ECOG 5202 integrated sample management [Figure: no coordinated data systems across MD Anderson, ECOG PCO, and ECOG CC; components shown: web portal, mediator, OGSA-DQP, OGSA-DAI (three instances), CHI appliance (four instances)]
  • Analytics : Transform data into knowledge
    • “The overwhelming success of genetic and genomic research efforts has created an enormous backlog of data with the potential to improve the quality of patient care and cost effectiveness of treatment.”
      • — US Presidential Council of Advisors on Science and Technology, Personalized Medicine Themes, 2008
  • Microarray clustering using Taverna
    • Query and retrieve microarray data from a caArray data service: cagridnode.c2b2.columbia.edu:8080/wsrf/services/cagrid/CaArrayScrub
    • Normalize microarray data using GenePattern analytical service node255.broad.mit.edu:6060/wsrf/services/cagrid/PreprocessDatasetMAGEService
    • Hierarchical clustering using geWorkbench analytical service: cagridnode.c2b2.columbia.edu:8080/wsrf/services/cagrid/HierarchicalClusteringMage
    [Figure legend: workflow in/output; caGrid services; “shim” services; others]
  • Many many tasks: Identifying potential drug targets 2M+ ligands Protein x target(s) (Mike Kubal, Benoit Roux, and others)
  • Docking workflow for 1 target: 4 million tasks, 500,000 cpu-hrs (50 cpu-years)
    • Inputs: PDB protein descriptions, 1 protein (1MB); ZINC 3-D structures of ligands, 2M structures (6 GB)
    • Receptor (1 per protein: defines pocket to bind to): manually prep DOCK6 rec file; manually prep FRED rec file
    • DOCK6 and FRED: ~4M x 60s x 1 cpu, ~60K cpu-hrs; select best ~5K from each
    • Amber: ~10K x 20m x 1 cpu, ~3K cpu-hrs; select best ~500
      • Amber prep: 2. AmberizeReceptor; 4. perl: gen nabscript
      • Amber Score: 1. AmberizeLigand; 3. AmberizeComplex; 5. RunNABScript
      • BuildNABScript: NAB Script Template and NAB script parameters (defines flexible residues, #MDsteps) produce the NAB Script
    • GCMC: ~500 x 10hr x 100 cpu, ~500K cpu-hrs
    • [Figure labels: start, ligands, complexes, report, end]
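The per-stage costs quoted on this slide follow from tasks x time per task x CPUs per task. A short Python check, using only the figures printed on the slide (the stage grouping and variable names are mine):

```python
SECONDS_PER_HOUR = 3600
HOURS_PER_YEAR = 24 * 365  # assumption: a 365-day year

# (stage, tasks, seconds per task, CPUs per task), as read from the slide
STAGES = [
    ("DOCK6/FRED", 4_000_000, 60, 1),
    ("Amber", 10_000, 20 * 60, 1),
    ("GCMC", 500, 10 * 3600, 100),
]


def cpu_hours(tasks, seconds_per_task, cpus_per_task):
    """CPU-hours consumed by one stage of the funnel."""
    return tasks * seconds_per_task * cpus_per_task / SECONDS_PER_HOUR


if __name__ == "__main__":
    total = 0.0
    for name, tasks, secs, cpus in STAGES:
        hours = cpu_hours(tasks, secs, cpus)
        total += hours
        print(f"{name:>10}: {hours:>10,.0f} CPU-hours")
    print(f"{'total':>10}: {total:>10,.0f} CPU-hours "
          f"(about {total / HOURS_PER_YEAR:.0f} CPU-years)")
```

The stage results land near the slide's rounded values (~60K, ~3K, ~500K cpu-hrs), and they show that the final GCMC stage dominates the cost even though it runs on only ~500 candidates, which is why the funnel filters so aggressively before it.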
  • DOCK on BG/P: ~1M tasks on 118,000 CPUs
    • CPU cores: 118784
    • Tasks: 934803
    • Elapsed time: 7257 sec
    • Compute time: 21.43 CPU years
    • Average task time: 667 sec
    • Relative Efficiency: 99.7% (from 16 to 32 racks)
    • Utilization:
      • Sustained: 99.6%
      • Overall: 78.3%
    Time (secs)
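The speaker notes define overall utilization as CPU hours used divided by CPU hours allocated. Applying that definition to the figures on this slide gives a quick consistency check; the sketch below assumes a 365-day year when converting CPU-years to seconds, which the slide does not specify.

```python
CORES = 118_784              # CPU cores, from the slide
ELAPSED_SEC = 7_257          # elapsed wall-clock time, from the slide
COMPUTE_CPU_YEARS = 21.43    # compute time, from the slide
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: 365-day year

# CPU-seconds allocated: every core is held for the whole run.
allocated_cpu_sec = CORES * ELAPSED_SEC
# CPU-seconds actually spent computing.
used_cpu_sec = COMPUTE_CPU_YEARS * SECONDS_PER_YEAR

overall_utilization = used_cpu_sec / allocated_cpu_sec

if __name__ == "__main__":
    print(f"allocated: {allocated_cpu_sec / SECONDS_PER_YEAR:.2f} CPU-years")
    print(f"overall utilization: {overall_utilization:.1%}")
```

The result comes out at roughly 78%, in line with the 78.3% overall utilization reported on the slide; the small difference is within the rounding of the 21.43 CPU-year figure.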
  • Recap
    • Increased recognition that information systems and data understanding are the limiting factor
      • … much of the promise associated with health IT requires high levels of adoption … and high levels of use of interoperable systems (in which information can be exchanged across unrelated systems) … . RAND COMPARE
    • The health system is a complex, adaptive system
      • There is no single point(s) of control. System behaviors are often unpredictable and uncontrollable, and no one is “in charge.” W Rouse, NAE Bridge
    • With diverse and evolving requirements and user communities
      • … I advocate … a model of virtual integration rather than true vertical integration…. G. Halvorson, CEO Kaiser
  • Functioning in the zone of complexity [Figure: axes are agreement about outcomes and certainty about outcomes, each running between low and high; regions: plan and control, chaos. Source: Ralph Stacey, Complexity and Creativity in Organizations, 1996]
  • The Grid paradigm and healthcare information integration [Figure: data sources (radiology, medical records, pathology, genomics, labs, RHIO); platform services (publication, integration, management, security and policy); value services (analysis, cognitive support, applications)]