Exascale Challenges: Space, Time, Experimental Science and Self Driving Cars


Published on

Presentation at NSF Workshop on Big Data and Extreme Scale Computing, May 2013, Charleston SC

Published in: Technology, Business
  • Be the first to comment

  • Be the first to like this

Exascale Challenges: Space, Time, Experimental Science and Self Driving Cars

  1. 1. Joel Saltz MD, PhDEmory UniversityFebruary 2013Exascale Challenges: Space,Time, Experimental Science andSelf Driving Cars    
  2. 2. CenterforComprehensiveInformaticsIntegrate Information from Sensors, Images, Cameras•  Multi-dimensional spatial-temporal datasets–  Radiology and Microscopy Image Analyses–  Oil Reservoir Simulation/Carbon Sequestration/Groundwater Pollution Remediation–  Biomass monitoring and disaster surveillance using multipletypes of satellite imagery–  Weather prediction using satellite and ground sensor data–  Analysis of Results from Large Scale Simulations–  Square Kilometer Array–  Google Self Driving Car•  Correlative and cooperative analysis of data from multiplesensor modalities and sources•  Equivalent from standpoint of data access patterns –we propose a integrative sensor data mini-App
  3. 3. CenterforComprehensiveInformatics
  4. 4. CenterforComprehensiveInformaticsDeep Learning from MedicalImaging
  5. 5. Emory In Silico Center for Brain TumorResearch (PI = Dan Brat, PD= Joel Saltz)
  6. 6. CenterforComprehensiveInformaticsIntegrative Cancer Research with Digital Pathologyhistology neuroimagingclincalpathologyIntegratedAnalysismolecularHigh-resolution whole-slide microscopyMultiplex IHC
  7. 7. CenterforComprehensiveInformaticsMorphological Tissue ClassificationNuclei SegmentationCellular FeaturesLee Cooper,Jun KongWhole Slide Imaging
  8. 8. CenterforComprehensiveInformaticsDirect Study of Relationship BetweenvsLee Cooper,Carlos Moreno
  9. 9. CenterforComprehensiveInformaticsHPC Segmentation and Feature Extraction PipelineTony Pan and George Teodoro
  10. 10. CenterforComprehensiveInformatics
  11. 11. CenterforComprehensiveInformaticsLaser Captured Microdissection -- Spatio-temporal“omic” studies
  12. 12. Macroscopic 3-D Tissue at Micron Resolution: OSUBISTI NBIB Center Big Data (2005)Associate genotype withphenotypeBig science experiments oncancer, heart disease,pathogen host responseTissue specimen -- 1 cm30.3 µ resolution – roughly 1013bytesMolecular data (spatial location)can add additional significantfactor; e.g. 102Multispectral imaging, lasercaptured microdissection,Imaging Mass Spec, MultiplexQDMultiple tissue specimens; anotherfactor of 103Total: 1018 bytes – exabyte per bigscience experiment
  13. 13. CenterforComprehensiveInformatics
  14. 14. CenterforComprehensiveInformaticsComplex Structure/Function Interactions – Very,Very Active MaterialsImage from: Gritsenko et al,J Pathol 2012; 226: 185–199
  15. 15. CenterforComprehensiveInformaticsCore Transformations•  Data Cleaning and Low Level Transformations•  Data Subsetting, Filtering, Subsampling•  Spatio-temporal Mapping and Registration•  Object Segmentation•  Feature Extraction•  Object/Region/Feature Classification•  Spatio-temporal Aggregation•  Change Detection, Comparison, and Quantification
  16. 16. A Data Intense Challenge:The Instrumented Oil Field of the Future
  17. 17. The Tyranny of Scale(Tinsley Oden - U Texas)process scalefield scalekmcmsimulation scaleµmpore scale
  18. 18. Why Applications Get Big•  Physical world or simulation results•  Detailed description of two, three (or more)dimensional space•  High resolution in each dimension, lots oftimesteps•  e.g. oil reservoir code -- simulate 100 km by100 km region to 1 km depth at resolution of100 cm:– 10^6*10^6*10^4 mesh points, 10^2 bytes permesh point, 10^6 timesteps --- 10^24 bytes(Yottabyte) of data!!!
  19. 19. Detect and track changes in data during productionInvert data for reservoir propertiesDetect and track reservoir changesAssimilate data & reservoir properties intothe evolving reservoir modelUse simulation and optimization to guide future productionOil Field Management – Joint ITR with Mary Wheeler,Paul Stoffa
  20. 20. Coupled Ground Water and Surface Water SimulationsMultiple codes -- e.g. fluid code, contaminanttransport codeDifferent space and time scalesData from a given fluid code run is used in differentcontaminant transport code scenarios
  21. 21. Bioremediation SimulationMicrobe colonies(magenta)Dissolved NAPL (blue)Mineral oxidationproducts (green)abiotic reactionscompete withmicrobes,reduce extent ofbiodegradation
  22. 22. CenterforComprehensiveInformaticsCore Transformations (Again)•  Data Cleaning and Low Level Transformations•  Data Subsetting, Filtering, Subsampling•  Spatio-temporal Mapping and Registration•  Object Segmentation•  Feature Extraction•  Object/Region/Feature Classification•  Spatio-temporal Aggregation•  Change Detection, Comparison, and Quantification
  23. 23. CenterforComprehensiveInformaticsSensor Integration and Analysis -- “Mini-app”and Co-Design Vehicle•  Tremendous commonality between applications thatcompose and analyze information from multiple sensors/cameras/scientific simulations•  Recommendation – define and publicize “abstractapplication class” that captures essential aspects•  Spatio-temporal data is often accompanied by chemicalspecies, ‘omics” or appropriate domain specific chemicalinformation•  Computationally similar applications are currentlyindependently developed by many research communities•  Biology/Medicine is not a special case!!!•  Mini-app is quite doable and would be tremendouslyuseful if accompanied by domain area buy-in
  24. 24. CenterforComprehensiveInformaticsRuntime Support•  Hierarchical dataflow/task management•  Programming model and runtime support need towork together: specify/extract tasks, dependencies•  Concurrent Collections (CnC), ParalleX ExecutionModel, Region Templates
  25. 25. CenterforComprehensiveInformaticsRuntime Support Objectives•  Coordinated mapping of data and computation tocomplex memory hierarchies•  Hierarchical work assignment with flexibility capableof dealing with data dependent computationalpatterns, fluctuations in computational speedassociated with power management, faults•  Linked to comprehensible programming model –model targeted at abstract application class but notto application domain (In the sensor, image,camera case -- Region Templates)•  Software stack including coordinated compiler/runtime support/autotuning frameworks
  26. 26. CenterforComprehensiveInformaticsCoarse Grain Workflows: Interoperability•  Pipelines are complex and written in multiple languagesand designed to run on multiple environments•  Key components needed to tackle problems may bepurpose built to run on particular environments, may beproprietary•  Patient privacy and HIPPA issues can constrain portionsof computations to institutional or highly secure, HIPPAcertified environments•  Last mile bandwidth issues, performance/storageavailability where data needs to be staged. Largenumber of tactical pitfalls which erode researcherproductivity•  Data generation is cheap and often local, still expensiveto move multiple TB/PB data to supercomputer centers
  27. 27. CenterforComprehensiveInformaticsExascale Hardware/Software Architecture•  Need to stage very large datasets for relativelyshort periods of time -- large aggregate bandwidthto non volatile scratch storage -- distributed flashand disk•  Globally addressed/indexed persistent datacollections -- e.g. DataSpaces, Region Templates(GIS analogy), persistent PGAS•  Intelligent I/O with in-transit processing, datareduction (e.g. ADIOS)•  Visualizations need to be carried out interactivelyand in situ as data is produced and as computationsproceed – efficient streaming data
  28. 28. CenterforComprehensiveInformaticsData Representation and QueryØ  Complex data models capturing multi-facetedinformation including markups, annotations,algorithm provenance etc.Ø  Much data modeling can be re-used in sensor imagecamera application classØ  Efficient implementations of data modelsØ  ADIOS, in transit processing – data, formattransformations, reductions, summarizations closeto dataØ  Key-value stores capable of supporting efficientapplication data access in deep, complex storagehierarchies