Exascale Computing and
Experimental Sensor Data
Overview given at Brookhaven National Laboratory
April 18 2014
Joel Saltz
Methods, tools and middleware for analysis of extremely large sensor datasets on high end architectures

  • 1) Metadata about images
    2) Metadata about image targets and how images are derived (patient, specimen, anatomicEntity, etc.)
    3) Metadata about analyses (the purpose of the analysis, who performed the analysis, etc.)
    4) Image markups: a markup delineates a spatial region (e.g., as points, lines, polygons, multi-polygons) in images
    5) Annotation (image features): a type of annotation calculated or derived from the markups
    6) Annotation (observation): an annotation associates semantic meaning with markup entities through coded or free-text terms that provide explanatory or descriptive information
    7) Provenance information, i.e., the derivation history of a markup or annotation, including algorithm information, parameters, and inputs
  • Native XML database approach
    For small PAIS documents, e.g., organ-, tissue-, or region-level annotations
    No mapping needed; supports standard XML queries
  • Relational and spatial database approach
    For large-scale PAIS documents, e.g., analysis results at the cellular or subcellular level
    Data mapped into relational tables and spatial objects
    Highly efficient for storage and queries
  • Exascale Computing and Experimental Sensor Data

    1. Exascale Computing and Experimental Sensor Data
       Overview given at Brookhaven National Laboratory, April 18 2014
       Joel Saltz, Stony Brook University, joel.saltz@stonybrook.edu
    2. Integrate Information from Sensors, Images, Cameras
       • Multi-dimensional spatial-temporal datasets
         – Radiology and Microscopy Image Analyses
         – Oil Reservoir Simulation/Carbon Sequestration/Groundwater Pollution Remediation
         – Biomass monitoring and disaster surveillance using multiple types of satellite imagery
         – Weather prediction using satellite and ground sensor data
         – Analysis of Results from Large Scale Simulations
         – Square Kilometer Array
         – Google Self-Driving Car
       • Correlative and cooperative analysis of data from multiple sensor modalities and sources
       • Equivalent from the standpoint of data access patterns: need to develop a new generation of data skeletons/mini-apps/data dwarfs
    3. Spatio-temporal Sensor Integration, Analysis, Classification
       • Multi-scale material/tissue structural, molecular, functional characterization. Design of materials with specific structural, energy storage properties; brain, regenerative medicine, cancer
       • Integrative multi-scale analyses of the earth, oceans, atmosphere, cities, vegetation, etc.: cameras and sensors on satellites, aircraft, drones, land vehicles, stationary cameras
       • Digital astronomy
       • Hydrocarbon exploration, exploitation, pollution remediation
       • Aerospace: wind tunnels, acquisition of data during flight
       • Solid printing integrative data analyses
       • Autonomous vehicles, e.g. self-driving cars
       • Data generated by numerical simulation codes: PDEs, particle methods
       • Fit model with data
    4. Typical Computational/Analysis Tasks: Spatio-temporal Sensor Integration, Analysis, Classification
       • Data Cleaning and Low Level Transformations
       • Data Subsetting, Filtering, Subsampling
       • Spatio-temporal Mapping and Registration
       • Object Segmentation
       • Feature Extraction
       • Object/Region/Feature Classification
       • Spatio-temporal Aggregation
       • Diffeomorphism type mapping methods (e.g. optimal mass transport)
       • Particle filtering/prediction
       • Change Detection, Comparison, and Quantification
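Three of the tasks listed on this slide (object segmentation, feature extraction, and object classification) can be illustrated with a very small sketch. This is not the pipeline used in the talk: the threshold-plus-connected-components segmentation, the area/centroid features, the size-based classes, and all function names are assumptions chosen only to show how the stages chain together.

```python
from collections import deque


def segment(image, threshold):
    """Label 4-connected regions of pixels above `threshold`; 0 means background."""
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] > threshold and labels[r][c] == 0:
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:  # breadth-first flood fill of one object
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] > threshold
                                and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


def extract_features(labels, count):
    """Compute area and centroid (row, col) for every labelled object."""
    sums = {k: [0, 0, 0] for k in range(1, count + 1)}
    for r, row in enumerate(labels):
        for c, k in enumerate(row):
            if k:
                sums[k][0] += 1
                sums[k][1] += r
                sums[k][2] += c
    return {k: {"area": n, "centroid": (sr / n, sc / n)}
            for k, (n, sr, sc) in sums.items()}


def classify(features, min_area):
    """Toy classifier (assumption): split objects by area only."""
    return {k: "large" if f["area"] >= min_area else "small"
            for k, f in features.items()}


image = [
    [0, 9, 9, 0, 0],
    [0, 9, 9, 0, 0],
    [0, 0, 0, 0, 8],
    [0, 0, 0, 0, 0],
]
labels, count = segment(image, threshold=5)
features = extract_features(labels, count)
classes = classify(features, min_area=3)
print(features)
print(classes)
```

Each stage consumes only the output of the previous one, which is what makes this kind of pipeline easy to split into finer-grained tasks.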
    5. Coupled data acquisition, data analysis, modeling, prediction and correction (data assimilation, particle filtering, etc.)
       • Detect and track changes in data during production
       • Invert data for reservoir properties
       • Detect and track reservoir changes
       • Assimilate data & reservoir properties into the evolving reservoir model
       • Use simulation and optimization to guide future production
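The assimilation loop on this slide names particle filtering as one technique. The following is a minimal bootstrap particle filter for a single scalar property, offered only as a sketch: the random-walk state model, the Gaussian observation noise, the parameter values, and the function name `particle_filter` are all assumptions and not taken from the talk.

```python
import math
import random


def particle_filter(observations, n_particles=1000, obs_sigma=0.5,
                    jitter=0.1, prior=(0.0, 10.0), seed=0):
    """Bootstrap particle filter for a slowly varying scalar state."""
    rng = random.Random(seed)
    # Initial ensemble drawn from a uniform prior over the property.
    particles = [rng.uniform(*prior) for _ in range(n_particles)]
    estimates = []
    for y in observations:
        # Predict: random-walk model (assumption) perturbs each particle.
        particles = [p + rng.gauss(0.0, jitter) for p in particles]
        # Update: weight each particle by the Gaussian likelihood of y.
        weights = [math.exp(-0.5 * ((y - p) / obs_sigma) ** 2)
                   for p in particles]
        # Resample: draw a new ensemble in proportion to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
        estimates.append(sum(particles) / n_particles)
    return estimates


# Synthetic example: noisy measurements of a constant property.
data_rng = random.Random(1)
true_value = 5.0
observations = [true_value + data_rng.gauss(0.0, 0.5) for _ in range(30)]
estimates = particle_filter(observations)
print(round(estimates[-1], 2))
```

In a coupled setting, the predict step would be replaced by a run of the simulation model, and the running estimate would feed back into decisions about future production.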
    6. Future State
       • 100K – 1M pathology slides/hospital/year
       • 2GB compressed per slide
       • 1-10 slides used for pathologist computer-aided diagnosis
       • 100-10K slides used in hospital quality control
       • Groups of 100K+ slides used for clinical research studies, combined with molecular and outcome data
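The figures on this slide imply a per-hospital storage volume that is easy to work out. The short calculation below uses only the slide's numbers; the choice of decimal units (1 GB = 10^9 bytes) is an assumption.

```python
# Decimal storage units (assumption: 1 GB = 10**9 bytes).
GB = 10**9
TB = 10**12
PB = 10**15

slide_bytes = 2 * GB                  # 2 GB compressed per slide
low = 100_000 * slide_bytes           # 100K slides per hospital per year
high = 1_000_000 * slide_bytes        # 1M slides per hospital per year

print(f"low estimate:  {low / TB:.0f} TB per hospital per year")
print(f"high estimate: {high / PB:.0f} PB per hospital per year")
```

By the same arithmetic, a single research cohort of 100K slides is on the order of 200 TB of compressed image data before any derived analysis results are stored.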
    7. Brain Tumor Pipeline Scaling on GT/ORNL NSF Keeneland (100 Nodes)
    8. Runtime Support Objectives
       • Coordinated mapping of data and computation to complex memory hierarchies
       • Flexible hierarchical work assignment capable of handling data-dependent computational patterns, fluctuations in computational speed associated with power management, and faults
       • Linked to a comprehensible programming model: the model targets an abstract application class rather than an application domain (in the sensor, image, camera case: Region Templates)
       • Software stack including coordinated compiler/runtime support/autotuning frameworks
    9. HPC Segmentation and Feature Extraction Pipeline
       Tony Pan, George Teodoro, Tahsin Kurc and Scott Klasky
    10. Region Templates
       • Provides a generic container template for common data structures, such as points, arrays, regions, and object sets, within a spatial and temporal bounding box
       • A data region object is a storage materialization of data types and stores the data elements in the region contained by a region template instance; a region template instance may have multiple data regions
       • Allows for different data I/O, storage, and management strategies and implementations, while providing a homogeneous, unified interface to the application developer
       • Application operations interact with data regions and region templates to store and retrieve data elements, rather than explicitly handling the management, staging, and distribution of the data elements
       • Current implementations on nodes with multi-core CPUs and GPUs, distributed memory storage, and high bandwidth disk I/O
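The container idea described on this slide can be sketched in a few lines. This is not the Region Templates API: every class and method name below (`BoundingBox`, `Storage`, `InMemoryStorage`, `DataRegion`, `RegionTemplate`, `add_region`, and so on) is hypothetical, and the in-memory backend merely stands in for the distributed-memory or disk-based implementations the slide mentions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Protocol


@dataclass(frozen=True)
class BoundingBox:
    """Spatial (x, y) and temporal (t) extent, inclusive on both ends."""
    x0: int
    y0: int
    x1: int
    y1: int
    t0: int = 0
    t1: int = 0

    def contains(self, other: "BoundingBox") -> bool:
        return (self.x0 <= other.x0 and other.x1 <= self.x1
                and self.y0 <= other.y0 and other.y1 <= self.y1
                and self.t0 <= other.t0 and other.t1 <= self.t1)


class Storage(Protocol):
    """Interface every storage backend must offer (hypothetical)."""
    def write(self, key: str, data: Any) -> None: ...
    def read(self, key: str) -> Any: ...


class InMemoryStorage:
    """Simplest possible backend; a disk or distributed one would match Storage too."""
    def __init__(self) -> None:
        self._items: Dict[str, Any] = {}

    def write(self, key: str, data: Any) -> None:
        self._items[key] = data

    def read(self, key: str) -> Any:
        return self._items[key]


@dataclass
class DataRegion:
    """One named data structure materialized through a storage backend."""
    name: str
    bbox: BoundingBox
    storage: Storage

    def put(self, data: Any) -> None:
        self.storage.write(self.name, data)

    def get(self) -> Any:
        return self.storage.read(self.name)


@dataclass
class RegionTemplate:
    """Container of data regions that share one spatio-temporal bounding box."""
    name: str
    bbox: BoundingBox
    regions: Dict[str, DataRegion] = field(default_factory=dict)

    def add_region(self, region: DataRegion) -> None:
        if not self.bbox.contains(region.bbox):
            raise ValueError("data region lies outside the template bounding box")
        self.regions[region.name] = region

    def get_region(self, name: str) -> DataRegion:
        return self.regions[name]


tile = RegionTemplate("tile_0_0", BoundingBox(0, 0, 1023, 1023))
backend = InMemoryStorage()
tile.add_region(DataRegion("rgb", BoundingBox(0, 0, 1023, 1023), backend))
tile.add_region(DataRegion("mask", BoundingBox(0, 0, 1023, 1023), backend))
tile.get_region("mask").put([[0, 1], [1, 0]])
print(tile.get_region("mask").get())
```

Application code only calls `put` and `get`; swapping `InMemoryStorage` for another backend would not change it, which is the point of the unified interface.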
    11. Region Template: Preliminary Experimental Evaluation
       • Experimentally evaluated using pathology image analysis on the Keeneland system
       • The application consists of a pipeline with Segmentation and Feature Computation stages; each stage is internally divided into finer-grained tasks for better scheduling on heterogeneous CPU-GPU machines
    12. Large Scale Data Management
       • Represented by a complex data model capturing multi-faceted information including markups, annotations, algorithm provenance, specimen, etc.
       • Support for complex relationships and spatial query: multi-level granularities, relationships between markups and annotations, spatial and nested relationships
       • Highly optimized spatial query and analyses
       • Implemented in a variety of ways including optimized CPU/GPU, Hadoop/HDFS and IBM DB2
       • Supported by two NLM R01 grants (Saltz/Foran)
    13. Spatial Centric – Sensor Data Feature “GIS” (Fusheng Wang)
       • Point query: human marked point inside a nucleus
       • Window query: return markups contained in a rectangle
       • Spatial join query: algorithm validation/comparison
       • Containment query: nuclear feature aggregation in tumor regions
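The point, window, and containment queries named on this slide can be sketched in plain Python over a handful of polygons. This is only an illustration: the talk describes database-backed implementations, whereas the linear scans, the vertex-average centroid, the sample data, and all function names here are assumptions.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: count edge crossings of a ray going right from `point`."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def centroid(polygon):
    """Vertex average; adequate for the small convex shapes used here."""
    xs, ys = zip(*polygon)
    return sum(xs) / len(xs), sum(ys) / len(ys)


def point_query(markups, point):
    """Ids of markups whose polygon contains the marked point."""
    return [k for k, m in markups.items() if point_in_polygon(point, m["polygon"])]


def window_query(markups, window):
    """Ids of markups entirely contained in the rectangle (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = window
    return [k for k, m in markups.items()
            if all(x0 <= x <= x1 and y0 <= y <= y1 for x, y in m["polygon"])]


def containment_query(markups, region, feature):
    """Mean of `feature` over markups whose centroid falls inside `region`."""
    values = [m[feature] for m in markups.values()
              if point_in_polygon(centroid(m["polygon"]), region)]
    return (sum(values) / len(values), len(values)) if values else (None, 0)


nuclei = {
    "n1": {"polygon": [(1, 1), (3, 1), (3, 3), (1, 3)], "area": 4.0},
    "n2": {"polygon": [(6, 6), (8, 6), (8, 8), (6, 8)], "area": 4.0},
    "n3": {"polygon": [(20, 20), (23, 20), (23, 23), (20, 23)], "area": 9.0},
}
tumor = [(0, 0), (10, 0), (10, 10), (0, 10)]

print(point_query(nuclei, (2, 2)))
print(window_query(nuclei, (0, 0, 9, 9)))
print(containment_query(nuclei, tumor, "area"))
```

A spatial database answers the same questions through an index rather than a scan over every markup, which is what makes these queries feasible at the cellular scale.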
    14. Algorithm Validation: Intersection between Two Result Sets (Spatial Join)
       PAIS: Example Queries
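Comparing two segmentation result sets by spatial join can be sketched as follows. To keep the example short, each markup is reduced to an axis-aligned rectangle `(x0, y0, x1, y1)` and the overlap is scored with the Jaccard ratio (intersection area over union area); both the simplification and the choice of score are assumptions rather than details from the talk, and the nested loop stands in for an indexed join.

```python
def rect_area(r):
    """Area of an axis-aligned rectangle (x0, y0, x1, y1)."""
    return (r[2] - r[0]) * (r[3] - r[1])


def intersection_area(a, b):
    """Overlap area of two rectangles, 0.0 if they do not overlap."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return width * height if width > 0 and height > 0 else 0.0


def spatial_join(set_a, set_b):
    """Return (id_a, id_b, jaccard) for every intersecting pair of markups."""
    pairs = []
    for id_a, rect_a in set_a.items():
        for id_b, rect_b in set_b.items():
            inter = intersection_area(rect_a, rect_b)
            if inter > 0:
                union = rect_area(rect_a) + rect_area(rect_b) - inter
                pairs.append((id_a, id_b, inter / union))
    return pairs


# Hypothetical outputs of two segmentation algorithms on the same tile.
algorithm_a = {"a1": (0, 0, 4, 4), "a2": (10, 10, 12, 12)}
algorithm_b = {"b1": (2, 0, 6, 4), "b2": (20, 20, 22, 22)}

pairs = spatial_join(algorithm_a, algorithm_b)
for id_a, id_b, score in pairs:
    print(id_a, id_b, round(score, 3))
```

Markups that appear in no pair (`a2` and `b2` here) are objects found by only one algorithm, which is often as informative for validation as the overlap scores themselves.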
    15. PAIS (Pathology Analytical Imaging Standards)
       • PAIS Logical Model
         – 62 UML classes
         – markups, annotations, imageReferences, provenance
       • PAIS Data Representation
         – XML (compressed) or HDF5
       • PAIS Databases
         – loading, managing, querying and sharing data
         – Native XML DBMS or RDBMS + SDBMS
       [Figure: UML class diagram of the PAIS domain model. Classes shown include PAIS, Annotation, GeometricShape, Calculation, Observation, Specimen, ImageReference, Provenance, User, Equipment, Group, AnatomicEntity, Subject, Field, Project, MicroscopyImageReference, DICOMImageReference, TMAImageReference, WholeSlideImageReference, Markup, Inference, Region, Patient, Surface, Collection, AnnotationReference.]
    16. Spatial Query, Change Detection, Comparison, and Quantification (VLDB 2012, 2013)
    17. Soft real time and streaming Sensor Data Analysis, Event Detection, Decision Support
       • Integrated analyses of patient data: physiological streams, labs, medications, notes, Radiology, Pathology images, mobile health data feeds
       • High frequency trading, arbitrage
       • Real time monitoring of earthquakes, control of oilfields
       • Control of industrial plants, aircraft engines
       • Fusion: data capture, control, prediction of disruptions
       • Internet of things
       • Twitter feeds
       • Intensive care alarms
    18. Typical Computational Analysis Tasks: Streaming Sensor Data Analysis, Event Detection, Decision Support
       • Prediction algorithms: Kalman, particle filtering
       • Machine learning algorithms on aggregated data to develop model, use of model on streaming data for decision support
       • Searching for rare events
       • Statistical algorithms to distinguish signal from noise
       • On the fly integration of multiple complementary data streams
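Of the prediction algorithms named on this slide, the Kalman filter is the simplest to show in full. Below is a minimal scalar version for a streaming measurement; the random-walk state model, the parameter names `q` and `r` (process and measurement noise variances), and the sample values are assumptions made for illustration.

```python
def kalman_1d(measurements, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter; returns (estimate, variance) after each measurement."""
    x, p = x0, p0
    history = []
    for z in measurements:
        # Predict: random-walk model keeps the estimate, inflates its variance.
        p = p + q
        # Update: blend prediction and measurement according to the gain.
        gain = p / (p + r)
        x = x + gain * (z - x)
        p = (1.0 - gain) * p
        history.append((x, p))
    return history


stream = [1.0, 1.0, 1.0]
history = kalman_1d(stream, q=0.0, r=1.0)
for estimate, variance in history:
    print(round(estimate, 4), round(variance, 4))
```

The term `z - x` computed in the update step is the innovation; thresholding it is one simple way to flag rare events or distinguish signal from noise in a stream.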
