1. Exascale Computing and
Experimental Sensor Data
Overview given at Brookhaven National Laboratory
April 18, 2014
Joel Saltz
Stony Brook University
joel.saltz@stonybrook.edu
2. Integrate Information from
Sensors, Images, Cameras
• Multi-dimensional spatial-temporal datasets
– Radiology and Microscopy Image Analyses
– Oil Reservoir Simulation/Carbon Sequestration/Groundwater Pollution
Remediation
– Biomass monitoring and disaster surveillance using multiple types of
satellite imagery
– Weather prediction using satellite and ground sensor data
– Analysis of Results from Large Scale Simulations
– Square Kilometer Array
– Google Self Driving Car
• Correlative and cooperative analysis of data from multiple sensor
modalities and sources
• Equivalent from standpoint of data access patterns – need to develop
new generation of data skeletons/mini-apps/data dwarfs
3. Spatio-temporal Sensor Integration,
Analysis, Classification
• Multi-scale material/tissue structural, molecular, functional
characterization. Design of materials with specific structural, energy
storage properties, brain, regenerative medicine, cancer
• Integrative multi-scale analyses of the earth, oceans, atmosphere, cities,
vegetation, etc. – cameras and sensors on satellites, aircraft, drones, land
vehicles, stationary cameras
• Digital astronomy
• Hydrocarbon exploration, exploitation, pollution remediation
• Aerospace – wind tunnels, acquisition of data during flight
• Solid printing integrative data analyses
• Autonomous vehicles, e.g. self-driving cars
• Data generated by numerical simulation codes – PDEs, particle methods
• Fit model with data
4. Typical Computational/Analysis Tasks
Spatio-temporal Sensor Integration, Analysis, Classification
• Data Cleaning and Low Level Transformations
• Data Subsetting, Filtering, Subsampling
• Spatio-temporal Mapping and Registration
• Object Segmentation
• Feature Extraction
• Object/Region/Feature Classification
• Spatio-temporal Aggregation
• Diffeomorphism type mapping methods (e.g. optimal
mass transport)
• Particle filtering/prediction
• Change Detection, Comparison, and Quantification
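A minimal sketch of how three of the tasks above (subsampling, object segmentation, feature extraction) might chain together on a tiny 2D image. It assumes a fixed intensity threshold and 4-connected components. The function names `subsample`, `segment`, and `extract_features` are hypothetical and are not taken from the pipeline described in this talk.

```python
from collections import deque


def subsample(image, step):
    """Keep every `step`-th pixel along both axes."""
    return [row[::step] for row in image[::step]]


def segment(image, threshold):
    """Label 4-connected components of pixels with value >= threshold.

    Returns (labels, count), where labels[r][c] is 0 for background.
    """
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] < threshold or labels[r][c]:
                continue
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of one object
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and image[ny][nx] >= threshold
                            and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


def extract_features(labels, count):
    """Compute area and centroid (row, col) for each labeled object."""
    sums = {k: [0, 0, 0] for k in range(1, count + 1)}
    for r, row in enumerate(labels):
        for c, lab in enumerate(row):
            if lab:
                sums[lab][0] += 1
                sums[lab][1] += r
                sums[lab][2] += c
    return {k: {"area": n, "centroid": (sr / n, sc / n)}
            for k, (n, sr, sc) in sums.items()}


image = [
    [9, 9, 0, 0],
    [9, 9, 0, 0],
    [0, 0, 0, 7],
    [0, 0, 0, 7],
]
labels, count = segment(image, threshold=5)
features = extract_features(labels, count)
```

In a production setting each stage would operate on image tiles and run in parallel; the sketch only shows the data flow between stages.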
5. Detect and track changes in data during production
Invert data for reservoir properties
Detect and track reservoir changes
Assimilate data & reservoir properties into
the evolving reservoir model
Use simulation and optimization to guide future production
Coupled data acquisition, data analysis, modeling, prediction and
correction – data assimilation, particle filtering etc.
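The assimilation loop named above can be illustrated with a bootstrap particle filter on a single scalar property. This is only a sketch under stated assumptions: the state follows a random walk, observations carry Gaussian noise, and the names `particle_filter_step` and `run_filter` and all parameter values are hypothetical. A real reservoir model would replace the random-walk prediction with a simulator.

```python
import math
import random


def particle_filter_step(particles, observation, process_std, obs_std, rng):
    """One predict / weight / resample cycle of a bootstrap particle filter."""
    # Predict: propagate each particle through the (assumed) random-walk model.
    predicted = [p + rng.gauss(0.0, process_std) for p in particles]
    # Weight: Gaussian likelihood of the observation given each particle.
    # The tiny constant keeps the weight total positive if all terms underflow.
    weights = [math.exp(-0.5 * ((observation - p) / obs_std) ** 2) + 1e-300
               for p in predicted]
    # Resample: draw particles in proportion to their weights.
    return rng.choices(predicted, weights=weights, k=len(predicted))


def run_filter(observations, n_particles=500, process_std=0.1,
               obs_std=0.5, seed=0):
    """Assimilate observations one at a time; return the running estimates."""
    rng = random.Random(seed)
    particles = [rng.uniform(0.0, 10.0) for _ in range(n_particles)]
    estimates = []
    for z in observations:
        particles = particle_filter_step(particles, z, process_std,
                                         obs_std, rng)
        estimates.append(sum(particles) / len(particles))
    return estimates


# Synthetic data: a true value of 5.0 observed with noise.
data_rng = random.Random(1)
observations = [5.0 + data_rng.gauss(0.0, 0.5) for _ in range(50)]
estimates = run_filter(observations)
```

Each new observation corrects the particle cloud, which is the "prediction and correction" cycle in miniature.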
7. Future State
• 100K – 1M pathology slides/hospital/year
• 2GB compressed per slide
• 1-10 slides used for pathologist computer-aided
diagnosis
• 100-10K slides used in hospital quality control
• Groups of 100K+ slides used for clinical
research studies -- Combined with molecular,
outcome data
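The storage implied by these figures follows from simple multiplication. The sketch below assumes decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes); the helper name `yearly_storage_tb` is hypothetical.

```python
GB = 10**9
TB = 10**12


def yearly_storage_tb(slides_per_year, bytes_per_slide=2 * GB):
    """Compressed storage in terabytes for a given yearly slide count."""
    return slides_per_year * bytes_per_slide / TB


low = yearly_storage_tb(100_000)     # 100K slides/year -> 200 TB
high = yearly_storage_tb(1_000_000)  # 1M slides/year  -> 2000 TB (2 PB)
```

Under the same assumption, a research cohort of 100K slides also comes to about 200 TB before any derived analysis results are stored.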
9. Runtime Support Objectives
• Coordinated mapping of data and computation to
complex memory hierarchies
• Hierarchical work assignment with flexibility capable
of dealing with data dependent computational
patterns, fluctuations in computational speed
associated with power management, faults
• Linked to a comprehensible programming model,
targeted at an abstract application class rather than
a specific application domain (in the sensor, image,
camera case: Region Templates)
• Software stack including coordinated
compiler/runtime support/autotuning frameworks
10. HPC Segmentation and Feature Extraction
Pipeline
Tony Pan, George Teodoro,
Tahsin Kurc and Scott Klasky
11. Region Templates
• Provides a generic container template for common data structures, such
as points, arrays, regions, and object sets, within a spatial and temporal
bounding box
• Data region object is a storage materialization of data types and stores
the data elements in the region contained by a region template instance;
region template instance may have multiple data regions.
• Allows for different data I/O, storage, and management strategies and
implementations, while providing a homogeneous, unified interface to the
application developer.
• Application operations interact with data regions and region templates to
store and retrieve data elements, rather than explicitly handling the
management, staging, and distribution of the data elements.
• Current implementations on nodes with multi-core CPUs and GPUs,
distributed memory storage, and high bandwidth disk I/O.
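A rough illustration of the container idea described above, written as plain Python dataclasses. This is not the Region Templates API: the class names `BoundingBox`, `DataRegion`, and `RegionTemplate`, their methods, and the `storage` label are all hypothetical. The sketch only shows a template holding several named data regions inside one spatio-temporal bounding box, with the application reading through a uniform interface.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Point = Tuple[int, int, int]  # (x, y, t)


@dataclass(frozen=True)
class BoundingBox:
    """Inclusive spatial (x, y) and temporal (t) extent."""
    x_min: int
    y_min: int
    t_min: int
    x_max: int
    y_max: int
    t_max: int

    def contains(self, point: Point) -> bool:
        x, y, t = point
        return (self.x_min <= x <= self.x_max
                and self.y_min <= y <= self.y_max
                and self.t_min <= t <= self.t_max)


@dataclass
class DataRegion:
    """Storage materialization of one data type within a bounding box."""
    name: str
    bbox: BoundingBox
    storage: str = "memory"  # hypothetical label, e.g. "memory" or "disk"
    elements: Dict[Point, float] = field(default_factory=dict)

    def write(self, point: Point, value: float) -> None:
        if not self.bbox.contains(point):
            raise ValueError(f"{point} lies outside region {self.name}")
        self.elements[point] = value

    def read(self, window: BoundingBox) -> Dict[Point, float]:
        return {p: v for p, v in self.elements.items() if window.contains(p)}


@dataclass
class RegionTemplate:
    """Container for multiple data regions sharing one bounding box."""
    name: str
    bbox: BoundingBox
    regions: Dict[str, DataRegion] = field(default_factory=dict)

    def add_region(self, region: DataRegion) -> None:
        self.regions[region.name] = region

    def read(self, region_name: str, window: BoundingBox) -> Dict[Point, float]:
        # The caller does not need to know where the region is stored.
        return self.regions[region_name].read(window)


box = BoundingBox(0, 0, 0, 99, 99, 0)
tile = RegionTemplate("tile_0", box)
tile.add_region(DataRegion("mask", box, storage="disk"))
tile.regions["mask"].write((10, 20, 0), 1.0)
tile.regions["mask"].write((50, 60, 0), 2.0)
subset = tile.read("mask", BoundingBox(0, 0, 0, 30, 30, 0))
```

Swapping the `storage` strategy behind `DataRegion` without changing `RegionTemplate.read` is the kind of separation the bullets above describe.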
12. Region Template: Preliminary
Experimental Evaluation
• Experimentally evaluated using pathology image analysis on the
Keeneland system
• This application consists of a pipeline with Segmentation and Feature
Computation Stages, and each of these stages is internally divided into
finer-grained tasks for better scheduling on heterogeneous CPU-GPU
equipped machines.
13. Large Scale Data Management
Represented by a complex data model capturing
multi-faceted information including markups,
annotations, algorithm provenance, specimen, etc.
Support for complex relationships and spatial
query: multi-level granularities, relationships
between markups and annotations, spatial and
nested relationships
Highly optimized spatial query and analyses
Implemented in a variety of ways including
optimized CPU/GPU, Hadoop/HDFS and IBM DB2
Supported by two NLM R01 grants – Saltz/Foran
14. Spatial Centric – Sensor Data Feature “GIS”
Point query: human marked point
inside a nucleus
Window query: return markups
contained in a rectangle
Spatial join query: algorithm
validation/comparison
Containment query: nuclear feature
aggregation in tumor regions
Fusheng Wang
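The four query types on this slide can be sketched with brute-force geometry. As a simplifying assumption, every markup is reduced to an axis-aligned rectangle; real markups are polygons and real systems use spatial indexes. The `Rect` type and the function names are hypothetical, and the spatial join reports a Jaccard overlap as one possible comparison measure.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Rect(NamedTuple):
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        return (self.x1 - self.x0) * (self.y1 - self.y0)

    def contains_point(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1

    def contains(self, other: "Rect") -> bool:
        return (self.x0 <= other.x0 and self.y0 <= other.y0
                and other.x1 <= self.x1 and other.y1 <= self.y1)

    def overlap_area(self, other: "Rect") -> float:
        w = min(self.x1, other.x1) - max(self.x0, other.x0)
        h = min(self.y1, other.y1) - max(self.y0, other.y0)
        return w * h if w > 0 and h > 0 else 0.0


def point_query(markups: Dict[str, Rect], x: float, y: float) -> List[str]:
    """Ids of markups that contain the point (x, y)."""
    return sorted(k for k, r in markups.items() if r.contains_point(x, y))


def window_query(markups: Dict[str, Rect], window: Rect) -> List[str]:
    """Ids of markups fully contained in the window."""
    return sorted(k for k, r in markups.items() if window.contains(r))


def spatial_join(set_a: Dict[str, Rect],
                 set_b: Dict[str, Rect]) -> List[Tuple[str, str, float]]:
    """Overlapping pairs across two result sets with their Jaccard overlap."""
    pairs = []
    for ka, ra in sorted(set_a.items()):
        for kb, rb in sorted(set_b.items()):
            inter = ra.overlap_area(rb)
            if inter > 0:
                union = ra.area() + rb.area() - inter
                pairs.append((ka, kb, inter / union))
    return pairs


def containment_aggregate(nuclei: Dict[str, Rect],
                          feature: Dict[str, float],
                          regions: Dict[str, Rect]) -> Dict[str, Optional[float]]:
    """Mean nuclear feature over the nuclei contained in each region."""
    result = {}
    for name, region in regions.items():
        values = [feature[k] for k, r in nuclei.items() if region.contains(r)]
        result[name] = sum(values) / len(values) if values else None
    return result


human = {"h1": Rect(0, 0, 2, 2), "h2": Rect(10, 10, 12, 12)}
algorithm = {"a1": Rect(1, 1, 3, 3)}
pairs = spatial_join(human, algorithm)
means = containment_aggregate(human, {"h1": 4.0, "h2": 8.0},
                              {"tumor": Rect(-1, -1, 20, 20)})
```

The nested loops cost O(n·m) per join, which is why the preceding slide stresses optimized spatial query implementations.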
18. Soft real time and streaming Sensor
Data Analysis, Event Detection,
Decision Support
• Integrated analyses of patient data – physiological
streams, labs, medications, notes, Radiology, Pathology
images, mobile health data feeds
• High frequency trading, arbitrage
• Real-time monitoring of earthquakes, control of oilfields
• Control of industrial plants, aircraft engines
• Fusion – data capture, control, prediction of
disruptions
• Internet of things
• Twitter feeds
• Intensive care alarms
19. Typical Computational Analysis Tasks
Streaming Sensor Data Analysis, Event Detection, Decision
Support
• Prediction algorithms – Kalman, particle filtering
• Machine learning algorithms on aggregated data
to develop model, use of model on streaming
data for decision support
• Searching for rare events
• Statistical algorithms to distinguish signal from
noise
• On the fly integration of multiple complementary
data streams
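As an illustration of prediction plus rare-event detection on a stream, here is a scalar Kalman filter that raises an alarm when a measurement falls far outside its predicted range. It is a sketch under stated assumptions: the signal is modeled as a random walk, noise variances `q` and `r` are known, and a flagged measurement is not used to update the estimate (a design choice made here, not something the slides specify). The names `kalman_step` and `monitor` are hypothetical.

```python
import math


def kalman_step(x, p, z, q, r, threshold=3.0):
    """One predict/update cycle for a scalar random-walk model.

    x, p: current estimate and its variance
    z:    new measurement
    q, r: process and measurement noise variances
    Returns (new_x, new_p, alarm).
    """
    p_pred = p + q                         # predict: variance grows by q
    innovation = z - x                     # measurement minus prediction
    score = abs(innovation) / math.sqrt(p_pred + r)
    if score > threshold:
        # Rare event: flag it and leave the estimate unchanged.
        return x, p_pred, True
    gain = p_pred / (p_pred + r)           # Kalman gain
    return x + gain * innovation, (1 - gain) * p_pred, False


def monitor(stream, x0, p0, q, r, threshold=3.0):
    """Filter a stream; return the estimates and the indices of alarms."""
    x, p = x0, p0
    estimates, alarms = [], []
    for i, z in enumerate(stream):
        x, p, alarm = kalman_step(x, p, z, q, r, threshold)
        estimates.append(x)
        if alarm:
            alarms.append(i)
    return estimates, alarms


readings = [10.0, 10.1, 9.9, 10.0, 10.05, 25.0, 10.0]
estimates, alarms = monitor(readings, x0=10.0, p0=1.0, q=0.01, r=0.04)
```

The threshold trades missed events against false alarms, which is the signal-versus-noise question raised in the list above.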
Editor's Notes
1) Metadata about images
2) Metadata about image targets, how images are derived (patient, specimen, anatomicEntity, etc.)
3) Metadata about analyses (the purpose of the analysis, who performed the analysis, etc)
4) Image markups -- a markup delineates a spatial region (e.g., as points, lines, polygons, multi-polygons) in images
5) Annotation: Image features: a type of annotation calculated or derived from the markups
6) Annotation: observation -- an annotation associates semantic meaning to markup entities through coded or free text terms that provide explanatory or descriptive information
7) provenance information, i.e., the derivation history of a markup or annotation, including algorithm information, parameters, and inputs
• Native XML database based approach
– Small sized PAIS documents, e.g., organ, tissue, or region level annotations
– No mapping needed, support standard XML queries
• Relational and spatial database approach
– For large scale PAIS documents, e.g., analysis results at cellular or subcellular level
– Data mapped into relational tables and spatial objects
– Highly efficient on storage and queries
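To make the relational mapping concrete, the sketch below stores markup bounding boxes in a table and answers a window query in SQL. It uses an in-memory SQLite database purely as a stand-in; the notes do not describe this schema, and the table name `markup`, its columns, and the helper functions are hypothetical. A spatial database would store true geometries and use a spatial index instead of plain column comparisons.

```python
import sqlite3


def build_db(markups):
    """Load (id, image_id, algorithm, x_min, y_min, x_max, y_max) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE markup ("
        " id INTEGER PRIMARY KEY, image_id TEXT, algorithm TEXT,"
        " x_min REAL, y_min REAL, x_max REAL, y_max REAL)"
    )
    # An ordinary index stands in for a real spatial index.
    conn.execute("CREATE INDEX markup_bbox ON markup (image_id, x_min, x_max)")
    conn.executemany("INSERT INTO markup VALUES (?, ?, ?, ?, ?, ?, ?)", markups)
    conn.commit()
    return conn


def window_query(conn, image_id, x0, y0, x1, y1):
    """Ids of markups on one image fully contained in the window."""
    rows = conn.execute(
        "SELECT id FROM markup"
        " WHERE image_id = ? AND x_min >= ? AND y_min >= ?"
        " AND x_max <= ? AND y_max <= ?"
        " ORDER BY id",
        (image_id, x0, y0, x1, y1),
    ).fetchall()
    return [row[0] for row in rows]


conn = build_db([
    (1, "img1", "algoA", 0, 0, 2, 2),
    (2, "img1", "algoA", 10, 10, 12, 12),
    (3, "img2", "algoA", 1, 1, 2, 2),
])
hits = window_query(conn, "img1", 0, 0, 5, 5)
```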