SlideShare a Scribd company logo
Scientific Applications and
Heterogeneous Architectures –
Data Analytics and the Intersection of HPC
and Edge Computing
Michela Taufer
Acknowledgements
T. Johnston
T. Estrada M. Cuendet E. Deelman
R. da Silva
Sponsors:
H. Weinstein
T. Do
S. Thomas
B. Mulligan
A. RazaviH. Carrillo
R. Vargas
D. Rorabaugh
R. LLamas M. Guevara
Trends in Next-Generation Systems: IO Gap and Ensembles
5x
12x
64x
13x
Rising Importance of Ensembles
Source:
https://wci.llnl.gov/simulation/computer-codes/uncertainty-quantification
ExascalePetascaleTerascale
Sierra (2018)
Sequoia (2013)
Purple (2005)
Number of Jobs
SystemScale(FLOPS)
Few Many
Widening IO Gap
Source: Lucy Nowell (DOE)
3
4
Pegasus LIGO PyCBC Workflow
Laser Interferometer Gravitational-Wave Observatory (LIGO)
Trends in Workflows: Compute + Analytics + Data
Source: Ewa Deelman, ISI - USC
Extending HPC to Connect to the “Edge”
Image Source: https://iiot-world.com/digital-disruption/the-right-representation-of-digital-twins-for-data-analytics/
Simulations Data Analytics Real System with Sensors
5
Two Use Cases
• Extending HPC to integrate data analytics
§ Next generation MD workflows
§ Molecular structures
§ Data transformation – i.e., capturing information
§ Dataflow modeling – i.e., lost information
• Extending HPC to connect to the “Edge”
§ Next generation precision farming
§ Soil moisture data
§ Data prediction – i.e., from coarse- to fine-grained information
6
Two Use Cases
• Extending HPC to integrate data analytics
§ Next generation MD workflows
§ Molecular structures
§ Data transformation – i.e., capturing information
§ Dataflow modeling – i.e., lost information
• Extending HPC to connect to the “Edge”
§ Next generation precision farming
§ Soil moisture data
§ Data prediction – i.e., from coarse- to fine-grained information
7
Classical Molecular Dynamics Simulations
Source: Vincent Voelz Gorup - CC BY-SA 3.0, http://www.voelzlab.org/
• A MD simulation comprises of
hundreds of thousands of MD job
• Each job preforms hundreds of
thousands of MD steps
8
Classical Molecular Dynamics Simulations
àForces on single atoms
à Acceleration
à Velocity
à Position
• MD step computes forces on single
atoms (e.g., bond, dihedrals, nonbond)
• Forces are added to compute
acceleration
• Acceleration is used to update
velocities
• Velocities are used to update the atom
positions
• Every n steps (Stride)
à Store 3D snapshot or frame
9
Building a Closed-loop Workflow
MD code
(e.g., GROMACS)
Run n-Stride
simulation steps
Data Generation
10
Building a Closed-loop Workflow
MD code
(e.g., GROMACS)
Dataflow
Ingestor
Run n-Stride
simulation steps
Data Generation Data Storage
11
Plumed In-memory
Staging Area
DataSpaces
Burst Buffer
Parallel File
System
Building a Closed-loop Workflow
MD code
(e.g., GROMACS)
A4MD
analytics
Retriever
Dataflow
Ingestor
Run n-Stride
simulation steps
Data Generation Data AnalyticsData Storage
Dataflow A4MD
analytics
Analytics
representations
+ algorithms
ML-inferred
algorithms
Collective
variables
12
Plumed In-memory
Staging Area
DataSpaces
Burst Buffer
Parallel File
System
Building a Closed-loop Workflow
MD code
(e.g., GROMACS)
A4MD
analytics
Retriever
Dataflow
Ingestor
Run n-Stride
simulation steps
Data Generation Data AnalyticsData Storage
Dataflow A4MD
analytics
Analytics
representations
+ algorithms
ML-inferred
algorithms
Collective
variables
13
Data Feedback
Plumed In-memory
Staging Area
DataSpaces
Burst Buffer
Parallel File
System
Extending HPC to Integrate Data Analytics
Analysis
Local Storage
MEMORY
CN
Application
CN
CN
CN
CN
CN
CN
CN
CN
CN
CN
CN
CN
CN
CN
CN
Parallel File System
MEMORY
CN
Application
CN
CN
CN
CN
CN
CN
CN
CN
CN
CN
CN
CN
CN
CN
Analysis
CN
14
Data Data + metadata
Parallel File System
Augmenting HPC with In Situ and In Transit Analytics
AnalysisSimulation
Node 1 Node 2 Node 3
Network Interconnect
Core Core
Core Core
Core Core
Core Core
Simulation
Node
Analysis
SharedMemory
Example of tools:
• DataSpaces (Rutgers U.)
• DataStager (GeorgiaTech)
15
In Situ and In Transit Analytics for MD Simulations
• We want to capture what is going on in each frame without:
§ Disrupting the simulation (e.g., stealing CPU and memory on the node)
§ Moving all the frames to a central file system and analyzing them once the
simulation is over
§ Comparing each frame with past frames of the same job
§ Comparing each frame with frames of other jobs
Frames (or snapshots) of an MD trajectory with a stride of 5 steps:
Frame 55 Frame 60 Frame 65 Frame 70 Frame 80
Frame 75
16
Building a Closed-loop Workflow
MD code
(e.g., GROMACS)
A4MD
analytics
Retriever
Dataflow
Ingestor
Run n-Stride
simulation steps
Data Generation Data AnalyticsData Storage
Dataflow A4MD
analytics
Analytics
representations
+ algorithms
ML-inferred
algorithms
Collective
variables
17
Data Feedback
Plumed In-memory
Staging Area
DataSpaces
Burst Buffer
Parallel File
System
Building a Closed-loop Workflow
Data Generation Data AnalyticsData Storage
18
Data Feedback
A4MD
analytics
Retriever
Dataflow A4MD
analytics
Analytics
representations
+ algorithms
ML-inferred
algorithms
Collective
variables
Proteins with Similar Functions
Key principle: proteins with similar structures have similar functions
• Measure millions of protein variants expressed from yeast or bacteria
• Structure proteins to produce desired properties (protein engineering)
19 T. Estrada, J. Benson, H. Carrillo-Cabada, A. Razavi, M. Cuendet, H. Weinstein, E. Deelman, and M. Taufer.
Graphic Encoding of Proteins for Efficient High-Throughput Analysis. ICPP 2018.
Protein Representations
3D Cartesian representation Surface representationMulti-fold representation
20 T. Estrada, J. Benson, H. Carrillo-Cabada, A. Razavi, M. Cuendet, H. Weinstein, E. Deelman, and M. Taufer.
Graphic Encoding of Proteins for Efficient High-Throughput Analysis. ICPP 2018.
From Multi-fold Representation to Image Encoding
21 T. Estrada, J. Benson, H. Carrillo-Cabada, A. Razavi, M. Cuendet, H. Weinstein, E. Deelman, and M. Taufer.
Graphic Encoding of Proteins for Efficient High-Throughput Analysis. ICPP 2018.
From Multi-fold Representation to Image Encoding
22 T. Estrada, J. Benson, H. Carrillo-Cabada, A. Razavi, M. Cuendet, H. Weinstein, E. Deelman, and M. Taufer.
Graphic Encoding of Proteins for Efficient High-Throughput Analysis. ICPP 2018.
From Multi-fold Representation to Image Encoding
23 T. Estrada, J. Benson, H. Carrillo-Cabada, A. Razavi, M. Cuendet, H. Weinstein, E. Deelman, and M. Taufer.
Graphic Encoding of Proteins for Efficient High-Throughput Analysis. ICPP 2018.
High-Throughput Protein Analysis
• Eight biological processes from biological process taxonomy in RCSB-PDB
• 62,991 proteins from the PDB
Proteinsas3Dtens
24
T. Estrada, J. Benson, H. Carrillo-Cabada, A. Razavi, M. Cuendet, H. Weinstein, E. Deelman, and M. Taufer.
Graphic Encoding of Proteins for Efficient High-Throughput Analysis. ICPP 2018.
convolutional
neural network
Google’s Inception-v3,
Gem-Net Groundtrue
Predictions
Normalized Confusion Matrix - Accuracy 80.66%
Capturing Changes in Folding with Transfer Learning
25
T. Estrada,et al. A Graphic Encoding Method for Quantitative Classification of Protein Structure and
Representation of Conformational Changes. 2019
Protein: Opsin Frame 50 Frame 1500 Frame 1950
Two Use Cases
• Extending HPC to integrate data analytics
§ Next generation MD workflows
§ Molecular structures
§ Data transformation – i.e., capturing information
§ Dataflow modeling – i.e., lost information
• Extending HPC to connect to the “Edge”
§ Next generation precision farming
§ Soil moisture data
§ Data prediction – i.e., from coarse- to fine-grained information
26
Building a Closed-loop Workflow
MD code
(e.g., GROMACS)
A4MD
analytics
Retriever
Dataflow
Ingestor
Run n-Stride
simulation steps
Data Generation Data AnalyticsData Storage
Dataflow A4MD
analytics
Analytics
representations
+ algorithms
ML-inferred
algorithms
Collective
variables
27
Data Feedback
Plumed In-memory
Staging Area
DataSpaces
Burst Buffer
Parallel File
System
Modeling Lost Frames
Node
MD
Simulation
Data
Analytics
Dataspaces
Server
Main Memory
Dataspaces
Shared Memory Region
Single Node (In Situ)
28 S. Thomas et al. Characterization of In Situ and In Transit Analytics of Molecular Dynamics
Simulations for Next-generation Supercomputers. eScience, 2019.
S1, S2, S3: generate MD frame
W1, W2, W2: write to shared memory
R1, R2, R3: read from shared memory
A1, A3: analyze frame
Dataflow for Modeling Lost Frames
29
Generator of
MD frames
Ingestor
Dataflow
Plumed In-memory
Staging Area
DataSpaces Retriever
Data Generation Data AnalyticsData Storage
Dataflow
Analytics
representations
Distance matrices
λmax
eigenvalues
T. Johnston et al. In-Situ Data Analytics and Indexing of Protein Trajectories. Journal of
Computational Chemistry (JCC), 38(16):1419-1430, 2017.
Eigenvalues: Proxy for Structural Changes
Frames (or snapshots) of an MD trajectory with a stride of 5 steps:
Frame 55 Frame 60 Frame 65 Frame 70 Frame 80
Frame 75
30
T. Johnston et al. In-Situ Data Analytics and Indexing of Protein Trajectories. Journal of
Computational Chemistry (JCC), 38(16):1419-1430, 2017.
λmax 60 λmax65 λmax70 λmax75 λmax80λmax55
The distance between two max eigenvalues can serve as a proxy for
distance between the two associated conformations
Cα
1
Cα
Nα
31
Single frame at time t
Nα Cα atoms
S. Thomas et al. Characterization of In Situ and In Transit Analytics of Molecular Dynamics
Simulations for Next-generation Supercomputers. eScience, 2019.
Cα
Nα
Cα
Nα/2
Cα
Nα/2 - 1
Cα
1
Cα
1 Cα
Nα
0 dij
dji 0
Distance of two segments with
segment length:
Nα/2 x Cα atoms
(Cα
1 - Cα
Nα/2 -1 ) ( Cα
Nα/2 - Cα
Nα/2)
Single λmax
Cα
1
Cα
Nα
32
Single frame at time t
Nα Cα atoms
S. Thomas et al. Characterization of In Situ and In Transit Analytics of Molecular Dynamics
Simulations for Next-generation Supercomputers. eScience, 2019.
Cα
1
Cα
2
Cα
3
Cα
4
Cα
1. Cα
4
0 dij
dji 0
0 dij
dji 0
0 dij
dji 0
0 dij
dji 0
0 dij
dji 0
0 dij
dji 0
0 dij
dji 0
Cα
Nα
Cα
Nα/2
Cα
Nα/2 - 1
Cα
1
Cα
1 Cα
Nα
0 dij
dji 0
Distance of two segments with
segment length:
Nα/2 x Cα atoms
(Cα
1 - Cα
Nα/2 -1 ) ( Cα
Nα/2 - Cα
Nα/2)
Single λmax
Cα
1
Cα
Nα
33
Distances of Nα/2 segments with
segment length: 2 x Cα atoms
(Cα
1 - Cα
2 ) ( Cα
3 - Cα
4)
(Cα
5 - Cα
6 ) ( Cα
7 – Cα
8)
….
(Cα
Nα/2-3 - Cα
Nα/2-2 ) ( Cα
Nα/2-1 - Cα
Nα/2)
λmax, 1 λmax, 2 … λmax, Nα/2
Single frame at time t
Nα Cα atoms
S. Thomas et al. Characterization of In Situ and In Transit Analytics of Molecular Dynamics
Simulations for Next-generation Supercomputers. eScience, 2019.
Cα
Nα
Cα
Nα/2
Cα
Nα/2 - 1
Cα
1
Cα
1 Cα
Nα
Cα
1
Cα
2
Cα
3
Cα
4
Cα
1. Cα
4
0 dij
dji 0
0 dij
dji 0
0 dij
dji 0
0 dij
dji 0
0 dij
dji 0
0 dij
dji 0
0 dij
dji 0
0 dij
dji 0
Many small
matrices
Few large
matrices
Segment size = proxy of number
of matrices and matrix sizes
S. Thomas et al. Characterization of In Situ and In Transit Analytics of Molecular Dynamics
Simulations for Next-generation Supercomputers. eScience, 2019.
34
2-step Model: Fraction of Analyzed Frames
Gltph 270,088 atoms
Trp cage 12,619 atoms T cell receptor 81,092 atoms
35
Observables: small segment lengths
(i.e., 2 , 4 , 6 , 8 , 10 , 12 , 14, 16, and 18 )
Matrix size (# elements)
Matrixperiod(s/matrices)
S. Thomas et al. Characterization of In Situ and In Transit Analytics of Molecular Dynamics
Simulations for Next-generation Supercomputers. eScience, 2019.
2-step Model: Fraction of Analyzed Frames
Observables: segment lengths
(i.e., 2 , 4 , 6 , 8 , 10 , 12 , 14, 16, and 18 )
Model: polynomial model of degree 2
obtained by least-square fitting observables
R2 = 0.88
S. Thomas et al. Characterization of In Situ and In Transit Analytics of Molecular Dynamics
Simulations for Next-generation Supercomputers. eScience, 2019. (Submitted)
36
Matrixperiod(s/matrices)
Matrix size (# elements)
Matrixperiod(s/matrices)
Matrix size (# elements)
2-step Model: Fraction of Analyzed Frames
Observables Degree 2 polynomial model Absolute error between data
and fitting model
R2 = 0.88
Matrixperiod(s/matrices)
Matrix size (# elements)
Matrixperiod(s/matrices)
Matrix size (# elements)
Matrixperiod(s/matrices)
Matrix size (# elements)
FractionofAnalyzedFrames
FractionofAnalyzedFrames
Error
2-step Model: Frames Distribution
• Given a trajectory, we model the proportions p and q of
analyzed frames (f) with periods k and k+1
§ Example: Gltph (27,000 atoms and TPS 318), trajectory of 1,000 frames.
Period between analyzed frames Period between analyzed frames Period between analyzed frames
Stride 2 Stride 6 Stride 12
Numberofanalyzedframes
S. Thomas et al. Characterization of In Situ and In Transit Analytics of Molecular Dynamics
Simulations for Next-generation Supercomputers. eScience, 2019.
38
Case Study: 1BDD Protein Conformations
Frame 1330 Frame 1360 Frame 1390
S. Thomas et al. Characterization of In Situ and In Transit Analytics of Molecular Dynamics
Simulations for Next-generation Supercomputers. eScience, 2019. (Submitted)
39
Case Study: 1BDD Protein Conformations
Frame 1330 Frame 1360 Frame 1390
Helix 1-3: λmax Helix 2-3: λmaxHelix 1-2: λmax
S. Thomas et al. Characterization of In Situ and In Transit Analytics of Molecular Dynamics
Simulations for Next-generation Supercomputers. eScience, 2019. (Submitted)
40
Case Study: 1BDD Protein Conformations
Frame 1330 Frame 1360 Frame 1390
Helix 1-3: λmax Helix 2-3: λmaxHelix 1-2: λmax
S. Thomas et al. Characterization of In Situ and In Transit Analytics of Molecular Dynamics
Simulations for Next-generation Supercomputers. eScience, 2019. (Submitted)
41
Two Use Cases
• Extending HPC to integrate data analytics
§ Next generation MD workflows
§ Molecular structures
§ Data transformation – i.e., capturing information
§ Dataflow modeling – i.e., lost information
• Extending HPC to connect to the “Edge”
§ Next generation precision farming
§ Soil moisture data
§ Data prediction – i.e., from coarse- to fine-grained information
42
Soil Moisture Data for Precision Farming
• Satellites collect raster data across the surface of the Earth
Image Source: http://www.esa-soilmoisture-cci.org/43
Soil Moisture: Incomplete Data
Image source: Ricardo Llamas, University of Delaware
Data source: ESA-CCI soil moisture database (http://www.esa-soilmoisture-cci.org/)
December 2000
Averages
(m3/m3)
44
Soil Moisture: Incomplete Data
Image source: Ricardo Llamas, University of Delaware
Data source: ESA-CCI soil moisture database (http://www.esa-soilmoisture-cci.org/)
December 2000
Averages
(m3/m3)
Causes of missing data:
● dense vegetation
● snow/ice cover
● frozen surface
● extremely dry surface
45
Soil Moisture: Coarse-grained Data
Original Resolution
27 km × 27 km
Image source: McPherson et al., Using coarse-grained occurrence data to predict species distributions at
finer spatial resolutions—possibilities and limitations, Ecological Modeling 192:499-522, 2006.
Desired Resolution
1 km × 1 km
46
Soil Moisture
ESA-CCI
Building a Closed-loop Workflow
Data Generation
Landscape
Surface DSM
Weather Data
NOAA
47
Satellite and
sensors data
Course-grained,
incomplete data
Soil Moisture
ESA-CCI
Building a Closed-loop Workflow
Data GenerationData Analytics
Landscape
Surface DSM
Weather Data
NOAA
48
A4MD
analytics
A4MD
analytics
Analytics
representations
+ algorithms
Satellite and
sensors data
Data
predictions
Course-grained,
incomplete data
Soil Moisture
ESA-CCI
Building a Closed-loop Workflow
Data GenerationData Analytics
Landscape
Surface DSM
Compute
Weather Data
NOAA
Kubernetes
Singularity
Shifter and
Docker
49
A4MD
analytics
A4MD
analytics
Analytics
representations
+ algorithms
Satellite and
sensors data
Data
predictions
Soil moisture
integrated in FDS for:
• Controlled
combustion;
• Wildfire
propagation
Fire Dynamics Simulator
Fine-grained,
complete data
Course-grained,
incomplete data
Soil Moisture
ESA-CCI
Building a Closed-loop Workflow
Data GenerationData Analytics
Landscape
Surface DSM
Compute
Weather Data
NOAA
Kubernetes
Singularity
Shifter and
Docker
50
A4MD
analytics
A4MD
analytics
Analytics
representations
+ algorithms
Data Feedback
Satellite and
sensors data
Data
predictions
Soil moisture
integrated in FDS for:
• Controlled
combustion
• Wildfire
propagation
Fire Dynamics Simulator
Fine-grained,
complete data
Course-grained,
incomplete data
Augmenting HPC with the Cloud
nodes
HPC system
ParallelFileSystem
51 Source: https://www.nextplatform.com/2018/02/26/adaptive-approach-bursting-hpc-cloud/
nodes
nodes nodes
nodes nodes
nodes
nodes
nodes
nodes
nodes
nodes
nodes
nodes
nodes
nodes
nodes
Compute Analytics
Soil Moisture
ESA-CCI
Building a Closed-loop Workflow
Data GenerationData Analytics
Landscape
Surface DSM
Compute
Weather Data
NOAA
Kubernetes
Singularity
Shifter and
Docker
52
A4MD
analytics
A4MD
analytics
Analytics
representations
+ algorithms
Data Feedback
Satellite and
sensors data
Data
predictions
Soil moisture
integrated in FDS for:
• Controlled
combustion
• Wildfire
propagation
Fire Dynamics Simulator
Fine-grained,
complete data
Course-grained,
incomplete data
Hybrid Algorithms for Analytics
Data GenerationData AnalyticsCompute
53
A4MD
analytics
A4MD
analytics
Analytics
representations
+ algorithms
Data Feedback
Fine-grained,
complete data
Course-grained,
incomplete data
Data
(observations)
Parameter
Global versus Local Data Modeling
DependentVariable
Data
(observations)
Parameter
DependentVariable
54
Data
(observations)
Surrogate
Model
Parameter
Global versus Local Data Modeling
DependentVariable
Data
(observations)
Parameter
DependentVariable
k-Nearest Neighbor
Model
55
Global
modeling
Local
modeling
SBM vs. k-NN
56
Observations
Piecewise linear data
Surrogate-based model
2-Nearest Neighbor Model
Johnston et al., “HYPPO: A Hybrid, Piecewise Polynomial Modeling Technique for Non-Smooth
Surfaces.” SBAC-PAD 2016: 26-33
SBM vs. k-NN
57
Observations
Piecewise linear data
Surrogate-based model
2-Nearest Neighbor Model
Johnston et al., “HYPPO: A Hybrid, Piecewise Polynomial Modeling Technique for Non-Smooth
Surfaces.” SBAC-PAD 2016: 26-33
SBM vs. k-NN
58
Observations
Piecewise linear data
Surrogate-based model
2-Nearest Neighbor Model
Johnston et al., “HYPPO: A Hybrid, Piecewise Polynomial Modeling Technique for Non-Smooth
Surfaces.” SBAC-PAD 2016: 26-33
SBM vs. k-NN
59
Observations
Piecewise linear data
Surrogate-based model
2-Nearest Neighbor Model
Johnston et al., “HYPPO: A Hybrid, Piecewise Polynomial Modeling Technique for Non-Smooth
Surfaces.” SBAC-PAD 2016: 26-33
SBM vs. k-NN
60
Observations
Piecewise linear data
Surrogate-based model
2-Nearest Neighbor Model
Johnston et al., “HYPPO: A Hybrid, Piecewise Polynomial Modeling Technique for Non-Smooth
Surfaces.” SBAC-PAD 2016: 26-33
Hybrid Piecewise Polynomial Modeling (HYPPO)
Surrogate-Based Modeling
• Use all sampled data
• Construct one polynomial
(single complex global model)
k Nearest Neighbors
• Use local data
• Compute the average
(many simple local models)
61 Johnston et al., “HYPPO: A Hybrid, Piecewise Polynomial Modeling Technique for Non-Smooth
Surfaces.” SBAC-PAD 2016: 26-33
Hybrid Piecewise Polynomial Modeling (HYPPO)
Surrogate-Based Modeling
• Use all sampled data
• Construct one polynomial
(single complex global model)
k Nearest Neighbors
• Use local data
• Compute the average
(many simple local models)
Hybrid modeling à HYPPO
• Use local data
• Construct many polynomials
(many complex local models)
62 Johnston et al., “HYPPO: A Hybrid, Piecewise Polynomial Modeling Technique for Non-Smooth
Surfaces.” SBAC-PAD 2016: 26-33
63
Observations
Johnston et al., “HYPPO: A Hybrid, Piecewise Polynomial Modeling Technique for Non-Smooth
Surfaces.” SBAC-PAD 2016: 26-33
64
Surrogate-based Model
Observations
Johnston et al., “HYPPO: A Hybrid, Piecewise Polynomial Modeling Technique for Non-Smooth
Surfaces.” SBAC-PAD 2016: 26-33
65
Surrogate-based Model
Observations
k Nearest Neighbors Model
Johnston et al., “HYPPO: A Hybrid, Piecewise Polynomial Modeling Technique for Non-Smooth
Surfaces.” SBAC-PAD 2016: 26-33
66
Surrogate-based Model
Observations
k Nearest Neighbors Model
Johnston et al., “HYPPO: A Hybrid, Piecewise Polynomial Modeling Technique for Non-Smooth
Surfaces.” SBAC-PAD 2016: 26-33
Hybrid Piecewise Polynomial
Model (HYPPO)
3
2
1
0
Soil moisture predictions Model degrees
Case Study: Fine-grained Modeling of Mid-Atlantic Region
67 D. Rorabaugh, M. Guevara, R. Llamas, J. Kitson, R. Vargas, M. Taufer. SOMOSPIE: A modular SOil MOisture SPatial
Inference Engine based on data driven decision. arXiv:1904.07754, 2019
Challenges and Opportunities (I)
• Two trends in HPC are impacting scientific applications:
§ Convergence of simulations and data analytics
§ Emergence of edge computing
• Applications in precision medicine and precision farming
are leveraging these trends
• There is the need to further integrate these trends in HPC
§ New challenges and opportunities for the HPC community
68
Challenges and Opportunities (II)
• Efficiency: Optimize performance and power usage associated to data
generation, movement, and analytics
• Non-invasive: Capture knowledge from data without rewriting
simulations’ legacy codes or simulations’ scripts
• Generality: Build workflows that support different types of analytics
across different applications and different data
• Portability: Execute combined compute and analytics across different
systems, including the edge, and with heterogenous resources
• Scalability: Design methods for knowledge discovery at scale (e.g.,
scalable ML algorithms) for “compute + analytics + data” workflows
69
Scientific Applications and Heterogeneous Architectures

More Related Content

What's hot

Knowledge Graph Maintenance
Knowledge Graph MaintenanceKnowledge Graph Maintenance
Knowledge Graph Maintenance
Paul Groth
 
Ling liu part 02:big graph processing
Ling liu part 02:big graph processingLing liu part 02:big graph processing
Ling liu part 02:big graph processing
jins0618
 
Data science with Perl & Raku
Data science with Perl & RakuData science with Perl & Raku
Data science with Perl & Raku
Sören Laird Sörries
 
EDBT 2015: Summer School Overview
EDBT 2015: Summer School OverviewEDBT 2015: Summer School Overview
EDBT 2015: Summer School Overview
dgarijo
 
Visualizing and Clustering Life Science Applications in Parallel 
Visualizing and Clustering Life Science Applications in Parallel Visualizing and Clustering Life Science Applications in Parallel 
Visualizing and Clustering Life Science Applications in Parallel 
Geoffrey Fox
 
Empirical Analysis of Radix Sort using Curve Fitting Technique in Personal Co...
Empirical Analysis of Radix Sort using Curve Fitting Technique in Personal Co...Empirical Analysis of Radix Sort using Curve Fitting Technique in Personal Co...
Empirical Analysis of Radix Sort using Curve Fitting Technique in Personal Co...
IRJET Journal
 
15. STL - Data Structures using C++ by Varsha Patil
15. STL - Data Structures using C++ by Varsha Patil15. STL - Data Structures using C++ by Varsha Patil
15. STL - Data Structures using C++ by Varsha Patil
widespreadpromotion
 
Knowledge Graph Maintenance
Knowledge Graph MaintenanceKnowledge Graph Maintenance
Knowledge Graph Maintenance
Paul Groth
 
Thinking About the Making of Data
Thinking About the Making of DataThinking About the Making of Data
Thinking About the Making of Data
Paul Groth
 
Indexing Techniques for Scalable Record Linkage and Deduplication
Indexing Techniques for Scalable Record Linkage and DeduplicationIndexing Techniques for Scalable Record Linkage and Deduplication
Indexing Techniques for Scalable Record Linkage and Deduplication
Pradeeban Kathiravelu, Ph.D.
 
Efficient Duplicate Detection Over Massive Data Sets
Efficient Duplicate Detection Over Massive Data SetsEfficient Duplicate Detection Over Massive Data Sets
Efficient Duplicate Detection Over Massive Data Sets
Pradeeban Kathiravelu, Ph.D.
 
IEEE Datamining 2016 Title and Abstract
IEEE  Datamining 2016 Title and AbstractIEEE  Datamining 2016 Title and Abstract
IEEE Datamining 2016 Title and Abstract
tsysglobalsolutions
 
Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014
Raja Chiky
 
EFFICIENT INDEX FOR A VERY LARGE DATASETS WITH HIGHER DIMENSION
EFFICIENT INDEX FOR A VERY LARGE DATASETS WITH HIGHER DIMENSIONEFFICIENT INDEX FOR A VERY LARGE DATASETS WITH HIGHER DIMENSION
EFFICIENT INDEX FOR A VERY LARGE DATASETS WITH HIGHER DIMENSION
AM Publications,India
 
A survey on massively Parallelism for indexing multidimensional datasets on t...
A survey on massively Parallelism for indexing multidimensional datasets on t...A survey on massively Parallelism for indexing multidimensional datasets on t...
A survey on massively Parallelism for indexing multidimensional datasets on t...
Tejovat Technologies Pvt.Ltd.,Wakad
 
Effiziente Verarbeitung von grossen Datenmengen
Effiziente Verarbeitung von grossen DatenmengenEffiziente Verarbeitung von grossen Datenmengen
Effiziente Verarbeitung von grossen Datenmengen
Florian Stegmaier
 
[系列活動] 資料探勘速遊
[系列活動] 資料探勘速遊[系列活動] 資料探勘速遊
[系列活動] 資料探勘速遊
台灣資料科學年會
 
End-to-End Learning for Answering Structured Queries Directly over Text
End-to-End Learning for  Answering Structured Queries Directly over Text End-to-End Learning for  Answering Structured Queries Directly over Text
End-to-End Learning for Answering Structured Queries Directly over Text
Paul Groth
 
The Challenge of Deeper Knowledge Graphs for Science
The Challenge of Deeper Knowledge Graphs for ScienceThe Challenge of Deeper Knowledge Graphs for Science
The Challenge of Deeper Knowledge Graphs for Science
Paul Groth
 

What's hot (20)

Knowledge Graph Maintenance
Knowledge Graph MaintenanceKnowledge Graph Maintenance
Knowledge Graph Maintenance
 
Ling liu part 02:big graph processing
Ling liu part 02:big graph processingLing liu part 02:big graph processing
Ling liu part 02:big graph processing
 
Data science with Perl & Raku
Data science with Perl & RakuData science with Perl & Raku
Data science with Perl & Raku
 
EDBT 2015: Summer School Overview
EDBT 2015: Summer School OverviewEDBT 2015: Summer School Overview
EDBT 2015: Summer School Overview
 
Visualizing and Clustering Life Science Applications in Parallel 
Visualizing and Clustering Life Science Applications in Parallel Visualizing and Clustering Life Science Applications in Parallel 
Visualizing and Clustering Life Science Applications in Parallel 
 
Empirical Analysis of Radix Sort using Curve Fitting Technique in Personal Co...
Empirical Analysis of Radix Sort using Curve Fitting Technique in Personal Co...Empirical Analysis of Radix Sort using Curve Fitting Technique in Personal Co...
Empirical Analysis of Radix Sort using Curve Fitting Technique in Personal Co...
 
15. STL - Data Structures using C++ by Varsha Patil
15. STL - Data Structures using C++ by Varsha Patil15. STL - Data Structures using C++ by Varsha Patil
15. STL - Data Structures using C++ by Varsha Patil
 
Knowledge Graph Maintenance
Knowledge Graph MaintenanceKnowledge Graph Maintenance
Knowledge Graph Maintenance
 
Thinking About the Making of Data
Thinking About the Making of DataThinking About the Making of Data
Thinking About the Making of Data
 
Indexing Techniques for Scalable Record Linkage and Deduplication
Indexing Techniques for Scalable Record Linkage and DeduplicationIndexing Techniques for Scalable Record Linkage and Deduplication
Indexing Techniques for Scalable Record Linkage and Deduplication
 
Efficient Duplicate Detection Over Massive Data Sets
Efficient Duplicate Detection Over Massive Data SetsEfficient Duplicate Detection Over Massive Data Sets
Efficient Duplicate Detection Over Massive Data Sets
 
IEEE Datamining 2016 Title and Abstract
IEEE  Datamining 2016 Title and AbstractIEEE  Datamining 2016 Title and Abstract
IEEE Datamining 2016 Title and Abstract
 
PAKDD2013
PAKDD2013PAKDD2013
PAKDD2013
 
Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014
 
EFFICIENT INDEX FOR A VERY LARGE DATASETS WITH HIGHER DIMENSION
EFFICIENT INDEX FOR A VERY LARGE DATASETS WITH HIGHER DIMENSIONEFFICIENT INDEX FOR A VERY LARGE DATASETS WITH HIGHER DIMENSION
EFFICIENT INDEX FOR A VERY LARGE DATASETS WITH HIGHER DIMENSION
 
A survey on massively Parallelism for indexing multidimensional datasets on t...
A survey on massively Parallelism for indexing multidimensional datasets on t...A survey on massively Parallelism for indexing multidimensional datasets on t...
A survey on massively Parallelism for indexing multidimensional datasets on t...
 
Effiziente Verarbeitung von grossen Datenmengen
Effiziente Verarbeitung von grossen DatenmengenEffiziente Verarbeitung von grossen Datenmengen
Effiziente Verarbeitung von grossen Datenmengen
 
[系列活動] 資料探勘速遊
[系列活動] 資料探勘速遊[系列活動] 資料探勘速遊
[系列活動] 資料探勘速遊
 
End-to-End Learning for Answering Structured Queries Directly over Text
End-to-End Learning for  Answering Structured Queries Directly over Text End-to-End Learning for  Answering Structured Queries Directly over Text
End-to-End Learning for Answering Structured Queries Directly over Text
 
The Challenge of Deeper Knowledge Graphs for Science
The Challenge of Deeper Knowledge Graphs for ScienceThe Challenge of Deeper Knowledge Graphs for Science
The Challenge of Deeper Knowledge Graphs for Science
 

Similar to Scientific Applications and Heterogeneous Architectures

Open Analytics Environment
Open Analytics EnvironmentOpen Analytics Environment
Open Analytics Environment
Ian Foster
 
Nokia Augmented Reality
Nokia Augmented RealityNokia Augmented Reality
Nokia Augmented Reality
StudioSFO
 
Streaming Hypothesis Reasoning - William Smith, Jan 2016
Streaming Hypothesis Reasoning - William Smith, Jan 2016Streaming Hypothesis Reasoning - William Smith, Jan 2016
Streaming Hypothesis Reasoning - William Smith, Jan 2016
Seattle DAML meetup
 
Data mining technique for classification and feature evaluation using stream ...
Data mining technique for classification and feature evaluation using stream ...Data mining technique for classification and feature evaluation using stream ...
Data mining technique for classification and feature evaluation using stream ...
ranjit banshpal
 
Slide 1
Slide 1Slide 1
Slide 1butest
 
Slide 1
Slide 1Slide 1
Slide 1butest
 
Webinar: Using R for Advanced Analytics with MongoDB
Webinar: Using R for Advanced Analytics with MongoDBWebinar: Using R for Advanced Analytics with MongoDB
Webinar: Using R for Advanced Analytics with MongoDB
MongoDB
 
Streaming HYpothesis REasoning
Streaming HYpothesis REasoningStreaming HYpothesis REasoning
Streaming HYpothesis REasoning
William Smith
 
Using R for Advanced Analytics with MongoDB
Using R for Advanced Analytics with MongoDBUsing R for Advanced Analytics with MongoDB
Using R for Advanced Analytics with MongoDB
MongoDB
 
Big data in the research life cycle: technologies, infrastructures, policies
Big data in the research life cycle: technologies, infrastructures, policiesBig data in the research life cycle: technologies, infrastructures, policies
Big data in the research life cycle: technologies, infrastructures, policies
BigData_Europe
 
Ppt for paper id 696 a review of hybrid data mining algorithm for big data mi...
Ppt for paper id 696 a review of hybrid data mining algorithm for big data mi...Ppt for paper id 696 a review of hybrid data mining algorithm for big data mi...
Ppt for paper id 696 a review of hybrid data mining algorithm for big data mi...
Prasanta Paul
 
Azure Databricks for Data Scientists
Azure Databricks for Data ScientistsAzure Databricks for Data Scientists
Azure Databricks for Data Scientists
Richard Garris
 
Contractor-Borner-SNA-SAC
Contractor-Borner-SNA-SACContractor-Borner-SNA-SAC
Contractor-Borner-SNA-SACwebuploader
 
Rattle Graphical Interface for R Language
Rattle Graphical Interface for R LanguageRattle Graphical Interface for R Language
Rattle Graphical Interface for R Language
Majid Abdollahi
 
Chapter 1. Introduction
Chapter 1. IntroductionChapter 1. Introduction
Chapter 1. Introductionbutest
 
IBM Cloud Paris Meetup 20180517 - Deep Learning Challenges
IBM Cloud Paris Meetup 20180517 - Deep Learning ChallengesIBM Cloud Paris Meetup 20180517 - Deep Learning Challenges
IBM Cloud Paris Meetup 20180517 - Deep Learning Challenges
IBM France Lab
 
Cs501 dm intro
Cs501 dm introCs501 dm intro
Cs501 dm intro
Kamal Singh Lodhi
 
expeditions praneeth_june-2021
expeditions praneeth_june-2021expeditions praneeth_june-2021
expeditions praneeth_june-2021
Praneeth Vepakomma
 
TUW - Quality of data-aware data analytics workflows
TUW - Quality of data-aware data analytics workflowsTUW - Quality of data-aware data analytics workflows
TUW - Quality of data-aware data analytics workflowsHong-Linh Truong
 

Similar to Scientific Applications and Heterogeneous Architectures (20)

Open Analytics Environment
Open Analytics EnvironmentOpen Analytics Environment
Open Analytics Environment
 
Enabling semantic integration
Enabling semantic integration Enabling semantic integration
Enabling semantic integration
 
Nokia Augmented Reality
Nokia Augmented RealityNokia Augmented Reality
Nokia Augmented Reality
 
Streaming Hypothesis Reasoning - William Smith, Jan 2016
Streaming Hypothesis Reasoning - William Smith, Jan 2016Streaming Hypothesis Reasoning - William Smith, Jan 2016
Streaming Hypothesis Reasoning - William Smith, Jan 2016
 
Data mining technique for classification and feature evaluation using stream ...
Data mining technique for classification and feature evaluation using stream ...Data mining technique for classification and feature evaluation using stream ...
Data mining technique for classification and feature evaluation using stream ...
 
Slide 1
Slide 1Slide 1
Slide 1
 
Slide 1
Slide 1Slide 1
Slide 1
 
Webinar: Using R for Advanced Analytics with MongoDB
Webinar: Using R for Advanced Analytics with MongoDBWebinar: Using R for Advanced Analytics with MongoDB
Webinar: Using R for Advanced Analytics with MongoDB
 
Streaming HYpothesis REasoning
Streaming HYpothesis REasoningStreaming HYpothesis REasoning
Streaming HYpothesis REasoning
 
Using R for Advanced Analytics with MongoDB
Using R for Advanced Analytics with MongoDBUsing R for Advanced Analytics with MongoDB
Using R for Advanced Analytics with MongoDB
 
Big data in the research life cycle: technologies, infrastructures, policies
Big data in the research life cycle: technologies, infrastructures, policiesBig data in the research life cycle: technologies, infrastructures, policies
Big data in the research life cycle: technologies, infrastructures, policies
 
Ppt for paper id 696 a review of hybrid data mining algorithm for big data mi...
Ppt for paper id 696 a review of hybrid data mining algorithm for big data mi...Ppt for paper id 696 a review of hybrid data mining algorithm for big data mi...
Ppt for paper id 696 a review of hybrid data mining algorithm for big data mi...
 
Azure Databricks for Data Scientists
Azure Databricks for Data ScientistsAzure Databricks for Data Scientists
Azure Databricks for Data Scientists
 
Contractor-Borner-SNA-SAC
Contractor-Borner-SNA-SACContractor-Borner-SNA-SAC
Contractor-Borner-SNA-SAC
 
Rattle Graphical Interface for R Language
Rattle Graphical Interface for R LanguageRattle Graphical Interface for R Language
Rattle Graphical Interface for R Language
 
Chapter 1. Introduction
Chapter 1. IntroductionChapter 1. Introduction
Chapter 1. Introduction
 
IBM Cloud Paris Meetup 20180517 - Deep Learning Challenges
IBM Cloud Paris Meetup 20180517 - Deep Learning ChallengesIBM Cloud Paris Meetup 20180517 - Deep Learning Challenges
IBM Cloud Paris Meetup 20180517 - Deep Learning Challenges
 
Cs501 dm intro
Cs501 dm introCs501 dm intro
Cs501 dm intro
 
expeditions praneeth_june-2021
expeditions praneeth_june-2021expeditions praneeth_june-2021
expeditions praneeth_june-2021
 
TUW - Quality of data-aware data analytics workflows
TUW - Quality of data-aware data analytics workflowsTUW - Quality of data-aware data analytics workflows
TUW - Quality of data-aware data analytics workflows
 

More from inside-BigData.com

Major Market Shifts in IT
Major Market Shifts in ITMajor Market Shifts in IT
Major Market Shifts in IT
inside-BigData.com
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
inside-BigData.com
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networks
inside-BigData.com
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
inside-BigData.com
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
inside-BigData.com
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
inside-BigData.com
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networks
inside-BigData.com
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
inside-BigData.com
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecasts
inside-BigData.com
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Update
inside-BigData.com
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19
inside-BigData.com
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuning
inside-BigData.com
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
inside-BigData.com
 
State of ARM-based HPC
State of ARM-based HPCState of ARM-based HPC
State of ARM-based HPC
inside-BigData.com
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Acceleration
inside-BigData.com
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
inside-BigData.com
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Era
inside-BigData.com
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computing
inside-BigData.com
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Cluster
inside-BigData.com
 
Overview of HPC Interconnects
Overview of HPC InterconnectsOverview of HPC Interconnects
Overview of HPC Interconnects
inside-BigData.com
 

More from inside-BigData.com (20)

Major Market Shifts in IT
Major Market Shifts in ITMajor Market Shifts in IT
Major Market Shifts in IT
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networks
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networks
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecasts
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Update
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuning
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
 
State of ARM-based HPC
State of ARM-based HPCState of ARM-based HPC
State of ARM-based HPC
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Acceleration
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Era
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computing
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Cluster
 
Overview of HPC Interconnects
Overview of HPC InterconnectsOverview of HPC Interconnects
Overview of HPC Interconnects
 

Recently uploaded

PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
James Anderson
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
ThomasParaiso2
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
Neo4j
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
DianaGray10
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
Peter Spielvogel
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 

Recently uploaded (20)

PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 

Scientific Applications and Heterogeneous Architectures

  • 1. Scientific Applications and Heterogeneous Architectures – Data Analytics and the Intersection of HPC and Edge Computing Michela Taufer
  • 2. Acknowledgements T. Johnston T. Estrada M. Cuendet E. Deelman R. da Silva Sponsors: H. Weinstein T. Do S. Thomas B. Mulligan A. RazaviH. Carrillo R. Vargas D. Rorabaugh R. LLamas M. Guevara
  • 3. Trends in Next-Generation Systems: IO Gap and Ensembles 5x 12x 64x 13x Rising Importance of Ensembles Source: https://wci.llnl.gov/simulation/computer-codes/uncertainty-quantification ExascalePetascaleTerascale Sierra (2018) Sequoia (2013) Purple (2005) Number of Jobs SystemScale(FLOPS) Few Many Widening IO Gap Source: Lucy Nowell (DOE) 3
  • 4. 4 Pegasus LIGO PyCBC Workflow Laser Interferometer Gravitational-Wave Observatory (LIGO) Trends in Workflows: Compute + Analytics + Data Source: Ewa Deelman, ISI - USC
  • 5. Extending HPC to Connect to the “Edge” Image Source: https://iiot-world.com/digital-disruption/the-right-representation-of-digital-twins-for-data-analytics/ Simulations Data Analytics Real System with Sensors 5
  • 6. Two Use Cases • Extending HPC to integrate data analytics § Next generation MD workflows § Molecular structures § Data transformation – i.e., capturing information § Dataflow modeling – i.e., lost information • Extending HPC to connect to the “Edge” § Next generation precision farming § Soil moisture data § Data prediction – i.e., from coarse- to fine-grained information 6
  • 7. Two Use Cases • Extending HPC to integrate data analytics § Next generation MD workflows § Molecular structures § Data transformation – i.e., capturing information § Dataflow modeling – i.e., lost information • Extending HPC to connect to the “Edge” § Next generation precision farming § Soil moisture data § Data prediction – i.e., from coarse- to fine-grained information 7
  • 8. Classical Molecular Dynamics Simulations Source: Vincent Voelz Gorup - CC BY-SA 3.0, http://www.voelzlab.org/ • A MD simulation comprises of hundreds of thousands of MD job • Each job preforms hundreds of thousands of MD steps 8
  • 9. Classical Molecular Dynamics Simulations àForces on single atoms à Acceleration à Velocity à Position • MD step computes forces on single atoms (e.g., bond, dihedrals, nonbond) • Forces are added to compute acceleration • Acceleration is used to update velocities • Velocities are used to update the atom positions • Every n steps (Stride) à Store 3D snapshot or frame 9
  • 10. Building a Closed-loop Workflow MD code (e.g., GROMACS) Run n-Stride simulation steps Data Generation 10
  • 11. Building a Closed-loop Workflow MD code (e.g., GROMACS) Dataflow Ingestor Run n-Stride simulation steps Data Generation Data Storage 11 Plumed In-memory Staging Area DataSpaces Burst Buffer Parallel File System
  • 12. Building a Closed-loop Workflow MD code (e.g., GROMACS) A4MD analytics Retriever Dataflow Ingestor Run n-Stride simulation steps Data Generation Data AnalyticsData Storage Dataflow A4MD analytics Analytics representations + algorithms ML-inferred algorithms Collective variables 12 Plumed In-memory Staging Area DataSpaces Burst Buffer Parallel File System
  • 13. Building a Closed-loop Workflow MD code (e.g., GROMACS) A4MD analytics Retriever Dataflow Ingestor Run n-Stride simulation steps Data Generation Data AnalyticsData Storage Dataflow A4MD analytics Analytics representations + algorithms ML-inferred algorithms Collective variables 13 Data Feedback Plumed In-memory Staging Area DataSpaces Burst Buffer Parallel File System
  • 14. Extending HPC to Integrate Data Analytics Analysis Local Storage MEMORY CN Application CN CN CN CN CN CN CN CN CN CN CN CN CN CN CN Parallel File System MEMORY CN Application CN CN CN CN CN CN CN CN CN CN CN CN CN CN Analysis CN 14 Data Data + metadata Parallel File System
  • 15. Augmenting HPC with In Situ and In Transit Analytics AnalysisSimulation Node 1 Node 2 Node 3 Network Interconnect Core Core Core Core Core Core Core Core Simulation Node Analysis SharedMemory Example of tools: • DataSpaces (Rutgers U.) • DataStager (GeorgiaTech) 15
  • 16. In Situ and In Transit Analytics for MD Simulations • We want to capture what is going on in each frame without: § Disrupting the simulation (e.g., stealing CPU and memory on the node) § Moving all the frames to a central file system and analyzing them once the simulation is over § Comparing each frame with past frames of the same job § Comparing each frame with frames of other jobs Frames (or snapshots) of an MD trajectory with a stride of 5 steps: Frame 55 Frame 60 Frame 65 Frame 70 Frame 80 Frame 75 16
  • 17. Building a Closed-loop Workflow MD code (e.g., GROMACS) A4MD analytics Retriever Dataflow Ingestor Run n-Stride simulation steps Data Generation Data AnalyticsData Storage Dataflow A4MD analytics Analytics representations + algorithms ML-inferred algorithms Collective variables 17 Data Feedback Plumed In-memory Staging Area DataSpaces Burst Buffer Parallel File System
  • 18. Building a Closed-loop Workflow Data Generation Data AnalyticsData Storage 18 Data Feedback A4MD analytics Retriever Dataflow A4MD analytics Analytics representations + algorithms ML-inferred algorithms Collective variables
  • 19. Proteins with Similar Functions Key principle: proteins with similar structures have similar functions • Measure millions of protein variants expressed from yeast or bacteria • Structure proteins to produce desired properties (protein engineering) 19 T. Estrada, J. Benson, H. Carrillo-Cabada, A. Razavi, M. Cuendet, H. Weinstein, E. Deelman, and M. Taufer. Graphic Encoding of Proteins for Efficient High-Throughput Analysis. ICPP 2018.
  • 20. Protein Representations 3D Cartesian representation Surface representationMulti-fold representation 20 T. Estrada, J. Benson, H. Carrillo-Cabada, A. Razavi, M. Cuendet, H. Weinstein, E. Deelman, and M. Taufer. Graphic Encoding of Proteins for Efficient High-Throughput Analysis. ICPP 2018.
  • 21. From Multi-fold Representation to Image Encoding 21 T. Estrada, J. Benson, H. Carrillo-Cabada, A. Razavi, M. Cuendet, H. Weinstein, E. Deelman, and M. Taufer. Graphic Encoding of Proteins for Efficient High-Throughput Analysis. ICPP 2018.
  • 22. From Multi-fold Representation to Image Encoding 22 T. Estrada, J. Benson, H. Carrillo-Cabada, A. Razavi, M. Cuendet, H. Weinstein, E. Deelman, and M. Taufer. Graphic Encoding of Proteins for Efficient High-Throughput Analysis. ICPP 2018.
  • 23. From Multi-fold Representation to Image Encoding 23 T. Estrada, J. Benson, H. Carrillo-Cabada, A. Razavi, M. Cuendet, H. Weinstein, E. Deelman, and M. Taufer. Graphic Encoding of Proteins for Efficient High-Throughput Analysis. ICPP 2018.
  • 24. High-Throughput Protein Analysis • Eight biological processes from biological process taxonomy in RCSB-PDB • 62,991 proteins from the PDB Proteinsas3Dtens 24 T. Estrada, J. Benson, H. Carrillo-Cabada, A. Razavi, M. Cuendet, H. Weinstein, E. Deelman, and M. Taufer. Graphic Encoding of Proteins for Efficient High-Throughput Analysis. ICPP 2018. convolutional neural network Google’s Inception-v3, Gem-Net Groundtrue Predictions Normalized Confusion Matrix - Accuracy 80.66%
  • 25. Capturing Changes in Folding with Transfer Learning 25 T. Estrada,et al. A Graphic Encoding Method for Quantitative Classification of Protein Structure and Representation of Conformational Changes. 2019 Protein: Opsin Frame 50 Frame 1500 Frame 1950
  • 26. Two Use Cases • Extending HPC to integrate data analytics § Next generation MD workflows § Molecular structures § Data transformation – i.e., capturing information § Dataflow modeling – i.e., lost information • Extending HPC to connect to the “Edge” § Next generation precision farming § Soil moisture data § Data prediction – i.e., from coarse- to fine-grained information 26
  • 27. Building a Closed-loop Workflow MD code (e.g., GROMACS) A4MD analytics Retriever Dataflow Ingestor Run n-Stride simulation steps Data Generation Data AnalyticsData Storage Dataflow A4MD analytics Analytics representations + algorithms ML-inferred algorithms Collective variables 27 Data Feedback Plumed In-memory Staging Area DataSpaces Burst Buffer Parallel File System
  • 28. Modeling Lost Frames Node MD Simulation Data Analytics Dataspaces Server Main Memory Dataspaces Shared Memory Region Single Node (In Situ) 28 S. Thomas et al. Characterization of In Situ and In Transit Analytics of Molecular Dynamics Simulations for Next-generation Supercomputers. eScience, 2019. S1, S2, S3: generate MD frame W1, W2, W2: write to shared memory R1, R2, R3: read from shared memory A1, A3: analyze frame
  • 29. Dataflow for Modeling Lost Frames 29 Generator of MD frames Ingestor Dataflow Plumed In-memory Staging Area DataSpaces Retriever Data Generation Data AnalyticsData Storage Dataflow Analytics representations Distance matrices λmax eigenvalues T. Johnston et al. In-Situ Data Analytics and Indexing of Protein Trajectories. Journal of Computational Chemistry (JCC), 38(16):1419-1430, 2017.
  • 30. Eigenvalues: Proxy for Structural Changes Frames (or snapshots) of an MD trajectory with a stride of 5 steps: Frame 55 Frame 60 Frame 65 Frame 70 Frame 80 Frame 75 30 T. Johnston et al. In-Situ Data Analytics and Indexing of Protein Trajectories. Journal of Computational Chemistry (JCC), 38(16):1419-1430, 2017. λmax 60 λmax65 λmax70 λmax75 λmax80λmax55 The distance between two max eigenvalues can serve as a proxy for distance between the two associated conformations
  • 31. Cα 1 Cα Nα 31 Single frame at time t Nα Cα atoms S. Thomas et al. Characterization of In Situ and In Transit Analytics of Molecular Dynamics Simulations for Next-generation Supercomputers. eScience, 2019.
  • 32. Cα Nα Cα Nα/2 Cα Nα/2 - 1 Cα 1 Cα 1 Cα Nα 0 dij dji 0 Distance of two segments with segment length: Nα/2 x Cα atoms (Cα 1 - Cα Nα/2 -1 ) ( Cα Nα/2 - Cα Nα/2) Single λmax Cα 1 Cα Nα 32 Single frame at time t Nα Cα atoms S. Thomas et al. Characterization of In Situ and In Transit Analytics of Molecular Dynamics Simulations for Next-generation Supercomputers. eScience, 2019.
  • 33. Cα 1 Cα 2 Cα 3 Cα 4 Cα 1. Cα 4 0 dij dji 0 0 dij dji 0 0 dij dji 0 0 dij dji 0 0 dij dji 0 0 dij dji 0 0 dij dji 0 Cα Nα Cα Nα/2 Cα Nα/2 - 1 Cα 1 Cα 1 Cα Nα 0 dij dji 0 Distance of two segments with segment length: Nα/2 x Cα atoms (Cα 1 - Cα Nα/2 -1 ) ( Cα Nα/2 - Cα Nα/2) Single λmax Cα 1 Cα Nα 33 Distances of Nα/2 segments with segment length: 2 x Cα atoms (Cα 1 - Cα 2 ) ( Cα 3 - Cα 4) (Cα 5 - Cα 6 ) ( Cα 7 – Cα 8) …. (Cα Nα/2-3 - Cα Nα/2-2 ) ( Cα Nα/2-1 - Cα Nα/2) λmax, 1 λmax, 2 … λmax, Nα/2 Single frame at time t Nα Cα atoms S. Thomas et al. Characterization of In Situ and In Transit Analytics of Molecular Dynamics Simulations for Next-generation Supercomputers. eScience, 2019.
  • 34. Cα Nα Cα Nα/2 Cα Nα/2 - 1 Cα 1 Cα 1 Cα Nα Cα 1 Cα 2 Cα 3 Cα 4 Cα 1. Cα 4 0 dij dji 0 0 dij dji 0 0 dij dji 0 0 dij dji 0 0 dij dji 0 0 dij dji 0 0 dij dji 0 0 dij dji 0 Many small matrices Few large matrices Segment size = proxy of number of matrices and matrix sizes S. Thomas et al. Characterization of In Situ and In Transit Analytics of Molecular Dynamics Simulations for Next-generation Supercomputers. eScience, 2019. 34
  • 35. 2-step Model: Fraction of Analyzed Frames Gltph 270,088 atoms Trp cage 12,619 atoms T cell receptor 81,092 atoms 35 Observables: small segment lengths (i.e., 2 , 4 , 6 , 8 , 10 , 12 , 14, 16, and 18 ) Matrix size (# elements) Matrixperiod(s/matrices) S. Thomas et al. Characterization of In Situ and In Transit Analytics of Molecular Dynamics Simulations for Next-generation Supercomputers. eScience, 2019.
  • 36. 2-step Model: Fraction of Analyzed Frames Observables: segment lengths (i.e., 2 , 4 , 6 , 8 , 10 , 12 , 14, 16, and 18 ) Model: polynomial model of degree 2 obtained by least-square fitting observables R2 = 0.88 S. Thomas et al. Characterization of In Situ and In Transit Analytics of Molecular Dynamics Simulations for Next-generation Supercomputers. eScience, 2019. (Submitted) 36 Matrixperiod(s/matrices) Matrix size (# elements) Matrixperiod(s/matrices) Matrix size (# elements)
  • 37. 2-step Model: Fraction of Analyzed Frames Observables Degree 2 polynomial model Absolute error between data and fitting model R2 = 0.88 Matrixperiod(s/matrices) Matrix size (# elements) Matrixperiod(s/matrices) Matrix size (# elements) Matrixperiod(s/matrices) Matrix size (# elements) FractionofAnalyzedFrames FractionofAnalyzedFrames Error
  • 38. 2-step Model: Frames Distribution • Given a trajectory, we model the proportions p and q of analyzed frames (f) with periods k and k+1 § Example: Gltph (27,000 atoms and TPS 318), trajectory of 1,000 frames. Period between analyzed frames Period between analyzed frames Period between analyzed frames Stride 2 Stride 6 Stride 12 Numberofanalyzedframes S. Thomas et al. Characterization of In Situ and In Transit Analytics of Molecular Dynamics Simulations for Next-generation Supercomputers. eScience, 2019. 38
  • 39. Case Study: 1BDD Protein Conformations Frame 1330 Frame 1360 Frame 1390 S. Thomas et al. Characterization of In Situ and In Transit Analytics of Molecular Dynamics Simulations for Next-generation Supercomputers. eScience, 2019. (Submitted) 39
  • 40. Case Study: 1BDD Protein Conformations Frame 1330 Frame 1360 Frame 1390 Helix 1-3: λmax Helix 2-3: λmaxHelix 1-2: λmax S. Thomas et al. Characterization of In Situ and In Transit Analytics of Molecular Dynamics Simulations for Next-generation Supercomputers. eScience, 2019. (Submitted) 40
  • 41. Case Study: 1BDD Protein Conformations Frame 1330 Frame 1360 Frame 1390 Helix 1-3: λmax Helix 2-3: λmaxHelix 1-2: λmax S. Thomas et al. Characterization of In Situ and In Transit Analytics of Molecular Dynamics Simulations for Next-generation Supercomputers. eScience, 2019. (Submitted) 41
  • 42. Two Use Cases • Extending HPC to integrate data analytics § Next generation MD workflows § Molecular structures § Data transformation – i.e., capturing information § Dataflow modeling – i.e., lost information • Extending HPC to connect to the “Edge” § Next generation precision farming § Soil moisture data § Data prediction – i.e., from coarse- to fine-grained information 42
  • 43. Soil Moisture Data for Precision Farming • Satellites collect raster data across the surface of the Earth Image Source: http://www.esa-soilmoisture-cci.org/43
  • 44. Soil Moisture: Incomplete Data Image source: Ricardo Llamas, University of Delaware Data source: ESA-CCI soil moisture database (http://www.esa-soilmoisture-cci.org/) December 2000 Averages (m3/m3) 44
  • 45. Soil Moisture: Incomplete Data Image source: Ricardo Llamas, University of Delaware Data source: ESA-CCI soil moisture database (http://www.esa-soilmoisture-cci.org/) December 2000 Averages (m3/m3) Causes of missing data: ● dense vegetation ● snow/ice cover ● frozen surface ● extremely dry surface 45
  • 46. Soil Moisture: Coarse-grained Data Original Resolution 27 km × 27 km Image source: McPherson et al., Using coarse-grained occurrence data to predict species distributions at finer spatial resolutions—possibilities and limitations, Ecological Modeling 192:499-522, 2006. Desired Resolution 1 km × 1 km 46
  • 47. Soil Moisture ESA-CCI Building a Closed-loop Workflow Data Generation Landscape Surface DSM Weather Data NOAA 47 Satellite and sensors data Course-grained, incomplete data
  • 48. Soil Moisture ESA-CCI Building a Closed-loop Workflow Data GenerationData Analytics Landscape Surface DSM Weather Data NOAA 48 A4MD analytics A4MD analytics Analytics representations + algorithms Satellite and sensors data Data predictions Course-grained, incomplete data
  • 49. Soil Moisture ESA-CCI Building a Closed-loop Workflow Data GenerationData Analytics Landscape Surface DSM Compute Weather Data NOAA Kubernetes Singularity Shifter and Docker 49 A4MD analytics A4MD analytics Analytics representations + algorithms Satellite and sensors data Data predictions Soil moisture integrated in FDS for: • Controlled combustion; • Wildfire propagation Fire Dynamics Simulator Fine-grained, complete data Course-grained, incomplete data
  • 50. Soil Moisture ESA-CCI Building a Closed-loop Workflow Data GenerationData Analytics Landscape Surface DSM Compute Weather Data NOAA Kubernetes Singularity Shifter and Docker 50 A4MD analytics A4MD analytics Analytics representations + algorithms Data Feedback Satellite and sensors data Data predictions Soil moisture integrated in FDS for: • Controlled combustion • Wildfire propagation Fire Dynamics Simulator Fine-grained, complete data Course-grained, incomplete data
  • 51. Augmenting HPC with the Cloud nodes HPC system ParallelFileSystem 51 Source: https://www.nextplatform.com/2018/02/26/adaptive-approach-bursting-hpc-cloud/ nodes nodes nodes nodes nodes nodes nodes nodes nodes nodes nodes nodes nodes nodes nodes nodes Compute Analytics
  • 52. Soil Moisture ESA-CCI Building a Closed-loop Workflow Data GenerationData Analytics Landscape Surface DSM Compute Weather Data NOAA Kubernetes Singularity Shifter and Docker 52 A4MD analytics A4MD analytics Analytics representations + algorithms Data Feedback Satellite and sensors data Data predictions Soil moisture integrated in FDS for: • Controlled combustion • Wildfire propagation Fire Dynamics Simulator Fine-grained, complete data Course-grained, incomplete data
  • 53. Hybrid Algorithms for Analytics Data GenerationData AnalyticsCompute 53 A4MD analytics A4MD analytics Analytics representations + algorithms Data Feedback Fine-grained, complete data Course-grained, incomplete data
  • 54. Data (observations) Parameter Global versus Local Data Modeling DependentVariable Data (observations) Parameter DependentVariable 54
  • 55. Data (observations) Surrogate Model Parameter Global versus Local Data Modeling DependentVariable Data (observations) Parameter DependentVariable k-Nearest Neighbor Model 55 Global modeling Local modeling
  • 56. SBM vs. k-NN 56 Observations Piecewise linear data Surrogate-based model 2-Nearest Neighbor Model Johnston et al., “HYPPO: A Hybrid, Piecewise Polynomial Modeling Technique for Non-Smooth Surfaces.” SBAC-PAD 2016: 26-33
  • 57. SBM vs. k-NN 57 Observations Piecewise linear data Surrogate-based model 2-Nearest Neighbor Model Johnston et al., “HYPPO: A Hybrid, Piecewise Polynomial Modeling Technique for Non-Smooth Surfaces.” SBAC-PAD 2016: 26-33
  • 58. SBM vs. k-NN 58 Observations Piecewise linear data Surrogate-based model 2-Nearest Neighbor Model Johnston et al., “HYPPO: A Hybrid, Piecewise Polynomial Modeling Technique for Non-Smooth Surfaces.” SBAC-PAD 2016: 26-33
  • 59. SBM vs. k-NN 59 Observations Piecewise linear data Surrogate-based model 2-Nearest Neighbor Model Johnston et al., “HYPPO: A Hybrid, Piecewise Polynomial Modeling Technique for Non-Smooth Surfaces.” SBAC-PAD 2016: 26-33
  • 60. SBM vs. k-NN 60 Observations Piecewise linear data Surrogate-based model 2-Nearest Neighbor Model Johnston et al., “HYPPO: A Hybrid, Piecewise Polynomial Modeling Technique for Non-Smooth Surfaces.” SBAC-PAD 2016: 26-33
  • 61. Hybrid Piecewise Polynomial Modeling (HYPPO) Surrogate-Based Modeling • Use all sampled data • Construct one polynomial (single complex global model) k Nearest Neighbors • Use local data • Compute the average (many simple local models) 61 Johnston et al., “HYPPO: A Hybrid, Piecewise Polynomial Modeling Technique for Non-Smooth Surfaces.” SBAC-PAD 2016: 26-33
  • 62. Hybrid Piecewise Polynomial Modeling (HYPPO) Surrogate-Based Modeling • Use all sampled data • Construct one polynomial (single complex global model) k Nearest Neighbors • Use local data • Compute the average (many simple local models) Hybrid modeling à HYPPO • Use local data • Construct many polynomials (many complex local models) 62 Johnston et al., “HYPPO: A Hybrid, Piecewise Polynomial Modeling Technique for Non-Smooth Surfaces.” SBAC-PAD 2016: 26-33
  • 63. 63 Observations Johnston et al., “HYPPO: A Hybrid, Piecewise Polynomial Modeling Technique for Non-Smooth Surfaces.” SBAC-PAD 2016: 26-33
  • 64. 64 Surrogate-based Model Observations Johnston et al., “HYPPO: A Hybrid, Piecewise Polynomial Modeling Technique for Non-Smooth Surfaces.” SBAC-PAD 2016: 26-33
  • 65. 65 Surrogate-based Model Observations k Nearest Neighbors Model Johnston et al., “HYPPO: A Hybrid, Piecewise Polynomial Modeling Technique for Non-Smooth Surfaces.” SBAC-PAD 2016: 26-33
  • 66. 66 Surrogate-based Model Observations k Nearest Neighbors Model Johnston et al., “HYPPO: A Hybrid, Piecewise Polynomial Modeling Technique for Non-Smooth Surfaces.” SBAC-PAD 2016: 26-33 Hybrid Piecewise Polynomial Model (HYPPO)
  • 67. 3 2 1 0 Soil moisture predictions Model degrees Case Study: Fine-grained Modeling of Mid-Atlantic Region 67 D. Rorabaugh, M. Guevara, R. Llamas, J. Kitson, R. Vargas, M. Taufer. SOMOSPIE: A modular SOil MOisture SPatial Inference Engine based on data driven decision. arXiv:1904.07754, 2019
  • 68. Challenges and Opportunities (I) • Two trends in HPC are impacting scientific applications: § Convergence of simulations and data analytics § Emergence of edge computing • Applications in precision medicine and precision farming are leveraging these trends • There is the need to further integrate these trends in HPC § New challenges and opportunities for the HPC community 68
  • 69. Challenges and Opportunities (II) • Efficiency: Optimize performance and power usage associated to data generation, movement, and analytics • Non-invasive: Capture knowledge from data without rewriting simulations’ legacy codes or simulations’ scripts • Generality: Build workflows that support different types of analytics across different applications and different data • Portability: Execute combined compute and analytics across different systems, including the edge, and with heterogenous resources • Scalability: Design methods for knowledge discovery at scale (e.g., scalable ML algorithms) for “compute + analytics + data” workflows 69