technology consulting
Arno Kolster, Principal & Co-Founder
(Un)Obscured By Clouds
Applying Cloud Techniques To Address Complexity
In High Performance System Integrations
What Prompted This Talk?
Integrations between HPC and Enterprise have been coming for quite some time.
HPC in the cloud is now a reality - easier and cheaper to use than ever before.
But cloud is still a huge learning curve and paradigm shift for traditional HPC,
as it is for enterprises embracing HPC tech – i.e. GPUs for ML/DL/AI.
FOMO on the latest technical trends and the technology itself, plus the desire to increase ROI.
The majority of our work has been leveraging our hyperscale/cloud experience in
HPC environments - time to talk about ‘eating our own dog food’.
Use Case #1
Leadership Class Supercomputer Operations Integration
“As we build faster and faster supercomputers, their energy efficiency becomes more
and more important,” said Jim Rogers, computing and facilities director for ORNL’s
National Center for Computational Sciences. “We wanted to couple Summit’s
mechanical cooling system with its computational workload to optimize efficiency,
which can translate to significant cost savings for a system of this size.”
Drivers For Summit Cooling Intelligence
Summit is the largest water-cooled supercomputer in the world.
Minor fluctuations in temperature across 4,608 nodes can have a dramatic
positive or negative impact on Summit’s operational efficiency.
No mechanism exists to monitor the temperature in real time so that systems
operators can make adjustments as appropriate.
It can take upwards of 3 minutes for a change in water temperature to reach the
furthest node in the grid.
Systems-level intelligence can inform decisions and impact costs, efficiency
and operational excellence.
SCI Design Goals
Architect, develop and implement a scalable platform to:
Ingest line-rate telemetry from Summit & supporting systems in real time.
Add data sets easily to enhance analytics and insight.
Employ cloud-style ‘reliable messaging’ as a common interface to provide
consistency and coherency for ‘in-flight’ data.
Scale data by orders of magnitude without proportional staff increases.
Obey the axiom: No single failure results in a single point-of-failure.
Currently 4 Source-Data Metrics
TOTAL ~460,000 metrics per second available for real-time analytics
1. IBM OpenBMC framework (99 metrics * 4608 nodes/second = 456,192/sec)
2. IBM LSF jobs data for running applications (~10sec)
3. NOAA weather/wet bulb for Oak Ridge area (continuous)
4. Programmable Logic Controller (PLC) water flow (continuous)
Instrumentation and monitoring data for developed applications (ad hoc)
Summarized and collated metrics (continuous)
Instrumented Ascent (dev cluster) at the same time using same platform
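The headline rate above follows directly from the per-node figures. A minimal sketch of that arithmetic, using only the numbers on this slide; the function names are illustrative, and the daily total assumes a constant rate with no dropped samples.

```python
# Figures from the slide: 99 OpenBMC metrics per node per second, 4,608 nodes.
METRICS_PER_NODE_PER_SEC = 99
NODE_COUNT = 4608


def aggregate_rate(metrics_per_node: int, nodes: int) -> int:
    """Total metrics per second across the whole machine."""
    return metrics_per_node * nodes


def daily_volume(rate_per_sec: int) -> int:
    """Metrics produced in 24 hours, assuming a constant rate."""
    return rate_per_sec * 24 * 60 * 60


rate = aggregate_rate(METRICS_PER_NODE_PER_SEC, NODE_COUNT)
print(rate)                # 456192
print(daily_volume(rate))  # 39414988800
```

The OpenBMC stream alone accounts for nearly all of the ~460,000 metrics per second; the LSF, NOAA and PLC sources add comparatively little volume.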
SCI Architecture Created
[Architecture diagram; labels recovered from the slide:]
Data Sources: OpenBMC, NOAA weather, LSF job data, PLC, other data
Event Message Bus
Persistence: Prometheus (operational), etcd/consensus (transient config), ElasticSearch and logs (unstructured data), relational DBs (structured data)
Learning: streaming analytics, machine learning, neural networks (TensorFlow, Caffe, etc.)
Outputs: PLC recommendations, user dashboarding, monitoring, reporting, research/search
Technologies Used & Integrated
High-availability microservices designed for scalability and a small footprint
Kafka message bus for ubiquitous message delivery
Docker containers for ease of deployment and elastic scalability
OpenShift/Kubernetes for container orchestration and instrumentation
Spark Streaming for on-the-wire data analytics
Prometheus for service discovery, metrics collection and time-series database
Jupyter Notebooks for models and data science exploration (Python/Spark)
Grafana, Seaborn for visualization
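To illustrate why a message bus sits at the centre of this stack, here is a toy in-process sketch of topic-based publish/subscribe. It is a stand-in for the idea only, not Kafka's API; the class name, topic name, node name and message fields are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class InMemoryBus:
    """Toy topic-based bus: an in-process stand-in, NOT Kafka's API."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        """Register a handler that receives every message on the topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> int:
        """Deliver the message to every subscriber; return the delivery count."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


bus = InMemoryBus()
dashboard: List[dict] = []  # hypothetical consumer: live visualization
archive: List[dict] = []    # hypothetical consumer: persistence layer

bus.subscribe("telemetry.openbmc", dashboard.append)
bus.subscribe("telemetry.openbmc", archive.append)

sample = {"node": "node0001", "metric": "gpu0_temp_c", "value": 41.5, "ts": 1}
delivered = bus.publish("telemetry.openbmc", sample)
print(delivered)  # 2
```

The producer never needs to know who consumes the data, which is what lets new analytics or dashboards be attached later without touching the telemetry source.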
Breakthroughs
Created a new paradigm for analyzing data beyond data-at-rest or data at the sensor.
Data-in-flight is agnostic and can be captured prior to being translated into
specific software formats, some of which can be proprietary.
Overlapping data sets from disassociated sources can now be displayed on an
identical scale (time), which is something new.
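One way to put disassociated sources on an identical time scale is to resample each stream onto a shared one-second axis, holding the last known value. The sketch below assumes that approach; the talk does not say how the platform does it, and the sample values and stream names are made up for illustration.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[int, float]  # (unix second, value)


def align_to_seconds(samples: Sequence[Sample], start: int, end: int) -> List[Optional[float]]:
    """Resample to one value per second in [start, end], holding the last known value.

    Seconds before the first sample are reported as None.
    """
    ordered = sorted(samples)
    times = [t for t, _ in ordered]
    aligned: List[Optional[float]] = []
    for second in range(start, end + 1):
        idx = bisect_right(times, second) - 1  # latest sample at or before this second
        aligned.append(ordered[idx][1] if idx >= 0 else None)
    return aligned


# Hypothetical streams with different cadences.
node_temp = [(0, 40.0), (1, 40.5), (2, 41.0), (3, 41.2), (4, 41.1)]  # every second
jobs_running = [(0, 12.0)]                                           # roughly every 10 s
wet_bulb = [(2, 18.3)]                                               # irregular

rows = list(zip(
    range(0, 5),
    align_to_seconds(node_temp, 0, 4),
    align_to_seconds(jobs_running, 0, 4),
    align_to_seconds(wet_bulb, 0, 4),
))
for row in rows:
    print(row)
```

Each row now carries one timestamp and one value per source, so temperature, job count and weather can be plotted on the same axis.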
What Can Be Done Now
Capture 99 node metrics per second for streaming analytics; for example:
• Core temps for CPU, GPU, memory DIMMs, HBMs
• Temps by CPU core, GPU core
• Power to the node, fans and to other individual components
Visualize per-second node temperature+power and temperature+jobs (currently).
Optimize operations by combining metrics from Summit, control systems & weather.
Add more real-time data streams from varying sources to augment analytics.
Analyze behavior from both a systems level and node level.
Impact
• Efficiencies in Summit ecosystem, job throughput and individual jobs
• Micro-managed cooling leads to running at higher temps with less risk
• Greater efficiency leads to Summit optimization, health and lower costs
Real-time analytics helps operators detect anomalies quickly and in a manner that is
more responsive than querying a data warehouse. This approach is especially valuable
with such a large investment in a critical asset.
This platform captures all the data needed to conduct predictive analytics, thus
ensuring new levels of insight, alerting and monitoring.
Real-time data can ultimately help micro-manage such a large system in ways that
are far more efficient and cost effective than human-only intervention.
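As an illustration of catching anomalies on the stream rather than by querying a warehouse, here is a minimal rolling z-score sketch. The technique, window size, threshold and readings are assumptions chosen for the example; the talk does not describe which detection method the platform uses.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Deque, Iterable, Iterator, Tuple


def rolling_anomalies(
    values: Iterable[float], window: int = 5, threshold: float = 3.0
) -> Iterator[Tuple[int, float]]:
    """Yield (index, value) for readings far from the mean of the preceding window."""
    history: Deque[float] = deque(maxlen=window)
    for i, value in enumerate(values):
        if len(history) == window:  # only judge once the window is full
            mu = mean(history)
            sigma = pstdev(history)
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                yield i, value
        history.append(value)  # the reading joins the window either way


# Hypothetical per-second temperatures (deg C) for one node, with one spike.
temps = [40.0, 40.2, 39.9, 40.1, 40.0, 40.1, 47.5, 40.2]
flagged = list(rolling_anomalies(temps))
print(flagged)  # [(6, 47.5)]
```

Note that a flagged reading still enters the window, so a production version would likely exclude or down-weight outliers before computing the next baseline.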
Real-Time Visualization
Summit Power/Jobs Every Second
Summit Temperature/Jobs Every Second
This Was Not Without Its Challenges…
Long ramp up time due to contract, legal & solidifying SOW requirements.
Resource constraints on both sides due to scheduling and unforeseen delays.
Had to work within the confines of the Summit security model (no direct access);
containers saved the day.
Limited compute resources due to hardware budget. Built everything shown on 3 nodes.
But The Effort Was Worth It.
Overlay additional disparate data sources in a production environment that is both
scalable and elastic.
Provide predictive analytics for water temperature control based on scheduled jobs in
the queue that can provide input to the PLCs.
Develop predictive algorithms that intelligently affect power/temp across multiple
dimensions: CPU, GPU, memory, power.
Determine how this platform can be utilized in Exascale systems.
Provide interoperability between expert systems and components to create self-healing,
scaling and operational control.
Research how specific codes/algorithms affect job performance.
Use Case #2
Aerospace Defense Contractor
Problem Statement
Client wanted an understanding of cloud technologies and how to move
certain workloads into the cloud
They had minimal cloud experience
Wanted on-site, hands-on training
Comparative analysis between major cloud providers
Alleviate concerns around security, data transfer, multi-tenancy, account
management
Solution Provided
Designed a custom two-day ‘Cloud 101’ course tailored to the project the client
wanted to migrate into the cloud
Engagement model included weekly consultation for ongoing feedback,
technical direction and open dialog (“Are we doing the right thing?”)
Showcased best practices that can be used on all cloud platforms
Presented how to think about cloud services as simplified tasks vs single
monolithic HPC jobs
CLIENT PROPOSED ARCHITECTURE
Client Solution Is…
Overly complex, both from a workflow and operational perspective.
Cannot meet availability requirements as structured.
Extremely difficult to scale, deploy and integrate new components.
Uses a mix of legacy, client/server Linux, HPC and standalone Windows.
On-prem / hybrid / cloud delineation is blurred.
PW CLOUD BASED ARCHITECTURE
Cloud Based Solutions Applied
Now have availability zones across multiple regions for potential worldwide site
deployments
Simple containerized deployment can be pushed to any cloud provider
Message bus allows faster and easier future integrations from disparate sites
just by pushing a container
Applied a cloud-centric model leveraging modern architectures including
containerized microservices
On-prem / cloud integration delineated, but seamless
In Summary
HPC Integrations CAN Be Simplified
Leverage cloud based architectures that maximize efficiencies of scale.
Middleware is important - Use a ubiquitous transport system to deliver data,
payloads, logs and control messages. Two use cases, two completely different
reasons for bringing messages to bear.
Break large tasks into smaller ones to decrease complexity and introduce
parallelism (where it makes sense).
Microservices can then be scaled independently as the needs arise.
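The "break large tasks into smaller ones" point can be sketched with nothing more than the standard library. This is a generic illustration under assumed names and sizes, not code from either engagement; it assumes a non-empty input and a reduction (here, a maximum) that can be combined from partial results.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, List, Sequence


def chunked(items: Sequence[float], size: int) -> Iterator[Sequence[float]]:
    """Split one large input into independent chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def summarize(chunk: Sequence[float]) -> float:
    """A small, independent unit of work: here, the maximum of one chunk."""
    return max(chunk)


def parallel_max(items: Sequence[float], chunk_size: int = 4, workers: int = 3) -> float:
    """Run the small tasks concurrently, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials: List[float] = list(pool.map(summarize, chunked(items, chunk_size)))
    return max(partials)


readings = [float(v) for v in range(1, 11)]  # hypothetical readings 1.0 .. 10.0
result = parallel_max(readings)
print(result)  # 10.0
```

Because each chunk is self-contained, the worker count can change without touching the task logic, which is the same property that lets microservices scale independently.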
Thank you.
arno.kolster@providentiaworldwide.com
@providentia_ww
www.providentiaworldwide.com

 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 

Recently uploaded (20)

unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 

Applying Cloud Techniques to Address Complexity in HPC System Integrations

  • 2. Arno Kolster, Principal & Co-Founder (Un)Obscured By Clouds Applying Cloud Techniques To Address Complexity In High Performance System Integrations
  • 3. What Prompted This Talk? Integrations between HPC and Enterprise have been coming for quite some time. HPC in the cloud is now a reality: easier and cheaper to use than ever before. But cloud is still a huge learning curve and paradigm shift for traditional HPC. FOMO on latest technical trends, technology itself and desire to increase ROI. Majority of our work has been leveraging our hyperscale/cloud experience in HPC environments; time to talk about ‘eating our own dog food’. As it is for enterprise embracing HPC tech, e.g. GPUs for ML/DL/AI.
  • 4. Use Case #1 Leadership Class Supercomputer Operations Integration
  • 5. “As we build faster and faster supercomputers, their energy efficiency becomes more and more important,” said Jim Rogers, computing and facilities director for ORNL’s National Center for Computational Sciences. “We wanted to couple Summit’s mechanical cooling system with its computational workload to optimize efficiency, which can translate to significant cost savings for a system of this size.”
  • 6. Drivers For Summit Cooling Intelligence: Summit is the largest water-cooled supercomputer in the world. Minor fluctuations in temperature across 4,608 nodes can have a dramatic positive or negative impact on Summit’s operational efficiency. No mechanism exists to monitor the temperature in real time so that systems operators can make adjustments as appropriate. It can take upwards of 3 minutes for a change in water temperature to reach the furthest node in the grid. Systems-level intelligence can inform decisions and impact costs, efficiency and operational excellence.
  • 7. SCI Design Goals: Architect, develop and implement a scalable platform to: Ingest line-rate telemetry from Summit & supporting systems in real time. Add data sets easily to enhance analytics and insight. Employ cloud-style ‘reliable messaging’ as a common interface to provide consistency and coherency for ‘in-flight’ data. Scale data by orders of magnitude without proportional staff increases. Obey the axiom: No single failure results in a single point-of-failure.
  • 8. Currently 4 Source-Data Metrics: TOTAL ~460,000 metrics per second available for real-time analytics. 1. IBM OpenBMC framework (99 metrics * 4,608 nodes per second = 456,192/sec). 2. IBM LSF jobs data for running applications (~10 sec). 3. NOAA weather/wet bulb for Oak Ridge area (continuous). 4. Programmable Logic Controller (PLC) water flow (continuous). Also: instrumentation and monitoring data for developed applications (ad hoc); summarized and collated metrics (continuous). Instrumented Ascent (dev cluster) at the same time using the same platform.
  • 9. SCI Architecture Created [Architecture diagram; labels: Event Message Bus. Data Sources: openBMC, NOAA Weather, LSF job data, PLC, Other Data. Persistence: Prometheus (Operational), Etcd/Consensus (Transient Config), ElasticSearch, Log (Unstructured Data), Relational DBs (Structured Data). Learning: Streaming Analytics, Machine Learning, Neural Networks (TensorFlow, Caffe, etc). Outputs: PLC Recommendations, User Dashboarding, Monitoring, Reporting, Research/Search.]
  • 10. Technologies Used & Integrated: High-availability microservices designed for a scalable, small footprint. Kafka message bus for ubiquitous message delivery. Docker containers for ease of deployment and elastic scalability. OpenShift/Kubernetes for container orchestration and instrumentation. Spark Streaming for on-the-wire data analytics. Prometheus for service discovery, metrics collection and time-series database. Jupyter Notebooks for models and data science exploration (Python/Spark). Grafana, Seaborn for visualization.
  • 11. Breakthroughs: Created a new paradigm for analyzing data beyond data-at-rest or data at the sensor. Data-in-flight is agnostic and can be captured prior to being translated into specific software formats, some of which can be proprietary. Overlapping data sets from disassociated sources can now be displayed on an identical scale (time), which is something new.
  • 12. What Can Be Done Now: Capture 99 node metrics per second for streaming analytics; for example: core temps for CPU, GPU, memory DIMMs, HBMs; temps by CPU core, GPU core; power to the node, fans and to other individual components. Visualize per-second node temperature+power and temperature+jobs (currently). Optimize operations by combining metrics from Summit, control systems & weather. Add more real-time data streams from varying sources to augment analytics. Analyze behavior from both a systems level and node level.
  • 13. Impact: Efficiencies in Summit ecosystem, job throughput and individual jobs. Micro-managed cooling leads to running at higher temps with less risk. Greater efficiency leads to Summit optimization, health and lower costs. Real-time analytics helps operators detect anomalies quickly and in a manner that is more responsive than querying a data warehouse. This approach is especially valuable with such a large investment in a critical asset. This platform captures all the data needed to conduct predictive analytics, thus ensuring new levels of insight, alerting and monitoring. Real-time data can ultimately help micro-manage such a large system in ways that are far more efficient and cost effective than human-only intervention.
  • 17. This Was Not Without Its Challenges… Long ramp-up time due to contract, legal & solidifying SOW requirements. Resource constraints on both sides due to scheduling and unforeseen delays. Had to work within the confines of the Summit security model (no direct access); containers saved the day. Limited computer resources due to hardware budget. Built everything shown on 3 nodes.
  • 18. But The Effort Was Worth It. Overlay additional disparate data sources in a production environment that is both scalable and elastic. Provide predictive analytics for water temperature control based on scheduled jobs in the queue that can provide input to the PLCs. Develop predictive algorithms that intelligently affect power/temp across multiple dimensions: CPU, GPU, memory, power. Determine how this platform can be utilized in Exascale systems. Provide interoperability between expert systems and components to create self-healing, scaling and operational control. Research how specific codes/algorithms affect job performance.
  • 19.
  • 20. Use Case #2 Aerospace Defense Contractor
  • 21. Problem Statement: Client wanted an understanding of cloud technologies and how to move certain workloads into the cloud. They had minimal cloud experience. Wanted on-site, hands-on training. Comparative analysis between major cloud providers. Alleviate concerns around security, data transfer, multi-tenancy, account management.
  • 22. Solution Provided: Designed a custom two-day ‘Cloud 101’ course tailored to the project the client desired to migrate into the cloud. Engagement model included weekly consultation for ongoing feedback, technical direction and open dialog (“Are we doing the right thing?”). Showcased best practices that can be used on all cloud platforms. Presented how to think about cloud services as simplified tasks vs single monolithic HPC jobs.
  • 24. Client Solution Is… Overly complex, both from a workflow and operational perspective. Cannot meet availability requirements as structured. Extremely difficult to scale, deploy and integrate new components. Using mix of legacy, client/server Linux, HPC and standalone Windows. On-prem / hybrid / cloud delineation is blurred.
  • 25. PW CLOUD BASED ARCHITECTURE
  • 26.
  • 27. Cloud Based Solutions Applied: Now have availability zones across multiple regions for potential worldwide site deployments. Simple containerized deployment can be pushed to any cloud provider. Message bus allows faster and easier future integrations from disparate sites just by pushing a container. Applied a cloud-centric model leveraging modern architectures including containerized microservices. On-prem / cloud integration delineated, but seamless.
  • 29. HPC Integrations CAN Be Simplified: Leverage cloud based architectures that maximize efficiencies of scale. Middleware is important: use a ubiquitous transport system to deliver data, payloads, logs and control messages. Two use cases, two completely different reasons for bringing messages to bear. Break large tasks into smaller ones to decrease complexity and introduce parallelism (where it makes sense). Microservices can then be scaled independently as the needs arise.
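
The closing slide's two ideas, a common transport for all messages and large tasks broken into small independent ones, can be illustrated with a minimal sketch. This is not the platform described in the talk: the `MessageBus` class is a hypothetical in-process stand-in for a real bus such as Kafka, and the topic names, the `summarize_node` helper and the temperature readings are all made up for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Message = Dict[str, float]
Handler = Callable[[Message], None]


class MessageBus:
    """Toy in-process publish/subscribe bus (hypothetical; not Kafka)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers[topic].append(handler)

    def publish(self, topic: str, message: Message) -> int:
        """Deliver a message to every subscriber; return the delivery count."""
        handlers = self._handlers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


def summarize_node(readings: List[float]) -> float:
    """One small, independent task: the mean temperature of a single node."""
    return sum(readings) / len(readings)


def run_demo() -> Dict[str, List[Message]]:
    bus = MessageBus()
    received: Dict[str, List[Message]] = {"telemetry": [], "summary": []}
    # Consumers only know topic names, never the producers behind them.
    bus.subscribe("telemetry", received["telemetry"].append)
    bus.subscribe("summary", received["summary"].append)

    # Made-up per-node temperature readings in degrees Celsius.
    per_node = {0: [40.0, 42.0], 1: [50.0, 54.0], 2: [45.0, 45.0]}
    for node, temps in per_node.items():
        for temp in temps:
            bus.publish("telemetry", {"node": float(node), "temp_c": temp})

    # The per-node summaries are independent, so a worker pool can run them
    # in parallel; pool.map returns results in input order.
    with ThreadPoolExecutor(max_workers=3) as pool:
        means = list(pool.map(summarize_node, per_node.values()))
    for node, mean in zip(per_node, means):
        bus.publish("summary", {"node": float(node), "mean_temp_c": mean})
    return received


if __name__ == "__main__":
    for message in run_demo()["summary"]:
        print(message)
```

Because producers and consumers share only topic names, a new data source or a new consumer can be added without touching existing code, and the number of workers can be changed independently of either side.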