MODERNIZING THE HPC SYSTEM
SOFTWARE STACK
2021.05.28 Presenter: 陳佑昇 (YOU SHENG CHEN)
ALLEN, BENJAMIN S.; EZELL, MATTHEW A.;
PELTZ, PAUL; JACOBSEN, DOUG;
ROMAN, ERIC; LUENINGHOENER, CORY;
LOWELL WOFFORD, J.
Keywords: high performance computing, distributed computing, operating systems
THIS PAPER WAS PUBLISHED IN SC20
CONTENT
• Introduction
• Compute and service nodes within an HPC system
• The logical components for future HPC systems management
- Configuration management
- State management
- Orchestration
- Provisioning
• Conclusions
VOCABULARIES
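English | Chinese (English gloss)
Exascale HPC (high performance computing) | 百億億級高效能運算 (10^18 floating-point operations per second)
magnitudes | 量級 (orders of magnitude)
stagnate | 停滯 (to stop advancing)
monolithic | 龐大的 (huge; built as a single undivided unit)
orchestration | 調度 (coordination; a computing technique)
remediation | 糾正 (correction)
stateless service | 無狀態服務 (a service that keeps no state)
transparent | 顯然的 (obvious; in the paper's usage, not noticeable to the user)
intervention | 干預 (interference, stepping in)
hierarchical | 分層的 (layered)
credential | 憑據 (proof of identity)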
INTRODUCTION
Mid-1990s
• The US DOE had the largest HPC systems.
By around the 2010s
• HPC systems were eclipsed in scale by web-scale and cloud computing technologies.
The paper contends that HPC needs a modern system software stack that focuses on manageability, scalability, security, and modern methods, and it makes recommendations for the HPC community.
COMPUTE AND SERVICE NODES
WITHIN AN HPC SYSTEM
Minimal OS
Cluster Services
Jobs
• Provide for stateless service
• Mount from network
• Hierarchically managed
• Easy to copy
• Containerized environment (resolves conflicts)
Minimal OS
• Current State
Most minimal OS distributions are targeted toward microservices environments.
• Manageability & Serviceability
A. Reduced code base
- Include only the kernel and base services
B. Reduced image configuration
- Simplified node image configurations
- Lower node boot time when moving configuration
• Scalability & Resiliency
- Easier to logically separate a node’s
- A layer in which a center can include sandboxing
- Automatic remediation tools
• Implementation
- Kernel, kernel modules, and hardware support
- Initial ramdisk
- Read-only root filesystem image
- Boot-time OS configuration
Cluster Services
• Current State
- Multiple copies need to run at the same time
- Request monitoring and system managers are not easy to work with
• Manageability & Serviceability
- Containerization/virtualization
- Minimal OS
- Service profiling
- Visibility into operations
• Scalability & Resiliency
Resiliency
Components can be quickly started, restarted, and replaced without affecting a running system.
Failure Modes
A failure in a service node should not result in failures of client nodes.
Cluster independence
A service can be used for more than one logical cluster at a time.
- Transparent load balancing
- Automatic scalability
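The failure-mode and load-balancing points above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions, not the paper's design: the `ServiceNode` and `LoadBalancer` classes and the node names are hypothetical, and a real cluster service would sit behind a network endpoint rather than an in-process object.

```python
import itertools


class ServiceNode:
    """Hypothetical stand-in for one replica of a cluster service."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def handle(self, request):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} served {request}"


class LoadBalancer:
    """Round-robin balancer that skips failed replicas (illustrative only)."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self._cycle = itertools.cycle(self.nodes)

    def call(self, request):
        # Try each replica at most once per request.
        for _ in range(len(self.nodes)):
            node = next(self._cycle)
            try:
                return node.handle(request)
            except ConnectionError:
                continue  # the client never sees this replica's failure
        raise RuntimeError("no healthy service node available")


balancer = LoadBalancer([ServiceNode("svc-a"), ServiceNode("svc-b", healthy=False)])
replies = [balancer.call(f"req-{i}") for i in range(3)]
```

Because the balancer retries on another replica, the failed `svc-b` does not surface as a client failure; only the loss of every replica does.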
Jobs
• Current State
- Container adoption has been slow in the HPC community
- Both full-service and lightweight containers exist
• Usability
1. Standardize on a single container image format that can work on any system
2. Provide transparent containerization to one
• Manageability & Serviceability
- User environment upgrades
- User environment flexibility
- Operating system separation
• Implementation
- Providing a very basic level of support requires the ability to start jobs on compute nodes with a minimal set of Linux namespaces in use
- HPC systems often have dependencies that cross that boundary
THE LOGICAL COMPONENTS
FOR FUTURE HPC SYSTEMS
MANAGEMENT
- Configuration management
- State management
- Orchestration
- Provisioning
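- Configuration management
• Manageability & Serviceability (automate policy enforcement, unify the API interface, and support settings for different environments)
• Scalability & Resiliency (use asynchronous operations)
• Modern methods (use version control)
• Security (firewall control and key management)
• Current State (the technology is mature; only security management still needs strengthening)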
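As a rough illustration of automated configuration management, the sketch below compares a desired configuration with a node's actual one and applies only the differences, so repeated runs are harmless. The function names, setting keys, and values are hypothetical; the slides do not name a specific tool, and in practice the desired settings would come from a version-controlled file.

```python
def plan_changes(desired, actual):
    """Return the settings that must change to reach the desired state."""
    return {key: value for key, value in desired.items() if actual.get(key) != value}


def converge(desired, actual):
    """Apply only the needed changes; a second run finds nothing left to do."""
    changes = plan_changes(desired, actual)
    updated = dict(actual)
    updated.update(changes)
    return updated, changes


# Hypothetical settings for one node, e.g. loaded from a version-controlled file.
desired = {"ntp_server": "ntp.example.org", "firewall": "enabled"}
actual = {"ntp_server": "ntp.example.org", "firewall": "disabled"}

actual, first_run = converge(desired, actual)
actual, second_run = converge(desired, actual)
```

The first run reports the single firewall change, and the second run reports none, which is the property that makes unattended, repeated application safe.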
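- State management
• Manageability & Serviceability (provide a reliable method for state management)
• Scalability & Resiliency (managed state allows events to be handled consistently)
• Implementation (deploy a state management server)
• Current State (state management is often overlooked in HPC)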
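One way to picture a state management server is a central record of each node's state that accepts only valid transitions and logs every change as an event. The sketch below is a minimal in-memory version under that assumption; the `NodeState` values, the transition table, and the `StateServer` class are hypothetical and not taken from the paper.

```python
from enum import Enum


class NodeState(Enum):
    OFF = "off"
    BOOTING = "booting"
    READY = "ready"
    FAILED = "failed"


# Hypothetical transition table; a real system would define its own.
ALLOWED = {
    NodeState.OFF: {NodeState.BOOTING},
    NodeState.BOOTING: {NodeState.READY, NodeState.FAILED},
    NodeState.READY: {NodeState.OFF, NodeState.FAILED},
    NodeState.FAILED: {NodeState.OFF},
}


class StateServer:
    """In-memory stand-in for a central state management server."""

    def __init__(self):
        self.states = {}
        self.events = []

    def set_state(self, node, new_state):
        # Nodes that were never reported are treated as OFF.
        old_state = self.states.get(node, NodeState.OFF)
        if new_state not in ALLOWED[old_state]:
            raise ValueError(f"{node}: {old_state.value} -> {new_state.value} not allowed")
        self.states[node] = new_state
        # Every accepted change is recorded the same way, so handling stays consistent.
        self.events.append((node, old_state, new_state))


server = StateServer()
server.set_state("node001", NodeState.BOOTING)
server.set_state("node001", NodeState.READY)
```

Rejecting invalid transitions at one central point is what lets other components trust the recorded state.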
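- Orchestration
• Manageability & Serviceability (apply updates, system control, and recovery across systems)
• Scalability & Resiliency (orchestrate appropriate task logic)
• Modern methods (provide API access)
• Implementation (control and implement full automation)
• Current State (automated systems are still uncommon in HPC)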
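A rolling update is one common example of orchestrated task logic: nodes are updated in small batches, and the rollout halts if a batch fails its health check. The Python sketch below shows the idea only; `rolling_update`, the node names, the version labels, and the simulated failure of `node004` are all hypothetical.

```python
def rolling_update(nodes, batch_size, update, is_healthy):
    """Update nodes batch by batch; stop at the first unhealthy batch."""
    done = []
    for start in range(0, len(nodes), batch_size):
        batch = nodes[start:start + batch_size]
        for node in batch:
            update(node)
        failed = [node for node in batch if not is_healthy(node)]
        if failed:
            # Halt so the remaining nodes keep running the old version.
            return done, failed
        done.extend(batch)
    return done, []


versions = {f"node{i:03d}": "v1" for i in range(1, 6)}


def update(node):
    versions[node] = "v2"


def is_healthy(node):
    # Hypothetical health check: pretend node004 fails after its update.
    return node != "node004"


done, failed = rolling_update(sorted(versions), 2, update, is_healthy)
```

Here the first batch completes, the second batch reports `node004` as failed, and `node005` is never touched, which limits the impact of a bad update.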
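- Provisioning
• Manageability & Serviceability (simple, automated provisioning)
• Scalability & Resiliency (fast boot requirements, node discovery)
• Security (produce immutable, read-only image files)
• Implementation (node discovery, image generation, and image transfer)
• Current State (existing tools mainly target enterprise deployments)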
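One way to check that a transferred image is still the image that was built is to record a digest at build time and verify it before boot. The sketch below assumes SHA-256 via Python's standard `hashlib`; the helper names and the placeholder image bytes are hypothetical, and the slides do not say which mechanism the paper uses.

```python
import hashlib


def image_digest(image_bytes):
    """Return the SHA-256 hex digest of an image's contents."""
    return hashlib.sha256(image_bytes).hexdigest()


def verify_image(image_bytes, expected_digest):
    """Accept the image only if it matches the digest recorded at build time."""
    return image_digest(image_bytes) == expected_digest


# Hypothetical image contents; a real image would be read from a file.
built_image = b"kernel+initramfs+rootfs"
recorded = image_digest(built_image)

ok = verify_image(built_image, recorded)
tampered = verify_image(built_image + b"!", recorded)
```

An unchanged image verifies, while any modification in transit changes the digest and is rejected.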
CONCLUSIONS
• A variety of practices can be adopted to make HPC systems more manageable, serviceable, scalable, resilient, and secure.
• Many can translate to this model with minimal effort due to their horizontal scaling features.
• There is a lot of potential in moving toward containerized workflows in both of these areas.
THANK YOU

From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 

Paper sharing_Modernizing the HPC system

  • 1. MODERNIZING THE HPC SYSTEM SOFTWARE STACK. 2021.05.28. Presenter: 陳佑昇. Authors: ALLEN, BENJAMIN S.; EZELL, MATTHEW A.; PELTZ, PAUL; JACOBSEN, DOUG; ROMAN, ERIC; LUENINGHOENER, CORY; LOWELL WOFFORD, J.
  • 2. Keywords: high performance computing, distributed computing, operating systems. This paper was published in SC20.
  • 3. CONTENT • Introduction • Compute and service nodes within an HPC system • The logical components for future HPC systems management - configuration management - state management - orchestration - provisioning • Conclusions
  • 4. VOCABULARY (English / Chinese): Exascale HPC (high performance computing) / 百億億級高效能運算 (10^18 浮點); magnitudes / 量級; stagnate / 停滯; monolithic / 龐大的; orchestration / 調度 (一種電腦技術); remediation / 糾正; stateless service / 無狀態服務; transparent / 顯然的; intervention / 干預; hierarchical / 分層的; credential / 憑據
  • 5. INTRODUCTION. Mid-1990s: • The US DOE had the largest HPC systems. By around the 2010s: • HPC was eclipsed by the scale of web-scale and cloud computing technology. The paper contends that HPC needs a modern system software stack that focuses on manageability, scalability, security, and modern methods, and it makes recommendations for the HPC community.
  • 6. COMPUTE AND SERVICE NODES WITHIN AN HPC SYSTEM. Layers: Minimal OS, Cluster Services, Jobs • Provides stateless service • Mounted from the network • Hierarchically managed • Easy to copy • Containerized environment (resolves conflicts)
  • 7. COMPUTE AND SERVICE NODES WITHIN AN HPC SYSTEM: Minimal OS • Current State: Most minimal OS distributions are targeted toward microservices environments. • Manageability & Serviceability: A. Reduced code base - Only include the kernel and base services B. Reduced image configuration - Simplified node image configurations - Lower node boot time when moving configuration
  • 8. COMPUTE AND SERVICE NODES WITHIN AN HPC SYSTEM: Minimal OS • Scalability & Resiliency - Easier to logically separate a node’s - Have a layer in which a center can include sandboxing - Automatic remediation tools • Implementation - Kernel, kernel modules, and hardware support - Initial ramdisk - Read-only root filesystem image - Boot-time OS configuration
  • 9. COMPUTE AND SERVICE NODES WITHIN AN HPC SYSTEM: Cluster Services • Current State - Multiple copies need to run at the same time - Request monitoring and system managers are not easy to work with • Manageability & Serviceability - Containerization/Virtualization - Minimal OS - Service profiling - Visibility into operations
  • 10. COMPUTE AND SERVICE NODES WITHIN AN HPC SYSTEM: Cluster Services • Scalability & Resiliency. Resiliency: components can be quickly started, restarted, and replaced without affecting a running system. Failure modes: a failure in a service node should not result in failures of client nodes. Cluster independence: a service can be used for more than one logical cluster at a time. - Transparent load balancing - Automatic scalability
  • 11. COMPUTE AND SERVICE NODES WITHIN AN HPC SYSTEM: Jobs • Current State - Containers have developed slowly in the HPC community - There are full-service and lightweight containers • Usability 1. Standardize on a single container image format that can work on any system 2. Provide transparent containerization to one
  • 12. COMPUTE AND SERVICE NODES WITHIN AN HPC SYSTEM: Jobs • Manageability & Serviceability - User environment upgrades - User environment flexibility - Operating system separation • Implementation - A very basic level of support requires the ability to start jobs on compute nodes with a minimal set of Linux namespaces in use. HPC systems often have dependencies that cross that boundary.
  • 13. THE LOGICAL COMPONENTS FOR FUTURE HPC SYSTEMS MANAGEMENT: Configuration management, State management, Orchestration, Provisioning
  • 14. THE LOGICAL COMPONENTS FOR FUTURE HPC SYSTEMS MANAGEMENT - Configuration management • Manageability & Serviceability (implement policy-driven automation, a unified API interface, and settings for different environments) • Scalability & Resiliency (implement asynchronous operations) • Modern methods (implement version control) • Security (implement firewall control and key management) • Current State (the technology is already mature; only security management still needs strengthening)
  • 15. THE LOGICAL COMPONENTS FOR FUTURE HPC SYSTEMS MANAGEMENT - State management • Manageability & Serviceability (provide a reliable method of state management) • Scalability & Resiliency (managed state allows events to be handled consistently) • Implementation (implement a state management server) • Current State (this aspect of state management is often neglected in HPC)
  • 16. THE LOGICAL COMPONENTS FOR FUTURE HPC SYSTEMS MANAGEMENT - Orchestration • Manageability & Serviceability (carry out updates, system control, and recovery across systems) • Scalability & Resiliency (orchestrate the appropriate task logic) • Modern methods (provide access through an API) • Implementation (control the system and implement full automation) • Current State (automated systems are still uncommon in HPC)
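  • 17. THE LOGICAL COMPONENTS FOR FUTURE HPC SYSTEMS MANAGEMENT - Provisioning • Manageability & Serviceability (implement simple, automated provisioning) • Scalability & Resiliency (need for fast boot; node discovery) • Security (produce immutable, read-only images) • Implementation (node discovery, image generation, and image transfer) • Current State (existing tools mainly target enterprise deployments)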
  • 18. CONCLUSIONS • A variety of practices can be beneficial to adopt to make HPC systems more manageable, serviceable, scalable, resilient, and secure. • Many can translate to this model with minimal effort due to their horizontal scaling features. • There is a lot of potential in moving toward containerized workflows in both of these areas.
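
As an illustration of the state management and orchestration slides (15 and 16), the idea of comparing a desired state with an observed state and deriving the actions to close the gap can be sketched as below. This is a minimal sketch, not the paper's implementation: the `NodeState` fields, the `plan_actions` helper, the node names, the image names, and the action strings are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeState:
    """Hypothetical per-node state record (fields are assumptions)."""
    image: str        # name of the read-only OS image the node runs
    powered_on: bool  # whether the node is powered on


def plan_actions(desired, observed):
    """Return (node, action) pairs that move observed state toward desired state.

    Nodes are visited in sorted order so the plan is deterministic.
    """
    actions = []
    for node in sorted(desired):
        want = desired[node]
        have = observed.get(node)
        if have is None:
            # The node has never reported in, so it must be discovered first.
            actions.append((node, "discover"))
            continue
        if have.image != want.image:
            actions.append((node, f"reimage:{want.image}"))
        if have.powered_on != want.powered_on:
            actions.append((node, "power_on" if want.powered_on else "power_off"))
    return actions


# Hypothetical cluster: three nodes wanted, two currently known.
desired = {
    "cn01": NodeState("compute-v2", True),
    "cn02": NodeState("compute-v2", True),
    "cn03": NodeState("compute-v2", False),
}
observed = {
    "cn01": NodeState("compute-v2", True),
    "cn02": NodeState("compute-v1", False),
}

for node, action in plan_actions(desired, observed):
    print(node, action)
```

An orchestrator built along these lines would run the comparison repeatedly and hand each action to the provisioning or power-control layer; keeping the planning step a pure function makes it easy to test without touching real hardware.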