SlideShare a Scribd company logo
1 of 16
Download to read offline
Japan’s post K Computer
Yutaka Ishikawa
Project Leader
RIKEN AICS
HPC User Forum, 7th September, 2016
Outline of Talk
 Introduction of FLAGSHIP2020 project
 An Overview of post K system
 Concluding Remarks
2016/09/07 2HP User Forum @ Austin
An Overview of Flagship 2020 project
 Developing the next Japanese flagship
computer, so-called “post K”
 Developing a wide range of application
codes, to run on the “post K”, to solve
major social and science issues
Vendor partner
The Japanese government selected 9
social & scientific priority issues and
their R&D organizations.
3
Co-design
4
Target ApplicationsArchitectural Parameters
• #SIMD, SIMD length, #core, #NUMA node
• cache (size and bandwidth)
• memory technologies
• specialized hardware
• Interconnect
• I/O network
Target Applications’ Characteristics
5
Target Application
Program Brief description Co-design
① GENESIS MD for proteins Collective comm. (all-to-all), Floating point perf (FPP)
② Genomon Genome processing (Genome alignment) File I/O, Integer Perf.
③ GAMERA
Earthquake simulator (FEM in unstructured
& structured grid)
Comm., Memory bandwidth
④ NICAM+LETK
Weather prediction system using Big data
(structured grid stencil & ensemble Kalman
filter)
Comm., Memory bandwidth, File I/O, SIMD
⑤ NTChem molecular electronic (structure calculation) Collective comm. (all-to-all, allreduce), FPP, SIMD,
⑥ FFB Large Eddy Simulation (unstructured grid) Comm., Memory bandwidth,
⑦ RSDFT
an ab-initio program (density functional
theory)
Collective comm. (bcast), FFP
⑧ Adventure
Computational Mechanics System for Large
Scale Analysis and Design (unstructured
grid)
Comm., Memory bandwidth, SIMD
⑨ CCS-QCD
Lattice QCD simulation (structured grid
Monte Carlo)
Comm., Memory bandwidth, Collective comm. (allreduce)
Co-design
6
Architectural Parameters
• #SIMD, SIMD length, #core, #NUMA node
• cache (size and bandwidth)
• memory technologies
• specialized hardware
• Interconnect
• I/O network
Target Applications
 Mutual understanding both
computer architecture/system software and applications
 Looking at performance predictions
 Finding out the best solution with constraints, e.g., power
consumption, budget, and space
Prediction of node-level
performance
Profiling applications,
e.g., cache misses and
execution unit usages
Prediction Tool
Prediction of scalability
(Communication cost)
R&D Organization
7
• DOE-MEXT
• JLESC
• …
International
Collaboration
• HPCI Consortium
• PC Cluster Consortium
• OpenHPC
• …
Communities
• Univ. of Tsukuba
• Univ. of Tokyo
• Univ. of Kyoto
Domestic
Collaboration
Outline of Talk
 Introduction of FLAGSHIP2020 project
 An Overview of post K system
 Concluding Remarks
2016/09/07 8HP User Forum @ Austin
An Overview of post K
 Hardware
 Manycore architecture
 6D mesh/torus Interconnect
 3-level hierarchical storage system
 Silicon Disk
 Magnetic Disk
 Storage for archive
Login
Servers
Maintenance
Servers
I/O Network
……
…
…
…
…
…
…
…
…
…
… Hierarchical
Storage System
Portal
Servers
 System Software
 Multi-Kernel: Linux with Light-weight Kernel
 File I/O middleware for 3-level hierarchical storage
system and application
 Application-oriented file I/O middleware
 MPI+OpenMP programming environment
 Highly productive programing language and libraries
2016/09/07 9HP User Forum @ Austin
What we have done
 Software
 OS functional design
 Communication functional
design
 File I/O functional design
 Programming languages
 Mathematical libraries
• Node architecture
• System configuration
• Storage system
Continue to design
 Hardware
 Instruction set architecture
2016/09/07 10HP User Forum @ Austin
Instruction Set Architecture
 ARM V8 HPC Extension
 Fujitsu is a lead partner of ARM HPC extension development
 Detailed features were announced at Hot Chips 28 - 2016
http://www.hotchips.org/program/
Mon 8/22 Day1 9:45AM GPUs & HPCs
ARMv8-A Next Generation Vector Architecture for
HPC
 Fujitsu’s inheritances
 FMA
 Math acceleration primitives
 Inter core barrier
 Sector cache
 Hardware prefetch assist
SVE (Scalable Vector Extension)
2016/09/07 11HP User Forum @ Austin
OS Kernel
 Requirements of OS Kernel targeting high-end HPC
 Noiseless execution environment for bulk-synchronous applications
 Ability to easily adapt to new/future system architectures
 E.g.: manycore CPUs, heterogenous core architectures, deep memory
hierarchy, etc.
- New process/thread management, memory management, …
 Ability to adapt to new/future application demand
 Big-Data, in-situ applications
- Support data flow from Internet devices to compute nodes
- Optimize data movement
Approach Pros. Cons.
Full-Weight
Kernel
(FWK)
e.g. Linux
Disabling, removing, tuning,
reimplementation, and adding
new features
Large community support results in
rapid new hardware adaptation
• Hard to implement a new feature if the
original mechanism is conflicted with
the new feature
• Hard to follow the latest kernel
distribution due to local large
modifications
Light-Weight
Kernel
(LWK)
Implementation from scratch
and adding new features
Easy to extend it because of small in
terms of logic and code size
• Applications, running on FWK, cannot
run always in LWK
• Small community maintenance limits
rapid growth
• Lack of device drivers
12
Our Approach:
Linux with Light-Weight Kernel
 Partition resources (CPU cores,
memory)
 Full Linux kernel on some cores
 System daemons and in-situ non
HPC applications
 Device drivers
 Light-weight kernel(LWK),
McKernel on other cores
 HPC applications
McKernel developed at RIKEN
 McKernel is loadable module of Linux
 McKernel supports Linux API
 McKernel runs on
 Intel Xeon and Xeon phi
 Fujitsu FX10
13
Very simple
memory
management
Thin LWK
Process/Thread
management
General
scheduler
Complex
Mem. Mngt.
Linux
TCP stack
Dev. Drivers
VFS
File Sys
Driers
Memory
… …
Interrupt
System
daemons
?
HPC Applications
PartitionPartition
In-situ non HPC application
Linux API (glibc, /sys/, /proc/)
Core Core Core Core Core Core
McKernel is deployed to the Oakforest-PACS
supercomputer, 25 PF in peak, at JCAHPC
organized by U. of Tsukuba and U. of Tokyo
2016/09/07 HP User Forum @ Austin
OS: McKernel
14
• Linux Kernel+Loadable LWK, McKernel
– Linux Kernel is resident, and daemons for job scheduler and etc. run on Linux
– McKernel is dynamically reloaded (rebooted) for each application
Finish
App A, requiring
LWK-without-
scheduler, Is invoked
App B, requiring
LWK-with-scheduler,
Is invoked
Finish
AppC,usingfullLinux
capability,Isinvoked
Finish
2016/09/07 HP User Forum @ Austin
OS: McKernel
15
https://asc.llnl.gov/sequoia/benchmarks
Linux with isolcpus McKernel
Results of FWQ (Fixed Work Quanta)
2016/09/07 HP User Forum @ Austin
Concluding Remarks
16
 Post K’s CPU is based on ARM V8 with HPC extension
 The usability will be improved than the K computer by changing
architecture
 More wide-range community support
 The system software stack for Post K is being designed and implemented
with the leverage of international collaborations
 The software stack developed at RIKEN is Open source
 It also runs on Intel Xeon and Xeon phi
 RIKEN would like to contribute to OpenHPC
2016/09/07 HP User Forum @ Austin

More Related Content

What's hot

What's hot (20)

ARM HPC Ecosystem
ARM HPC EcosystemARM HPC Ecosystem
ARM HPC Ecosystem
 
BXI: Bull eXascale Interconnect
BXI: Bull eXascale InterconnectBXI: Bull eXascale Interconnect
BXI: Bull eXascale Interconnect
 
@IBM Power roadmap 8
@IBM Power roadmap 8 @IBM Power roadmap 8
@IBM Power roadmap 8
 
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
 
An Update on Arm HPC
An Update on Arm HPCAn Update on Arm HPC
An Update on Arm HPC
 
Arm in HPC
Arm in HPCArm in HPC
Arm in HPC
 
POWER9 for AI & HPC
POWER9 for AI & HPCPOWER9 for AI & HPC
POWER9 for AI & HPC
 
Overview of HPC Interconnects
Overview of HPC InterconnectsOverview of HPC Interconnects
Overview of HPC Interconnects
 
OpenPOWER Latest Updates
OpenPOWER Latest UpdatesOpenPOWER Latest Updates
OpenPOWER Latest Updates
 
Hardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and MLHardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and ML
 
Sx 6-single-node
Sx 6-single-nodeSx 6-single-node
Sx 6-single-node
 
SCFE 2020 OpenCAPI presentation as part of OpenPWOER Tutorial
SCFE 2020 OpenCAPI presentation as part of OpenPWOER TutorialSCFE 2020 OpenCAPI presentation as part of OpenPWOER Tutorial
SCFE 2020 OpenCAPI presentation as part of OpenPWOER Tutorial
 
Huawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Huawei’s requirements for the ARM based HPC solution readiness - Joshua MoraHuawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Huawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
 
NNSA Explorations: ARM for Supercomputing
NNSA Explorations: ARM for SupercomputingNNSA Explorations: ARM for Supercomputing
NNSA Explorations: ARM for Supercomputing
 
Implementing AI: High Performance Architectures: Large scale HPC hardware in ...
Implementing AI: High Performance Architectures: Large scale HPC hardware in ...Implementing AI: High Performance Architectures: Large scale HPC hardware in ...
Implementing AI: High Performance Architectures: Large scale HPC hardware in ...
 
Ac922 cdac webinar
Ac922 cdac webinarAc922 cdac webinar
Ac922 cdac webinar
 
OpenHPC: A Comprehensive System Software Stack
OpenHPC: A Comprehensive System Software StackOpenHPC: A Comprehensive System Software Stack
OpenHPC: A Comprehensive System Software Stack
 
Open Hardware and Future Computing
Open Hardware and Future ComputingOpen Hardware and Future Computing
Open Hardware and Future Computing
 
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta VekariaArm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
 
POWER10 innovations for HPC
POWER10 innovations for HPCPOWER10 innovations for HPC
POWER10 innovations for HPC
 

Similar to Japan's post K Computer

Arm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AI
Arm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AIArm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AI
Arm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AI
inside-BigData.com
 
Exploring emerging technologies in the HPC co-design space
Exploring emerging technologies in the HPC co-design spaceExploring emerging technologies in the HPC co-design space
Exploring emerging technologies in the HPC co-design space
jsvetter
 
QPACE - QCD Parallel Computing on the Cell Broadband Engine™ (Cell/B.E.)
QPACE - QCD Parallel Computing on the Cell Broadband Engine™ (Cell/B.E.)QPACE - QCD Parallel Computing on the Cell Broadband Engine™ (Cell/B.E.)
QPACE - QCD Parallel Computing on the Cell Broadband Engine™ (Cell/B.E.)
Heiko Joerg Schick
 
OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...
OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...
OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...
NETWAYS
 
Ap 06 4_10_simek
Ap 06 4_10_simekAp 06 4_10_simek
Ap 06 4_10_simek
Nguyen Vinh
 

Similar to Japan's post K Computer (20)

The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
 
OpenPOWER Acceleration of HPCC Systems
OpenPOWER Acceleration of HPCC SystemsOpenPOWER Acceleration of HPCC Systems
OpenPOWER Acceleration of HPCC Systems
 
Arm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AI
Arm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AIArm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AI
Arm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AI
 
NWU and HPC
NWU and HPCNWU and HPC
NWU and HPC
 
Journal Seminar: Is Singularity-based Container Technology Ready for Running ...
Journal Seminar: Is Singularity-based Container Technology Ready for Running ...Journal Seminar: Is Singularity-based Container Technology Ready for Running ...
Journal Seminar: Is Singularity-based Container Technology Ready for Running ...
 
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storageWebinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
 
Systems Support for Many Task Computing
Systems Support for Many Task ComputingSystems Support for Many Task Computing
Systems Support for Many Task Computing
 
LPC4300_two_cores
LPC4300_two_coresLPC4300_two_cores
LPC4300_two_cores
 
Exploring emerging technologies in the HPC co-design space
Exploring emerging technologies in the HPC co-design spaceExploring emerging technologies in the HPC co-design space
Exploring emerging technologies in the HPC co-design space
 
QPACE - QCD Parallel Computing on the Cell Broadband Engine™ (Cell/B.E.)
QPACE - QCD Parallel Computing on the Cell Broadband Engine™ (Cell/B.E.)QPACE - QCD Parallel Computing on the Cell Broadband Engine™ (Cell/B.E.)
QPACE - QCD Parallel Computing on the Cell Broadband Engine™ (Cell/B.E.)
 
ODP Presentation LinuxCon NA 2014
ODP Presentation LinuxCon NA 2014ODP Presentation LinuxCon NA 2014
ODP Presentation LinuxCon NA 2014
 
Designing HPC & Deep Learning Middleware for Exascale Systems
Designing HPC & Deep Learning Middleware for Exascale SystemsDesigning HPC & Deep Learning Middleware for Exascale Systems
Designing HPC & Deep Learning Middleware for Exascale Systems
 
OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...
OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...
OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...
 
Seminar Accelerating Business Using Microservices Architecture in Digital Age...
Seminar Accelerating Business Using Microservices Architecture in Digital Age...Seminar Accelerating Business Using Microservices Architecture in Digital Age...
Seminar Accelerating Business Using Microservices Architecture in Digital Age...
 
05 Preparing for Extreme Geterogeneity in HPC
05 Preparing for Extreme Geterogeneity in HPC05 Preparing for Extreme Geterogeneity in HPC
05 Preparing for Extreme Geterogeneity in HPC
 
A64fx and Fugaku - A Game Changing, HPC / AI Optimized Arm CPU to enable Exas...
A64fx and Fugaku - A Game Changing, HPC / AI Optimized Arm CPU to enable Exas...A64fx and Fugaku - A Game Changing, HPC / AI Optimized Arm CPU to enable Exas...
A64fx and Fugaku - A Game Changing, HPC / AI Optimized Arm CPU to enable Exas...
 
Ap 06 4_10_simek
Ap 06 4_10_simekAp 06 4_10_simek
Ap 06 4_10_simek
 
Designing HPC, Deep Learning, and Cloud Middleware for Exascale Systems
Designing HPC, Deep Learning, and Cloud Middleware for Exascale SystemsDesigning HPC, Deep Learning, and Cloud Middleware for Exascale Systems
Designing HPC, Deep Learning, and Cloud Middleware for Exascale Systems
 
Superfluid Deployment of Virtual Functions: Exploiting Mobile Edge Computing ...
Superfluid Deployment of Virtual Functions: Exploiting Mobile Edge Computing ...Superfluid Deployment of Virtual Functions: Exploiting Mobile Edge Computing ...
Superfluid Deployment of Virtual Functions: Exploiting Mobile Edge Computing ...
 
From Rack scale computers to Warehouse scale computers
From Rack scale computers to Warehouse scale computersFrom Rack scale computers to Warehouse scale computers
From Rack scale computers to Warehouse scale computers
 

More from inside-BigData.com

Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
inside-BigData.com
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networks
inside-BigData.com
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
inside-BigData.com
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecasts
inside-BigData.com
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuning
inside-BigData.com
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Acceleration
inside-BigData.com
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Cluster
inside-BigData.com
 
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
inside-BigData.com
 

More from inside-BigData.com (20)

Major Market Shifts in IT
Major Market Shifts in ITMajor Market Shifts in IT
Major Market Shifts in IT
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networks
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networks
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecasts
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Update
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuning
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
 
State of ARM-based HPC
State of ARM-based HPCState of ARM-based HPC
State of ARM-based HPC
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Acceleration
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Era
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computing
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Cluster
 
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
 

Recently uploaded

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 

Recently uploaded (20)

ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 

Japan's post K Computer

  • 1. Japan’s post K Computer Yutaka Ishikawa Project Leader RIKEN AICS HPC User Forum, 7th September, 2016
  • 2. Outline of Talk  Introduction of FLAGSHIP2020 project  An Overview of post K system  Concluding Remarks 2016/09/07 2HP User Forum @ Austin
  • 3. An Overview of Flagship 2020 project  Developing the next Japanese flagship computer, so-called “post K”  Developing a wide range of application codes, to run on the “post K”, to solve major social and science issues Vendor partner The Japanese government selected 9 social & scientific priority issues and their R&D organizations. 3
  • 4. Co-design 4 Target ApplicationsArchitectural Parameters • #SIMD, SIMD length, #core, #NUMA node • cache (size and bandwidth) • memory technologies • specialized hardware • Interconnect • I/O network
  • 5. Target Applications’ Characteristics 5 Target Application Program Brief description Co-design ① GENESIS MD for proteins Collective comm. (all-to-all), Floating point perf (FPP) ② Genomon Genome processing (Genome alignment) File I/O, Integer Perf. ③ GAMERA Earthquake simulator (FEM in unstructured & structured grid) Comm., Memory bandwidth ④ NICAM+LETK Weather prediction system using Big data (structured grid stencil & ensemble Kalman filter) Comm., Memory bandwidth, File I/O, SIMD ⑤ NTChem molecular electronic (structure calculation) Collective comm. (all-to-all, allreduce), FPP, SIMD, ⑥ FFB Large Eddy Simulation (unstructured grid) Comm., Memory bandwidth, ⑦ RSDFT an ab-initio program (density functional theory) Collective comm. (bcast), FFP ⑧ Adventure Computational Mechanics System for Large Scale Analysis and Design (unstructured grid) Comm., Memory bandwidth, SIMD ⑨ CCS-QCD Lattice QCD simulation (structured grid Monte Carlo) Comm., Memory bandwidth, Collective comm. (allreduce)
  • 6. Co-design 6 Architectural Parameters • #SIMD, SIMD length, #core, #NUMA node • cache (size and bandwidth) • memory technologies • specialized hardware • Interconnect • I/O network Target Applications  Mutual understanding both computer architecture/system software and applications  Looking at performance predictions  Finding out the best solution with constraints, e.g., power consumption, budget, and space Prediction of node-level performance Profiling applications, e.g., cache misses and execution unit usages Prediction Tool Prediction of scalability (Communication cost)
  • 7. R&D Organization 7 • DOE-MEXT • JLESC • … International Collaboration • HPCI Consortium • PC Cluster Consortium • OpenHPC • … Communities • Univ. of Tsukuba • Univ. of Tokyo • Univ. of Kyoto Domestic Collaboration
  • 8. Outline of Talk  Introduction of FLAGSHIP2020 project  An Overview of post K system  Concluding Remarks 2016/09/07 8HP User Forum @ Austin
  • 9. An Overview of post K  Hardware  Manycore architecture  6D mesh/torus Interconnect  3-level hierarchical storage system  Silicon Disk  Magnetic Disk  Storage for archive Login Servers Maintenance Servers I/O Network …… … … … … … … … … … … Hierarchical Storage System Portal Servers  System Software  Multi-Kernel: Linux with Light-weight Kernel  File I/O middleware for 3-level hierarchical storage system and application  Application-oriented file I/O middleware  MPI+OpenMP programming environment  Highly productive programing language and libraries 2016/09/07 9HP User Forum @ Austin
  • 10. What we have done  Software  OS functional design  Communication functional design  File I/O functional design  Programming languages  Mathematical libraries • Node architecture • System configuration • Storage system Continue to design  Hardware  Instruction set architecture 2016/09/07 10HP User Forum @ Austin
  • 11. Instruction Set Architecture  ARM V8 HPC Extension  Fujitsu is a lead partner of ARM HPC extension development  Detailed features were announced at Hot Chips 28 - 2016 http://www.hotchips.org/program/ Mon 8/22 Day1 9:45AM GPUs & HPCs ARMv8-A Next Generation Vector Architecture for HPC  Fujitsu’s inheritances  FMA  Math acceleration primitives  Inter core barrier  Sector cache  Hardware prefetch assist SVE (Scalable Vector Extension) 2016/09/07 11HP User Forum @ Austin
  • 12. OS Kernel  Requirements of OS Kernel targeting high-end HPC  Noiseless execution environment for bulk-synchronous applications  Ability to easily adapt to new/future system architectures  E.g.: manycore CPUs, heterogenous core architectures, deep memory hierarchy, etc. - New process/thread management, memory management, …  Ability to adapt to new/future application demand  Big-Data, in-situ applications - Support data flow from Internet devices to compute nodes - Optimize data movement Approach Pros. Cons. Full-Weight Kernel (FWK) e.g. Linux Disabling, removing, tuning, reimplementation, and adding new features Large community support results in rapid new hardware adaptation • Hard to implement a new feature if the original mechanism is conflicted with the new feature • Hard to follow the latest kernel distribution due to local large modifications Light-Weight Kernel (LWK) Implementation from scratch and adding new features Easy to extend it because of small in terms of logic and code size • Applications, running on FWK, cannot run always in LWK • Small community maintenance limits rapid growth • Lack of device drivers 12 Our Approach: Linux with Light-Weight Kernel
  • 13.  Partition resources (CPU cores, memory)  Full Linux kernel on some cores  System daemons and in-situ non HPC applications  Device drivers  Light-weight kernel(LWK), McKernel on other cores  HPC applications McKernel developed at RIKEN  McKernel is loadable module of Linux  McKernel supports Linux API  McKernel runs on  Intel Xeon and Xeon phi  Fujitsu FX10 13 Very simple memory management Thin LWK Process/Thread management General scheduler Complex Mem. Mngt. Linux TCP stack Dev. Drivers VFS File Sys Driers Memory … … Interrupt System daemons ? HPC Applications PartitionPartition In-situ non HPC application Linux API (glibc, /sys/, /proc/) Core Core Core Core Core Core McKernel is deployed to the Oakforest-PACS supercomputer, 25 PF in peak, at JCAHPC organized by U. of Tsukuba and U. of Tokyo 2016/09/07 HP User Forum @ Austin
  • 14. OS: McKernel 14 • Linux Kernel+Loadable LWK, McKernel – Linux Kernel is resident, and daemons for job scheduler and etc. run on Linux – McKernel is dynamically reloaded (rebooted) for each application Finish App A, requiring LWK-without- scheduler, Is invoked App B, requiring LWK-with-scheduler, Is invoked Finish AppC,usingfullLinux capability,Isinvoked Finish 2016/09/07 HP User Forum @ Austin
  • 15. OS: McKernel 15 https://asc.llnl.gov/sequoia/benchmarks Linux with isolcpus McKernel Results of FWQ (Fixed Work Quanta) 2016/09/07 HP User Forum @ Austin
  • 16. Concluding Remarks 16  Post K’s CPU is based on ARM V8 with HPC extension  The usability will be improved than the K computer by changing architecture  More wide-range community support  The system software stack for Post K is being designed and implemented with the leverage of international collaborations  The software stack developed at RIKEN is Open source  It also runs on Intel Xeon and Xeon phi  RIKEN would like to contribute to OpenHPC 2016/09/07 HP User Forum @ Austin