SlideShare a Scribd company logo
1 of 19
Download to read offline
IBM Data Centric Systems & OpenPOWER
Yoonho Park
Research Staff Member/Senior Manager
Data Centric Systems Software, Cloud and Cognitive
IBM Research
1
LinkBandwidthperdirection,Gb/s
10
100
1000
4X
2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018
QDR
EDR
100G
FDR
HDR
200G
HDR
56G
2010
Sensors
& Devices
VoIP
Enterprise Data
Social
Media
2020
Data volume grows exponentially Microprocessor clock rates have stalled…
And network bandwidth cannot keep up.
Data Growth Outpaces Computing Technology Elements
1990 1995 2000 2005 2010
0.1
1
10
100
1000
SustainedHDDI/ORatesperGByte
Desktop and Server Drive Performance
<=7200 RPM
10K RPM
15K RPMDesktop
5400 & 7200 RPM
0.01 4TB SATA
I/O performance/capacity loosing ground…
2
Big Data and the New Era of Computing
Dimensions of data growth
Terabytes to
exabytes of
existing data
to process
Structured,
unstructured,
text, multimedia
Streaming data,
milliseconds to
seconds to
respond
Uncertainty
from inconsistency,
ambiguities, etc.
Variety
Volume Velocity
Veracity
9000
8000
7000
6000
5000
4000
3000
Data volume is on the rise
2010
VolumeinExabytes
Sensors
& Devices
VoIP
Enterprise
Data
Social
Media
2020
• Big Data analytics and Exascale High Performance Computing facing similar
challenges: scale, performance, bandwidth, computational complexity
• IBM approach: Move compute to data – Data Centric Systems (DCS)
3
Region defined by
LINPACK
Conceptual View of Data
Intensive with Floating Point
Workflow
Data Centric
Applications
Compute Centric
Applications
Floating Point
OPS
Integer
OPS
Low Spatial
Locality
High Spatial
Locality
Conceptual View of Data Intensive-
Integer Workflow
Conceptual View of Data
Fusion/Uncertainty Quantification
Workflow
Different Solutions for Different Types of Workflows
4
5
Data Centric Workflows: Mixed Compute Capabilities Required
All Source Analytics
Oil and Gas
Science
Financial Analytics
4-40 Racks
2-20 Racks
1-5+ Racks
1-100’s Racks
Analytics
Reservoir
Graph
Analytics
Throughput
Seismic
Value At Risk
Image
Analysis
Capability
Massively Parallel
Compute Capability:
• Simple kernels,
• Ops dominated (e.g.
DGEMM, Linpack)
• Simple data access
patterns.
• Can be preplanned for
high performance.
Analytics Capability:
• Complex code
• Data Dependent Code
Paths / Computation
• Lots of indirection /
pointer chasing
• Often Memory System
Latency Dependent
• C++ templated codes
• Limited opportunity for
vectorization
• Limited scalability
• Limited threading
opportunity
Comparing Compute Centric to Data Centric
 Systems and Solutions must become more data
centric and data aware
– Data movement minimized
• Within the system
• Within/across the end to end solution
– Compute enabled at all levels
– Workloads/workflow driven system and solution design
choices
– Modular, composable solution architectures
– Enhanced resource agility and sharing
 Focus must shift from algorithms to workflows
– New end to end efficiencies and optimizations
– Based on a data aware understanding of the full scope of
resource requirements
• Storage, Networking, Compute, Applications, Resource
management, Data Centers
 Uncertainty of the future puts a premium on
flexibility and innovation
TraditionalComputing
System
DataCentricComputing
System
6
IBM Data Centric Design Principles
Principle 3: Modularity
–Balanced, composable architecture for Big Data
analytics, modeling and simulation
Principle 2: Enable compute in all levels of the systems
hierarchy
–HW & SW innovations to support / enable compute in
data
Principle 4: Application-driven design
– Use real workloads/workflows to drive design points
Principle 1: Minimize data motion
– Data motion is expensive
– Allow workloads to run where they run best
Principle 5: Leverage OpenPOWER to accelerate innovation and broaden diversity for clients
Massive data requirements drive a composable architecture for big data, complex analytics, modeling and
simulation. The DCS architecture will appeal to segments experiencing an explosion of data and the
associated computational demands
7
8
IBM OpenPOWER-based HPC Roadmap
2015 2016 2017
POWER8
POWER8+
POWER9
OpenPower
CAPI Interface
NVLink
Enhanced
NVLink
Connect-IB
FDR Infiniband
PCIe Gen3
ConnectX-4
EDR Infiniband
CAPI over PCIe Gen3
ConnectX-5
Next-Gen Infiniband
Enhanced CAPI over PCIe Gen4
Mellanox
Interconnect
Technology
IBM CPUs
NVIDIA GPUs
Kepler
PCIe Gen3
Volta
Enhanced NVLink
Pascal
NVLink
IBM Nodes
8
Heterogeneity is Key
9 © 2017 OpenPOWER Foundation
• Moore’s law no longer
satisfies performance
gain
• Growing workload
demands
• Numerous IT
consumption models
• Mature Open software
ecosystem
OpenPOWER, a catalyst for Open Innovation
• Rich software ecosystem
• Spectrum of power
servers
• Multiple hardware
options
• Derivative POWER chips
OpenPOWER is an open development community,
using the POWER Architecture to serve the evolving needs of customers.
Performance of POWER
architecture
amplified capability
Open Development
open software, open hardware
Collaboration of thought
leaders
simultaneous innovation, multiple
disciplines
Feeds back … resulting in
client choice
140+ OpenPOWER Foundation Members
Implementation, HPC & Research
Software
System Integration
I/O, Storage & Acceleration
Boards & Systems
Chips & SoCs
US & UK Research Centers Select OpenPOWER-based Supercomputers
IBM, Mellanox, and
NVIDIA awarded $325M
U.S. Department of
Energy’s CORAL
Supercomputers
IBM & UK’s STFC in
£313M Partnership
for Big Data & Cognitive
Computing Research
CORAL: Leadership Class
Supercomputers
5X – 10X HIGHER APP PERF THAN CURRENT SYSTEMS
11
© 2014 IBM Corporation
Hybrid CPU/GPU architecture
 At least 5X Titan / Sequoia Application Performance
 Approximately 3,400 nodes, each with:
– Multiple IBM POWER9™ CPUs and multiple NVIDIA Tesla® GPUs using the NVIDIA
Volta architecture
– CPUs and GPUs completely connected with high speed NVLink
– Large coherent memory: over 512 GB (HBM + DDR4)
– All memory directly addressable from the CPUs and GPUs
 Over 40 TF peak performance per node
 Dual-rail Mellanox® EDR-IB full, non-blocking fat-tree interconnect
 IBM Elastic Storage (GPFS™) - 1TB/s I/O and 120 PB disk capacity.
12
© 2014 IBM Corporation
Programming Approaches
 Accelerator Approach:
– Required when not coherent
– Each processor computes in its own private address apace
– Data objects are homed in CPU Memory, are copied to GPU memory for GPU execution,
GPU engines act only on data in GPU memory.
 Compute in Shared Address Space:
– New option, now that CPU / GPU memories are coherent
– Data objects can be in any physical memory domain
– Processors (either CPU or GPU) can use data in place.
– No copies required
 Note:
– Will still have to manage NUMA
13
© 2014 IBM Corporation
Compute and Memory View – Emerging Approach
GPU 1 Compute
GPU 1 Memory
CPU 1 Compute
CPU 1 Memory
GPU 2 Compute
GPU 2 Memory
Multiple Compute Engines
• Consider all engines as equal peers
Multiple Memories
• Consider all memories as equal peers
14
© 2014 IBM Corporation
Compute and Memory View – Traditional Acceleration
MPI Task
Data
Data / Compute Flow
15
© 2014 IBM Corporation
Peer Processing
MPI Task
Data
No Data Flow
Compute Flow
Data
Data can be placed at Allocation, or
can migrate under run time control
(e.g. UVM)
Thread-like programming model
(OpenMP 4.x, OpenAcc, CUDA …
Still under development …
16
© 2016 IBM
Future DCS Programming Model
HPC workflowsBig Data workflows
Unstructured or semi-structured data
Collaborative Filtering, clustering, web search,
recommendation, …
Real-time analysis, fraud and anomaly detections
Sensor data filtering, classification, …
statistical averages/histogram, …
Deep Learning, Support Vector Machine, …
Graph Community finding to Shortest Path, …
Structured data
Exascale data sets
Primarily scientific calculations
Ensemble analysis
Sensitive analysis
Uncertainty quantification
Network bandwidth/
efficiency
CPU clock rate
stalling
IO performance
Layered storage:
New storage
technologies
CAPI, InifiniBand, Flash1GigE, TCP/IP, HDDs, …
HPC ecosystem
Fortran/C, MPI+OpenMP, CUDA, Legion, …
UPC, OpenSHMEM, Charm++, GASNet, …
UCX, PAMI, Verbs, …
Hadoop/Spark ecosystem
Spark API, Hadoop API, Cassandra, …
Java, Scala, Python, TCP sockets, … Exascale
Runtime &
Middleware
High productivity and High performanceCommodity clusters OpenPOWER/DCS
High productivity, fault-tolerance High performance, specialized hardware
common challenges
17
 Linux – NUMA support for hardware coherent GPU memory
 Provisioning – xCAT
 Burst Buffer – Support for fast shared-file checkpointing
 CSM – Cluster System Management
 Compilers – LLVM
 Tools – Ensure tools have appropriate APIs
DCS OpenPOWER Software Contribution Examples
18
 Experience/Observation: Extraction of meaningful insights from Big Data and
enabling real-time, predictive decision making requires similar computation
techniques that have been characteristic of Technical Computing
– Convergence in many future workflow requirements including Big Data-driven analytics,
modeling, visualization, and simulation
– Will require optimized full-system design and integration
 IBM Approach: Data Centric innovation in multiple areas, in open ecosystems with
workload-driven co-design
– System architecture and design with modular building blocks
– Hardware technologies
– Integration of heterogeneous compute elements
– Software enablement
Summary – IBM Data Centric Systems & OpenPOWER
19

More Related Content

What's hot

Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuning
inside-BigData.com
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
inside-BigData.com
 
Exceeding the Limits of Air Cooling to Unlock Greater Potential in HPC
Exceeding the Limits of Air Cooling to Unlock Greater Potential in HPCExceeding the Limits of Air Cooling to Unlock Greater Potential in HPC
Exceeding the Limits of Air Cooling to Unlock Greater Potential in HPC
inside-BigData.com
 

What's hot (20)

Mellanox Announces HDR 200 Gb/s InfiniBand Solutions
Mellanox Announces HDR 200 Gb/s InfiniBand SolutionsMellanox Announces HDR 200 Gb/s InfiniBand Solutions
Mellanox Announces HDR 200 Gb/s InfiniBand Solutions
 
TAU E4S ON OpenPOWER /POWER9 platform
TAU E4S ON OpenPOWER /POWER9 platformTAU E4S ON OpenPOWER /POWER9 platform
TAU E4S ON OpenPOWER /POWER9 platform
 
Overview of HPC Interconnects
Overview of HPC InterconnectsOverview of HPC Interconnects
Overview of HPC Interconnects
 
OpenPOWER Latest Updates
OpenPOWER Latest UpdatesOpenPOWER Latest Updates
OpenPOWER Latest Updates
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuning
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
 
OpenPOWER System Marconi100
OpenPOWER System Marconi100OpenPOWER System Marconi100
OpenPOWER System Marconi100
 
Covid-19 Response Capability with Power Systems
Covid-19 Response Capability with Power SystemsCovid-19 Response Capability with Power Systems
Covid-19 Response Capability with Power Systems
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computing
 
NNSA Explorations: ARM for Supercomputing
NNSA Explorations: ARM for SupercomputingNNSA Explorations: ARM for Supercomputing
NNSA Explorations: ARM for Supercomputing
 
High Performance Interconnects: Assessment & Rankings
High Performance Interconnects: Assessment & RankingsHigh Performance Interconnects: Assessment & Rankings
High Performance Interconnects: Assessment & Rankings
 
HPC Accelerating Combustion Engine Design
HPC Accelerating Combustion Engine DesignHPC Accelerating Combustion Engine Design
HPC Accelerating Combustion Engine Design
 
Xilinx Edge Compute using Power 9 /OpenPOWER systems
Xilinx Edge Compute using Power 9 /OpenPOWER systemsXilinx Edge Compute using Power 9 /OpenPOWER systems
Xilinx Edge Compute using Power 9 /OpenPOWER systems
 
Hardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and MLHardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and ML
 
Exceeding the Limits of Air Cooling to Unlock Greater Potential in HPC
Exceeding the Limits of Air Cooling to Unlock Greater Potential in HPCExceeding the Limits of Air Cooling to Unlock Greater Potential in HPC
Exceeding the Limits of Air Cooling to Unlock Greater Potential in HPC
 
OpenPOWER Webinar
OpenPOWER Webinar OpenPOWER Webinar
OpenPOWER Webinar
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
 
State of ARM-based HPC
State of ARM-based HPCState of ARM-based HPC
State of ARM-based HPC
 
The Sierra Supercomputer: Science and Technology on a Mission
The Sierra Supercomputer: Science and Technology on a MissionThe Sierra Supercomputer: Science and Technology on a Mission
The Sierra Supercomputer: Science and Technology on a Mission
 
MIT's experience on OpenPOWER/POWER 9 platform
MIT's experience on OpenPOWER/POWER 9 platformMIT's experience on OpenPOWER/POWER 9 platform
MIT's experience on OpenPOWER/POWER 9 platform
 

Similar to IBM Data Centric Systems & OpenPOWER

Ibm symp14 referentin_barbara koch_power_8 launch bk
Ibm symp14 referentin_barbara koch_power_8 launch bkIbm symp14 referentin_barbara koch_power_8 launch bk
Ibm symp14 referentin_barbara koch_power_8 launch bk
IBM Switzerland
 
Monetizing Big Data at Telecom Service Providers
Monetizing Big Data at Telecom Service ProvidersMonetizing Big Data at Telecom Service Providers
Monetizing Big Data at Telecom Service Providers
DataWorks Summit
 
Monitizing Big Data at Telecom Service Providers
Monitizing Big Data at Telecom Service ProvidersMonitizing Big Data at Telecom Service Providers
Monitizing Big Data at Telecom Service Providers
DataWorks Summit
 

Similar to IBM Data Centric Systems & OpenPOWER (20)

Ibm symp14 referentin_barbara koch_power_8 launch bk
Ibm symp14 referentin_barbara koch_power_8 launch bkIbm symp14 referentin_barbara koch_power_8 launch bk
Ibm symp14 referentin_barbara koch_power_8 launch bk
 
2016 August POWER Up Your Insights - IBM System Summit Mumbai
2016 August POWER Up Your Insights - IBM System Summit Mumbai2016 August POWER Up Your Insights - IBM System Summit Mumbai
2016 August POWER Up Your Insights - IBM System Summit Mumbai
 
IBM Special Announcement session Intel #IDF2013 September 10, 2013
IBM Special Announcement session Intel #IDF2013 September 10, 2013IBM Special Announcement session Intel #IDF2013 September 10, 2013
IBM Special Announcement session Intel #IDF2013 September 10, 2013
 
OpenPOWER/POWER9 Webinar from MIT and IBM
OpenPOWER/POWER9 Webinar from MIT and IBM OpenPOWER/POWER9 Webinar from MIT and IBM
OpenPOWER/POWER9 Webinar from MIT and IBM
 
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
 
Optimizing Hortonworks Apache Spark machine learning workloads for contempora...
Optimizing Hortonworks Apache Spark machine learning workloads for contempora...Optimizing Hortonworks Apache Spark machine learning workloads for contempora...
Optimizing Hortonworks Apache Spark machine learning workloads for contempora...
 
Graph Hardware Architecture - Enterprise graphs deserve great hardware!
Graph Hardware Architecture - Enterprise graphs deserve great hardware!Graph Hardware Architecture - Enterprise graphs deserve great hardware!
Graph Hardware Architecture - Enterprise graphs deserve great hardware!
 
Monetizing Big Data at Telecom Service Providers
Monetizing Big Data at Telecom Service ProvidersMonetizing Big Data at Telecom Service Providers
Monetizing Big Data at Telecom Service Providers
 
Monitizing Big Data at Telecom Service Providers
Monitizing Big Data at Telecom Service ProvidersMonitizing Big Data at Telecom Service Providers
Monitizing Big Data at Telecom Service Providers
 
th1330-1410effectenbeurszaal4-3v2-140424180955-phpapp01 (1).pdf
th1330-1410effectenbeurszaal4-3v2-140424180955-phpapp01 (1).pdfth1330-1410effectenbeurszaal4-3v2-140424180955-phpapp01 (1).pdf
th1330-1410effectenbeurszaal4-3v2-140424180955-phpapp01 (1).pdf
 
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflows
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflowsCloud nativecomputingtechnologysupportinghpc cognitiveworkflows
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflows
 
EMC Isilon Database Converged deck
EMC Isilon Database Converged deckEMC Isilon Database Converged deck
EMC Isilon Database Converged deck
 
AI Scalability for the Next Decade
AI Scalability for the Next DecadeAI Scalability for the Next Decade
AI Scalability for the Next Decade
 
Fueling AI & Machine Learning: Legacy Data as a Competitive Advantage
Fueling AI & Machine Learning: Legacy Data as a Competitive AdvantageFueling AI & Machine Learning: Legacy Data as a Competitive Advantage
Fueling AI & Machine Learning: Legacy Data as a Competitive Advantage
 
Webinar: Large Scale Graph Processing with IBM Power Systems & Neo4j
Webinar: Large Scale Graph Processing with IBM Power Systems & Neo4jWebinar: Large Scale Graph Processing with IBM Power Systems & Neo4j
Webinar: Large Scale Graph Processing with IBM Power Systems & Neo4j
 
Innovating to Create a Brighter Future for AI, HPC, and Big Data
Innovating to Create a Brighter Future for AI, HPC, and Big DataInnovating to Create a Brighter Future for AI, HPC, and Big Data
Innovating to Create a Brighter Future for AI, HPC, and Big Data
 
Expect More from Hadoop
Expect More from Hadoop Expect More from Hadoop
Expect More from Hadoop
 
Lessons Learned on Benchmarking Big Data Platforms
Lessons Learned on Benchmarking  Big Data PlatformsLessons Learned on Benchmarking  Big Data Platforms
Lessons Learned on Benchmarking Big Data Platforms
 
IBM Aspera overview
IBM Aspera overview IBM Aspera overview
IBM Aspera overview
 
Innovation with ai at scale on the edge vt sept 2019 v0
Innovation with ai at scale  on the edge vt sept 2019 v0Innovation with ai at scale  on the edge vt sept 2019 v0
Innovation with ai at scale on the edge vt sept 2019 v0
 

More from inside-BigData.com

Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networks
inside-BigData.com
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
inside-BigData.com
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecasts
inside-BigData.com
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Acceleration
inside-BigData.com
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Cluster
inside-BigData.com
 
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
inside-BigData.com
 
Making Supernovae with Jets
Making Supernovae with JetsMaking Supernovae with Jets
Making Supernovae with Jets
inside-BigData.com
 
Scientific Applications and Heterogeneous Architectures
Scientific Applications and Heterogeneous ArchitecturesScientific Applications and Heterogeneous Architectures
Scientific Applications and Heterogeneous Architectures
inside-BigData.com
 

More from inside-BigData.com (20)

Major Market Shifts in IT
Major Market Shifts in ITMajor Market Shifts in IT
Major Market Shifts in IT
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networks
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networks
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecasts
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Update
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Acceleration
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Era
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Cluster
 
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
 
Data Parallel Deep Learning
Data Parallel Deep LearningData Parallel Deep Learning
Data Parallel Deep Learning
 
Making Supernovae with Jets
Making Supernovae with JetsMaking Supernovae with Jets
Making Supernovae with Jets
 
Adaptive Linear Solvers and Eigensolvers
Adaptive Linear Solvers and EigensolversAdaptive Linear Solvers and Eigensolvers
Adaptive Linear Solvers and Eigensolvers
 
Scientific Applications and Heterogeneous Architectures
Scientific Applications and Heterogeneous ArchitecturesScientific Applications and Heterogeneous Architectures
Scientific Applications and Heterogeneous Architectures
 
SW/HW co-design for near-term quantum computing
SW/HW co-design for near-term quantum computingSW/HW co-design for near-term quantum computing
SW/HW co-design for near-term quantum computing
 

Recently uploaded

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Recently uploaded (20)

[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 

IBM Data Centric Systems & OpenPOWER

  • 1. IBM Data Centric Systems & OpenPOWER Yoonho Park Research Staff Member/Senior Manager Data Centric Systems Software, Cloud and Cognitive IBM Research 1
  • 2. LinkBandwidthperdirection,Gb/s 10 100 1000 4X 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 QDR EDR 100G FDR HDR 200G HDR 56G 2010 Sensors & Devices VoIP Enterprise Data Social Media 2020 Data volume grows exponentially Microprocessor clock rates have stalled… And network bandwidth cannot keep up. Data Growth Outpaces Computing Technology Elements 1990 1995 2000 2005 2010 0.1 1 10 100 1000 SustainedHDDI/ORatesperGByte Desktop and Server Drive Performance <=7200 RPM 10K RPM 15K RPMDesktop 5400 & 7200 RPM 0.01 4TB SATA I/O performance/capacity loosing ground… 2
  • 3. Big Data and the New Era of Computing Dimensions of data growth Terabytes to exabytes of existing data to process Structured, unstructured, text, multimedia Streaming data, milliseconds to seconds to respond Uncertainty from inconsistency, ambiguities, etc. Variety Volume Velocity Veracity 9000 8000 7000 6000 5000 4000 3000 Data volume is on the rise 2010 VolumeinExabytes Sensors & Devices VoIP Enterprise Data Social Media 2020 • Big Data analytics and Exascale High Performance Computing facing similar challenges: scale, performance, bandwidth, computational complexity • IBM approach: Move compute to data – Data Centric Systems (DCS) 3
  • 4. Region defined by LINPACK Conceptual View of Data Intensive with Floating Point Workflow Data Centric Applications Compute Centric Applications Floating Point OPS Integer OPS Low Spatial Locality High Spatial Locality Conceptual View of Data Intensive- Integer Workflow Conceptual View of Data Fusion/Uncertainty Quantification Workflow Different Solutions for Different Types of Workflows 4
  • 5. 5 Data Centric Workflows: Mixed Compute Capabilities Required All Source Analytics Oil and Gas Science Financial Analytics 4-40 Racks 2-20 Racks 1-5+ Racks 1-100’s Racks Analytics Reservoir Graph Analytics Throughput Seismic Value At Risk Image Analysis Capability Massively Parallel Compute Capability: • Simple kernels, • Ops dominated (e.g. DGEMM, Linpack) • Simple data access patterns. • Can be preplanned for high performance. Analytics Capability: • Complex code • Data Dependent Code Paths / Computation • Lots of indirection / pointer chasing • Often Memory System Latency Dependent • C++ templated codes • Limited opportunity for vectorization • Limited scalability • Limited threading opportunity
  • 6. Comparing Compute Centric to Data Centric  Systems and Solutions must become more data centric and data aware – Data movement minimized • Within the system • Within/across the end to end solution – Compute enabled at all levels – Workloads/workflow driven system and solution design choices – Modular, composable solution architectures – Enhanced resource agility and sharing  Focus must shift from algorithms to workflows – New end to end efficiencies and optimizations – Based on a data aware understanding of the full scope of resource requirements • Storage, Networking, Compute, Applications, Resource management, Data Centers  Uncertainty of the future puts a premium on flexibility and innovation TraditionalComputing System DataCentricComputing System 6
  • 7. IBM Data Centric Design Principles Principle 3: Modularity –Balanced, composable architecture for Big Data analytics, modeling and simulation Principle 2: Enable compute in all levels of the systems hierarchy –HW & SW innovations to support / enable compute in data Principle 4: Application-driven design – Use real workloads/workflows to drive design points Principle 1: Minimize data motion – Data motion is expensive – Allow workloads to run where they run best Principle 5: Leverage OpenPOWER to accelerate innovation and broaden diversity for clients Massive data requirements drive a composable architecture for big data, complex analytics, modeling and simulation. The DCS architecture will appeal to segments experiencing an explosion of data and the associated computational demands 7
  • 8. 8 IBM OpenPOWER-based HPC Roadmap 2015 2016 2017 POWER8 POWER8+ POWER9 OpenPower CAPI Interface NVLink Enhanced NVLink Connect-IB FDR Infiniband PCIe Gen3 ConnectX-4 EDR Infiniband CAPI over PCIe Gen3 ConnectX-5 Next-Gen Infiniband Enhanced CAPI over PCIe Gen4 Mellanox Interconnect Technology IBM CPUs NVIDIA GPUs Kepler PCIe Gen3 Volta Enhanced NVLink Pascal NVLink IBM Nodes 8 Heterogeneity is Key
  • 9. 9 © 2017 OpenPOWER Foundation • Moore’s law no longer satisfies performance gain • Growing workload demands • Numerous IT consumption models • Mature Open software ecosystem OpenPOWER, a catalyst for Open Innovation • Rich software ecosystem • Spectrum of power servers • Multiple hardware options • Derivative POWER chips OpenPOWER is an open development community, using the POWER Architecture to serve the evolving needs of customers. Performance of POWER architecture amplified capability Open Development open software, open hardware Collaboration of thought leaders simultaneous innovation, multiple disciplines Feeds back … resulting in client choice
  • 10. 140+ OpenPOWER Foundation Members Implementation, HPC & Research Software System Integration I/O, Storage & Acceleration Boards & Systems Chips & SoCs
  • 11. US & UK Research Centers Select OpenPOWER-based Supercomputers IBM, Mellanox, and NVIDIA awarded $325M U.S. Department of Energy’s CORAL Supercomputers IBM & UK’s STFC in £313M Partnership for Big Data & Cognitive Computing Research CORAL: Leadership Class Supercomputers 5X – 10X HIGHER APP PERF THAN CURRENT SYSTEMS 11
  • 12. © 2014 IBM Corporation Hybrid CPU/GPU architecture  At least 5X Titan / Sequoia Application Performance  Approximately 3,400 nodes, each with: – Multiple IBM POWER9™ CPUs and multiple NVIDIA Tesla® GPUs using the NVIDIA Volta architecture – CPUs and GPUs completely connected with high speed NVLink – Large coherent memory: over 512 GB (HBM + DDR4) – All memory directly addressable from the CPUs and GPUs  Over 40 TF peak performance per node  Dual-rail Mellanox® EDR-IB full, non-blocking fat-tree interconnect  IBM Elastic Storage (GPFS™) - 1TB/s I/O and 120 PB disk capacity. 12
  • 13. © 2014 IBM Corporation Programming Approaches  Accelerator Approach: – Required when not coherent – Each processor computes in its own private address apace – Data objects are homed in CPU Memory, are copied to GPU memory for GPU execution, GPU engines act only on data in GPU memory.  Compute in Shared Address Space: – New option, now that CPU / GPU memories are coherent – Data objects can be in any physical memory domain – Processors (either CPU or GPU) can use data in place. – No copies required  Note: – Will still have to manage NUMA 13
  • 14. © 2014 IBM Corporation Compute and Memory View – Emerging Approach GPU 1 Compute GPU 1 Memory CPU 1 Compute CPU 1 Memory GPU 2 Compute GPU 2 Memory Multiple Compute Engines • Consider all engines as equal peers Multiple Memories • Consider all memories as equal peers 14
  • 15. © 2014 IBM Corporation Compute and Memory View – Traditional Acceleration MPI Task Data Data / Compute Flow 15
  • 16. © 2014 IBM Corporation Peer Processing MPI Task Data No Data Flow Compute Flow Data Data can be placed at Allocation, or can migrate under run time control (e.g. UVM) Thread-like programming model (OpenMP 4.x, OpenAcc, CUDA … Still under development … 16
  • 17. © 2016 IBM Future DCS Programming Model HPC workflowsBig Data workflows Unstructured or semi-structured data Collaborative Filtering, clustering, web search, recommendation, … Real-time analysis, fraud and anomaly detections Sensor data filtering, classification, … statistical averages/histogram, … Deep Learning, Support Vector Machine, … Graph Community finding to Shortest Path, … Structured data Exascale data sets Primarily scientific calculations Ensemble analysis Sensitive analysis Uncertainty quantification Network bandwidth/ efficiency CPU clock rate stalling IO performance Layered storage: New storage technologies CAPI, InifiniBand, Flash1GigE, TCP/IP, HDDs, … HPC ecosystem Fortran/C, MPI+OpenMP, CUDA, Legion, … UPC, OpenSHMEM, Charm++, GASNet, … UCX, PAMI, Verbs, … Hadoop/Spark ecosystem Spark API, Hadoop API, Cassandra, … Java, Scala, Python, TCP sockets, … Exascale Runtime & Middleware High productivity and High performanceCommodity clusters OpenPOWER/DCS High productivity, fault-tolerance High performance, specialized hardware common challenges 17
  • 18.  Linux – NUMA support for hardware coherent GPU memory  Provisioning – xCAT  Burst Buffer – Support for fast shared-file checkpointing  CSM – Cluster System Management  Compilers – LLVM  Tools – Ensure tools have appropriate APIs DCS OpenPOWER Software Contribution Examples 18
  • 19.  Experience/Observation: Extraction of meaningful insights from Big Data and enabling real-time, predictive decision making requires similar computation techniques that have been characteristic of Technical Computing – Convergence in many future workflow requirements including Big Data-driven analytics, modeling, visualization, and simulation – Will require optimized full-system design and integration  IBM Approach: Data Centric innovation in multiple areas, in open ecosystems with workload-driven co-design – System architecture and design with modular building blocks – Hardware technologies – Integration of heterogeneous compute elements – Software enablement Summary – IBM Data Centric Systems & OpenPOWER 19