SlideShare a Scribd company logo
1 of 24
Download to read offline
ARM Research – Software & Large Scale Systems
UCX:An Open Source Framework
for HPC Network APIs and Beyond
Pavel Shamis (Pasha)
Principal Research Engineer
Co-Design Collaboration
Collaborative Effort
Industry, National Laboratories and Academia
The Next Generation
HPC Communication Framework
Challenges
§  Performance Portability (across various interconnects)
§  Collaboration between industry and research institutions
§  …but mostly industry (because they built the hardware)
§  Maintenance
§  Maintaining a network stack is time consuming and expensive
§  Industry have resources and strategic interest for this
§  Extendibility
§  MPI+X+Y ?
§  Exascale programming environment is an ongoing debate
UCX – Unified Communication X Framework
§  Unified
§  Network API for multiple network architectures that target HPC
programing models and libraries
§  Communication
§  How to move data from location in memory A to location in memory B
considering multiple types of memories
§  Framework
§  A collection of libraries and utilities for HPC network programmers
History
MXM
●  Developed by Mellanox Technologies
●  HPC communication library for InfiniBand
devices and shared memory
●  Primary focus: MPI, PGAS
PAMI
●  Developed by IBM on BG/Q, PERCS, IB
VERBS
●  Network devices and shared memory
●  MPI, OpenSHMEM, PGAS, CHARM++, X10
●  C++ components
●  Aggressive multi-threading with contexts
●  Active Messages
●  Non-blocking collectives with hw accleration
support
Decades of community and
industry experience in
development of HPC software
UCCS
●  Developed by ORNL, UH, UTK
●  Originally based on Open MPI BTL and OPAL
layers
●  HPC communication library for InfiniBand,
Cray Gemini/Aries, and shared memory
●  Primary focus: OpenSHMEM, PGAS
●  Also supports: MPI
What we are doing differently…
§  UCX consolidates multiple industry and academic efforts
§  Mellanox MXM, IBM PAMI, ORNL/UTK/UH UCCS, etc.
§  Supported and maintained by industry
§  IBM, Mellanox, NVIDIA, Pathscale,ARM
What we are doing differently…
§  Co-design effort between national laboratories, academia, and
industry
Applications: LAMMPS, NWCHEM, etc.
Programming models: MPI, PGAS/Gasnet, etc.
Middleware:
Driver and Hardware
Co-design
UCX
InfiniBand uGNI
Shared
Memory
GPU Memory
Emerging
Interconnects
MPI GasNet PGAS
Task Based
Runtimes
I/O
Transports
Protocols Services
Applications
A Collaboration Efforts
§  Mellanox co-designs network API and contributes MXM technology
§  Infrastructure, transport, shared memory, protocols, integration with
OpenMPI/SHMEM, MPICH
§  ORNL & LANL co-designs network API and contributes UCCS project
§  InfiniBand optimizations, Cray devices, shared memory
§  ARM co-designs the network API and contributes optimizations for
ARM eco-system
§  NVIDIA co-designs high-quality support for GPU devices
§  GPUDirect, GDR copy, etc.
§  IBM co-designs network API and contributes ideas and concepts from
PAMI
§  UH/UTK focus on integration with their research platforms
Licensing
§  Open Source
§  BSD 3 Clause license
§  Contributor License Agreement – BSD 3 based
UCX Framework Mission
§  Collaboration between industry, laboratories, and academia
§  Create open-source production grade communication framework for HPC applications
§  Enable the highest performance through co-design of software-hardware interfaces
§  Unify industry - national laboratories - academia efforts
Performance oriented
Optimization for low-software
overheads in communication path allows
near native-level performance
Community driven
Collaboration between industry,
laboratories, and academia
Production quality
Developed, maintained, tested, and used
by industry and researcher community
API
Exposes broad semantics that target
data centric and HPC programming
models and applications
Research
The framework concepts and ideas are
driven by research in academia,
laboratories, and industry
Cross platform
Support for Infiniband, Cray, various
shared memory (x86-64 and Power),
GPUs
Co-design of Exascale Network APIs
Architecture
UCX Framework
UC-S for Services
This framework provides
basic infrastructure for
component based
programming, memory
management, and useful
system utilities
Functionality:
Platform abstractions, data
structures, debug facilities.
UC-T forTransport
Low-level API that expose
basic network operations
supported by underlying
hardware. Reliable, out-of-
order delivery.
Functionality:
Setup and instantiation of
communication operations.
UC-P for Protocols
High-level API uses UCT
framework to construct
protocols commonly found
in applications
Functionality:
Multi-rail, device selection,
pending queue, rendezvous,
tag-matching, software-
atomics, etc.
A High-level Overview
UC-T (Hardware Transports) - Low Level API
RMA, Atomic, Tag-matching, Send/Recv, Active Message
Transport for InfiniBand VERBs
driver
RC UD XRC DCT
Transport for intra-node host memory communication
SYSV POSIX KNEM CMA XPMEM
Transport for
Accelerator Memory
communucation
GPU
Transport for
Gemini/Aries
drivers
GNI
UC-S
(Services)
Common utilities
UC-P (Protocols) - High Level API
Transport selection, cross-transrport multi-rail, fragmentation, operations not supported by hardware
Message Passing API Domain:
tag matching, randevouze
PGAS API Domain:
RMAs, Atomics
Task Based API Domain:
Active Messages
I/O API Domain:
Stream
Utilities
Data
stractures
Hardware
MPICH, Open-MPI, etc.
OpenSHMEM, UPC, CAF, X10,
Chapel, etc.
Parsec, OCR, Legions, etc. Burst buffer, ADIOS, etc.
ApplicationsUCX
Memory
Management
OFA Verbs Driver Cray Driver OS Kernel Cuda
UCP API (DRAFT) Snippet
(https://github.com/openucx/ucx/blob/master/src/ucp/api/ucp.h)
§  ucs_status_t ucp_put(ucp_ep_h ep, const void ∗buffer, size_t length, uint64_t remote_addr, ucp_rkey_h rkey)
Blocking remote memory put operation.
§  ucs_status_t ucp_put_nbi (ucp_ep_h ep, const void ∗buffer, size_t length, uint64_t remote_addr, ucp_rkey_h rkey)
Non-blocking implicit remote memory put operation.
§  ucs_status_t ucp_get (ucp_ep_h ep, void ∗buffer, size_t length, uint64_t remote_addr, ucp_rkey_h rkey)
Blocking remote memory get operation.
§  ucs_status_t ucp_get_nbi (ucp_ep_h ep, void ∗buffer, size_t length, uint64_t remote_addr, ucp_rkey_h rkey)
Non-blocking implicit remote memory get operation.
§  ucs_status_t ucp_atomic_add32 (ucp_ep_h ep, uint32_t add, uint64_t remote_addr, ucp_rkey_h rkey)
Blocking atomic add operation for 32 bit integers.
§  ucs_status_t ucp_atomic_add64 (ucp_ep_h ep, uint64_t add, uint64_t remote_addr, ucp_rkey_h rkey)
Blocking atomic add operation for 64 bit integers.
§  ucs_status_t ucp_atomic_fadd32 (ucp_ep_h ep, uint32_t add, uint64_t remote_addr, ucp_rkey_h rkey, uint32_t ∗result)
Blocking atomic fetch and add operation for 32 bit integers.
§  ucs_status_t ucp_atomic_fadd64 (ucp_ep_h ep, uint64_t add, uint64_t remote_addr, ucp_rkey_h rkey, uint64_t ∗result)
Blocking atomic fetch and add operation for 64 bit integers.
§  ucs_status_ptr_t ucp_tag_send_nb (ucp_ep_h ep, const void ∗buffer, size_t count, ucp_datatype_t datatype, ucp_tag_t tag, ucp_send_callback_t cb)
Non-blocking tagged-send operations.
§  ucs_status_ptr_t ucp_tag_recv_nb (ucp_worker_h worker, void ∗buffer, size_t count, ucp_datatype_t datatype, ucp_tag_t tag, ucp_tag_t tag_mask,
ucp_tag_recv_callback_t cb)
Non-blocking tagged-receive operation.
Preliminary Evaluation ( UCT )
§  Pavel Shamis, et al.“UCX:An Open Source Framework for HPC Network APIs and Beyond,” HOT Interconnects 2015 -
Santa Clara, California, US,August 2015
§  Two HP ProLiant DL380p Gen8 servers
§  Mellanox SX6036 switch, Single-port Mellanox Connect-IB FDR (10.10.5056)
§  Mellanox OFED 2.4-1.0.4. (VERBS)
§  Prototype implementation of AcceleratedVERBS (AVERBS)
��
��
��
��
��
���
���
���
���
���
���
�� �� �� �� ��� ��� ���
��������������������
��������������������
�����������������
����������������
�����������������
����������������
����
����
����
����
����
����
����
����
����
����
����
����
�� �� �� �� ��� ��� ���
������������ ��������������������
�����������������
�����������������
�����������������
����������������
����������������
����������������
��
��
��
��
��
��
��
��
�� ��� �� ��� ��
����������������
��������������������
�����������������
����������������
�����������������
����������������
OpenSHMEM and OSHMEM (OpenMPI)
Put Latency (shared memory)
0.1
1
10
100
1000
8 16 32 64 128 256 512 1KB 2KB 4KB 8KB 16KB 32KB 64KB 128KB256KB512KB 1MB 2MB 4MB
Latency(usec,logscale)
Message Size
OpenSHMEM−UCX (intranode)
OpenSHMEM−UCCS (intranode)
OSHMEM (intranode)
Lower is better
Slide courtesy of ORNL UCXTeam
OpenSHMEM and OSHMEM (OpenMPI)
Put Injection Rate
Higher is better
Connect-IB
0
2e+06
4e+06
6e+06
8e+06
1e+07
1.2e+07
1.4e+07
8 16 32 64 128 256 512 1KB 2KB 4KB
MessageRate(putoperations/second)
Message Size
OpenSHMEM−UCX (mlx5)
OpenSHMEM−UCCS (mlx5)
OSHMEM (mlx5)
OSHMEM−UCX (mlx5)
Slide courtesy of ORNL UCXTeam
OpenSHMEM and OSHMEM (OpenMPI)
GUPs Benchmark
Higher is better
Connect-IB
0
0.0002
0.0004
0.0006
0.0008
0.001
0.0012
0.0014
0.0016
0.0018
2 4 6 8 10 12 14 16
GUPS(billionupdatespersecond)
Number of PEs (two nodes)
UCX (mlx5)
OSHMEM (mlx5)
Slide courtesy of ORNL UCXTeam
MPICH - Message rate
Preliminary Results
0
1
2
3
4
5
6 1
2
4
8
16
32
64
128
256
512
1k
2k
4k
8k
16k
32k
64k
128k
256k
512k
1M
2M
4M
MMPS
MPICH/UCX MPICH/MXM
Slide courtesy of Pavan Balaji,ANL - sent to the ucx mailing list
Connect-IB
“non-blocking tag-send”
Where is UCX being used?
§  Upcoming release of Open MPI 2.0 (MPI and OpenSHMEM APIs)
§  Upcoming release of MPICH
§  OpenSHMEM reference implementation by UH and ORNL
§  PARSEC – runtime used on Scientific Linear Libraries
What Next ?
§  UCX Consortium !
§  http://www.csm.ornl.gov/newsite/
§  UCX Specification
§  Early draft is available online:
http://www.openucx.org/early-draft-of-ucx-specification-is-here/
§  Production releases
§  MPICH, Open MPI, Open SHMEM(s), Gasnet, and more…
§  Support for more networks and applications and libraries
§  UCX Hackathon 2016 !
§  Will be announced on the mailing list and website
https://github.com/orgs/openucx
WEB: www.openucx.org
Contact: info@openucx.org
Mailing List:
https://elist.ornl.gov/mailman/listinfo/ucx-group
ucx-group@elist.ornl.gov
Questions ? Unified Communication - X
Framework
WEB: www.openucx.org
Contact: info@openucx.org
WE B: https://github.com/orgs/openucx
Mailing List:
https://elist.ornl.gov/mailman/listinfo/ucx-group
ucx-group@elist.ornl.gov

More Related Content

What's hot

Building Network Functions with eBPF & BCC
Building Network Functions with eBPF & BCCBuilding Network Functions with eBPF & BCC
Building Network Functions with eBPF & BCCKernel TLV
 
日本OpenStackユーザ会 第37回勉強会
日本OpenStackユーザ会 第37回勉強会日本OpenStackユーザ会 第37回勉強会
日本OpenStackユーザ会 第37回勉強会Yushiro Furukawa
 
Performance Wins with eBPF: Getting Started (2021)
Performance Wins with eBPF: Getting Started (2021)Performance Wins with eBPF: Getting Started (2021)
Performance Wins with eBPF: Getting Started (2021)Brendan Gregg
 
VPP事始め
VPP事始めVPP事始め
VPP事始めnpsg
 
Excitingly simple multi-path OpenStack networking: LAG-less, L2-less, yet ful...
Excitingly simple multi-path OpenStack networking: LAG-less, L2-less, yet ful...Excitingly simple multi-path OpenStack networking: LAG-less, L2-less, yet ful...
Excitingly simple multi-path OpenStack networking: LAG-less, L2-less, yet ful...LINE Corporation
 
Developing MIPS Exploits to Hack Routers
Developing MIPS Exploits to Hack RoutersDeveloping MIPS Exploits to Hack Routers
Developing MIPS Exploits to Hack RoutersBGA Cyber Security
 
BGP Unnumbered で遊んでみた
BGP Unnumbered で遊んでみたBGP Unnumbered で遊んでみた
BGP Unnumbered で遊んでみたakira6592
 
Cilium - Bringing the BPF Revolution to Kubernetes Networking and Security
Cilium - Bringing the BPF Revolution to Kubernetes Networking and SecurityCilium - Bringing the BPF Revolution to Kubernetes Networking and Security
Cilium - Bringing the BPF Revolution to Kubernetes Networking and SecurityThomas Graf
 
MAP 実装してみた
MAP 実装してみたMAP 実装してみた
MAP 実装してみたMasakazu Asama
 
IETF 104 Hackathon VPP Prototyping Stateless SRv6/GTP-U Translation
IETF 104 Hackathon VPP Prototyping Stateless SRv6/GTP-U TranslationIETF 104 Hackathon VPP Prototyping Stateless SRv6/GTP-U Translation
IETF 104 Hackathon VPP Prototyping Stateless SRv6/GTP-U TranslationKentaro Ebisawa
 
OVS VXLAN Network Accelaration on OpenStack (VXLAN offload and DPDK) - OpenSt...
OVS VXLAN Network Accelaration on OpenStack (VXLAN offload and DPDK) - OpenSt...OVS VXLAN Network Accelaration on OpenStack (VXLAN offload and DPDK) - OpenSt...
OVS VXLAN Network Accelaration on OpenStack (VXLAN offload and DPDK) - OpenSt...VirtualTech Japan Inc.
 
Receive side scaling (RSS) with eBPF in QEMU and virtio-net
Receive side scaling (RSS) with eBPF in QEMU and virtio-netReceive side scaling (RSS) with eBPF in QEMU and virtio-net
Receive side scaling (RSS) with eBPF in QEMU and virtio-netYan Vugenfirer
 
NEDIA_SNIA_CXL_講演資料.pdf
NEDIA_SNIA_CXL_講演資料.pdfNEDIA_SNIA_CXL_講演資料.pdf
NEDIA_SNIA_CXL_講演資料.pdfYasunori Goto
 
Ethernetの受信処理
Ethernetの受信処理Ethernetの受信処理
Ethernetの受信処理Takuya ASADA
 
Dockerと外部ルータを連携させる仕組みを作ってみた
Dockerと外部ルータを連携させる仕組みを作ってみたDockerと外部ルータを連携させる仕組みを作ってみた
Dockerと外部ルータを連携させる仕組みを作ってみたnpsg
 

What's hot (20)

Building Network Functions with eBPF & BCC
Building Network Functions with eBPF & BCCBuilding Network Functions with eBPF & BCC
Building Network Functions with eBPF & BCC
 
日本OpenStackユーザ会 第37回勉強会
日本OpenStackユーザ会 第37回勉強会日本OpenStackユーザ会 第37回勉強会
日本OpenStackユーザ会 第37回勉強会
 
DPDK KNI interface
DPDK KNI interfaceDPDK KNI interface
DPDK KNI interface
 
Performance Wins with eBPF: Getting Started (2021)
Performance Wins with eBPF: Getting Started (2021)Performance Wins with eBPF: Getting Started (2021)
Performance Wins with eBPF: Getting Started (2021)
 
Linux Network Stack
Linux Network StackLinux Network Stack
Linux Network Stack
 
DPDK In Depth
DPDK In DepthDPDK In Depth
DPDK In Depth
 
VPP事始め
VPP事始めVPP事始め
VPP事始め
 
Excitingly simple multi-path OpenStack networking: LAG-less, L2-less, yet ful...
Excitingly simple multi-path OpenStack networking: LAG-less, L2-less, yet ful...Excitingly simple multi-path OpenStack networking: LAG-less, L2-less, yet ful...
Excitingly simple multi-path OpenStack networking: LAG-less, L2-less, yet ful...
 
Developing MIPS Exploits to Hack Routers
Developing MIPS Exploits to Hack RoutersDeveloping MIPS Exploits to Hack Routers
Developing MIPS Exploits to Hack Routers
 
BGP Unnumbered で遊んでみた
BGP Unnumbered で遊んでみたBGP Unnumbered で遊んでみた
BGP Unnumbered で遊んでみた
 
Cilium - Bringing the BPF Revolution to Kubernetes Networking and Security
Cilium - Bringing the BPF Revolution to Kubernetes Networking and SecurityCilium - Bringing the BPF Revolution to Kubernetes Networking and Security
Cilium - Bringing the BPF Revolution to Kubernetes Networking and Security
 
MAP 実装してみた
MAP 実装してみたMAP 実装してみた
MAP 実装してみた
 
NVMe over Fabric
NVMe over FabricNVMe over Fabric
NVMe over Fabric
 
IETF 104 Hackathon VPP Prototyping Stateless SRv6/GTP-U Translation
IETF 104 Hackathon VPP Prototyping Stateless SRv6/GTP-U TranslationIETF 104 Hackathon VPP Prototyping Stateless SRv6/GTP-U Translation
IETF 104 Hackathon VPP Prototyping Stateless SRv6/GTP-U Translation
 
OVS VXLAN Network Accelaration on OpenStack (VXLAN offload and DPDK) - OpenSt...
OVS VXLAN Network Accelaration on OpenStack (VXLAN offload and DPDK) - OpenSt...OVS VXLAN Network Accelaration on OpenStack (VXLAN offload and DPDK) - OpenSt...
OVS VXLAN Network Accelaration on OpenStack (VXLAN offload and DPDK) - OpenSt...
 
Receive side scaling (RSS) with eBPF in QEMU and virtio-net
Receive side scaling (RSS) with eBPF in QEMU and virtio-netReceive side scaling (RSS) with eBPF in QEMU and virtio-net
Receive side scaling (RSS) with eBPF in QEMU and virtio-net
 
NEDIA_SNIA_CXL_講演資料.pdf
NEDIA_SNIA_CXL_講演資料.pdfNEDIA_SNIA_CXL_講演資料.pdf
NEDIA_SNIA_CXL_講演資料.pdf
 
Userspace networking
Userspace networkingUserspace networking
Userspace networking
 
Ethernetの受信処理
Ethernetの受信処理Ethernetの受信処理
Ethernetの受信処理
 
Dockerと外部ルータを連携させる仕組みを作ってみた
Dockerと外部ルータを連携させる仕組みを作ってみたDockerと外部ルータを連携させる仕組みを作ってみた
Dockerと外部ルータを連携させる仕組みを作ってみた
 

Similar to Ucx an open source framework for hpc network ap is and beyond

UCX: An Open Source Framework for HPC Network APIs and Beyond
UCX: An Open Source Framework for HPC Network APIs and BeyondUCX: An Open Source Framework for HPC Network APIs and Beyond
UCX: An Open Source Framework for HPC Network APIs and BeyondEd Dodds
 
Designing HPC & Deep Learning Middleware for Exascale Systems
Designing HPC & Deep Learning Middleware for Exascale SystemsDesigning HPC & Deep Learning Middleware for Exascale Systems
Designing HPC & Deep Learning Middleware for Exascale Systemsinside-BigData.com
 
High-Performance and Scalable Designs of Programming Models for Exascale Systems
High-Performance and Scalable Designs of Programming Models for Exascale SystemsHigh-Performance and Scalable Designs of Programming Models for Exascale Systems
High-Performance and Scalable Designs of Programming Models for Exascale Systemsinside-BigData.com
 
Designing HPC, Deep Learning, and Cloud Middleware for Exascale Systems
Designing HPC, Deep Learning, and Cloud Middleware for Exascale SystemsDesigning HPC, Deep Learning, and Cloud Middleware for Exascale Systems
Designing HPC, Deep Learning, and Cloud Middleware for Exascale Systemsinside-BigData.com
 
Panda scalable hpc_bestpractices_tue100418
Panda scalable hpc_bestpractices_tue100418Panda scalable hpc_bestpractices_tue100418
Panda scalable hpc_bestpractices_tue100418inside-BigData.com
 
Designing Scalable HPC, Deep Learning and Cloud Middleware for Exascale Systems
Designing Scalable HPC, Deep Learning and Cloud Middleware for Exascale SystemsDesigning Scalable HPC, Deep Learning and Cloud Middleware for Exascale Systems
Designing Scalable HPC, Deep Learning and Cloud Middleware for Exascale Systemsinside-BigData.com
 
How to Design Scalable HPC, Deep Learning, and Cloud Middleware for Exascale ...
How to Design Scalable HPC, Deep Learning, and Cloud Middleware for Exascale ...How to Design Scalable HPC, Deep Learning, and Cloud Middleware for Exascale ...
How to Design Scalable HPC, Deep Learning, and Cloud Middleware for Exascale ...inside-BigData.com
 
Accelerate Big Data Processing with High-Performance Computing Technologies
Accelerate Big Data Processing with High-Performance Computing TechnologiesAccelerate Big Data Processing with High-Performance Computing Technologies
Accelerate Big Data Processing with High-Performance Computing TechnologiesIntel® Software
 
Designing High-Performance and Scalable Middleware for HPC, AI and Data Science
Designing High-Performance and Scalable Middleware for HPC, AI and Data ScienceDesigning High-Performance and Scalable Middleware for HPC, AI and Data Science
Designing High-Performance and Scalable Middleware for HPC, AI and Data ScienceObject Automation
 
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...inside-BigData.com
 
Co-Design Architecture for Exascale
Co-Design Architecture for ExascaleCo-Design Architecture for Exascale
Co-Design Architecture for Exascaleinside-BigData.com
 
Designing High performance & Scalable Middleware for HPC
Designing High performance & Scalable Middleware for HPCDesigning High performance & Scalable Middleware for HPC
Designing High performance & Scalable Middleware for HPCObject Automation
 
Communication Frameworks for HPC and Big Data
Communication Frameworks for HPC and Big DataCommunication Frameworks for HPC and Big Data
Communication Frameworks for HPC and Big Datainside-BigData.com
 
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...OpenStack
 
Building Efficient HPC Clouds with MCAPICH2 and RDMA-Hadoop over SR-IOV Infin...
Building Efficient HPC Clouds with MCAPICH2 and RDMA-Hadoop over SR-IOV Infin...Building Efficient HPC Clouds with MCAPICH2 and RDMA-Hadoop over SR-IOV Infin...
Building Efficient HPC Clouds with MCAPICH2 and RDMA-Hadoop over SR-IOV Infin...inside-BigData.com
 
Systems Support for Many Task Computing
Systems Support for Many Task ComputingSystems Support for Many Task Computing
Systems Support for Many Task ComputingEric Van Hensbergen
 
A Library for Emerging High-Performance Computing Clusters
A Library for Emerging High-Performance Computing ClustersA Library for Emerging High-Performance Computing Clusters
A Library for Emerging High-Performance Computing ClustersIntel® Software
 
Designing Software Libraries and Middleware for Exascale Systems: Opportuniti...
Designing Software Libraries and Middleware for Exascale Systems: Opportuniti...Designing Software Libraries and Middleware for Exascale Systems: Opportuniti...
Designing Software Libraries and Middleware for Exascale Systems: Opportuniti...inside-BigData.com
 
Introduction to Apache Mesos and DC/OS
Introduction to Apache Mesos and DC/OSIntroduction to Apache Mesos and DC/OS
Introduction to Apache Mesos and DC/OSSteve Wong
 

Similar to Ucx an open source framework for hpc network ap is and beyond (20)

UCX: An Open Source Framework for HPC Network APIs and Beyond
UCX: An Open Source Framework for HPC Network APIs and BeyondUCX: An Open Source Framework for HPC Network APIs and Beyond
UCX: An Open Source Framework for HPC Network APIs and Beyond
 
Designing HPC & Deep Learning Middleware for Exascale Systems
Designing HPC & Deep Learning Middleware for Exascale SystemsDesigning HPC & Deep Learning Middleware for Exascale Systems
Designing HPC & Deep Learning Middleware for Exascale Systems
 
High-Performance and Scalable Designs of Programming Models for Exascale Systems
High-Performance and Scalable Designs of Programming Models for Exascale SystemsHigh-Performance and Scalable Designs of Programming Models for Exascale Systems
High-Performance and Scalable Designs of Programming Models for Exascale Systems
 
Designing HPC, Deep Learning, and Cloud Middleware for Exascale Systems
Designing HPC, Deep Learning, and Cloud Middleware for Exascale SystemsDesigning HPC, Deep Learning, and Cloud Middleware for Exascale Systems
Designing HPC, Deep Learning, and Cloud Middleware for Exascale Systems
 
Panda scalable hpc_bestpractices_tue100418
Panda scalable hpc_bestpractices_tue100418Panda scalable hpc_bestpractices_tue100418
Panda scalable hpc_bestpractices_tue100418
 
Designing Scalable HPC, Deep Learning and Cloud Middleware for Exascale Systems
Designing Scalable HPC, Deep Learning and Cloud Middleware for Exascale SystemsDesigning Scalable HPC, Deep Learning and Cloud Middleware for Exascale Systems
Designing Scalable HPC, Deep Learning and Cloud Middleware for Exascale Systems
 
How to Design Scalable HPC, Deep Learning, and Cloud Middleware for Exascale ...
How to Design Scalable HPC, Deep Learning, and Cloud Middleware for Exascale ...How to Design Scalable HPC, Deep Learning, and Cloud Middleware for Exascale ...
How to Design Scalable HPC, Deep Learning, and Cloud Middleware for Exascale ...
 
Accelerate Big Data Processing with High-Performance Computing Technologies
Accelerate Big Data Processing with High-Performance Computing TechnologiesAccelerate Big Data Processing with High-Performance Computing Technologies
Accelerate Big Data Processing with High-Performance Computing Technologies
 
Designing High-Performance and Scalable Middleware for HPC, AI and Data Science
Designing High-Performance and Scalable Middleware for HPC, AI and Data ScienceDesigning High-Performance and Scalable Middleware for HPC, AI and Data Science
Designing High-Performance and Scalable Middleware for HPC, AI and Data Science
 
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
 
Japan's post K Computer
Japan's post K ComputerJapan's post K Computer
Japan's post K Computer
 
Co-Design Architecture for Exascale
Co-Design Architecture for ExascaleCo-Design Architecture for Exascale
Co-Design Architecture for Exascale
 
Designing High performance & Scalable Middleware for HPC
Designing High performance & Scalable Middleware for HPCDesigning High performance & Scalable Middleware for HPC
Designing High performance & Scalable Middleware for HPC
 
Communication Frameworks for HPC and Big Data
Communication Frameworks for HPC and Big DataCommunication Frameworks for HPC and Big Data
Communication Frameworks for HPC and Big Data
 
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
 
Building Efficient HPC Clouds with MCAPICH2 and RDMA-Hadoop over SR-IOV Infin...
Building Efficient HPC Clouds with MCAPICH2 and RDMA-Hadoop over SR-IOV Infin...Building Efficient HPC Clouds with MCAPICH2 and RDMA-Hadoop over SR-IOV Infin...
Building Efficient HPC Clouds with MCAPICH2 and RDMA-Hadoop over SR-IOV Infin...
 
Systems Support for Many Task Computing
Systems Support for Many Task ComputingSystems Support for Many Task Computing
Systems Support for Many Task Computing
 
A Library for Emerging High-Performance Computing Clusters
A Library for Emerging High-Performance Computing ClustersA Library for Emerging High-Performance Computing Clusters
A Library for Emerging High-Performance Computing Clusters
 
Designing Software Libraries and Middleware for Exascale Systems: Opportuniti...
Designing Software Libraries and Middleware for Exascale Systems: Opportuniti...Designing Software Libraries and Middleware for Exascale Systems: Opportuniti...
Designing Software Libraries and Middleware for Exascale Systems: Opportuniti...
 
Introduction to Apache Mesos and DC/OS
Introduction to Apache Mesos and DC/OSIntroduction to Apache Mesos and DC/OS
Introduction to Apache Mesos and DC/OS
 

More from inside-BigData.com

Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...inside-BigData.com
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networksinside-BigData.com
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...inside-BigData.com
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...inside-BigData.com
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...inside-BigData.com
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networksinside-BigData.com
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoringinside-BigData.com
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecastsinside-BigData.com
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Updateinside-BigData.com
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19inside-BigData.com
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuninginside-BigData.com
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODinside-BigData.com
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Accelerationinside-BigData.com
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficientlyinside-BigData.com
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Erainside-BigData.com
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computinginside-BigData.com
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Clusterinside-BigData.com
 

More from inside-BigData.com (20)

Major Market Shifts in IT
Major Market Shifts in ITMajor Market Shifts in IT
Major Market Shifts in IT
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networks
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networks
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecasts
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Update
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuning
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
 
State of ARM-based HPC
State of ARM-based HPCState of ARM-based HPC
State of ARM-based HPC
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Acceleration
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Era
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computing
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Cluster
 
Overview of HPC Interconnects
Overview of HPC InterconnectsOverview of HPC Interconnects
Overview of HPC Interconnects
 

Recently uploaded

Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Hyundai Motor Group
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 

Recently uploaded (20)

Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 

Ucx an open source framework for hpc network ap is and beyond

  • 1. ARM Research – Software & Large Scale Systems UCX:An Open Source Framework for HPC Network APIs and Beyond Pavel Shamis (Pasha) Principal Research Engineer
  • 2. Co-Design Collaboration Collaborative Effort Industry, National Laboratories and Academia The Next Generation HPC Communication Framework
  • 3. Challenges §  Performance Portability (across various interconnects) §  Collaboration between industry and research institutions §  …but mostly industry (because they built the hardware) §  Maintenance §  Maintaining a network stack is time consuming and expensive §  Industry have resources and strategic interest for this §  Extendibility §  MPI+X+Y ? §  Exascale programming environment is an ongoing debate
  • 4. UCX – Unified Communication X Framework §  Unified §  Network API for multiple network architectures that target HPC programing models and libraries §  Communication §  How to move data from location in memory A to location in memory B considering multiple types of memories §  Framework §  A collection of libraries and utilities for HPC network programmers
  • 5. History MXM ●  Developed by Mellanox Technologies ●  HPC communication library for InfiniBand devices and shared memory ●  Primary focus: MPI, PGAS PAMI ●  Developed by IBM on BG/Q, PERCS, IB VERBS ●  Network devices and shared memory ●  MPI, OpenSHMEM, PGAS, CHARM++, X10 ●  C++ components ●  Aggressive multi-threading with contexts ●  Active Messages ●  Non-blocking collectives with hw accleration support Decades of community and industry experience in development of HPC software UCCS ●  Developed by ORNL, UH, UTK ●  Originally based on Open MPI BTL and OPAL layers ●  HPC communication library for InfiniBand, Cray Gemini/Aries, and shared memory ●  Primary focus: OpenSHMEM, PGAS ●  Also supports: MPI
  • 6. What we are doing differently… §  UCX consolidates multiple industry and academic efforts §  Mellanox MXM, IBM PAMI, ORNL/UTK/UH UCCS, etc. §  Supported and maintained by industry §  IBM, Mellanox, NVIDIA, Pathscale,ARM
  • 7. What we are doing differently… §  Co-design effort between national laboratories, academia, and industry Applications: LAMMPS, NWCHEM, etc. Programming models: MPI, PGAS/Gasnet, etc. Middleware: Driver and Hardware Co-design
  • 8. UCX InfiniBand uGNI Shared Memory GPU Memory Emerging Interconnects MPI GasNet PGAS Task Based Runtimes I/O Transports Protocols Services Applications
  • 9. A Collaboration Efforts §  Mellanox co-designs network API and contributes MXM technology §  Infrastructure, transport, shared memory, protocols, integration with OpenMPI/SHMEM, MPICH §  ORNL & LANL co-designs network API and contributes UCCS project §  InfiniBand optimizations, Cray devices, shared memory §  ARM co-designs the network API and contributes optimizations for ARM eco-system §  NVIDIA co-designs high-quality support for GPU devices §  GPUDirect, GDR copy, etc. §  IBM co-designs network API and contributes ideas and concepts from PAMI §  UH/UTK focus on integration with their research platforms
  • 10. Licensing §  Open Source §  BSD 3 Clause license §  Contributor License Agreement – BSD 3 based
  • 11. UCX Framework Mission §  Collaboration between industry, laboratories, and academia §  Create open-source production grade communication framework for HPC applications §  Enable the highest performance through co-design of software-hardware interfaces §  Unify industry - national laboratories - academia efforts Performance oriented Optimization for low-software overheads in communication path allows near native-level performance Community driven Collaboration between industry, laboratories, and academia Production quality Developed, maintained, tested, and used by industry and researcher community API Exposes broad semantics that target data centric and HPC programming models and applications Research The framework concepts and ideas are driven by research in academia, laboratories, and industry Cross platform Support for Infiniband, Cray, various shared memory (x86-64 and Power), GPUs Co-design of Exascale Network APIs
  • 13. UCX Framework UC-S for Services This framework provides basic infrastructure for component based programming, memory management, and useful system utilities Functionality: Platform abstractions, data structures, debug facilities. UC-T forTransport Low-level API that expose basic network operations supported by underlying hardware. Reliable, out-of- order delivery. Functionality: Setup and instantiation of communication operations. UC-P for Protocols High-level API uses UCT framework to construct protocols commonly found in applications Functionality: Multi-rail, device selection, pending queue, rendezvous, tag-matching, software- atomics, etc.
  • 14. A High-level Overview UC-T (Hardware Transports) - Low Level API RMA, Atomic, Tag-matching, Send/Recv, Active Message Transport for InfiniBand VERBs driver RC UD XRC DCT Transport for intra-node host memory communication SYSV POSIX KNEM CMA XPMEM Transport for Accelerator Memory communucation GPU Transport for Gemini/Aries drivers GNI UC-S (Services) Common utilities UC-P (Protocols) - High Level API Transport selection, cross-transrport multi-rail, fragmentation, operations not supported by hardware Message Passing API Domain: tag matching, randevouze PGAS API Domain: RMAs, Atomics Task Based API Domain: Active Messages I/O API Domain: Stream Utilities Data stractures Hardware MPICH, Open-MPI, etc. OpenSHMEM, UPC, CAF, X10, Chapel, etc. Parsec, OCR, Legions, etc. Burst buffer, ADIOS, etc. ApplicationsUCX Memory Management OFA Verbs Driver Cray Driver OS Kernel Cuda
  • 15. UCP API (DRAFT) Snippet (https://github.com/openucx/ucx/blob/master/src/ucp/api/ucp.h) §  ucs_status_t ucp_put(ucp_ep_h ep, const void ∗buffer, size_t length, uint64_t remote_addr, ucp_rkey_h rkey) Blocking remote memory put operation. §  ucs_status_t ucp_put_nbi (ucp_ep_h ep, const void ∗buffer, size_t length, uint64_t remote_addr, ucp_rkey_h rkey) Non-blocking implicit remote memory put operation. §  ucs_status_t ucp_get (ucp_ep_h ep, void ∗buffer, size_t length, uint64_t remote_addr, ucp_rkey_h rkey) Blocking remote memory get operation. §  ucs_status_t ucp_get_nbi (ucp_ep_h ep, void ∗buffer, size_t length, uint64_t remote_addr, ucp_rkey_h rkey) Non-blocking implicit remote memory get operation. §  ucs_status_t ucp_atomic_add32 (ucp_ep_h ep, uint32_t add, uint64_t remote_addr, ucp_rkey_h rkey) Blocking atomic add operation for 32 bit integers. §  ucs_status_t ucp_atomic_add64 (ucp_ep_h ep, uint64_t add, uint64_t remote_addr, ucp_rkey_h rkey) Blocking atomic add operation for 64 bit integers. §  ucs_status_t ucp_atomic_fadd32 (ucp_ep_h ep, uint32_t add, uint64_t remote_addr, ucp_rkey_h rkey, uint32_t ∗result) Blocking atomic fetch and add operation for 32 bit integers. §  ucs_status_t ucp_atomic_fadd64 (ucp_ep_h ep, uint64_t add, uint64_t remote_addr, ucp_rkey_h rkey, uint64_t ∗result) Blocking atomic fetch and add operation for 64 bit integers. §  ucs_status_ptr_t ucp_tag_send_nb (ucp_ep_h ep, const void ∗buffer, size_t count, ucp_datatype_t datatype, ucp_tag_t tag, ucp_send_callback_t cb) Non-blocking tagged-send operations. §  ucs_status_ptr_t ucp_tag_recv_nb (ucp_worker_h worker, void ∗buffer, size_t count, ucp_datatype_t datatype, ucp_tag_t tag, ucp_tag_t tag_mask, ucp_tag_recv_callback_t cb) Non-blocking tagged-receive operation.
  • 16. Preliminary Evaluation ( UCT ) §  Pavel Shamis, et al.“UCX:An Open Source Framework for HPC Network APIs and Beyond,” HOT Interconnects 2015 - Santa Clara, California, US,August 2015 §  Two HP ProLiant DL380p Gen8 servers §  Mellanox SX6036 switch, Single-port Mellanox Connect-IB FDR (10.10.5056) §  Mellanox OFED 2.4-1.0.4. (VERBS) §  Prototype implementation of AcceleratedVERBS (AVERBS) �� �� �� �� �� ��� ��� ��� ��� ��� ��� �� �� �� �� ��� ��� ��� �������������������� �������������������� ����������������� ���������������� ����������������� ���������������� ���� ���� ���� ���� ���� ���� ���� ���� ���� ���� ���� ���� �� �� �� �� ��� ��� ��� ������������ �������������������� ����������������� ����������������� ����������������� ���������������� ���������������� ���������������� �� �� �� �� �� �� �� �� �� ��� �� ��� �� ���������������� �������������������� ����������������� ���������������� ����������������� ����������������
  • 17. OpenSHMEM and OSHMEM (OpenMPI) Put Latency (shared memory) 0.1 1 10 100 1000 8 16 32 64 128 256 512 1KB 2KB 4KB 8KB 16KB 32KB 64KB 128KB256KB512KB 1MB 2MB 4MB Latency(usec,logscale) Message Size OpenSHMEM−UCX (intranode) OpenSHMEM−UCCS (intranode) OSHMEM (intranode) Lower is better Slide courtesy of ORNL UCXTeam
  • 18. OpenSHMEM and OSHMEM (OpenMPI) Put Injection Rate Higher is better Connect-IB 0 2e+06 4e+06 6e+06 8e+06 1e+07 1.2e+07 1.4e+07 8 16 32 64 128 256 512 1KB 2KB 4KB MessageRate(putoperations/second) Message Size OpenSHMEM−UCX (mlx5) OpenSHMEM−UCCS (mlx5) OSHMEM (mlx5) OSHMEM−UCX (mlx5) Slide courtesy of ORNL UCXTeam
  • 19. OpenSHMEM and OSHMEM (OpenMPI) GUPs Benchmark Higher is better Connect-IB 0 0.0002 0.0004 0.0006 0.0008 0.001 0.0012 0.0014 0.0016 0.0018 2 4 6 8 10 12 14 16 GUPS(billionupdatespersecond) Number of PEs (two nodes) UCX (mlx5) OSHMEM (mlx5) Slide courtesy of ORNL UCXTeam
  • 20. MPICH - Message rate Preliminary Results 0 1 2 3 4 5 6 1 2 4 8 16 32 64 128 256 512 1k 2k 4k 8k 16k 32k 64k 128k 256k 512k 1M 2M 4M MMPS MPICH/UCX MPICH/MXM Slide courtesy of Pavan Balaji,ANL - sent to the ucx mailing list Connect-IB “non-blocking tag-send”
  • 21. Where is UCX being used? §  Upcoming release of Open MPI 2.0 (MPI and OpenSHMEM APIs) §  Upcoming release of MPICH §  OpenSHMEM reference implementation by UH and ORNL §  PARSEC – runtime used on Scientific Linear Libraries
  • 22. What Next ? §  UCX Consortium ! §  http://www.csm.ornl.gov/newsite/ §  UCX Specification §  Early draft is available online: http://www.openucx.org/early-draft-of-ucx-specification-is-here/ §  Production releases §  MPICH, Open MPI, Open SHMEM(s), Gasnet, and more… §  Support for more networks and applications and libraries §  UCX Hackathon 2016 ! §  Will be announced on the mailing list and website
  • 23. https://github.com/orgs/openucx WEB: www.openucx.org Contact: info@openucx.org Mailing List: https://elist.ornl.gov/mailman/listinfo/ucx-group ucx-group@elist.ornl.gov
  • 24. Questions ? Unified Communication - X Framework WEB: www.openucx.org Contact: info@openucx.org WE B: https://github.com/orgs/openucx Mailing List: https://elist.ornl.gov/mailman/listinfo/ucx-group ucx-group@elist.ornl.gov