
UCX-Python - A Flexible Communication Library for Python Applications

Talk by Akshay Venkatesh at GTC 2019 on building Python bindings to the UCX library for high performance communication.

March 21, 2018
UCX-PYTHON: A FLEXIBLE COMMUNICATION LIBRARY FOR PYTHON APPLICATIONS
2
OUTLINE
Motivation and goals
Implementation choices
Features/API
Performance
Next steps
3
WHY PYTHON-BASED GPU COMMUNICATION?
Python use growing
Extensive libraries
Python in Data science/HPC is growing
+ GPU usage and communication needs
4
IMPACT ON DATA SCIENCE
RAPIDS uses dask-distributed for data distribution over Python sockets
=> slows down all communication-bound components
Critical to enable Dask to leverage IB and NVLink
[Figure: RAPIDS software stack showing CUDA, Python, Apache Arrow, Dask, deep learning frameworks, cuDNN, and RAPIDS (cuML, cuDF, cuGraph). Courtesy RAPIDS Team]
5
CURRENT COMMUNICATION DRAWBACKS
Existing Python communication modules primarily rely on sockets
Low latency / high bandwidth is critical for better system utilization of GPUs (e.g., NVLink, IB)
Frameworks that transfer GPU data between sites make copies
But CUDA-aware data movement is largely solved in HPC!
6
REQUIREMENTS AND RESTRICTIONS
Dask: a popular framework that facilitates scaling Python workloads to many nodes
  Permits use of CUDA-based Python objects
  Allows workers to be added and removed dynamically
  Communication backend built around coroutines (more later)
Why not use mpi4py then?

Dimension             mpi4py
CUDA-aware?           No; makes GPU<->CPU copies
Dynamic scaling?      No; imposes MPI restrictions
Coroutine support?    No known support
7
GOALS
Provide a flexible communication library that:
1. Supports CPU/GPU buffers over a range of message types
- raw bytes, host objects/memoryview, cupy objects, numba objects
2. Supports Dynamic connection capability
3. Supports pythonesque programming using Futures, Coroutines, etc (if needed)
4. Provides close to native performance from python world
How? – Cython, UCX
9
WHY UCX?
Popular unified communication library used for MPI/PGAS implementations such as OpenMPI, MPICH, OSHMEM, etc.
Exposes API for:
  Client-server based connection establishment
  Point-to-point, RMA, atomics capabilities
  Tag matching
  Callbacks on communication events
  Blocking/polling progress
  CUDA-aware point-to-point communication
C library!
10
PYTHON BINDING APPROACHES
Three main options considered:
  SWIG, CFFI, Cython
Problems with SWIG, CFFI
  Work well for small examples but not for C libraries
  UCX definitions of structures aren't consolidated
  Tedious to populate interface file / Python script by hand
11
CYTHON
Call C functions and structures from Cython code (.pyx)
Expose classes and functions to Python that can use C underneath

ucx_echo.py:
    ...
    ep = ucp.get_endpoint()
    ep.send_obj(…)
    …

ucp_py.pyx:
    cdef class ucp_endpoint:
        def send_obj(…):
            ucp_py_send_nb(…)

ucp_py_ucp_fxns.c:
    struct ctx *ucp_py_send_nb()
    {
        ucp_tag_send_nb(…)
    }

ucp_tag_send_nb: defined in UCX C library
ucp_py_send_nb: defined in UCX-PY module
12
UCX-PY STACK
UCX-PY (.py)
OBJECT META-DATA EXTRACTION/ UCX C-WRAPPERS (.pyx)
RESOURCE MANAGEMENT/CALLBACK HANDLING/UCX CALLS(.c)
UCX C LIBRARY
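
The "object meta-data extraction" layer can be illustrated for host objects with Python's buffer protocol. The sketch below is an illustration only, not UCX-PY code: `describe_buffer` is a hypothetical helper, and it uses only the standard-library `memoryview` to show the kind of information (byte count, item size, contiguity) a wrapper would need before handing a buffer to a C call. GPU objects such as cupy or numba arrays expose comparable metadata through `__cuda_array_interface__`; whether UCX-PY reads that attribute is an assumption, not something the slides state.

```python
from array import array


def describe_buffer(obj):
    """Hypothetical helper: summarize a host object that supports the buffer protocol."""
    view = memoryview(obj)  # zero-copy view over the object's memory
    try:
        return {
            "nbytes": view.nbytes,          # total payload size in bytes
            "itemsize": view.itemsize,      # size of one element in bytes
            "readonly": view.readonly,      # True for immutable objects such as bytes
            "contiguous": view.contiguous,  # True if the memory is one contiguous block
        }
    finally:
        view.release()


if __name__ == "__main__":
    print(describe_buffer(b"hello"))
    print(describe_buffer(bytearray(16)))
    print(describe_buffer(array("d", [1.0, 2.0, 3.0])))
```

A send path would typically reject or stage non-contiguous buffers, which is consistent with non-contiguous transfers being listed under next steps.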
14
COROUTINES
Cooperative concurrent functions
Suspended when they:
  read/write from disk
  perform communication
  sleep, etc.
Scheduler/event loop manages execution of all coroutines
Single-thread utilization increases

Regular functions (run one after another):

    import time

    def zzz(i):
        print("start", i)
        time.sleep(2)      # blocks the whole thread
        print("finish", i)

    def main():
        zzz(1)
        zzz(2)

    main()

Output:
    start 1    # t = 0
    finish 1   # t = 2
    start 2    # t = 2 + △
    finish 2   # t = 4 + △

Coroutines (sleeps overlap):

    import asyncio

    async def zzz(i):
        print("start", i)
        await asyncio.sleep(2)   # yields to the event loop
        print("finish", i)

    f = asyncio.create_task

    async def main():
        task1 = f(zzz(1))
        task2 = f(zzz(2))
        await task1
        await task2

    asyncio.run(main())

Output:
    start 1    # t = 0
    start 2    # t = 0 + △
    finish 1   # t = 2
    finish 2   # t = 2 + △
15
UCX-PY CONNECTION ESTABLISHMENT API
Dynamic connection establishment
  .start_listener(accept_cb, port, is_coroutine): server creates a listener
  .get_endpoint(ip, port): client connects
Multiple listeners allowed, multiple endpoints to a server allowed

Server:
    async def accept_cb(ep, …):
        …
        await ep.send_obj()
        …
        await ep.recv_obj()
        …

    ucp.start_listener(accept_cb, port, is_coroutine=True)

Client:
    async def talk_to_client():
        ep = ucp.get_endpoint(ip, port)
        …
        await ep.recv_obj()
        …
        await ep.send_obj()
        …
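
The listener/endpoint pattern above has the same shape as the standard-library asyncio streams API: a server registers a coroutine callback, and each client connection triggers it. The sketch below is a runnable analogy only. It uses `asyncio.start_server` and `asyncio.open_connection` over plain TCP sockets, not UCX-PY, and the names `talk_to_server` and `demo` are made up for the illustration.

```python
import asyncio


async def accept_cb(reader, writer):
    # Server-side coroutine, invoked once per incoming connection
    # (plays the role of accept_cb on the slide).
    data = await reader.readexactly(5)
    writer.write(data.upper())
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def talk_to_server(host, port):
    # Client side: connect, send a message, then wait for the reply.
    reader, writer = await asyncio.open_connection(host, port)
    writer.write(b"hello")
    await writer.drain()
    reply = await reader.readexactly(5)
    writer.close()
    await writer.wait_closed()
    return reply


async def demo():
    # Port 0 asks the OS for any free port; read back the one it chose.
    server = await asyncio.start_server(accept_cb, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        return await talk_to_server("127.0.0.1", port)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because the callback is a coroutine, the event loop can serve other connections while one handler is waiting on `await`, which is the property the `is_coroutine=True` flag on the slide appears to select.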
16
UCX-PY CONNECTION ESTABLISHMENT
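[Diagram: Server and Client timelines. The server calls ucp.start_listener() and enters the listening state; the client calls ucp.get_endpoint(); the server accepts the connection and invokes the callback accept_cb()]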
17
UCX-PY DATA MOVEMENT API
Send data (on endpoint)
  .send_*(): raw bytes, host objects (numpy), CUDA objects (cupy, numba)
Receive data (on endpoint)
  .recv_obj(): pass an object as argument where data is received
  .recv_future() 'blind': no input; returns received object; low performance

Server:
    async def accept_cb(ep, …):
        …
        await ep.send_obj(cupy.array([42]))
        …

Client:
    async def talk_to_client():
        ep = ucp.get_endpoint(ip, port)
        …
        rr = await ep.recv_future()
        msg = ucp.get_obj_from_msg(rr)
        …
18
UCX-PY DATA MOVEMENT SEMANTICS
Send/recv operations are non-blocking by default
  Issuing an operation returns a future
  Calling await on the future or calling future.result() blocks until completion
Caveat: limited number of object types tested
  memoryview, numpy, cupy, and numba
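
The "issue now, wait later" semantics can be sketched with the standard library's `concurrent.futures`, whose `Future.result()` blocks until completion in the way the slide describes. This is an analogy under stated assumptions, not UCX-PY code: `slow_transfer` and `issue_and_overlap` are hypothetical names, and the transfer is simulated with a short sleep.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_transfer(payload, delay=0.2):
    # Stand-in for a data transfer: sleep, then report the bytes "moved".
    time.sleep(delay)
    return len(payload)


def issue_and_overlap():
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Issuing the operation returns a future immediately.
        future = pool.submit(slow_transfer, b"x" * 1024)
        # Other work proceeds while the transfer is in flight.
        overlapped = sum(range(1000))
        # result() blocks until the operation completes.
        nbytes = future.result()
    return overlapped, nbytes


if __name__ == "__main__":
    print(issue_and_overlap())
```

Calling `result()` right after `submit()` would give the non-overlapped, traditional HPC style; delaying it is what allows communication to overlap with other work.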
19
UNDER THE HOOD
UCX depends on event notification to keep the main thread from constantly polling

Layer                    UCX calls
Connection management    ucp_{listener/ep}_create
Issuing data movement    ucp_tag_{send/recv/probe}_nb
Request progress         ucp_worker_{arm/signal/progress}

[Diagram: UCX-PY blocking progress sits on top of the UCX event notification mechanism, which is driven by completion queue events from the IB event channel and read/write events from sockets]
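
The difference between polling and event notification can be shown with plain sockets. In the sketch below, which uses the standard-library `selectors` module and a local socket pair rather than any UCX call, the thread sleeps inside `select()` until the kernel reports that data is readable instead of spinning on a receive. The function name `wait_for_event` is made up for the illustration.

```python
import selectors
import socket


def wait_for_event(timeout=1.0):
    sel = selectors.DefaultSelector()
    sender, receiver = socket.socketpair()
    try:
        receiver.setblocking(False)
        sel.register(receiver, selectors.EVENT_READ)
        # Nothing has been sent yet, so a zero-timeout check reports no events.
        idle_events = sel.select(timeout=0)
        sender.send(b"ping")
        # The thread blocks here until the socket becomes readable (or timeout).
        ready = sel.select(timeout=timeout)
        data = receiver.recv(4) if ready else b""
        return len(idle_events), data
    finally:
        sel.close()
        sender.close()
        receiver.close()


if __name__ == "__main__":
    print(wait_for_event())
```

A blocking progress mode built this way trades some wake-up latency for an idle CPU, which fits the testbed's choice of polling for latency-bound cases and blocking for bandwidth-bound ones.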
21
EXPERIMENTAL TESTBED
Hardware includes 2 nodes:
  Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz
  Tesla V100-SXM2 (CUDA 9.2.88, driver version 410.48)
  ConnectX-4 Mellanox HCAs (OFED-internal-4.0-1.0.1)
Software:
  UCX 1.5, Python 3.7.1

Case               UCX progress mode                      Python functions
Latency bound      Polling                                Regular
Bandwidth bound    Blocking (event notification based)    Coroutines
22
HOST MEMORY LATENCY
Latency-bound host transfers
[Plot: Short Message Latency. Latency (us, axis 0 to 9) vs. message size (1 byte to 8K bytes); series: native-UCX, python-UCX]
[Plot: Large Message Latency. Latency (us, axis 0 to 800) vs. message size (16K to 4M bytes); series: native-UCX, python-UCX]
23
DEVICE MEMORY LATENCY
Latency-bound device transfers
[Plot: Short Message Latency. Latency (us, axis 0 to 12) vs. message size (1 byte to 8K bytes); series: native-UCX, python-UCX]
[Plot: Large Message Latency. Latency (us, axis 0 to 900) vs. message size (16K to 4M bytes); series: native-UCX, python-UCX]
24
DEVICE MEMORY BANDWIDTH
Bandwidth-bound transfers (cupy)

Bandwidth (GB/s) by message size (series listed in legend order: cupy, native)

Message size    cupy     native
10MB            6.15     8.7
20MB            7.36     9.7
40MB            8.85     10.3
80MB            9.86     10.7
160MB           10.47    10.9
320MB           10.85    11.03
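
To make the gap concrete, the bar values can be turned into a fraction of native bandwidth per message size. The sketch below assumes the first six values on the slide belong to the cupy series and the last six to the native series (the legend order); that pairing is an inference from the extracted text, and `fraction_of_native` is a hypothetical helper.

```python
# Values as they appear on the slide; the series assignment is an assumption.
sizes_mb = [10, 20, 40, 80, 160, 320]
cupy_gbps = [6.15, 7.36, 8.85, 9.86, 10.47, 10.85]
native_gbps = [8.7, 9.7, 10.3, 10.7, 10.9, 11.03]


def fraction_of_native(python_bw, native_bw):
    """Return python_bw / native_bw for each pair of measurements."""
    return [p / n for p, n in zip(python_bw, native_bw)]


ratios = fraction_of_native(cupy_gbps, native_gbps)

if __name__ == "__main__":
    for size, ratio in zip(sizes_mb, ratios):
        print(f"{size:>4} MB: {ratio:.1%} of native")
```

Under that assumption the ratio rises steadily with message size, from roughly 71% at 10MB to roughly 98% at 320MB, so the fixed per-message overhead of the Python path matters less as transfers get larger.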
26
NEXT STEPS
Validate performance of dask-distributed over UCX-PY with dask-cuda workloads
Objects that have mixed physical backing (CPU and GPU)
Adding blocking support to NVLink-based UCT
Non-contiguous data transfers
Integration into dask-distributed underway (https://github.com/TomAugspurger/distributed/commits/ucx+data-handling)
Current implementation (https://github.com/Akshay-Venkatesh/ucx/tree/topic/py-bind)
Push to UCX project underway (https://github.com/openucx/ucx/pull/3165)
27
SUMMARY
UCX-PY is a flexible communication library
  Provides Python developers a way to leverage high-speed interconnects like IB
  Can support a pythonesque way of overlapping communication with other coroutines
    Or can be non-overlapped, as in traditional HPC
  Can support data movement of objects residing in CPU memory or in GPU memory
    Users needn't explicitly copy GPU<->CPU
UCX-PY is close to native performance for the major use-case range
28
BIG PICTURE
UCX-PY will serve as a high-performance communication module for dask
[Figure: RAPIDS software stack (CUDA, Python, Apache Arrow, Dask, deep learning frameworks, cuDNN, RAPIDS with cuML, cuDF, cuGraph) with UCX-PY added]
Python For Kids - Sách Lập trình cho trẻ emPython For Kids - Sách Lập trình cho trẻ em
Python For Kids - Sách Lập trình cho trẻ em
 
Q4 2023 Quarterly Investor Presentation - FINAL.pdf
Q4 2023 Quarterly Investor Presentation - FINAL.pdfQ4 2023 Quarterly Investor Presentation - FINAL.pdf
Q4 2023 Quarterly Investor Presentation - FINAL.pdf
 
VM Migration from VMware to CloudStack and KVM – Suresh Anaparti, ShapeBlue
VM Migration from VMware to CloudStack and KVM – Suresh Anaparti, ShapeBlueVM Migration from VMware to CloudStack and KVM – Suresh Anaparti, ShapeBlue
VM Migration from VMware to CloudStack and KVM – Suresh Anaparti, ShapeBlue
 
Roundtable_-_API_Research__Testing_Tools.pdf
Roundtable_-_API_Research__Testing_Tools.pdfRoundtable_-_API_Research__Testing_Tools.pdf
Roundtable_-_API_Research__Testing_Tools.pdf
 
What’s New in CloudStack 4.19, Abhishek Kumar, Release Manager Apache CloudSt...
What’s New in CloudStack 4.19, Abhishek Kumar, Release Manager Apache CloudSt...What’s New in CloudStack 4.19, Abhishek Kumar, Release Manager Apache CloudSt...
What’s New in CloudStack 4.19, Abhishek Kumar, Release Manager Apache CloudSt...
 
Why Disability Justice should be at the core of your digital accessibility jo...
Why Disability Justice should be at the core of your digital accessibility jo...Why Disability Justice should be at the core of your digital accessibility jo...
Why Disability Justice should be at the core of your digital accessibility jo...
 
Mind your App Footprint 🐾⚡️🌱 (@FlutterHeroes 2024)
Mind your App Footprint 🐾⚡️🌱 (@FlutterHeroes 2024)Mind your App Footprint 🐾⚡️🌱 (@FlutterHeroes 2024)
Mind your App Footprint 🐾⚡️🌱 (@FlutterHeroes 2024)
 
Enterprise Architecture As Strategy - Book Review
Enterprise Architecture As Strategy - Book ReviewEnterprise Architecture As Strategy - Book Review
Enterprise Architecture As Strategy - Book Review
 
MuleSoft Online Meetup Group - B2B Crash Course: PM Insider Lecture
MuleSoft Online Meetup Group - B2B Crash Course: PM Insider LectureMuleSoft Online Meetup Group - B2B Crash Course: PM Insider Lecture
MuleSoft Online Meetup Group - B2B Crash Course: PM Insider Lecture
 
AI improves software testing to be more fault tolerant, focused and efficient
AI improves software testing to be more fault tolerant, focused and efficientAI improves software testing to be more fault tolerant, focused and efficient
AI improves software testing to be more fault tolerant, focused and efficient
 
CloudStack Tooling Ecosystem – Kiran Chavala, ShapeBlue
CloudStack Tooling Ecosystem – Kiran Chavala, ShapeBlueCloudStack Tooling Ecosystem – Kiran Chavala, ShapeBlue
CloudStack Tooling Ecosystem – Kiran Chavala, ShapeBlue
 
AGFM - Toyota Coaster 1HZ Install Guide.pdf
AGFM - Toyota Coaster 1HZ Install Guide.pdfAGFM - Toyota Coaster 1HZ Install Guide.pdf
AGFM - Toyota Coaster 1HZ Install Guide.pdf
 
TrustArc Webinar - TrustArc's Latest AI Innovations
TrustArc Webinar - TrustArc's Latest AI InnovationsTrustArc Webinar - TrustArc's Latest AI Innovations
TrustArc Webinar - TrustArc's Latest AI Innovations
 
Transcript: Trending now: Book subjects on the move in the Canadian market - ...
Transcript: Trending now: Book subjects on the move in the Canadian market - ...Transcript: Trending now: Book subjects on the move in the Canadian market - ...
Transcript: Trending now: Book subjects on the move in the Canadian market - ...
 
software-quality-assurance question paper 2023
software-quality-assurance question paper 2023software-quality-assurance question paper 2023
software-quality-assurance question paper 2023
 
Communities, networking and developer culture
Communities, networking and developer cultureCommunities, networking and developer culture
Communities, networking and developer culture
 
How We Grew Up with CloudStack and its Journey – Dilip Singh, DataHub
How We Grew Up with CloudStack and its Journey – Dilip Singh, DataHubHow We Grew Up with CloudStack and its Journey – Dilip Singh, DataHub
How We Grew Up with CloudStack and its Journey – Dilip Singh, DataHub
 
eXtended Reality(XR) Basic introductions
eXtended Reality(XR) Basic introductionseXtended Reality(XR) Basic introductions
eXtended Reality(XR) Basic introductions
 

UCX-Python - A Flexible Communication Library for Python Applications

  • 1. March 21, 2018 UCX-PYTHON: A FLEXIBLE COMMUNICATION LIBRARY FOR PYTHON APPLICATIONS
  • 2. OUTLINE: Motivation and goals; Implementation choices; Features/API; Performance; Next steps
  • 3. WHY PYTHON-BASED GPU COMMUNICATION? Python use is growing, with extensive libraries. Python in data science/HPC is growing, along with GPU usage and communication needs.
  • 4. IMPACT ON DATA SCIENCE: RAPIDS uses dask-distributed for data distribution over Python sockets, which slows down all communication-bound components. It is critical to enable Dask to leverage IB and NVLINK. [Diagram, courtesy RAPIDS team: CUDA, Python, Apache Arrow, Dask, deep learning frameworks, cuDNN, RAPIDS (cuML, cuDF, cuGraph)]
  • 5. CURRENT COMMUNICATION DRAWBACKS: Existing Python communication modules primarily rely on sockets. Low latency / high bandwidth is critical for better system utilization of GPUs (e.g. NVLINK, IB). Frameworks that transfer GPU data between sites make copies. But CUDA-aware data movement is largely solved in HPC!
  • 6. REQUIREMENTS AND RESTRICTIONS: Dask is a popular framework that facilitates scaling Python workloads to many nodes. It permits use of CUDA-based Python objects, allows workers to be added and removed dynamically, and has a communication backend built around coroutines (more later). Why not use mpi4py then?
       Dimension          | mpi4py
       CUDA-aware?        | No, makes GPU<->CPU copies
       Dynamic scaling?   | No, imposes MPI restrictions
       Coroutine support? | No known support
  • 7. GOALS: Provide a flexible communication library that: (1) supports CPU/GPU buffers over a range of message types (raw bytes, host objects/memoryview, cupy objects, numba objects); (2) supports dynamic connection capability; (3) supports pythonesque programming using futures, coroutines, etc. (if needed); (4) provides close to native performance from the Python world. How? Cython and UCX.
  • 8. OUTLINE: Motivation and goals; Implementation choices; Features/API; Performance; Next steps
  • 9. WHY UCX? Popular unified communication library used for MPI/PGAS implementations such as OpenMPI, MPICH, OSHMEM, etc. Exposes API for: client-server based connection establishment; point-to-point, RMA, and atomics capabilities; tag matching; callbacks on communication events; blocking/polling progress; CUDA-aware point-to-point communication. C library!
  • 10. PYTHON BINDING APPROACHES: Three main options considered: SWIG, CFFI, Cython. Problems with SWIG and CFFI: they work well for small examples but not for C libraries; UCX's structure definitions aren't consolidated; it is tedious to populate the interface file / Python script by hand.
  • 11. CYTHON: Call C functions and structures from Cython code (.pyx). Expose classes and functions to Python which can use C underneath.
       ucx_echo.py:
           ...
           ep = ucp.get_endpoint()
           ep.send_obj(…)
           …
       ucp_py.pyx:
           cdef class ucp_endpoint:
               def send_obj(…):
                   ucp_py_send_nb(…)
       ucp_py_ucp_fxns.c:
           struct ctx *ucp_py_send_nb() {
               ucp_tag_send_nb(…)
           }
       (Diagram labels: "Defined in UCX C library", "Defined in UCX-PY module")
  • 12. UCX-PY STACK: UCX-PY (.py); object meta-data extraction / UCX C-wrappers (.pyx); resource management / callback handling / UCX calls (.c); UCX C library
  • 13. OUTLINE: Motivation and goals; Implementation choices; Features/API; Performance; Next steps
  • 14. COROUTINES: Co-operative concurrent functions. They yield control when they read/write from disk, perform communication, sleep, etc. A scheduler/event loop manages execution of all coroutines. Single-thread utilization increases.
       Regular functions:
           import time
           def zzz(i):
               print("start", i)
               time.sleep(2)      # blocks the whole thread
               print("finish", i)
           def main():
               zzz(1)
               zzz(2)
           main()
       Output:
           start 1    # t = 0
           finish 1   # t = 2
           start 2    # t = 2 + △
           finish 2   # t = 4 + △
       Coroutines:
           import asyncio
           async def zzz(i):
               print("start", i)
               await asyncio.sleep(2)   # yields to the event loop
               print("finish", i)
           f = asyncio.create_task
           async def main():
               task1 = f(zzz(1))
               task2 = f(zzz(2))
               await task1
               await task2
           asyncio.run(main())
       Output:
           start 1    # t = 0
           start 2    # t = 0 + △
           finish 1   # t = 2
           finish 2   # t = 2 + △
  • 15. UCX-PY CONNECTION ESTABLISHMENT API: Dynamic connection establishment. .start_listener(accept_cb, port, is_coroutine): server creates a listener. .get_endpoint(ip, port): client connects. Multiple listeners are allowed, and multiple endpoints to a server are allowed.
       Server:
           async def accept_cb(ep, …):
               …
               await ep.send_obj()
               …
               await ep.recv_obj()
               …
           ucp.start_listener(accept_cb, port, is_coroutine=True)
       Client:
           async def talk_to_client():
               ep = ucp.get_endpoint(ip, port)
               …
               await ep.recv_obj()
               …
               await ep.send_obj()
               …
  • 16. UCX-PY CONNECTION ESTABLISHMENT [Diagram] Server: ucp.start_listener(), listening state, accept connection, invoke callback accept_cb(). Client: ucp.get_endpoint().
  • 17. UCX-PY DATA MOVEMENT API: Send data (on endpoint) with .send_*(): raw bytes, host objects (numpy), CUDA objects (cupy, numba). Receive data (on endpoint) with .recv_obj(): pass an object as the argument where data is received; or with .recv_future() ('blind'): no input, returns the received object, low performance.
       async def accept_cb(ep, …):
           …
           await ep.send_obj(cupy.array([42]))
           …
       async def talk_to_client():
           ep = ucp.get_endpoint(ip, port)
           …
           rr = await ep.recv_future()
           msg = ucp.get_obj_from_msg(rr)
           …
  • 18. UCX-PY DATA MOVEMENT SEMANTICS: Send/recv operations are non-blocking by default. Issuing an operation returns a future. Calling await on the future or calling future.result() blocks until completion. Caveat: only a limited number of object types have been tested (memoryview, numpy, cupy, and numba).
  • 19. UNDER THE HOOD: UCX depends on event notification to keep the main thread from constantly polling.
       Layer                 | UCX calls
       Connection management | ucp_{listener/ep}_create
       Issuing data movement | ucp_tag_{send/recv/probe}_nb
       Request progress      | ucp_worker_{arm/signal/progress}
       [Diagram: UCX-PY blocking progress; UCX event notification mechanism; completion queue events from IB event channel; read/write events from sockets]
  • 20. OUTLINE: Motivation and goals; Implementation choices; Features/API; Performance; Next steps
  • 21. EXPERIMENTAL TESTBED: Hardware includes 2 nodes: Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz; Tesla V100-SXM2 (CUDA 9.2.88, driver version 410.48); ConnectX-4 Mellanox HCAs (OFED-internal-4.0-1.0.1). Software: UCX 1.5, Python 3.7.1.
       Case            | UCX progress mode                   | Python functions
       Latency bound   | Polling                             | Regular
       Bandwidth bound | Blocking (event notification based) | Coroutines
  • 22. HOST MEMORY LATENCY: Latency-bound host transfers. [Two plots comparing native-UCX and python-UCX. Short Message Latency: latency (us) vs message size, 1 byte to 8K. Large Message Latency: latency (us) vs message size, 16K to 4M bytes.]
  • 23. DEVICE MEMORY LATENCY: Latency-bound device transfers. [Two plots comparing native-UCX and python-UCX. Short Message Latency: latency (us) vs message size, 1 byte to 8K. Large Message Latency: latency (us) vs message size, 16K to 4M bytes.]
  • 24. DEVICE MEMORY BANDWIDTH: Bandwidth-bound transfers (cupy). [Bar chart: bandwidth (GB/s) vs message size at 10MB, 20MB, 40MB, 80MB, 160MB, 320MB; legend: cupy, native. Data labels in slide order: 6.15, 7.36, 8.85, 9.86, 10.47, 10.85 and 8.7, 9.7, 10.3, 10.7, 10.9, 11.03.]
  • 25. OUTLINE: Motivation and goals; Implementation choices; Features/API; Performance; Next steps
  • 26. NEXT STEPS: Performance-validate dask-distributed over UCX-PY with dask-cuda workloads. Objects that have mixed physical backing (CPU and GPU). Adding blocking support to NVLINK-based UCT. Non-contiguous data transfers. Integration into dask-distributed underway (https://github.com/TomAugspurger/distributed/commits/ucx+data-handling). Current implementation (https://github.com/Akshay-Venkatesh/ucx/tree/topic/py-bind). Push to UCX project underway (https://github.com/openucx/ucx/pull/3165).
  • 27. SUMMARY: UCX-PY is a flexible communication library. It gives Python developers a way to leverage high-speed interconnects like IB. It can support a pythonesque way of overlapping communication with other coroutines, or communication can be non-overlapped as in traditional HPC. It can support data movement of objects residing in CPU memory or in GPU memory; users needn't explicitly copy GPU<->CPU. UCX-PY is close to native performance for the major use-case range.
  • 28. BIG PICTURE: UCX-PY will serve as a high-performance communication module for Dask. [Diagram: CUDA, Python, Apache Arrow, Dask, deep learning frameworks, cuDNN, RAPIDS (cuML, cuDF, cuGraph), UCX-PY]
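
The listener/endpoint pattern on slides 15 and 16 (a server registers an accept callback, a client connects and the two exchange messages) can be tried without UCX installed. The sketch below is only an analogy built on Python's standard asyncio streams: asyncio.start_server stands in for ucp.start_listener and asyncio.open_connection for ucp.get_endpoint. It is not the UCX-PY API; the name talk_to_server, the loopback address, and the 5-byte message are assumptions made for illustration.

```python
import asyncio


async def accept_cb(reader, writer):
    # Server side: invoked once per accepted connection, like accept_cb on slide 15.
    data = await reader.readexactly(5)   # receive a fixed-size message
    writer.write(data[::-1])             # reply with the bytes reversed
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def talk_to_server(host, port):
    # Client side: connect, send, then wait for the reply.
    reader, writer = await asyncio.open_connection(host, port)
    writer.write(b"hello")
    await writer.drain()
    reply = await reader.readexactly(5)
    writer.close()
    await writer.wait_closed()
    return reply


async def main():
    # Port 0 asks the OS for any free port; read back the one it chose.
    server = await asyncio.start_server(accept_cb, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        return await talk_to_server("127.0.0.1", port)


reply = asyncio.run(main())
print(reply)  # b'olleh'
```

Both sides run as coroutines on one event loop, which is the property the talk relies on for overlapping communication with other work.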
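
Slide 18 states that issuing a send or receive returns a future immediately and that only future.result() (or await) blocks. A minimal sketch of that semantics, using the standard library's concurrent.futures rather than UCX-PY itself: slow_transfer is a hypothetical stand-in for a transfer, and the 0.2 s sleep is an arbitrary placeholder for time on the wire.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_transfer(payload):
    # Hypothetical stand-in for a data transfer; the sleep models wire time.
    time.sleep(0.2)
    return len(payload)


with ThreadPoolExecutor(max_workers=1) as pool:
    t0 = time.perf_counter()
    fut = pool.submit(slow_transfer, b"x" * 1024)  # returns a future right away
    issue_time = time.perf_counter() - t0          # time spent issuing the operation
    nbytes = fut.result()                          # blocks until the transfer completes
    total_time = time.perf_counter() - t0

print(f"issued in {issue_time:.4f}s, completed in {total_time:.4f}s, {nbytes} bytes")
```

The gap between the issue time and the completion time is the window in which other work could run, which is what makes the non-blocking default useful.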

Editor's Notes

  1. Ease of use. Python was language of the year 2018. Data science is one of the most sought-after areas. Large investments.