The document provides an overview of the libfabric interface for high performance networking. It discusses the high-level architecture including interfaces, services, and the object model. It describes the control interface for querying capabilities and attributes. It also presents a simple ping-pong example using reliable datagram messaging endpoints to illustrate basic usage.
Cisco's journey from Verbs to LibfabricJeff Squyres
This is one of two mini-talks that I gave at Euro MPI 2015 / Bordeaux.
It describes the journey Cisco undertook to evaluate two different Linux operating-system bypass APIs: Verbs and Libfabric. I detail the technical points we evaluated in both APIs, and ultimately show why we picked Libfabric over Verbs.
Slides presented by Jeff Squyres at the 2015 OpenFabrics Software Developers' Workshop. This talk discusses Cisco's experiences implementing an ultra-low latency Ethernet plugin / provider for the Linux Verbs API and for for the Libfabric API.
In this session, we’ll review how previous efforts, including Netfilter, Berkley Packet Filter (BPF), Open vSwitch (OVS), and TC, approached the problem of extensibility. We’ll show you an open source solution available within the Red Hat Enterprise Linux kernel, where extending and merging some of the existing concepts leads to an extensible framework that satisfies the networking needs of datacenter and cloud virtualization.
Cisco's journey from Verbs to LibfabricJeff Squyres
This is one of two mini-talks that I gave at Euro MPI 2015 / Bordeaux.
It describes the journey Cisco undertook to evaluate two different Linux operating-system bypass APIs: Verbs and Libfabric. I detail the technical points we evaluated in both APIs, and ultimately show why we picked Libfabric over Verbs.
Slides presented by Jeff Squyres at the 2015 OpenFabrics Software Developers' Workshop. This talk discusses Cisco's experiences implementing an ultra-low latency Ethernet plugin / provider for the Linux Verbs API and for for the Libfabric API.
In this session, we’ll review how previous efforts, including Netfilter, Berkley Packet Filter (BPF), Open vSwitch (OVS), and TC, approached the problem of extensibility. We’ll show you an open source solution available within the Red Hat Enterprise Linux kernel, where extending and merging some of the existing concepts leads to an extensible framework that satisfies the networking needs of datacenter and cloud virtualization.
Service Function Chaining in Openstack NeutronMichelle Holley
Service Function Chaining (SFC) uses software-defined networking (SDN) capabilities to create a service chain of connected network services (such as L4-7 like firewalls,
network address translation [NAT], intrusion protection) and connect them in a virtual chain. This capability can be used by network operators to set up suites or catalogs
of connected services that enable the use of a single network connection for many services, with different characteristics.
networking-sfc is a service plugin of Openstack neutron. The talk will go over the architecture, implementation, use-cases and latest enhancements to networking-sfc (the APIs and implementation to support service function chaining in neutron).
About the speaker: Farhad Sunavala is currently a principal architect/engineer working on Network Virtualization, Cloud service, and SDN technologies at Huawei Technology USA. He has led several wireless projects in Huawei including virtual EPC, service function chaining, etc. Prior to Huawei, he worked 17 years at Cisco. Farhad received his MS in Electrical and Computer Engineering from University of New Hampshire. His expertise includes L2/L3/L4 networking, Network Virtualization, SDN, Cloud Computing, and
mobile wireless networks. He holds several patents in platforms, virtualization, wireless, service-chaining and cloud computing. Farhad was a core member of networking-sfc.
What CloudStackers Need To Know About LINSTOR/DRBDShapeBlue
Philipp explains the best performing Open Source software-defined storage software available to Apache CloudStack today. It consists of two well-concerted components. LINSTOR and DRBD. Each of them also has its independent use cases, where it is deployed alone. In this presentation, the combination of these two is examined. They form the control plane and the data plane of the SDS. We will touch on: Performance, scalability, hyper-convergence (data-locality for high IO performance), resiliency through data replication (synchronous within a site, 2-way, 3-way, or more), snapshots, backup (to S3), encryption at rest, deduplication, compression, placement policies (regarding failure domains), management CLI and webGUI, monitoring interface, self-healing (restoring redundancy after device/node failure), the federation of multiple sites (async mirroring and repeatedly snapshot difference shipping), QoS control (noisy neighbors limitation) and of course: complete integration with CloudStack for KVM guests. It is Open Source software following the Unix philosophy. Each component solves one task, made for maximal re-usability. The solution leverages the Linux kernel, LVM and/or ZFS, and many Open Source software libraries. Building on these giant Open Source foundations, not only saves LINBIT from re-inventing the wheels, it also empowers your day 2 operation teams since they are already familiar with these technologies.
Philipp Reisner is one of the founders and CEO of LINBIT in Vienna/Austria. He holds a Dipl.-Ing. (comparable to MSc) degree in computer science from Technical University in Vienna. His professional career has been dominated by developing DRBD, a storage replication software for Linux. While in the early years (2001) this was writing kernel code, today he leads a company of 30 employees with locations in Austria and the USA. LINBIT is an Open Source company offering enterprise-level support subscriptions for its Open Source technologies.
-----------------------------------------
CloudStack Collaboration Conference 2022 took place on 14th-16th November in Sofia, Bulgaria and virtually. The day saw a hybrid get-together of the global CloudStack community hosting 370 attendees. The event hosted 43 sessions from leading CloudStack experts, users and skilful engineers from the open-source world, which included: technical talks, user stories, new features and integrations presentations and more.
An Introduction to eBPF (and cBPF). Topics covered include history, implementation, program types & maps. Also gives a brief introduction to XDP and DPDK
Backroll: Production Grade KVM Backup Solution Integrated in CloudStackShapeBlue
Backroll is not only a production-grade KVM backup solution. It is also being integrated inside Apache Cloudstack using the Backup and restore framework. Pierre and Quentin will show how it works, the feature list, and how the integration has been made.
Quentin is in charge of DIMSI custom developments on Apache Cloudstack deployment : customer portal, backup solution. On a daily basis, he helps our customers and our developers to use and embrace Devops methodology, by building CI/CD pipelines (GitLab, Azure Devops), dockerizing apps and automate things as much as possible... When not DevOps'ing, Quentin loves to binge watch series and movies, play with his cat "Boogie" and is a crazy fan of street food.
Grégoire is a software architect who spends most of his time designing infrastructure applications and CRM systems, on-premise or multi-cloud based. He’s been using Apache Cloudstack for many years, and likes to keep knowledge and data outside black-boxes Father of 4 children, you can meet him in Southern Brittany, sailing Hobbie Cat or supporting Lorient football club at Moustoir stadium.
Pierre is in charge of Backroll integration inside Cloudstack. Pierre has a proven track record of successful c# and Java projects. When not playing with his keyboard, Pierre is surfing, WingFoiling or bodyboarding on Brittany coast.
-----------------------------------------
CloudStack Collaboration Conference 2022 took place on 14th-16th November in Sofia, Bulgaria and virtually. The day saw a hybrid get-together of the global CloudStack community hosting 370 attendees. The event hosted 43 sessions from leading CloudStack experts, users and skilful engineers from the open-source world, which included: technical talks, user stories, new features and integrations presentations and more.
HKG18-411 - Introduction to OpenAMP which is an open source solution for hete...Linaro
Session ID: HKG18-411
Session Name: HKG18-411 - Introduction to OpenAMP which is an open source solution for heterogeneous system orchestration and communication
Speaker: Wendy Liang
Track: IoT, Embedded
★ Session Summary ★
Introduction to OpenAMP which is an open source solution for heterogeneous system orchestration and communication
---------------------------------------------------
★ Resources ★
Event Page: http://connect.linaro.org/resource/hkg18/hkg18-411/
Presentation: http://connect.linaro.org.s3.amazonaws.com/hkg18/presentations/hkg18-411.pdf
Video: http://connect.linaro.org.s3.amazonaws.com/hkg18/videos/hkg18-411.mp4
---------------------------------------------------
★ Event Details ★
Linaro Connect Hong Kong 2018 (HKG18)
19-23 March 2018
Regal Airport Hotel Hong Kong
---------------------------------------------------
Keyword: IoT, Embedded
'http://www.linaro.org'
'http://connect.linaro.org'
---------------------------------------------------
Follow us on Social Media
https://www.facebook.com/LinaroOrg
https://www.youtube.com/user/linaroorg?sub_confirmation=1
https://www.linkedin.com/company/1026961
Overview of kubernetes network functionsHungWei Chiu
In this slides, I briefly introduce the network function in the kubernetes and explain how kubernetes implement them.
Those function includes the container network interface (CNI) and kubernetes service.
In the last, I introduce the multus CNI which is designed for multiple networks in the container and it's necessary in some use case, such as SDN/NFV/5G
A Hitchhiker's Guide to Apache Kafka Geo-Replication with Sanjana Kaundinya ...HostedbyConfluent
Many organizations use Apache Kafka® to build data pipelines that span multiple geographically distributed data centers, for use cases ranging from high availability and disaster recovery, to data aggregation and regulatory compliance.
The journey from single-cluster deployments to multi-cluster deployments can be daunting, as you need to deal with networking configurations, security models and operational challenges. Geo-replication support for Kafka has come a long way, with both open-source and commercial solutions that support various replication topologies and disaster recovery strategies.
So, grab your towel, and join us on this journey as we look at tools, practices, and patterns that can help us build reliable, scalable, secure, global (if not inter-galactic) data pipelines that meet your business needs, and might even save the world from certain destruction.
Service Function Chaining in Openstack NeutronMichelle Holley
Service Function Chaining (SFC) uses software-defined networking (SDN) capabilities to create a service chain of connected network services (such as L4-7 like firewalls,
network address translation [NAT], intrusion protection) and connect them in a virtual chain. This capability can be used by network operators to set up suites or catalogs
of connected services that enable the use of a single network connection for many services, with different characteristics.
networking-sfc is a service plugin of Openstack neutron. The talk will go over the architecture, implementation, use-cases and latest enhancements to networking-sfc (the APIs and implementation to support service function chaining in neutron).
About the speaker: Farhad Sunavala is currently a principal architect/engineer working on Network Virtualization, Cloud service, and SDN technologies at Huawei Technology USA. He has led several wireless projects in Huawei including virtual EPC, service function chaining, etc. Prior to Huawei, he worked 17 years at Cisco. Farhad received his MS in Electrical and Computer Engineering from University of New Hampshire. His expertise includes L2/L3/L4 networking, Network Virtualization, SDN, Cloud Computing, and
mobile wireless networks. He holds several patents in platforms, virtualization, wireless, service-chaining and cloud computing. Farhad was a core member of networking-sfc.
What CloudStackers Need To Know About LINSTOR/DRBDShapeBlue
Philipp explains the best performing Open Source software-defined storage software available to Apache CloudStack today. It consists of two well-concerted components. LINSTOR and DRBD. Each of them also has its independent use cases, where it is deployed alone. In this presentation, the combination of these two is examined. They form the control plane and the data plane of the SDS. We will touch on: Performance, scalability, hyper-convergence (data-locality for high IO performance), resiliency through data replication (synchronous within a site, 2-way, 3-way, or more), snapshots, backup (to S3), encryption at rest, deduplication, compression, placement policies (regarding failure domains), management CLI and webGUI, monitoring interface, self-healing (restoring redundancy after device/node failure), the federation of multiple sites (async mirroring and repeatedly snapshot difference shipping), QoS control (noisy neighbors limitation) and of course: complete integration with CloudStack for KVM guests. It is Open Source software following the Unix philosophy. Each component solves one task, made for maximal re-usability. The solution leverages the Linux kernel, LVM and/or ZFS, and many Open Source software libraries. Building on these giant Open Source foundations, not only saves LINBIT from re-inventing the wheels, it also empowers your day 2 operation teams since they are already familiar with these technologies.
Philipp Reisner is one of the founders and CEO of LINBIT in Vienna/Austria. He holds a Dipl.-Ing. (comparable to MSc) degree in computer science from Technical University in Vienna. His professional career has been dominated by developing DRBD, a storage replication software for Linux. While in the early years (2001) this was writing kernel code, today he leads a company of 30 employees with locations in Austria and the USA. LINBIT is an Open Source company offering enterprise-level support subscriptions for its Open Source technologies.
-----------------------------------------
CloudStack Collaboration Conference 2022 took place on 14th-16th November in Sofia, Bulgaria and virtually. The day saw a hybrid get-together of the global CloudStack community hosting 370 attendees. The event hosted 43 sessions from leading CloudStack experts, users and skilful engineers from the open-source world, which included: technical talks, user stories, new features and integrations presentations and more.
An Introduction to eBPF (and cBPF). Topics covered include history, implementation, program types & maps. Also gives a brief introduction to XDP and DPDK
Backroll: Production Grade KVM Backup Solution Integrated in CloudStackShapeBlue
Backroll is not only a production-grade KVM backup solution. It is also being integrated inside Apache Cloudstack using the Backup and restore framework. Pierre and Quentin will show how it works, the feature list, and how the integration has been made.
Quentin is in charge of DIMSI custom developments on Apache Cloudstack deployment : customer portal, backup solution. On a daily basis, he helps our customers and our developers to use and embrace Devops methodology, by building CI/CD pipelines (GitLab, Azure Devops), dockerizing apps and automate things as much as possible... When not DevOps'ing, Quentin loves to binge watch series and movies, play with his cat "Boogie" and is a crazy fan of street food.
Grégoire is a software architect who spends most of his time designing infrastructure applications and CRM systems, on-premise or multi-cloud based. He’s been using Apache Cloudstack for many years, and likes to keep knowledge and data outside black-boxes Father of 4 children, you can meet him in Southern Brittany, sailing Hobbie Cat or supporting Lorient football club at Moustoir stadium.
Pierre is in charge of Backroll integration inside Cloudstack. Pierre has a proven track record of successful c# and Java projects. When not playing with his keyboard, Pierre is surfing, WingFoiling or bodyboarding on Brittany coast.
-----------------------------------------
CloudStack Collaboration Conference 2022 took place on 14th-16th November in Sofia, Bulgaria and virtually. The day saw a hybrid get-together of the global CloudStack community hosting 370 attendees. The event hosted 43 sessions from leading CloudStack experts, users and skilful engineers from the open-source world, which included: technical talks, user stories, new features and integrations presentations and more.
HKG18-411 - Introduction to OpenAMP which is an open source solution for hete...Linaro
Session ID: HKG18-411
Session Name: HKG18-411 - Introduction to OpenAMP which is an open source solution for heterogeneous system orchestration and communication
Speaker: Wendy Liang
Track: IoT, Embedded
★ Session Summary ★
Introduction to OpenAMP which is an open source solution for heterogeneous system orchestration and communication
---------------------------------------------------
★ Resources ★
Event Page: http://connect.linaro.org/resource/hkg18/hkg18-411/
Presentation: http://connect.linaro.org.s3.amazonaws.com/hkg18/presentations/hkg18-411.pdf
Video: http://connect.linaro.org.s3.amazonaws.com/hkg18/videos/hkg18-411.mp4
---------------------------------------------------
★ Event Details ★
Linaro Connect Hong Kong 2018 (HKG18)
19-23 March 2018
Regal Airport Hotel Hong Kong
---------------------------------------------------
Keyword: IoT, Embedded
'http://www.linaro.org'
'http://connect.linaro.org'
---------------------------------------------------
Follow us on Social Media
https://www.facebook.com/LinaroOrg
https://www.youtube.com/user/linaroorg?sub_confirmation=1
https://www.linkedin.com/company/1026961
Overview of kubernetes network functionsHungWei Chiu
In this slides, I briefly introduce the network function in the kubernetes and explain how kubernetes implement them.
Those function includes the container network interface (CNI) and kubernetes service.
In the last, I introduce the multus CNI which is designed for multiple networks in the container and it's necessary in some use case, such as SDN/NFV/5G
A Hitchhiker's Guide to Apache Kafka Geo-Replication with Sanjana Kaundinya ...HostedbyConfluent
Many organizations use Apache Kafka® to build data pipelines that span multiple geographically distributed data centers, for use cases ranging from high availability and disaster recovery, to data aggregation and regulatory compliance.
The journey from single-cluster deployments to multi-cluster deployments can be daunting, as you need to deal with networking configurations, security models and operational challenges. Geo-replication support for Kafka has come a long way, with both open-source and commercial solutions that support various replication topologies and disaster recovery strategies.
So, grab your towel, and join us on this journey as we look at tools, practices, and patterns that can help us build reliable, scalable, secure, global (if not inter-galactic) data pipelines that meet your business needs, and might even save the world from certain destruction.
10 years ago I presented for the first time at Visug on the topic of Visual Studio Team System, the first iteration of a product family that allowed us to automate the long road from requirement to software in production, and everything in between. At the time software development was mainly a manual process. The software itself was either monolithic or composed of large 'SOA Services' and it was a real challenge to get them into production every few months in a so called 'big bang’ deployment. Since then our profession has gone through some major changes, software development looks a lot different now. Today, many applications consist of small parts called 'microservices'. These microservices make their way into the cloud or datacenter automatically, through API driven Continuous Deployment systems, every time anyone on the team commits a small change. While deployments are now happening continuously, they do have an impact on the system: taking down part of it for maintenance all the time. But at the same time, our customers expect the overall system to stay up 24/7. In this talk I will introduce you to an upcoming technology, called Service Fabric, that can help you maintain your development agility in this new world, but still live up to your customer’s expectations.
A short presentation at a CSC internal workshop of the prospects of using container technologies, especially Docker, in the context of High Performance Computing (HPC).
Modern Computing: Cloud, Distributed, & High Performanceinside-BigData.com
In this video, Dr. Umit Catalyurek from Georgia Institute of Technology presents: Modern Computing: Cloud, Distributed, & High Performance.
Ümit V. Çatalyürek is a Professor in the School of Computational Science and Engineering in the College of Computing at the Georgia Institute of Technology. He received his Ph.D. in 2000 from Bilkent University. He is a recipient of an NSF CAREER award and is the primary investigator of several awards from the Department of Energy, the National Institute of Health, and the National Science Foundation. He currently serves as an Associate Editor for Parallel Computing, and as an editorial board member for IEEE Transactions on Parallel and Distributed Computing, and the Journal of Parallel and Distributed Computing.
Learn more: http://www.bigdatau.org/data-science-seminars
Watch the video presentation: http://wp.me/p3RLHQ-ghU
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck from the HPC User Forum in Tucson, Earl Joseph from IDC presents: 2016 IDC HPC Market Update.
Watch the video presentation: http://wp.me/p3RLHQ-fbt
Learn more: http://hpcuserforum.com
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Apache Big Data 2016: Next Gen Big Data Analytics with Apache ApexApache Apex
Apache Apex is a next gen big data analytics platform. Originally developed at DataTorrent it comes with a powerful stream processing engine, rich set of functional building blocks and an easy to use API for the developer to build real-time and batch applications. Apex runs natively on YARN and HDFS and is used in production in various industries. You will learn about the Apex architecture, including its unique features for scalability, fault tolerance and processing guarantees, programming model and use cases.
http://apachebigdata2016.sched.org/event/6M0L/next-gen-big-data-analytics-with-apache-apex-thomas-weise-datatorrent
Introduce the basic concept of Open vSwitch. In this slide, we talked about how Linux kernel and networking stack worked together to forward and process the network packet and also compare those Linux networking stack functionality with Open vSwitch and Openflow.
At the end of this slide, we talk about the challenge to integrate the Open vSwitch with Kubernetes, what kind of the networking function we need to resolve and what is the benefit we can get from the Open Vswitch.
Introduction to Apache Apex and writing a big data streaming application Apache Apex
Introduction to Apache Apex - The next generation native Hadoop platform, and writing a native Hadoop big data Apache Apex streaming application.
This talk will cover details about how Apex can be used as a powerful and versatile platform for big data. Apache apex is being used in production by customers for both streaming and batch use cases. Common usage of Apache Apex includes big data ingestion, streaming analytics, ETL, fast batch. alerts, real-time actions, threat detection, etc.
Presenter : <b>Pramod Immaneni</b> Apache Apex PPMC member and senior architect at DataTorrent Inc, where he works on Apex and specializes in big data applications. Prior to DataTorrent he was a co-founder and CTO of Leaf Networks LLC, eventually acquired by Netgear Inc, where he built products in core networking space and was granted patents in peer-to-peer VPNs. Before that he was a technical co-founder of a mobile startup where he was an architect of a dynamic content rendering engine for mobile devices.
This is a video of the webcast of an Apache Apex meetup event organized by Guru Virtues at 267 Boston Rd no. 9, North Billerica, MA, on <b>May 7th 2016</b> and broadcasted from San Jose, CA. If you are interested in helping organize i.e., hosting, presenting, community leadership Apache Apex community, please email apex-meetup@datatorrent.com
Summit 16: Applying Machine Learning to Intent-based Networking and Nfv Scali...OPNFV
The talk will highlight how Machine Learning techniques can be used to address different aspects of the operation and control of NFV and propose future OPNFV activities in this area. First, Diego will introduce how Machine Learning is being applied by the CogNet project to address intent-based networking, and discuss the architecture defined there as a potential framework for future ML integration. Glen will demonstrate a policy-based system for automating VNF scaling using performance data collection and analytics with machine learning (ML), based on OPNFV Brahmaputra and the underlying OpenStack telemetry system (Ceilometer), as well as the open-source Apache Kafka, Apache Zookeeper and Apache Spark streaming and MLlib libraries. Available as open-source, it combines predictive and reactive inputs to make the VNF scaling decision and trigger action in the MANO stack. The presentation will provide an overview of the system, demonstrate the VNF auto-scaling use case and discuss how this system will fit into a future OPNFV release.
Adding Real-time Features to PHP ApplicationsRonny López
It's possible to introduce real-time features to PHP applications without deep modifications of the current codebase.
Using WAMP you can build distributed systems out of application components which are loosely coupled and communicate in (soft) real-time.
There is no need to learn a whole new language, with the implications it has.
It also opens the door to write reactive, event-based, distributed architectures and to achieve easier scalability by distributing messages to multiple systems.
AWS re:Invent 2016: Advanced Tips for Amazon EC2 Networking and High Availabi...Amazon Web Services
In this session, we focus on designing for high availability, with evaluation criteria for using services and features such as Amazon Route 53, Elastic Load Balancing, Auto Scaling, route tables, network interfaces, device clustering, and the Transit VPC architecture. We also explore how to create highly available networking between regions as well as on-premises.
An introduction to SignalR
This deck was part of my presentation to Virtusa employees on an ASP.NET asynchronous, persistent signaling library known as SignalR
There is also a slide on how to use SignalR with SharePoint.
Date: August 2013
Follow / Tweet me: @ShehanPeruma
44CON 2014 - Binary Protocol Analysis with CANAPE, James Forshaw44CON
44CON 2014 - Binary Protocol Analysis with CANAPE, James Forshaw
CANAPE is an open source network proxy written in .NET. It has been developed to aid in the analysis and exploitation of unknown application network protocols using a similar use case to common HTTP proxies such as Burp or CAT.
This workshop will go through the basics of analysing an unknown application protocol with hands on training examples. By the end of the workshop candidates should be able to better understand CANAPE’s functionality and be able to apply that to other protocols they come across.
Check out the webinar slides to learn more about how XfilesPro transforms Salesforce document management by leveraging its world-class applications. For more details, please connect with sales@xfilespro.com
If you want to watch the on-demand webinar, please click here: https://www.xfilespro.com/webinars/salesforce-document-management-2-0-smarter-faster-better/
top nidhi software solution freedownloadvrstrong314
This presentation emphasizes the importance of data security and legal compliance for Nidhi companies in India. It highlights how online Nidhi software solutions, like Vector Nidhi Software, offer advanced features tailored to these needs. Key aspects include encryption, access controls, and audit trails to ensure data security. The software complies with regulatory guidelines from the MCA and RBI and adheres to Nidhi Rules, 2014. With customizable, user-friendly interfaces and real-time features, these Nidhi software solutions enhance efficiency, support growth, and provide exceptional member services. The presentation concludes with contact information for further inquiries.
Designing for Privacy in Amazon Web ServicesKrzysztofKkol1
Data privacy is one of the most critical issues that businesses face. This presentation shares insights on the principles and best practices for ensuring the resilience and security of your workload.
Drawing on a real-life project from the HR industry, the various challenges will be demonstrated: data protection, self-healing, business continuity, security, and transparency of data processing. This systematized approach allowed to create a secure AWS cloud infrastructure that not only met strict compliance rules but also exceeded the client's expectations.
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Globus
Large Language Models (LLMs) are currently the center of attention in the tech world, particularly for their potential to advance research. In this presentation, we'll explore a straightforward and effective method for quickly initiating inference runs on supercomputers using the vLLM tool with Globus Compute, specifically on the Polaris system at ALCF. We'll begin by briefly discussing the popularity and applications of LLMs in various fields. Following this, we will introduce the vLLM tool, and explain how it integrates with Globus Compute to efficiently manage LLM operations on Polaris. Attendees will learn the practical aspects of setting up and remotely triggering LLMs from local machines, focusing on ease of use and efficiency. This talk is ideal for researchers and practitioners looking to leverage the power of LLMs in their work, offering a clear guide to harnessing supercomputing resources for quick and effective LLM inference.
Your Digital Assistant.
Making complex approach simple. Straightforward process saves time. No more waiting to connect with people that matter to you. Safety first is not a cliché - Securely protect information in cloud storage to prevent any third party from accessing data.
Would you rather make your visitors feel burdened by making them wait? Or choose VizMan for a stress-free experience? VizMan is an automated visitor management system that works for any industries not limited to factories, societies, government institutes, and warehouses. A new age contactless way of logging information of visitors, employees, packages, and vehicles. VizMan is a digital logbook so it deters unnecessary use of paper or space since there is no requirement of bundles of registers that is left to collect dust in a corner of a room. Visitor’s essential details, helps in scheduling meetings for visitors and employees, and assists in supervising the attendance of the employees. With VizMan, visitors don’t need to wait for hours in long queues. VizMan handles visitors with the value they deserve because we know time is important to you.
Feasible Features
One Subscription, Four Modules – Admin, Employee, Receptionist, and Gatekeeper ensures confidentiality and prevents data from being manipulated
User Friendly – can be easily used on Android, iOS, and Web Interface
Multiple Accessibility – Log in through any device from any place at any time
One app for all industries – a Visitor Management System that works for any organisation.
Stress-free Sign-up
Visitor is registered and checked-in by the Receptionist
Host gets a notification, where they opt to Approve the meeting
Host notifies the Receptionist of the end of the meeting
Visitor is checked-out by the Receptionist
Host enters notes and remarks of the meeting
Customizable Components
Scheduling Meetings – Host can invite visitors for meetings and also approve, reject and reschedule meetings
Single/Bulk invites – Invitations can be sent individually to a visitor or collectively to many visitors
VIP Visitors – Additional security of data for VIP visitors to avoid misuse of information
Courier Management – Keeps a check on deliveries like commodities being delivered in and out of establishments
Alerts & Notifications – Get notified on SMS, email, and application
Parking Management – Manage availability of parking space
Individual log-in – Every user has their own log-in id
Visitor/Meeting Analytics – Evaluate notes and remarks of the meeting stored in the system
Visitor Management System is a secure and user friendly database manager that records, filters, tracks the visitors to your organization.
"Secure Your Premises with VizMan (VMS) – Get It Now"
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Shahin Sheidaei
Games are powerful teaching tools, fostering hands-on engagement and fun. But they require careful consideration to succeed. Join me to explore factors in running and selecting games, ensuring they serve as effective teaching tools. Learn to maintain focus on learning objectives while playing, and how to measure the ROI of gaming in education. Discover strategies for pitching gaming to leadership. This session offers insights, tips, and examples for coaches, team leads, and enterprise leaders seeking to teach from simple to complex concepts.
Modern design is crucial in today's digital environment, and this is especially true for SharePoint intranets. The design of these digital hubs is critical to user engagement and productivity enhancement. They are the cornerstone of internal collaboration and interaction within enterprises.
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...Anthony Dahanne
Les Buildpacks existent depuis plus de 10 ans ! D’abord, ils étaient utilisés pour détecter et construire une application avant de la déployer sur certains PaaS. Ensuite, nous avons pu créer des images Docker (OCI) avec leur dernière génération, les Cloud Native Buildpacks (CNCF en incubation). Sont-ils une bonne alternative au Dockerfile ? Que sont les buildpacks Paketo ? Quelles communautés les soutiennent et comment ?
Venez le découvrir lors de cette session ignite
Unleash Unlimited Potential with One-Time Purchase
BoxLang is more than just a language; it's a community. By choosing a Visionary License, you're not just investing in your success, you're actively contributing to the ongoing development and support of BoxLang.
A Comprehensive Look at Generative AI in Retail App Testing.pdfkalichargn70th171
Traditional software testing methods are being challenged in retail, where customer expectations and technological advancements continually shape the landscape. Enter generative AI—a transformative subset of artificial intelligence technologies poised to revolutionize software testing.
First Steps with Globus Compute Multi-User EndpointsGlobus
In this presentation we will share our experiences around getting started with the Globus Compute multi-user endpoint. Working with the Pharmacology group at the University of Auckland, we have previously written an application using Globus Compute that can offload computationally expensive steps in the researcher's workflows, which they wish to manage from their familiar Windows environments, onto the NeSI (New Zealand eScience Infrastructure) cluster. Some of the challenges we have encountered were that each researcher had to set up and manage their own single-user globus compute endpoint and that the workloads had varying resource requirements (CPUs, memory and wall time) between different runs. We hope that the multi-user endpoint will help to address these challenges and share an update on our progress here.
Enhancing Research Orchestration Capabilities at ORNL.pdfGlobus
Cross-facility research orchestration comes with ever-changing constraints regarding the availability and suitability of various compute and data resources. In short, a flexible data and processing fabric is needed to enable the dynamic redirection of data and compute tasks throughout the lifecycle of an experiment. In this talk, we illustrate how we easily leveraged Globus services to instrument the ACE research testbed at the Oak Ridge Leadership Computing Facility with flexible data and task orchestration capabilities.
Quarkus Hidden and Forbidden ExtensionsMax Andersen
Quarkus has a vast extension ecosystem and is known for its subsonic and subatomic feature set. Some of these features are not as well known, and some extensions are less talked about, but that does not make them less interesting - quite the opposite.
Come join this talk to see some tips and tricks for using Quarkus and some of the lesser known features, extensions and development techniques.
Advanced Flow Concepts Every Developer Should KnowPeter Caitens
Tim Combridge from Sensible Giraffe and Salesforce Ben presents some important tips that all developers should know when dealing with Flows in Salesforce.
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Globus
The Earth System Grid Federation (ESGF) is a global network of data servers that archives and distributes the planet’s largest collection of Earth system model output for thousands of climate and environmental scientists worldwide. Many of these petabyte-scale data archives are located in proximity to large high-performance computing (HPC) or cloud computing resources, but the primary workflow for data users consists of transferring data, and applying computations on a different system. As a part of the ESGF 2.0 US project (funded by the United States Department of Energy Office of Science), we developed pre-defined data workflows, which can be run on-demand, capable of applying many data reduction and data analysis to the large ESGF data archives, transferring only the resultant analysis (ex. visualizations, smaller data files). In this talk, we will showcase a few of these workflows, highlighting how Globus Flows can be used for petabyte-scale climate analysis.
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns
Unlocking Business Potential: Tailored Technology Solutions by Prosigns
Discover how Prosigns, a leading technology solutions provider, partners with businesses to drive innovation and success. Our presentation showcases our comprehensive range of services, including custom software development, web and mobile app development, AI & ML solutions, blockchain integration, DevOps services, and Microsoft Dynamics 365 support.
Custom Software Development: Prosigns specializes in creating bespoke software solutions that cater to your unique business needs. Our team of experts works closely with you to understand your requirements and deliver tailor-made software that enhances efficiency and drives growth.
Web and Mobile App Development: From responsive websites to intuitive mobile applications, Prosigns develops cutting-edge solutions that engage users and deliver seamless experiences across devices.
AI & ML Solutions: Harnessing the power of Artificial Intelligence and Machine Learning, Prosigns provides smart solutions that automate processes, provide valuable insights, and drive informed decision-making.
Blockchain Integration: Prosigns offers comprehensive blockchain solutions, including development, integration, and consulting services, enabling businesses to leverage blockchain technology for enhanced security, transparency, and efficiency.
DevOps Services: Prosigns' DevOps services streamline development and operations processes, ensuring faster and more reliable software delivery through automation and continuous integration.
Microsoft Dynamics 365 Support: Prosigns provides comprehensive support and maintenance services for Microsoft Dynamics 365, ensuring your system is always up-to-date, secure, and running smoothly.
Learn how our collaborative approach and dedication to excellence help businesses achieve their goals and stay ahead in today's digital landscape. From concept to deployment, Prosigns is your trusted partner for transforming ideas into reality and unlocking the full potential of your business.
Join us on a journey of innovation and growth. Let's partner for success with Prosigns.
3. ● This tutorial covers libfabric as of the v1.1.1 release
● Future versions might look a little different, but the v1.1
interface should remain available for a long time
● Man pages, source, presentations all available at:
○ http://ofiwg.github.io/libfabric/
○ https://github.com/ofiwg/libfabric
● Code on slides deliberately omits error checking for
clarity
Overview
4. Developer Note
• libfabric supports a sockets provider
• Allows it to run on most Linux systems
• Includes virtual Linux environments
• Also runs on OS X
• brew install libfabric
5. Acknowlegements
Contributors to this tutorial not present today:
• Bob Russell, University of New Hampshire
• Sayantan Sur, Intel
• Jeff Squyres, Cisco
10. Control Services
• Discover information about types of fabric
services available
• Identify most effective ways of utilizing a
provider
• Request specific features
• Convey usage model to providers
11. Communication Services
• Setup communication between processes
• Support connection-oriented and
connectionless communication
• Connection management targets ease of use
• Address vectors target high scalability
12. Completion Services
• Asynchronous completion support
• Event queues for detailed status
• Configurable level of data reported
• Separation between control versus data operations
• Low-impact counters for fast notification
13. Data Transfer Services
Supports different communication paradigms:
• Message Queues - send/receive FIFOs
• Tag Matching - steered message transfers
• RMA - direct memory transfers
• Atomics - direct memory manipulation
16. Fabric Object
• Represent a single physical
or virtual network
• Shares network addresses
• May span multiple providers
• Future: topology information
Passive
endpoints
Fabric Event queues
Wait sets
Completions
queues
Completion
counters
Poll sets
Active
endpoints
Domain
Address
vector
Memory
regions
17. Domain Object
• Logical connection into a
fabric
• Physical or virtual NIC
• Boundary for associating
fabric resources
Passive
endpoints
Fabric Event queues
Wait sets
Completions
queues
Completion
counters
Poll sets
Active
endpoints
Domain
Address
vector
Memory
regions
18. Passive Endpoint
• Used by connection-
oriented protocols
• Listens for connection
requests
• Often map to software
constructs
• Can span multiple domains
Passive
endpoints
Fabric Event queues
Wait sets
Completions
queues
Completion
counters
Poll sets
Active
endpoints
Domain
Address
vector
Memory
regions
19. Event Queue
• Report completion of asyn-
chronous control operations
• Report error and other
notifications
• Explicitly or implicitly subscribe
for events
• Often mix of HW and SW
support
• Usage designed for ease of
use
Passive
endpoints
Fabric Event queues
Wait sets
Completions
queues
Completion
counters
Poll sets
Active
endpoints
Domain
Address
vector
Memory
regions
20. Wait Set
• Optimized method of waiting
for events across multiple
event queues, completion
queues, and counters
• Abstraction of wait object(s)
• Enables platform specific,
high-performance wait objects
• Uses single wait object when
possible
Passive
endpoints
Fabric Event queues
Wait sets
Completions
queues
Completion
counters
Poll sets
Active
endpoints
Domain
Address
vector
Memory
regions
21. Active Endpoint
• Data transfer communication
portal
• Identified by fabric address
• Often associated with a
single NIC
• Hardware Tx/Rx command
queues
• Support onload, offload, and
partial offload
implementations
Passive
endpoints
Fabric Event queues
Wait sets
Completions
queues
Completion
counters
Poll sets
Active
endpoints
Domain
Address
vector
Memory
regions
23. Completion Queue
• Higher performance queues
for data transfer completions
• Associated with a single
domain
• Often mapped to hardware
resources
• Optimized to report
successful completions
• User-selectable completion format
• Detailed error completions
reported ‘out of band’
Passive
endpoints
Fabric Event queues
Wait sets
Completions
queues
Completion
counters
Poll sets
Active
endpoints
Domain
Address
vector
Memory
regions
24. Remote CQ Data
• Application data written directly into remote
completion queue
• InfiniBand immediate data
• Support for up to 8 bytes
• Minimum of 4 bytes, if supported
25. Counters
• Lightweight completion
mechanism for data
transfers
• Report only number of
successful/error
completions
Passive
endpoints
Fabric Event queues
Wait sets
Completions
queues
Completion
counters
Poll sets
Active
endpoints
Domain
Address
vector
Memory
regions
26. Poll Set
• Designed for providers that
use the host processor to
progress data transfers
• Allows provider to use
application thread
• Allows driving progress
across all objects assigned
to a poll set
• Can optimize where progress
occurs
Passive
endpoints
Fabric Event queues
Wait sets
Completions
queues
Completion
counters
Poll sets
Active
endpoints
Domain
Address
vector
Memory
regions
27. Memory Region
• Local memory buffers
exposed to fabric services
• Permissions control access
• Focused on desired
application usage, with
support for existing
hardware
• Registration of locally used
buffers
Passive
endpoints
Fabric Event queues
Wait sets
Completions
queues
Completion
counters
Poll sets
Active
endpoints
Domain
Address
vector
Memory
regions
28. Memory Registration Modes
• FI_MR_BASIC
• MR attributes selected by provider
• Buffers identified by virtual address
• Application must exchange MR parameters
• FI_MR_SCALABLE
• MR attributes selected by application
• Buffers accessed starting at address 0
• Eliminates need to exchange MR parameters
29. Address Vector
• Store peer addresses for
connectionless endpoints
• Map higher level addresses
to fabric specific addresses
• Designed for high scalability
• Enable minimal memory footprint
• Optimized address resolution
• Supports shared memory
Passive
endpoints
Fabric Event queues
Wait sets
Completions
queues
Completion
counters
Poll sets
Active
endpoints
Domain
Address
vector
Memory
regions
30. Address Vector Types
• FI_AV_MAP
• Peers identified using a 64-bit fi_addr_t
• Provider can encode fabric address directly
• Enables direct mapping to hardware commands
• FI_AV_TABLE
• Peers identified using an index
• Minimal application memory footprint (0!)
• May require lookup on each data transfer
36. Shared Contexts Endpoints may share
underlying command queues
Enables resource manager to select
where resource sharing occurs
CQs may be
dedicated or shared
37. Scalable Endpoints
Single addressable endpoint
with multiple command queues
Each context may
be associated with
its own CQ
Targets lockless,
multi-threaded usage
40. Control Interface - getinfo
int fi_getinfo(uint32_t version,
const char *node,
const char *service,
uint64_t flags,
struct fi_info *hints,
struct fi_info **info);
Explicit versioning for
forward compatibility
Modeled after getaddrinfo /
rdma_getaddrinfo
Hints used to filter output
Returns list of fabric info structures
41. Control Interface - fi_info
struct fi_info {
struct fi_info *next;
uint64_t caps;
uint64_t mode;
...
};
Primary structure used to query
and configure interfaces
caps and mode flags provide
simple mechanism to request
basic fabric services
42. Control Interface - fi_info
struct fi_info {
...
uint32_t addr_format;
size_t src_addrlen;
size_t dest_addrlen;
void *src_addr;
void *dest_addr;
...
};
Enables apps to be agnostic of
fabric specific addressing
(FI_FORMAT_UNSPEC) ..
But also supports apps
wanting to request a specific
source or destination address
Apps indicate their address
format for all APIs up front
43. Control Interface - fi_info
struct fi_info {
...
fid_t handle;
struct fi_tx_attr *tx_attr;
struct fi_rx_attr *rx_attr;
struct fi_ep_attr *ep_attr;
struct fi_domain_attr *domain_attr;
struct fi_fabric_attr *fabric_attr;
};
Links to attributes related to
the data transfer services that
are being requested
Apps can use default values
or request minimal attributes
45. Capability Bits
• Desired features and services requested by
application
• Primary - application must request to enable
• Secondary - application may request
• Provider may enable if not requested
Providers enable capabilities if requested, or
if it will not impact performance or security
Basic set of features
required by application
46. Capabilities
• FI_MSG
• FI_RMA
• FI_TAGGED
• FI_ATOMIC
Specify desired data transfer
services (interfaces) to
enable
• FI_READ
• FI_WRITE
• FI_SEND
• FI_RECV
• FI_REMOTE_READ
• FI_REMOTE_WRITE
Overrides default capabilities;
used to limit functionality
47. Capabilities
• FI_SOURCE
• Source address returned with completion data
• Enabling may impact performance
• FI_DIRECTED_RECV
• Use source address of an incoming message to select receive buffer
• FI_MULTI_RECV
• Support for single buffer receiving multiple incoming messages
• Enables more efficient use of receive buffers
• FI_NAMED_RX_CTX
• Used with scalable endpoints
• Allows initiator to direct transfers to a desired receive context
Receive oriented capabilities
48. Capabilities
• FI_RMA_EVENT
• Supports generating completion events when endpoint is the target of
an RMA operation
• Enabling can avoid sending separate message after RMA completes
• FI_TRIGGER
• Supports triggered operations
• Triggered operations are specialized use cases of existing data
transfer routines
• FI_FENCE
• Supports fencing operations to a given remote endpoint
49. Mode Bits
• Requirements placed on the application
• Application indicates which modes it supports
• Requests that an application implement a
feature
• Application may see improved performance
• Cost of implementation by application is less than
provider based implementation
• Often related to hardware limitations
Provider requests to the
application
50. Mode Bits
• FI_CONTEXT
• Application provides ‘scratch’ space for providers as part of all data
transfers
• Avoids providers needing to allocate internal structures to track
requests
• Targets providers that have a significant software component
• FI_LOCAL_MR
• Provider requires that locally accessed data buffers be registered with
the provider before being used
• Supports existing iWarp and InfiniBand hardware
51. Mode Bits
• FI_MSG_PREFIX
• Application provides buffer space before their data buffers
• Typically used by provider to implement protocol headers
• FI_ASYNC_IOV
• Indicates that IOVs must remain valid until an operation completes
• Avoids providers needing to buffer IOVs
• FI_RX_CQ_DATA
• Indicates that transfers which carry remote CQ data consume receive
buffer space
• Supports existing InfiniBand hardware
53. Attributes
• Providers encode default sizes for allocated
resources
• Administrator may override defaults
• Attributes reflect configured or optimal sizes
• Not necessarily maximums
• Intent is to guide resource managers to allocate
resources efficiently
54. Fabric Attributes
struct fi_fabric_attr {
struct fid_fabric *fabric;
char *name;
char *prov_name;
uint32_t prov_version;
};
Return a reference to
an already opened
instance, if it exists
Framework will search for
and select most recent
provider version available
55. Domain Attributes
struct fi_domain_attr {
struct fid_domain *domain;
char *name;
enum fi_threading threading;
enum fi_progress control_progress;
enum fi_progress data_progress;
enum fi_resource_mgmt resource_mgmt;
...
};
Specifies how resources (EPs,
CQs, etc.) may be assigned to
threads without needing locking
Indicates if application
threads are used to
progress operationsIndicates if provider will protect
against queue overruns
56. Domain Attributes
struct fi_domain_attr {
...
enum fi_av_type av_type;
enum fi_mr_mode mr_mode;
size_t mr_key_size;
size_t cq_data_size;
size_t cq_cnt;
...
};
Map or indexed
address vector type
Basic or scalable
memory registration
Range of MR key
Size of supported
remote CQ data
Optimal number of CQs
supported by domain
58. Endpoint Attributes
struct fi_ep_attr {
enum fi_ep_type type;
uint32_t protocol;
uint32_t protocol_version;
size_t max_msg_size;
size_t msg_prefix_size;
...
}; If FI_PREFIX
mode set
Maximum transfer
To ensure
interoperability
59. Endpoint Attributes
struct fi_ep_attr {
...
size_t max_order_raw_size;
size_t max_order_war_size;
size_t max_order_waw_size;
uint64_t mem_tag_format;
size_t tx_ctx_cnt;
size_t rx_ctx_cnt;
};
Delivery order of
transport data
Tag matching support
Rx/Tx contexts
61. Tx/Rx Attributes
• op_flags
• Default flags to control operation
• Apply to all operations where flags are not provided
directly or are assumed by the call itself
• size
• Minimum number of operations that may be posted
to a context
• Assumes each operation consumes the maximum
amount of resources
62. Tx/Rx Attributes
• inject_size
• Injected buffers may be re-used immediately on
return from a function call
• Related to FI_INJECT flag and fi_inject() call
• Maximum size of an injected buffer
• total_buffered_recv
• Total available space allocated by provider to buffer
messages for which there is no matching receive
• Handles unexpected messages
63. msg_order
• Order in which transport headers are processed
• [READ | WRITE | SEND] after [R | W | S]
• Determines how receive buffers are associated
with transfers
• Necessary, but insufficient, for data ordering
64. comp_order
• Order in which completed requests are written
to a completion object
• FI_ORDER_NONE - no ordering defined
• FI_ORDER_STRICT - ordered by processing
• FI_ORDER_DATA - bytes are also written in order
• Order depends on communication type
• Unreliable - all operations ordered
• Reliable - ordered per remote endpoint
66. RDM Pingpong
1. Open an endpoint
2. Direct completions to selected queues
3. Setup an address vector
4. Send and receive messages
Client-server test using
reliable unconnected
endpoints
67. RDM Pingpong
struct fi_info *fi, *hints;
struct test_options opts;
int main(int argc, char **argv)
{
init_test_options(&opts, argc, argv);
Use command line to pass in
source/destination address
Assumes no failures!
68. RDM Pingpong
hints = fi_allocinfo();
hints->ep_attr->type = FI_EP_RDM;
hints->caps = FI_MSG;
if (opts.dest_addr) {
/* “client” */
fi_getinfo(FI_VERSION(1,1), opts.dest_addr,
opts.dest_port, 0, hints, &fi);
Send/receive messages
over RDM endpoint
Explicitly define version
supported by app
Never use FI_MAJOR_VERSION
or FI_MINOR_VERSION defines!
69. RDM Pingpong
} else {
/* “server” */
fi_getinfo(FI_VERSION(1,1), opts.src_addr,
opts.src_port, FI_SOURCE, hints, &fi);
}
Node and service parameters
are local addresses
We assume both sides get the same
fabric name and endpoint protocol
We could use the hints
to force this
70. RDM Pingpong
fi_fabric(fi->fabric_attr, &fabric, NULL);
fi_domain(fabric, fi, &domain, NULL);
fi_getinfo returns optimal
options first
Open fabric and domain
using returned defaults
In the absence of any hints,
provider returns attributes most
suited to their implementation
71. RDM Pingpong
struct fi_cq_attr cq_attr = {};
cq_attr.wait_obj = FI_WAIT_NONE;
cq_attr.format = FI_CQ_FORMAT_CONTEXT;
cq_attr.size = fi->tx_attr->size;
fi_cq_open(domain, &cq_attr, &tx_cq, NULL);
Create completion queue
for transmit context
We will never block
waiting for a completion
Only provide request
context for each completion
Size the CQ the same as
the transmit context
72. RDM Pingpong
cq_attr.size = fi->rx_attr->size;
fi_cq_open(domain, &cq_attr, &rx_cq, NULL);
Create completion queue
for receive context
Only adjust the size
Since this is a pingpong test, we really
only need CQ sizes of 1
73. RDM Pingpong
struct fi_av_attr av_attr = {};
av_attr.type = fi->domain_attr->av_type;
av_attr.count = 1;
fi_av_open(domain, fi, &av_attr, &av, NULL);
Create address vector
Use AV type optimal
for provider
By default, addresses inserted into the AV will be
resolved synchronously. We can obtain asynchronous
operation by binding the AV with an event queue
75. RDM Pingpong
fi_ep_bind(ep, av, 0);
fi_ep_bind(ep, tx_cq, FI_TRANSMIT);
fi_ep_bind(ep, rx_cq, FI_RECV);
fi_enable(ep);
Associate the endpoint
with the other resources
And enable it for
data transfers
76. RDM Pingpong
fi_recv(ep, rx_buf, MAX_CTRL_MSG_SIZE, 0, 0, NULL);
Client is given server address
through command line
Client will send its address to
the server as its first message
Server will ack when it is ready
77. RDM Pingpong
int wait_for_comp(struct fid_cq *cq)
{
struct fi_cq_entry entry;
int ret;
while (1) {
ret = fi_cq_read(cq, &entry, 1);
if (ret > 0)
return 0;
Define function to
retrieve a completion
CQ entry based on configured format
(i.e. FI_CQ_FORMAT_CONTEXT)
Return success if we
have a completion
78. RDM Pingpong
if (ret != -FI_EAGAIN) {
struct fi_cq_err_entry err_entry;
fi_cq_readerr(cq, &err_entry, 0);
printf(“%s %sn”, fi_strerror(err_entry.err),
fi_cq_strerror(cq, err_entry.prov_errno,
err_entry.err_data,
NULL, 0);
return ret;
}
}
}
The operation failed
Print some error information
79. RDM Pingpong
size_t addrlen = MAX_CTRL_MSG_SIZE;
if (opts.dest_addr) {
fi_av_insert(av, fi->dest_addr, 1, &remote_addr,
0, NULL);
fi_getname(&ep->fid, tx_buf, &addrlen);
Get client address to send to server
Synchronously insert server
address into client’s AV
80. RDM Pingpong
fi_send(ep, tx_buf, addrlen, NULL, remote_addr,
NULL);
wait_for_comp(rx_cq);
fi_recv(ep, rx_buf, opts.size, 0, 0, NULL);
wait_for_comp(tx_cq);
} else {
Wait for server to
ack that it’s ready
81. RDM Pingpong
wait_for_comp(rx_cq);
fi_av_insert(av, rx_buf, 1, &remote_addr,
0, NULL);
fi_recv(ep, rx_buf, opts.size, 0, 0, NULL);
fi_send(ep, tx_buf, 1, NULL, remote_addr, NULL);
wait_for_comp(tx_cq);
}
Server waits for
message from client
Insert client address
Ack that we’re ready
82. RDM Pingpong
for (i = 0; i < opts.iterations; i++) {
if (opts.dest_addr) {
fi_send(ep, tx_buf, opts.size, NULL,
remote_addr, NULL);
wait_for_comp(tx_cq);
wait_for_comp(rx_cq);
fi_recv(ep, rx_buf, opts.size, 0, 0, NULL);
} else {
Exchange messages
85. MPI: Choosing an OFI mapping
Implements
Counters CQ
MPI progress
Implements
Domain
MPI Initialization
Endpoint
One size does not fit all!
● OFI is a set of building blocks
● Customize for your hardware
This tutorial:
● Presents a possible mapping
of MPI to OFI.
● Designed for providers with
a semantic match to MPI
Examples of this mapping:
● OpenMPI MTL
● MPICH Netmod
Implements
Tagged
MPI Two-sided
Implements
RMA
MPI-3 RMA
MSG
86. MPI_Init
• Initializes processes for communication
• Sets up OFI data structures
• exchanges information necessary to
communicate
• Establishes an MPI communicator
• Processes are addressable by MPI rank
• Initial “world” communicator is all processes in a job
87. MPI_Init
int MPI_init(/* … */){
/* Initialize MPI and OFI */
/* Use process manager for job state (job size, rank, etc) */
/* fi_getinfo(): Query and request OFI capabilities */
/* Map OFI capabilities to MPI API set. */
/* Open OFI objects (fabric, domain, endpoint, counter, cq, av)*/
/* Exchange addresses via process manager */
/* Bind OFI objects */
}
OFI initialization is designed so
critical communication code path
will be lightweight.
There are multiple ways to map
MPI semantics to OFI semantics
89. MPI_Init: Map MPI to OFI
int MPI_init() {
…
hints = fi_allocinfo();
assert(hints != NULL);
hints->mode = FI_CONTEXT;
hints->caps = FI_TAGGED; /* Implements MPI tagged (2-sided) p2p */
hints->caps |= FI_MSG; /* Implements control messages */
hints->caps |= FI_MULTI_RECV; /* Ring buffer for control */
hints->caps |= FI_RMA; /* Implements MPI-3 RMA */
hints->caps |= FI_ATOMIC; /* Implements MPI-3 RMA atomics */
…
}
Allocate and clear an
info struct
Request Capabilities
MPI provides context
via MPI Requests
90. MPI_Init: OFI hints
int MPI_init() {
/* MPI handles locking, OFI should not lock */
hints->domain_attr->threading = FI_THREAD_ENDPOINT ;
/* OFI handles progress: Note that this choice may be provider dependent */
hints->domain_attr->control_progress = FI_PROGRESS_AUTO ;
hints->domain_attr->data_progress = FI_PROGRESS_AUTO ;
/* OFI handles flow control*/
hints->domain_attr->resource_mgmt = FI_RM_ENABLED;
/* MPI does not want to exchange memory regions */
hints->domain_attr->mr_mode = FI_MR_SCALABLE ;
/* Completions indicate data is in memory at the target */
hints->tx_attr->op_flags = FI_DELIVERY_COMPLETE |FI_COMPLETION;
}
91. MPI_Init: Query Provider
int MPI_init() {
…
ret = fi_getinfo(fi_version, NULL, NULL,
0ULL, hints, &prov));
prov_use = choose_prov_from_list(prov);
max_buffered_send = prov_use->tx_attr->inject_size;
max_buffered_write = prov_use->tx_attr->inject_size;
max_send = prov_use->ep_attr->max_msg_size;
max_write = prov_use->ep_attr->max_msg_size;
…
}
Query OFI limits
OFI returns a list of
suitable providers
based on hints
92. MPI_Init: Single Basic endpoint
int MPI_init(/* … */)
{ …
/* Create the endpoint */
struct fid_endpoint *ep;
fi_endpoint(domain, prov_use, &ep, NULL);
gbl.ep = ep;
/* Bind the MR, CQs, counters, and AV to the endpoint object */
/* In this MPI model, we have 1 endpoint, 1 counter, and 1 completion queue */
fi_ep_bind(ep, (fid_t)gbl.p2p_cq, FI_SEND|FI_RECV|FI_SELECTIVE_COMPLETION );
fi_ep_bind(ep, (fid_t)gbl.rma_ctr, FI_READ | FI_WRITE);
fi_ep_bind(ep, (fid_t)gbl.av, 0ULL);
fi_enable(ep);
fi_ep_bind(ep, (fid_t)mr, FI_REMOTE_READ | FI_REMOTE_WRITE );
…
}
93. MPI_Init: Address Exchange
Get publishable
name
Endpoint name is a
serialized name that can be
exchanged
int MPI_init(/* … */)
{ …
/* Get our endpoint name and publish */
/* the socket to the KVS */
addrnamelen = FI_NAME_MAX;
fi_getname((fid_t)gbl.ep, addrname,&addrnamelen);
allgather_addresses(addrname, &all_addrnames);
/* Names are exchanged: Create an address vector and */
/* optionally add a table of mapped addresses */
fi_av_open(gbl.domain, av_attr, &gbl.av, NULL);
fi_av_insert(gbl.av, all_addrnames, job_size, mapped_table, 0ULL, NULL);
…
}
94. MPI Communicators
• MPI communicators remap a per-communicator MPI rank to a global
canonical process (often referenced by process rank in
MPI_COMM_WORLD)
• An OFI Address vector is a logical container for a list of network addresses
• The MPI implementation must map logical per-communicator ranks to a
network address to communicate
• There are several ways that communicators can be mapped
95. How should MPI use address vectors?
Address Table
Communicator A Communicator B
int MPI_Send(comm, rank){
fi_addr_t addr =
addr_table[comm->table[rank]];
fi_tsend(gbl.ep, …, addr, …);
}
Communicator A Communicator B
Method 1: AV_MAP Method 2: AV_TABLE
int MPI_Send(comm, rank){
int addr = comm->table[rank];
fi_tsend(gbl.ep, …, addr, …);
}
fi_addr_t
integer
96. How should MPI use address vectors?
Communicator A Communicator B
Method 3: AV_MAP
int MPI_Send(comm, rank){
fi_addr_t addr =
comm->table[rank];
fi_send(gbl.ep, …, addr, …);
}
fi_addr_t
integer
Communicator B
Method 4: AV_TABLE (per
comm)
int MPI_Send(comm, rank){
int addr = rank;
fi_send(comm->ep, …, addr, …);
}
Communicator A
O(1) communicator storage is possible
● Requires a new AV bound to an endpoint per
communicator.
97. MPI Communication
• Endpoints have been created and bound to resources
• Addresses have been exchanged
• Data can be sent/received
• Send operations
• MPI_Send: blocking send of a buffer to a rank in a communicator
• MPI_Isend: non-blocking send of a buffer to a rank in a
communicator
98. MPI_Send
int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
int rank, int tag, MPI_Comm comm) {
…
/* We can implement a lightweight send if certain conditions are met */
/* Lightweight send maps to fi_tinject */
if (datatype_is_ contiguous && (data_sz <= max_buffered_send ))
mpi_errno = send_lightweight (buf, data_sz, rank, tag, comm);
else
mpi_errno = send_normal(buf, count, datatype, rank, tag, comm)
…
}
99. int send_lightweight (const void *buf, size_t data_sz,
int rank, int tag, Comm *comm)
{
int mpi_errno = MPI_SUCCESS;
uint64_t match_bits;
ssize_t ret;
/* Convert MPI rank to address, initialize the tag, inject! */
/* Tagged inject is buffered, no need to wait for completion*/
match_bits = init_sendtag(comm->comm_id, comm->rank, tag);
ret = fi_tinject(ep, buf, data_sz,
RANK_TO_FIADDR (comm, rank),
match_bits);
if(ret != 0) mpi_errno = handle_mpi_error(ret);
return mpi_errno;
}
MPI_Send: send_lightweight
100. MPI Send: Matching
• MPI enforces message order based on {rank, tag, communicator}.
• OFI uses a 64-bit integer for matching.
• The match bits must pack this information
/* match/ignore bit manipulation
*
* 0123 4567 01234567 0123 4567 01234567 0123 4567 01234567 01234567 01234567
* | | |
* ^ | context id | source | message tag
* | | (communicator) | |
* +---- protocol
*/
101. MPI Send: Matching Alternative
• Use fi_tsenddata/fi_tinjectdata to send immediate data
• Use FI_DIRECTED_RECEIVE to accept messages from a specific
destination address
• Send source rank in immediate data
/* match/ignore bit manipulation
*
* 0123 4567 01234567 01234567 01234567 01234567 01234567 01234567 01234567
* | |
* ^ | context id | message tag
* | | (communicator) |
* +---- protocol
*/
102. MPI_Send: send_normal
int send_normal(const void *buf, size_t data_sz,
int rank, int tag, Comm *comm, Request **req ) { …
/* Create a request, handle datatype processing, send */
/* The MPI request object contains the OFI state via */
/* The fi_context field in the request object */
*req = create_and_setup_mpi_request();
… /* Other MPI processing. Example, datatypes */
match_bits = init_sendtag(comm->comm_id, comm->rank, tag);
ret = fi_tsend(ep, dt_buf, dt_sz,
RANK_TO_FIADDR (comm, rank),
match_bits, &((*r)->ofi_context) );
if(ret != 0) mpi_errno = handle_mpi_error(ret);
/* Block until send is complete */
while((*req)->state != DONE)
PROGRESS();
return mpi_errno;
}
104. MPI_Init: Scalable endpoints
int MPI_init(/* … */)
{ …
/* Create the transmit context using scalable endpoints */
struct fi_tx_attr tx_attr;
fi_scalable_ep(gbl.domain, prov_use, ep, NULL);
/* For Tagged MPI Point to Point */
tx_attr.caps = FI_TAGGED;
fi_tx_context(gbl.ep, index, &tx_attr, &g_txc(index), NULL);
fi_ep_bind(g_txc_tag(index), (fid_t)p2p_cq, FI_SEND);
/* For request based MPI RMA */
tx_attr.caps = FI_RMA|FI_ATOMIC;
fi_tx_context(gbl.ep, index+1, &tx_attr, &g_txc_rma(index), NULL);
fi_ep_bind(g_txc_rma(index), (fid_t) p2p_cq, FI_SEND);
/* For non-request based MPI RMA */
tx_attr.caps = FI_RMA|FI_ATOMIC;
fi_tx_context(gbl.ep, index+3, &tx_attr, &g_txc_cntr(index), NULL);
fi_ep_bind(g_txc_cntr(index), (fid_t) rma_ctr, FI_WRITE | FI_READ);
}
Tagged transmit
context: shared
completion queue
RMA transmit
context: shared
completion queue
RMA transmit
context:
completion counter
105. Advanced MPI_Send: Scalable EP
ret = fi_tsend(g_txc(src_tx_ctx_index), dt_buf, dt_sz,
RANK_TO_FIADDR_SEP(comm, rank, dest_rx_ctx_index),
match_bits, &(*r)->ofi_context );
Endpoint:
● Use a transmit context
instead of endpoint.
● transmit context is an
endpoint created with
fi_tx_context
Addressing:
● Use EP version of RANK_TO_FIADDR
● “endpoint” is an index (offset) into an
array of receive contexts
fi_addr_t RANK_TO_FIADDR_SEP (Comm *comm, int rank,
int endpoint) {
return fi_rx_addr(RANK_TO_FI_ADDR (comm, rank),
endpoint,
MAX_ENDPOINT_BITS);
/* MAX_ENDPOINT_BITS is an address vector attribute */
}
106. Which endpoint is right for my MPI?
• Basic Endpoint
• Possible to implement all of MPI
• Scalable endpoint + contexts
• Useful for internal threading modes, separation of resources, software
parallelization
• Adding capabilities to RMA windows (such as ordering restrictions)
• Binding resources at window creation time (memory regions)
• Shared endpoint + contexts
• Useful on shared or oversubscribed hardware
• Use them all!
• Start with basic, and customize/specialize MPI with a mix of endpoint
types
107. MPI Progress
• MPI_Send has initiated a blocking send operation that may take time to
complete. MPI will need to read the completion queues to complete the
request.
• MPI must handle errors (fatal and recoverable) in the progress loop.
109. MPI Progress
static inline MPI_Request *context_to_req (void *context){
char *base = (char *)context;
return (Request *)container_of(base, Request, context);
}
static inline int handle_cq_entries (cq_tagged_entry_t * wc, ssize_t num){
int i;
Request *req;
for (i = 0; i < num; i++) {
req = context_to_req (wc[i].op_context);
dispatch_function(&wc[i],req);
}
return MPI_SUCCESS;
}
Embedded OFI context
● Embedding helps to prevent
double allocations (provider
and app)
● OFI can also allocate context
Handles multiple
completions
110. MPI Progress: dispatch
static inline int dispatch_function (cq_tagged_entry_t *wc, MPI_Request *req)
{
int mpi_errno;
switch (request->event )) {
case EVENT_SEND:
mpi_errno = send_done_event (wc, req);
break;
case EVENT_RECV:
mpi_errno = recv_done_event (wc, req);
break;
…
};
}
Marks MPI send request
complete. MPI test/wait
routines watch the
completion state.
Marks MPI receive request
complete and populates
the MPI status object
111. MPI_Recv
int MPI_Recv(const void *buf, size_t data_sz,int rank, int tag, Comm * comm,
Request *req ){
…
*req = create_and_setup_mpi_request();
match_bits = init_recvtag(&mask_bits, comm, rank, tag);
… /* Other MPI processing. Example, datatypes */
fi_trecv(ep,recv_buf,data_sz,NULL,
(MPI_ANY_SOURCE == rank) ? FI_ADDR_UNSPEC : RANK_TO_FIADDR(comm, rank),
match_bits, mask_bits, &req->context)
…
/* check completion queue for match */
}
Receive always takes
a context
Mask bits tell receive to
ignore ANY_SOURCE
match bits if rank ==
MPI_ANY_SOURCE
112. MPI_Probe: check for inbound msg
int MPI_Probe(int rank, int tag, Comm * comm,MPI_Status *status){
match_bits = init_recvtag(&mask_bits, comm, rank, tag);
… /* Other MPI processing */
msg.addr = remote_proc;
msg.tag = match_bits;
msg.ignore = mask_bits;
msg.context = req->context;
while(!req->done) {
ret = fi_trecvmsg(ep, &msg, FI_PEEK|FI_COMPLETION);
if(ret == 0)
PROGRESS_WHILE(!req.done);
else if(ret == -FI_ENOMSG)
continue;
else
error();
}
/* Fill out status and return */
}
Peek matches a
message but does not
dequeue it.
Progress to complete
message
add FI_CLAIM to flags
when implementing
MPI_Mprobe
113. int probe_event(cq_tagged_entry_t *wc,
Request *rreq){
rreq->done = TRUE;
rreq->status.MPI_SOURCE = get_source(wc->tag);
rreq->status.MPI_TAG = get_tag(wc->tag);
rreq->status.MPI_ERROR = MPI_SUCCESS;
rreq->status.count = wc->len;
return MPI_SUCCESS;
}
Probe/Receive completion events
int recv_event(cq_tagged_entry_t *wc,
Request *rreq){
rreq->done = TRUE;
rreq->status.MPI_SOURCE = get_source(wc->tag);
rreq->status.MPI_TAG = get_tag(wc->tag);
rreq->status.MPI_ERROR = MPI_SUCCESS;
rreq->status.count = wc->len;
/* Other MPI processing: unpack, request completion */
return MPI_SUCCESS;
}
Probe
● Fills out status to be returned to
the user
● get_tag/get_source are macros
that decode the tag
● length is data bytes, needs to be
converted to MPI count
Recv
● Fills out status to be
returned to the user
● Unpacks data and/or
handles the datatypes
● Handles any protocol acking
required (like ssend acks)
114. MPI 3 RMA Desired Semantics
114
• Memory exposed to incoming read/write by all targets
• Offset based addressing within a window O(1) storage is ideal
• Synchronization per MPI window
• Local and request based completion requirement at the origin for certain ops
• Non-contiguous support
• Hardware accelerated atomics
• Asynchronous progress
Memory
Segment
Memory
CPU
Memory
Segment
Memory
CPU
Memory
Segment
Memory
CPU
Memory
Segment
Memory
Window
CPU
115. • Possible Mappings (Using FI_SCALABLE_MR):
• Mapping 1: Global endpoint, user defined key, map all of memory
• Mapping 2: TX context per window, user defined key
• Mapping 3: TX/RX context per window, no key (offsets embedded in RX)
• Common: Fallback to mapping 1 when resources constrained
• Use symmetric heap if possible
• O(1) if resources and MPI parameters permit, otherwise:
• Displacement unit (if necessary)
• Window bases if Mapping 1 and no symmetric heap.
115
Memory Regions and Windows
116. • Counters used for synchronization (e.g.,
MPI_Win_flush)
• Completion queues used to signal request-based
“R” variants (e.g., MPI_Rput)
• MPI_Win_lock/unlock uses message queue
API for protocol
116
Synchronization
117. Window
• Desirable to not pack the data or send datatype, possible mappings:
•Send a series of RMA operations that align contiguous chunks
•Generate iovec lists that correspond to OFI hardware limits
•Handle datatypes natively with OFI
Memory
Origin[0] Origin[1] Origin[…] Origin[n-1]
Result[…] Result[n-1]Result[1]Result[0]
Memory Target[1]Target[0] Target[n-1]Target[…]
Non-contiguous Data
118. MPI_Put
int MPI_Put() {
…
/* We can implement a lightweight put if conditions are met */
if (origin_contig && target_contig && other_conditions &&
origin_bytes <= max_buffered_write )) {
/* Increment counter to synchronize with fi_cntr_read */
global_cntr++;
fi_inject_write (ep, (char *)origin_addr, target_bytes,
RANK_TO_FIADDR (win->comm, target_rank),
target_address, win->memory_key );
}
}
Target address can
be offset or VA based
Keys can be:
● Exchanged
● App provided: (FI_MR_SCALABLE)
119. MPI_Get
int MPI_Get() {
…
/* We can implement a lighter weight get if conditions are met */
if (origin_contig && target_contig && other_conditions &&
origin_bytes <= max_msg_size)) {
/* Increment counter to synchronize with fi_cntr_read */
global_cntr++;
fi_read (ep, (char *)origin_addr, target_bytes,
RANK_TO_FIADDR (win->comm, target_rank),
target_address, win->memory_key );
}
}
Target address can
be offset or VA based
Keys can be:
● Exchanged
● App provided: (FI_MR_SCALABLE)
120. • Natively supported in libfabric
• FI_ATOMIC capability for fi_getinfo
• see fi_atomic functions in the fi_atomic(3) man page
• Query MPI datatype and MPI op and use valid table
• Fall back to message queue API emulation if hardware atomic is not available
• Provide optimized versions of single element atomics
• At window creation, determine MPI ordering info key values and create the scalable
context with corresponding FI_ORDER_xAy flags
• We’ll discuss more later in the tutorial
120
MPI Atomics
122. Content
❖ OpenSHMEM program model in a nutshell
❖ Mapping to libfabric constructs
➢ Endpoint types
➢ Address vectors
➢ Memory Registration
➢ Completion Queues and Counters
❖ Example Code walkthrough
123. Program Model in a Nutshell
NPES constant
Certain segments
of PE address
space remotely
accessible in a
one-sided fashion
124. OpenSHMEM - FI_EP_RDM
endpoints
❖ FI_EP_RDM likely best choice for OpenShmem
➢ one endpoint can be used to put/get, etc. to all PEs
in job
➢ relatively simple connection setup, but does require
some sort of out-of-band to exchange endpoint
names
➢ Does require an Address Vector instance
125. OpenSHMEM - Which AV Type?
❖ Two types of address vectors -
➢ FI_AV_MAP - using this type means the library must
internally keep a mechanism for mapping a PE to a
fabric fi_addr_t
➢ FI_AV_TABLE - supports a simple indexing scheme
to be used in place of fi_addr_t. Can support using
the PE rank as the address in data transfer
operations (this is likely the better choice for
OpenSHMEM)
126. OpenSHMEM - Memory Registration
❖ libfabric supports two memory registration
modes - basic and scalable
❖ FI_LOCAL_MR mode bit indicates whether local
buffers need to be registered
127. OpenSHMEM - Memory Registration(2)
❖ Scalable memory registration model simpler to
use
➢ does not require O(NPES) bookkeeping of memory keys.
➢ simplifies the implementation of a growable symmetric heap.
❖ Basic memory registration model is likely to be
supported by more providers
❖ Probably a good idea to be able to support both
models at least for the medium term
128. OpenSHMEM - Completion Queues,
Counters
❖ With current OpenSHMEM api, no need to track
data transfers on a per operation basis
❖ But for shmem_quiet/shmem_fence do need to
count outstanding data transfers
❖ fi_cntr’s smart to use here
❖ To support shmem_quiet semantics, want to use
FI_DELIVERY_COMPLETE for tx_attr op_flags
132. shmem_init (3)
fi_fabric(p_info->fabric_attr, /* get a fab desc */
&fab_desc, NULL);
fi_domain(fab_desc,p_info, &dom_desc, NULL); /* get a dom desc */
fi_endpoint(dom_desc, p_info, &ep_desc, NULL); /* get a ep desc */
cntr_attr.events = FI_CNTR_EVENTS_COMP;
fi_cntr_open(dom, &cnt_attr, &putcntr); /* open a put cntr */
fi_ep_bind(ep_desc, &putcntr->fid, FI_WRITE); /* bind to ep */
fi_cntr_open(dom, &cnt_attr, &getcntr); /* open a put cntr */
fi_ep_bind(ep_desc, &getcntr->fid, FI_READ); /* bind to ep */
av_attr.type = FI_AV_TABLE;
fi_av_open(dom, &av_attr, &av_desc, NULL); /* get an av desc */
fi_ep_bind(ep_desc, &av->fid, 0); /* bind to ep */
/* also open CQ and bind to ep (not shown) */
…
134. shmem_init (5)
fi_enable(ep_desc); /* enable ep for data transfers */
len = sizeof(getname_buf);
fi_getname(ep_desc,getname_buf, &len); /* get ep name */
all_ep_names = (char *)malloc(len * n_pes);
out_of_band_xchg(getname_buf, all_ep_names); /* oob exchange of ep names */
n = fi_av_insert(av_desc, /* add entries to av table */
all_ep_names,
npes,
NULL, /* don’t need vec of fi_addr’s for FI_AV_TABLE
*/
0,
NULL);
free(all_ep_names);
/* for !USE_SCALABLE_MR also need to exchange memory keys */
}
139. shmem_swap
long shmem_long_swap(long *target, long value, int pe)
{
extern uint64_t get_count;
uint64_t key = 0ULL, result;
const uint64_t mask = ~0ULL;
#if !USE_SCALABLE_MR
key = key_is_bss_or_symheap(dest); /* assumes sym heap at same VADDR */
#endif
fi_compare_atomic(ep_desc, &value, 1, NULL,
&mask, NULL,
&result, NULL,
(fi_addr_t)pe,
(uint64_t)target, key,
FI_UINT64, FI_MSWAP, NULL);
get_count++;
fi_cntr_wait(getcntr, get_count, -1); /* wait till data has returned*/
return result;
}
140. For more information:
OFIWG BoF - Tuesday 1:30 - 3:00 PM
2016 International OpenFabrics Alliance Workshop
Monterey, CA
April 4-8
https://www.openfabrics.org/index.php/blogs/80-2016-
international-openfabrics-alliance-workshop.html
Mail list - ofiwg@lists.openfabrics.org