PMIx enables dynamic coordination between runtime systems like MPI and OpenMP. Geoffroy Vallee discussed using PMIx to coordinate MPI and OpenMP runtimes for improved resource utilization. Specifically, PMIx could be used to explicitly place MPI ranks and OpenMP threads, allowing dynamic adjustment of the placement layout between application phases. This coordination could optimize application performance on modern complex multicore and manycore architectures.
In this session, we’ll review how previous efforts, including Netfilter, Berkeley Packet Filter (BPF), Open vSwitch (OVS), and TC, approached the problem of extensibility. We’ll show you an open source solution available within the Red Hat Enterprise Linux kernel, where extending and merging some of the existing concepts leads to an extensible framework that satisfies the networking needs of datacenter and cloud virtualization.
In-kernel Analytics and Tracing with eBPF for OpenStack Clouds - PLUMgrid
As the movement of applications from bare metal to the cloud continues, considerations around analytics and tracing are becoming more prevalent for security, monitoring, and accounting. As an open source project under the Linux Foundation, the IO Visor Project is working with the kernel community on extending BPF (eBPF) and is being used by many companies for security, tracing, and analytics. This talk will describe how an OpenStack micro-segmentation framework using eBPF can be utilized for analytics and tracing to secure application workloads. Use cases around application security, intrusion detection using service insertion, and identity will be described. While networking is one piece of the solution, sandboxing applications to avoid attacks is also important. We will also touch upon how eBPF technology and a unified policy framework can secure application workloads in areas beyond networking.
How APIs are Transforming Cisco Solutions and Catalyzing an Innovation Ecosystem - Cisco DevNet
This document discusses how APIs are transforming Cisco solutions and catalyzing an innovation ecosystem. It outlines Cisco's DevNet strategy of making the developer the customer and accelerating market opportunities through a vibrant developer ecosystem built on programmable platforms and APIs. It describes how network programmability, APIs, cloudification, new applications and experiences, developer tools, and open source collaboration are driving network innovation and helping developers build solutions.
Transport Layer Development Kit (on top of DPDK, by Intel)
Provides a set of libraries for L4 protocol processing (UDP, TCP, etc.) and VPP graph nodes, plugins, etc. that use those libraries to implement a host stack.
The FD.io TLDK project scope is:
* Implementing a set of libraries for L4 protocol processing (UDP, TCP, etc.) for both IPv4 and IPv6.
* Creating VPP graph nodes, plugins, etc. that use those libraries to implement a host stack.
* Providing such mechanisms (netlink agents, packaging, etc.) as are necessary to make the resulting host stack easily usable by existing non-VPP-aware software.
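The L4 semantics that TLDK reimplements in userspace are the same ones applications normally get from the kernel's socket layer. As a rough illustration of the UDP service a host stack must provide (port binding, demultiplexing, datagram delivery), here is the kernel-stack equivalent in Python; this is plain sockets, not TLDK's actual API:

```python
import socket

# Two UDP endpoints on the loopback interface. The kernel's host stack
# performs the L4 processing (port demultiplexing, checksums, delivery)
# that a userspace stack like TLDK reimplements on top of DPDK.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))  # let the OS pick a free port
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

client.sendto(b"hello", server.getsockname())
data, addr = server.recvfrom(1024)
print(data)  # b'hello'

client.close()
server.close()
```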
About the author: Priya Autee is a software engineer at Intel working on various leading-edge IA features and an Intel(R) RDT expert. She is focused on prototyping and researching open source APIs like DPDK, Intel(R) RDT, etc. to support NFV/compute-sensitive requirements on Intel Architecture. She holds a Master's in Computer Science from Arizona State University, Arizona.
DPDK in depth
This document provides an overview of DPDK (Data Plane Development Kit):
1. DPDK is an open source project for data plane programming and network acceleration. It started at Intel in 2010 and is now maintained by the Linux Foundation.
2. DPDK provides poll mode drivers (PMDs), libraries, and sample applications for fast packet processing. It uses hugepages and avoids kernel involvement for high performance.
3. The document outlines several DPDK projects, libraries, PMDs, advantages and disadvantages, development process, and demonstrates a simple DPDK application (l2fwd) and the testpmd tool.
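The poll-mode pattern at the heart of DPDK can be pictured as a worker core repeatedly draining an RX ring without blocking or taking interrupts. The toy sketch below models that pattern in Python with an in-memory deque standing in for a hardware descriptor ring; the names mirror DPDK concepts (`rx_burst` echoing `rte_eth_rx_burst`) but none of this is the real DPDK API:

```python
from collections import deque

# Toy stand-in for a NIC RX ring; real DPDK PMDs poll hardware
# descriptor rings mapped into userspace via hugepage memory.
rx_ring = deque([b"pkt0", b"pkt1", b"pkt2"])

def rx_burst(ring, max_pkts=32):
    """Poll-mode receive: grab up to max_pkts without blocking and
    without an interrupt, loosely mirroring rte_eth_rx_burst()."""
    burst = []
    while ring and len(burst) < max_pkts:
        burst.append(ring.popleft())
    return burst

forwarded = []
while True:
    burst = rx_burst(rx_ring)
    if not burst:        # a real PMD worker would keep spinning on its core
        break
    forwarded.extend(burst)   # l2fwd would rewrite MACs and TX-burst here

print(forwarded)  # [b'pkt0', b'pkt1', b'pkt2']
```

The key trade-off this illustrates: the worker burns a full CPU core spinning, in exchange for never paying interrupt or context-switch latency per packet.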
Using IO Visor to Secure Microservices Running on CloudFoundry [OpenStack Sum...IO Visor Project
As microservices grow, traditional firewall rules based on network ACLs are no longer scalable and fall short of providing fine-grained enforcement. Group Based Policy (GBP) is a flexible policy language that allows users to specify policy enforcement based on intent, independent of network infrastructure and IP addressing. Using micro-segmented virtual domains, administrators can define policies at a centralized location and use IO Visor technology for distributed enforcement. This provides infrastructure-independent rules, template-based policy definitions, and scale-out policy enforcement for a solution that secures and scales with microservices. This session will be presented by members of the IO Visor community and will cover how IO Visor technology can be used to define and enforce GBP. The discussion will also cover using GBP for Cloud Foundry application spaces where microservices are deployed and need scalable, efficient security policies.
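The core idea of intent-based policy, rules expressed over workload groups rather than IP addresses, can be sketched in a few lines. The group names and rule structure below are purely illustrative, not the actual GBP schema:

```python
# Toy sketch of intent-based policy: rules reference workload groups
# (labels), not IP addresses, so enforcement survives re-addressing
# and scale-out. Names and structure are illustrative, not real GBP.
policy = {
    ("frontend", "api"): "allow",
    ("api", "db"):       "allow",
}

workloads = {          # label assignment is the only IP-dependent step
    "10.0.0.5":  "frontend",
    "10.0.1.9":  "api",
    "10.0.2.17": "db",
}

def allowed(src_ip, dst_ip):
    """Default-deny: traffic passes only if the (src, dst) group pair
    has an explicit allow rule."""
    groups = (workloads[src_ip], workloads[dst_ip])
    return policy.get(groups, "deny") == "allow"

print(allowed("10.0.0.5", "10.0.1.9"))   # True  (frontend -> api)
print(allowed("10.0.0.5", "10.0.2.17"))  # False (frontend -> db, no rule)
```

Because the policy table never mentions IPs, a new `api` replica only needs a label assignment; no firewall rules change.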
Novartis Institutes for Biomedical Research (NIBR) uses high-performance computing (HPC) to accelerate drug discovery through phenotypic assays, deep learning for image analysis, and pushing analysis tools to scientists. NIBR's HPC resources include x86 servers, specialized nodes with GPUs and large memory, and software for modeling, simulation, and analysis. Examples show how HPC has increased throughput for monitoring neuron and muscle cell interaction, classified images better than conventional methods, and allowed scientists to directly run analyses on cluster resources.
Linux Kernel Cryptographic API and Use Cases - Kernel TLV
The Linux kernel has a rich and modular cryptographic API that is used extensively by familiar user facing software such as Android. It's also cryptic, badly documented, subject to change and can easily bite you in unexpected and painful ways.
This talk will describe the crypto API, provide some usage examples, and discuss some of the more interesting in-kernel users, such as DM-Crypt, DM-Verity, and the new file system encryption code.
Gilad Ben-Yossef is a principal software engineer at ARM. He works on the kernel security sub-system and the ARM CryptCell engine. Open source work done by Gilad includes an experiment in integration of network processors in the networking stack, a patch set for reducing the interference caused to user space processes in large multi-core systems by Linux kernel “maintenance” work and on SMP support for the Synopsys Arc processor among others.
Gilad has co-authored O’Reilly’s “Building Embedded Linux Systems” 2nd edition and presented at such venues as Embedded Linux Conference Europe and the Ottawa Linux Symposium, as well as co-founded Hamakor, an Israeli NGO for the advancement for Open Source and Free Software in Israel. When not hacking on kernel code you can find Gilad meditating and making dad jokes on Twitter.
44CON 2013 - Hack the Gibson - Exploiting Supercomputers - John Fitzpatrick &... - 44CON
We have had the luxury of conducting security research and penetration testing activities on a few supercomputers on the world Top 500 list. Having spent some time assessing the security of these awesome machines, it’s fair to say that many of the technologies involved have been largely untouched by public security scrutiny, and consequently the security bar is much lower than we have come to expect in 2013. This presentation will cover our research and demonstrate some of the most interesting and significant vulnerabilities we have uncovered so far. We will also be demonstrating 0-day exploits and previously undocumented attack techniques live so you can see how to get root on 20,000 nodes all at once. There are many ways to hack the Gibson and today you’ll learn some.
Data Plane and VNF Acceleration Mini Summit - Open-NFP
This event will deliver technical sessions related to the acceleration of server-based networking data planes and VNFs using SmartNIC hardware within the framework of the NFVi and orchestration components in the OPNFV Danube software platform. Virtual switching data planes using Open vSwitch (OVS) and FD.IO (with VPP) will be discussed, and offload architectures for such data planes into SmartNIC platforms will be presented. Extension of such data planes for VNF-specific requirements using micro-VNF sandbox functions based on P4 and/or C language programming will be presented and discussed. Early work defining VNF acceleration architectures and APIs for SmartNIC-accelerated data planes and VNF sandbox extensions will be presented and discussed, followed by a proposal to create a collaborative project to complete the definition of such an API within the frameworks of OPNFV and ETSI. In addition, a proposal for a Pharos community lab related to VNF acceleration will be put forward by the Open-NFP organization.
These slides introduce the definition of real time and RTOSes, then present and compare RT Linux approaches with each other, and finally show a latency measurement test performed by Linutronix.
This document discusses the challenges of real-time computing on Linux and potential solutions. Real-time means very low maximum latency, below 100 microseconds. While Linux was not designed for real-time, it is now used in many embedded systems. Options to address real-time include using separate hardware, a hypervisor with a real-time operating system (RTOS), asymmetric multiprocessing (AMP) with an RTOS, or solutions within Linux like PREEMPT_RT, which adds preemption and CPU isolation techniques to reduce worst-case latency without changing applications. The document reviews these approaches and notes that real-time remains an important area as Linux is increasingly used in embedded systems.
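The worst-case latency numbers these approaches compete on are typically measured cyclictest-style: request a fixed sleep interval, then record how late the wakeup actually arrives. A crude userspace sketch of that measurement loop is below (no RT scheduling class or pinned CPU, so expect far worse numbers than a tuned PREEMPT_RT kernel would show):

```python
import time

INTERVAL_US = 1000   # requested sleep per iteration, in microseconds
LOOPS = 200

latencies_us = []
for _ in range(LOOPS):
    t0 = time.monotonic()
    time.sleep(INTERVAL_US / 1_000_000)
    # Wakeup latency = actual elapsed time minus requested interval.
    overshoot = (time.monotonic() - t0) * 1_000_000 - INTERVAL_US
    latencies_us.append(overshoot)

# Real-time cares about the maximum, not the average: one 5 ms outlier
# disqualifies a system with a perfect mean.
print(f"min {min(latencies_us):.1f}  avg {sum(latencies_us)/LOOPS:.1f}  "
      f"max {max(latencies_us):.1f}  (us)")
```

The real cyclictest additionally uses `clock_nanosleep` with SCHED_FIFO priority and runs one measurement thread per CPU.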
Project Vault is a secure computing environment developed by Google's ATAP group. It uses a microSD card to provide an encrypted environment that works with any operating system. The project is open source and uses an FPGA-based hardware security module for encryption and decryption. It also uses a custom real-time operating system called microSEL and an OpenRISC 1200 processor. Project Vault aims to provide a portable secure computing solution.
Accelerating Envoy and Istio with Cilium and the Linux Kernel - Thomas Graf
The document discusses how Cilium can accelerate Envoy and Istio by using eBPF/XDP to provide transparent acceleration of network traffic between Kubernetes pods and sidecars without any changes required to applications or Envoy. Cilium also provides features like service mesh datapath, network security policies, load balancing, and visibility/tracing capabilities. BPF/XDP in Cilium allows for transparent TCP/IP acceleration during the data phase of communications between pods and sidecars.
This document provides an overview and agenda for a workshop on the RINA simulator (RINASim). It discusses the requirements, installation instructions, file navigation structure, documentation, and contact information for the simulator. It also outlines the key components in RINASim's design including application processes, IPC resource managers, and DIF allocators. Finally, it indicates a live demonstration of a simple relay example will be shown.
DPDK: rte_security: An Update and Introducing PDCP - Hemant Agrawal
Rte_security is being updated to support the PDCP protocol. PDCP handles functions like ciphering, integrity protection, and header compression for LTE radio protocol stacks. The rte_security API will be enhanced to allow creation of PDCP security sessions through rte_security_session_create. This will configure PDCP-specific parameters like bearer ID and sequence number size. Hardware offloading of PDCP lookaside processing for control and data planes will also be supported.
OpenStack Summit Tokyo - Know-how of Challenging Deploy/Operation NTT DOCOMO... - Masaaki Nakagawa
DOCOMO MAIL is a 24/7 cloud mail system accessed by over 20 million people. This mail system stores users' mail archives in OpenStack Swift with petabyte-scale capacity, deployed by NTT DATA.
We have been successfully operating this service since Sep 2014 without any downtime. In this session, we'll present the actual issues and challenges we have faced and conquered.
Here are some specific points we'd like to highlight.
* No service degradation, no downtime.
* Massive scale and still growing.
* Hundreds of servers operated by few people.
1) The document discusses adding event device support to the ipsec-gw application in DPDK to improve efficiency.
2) It describes how cryptodev works with poll and event modes and introduces an event crypto adapter to connect crypto queues to the event framework.
3) The proposed changes are to assign all crypto and ethernet queues to an event device and dequeue packets from the event device rather than individual devices to allow asynchronous and CPU-efficient processing.
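The proposal in the three points above can be pictured as replacing per-device polling with a single event dequeue point for the worker. A toy model of that shape is below; the event names and structure are illustrative, not the actual rte_event API:

```python
import queue

# One unified event queue standing in for the event device. Producers
# (ethernet RX, crypto completions) enqueue events, instead of the
# worker polling each device's queues individually.
event_q = queue.Queue()
event_q.put(("eth_rx", "pkt-a"))          # packet arrives
event_q.put(("crypto_done", "pkt-a-enc")) # crypto op completes async
event_q.put(("eth_rx", "pkt-b"))

handled = []
while not event_q.empty():
    ev_type, payload = event_q.get()   # single dequeue point for worker
    # A real ipsec-gw worker would dispatch on ev_type here:
    # eth_rx -> submit to cryptodev; crypto_done -> TX the packet.
    handled.append((ev_type, payload))

print(handled)
```

The payoff mirrored here is that crypto completions arrive asynchronously through the same queue as new packets, so the worker never busy-polls an idle cryptodev.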
DockerCon 2017 - Cilium - Network and Application Security with BPF and XDP - Thomas Graf
This talk will start with a deep dive and hands-on examples of BPF, possibly the most promising low-level technology to address challenges in application and network security, tracing, and visibility. We will discuss how BPF evolved from a simple bytecode language to filter raw sockets for tcpdump to a JIT-able virtual machine capable of universally extending and instrumenting both the Linux kernel and user space applications. The introduction is followed by a concrete example of how the Cilium open source project applies BPF to solve networking, security, and load balancing for highly distributed applications. We will discuss and demonstrate how Cilium with the help of BPF can be combined with distributed system orchestration such as Docker to simplify security, operations, and troubleshooting of distributed applications.
The Linux kernel is undergoing the most fundamental architectural evolution in its history: it is becoming a microkernel. Why is the Linux kernel evolving into a microkernel? This is potentially the biggest fundamental change ever to happen to the Linux kernel. This talk covers how companies like Facebook and Google use BPF to patch 0-day exploits, how BPF will change the way features are added to the kernel forever, and how BPF is introducing a new type of application deployment method for the Linux kernel.
This document provides an introduction to eBPF (Extended Berkeley Packet Filter), which allows running user-space code in the Linux kernel without needing to compile a kernel module. It describes how eBPF avoids unnecessary copying of packets between kernel and user-space for improved performance. Examples are given of using eBPF for networking tasks like SDN configuration, DDoS mitigation, intrusion detection, and load balancing. The document concludes by noting eBPF provides alternatives to iptables that are better suited for microservices architectures.
OVS and DPDK - T.F. Herbert, K. Traynor, M. Gray - harryvanhaaren
The document discusses DPDK and software dataplane acceleration for Open vSwitch. It provides an overview of the OVS architecture and its evolution to integrate with DPDK. It shares one user's experience of initial challenges in using DPDK/OVS and improvements over time. Suggestions are made to improve areas like debugging, testing, documentation and training to enhance the usability of DPDK/OVS. Performance tuning techniques like using multiple threads are also briefly covered.
The Impact of Software-based Virtual Network in the Public Cloud - Chunghan Lee
Today’s cloud network consists of sophisticated virtual networks, and a virtual switch is a key element of these networks. Although there is tremendous interest in measuring cloud network performance, little is known about the impact of software-based virtual networks on latency. In this paper, we study the impact of the virtual network on latency in a public cloud based on OpenStack. We measured the throughput of VMs and simultaneously captured their packets on hosts. We analyzed the traces using well-known metrics, such as throughput and RTT, and investigated the abrupt fluctuation of latency called ‘the burstiness of latency’. We quantitatively clarify the impact of the software-based virtual network on latency. In our public cloud, the latency is approximately 35.2% of RTT, and 10% of burstiness mainly contributes to the increased RTT. The total latency was increased by the receiving side regardless of data and ACK paths. Our analysis results, discussions, and implications can not only help cloud researchers and developers design the next generation of software-based virtual networks but can also help cloud operators improve the performance of virtual networks.
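The "burstiness of latency" analysis described in the abstract can be sketched as flagging RTT samples far above a baseline. A toy version on synthetic data is below; the 3x-median threshold is an illustrative choice, and the paper's 35.2%/10% figures come from its real traces, not from this sketch:

```python
from statistics import median

# Synthetic RTT samples in ms: mostly stable, with a few latency bursts.
rtts = [1.0, 1.1, 0.9, 1.0, 5.5, 1.0, 1.2, 6.1, 1.1, 1.0]

base = median(rtts)                          # robust baseline RTT
bursty = [r for r in rtts if r > 3 * base]   # threshold is illustrative
burst_share = len(bursty) / len(rtts)

print(f"baseline {base:.2f} ms, bursty samples {len(bursty)} "
      f"({burst_share:.0%} of samples)")
```

Using the median rather than the mean keeps the baseline itself from being dragged up by the bursts it is meant to detect.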
SDN & NFV Introduction - Open Source Data Center Networking - Thomas Graf
This document introduces software defined networking (SDN) and network functions virtualization (NFV) concepts. It discusses challenges with traditional networking and how SDN and NFV address these by decoupling the control and data planes, centralizing network intelligence, and abstracting the underlying network infrastructure. It then provides examples of open source SDN technologies like OpenDaylight, Open vSwitch, and OpenStack that can be used to build programmable software-defined networks and virtualized network functions.
This document provides an overview of open source software developed by the Grid Protection Alliance (GPA) for electric utilities. It describes GPA's business approach of developing open source solutions that are in production use. It then summarizes GPA's clients and the core technologies that many of its products are built upon, including the Grid Solutions Framework and Time Series Library. Finally, it outlines GPA's various open source products for synchrophasors, power quality monitoring, security, and other areas, as well as the services GPA provides.
Exploring the Final Frontier of Data Center Orchestration: Network Elements -... - Puppet
The document discusses network element automation using Puppet. It provides context on the challenges of manual network configuration including lack of agility, reliability issues from errors, and time spent on basic tasks. Puppet can automate network elements similar to how it automates servers, reducing errors and improving speed/productivity. The Cisco Nexus platform and NXAPI enable programmatic access for automation using Puppet through technologies like onePK and LXC containers running on the switch.
Novartis Institutes for Biomedical Research (NIBR) uses high-performance computing (HPC) to accelerate drug discovery through phenotypic assays, deep learning for image analysis, and pushing analysis tools to scientists. NIBR's HPC resources include x86 servers, specialized nodes with GPUs and large memory, and software for modeling, simulation, and analysis. Examples shown how HPC has increased throughput for monitoring neuron and muscle cell interaction, classified images better than conventional methods, and allowed scientists to directly run analyses on cluster resources.
Linux Kernel Cryptographic API and Use CasesKernel TLV
The Linux kernel has a rich and modular cryptographic API that is used extensively by familiar user facing software such as Android. It's also cryptic, badly documented, subject to change and can easily bite you in unexpected and painful ways.
This talk will describe the crypto API, provide some usage example and discuss some of the more interesting in-kernel users, such as DM-Crypt, DM-Verity and the new fie system encryption code.
Gilad Ben-Yossef is a principal software engineer at ARM. He works on the kernel security sub-system and the ARM CryptCell engine. Open source work done by Gilad includes an experiment in integration of network processors in the networking stack, a patch set for reducing the interference caused to user space processes in large multi-core systems by Linux kernel “maintenance” work and on SMP support for the Synopsys Arc processor among others.
Gilad has co-authored O’Reilly’s “Building Embedded Linux Systems” 2nd edition and presented at such venues as Embedded Linux Conference Europe and the Ottawa Linux Symposium, as well as co-founded Hamakor, an Israeli NGO for the advancement for Open Source and Free Software in Israel. When not hacking on kernel code you can find Gilad meditating and making dad jokes on Twitter.
44CON 2013 - Hack the Gibson - Exploiting Supercomputers - John Fitzpatrick &...44CON
We have had the luxury of conducting security research and penetration testing activities on a few Supercomputers on the world top 500 list. Having spent some time assessing the security of these awesome machines it’s fair to say that many of the technologies involved have been largely untouched by public security scrutiny and consequently the security bar is much lower than we have come to expect in 2013. This presentation will cover our research and demonstrate some of the most interesting and significant vulnerabilities we have uncovered so far. We will also be demonstrating 0-day exploits and previously undocumented attack techniques live so you can see how to get root on 20,000 nodes all at once. There are many ways to hack the Gibson and today you’ll learn some.
Data Plane and VNF Acceleration Mini Summit Open-NFP
This event will deliver technical sessions related to the acceleration of server-based networking data planes and VNFs using SmartNIC hardware within the framework of the NFVi and orchestration components in the OPNFV Danube software platform. Virtual switching data planes using Open vSwitch (OVS) and FD.IO (with VPP) will be discussed and offload architectures for such data planes into SmartNIC platforms will be presented. Extension of such data planes for VNF specific requirements utilizing micro-VNF sandbox functions utilizing P4 and/or C language programming will be presented and discussed. Early work defining VNF acceleration architectures and APIs for SmartNIC-accelerated data planes and VNF sandbox extensions will be presented and discussed, followed by a proposal to create a collaborative project to complete the definition of such an API within the frameworks of OPNFV and ETSI. In addition, a proposal for a Pharos community lab related VNF acceleration will be put forward by the Open-NFP organization.
This slides indicate an introduction on the definition of real time and RTOSes, then you can find information on introducing RT Linux approaches and comparing them with each other, then finally you can see a latency measurement test done by "Linutronix" in the slides
This document discusses the challenges of real-time computing on Linux and potential solutions. Real-time means very low maximum latency, below 100 microseconds. While Linux was not designed for real-time, it is now used in many embedded systems. Options to address real-time include using separate hardware, a hypervisor with an real-time operating system (RTOS), asymmetric multiprocessing (AMP) with an RTOS, or solutions within Linux like PREEMPT_RT that adds preemption and CPU isolation techniques to reduce worst-case latency without changing applications. The document reviews these approaches and notes that real-time remains an important area as Linux is increasingly used in embedded systems.
Project Vault is a secure computing environment developed by Google's ATAP group. It uses a microSD card to provide an encrypted environment that works with any operating system. The project is open source and uses an FPGA-based hardware security module for encryption and decryption. It also uses a custom real-time operating system called microSEL and an OpenRISC 1200 processor. Project Vault aims to provide a portable secure computing solution.
Accelerating Envoy and Istio with Cilium and the Linux KernelThomas Graf
The document discusses how Cilium can accelerate Envoy and Istio by using eBPF/XDP to provide transparent acceleration of network traffic between Kubernetes pods and sidecars without any changes required to applications or Envoy. Cilium also provides features like service mesh datapath, network security policies, load balancing, and visibility/tracing capabilities. BPF/XDP in Cilium allows for transparent TCP/IP acceleration during the data phase of communications between pods and sidecars.
This document provides an overview and agenda for a workshop on the RINA simulator (RINASim). It discusses the requirements, installation instructions, file navigation structure, documentation, and contact information for the simulator. It also outlines the key components in RINASim's design including application processes, IPC resource managers, and DIF allocators. Finally, it indicates a live demonstration of a simple relay example will be shown.
Dpdk: rte_security: An update and introducing PDCPHemant Agrawal
Rte_security is being updated to support the PDCP protocol. PDCP handles functions like ciphering, integrity protection, header compression for LTE radio protocol stacks. The rte_security API will be enhanced to allow creation of PDCP security sessions through rte_security_session_create. This will configure PDCP specific parameters like bearer ID, sequence number size. Hardware offloading of PDCP lookaside processing for control and data planes will also be supported.
OpenStack Summit Tokyo - Know-how of Challlenging Deploy/Operation NTT DOCOMO...Masaaki Nakagawa
DOCOMO MAIL is 24/7 cloud mail system which has accesses from over 20 million people. This mail system stores user's mail archive in OpenStack Swift with Peta Byte scale capacity deployed by NTT DATA.
We have been successfully operating this service since Sep 2014 without any downtime. In this session, we'll present the actual issues and challenges we have faced and conquered.
Here're some specific points we'd like to highlight.
* No service degrade, no downtime.
* Massive scale and still growing.
* Hundreds of servers operated by few people.
1) The document discusses adding event device support to the ipsec-gw application in DPDK to improve efficiency.
2) It describes how cryptodev works with poll and event modes and introduces an event crypto adapter to connect crypto queues to the event framework.
3) The proposed changes are to assign all crypto and ethernet queues to an event device and dequeue packets from the event device rather than individual devices to allow asynchronous and CPU-efficient processing.
DockerCon 2017 - Cilium - Network and Application Security with BPF and XDPThomas Graf
This talk will start with a deep dive and hands on examples of BPF, possibly the most promising low level technology to address challenges in application and network security, tracing, and visibility. We will discuss how BPF evolved from a simple bytecode language to filter raw sockets for tcpdump to the a JITable virtual machine capable of universally extending and instrumenting both the Linux kernel and user space applications. The introduction is followed by a concrete example of how the Cilium open source project applies BPF to solve networking, security, and load balancing for highly distributed applications. We will discuss and demonstrate how Cilium with the help of BPF can be combined with distributed system orchestration such as Docker to simplify security, operations, and troubleshooting of distributed applications.
The Linux kernel is undergoing the most fundamental architecture evolution in its history and is becoming a microkernel. Why is the Linux kernel evolving into a microkernel? It is potentially the biggest fundamental change ever to happen to the Linux kernel. This talk covers how companies like Facebook and Google use BPF to patch 0-day exploits, how BPF will change the way features are added to the kernel, and how BPF is introducing a new type of application deployment method for the Linux kernel.
This document provides an introduction to eBPF (Extended Berkeley Packet Filter), which allows running user-supplied programs inside the Linux kernel without needing to compile a kernel module. It describes how eBPF avoids unnecessary copying of packets between kernel and user space for improved performance. Examples are given of using eBPF for networking tasks like SDN configuration, DDoS mitigation, intrusion detection, and load balancing. The document concludes by noting that eBPF provides alternatives to iptables that are better suited for microservices architectures.
OVS and DPDK - T.F. Herbert, K. Traynor, M. Gray - harryvanhaaren
The document discusses DPDK and software dataplane acceleration for Open vSwitch. It provides an overview of the OVS architecture and its evolution to integrate with DPDK. It shares one user's experience of initial challenges in using DPDK/OVS and improvements over time. Suggestions are made to improve areas like debugging, testing, documentation and training to enhance the usability of DPDK/OVS. Performance tuning techniques like using multiple threads are also briefly covered.
The Impact of Software-based Virtual Network in the Public Cloud - Chunghan Lee
Today's cloud networks consist of sophisticated virtual networks, and a virtual switch is a key element of these networks. Although there is tremendous interest in measuring cloud network performance, little is known about the impact of software-based virtual networks on latency. In this paper, we study the impact of the virtual network on latency in a public cloud based on OpenStack. We measured the throughput of VMs and simultaneously captured their packets on the hosts. We analyzed the traces using well-known metrics, such as throughput and RTT, and investigated the abrupt fluctuation of latency called 'the burstiness of latency'. We quantitatively clarify the impact of the software-based virtual network on latency. In our public cloud, the virtual-network latency is approximately 35.2% of RTT, and 10% of the burstiness mainly contributes to the increased RTT. The total latency was increased by the receiving side regardless of data and ACK paths. Our analysis results, discussions, and implications can not only help cloud researchers and developers design the next generation of software-based virtual networks but can also help cloud operators improve the performance of their virtual networks.
SDN & NFV Introduction - Open Source Data Center Networking - Thomas Graf
This document introduces software defined networking (SDN) and network functions virtualization (NFV) concepts. It discusses challenges with traditional networking and how SDN and NFV address these by decoupling the control and data planes, centralizing network intelligence, and abstracting the underlying network infrastructure. It then provides examples of open source SDN technologies like OpenDaylight, Open vSwitch, and OpenStack that can be used to build programmable software-defined networks and virtualized network functions.
This document provides an overview of open source software developed by the Grid Protection Alliance (GPA) for electric utilities. It describes GPA's business approach of developing open source solutions that are in production use. It then summarizes GPA's clients and the core technologies that many of its products are built upon, including the Grid Solutions Framework and Time Series Library. Finally, it outlines GPA's various open source products for synchrophasors, power quality monitoring, security, and other areas, as well as the services GPA provides.
Exploring the Final Frontier of Data Center Orchestration: Network Elements - ... - Puppet
The document discusses network element automation using Puppet. It provides context on the challenges of manual network configuration including lack of agility, reliability issues from errors, and time spent on basic tasks. Puppet can automate network elements similar to how it automates servers, reducing errors and improving speed/productivity. The Cisco Nexus platform and NXAPI enable programmatic access for automation using Puppet through technologies like onePK and LXC containers running on the switch.
This document provides an overview of cBPF and eBPF. It discusses the history and implementation of cBPF, including how it was originally used for packet filtering. It then covers eBPF in more depth, explaining what it is, its history, implementation including different program types and maps. It also discusses several uses of eBPF including networking, firewalls, DDoS mitigation, profiling, security, and chaos engineering. Finally, it introduces XDP and DPDK, comparing XDP's benefits over DPDK.
The document discusses Process Management Interface for Exascale (PMIx). PMIx defines standardized APIs and a convenience library to facilitate interactions between applications and system management software. It aims to support evolving programming models through new API development. The document outlines the PMIx standards process, reviews performance of the v1.x release series, and discusses future roadmap features like tool support, notification capabilities, and generalized data storage.
PMIx provides a standardized interface (APIs and attributes) for process management that is intended to support exascale systems. It defines a reference library implementation and server to ease adoption. The community develops and maintains PMIx through an open process. Current support includes startup operations, tool connections, queries, and event notifications. Future work may include network, file system, debugger, and cross-library support.
This document discusses Typesafe's Reactive Platform and Apache Spark. It describes Typesafe's Fast Data strategy of using a microservices architecture with Spark, Kafka, HDFS and databases. It outlines contributions Typesafe has made to Spark, including backpressure support, dynamic resource allocation in Mesos, and integration tests. The document also discusses Typesafe's customer support and roadmap, including plans to introduce Kerberos security and evaluate Tachyon.
The document discusses new features in .NET 7, including:
1) Static abstract members can now be added to interfaces to allow for generic math and parsing functionality.
2) File-scoped types allow types to exist only within the file in which they are declared.
3) Time-related types now support microseconds and nanoseconds for more precision.
4) StringSyntaxAttribute allows marking strings as containing specific formats, like JSON, for validation and syntax highlighting.
Ming Liu has over 10 years of experience in embedded software development and testing using languages like C/C++, Python, and Shell scripts. He has worked as a Validation Engineer at Marvell Semiconductor testing WiFi chip features and bringing up new products. Prior to that, he was a Verification Engineer at Wind River testing their VxWorks operating system and a Software Designer at Nortel Networks developing features for their UMTS wireless system.
QNX Software Systems provides the QNX Neutrino real-time operating system, the QNX Momentics IDE, and support services. QNX Neutrino is a highly reliable, deterministic, and scalable microkernel RTOS that supports a variety of hardware platforms and development tools. It offers features such as fault tolerance, predictable performance, POSIX compliance, and high availability.
This document provides an introduction to real-time systems and discusses approaches to making Linux a real-time operating system. It defines hard and soft real-time systems and explains why Linux is commonly used instead of dedicated real-time operating systems. The document then discusses two main solutions, PREEMPT_RT and Xenomai 3, which provide patches to make Linux meet timing constraints through different approaches. It also provides an overview of basic real-time concepts like scheduling algorithms, preemptive vs. non-preemptive scheduling, and interprocess communication.
The document summarizes several of Andrea Petrucci's research and work projects from 1999-2014. It describes his contributions to:
1) The CMS experiment's data acquisition system (DAQ), including event building and the high-level trigger.
2) The XDAQ framework for CMS's online software.
3) The CMS experiment's run control system.
4) The GridCC project to integrate remote instruments into the grid.
5) A distributed data staging system.
6) The Johanna and API projects involving knowledge management and e-learning.
7) Systems administration for the University of Bologna faculty of agriculture IT lab.
This document summarizes a workshop on the Tulipp project, which aims to develop ubiquitous low-power image processing platforms. The workshop covered shortcomings of existing platforms, introduced the Maestro real-time operating system as the reference platform, and described the concept of the Tulipp project to provide an operating system and tools to support heterogeneous architectures including FPGA and multi-core processors. Attendees participated in hands-on labs demonstrating how to build applications with Maestro, leverage OpenMP for parallelism, and use SDSoC tools to automatically accelerate functions in FPGA hardware.
Intel open stack-summit-session-nov13-final - Deepak Mane
- Intel is a major contributor to OpenStack and open source projects, contributing across every layer of the OpenStack stack. As the #2 Linux kernel contributor, Intel helps improve performance, stability, and efficiency.
- Intel enables OpenStack cloud deployments through contributions to OpenStack projects, open source tools, and optimizations. Intel IT also uses OpenStack in their own private cloud.
- Intel is working on technologies to address challenges in datacenters, including security and compliance, cost reduction, and business uptime. Technologies include trusted compute pools, erasure coding, and enhanced platform awareness.
Radisys' CTO, Andrew Alleman, was one of the featured speakers at the OCP Telco Engineering Workshop during the 2017 Big Communications Event. Andrew discussed carrier-grade open rack architecture (CG-OpenRack-19), the future of open hardware standards and commercial products in the OCP pipeline during his presentation.
Microservices @ Work - A Practice Report of Developing Microservices - QAware GmbH
Cloud Native Night October 2016, Mainz: Talk by Simon Bäumler (Technical Chief Designer at QAware).
Join our Meetup: www.meetup.com/cloud-native-night
Abstract: This talk takes a practice-oriented approach to examining microservice-oriented architecture. It will show two real systems: one built from scratch in a microservice architecture, the other migrated from a monolithic system to a microservice architecture.
Using these two systems as examples, the pitfalls, advantages, and lessons learned from microservice-oriented architectures will be discussed.
While both systems use the Java stack, including Spring Boot and Spring Cloud, many topics will be kept general and will be of interest to all developers.
Introduction to Civil Infrastructure Platform - SZ Lin
CIP aims to establish an open source base layer of industrial-grade software to enable the use and implementation of such software. This slide deck introduces the current status and roadmap of CIP.
PMIx is a standard for process management that aims to bridge the gap between containers and HPC environments. It defines a set of APIs for tasks like resource management, launch coordination, and tool integration. The PMIx reference library provides an implementation of the standard to ease adoption. PMIx is now used in many HPC libraries and resource managers and enables new use cases like container support. Future work may expand PMIx to support dynamic resource allocation, failure handling, and integration with file systems and fabric managers.
PMIx provides standardized APIs to enable portability of applications across systems and support interactions between applications and resource managers. The document discusses integrating PMIx with tiered storage systems to allow applications to query storage resources, pre-stage files, and coordinate caching across different storage tiers. This integration could improve launch times and support dynamic data movement in response to faults or changing workloads. The document outlines initial plans to leverage existing PMIx functions and consider new APIs to enable these capabilities.
PMIx is a process management interface for exascale environments. It provides scalable launch and control of parallel applications. Current support includes startup operations, tool connections, queries, events, logging, allocations, and job control. Future work includes network support, MPI sessions, security, file systems, and debugger integration. PMIx consists of a standard API, a reference library implementation, and a reference server. Resource managers like SLURM and libraries like OpenMPI have adopted PMIx. The talk will discuss the state of PMIx adoption, scaled performance results, and the upcoming PMIx standards document.
PMIx provides capabilities for debugging and tool integration with parallel applications by:
1) Co-locating debuggers and tools with application processes during launch.
2) Forwarding I/O between debuggers, tools, and applications.
3) Querying process and job information from resource managers and PMIx servers.
This document discusses resource management for high-performance computing systems and proposes a new approach. It outlines emerging needs around dynamic resource allocation, application involvement in decisions, and cross-system integration. It then proposes a multi-tiered strategy including open-source reference implementations and integration with existing resource managers. Key aspects of the new approach include improved launch scaling, flexible allocation support, I/O support, spawn support, network integration, power management, and fault tolerance.
This document discusses Process Management Interface for Exascale (PMIx). It provides an overview and objectives of PMIx, which aims to establish an independent and open community effort to develop scalable client/server libraries for job launch and management. The document discusses performance status showing improvements over PMI2, integration status in Open MPI and SLURM, and roadmap for continued development including supporting evolving application needs through flexible resource allocation and fault tolerance. It also discusses different types of malleable and adaptive jobs that PMIx aims to support.
HPC control systems are evolving into the future. This presentation looks at where this evolution may lead, and describes how the control system of the future might be constructed.
This document introduces PMIx, an extended Process Management Interface for exascale systems. PMIx is a collaborative open source effort led by Intel, Mellanox Technologies, and Adaptive Computing to address limitations of current PMI standards at extreme scale. Version 1.0 focuses on reducing memory footprint and communication overhead. Version 2.0 adds performance optimizations like distributed databases and collective operations. A SLURM plugin implements PMIx support and will be included in the next major SLURM release. The project timeline outlines development through 2017 to integrate with resource managers and update MPI implementations.
5th LF Energy Power Grid Model Meet-up Slides - DanBrown980551
5th Power Grid Model Meet-up
It is with great pleasure that we extend to you an invitation to the 5th Power Grid Model Meet-up, scheduled for 6th June 2024. This event will adopt a hybrid format, allowing participants to join us either through an online Microsoft Teams session or in person at TU/e, located at Den Dolech 2, Eindhoven, Netherlands. The meet-up will be hosted by Eindhoven University of Technology (TU/e), a research university specializing in engineering science & technology.
Power Grid Model
The global energy transition is placing new and unprecedented demands on Distribution System Operators (DSOs). Alongside upgrades to grid capacity, processes such as digitization, capacity optimization, and congestion management are becoming vital for delivering reliable services.
Power Grid Model is an open source project from Linux Foundation Energy and provides a calculation engine that is increasingly essential for DSOs. It offers a standards-based foundation enabling real-time power systems analysis, simulations of electrical power grids, and sophisticated what-if analysis. In addition, it enables in-depth studies and analysis of the electrical power grid’s behavior and performance. This comprehensive model incorporates essential factors such as power generation capacity, electrical losses, voltage levels, power flows, and system stability.
Power Grid Model is currently being applied in a wide variety of use cases, including grid planning, expansion, reliability, and congestion studies. It can also help in analyzing the impact of renewable energy integration, assessing the effects of disturbances or faults, and developing strategies for grid control and optimization.
What to expect
For the upcoming meetup we are organizing, we have an exciting lineup of activities planned:
-Insightful presentations covering two practical applications of the Power Grid Model.
-An update on the latest advancements in Power Grid Model technology during the first and second quarters of 2024.
-An interactive brainstorming session to discuss and propose new feature requests.
-An opportunity to connect with fellow Power Grid Model enthusiasts and users.
Unlock the Future of Search with MongoDB Atlas: Vector Search Unleashed - Malak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
Taking AI to the Next Level in Manufacturing - ssuserfac0301
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
5. Ideas and approaches to help build your organization's AI strategy.
Driving Business Innovation: Latest Generative AI Advancements & Success StorySafe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
Ocean Lotus threat actors project by John Sitima 2024 - SitimaJohn
Ocean Lotus cyber threat actors represent a sophisticated, persistent, and politically motivated group that poses a significant risk to organizations and individuals in the Southeast Asian region. Their continuous evolution and adaptability underscore the need for robust cybersecurity measures and international cooperation to identify and mitigate the threats posed by such advanced persistent threat groups.
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A... - Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on integration of Salesforce with Bonterra Impact Management.
Interested in deploying an integration with Salesforce for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Trusted Execution Environment for Decentralized Process Mining - LucaBarbaro3
Presentation of the paper "Trusted Execution Environment for Decentralized Process Mining" given during the CAiSE 2024 Conference in Cyprus on June 7, 2024.
Ivanti’s Patch Tuesday breakdown goes beyond patching your applications and brings you the intelligence and guidance needed to prioritize where to focus your attention first. Catch early analysis on our Ivanti blog, then join industry expert Chris Goettl for the Patch Tuesday Webinar Event. There we’ll do a deep dive into each of the bulletins and give guidance on the risks associated with the newly-identified vulnerabilities.
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers - akankshawande
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
GraphRAG for Life Science to increase LLM accuracy - Tomaz Bratanic
GraphRAG for the life science domain, where you retrieve information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers.
3. Overview Paper: Ralph H. Castain, Joshua Hursey, Aurelien Bouteiller, David Solt, "PMIx: Process management for exascale environments", Parallel Computing, 2018. https://doi.org/10.1016/j.parco.2018.08.002
4. Status Snapshot
• Standards Documents (https://github.com/pmix/pmix-standard/releases)
  v2.0 released Sept. 27, 2018
  v2.1 (errata/clarifications, Dec.)
  v3.0 (Dec.)
  v4.0 (1Q19)
• Current Library Releases
5. PMIx Standard – In Progress
• Clarify terms: session, job, application
• New chapter detailing namespace registration data organization
• Data retrieval examples
  Application size in multi-app scenarios
  What rank to provide for which data attributes
6. Status Snapshot
• Standards Documents
  v2.0 released Sept. 27, 2018
  v2.1 (errata/clarifications, Dec.)
  v3.0 (Dec.)
  v4.0 (1Q19)
• Reference Implementation (Current Library Releases)
  v3.0 => v3.1 (Dec)
  v2.1 => v2.2 (Dec)
  New dstore implementation
  Full Standard compliance
7. What's in the Standard?
• v2 => workflow orchestration
  (Ch.7) Job Allocation Management and Reporting, (Ch.8) Event Notification
• v3 => tools support
  Security credentials, I/O management
• v4 => tools, network, dynamic programming models
  Complete debugger attach/detach/re-attach
  Direct and indirect launch
  Fabric topology information (switches, NICs, ...)
  Communication cost, connectivity graphs, inventory collection
  PMIx Groups
  Bindings (Python, ...)
8. Status Snapshot
• PRRTE
  Formal releases to begin in Dec. or Jan.
  Ties to external PMIx, libevent, hwloc installations
  Support up through the current head of the PMIx master branch
• Cross-version support
  Regularly tested, automated script
  v2.1.1 and above remain interchangeable
  https://pmix.org/support/faq/how-does-pmix-work-with-containers/
• Help Wanted: container developers for regular regression testing
10. Shared Memory Optimizations (ds21) - Artem Y. Polyakov, Joshua S. Ladd, Boris I. Karasev, Mellanox Technologies
11. Mellanox update
• Improvements to intra-node performance
  Highly optimized the intra-node communication component dstor/ds21 [1]
  Significantly improved PMIx_Get latency using the 2N-lock algorithm on systems with a large number of processes per node accessing many keys
  Designed for exascale: deployed in production on CORAL systems Summit and Sierra (POWER9-based systems)
  Available in PMIx v2.2, v3.1, and subsequent releases; enabled by default
• Improvements to the SLURM PMIx plug-in
  Added a ring-based PMIx_Fence collective (Slurm v18.08)
  Added support for high-performance RDMA point-to-point via UCX (Slurm v17.11) [2]
  Verified PMIx compatibility through the v3.x API (Slurm v18.08)
(Chart: PMIx_Get() latency on a POWER8 system.)
[1] Artem Polyakov et al., "A Scalable PMIx Database" (poster), EuroMPI'18: https://eurompi2018.bsc.es/sites/default/files/uploaded/dstore_empi2018.pdf
[2] Artem Polyakov, "PMIx Plugin with UCX Support" (SC17), SchedMD booth talk: https://slurm.schedmd.com/publications.html
14. Harnessing PMIx capabilities - IBM Systems
volatile bool isdone = false;

static void cbfunc(pmix_status_t status, pmix_info_t *info, size_t ninfo, void *cbdata,
                   pmix_release_cbfunc_t release_fn, void *release_cbdata)
{
    size_t n;
    if (0 < ninfo) {
        for (n = 0; n < ninfo; n++) {
            // iterate through the info[n].value.data.darray->array of:
            /*
            typedef struct pmix_proc_info {
                pmix_proc_t proc;
                char *hostname;
                char *executable_name;
                pid_t pid;
                int exit_code;
                pmix_proc_state_t state;
            } pmix_proc_info_t;
            */
        }
    }
    if (NULL != release_fn) {
        release_fn(release_cbdata);
    }
    isdone = true;
}
• https://pmix.org/support/how-to/example-direct-launch-debugger-tool/
• https://pmix.org/support/how-to/example-indirect-launch-debugger-tool/
#include <pmix.h>

int main(int argc, char **argv)
{
    pmix_status_t rc;
    pmix_query_t *query;
    char clientspace[PMIX_MAX_NSLEN + 1];
    // ... Initialize and setup – PMIx_tool_init() ...
    /* get the proctable for this nspace */
    PMIX_QUERY_CREATE(query, 1);
    PMIX_ARGV_APPEND(rc, query[0].keys, PMIX_QUERY_PROC_TABLE);
    query[0].nqual = 1;
    PMIX_INFO_CREATE(query->qualifiers, query[0].nqual);
    PMIX_INFO_LOAD(&query->qualifiers[0], PMIX_NSPACE, clientspace, PMIX_STRING);
    if (PMIX_SUCCESS != (rc = PMIx_Query_info_nb(query, 1, cbfunc, NULL))) {
        exit(42);
    }
    /* wait to get a response */
    while (!isdone) { usleep(10); }
    // ... Finalize and cleanup
}
15. Tony Curtis <anthony.curtis@stonybrook.edu>
Dr. Barbara Chapman
Abdullah Shahneous Bari Sayket, Wenbin Lü, Wes Suttle
Stony Brook University
OSSS OpenSHMEM-on-UCX / Fault Tolerance
https://www.iacs.stonybrook.edu/
16. PMIx@SC18: OSSS OpenSHMEM-on-UCX / Fault Tolerance
• OpenSHMEM
Partitioned Global Address Space library
Take advantage of RDMA
• put/get directly to/from remote memory
• Atomics, locks
• Collectives
• And more…
http://www.openshmem.org/
19. PMIx@SC18: OSSS OpenSHMEM-on-UCX / Fault Tolerance
• Reference Implementation
PMIx/UCX for OpenSHMEM ≥ 1.4
Also for < 1.4 based on GASNet
Open-Source @ github
• Or my dev repo if you’re brave
• Our research vehicle
20. PMIx@SC18: OSSS OpenSHMEM-on-UCX / Fault Tolerance
• Reference Implementation
  PMIx/PRRTE used as out-of-band startup / launch
  Exchange of wireup info, e.g. heap addresses/rkeys
  Rank/locality and other “ambient” queries
  Stays out of the way until OpenSHMEM is finalized
  Could kill it once we're up, but it is used for the fault tolerance project
21. PMIx@SC18: OSSS OpenSHMEM-on-UCX / Fault Tolerance
• Fault Tolerance (NSF) #1
  Project with UTK and Rutgers
  Current OpenSHMEM specification lacks general fault tolerance (FT) features
  PMIx has basic FT building blocks already
  • Event handling, process monitoring, job control
  Using these features to build an FT API for OpenSHMEM
22. PMIx@SC18: OSSS OpenSHMEM-on-UCX / Fault Tolerance
• Fault Tolerance (NSF) #2
User specifies
• Desired process monitoring scheme
• Application-specific fault mitigation procedure
Then our FT API takes care of:
• Initiating specified process monitoring
• Registering fault mitigation routine with PMIx server
PMIx takes care of the rest
23. PMIx@SC18: OSSS OpenSHMEM-on-UCX / Fault Tolerance
• Fault Tolerance (NSF) #3
  Longer-term goal: automated selection and implementation of FT techniques
  Compiler chooses from a small set of pre-packaged FT schemes
  Appropriate technique selected based on application structure and data access patterns
  Compiler implements the scheme at compile time
24. PMIx@SC18: OSSS OpenSHMEM-on-UCX / Fault Tolerance
• PMIx @ github
  We try to keep up on the bleeding edge of both PMIx & PRRTE for development
  But make sure releases work too!
  We've opened quite a few tickets
  • Always fixed or nixed quickly!
25. Any (easy) Questions?
Tony Curtis <anthony.curtis@stonybrook.edu>
Dr. Barbara Chapman
Abdullah Shahneous Bari Sayket, Wenbin Lü, Wes Suttle
Stony Brook University
OSSS OpenSHMEM-on-UCX / Fault Tolerance
https://www.iacs.stonybrook.edu/
http://www.openshmem.org/
http://www.openucx.org/
27. Introduction
• Many pre-exascale systems show a fairly drastic hardware architecture change
  Bigger/more complex compute nodes
  Summit: 2 IBM POWER9 CPUs, 6 GPUs per node
• Requires most scientific simulations to switch from pure MPI to an MPI+X paradigm
• MPI+OpenMP is a popular solution
28. Challenges
• MPI and OpenMP are independent communities
• Implementations are unaware of each other
  MPI will deploy ranks without any knowledge of the OpenMP threads that will run under each rank
  OpenMP assumes by default that all available resources can be used
  It is very difficult for users to have fine-grain control over application execution
29. Context
• U.S. DOE Exascale Computing Project (ECP)
• ECP OMPI-X project
• ECP SOLLVE project
• Oak Ridge Leadership Computing Facility (OLCF)
30. Challenges (2)
• Impossible to coordinate runtimes and optimize resource utilization
  Runtimes independently allocate resources
  No fine-grain & coordinated control over the placement of MPI ranks and OpenMP threads
  Difficult to express affinity across runtimes
  Difficult to optimize applications
31. Illustration
• On Summit nodes
  (Diagram: node architecture with two NUMA nodes and six GPUs, shown alongside the desired placement of MPI ranks / OpenMP master threads and OpenMP worker threads.)
32. Proposed Solution
• Make all runtimes PMIx aware for data
exchange and notification through events
• Key PMIx characteristics
Distributed key/value store for data exchange
Asynchronous events for coordination
Enable interactions with the resource manager
• Modification of LLVM OpenMP and Open MPI
33. Two Use-cases
• Fine-grain placement over MPI ranks and
OpenMP threads (prototyping stage)
• Inter-runtime progress (design stage)
34. Fine-grain Placement
• Goals
Explicit placement of MPI ranks
Explicit placement of OpenMP threads
Re-configure MPI+OpenMP layout at runtime
• Propose the concept of layout
Static layout: explicit definition of ranks and threads
placement at initialization time only
Dynamic layouts: change placement at runtime
36. Static Layouts
• New MPI mapper
Get the layout specified by the user
Publish it through PMIx
• New MPI OpenMP Coordination (MOC) helper
library gets the layout and sets OpenMP places
to specify where threads will be created
• No modification to OpenMP standard or runtime
37. Dynamic Layouts
• New API to define phases within the application
A static layout is deployed for each phase
Modification of the layout between phases (w/ MOC)
Well adapted to the nature of some applications
• Requires
OpenMP runtime modifications (LLVM prototype)
• Get phases’ layouts and set new places internally
OpenMP standard modifications to allow dynamicity
38. Architecture Overview
• MOC ensures inter-runtime data
exchange (via MOC-specific PMIx
keys)
• Runtimes are PMIx-aware
• Runtimes have a layout-aware
mapper/affinity manager
• Respect the separation between
standards, resource manager
and runtimes
39. Inter-runtime Progress
• Runtimes usually guarantee internal progress but
are not aware of other runtimes
• Runtimes can end up in a deadlock if mutually
waiting for completion
An MPI operation is ready to complete
but the application calls into, and blocks in, another runtime
40. Inter-runtime Progress (2)
• Proposed solution
Runtimes use PMIx to notify of progress
When a runtime gets a progress notification, it yields
once done with all tasks that can be completed
• Currently working on prototyping
41. Conclusion
• There is a real need for making runtimes aware
of each other
• PMIx is a strong candidate for this
Asynchronous / event based
Data exchange
Enable interaction with the resource manager
• Opens new opportunities for both research and
development (e.g. new project focusing on QoS)
42. Dan Holmes
EPCC University of Edinburgh
Howard Pritchard, Nathan Hjelm
Los Alamos National Laboratory
MPI Sessions: Dynamic Proc. Groups
43. Using PMIx to Help
Replace MPI_Init
14th November 2018
Dan Holmes, Howard Pritchard, Nathan Hjelm
LA-UR-18-30830
44. Problems with MPI_Init
● All MPI processes must initialize MPI exactly once
● MPI cannot be initialized within an MPI process from different
application components without coordination
● MPI cannot be re-initialized after MPI is finalized
● Error handling for MPI initialization cannot be specified
45. Sessions – a new way to start MPI
● General scheme:
● Query the underlying run-time system to get a “set” of processes
● Determine the processes you want and create an MPI_Group
● Create a communicator with just those processes: an MPI_Comm
[Diagram: query runtime for set of processes → MPI_Session →
MPI_Group → MPI_Comm]
46. MPI Sessions proposed API
● Create (or destroy) a session:
● MPI_SESSION_INIT (and MPI_SESSION_FINALIZE)
● Get names of sets of processes:
● MPI_SESSION_GET_NUM_PSETS,
MPI_SESSION_GET_NTH_PSET
● Create an MPI_GROUP from a process set name:
● MPI_GROUP_CREATE_FROM_SESSION
● Create an MPI_COMM from an MPI_GROUP:
● MPI_COMM_CREATE_FROM_GROUP
(PMIx Groups helps here)
47. MPI_COMM_CREATE_FROM_GROUP
The ‘uri’ is supplied by the application.
Implementation challenge: ‘group’ is a local object, so we
need some way to synchronize with the other “joiners” to the
communicator. The ‘uri’ is different from a process set name.
MPI_Comm_create_from_group(IN MPI_Group group,
IN const char *uri,
IN MPI_Info info,
IN MPI_Errhandler hndl,
OUT MPI_Comm *comm);
48. Using PMIx Groups
● PMIx Groups - a collection of processes desiring a unified
identifier for purposes such as passing events or participating in
PMIx fence operations
● Invite/join/leave semantics
● Sessions prototype implementation currently uses
PMIX_Group_construct/PMIX_Group_destruct
● Can be used to generate a “unique” 64-bit identifier for the group.
Used by the sessions prototype to generate a communicator ID.
● Useful options for future work
● Timeout for processes joining the group
● Asynchronous notification when a process leaves the group
49. Using PMIx_Group_Construct
● ‘id’ maps to/from the ‘uri’ in MPI_Comm_create_from_group
(plus additional Open MPI internal info)
● ‘procs’ array comes from information previously supplied by PMIx
○ “mpi://world” and “mpi://self” already available
○ mpiexec -np 2 --pset user://ocean ocean.x :
-np 2 --pset user://atmosphere atmosphere.x
PMIx_Group_Construct(const char id[],
const pmix_proc_t procs[],
const pmix_info_t info[],
size_t ninfo);
50. MPI Sessions Prototype Status
● All “MPI_*_from_group” functions have been implemented
● Only pml/ob1 supported at this time
● Working on MPI_Group creation (from PSET) now
● Will start with mpi://world and mpi://self
● User-defined process sets need additional support from PMIx
● Up next: break apart MPI initialization
● Goal is to reduce startup time and memory footprint
51. Summary
● PMIx Groups provides an OOB mechanism for MPI processes to
bootstrap the formation of a communication context (MPI
Communicator) from a group of MPI processes
● Functionality for future work
● Handling (unexpected) process exit
● User-defined process sets
● Group expansion